History log of /linux-master/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
Revision Date Author Comments
# 91a72ada 26-Dec-2023 Gal Pressman <gal@nvidia.com>

net/mlx5: Remove initial segmentation duplicate definitions

Device definitions belong in mlx5_ifc, remove the duplicates in
mlx5_core.h.

Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# f5e95632 07-Aug-2023 Tariq Toukan <tariqt@nvidia.com>

net/mlx5: Expose Management PCIe Index Register (MPIR)

MPIR register allows to query the PCIe indexes
and Socket-Direct related parameters.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 0d2d6bc7 12-Oct-2023 Yue Haibing <yuehaibing@huawei.com>

net/mlx5: Remove unused declaration

Commit 2ac9cfe78223 ("net/mlx5e: IPSec, Add Innova IPSec offload TX data path")
declared mlx5e_ipsec_inverse_table_init() but never implemented it.
Commit f52f2faee581 ("net/mlx5e: Introduce flow steering API")
declared mlx5e_fs_set_tc() but never implemented it.
Commit f2f3df550139 ("net/mlx5: EQ, Privatize eq_table and friends")
declared mlx5_eq_comp_cpumask() but never implemented it.
Commit cac1eb2cf2e3 ("net/mlx5: Lag, properly lock eswitch if needed")
removed mlx5_lag_update() but not its declaration.
Commit 35ba005d820b ("net/mlx5: DR, Set flex parser for TNL_MPLS dynamically")
removed mlx5dr_ste_build_tnl_mpls() but not its declaration.

Commit e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters")
declared but never implemented mlx5_alloc_cmd_mailbox_chain() and mlx5_free_cmd_mailbox_chain().
Commit 0cf53c124756 ("net/mlx5: FWPage, Use async events chain")
removed mlx5_core_req_pages_handler() but not its declaration.
Commit 938fe83c8dcb ("net/mlx5_core: New device capabilities handling")
removed mlx5_query_odp_caps() but not its declaration.
Commit f6a8a19bb11b ("RDMA/netdev: Hoist alloc_netdev_mqs out of the driver")
removed mlx5_rdma_netdev_alloc() but not its declaration.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# b430c1b4 12-Oct-2023 Shay Drory <shayd@nvidia.com>

net/mlx5: Replace global mlx5_intf_lock with HCA devcom component lock

mlx5_intf_lock is used to sync between LAG changes and its slaves
mlx5 core dev aux devices changes, which means every time mlx5 core
dev add/remove aux devices, mlx5 is taking this global lock, even if
LAG functionality isn't supported over the core dev.
This cause a bottleneck when probing VFs/SFs in parallel.

Hence, replace mlx5_intf_lock with HCA devcom component lock, or no
lock if LAG functionality isn't supported.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# e534552c 12-Oct-2023 Shay Drory <shayd@nvidia.com>

net/mlx5: Refactor LAG peer device lookout bus logic to mlx5 devcom

LAG peer device lookout bus logic required the usage of global lock,
mlx5_intf_mutex.
As part of the effort to remove this global lock, refactor LAG peer
device lookout to use mlx5 devcom layer.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 3f7f31ff 12-Oct-2023 Wei Zhang <weizhang@nvidia.com>

net/mlx5: Parallelize vhca event handling

At present, mlx5 driver have a general purpose
event handler which not only handles vhca event
but also many other events. This incurs a huge
bottleneck because the event handler is
implemented by single threaded workqueue and all
events are forced to be handled in serial manner
even though application tries to create multiple
SFs simultaneously.

Introduce a dedicated vhca event handler which
manages SFs parallel creation.

Signed-off-by: Wei Zhang <weizhang@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 8c894f88 21-Sep-2023 Patrisious Haddad <phaddad@nvidia.com>

net/mlx5: Implement alias object allow and create functions

Add functions which allow one vhca to access another vhca object,
and functions that creates an alias object or destroys it.

Together they can be used to create cross vhca flow table that is able
jump from the steering domain that is managed by one vport,
to the steering domain on a different vport.

Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Link: https://lore.kernel.org/r/f45a9c85319fa783186b8988abcd64955b5f2a0c.1695296682.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>


# a41cb591 11-Jul-2023 Shay Drory <shayd@nvidia.com>

net/mlx5: Remove unused MAX HCA capabilities

Each device cap has two modes: MAX and CUR. The driver maintains a
cache of both modes of the capabilities. For most device caps, the MAX
cap mode is never used.

Hence, remove all driver queries of the MAX mode of the said caps as
well as their helper MACROs.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 383a4de3 07-Aug-2023 Adham Faris <afaris@nvidia.com>

net/mlx5: Expose port.c/mlx5_query_module_num() function

Make mlx5_query_module_num() defined in port.c, a non-static, so it can
be used by other files.

CC: Jean Delvare <jdelvare@suse.com>
CC: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Adham Faris <afaris@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230807180507.22984-2-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 06cd555f 18-Jan-2023 Shay Drory <shayd@nvidia.com>

net/mlx5: split mlx5_cmd_init() to probe and reload routines

There is no need to destroy and allocate cmd SW structs during reload,
this is time consuming for no reason.
Hence, split mlx5_cmd_init() to probe and reload routines.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 06c868fd 26-Jun-2023 Daniel Jurgens <danielj@nvidia.com>

net/mlx5: Return correct EC_VF function ID

The ECVF function ID range is 1..max_ec_vfs. Currently
mlx5_vport_to_func_id returns 0..max_ec_vfs - 1. Which
results in a syndrome when querying the caps with more
recent firmware, or reading incorrect caps with older
firmware that supports EC VFs.

Fixes: 9ac0b128248e ("net/mlx5: Update vport caps query/set for EC VFs")
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 8bbe544e 13-Jun-2023 Daniel Jurgens <danielj@nvidia.com>

net/mlx5: DR, update query of HCA caps for EC VFs

This change is needed to use EC VFs with metadata based steering.

There was an assumption that vport was equal to function ID. That's
not the case for EC VF functions. Adjust to function ID and set the
ec_vf_function bit accordingly.

Fixes: 9ac0b128248e ("net/mlx5: Update vport caps query/set for EC VFs")
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 617f5db1 05-Jun-2023 Mark Bloch <mbloch@nvidia.com>

RDMA/mlx5: Fix affinity assignment

The cited commit aimed to ensure that Virtual Functions (VFs) assign a
queue affinity to a Queue Pair (QP) to distribute traffic when
the LAG master creates a hardware LAG. If the affinity was set while
the hardware was not in LAG, the firmware would ignore the affinity value.

However, this commit unintentionally assigned an affinity to QPs on the LAG
master's VPORT even if the RDMA device was not marked as LAG-enabled.
In most cases, this was not an issue because when the hardware entered
hardware LAG configuration, the RDMA device of the LAG master would be
destroyed and a new one would be created, marked as LAG-enabled.

The problem arises when a user configures Equal-Cost Multipath (ECMP).
In ECMP mode, traffic can be directed to different physical ports based on
the queue affinity, which is intended for use by VPORTS other than the
E-Switch manager. ECMP mode is supported only if both E-Switch managers are
in switchdev mode and the appropriate route is configured via IP. In this
configuration, the RDMA device is not destroyed, and we retain the RDMA
device that is not marked as LAG-enabled.

To ensure correct behavior, Send Queues (SQs) opened by the E-Switch
manager through verbs should be assigned strict affinity. This means they
will only be able to communicate through the native physical port
associated with the E-Switch manager. This will prevent the firmware from
assigning affinity and will not allow the SQs to be remapped in case of
failover.

Fixes: 802dcc7fc5ec ("RDMA/mlx5: Support TX port affinity for VF drivers in LAG mode")
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://lore.kernel.org/r/425b05f4da840bc684b0f7e8ebf61aeb5cef09b0.1685960567.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>


# e71383fb 03-May-2023 Shay Drory <shayd@nvidia.com>

net/mlx5: Light probe local SFs

In case user wants to configure the SFs, for example: to use only vdpa
functionality, he needs to fully probe a SF, configure what he wants,
and afterward reload the SF.

In order to save the time of the reload, local SFs will probe without
any auxiliary sub-device, so that the SFs can be configured prior to
its full probe.

The defaults of the enable_* devlink params of these SFs are set to
false.

Usage example:
Create SF:
$ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11
$ devlink port function set pci/0000:08:00.0/32768 \
hw_addr 00:00:00:00:00:11 state active

Enable ETH auxiliary device:
$ devlink dev param set auxiliary/mlx5_core.sf.1 \
name enable_eth value true cmode driverinit

Now, in order to fully probe the SF, use devlink reload:
$ devlink dev reload auxiliary/mlx5_core.sf.1

At this point the user have SF devlink instance with auxiliary device
for the Ethernet functionality only.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 6d98f314 07-Mar-2023 Daniel Jurgens <danielj@nvidia.com>

net/mlx5: Update SRIOV enable/disable to handle EC/VFs

Previously on the embedded CPU platform SRIOV was never enabled/disabled
via mlx5_core_sriov_configure. Host VF updates are provided by an event
handler. Now in the disable flow it must be known if this is a disable
due to driver unload or SRIOV detach, or if the user updated the number
of VFs. If due to change in the number of VFs only wait for the pages of
ECVFs.

Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 9ac0b128 07-Mar-2023 Daniel Jurgens <danielj@nvidia.com>

net/mlx5: Update vport caps query/set for EC VFs

These functions are for query/set by vport, there was an underlying
assumption that vport was equal to function ID. That's not the case for
EC VF functions. Set the ec_vf_function bit accordingly.

Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# dc131808 07-Mar-2023 Daniel Jurgens <danielj@nvidia.com>

net/mlx5: Enable devlink port for embedded cpu VF vports

Enable creation of a devlink port for EC VF vports.

Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 21608a2c 09-Apr-2023 Moshe Shemesh <moshe@nvidia.com>

Revert "net/mlx5: Remove "recovery" arg from mlx5_load_one() function"

This reverts commit 5977ac3910f1cbaf44dca48179118b25c206ac29.

Revert this patch as we need the "recovery" arg back in mlx5_load_one()
function. This arg will be used in the next patch for using recovery
timeout during sync reset flow.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 9df839a7 23-Apr-2020 Parav Pandit <parav@mellanox.com>

net/mlx5: Create a new profile for SFs

Create a new profile for SFs in order to disable the command cache.
Each function command cache consumes ~500KB of memory, when using a
large number of SFs this savings is notable on memory constarined
systems.

Use a new profile to provide for future differences between SFs and PFs.

The mr_cache not used for non-PF functions, so it is excluded from the
new profile.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Bodong Wang <bodong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 72ed5d56 26-Jan-2023 Jiri Pirko <jiri@nvidia.com>

net/mlx5: Suspend auxiliary devices only in case of PCI device suspend

The original behavior introduced by commit c6acd629eec7 ("net/mlx5e: Add
support for devlink-port in non-representors mode") correctly
re-instantiated uplink devlink port and related netdevice during devlink
reload. However with migration to auxiliary devices, this behaviour
changed.

Restore the original behaviour and tear down auxiliary devices
completely during devlink reload.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 5977ac39 26-Jan-2023 Jiri Pirko <jiri@nvidia.com>

net/mlx5: Remove "recovery" arg from mlx5_load_one() function

mlx5_load_one() is always called with recovery==false, so remove the
unneeded function arg.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 7db98396 06-Dec-2022 Yishai Hadas <yishaih@nvidia.com>

net/mlx5: E-Switch, Implement devlink port function cmds to control RoCE

Implement devlink port function commands to enable / disable RoCE.
This is used to control the RoCE device capabilities.

This patch implement infrastructure which will be used by downstream
patches that will add additional capabilities.

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Acked-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 47d0c500 06-Dec-2022 Shay Drory <shayd@nvidia.com>

net/mlx5: Add generic getters for other functions caps

Downstream patch requires to get other function GENERAL2 caps while
mlx5_vport_get_other_func_cap() gets only one type of caps (general).
Rename it to represent this and introduce a generic implementation
of mlx5_vport_get_other_func_cap().

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Acked-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# c4418f34 29-Sep-2022 Maxim Mikityanskiy <maximmi@nvidia.com>

net/mlx5: Add MLX5_FLEXIBLE_INLEN to safely calculate cmd inlen

Some commands use a flexible array after a common header. Add a macro to
safely calculate the total input length of the command, detecting
overflows and printing errors with specific values when such overflows
happen.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 84a433a4 28-Jul-2022 Moshe Shemesh <moshe@nvidia.com>

net/mlx5: Lock mlx5 devlink reload callbacks

Change devlink instance locks in mlx5 driver to have devlink reload
callbacks locked, while keeping all driver paths which lead to devl_ API
functions called by the driver locked.

Add mlx5_load_one_devl_locked() and mlx5_unload_one_devl_locked() which
are used by the paths which are already locked such as devlink reload
callbacks.

This patch makes the driver use devl_ API also for traps register as
these functions are called from the driver paths parallel to reload that
requires locking now.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 3008e6a0 25-May-2022 Mark Bloch <mbloch@nvidia.com>

net/mlx5: E-Switch, pair only capable devices

OFFLOADS paring using devcom is possible only on devices
that support LAG. Filter based on lag capabilities.

This fixes an issue where mlx5_get_next_phys_dev() was
called without holding the interface lock.

This issue was found when commit
bc4c2f2e0179 ("net/mlx5: Lag, filter non compatible devices")
added an assert that verifies the interface lock is held.

WARNING: CPU: 9 PID: 1706 at drivers/net/ethernet/mellanox/mlx5/core/dev.c:642 mlx5_get_next_phys_dev+0xd2/0x100 [mlx5_core]
Modules linked in: mlx5_vdpa vringh vhost_iotlb vdpa mlx5_ib mlx5_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_umad ib_ipoib ib_cm ib_uverbs ib_core overlay fuse [last unloaded: mlx5_core]
CPU: 9 PID: 1706 Comm: devlink Not tainted 5.18.0-rc7+ #11
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:mlx5_get_next_phys_dev+0xd2/0x100 [mlx5_core]
Code: 02 00 75 48 48 8b 85 80 04 00 00 5d c3 31 c0 5d c3 be ff ff ff ff 48 c7 c7 08 41 5b a0 e8 36 87 28 e3 85 c0 0f 85 6f ff ff ff <0f> 0b e9 68 ff ff ff 48 c7 c7 0c 91 cc 84 e8 cb 36 6f e1 e9 4d ff
RSP: 0018:ffff88811bf47458 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88811b398000 RCX: 0000000000000001
RDX: 0000000080000000 RSI: ffffffffa05b4108 RDI: ffff88812daaaa78
RBP: ffff88812d050380 R08: 0000000000000001 R09: ffff88811d6b3437
R10: 0000000000000001 R11: 00000000fddd3581 R12: ffff88815238c000
R13: ffff88812d050380 R14: ffff8881018aa7e0 R15: ffff88811d6b3428
FS: 00007fc82e18ae80(0000) GS:ffff88842e080000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f9630d1b421 CR3: 0000000149802004 CR4: 0000000000370ea0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
mlx5_esw_offloads_devcom_event+0x99/0x3b0 [mlx5_core]
mlx5_devcom_send_event+0x167/0x1d0 [mlx5_core]
esw_offloads_enable+0x1153/0x1500 [mlx5_core]
? mlx5_esw_offloads_controller_valid+0x170/0x170 [mlx5_core]
? wait_for_completion_io_timeout+0x20/0x20
? mlx5_rescan_drivers_locked+0x318/0x810 [mlx5_core]
mlx5_eswitch_enable_locked+0x586/0xc50 [mlx5_core]
? mlx5_eswitch_disable_pf_vf_vports+0x1d0/0x1d0 [mlx5_core]
? mlx5_esw_try_lock+0x1b/0xb0 [mlx5_core]
? mlx5_eswitch_enable+0x270/0x270 [mlx5_core]
? __debugfs_create_file+0x260/0x3e0
mlx5_devlink_eswitch_mode_set+0x27e/0x870 [mlx5_core]
? mutex_lock_io_nested+0x12c0/0x12c0
? esw_offloads_disable+0x250/0x250 [mlx5_core]
? devlink_nl_cmd_trap_get_dumpit+0x470/0x470
? rcu_read_lock_sched_held+0x3f/0x70
devlink_nl_cmd_eswitch_set_doit+0x217/0x620

Fixes: dd3fddb82780 ("net/mlx5: E-Switch, handle devcom events only for ports on the same device")
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# bc4c2f2e 26-Feb-2022 Mark Bloch <mbloch@nvidia.com>

net/mlx5: Lag, filter non compatible devices

When search for a peer lag device we can filter based on that
device's capabilities.

Downstream patch will be less strict when filtering compatible devices
and remove the limitation where we require exact MLX5_MAX_PORTS and
change it to a range.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 37ca95e6 27-Mar-2022 Gavin Li <gavinl@nvidia.com>

net/mlx5: Increase FW pre-init timeout for health recovery

Currently, health recovery will reload driver to recover it from fatal
errors. During the driver's load process, it would wait for FW to set the
pre-init bit for up to 120 seconds, beyond this threshold it would abort
the load process. In some cases, such as a FW upgrade on the DPU, this
timeout period is insufficient, and the user has no way to recover the
host device.

To solve this issue, introduce a new FW pre-init timeout for health
recovery, which is set to 2 hours.

The timeout for devlink reload and probe will use the original one because
they are user triggered flows, and therefore should not have a
significantly long timeout, during which the user command would hang.

Signed-off-by: Gavin Li <gavinl@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 60796198 09-Mar-2022 Tariq Toukan <tariqt@nvidia.com>

net/mlx5e: Statify function mlx5_cmd_trigger_completions

Starting from commit
4cab346bcf74 ("net/mlx5: No command allowed when command interface is not ready"),
no calls to mlx5_cmd_trigger_completions() are external to cmd.c anymore.
Make it a static function.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 143a41d7 24-Feb-2022 Yishai Hadas <yishaih@nvidia.com>

net/mlx5: Disable SRIOV before PF removal

Virtual functions depend on physical function for device access (for example
firmware host PAGE management), so make sure to disable SR-IOV once PF is gone.

This will prevent also the below warning if PF has gone before disabling SR-IOV.
"driver left SR-IOV enabled after remove"

Next patch from this series will rely on that when the VF may need to
access safely the PF 'driver data'.

Link: https://lore.kernel.org/all/20220224142024.147653-4-yishaih@nvidia.com
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>


# c9c079b4 03-Jan-2022 Paul Blakey <paulb@nvidia.com>

net/mlx5: CT: Set flow source hint from provided tuple device

Get originating device from tuple offload metadata match ingress_ifindex,
and set flow_source hint to either LOCAL for vf/sf reps, UPLINK for
uplink/wire/tunnel devices/bond, or ANY (as before this patch)
for all others.

This allows lower layer (software steering or firmware) to insert the tuple
rule only in one table (either rx or tx) instead of two (rx and tx).

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b87ef75c 11-Oct-2021 Aya Levin <ayal@nvidia.com>

net/mlx5: Print health buffer by log level

Add log macro which gets log level as a parameter. Use the severity
read from the health buffer and the new log macro to log the health buffer
with severity as log level. Prior to this patch, health buffer was
printed in error log level regardless of its severity. Now the user may
filter dmesg (--level) or change kernel log level to focus on different
severity levels of firmware errors.

Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 70862a5d 10-Aug-2021 Parav Pandit <parav@nvidia.com>

net/mlx5: Support enable_vnet devlink dev param

Enable user to disable VDPA net auxiliary device so that when it is not
required, user can disable it.

For example,

$ devlink dev param set pci/0000:06:00.0 \
name enable_vnet value false cmode driverinit
$ devlink dev reload pci/0000:06:00.0

At this point devlink instance do not create auxiliary device
mlx5_core.vnet.2 for the VDPA net functionality.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 87158ced 10-Aug-2021 Parav Pandit <parav@nvidia.com>

net/mlx5: Support enable_rdma devlink dev param

Enable user to disable RDMA auxiliary device so that when it is not
required, user can disable it.

For example,

$ devlink dev param set pci/0000:06:00.0 \
name enable_rdma value false cmode driverinit
$ devlink dev reload pci/0000:06:00.0

At this point devlink instance do not create auxiliary device
mlx5_core.rdma.2 for the RDMA functionality.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a17beb28 10-Aug-2021 Parav Pandit <parav@nvidia.com>

net/mlx5: Support enable_eth devlink dev param

Enable user to disable Ethernet auxiliary device so that when it is not
required, user can disable it.

For example,

$ devlink dev param set pci/0000:06:00.0 \
name enable_eth value false cmode driverinit
$ devlink dev reload pci/0000:06:00.0

At this point devlink instance do not create mlx5_core.eth.2 auxiliary
device for the Ethernet functionality.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# cac1eb2c 03-Aug-2021 Mark Bloch <mbloch@nvidia.com>

net/mlx5: Lag, properly lock eswitch if needed

Currently when doing hardware lag we check the eswitch mode
but as this isn't done under a lock the check isn't valid.

As the code needs to sync between two different devices an extra
care is needed.

- When going to change eswitch mode, if hardware lag is active destroy it.
- While changing eswitch modes block any hardware bond creation.
- Delay handling bonding events until there are no mode changes in
progress.
- When attaching a new mdev to lag, block until there is no mode change
in progress. In order for the mode change to finish the interface lock
will have to be taken. Release the lock and sleep for 100ms to
allow forward progress. As this is a very rare condition (can happen if
the user unbinds and binds a PCI function while also changing eswitch
mode of the other PCI function) it has no real world impact.

As taking multiple eswitch mode locks is now required lockdep will
complain about a possible deadlock. Register a key per eswitch to make
lockdep happy.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# c633e799 02-Aug-2021 Leon Romanovsky <leon@kernel.org>

net/mlx5: Don't skip subfunction cleanup in case of error in module init

Clean SF resources if mlx5 eth failed to initialize.

Fixes: 1958fc2f0712 ("net/mlx5: SF, Add auxiliary device driver")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 3b43190b 06-Apr-2021 Shay Drory <shayd@nvidia.com>

net/mlx5: Introduce API for request and release IRQs

Introduce new API that will allow IRQs users to hold a pointer to
mlx5_irq.
In the end of this series, IRQs will be allocated on demand. Hence,
this will allow us to properly manage and use IRQs.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 8a66e458 14-Apr-2021 Mark Bloch <mbloch@nvidia.com>

net/mlx5: Change ownership model for lag

Lag is used to combine two PCI functions of the same HCA into a single
logical unit. This is a core functionality and as such should be managed by
the core driver. Currently this isn't the case. While we store the lag
software structure inside the lower device, its lifetime (creation /
destruction) is dictated by the mlx5e part. Change the ownership model so
lag is tied to the lifetime of the lower level driver instead to the
mlx5e part.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# e71b75f7 14-Mar-2021 Leon Romanovsky <leon@kernel.org>

net/mlx5: Implement sriov_get_vf_total_msix/count() callbacks

The mlx5 implementation executes a firmware command on the PF to change
the configuration of the selected VF.

Link: https://lore.kernel.org/linux-pci/20210314124256.70253-5-leon@kernel.org
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>


# 604774ad 14-Mar-2021 Leon Romanovsky <leon@kernel.org>

net/mlx5: Dynamically assign MSI-X vectors count

The number of MSI-X vectors is a PCI property visible through lspci. The
field is read-only and configured by the device. The mlx5 devices work in
a static or dynamic assignment mode.

Static assignment means that all newly created VFs have a preset number of
MSI-X vectors determined by device configuration parameters. This can
result in some VFs having too many or too few MSI-X vectors. Till now this
has been the only means of fine-tuning the MSI-X vector count and it was
acceptable for small numbers of VFs.

With dynamic assignment the inefficiency of having a fixed number of MSI-X
vectors can be avoided with each VF having exactly the required
vectors. Userspace will provide this information while provisioning the VF
for use, based on the intended use. For instance if being used with a VM,
the MSI-X vector count might be matched to the CPU count of the VM.

For compatibility mlx5 continues to start up with MSI-X vector assignment,
but the kernel can now access a larger dynamic vector pool and assign more
vectors to created VFs.

Link: https://lore.kernel.org/linux-pci/20210314124256.70253-4-leon@kernel.org
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>


# fe06992b 03-Nov-2020 Leon Romanovsky <leon@kernel.org>

net/mlx5: Check returned value from health recover sequence

MLX5_INTERFACE_STATE_UP is far from being reliable check for success to
recover, because it can be changed any time and health logic doesn't
have any locks to protect from it.

The locks are not needed here because health recover is good to have,
but not must to success, so rely on the returned value from the
mlx5_recover_device() as a marker for success/failure.

Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 6dea2f7e 02-Nov-2020 Leon Romanovsky <leon@kernel.org>

net/mlx5: Separate probe vs. reload flows

The mix between probe/unprobe and reload flows causes to have an extra
mutex lock intf_state_mutex that generates LOCKDEP warning between it
and devlink_mutex. As a preparation for the future removal, separate
those flows.

Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 84ae9c1f 23-Sep-2020 Vlad Buslov <vladbu@nvidia.com>

net/mlx5e: E-Switch, Maintain vhca_id to vport_num mapping

Following patches in the series need to be able to map VF netdev to vport.
Since it is trivial to obtain vhca_id from netdev, maintain mapping from
vhca_id to vport_num inside eswitch offloads using xarray. Provide function
mlx5_eswitch_vhca_id_to_vport() to be used by TC code in following patches
to obtain the mapping.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 1958fc2f 11-Dec-2020 Parav Pandit <parav@nvidia.com>

net/mlx5: SF, Add auxiliary device driver

Add auxiliary device driver for mlx5 subfunction auxiliary device.

A mlx5 subfunction is similar to PCI PF and VF. For a subfunction
an auxiliary device is created.

As a result, when mlx5 SF auxiliary device binds to the driver,
its netdev and rdma device are created, they appear as

$ ls -l /sys/bus/auxiliary/devices/
mlx5_core.sf.4 -> ../../../devices/pci0000:00/0000:00:03.0/0000:06:00.0/mlx5_core.sf.4

$ ls -l /sys/class/net/eth1/device
/sys/class/net/eth1/device -> ../../../mlx5_core.sf.4

$ cat /sys/bus/auxiliary/devices/mlx5_core.sf.4/sfnum
88

$ devlink dev show
pci/0000:06:00.0
auxiliary/mlx5_core.sf.4

$ devlink port show auxiliary/mlx5_core.sf.4/1
auxiliary/mlx5_core.sf.4/1: type eth netdev p0sf88 flavour virtual port 0 splittable false

$ rdma link show mlx5_0/1
link mlx5_0/1 state ACTIVE physical_state LINK_UP netdev p0sf88

$ rdma dev show
8: rocep6s0f1: node_type ca fw 16.29.0550 node_guid 248a:0703:00b3:d113 sys_image_guid 248a:0703:00b3:d112
13: mlx5_0: node_type ca fw 16.29.0550 node_guid 0000:00ff:fe00:8888 sys_image_guid 248a:0703:00b3:d112

In future, devlink device instance name will adapt to have sfnum
annotation using either an alias or as devlink instance name described
in RFC [1].

[1] https://lore.kernel.org/netdev/20200519092258.GF4655@nanopsycho/

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# f3196bb0 11-Dec-2020 Parav Pandit <parav@nvidia.com>

net/mlx5: Introduce vhca state event notifier

vhca state events indicates change in the state of the vhca that may
occur due to a SF allocation, deallocation or enabling/disabling the
SF HCA.

Introduce vhca state event handler which will be used by SF devlink
port manager and SF hardware id allocator in subsequent patches
to act on the event.

This enables single entity to subscribe, query and rearm the event
for a function.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 601c10c8 05-Oct-2020 Leon Romanovsky <leon@kernel.org>

net/mlx5: Delete custom device management logic

After conversion to use auxiliary bus, all custom device management is
not needed anymore, delete it.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>


# 912cebf4 04-Oct-2020 Leon Romanovsky <leon@kernel.org>

net/mlx5e: Connect ethernet part to auxiliary bus

Reuse auxiliary bus to perform device management of the
ethernet part of the mlx5 driver.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>


# a925b5e3 08-Oct-2020 Leon Romanovsky <leon@kernel.org>

net/mlx5: Register mlx5 devices to auxiliary virtual bus

Create auxiliary devices under new virtual bus. This will replace
the custom-made mlx5 ->add()/->remove() interfaces and next patches
will fill the missing callback and remove the old interface logic.

The attachment of auxiliary drivers to the devices is possible in
1-to-1 manner only and it requires us to create device for every protocol,
so that device (module) will be able to connect to it.

System with 2 IB and 1 RoCE cards:
[leonro@vm ~]$ lspci |grep nox
00:09.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
00:0a.0 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6]
00:0b.0 Ethernet controller: Mellanox Technologies MT2910 Family [ConnectX-7]
[leonro@vm ~]$ ls -l /sys/bus/auxiliary/devices/
mlx5_core.eth.2 -> ../../../devices/pci0000:00/0000:00:0b.0/mlx5_core.eth.2
mlx5_core.rdma.0 -> ../../../devices/pci0000:00/0000:00:09.0/mlx5_core.rdma.0
mlx5_core.rdma.1 -> ../../../devices/pci0000:00/0000:00:0a.0/mlx5_core.rdma.1
mlx5_core.rdma.2 -> ../../../devices/pci0000:00/0000:00:0b.0/mlx5_core.rdma.2
mlx5_core.vdpa.1 -> ../../../devices/pci0000:00/0000:00:0a.0/mlx5_core.vdpa.1
mlx5_core.vdpa.2 -> ../../../devices/pci0000:00/0000:00:0b.0/mlx5_core.vdpa.2
[leonro@vm ~]$ rdma dev
0: ibp0s9: node_type ca fw 4.6.9999 node_guid 5254:00c0:fe12:3455 sys_image_guid 5254:00c0:fe12:3455
1: ibp0s10: node_type ca fw 4.6.9999 node_guid 5254:00c0:fe12:3456 sys_image_guid 5254:00c0:fe12:3456
2: rdmap0s11: node_type ca fw 4.6.9999 node_guid 5254:00c0:fe12:3457 sys_image_guid 5254:00c0:fe12:3457

System with RoCE SR-IOV card with 4 VFs:
[leonro@vm ~]$ lspci |grep nox
01:00.0 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6]
01:00.1 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]
01:00.2 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]
01:00.3 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]
01:00.4 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]
[leonro@vm ~]$ ls -l /sys/bus/auxiliary/devices/
mlx5_core.eth.0 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.0/mlx5_core.eth.0
mlx5_core.eth.1 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.1/mlx5_core.eth.1
mlx5_core.eth.2 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.2/mlx5_core.eth.2
mlx5_core.eth.3 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.3/mlx5_core.eth.3
mlx5_core.eth.4 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.4/mlx5_core.eth.4
mlx5_core.rdma.0 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.0/mlx5_core.rdma.0
mlx5_core.rdma.1 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.1/mlx5_core.rdma.1
mlx5_core.rdma.2 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.2/mlx5_core.rdma.2
mlx5_core.rdma.3 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.3/mlx5_core.rdma.3
mlx5_core.rdma.4 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.4/mlx5_core.rdma.4
mlx5_core.vdpa.1 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.1/mlx5_core.vdpa.1
mlx5_core.vdpa.2 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.2/mlx5_core.vdpa.2
mlx5_core.vdpa.3 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.3/mlx5_core.vdpa.3
mlx5_core.vdpa.4 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.4/mlx5_core.vdpa.4
[leonro@vm ~]$ rdma dev
0: rocep1s0f0: node_type ca fw 4.6.9999 node_guid 5254:00c0:fe12:3455 sys_image_guid 5254:00c0:fe12:3455
1: rocep1s0f0v0: node_type ca fw 4.6.9999 node_guid 0000:0000:0000:0000 sys_image_guid 5254:00c0:fe12:3456
2: rocep1s0f0v1: node_type ca fw 4.6.9999 node_guid 0000:0000:0000:0000 sys_image_guid 5254:00c0:fe12:3457
3: rocep1s0f0v2: node_type ca fw 4.6.9999 node_guid 0000:0000:0000:0000 sys_image_guid 5254:00c0:fe12:3458
4: rocep1s0f0v3: node_type ca fw 4.6.9999 node_guid 0000:0000:0000:0000 sys_image_guid 5254:00c0:fe12:3459

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>


# 17a7612b 04-Oct-2020 Leon Romanovsky <leon@kernel.org>

net/mlx5_core: Clean driver version and name

Remove exposed driver version as it was done in other drivers,
so module version will work correctly by displaying the kernel
version for which it is compiled.

And move mlx5_core module name to general include, so auxiliary drivers
will be able to use it as a basis for a name in their device ID tables.

Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>


# e5dfe6b5 20-Nov-2020 Parav Pandit <parav@nvidia.com>

net/mlx5: Avoid exposing driver internal command helpers

mlx5 command init and cleanup routines are internal to mlx5_core driver.
Hence, avoid exporting them and move their definition to mlx5_core
driver's internal file mlx5_core.h

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 38b9f903 07-Oct-2020 Moshe Shemesh <moshe@mellanox.com>

net/mlx5: Handle sync reset request event

Once the driver gets sync_reset_request from firmware it prepares for the
coming reset and sends acknowledge.
After getting this event the driver expects device reset, either it will
trigger PCI reset on sync_reset_now event or such PCI reset will be
triggered by another PF of the same device. So it moves to reset
requested mode and if it gets PCI reset triggered by the other PF it
detect the reset and reloads.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 7be3412a 09-Sep-2020 Parav Pandit <parav@nvidia.com>

net/mlx5: Use dma device access helper

Use the PCI device directly for dma accesses as non PCI device unlikely
support IOMMU and dma mappings.
Introduce and use helper routine to access DMA device.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 90bf1c8d 07-May-2020 Eran Ben Elisha <eranbe@mellanox.com>

net/mlx5: Move internal timer read function to clock library

Move mlx5_read_internal_timer() into lib/clock.c file as it is being
used there. As such, make this function a static one.

In addition, rearrange headers include to support function move.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# f999b706 08-Mar-2020 Parav Pandit <parav@mellanox.com>

net/mlx5: Simplify mlx5_unload_one() and its callers

mlx5_unload_one() always returns 0.
Simplify callers of mlx5_unload_one() and remove the dead code.

Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# ecd01db8 08-Mar-2020 Parav Pandit <parav@mellanox.com>

net/mlx5: Simplify mlx5_register_device to return void

mlx5_register_device() doesn't check for any error and always returns 0.
Simplify mlx5_register_device() to return void and its caller, remove
dead code related to it.

Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# e387f7d5 27-Feb-2020 Jiri Pirko <jiri@mellanox.com>

mlx5: register lag notifier for init network namespace only

The current code causes problems when the unregistering netdevice could
be different then the registering one.

Since the check in mlx5_lag_netdev_event() does not allow any other
network namespace anyway, fix this by registerting the lag notifier
per init network namespace only.

Fixes: d48834f9d4b4 ("mlx5: Use dev_net netdevice notifier registrations")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Tested-by: Aya Levin <ayal@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d48834f9 24-Jan-2020 Jiri Pirko <jiri@mellanox.com>

mlx5: Use dev_net netdevice notifier registrations

Register the dev_net notifier and allow the per-net notifier to follow
the device into different namespace.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4383cfcc 27-Oct-2019 Michael Guralnik <michaelgur@mellanox.com>

net/mlx5: Add devlink reload

Implement devlink reload for mlx5.

Usage example:
devlink dev reload pci/0000:06:00.0

Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# c9b9dcb4 29-Aug-2019 Ariel Levkovich <lariel@mellanox.com>

net/mlx5: Move device memory management to mlx5_core

Move the device memory allocation and deallocation commands
SW ICM memory to mlx5_core to expose this API for all
mlx5_core users.

This comes as preparation for supporting SW steering in kernel
where it will be required to allocate and register device
memory for direct rule insertion.

In addition, an API to register this device memory for future
remote access operations is introduced using the create_mkey
commands.

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 9f818c8a 09-Aug-2019 Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mlx5: no need to check return value of debugfs_create functions

When calling debugfs functions, there is no need to ever check the
return value. The function can work or not, but the code logic should
never do something different based on this.

This cleans up a lot of unneeded code and logic around the debugfs
files, making all of this much simpler and easier to understand as we
don't need to keep the dentries saved anymore.

Cc: Saeed Mahameed <saeedm@mellanox.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9c86b07e 02-Jul-2019 Shay Agroskin <shayag@mellanox.com>

net/mlx5: Added fw version query command

Using the MCQI and MCQS registers, we query the running and pending
fw version of the HCA.
The MCQS is queried with sequentially increasing component index, until
a component of type BOOT_IMG is found. Querying this component's version
using the MCQI register yields the running and pending fw version of the
HCA.

Querying MCQI for the pending fw version should be done only after
validating that such fw version exists. This is done my checking
'component update state' field in MCQS output.

Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 3e5b72ac 12-Nov-2018 Feras Daoud <ferasda@mellanox.com>

net/mlx5: Issue SW reset on FW assert

If a FW assert is considered fatal, indicated by a new bit in the health
buffer, reset the FW. After the reset go through the normal recovery
flow. Only one PF needs to issue the reset, so an attempt is made to
prevent the 2nd function from also issuing the reset.
It's not an error if that happens, it just slows recovery.

Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 1ef6f1a1 02-Dec-2018 Feras Daoud <ferasda@mellanox.com>

net/mlx5: Control CR-space access by different PFs

Since the FW can be shared between different PFs/VFs it is common
that more than one health poll will detected a failure, this can
lead to multiple resets which are unneeded.

The solution is to use a FW locking mechanism using semaphore space
to provide a way to allow only one device to collect the cr-dump and
to issue a sw-reset.

Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 63cbc552 12-Nov-2018 Feras Daoud <ferasda@mellanox.com>

net/mlx5: Handle SW reset of FW in error flow

New mlx5 adapters allow the driver to reset the FW in the event of an
error, this action called "SW Reset". When an SW reset is issued on any
PF all PFs enter reset state which is a recoverable condition. The
existing recovery flow was designed to allow the recovery of a VF after
a PF driver reload. This patch adds the sw reset to the NIC states
as a preparation for sw reset handling.

When a software reset is issued the following occurs:
1. The NIC interface mode is set to 7 while the reset is in progress.
2. Once the reset completes the NIC interface mode is set to 1.

Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 256cf690 10-Jun-2019 Yuval Avnery <yuvalav@mellanox.com>

net/mlx5: Move all IRQ logic to pci_irq.c

Finalize IRQ separation and expose irq interface.

Signed-off-by: Yuval Avnery <yuvalav@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# e1706e62 10-Jun-2019 Yuval Avnery <yuvalav@mellanox.com>

net/mlx5: Separate IRQ table creation from EQ table creation

IRQ allocation should be part of the IRQ table life-cycle.

Signed-off-by: Yuval Avnery <yuvalav@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 561aa15a 10-Jun-2019 Yuval Avnery <yuvalav@mellanox.com>

net/mlx5: Separate IRQ data from EQ table data

IRQ table should only exist for mlx5_core_dev for PF and VF only.
EQ table of mediated devices should hold a pointer to the IRQ table
of the parent PCI device.

Signed-off-by: Yuval Avnery <yuvalav@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 44f18db5 04-Jun-2019 Jiri Pirko <jiri@mellanox.com>

mlxfw: Propagate error messages through extack

Currently the error messages are printed to dmesg. Propagate them also
to directly to user doing the flashing through extack.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 27b942fb 29-Apr-2019 Parav Pandit <parav@mellanox.com>

net/mlx5: Get rid of storing copy of device name

Currently mlx5 core stores copy of the PCI device name in a
mlx5_priv structure and uses pr_warn, pr_err helpers.

Get rid of the copy of this name; instead store the parent device
pointer that contains name as well as dma specific parameters.
This also allows to use kernel's well defined dev_warn, dev_err, dev_dbg
device specific print routines.

This is also a preparation patch to access non PCI parent device in
future.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 3732b972 29-Mar-2019 Aya Levin <ayal@mellanox.com>

net/mlx5: Add rate limit print macros

Add rate limited print macros for warning and info level. This protects
the system from burst of prints depleting HW resources and spamming dmesg.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# d05120f5 29-Mar-2019 Huy Nguyen <huyn@mellanox.com>

net/mlx5: Make mlx5_core messages independent from mdev->pdev

Detach mlx5_core mdev messages from pci device mdev->pdev messages and
provide a better report/debug of different mlx5 device types.

This patch does not change any functionality.

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>


# eb5cc431 21-Mar-2019 Parav Pandit <parav@mellanox.com>

net/mlx5: Simplify mlx5_sriov_is_enabled() by using pci core API

It is desired to get rid of num_vfs stored inside mlx5_core_sriov to
safely support vports more than vfs.
To reduce dependency on mlx5_core_sriov num_vfs, start using
pci_num_vf() from pci core.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 6ffb6303 26-Feb-2019 Roi Dayan <roid@mellanox.com>

net/mlx5: Remove redundant lag function to get pf num

The function is not being used.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 591905ba 12-Feb-2019 Bodong Wang <bodong@mellanox.com>

net/mlx5: Introduce Mellanox SmartNIC and modify page management logic

Mellanox's SmartNIC combines embedded CPU(e.g, ARM) processing power
with advanced network offloads to accelerate a multitude of security,
networking and storage applications.

With the introduction of the SmartNIC, there is a new PCI function
called Embedded CPU Physical Function(ECPF). And it's possible for a
PF to get its ICM pages from the ECPF PCI function. Driver shall
identify if it is running on such a function by reading a bit in
the initialization segment.

When firmware asks for pages, it would issue a page request event
specifying how many pages it requests and for which function. That
driver responds with a manage_pages command providing the requested
pages along with an indication for which function it is providing these
pages.

The encoding before this patch was as follows:
function_id == 0: pages are requested for the function receiving
the EQE.
function_id != 0: pages are requested for VF identified by the
function_id value

A new one bit field in the EQE identifies that pages are requested for
the ECPF.

The notion of page_supplier can be introduced here and to support that,
manage pages and query pages were modified so firmware can distinguish
the following cases:

1. Function provides pages for itself
2. PF provides pages for its VF
3. ECPF provides pages to itself
4. ECPF provides pages for another function

This distinction is possible through the introduction of the bit
"embedded_cpu_function" in query_pages, manage_pages and page request
EQE.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 4cab346b 07-Feb-2019 Huy Nguyen <huyn@mellanox.com>

net/mlx5: No command allowed when command interface is not ready

When EEH is injected and PCI bus stalls, mlx5's pci error detect
function is called to deactivate the command interface and tear down
the device. The issue is that there can be a thread that already
passed MLX5_DEVICE_STATE_INTERNAL_ERROR check, it will send the command
and stuck in the wait_func.

Solution:
Add function mlx5_cmd_flush to disable command interface and clear all
the pending commands. When device state is set to
MLX5_DEVICE_STATE_INTERNAL_ERROR, call mlx5_cmd_flush to ensure all
pending threads waiting for firmware commands completion are terminated.

Fixes: c1d4d2e92ad6 ("net/mlx5: Avoid calling sleeping function by the health poll thread")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# c12ecc23 25-Apr-2018 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Move to use common phys port names for vport representors

With VF LAG commit 491c37e49b48 "net/mlx5e: In case of LAG, one switch
parent id is used for all representors", both uplinks and all the VFs
(on both of them) get the same switchdev id.

This cause the provisioning system method to identify the rep of a given
VF from the parent PF PCI device using switchev id and physical port
name to break, since VFm of PF0 will have the (id, name) as VFm of PF1.

To fix that, we align to use the framework agreed upstream and set by
nfp commit 168c478e107e "nfp: wire get_phys_port_name on representors":

$ cat /sys/class/net/eth4_*/phys_port_name
p0
pf0vf0
pf0vf1

Now, the names will be different, e.g. pf0vf0 vs. pf1vf0.

Fixes: 491c37e49b48 ("net/mlx5e: In case of LAG, one switch parent id is used for all representors")
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Waleed Musa <waleedm@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 4a0475d5 03-Dec-2018 Miroslav Lichvar <mlichvar@redhat.com>

mlx5: extend PTP gettime function to read system clock

Read the system time right before and immediately after reading the low
register of the internal timer. This adds support for the
PTP_SYS_OFFSET_EXTENDED ioctl.

Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# eff849b2 06-Jun-2018 Rabie Loulou <rabiel@mellanox.com>

net/mlx5: Allow/disallow LAG according to pre-req only

Remove the lag forbid/allow functions, change the lag prereq check to
run in the do-bond logic, so every change in the prereq state will
cause LAG to be disabled/enabled accordingly after the next do-bond run.

Add lag update function, so every component which changes the prereq
state and want the LAG to re-calc the conditions can call the update
function.

Signed-off-by: Rabie Loulou <rabiel@mellanox.com>
Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 02039fb6 26-Nov-2018 Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: Remove unused events callback and logic

The mlx5_interface->event callback is not used by mlx5e/mlx5_ib anymore.

We totally remove the delayed events logic work around, since with
the dynamic notifier registration API it is not needed anymore, mlx5_ib
can register its notifier and start receiving events exactly at the moment
it is ready to handle them.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 69c1280b 20-Nov-2018 Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: Device events, Use async events chain

Move all the generic async events handling into new specific events
handling file events.c to keep eq.c file clean from concrete event logic
handling.

Use new API to register for NOTIFY_ANY to handle generic events and
dispatch allowed events to mlx5_core consumers (mlx5_ib and mlx5e)

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 71edc69c 20-Nov-2018 Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: CmdIF, Use async events chain

Remove the explicit call to mlx5_cmd_comp_handler on MLX5_EVENT_TYPE_CMD
and let command interface to register its own handler when its ready.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 0f597ed4 20-Nov-2018 Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: EQ, Introduce atomic notifier chain subscription API

Use atomic_notifier_chain to fire firmware events at internal mlx5 core
components such as eswitch/fpga/clock/FW tracer/etc.., this is to
avoid explicit calls from low level mlx5_core to upper components and to
simplify the mlx5_core API for future developments.

Simply provide register/unregister notifiers API and call the notifier
chain on firmware async events.

Example: to subscribe to a FW event:
struct mlx5_nb port_event;

MLX5_NB_INIT(&port_event, port_event_handler, PORT_CHANGE);
mlx5_eq_notifier_register(mdev, &port_event);

where:
- port_event_handler is the notifier block callback.
- PORT_EVENT is the suffix of MLX5_EVENT_TYPE_PORT_CHANGE.

The above will guarantee that port_event_handler will receive all FW
events of the type MLX5_EVENT_TYPE_PORT_CHANGE.

To receive all FW/HW events one can subscribe to
MLX5_EVENT_TYPE_NOTIFY_ANY.

The next few patches will start moving all mlx5 core components to use
this new API and cleanup mlx5_eq_async_int misx handler from component
explicit calls and specific logic.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# d5d284b8 19-Nov-2018 Saeed Mahameed <saeedm@mellanox.com>

{net,IB}/mlx5: Move Page fault EQ and ODP logic to RDMA

Use the new generic EQ API to move all ODP RDMA data structures and logic
form mlx5 core driver into mlx5_ib driver.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>


# 16d76083 19-Nov-2018 Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: EQ, Different EQ types

In mlx5 we have three types of usages for EQs,
1. Asynchronous EQs, used internally by mlx5 core for
a. FW command completions
b. FW page requests
c. one EQ for all other Asynchronous events

2. Completion EQs, used for CQ completion (we create one per core)

3. *Special type of EQ (page fault) used for RDMA on demand paging
(ODP).

*The 3rd type shouldn't be special at least in mlx5 core, it is yet
another async events EQ with specific use case, it will be removed in
the next two patches, and will completely move its logic to mlx5_ib,
as it is rdma specific.

In this patch we remove use case (eq type) specific fields from
struct mlx5_eq into a new eq type specific structures.

struct mlx5_eq_async;
truct mlx5_eq_comp;
struct mlx5_eq_pagefault;

Separate between their type specific flows.

In the future we will allow users to create there own generic EQs.
for now we will allow only one for ODP in next patches.

We will introduce event listeners registration API for those who
want to receive mlx5 async events.
After that mlx5 eq handling will be clean from feature/user specific
handling.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>


# f2f3df55 19-Nov-2018 Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: EQ, Privatize eq_table and friends

Move unnecessary EQ table structures and declaration from the
public include/linux/mlx5/driver.h into the private area of mlx5_core
and into eq.c/eq.h.

Introduce new mlx5 EQ APIs:

mlx5_comp_vectors_count(dev);
mlx5_comp_irq_get_affinity_mask(dev, vector);

And use them from mlx5_ib or mlx5e netdevice instead of direct access to
mlx5_core internal structures.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>


# c8e21b3b 19-Nov-2018 Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: EQ, Create all EQs in one place

Instead of creating the EQ table in three steps at driver load,
- allocate irq vectors
- allocate async EQs
- allocate completion EQs
Gather all of the procedures into one function in eq.c and call it from
driver load.

This will help us reduce the EQ and EQ table private structures
visibility to eq.c in downstream refactoring.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>


# ca828cb4 19-Nov-2018 Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: EQ, Move all EQ logic to eq.c

Move completion EQs flows from main.c to eq.c, reasons:
1) It is where this logic belongs.
2) It will help centralize the EQ logic in one file for downstream
refactoring, and future extensions/updates.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>


# fcd29ad1 09-Aug-2018 Feras Daoud <ferasda@mellanox.com>

net/mlx5: Add Fast teardown support

Today mlx5 devices support two teardown modes:
1- Regular teardown
2- Force teardown

This change introduces the enhanced version of the "Force teardown" that
allows SW to perform teardown in a faster way without the need to reclaim
all the pages.

Fast teardown provides the following advantages:
1- Fix a FW race condition that could cause command timeout
2- Avoid moving to polling mode
3- Close the vport to prevent PCI ACK to be sent without been scatter
to memory

Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 50acec06 28-Aug-2018 Mark Bloch <markb@mellanox.com>

net/mlx5: Export packet reformat alloc/dealloc functions

This will allow for the RDMA side to allocate packet reformat context.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>


# 31ca3648 28-Aug-2018 Mark Bloch <markb@mellanox.com>

net/mlx5: Pass a namespace for packet reformat ID allocation

Currently we attach packet reformat actions only to the FDB namespace.
In preparation to be able to use that for NIC steering, pass the actual
namespace as a parameter.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>


# 60786f09 28-Aug-2018 Mark Bloch <markb@mellanox.com>

{net, RDMA}/mlx5: Rename encap to reformat packet

Renames all encap mlx5_{core,ib} code to use the new naming of packet
reformat. This change doesn't introduce any function change and is
needed to properly reflect the operation being done by this action.
For example not only can we encapsulate a packet, but also decapsulate it.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>


# 90c1d1b8 28-Aug-2018 Mark Bloch <markb@mellanox.com>

net/mlx5: Export modify header alloc/dealloc functions

Those functions will be used by the RDMA side to create modify header
actions to be attached to flow steering rules via verbs.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>


# 6dbc80ca 29-Jul-2018 Moshe Shemesh <moshe@mellanox.com>

net/mlx5e: clock.c depends on CONFIG_PTP_1588_CLOCK

lib/clock.c includes clock related functions which require ptp support.
Thus compile out lib/clock.c and add the needed function stubs in case
kconfig CONFIG_PTP_1588_CLOCK is off.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# b30408d7 24-Jun-2018 Leon Romanovsky <leon@kernel.org>

net/mlx5: Rate limit errors in command interface

Any error status returned by FW will trigger a print similar to the
following error message in the dmesg.

[ 55.884355] mlx5_core 0000:00:04.0: mlx5_cmd_check:712:(pid 555):
ALLOC_UAR(0x802) op_mod(0x0) failed, status limits exceeded(0x8),
syndrome (0x0)

Those prints are extremely valuable to diagnose issues with running system
and it is important to keep them. However, not-so-careful user can trigger
endless number of such prints by depleting HW resources and will spam
dmesg.

Rate limiting of such messages solves this issue.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 1ef903bf 26-Mar-2018 Daniel Jurgens <danielj@mellanox.com>

net/mlx5: Free IRQs in shutdown path

Some platforms require IRQs to be free'd in the shutdown path. Otherwise
they will fail to be reallocated after a kexec.

Fixes: 8812c24d28f4 ("net/mlx5: Add fast unload support in shutdown flow")
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 0608d4db 17-Jan-2018 Tariq Toukan <tariqt@mellanox.com>

net/mlx5e: Unify slow PCI heuristic

Get the link/pci speed query and logic into a single function.
Unify the heuristics and use a single PCI threshold (16G) for all.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# c5447c70 23-Jan-2018 Mark Bloch <markb@mellanox.com>

net/mlx5: E-Switch, Reload IB interface when switching devlink modes

Up until this point it wasn't possible to activate IB representors
when switching to switchdev mode, remove this limitation.

We trigger reload of the PF IB interface in order to make sure that
already allocated resources are invalid and new resources will be opened
correctly with all the limitations of switchdev mode applied (only raw
packet capabilities, without RoCE). We also move the remove/add to a
place where the E-Switch mode is set/unset to better control when to
trigger this action, this will allow the IB side to start in the correct
mode.

For better code reuse, create a function which reloads an interface and
export it.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 57cbd893 16-Jan-2018 Mark Bloch <markb@mellanox.com>

net/mlx5: E-Switch, Move representors definition to a global scope

In preparation for IB representors, move representors structs to a global
scope, also expose functions needed for registration, unregistration,
eswitch mode and creating a flow rule to direct traffic from SQs to the
right VF.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 3ec5693b 01-Feb-2018 Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: Remove redundant EQ API exports

EQ structure and API is private to mlx5_core driver only, external
drivers should not have access or the means to manipulate EQ objects.

Remove redundant exports and move API functions out of the linux/mlx5
include directory into the driver's mlx5_core.h private include file.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Gal Pressman <galp@mellanox.com>


# d5c07157 01-Feb-2018 Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: EQ add/del CQ API

Add API to add/del CQ to/from EQs CQ table to be used in cq.c upon CQ
creation/destruction, as CQ table is now private to eq.c.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Gal Pressman <galp@mellanox.com>


# 7ca560b5 19-Dec-2017 Eran Ben Elisha <eranbe@mellanox.com>

net/mlx5e: Poll event queue upon TX timeout before performing full channels recovery

Up until this patch, on every TX timeout we would try to do channels
recovery. However, in case of a lost interrupt for an EQ, the channel
associated to it cannot be recovered if reopened as it would never get
another interrupt on sent/received traffic, and eventually ends up with
another TX timeout (Restarting the EQ is not part of channel recovery).

This patch adds a mechanism for explicitly polling EQ in case of a TX
timeout in order to recover from a lost interrupt. If this is not the
case (no pending EQEs), perform a channels full recovery as usual.

Once a lost EQE is recovered, it triggers the NAPI to run and handle all
pending completions. This will free some budget in the bql (via calling
netdev_tx_completed_queue) or by clearing pending TXWQEs and waking up
the queue. One of the above actions will move the queue to be ready for
transmit again.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 8737f818 04-Jan-2018 Daniel Jurgens <danielj@mellanox.com>

net/mlx5: Set software owner ID during init HCA

Generate a unique 128bit identifier for each host and pass that value to
firmware in the INIT_HCA command if it reports the sw_owner_id
capability. Each device bound to the mlx5_core driver will have the same
software owner ID.

In subsequent patches mlx5_core devices will be bound via a new VPort
command so that they can operate together under a single InfiniBand
device. Only devices that have the same software owner ID can be bound,
to prevent traffic intended for one host arriving at another.

The INIT_HCA command length was expanded by 128 bits. The command
length is provided as an input FW commands. Older FW does not have a
problem receiving this command in the new longer form.

Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# c02762eb 18-Jul-2017 Huy Nguyen <huyn@mellanox.com>

net/mlx5: QCAM register firmware command support

The QCAM register provides capability bit for all the QoS registers
using ACCESS_REG command.

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 7c39afb3 15-Aug-2017 Feras Daoud <ferasda@mellanox.com>

net/mlx5: PTP code migration to driver core section

PTP code is moved to core section of mlx5 driver in order to share
it between ethernet and infiniband. This movement involves the following
changes:
- Change mlx5e_ prefix to be mlx5_
- Add clock structs to Core
- Add clock object to mlx5_core_dev
- Call Init/Uninit clock from core init/cleanup
- Rename mlx5e_tstamp to be mlx5_clock

Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Eitan Rabin <rabin@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 78249c42 13-Jul-2017 Sagi Grimberg <sagi@grimberg.me>

mlx5: convert to generic pci_alloc_irq_vectors

Now that we have a generic code to allocate an array
of irq vectors and even correctly spread their affinity,
correctly handle cpu hotplug events and more, were much
better off using it.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# a9f7705f 11-Jun-2017 Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: Unify vport manager capability check

Expose MLX5_VPORT_MANAGER macro to check for strict vport manager
E-switch and MPFS (Multi Physical Function Switch) abilities.

VPORT manager must be a PF with an ethernet link and with FW advertised
vport group manager capability

Replace older checks with the new macro and use it where needed in
eswitch.c and mlx5e netdev eswitch related flows.

The same macro will be reused in MPFS separation downstream patch.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# fa367688 25-May-2017 Eugenia Emantayev <eugenia@mellanox.com>

net/mlx5e: Add field select to MTPPS register

In order to mark relevant fields while setting the MTPPS register
add field select. Otherwise it can cause a misconfiguration in
firmware.

Fixes: ee7f12205abc ('net/mlx5e: Implement 1PPS support')
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 62bd22cf 18-Apr-2017 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5: Add mlxfw callbacks

Add mlx5 implementation for the ones defined by the mlxfw
shared module to be used while flashing the device firmware.

The callbacks do their job through the MCQI, MCC and MCDA registers.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 8812c24d 09-Feb-2017 Majd Dibbiny <majd@mellanox.com>

net/mlx5: Add fast unload support in shutdown flow

Adding a support to flush all HW resources with one FW command and
skip all the heavy unload flows of the driver on kernel shutdown.
There's no need to free all the SW context since a new fresh kernel
will be loaded afterwards.

Regarding the FW resources, they should be closed, otherwise we will
have leakage in the FW. To accelerate this flow, we execute one command
in the beginning that tells the FW that the driver isn't going to close
any of the FW resources and asks the FW to clean up everything.
Once the commands complete, it's safe to close the PCI resources and
finish the routine.

Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 552db7bc 08-May-2017 Moni Shoua <monis@mellanox.com>

net/mlx5: Undo LAG upon request to create virtual functions

LAG cannot work if virtual functions are present. Therefore, if LAG is
configured, the attempt to create virtual functions will fail. This gives
precedence to LAG over SRIOV which is not the desired behavior as users
might want to use the bonding/teaming driver also want to work with SRIOV.
In that case we don't want to force an order of actions, first create
virtual functions and only than configure a bonding/teaming net device.
To fix, if LAG is configured during a request to create virtual
functions, remove it and continue.

We ignore ENODEV when trying to forbid lag. This makes sense
because "No such device" means that lag is forbidden anyway.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 7913d205 22-Feb-2017 Tariq Toukan <tariqt@mellanox.com>

net/mlx5: Bump driver version

Remove date and bump version for mlx5_core driver.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 2de24fed 19-Jan-2017 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5: Introduce alloc/dealloc modify header context commands

Implement the low-level commands to support packet header re-write.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# c835ad64 08-Dec-2016 Gal Pressman <galp@mellanox.com>

net/mlx5: Implement PCAM, MCAM access register commands

Introduced registers will expose capabilities of new registers and
features related to port/management.
Driver will query MCAM and PCAM in order to avoid failing on old
firmwares with lack of support.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# f9a1ef72 10-Oct-2016 Eugenia Emantayev <eugenia@mellanox.com>

net/mlx5: Add MTPPS and MTPPSE registers infrastructure

Implement query and set functionality for MTPPS and MTPPSE registers.
MTPPS (Management Pulse Per Second) provides the device PPS capabilities,
configures the PPS in and out modules and holds the PPS in time stamp.
Query MTPPS is supported only when HCA_CAP.pps is set and modify is supported
when HCA_CAP.pps_modify is set.

MTPPSE (Management Pulse Per Second Event) configures the different event
generation modes for PPS. Supported when HCA_CAP.pps is set.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# d9aaed83 02-Jan-2017 Artemy Kovalyov <artemyko@mellanox.com>

{net,IB}/mlx5: Refactor page fault handling

* Update page fault event according to last specification.
* Separate code path for page fault EQ, completion EQ and async EQ.
* Move page fault handling work queue from mlx5_ib static variable
into mlx5_core page fault EQ.
* Allocate memory to store ODP event dynamically as the
events arrive, since in atomic context - use mempool.
* Make mlx5_ib page fault handler run in process context.

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a5a1d1c2 21-Dec-2016 Thomas Gleixner <tglx@linutronix.de>

clocksource: Use a plain u64 instead of cycle_t

There is no point in having an extra type for extra confusion. u64 is
unambiguous.

Conversion was done with the following coccinelle script:

@rem@
@@
-typedef u64 cycle_t;

@fix@
typedef cycle_t;
@@
-cycle_t
+u64

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: John Stultz <john.stultz@linaro.org>


# f9c14e46 06-Dec-2016 Kamal Heib <kamalh@mellanox.com>

net/mlx5: Fix query ISSI flow

In old FWs query ISSI command is not supported and for some of those FWs
it might fail with status other than "MLX5_CMD_STAT_BAD_OP_ERR".

In such case instead of failing the driver load, we will treat any FW
status other than 0 for Query ISSI FW command as ISSI not supported and
assume ISSI=0 (most basic driver/FW interface).

In case of driver syndrom (query ISSI failure by driver) we will fail
driver load.

Fixes: f62b8bb8f2d3 ('net/mlx5: Extend mlx5_core to support ConnectX-4
Ethernet functionality')
Signed-off-by: Kamal Heib <kamalh@mellanox.com>

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9e5b2fc1 06-Dec-2016 Kamal Heib <kamalh@mellanox.com>

net/mlx5: Remove duplicate pci dev name print

Remove duplicate pci dev name printing from mlx5_core_warn/dbg.

Fixes: 5a7883989b1c ('net/mlx5_core: Improve mlx5 messages')
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f663ad98 06-Dec-2016 Kamal Heib <kamalh@mellanox.com>

net/mlx5: Verify module parameters

Verify the mlx5_core module parameters by making sure that they are in
the expected range and if they aren't restore them to their default
values.

Fixes: 9603b61de1ee ('mlx5: Move pci device handling from mlx5_ib to mlx5_core')
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d4eb4cd7 17-Nov-2016 Huy Nguyen <huyn@mellanox.com>

net/mlx5: Add handling for port module event

For each asynchronous port module event:
1. print with ratelimit to the dmesg log
2. increment the corresponding event counter

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ae9f83ac 07-Nov-2016 Hadar Hen Zion <hadarh@mellanox.com>

net/mlx5: Move alloc/dealloc encap commands declarations to common header file

The alloc and dealloc encap commands will be used in the mlx5e driver,
as such, declare them in a common header file.

Also, rename the functions: mlx5_cmd_{de}alloc_encap is replaced with
mlx5_encap_{de}alloc.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 813f8540 11-Aug-2016 Mohamad Haj Yahia <mohamad@mellanox.com>

net/mlx5: Introduce TSAR manipulation firmware commands

TSAR (stands for Transmit Scheduling ARbiter) is a hardware component
that is responsible for selecting the next entity to serve on the
transmit path.
The arbitration defines the QoS policy between the agents connected to
the TSAR.
The TSAR is a consist two main features:
1) BW Allocation between agents:
The TSAR implements a defecit weighted round robin between the agents.
Each agent attached to the TSAR is assigned with a weight and it is
awarded transmission tokens according to this weight.
2) Rate limer per agent:
Each agent attached to the TSAR is (optionally) assigned with a rate
limit.
TSAR will not allow scheduling for an agent exceeding its defined rate
limit.

In this patch we implement the API of manipulating the TSAR.

Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>


# 04c0c1ab 25-Oct-2016 Mohamad Haj Yahia <mohamad@mellanox.com>

net/mlx5: PCI error recovery health care simulation

In case that the kernel PCI error handlers are not called, we will
trigger our own recovery flow.

The health work will give priority to the kernel pci error handlers to
recover the PCI by waiting for a small period, if the pci error handlers
are not triggered the manual recovery flow will be executed.

We don't save pci state in case of manual recovery because it will ruin the
pci configuration space and we will lose dma sync.

Fixes: 89d44f0a6c73 ('net/mlx5_core: Add pci error handlers to mlx5_core driver')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f1ee87fe 09-Sep-2016 Mohamad Haj Yahia <mohamad@mellanox.com>

net/mlx5: Organize device list API in one place

Hide the exposed (external) mlx5_dev_list and mlx5_intf_mutex and expose
an organized modular API to manage and manipulate mlx5 devices list.

Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# acab721b 09-Sep-2016 Mohamad Haj Yahia <mohamad@mellanox.com>

net/mlx5: Implement SRIOV attach/detach flows

Needed for lightweight and modular internal/pci error handling.
Implement sriov attach function which enables pre-saved number of vfs on
the device side.
Implement sriov detach function which disable the current vfs on the
device side.
Init/cleanup function only handles sriov software context allocation and
destruction.

Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6b6adee3 09-Sep-2016 Mohamad Haj Yahia <mohamad@mellanox.com>

net/mlx5: SRIOV core code refactoring

Simplify the code and makes it look modular and symmetric.
Split sriov enable/disable to two levels: device level and pci level.
When user enable/disable sriov (via sriov_configure driver callback) we
will enable/disable both device and pci sriov.
When driver load/unload we will enable/disable (on demand) only device
sriov while keeping the PCI sriov enabled for next driver load.
On internal/pci error, VFs will be kept enabled on PCI and the reset
is done only in device level.

Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 917b41aa 30-May-2016 Aviv Heller <avivh@mellanox.com>

net/mlx5: Configure IB devices according to LAG state

When mlx5_ib is loaded, we would like each card's IB devices
to be added according to its LAG state (one IB device, instead of
two, is to be added if LAG is active).

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>


# edb31b16 17-Apr-2016 Aviv Heller <avivh@mellanox.com>

net/mlx5: LAG and SRIOV cannot be used together

Until support will be added for RoCE LAG SRIOV.

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>


# db60b802 30-May-2016 Aviv Heller <avivh@mellanox.com>

net/mlx5e: Avoid port remapping of mlx5e netdev TISes

TISes belonging to the mlx5e NIC should not be
subject to port remap.

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>


# 7907f23a 17-Apr-2016 Aviv Heller <avivh@mellanox.com>

net/mlx5: Implement RoCE LAG feature

Available on dual port cards only, this feature keeps
track, using netdev LAG events, of the bonding
and link status of each port's PF netdev.

When both of the card's PF netdevs are enslaved to the
same bond/team master, and only them, LAG state
is active.

During LAG, only one IB device is present for both ports.

In addition to the above, this commit includes FW commands
used for managing the LAG, new facilities for adding and removing
a single device by interface, and port remap functionality according to
bond events.

Please note that this feature is currently used only for mimicking
Ethernet bonding for RoCE - netdevs functionality is not altered,
and their bonding continues to be managed solely by bond/team driver.

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>


# 2974ab6e 28-Jul-2016 Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: Improve driver log messages

Remove duplicate pci dev name printing in mlx5_core_err.
Use mlx5_core_{warn,info,err} where possible to have the pci info in the
driver log messages.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Parvi Kaustubhi <parvik@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>


# c4f287c4 19-Jul-2016 Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: Unify and improve command interface

Now as all commands use mlx5 ifc interface, instead of doing two calls
for executing a command we embed command status checking into
mlx5_cmd_exec to simplify the interface.

Also we do here some cleanup for redundant software structures
(inbox/outbox) and functions and improved command failure output.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>


# 94c6825e 17-Apr-2016 Matan Barak <matanb@mellanox.com>

net/mlx5_core: Use tasklet for user-space CQ completion events

Previously, we've fired all our completion callbacks straight from
our ISR.

Some of those callbacks were lightweight (for example, mlx5 Ethernet
napi callbacks), but some of them did more work (for example,
the user-space RDMA stack uverbs' completion handler). Besides that,
doing more than the minimal work in ISR is generally considered wrong,
it could even lead to a hard lockup of the system. Since when a lot
of completion events are generated by the hardware, the loop over
those events could be so long, that we'll get into a hard lockup by
the system watchdog.

In order to avoid that, add a new way of invoking completion events
callbacks. In the interrupt itself, we add the CQs which receive
completion event to a per-EQ list and schedule a tasklet. In the
tasklet context we loop over all the CQs in the list and invoke the
user callback.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# efdc810b 03-May-2016 Mohamad Haj Yahia <mohamad@mellanox.com>

net/mlx5: Flow steering, Add vport ACL support

Update the relevant flow steering device structs and commands to
support vport.
Update the flow steering core API to receive vport number.
Add ingress and egress ACL flow table name spaces.
Add ACL flow table support:
* ACL (Access Control List) flow table is a table that contains
only allow/drop steering rules.

* We have two types of ACL flow tables - ingress and egress.

* ACLs handle traffic sent from/to E-Switch FDB table, Ingress refers to
traffic sent from Vport to E-Switch and Egress refers to traffic sent
from E-Switch to vport.

* Ingress ACL flow table allow/drop rules is checked against traffic
sent from VF.

* Egress ACL flow table allow/drop rules is checked against traffic sent
to VF.

Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# daa21560 01-Mar-2016 Tariq Toukan <tariqt@mellanox.com>

net/mlx5e: Replace async events spinlock with synchronize_irq()

We only need to flush the irq handler to make sure it does not
queue a work into the global work queue after we start to flush it.
So using synchronize_irq() is more appropriate than a spin lock.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b0844444 29-Dec-2015 Eran Ben Elisha <eranbe@mellanox.com>

net/mlx5_core: Introduce access function to read internal timer

A preparation step which adds support for reading the hardware
internal timer and the hardware timestamping from the CQE.
In addition, advertize device_frequency_khz HCA capability.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 108805fc 10-Dec-2015 Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Assign random MAC address if needed

Under SRIOV there might be a case where VFs are loaded
without pre-assigned MAC address. In this case, the VF
will randomize its own MAC. This will address the case
of administrator not assigning MAC to the VF through
the PF OS APIs and keep udev happy.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 81848731 01-Dec-2015 Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: E-Switch, Add SR-IOV (FDB) support

Enabling E-Switch SRIOV for nvfs+1 vports.

Create E-Switch FDB for L2 UC/MC mac steering between VFs/PF and
external vport (Uplink).

FDB contains forwarding rules such as:
UC MAC0 -> vport0(PF).
UC MAC1 -> vport1.
UC MAC2 -> vport2.
MC MACX -> vport0, vport2, Uplink.
MC MACY -> vport1, Uplink.

For unmatched traffic FDB has the following default rules:
Unmached Traffic (src vport != Uplink) -> Uplink.
Unmached Traffic (src vport == Uplink) -> vport0(PF).

FDB rules population:
Each NIC vport (VF) will notify E-Switch manager of its UC/MC vport
context changes via modify vport context command, which will be
translated to an event that will be handled by E-Switch manager (PF)
which will update FDB table accordingly.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# fc50db98 01-Dec-2015 Eli Cohen <eli@mellanox.com>

net/mlx5_core: Add base sriov support

This patch adds SRIOV base support for mlx5 supported devices. The same
driver is used for both PFs and VFs; VFs are identified by the driver
through the flag MLX5_PCI_DEV_IS_VF added to the pci table entries.
Virtual functions are created as usual through writing a value to the
sriov_numvs sysfs file of the PF device. Upon instantiating VFs, they will
all be probed by the driver on the hypervisor. One can gracefully unbind
them through /sys/bus/pci/drivers/mlx5_core/unbind.

mlx5_wait_for_vf_pages() was added to ensure that when a VF dies without
executing proper teardown, the hypervisor driver waits till all of the
pages that were allocated at the hypervisor to maintain its operation
are returned.

In order for the VF to be operational, the PF needs to call enable_hca
for it. This can be done before the VFs are created through a call to
pci_enable_sriov.

If the there are VFs assigned to a VMs when the driver of the PF is
unloaded, all the VF will experience system error and PF driver unloads
cleanly; in this case pci_disable_sriov is not called and the devices
will show when running lspci. Once the PF driver is reloaded, it will
sync its data structures which maintain state on its VFs.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0b107106 01-Dec-2015 Eli Cohen <eli@mellanox.com>

net/mlx5_core: Modify enable/disable hca functions

Modify these functions to have func_id argument to state which device we
are referring to. This is done as a preparation for SRIOV support where
a PF driver needs to control its virtual functions.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 89d44f0a 14-Oct-2015 Majd Dibbiny <majd@mellanox.com>

net/mlx5_core: Add pci error handlers to mlx5_core driver

This patch implement the pci_error_handlers for mlx5_core which allow the
driver to recover from PCI error.

Once an error is detected in the PCI, the mlx5_pci_err_detected is called
and it:
1) Marks the device to be in 'Internal Error' state.
2) Dispatches an event to the mlx5_ib to flush all the outstanding cqes
with error.
3) Returns all the on going commands with error.
4) Unloads the driver.

Afterwards, the FW is reset and mlx5_pci_slot_reset is called and it
enables the device and restore it's pci state.

If the later succeeds, mlx5_pci_resume is called, and it loads the SW
stack.

Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5a788398 08-Oct-2015 Eli Cohen <eli@mellanox.com>

net/mlx5_core: Improve mlx5 messages

Improve the messages printed by the mlx5 macros to include the device
string. In addition, prefix names used by the macros with two underscores
to avoid possible name collisions.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c928ed55 29-Jul-2015 Haggai Abramonvsky <hagaya@mellanox.com>

net/mlx5_core: Check the return value of mlx5_command_exec()

mlx5_cmd_exec() might fail - need to check return value.

Signed-off-by: Haggai Abramovsky <hagaya@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 211e6c80 04-Jun-2015 Majd Dibbiny <majd@mellanox.com>

net/mlx5_core: Get vendor-id using the query adapter command

Add two wrapper functions to the query adapter command:

1. mlx5_query_board_id -- replaces the old mlx5_cmd_query_adapter.

2. mlx5_core_query_vendor_id -- retrieves the vendor_id from the
query_adapter command.

Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f62b8bb8 28-May-2015 Amir Vadai <amirv@mellanox.com>

net/mlx5: Extend mlx5_core to support ConnectX-4 Ethernet functionality

This is the Ethernet part of the driver for the Mellanox ConnectX(R)-4
Single/Dual-Port Adapter supporting 100Gb/s with VPI. The driver
extends the existing mlx5 driver with Ethernet functionality.

This patch contains the driver entry points but does not include
transmit and receive (see the previous patch in the series) routines.

It also adds the option MLX5_CORE_EN to Kconfig to enable/disable the
Ethernet functionality. Currently, Kconfig is programmed to make
Ethernet and Infiniband functionality mutally exclusive.
Also changed MLX5_INFINIBAND to be depandant on MLX5_CORE instead of
selecting it, since MLX5_CORE could be selected without MLX5_INFINIBAND
being selected.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 938fe83c 28-May-2015 Saeed Mahameed <saeedm@mellanox.com>

net/mlx5_core: New device capabilities handling

- Query all supported types of dev caps on driver load.
- Store the Cap data outbox per cap type into driver private data.
- Introduce new Macros to access/dump stored caps (using the auto
generated data types).
- Obsolete SW representation of dev caps (no need for SW copy for each
cap).
- Modify IB driver to use new macros for checking caps.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 302bdf68 02-Apr-2015 Saeed Mahameed <saeedm@mellanox.com>

net/mlx5_core: Fix Mellanox copyright note

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1a91de28 07-May-2014 Joe Perches <joe@perches.com>

mellanox: Logging message cleanups

Use a more current logging style.

o Coalesce formats
o Add missing spaces for coalesced formats
o Align arguments for modified formats
o Add missing newlines for some logging messages
o Use DRV_NAME as part of format instead of %s, DRV_NAME to
reduce overall text.
o Use ..., ##__VA_ARGS__ instead of args... in macros
o Correct a few format typos
o Use a single line message where appropriate

Signed-off-by: Joe Perches <joe@perches.com>
Acked-By: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e126ba97 07-Jul-2013 Eli Cohen <eli@mellanox.com>

mlx5: Add driver for Mellanox Connect-IB adapters

The driver is comprised of two kernel modules: mlx5_ib and mlx5_core.
This partitioning resembles what we have for mlx4, except that mlx5_ib
is the pci device driver and not mlx5_core.

mlx5_core is essentially a library that provides general functionality
that is intended to be used by other Mellanox devices that will be
introduced in the future. mlx5_ib has a similar role as any hardware
device under drivers/infiniband/hw.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>

[ Merge in coccinelle fixes from Fengguang Wu <fengguang.wu@intel.com>.
- Roland ]

Signed-off-by: Roland Dreier <roland@purestorage.com>