History log of /linux-master/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c
Revision Date Author Comments
# bf729988 11-Apr-2024 Shay Drory <shayd@nvidia.com>

net/mlx5: Restore mistakenly dropped parts in register devlink flow

Code parts from cited commit were mistakenly dropped while rebasing
before submission. Add them here.

Fixes: c6e77aa9dd82 ("net/mlx5: Register devlink first under devlink lock")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240411115444.374475-4-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# c6e77aa9 09-Apr-2024 Shay Drory <shayd@nvidia.com>

net/mlx5: Register devlink first under devlink lock

In case device is having a non fatal FW error during probe, the
driver will report the error to user via devlink. This will trigger
a WARN_ON, since mlx5 is calling devlink_register() last.
In order to avoid the WARN_ON[1], change mlx5 to invoke devl_register()
first under devlink lock.

[1]
WARNING: CPU: 5 PID: 227 at net/devlink/health.c:483 devlink_recover_notify.constprop.0+0xb8/0xc0
CPU: 5 PID: 227 Comm: kworker/u16:3 Not tainted 6.4.0-rc5_for_upstream_min_debug_2023_06_12_12_38 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Workqueue: mlx5_health0000:08:00.0 mlx5_fw_reporter_err_work [mlx5_core]
RIP: 0010:devlink_recover_notify.constprop.0+0xb8/0xc0
Call Trace:
<TASK>
? __warn+0x79/0x120
? devlink_recover_notify.constprop.0+0xb8/0xc0
? report_bug+0x17c/0x190
? handle_bug+0x3c/0x60
? exc_invalid_op+0x14/0x70
? asm_exc_invalid_op+0x16/0x20
? devlink_recover_notify.constprop.0+0xb8/0xc0
devlink_health_report+0x4a/0x1c0
mlx5_fw_reporter_err_work+0xa4/0xd0 [mlx5_core]
process_one_work+0x1bb/0x3c0
? process_one_work+0x3c0/0x3c0
worker_thread+0x4d/0x3c0
? process_one_work+0x3c0/0x3c0
kthread+0xc6/0xf0
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x1f/0x30
</TASK>

Fixes: cf530217408e ("devlink: Notify users when objects are accessible")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240409190820.227554-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 137cef6d 25-Jan-2024 Moshe Shemesh <moshe@nvidia.com>

net/mlx5: SF, Stop waiting for FW as teardown was called

When PF/VF teardown is called the driver sets the flag
MLX5_BREAK_FW_WAIT to stop waiting for FW loading and initializing. Same
should be applied to SF driver teardown to cut waiting time. On
mlx5_sf_dev_remove() set the flag before draining health WQ as recovery
flow may also wait for FW reloading while it is not relevant anymore.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# ac5f3956 13-Sep-2023 Jiri Pirko <jiri@resnulli.us>

net/mlx5: SF, Implement peer devlink set for SF representor devlink port

Benefit from the existence of internal mlx5 notifier and extend it by
event MLX5_DRIVER_EVENT_SF_PEER_DEVLINK. Use this event from SF
auxiliary device probe/remove functions to pass the registered SF
devlink instance to the SF representor.

Process the new event in SF representor code and call
devl_port_fn_devlink_set() to do the assignments. Implement this in work
to avoid possible deadlock when probe/remove function of SF may be
called with devlink instance lock held during devlink reload.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e71383fb 03-May-2023 Shay Drory <shayd@nvidia.com>

net/mlx5: Light probe local SFs

In case user wants to configure the SFs, for example: to use only vdpa
functionality, he needs to fully probe a SF, configure what he wants,
and afterward reload the SF.

In order to save the time of the reload, local SFs will probe without
any auxiliary sub-device, so that the SFs can be configured prior to
its full probe.

The defaults of the enable_* devlink params of these SFs are set to
false.

Usage example:
Create SF:
$ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11
$ devlink port function set pci/0000:08:00.0/32768 \
hw_addr 00:00:00:00:00:11 state active

Enable ETH auxiliary device:
$ devlink dev param set auxiliary/mlx5_core.sf.1 \
name enable_eth value true cmode driverinit

Now, in order to fully probe the SF, use devlink reload:
$ devlink dev reload auxiliary/mlx5_core.sf.1

At this point the user have SF devlink instance with auxiliary device
for the Ethernet functionality only.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# b4646da0 23-Apr-2023 Shay Drory <shayd@nvidia.com>

net/mlx5: SF, Drain health before removing device

There is no point in recovery during device removal. Also, if health
work started need to wait for it to avoid races and NULL pointer
access.

Hence, drain health WQ before removing device.

Fixes: 1958fc2f0712 ("net/mlx5: SF, Add auxiliary device driver")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 9df839a7 23-Apr-2020 Parav Pandit <parav@mellanox.com>

net/mlx5: Create a new profile for SFs

Create a new profile for SFs in order to disable the command cache.
Each function command cache consumes ~500KB of memory, when using a
large number of SFs this savings is notable on memory constarined
systems.

Use a new profile to provide for future differences between SFs and PFs.

The mr_cache not used for non-PF functions, so it is excluded from the
new profile.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Bodong Wang <bodong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 72ed5d56 26-Jan-2023 Jiri Pirko <jiri@nvidia.com>

net/mlx5: Suspend auxiliary devices only in case of PCI device suspend

The original behavior introduced by commit c6acd629eec7 ("net/mlx5e: Add
support for devlink-port in non-representors mode") correctly
re-instantiated uplink devlink port and related netdevice during devlink
reload. However with migration to auxiliary devices, this behaviour
changed.

Restore the original behaviour and tear down auxiliary devices
completely during devlink reload.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 82465bec 12-Oct-2021 Leon Romanovsky <leon@kernel.org>

devlink: Delete reload enable/disable interface

Commit a0c76345e3d3 ("devlink: disallow reload operation during device
cleanup") added devlink_reload_{enable,disable}() APIs to prevent reload
operation from racing with device probe/dismantle.

After recent changes to move devlink_register() to the end of device
probe and devlink_unregister() to the beginning of device dismantle,
these races can no longer happen. Reload operations will be denied if
the devlink instance is unregistered and devlink_unregister() will block
until all in-flight operations are done.

Therefore, remove these devlink_reload_{enable,disable}() APIs.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 64ea2d0e 25-Sep-2021 Leon Romanovsky <leon@kernel.org>

net/mlx5: Accept devlink user input after driver initialization complete

The change of devlink_alloc() to accept device makes sure that device
is fully initialized and device_register() does nothing except allowing
users to use that devlink instance.

Such change ensures that no user input will be usable till that point and
it eliminates the need to worry about internal locking as long as devlink_register
is called last since all accesses to the devlink are during initialization.

This change fixes the following lockdep warning.

======================================================
WARNING: possible circular locking dependency detected
5.14.0-rc2+ #27 Not tainted
------------------------------------------------------
devlink/265 is trying to acquire lock:
ffff8880133c2bc0 (&dev->intf_state_mutex){+.+.}-{3:3}, at: mlx5_unload_one+0x1e/0xa0 [mlx5_core]
but task is already holding lock:
ffffffff8362b468 (devlink_mutex){+.+.}-{3:3}, at: devlink_nl_pre_doit+0x2b/0x8d0
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:

-> #1 (devlink_mutex){+.+.}-{3:3}:
__mutex_lock+0x149/0x1310
devlink_register+0xe7/0x280
mlx5_devlink_register+0x118/0x480 [mlx5_core]
mlx5_init_one+0x34b/0x440 [mlx5_core]
probe_one+0x480/0x6e0 [mlx5_core]
pci_device_probe+0x2a0/0x4a0
really_probe+0x1cb/0xba0
__driver_probe_device+0x18f/0x470
driver_probe_device+0x49/0x120
__driver_attach+0x1ce/0x400
bus_for_each_dev+0x11e/0x1a0
bus_add_driver+0x309/0x570
driver_register+0x20f/0x390
0xffffffffa04a0062
do_one_initcall+0xd5/0x400
do_init_module+0x1c8/0x760
load_module+0x7d9d/0xa4b0
__do_sys_finit_module+0x118/0x1a0
do_syscall_64+0x3d/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #0 (&dev->intf_state_mutex){+.+.}-{3:3}:
__lock_acquire+0x2999/0x5a40
lock_acquire+0x1a9/0x4a0
__mutex_lock+0x149/0x1310
mlx5_unload_one+0x1e/0xa0 [mlx5_core]
mlx5_devlink_reload_down+0x185/0x2b0 [mlx5_core]
devlink_reload+0x1f2/0x640
devlink_nl_cmd_reload+0x6c3/0x10d0
genl_family_rcv_msg_doit+0x1e9/0x2f0
genl_rcv_msg+0x27f/0x4a0
netlink_rcv_skb+0x11e/0x340
genl_rcv+0x24/0x40
netlink_unicast+0x433/0x700
netlink_sendmsg+0x6fb/0xbe0
sock_sendmsg+0xb0/0xe0
__sys_sendto+0x192/0x240
__x64_sys_sendto+0xdc/0x1b0
do_syscall_64+0x3d/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae

other info that might help us debug this:

Possible unsafe locking scenario:

CPU0 CPU1
---- ----
lock(devlink_mutex);
lock(&dev->intf_state_mutex);
lock(devlink_mutex);
lock(&dev->intf_state_mutex);

*** DEADLOCK ***

3 locks held by devlink/265:
#0: ffffffff836371d0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x15/0x40
#1: ffffffff83637288 (genl_mutex){+.+.}-{3:3}, at: genl_rcv_msg+0x31a/0x4a0
#2: ffffffff8362b468 (devlink_mutex){+.+.}-{3:3}, at: devlink_nl_pre_doit+0x2b/0x8d0

stack backtrace:
CPU: 0 PID: 265 Comm: devlink Not tainted 5.14.0-rc2+ #27
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
dump_stack_lvl+0x45/0x59
check_noncircular+0x268/0x310
? print_circular_bug+0x460/0x460
? __kernel_text_address+0xe/0x30
? alloc_chain_hlocks+0x1e6/0x5a0
__lock_acquire+0x2999/0x5a40
? lockdep_hardirqs_on_prepare+0x3e0/0x3e0
? add_lock_to_list.constprop.0+0x6c/0x530
lock_acquire+0x1a9/0x4a0
? mlx5_unload_one+0x1e/0xa0 [mlx5_core]
? lock_release+0x6c0/0x6c0
? lockdep_hardirqs_on_prepare+0x3e0/0x3e0
? lock_is_held_type+0x98/0x110
__mutex_lock+0x149/0x1310
? mlx5_unload_one+0x1e/0xa0 [mlx5_core]
? lock_is_held_type+0x98/0x110
? mlx5_unload_one+0x1e/0xa0 [mlx5_core]
? find_held_lock+0x2d/0x110
? mutex_lock_io_nested+0x1160/0x1160
? mlx5_lag_is_active+0x72/0x90 [mlx5_core]
? lock_downgrade+0x6d0/0x6d0
? do_raw_spin_lock+0x12e/0x270
? rwlock_bug.part.0+0x90/0x90
? mlx5_unload_one+0x1e/0xa0 [mlx5_core]
mlx5_unload_one+0x1e/0xa0 [mlx5_core]
mlx5_devlink_reload_down+0x185/0x2b0 [mlx5_core]
? netlink_broadcast_filtered+0x308/0xac0
? mlx5_devlink_info_get+0x1f0/0x1f0 [mlx5_core]
? __build_skb_around+0x110/0x2b0
? __alloc_skb+0x113/0x2b0
devlink_reload+0x1f2/0x640
? devlink_unregister+0x1e0/0x1e0
? security_capable+0x51/0x90
devlink_nl_cmd_reload+0x6c3/0x10d0
? devlink_nl_cmd_get_doit+0x1e0/0x1e0
? devlink_nl_pre_doit+0x72/0x8d0
genl_family_rcv_msg_doit+0x1e9/0x2f0
? __lock_acquire+0x15e2/0x5a40
? genl_family_rcv_msg_attrs_parse.constprop.0+0x240/0x240
? mutex_lock_io_nested+0x1160/0x1160
? security_capable+0x51/0x90
genl_rcv_msg+0x27f/0x4a0
? genl_get_cmd+0x3c0/0x3c0
? lock_acquire+0x1a9/0x4a0
? devlink_nl_cmd_get_doit+0x1e0/0x1e0
? lock_release+0x6c0/0x6c0
netlink_rcv_skb+0x11e/0x340
? genl_get_cmd+0x3c0/0x3c0
? netlink_ack+0x930/0x930
genl_rcv+0x24/0x40
netlink_unicast+0x433/0x700
? netlink_attachskb+0x750/0x750
? __alloc_skb+0x113/0x2b0
netlink_sendmsg+0x6fb/0xbe0
? netlink_unicast+0x700/0x700
? netlink_unicast+0x700/0x700
sock_sendmsg+0xb0/0xe0
__sys_sendto+0x192/0x240
? __x64_sys_getpeername+0xb0/0xb0
? do_sys_openat2+0x10a/0x370
? down_write_nested+0x150/0x150
? do_user_addr_fault+0x215/0xd50
? __x64_sys_openat+0x11f/0x1d0
? __x64_sys_open+0x1a0/0x1a0
__x64_sys_sendto+0xdc/0x1b0
? syscall_enter_from_user_mode+0x1d/0x50
do_syscall_64+0x3d/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f50b50b6b3a
Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 76 c3 0f 1f 44 00 00 55 48 83 ec 30 44 89 4c
RSP: 002b:00007fff6c0d3f38 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f50b50b6b3a
RDX: 0000000000000038 RSI: 000055763ac08440 RDI: 0000000000000003
RBP: 000055763ac08410 R08: 00007f50b5192200 R09: 000000000000000c
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 000055763ac08410 R15: 000055763ac08440
mlx5_core 0000:00:09.0: firmware version: 4.8.9999
mlx5_core 0000:00:09.0: 0.000 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x255 link)
mlx5_core 0000:00:09.0 eth1: Link up

Fixes: a6f3b62386a0 ("net/mlx5: Move devlink registration before interfaces load")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 919d13a7 08-Aug-2021 Leon Romanovsky <leon@kernel.org>

devlink: Set device as early as possible

All kernel devlink implementations call to devlink_alloc() during
initialization routine for specific device which is used later as
a parent device for devlink_register().

Such late device assignment causes to the situation which requires us to
call to device_register() before setting other parameters, but that call
opens devlink to the world and makes accessible for the netlink users.

Any attempt to move devlink_register() to be the last call generates the
following error due to access to the devlink->dev pointer.

[ 8.758862] devlink_nl_param_fill+0x2e8/0xe50
[ 8.760305] devlink_param_notify+0x6d/0x180
[ 8.760435] __devlink_params_register+0x2f1/0x670
[ 8.760558] devlink_params_register+0x1e/0x20

The simple change of API to set devlink device in the devlink_alloc()
instead of devlink_register() fixes all this above and ensures that
prior to call to devlink_register() everything already set.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6dea2f7e 02-Nov-2020 Leon Romanovsky <leon@kernel.org>

net/mlx5: Separate probe vs. reload flows

The mix between probe/unprobe and reload flows causes to have an extra
mutex lock intf_state_mutex that generates LOCKDEP warning between it
and devlink_mutex. As a preparation for the future removal, separate
those flows.

Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# b50c4892 10-Feb-2021 Wei Yongjun <weiyongjun1@huawei.com>

net/mlx5: SF, Fix error return code in mlx5_sf_dev_probe()

Fix to return negative error code -ENOMEM from the ioremap() error
handling case instead of 0, as done elsewhere in this function.

Fixes: 1958fc2f0712 ("net/mlx5: SF, Add auxiliary device driver")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 1958fc2f 11-Dec-2020 Parav Pandit <parav@nvidia.com>

net/mlx5: SF, Add auxiliary device driver

Add auxiliary device driver for mlx5 subfunction auxiliary device.

A mlx5 subfunction is similar to PCI PF and VF. For a subfunction
an auxiliary device is created.

As a result, when mlx5 SF auxiliary device binds to the driver,
its netdev and rdma device are created, they appear as

$ ls -l /sys/bus/auxiliary/devices/
mlx5_core.sf.4 -> ../../../devices/pci0000:00/0000:00:03.0/0000:06:00.0/mlx5_core.sf.4

$ ls -l /sys/class/net/eth1/device
/sys/class/net/eth1/device -> ../../../mlx5_core.sf.4

$ cat /sys/bus/auxiliary/devices/mlx5_core.sf.4/sfnum
88

$ devlink dev show
pci/0000:06:00.0
auxiliary/mlx5_core.sf.4

$ devlink port show auxiliary/mlx5_core.sf.4/1
auxiliary/mlx5_core.sf.4/1: type eth netdev p0sf88 flavour virtual port 0 splittable false

$ rdma link show mlx5_0/1
link mlx5_0/1 state ACTIVE physical_state LINK_UP netdev p0sf88

$ rdma dev show
8: rocep6s0f1: node_type ca fw 16.29.0550 node_guid 248a:0703:00b3:d113 sys_image_guid 248a:0703:00b3:d112
13: mlx5_0: node_type ca fw 16.29.0550 node_guid 0000:00ff:fe00:8888 sys_image_guid 248a:0703:00b3:d112

In future, devlink device instance name will adapt to have sfnum
annotation using either an alias or as devlink instance name described
in RFC [1].

[1] https://lore.kernel.org/netdev/20200519092258.GF4655@nanopsycho/

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>