History log of /linux-master/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.c
Revision Date Author Comments
# aa4ac90d 11-Apr-2024 Tariq Toukan <tariqt@nvidia.com>

net/mlx5: SD, Handle possible devcom ERR_PTR

Check if devcom holds an error pointer and return immediately.

This fixes Smatch static checker warning:
drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c:221 sd_register()
error: 'devcom' dereferencing possible ERR_PTR()

Enhance mlx5_devcom_register_component() so it stops returning NULL,
making it easier for its callers.

Fixes: d3d057666090 ("net/mlx5: SD, Implement devcom communication and primary election")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://lore.kernel.org/all/f09666c8-e604-41f6-958b-4cc55c73faf9@gmail.com/T/
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Link: https://lore.kernel.org/r/20240411115444.374475-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 9bb1ac80 21-Aug-2023 Tariq Toukan <tariqt@nvidia.com>

net/mlx5: devcom, Add component size getter

Add a getter for the number of participants in a devcom
component (those who share the same component id and key).

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# b430c1b4 12-Oct-2023 Shay Drory <shayd@nvidia.com>

net/mlx5: Replace global mlx5_intf_lock with HCA devcom component lock

mlx5_intf_lock is used to sync between LAG changes and its slaves
mlx5 core dev aux devices changes, which means every time mlx5 core
dev add/remove aux devices, mlx5 is taking this global lock, even if
LAG functionality isn't supported over the core dev.
This cause a bottleneck when probing VFs/SFs in parallel.

Hence, replace mlx5_intf_lock with HCA devcom component lock, or no
lock if LAG functionality isn't supported.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# e534552c 12-Oct-2023 Shay Drory <shayd@nvidia.com>

net/mlx5: Refactor LAG peer device lookout bus logic to mlx5 devcom

LAG peer device lookout bus logic required the usage of global lock,
mlx5_intf_mutex.
As part of the effort to remove this global lock, refactor LAG peer
device lookout to use mlx5 devcom layer.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 89d351c2 12-Oct-2023 Shay Drory <shayd@nvidia.com>

net/mlx5: Avoid false positive lockdep warning by adding lock_class_key

Downstream patch will add devcom component which will be locked in
many places. This can lead to a false positive "possible circular
locking dependency" warning by lockdep, on flows which lock more than
one mlx5 devcom component, such as probing ETH aux device.
Hence, add a lock_class_key per mlx5 device.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 7d7c6e8c 14-Aug-2023 Li Zetao <lizetao1@huawei.com>

net/mlx5: Devcom, only use devcom after NULL check in mlx5_devcom_send_event()

There is a warning reported by kernel test robot:

smatch warnings:
drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.c:264
mlx5_devcom_send_event() warn: variable dereferenced before
IS_ERR check devcom (see line 259)

The reason for the warning is that the pointer is used before check, put
the assignment to comp after devcom check to silence the warning.

Fixes: 88d162b47981 ("net/mlx5: Devcom, Infrastructure changes")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Closes: https://lore.kernel.org/r/202308041028.AkXYDwJ6-lkp@intel.com/
Signed-off-by: Li Zetao <lizetao1@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 88d162b4 03-May-2023 Roi Dayan <roid@nvidia.com>

net/mlx5: Devcom, Infrastructure changes

Update devcom infrastructure to be more generic, without
depending on max supported ports definition or a device guid,
and also more encapsulated so callers don't need to pass
the register devcom component id per event call.

Signed-off-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 6ec0b55e 06-Jun-2023 Shay Drory <shayd@nvidia.com>

net/mlx5: Enable 4 ports VF LAG

Now, after all preparation are done, enable 4 ports VF LAG

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# e2a82bf8 06-Feb-2023 Shay Drory <shayd@nvidia.com>

net/mlx5: Devcom, extend mlx5_devcom_send_event to work with more than two devices

mlx5_devcom_send_event is used to send event from one eswitch to the
other. In other words, only one event is sent, which means, no error
mechanism is needed.
However, In case devcom have more than two eswitches, a proper error
mechanism is needed. Hence, in case of error, devcom will perform the
error unwind, since devcom knows how many events were successful.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 90ca127c 02-Jun-2023 Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Devcom, introduce devcom_for_each_peer_entry

Introduce generic APIs which will retrieve all peers.
This API replace mlx5_devcom_get/release_peer_data which retrieve
only a single peer.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# e67f928a 07-Feb-2023 Shay Drory <shayd@nvidia.com>

net/mlx5: Devcom, Rename paired to ready

In downstream patch devcom will provide support for more than two
devices. The term 'paired' will be renamed as 'ready' to convey a
more accurate meaning.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 1f893f57 02-May-2023 Shay Drory <shayd@nvidia.com>

net/mlx5: Devcom, serialize devcom registration

From one hand, mlx5 driver is allowing to probe PFs in parallel.
From the other hand, devcom, which is a share resource between PFs, is
registered without any lock. This might resulted in memory problems.

Hence, use the global mlx5_dev_list_lock in order to serialize devcom
registration.

Fixes: fadd59fc50d0 ("net/mlx5: Introduce inter-device communication mechanism")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# af871943 02-May-2023 Shay Drory <shayd@nvidia.com>

net/mlx5: Devcom, fix error flow in mlx5_devcom_register_device

In case devcom allocation is failed, mlx5 is always freeing the priv.
However, this priv might have been allocated by a different thread,
and freeing it might lead to use-after-free bugs.
Fix it by freeing the priv only in case it was allocated by the
running thread.

Fixes: fadd59fc50d0 ("net/mlx5: Introduce inter-device communication mechanism")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 691c041b 31-Mar-2023 Vlad Buslov <vladbu@nvidia.com>

net/mlx5e: Fix deadlock in tc route query code

Cited commit causes ABBA deadlock[0] when peer flows are created while
holding the devcom rw semaphore. Due to peer flows offload implementation
the lock is taken much higher up the call chain and there is no obvious way
to easily fix the deadlock. Instead, since tc route query code needs the
peer eswitch structure only to perform a lookup in xarray and doesn't
perform any sleeping operations with it, refactor the code for lockless
execution in following ways:

- RCUify the devcom 'data' pointer. When resetting the pointer
synchronously wait for RCU grace period before returning. This is fine
since devcom is currently only used for synchronization of
pairing/unpairing of eswitches which is rare and already expensive as-is.

- Wrap all usages of 'paired' boolean in {READ|WRITE}_ONCE(). The flag has
already been used in some unlocked contexts without proper
annotations (e.g. users of mlx5_devcom_is_paired() function), but it wasn't
an issue since all relevant code paths checked it again after obtaining the
devcom semaphore. Now it is also used by mlx5_devcom_get_peer_data_rcu() as
"best effort" check to return NULL when devcom is being unpaired. Note that
while RCU read lock doesn't prevent the unpaired flag from being changed
concurrently it still guarantees that reader can continue to use 'data'.

- Refactor mlx5e_tc_query_route_vport() function to use new
mlx5_devcom_get_peer_data_rcu() API which fixes the deadlock.

[0]:

[ 164.599612] ======================================================
[ 164.600142] WARNING: possible circular locking dependency detected
[ 164.600667] 6.3.0-rc3+ #1 Not tainted
[ 164.601021] ------------------------------------------------------
[ 164.601557] handler1/3456 is trying to acquire lock:
[ 164.601998] ffff88811f1714b0 (&esw->offloads.encap_tbl_lock){+.+.}-{3:3}, at: mlx5e_attach_encap+0xd8/0x8b0 [mlx5_core]
[ 164.603078]
but task is already holding lock:
[ 164.603617] ffff88810137fc98 (&comp->sem){++++}-{3:3}, at: mlx5_devcom_get_peer_data+0x37/0x80 [mlx5_core]
[ 164.604459]
which lock already depends on the new lock.

[ 164.605190]
the existing dependency chain (in reverse order) is:
[ 164.605848]
-> #1 (&comp->sem){++++}-{3:3}:
[ 164.606380] down_read+0x39/0x50
[ 164.606772] mlx5_devcom_get_peer_data+0x37/0x80 [mlx5_core]
[ 164.607336] mlx5e_tc_query_route_vport+0x86/0xc0 [mlx5_core]
[ 164.607914] mlx5e_tc_tun_route_lookup+0x1a4/0x1d0 [mlx5_core]
[ 164.608495] mlx5e_attach_decap_route+0xc6/0x1e0 [mlx5_core]
[ 164.609063] mlx5e_tc_add_fdb_flow+0x1ea/0x360 [mlx5_core]
[ 164.609627] __mlx5e_add_fdb_flow+0x2d2/0x430 [mlx5_core]
[ 164.610175] mlx5e_configure_flower+0x952/0x1a20 [mlx5_core]
[ 164.610741] tc_setup_cb_add+0xd4/0x200
[ 164.611146] fl_hw_replace_filter+0x14c/0x1f0 [cls_flower]
[ 164.611661] fl_change+0xc95/0x18a0 [cls_flower]
[ 164.612116] tc_new_tfilter+0x3fc/0xd20
[ 164.612516] rtnetlink_rcv_msg+0x418/0x5b0
[ 164.612936] netlink_rcv_skb+0x54/0x100
[ 164.613339] netlink_unicast+0x190/0x250
[ 164.613746] netlink_sendmsg+0x245/0x4a0
[ 164.614150] sock_sendmsg+0x38/0x60
[ 164.614522] ____sys_sendmsg+0x1d0/0x1e0
[ 164.614934] ___sys_sendmsg+0x80/0xc0
[ 164.615320] __sys_sendmsg+0x51/0x90
[ 164.615701] do_syscall_64+0x3d/0x90
[ 164.616083] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 164.616568]
-> #0 (&esw->offloads.encap_tbl_lock){+.+.}-{3:3}:
[ 164.617210] __lock_acquire+0x159e/0x26e0
[ 164.617638] lock_acquire+0xc2/0x2a0
[ 164.618018] __mutex_lock+0x92/0xcd0
[ 164.618401] mlx5e_attach_encap+0xd8/0x8b0 [mlx5_core]
[ 164.618943] post_process_attr+0x153/0x2d0 [mlx5_core]
[ 164.619471] mlx5e_tc_add_fdb_flow+0x164/0x360 [mlx5_core]
[ 164.620021] __mlx5e_add_fdb_flow+0x2d2/0x430 [mlx5_core]
[ 164.620564] mlx5e_configure_flower+0xe33/0x1a20 [mlx5_core]
[ 164.621125] tc_setup_cb_add+0xd4/0x200
[ 164.621531] fl_hw_replace_filter+0x14c/0x1f0 [cls_flower]
[ 164.622047] fl_change+0xc95/0x18a0 [cls_flower]
[ 164.622500] tc_new_tfilter+0x3fc/0xd20
[ 164.622906] rtnetlink_rcv_msg+0x418/0x5b0
[ 164.623324] netlink_rcv_skb+0x54/0x100
[ 164.623727] netlink_unicast+0x190/0x250
[ 164.624138] netlink_sendmsg+0x245/0x4a0
[ 164.624544] sock_sendmsg+0x38/0x60
[ 164.624919] ____sys_sendmsg+0x1d0/0x1e0
[ 164.625340] ___sys_sendmsg+0x80/0xc0
[ 164.625731] __sys_sendmsg+0x51/0x90
[ 164.626117] do_syscall_64+0x3d/0x90
[ 164.626502] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 164.626995]
other info that might help us debug this:

[ 164.627725] Possible unsafe locking scenario:

[ 164.628268] CPU0 CPU1
[ 164.628683] ---- ----
[ 164.629098] lock(&comp->sem);
[ 164.629421] lock(&esw->offloads.encap_tbl_lock);
[ 164.630066] lock(&comp->sem);
[ 164.630555] lock(&esw->offloads.encap_tbl_lock);
[ 164.630993]
*** DEADLOCK ***

[ 164.631575] 3 locks held by handler1/3456:
[ 164.631962] #0: ffff888124b75130 (&block->cb_lock){++++}-{3:3}, at: tc_setup_cb_add+0x5b/0x200
[ 164.632703] #1: ffff888116e512b8 (&esw->mode_lock){++++}-{3:3}, at: mlx5_esw_hold+0x39/0x50 [mlx5_core]
[ 164.633552] #2: ffff88810137fc98 (&comp->sem){++++}-{3:3}, at: mlx5_devcom_get_peer_data+0x37/0x80 [mlx5_core]
[ 164.634435]
stack backtrace:
[ 164.634883] CPU: 17 PID: 3456 Comm: handler1 Not tainted 6.3.0-rc3+ #1
[ 164.635431] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 164.636340] Call Trace:
[ 164.636616] <TASK>
[ 164.636863] dump_stack_lvl+0x47/0x70
[ 164.637217] check_noncircular+0xfe/0x110
[ 164.637601] __lock_acquire+0x159e/0x26e0
[ 164.637977] ? mlx5_cmd_set_fte+0x5b0/0x830 [mlx5_core]
[ 164.638472] lock_acquire+0xc2/0x2a0
[ 164.638828] ? mlx5e_attach_encap+0xd8/0x8b0 [mlx5_core]
[ 164.639339] ? lock_is_held_type+0x98/0x110
[ 164.639728] __mutex_lock+0x92/0xcd0
[ 164.640074] ? mlx5e_attach_encap+0xd8/0x8b0 [mlx5_core]
[ 164.640576] ? __lock_acquire+0x382/0x26e0
[ 164.640958] ? mlx5e_attach_encap+0xd8/0x8b0 [mlx5_core]
[ 164.641468] ? mlx5e_attach_encap+0xd8/0x8b0 [mlx5_core]
[ 164.641965] mlx5e_attach_encap+0xd8/0x8b0 [mlx5_core]
[ 164.642454] ? lock_release+0xbf/0x240
[ 164.642819] post_process_attr+0x153/0x2d0 [mlx5_core]
[ 164.643318] mlx5e_tc_add_fdb_flow+0x164/0x360 [mlx5_core]
[ 164.643835] __mlx5e_add_fdb_flow+0x2d2/0x430 [mlx5_core]
[ 164.644340] mlx5e_configure_flower+0xe33/0x1a20 [mlx5_core]
[ 164.644862] ? lock_acquire+0xc2/0x2a0
[ 164.645219] tc_setup_cb_add+0xd4/0x200
[ 164.645588] fl_hw_replace_filter+0x14c/0x1f0 [cls_flower]
[ 164.646067] fl_change+0xc95/0x18a0 [cls_flower]
[ 164.646488] tc_new_tfilter+0x3fc/0xd20
[ 164.646861] ? tc_del_tfilter+0x810/0x810
[ 164.647236] rtnetlink_rcv_msg+0x418/0x5b0
[ 164.647621] ? rtnl_setlink+0x160/0x160
[ 164.647982] netlink_rcv_skb+0x54/0x100
[ 164.648348] netlink_unicast+0x190/0x250
[ 164.648722] netlink_sendmsg+0x245/0x4a0
[ 164.649090] sock_sendmsg+0x38/0x60
[ 164.649434] ____sys_sendmsg+0x1d0/0x1e0
[ 164.649804] ? copy_msghdr_from_user+0x6d/0xa0
[ 164.650213] ___sys_sendmsg+0x80/0xc0
[ 164.650563] ? lock_acquire+0xc2/0x2a0
[ 164.650926] ? lock_acquire+0xc2/0x2a0
[ 164.651286] ? __fget_files+0x5/0x190
[ 164.651644] ? find_held_lock+0x2b/0x80
[ 164.652006] ? __fget_files+0xb9/0x190
[ 164.652365] ? lock_release+0xbf/0x240
[ 164.652723] ? __fget_files+0xd3/0x190
[ 164.653079] __sys_sendmsg+0x51/0x90
[ 164.653435] do_syscall_64+0x3d/0x90
[ 164.653784] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 164.654229] RIP: 0033:0x7f378054f8bd
[ 164.654577] Code: 28 89 54 24 1c 48 89 74 24 10 89 7c 24 08 e8 6a c3 f4 ff 8b 54 24 1c 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 33 44 89 c7 48 89 44 24 08 e8 be c3 f4 ff 48
[ 164.656041] RSP: 002b:00007f377fa114b0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
[ 164.656701] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f378054f8bd
[ 164.657297] RDX: 0000000000000000 RSI: 00007f377fa11540 RDI: 0000000000000014
[ 164.657885] RBP: 00007f377fa12278 R08: 0000000000000000 R09: 000000000000015c
[ 164.658472] R10: 00007f377fa123d0 R11: 0000000000000293 R12: 0000560962d99bd0
[ 164.665317] R13: 0000000000000000 R14: 0000560962d99bd0 R15: 00007f377fa11540

Fixes: f9d196bd632b ("net/mlx5e: Use correct eswitch for stack devices with lag")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 8a6e75e5 26-Feb-2022 Mark Bloch <mbloch@nvidia.com>

net/mlx5: devcom only supports 2 ports

Devcom API is intended to be used between 2 devices only add this
implied assumption into the code and check when it's no true.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# fadd59fc 04-Dec-2018 Aviv Heller <avivh@mellanox.com>

net/mlx5: Introduce inter-device communication mechanism

This introduces devcom, a generic mechanism for performing operations
on both physical functions of the same Connect-X card.

The first user of this API is merged eswitch, which will be introduced
in subsequent patches.

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>