History log of /linux-master/drivers/net/netdevsim/dev.c
Revision Date Author Comments
# ba5e1272 01-Feb-2024 Eric Dumazet <edumazet@google.com>

netdevsim: avoid potential loop in nsim_dev_trap_report_work()

Many syzbot reports include the following trace [1]

If nsim_dev_trap_report_work() can not grab the mutex,
it should rearm itself at least one jiffie later.

[1]
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 PID: 32383 Comm: kworker/0:2 Not tainted 6.8.0-rc2-syzkaller-00031-g861c0981648f #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
Workqueue: events nsim_dev_trap_report_work
RIP: 0010:bytes_is_nonzero mm/kasan/generic.c:89 [inline]
RIP: 0010:memory_is_nonzero mm/kasan/generic.c:104 [inline]
RIP: 0010:memory_is_poisoned_n mm/kasan/generic.c:129 [inline]
RIP: 0010:memory_is_poisoned mm/kasan/generic.c:161 [inline]
RIP: 0010:check_region_inline mm/kasan/generic.c:180 [inline]
RIP: 0010:kasan_check_range+0x101/0x190 mm/kasan/generic.c:189
Code: 07 49 39 d1 75 0a 45 3a 11 b8 01 00 00 00 7c 0b 44 89 c2 e8 21 ed ff ff 83 f0 01 5b 5d 41 5c c3 48 85 d2 74 4f 48 01 ea eb 09 <48> 83 c0 01 48 39 d0 74 41 80 38 00 74 f2 eb b6 41 bc 08 00 00 00
RSP: 0018:ffffc90012dcf998 EFLAGS: 00000046
RAX: fffffbfff258af1e RBX: fffffbfff258af1f RCX: ffffffff8168eda3
RDX: fffffbfff258af1f RSI: 0000000000000004 RDI: ffffffff92c578f0
RBP: fffffbfff258af1e R08: 0000000000000000 R09: fffffbfff258af1e
R10: ffffffff92c578f3 R11: ffffffff8acbcbc0 R12: 0000000000000002
R13: ffff88806db38400 R14: 1ffff920025b9f42 R15: ffffffff92c578e8
FS: 0000000000000000(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000c00994e078 CR3: 000000002c250000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<NMI>
</NMI>
<TASK>
instrument_atomic_read include/linux/instrumented.h:68 [inline]
atomic_read include/linux/atomic/atomic-instrumented.h:32 [inline]
queued_spin_is_locked include/asm-generic/qspinlock.h:57 [inline]
debug_spin_unlock kernel/locking/spinlock_debug.c:101 [inline]
do_raw_spin_unlock+0x53/0x230 kernel/locking/spinlock_debug.c:141
__raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:150 [inline]
_raw_spin_unlock_irqrestore+0x22/0x70 kernel/locking/spinlock.c:194
debug_object_activate+0x349/0x540 lib/debugobjects.c:726
debug_work_activate kernel/workqueue.c:578 [inline]
insert_work+0x30/0x230 kernel/workqueue.c:1650
__queue_work+0x62e/0x11d0 kernel/workqueue.c:1802
__queue_delayed_work+0x1bf/0x270 kernel/workqueue.c:1953
queue_delayed_work_on+0x106/0x130 kernel/workqueue.c:1989
queue_delayed_work include/linux/workqueue.h:563 [inline]
schedule_delayed_work include/linux/workqueue.h:677 [inline]
nsim_dev_trap_report_work+0x9c0/0xc80 drivers/net/netdevsim/dev.c:842
process_one_work+0x886/0x15d0 kernel/workqueue.c:2633
process_scheduled_works kernel/workqueue.c:2706 [inline]
worker_thread+0x8b9/0x1290 kernel/workqueue.c:2787
kthread+0x2c6/0x3a0 kernel/kthread.c:388
ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:242
</TASK>

Fixes: 012ec02ae441 ("netdevsim: convert driver to use unlocked devlink API during init/fini")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240201175324.3752746-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# f72207a5 11-Jul-2023 Dan Carpenter <dan.carpenter@linaro.org>

netdevsim: fix uninitialized data in nsim_dev_trap_fa_cookie_write()

The simple_write_to_buffer() function is designed to handle partial
writes. It returns negatives on error, otherwise it returns the number
of bytes that were able to be copied. This code doesn't check the
return properly. We only know that the first byte is written, the rest
of the buffer might be uninitialized.

There is no need to use the simple_write_to_buffer() function.
Partial writes are prohibited by the "if (*ppos != 0)" check at the
start of the function. Just use memdup_user() and copy the whole
buffer.

Fixes: d3cbb907ae57 ("netdevsim: add ACL trap reporting cookie as a metadata")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/7c1f950b-3a7d-4252-82a6-876e53078ef7@moroto.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# fb8421a9 27-Jan-2023 Jiri Pirko <jiri@nvidia.com>

devlink: remove devlink features

Devlink features were introduced to disallow devlink reload calls of
userspace before the devlink was fully initialized. The reason for this
workaround was the fact that devlink reload was originally called
without devlink instance lock held.

However, with recent changes that converted devlink reload to be
performed under devlink instance lock, this is redundant so remove
devlink features entirely.

Note that mlx5 used this to enable devlink reload conditionally only
when device didn't act as multi port slave. Move the multi port check
into mlx5_devlink_reload_down() callback alongside with the other
checks preventing the device from reload in certain states.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 075935f0 26-Jan-2023 Jiri Pirko <jiri@nvidia.com>

devlink: protect devlink param list by instance lock

Commit 1d18bb1a4ddd ("devlink: allow registering parameters after
the instance") as the subject implies introduced possibility to register
devlink params even for already registered devlink instance. This is a
bit problematic, as the consistency or params list was originally
secured by the fact it is static during devlink lifetime. So in order to
protect the params list, take devlink instance lock during the params
operations. Introduce unlocked function variants and use them in drivers
in locked context. Put lock assertions to appropriate places.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 82a3aef2 05-Jan-2023 Jakub Kicinski <kuba@kernel.org>

netdevsim: move devlink registration under the instance lock

To prevent races with netdev code accessing free devlink instances
move the registration under the devlink instance lock.
Core now waits for the instance to be registered before accessing it.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5c5ea1d0 05-Jan-2023 Jakub Kicinski <kuba@kernel.org>

netdevsim: rename a label

err_dl_unregister should unregister the devlink instance.
Looks like renaming it was missed in one of the reshufflings.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 226bf980 29-Nov-2022 Vincent Mailhol <mailhol.vincent@wanadoo.fr>

net: devlink: let the core report the driver name instead of the drivers

The driver name is available in device_driver::name. Right now,
drivers still have to report this piece of information themselves in
their devlink_ops::info_get callback function.

In order to factorize code, make devlink_nl_info_fill() add the driver
name attribute.

Now that the core sets the driver name attribute, drivers are not
supposed to call devlink_info_driver_name_put() anymore. Remove
devlink_info_driver_name_put() and clean-up all the drivers using this
function in their callback.

Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Tested-by: Ido Schimmel <idosch@nvidia.com> # mlxsw
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# f2fc15e2 15-Nov-2022 Michal Wilczynski <michal.wilczynski@intel.com>

devlink: Allow to set up parent in devl_rate_leaf_create()

Currently the driver is able to create leaf nodes for the devlink-rate,
but is unable to set parent for them. This wasn't as issue before the
possibility to export hierarchy from the driver. After adding the export
feature, in order for the driver to supply correct hierarchy, it's
necessary for it to be able to supply a parent name to
devl_rate_leaf_create().

Introduce a new parameter 'parent_name' in devl_rate_leaf_create().

Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# ac73d4bf 02-Nov-2022 Jiri Pirko <jiri@nvidia.com>

net: make drivers to use SET_NETDEV_DEVLINK_PORT to set devlink_port

Benefit from the previously implemented tracking of netdev events in
devlink code and instead of calling devlink_port_type_eth_set() and
devlink_port_type_clear() to set devlink port type and link to related
netdev, use SET_NETDEV_DEVLINK_PORT() macro to assign devlink_port
pointer to netdevice which is about to be registered.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 064bc731 15-Nov-2022 Wang Yufen <wangyufen@huawei.com>

netdevsim: Fix memory leak of nsim_dev->fa_cookie

kmemleak reports this issue:

unreferenced object 0xffff8881bac872d0 (size 8):
comm "sh", pid 58603, jiffies 4481524462 (age 68.065s)
hex dump (first 8 bytes):
04 00 00 00 de ad be ef ........
backtrace:
[<00000000c80b8577>] __kmalloc+0x49/0x150
[<000000005292b8c6>] nsim_dev_trap_fa_cookie_write+0xc1/0x210 [netdevsim]
[<0000000093d78e77>] full_proxy_write+0xf3/0x180
[<000000005a662c16>] vfs_write+0x1c5/0xaf0
[<000000007aabf84a>] ksys_write+0xed/0x1c0
[<000000005f1d2e47>] do_syscall_64+0x3b/0x90
[<000000006001c6ec>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

The issue occurs in the following scenarios:

nsim_dev_trap_fa_cookie_write()
kmalloc() fa_cookie
nsim_dev->fa_cookie = fa_cookie
..
nsim_drv_remove()

The fa_cookie allocked in nsim_dev_trap_fa_cookie_write() is not freed. To
fix, add kfree(nsim_dev->fa_cookie) to nsim_drv_remove().

Fixes: d3cbb907ae57 ("netdevsim: add ACL trap reporting cookie as a metadata")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Jiri Pirko <jiri@mellanox.com>
Link: https://lore.kernel.org/r/1668504625-14698-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# a6aa8d0c 25-Oct-2022 Zhengchao Shao <shaozhengchao@huawei.com>

netdevsim: remove dir in nsim_dev_debugfs_init() when creating ports dir failed

Remove dir in nsim_dev_debugfs_init() when creating ports dir failed.
Otherwise, the netdevsim device will not be created next time. Kernel
reports an error: debugfs: Directory 'netdevsim1' with parent 'netdevsim'
already present!

Fixes: ab1d0cc004d7 ("netdevsim: change debugfs tree topology")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 6b1da9f7 25-Oct-2022 Zhengchao Shao <shaozhengchao@huawei.com>

netdevsim: fix memory leak in nsim_drv_probe() when nsim_dev_resources_register() failed

If some items in nsim_dev_resources_register() fail, memory leak will
occur. The following is the memory leak information.

unreferenced object 0xffff888074c02600 (size 128):
comm "echo", pid 8159, jiffies 4294945184 (age 493.530s)
hex dump (first 32 bytes):
40 47 ea 89 ff ff ff ff 01 00 00 00 00 00 00 00 @G..............
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
backtrace:
[<0000000011a31c98>] kmalloc_trace+0x22/0x60
[<0000000027384c69>] devl_resource_register+0x144/0x4e0
[<00000000a16db248>] nsim_drv_probe+0x37a/0x1260
[<000000007d1f448c>] really_probe+0x20b/0xb10
[<00000000c416848a>] __driver_probe_device+0x1b3/0x4a0
[<00000000077e0351>] driver_probe_device+0x49/0x140
[<0000000054f2465a>] __device_attach_driver+0x18c/0x2a0
[<000000008538f359>] bus_for_each_drv+0x151/0x1d0
[<0000000038e09747>] __device_attach+0x1c9/0x4e0
[<00000000dd86e533>] bus_probe_device+0x1d5/0x280
[<00000000839bea35>] device_add+0xae0/0x1cb0
[<000000009c2abf46>] new_device_store+0x3b6/0x5f0
[<00000000fb823d7f>] bus_attr_store+0x72/0xa0
[<000000007acc4295>] sysfs_kf_write+0x106/0x160
[<000000005f50cb4d>] kernfs_fop_write_iter+0x3a8/0x5a0
[<0000000075eb41bf>] vfs_write+0x8f0/0xc80

Fixes: 37923ed6b8ce ("netdevsim: Add simple FIB resource controller via devlink")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 5603072e 31-Aug-2022 Jinpeng Cui <cui.jinpeng2@zte.com.cn>

netdevsim: remove redundant variable ret

Return value directly from nsim_dev_reload_create()
instead of getting value from redundant variable ret.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Jinpeng Cui <cui.jinpeng2@zte.com.cn>
Link: https://lore.kernel.org/r/20220831154329.305372-1-cui.jinpeng2@zte.com.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# f94b6063 24-Aug-2022 Jiri Pirko <jiri@nvidia.com>

net: devlink: limit flash component name to match version returned by info_get()

Limit the acceptance of component name passed to cmd_flash_update() to
match one of the versions returned by info_get(), marked by version type.
This makes things clearer and enforces 1:1 mapping between exposed
version and accepted flash component.

Check VERSION_TYPE_COMPONENT version type during cmd_flash_update()
execution by calling info_get() with different "req" context.
That causes info_get() to lookup the component name instead of
filling-up the netlink message.

Remove "UPDATE_COMPONENT" flag which becomes used.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 0c198975 24-Aug-2022 Jiri Pirko <jiri@nvidia.com>

netdevsim: add version fw.mgmt info info_get() and mark as a component

Fix the only component user which is netdevsim. It uses component named
"fw.mgmt" in selftests. So add this version to info_get() output with
version type component.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 644a66c6 29-Jul-2022 Jiri Pirko <jiri@nvidia.com>

net: devlink: convert reload command to take implicit devlink->lock

Convert reload command to behave the same way as the rest of the
commands and let if be called with devlink->lock held. Remove the
temporary devl_lock taking from drivers. As the DEVLINK_NL_FLAG_NO_LOCK
flag is no longer used, remove it alongside.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 012ec02a 16-Jul-2022 Jiri Pirko <jiri@nvidia.com>

netdevsim: convert driver to use unlocked devlink API during init/fini

Prepare for devlink reload being called with devlink->lock held and
convert the netdevsim driver to use unlocked devlink API during init and
fini flows. Take devl_lock() in reload_down() and reload_up() ops in the
meantime before reload cmd is converted to take the lock itself.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 14e426bf 18-Mar-2022 Jakub Kicinski <kuba@kernel.org>

devlink: hold the instance lock during eswitch_mode callbacks

Make the devlink core hold the instance lock during eswitch_mode
callbacks. Cheat in case of mlx5 (see the cover letter).

Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# aff3a925 18-Mar-2022 Jakub Kicinski <kuba@kernel.org>

netdevsim: replace vfs_lock with devlink instance lock

Similarly to the previous commit, use the devlink instance
lock and let it replace the vfs_lock.

nsim_esw_legacy_enable() was locked by both port lock and
vfs lock so one set of lock/unlocks goes away.

netdevsim's .eswitch_mode_set callback is now ready for
the callback to take the instance lock.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 76eea6c2 18-Mar-2022 Jakub Kicinski <kuba@kernel.org>

netdevsim: replace port_list_lock with devlink instance lock

Take advantage of the devlink instance lock for protecting
the port list. This will simplify locking even more once
all devlink callbacks hold the instance lock.

We need to add locking in nsim_dev_port_add_all() which used
to assume higher layer protection when accessing the list.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1a6d7ae7 14-Mar-2022 Petr Machata <petrm@nvidia.com>

netdevsim: Introduce support for L3 offload xstats

Add support for testing of HW stats support that was added recently, namely
the L3 stats support. L3 stats are provided for devices for which the L3
stats have been turned on, and that were enabled for netdevsim through a
debugfs toggle:

# echo $ifindex > /sys/kernel/debug/netdevsim/$DEV/hwstats/l3/enable_ifindex

For fully enabled netdevices, netdevsim counts 10pps of ingress traffic and
20pps of egress traffic. Similarly, L3 stats can be disabled for a given
device, and netdevsim ceases pretending there is any HW traffic going on:

# echo $ifindex > /sys/kernel/debug/netdevsim/$DEV/hwstats/l3/disable_ifindex

Besides this, there is a third toggle to mark a device for future failure:

# echo $ifindex > /sys/kernel/debug/netdevsim/$DEV/hwstats/l3/fail_next_enable

A future request to enable L3 stats on such netdevice will be bounced by
netdevsim:

# ip -j l sh dev d | jq '.[].ifindex'
66
# echo 66 > /sys/kernel/debug/netdevsim/netdevsim10/hwstats/l3/enable_ifindex
# echo 66 > /sys/kernel/debug/netdevsim/netdevsim10/hwstats/l3/fail_next_enable
# ip stats set dev d l3_stats on
Error: netdevsim: Stats enablement set to fail.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>


# 4c897cfc 29-Nov-2021 Leon Romanovsky <leon@kernel.org>

devlink: Simplify devlink resources unregister call

The devlink_resources_unregister() used second parameter as an
entry point for the recursive removal of devlink resources. None
of the callers outside of devlink core needed to use this field,
so let's remove it.

As part of this removal, the "struct devlink_resource" was moved
from .h to .c file as it is not possible to use in any place in
the code except devlink.c.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 047304d0 01-Nov-2021 Jakub Kicinski <kuba@kernel.org>

netdevsim: fix uninit value in nsim_drv_configure_vfs()

Build bot points out that I missed initializing ret
after refactoring.

Reported-by: kernel test robot <lkp@intel.com>
Fixes: 1c401078bcf3 ("netdevsim: move details of vf config to dev")
Link: https://lore.kernel.org/r/20211101221845.3188490-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# a66f64b8 30-Oct-2021 Jakub Kicinski <kuba@kernel.org>

netdevsim: rename 'driver' entry points

Rename functions serving as driver entry points
from nsim_dev_... to nsim_drv_... this makes the
API boundary between bus and dev clearer.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a3353ec3 30-Oct-2021 Jakub Kicinski <kuba@kernel.org>

netdevsim: move max vf config to dev

max_vfs is a strange little beast because the file
hangs off of nsim's debugfs, but it configures a field
in the bus device. Move it to dev.c, let's look at it
as if the device driver was imposing VF limit based
on FW info (like pci_sriov_set_totalvfs()).

Again, when moving refactor the function not to hold
the vfs lock pointlessly while parsing the input.
Wrap the access from the read side in READ_ONCE()
to appease concurrency checkers. Do not check if
return value from snprintf() is negative...

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1c401078 30-Oct-2021 Jakub Kicinski <kuba@kernel.org>

netdevsim: move details of vf config to dev

Since "eswitch" configuration was added bus.c contains
a lot of device details which really belong to dev.c.

Restructure the code while moving it.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5e388f3d 30-Oct-2021 Jakub Kicinski <kuba@kernel.org>

netdevsim: move vfconfig to nsim_dev

When netdevsim got split into the faux bus vfconfig ended
up in the bus device (think pci_dev) which is strange because
it contains very networky not to say netdevy information.
Move it to nsim_dev, which is the driver "priv" structure
for the device.

To make sure we don't race with probe/remove take
the device lock (much like PCI).

While at it remove the NULL-checking of vfconfigs.
It appears to be pointless.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ba064e4c 28-Oct-2021 Jakub Kicinski <kuba@kernel.org>

netdevsim: remove max_vfs dentry

Commit d395381909a3 ("netdevsim: Add max_vfs to bus_dev")
added this file and saved the dentry for no apparent reason.

Link: https://lore.kernel.org/r/20211028211753.22612-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 82465bec 12-Oct-2021 Leon Romanovsky <leon@kernel.org>

devlink: Delete reload enable/disable interface

Commit a0c76345e3d3 ("devlink: disallow reload operation during device
cleanup") added devlink_reload_{enable,disable}() APIs to prevent reload
operation from racing with device probe/dismantle.

After recent changes to move devlink_register() to the end of device
probe and devlink_unregister() to the beginning of device dismantle,
these races can no longer happen. Reload operations will be denied if
the devlink instance is unregistered and devlink_unregister() will block
until all in-flight operations are done.

Therefore, remove these devlink_reload_{enable,disable}() APIs.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# bd032e35 12-Oct-2021 Leon Romanovsky <leon@kernel.org>

devlink: Allow control devlink ops behavior through feature mask

Introduce new devlink call to set feature mask to control devlink
behavior during device initialization phase after devlink_alloc()
is already called.

This allows us to set reload ops based on device property which
is not known at the beginning of driver initialization.

For the sake of simplicity, this API lacks any type of locking and
needs to be called before devlink_register() to make sure that no
parallel access to the ops is possible at this stage.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 71c1b525 25-Sep-2021 Leon Romanovsky <leon@kernel.org>

netdevsim: Move devlink registration to be last devlink command

This change prevents from users to access device before devlink is
fully configured.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# db4278c5 22-Sep-2021 Leon Romanovsky <leon@kernel.org>

devlink: Make devlink_register to be void

devlink_register() can't fail and always returns success, but all drivers
are obligated to check returned status anyway. This adds a lot of boilerplate
code to handle impossible flow.

Make devlink_register() void and simplify the drivers that use that
API call.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Vladimir Oltean <olteanv@gmail.com> # dsa
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 919d13a7 08-Aug-2021 Leon Romanovsky <leon@kernel.org>

devlink: Set device as early as possible

All kernel devlink implementations call to devlink_alloc() during
initialization routine for specific device which is used later as
a parent device for devlink_register().

Such late device assignment causes to the situation which requires us to
call to device_register() before setting other parameters, but that call
opens devlink to the world and makes accessible for the netlink users.

Any attempt to move devlink_register() to be the last call generates the
following error due to access to the devlink->dev pointer.

[ 8.758862] devlink_nl_param_fill+0x2e8/0xe50
[ 8.760305] devlink_param_notify+0x6d/0x180
[ 8.760435] __devlink_params_register+0x2f1/0x670
[ 8.760558] devlink_params_register+0x1e/0x20

The simple change of API to set devlink device in the devlink_alloc()
instead of devlink_register() fixes all this above and ensures that
prior to call to devlink_register() everything already set.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5c0418ed 05-Aug-2021 Leon Romanovsky <leon@kernel.org>

netdevsim: Protect both reload_down and reload_up paths

Don't progress with adding and deleting ports as long as devlink
reload is running.

Fixes: 23809a726c0d ("netdevsim: Forbid devlink reload when adding or deleting ports")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 23809a72 05-Aug-2021 Leon Romanovsky <leon@kernel.org>

netdevsim: Forbid devlink reload when adding or deleting ports

In order to remove complexity in devlink core related to
devlink_reload_enable/disable, let's rewrite new_port/del_port
logic to rely on internal to netdevsim lcok.

We should protect only reload_down flow because it destroys nsim_dev,
which is needed for nsim_dev_port_add/nsim_dev_port_del to hold
port_list_lock.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 26713455 29-Jul-2021 Leon Romanovsky <leon@kernel.org>

devlink: Allocate devlink directly in requested net namespace

There is no need in extra call indirection and check from impossible
flow where someone tries to set namespace without prior call
to devlink_alloc().

Instead of this extra logic and additional EXPORT_SYMBOL, use specialized
devlink allocation function that receives net namespace as an argument.

Such specialized API allows clear view when devlink initialized in wrong
net namespace and/or kernel users don't try to change devlink namespace
under the hood.

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 275b51c2 17-Jun-2021 Oleksandr Mazur <oleksandr.mazur@plvision.eu>

drivers: net: netdevsim: fix devlink_trap selftests failing

devlink_trap tests for the netdevsim fail due to misspelled
debugfs file name. Change this name, as well as name of callback
function, to match the naming as in the devlink itself - 'trap_drop_counter'.

Test-results:
selftests: drivers/net/netdevsim: devlink_trap.sh
TEST: Initialization [ OK ]
TEST: Trap action [ OK ]
TEST: Trap metadata [ OK ]
TEST: Non-existing trap [ OK ]
TEST: Non-existing trap action [ OK ]
TEST: Trap statistics [ OK ]
TEST: Trap group action [ OK ]
TEST: Non-existing trap group [ OK ]
TEST: Trap group statistics [ OK ]
TEST: Trap policer [ OK ]
TEST: Trap policer binding [ OK ]
TEST: Port delete [ OK ]
TEST: Device delete [ OK ]

Fixes: a7b3527a43fe ("drivers: net: netdevsim: add devlink trap_drop_counter_get implementation")
Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a7b3527a 14-Jun-2021 Oleksandr Mazur <oleksandr.mazur@plvision.eu>

drivers: net: netdevsim: add devlink trap_drop_counter_get implementation

Whenever query statistics is issued for trap with DROP action,
devlink subsystem would also fill-in statistics 'dropped' field.
In case if device driver did't register callback for hard drop
statistics querying, 'dropped' field will be omitted and not filled.
Add trap_drop_counter_get callback implementation to the netdevsim.
Add new test cases for netdevsim, to test both the callback
functionality, as well as drop statistics alteration check.

Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4e744cb8 08-Jun-2021 Dan Carpenter <dan.carpenter@oracle.com>

netdevsim: delete unnecessary debugfs checking

In normal situations where the driver doesn't dereference
"nsim_node->ddir" or "nsim_node->rate_parent" itself then we are not
supposed to check the return from debugfs functions. In the case of
debugfs_create_dir() the check was wrong as well because it doesn't
return NULL, it returns error pointers.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f3d101b4 02-Jun-2021 Dmytro Linkin <dlinkin@nvidia.com>

netdevsim: Allow setting parent node of rate objects

Implement new devlink ops that allow setting rate node as a parent for
devlink port (leaf) or another devlink node through devlink API.
Expose parent names to netdevsim debugfs in read only mode.

Co-developed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 885226f5 02-Jun-2021 Dmytro Linkin <dlinkin@nvidia.com>

netdevsim: Implement support for devlink rate nodes

Implement new devlink ops that allow creation, deletion and setting of
shared/max tx rate of devlink rate nodes through devlink API.
Expose rate node and it's tx rates to netdevsim debugfs.

Co-developed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 605c4f8f 02-Jun-2021 Dmytro Linkin <dlinkin@nvidia.com>

netdevsim: Implement devlink rate leafs tx rate support

Implement new devlink ops that allow shared and max tx rate control for
devlink port rate objects (leafs) through devlink API.

Expose rate values of VF ports to netdevsim debugfs.

Co-developed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 885dfe12 02-Jun-2021 Dmytro Linkin <dlinkin@nvidia.com>

netdevsim: Register devlink rate leaf objects per VF

Register devlink rate leaf objects per VF.

Co-developed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 160dc373 02-Jun-2021 Dmytro Linkin <dlinkin@nvidia.com>

netdevsim: Implement legacy/switchdev mode for VFs

Implement callbacks to set/get eswitch mode value. Add helpers to check
current mode.

Instantiate VFs' net devices and devlink ports on switchdev enabling and
remove them on legacy enabling. Changing number of VFs while in
switchdev mode triggers VFs creation/deletion.

Also disable NDO API callback to set VF rate, since it's legacy API.
Switchdev API to set VF rate will be implemented in one of the next
patches.

Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 92ba1f29 02-Jun-2021 Dmytro Linkin <dlinkin@nvidia.com>

netdevsim: Implement VFs

Allow creation of netdevsim ports for VFs along with allocations of
corresponding net devices and devlink ports.
Add enums and helpers to distinguish PFs' ports from VFs' ports.

Ports creation/deletion debugfs API intended to be used with physical
ports only.
VFs instantiation will be done in one of the next patches.

Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 814b9ce6 02-Jun-2021 Dmytro Linkin <dlinkin@nvidia.com>

netdevsim: Implement port types and indexing

Define type of ports, which netdevsim driver currently operates with as
PF. Define new port type - VF, which will be implemented in following
patches. Add helper functions to distinguish them. Add helper function
to get VF index from port index.

Add port indexing logic where PFs' indexes starts from 0, VFs' - from
NSIM_DEV_VF_PORT_INDEX_BASE.
All ports uses same index pool, which means that PF port may be created
with index from VFs' indexes range.
Maximum number of VFs, which the driver can allocate, is limited by
UINT_MAX - BASE.

Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 32ac15d8 02-Jun-2021 Dmytro Linkin <dlinkin@nvidia.com>

netdevsim: Disable VFs on nsim_dev_reload_destroy() call

Move VFs disabling from device release() to nsim_dev_reload_destroy() to
make VFs disabling and ports removal simultaneous.
This is a requirement for VFs ports implemented in next patches.

Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d3953819 02-Jun-2021 Dmytro Linkin <dlinkin@nvidia.com>

netdevsim: Add max_vfs to bus_dev

Currently there is no limit to the number of VFs netdevsim can enable.
In a real systems this value exist and used by the driver.
Fore example, some features might need to consider this value when
allocating memory.

Expose max_vfs variable to debugfs as configurable resource. If are VFs
configured (num_vfs != 0) then changing of max_vfs not allowed.

Co-developed-by: Yuval Avnery <yuvalav@nvidia.com>
Signed-off-by: Yuval Avnery <yuvalav@nvidia.com>
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a8700c3d 14-Mar-2021 Ido Schimmel <idosch@nvidia.com>

netdevsim: Add dummy psample implementation

Allow netdevsim to report "sampled" packets to the psample module by
periodically generating packets from a work queue. The behavior can be
enabled / disabled (default) and the various meta data attributes can be
controlled via debugfs knobs.

This implementation enables both testing of the psample module with all
the optional attributes as well as development of user space
applications on top of psample such as hsflowd and a Wireshark dissector
for psample generic netlink packets.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f57ab5b7 07-Feb-2021 Ido Schimmel <idosch@nvidia.com>

netdevsim: dev: Initialize FIB module after debugfs

Initialize the dummy FIB offload module after debugfs, so that the FIB
module could create its own directory there.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 52cc5f3a 18-Nov-2020 Jacob Keller <jacob.e.keller@intel.com>

devlink: move flash end and begin to core devlink

When performing a flash update via devlink, device drivers may inform
user space of status updates via
devlink_flash_update_(begin|end|timeout|status)_notify functions.

It is expected that drivers do not send any status notifications unless
they send a begin and end message. If a driver sends a status
notification without sending the appropriate end notification upon
finishing (regardless of success or failure), the current implementation
of the devlink userspace program can get stuck endlessly waiting for the
end notification that will never come.

The current ice driver implementation may send such a status message
without the appropriate end notification in rare cases.

Fixing the ice driver is relatively simple: we just need to send the
begin_notify at the start of the function and always send an end_notify
no matter how the function exits.

Rather than assuming driver authors will always get this right in the
future, lets just fix the API so that it is not possible to get wrong.
Make devlink_flash_update_begin_notify and
devlink_flash_update_end_notify static, and call them in devlink.c core
code. Always send the begin_notify just before calling the driver's
flash_update routine. Always send the end_notify just after the routine
returns regardless of success or failure.

Doing this makes the status notification easier to use from the driver,
as it no longer needs to worry about catching failures and cleaning up
by calling devlink_flash_update_end_notify. It is now no longer possible
to do the wrong thing in this regard. We also save a couple of lines of
code in each driver.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# a5bbcbf2 15-Nov-2020 Taehee Yoo <ap420073@gmail.com>

netdevsim: set .owner to THIS_MODULE

If THIS_MODULE is not set, the module would be removed while debugfs is
being used.
It eventually makes kernel panic.

Fixes: 82c93a87bf8b ("netdevsim: implement couple of testing devlink health reporters")
Fixes: 424be63ad831 ("netdevsim: add UDP tunnel port offload support")
Fixes: 4418f862d675 ("netdevsim: implement support for devlink region and snapshots")
Fixes: d3cbb907ae57 ("netdevsim: add ACL trap reporting cookie as a metadata")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Link: https://lore.kernel.org/r/20201115103041.30701-1-ap420073@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 35266255 04-Nov-2020 Ido Schimmel <idosch@nvidia.com>

netdevsim: Add devlink resource for nexthops

The Spectrum ASIC has a dedicated table where nexthops (i.e., adjacency
entries) are populated. The size of this table can be controlled via
devlink-resource.

Add such a resource to netdevsim so that its occupancy will reflect the
number of nexthop objects currently programmed to the device.

By limiting the size of the resource, error paths could be exercised and
tested.

Example output:

# devlink resource show netdevsim/netdevsim10
netdevsim/netdevsim10:
name IPv4 size unlimited unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables none
resources:
name fib size unlimited occ 4 unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables none
name fib-rules size unlimited occ 3 unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables none
name IPv6 size unlimited unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables none
resources:
name fib size unlimited occ 1 unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables none
name fib-rules size unlimited occ 2 unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables none
name nexthops size unlimited occ 0 unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables none

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# dc64cc7c 07-Oct-2020 Moshe Shemesh <moshe@mellanox.com>

devlink: Add devlink reload limit option

Add reload limit to demand restrictions on reload actions.
Reload limits supported:
no_reset: No reset allowed, no down time allowed, no link flap and no
configuration is lost.

By default reload limit is unspecified and so no constraints on reload
actions are required.

Some combinations of action and limit are invalid. For example, driver
can not reinitialize its entities without any downtime.

The no_reset reload limit will have usecase in this patchset to
implement restricted fw_activate on mlx5.

Have the uapi parameter of reload limit ready for future support of
multiselection.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# ccdf0721 07-Oct-2020 Moshe Shemesh <moshe@mellanox.com>

devlink: Add reload action option to devlink reload command

Add devlink reload action to allow the user to request a specific reload
action. The action parameter is optional, if not specified then devlink
driver re-init action is used (backward compatible).
Note that when required to do firmware activation some drivers may need
to reload the driver. On the other hand some drivers may need to reset
the firmware to reinitialize the driver entities. Therefore, the devlink
reload command returns the actions which were actually performed.
Reload actions supported are:
driver_reinit: driver entities re-initialization, applying devlink-param
and devlink-resource values.
fw_activate: firmware activate.

command examples:
$devlink dev reload pci/0000:82:00.0 action driver_reinit
reload_actions_performed:
driver_reinit

$devlink dev reload pci/0000:82:00.0 action fw_activate
reload_actions_performed:
driver_reinit fw_activate

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# cbb58368 25-Sep-2020 Jacob Keller <jacob.e.keller@intel.com>

netdevsim: add support for flash_update overwrite mask

The devlink interface recently gained support for a new "overwrite mask"
parameter that allows specifying how various sub-sections of a flash
component are modified when updating.

Add support for this to netdevsim, to enable easily testing the
interface. Make the allowed overwrite mask values controllable via
a debugfs parameter. This enables testing a flow where the driver
rejects an unsupportable overwrite mask.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# bc75c054 25-Sep-2020 Jacob Keller <jacob.e.keller@intel.com>

devlink: convert flash_update to use params structure

The devlink core recently gained support for checking whether the driver
supports a flash_update parameter, via `supported_flash_update_params`.
However, parameters are specified as function arguments. Adding a new
parameter still requires modifying the signature of the .flash_update
callback in all drivers.

Convert the .flash_update function to take a new `struct
devlink_flash_update_params` instead. By using this structure, and the
`supported_flash_update_params` bit field, a new parameter to
flash_update can be added without requiring modification to existing
drivers.

As before, all parameters except file_name will require driver opt-in.
Because file_name is a necessary field to for the flash_update to make
sense, no "SUPPORTED" bitflag is provided and it is always considered
valid. All future additional parameters will require a new bit in the
supported_flash_update_params bitfield.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael Chan <michael.chan@broadcom.com>
Cc: Bin Luo <luobin9@huawei.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Ido Schimmel <idosch@mellanox.com>
Cc: Danielle Ratson <danieller@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 22ec3d23 25-Sep-2020 Jacob Keller <jacob.e.keller@intel.com>

devlink: check flash_update parameter support in net core

When implementing .flash_update, drivers which do not support
per-component update are manually checking the component parameter to
verify that it is NULL. Without this check, the driver might accept an
update request with a component specified even though it will not honor
such a request.

Instead of having each driver check this, move the logic into
net/core/devlink.c, and use a new `supported_flash_update_params` field
in the devlink_ops. Drivers which will support per-component update must
now specify this by setting DEVLINK_SUPPORT_FLASH_UPDATE_COMPONENT in
the supported_flash_update_params in their devlink_ops.

This helps ensure that drivers do not forget to check for a NULL
component if they do not support per-component update. This also enables
a slightly better error message by enabling the core stack to set the
netlink bad attribute message to indicate precisely the unsupported
attribute in the message.

Going forward, any new additional parameter to flash update will require
a bit in the supported_flash_update_params bitfield.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael Chan <michael.chan@broadcom.com>
Cc: Bin Luo <luobin9@huawei.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Ido Schimmel <idosch@mellanox.com>
Cc: Danielle Ratson <danieller@mellanox.com>
Cc: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d4602a9f 18-Sep-2020 Andrew Lunn <andrew@lunn.ch>

net: devlink: region: Pass the region ops to the snapshot function

Pass the region to be snapshotted to the function performing the
snapshot. This allows one function to operate on numerous regions.

v4:
Add missing kerneldoc for ICE

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b311b001 17-Sep-2020 Shannon Nelson <snelson@pensando.io>

netdevsim: devlink flash timeout message

Add a simple devlink flash timeout message to exercise
the message mechanism.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c88e11e0 03-Aug-2020 Ido Schimmel <idosch@mellanox.com>

devlink: Pass extack when setting trap's action and group's parameters

A later patch will refuse to set the action of certain traps in mlxsw
and also to change the policer binding of certain groups. Pass extack so
that failure could be communicated clearly to user space.

Reviewed-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 424be63a 09-Jul-2020 Jakub Kicinski <kuba@kernel.org>

netdevsim: add UDP tunnel port offload support

Add UDP tunnel port handlers to our fake driver so we can test
the core infra.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 71ad8d55 09-Jul-2020 Danielle Ratson <danieller@mellanox.com>

devlink: Replace devlink_port_attrs_set parameters with a struct

Currently, devlink_port_attrs_set accepts a long list of parameters,
that most of them are devlink port's attributes.

Use the devlink_port_attrs struct to replace the relevant parameters.

Signed-off-by: Danielle Ratson <danieller@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 18979367 29-May-2020 Ido Schimmel <idosch@mellanox.com>

netdevsim: Register control traps

Register two control traps with devlink. The existing selftest at
tools/testing/selftests/drivers/net/netdevsim/devlink_trap.sh iterates
over all registered traps and checks that the action of non-drop traps
cannot be changed. Up until now only exception traps were tested, now
control traps will be tested as well.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 85176f19 29-May-2020 Ido Schimmel <idosch@mellanox.com>

netdevsim: Move layer 3 exceptions to exceptions trap group

The layer 3 exceptions are still subject to the same trap policer, so
nothing changes, but user space can choose to assign a different one.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# be43224f 21-May-2020 Ido Schimmel <idosch@mellanox.com>

netdevsim: Ensure policer drop counter always increases

In case the policer drop counter is retrieved when the jiffies value is
a multiple of 64, the counter will not be incremented.

This randomly breaks a selftest [1] the reads the counter twice and
checks that it was incremented:

```
TEST: Trap policer [FAIL]
Policer drop counter was not incremented
```

Fix by always incrementing the counter by 1.

[1] tools/testing/selftests/drivers/net/netdevsim/devlink_trap.sh

Fixes: ad188458d012 ("netdevsim: Add devlink-trap policer support")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3902baf9 30-Mar-2020 Gustavo A. R. Silva <gustavo@embeddedor.com>

netdevsim: dev: Fix memory leak in nsim_dev_take_snapshot_write

In case memory resources for dummy_data were allocated, release them
before return.

Addresses-Coverity-ID: 1491997 ("Resource leak")
Fixes: 7ef19d3b1d5e ("devlink: report error once U32_MAX snapshot ids have been used")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0dc8249a 30-Mar-2020 Ido Schimmel <idosch@mellanox.com>

netdevsim: Add support for setting of packet trap group parameters

Add a dummy callback to set trap group parameters. Return an error when
the 'fail_trap_group_set' debugfs file is set in order to exercise error
paths and verify that error is propagated to user space when should.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f9f54392 30-Mar-2020 Ido Schimmel <idosch@mellanox.com>

devlink: Add packet trap group parameters support

Packet trap groups are used to aggregate logically related packet traps.
Currently, these groups allow user space to batch operations such as
setting the trap action of all member traps.

In order to prevent the CPU from being overwhelmed by too many trapped
packets, it is desirable to bind a packet trap policer to these groups.
For example, to limit all the packets that encountered an exception
during routing to 10Kpps.

Allow device drivers to bind default packet trap policers to packet trap
groups when the latter are registered with devlink.

The next patch will enable user space to change this default binding.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ad188458 30-Mar-2020 Ido Schimmel <idosch@mellanox.com>

netdevsim: Add devlink-trap policer support

Register three dummy packet trap policers with devlink and implement
callbacks to change their parameters and read their counters.

This will be used later on in the series to test the devlink-trap
policer infrastructure.

v2:
* Remove check about burst size being a power of 2 and instead add a
debugfs knob to fail the operation
* Provide max/min rate/burst size when registering policers and remove
the validity checks from nsim_dev_devlink_trap_policer_set()

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3fe0fd53 26-Mar-2020 Jacob Keller <jacob.e.keller@intel.com>

netdevsim: support taking immediate snapshot via devlink

Implement the .snapshot region operation for the dummy data region. This
enables a region snapshot to be taken upon request via the new
DEVLINK_CMD_REGION_SNAPSHOT command.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 12102436 26-Mar-2020 Jacob Keller <jacob.e.keller@intel.com>

devlink: track snapshot id usage count using an xarray

Each snapshot created for a devlink region must have an id. These ids
are supposed to be unique per "event" that caused the snapshot to be
created. Drivers call devlink_region_snapshot_id_get to obtain a new id
to use for a new event trigger. The id values are tracked per devlink,
so that the same id number can be used if a triggering event creates
multiple snapshots on different regions.

There is no mechanism for snapshot ids to ever be reused. Introduce an
xarray to store the count of how many snapshots are using a given id,
replacing the snapshot_id field previously used for picking the next id.

The devlink_region_snapshot_id_get() function will use xa_alloc to
insert an initial value of 1 value at an available slot between 0 and
U32_MAX.

The new __devlink_snapshot_id_increment() and
__devlink_snapshot_id_decrement() functions will be used to track how
many snapshots currently use an id.

Drivers must now call devlink_snapshot_id_put() in order to release
their reference of the snapshot id after adding region snapshots.

By tracking the total number of snapshots using a given id, it is
possible for the decrement() function to erase the id from the xarray
when it is not in use.

With this method, a snapshot id can become reused again once all
snapshots that referred to it have been deleted via
DEVLINK_CMD_REGION_DEL, and the driver has finished adding snapshots.

This work also paves the way to introduce a mechanism for userspace to
request a snapshot.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7ef19d3b 26-Mar-2020 Jacob Keller <jacob.e.keller@intel.com>

devlink: report error once U32_MAX snapshot ids have been used

The devlink_snapshot_id_get() function returns a snapshot id. The
snapshot id is a u32, so there is no way to indicate an error code.

A future change is going to possibly add additional cases where this
function could fail. Refactor the function to return the snapshot id in
an argument, so that it can return zero or an error value.

This ensures that snapshot ids cannot be confused with error values, and
aids in the future refactor of snapshot id allocation management.

Because there is no current way to release previously used snapshot ids,
add a simple check ensuring that an error is reported in case the
snapshot_id would over flow.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a0a09f6b 26-Mar-2020 Jacob Keller <jacob.e.keller@intel.com>

devlink: convert snapshot destructor callback to region op

It does not makes sense that two snapshots for a given region would use
different destructors. Simplify snapshot creation by adding
a .destructor op for regions.

This operation will replace the data_destructor for the snapshot
creation, and makes snapshot creation easier.

Noticed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e8937681 26-Mar-2020 Jacob Keller <jacob.e.keller@intel.com>

devlink: prepare to support region operations

Modify the devlink region code in preparation for adding new operations
on regions.

Create a devlink_region_ops structure, and move the name pointer from
within the devlink_region structure into the ops structure (similar to
the devlink_health_reporter_ops).

This prepares the regions to enable support of additional operations in
the future such as requesting snapshots, or accessing the region
directly without a snapshot.

In order to re-use the constant strings in the mlx4 driver their
declaration must be changed to 'const char * const' to ensure the
compiler realizes that both the data and the pointer cannot change.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 107f1678 22-Mar-2020 Ido Schimmel <idosch@mellanox.com>

devlink: Only pass packet trap group identifier in trap structure

Packet trap groups are now explicitly registered by drivers and not
implicitly registered when the packet traps are registered. Therefore,
there is no need to encode entire group structure the trap is associated
with inside the trap structure.

Instead, only pass the group identifier. Refer to it as initial group
identifier, as future patches will allow user space to move traps
between groups.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b29545d8 22-Mar-2020 Ido Schimmel <idosch@mellanox.com>

netdevsim: Explicitly register packet trap groups

Use the previously added API to explicitly register / unregister
supported packet trap groups. This is in preparation for future patches
that will enable drivers to pass additional group attributes, such as
associated policer identifier.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d3cbb907 25-Feb-2020 Jiri Pirko <jiri@mellanox.com>

netdevsim: add ACL trap reporting cookie as a metadata

Add new trap ACL which reports flow action cookie in a metadata. Allow
used to setup the cookie using debugfs file.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5a2e106c 25-Feb-2020 Jiri Pirko <jiri@mellanox.com>

devlink: extend devlink_trap_report() to accept cookie and pass

Add cookie argument to devlink_trap_report() allowing driver to pass in
the user cookie. Pass on the cookie down to drop monitor code.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 34611e69 04-Feb-2020 kbuild test robot <lkp@intel.com>

netdevsim: fix ptr_ret.cocci warnings

drivers/net/netdevsim/dev.c:937:1-3: WARNING: PTR_ERR_OR_ZERO can be used

Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR

Generated by: scripts/coccinelle/api/ptr_ret.cocci

Fixes: 6556ff32f12d ("netdevsim: use IS_ERR instead of IS_ERR_OR_NULL for debugfs")
CC: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: kbuild test robot <lkp@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6556ff32 01-Feb-2020 Taehee Yoo <ap420073@gmail.com>

netdevsim: use IS_ERR instead of IS_ERR_OR_NULL for debugfs

Debugfs APIs return valid pointer or error pointer. it doesn't return NULL.
So, using IS_ERR is enough, not using IS_ERR_OR_NULL.

Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 6fb8852b 01-Feb-2020 Taehee Yoo <ap420073@gmail.com>

netdevsim: fix stack-out-of-bounds in nsim_dev_debugfs_init()

When netdevsim dev is being created, a debugfs directory is created.
The variable "dev_ddir_name" is 16bytes device name pointer and device
name is "netdevsim<dev id>".
The maximum dev id length is 10.
So, 16bytes for device name isn't enough.

Test commands:
modprobe netdevsim
echo "1000000000 0" > /sys/bus/netdevsim/new_device

Splat looks like:
[ 249.622710][ T900] BUG: KASAN: stack-out-of-bounds in number+0x824/0x880
[ 249.623658][ T900] Write of size 1 at addr ffff88804c527988 by task bash/900
[ 249.624521][ T900]
[ 249.624830][ T900] CPU: 1 PID: 900 Comm: bash Not tainted 5.5.0+ #322
[ 249.625691][ T900] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 249.626712][ T900] Call Trace:
[ 249.627103][ T900] dump_stack+0x96/0xdb
[ 249.627639][ T900] ? number+0x824/0x880
[ 249.628173][ T900] print_address_description.constprop.5+0x1be/0x360
[ 249.629022][ T900] ? number+0x824/0x880
[ 249.629569][ T900] ? number+0x824/0x880
[ 249.630105][ T900] __kasan_report+0x12a/0x170
[ 249.630717][ T900] ? number+0x824/0x880
[ 249.631201][ T900] kasan_report+0xe/0x20
[ 249.631723][ T900] number+0x824/0x880
[ 249.632235][ T900] ? put_dec+0xa0/0xa0
[ 249.632716][ T900] ? rcu_read_lock_sched_held+0x90/0xc0
[ 249.633392][ T900] vsnprintf+0x63c/0x10b0
[ 249.633983][ T900] ? pointer+0x5b0/0x5b0
[ 249.634543][ T900] ? mark_lock+0x11d/0xc40
[ 249.635200][ T900] sprintf+0x9b/0xd0
[ 249.635750][ T900] ? scnprintf+0xe0/0xe0
[ 249.636370][ T900] nsim_dev_probe+0x63c/0xbf0 [netdevsim]
[ ... ]

Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Fixes: ab1d0cc004d7 ("netdevsim: change debugfs tree topology")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 8526ad96 01-Feb-2020 Taehee Yoo <ap420073@gmail.com>

netdevsim: fix panic in nsim_dev_take_snapshot_write()

nsim_dev_take_snapshot_write() uses nsim_dev and nsim_dev->dummy_region.
So, during this function, these data shouldn't be removed.
But there is no protecting stuff in this function.

There are two similar cases.
1. reload case
reload could be called during nsim_dev_take_snapshot_write().
When reload is being executed, nsim_dev_reload_down() is called and it
calls nsim_dev_reload_destroy(). nsim_dev_reload_destroy() calls
devlink_region_destroy() to destroy nsim_dev->dummy_region.
So, during nsim_dev_take_snapshot_write(), nsim_dev->dummy_region()
would be removed.
At this point, snapshot_write() would access freed pointer.
In order to fix this case, take_snapshot file will be removed before
devlink_region_destroy().
The take_snapshot file will be re-created by ->reload_up().

2. del_device_store case
del_device_store() also could call nsim_dev_reload_destroy()
during nsim_dev_take_snapshot_write(). If so, panic would occur.
This problem is actually the same problem with the first case.
So, this problem will be fixed by the first case's solution.

Test commands:
modprobe netdevsim
while :
do
echo 1 > /sys/bus/netdevsim/new_device &
echo 1 > /sys/bus/netdevsim/del_device &
devlink dev reload netdevsim/netdevsim1 &
echo 1 > /sys/kernel/debug/netdevsim/netdevsim1/take_snapshot &
done

Splat looks like:
[ 45.564513][ T975] general protection fault, probably for non-canonical address 0xdffffc000000003a: 0000 [#1] SMP DEI
[ 45.566131][ T975] KASAN: null-ptr-deref in range [0x00000000000001d0-0x00000000000001d7]
[ 45.566135][ T975] CPU: 1 PID: 975 Comm: bash Not tainted 5.5.0+ #322
[ 45.569020][ T975] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 45.569026][ T975] RIP: 0010:__mutex_lock+0x10a/0x14b0
[ 45.570518][ T975] Code: 08 84 d2 0f 85 7f 12 00 00 44 8b 0d 10 23 65 02 45 85 c9 75 29 49 8d 7f 68 48 b8 00 00 00 0f
[ 45.570522][ T975] RSP: 0018:ffff888046ccfbf0 EFLAGS: 00010206
[ 45.572305][ T975] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 45.572308][ T975] RDX: 000000000000003a RSI: ffffffffac926440 RDI: 00000000000001d0
[ 45.576843][ T975] RBP: ffff888046ccfd70 R08: ffffffffab610645 R09: 0000000000000000
[ 45.576847][ T975] R10: ffff888046ccfd90 R11: ffffed100d6360ad R12: 0000000000000000
[ 45.578471][ T975] R13: dffffc0000000000 R14: ffffffffae1976c0 R15: 0000000000000168
[ 45.578475][ T975] FS: 00007f614d6e7740(0000) GS:ffff88806c400000(0000) knlGS:0000000000000000
[ 45.581492][ T975] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 45.582942][ T975] CR2: 00005618677d1cf0 CR3: 000000005fb9c002 CR4: 00000000000606e0
[ 45.584543][ T975] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 45.586633][ T975] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 45.589889][ T975] Call Trace:
[ 45.591445][ T975] ? devlink_region_snapshot_create+0x55/0x4a0
[ 45.601250][ T975] ? mutex_lock_io_nested+0x1380/0x1380
[ 45.602817][ T975] ? mutex_lock_io_nested+0x1380/0x1380
[ 45.603875][ T975] ? mark_held_locks+0xa5/0xe0
[ 45.604769][ T975] ? _raw_spin_unlock_irqrestore+0x2d/0x50
[ 45.606147][ T975] ? __mutex_unlock_slowpath+0xd0/0x670
[ 45.607723][ T975] ? crng_backtrack_protect+0x80/0x80
[ 45.613530][ T975] ? wait_for_completion+0x390/0x390
[ 45.615152][ T975] ? devlink_region_snapshot_create+0x55/0x4a0
[ 45.616834][ T975] devlink_region_snapshot_create+0x55/0x4a0
[ ... ]

Fixes: 4418f862d675 ("netdevsim: implement support for devlink region and snapshots")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# b0efcae5 09-Jan-2020 Jacob Keller <jacob.e.keller@intel.com>

devlink: correct misspelling of snapshot

The function to obtain a unique snapshot id was mistakenly typo'd as
devlink_region_shapshot_id_get. Fix this typo by renaming the function
and all of its users.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 04e4272c 09-Jan-2020 Jacob Keller <jacob.e.keller@intel.com>

devlink: rename and expand devlink-trap-netdevsim.rst

Rename the trap-specific netdevimsim.rst file, and expand it to include
documentation of all the devlink features currently implemented by the
netdevsim driver code.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f4bdd710 09-Jan-2020 Jacob Keller <jacob.e.keller@intel.com>

devlink: move devlink documentation to subfolder

Combine the documentation for devlink into a subfolder, and provide an
index.rst file that can be used to generally describe devlink.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5a508a25 09-Nov-2019 Jiri Pirko <jiri@mellanox.com>

devlink: disallow reload operation during device cleanup

There is a race between driver code that does setup/cleanup of device
and devlink reload operation that in some drivers works with the same
code. Use after free could we easily obtained by running:

while true; do
echo "0000:00:10.0" >/sys/bus/pci/drivers/mlxsw_spectrum2/bind
devlink dev reload pci/0000:00:10.0 &
echo "0000:00:10.0" >/sys/bus/pci/drivers/mlxsw_spectrum2/unbind
done

Fix this by enabling reload only after setup of device is complete and
disabling it at the beginning of the cleanup process.

Reported-by: Ido Schimmel <idosch@mellanox.com>
Fixes: 2d8dc5bbf4e7 ("devlink: Add support for reload")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a0c76345 08-Nov-2019 Jiri Pirko <jiri@mellanox.com>

devlink: disallow reload operation during device cleanup

There is a race between driver code that does setup/cleanup of device
and devlink reload operation that in some drivers works with the same
code. Use after free could we easily obtained by running:

while true; do
echo 10 > /sys/bus/netdevsim/new_device
devlink dev reload netdevsim/netdevsim10 &
echo 10 > /sys/bus/netdevsim/del_device
done

Fix this by enabling reload only after setup of device is complete and
disabling it at the beginning of the cleanup process.

Reported-by: Ido Schimmel <idosch@mellanox.com>
Fixes: 2d8dc5bbf4e7 ("devlink: Add support for reload")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# bfcccfe7 05-Nov-2019 Jakub Kicinski <kuba@kernel.org>

netdevsim: drop code duplicated by a merge

Looks like the port adding loop makes a re-appearance on net-next
after net was merged back into it (even though it doesn't feature
in the merge diff).

The ports are already added in nsim_dev_create() so when we try
to add them again get EEXIST, and see:

netdevsim: probe of netdevsim0 failed with error -17

in the logs. When we remove the loop again the nsim_dev_probe()
and nsim_dev_remove() become a wrapper of nsim_dev_create() and
nsim_dev_destroy(). Remove this layer of indirection.

Fixes: d31e95585ca6 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6d6f0383 31-Oct-2019 Ido Schimmel <idosch@mellanox.com>

netdevsim: Fix use-after-free during device dismantle

Commit da58f90f11f5 ("netdevsim: Add devlink-trap support") added
delayed work to netdevsim that periodically iterates over the registered
netdevsim ports and reports various packet traps via devlink.

While the delayed work takes the 'port_list_lock' mutex to protect
against concurrent addition / deletion of ports, during device creation
/ dismantle ports are added / deleted without this lock, which can
result in a use-after-free [1].

Fix this by making sure that the ports list is always modified under the
lock.

[1]
[ 59.205543] ==================================================================
[ 59.207748] BUG: KASAN: use-after-free in nsim_dev_trap_report_work+0xa67/0xad0
[ 59.210247] Read of size 8 at addr ffff8883cbdd3398 by task kworker/3:1/38
[ 59.212584]
[ 59.213148] CPU: 3 PID: 38 Comm: kworker/3:1 Not tainted 5.4.0-rc3-custom-16119-ge6abb5f0261e #2013
[ 59.215896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20180724_192412-buildhw-07.phx2.fedoraproject.org-1.fc29 04/01/2014
[ 59.218384] Workqueue: events nsim_dev_trap_report_work
[ 59.219428] Call Trace:
[ 59.219924] dump_stack+0xa9/0x10e
[ 59.220623] print_address_description.constprop.4+0x21/0x340
[ 59.221976] ? vprintk_func+0x66/0x240
[ 59.222752] __kasan_report.cold.8+0x78/0x91
[ 59.223602] ? nsim_dev_trap_report_work+0xa67/0xad0
[ 59.224603] kasan_report+0xe/0x20
[ 59.225296] nsim_dev_trap_report_work+0xa67/0xad0
[ 59.226435] ? rcu_read_lock_sched_held+0xaf/0xe0
[ 59.227512] ? trace_event_raw_event_rcu_quiescent_state_report+0x360/0x360
[ 59.228851] process_one_work+0x98f/0x1760
[ 59.229684] ? pwq_dec_nr_in_flight+0x330/0x330
[ 59.230656] worker_thread+0x91/0xc40
[ 59.231587] ? process_one_work+0x1760/0x1760
[ 59.232451] kthread+0x34a/0x410
[ 59.233104] ? __kthread_queue_delayed_work+0x240/0x240
[ 59.234141] ret_from_fork+0x3a/0x50
[ 59.234982]
[ 59.235371] Allocated by task 187:
[ 59.236189] save_stack+0x19/0x80
[ 59.236853] __kasan_kmalloc.constprop.5+0xc1/0xd0
[ 59.237822] kmem_cache_alloc_trace+0x14c/0x380
[ 59.238769] __nsim_dev_port_add+0xaf/0x5c0
[ 59.239627] nsim_dev_probe+0x4fc/0x1140
[ 59.240550] really_probe+0x264/0xc00
[ 59.241418] driver_probe_device+0x208/0x2e0
[ 59.242255] __device_attach_driver+0x215/0x2d0
[ 59.243150] bus_for_each_drv+0x154/0x1d0
[ 59.243944] __device_attach+0x1ba/0x2b0
[ 59.244923] bus_probe_device+0x1dd/0x290
[ 59.245805] device_add+0xbac/0x1550
[ 59.246528] new_device_store+0x1f4/0x400
[ 59.247306] bus_attr_store+0x7b/0xa0
[ 59.248047] sysfs_kf_write+0x10f/0x170
[ 59.248941] kernfs_fop_write+0x283/0x430
[ 59.249843] __vfs_write+0x81/0x100
[ 59.250546] vfs_write+0x1ce/0x510
[ 59.251190] ksys_write+0x104/0x200
[ 59.251873] do_syscall_64+0xa4/0x4e0
[ 59.252642] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 59.253837]
[ 59.254203] Freed by task 187:
[ 59.254811] save_stack+0x19/0x80
[ 59.255463] __kasan_slab_free+0x125/0x170
[ 59.256265] kfree+0x100/0x440
[ 59.256870] nsim_dev_remove+0x98/0x100
[ 59.257651] nsim_bus_remove+0x16/0x20
[ 59.258382] device_release_driver_internal+0x20b/0x4d0
[ 59.259588] bus_remove_device+0x2e9/0x5a0
[ 59.260551] device_del+0x410/0xad0
[ 59.263777] device_unregister+0x26/0xc0
[ 59.264616] nsim_bus_dev_del+0x16/0x60
[ 59.265381] del_device_store+0x2d6/0x3c0
[ 59.266295] bus_attr_store+0x7b/0xa0
[ 59.267192] sysfs_kf_write+0x10f/0x170
[ 59.267960] kernfs_fop_write+0x283/0x430
[ 59.268800] __vfs_write+0x81/0x100
[ 59.269551] vfs_write+0x1ce/0x510
[ 59.270252] ksys_write+0x104/0x200
[ 59.270910] do_syscall_64+0xa4/0x4e0
[ 59.271680] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 59.272812]
[ 59.273211] The buggy address belongs to the object at ffff8883cbdd3200
[ 59.273211] which belongs to the cache kmalloc-512 of size 512
[ 59.275838] The buggy address is located 408 bytes inside of
[ 59.275838] 512-byte region [ffff8883cbdd3200, ffff8883cbdd3400)
[ 59.278151] The buggy address belongs to the page:
[ 59.279215] page:ffffea000f2f7400 refcount:1 mapcount:0 mapping:ffff8883ecc0ce00 index:0x0 compound_mapcount: 0
[ 59.281449] flags: 0x200000000010200(slab|head)
[ 59.282356] raw: 0200000000010200 ffffea000f2f3a08 ffffea000f2fd608 ffff8883ecc0ce00
[ 59.283949] raw: 0000000000000000 0000000000150015 00000001ffffffff 0000000000000000
[ 59.285608] page dumped because: kasan: bad access detected
[ 59.286981]
[ 59.287337] Memory state around the buggy address:
[ 59.288310] ffff8883cbdd3280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 59.289763] ffff8883cbdd3300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 59.291452] >ffff8883cbdd3380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 59.292945] ^
[ 59.293815] ffff8883cbdd3400: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 59.295220] ffff8883cbdd3480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 59.296872] ==================================================================

Fixes: da58f90f11f5 ("netdevsim: Add devlink-trap support")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: syzbot+9ed8f68ab30761f3678e@syzkaller.appspotmail.com
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 82c93a87 10-Oct-2019 Jiri Pirko <jiri@mellanox.com>

netdevsim: implement couple of testing devlink health reporters

Implement "empty" and "dummy" reporters. The first one is really simple
and does nothing. The other one has debugfs files to trigger breakage
and it is able to do recovery. The ops also implement dummy fmsg
content.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f9867b51 08-Oct-2019 Colin Ian King <colin.king@canonical.com>

netdevsim: fix spelling mistake "forbidded" -> "forbid"

There is a spelling mistake in a NL_SET_ERR_MSG_MOD message. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>


# 8e23cc03 07-Oct-2019 Jiri Pirko <jiri@mellanox.com>

netdevsim: implement devlink dev_info op

Do simple dev_info devlink operation implementation which only fills up
the driver name.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 155ddfc5 06-Oct-2019 Jiri Pirko <jiri@mellanox.com>

netdevsim: add couple of debugfs bools to debug devlink reload

Add flag to disallow reload and another one that causes reload to
always fail.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7b60027b 05-Oct-2019 Jiri Pirko <jiri@mellanox.com>

netdevsim: create devlink and netdev instances in namespace

When user does create new netdevsim instance using sysfs bus file,
create the devlink instance and related netdev instance in the namespace
of the caller.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 070c63f2 03-Oct-2019 Jiri Pirko <jiri@mellanox.com>

net: devlink: allow to change namespaces during reload

All devlink instances are created in init_net and stay there for a
lifetime. Allow user to be able to move devlink instances into
namespaces during devlink reload operation. That ensures proper
re-instantiation of driver objects, including netdevices.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 75ba029f 03-Oct-2019 Jiri Pirko <jiri@mellanox.com>

netdevsim: implement proper devlink reload

During devlink reload, all driver objects should be reinstantiated with
the exception of devlink instance and devlink resources and params.
Move existing devlink_resource_size_get() calls into fib_create() just
before fib notifier is registered. Also, make sure that extack is
propagated down to fib_notifier_register() call.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7f36a77a 03-Oct-2019 Jiri Pirko <jiri@mellanox.com>

netdevsim: add all ports in nsim_dev_create() and del them in destroy()

Currently the probe/remove function does this separately. Put the
addition an deletion of ports into nsim_dev_create() and
nsim_dev_destroy().

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a5facc4c 03-Oct-2019 Jiri Pirko <jiri@mellanox.com>

netdevsim: change fib accounting and limitations to be per-device

Currently, the accounting is done per-namespace. However, devlink
instance is always in init_net namespace for now, so only the accounting
related to init_net is used. Limitations set using devlink resources
are only considered for init_net. nsim_devlink_net() always
returns init_net always.

Make the accounting per-device. This brings no functional change.
Per-device accounting has the same values as per-net.
For a single netdevsim instance, the behaviour is exactly the same
as before. When multiple netdevsim instances are created, each
can have different limits.

This is in prepare to implement proper devlink netns support. After
that, the devlink instance which would exist in particular netns would
account and limit that netns.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 58a406de 15-Sep-2019 Ido Schimmel <idosch@mellanox.com>

netdevsim: Set offsets to various protocol layers

The driver periodically generates "trapped" UDP packets that it then
passes on to devlink. Set the offsets to the various protocol layers.

This is a prerequisite to the next patch, where drop monitor is taught
to check that the offset to the MAC header was set.

Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 97691069 12-Sep-2019 Jiri Pirko <jiri@mellanox.com>

net: devlink: split reload op into two

In order to properly implement failure indication during reload,
split the reload op into two ops, one for down phase and one for
up phase.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d9bd6d27 20-Aug-2019 YueHaibing <yuehaibing@huawei.com>

netdevsim: Fix build error without CONFIG_INET

If CONFIG_INET is not set, building fails:

drivers/net/netdevsim/dev.o: In function `nsim_dev_trap_report_work':
dev.c:(.text+0x67b): undefined reference to `ip_send_check'

Use ip_fast_csum instead of ip_send_check to avoid
dependencies on CONFIG_INET.

Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: da58f90f11f5 ("netdevsim: Add devlink-trap support")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9e087457 17-Aug-2019 Ido Schimmel <idosch@mellanox.com>

Documentation: Add description of netdevsim traps

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# da58f90f 17-Aug-2019 Ido Schimmel <idosch@mellanox.com>

netdevsim: Add devlink-trap support

Have netdevsim register its trap groups and traps with devlink during
initialization and periodically report trapped packets to devlink core.

Since netdevsim is not a real device, the trapped packets are emulated
using a workqueue that periodically reports a UDP packet with a random
5-tuple from each active packet trap and from each running netdev.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4418f862 15-Aug-2019 Jiri Pirko <jiri@mellanox.com>

netdevsim: implement support for devlink region and snapshots

Implement dummy region of size 32K and allow user to create snapshots
or random data using debugfs file trigger.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 150e8f8a 09-Aug-2019 Jiri Pirko <jiri@mellanox.com>

netdevsim: register couple of devlink params

Register couple of devlink params, one generic, one driver-specific.
Make the values available over debugfs.

Example:
$ echo "111" > /sys/bus/netdevsim/new_device
$ devlink dev param
netdevsim/netdevsim111:
name max_macs type generic
values:
cmode driverinit value 32
name test1 type driver-specific
values:
cmode driverinit value true
$ cat /sys/kernel/debug/netdevsim/netdevsim111/max_macs
32
$ cat /sys/kernel/debug/netdevsim/netdevsim111/test1
Y
$ devlink dev param set netdevsim/netdevsim111 name max_macs cmode driverinit value 16
$ devlink dev param set netdevsim/netdevsim111 name test1 cmode driverinit value false
$ devlink dev reload netdevsim/netdevsim111
$ cat /sys/kernel/debug/netdevsim/netdevsim111/max_macs
16
$ cat /sys/kernel/debug/netdevsim/netdevsim111/test1

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 59c84b9f 06-Aug-2019 David Ahern <dsahern@gmail.com>

netdevsim: Restore per-network namespace accounting for fib entries

Prior to the commit in the fixes tag, the resource controller in netdevsim
tracked fib entries and rules per network namespace. Restore that behavior.

Fixes: 5fc494225c1e ("netdevsim: create devlink instance per netdevsim instance")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# fa4dfc4a 04-Jun-2019 Jiri Pirko <jiri@mellanox.com>

netdevsim: implement fake flash updating with notifications

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e05b2d14 25-Apr-2019 Jiri Pirko <jiri@mellanox.com>

netdevsim: move netdev creation/destruction to dev probe

Remove the existing way to create netdevsim over rtnetlink and move the
netdev creation/destruction to dev probe, so for every probed port,
a netdevsim-netdev instance is created.

Adjust selftests to work with new interface.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 794b2c05 25-Apr-2019 Jiri Pirko <jiri@mellanox.com>

netdevsim: extend device attrs to support port addition and deletion

In order to test flows in core, it is beneficial to maintain previously
supported possibility to add and delete ports during netdevsim lifetime.
Do it by extending device sysfs attrs by "new_port" and "del_port".

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8320d145 25-Apr-2019 Jiri Pirko <jiri@mellanox.com>

netdevsim: implement dev probe/remove skeleton with port initialization

Implement netdevsim bus probing of netdevsim devices. For every probed
device create a devlink instance. According to the user-passed value,
create a number of ports represented by devlink port instances.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ab1d0cc0 25-Apr-2019 Jiri Pirko <jiri@mellanox.com>

netdevsim: change debugfs tree topology

With the model where dev is represented by devlink and ports are
represented by devlink ports, make debugfs file names independent
on netdev names. Change the topology to the one illustrated
by the following example:

$ ls /sys/kernel/debug/netdevsim/
netdevsim1
$ ls /sys/kernel/debug/netdevsim/netdevsim1/
bpf_bind_accept bpf_bind_verifier_delay bpf_bound_progs ports
$ ls /sys/kernel/debug/netdevsim/netdevsim1/ports/
0 1
$ ls /sys/kernel/debug/netdevsim/netdevsim1/ports/0/
bpf_map_accept bpf_offloaded_id bpf_tc_accept bpf_tc_non_bound_accept bpf_xdpdrv_accept bpf_xdpoffload_accept dev ipsec
$ ls /sys/kernel/debug/netdevsim/netdevsim1/ports/0/dev -l
lrwxrwxrwx 1 root root 0 Apr 13 15:58 /sys/kernel/debug/netdevsim/netdevsim1/ports/0/dev -> ../../../netdevsim1

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 514cf64c 25-Apr-2019 Jiri Pirko <jiri@mellanox.com>

netdevsim: generate random switch id instead of using dev id

Current implementation of parent_id/switch_id does not follow the
original idea of being unique. The values are "0", "1", etc. Instead of
that, generate 32 random bytes.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d514f41e 25-Apr-2019 Jiri Pirko <jiri@mellanox.com>

netdevsim: merge sdev into dev

As previously introduce dev which is mapped 1:1 to a bus device covers
the purpose of the original shared device, merge the sdev code into dev.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a60f9e48 25-Apr-2019 Jiri Pirko <jiri@mellanox.com>

netdevsim: rename dev_init/exit() functions and make them independent on ns

These functions are going to be called from bus probe/release(),
therefore make them independent on ns struct and rename accordingly.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 40e4fe4c 25-Apr-2019 Jiri Pirko <jiri@mellanox.com>

netdevsim: move device registration and related code to bus.c

Move netdevsim device registration into bus.c and alongside with that
the related sysfs attributes. Introduce new struct nsim_bus_dev to
represent a netdevsim device on netdevsim bus.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8fb4bc6f 25-Apr-2019 Jiri Pirko <jiri@mellanox.com>

netdevsim: rename devlink.c to dev.c to contain per-dev(asic) items

The existing devlink.c code is going to be extended to represent asic
device on a bus. As this is about more than just devlink,
rename the file. Do appropriate prefix renaming alongside with that.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>