History log of /linux-master/drivers/net/netdevsim/fib.c
Revision Date Author Comments
# 974be75f 28-Jul-2022 Ido Schimmel <idosch@nvidia.com>

netdevsim: fib: Add debugfs knob to simulate route deletion failure

The previous patch ("netdevsim: fib: Fix reference count leak on route
deletion failure") fixed a reference count leak that happens on route
deletion failure.

Such failures can only be simulated by injecting slab allocation
failures, which cannot be surgically injected.

In order to be able to specifically test this scenario, add a debugfs
knob that allows user space to fail route deletion requests when
enabled.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 180a6a3e 28-Jul-2022 Ido Schimmel <idosch@nvidia.com>

netdevsim: fib: Fix reference count leak on route deletion failure

As part of FIB offload simulation, netdevsim stores IPv4 and IPv6 routes
and holds a reference on FIB info structures that in turn hold a
reference on the associated nexthop device(s).

In the unlikely case where we are unable to allocate memory to process a
route deletion request, netdevsim will not release the reference from
the associated FIB info structure, thereby preventing the associated
nexthop device(s) from ever being removed [1].

Fix this by scheduling a work item that will flush netdevsim's FIB table
upon route deletion failure. This will cause netdevsim to release its
reference from all the FIB info structures in its table.

Reported by Lucas Leong of Trend Micro Zero Day Initiative.

Fixes: 0ae3eb7b4611 ("netdevsim: fib: Perform the route programming in a non-atomic context")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 012ec02a 16-Jul-2022 Jiri Pirko <jiri@nvidia.com>

netdevsim: convert driver to use unlocked devlink API during init/fini

Prepare for devlink reload being called with devlink->lock held and
convert the netdevsim driver to use unlocked devlink API during init and
fini flows. Take devl_lock() in reload_down() and reload_up() ops in the
meantime before reload cmd is converted to take the lock itself.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 20bbf32e 08-Apr-2022 Guillaume Nault <gnault@redhat.com>

netdevsim: Use dscp_t in struct nsim_fib4_rt

Use the new dscp_t type to replace the tos field of struct
nsim_fib4_rt. This ensures ECN bits are ignored and makes it compatible
with the dscp fields of struct fib_entry_notifier_info and struct
fib_rt_info.

This also allows sparse to flag potential incorrect uses of DSCP and
ECN bits.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 568a3f33 08-Apr-2022 Guillaume Nault <gnault@redhat.com>

ipv4: Use dscp_t in struct fib_entry_notifier_info

Use the new dscp_t type to replace the tos field of struct
fib_entry_notifier_info. This ensures ECN bits are ignored and makes it
compatible with the dscp field of struct fib_rt_info.

This also allows sparse to flag potential incorrect uses of DSCP and
ECN bits.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 888ade8f 08-Apr-2022 Guillaume Nault <gnault@redhat.com>

ipv4: Use dscp_t in struct fib_rt_info

Use the new dscp_t type to replace the tos field of struct fib_rt_info.
This ensures ECN bits are ignored and makes it compatible with the
fa_dscp field of struct fib_alias.

This also allows sparse to flag potential incorrect uses of DSCP and
ECN bits.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# d95d6320 16-Feb-2022 Eric Dumazet <edumazet@google.com>

ipv6: fix data-race in fib6_info_hw_flags_set / fib6_purge_rt

Because fib6_info_hw_flags_set() is called without any synchronization,
all accesses to gi6->offload, fi->trap and fi->offload_failed
need some basic protection like READ_ONCE()/WRITE_ONCE().

BUG: KCSAN: data-race in fib6_info_hw_flags_set / fib6_purge_rt

read to 0xffff8881087d5886 of 1 bytes by task 13953 on cpu 0:
fib6_drop_pcpu_from net/ipv6/ip6_fib.c:1007 [inline]
fib6_purge_rt+0x4f/0x580 net/ipv6/ip6_fib.c:1033
fib6_del_route net/ipv6/ip6_fib.c:1983 [inline]
fib6_del+0x696/0x890 net/ipv6/ip6_fib.c:2028
__ip6_del_rt net/ipv6/route.c:3876 [inline]
ip6_del_rt+0x83/0x140 net/ipv6/route.c:3891
__ipv6_dev_ac_dec+0x2b5/0x370 net/ipv6/anycast.c:374
ipv6_dev_ac_dec net/ipv6/anycast.c:387 [inline]
__ipv6_sock_ac_close+0x141/0x200 net/ipv6/anycast.c:207
ipv6_sock_ac_close+0x79/0x90 net/ipv6/anycast.c:220
inet6_release+0x32/0x50 net/ipv6/af_inet6.c:476
__sock_release net/socket.c:650 [inline]
sock_close+0x6c/0x150 net/socket.c:1318
__fput+0x295/0x520 fs/file_table.c:280
____fput+0x11/0x20 fs/file_table.c:313
task_work_run+0x8e/0x110 kernel/task_work.c:164
tracehook_notify_resume include/linux/tracehook.h:189 [inline]
exit_to_user_mode_loop kernel/entry/common.c:175 [inline]
exit_to_user_mode_prepare+0x160/0x190 kernel/entry/common.c:207
__syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline]
syscall_exit_to_user_mode+0x20/0x40 kernel/entry/common.c:300
do_syscall_64+0x50/0xd0 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x44/0xae

write to 0xffff8881087d5886 of 1 bytes by task 1912 on cpu 1:
fib6_info_hw_flags_set+0x155/0x3b0 net/ipv6/route.c:6230
nsim_fib6_rt_hw_flags_set drivers/net/netdevsim/fib.c:668 [inline]
nsim_fib6_rt_add drivers/net/netdevsim/fib.c:691 [inline]
nsim_fib6_rt_insert drivers/net/netdevsim/fib.c:756 [inline]
nsim_fib6_event drivers/net/netdevsim/fib.c:853 [inline]
nsim_fib_event drivers/net/netdevsim/fib.c:886 [inline]
nsim_fib_event_work+0x284f/0x2cf0 drivers/net/netdevsim/fib.c:1477
process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
worker_thread+0x616/0xa70 kernel/workqueue.c:2454
kthread+0x2c7/0x2e0 kernel/kthread.c:327
ret_from_fork+0x1f/0x30

value changed: 0x22 -> 0x2a

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 1912 Comm: kworker/1:3 Not tainted 5.16.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events nsim_fib_event_work

Fixes: 0c5fcf9e249e ("IPv6: Add "offload failed" indication to routes")
Fixes: bb3c4ab93e44 ("ipv6: Add "offload" and "trap" indications to routes")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Amit Cohen <amcohen@nvidia.com>
Cc: Ido Schimmel <idosch@nvidia.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Link: https://lore.kernel.org/r/20220216173217.3792411-2-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# f36c82ac 01-Aug-2021 Colin Ian King <colin.king@canonical.com>

netdevsim: make array res_ids static const, makes object smaller

Don't populate the array res_ids on the stack but instead it
static const. Makes the object code smaller by 14 bytes.

Before:
text data bss dec hex filename
50833 8314 256 59403 e80b ./drivers/net/netdevsim/fib.o

After:
text data bss dec hex filename
50755 8378 256 59389 e7fd ./drivers/net/netdevsim/fib.o

(gcc version 10.2.0)

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20210801065328.138906-1-colin.king@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# be107538 05-Apr-2021 Qiheng Lin <linqiheng@huawei.com>

netdevsim: remove unneeded semicolon

Eliminate the following coccicheck warning:
drivers/net/netdevsim/fib.c:569:2-3: Unneeded semicolon

Signed-off-by: Qiheng Lin <linqiheng@huawei.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c6385c0b 12-Mar-2021 Ido Schimmel <idosch@nvidia.com>

netdevsim: Allow reporting activity on nexthop buckets

A key component of the resilient hashing algorithm is the hash buckets'
activity. If a bucket is active, it will not be populated with a new
nexthop in order not to break existing flows. Therefore, in order to
easily and thoroughly test the algorithm, we need to be in full control
over the reported activity.

Add a debugfs interface that allows user space to have netdevsim report
a nexthop bucket within a resilient nexthop group as active. For
example:

# echo 10 23 > /sys/kernel/debug/netdevsim/netdevsim10/fib/nexthop_bucket_activity

Will mark bucket 23 in nexthop group 10 as active.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d8eaa4fa 12-Mar-2021 Ido Schimmel <idosch@nvidia.com>

netdevsim: Add support for resilient nexthop groups

Allow resilient nexthop groups to be programmed and account their
occupancy according to their number of buckets. The nexthop group itself
as well as its buckets are marked with hardware flags (i.e.,
'RTNH_F_TRAP').

Replacement of a single nexthop bucket can fail using the following
debugfs knob:

# cat /sys/kernel/debug/netdevsim/netdevsim10/fib/fail_nexthop_bucket_replace
N
# echo 1 > /sys/kernel/debug/netdevsim/netdevsim10/fib/fail_nexthop_bucket_replace
# cat /sys/kernel/debug/netdevsim/netdevsim10/fib/fail_nexthop_bucket_replace
Y

Replacement of a resilient nexthop group can fail using the following
debugfs knob:

# cat /sys/kernel/debug/netdevsim/netdevsim10/fib/fail_res_nexthop_group_replace
N
# echo 1 > /sys/kernel/debug/netdevsim/netdevsim10/fib/fail_res_nexthop_group_replace
# cat /sys/kernel/debug/netdevsim/netdevsim10/fib/fail_res_nexthop_group_replace
Y

This enables testing of various error paths.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 40ff8371 12-Mar-2021 Ido Schimmel <idosch@nvidia.com>

netdevsim: Create a helper for setting nexthop hardware flags

Instead of calling nexthop_set_hw_flags(), call a helper. It will be
used to also set nexthop bucket flags in a subsequent patch.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 86927c9c 12-Mar-2021 Petr Machata <petrm@nvidia.com>

netdevsim: fib: Introduce a lock to guard nexthop hashtable

Currently netdevsim relies on RTNL to maintain exclusivity in accessing the
nexthop hash table. However, bucket notification may be called without RTNL
having been held. Instead, introduce a custom lock to guard the table.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c53d21af 11-Mar-2021 Jiapeng Chong <jiapeng.chong@linux.alibaba.com>

netdevsim: fib: Remove redundant code

Fix the following coccicheck warnings:

./drivers/net/netdevsim/fib.c:874:5-8: Unneeded variable: "err". Return
"0" on line 889.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 134c7532 07-Feb-2021 Amit Cohen <amcohen@nvidia.com>

netdevsim: fib: Add debugfs to debug route offload failure

Add "fail_route_offload" flag to disallow offloading routes.
It is needed to test "offload failed" notifications.

Create the flag as part of nsim_fib_create() under fib directory and set
it to false by default.

When FIB_EVENT_ENTRY_{REPLACE, APPEND} are triggered and
"fail_route_offload" value is true, set the appropriate hardware flag to
make the kernel emit RTM_NEWROUTE notification with RTM_F_OFFLOAD_FAILED
flag.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 484a4dfb 07-Feb-2021 Amit Cohen <amcohen@nvidia.com>

netdevsim: fib: Do not warn if route was not found for several events

The next patch will add the ability to fail route offload controlled by
debugfs variable called "fail_route_offload".

If we vetoed the addition, we might get a delete or append notification
for a route we do not have. Therefore, do not warn if route was not found.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0c5fcf9e 07-Feb-2021 Amit Cohen <amcohen@nvidia.com>

IPv6: Add "offload failed" indication to routes

After installing a route to the kernel, user space receives an
acknowledgment, which means the route was installed in the kernel, but not
necessarily in hardware.

The asynchronous nature of route installation in hardware can lead to a
routing daemon advertising a route before it was actually installed in
hardware. This can result in packet loss or mis-routed packets until the
route is installed in hardware.

To avoid such cases, previous patch set added the ability to emit
RTM_NEWROUTE notifications whenever RTM_F_OFFLOAD/RTM_F_TRAP flags
are changed, this behavior is controlled by sysctl.

With the above mentioned behavior, it is possible to know from user-space
if the route was offloaded, but if the offload fails there is no indication
to user-space. Following a failure, a routing daemon will wait indefinitely
for a notification that will never come.

This patch adds an "offload_failed" indication to IPv6 routes, so that
users will have better visibility into the offload process.

'struct fib6_info' is extended with new field that indicates if route
offload failed. Note that the new field is added using unused bit and
therefore there is no need to increase struct size.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 36c5100e 07-Feb-2021 Amit Cohen <amcohen@nvidia.com>

IPv4: Add "offload failed" indication to routes

After installing a route to the kernel, user space receives an
acknowledgment, which means the route was installed in the kernel, but not
necessarily in hardware.

The asynchronous nature of route installation in hardware can lead to a
routing daemon advertising a route before it was actually installed in
hardware. This can result in packet loss or mis-routed packets until the
route is installed in hardware.

To avoid such cases, previous patch set added the ability to emit
RTM_NEWROUTE notifications whenever RTM_F_OFFLOAD/RTM_F_TRAP flags
are changed, this behavior is controlled by sysctl.

With the above mentioned behavior, it is possible to know from user-space
if the route was offloaded, but if the offload fails there is no indication
to user-space. Following a failure, a routing daemon will wait indefinitely
for a notification that will never come.

This patch adds an "offload_failed" indication to IPv4 routes, so that
users will have better visibility into the offload process.

'struct fib_alias', and 'struct fib_rt_info' are extended with new field
that indicates if route offload failed. Note that the new field is added
using unused bit and therefore there is no need to increase structs size.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# efc42879 01-Feb-2021 Amit Cohen <amcohen@nvidia.com>

net: Do not call fib6_info_hw_flags_set() when IPv6 is disabled

With the next patch mlxsw and netdevsim will fail in compilation if
CONFIG_IPV6 is disabled.

Do not call fib6_info_hw_flags_set() when IPv6 is disabled.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# fbaca8f8 01-Feb-2021 Amit Cohen <amcohen@nvidia.com>

net: Pass 'net' struct as first argument to fib6_info_hw_flags_set()

The next patch will emit notification when hardware flags are changed,
in case that fib_notify_on_flag_change sysctl is set to 1.

To know sysctl values, net struct is needed.
This change is consistent with the IPv4 version, which gets 'net' struct
as its first argument.

Currently, the only callers of this function are mlxsw and netdevsim.
Patch the callers to pass net.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 0ae3eb7b 01-Feb-2021 Amit Cohen <amcohen@nvidia.com>

netdevsim: fib: Perform the route programming in a non-atomic context

Currently, netdevsim implements dummy FIB offload and marks notified
routes with RTM_F_TRAP flag. netdevsim does not defer route notifications
to a work queue because it does not need to program any hardware.

Given that netdevsim's purpose is to both give an example implementation
and allow developers to test their code, align netdevsim to a "real"
hardware device driver like mlxsw and have it also perform the route
"programming" in a non-atomic context.

It will be used to test route flags notifications which will be added in
the next patches.

The following changes are needed when route handling is performed in WQ:
- Handle the accounting in the main context, to be able to return an
error for adding route when all the routes are used.
For FIB_EVENT_ENTRY_REPLACE increase the counter before scheduling
the delayed work, and in case that this event replaces an existing route,
decrease the counter as part of the delayed work.

- For IPv6, cannot use fen6_info->rt->fib6_siblings list because it
might be changed during handling the delayed work.
Save an array with the nexthops as part of fib6_event struct, and take
a reference for each nexthop to prevent them from being freed while
event is queued.

- Change GFP_ATOMIC allocations to GFP_KERNEL.

- Use single work item that is handling a list of ordered routes.
Handling routes must be processed in the order they were submitted to
avoid logical errors that could lead to unexpected failures.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 9e635a21 01-Feb-2021 Amit Cohen <amcohen@nvidia.com>

netdevsim: fib: Convert the current occupancy to an atomic variable

When route is added/deleted, the appropriate counter is increased/decreased
to maintain number of routes.

User can limit the number of routes and then according to the appropriate
counter, adding more routes than the limitation is forbidden.

Currently, there is one lock which protects hashtable, list and accounting.

Handling the counters will be performed from both atomic context and
non-atomic context, while the hashtable and the list will be used only from
non-atomic context and therefore will be protected by a separate lock.

Protect accounting by using an atomic variable, so lock is not needed.

v2:
* Use atomic64_sub() in nsim_nexthop_account()'s error path

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 09ad6bec 28-Jan-2021 Ido Schimmel <idosch@nvidia.com>

nexthop: Use enum to encode notification type

Currently there are only two types of in-kernel nexthop notification.
The two are distinguished by the 'is_grp' boolean field in 'struct
nh_notifier_info'.

As more notification types are introduced for more next-hop group types, a
boolean is not an easily extensible interface. Instead, convert it to an
enum.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 66e58bf0 04-Nov-2020 Ido Schimmel <idosch@nvidia.com>

netdevsim: Allow programming routes with nexthop objects

Previous patches added the ability to program nexthop objects.
Therefore, no longer forbid the programming of routes that point to such
objects.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 8fa84742 04-Nov-2020 Ido Schimmel <idosch@nvidia.com>

netdevsim: Add dummy implementation for nexthop offload

Implement dummy nexthop "offload" in the driver by storing currently
"programmed" nexthops in a hash table. Each nexthop in the hash table is
marked with "trap" indication and increments the nexthops resource
occupancy.

This will later allow us to test the nexthop offload API on top of
netdevsim.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 35266255 04-Nov-2020 Ido Schimmel <idosch@nvidia.com>

netdevsim: Add devlink resource for nexthops

The Spectrum ASIC has a dedicated table where nexthops (i.e., adjacency
entries) are populated. The size of this table can be controlled via
devlink-resource.

Add such a resource to netdevsim so that its occupancy will reflect the
number of nexthop objects currently programmed to the device.

By limiting the size of the resource, error paths could be exercised and
tested.

Example output:

# devlink resource show netdevsim/netdevsim10
netdevsim/netdevsim10:
name IPv4 size unlimited unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables none
resources:
name fib size unlimited occ 4 unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables none
name fib-rules size unlimited occ 3 unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables none
name IPv6 size unlimited unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables none
resources:
name fib size unlimited occ 1 unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables none
name fib-rules size unlimited occ 2 unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables none
name nexthops size unlimited occ 0 unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables none

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# df561f66 23-Aug-2020 Gustavo A. R. Silva <gustavoars@kernel.org>

treewide: Use fallthrough pseudo-keyword

Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>


# 41cdc741 15-Jan-2020 Eric Dumazet <edumazet@google.com>

netdevsim: fix nsim_fib6_rt_create() error path

It seems nsim_fib6_rt_create() intent was to return
either a valid pointer or an embedded error code.

BUG: unable to handle page fault for address: fffffffffffffff4
PGD 9870067 P4D 9870067 PUD 9872067 PMD 0
Oops: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 22851 Comm: syz-executor.1 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:jhash2 include/linux/jhash.h:125 [inline]
RIP: 0010:rhashtable_jhash2+0x76/0x2c0 lib/rhashtable.c:963
Code: b9 00 00 00 00 00 fc ff df 48 c1 e8 03 0f b6 14 08 4c 89 f0 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 30 02 00 00 49 8d 7e 04 <41> 8b 06 48 be 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 0f b6
RSP: 0018:ffffc90016127190 EFLAGS: 00010246
RAX: 0000000000000007 RBX: 00000000dfb3ab49 RCX: dffffc0000000000
RDX: 0000000000000000 RSI: ffffffff839ba7c8 RDI: fffffffffffffff8
RBP: ffffc900161271c0 R08: ffff8880951f8640 R09: ffffed1015d0703d
R10: ffffed1015d0703c R11: ffff8880ae8381e3 R12: 00000000dfb3ab49
R13: 00000000dfb3ab49 R14: fffffffffffffff4 R15: 0000000000000007
FS: 00007f40bfbc6700(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: fffffffffffffff4 CR3: 0000000093660000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
rht_key_get_hash include/linux/rhashtable.h:133 [inline]
rht_key_hashfn include/linux/rhashtable.h:159 [inline]
rht_head_hashfn include/linux/rhashtable.h:174 [inline]
__rhashtable_insert_fast.constprop.0+0xe15/0x1180 include/linux/rhashtable.h:723
rhashtable_insert_fast include/linux/rhashtable.h:832 [inline]
nsim_fib6_rt_add drivers/net/netdevsim/fib.c:603 [inline]
nsim_fib6_rt_insert drivers/net/netdevsim/fib.c:658 [inline]
nsim_fib6_event drivers/net/netdevsim/fib.c:719 [inline]
nsim_fib_event drivers/net/netdevsim/fib.c:744 [inline]
nsim_fib_event_nb+0x1b16/0x2600 drivers/net/netdevsim/fib.c:772
notifier_call_chain+0xc2/0x230 kernel/notifier.c:83
__atomic_notifier_call_chain+0xa6/0x1a0 kernel/notifier.c:173
atomic_notifier_call_chain+0x2e/0x40 kernel/notifier.c:183
call_fib_notifiers+0x173/0x2a0 net/core/fib_notifier.c:35
call_fib6_notifiers+0x4b/0x60 net/ipv6/fib6_notifier.c:22
call_fib6_entry_notifiers+0xfb/0x150 net/ipv6/ip6_fib.c:399
fib6_add_rt2node net/ipv6/ip6_fib.c:1216 [inline]
fib6_add+0x20cd/0x3ec0 net/ipv6/ip6_fib.c:1471
__ip6_ins_rt+0x54/0x80 net/ipv6/route.c:1315
ip6_ins_rt+0x96/0xd0 net/ipv6/route.c:1325
__ipv6_dev_ac_inc+0x76f/0xb20 net/ipv6/anycast.c:324
ipv6_sock_ac_join+0x4c1/0x790 net/ipv6/anycast.c:139
do_ipv6_setsockopt.isra.0+0x3908/0x4290 net/ipv6/ipv6_sockglue.c:670
ipv6_setsockopt+0xff/0x180 net/ipv6/ipv6_sockglue.c:944
udpv6_setsockopt+0x68/0xb0 net/ipv6/udp.c:1564
sock_common_setsockopt+0x94/0xd0 net/core/sock.c:3149
__sys_setsockopt+0x261/0x4c0 net/socket.c:2130
__do_sys_setsockopt net/socket.c:2146 [inline]
__se_sys_setsockopt net/socket.c:2143 [inline]
__x64_sys_setsockopt+0xbe/0x150 net/socket.c:2143
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x45aff9

Fixes: 48bb9eb47b27 ("netdevsim: fib: Add dummy implementation for FIB offload")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 48bb9eb4 14-Jan-2020 Ido Schimmel <idosch@mellanox.com>

netdevsim: fib: Add dummy implementation for FIB offload

Implement dummy IPv4 and IPv6 FIB "offload" in the driver by storing
currently "programmed" routes in a hash table. Each route in the hash
table is marked with "trap" indication. The indication is cleared when
the route is replaced or when the netdevsim instance is deleted.

This will later allow us to test the route offload API on top of
netdevsim.

v2:
* Convert to new fib_alias_hw_flags_set() interface

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# caafb250 23-Dec-2019 Ido Schimmel <idosch@mellanox.com>

ipv6: Remove old route notifications and convert listeners

Now that mlxsw is converted to use the new FIB notifications it is
possible to delete the old ones and use the new replace / append /
delete notifications.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 446f7391 14-Dec-2019 Ido Schimmel <idosch@mellanox.com>

ipv4: Remove old route notifications and convert listeners

Unlike mlxsw, the other listeners to the FIB notification chain do not
require any special modifications as they never considered multiple
identical routes.

This patch removes the old route notifications and converts all the
listeners to use the new replace / delete notifications.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 33902b4a 11-Oct-2019 YueHaibing <yuehaibing@huawei.com>

netdevsim: Fix error handling in nsim_fib_init and nsim_fib_exit

In nsim_fib_init(), if register_fib_notifier failed, nsim_fib_net_ops
should be unregistered before return.

In nsim_fib_exit(), unregister_fib_notifier should be called before
nsim_fib_net_ops be unregistered, otherwise may cause use-after-free:

BUG: KASAN: use-after-free in nsim_fib_event_nb+0x342/0x570 [netdevsim]
Read of size 8 at addr ffff8881daaf4388 by task kworker/0:3/3499

CPU: 0 PID: 3499 Comm: kworker/0:3 Not tainted 5.3.0-rc7+ #30
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
Workqueue: ipv6_addrconf addrconf_dad_work [ipv6]
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0xa9/0x10e lib/dump_stack.c:113
print_address_description+0x65/0x380 mm/kasan/report.c:351
__kasan_report+0x149/0x18d mm/kasan/report.c:482
kasan_report+0xe/0x20 mm/kasan/common.c:618
nsim_fib_event_nb+0x342/0x570 [netdevsim]
notifier_call_chain+0x52/0xf0 kernel/notifier.c:95
__atomic_notifier_call_chain+0x78/0x140 kernel/notifier.c:185
call_fib_notifiers+0x30/0x60 net/core/fib_notifier.c:30
call_fib6_entry_notifiers+0xc1/0x100 [ipv6]
fib6_add+0x92e/0x1b10 [ipv6]
__ip6_ins_rt+0x40/0x60 [ipv6]
ip6_ins_rt+0x84/0xb0 [ipv6]
__ipv6_ifa_notify+0x4b6/0x550 [ipv6]
ipv6_ifa_notify+0xa5/0x180 [ipv6]
addrconf_dad_completed+0xca/0x640 [ipv6]
addrconf_dad_work+0x296/0x960 [ipv6]
process_one_work+0x5c0/0xc00 kernel/workqueue.c:2269
worker_thread+0x5c/0x670 kernel/workqueue.c:2415
kthread+0x1d7/0x200 kernel/kthread.c:255
ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352

Allocated by task 3388:
save_stack+0x19/0x80 mm/kasan/common.c:69
set_track mm/kasan/common.c:77 [inline]
__kasan_kmalloc.constprop.3+0xa0/0xd0 mm/kasan/common.c:493
kmalloc include/linux/slab.h:557 [inline]
kzalloc include/linux/slab.h:748 [inline]
ops_init+0xa9/0x220 net/core/net_namespace.c:127
__register_pernet_operations net/core/net_namespace.c:1135 [inline]
register_pernet_operations+0x1d4/0x420 net/core/net_namespace.c:1212
register_pernet_subsys+0x24/0x40 net/core/net_namespace.c:1253
nsim_fib_init+0x12/0x70 [netdevsim]
veth_get_link_ksettings+0x2b/0x50 [veth]
do_one_initcall+0xd4/0x454 init/main.c:939
do_init_module+0xe0/0x330 kernel/module.c:3490
load_module+0x3c2f/0x4620 kernel/module.c:3841
__do_sys_finit_module+0x163/0x190 kernel/module.c:3931
do_syscall_64+0x72/0x2e0 arch/x86/entry/common.c:296
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 3534:
save_stack+0x19/0x80 mm/kasan/common.c:69
set_track mm/kasan/common.c:77 [inline]
__kasan_slab_free+0x130/0x180 mm/kasan/common.c:455
slab_free_hook mm/slub.c:1423 [inline]
slab_free_freelist_hook mm/slub.c:1474 [inline]
slab_free mm/slub.c:3016 [inline]
kfree+0xe9/0x2d0 mm/slub.c:3957
ops_free net/core/net_namespace.c:151 [inline]
ops_free_list.part.7+0x156/0x220 net/core/net_namespace.c:184
ops_free_list net/core/net_namespace.c:182 [inline]
__unregister_pernet_operations net/core/net_namespace.c:1165 [inline]
unregister_pernet_operations+0x221/0x2a0 net/core/net_namespace.c:1224
unregister_pernet_subsys+0x1d/0x30 net/core/net_namespace.c:1271
nsim_fib_exit+0x11/0x20 [netdevsim]
nsim_module_exit+0x16/0x21 [netdevsim]
__do_sys_delete_module kernel/module.c:1015 [inline]
__se_sys_delete_module kernel/module.c:958 [inline]
__x64_sys_delete_module+0x244/0x330 kernel/module.c:958
do_syscall_64+0x72/0x2e0 arch/x86/entry/common.c:296
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: 59c84b9fcf42 ("netdevsim: Restore per-network namespace accounting for fib entries")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4f174bbc 03-Oct-2019 Jiri Pirko <jiri@mellanox.com>

netdevsim: take devlink net instead of init_net

Follow-up patch is going to allow to reload devlink instance into
different network namespace, so use devlink_net() helper instead
of init_net.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 75ba029f 03-Oct-2019 Jiri Pirko <jiri@mellanox.com>

netdevsim: implement proper devlink reload

During devlink reload, all driver objects should be reinstantiated with
the exception of devlink instance and devlink resources and params.
Move existing devlink_resource_size_get() calls into fib_create() just
before fib notifier is registered. Also, make sure that extack is
propagated down to fib_notifier_register() call.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b7a59557 03-Oct-2019 Jiri Pirko <jiri@mellanox.com>

net: fib_notifier: propagate extack down to the notifier block callback

Since errors are propagated all the way up to the caller, propagate
possible extack of the caller all the way down to the notifier block
callback.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7c550daf 03-Oct-2019 Jiri Pirko <jiri@mellanox.com>

net: fib_notifier: make FIB notifier per-netns

Currently all users of FIB notifier only cares about events in init_net.
Later in this patchset, users get interested in other namespaces too.
However, for every registered block user is interested only about one
namespace. Make the FIB notifier registration per-netns and avoid
unnecessary calls of notifier block for other namespaces.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a5facc4c 03-Oct-2019 Jiri Pirko <jiri@mellanox.com>

netdevsim: change fib accounting and limitations to be per-device

Currently, the accounting is done per-namespace. However, devlink
instance is always in init_net namespace for now, so only the accounting
related to init_net is used. Limitations set using devlink resources
are only considered for init_net. nsim_devlink_net() always
returns init_net always.

Make the accounting per-device. This brings no functional change.
Per-device accounting has the same values as per-net.
For a single netdevsim instance, the behaviour is exactly the same
as before. When multiple netdevsim instances are created, each
can have different limits.

This is in prepare to implement proper devlink netns support. After
that, the devlink instance which would exist in particular netns would
account and limit that netns.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 59c84b9f 06-Aug-2019 David Ahern <dsahern@gmail.com>

netdevsim: Restore per-network namespace accounting for fib entries

Prior to the commit in the fixes tag, the resource controller in netdevsim
tracked fib entries and rules per network namespace. Restore that behavior.

Fixes: 5fc494225c1e ("netdevsim: create devlink instance per netdevsim instance")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d5382fef 18-Jun-2019 Ido Schimmel <idosch@mellanox.com>

ipv6: Stop sending in-kernel notifications for each nexthop

Both listeners - mlxsw and netdevsim - of IPv6 FIB notifications are now
ready to handle IPv6 multipath notifications.

Therefore, stop ignoring such notifications in both drivers and stop
sending notification for each added / deleted nexthop.

v2:
* Remove 'multipath_rt' from 'struct fib6_entry_notifier_info'

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d133e4f1 18-Jun-2019 Ido Schimmel <idosch@mellanox.com>

netdevsim: Ignore IPv6 multipath notifications

In a similar fashion to previous patch, have netdevsim ignore IPv6
multipath notifications for now.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5fc49422 25-Apr-2019 Jiri Pirko <jiri@mellanox.com>

netdevsim: create devlink instance per netdevsim instance

Currently there is one devlink instance created per network namespace.
That is quite odd considering the fact that devlink instance should
represent an ASIC. The following patches are going to move the devlink
instance even more down to a bus device, but until then, have one
devlink instance per netdevsim instance. Struct nsim_devlink is
introduced to hold fib setting.

The changes in the fib code are only related to holding the
configuration per devlink instance instead of network namespace.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7fa76d77 05-Jun-2018 David Ahern <dsahern@gmail.com>

netdevsim: Add extack error message for devlink reload

devlink reset command can fail if a FIB resource limit is set to a value
lower than the current occupancy. Return a proper message indicating the
reason for the failure.

$ devlink resource sh netdevsim/netdevsim0
netdevsim/netdevsim0:
name IPv4 size unlimited unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables none
resources:
name fib size unlimited occ 43 unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables none
name fib-rules size unlimited occ 4 unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables none
name IPv6 size unlimited unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables none
resources:
name fib size unlimited occ 54 unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables none
name fib-rules size unlimited occ 3 unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables none

$ devlink resource set netdevsim/netdevsim0 path /IPv4/fib size 40

$ devlink dev reload netdevsim/netdevsim0
Error: netdevsim: New size is less than current occupancy.
devlink answers: Invalid argument

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 87248d31 04-Apr-2018 Arnd Bergmann <arnd@arndb.de>

netdevsim: remove incorrect __net_initdata annotations

The __net_initdata section cannot currently be used for structures that
get cleaned up in an exitcall using unregister_pernet_operations:

WARNING: vmlinux.o(.text+0x868c34): Section mismatch in reference from the function nsim_devlink_exit() to the (unknown reference) .init.data:(unknown)
The function nsim_devlink_exit() references
the (unknown reference) __initdata (unknown).
This is often because nsim_devlink_exit lacks a __initdata
annotation or the annotation of (unknown) is wrong.
WARNING: vmlinux.o(.text+0x868c64): Section mismatch in reference from the function nsim_devlink_init() to the (unknown reference) .init.data:(unknown)
WARNING: vmlinux.o(.text+0x8692bc): Section mismatch in reference from the function nsim_fib_exit() to the (unknown reference) .init.data:(unknown)
WARNING: vmlinux.o(.text+0x869300): Section mismatch in reference from the function nsim_fib_init() to the (unknown reference) .init.data:(unknown)

As that warning tells us, discarding the structure after a module is
loaded would lead to a undefined behavior when that module is removed.

It might be possible to change that annotation so it has no effect for
loadable modules, but I have not figured out exactly how to do that, and
we want this to be fixed in -rc1.

This just removes the annotations, just like we do for all other such
modules.

Fixes: 37923ed6b8ce ("netdevsim: Add simple FIB resource controller via devlink")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 37923ed6 27-Mar-2018 David Ahern <dsa@cumulusnetworks.com>

netdevsim: Add simple FIB resource controller via devlink

Add devlink support to netdevsim and use it to implement a simple,
profile based resource controller. Only one controller is needed
per namespace, so the first netdevsim netdevice in a namespace
registers with devlink. If that device is deleted, the resource
settings are deleted.

The resource controller allows a user to limit the number of IPv4 and
IPv6 FIB entries and FIB rules. The resource paths are:
/IPv4
/IPv4/fib
/IPv4/fib-rules
/IPv6
/IPv6/fib
/IPv6/fib-rules

The IPv4 and IPv6 top level resources are unlimited in size and can not
be changed. From there, the number of FIB entries and FIB rule entries
are unlimited by default. A user can specify a limit for the fib and
fib-rules resources:

$ devlink resource set netdevsim/netdevsim0 path /IPv4/fib size 96
$ devlink resource set netdevsim/netdevsim0 path /IPv4/fib-rules size 16
$ devlink resource set netdevsim/netdevsim0 path /IPv6/fib size 64
$ devlink resource set netdevsim/netdevsim0 path /IPv6/fib-rules size 16
$ devlink dev reload netdevsim/netdevsim0

such that the number of rules or routes is limited (96 ipv4 routes in the
example above):
$ for n in $(seq 1 32); do ip ro add 10.99.$n.0/24 dev eth1; done
Error: netdevsim: Exceeded number of supported fib entries.

$ devlink resource show netdevsim/netdevsim0
netdevsim/netdevsim0:
name IPv4 size unlimited unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables non
resources:
name fib size 96 occ 96 unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables
...

With this template in place for resource management, it is fairly trivial
to extend and shows one way to implement a simple counter based resource
controller typical of network profiles.

Currently, devlink only supports initial namespace. Code is in place to
adapt netdevsim to a per namespace controller once the network namespace
issues are resolved.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>