History log of /linux-master/net/sched/act_api.c
Revision Date Author Comments
# 2c15a5ae 01-Feb-2024 Michal Koutný <mkoutny@suse.com>

net/sched: Load modules via their alias

The cls_,sch_,act_ modules may be loaded lazily during network
configuration but without user's awareness and control.

Switch the lazy loading from canonical module names to a module alias.
This allows finer control over lazy loading, the precedent from
commit 7f78e0351394 ("fs: Limit sys_mount to only request filesystem
modules.") explains it already:

Using aliases means user space can control the policy of which
filesystem^W net/sched modules are auto-loaded by editing
/etc/modprobe.d/*.conf with blacklist and alias directives.
Allowing simple, safe, well understood work-arounds to known
problematic software.

By default, nothing changes. However, if a specific module is
blacklisted (its canonical name), it won't be modprobe'd when requested
under its alias (i.e. kernel auto-loading). It would appear as if the
given module was unknown.

The module can still be loaded under its canonical name, which is an
explicit (privileged) user action.

Signed-off-by: Michal Koutný <mkoutny@suse.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240201130943.19536-4-mkoutny@suse.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 405cd9fc 04-Jan-2024 Pedro Tammela <pctammela@mojatatu.com>

net/sched: simplify tc_action_load_ops parameters

Instead of using two bools derived from a flags passed as arguments to
the parent function of tc_action_load_ops, just pass the flags itself
to tc_action_load_ops to simplify its parameters.

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c2a67de9 29-Dec-2023 Pedro Tammela <pctammela@mojatatu.com>

net/sched: introduce ACT_P_BOUND return code

Bound actions always return '0' and as of today we rely on '0'
being returned in order to properly skip bound actions in
tcf_idr_insert_many. In order to further improve maintainability,
introduce the ACT_P_BOUND return code.

Actions are updated to return 'ACT_P_BOUND' instead of plain '0'.
tcf_idr_insert_many is then updated to check for 'ACT_P_BOUND'.

Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20231229132642.1489088-1-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 4cf24dc8 16-Dec-2023 Victor Nogueira <victor@mojatatu.com>

net: sched: Add initial TC error skb drop reasons

Continue expanding Daniel's patch by adding new skb drop reasons that
are idiosyncratic to TC.

More specifically:

- SKB_DROP_REASON_TC_COOKIE_ERROR: An error occurred whilst
processing a tc ext cookie.

- SKB_DROP_REASON_TC_CHAIN_NOTFOUND: tc chain lookup failed.

- SKB_DROP_REASON_TC_RECLASSIFY_LOOP: tc exceeded max reclassify loop
iterations

Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# fb278072 16-Dec-2023 Victor Nogueira <victor@mojatatu.com>

net: sched: Move drop_reason to struct tc_skb_cb

Move drop_reason from struct tcf_result to skb cb - more specifically to
struct tc_skb_cb. With that, we'll be able to also set the drop reason for
the remaining qdiscs (aside from clsact) that do not have access to
tcf_result when time comes to set the skb drop reason.

Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1dd7f18f 11-Dec-2023 Pedro Tammela <pctammela@mojatatu.com>

net/sched: act_api: skip idr replace on bound actions

tcf_idr_insert_many will replace the allocated -EBUSY pointer in
tcf_idr_check_alloc with the real action pointer, exposing it
to all operations. This operation is only needed when the action pointer
is created (ACT_P_CREATED). For actions which are bound to (returned 0),
the pointer already resides in the idr making such operation a nop.

Even though it's a nop, it's still not a cheap operation as internally
the idr code walks the idr and then does a replace on the appropriate slot.
So if the action was bound, better skip the idr replace entirely.

Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Link: https://lore.kernel.org/r/20231211181807.96028-3-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 4b55e867 11-Dec-2023 Pedro Tammela <pctammela@mojatatu.com>

net/sched: act_api: rely on rcu in tcf_idr_check_alloc

Instead of relying only on the idrinfo->lock mutex for
bind/alloc logic, rely on a combination of rcu + mutex + atomics
to better scale the case where multiple rtnl-less filters are
binding to the same action object.

Action binding happens when an action index is specified explicitly and
an action exists which such index exists. Example:
tc actions add action drop index 1
tc filter add ... matchall action drop index 1
tc filter add ... matchall action drop index 1
tc filter add ... matchall action drop index 1
tc filter ls ...
filter protocol all pref 49150 matchall chain 0 filter protocol all pref 49150 matchall chain 0 handle 0x1
not_in_hw
action order 1: gact action drop
random type none pass val 0
index 1 ref 4 bind 3

filter protocol all pref 49151 matchall chain 0 filter protocol all pref 49151 matchall chain 0 handle 0x1
not_in_hw
action order 1: gact action drop
random type none pass val 0
index 1 ref 4 bind 3

filter protocol all pref 49152 matchall chain 0 filter protocol all pref 49152 matchall chain 0 handle 0x1
not_in_hw
action order 1: gact action drop
random type none pass val 0
index 1 ref 4 bind 3

When no index is specified, as before, grab the mutex and allocate
in the idr the next available id. In this version, as opposed to before,
it's simplified to store the -EBUSY pointer instead of the previous
alloc + replace combination.

When an index is specified, rely on rcu to find if there's an object in
such index. If there's none, fallback to the above, serializing on the
mutex and reserving the specified id. If there's one, it can be an -EBUSY
pointer, in which case we just try again until it's an action, or an action.
Given the rcu guarantees, the action found could be dead and therefore
we need to bump the refcount if it's not 0, handling the case it's
in fact 0.

As bind and the action refcount are already atomics, these increments can
happen without the mutex protection while many tcf_idr_check_alloc race
to bind to the same action instance.

In case binding encounters a parallel delete or add, it will return
-EAGAIN in order to try again. Both filter and action apis already
have the retry machinery in-place. In case it's an unlocked filter it
retries under the rtnl lock.

Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Link: https://lore.kernel.org/r/20231211181807.96028-2-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 8d4390f5 08-Dec-2023 Pedro Tammela <pctammela@mojatatu.com>

net/sched: act_api: conditional notification of events

As of today tc-action events are unconditionally built and sent to
RTNLGRP_TC. As with the introduction of rtnl_notify_needed we can check
before-hand if they are really needed.

Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20231208192847.714940-6-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# c73724bf 08-Dec-2023 Pedro Tammela <pctammela@mojatatu.com>

net/sched: act_api: don't open code max()

Use max() in a couple of places that are open coding it with the
ternary operator.

Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20231208192847.714940-5-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# f9bfc8eb 01-Dec-2023 Pedro Tammela <pctammela@mojatatu.com>

net/sched: act_api: use tcf_act_for_each_action in tcf_idr_insert_many

The actions array is contiguous, so stop processing whenever a NULL
is found. This is already the assumption for tcf_action_destroy[1],
which is called from tcf_actions_init.

[1] https://elixir.bootlin.com/linux/v6.7-rc3/source/net/sched/act_api.c#L1115

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>


# e09ac779 01-Dec-2023 Pedro Tammela <pctammela@mojatatu.com>

net/sched: act_api: stop loop over ops array on NULL in tcf_action_init

The ops array is contiguous, so stop processing whenever a NULL is found

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>


# a0e947c9 01-Dec-2023 Pedro Tammela <pctammela@mojatatu.com>

net/sched: act_api: avoid non-contiguous action array

In tcf_action_add, when putting the reference for the bound actions
it assigns NULLs to just created actions passing a non contiguous
array to tcf_action_put_many.
Refactor the code so the actions array is always contiguous.

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>


# 3872347e 01-Dec-2023 Pedro Tammela <pctammela@mojatatu.com>

net/sched: act_api: use tcf_act_for_each_action

Use the auxiliary macro tcf_act_for_each_action in all the
functions that expect a contiguous action array

Suggested-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>


# 40cb2fdf 28-Oct-2023 Jamal Hadi Salim <jhs@mojatatu.com>

net, sched: Fix SKB_NOT_DROPPED_YET splat under debug config

Getting the following splat [1] with CONFIG_DEBUG_NET=y and this
reproducer [2]. Problem seems to be that classifiers clear 'struct
tcf_result::drop_reason', thereby triggering the warning in
__kfree_skb_reason() due to reason being 'SKB_NOT_DROPPED_YET' (0).

Fixed by disambiguating a legit error from a verdict with a bogus drop_reason

[1]
WARNING: CPU: 0 PID: 181 at net/core/skbuff.c:1082 kfree_skb_reason+0x38/0x130
Modules linked in:
CPU: 0 PID: 181 Comm: mausezahn Not tainted 6.6.0-rc6-custom-ge43e6d9582e0 #682
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc37 04/01/2014
RIP: 0010:kfree_skb_reason+0x38/0x130
[...]
Call Trace:
<IRQ>
__netif_receive_skb_core.constprop.0+0x837/0xdb0
__netif_receive_skb_one_core+0x3c/0x70
process_backlog+0x95/0x130
__napi_poll+0x25/0x1b0
net_rx_action+0x29b/0x310
__do_softirq+0xc0/0x29b
do_softirq+0x43/0x60
</IRQ>

[2]

ip link add name veth0 type veth peer name veth1
ip link set dev veth0 up
ip link set dev veth1 up
tc qdisc add dev veth1 clsact
tc filter add dev veth1 ingress pref 1 proto all flower dst_mac 00:11:22:33:44:55 action drop
mausezahn veth0 -a own -b 00:11:22:33:44:55 -q -c 1

Ido reported:

[...] getting the following splat [1] with CONFIG_DEBUG_NET=y and this
reproducer [2]. Problem seems to be that classifiers clear 'struct
tcf_result::drop_reason', thereby triggering the warning in
__kfree_skb_reason() due to reason being 'SKB_NOT_DROPPED_YET' (0). [...]

[1]
WARNING: CPU: 0 PID: 181 at net/core/skbuff.c:1082 kfree_skb_reason+0x38/0x130
Modules linked in:
CPU: 0 PID: 181 Comm: mausezahn Not tainted 6.6.0-rc6-custom-ge43e6d9582e0 #682
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc37 04/01/2014
RIP: 0010:kfree_skb_reason+0x38/0x130
[...]
Call Trace:
<IRQ>
__netif_receive_skb_core.constprop.0+0x837/0xdb0
__netif_receive_skb_one_core+0x3c/0x70
process_backlog+0x95/0x130
__napi_poll+0x25/0x1b0
net_rx_action+0x29b/0x310
__do_softirq+0xc0/0x29b
do_softirq+0x43/0x60
</IRQ>

[2]
#!/bin/bash

ip link add name veth0 type veth peer name veth1
ip link set dev veth0 up
ip link set dev veth1 up
tc qdisc add dev veth1 clsact
tc filter add dev veth1 ingress pref 1 proto all flower dst_mac 00:11:22:33:44:55 action drop
mausezahn veth0 -a own -b 00:11:22:33:44:55 -q -c 1

What happens is that inside most classifiers the tcf_result is copied over
from a filter template e.g. *res = f->res which then implicitly overrides
the prior SKB_DROP_REASON_TC_{INGRESS,EGRESS} default drop code which was
set via sch_handle_{ingress,egress}() for kfree_skb_reason().

Commit text above copied verbatim from Daniel. The general idea of the patch
is not very different from what Ido originally posted but instead done at the
cls_api codepath.

Fixes: 54a59aed395c ("net, sched: Make tc-related drop reason more flexible")
Reported-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/netdev/ZTjY959R+AFXf3Xy@shredder
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 989b52cd 09-Jul-2023 Azeem Shaikh <azeemshaikh38@gmail.com>

net: sched: Replace strlcpy with strscpy

strlcpy() reads the entire source buffer first.
This read may exceed the destination size limit.
This is both inefficient and can lead to linear read
overflows if a source string is not NUL-terminated [1].
In an effort to remove strlcpy() completely [2], replace
strlcpy() here with strscpy().

Direct replacement is safe here since return value of -errno
is used to check for truncation instead of sizeof(dest).

[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy
[2] https://github.com/KSPP/linux/issues/89

Signed-off-by: Azeem Shaikh <azeemshaikh38@gmail.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# fcb3a465 21-Mar-2023 Pedro Tammela <pctammela@mojatatu.com>

net/sched: act_api: use the correct TCA_ACT attributes in dump

4 places in the act api code are using 'TCA_' definitions where they
should be using 'TCA_ACT_', which is confusing for the reader, although
functionally they are equivalent.

Cc: Hangbin Liu <haliu@redhat.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Acked-by: Hangbin Liu <haliu@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2f59823f 15-Mar-2023 Hangbin Liu <liuhangbin@gmail.com>

net/sched: act_api: add specific EXT_WARN_MSG for tc action

In my previous commit 0349b8779cc9 ("sched: add new attr TCA_EXT_WARN_MSG
to report tc extact message") I didn't notice the tc action use different
enum with filter. So we can't use TCA_EXT_WARN_MSG directly for tc action.
Let's add a TCA_ROOT_EXT_WARN_MSG for tc action specifically and put this
param before going to the TCA_ACT_TAB nest.

Fixes: 0349b8779cc9 ("sched: add new attr TCA_EXT_WARN_MSG to report tc extact message")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 8de2bd02 15-Mar-2023 Hangbin Liu <liuhangbin@gmail.com>

Revert "net/sched: act_api: move TCA_EXT_WARN_MSG to the correct hierarchy"

This reverts commit 923b2e30dc9cd05931da0f64e2e23d040865c035.

This is not a correct fix as TCA_EXT_WARN_MSG is not a hierarchy to
TCA_ACT_TAB. I didn't notice the TC actions use different enum when adding
TCA_EXT_WARN_MSG. To fix the difference I will add a new WARN enum in
TCA_ROOT_MAX as Jamal suggested.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 923b2e30 24-Feb-2023 Pedro Tammela <pctammela@mojatatu.com>

net/sched: act_api: move TCA_EXT_WARN_MSG to the correct hierarchy

TCA_EXT_WARN_MSG is currently sitting outside of the expected hierarchy
for the tc actions code. It should sit within TCA_ACT_TAB.

Fixes: 0349b8779cc9 ("sched: add new attr TCA_EXT_WARN_MSG to report tc extact message")
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 80cd22c3 17-Feb-2023 Paul Blakey <paulb@nvidia.com>

net/sched: cls_api: Support hardware miss to tc action

For drivers to support partial offload of a filter's action list,
add support for action miss to specify an action instance to
continue from in sw.

CT action in particular can't be fully offloaded, as new connections
need to be handled in software. This imposes other limitations on
the actions that can be offloaded together with the CT action, such
as packet modifications.

Assign each action on a filter's action list a unique miss_cookie
which drivers can then use to fill action_miss part of the tc skb
extension. On getting back this miss_cookie, find the action
instance with relevant cookie and continue classifying from there.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# db4b4902 17-Feb-2023 Paul Blakey <paulb@nvidia.com>

net/sched: Rename user cookie and act cookie

struct tc_action->act_cookie is a user defined cookie,
and the related struct flow_action_entry->act_cookie is
used as an handle similar to struct flow_cls_offload->cookie.

Rename tc_action->act_cookie to user_cookie, and
flow_action_entry->act_cookie to cookie so their names
would better fit their usage.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 5246c896 12-Feb-2023 Oz Shlomo <ozsh@nvidia.com>

net/sched: support per action hw stats

There are currently two mechanisms for populating hardware stats:
1. Using flow_offload api to query the flow's statistics.
The api assumes that the same stats values apply to all
the flow's actions.
This assumption breaks when action drops or jumps over following
actions.
2. Using hw_action api to query specific action stats via a driver
callback method. This api assures the correct action stats for
the offloaded action, however, it does not apply to the rest of the
actions in the flow's actions array.

Extend the flow_offload stats callback to indicate that a per action
stats update is required.
Use the existing flow_offload_action api to query the action's hw stats.
In addition, currently the tc action stats utility only updates hw actions.
Reuse the existing action stats cb infrastructure to query any action
stats.

Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>


# d307b2c6 12-Feb-2023 Oz Shlomo <ozsh@nvidia.com>

net/sched: introduce flow_offload action cookie

Currently a hardware action is uniquely identified by the <id, hw_index>
tuple. However, the id is set by the flow_act_setup callback and tc core
cannot enforce this, and it is possible that a future change could break
this. In addition, <id, hw_index> are not unique across network namespaces.

Uniquely identify the action by setting an action cookie by the tc core.
Use the unique action cookie to query the action's hardware stats.

Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>


# 8f2ca70c 12-Feb-2023 Oz Shlomo <ozsh@nvidia.com>

net/sched: optimize action stats api calls

Currently the hw action stats update is called from tcf_exts_hw_stats_update,
when a tc filter is dumped, and from tcf_action_copy_stats, when a hw
action is dumped.
However, the tcf_action_copy_stats is also called from tcf_action_dump.
As such, the hw action stats update cb is called 3 times for every
tc flower filter dump.

Move the tc action hw stats update from tcf_action_copy_stats to
tcf_dump_walker to update the hw action stats when tc action is dumped.

Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>


# 0349b8779 12-Jan-2023 Hangbin Liu <liuhangbin@gmail.com>

sched: add new attr TCA_EXT_WARN_MSG to report tc extact message

We will report extack message if there is an error via netlink_ack(). But
if the rule is not to be exclusively executed by the hardware, extack is not
passed along and offloading failures don't get logged.

In commit 81c7288b170a ("sched: cls: enable verbose logging") Marcelo
made cls could log verbose info for offloading failures, which helps
improving Open vSwitch debuggability when using flower offloading.

It would also be helpful if userspace monitor tools, like "tc monitor",
could log this kind of message, as it doesn't require vswitchd log level
adjusment. Let's add a new tc attributes to report the extack message so
the monitor program could receive the failures. e.g.

# tc monitor
added chain dev enp3s0f1np1 parent ffff: chain 0
added filter dev enp3s0f1np1 ingress protocol all pref 49152 flower chain 0 handle 0x1
ct_state +trk+new
not_in_hw
action order 1: gact action drop
random type none pass val 0
index 1 ref 1 bind 1

Warning: mlx5_core: matching on ct_state +new isn't supported.

In this patch I only report the extack message on add/del operations.
It doesn't look like we need to report the extack message on get/dump
operations.

Note this message not only reporte to multicast groups, it could also
be reported unicast, which may affect the current usersapce tool's behaivor.

Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20230113034353.2766735-1-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>


# 871cf386 06-Dec-2022 Pedro Tammela <pctammela@mojatatu.com>

net/sched: avoid indirect act functions on retpoline kernels

Expose the necessary tc act functions and wire up act_api to use
direct calls in retpoline kernels.

Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# fae52d93 07-Sep-2022 Zhengchao Shao <shaozhengchao@huawei.com>

net: sched: act_api: implement generic walker and search for tc action

Being able to get tc_action_net by using net_id stored in tc_action_ops
and execute the generic walk/search function, add __tcf_generic_walker()
and __tcf_idr_search() helpers.

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 76b39b94 23-Jun-2022 Victor Nogueira <victor@mojatatu.com>

net/sched: act_api: Notify user space if any actions were flushed before error

If during an action flush operation one of the actions is still being
referenced, the flush operation is aborted and the kernel returns to
user space with an error. However, if the kernel was able to flush, for
example, 3 actions and failed on the fourth, the kernel will not notify
user space that it deleted 3 actions before failing.

This patch fixes that behaviour by notifying user space of how many
actions were deleted before flush failed and by setting extack with a
message describing what happened.

Fixes: 55334a5db5cd ("net_sched: act: refuse to remove bound action outside")
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# c2ccf84e 07-Apr-2022 Ido Schimmel <idosch@nvidia.com>

net/sched: act_api: Add extack to offload_act_setup() callback

The callback is used by various actions to populate the flow action
structure prior to offload. Pass extack to this callback so that the
various actions will be able to report accurate error messages to user
space.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d922a99b 01-Mar-2022 Baowen Zheng <baowen.zheng@corigine.com>

flow_offload: improve extack msg for user when adding invalid filter

Add extack message to return exact message to user when adding invalid
filter with conflict flags for TC action.

In previous implement we just return EINVAL which is confusing for user.

Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Link: https://lore.kernel.org/r/1646191769-17761-1-git-send-email-baowen.zheng@corigine.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# ecf4a24c 22-Feb-2022 Wan Jiabing <wanjiabing@vivo.com>

net: sched: avoid newline at end of message in NL_SET_ERR_MSG_MOD

Fix following coccicheck warning:
./net/sched/act_api.c:277:7-49: WARNING avoid newline at end of message
in NL_SET_ERR_MSG_MOD

Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5740d068 15-Feb-2022 Eric Dumazet <edumazet@google.com>

net: sched: limit TC_ACT_REPEAT loops

We have been living dangerously, at the mercy of malicious users,
abusing TC_ACT_REPEAT, as shown by this syzpot report [1].

Add an arbitrary limit (32) to the number of times an action can
return TC_ACT_REPEAT.

v2: switch the limit to 32 instead of 10.
Use net_warn_ratelimited() instead of pr_err_once().

[1] (C repro available on demand)

rcu: INFO: rcu_preempt self-detected stall on CPU
rcu: 1-...!: (10500 ticks this GP) idle=021/1/0x4000000000000000 softirq=5592/5592 fqs=0
(t=10502 jiffies g=5305 q=190)
rcu: rcu_preempt kthread timer wakeup didn't happen for 10502 jiffies! g5305 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
rcu: Possible timer handling issue on cpu=0 timer-softirq=3527
rcu: rcu_preempt kthread starved for 10505 jiffies! g5305 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_preempt state:I stack:29344 pid: 14 ppid: 2 flags:0x00004000
Call Trace:
<TASK>
context_switch kernel/sched/core.c:4986 [inline]
__schedule+0xab2/0x4db0 kernel/sched/core.c:6295
schedule+0xd2/0x260 kernel/sched/core.c:6368
schedule_timeout+0x14a/0x2a0 kernel/time/timer.c:1881
rcu_gp_fqs_loop+0x186/0x810 kernel/rcu/tree.c:1963
rcu_gp_kthread+0x1de/0x320 kernel/rcu/tree.c:2136
kthread+0x2e9/0x3a0 kernel/kthread.c:377
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
</TASK>
rcu: Stack dump where RCU GP kthread last ran:
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 PID: 3646 Comm: syz-executor358 Not tainted 5.17.0-rc3-syzkaller-00149-gbf8e59fd315f #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:rep_nop arch/x86/include/asm/vdso/processor.h:13 [inline]
RIP: 0010:cpu_relax arch/x86/include/asm/vdso/processor.h:18 [inline]
RIP: 0010:pv_wait_head_or_lock kernel/locking/qspinlock_paravirt.h:437 [inline]
RIP: 0010:__pv_queued_spin_lock_slowpath+0x3b8/0xb40 kernel/locking/qspinlock.c:508
Code: 48 89 eb c6 45 01 01 41 bc 00 80 00 00 48 c1 e9 03 83 e3 07 41 be 01 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 2c 01 eb 0c <f3> 90 41 83 ec 01 0f 84 72 04 00 00 41 0f b6 45 00 38 d8 7f 08 84
RSP: 0018:ffffc9000283f1b0 EFLAGS: 00000206
RAX: 0000000000000003 RBX: 0000000000000000 RCX: 1ffff1100fc0071e
RDX: 0000000000000001 RSI: 0000000000000201 RDI: 0000000000000000
RBP: ffff88807e0038f0 R08: 0000000000000001 R09: ffffffff8ffbf9ff
R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000004c1e
R13: ffffed100fc0071e R14: 0000000000000001 R15: ffff8880b9c3aa80
FS: 00005555562bf300(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffdbfef12b8 CR3: 00000000723c2000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
pv_queued_spin_lock_slowpath arch/x86/include/asm/paravirt.h:591 [inline]
queued_spin_lock_slowpath arch/x86/include/asm/qspinlock.h:51 [inline]
queued_spin_lock include/asm-generic/qspinlock.h:85 [inline]
do_raw_spin_lock+0x200/0x2b0 kernel/locking/spinlock_debug.c:115
spin_lock_bh include/linux/spinlock.h:354 [inline]
sch_tree_lock include/net/sch_generic.h:610 [inline]
sch_tree_lock include/net/sch_generic.h:605 [inline]
prio_tune+0x3b9/0xb50 net/sched/sch_prio.c:211
prio_init+0x5c/0x80 net/sched/sch_prio.c:244
qdisc_create.constprop.0+0x44a/0x10f0 net/sched/sch_api.c:1253
tc_modify_qdisc+0x4c5/0x1980 net/sched/sch_api.c:1660
rtnetlink_rcv_msg+0x413/0xb80 net/core/rtnetlink.c:5594
netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2494
netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
netlink_unicast+0x539/0x7e0 net/netlink/af_netlink.c:1343
netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1919
sock_sendmsg_nosec net/socket.c:705 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:725
____sys_sendmsg+0x6e8/0x810 net/socket.c:2413
___sys_sendmsg+0xf3/0x170 net/socket.c:2467
__sys_sendmsg+0xe5/0x1b0 net/socket.c:2496
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f7ee98aae99
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 41 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffdbfef12d8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007ffdbfef1300 RCX: 00007f7ee98aae99
RDX: 0000000000000000 RSI: 0000000020000000 RDI: 0000000000000003
RBP: 0000000000000000 R08: 000000000000000d R09: 000000000000000d
R10: 000000000000000d R11: 0000000000000246 R12: 00007ffdbfef12f0
R13: 00000000000f4240 R14: 000000000004ca47 R15: 00007ffdbfef12e4
</TASK>
INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 2.293 msecs
NMI backtrace for cpu 1
CPU: 1 PID: 3260 Comm: kworker/1:3 Not tainted 5.17.0-rc3-syzkaller-00149-gbf8e59fd315f #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: mld mld_ifc_work
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
nmi_cpu_backtrace.cold+0x47/0x144 lib/nmi_backtrace.c:111
nmi_trigger_cpumask_backtrace+0x1b3/0x230 lib/nmi_backtrace.c:62
trigger_single_cpu_backtrace include/linux/nmi.h:164 [inline]
rcu_dump_cpu_stacks+0x25e/0x3f0 kernel/rcu/tree_stall.h:343
print_cpu_stall kernel/rcu/tree_stall.h:604 [inline]
check_cpu_stall kernel/rcu/tree_stall.h:688 [inline]
rcu_pending kernel/rcu/tree.c:3919 [inline]
rcu_sched_clock_irq.cold+0x5c/0x759 kernel/rcu/tree.c:2617
update_process_times+0x16d/0x200 kernel/time/timer.c:1785
tick_sched_handle+0x9b/0x180 kernel/time/tick-sched.c:226
tick_sched_timer+0x1b0/0x2d0 kernel/time/tick-sched.c:1428
__run_hrtimer kernel/time/hrtimer.c:1685 [inline]
__hrtimer_run_queues+0x1c0/0xe50 kernel/time/hrtimer.c:1749
hrtimer_interrupt+0x31c/0x790 kernel/time/hrtimer.c:1811
local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1086 [inline]
__sysvec_apic_timer_interrupt+0x146/0x530 arch/x86/kernel/apic/apic.c:1103
sysvec_apic_timer_interrupt+0x8e/0xc0 arch/x86/kernel/apic/apic.c:1097
</IRQ>
<TASK>
asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638
RIP: 0010:__sanitizer_cov_trace_const_cmp4+0xc/0x70 kernel/kcov.c:286
Code: 00 00 00 48 89 7c 30 e8 48 89 4c 30 f0 4c 89 54 d8 20 48 89 10 5b c3 0f 1f 80 00 00 00 00 41 89 f8 bf 03 00 00 00 4c 8b 14 24 <89> f1 65 48 8b 34 25 00 70 02 00 e8 14 f9 ff ff 84 c0 74 4b 48 8b
RSP: 0018:ffffc90002c5eea8 EFLAGS: 00000246
RAX: 0000000000000007 RBX: ffff88801c625800 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003
RBP: ffff8880137d3100 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffff874fcd88 R11: 0000000000000000 R12: ffff88801d692dc0
R13: ffff8880137d3104 R14: 0000000000000000 R15: ffff88801d692de8
tcf_police_act+0x358/0x11d0 net/sched/act_police.c:256
tcf_action_exec net/sched/act_api.c:1049 [inline]
tcf_action_exec+0x1a6/0x530 net/sched/act_api.c:1026
tcf_exts_exec include/net/pkt_cls.h:326 [inline]
route4_classify+0xef0/0x1400 net/sched/cls_route.c:179
__tcf_classify net/sched/cls_api.c:1549 [inline]
tcf_classify+0x3e8/0x9d0 net/sched/cls_api.c:1615
prio_classify net/sched/sch_prio.c:42 [inline]
prio_enqueue+0x3a7/0x790 net/sched/sch_prio.c:75
dev_qdisc_enqueue+0x40/0x300 net/core/dev.c:3668
__dev_xmit_skb net/core/dev.c:3756 [inline]
__dev_queue_xmit+0x1f61/0x3660 net/core/dev.c:4081
neigh_hh_output include/net/neighbour.h:533 [inline]
neigh_output include/net/neighbour.h:547 [inline]
ip_finish_output2+0x14dc/0x2170 net/ipv4/ip_output.c:228
__ip_finish_output net/ipv4/ip_output.c:306 [inline]
__ip_finish_output+0x396/0x650 net/ipv4/ip_output.c:288
ip_finish_output+0x32/0x200 net/ipv4/ip_output.c:316
NF_HOOK_COND include/linux/netfilter.h:296 [inline]
ip_output+0x196/0x310 net/ipv4/ip_output.c:430
dst_output include/net/dst.h:451 [inline]
ip_local_out+0xaf/0x1a0 net/ipv4/ip_output.c:126
iptunnel_xmit+0x628/0xa50 net/ipv4/ip_tunnel_core.c:82
geneve_xmit_skb drivers/net/geneve.c:966 [inline]
geneve_xmit+0x10c8/0x3530 drivers/net/geneve.c:1077
__netdev_start_xmit include/linux/netdevice.h:4683 [inline]
netdev_start_xmit include/linux/netdevice.h:4697 [inline]
xmit_one net/core/dev.c:3473 [inline]
dev_hard_start_xmit+0x1eb/0x920 net/core/dev.c:3489
__dev_queue_xmit+0x2985/0x3660 net/core/dev.c:4116
neigh_hh_output include/net/neighbour.h:533 [inline]
neigh_output include/net/neighbour.h:547 [inline]
ip6_finish_output2+0xf7a/0x14f0 net/ipv6/ip6_output.c:126
__ip6_finish_output net/ipv6/ip6_output.c:191 [inline]
__ip6_finish_output+0x61e/0xe90 net/ipv6/ip6_output.c:170
ip6_finish_output+0x32/0x200 net/ipv6/ip6_output.c:201
NF_HOOK_COND include/linux/netfilter.h:296 [inline]
ip6_output+0x1e4/0x530 net/ipv6/ip6_output.c:224
dst_output include/net/dst.h:451 [inline]
NF_HOOK include/linux/netfilter.h:307 [inline]
NF_HOOK include/linux/netfilter.h:301 [inline]
mld_sendpack+0x9a3/0xe40 net/ipv6/mcast.c:1826
mld_send_cr net/ipv6/mcast.c:2127 [inline]
mld_ifc_work+0x71c/0xdc0 net/ipv6/mcast.c:2659
process_one_work+0x9ac/0x1650 kernel/workqueue.c:2307
worker_thread+0x657/0x1110 kernel/workqueue.c:2454
kthread+0x2e9/0x3a0 kernel/kthread.c:377
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
</TASK>
----------------
Code disassembly (best guess):
0: 48 89 eb mov %rbp,%rbx
3: c6 45 01 01 movb $0x1,0x1(%rbp)
7: 41 bc 00 80 00 00 mov $0x8000,%r12d
d: 48 c1 e9 03 shr $0x3,%rcx
11: 83 e3 07 and $0x7,%ebx
14: 41 be 01 00 00 00 mov $0x1,%r14d
1a: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax
21: fc ff df
24: 4c 8d 2c 01 lea (%rcx,%rax,1),%r13
28: eb 0c jmp 0x36
* 2a: f3 90 pause <-- trapping instruction
2c: 41 83 ec 01 sub $0x1,%r12d
30: 0f 84 72 04 00 00 je 0x4a8
36: 41 0f b6 45 00 movzbl 0x0(%r13),%eax
3b: 38 d8 cmp %bl,%al
3d: 7f 08 jg 0x47
3f: 84 .byte 0x84

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Reported-by: syzbot <syzkaller@googlegroups.com>
Link: https://lore.kernel.org/r/20220215235305.3272331-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 963178a0 21-Dec-2021 Baowen Zheng <baowen.zheng@corigine.com>

flow_offload: fix suspicious RCU usage when offloading tc action

Fix suspicious rcu_dereference_protected() usage when offloading tc action.

We should hold tcfa_lock to offload tc action in action initiation.

Without these changes, the following warning will be observed:

WARNING: suspicious RCU usage
5.16.0-rc5-net-next-01504-g7d1f236dcffa-dirty #50 Tainted: G I
-----------------------------
include/net/tc_act/tc_tunnel_key.h:33 suspicious rcu_dereference_protected() usage!
1 lock held by tc/12108:
CPU: 4 PID: 12108 Comm: tc Tainted: G
Hardware name: Dell Inc. PowerEdge R740/07WCGN, BIOS 1.6.11 11/20/2018
Call Trace:
<TASK>
dump_stack_lvl+0x49/0x5e
dump_stack+0x10/0x12
lockdep_rcu_suspicious+0xed/0xf8
tcf_tunnel_key_offload_act_setup+0x1de/0x300 [act_tunnel_key]
tcf_action_offload_add_ex+0xc0/0x1f0
tcf_action_init+0x26a/0x2f0
tcf_action_add+0xa9/0x1f0
tc_ctl_action+0xfb/0x170
rtnetlink_rcv_msg+0x169/0x510
? sched_clock+0x9/0x10
? rtnl_newlink+0x70/0x70
netlink_rcv_skb+0x55/0x100
rtnetlink_rcv+0x15/0x20
netlink_unicast+0x1a8/0x270
netlink_sendmsg+0x245/0x490
sock_sendmsg+0x65/0x70
____sys_sendmsg+0x219/0x260
? __import_iovec+0x2c/0x150
___sys_sendmsg+0xb7/0x100
? __lock_acquire+0x3d5/0x1f40
? __this_cpu_preempt_check+0x13/0x20
? lock_is_held_type+0xe4/0x140
? sched_clock+0x9/0x10
? ktime_get_coarse_real_ts64+0xbe/0xd0
? __this_cpu_preempt_check+0x13/0x20
? lockdep_hardirqs_on+0x7e/0x100
? ktime_get_coarse_real_ts64+0xbe/0xd0
? trace_hardirqs_on+0x2a/0xf0
__sys_sendmsg+0x5a/0xa0
? syscall_trace_enter.constprop.0+0x1dd/0x220
__x64_sys_sendmsg+0x1f/0x30
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f4db7bb7a60

Fixes: 8cbfe939abe9 ("flow_offload: allow user to offload tc action to net device")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c86e0209 17-Dec-2021 Baowen Zheng <baowen.zheng@corigine.com>

flow_offload: validate flags of filter and actions

Add process to validate flags of filter and actions when adding
a tc filter.

We need to prevent adding filter with flags conflicts with its actions.

Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 13926d19 17-Dec-2021 Baowen Zheng <baowen.zheng@corigine.com>

flow_offload: add reoffload process to update hw_count

Add reoffload process to update hw_count when driver
is inserted or removed.

We will delete the action if it is with skip_sw flag and
not offloaded to any hardware in reoffload process.

When reoffloading actions, we still offload the actions
that are added independent of filters.

Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e8cb5bcf 17-Dec-2021 Baowen Zheng <baowen.zheng@corigine.com>

net: sched: save full flags for tc action

Save full action flags and return user flags when return flags to
user space.

Save full action flags to distinguish if the action is created
independent from classifier.

We made this change mainly for further patch to reoffload tc actions.

Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c7a66f8d 17-Dec-2021 Baowen Zheng <baowen.zheng@corigine.com>

flow_offload: add process to update action stats from hardware

When collecting stats for actions update them using both
hardware and software counters.

Stats update process should not run in context of preempt_disable.

Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7adc5765 17-Dec-2021 Baowen Zheng <baowen.zheng@corigine.com>

flow_offload: add skip_hw and skip_sw to control if offload the action

We add skip_hw and skip_sw for user to control if offload the action
to hardware.

We also add in_hw_count for user to indicate if the action is offloaded
to any hardware.

Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8cbfe939 17-Dec-2021 Baowen Zheng <baowen.zheng@corigine.com>

flow_offload: allow user to offload tc action to net device

Use flow_indr_dev_register/flow_indr_dev_setup_offload to
offload tc action.

We need to call tc_cleanup_flow_action to clean up tc action entry since
in tc_setup_action, some actions may hold dev refcnt, especially the mirror
action.

Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 29cbcd85 16-Oct-2021 Ahmed S. Darwish <a.darwish@linutronix.de>

net: sched: Remove Qdisc::running sequence counter

The Qdisc::running sequence counter has two uses:

1. Reliably reading qdisc's tc statistics while the qdisc is running
(a seqcount read/retry loop at gnet_stats_add_basic()).

2. As a flag, indicating whether the qdisc in question is running
(without any retry loops).

For the first usage, the Qdisc::running sequence counter write section,
qdisc_run_begin() => qdisc_run_end(), covers a much wider area than what
is actually needed: the raw qdisc's bstats update. A u64_stats sync
point was thus introduced (in previous commits) inside the bstats
structure itself. A local u64_stats write section is then started and
stopped for the bstats updates.

Use that u64_stats sync point mechanism for the bstats read/retry loop
at gnet_stats_add_basic().

For the second qdisc->running usage, a __QDISC_STATE_RUNNING bit flag,
accessed with atomic bitops, is sufficient. Using a bit flag instead of
a sequence counter at qdisc_run_begin/end() and qdisc_is_running() leads
to the SMP barriers implicitly added through raw_read_seqcount() and
write_seqcount_begin/end() getting removed. All call sites have been
surveyed though, and no required ordering was identified.

Now that the qdisc->running sequence counter is no longer used, remove
it.

Note, using u64_stats implies no sequence counter protection for 64-bit
architectures. This can lead to the qdisc tc statistics "packets" vs.
"bytes" values getting out of sync on rare occasions. The individual
values will still be valid.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 50dc9a85 16-Oct-2021 Ahmed S. Darwish <a.darwish@linutronix.de>

net: sched: Merge Qdisc::bstats and Qdisc::cpu_bstats data types

The only factor differentiating per-CPU bstats data type (struct
gnet_stats_basic_cpu) from the packed non-per-CPU one (struct
gnet_stats_basic_packed) was a u64_stats sync point inside the former.
The two data types are now equivalent: earlier commits added a u64_stats
sync point to the latter.

Combine both data types into "struct gnet_stats_basic_sync". This
eliminates redundancy and simplifies the bstats read/write APIs.

Use u64_stats_t for bstats "packets" and "bytes" data types. On 64-bit
architectures, u64_stats sync points do not use sequence counter
protection.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 67c9e6270 16-Oct-2021 Ahmed S. Darwish <a.darwish@linutronix.de>

net: sched: Protect Qdisc::bstats with u64_stats

The not-per-CPU variant of qdisc tc (traffic control) statistics,
Qdisc::gnet_stats_basic_packed bstats, is protected with Qdisc::running
sequence counter.

This sequence counter is used for reliably protecting bstats reads from
parallel writes. Meanwhile, the seqcount's write section covers a much
wider area than bstats update: qdisc_run_begin() => qdisc_run_end().

That read/write section asymmetry can lead to needless retries of the
read section. To prepare for removing the Qdisc::running sequence
counter altogether, introduce a u64_stats sync point inside bstats
instead.

Modify _bstats_update() to start/end the bstats u64_stats write
section.

For bisectability, and finer commits granularity, the bstats read
section is still protected with a Qdisc::running read/retry loop and
qdisc_run_begin/end() still starts/ends that seqcount write section.
Once all call sites are modified to use _bstats_update(), the
Qdisc::running seqcount will be removed and bstats read/retry loop will
be modified to utilize the internal u64_stats sync point.

Note, using u64_stats implies no sequence counter protection for 64-bit
architectures. This can lead to the statistics "packets" vs. "bytes"
values getting out of sync on rare occasions. The individual values will
still be valid.

[bigeasy: Minor commit message edits, init all gnet_stats_basic_packed.]

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 695176bf 29-Jul-2021 Cong Wang <cong.wang@bytedance.com>

net_sched: refactor TC action init API

TC action ->init() API has 10 parameters, it becomes harder
to read. Some of them are just boolean and can be replaced
by flags. Similarly for the internal API tcf_action_init()
and tcf_exts_validate().

This patch converts them to flags and fold them into
the upper 16 bits of "flags", whose lower 16 bits are still
reserved for user-space. More specifically, the following
kernel flags are introduced:

TCA_ACT_FLAGS_POLICE replace 'name' in a few contexts, to
distinguish whether it is compatible with policer.

TCA_ACT_FLAGS_BIND replaces 'bind', to indicate whether
this action is bound to a filter.

TCA_ACT_FLAGS_REPLACE replaces 'ovr' in most contexts,
means we are replacing an existing action.

TCA_ACT_FLAGS_NO_RTNL replaces 'rtnl_held' but has the
opposite meaning, because we still hold RTNL in most
cases.

The only user-space flag TCA_ACT_FLAGS_NO_PERCPU_STATS is
untouched and still stored as before.

I have tested this patch with tdc and I do not see any
failure related to this patch.

Tested-by: Vlad Buslov <vladbu@nvidia.com>
Acked-by: Jamal Hadi Salim<jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f79a3bcb 15-Jul-2021 Yajun Deng <yajun.deng@linux.dev>

net/sched: Remove unnecessary if statement

It has been deal with the 'if (err' statement in rtnetlink_send()
and rtnl_unicast(). so remove unnecessary if statement.

v2: use the raw name rtnetlink_send().

Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 55d96f72 17-Jun-2021 Yang Yingliang <yangyingliang@huawei.com>

net: sched: fix error return code in tcf_del_walker()

When nla_put_u32() fails, 'ret' could be 0, it should
return error code in tcf_del_walker().

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b3650bf7 07-Apr-2021 Vlad Buslov <vladbu@nvidia.com>

net: sched: fix err handler in tcf_action_init()

With recent changes that separated action module load from action
initialization tcf_action_init() function error handling code was modified
to manually release the loaded modules if loading/initialization of any
further action in same batch failed. For the case when all modules
successfully loaded and some of the actions were initialized before one of
them failed in init handler. In this case for all previous actions the
module will be released twice by the error handler: First time by the loop
that manually calls module_put() for all ops, and second time by the action
destroy code that puts the module after destroying the action.

Reproduction:

$ sudo tc actions add action simple sdata \"2\" index 2
$ sudo tc actions add action simple sdata \"1\" index 1 \
action simple sdata \"2\" index 2
RTNETLINK answers: File exists
We have an error talking to the kernel
$ sudo tc actions ls action simple
total acts 1

action order 0: Simple <"2">
index 2 ref 1 bind 0
$ sudo tc actions flush action simple
$ sudo tc actions ls action simple
$ sudo tc actions add action simple sdata \"2\" index 2
Error: Failed to load TC action module.
We have an error talking to the kernel
$ lsmod | grep simple
act_simple 20480 -1

Fix the issue by modifying module reference counting handling in action
initialization code:

- Get module reference in tcf_idr_create() and put it in tcf_idr_release()
instead of taking over the reference held by the caller.

- Modify users of tcf_action_init_1() to always release the module
reference which they obtain before calling init function instead of
assuming that created action takes over the reference.

- Finally, modify tcf_action_init_1() to not release the module reference
when overwriting existing action as this is no longer necessary since both
upper and lower layers obtain and manage their own module references
independently.

Fixes: d349f9976868 ("net_sched: fix RTNL deadlock again caused by request_module()")
Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 87c750e8 07-Apr-2021 Vlad Buslov <vladbu@nvidia.com>

net: sched: fix action overwrite reference counting

Action init code increments reference counter when it changes an action.
This is the desired behavior for cls API which needs to obtain action
reference for every classifier that points to action. However, act API just
needs to change the action and releases the reference before returning.
This sequence breaks when the requested action doesn't exist, which causes
act API init code to create new action with specified index, but action is
still released before returning and is deleted (unless it was referenced
concurrently by cls API).

Reproduction:

$ sudo tc actions ls action gact
$ sudo tc actions change action gact drop index 1
$ sudo tc actions ls action gact

Extend tcf_action_init() to accept 'init_res' array and initialize it with
action->ops->init() result. In tcf_action_add() remove pointers to created
actions from actions array before passing it to tcf_action_put_many().

Fixes: cae422f379f3 ("net: sched: use reference counting action init")
Reported-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4ba86128 07-Apr-2021 Vlad Buslov <vladbu@nvidia.com>

Revert "net: sched: bump refcount for new action in ACT replace mode"

This reverts commit 6855e8213e06efcaf7c02a15e12b1ae64b9a7149.

Following commit in series fixes the issue without introducing regression
in error rollback of tcf_action_destroy().

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6855e821 29-Mar-2021 Kumar Kartikeya Dwivedi <memxor@gmail.com>

net: sched: bump refcount for new action in ACT replace mode

Currently, action creation using ACT API in replace mode is buggy.
When invoking for non-existent action index 42,

tc action replace action bpf obj foo.o sec <xyz> index 42

kernel creates the action, fills up the netlink response, and then just
deletes the action after notifying userspace.

tc action show action bpf

doesn't list the action.

This happens due to the following sequence when ovr = 1 (replace mode)
is enabled:

tcf_idr_check_alloc is used to atomically check and either obtain
reference for existing action at index, or reserve the index slot using
a dummy entry (ERR_PTR(-EBUSY)).

This is necessary as pointers to these actions will be held after
dropping the idrinfo lock, so bumping the reference count is necessary
as we need to insert the actions, and notify userspace by dumping their
attributes. Finally, we drop the reference we took using the
tcf_action_put_many call in tcf_action_add. However, for the case where
a new action is created due to free index, its refcount remains one.
This when paired with the put_many call leads to the kernel setting up
the action, notifying userspace of its creation, and then tearing it
down. For existing actions, the refcount is still held so they remain
unaffected.

Fortunately due to rtnl_lock serialization requirement, such an action
with refcount == 1 will not be concurrently deleted by anything else, at
best CLS API can move its refcount up and down by binding to it after it
has been published from tcf_idr_insert_many. Since refcount is atleast
one until put_many call, CLS API cannot delete it. Also __tcf_action_put
release path already ensures deterministic outcome (either new action
will be created or existing action will be reused in case CLS API tries
to bind to action concurrently) due to idr lock serialization.

We fix this by making refcount of newly created actions as 2 in ACT API
replace mode. A relaxed store will suffice as visibility is ensured only
after the tcf_idr_insert_many call.

Note that in case of creation or overwriting using CLS API only (i.e.
bind = 1), overwriting existing action object is not allowed, and any
such request is silently ignored (without error).

The refcount bump that occurs in tcf_idr_check_alloc call there for
existing action will pair with tcf_exts_destroy call made from the
owner module for the same action. In case of action creation, there
is no existing action, so no tcf_exts_destroy callback happens.

This means no code changes for CLS API.

Fixes: cae422f379f3 ("net: sched: use reference counting action init")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 396d7f23 16-Feb-2021 Vlad Buslov <vladbu@nvidia.com>

net: sched: fix police ext initialization

When police action is created by cls API tcf_exts_validate() first
conditional that calls tcf_action_init_1() directly, the action idr is not
updated according to latest changes in action API that require caller to
commit newly created action to idr with tcf_idr_insert_many(). This results
such action not being accessible through act API and causes crash reported
by syzbot:

==================================================================
BUG: KASAN: null-ptr-deref in instrument_atomic_read include/linux/instrumented.h:71 [inline]
BUG: KASAN: null-ptr-deref in atomic_read include/asm-generic/atomic-instrumented.h:27 [inline]
BUG: KASAN: null-ptr-deref in __tcf_idr_release net/sched/act_api.c:178 [inline]
BUG: KASAN: null-ptr-deref in tcf_idrinfo_destroy+0x129/0x1d0 net/sched/act_api.c:598
Read of size 4 at addr 0000000000000010 by task kworker/u4:5/204

CPU: 0 PID: 204 Comm: kworker/u4:5 Not tainted 5.11.0-rc7-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: netns cleanup_net
Call Trace:
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x107/0x163 lib/dump_stack.c:120
__kasan_report mm/kasan/report.c:400 [inline]
kasan_report.cold+0x5f/0xd5 mm/kasan/report.c:413
check_memory_region_inline mm/kasan/generic.c:179 [inline]
check_memory_region+0x13d/0x180 mm/kasan/generic.c:185
instrument_atomic_read include/linux/instrumented.h:71 [inline]
atomic_read include/asm-generic/atomic-instrumented.h:27 [inline]
__tcf_idr_release net/sched/act_api.c:178 [inline]
tcf_idrinfo_destroy+0x129/0x1d0 net/sched/act_api.c:598
tc_action_net_exit include/net/act_api.h:151 [inline]
police_exit_net+0x168/0x360 net/sched/act_police.c:390
ops_exit_list+0x10d/0x160 net/core/net_namespace.c:190
cleanup_net+0x4ea/0xb10 net/core/net_namespace.c:604
process_one_work+0x98d/0x15f0 kernel/workqueue.c:2275
worker_thread+0x64c/0x1120 kernel/workqueue.c:2421
kthread+0x3b1/0x4a0 kernel/kthread.c:292
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296
==================================================================
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 204 Comm: kworker/u4:5 Tainted: G B 5.11.0-rc7-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: netns cleanup_net
Call Trace:
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x107/0x163 lib/dump_stack.c:120
panic+0x306/0x73d kernel/panic.c:231
end_report+0x58/0x5e mm/kasan/report.c:100
__kasan_report mm/kasan/report.c:403 [inline]
kasan_report.cold+0x67/0xd5 mm/kasan/report.c:413
check_memory_region_inline mm/kasan/generic.c:179 [inline]
check_memory_region+0x13d/0x180 mm/kasan/generic.c:185
instrument_atomic_read include/linux/instrumented.h:71 [inline]
atomic_read include/asm-generic/atomic-instrumented.h:27 [inline]
__tcf_idr_release net/sched/act_api.c:178 [inline]
tcf_idrinfo_destroy+0x129/0x1d0 net/sched/act_api.c:598
tc_action_net_exit include/net/act_api.h:151 [inline]
police_exit_net+0x168/0x360 net/sched/act_police.c:390
ops_exit_list+0x10d/0x160 net/core/net_namespace.c:190
cleanup_net+0x4ea/0xb10 net/core/net_namespace.c:604
process_one_work+0x98d/0x15f0 kernel/workqueue.c:2275
worker_thread+0x64c/0x1120 kernel/workqueue.c:2421
kthread+0x3b1/0x4a0 kernel/kthread.c:292
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296
Kernel Offset: disabled

Fix the issue by calling tcf_idr_insert_many() after successful action
initialization.

Fixes: 0fedc63fadf0 ("net_sched: commit action insertions together")
Reported-by: syzbot+151e3e714d34ae4ce7e8@syzkaller.appspotmail.com
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d349f997 16-Jan-2021 Cong Wang <cong.wang@bytedance.com>

net_sched: fix RTNL deadlock again caused by request_module()

tcf_action_init_1() loads tc action modules automatically with
request_module() after parsing the tc action names, and it drops RTNL
lock and re-holds it before and after request_module(). This causes a
lot of troubles, as discovered by syzbot, because we can be in the
middle of batch initializations when we create an array of tc actions.

One of the problem is deadlock:

CPU 0 CPU 1
rtnl_lock();
for (...) {
tcf_action_init_1();
-> rtnl_unlock();
-> request_module();
rtnl_lock();
for (...) {
tcf_action_init_1();
-> tcf_idr_check_alloc();
// Insert one action into idr,
// but it is not committed until
// tcf_idr_insert_many(), then drop
// the RTNL lock in the _next_
// iteration
-> rtnl_unlock();
-> rtnl_lock();
-> a_o->init();
-> tcf_idr_check_alloc();
// Now waiting for the same index
// to be committed
-> request_module();
-> rtnl_lock()
// Now waiting for RTNL lock
}
rtnl_unlock();
}
rtnl_unlock();

This is not easy to solve, we can move the request_module() before
this loop and pre-load all the modules we need for this netlink
message and then do the rest initializations. So the loop breaks down
to two now:

for (i = 1; i <= TCA_ACT_MAX_PRIO && tb[i]; i++) {
struct tc_action_ops *a_o;

a_o = tc_action_load_ops(name, tb[i]...);
ops[i - 1] = a_o;
}

for (i = 1; i <= TCA_ACT_MAX_PRIO && tb[i]; i++) {
act = tcf_action_init_1(ops[i - 1]...);
}

Although this looks serious, it only has been reported by syzbot, so it
seems hard to trigger this by humans. And given the size of this patch,
I'd suggest to make it to net-next and not to backport to stable.

This patch has been tested by syzbot and tested with tdc.py by me.

Fixes: 0fedc63fadf0 ("net_sched: commit action insertions together")
Reported-and-tested-by: syzbot+82752bc5331601cf4899@syzkaller.appspotmail.com
Reported-and-tested-by: syzbot+b3b63b6bff456bd95294@syzkaller.appspotmail.com
Reported-by: syzbot+ba67b12b1ca729912834@syzkaller.appspotmail.com
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Tested-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20210117005657.14810-1-xiyou.wangcong@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# c129412f 24-Nov-2020 wenxu <wenxu@ucloud.cn>

net/sched: sch_frag: add generic packet fragment support.

Currently kernel tc subsystem can do conntrack in cat_ct. But when several
fragment packets go through the act_ct, function tcf_ct_handle_fragments
will defrag the packets to a big one. But the last action will redirect
mirred to a device which maybe lead the reassembly big packet over the mtu
of target device.

This patch add support for a xmit hook to mirred, that gets executed before
xmiting the packet. Then, when act_ct gets loaded, it configs that hook.
The frag xmit hook maybe reused by other modules.

Signed-off-by: wenxu <wenxu@ucloud.cn>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# f460019b 24-Nov-2020 Vlad Buslov <vlad@buslov.dev>

net: sched: alias action flags with TCA_ACT_ prefix

Currently both filter and action flags use same "TCA_" prefix which makes
them hard to distinguish to code and confusing for users. Create aliases
for existing action flags constants with "TCA_ACT_" prefix.

Signed-off-by: Vlad Buslov <vlad@buslov.dev>
Link: https://lore.kernel.org/r/20201124164054.893168-1-vlad@buslov.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 872f6903 15-Nov-2020 Francis Laniel <laniel_francis@privacyrequired.com>

treewide: rename nla_strlcpy to nla_strscpy.

Calls to nla_strlcpy are now replaced by calls to nla_strscpy which is the new
name of this function.

Signed-off-by: Francis Laniel <laniel_francis@privacyrequired.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 9ca71874 15-Nov-2020 Francis Laniel <laniel_francis@privacyrequired.com>

Modify return value of nla_strlcpy to match that of strscpy.

nla_strlcpy now returns -E2BIG if src was truncated when written to dst.
It also returns this error value if dstsize is 0 or higher than INT_MAX.

For example, if src is "foo\0" and dst is 3 bytes long, the result will be:
1. "foG" after memcpy (G means garbage).
2. "fo\0" after memset.
3. -E2BIG is returned because src was not completely written into dst.

The callers of nla_strlcpy were modified to take into account this modification.

Signed-off-by: Francis Laniel <laniel_francis@privacyrequired.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# e5a4b17d 09-Nov-2020 Menglong Dong <dong.menglong@zte.com.cn>

net: sched: fix misspellings using misspell-fixer tool

Some typos are found out by misspell-fixer tool:

$ misspell-fixer -rnv ./net/sched/
./net/sched/act_api.c:686
./net/sched/act_bpf.c:68
./net/sched/cls_rsvp.h:241
./net/sched/em_cmp.c:44
./net/sched/sch_pie.c:408

Fix typos found by misspell-fixer.

Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/r/5fa8e9d4.1c69fb81.5d889.5c64@mx.google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 94f44f28 02-Nov-2020 Vlad Buslov <vlad@buslov.dev>

net: sched: implement action-specific terse dump

Allow user to request action terse dump with new flag value
TCA_FLAG_TERSE_DUMP. Only output essential action info in terse dump (kind,
stats, index and cookie, if set by the user when creating the action). This
is different from filter terse dump where index is excluded (filter can be
identified by its own handle).

Move tcf_action_dump_terse() function to the beginning of source file in
order to call it from tcf_dump_walker().

Signed-off-by: Vlad Buslov <vlad@buslov.dev>
Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Link: https://lore.kernel.org/r/20201102201243.287486-1-vlad@buslov.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 580e4273 02-Oct-2020 Cong Wang <xiyou.wangcong@gmail.com>

net_sched: check error pointer in tcf_dump_walker()

Although we take RTNL on dump path, it is possible to
skip RTNL on insertion path. So the following race condition
is possible:

rtnl_lock() // no rtnl lock
mutex_lock(&idrinfo->lock);
// insert ERR_PTR(-EBUSY)
mutex_unlock(&idrinfo->lock);
tc_dump_action()
rtnl_unlock()

So we have to skip those temporary -EBUSY entries on dump path
too.

Reported-and-tested-by: syzbot+b47bc4f247856fb4d9e1@syzkaller.appspotmail.com
Fixes: 0fedc63fadf0 ("net_sched: commit action insertions together")
Cc: Vlad Buslov <vladbu@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1aad8049 28-Sep-2020 Cong Wang <xiyou.wangcong@gmail.com>

net_sched: remove a redundant goto chain check

All TC actions call tcf_action_check_ctrlact() to validate
goto chain, so this check in tcf_action_init_1() is actually
redundant. Remove it to save troubles of leaking memory.

Fixes: e49d8c22f126 ("net_sched: defer tcf_idr_insert() in tcf_action_init_1()")
Reported-by: Vlad Buslov <vladbu@mellanox.com>
Suggested-by: Davide Caratti <dcaratti@redhat.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0fedc63f 22-Sep-2020 Cong Wang <xiyou.wangcong@gmail.com>

net_sched: commit action insertions together

syzbot is able to trigger a failure case inside the loop in
tcf_action_init(), and when this happens we clean up with
tcf_action_destroy(). But, as these actions are already inserted
into the global IDR, other parallel process could free them
before tcf_action_destroy(), then we will trigger a use-after-free.

Fix this by deferring the insertions even later, after the loop,
and committing all the insertions in a separate loop, so we will
never fail in the middle of the insertions any more.

One side effect is that the window between alloction and final
insertion becomes larger, now it is more likely that the loop in
tcf_del_walker() sees the placeholder -EBUSY pointer. So we have
to check for error pointer in tcf_del_walker().

Reported-and-tested-by: syzbot+2287853d392e4b42374a@syzkaller.appspotmail.com
Fixes: 0190c1d452a9 ("net: sched: atomically check-allocate action")
Cc: Vlad Buslov <vladbu@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e49d8c22 22-Sep-2020 Cong Wang <xiyou.wangcong@gmail.com>

net_sched: defer tcf_idr_insert() in tcf_action_init_1()

All TC actions call tcf_idr_insert() for new action at the end
of their ->init(), so we can actually move it to a central place
in tcf_action_init_1().

And once the action is inserted into the global IDR, other parallel
process could free it immediately as its refcnt is still 1, so we can
not fail after this, we need to move it after the goto action
validation to avoid handling the failure case after insertion.

This is found during code review, is not directly triggered by syzbot.
And this prepares for the next patch.

Cc: Vlad Buslov <vladbu@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c1f1f16c 07-Sep-2020 Tom Rix <trix@redhat.com>

net: sched: skip an unnecessay check

Reviewing the error handling in tcf_action_init_1()
most of the early handling uses

err_out:
if (cookie) {
kfree(cookie->data);
kfree(cookie);
}

before cookie could ever be set.

So skip the unnecessay check.

Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8bf15395 19-Jun-2020 Gaurav Singh <gaurav1086@gmail.com>

Remove redundant skb null check

Remove the redundant null check for skb.

Signed-off-by: Gaurav Singh <gaurav1086@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4b61d3e8 19-Jun-2020 Po Liu <Po.Liu@nxp.com>

net: qos offload add flow status with dropped count

This patch adds a drop frames counter to tc flower offloading.
Reporting h/w dropped frames is necessary for some actions.
Some actions like police action and the coming introduced stream gate
action would produce dropped frames which is necessary for user. Status
update shows how many filtered packets increasing and how many dropped
in those packets.

v2: Changes
- Update commit comments suggest by Jiri Pirko.

Signed-off-by: Po Liu <Po.Liu@nxp.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ca44b738 15-May-2020 Vlad Buslov <vladbu@mellanox.com>

net: sched: implement terse dump support in act

Extend tcf_action_dump() with boolean argument 'terse' that is used to
request terse-mode action dump. In terse mode only essential data needed to
identify particular action (action kind, cookie, etc.) and its stats is put
to resulting skb and everything else is omitted. Implement
tcf_exts_terse_dump() helper in cls API that is intended to be used to
request terse dump of all exts (actions) attached to the filter.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 47a1494b 30-Apr-2020 Johannes Berg <johannes.berg@intel.com>

netlink: remove type-unsafe validation_data pointer

In the netlink policy, we currently have a void *validation_data
that's pointing to different things:
* a u32 value for bitfield32,
* the netlink policy for nested/nested array
* the string for NLA_REJECT

Remove the pointer and place appropriate type-safe items in the
union instead.

While at it, completely dissolve the pointer for the bitfield32
case and just put the value there directly.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 93a129eb 28-Mar-2020 Jiri Pirko <jiri@mellanox.com>

net: sched: expose HW stats types per action used by drivers

It may be up to the driver (in case ANY HW stats is passed) to select
which type of HW stats he is going to use. Add an infrastructure to
expose this information to user.

$ tc filter add dev enp3s0np1 ingress proto ip handle 1 pref 1 flower dst_ip 192.168.1.1 action drop
$ tc -s filter show dev enp3s0np1 ingress
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
eth_type ipv4
dst_ip 192.168.1.1
in_hw in_hw_count 2
action order 1: gact action drop
random type none pass val 0
index 1 ref 1 bind 1 installed 10 sec used 10 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
used_hw_stats immediate <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8953b077 28-Mar-2020 Jiri Pirko <jiri@mellanox.com>

net: introduce nla_put_bitfield32() helper and use it

Introduce a helper to pass value and selector to. The helper packs them
into struct and puts them into netlink message.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0dfb2d82 19-Mar-2020 Jakub Kicinski <kuba@kernel.org>

net: sched: rename more stats_types

Commit 53eca1f3479f ("net: rename flow_action_hw_stats_types* ->
flow_action_hw_stats*") renamed just the flow action types and
helpers. For consistency rename variables, enums, struct members
and UAPI too (note that this UAPI was not in any official release,
yet).

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 44f86580 06-Mar-2020 Jiri Pirko <jiri@mellanox.com>

sched: act: allow user to specify type of HW stats for a filter

Currently, user who is adding an action expects HW to report stats,
however it does not have exact expectations about the stats types.
That is aligned with TCA_ACT_HW_STATS_TYPE_ANY.

Allow user to specify the type of HW stats for an action and require it.

Pass the information down to flow_offload layer.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1521a67e 25-Feb-2020 Jiri Pirko <jiri@mellanox.com>

sched: act: count in the size of action flags bitfield

The put of the flags was added by the commit referenced in fixes tag,
however the size of the message was not extended accordingly.

Fix this by adding size of the flags bitfield to the message size.

Fixes: e38226786022 ("net: sched: update action implementations to support flags")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e0e2b35b 12-Nov-2019 Davide Caratti <dcaratti@redhat.com>

net/sched: actions: remove unused 'order'

after commit 4097e9d250fb ("net: sched: don't use tc_action->order during
action dump"), 'act->order' is initialized but then it's no more read, so
we can just remove this member of struct tc_action.

CC: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b33e699f 04-Nov-2019 Eric Dumazet <edumazet@google.com>

net_sched: add TCA_STATS_PKT64 attribute

Now the kernel uses 64bit packet counters in scheduler layer,
we want to export these counters to user space.

Instead risking breaking user space by adding fields
to struct gnet_stats_basic, add a new TCA_STATS_PKT64.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e3822678 30-Oct-2019 Vlad Buslov <vladbu@mellanox.com>

net: sched: update action implementations to support flags

Extend struct tc_action with new "tcfa_flags" field. Set the field in
tcf_idr_create() function and provide new helper
tcf_idr_create_from_flags() that derives 'cpustats' boolean from flags
value. Update individual hardware-offloaded actions init() to pass their
"flags" argument to new helper in order to skip percpu stats allocation
when user requested it through flags.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# abbb0d33 30-Oct-2019 Vlad Buslov <vladbu@mellanox.com>

net: sched: extend TCA_ACT space with TCA_ACT_FLAGS

Extend TCA_ACT space with nla_bitfield32 flags. Add
TCA_ACT_FLAGS_NO_PERCPU_STATS as the only allowed flag. Parse the flags in
tcf_action_init_1() and pass resulting value as additional argument to
a_o->init().

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5e174d5e 30-Oct-2019 Vlad Buslov <vladbu@mellanox.com>

net: sched: modify stats helper functions to support regular stats

Modify stats update helper functions introduced in previous patches in this
series to fallback to regular tc_action->tcfa_{b|q}stats if cpu stats are
not allocated for the action argument. If regular non-percpu allocated
counters are in use, then obtain action tcfa_lock while modifying them.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c8ecebd0 30-Oct-2019 Vlad Buslov <vladbu@mellanox.com>

net: sched: extract common action counters update code into function

Currently, all implementations of tc_action_ops->stats_update() callback
have almost exactly the same implementation of counters update
code (besides gact which also updates drop counter). In order to simplify
support for using both percpu-allocated and regular action counters
depending on run-time flag in following patches, extract action counters
update code into standalone function in act API.

This commit doesn't change functionality.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 445d3749 23-Sep-2019 Paul E. McKenney <paulmck@kernel.org>

net/sched: Replace rcu_swap_protected() with rcu_replace_pointer()

This commit replaces the use of rcu_swap_protected() with the more
intuitively appealing rcu_replace_pointer() as a step towards removing
rcu_swap_protected().

Link: https://lore.kernel.org/lkml/CAHk-=wiAsJLw1egFEE=Z7-GGtM6wcvtyytXZA1+BHqta4gg6Hw@mail.gmail.com/
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
[ paulmck: From rcu_replace() to rcu_replace_pointer() per Ingo Molnar. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: <netdev@vger.kernel.org>
Cc: <linux-kernel@vger.kernel.org>


# 39f13ea2 14-Oct-2019 Eric Dumazet <edumazet@google.com>

net: avoid potential infinite loop in tc_ctl_action()

tc_ctl_action() has the ability to loop forever if tcf_action_add()
returns -EAGAIN.

This special case has been done in case a module needed to be loaded,
but it turns out that tcf_add_notify() could also return -EAGAIN
if the socket sk_rcvbuf limit is hit.

We need to separate the two cases, and only loop for the module
loading case.

While we are at it, add a limit of 10 attempts since unbounded
loops are always scary.

syzbot repro was something like :

socket(PF_NETLINK, SOCK_RAW|SOCK_NONBLOCK, NETLINK_ROUTE) = 3
write(3, ..., 38) = 38
setsockopt(3, SOL_SOCKET, SO_RCVBUF, [0], 4) = 0
sendmsg(3, {msg_name(0)=NULL, msg_iov(1)=[{..., 388}], msg_controllen=0, msg_flags=0x10}, ...)

NMI backtrace for cpu 0
CPU: 0 PID: 1054 Comm: khungtaskd Not tainted 5.4.0-rc1+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
nmi_cpu_backtrace.cold+0x70/0xb2 lib/nmi_backtrace.c:101
nmi_trigger_cpumask_backtrace+0x23b/0x28b lib/nmi_backtrace.c:62
arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
check_hung_uninterruptible_tasks kernel/hung_task.c:205 [inline]
watchdog+0x9d0/0xef0 kernel/hung_task.c:289
kthread+0x361/0x430 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1
CPU: 1 PID: 8859 Comm: syz-executor910 Not tainted 5.4.0-rc1+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:arch_local_save_flags arch/x86/include/asm/paravirt.h:751 [inline]
RIP: 0010:lockdep_hardirqs_off+0x1df/0x2e0 kernel/locking/lockdep.c:3453
Code: 5c 08 00 00 5b 41 5c 41 5d 5d c3 48 c7 c0 58 1d f3 88 48 ba 00 00 00 00 00 fc ff df 48 c1 e8 03 80 3c 10 00 0f 85 d3 00 00 00 <48> 83 3d 21 9e 99 07 00 0f 84 b9 00 00 00 9c 58 0f 1f 44 00 00 f6
RSP: 0018:ffff8880a6f3f1b8 EFLAGS: 00000046
RAX: 1ffffffff11e63ab RBX: ffff88808c9c6080 RCX: 0000000000000000
RDX: dffffc0000000000 RSI: 0000000000000000 RDI: ffff88808c9c6914
RBP: ffff8880a6f3f1d0 R08: ffff88808c9c6080 R09: fffffbfff16be5d1
R10: fffffbfff16be5d0 R11: 0000000000000003 R12: ffffffff8746591f
R13: ffff88808c9c6080 R14: ffffffff8746591f R15: 0000000000000003
FS: 00000000011e4880(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffff600400 CR3: 00000000a8920000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
trace_hardirqs_off+0x62/0x240 kernel/trace/trace_preemptirq.c:45
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:108 [inline]
_raw_spin_lock_irqsave+0x6f/0xcd kernel/locking/spinlock.c:159
__wake_up_common_lock+0xc8/0x150 kernel/sched/wait.c:122
__wake_up+0xe/0x10 kernel/sched/wait.c:142
netlink_unlock_table net/netlink/af_netlink.c:466 [inline]
netlink_unlock_table net/netlink/af_netlink.c:463 [inline]
netlink_broadcast_filtered+0x705/0xb80 net/netlink/af_netlink.c:1514
netlink_broadcast+0x3a/0x50 net/netlink/af_netlink.c:1534
rtnetlink_send+0xdd/0x110 net/core/rtnetlink.c:714
tcf_add_notify net/sched/act_api.c:1343 [inline]
tcf_action_add+0x243/0x370 net/sched/act_api.c:1362
tc_ctl_action+0x3b5/0x4bc net/sched/act_api.c:1410
rtnetlink_rcv_msg+0x463/0xb00 net/core/rtnetlink.c:5386
netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5404
netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
netlink_unicast+0x531/0x710 net/netlink/af_netlink.c:1328
netlink_sendmsg+0x8a5/0xd60 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:637 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:657
___sys_sendmsg+0x803/0x920 net/socket.c:2311
__sys_sendmsg+0x105/0x1d0 net/socket.c:2356
__do_sys_sendmsg net/socket.c:2365 [inline]
__se_sys_sendmsg net/socket.c:2363 [inline]
__x64_sys_sendmsg+0x78/0xb0 net/socket.c:2363
do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x440939

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot+cf0adbb9c28c8866c788@syzkaller.appspotmail.com
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4b793fec 07-Oct-2019 Cong Wang <xiyou.wangcong@gmail.com>

net_sched: fix backward compatibility for TCA_ACT_KIND

For TCA_ACT_KIND, we have to keep the backward compatibility too,
and rely on nla_strlcpy() to check and terminate the string with
a NUL.

Note for TC actions, nla_strcmp() is already used to compare kind
strings, so we don't need to fix other places.

Fixes: 199ce850ce11 ("net_sched: add policy validation for action attributes")
Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>


# 199ce850 18-Sep-2019 Cong Wang <xiyou.wangcong@gmail.com>

net_sched: add policy validation for action attributes

Similar to commit 8b4c3cdd9dd8
("net: sched: Add policy validation for tc attributes"), we need
to add proper policy validation for TC action attributes too.

Cc: David Ahern <dsahern@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>


# e33d2b74 28-Jun-2019 Cong Wang <xiyou.wangcong@gmail.com>

idr: fix overflow case for idr_for_each_entry_ul()

idr_for_each_entry_ul() is buggy as it can't handle overflow
case correctly. When we have an ID == UINT_MAX, it becomes an
infinite loop. This happens when running on 32-bit CPU where
unsigned long has the same size with unsigned int.

There is no better way to fix this than casting it to a larger
integer, but we can't just 64 bit integer on 32 bit CPU. Instead
we could just use an additional integer to help us to detect this
overflow case, that is, adding a new parameter to this macro.
Fortunately tc action is its only user right now.

Fixes: 65a206c01e8e ("net/sched: Change act_api and act_xxx modules to use IDR")
Reported-by: Li Shuang <shuali@redhat.com>
Tested-by: Davide Caratti <dcaratti@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Chris Mi <chrism@mellanox.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2874c5fd 27-May-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152

Based on 1 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 3029 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 4097e9d2 23-May-2019 Vlad Buslov <vladbu@mellanox.com>

net: sched: don't use tc_action->order during action dump

Function tcf_action_dump() relies on tc_action->order field when starting
nested nla to send action data to userspace. This approach breaks in
several cases:

- When multiple filters point to same shared action, tc_action->order field
is overwritten each time it is attached to filter. This causes filter
dump to output action with incorrect attribute for all filters that have
the action in different position (different order) from the last set
tc_action->order value.

- When action data is displayed using tc action API (RTM_GETACTION), action
order is overwritten by tca_action_gd() according to its position in
resulting array of nl attributes, which will break filter dump for all
filters attached to that shared action that expect it to have different
order value.

Don't rely on tc_action->order when dumping actions. Set nla according to
action position in resulting array of actions instead.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8cb08174 26-Apr-2019 Johannes Berg <johannes.berg@intel.com>

netlink: make validation more configurable for future strictness

We currently have two levels of strict validation:

1) liberal (default)
- undefined (type >= max) & NLA_UNSPEC attributes accepted
- attribute length >= expected accepted
- garbage at end of message accepted
2) strict (opt-in)
- NLA_UNSPEC attributes accepted
- attribute length >= expected accepted

Split out parsing strictness into four different options:
* TRAILING - check that there's no trailing data after parsing
attributes (in message or nested)
* MAXTYPE - reject attrs > max known type
* UNSPEC - reject attributes with NLA_UNSPEC policy entries
* STRICT_ATTRS - strictly validate attribute size

The default for future things should be *everything*.
The current *_strict() is a combination of TRAILING and MAXTYPE,
and is renamed to _deprecated_strict().
The current regular parsing has none of this, and is renamed to
*_parse_deprecated().

Additionally it allows us to selectively set one of the new flags
even on old policies. Notably, the UNSPEC flag could be useful in
this case, since it can be arranged (by filling in the policy) to
not be an incompatible userspace ABI change, but would then going
forward prevent forgetting attribute entries. Similar can apply
to the POLICY flag.

We end up with the following renames:
* nla_parse -> nla_parse_deprecated
* nla_parse_strict -> nla_parse_deprecated_strict
* nlmsg_parse -> nlmsg_parse_deprecated
* nlmsg_parse_strict -> nlmsg_parse_deprecated_strict
* nla_parse_nested -> nla_parse_nested_deprecated
* nla_validate_nested -> nla_validate_nested_deprecated

Using spatch, of course:
@@
expression TB, MAX, HEAD, LEN, POL, EXT;
@@
-nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
+nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)

@@
expression NLH, HDRLEN, TB, MAX, POL, EXT;
@@
-nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
+nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)

@@
expression NLH, HDRLEN, TB, MAX, POL, EXT;
@@
-nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
+nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)

@@
expression TB, MAX, NLA, POL, EXT;
@@
-nla_parse_nested(TB, MAX, NLA, POL, EXT)
+nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)

@@
expression START, MAX, POL, EXT;
@@
-nla_validate_nested(START, MAX, POL, EXT)
+nla_validate_nested_deprecated(START, MAX, POL, EXT)

@@
expression NLH, HDRLEN, MAX, POL, EXT;
@@
-nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
+nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)

For this patch, don't actually add the strict, non-renamed versions
yet so that it breaks compile if I get it wrong.

Also, while at it, make nla_validate and nla_parse go down to a
common __nla_validate_parse() function to avoid code duplication.

Ultimately, this allows us to have very strict validation for every
new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
next patch, while existing things will continue to work as is.

In effect then, this adds fully strict validation for any new command.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ae0be8de 26-Apr-2019 Michal Kubecek <mkubecek@suse.cz>

netlink: make nla_nest_start() add NLA_F_NESTED flag

Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most
netlink based interfaces (including recently added ones) are still not
setting it in kernel generated messages. Without the flag, message parsers
not aware of attribute semantics (e.g. wireshark dissector or libmnl's
mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display
the structure of their contents.

Unfortunately we cannot just add the flag everywhere as there may be
userspace applications which check nlattr::nla_type directly rather than
through a helper masking out the flags. Therefore the patch renames
nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start()
as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually
are rewritten to use nla_nest_start().

Except for changes in include/net/netlink.h, the patch was generated using
this semantic patch:

@@ expression E1, E2; @@
-nla_nest_start(E1, E2)
+nla_nest_start_noflag(E1, E2)

@@ expression E1, E2; @@
-nla_nest_start_noflag(E1, E2 | NLA_F_NESTED)
+nla_nest_start(E1, E2)

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ee3bbfe8 20-Mar-2019 Davide Caratti <dcaratti@redhat.com>

net/sched: let actions use RCU to access 'goto_chain'

use RCU when accessing the action chain, to avoid use after free in the
traffic path when 'goto chain' is replaced on existing TC actions (see
script below). Since the control action is read in the traffic path
without holding the action spinlock, we need to explicitly ensure that
a->goto_chain is not NULL before dereferencing (i.e it's not sufficient
to rely on the value of TC_ACT_GOTO_CHAIN bits). Not doing so caused NULL
dereferences in tcf_action_goto_chain_exec() when the following script:

# tc chain add dev dd0 chain 42 ingress protocol ip flower \
> ip_proto udp action pass index 4
# tc filter add dev dd0 ingress protocol ip flower \
> ip_proto udp action csum udp goto chain 42 index 66
# tc chain del dev dd0 chain 42 ingress
(start UDP traffic towards dd0)
# tc action replace action csum udp pass index 66

was run repeatedly for several hours.

Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
Suggested-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 85d0966f 20-Mar-2019 Davide Caratti <dcaratti@redhat.com>

net/sched: prepare TC actions to properly validate the control action

- pass a pointer to struct tcf_proto in each actions's init() handler,
to allow validating the control action, checking whether the chain
exists and (eventually) refcounting it.
- remove code that validates the control action after a successful call
to the action's init() handler, and replace it with a test that forbids
addition of actions having 'goto_chain' and NULL goto_chain pointer at
the same time.
- add tcf_action_check_ctrlact(), that will validate the control action
and eventually allocate the action 'goto_chain' within the init()
handler.
- add tcf_action_set_ctrlact(), that will assign the control action and
swap the current 'goto_chain' pointer with the new given one.

This disallows 'goto_chain' on actions that don't initialize it properly
in their init() handler, i.e. calling tcf_action_check_ctrlact() after
successful IDR reservation and then calling tcf_action_set_ctrlact()
to assign 'goto_chain' and 'tcf_action' consistently.

By doing this, the kernel does not leak anymore refcounts when a valid
'goto chain' handle is replaced in TC actions, causing kmemleak splats
like the following one:

# tc chain add dev dd0 chain 42 ingress protocol ip flower \
> ip_proto tcp action drop
# tc chain add dev dd0 chain 43 ingress protocol ip flower \
> ip_proto udp action drop
# tc filter add dev dd0 ingress matchall \
> action gact goto chain 42 index 66
# tc filter replace dev dd0 ingress matchall \
> action gact goto chain 43 index 66
# echo scan >/sys/kernel/debug/kmemleak
<...>
unreferenced object 0xffff93c0ee09f000 (size 1024):
comm "tc", pid 2565, jiffies 4295339808 (age 65.426s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 08 00 06 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<000000009b63f92d>] tc_ctl_chain+0x3d2/0x4c0
[<00000000683a8d72>] rtnetlink_rcv_msg+0x263/0x2d0
[<00000000ddd88f8e>] netlink_rcv_skb+0x4a/0x110
[<000000006126a348>] netlink_unicast+0x1a0/0x250
[<00000000b3340877>] netlink_sendmsg+0x2c1/0x3c0
[<00000000a25a2171>] sock_sendmsg+0x36/0x40
[<00000000f19ee1ec>] ___sys_sendmsg+0x280/0x2f0
[<00000000d0422042>] __sys_sendmsg+0x5e/0xa0
[<000000007a6c61f9>] do_syscall_64+0x5b/0x180
[<00000000ccd07542>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[<0000000013eaa334>] 0xffffffffffffffff

Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# eddd2cf1 10-Feb-2019 Eli Cohen <eli@mellanox.com>

net: Change TCA_ACT_* to TCA_ID_* to match that of TCA_ID_POLICE

Modify the kernel users of the TCA_ACT_* macros to use TCA_ID_*. For
example, use TCA_ID_GACT instead of TCA_ACT_GACT. This will align with
TCA_ID_POLICE and also differentiates these identifier, used in struct
tc_action_ops type field, from other macros starting with TCA_ACT_.

To make things clearer, we name the enum defining the TCA_ID_*
identifiers and also change the "type" field of struct tc_action to
id.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 69bd4840 06-Nov-2018 Oz Shlomo <ozsh@mellanox.com>

net/sched: Remove egdev mechanism

The egdev mechanism was replaced by the TC indirect block notifications
platform.

Signed-off-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Cc: John Hurley <john.hurley@netronome.com>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# dac9c979 07-Oct-2018 David Ahern <dsahern@gmail.com>

net: Add extack to nlmsg_parse

Make sure extack is passed to nlmsg_parse where easy to do so.
Most of these are dump handlers and leveraging the extack in
the netlink_callback.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Christian Brauner <christian@brauner.io>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 95278dda 02-Oct-2018 Cong Wang <xiyou.wangcong@gmail.com>

net_sched: convert idrinfo->lock from spinlock to a mutex

In commit ec3ed293e766 ("net_sched: change tcf_del_walker() to take idrinfo->lock")
we move fl_hw_destroy_tmplt() to a workqueue to avoid blocking
with the spinlock held. Unfortunately, this causes a lot of
troubles here:

1. tcf_chain_destroy() could be called right after we queue the work
but before the work runs. This is a use-after-free.

2. The chain refcnt is already 0, we can't even just hold it again.
We can check refcnt==1 but it is ugly.

3. The chain with refcnt 0 is still visible in its block, which means
it could be still found and used!

4. The block has a refcnt too, we can't hold it without introducing a
proper API either.

We can make it working but the end result is ugly. Instead of wasting
time on reviewing it, let's just convert the troubling spinlock to
a mutex, which allows us to use non-atomic allocations too.

Fixes: ec3ed293e766 ("net_sched: change tcf_del_walker() to take idrinfo->lock")
Reported-by: Ido Schimmel <idosch@idosch.org>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Vlad Buslov <vladbu@mellanox.com>
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 28169aba 21-Sep-2018 Eelco Chaudron <echaudro@redhat.com>

net/sched: Add hardware specific counters to TC actions

Add additional counters that will store the bytes/packets processed by
hardware. These will be exported through the netlink interface for
displaying by the iproute2 tc tool

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ec3ed293 19-Sep-2018 Vlad Buslov <vladbu@mellanox.com>

net_sched: change tcf_del_walker() to take idrinfo->lock

Action API was changed to work with actions and action_idr in concurrency
safe manner, however tcf_del_walker() still uses actions without taking a
reference or idrinfo->lock first, and deletes them directly, disregarding
possible concurrent delete.

Change tcf_del_walker() to take idrinfo->lock while iterating over actions
and use new tcf_idr_release_unsafe() to release them while holding the
lock.

And the blocking function fl_hw_destroy_tmplt() could be called when we
put a filter chain, so defer it to a work queue.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
[xiyou.wangcong@gmail.com: heavily modify the code and changelog]
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c10bbfae 03-Sep-2018 Vlad Buslov <vladbu@mellanox.com>

net: sched: null actions array pointer before releasing action

Currently, tcf_action_delete() nulls actions array pointer after putting
and deleting it. However, if tcf_idr_delete_index() returns an error,
pointer to action is not set to null. That results it being released second
time in error handling code of tca_action_gd().

Kasan error:

[ 807.367755] ==================================================================
[ 807.375844] BUG: KASAN: use-after-free in tc_setup_cb_call+0x14e/0x250
[ 807.382763] Read of size 8 at addr ffff88033e636000 by task tc/2732

[ 807.391289] CPU: 0 PID: 2732 Comm: tc Tainted: G W 4.19.0-rc1+ #799
[ 807.399542] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
[ 807.407948] Call Trace:
[ 807.410763] dump_stack+0x92/0xeb
[ 807.414456] print_address_description+0x70/0x360
[ 807.419549] kasan_report+0x14d/0x300
[ 807.423582] ? tc_setup_cb_call+0x14e/0x250
[ 807.428150] tc_setup_cb_call+0x14e/0x250
[ 807.432539] ? nla_put+0x65/0xe0
[ 807.436146] fl_dump+0x394/0x3f0 [cls_flower]
[ 807.440890] ? fl_tmplt_dump+0x140/0x140 [cls_flower]
[ 807.446327] ? lock_downgrade+0x320/0x320
[ 807.450702] ? lock_acquire+0xe2/0x220
[ 807.454819] ? is_bpf_text_address+0x5/0x140
[ 807.459475] ? memcpy+0x34/0x50
[ 807.462980] ? nla_put+0x65/0xe0
[ 807.466582] tcf_fill_node+0x341/0x430
[ 807.470717] ? tcf_block_put+0xe0/0xe0
[ 807.474859] tcf_node_dump+0xdb/0xf0
[ 807.478821] fl_walk+0x8e/0x170 [cls_flower]
[ 807.483474] tcf_chain_dump+0x35a/0x4d0
[ 807.487703] ? tfilter_notify+0x170/0x170
[ 807.492091] ? tcf_fill_node+0x430/0x430
[ 807.496411] tc_dump_tfilter+0x362/0x3f0
[ 807.500712] ? tc_del_tfilter+0x850/0x850
[ 807.505104] ? kasan_unpoison_shadow+0x30/0x40
[ 807.509940] ? __mutex_unlock_slowpath+0xcf/0x410
[ 807.515031] netlink_dump+0x263/0x4f0
[ 807.519077] __netlink_dump_start+0x2a0/0x300
[ 807.523817] ? tc_del_tfilter+0x850/0x850
[ 807.528198] rtnetlink_rcv_msg+0x46a/0x6d0
[ 807.532671] ? rtnl_fdb_del+0x3f0/0x3f0
[ 807.536878] ? tc_del_tfilter+0x850/0x850
[ 807.541280] netlink_rcv_skb+0x18d/0x200
[ 807.545570] ? rtnl_fdb_del+0x3f0/0x3f0
[ 807.549773] ? netlink_ack+0x500/0x500
[ 807.553913] netlink_unicast+0x2d0/0x370
[ 807.558212] ? netlink_attachskb+0x340/0x340
[ 807.562855] ? _copy_from_iter_full+0xe9/0x3e0
[ 807.567677] ? import_iovec+0x11e/0x1c0
[ 807.571890] netlink_sendmsg+0x3b9/0x6a0
[ 807.576192] ? netlink_unicast+0x370/0x370
[ 807.580684] ? netlink_unicast+0x370/0x370
[ 807.585154] sock_sendmsg+0x6b/0x80
[ 807.589015] ___sys_sendmsg+0x4a1/0x520
[ 807.593230] ? copy_msghdr_from_user+0x210/0x210
[ 807.598232] ? do_wp_page+0x174/0x880
[ 807.602276] ? __handle_mm_fault+0x749/0x1c10
[ 807.607021] ? __handle_mm_fault+0x1046/0x1c10
[ 807.611849] ? __pmd_alloc+0x320/0x320
[ 807.615973] ? check_chain_key+0x140/0x1f0
[ 807.620450] ? check_chain_key+0x140/0x1f0
[ 807.624929] ? __fget_light+0xbc/0xd0
[ 807.628970] ? __sys_sendmsg+0xd7/0x150
[ 807.633172] __sys_sendmsg+0xd7/0x150
[ 807.637201] ? __ia32_sys_shutdown+0x30/0x30
[ 807.641846] ? up_read+0x53/0x90
[ 807.645442] ? __do_page_fault+0x484/0x780
[ 807.649949] ? do_syscall_64+0x1e/0x2c0
[ 807.654164] do_syscall_64+0x72/0x2c0
[ 807.658198] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 807.663625] RIP: 0033:0x7f42e9870150
[ 807.667568] Code: 8b 15 3c 7d 2b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb cd 66 0f 1f 44 00 00 83 3d b9 d5 2b 00 00 75 10 b8 2e 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 be cd 00 00 48 89 04 24
[ 807.687328] RSP: 002b:00007ffdbf595b58 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 807.695564] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f42e9870150
[ 807.703083] RDX: 0000000000000000 RSI: 00007ffdbf595b80 RDI: 0000000000000003
[ 807.710605] RBP: 00007ffdbf599d90 R08: 0000000000679bc0 R09: 000000000000000f
[ 807.718127] R10: 00000000000005e7 R11: 0000000000000246 R12: 00007ffdbf599d88
[ 807.725651] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000

[ 807.735048] Allocated by task 2687:
[ 807.738902] kasan_kmalloc+0xa0/0xd0
[ 807.742852] __kmalloc+0x118/0x2d0
[ 807.746615] tcf_idr_create+0x44/0x320
[ 807.750738] tcf_nat_init+0x41e/0x530 [act_nat]
[ 807.755638] tcf_action_init_1+0x4e0/0x650
[ 807.760104] tcf_action_init+0x1ce/0x2d0
[ 807.764395] tcf_exts_validate+0x1d8/0x200
[ 807.768861] fl_change+0x55a/0x26b4 [cls_flower]
[ 807.773845] tc_new_tfilter+0x748/0xa20
[ 807.778051] rtnetlink_rcv_msg+0x56a/0x6d0
[ 807.782517] netlink_rcv_skb+0x18d/0x200
[ 807.786804] netlink_unicast+0x2d0/0x370
[ 807.791095] netlink_sendmsg+0x3b9/0x6a0
[ 807.795387] sock_sendmsg+0x6b/0x80
[ 807.799240] ___sys_sendmsg+0x4a1/0x520
[ 807.803445] __sys_sendmsg+0xd7/0x150
[ 807.807473] do_syscall_64+0x72/0x2c0
[ 807.811506] entry_SYSCALL_64_after_hwframe+0x49/0xbe

[ 807.818776] Freed by task 2728:
[ 807.822283] __kasan_slab_free+0x122/0x180
[ 807.826752] kfree+0xf4/0x2f0
[ 807.830080] __tcf_action_put+0x5a/0xb0
[ 807.834281] tcf_action_put_many+0x46/0x70
[ 807.838747] tca_action_gd+0x232/0xc40
[ 807.842862] tc_ctl_action+0x215/0x230
[ 807.846977] rtnetlink_rcv_msg+0x56a/0x6d0
[ 807.851444] netlink_rcv_skb+0x18d/0x200
[ 807.855731] netlink_unicast+0x2d0/0x370
[ 807.860021] netlink_sendmsg+0x3b9/0x6a0
[ 807.864312] sock_sendmsg+0x6b/0x80
[ 807.868166] ___sys_sendmsg+0x4a1/0x520
[ 807.872372] __sys_sendmsg+0xd7/0x150
[ 807.876401] do_syscall_64+0x72/0x2c0
[ 807.880431] entry_SYSCALL_64_after_hwframe+0x49/0xbe

[ 807.887704] The buggy address belongs to the object at ffff88033e636000
which belongs to the cache kmalloc-256 of size 256
[ 807.900909] The buggy address is located 0 bytes inside of
256-byte region [ffff88033e636000, ffff88033e636100)
[ 807.913155] The buggy address belongs to the page:
[ 807.918322] page:ffffea000cf98d80 count:1 mapcount:0 mapping:ffff88036f80ee00 index:0x0 compound_mapcount: 0
[ 807.928831] flags: 0x5fff8000008100(slab|head)
[ 807.933647] raw: 005fff8000008100 ffffea000db44f00 0000000400000004 ffff88036f80ee00
[ 807.942050] raw: 0000000000000000 0000000080190019 00000001ffffffff 0000000000000000
[ 807.950456] page dumped because: kasan: bad access detected

[ 807.958240] Memory state around the buggy address:
[ 807.963405] ffff88033e635f00: fc fc fc fc fb fb fb fb fb fb fb fc fc fc fc fb
[ 807.971288] ffff88033e635f80: fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc
[ 807.979166] >ffff88033e636000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 807.994882] ^
[ 807.998477] ffff88033e636080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 808.006352] ffff88033e636100: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
[ 808.014230] ==================================================================
[ 808.022108] Disabling lock debugging due to kernel taint

Fixes: edfaf94fa705 ("net_sched: improve and refactor tcf_action_put_many()")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f061b48c 29-Aug-2018 Cong Wang <xiyou.wangcong@gmail.com>

Revert "net: sched: act: add extack for lookup callback"

This reverts commit 331a9295de23 ("net: sched: act: add extack for lookup callback").

This extack is never used after 6 months... In fact, it can be just
set in the caller, right after ->lookup().

Cc: Alexander Aring <aring@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 97763dc0 29-Aug-2018 Paolo Abeni <pabeni@redhat.com>

net_sched: reject unknown tcfa_action values

After the commit 802bfb19152c ("net/sched: user-space can't set
unknown tcfa_action values"), unknown tcfa_action values are
converted to TC_ACT_UNSPEC, but the common agreement is instead
rejecting such configurations.

This change also introduces a helper to simplify the destruction
of a single action, avoiding code duplication.

v1 -> v2:
- helper is now static and renamed according to act_* convention
- updated extack message, according to the new behavior

Fixes: 802bfb19152c ("net/sched: user-space can't set unknown tcfa_action values")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 244cd96a 19-Aug-2018 Cong Wang <xiyou.wangcong@gmail.com>

net_sched: remove list_head from tc_action

After commit 90b73b77d08e, list_head is no longer needed.
Now we just need to convert the list iteration to array
iteration for drivers.

Fixes: 90b73b77d08e ("net: sched: change action API to use array of pointers to actions")
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7d485c45 19-Aug-2018 Cong Wang <xiyou.wangcong@gmail.com>

net_sched: remove unused tcf_idr_check()

tcf_idr_check() is replaced by tcf_idr_check_alloc(),
and __tcf_idr_check() now can be folded into tcf_idr_search().

Fixes: 0190c1d452a9 ("net: sched: atomically check-allocate action")
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b144e7ec 19-Aug-2018 Cong Wang <xiyou.wangcong@gmail.com>

net_sched: remove unused parameter for tcf_action_delete()

Fixes: 16af6067392c ("net: sched: implement reference counted action release")
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 97a3f84f 19-Aug-2018 Cong Wang <xiyou.wangcong@gmail.com>

net_sched: remove unnecessary ops->delete()

All ops->delete() wants is getting the tn->idrinfo, but we already
have tc_action before calling ops->delete(), and tc_action has
a pointer ->idrinfo.

More importantly, each type of action does the same thing, that is,
just calling tcf_idr_delete_index().

So it can be just removed.

Fixes: b409074e6693 ("net: sched: add 'delete' function to action ops")
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# edfaf94f 19-Aug-2018 Cong Wang <xiyou.wangcong@gmail.com>

net_sched: improve and refactor tcf_action_put_many()

tcf_action_put_many() is mostly called to clean up actions on
failure path, but tcf_action_put_many(&actions[acts_deleted]) is
used in the ugliest way: it passes a slice of the array and
uses an additional NULL at the end to avoid out-of-bound
access.

acts_deleted is completely unnecessary since we can teach
tcf_action_put_many() scan the whole array and checks against
NULL pointer. Which also means tcf_action_delete() should
set deleted action pointers to NULL to avoid double free.

Fixes: 90b73b77d08e ("net: sched: change action API to use array of pointers to actions")
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 802bfb19 30-Jul-2018 Paolo Abeni <pabeni@redhat.com>

net/sched: user-space can't set unknown tcfa_action values

Currently, when initializing an action, the user-space can specify
and use arbitrary values for the tcfa_action field. If the value
is unknown by the kernel, is implicitly threaded as TC_ACT_UNSPEC.

This change explicitly checks for unknown values at action creation
time, and explicitly convert them to TC_ACT_UNSPEC. No functional
changes are introduced, but this will allow introducing tcfa_action
values not exposed to user-space in a later patch.

Note: we can't use the above to hide TC_ACT_REDIRECT from user-space,
as the latter is already part of uAPI.

v3 -> v4:
- use an helper to check for action validity (JiriP)
- emit an extack for invalid actions (JiriP)
v4 -> v5:
- keep messages on a single line, drop net_warn (Marcelo)

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1f3ed383 27-Jul-2018 Jiri Pirko <jiri@mellanox.com>

net: sched: don't dump chains only held by actions

In case a chain is empty and not explicitly created by a user,
such chain should not exist. The only exception is if there is
an action "goto chain" pointing to it. In that case, don't show the
chain in the dump. Track the chain references held by actions and
use them to find out if a chain should or should not be shown
in chain dump.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e0479b67 09-Jul-2018 Vlad Buslov <vladbu@mellanox.com>

net: sched: fix unprotected access to rcu cookie pointer

Fix action attribute size calculation function to take rcu read lock and
access act_cookie pointer with rcu dereference.

Fixes: eec94fdb0480 ("net: sched: use rcu for action cookie update")
Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0dbc81ea 08-Jul-2018 David S. Miller <davem@davemloft.net>

net: sched: Fix warnings from xchg() on RCU'd cookie pointer.

The kbuild test robot reports:

>> net/sched/act_api.c:71:15: sparse: incorrect type in initializer (different address spaces) @@ expected struct tc_cookie [noderef] <asn:4>*__ret @@ got [noderef] <asn:4>*__ret @@
net/sched/act_api.c:71:15: expected struct tc_cookie [noderef] <asn:4>*__ret
net/sched/act_api.c:71:15: got struct tc_cookie *new_cookie
>> net/sched/act_api.c:71:13: sparse: incorrect type in assignment (different address spaces) @@ expected struct tc_cookie *old @@ got struct tc_cookie [noderef] <struct tc_cookie *old @@
net/sched/act_api.c:71:13: expected struct tc_cookie *old
net/sched/act_api.c:71:13: got struct tc_cookie [noderef] <asn:4>*[assigned] __ret
>> net/sched/act_api.c:132:48: sparse: dereference of noderef expression

Handle this in the usual way by force casting away the __rcu annotation
when we are using xchg() on it.

Fixes: eec94fdb0480 ("net: sched: use rcu for action cookie update")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 90b73b77 05-Jul-2018 Vlad Buslov <vladbu@mellanox.com>

net: sched: change action API to use array of pointers to actions

Act API used linked list to pass set of actions to functions. It is
intrusive data structure that stores list nodes inside action structure
itself, which means it is not safe to modify such list concurrently.
However, action API doesn't use any linked list specific operations on this
set of actions, so it can be safely refactored into plain pointer array.

Refactor action API to use array of pointers to tc_actions instead of
linked list. Change argument 'actions' type of exported action init,
destroy and dump functions.

Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0190c1d4 05-Jul-2018 Vlad Buslov <vladbu@mellanox.com>

net: sched: atomically check-allocate action

Implement function that atomically checks if action exists and either takes
reference to it, or allocates idr slot for action index to prevent
concurrent allocations of actions with same index. Use EBUSY error pointer
to indicate that idr slot is reserved.

Implement cleanup helper function that removes temporary error pointer from
idr. (in case of error between idr allocation and insertion of newly
created action to specified index)

Refactor all action init functions to insert new action to idr using this
API.

Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# cae422f3 05-Jul-2018 Vlad Buslov <vladbu@mellanox.com>

net: sched: use reference counting action init

Change action API to assume that action init function always takes
reference to action, even when overwriting existing action. This is
necessary because action API continues to use action pointer after init
function is done. At this point action becomes accessible for concurrent
modifications, so user must always hold reference to it.

Implement helper put list function to atomically release list of actions
after action API init code is done using them.

Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4e8ddd7f 05-Jul-2018 Vlad Buslov <vladbu@mellanox.com>

net: sched: don't release reference on action overwrite

Return from action init function with reference to action taken,
even when overwriting existing action.

Action init API initializes its fourth argument (pointer to pointer to tc
action) to either existing action with same index or newly created action.
In case of existing index(and bind argument is zero), init function returns
without incrementing action reference counter. Caller of action init then
proceeds working with action, without actually holding reference to it.
This means that action could be deleted concurrently.

Change action init behavior to always take reference to action before
returning successfully, in order to protect from concurrent deletion.

Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 16af6067 05-Jul-2018 Vlad Buslov <vladbu@mellanox.com>

net: sched: implement reference counted action release

Implement helper delete function that uses new action ops 'delete', instead
of destroying action directly. This is required so act API could delete
actions by index, without holding any references to action that is being
deleted.

Implement function __tcf_action_put() that releases reference to action and
frees it, if necessary. Refactor action deletion code to use new put
function and not to rely on rtnl lock. Remove rtnl lock assertions that are
no longer needed.

Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2a2ea349 05-Jul-2018 Vlad Buslov <vladbu@mellanox.com>

net: sched: implement action API that deletes action by index

Implement new action API function that atomically finds and deletes action
from idr by index. Intended to be used by lockless actions that do not rely
on rtnl lock.

Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3f7c72bc 05-Jul-2018 Vlad Buslov <vladbu@mellanox.com>

net: sched: always take reference to action

Without rtnl lock protection it is no longer safe to use pointer to tc
action without holding reference to it. (it can be destroyed concurrently)

Remove unsafe action idr lookup function. Instead of it, implement safe tcf
idr check function that atomically looks up action in idr and increments
its reference and bind counters. Implement both action search and check
using new safe function

Reference taken by idr check is temporal and should not be accounted by
userspace clients (both logically and to preserver current API behavior).
Subtract temporal reference when dumping action to userspace using existing
tca_get_fill function arguments.

Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 789871bb 05-Jul-2018 Vlad Buslov <vladbu@mellanox.com>

net: sched: implement unlocked action init API

Add additional 'rtnl_held' argument to act API init functions. It is
required to implement actions that need to release rtnl lock before loading
kernel module and reacquire if afterwards.

Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 036bb443 05-Jul-2018 Vlad Buslov <vladbu@mellanox.com>

net: sched: change type of reference and bind counters

Change type of action reference counter to refcount_t.

Change type of action bind counter to atomic_t.
This type is used to allow decrementing bind counter without testing
for 0 result.

Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# eec94fdb 05-Jul-2018 Vlad Buslov <vladbu@mellanox.com>

net: sched: use rcu for action cookie update

Implement functions to atomically update and free action cookie
using rcu mechanism.

Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 290aa0ad 21-May-2018 Vlad Buslov <vladbu@mellanox.com>

net: sched: don't disable bh when accessing action idr

Initial net_device implementation used ingress_lock spinlock to synchronize
ingress path of device. This lock was used in both process and bh context.
In some code paths action map lock was obtained while holding ingress_lock.
Commit e1e992e52faa ("[NET_SCHED] protect action config/dump from irqs")
modified actions to always disable bh, while using action map lock, in
order to prevent deadlock on ingress_lock in softirq. This lock was removed
from net_device, so disabling bh, while accessing action map, is no longer
necessary.

Replace all action idr spinlock usage with regular calls that do not
disable bh.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2f635cee 27-Mar-2018 Kirill Tkhai <ktkhai@virtuozzo.com>

net: Drop pernet_operations::async

Synchronous pernet_operations are not allowed anymore.
All are asynchronous. So, drop the structure member.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 734549eb 26-Mar-2018 Craig Dillabaugh <cdillaba@mojatatu.com>

net sched actions: fix dumping which requires several messages to user space

Fixes a bug in the tcf_dump_walker function that can cause some actions
to not be reported when dumping a large number of actions. This issue
became more aggrevated when cookies feature was added. In particular
this issue is manifest when large cookie values are assigned to the
actions and when enough actions are created that the resulting table
must be dumped in multiple batches.

The number of actions returned in each batch is limited by the total
number of actions and the memory buffer size. With small cookies
the numeric limit is reached before the buffer size limit, which avoids
the code path triggering this bug. When large cookies are used buffer
fills before the numeric limit, and the erroneous code path is hit.

For example after creating 32 csum actions with the cookie
aaaabbbbccccdddd

$ tc actions ls action csum
total acts 26

action order 0: csum (tcp) action continue
index 1 ref 1 bind 0
cookie aaaabbbbccccdddd

.....

action order 25: csum (tcp) action continue
index 26 ref 1 bind 0
cookie aaaabbbbccccdddd
total acts 6

action order 0: csum (tcp) action continue
index 28 ref 1 bind 0
cookie aaaabbbbccccdddd

......

action order 5: csum (tcp) action continue
index 32 ref 1 bind 0
cookie aaaabbbbccccdddd

Note that the action with index 27 is omitted from the report.

Fixes: 4b3550ef530c ("[NET_SCHED]: Use nla_nest_start/nla_nest_end")"
Signed-off-by: Craig Dillabaugh <cdillaba@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# affaa0c7 23-Mar-2018 Davide Caratti <dcaratti@redhat.com>

net/sched: remove tcf_idr_cleanup()

tcf_idr_cleanup() is no more used, so remove it.

Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4e76e75d 08-Mar-2018 Roman Mashak <mrv@mojatatu.com>

net sched actions: calculate add/delete event message size

Introduce routines to calculate size of the shared tc netlink attributes
and the full message size including netlink header and tc service header.

Update add/delete action logic to have the size for event messages,
the size is passed to tcf_add_notify() and tcf_del_notify() where the
notification message is being allocated and constructed.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d04e6990 08-Mar-2018 Roman Mashak <mrv@mojatatu.com>

net sched actions: update Add/Delete action API with new argument

Introduce a new function argument to carry total attributes size for
correct allocation of skb in event messages.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d143b9e3 02-Mar-2018 Roman Mashak <mrv@mojatatu.com>

net sched actions: corrected extack message

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b3620145 15-Feb-2018 Alexander Aring <aring@mojatatu.com>

net: sched: act: handle extack in tcf_generic_walker

This patch adds extack handling for a common used TC act function
"tcf_generic_walker()" to add an extack message on failures.
The tcf_generic_walker() function can fail if get a invalid command
different than DEL and GET. The naming "action" here is wrong, the
correct naming would be command.

Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 41780105 15-Feb-2018 Alexander Aring <aring@mojatatu.com>

net: sched: act: add extack for walk callback

This patch adds extack support for act walker callback api. This
prepares to handle extack support inside each specific act
implementation.

Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 331a9295 15-Feb-2018 Alexander Aring <aring@mojatatu.com>

net: sched: act: add extack for lookup callback

This patch adds extack support for act lookup callback api. This
prepares to handle extack support inside each specific act
implementation.

Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 589dad6d 15-Feb-2018 Alexander Aring <aring@mojatatu.com>

net: sched: act: add extack to init callback

This patch adds extack support for act init callback api. This
prepares to handle extack support inside each specific act
implementation.

Based on work by David Ahern <dsahern@gmail.com>

Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 84ae017a 15-Feb-2018 Alexander Aring <aring@mojatatu.com>

net: sched: act: handle generic action errors

This patch adds extack support for generic act handling. The extack
will be set deeper to each called function which is not part of netdev
core api.

Based on work by David Ahern <dsahern@gmail.com>

Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# aea0d727 15-Feb-2018 Alexander Aring <aring@mojatatu.com>

net: sched: act: add extack to init

This patch adds extack to tcf_action_init and tcf_action_init_1
functions. These are necessary to make individual extack handling in
each act implementation.

Based on work by David Ahern <dsahern@gmail.com>

Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1af851558 15-Feb-2018 Alexander Aring <aring@mojatatu.com>

net: sched: act: fix code style

This patch is used by subsequent patches. It fixes code style issues
caught by checkpatch.

Signed-off-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ee99b2d8 16-Feb-2018 David S. Miller <davem@davemloft.net>

net: Revert sched action extack support series.

It was mis-applied and the changes had rejects.

Signed-off-by: David S. Miller <davem@davemloft.net>


# 10defbd2 15-Feb-2018 Alexander Aring <aring@mojatatu.com>

net: sched: act: add extack to init

This patch adds extack to tcf_action_init and tcf_action_init_1
functions. These are necessary to make individual extack handling in
each act implementation.

Based on work by David Ahern <dsahern@gmail.com>

Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b7b347fa 15-Feb-2018 Alexander Aring <aring@mojatatu.com>

net: sched: act: fix code style

This patch is used by subsequent patches. It fixes code style issues
caught by checkpatch.

Signed-off-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 66dede2d 15-Feb-2018 Davide Caratti <dcaratti@redhat.com>

net: sched: fix unbalance in the error path of tca_action_flush()

When tca_action_flush() calls the action walk() and gets an error,
a successful call to nla_nest_start() is not followed by a call to
nla_nest_cancel(). It's harmless, as the skb is freed in the error
path - but it's worth to fix this unbalance.

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 13da199c 12-Feb-2018 Kirill Tkhai <ktkhai@virtuozzo.com>

net: Convert subsys_initcall() registered pernet_operations from net/sched

psched_net_ops only creates and destroyes /proc entry,
and safe to be executed in parallel with any foreigh
pernet_operations.

tcf_action_net_ops initializes and destructs tcf_action_net::egdev_ht,
which is not touched by foreign pernet_operations.

So, make them async.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Acked-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7a457577 28-Nov-2017 Matthew Wilcox <willy@infradead.org>

idr: Rename idr_for_each_entry_ext

Most places in the kernel that we need to distinguish functions by the
type of their arguments, we use '_ul' as a suffix for the unsigned long
variant, not '_ext'. Also add kernel-doc.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>


# 339913a8 28-Nov-2017 Matthew Wilcox <willy@infradead.org>

net sched actions: Convert to use idr_alloc_u32

Use the new helper. Also untangle the error path, and in so doing
noticed that estimator generator failure would lead to us leaking an
ID. Fix that bug.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>


# 322d884b 28-Nov-2017 Matthew Wilcox <willy@infradead.org>

idr: Delete idr_find_ext function

Simply changing idr_remove's 'id' argument to 'unsigned long' works
for all callers.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>


# 234a4624 28-Nov-2017 Matthew Wilcox <willy@infradead.org>

idr: Delete idr_replace_ext function

Changing idr_replace's 'id' argument to 'unsigned long' works for all
callers. Callers which passed a negative ID now get -ENOENT instead of
-EINVAL. No callers relied on this error value.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>


# 9c160941 28-Nov-2017 Matthew Wilcox <willy@infradead.org>

idr: Delete idr_remove_ext function

Simply changing idr_remove's 'id' argument to 'unsigned long' suffices
for all callers.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>


# 9a63b255 05-Dec-2017 Cong Wang <xiyou.wangcong@gmail.com>

net_sched: remove unused parameter from act cleanup ops

No one actually uses it.

Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c7e460ce 06-Nov-2017 Cong Wang <xiyou.wangcong@gmail.com>

Revert "net_sched: hold netns refcnt for each action"

This reverts commit ceffcc5e254b450e6159f173e4538215cebf1b59.
If we hold that refcnt, the netns can never be destroyed until
all actions are destroyed by user, this breaks our netns design
which we expect all actions are destroyed when we destroy the
whole netns.

Cc: Lucas Bates <lucasb@mojatatu.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ceffcc5e 01-Nov-2017 Cong Wang <xiyou.wangcong@gmail.com>

net_sched: hold netns refcnt for each action

TC actions have been destroyed asynchronously for a long time,
previously in a RCU callback and now in a workqueue. If we
don't hold a refcnt for its netns, we could use the per netns
data structure, struct tcf_idrinfo, after it has been freed by
netns workqueue.

Hold refcnt to ensure netns destroy happens after all actions
are gone.

Fixes: ddf97ccdd7cb ("net_sched: add network namespace support for tc actions")
Reported-by: Lucas Bates <lucasb@mojatatu.com>
Tested-by: Lucas Bates <lucasb@mojatatu.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a159d3c4 01-Nov-2017 Cong Wang <xiyou.wangcong@gmail.com>

net_sched: acquire RTNL in tc_action_net_exit()

I forgot to acquire RTNL in tc_action_net_exit()
which leads that action ops->cleanup() is not always
called with RTNL. This usually is not a big deal because
this function is called after all netns refcnt are gone,
but given RTNL protects more than just actions, add it
for safety and consistency.

Also add an assertion to catch other potential bugs.

Fixes: ddf97ccdd7cb ("net_sched: add network namespace support for tc actions")
Reported-by: Lucas Bates <lucasb@mojatatu.com>
Tested-by: Lucas Bates <lucasb@mojatatu.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0843c092 18-Oct-2017 Or Gerlitz <ogerlitz@mellanox.com>

net/sched: Set the net-device for egress device instance

Currently the netdevice field is not set and the egdev instance
is not functional, fix that.

Fixes: 3f55bdda8df ('net: sched: introduce per-egress action device callbacks')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b3f55bdd 11-Oct-2017 Jiri Pirko <jiri@mellanox.com>

net: sched: introduce per-egress action device callbacks

Introduce infrastructure that allows drivers to register callbacks that
are called whenever tc would offload inserted rule and specified device
acts as tc action egress device.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 255cd50f 13-Sep-2017 Jiri Pirko <jiri@mellanox.com>

net: sched: fix use-after-free in tcf_action_destroy and tcf_del_walker

Recent commit d7fb60b9cafb ("net_sched: get rid of tcfa_rcu") removed
freeing in call_rcu, which changed already existing hard-to-hit
race condition into 100% hit:

[ 598.599825] BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
[ 598.607782] IP: tcf_action_destroy+0xc0/0x140

Or:

[ 40.858924] BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
[ 40.862840] IP: tcf_generic_walker+0x534/0x820

Fix this by storing the ops and use them directly for module_put call.

Fixes: a85a970af265 ("net_sched: move tc_action into tcf_common")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d7fb60b9 11-Sep-2017 Cong Wang <xiyou.wangcong@gmail.com>

net_sched: get rid of tcfa_rcu

gen estimator has been rewritten in commit 1c0d32fde5bd
("net_sched: gen_estimator: complete rewrite of rate estimators"),
the caller is no longer needed to wait for a grace period.
So this patch gets rid of it.

This also completely closes a race condition between action free
path and filter chain add/remove path for the following patch.
Because otherwise the nested RCU callback can't be caught by
rcu_barrier().

Please see also the comments in code.

Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2c8468dc 05-Sep-2017 Jakub Kicinski <kuba@kernel.org>

net: sched: don't use GFP_KERNEL under spin lock

The new TC IDR code uses GFP_KERNEL under spin lock. Which leads
to:

[ 582.621091] BUG: sleeping function called from invalid context at ../mm/slab.h:416
[ 582.629721] in_atomic(): 1, irqs_disabled(): 0, pid: 3379, name: tc
[ 582.636939] 2 locks held by tc/3379:
[ 582.641049] #0: (rtnl_mutex){+.+.+.}, at: [<ffffffff910354ce>] rtnetlink_rcv_msg+0x92e/0x1400
[ 582.650958] #1: (&(&tn->idrinfo->lock)->rlock){+.-.+.}, at: [<ffffffff9110a5e0>] tcf_idr_create+0x2f0/0x8e0
[ 582.662217] Preemption disabled at:
[ 582.662222] [<ffffffff9110a5e0>] tcf_idr_create+0x2f0/0x8e0
[ 582.672592] CPU: 9 PID: 3379 Comm: tc Tainted: G W 4.13.0-rc7-debug-00648-g43503a79b9f0 #287
[ 582.683432] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016
[ 582.691937] Call Trace:
...
[ 582.742460] kmem_cache_alloc+0x286/0x540
[ 582.747055] radix_tree_node_alloc.constprop.6+0x4a/0x450
[ 582.753209] idr_get_free_cmn+0x627/0xf80
...
[ 582.815525] idr_alloc_cmn+0x1a8/0x270
...
[ 582.833804] tcf_idr_create+0x31b/0x8e0
...

Try to preallocate the memory with idr_prealloc(GFP_KERNEL)
(as suggested by Eric Dumazet), and change the allocation
flags under spin lock.

Fixes: 65a206c01e8e ("net/sched: Change act_api and act_xxx modules to use IDR")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 65a206c0 30-Aug-2017 Chris Mi <chrism@mellanox.com>

net/sched: Change act_api and act_xxx modules to use IDR

Typically, each TC filter has its own action. All the actions of the
same type are saved in its hash table. But the hash buckets are too
small that it degrades to a list. And the performance is greatly
affected. For example, it takes about 0m11.914s to insert 64K rules.
If we convert the hash table to IDR, it only takes about 0m1.500s.
The improvement is huge.

But please note that the test result is based on previous patch that
cls_flower uses IDR.

Signed-off-by: Chris Mi <chrism@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b97bac64 09-Aug-2017 Florian Westphal <fw@strlen.de>

rtnetlink: make rtnl_register accept a flags parameter

This change allows us to later indicate to rtnetlink core that certain
doit functions should be called without acquiring rtnl_mutex.

This change should have no effect, we simply replace the last (now
unused) calcit argument with the new flag.

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ec1a9cca 04-Aug-2017 Jiri Pirko <jiri@mellanox.com>

net: sched: remove check for number of actions in tcf_exts_exec

Leave it to tcf_action_exec to return TC_ACT_OK in case there is no
action present.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e62e484d 30-Jul-2017 Jamal Hadi Salim <jhs@mojatatu.com>

net sched actions: add time filter for action dumping

This patch adds support for filtering based on time since last used.
When we are dumping a large number of actions it is useful to
have the option of filtering based on when the action was last
used to reduce the amount of data crossing to user space.

With this patch the user space app sets the TCA_ROOT_TIME_DELTA
attribute with the value in milliseconds with "time of interest
since now". The kernel converts this to jiffies and does the
filtering comparison matching entries that have seen activity
since then and returns them to user space.
Old kernels and old tc continue to work in legacy mode since
they dont specify this attribute.

Some example (we have 400 actions bound to 400 filters); at
installation time. Using updated when tc setting the time of
interest to 120 seconds earlier (we see 400 actions):
prompt$ hackedtc actions ls action gact since 120000| grep index | wc -l
400

go get some coffee and wait for > 120 seconds and try again:

prompt$ hackedtc actions ls action gact since 120000 | grep index | wc -l
0

Lets see a filter bound to one of these actions:
....
filter pref 10 u32
filter pref 10 u32 fh 800: ht divisor 1
filter pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:10 (rule hit 2 success 1)
match 7f000002/ffffffff at 12 (success 1 )
action order 1: gact action pass
random type none pass val 0
index 23 ref 2 bind 1 installed 1145 sec used 802 sec
Action statistics:
Sent 84 bytes 1 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
....

that coffee took long, no? It was good.

Now lets ping -c 1 127.0.0.2, then run the actions again:
prompt$ hackedtc actions ls action gact since 120 | grep index | wc -l
1

More details please:
prompt$ hackedtc -s actions ls action gact since 120000

action order 0: gact action pass
random type none pass val 0
index 23 ref 2 bind 1 installed 1270 sec used 30 sec
Action statistics:
Sent 168 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0

And the filter?

filter pref 10 u32
filter pref 10 u32 fh 800: ht divisor 1
filter pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:10 (rule hit 4 success 2)
match 7f000002/ffffffff at 12 (success 2 )
action order 1: gact action pass
random type none pass val 0
index 23 ref 2 bind 1 installed 1324 sec used 84 sec
Action statistics:
Sent 168 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 90825b23 30-Jul-2017 Jamal Hadi Salim <jhs@mojatatu.com>

net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch

When you dump hundreds of thousands of actions, getting only 32 per
dump batch even when the socket buffer and memory allocations allow
is inefficient.

With this change, the user will get as many as possibly fitting
within the given constraints available to the kernel.

The top level action TLV space is extended. An attribute
TCA_ROOT_FLAGS is used to carry flags; flag TCA_FLAG_LARGE_DUMP_ON
is set by the user indicating the user is capable of processing
these large dumps. Older user space which doesnt set this flag
doesnt get the large (than 32) batches.
The kernel uses the TCA_ROOT_COUNT attribute to tell the user how many
actions are put in a single batch. As such user space app knows how long
to iterate (independent of the type of action being dumped)
instead of hardcoded maximum of 32 thus maintaining backward compat.

Some results dumping 1.5M actions below:
first an unpatched tc which doesnt understand these features...

prompt$ time -p tc actions ls action gact | grep index | wc -l
1500000
real 1388.43
user 2.07
sys 1386.79

Now lets see a patched tc which sets the correct flags when requesting
a dump:

prompt$ time -p updatedtc actions ls action gact | grep index | wc -l
1500000
real 178.13
user 2.02
sys 176.96

That is about 8x performance improvement for tc app which sets its
receive buffer to about 32K.

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# df823b02 30-Jul-2017 Jamal Hadi Salim <jhs@mojatatu.com>

net sched actions: Use proper root attribute table for actions

Bug fix for an issue which has been around for about a decade.
We got away with it because the enumeration was larger than needed.

Fixes: 7ba699c604ab ("[NET_SCHED]: Convert actions from rtnetlink to new netlink API")
Suggested-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c4c4290c 13-Jul-2017 Roman Mashak <mrv@mojatatu.com>

net sched actions: rename act_get_notify() to tcf_get_notify()

Make name consistent with other TC event notification routines, such as
tcf_add_notify() and tcf_del_notify()

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 367a8ce8 23-May-2017 WANG Cong <xiyou.wangcong@gmail.com>

net_sched: only create filter chains for new filters/actions

tcf_chain_get() always creates a new filter chain if not found
in existing ones. This is totally unnecessary when we get or
delete filters, new chain should be only created for new filters
(or new actions).

Fixes: 5bc1701881e3 ("net: sched: introduce multichain support for filters")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# db50514f 17-May-2017 Jiri Pirko <jiri@mellanox.com>

net: sched: add termination action to allow goto chain

Introduce new type of termination action called "goto_chain". This allows
user to specify a chain to be processed. This action type is
then processed as a return value in tcf_classify loop in similar
way as "reclassify" is, only it does not reset to the first filter
in chain but rather reset to the first filter of the desired chain.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9fb9f251 17-May-2017 Jiri Pirko <jiri@mellanox.com>

net: sched: push tp down to action init

Tp pointer will be needed by the next patch in order to get the chain.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9da3242e 02-May-2017 Jiri Pirko <jiri@mellanox.com>

net: sched: add helpers to handle extended actions

Jump is now the only one using value action opcode. This is going to
change soon. So introduce helpers to work with this. Convert TC_ACT_JUMP.

This also fixes the TC_ACT_JUMP check, which is incorrectly done as a
bit check, not a value check.

Fixes: e0ee84ded796 ("net sched actions: Complete the JUMPX opcode")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e0ee84de 23-Apr-2017 Jamal Hadi Salim <jhs@mojatatu.com>

net sched actions: Complete the JUMPX opcode

per discussion at netconf/netdev:
When we have an action that is capable of branching (example a policer),
we can achieve a continuation of the action graph by programming a
"continue" where we find an exact replica of the same filter rule with a lower
priority and the remainder of the action graph. When you have 100s of thousands
of filters which require such a feature it gets very inefficient to do two
lookups.

This patch completes a leftover feature of action codes. Its time has come.

Example below where a user labels packets with a different skbmark on ingress
of a port depending on whether they have/not exceeded the configured rate.
This mark is then used to make further decisions on some egress port.

#rate control, very low so we can easily see the effect
sudo $TC actions add action police rate 1kbit burst 90k \
conform-exceed pipe/jump 2 index 10
# skbedit index 11 will be used if the user conforms
sudo $TC actions add action skbedit mark 11 ok index 11
# skbedit index 12 will be used if the user does not conform
sudo $TC actions add action skbedit mark 12 ok index 12

#lets bind the user ..
sudo $TC filter add dev $ETH parent ffff: protocol ip prio 8 u32 \
match ip dst 127.0.0.8/32 flowid 1:10 \
action police index 10 \
action skbedit index 11 \
action skbedit index 12

#run a ping -f and see what happens..
#
jhs@foobar:~$ sudo $TC -s filter ls dev $ETH parent ffff: protocol ip
filter pref 8 u32
filter pref 8 u32 fh 800: ht divisor 1
filter pref 8 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:10 (rule hit 2800 success 1005)
match 7f000008/ffffffff at 16 (success 1005 )
action order 1: police 0xa rate 1Kbit burst 23440b mtu 2Kb action pipe/jump 2 overhead 0b
ref 2 bind 1 installed 207 sec used 122 sec
Action statistics:
Sent 84420 bytes 1005 pkt (dropped 0, overlimits 721 requeues 0)
backlog 0b 0p requeues 0

action order 2: skbedit mark 11 pass
index 11 ref 2 bind 1 installed 204 sec used 122 sec
Action statistics:
Sent 60564 bytes 721 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0

action order 3: skbedit mark 12 pass
index 12 ref 2 bind 1 installed 201 sec used 122 sec
Action statistics:
Sent 23856 bytes 284 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0

Not bad, about 28% non-conforming packets..

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e0535ce5 20-Apr-2017 Wolfgang Bumiller <w.bumiller@proxmox.com>

net sched actions: allocate act cookie early

Policing filters do not use the TCA_ACT_* enum and the tb[]
nlattr array in tcf_action_init_1() doesn't get filled for
them so we should not try to look for a TCA_ACT_COOKIE
attribute in the then uninitialized array.
The error handling in cookie allocation then calls
tcf_hash_release() leading to invalid memory access later
on.
Additionally, if cookie allocation fails after an already
existing non-policing filter has successfully been changed,
tcf_action_release() should not be called, also we would
have to roll back the changes in the error handling, so
instead we now allocate the cookie early and assign it on
success at the end.

CVE-2017-7979
Fixes: 1045ba77a596 ("net sched actions: Add support for user cookies")
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c21ef3e3 16-Apr-2017 David Ahern <dsa@cumulusnetworks.com>

net: rtnetlink: plumb extended ack to doit function

Add netlink_ext_ack arg to rtnl_doit_func. Pass extack arg to nlmsg_parse
for doit functions that call it directly.

This is the first step to using extended error reporting in rtnetlink.
>From here individual subsystems can be updated to set netlink_ext_ack as
needed.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# fceb6435 12-Apr-2017 Johannes Berg <johannes.berg@intel.com>

netlink: pass extended ACK struct to parsing functions

Pass the new extended ACK reporting struct to all of the generic
netlink parsing functions. For now, pass NULL in almost all callers
(except for some in the core.)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 37f1c63e 24-Feb-2017 Roman Mashak <mrv@mojatatu.com>

net sched actions: do not overwrite status of action creation.

nla_memdup_cookie was overwriting err value, declared at function
scope and earlier initialized with result of ->init(). At success
nla_memdup_cookie() returns 0, and thus module refcnt decremented,
although the action was installed.

$ sudo tc actions add action pass index 1 cookie 1234
$ sudo tc actions ls action gact

action order 0: gact action pass
random type none pass val 0
index 1 ref 1 bind 0
$
$ lsmod
Module Size Used by
act_gact 16384 0
...
$
$ sudo rmmod act_gact
[ 52.310283] ------------[ cut here ]------------
[ 52.312551] WARNING: CPU: 1 PID: 455 at kernel/module.c:1113
module_put+0x99/0xa0
[ 52.316278] Modules linked in: act_gact(-) crct10dif_pclmul crc32_pclmul
ghash_clmulni_intel psmouse pcbc evbug aesni_intel aes_x86_64 crypto_simd
serio_raw glue_helper pcspkr cryptd
[ 52.322285] CPU: 1 PID: 455 Comm: rmmod Not tainted 4.10.0+ #11
[ 52.324261] Call Trace:
[ 52.325132] dump_stack+0x63/0x87
[ 52.326236] __warn+0xd1/0xf0
[ 52.326260] warn_slowpath_null+0x1d/0x20
[ 52.326260] module_put+0x99/0xa0
[ 52.326260] tcf_hashinfo_destroy+0x7f/0x90
[ 52.326260] gact_exit_net+0x27/0x40 [act_gact]
[ 52.326260] ops_exit_list.isra.6+0x38/0x60
[ 52.326260] unregister_pernet_operations+0x90/0xe0
[ 52.326260] unregister_pernet_subsys+0x21/0x30
[ 52.326260] tcf_unregister_action+0x68/0xa0
[ 52.326260] gact_cleanup_module+0x17/0xa0f [act_gact]
[ 52.326260] SyS_delete_module+0x1ba/0x220
[ 52.326260] entry_SYSCALL_64_fastpath+0x1e/0xad
[ 52.326260] RIP: 0033:0x7f527ffae367
[ 52.326260] RSP: 002b:00007ffeb402a598 EFLAGS: 00000202 ORIG_RAX:
00000000000000b0
[ 52.326260] RAX: ffffffffffffffda RBX: 0000559b069912a0 RCX: 00007f527ffae367
[ 52.326260] RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559b06991308
[ 52.326260] RBP: 0000000000000003 R08: 00007f5280264420 R09: 00007ffeb4029511
[ 52.326260] R10: 000000000000087b R11: 0000000000000202 R12: 00007ffeb4029580
[ 52.326260] R13: 0000000000000000 R14: 0000000000000000 R15: 0000559b069912a0
[ 52.354856] ---[ end trace 90d89401542b0db6 ]---
$

With the fix:

$ sudo modprobe act_gact
$ lsmod
Module Size Used by
act_gact 16384 0
...
$ sudo tc actions add action pass index 1 cookie 1234
$ sudo tc actions ls action gact

action order 0: gact action pass
random type none pass val 0
index 1 ref 1 bind 0
$
$ lsmod
Module Size Used by
act_gact 16384 1
...
$ sudo rmmod act_gact
rmmod: ERROR: Module act_gact is in use
$
$ sudo /home/mrv/bin/tc actions del action gact index 1
$ sudo rmmod act_gact
$ lsmod
Module Size Used by
$

Fixes: 1045ba77a ("net sched actions: Add support for user cookies")
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# edb9d1bf 24-Feb-2017 Roman Mashak <mrv@mojatatu.com>

net sched actions: decrement module reference count after table flush.

When tc actions are loaded as a module and no actions have been installed,
flushing them would result in actions removed from the memory, but modules
reference count not being decremented, so that the modules would not be
unloaded.

Following is example with GACT action:

% sudo modprobe act_gact
% lsmod
Module Size Used by
act_gact 16384 0
%
% sudo tc actions ls action gact
%
% sudo tc actions flush action gact
% lsmod
Module Size Used by
act_gact 16384 1
% sudo tc actions flush action gact
% lsmod
Module Size Used by
act_gact 16384 2
% sudo rmmod act_gact
rmmod: ERROR: Module act_gact is in use
....

After the fix:
% lsmod
Module Size Used by
act_gact 16384 0
%
% sudo tc actions add action pass index 1
% sudo tc actions add action pass index 2
% sudo tc actions add action pass index 3
% lsmod
Module Size Used by
act_gact 16384 3
%
% sudo tc actions flush action gact
% lsmod
Module Size Used by
act_gact 16384 0
%
% sudo tc actions flush action gact
% lsmod
Module Size Used by
act_gact 16384 0
% sudo rmmod act_gact
% lsmod
Module Size Used by
%

Fixes: f97017cdefef ("net-sched: Fix actions flushing")
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6f2e3f7d 14-Feb-2017 Wei Yongjun <weiyongjun1@huawei.com>

net_sched: nla_memdup_cookie() can be static

Fixes the following sparse warning:

net/sched/act_api.c:532:5: warning:
symbol 'nla_memdup_cookie' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1045ba77 24-Jan-2017 Jamal Hadi Salim <jhs@mojatatu.com>

net sched actions: Add support for user cookies

Introduce optional 128-bit action cookie.
Like all other cookie schemes in the networking world (eg in protocols
like http or existing kernel fib protocol field, etc) the idea is to save
user state that when retrieved serves as a correlator. The kernel
_should not_ intepret it. The user can store whatever they wish in the
128 bits.

Sample exercise(showing variable length use of cookie)

.. create an accept action with cookie a1b2c3d4
sudo $TC actions add action ok index 1 cookie a1b2c3d4

.. dump all gact actions..
sudo $TC -s actions ls action gact

action order 0: gact action pass
random type none pass val 0
index 1 ref 1 bind 0 installed 5 sec used 5 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
cookie a1b2c3d4

.. bind the accept action to a filter..
sudo $TC filter add dev lo parent ffff: protocol ip prio 1 \
u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 1

... send some traffic..
$ ping 127.0.0.1 -c 3
PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.020 ms
64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.027 ms
64 bytes from 127.0.0.1: icmp_seq=3 ttl=64 time=0.038 ms

Signed-off-by: David S. Miller <davem@davemloft.net>


# 0faa9cb5 15-Jan-2017 Jamal Hadi Salim <jhs@mojatatu.com>

net sched actions: fix refcnt when GETing of action after bind

Demonstrating the issue:

.. add a drop action
$sudo $TC actions add action drop index 10

.. retrieve it
$ sudo $TC -s actions get action gact index 10

action order 1: gact action drop
random type none pass val 0
index 10 ref 2 bind 0 installed 29 sec used 29 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0

... bug 1 above: reference is two.
Reference is actually 1 but we forget to subtract 1.

... do a GET again and we see the same issue
try a few times and nothing changes
~$ sudo $TC -s actions get action gact index 10

action order 1: gact action drop
random type none pass val 0
index 10 ref 2 bind 0 installed 31 sec used 31 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0

... lets try to bind the action to a filter..
$ sudo $TC qdisc add dev lo ingress
$ sudo $TC filter add dev lo parent ffff: protocol ip prio 1 \
u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 10

... and now a few GETs:
$ sudo $TC -s actions get action gact index 10

action order 1: gact action drop
random type none pass val 0
index 10 ref 3 bind 1 installed 204 sec used 204 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0

$ sudo $TC -s actions get action gact index 10

action order 1: gact action drop
random type none pass val 0
index 10 ref 4 bind 1 installed 206 sec used 206 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0

$ sudo $TC -s actions get action gact index 10

action order 1: gact action drop
random type none pass val 0
index 10 ref 5 bind 1 installed 235 sec used 235 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0

.... as can be observed the reference count keeps going up.

After the fix

$ sudo $TC actions add action drop index 10
$ sudo $TC -s actions get action gact index 10

action order 1: gact action drop
random type none pass val 0
index 10 ref 1 bind 0 installed 4 sec used 4 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0

$ sudo $TC -s actions get action gact index 10

action order 1: gact action drop
random type none pass val 0
index 10 ref 1 bind 0 installed 6 sec used 6 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0

$ sudo $TC qdisc add dev lo ingress
$ sudo $TC filter add dev lo parent ffff: protocol ip prio 1 \
u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 10

$ sudo $TC -s actions get action gact index 10

action order 1: gact action drop
random type none pass val 0
index 10 ref 2 bind 1 installed 32 sec used 32 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0

$ sudo $TC -s actions get action gact index 10

action order 1: gact action drop
random type none pass val 0
index 10 ref 2 bind 1 installed 33 sec used 33 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0

Fixes: aecc5cefc389 ("net sched actions: fix GETing actions")
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e7246e12 07-Jan-2017 Willem de Bruijn <willemb@google.com>

net-tc: extract skip classify bit from tc_verd

Packets sent by the IFB device skip subsequent tc classification.
A single bit governs this state. Move it out of tc_verd in
anticipation of removing that __u16 completely.

The new bitfield tc_skip_classify temporarily uses one bit of a
hole, until tc_verd is removed completely in a follow-up patch.

Remove the bit hole comment. It could be 2, 3, 4 or 5 bits long.
With that many options, little value in documenting it.

Introduce a helper function to deduplicate the logic in the two
sites that check this bit.

The field tc_skip_classify is set only in IFB on skbs cloned in
act_mirred, so original packet sources do not have to clear the
bit when reusing packets (notably, pktgen and octeon).

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1c0d32fd 04-Dec-2016 Eric Dumazet <edumazet@google.com>

net_sched: gen_estimator: complete rewrite of rate estimators

1) Old code was hard to maintain, due to complex lock chains.
(We probably will be able to remove some kfree_rcu() in callers)

2) Using a single timer to update all estimators does not scale.

3) Code was buggy on 32bit kernel (WRITE_ONCE() on 64bit quantity
is not supposed to work well)

In this rewrite :

- I removed the RB tree that had to be scanned in
gen_estimator_active(). qdisc dumps should be much faster.

- Each estimator has its own timer.

- Estimations are maintained in net_rate_estimator structure,
instead of dirtying the qdisc. Minor, but part of the simplification.

- Reading the estimator uses RCU and a seqcount to provide proper
support for 32bit kernels.

- We reduce memory need when estimators are not used, since
we store a pointer, instead of the bytes/packets counters.

- xt_rateest_mt() no longer has to grab a spinlock.
(In the future, xt_rateest_tg() could be switched to per cpu counters)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4700e9ce 26-Oct-2016 Johannes Berg <johannes.berg@intel.com>

net_sched actions: use nla_parse_nested()

Use nla_parse_nested instead of open-coding the call to
nla_parse() with the attribute data/len.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ab102b80 11-Oct-2016 WANG Cong <xiyou.wangcong@gmail.com>

net_sched: reorder pernet ops and act ops registrations

Krister reported a kernel NULL pointer dereference after
tcf_action_init_1() invokes a_o->init(), it is a race condition
where one thread calling tcf_register_action() to initialize
the netns data after putting act ops in the global list and
the other thread searching the list and then calling
a_o->init(net, ...).

Fix this by moving the pernet ops registration before making
the action ops visible. This is fine because: a) we don't
rely on act_base in pernet ops->init(), b) in the worst case we
have a fully initialized netns but ops is still not ready so
new actions still can't be created.

Reported-by: Krister Johansen <kjlx@templeofstupid.com>
Tested-by: Krister Johansen <kjlx@templeofstupid.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# aecc5cef 19-Sep-2016 Jamal Hadi Salim <jhs@mojatatu.com>

net sched actions: fix GETing actions

With the batch changes that translated transient actions into
a temporary list lost in the translation was the fact that
tcf_action_destroy() will eventually delete the action from
the permanent location if the refcount is zero.

Example of what broke:
...add a gact action to drop
sudo $TC actions add action drop index 10
...now retrieve it, looks good
sudo $TC actions get action gact index 10
...retrieve it again and find it is gone!
sudo $TC actions get action gact index 10

Fixes: 22dc13c837c3 ("net_sched: convert tcf_exts from list to pointer array"),
Fixes: 824a7e8863b3 ("net_sched: remove an unnecessary list_del()")
Fixes: f07fed82ad79 ("net_sched: remove the leftover cleanup_a()")

Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5a7a5555 18-Sep-2016 Jamal Hadi Salim <jhs@mojatatu.com>

net sched: stylistic cleanups

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 22dc13c8 13-Aug-2016 WANG Cong <xiyou.wangcong@gmail.com>

net_sched: convert tcf_exts from list to pointer array

As pointed out by Jamal, an action could be shared by
multiple filters, so we can't use list to chain them
any more after we get rid of the original tc_action.
Instead, we could just save pointers to these actions
in tcf_exts, since they are refcount'ed, so convert
the list to an array of pointers.

The "ugly" part is the action API still accepts list
as a parameter, I just introduce a helper function to
convert the array of pointers to a list, instead of
relying on the C99 feature to iterate the array.

Fixes: a85a970af265 ("net_sched: move tc_action into tcf_common")
Reported-by: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 824a7e88 13-Aug-2016 WANG Cong <xiyou.wangcong@gmail.com>

net_sched: remove an unnecessary list_del()

This list_del() for tc action is not needed actually,
because we only use this list to chain bulk operations,
therefore should not be carried for latter operations.

Fixes: ec0595cc4495 ("net_sched: get rid of struct tcf_common")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f07fed82 13-Aug-2016 WANG Cong <xiyou.wangcong@gmail.com>

net_sched: remove the leftover cleanup_a()

After refactoring tc_action into tcf_common, we no
longer need to cleanup temporary "actions" in list,
they are permanently stored in the hashtable.

Fixes: a85a970af265 ("net_sched: move tc_action into tcf_common")
Reported-by: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ec0595cc 25-Jul-2016 WANG Cong <xiyou.wangcong@gmail.com>

net_sched: get rid of struct tcf_common

After the previous patch, struct tc_action should be enough
to represent the generic tc action, tcf_common is not necessary
any more. This patch gets rid of it to make tc action code
more readable.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a85a970a 25-Jul-2016 WANG Cong <xiyou.wangcong@gmail.com>

net_sched: move tc_action into tcf_common

struct tc_action is confusing, currently we use it for two purposes:
1) Pass in arguments and carry out results from helper functions
2) A generic representation for tc actions

The first one is error-prone, since we need to make sure we don't
miss anything. This patch aims to get rid of this use, by moving
tc_action into tcf_common, so that they are allocated together
in hashtable and can be cast'ed easily.

And together with the following patch, we could really make
tc_action a generic representation for all tc actions and each
type of action can inherit from it.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ebecaa66 13-Jun-2016 Jamal Hadi Salim <jhs@mojatatu.com>

net sched actions: bug fix dumping actions directly didnt produce NLMSG_DONE

This refers to commands to direct action access as follows:

sudo tc actions add action drop index 12
sudo tc actions add action pipe index 10

And then dumping them like so:
sudo tc actions ls action gact

iproute2 worked because it depended on absence of TCA_ACT_TAB TLV
as end of message.
This fix has been tested with iproute2 and is backward compatible.

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b2313077 13-Jun-2016 WANG Cong <xiyou.wangcong@gmail.com>

net_sched: make tcf_hash_check() boolean

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# edb09eb1 06-Jun-2016 Eric Dumazet <edumazet@google.com>

net: sched: do not acquire qdisc spinlock in qdisc/class stats dump

Large tc dumps (tc -s {qdisc|class} sh dev ethX) done by Google BwE host
agent [1] are problematic at scale :

For each qdisc/class found in the dump, we currently lock the root qdisc
spinlock in order to get stats. Sampling stats every 5 seconds from
thousands of HTB classes is a challenge when the root qdisc spinlock is
under high pressure. Not only the dumps take time, they also slow
down the fast path (queue/dequeue packets) by 10 % to 20 % in some cases.

An audit of existing qdiscs showed that sch_fq_codel is the only qdisc
that might need the qdisc lock in fq_codel_dump_stats() and
fq_codel_dump_class_stats()

In v2 of this patch, I now use the Qdisc running seqcount to provide
consistent reads of packets/bytes counters, regardless of 32/64 bit arches.

I also changed rate estimators to use the same infrastructure
so that they no longer need to lock root qdisc lock.

[1]
http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43838.pdf

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Kevin Athey <kda@google.com>
Cc: Xiaotian Pei <xiaotian@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0b0f43fe 05-Jun-2016 Jamal Hadi Salim <jhs@mojatatu.com>

net sched: indentation and other OCD stylistic fixes

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>


# 53eb440f 06-Jun-2016 Jamal Hadi Salim <jhs@mojatatu.com>

net sched actions: introduce timestamp for firsttime use

Useful to know when the action was first used for accounting
(and debugging)

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9854518e 26-Apr-2016 Nicolas Dichtel <nicolas.dichtel@6wind.com>

sched: align nlattr properly when needed

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ddf97ccd 22-Feb-2016 WANG Cong <xiyou.wangcong@gmail.com>

net_sched: add network namespace support for tc actions

Currently tc actions are stored in a per-module hashtable,
therefore are visible to all network namespaces. This is
probably the last part of the tc subsystem which is not
aware of netns now. This patch makes them per-netns,
several tc action API's need to be adjusted for this.

The tc action API code is ugly due to historical reasons,
we need to refactor that code in the future.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1d4150c0 22-Feb-2016 WANG Cong <xiyou.wangcong@gmail.com>

net_sched: prepare tcf_hashinfo_destroy() for netns support

We only release the memory of the hashtable itself, not its
entries inside. This is not a problem yet since we only call
it in module release path, and module is refcount'ed by
actions. This would be a problem after we move the per module
hinfo into per netns in the latter patch.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3c645621 25-Aug-2015 Alexei Starovoitov <ast@kernel.org>

net_sched: make tcf_hash_destroy() static

tcf_hash_destroy() used once. Make it static.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 28e6b67f 29-Jul-2015 Daniel Borkmann <daniel@iogearbox.net>

net: sched: fix refcount imbalance in actions

Since commit 55334a5db5cd ("net_sched: act: refuse to remove bound action
outside"), we end up with a wrong reference count for a tc action.

Test case 1:

FOO="1,6 0 0 4294967295,"
BAR="1,6 0 0 4294967294,"
tc filter add dev foo parent 1: bpf bytecode "$FOO" flowid 1:1 \
action bpf bytecode "$FOO"
tc actions show action bpf
action order 0: bpf bytecode '1,6 0 0 4294967295' default-action pipe
index 1 ref 1 bind 1
tc actions replace action bpf bytecode "$BAR" index 1
tc actions show action bpf
action order 0: bpf bytecode '1,6 0 0 4294967294' default-action pipe
index 1 ref 2 bind 1
tc actions replace action bpf bytecode "$FOO" index 1
tc actions show action bpf
action order 0: bpf bytecode '1,6 0 0 4294967295' default-action pipe
index 1 ref 3 bind 1

Test case 2:

FOO="1,6 0 0 4294967295,"
tc filter add dev foo parent 1: bpf bytecode "$FOO" flowid 1:1 action ok
tc actions show action gact
action order 0: gact action pass
random type none pass val 0
index 1 ref 1 bind 1
tc actions add action drop index 1
RTNETLINK answers: File exists [...]
tc actions show action gact
action order 0: gact action pass
random type none pass val 0
index 1 ref 2 bind 1
tc actions add action drop index 1
RTNETLINK answers: File exists [...]
tc actions show action gact
action order 0: gact action pass
random type none pass val 0
index 1 ref 3 bind 1

What happens is that in tcf_hash_check(), we check tcf_common for a given
index and increase tcfc_refcnt and conditionally tcfc_bindcnt when we've
found an existing action. Now there are the following cases:

1) We do a late binding of an action. In that case, we leave the
tcfc_refcnt/tcfc_bindcnt increased and are done with the ->init()
handler. This is correctly handeled.

2) We replace the given action, or we try to add one without replacing
and find out that the action at a specific index already exists
(thus, we go out with error in that case).

In case of 2), we have to undo the reference count increase from
tcf_hash_check() in the tcf_hash_check() function. Currently, we fail to
do so because of the 'tcfc_bindcnt > 0' check which bails out early with
an -EPERM error.

Now, while commit 55334a5db5cd prevents 'tc actions del action ...' on an
already classifier-bound action to drop the reference count (which could
then become negative, wrap around etc), this restriction only accounts for
invocations outside a specific action's ->init() handler.

One possible solution would be to add a flag thus we possibly trigger
the -EPERM ony in situations where it is indeed relevant.

After the patch, above test cases have correct reference count again.

Fixes: 55334a5db5cd ("net_sched: act: refuse to remove bound action outside")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 519c818e 06-Jul-2015 Eric Dumazet <edumazet@google.com>

net: sched: add percpu stats to actions

Reuse existing percpu infrastructure John Fastabend added for qdisc.

This patch adds a new cpustats parameter to tcf_hash_create() and all
actions pass false, meaning this patch should have no effect yet.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4749c3ef 29-Apr-2015 Florian Westphal <fw@strlen.de>

net: sched: remove TC_MUNGED bits

Not used.

pedit sets TC_MUNGED when packet content was altered, but all the core
does is unset MUNGED again and then set OK2MUNGE.

And the latter isn't tested anywhere. So lets remove both
TC_MUNGED and TC_OK2MUNGE.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b0ab6f92 28-Sep-2014 John Fastabend <john.fastabend@gmail.com>

net: sched: enable per cpu qstats

After previous patches to simplify qstats the qstats can be
made per cpu with a packed union in Qdisc struct.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 64015853 28-Sep-2014 John Fastabend <john.fastabend@gmail.com>

net: sched: restrict use of qstats qlen

This removes the use of qstats->qlen variable from the classifiers
and makes it an explicit argument to gnet_stats_copy_queue().

The qlen represents the qdisc queue length and is packed into
the qstats at the last moment before passnig to user space. By
handling it explicitely we avoid, in the percpu stats case, having
to figure out which per_cpu variable to put it in.

It would probably be best to remove it from qstats completely
but qstats is a user space ABI and can't be broken. A future
patch could make an internal only qstats structure that would
avoid having to allocate an additional u32 variable on the
Qdisc struct. This would make the qstats struct 128bits instead
of 128+32.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 22e0f8b9 28-Sep-2014 John Fastabend <john.fastabend@gmail.com>

net: sched: make bstats per cpu and estimator RCU safe

In order to run qdisc's without locking statistics and estimators
need to be handled correctly.

To resolve bstats make the statistics per cpu. And because this is
only needed for qdiscs that are running without locks which is not
the case for most qdiscs in the near future only create percpu
stats when qdiscs set the TCQ_F_CPUSTATS flag.

Next because estimators use the bstats to calculate packets per
second and bytes per second the estimator code paths are updated
to use the per cpu statistics.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 90f62cf3 23-Apr-2014 Eric W. Biederman <ebiederm@xmission.com>

net: Use netlink_ns_capable to verify the permisions of netlink messages

It is possible by passing a netlink socket to a more privileged
executable and then to fool that executable into writing to the socket
data that happens to be valid netlink message to do something that
privileged executable did not intend to do.

To keep this from happening replace bare capable and ns_capable calls
with netlink_capable, netlink_net_calls and netlink_ns_capable calls.
Which act the same as the previous calls except they verify that the
opener of the socket had the desired permissions as well.

Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 03701d6e 11-Feb-2014 WANG Cong <xiyou.wangcong@gmail.com>

net_sched: act: clean up tca_action_flush()

We could allocate tc_action on stack in tca_action_flush(),
since it is not large.

Also, we could use create_a() in tcf_action_get_1().

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 55334a5d 11-Feb-2014 WANG Cong <xiyou.wangcong@gmail.com>

net_sched: act: refuse to remove bound action outside

When an action is bonnd to a filter, there is no point to
remove it outside. Currently we just silently decrease the refcnt,
we should reject this explicitly with EPERM.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4f1e9d89 11-Feb-2014 WANG Cong <xiyou.wangcong@gmail.com>

net_sched: act: move tcf_hashinfo_init() into tcf_register_action()

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a5b5c958 11-Feb-2014 WANG Cong <xiyou.wangcong@gmail.com>

net_sched: act: refactor cleanup ops

For bindcnt and refcnt etc., they are common for all actions,
not need to repeat such operations for their own, they can be unified
now. Actions just need to do its specific cleanup if needed.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 86062033 11-Feb-2014 WANG Cong <xiyou.wangcong@gmail.com>

net_sched: act: hide struct tcf_common from API

Now we can totally hide it from modules. tcf_hash_*() API's
will operate on struct tc_action, modules don't need to care about
the details.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6e6a50c2 17-Jan-2014 WANG Cong <xiyou.wangcong@gmail.com>

net_sched: act: export tcf_hash_search() instead of tcf_hash_lookup()

So that we will not expose struct tcf_common to modules.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c779f7af 17-Jan-2014 WANG Cong <xiyou.wangcong@gmail.com>

net_sched: act: fetch hinfo from a->ops->hinfo

Every action ops has a pointer to hash info, so we don't need to
hard-code it in each module.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7eb8896d 09-Jan-2014 WANG Cong <xiyou.wangcong@gmail.com>

net_sched: act: remove struct tcf_act_hdr

It is not necessary at all.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a56e1953 09-Jan-2014 WANG Cong <xiyou.wangcong@gmail.com>

net_sched: act: clean up notification functions

Refactor tcf_add_notify() and factor out tcf_del_notify().

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ddafd34f 09-Jan-2014 WANG Cong <xiyou.wangcong@gmail.com>

net_sched: act: move idx_gen into struct tcf_hashinfo

There is no need to store the index separatedly
since tcf_hashinfo is allocated statically too.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 805c1f4a 23-Dec-2013 Jamal Hadi Salim <jhs@mojatatu.com>

net_sched: act: action flushing missaccounting

action flushing missaccounting
Account only for deleted actions

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 63acd680 23-Dec-2013 Jamal Hadi Salim <jhs@mojatatu.com>

net_sched: Remove unnecessary checks for act->ops

Remove unnecessary checks for act->ops
(suggested by Eric Dumazet).

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9c75f402 31-Dec-2013 stephen hemminger <stephen@networkplumber.org>

sched action: make local function static

No need to export functions only used in one file.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a792866a 20-Dec-2013 Eric Dumazet <edumazet@google.com>

net_sched: fix regression in tc_action_ops

list_for_each_entry(a, &act_base, head) doesn't
exit with a = NULL if we reached the end of the list.

tcf_unregister_action(), tc_lookup_action_n() and tc_lookup_action()
need fixes.

Remove tc_lookup_action_id() as its unused and not worth 'fixing'

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 1f747c26c48b ("net_sched: convert tc_action_ops to use struct list_head")
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1f747c26 15-Dec-2013 WANG Cong <xiyou.wangcong@gmail.com>

net_sched: convert tc_action_ops to use struct list_head

We don't need to maintain our own singly linked list code.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 89819dc0 15-Dec-2013 WANG Cong <xiyou.wangcong@gmail.com>

net_sched: convert tcf_hashinfo to hlist and use spinlock

So that we don't need to play with singly linked list,
and since the code is not on hot path, we can use spinlock
instead of rwlock.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 369ba567 15-Dec-2013 WANG Cong <xiyou.wangcong@gmail.com>

net_sched: init struct tcf_hashinfo at register time

It looks weird to store the lock out of the struct but
still points to a static variable. Just move them into the struct.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 33be6271 15-Dec-2013 WANG Cong <xiyou.wangcong@gmail.com>

net_sched: act: use standard struct list_head

Currently actions are chained by a singly linked list,
therefore it is a bit hard to add and remove a specific
entry. Convert it to struct list_head so that in the
latter patch we can remove an action without finding
its head.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d84231d3 15-Dec-2013 WANG Cong <xiyou.wangcong@gmail.com>

net_sched: remove get_stats from tc_action_ops

It is not used.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 17569fae 10-Dec-2013 Yang Yingliang <yangyingliang@huawei.com>

net_sched: remove unnecessary parentheses while return

return is not a function, parentheses are not required.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 382ca8a1 04-Dec-2013 Jamal Hadi Salim <jhs@mojatatu.com>

net_sched: Provide default walker function for actions

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 63ef6174 04-Dec-2013 Jamal Hadi Salim <jhs@mojatatu.com>

net_sched: Default action lookup method for actions

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 76c82d7a 04-Dec-2013 Jamal Hadi Salim <jhs@mojatatu.com>

net_sched: Fail if missing mandatory action operation methods

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 661d2967 21-Mar-2013 Thomas Graf <tgraf@suug.ch>

rtnetlink: Remove passing of attributes into rtnl_doit functions

With decnet converted, we can finally get rid of rta_buf and its
computations around it. It also gets rid of the minimal header
length verification since all message handlers do that explicitly
anyway.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c1b52739 13-Jan-2013 Benjamin LaHaise <bcrl@kvack.org>

pkt_sched: namespace aware act_mirred

Eric Dumazet pointed out that act_mirred needs to find the current net_ns,
and struct net pointer is not provided in the call chain. His original
patch made use of current->nsproxy->net_ns to find the network namespace,
but this fails to work correctly for userspace code that makes use of
netlink sockets in different network namespaces. Instead, pass the
"struct net *" down along the call chain to where it is needed.

This version removes the ifb changes as Eric has submitted that patch
separately, but is otherwise identical to the previous version.

Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# dfc47ef8 15-Nov-2012 Eric W. Biederman <ebiederm@xmission.com>

net: Push capable(CAP_NET_ADMIN) into the rtnl methods

- In rtnetlink_rcv_msg convert the capable(CAP_NET_ADMIN) check
to ns_capable(net->user-ns, CAP_NET_ADMIN). Allowing unprivileged
users to make netlink calls to modify their local network
namespace.

- In the rtnetlink doit methods add capable(CAP_NET_ADMIN) so
that calls that are not safe for unprivileged users are still
protected.

Later patches will remove the extra capable calls from methods
that are safe for unprivilged users.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 15e47304 07-Sep-2012 Eric W. Biederman <ebiederm@xmission.com>

netlink: Rename pid to portid to avoid confusion

It is a frequent mistake to confuse the netlink port identifier with a
process identifier. Try to reduce this confusion by renaming fields
that hold port identifiers portid instead of pid.

I have carefully avoided changing the structures exported to
userspace to avoid changing the userspace API.

I have successfully built an allyesconfig kernel with this change.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8b00a53c 26-Jun-2012 David S. Miller <davem@davemloft.net>

pkt_sched: act_api: Move away from NLMSG_PUT().

Move away from NLMSG_NEW() as well.

And use nlmsg_data() while we're here too.

Signed-off-by: David S. Miller <davem@davemloft.net>


# 1b34ec43 29-Mar-2012 David S. Miller <davem@davemloft.net>

pkt_sched: Stop using NLA_PUT*().

These macros contain a hidden goto, and are thus extremely error
prone and make code hard to audit.

Signed-off-by: David S. Miller <davem@davemloft.net>


# 3a9a231d 27-May-2011 Paul Gortmaker <paul.gortmaker@windriver.com>

net: Fix files explicitly needing to include module.h

With calls to modular infrastructure, these files really
needs the full module.h header. Call it out so some of the
cleanups of implicit and unrequired includes elsewhere can be
cleaned up.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>


# dc7f9f6e 05-Jul-2011 Eric Dumazet <eric.dumazet@gmail.com>

net: sched: constify tcf_proto and tc_action

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c7ac8679 09-Jun-2011 Greg Rose <gregory.v.rose@intel.com>

rtnetlink: Compute and store minimum ifinfo dump size

The message size allocated for rtnl ifinfo dumps was limited to
a single page. This is not enough for additional interface info
available with devices that support SR-IOV and caused a bug in
which VF info would not be displayed if more than approximately
40 VFs were created per interface.

Implement a new function pointer for the rtnl_register service that will
calculate the amount of data required for the ifinfo dump and allocate
enough data to satisfy the request.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# f5c8593c 15-Mar-2011 Lai Jiangshan <laijs@cn.fujitsu.com>

net,rcu: convert call_rcu(tcf_common_free_rcu) to kfree_rcu()

The rcu callback tcf_common_free_rcu() just calls a kfree(),
so we use kfree_rcu() instead of the call_rcu(tcf_common_free_rcu).

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>


# 25985edc 30-Mar-2011 Lucas De Marchi <lucas.demarchi@profusion.mobi>

Fix common misspellings

Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>


# cc7ec456 19-Jan-2011 Eric Dumazet <eric.dumazet@gmail.com>

net_sched: cleanups

Cleanup net/sched code to current CodingStyle and practices.

Reduce inline abuse

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c7de2cf0 08-Jun-2010 Eric Dumazet <eric.dumazet@gmail.com>

pkt_sched: gen_kill_estimator() rcu fixes

gen_kill_estimator() API is incomplete or not well documented, since
caller should make sure an RCU grace period is respected before
freeing stats_lock.

This was partially addressed in commit 5d944c640b4
(gen_estimator: deadlock fix), but same problem exist for all
gen_kill_estimator() users, if lock they use is not already RCU
protected.

A code review shows xt_RATEEST.c, act_api.c, act_police.c have this
problem. Other are ok because they use qdisc lock, already RCU
protected.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6ff9c364 12-May-2010 stephen hemminger <shemminger@vyatta.com>

net sched: printk message severity

The previous patch encourage me to go look at all the messages in
the network scheduler and fix them. Many messages were missing
any severity level. Some serious ones that should never happen
were turned into WARN(), and the random noise messages that were
handled changed to pr_debug().

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5a0e3ad6 24-Mar-2010 Tejun Heo <tj@kernel.org>

include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>


# 7316ae88 19-Mar-2010 Tom Goff <thomas.goff@boeing.com>

net_sched: make traffic control network namespace aware

Mostly minor changes to add a net argument to various functions and
remove initial network namespace checks.

Make /proc/net/psched per network namespace.

Signed-off-by: Tom Goff <thomas.goff@boeing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 09ad9bc7 25-Nov-2009 Octavian Purdila <opurdila@ixiacom.com>

net: use net_eq to compare nets

Generated with the following semantic patch

@@
struct net *n1;
struct net *n2;
@@
- n1 == n2
+ net_eq(n1, n2)

@@
struct net *n1;
struct net *n2;
@@
- n1 != n2
+ !net_eq(n1, n2)

applied over {include,net,drivers/net}.

Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 06fe9fb4 28-Sep-2009 Dirk Hohndel <hohndel@infradead.org>

tree-wide: fix a very frequent spelling mistake

something-bility is spelled as something-blity
so a grep for 'blit' would find these lines

this is so trivial that I didn't split it by subsystem / copy
additional maintainers - all changes are to comments
The only purpose is to get fewer false positives when grepping
around the kernel sources.

Signed-off-by: Dirk Hohndel <hohndel@infradead.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>


# d250a5f9 02-Oct-2009 Eric Dumazet <eric.dumazet@gmail.com>

pkt_sched: gen_estimator: Dont report fake rate estimators

Jarek Poplawski a écrit :
>
>
> Hmm... So you made me to do some "real" work here, and guess what?:
> there is one serious checkpatch warning! ;-) Plus, this new parameter
> should be added to the function description. Otherwise:
> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
>
> Thanks,
> Jarek P.
>
> PS: I guess full "Don't" would show we really mean it...

Okay :) Here is the last round, before the night !

Thanks again

[RFC] pkt_sched: gen_estimator: Don't report fake rate estimators

We currently send TCA_STATS_RATE_EST elements to netlink users, even if no estimator
is running.

# tc -s -d qdisc
qdisc pfifo_fast 0: dev eth0 root bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
Sent 112833764978 bytes 1495081739 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0

User has no way to tell if the "rate 0bit 0pps" is a real estimation, or a fake
one (because no estimator is active)

After this patch, tc command output is :
$ tc -s -d qdisc
qdisc pfifo_fast 0: dev eth0 root bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
Sent 561075 bytes 1196 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0

We add a parameter to gnet_stats_copy_rate_est() function so that
it can use gen_estimator_active(bstats, r), as suggested by Jarek.

This parameter can be NULL if check is not necessary, (htb for
example has a mandatory rate estimator)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3a6c2b41 25-Aug-2009 Patrick McHardy <kaber@trash.net>

netlink: constify nlmsghdr arguments

Consitfy nlmsghdr arguments to a couple of functions as preparation
for the next patch, which will constify the netlink message data in
all nfnetlink users.

Signed-off-by: Patrick McHardy <kaber@trash.net>


# 0e991ec6 25-Nov-2008 Stephen Hemminger <shemminger@vyatta.com>

tc: propogate errors from tcf_hash_create

Allow tcf_hash_create to return different errors on estimator failure.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 95a5afca 16-Oct-2008 Johannes Berg <johannes@sipsolutions.net>

net: Remove CONFIG_KMOD from net/ (towards removing CONFIG_KMOD entirely)

Some code here depends on CONFIG_KMOD to not try to load
protocol modules or similar, replace by CONFIG_MODULES
where more than just request_module depends on CONFIG_KMOD
and and also use try_then_request_module in ebtables.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 36723873 13-Aug-2008 Jamal Hadi Salim <hadi@cyberus.ca>

net-sched: fix Action flushing return code

Flushing must consistently return ENOMEM on failure of any allocation

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f97017cd 13-Aug-2008 Jamal Hadi Salim <hadi@cyberus.ca>

net-sched: Fix actions flushing

Flushing of actions has been broken since we changed
the semantics of netlink parsed tb[X] to mean X is an attribute type.
This makes the flushing work.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 76aab2c1 07-Aug-2008 Jamal Hadi Salim <hadi@cyberus.ca>

pkt_sched: Fix actions referencing

When an action is added several times with the same exact index
it gets deleted on every even-numbered attempt.
This fixes that issue.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 547b792c 25-Jul-2008 Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>

net: convert BUG_TRAP to generic WARN_ON

Removes legacy reinvent-the-wheel type thing. The generic
machinery integrates much better to automated debugging aids
such as kerneloops.org (and others), and is unambiguous due to
better naming. Non-intuively BUG_TRAP() is actually equal to
WARN_ON() rather than BUG_ON() though some might actually be
promoted to BUG_ON() but I left that to future.

I could make at least one BUILD_BUG_ON conversion.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3b1e0a65 25-Mar-2008 YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

[NET] NETNS: Omit sock->sk_net without CONFIG_NET_NS.

Introduce per-sock inlines: sock_net(), sock_net_set()
and per-inet_timewait_sock inlines: twsk_net(), twsk_net_set().
Without CONFIG_NET_NS, no namespace other than &init_net exists.
Let's explicitly define them to help compiler optimizations.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>


# 1587bac4 23-Jan-2008 Patrick McHardy <kaber@trash.net>

[NET_SCHED]: Use typeful attribute parsing helpers

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 24beeab5 23-Jan-2008 Patrick McHardy <kaber@trash.net>

[NET_SCHED]: Use typeful attribute construction helpers

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 57e1c487 23-Jan-2008 Patrick McHardy <kaber@trash.net>

[NET_SCHED]: Use NLA_PUT_STRING for string dumping

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4b3550ef 23-Jan-2008 Patrick McHardy <kaber@trash.net>

[NET_SCHED]: Use nla_nest_start/nla_nest_end

Use nla_nest_start/nla_nest_end for dumping nested attributes.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# cee63723 23-Jan-2008 Patrick McHardy <kaber@trash.net>

[NET_SCHED]: Propagate nla_parse return value

nla_parse() returns more detailed errno codes, propagate them back on
error.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ab27cfb8 23-Jan-2008 Patrick McHardy <kaber@trash.net>

[NET_SCHED]: act_api: use PTR_ERR in tcf_action_init/tcf_action_get

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c96c9471 23-Jan-2008 Patrick McHardy <kaber@trash.net>

[NET_SCHED]: act_api: use nlmsg_parse

Convert open-coded nlmsg_parse to use the real function.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6d834e04 23-Jan-2008 Patrick McHardy <kaber@trash.net>

[NET_SCHED]: act_api: fix netlink API conversion bug

Fix two invalid attribute accesses, indices start at 1 with the new
netlink API.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7ba699c6 22-Jan-2008 Patrick McHardy <kaber@trash.net>

[NET_SCHED]: Convert actions from rtnetlink to new netlink API

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1e90474c 22-Jan-2008 Patrick McHardy <kaber@trash.net>

[NET_SCHED]: Convert packet schedulers from rtnetlink to new netlink API

Convert packet schedulers to use the netlink API. Unfortunately a gradual
conversion is not possible without breaking compilation in the middle or
adding lots of casts, so this patch converts them all in one step. The
patch has been mostly generated automatically with some minor edits to
at least allow seperate conversion of classifiers and actions.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 62e3ba1b 22-Jan-2008 Patrick McHardy <kaber@trash.net>

[NET_SCHED]: Move EXPORT_SYMBOL next to exported symbol

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 97c53cac 19-Nov-2007 Denis V. Lunev <den@openvz.org>

[NET]: Make rtnetlink infrastructure network namespace aware (v3)

After this patch none of the netlink callback support anything
except the initial network namespace but the rtnetlink infrastructure
now handles multiple network namespaces.

Changes from v2:
- IPv6 addrlabel processing

Changes from v1:
- no need for special rtnl_unlock handling
- fixed IPv6 ndisc

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b854272b 30-Nov-2007 Denis V. Lunev <den@openvz.org>

[NET]: Modify all rtnetlink methods to only work in the initial namespace (v2)

Before I can enable rtnetlink to work in all network namespaces I need
to be certain that something won't break. So this patch deliberately
disables all of the rtnletlink methods in everything except the
initial network namespace. After the methods have been audited this
extra check can be disabled.

Changes from v1:
- added IPv6 addrlabel protection

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>


# e1e992e5 12-Sep-2007 Jamal Hadi Salim <hadi@cyberus.ca>

[NET_SCHED] protect action config/dump from irqs

(with no apologies to C Heston)

On Mon, 2007-10-09 at 21:00 +0800, Herbert Xu wrote:
On Sun, Sep 02, 2007 at 01:11:29PM +0000, Christian Kujau wrote:
> >
> > after upgrading to 2.6.23-rc5 (and applying davem's fix [0]), lockdep
> > was quite noisy when I tried to shape my external (wireless) interface:
> >
> > [ 6400.534545] FahCore_78.exe/3552 just changed the state of lock:
> > [ 6400.534713] (&dev->ingress_lock){-+..}, at: [<c038d595>]
> > netif_receive_skb+0x2d5/0x3c0
> > [ 6400.534941] but this lock took another, soft-read-irq-unsafe lock in the
> > past:
> > [ 6400.535145] (police_lock){-.--}
>
> This is a genuine dead-lock. The police lock can be taken
> for reading with softirqs on. If a second CPU tries to take
> the police lock for writing, while holding the ingress lock,
> then a softirq on the first CPU can dead-lock when it tries
> to get the ingress lock.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0ba48053 02-Jul-2007 Patrick McHardy <kaber@trash.net>

[NET_SCHED]: Remove unnecessary includes

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4bdf3991 02-Jul-2007 Patrick McHardy <kaber@trash.net>

[NET_SCHED]: Remove unnecessary stats_lock pointers

Remove stats_lock pointers from qdisc-internal structures, in all cases
it points to dev->queue_lock. The only case where it is necessary is for
top-level qdiscs, where it might also point to dev->ingress_lock in case
of the ingress qdisc. Also remove it from actions completely, it always
points to the actions internal lock.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 876d48aa 02-Jul-2007 Patrick McHardy <kaber@trash.net>

[NET_SCHED]: Remove CONFIG_NET_ESTIMATOR option

The generic estimator is always built in anways and all the config options
does is prevent including a minimal amount of code for setting it up.
Additionally the option is already automatically selected for most cases.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 708914cc 22-Mar-2007 Thomas Graf <tgraf@suug.ch>

[PKT_SCHED] act: Use rtnl registration interface

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>


# dc5fc579 26-Mar-2007 Arnaldo Carvalho de Melo <acme@redhat.com>

[NETLINK]: Use nlmsg_trim() where appropriate

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 27a884dc 19-Apr-2007 Arnaldo Carvalho de Melo <acme@redhat.com>

[SK_BUFF]: Convert skb->tail to sk_buff_data_t

So that it is also an offset from skb->head, reduces its size from 8 to 4 bytes
on 64bit architectures, allowing us to combine the 4 bytes hole left by the
layer headers conversion, reducing struct sk_buff size to 256 bytes, i.e. 4
64byte cachelines, and since the sk_buff slab cache is SLAB_HWCACHE_ALIGN...
:-)

Many calculations that previously required that skb->{transport,network,
mac}_header be first converted to a pointer now can be done directly, being
meaningful as offsets or pointers.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# cd354f1a 14-Feb-2007 Tim Schmielau <tim@physik3.uni-rostock.de>

[PATCH] remove many unneeded #includes of sched.h

After Al Viro (finally) succeeded in removing the sched.h #include in module.h
recently, it makes sense again to remove other superfluous sched.h includes.
There are quite a lot of files which include it but don't actually need
anything defined in there. Presumably these includes were once needed for
macros that used to live in sched.h, but moved to other header files in the
course of cleaning it up.

To ease the pain, this time I did not fiddle with any header files and only
removed #includes from .c-files, which tend to cause less trouble.

Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha,
arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig,
allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all
configs in arch/arm/configs on arm. I also checked that no new warnings were
introduced by the patch (actually, some warnings are removed that were emitted
by unnecessarily included header files).

Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 10297b99 09-Feb-2007 YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

[NET] SCHED: Fix whitespace errors.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e9ce1cd3 22-Aug-2006 David S. Miller <davem@sunset.davemloft.net>

[PKT_SCHED]: Kill pkt_act.h inlining.

This was simply making templates of functions and mostly causing a lot
of code duplication in the classifier action modules.

We solve this more cleanly by having a common "struct tcf_common" that
hash worker functions contained once in act_api.c can work with.

Callers work with real action objects that have the common struct
plus their module specific struct members. You go from a common
object to the higher level one using a "to_foo()" macro which makes
use of container_of() to do the dirty work.

This also kills off act_generic.h which was only used by act_simple.c
and keeping it around was more work than the it's value.

Signed-off-by: David S. Miller <davem@davemloft.net>


# 2942e900 15-Aug-2006 Thomas Graf <tgraf@suug.ch>

[RTNETLINK]: Use rtnl_unicast() for rtnetlink unicasts

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0da974f4 21-Jul-2006 Panagiotis Issaris <takis@issaris.org>

[NET]: Conversions from kmalloc+memset to k(z|c)alloc.

Signed-off-by: Panagiotis Issaris <takis@issaris.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 781b456a 10-Jul-2006 Stephen Hemminger <shemminger@osdl.org>

[MAINTAINERS]: Add proper entry for TC classifier

Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ebbaeab1 09-Jul-2006 Thomas Graf <tgraf@suug.ch>

[PKT_SCHED]: act_api: Fix module leak while flushing actions

Module reference needs to be given back if message header
construction fails.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4fe683f5 05-Jul-2006 Thomas Graf <tgraf@suug.ch>

[PKT_SCHED]: Fix error handling while dumping actions

"return -err" and blindly inheriting the error code in the netlink
failure exception handler causes errors codes to be returned as
positive value therefore making them being ignored by the caller.

May lead to sending out incomplete netlink messages.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d152b4e1 05-Jul-2006 Thomas Graf <tgraf@suug.ch>

[PKT_SCHED]: Return ENOENT if action module is unavailable

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 26dab893 05-Jul-2006 Thomas Graf <tgraf@suug.ch>

[PKT_SCHED]: Fix illegal memory dereferences when dumping actions

The TCA_ACT_KIND attribute is used without checking its
availability when dumping actions therefore leading to a
value of 0x4 being dereferenced.

The use of strcmp() in tc_lookup_action_n() isn't safe
when fed with string from an attribute without enforcing
proper NUL termination.

Both bugs can be triggered with malformed netlink message
and don't require any privileges.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6ab3d562 30-Jun-2006 Jörn Engel <joern@wohnheim.fh-wedel.de>

Remove obsolete #include <linux/config.h>

Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>


# f6e57464 12-Mar-2006 Patrick McHardy <kaber@trash.net>

[NET_SCHED]: act_api: fix skb leak in error path

The skb is allocated by the function, so it needs to be freed instead
of trimmed on overrun.

Coverity #614

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4bba3925 08-Jan-2006 Patrick McHardy <kaber@trash.net>

[PKT_SCHED]: Prefix tc actions with act_

Clean up the net/sched directory a bit by prefix all actions with act_.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f43c5a0d 08-Jan-2006 Patrick McHardy <kaber@trash.net>

[PKT_SCHED]: Convert tc action functions to single skb pointers

tcf_action_exec only gets a single skb pointer and doesn't own the skb,
but passes double skb pointers (to a local variable) to the action
functions. Change to use single skb pointers everywhere.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2edc2689 13-Dec-2005 David S. Miller <davem@davemloft.net>

[PKT_SCHED]: Disable debug tracing logs by default in packet action API.

Noticed by Andi Kleen.

Signed-off-by: David S. Miller <davem@davemloft.net>


# ac6d439d 14-Aug-2005 Patrick McHardy <kaber@trash.net>

[NETLINK]: Convert netlink users to use group numbers instead of bitmasks

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# abc3bc58 09-Aug-2005 Patrick McHardy <kaber@trash.net>

[NET]: Kill skb->tc_classid

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9ef1d4c7 28-Jun-2005 Patrick McHardy <kaber@trash.net>

[NETLINK]: Missing initializations in dumped data

Mostly missing initialization of padding fields of 1 or 2 bytes length,
two instances of uninitialized nlmsgerr->msg of 16 bytes length.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e431b8c0 18-Jun-2005 Jamal Hadi Salim <hadi@cyberus.ca>

[NETLINK]: Explicit typing

This patch converts "unsigned flags" to use more explict types like u16
instead and incrementally introduces NLMSG_NEW().

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 979b6c13 13-Jun-2005 Ralf Baechle <ralf@linux-mips.org>

[NET]: Move the netdev list to vger.kernel.org.

From: Ralf Baechle <ralf@linux-mips.org>

There are archives of the old list at http://oss.sgi.com/archives/netdev

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 14d50e78 03-May-2005 J Hadi Salim <hadi@cyberus.ca>

[PKT_SCHED]: Action repeat

Long standing bug.
Policy to repeat an action never worked.

Signed-off-by: J Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1da177e4 16-Apr-2005 Linus Torvalds <torvalds@ppc970.osdl.org>

Linux-2.6.12-rc2

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!