History log of /linux-master/net/sched/sch_pie.c
Revision Date Author Comments
# 241a94ab 01-Feb-2024 Michal Koutný <mkoutny@suse.com>

net/sched: Add module aliases for cls_,sch_,act_ modules

No functional change intended, aliases will be used in followup commits.
Note for backporters: you may need to add aliases also for modules that
are already removed in mainline kernel but still in your version.

Patches were generated with the help of Coccinelle scripts like:

cat >scripts/coccinelle/misc/tcf_alias.cocci <<EOD
virtual patch
virtual report

@ haskernel @
@@

@ tcf_has_kind depends on report && haskernel @
identifier ops;
constant K;
@@

static struct tcf_proto_ops ops = {
.kind = K,
...
};
+char module_alias = K;
EOD

/usr/bin/spatch -D report --cocci-file scripts/coccinelle/misc/tcf_alias.cocci \
--dir . \
-I ./arch/x86/include -I ./arch/x86/include/generated -I ./include \
-I ./arch/x86/include/uapi -I ./arch/x86/include/generated/uapi \
-I ./include/uapi -I ./include/generated/uapi \
--include ./include/linux/compiler-version.h --include ./include/linux/kconfig.h \
--jobs 8 --chunksize 1 2>/dev/null | \
sed 's/char module_alias = "\([^"]*\)";/MODULE_ALIAS_NET_CLS("\1");/'

And analogously for:

static struct tc_action_ops ops = {
.kind = K,

static struct Qdisc_ops ops = {
.id = K,

(Someone familiar would be able to fit those into one .cocci file
without sed post processing.)

Signed-off-by: Michal Koutný <mkoutny@suse.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240201130943.19536-3-mkoutny@suse.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# d636fc5d 06-Jun-2023 Eric Dumazet <edumazet@google.com>

net: sched: add rcu annotations around qdisc->qdisc_sleeping

syzbot reported a race around qdisc->qdisc_sleeping [1]

It is time we add proper annotations to reads and writes to/from
qdisc->qdisc_sleeping.

[1]
BUG: KCSAN: data-race in dev_graft_qdisc / qdisc_lookup_rcu

read to 0xffff8881286fc618 of 8 bytes by task 6928 on cpu 1:
qdisc_lookup_rcu+0x192/0x2c0 net/sched/sch_api.c:331
__tcf_qdisc_find+0x74/0x3c0 net/sched/cls_api.c:1174
tc_get_tfilter+0x18f/0x990 net/sched/cls_api.c:2547
rtnetlink_rcv_msg+0x7af/0x8c0 net/core/rtnetlink.c:6386
netlink_rcv_skb+0x126/0x220 net/netlink/af_netlink.c:2546
rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:6413
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x56f/0x640 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x665/0x770 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg net/socket.c:747 [inline]
____sys_sendmsg+0x375/0x4c0 net/socket.c:2503
___sys_sendmsg net/socket.c:2557 [inline]
__sys_sendmsg+0x1e3/0x270 net/socket.c:2586
__do_sys_sendmsg net/socket.c:2595 [inline]
__se_sys_sendmsg net/socket.c:2593 [inline]
__x64_sys_sendmsg+0x46/0x50 net/socket.c:2593
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

write to 0xffff8881286fc618 of 8 bytes by task 6912 on cpu 0:
dev_graft_qdisc+0x4f/0x80 net/sched/sch_generic.c:1115
qdisc_graft+0x7d0/0xb60 net/sched/sch_api.c:1103
tc_modify_qdisc+0x712/0xf10 net/sched/sch_api.c:1693
rtnetlink_rcv_msg+0x807/0x8c0 net/core/rtnetlink.c:6395
netlink_rcv_skb+0x126/0x220 net/netlink/af_netlink.c:2546
rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:6413
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x56f/0x640 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x665/0x770 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg net/socket.c:747 [inline]
____sys_sendmsg+0x375/0x4c0 net/socket.c:2503
___sys_sendmsg net/socket.c:2557 [inline]
__sys_sendmsg+0x1e3/0x270 net/socket.c:2586
__do_sys_sendmsg net/socket.c:2595 [inline]
__se_sys_sendmsg net/socket.c:2593 [inline]
__x64_sys_sendmsg+0x46/0x50 net/socket.c:2593
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 6912 Comm: syz-executor.5 Not tainted 6.4.0-rc3-syzkaller-00190-g0d85b27b0cc6 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/16/2023

Fixes: 3a7d0d07a386 ("net: sched: extend Qdisc with rcu")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Vlad Buslov <vladbu@nvidia.com>
Acked-by: Jamal Hadi Salim<jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4170f0ef 22-Mar-2023 Taichi Nishimura <awkrail01@gmail.com>

fix typos in net/sched/* files

This patch fixes typos in net/sched/* files.

Signed-off-by: Taichi Nishimura <awkrail01@gmail.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 197173db 05-Oct-2022 Jason A. Donenfeld <Jason@zx2c4.com>

treewide: use get_random_bytes() when possible

The prandom_bytes() function has been a deprecated inline wrapper around
get_random_bytes() for several releases now, and compiles down to the
exact same code. Replace the deprecated wrapper with a direct call to
the real function. This was done as a basic find and replace.

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> # powerpc
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# a102c897 29-Aug-2022 Zhengchao Shao <shaozhengchao@huawei.com>

net: sched: remove redundant NULL check in change hook function

Currently, the change function can be called by two ways. The one way is
that qdisc_change() will call it. Before calling change function,
qdisc_change() ensures tca[TCA_OPTIONS] is not empty. The other way is
that .init() will call it. The opt parameter is also checked before
calling change function in .init(). Therefore, it's no need to check the
input parameter opt in change function.

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
Link: https://lore.kernel.org/r/20220829071219.208646-1-shaozhengchao@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>


# e5a4b17d 09-Nov-2020 Menglong Dong <dong.menglong@zte.com.cn>

net: sched: fix misspellings using misspell-fixer tool

Some typos are found out by misspell-fixer tool:

$ misspell-fixer -rnv ./net/sched/
./net/sched/act_api.c:686
./net/sched/act_bpf.c:68
./net/sched/cls_rsvp.h:241
./net/sched/em_cmp.c:44
./net/sched/sch_pie.c:408

Fix typos found by misspell-fixer.

Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/r/5fa8e9d4.1c69fb81.5d889.5c64@mx.google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# ac5c66f2 14-Jul-2020 Petr Machata <petrm@mellanox.com>

Revert "net: sched: Pass root lock to Qdisc_ops.enqueue"

This reverts commit aebe4426ccaa4838f36ea805cdf7d76503e65117.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# aebe4426 26-Jun-2020 Petr Machata <petrm@mellanox.com>

net: sched: Pass root lock to Qdisc_ops.enqueue

A following patch introduces qevents, points in qdisc algorithm where
packet can be processed by user-defined filters. Should this processing
lead to a situation where a new packet is to be enqueued on the same port,
holding the root lock would lead to deadlocks. To solve the issue, qevent
handler needs to unlock and relock the root lock when necessary.

To that end, add the root lock argument to the qdisc op enqueue, and
propagate throughout.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3f95f55e 09-Mar-2020 Leslie Monis <lesliemonis@gmail.com>

net: sched: pie: change tc_pie_xstats->prob

Commit 105e808c1da2 ("pie: remove pie_vars->accu_prob_overflows")
changes the scale of probability values in PIE from (2^64 - 1) to
(2^56 - 1). This affects the precision of tc_pie_xstats->prob in
user space.

This patch ensures user space is unaffected.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 105e808c 04-Mar-2020 Leslie Monis <lesliemonis@gmail.com>

pie: remove pie_vars->accu_prob_overflows

The variable pie_vars->accu_prob is used as an accumulator for
probability values. Since probabilty values are scaled using the
MAX_PROB macro denoting (2^64 - 1), pie_vars->accu_prob is
likely to overflow as it is of type u64.

The variable pie_vars->accu_prob_overflows counts the number of
times the variable pie_vars->accu_prob overflows.

The MAX_PROB macro needs to be equal to at least (2^39 - 1) in
order to do precise calculations without any underflow. Thus
MAX_PROB can be reduced to (2^56 - 1) without affecting the
precision in calculations drastically. Doing so will eliminate
the need for the variable pie_vars->accu_prob_overflows as the
variable pie_vars->accu_prob will never overflow.

Removing the variable pie_vars->accu_prob_overflows also reduces
the size of the structure pie_vars to exactly 64 bytes.

Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 220d4ac7 04-Mar-2020 Leslie Monis <lesliemonis@gmail.com>

pie: remove unnecessary type casting

In function pie_calculate_probability(), the variables alpha and
beta are of type u64. The variables qdelay, qdelay_old and
params->target are of type psched_time_t (which is also u64).
The explicit type casting done when calculating the value for
the variable delta is redundant and not required.

Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 90baeb9d 04-Mar-2020 Leslie Monis <lesliemonis@gmail.com>

pie: use term backlog instead of qlen

Remove ambiguity by using the term backlog instead of qlen when
representing the queue length in bytes.

Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5205ea00 22-Jan-2020 Mohit P. Tahiliani <tahiliani@nitk.edu.in>

net: sched: pie: export symbols to be reused by FQ-PIE

This patch makes the drop_early(), calculate_probability() and
pie_process_dequeue() functions generic enough to be used by
both PIE and FQ-PIE (to be added in a future commit). The major
change here is in the way the functions take in arguments. This
patch exports these functions and makes FQ-PIE dependent on
sch_pie.

Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 00ea2fb7 22-Jan-2020 Mohit P. Tahiliani <tahiliani@nitk.edu.in>

net: sched: pie: fix alignment in struct instances

Make the alignment in the initialization of the struct instances
consistent in the file.

Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 55f780c4 22-Jan-2020 Mohit P. Tahiliani <tahiliani@nitk.edu.in>

net: sched: pie: fix commenting

Fix punctuation and logical mistakes in the comments. The
logical mistake was that "dequeue_rate" is no longer the default
way to calculate queuing delay and is not needed. The default
way to calculate queue delay was changed in commit cec2975f2b70
("net: sched: pie: enable timestamp based delay calculation").

Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2dfb1952 22-Jan-2020 Mohit P. Tahiliani <tahiliani@nitk.edu.in>

pie: rearrange structure members and their initializations

Rearrange the members of the structure such that closely
referenced members appear together and/or fit in the same
cacheline. Also, change the order of their initializations to
match the order in which they appear in the structure.

Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 84bf557f 22-Jan-2020 Mohit P. Tahiliani <tahiliani@nitk.edu.in>

net: sched: pie: move common code to pie.h

This patch moves macros, structures and small functions common
to PIE and FQ-PIE (to be added in a future commit) from the file
net/sched/sch_pie.c to the header file include/net/pie.h.
All the moved functions are made inline.

Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# cec2975f 20-Nov-2019 Gautam Ramakrishnan <gautamramk@gmail.com>

net: sched: pie: enable timestamp based delay calculation

RFC 8033 suggests an alternative approach to calculate the queue
delay in PIE by using a timestamp on every enqueued packet. This
patch adds an implementation of that approach and sets it as the
default method to calculate queue delay. The previous method (based
on Little's law) to calculate queue delay is set as optional.

Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Acked-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2504ba9f 02-Jun-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 235

Based on 1 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license this
program is distributed in the hope that it will be useful but
without any warranty without even the implied warranty of
merchantability or fitness for a particular purpose see the gnu
general public license for more details

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-only

has been chosen to replace the boilerplate/reference in 53 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190602204653.904365654@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 8cb08174 26-Apr-2019 Johannes Berg <johannes.berg@intel.com>

netlink: make validation more configurable for future strictness

We currently have two levels of strict validation:

1) liberal (default)
- undefined (type >= max) & NLA_UNSPEC attributes accepted
- attribute length >= expected accepted
- garbage at end of message accepted
2) strict (opt-in)
- NLA_UNSPEC attributes accepted
- attribute length >= expected accepted

Split out parsing strictness into four different options:
* TRAILING - check that there's no trailing data after parsing
attributes (in message or nested)
* MAXTYPE - reject attrs > max known type
* UNSPEC - reject attributes with NLA_UNSPEC policy entries
* STRICT_ATTRS - strictly validate attribute size

The default for future things should be *everything*.
The current *_strict() is a combination of TRAILING and MAXTYPE,
and is renamed to _deprecated_strict().
The current regular parsing has none of this, and is renamed to
*_parse_deprecated().

Additionally it allows us to selectively set one of the new flags
even on old policies. Notably, the UNSPEC flag could be useful in
this case, since it can be arranged (by filling in the policy) to
not be an incompatible userspace ABI change, but would then going
forward prevent forgetting attribute entries. Similar can apply
to the POLICY flag.

We end up with the following renames:
* nla_parse -> nla_parse_deprecated
* nla_parse_strict -> nla_parse_deprecated_strict
* nlmsg_parse -> nlmsg_parse_deprecated
* nlmsg_parse_strict -> nlmsg_parse_deprecated_strict
* nla_parse_nested -> nla_parse_nested_deprecated
* nla_validate_nested -> nla_validate_nested_deprecated

Using spatch, of course:
@@
expression TB, MAX, HEAD, LEN, POL, EXT;
@@
-nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
+nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)

@@
expression NLH, HDRLEN, TB, MAX, POL, EXT;
@@
-nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
+nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)

@@
expression NLH, HDRLEN, TB, MAX, POL, EXT;
@@
-nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
+nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)

@@
expression TB, MAX, NLA, POL, EXT;
@@
-nla_parse_nested(TB, MAX, NLA, POL, EXT)
+nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)

@@
expression START, MAX, POL, EXT;
@@
-nla_validate_nested(START, MAX, POL, EXT)
+nla_validate_nested_deprecated(START, MAX, POL, EXT)

@@
expression NLH, HDRLEN, MAX, POL, EXT;
@@
-nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
+nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)

For this patch, don't actually add the strict, non-renamed versions
yet so that it breaks compile if I get it wrong.

Also, while at it, make nla_validate and nla_parse go down to a
common __nla_validate_parse() function to avoid code duplication.

Ultimately, this allows us to have very strict validation for every
new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
next patch, while existing things will continue to work as is.

In effect then, this adds fully strict validation for any new command.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ae0be8de 26-Apr-2019 Michal Kubecek <mkubecek@suse.cz>

netlink: make nla_nest_start() add NLA_F_NESTED flag

Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most
netlink based interfaces (including recently added ones) are still not
setting it in kernel generated messages. Without the flag, message parsers
not aware of attribute semantics (e.g. wireshark dissector or libmnl's
mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display
the structure of their contents.

Unfortunately we cannot just add the flag everywhere as there may be
userspace applications which check nlattr::nla_type directly rather than
through a helper masking out the flags. Therefore the patch renames
nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start()
as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually
are rewritten to use nla_nest_start().

Except for changes in include/net/netlink.h, the patch was generated using
this semantic patch:

@@ expression E1, E2; @@
-nla_nest_start(E1, E2)
+nla_nest_start_noflag(E1, E2)

@@ expression E1, E2; @@
-nla_nest_start_noflag(E1, E2 | NLA_F_NESTED)
+nla_nest_start(E1, E2)

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6c97da14 28-Feb-2019 Leslie Monis <lesliemonis@gmail.com>

net: sched: pie: avoid slow division in drop probability decay

As per RFC 8033, it is sufficient for the drop probability
decay factor to have a value of (1 - 1/64) instead of 98%.
This avoids the need to do slow division.

Suggested-by: David Laight <David.Laight@aculab.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ff8285f8 26-Feb-2019 Leslie Monis <lesliemonis@gmail.com>

net: sched: pie: fix 64-bit division

Use div_u64() to resolve build failures on 32-bit platforms.

Fixes: 3f7ae5f3dc52 ("net: sched: pie: add more cases to auto-tune alpha and beta")
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 24ed4900 26-Feb-2019 Leslie Monis <lesliemonis@gmail.com>

net: sched: pie: fix mistake in reference link

Fix the incorrect reference link to RFC 8033

Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c9d2ac5e 25-Feb-2019 Mohit P. Tahiliani <tahiliani@nitk.edu.in>

net: sched: pie: update references

RFC 8033 replaces the IETF draft for PIE

Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Dhaval Khandla <dhavaljkhandla26@gmail.com>
Signed-off-by: Hrishikesh Hiraskar <hrishihiraskar@gmail.com>
Signed-off-by: Manish Kumar B <bmanish15597@gmail.com>
Signed-off-by: Sachin D. Patil <sdp.sachin@gmail.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Acked-by: Dave Taht <dave.taht@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 95400b97 25-Feb-2019 Mohit P. Tahiliani <tahiliani@nitk.edu.in>

net: sched: pie: add derandomization mechanism

Random dropping of packets to achieve latency control may
introduce outlier situations where packets are dropped too
close to each other or too far from each other. This can
cause the real drop percentage to temporarily deviate from
the intended drop probability. In certain scenarios, such
as a small number of simultaneous TCP flows, these
deviations can cause significant deviations in link
utilization and queuing latency.

RFC 8033 suggests using a derandomization mechanism to avoid
these deviations.

Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Dhaval Khandla <dhavaljkhandla26@gmail.com>
Signed-off-by: Hrishikesh Hiraskar <hrishihiraskar@gmail.com>
Signed-off-by: Manish Kumar B <bmanish15597@gmail.com>
Signed-off-by: Sachin D. Patil <sdp.sachin@gmail.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Acked-by: Dave Taht <dave.taht@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3f7ae5f3 25-Feb-2019 Mohit P. Tahiliani <tahiliani@nitk.edu.in>

net: sched: pie: add more cases to auto-tune alpha and beta

The current implementation scales the local alpha and beta
variables in the calculate_probability function by the same
amount for all values of drop probability below 1%.

RFC 8033 suggests using additional cases for auto-tuning
alpha and beta when the drop probability is less than 1%.

In order to add more auto-tuning cases, MAX_PROB must be
scaled by u64 instead of u32 to prevent underflow when
scaling the local alpha and beta variables in the
calculate_probability function.

Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Dhaval Khandla <dhavaljkhandla26@gmail.com>
Signed-off-by: Hrishikesh Hiraskar <hrishihiraskar@gmail.com>
Signed-off-by: Manish Kumar B <bmanish15597@gmail.com>
Signed-off-by: Sachin D. Patil <sdp.sachin@gmail.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Acked-by: Dave Taht <dave.taht@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 30a92ad7 25-Feb-2019 Mohit P. Tahiliani <tahiliani@nitk.edu.in>

net: sched: pie: change initial value of pie_vars->burst_time

RFC 8033 suggests an initial value of 150 milliseconds for
the maximum time allowed for a burst of packets.

Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Dhaval Khandla <dhavaljkhandla26@gmail.com>
Signed-off-by: Hrishikesh Hiraskar <hrishihiraskar@gmail.com>
Signed-off-by: Manish Kumar B <bmanish15597@gmail.com>
Signed-off-by: Sachin D. Patil <sdp.sachin@gmail.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Acked-by: Dave Taht <dave.taht@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 29daa855 25-Feb-2019 Mohit P. Tahiliani <tahiliani@nitk.edu.in>

net: sched: pie: change default value of pie_params->tupdate

RFC 8033 suggests a default value of 15 milliseconds for the
update interval.

Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Dhaval Khandla <dhavaljkhandla26@gmail.com>
Signed-off-by: Hrishikesh Hiraskar <hrishihiraskar@gmail.com>
Signed-off-by: Manish Kumar B <bmanish15597@gmail.com>
Signed-off-by: Sachin D. Patil <sdp.sachin@gmail.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Acked-by: Dave Taht <dave.taht@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# abde7920 25-Feb-2019 Mohit P. Tahiliani <tahiliani@nitk.edu.in>

net: sched: pie: change default value of pie_params->target

RFC 8033 suggests a default value of 15 milliseconds for the
target queue delay.

Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Dhaval Khandla <dhavaljkhandla26@gmail.com>
Signed-off-by: Hrishikesh Hiraskar <hrishihiraskar@gmail.com>
Signed-off-by: Manish Kumar B <bmanish15597@gmail.com>
Signed-off-by: Sachin D. Patil <sdp.sachin@gmail.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Acked-by: Dave Taht <dave.taht@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 57509003 25-Feb-2019 Mohit P. Tahiliani <tahiliani@nitk.edu.in>

net: sched: pie: change value of QUEUE_THRESHOLD

RFC 8033 recommends a value of 16384 bytes for the queue
threshold.

Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: Dhaval Khandla <dhavaljkhandla26@gmail.com>
Signed-off-by: Hrishikesh Hiraskar <hrishihiraskar@gmail.com>
Signed-off-by: Manish Kumar B <bmanish15597@gmail.com>
Signed-off-by: Sachin D. Patil <sdp.sachin@gmail.com>
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Acked-by: Dave Taht <dave.taht@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ac4a02c5 06-Oct-2018 Leslie Monis <lesliemonis@gmail.com>

net: sched: pie: fix coding style issues

Fix 5 warnings and 14 checks issued by checkpatch.pl:

CHECK: Logical continuations should be on the previous line
+ if ((q->vars.qdelay < q->params.target / 2)
+ && (q->vars.prob < MAX_PROB / 5))

WARNING: line over 80 characters
+ q->params.tupdate = usecs_to_jiffies(nla_get_u32(tb[TCA_PIE_TUPDATE]));

CHECK: Blank lines aren't necessary after an open brace '{'
+{
+

CHECK: braces {} should be used on all arms of this statement
+ if (qlen < QUEUE_THRESHOLD)
[...]
+ else {
[...]

CHECK: Unbalanced braces around else statement
+ else {

CHECK: No space is necessary after a cast
+ if (delta > (s32) (MAX_PROB / (100 / 2)) &&

CHECK: Unnecessary parentheses around 'qdelay == 0'
+ if ((qdelay == 0) && (qdelay_old == 0) && update_prob)

CHECK: Unnecessary parentheses around 'qdelay_old == 0'
+ if ((qdelay == 0) && (qdelay_old == 0) && update_prob)

CHECK: Unnecessary parentheses around 'q->vars.prob == 0'
+ if ((q->vars.qdelay < q->params.target / 2) &&
+ (q->vars.qdelay_old < q->params.target / 2) &&
+ (q->vars.prob == 0) &&
+ (q->vars.avg_dq_rate > 0))

CHECK: Unnecessary parentheses around 'q->vars.avg_dq_rate > 0'
+ if ((q->vars.qdelay < q->params.target / 2) &&
+ (q->vars.qdelay_old < q->params.target / 2) &&
+ (q->vars.prob == 0) &&
+ (q->vars.avg_dq_rate > 0))

CHECK: Blank lines aren't necessary before a close brace '}'
+
+}

CHECK: Comparison to NULL could be written "!opts"
+ if (opts == NULL)

CHECK: No space is necessary after a cast
+ ((u32) PSCHED_TICKS2NS(q->params.target)) /

WARNING: line over 80 characters
+ nla_put_u32(skb, TCA_PIE_TUPDATE, jiffies_to_usecs(q->params.tupdate)) ||

CHECK: Blank lines aren't necessary before a close brace '}'
+
+}

CHECK: No space is necessary after a cast
+ .delay = ((u32) PSCHED_TICKS2NS(q->vars.qdelay)) /

WARNING: Missing a blank line after declarations
+ struct sk_buff *skb;
+ skb = qdisc_dequeue_head(sch);

WARNING: Missing a blank line after declarations
+ struct pie_sched_data *q = qdisc_priv(sch);
+ qdisc_reset_queue(sch);

WARNING: Missing a blank line after declarations
+ struct pie_sched_data *q = qdisc_priv(sch);
+ q->params.tupdate = 0;

Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2030721c 19-Dec-2017 Alexander Aring <aring@mojatatu.com>

net: sched: sch: add extack for change qdisc ops

This patch adds extack support for change callback for qdisc ops
structtur to prepare per-qdisc specific changes for extack.

Cc: David Ahern <dsahern@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e63d7dfd 19-Dec-2017 Alexander Aring <aring@mojatatu.com>

net: sched: sch: add extack for init callback

This patch adds extack support for init callback to prepare per-qdisc
specific changes for extack.

Cc: David Ahern <dsahern@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# cdeabbb8 16-Oct-2017 Kees Cook <keescook@chromium.org>

net: sched: Convert timers to use timer_setup()

In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly. Add pointer back to Qdisc.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# fceb6435 12-Apr-2017 Johannes Berg <johannes.berg@intel.com>

netlink: pass extended ACK struct to parsing functions

Pass the new extended ACK reporting struct to all of the generic
netlink parsing functions. For now, pass NULL in almost all callers
(except for some in the core.)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ed760cb8 17-Sep-2016 Florian Westphal <fw@strlen.de>

sched: replace __skb_dequeue with __qdisc_dequeue_head

After previous patch these functions are identical.
Replace __skb_dequeue in qdiscs with __qdisc_dequeue_head.

Next patch will then make __qdisc_dequeue_head handle
single-linked list instead of strcut sk_buff_head argument.

Doesn't change generated code.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1486587b 17-Sep-2016 Florian Westphal <fw@strlen.de>

pie: use qdisc_dequeue_head wrapper

Doesn't change generated code.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 520ac30f 22-Jun-2016 Eric Dumazet <edumazet@google.com>

net_sched: drop packets after root qdisc lock is released

Qdisc performance suffers when packets are dropped at enqueue()
time because drops (kfree_skb()) are done while qdisc lock is held,
delaying a dequeue() draining the queue.

Nominal throughput can be reduced by 50 % when this happens,
at a time we would like the dequeue() to proceed as fast as possible.

Even FQ is vulnerable to this problem, while one of FQ goals was
to provide some flow isolation.

This patch adds a 'struct sk_buff **to_free' parameter to all
qdisc->enqueue(), and in qdisc_drop() helper.

I measured a performance increase of up to 12 %, but this patch
is a prereq so that future batches in enqueue() can fly.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# db4879d9 13-Jun-2016 Eric Dumazet <edumazet@google.com>

net_sched: sch_pie: defer skb freeing

pie_change() can use rtnl_qdisc_drop() to benefit from
deferred freeing.

pie_reset() is already using qdisc_reset_queue()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2ccccf5f 25-Feb-2016 WANG Cong <xiyou.wangcong@gmail.com>

net_sched: update hierarchical backlog too

When the bottom qdisc decides to, for example, drop some packet,
it calls qdisc_tree_decrease_qlen() to update the queue length
for all its ancestors, we need to update the backlog too to
keep the stats on root qdisc accurate.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d5610902 24-Oct-2014 WANG Cong <xiyou.wangcong@gmail.com>

sch_pie: schedule the timer after all init succeed

Cc: Vijay Subramanian <vijaynsu@cisco.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>


# 25331d6c 28-Sep-2014 John Fastabend <john.fastabend@gmail.com>

net: sched: implement qstat helper routines

This adds helpers to manipulate qstats logic and replaces locations
that touch the counters directly. This simplifies future patches
to push qstats onto per cpu counters.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 219e288e 12-Feb-2014 Vijay Subramanian <vijaynsu@cisco.com>

net: sched: Cleanup PIE comments

Fix incorrect comment reported by Norbert Kiesel. Edit another comment to add
more details. Also add references to algorithm (IETF draft and paper) to top of
file.

Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
CC: Mythili Prabhu <mysuryan@cisco.com>
CC: Norbert Kiesel <nkiesel@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 63862b5b 11-Jan-2014 Aruna-Hewapathirane <aruna.hewapathirane@gmail.com>

net: replace macros net_random and net_srandom with direct calls to prandom

This patch removes the net_random and net_srandom macros and replaces
them with direct calls to the prandom ones. As new commits only seem to
use prandom_u32 there is no use to keep them around.
This change makes it easier to grep for users of prandom_u32.

Signed-off-by: Aruna-Hewapathirane <aruna.hewapathirane@gmail.com>
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d4b36210 04-Jan-2014 Vijay Subramanian <vijaynsu@cisco.com>

net: pkt_sched: PIE AQM scheme

Proportional Integral controller Enhanced (PIE) is a scheduler to address the
bufferbloat problem.

>From the IETF draft below:
" Bufferbloat is a phenomenon where excess buffers in the network cause high
latency and jitter. As more and more interactive applications (e.g. voice over
IP, real time video streaming and financial transactions) run in the Internet,
high latency and jitter degrade application performance. There is a pressing
need to design intelligent queue management schemes that can control latency and
jitter; and hence provide desirable quality of service to users.

We present here a lightweight design, PIE(Proportional Integral controller
Enhanced) that can effectively control the average queueing latency to a target
value. Simulation results, theoretical analysis and Linux testbed results have
shown that PIE can ensure low latency and achieve high link utilization under
various congestion situations. The design does not require per-packet
timestamp, so it incurs very small overhead and is simple enough to implement
in both hardware and software. "

Many thanks to Dave Taht for extensive feedback, reviews, testing and
suggestions. Thanks also to Stephen Hemminger and Eric Dumazet for reviews and
suggestions. Naeem Khademi and Dave Taht independently contributed to ECN
support.

For more information, please see technical paper about PIE in the IEEE
Conference on High Performance Switching and Routing 2013. A copy of the paper
can be found at ftp://ftpeng.cisco.com/pie/.

Please also refer to the IETF draft submission at
http://tools.ietf.org/html/draft-pan-tsvwg-pie-00

All relevant code, documents and test scripts and results can be found at
ftp://ftpeng.cisco.com/pie/.

For problems with the iproute2/tc or Linux kernel code, please contact Vijay
Subramanian (vijaynsu@cisco.com or subramanian.vijay@gmail.com) Mythili Prabhu
(mysuryan@cisco.com)

Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Signed-off-by: Mythili Prabhu <mysuryan@cisco.com>
CC: Dave Taht <dave.taht@bufferbloat.net>
Signed-off-by: David S. Miller <davem@davemloft.net>