History log of /freebsd-current/sys/netinet/cc/cc.c
Revision Date Author Comments
# 8917131e 25-Feb-2024 Richard Scheffenegger <rscheff@FreeBSD.org>

tcp: need default in switch statement for enum.
fix clang error after c9b6241e250a4f1156e2150ccdbad0d3029dcef6

Reviewed By: imp
Differential Revision: https://reviews.freebsd.org/D44081


# c9b6241e 24-Feb-2024 Richard Scheffenegger <rscheff@FreeBSD.org>

tcp: address enum-int-mismatch
fix gcc13 error after f74352fbcf15341accaf5a92240871f98323215d


# f74352fb 24-Feb-2024 Richard Scheffenegger <rscheff@FreeBSD.org>

tcp: use enum for all congestion control signals

Facilitate easier troubleshooting by enumerating
all congestion control signals. Typecast the
enum to int, when a congestion control module uses
private signals.

No external change.

Reviewed By: glebius, tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D43838


# fcea1cc9 14-Feb-2024 Richard Scheffenegger <rscheff@FreeBSD.org>

tcp: fix RTO ssthresh for non-6675 pipe calculation

Follow up on D43768 to properly deal with the non-default
pipe calculation. When CC_RTO is processed, the timeout
will have already pulled back snd_nxt. Further, snd_fack
is not pulled along with snd_una.

Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D43876


# 32a6df57 08-Feb-2024 Richard Scheffenegger <rscheff@FreeBSD.org>

tcp: calculate ssthresh on RTO according to RFC5681

per RFC5681, only adjust ssthresh on the initital
retransmission timeout. Since RTO often happens
during loss recovery, while cwnd no longer tracks
all data in flight, calculcate pipe properly.

Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D43768


# 1adab814 08-Feb-2024 Richard Scheffenegger <rscheff@FreeBSD.org>

tcp: use tcp_fixed_maxseg instead of tcp_maxseg in cc modules

tcp_fixed_maxseg() is the streamlined calculation of typical
tcp options and more suitable for heavy use in the congestion
control modules on every received packet.

No external functional change.

Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D43779


# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# 4d846d26 10-May-2023 Warner Losh <imp@FreeBSD.org>

spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD

The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix


# 0fdc2472 25-Sep-2022 Michael Tuexen <tuexen@FreeBSD.org>

tcp: make RACK loadable again using the default configuration

Without this patch, loading the RACK stack required the newreno
CC module to be compiled into the kernel. This is not the case
anymore since CUBIC is the default now.

Reviewed by: rscheff@
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D36707


# bb1d472d 12-Sep-2022 Richard Scheffenegger <rscheff@FreeBSD.org>

tcp: make CUBIC the default congestion control mechanism.

This changes the default TCP Congestion Control (CC) to CUBIC.
For small, transactional exchanges (e.g. web objects <15kB), this
will not have a material effect. However, for long duration data
transfers, CUBIC allocates a slightly higher fraction of the
available bandwidth, when competing against NewReno CC.

Reviewed By: tuexen, mav, #transport, guest-ccui, emaste
Relnotes: Yes
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D36537


# ccdfd621 05-Apr-2022 Michael Tuexen <tuexen@FreeBSD.org>

tcp cc: don't recurse on non recursive mutex

This issue was found by syzkaller.

Reviewed by: rrs
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D34743


# d4290f7e 02-Apr-2022 Michael Tuexen <tuexen@FreeBSD.org>

Revert "sctp: remove a test, which isn't safe"

It included unrelated changes still under review.
This reverts commit b1fe92b28ba2e77395598db1c2ff1976b55c86ab.


# b1fe92b2 02-Apr-2022 Michael Tuexen <tuexen@FreeBSD.org>

sctp: remove a test, which isn't safe

We can't ensure the stcb is still around. This issue was found
by syzkaller.

MFC after: 3 days


# ea9017fb 21-Feb-2022 Randall Stewart <rrs@FreeBSD.org>

tcp: Congestion control move to using reference counting.

In the transport call on 12/3 Gleb asked to move the CC modules towards
using reference counting to prevent folks from unloading a module in use.
It was also agreed that Michael would do a user space utility like tcp_drop
that could be used to move all connections that are using a specific CC
to some other CC.

This is the half I committed to doing, making it so that we maintain a refcount
on a cc module every time a pcb refers to it and decrementing that every
time a pcb no longer uses a cc module. This also helps us simplify the
whole unloading process by getting rid of tcp_ccunload() which munged
through all the tcb's. Instead we mark a module as being removed and
prevent further references to it. We also make sure that if a module is
marked as being removed it cannot be made as the default and also
the opposite of that, if its a default it fails and does not mark it as being
removed.

Reviewed by: Michael Tuexen, Gleb Smirnoff
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D33249


# a9696510 07-Feb-2022 Randall Stewart <rrs@FreeBSD.org>

tcp: Add hystart++ to our cubic implementation.

As promised to the transport call on 11/4/22 here is an implementation
of hystart++ for cubic. It also cleans up the tcp_congestion function
to have a better name. Common variables are moved into the general
cc.h structure so that both cubic and newreno can use them for
hystart++

Reviewed by: Michael Tuexen, Richard Scheffenegger
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D33035


# db0ac6de 02-Dec-2021 Cy Schubert <cy@FreeBSD.org>

Revert "wpa: Import wpa_supplicant/hostapd commit 14ab4a816"

This reverts commit 266f97b5e9a7958e365e78288616a459b40d924a, reversing
changes made to a10253cffea84c0c980a36ba6776b00ed96c3e3b.

A mismerge of a merge to catch up to main resulted in files being
committed which should not have been.


# dcf2dfed 02-Dec-2021 Randall Stewart <rrs@FreeBSD.org>

tcp: unloading a module that is set to default should error.

I just discovered that the return of the EBUSY error was incorrectly
rigged so that you could unload a CC module that was set to default.
Its supposed to be an EBUSY error. Make it so.

Reviewed by: Michael Tuexen
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D33229


# b4fbc855 19-Nov-2021 Gordon Bergling <gbe@FreeBSD.org>

cc_newreno(4): Fix a typo in a source code comment

- s/conditons/conditions/

MFC after: 3 days


# 034a9240 12-Nov-2021 Mark Johnston <markj@FreeBSD.org>

tcp: Ensure that vnets have an initialized V_default_cc_ptr

This causes new vnets to inherit the cc algorithm from vnet0. This is a
temporary patch to fix vnet jail creation.

With encouragement from: glebius
Fixes: b8d60729deef ("tcp: Congestion control cleanup.")
Differential Revision: https://reviews.freebsd.org/D32970


# 7e3c9ec9 12-Nov-2021 Warner Losh <imp@FreeBSD.org>

tcp: better congestion control defaults

Define CC_NEWRENO in all the appropriate DEFAULTS and std.* config
files. It's the default congestion control algorithm. Add code to cc.c
so that CC_DEFAULT is "newreno" if it's not overriden in the config
file.

Sponsored by: Netflix
Fixes: b8d60729deef ("tcp: Congestion control cleanup.")
Revired by: manu, hselasky, jhb, glebius, tuexen
Differential Revision: https://reviews.freebsd.org/D32964


# b8d60729 11-Nov-2021 Randall Stewart <rrs@FreeBSD.org>

tcp: Congestion control cleanup.

NOTE: HEADS UP read the note below if your kernel config is not including GENERIC!!

This patch does a bit of cleanup on TCP congestion control modules. There were some rather
interesting surprises that one could get i.e. where you use a socket option to change
from one CC (say cc_cubic) to another CC (say cc_vegas) and you could in theory get
a memory failure and end up on cc_newreno. This is not what one would expect. The
new code fixes this by requiring a cc_data_sz() function so we can malloc with M_WAITOK
and pass in to the init function preallocated memory. The CC init is expected in this
case *not* to fail but if it does and a module does break the
"no fail with memory given" contract we do fall back to the CC that was in place at the time.

This also fixes up a set of common newreno utilities that can be shared amongst other
CC modules instead of the other CC modules reaching into newreno and executing
what they think is a "common and understood" function. Lets put these functions in
cc.c and that way we have a common place that is easily findable by future developers or
bug fixers. This also allows newreno to evolve and grow support for its features i.e. ABE
and HYSTART++ without having to dance through hoops for other CC modules, instead
both newreno and the other modules just call into the common functions if they desire
that behavior or roll there own if that makes more sense.

Note: This commit changes the kernel configuration!! If you are not using GENERIC in
some form you must add a CC module option (one of CC_NEWRENO, CC_VEGAS, CC_CUBIC,
CC_CDG, CC_CHD, CC_DCTCP, CC_HTCP, CC_HD). You can have more than one defined
as well if you desire. Note that if you create a kernel configuration that does not
define a congestion control module and includes INET or INET6 the kernel compile will
break. Also you need to define a default, generic adds 'options CC_DEFAULT=\"newreno\"
but you can specify any string that represents the name of the CC module (same names
that show up in the CC module list under net.inet.tcp.cc). If you fail to add the
options CC_DEFAULT in your kernel configuration the kernel build will also break.

Reviewed by: Michael Tuexen
Sponsored by: Netflix Inc.
RELNOTES:YES
Differential Revision: https://reviews.freebsd.org/D32693


# 7029da5c 26-Feb-2020 Pawel Biernacki <kaktus@FreeBSD.org>

Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)

r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718


# 370efe5a 19-Mar-2018 Lawrence Stewart <lstewart@FreeBSD.org>

Add support for the experimental Internet-Draft "TCP Alternative Backoff with
ECN (ABE)" proposal to the New Reno congestion control algorithm module.
ABE reduces the amount of congestion window reduction in response to
ECN-signalled congestion relative to the loss-inferred congestion response.

More details about ABE can be found in the Internet-Draft:
https://tools.ietf.org/html/draft-ietf-tcpm-alternativebackoff-ecn

The implementation introduces four new sysctls:

- net.inet.tcp.cc.abe defaults to 0 (disabled) and can be set to non-zero to
enable ABE for ECN-enabled TCP connections.

- net.inet.tcp.cc.newreno.beta and net.inet.tcp.cc.newreno.beta_ecn set the
multiplicative window decrease factor, specified as a percentage, applied to
the congestion window in response to a loss-based or ECN-based congestion
signal respectively. They default to the values specified in the draft i.e.
beta=50 and beta_ecn=80.

- net.inet.tcp.cc.abe_frlossreduce defaults to 0 (disabled) and can be set to
non-zero to enable the use of standard beta (50% by default) when repairing
loss during an ECN-signalled congestion recovery episode. It enables a more
conservative congestion response and is provided for the purposes of
experimentation as a result of some discussion at IETF 100 in Singapore.

The values of beta and beta_ecn can also be set per-connection by way of the
TCP_CCALGOOPT TCP-level socket option and the new CC_NEWRENO_BETA or
CC_NEWRENO_BETA_ECN CC algo sub-options.

Submitted by: Tom Jones <tj@enoti.me>
Tested by: Tom Jones <tj@enoti.me>, Grenville Armitage <garmitage@swin.edu.au>
Relnotes: Yes
Differential Revision: https://reviews.freebsd.org/D11616


# fe267a55 27-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys: general adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

No functional change intended.


# 439e76ec 26-Jul-2016 Brad Davis <brd@FreeBSD.org>

Fix the case for some sysctl descriptions.

Reviewed by: gnn


# 4644fda3 27-Jan-2016 Gleb Smirnoff <glebius@FreeBSD.org>

Rename netinet/tcp_cc.h to netinet/cc/cc.h.

Discussed with: lstewart


# 2de3e790 21-Jan-2016 Gleb Smirnoff <glebius@FreeBSD.org>

- Rename cc.h to more meaningful tcp_cc.h.
- Declare it a kernel only include, which it already is.
- Don't include tcp.h implicitly from tcp_cc.h


# b66d74c1 21-Jan-2016 Gleb Smirnoff <glebius@FreeBSD.org>

Cleanup TCP files from unnecessary interface related includes.


# 6df8a710 07-Nov-2014 Gleb Smirnoff <glebius@FreeBSD.org>

Remove SYSCTL_VNET_* macros, and simply put CTLFLAG_VNET where needed.

Sponsored by: Nginx, Inc.


# 0e1152fc 27-Oct-2014 Hans Petter Selasky <hselasky@FreeBSD.org>

The SYSCTL data pointers can come from userspace and must not be
directly accessed. Although this will work on some platforms, it can
throw an exception if the pointer is invalid and then panic the kernel.

Add a missing SYSCTL_IN() of "SCTP_BASE_STATS" structure.

MFC after: 3 days
Sponsored by: Mellanox Technologies


# 614b50ae 27-Oct-2014 Hans Petter Selasky <hselasky@FreeBSD.org>

Preserve limitation of "TCP_CA_NAME_MAX" when matching the algorithm
name.

MFC after: 3 days
Suggested by: gnn @


# 60a945f9 27-Oct-2014 Hans Petter Selasky <hselasky@FreeBSD.org>

Make assignments to "net.inet.tcp.cc.algorithm" work by fixing a bad
string comparison.

MFC after: 3 days
Reported by: Jukka Ukkonen <jau789@gmail.com>
Sponsored by: Mellanox Technologies


# e167cb89 10-Aug-2014 Hans Petter Selasky <hselasky@FreeBSD.org>

Fix string length argument passed to "sysctl_handle_string()" so that
the complete string is returned by the function and not just only one
byte.

PR: 192544
MFC after: 2 weeks


# 891b8ed4 12-Apr-2011 Lawrence Stewart <lstewart@FreeBSD.org>

Use the full and proper company name for Swinburne University of Technology
throughout the source tree.

Requested by: Grenville Armitage, Director of CAIA at Swinburne University of
Technology
MFC after: 3 days


# a66ac850 23-Jan-2011 Lawrence Stewart <lstewart@FreeBSD.org>

An sbuf configured with SBUF_AUTOEXTEND will call malloc with M_WAITOK when a
write to the buffer causes it to overflow. We therefore can't hold the CC list
rwlock over a call to sbuf_printf() for an sbuf configured with SBUF_AUTOEXTEND.

Switch to a fixed length sbuf which should be of sufficient size except in the
very unlikely event that the sysctl is being processed as one or more new
algorithms are loaded. If that happens, we accept the race and may fail the
sysctl gracefully if there is insufficient room to print the names of all the
algorithms.

This should address a WITNESS warning and the potential panic that would occur
if the sbuf call to malloc did sleep whilst holding the CC list rwlock.

Sponsored by: FreeBSD Foundation
Reported by: Nick Hibma
Reviewed by: bz
MFC after: 3 weeks
X-MFC with: r215166


# 78b01840 16-Nov-2010 Lawrence Stewart <lstewart@FreeBSD.org>

Make the CC framework more VIMAGE friendly by adding the machinery to allow
vnets to select their own default CC algorithm independent of each other and the
base system. If the base system or a vnet has set a default which gets unloaded,
we reset that netstack's default to NewReno.

Sponsored by: FreeBSD Foundation
Tested by: Mikolaj Golub <to.my.trociny at gmail com>
Reviewed by: bz (briefly)
MFC after: 3 months


# ebf92e86 16-Nov-2010 Lawrence Stewart <lstewart@FreeBSD.org>

- Querying the default CC algo is more common than setting it and the function
is small, so there is no good reason not to declare the buffer at the top.

- Fix a whitespace nit.

Sponsored by: FreeBSD Foundation
MFC after: 11 weeks
X-MFC with: r215166


# 99065ae6 16-Nov-2010 Lawrence Stewart <lstewart@FreeBSD.org>

Move protocol specific implementation detail out of the core CC framework.

Sponsored by: FreeBSD Foundation
Tested by: Mikolaj Golub <to.my.trociny at gmail com>
MFC after: 11 weeks
X-MFC with: r215166


# 4e805854 16-Nov-2010 Lawrence Stewart <lstewart@FreeBSD.org>

On CC algorithm module unload, we walk the list of active TCP control blocks.
Any found to be using the algorithm that is about to go away are switched back
to NewReno to avoid leaving dangling pointers which would trigger a panic. For
VIMAGE kernels, there is a list per vnet to walk, yet the implementation was
only examining one of the vnet lists.

Fix the implementation of the above feature for VIMAGE kernels by looping
through all active TCP control blocks across all vnets.

Sponsored by: FreeBSD Foundation
Tested by: Mikolaj Golub <to.my.trociny at gmail com>
Reviewed by: bz (briefly)
MFC after: 11 weeks


# 14f57a8b 16-Nov-2010 Lawrence Stewart <lstewart@FreeBSD.org>

cc_init() should only be run once on system boot, but with VIMAGE kernels it
runs on boot and each time a vnet jail is created. Running cc_init() multiple
times results in a panic when attempting to initialise the cc_list lock again,
and so r215166 effectively broke the use of vnet jails.

Switch to using a SYSINIT to run cc_init() on boot. CC algorithm modules loaded
on boot register in the same SI_SUB_PROTO_IFATTACHDOMAIN category as is used in
this patch, so cc_init() is run at SI_ORDER_FIRST to ensure the framework is
initialised before module registration is attempted.

Sponsored by: FreeBSD Foundation
Reported and tested by: Mikolaj Golub <to.my.trociny at gmail com>
MFC after: 11 weeks
X-MFC with: r215166


# dbc42409 11-Nov-2010 Lawrence Stewart <lstewart@FreeBSD.org>

This commit marks the first formal contribution of the "Five New TCP Congestion
Control Algorithms for FreeBSD" FreeBSD Foundation funded project. More details
about the project are available at: http://caia.swin.edu.au/freebsd/5cc/

- Add a KPI and supporting infrastructure to allow modular congestion control
algorithms to be used in the net stack. Algorithms can maintain per-connection
state if required, and connections maintain their own algorithm pointer, which
allows different connections to concurrently use different algorithms. The
TCP_CONGESTION socket option can be used with getsockopt()/setsockopt() to
programmatically query or change the congestion control algorithm respectively
from within an application at runtime.

- Integrate the framework with the TCP stack in as least intrusive a manner as
possible. Care was also taken to develop the framework in a way that should
allow integration with other congestion aware transport protocols (e.g. SCTP)
in the future. The hope is that we will one day be able to share a single set
of congestion control algorithm modules between all congestion aware transport
protocols.

- Introduce a new congestion recovery (TF_CONGRECOVERY) state into the TCP stack
and use it to decouple the meaning of recovery from a congestion event and
recovery from packet loss (TF_FASTRECOVERY) a la RFC2581. ECN and delay based
congestion control protocols don't generally need to recover from packet loss
and need a different way to note a congestion recovery episode within the
stack.

- Remove the net.inet.tcp.newreno sysctl, which simplifies some portions of code
and ensures the stack always uses the appropriate mechanisms for recovering
from packet loss during a congestion recovery episode.

- Extract the NewReno congestion control algorithm from the TCP stack and
massage it into module form. NewReno is always built into the kernel and will
remain the default algorithm for the forseeable future. Implementations of
additional different algorithms will become available in the near future.

- Bump __FreeBSD_version to 900025 and note in UPDATING that rebuilding code
that relies on the size of "struct tcpcb" is required.

Many thanks go to the Cisco University Research Program Fund at Community
Foundation Silicon Valley and the FreeBSD Foundation. Their support of our work
at the Centre for Advanced Internet Architectures, Swinburne University of
Technology is greatly appreciated.

In collaboration with: David Hayes <dahayes at swin edu au> and
Grenville Armitage <garmitage at swin edu au>
Sponsored by: Cisco URP, FreeBSD Foundation
Reviewed by: rpaulo
Tested by: David Hayes (and many others over the years)
MFC after: 3 months