History log of /freebsd-current/sys/netinet/cc/cc_dctcp.c
Revision Date Author Comments
# f74352fb 24-Feb-2024 Richard Scheffenegger <rscheff@FreeBSD.org>

tcp: use enum for all congestion control signals

Facilitate easier troubleshooting by enumerating
all congestion control signals. Typecast the
enum to int, when a congestion control module uses
private signals.

No external change.

Reviewed By: glebius, tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D43838


# fcea1cc9 14-Feb-2024 Richard Scheffenegger <rscheff@FreeBSD.org>

tcp: fix RTO ssthresh for non-6675 pipe calculation

Follow up on D43768 to properly deal with the non-default
pipe calculation. When CC_RTO is processed, the timeout
will have already pulled back snd_nxt. Further, snd_fack
is not pulled along with snd_una.

Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D43876


# 32a6df57 08-Feb-2024 Richard Scheffenegger <rscheff@FreeBSD.org>

tcp: calculate ssthresh on RTO according to RFC5681

Per RFC5681, only adjust ssthresh on the initial
retransmission timeout. Since RTO often happens
during loss recovery, while cwnd no longer tracks
all data in flight, calculate pipe properly.
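
As an illustration of the RFC5681 rule referred to here, the following is a
minimal userspace sketch, not the code from this commit. The struct, its
field names, and on_rto() are hypothetical; "flight" stands in for the pipe
estimate the message mentions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-connection state; this is NOT FreeBSD's struct tcpcb. */
struct conn {
	uint32_t mss;         /* sender maximum segment size, in bytes */
	uint32_t flight;      /* bytes still outstanding in the network (pipe) */
	uint32_t ssthresh;    /* slow start threshold, in bytes */
	uint32_t cwnd;        /* congestion window, in bytes */
	uint32_t rexmt_count; /* timeouts already taken for the current segment */
};

static uint32_t
max_u32(uint32_t a, uint32_t b)
{
	return (a > b ? a : b);
}

/* Handle a retransmission timeout along the lines of RFC5681 section 3.1. */
static void
on_rto(struct conn *c)
{
	assert(c->mss > 0);

	/*
	 * Only the first timeout for a segment adjusts ssthresh:
	 * ssthresh = max(FlightSize / 2, 2 * SMSS).
	 * Back-to-back timeouts must not halve it again.
	 */
	if (c->rexmt_count == 0)
		c->ssthresh = max_u32(c->flight / 2, 2 * c->mss);

	/* After any timeout the window collapses to one full-sized segment. */
	c->cwnd = c->mss;
	c->rexmt_count++;
}
```

The point of using the amount in flight rather than cwnd is that during loss
recovery cwnd may no longer reflect the amount of data actually outstanding.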

Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D43768


# 1adab814 08-Feb-2024 Richard Scheffenegger <rscheff@FreeBSD.org>

tcp: use tcp_fixed_maxseg instead of tcp_maxseg in cc modules

tcp_fixed_maxseg() is the streamlined calculation of typical
tcp options and more suitable for heavy use in the congestion
control modules on every received packet.

No external functional change.

Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D43779


# fdafd315 24-Nov-2023 Warner Losh <imp@FreeBSD.org>

sys: Automated cleanup of cdefs and other formatting

Apply the following automated changes to try to eliminate
no-longer-needed sys/cdefs.h includes as well as now-empty
blank lines in a row.

Remove /^#if.*\n#endif.*\n#include\s+<sys/cdefs.h>.*\n/
Remove /\n+#include\s+<sys/cdefs.h>.*\n+#if.*\n#endif.*\n+/
Remove /\n+#if.*\n#endif.*\n+/
Remove /^#if.*\n#endif.*\n/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/types.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/param.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/capsicum.h>/

Sponsored by: Netflix


# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# 9eb0e832 08-Nov-2022 Gleb Smirnoff <glebius@FreeBSD.org>

tcp: provide macros to access inpcb and socket from a tcpcb

There should be no functional changes with this commit.

Reviewed by: rscheff
Differential revision: https://reviews.freebsd.org/D37123


# dc9daa04 08-Nov-2022 Richard Scheffenegger <rscheff@FreeBSD.org>

tcp: allow packets to be marked as ECT1 instead of ECT0

This adds the capability for a modular congestion control
to select which variant of ECN-capable-transport it wants to use
when sending out eligible segments. As an initial CC to utilize
this, DCTCP was selected.

Event: IETF 115 Hackathon
Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D24869


# b8d60729 11-Nov-2021 Randall Stewart <rrs@FreeBSD.org>

tcp: Congestion control cleanup.

NOTE: HEADS UP read the note below if your kernel config is not including GENERIC!!

This patch does a bit of cleanup on TCP congestion control modules. There were some rather
interesting surprises that one could get i.e. where you use a socket option to change
from one CC (say cc_cubic) to another CC (say cc_vegas) and you could in theory get
a memory failure and end up on cc_newreno. This is not what one would expect. The
new code fixes this by requiring a cc_data_sz() function so we can malloc with M_WAITOK
and pass in to the init function preallocated memory. The CC init is expected in this
case *not* to fail but if it does and a module does break the
"no fail with memory given" contract we do fall back to the CC that was in place at the time.

This also fixes up a set of common newreno utilities that can be shared amongst other
CC modules instead of the other CC modules reaching into newreno and executing
what they think is a "common and understood" function. Let's put these functions in
cc.c and that way we have a common place that is easily findable by future developers or
bug fixers. This also allows newreno to evolve and grow support for its features i.e. ABE
and HYSTART++ without having to dance through hoops for other CC modules, instead
both newreno and the other modules just call into the common functions if they desire
that behavior or roll their own if that makes more sense.

Note: This commit changes the kernel configuration!! If you are not using GENERIC in
some form you must add a CC module option (one of CC_NEWRENO, CC_VEGAS, CC_CUBIC,
CC_CDG, CC_CHD, CC_DCTCP, CC_HTCP, CC_HD). You can have more than one defined
as well if you desire. Note that if you create a kernel configuration that does not
define a congestion control module and includes INET or INET6 the kernel compile will
break. Also you need to define a default; GENERIC adds 'options CC_DEFAULT=\"newreno\"',
but you can specify any string that represents the name of the CC module (same names
that show up in the CC module list under net.inet.tcp.cc). If you fail to add the
options CC_DEFAULT in your kernel configuration the kernel build will also break.

Reviewed by: Michael Tuexen
Sponsored by: Netflix Inc.
RELNOTES:YES
Differential Revision: https://reviews.freebsd.org/D32693


# 39a12f01 24-Oct-2020 Richard Scheffenegger <rscheff@FreeBSD.org>

tcp: move cwnd and ssthresh updates into cc modules

This will pave the way of setting ssthresh differently in TCP CUBIC, according
to RFC8312 section 4.7.

No functional change, only code movement.

Submitted by: chengc_netapp.com
Reviewed by: rrs, tuexen, rscheff
MFC after: 2 weeks
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D26807


# 66ba9aaf 20-Jul-2020 Richard Scheffenegger <rscheff@FreeBSD.org>

Add MODULE_VERSION to TCP loadable congestion control modules.

Without versioning information, using preexisting loader /
linker code is not easily possible when another module may
have dependencies on pre-loaded modules, and also doesn't
allow the automatic loading of dependent modules.

No functional change of the actual modules.

Reviewed by: tuexen (mentor), rgrimes (mentor)
Approved by: tuexen (mentor), rgrimes (mentor)
MFC after: 2 weeks
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D25744


# e68cde59 21-May-2020 Richard Scheffenegger <rscheff@FreeBSD.org>

DCTCP: update alpha only once after loss recovery.

In mixed ECN marking and loss scenarios it was found that
the alpha value of DCTCP is updated two times. The second
update happens with freshly initialized counters indicating
to ECN loss. Overall this leads to alpha not adjusting as
quickly as expected to ECN markings, and therefore leads to
excessive loss.
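
To make the effect of a second update concrete, here is a minimal userspace
sketch of the DCTCP estimator from RFC 8257, alpha = (1 - g) * alpha + g * F,
in fixed point. It is not the code from this commit: the struct, the names,
the 2^10 scaling, and in particular the guard that skips an update when no
bytes were counted are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define ALPHA_SHIFT 10                  /* assumed scaling: alpha * 2^10 */
#define ALPHA_MAX   (1u << ALPHA_SHIFT) /* fixed-point value of alpha = 1.0 */

/* Hypothetical estimator state; not the module's private data layout. */
struct dctcp_est {
	uint32_t alpha;       /* scaled congestion estimate, 0..ALPHA_MAX */
	uint32_t bytes_ecn;   /* bytes acked with ECN marks this window */
	uint32_t bytes_total; /* all bytes acked this window */
	uint32_t shift_g;     /* gain g = 1 / 2^shift_g */
};

/* alpha = (1 - g) * alpha + g * F, with F = bytes_ecn / bytes_total. */
static void
update_alpha(struct dctcp_est *e)
{
	uint64_t frac;
	uint32_t a;

	assert(e->shift_g <= ALPHA_SHIFT);

	/*
	 * Assumed guard: freshly reset counters carry no information.
	 * Updating anyway would use F = 0 and decay alpha a second time.
	 */
	if (e->bytes_total == 0)
		return;

	frac = ((uint64_t)e->bytes_ecn << ALPHA_SHIFT) / e->bytes_total;
	a = e->alpha - (e->alpha >> e->shift_g) + (uint32_t)(frac >> e->shift_g);
	e->alpha = (a > ALPHA_MAX ? ALPHA_MAX : a);

	/* Start a new observation window. */
	e->bytes_ecn = 0;
	e->bytes_total = 0;
}
```

With g = 1/16 and half of the bytes marked, one update moves alpha from 1.0
toward 0.5 by one step; a second call on the reset counters leaves it alone.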

Reported by: Cheng Cui
Reviewed by: chengc_netapp.com, rrs, tuexen (mentor)
Approved by: tuexen (mentor)
MFC after: 2 weeks
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D24817


# 7029da5c 26-Feb-2020 Pawel Biernacki <kaktus@FreeBSD.org>

Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)

r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718


# 481be5de 12-Feb-2020 Randall Stewart <rrs@FreeBSD.org>

White space cleanup -- remove trailing tab's or spaces
from any line.

Sponsored by: Netflix Inc.


# 9cc711c9 25-Jan-2020 Michael Tuexen <tuexen@FreeBSD.org>

Sending CWR after an RTO is according to RFC 3168 generally required
and not only for the DCTCP congestion control.

Submitted by: Richard Scheffenegger
Reviewed by: rgrimes, tuexen@, Cheng Cui
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D23119


# e11c9783 31-Dec-2019 Michael Tuexen <tuexen@FreeBSD.org>

Fix delayed ACK generation for DCTCP.

Submitted by: Richard Scheffenegger
Reviewed by: chengc@netapp.com, rgrimes@, tuexen@
Differential Revision: https://reviews.freebsd.org/D22644


# 3cf38784 01-Dec-2019 Michael Tuexen <tuexen@FreeBSD.org>

Move all ECN related flags from the flags to the flags2 field.
This allows adding more ECN related flags in the future.
No functional change intended.

Submitted by: Richard Scheffenegger
Reviewed by: rrs@, tuexen@
Differential Revision: https://reviews.freebsd.org/D22497


# bb63f59b 29-Jul-2019 Michael Tuexen <tuexen@FreeBSD.org>

When performing after_idle() or post_recovery(), don't disable the
DCTCP specific methods. Also fallthrough NewReno for non ECN capable
TCP connections and improve the integer arithmetic.

Obtained from: Richard Scheffenegger
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D20550


# 333ba164 29-Jul-2019 Michael Tuexen <tuexen@FreeBSD.org>

* Improve input validation of sysctl parameters for DCTCP.
* Initialize the alpha parameter to a conservative value (like Linux)
* Improve handling of arithmetic.
* Improve man-page

Obtained from: Richard Scheffenegger
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D20549


# 855acb84 15-Dec-2018 Brooks Davis <brooks@FreeBSD.org>

Fix bugs in pluggable CC algorithm and siftr sysctls.

Use the sysctl_handle_int() handler to write out the old value and read
the new value into a temporary variable. Use the temporary variable
for any checks of values rather than using the CAST_PTR_INT() macro on
req->newptr. The prior usage read directly from userspace memory if the
sysctl() was called correctly. This is unsafe and doesn't work at all on
some architectures (at least i386.)

In some cases, the code could also be tricked into reading from kernel
memory and leaking limited information about the contents or crashing
the system. This was true for CDG, newreno, and siftr on all platforms
and true for i386 in all cases. The impact of this bug is largest in
VIMAGE jails which have been configured to allow writing to these
sysctls.

Per discussion with the security officer, we will not be issuing an
advisory for this issue as root access and a non-default config are
required to be impacted.

Reviewed by: markj, bz
Discussed with: gordon (security officer)
MFC after: 3 days
Security: kernel information leak, local DoS (both require root)
Differential Revision: https://reviews.freebsd.org/D18443


# 5f901c92 24-Jul-2018 Andrew Turner <andrew@FreeBSD.org>

Use the new VNET_DEFINE_STATIC macro when we are defining static VNET
variables.

Reviewed by: bz
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D16147


# 22699887 21-Jul-2018 Matt Macy <mmacy@FreeBSD.org>

NULL out cc_data in pluggable TCP {cc}_cb_destroy

When ABE was added (rS331214) to NewReno and the leak fixed (rS333699), it now has
a destructor (newreno_cb_destroy) for per connection state. Other congestion
controls may allocate and free cc_data on entry and exit, but the field is
never explicitly NULLed if moving back to NewReno which only internally
allocates stateful data (no entry constructor) resulting in a situation where
newreno_cb_destroy might be called on a junk pointer.

- NULL out cc_data in the framework after calling {cc}_cb_destroy
- free(9) checks for NULL so there is no need to perform not NULL checks
before calling free.
- Improve a comment about NewReno in tcp_ccalgounload

This is the result of a debugging session from Jason Wolfe, Jason Eggleston,
and mmacy@ and very helpful insight from lstewart@.

Submitted by: Kevin Bowling
Reviewed by: lstewart
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D16282


# 43053c12 25-Jul-2017 Sean Bruno <sbruno@FreeBSD.org>

Revert r307901 - Inform CC modules about loss events.

This was discussed between various transport@ members and it was
requested to be reverted and discussed.

Submitted by: Kevin Bowling <kevin.bowling@kev009.com>
Reported by: lawrence
Reviewed by: hiren
Sponsored by: Limelight Networks


# 5d53981a 25-Jul-2017 Sean Bruno <sbruno@FreeBSD.org>

Revert r308180 - Set slow start threshold more accurately on loss ...

This was discussed between various transport@ members and it was
requested to be reverted and discussed.

Submitted by: kevin
Reported by: lawrence
Reviewed by: hiren


# e04310d5 01-Nov-2016 Hiren Panchasara <hiren@FreeBSD.org>

Set slow start threshold more accurately on loss to be flightsize/2 instead of
cwnd/2 as recommended by RFC5681. (spotted by mmacy at nextbsd dot org)

Restore pre-r307901 behavior of aligning ssthresh/cwnd on mss boundary. (spotted
by slawa at zxy dot spb dot ru)

Tested by: dim, Slawa <slawa at zxy dot spb dot ru>
MFC after: 1 month
X-MFC with: r307901
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D8349


# 4e7f7553 24-Oct-2016 Hiren Panchasara <hiren@FreeBSD.org>

FreeBSD tcp stack used to inform respective congestion control module about the
loss event but not use or obey the recommendations i.e. values set by it in some
cases.

Here is an attempt to solve that confusion by following relevant RFCs/drafts.
Stack only sets congestion window/slow start threshold values when there is no
CC module available to take that action. All CC modules are inspected and
updated when needed to take appropriate action on loss.

tcp_stacks/fastpath module has been updated to adapt these changes.

Note: Probably, the most significant change would be to not bring congestion
window down to 1MSS on a loss signaled by 3-duplicate acks and letting
respective CC decide that value.

In collaboration with: Matt Macy <mmacy at nextbsd dot org>
Discussed on: transport@ mailing list
Reviewed by: jtl
MFC after: 1 month
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D8225


# dd13b7d3 24-Oct-2016 Hiren Panchasara <hiren@FreeBSD.org>

Undo r307899. It needs a bit more work and proper commit log.


# 95d82360 24-Oct-2016 Hiren Panchasara <hiren@FreeBSD.org>

In Collaboration with: Matt Macy <mmacy at nextbsd dot com>
Reviewed by: jtl
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D8225


# a4641f4e 03-May-2016 Pedro F. Giffuni <pfg@FreeBSD.org>

sys/net*: minor spelling fixes.

No functional change.


# 4644fda3 27-Jan-2016 Gleb Smirnoff <glebius@FreeBSD.org>

Rename netinet/tcp_cc.h to netinet/cc/cc.h.

Discussed with: lstewart


# 2de3e790 21-Jan-2016 Gleb Smirnoff <glebius@FreeBSD.org>

- Rename cc.h to more meaningful tcp_cc.h.
- Declare it a kernel only include, which it already is.
- Don't include tcp.h implicitly from tcp_cc.h


# b66d74c1 21-Jan-2016 Gleb Smirnoff <glebius@FreeBSD.org>

Cleanup TCP files from unnecessary interface related includes.


# 64807b30 12-Jan-2015 Hiren Panchasara <hiren@FreeBSD.org>

DCTCP (Data Center TCP) implementation.

DCTCP congestion control algorithm aims to maximise throughput and minimise
latency in data center networks by utilising the proportion of Explicit
Congestion Notification (ECN) marked packets received from capable hardware as a
congestion signal.

Highlights:
Implemented as a mod_cc(4) module.
ECN (Explicit congestion notification) processing is done differently from
RFC3168.
Takes one-sided DCTCP into consideration where only one of the sides is using
DCTCP and other is using standard ECN.

IETF draft: http://tools.ietf.org/html/draft-bensley-tcpm-dctcp-00
Thesis report by Midori Kato: https://eggert.org/students/kato-thesis.pdf

Submitted by: Midori Kato <katoon@sfc.wide.ad.jp> and
Lars Eggert <lars@netapp.com>
with help and modifications from
hiren
Differential Revision: https://reviews.freebsd.org/D604
Reviewed by: gnn