History log of /freebsd-11.0-release/sys/netinet/tcp_timer.h
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
# 303975 11-Aug-2016 gjb

Copy stable/11@r303970 to releng/11.0 as part of the 11.0-RELEASE
cycle.

Prune svn:mergeinfo from the new branch, and rename it to RC1.

Update __FreeBSD_version.

Use the quarterly branch for the default FreeBSD.conf pkg(8) repo and
the dvd1.iso packages population.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation

# 302408 08-Jul-2016 gjb

Copy head@r302406 to stable/11 as part of the 11.0-RELEASE cycle.
Prune svn:mergeinfo from the new branch, as nothing has been merged
here.

Additional commits post-branch will follow.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


# 298743 28-Apr-2016 rrs

This cleans up the timers code in TCP to start using the new
async_drain functionality. This as been tested in NF as well as
by Verisign. Still to do in here is to remove all the old flags. They
are currently left being maintained but probably are no longer needed.

Sponsored by: Netflix Inc.
Differential Revision: http://reviews.freebsd.org/D5924


# 294840 26-Jan-2016 hiren

Persist timers TCPTV_PERSMIN and TCPTV_PERSMAX are hardcoded with 5 seconds and
60 seconds, respectively. Turn them into sysctls that can be tuned live. The
default values of 5 seconds and 60 seconds have been retained.

Submitted by: Jason Wolfe (j at nitrology dot com)
Reviewed by: gnn, rrs, hiren, bz
MFC after: 1 week
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D5024


# 287304 30-Aug-2015 jch

Put r284245 back in place: If at first this fix was seen as a temporary
workaround for a callout(9) issue, it turns out it is instead the right
way to use callout in mpsafe mode without using callout_drain().

r284245 commit message:

Fix a callout race condition introduced in TCP timers callouts with r281599.
In TCP timer context, it is not enough to check callout_stop() return value
to decide if a callout is still running or not, previous callout_reset()
return values have also to be checked.

Differential Revision: https://reviews.freebsd.org/D2763


# 287101 24-Aug-2015 jch

Revert r284245: "Fix a callout race condition introduced in TCP
timers callouts with r281599."

r281599 fixed a TCP timer race condition, but due a callout(9) bug
it also introduced another race condition workaround-ed with r284245.
The callout(9) bug being fixed with r286880, we can now revert the
workaround (r284245).

Differential Revision: https://reviews.freebsd.org/D2079 (Initial change)
Differential Revision: https://reviews.freebsd.org/D2763 (Workaround)
Differential Revision: https://reviews.freebsd.org/D3078 (Fix)
Sponsored by: Verisign, Inc.
MFC after: 2 weeks


# 286669 12-Aug-2015 hiren

Remove unused TCPTV_SRTTDFLT. We initialize srtt with TCPTV_SRTTBASE when we
don't have any rtt estimate.

Differential Revision: D3334
Sponsored by: Limelight Networks


# 284245 10-Jun-2015 jch

Fix a callout race condition introduced in TCP timers callouts with r281599.
In TCP timer context, it is not enough to check callout_stop() return value
to decide if a callout is still running or not, previous callout_reset()
return values have also to be checked.

Differential Revision: https://reviews.freebsd.org/D2763
Reviewed by: hiren
Approved by: hiren
MFC after: 1 day
Sponsored by: Verisign, Inc.


# 281599 16-Apr-2015 jch

Fix an old and well-documented use-after-free race condition in
TCP timers:
- Add a reference from tcpcb to its inpcb
- Defer tcpcb deletion until TCP timers have finished

Differential Revision: https://reviews.freebsd.org/D2079
Submitted by: jch, Marc De La Gueronniere <mdelagueronniere@verisign.com>
Reviewed by: imp, rrs, adrian, jhb, bz
Approved by: jhb
Sponsored by: Verisign, Inc.


# 273850 30-Oct-2014 jch

Fix a race condition in TCP timewait between tcp_tw_2msl_reuse() and
tcp_tw_2msl_scan(). This race condition drives unplanned timewait
timeout cancellation. Also simplify implementation by holding inpcb
reference and removing tcptw reference counting.

Differential Revision: https://reviews.freebsd.org/D826
Submitted by: Marc De la Gueronniere <mdelagueronniere@verisign.com>
Submitted by: jch
Reviewed By: jhb (mentor), adrian, rwatson
Sponsored by: Verisign, Inc.
MFC after: 2 weeks
X-MFC-With: r264321


# 264321 10-Apr-2014 jhb

Currently, the TCP slow timer can starve TCP input processing while it
walks the list of connections in TIME_WAIT closing expired connections
due to contention on the global TCP pcbinfo lock.

To remediate, introduce a new global lock to protect the list of
connections in TIME_WAIT. Only acquire the TCP pcbinfo lock when
closing an expired connection. This limits the window of time when
TCP input processing is stopped to the amount of time needed to close
a single connection.

Submitted by: Julien Charbon <jcharbon@verisign.com>
Reviewed by: rwatson, rrs, adrian
MFC after: 2 months


# 243621 27-Nov-2012 andre

Add DELACK to list of timers.

MFC after: 1 week


# 242308 29-Oct-2012 andre

Define the delayed ACK timeout value directly as hz/10 instead of
obfuscating it by going through PR_FASTHZ. No functional change.

MFC after: 2 weeks


# 242261 28-Oct-2012 andre

For retransmits of SYN|ACK from the syncache use the slightly more
aggressive special tcp_syn_backoff[] retransmit schedule instead of
the normal tcp_backoff[] schedule for established connections.

MFC after: 2 weeks


# 231025 05-Feb-2012 glebius

Add new socket options: TCP_KEEPINIT, TCP_KEEPIDLE, TCP_KEEPINTVL and
TCP_KEEPCNT, that allow to control initial timeout, idle time, idle
re-send interval and idle send count on a per-socket basis.

Reviewed by: andre, bz, lstewart


# 212765 16-Sep-2010 andre

Remove the TCP inflight bandwidth limiter as announced in r211315
to give way for the pluggable congestion control framework. It is
the task of the congestion control algorithm to set the congestion
window and amount of inflight data without external interference.

In 'struct tcpcb' the variables previously used by the inflight
limiter are renamed to spares to keep the ABI intact and to have
some more space for future extensions.

In 'struct tcp_info' the variable 'tcpi_snd_bwnd' is not removed to
preserve the ABI. It is always set to 0.

In siftr.c in 'struct pkt_node' the variable 'snd_bwnd' is not removed
to preserve the ABI. It is always set to 0.

These unused variable in the various structures may be reused in the
future or garbage collected before the next release or at some other
point when an ABI change happens anyway for other reasons.

No MFC is planned. The inflight bandwidth limiter stays disabled by
default in the other branches but remains available.


# 197244 16-Sep-2009 silby

Add the ability to see TCP timers via netstat -x. This can be a useful
feature when you have a seemingly stuck socket and want to figure
out why it has not been closed yet.

No plans to MFC this, as it changes the netstat sysctl ABI.

Reviewed by: andre, rwatson, Eric Van Gyzen


# 172309 24-Sep-2007 silby

Two changes:

- Reintegrate the ANSI C function declaration change
from tcp_timer.c rev 1.92

- Reorganize the tcpcb structure so that it has a single
pointer to the "tcp_timer" structure which contains all
of the tcp timer callouts. This change means that when
the single tcp timer change is reintegrated, tcpcb will
not change in size, and therefore the ABI between
netstat and the kernel will not change.

Neither of these changes should have any functional
impact.

Reviewed by: bmah, rrs
Approved by: re (bmah)


# 172074 07-Sep-2007 rwatson

Back out tcp_timer.c:1.93 and associated changes that reimplemented the many
TCP timers as a single timer, but retain the API changes necessary to
reintroduce this change. This will back out the source of at least two
reported problems: lock leaks in certain timer edge cases, and TCP timers
continuing to fire after a connection has closed (a bug previously fixed and
then reintroduced with the timer rewrite).

In a follow-up commit, some minor restylings and comment changes performed
after the TCP timer rewrite will be reapplied, and a further change to allow
the TCP timer rewrite to be added back without disturbing the ABI. The new
design is believed to be a good thing, but the outstanding issues are
leading to significant stability/correctness problems that are holding
up 7.0.

This patch was generated by silby, but is being committed by proxy due to
poor network connectivity for silby this week.

Approved by: re (kensmith)
Submitted by: silby
Tested by: rwatson, kris
Problems reported by: peter, kris, others


# 171677 31-Jul-2007 peter

Change TCPTV_MIN to be independent of HZ. While it was documented to
be in ticks "for algorithm stability" when originally committed, it turns
out that it has a significant impact in timing out connections. When we
changed HZ from 100 to 1000, this had a big effect on reducing the time
before dropping connections.

To demonstrate, boot with kern.hz=100. ssh to a box on local ethernet
and establish a reliable round-trip-time (ie: type a few commands).
Then unplug the ethernet and press a key. Time how long it takes to
drop the connection.

The old behavior (with hz=100) caused the connection to typically drop
between 90 and 110 seconds of getting no response.

Now boot with kern.hz=1000 (default). The same test causes the ssh session
to drop after just 9-10 seconds. This is a big deal on a wifi connection.

With kern.hz=1000, change sysctl net.inet.tcp.rexmit_min from 3 to 30.
Note how it behaves the same as when HZ was 100. Also, note that when
booting with hz=100, net.inet.tcp.rexmit_min *used* to be 30.

This commit changes TCPTV_MIN to be scaled with hz. rexmit_min should
always be about 30. If you set hz to Really Slow(TM), there is a safety
feature to prevent a value of 0 being used.

This may be revised in the future, but for the time being, it restores the
old, pre-hz=1000 behavior, which is significantly less annoying.

As a workaround, to avoid rebooting or rebuilding a kernel, you can run
"sysctl net.inet.tcp.rexmit_min=30" and add "net.inet.tcp.rexmit_min=30"
to /etc/sysctl.conf. This is safe to run from 6.0 onwards.

Approved by: re (rwatson)
Reviewed by: andre, silby


# 169613 16-May-2007 andre

Remove now unused stuff forgotten in the previous commit.


# 169608 16-May-2007 andre

Move TIME_WAIT related functions and timer handling from files
other than repo copied tcp_subr.c into tcp_timewait.c#1.284:

tcp_input.c#1.350 tcp_timewait() -> tcp_twcheck()

tcp_timer.c#1.92 tcp_timer_2msl_reset() -> tcp_tw_2msl_reset()
tcp_timer.c#1.92 tcp_timer_2msl_stop() -> tcp_tw_2msl_stop()
tcp_timer.c#1.92 tcp_timer_2msl_tw() -> tcp_tw_2msl_scan()

This is a mechanical move with appropriate renames and making
them static if used only locally.

The tcp_tw_2msl_scan() cleanup function is still run from the
tcp_slowtimo() in tcp_timer.c.


# 168621 11-Apr-2007 ru

Make "struct tcp_timer" visible only to the kernel, and unbreak world.


# 168615 11-Apr-2007 andre

Change the TCP timer system from using the callout system five times
directly to a merged model where only one callout, the next to fire,
is registered.

Instead of callout_reset(9) and callout_stop(9) the new function
tcp_timer_activate() is used which then internally manages the callout.

The single new callout is a mutex callout on inpcb simplifying the
locking a bit.

tcp_timer() is the called function which handles all race conditions
in one place and then dispatches the individual timer functions.

Reviewed by: rwatson (earlier version)


# 167036 26-Feb-2007 mohans

Reap FIN_WAIT_2 connections marked SOCANTRCVMORE faster. This mitigate
potential issues where the peer does not close, potentially leaving
thousands of connections in FIN_WAIT_2. This is controlled by a new sysctl
fast_finwait2_recycle, which is disabled by default.

Reviewed by: gnn, silby.


# 162612 25-Sep-2006 jmg

if min is greater than max, prefer max over min... I managed to get a
retransmit timer that was going to take 19 days to trigger...

Reviewed by: silby


# 162111 07-Sep-2006 ru

Back when we had T/TCP support, we used to apply different
timeouts for TCP and T/TCP connections in the TIME_WAIT
state, and we had two separate timed wait queues for them.
Now that is has gone, the timeout is always 2*MSL again,
and there is no reason to keep two queues (the first was
unused anyway!).

Also, reimplement the remaining queue using a TAILQ (it
was technically impossible before, with two queues).


# 161226 11-Aug-2006 mohans

Fixes an edge case bug in timewait handling where ticks rolling over causing
the timewait expiry to be exactly 0 corrupts the timewait queues (and that entry).
Reviewed by: silby


# 155767 16-Feb-2006 andre

Have TCP Inflight disable itself if the RTT is below a certain
threshold. Inflight doesn't make sense on a LAN as it has
trouble figuring out the maximal bandwidth because of the coarse
tick granularity.

The sysctl net.inet.tcp.inflight.rttthresh specifies the threshold
in milliseconds below which inflight will disengage. It defaults
to 10ms.

Tested by: Joao Barros <joao.barros-at-gmail.com>,
Rich Murphey <rich-at-whiteoaklabs.com>
Sponsored by: TCP/IP Optimization Fundraise 2005


# 139823 07-Jan-2005 imp

/* -> /*- for license, minor formatting changes


# 133874 16-Aug-2004 rwatson

White space cleanup for netinet before branch:

- Trailing tab/space cleanup
- Remove spurious spaces between or before tabs

This change avoids touching files that Andre likely has in his working
set for PFIL hooks changes for IPFW/DUMMYNET.

Approved by: re (scottl)
Submitted by: Xin LI <delphij@frontfree.net>


# 128019 07-Apr-2004 imp

Remove advertising clause from University of California Regent's
license, per letter dated July 22, 1999 and email from Peter Wemm,
Alan Cox and Robert Watson.

Approved by: core, peter, alc, rwatson


# 112009 08-Mar-2003 jlemon

Remove a panic(); if the zone allocator can't provide more timewait
structures, reuse the oldest one. Also move the expiry timer from
a per-structure callout to the tcp slow timer.

Sponsored by: DARPA, NAI Labs


# 111145 19-Feb-2003 jlemon

Add a TCP TIMEWAIT state which uses less space than a fullblown TCP
control block. Allow the socket and tcpcb structures to be freed
earlier than inpcb. Update code to understand an inp w/o a socket.

Reviewed by: hsu, silby, jayanth
Sponsored by: DARPA, NAI Labs


# 101975 16-Aug-2002 alfred

make the strings for tcptimers, tanames and prurequests const to silence
warnings.


# 100335 18-Jul-2002 dillon

Introduce two new sysctl's:

net.inet.tcp.rexmit_min (default 3 ticks equiv)

This sysctl is the retransmit timer RTO minimum,
specified in milliseconds. This value is
designed for algorithmic stability only.

net.inet.tcp.rexmit_slop (default 200ms)

This sysctl is the retransmit timer RTO slop
which is added to every retransmit timeout and
is designed to handle protocol stack overheads
and delayed ack issues.

Note that the *original* code applied a 1-second
RTO minimum but never applied real slop to the RTO
calculation, so any RTO calculation over one second
would have no slop and thus not account for
protocol stack overheads (TCP timestamps are not
a measure of protocol turnaround!). Essentially,
the original code made the RTO calculation almost
completely irrelevant.

Please note that the 200ms slop is debateable.
This commit is not meant to be a line in the sand,
and if the community winds up deciding that increasing
it is the correct solution then it's easy to do.
Note that larger values will destroy performance
on lossy networks while smaller values may result in
a greater number of unnecessary retransmits.


# 100270 17-Jul-2002 dillon

I don't know how the minimum retransmit timeout managed to get set to
one second but it badly breaks throughput on networks with minor packet
loss.

Complaints by: at least two people tracked down to this.
MFC after: 3 days


# 92723 19-Mar-2002 alfred

Remove __P.


# 55205 29-Dec-1999 peter

Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL"
is an application space macro and the applications are supposed to be free
to use it as they please (but cannot). This is consistant with the other
BSD's who made this change quite some time ago. More commits to come.


# 54018 02-Dec-1999 jlemon

Change the delayed ack time from 200ms to 100ms.

This results in closer behavior to earlier versions, where the fixed
200ms timer actually resulted in a delay anywhere from 1..200ms, with
the average delay being 100ms.

Pointed out by: dg


# 50704 31-Aug-1999 jlemon

Remove conversion macros that were used during development.


# 50673 30-Aug-1999 jlemon

Restructure TCP timeout handling:

- eliminate the fast/slow timeout lists for TCP and instead use a
callout entry for each timer.
- increase the TCP timer granularity to HZ
- implement "bad retransmit" recovery, as presented in
"On Estimating End-to-End Network Path Properties", by Allman and Paxson.

Submitted by: jlemon, wollmann


# 50477 28-Aug-1999 peter

$Id$ -> $FreeBSD$


# 29179 07-Sep-1997 bde

Some staticized variables were still declared to be extern.


# 22975 22-Feb-1997 peter

Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.


# 21673 14-Jan-1997 jkh

Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.


# 18280 13-Sep-1996 pst

Make the misnamed tcp initial keepalive timer value (which is really the
time, in seconds, that state for non-established TCP sessions stays about)
a sysctl modifyable variable.

[part 1 of two commits, I just realized I can't play with the indices as
I was typing this commit message.]


# 16367 14-Jun-1996 wollman

Better selection of initial retransmit timeout when no cached
RTT information is available.

Submitted by: kbracey@art.acorn.co.uk (Kevin Bracey)
(slightly modified by me)


# 13765 30-Jan-1996 mpp

Fix a bunch of spelling errors in the comment fields of
a bunch of system include files.


# 12296 14-Nov-1995 phk

New style sysctl & staticize alot of stuff.


# 6348 14-Feb-1995 wollman

Get rid of some unneeded #ifdef TTCP lines. Also, get rid of some
bogus commons declared in header files.


# 6247 08-Feb-1995 wollman

Merge in T/TCP TCP header file changes.


# 2169 21-Aug-1994 paul

Made idempotent.

Submitted by: Paul


# 1817 02-Aug-1994 dg

Added $Id$


# 1549 25-May-1994 rgrimes

The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.

Reviewed by: Rodney W. Grimes
Submitted by: John Dyson and David Greenman


# 1542 24-May-1994 rgrimes

This commit was generated by cvs2svn to compensate for changes in r1541,
which included commits to RCS files with non-trunk default branches.


# 1541 24-May-1994 rgrimes

BSD 4.4 Lite Kernel Sources