History log of /freebsd-11-stable/sys/netinet/tcp_timer.c
Revision Date Author Comments
# 334727 06-Jun-2018 tuexen

MFC r333178:

Simplify the call to tcp_drop(), since the handling of soft error
is also done in tcp_drop(). No functional change.

Sponsored by: Netflix, Inc.


# 332178 07-Apr-2018 tuexen

MFC r322967:

Fix blackhole detection.

There were two bugs related to the blackhole detection:
* The smallest size was tried more than two times.
* The restored MSS was not the original one, but the second
candidate.

Sponsored by: Netflix, Inc.


# 331722 29-Mar-2018 eadler

Revert r330897:

This was intended to be a non-functional change. It wasn't. The commit
message was thus wrong. In addition it broke arm, and merged crypto
related code.

Revert with prejudice.

This revert skips files touched in r316370 since that commit was since
MFCed. This revert also skips files that require $FreeBSD$ property
changes.

Thank you to those who helped me get out of this mess including but not
limited to gonzo, kevans, rgrimes.

Requested by: gjb (re)


# 330897 14-Mar-2018 eadler

Partial merge of the SPDX changes

These changes are incomplete but are making it difficult
to determine what other changes can/should be merged.

No objections from: pfg


# 330303 03-Mar-2018 jhb

MFC 328608: Export tcp_always_keepalive for use by the Chelsio TOM module.

This used to work by accident with ld.bfd even though always_keepalive
was marked as static. LLD honors static more correctly, so export this
variable properly (including moving it into the tcp_* namespace).

Relative to HEAD the MFC includes two additional changes:
- The t3_tom module used for cxgb(4) is also patched.
- A strong reference from the new name (tcp_always_keepalive) to the old
name (always_keepalive) has been added to preserve the KBI for existing
modules.

Suggested by: kib (strong reference)
Sponsored by: Chelsio Communications


# 302408 07-Jul-2016 gjb

Copy head@r302406 to stable/11 as part of the 11.0-RELEASE cycle.
Prune svn:mergeinfo from the new branch, as nothing has been merged
here.

Additional commits post-branch will follow.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


/freebsd-11-stable/MAINTAINERS
/freebsd-11-stable/cddl
/freebsd-11-stable/cddl/contrib/opensolaris
/freebsd-11-stable/cddl/contrib/opensolaris/cmd/dtrace/test/tst/common/print
/freebsd-11-stable/cddl/contrib/opensolaris/cmd/zfs
/freebsd-11-stable/cddl/contrib/opensolaris/lib/libzfs
/freebsd-11-stable/contrib/amd
/freebsd-11-stable/contrib/apr
/freebsd-11-stable/contrib/apr-util
/freebsd-11-stable/contrib/atf
/freebsd-11-stable/contrib/binutils
/freebsd-11-stable/contrib/bmake
/freebsd-11-stable/contrib/byacc
/freebsd-11-stable/contrib/bzip2
/freebsd-11-stable/contrib/com_err
/freebsd-11-stable/contrib/compiler-rt
/freebsd-11-stable/contrib/dialog
/freebsd-11-stable/contrib/dma
/freebsd-11-stable/contrib/dtc
/freebsd-11-stable/contrib/ee
/freebsd-11-stable/contrib/elftoolchain
/freebsd-11-stable/contrib/elftoolchain/ar
/freebsd-11-stable/contrib/elftoolchain/brandelf
/freebsd-11-stable/contrib/elftoolchain/elfdump
/freebsd-11-stable/contrib/expat
/freebsd-11-stable/contrib/file
/freebsd-11-stable/contrib/gcc
/freebsd-11-stable/contrib/gcclibs/libgomp
/freebsd-11-stable/contrib/gdb
/freebsd-11-stable/contrib/gdtoa
/freebsd-11-stable/contrib/groff
/freebsd-11-stable/contrib/ipfilter
/freebsd-11-stable/contrib/ldns
/freebsd-11-stable/contrib/ldns-host
/freebsd-11-stable/contrib/less
/freebsd-11-stable/contrib/libarchive
/freebsd-11-stable/contrib/libarchive/cpio
/freebsd-11-stable/contrib/libarchive/libarchive
/freebsd-11-stable/contrib/libarchive/libarchive_fe
/freebsd-11-stable/contrib/libarchive/tar
/freebsd-11-stable/contrib/libc++
/freebsd-11-stable/contrib/libc-vis
/freebsd-11-stable/contrib/libcxxrt
/freebsd-11-stable/contrib/libexecinfo
/freebsd-11-stable/contrib/libpcap
/freebsd-11-stable/contrib/libstdc++
/freebsd-11-stable/contrib/libucl
/freebsd-11-stable/contrib/libxo
/freebsd-11-stable/contrib/llvm
/freebsd-11-stable/contrib/llvm/projects/libunwind
/freebsd-11-stable/contrib/llvm/tools/clang
/freebsd-11-stable/contrib/llvm/tools/lldb
/freebsd-11-stable/contrib/llvm/tools/llvm-dwarfdump
/freebsd-11-stable/contrib/llvm/tools/llvm-lto
/freebsd-11-stable/contrib/mdocml
/freebsd-11-stable/contrib/mtree
/freebsd-11-stable/contrib/ncurses
/freebsd-11-stable/contrib/netcat
/freebsd-11-stable/contrib/ntp
/freebsd-11-stable/contrib/nvi
/freebsd-11-stable/contrib/one-true-awk
/freebsd-11-stable/contrib/openbsm
/freebsd-11-stable/contrib/openpam
/freebsd-11-stable/contrib/openresolv
/freebsd-11-stable/contrib/pf
/freebsd-11-stable/contrib/sendmail
/freebsd-11-stable/contrib/serf
/freebsd-11-stable/contrib/sqlite3
/freebsd-11-stable/contrib/subversion
/freebsd-11-stable/contrib/tcpdump
/freebsd-11-stable/contrib/tcsh
/freebsd-11-stable/contrib/tnftp
/freebsd-11-stable/contrib/top
/freebsd-11-stable/contrib/top/install-sh
/freebsd-11-stable/contrib/tzcode/stdtime
/freebsd-11-stable/contrib/tzcode/zic
/freebsd-11-stable/contrib/tzdata
/freebsd-11-stable/contrib/unbound
/freebsd-11-stable/contrib/vis
/freebsd-11-stable/contrib/wpa
/freebsd-11-stable/contrib/xz
/freebsd-11-stable/crypto/heimdal
/freebsd-11-stable/crypto/openssh
/freebsd-11-stable/crypto/openssl
/freebsd-11-stable/gnu/lib
/freebsd-11-stable/gnu/usr.bin/binutils
/freebsd-11-stable/gnu/usr.bin/cc/cc_tools
/freebsd-11-stable/gnu/usr.bin/gdb
/freebsd-11-stable/lib/libc/locale/ascii.c
/freebsd-11-stable/sys/cddl/contrib/opensolaris
/freebsd-11-stable/sys/contrib/dev/acpica
/freebsd-11-stable/sys/contrib/ipfilter
/freebsd-11-stable/sys/contrib/libfdt
/freebsd-11-stable/sys/contrib/octeon-sdk
/freebsd-11-stable/sys/contrib/x86emu
/freebsd-11-stable/sys/contrib/xz-embedded
/freebsd-11-stable/usr.sbin/bhyve/atkbdc.h
/freebsd-11-stable/usr.sbin/bhyve/bhyvegc.c
/freebsd-11-stable/usr.sbin/bhyve/bhyvegc.h
/freebsd-11-stable/usr.sbin/bhyve/console.c
/freebsd-11-stable/usr.sbin/bhyve/console.h
/freebsd-11-stable/usr.sbin/bhyve/pci_fbuf.c
/freebsd-11-stable/usr.sbin/bhyve/pci_xhci.c
/freebsd-11-stable/usr.sbin/bhyve/pci_xhci.h
/freebsd-11-stable/usr.sbin/bhyve/ps2kbd.c
/freebsd-11-stable/usr.sbin/bhyve/ps2kbd.h
/freebsd-11-stable/usr.sbin/bhyve/ps2mouse.c
/freebsd-11-stable/usr.sbin/bhyve/ps2mouse.h
/freebsd-11-stable/usr.sbin/bhyve/rfb.c
/freebsd-11-stable/usr.sbin/bhyve/rfb.h
/freebsd-11-stable/usr.sbin/bhyve/sockstream.c
/freebsd-11-stable/usr.sbin/bhyve/sockstream.h
/freebsd-11-stable/usr.sbin/bhyve/usb_emul.c
/freebsd-11-stable/usr.sbin/bhyve/usb_emul.h
/freebsd-11-stable/usr.sbin/bhyve/usb_mouse.c
/freebsd-11-stable/usr.sbin/bhyve/vga.c
/freebsd-11-stable/usr.sbin/bhyve/vga.h
# 300042 17-May-2016 rrs

This small change adopts the excellent suggestion for using named
structures in the add of a new tcp-stack that came in late to me
via email after the last commit. It also makes it so that a new
stack may optionally get a callback during a retransmit
timeout. This allows the new stack to clear specific state (think
sack scoreboards or other such structures).

Sponsored by: Netflix Inc.
Differential Revision: http://reviews.freebsd.org/D6303


# 298995 03-May-2016 pfg

sys/net*: minor spelling fixes.

No functional change.


# 298743 28-Apr-2016 rrs

This cleans up the timers code in TCP to start using the new
async_drain functionality. This has been tested in NF as well as
by Verisign. Still to do in here is to remove all the old flags. They
are currently left being maintained but probably are no longer needed.

Sponsored by: Netflix Inc.
Differential Revision: http://reviews.freebsd.org/D5924


# 297225 24-Mar-2016 gnn

FreeBSD previously provided route caching for TCP (and UDP). Re-add
route caching for TCP, with some improvements. In particular, invalidate
the route cache if a new route is added, which might be a better match.
The cache is automatically invalidated if the old route is deleted.

Submitted by: Mike Karels
Reviewed by: gnn
Differential Revision: https://reviews.freebsd.org/D4306


# 294931 27-Jan-2016 glebius

Rename netinet/tcp_cc.h to netinet/cc/cc.h.

Discussed with: lstewart


# 294840 26-Jan-2016 hiren

Persist timers TCPTV_PERSMIN and TCPTV_PERSMAX are hardcoded with 5 seconds and
60 seconds, respectively. Turn them into sysctls that can be tuned live. The
default values of 5 seconds and 60 seconds have been retained.

Submitted by: Jason Wolfe (j at nitrology dot com)
Reviewed by: gnn, rrs, hiren, bz
MFC after: 1 week
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D5024


# 294535 21-Jan-2016 glebius

- Rename cc.h to more meaningful tcp_cc.h.
- Declare it a kernel only include, which it already is.
- Don't include tcp.h implicitly from tcp_cc.h


# 293284 06-Jan-2016 glebius

Historically we have two fields in tcpcb to describe sender MSS: t_maxopd,
and t_maxseg. This dualism emerged with T/TCP, but was not properly cleaned
up after T/TCP removal. After all permutations over the years the result is
that t_maxopd stores a minimum of peer offered MSS and MTU reduced by minimum
protocol header. And t_maxseg stores (t_maxopd - TCPOLEN_TSTAMP_APPA) if
timestamps are in action, or is equal to t_maxopd otherwise. That's a very
rough estimate of MSS reduced by options length. Throughout the code it
was used in places, where preciseness was not important, like cwnd or
ssthresh calculations.

With this change:

- t_maxopd goes away.
- t_maxseg now stores MSS not adjusted by options.
- new function tcp_maxseg() is provided, that calculates MSS reduced by
options length. The functions gives a better estimate, since it takes
into account SACK state as well.
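
As a rough illustration of the kind of calculation described above, the sketch below subtracts option bytes from an unadjusted MSS. It is not the kernel's tcp_maxseg(): the name payload_per_segment() and its parameters are hypothetical, and only the timestamp and SACK options are modelled.

```c
/* Hypothetical model of "MSS reduced by options length"; not kernel code. */

#define TS_OPT_LEN      12  /* timestamp option (10 bytes) padded with two NOPs */
#define SACK_HDR_LEN     2  /* SACK option kind and length bytes */
#define SACK_BLOCK_LEN   8  /* one SACK block: left edge plus right edge */
#define MAX_TCP_OPTLEN  40  /* a TCP header has room for at most 40 option bytes */

/*
 * Return the payload bytes left in a segment once the modelled options
 * are accounted for. `maxseg` is the MSS not adjusted by options.
 */
int
payload_per_segment(int maxseg, int timestamps, int sack_blocks)
{
	int optlen = 0;

	if (timestamps)
		optlen += TS_OPT_LEN;
	if (sack_blocks > 0) {
		int sack = SACK_HDR_LEN + SACK_BLOCK_LEN * sack_blocks;

		/* Options are padded to a 32-bit boundary. */
		optlen += (sack + 3) & ~3;
	}
	if (optlen > MAX_TCP_OPTLEN)
		optlen = MAX_TCP_OPTLEN;
	return (maxseg > optlen ? maxseg - optlen : 0);
}
```

With a 1460-byte MSS, enabling timestamps leaves 1448 bytes in this model, and carrying one SACK block as well leaves 1436.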

Reviewed by: jtl
Differential Revision: https://reviews.freebsd.org/D3593


# 292706 24-Dec-2015 pkelsey

Implementation of server-side TCP Fast Open (TFO) [RFC7413].

TFO is disabled by default in the kernel build. See the top comment
in sys/netinet/tcp_fastopen.c for implementation particulars.

Reviewed by: gnn, jch, stas
MFC after: 3 days
Sponsored by: Verisign, Inc.
Differential Revision: https://reviews.freebsd.org/D4350


# 292309 15-Dec-2015 rrs

First cut of the modularization of our TCP stack. Still
to do is to clean up the timer handling using the async-drain.
Other optimizations may be coming to go with this. What's here
will allow different tcp implementations (one included).
Reviewed by: jtl, hiren, transports
Sponsored by: Netflix Inc.
Differential Revision: D4055


# 290805 13-Nov-2015 rrs

This fixes several places where callout_stop's return value is examined. The
new return codes of -1 were mistakenly being considered "true". Callout_stop
now returns -1 to indicate the callout had either already completed or
was not running and 0 to indicate it could not be stopped. Also update
the manual page to make it more consistent no non-zero in the callout_stop
or callout_reset descriptions.

MFC after: 1 Month with associated callout change.


# 289293 14-Oct-2015 hiren

Fix an unnecessarily aggressive behavior where mtu clamping begins on first
retransmission timeout (rto) when blackhole detection is enabled. Make
sure it only happens when the second attempt to send the same segment also fails
with rto.

Also make sure that each mtu probing stage (usually 1448 -> 1188 -> 524) follows
the same pattern and gets 2 chances (rto) before further clamping down.
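
The two-chances-per-stage pattern can be modelled as below. This is a simplified sketch rather than the code in tcp_timer.c: the ladder values are the "usually" figures quoted above, and the function name and the idea of counting consecutive RTOs for one segment are assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical probing ladder, taken from the figures quoted above. */
static const int mss_ladder[] = { 1448, 1188, 524 };
#define LADDER_LEN	(sizeof(mss_ladder) / sizeof(mss_ladder[0]))
#define RTOS_PER_STAGE	2	/* each stage gets two RTOs before clamping */

/*
 * Return the MSS in effect after `rto_count` consecutive retransmission
 * timeouts for the same segment. The first RTO leaves the MSS alone;
 * every second RTO moves one step down the ladder until the last entry.
 */
int
mss_after_rtos(int rto_count)
{
	size_t stage;

	if (rto_count < 0)
		rto_count = 0;
	stage = (size_t)rto_count / RTOS_PER_STAGE;
	if (stage >= LADDER_LEN)
		stage = LADDER_LEN - 1;
	return (mss_ladder[stage]);
}
```

In this model a single RTO keeps the MSS at 1448, the second RTO clamps it to 1188, and the fourth clamps it to 524.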

Note: RFC4821 doesn't specify implementation details on how this situation
should be handled.

Differential Revision: https://reviews.freebsd.org/D3434
Reviewed by: sbruno, gnn (previous version)
MFC after: 2 weeks
Sponsored by: Limelight Networks


# 287759 13-Sep-2015 gnn

Add DTrace probe points, translators and a corresponding script
to provide the TCPDEBUG functionality with pure DTrace.

Reviewed by: rwatson
MFC after: 2 weeks
Sponsored by: Limelight Networks
Differential Revision: D3530


# 287304 30-Aug-2015 jch

Put r284245 back in place: If at first this fix was seen as a temporary
workaround for a callout(9) issue, it turns out it is instead the right
way to use callout in mpsafe mode without using callout_drain().

r284245 commit message:

Fix a callout race condition introduced in TCP timers callouts with r281599.
In TCP timer context, it is not enough to check callout_stop() return value
to decide if a callout is still running or not, previous callout_reset()
return values have also to be checked.

Differential Revision: https://reviews.freebsd.org/D2763


# 287101 24-Aug-2015 jch

Revert r284245: "Fix a callout race condition introduced in TCP
timers callouts with r281599."

r281599 fixed a TCP timer race condition, but due to a callout(9) bug
it also introduced another race condition, worked around with r284245.
The callout(9) bug being fixed with r286880, we can now revert the
workaround (r284245).

Differential Revision: https://reviews.freebsd.org/D2079 (Initial change)
Differential Revision: https://reviews.freebsd.org/D2763 (Workaround)
Differential Revision: https://reviews.freebsd.org/D3078 (Fix)
Sponsored by: Verisign, Inc.
MFC after: 2 weeks


# 286873 18-Aug-2015 jch

Make clear that TIME_WAIT timeout expiration is managed solely by
tcp_tw_2msl_scan().

Sponsored by: Verisign, Inc.


# 286227 03-Aug-2015 jch

Decompose TCP INP_INFO lock to increase short-lived TCP connections scalability:

- The existing TCP INP_INFO lock continues to protect the global inpcb list
stability during full list traversal (e.g. tcp_pcblist()).

- A new INP_LIST lock protects inpcb list actual modifications (inp allocation
and free) and inpcb global counters.

This allows the TCP INP_INFO_RLOCK lock to be used in critical paths (e.g. tcp_input())
and INP_INFO_WLOCK only in occasional operations that walk all connections.

PR: 183659
Differential Revision: https://reviews.freebsd.org/D2599
Reviewed by: jhb, adrian
Tested by: adrian, nitroboost-gmail.com
Sponsored by: Verisign, Inc.


# 284245 10-Jun-2015 jch

Fix a callout race condition introduced in TCP timers callouts with r281599.
In TCP timer context, it is not enough to check callout_stop() return value
to decide if a callout is still running or not, previous callout_reset()
return values have also to be checked.

Differential Revision: https://reviews.freebsd.org/D2763
Reviewed by: hiren
Approved by: hiren
MFC after: 1 day
Sponsored by: Verisign, Inc.


# 281599 16-Apr-2015 jch

Fix an old and well-documented use-after-free race condition in
TCP timers:
- Add a reference from tcpcb to its inpcb
- Defer tcpcb deletion until TCP timers have finished

Differential Revision: https://reviews.freebsd.org/D2079
Submitted by: jch, Marc De La Gueronniere <mdelagueronniere@verisign.com>
Reviewed by: imp, rrs, adrian, jhb, bz
Approved by: jhb
Sponsored by: Verisign, Inc.


# 280990 02-Apr-2015 jch

Provide better debugging information in tcp_timer_activate() and
tcp_timer_active()

Differential Revision: https://reviews.freebsd.org/D2179
Suggested by: bz
Reviewed by: jhb
Approved by: jhb


# 280904 31-Mar-2015 jch

Use appropriate timeout_t* instead of void* in tcp_timer_activate()

Suggested by: imp
Differential Revision: https://reviews.freebsd.org/D2154
Reviewed by: imp, jhb
Approved by: jhb


# 277331 18-Jan-2015 adrian

Refactor / restructure the RSS code into generic, IPv4 and IPv6 specific
bits.

The motivation here is to eventually teach netisr and potentially
other networking subsystems a bit more about how RSS work queues / buckets
are configured so things have a hope of auto-configuring in the future.

* net/rss_config.[ch] takes care of the generic bits for doing
configuration, hash function selection, etc;
* toeplitz.[ch] is now in net/ rather than netinet/;
* (and would be in libkern if it didn't directly include RSS_KEYSIZE;
that's a later thing to fix up.)
* netinet/in_rss.[ch] now just contains the IPv4 specific methods;
* and netinet/in6_rss.[ch] now just contains the IPv6 specific methods.

This should have no functional impact on anyone currently using
the RSS support.

Differential Revision: D1383
Reviewed by: gnn, jfv (intel driver bits)


# 273850 30-Oct-2014 jch

Fix a race condition in TCP timewait between tcp_tw_2msl_reuse() and
tcp_tw_2msl_scan(). This race condition drives unplanned timewait
timeout cancellation. Also simplify implementation by holding inpcb
reference and removing tcptw reference counting.

Differential Revision: https://reviews.freebsd.org/D826
Submitted by: Marc De la Gueronniere <mdelagueronniere@verisign.com>
Submitted by: jch
Reviewed By: jhb (mentor), adrian, rwatson
Sponsored by: Verisign, Inc.
MFC after: 2 weeks
X-MFC-With: r264321


# 273377 21-Oct-2014 hselasky

Fix multiple incorrect SYSCTL arguments in the kernel:

- Wrong integer type was specified.

- Wrong or missing "access" specifier. The "access" specifier
sometimes included the SYSCTL type, which it should not, except for
procedural SYSCTL nodes.

- Logical OR where binary OR was expected.

- Properly assert the "access" argument passed to all SYSCTL macros,
using the CTASSERT macro. This applies to both static- and dynamically
created SYSCTLs.

- Properly assert the data type for both static and dynamic
SYSCTLs. In the case of static SYSCTLs we only assert that the data
pointed to by the SYSCTL data pointer has the correct size, hence
there is no easy way to assert types in the C language outside a
C-function.

- Rewrote some code which doesn't pass a constant "access" specifier
when creating dynamic SYSCTL nodes, which is now a requirement.

- Updated "EXAMPLES" section in SYSCTL manual page.

MFC after: 3 days
Sponsored by: Mellanox Technologies


# 273063 13-Oct-2014 sbruno

Handle small file case with regards to plpmtud blackhole detection.

Submitted by: Mikhail <mp@lenta.ru>
MFC after: 2 weeks
Relnotes: yes


# 272720 07-Oct-2014 sbruno

Implement PLPMTUD blackhole detection (RFC 4821), inspired by code
from xnu sources. If we encounter a network where ICMP is blocked
the Needs Frag indicator may not propagate back to us. Attempt to
downshift the mss once to a preconfigured value.

Default this feature to off for now while we do not have a full PLPMTUD
implementation in our stack.

Adds the following new sysctl's for control:
net.inet.tcp.pmtud_blackhole_detection -- turns on/off this feature
net.inet.tcp.pmtud_blackhole_mss -- mss to try for ipv4
net.inet.tcp.v6pmtud_blackhole_mss -- mss to try for ipv6

Adds the following new sysctl's for monitoring:
-- Number of times the code was activated to attempt a mss downshift
net.inet.tcp.pmtud_blackhole_activated
-- Number of times the blackhole mss was used in an attempt to downshift
net.inet.tcp.pmtud_blackhole_min_activated
-- Number of times that we failed to connect after we downshifted the mss
net.inet.tcp.pmtud_blackhole_failed

Phabricator: https://reviews.freebsd.org/D506
Reviewed by: rpaulo bz
MFC after: 2 weeks
Relnotes: yes
Sponsored by: Limelight Networks


# 268027 30-Jun-2014 adrian

If we're doing RSS then ensure the TCP timer selection uses the multi-CPU
callwheel setup, rather than just dumping all the timers on swi0.


# 266422 18-May-2014 adrian

When RSS is enabled and per cpu TCP timers are enabled, do an RSS
lookup for the inp flowid/flowtype to destination CPU.

This only modifies the case where RSS is enabled and the per-cpu tcp
timer option is enabled. Otherwise the behaviour should be the same
as before.


# 264321 10-Apr-2014 jhb

Currently, the TCP slow timer can starve TCP input processing while it
walks the list of connections in TIME_WAIT closing expired connections
due to contention on the global TCP pcbinfo lock.

To remediate, introduce a new global lock to protect the list of
connections in TIME_WAIT. Only acquire the TCP pcbinfo lock when
closing an expired connection. This limits the window of time when
TCP input processing is stopped to the amount of time needed to close
a single connection.

Submitted by: Julien Charbon <jcharbon@verisign.com>
Reviewed by: rwatson, rrs, adrian
MFC after: 2 months


# 247777 04-Mar-2013 davide

- Make callout(9) tickless, relying on eventtimers(4) as backend for
precise time event generation. This greatly improves granularity of
callouts, which are no longer constrained to wait for the next tick to be
scheduled.
- Extend the callout KPI introducing a set of callout_reset_sbt* functions,
which take a sbintime_t as timeout argument. The new KPI also offers a
way for consumers to specify precision tolerance they allow, so that
callout can coalesce events and reduce number of interrupts as well as
potentially avoid scheduling a SWI thread.
- Introduce support for dispatching callouts directly from hardware
interrupt context, specifying an additional flag. This feature should be
used carefully, since interrupt context has some limitations
(e.g. no sleeping locks can be held).
- Enhance mechanisms to gather information about callwheel, introducing
a new sysctl to obtain stats.

This change breaks the KBI. struct callout fields have been changed, in
particular 'int ticks' (4 bytes) has been replaced with 'sbintime_t'
(8 bytes) and another 'sbintime_t' field was added for precision.

Together with: mav
Reviewed by: attilio, bde, luigi, phk
Sponsored by: Google Summer of Code 2012, iXsystems inc.
Tested by: flo (amd64, sparc64), marius (sparc64), ian (arm),
markj (amd64), mav, Fabian Keil


# 245238 09-Jan-2013 jhb

Don't drop options from the third retransmitted SYN by default. If the
SYNs (or SYN/ACK replies) are dropped due to network congestion, then the
remote end of the connection may act as if options such as window scaling
are enabled but the local end will think they are not. This can result in
very slow data transfers in the case of window scaling disagreements.

The old behavior can be obtained by setting the
net.inet.tcp.rexmit_drop_options sysctl to a non-zero value.

Reviewed by: net@
MFC after: 2 weeks


# 243603 27-Nov-2012 np

Make sure that tcp_timer_activate() correctly sees TCP_OFFLOAD (or not).


# 242267 28-Oct-2012 andre

If the user has closed the socket then drop a persisting connection
after a much reduced timeout.

Typically web servers close their sockets quickly under the assumption
that the TCP connection goes away as well. That is not entirely true
however. If the peer closed the window we're going to wait for a long
time with lots of data in the send buffer.

MFC after: 2 weeks


# 242264 28-Oct-2012 andre

Update comment to reflect the change made in r242263.

MFC after: 2 weeks


# 242263 28-Oct-2012 andre

Add SACK_PERMIT to the list of TCP options that are switched off after
retransmitting a SYN three times.

MFC after: 2 weeks


# 242260 28-Oct-2012 andre

When retransmitting SYN in TCPS_SYN_SENT state use TCPTV_RTOBASE,
the default retransmit timeout, as base to calculate the backoff
time until next try instead of the TCP_REXMTVAL() macro which only
works correctly when we already have measured an actual RTT+RTTVAR.

Before it would cause the first retransmit at RTOBASE, the next
four at the same time (!) about 200ms later, and then another one
again RTOBASE later.
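
A minimal sketch of a SYN retransmit timeout derived from a fixed base, as the paragraphs above describe, might look like the following. The multiplier table, the function name syn_rexmt_ms() and the millisecond units are assumptions made for illustration; the kernel's actual backoff table and clamping macros may differ.

```c
/* Illustrative backoff multipliers; the kernel's real table may differ. */
static const int syn_backoff[] = { 1, 2, 4, 8, 16, 32, 64 };
#define SYN_BACKOFF_LEN \
	((int)(sizeof(syn_backoff) / sizeof(syn_backoff[0])))

/*
 * Compute the next SYN retransmit timeout from a fixed base value
 * (standing in for TCPTV_RTOBASE) instead of from measured RTT data,
 * which does not exist yet while the connection is in SYN_SENT.
 * The result is capped at `rexmt_max_ms`.
 */
long
syn_rexmt_ms(int shift, long rtobase_ms, long rexmt_max_ms)
{
	long t;

	if (shift < 0)
		shift = 0;
	if (shift >= SYN_BACKOFF_LEN)
		shift = SYN_BACKOFF_LEN - 1;
	t = rtobase_ms * syn_backoff[shift];
	return (t > rexmt_max_ms ? rexmt_max_ms : t);
}
```

Because the base is a constant, each retry is spaced by a predictable multiple of it rather than collapsing to a near-zero interval when no RTT sample is available.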

MFC after: 2 weeks


# 242257 28-Oct-2012 andre

Remove bogus 'else' in #ifdef that prevented the rttvar from being reset
in tcp_timer_rexmt() on retransmit for IPv6 sessions.

MFC after: 2 weeks


# 242250 28-Oct-2012 andre

When SYN or SYN/ACK had to be retransmitted RFC5681 requires us to
reduce the initial CWND to one segment. This reduction got lost
some time ago due to a change in initialization ordering.

Additionally in tcp_timer_rexmt() avoid entering fast recovery when
we're still in TCPS_SYN_SENT state.

MFC after: 2 weeks


# 239075 05-Aug-2012 trociny

In tcp timers, check INP_DROPPED flag a little later, after
callout_deactivate(), so if INP_DROPPED is set we return with the
timer active flag cleared.

For me this fixes negative keep timer values reported by `netstat -x'
for connections in CLOSE state.

Approved by: net (silence)
MFC after: 2 weeks


# 237263 19-Jun-2012 np

- Updated TOE support in the kernel.

- Stateful TCP offload drivers for Terminator 3 and 4 (T3 and T4) ASICs.
These are available as t3_tom and t4_tom modules that augment cxgb(4)
and cxgbe(4) respectively. The cxgb/cxgbe drivers continue to work as
usual with or without these extra features.

- iWARP driver for Terminator 3 ASIC (kernel verbs). T4 iWARP in the
works and will follow soon.

Build-tested with make universe.

30s overview
============
What interfaces support TCP offload? Look for TOE4 and/or TOE6 in the
capabilities of an interface:
# ifconfig -m | grep TOE

Enable/disable TCP offload on an interface (just like any other ifnet
capability):
# ifconfig cxgbe0 toe
# ifconfig cxgbe0 -toe

Which connections are offloaded? Look for toe4 and/or toe6 in the
output of netstat and sockstat:
# netstat -np tcp | grep toe
# sockstat -46c | grep toe

Reviewed by: bz, gnn
Sponsored by: Chelsio communications.
MFC after: ~3 months (after 9.1, and after ensuring MFC is feasible)


# 231025 05-Feb-2012 glebius

Add new socket options: TCP_KEEPINIT, TCP_KEEPIDLE, TCP_KEEPINTVL and
TCP_KEEPCNT, which allow control of the initial timeout, idle time, idle
re-send interval and idle send count on a per-socket basis.
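
An application might set these options roughly as follows. This is a sketch under stated assumptions: the helper name set_keepalive() is hypothetical, values are taken to be in seconds, and TCP_KEEPINIT is guarded with #ifdef because not every platform defines it.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/*
 * Enable keepalive on `fd` and tune it per socket.
 * Returns 0 on success, -1 if any setsockopt() call fails.
 */
int
set_keepalive(int fd, int init_s, int idle_s, int intvl_s, int cnt)
{
	int on = 1;

	if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) != 0)
		return (-1);
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s,
	    sizeof(idle_s)) != 0)
		return (-1);
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_s,
	    sizeof(intvl_s)) != 0)
		return (-1);
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) != 0)
		return (-1);
#ifdef TCP_KEEPINIT
	/* Connection-establishment timeout; only where the option exists. */
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINIT, &init_s,
	    sizeof(init_s)) != 0)
		return (-1);
#else
	(void)init_s;
#endif
	return (0);
}
```

A caller would typically invoke set_keepalive(fd, 30, 60, 10, 5) on a freshly created TCP socket before connecting.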

Reviewed by: andre, bz, lstewart


# 226318 12-Oct-2011 np

Make sure the inp wasn't dropped when rexmt let go of the inp and
pcbinfo locks.

Reviewed by: andre@
MFC after: 7 days


# 222488 30-May-2011 rwatson

Decompose the current single inpcbinfo lock into two locks:

- The existing ipi_lock continues to protect the global inpcb list and
inpcb counter. This lock is now relegated to a small number of
allocation and free operations, and occasional operations that walk
all connections (including, awkwardly, certain UDP multicast receive
operations -- something to revisit).

- A new ipi_hash_lock protects the two inpcbinfo hash tables for
looking up connections and bound sockets, manipulated using new
INP_HASH_*() macros. This lock, combined with inpcb locks, protects
the 4-tuple address space.

Unlike the current ipi_lock, ipi_hash_lock follows the individual inpcb
connection locks, so may be acquired while manipulating a connection on
which a lock is already held, avoiding the need to acquire the inpcbinfo
lock preemptively when a binding change might later be required. As a
result, however, lookup operations necessarily go through a reference
acquire while holding the lookup lock, later acquiring an inpcb lock --
if required.

A new function in_pcblookup() looks up connections, and accepts flags
indicating how to return the inpcb. Due to lock order changes, callers
no longer need acquire locks before performing a lookup: the lookup
routine will acquire the ipi_hash_lock as needed. In the future, it will
also be able to use alternative lookup and locking strategies
transparently to callers, such as pcbgroup lookup. New lookup flags are,
supplementing the existing INPLOOKUP_WILDCARD flag:

INPLOOKUP_RLOCKPCB - Acquire a read lock on the returned inpcb
INPLOOKUP_WLOCKPCB - Acquire a write lock on the returned inpcb

Callers must pass exactly one of these flags (for the time being).

Some notes:

- All protocols are updated to work within the new regime; especially,
TCP, UDPv4, and UDPv6. pcbinfo ipi_lock acquisitions are largely
eliminated, and global hash lock hold times are dramatically reduced
compared to previous locking.
- The TCP syncache still relies on the pcbinfo lock, something that we
may want to revisit.
- Support for reverting to the FreeBSD 7.x locking strategy in TCP input
is no longer available -- hash lookup locks are now held only very
briefly during inpcb lookup, rather than for potentially extended
periods. However, the pcbinfo ipi_lock will still be acquired if a
connection state might change such that a connection is added or
removed.
- Raw IP sockets continue to use the pcbinfo ipi_lock for protection,
due to maintaining their own hash tables.
- The interface in6_pcblookup_hash_locked() is maintained, which allows
callers to acquire hash locks and perform one or more lookups atomically
with 4-tuple allocation: this is required only for TCPv6, as there is no
in6_pcbconnect_setup(), which there should be.
- UDPv6 locking remains significantly more conservative than UDPv4
locking, which relates to source address selection. This needs
attention, as it likely significantly reduces parallelism in this code
for multithreaded socket use (such as in BIND).
- In the UDPv4 and UDPv6 multicast cases, we need to revisit locking
somewhat, as they relied on ipi_lock to stabilise 4-tuple matches, which
is no longer sufficient. A second check once the inpcb lock is held
should do the trick, keeping the general case from requiring the inpcb
lock for every inpcb visited.
- This work reminds us that we need to revisit locking of the v4/v6 flags,
which may be accessed lock-free both before and after this change.
- Right now, a single lock name is used for the pcbhash lock -- this is
undesirable, and probably another argument is required to take care of
this (or a char array name field in the pcbinfo?).

This is not an MFC candidate for 8.x due to its impact on lookup and
locking semantics. It's possible some of these issues could be worked
around with compatibility wrappers, if necessary.

Reviewed by: bz
Sponsored by: Juniper Networks, Inc.


# 221209 29-Apr-2011 jhb

TCP reuses t_rxtshift to determine the backoff timer used for both the
persist state and the retransmit timer. However, the code that implements
"bad retransmit recovery" only checks t_rxtshift to see if an ACK has been
received during the first retransmit timeout window. As a result, if
ticks has wrapped over to a negative value and a socket is in the persist
state, it can incorrectly treat an ACK from the remote peer as a
"bad retransmit recovery" and restore saved values such as snd_ssthresh and
snd_cwnd. However, if the socket has never had a retransmit timeout, then
these saved values will be zero, so snd_ssthresh and snd_cwnd will be set
to 0.

If the socket is in fast recovery (this can be caused by excessive
duplicate ACKs such as those fixed by 220794), then each ACK that arrives
triggers either NewReno or SACK partial ACK handling which clamps snd_cwnd
to be no larger than snd_ssthresh. In effect, the socket's send window
is permanently stuck at 0 even though the remote peer is advertising a
much larger window and pending data is only sent via TCP window probes
(so one byte every few seconds).

Fix this by adding a new TCP pcb flag (TF_PREVVALID) that indicates that
the various snd_*_prev fields in the pcb are valid and only perform
"bad retransmit recovery" if this flag is set in the pcb. The flag is set
on the first retransmit timeout that occurs and is cleared on subsequent
retransmit timeouts or when entering the persist state.
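
The flag's life cycle can be sketched as a small state model. Everything here is hypothetical scaffolding: struct sketch_tcpcb, SK_PREVVALID and the sketch_* functions stand in for the real tcpcb, TF_PREVVALID and kernel code paths, and the time-window check that the real recovery logic also performs is left out.

```c
#define SK_PREVVALID	0x1	/* hypothetical stand-in for TF_PREVVALID */

struct sketch_tcpcb {
	unsigned int	flags;
	int		rxtshift;
	unsigned long	snd_cwnd, snd_ssthresh;
	unsigned long	snd_cwnd_prev, snd_ssthresh_prev;
};

/* Retransmit timeout: save state and set the flag on the first one only. */
void
sketch_rexmt_timeout(struct sketch_tcpcb *tp)
{
	tp->rxtshift++;
	if (tp->rxtshift == 1) {
		tp->snd_cwnd_prev = tp->snd_cwnd;
		tp->snd_ssthresh_prev = tp->snd_ssthresh;
		tp->flags |= SK_PREVVALID;
	} else
		tp->flags &= ~SK_PREVVALID;
}

/* Entering the persist state invalidates any saved values. */
void
sketch_enter_persist(struct sketch_tcpcb *tp)
{
	tp->flags &= ~SK_PREVVALID;
}

/* Restore saved values only if they are known to be valid. Returns 1 if restored. */
int
sketch_bad_rexmt_recovery(struct sketch_tcpcb *tp)
{
	if (tp->rxtshift != 1 || (tp->flags & SK_PREVVALID) == 0)
		return (0);
	tp->snd_cwnd = tp->snd_cwnd_prev;
	tp->snd_ssthresh = tp->snd_ssthresh_prev;
	tp->flags &= ~SK_PREVVALID;
	return (1);
}
```

In this model a socket in the persist state with a shift count of 1 no longer has its window overwritten by never-initialized saved values, because the flag is clear.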

Reviewed by: bz
MFC after: 2 weeks


# 217126 07-Jan-2011 jhb

Trim extra spaces before tabs.


# 216621 21-Dec-2010 jhb

Fix a typo in a comment.

MFC after: 1 week


# 216101 01-Dec-2010 lstewart

Pass NULL instead of 0 for the th pointer value. NULL != 0 on all platforms.

Submitted by: David Hayes <dahayes at swin edu au>
MFC after: 9 weeks
X-MFC with: r215166


# 215166 12-Nov-2010 lstewart

This commit marks the first formal contribution of the "Five New TCP Congestion
Control Algorithms for FreeBSD" FreeBSD Foundation funded project. More details
about the project are available at: http://caia.swin.edu.au/freebsd/5cc/

- Add a KPI and supporting infrastructure to allow modular congestion control
algorithms to be used in the net stack. Algorithms can maintain per-connection
state if required, and connections maintain their own algorithm pointer, which
allows different connections to concurrently use different algorithms. The
TCP_CONGESTION socket option can be used with getsockopt()/setsockopt() to
programmatically query or change the congestion control algorithm respectively
from within an application at runtime.

- Integrate the framework with the TCP stack in as least intrusive a manner as
possible. Care was also taken to develop the framework in a way that should
allow integration with other congestion aware transport protocols (e.g. SCTP)
in the future. The hope is that we will one day be able to share a single set
of congestion control algorithm modules between all congestion aware transport
protocols.

- Introduce a new congestion recovery (TF_CONGRECOVERY) state into the TCP stack
and use it to decouple the meaning of recovery from a congestion event and
recovery from packet loss (TF_FASTRECOVERY) a la RFC2581. ECN and delay based
congestion control protocols don't generally need to recover from packet loss
and need a different way to note a congestion recovery episode within the
stack.

- Remove the net.inet.tcp.newreno sysctl, which simplifies some portions of code
and ensures the stack always uses the appropriate mechanisms for recovering
from packet loss during a congestion recovery episode.

- Extract the NewReno congestion control algorithm from the TCP stack and
massage it into module form. NewReno is always built into the kernel and will
remain the default algorithm for the foreseeable future. Implementations of
additional different algorithms will become available in the near future.

- Bump __FreeBSD_version to 900025 and note in UPDATING that rebuilding code
that relies on the size of "struct tcpcb" is required.

Many thanks go to the Cisco University Research Program Fund at Community
Foundation Silicon Valley and the FreeBSD Foundation. Their support of our work
at the Centre for Advanced Internet Architectures, Swinburne University of
Technology is greatly appreciated.

In collaboration with: David Hayes <dahayes at swin edu au> and
Grenville Armitage <garmitage at swin edu au>
Sponsored by: Cisco URP, FreeBSD Foundation
Reviewed by: rpaulo
Tested by: David Hayes (and many others over the years)
MFC after: 3 months


# 205391 20-Mar-2010 kmacy

- spread tcp timer callout load evenly across cpus if net.inet.tcp.per_cpu_timers is set to 1
- don't default to acquiring tcbinfo lock exclusively in rexmt

MFC after: 7 days


# 204830 07-Mar-2010 rwatson

Locking the tcbinfo structure should not be necessary in tcp_timer_delack(),
so don't.

MFC after: 1 week
Reviewed by: bz
Sponsored by: Juniper Networks


# 197244 16-Sep-2009 silby

Add the ability to see TCP timers via netstat -x. This can be a useful
feature when you have a seemingly stuck socket and want to figure
out why it has not been closed yet.

No plans to MFC this, as it changes the netstat sysctl ABI.

Reviewed by: andre, rwatson, Eric Van Gyzen


# 196019 01-Aug-2009 rwatson

Merge the remainder of kern_vimage.c and vimage.h into vnet.c and
vnet.h, we now use jails (rather than vimages) as the abstraction
for virtualization management, and what remained was specific to
virtual network stacks. Minor cleanups are done in the process,
and comments updated to reflect these changes.

Reviewed by: bz
Approved by: re (vimage blanket)


# 195760 19-Jul-2009 rwatson

Reimplement and/or implement vnet list locking by replacing a mostly
unused custom mutex/condvar-based sleep locks with two locks: an
rwlock (for non-sleeping use) and sxlock (for sleeping use). Either
acquired for read is sufficient to stabilize the vnet list, but both
must be acquired for write to modify the list.

Replace previous no-op read locking macros, used in various places
in the stack, with actual locking to prevent race conditions. Callers
must declare when they may perform unbounded sleeps or not when
selecting how to lock.

Refactor vnet sysinits so that the vnet list and locks are initialized
before kernel modules are linked, as the kernel linker will use them
for modules loaded by the boot loader.

Update various consumers of these KPIs based on whether they may sleep
or not.

Reviewed by: bz
Approved by: re (kib)


# 195699 14-Jul-2009 rwatson

Build on Jeff Roberson's linker-set based dynamic per-CPU allocator
(DPCPU), as suggested by Peter Wemm, and implement a new per-virtual
network stack memory allocator. Modify vnet to use the allocator
instead of monolithic global container structures (vinet, ...). This
change solves many binary compatibility problems associated with
VIMAGE, and restores ELF symbols for virtualized global variables.

Each virtualized global variable exists as a "reference copy", and also
once per virtual network stack. Virtualized global variables are
tagged at compile-time, placing them in a special linker set, which is
loaded into a contiguous region of kernel memory. Virtualized global
variables in the base kernel are linked as normal, but those in modules
are copied and relocated to a reserved portion of the kernel's vnet
region with the help of the kernel linker.

Virtualized global variables exist in per-vnet memory set up when the
network stack instance is created, and are initialized statically from
the reference copy. Run-time access occurs via an accessor macro, which
converts from the current vnet and requested symbol to a per-vnet
address. When "options VIMAGE" is not compiled into the kernel, normal
global ELF symbols will be used instead and indirection is avoided.

This change restores static initialization for network stack global
variables, restores support for non-global symbols and types, eliminates
the need for many subsystem constructors, eliminates large per-subsystem
structures that caused many binary compatibility issues both for
monitoring applications (netstat) and kernel modules, removes the
per-function INIT_VNET_*() macros throughout the stack, eliminates the
need for vnet_symmap ksym(2) munging, and eliminates duplicate
definitions of virtualized globals under VIMAGE_GLOBALS.

Bump __FreeBSD_version and update UPDATING.

Portions submitted by: bz
Reviewed by: bz, zec
Discussed with: gnn, jamie, jeff, jhb, julian, sam
Suggested by: peter
Approved by: re (kensmith)


# 194305 16-Jun-2009 jhb

Trim extra sets of ()'s.

Requested by: bde


# 190948 11-Apr-2009 rwatson

Update stats in struct tcpstat using two new macros, TCPSTAT_ADD() and
TCPSTAT_INC(), rather than directly manipulating the fields across the
kernel. This will make it easier to change the implementation of
these statistics, such as using per-CPU versions of the data structures.

MFC after: 3 days
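
A minimal sketch of the idea, assuming a plain global structure: the two
macros hide how a counter is stored, so a later switch to per-CPU storage
would only touch the macro bodies. The two-field struct below is a
hypothetical cut-down version of struct tcpstat, and the macro bodies are an
illustration rather than a copy of the kernel definitions.

```c
/* Hypothetical, cut-down statistics structure for illustration. */
struct tcpstat {
	unsigned long tcps_rexmttimeo;	/* retransmit timeouts */
	unsigned long tcps_sndbyte;	/* data bytes sent */
};

static struct tcpstat tcpstat;

/*
 * All updates go through these macros, so the storage behind them
 * can change without editing every call site.
 */
#define	TCPSTAT_ADD(name, val)	(tcpstat.name += (val))
#define	TCPSTAT_INC(name)	TCPSTAT_ADD(name, 1)
```

A call site then reads TCPSTAT_INC(tcps_rexmttimeo); instead of
tcpstat.tcps_rexmttimeo++;.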


# 189848 15-Mar-2009 rwatson

Correct a number of evolved problems with inp_vflag and inp_flags:
certain flags that should have been in inp_flags ended up in inp_vflag,
meaning that they were inconsistently locked, and in one case,
interpreted. Move the following flags from inp_vflag to gaps in the
inp_flags space (and clean up the inp_flags constants to make gaps
more obvious to future takers):

INP_TIMEWAIT
INP_SOCKREF
INP_ONESBCAST
INP_DROPPED

Some aspects of this change have no effect on kernel ABI at all, as these
are UDP/TCP/IP-internal uses; however, netstat and sockstat detect
INP_TIMEWAIT when listing TCP sockets, so any MFC will need to take this
into account.

MFC after: 1 week (or after dependencies are MFC'd)
Reviewed by: bz


# 187289 15-Jan-2009 lstewart

Add TCP Appropriate Byte Counting (RFC 3465) support to kernel.

The new behaviour is on by default, and can be disabled by setting the
net.inet.tcp.rfc3465 sysctl to 0 to obtain previous behaviour.

The patch changes struct tcpcb in sys/netinet/tcp_var.h which breaks
the ABI. Bump __FreeBSD_version to 800061 accordingly. User space tools
that rely on the size of struct tcpcb (e.g. sockstat) need to be recompiled.

Reviewed by: rpaulo, gnn
Approved by: gnn, kmacy (mentors)
Sponsored by: FreeBSD Foundation


# 185571 02-Dec-2008 bz

Rather than using hidden includes (with circular dependencies),
directly include only the header files needed. This reduces the
unneeded spamming of various headers into lots of files.

For now, this leaves us with very few modules including vnet.h
and thus needing to depend on opt_route.h.

Reviewed by: brooks, gnn, des, zec, imp
Sponsored by: The FreeBSD Foundation


# 183550 02-Oct-2008 zec

Step 1.5 of importing the network stack virtualization infrastructure
from the vimage project, as per plan established at devsummit 08/08:
http://wiki.freebsd.org/Image/Notes200808DevSummit

Introduce INIT_VNET_*() initializer macros, VNET_FOREACH() iterator
macros, and CURVNET_SET() context setting macros, all currently
resolving to NOPs.

Prepare for virtualization of selected SYSCTL objects by introducing a
family of SYSCTL_V_*() macros, currently resolving to their global
counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT().

Move selected #defines from sys/sys/vimage.h to newly introduced header
files specific to virtualized subsystems (sys/net/vnet.h,
sys/netinet/vinet.h etc.).

All the changes are verified to have zero functional impact at this
point in time by doing MD5 comparison between pre- and post-change
object files(*).

(*) netipsec/keysock.c did not validate depending on compile time options.

Implemented by: julian, bz, brooks, zec
Reviewed by: julian, bz, brooks, kris, rwatson, ...
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation


# 181803 17-Aug-2008 bz

Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).

This is the first in a series of commits over the course
of the next few weeks.

Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.

We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.

Obtained from: //depot/projects/vimage-commit2/...
Reviewed by: brooks, des, ed, mav, julian,
jamie, kris, rwatson, zec, ...
(various people I forgot, different versions)
md5 (with a bit of help)
Sponsored by: NLnet Foundation, The FreeBSD Foundation
X-MFC after: never
V_Commit_Message_Reviewed_By: more people than the patch


# 180631 20-Jul-2008 trhodes

Document a few sysctls.

Reviewed by: rwatson


# 179487 02-Jun-2008 rwatson

When allocating temporary storage to hold a TCP/IP packet header
template, use an M_TEMP malloc(9) allocation rather than an mbuf
with mtod(9) and dtom(9). This eliminates the last use of
dtom(9) in TCP.

MFC after: 3 weeks


# 178285 17-Apr-2008 rwatson

Convert pcbinfo and inpcb mutexes to rwlocks, and modify macros to
explicitly select write locking for all use of the inpcb mutex.
Update some pcbinfo lock assertions to assert locked rather than
write-locked, although in practice almost all uses of the pcbinfo
rwlock remain exclusive, and all instances of inpcb lock acquisition
are exclusive.

This change should introduce (ideally) little functional change.
However, it lays the groundwork for significantly increased
parallelism in the TCP/IP code.

MFC after: 3 months
Tested by: kris (superset of committed patch)


# 172467 07-Oct-2007 silby

Add FBSDID to all files in netinet so that people can more
easily include file version information in bug reports.

Approved by: re (kensmith)


# 172312 24-Sep-2007 kib

Revert rev. 1.94. After recent tcp backouts, tcp_close() may return NULL.
Check the return value of tcp_close() being NULL before dereferencing it
in #ifdef TCPDEBUG block.

Reviewed by: rwatson
Approved by: re (gnn)


# 172309 24-Sep-2007 silby

Two changes:

- Reintegrate the ANSI C function declaration change
from tcp_timer.c rev 1.92

- Reorganize the tcpcb structure so that it has a single
pointer to the "tcp_timer" structure which contains all
of the tcp timer callouts. This change means that when
the single tcp timer change is reintegrated, tcpcb will
not change in size, and therefore the ABI between
netstat and the kernel will not change.

Neither of these changes should have any functional
impact.

Reviewed by: bmah, rrs
Approved by: re (bmah)


# 172074 07-Sep-2007 rwatson

Back out tcp_timer.c:1.93 and associated changes that reimplemented the many
TCP timers as a single timer, but retain the API changes necessary to
reintroduce this change. This will back out the source of at least two
reported problems: lock leaks in certain timer edge cases, and TCP timers
continuing to fire after a connection has closed (a bug previously fixed and
then reintroduced with the timer rewrite).

In a follow-up commit, some minor restylings and comment changes performed
after the TCP timer rewrite will be reapplied, and a further change to allow
the TCP timer rewrite to be added back without disturbing the ABI. The new
design is believed to be a good thing, but the outstanding issues are
leading to significant stability/correctness problems that are holding
up 7.0.

This patch was generated by silby, but is being committed by proxy due to
poor network connectivity for silby this week.

Approved by: re (kensmith)
Submitted by: silby
Tested by: rwatson, kris
Problems reported by: peter, kris, others


# 170464 09-Jun-2007 andre

Handle a race condition on >2 core machines in tcp_timer() when
a timer issues a shutdown and a simultaneous close on the socket
happens. This race condition is inherent in the current socket/
inpcb life cycle system but can be handled well.

Reported by: kris
Tested by: kris (on 8-core machine)


# 170024 27-May-2007 rwatson

In tcp_timer_2msl(), tp can never become NULL, so don't check it for
NULL before entering tcp_trace().

Found with: Coverity Prevent(tm)
CID: 1840


# 169608 16-May-2007 andre

Move TIME_WAIT related functions and timer handling from files
other than repo copied tcp_subr.c into tcp_timewait.c#1.284:

tcp_input.c#1.350 tcp_timewait() -> tcp_twcheck()

tcp_timer.c#1.92 tcp_timer_2msl_reset() -> tcp_tw_2msl_reset()
tcp_timer.c#1.92 tcp_timer_2msl_stop() -> tcp_tw_2msl_stop()
tcp_timer.c#1.92 tcp_timer_2msl_tw() -> tcp_tw_2msl_scan()

This is a mechanical move with appropriate renames and making
them static if used only locally.

The tcp_tw_2msl_scan() cleanup function is still run from the
tcp_slowtimo() in tcp_timer.c.


# 169454 10-May-2007 rwatson

Move universally to ANSI C function declarations, with relatively
consistent style(9)-ish layout.


# 169309 06-May-2007 andre

Fix two comments.


# 168615 11-Apr-2007 andre

Change the TCP timer system from using the callout system five times
directly to a merged model where only one callout, the next to fire,
is registered.

Instead of callout_reset(9) and callout_stop(9) the new function
tcp_timer_activate() is used which then internally manages the callout.

The single new callout is a mutex callout on inpcb simplifying the
locking a bit.

tcp_timer() is the called function which handles all race conditions
in one place and then dispatches the individual timer functions.

Reviewed by: rwatson (earlier version)
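
The "only the next to fire is registered" model can be illustrated with a
small selection routine. This is a sketch under assumptions, not the code
from this revision: struct timers and next_to_fire() are hypothetical, the
enum names are used here purely as labels, and deadlines are assumed to be
absolute tick counts that may wrap.

```c
/* Labels for the five per-connection timers (illustrative). */
enum { TT_DELACK, TT_REXMT, TT_PERSIST, TT_KEEP, TT_2MSL, TT_COUNT };

/* Hypothetical per-connection timer state. */
struct timers {
	int active[TT_COUNT];		/* non-zero if the timer is armed */
	unsigned deadline[TT_COUNT];	/* absolute expiry, in ticks */
};

/*
 * Return the index of the armed timer that expires soonest, or -1 if
 * none is armed.  Deadlines are compared as signed distances from
 * 'now' so that tick counter wraparound does not change the ordering.
 */
static int
next_to_fire(const struct timers *t, unsigned now)
{
	int best = -1;

	for (int i = 0; i < TT_COUNT; i++) {
		if (!t->active[i])
			continue;
		if (best < 0 ||
		    (int)(t->deadline[i] - now) <
		    (int)(t->deadline[best] - now))
			best = i;
	}
	return (best);
}
```

A single callout would then be scheduled for the deadline of the returned
timer and re-evaluated whenever a timer is armed, stopped, or fires.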


# 168364 04-Apr-2007 andre

Retire unused TCP_SACK_DEBUG.


# 167785 21-Mar-2007 andre

ANSIfy function declarations and remove register keywords for variables.
Consistently apply style to all function declarations.


# 167721 19-Mar-2007 andre

Match up SYSCTL declaration style.


# 167036 26-Feb-2007 mohans

Reap FIN_WAIT_2 connections marked SOCANTRCVMORE faster. This mitigates
potential issues where the peer does not close, potentially leaving
thousands of connections in FIN_WAIT_2. This is controlled by a new sysctl
fast_finwait2_recycle, which is disabled by default.

Reviewed by: gnn, silby.


# 162111 07-Sep-2006 ru

Back when we had T/TCP support, we used to apply different
timeouts for TCP and T/TCP connections in the TIME_WAIT
state, and we had two separate timed wait queues for them.
Now that it has gone, the timeout is always 2*MSL again,
and there is no reason to keep two queues (the first was
unused anyway!).

Also, reimplement the remaining queue using a TAILQ (it
was technically impossible before, with two queues).


# 162108 07-Sep-2006 ru

Remove a microoptimization for i386 that was a micropessimization for amd64.


# 162064 06-Sep-2006 glebius

o Backout rev. 1.125 of in_pcb.c. It appeared to behave extremely
badly under high load. For example with 40k sockets and 25k tcptw
entries, connect() syscall can run for seconds. Debugging showed
that it iterates the cycle millions of times and purges thousands of
tcptw entries at a time.
Besides practical unusability this change is architecturally
wrong. First, in_pcblookup_local() is used in connect() and bind()
syscalls. No purging of stale entries should be done here. Second,
it is a layering violation.
o Return back the tcptw purging cycle to tcp_timer_2msl_tw(),
that was removed in rev. 1.78 by rwatson. The commit log of this
revision tells nothing about the reason cycle was removed. Now
we need this cycle, since major cleaner of stale tcptw structures
is removed.
o Disable probably necessary, but now unused
tcp_twrecycleable() function.

Reviewed by: ru


# 161226 11-Aug-2006 mohans

Fixes an edge case bug in timewait handling where ticks rolling over causing
the timewait expiry to be exactly 0 corrupts the timewait queues (and that entry).
Reviewed by: silby


# 159199 03-Jun-2006 rwatson

When entering a timer on a tcpcb, don't continue processing if it has been
dropped. This prevents a bug introduced during the socket/pcb refcounting
work from occurring, in which occasionally the retransmit timer may fire
after a connection has been reset, causing the resulting R|A TCP
packet having a source port of 0, as the port reservation has been
released.

While here, fixing up some RUNLOCK->WUNLOCK bugs.

MFC after: 1 month


# 158644 16-May-2006 glebius

- Backout one line from 1.78. The tp can be freed by tcp_drop().
- Style next line.

Coverity ID: 912


# 158304 05-May-2006 rwatson

Only return (tw) from tcp_twclose() if reuse is passed, otherwise
return NULL. In principle this shouldn't change the behavior, but
avoids returning a potentially invalid/inappropriate pointer to
the caller.

Found with: Coverity Prevent (tm)
Submitted by: pjd
MFC after: 3 months


# 157376 01-Apr-2006 rwatson

Update TCP for infrastructural changes to the socket/pcb refcount model,
pru_abort(), pru_detach(), and in_pcbdetach():

- Universally support and enforce the invariant that so_pcb is
never NULL, converting dozens of unnecessary NULL checks into
assertions, and eliminating dozens of unnecessary error handling
cases in protocol code.

- In some cases, eliminate unnecessary pcbinfo locking, as it is no
longer required to ensure so_pcb != NULL. For example, the receive
code no longer requires the pcbinfo lock, and the send code only
requires it if building a new connection on an otherwise unconnected
socket triggered via sendto() with an address. This should
significantly reduce tcbinfo lock contention in the receive and send
cases.

- In order to support the invariant that so_pcb != NULL, it is now
necessary for the TCP code to not discard the tcpcb any time a
connection is dropped, but instead leave the tcpcb until the socket
is shutdown. This case is handled by setting INP_DROPPED, to
substitute for using a NULL so_pcb to indicate that the connection
has been dropped. This requires the inpcb lock, but not the pcbinfo
lock.

- Unlike all other protocols in the tree, TCP may need to retain access
to the socket after the file descriptor has been closed. Set
SS_PROTOREF in tcp_detach() in order to prevent the socket from being
freed, and add a flag, INP_SOCKREF, so that the TCP code knows whether
or not it needs to free the socket when the connection finally does
close. The typical case where this occurs is if close() is called on
a TCP socket before all sent data in the send socket buffer has been
transmitted or acknowledged. If INP_SOCKREF is found when the
connection is dropped, we release the inpcb, tcpcb, and socket instead
of flagging INP_DROPPED.

- Abort and detach protocol switch methods no longer return failures,
nor attempt to free sockets, as the socket layer does this.

- Annotate the existence of a long-standing race in the TCP timer code,
in which timers are stopped but not drained when the socket is freed,
as waiting for drain may lead to deadlocks, or have to occur in a
context where waiting is not permitted. This race has been handled
by testing to see if the tcpcb pointer in the inpcb is NULL (and vice
versa), which is not normally permitted, but may be true if an inpcb
and tcpcb have been freed. Add a counter to test how often this race
has actually occurred, and a large comment for each instance where
we compare potentially freed memory with NULL. This will have to be
fixed in the near future, but requires us to further address how to
handle the timer shutdown issue.

- Several TCP calls no longer potentially free the passed inpcb/tcpcb,
so no longer need to return a pointer to indicate whether the argument
passed in is still valid.

- Un-macroize debugging and locking setup for various protocol switch
methods for TCP, as it led to more obscurity, and as locking becomes
more customized to the methods, offers less benefit.

- Assert copyright on tcp_usrreq.c due to significant modifications that
have been made as part of this work.

These changes significantly modify the memory management and connection
logic of our TCP implementation, and are (as such) High Risk Changes,
and likely to contain serious bugs. Please report problems to the
current@ mailing list ASAP, ideally with simple test cases, and
optionally, packet traces.

MFC after: 3 months


# 157136 25-Mar-2006 rwatson

Explicitly assert socket pointer is non-NULL in tcp_input() so as to
provide better debugging information.

Prefer explicit comparison to NULL for tcpcb pointers rather than
treating them as booleans.

MFC after: 1 month


# 155758 16-Feb-2006 andre

Make sysctl_msec_to_ticks(SYSCTL_HANDLER_ARGS) generally available instead
of being private to tcp_timer.c.

Sponsored by: TCP/IP Optimization Fundraise 2005
MFC after: 3 days


# 148156 19-Jul-2005 rwatson

Remove no-op spl's and most comment references to spls, as TCP locking
is believed to be basically done (modulo any remaining bugs).

MFC after: 3 days


# 146463 20-May-2005 ps

Replace t_force with a t_flag (TF_FORCEDATA).

Submitted by: Raja Mukerji.
Reviewed by: Mohan, Silby, Andre Opperman.


# 139823 06-Jan-2005 imp

/* -> /*- for license, minor formatting changes


# 139220 22-Dec-2004 rwatson

Remove the now unused tcp_canceltimers() function. tcpcb timers are
now stopped as part of tcp_discardcb().

MFC after: 2 weeks


# 139219 22-Dec-2004 rwatson

Remove an annotation of a minor race relating to the update of
multiple MIB entries using sysctl in short order, which might
result in unexpected values for tcp_maxidle being generated by
tcp_slowtimo. In practice, this will not happen, or at least,
doesn't require an explicit comment.

MFC after: 2 weeks


# 138416 05-Dec-2004 rwatson

Assert the tcptw inpcb lock in tcp_timer_2msl_reset(), as fields in
the tcptw undergo non-atomic read-modify-writes.

MFC after: 2 weeks


# 138025 23-Nov-2004 rwatson

tcp_timewait() performs multiple non-atomic reads on the tcptw
structure, so assert the inpcb lock associated with the tcptw.
Also assert the tcbinfo lock, as tcp_timewait() may call
tcp_twclose() or tcp_timer_2msl_reset(), which require it. Since
tcp_timewait() is already called with that lock from tcp_input(),
this doesn't change current locking, merely documents reasons for
it.

In tcp_twstart(), assert the tcbinfo lock, as tcp_timer_2msl_reset()
is called, which requires that lock.

In tcp_twclose(), assert the tcbinfo lock, as tcp_timer_2msl_stop()
is called, which requires that lock.

Document the locking strategy for the time wait queues in tcp_timer.c,
which consists of protecting the time wait queues in the same manner
as the tcbinfo structure (using the tcbinfo lock).

In tcp_timer_2msl_reset(), assert the tcbinfo lock, as the time wait
queues are modified.

In tcp_timer_2msl_stop(), assert the tcbinfo lock, as the time wait
queues may be modified.

In tcp_timer_2msl_tw(), assert the tcbinfo lock, as the time wait
queues may be modified.

MFC after: 2 weeks


# 138024 23-Nov-2004 rwatson

De-spl tcp_slowtimo; tcp_maxidle assignment is subject to possible
but unlikely races that could be corrected by having tcp_keepcnt
and tcp_keepintvl modifications go through handler functions via
sysctl, but probably is not worth doing. Updates to multiple
sysctls within evaluation of a single addition are unlikely.

Annotate that tcp_canceltimers() is currently unused.

De-spl tcp_timer_delack().

De-spl tcp_timer_2msl().

MFC after: 2 weeks


# 137139 02-Nov-2004 andre

Remove RFC1644 T/TCP support from the TCP side of the network stack.

A complete rationale and discussion is given in this message
and the resulting discussion:

http://docs.freebsd.org/cgi/mid.cgi?4177C8AD.6060706

Note that this commit removes only the functional part of T/TCP
from the tcp_* related functions in the kernel. Other features
introduced with RFC1644 are left intact (socket layer changes,
sendmsg(2) on connection oriented protocols) and are meant to
be reused by a simpler and less intrusive reimplementation of the
previous T/TCP functionality.

Discussed on: -arch


# 133874 16-Aug-2004 rwatson

White space cleanup for netinet before branch:

- Trailing tab/space cleanup
- Remove spurious spaces between or before tabs

This change avoids touching files that Andre likely has in his working
set for PFIL hooks changes for IPFW/DUMMYNET.

Approved by: re (scottl)
Submitted by: Xin LI <delphij@frontfree.net>


# 130989 23-Jun-2004 ps

Add support for TCP Selective Acknowledgements. The work for this
originated on RELENG_4 and was ported to -CURRENT.

The scoreboarding code was obtained from OpenBSD, and many
of the remaining changes were inspired by OpenBSD, but not
taken directly from there.

You can enable/disable sack using net.inet.tcp.do_sack. You can
also limit the number of sack holes that all senders can have in
the scoreboard with net.inet.tcp.sackhole_limit.

Reviewed by: gnn
Obtained from: Yahoo! (Mohan Srinivasan, Jayanth Vijayaraghavan)


# 128019 07-Apr-2004 imp

Remove advertising clause from University of California Regent's
license, per letter dated July 22, 1999 and email from Peter Wemm,
Alan Cox and Robert Watson.

Approved by: core, peter, alc, rwatson


# 122922 20-Nov-2003 andre

Introduce tcp_hostcache and remove the tcp specific metrics from
the routing table. Move all usage and references in the tcp stack
from the routing table metrics to the tcp hostcache.

It caches measured parameters of past tcp sessions to provide better
initial start values for following connections from or to the same
source or destination. Depending on the network parameters to/from
the remote host this can lead to significant speedups for new tcp
connections after the first one because they inherit and shortcut
the learning curve.

tcp_hostcache is designed for multiple concurrent access in SMP
environments with high contention and is hash indexed by remote
ip address.

It removes significant locking requirements from the tcp stack with
regard to the routing table.

Reviewed by: sam (mentor), bms
Reviewed by: -net, -current, core@kame.net (IPv6 parts)
Approved by: re (scottl)


# 122326 08-Nov-2003 sam

use local values instead of chasing pointers

Supported by: FreeBSD Foundation


# 117650 15-Jul-2003 hsu

Unify the "send high" and "recover" variables as specified in the
latest rev of the spec. Use an explicit flag for Fast Recovery. [1]

Fix bug with exiting Fast Recovery on a retransmit timeout
diagnosed by Lu Guohan. [2]

Reviewed by: Thomas Henderson <thomas.r.henderson@boeing.com>
Reported and tested by: Lu Guohan <lguohan00@mails.tsinghua.edu.cn> [2]
Approved by: Thomas Henderson <thomas.r.henderson@boeing.com>,
Sally Floyd <floyd@acm.org> [1]


# 115824 04-Jun-2003 hsu

Compensate for decreasing the minimum retransmit timeout.

Reviewed by: jlemon


# 112009 08-Mar-2003 jlemon

Remove a panic(); if the zone allocator can't provide more timewait
structures, reuse the oldest one. Also move the expiry timer from
a per-structure callout to the tcp slow timer.

Sponsored by: DARPA, NAI Labs


# 111145 19-Feb-2003 jlemon

Add a TCP TIMEWAIT state which uses less space than a fullblown TCP
control block. Allow the socket and tcpcb structures to be freed
earlier than inpcb. Update code to understand an inp w/o a socket.

Reviewed by: hsu, silby, jayanth
Sponsored by: DARPA, NAI Labs


# 111144 19-Feb-2003 jlemon

Convert tcp_fillheaders(tp, ...) -> tcpip_fillheaders(inp, ...) so the
routine does not require a tcpcb to operate. Since we no longer keep
template mbufs around, move pseudo checksum out of this routine, and
merge it with the length update.

Sponsored by: DARPA, NAI Labs


# 109175 13-Jan-2003 hsu

Fix NewReno.

Reviewed by: Tom Henderson <thomas.r.henderson@boeing.com>


# 108265 24-Dec-2002 hsu

Validate inp to prevent a use after free.


# 102967 05-Sep-2002 bde

Include <sys/mutex.h> and its prerequisite <sys/lock.h> instead of depending
on namespace pollution 4 layers deep in <netinet/in_pcb.h>.

Removed unused includes. Sorted includes.


# 100420 20-Jul-2002 jdp

Fix overflows in intermediate calculations in sysctl_msec_to_ticks().
At hz values of 1000 and above the overflows caused net.inet.tcp.keepidle
to be reported as negative.

MFC after: 3 days
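
To illustrate the kind of overflow involved: with hz = 1000, a two-hour idle
time is 7,200,000 ticks, and multiplying that by 1000 in 32-bit arithmetic
exceeds INT_MAX before the division by hz brings it back into range. The
helpers below are a hypothetical sketch of one way to avoid this (widening
the intermediate to 64 bits); they are not the kernel's
sysctl_msec_to_ticks() and the names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert ticks to milliseconds.  The product ticks * 1000 is formed
 * in 64 bits so it cannot overflow a 32-bit int.
 */
static int
ticks_to_msec(int ticks, int hz)
{
	assert(hz > 0);
	return ((int)((int64_t)ticks * 1000 / hz));
}

/* Convert milliseconds to ticks, again with a 64-bit intermediate. */
static int
msec_to_ticks(int msec, int hz)
{
	assert(hz > 0);
	return ((int)((int64_t)msec * hz / 1000));
}
```

The final results still fit in an int for realistic timer values; only the
intermediate product needs the extra width.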


# 100335 18-Jul-2002 dillon

Introduce two new sysctl's:

net.inet.tcp.rexmit_min (default 3 ticks equiv)

This sysctl is the retransmit timer RTO minimum,
specified in milliseconds. This value is
designed for algorithmic stability only.

net.inet.tcp.rexmit_slop (default 200ms)

This sysctl is the retransmit timer RTO slop
which is added to every retransmit timeout and
is designed to handle protocol stack overheads
and delayed ack issues.

Note that the *original* code applied a 1-second
RTO minimum but never applied real slop to the RTO
calculation, so any RTO calculation over one second
would have no slop and thus not account for
protocol stack overheads (TCP timestamps are not
a measure of protocol turnaround!). Essentially,
the original code made the RTO calculation almost
completely irrelevant.

Please note that the 200ms slop is debatable.
This commit is not meant to be a line in the sand,
and if the community winds up deciding that increasing
it is the correct solution then it's easy to do.
Note that larger values will destroy performance
on lossy networks while smaller values may result in
a greater number of unnecessary retransmits.
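
As a rough sketch of how the two knobs interact, assuming the minimum is
applied to the computed RTO first and the slop is then added before an upper
cap: rto_msec() and its parameters are hypothetical, all values are in
milliseconds, and the ordering of the steps is an assumption of this example
rather than something stated in the commit.

```c
/*
 * Compute an effective retransmit timeout in milliseconds.
 *   computed    - RTO derived from the smoothed RTT and its variance
 *   rexmit_min  - floor for algorithmic stability
 *   rexmit_slop - fixed allowance for stack overhead and delayed ACKs
 *   rto_max     - upper bound on the result
 */
static int
rto_msec(int computed, int rexmit_min, int rexmit_slop, int rto_max)
{
	int rto = computed;

	if (rto < rexmit_min)
		rto = rexmit_min;	/* apply the floor first */
	rto += rexmit_slop;		/* slop is added to every timeout */
	if (rto > rto_max)
		rto = rto_max;
	return (rto);
}
```

With a 30 ms floor and 200 ms of slop, a computed RTO of 10 ms becomes
230 ms, while a computed RTO of 1500 ms becomes 1700 ms, so timeouts above
one second still carry slop.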


# 98102 10-Jun-2002 hsu

Lock up inpcb.

Submitted by: Jennifer Yang <yangjihui@yahoo.com>


# 97658 31-May-2002 tanimura

Back out my last commit of locking down a socket, it conflicts with hsu's work.

Requested by: hsu


# 96972 20-May-2002 tanimura

Lock down a socket, milestone 1.

o Add a mutex (sb_mtx) to struct sockbuf. This protects the data in a
socket buffer. The mutex in the receive buffer also protects the data
in struct socket.

o Determine the lock strategy for each member in struct socket.

o Lock down the following members:

- so_count
- so_options
- so_linger
- so_state

o Remove *_locked() socket APIs. Make the following socket APIs
touching the members above now require a locked socket:

- sodisconnect()
- soisconnected()
- soisconnecting()
- soisdisconnected()
- soisdisconnecting()
- sofree()
- soref()
- sorele()
- sorwakeup()
- sotryfree()
- sowakeup()
- sowwakeup()

Reviewed by: alfred


# 87499 07-Dec-2001 rwatson

o Our current userland boot code (due to rc.conf and rc.network) always
enables TCP keepalives using the net.inet.tcp.always_keepalive by default.
Synchronize the kernel default with the userland default.


# 82122 21-Aug-2001 silby

Much delayed but now present: RFC 1948 style sequence numbers

In order to ensure security and functionality, RFC 1948 style
initial sequence number generation has been implemented. Barring
any major cryptographic breakthroughs, this algorithm should be
unbreakable. In addition, the problems with TIME_WAIT recycling
which affect our currently used algorithm are not present.

Reviewed by: jesper


# 79413 08-Jul-2001 silby

Temporary feature: Runtime tuneable tcp initial sequence number
generation scheme. Users may now select between the currently used
OpenBSD algorithm and the older random positive increment method.

While the OpenBSD algorithm is more secure, it also breaks TIME_WAIT
handling; this is causing trouble for an increasing number of folks.

To switch between generation schemes, one sets the sysctl
net.inet.tcp.tcp_seq_genscheme. 0 = random positive increments,
1 = the OpenBSD algorithm. 1 is still the default.

Once a secure _and_ compatible algorithm is implemented, this sysctl
will be removed.

Reviewed by: jlemon
Tested by: numerous subscribers of -net


# 78642 23-Jun-2001 silby

Eliminate the allocation of a tcp template structure for each
connection. The information contained in a tcptemp can be
reconstructed from a tcpcb when needed.

Previously, tcp templates required the allocation of one
mbuf per connection. On large systems, this change should
free up a large number of mbufs.

Reviewed by: bmilekic, jlemon, ru
MFC after: 2 weeks


# 77539 31-May-2001 jesper

Disable rfc1323 and rfc1644 TCP extensions if we haven't got
any response to our third SYN to work around some broken
terminal servers (most of which have hopefully been retired)
that have bad VJ header compression code which trashes TCP
segments containing unknown-to-them TCP options.

PR: kern/1689
Submitted by: jesper
Reviewed by: wollman
MFC after: 2 weeks


# 75733 20-Apr-2001 jesper

Say goodbye to TCP_COMPAT_42

Reviewed by: wollman
Requested by: wollman


# 75620 17-Apr-2001 kris

Note that the previous commit also restored some historical behaviour
in the TCP_COMPAT_42 case (e.g. choosing '1' as the initial sequence
number at boot-time, instead of randomizing it). TCP_COMPAT_42 is the
repository for old security holes, too :-)


# 75619 17-Apr-2001 kris

Randomize the TCP initial sequence numbers more thoroughly.

Obtained from: OpenBSD
Reviewed by: jesper, peter, -developers


# 73110 26-Feb-2001 jlemon

Use more aggressive retransmit timeouts for the initial SYN packet.
As we currently drop the connection after 4 retransmits + 2 ICMP errors,
this allows initial connection attempts to be dropped much faster.


# 66552 02-Oct-2000 jlemon

If TCPDEBUG is defined, we could dereference a tp which was freed.


# 65906 15-Sep-2000 jlemon

It is possible for a TCP callout to be removed from the timing wheel,
but have a network interrupt arrive and deactivate the timeout before
the callout routine runs. Check for this case in the callout routine;
it should only run if the callout is active and not on the wheel.
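The check described above can be sketched as follows. This is a minimal illustration, not the kernel's code: the function name and the two boolean flags are hypothetical stand-ins for the callout's "active" and "pending (still on the wheel)" state.

```c
#include <stdbool.h>

/*
 * Hypothetical sketch: decide whether a timer handler should do its work.
 * "active"  - the timer was armed and has not been deactivated since.
 * "pending" - the timer is (again) queued on the timing wheel.
 * The handler should proceed only if the timer is active and not pending.
 */
static bool
timer_should_run(bool active, bool pending)
{
	if (pending)
		return (false);	/* rescheduled; a later invocation will handle it */
	if (!active)
		return (false);	/* deactivated before the handler got to run */
	return (true);
}
```

A handler would call this first and return immediately when it yields false.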


# 62573 04-Jul-2000 phk

Previous commit changing SYSCTL_HANDLER_ARGS violated KNF.

Pointed out by: bde


# 62454 03-Jul-2000 phk

Style police catches up with rev 1.26 of src/sys/sys/sysctl.h:

Sanitize SYSCTL_HANDLER_ARGS so that simplistic tools can grok our
sources:

-sysctl_vm_zone SYSCTL_HANDLER_ARGS
+sysctl_vm_zone (SYSCTL_HANDLER_ARGS)


# 60067 06-May-2000 jlemon

Implement TCP NewReno, as documented in RFC 2582. This allows
better recovery for multiple packet losses in a single window.
The algorithm can be toggled via the sysctl net.inet.tcp.newreno,
which defaults to "on".

Submitted by: Jayanth Vijayaraghavan <jayanth@yahoo-inc.com>


# 55679 09-Jan-2000 shin

tcp updates to support IPv6.
also a small patch to sys/nfs/nfs_socket.c, as max_hdr size change.

Reviewed by: freebsd-arch, cvs-committers
Obtained from: KAME project


# 50705 31-Aug-1999 jlemon

Simplify, and return an error if the user attempts to set a TCP
time value which results in < 1 tick.

Suggested by: bde
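
A minimal sketch of the millisecond-to-tick conversion with the "< 1 tick" rejection described above. The function name, its signature, and the -1 error return are assumptions made for illustration; the actual sysctl handler is not reproduced here.

```c
#include <limits.h>

/*
 * Hypothetical helper: convert a user-supplied time in milliseconds to
 * clock ticks at a tick rate of hz ticks per second.
 * Returns the tick count, or -1 if the input is invalid or the result
 * would be less than one tick (or would not fit in an int).
 */
static int
ms_to_ticks(int ms, int hz)
{
	long long t;

	if (ms < 0 || hz <= 0)
		return (-1);
	/* Truncating division: fractional ticks are discarded. */
	t = (long long)ms * hz / 1000;
	if (t < 1 || t > INT_MAX)
		return (-1);
	return ((int)t);
}
```

With a tick rate that does not divide 1000 evenly (1024 is used here purely as an example), the truncation is visible: 15 ms maps to 15 ticks even though the exact value is 15.36.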


# 50682 31-Aug-1999 jlemon

Add a SYSCTL_PROC so that TCP timer values are now expressed to
the user in ms, while they are stored internally as ticks. Note
that there probably are rounding bogons here, especially on the
alpha.


# 50673 30-Aug-1999 jlemon

Restructure TCP timeout handling:

- eliminate the fast/slow timeout lists for TCP and instead use a
callout entry for each timer.
- increase the TCP timer granularity to HZ
- implement "bad retransmit" recovery, as presented in
"On Estimating End-to-End Network Path Properties", by Allman and Paxson.

Submitted by: jlemon, wollmann


# 50477 27-Aug-1999 peter

$Id$ -> $FreeBSD$


# 46381 03-May-1999 billf

Add sysctl descriptions to many SYSCTL_XXXs

PR: kern/11197
Submitted by: Adrian Chadd <adrian@FreeBSD.org>
Reviewed by: billf(spelling/style/minor nits)
Looked at by: bde(style)


# 35419 24-Apr-1998 dg

Ensure that TCP_REXMTVAL doesn't return a value less than t_rttmin. This
is believed to have been broken with the Brakmo/Peterson srtt
calculation changes. The result of this bug is that TCP connections
could time out extremely quickly (in 12 seconds).
Also backed out jdp's partial fix for this problem in rev 1.17 of
tcp_timer.c as it is obsoleted by this commit.
Bug was pointed out by Kevin Lehey <kml@roller.nas.nasa.gov>.

PR: 6068
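
The clamp described above can be illustrated with a simplified sketch. It assumes plain, unscaled tick values and the conventional srtt + 4 * rttvar estimate; the real TCP_REXMTVAL macro works on fixed-point fields of the tcpcb, which this hypothetical function does not model.

```c
/*
 * Hypothetical, simplified retransmit value: the smoothed RTT plus four
 * times the RTT variance, never allowed to fall below rttmin.
 * All arguments are in ticks and are assumed to be non-negative.
 */
static int
rexmt_val(int srtt, int rttvar, int rttmin)
{
	int v = srtt + 4 * rttvar;

	return (v < rttmin ? rttmin : v);
}
```

On a fast LAN both srtt and rttvar can round down to 0; without the lower bound the computed value would then be 0 as well.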


# 35056 06-Apr-1998 phk

Remove the last traces of TUBA.

Inspired by: PR kern/3317


# 33846 26-Feb-1998 dg

Changes to support the addition of a new sysctl variable:
net.inet.tcp.delack_enabled
Which defaults to 1 and can be set to 0 to disable TCP delayed-ack
processing (i.e. all acks are immediate).


# 32752 25-Jan-1998 eivind

Make TCP_COMPAT_42 a new style option.


# 29514 16-Sep-1997 joerg

Make TCPDEBUG a new-style option.


# 27845 02-Aug-1997 bde

Removed unused #includes.


# 22975 22-Feb-1997 peter

Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.


# 21673 14-Jan-1997 jkh

Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.


# 18280 13-Sep-1996 pst

Make the misnamed tcp initial keepalive timer value (which is really the
time, in seconds, that state for non-established TCP sessions stays about)
a sysctl modifiable variable.

[part 1 of two commits, I just realized I can't play with the indices as
I was typing this commit message.]


# 17138 12-Jul-1996 dg

Fixed two bugs in previous commit: be sure to include tcp_debug.h when
TCPDEBUG is defined, and fix typo in TCPDEBUG2() macro.


# 17096 11-Jul-1996 wollman

Modify the kernel to use the new pr_usrreqs interface rather than the old
pr_usrreq mechanism which was poorly designed and error-prone. This
commit renames pr_usrreq to pr_ousrreq so that old code which depended on it
would break in an obvious manner. This commit also implements the new
interface for TCP, although the old function is left as an example
(#ifdef'ed out). This commit ALSO fixes a longstanding bug in the
TCP timer processing (introduced by davidg on 1995/04/12) which caused
timer processing on a TCB to always stop after a single timer had
expired (because it misinterpreted the return value from tcp_usrreq()
to indicate that the TCB had been deleted). Finally, some code
related to polling has been deleted from if.c because it is not
relevant to -current and doesn't look at all like my current code.


# 16099 03-Jun-1996 jdp

Fix a bug in the handling of the "persist" state which, under certain
circumstances, caused perfectly good connections to be dropped. This
happened for connections over a LAN, where the retransmit timer
calculation TCP_REXMTVAL(tp) returned 0. If sending was blocked by flow
control for long enough, the old code dropped the connection, even
though timely replies were being received for all window probes.

Reviewed by: W. Richard Stevens <rstevens@noao.edu>


# 15262 15-Apr-1996 dg

Two fixes from Rich Stevens:

1) Set the persist timer to help time-out connections in the CLOSING state.
2) Honor the keep-alive timer in the CLOSING state.

This fixes problems with connections getting "stuck" due to incompletion
of the final connection shutdown which can be a BIG problem on busy WWW
servers.


# 15039 04-Apr-1996 phk

Add a sysctl (net.inet.tcp.always_keepalive: 0) that when set will force
keepalive on all tcp sessions. Setsockopt(2) cannot override this setting.
Maybe another one is needed that just changes the default for SO_KEEPALIVE ?
Requested by: Joe Greco <jgreco@brasil.moneng.mei.com>


# 14546 11-Mar-1996 dg

Move or add #include <queue.h> in preparation for upcoming struct socket
changes.


# 13229 04-Jan-1996 olah

Reverse the modification which caused the annoying m_copydata crash: set
the TF_ACKNOW flag when the REXMT timer goes off to force a
retransmission. In certain situations pulling snd_nxt back to snd_una
is not sufficient.


# 12296 14-Nov-1995 phk

New style sysctl & staticize a lot of stuff.


# 12172 09-Nov-1995 phk

Start adding new style sysctl here too.


# 12046 03-Nov-1995 olah

Setting the TF_ACKNOW flag was redundant in the REXMT timeout because
tcp_output() checks for the condition snd_nxt == snd_una.

Reviewed by: davidg, wollman, olah
Suggested by: Richard Stevens


# 11150 03-Oct-1995 wollman

Finish 4.4-Lite-2 merge: randomize TCP initial sequence numbers
to make ISS-guessing spoofing attacks harder.


# 9773 29-Jul-1995 dg

Add connection drop capability for persist timeouts.

Reviewed by: Andras Olah
Obtained from: 4.4BSD-lite2 via W. Richard Stevens


# 8876 30-May-1995 rgrimes

Remove trailing whitespace.


# 7770 12-Apr-1995 dg

Fixed bug I introduced when changing PCB list to use 4.4BSD style queue
macros. Basically, detect 'tp' going away differently.


# 7684 08-Apr-1995 dg

Implemented PCB hashing. Includes new functions in_pcbinshash, in_pcbrehash,
and in_pcblookuphash.


# 6475 15-Feb-1995 wollman

Transaction TCP support now standard. Hack away!


# 6283 09-Feb-1995 wollman

Merge Transaction TCP, courtesy of Andras Olah <olah@cs.utwente.nl> and
Bob Braden <braden@isi.edu>.

NB: This has not had David's TCP ACK hack re-integrated. It is not clear
what the correct solution to this problem is, if any. If a better solution
doesn't pop up in response to this message, I'll put David's code back in
(or he's welcome to do so himself).


# 1817 02-Aug-1994 dg

Added $Id$


# 1542 24-May-1994 rgrimes

This commit was generated by cvs2svn to compensate for changes in r1541,
which included commits to RCS files with non-trunk default branches.


# 1541 24-May-1994 rgrimes

BSD 4.4 Lite Kernel Sources