History log of /freebsd-10-stable/sys/netinet/cc/
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
293711 11-Jan-2016 hiren

MFC: r292011
MFC: r292012

Add an option to use rfc6675 based pipe/inflight bytes calculation in cubic and
newreno.

273847 30-Oct-2014 hselasky

MFC r273733, r273740 and r273773:

The SYSCTL data pointers can come from userspace and must not be
directly accessed. Although this will work on some platforms, it can
throw an exception if the pointer is invalid and then panic the kernel.

Add a missing SYSCTL_IN() of "SCTP_BASE_STATS" structure.

Sponsored by: Mellanox Technologies

273736 27-Oct-2014 hselasky

MFC r263710, r273377, r273378, r273423 and r273455:

- De-vnet hash sizes and hash masks.
- Fix multiple issues related to arguments passed to SYSCTL macros.

Sponsored by: Mellanox Technologies


/freebsd-10-stable/sys/amd64/amd64/fpu.c
/freebsd-10-stable/sys/arm/arm/busdma_machdep-v6.c
/freebsd-10-stable/sys/arm/arm/busdma_machdep.c
/freebsd-10-stable/sys/cam/scsi/scsi_sa.c
/freebsd-10-stable/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c
/freebsd-10-stable/sys/cddl/dev/dtrace/dtrace_sysctl.c
/freebsd-10-stable/sys/compat/ndis/kern_ndis.c
/freebsd-10-stable/sys/dev/acpi_support/acpi_asus.c
/freebsd-10-stable/sys/dev/acpi_support/acpi_asus_wmi.c
/freebsd-10-stable/sys/dev/acpi_support/acpi_hp.c
/freebsd-10-stable/sys/dev/acpi_support/acpi_ibm.c
/freebsd-10-stable/sys/dev/acpi_support/acpi_rapidstart.c
/freebsd-10-stable/sys/dev/acpi_support/acpi_sony.c
/freebsd-10-stable/sys/dev/bxe/bxe.c
/freebsd-10-stable/sys/dev/cxgb/cxgb_sge.c
/freebsd-10-stable/sys/dev/cxgbe/t4_main.c
/freebsd-10-stable/sys/dev/e1000/if_em.c
/freebsd-10-stable/sys/dev/e1000/if_igb.c
/freebsd-10-stable/sys/dev/e1000/if_lem.c
/freebsd-10-stable/sys/dev/hatm/if_hatm.c
/freebsd-10-stable/sys/dev/ixgbe/ixgbe.c
/freebsd-10-stable/sys/dev/ixgbe/ixv.c
/freebsd-10-stable/sys/dev/ixl/if_ixl.c
/freebsd-10-stable/sys/dev/mpr/mpr.c
/freebsd-10-stable/sys/dev/mps/mps.c
/freebsd-10-stable/sys/dev/mrsas/mrsas.c
/freebsd-10-stable/sys/dev/mrsas/mrsas.h
/freebsd-10-stable/sys/dev/mxge/if_mxge.c
/freebsd-10-stable/sys/dev/oce/oce_sysctl.c
/freebsd-10-stable/sys/dev/qlxgb/qla_os.c
/freebsd-10-stable/sys/dev/qlxgbe/ql_os.c
/freebsd-10-stable/sys/dev/rt/if_rt.c
/freebsd-10-stable/sys/dev/sound/pci/hda/hdaa.c
/freebsd-10-stable/sys/dev/vxge/vxge.c
/freebsd-10-stable/sys/dev/xen/netfront/netfront.c
/freebsd-10-stable/sys/fs/devfs/devfs_devs.c
/freebsd-10-stable/sys/fs/fuse/fuse_main.c
/freebsd-10-stable/sys/fs/fuse/fuse_vfsops.c
/freebsd-10-stable/sys/fs/nfsserver/nfs_nfsdkrpc.c
/freebsd-10-stable/sys/geom/geom_kern.c
/freebsd-10-stable/sys/kern/kern_cpuset.c
/freebsd-10-stable/sys/kern/kern_descrip.c
/freebsd-10-stable/sys/kern/kern_mib.c
/freebsd-10-stable/sys/kern/kern_synch.c
/freebsd-10-stable/sys/kern/subr_devstat.c
/freebsd-10-stable/sys/kern/subr_kdb.c
/freebsd-10-stable/sys/kern/subr_uio.c
/freebsd-10-stable/sys/kern/vfs_cache.c
/freebsd-10-stable/sys/mips/mips/busdma_machdep.c
/freebsd-10-stable/sys/net/if_lagg.c
/freebsd-10-stable/sys/net/pfvar.h
/freebsd-10-stable/sys/net80211/ieee80211_ht.c
/freebsd-10-stable/sys/net80211/ieee80211_hwmp.c
/freebsd-10-stable/sys/net80211/ieee80211_mesh.c
/freebsd-10-stable/sys/net80211/ieee80211_superg.c
/freebsd-10-stable/sys/netgraph/bluetooth/common/ng_bluetooth.c
/freebsd-10-stable/sys/netgraph/ng_base.c
/freebsd-10-stable/sys/netgraph/ng_socket.c
cc_chd.c
/freebsd-10-stable/sys/netinet/tcp_reass.c
/freebsd-10-stable/sys/netipsec/ipsec.h
/freebsd-10-stable/sys/netipx/ipx_proto.c
/freebsd-10-stable/sys/netpfil/pf/if_pfsync.c
/freebsd-10-stable/sys/netpfil/pf/pf.c
/freebsd-10-stable/sys/netpfil/pf/pf_ioctl.c
/freebsd-10-stable/sys/ofed/drivers/net/mlx4/mlx4_en.h
/freebsd-10-stable/sys/powerpc/powermac/fcu.c
/freebsd-10-stable/sys/powerpc/powermac/smu.c
/freebsd-10-stable/sys/powerpc/powerpc/busdma_machdep.c
/freebsd-10-stable/sys/powerpc/powerpc/cpu.c
/freebsd-10-stable/sys/sys/sysctl.h
/freebsd-10-stable/sys/vm/memguard.c
/freebsd-10-stable/sys/vm/vm_kern.c
/freebsd-10-stable/sys/x86/x86/busdma_bounce.c
271690 16-Sep-2014 lstewart

MFC r270160:

Destroy the "qdiffsample_zone" UMA zone on unload to avoid a use-after-unload
panic easily triggered by running "sysctl -a" after unload.

Reported and tested by: Grenville Armitage <garmitage@swin.edu.au>
Approved by: re(gjb)

270711 27-Aug-2014 hselasky

MFC r269777:
Fix string length argument passed to "sysctl_handle_string()" so that
the complete string is returned by the function and not just only one
byte.

PR: 192544

262735 04-Mar-2014 glebius

Merge r261590: Fixup for r261590 (vnet sysctl handlers cleanup)

256281 10-Oct-2013 gjb

Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


252504 02-Jul-2013 lstewart

Import an implementation of the CAIA Delay-Gradient (CDG) congestion control
algorithm, which is based on the 2011 v0.1 patch release and described in the
paper "Revisiting TCP Congestion Control using Delay Gradients" by David Hayes
and Grenville Armitage. It is implemented as a kernel module compatible with the
modular congestion control framework.

CDG is a hybrid congestion control algorithm which reacts to both packet loss
and inferred queuing delay. It attempts to operate as a delay-based algorithm
where possible, but utilises heuristics to detect loss-based TCP cross traffic
and will compete effectively as required. CDG is therefore incrementally
deployable and suitable for use on shared networks.

In collaboration with: David Hayes <david.hayes at ieee.org> and
Grenville Armitage <garmitage at swin edu au>
MFC after: 4 days
Sponsored by: Cisco University Research Program and FreeBSD Foundation


220592 13-Apr-2011 pluknet

Staticize malloc types.

Approved by: lstewart
MFC after: 1 week


220560 12-Apr-2011 lstewart

Use the full and proper company name for Swinburne University of Technology
throughout the source tree.

Requested by: Grenville Armitage, Director of CAIA at Swinburne University of
Technology
MFC after: 3 days


218167 01-Feb-2011 lstewart

Algorithm modules can define their own private congestion signal types in the
top 8 bits of the 32 bit signal bit field space for internal use. These private
signals should not be leaked outside of a module.

Given that many algorithm modules use the NewReno hook functions to simplify
their implementation, the obvious place such a leak would show up is in the
NewReno cong_signal hook function.

- Show the full number of significant bits in the signal type definitions in
<netinet/cc.h>.

- Add a bitmask to simplify figuring out if a given signal is in the private or
public bit range.

- Add a sanity check in newreno_cong_signal() to ensure private signals are not
being leaked into the hook function.

Sponsored by: FreeBSD Foundation
Discussed with: David Hayes <dahayes at swin edu au>
MFC after: 1 week
X-MFC with: r215166


218156 01-Feb-2011 lstewart

Fix typo in comment: "course" -> "coarse"

Sponsored by: FreeBSD Foundation
Submitted by: jmallett
MFC after: 3 months
X-MFC with: r218152


218155 01-Feb-2011 lstewart

Import an implementation of the CAIA-Hamilton-Delay (CHD) congestion control
algorithm described in the paper "Improved coexistence and loss tolerance for
delay based TCP congestion control" by Hayes and Armitage. It is implemented as
a kernel module compatible with the recently committed modular congestion
control framework.

CHD enhances the approach taken by the Hamilton-Delay (HD) algorithm to provide
tolerance to non-congestion related packet loss and improvements to coexistence
with loss-based congestion control algorithms. A key idea in improving
coexistence with loss-based congestion control algorithms is the use of a shadow
window, which attempts to track how NewReno's congestion window (cwnd) would
evolve. At the next packet loss congestion event, CHD uses the shadow window to
correct cwnd in a way that reduces the amount of unfairness CHD experiences when
competing with loss-based algorithms.

In collaboration with: David Hayes <dahayes at swin edu au> and
Grenville Armitage <garmitage at swin edu au>
Sponsored by: FreeBSD Foundation
Reviewed by: bz and others along the way
MFC after: 3 months


218153 01-Feb-2011 lstewart

Import a clean-room implementation of the Hamilton-Delay (HD) congestion control
algorithm based on the paper "A strategy for fair coexistence of loss and
delay-based congestion control algorithms" by Budzisz, Stanojevic, Shorten and
Baker. It is implemented as a kernel module compatible with the recently
committed modular congestion control framework.

HD uses a probabilistic approach to reacting to delay-based congestion. The
probability of reducing cwnd is zero when the queuing delay is very small,
increasing to a maximum at a set threshold, then back down to zero again when
the queuing delay is high. Normal operation keeps the queuing delay below the
set threshold. However, since loss-based congestion control algorithms push the
queuing delay high when probing for bandwidth, having the probability of
reducing cwnd drop back to zero for high delays allows HD to compete with
loss-based algorithms.

In collaboration with: David Hayes <dahayes at swin edu au> and
Grenville Armitage <garmitage at swin edu au>
Sponsored by: FreeBSD Foundation
Reviewed by: bz and others along the way
MFC after: 3 months


218152 01-Feb-2011 lstewart

Import a clean-room implementation of the VEGAS congestion control algorithm
based on the paper "TCP Vegas: end to end congestion avoidance on a global
internet" by Brakmo and Peterson. It is implemented as a kernel module
compatible with the recently committed modular congestion control framework.

VEGAS uses network delay as a congestion indicator and unlike regular loss-based
algorithms, attempts to keep the network operating with stable queuing delays
and no congestion losses. By keeping network buffers used along the path within
a set range, queuing delays are kept low while maintaining high throughput.

In collaboration with: David Hayes <dahayes at swin edu au> and
Grenville Armitage <garmitage at swin edu au>
Sponsored by: FreeBSD Foundation
Reviewed by: bz and others along the way
MFC after: 3 months


217748 23-Jan-2011 lstewart

An sbuf configured with SBUF_AUTOEXTEND will call malloc with M_WAITOK when a
write to the buffer causes it to overflow. We therefore can't hold the CC list
rwlock over a call to sbuf_printf() for an sbuf configured with SBUF_AUTOEXTEND.

Switch to a fixed length sbuf which should be of sufficient size except in the
very unlikely event that the sysctl is being processed as one or more new
algorithms are loaded. If that happens, we accept the race and may fail the
sysctl gracefully if there is insufficient room to print the names of all the
algorithms.

This should address a WITNESS warning and the potential panic that would occur
if the sbuf call to malloc did sleep whilst holding the CC list rwlock.

Sponsored by: FreeBSD Foundation
Reported by: Nick Hibma
Reviewed by: bz
MFC after: 3 weeks
X-MFC with: r215166


217683 21-Jan-2011 lstewart

Some correctness and robustness fixes related to CUBIC's mean RTT estimate:

- The mean RTT is updated at the end of each congestion epoch, but if we switch
to congestion avoidance within the first epoch (e.g. if ssthresh was primed
from the hostcache), we'll trigger a divide by zero panic in
cubic_ack_received(). Set the mean to the min in cubic_record_rtt() if the
mean is less than the min to ensure we have a sane mean for use in this
situation. This fixes the panic reported by Nick Hibma.

- Adjust conditions under which we update the mean RTT in cubic_post_recovery()
to ensure a low latency path won't yield an RTT of less than 1. This avoids
another potential divide by zero panic when running CUBIC in networks with
sub-millisecond latencies.

- Remove the "safety" assignment of min into mean when we don't update the mean
because of failed conditions. The above change to the conditions for updating
the mean ensures the safety issue is addressed and I feel it is better to keep
our previous mean estimate around if we can't update than to revert to the
min.

- Initialise the mean RTT to 1 on connection startup to act as a safety belt if
a situation we haven't considered and addressed with the above changes were to
crop up in the wild.

Sponsored by: FreeBSD Foundation
Reported and tested by: Nick Hibma
Discussed with: David Hayes <dahayes at swin edu au>
MFC after: 5 weeks
X-MFC with: r216114


217322 12-Jan-2011 mdf

sysctl(9) cleanup checkpoint: amd64 GENERIC builds cleanly.

Commit the net* piece.


216115 02-Dec-2010 lstewart

Import a clean-room implementation of the experimental H-TCP congestion control
algorithm based on the Internet-Draft "draft-leith-tcp-htcp-06.txt". It is
implemented as a kernel module compatible with the recently committed modular
congestion control framework.

H-TCP was designed to provide increased throughput in fast and long-distance
networks. It attempts to maintain fairness when competing with legacy NewReno
TCP in lower speed scenarios where NewReno is able to operate adequately. The
paper "H-TCP: A framework for congestion control in high-speed and long-distance
networks" provides additional detail.

In collaboration with: David Hayes <dahayes at swin edu au> and
Grenville Armitage <garmitage at swin edu au>
Sponsored by: FreeBSD Foundation
Reviewed by: rpaulo (older patch from a few weeks ago)
MFC after: 3 months


216114 02-Dec-2010 lstewart

Import a clean-room implementation of the experimental CUBIC congestion control
algorithm based on the Internet-Draft "draft-rhee-tcpm-cubic-02.txt". It is
implemented as a kernel module compatible with the recently committed modular
congestion control framework.

CUBIC was designed for provide increased throughput in fast and long-distance
networks. It attempts to maintain fairness when competing with legacy NewReno
TCP in lower speed scenarios where NewReno is able to operate adequately. The
paper "CUBIC: A New TCP-Friendly High-Speed TCP Variant" provides additional
detail.

In collaboration with: David Hayes <dahayes at swin edu au> and
Grenville Armitage <garmitage at swin edu au>
Sponsored by: FreeBSD Foundation
Reviewed by: rpaulo (older patch from a few weeks ago)
MFC after: 3 months


216107 02-Dec-2010 lstewart

General cleanup of the NewReno CC module (no functional changes):

- Remove superfluous includes and unhelpful comments.

- Alphabetically order functions.

- Make functions static.

Sponsored by: FreeBSD Foundation
MFC after: 9 weeks
X-MFC with: r215166


216105 02-Dec-2010 lstewart

- Reinstantiate the after_idle hook call in tcp_output(), which got lost
somewhere along the way due to mismerging r211464 in our development tree.

- Capture the essence of r211464 in NewReno's after_idle() hook. We don't
use V_ss_fltsz/V_ss_fltsz_local yet which needs to be revisited.

Sponsored by: FreeBSD Foundation
Submitted by: David Hayes <dahayes at swin edu au>
MFC after: 9 weeks
X-MFC with: r215166


215395 16-Nov-2010 lstewart

Make the CC framework more VIMAGE friendly by adding the machinery to allow
vnets to select their own default CC algorithm independent of each other and the
base system. If the base system or a vnet has set a default which gets unloaded,
we reset that netstack's default to NewReno.

Sponsored by: FreeBSD Foundation
Tested by: Mikolaj Golub <to.my.trociny at gmail com>
Reviewed by: bz (briefly)
MFC after: 3 months


215393 16-Nov-2010 lstewart

- Querying the default CC algo is more common than setting it and the function
is small, so there is no good reason not to declare the buffer at the top.

- Fix a whitespace nit.

Sponsored by: FreeBSD Foundation
MFC after: 11 weeks
X-MFC with: r215166


215392 16-Nov-2010 lstewart

Move protocol specific implementation detail out of the core CC framework.

Sponsored by: FreeBSD Foundation
Tested by: Mikolaj Golub <to.my.trociny at gmail com>
MFC after: 11 weeks
X-MFC with: r215166


215391 16-Nov-2010 lstewart

On CC algorithm module unload, we walk the list of active TCP control blocks.
Any found to be using the algorithm that is about to go away are switched back
to NewReno to avoid leaving dangling pointers which would trigger a panic. For
VIMAGE kernels, there is a list per vnet to walk, yet the implementation was
only examining one of the vnet lists.

Fix the implementation of the above feature for VIMAGE kernels by looping
through all active TCP control blocks across all vnets.

Sponsored by: FreeBSD Foundation
Tested by: Mikolaj Golub <to.my.trociny at gmail com>
Reviewed by: bz (briefly)
MFC after: 11 weeks


215377 16-Nov-2010 lstewart

cc_init() should only be run once on system boot, but with VIMAGE kernels it
runs on boot and each time a vnet jail is created. Running cc_init() multiple
times results in a panic when attempting to initialise the cc_list lock again,
and so r215166 effectively broke the use of vnet jails.

Switch to using a SYSINIT to run cc_init() on boot. CC algorithm modules loaded
on boot register in the same SI_SUB_PROTO_IFATTACHDOMAIN category as is used in
this patch, so cc_init() is run at SI_ORDER_FIRST to ensure the framework is
initialised before module registration is attempted.

Sponsored by: FreeBSD Foundation
Reported and tested by: Mikolaj Golub <to.my.trociny at gmail com>
MFC after: 11 weeks
X-MFC with: r215166


215166 12-Nov-2010 lstewart

This commit marks the first formal contribution of the "Five New TCP Congestion
Control Algorithms for FreeBSD" FreeBSD Foundation funded project. More details
about the project are available at: http://caia.swin.edu.au/freebsd/5cc/

- Add a KPI and supporting infrastructure to allow modular congestion control
algorithms to be used in the net stack. Algorithms can maintain per-connection
state if required, and connections maintain their own algorithm pointer, which
allows different connections to concurrently use different algorithms. The
TCP_CONGESTION socket option can be used with getsockopt()/setsockopt() to
programmatically query or change the congestion control algorithm respectively
from within an application at runtime.

- Integrate the framework with the TCP stack in as least intrusive a manner as
possible. Care was also taken to develop the framework in a way that should
allow integration with other congestion aware transport protocols (e.g. SCTP)
in the future. The hope is that we will one day be able to share a single set
of congestion control algorithm modules between all congestion aware transport
protocols.

- Introduce a new congestion recovery (TF_CONGRECOVERY) state into the TCP stack
and use it to decouple the meaning of recovery from a congestion event and
recovery from packet loss (TF_FASTRECOVERY) a la RFC2581. ECN and delay based
congestion control protocols don't generally need to recover from packet loss
and need a different way to note a congestion recovery episode within the
stack.

- Remove the net.inet.tcp.newreno sysctl, which simplifies some portions of code
and ensures the stack always uses the appropriate mechanisms for recovering
from packet loss during a congestion recovery episode.

- Extract the NewReno congestion control algorithm from the TCP stack and
massage it into module form. NewReno is always built into the kernel and will
remain the default algorithm for the forseeable future. Implementations of
additional different algorithms will become available in the near future.

- Bump __FreeBSD_version to 900025 and note in UPDATING that rebuilding code
that relies on the size of "struct tcpcb" is required.

Many thanks go to the Cisco University Research Program Fund at Community
Foundation Silicon Valley and the FreeBSD Foundation. Their support of our work
at the Centre for Advanced Internet Architectures, Swinburne University of
Technology is greatly appreciated.

In collaboration with: David Hayes <dahayes at swin edu au> and
Grenville Armitage <garmitage at swin edu au>
Sponsored by: Cisco URP, FreeBSD Foundation
Reviewed by: rpaulo
Tested by: David Hayes (and many others over the years)
MFC after: 3 months