History log of /freebsd-current/sys/netinet/tcp_ratelimit.c
Revision Date Author Comments
# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# 69c7c811 16-Mar-2023 Randall Stewart <rrs@FreeBSD.org>

Move access to tcp's t_logstate into inline functions and provide new tracepoint and bbpoint capabilities.

The TCP stacks have long accessed t_logstate directly, but in order to do tracepoints and the new bbpoints
we need to move to using the new inline functions. This adds them and moves rack to now use
the tcp_tracepoints.

Reviewed by: tuexen, gallatin
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D38831


# c0e4090e 08-Feb-2023 Andrew Gallatin <gallatin@FreeBSD.org>

ktls: Accurately track if ifnet ktls is enabled

This allows us to avoid spurious calls to ktls_disable_ifnet()

When we implemented ifnet kTLS, we set a flag in the tx socket
buffer (SB_TLS_IFNET) to indicate ifnet kTLS. This flag meant that
now, or in the past, ifnet ktls was active on a socket. Later,
I added code to switch ifnet ktls sessions to software in the case
of lossy TCP connections that have a high retransmit rate.
Because TCP was using SB_TLS_IFNET to know if it needed to do math
to calculate the retransmit ratio and potentially call into
ktls_disable_ifnet(), it was doing unneeded work long after
a session was moved to software.

This patch carefully tracks whether or not ifnet ktls is still enabled
on a TCP connection. Because the inp is now embedded in the tcpcb, and
because TCP is the most frequent accessor of this state, it made sense to
move this from the socket buffer flags to the tcpcb. Because we now need
reliable access to the tcpcb, we take a ref on the inp when creating a tx
ktls session.

While here, I noticed that rack/bbr were incorrectly implementing
tfb_hwtls_change(), and applying the change to all pending sends,
when it should apply only to future sends.

This change reduces spurious calls to ktls_disable_ifnet() by 95% or so
in a Netflix CDN environment.

Reviewed by: markj, rrs
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D38380


# 3d0d5b21 23-Jan-2023 Justin Hibbits <jhibbits@FreeBSD.org>

IfAPI: Explicitly include <net/if_private.h> in netstack

Summary:
In preparation of making if_t completely opaque outside of the netstack,
explicitly include the header. <net/if_var.h> will stop including the
header in the future.

Sponsored by: Juniper Networks, Inc.
Reviewed by: glebius, melifaro
Differential Revision: https://reviews.freebsd.org/D38200


# 26bdd35c 05-Jan-2023 Randall Stewart <rrs@FreeBSD.org>

rack and bbr not loading if TCP_RATELIMIT is not configured.

So it turns out that rack and bbr still will not load without TCP_RATELIMIT. This needs
to be fixed, and let's also at the same time bring tcp_ratelimit up to date, where we allow
the transports to set a divisor (though still having a default path with the default
divisor of 1000) for setting the burst size.

Reviewed by: tuexen, gallatin
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D37954
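
The divisor idea can be sketched in userspace C. The commit does not give the formula, so everything here is an assumption for illustration: the function name, dividing the rate in bytes per second by the divisor, rounding up to whole segments, and the one-segment floor. Only the default divisor of 1000 comes from the message.

```c
#include <assert.h>
#include <stdint.h>

#define	SKETCH_DEFAULT_DIVISOR	1000	/* default named in the commit message */

/*
 * Hypothetical burst-size helper, not the kernel routine.
 * Divides the pacing rate (bytes per second) by the divisor, rounds the
 * result up to a whole number of segments, and never returns less than
 * one segment. A divisor of 0 selects the default.
 */
uint64_t
sketch_burst_bytes(uint64_t bytes_per_sec, uint32_t segsiz, uint32_t divisor)
{
	uint64_t bytes, segs;

	assert(segsiz > 0);
	if (divisor == 0)
		divisor = SKETCH_DEFAULT_DIVISOR;
	bytes = bytes_per_sec / divisor;
	segs = (bytes + segsiz - 1) / segsiz;	/* round up to whole segments */
	if (segs == 0)
		segs = 1;			/* assumed floor of one segment */
	return (segs * segsiz);
}
```

With the default divisor, a rate expressed in bytes per second yields roughly one millisecond of data per burst; a transport passing a larger divisor would get smaller bursts.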


# eaabc937 14-Dec-2022 Gleb Smirnoff <glebius@FreeBSD.org>

tcp: retire TCPDEBUG

This subsystem is superseded by modern debugging facilities,
e.g. DTrace probes and TCP black box logging.

We intentionally leave SO_DEBUG in place, as many utilities may
set it on a socket. Also the tcp::debug DTrace probes look at
this flag on a socket.

Reviewed by: gnn, tuexen
Discussed with: rscheff, rrs, jtl
Differential revision: https://reviews.freebsd.org/D37694


# 9eb0e832 08-Nov-2022 Gleb Smirnoff <glebius@FreeBSD.org>

tcp: provide macros to access inpcb and socket from a tcpcb

There should be no functional changes with this commit.

Reviewed by: rscheff
Differential revision: https://reviews.freebsd.org/D37123


# 0ab46f28 03-Oct-2022 Gleb Smirnoff <glebius@FreeBSD.org>

tcp: remove unnecessary include of tcp6_var.h

Reviewed by: rscheff, melifaro
Differential revision: https://reviews.freebsd.org/D36725


# d782385e 31-Jan-2022 John Baldwin <jhb@FreeBSD.org>

tcp_ratelimit: Handle some edge cases with TLS + RL send tags.

- After a connection has fallen back from NIC TLS to SW TLS, any
pacing rate changes should modify the inpcb send tag even though
SB_TLS_IFNET is set.

- If a connection tries to modify the pacing rate before the send
tag has been converted from plain TLS to TLS + RL, don't fail
the rate request set but let it fall through to setting the rate
on the non-TLS inpcb RL tag.

Reviewed by: gallatin, rrs, hselasky
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D34085


# 8a7404b2 27-Jan-2022 Andrew Gallatin <gallatin@FreeBSD.org>

tcp: fix leaks in tcp_chg_pacing_rate error paths

tcp_chg_pacing_rate() is expected to release the hw rate limit table,
but failed to do so in several error cases, leading to ever
increasing counts of flows using the rate.

This patch was mostly done by rrs

Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D34058
Reviewed by: hselasky, rrs, jhb (initial version, outside of Differential)


# aac52f94 18-Jan-2022 Randall Stewart <rrs@FreeBSD.org>

tcp: Warning cleanup from new compiler.

The clang compiler recently got an update that generates warnings of unused
variables where they were set, and then never used. This revision goes through
the tcp stack and cleans all of those up.

Reviewed by: Michael Tuexen, Gleb Smirnoff
Sponsored by: Netflix Inc.
Differential Revision:


# c782ea8b 14-Sep-2021 John Baldwin <jhb@FreeBSD.org>

Add a switch structure for send tags.

Move the type and function pointers for operations on existing send
tags (modify, query, next, free) out of 'struct ifnet' and into a new
'struct if_snd_tag_sw'. A pointer to this structure is added to the
generic part of send tags and is initialized by m_snd_tag_init()
(which now accepts a switch structure as a new argument in place of
the type).

Previously, device driver ifnet methods switched on the type to call
type-specific functions. Now, those type-specific functions are saved
in the switch structure and invoked directly. In addition, this more
gracefully permits multiple implementations of the same tag within a
driver. In particular, NIC TLS for future Chelsio adapters will use a
different implementation than the existing NIC TLS support for T6
adapters.

Reviewed by: gallatin, hselasky, kib (older version)
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D31572
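
The switch-structure pattern described above can be sketched in plain C as a table of function pointers attached to each tag. This is a userspace illustration only: all `sketch_` names, the field layout, and the `-1` error value are hypothetical and do not reproduce the kernel's `struct if_snd_tag_sw` or `m_snd_tag_init()`.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_snd_tag;

/* Per-implementation operations, replacing a switch on the tag type. */
struct sketch_snd_tag_sw {
	int	(*snd_tag_modify)(struct sketch_snd_tag *, unsigned long);
	void	(*snd_tag_free)(struct sketch_snd_tag *);
	int	type;
};

/* Generic part of a tag: it carries a pointer to its switch structure. */
struct sketch_snd_tag {
	const struct sketch_snd_tag_sw *sw;
	unsigned long rate;
	int freed;
};

/* Init takes the switch structure as an argument, in place of the type. */
void
sketch_snd_tag_init(struct sketch_snd_tag *t, const struct sketch_snd_tag_sw *sw)
{
	assert(t != NULL && sw != NULL);
	t->sw = sw;
	t->rate = 0;
	t->freed = 0;
}

static int
sketch_rl_modify(struct sketch_snd_tag *t, unsigned long rate)
{
	t->rate = rate;
	return (0);
}

static void
sketch_tag_mark_freed(struct sketch_snd_tag *t)
{
	t->freed = 1;
}

/* One implementation that supports modify, and one that does not. */
const struct sketch_snd_tag_sw sketch_rl_sw = {
	.snd_tag_modify = sketch_rl_modify,
	.snd_tag_free = sketch_tag_mark_freed,
	.type = 1,
};
const struct sketch_snd_tag_sw sketch_plain_sw = {
	.snd_tag_free = sketch_tag_mark_freed,
	.type = 2,
};

/* Callers invoke the saved function directly; no switch on the type. */
int
sketch_tag_modify(struct sketch_snd_tag *t, unsigned long rate)
{
	if (t->sw->snd_tag_modify == NULL)
		return (-1);	/* operation not supported by this tag */
	return (t->sw->snd_tag_modify(t, rate));
}
```

Because each tag points at its own table, two implementations of the same tag type can coexist in one driver simply by defining two tables.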


# 5d8fd932 06-May-2021 Randall Stewart <rrs@FreeBSD.org>

This brings FreeBSD into sync with the Netflix versions of rack and bbr.
This fixes several breakages (panics) since the tcp_lro code was
committed that have been reported. Quite a few new features are
now in rack (perfecting of DGP -- Dynamic Goodput Pacing among the
largest). There is also support for ack-war prevention. Documents
coming soon on rack.

Sponsored by: Netflix
Reviewed by: rscheff, mtuexen
Differential Revision: https://reviews.freebsd.org/D30036


# db46c0d0 01-Feb-2021 Hans Petter Selasky <hselasky@FreeBSD.org>

Fix LINT kernel builds after 1a714ff20419 .

MFC after: 1 week
Discussed with: rrs@
Differential Revision: https://reviews.freebsd.org/D28357
Sponsored by: Mellanox Technologies // NVIDIA Networking


# 1a714ff2 26-Jan-2021 Randall Stewart <rrs@FreeBSD.org>

This pulls over all the changes that are in the netflix
tree that fix the ratelimit code. There were several bugs
in tcp_ratelimit itself and we needed further work to support
the multiple tag format coming for the joint TLS and Ratelimit dances.

Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D28357


# 36e0a362 29-Oct-2020 John Baldwin <jhb@FreeBSD.org>

Add m_snd_tag_alloc() as a wrapper around if_snd_tag_alloc().

This gives a more uniform API for send tag life cycle management.

Reviewed by: gallatin, hselasky
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D27000


# 98d7a8d9 29-Oct-2020 John Baldwin <jhb@FreeBSD.org>

Call m_snd_tag_rele() to free send tags.

Send tags are refcounted and if_snd_tag_free() is called by
m_snd_tag_rele() when the last reference is dropped on a send tag.

Reviewed by: gallatin, hselasky
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D26995
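
The release-on-last-reference rule can be sketched as follows. This is a single-threaded userspace model with hypothetical names; the kernel uses atomic reference counts, and the `freed` flag merely stands in for the call to if_snd_tag_free().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted send tag. */
struct sketch_ref_tag {
	unsigned int refcount;
	bool freed;	/* set where the real code would free the tag */
};

void
sketch_ref_tag_init(struct sketch_ref_tag *t)
{
	t->refcount = 1;	/* the creator holds the first reference */
	t->freed = false;
}

void
sketch_ref_tag_ref(struct sketch_ref_tag *t)
{
	assert(t->refcount > 0);
	t->refcount++;
}

/* Drop one reference; the tag is freed only when the last one goes away. */
void
sketch_ref_tag_rele(struct sketch_ref_tag *t)
{
	assert(t->refcount > 0);
	if (--t->refcount == 0)
		t->freed = true;
}
```

Callers never free a tag directly in this model; they only drop their own reference, which is why a direct free call in a consumer would be a bug.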


# 7552deb2 29-Oct-2020 John Baldwin <jhb@FreeBSD.org>

Remove an extra if_ref().

In r348254, if_snd_tag_alloc() routines were changed to bump the ifp
refcount via m_snd_tag_init(). This function wasn't in the tree at
the time and wasn't updated for the new semantics, so was still doing
a separate bump after if_snd_tag_alloc() returned.

Reviewed by: gallatin
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D26999


# 521eac97 28-Oct-2020 John Baldwin <jhb@FreeBSD.org>

Support hardware rate limiting (pacing) with TLS offload.

- Add a new send tag type for a send tag that supports both rate
limiting (packet pacing) and TLS offload (mostly similar to D22669
but adds a separate structure when allocating the new tag type).

- When allocating a send tag for TLS offload, check to see if the
connection already has a pacing rate. If so, allocate a tag that
supports both rate limiting and TLS offload rather than a plain TLS
offload tag.

- When setting an initial rate on an existing ifnet KTLS connection,
set the rate in the TCP control block inp and then reset the TLS
send tag (via ktls_output_eagain) to reallocate a TLS + ratelimit
send tag. This allocates the TLS send tag asynchronously from a
task queue, so the TLS rate limit tag alloc is always sleepable.

- When modifying a rate on a connection using KTLS, look for a TLS
send tag. If the send tag is only a plain TLS send tag, assume we
failed to allocate a TLS ratelimit tag (either during the
TCP_TXTLS_ENABLE socket option, or during the send tag reset
triggered by ktls_output_eagain) and ignore the new rate. If the
send tag is a ratelimit TLS send tag, change the rate on the TLS tag
and leave the inp tag alone.

- Lock the inp lock when setting sb_tls_info for a socket send buffer
so that the routines in tcp_ratelimit can safely dereference the
pointer without needing to grab the socket buffer lock.

- Add an IFCAP_TXTLS_RTLMT capability flag and associated
administrative controls in ifconfig(8). TLS rate limit tags are
only allocated if this capability is enabled. Note that TLS offload
(whether unlimited or rate limited) always requires IFCAP_TXTLS[46].

Reviewed by: gallatin, hselasky
Relnotes: yes
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D26691
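
The tag selection rule at TLS send tag allocation time can be sketched as a small decision function. All names and the flag value are hypothetical; in particular, how the kernel records "already has a pacing rate" is not stated in the message, so the sketch uses a plain boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag value; not the kernel's IFCAP_TXTLS_RTLMT definition. */
#define	SKETCH_CAP_TXTLS_RTLMT	0x1u

enum sketch_tag_type {
	SKETCH_TAG_TLS,			/* plain TLS offload tag */
	SKETCH_TAG_TLS_RATE_LIMIT	/* TLS offload plus rate limiting */
};

/* Minimal stand-in for the per-connection state consulted here. */
struct sketch_conn {
	bool has_pacing_rate;
};

/*
 * Pick the combined tag only when the connection is already paced and
 * the administrator has enabled the TLS rate limit capability.
 */
enum sketch_tag_type
sketch_pick_tls_tag(const struct sketch_conn *c, uint32_t if_capenable)
{
	assert(c != NULL);
	if (c->has_pacing_rate && (if_capenable & SKETCH_CAP_TXTLS_RTLMT) != 0)
		return (SKETCH_TAG_TLS_RATE_LIMIT);
	return (SKETCH_TAG_TLS);
}
```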


# ce398115 28-Oct-2020 John Baldwin <jhb@FreeBSD.org>

Save the current TCP pacing rate in t_pacing_rate.

Reviewed by: gallatin, gnn
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D26875


# 9aed26b9 06-Oct-2020 John Baldwin <jhb@FreeBSD.org>

Check if_capenable, not if_capabilities when enabling rate limiting.

if_capabilities is a read-only mask of supported capabilities.
if_capenable is a mask under administrative control via ifconfig(8).

Reviewed by: gallatin
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D26690
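
The difference between the two masks can be shown with a minimal userspace sketch. The struct and the flag value are hypothetical stand-ins, not the kernel's definitions; only the field names if_capabilities and if_capenable come from the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag value for illustration only. */
#define	SKETCH_CAP_TXRTLMT	0x1u

/* Minimal stand-in for the two capability masks of an interface. */
struct sketch_ifnet {
	uint32_t if_capabilities;	/* what the hardware supports (read-only) */
	uint32_t if_capenable;		/* what the administrator has enabled */
};

/* Rate limiting may be used only when the capability is enabled. */
bool
sketch_can_ratelimit(const struct sketch_ifnet *ifp)
{
	assert(ifp != NULL);
	return ((ifp->if_capenable & SKETCH_CAP_TXRTLMT) != 0);
}
```

Testing if_capabilities instead would report true for hardware that supports pacing even after the administrator has turned it off with ifconfig(8).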


# 662c1305 01-Sep-2020 Mateusz Guzik <mjg@FreeBSD.org>

net: clean up empty lines in .c and .h files


# 28540ab1 08-Apr-2020 Warner Losh <imp@FreeBSD.org>

Fix copyright year and eliminate the obsolete all rights reserved line.

Reviewed by: rrs@


# c012cfe6 27-Mar-2020 Ed Maste <emaste@FreeBSD.org>

sys/netinet: remove spurious doubled ;s


# 98085bae 09-Mar-2020 Andrew Gallatin <gallatin@FreeBSD.org>

make lacp's use_numa hashing aware of send tags

When I did the use_numa support, I missed the fact that there is
a separate hash function for send tag nic selection. So when
use_numa is enabled, ktls offload does not work properly, as it
does not reliably allocate a send tag on the proper egress nic
since different egress nics are selected for send-tag allocation
and packet transmit. To fix this, this change:

- refactors lacp_select_tx_port_by_hash() and
lacp_select_tx_port() to make lacp_select_tx_port_by_hash()
always called by lacp_select_tx_port()

- pre-shifts flowids to convert them to hashes when calling lacp_select_tx_port_by_hash()

- adds a numa_domain field to if_snd_tag_alloc_params

- plumbs the numa domain into places where we allocate send tags

In testing with NIC TLS setup on a NUMA machine, I see thousands
of output errors before the change when enabling
kern.ipc.tls.ifnet.permitted=1. After the change, I see no
errors, and I see the NIC sysctl counters showing active TLS
offload sessions.

Reviewed by: rrs, hselasky, jhb
Sponsored by: Netflix


# 7029da5c 26-Feb-2020 Pawel Biernacki <kaktus@FreeBSD.org>

Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)

r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is a non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT.

Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718


# d7313dc6 26-Feb-2020 Randall Stewart <rrs@FreeBSD.org>

This commit expands tcp_ratelimit to be able to handle cards
like the mlx-c5 and c6 that require a "setup" routine before
the tcp_ratelimit code can declare and use a rate. I add the
setup routine to if_var as well as fix tcp_ratelimit to call it.
I also revisit the rates so that in the case of a mlx card
of type c5/6 we will use about 100 rates concentrated in the range
where the most gain can be had (1-200Mbps). Note that I have
tested these on a c5 and they work and perform well. In fact
in an unloaded system they pace right to the correct rate (great
job mlx!). There will be a further commit here from Hans that
will add the respective changes to the mlx driver to support this
work (which I was testing with).

Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D23647


# 348404bc 12-Feb-2020 Randall Stewart <rrs@FreeBSD.org>

Let's get the real correct version... geesh. I need
more coffee, evidently.

Sponsored by: Netflix


# b8f8a6b7 12-Feb-2020 Randall Stewart <rrs@FreeBSD.org>

Oops, committed the wrong ratelimit version in the
whitespace cleanup.. Restore it to the proper version.

Sponsored by: Netflix Inc.


# 481be5de 12-Feb-2020 Randall Stewart <rrs@FreeBSD.org>

White space cleanup -- remove trailing tabs or spaces
from any line.

Sponsored by: Netflix Inc.


# df341f59 12-Feb-2020 Randall Stewart <rrs@FreeBSD.org>

Whitespace: remove trailing white space from three files
(leftover presents from emacs).

Sponsored by: Netflix Inc.


# ed0282f4 14-Jan-2020 Gleb Smirnoff <glebius@FreeBSD.org>

A miss from r356754.


# 2a4bd982 14-Jan-2020 Gleb Smirnoff <glebius@FreeBSD.org>

Introduce NET_EPOCH_CALL() macro and use it everywhere where we free
data based on the network epoch. The macro reverses the argument
order of epoch_call(9) - first function, then its argument. NFC
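
The argument reordering can be sketched with a stand-in for epoch_call(9). This assumes, as the message implies, that epoch_call() took the context before the callback at the time. Every `sketch_` name is hypothetical, and unlike the real routine, which defers the callback, the stand-in runs it immediately.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_epoch_context {
	int fired;	/* set by the callback so the effect is observable */
};
typedef struct sketch_epoch_context *sketch_epoch_context_t;

/*
 * Hypothetical stand-in for epoch_call(9): context first, callback second.
 * The real routine defers the callback; this one runs it at once.
 */
void
sketch_epoch_call(int epoch, sketch_epoch_context_t ctx,
    void (*callback)(sketch_epoch_context_t))
{
	(void)epoch;
	assert(callback != NULL);
	callback(ctx);
}

#define	SKETCH_NET_EPOCH	0	/* placeholder for the network epoch */

/* Wrapper with the order the commit describes: function, then argument. */
#define	SKETCH_NET_EPOCH_CALL(f, c) \
	sketch_epoch_call(SKETCH_NET_EPOCH, (c), (f))

/* Example callback: mark the context as processed. */
void
sketch_mark_fired(sketch_epoch_context_t ctx)
{
	ctx->fired = 1;
}
```

The wrapper also hides which epoch is used, so call sites cannot pick the wrong one by accident.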


# b1328235 14-Jan-2020 Gleb Smirnoff <glebius@FreeBSD.org>

Use official macro to enter/exit the network epoch. NFC


# 8fd73e91 14-Jan-2020 Gleb Smirnoff <glebius@FreeBSD.org>

Since this code dereferences struct ifnet, it must include if_var.h
explicitly, not via header pollution. While here move TCPSTATES
declaration right above the include that is going to make use of it.


# 9cdc43b1 14-Jan-2020 Gleb Smirnoff <glebius@FreeBSD.org>

The non-preemptible network epoch identified by net_epoch isn't used.
This code definitely meant net_epoch_preempt.


# eabddb25 09-Oct-2019 Hans Petter Selasky <hselasky@FreeBSD.org>

Factor out TCP rateset destruction code.

Ensure the epoch_call() function is not called more than one time
before the callback has been executed, by always checking the
RS_FUNERAL_SCHD flag before invoking epoch_call().

The "rs_number_dead" is balanced again after r353353.

Discussed with: rrs@
Sponsored by: Mellanox Technologies
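
The schedule-once guard can be sketched as below. The flag name mirrors RS_FUNERAL_SCHD from the message, but its value, the struct, and the counter that stands in for epoch_call() are hypothetical, and the sketch omits the locking that the real code needs around the flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define	SKETCH_RS_FUNERAL_SCHD	0x1u	/* hypothetical flag value */

struct sketch_rate_set {
	uint32_t rs_flags;
	int callbacks_queued;	/* counts stand-in epoch_call() invocations */
};

/*
 * Queue destruction at most once. Returns true if this call queued it,
 * false if a callback was already pending.
 */
bool
sketch_schedule_funeral(struct sketch_rate_set *rs)
{
	assert(rs != NULL);
	if ((rs->rs_flags & SKETCH_RS_FUNERAL_SCHD) != 0)
		return (false);
	rs->rs_flags |= SKETCH_RS_FUNERAL_SCHD;
	rs->callbacks_queued++;	/* where the real code would call epoch_call() */
	return (true);
}
```

Routing every destruction path through one helper like this is what keeps the check from being forgotten at an individual call site.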


# 24be1353 09-Oct-2019 Hans Petter Selasky <hselasky@FreeBSD.org>

Fix locking order reversal in the TCP ratelimit code by moving
destructors outside the rsmtx mutex.

Witness message:
lock order reversal: (sleepable after non-sleepable)
1st tcp_rs_mtx (rsmtx) @ sys/netinet/tcp_ratelimit.c:242
2nd sysctl lock (sysctl lock) @ sys/kern/kern_sysctl.c:607

Backtrace:
witness_debugger
witness_checkorder
_rm_wlock_debug
sysctl_ctx_free
rs_destroy
epoch_call_task
gtaskqueue_run_locked
gtaskqueue_thread_loop

Discussed with: rrs@
Sponsored by: Mellanox Technologies
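
The general shape of such a fix can be sketched in userspace with a pthread mutex: unlink the doomed entries while holding the lock, then run the destructors after dropping it. The list layout and all names are hypothetical, and free() stands in for a destructor that may take a sleepable lock.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical list entry standing in for a rate set. */
struct sketch_rs {
	struct sketch_rs *next;
	int dead;
};

static pthread_mutex_t sketch_rs_mtx = PTHREAD_MUTEX_INITIALIZER;
static struct sketch_rs *sketch_rs_list;

/* Add an entry; the allocation happens before the lock is taken. */
int
sketch_rs_add(int dead)
{
	struct sketch_rs *rs;

	rs = calloc(1, sizeof(*rs));
	if (rs == NULL)
		return (-1);
	rs->dead = dead;
	pthread_mutex_lock(&sketch_rs_mtx);
	rs->next = sketch_rs_list;
	sketch_rs_list = rs;
	pthread_mutex_unlock(&sketch_rs_mtx);
	return (0);
}

/*
 * Unlink dead entries under the mutex, then destroy them with the mutex
 * released. Returns the number of entries destroyed.
 */
int
sketch_rs_reap(void)
{
	struct sketch_rs *doomed = NULL, **pp, *rs;
	int n = 0;

	pthread_mutex_lock(&sketch_rs_mtx);
	pp = &sketch_rs_list;
	while ((rs = *pp) != NULL) {
		if (rs->dead) {
			*pp = rs->next;		/* unlink from the shared list */
			rs->next = doomed;	/* move to the private list */
			doomed = rs;
		} else
			pp = &rs->next;
	}
	pthread_mutex_unlock(&sketch_rs_mtx);

	/* Destructors run here, where taking other locks is safe. */
	while ((rs = doomed) != NULL) {
		doomed = rs->next;
		free(rs);
		n++;
	}
	return (n);
}
```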


# 6f32ca19 11-Sep-2019 Randall Stewart <rrs@FreeBSD.org>

With the recent commit of ktls, we no longer have a
sb_tls_flags, it's just the sb_flags. Also the ratelimit
code, now that the definition is in sockbuf.h, does not
need the ktls.h file (or its predecessor).

Sponsored by: Netflix Inc


# 15ddc5e4 26-Aug-2019 Michael Tuexen <tuexen@FreeBSD.org>

Don't hold the rs_mtx lock while calling malloc().

Reviewed by: rrs@
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D21416


# 903c4ee6 02-Aug-2019 Xin LI <delphij@FreeBSD.org>

Fix !INET build.


# 99c311c4 02-Aug-2019 Randall Stewart <rrs@FreeBSD.org>

Fix one more atomic for i386
Obtained from: mtuexen@freebsd.org


# a1589eb8 01-Aug-2019 Randall Stewart <rrs@FreeBSD.org>

Oops, use fetchadd_u64, not long, to keep old 32-bit platforms
happy.


# 20abea66 01-Aug-2019 Randall Stewart <rrs@FreeBSD.org>

This adds the third step in getting BBR into the tree. BBR and
an updated rack depend on having access to the new
ratelimit api in this commit.

Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D20953