344511 |
25-Feb-2019 |
tuexen |
Backport the new TCP reassembly code from head to stable/11.
In particular:
* Cherry pick the changes in sys/queue.h from r334804.
* MFC r338102 with manually removing changes to file not existent in stable/11 and resolve conflicts in tcp_var.h. This change represents a substantial restructure of the way we reassembly inbound tcp segments. The old algorithm just blindly dropped in segments without coalescing. This meant that every segment could take up greater and greater room on the linked list of segments. This of course is now subject to a tighter limit (100) of segments which in a high BDP situation will cause us to be a lot more in-efficent as we drop segments beyond 100 entries that we receive. What this restructure does is cause the reassembly buffer to coalesce segments putting an emphasis on the two common cases (which avoid walking the list of segments) i.e. where we add to the back of the queue of segments and where we add to the front. We also have the reassembly buffer supporting a couple of debug options (black box logging as well as counters for code coverage). These are compiled out by default but can be added by uncommenting the defines.
* Manually fix tcp_stacks/fastopen.c, since it does not exist anymore in head.
* MFC r342280: Fix a regression in the TCP handling of received segments. When receiving TCP segments the stack protects itself by limiting the resources allocated for a TCP connections. This patch adds an exception to these limitations for the TCP segement which is the next expected in-sequence segment. Without this patch, TCP connections may stall and finally fail in some cases of packet loss.
* MFC r343439: Don't include two header files when not needed. This allows the part of the rewrite of TCP reassembly in this files to be MFCed to stable/11 with manual change.
* MFC r344428: This patch addresses an issue brought up by bz@ in D18968: When TCP_REASS_LOGGING is defined, a NULL pointer dereference would happen, if user data was received during the TCP handshake and BB logging is used. A KASSERT is also added to detect tcp_reass() calls with illegal parameter combinations.
Reviewed by: bz@, jtl, rrs@ Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D18960 |
332177 |
07-Apr-2018 |
tuexen |
MFC r322812:
Avoid TCP log messages which are false positives.
The check for timestamps are too early to handle SYN-ACK correctly. So move it down after the corresponing processing has been done.
MFC r322813:
Avoid TCP log messages which are false positives.
This is https://svnweb.freebsd.org/changeset/base/322812, just for alternate TCP stacks.
PR: 216832 Obtained from: antonfb@hesiod.org |
330516 |
05-Mar-2018 |
eadler |
Revert r330445
This commit was reverted in r321480 in head
Reported by: sbruno |
330445 |
05-Mar-2018 |
eadler |
MFC r307901,r308180:
FreeBSD tcp stack used to inform respective congestion control module about the loss event but not use or obay the recommendations i.e. values set by it in some cases.
Here is an attempt to solve that confusion by following relevant RFCs/drafts. Stack only sets congestion window/slow start threshold values when there is no CC module availalbe to take that action. All CC modules are inspected and updated when needed to take appropriate action on loss.
tcp_stacks/fastpath module has been updated to adapt these changes.
Note: Probably, the most significant change would be to not bring congestion window down to 1MSS on a loss signaled by 3-duplicate acks and letting respective CC decide that value. |
319431 |
01-Jun-2017 |
tuexen |
When a SYN-ACK is received in SYN-SENT state, RFC 793 requires the validation of SEG.ACK as the first step. If the ACK is not acceptable, a RST segment should be sent and the segment should be dropped. Up to now, the segment was partially processed. This patch moves the check for the SEG.ACK validation up to the front as required.
Reviewed by: hiren, gnn Differential Revision: https://reviews.freebsd.org/D10424 |
319399 |
01-Jun-2017 |
tuexen |
MFC r316743:
The sysctl variable net.inet.tcp.drop_synfin is not honored in all states, for example not in SYN-SENT. This patch adds code to check the sysctl variable in other states than LISTEN. Thanks to ae and gnn for providing comments.
Reviewed by: gnn Differential Revision: https://reviews.freebsd.org/D9894 |
317368 |
24-Apr-2017 |
smh |
MFC r316676:
Use estimated RTT for receive buffer auto resizing instead of timestamps
Relnotes: Yes Sponsored by: Multiplay |
316208 |
30-Mar-2017 |
gnn |
MFC: 311225, 311243, 313045
Fix DTrace TCP tracepoints to not use mtod() as it is both unnecessary and dangerous. Those wanting data from an mbuf should use DTrace itself to get the data.
Add an mbuf to ipinfo_t translator to finish cleanup of mbuf passing to TCP probes. |
315514 |
18-Mar-2017 |
ae |
MFC r304572 (by bz): Remove the kernel optoion for IPSEC_FILTERTUNNEL, which was deprecated more than 7 years ago in favour of a sysctl in r192648.
MFC r305122: Remove redundant sanity checks from ipsec[46]_common_input_cb().
This check already has been done in the each protocol callback.
MFC r309144,309174,309201 (by fabient): IPsec RFC6479 support for replay window sizes up to 2^32 - 32 packets.
Since the previous algorithm, based on bit shifting, does not scale with large replay windows, the algorithm used here is based on RFC 6479: IPsec Anti-Replay Algorithm without Bit Shifting. The replay window will be fast to be updated, but will cost as many bits in RAM as its size.
The previous implementation did not provide a lock on the replay window, which may lead to replay issues.
Obtained from: emeric.poupon@stormshield.eu Sponsored by: Stormshield Differential Revision: https://reviews.freebsd.org/D8468
MFC r309143,309146 (by fabient): In a dual processor system (2*6 cores) during IPSec throughput tests, we see a lot of contention on the arc4 lock, used to generate the IV of the ESP output packets.
The idea of this patch is to split this mutex in order to reduce the contention on this lock.
Update r309143 to prevent false sharing.
Reviewed by: delphij, markm, ache Approved by: so Obtained from: emeric.poupon@stormshield.eu Sponsored by: Stormshield Differential Revision: https://reviews.freebsd.org/D8130
MFC r313330: Merge projects/ipsec into head/.
Small summary -------------
o Almost all IPsec releated code was moved into sys/netipsec. o New kernel modules added: ipsec.ko and tcpmd5.ko. New kernel option IPSEC_SUPPORT added. It enables support for loading and unloading of ipsec.ko and tcpmd5.ko kernel modules. o IPSEC_NAT_T option was removed. Now NAT-T support is enabled by default. The UDP_ENCAP_ESPINUDP_NON_IKE encapsulation type support was removed. Added TCP/UDP checksum handling for inbound packets that were decapsulated by transport mode SAs. setkey(8) modified to show run-time NAT-T configuration of SA. o New network pseudo interface if_ipsec(4) added. For now it is build as part of ipsec.ko module (or with IPSEC kernel). It implements IPsec virtual tunnels to create route-based VPNs. o The network stack now invokes IPsec functions using special methods. The only one header file <netipsec/ipsec_support.h> should be included to declare all the needed things to work with IPsec. o All IPsec protocols handlers (ESP/AH/IPCOMP protosw) were removed. Now these protocols are handled directly via IPsec methods. o TCP_SIGNATURE support was reworked to be more close to RFC. o PF_KEY SADB was reworked: - now all security associations stored in the single SPI namespace, and all SAs MUST have unique SPI. - several hash tables added to speed up lookups in SADB. - SADB now uses rmlock to protect access, and concurrent threads can do SA lookups in the same time. - many PF_KEY message handlers were reworked to reflect changes in SADB. - SADB_UPDATE message was extended to support new PF_KEY headers: SADB_X_EXT_NEW_ADDRESS_SRC and SADB_X_EXT_NEW_ADDRESS_DST. They can be used by IKE daemon to change SA addresses. o ipsecrequest and secpolicy structures were cardinally changed to avoid locking protection for ipsecrequest. Now we support only limited number (4) of bundled SAs, but they are supported for both INET and INET6. o INPCB security policy cache was introduced. Each PCB now caches used security policies to avoid SP lookup for each packet. o For inbound security policies added the mode, when the kernel does check for full history of applied IPsec transforms. o References counting rules for security policies and security associations were changed. The proper SA locking added into xform code. o xform code was also changed. Now it is possible to unregister xforms. tdb_xxx structures were changed and renamed to reflect changes in SADB/SPDB, and changed rules for locking and refcounting.
Obtained from: Yandex LLC Relnotes: yes Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D9352
MFC r313331: Add removed headers into the ObsoleteFiles.inc.
MFC r313561 (by glebius): Move tcp_fields_to_net() static inline into tcp_var.h, just below its friend tcp_fields_to_host(). There is third party code that also uses this inline.
MFC r313697: Remove IPsec related PCB code from SCTP.
The inpcb structure has inp_sp pointer that is initialized by ipsec_init_pcbpolicy() function. This pointer keeps strorage for IPsec security policies associated with a specific socket. An application can use IP_IPSEC_POLICY and IPV6_IPSEC_POLICY socket options to configure these security policies. Then ip[6]_output() uses inpcb pointer to specify that an outgoing packet is associated with some socket. And IPSEC_OUTPUT() method can use a security policy stored in the inp_sp. For inbound packet the protocol-specific input routine uses IPSEC_CHECK_POLICY() method to check that a packet conforms to inbound security policy configured in the inpcb.
SCTP protocol doesn't specify inpcb for ip[6]_output() when it sends packets. Thus IPSEC_OUTPUT() method does not consider such packets as associated with some socket and can not apply security policies from inpcb, even if they are configured. Since IPSEC_CHECK_POLICY() method is called from protocol-specific input routine, it can specify inpcb pointer and associated with socket inbound policy will be checked. But there are two problems: 1. Such check is asymmetric, becasue we can not apply security policy from inpcb for outgoing packet. 2. IPSEC_CHECK_POLICY() expects that caller holds INPCB lock and access to inp_sp is protected. But for SCTP this is not correct, becasue SCTP uses own locks to protect inpcb.
To fix these problems remove IPsec related PCB code from SCTP. This imply that IP_IPSEC_POLICY and IPV6_IPSEC_POLICY socket options will be not applicable to SCTP sockets. To be able correctly check inbound security policies for SCTP, mark its protocol header with the PR_LASTHDR flag.
Differential Revision: https://reviews.freebsd.org/D9538
MFC r313746: Add missing check to fix the build with IPSEC_SUPPORT and without MAC.
MFC r313805: Fix LINT build for powerpc.
Build kernel modules support only when both IPSEC and TCP_SIGNATURE are not defined.
MFC r313922: For translated packets do not adjust UDP checksum if it is zero.
In case when decrypted and decapsulated packet is an UDP datagram, check that its checksum is not zero before doing incremental checksum adjustment.
MFC r314339: Document that the size of AH ICV for HMAC-SHA2-NNN should be half of NNN bits as described in RFC4868.
PR: 215978
MFC r314812: Introduce the concept of IPsec security policies scope.
Currently are defined three scopes: global, ifnet, and pcb. Generic security policies that IKE daemon can add via PF_KEY interface or an administrator creates with setkey(8) utility have GLOBAL scope. Such policies can be applied by the kernel to outgoing packets and checked agains inbound packets after IPsec processing. Security policies created by if_ipsec(4) interfaces have IFNET scope. Such policies are applied to packets that are passed through if_ipsec(4) interface. And security policies created by application using setsockopt() IP_IPSEC_POLICY option have PCB scope. Such policies are applied to packets related to specific socket. Currently there is no way to list PCB policies via setkey(8) utility.
Modify setkey(8) and libipsec(3) to be able distinguish the scope of security policies in the `setkey -DP` listing. Add two optional flags: '-t' to list only policies related to virtual *tunneling* interfaces, i.e. policies with IFNET scope, and '-g' to list only policies with GLOBAL scope. By default policies from all scopes are listed.
To implement this PF_KEY's sadb_x_policy structure was modified. sadb_x_policy_reserved field is used to pass the policy scope from the kernel to userland. SADB_SPDDUMP message extended to support filtering by scope: sadb_msg_satype field is used to specify bit mask of requested scopes.
For IFNET policies the sadb_x_policy_priority field of struct sadb_x_policy is used to pass if_ipsec's interface if_index to the userland. For GLOBAL policies sadb_x_policy_priority is used only to manage order of security policies in the SPDB. For IFNET policies it is not used, so it can be used to keep if_index.
After this change the output of `setkey -DP` now looks like: # setkey -DPt 0.0.0.0/0[any] 0.0.0.0/0[any] any in ipsec esp/tunnel/87.250.242.144-87.250.242.145/unique:145 spid=7 seq=3 pid=58025 scope=ifnet ifname=ipsec0 refcnt=1 # setkey -DPg ::/0 ::/0 icmp6 135,0 out none spid=5 seq=1 pid=872 scope=global refcnt=1
Obtained from: Yandex LLC Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D9805
PR: 212018 Relnotes: yes Sponsored by: Yandex LLC |
310212 |
18-Dec-2016 |
tuexen |
MFC r308832:
Ensure that TCP state changes to state-closing are reported via dtrace. This does not cover state changes from TIME-WAIT.
Sponsored by: Netflix, Inc. |
310211 |
18-Dec-2016 |
tuexen |
MFC r308745:
Notify the user via setting errno when a TCP RST segment is received either in the CLOSING or LAST-ACK state.
Sponsored by: Netflix, Inc. |
302408 |
08-Jul-2016 |
gjb |
Copy head@r302406 to stable/11 as part of the 11.0-RELEASE cycle. Prune svn:mergeinfo from the new branch, as nothing has been merged here.
Additional commits post-branch will follow.
Approved by: re (implicit) Sponsored by: The FreeBSD Foundation |
301717 |
09-Jun-2016 |
ae |
Cleanup unneded include "opt_ipfw.h".
It was used for conditional build IPFIREWALL_FORWARD support. But IPFIREWALL_FORWARD option was removed a long time ago.
|
300042 |
17-May-2016 |
rrs |
This small change adopts the excellent suggestion for using named structures in the add of a new tcp-stack that came in late to me via email after the last commit. It also makes it so that a new stack may optionally get a callback during a retransmit timeout. This allows the new stack to clear specific state (think sack scoreboards or other such structures).
Sponsored by: Netflix Inc. Differential Revision: http://reviews.freebsd.org/D6303
|
299864 |
15-May-2016 |
markj |
opt_kdtrace.h is not needed for SDT probes as of r258541.
|
298995 |
03-May-2016 |
pfg |
sys/net*: minor spelling fixes.
No functional change.
|
298743 |
28-Apr-2016 |
rrs |
This cleans up the timers code in TCP to start using the new async_drain functionality. This as been tested in NF as well as by Verisign. Still to do in here is to remove all the old flags. They are currently left being maintained but probably are no longer needed.
Sponsored by: Netflix Inc. Differential Revision: http://reviews.freebsd.org/D5924
|
296811 |
13-Mar-2016 |
bz |
Remove duplicate external declaration of tcprexmtthresh making gcc compiles barf.
|
296476 |
08-Mar-2016 |
rrs |
Fix a sneaky bug where we were missing an extern to get the rxt threshold.. and thus created our own defaulted to 0 :-(
Sponsored by: Netflix Inc
|
296352 |
03-Mar-2016 |
gnn |
Fix dtrace probes (introduced in 287759): debug__input was used for output and drop; connect didn't always fire a user probe some probes were missing in fastpath
Submitted by: Hannes Mehnert Sponsored by: REMS, EPSRC Differential Revision: https://reviews.freebsd.org/D5525
|
295927 |
23-Feb-2016 |
rrs |
This fixes the fastpath code to have a better module initialization sequence when included in loader.conf. It also fixes it so that no matter if some one incorrectly specifies a load order, the lists and such will be initialized on demand at that time so no one can make that mistake.
Reviewed by: hiren Differential Revision: D5189
|
294931 |
27-Jan-2016 |
glebius |
Rename netinet/tcp_cc.h to netinet/cc/cc.h.
Discussed with: lstewart
|
294535 |
21-Jan-2016 |
glebius |
- Rename cc.h to more meaningful tcp_cc.h. - Declare it a kernel only include, which it already is. - Don't include tcp.h implicitly from tcp_cc.h
|
294534 |
21-Jan-2016 |
glebius |
Cleanup TCP files from unnecessary interface related includes.
|
293313 |
07-Jan-2016 |
jtl |
Apply the changes from r293284 to one additional file.
Discussed with: glebius
|
292336 |
16-Dec-2015 |
rrs |
Remove redundant extern's that make the ppc compile fail. Thanks Ed Maste for the heads up.
|
292309 |
16-Dec-2015 |
rrs |
First cut of the modularization of our TCP stack. Still to do is to clean up the timer handling using the async-drain. Other optimizations may be coming to go with this. Whats here will allow differnet tcp implementations (one included). Reviewed by: jtl, hiren, transports Sponsored by: Netflix Inc. Differential Revision: D4055
|