#
303975 |
|
11-Aug-2016 |
gjb |
Copy stable/11@r303970 to releng/11.0 as part of the 11.0-RELEASE cycle.
Prune svn:mergeinfo from the new branch, and rename it to RC1.
Update __FreeBSD_version.
Use the quarterly branch for the default FreeBSD.conf pkg(8) repo and the dvd1.iso packages population.
Approved by: re (implicit) Sponsored by: The FreeBSD Foundation |
#
302408 |
|
08-Jul-2016 |
gjb |
Copy head@r302406 to stable/11 as part of the 11.0-RELEASE cycle. Prune svn:mergeinfo from the new branch, as nothing has been merged here.
Additional commits post-branch will follow.
Approved by: re (implicit) Sponsored by: The FreeBSD Foundation
|
#
301249 |
|
03-Jun-2016 |
hselasky |
Use insertion sort instead of bubble sort in TCP LRO.
Replacing the bubble sort with insertion sort gives an 80% reduction in runtime on average, with randomized keys, for small partitions.
If the keys are pre-sorted, insertion sort runs in linear time, and even if the keys are reversed, insertion sort is faster than bubble sort, although not by much.
Update comment describing "tcp_lro_sort()" while at it.
Differential Revision: https://reviews.freebsd.org/D6619 Sponsored by: Mellanox Technologies Tested by: Netflix Suggested by: Pieter de Goeje <pieter@degoeje.nl> Reviewed by: ed, gallatin, gnn, transport
|
#
300731 |
|
26-May-2016 |
hselasky |
Use optimised complexity safe sorting routine instead of the kernel's "qsort()".
The kernel's "qsort()" routine can in worst case spend O(N*N) amount of comparisons before the input array is sorted. It can also recurse a significant amount of times using up the kernel's interrupt thread stack.
The custom sorting routine takes advantage of that the sorting key is only 64 bits. Based on set and cleared bits in the sorting key it partitions the array until it is sorted. This process has a recursion limit of 64 times, due to the number of set and cleared bits which can occur. Compiled with -O2 the sorting routine was measured to use 64-bytes of stack. Multiplying this by 64 gives a maximum stack consumption of 4096 bytes for AMD64. The same applies to the execution time, that the array to be sorted will not be traversed more than 64 times.
When serving roughly 80Gb/s with 80K TCP connections, the old method consisting of "qsort()" and "tcp_lro_mbuf_compare_header()" used 1.4% CPU, while the new "tcp_lro_sort()" used 1.1% for LRO related sorting as measured by Intel Vtune. The testing was done using a sysctl to toggle between "qsort()" and "tcp_lro_sort()".
Differential Revision: https://reviews.freebsd.org/D6472 Sponsored by: Mellanox Technologies Tested by: Netflix Reviewed by: gallatin, rrs, sephe, transport
|
#
298974 |
|
03-May-2016 |
sephe |
tcp/lro: Refactor the active list operation.
Ease more work concerning active list, e.g. hash table etc.
Reviewed by: gallatin, rrs (earlier version) Sponsored by: Microsoft OSTC Differential Revision: https://reviews.freebsd.org/D6137
|
#
298730 |
|
28-Apr-2016 |
sephe |
tcp/lro: Fix more typo
Noticed by: hiren MFC after: 1 week Sponsored by: Microsoft OSTC
|
#
298696 |
|
27-Apr-2016 |
sephe |
tcp/lro: Fix typo.
MFC after: 1 week Sponsored by: Microsoft OSTC
|
#
297483 |
|
01-Apr-2016 |
sephe |
tcp/lro: Change SLIST to LIST, so that removing an entry is O(1)
This is kinda critical to the performance when the CPU is slow and network bandwidth is high, e.g. in the hypervisor.
Reviewed by: rrs, gallatin, Dexuan Cui <decui microsoft com> Sponsored by: Microsoft OSTC Differential Revision: https://reviews.freebsd.org/D5765
|
#
297482 |
|
01-Apr-2016 |
sephe |
tcp/lro: Use tcp_lro_flush_all in device drivers to avoid code duplication
And factor out tcp_lro_rx_done, which deduplicates the same logic with netinet/tcp_lro.c
Reviewed by: gallatin (1st version), hps, zbb, np, Dexuan Cui <decui microsoft com> Sponsored by: Microsoft OSTC Differential Revision: https://reviews.freebsd.org/D5725
|
#
297265 |
|
25-Mar-2016 |
sephe |
tcp/lro: Return TCP_LRO_NO_ENTRIES if we are short of LRO entries.
So that callers could react accordingly.
Reviewed by: gallatin (no objection) MFC after: 1 week Sponsored by: Microsoft OSTC Differential Revision: https://reviews.freebsd.org/D5695
|
#
295739 |
|
18-Feb-2016 |
sephe |
tcp/lro: Allow drivers to set the TCP ACK/data segment aggregation limit
ACK aggregation limit is append count based, while the TCP data segment aggregation limit is length based. Unless the network driver sets these two limits, it's an NO-OP.
Reviewed by: adrian, gallatin (previous version), hselasky (previous version) Approved by: adrian (mentor) MFC after: 1 week Sponsored by: Microsoft OSTC Differential Revision: https://reviews.freebsd.org/D5185
|
#
295506 |
|
11-Feb-2016 |
hselasky |
Use a pair of ifs when comparing the 32-bit flowid integers so that the sign bit doesn't cause an overflow. The overflow manifests itself as a sorting index wrap around in the middle of the sorted array, which is not a problem for the LRO code, but might be a problem for the logic inside qsort().
Reviewed by: gnn @ Sponsored by: Mellanox Technologies Differential Revision: https://reviews.freebsd.org/D5239
|
#
295126 |
|
01-Feb-2016 |
glebius |
These files were getting sys/malloc.h and vm/uma.h with header pollution via sys/mbuf.h
|
#
294327 |
|
19-Jan-2016 |
hselasky |
Add optimizing LRO wrapper:
- Add optimizing LRO wrapper which pre-sorts all incoming packets according to the hash type and flowid. This prevents exhaustion of the LRO entries due to too many connections at the same time. Testing using a larger number of higher bandwidth TCP connections showed that the incoming ACK packet aggregation rate increased from ~1.3:1 to almost 3:1. Another test showed that for a number of TCP connections greater than 16 per hardware receive ring, where 8 TCP connections was the LRO active entry limit, there was a significant improvement in throughput due to being able to fully aggregate more than 8 TCP stream. For very few very high bandwidth TCP streams, the optimizing LRO wrapper will add CPU usage instead of reducing CPU usage. This is expected. Network drivers which want to use the optimizing LRO wrapper needs to call "tcp_lro_queue_mbuf()" instead of "tcp_lro_rx()" and "tcp_lro_flush_all()" instead of "tcp_lro_flush()". Further the LRO control structure must be initialized using "tcp_lro_init_args()" passing a non-zero number into the "lro_mbufs" argument.
- Make LRO statistics 64-bit. Previously 32-bit integers were used for statistics which can be prone to wrap-around. Fix this while at it and update all SYSCTL's which expose LRO statistics.
- Ensure all data is freed when destroying a LRO control structures, especially leftover LRO entries.
- Reduce number of memory allocations needed when setting up a LRO control structure by precomputing the total amount of memory needed.
- Add own memory allocation counter for LRO.
- Bump the FreeBSD version to force recompilation of all KLDs due to change of the LRO control structure size.
Sponsored by: Mellanox Technologies Reviewed by: gallatin, sbruno, rrs, gnn, transport Tested by: Netflix Differential Revision: https://reviews.freebsd.org/D4914
|
#
284961 |
|
30-Jun-2015 |
np |
Fix leak in tcp_lro_rx. Simply clearing M_PKTHDR isn't enough, any tags hanging off the header need to be freed too.
Differential Revision: https://reviews.freebsd.org/D2708 Reviewed by: ae@, hiren@
|
#
255010 |
|
28-Aug-2013 |
np |
Merge r254336 from user/np/cxl_tuning.
Add a last-modified timestamp to each LRO entry and provide an interface to flush all inactive entries. Drivers decide when to flush and what the inactivity threshold should be.
Network drivers that process an rx queue to completion can enter a livelock type situation when the rate at which packets are received reaches equilibrium with the rate at which the rx thread is processing them. When this happens the final LRO flush (normally when the rx routine is done) does not occur. Pure ACKs and segments with total payload < 64K can get stuck in an LRO entry. Symptoms are that TCP tx-mostly connections' performance falls off a cliff during heavy, unrelated rx on the interface.
Flushing only inactive LRO entries works better than any of these alternates that I tried: - don't LRO pure ACKs - flush _all_ LRO entries periodically (every 'x' microseconds or every 'y' descriptors) - stop rx processing in the driver periodically and schedule remaining work for later.
Reviewed by: andre
|
#
247104 |
|
21-Feb-2013 |
gallatin |
Fix tcp_lro_rx_ipv4() for drivers that do not set CSUM_IP_CHECKED. Specifcially, in_cksum_hdr() returns 0 (not 0xffff) when the IPv4 checksum is correct. Without this fix, the tcp_lro code will reject good IPv4 traffic from drivers that do not implement IPv4 header harder csum offload.
Sponsored by: Myricom Inc.
MFC after: 7 days
|
#
236394 |
|
01-Jun-2012 |
bz |
Make TCP LRO work properly with VIMAGE kernels rather than just panicing. There's no VIMAGE context set there yet as this is before if_ethersubr.c.
MFC after: 3 days X-MFC with: r235981
|
#
236093 |
|
26-May-2012 |
bz |
Trim the extra $FreeBSD$ from the comment below the license. We use the __FBSDID() macro on the file now instead.
MFC after: 3 days
|
#
235981 |
|
25-May-2012 |
bz |
In case forwarding is turned on for a given address family, refuse to queue the packet for LRO and tell the driver to directly pass it on. This avoids re-assembly and later re-fragmentation problems when forwarding.
It's not the best solution but the simplest and most effective for the moment.
Should have been done: ages ago Discussed with and by: many MFC after: 3 days
|
#
235944 |
|
24-May-2012 |
bz |
MFp4 bz_ipv6_fast:
Significantly update tcp_lro for mostly two things: 1) introduce basic support for IPv6 without extension headers. 2) try hard to also get the incremental checksum updates right, especially also in the IPv4 case for the IP and TCP header.
Move variables around for better locality, factor things out into functions, allow checksum updates to be compiled out, ...
Leave a few comments on further things to look at in the future, though that is not the full list.
Update drivers with appropriate #includes as needed for IPv6 data type in LRO.
Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems
Reviewed by: gnn (as part of the whole) MFC After: 3 days
|
#
235474 |
|
15-May-2012 |
bz |
Switch to a standard 2 clause BSD license (from bsd-style-copyright).
Approved by: Myricom Inc. (gallatin) Approved by: Intel Corporation (jfv)
|
#
223797 |
|
05-Jul-2011 |
cperciva |
Don't allow lro->len to exceed 65535, as this will result in overflow when len is inserted back into the synthetic IP packet and cause a multiple of 2^16 bytes of TCP "packet loss".
This improves Linux->FreeBSD netperf bandwidth by a factor of 300 in testing on Amazon EC2.
Reviewed by: jfv MFC after: 2 weeks
|
#
220428 |
|
07-Apr-2011 |
jfv |
Port of the LRO fix from mxge driver to the generic LRO code. Thanks to Andrew Gallatin for the change.
MFC after: 7 days
|
#
217126 |
|
07-Jan-2011 |
jhb |
Trim extra spaces before tabs.
|
#
182089 |
|
24-Aug-2008 |
kmacy |
Don't calculate checksum if it has already been validated
Obtained from: Chelsio Inc. MFC after: 3 days
|
#
179737 |
|
11-Jun-2008 |
jfv |
Add generic TCP LOR into netinet
|