279264 |
25-Feb-2015 |
delphij |
Fix integer overflow in IGMP protocol. [SA-15:04]
Fix vt(4) crash with improper ioctl parameters. [EN-15:01]
Updated base system OpenSSL to 1.0.1l. [EN-15:02]
Fix freebsd-update libraries update ordering issue. [EN-15:03]
Approved by: so |
277808 |
27-Jan-2015 |
delphij |
Fix SCTP SCTP_SS_VALUE kernel memory corruption and disclosure vulnerability and SCTP stream reset vulnerability.
Security: FreeBSD-SA-15:02.kmem Security: CVE-2014-8612 Security: FreeBSD-SA-15:03.sctp Security: CVE-2014-8613 Approved by: so |
271669 |
16-Sep-2014 |
delphij |
Fix Denial of Service in TCP packet processing.
Security: FreeBSD-SA-14:19.tcp Approved by: so |
268434 |
08-Jul-2014 |
delphij |
Fix kernel memory disclosure in control message and SCTP notifications.
Security: FreeBSD-SA-14:17.kmem Security: CVE-2014-3952, CVE-2014-3953 Approved by: so |
265124 |
30-Apr-2014 |
delphij |
Fix devfs rules not applied by default for jails.
Fix OpenSSL use-after-free vulnerability.
Fix TCP reassembly vulnerability.
Security: FreeBSD-SA-14:07.devfs Security: CVE-2014-3001 Security: FreeBSD-SA-14:08.tcp Security: CVE-2014-3000 Security: FreeBSD-SA-14:09.openssl Security: CVE-2010-5298 Approved by: so |
260378 |
06-Jan-2014 |
glebius |
Merge r260319 from stable/10 (r260188 from head):
Fix regression from r249894. Now we pass "gw" as argument to if_output method, thus for multicast case we need it to point at "dst".
PR: 185395 Approved by: re (gjb) |
259065 |
07-Dec-2013 |
gjb |
- Copy stable/10 (r259064) to releng/10.0 as part of the 10.0-RELEASE cycle.
- Update __FreeBSD_version [1]
- Set branch name to -RC1
[1] 10.0-CURRENT __FreeBSD_version value ended at '55', so start releng/10.0 at '100' so the branch is started with a value ending in zero.
Approved by: re (implicit) Sponsored by: The FreeBSD Foundation |
258890 |
03-Dec-2013 |
tuexen |
MFC r258574:
Only initialize some mutexes for the default VNET.
In r208160, sctp_it_ctl was made a global variable, across all VNETs. However, sctp_init() is called for every VNET that is created. This results in the same global mutexes which are part of sctp_it_ctl being initialized. This can result in crashes if many jails are created.
To reproduce the problem: (1) Take a GENERIC kernel config, and add options for: VIMAGE, WITNESS, INVARIANTS. (2) Run this command in a loop: jail -l -u root -c path=/ name=foo persist vnet && jexec foo ifconfig lo0 127.0.0.1/8 && jail -r foo
(see http://lists.freebsd.org/pipermail/freebsd-current/2010-November/021280.html )
Witness will warn about the same mutex being initialized.
Fix the problem by only initializing these mutexes in the default VNET.
MFC r258765:
In http://svnweb.freebsd.org/changeset/base/258221 I introduced a bug which initialized global locks whenever the SCTP stack initialized. This was fixed in http://svnweb.freebsd.org/changeset/base/258574 by rodrigc@. He just initialized the locks for the default vnet. This fix reverts to the old behaviour before r258221, which explicitly makes sure it is only called once, because this works also on other platforms.
Approved by: re@ (gjb)
|
258454 |
21-Nov-2013 |
tuexen |
MFC r256556: Remove a buggy comparison when manually setting the path MTU. After fixing, the comparison would have become redundant. Thanks to Andrew Galante for reporting the issue.
MFC r257272: Fix compilation if SCTP_DONT_DO_PRIVADDR_SCOPE is defined. The issue was reported by Andrew Galante.
MFC r257274: Fix the value of *optlen when calling getsockopt() for SCTP_REMOTE_UDP_ENCAPS_PORT. This issue was reported by Andrew Galante.
MFC r257359: Terminate a debug output with a \n.
MFC r257555: Changes from upstream to improve compilation when INET or INET6 or none of them is defined.
MFC r257574: Unlock the lock before destroying it. This issue was reported by Andrew Galante.
MFC r257800: Use htons()/ntohs() appropriately. These issues were reported by Andrew Galante.
MFC r257803: Make sure that we don't try to build an ASCONF-ACK chunk larger than what fits in the mbuf cluster. This issue was reported by Andrew Galante.
MFC r257804: Get rid of the artificial limitation enforced by SCTP_AUTH_RANDOM_SIZE_MAX. This was suggested by Andrew Galante.
MFC r258221: Cleanups which result in fixes which have been made upstream and were partially suggested by Andrew Galante. There is no functional change in FreeBSD.
MFC r258224: When determining if an address belongs to an stcb, take the address family into account for wildcard bound endpoints.
MFC r258228: Remove a stray write operation.
MFC r258235: Use SCTP_PR_SCTP_TTL when the user provides a positive timetolive in sctp_sendmsg().
Approved by: re@
|
257367 |
29-Oct-2013 |
andre |
MFC r256920:
The TCP delayed ACK logic isn't aware of LRO passing up large aggregated segments, thinking it received only one segment. This causes it to delay the ACK for 100ms to wait for another segment which may never come because all the data was received already.
Doing delayed ACK for LRO segments is bogus for two reasons: a) it pushes us further away from acking every other packet; b) it introduces additional delay in responding to the sender. The latter is especially bad because it is in the nature of LRO to aggregate all segments of a burst with no more coming until an ACK is sent back.
Change the delayed ACK logic to detect LRO segments by being larger than the MSS for this connection and issuing an immediate ACK for them to keep the ACK clock ticking without interruption.
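The decision described above can be sketched roughly as follows. This is an illustration only, not the code from r256920: the helper name may_delay_ack() and its parameters are hypothetical, and the real logic in the TCP input path considers more state than shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not the kernel's code): decide whether the ACK
 * for an inbound data segment may be delayed.
 *
 * seg_len        payload length handed up by the driver
 * maxseg         MSS of this connection
 * delack_enabled whether delayed ACKs are enabled at all
 */
static bool
may_delay_ack(uint32_t seg_len, uint32_t maxseg, bool delack_enabled)
{
	assert(maxseg > 0);

	if (!delack_enabled)
		return (false);
	/*
	 * A segment larger than one MSS can only be an LRO aggregate,
	 * so it is acknowledged immediately to keep the ACK clock going.
	 */
	if (seg_len > maxseg)
		return (false);
	return (true);
}
```

With an MSS of 1460, a 1460 byte segment may still be delayed, while a 2920 byte aggregate is acknowledged at once.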
Reported by: julian, cperciva Tested by: cperciva Reviewed by: lstewart
Approved by: re (glebius)
|
256281 |
10-Oct-2013 |
gjb |
Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.
Approved by: re (implicit) Sponsored by: The FreeBSD Foundation
|
256186 |
09-Oct-2013 |
glebius |
When processing ACK in tcp_do_segment, use sbcut_locked() instead of sbdrop_locked() to cut acked mbufs from the socket buffer. Free this chain in a batch manner after the socket buffer lock is dropped.
This measurably reduces contention on socket buffer.
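The general pattern (detach under the lock, free after dropping it) can be sketched in userland C as below. This is only an analogy under stated assumptions: struct buf, struct queue and all function names are hypothetical stand-ins, a pthread mutex replaces the socket buffer lock, and only whole buffers are cut, which is simpler than what sbcut_locked() does.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for an mbuf and a socket buffer. */
struct buf {
	struct buf *next;
	size_t len;
};

struct queue {
	pthread_mutex_t lock;
	struct buf *head;
	size_t bytes;
};

/* Append one buffer of the given length to the tail of the queue. */
static void
queue_append(struct queue *q, size_t len)
{
	struct buf *b = malloc(sizeof(*b));
	struct buf **pp;

	assert(b != NULL);
	b->next = NULL;
	b->len = len;
	pthread_mutex_lock(&q->lock);
	for (pp = &q->head; *pp != NULL; pp = &(*pp)->next)
		;
	*pp = b;
	q->bytes += len;
	pthread_mutex_unlock(&q->lock);
}

/* Lock held: unlink fully acked buffers and return them as a chain. */
static struct buf *
queue_cut_locked(struct queue *q, size_t acked)
{
	struct buf *chain = NULL, **tail = &chain;

	while (q->head != NULL && q->head->len <= acked) {
		struct buf *b = q->head;

		q->head = b->next;
		acked -= b->len;
		q->bytes -= b->len;
		b->next = NULL;
		*tail = b;
		tail = &b->next;
	}
	return (chain);
}

/* No lock held: free the detached chain, return the number freed. */
static size_t
chain_free(struct buf *chain)
{
	size_t n = 0;

	while (chain != NULL) {
		struct buf *next = chain->next;

		free(chain);
		chain = next;
		n++;
	}
	return (n);
}

/* Only the cheap unlinking happens while the lock is held. */
static size_t
queue_ack(struct queue *q, size_t acked)
{
	struct buf *chain;

	pthread_mutex_lock(&q->lock);
	chain = queue_cut_locked(q, acked);
	pthread_mutex_unlock(&q->lock);
	return (chain_free(chain));
}
```

The point of the split is that the allocator work in chain_free() no longer runs inside the critical section.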
Sponsored by: Netflix Sponsored by: Nginx, Inc. Approved by: re (marius)
|
255993 |
02-Oct-2013 |
markj |
Add a separate translator for headers passed to the TCP probes in the input path. These probes get some of the fields in host order, whereas the output probes get them in network order, so a single translator isn't enough. This workaround ensures that the problem is essentially invisible to users: none of the probe arguments or their fields have changed.
Approved by: re (hrs)
|
255759 |
21-Sep-2013 |
bz |
Introduce spares in the TCP syncache and timewait structures so that fixed TCP_SIGNATURE handling can later be merged.
This is derived from follow-up work to SVN r183001 posted to net@ on Sep 13 2008.
Approved by: re (gjb)
|
255523 |
13-Sep-2013 |
trociny |
Unregister inet/inet6 pfil hooks on vnet destroy.
Discussed with: andre Approved by: re (rodrigc)
|
255434 |
09-Sep-2013 |
tuexen |
Fix the aborting of association with the iterator using an empty user initiated error cause (using SCTP_ABORT|SCTP_SENDALL).
Approved by: re (delphij) MFC after: 1 week
|
255397 |
08-Sep-2013 |
trociny |
Release the interface last.
Reviewed by: glebius Approved by: re (kib)
|
255337 |
07-Sep-2013 |
tuexen |
When computing the partial delivery point, take the receiver socket buffer size correctly into account.
MFC after: 1 week
|
255249 |
05-Sep-2013 |
jhb |
Use LIST_FOREACH_SAFE() instead of doing it by hand.
|
255248 |
05-Sep-2013 |
jhb |
Use an unsigned long when indexing into mfchashtbl[] and mf6ctable[]. This matches the types used when computing hash indices and the type of the maximum size of mfchashtbl[].
PR: kern/181821 Submitted by: Sven-Thorsten Dietrich <sven@vyatta.com> (IPv4) MFC after: 1 week
|
255235 |
05-Sep-2013 |
ae |
Remove unused code and sort variables declarations.
PR: kern/181822 MFC after: 1 week
|
255190 |
03-Sep-2013 |
tuexen |
Remove redundant field pr_sctp_on.
MFC after: 1 week
|
255162 |
02-Sep-2013 |
tuexen |
Use uint16_t instead of in_port_t for consistency with the SCTP code.
MFC after: 1 week
|
255160 |
02-Sep-2013 |
tuexen |
All changes affect only SCTP-AUTH:
* Remove non working code related to SHA224.
* Remove support for non-standardised HMAC-IDs using SHA384 and SHA512.
* Prefer SHA256 over SHA1.
* Minor cleanup.
MFC after: 2 weeks
|
255010 |
28-Aug-2013 |
np |
Merge r254336 from user/np/cxl_tuning.
Add a last-modified timestamp to each LRO entry and provide an interface to flush all inactive entries. Drivers decide when to flush and what the inactivity threshold should be.
Network drivers that process an rx queue to completion can enter a livelock type situation when the rate at which packets are received reaches equilibrium with the rate at which the rx thread is processing them. When this happens the final LRO flush (normally when the rx routine is done) does not occur. Pure ACKs and segments with total payload < 64K can get stuck in an LRO entry. Symptoms are that TCP tx-mostly connections' performance falls off a cliff during heavy, unrelated rx on the interface.
Flushing only inactive LRO entries works better than any of these alternates that I tried:
- don't LRO pure ACKs
- flush _all_ LRO entries periodically (every 'x' microseconds or every 'y' descriptors)
- stop rx processing in the driver periodically and schedule remaining work for later.
Reviewed by: andre
|
254925 |
26-Aug-2013 |
jhb |
Remove most of the remaining sysctl name list macros. They were only ever intended for use in sysctl(8) and it has not used them for many years.
Reviewed by: bde Tested by: exp-run by bdrewery
|
254893 |
26-Aug-2013 |
markj |
The second last argument of udp:::receive is supposed to contain the connection state, not the IP header.
X-MFC with: r254889
|
254889 |
25-Aug-2013 |
markj |
Implement the ip, tcp, and udp DTrace providers. The probe definitions use dynamic translation so that their arguments match the definitions for these providers in Solaris and illumos. Thus, existing scripts for these providers should work unmodified on FreeBSD.
Tested by: gnn, hiren MFC after: 1 month
|
254854 |
25-Aug-2013 |
tuexen |
Provide human readable debug output.
|
254834 |
25-Aug-2013 |
andre |
For now limit printf(9) %x of the 64bit pkthdr.csum_flags field to 32bits. The upper 32bits are not occupied for now.
Sponsored by: The FreeBSD Foundation
|
254804 |
24-Aug-2013 |
andre |
Restructure the mbuf pkthdr to make it fit for upcoming capabilities and features. The changes in particular are:
o Remove rarely used "header" pointer and replace it with a 64bit protocol/ layer specific union PH_loc for local use. Protocols can flexibly overlay their own 8 to 64 bit fields to store information while the packet is worked on.
o Mechanically convert IP reassembly, IGMP/MLD and ATM to use pkthdr.PH_loc instead of pkthdr.header.
o Extend csum_flags to 64bits to allow for additional future offload information to be carried (e.g. iSCSI, IPsec offload, and others).
o Move the RSS hash type enumerator from abusing m_flags to its own 8bit rsstype field. Adjust accessor macros.
o Add cosqos field to store Class of Service / Quality of Service information with the packet. It is not yet supported in any drivers but allows us to get on par with Cisco/Juniper in routing applications (plus MPLS QoS) with a modernized ALTQ.
o Add four 8 bit fields l[2-5]hlen to store the relative header offsets from the start of the packet. This is important for various offload capabilities and to relieve the drivers from having to parse the packet and protocol headers to find out location of checksums and other information. Header parsing in drivers is a lot of copy-paste and unhandled corner cases which we want to avoid.
o Add another flexible 64bit union to map various additional persistent packet information, like ether_vtag, tso_segsz and csum fields. Depending on the csum_flags settings some fields may have different usage making it very flexible and adaptable to future capabilities.
o Restructure the CSUM flags to better signify their outbound (down the stack) and inbound (up the stack) use. The CSUM flags used to be a bit chaotic and rather poorly documented leading to incorrect use in many places. Bring clarity into their use through better naming. Compatibility mappings are provided to preserve the API. The drivers can be corrected one by one and MFC'd without issue.
o The size of pkthdr stays the same at 48/56bytes (32/64bit architectures).
Sponsored by: The FreeBSD Foundation
|
254672 |
22-Aug-2013 |
tuexen |
Export the inpcb features as a 64-bit entity. Bump __FreeBSD_version to 1000048 since the modified structure is user visible and used by netstat, for example.
|
254670 |
22-Aug-2013 |
tuexen |
Also make the features of the association 64-bit. When exporting to xinpcb, just export the lower 32 bits. Using 64 bits there as well will break the ABI and will be committed separately.
MFC after: 2 weeks X-MFC with: 254248
|
254629 |
22-Aug-2013 |
delphij |
Fix an integer overflow in computing the size of a temporary buffer, which can result in a buffer that is too small for the requested operation.
Security: CVE-2013-3077 Security: FreeBSD-SA-13:09.ip_multicast
|
254527 |
19-Aug-2013 |
andre |
Reorder the mbuf defines to make more sense and group related flags together.
Add M_FLAG_PRINTF for use with printf(9) %b identifier.
Use the generic mbuf flags print names in the net80211 code and adjust the protocol specific bits for their new positions.
Change SCTP M_PROTO mapping from 5 to 1 to fit within the 16bit field they use internally to store some additional information.
Discussed with: trociny, glebius
|
254523 |
19-Aug-2013 |
andre |
Add m_clrprotoflags() to clear protocol specific mbuf flags at up and downwards layer crossings.
Consistently use it within IP, IPv6 and ethernet protocols.
Discussed with: trociny, glebius
|
254521 |
19-Aug-2013 |
andre |
Move the SCTP specific definition of M_NOTIFICATION onto a protocol specific mbuf flag from sys/mbuf.h to netinet/sctp_os_bsd.h. It is only relevant within SCTP.
Discussed with: tuexen
|
254519 |
19-Aug-2013 |
andre |
Move the global M_SKIP_FIREWALL mbuf flags to a protocol layer specific flag instead. The flag is only used within the IP and IPv6 layer 3 protocols.
Because some firewall packages treat IPv4 and IPv6 packets the same the flag should have the same value for both.
Discussed with: trociny, glebius
|
254518 |
19-Aug-2013 |
andre |
Move ip_reassemble()'s use of the global M_FRAG mbuf flag to a protocol layer specific flag instead. The flag is only relevant while the packet stays in the IP reassembly queue.
Discussed with: trociny, glebius
|
254517 |
19-Aug-2013 |
andre |
Remove unused M_FRAG, M_FIRSTFRAG and M_LASTFRAG tagging from ip_fragment(). There wasn't any real driver (and hardware) support for it. Modern hardware does full fragmentation/segmentation offload instead.
|
254350 |
15-Aug-2013 |
markj |
Specify SDT probe argument types in the probe definition itself rather than using SDT_PROBE_ARGTYPE(). This will make it easy to extend the SDT(9) API to allow probes with dynamically-translated types.
There is no functional change.
MFC after: 2 weeks
|
254338 |
14-Aug-2013 |
tuexen |
Don't send uninitialized memory (two instances of 4 bytes) in every cookie on the wire. This bug was reported in https://bugzilla.mozilla.org/show_bug.cgi?id=905080
MFC after: 3 days
|
254292 |
13-Aug-2013 |
trociny |
Virtualize carp(4) variables to have per vnet control.
Reviewed by: ae, glebius
|
254248 |
12-Aug-2013 |
tuexen |
Make the features a 64-bit value instead of 32-bit. This will allow an easier integration of the support for NDATA. While there, do also some minor cleanups. Obtained from: rrs@ MFC after: 2 weeks
|
253858 |
01-Aug-2013 |
tuexen |
Micro-optimization suggested in https://bugzilla.mozilla.org/show_bug.cgi?id=898234 by pchang9. While there simplify the code.
MFC after: 1 week
|
253571 |
23-Jul-2013 |
ae |
Remove the large part of struct ipsecstat. Only a few fields of this structure are used, and they already have equivalent fields in struct newipsecstat, which was introduced with FAST_IPSEC and then merged with the old ipsecstat structure.
This fixes a kernel stack overflow on some architectures after the migration of ipsecstat to PCPU counters.
Reported by: Taku YAMAMOTO, Maciej Milewski
|
253493 |
20-Jul-2013 |
tuexen |
Allow the code to be compiled without warnings for any combination of INET, INET6 and SCTP_DEBUG defines. The issue was reported by Lally Singh.
MFC after: 2 weeks
|
253472 |
19-Jul-2013 |
tuexen |
Get the code compiling without INET and INET6 being defined. This is not possible in FreeBSD, but in the upstream code.
MFC after: 2 weeks
|
253395 |
16-Jul-2013 |
andre |
Free the non-fatal "timestamp missing" debug string manually as it is not covered by the catch-all free for the error cases.
Found by: Coverity
|
253282 |
12-Jul-2013 |
trociny |
A complete duplication of binding should be allowed if on both new and duplicated sockets a multicast address is bound and either SO_REUSEPORT or SO_REUSEADDR is set.
But actually it works for the following combinations:
* SO_REUSEPORT is set for the first socket and SO_REUSEPORT for the new;
* SO_REUSEADDR is set for the first socket and SO_REUSEADDR for the new;
* SO_REUSEPORT is set for the first socket and SO_REUSEADDR for the new;
and fails for this:
* SO_REUSEADDR is set for the first socket and SO_REUSEPORT for the new.
Fix the last case.
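The intended rule, as stated at the top of this message, can be written as a small predicate. This is a sketch of the rule only, not the in_pcb code: the flag constants X_REUSEADDR and X_REUSEPORT and the function dup_bind_allowed() are hypothetical names chosen for the illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits standing in for the cached socket options. */
#define	X_REUSEADDR	0x1
#define	X_REUSEPORT	0x2

/*
 * A complete duplicate binding is allowed only if both sockets are bound
 * to a multicast address and each has SO_REUSEADDR or SO_REUSEPORT set.
 * Which of the two options each socket uses must not matter.
 */
static bool
dup_bind_allowed(bool old_mcast, int old_flags, bool new_mcast, int new_flags)
{
	const int reuse = X_REUSEADDR | X_REUSEPORT;

	assert((old_flags & ~reuse) == 0 && (new_flags & ~reuse) == 0);
	return (old_mcast && new_mcast &&
	    (old_flags & reuse) != 0 && (new_flags & reuse) != 0);
}
```

All four combinations listed above, including the previously failing one, evaluate to true under this predicate.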
PR: 179901 MFC after: 1 month
|
253254 |
12-Jul-2013 |
andre |
Unbreak VIMAGE by correctly naming the vnet pointer in struct tcp_syncache.
Reported by: trociny, rodrigc
|
253210 |
11-Jul-2013 |
andre |
Improve SYN cookies by encoding the MSS, WSCALE (window scaling) and SACK information into the ISN (initial sequence number) without the additional use of timestamp bits and switching to the very fast and cryptographically strong SipHash-2-4 MAC hash algorithm to protect the SYN cookie against forgeries.
The purpose of SYN cookies is to encode all necessary session state in the 32 bits of our initial sequence number to avoid storing any information locally in memory. This is especially important when under heavy spoofed SYN attacks where we would either run out of memory or the syncache would fill with bogus connection attempts swamping out legitimate connections.
The original SYN cookies method only stored indexed MSS values in the cookie. This isn't sufficient anymore and breaks down in the presence of WSCALE information which is only exchanged during SYN and SYN-ACK. If we can't keep track of it then we may severely underestimate the available send or receive window. This is compounded with large windows whose size information on the TCP segment header is even lower numerically. A number of years back SYN cookies were extended to store the additional state in the TCP timestamp fields, if available on a connection. While timestamps are common among the BSD, Linux and other *nix systems, Windows never enabled them by default and thus they are not present for the vast majority of clients seen on the Internet.
The common parameters used on TCP sessions have changed quite a bit since SYN cookies were invented some 17 years ago. Today we have a lot more bandwidth available, making the use of window scaling almost mandatory. Also SACK has become standard, making recovery from packet loss much more efficient.
This change moves all necessary information into the ISS removing the need for timestamps. Both the MSS (16 bits) and send WSCALE (4 bits) are stored in 3 bit indexed form together with a single bit for SACK. While this is significantly less than the original range, it is sufficient to encode all common values with minimal rounding.
The MSS depends on the MTU of the path and with the dominance of ethernet the main value seen is around 1460 bytes. Encapsulations for DSL lines and some other overheads reduce it by a few more bytes for many connections seen. Rounding down to the next lower value in some cases isn't a problem as we send only slightly more packets for the same amount of data.
The send WSCALE index is a bit more tricky as rounding down under-estimates the send space available towards the remote host; however, a small number of values dominate and are carefully selected again.
The receive WSCALE isn't encoded at all but recalculated based on the local receive socket buffer size when a valid SYN cookie returns. A listen socket buffer size is unlikely to change while active.
The index values for MSS and WSCALE are selected for minimal rounding errors based on large traffic surveys. These values have to be periodically validated against newer traffic surveys adjusting the arrays tcp_sc_msstab[] and tcp_sc_wstab[] if necessary.
In addition the hash MAC to protect the SYN cookies is changed from MD5 to SipHash-2-4, a much faster and cryptographically secure algorithm.
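The indexed encoding with rounding down can be illustrated as below. This is a sketch under assumptions, not the syncache code: the table contents are illustrative placeholders rather than a copy of tcp_sc_msstab[] and tcp_sc_wstab[], the bit layout (MSS index in bits 0-2, WSCALE index in bits 3-5, SACK in bit 6) is made up for the example, and the SipHash MAC that fills the rest of the ISN is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define	SC_TABSZ	8	/* 3 bit index */

/* Illustrative values only; the kernel tables come from traffic surveys. */
static const uint16_t sc_msstab[SC_TABSZ] =
    { 216, 536, 1200, 1360, 1400, 1440, 1452, 1460 };
static const uint16_t sc_wstab[SC_TABSZ] =
    { 0, 0, 1, 2, 4, 6, 7, 8 };

/*
 * Return the highest index whose table value does not exceed v, i.e.
 * round down to the next lower entry.  Values below the first entry
 * are clamped to index 0.
 */
static unsigned
sc_round_down(const uint16_t tab[SC_TABSZ], uint16_t v)
{
	for (unsigned i = SC_TABSZ; i-- > 0; )
		if (tab[i] <= v)
			return (i);
	return (0);
}

/* Pack the three parameters into 7 bits (hypothetical layout). */
static uint8_t
sc_pack(uint16_t mss, uint8_t wscale, bool sack)
{
	assert(wscale <= 14);	/* largest shift TCP window scaling allows */
	return ((uint8_t)(sc_round_down(sc_msstab, mss) |
	    (sc_round_down(sc_wstab, wscale) << 3) |
	    (sack ? 1u << 6 : 0)));
}

static uint16_t
sc_mss(uint8_t c)
{
	return (sc_msstab[c & 7]);
}

static uint8_t
sc_wscale(uint8_t c)
{
	return ((uint8_t)sc_wstab[(c >> 3) & 7]);
}

static bool
sc_sack(uint8_t c)
{
	return ((c & (1u << 6)) != 0);
}
```

With these placeholder tables an MSS of 1448 decodes as 1440, which matches the "slightly more packets for the same amount of data" trade-off described above.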
Reviewed by: dwmalone Tested by: Fabian Keil <fk@fabiankeil.de>
|
253150 |
10-Jul-2013 |
andre |
Extend debug logging of TCP timestamp related specification violations.
Update related comments and style.
|
253099 |
09-Jul-2013 |
tuexen |
Use IPSECSTAT_INC() and IPSEC6STAT_INC() macros for ipsec statistics accounting.
X-MFC with: r252026
|
253087 |
09-Jul-2013 |
ae |
Migrate struct carpstats to PCPU counters.
|
253086 |
09-Jul-2013 |
ae |
Migrate structs in6_ifstat and icmp6_ifstat to PCPU counters.
|
253085 |
09-Jul-2013 |
ae |
Migrate structs ip6stat, icmp6stat and rip6stat to PCPU counters.
|
253084 |
09-Jul-2013 |
ae |
Migrate structs arpstat, icmpstat, mrtstat, pimstat and udpstat to PCPU counters.
|
253083 |
09-Jul-2013 |
ae |
Use new macros to implement ipstat and tcpstat using PCPU counters. Change the interface of kread_counters() to be similar to kread() in netstat(1).
|
253081 |
09-Jul-2013 |
ae |
Prepare network statistics structures for migration to PCPU counters. Use uint64_t as type for all fields of structures.
Changed structures: ahstat, arpstat, espstat, icmp6_ifstat, icmp6stat, in6_ifstat, ip6stat, ipcompstat, ipipstat, ipsecstat, mrt6stat, mrtstat, pfkeystat, pim6stat, pimstat, rip6stat, udpstat.
Discussed with: arch@
|
252779 |
05-Jul-2013 |
tuexen |
Fix a bug where only 2048 streams were usable even though more than 2048 streams were negotiated on the wire. While there, remove the hard coded limit of 2048 streams.
MFC after: 3 days
|
252718 |
04-Jul-2013 |
tuexen |
When processing an incoming ABORT, SHUTDOWN_COMPLETE or ERROR (NAT related) chunk, always take the T-bit into account when checking the verification tag.
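As a reminder of what the T-bit means for the verification tag check (RFC 4960, section 8.5.1), a minimal sketch follows. The function vtag_ok() and its parameters are hypothetical and do not mirror the SCTP stack's internal structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: with the T-bit clear the packet must carry our own
 * tag; with the T-bit set the sender reflected a tag, so it must carry
 * the peer's tag instead.
 */
static bool
vtag_ok(uint32_t pkt_vtag, bool t_bit, uint32_t my_vtag, uint32_t peer_vtag)
{
	/* Tags of an established association are never 0. */
	assert(my_vtag != 0 && peer_vtag != 0);
	return (pkt_vtag == (t_bit ? peer_vtag : my_vtag));
}
```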
MFC after: 3 days
|
252710 |
04-Jul-2013 |
trociny |
In r227207, to fix the issue with possible NULL inp_socket pointer dereferencing, when checking for SO_REUSEPORT option (and SO_REUSEADDR for multicast), INP_REUSEPORT flag was introduced to cache the socket option. It was decided then that one flag would be enough to cache both SO_REUSEPORT and SO_REUSEADDR: when processing SO_REUSEADDR setsockopt(2), it was checked if it was called for a multicast address and INP_REUSEPORT was set accordingly.
Unfortunately that approach does not work when setsockopt(2) is called before binding to a multicast address: the multicast check fails and INP_REUSEPORT is not set.
Fix this by adding INP_REUSEADDR flag to unconditionally cache SO_REUSEADDR.
PR: 179901 Submitted by: Michael Gmelin freebsd grem.de (initial version) Reviewed by: rwatson MFC after: 1 week
|
252585 |
03-Jul-2013 |
tuexen |
Code cleanups.
MFC after: 3 days
|
252577 |
03-Jul-2013 |
np |
Catch up with r238990. LLE_DELETED does not clobber everything else in la_flags since said revision.
|
252510 |
02-Jul-2013 |
hrs |
Fix a panic when leaving MC group in a kernel with VIMAGE enabled. in_leavegroup() is called from an asynchronous task, and igmp_change_state() requires that curvnet is set by the caller.
|
252504 |
02-Jul-2013 |
lstewart |
Import an implementation of the CAIA Delay-Gradient (CDG) congestion control algorithm, which is based on the 2011 v0.1 patch release and described in the paper "Revisiting TCP Congestion Control using Delay Gradients" by David Hayes and Grenville Armitage. It is implemented as a kernel module compatible with the modular congestion control framework.
CDG is a hybrid congestion control algorithm which reacts to both packet loss and inferred queuing delay. It attempts to operate as a delay-based algorithm where possible, but utilises heuristics to detect loss-based TCP cross traffic and will compete effectively as required. CDG is therefore incrementally deployable and suitable for use on shared networks.
In collaboration with: David Hayes <david.hayes at ieee.org> and Grenville Armitage <garmitage at swin edu au> MFC after: 4 days Sponsored by: Cisco University Research Program and FreeBSD Foundation
|
252055 |
21-Jun-2013 |
glebius |
Fix kmod_*stat_inc() after r249276. The incorrect code actually increased the pointer, not the memory it points to.
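The class of mistake is easy to reproduce in isolation. The two functions below are a generic illustration with hypothetical names; they are not the kmod_*stat_inc() code and do not use counter(9).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Buggy shape: advances the local pointer, the counter stays untouched. */
static void
stat_inc_buggy(uint64_t *counter)
{
	counter++;
	(void)counter;
}

/* Fixed shape: increments the memory the pointer refers to. */
static void
stat_inc_fixed(uint64_t *counter)
{
	assert(counter != NULL);
	(*counter)++;
}
```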
In collaboration with: kib Reported & tested by: Ian FREISLICH <ianf clue.co.za> Sponsored by: Nginx, Inc.
|
252026 |
20-Jun-2013 |
ae |
Use IPSECSTAT_INC() and IPSEC6STAT_INC() macros for ipsec statistics accounting.
MFC after: 2 weeks
|
251502 |
07-Jun-2013 |
bms |
Disable IGMPv3 link timers on a transition to IGMPv2.
Submitted by: Alan Smithee
|
251296 |
03-Jun-2013 |
andre |
Allow drivers to specify a maximum TSO length in bytes if they are limited in the amount of data they can handle at once.
Drivers can set ifp->if_hw_tsomax before calling ether_ifattach() to change the limit.
The lowest allowable size is IP_MAXPACKET / 8 (8192 bytes) as anything less wouldn't be very useful anymore. The upper limit is still at IP_MAXPACKET (65536 bytes). Raising it requires further auditing of the IPv4/v6 code paths as the length field in the IP header would overflow, leading to confusion in firewalls and other packet handlers about the real size of the packet.
The placement into "struct ifnet" is a bit hackish but the best place that was found. When the stack/driver boundary is updated it should be handled in a better way.
Submitted by: cperciva (earlier version) Reviewed by: cperciva Tested by: cperciva MFC after: 1 week (using spare struct members to preserve ABI)
|
251248 |
02-Jun-2013 |
tuexen |
Use LIST_EMPTY when appropriate.
MFC after: 1 week
|
251054 |
28-May-2013 |
tuexen |
Remove redundant checks.
MFC after: 2 weeks
|
250962 |
24-May-2013 |
tuexen |
Withdraw http://svnweb.freebsd.org/changeset/base/250809 since the real fix is in http://svnweb.freebsd.org/changeset/base/250952.
|
250809 |
19-May-2013 |
tuexen |
Initialize the fibnum for outgoing packets to 0. This avoids crashing due to the usage of uninitialized fibnum. This bug became visible after http://svnweb.freebsd.org/changeset/base/250700
MFC after: 2 weeks
|
250756 |
17-May-2013 |
tuexen |
Set errno to ETIMEDOUT if an SCTP association times out during setup.
MFC after: 1 week
|
250754 |
17-May-2013 |
tuexen |
Don't send an ABORT chunk with verification tag 0.
MFC after: 1 week
|
250613 |
13-May-2013 |
jimharris |
Fix typo in net.inet.tcp.minmss sysctl description.
MFC after: 3 days
|
250523 |
11-May-2013 |
hrs |
Add IFF_MONITOR support to gre(4).
Tested by: Chip Marshall MFC after: 1 week
|
250504 |
11-May-2013 |
glebius |
Rate limit the number of remotely triggered ARP log messages to 1 log message per second.
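A one-message-per-second limiter can be sketched as below. This is a standalone illustration with hypothetical names (struct ratelimit, ratelimit_ok()); it takes the current time as an argument so it can be exercised without a clock, and it is not the mechanism the kernel actually uses for this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limiter state; zero-initialize before first use. */
struct ratelimit {
	int64_t last_sec;	/* second in which we last logged */
	bool armed;		/* false until the first message */
};

/* Return true if a message may be logged at time now_sec. */
static bool
ratelimit_ok(struct ratelimit *rl, int64_t now_sec)
{
	assert(rl != NULL);
	if (rl->armed && now_sec - rl->last_sec < 1)
		return (false);
	rl->armed = true;
	rl->last_sec = now_sec;
	return (true);
}
```

A caller would wrap the log call in `if (ratelimit_ok(&rl, now))`, so a flood of remotely triggered events produces at most one line per second.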
|
250466 |
10-May-2013 |
tuexen |
Honor the net.inet6.ip6.v6only sysctl variable and the IPV6_V6ONLY socket option for SCTP sockets in the same way as for UDP or TCP sockets.
MFC after: 2 weeks
|
250300 |
06-May-2013 |
andre |
Back out r249318, r249320 and r249327 due to a heisenbug most likely related to a race condition in the ipi_hash_lock with the exact cause currently unknown but under investigation.
|
250251 |
04-May-2013 |
hrs |
Use FF02:0:0:0:0:2:FF00::/104 prefix for IPv6 Node Information Group Address. Although KAME implementation used FF02:0:0:0:0:2::/96 based on older versions of draft-ietf-ipngwg-icmp-name-lookup, it has been changed in RFC 4620.
The kernel always joins the /104-prefixed address, and additionally joins the /96-prefixed one only when net.inet6.icmp6.nodeinfo_oldmcprefix=1. The default value of the sysctl is 1.
ping6(8) -N flag now uses /104-prefixed one. When this flag is specified twice, it uses /96-prefixed one instead.
Reviewed by: ume Based on work by: Thomas Scheffler PR: conf/174957 MFC after: 2 weeks
|
250000 |
27-Apr-2013 |
cperciva |
Move IPPROTO_IPV6 from #ifdef __BSD_VISIBLE to #if __POSIX_VISIBLE >= 201112 since POSIX 2001 states that it shall be defined.
Reported by: sbruno Reviewed by: jilles MFC after: 1 week
|
249925 |
26-Apr-2013 |
glebius |
Add const qualifier to the dst parameter of the ifnet if_output method.
|
249903 |
25-Apr-2013 |
glebius |
Fix a couple of mbuf leaks in incoming ARP processing.
|
249894 |
25-Apr-2013 |
glebius |
Introduce a pointer to const variable gw, which points either at the same place as dst, or to the sockaddr in the routing table.
The const constraint of gw makes us safe from modifying the routing table accidentally. And the "constantness" of dst allows us to remove several bandaids where we switched it back to &ro->ro_dst; now it always points there.
Reviewed by: rrs
|
249848 |
24-Apr-2013 |
rrs |
This fixes the issue with the "randomly changing" default route. What it was is there are two places in ip_output.c where we do a goto again. One place was fine, it copies out the new address and then resets dst = ro->rt_dst; But the other place does *not* do that, which means earlier when we found the gateway, we have dst pointing there aka dst = ro->rt_gateway is done.. then we do a goto again.. bam now we clobber the default route.
The fix is just to move the again so we are always doing dst = &ro->rt_dst; in the again loop.
PR: 174749,157796 MFC after: 1 week
|
249809 |
23-Apr-2013 |
andre |
When doing RFC3042 limited transmit on the first or second duplicate ACK make sure we actually have new data to send. This prevents us from sending unnecessary pure ACKs.
Reported by: Matt Miller <matt@matthewjmiller.net> Tested by: Matt Miller <matt@matthewjmiller.net> MFC after: 2 weeks
|
249742 |
21-Apr-2013 |
oleg |
Plug static llentry leak (ipv4 & ipv6 were affected).
PR: kern/172985 MFC after: 1 month
|
249585 |
17-Apr-2013 |
gabor |
- Correct misspellings of the word useful
Submitted by: Christoph Mallon <christoph.mallon@gmx.de> (via private mail)
|
249562 |
16-Apr-2013 |
delphij |
Fix incomplete printf.
PR: kern/177889 Submitted by: Sven-Thorsten Dietrich <sven vyatta com> MFC after: 1 week
|
249559 |
16-Apr-2013 |
delphij |
Don't leak lock when returning.
PR: kern/177888 Submitted by: Sven-Thorsten Dietrich <sven vyatta com> MFC after: 1 week
|
249411 |
12-Apr-2013 |
ae |
Reflect removing of the counter_u64_subtract() function in the macro.
|
249372 |
11-Apr-2013 |
glebius |
Fix tcp_output() so that tcpcb is updated in the same manner when an mbuf allocation fails, as in a case when ip_output() returns error.
To achieve that, move large block of code that updates tcpcb below the out: label.
This fixes a panic, that requires the following sequence to happen:
1) The SYN was sent to the network, tp->snd_nxt = iss + 1, tp->snd_una = iss
2) The retransmit timeout happened for the SYN we had sent, tcp_timer_rexmt() sets tp->snd_nxt = tp->snd_una, and calls tcp_output(). In tcp_output m_get() fails.
3) Later on the SYN|ACK for the SYN sent in step 1) came, tcp_input sets tp->snd_una += 1, which leads to tp->snd_una > tp->snd_nxt inconsistency, that later panics in socket buffer code.
For reference, this bug was fixed in the DragonFly BSD repo:
http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/1ff9b7d322dc5a26f7173aa8c38ecb79da80e419
Reviewed by: andre Tested by: pho Sponsored by: Nginx, Inc. PR: kern/177456 Submitted by: HouYeFei&XiBoLiu <lglion718 163.com>
|
249327 |
10-Apr-2013 |
glebius |
Fix build.
|
249318 |
09-Apr-2013 |
andre |
Change certain heavily used network related mutexes and rwlocks to reside on their own cache line to prevent false sharing with other nearby structures, especially for those in the .bss segment.
NB: Those mutexes and rwlocks with variables next to them that get changed on every invocation do not benefit from their own cache line. Actually it may be net negative because two cache misses would be incurred in those cases.
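A minimal sketch of the idea, using C11 `alignas` as a stand-in for whatever alignment macro the kernel uses. The 64-byte line size and the `lock_word`, `padded_lock`, and `stats_split` types are assumptions for illustration only.

```c
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE 64   /* assumed cache line size */

/* Hypothetical lock representation; a real mutex is larger. */
struct lock_word {
    volatile int owner;
};

/* The lock is aligned to, and padded out to, a full cache line. */
struct padded_lock {
    alignas(CACHE_LINE) struct lock_word lock;
};

/*
 * Data that follows the padded lock starts on the next cache line, so
 * writes to hits do not invalidate the line that holds the lock.
 */
struct stats_split {
    struct padded_lock lock;
    long hits;
};
```

As the NB above points out, this layout helps only when the neighbouring data is not modified together with the lock on every invocation.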
|
249317 |
09-Apr-2013 |
andre |
Fix a race condition on tcp listen socket teardown with pending connections in the accept queue and contiguous new incoming SYNs.
Compared to the original submitters patch I've moved the test next to the SYN handling to have it together in a logical unit and reworded the comment explaining the issue.
Submitted by: Matt Miller <matt@matthewjmiller.net> Submitted by: Juan Mojica <jmojica@gmail.com> Reviewed by: Matt Miller (changes) Tested by: pho MFC after: 1 week
|
249302 |
09-Apr-2013 |
glebius |
Fix VIMAGE build.
|
249294 |
09-Apr-2013 |
ae |
Use IP6STAT_INC/IP6STAT_DEC macros to update ip6 stats.
MFC after: 1 week
|
249276 |
08-Apr-2013 |
glebius |
Merge from projects/counters: TCP/IP stats.
Convert 'struct ipstat' and 'struct tcpstat' to counter(9).
This speeds up IP forwarding at extreme packet rates, and makes accounting more precise.
Sponsored by: Nginx, Inc.
|
248953 |
31-Mar-2013 |
tuexen |
Add a macro for checking for IPv4 link local addresses.
MFC after: 1 week
|
248914 |
29-Mar-2013 |
emaste |
Keep fwd_tag around for subsequent pcb lookups
For TIMEWAIT handling tcp_input may have to jump back for an additional pass through pcblookup. Prior to this change the fwd_tag had been discarded after the first lookup, so a new connection attempt delivered locally via 'ipfw fwd' would fail to find a match.
As of r248886 the tag will be detached and freed when passed to the socket buffer.
|
248552 |
20-Mar-2013 |
melifaro |
Add ipfw support for setting/matching DiffServ codepoints (DSCP).
Setting DSCP support is done via O_SETDSCP which works for both IPv4 and IPv6 packets. Fast checksum recalculation (RFC 1624) is done for IPv4. Dscp can be specified by name (AFXY, CSX, BE, EF), by value (0..63) or via tablearg.
Matching DSCP is done via another opcode (O_DSCP) which accepts several classes at once (af11,af22,be). Classes are stored in bitmask (2 u32 words).
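For illustration, the incremental update from RFC 1624 (equation 3, HC' = ~(~HC + ~m + m')) can be sketched as below. The helper names are hypothetical and the words are handled as host-order values; this is not the ipfw code itself.

```c
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit one's complement sum. */
static uint16_t
fold16(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return ((uint16_t)sum);
}

/* RFC 1624 eq. 3: adjust checksum hc after one 16-bit word changed. */
static uint16_t
cksum_adjust(uint16_t hc, uint16_t old_word, uint16_t new_word)
{
    uint32_t sum;

    sum = (uint16_t)~hc;
    sum += (uint16_t)~old_word;
    sum += new_word;
    return ((uint16_t)~fold16(sum));
}

/* Full checksum over n 16-bit words, used here only for comparison. */
static uint16_t
cksum_full(const uint16_t *w, int n)
{
    uint32_t sum = 0;

    for (int i = 0; i < n; i++)
        sum += w[i];
    return ((uint16_t)~fold16(sum));
}

/*
 * Replace the 6-bit DSCP in the first header word (version, IHL, TOS)
 * while keeping the two ECN bits.
 */
static uint16_t
set_dscp(uint16_t word0, uint8_t dscp)
{
    return ((uint16_t)((word0 & 0xff03) | ((dscp & 0x3f) << 2)));
}
```

Only the word that contains the TOS byte enters the adjustment, so the cost does not depend on the header length.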
Many people made their variants of this patch, the ones I'm aware of are (in alphabetic order):
Dmitrii Tejblum Marcelo Araujo Roman Bogorodskiy (novel) Sergey Matveichuk (sem) Sergey Ryabin
PR: kern/102471, kern/121122 MFC after: 2 weeks
|
248416 |
17-Mar-2013 |
glebius |
In m_megapullup() instead of reserving some space at the end of packet, m_align() it, reserving space to prepend data.
Reviewed by: mav
|
248373 |
16-Mar-2013 |
glebius |
- Replace compat macros with function calls.
|
248326 |
15-Mar-2013 |
glebius |
We can, and should use M_WAITOK here.
Sponsored by: Nginx, Inc.
|
248324 |
15-Mar-2013 |
glebius |
Use m_get/m_gethdr instead of compat macros.
Sponsored by: Nginx, Inc.
|
248323 |
15-Mar-2013 |
glebius |
- Use m_getcl() instead of hand allocating.
Sponsored by: Nginx, Inc.
|
248207 |
12-Mar-2013 |
glebius |
Functions m_getm2() and m_get2() have different order of arguments, and that can drive someone crazy. While m_get2() is young and not documented yet, change its order of arguments to match m_getm2().
Sorry for churn, but better now than later.
|
248158 |
11-Mar-2013 |
glebius |
Remove LIBALIAS_LOCK_ASSERT(), including a couple with an uninitialized argument, in code that isn't compiled in kernel.
PR: kern/176667 Sponsored by: Nginx, Inc.
|
247906 |
07-Mar-2013 |
lstewart |
The hashmask returned by hashinit() is a valid index in the returned hash array. Fix a siftr(4) potential memory leak and INVARIANTS triggered kernel panic in hashdestroy() by ensuring the last array index in the flow counter hash table is flushed of entries.
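The off-by-one can be sketched as follows. `struct bucket` and both flush helpers are hypothetical; the point is only that a table with mask `hashmask` has `hashmask + 1` buckets, so the loop bound must be inclusive.

```c
/* Hypothetical bucket that only counts its entries. */
struct bucket {
    int nentries;
};

/* Correct: visits buckets 0..hashmask inclusive. */
static int
flush_all(struct bucket *tbl, unsigned long hashmask)
{
    int flushed = 0;

    for (unsigned long i = 0; i <= hashmask; i++) {
        flushed += tbl[i].nentries;
        tbl[i].nentries = 0;
    }
    return (flushed);
}

/* Buggy: stops one bucket early and leaves tbl[hashmask] populated. */
static int
flush_all_buggy(struct bucket *tbl, unsigned long hashmask)
{
    int flushed = 0;

    for (unsigned long i = 0; i < hashmask; i++) {
        flushed += tbl[i].nentries;
        tbl[i].nentries = 0;
    }
    return (flushed);
}
```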
MFC after: 3 days
|
247777 |
04-Mar-2013 |
davide |
- Make callout(9) tickless, relying on eventtimers(4) as the backend for precise time event generation. This greatly improves the granularity of callouts, which are no longer constrained to wait for the next tick to be scheduled.
- Extend the callout KPI by introducing a set of callout_reset_sbt* functions, which take an sbintime_t as the timeout argument. The new KPI also offers a way for consumers to specify the precision tolerance they allow, so that callout can coalesce events and reduce the number of interrupts as well as potentially avoid scheduling a SWI thread.
- Introduce support for dispatching callouts directly from hardware interrupt context, by specifying an additional flag. This feature should be used carefully, since interrupt context has some limitations (e.g. no sleeping locks can be held).
- Enhance the mechanisms to gather information about the callwheel, introducing a new sysctl to obtain stats.
This change breaks the KBI. The struct callout fields have been changed; in particular 'int ticks' (4 bytes) has been replaced with 'sbintime_t' (8 bytes) and another 'sbintime_t' field was added for precision.
Together with: mav Reviewed by: attilio, bde, luigi, phk Sponsored by: Google Summer of Code 2012, iXsystems inc. Tested by: flo (amd64, sparc64), marius (sparc64), ian (arm), markj (amd64), mav, Fabian Keil
|
247412 |
27-Feb-2013 |
tuexen |
Fix a potential race in setting errno when an association goes down. Reported by Mozilla in https://bugzilla.mozilla.org/show_bug.cgi?id=845513
MFC after: 3 days
|
247104 |
21-Feb-2013 |
gallatin |
Fix tcp_lro_rx_ipv4() for drivers that do not set CSUM_IP_CHECKED. Specifically, in_cksum_hdr() returns 0 (not 0xffff) when the IPv4 checksum is correct. Without this fix, the tcp_lro code will reject good IPv4 traffic from drivers that do not implement IPv4 header hardware csum offload.
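Why the expected value is 0 can be seen with a plain Internet checksum routine. The sketch below uses hypothetical helpers, not in_cksum_hdr() itself: summing a header that already carries a correct checksum gives 0xffff before the final complement, hence 0 after it.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Internet checksum over a header of len bytes (len must be even).
 * Returns the complemented, folded one's complement sum.
 */
static uint16_t
hdr_cksum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)(hdr[i] << 8 | hdr[i + 1]);
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return ((uint16_t)~sum);
}

/* A header verifies when the result is 0, not 0xffff. */
static int
hdr_cksum_ok(const uint8_t *hdr, size_t len)
{
    return (hdr_cksum(hdr, len) == 0);
}
```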
Sponsored by: Myricom Inc.
MFC after: 7 days
|
247044 |
20-Feb-2013 |
pluknet |
ip_savecontrol() style fixes. No functional changes. - fix indentation - put the operator at the end of the line for long statements - remove spaces between the type and the variable in a cast - remove excessive parentheses
Tested by: md5
|
246687 |
11-Feb-2013 |
tuexen |
Send the adaptation layer indication only if set by the user.
MFC after: 3 days Discussed with: rrs
|
246674 |
11-Feb-2013 |
tuexen |
Don't send kernel provided information in the User Initiated ABORT cause, since the user can also provide this kind of information and the receiver then doesn't know who provided it. While there: Fix a bug where the stack would send a malformed ABORT chunk when using a send() call with SCTP_ABORT|SCTP_SENDALL flags.
MFC after: 3 days
|
246659 |
11-Feb-2013 |
glebius |
Resolve source address selection in the presence of CARP. Add a couple of helper functions:
- carp_master() - boolean function which is true if an address is in the MASTER state. - ifa_preferred() - boolean function that compares two addresses, and is aware of CARP.
Utilize ifa_preferred() in ifa_ifwithnet().
The previous version of the patch also changed source address selection logic in jails using carp_master(), but we failed to negotiate this part with Bjoern. Maybe we will approach this problem again later.
Reported & tested by: Anton Yuzhaninov <citrin citrin.ru> Sponsored by: Nginx, Inc
|
246635 |
10-Feb-2013 |
tuexen |
Make sure that received packets for removed addresses are handled consistently. While there, make variable names consistent.
MFC after: 3 days
|
246595 |
09-Feb-2013 |
tuexen |
Cleanup the handling of address scopes. Announce in the INIT/INIT-ACK only the supported address types. While there, do some whitespace cleanups.
MFC after: 1 week
|
246588 |
09-Feb-2013 |
tuexen |
Fix a bug where HEARTBEATs were still sent in SHUTDOWN_SENT or SHUTDOWN_ACK_SENT state. While there, make the corresponding code consistent.
MFC after: 1 week
|
246210 |
01-Feb-2013 |
jhb |
Add placeholder constants to reserve a portion of the socket option name space for use by downstream vendors to add custom options.
MFC after: 2 weeks
|
246208 |
01-Feb-2013 |
andre |
uma_zone_set_max() directly returns the rounded effective zone limit. Use the return value directly instead of doing a second uma_zone_set_max() step.
MFC after: 1 week
|
246144 |
31-Jan-2013 |
glebius |
- Move AUTHORS and ACKNOWLEDGEMENTS to the end of the page. - Add myself to list of authors.
|
246143 |
31-Jan-2013 |
glebius |
Retire struct sockaddr_inarp.
Since ARP and routing are separated, "proxy only" entries don't have any meaning, thus we don't need additional field in sockaddr to pass SIN_PROXY flag.
The new kernel is binary compatible with old tools, since the sizes of sockaddr_inarp and sockaddr_in match, and sa_family is filled with the same value.
The structure declaration is left for compatibility with third party software, but in-tree code no longer uses it.
Reviewed by: ru, andre, net@
|
246130 |
30-Jan-2013 |
glebius |
Utilize m_get2() to get mbuf of appropriate size.
|
245934 |
26-Jan-2013 |
np |
Add checks for SO_NO_OFFLOAD in a couple of places that I missed earlier in r245915.
|
245932 |
26-Jan-2013 |
np |
Teach toe_l2_resolve to resolve IPv6 destinations too.
Reviewed by: bz@
|
245924 |
26-Jan-2013 |
np |
Move lle_event to if_llatbl.h
lle_event replaced arp_update_event after the ARP rewrite and ended up in if_ether.h simply because arp_update_event used to be there too. IPv6 neighbor discovery is going to grow lle_event support and this is a good time to move it to if_llatbl.h.
The two in-tree consumers of this event - OFED and toecore - are not affected.
Reviewed by: bz@
|
245921 |
25-Jan-2013 |
np |
There is no need to call into the TOE driver twice in pru_rcvd (tod_rcvd and then tod_output right after that).
Reviewed by: bz@
|
245919 |
25-Jan-2013 |
np |
Add TCP_OFFLOAD hook in syncache_respond for IPv6 too, just like the one that exists for IPv4.
Reviewed by: bz@
|
245916 |
25-Jan-2013 |
np |
Teach toe_4tuple_check() to deal with IPv6 4-tuples too.
Reviewed by: bz@
|
245915 |
25-Jan-2013 |
np |
Heed SO_NO_OFFLOAD.
MFC after: 1 week
|
245914 |
25-Jan-2013 |
np |
Remove redundant test, we know inp_lport is 0.
MFC after: 1 week
|
245823 |
22-Jan-2013 |
jhb |
Use decimal values for UDP and TCP socket options rather than hex to avoid implying that these constants should be treated as bit masks.
Reviewed by: net MFC after: 1 week
|
245783 |
22-Jan-2013 |
lstewart |
Simplify and fix a bug in cc_ack_received()'s "are we congestion window limited" logic (refer to [1] for associated discussion). snd_cwnd and snd_wnd are unsigned long and on 64 bit hosts, min() will truncate them to 32 bits and could therefore potentially corrupt the result (although under normal operation, neither variable should legitimately exceed 32 bits).
[1] http://lists.freebsd.org/pipermail/freebsd-net/2013-January/034297.html
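The truncation is easy to reproduce outside the kernel. In this sketch `min_u32()` stands in for a min() that takes 32-bit arguments and `min_u64()` for a width-preserving variant; both names, and the two wrapper functions, are illustrative assumptions.

```c
#include <stdint.h>

/* Stand-in for a min() whose parameters are 32 bits wide. */
static inline uint32_t
min_u32(uint32_t a, uint32_t b)
{
    return (a < b ? a : b);
}

/* Width-preserving variant for 64-bit operands. */
static inline uint64_t
min_u64(uint64_t a, uint64_t b)
{
    return (a < b ? a : b);
}

/*
 * The casts spell out the conversion the compiler performs silently
 * when 64-bit values are passed to a 32-bit min().
 */
static uint64_t
window_limit_buggy(uint64_t cwnd, uint64_t wnd)
{
    return (min_u32((uint32_t)cwnd, (uint32_t)wnd));
}

static uint64_t
window_limit_fixed(uint64_t cwnd, uint64_t wnd)
{
    return (min_u64(cwnd, wnd));
}
```

A congestion window of 2^32 + 5 compared against a send window of 100 yields 5 with the truncating version and 100 with the fixed one.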
Submitted by: jhb MFC after: 1 week
|
245238 |
09-Jan-2013 |
jhb |
Don't drop options from the third retransmitted SYN by default. If the SYNs (or SYN/ACK replies) are dropped due to network congestion, then the remote end of the connection may act as if options such as window scaling are enabled but the local end will think they are not. This can result in very slow data transfers in the case of window scaling disagreements.
The old behavior can be obtained by setting the net.inet.tcp.rexmit_drop_options sysctl to a non-zero value.
Reviewed by: net@ MFC after: 2 weeks
|
244989 |
03-Jan-2013 |
peter |
Temporarily revert rev 244678. This is causing loopback problems with the lo (loopback) interfaces.
|
244730 |
27-Dec-2012 |
tuexen |
Some cleanups.
MFC after: 3 days
|
244729 |
27-Dec-2012 |
tuexen |
Minor cleanups of debug messages.
MFC after: 3 days
|
244728 |
27-Dec-2012 |
tuexen |
Fix a copy and paste error.
MFC after: 3 days
|
244683 |
25-Dec-2012 |
glebius |
Garbage collect carp_cksum().
|
244681 |
25-Dec-2012 |
glebius |
Change net.inet.carp.demotion sysctl to add the supplied value to the current demotion factor instead of assigning it.
This allows external scripts to control demotion factor together with kernel in a raceless manner.
|
244680 |
25-Dec-2012 |
glebius |
Fix sysctl_handle_int() usage. Either arg1 or arg2 should be supplied, and arg2 doesn't pass size of arg1.
|
244678 |
25-Dec-2012 |
glebius |
The SIOCSIFFLAGS ioctl handler runs if_up()/if_down(), which notify all interested parties if the interface flag IFF_UP has changed.
However, not only SIOCSIFFLAGS can raise the flag, but SIOCAIFADDR and SIOCAIFADDR_IN6 can, too. The actual |= is done not in the protocol code, but in code of interface drivers. To fix this historical layering violation, we will check whether ifp->if_ioctl(SIOCSIFADDR) raised the IFF_UP flag, and if it did, run the if_up() handler.
This fixes configuring an address under CARP control on an interface that was initially !IFF_UP.
P.S. I intentionally omitted handling the IFF_SMART flag. This flag was never ever used in any driver since it was introduced, and since it means another layering violation, it should be garbage collected instead of pretended to be supported.
|
244665 |
24-Dec-2012 |
glebius |
Minor style(9) changes: - Remove declaration in initializer. - Add empty line between logical blocks.
|
244387 |
18-Dec-2012 |
glebius |
Fix !INET6 build after r244365.
|
244386 |
18-Dec-2012 |
glebius |
Clear correct flag in INET6 case.
|
244365 |
17-Dec-2012 |
ae |
Since we use different flags to detect tcp forwarding, and we share the same code for IPv4 and IPv6 in tcp_input, we should check both M_IP_NEXTHOP and M_IP6_NEXTHOP flags.
MFC after: 3 days
|
244183 |
13-Dec-2012 |
glebius |
Fix problem in r238990. The LLE_LINKED flag should be tested prior to entering llentry_free(), and in case if we lose the race, we should simply perform LLE_FREE_LOCKED(). Otherwise, if the race is lost by the thread performing arptimer(), it will remove two references from the lle instead of one.
Reported by: Ian FREISLICH <ianf clue.co.za>
|
244157 |
12-Dec-2012 |
glebius |
Fix a crash in tcp_input(), that happens when mbuf has a fwd_tag on it, but later after processing and freeing the tag, we need to jump back again to the findpcb label. Since the fwd_tag pointer wasn't NULL we tried to process and free the tag for second time.
Reported & tested by: Pawel Tyll <ptyll nitronet.pl> MFC after: 3 days
|
244033 |
08-Dec-2012 |
tuexen |
Get it compiling without INET and INET6 support (mainly userland stack).
MFC after: 2 weeks
|
244031 |
08-Dec-2012 |
pjd |
More warnings for zones that depend on the kern.ipc.maxsockets limit.
Obtained from: WHEEL Systems
|
244026 |
08-Dec-2012 |
tuexen |
Use correct padding of the ABORT chunk when a user initiated abort cause is used.
MFC after: 2 weeks
|
244021 |
08-Dec-2012 |
tuexen |
Ensure that the padding of the last parameter of an INIT chunk is not included in the chunk length as required by RFC 4960. While there, cleanup sctp_send_initiate().
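A sketch of the length rule, assuming the usual 4-byte parameter alignment: every parameter except the last contributes its padded length to the chunk length field, while the last one contributes its unpadded length. The helper names are hypothetical and do not come from sctp_send_initiate().

```c
#include <stddef.h>
#include <stdint.h>

/* Round len up to the next multiple of 4. */
static size_t
pad4(size_t len)
{
    return ((len + 3) & ~(size_t)3);
}

/*
 * Value for the chunk length field: fixed part plus all parameters,
 * where only the last parameter is counted without its padding.
 */
static size_t
chunk_length(size_t fixed_len, const uint16_t *param_len, size_t nparams)
{
    size_t len = fixed_len;

    for (size_t i = 0; i < nparams; i++)
        len += (i + 1 < nparams) ? pad4(param_len[i]) : param_len[i];
    return (len);
}

/* Bytes actually put on the wire: the chunk padded to 4 bytes. */
static size_t
chunk_wire_size(size_t fixed_len, const uint16_t *param_len, size_t nparams)
{
    return (pad4(chunk_length(fixed_len, param_len, nparams)));
}
```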
MFC after: 2 weeks
|
243882 |
05-Dec-2012 |
glebius |
Mechanically substitute flags from historic mbuf allocator with malloc(9) flags within sys.
Exceptions:
- sys/contrib not touched - sys/mbuf.h edited manually
|
243624 |
27-Nov-2012 |
andre |
Remove unused and unnecessary CSUM_IP_FRAGS checksumming capability. Checksumming the IP header of fragments is no different from doing normal IP headers.
Discussed with: yongari MFC after: 1 week
|
243621 |
27-Nov-2012 |
andre |
Add DELACK to list of timers.
MFC after: 1 week
|
243603 |
27-Nov-2012 |
np |
Make sure that tcp_timer_activate() correctly sees TCP_OFFLOAD (or not).
|
243594 |
27-Nov-2012 |
alfred |
Auto size the tcbhashsize structure based on max sockets.
While here, also make the code that enforces power-of-two more forgiving: instead of just resetting to 512, gracefully round down to the next lower power of two.
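Rounding down to a power of two can be done by clearing low set bits until one remains. This is one possible sketch; the helper name is hypothetical and the commit may use a different method.

```c
#include <assert.h>

/*
 * Return the largest power of two that is <= x.
 * x & (x - 1) clears the lowest set bit, so the loop stops when only
 * the highest bit is left.
 */
static unsigned long
rounddown_pow2(unsigned long x)
{
    assert(x != 0);
    while (x & (x - 1))
        x &= x - 1;
    return (x);
}
```

A value that already is a power of two is returned unchanged, so a sane setting is never altered.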
|
243565 |
26-Nov-2012 |
tuexen |
Add support for sctp_peeloff() also in the front states of the association.
MFC after: 3 days
|
243564 |
26-Nov-2012 |
tuexen |
Find the endpoint for an incoming packet also if the endpoint comes from sctp_peeloff().
MFC after: 3 days
|
243558 |
26-Nov-2012 |
tuexen |
Allow shutdown() to be used on fds returned from sctp_peeloff().
MFC after: 3 days
|
243516 |
25-Nov-2012 |
tuexen |
Remove unused function.
MFC after: 1 week
|
243186 |
17-Nov-2012 |
tuexen |
Add support for SCTP/UDP/IPV6. This completes the support of http://tools.ietf.org/html/draft-ietf-tsvwg-sctp-udp-encaps
MFC after: 1 week
|
243157 |
16-Nov-2012 |
tuexen |
Get the accounting working. We now have counters for how many chunks of each SCTP outgoing stream are in the send and sent queue. While there, improve the naming of NR-SACK related constants recently introduced.
MFC after: 1 week
|
242854 |
10-Nov-2012 |
rdivacky |
Initialize hdrlen to 0 to avoid clang warning in NOINET case.
|
242745 |
08-Nov-2012 |
bz |
Cleanup some whitespace in this file to get it out of an upcoming patch.
MFC after: 10 days
|
242714 |
07-Nov-2012 |
tuexen |
Add per outgoing stream accounting for chunks in the send and sent queue. This provides no functional change, but is a preparation for an upcoming stream reset improvement. Done with rrs@.
MFC after: 1 week
|
242709 |
07-Nov-2012 |
tuexen |
Add some missing changes missed in the last commit.
MFC after: 1 week X-MFC with: 242708
|
242708 |
07-Nov-2012 |
tuexen |
Improve PR-SCTP if used in combination with NR-SACK. Based on work done by Mohammad Rajiullah.
MFC after: 1 week
|
242692 |
07-Nov-2012 |
kevlo |
Fix typo; s/ouput/output
|
242680 |
06-Nov-2012 |
mjg |
Fix possible spurious sbunlock in sctp_sorecvmsg.
Reviewed by: tuexen Approved by: trasz (mentor) MFC after: 3 days
|
242627 |
05-Nov-2012 |
tuexen |
Move from early SSN assignment to late SSN assignment. This doesn't change functionality, but makes upcoming change much easier. Developed with rrs@ at the IETF 85.
MFC after: 1 week
|
242601 |
05-Nov-2012 |
andre |
Back out r242262. The simplified window change/update logic wasn't complete and ready for production use.
PR: kern/173309
|
242463 |
02-Nov-2012 |
ae |
Remove the recently added sysctl variable net.pfil.forward. Instead, add protocol specific mbuf flags M_IP_NEXTHOP and M_IP6_NEXTHOP. Use them to indicate that the mbuf's chain contains the PACKET_TAG_IPFORWARD tag. And do a tag lookup only when this flag is set.
Suggested by: andre
|
242327 |
29-Oct-2012 |
tuexen |
Whitespace changes due to upstream integration of SCTP changes in the FreeBSD code base.
|
242326 |
29-Oct-2012 |
tuexen |
Add braces (as used elsewhere in the SCTP code).
|
242325 |
29-Oct-2012 |
tuexen |
Use ntohs() and htons() in correct order. However, this doesn't change functionality.
|
242311 |
29-Oct-2012 |
andre |
Forced commit to provide the correct commit message to r242251:
Defer sending an independent window update if a delayed ACK is pending saving a packet. The window update then gets piggy-backed on the next already scheduled ACK.
Added grammar fixes as well.
MFC after: 2 weeks
|
242308 |
29-Oct-2012 |
andre |
Define the delayed ACK timeout value directly as hz/10 instead of obfuscating it by going through PR_FASTHZ. No functional change.
MFC after: 2 weeks
|
242267 |
28-Oct-2012 |
andre |
If the user has closed the socket then drop a persisting connection after a much reduced timeout.
Typically web servers close their sockets quickly under the assumption that the TCP connection goes away as well. That is not entirely true however. If the peer closed the window we're going to wait for a long time with lots of data in the send buffer.
MFC after: 2 weeks
|
242266 |
28-Oct-2012 |
andre |
Increase the initial CWND to 10 segments as defined in IETF TCPM draft-ietf-tcpm-initcwnd-05. It explains why the increased initial window improves the overall performance of many web services without risking congestion collapse.
As long as it remains a draft it is placed under a sysctl marking it as experimental:
net.inet.tcp.experimental.initcwnd10 = 1
When it becomes an official RFC soon the sysctl will be changed to the RFC number and moved to net.inet.tcp.
This implementation differs from the RFC draft in that it is a bit more conservative in the case of packet loss on SYN or SYN|ACK because we haven't reduced the default RTO to 1 second yet. Also the restart window isn't yet increased as allowed. Both will be adjusted with upcoming changes.
It is enabled by default. In Linux it is enabled since kernel 3.0.
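As I understand the draft, the upper bound on the initial window is min(10*MSS, max(2*MSS, 14600)) bytes. The sketch below only evaluates that formula; the function name is hypothetical and, as noted above, the FreeBSD implementation is more conservative in some cases.

```c
/* Upper bound on the initial window in bytes, per the draft's formula. */
static unsigned long
initcwnd10_bytes(unsigned long mss)
{
    unsigned long lower, upper;

    lower = 2 * mss > 14600 ? 2 * mss : 14600;  /* max(2*MSS, 14600) */
    upper = 10 * mss;
    return (upper < lower ? upper : lower);     /* min(10*MSS, ...) */
}
```

For a 1460-byte MSS this gives 14600 bytes, i.e. ten full segments.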
MFC after: 2 weeks
|
242264 |
28-Oct-2012 |
andre |
Update comment to reflect the change made in r242263.
MFC after: 2 weeks
|
242263 |
28-Oct-2012 |
andre |
Add SACK_PERMIT to the list of TCP options that are switched off after retransmitting a SYN three times.
MFC after: 2 weeks
|
242262 |
28-Oct-2012 |
andre |
Simplify and enhance the window change/update acceptance logic, especially in the presence of bi-directional data transfers.
snd_wl1 tracks the right edge, including data in the reassembly queue, of valid incoming data. This makes it like rcv_nxt plus reassembly. It never goes backwards to prevent older, possibly reordered segments from updating the window.
snd_wl2 tracks the left edge of sent data. This makes it a duplicate of snd_una. However joining them right now is difficult due to separate update dependencies in different places in the code flow.
snd_wnd tracks the current advertized send window by the peer. In tcp_output() the effective window is calculated by subtracting the already in-flight data, snd_nxt less snd_una, from it.
ACK's become the main clock of window updates and will always update the window when the left edge of what we sent is advanced. The ACK clock is the primary signaling mechanism in ongoing data transfers. This works reliably even in the presence of reordering, reassembly and retransmitted segments. The ACK clock is most important because it determines how much data we are allowed to inject into the network.
Zero window updates that get us out of persist mode are crucial. Here a segment that neither moves ACK nor SEQ but enlarges WND is accepted.
When the ACK clock is not active (that is we're not or no longer sending any data) any segment that moves the extended right SEQ edge, including out-of-order segments, updates the window. This gives us updates especially during ping-pong transfers where the peer isn't done consuming the already acknowledged data from the receive buffer while responding with data.
The SSH protocol is a prime candidate to benefit from the improved bi-directional window update logic as it has its own windowing mechanism on top of TCP and is frequently sending back protocol ACK's.
Tcpdump provided by: darrenr Tested by: darrenr MFC after: 2 weeks
|
242261 |
28-Oct-2012 |
andre |
For retransmits of SYN|ACK from the syncache use the slightly more aggressive special tcp_syn_backoff[] retransmit schedule instead of the normal tcp_backoff[] schedule for established connections.
MFC after: 2 weeks
|
242260 |
28-Oct-2012 |
andre |
When retransmitting SYN in TCPS_SYN_SENT state use TCPTV_RTOBASE, the default retransmit timeout, as base to calculate the backoff time until next try instead of the TCP_REXMTVAL() macro which only works correctly when we already have measured an actual RTT+RTTVAR.
Before it would cause the first retransmit at RTOBASE, the next four at the same time (!) about 200ms later, and then another one again RTOBASE later.
MFC after: 2 weeks
|
242257 |
28-Oct-2012 |
andre |
Remove bogus 'else' in #ifdef that prevented the rttvar from being reset in tcp_timer_rexmt() on retransmit for IPv6 sessions.
MFC after: 2 weeks
|
242255 |
28-Oct-2012 |
andre |
Allow arbitrary MSS sizes and don't mind about the cluster size anymore. We've got more cluster sizes for quite some time now and the originally imposed limits and the previously codified thoughts on efficiency gains are no longer true.
MFC after: 2 weeks
|
242254 |
28-Oct-2012 |
andre |
Change the syncache count reporting the current number of entries from an unprotected u_int that reports garbage on SMP to a function based sysctl obtaining the current value from UMA.
Also read back the actual cache_limit after page size rounding by UMA.
PR: kern/165879 MFC after: 2 weeks
|
242253 |
28-Oct-2012 |
andre |
Simplify implementation of net.inet.tcp.reass.maxsegments and net.inet.tcp.reass.cursegments.
MFC after: 2 weeks
|
242252 |
28-Oct-2012 |
andre |
Prevent a flurry of forced window updates when an application is doing small reads on a (partially) filled receive socket buffer.
Normally one would send a window update every time the available space in the socket buffer increases by two times MSS. This leads to a flurry of window updates that do not provide any meaningful new information to the sender. There still is available space in the window and the sender can continue sending data. All window updates then get carried by the regular ACKs. Only when the socket buffer was (almost) full and the window closed accordingly does a window update deliver new information and allow the sender to start sending more data again.
Send window updates only every two MSS when the socket buffer has less than 1/8 space available, or the available space in the socket buffer increased by 1/4 its full capacity, or the socket buffer is very small. The next regular data ACK will carry and report the exact window size again.
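One way to read the rule above as a predicate is sketched here. The parameter names are hypothetical, and the "very small buffer" threshold of eight segments is an assumption; the commit message does not state it.

```c
/*
 * adv:    bytes by which the advertised window could grow
 * recwin: space currently available in the receive buffer
 * hiwat:  full capacity of the receive buffer
 * mss:    maximum segment size
 */
static int
should_send_window_update(long adv, long recwin, long hiwat, long mss)
{
    /* Never force an update for less than two segments. */
    if (adv < 2 * mss)
        return (0);
    return (adv >= hiwat / 4 ||     /* space grew by 1/4 of capacity */
        recwin <= hiwat / 8 ||      /* less than 1/8 space available */
        hiwat <= 8 * mss);          /* very small buffer (assumed) */
}
```

Everything that does not pass this test is left to the next regular data ACK, which carries the exact window anyway.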
Reported by: sbruno Tested by: darrenr Tested by: Darren Baginski PR: kern/116335 MFC after: 2 weeks
|
242251 |
28-Oct-2012 |
andre |
When SYN or SYN/ACK had to be retransmitted RFC5681 requires us to reduce the initial CWND to one segment. This reduction got lost some time ago due to a change in initialization ordering.
Additionally in tcp_timer_rexmt() avoid entering fast recovery when we're still in TCPS_SYN_SENT state.
MFC after: 2 weeks
|
242250 |
28-Oct-2012 |
andre |
When SYN or SYN/ACK had to be retransmitted RFC5681 requires us to reduce the initial CWND to one segment. This reduction got lost some time ago due to a change in initialization ordering.
Additionally in tcp_timer_rexmt() avoid entering fast recovery when we're still in TCPS_SYN_SENT state.
MFC after: 2 weeks
|
242249 |
28-Oct-2012 |
andre |
Adjust the initial default CWND upon connection establishment to the new and increased values specified by RFC5681 Section 3.1.
The even larger initial CWND per RFC3390, if enabled, is not affected.
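For reference, the initial window rules of RFC 5681 Section 3.1, together with the one-segment rule after a lost SYN or SYN/ACK, can be sketched like this. The function name and the `syn_rexmt` flag are hypothetical; this is not the code from tcp_input.c.

```c
/*
 * Initial congestion window in bytes per RFC 5681 Section 3.1.
 * syn_rexmt is non-zero when the SYN or SYN/ACK had to be
 * retransmitted, in which case the window is a single segment.
 */
static unsigned long
rfc5681_initial_cwnd(unsigned long smss, int syn_rexmt)
{
    if (syn_rexmt)
        return (smss);
    if (smss > 2190)
        return (2 * smss);
    if (smss > 1095)
        return (3 * smss);
    return (4 * smss);
}
```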
MFC after: 2 weeks
|
242161 |
26-Oct-2012 |
glebius |
o Remove last argument to ip_fragment(), and obtain all needed information on checksums directly from mbuf flags. This simplifies code.
o Clear CSUM_IP from the mbuf in ip_fragment() if we did checksums in hardware. Some drivers may not announce CSUM_IP in their if_hwassist, although they try to do checksums if CSUM_IP is set on the mbuf. Example is em(4).
o While here, consistently use CSUM_IP instead of its alias CSUM_DELAY_IP. After this change CSUM_DELAY_IP vanishes from the stack.
Submitted by: Sebastian Kuzminsky <seb lineratesystems.com>
|
242079 |
25-Oct-2012 |
ae |
Remove the IPFIREWALL_FORWARD kernel option and make it possible to turn on the related functionality at runtime via the sysctl variable net.pfil.forward. It is turned off by default.
Sponsored by: Yandex LLC Discussed with: net@ MFC after: 2 weeks
|
242077 |
25-Oct-2012 |
glebius |
After r241923 the updated ip_len is no longer needed.
|
242076 |
25-Oct-2012 |
glebius |
Fix error in r241913 that had broken fragment reassembly.
|
241926 |
23-Oct-2012 |
glebius |
Use ip_stripoptions() instead of handrolled version.
|
241925 |
23-Oct-2012 |
glebius |
Simplify ip_stripoptions() reducing number of intermediate variables.
|
241923 |
23-Oct-2012 |
glebius |
Do not reduce ip_len by the size of the IP header in ip_input() before passing a packet to protocol input routines. For several protocols this means that the protocol now needs to do the subtraction itself, and for another half this means that we do not need to add the header length back to the packet.
Make ip_stripoptions() to adjust ip_len, since now we enter this function with a packet header whose ip_len does represent length of entire packet, not payload only.
|
241916 |
22-Oct-2012 |
delphij |
Remove __P.
Submitted by: kevlo Reviewed by: md5(1) MFC after: 2 months
|
241913 |
22-Oct-2012 |
glebius |
Switch the entire IPv4 stack to keep the IP packet header in network byte order. Any host byte order processing is done in local variables and host byte order values are never[1] written to a packet.
After this change a packet processed by the stack isn't modified at all[2] except for TTL.
After this change a network stack hacker doesn't need to scratch his head trying to figure out what is the byte order at the given place in the stack.
[1] One exception still remains. The raw sockets convert to host byte order before passing a packet to an application. Probably this would remain for ages for compatibility.
[2] The ip_input() still subtracts header len from ip->ip_len, but this is planned to be fixed soon.
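The convention described here, header fields kept in network byte order with host-order work done in locals, can be sketched as follows. `struct mini_ip` is a hypothetical cut-down header, not struct ip.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Hypothetical cut-down IPv4 header: only the fields used below. */
struct mini_ip {
    uint8_t  vhl;   /* version (high nibble), header length in words */
    uint8_t  tos;
    uint16_t len;   /* total length, network byte order */
};

/*
 * Payload length in host byte order. The header is read, never
 * written: the conversion lives in local variables only.
 */
static uint16_t
payload_len(const struct mini_ip *ip)
{
    uint16_t ip_len = ntohs(ip->len);
    uint16_t hlen = (uint16_t)((ip->vhl & 0x0f) << 2);

    return ((uint16_t)(ip_len - hlen));
}
```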
Reviewed by: luigi, Maxim Dounin <mdounin mdounin.ru> Tested by: ray, Olivier Cochard-Labbe <olivier cochard.me>
|
241735 |
19-Oct-2012 |
zont |
- Update cachelimit after hashsize and bucketlimit were set.
Reported by: az Reviewed by: melifaro Approved by: kib (mentor) MFC after: 1 week
|
241686 |
18-Oct-2012 |
andre |
Mechanically remove the last stray remains of spl* calls from net*/*. They have been Noop's for a long time now.
|
241648 |
17-Oct-2012 |
emaste |
Avoid potential bad pointer dereference.
Previously RuleAdd would leave entry->la unset for the first entry in the proxyList.
Sponsored by: ADARA Networks MFC After: 1 week
|
241575 |
15-Oct-2012 |
glebius |
We don't need to convert ip6_len to host byte order before ip6_output(), the IPv6 stack is working in net byte order.
The reason this code worked before is that ip6_output() doesn't look at ip6_plen at all and recalculates it based on mbuf length.
|
241547 |
14-Oct-2012 |
glebius |
Fix a miss from r241344: in ip_mloopback() we need to go to net byte order prior to calling in_delayed_cksum().
Reported by: Olivier Cochard-Labbe <olivier cochard.me>
|
241502 |
13-Oct-2012 |
melifaro |
Cleanup documentation: cloning route support has been removed in r186119.
MFC after: 2 weeks
|
241481 |
12-Oct-2012 |
glebius |
Revert fixup of ip_len from r241480. The stack isn't yet ready for that change.
|
241480 |
12-Oct-2012 |
glebius |
In ip_stripoptions(): - Remove unused argument and incorrect comment. - Fixup ip_len after stripping.
|
241406 |
10-Oct-2012 |
melifaro |
Do not check if found IPv4 rte is dynamic if net.inet.icmp.drop_redirect is enabled. This eliminates one mtx_lock() per each routing lookup thus improving performance in several cases (routing to directly connected interface or routing to default gateway).
Icmp redirects should not be used to provide routing direction nowadays, even for end hosts. Routers should not use them too (and this is explicitly restricted in IPv6, see RFC 4861, clause 8.2).
Current commit changes rnh_matchaddr function to 'stock' rn_match (and back) for every AF_INET routing table in given VNET instance on drop_redirect sysctl change.
This change is part of bigger patch eliminating rte locking.
Sponsored by: Yandex LLC MFC after: 2 weeks
|
241394 |
10-Oct-2012 |
kevlo |
Revert previous commit...
Pointyhat to: kevlo (myself)
|
241370 |
09-Oct-2012 |
kevlo |
Prefer NULL over 0 for pointers
|
241344 |
08-Oct-2012 |
glebius |
After r241245 it appeared that in_delayed_cksum(), which still expects host byte order, was sometimes called with net byte order. Since we are moving towards net byte order throughout the stack, the function was converted to expect net byte order, and its consumers fixed appropriately:
- ip_output(), ipfilter(4) not changed, since already call in_delayed_cksum() with header in net byte order.
- divert(4), ng_nat(4), ipfw_nat(4) now don't need to swap byte order there and back.
- mrouting code and IPv6 ipsec now need to switch byte order there and back, but I hope, this is temporary solution.
- In ipsec(4) shifted switch to net byte order prior to in_delayed_cksum().
- pf_route() catches up on r241245 changes to ip_output().
|
241342 |
08-Oct-2012 |
glebius |
No reason to play with IP header before calling sctp_delayed_cksum() with offset beyond the IP header.
|
241245 |
06-Oct-2012 |
glebius |
A step in resolving mess with byte ordering for AF_INET. After this change:
- All packets in NETISR_IP queue are in net byte order.
- ip_input() is entered in net byte order and converts packet to host byte order right _after_ processing pfil(9) hooks.
- ip_output() is entered in host byte order and converts packet to net byte order right _before_ processing pfil(9) hooks.
- ip_fragment() accepts and emits packet in net byte order.
- ip_forward(), ip_mloopback() use host byte order (untouched actually).
- ip_fastforward() no longer modifies packet at all (except ip_ttl).
- Swapping of byte order there and back removed from the following modules: pf(4), ipfw(4), enc(4), if_bridge(4).
- Swapping of byte order added to ipfilter(4), based on __FreeBSD_version
- __FreeBSD_version bumped.
- pfil(9) manual page updated.
Reviewed by: ray, luigi, eri, melifaro Tested by: glebius (LE), ray (BE)
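To illustrate why the byte order of a field such as ip_len matters at each of the boundaries listed above, here is a minimal Python sketch. It is not FreeBSD code: the helper names are hypothetical, and a little-endian host is simulated explicitly so the result does not depend on the machine running it.

```python
import struct


def to_net(value: int) -> bytes:
    """Encode a 16-bit field in network (big-endian) byte order."""
    return struct.pack("!H", value)


def read_as_little_endian_host(raw: bytes) -> int:
    """Read the two bytes the way a little-endian host would without ntohs()."""
    return struct.unpack("<H", raw)[0]


def swap16(value: int) -> int:
    """Swap the two bytes of a 16-bit value, as ntohs() does on a little-endian host."""
    return ((value & 0xFF) << 8) | (value >> 8)


ip_len = 1500                                  # 0x05DC
wire = to_net(ip_len)                          # bytes as they appear on the wire
misread = read_as_little_endian_host(wire)     # what code expecting host order sees
recovered = swap16(misread)                    # explicit conversion restores the value
```

A consumer that assumes the wrong order sees a wildly different length, which is why each function's expected byte order has to be stated and kept consistent.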
|
241129 |
02-Oct-2012 |
glebius |
There is a complex race in in_pcblookup_hash() and in_pcblookup_group(). Both functions need to obtain lock on the found PCB, and they can't do classic inter-lock with the PCB hash lock, due to lock order reversal. To keep the PCB stable, these functions put a reference on it and after PCB lock is acquired drop it. If the reference was the last one, this means we've raced with in_pcbfree() and the PCB is no longer valid.
This approach works okay only if we are acquiring writer-lock on the PCB. In case of reader-lock, the following scenario can happen:
- 2 threads locate pcb, and do in_pcbref() on it.
- These 2 threads drop the inp hash lock.
- Another thread comes to delete pcb via in_pcbfree(), it obtains hash lock, does in_pcbremlists(), drops hash lock, and runs in_pcbrele_wlocked(), which doesn't free the pcb due to two references on it. Then it unlocks the pcb.
- 2 aforementioned threads acquire reader lock on the pcb and run in_pcbrele_rlocked(). One gets 1 from in_pcbrele_rlocked() and continues, second gets 0 and considers pcb freed, returns.
- The thread that got 1 continues working with detached pcb, which later leads to panic in the underlying protocol level.
To plug that problem, an additional INPCB flag is introduced: INP_FREED. We check for that flag in in_pcbrele_rlocked() and, if it is set, we pretend that that was the last reference.
Discussed with: rwatson, jhb Reported by: Vladimir Medvedkin <medved rambler-co.ru>
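The scenario above can be modelled with a small Python sketch. This is an illustration under stated assumptions, not the kernel code: the class, the function names, and the flag value are hypothetical, locking is omitted, and the release helper simply returns True when the caller must treat the pcb as gone.

```python
from dataclasses import dataclass

INP_FREED = 0x1  # hypothetical bit value, for illustration only


@dataclass
class Pcb:
    refcount: int = 1  # reference held while the pcb is linked in the hash
    flags: int = 0


def pcb_ref(pcb: Pcb) -> None:
    """A lookup takes a temporary reference before dropping the hash lock."""
    pcb.refcount += 1


def pcb_free(pcb: Pcb) -> None:
    """Teardown: mark the pcb as freed and drop the hash reference."""
    pcb.flags |= INP_FREED
    pcb.refcount -= 1


def pcb_release_rlocked(pcb: Pcb, check_freed: bool) -> bool:
    """Drop a temporary reference; True means the pcb must not be used."""
    pcb.refcount -= 1
    if pcb.refcount == 0:
        return True
    # With the fix, a pcb marked as freed is treated as gone even though
    # another reader still holds a reference.
    return check_freed and bool(pcb.flags & INP_FREED)


def run_scenario(check_freed: bool) -> list:
    pcb = Pcb()
    pcb_ref(pcb)   # reader A
    pcb_ref(pcb)   # reader B
    pcb_free(pcb)  # a third thread tears the pcb down
    return [pcb_release_rlocked(pcb, check_freed),
            pcb_release_rlocked(pcb, check_freed)]


without_flag = run_scenario(check_freed=False)
with_flag = run_scenario(check_freed=True)
```

Without the flag check the first reader is told the pcb is still usable although it is already detached; with the check both readers back off.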
|
241043 |
29-Sep-2012 |
glebius |
carp_send_ad() should never return without rescheduling next run.
|
240985 |
27-Sep-2012 |
glebius |
Fix bug in TCP_KEEPCNT setting, which slipped in in the last round of reviewing of r231025.
Unlike other options from this family TCP_KEEPCNT doesn't specify time interval, but a count, thus parameter supplied doesn't need to be multiplied by hz.
Reported & tested by: amdmi3
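A minimal Python sketch of the distinction described above, assuming the interval options are supplied in seconds and stored in ticks. The function name and the tick rate are hypothetical; only the option names come from the TCP keepalive option family.

```python
HZ = 1000  # assumed tick rate, for illustration only

INTERVAL_OPTIONS = {"TCP_KEEPIDLE", "TCP_KEEPINTVL", "TCP_KEEPINIT"}


def keep_option_to_kernel_value(name: str, user_value: int, hz: int = HZ) -> int:
    """Convert a user-supplied keepalive option to the value stored internally."""
    if name in INTERVAL_OPTIONS:
        return user_value * hz   # a time interval: seconds become ticks
    if name == "TCP_KEEPCNT":
        return user_value        # a plain count: must not be scaled by hz
    raise ValueError(f"unknown option: {name}")


idle_ticks = keep_option_to_kernel_value("TCP_KEEPIDLE", 60)
probe_count = keep_option_to_kernel_value("TCP_KEEPCNT", 8)
```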
|
240849 |
23-Sep-2012 |
tuexen |
Whitespace change.
MFC after: 3 days
|
240848 |
23-Sep-2012 |
tuexen |
Declare a static function as such.
MFC after: 3 days
|
240842 |
22-Sep-2012 |
tuexen |
Fix a bug related to handling Re-config chunks. It is not true that the association can be removed if the socket is gone.
MFC after: 3 days
|
240826 |
22-Sep-2012 |
tuexen |
Small cleanups. No functional change.
MFC after: 10 days
|
240725 |
20-Sep-2012 |
kevlo |
Fix typo: s/pakcet/packet
|
240520 |
14-Sep-2012 |
eadler |
s/teh/the/g
Approved by: cperciva MFC after: 3 days
|
240507 |
14-Sep-2012 |
tuexen |
Small cleanups. No functional change.
MFC after: 10 days
|
240494 |
14-Sep-2012 |
glebius |
o Create directory sys/netpfil, where all packet filters should reside, and move there ipfw(4) and pf(4).
o Move most modified parts of pf out of contrib.
Actual movements:
sys/contrib/pf/net/*.c -> sys/netpfil/pf/
sys/contrib/pf/net/*.h -> sys/net/
contrib/pf/pfctl/*.c -> sbin/pfctl
contrib/pf/pfctl/*.h -> sbin/pfctl
contrib/pf/pfctl/pfctl.8 -> sbin/pfctl
contrib/pf/pfctl/*.4 -> share/man/man4
contrib/pf/pfctl/*.5 -> share/man/man5
sys/netinet/ipfw -> sys/netpfil/ipfw
The arguable movement is pf/net/*.h -> sys/net. There are future plans to refactor pf includes, so I decided not to break things twice.
Not modified bits of pf left in contrib: authpf, ftp-proxy, tftp-proxy, pflogd.
The ipfw(4) movement is planned to be merged to stable/9, to make head and stable match.
Discussed with: bz, luigi
|
240263 |
09-Sep-2012 |
tuexen |
Whitespace changes.
MFC after: 10 days
|
240250 |
08-Sep-2012 |
tuexen |
Whitespace cleanup.
MFC after: 10 days
|
240233 |
08-Sep-2012 |
glebius |
Merge the projects/pf/head branch, that was worked on for last six months, into head. The most significant achievements in the new code:
o Fine grained locking, thus much better performance.
o Fixes to many problems in pf, that were specific to FreeBSD port.
New code doesn't have that many ifdefs and much less OpenBSDisms, thus is more attractive to our developers.
Those interested in details, can browse through SVN log of the projects/pf/head branch. And for reference, here is exact list of revisions merged:
r232043, r232044, r232062, r232148, r232149, r232150, r232298, r232330, r232332, r232340, r232386, r232390, r232391, r232605, r232655, r232656, r232661, r232662, r232663, r232664, r232673, r232691, r233309, r233782, r233829, r233830, r233834, r233835, r233836, r233865, r233866, r233868, r233873, r234056, r234096, r234100, r234108, r234175, r234187, r234223, r234271, r234272, r234282, r234307, r234309, r234382, r234384, r234456, r234486, r234606, r234640, r234641, r234642, r234644, r234651, r235505, r235506, r235535, r235605, r235606, r235826, r235991, r235993, r236168, r236173, r236179, r236180, r236181, r236186, r236223, r236227, r236230, r236252, r236254, r236298, r236299, r236300, r236301, r236397, r236398, r236399, r236499, r236512, r236513, r236525, r236526, r236545, r236548, r236553, r236554, r236556, r236557, r236561, r236570, r236630, r236672, r236673, r236679, r236706, r236710, r236718, r237154, r237155, r237169, r237314, r237363, r237364, r237368, r237369, r237376, r237440, r237442, r237751, r237783, r237784, r237785, r237788, r237791, r238421, r238522, r238523, r238524, r238525, r239173, r239186, r239644, r239652, r239661, r239773, r240125, r240130, r240131, r240136, r240186, r240196, r240212.
I'd like to thank people who participated in early testing:
Tested by: Florian Smeets <flo freebsd.org> Tested by: Chekaluk Vitaly <artemrts ukr.net> Tested by: Ben Wilber <ben desync.com> Tested by: Ian FREISLICH <ianf cloudseed.co.za>
|
240198 |
07-Sep-2012 |
tuexen |
Don't include a structure containing a flexible array in another structure.
MFC after: 10 days
|
240158 |
06-Sep-2012 |
tuexen |
Get rid of a gcc'ism.
MFC after: 10 days
|
240148 |
05-Sep-2012 |
tuexen |
Using %p in a format string requires a void *.
MFC after: 10 days
|
240115 |
04-Sep-2012 |
tuexen |
Consistently use the size of a variable. This helps to keep the code simpler for the userland implementation.
MFC after: 3 days
|
240114 |
04-Sep-2012 |
tuexen |
Whitespace change.
MFC after: 3 days
|
240099 |
04-Sep-2012 |
melifaro |
Introduce new link-layer PFIL hook V_link_pfil_hook. Merge ether_ipfw_chk() and part of bridge_pfil() into unified ipfw_check_frame() function called by PFIL. This change was suggested by rwatson? @ DevSummit.
Remove ipfw headers from ether/bridge code since they are unneeded now.
Note that this change introduces some (temporary) performance penalty since the PFIL read lock has to be acquired for every link-level packet.
MFC after: 3 weeks
|
240073 |
03-Sep-2012 |
glebius |
Provide a sysctl switch that allows to install ARP entries with multicast bit set. FreeBSD refuses to install such entries since 9.0, and this broke installations running Microsoft NLB, which are violating standards.
Tested by: Tarasov Oleg <oleg_tarasov sg-tea.com>
|
240007 |
02-Sep-2012 |
tuexen |
Fix a typo which results in RTT to be off by a factor of 10, if the RTT is larger than 1 second.
MFC after: 3 days
|
239997 |
01-Sep-2012 |
eadler |
Mark the ipfw interface type as not being ether. This fixes an issue where uuidgen tried to obtain an ipfw device's MAC address, which was always zero.
PR: 170460 Submitted by: wxs Reviewed by: bdrewery Reviewed by: delphij Approved by: cperciva MFC after: 1 week
|
239672 |
25-Aug-2012 |
rrs |
This small change takes care of a race condition that can occur when both sides close at the same time. If that occurs, without this fix the connection enters FIN1 on both sides and they will forever send FIN|ACK at each other until the connection times out. This is because we stopped processing the FIN|ACK and thus did not advance the sequence and so never ACK'd each other's FIN. This fix adjusts it so we *do* process the FIN properly and the race goes away ;-)
MFC after: 1 month
|
239511 |
21-Aug-2012 |
np |
Correctly handle the case where an inp has already been dropped by the time the TOE driver reports that an active open failed. toe_connect_failed is supposed to handle this but it should be provided the inpcb instead of the tcpcb which may no longer be around.
|
239395 |
19-Aug-2012 |
rrs |
Though I disagree, I concede to jhb & Rui. Note that we still have a problem with this whole structure of locks and in_input.c [it does not lock which it should not, but this *can* lead to crashes]. (I have seen it in our SQA testbed.. besides the one with a refcnt issue that I will have SQA work on next week ;-)
|
239353 |
17-Aug-2012 |
rrs |
Ok jhb, lets move the ifa_free() down to the bottom to assure that *all* tables and such are removed before we start to free. This won't protect the Hash in ip_input.c but in theory should protect any other uses that *do* use locks.
MFC after: 1 week (or more)
|
239346 |
17-Aug-2012 |
lstewart |
The TCP PAWS fix for kernels with fast tick rates (r231767) changed the TCP timestamp related stack variables to reference ms directly instead of ticks. The h_ertt(4) Khelp module relies on TCP timestamp information in order to calculate its enhanced RTT estimates, but was not updated as part of r231767.
Consequently, h_ertt has not been calculating correct RTT estimates since r231767 was committed, which in turn broke all delay-based congestion control algorithms because they rely on the h_ertt RTT estimates.
Fix the breakage by switching h_ertt to use tcp_ts_getticks() in place of all previous uses of the ticks variable. This ensures all timestamp related variables in h_ertt use the same units as the TCP stack and therefore results in meaningful comparisons and RTT estimate calculations.
Reported & tested by: Naeem Khademi (naeemk at ifi uio no) Discussed with: bz MFC after: 3 days
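The unit mismatch described above can be illustrated with a short Python sketch. It is a simplified model, not the h_ertt code: the tick rate, the helper names, and the sample times are assumptions chosen only to show what happens when a millisecond timestamp is compared against a tick counter.

```python
HZ = 100  # assumed tick rate: one tick every 10 ms


def ticks_at(elapsed_ms: int) -> int:
    """Value of a tick counter after elapsed_ms milliseconds of uptime."""
    return elapsed_ms * HZ // 1000


def ts_ms_at(elapsed_ms: int) -> int:
    """Value of a millisecond-based timestamp clock at the same moment."""
    return elapsed_ms


sent_ms, now_ms = 1500, 2000

# Both operands in milliseconds: a meaningful RTT sample.
consistent_rtt = ts_ms_at(now_ms) - ts_ms_at(sent_ms)

# Echoed timestamp in milliseconds, current time in ticks: nonsense.
mixed_rtt = ticks_at(now_ms) - ts_ms_at(sent_ms)
```

Using one clock source for every timestamp-related variable is what keeps the subtraction meaningful.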
|
239334 |
16-Aug-2012 |
rrs |
It's never a good idea to double free the same address.
MFC after: 1 week (after the other commits ahead of this gets MFC'd)
|
239124 |
07-Aug-2012 |
luigi |
s/lenght/length/ in comments
|
239093 |
06-Aug-2012 |
luigi |
move functions outside the SYSBEGIN/SYSEND block
(SYSBEGIN/SYSEND are specific to ipfw/dummynet and are used to emulate sysctl on platforms that do not have them, and they work by creating an array which contains all the sysctl-ed symbols.)
|
239092 |
06-Aug-2012 |
luigi |
use FREE_PKT instead of m_freem to free an mbuf. The former is the standard form used in ipfw/dummynet, so that it is easier to remap it to different memory managers depending on the platform.
|
239091 |
06-Aug-2012 |
tuexen |
Fix a bug found by dim@: Don't use an uninitialized variable, if INVARIANTS is on and an illegal packet with destination 0 is received.
MFC after: 3 days X-MFC with: 238003
|
239075 |
05-Aug-2012 |
trociny |
In tcp timers, check INP_DROPPED flag a little later, after callout_deactivate(), so if INP_DROPPED is set we return with the timer active flag cleared.
For me this fixes negative keep timer values reported by `netstat -x' for connections in CLOSE state.
Approved by: net (silence) MFC after: 2 weeks
|
239052 |
05-Aug-2012 |
tuexen |
Fix a refcount issue. The callee only decrements if stcb is NULL.
MFC after: 3 days Discussed with: rrs
|
239041 |
04-Aug-2012 |
tuexen |
Fix a bug reported by Simon L. B. Nielsen: If an SCTP endpoint receives an ASCONF with a wildcard lookup address and incorrect verification tag, the system crashes.
MFC after: 3 days.
|
239035 |
04-Aug-2012 |
tuexen |
Testing an interface property should depend on the interface, not on an address.
MFC after: 3 days
|
238990 |
02-Aug-2012 |
glebius |
Fix races between in_lltable_prefix_free(), lla_lookup(), llentry_free() and arptimer():
o Use callout_init_rw() for lle timeout, this allows us safely disestablish them.
  - This allows us to simplify the arptimer() and make it race safe.
o Consistently use ifp->if_afdata_lock to lock access to linked lists in the lle hashes.
o Introduce new lle flag LLE_LINKED, which marks an entry that is attached to the hash.
  - Use LLE_LINKED to avoid double unlinking via consequent calls to llentry_free().
  - Mark lle with LLE_DELETED via |= operation instead of =, so that other flags won't be lost.
o Make LLE_ADDREF(), LLE_REMREF() and LLE_FREE_LOCKED() more consistent and provide more informative KASSERTs.
The patch is a collaborative work of all submitters and myself.
PR: kern/165863 Submitted by: Andrey Zonov <andrey zonov.org> Submitted by: Ryan Stone <rysto32 gmail.com> Submitted by: Eric van Gyzen <eric_van_gyzen dell.com>
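The difference between marking an entry with = and with |= is easy to see in a tiny Python sketch. The numeric flag values below are hypothetical and chosen only for illustration; just the flag names come from the log entry above.

```python
# Hypothetical bit values, for illustration only.
LLE_DELETED = 0x1
LLE_STATIC = 0x2
LLE_LINKED = 0x4

la_flags = LLE_STATIC | LLE_LINKED

# Plain assignment: every previously set flag is lost.
overwritten = LLE_DELETED

# OR-assignment: the entry is marked deleted and keeps its other flags.
preserved = la_flags | LLE_DELETED

still_linked_after_assign = bool(overwritten & LLE_LINKED)
still_linked_after_or = bool(preserved & LLE_LINKED)
```

Since LLE_LINKED is what guards against double unlinking, losing it on deletion would defeat that check.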
|
238988 |
02-Aug-2012 |
luigi |
replace __unused with a portable construct; fix a couple of signed/unsigned warnings.
|
238978 |
01-Aug-2012 |
luigi |
replace inet_ntoa_r with the more standard inet_ntop(). As discussed on -current, inet_ntoa_r() is non standard, has different arguments in userspace and kernel, and almost unused (no clients in userspace, only net/flowtable.c, net/if_llatbl.c, netinet/in_pcb.c, netinet/tcp_subr.c in the kernel)
|
238977 |
01-Aug-2012 |
luigi |
add a cast to avoid a signed/unsigned warning (to be removed when we will have TUNABLE_UINT constructors)
|
238967 |
01-Aug-2012 |
glebius |
Some more whitespace cleanup.
|
238945 |
31-Jul-2012 |
glebius |
Some style(9) and whitespace changes.
Together with: Andrey Zonov <andrey zonov.org>
|
238941 |
31-Jul-2012 |
luigi |
nobody uses this file except the userspace ipfw code, but the cast of a pointer to an integer needs a cast to prevent a warning for size mismatch.
MFC after: 1 week
|
238790 |
26-Jul-2012 |
tuexen |
Fix the sctp_sockstore union such that userland programs don't depend on INET and/or INET6 to be defined and in-tune with how the kernel was compiled.
MFC after: 3 days Discussed with: rrs
|
238769 |
25-Jul-2012 |
bz |
Fix a problem when CARP is enabled on the interface for IPv4 but not for IPv6. The current checks in nd6_nbr.c along with the old version will result in ifa being NULL and subsequently the packet will be dropped. This prevented NS/NA from working, and with that IPv6.
Now return the ifa from the carp lookup function in two cases: 1) if the address matches, is a carp address, and we are MASTER (as before), 2) if the address matches but it is not a carp address at all (new).
Reported by: Peter Wemm (new Y! FreeBSD cluster, eating our own dogfood) Tested on: New Y! FreeBSD cluster machines Reviewed by: glebius
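The two return cases described above can be sketched as a small decision function in Python. This is a simplified model, not the carp code: the Ifa class, its fields, and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Ifa:
    addr: str
    is_carp: bool = False      # address is managed by CARP
    carp_master: bool = False  # we are currently MASTER for it


def carp_lookup(ifa: Ifa, target: str) -> Optional[Ifa]:
    """Return the ifa that should answer for target, or None."""
    if ifa.addr != target:
        return None
    if not ifa.is_carp:
        return ifa                              # case 2 (new): plain address
    return ifa if ifa.carp_master else None     # case 1: CARP address, MASTER only


plain = Ifa("2001:db8::1")
master = Ifa("2001:db8::2", is_carp=True, carp_master=True)
backup = Ifa("2001:db8::3", is_carp=True, carp_master=False)
```

With the second case in place, an ordinary IPv6 address on an interface that only runs CARP for IPv4 is no longer filtered out.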
|
238699 |
22-Jul-2012 |
rwatson |
Update some stale comments regarding tcbinfo locking in the TCP input path: read locks on tcbinfo are no longer used, so won't happen. No functional change.
MFC after: 3 days
|
238573 |
18-Jul-2012 |
glebius |
Plug a reference leak: before doing 'goto again' we need to unref ia->ia_ifa if there is any.
Submitted by: Andrey Zonov <andrey zonov.org>
|
238572 |
18-Jul-2012 |
glebius |
When traversing global in_ifaddr list in the IFP_TO_IA() macro, we need to obtain IN_IFADDR_RLOCK().
|
238550 |
17-Jul-2012 |
tuexen |
Fix a refcount bug when freeing an association. While there: Change code to be consistent. Discussed with rrs@.
MFC after: 3 days
|
238516 |
16-Jul-2012 |
glebius |
If ip_output() returns EMSGSIZE to tcp_output(), then the latter calls tcp_mtudisc(), which in its turn may call tcp_output(). Under certain conditions (must admit they are very special) an infinite recursion can happen.
To avoid recursion we can pass struct route to ip_output() and obtain correct mtu. This allows us not to use tcp_mtudisc() but call tcp_mss_update() directly.
PR: kern/155585 Submitted by: Andrey Zonov <andrey zonov.org> (original version of patch)
|
238501 |
15-Jul-2012 |
tuexen |
Changes which improve compilation if neither INET nor INET6 is defined.
MFC after: 3 days
|
238475 |
15-Jul-2012 |
tuexen |
#ifdef INET and INET6 consistently. This also fixes a bug, where it was done wrong.
MFC after: 3 days
|
238458 |
14-Jul-2012 |
tuexen |
Provide the correct notification type (SCTP_SEND_FAILED_EVENT) for unsent messages.
MFC after: 3 days
|
238455 |
14-Jul-2012 |
tuexen |
Use case for selecting the address family (as in other places).
MFC after: 3 days
|
238454 |
14-Jul-2012 |
tuexen |
Use case for selecting the address family (as in other places).
MFC after: 3 days
|
238294 |
09-Jul-2012 |
tuexen |
Fix a bug introduced in r237715.
MFC after: 3 days.
|
238277 |
09-Jul-2012 |
hrs |
Make ipfw0 logging pseudo-interface clonable. It can be created automatically by $firewall_logif rc.conf(5) variable at boot time or manually by ifconfig(8) after a boot.
Discussed on: freebsd-ipfw@
|
238265 |
08-Jul-2012 |
melifaro |
Finally fix lookup (account remaining '\0') and deletion (provide valid key length for radix lookup).
Submitted by: Ihor Kaharlichenko<madkinder at gmail.com> (prev version) Approved by: kib(mentor) MFC after: 3 days
Sponsored by: Shtorm ISP
|
238122 |
04-Jul-2012 |
tuexen |
Use consistent method to determine IPV4_OUTPUT/IPV6_OUTPUT.
MFC after: 3 days
|
238121 |
04-Jul-2012 |
tuexen |
Use CSUM_SCTP_IPV6 for IPv6.
MFC after: 3 days
|
238092 |
04-Jul-2012 |
glebius |
When ip_output()/ip6_output() is supplied a struct route *ro argument, it skips FLOWTABLE lookup. However, the non-NULL ro has dual meaning here: it may be supplied to provide route, and it may be supplied to store and return to caller the route that ip_output()/ip6_output() finds. In the latter case skipping FLOWTABLE lookup is a pessimisation.
The difference between struct route filled by FLOWTABLE and filled by rtalloc() family is that the former doesn't hold a reference on its rtentry. The reference is held by the flow entry, and it is about to be released in future. Thus, route filled by FLOWTABLE shouldn't be passed to RTFREE() macro.
- Introduce new flag for struct route/route_in6, that marks route not holding a reference on rtentry.
- Introduce new macro RO_RTFREE() that cleans up a struct route depending on its kind.
- All callers to ip_output()/ip6_output() that do supply non-NULL but empty route should use RO_RTFREE() to free results of lookup.
- ip_output()/ip6_output() now do FLOWTABLE lookup always when ro->ro_rt == NULL.
Tested by: tuexen (SCTP part)
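A minimal Python sketch of the cleanup rule described above. It only models the reference counting; the flag name RO_NOREF, the classes, and the helper functions are hypothetical stand-ins and do not reproduce the kernel macro.

```python
from dataclasses import dataclass
from typing import Optional

RO_NOREF = 0x1  # hypothetical name and value for the "holds no reference" flag


@dataclass
class RtEntry:
    refcount: int = 1  # reference owned elsewhere (e.g. by the flow entry)


@dataclass
class Route:
    rt: Optional[RtEntry] = None
    flags: int = 0


def fill_from_rtalloc(ro: Route, rt: RtEntry) -> None:
    """rtalloc()-style lookup: the route takes its own reference."""
    rt.refcount += 1
    ro.rt = rt


def fill_from_flowtable(ro: Route, rt: RtEntry) -> None:
    """FLOWTABLE-style lookup: the route borrows the flow entry's reference."""
    ro.rt = rt
    ro.flags |= RO_NOREF


def ro_rtfree(ro: Route) -> None:
    """Clean up a route, dropping a reference only if the route holds one."""
    if ro.rt is None:
        return
    if not ro.flags & RO_NOREF:
        ro.rt.refcount -= 1
    ro.flags &= ~RO_NOREF
    ro.rt = None


borrowed_rt, owned_rt = RtEntry(), RtEntry()
ro_a, ro_b = Route(), Route()
fill_from_flowtable(ro_a, borrowed_rt)
fill_from_rtalloc(ro_b, owned_rt)
ro_rtfree(ro_a)
ro_rtfree(ro_b)
```

Either way the entry ends up with exactly the references it had before the lookup, which is the property the unconditional RTFREE() would have broken for borrowed routes.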
|
238087 |
03-Jul-2012 |
tuexen |
Initialize a variable.
MFC after: 3 days
|
238084 |
03-Jul-2012 |
trociny |
Don't check for ifp != NULL before KASSERT, as ifp cannot be NULL here (it is dereferenced below).
Discussed with: jhb MFC after: 1 week
|
238083 |
03-Jul-2012 |
trociny |
Fix RTTVAR scale in net.inet.tcp.hostcache.list sysctl.
Reviewed by: andre MFC after: 3 days
|
238063 |
03-Jul-2012 |
issyl0 |
- Make ipfw's sched rules case insensitive, for user-friendliness.
- Add a note to the ipfw(8) man page about the rules no longer being case sensitive.
- Fix some typos in the man page.
PR: docs/164772 Reviewed by: bz Approved by: gabor (doc mentor, src committer) MFC after: 2 weeks
|
238016 |
02-Jul-2012 |
glebius |
Remove route caching from IP multicast routing code. There is no reason to do that, and also, cached route never got unreferenced, which meant a reference leak.
Reviewed by: bms
|
238003 |
02-Jul-2012 |
tuexen |
Move common code parts to sctp_common_input_processing().
MFC after: 3 days
|
238002 |
02-Jul-2012 |
tuexen |
Remove dead code (on FreeBSD) as suggested by glebius@.
MFC after: 3 days
|
237715 |
28-Jun-2012 |
tuexen |
Pass the src and dst address of a received packet explicitly around.
MFC after: 3 days
|
237569 |
25-Jun-2012 |
tuexen |
Unify sctp_input() and sctp6_input().
MFC after: 3 days
|
237565 |
25-Jun-2012 |
tuexen |
Whitespace cleanup.
MFC after: 3 days
|
237542 |
24-Jun-2012 |
tuexen |
Pass the packet length explicitly around.
MFC after: 3 days
|
237541 |
24-Jun-2012 |
tuexen |
Remove redundant check.
MFC after: 3 days
|
237540 |
24-Jun-2012 |
tuexen |
Do packet logging in a consistent way.
MFC after: 3 days
|
237479 |
23-Jun-2012 |
melifaro |
Fix interface matching by ipfw table
Submitted by: Ihor Kaharlichenko <madkinder@gmail.com> Tested by: Ihor Kaharlichenko <madkinder@gmail.com> Approved by: kib(mentor) MFC after: 3 days
|
237392 |
21-Jun-2012 |
tuexen |
Remove redundant #ifdef. Reported by gnn@.
MFC after: 3 days
|
237263 |
19-Jun-2012 |
np |
- Updated TOE support in the kernel.
- Stateful TCP offload drivers for Terminator 3 and 4 (T3 and T4) ASICs. These are available as t3_tom and t4_tom modules that augment cxgb(4) and cxgbe(4) respectively. The cxgb/cxgbe drivers continue to work as usual with or without these extra features.
- iWARP driver for Terminator 3 ASIC (kernel verbs). T4 iWARP in the works and will follow soon.
Build-tested with make universe.
30s overview
============
What interfaces support TCP offload? Look for TOE4 and/or TOE6 in the capabilities of an interface:
# ifconfig -m | grep TOE
Enable/disable TCP offload on an interface (just like any other ifnet capability):
# ifconfig cxgbe0 toe
# ifconfig cxgbe0 -toe
Which connections are offloaded? Look for toe4 and/or toe6 in the output of netstat and sockstat:
# netstat -np tcp | grep toe
# sockstat -46c | grep toe
Reviewed by: bz, gnn Sponsored by: Chelsio communications. MFC after: ~3 months (after 9.1, and after ensuring MFC is feasible)
|
237230 |
18-Jun-2012 |
tuexen |
Add rate limitation for SCTP OOTB responses.
MFC after: 3 days
|
237229 |
18-Jun-2012 |
tuexen |
Cleanup the UDP decapsulation code.
MFC after: 3 days
|
237049 |
14-Jun-2012 |
tuexen |
Pass flowid explicitly through the stack instead of taking it from the mbuf chain at different places. While there: Fix several bugs related to VRFs.
MFC after: 3 days
|
237015 |
13-Jun-2012 |
joel |
mdoc: avoid nested displays. Fixes mandoc warnings.
|
236961 |
12-Jun-2012 |
tuexen |
Add a cmsg of type IP_TOS for UDP/IPv4 sockets to specify the TOS byte.
MFC after: 3 days
|
236959 |
12-Jun-2012 |
tuexen |
Add an IP_RECVTOS socket option that delivers, for each received UDP/IPv4 packet, a cmsg of type IP_RECVTOS containing the TOS byte, much like IP_RECVTTL does for the TTL. This allows a protocol implemented on top of UDP to also implement ECN.
MFC after: 3 days
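Once an application has the TOS byte, the ECN information sits in its two low-order bits (RFC 3168) and the DSCP in the upper six. The Python sketch below only shows that bit split; the function name is hypothetical and the socket handling needed to obtain the byte is deliberately left out.

```python
ECN_NAMES = {
    0b00: "Not-ECT",
    0b01: "ECT(1)",
    0b10: "ECT(0)",
    0b11: "CE",
}


def split_tos(tos: int) -> tuple:
    """Split a TOS byte into (DSCP, ECN codepoint name)."""
    if not 0 <= tos <= 0xFF:
        raise ValueError("TOS must fit in one byte")
    dscp = tos >> 2   # upper six bits
    ecn = tos & 0x3   # lower two bits
    return dscp, ECN_NAMES[ecn]


congested = split_tos(0xB8 | 0b11)  # DSCP 46 with Congestion Experienced set
unmarked = split_tos(0x00)
```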
|
236956 |
12-Jun-2012 |
tuexen |
Unify the sending of ABORT, SHUTDOWN-COMPLETE and ERROR chunks. While there: Fix also some minor bugs and prepare for SCTP/DTLS.
MFC after: 3 days
|
236949 |
12-Jun-2012 |
tuexen |
Small cleanup.
MFC after: 3 days
|
236819 |
09-Jun-2012 |
melifaro |
Validate IPv4 network mask being passed to ipfw kernel interface. Incorrect mask can possibly be one of the reasons for kern/127209 existence.
Approved by: kib(mentor) MFC after: 3 days
|
236596 |
05-Jun-2012 |
eadler |
Fix style nit: don't use leading zero for dates in .Dd
Prompted by: brueffer Approved by: brueffer MFC after: 3 days
|
236575 |
04-Jun-2012 |
emax |
Plug more refcount leaks and possible NULL deref for interface address list.
Submitted by: scottl@ MFC after: 3 days
|
236522 |
03-Jun-2012 |
tuexen |
Remove code which is not needed.
MFC after: 3 days
|
236515 |
03-Jun-2012 |
tuexen |
Use an existing function to get the source address.
MFC after: 3 days
|
236493 |
02-Jun-2012 |
tuexen |
Honor sysctl for TTL.
MFC after: 3 days
|
236492 |
02-Jun-2012 |
tuexen |
Don't request data from the IPv6 layer, which is not used.
MFC after: 3 days
|
236450 |
02-Jun-2012 |
tuexen |
Remove an unused parameter.
MFC after: 3 days
|
236394 |
01-Jun-2012 |
bz |
Make TCP LRO work properly with VIMAGE kernels rather than just panicking. There's no VIMAGE context set there yet as this is before if_ethersubr.c.
MFC after: 3 days X-MFC with: r235981
|
236391 |
01-Jun-2012 |
tuexen |
Small cleanups. No functional change.
MFC after: 3 days
|
236332 |
30-May-2012 |
tuexen |
Separate SCTP checksum offloading for IPv4 and IPv6. While there: remove some trailing whitespace.
MFC after: 3 days X-MFC with: 236170
|
236310 |
30-May-2012 |
glebius |
Improve style(9) of bcopy() to and from mbuf tag.
Submitted by: bde
|
236297 |
30-May-2012 |
glebius |
After r228571 carp_output() expects carp_softc * pointer in the mtag.
Noticed by: thompsa
|
236170 |
28-May-2012 |
bz |
It turns out that too many drivers are not only parsing the L2/3/4 headers for TSO but also for generic checksum offloading. Ideally we would only have one common function shared amongst all drivers, and perhaps when updating them for IPv6 we should introduce that. Eventually we should provide the meta information along with mbufs to avoid (re-)parsing entirely.
To not break IPv6 (checksums and offload) and to be able to MFC the changes without risking to hurt 3rd party drivers, duplicate the v4 framework, as other OSes have done as well.
Introduce interface capability flags for TX/RX checksum offload with IPv6, to allow independent toggling (where possible). Add CSUM_*_IPV6 flags for UDP/TCP over IPv6, and reserve further for SCTP, and IPv6 fragmentation. Define CSUM_DELAY_DATA_IPV6 as we do for legacy IP and add an alias for CSUM_DATA_VALID_IPV6.
This pretty much brings IPv6 handling in line with IPv4. TSO is still handled in a different way and not via if_hwassist.
Update ifconfig to allow (un)setting of the new capability flags. Update loopback to announce the new capabilities and if_hwassist flags.
Individual driver updates will have to follow, as will SCTP.
Reported by: gallatin, dim, .. Reviewed by: gallatin (glanced at?) MFC after: 3 days X-MFC with: r235961,235959,235958
|
236157 |
27-May-2012 |
emaste |
Add IPPROTO_MPLS (rfc4023) IP protocol definition
There are currently no in-tree consumers; I'm adding it now for use by vendor code. This matches the change OpenBSD made while implementing MPLS in gif(4).
|
236093 |
26-May-2012 |
bz |
Trim the extra $FreeBSD$ from the comment below the license. We use the __FBSDID() macro on the file now instead.
MFC after: 3 days
|
236087 |
26-May-2012 |
tuexen |
Get rid of SCTP specific code to avoid CRC32C computations on loopback. Just use offloading.
MFC after: 3 days
|
235990 |
25-May-2012 |
tuexen |
Undefine SCTP_PACKED before including sctp_uio.h, which doesn't use it. Spotted by Irene Ruengeler.
MFC after: 3 days
|
235985 |
25-May-2012 |
bz |
MFp4 bz_ipv6_fast:
Properly protect the inp read access when handling the control code. In the past this was expensive but given the rlock it's not so much anymore.
Spotted while: optimizing udp6 Discussed with: rwatson (a few months ago)
Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems
Reviewed by: gnn (as part of the whole) MFC After: 3 days
|
235981 |
25-May-2012 |
bz |
In case forwarding is turned on for a given address family, refuse to queue the packet for LRO and tell the driver to directly pass it on. This avoids re-assembly and later re-fragmentation problems when forwarding.
It's not the best solution but the simplest and most effective for the moment.
Should have been done: ages ago Discussed with and by: many MFC after: 3 days
|
235961 |
25-May-2012 |
bz |
MFp4 bz_ipv6_fast:
Add code to handle pre-checked TCP checksums as indicated by mbuf flags to save the entire computation for validation if not needed.
In the IPv6 TCP output path only compute the pseudo-header checksum, set the checksum offset in the mbuf field along the appropriate flag as done in IPv4.
In tcp_respond() just initialize the IPv6 payload length to 0 as ip6_output() will properly set it.
Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems
Reviewed by: gnn (as part of the whole) MFC After: 3 days
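The split described above, where the output path computes only the pseudo-header sum and a later stage finishes the checksum over the segment, can be sketched in Python. This is a simplified model under stated assumptions: the helper names, addresses, ports, and payload are hypothetical, and no extension headers are considered.

```python
import ipaddress
import struct

CSUM_OFFSET = 16  # offset of the checksum field within the TCP header


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum of data, zero-padded to an even length."""
    data = bytes(data)
    if len(data) % 2:
        data += b"\x00"
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def ipv6_pseudo_header(src: bytes, dst: bytes, upper_len: int, next_header: int) -> bytes:
    """IPv6 pseudo-header: addresses, upper-layer length, 3 zero bytes, next header."""
    return src + dst + struct.pack("!I3xB", upper_len, next_header)


src = ipaddress.IPv6Address("2001:db8::1").packed
dst = ipaddress.IPv6Address("2001:db8::2").packed

# Minimal TCP header (SYN, data offset 5) with a zero checksum, plus payload.
header = struct.pack("!HHIIHHHH", 12345, 80, 0, 0, (5 << 12) | 0x02, 65535, 0, 0)
segment = bytearray(header + b"hello")
pseudo = ipv6_pseudo_header(src, dst, len(segment), 6)

# Step 1 (output path): store only the pseudo-header sum in the checksum field.
seed = ones_complement_sum(pseudo)
segment[CSUM_OFFSET:CSUM_OFFSET + 2] = struct.pack("!H", seed)

# Step 2 (later stage): sum the segment, seed included, and complement the result.
final = ~ones_complement_sum(segment) & 0xFFFF
segment[CSUM_OFFSET:CSUM_OFFSET + 2] = struct.pack("!H", final)

# A receiver summing pseudo-header plus segment must obtain all ones.
verification = ones_complement_sum(pseudo + bytes(segment))
```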
|
235950 |
25-May-2012 |
bz |
MFp4 bz_ipv6_fast:
Factor out the tcp_hc_getmtu() call. As the comments say it applies to both v4 and v6, so only write it once making it easier to read the protocol family specific code.
Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems
Reviewed by: gnn (as part of the whole) MFC After: 3 days
|
235944 |
24-May-2012 |
bz |
MFp4 bz_ipv6_fast:
Significantly update tcp_lro for mostly two things: 1) introduce basic support for IPv6 without extension headers. 2) try hard to also get the incremental checksum updates right, especially also in the IPv4 case for the IP and TCP header.
Move variables around for better locality, factor things out into functions, allow checksum updates to be compiled out, ...
Leave a few comments on further things to look at in the future, though that is not the full list.
Update drivers with appropriate #includes as needed for IPv6 data type in LRO.
Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems
Reviewed by: gnn (as part of the whole) MFC After: 3 days
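For readers unfamiliar with incremental checksum updates, the Python sketch below shows the general technique from RFC 1624 (HC' = ~(~HC + ~m + m')) applied to an IPv4 header whose total length field changes, as happens when segments are merged. It is not the tcp_lro code; the function names and the sample header are assumptions for illustration.

```python
import struct


def fold(total: int) -> int:
    """Fold carries back into the low 16 bits (one's-complement addition)."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def full_checksum(header: bytes) -> int:
    """Checksum over a header whose checksum field is currently zero."""
    return ~fold(sum(w for (w,) in struct.iter_unpack("!H", header))) & 0xFFFF


def incremental_update(old_cksum: int, old_word: int, new_word: int) -> int:
    """RFC 1624, equation 3: update a checksum after one 16-bit word changes."""
    total = (~old_cksum & 0xFFFF) + (~old_word & 0xFFFF) + new_word
    return ~fold(total) & 0xFFFF


# Sample 20-byte IPv4 header, checksum field (bytes 10-11) zeroed.
header = bytearray.fromhex("450000730000400040110000c0a80001c0a800c7")
old_len, new_len = 0x0073, 0x05DC

original = full_checksum(bytes(header))

# Change the total length field (bytes 2-3) and compute the checksum both ways.
header[2:4] = struct.pack("!H", new_len)
recomputed = full_checksum(bytes(header))
updated = incremental_update(original, old_len, new_len)
```

The incremental form touches three 16-bit values instead of the whole header, which is the point of using it on a hot path.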
|
235903 |
24-May-2012 |
tuexen |
Add sn_send_failed_event to sctp_notification.
MFC after: 3 days
|
235828 |
23-May-2012 |
tuexen |
Use consistent text at the beginning of the files.
MFC after: 3 days
|
235644 |
19-May-2012 |
marcel |
Remove unused inclusion of curses.h
|
235557 |
17-May-2012 |
tuexen |
Use a default for max_burst of 4 and l2var of 2. This was discussed with rrs@.
MFC after: 3 days
|
235554 |
17-May-2012 |
tuexen |
Support SCTP_EOF also for 1-to-1 style sockets.
MFC after: 3 days
|
235474 |
15-May-2012 |
bz |
Switch to a standard 2 clause BSD license (from bsd-style-copyright).
Approved by: Myricom Inc. (gallatin) Approved by: Intel Corporation (jfv)
|
235418 |
13-May-2012 |
tuexen |
Support SCTP_REMOTE_ERROR notification.
MFC after: 3 days
|
235416 |
13-May-2012 |
tuexen |
Provide in the SCTP_SEND_FAILED and SCTP_SEND_FAILED_EVENT notifications the correct ssf_error or ssfe_error as required by RFC 6458.
MFC after: 3 days
|
235414 |
13-May-2012 |
tuexen |
Provide the error code in SCTP_PEER_ADDR_CHANGE notifications as specified in RFC 6458.
MFC after: 3 days
|
235412 |
13-May-2012 |
tuexen |
Remove unused constants.
MFC after: 3 days
|
235403 |
13-May-2012 |
tuexen |
Use ECONNABORTED in cases where the ABORT was sent to the peer.
MFC after: 3 days
|
235402 |
13-May-2012 |
tuexen |
Ensure the user can read COMM_LOST notifications on 1-to-1 style sockets.
MFC after: 3 days
|
235360 |
12-May-2012 |
tuexen |
Provide in the association change notification the received ABORT chunk in case of SCTP_COMM_LOST or SCTP_CANT_STR_ASSOC as required by RFC 6458.
MFC after: 3 days
|
235286 |
11-May-2012 |
gjb |
General mdoc(7) and typo fixes.
PR: 167734 Submitted by: Nobuyuki Koganemaru (kogane!jp.freebsd.org) MFC after: 3 days
|
235283 |
11-May-2012 |
tuexen |
Fix a bug in the handling of association reset request.
MFC after: 3 days
|
235282 |
11-May-2012 |
tuexen |
Only provide the supported features in the SCTP_ASSOC_CHANGE notif if the state is SCTP_COMM_UP or SCTP_RESTART. While there, do some cleanups.
MFC after: 3 days
|
235280 |
11-May-2012 |
tuexen |
Remove a constant which is only used on non-FreeBSD platform. (The actual code for the socket option handling has been #ifdefed out forever...)
MFC after: 3 days.
|
235091 |
06-May-2012 |
tuexen |
Address clang warnings.
MFC after: 3 days
|
235081 |
06-May-2012 |
tuexen |
Add support for the sac_info field in struct sctp_assoc_change as required by RFC 6458.
MFC after: 3 days
|
235077 |
06-May-2012 |
tuexen |
Remove debug code.
MFC after: 3 days
|
235075 |
06-May-2012 |
tuexen |
Add support for SCTP_SEND_FAILED_EVENT as required by RFC 6458.
MFC after: 3 days
|
235066 |
05-May-2012 |
tuexen |
Provide the flags in the SCTP stream reconfig related notification as specified in RFC 6525.
MFC after: 3 days
|
235064 |
05-May-2012 |
tuexen |
Honor SCTP_ENABLE_STREAM_RESET socket option when processing incoming requests. Fix also the provided result in the response and use names as specified in RFC 6525.
MFC after: 3 days
|
235057 |
05-May-2012 |
tuexen |
Do error checking for the SCTP_RESET_STREAMS, SCTP_RESET_ASSOC, and SCTP_ADD_STREAMS socket options as specified by RFC 6525.
MFC after: 3 days
|
235036 |
04-May-2012 |
delphij |
Add ToS definitions for DiffServ Codepoints as per RFC2474.
Obtained from: OpenBSD MFC after: 2 weeks
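As an illustration of how a DiffServ codepoint relates to the ToS byte: RFC 2474 places the 6-bit DSCP in the upper six bits of the former ToS field. The sketch below shows only that arithmetic; the helper names are hypothetical and are not the constants added by this commit.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the commit): place a 6-bit DSCP value
 * in the upper six bits of the IPv4 ToS byte, as laid out in RFC 2474.
 */
static uint8_t
dscp_to_tos(uint8_t dscp)
{
	assert(dscp < 64);	/* a DSCP is only six bits wide */
	return ((uint8_t)(dscp << 2));
}

/* Recover the DSCP from a ToS byte by dropping the low two bits. */
static uint8_t
tos_to_dscp(uint8_t tos)
{
	return ((uint8_t)(tos >> 2));
}
```

The resulting byte is what an application would pass to setsockopt() with IPPROTO_IP and IP_TOS; for example, DSCP 46 (Expedited Forwarding) becomes 0xb8.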
|
235021 |
04-May-2012 |
tuexen |
Add support for the SCTP_ENABLE_STREAM_RESET socket option to getsockopt(). This improves the support of RFC 6525.
MFC after: 3 days
|
235009 |
04-May-2012 |
tuexen |
Add support for SCTP_STREAM_CHANGE_EVENT, SCTP_ASSOC_RESET_EVENT as required by RFC 6525. This also fixes SCTP_STREAM_RESET_EVENT.
MFC after: 3 days
|
234996 |
04-May-2012 |
tuexen |
Call panic() only under INVARIANTS.
MFC after: 3 days
|
234995 |
04-May-2012 |
tuexen |
Use SCTP_PRINTF() instead of printf() in all SCTP sources.
MFC after: 3 days
|
234951 |
03-May-2012 |
tuexen |
Fix another RFC 6458 issue. Spotted by Irene Ruengeler.
MFC after: 3 days
|
234946 |
03-May-2012 |
melifaro |
Revert r234834 per luigi@ request.
Cleaner solution (e.g. adding another header) should be done here.
Original log: Move several enums and structures required for L2 filtering from ip_fw_private.h to ip_fw.h. Remove ipfw/ip_fw_private.h header from non-ipfw code.
Requested by: luigi Approved by: kib(mentor)
|
234834 |
30-Apr-2012 |
melifaro |
Move several enums and structures required for L2 filtering from ip_fw_private.h to ip_fw.h. Remove ipfw/ip_fw_private.h header from non-ipfw code.
Approved by: ae(mentor) MFC after: 2 weeks
|
234832 |
30-Apr-2012 |
tuexen |
Add support for missing gauth_number_of_chunks field. This bug was found by Irene Ruengeler.
MFC after: 1 week
|
234762 |
28-Apr-2012 |
tuexen |
Whitespace changes.
MFC after: 3 days
|
234731 |
27-Apr-2012 |
tuexen |
Remove unused structure. Reported by Irene Ruengeler.
MFC after: 3 days
|
234699 |
26-Apr-2012 |
tuexen |
Fix a typo in an SCTP AUTH related notification. Keep the old name for backwards compatibility. Spotted by Irene Ruengeler.
MFC after: 3 days
|
234614 |
23-Apr-2012 |
tuexen |
Use the flags defined in RFC 6525 in the stream reset event.
|
234539 |
21-Apr-2012 |
tuexen |
Fix check used by stream reset related events.
MFC after: 3 days
|
234464 |
19-Apr-2012 |
tuexen |
Whitespace changes.
MFC after: 3 days
|
234461 |
19-Apr-2012 |
tuexen |
Use the same pattern for mbuf logging everywhere.
MFC after: 3 days
|
234460 |
19-Apr-2012 |
tuexen |
Fix reported errno.
MFC after: 3 days
|
234459 |
19-Apr-2012 |
tuexen |
Fix a bug where we copy out more data from an mbuf chain than is actually in it. This happens when SCTP receives an unknown chunk, which requires the sending of an ERROR chunk, and there is no final padding but the chunk is not 4-byte aligned. Reported by yueting via rwatson@
MFC after: 3 days
|
234342 |
16-Apr-2012 |
glebius |
When we receive an ICMP unreach need fragmentation datagram, we take proposed MTU value from it and update the TCP host cache. Then tcp_mss_update() is called on the corresponding tcpcb. It finds the just allocated entry in the TCP host cache and updates MSS on the tcpcb. And then we do a fast retransmit of what we have in the tcp send buffer.
This sequence gets broken if the TCP host cache is exhausted. In this case allocation fails, and the later call to tcp_mss_update() finds nothing in the cache. The fast retransmit is done with a non-reduced MSS and is immediately answered by the remote host with new ICMP datagrams, and the cycle repeats. This ping-pong can go up to wirespeed.
To fix this:
- tcp_mss_update() gets new parameter - mtuoffer, that is like offer, but needs to have min_protoh subtracted.
- tcp_mtudisc() as notification method renamed to tcp_mtudisc_notify().
- tcp_mtudisc() now accepts not a useless error argument, but proposed MTU value, that is passed to tcp_mss_update() as mtuoffer.
Reported by: az Reported by: Andrey Zonov <andrey zonov.org> Reviewed by: andre (previous version of patch)
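The relation between an MTU offer and the resulting MSS described in this entry can be sketched as below. This is only the subtraction the log message mentions, not the kernel's tcp_mss_update(); the function name and the 40-byte IPv4 value for min_protoh (20-byte IP header plus 20-byte TCP header, no options) are assumptions for illustration.

```c
#include <assert.h>

/* Assumed for illustration: minimal IPv4 + TCP header size, no options. */
#define EXAMPLE_IPV4_MIN_PROTOH	40

/*
 * Hypothetical helper: derive an MSS from an MTU proposed in an ICMP
 * "need fragmentation" message by subtracting the protocol headers.
 */
static int
mss_from_mtuoffer(int mtuoffer, int min_protoh)
{
	assert(mtuoffer > min_protoh);
	return (mtuoffer - min_protoh);
}
```

With these assumptions an offered MTU of 1500 gives an MSS of 1460, so the fast retransmit no longer needs a host cache entry to learn the reduced size.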
|
234297 |
14-Apr-2012 |
tuexen |
Always send HBs when in PF state.
MFC after: 1 week X-MFC with: r234296
|
234296 |
14-Apr-2012 |
tuexen |
Bugfix: Don't send HBs on paths which are not idle.
MFC after: 1 week
|
234130 |
11-Apr-2012 |
glebius |
It is a logical error that in carp_multicast_cleanup() we look at the count of addresses on a particular vhid; we should account for the number of addresses on the cif.
To achieve this we need to run carp_attach() and carp_detach() under appropriate cif lock.
|
234087 |
10-Apr-2012 |
glebius |
M_DONTWAIT is a flag from historical mbuf(9) allocator, not malloc(9) or uma(9) flag.
|
234084 |
10-Apr-2012 |
glebius |
CARP should be capable of running on if_bridge(4). Unfortunately, this commit is not enough to enable CARP operation on if_bridge(4), because the latter doesn't handle or even initialize its ifp->if_link_state.
Reported by: Alexander Lunev <sol289 gmail.com>
|
233940 |
06-Apr-2012 |
tuexen |
Remove duplicate condition in if statement.
Obtained from: brucec@ MFC after: 3 days
|
233745 |
31-Mar-2012 |
glebius |
Don't check malloc(M_WAITOK) results.
|
233660 |
29-Mar-2012 |
rrs |
Make our stream reset implementation compliant with RFC 6525.
MFC after: 1 month
|
233601 |
28-Mar-2012 |
zec |
Permit tcpdrop in VNET jails.
Submitted by: Miljenko Mikuc MFC after: 3 days
|
233597 |
28-Mar-2012 |
tuexen |
Honor the net.inet.udp.checksum sysctl when using SCTP/UDP/IPv4 encapsulation. MFCing requires MFCing http://svn.freebsd.org/changeset/base/233554 MFC after: 2 weeks
|
233554 |
27-Mar-2012 |
bz |
Export the udp_cksum sysctl for upcoming SCTP work. Rather than always, SCTP will only do IPv4 UDP checksum calculation as defined by the host policy. When tunneling SCTP always calculates the inner checksum already so not doing the outer UDP can save cycles.
While here virtualize the variable.
Requested by: tuexen MFC after: 2 weeks
|
233478 |
25-Mar-2012 |
melifaro |
- Permit number of ipfw tables to be changed in runtime.
net.inet.ip.fw.tables_max is now read-write.
- Bump IPFW_TABLES_MAX to 65535 Default number of tables is still 128
- Remove IPFW_TABLES_MAX from ipfw(8) code.
Sponsored by Yandex LLC
Approved by: kib(mentor)
MFC after: 2 weeks
|
233311 |
22-Mar-2012 |
tuexen |
Small cleanup of the code. No functional change (in FreeBSD kernel).
MFC after: 1 week.
|
233096 |
17-Mar-2012 |
rmh |
Hide a few declarations from userland (including `struct inpcbgroup'). This removes the dependency on <machine/param.h> which was introduced with SVN rev 222748 (due to CACHE_LINE_SIZE).
Reviewed by: bde MFC after: 10 days
|
233005 |
15-Mar-2012 |
tuexen |
Clean up, no functional change.
MFC after: 3 days.
|
233004 |
15-Mar-2012 |
tuexen |
Fix bugs which can result in a panic when a non-SCTP socket is used with an sctp_ system call which expects an SCTP socket.
MFC after: 3 days.
|
232868 |
12-Mar-2012 |
melifaro |
Fix VNET build broken by r232865. Temporarily remove the ability to assign a different number of tables per VNET instance.
|
232866 |
12-Mar-2012 |
rrs |
This fixes PR 165210. Basically we just add in the netgraph interface to the list of acceptable interfaces. A todo at the next IETF code blitz, though is we need to review why we screen interfaces, there was a reason ;-).
PR: 165210 MFC after: 1 week
|
232865 |
12-Mar-2012 |
melifaro |
- Add ipfw eXtended tables permitting radix to be used for any kind of keys.
- Add support for IPv6 and interface extended tables
- Make number of tables to be loader tunable in range 0..65534.
- Use IP_FW3 opcode for all new extended table cmds
No ABI changes are introduced. Old userland will see valid tables for IPv4 tables and no entries otherwise. Flush works for any table.
IP_FW3 socket option is used to encapsulate all new opcodes:
/* IP_FW3 header/opcodes */
typedef struct _ip_fw3_opheader {
	uint16_t opcode;	/* Operation opcode */
	uint16_t reserved[3];	/* Align to 64-bit boundary */
} ip_fw3_opheader;
New opcodes added: IP_FW_TABLE_XADD, IP_FW_TABLE_XDEL, IP_FW_TABLE_XGETSIZE, IP_FW_TABLE_XLIST
ipfw(8) table argument parsing behavior is changed: 'ipfw table 999 add host' now assumes 'host' to be interface name instead of hostname.
New tunable: net.inet.ip.fw.tables_max controls number of table supported by ipfw in given VNET instance. 128 is still the default value.
New syntax:
ipfw add skipto tablearg ip from any to any via table(42) in
ipfw add skipto tablearg ip from any to any via table(4242) out
This is a bit hackish, special interface name '\1' is used to signal interface table number is passed in p.glob field.
Sponsored by Yandex LLC
Reviewed by: ae Approved by: ae (mentor)
MFC after: 4 weeks
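How a request might be laid out behind the ip_fw3_opheader quoted in this entry can be sketched as follows. The struct is copied from the log message; the packing helper, the placeholder opcode value, and the idea of a flat header-plus-payload buffer are assumptions for illustration and are not taken from ipfw(8).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* IP_FW3 header as quoted in the commit message. */
typedef struct _ip_fw3_opheader {
	uint16_t opcode;	/* Operation opcode */
	uint16_t reserved[3];	/* Align to 64-bit boundary */
} ip_fw3_opheader;

/* Placeholder only: the real opcode values are not given in the log. */
#define EXAMPLE_OPCODE	0x1234

/*
 * Hypothetical helper: write a zeroed header with the given opcode,
 * followed by the payload, into buf. Returns the number of bytes used,
 * or 0 if buf is too small.
 */
static size_t
fw3_pack(void *buf, size_t buflen, uint16_t opcode,
    const void *payload, size_t paylen)
{
	ip_fw3_opheader hdr;

	assert(buf != NULL);
	if (buflen < sizeof(hdr) + paylen)
		return (0);
	memset(&hdr, 0, sizeof(hdr));	/* reserved fields stay zero */
	hdr.opcode = opcode;
	memcpy(buf, &hdr, sizeof(hdr));
	if (paylen > 0)
		memcpy((char *)buf + sizeof(hdr), payload, paylen);
	return (sizeof(hdr) + paylen);
}
```

The header is 8 bytes, so whatever follows it starts on a 64-bit boundary, which is what the reserved field is for.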
|
232726 |
09-Mar-2012 |
tuexen |
Fix a warning reported by bz@
MFC after: 3 days.
|
232724 |
09-Mar-2012 |
tuexen |
Add support for stf interfaces.
MFC after: 3 days.
|
232723 |
09-Mar-2012 |
tuexen |
Fix a bug reported by Peter Holm which results in a crash: Verify in sctp_peeloff() that the socket is a one-to-many style SCTP socket.
MFC after: 3 days.
|
232517 |
04-Mar-2012 |
zec |
Change SYSINIT priorities so that ip_mroute_modevent() is executed before vnet_mroute_init(), since vnet_mroute_init() depends on the mfchashsize tunable being set, and that is done in ip_mroute_modevent(). Apparently I broke that ordering with r208744 almost 2 years ago...
PR: kern/162201 Submitted by: Stevan Markovic (mcafee.com) MFC after: 3 days
|
232513 |
04-Mar-2012 |
bz |
Correct typo in the RFC number for the constants based on IANA assignments for IPv6 Neighbor Discovery Option types for "IPv6 Router Advertisement Options for DNS Configuration". It is RFC 6106.
MFC after: 3 days
|
232273 |
28-Feb-2012 |
oleg |
- Refresh dynamic tcp rule only if both sides answered keepalive packets.
- Remove some useless assignments.
MFC after: 1 month
|
232272 |
28-Feb-2012 |
oleg |
lookup_dyn_rule_locked(): style(9) cleanup
MFC after: 1 month
|
232054 |
23-Feb-2012 |
kmacy |
When using flowtable, llentrys can outlive the interface with which they're associated, at which point the lle_tbl pointer points to freed memory and the llt_free pointer is no longer valid.
Move the free pointer into the llentry itself and update the initialization sites.
MFC after: 2 weeks
|
231991 |
22-Feb-2012 |
ae |
Don't use `m' after m_megapullup.
PR: kern/165373 MFC after: 3 days
|
231895 |
18-Feb-2012 |
tuexen |
Remove two clang warnings.
MFC after: 1 month.
|
231852 |
17-Feb-2012 |
bz |
Merge multi-FIB IPv6 support from projects/multi-fibv6/head/:
Extend the so far IPv4-only support for multiple routing tables (FIBs) introduced in r178888 to IPv6 providing feature parity.
This includes an extended rtalloc(9) KPI for IPv6, the necessary adjustments to the network stack, and user land support as in netstat.
Sponsored by: Cisco Systems, Inc. Reviewed by: melifaro (basically) MFC after: 10 days
|
231767 |
15-Feb-2012 |
bz |
Fix PAWS (Protect Against Wrapped Sequence numbers) in cases when hz >> 1000 and thus getting outside the timestamp clock frequency of 1ms < x < 1s per tick as mandated by RFC1323, leading to connection resets on idle connections.
Always use a granularity of 1ms using getmicrouptime() making all but relevant callouts independent of hz.
Use getmicrouptime(), not getmicrotime() as the latter may make a jump possibly breaking TCP nfsroot mounts having our timestamps move forward for more than 24.8 days in a second without having been idle for that long.
PR: kern/61404 Reviewed by: jhb, mav, rrs Discussed with: silby, lstewart Sponsored by: Sandvine Incorporated (originally in 2011) MFC after: 6 weeks
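The 1ms granularity and the 24.8-day figure above can be illustrated with a small sketch: a 32-bit timestamp counted in milliseconds covers 2^31 ms, roughly 24.8 days, before a signed comparison flips. The helper names are hypothetical; in the kernel the struct timeval would come from getmicrouptime(), which is not called here.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/time.h>

/*
 * Hypothetical helper: convert an uptime value to a 32-bit timestamp
 * with 1ms granularity, independent of hz. The result wraps modulo 2^32.
 */
static uint32_t
ts_ms_from_timeval(const struct timeval *tv)
{
	assert(tv->tv_usec >= 0 && tv->tv_usec < 1000000);
	return ((uint32_t)((uint64_t)tv->tv_sec * 1000u +
	    (uint64_t)tv->tv_usec / 1000u));
}

/*
 * PAWS-style comparison: a is older than b if the difference, taken
 * modulo 2^32, is negative when viewed as a signed 32-bit value.
 */
static int
ts_lt(uint32_t a, uint32_t b)
{
	return ((int32_t)(a - b) < 0);
}
```

Because the comparison is done modulo 2^32, a timestamp taken shortly before the counter wraps still compares as older than one taken shortly after.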
|
231672 |
14-Feb-2012 |
tuexen |
Fix a bug where the wrong protocol overhead was used. This can lead to a deadlock of an association when an IPv6 socket was used to communicate with IPv4 and an ICMPv4 fragmentation needed message was received. While there, simplify the code a bit.
MFC after: 3 days.
|
231201 |
08-Feb-2012 |
glebius |
Set vnet context in callouts and taskqueues.
PR: 164696
|
231076 |
06-Feb-2012 |
glebius |
Make the 'tcpwin' option of ipfw(8) accept ranges and lists.
Submitted by: sem
|
231074 |
06-Feb-2012 |
tuexen |
Fix a typo which was already fixed by eadler in r227489. We missed to integrate this fix in our code base, so it was removed in r227755.
MFC after: 3 days.
|
231025 |
05-Feb-2012 |
glebius |
Add new socket options: TCP_KEEPINIT, TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT, that allow controlling the initial timeout, idle time, idle re-send interval and idle send count on a per-socket basis.
Reviewed by: andre, bz, lstewart
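A minimal sketch of per-socket keepalive tuning with three of these options, assuming a connected or unconnected TCP socket descriptor and values in seconds (counts for TCP_KEEPCNT). The helper name is hypothetical, and TCP_KEEPINIT is left out because it is not defined on every platform.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: enable keepalives on fd and set the idle time,
 * the interval between probes, and the number of probes for this
 * socket only. Returns 0 on success, -1 if any setsockopt() fails.
 */
static int
set_keepalive(int fd, int idle, int intvl, int cnt)
{
	int on = 1;

	assert(idle > 0 && intvl > 0 && cnt > 0);
	if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) != 0)
		return (-1);
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle,
	    sizeof(idle)) != 0)
		return (-1);
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl,
	    sizeof(intvl)) != 0)
		return (-1);
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) != 0)
		return (-1);
	return (0);
}
```

Sockets that do not call such a helper keep using the system-wide defaults.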
|
230863 |
01-Feb-2012 |
glebius |
o Provide functions carp_ifa_addroute()/carp_ifa_delroute() to cleanup routes from a single ifa.
o Implement carp_addroute()/carp_delroute() via above functions.
o Call carp_ifa_delroute() in the carp_detach() to avoid junk routes left in routing table, in case if user removes an address in a MASTER state. [1]
Reported by: az [1]
|
230614 |
27-Jan-2012 |
luigi |
a variable was erroneously declared as 32 bit instead of 64.
MFC after: 3 days
|
230508 |
24-Jan-2012 |
glebius |
Remove unused variable.
|
230452 |
22-Jan-2012 |
bz |
Make #error messages string-literals and remove punctuation.
Reported by: bde (for ip_divert) Reviewed by: bde MFC after: 3 days
|
230443 |
22-Jan-2012 |
bz |
Fix ip_divert handling of inet and inet6 and module building some more.
Properly sort the "carp" case in modules/Makefile after it was renamed.
Reported by: bde (most) Reviewed by: bde MFC after: 3 days
|
230442 |
22-Jan-2012 |
bz |
Clean up some #endif comments removing from short sections. Add #endif comments to longer, also refining strange ones.
Properly use #ifdef rather than #if defined() where possible. Four #if defined(PCBGROUP) occurrences (netinet and netinet6) were ignored to avoid conflicts with eventually upcoming changes for RSS.
Reported by: bde (most) Reviewed by: bde MFC after: 3 days
|
230387 |
20-Jan-2012 |
bz |
Remove a superfluous INET6 check (no opt_inet6.h included anyway).
MFC after: 3 days
|
230379 |
20-Jan-2012 |
tuexen |
Fix a problem when using the CBAPI. While there, remove an old comment which does not apply anymore.
|
230207 |
16-Jan-2012 |
glebius |
Drop support for SIOCSIFADDR, SIOCSIFNETMASK, SIOCSIFBRDADDR, SIOCSIFDSTADDR ioctl commands.
PR: 163524 Reviewed by: net
|
230136 |
15-Jan-2012 |
tuexen |
Two cleanups. No functional change.
|
230104 |
14-Jan-2012 |
tuexen |
Fix two bugs, which result in a panic when calling getsockopt() using SCTP_RECVINFO or SCTP_NXTINFO. Reported by Clement Lecigne and forwarded to us by zi@.
MFC after: 3 days.
|
229850 |
09-Jan-2012 |
glebius |
Bunch of fixes to pfsync(4) module load/unload:
o Make the pfsync.ko actually usable. Before this change loading it didn't register protosw, so was a nop. However, a module in /boot/kernel did confuse users.
o Rewrite the way we are joining multicast group:
  - Move multicast initialization/destruction to separate functions.
  - Don't allocate memory if we aren't going to join a multicast group.
  - Use modern API for joining/leaving multicast group.
  - Now the utterly wrong pfsync_ifdetach() isn't needed.
o Move module initialization from SYSINIT(9) to moduledata_t method.
o Refuse to unload module, unless asked forcibly.
o Improve a bit some FreeBSD porting code:
  - Use separate malloc type.
  - Simplify swi scheduling.
This change is probably wrong from VIMAGE viewpoint, however pfsync wasn't VIMAGE-correct before this change, too.
Glanced at by: bz
|
229816 |
08-Jan-2012 |
glebius |
Make it possible to use alternative source hardware address in the ARP datagram generated by arprequest(). If caller doesn't supply the address, then it is either picked from CARP or hardware address of the interface is taken.
While here, make several minor fixes:
- Hold IF_ADDR_RLOCK(ifp) while traversing address list.
- Remove not true comment.
- Access internet address and mask via in_ifaddr fields, rather than ifaddr.
|
229815 |
08-Jan-2012 |
glebius |
Provide IA_MASKSIN() macro similar to IA_SIN() and IA_DSTSIN().
|
229810 |
08-Jan-2012 |
glebius |
Move arprequest() declaration to if_ether.h.
|
229805 |
08-Jan-2012 |
tuexen |
Add an SCTP sysctl "blackhole", similar to the one for TCP. If set to 1, no ABORT is sent back in response to an incoming INIT. If set to 2, no ABORT is sent back in response to an out of the blue packet. If set to 0 (the default), ABORTs are sent. Discussed with rrs@.
MFC after: 1 month.
|
229775 |
07-Jan-2012 |
tuexen |
Retire the SCTP sysctl "strict_init". We always perform the validation and there is no reason to make it configurable. Discussed with rrs@.
|
229774 |
07-Jan-2012 |
tuexen |
Improve the handling of received INITs. Send an ABORT when not accepting the connection. Also fix a crash, which could happen when the user closed the socket.
MFC after: 1 month.
|
229749 |
07-Jan-2012 |
eadler |
- Fix sysctl description
PR: 163623 Submitted by: Eugene Grosbein <eugen@eg.sd.rdtc.ru> Approved by: bz
|
229729 |
06-Jan-2012 |
tuexen |
Use NULL instead of 0.
MFC after: 1 month.
|
229714 |
06-Jan-2012 |
np |
Always release the inp lock before returning from tcp_detach.
MFC after: 5 days
|
229700 |
06-Jan-2012 |
jhb |
Tweak the last fix to match what was actually tested.
Pointy hat to: jhb
|
229672 |
06-Jan-2012 |
pluknet |
Fix a typo.
X-MFC-with: 229665
|
229665 |
05-Jan-2012 |
jhb |
Remove the assertion from tcp_input() that rcv_nxt is always greater than or equal to rcv_adv and fix tcp_twstart() to handle this case by assuming the last window was zero rather than a negative value.
The code in tcp_input() already safely handled this case. It can happen due to delayed ACKs along with a remote sender that sends data beyond the window we previously advertised. If we have room in our socket buffer for the extra data beyond the advertised window, we will accept it. However, if the ACK for that segment is delayed, then we will not effectively fixup rcv_adv to account for that extra data until the next segment arrives and forces out an ACK. When that next segment arrives, rcv_nxt will be beyond rcv_adv.
Tested by: pjd MFC after: 1 week
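The case described above, where rcv_nxt has moved past rcv_adv, can be sketched with a wrap-safe sequence comparison. This is an illustration of "assume the last window was zero rather than a negative value", not the code in tcp_twstart(); the helper name is hypothetical and the SEQ_GT macro is defined locally for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Wrap-safe "a is after b" comparison for 32-bit sequence numbers. */
#define SEQ_GT(a, b)	((int32_t)((a) - (b)) > 0)

/*
 * Hypothetical helper: size of the window we last advertised that is
 * still open. If the peer already sent past rcv_adv (delayed ACK case),
 * report zero instead of a negative value.
 */
static uint32_t
last_adv_window(uint32_t rcv_adv, uint32_t rcv_nxt)
{
	uint32_t win;

	if (!SEQ_GT(rcv_adv, rcv_nxt))
		return (0);
	win = rcv_adv - rcv_nxt;
	assert(win <= 0x7fffffffu);	/* implied by SEQ_GT */
	return (win);
}
```

The subtraction is done in unsigned arithmetic, so the result stays correct when the sequence space wraps around.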
|
229621 |
05-Jan-2012 |
jhb |
Convert all users of IF_ADDR_LOCK to use new locking macros that specify either a read lock or write lock.
Reviewed by: bz MFC after: 2 weeks
|
229478 |
04-Jan-2012 |
jhb |
Use a helper variable to wrap a long line.
|
229477 |
04-Jan-2012 |
jhb |
In the handling of the SIOC[DG]LIFADDR ioctls in in_lifaddr_ioctl(), add missing interface address list locking and grab a reference on the matching interface address after dropping the lock while it is used to avoid a potential use after free.
Reviewed by: bz MFC after: 1 week
|
229476 |
04-Jan-2012 |
jhb |
Fix the SIOC[DG]LIFADDR ioctls in in_lifaddr_ioctl() to work with IPv4 interface address rather than IPv6.
Submitted by: hrs Reviewed by: bz MFC after: 1 week
|
229420 |
03-Jan-2012 |
jhb |
When cancelling multicast timers on an interface, don't release the reference on a group in the leaving state while iterating over the loop. Instead, use the same approach used in igmp_ifdetach() and mld_ifdetach() of placing the groups to free on pending release list and then releasing the references after dropping the IF_ADDR_LOCK. This closes an ugly race where the code was dropping the lock in the middle of iterating over the list. It also fixes some additional potential use-after-free bugs since the cancellation routine also applied other changes to the group after dropping the reference. Now those changes are performed before the reference is dropped and the group is potentially freed.
Prodded to fix by: glebius Reviewed by: bz MFC after: 1 week
|
229390 |
03-Jan-2012 |
jhb |
Use TAILQ_FOREACH() instead of TAILQ_FOREACH_SAFE() for some loops that do not modify the queues they iterate over.
Submitted by: glebius
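The distinction behind this change can be sketched with queue(3): a loop that only reads can use TAILQ_FOREACH(), while a loop that unlinks elements has to save the next pointer first, which is what TAILQ_FOREACH_SAFE() does. The struct and function names are hypothetical, and the removing loop is written out by hand because TAILQ_FOREACH_SAFE() is not provided by every sys/queue.h.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

struct item {
	int value;
	TAILQ_ENTRY(item) link;
};
TAILQ_HEAD(item_list, item);

/* Read-only traversal: the plain TAILQ_FOREACH() is sufficient. */
static int
sum_items(struct item_list *head)
{
	struct item *it;
	int sum = 0;

	assert(head != NULL);
	TAILQ_FOREACH(it, head, link)
		sum += it->value;
	return (sum);
}

/*
 * Traversal that unlinks elements: the next pointer is saved before
 * the current element is removed, as TAILQ_FOREACH_SAFE() would do.
 */
static int
remove_negative(struct item_list *head)
{
	struct item *it, *next;
	int removed = 0;

	assert(head != NULL);
	for (it = TAILQ_FIRST(head); it != NULL; it = next) {
		next = TAILQ_NEXT(it, link);
		if (it->value < 0) {
			TAILQ_REMOVE(head, it, link);
			removed++;
		}
	}
	return (removed);
}
```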
|
229265 |
02-Jan-2012 |
bz |
As I came by and noticed, add a comment that inp locking is a bit optimistic (read: non-existent) here and should be fixed.
|
228969 |
29-Dec-2011 |
jhb |
Defer the work of freeing IPv4 multicast options from a socket to an asynchronous task. This avoids tearing down multicast state including sending IGMP leave messages and reprogramming MAC filters while holding the per-protocol global pcbinfo lock that is used in the receive path of packet processing.
Reviewed by: rwatson MFC after: 1 month
|
228966 |
29-Dec-2011 |
jhb |
Use queue(3) macros instead of home-rolled versions in several places in the INET6 code. This includes retiring the 'ndpr_next' and 'pfr_next' macros.
Submitted by: pluknet (earlier version) Reviewed by: pluknet
|
228959 |
29-Dec-2011 |
glebius |
Don't fallback to a CARP address in BACKUP state.
|
228907 |
27-Dec-2011 |
tuexen |
Address issues found by clang. While there, fix also some style issues.
MFC after: 3 months.
|
228812 |
22-Dec-2011 |
glebius |
Use a better log message for master down event.
|
228768 |
21-Dec-2011 |
glebius |
Provide ABI compatibility shim to enable configuring of addresses with ifconfig(8) prior to r228571.
Requested by: brooks
|
228736 |
20-Dec-2011 |
glebius |
Restore a feature that was present in 5.x and 6.x, and was cleared in 7.x, 8.x and 9.x with pf(4) imports: pfsync(4) should suppress CARP preemption, while it is running its bulk update.
However, reimplement the feature in more elegant manner, that is partially inspired by newer OpenBSD:
- Rename term "suppression" to "demotion", to match with OpenBSD.
- Keep a global demotion factor, that can be raised by several conditions, for now these are:
  - interface goes down
  - carp(4) has problems with ip_output() or ip6_output()
  - pfsync performs bulk update
- Unlike in OpenBSD the demotion factor isn't a counter, but is actual value added to advskew. The adjustment values for particular error conditions are also configurable, and their defaults are maximum advskew value, so a single failure bumps demotion to maximum. This is for POLA compatibility, and should satisfy most users.
- Demotion factor is a writable sysctl, so user can do foot shooting, if he desires to.
|
228653 |
17-Dec-2011 |
tuexen |
Fix unused parameter warnings. While there, fix some whitespace issues.
MFC after: 3 months.
|
228574 |
16-Dec-2011 |
glebius |
Since size of struct in_aliasreq has just been changed in r228571, and thus ifconfig(8) needs recompile, it is a good chance to make parameter checks on SIOCAIFADDR arguments more strict.
|
228571 |
16-Dec-2011 |
glebius |
A major overhaul of the CARP implementation. The ip_carp.c was started from scratch, copying needed functionality from the old implementation on demand, with a thorough review of all code. The main change is that interface layer has been removed from the CARP. Now redundant addresses are configured exactly on the interfaces, they run on.
The CARP configuration itself is, as before, configured and read via SIOCSVH/SIOCGVH ioctls. A new prefix created with SIOCAIFADDR or SIOCAIFADDR_IN6 may now be configured to a particular virtual host id, which makes the prefix redundant.
ifconfig(8) semantics have been changed too: now one doesn't need to clone a carpXX interface, he/she should directly configure a vhid on an Ethernet interface.
To supply vhid data from the kernel to an application the getifaddrs(8) function had been changed to pass ifam_data with each address. [1]
The new implementation definitely closes all PRs related to carp(4) being an interface, and may close several others. It also allows to run a single redundant IP per interface.
Big thanks to Bjoern Zeeb for his help with inet6 part of patch, for idea on using ifam_data and for several rounds of reviewing!
PR: kern/117000, kern/126945, kern/126714, kern/120130, kern/117448 Reviewed by: bz Submitted by: bz [1]
|
228454 |
13-Dec-2011 |
glebius |
Belatedly catch up with r151555. in_scrubprefix() also needs this fix. We should compare not only addresses, but their masks, too, when searching for matching prefix.
|
228391 |
10-Dec-2011 |
tuexen |
Fix a bug reported by Irene Ruengeler which resulted in not sending out HEARTBEATs when requested by the user. The HEARTBEATs were only queued, but not actually sent out.
MFC after: 2 months.
|
228313 |
06-Dec-2011 |
glebius |
Fix a very special case when SIOCAIFADDR supplies mask of 0.0.0.0, don't overwrite the mask with autoguessing based on classes.
|
228102 |
28-Nov-2011 |
tuexen |
Remove debug code.
MFC after: 1 month.
|
228062 |
28-Nov-2011 |
glebius |
Fix one more fallout from r227791: do not overwrite trimmed sa_len on the ia_sockmask when doing SIOCSIFNETMASK.
Reported by: Stefan Bethke <stb lassitu.de>, gonzo Pointy hat to: glebius
|
228031 |
27-Nov-2011 |
tuexen |
Fix a warning reported by arundel@. Fix a bug where the parameter length of a supported address types parameter is set to a wrong value if the kernel is built with either INET or INET6, but not both.
MFC after: 3 days.
|
228016 |
27-Nov-2011 |
lstewart |
Plug a TCP reassembly UMA zone leak introduced in r226113 by only using the backup stack queue entry when the zone is exhausted, otherwise we leak a zone allocation each time we plug a hole in the reassembly queue.
Reported by: many on freebsd-stable@ (thread: "TCP Reassembly Issues") Tested by: many on freebsd-stable@ (thread: "TCP Reassembly Issues") Reviewed by: bz (very brief sanity check) MFC after: 3 days
|
227959 |
24-Nov-2011 |
glebius |
Remove superfluous check: SIOCAIFADDR must have ifra_addr supplied.
|
227958 |
24-Nov-2011 |
glebius |
Fix stupid typo in r227830.
PR: 162806 Pointy hat to: glebius
|
227931 |
24-Nov-2011 |
tuexen |
Move up the address to the top of the sctp_udencaps structure like in all other structures. This avoids alignment problems.
MFC after: 3 months.
|
227930 |
24-Nov-2011 |
tuexen |
Move up the address to the top of the sctp_paddrthlds structure like in all other structures. This avoids alignment problems.
MFC after: 3 days.
|
227831 |
22-Nov-2011 |
glebius |
style(9) nit
|
227830 |
22-Nov-2011 |
glebius |
Fix SIOCDIFADDR semantics: if no address is specified, then delete first one.
|
227801 |
21-Nov-2011 |
glebius |
This check isn't needed now, sanity checking done in the beginning. Missed it in last commit.
|
227791 |
21-Nov-2011 |
glebius |
Historically in_control() did not check sockaddrs supplied with structs ifreq/in_aliasreq and there've been several panics due to that problem. All these panics were fixed just a couple of lines above the panicking code.
Take a more general approach: sanity check sockaddrs supplied with SIOCAIFADDR and SIOCSIF*ADDR at the beginning of the function and drop all checks below.
One check is now disabled due to strange code in ifconfig(8) that I've removed recently. I'm going to enable it with next __FreeBSD_version bump.
Historically in_ifinit() was able to recover from an error and restore old address. Nowadays this feature isn't working for all error cases, but for some of them. I suppose no software relies on this behavior, so I'd like to remove it, since this simplifies code a lot.
Also, move if_scrub() earlier in the in_ifinit(). It is more correct to wipe routes before removing address from local address list, and interface address list.
Silence from: bz, brooks, andre, rwatson, 3 weeks
|
227790 |
21-Nov-2011 |
glebius |
Be more informative for "unknown hardware address format" message.
Submitted by: Andrzej Tobola <ato iem.pw.edu.pl>
|
227785 |
21-Nov-2011 |
glebius |
- Reduce severity for all ARP events, that can be triggered from remote machine to LOG_NOTICE. Exception left to "using my IP address".
- Fix multicast ARP warning: add newline and also log the bad MAC address.
Tested by: Alexander Wittig <wittigal msu.edu>
|
227755 |
20-Nov-2011 |
tuexen |
Add support for the SCTP_REMOTE_UDP_ENCAPS_PORT socket option. Retire the now unused sctp_udp_tunneling_for_client_enable sysctl variable.
MFC after: 3 months.
|
227655 |
18-Nov-2011 |
tuexen |
Cleanup comparison of interface names.
MFC after: 1 month.
|
227540 |
15-Nov-2011 |
tuexen |
Set the MTU of a path to an appropriate value if the interface MTU can't be determined.
MFC after: 3 days.
|
227489 |
13-Nov-2011 |
eadler |
- fix duplicate "a a" in some comments
Submitted by: eadler Approved by: simon MFC after: 3 days
|
227486 |
13-Nov-2011 |
tuexen |
Don't copy uninitialized memory. Also simplify the comparison of interface names.
MFC after: 3 days.
|
227459 |
11-Nov-2011 |
brooks |
In r191367 the need for if_free_type() was removed and a new member if_alloctype was used to store the original interface type. Take advantage of this change by removing all existing uses of if_free_type() in favor of if_free().
MFC after: 1 Month
|
227458 |
11-Nov-2011 |
eadler |
- add a missing "be" and "in"
- fix other errors introduced when committing r226436
- add 'function' to a sentence where it makes sense
Submitted by: delphij Submitted by: dougb Submitted by: jhb Approved by: dougb Approved by: jhb
|
227320 |
07-Nov-2011 |
tuexen |
When loading addresses from INITs, always use the correct local address.
MFC after: 3 days.
|
227309 |
07-Nov-2011 |
ed |
Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.
The SYSCTL_NODE macro defines a list that stores all child-elements of that node. If there's no SYSCTL_DECL macro anywhere else, there's no reason why it shouldn't be static.
|
227293 |
07-Nov-2011 |
ed |
Mark MALLOC_DEFINEs static that have no corresponding MALLOC_DECLAREs.
This means that their use is restricted to a single C file.
|
227266 |
06-Nov-2011 |
tuexen |
Initialize all components of the sent COOKIE.
MFC after: 3 days.
|
227207 |
06-Nov-2011 |
trociny |
Cache SO_REUSEPORT socket option in inpcb-layer in order to avoid inp_socket->so_options dereference when we may not acquire the lock on the inpcb.
This fixes the crash due to NULL pointer dereference in in_pcbbind_setup() when inp_socket->so_options in a pcb returned by in_pcblookup_local() was checked.
Reported by: dave jones <s.dave.jones@gmail.com>, Arnaud Lacombe <lacombar@gmail.com> Suggested by: rwatson Glanced by: rwatson Tested by: dave jones <s.dave.jones@gmail.com>
|
227204 |
06-Nov-2011 |
trociny |
Fix the typo made in r157474.
MFC after: 3 days
|
227085 |
04-Nov-2011 |
bz |
Always use the opt_*.h options for ipfw.ko, not just when compiled into the kernel. Do not try to build the module in case of no INET support but keep #error calls for now in case we would compile it into the kernel.
This should fix an issue where the module would fail to enable IPv6 support from the rc framework, but also other INET and INET6 parts being silently compiled out without giving a warning in the module case.
While here garbage collect unneeded opt_*.h includes. opt_ipdn.h is not used anywhere but we need to leave the DUMMYNET entry in options for conditional inclusion in kernel so keep the file with the same name.
Reported by: pluknet Reviewed by: plunket, jhb MFC After: 3 days
|
227034 |
02-Nov-2011 |
pluknet |
Restore sysctl names for tcp_sendspace/tcp_recvspace.
They seem to have been changed unintentionally in r226437, and there was no mention of renaming in the commit log message.
Reported by: Anton Yuzhaninov <citrin citrin ru>
|
226869 |
27-Oct-2011 |
tuexen |
When adding a new remote address using sctp_add_remote_addr(), return the correct net if requested.
MFC after: 3 days.
|
226868 |
27-Oct-2011 |
tuexen |
Send out control chunks which have no specific destination.
MFC after: 3 days.
|
226713 |
25-Oct-2011 |
qingli |
Exclude host routes when checking for prefix coverage on multiple interfaces. A host route has a NULL mask so check for that condition. I have also been told by developers who customize the packet output path with direct manipulation of the route entry (or the outgoing interface to be specific). This patch checks for the route mask explicitly to make sure custom code will not panic.
PR: kern/161805 MFC after: 3 days
|
226610 |
21-Oct-2011 |
ed |
Add missing #includes.
According to POSIX, these two header files should be able to be included by themselves, not depending on other headers. The <net/if.h> header uses struct sockaddr when __BSD_VISIBLE=1, while <netinet/tcp.h> uses integer datatypes (u_int32_t, u_short, etc).
MFC after: 2 months
|
226454 |
17-Oct-2011 |
bz |
Add syntactic sugar missed in r226437 and then not added either when moving things around in r226448 but desperately needed to always make things compile successfully.
MFC after: 1 week
|
226448 |
16-Oct-2011 |
andre |
Move the tcp_sendspace and tcp_recvspace sysctl's from the middle of tcp_usrreq.c to the top of tcp_output.c and tcp_input.c respectively next to the socket buffer autosizing controls.
MFC after: 1 week
|
226447 |
16-Oct-2011 |
andre |
Remove the ss_fltsz and ss_fltsz_local sysctl's which have long been superseded by the RFC3390 initial CWND sizing.
Also remove the remnants of TCP_METRICS_CWND which used the TCP hostcache to set the initial CWND in a non-RFC compliant way.
MFC after: 1 week
|
226437 |
16-Oct-2011 |
andre |
VNET virtualize tcp_sendspace/tcp_recvspace and change the type to INT. A long is not necessary as the TCP window is limited to 2**30. A larger initial window isn't useful.
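The size argument above can be checked with a little arithmetic. The following is a minimal sketch, assuming the usual protocol limits of a 16-bit window field and a window scale shift capped at 14; the `sketch_` names are hypothetical and are not taken from the FreeBSD sources.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Assumed protocol limits: 16-bit window field, shift count capped at 14. */
#define SKETCH_TCP_MAXWIN       65535u
#define SKETCH_TCP_MAX_WINSHIFT 14u

/* Largest window a TCP peer can advertise once window scaling is applied. */
static uint32_t
sketch_max_scaled_window(void)
{
	return (SKETCH_TCP_MAXWIN << SKETCH_TCP_MAX_WINSHIFT);
}

/* Returns 1 if the value fits in a signed int, 0 otherwise. */
static int
sketch_fits_in_int(uint32_t v)
{
	return (v <= (uint32_t)INT_MAX);
}
```

The largest scaled window stays just below 2**30, so it fits comfortably in an int on platforms where int is 32 bits wide.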
MFC after: 1 week
|
226436 |
16-Oct-2011 |
eadler |
- change "is is" to "is" or "it is"
- change "the the" to "the"
Approved by: lstewart Approved by: sahil (mentor) MFC after: 3 days
|
226433 |
16-Oct-2011 |
andre |
Update the comment and description of tcp_sendspace and tcp_recvspace to better reflect their purpose. MFC after: 1 week
|
226431 |
16-Oct-2011 |
ed |
Forward declare mbuf and inpcb.
This fixes a compiler warning at WARNS=6 when including the header files as follows:
#include <sys/types.h> #include <netinet/in.h> #include <netinet/ip_var.h> #include <netinet/udp.h> #include <netinet/udp_var.h>
|
226402 |
15-Oct-2011 |
glebius |
Add support for IPv4 /31 prefixes, as described in RFC3021.
To run a /31 network, participating hosts MUST drop support for directed broadcasts, and treat the first and last addresses on subnet as unicast. The broadcast address for the prefix should be the link local broadcast address, INADDR_BROADCAST.
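The broadcast rule described above can be sketched as follows. This is an illustration only, with addresses in host byte order and hypothetical `sketch_` names; it is not the code from the commit.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_INADDR_BROADCAST 0xffffffffu	/* 255.255.255.255, host byte order */

/* Build a netmask (host byte order) from a prefix length of 0..32. */
static uint32_t
sketch_prefix_to_mask(unsigned int plen)
{
	assert(plen <= 32);
	return (plen == 0 ? 0u : 0xffffffffu << (32 - plen));
}

/*
 * Pick the broadcast address for an interface address.
 * A /31 has no directed broadcast, so the limited broadcast address is used.
 */
static uint32_t
sketch_broadcast_for(uint32_t addr, unsigned int plen)
{
	uint32_t mask = sketch_prefix_to_mask(plen);

	if (plen == 31)
		return (SKETCH_INADDR_BROADCAST);
	return (addr | ~mask);
}
```

With this rule both 192.0.2.0/31 and 192.0.2.1/31 remain usable as unicast addresses, while a /24 keeps its conventional directed broadcast.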
|
226401 |
15-Oct-2011 |
glebius |
Remove last remnants of classful addressing:
- Remove ia_net, ia_netmask, ia_netbroadcast from struct in_ifaddr.
- Remove net.inet.ip.subnetsarelocal, I bet no one needs it in 2011.
- Fix a bug where we were not forwarding to a host which matches the classful net address. For example, a router with a 192.168.x.y/16 network attached would not forward traffic to 192.168.*.0, which are legal IPs in the CIDR world.
- For compatibility, leave autoguessing of the mask based on class.
Reviewed by: andre, bz, rwatson
|
226367 |
14-Oct-2011 |
glebius |
Never switch directly from INIT to MASTER, since this produces nasty status flaps.
PR: kern/161123 Submitted by: Damien Fleuriot <dam my.gd> OpenBSD: ip_carp.c, rev. 1.115
|
226339 |
13-Oct-2011 |
glebius |
De-spl(9).
|
226318 |
12-Oct-2011 |
np |
Make sure the inp wasn't dropped when rexmt let go of the inp and pcbinfo locks.
Reviewed by: andre@ MFC after: 7 days
|
226252 |
11-Oct-2011 |
tuexen |
Use the most significant 6 bits of the dscp instead of the least significant ones. This has changed in the latest version of the socket API ID; it provides backwards compatibility and brings it in sync with the usage of the IP_TOS socket option.
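As an illustration of the bit layout, the sketch below keeps the DSCP in the upper six bits of a TOS-style byte, which is also where IP_TOS carries it. The helper names are hypothetical; the commit message only states which bits are used.

```c
#include <assert.h>
#include <stdint.h>

/* Keep only the upper six bits of an IP TOS / Traffic Class byte (the DSCP). */
static uint8_t
sketch_dscp_in_place(uint8_t tos)
{
	return ((uint8_t)(tos & 0xfc));
}

/* The same code point expressed as a number in the range 0..63. */
static uint8_t
sketch_dscp_codepoint(uint8_t tos)
{
	return ((uint8_t)(tos >> 2));
}
```

For a TOS byte of 0xb8 the in-place value is unchanged and the code point is 46; the two low-order bits are discarded in both forms.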
MFC after: 3 days.
|
226224 |
10-Oct-2011 |
qingli |
All indirect routes will fail the rtcheck, except for a special host route where the destination IP and the gateway IP are the same. This special case handling is only meant for backward compatibility reasons. The last commit introduced a bug in the route check logic, where a valid special case is treated as an error. This patch fixes that bug along with some code cleanup.
Suggested by: gleb Reviewed by: kmacy, discussed with gleb MFC after: 1 day
|
226222 |
10-Oct-2011 |
tuexen |
Get struct sctp_net_route in tune with struct route. struct route was changed in http://svn.freebsd.org/changeset/base/225698 and since then SCTP support was broken. This needs to be MFCed to stable/9 to unbreak SCTP support in 9.0 MFC after: 3 days.
|
226203 |
10-Oct-2011 |
tuexen |
When moving an stcb to a new inp and we copy over the list of bound addresses, update the last used address pointer. If not, it might result in a crash if the old inp goes away.
MFC after: 3 days.
|
226168 |
09-Oct-2011 |
tuexen |
Update the inp stored in a HB-timer when moving an stcb to a new inp. Use only this stored inp when processing a HB timeout. This fixes a bug which results in a crash.
MFC after: 3 days.
|
226120 |
07-Oct-2011 |
qingli |
Do not try removing an ARP entry associated with a given interface address if that interface does not support ARP. Otherwise the system will generate error messages unnecessarily due to the missing entry.
PR: kern/159602 Submitted by: pluknet MFC after: 3 days
|
226114 |
07-Oct-2011 |
qingli |
Remove the reference held on the loopback route when the interface address is being deleted. Only the last reference holder deletes the loopback route. All other delete operations just clear the IFA_RTSELF flag.
PR: kern/159601 Submitted by: pluknet Reviewed by: discussed on net@ MFC after: 3 days
|
226113 |
07-Oct-2011 |
andre |
Prevent TCP sessions from stalling indefinitely in reassembly when reaching the zone limit of reassembly queue entries.
When the zone limit was reached not even the missing segment that would complete the sequence space could be processed preventing the TCP session forever from making any further progress.
Solve this deadlock by using a temporary on-stack queue entry for the missing segment followed by an immediate dequeue again by delivering the contiguous sequence space to the socket.
Add logging under net.inet.tcp.log_debug for reassembly queue issues.
Reviewed by: lsteward (previous version) Tested by: Steven Hartland <killing-at-multiplay.co.uk> MFC after: 3 days
|
226105 |
07-Oct-2011 |
andre |
Add back the IP header length to the total packet length field on raw IP sockets. It was deducted in ip_input() in preparation for protocols interested only in the payload.
On raw sockets the IP header should be delivered as it came in from the network, except for the byte order swaps in some fields.
This brings us in line with all other OS'es that provide raw IP sockets.
Reported by: Matthew Cini Sarreo <mcins1-at-gmail.com> MFC after: 3 days
|
226060 |
06-Oct-2011 |
attilio |
For the INP_TIMEWAIT case, there is no valid tcpcb object tied to the inpcb object. Skip the TCP_SIGNATURE check in that case as it is consistent with the output path (no TCP_SIGNATURE for outgoing packets in TIMEWAIT state) and also because for TIMEWAIT state the verification may be less effective.
Sponsored by: Sandvine Incorporated Reported by: rwatson No objections by: rwatson MFC after: 3 days
|
225947 |
03-Oct-2011 |
qingli |
A system may have multiple physical interfaces, all of which are on the same prefix. Since a single route entry is installed for the prefix (without RADIX_MPATH), incoming packets on the interfaces that are not associated with the prefix route may trigger an error message about being unable to allocate an LLE entry, and fail at L2. This patch makes sure a valid route is present in the system, and allows the aforementioned condition to exist, treating it as valid.
Reviewed by: bz MFC after: 5 days
|
225946 |
03-Oct-2011 |
qingli |
This patch allows ARP to work properly in the presence of self-referencing routes. This patch is a rework of r223862.
Reviewed by: bz, zec MFC after: 5 days
|
225793 |
27-Sep-2011 |
bz |
Unbreak no-ip and no-inet6 module builds with ipfw. For now continue to build the ip_fw_pfil.c hooks and ipfw even in case of no-ip under the assumption that the private L2 hook (which hopefully eventually will be a pfil hook as well) can still be useful.
Allow building the module without inet as well.
Glanced at by: jhb MFC after: 3 days
|
225676 |
19-Sep-2011 |
tuexen |
Cleanup the iterator code, remove code that is never executed.
Approved by: re MFC after: 1 month.
|
225635 |
17-Sep-2011 |
tuexen |
Fix the enabling/disabling of Heartbeats and path MTU discovery when using the SCTP_PEER_ADDR_PARAMS socket option. Approved by: re MFC after: 1 month.
|
225584 |
15-Sep-2011 |
tuexen |
Fix a typo introduced in http://svn.freebsd.org/changeset/base/225571 Reported by Ilya A. Arkhipov.
Approved by: re MFC after: 1 month.
|
225571 |
15-Sep-2011 |
tuexen |
Make sure that SCTP rejects broadcast, multicast and wildcard addresses as remote addresses.
Approved by: re MFC after: 1 month.
|
225559 |
14-Sep-2011 |
tuexen |
Ensure that 1-to-1 style SCTP sockets can only be connected once. Allow implicit setup also for 1-to-1 style sockets as described in the latest version of the socket API ID.
Approved by: re MFC after: 1 month
|
225549 |
14-Sep-2011 |
tuexen |
Fix the handling of the flowlabel and DSCP value in the SCTP_PEER_ADDR_PARAMS socket option. Honor the net.inet6.ip6.auto_flowlabel sysctl setting.
Approved by: re (bz) MFC after: 1 month.
|
225518 |
12-Sep-2011 |
jhb |
Allow the ipfw.ko module built with a kernel to honor any IPFIREWALL_* options defined in the kernel config. This more closely matches the behavior of other modules which inherit configuration settings from the kernel configuration during a kernel + modules build.
Reviewed by: luigi Approved by: re (kib) MFC after: 1 week
|
225462 |
09-Sep-2011 |
tuexen |
Improve implementation of the Nagle algorithm for SCTP: Don't delay the final fragment of a fragmented user message.
Approved by: re MFC after: 4 weeks
|
225223 |
28-Aug-2011 |
qingli |
When an interface address route is removed from the system, another route with the same prefix is searched for as a replacement. The current code did not bypass routes that have non-operational interfaces. This patch fixes that bug and will find a replacement route with an active interface.
PR: kern/159603 Submitted by: pluknet, ambrisko at ambrisko dot com Reviewed by: discussed on net@ Approved by: re (bz) MFC after: 3 days
|
225169 |
25-Aug-2011 |
bz |
Increase the defaults for the maximum socket buffer limit, and the maximum TCP send and receive buffer limits from 256kB to 2MB.
For sb_max_adj we need to add the cast as already used in the sysctl handler to not overflow the type doing the maths.
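The overflow that the cast avoids can be reproduced in isolation. The sketch below assumes a 2 MB limit, a 2048-byte mbuf cluster and a 256-byte mbuf; these sizes and the `sketch_` names are assumptions for illustration and the real computation lives in the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sizes: 2 MB buffer limit, 2 KB mbuf cluster, 256-byte mbuf. */
#define SKETCH_SB_MAX   (2u * 1024u * 1024u)
#define SKETCH_MCLBYTES 2048u
#define SKETCH_MSIZE    256u

/* Multiplying in 32 bits wraps: 2^21 * 2^11 == 2^32, which truncates to 0. */
static uint32_t
sketch_sb_max_adj_32bit(uint32_t sb_max)
{
	return (sb_max * SKETCH_MCLBYTES / (SKETCH_MSIZE + SKETCH_MCLBYTES));
}

/* Widening before the multiply keeps the intermediate product intact. */
static uint64_t
sketch_sb_max_adj_widened(uint32_t sb_max)
{
	return ((uint64_t)sb_max * SKETCH_MCLBYTES /
	    (SKETCH_MSIZE + SKETCH_MCLBYTES));
}
```

With the old 256 kB limit both variants agree, which is presumably why the problem only shows up once the default is raised to 2 MB on platforms with a 32-bit intermediate type.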
Note that this is just the defaults. They will allow more memory to be consumed per socket/connection if needed but not change the default "idle" memory consumption. All values are still tunable by sysctls.
Suggested by: gnn Discussed on: arch (Mar and Aug 2011) MFC after: 3 weeks Approved by: re (kib)
|
225046 |
20-Aug-2011 |
bz |
Fix compilation in case of defined(INET) && defined(IPFIREWALL_FORWARD) but no INET6.
Reported by: avg Tested by: avg MFC after: 4 weeks X-MFC with: r225044 Approved by: re (kib)
|
225044 |
20-Aug-2011 |
bz |
Add support for IPv6 to ipfw fwd: Distinguish IPv4 and IPv6 addresses and optional port numbers in user space to set the option for the correct protocol family. Add support in the kernel for carrying the new IPv6 destination address and port. Add support to TCP and UDP for IPv6 and fix UDP IPv4 to not change the address in the IP header. Add support for IPv6 forwarding to a non-local destination. Add a regression test utilizing VIMAGE to check all 20 possible combinations I could think of.
Obtained from: David Dolson at Sandvine Incorporated (original version for ipfw fwd IPv6 support) Sponsored by: Sandvine Incorporated PR: bin/117214 MFC after: 4 weeks Approved by: re (kib)
|
225036 |
20-Aug-2011 |
bz |
Hide IPv6 next header parsing warnings under the verbose sysctl so people can disable them when their consoles are flooded, or enable them for debugging.
MFC after: 2 weeks Approved by: re (kib)
|
225034 |
20-Aug-2011 |
bz |
After r225032, fix logging in a similar way by masking the IPv6 more fragments flag off so that offset == 0 checks work properly.
PR: kern/145733 Submitted by: Matthew Luckie (mjl luckie.org.nz) MFC after: 2 weeks X-MFC with: r225032 Approved by: re (kib)
|
225033 |
20-Aug-2011 |
bz |
If we detect an IPv6 fragment header and it is not the first fragment, then terminate the loop as we will not find any further headers and for short fragments this could otherwise lead to a pullup error discarding the fragment.
PR: kern/145733 Submitted by: Matthew Luckie (mjl luckie.org.nz) MFC after: 2 weeks Approved by: re (kib)
|
225032 |
20-Aug-2011 |
bz |
ipfw internally checks for offset == 0 to determine whether the packet is a/the first fragment or not. For IPv6 we have added the "more fragments" flag as well to be able to determine whether there will be more, as we do not have the fragment header available for logging, while for IPv4 this information can be derived directly from the IPv4 header. This allowed fragmented packets to bypass normal rules as proper masking was not done when checking offset. Split variables to not need masking for IPv6 to avoid further errors.
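The idea of splitting the two pieces of information can be sketched as below. The field is shown in host byte order, and the mask values and `sketch_` names are assumptions for illustration rather than the ipfw code itself.

```c
#include <assert.h>
#include <stdint.h>

/* IPv6 fragment header "offset + flags" field, shown in host byte order. */
#define SKETCH_IP6F_OFF_MASK  0xfff8u	/* upper 13 bits: fragment offset */
#define SKETCH_IP6F_MORE_FRAG 0x0001u	/* lowest bit: more fragments follow */

struct sketch_frag_info {
	uint16_t offset;	/* offset only, so "offset == 0" means first fragment */
	int      more_frags;	/* kept separately, so no masking is needed later */
};

static struct sketch_frag_info
sketch_parse_offlg(uint16_t offlg)
{
	struct sketch_frag_info fi;

	fi.offset = (uint16_t)(offlg & SKETCH_IP6F_OFF_MASK);
	fi.more_frags = (offlg & SKETCH_IP6F_MORE_FRAG) != 0;
	return (fi);
}
```

A first fragment that has the flag set yields a raw field of 0x0001, which is non-zero; only the separated offset compares equal to 0.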
PR: kern/145733 Submitted by: Matthew Luckie (mjl luckie.org.nz) MFC after: 2 weeks Approved by: re (kib)
|
225030 |
20-Aug-2011 |
bz |
While not explicitly allowed by RFC 2460, in case there is no translation technology involved (and that section is suggested to be removed by Errata 2843), single packet fragments do not harm.
There is another errata under discussion to clarify and allow this. Meanwhile add a sysctl to allow disabling this behaviour again. We will treat single packet fragment (a fragment header added when not needed) as if there was no fragment header.
PR: kern/145733 Submitted by: Matthew Luckie (mjl luckie.org.nz) (original version) Tested by: Matthew Luckie (mjl luckie.org.nz) MFC after: 2 weeks Approved by: re (kib)
|
224918 |
16-Aug-2011 |
tuexen |
Fix the handling of [gs]etsockopt() unconnected 1-to-1 style sockets. While there:
* Fix a locking issue in setsockopt() of SCTP_CMT_ON_OFF.
* Fix a bug in setsockopt() of SCTP_DEFAULT_PRINFO, where the pr_value was ignored.
Approved by: re@ MFC after: 2 months.
|
224870 |
14-Aug-2011 |
tuexen |
Add support for the spp_dscp field in the SCTP_PEER_ADDR_PARAMS socket option. Backwards compatibility is provided by still supporting the spp_ipv4_tos field.
Approved by: re@ MFC after: 2 months.
|
224747 |
10-Aug-2011 |
kevlo |
If RTF_HOST flag is specified, then we are interested in destination address.
PR: kern/159600 Submitted by: Svatopluk Kraus <onwahe at gmail dot com> Approved by: re (hrs)
|
224641 |
03-Aug-2011 |
tuexen |
The result of a joint work between rrs@ and myself at the IETF:
* Decouple the path supervision using a separate HB timer per path.
* Add support for potentially failed state.
* Bring back RTO.min to 1 second.
* Accept packets on IP-addresses already announced via an ASCONF
* While there: do some cleanups.
Approved by: re@ MFC after: 2 months.
|
224575 |
01-Aug-2011 |
glebius |
Add missing break; in r223593.
Submitted by: sem Pointy hat to: glebius Approved by: re (kib)
|
224151 |
17-Jul-2011 |
bz |
Add spares to the network stack for FreeBSD-9:
- TCP keep* timers
- TCP UTO (adjust from what was there already)
- netmap
- route caching
- user cookie (temporary to allow for the real fix)
Slightly re-shuffle struct ifnet moving fields out of the middle of spares and to better align.
Discussed with: rwatson (slightly earlier version)
|
224010 |
14-Jul-2011 |
bz |
Unbreak no-INET kernels after r223839 adding the needed #ifdef INET.
MFC after: 4 weeks
|
223965 |
12-Jul-2011 |
tuexen |
Don't check for SOCK_DGRAM anymore. Also remove multicast related code which is not necessary anymore.
|
223963 |
12-Jul-2011 |
tuexen |
The socket API only specifies SCTP for SOCK_SEQPACKET and SOCK_STREAM, but not SOCK_DGRAM. So don't register it for SOCK_DGRAM. While there, fix some indentation.
|
223862 |
08-Jul-2011 |
zec |
Permit ARP to proceed for IPv4 host routes for which the gateway is the same as the host address. This already works fine for INET6 and ND6.
While here, remove two function pointers from struct lltable which are only initialized but never used.
MFC after: 3 days
|
223840 |
07-Jul-2011 |
ae |
Restore the check for log_arp_permanent_modify that was accidentally removed in r186119.
PR: kern/154831 MFC after: 1 week
|
223839 |
07-Jul-2011 |
andre |
Remove the TCP_SORECEIVE_STREAM compile time option. The use of soreceive_stream() for TCP still has to be enabled with the loader tuneable net.inet.tcp.soreceive_stream.
Suggested by: trociny and others
|
223799 |
05-Jul-2011 |
cperciva |
Remove #ifdef notyet code dating back to 4.3BSD Net/2 (and possibly earlier).
I think the benefit of making the code cleaner and easier to understand outweighs the humour of leaving this intact (or possibly changing it to #ifdef not_yet_and_probably_never).
MFC after: 2 weeks
|
223797 |
05-Jul-2011 |
cperciva |
Don't allow lro->len to exceed 65535, as this will result in overflow when len is inserted back into the synthetic IP packet and cause a multiple of 2^16 bytes of TCP "packet loss".
This improves Linux->FreeBSD netperf bandwidth by a factor of 300 in testing on Amazon EC2.
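The overflow described above comes from storing a sum in a 16-bit length field. A minimal sketch of the guard and of the truncation it prevents follows; the helper names are hypothetical and the real check operates on lro->len.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_IP_MAXPACKET 65535u	/* largest value a 16-bit IP length can hold */

/* Returns 1 if another segment may be merged without exceeding the limit. */
static int
sketch_lro_can_append(uint32_t cur_len, uint32_t seg_len)
{
	return (cur_len + seg_len <= SKETCH_IP_MAXPACKET);
}

/* What a 16-bit length field would hold if the sum were stored unchecked. */
static uint16_t
sketch_truncated_len(uint32_t cur_len, uint32_t seg_len)
{
	return ((uint16_t)(cur_len + seg_len));
}
```

Merging a 1448-byte segment into 65000 accumulated bytes would leave 912 in the length field, so 65536 bytes would silently disappear from the packet.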
Reviewed by: jfv MFC after: 2 weeks
|
223773 |
04-Jul-2011 |
gjb |
- General grammar and mdoc(7) fixes. [1] [2]
- While here, remove a paragraph about userspace operation that has been outdated for some time. [2]
PR: 158623 Submitted by: Ben Kudak (kaduk % mit!edu) [1] Reviewed by: glebius [2] MFC after: 1 week
|
223765 |
04-Jul-2011 |
eri |
pf(4) tags now store the state key but tcp_respond tries to reuse a mbuf as an optimization. This makes pf find the wrong state and cause errors reported with state mismatches. Clear the cached state link on the pf(4) tag to avoid the state mismatches.
Approved by: bz
|
223753 |
04-Jul-2011 |
ae |
ARP code reuses mbuf from ARP request to make a reply, but it does not reset rcvif to NULL. Since rcvif is not NULL, ipfw(4) supposes that ARP replies were received on specified interface. Reset rcvif to NULL for ARP replies to fix this issue.
PR: kern/131817 Reviewed by: glebius MFC after: 1 month
|
223697 |
30-Jun-2011 |
tuexen |
Add the missing sca_keylength field to the sctp_authkey structure, which is used by the SCTP_AUTH_KEY socket option.
MFC after: 1 month.
|
223666 |
29-Jun-2011 |
ae |
Add new rule actions "call" and "return" to ipfw. They make it possible to organize subroutines with rules.
The "call" action saves the current rule number in the internal stack and rules processing continues from the first rule with specified number (similar to skipto action). If later a rule with "return" action is encountered, the processing returns to the first rule with number of "call" rule saved in the stack plus one or higher.
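The control flow described above can be modelled with a small rule walker. This is a toy simulation under stated assumptions: the rule set is an array sorted by rule number, the stack depth is arbitrary, and the behaviour on an empty or full stack is a choice made for this sketch, not a statement about ipfw.

```c
#include <assert.h>
#include <stddef.h>

enum sketch_action { SK_COUNT, SK_CALL, SK_RETURN, SK_ACCEPT };

struct sketch_rule {
	int                number;	/* rule number; the array is sorted by it */
	enum sketch_action action;
	int                arg;		/* target rule number for SK_CALL */
};

#define SKETCH_STACK_MAX 16

/* Index of the first rule whose number is >= num, or n if there is none. */
static size_t
sketch_first_at_or_after(const struct sketch_rule *r, size_t n, int num)
{
	size_t i;

	for (i = 0; i < n && r[i].number < num; i++)
		;
	return (i);
}

/* Walk the rules, recording visited rule numbers; returns how many were visited. */
static size_t
sketch_run(const struct sketch_rule *r, size_t n, int *trace, size_t tracemax)
{
	int stack[SKETCH_STACK_MAX];
	size_t sp = 0, i = 0, visited = 0;

	while (i < n && visited < tracemax) {
		trace[visited++] = r[i].number;
		switch (r[i].action) {
		case SK_CALL:
			if (sp == SKETCH_STACK_MAX)
				return (visited);	/* stack full: stop (sketch choice) */
			stack[sp++] = r[i].number;	/* remember the calling rule */
			i = sketch_first_at_or_after(r, n, r[i].arg);
			break;
		case SK_RETURN:
			if (sp == 0) {
				i++;	/* nothing saved: continue (sketch choice) */
				break;
			}
			/* Resume at the first rule numbered above the saved call rule. */
			i = sketch_first_at_or_after(r, n, stack[--sp] + 1);
			break;
		case SK_ACCEPT:
			return (visited);
		default:
			i++;
			break;
		}
	}
	return (visited);
}
```

With rules 100 (call 1000), 200 (accept), 1000 (count) and 1010 (return), the walker visits 100, 1000, 1010 and then 200.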
Submitted by: Vadim Goncharov Discussed by: ipfw@, luigi@
|
223637 |
28-Jun-2011 |
bz |
Update packet filter (pf) code to OpenBSD 4.5.
You need to update userland (world and ports) tools to be in sync with the kernel.
Submitted by: mlaier Submitted by: eri
|
223613 |
27-Jun-2011 |
tuexen |
Add support for SCTP_PR_SCTP_NONE, which I missed adding. This constant is defined in the socket API ID.
MFC after: 2 months.
|
223593 |
27-Jun-2011 |
glebius |
Add possibility to pass IPv6 packets to a divert(4) socket.
Submitted by: sem
|
223437 |
22-Jun-2011 |
ae |
Export AddLink() function from libalias. It can be used when custom alias address needs to be specified. Add inbound handler to the alias_ftp module. It helps handle active FTP transfer mode for the case with external clients and FTP server behind NAT. Fix passive FTP transfer case for server behind NAT using redirect with external IP address different from NAT ip address.
PR: kern/157957 Submitted by: Alexander V. Chernikov
|
223421 |
22-Jun-2011 |
ae |
Document PKT_ALIAS_SKIP_GLOBAL option.
Submitted by: Alexander V. Chernikov
|
223358 |
21-Jun-2011 |
ae |
Do not use SET_HOST_IPLEN() macro for IPv6 packets.
PR: kern/157239 MFC after: 2 weeks
|
223326 |
20-Jun-2011 |
bz |
Fix a KASSERT from r212803 to check the correct length also in case of IPsec being compiled in and used. Improve reporting by adding the length fields to the panic message, so that we would have some immediate debugging hints.
Discussed with: jhb
|
223261 |
18-Jun-2011 |
bz |
Remove a comment left over from before new-arp that is no longer correct.
MFC after: 1 week
|
223162 |
16-Jun-2011 |
tuexen |
Add SCTP_DEFAULT_PRINFO socket option. Fix the SCTP_DEFAULT_SNDINFO socket option: Don't clear the PR SCTP policy when setting sinfo_flags.
MFC after: 1 month.
|
223152 |
16-Jun-2011 |
tuexen |
* Fix the handling of addresses in sctp_sendv().
* Add support for SCTP_SENDV_NOINFO.
* Improve the error handling of sctp_sendv() and sctp_recv().
MFC after: 1 month
|
223132 |
15-Jun-2011 |
tuexen |
Add support for the newly added SCTP API. In particular add support for:
* SCTP_SNDINFO, SCTP_PRINFO, SCTP_AUTHINFO, SCTP_DSTADDRV4, and SCTP_DSTADDRV6 cmsgs.
* SCTP_NXTINFO and SCTP_RCVINFO cmsgs.
* SCTP_EVENT, SCTP_RECVRCVINFO, SCTP_RECVNXTINFO and SCTP_DEFAULT_SNDINFO socket options.
* Special association ids (SCTP_FUTURE_ASSOC, ...)
* sctp_recvv() and sctp_sendv() functions.
MFC after: 1 month.
|
223080 |
14-Jun-2011 |
ae |
Implement "global" mode for ipfw nat. It is similar to natd(8) "globalport" option for multiple NAT instances.
If ipfw rule contains "global" keyword instead of nat_number, then for each outgoing packet ipfw_nat looks up translation state in all configured nat instances. If an entry is found, packet aliased according to that entry, otherwise packet is passed unchanged.
User can specify "skip_global" option in NAT configuration to exclude an instance from the lookup in global mode.
PR: kern/157867 Submitted by: Alexander V. Chernikov (previous version) Tested by: Eugene Grosbein
|
223077 |
14-Jun-2011 |
ae |
Sort alias mode flags in the increasing order.
|
223073 |
14-Jun-2011 |
ae |
Add IPv6 support to the ipfw uid/gid check. Pass an ip_fw_args structure to the check_uidgid() function, since it contains all needed arguments and also a pointer to the mbuf, which makes it possible to use the in_pcblookup_mbuf() function.
Since I cannot test it for the non-FreeBSD case, I keep this ifdef unchanged.
Tested by: Alexander V. Chernikov MFC after: 3 weeks
|
223049 |
13-Jun-2011 |
jhb |
Advance the advertised window (rcv_adv) to the currently received data (rcv_nxt) if we are advertising a zero window. This can be true when ACK'ing a window probe whose one byte payload was accepted rather than dropped because the socket's receive buffer was not completely full, but the remaining space was smaller than the window scale.
This ensures that window probe ACKs satisfy the assumption made in r221346 and closes a window where rcv_nxt could be greater than rcv_adv.
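The invariant being restored, that rcv_adv never trails rcv_nxt, can be sketched with wraparound-safe sequence comparison. The structure and function names below are hypothetical, and the condition is a simplified reading of the commit message rather than the actual tcp_output() code.

```c
#include <assert.h>
#include <stdint.h>

/* Wraparound-safe "a is after b" for 32-bit TCP sequence numbers. */
static int
sketch_seq_gt(uint32_t a, uint32_t b)
{
	return ((int32_t)(a - b) > 0);
}

struct sketch_tcb {
	uint32_t rcv_nxt;	/* next sequence number expected */
	uint32_t rcv_adv;	/* right edge of the window last advertised */
};

/* When a zero window is advertised, do not leave rcv_adv behind rcv_nxt. */
static void
sketch_update_rcv_adv(struct sketch_tcb *tp, uint32_t recwin)
{
	if (recwin == 0 && sketch_seq_gt(tp->rcv_nxt, tp->rcv_adv))
		tp->rcv_adv = tp->rcv_nxt;
}
```

The signed subtraction keeps the comparison correct when the sequence space wraps past 2**32.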
Tested by: trasz, pho, trociny Reviewed by: silby MFC after: 1 week
|
222845 |
08-Jun-2011 |
bz |
Correct comments and debug logging in ipsec to better match reality.
MFC after: 3 days
|
222809 |
07-Jun-2011 |
ae |
Fix indentation.
|
222806 |
07-Jun-2011 |
ae |
Make the behaviour of the libalias based in-kernel NAT a bit closer to how natd(8) works. natd(8) drops packets only when libalias returns PKT_ALIAS_IGNORED and the "deny_incoming" option is set, but ipfw_nat always dropped packets that were not aliased, even if they should not be aliased and are just passing through.
PR: kern/122109, kern/129093, kern/157379 Submitted by: Alexander V. Chernikov (previous version) MFC after: 1 month
|
222787 |
06-Jun-2011 |
bz |
Unbreak kernels with non-default PCBGROUP included but no WITNESS. Rather than including lock.h in in_pcbgroup.c in right order, fix it for all consumers of in_pcb.h by further header file pollution under #ifdef KERNEL.
Reported by: Pan Tsu (inyaoo gmail.com)
|
222748 |
06-Jun-2011 |
rwatson |
Implement a CPU-affine TCP and UDP connection lookup data structure, struct inpcbgroup. pcbgroups, or "connection groups", supplement the existing inpcbinfo connection hash table, which when pcbgroups are enabled, might now be thought of more usefully as a per-protocol 4-tuple reservation table.
Connections are assigned to connection groups based on a hash of their 4-tuple; wildcard sockets require special handling, and are members of all connection groups. During a connection lookup, a per-connection group lock is employed rather than the global pcbinfo lock. By aligning connection groups with input path processing, connection groups take on an effective CPU affinity, especially when aligned with RSS work placement (see a forthcoming commit for details). This eliminates cache line migration associated with global, protocol-layer data structures in steady state TCP and UDP processing (with the exception of protocol-layer statistics; further commit to follow).
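Group assignment by 4-tuple hash can be sketched as below. The mixing function is a placeholder chosen for brevity, not the hash used by the kernel or by any NIC, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct sketch_4tuple {
	uint32_t laddr, faddr;	/* local and foreign IPv4 addresses */
	uint16_t lport, fport;	/* local and foreign ports */
};

/* Placeholder word-at-a-time mixing; a real setup would match the NIC hash. */
static uint32_t
sketch_hash_4tuple(const struct sketch_4tuple *t)
{
	uint32_t words[3];
	uint32_t h = 2166136261u;
	int i;

	words[0] = t->laddr;
	words[1] = t->faddr;
	words[2] = ((uint32_t)t->lport << 16) | t->fport;
	for (i = 0; i < 3; i++) {
		h ^= words[i];
		h *= 16777619u;
	}
	return (h);
}

/* Map a connection to one of ngroups connection groups. */
static unsigned int
sketch_pcbgroup_index(const struct sketch_4tuple *t, unsigned int ngroups)
{
	assert(ngroups > 0);
	return (sketch_hash_4tuple(t) % ngroups);
}
```

Because the index depends only on the 4-tuple, every packet of a connection maps to the same group, which is what lets a per-group lock replace the global one during lookup.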
Elements of this approach were inspired by Willman, Rixner, and Cox's 2006 USENIX paper, "An Evaluation of Network Stack Parallelization Strategies in Modern Operating Systems". However, there are also significant differences: we maintain the inpcb lock, rather than using the connection group lock for per-connection state.
Likewise, the focus of this implementation is alignment with NIC packet distribution strategies such as RSS, rather than pure software strategies. Despite that focus, software distribution is supported through the parallel netisr implementation, and works well in configurations where the number of hardware threads is greater than the number of NIC input queues, such as in the RMI XLR threaded MIPS architecture.
Another important difference is the continued maintenance of existing hash tables as "reservation tables" -- these are useful both to distinguish the resource allocation aspect of protocol name management and the more common-case lookup aspect. In configurations where connection tables are aligned with hardware hashes, it is desirable to use the traditional lookup tables for loopback or encapsulated traffic rather than take the expense of hardware hashes that are hard to implement efficiently in software (such as RSS Toeplitz).
Connection group support is enabled by compiling "options PCBGROUP" into your kernel configuration; for the time being, this is an experimental feature, and hence is not enabled by default.
Subject to the limited MFCability of change dependencies in inpcb, and its change to the inpcbinfo init function signature, this change in principle could be merged to FreeBSD 8.x.
Reviewed by: bz Sponsored by: Juniper Networks, Inc.
|
222742 |
06-Jun-2011 |
ae |
Do not return EINVAL when user does `ipfw set N flush` on an empty set.
MFC after: 2 weeks
|
222732 |
06-Jun-2011 |
hrs |
- Implement RDNSS and DNSSL options (RFC 6106, IPv6 Router Advertisement Options for DNS Configuration) into rtadvd(8) and rtsold(8). DNS information received by rtsold(8) will go to resolv.conf(5) by resolvconf(8) script. This is based on work by J.R. Oldroyd (kern/156259) but revised extensively[1].
- rtadvd(8) now supports "noifprefix" to disable gathering on-link prefixes from interfaces when no "addr" is specified[2]. An entry in rtadvd.conf with "noifprefix" + no "addr" generates an RA message with no prefix information option.
- rtadvd(8) now supports RTM_IFANNOUNCE message to fix crashes when an interface is added or removed.
- Correct bogus ND_OPT_ROUTE_INFO value to one in RFC 4191.
Reviewed by: bz[1] PR: kern/156259 [1] PR: bin/152458 [2]
|
222691 |
04-Jun-2011 |
rwatson |
Add _mbuf() variants of various inpcb-related interfaces, including lookup, hash install, etc. For now, these arguments are unused, but as we add RSS support, we will want to use hashes extracted from mbufs, rather than manually calculated hashes of header fields, due to the expense of the software version of Toeplitz (and similar hashes).
Add notes that it would be nice to be able to pass mbufs into lookup routines in pf(4), optimising firewall lookup in the same way, but the code structure there doesn't facilitate that currently.
(In principle there is no reason this couldn't be MFCed -- the change extends rather than modifies the KBI. However, it won't be useful without other previous possibly less MFCable changes.)
Reviewed by: bz Sponsored by: Juniper Networks, Inc.
|
222690 |
04-Jun-2011 |
rwatson |
IP divert sockets use their inpcbinfo for port reservation, although not for lookup. I missed its call to in_pcbbind() when preparing previous patches, which would lead to a lock assertion failure (although probably not an actual race condition, due to global pcbinfo locks providing the required synchronisation -- in this particular case only). This change adds the missing locking of the pcbhash lock.
(Existing comments in the ipdivert code question the need for using the global hash to manage the namespace, as really it's a simple port namespace and not an address/port namespace. Also, although in_pcbbind is used to manage reservations, the hash tables aren't used for lookup. It might be a good idea to make them use hashed lookup, or to use a different reservation scheme.)
Reviewed by: bz Reported by: Kristof Provost <kristof at sigsegv.be> Sponsored by: Juniper Networks
|
222602 |
02-Jun-2011 |
rwatson |
Do not leak the pcbinfohash lock in the case where in6_pcbladdr() returns an error during TCP connect(2) on an IPv6 socket.
Submitted by: bz Sponsored by: Juniper Networks, Inc.
|
222582 |
01-Jun-2011 |
ae |
O_FORWARD_IP is the only action which depends on the result of the dynamic rule lookup. We do forwarding in the following cases:
o For the simple ipfw fwd rule, e.g.
fwd 10.0.0.1 ip from any to any out xmit em0
fwd 127.0.0.1,3128 tcp from any to any 80 in recv em1
o For the dynamic fwd rule, e.g.
fwd 192.168.0.1 tcp from any to 10.0.0.3 3333 setup keep-state
When this rule triggers it creates a dynamic rule, but this dynamic rule should forward packets only in the forward direction.
o And the last case, which did not work before: a simple fwd rule which triggers when some dynamic rule has already been executed.
PR: kern/147720, kern/150798 MFC after: 1 month
|
222560 |
01-Jun-2011 |
ae |
Hide some debug messages under debug macro.
MFC after: 1 week
|
222559 |
01-Jun-2011 |
ae |
Hide useless warning under debug macro.
PR: kern/69963 MFC after: 1 week
|
222503 |
30-May-2011 |
bz |
Unbreak NOINET kernels after r222488.
Reviewed by: rwatson Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems! Pointy hat: to myself for missing this during review?
|
222488 |
30-May-2011 |
rwatson |
Decompose the current single inpcbinfo lock into two locks:
- The existing ipi_lock continues to protect the global inpcb list and inpcb counter. This lock is now relegated to a small number of allocation and free operations, and occasional operations that walk all connections (including, awkwardly, certain UDP multicast receive operations -- something to revisit).
- A new ipi_hash_lock protects the two inpcbinfo hash tables for looking up connections and bound sockets, manipulated using new INP_HASH_*() macros. This lock, combined with inpcb locks, protects the 4-tuple address space.
Unlike the current ipi_lock, ipi_hash_lock follows the individual inpcb connection locks, so may be acquired while manipulating a connection on which a lock is already held, avoiding the need to acquire the inpcbinfo lock preemptively when a binding change might later be required. As a result, however, lookup operations necessarily go through a reference acquire while holding the lookup lock, later acquiring an inpcb lock -- if required.
A new function in_pcblookup() looks up connections, and accepts flags indicating how to return the inpcb. Due to lock order changes, callers no longer need acquire locks before performing a lookup: the lookup routine will acquire the ipi_hash_lock as needed. In the future, it will also be able to use alternative lookup and locking strategies transparently to callers, such as pcbgroup lookup. New lookup flags are, supplementing the existing INPLOOKUP_WILDCARD flag:
INPLOOKUP_RLOCKPCB - Acquire a read lock on the returned inpcb
INPLOOKUP_WLOCKPCB - Acquire a write lock on the returned inpcb
Callers must pass exactly one of these flags (for the time being).
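The "exactly one" rule can be expressed as a small validity check. Only the flag names come from the commit message; the bit values and the helper below are hypothetical and serve purely as an illustration.

```c
#include <assert.h>

/* Hypothetical bit values; only the flag names come from the commit message. */
#define SKETCH_INPLOOKUP_WILDCARD 0x1
#define SKETCH_INPLOOKUP_RLOCKPCB 0x2
#define SKETCH_INPLOOKUP_WLOCKPCB 0x4

/* Returns 1 if exactly one of the two lock flags is set, 0 otherwise. */
static int
sketch_lookupflags_valid(int flags)
{
	int lock = flags &
	    (SKETCH_INPLOOKUP_RLOCKPCB | SKETCH_INPLOOKUP_WLOCKPCB);

	return (lock == SKETCH_INPLOOKUP_RLOCKPCB ||
	    lock == SKETCH_INPLOOKUP_WLOCKPCB);
}
```

The wildcard flag is independent of the lock flags, so it may accompany either one without affecting validity.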
Some notes:
- All protocols are updated to work within the new regime; especially, TCP, UDPv4, and UDPv6. pcbinfo ipi_lock acquisitions are largely eliminated, and global hash lock hold times are dramatically reduced compared to previous locking.
- The TCP syncache still relies on the pcbinfo lock, something that we may want to revisit.
- Support for reverting to the FreeBSD 7.x locking strategy in TCP input is no longer available -- hash lookup locks are now held only very briefly during inpcb lookup, rather than for potentially extended periods. However, the pcbinfo ipi_lock will still be acquired if a connection state might change such that a connection is added or removed.
- Raw IP sockets continue to use the pcbinfo ipi_lock for protection, due to maintaining their own hash tables.
- The interface in6_pcblookup_hash_locked() is maintained, which allows callers to acquire hash locks and perform one or more lookups atomically with 4-tuple allocation: this is required only for TCPv6, as there is no in6_pcbconnect_setup(), which there should be.
- UDPv6 locking remains significantly more conservative than UDPv4 locking, which relates to source address selection. This needs attention, as it likely significantly reduces parallelism in this code for multithreaded socket use (such as in BIND).
- In the UDPv4 and UDPv6 multicast cases, we need to revisit locking somewhat, as they relied on ipi_lock to stabilise 4-tuple matches, which is no longer sufficient. A second check once the inpcb lock is held should do the trick, keeping the general case from requiring the inpcb lock for every inpcb visited.
- This work reminds us that we need to revisit locking of the v4/v6 flags, which may be accessed lock-free both before and after this change.
- Right now, a single lock name is used for the pcbhash lock -- this is undesirable, and probably another argument is required to take care of this (or a char array name field in the pcbinfo?).
This is not an MFC candidate for 8.x due to its impact on lookup and locking semantics. It's possible some of these issues could be worked around with compatibility wrappers, if necessary.
Reviewed by: bz Sponsored by: Juniper Networks, Inc.
|
222474 |
30-May-2011 |
ae |
Wrap long line.
MFC after: 2 weeks
|
222473 |
30-May-2011 |
ae |
Add tablearg support for ipfw setfib.
PR: kern/156410 MFC after: 2 weeks
|
222459 |
29-May-2011 |
tuexen |
Get rid of unused functions.
MFC after: 1 week.
|
222438 |
29-May-2011 |
qingli |
Supply the LLE_STATIC flag bit to in_ifscrub() when scrubbing an interface address so that proper cleanup will take place in the routing code. This patch fixes the bootp panic on startup problem. Also, added more error handling and logging code in function in_scrubprefix().
MFC after: 5 days
|
222272 |
25-May-2011 |
bz |
Add FEATURE() definitions for IPv4 and IPv6 so that we can use feature_present(3) to dynamically decide whether to use one or the other family.
Reviewed by: gnn Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems MFC after: 10 days
|
222251 |
24-May-2011 |
rwatson |
An inpcb lock is no longer required in in_pcbref() since the move to refcount(9).
MFC after: 3 weeks Sponsored by: Juniper Networks, Inc.
|
222217 |
23-May-2011 |
rwatson |
Continue to refine inpcb reference counting and locking, in preparation for reworking of inpcbinfo locking:
(1) Convert inpcb reference counting from manually manipulated integers to the refcount(9) KPI. This allows the refcount to be managed atomically with an inpcb read lock rather than write lock, or even with no inpcb lock at all. As a result, in_pcbref() also no longer requires an inpcb lock, so can be performed solely using the lock used to look up an inpcb.
(2) Shift more inpcb freeing activity from the in_pcbrele() context (via in_pcbfree_internal) to the explicit in_pcbfree() context. This means that the inpcb refcount is increasingly used only to maintain memory stability, not actually defer the clean up of inpcb protocol parts. This is desirable as many of those protocol parts required the pcbinfo lock, which we'd like not to acquire in in_pcbrele() contexts. Document this in comments better.
(3) Introduce new read-locked and write-locked in_pcbrele() variations, in_pcbrele_rlocked() and in_pcbrele_wlocked(), which allow the inpcb to be properly unlocked as needed. in_pcbrele() is a wrapper around the latter, and should probably go away at some point. This makes it easier to use this weak reference model when holding only a read lock, as will happen in the future.
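The reference-counting idea in (1) can be illustrated outside the kernel. The following is a minimal userland sketch using C11 atomics, not the refcount(9) implementation itself; struct obj, obj_ref() and obj_rele() are hypothetical names chosen for the example. It shows why taking a reference needs no object lock once the counter is atomic, and how the caller that drops the last reference learns that it may free the memory.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical object carrying an atomic reference count. */
struct obj {
	atomic_uint refcount;
};

/* Take a reference; no lock on the object is needed for this step. */
static void
obj_ref(struct obj *o)
{
	atomic_fetch_add_explicit(&o->refcount, 1, memory_order_relaxed);
}

/*
 * Drop a reference. Returns true when the caller released the last
 * reference and is therefore responsible for freeing the object.
 */
static bool
obj_rele(struct obj *o)
{
	unsigned int old;

	old = atomic_fetch_sub_explicit(&o->refcount, 1, memory_order_acq_rel);
	assert(old > 0);	/* releasing an unreferenced object is a bug */
	return (old == 1);
}
```

In this model the count only keeps the memory stable; any protocol-level teardown would happen elsewhere, which mirrors the split described in (2).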
This may well be safe to MFC, but some more KBI analysis is required.
Reviewed by: bz MFC after: 3 weeks Sponsored by: Juniper Networks, Inc.
|
222215 |
23-May-2011 |
rwatson |
Move from passing a wildcard boolean to a general set of lookup flags into in_pcb_lport(), in_pcblookup_local(), and in_pcblookup_hash(), and similarly for IPv6 functions. In the future, we would like to support other flags relating to locking strategy.
This change doesn't appear to modify the KBI in practice, as callers already passed in INPLOOKUP_WILDCARD rather than a simple boolean.
MFC after: 3 weeks Reviewed by: bz Sponsored by: Juniper Networks, Inc.
|
222213 |
23-May-2011 |
rwatson |
A number of quite incremental refinements to struct inpcbinfo's definition:
(1) Add a locking guide for inpcbinfo.
(2) Annotate inpcbinfo fields with synchronisation information; not all annotations are 100% satisfactory.
(3) Reorder inpcbinfo fields so that the lock is at the head of the structure, and close to fields it protects.
(4) Sort fields that will eventually be hashlock/pcbgroup-related together even though they remain locked by ipi_lock for now.
Reviewed by: bz Sponsored by: Juniper Networks X-MFC after: KBI analysis required
|
222143 |
20-May-2011 |
qingli |
The statically configured (permanent) ARP entries are removed when an interface is brought down, even though the interface address is still valid. This patch maintains the permanent ARP entries as long as the interface address (having the same prefix as that of the ARP entries) is valid.
Reviewed by: delphij MFC after: 5 days
|
222077 |
18-May-2011 |
tuexen |
Unbreak INET-less build. Reported by bz@ MFC after: 1 week
|
222029 |
17-May-2011 |
tuexen |
Copy out the mtu when calling getsockopt() with SCTP_GET_PEER_ADDR_INFO.
MFC after: 1 week.
|
222028 |
17-May-2011 |
tuexen |
Fix whitespacing. Reported by scf@
MFC after: 1 week.
|
221904 |
14-May-2011 |
tuexen |
Fix the source address selection for boundall sockets when sending INITs to a global IPv4 address having only private IPv4 address. Allow the usage of a private address and make sure that no other private address will be used by the association. Initial work was done by rrs@.
MFC after: 1 week.
|
221891 |
14-May-2011 |
jhb |
Oops, fix order of sequence numbers in KASSERT()'s to catch negative receive windows to match the labels in the panic message.
Submitted by: trociny
|
221690 |
09-May-2011 |
mav |
Refactor TCP ISN increment logic. Instead of firing callout at 100Hz to keep constant ISN growth rate, do the same directly inside tcp_new_isn(), taking into account how much time (ticks) passed since the last call.
On my test systems this decreases idle interrupt rate from 140Hz to 70Hz.
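A minimal sketch of the idea, assuming a hypothetical growth rate and tick frequency (ISN_BYTES_PER_SECOND, HZ_ASSUMED and struct isn_state are illustrative names, not the kernel's): instead of a periodic timer bumping the offset, the offset is advanced on demand in proportion to the ticks elapsed since the previous call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration only. */
#define ISN_BYTES_PER_SECOND	1048576u
#define HZ_ASSUMED		1000u

struct isn_state {
	uint32_t offset;	/* accumulated ISN offset */
	uint32_t last_ticks;	/* tick counter at the previous call */
};

/*
 * Advance the offset by the amount a periodic timer would have added
 * during the elapsed ticks, then remember the current tick count.
 */
static uint32_t
isn_advance(struct isn_state *st, uint32_t now_ticks)
{
	uint32_t elapsed;

	assert(st != NULL);
	/* Unsigned subtraction stays correct across counter wrap-around. */
	elapsed = now_ticks - st->last_ticks;
	st->offset += (uint32_t)(((uint64_t)elapsed * ISN_BYTES_PER_SECOND) /
	    HZ_ASSUMED);
	st->last_ticks = now_ticks;
	return (st->offset);
}
```

The integer division truncates, so very frequent calls would grow the offset slightly more slowly than the nominal rate; a real implementation may handle the remainder differently.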
|
221627 |
08-May-2011 |
tuexen |
Fix a locking issue showing up on Mac OS X when subscribing to authentication events. DTLS/SCTP renegotiations trigger the bug.
MFC after: 2 weeks.
|
221549 |
06-May-2011 |
tuexen |
Change the name of an internal structure, since the name is used by a structure of the (new) SCTP API.
MFC after: 1 week.
|
221521 |
06-May-2011 |
ae |
Convert delay parameter back to ms when reporting to user.
PR: 156838 MFC after: 1 week
|
221460 |
04-May-2011 |
tuexen |
Implement Resource Pooling V2 and an MPTCP like congestion control. Based on a patch received from Martin Becke.
MFC after: 2 weeks.
|
221411 |
03-May-2011 |
tuexen |
Remove code without any effect.
|
221410 |
03-May-2011 |
tuexen |
Add a missing break. This bug was introduced in r221249.
MFC after: 1 week
|
221346 |
02-May-2011 |
jhb |
Handle a rare edge case with nearly full TCP receive buffers. If a TCP buffer fills up causing the remote sender to enter into persist mode, but there is still room available in the receive buffer when a window probe arrives (either due to window scaling, or due to the local application very slowly draining data from the receive buffer), then the single byte of data in the window probe is accepted. However, this can cause rcv_nxt to be greater than rcv_adv. This condition will only last until the next ACK packet is pushed out via tcp_output(), and since the previous ACK advertised a zero window, the ACK should be pushed out while the TCP pcb is write-locked.
During the window while rcv_nxt is greater than rcv_adv, a few places would compute the remaining receive window via rcv_adv - rcv_nxt. However, this value was then (uint32_t)-1. On a 64 bit machine this could expand to a positive 2^32 - 1 when cast to a long. In particular, when calculating the receive window in tcp_output(), the result would be that the receive window was computed as 2^32 - 1 resulting in advertising a far larger window to the remote peer than actually existed.
Fix various places that compute the remaining receive window to either assert that it is not negative (i.e. rcv_nxt <= rcv_adv), or treat the window as full if rcv_nxt is greater than rcv_adv.
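The arithmetic can be reproduced in a small userland sketch. The helper names below are hypothetical, and the comparison macro is modelled on the usual wrap-safe sequence comparison rather than copied from the kernel headers. The naive helper shows the unsigned difference being widened, the other two show the two remedies described above (treat the window as empty, or assert the invariant).

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t tcp_seq;

/* Wrap-safe comparison: true if sequence number a is at or after b. */
#define SEQ_GEQ(a, b)	((int32_t)((a) - (b)) >= 0)

/* Problematic form: the uint32_t difference is widened to long as-is. */
static long
naive_remaining_window(tcp_seq rcv_adv, tcp_seq rcv_nxt)
{
	return ((long)(rcv_adv - rcv_nxt));
}

/* Remedy 1: treat the window as full (zero space) if rcv_nxt is ahead. */
static long
safe_remaining_window(tcp_seq rcv_adv, tcp_seq rcv_nxt)
{
	if (SEQ_GEQ(rcv_adv, rcv_nxt))
		return ((long)(rcv_adv - rcv_nxt));
	return (0);
}

/* Remedy 2: assert the invariant where it is expected to hold. */
static long
asserted_remaining_window(tcp_seq rcv_adv, tcp_seq rcv_nxt)
{
	assert(SEQ_GEQ(rcv_adv, rcv_nxt));
	return ((long)(rcv_adv - rcv_nxt));
}
```

With rcv_nxt one byte past rcv_adv, the naive helper yields 2^32 - 1 where long is 64 bits wide, while the safe helper yields 0.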
Reviewed by: bz MFC after: 1 month
|
221328 |
02-May-2011 |
tuexen |
Some more cleanups related to a kernel without INET.
MFC after: 1 week
|
221264 |
30-Apr-2011 |
bz |
Fix a mismerge from p4 in that in_localaddr() is not available without INET.
Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems MFC after: 4 days
|
221251 |
30-Apr-2011 |
tuexen |
Remove some leftover debug code.
MFC after: 1 week
|
221250 |
30-Apr-2011 |
bz |
Make the TCP code compile without INET. Sort #includes and add #ifdef INETs. Add some comments at #endifs given more nestedness. To make the compiler happy, some default initializations were added in accordance with the style on the files.
Reviewed by: gnn Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems MFC after: 4 days
|
221249 |
30-Apr-2011 |
tuexen |
Improve compilation of SCTP code without INET support. Some bugs were fixed while doing this:
* ASCONF-ACK messages might use wrong port number when using IPv6.
* Checking for additional addresses takes the correct address into account and also does not do more comparisons than necessary.
This patch is based on one received from bz@ who was sponsored by The FreeBSD Foundation and iXsystems.
MFC after: 1 week
|
221248 |
30-Apr-2011 |
bz |
Make the UDP code compile without INET. Expose udp_usrreq.c to IPv6 only as well, compiling out most functions by adding or extending #ifdef INET coverage.
Reviewed by: gnn Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems MFC after: 4 days
|
221247 |
30-Apr-2011 |
bz |
Make the PCB code compile without INET support by adding #ifdef INETs and correcting a few #includes.
Reviewed by: gnn Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems MFC after: 4 days
|
221209 |
29-Apr-2011 |
jhb |
TCP reuses t_rxtshift to determine the backoff timer used for both the persist state and the retransmit timer. However, the code that implements "bad retransmit recovery" only checks t_rxtshift to see if an ACK has been received during the first retransmit timeout window. As a result, if ticks has wrapped over to a negative value and a socket is in the persist state, it can incorrectly treat an ACK from the remote peer as a "bad retransmit recovery" and restore saved values such as snd_ssthresh and snd_cwnd. However, if the socket has never had a retransmit timeout, then these saved values will be zero, so snd_ssthresh and snd_cwnd will be set to 0.
If the socket is in fast recovery (this can be caused by excessive duplicate ACKs such as those fixed by 220794), then each ACK that arrives triggers either NewReno or SACK partial ACK handling which clamps snd_cwnd to be no larger than snd_ssthresh. In effect, the socket's send window is permanently stuck at 0 even though the remote peer is advertising a much larger window and pending data is only sent via TCP window probes (so one byte every few seconds).
Fix this by adding a new TCP pcb flag (TF_PREVVALID) that indicates that the various snd_*_prev fields in the pcb are valid and only perform "bad retransmit recovery" if this flag is set in the pcb. The flag is set on the first retransmit timeout that occurs and is cleared on subsequent retransmit timeouts or when entering the persist state.
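The role of the flag can be modelled in a few lines. This is a simplified sketch, not the tcp_input()/tcp_timer code: struct conn, the function names, and the bit value chosen for TF_PREVVALID are assumptions, and the timing check on the retransmit window is omitted. It only shows that saved values are restored when, and only when, a first retransmit timeout actually recorded them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TF_PREVVALID	0x01u	/* bit value assumed for this sketch */

struct conn {
	uint32_t flags;
	int	 rxtshift;	/* shared by retransmit and persist backoff */
	uint32_t snd_cwnd, snd_ssthresh;
	uint32_t snd_cwnd_prev, snd_ssthresh_prev;
};

/* Retransmit timeout: save state on the first one, invalidate afterwards. */
static void
retransmit_timeout(struct conn *c)
{
	c->rxtshift++;
	if (c->rxtshift == 1) {
		c->snd_cwnd_prev = c->snd_cwnd;
		c->snd_ssthresh_prev = c->snd_ssthresh;
		c->flags |= TF_PREVVALID;
	} else
		c->flags &= ~TF_PREVVALID;
	c->snd_ssthresh = c->snd_cwnd / 2;	/* simplified reaction */
	c->snd_cwnd = 1;
}

/* Persist backoff bumps the same shift but never saves any state. */
static void
persist_backoff(struct conn *c)
{
	c->rxtshift++;
	c->flags &= ~TF_PREVVALID;
}

/* Returns true if "bad retransmit recovery" restored the saved values. */
static bool
ack_received(struct conn *c)
{
	if (c->rxtshift == 1 && (c->flags & TF_PREVVALID) != 0) {
		c->snd_cwnd = c->snd_cwnd_prev;
		c->snd_ssthresh = c->snd_ssthresh_prev;
		c->flags &= ~TF_PREVVALID;
		c->rxtshift = 0;
		return (true);
	}
	return (false);
}
```

Without the flag test in ack_received(), a connection whose shift reached 1 through persist backoff alone would have its window fields overwritten with the never-initialised (zero) saved values.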
Reviewed by: bz MFC after: 2 weeks
|
221134 |
27-Apr-2011 |
bz |
MfP4 CH=192029:
Expose ip_icmp.c to INET6 as well and only export badport_bandlim() along with the two sysctls in the non-INET case. The bandlim types work for all cases I reviewed in IPv6 as well and the sysctls are available as we export net.inet.* from in_proto.c.
Reviewed by: gnn Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems MFC after: 4 days
|
221131 |
27-Apr-2011 |
bz |
MfP4 CH=192004:
Move ip_defttl to raw_ip.c where it is actually used. In an IPv6 only world we do not want to compile ip_input.c in for that and it is a shared default with INET6.
Reviewed by: gnn Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems MFC after: 4 days
|
221130 |
27-Apr-2011 |
bz |
Make various (pseudo) interfaces compile without INET in the kernel adding appropriate #ifdefs. For module builds the framework needs adjustments for at least carp.
Reviewed by: gnn Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems MFC after: 4 days
|
221023 |
25-Apr-2011 |
attilio |
Add the ability to verify the MD5 hash of incoming TCP packets. Since this is a costly function, even when compiled in (along with the option TCP_SIGNATURE), it can be disabled via the net.inet.tcp.signature_verify_input sysctl.
Sponsored by: Sandvine Incorporated Reviewed by: emaste, bz MFC after: 2 weeks
|
221021 |
25-Apr-2011 |
bz |
Be less strict on includes than in r220746. We need in.h for both INET or INET6 as it holds all the IPPROTO_* definitions needed for the SYSCTL_NODE definitions.
Reviewed by: gnn Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems MFC after: 5 days
|
220914 |
21-Apr-2011 |
glebius |
Use size_t for sopt_valsize.
Submitted by: Brandon Gooch <jamesbrandongooch gmail.com>
|
220880 |
20-Apr-2011 |
bz |
MFp4 CH=191760:
When compiling out INET we still need the initialization routines as well as the tuning and monitoring sysctls shared with IPv6.
Move the two send/recvspace variables up from the middle of the file to ease compiling out the INET only code.
Reviewed by: gnn Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems MFC after: 3 days
|
220879 |
20-Apr-2011 |
bz |
MFp4 CH=191470:
Move the ipport_tick_callout and related functions from ip_input.c to in_pcb.c. The random source port allocation code has been merged and is now local to in_pcb.c only. Use a SYSINIT to get the callout started and no longer depend on initialization from the inet code, which would not work in an IPv6 only setup.
Reviewed by: gnn Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems MFC after: 4 days
|
220878 |
20-Apr-2011 |
bz |
MFp4 CH=191466:
Move fw_one_pass to where it belongs: it is a property of ipfw, not of ip_input.
Reviewed by: gnn Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems MFC after: 3 days
|
220837 |
19-Apr-2011 |
glebius |
- Rewrite functions that copyin/out NAT configuration, so that they calculate required memory size dynamically.
- Fix races on chain re-lock.
- Introduce new field to ip_fw_chain - generation count. Now utilized only in the NAT configuration, but can be utilized more widely in ipfw.
- Get rid of NAT_BUF_LEN in ip_fw.h
PR: kern/143653
|
220832 |
19-Apr-2011 |
ae |
Add sysctl handlers for net.inet.ip.dummynet.hash_size, .pipe_byte_limit and .pipe_slot_limit oids to prevent setting incorrect values.
MFC after: 2 weeks
|
220831 |
19-Apr-2011 |
ae |
The ipdn_bound_var() function is designed to bound a variable between a specified minimum and maximum. When the specified default value is out of bounds, it does not work as expected and does not limit the variable. Check that the default value is in range and limit it if needed. Also bump max_hash_size value to 65536 to correspond with the manual page.
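To make the failure mode concrete, here is a small sketch of one possible bounding policy. bound_var() is a hypothetical helper written for this note and is not the dummynet function; in particular, the rule that a too-small value falls back to the default is an assumption. The point it illustrates is that the default must itself be clamped before it is used as a fallback.

```c
#include <assert.h>

/*
 * Bound *v to [lo, hi]. Values below lo fall back to dflt (assumed
 * policy); values above hi are capped. dflt is clamped first so that an
 * out-of-range default can never escape the bounds.
 */
static int
bound_var(int *v, int dflt, int lo, int hi)
{
	assert(lo <= hi);
	if (dflt < lo)
		dflt = lo;
	else if (dflt > hi)
		dflt = hi;
	if (*v < lo)
		*v = dflt;
	else if (*v > hi)
		*v = hi;
	return (*v);
}
```

For example, with bounds 16..65536 and a default of 100000, a zero input ends up as 65536 rather than as the unchecked default.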
PR: kern/152887 MFC after: 2 weeks
|
220812 |
19-Apr-2011 |
ae |
Use M_WAITOK instead of M_WAIT for malloc. Remove unneeded checks.
MFC after: 1 week
|
220800 |
18-Apr-2011 |
glebius |
LibAliasInit() should allocate memory with M_WAITOK flag. Modify it and its callers.
|
220796 |
18-Apr-2011 |
glebius |
Pullup up to TCP header length before matching against 'tcpopts'.
PR: kern/156180 Reviewed by: luigi
|
220794 |
18-Apr-2011 |
jhb |
When checking to see if a window update should be sent to the remote peer, don't force a window update if the window would not actually grow due to window scaling. Specifically, if the window scaling factor is larger than 2 * MSS, then after the local reader has drained 2 * MSS bytes from the socket, a window update can end up advertising the same window. If this happens, the supposed window update actually ends up being a duplicate ACK. This can result in an excessive number of duplicate ACKs when using a higher maximum socket buffer size.
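A minimal sketch of the check, with hypothetical names and a deliberately reduced set of conditions (the real decision in tcp_output() considers more state): a window update is only worthwhile if the value that actually goes on the wire, the window shifted right by the receive scale, would change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * oldwin: bytes of window currently advertised to the peer.
 * newwin: bytes of window that could be advertised now.
 * scale:  receive window scale shift (0..14).
 */
static bool
should_send_window_update(uint32_t oldwin, uint32_t newwin, uint32_t mss,
    unsigned int scale)
{
	assert(scale <= 14);
	if (newwin <= oldwin)
		return (false);
	/* Same on-the-wire value: the "update" would be a duplicate ACK. */
	if ((oldwin >> scale) == (newwin >> scale))
		return (false);
	return (newwin - oldwin >= 2 * mss);
}
```

With a scale of 14 the advertised window has 16384-byte granularity, so draining 2 * 1460 bytes from a 65536-byte window leaves the shifted value unchanged and no update is sent.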
Reviewed by: bz MFC after: 1 month
|
220746 |
17-Apr-2011 |
bz |
Make in_proto.c dependent on either inet or inet6.
While it does not provide any functionality for IPv6, it provides the sysctl nodes for net.inet.* that a lot of functionality shared between IPv4 and IPv6 depends on. We cannot change these anymore without breaking a lot of management and tuning.
In case of IPv6 only, we compile out everything but the sysctl node declarations.
Reviewed by: gnn Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems MFC After: 5 days
|
220620 |
14-Apr-2011 |
trasz |
Refactor udp_input(), moving calls to u_tun_func() into udp_append().
Obtained from: Wheel Systems Sp. z o.o. Reviewed by: bz@
|
220619 |
14-Apr-2011 |
bz |
The mbuf_frag_size always was and is file local and not queried from base user space tools via kvm. Mark it static.
MFC after: 3 days
|
220592 |
13-Apr-2011 |
pluknet |
Staticize malloc types.
Approved by: lstewart MFC after: 1 week
|
220568 |
12-Apr-2011 |
ae |
Restore previous behaviour - always match the rule when we are doing tagging, even when the tag already exists.
Reported by: Vadim Goncharov MFC after: 1 week
|
220560 |
12-Apr-2011 |
lstewart |
Use the full and proper company name for Swinburne University of Technology throughout the source tree.
Requested by: Grenville Armitage, Director of CAIA at Swinburne University of Technology MFC after: 3 days
|
220428 |
07-Apr-2011 |
jfv |
Port of the LRO fix from mxge driver to the generic LRO code. Thanks to Andrew Gallatin for the change.
MFC after: 7 days
|
220211 |
31-Mar-2011 |
ae |
Fill up src_port and dst_port variables for SCTP over IPv4.
PR: kern/153415 MFC after: 1 week
|
220204 |
31-Mar-2011 |
ae |
Fix malloc types.
MFC after: 1 week
|
220203 |
31-Mar-2011 |
ae |
Fix a memory leak. Memory that is allocated for schedulers hash table was not freed.
PR: kern/156083 MFC after: 1 week
|
220156 |
30-Mar-2011 |
jhb |
Clamp the initial advertised receive window when responding to a SYN/ACK to the maximum allowed window. Growing the window too large would cause an underflow in the calculations in tcp_output() to decide if a window update should be sent which would prevent the persist timer from being started if data was pending and the other end of the connection advertised an initial window size of 0.
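As a rough illustration, assuming the maximum allowed window is the largest 16-bit window value shifted by the negotiated receive scale (clamp_initial_window() and MAXWIN_UNSCALED are names invented for this sketch):

```c
#include <assert.h>
#include <stdint.h>

#define MAXWIN_UNSCALED	65535u	/* largest value of the 16-bit window field */

/* Limit the initial window to what the scaled window field can express. */
static uint32_t
clamp_initial_window(uint32_t bufspace, unsigned int rcv_scale)
{
	uint32_t maxwin;

	assert(rcv_scale <= 14);	/* window scale shift is at most 14 */
	maxwin = MAXWIN_UNSCALED << rcv_scale;
	return (bufspace < maxwin ? bufspace : maxwin);
}
```

Keeping the stored window within this limit means later differences between the advertised and available window stay within the range the rest of the window-update logic expects.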
PR: kern/154006 Submitted by: Stefan `Sec` Zehl sec 42 org Reviewed by: bz MFC after: 1 week
|
220105 |
28-Mar-2011 |
weongyo |
Cover the case where the value of (BYTES_THIS_ACK(tp, th) / tp->t_maxseg) is between 2.0 and 3.0.
Reviewed by: lstewart
|
219828 |
21-Mar-2011 |
pluknet |
Reference ifaddr object before unlocking as it can be freed from another context at the moment of later access.
PR: kern/155555 Submitted by: Andrew Boyer <aboyer att averesystems.com> Approved by: avg (mentor) MFC after: 2 weeks
|
219819 |
21-Mar-2011 |
jeff |
- Merge changes to the base system to support OFED. These include a wider arg2 for sysctl, updates to vlan code, IFT_INFINIBAND, and other miscellaneous small features.
|
219779 |
19-Mar-2011 |
bz |
Properly check for an IPv4 socket after r219579.
In some cases, such as udp6_connect() without an earlier bind(2) to an address, with v4-mapped sockets allowed and a non-mapped destination address, we can end up here with both v4 and v6 indicated: inp_vflag = (INP_IPV4|INP_IPV6|INP_IPV6PROTO)
In that case however laddrp is NULL as the IPv6 path does not pass in a copy currently.
Reported by: Pawel Worach (pawel.worach gmail.com) Tested by: Pawel Worach (pawel.worach gmail.com) MFC after: 6 days X-MFC with: r219579
|
219579 |
12-Mar-2011 |
bz |
Merge the two identical implementations for local port selections from in_pcbbind_setup() and in6_pcbsetport() in a single in_pcb_lport().
MFC after: 2 weeks
|
219397 |
08-Mar-2011 |
rrs |
Tunes and fixes the new DC-CC to seem to hit the right mix. Still may need some tweaks but it appears to almost not give away too much to an RFC2581 flow, but can really minimize the amount of buffers used in the net.
MFC after: 3 months
|
219120 |
01-Mar-2011 |
rrs |
Adds a new Congestion Control that helps reduce the RTT that a flow will build up in buffers in transit. It is a slight modification to RFC2581 but is more friendly i.e. less aggressive.
MFC after: 3 months
|
219071 |
26-Feb-2011 |
dim |
Fix breakage in sys/netinet/sctp_sysctl.c, introduced by r219057. If SCTP_HAS_RTTC is not defined, this file fails to compile. Insert the necessary #ifdefs to make it work.
Pointy hat to: rrs
|
219057 |
26-Feb-2011 |
rrs |
Improvements to CC modules:
1) Add four new points that allow you to get more information to cc algo's
2) Fix the case where user changes module on an existing TCB, in such a case, the initialization module needs to be called on all nets.
3) Move htcp_cc structure to a union that other modules can use.
4) Add 5th point for get/set socket options for cc_module specific options
MFC after: 2 months
|
219014 |
24-Feb-2011 |
tuexen |
* Fix several bugs where the scaled versions of srtt and rttvar were used incorrectly.
* Use appropriate variable names for RTO instead of RTT.
MFC after: 3 months.
|
219013 |
24-Feb-2011 |
tuexen |
* Cleanup the code computing the retransmission timeout.
* Fix an initialization bug for the scaled variance of the RTO.
MFC after: 3 months.
|
218909 |
21-Feb-2011 |
brucec |
Fix typos - remove duplicate "the".
PR: bin/154928 Submitted by: Eitan Adler <lists at eitanadler.com> MFC after: 3 days
|
218818 |
18-Feb-2011 |
tuexen |
Bugfix: Get per vnet sysctl variables and statistics working.
MFC after: 3 months.
|
218757 |
16-Feb-2011 |
bz |
Mfp4 CH=177274,177280,177284-177285,177297,177324-177325
VNET socket push back: try to minimize the number of places where we have to switch vnets and narrow down the time we stay switched. Add assertions to the socket code to catch possibly unset vnets as seen in r204147.
While this reduces the number of vnet recursions, in some places, such as NFS, POSIX local sockets and some netgraph nodes, recursions are impossible to fix.
The current expectations are documented at the beginning of uipc_socket.c along with the other information there.
Sponsored by: The FreeBSD Foundation Sponsored by: CK Software GmbH Reviewed by: jhb Tested by: zec
Tested by: Mikolaj Golub (to.my.trociny gmail.com) MFC after: 2 weeks
|
218741 |
16-Feb-2011 |
pluknet |
Bump dummynet module version to meet dummynet schedulers' requirements, and thus unbreak loading dummynet.ko via /boot/loader.conf.
Reported by: rihad <rihad att mail.ru> on freebsd-net Approved by: kib (mentor)
|
218641 |
13-Feb-2011 |
rrs |
Fix a bug reported by Jonathan Leighton in his web-sctp testing at the Univ-of-Del. Basically, when a 1-to-1 socket did a socket/bind/send(data)/close and the timing was right, we would dereference a socket that is NULL.
MFC after: 1 month
|
218639 |
13-Feb-2011 |
tuexen |
Fix several bugs related to stream scheduling.
Obtained from: Robin Seggelmann MFC after: 3 months.
|
218629 |
13-Feb-2011 |
deischen |
Oops, revert an accidental local change that got added in my last commit (r218627). No damage was done in the last commit, just some duplicated code was added (which is now removed).
|
218627 |
13-Feb-2011 |
deischen |
Allow the SO_SETFIB socket option to select the default (0) routing table.
Reviewed by: julian
|
218521 |
10-Feb-2011 |
tuexen |
Remove addresses from endpoint when there are no associations. This fixes a bug reported by brucec@.
MFC after: 3 months.
|
218400 |
07-Feb-2011 |
tuexen |
Fix bugs related to M_FLOWID:
* Store the flowid when receiving an SCTP/IPv6 packet.
* Store the flowid when receiving an SCTP packet with wrong CRC.
* Initialize flowid correctly.
* Put test code under INVARIANTS.
MFC after: 3 months.
|
218393 |
07-Feb-2011 |
rrs |
If not set (due to some error Michael is working on fixing) set it for the net.
MFC after: 3 months
|
218392 |
07-Feb-2011 |
rrs |
1) Track when flowid does get set. MFC after: 3 months
|
218371 |
06-Feb-2011 |
rrs |
1) Use the same scheme Michael and I discussed for selecting a flowid
2) If flowid is not set, arrange so it is stored.
3) If flowid is set by lower layer, use it.
MFC after: 3 Months
|
218360 |
05-Feb-2011 |
luigi |
Correct the 'output_time' of packets generated by dummynet. In the Dec. 2009 rewrite I introduced a bug, using the arrival time for the computation instead of the time the packet exited the queue. The bandwidth computation was still correct because it is computed elsewhere, but traffic was sent out in bursts.
The bug is also present in RELENG_8 after dec.2009
Thanks to Daikichi Osuga for investigating, finding and fixing the bug with detailed graphs of the behaviour before and after the fix.
Submitted by: Daikichi Osuga MFC after: 2 weeks
|
218335 |
05-Feb-2011 |
tuexen |
Add support for M_FLOWID.
|
218319 |
05-Feb-2011 |
rrs |
1) Typo correction in comments and one spacing change. 2) Mass update to all copyrights. MFC after: 3 Months
|
218271 |
04-Feb-2011 |
jhb |
When turning off TCP_NOPUSH, only call tcp_output() to immediately flush any pending data if the connection is established.
Submitted by: csjp Reviewed by: lstewart MFC after: 1 week
|
218269 |
04-Feb-2011 |
rrs |
1) Fix cpu mapping per JB's suggestions 2) Fix it so INIT's don't always end up on CPU0
MFC after: 3 months
|
218264 |
04-Feb-2011 |
brucec |
Fix typo (Tuneable -> Tunable).
|
218241 |
03-Feb-2011 |
tuexen |
Fix several bugs in the stream schedulers. From Robin Seggelmann.
MFC after: 3 months.
|
218235 |
03-Feb-2011 |
tuexen |
Make sure that changing the ECN sysctl does not affect existing associations and endpoints.
MFC after: 3 months.
|
218232 |
03-Feb-2011 |
rrs |
1) Move per John Baldwin to mp_maxid
2) Some signed/unsigned errors found by Mac OS compiler (from Michael)
3) a couple of copyright updates on the affected files.
MFC after: 3 months
|
218219 |
03-Feb-2011 |
rrs |
Fix the per CPU stats so that: 1) They don't use the giant "MAX_CPU" define and instead are allocated dynamically based on mp_ncpus 2) Will zero with the netstat -z -s -p sctp 3) Will be properly handled by both the sctp_init and finish (the multi-net stuff was incorrectly bzero'ing in sctp_init the wrong size.. the bzero is now moved to the right places). And of course the free is put in at the very end.
MFC after: 3 Months
|
218211 |
03-Feb-2011 |
rrs |
Adds an experimental option to create a pool of threads. These serve as input threads and are queued packets based on the V-tag number. This is similar to what a modern card can do with queue's for TCP... but alas modern cards know nothing about SCTP.
MFC after: 3 months (maybe)
|
218186 |
02-Feb-2011 |
rrs |
1) Allow a chunk to track the cwnd it was at when sent.
2) Add separate max-bursts for retransmit and hb. These are set to sysctlable values but not settable via the socket api. This makes sure we don't blast out HB's or fast-retransmits.
3) Determine on the first data transmission on a net if its local-lan (by being under or over a RTT). This can later be used to think about different algorithms based on locallan vs big-i (experimental)
4) The cwnd should NOT be allowed to grow when an ECNEcho is seen (TCP has this same bug). We fix this in SCTP so an ECNe being seen prevents an advance of cwnd.
5) CWR's should not be sent multiple times to the same network, instead just updating the TSN being transmitted if needed.
MFC after: 1 Month
|
218167 |
01-Feb-2011 |
lstewart |
Algorithm modules can define their own private congestion signal types in the top 8 bits of the 32 bit signal bit field space for internal use. These private signals should not be leaked outside of a module.
Given that many algorithm modules use the NewReno hook functions to simplify their implementation, the obvious place such a leak would show up is in the NewReno cong_signal hook function.
- Show the full number of significant bits in the signal type definitions in <netinet/cc.h>.
- Add a bitmask to simplify figuring out if a given signal is in the private or public bit range.
- Add a sanity check in newreno_cong_signal() to ensure private signals are not being leaked into the hook function.
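The split between public and private signal bits can be sketched as follows. The macro and function names are hypothetical and the two example signal values are invented; only the layout (private signals in the top 8 of 32 bits) comes from the description above. Where a kernel hook would typically assert, this sketch returns false so the behaviour is easy to observe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Top 8 bits of the 32-bit signal space are reserved for private use. */
#define SIG_PRIVATE_MASK	0xFF000000u

/* Example values, invented for illustration. */
#define SIG_EXAMPLE_PUBLIC	0x00000001u
#define SIG_EXAMPLE_PRIVATE	0x01000000u

static bool
sig_is_private(uint32_t sig)
{
	return ((sig & SIG_PRIVATE_MASK) != 0);
}

/*
 * Stand-in for a shared hook function: it refuses private signals,
 * which should never leak out of the module that defined them.
 */
static bool
shared_cong_signal(uint32_t sig)
{
	if (sig_is_private(sig))
		return (false);
	/* ... react to the public signal here ... */
	return (true);
}
```

A module that reuses the shared hook would filter its own private signals first and pass on only values for which sig_is_private() is false.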
Sponsored by: FreeBSD Foundation Discussed with: David Hayes <dahayes at swin edu au> MFC after: 1 week X-MFC with: r215166
|
218156 |
01-Feb-2011 |
lstewart |
Fix typo in comment: "course" -> "coarse"
Sponsored by: FreeBSD Foundation Submitted by: jmallett MFC after: 3 months X-MFC with: r218152
|
218155 |
01-Feb-2011 |
lstewart |
Import an implementation of the CAIA-Hamilton-Delay (CHD) congestion control algorithm described in the paper "Improved coexistence and loss tolerance for delay based TCP congestion control" by Hayes and Armitage. It is implemented as a kernel module compatible with the recently committed modular congestion control framework.
CHD enhances the approach taken by the Hamilton-Delay (HD) algorithm to provide tolerance to non-congestion related packet loss and improvements to coexistence with loss-based congestion control algorithms. A key idea in improving coexistence with loss-based congestion control algorithms is the use of a shadow window, which attempts to track how NewReno's congestion window (cwnd) would evolve. At the next packet loss congestion event, CHD uses the shadow window to correct cwnd in a way that reduces the amount of unfairness CHD experiences when competing with loss-based algorithms.
In collaboration with: David Hayes <dahayes at swin edu au> and Grenville Armitage <garmitage at swin edu au> Sponsored by: FreeBSD Foundation Reviewed by: bz and others along the way MFC after: 3 months
|
218153 |
01-Feb-2011 |
lstewart |
Import a clean-room implementation of the Hamilton-Delay (HD) congestion control algorithm based on the paper "A strategy for fair coexistence of loss and delay-based congestion control algorithms" by Budzisz, Stanojevic, Shorten and Baker. It is implemented as a kernel module compatible with the recently committed modular congestion control framework.
HD uses a probabilistic approach to reacting to delay-based congestion. The probability of reducing cwnd is zero when the queuing delay is very small, increasing to a maximum at a set threshold, then back down to zero again when the queuing delay is high. Normal operation keeps the queuing delay below the set threshold. However, since loss-based congestion control algorithms push the queuing delay high when probing for bandwidth, having the probability of reducing cwnd drop back to zero for high delays allows HD to compete with loss-based algorithms.
In collaboration with: David Hayes <dahayes at swin edu au> and Grenville Armitage <garmitage at swin edu au> Sponsored by: FreeBSD Foundation Reviewed by: bz and others along the way MFC after: 3 months
|
218152 |
01-Feb-2011 |
lstewart |
Import a clean-room implementation of the VEGAS congestion control algorithm based on the paper "TCP Vegas: end to end congestion avoidance on a global internet" by Brakmo and Peterson. It is implemented as a kernel module compatible with the recently committed modular congestion control framework.
VEGAS uses network delay as a congestion indicator and unlike regular loss-based algorithms, attempts to keep the network operating with stable queuing delays and no congestion losses. By keeping network buffers used along the path within a set range, queuing delays are kept low while maintaining high throughput.
In collaboration with: David Hayes <dahayes at swin edu au> and Grenville Armitage <garmitage at swin edu au> Sponsored by: FreeBSD Foundation Reviewed by: bz and others along the way MFC after: 3 months
|
218129 |
31-Jan-2011 |
rrs |
More ECN fixes:
1) We now remove ECN-Nonce since it will no longer continue as an I-D
2) Eliminate last_tsn_echo, this tied us to an assoc not the net and thus we were not doing m-homing on the ECN-Echo senders side right.
3) Increment the count going out even if the TSN is lower in the pending ECN-Echo, this way the receiver knows exactly how many packets were marked even with network re-ordering
4) Fix so we DO NOT stop doing delayed sack if a ECN Echo is in queue
MFC after: 1 month
|
218078 |
29-Jan-2011 |
bz |
Remove duplicate printing of TF_NOPUSH in db_print_tflags().
MFC after: 10 days
|
218072 |
29-Jan-2011 |
rrs |
Fixes to ECN in SCTP.
1) ECN was on an association basis, this is incorrect and will not work with CMT or for that matter if the user is sending to multiple addresses. This commit makes ECN on a per path basis.
2) Adopt the new format for the ECN internet draft. This also maintains compatibility with old format chunks as well.
3) Keep track of the real time of a RTT down to micro seconds. For some future conditional features (for like a data center this is good information to have).
MFC after: 1 month
|
218039 |
28-Jan-2011 |
rrs |
Keep track of the real last RTT on each net. This will be used for Data Center congestion control, we won't want to engage it in the ECN code unless we KNOW that the RTT is less than 500us.
MFC after: 1 week
|
218037 |
28-Jan-2011 |
rrs |
Fix a bug in the way ECN-Echo chunk sends were being accounted for. The counting was such that we counted only when we queued a chunk, not when we sent it. Now keep an additional counter for queuing and one for sending.
MFC after: 1 week
|
217913 |
26-Jan-2011 |
tuexen |
* Use 300 ms as the default for RTO_MIN.
* Disable burst mitigation by default.
* Remove unused constant.
Discussed with rrs. MFC after: 3 months.
|
217895 |
26-Jan-2011 |
tuexen |
Make SCTP_MAX_BURST compliant with the latest version of the socket API ID. This is not compatible with the API in stable/8.
|
217894 |
26-Jan-2011 |
tuexen |
Change infrastructure for SCTP_MAX_BURST to allow compliance with the latest socket API ID. In particular, it can now be disabled.
Full compliance needs changing the structure used in the socket option. Since this breaks the API, it will be a separate commit which will not be MFCed to stable/8.
MFC after: 3 months.
|
217888 |
26-Jan-2011 |
deischen |
Prison check addresses set with multicast interface options.
Reviewed by: bz MFC after: 1 week
|
217829 |
25-Jan-2011 |
thompsa |
When matching an incoming ARP against a bridge, ensure both interfaces belong to the same bridge.
Submitted by: Alexander Zagrebin
|
217806 |
24-Jan-2011 |
lstewart |
Import the ERTT (Enhanced Round Trip Time) Khelp module. ERTT uses the Khelp/Hhook KPIs to hook into the TCP stack and maintain a per-connection, low noise estimate of the instantaneous RTT. ERTT's implementation is robust even in the face of delayed acknowledgements and/or TSO being in use for a connection.
A high quality, low noise RTT estimate is a requirement for applications such as delay-based congestion control, for which we will be importing some algorithm implementations shortly.
In collaboration with: David Hayes <dahayes at swin edu au> and Grenville Armitage <garmitage at swin edu au> Sponsored by: FreeBSD Foundation Reviewed by: bz and others along the way MFC after: 3 months
|
217760 |
23-Jan-2011 |
tuexen |
Add stream scheduling support. This work is based on a patch received from Robin Seggelmann.
MFC after: 3 months.
|
217748 |
23-Jan-2011 |
lstewart |
An sbuf configured with SBUF_AUTOEXTEND will call malloc with M_WAITOK when a write to the buffer causes it to overflow. We therefore can't hold the CC list rwlock over a call to sbuf_printf() for an sbuf configured with SBUF_AUTOEXTEND.
Switch to a fixed length sbuf which should be of sufficient size except in the very unlikely event that the sysctl is being processed as one or more new algorithms are loaded. If that happens, we accept the race and may fail the sysctl gracefully if there is insufficient room to print the names of all the algorithms.
This should address a WITNESS warning and the potential panic that would occur if the sbuf call to malloc did sleep whilst holding the CC list rwlock.
Sponsored by: FreeBSD Foundation Reported by: Nick Hibma Reviewed by: bz MFC after: 3 weeks X-MFC with: r215166
|
217742 |
23-Jan-2011 |
tuexen |
Remove unnecessary checking of variable.
MFC after: 3 months.
|
217683 |
21-Jan-2011 |
lstewart |
Some correctness and robustness fixes related to CUBIC's mean RTT estimate:
- The mean RTT is updated at the end of each congestion epoch, but if we switch to congestion avoidance within the first epoch (e.g. if ssthresh was primed from the hostcache), we'll trigger a divide by zero panic in cubic_ack_received(). Set the mean to the min in cubic_record_rtt() if the mean is less than the min to ensure we have a sane mean for use in this situation. This fixes the panic reported by Nick Hibma.
- Adjust conditions under which we update the mean RTT in cubic_post_recovery() to ensure a low latency path won't yield an RTT of less than 1. This avoids another potential divide by zero panic when running CUBIC in networks with sub-millisecond latencies.
- Remove the "safety" assignment of min into mean when we don't update the mean because of failed conditions. The above change to the conditions for updating the mean ensures the safety issue is addressed and I feel it is better to keep our previous mean estimate around if we can't update than to revert to the min.
- Initialise the mean RTT to 1 on connection startup to act as a safety belt if a situation we haven't considered and addressed with the above changes were to crop up in the wild.
Sponsored by: FreeBSD Foundation Reported and tested by: Nick Hibma Discussed with: David Hayes <dahayes at swin edu au> MFC after: 5 weeks X-MFC with: r216114
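The guards listed above can be illustrated with a small sketch. It is not the CUBIC module's code: the class and method names are hypothetical, and the update rules are reduced to the three safety properties the message describes (mean starts at 1, an RTT sample never counts as less than 1, and the mean is raised to the min when it is lower).

```python
class CubicRttState:
    """Hypothetical per-connection RTT bookkeeping, in ticks."""

    def __init__(self):
        self.min_rtt_ticks = None
        self.mean_rtt_ticks = 1      # safety belt: never start at zero

    def record_rtt(self, srtt_ticks):
        # A sub-tick path must not yield an RTT of less than 1.
        rtt = max(1, int(srtt_ticks))
        if self.min_rtt_ticks is None or rtt < self.min_rtt_ticks:
            self.min_rtt_ticks = rtt
            # If congestion avoidance starts before the first epoch ends,
            # the mean has not been computed yet; fall back to the min.
            if self.mean_rtt_ticks < self.min_rtt_ticks:
                self.mean_rtt_ticks = self.min_rtt_ticks

    def rtts_elapsed(self, elapsed_ticks):
        # The kind of division that would fail with a zero mean.
        return elapsed_ticks // self.mean_rtt_ticks
```

With these guards, `rtts_elapsed()` can be called at any point in a connection's life without dividing by zero.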
|
217638 |
20-Jan-2011 |
tuexen |
Improve comments.
MFC after: 1 week.
|
217635 |
20-Jan-2011 |
rrs |
Align with the new socket API draft for destination states (i.e. ACTIVE/INACTIVE/UNCONFIRMED).
MFC after: 1 week
|
217611 |
19-Jan-2011 |
tuexen |
Cleanup the management of CC functions.
MFC after: 3 months.
|
217597 |
19-Jan-2011 |
rrs |
Fix a style(9) nit that snuck in when I grabbed the wrong patch ;-0 (thanks Daniel)
MFC after: 1 week
|
217592 |
19-Jan-2011 |
rrs |
Fix a bug where multicast packets sent from a UDP endpoint may end up echoing back to the sender even without joining the multicast group.
Reviewed by: gnn, bms, bz? Obtained from: deischen (with help from)
|
217554 |
18-Jan-2011 |
mdf |
Specify a CTLTYPE_FOO so that a future sysctl(8) change does not need to rely on the format string. For SYSCTL_PROC instances where I noticed a discrepancy between the CTLTYPE and the format specifier, fix the CTLTYPE.
|
217469 |
16-Jan-2011 |
tuexen |
Add support for resource pooling to CMT. An original version of the patch was developed by Martin Becke and Thomas Dreibholz.
MFC after: 3 months
|
217361 |
13-Jan-2011 |
jhb |
Use a blocking malloc() to initialize the dummynet taskq.
Reviewed by: luigi
|
217333 |
12-Jan-2011 |
csjp |
Un-break the build: use the correct format specifier for sizeof()
|
217322 |
12-Jan-2011 |
mdf |
sysctl(9) cleanup checkpoint: amd64 GENERIC builds cleanly.
Commit the net* piece.
|
217315 |
12-Jan-2011 |
gnn |
Fix several bugs in the ARP code related to improperly formatted packets.
*) Reject requests with a protocol length not equal to 4. This is IPv4 and there is no reason to accept anything else.
*) Reject packets that have a multicast source hardware address.
*) Drop requests where the hardware address length is not equal to the hardware address length of the interface.
Pointed out by: Rozhuk Ivan MFC after: 1 week
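The three checks listed above amount to a small validation routine. The sketch below is only an illustration of those rules, not the kernel code: the function name and parameters are hypothetical, and the length sanity check on the address bytes exists only to keep the example safe.

```python
def arp_packet_acceptable(hw_len, proto_len, sender_hw_addr, ifp_hw_len):
    """Return True if an ARP packet passes the sanity checks.

    hw_len / proto_len : address length fields taken from the ARP header
    sender_hw_addr     : sender hardware address as bytes
    ifp_hw_len         : hardware address length of the receiving interface
    """
    # IPv4 addresses are 4 bytes; nothing else makes sense here.
    if proto_len != 4:
        return False
    # The advertised hardware address length must match the interface.
    if hw_len != ifp_hw_len:
        return False
    # Sanity check for this sketch only: the bytes must match the header field.
    if len(sender_hw_addr) != hw_len:
        return False
    # For Ethernet-style addresses the low bit of the first octet marks a
    # group (multicast or broadcast) address, which is never a valid source.
    if hw_len > 0 and sender_hw_addr[0] & 0x01:
        return False
    return True
```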
|
217252 |
11-Jan-2011 |
lstewart |
Fix some whitespace nits that were introduced in r216758.
Sponsored by: FreeBSD Foundation Submitted by: pjd MFC after: 10 weeks X-MFC with: r216758
|
217221 |
10-Jan-2011 |
lstewart |
Reset the last_sack_ack SACK hint for TCP input processing to ensure that the hint is 0 when no SACK data is received to update the hint with. This was accidentally omitted from r216753.
Sponsored by: FreeBSD Foundation MFC after: 10 weeks X-MFC with: 216753
|
217169 |
08-Jan-2011 |
deischen |
Make sure to always do source address selection on an unbound socket, regardless of any multicast options. If an address is specified via a multicast option, then let it override the normal source address selection.
This fixes a bug where source address selection was not being performed when multicast options were present but without an interface being specified.
Reviewed by: bz MFC after: 1 day
|
217126 |
07-Jan-2011 |
jhb |
Trim extra spaces before tabs.
|
217121 |
07-Jan-2011 |
gnn |
Fix a memory leak in ARP queues.
Pointed out by: jhb@ MFC after: 2 weeks
|
217113 |
07-Jan-2011 |
gnn |
Adjust ARP hold queue locking.
Submitted by: Rozhuk Ivan, jhb MFC after: 2 weeks
|
217110 |
07-Jan-2011 |
jhb |
Use a regular taskqueue for dummynet rather than a "fast" taskqueue.
Reviewed by: luigi
|
216887 |
02-Jan-2011 |
tuexen |
Bugfix: Make sure that the COMM_UP notification is delivered first also on the passive side.
MFC after: 3 days.
|
216878 |
01-Jan-2011 |
tuexen |
Fix a typo.
MFC after: 3 months.
|
216857 |
31-Dec-2010 |
bz |
Try to catch a possible divide-by-zero as early as possible if "mtu" is 0 (also test for negative MTUs if checking it anyway). An MTU of 0 is arguably a bug elsewhere, but this at least gives us some more debugging hints.
Sponsored by: ISPsystem (Early 2010) MFC after: 1 week
|
216825 |
30-Dec-2010 |
tuexen |
Define the SCTP_SSN_GE, SCTP_SSN_GT, SCTP_TSN_GE, and SCTP_TSN_GT macros and use them instead of the generic compare_with_wrap. Retire compare_with_wrap.
MFC after: 3 months.
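Comparisons of this kind use wrap-around (serial number) arithmetic: TSNs are 32-bit and SSNs are 16-bit counters, so "greater than" has to be decided modulo the counter width. The sketch below shows the general technique only; the helper names are hypothetical and it is not a transcription of the macros.

```python
def serial_gt(a, b, bits):
    """True if a is 'after' b in a bits-wide sequence space that wraps."""
    mask = (1 << bits) - 1
    half = 1 << (bits - 1)
    a &= mask
    b &= mask
    return (a < b and (b - a) > half) or (a > b and (a - b) < half)


def serial_ge(a, b, bits):
    mask = (1 << bits) - 1
    return (a & mask) == (b & mask) or serial_gt(a, b, bits)


def tsn_gt(a, b):
    return serial_gt(a, b, 32)    # TSNs are 32 bits wide


def tsn_ge(a, b):
    return serial_ge(a, b, 32)


def ssn_gt(a, b):
    return serial_gt(a, b, 16)    # SSNs are 16 bits wide


def ssn_ge(a, b):
    return serial_ge(a, b, 16)
```

Having width-specific helpers avoids passing the wrap limit at every call site, which is one plausible reason to prefer them over a single generic function.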
|
216822 |
30-Dec-2010 |
tuexen |
Code cleanup: Use LIST_FOREACH, LIST_FOREACH_SAFE, TAILQ_FOREACH, TAILQ_FOREACH_SAFE where appropriate. No functional change.
MFC after: 3 months.
|
216821 |
30-Dec-2010 |
tuexen |
Fix three bugs related to the sequence number wrap-around affecting the processing of ECNE and ASCONF chunks.
Reviewed by: rrs MFC after: 3 days.
|
216760 |
28-Dec-2010 |
lstewart |
Add a comment for the ccv member of struct tcpcb.
Sponsored by: FreeBSD Foundation MFC after: 5 weeks X-MFC with: r215166
|
216758 |
28-Dec-2010 |
lstewart |
- Add some helper hook points to the TCP stack. The hooks allow Khelp modules to access inbound/outbound events and associated data for established TCP connections. The hooks only run if at least one hook function is registered for the hook point, ensuring the impact on the stack is effectively nil when no TCP Khelp modules are loaded. struct tcp_hhook_data is passed as contextual data to any registered Khelp module hook functions.
- Add an OSD (Object Specific Data) pointer to struct tcpcb to allow Khelp modules to associate per-connection data with the TCP control block.
- Bump __FreeBSD_version and add a note to UPDATING regarding the ABI changes introduced by this commit and r216753.
In collaboration with: David Hayes <dahayes at swin edu au> and Grenville Armitage <garmitage at swin edu au> Sponsored by: FreeBSD Foundation Reviewed by: bz, others along the way MFC after: 3 months
|
216753 |
28-Dec-2010 |
lstewart |
Add a new sack hint to track the most recent and highest sacked sequence number. This will be used by the incoming Enhanced RTT Khelp module.
Sponsored by: FreeBSD Foundation Submitted by: David Hayes <dahayes at swin edu au> Reviewed by: bz and others (as part of a larger patch) MFC after: 3 months
|
216749 |
28-Dec-2010 |
lstewart |
Fix a whitespace nit introduced in r215166.
Sponsored by: FreeBSD Foundation Spotted by: bz MFC after: 5 weeks X-MFC with: r215166
|
216742 |
27-Dec-2010 |
rwatson |
Remove comment bemoaning the lack of an INP_INHASHLIST above in_pcbdrop(); I fixed this in r189657 in early 2009, so the comment is OBE.
Reviewed by: bz MFC after: 3 days
|
216672 |
22-Dec-2010 |
tuexen |
Provide a possibility to configure the initial congestion window to the value defined in RFC 4960.
MFC after: 3 months.
|
216669 |
22-Dec-2010 |
tuexen |
Improve plausibility check in sctp_handle_sack(). Allow cmt_on_off to support values 0 (no CMT), 1 (CMT), and 2 (CMT/RP).
MFC after: 3 months.
|
216621 |
21-Dec-2010 |
jhb |
Fix a typo in a comment.
MFC after: 1 week
|
216502 |
17-Dec-2010 |
tuexen |
Fix a flightsize bug related to the processing of PKTDRP reports.
MFC after: 3 days.
|
216495 |
16-Dec-2010 |
tuexen |
Bugfix: Take also the nr-mapping array into account when detecting gaps.
Reviewed by: rrs@ MFC after: 3 days.
|
216480 |
16-Dec-2010 |
tuexen |
Add a missing cast. Reported by blade_ly at yahoo.com.cn.
MFC after: 1 day.
|
216466 |
15-Dec-2010 |
bz |
Bring back (most of) NATM to avoid further bitrot after r186119. Keep three lines disabled which I am unsure if they had been used at all. This will allow us to seek testers and possibly bring it all back.
Discussed with: rwatson MFC after: 7 weeks
|
216397 |
12-Dec-2010 |
tuexen |
Bugfix: Do correct accounting using the MIB counters when an association is aborted via sctp_abort_association().
MFC after: 3 days.
|
216192 |
05-Dec-2010 |
bz |
Use correct field to track statistics counting error as bad header length. This assimilates the code to what ip_input has been doing since r1.1 in this case.
Submitted by: Rozhuk Ivan (rozhuk.im gmail.com) MFC after: 4 days
|
216188 |
04-Dec-2010 |
tuexen |
Fix a bug where also the number of non-renegable gap reports was considered to be potentially renegable.
MFC after: 1 day.
|
216115 |
02-Dec-2010 |
lstewart |
Import a clean-room implementation of the experimental H-TCP congestion control algorithm based on the Internet-Draft "draft-leith-tcp-htcp-06.txt". It is implemented as a kernel module compatible with the recently committed modular congestion control framework.
H-TCP was designed to provide increased throughput in fast and long-distance networks. It attempts to maintain fairness when competing with legacy NewReno TCP in lower speed scenarios where NewReno is able to operate adequately. The paper "H-TCP: A framework for congestion control in high-speed and long-distance networks" provides additional detail.
In collaboration with: David Hayes <dahayes at swin edu au> and Grenville Armitage <garmitage at swin edu au> Sponsored by: FreeBSD Foundation Reviewed by: rpaulo (older patch from a few weeks ago) MFC after: 3 months
|
216114 |
02-Dec-2010 |
lstewart |
Import a clean-room implementation of the experimental CUBIC congestion control algorithm based on the Internet-Draft "draft-rhee-tcpm-cubic-02.txt". It is implemented as a kernel module compatible with the recently committed modular congestion control framework.
CUBIC was designed to provide increased throughput in fast and long-distance networks. It attempts to maintain fairness when competing with legacy NewReno TCP in lower speed scenarios where NewReno is able to operate adequately. The paper "CUBIC: A New TCP-Friendly High-Speed TCP Variant" provides additional detail.
In collaboration with: David Hayes <dahayes at swin edu au> and Grenville Armitage <garmitage at swin edu au> Sponsored by: FreeBSD Foundation Reviewed by: rpaulo (older patch from a few weeks ago) MFC after: 3 months
|
216107 |
02-Dec-2010 |
lstewart |
General cleanup of the NewReno CC module (no functional changes):
- Remove superfluous includes and unhelpful comments.
- Alphabetically order functions.
- Make functions static.
Sponsored by: FreeBSD Foundation MFC after: 9 weeks X-MFC with: r215166
|
216105 |
02-Dec-2010 |
lstewart |
- Reinstantiate the after_idle hook call in tcp_output(), which got lost somewhere along the way due to mismerging r211464 in our development tree.
- Capture the essence of r211464 in NewReno's after_idle() hook. We don't use V_ss_fltsz/V_ss_fltsz_local yet which needs to be revisited.
Sponsored by: FreeBSD Foundation Submitted by: David Hayes <dahayes at swin edu au> MFC after: 9 weeks X-MFC with: r215166
|
216103 |
02-Dec-2010 |
lstewart |
Set ssthresh appropriately on RTO. This change was accidentally not ported from the pre modular CC stack.
Sponsored by: FreeBSD Foundation Submitted by: David Hayes <dahayes at swin edu au> MFC after: 9 weeks X-MFC with: r215166
|
216101 |
02-Dec-2010 |
lstewart |
Pass NULL instead of 0 for the th pointer value. NULL != 0 on all platforms.
Submitted by: David Hayes <dahayes at swin edu au> MFC after: 9 weeks X-MFC with: r215166
|
216075 |
30-Nov-2010 |
glebius |
Use time_uptime instead of non-monotonic time_second to drive ARP timeouts.
Suggested by: bde
|
215956 |
27-Nov-2010 |
brucec |
Fix more continuous/contiguous typos (cf. r215955)
|
215817 |
25-Nov-2010 |
rrs |
Add new dtrace points for cwnd functions and lay the groundwork for future dtrace points (rwnd, flightsize, etc.).
MFC after: 2 months
|
215790 |
24-Nov-2010 |
glebius |
Redo r166423. It is important not only to skip freeing multicast entries when the underlying interface is detached, but also to purge pointers to them, to avoid a double free in the future.
|
215701 |
22-Nov-2010 |
dim |
After some off-list discussion, revert a number of changes to the DPCPU_DEFINE and VNET_DEFINE macros, as these cause problems for various people working on the affected files. A better long-term solution is still being considered. This reversal may give some modules empty set_pcpu or set_vnet sections, but these are harmless.
Changes reverted:
------------------------------------------------------------------------ r215318 | dim | 2010-11-14 21:40:55 +0100 (Sun, 14 Nov 2010) | 4 lines
Instead of unconditionally emitting .globl's for the __start_set_xxx and __stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu sections are actually defined.
------------------------------------------------------------------------ r215317 | dim | 2010-11-14 21:38:11 +0100 (Sun, 14 Nov 2010) | 3 lines
Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout the tree.
------------------------------------------------------------------------ r215316 | dim | 2010-11-14 21:23:02 +0100 (Sun, 14 Nov 2010) | 2 lines
Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE.
|
215677 |
22-Nov-2010 |
zec |
Remove an apparently redundant CURVNET_SET() / CURVNET_RESTORE() pair.
MFC after: 3 days
|
215553 |
20-Nov-2010 |
lstewart |
Fix a minor code redundancy nit.
MFC after: 3 days
|
215552 |
20-Nov-2010 |
lstewart |
When enabling or disabling SIFTR with a VIMAGE kernel, ensure we add or remove the SIFTR pfil(9) hook functions to or from all network stacks. This patch allows packets inbound or outbound from a vnet to be "seen" by SIFTR.
Additional work is required to allow SIFTR to actually generate log messages for all vnet related packets because the siftr_findinpcb() function does not yet search for inpcbs across all vnets. This issue will be fixed separately.
Reported and tested by: David Hayes <dahayes at swin edu au> MFC after: 3 days
|
215434 |
17-Nov-2010 |
gnn |
Add new, per connection, statistics for TCP, including:
- Retransmitted Packets
- Zero Window Advertisements
- Out of Order Receives
These statistics are available via the -T argument to netstat(1). MFC after: 2 weeks
|
215410 |
16-Nov-2010 |
tuexen |
Add an SCTP socket option to retrieve the number of timeouts of an association.
MFC after: 3 days.
|
215395 |
16-Nov-2010 |
lstewart |
Make the CC framework more VIMAGE friendly by adding the machinery to allow vnets to select their own default CC algorithm independent of each other and the base system. If the base system or a vnet has set a default which gets unloaded, we reset that netstack's default to NewReno.
Sponsored by: FreeBSD Foundation Tested by: Mikolaj Golub <to.my.trociny at gmail com> Reviewed by: bz (briefly) MFC after: 3 months
|
215393 |
16-Nov-2010 |
lstewart |
- Querying the default CC algo is more common than setting it and the function is small, so there is no good reason not to declare the buffer at the top.
- Fix a whitespace nit.
Sponsored by: FreeBSD Foundation MFC after: 11 weeks X-MFC with: r215166
|
215392 |
16-Nov-2010 |
lstewart |
Move protocol specific implementation detail out of the core CC framework.
Sponsored by: FreeBSD Foundation Tested by: Mikolaj Golub <to.my.trociny at gmail com> MFC after: 11 weeks X-MFC with: r215166
|
215391 |
16-Nov-2010 |
lstewart |
On CC algorithm module unload, we walk the list of active TCP control blocks. Any found to be using the algorithm that is about to go away are switched back to NewReno to avoid leaving dangling pointers which would trigger a panic. For VIMAGE kernels, there is a list per vnet to walk, yet the implementation was only examining one of the vnet lists.
Fix the implementation of the above feature for VIMAGE kernels by looping through all active TCP control blocks across all vnets.
Sponsored by: FreeBSD Foundation Tested by: Mikolaj Golub <to.my.trociny at gmail com> Reviewed by: bz (briefly) MFC after: 11 weeks
|
215377 |
16-Nov-2010 |
lstewart |
cc_init() should only be run once on system boot, but with VIMAGE kernels it runs on boot and each time a vnet jail is created. Running cc_init() multiple times results in a panic when attempting to initialise the cc_list lock again, and so r215166 effectively broke the use of vnet jails.
Switch to using a SYSINIT to run cc_init() on boot. CC algorithm modules loaded on boot register in the same SI_SUB_PROTO_IFATTACHDOMAIN category as is used in this patch, so cc_init() is run at SI_ORDER_FIRST to ensure the framework is initialised before module registration is attempted.
Sponsored by: FreeBSD Foundation Reported and tested by: Mikolaj Golub <to.my.trociny at gmail com> MFC after: 11 weeks X-MFC with: r215166
|
215317 |
14-Nov-2010 |
dim |
Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout the tree.
|
215305 |
14-Nov-2010 |
tuexen |
Take out the special code for disabling CRC computations on the loopback interface for IPv6. It will be handled by the loopback interface.
|
215301 |
14-Nov-2010 |
tuexen |
Simplify sctp_delayed_cksum() a bit.
MFC after: 3 days.
|
215241 |
13-Nov-2010 |
tuexen |
Fix a locking issue reported by brucec@ affecting 1-to-1 style sockets which have not yet been accepted.
MFC after: 3 days.
|
215207 |
12-Nov-2010 |
gnn |
Add a queue to hold packets while we await an ARP reply.
When a fast machine first brings up some non TCP networking program it is quite possible that we will drop packets due to the fact that only one packet can be held per ARP entry. This leads to packets being missed when a program starts or restarts if the ARP data is not currently in the ARP cache.
This code adds a new sysctl, net.link.ether.inet.maxhold, which defines a system wide maximum number of packets to be held in each ARP entry. Up to maxhold packets are queued until an ARP reply is received or the ARP times out. The default setting is the old value of 1 which has been part of the BSD networking code since time immemorial.
Expose the time we hold an incomplete ARP entry by adding the sysctl net.link.ether.inet.wait, which defaults to 20 seconds, the value used when the new ARP code was added.
Reviewed by: bz, rpaulo MFC after: 3 weeks
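The bounded hold queue described above can be modelled in a few lines. This is a sketch, not the kernel implementation: the class and method names are hypothetical, and dropping the oldest packet when the queue is full is an assumption made for the example, since the message does not say which packet is discarded.

```python
from collections import deque


class ArpHoldQueue:
    """Packets waiting for one unresolved ARP entry."""

    def __init__(self, maxhold=1):
        if maxhold < 1:
            raise ValueError("maxhold must be at least 1")
        self.maxhold = maxhold       # models net.link.ether.inet.maxhold
        self.held = deque()
        self.dropped = 0

    def hold(self, packet):
        """Queue a packet, discarding the oldest one if the queue is full."""
        if len(self.held) >= self.maxhold:
            self.held.popleft()      # assumption: the oldest packet goes first
            self.dropped += 1
        self.held.append(packet)

    def resolved(self):
        """ARP reply received: hand back everything queued, in order."""
        packets = list(self.held)
        self.held.clear()
        return packets

    def timed_out(self):
        """ARP timed out: discard everything and report how many were lost."""
        lost = len(self.held)
        self.dropped += lost
        self.held.clear()
        return lost
```

With the default `maxhold=1` the model behaves like the old code: a second packet sent before the reply arrives displaces the first.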
|
215199 |
12-Nov-2010 |
tuexen |
Don't print an empty line when printing mapping arrays.
MFC after: 3 days.
|
215198 |
12-Nov-2010 |
tuexen |
Fix more issues with the SACK/NR-SACK generation code.
MFC after: 3 days.
|
215179 |
12-Nov-2010 |
luigi |
The first customer of the SO_USER_COOKIE option: the "sockarg" ipfw option matches packets associated to a local socket and with a non-zero so_user_cookie value. The value is made available as tablearg, so it can be used as a skipto target or pipe number in ipfw/dummynet rules.
Code by Paul Joe, manpage by me.
Submitted by: Paul Joe MFC after: 1 week
|
215166 |
12-Nov-2010 |
lstewart |
This commit marks the first formal contribution of the "Five New TCP Congestion Control Algorithms for FreeBSD" FreeBSD Foundation funded project. More details about the project are available at: http://caia.swin.edu.au/freebsd/5cc/
- Add a KPI and supporting infrastructure to allow modular congestion control algorithms to be used in the net stack. Algorithms can maintain per-connection state if required, and connections maintain their own algorithm pointer, which allows different connections to concurrently use different algorithms. The TCP_CONGESTION socket option can be used with getsockopt()/setsockopt() to programmatically query or change the congestion control algorithm respectively from within an application at runtime.
- Integrate the framework with the TCP stack in as least intrusive a manner as possible. Care was also taken to develop the framework in a way that should allow integration with other congestion aware transport protocols (e.g. SCTP) in the future. The hope is that we will one day be able to share a single set of congestion control algorithm modules between all congestion aware transport protocols.
- Introduce a new congestion recovery (TF_CONGRECOVERY) state into the TCP stack and use it to decouple the meaning of recovery from a congestion event and recovery from packet loss (TF_FASTRECOVERY) a la RFC2581. ECN and delay based congestion control protocols don't generally need to recover from packet loss and need a different way to note a congestion recovery episode within the stack.
- Remove the net.inet.tcp.newreno sysctl, which simplifies some portions of code and ensures the stack always uses the appropriate mechanisms for recovering from packet loss during a congestion recovery episode.
- Extract the NewReno congestion control algorithm from the TCP stack and massage it into module form. NewReno is always built into the kernel and will remain the default algorithm for the foreseeable future. Implementations of additional different algorithms will become available in the near future.
- Bump __FreeBSD_version to 900025 and note in UPDATING that rebuilding code that relies on the size of "struct tcpcb" is required.
Many thanks go to the Cisco University Research Program Fund at Community Foundation Silicon Valley and the FreeBSD Foundation. Their support of our work at the Centre for Advanced Internet Architectures, Swinburne University of Technology is greatly appreciated.
In collaboration with: David Hayes <dahayes at swin edu au> and Grenville Armitage <garmitage at swin edu au> Sponsored by: Cisco URP, FreeBSD Foundation Reviewed by: rpaulo Tested by: David Hayes (and many others over the years) MFC after: 3 months
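From an application's point of view, the TCP_CONGESTION socket option mentioned above is used through the ordinary getsockopt()/setsockopt() calls. The sketch below uses Python's `socket` module, which only exposes `TCP_CONGESTION` on platforms whose headers define it; the helper names and the 16-byte name buffer are assumptions for the example, and both helpers fail softly when the option is unavailable.

```python
import socket


def get_cc_algorithm(sock):
    """Return the congestion control algorithm name for sock, or None."""
    opt = getattr(socket, "TCP_CONGESTION", None)
    if opt is None:
        return None                      # option not exposed on this platform
    try:
        raw = sock.getsockopt(socket.IPPROTO_TCP, opt, 16)
    except OSError:
        return None
    # The kernel returns a NUL-padded name.
    return raw.split(b"\0", 1)[0].decode("ascii", errors="replace")


def set_cc_algorithm(sock, name):
    """Try to select a congestion control algorithm; return True on success."""
    opt = getattr(socket, "TCP_CONGESTION", None)
    if opt is None:
        return False
    try:
        sock.setsockopt(socket.IPPROTO_TCP, opt, name.encode("ascii"))
    except OSError:
        return False                     # e.g. the algorithm is not loaded
    return True
```

Because each connection keeps its own algorithm pointer, a call such as `set_cc_algorithm(sock, "newreno")` affects only that socket.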
|
215153 |
12-Nov-2010 |
lstewart |
Standardise all Swinburne related copyright/licence statements throughout the tree in preparation for another large code import. Swinburne University is the legal entity that owns copyright and the 2-clause BSD licence is acceptable.
|
215152 |
12-Nov-2010 |
lstewart |
The university does not require that its CRICOS number be included in source code. Remove all references from the tree.
MFC after: 3 days
|
215134 |
11-Nov-2010 |
tuexen |
Fix the SACK/NR-SACK generation code.
MFC after: 3 days.
|
215110 |
11-Nov-2010 |
rrs |
Fix so that a multicast packet can be sent even if there is no route out to that mcast address. The code in in_pcb inadvertently would error (no route) even though the user may have specified the address with the proper socket option (to specify the egress interface). Thanks bz for reminding me I forgot to commit this ;-)
Reviewed by: bz MFC after: 1 week
|
215039 |
09-Nov-2010 |
tuexen |
Improve the scalability by using the local and remote port when putting inps in the tcpephash.
MFC after: 3 days.
|
215035 |
09-Nov-2010 |
tuexen |
Fix a bug which resulted in kevent() reporting an event twice on 1-to-1 style sockets when an ABORT was received.
MFC after: 3 days.
|
215034 |
09-Nov-2010 |
brucec |
Fix typos.
PR: bin/148894 Submitted by: olgeni
|
214939 |
07-Nov-2010 |
tuexen |
Do not have the MTU table twice in the code. Therefore move the function from the timer code to util, rename it appropriately and also fix a bug in sctp_get_prev_mtu(), where calling it with a value existing in the MTU table did not return a smaller one.
MFC after: 3 days.
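The lookup behaviour that was fixed can be shown with a sorted table and a binary search. This is a sketch under assumptions: the table values are illustrative common MTUs rather than the kernel's table, and the function names are hypothetical.

```python
import bisect

# Illustrative, sorted MTU plateau values (not the kernel's table).
MTU_TABLE = [68, 296, 508, 1006, 1492, 1500, 2002, 4352, 8166, 17914, 32000, 65535]


def get_prev_mtu(value):
    """Largest table entry strictly smaller than value (or the smallest entry)."""
    # bisect_left finds the first entry >= value, so the entry before it is
    # strictly smaller even when value itself is in the table.
    i = bisect.bisect_left(MTU_TABLE, value)
    if i == 0:
        return MTU_TABLE[0]
    return MTU_TABLE[i - 1]


def get_next_mtu(value):
    """Smallest table entry strictly larger than value (or the largest entry)."""
    i = bisect.bisect_right(MTU_TABLE, value)
    if i == len(MTU_TABLE):
        return MTU_TABLE[-1]
    return MTU_TABLE[i]
```

The described bug corresponds to returning the matching entry itself: `get_prev_mtu(1500)` must yield 1492 here, not 1500.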
|
214933 |
07-Nov-2010 |
tuexen |
Remove two functions which are not used.
MFC after: 3 days.
|
214928 |
07-Nov-2010 |
tuexen |
* Use exponential backoff for retransmission of SHUTDOWN and SHUTDOWN-ACK chunks.
* While there, do some cleanups.
MFC after: 3 days.
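Exponential backoff means each retransmission waits twice as long as the previous one, up to a ceiling. The sketch below only illustrates that schedule; the function name, parameters, and the example values are assumptions and are not taken from the SCTP code.

```python
def backoff_schedule(rto_initial, rto_max, attempts):
    """Return the timeout used before each of `attempts` retransmissions."""
    if rto_initial <= 0 or rto_max <= 0:
        raise ValueError("timeouts must be positive")
    timeouts = []
    rto = min(rto_initial, rto_max)
    for _ in range(attempts):
        timeouts.append(rto)
        rto = min(rto * 2, rto_max)   # double, but never exceed the ceiling
    return timeouts
```

For example, starting at 1 second with a 60 second ceiling, the waits grow as 1, 2, 4, 8, 16, 32 and then stay at 60.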
|
214918 |
07-Nov-2010 |
tuexen |
Not only stop all timers when entering the SHUTDOWN_SENT state, but also when entering the SHUTDOWN_ACK_SEND state.
MFC after: 3 days.
|
214877 |
06-Nov-2010 |
tuexen |
Do not resend DATA chunks without delay when dropped by the peer and the CRC was correct.
MFC after: 3 days.
|
214876 |
06-Nov-2010 |
tuexen |
* Fix an accounting bug regarding SACK/NR-SACK chunks.
* Fix the generation of the SACK/NR-SACK gap lists.
MFC after: 3 days.
|
214754 |
03-Nov-2010 |
n_hibma |
Don't spam the console with loaded modules during boot and/or during startup of ppp.
Note: This cannot be hidden behind bootverbose as this file is included from lib/libalias as well.
|
214675 |
02-Nov-2010 |
jhb |
Don't leak the LLE lock if the arptimer callout is pending or inactive.
Reported by: David Rhodus MFC after: 1 month
|
214509 |
29-Oct-2010 |
glebius |
Remove a meaningless XXXXX that is a remnant of a comment removed in r186200.
|
214508 |
29-Oct-2010 |
glebius |
Revert a small part of r198301 that is entirely unrelated to r198301 itself. It also broke the logic of not sending more than one ARP request per second, which consequently led to a potential problem of flooding the network with broadcast packets.
MFC after: 1 week
|
214303 |
24-Oct-2010 |
bz |
Add initial inet DDB support for show in_ifaddr and show sin commands which proved to be useful while debugging address list problems.
MFC after: 6 days
|
214250 |
23-Oct-2010 |
bz |
Make the IPsec SADB embedded route cache a union to be able to hold both the legacy and IPv6 route destination address. Previously in case of IPv6, there was a memory overwrite due to not enough space for the IPv6 address.
PR: kern/122565 MFC After: 2 weeks
|
214054 |
19-Oct-2010 |
uqs |
mdoc: drop even more redundant .Pp calls
No change in rendered output, fewer mandoc lint warnings.
Tool provided by: Nobuyuki Koganemaru n-kogane at syd.odn.ne.jp
|
213932 |
16-Oct-2010 |
bz |
MfP4 CH182763 (original version):
Make it harder to exploit certain in_control() related races between the initial lookup at the beginning and the time we will remove the entry from the lists by re-checking that the entry is still in the list before trying to remove it.
(*) It is believed that with the current code and locking strategy we cannot completely fix all races.
Reported by: Nima Misaghian (nima_misa hotmail.com) on net@ 20100817 Tested by: Nima Misaghian (nima_misa hotmail.com) (original version) PR: kern/146250 Submitted by: Mikolaj Golub (to.my.trociny gmail.com) (different version) MFC after: 1 week
|
213913 |
16-Oct-2010 |
lstewart |
Retire the system-wide, per-reassembly queue segment limit. The mechanism is far too coarse grained to be useful and the default value significantly degrades TCP performance on moderate to high bandwidth-delay product paths with non-zero loss (e.g. 5+Mbps connections across the public Internet often suffer).
Replace the outgoing mechanism with an individual per-queue limit based on the number of MSS segments that fit into the socket's receive buffer. This should strike a good balance between performance and the potential for resource exhaustion when FreeBSD is acting as a TCP receiver. With socket buffer autotuning (which is enabled by default), the reassembly queue tracks the socket buffer and benefits too.
As the XXX comment suggests, my testing uncovered some unexpected behaviour which requires further investigation. By using so->so_rcv.sb_hiwat instead of sbspace(&so->so_rcv), we allow more segments to be held across both the socket receive buffer and reassembly queue than we probably should. The tradeoff is better performance in at least one common scenario, versus a devious sender's ability to consume more resources on a FreeBSD receiver.
Sponsored by: FreeBSD Foundation Reviewed by: andre, gnn, rpaulo MFC after: 2 weeks
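The per-queue limit described above is essentially a division of the receive buffer size by the segment size. The sketch below illustrates that calculation only; the function names are hypothetical, `rcv_hiwat` stands in for so->so_rcv.sb_hiwat, and the exact rounding used by the kernel is not stated in the message, so the floor-with-minimum-of-one rule is an assumption.

```python
def reass_queue_limit(rcv_hiwat, mss):
    """Maximum number of segments one reassembly queue may hold.

    rcv_hiwat : socket receive buffer high-water mark, in bytes
    mss       : maximum segment size of the connection, in bytes
    """
    if mss <= 0:
        raise ValueError("mss must be positive")
    # Number of full-sized segments that fit in the receive buffer;
    # always allow at least one so the missing segment can be queued.
    return max(rcv_hiwat // mss, 1)


def may_queue_segment(queue_len, rcv_hiwat, mss):
    """True if an out-of-order segment may still be added to the queue."""
    return queue_len < reass_queue_limit(rcv_hiwat, mss)
```

Because the limit is derived from the buffer's high-water mark, it grows automatically when socket buffer autotuning enlarges the receive buffer.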
|
213912 |
16-Oct-2010 |
lstewart |
- Switch the "net.inet.tcp.reass.cursegments" and "net.inet.tcp.reass.maxsegments" sysctl variables to be based on UMA zone stats. The value returned by the cursegments sysctl is approximate owing to the way in which uma_zone_get_cur is implemented.
- Discontinue use of V_tcp_reass_qsize as a global reassembly segment count variable in the reassembly implementation. The variable was used without proper synchronisation and was duplicating accounting done by UMA already. The lack of synchronisation was particularly problematic on SMP systems terminating many TCP sessions, resulting in poor TCP performance for connections with non-zero packet loss.
Sponsored by: FreeBSD Foundation Reviewed by: andre, gnn, rpaulo (as part of a larger patch) MFC after: 2 weeks
|
213832 |
14-Oct-2010 |
bz |
Use ifa_ifwithaddr_check() rather than ifa_ifwithaddr() as we are not interested in the result and would leak a reference otherwise.
PR: kern/151435 Submitted by: Andrew Boyer (aboyer averesystems.com) MFC after: 3 days
|
213329 |
01-Oct-2010 |
luigi |
put back the assignment to sched_time. It was correct, and it was necessary.
Submitted by: Riccardo Panicucci
|
213325 |
01-Oct-2010 |
bz |
Proper bracketing.
PR: kern/151100 Submitted by: SunMinghao (sunminghao hotmail.com) MFC after: 3 days
|
213279 |
29-Sep-2010 |
luigi |
remove an unnecessary (and wrong) assignment. It was meant to reset idle_time (and it was not needed), but i even used the wrong field.
Obtained from: Oleg MFC after: 3 days
|
213267 |
29-Sep-2010 |
luigi |
whitespace changes in preparation for future commits
|
213265 |
29-Sep-2010 |
luigi |
fix handling of initial credit for an idle pipe. This fixes the bug where setting bw > 1 MTU/tick resulted in infinite bandwidth if io_fast=1
PR: 147245 148429 Obtained from: Riccardo Panicucci MFC after: 3 days
|
213254 |
28-Sep-2010 |
luigi |
fix breakage in in-kernel NAT: the code did not honor net.inet.ip.fw.one_pass and always moved to the next rule in case of a successful nat.
This should fix several related PRs (waiting for feedback before closing them)
PR: 145167 149572 150141 MFC after: 3 days
|
213253 |
28-Sep-2010 |
luigi |
Whitespace changes to reduce diffs wrt the most recent ipfw/dummynet code: + remove an unused macro, + adjust the constants in an enum + small whitespace changes
MFC after: 3 days
|
213225 |
27-Sep-2010 |
delphij |
Add a bandaid for a long-standing race condition during route entry un-expiring.
The previous version of the code had no locking when testing rt_refcnt. The lack of locking could lead to a condition where a routing entry has a reference count but at the same time has the RTPRF_OURS bit set and an expiration timer. This would eventually lead to a panic:
panic: rtqkill route really not free
This can happen, for instance, when the system accepts ICMP redirects from a local gateway at a moderate frequency.
Commit this workaround for now until we have some better solution.
PR: kern/149804 Reviewed by: bz Tested by: Zhao Xin, Pete French MFC after: 2 weeks
|
213162 |
25-Sep-2010 |
lstewart |
Log the number of segments currently in the reassembly queue.
Sponsored by: FreeBSD Foundation
|
213158 |
25-Sep-2010 |
lstewart |
Internalise reassembly queue related functionality and variables which should not be used outside of the reassembly queue implementation. Provide a new function to flush all segments from a reassembly queue and call it from the appropriate places instead of manipulating the queue directly.
Sponsored by: FreeBSD Foundation Reviewed by: andre, gnn, rpaulo MFC after: 2 weeks
|
213103 |
24-Sep-2010 |
attilio |
Make the RPC specific __rpc_inet_ntop() and __rpc_inet_pton() general in the kernel (just as inet_ntoa() and inet_aton() are) and sync their prototypes with those of the already mentioned functions.
Sponsored by: Sandvine Incorporated Reviewed by: emaste, rstone Approved by: dfr MFC after: 2 weeks
|
213101 |
24-Sep-2010 |
attilio |
IP_BINDANY is not correctly handled in getsockopt() case. Fix it by specifying the correct bits.
Sponsored by: Sandvine Incorporated Reviewed by: bz, emaste, rstone Obtained from: Sandvine Incorporated MFC after: 10 days
|
212898 |
20-Sep-2010 |
glebius |
Do not convert some meaningful error value to EINVAL.
Reviewed by: will
|
212897 |
20-Sep-2010 |
tuexen |
Fix a locking issue which resulted in aborted associations due to a corrupted nr-mapping array.
MFC after: 2 weeks.
|
212851 |
19-Sep-2010 |
tuexen |
Allow the initial congestion window to be configured to one MTU. Improve the description.
MFC after: 2 weeks.
|
212850 |
19-Sep-2010 |
tuexen |
Fix a locking issue which shows up when the code is used on Mac OS X.
MFC after: 2 weeks.
|
212803 |
17-Sep-2010 |
andre |
Rearrange the TSO code to make it more readable and to clearly separate the decision logic, of whether we can do TSO, and the calculation of the burst length into two distinct parts.
Change the way the TSO burst length calculation is done. While TSO could do bursts of 65535 bytes, that can't be represented in ip_len together with the IP and TCP header. Account for that and use IP_MAXPACKET instead of TCP_MAXWIN as base constant (both have the same value of 64K). When more data is available, prevent less than MSS sized segments from being sent during the current TSO burst.
Add two more KASSERTs to ensure the integrity of the packets.
Tested by: Ben Wilber <ben-at-desync com> MFC after: 10 days
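One way to picture the burst length rule described above is the sketch below. It is an assumption-laden illustration, not the code in tcp_output(): the function name, the `hdrlen` parameter (combined IP and TCP header length) and the local constant are hypothetical.

```c
#include <stdint.h>

/* Local stand-in for IP_MAXPACKET (largest value ip_len can hold). */
#define SKETCH_IP_MAXPACKET 65535u

/*
 * Hypothetical sketch: pick the payload length of one TSO burst.
 * avail  - bytes ready to send
 * mss    - maximum segment size
 * hdrlen - combined IP and TCP header length
 */
static uint32_t
tso_burst_len(uint32_t avail, uint32_t mss, uint32_t hdrlen)
{
	uint32_t max_payload;

	if (mss == 0 || hdrlen >= SKETCH_IP_MAXPACKET)
		return (0);
	/* Payload plus headers must still fit into ip_len. */
	max_payload = SKETCH_IP_MAXPACKET - hdrlen;
	if (avail <= max_payload)
		return (avail);	/* everything fits into this burst */
	/* More data remains: send only whole MSS-sized segments now. */
	return (max_payload - (max_payload % mss));
}
```

With 52 bytes of headers and an MSS of 1448, a large send is trimmed to 45 full segments (65160 bytes), so the burst never ends in a short segment while more data is waiting.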
|
212801 |
17-Sep-2010 |
tuexen |
Fix a bug where the wrong PR-SCTP policy was considered. While there, always use the same code for the check of TTL expiration.
MFC after: 2 weeks.
|
212800 |
17-Sep-2010 |
tuexen |
Make the initial congestion window configurable via sysctl.
MFC after: 2 weeks.
|
212799 |
17-Sep-2010 |
tuexen |
* Implement initial version of send buffer splitting. * Make send/recv buffer splitting switchable via sysctl. * While there: Fix some comments.
|
212765 |
16-Sep-2010 |
andre |
Remove the TCP inflight bandwidth limiter as announced in r211315 to give way for the pluggable congestion control framework. It is the task of the congestion control algorithm to set the congestion window and amount of inflight data without external interference.
In 'struct tcpcb' the variables previously used by the inflight limiter are renamed to spares to keep the ABI intact and to have some more space for future extensions.
In 'struct tcp_info' the variable 'tcpi_snd_bwnd' is not removed to preserve the ABI. It is always set to 0.
In siftr.c in 'struct pkt_node' the variable 'snd_bwnd' is not removed to preserve the ABI. It is always set to 0.
These unused variables in the various structures may be reused in the future or garbage collected before the next release or at some other point when an ABI change happens anyway for other reasons.
No MFC is planned. The inflight bandwidth limiter stays disabled by default in the other branches but remains available.
|
212731 |
16-Sep-2010 |
andre |
Improve comment to TCP_MINMSS by taking the wording from lstewart (with a small difference in the last paragraph though) as suggested by jhb.
Clarify that the 'reviewed by' in r212653 by lstewart was for the functional change, not the comments in the committed version.
|
212714 |
16-Sep-2010 |
tuexen |
Remove old debug code.
MFC after: 2 weeks.
|
212713 |
15-Sep-2010 |
tuexen |
Remove unused variable/assignment.
MFC after: 3 weeks.
|
212712 |
15-Sep-2010 |
tuexen |
Delay the assignment of a path for DATA chunk until they hit the sent_queue. Honor a given path when the SCTP_ADDR_OVER flag is set.
MFC after: 2 weeks.
|
212711 |
15-Sep-2010 |
tuexen |
Use TAILQ_EMPTY() for testing if a tail queue is empty. Set whoFrom to NULL after freeing whoFrom.
|
212707 |
15-Sep-2010 |
tuexen |
Remove unused variable/assignment.
MFC after: 2 weeks.
|
212704 |
15-Sep-2010 |
tuexen |
Remove assignment without effect.
MFC after: 2 weeks.
|
212702 |
15-Sep-2010 |
tuexen |
* Use !TAILQ_EMPTY() for checking if a tail queue is not empty. * Remove assignment without any effect.
MFC after: 2 weeks.
|
212653 |
15-Sep-2010 |
andre |
Change the default MSS for IPv4 and IPv6 TCP connections from an artificial power-of-2 rounded number to their real values specified in RFC879 and RFC2460.
From the history and existing comments it appears that the rounded numbers were intended to be advantageous for the kernel and mbuf system. However this hasn't been the case for at least a long time. The mbuf clusters used in tcp_output() have enough space to hold the larger real value for the default MSS for both IPv4 and IPv6. Note that the default MSS is only used when path MTU discovery is disabled.
Update and expand related comments.
Reviewed by: lstewart (including some word-smithing) MFC after: 2 weeks
|
212502 |
12-Sep-2010 |
qingli |
Adding an address on an interface also requires the loopback route to that address be installed.
PR: kern/150481 Submitted by: Ingo Flaschberger <if at xip.at> MFC after: 5 days
|
212380 |
09-Sep-2010 |
tuexen |
* Remove code which has no effect. * Clean up the handling in sctp_lower_sosend().
MFC after: 3 weeks.
|
212266 |
06-Sep-2010 |
will |
Fix CARP in backup mode by properly registering its hooks for INET and INET6 using ipproto_{un,}register() and the newly created ip6proto_{un,}register() so that it can again receive IPPROTO_CARP packets allowing its state machine to work.
Reviewed by: bz Approved by: ken (mentor)
|
212265 |
06-Sep-2010 |
will |
Fix static kernel builds with carp(4) by changing its SYSINIT order so that it is initialized after basic protocol initialization, which allows it to register via pf_proto_register().
Reviewed by: bz Approved by: ken (mentor)
|
212256 |
06-Sep-2010 |
glebius |
in_delayed_cksum() requires host byte order.
Reported by: Alexander Levin <amindomao googlemail.com> MFC after: 1 week
|
212242 |
05-Sep-2010 |
tuexen |
Implement correct handling of address parameter and sendinfo for SCTP send calls.
MFC after: 4 weeks.
|
212225 |
05-Sep-2010 |
rrs |
Fix some CLANG warnings. One clang warning is left due to the fact that it's bogus: nam->sa_family will not change from AF_INET6 to AF_INET (but clang thinks it does ;-D)
|
212209 |
04-Sep-2010 |
bz |
In case of RADIX_MPATH do not leak the IN_IFADDR read lock on early return.
MFC after: 3 days
|
212155 |
02-Sep-2010 |
bz |
MFp4 CH=183052 183053 183258:
In protosw we define pr_protocol as short, while on the wire it is an uint8_t. That way we can have "internal" protocols like DIVERT, SEND or gaps for modules (PROTO_SPACER). Switch ipproto_{un,}register to accept a short protocol number(*) and do an upfront check for valid boundaries. With this we also consistently report EPROTONOSUPPORT for out of bounds protocols, as we did for proto == 0. This allows a caller to not error for this case, which is especially important if we want to automatically call these from domain handling.
(*) the functions have been without any in-tree consumer since the initial introduction, so this is considered safe.
Implement ip6proto_{un,}register() similarly to their legacy IP counterparts to allow modules to hook up dynamically.
Reviewed by: philip, will MFC after: 1 week
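The upfront bounds check can be illustrated with a small sketch. The function name and the upper bound constant are hypothetical; the only parts taken from the message are the short argument type and the EPROTONOSUPPORT result for zero and out of bounds values.

```c
#include <errno.h>

/* Assumed size of the protocol switch table; hypothetical constant. */
#define SKETCH_IPPROTO_MAX 256

/*
 * Hypothetical sketch of the upfront check: accept a short protocol
 * number and reject zero and anything outside the table uniformly.
 */
static int
proto_number_check(short proto)
{
	if (proto <= 0 || proto >= SKETCH_IPPROTO_MAX)
		return (EPROTONOSUPPORT);
	return (0);
}
```

Returning one consistent error lets a caller treat "not a registrable protocol" as a non-fatal case rather than distinguishing several failure codes.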
|
212099 |
01-Sep-2010 |
tuexen |
Fix a bug which results in peer IPv4 addresses a.b.c.d with 224<=d<=239 incorrectly being detected as multicast addresses on little endian systems.
MFC after: 2 weeks
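The symptom described above is what one would expect if a network byte order address is tested as if it were in host byte order; that reading is an assumption, and the sketch below uses hypothetical helper names rather than the SCTP code.

```c
#include <stdint.h>
#include <arpa/inet.h>

/* Class D test on a host byte order address: top four bits are 1110. */
static int
is_multicast_host_order(uint32_t addr_host)
{
	return ((addr_host & 0xf0000000u) == 0xe0000000u);
}

/* Convert a network byte order address first, then test it. */
static int
is_multicast_net_order(uint32_t addr_net)
{
	return (is_multicast_host_order(ntohl(addr_net)));
}
```

On a little endian host, skipping the ntohl() step would make the last octet d land in the top byte, so 10.0.0.224 would be misread as multicast.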
|
211992 |
30-Aug-2010 |
maxim |
o Some programs could send broadcast/multicast traffic to ipfw pseudo-interface. This leads to a panic due to uninitialized if_broadcastaddr address. Initialize it and implement ip_output() method to prevent mbuf leak later.
ipfw pseudo-interface should never send anything therefore call panic(9) in if_start() method.
PR: kern/149807 Submitted by: Dmitrij Tejblum MFC after: 2 weeks
|
211969 |
29-Aug-2010 |
tuexen |
Fix the SCTP_WITH_NO_CSUM option when used in combination with an interface supporting CRC offload. While at it, make use of the feature that the loopback interface provides CRC offloading.
MFC after: 4 weeks
|
211950 |
28-Aug-2010 |
tuexen |
Bugfix: Do not send a packet drop report in response to a received INIT-ACK with incorrect CRC.
|
211944 |
28-Aug-2010 |
tuexen |
Fix the switching on/off of CMT using sysctl and socket option. Fix the switching on/off of PF and NR-SACKs using sysctl. Add minor improvement in handling malloc failures. Improve the address checks when sending.
MFC after: 4 weeks
|
211888 |
27-Aug-2010 |
jhb |
Simplify the tcp pcblist estimate logic slightly.
MFC after: 3 days
|
211874 |
27-Aug-2010 |
andre |
Use timestamp modulo comparison macro for automatic receive buffer scaling to correctly handle wrapping of ticks value.
MFC after: 1 week
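A modulo comparison of this kind is usually written as a signed test on the difference. The sketch below shows the idea with a hypothetical function name; it is not the macro used in the tree.

```c
#include <stdint.h>

/*
 * Wrap-safe "a is at or after b" for 32-bit tick counters.
 * The unsigned subtraction wraps modulo 2^32; interpreting the
 * result as signed gives the right answer as long as the two
 * values are less than 2^31 ticks apart.
 */
static int
ticks_geq(uint32_t a, uint32_t b)
{
	return ((int32_t)(a - b) >= 0);
}
```

A plain `a >= b` would report that a counter which has just wrapped to 5 is earlier than 0xfffffff0, which is the failure the change addresses.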
|
211501 |
19-Aug-2010 |
anchie |
MFp4: anchie_soc2009 branch:
Add kernel side support for Secure Neighbor Discovery (SeND), RFC 3971.
The implementation consists of a kernel module that gets packets from the nd6 code, sends them to user space on a dedicated socket and reinjects them back for further processing.
Hooks are used from nd6 code paths to divert relevant packets to the send implementation for processing in user space. The hooks are only triggered if the send module is loaded. In case no user space application is connected to the send socket, processing continues normally as if the module would not be loaded. Unloading the module is not possible at this time due to missing nd6 locking.
The native SeND socket is similar to a raw IPv6 socket but with its own, internal pseudo-protocol.
Approved by: bz (mentor)
|
211464 |
18-Aug-2010 |
andre |
If a TCP connection has been idle for one retransmit timeout or more it must reset its congestion window back to the initial window.
RFC3390 has increased the initial window from 1 segment to up to 4 segments.
The initial window increase of RFC3390 wasn't reflected into the restart window which remained at its original defaults of 4 segments for local and 1 segment for all other connections. Both values are controllable through sysctl net.inet.tcp.local_slowstart_flightsize and net.inet.tcp.slowstart_flightsize.
The increase helps TCP's slow start algorithm to open up the congestion window much faster.
Reviewed by: lstewart MFC after: 1 week
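The idle restart rule can be sketched as below. The initial window formula is the one given in RFC 3390, min(4*MSS, max(2*MSS, 4380 bytes)); the function names and the tick-based idle test are hypothetical and only illustrate the behaviour described in the message.

```c
#include <stdint.h>

/* RFC 3390 initial window: min(4*MSS, max(2*MSS, 4380 bytes)). */
static uint32_t
rfc3390_initial_window(uint32_t mss)
{
	uint32_t lower = (2 * mss > 4380) ? 2 * mss : 4380;
	uint32_t upper = 4 * mss;

	return (upper < lower ? upper : lower);
}

/*
 * Hypothetical sketch: after an idle period of at least one
 * retransmit timeout, fall back to the initial window.
 */
static uint32_t
cwnd_after_idle(uint32_t cwnd, uint32_t idle_ticks, uint32_t rto_ticks,
    uint32_t mss)
{
	if (idle_ticks >= rto_ticks)
		return (rfc3390_initial_window(mss));
	return (cwnd);
}
```

For a 1460 byte MSS this restarts at 4380 bytes (three segments) instead of a single segment.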
|
211462 |
18-Aug-2010 |
andre |
Untangle the net.inet.tcp.log_in_vain and net.inet.tcp.log_debug sysctl's and remove any side effects.
Both sysctl's share the same backend infrastructure and due to the way it was implemented enabling net.inet.tcp.log_in_vain would also cause log_debug output to be generated. This was surprising and eventually annoying to the user.
The log output backend is kept the same but a little shim is inserted to properly separate log_in_vain and log_debug and to remove any side effects.
PR: kern/137317 MFC after: 1 week
|
211451 |
18-Aug-2010 |
bz |
When calculating the expected memory size for userspace, also take the number of syncache entries into account for the surplus we add to account for a possible increase of records in the re-entry window.
Discussed with: jhb, silby MFC after: 1 week
|
211433 |
17-Aug-2010 |
jhb |
Ensure a minimum "slop" of 10 extra pcb structures when providing a memory size estimate to userland for pcb list sysctls. The previous behavior of a "slop" of n/8 does not work well for small values of n (e.g. no slop at all if you have less than 8 open UDP connections).
Reviewed by: bz MFC after: 1 week
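The new estimate amounts to taking the larger of n/8 and 10 as headroom. A minimal sketch, with a hypothetical function name and counting structures rather than bytes:

```c
#include <stddef.h>

/*
 * Hypothetical sketch: number of pcb structures to budget for when
 * n are currently in use. The slop is n/8 but never less than 10.
 */
static size_t
pcblist_count_estimate(size_t n)
{
	size_t slop = n / 8;

	if (slop < 10)
		slop = 10;
	return (n + slop);
}
```

With 4 open sockets the old rule left no headroom (4/8 is 0), while this leaves room for 10 more.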
|
211333 |
15-Aug-2010 |
andre |
Fix the interaction between 'ICMP fragmentation needed' MTU updates, path MTU discovery and the tcp_minmss limiter for very small MTU's.
When the MTU suggested by the gateway via ICMP, or if there isn't any the next smaller step from ip_next_mtu(), is lower than the floor enforced by net.inet.tcp.minmss (default 216) the value is ignored and the default MSS (512) is used instead. However the DF flag in the IP header is still set in tcp_output() preventing fragmentation by the gateway.
Fix this by using tcp_minmss as the MSS and clear the DF flag if the suggested MTU is too low. This turns off path MTU discovery for the remainder of the session and allows fragmentation to be done by the gateway.
Only MTU's smaller than 256 are affected. The smallest official MTU specified is for AX.25 packet radio at 256 octets.
PR: kern/146628 Tested by: Matthew Luckie <mjl-at-luckie org nz> MFC after: 1 week
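The decision described above can be sketched as follows. The struct, the function name and the `hdrlen` parameter (IP plus TCP header length, 40 bytes without options) are hypothetical; `minmss` stands in for net.inet.tcp.minmss.

```c
#include <stdint.h>

/* Outcome of processing a suggested MTU. */
struct mss_decision {
	uint32_t mss;	/* MSS to use from now on */
	int df;		/* 1: keep DF set, 0: clear DF and allow fragmentation */
};

/*
 * Hypothetical sketch: derive the MSS from a suggested MTU. If the
 * result falls below the floor, use the floor and give up on path
 * MTU discovery for this session by clearing DF.
 */
static struct mss_decision
mss_from_suggested_mtu(uint32_t mtu, uint32_t hdrlen, uint32_t minmss)
{
	struct mss_decision d;
	uint32_t mss = (mtu > hdrlen) ? mtu - hdrlen : 0;

	if (mss < minmss) {
		d.mss = minmss;
		d.df = 0;
	} else {
		d.mss = mss;
		d.df = 1;
	}
	return (d);
}
```

With the default floor of 216 and 40 bytes of headers, the fallback only triggers for MTUs below 256, which matches the note that only MTUs smaller than 256 are affected.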
|
211332 |
15-Aug-2010 |
andre |
Initializing the new error variable to zero in syncache_socket() is not necessary.
Noticed by: bz
|
211327 |
15-Aug-2010 |
andre |
Add more logging points for failures in syncache_socket() to report when a new socket couldn't be created because one of in_pcbinshash(), in6_pcbconnect() or in_pcbconnect() failed.
Logging is conditional on net.inet.tcp.log_debug being enabled.
MFC after: 1 week
|
211317 |
14-Aug-2010 |
andre |
When using TSO and sending more than TCP_MAXWIN, sendalot is set and we loop back to 'again'. If the remainder is less than or equal to one full segment, the TSO flag was not cleared even though it isn't necessary anymore. Enabling the TSO flag on a segment that doesn't require any offloaded segmentation by the NIC may cause confusion in the driver or hardware.
Reset the internal tso flag in tcp_output() on every iteration of sendalot.
PR: kern/132832 Submitted by: Renaud Lienhart <renaud-at-vmware com> MFC after: 1 week
|
211316 |
14-Aug-2010 |
andre |
Change the messages of the ICMP bad port bandwidth limiter from a kernel printf to a log output with the priority of LOG_NOTICE.
This way the messages still show up in /var/log/messages but no longer spam the console every other second on busy servers that are port scanned: "Limiting open port RST response from 114 to 100 packets/sec"
PR: kern/147352 Submitted by: Eugene Grosbein <eugen-at-eg sd rdtc ru> MFC after: 1 week
|
211315 |
14-Aug-2010 |
andre |
Disable TCP inflight limiter by default.
It was experimental and interferes with the normal congestion control algorithms by instating a separate, possibly lower, ceiling for the amount of data that is in flight to the remote host. With high speed internet connections the inflight limit frequently has been estimated too low due to the noisy nature of the RTT measurements.
This code gives way for the upcoming pluggable congestion control framework. It is the task of the congestion control algorithm to set the congestion window and amount of inflight data without external interference.
Reviewed by: lstewart MFC after: 1 week Removal after: 1 month
|
211193 |
11-Aug-2010 |
will |
Unbreak LINT by moving all carp hooks to net/if.c / netinet/ip_carp.h, with the appropriate ifdefs.
Reviewed by: bz Approved by: ken (mentor)
|
211157 |
11-Aug-2010 |
will |
Allow carp(4) to be loaded as a kernel module. Follow precedent set by bridge(4), lagg(4) etc. and make use of function pointers and pf_proto_register() to hook carp into the network stack.
Currently, because of the uncertainty about whether the unload path is free of race condition panics, unloads are disallowed by default. Compiling with CARPMOD_CAN_UNLOAD in CFLAGS removes this anti foot shooting measure.
This commit requires IP6PROTOSPACER, introduced in r211115.
Reviewed by: bz, simon Approved by: ken (mentor) MFC after: 2 weeks
|
211059 |
08-Aug-2010 |
delphij |
Address an edge condition that we found at work, where the carp(4) interface goes to issue LINK_UP, then LINK_DOWN, then LINK_UP at cold boot. This behavior is not observed when carp(4) interface is created slightly later, when the underlying interface is fully up.
Before this change, what happens at boot is roughly:
- ifconfig creates em0 interface;
- ifconfig clones a carp device using em0; (em0's link state is DOWN at this point)
- carp state: INIT -> BACKUP [*]
- carp state: BACKUP -> MASTER
- [Some negotiation between em0 and switch]
- em0 kicks up link state change event (em0's link state is now up DOWN at this point)
- do_link_state_change() -> carp_carpdev_state()
- carp state: MASTER -> INIT (via carp_set_state(sc, INIT)) [+]
- carp state: INIT -> BACKUP
- carp state: BACKUP -> MASTER
At the [*] stage, em0 has not received any broadcast message from another node and assumes our node is the master, thus carp(4) sets the link state to "UP" after becoming a master. At [+], the master status is forcibly set to "INIT", then an election is held, after which our node would actually become a master.
We believe that at the [*] stage, the master status should remain as "INIT" since the underlying parent interface's link state is not up.
Obtained from: iXsystems, Inc. Reported by: jpaetzel MFC after: 2 months
|
211057 |
08-Aug-2010 |
ed |
Don't use struct timezone.
The timezone structure acquired by gettimeofday() is not used at all. Just remove it.
|
210866 |
05-Aug-2010 |
tuexen |
Fix a bug where endpoints bound to wildcard addresses were using addresses not announced to the peer due to address scoping.
MFC after: 3 weeks
|
210714 |
01-Aug-2010 |
tuexen |
Cleanup code.
MFC after: 2 weeks
|
210703 |
31-Jul-2010 |
bz |
Document the mandatory argument to the arptimer() and nd6_llinfo_timer() functions with a KASSERT(). Note: there is no need to return after panic.
In the legacy IP case, only assign the arg after the check, in the IPv6 case, remove the extra checks for the table and interface as they have to be there unless we freed and forgot to cancel the timer. It doesn't matter anyway as we would panic on the NULL pointer deref immediately and the bug is elsewhere. This unifies the code of both address families to some extent.
Reviewed by: rwatson MFC after: 6 days
|
210686 |
31-Jul-2010 |
bz |
MFp4 @181628:
Free the rtentry after we disconnected it from the FIB and are counting it as rttrash. There might still be a chance we leak it from a different code path but there is nothing we can do about this here.
Sponsored by: ISPsystem (in February) Reviewed by: julian (in February) MFC after: 2 weeks
|
210666 |
30-Jul-2010 |
andre |
Fix a bug in syncache where the initial CWND for new incoming connections was limited to one segment under the faulty assumption of a retransmit. Due to this the opportunity to initialize the increased congestion window according to RFC3390 was missed.
Support for RFC3465 introduced in r187289 uncovered the bug as the ACK to SYN/ACK no longer caused snd_cwnd increase by MSS (actually, this increase shouldn't happen as it's explicitly forbidden by RFC3390, but it's another issue). Snd_cwnd remains really small (1*MSS + 1) and this causes really bad interaction with delayed acks on other side.
The variable name sc_rxmits is a bit misleading as it counts all transmits, not just retransmits.
Submitted by: Maxim Dounin <mdounin-at-mdounin-dot-ru> MFC after: 10 days
|
210600 |
29-Jul-2010 |
rrs |
Fix the comment block that has the nice table to really have the nice table :-)
MFC after: 1 month
|
210599 |
29-Jul-2010 |
rrs |
PR SCTP Bugs. Basically a full sized frame of PR SCTP FWD-TSN's would not be sent and thus cause a stalled connection. Also the rwnd Calculation was also off on the receiver side for PR-SCTP. MFC after: 1 month
|
210537 |
27-Jul-2010 |
glebius |
Fix operation of "netgraph" action in conjunction with the net.inet.ip.fw.one_pass sysctl.
The "ngtee" action is still broken.
PR: kern/148885 Submitted by: Nickolay Dudorov <nnd mail.nsk.ru>
|
210495 |
26-Jul-2010 |
tuexen |
Fix a bug where the length of a FORWARD-TSN chunk was set incorrectly in the chunk. This resulted in malformed frames. Remove a duplicate assignment.
MFC after: 2 weeks
|
210494 |
26-Jul-2010 |
rrs |
Make sure that we report chunks if a socket still exists that were not sent. In either case carefully remove the data if it does not get taken by the reporting routines.
MFC after: 2 weeks
|
210493 |
26-Jul-2010 |
rrs |
When counting the number of chunks in the retransmission queue to validate the retran count, we need to include the chunks in the control send queue too. Otherwise the count will not match and you will get the invariant warning if invariants are on.
MFC after: 2 weeks
|
210203 |
18-Jul-2010 |
lstewart |
- Move common code from the hook functions that fills in a packet node struct to a separate inline function. This further reduces duplicate code that didn't have a good reason to stay as it was.
- Reorder the malloc of a pkt_node struct in the hook functions such that it only occurs if we managed to find a usable tcpcb associated with the packet.
- Make the inp_locally_locked variable's type consistent with the prototype of siftr_siftdata().
Sponsored by: FreeBSD Foundation
|
210160 |
16-Jul-2010 |
imp |
machine/cpu.h isn't appropriate for this file, so remove it
|
210123 |
15-Jul-2010 |
luigi |
remove some conditional #ifdefs (no-op on FreeBSD); run the timer routine on cpu 0.
|
210120 |
15-Jul-2010 |
luigi |
whitespace fixes
|
210119 |
15-Jul-2010 |
luigi |
fix a comment and final empty line
|
209982 |
13-Jul-2010 |
lstewart |
The SIFTR DPCPU statistics struct was not being zeroed between enable/disable cycles so the values would accumulate rather than reset for each cycle.
Sponsored by: FreeBSD Foundation
|
209980 |
13-Jul-2010 |
lstewart |
Catch up with the rename of DPCPU_SUM to DPCPU_VARSUM in r209978.
Sponsored by: FreeBSD Foundation
|
209845 |
09-Jul-2010 |
glebius |
Improve last commit: use bpf_mtap2() to avoid stack usage.
Prodded by: julian
|
209797 |
08-Jul-2010 |
glebius |
Since r209216 bpf(4) searches for mbuf_tags(9) and thus will not work with a stub m_hdr instead of a full mbuf.
PR: kern/148050
|
209663 |
03-Jul-2010 |
rrs |
This fixes a crash in SCTP. It was possible to have a large number of packets queued to a crashing process. In a specific case you may get 2 ABORT's back (from say two packets in flight). If the aborts happened to be processed at the same time it's possible to have one free the association while the other is trying to report all the outbound packets. When this occurred it could lead to a crash.
MFC after: 3 days
|
209662 |
03-Jul-2010 |
lstewart |
Import the Statistical Information For TCP Research (SIFTR) kernel module into FreeBSD. SIFTR logs a range of statistics on active TCP connections to a log file, providing the ability to make highly granular measurements of TCP connection state. The tool is aimed at system administrators, developers and researchers alike. Please take it for a spin and test it out - the man page should have all the information required to get you going.
Many thanks go to the Cisco University Research Program Fund at Community Foundation Silicon Valley and the FreeBSD Foundation. Their support of our work at the Centre for Advanced Internet Architectures, Swinburne University of Technology is greatly appreciated.
Sponsored by: Cisco URP, FreeBSD Foundation Reviewed by: dwmalone, gnn, rpaulo Tested by: Many on freebsd-current@ and elsewhere over the years MFC after: 1 month
|
209644 |
02-Jul-2010 |
rrs |
Fix a bug that WILL cause a panic. Basically a read-lock is being called to check the vtag-timewait cache. Then in two cases (where a vtag is bad i.e. in the time-wait state) the write-unlock is called NOT the read-unlock. Under conditions where lots of associations are coming and going this will cause the system to panic at some point.
MFC after: 3 days
|
209589 |
29-Jun-2010 |
glebius |
After processing the O_SKIPTO opcode our cmd points to the next rule, and "match" processing at the end of inner loop would look ahead into the next rule, which is incorrect. Particularly, in the case when the next rule started with F_NOT opcode it was skipped blindly.
To fix this, exit the inner loop with the continue operator forcibly and explicitly.
PR: kern/147798
|
209499 |
24-Jun-2010 |
tuexen |
Fix a bug I introduced in r209470.
MFC after: 3 days
|
209470 |
23-Jun-2010 |
tuexen |
* Implement sctp_does_stcb_own_this_addr() correctly. It was taking the wrong side into account. * sctp_findassociation_ep_addr() must check the local address if available. This fixes a bug where ABORT chunks were accepted even in the case where the local was not owned by the endpoint. Thanks to brucec for pointing out a bug in my first version of the fix. MFC after: 3 days
|
209289 |
18-Jun-2010 |
tuexen |
Fix a race condition in the shutdown handling. The race condition resulted in a panic.
MFC after: 3 days
|
209178 |
14-Jun-2010 |
tuexen |
* Fix a bug where the length of the ASCONF-ACK was calculated wrong due to using an uninitialized variable. * Fix a bug where a NULL pointer was dereferenced when interfaces come and go at a high rate. * Fix a bug where inps were not deregistered from iterators. * Fix a race condition in freeing an association. * Fix a refcount problem related to the iterator. Each of the above bugs results in a panic. It shows up when interfaces come and go at a high rate.
Obtained from: rrs (partly) MFC after: 3 days
|
209029 |
11-Jun-2010 |
rrs |
3 Fixes - a) There was a case where a ICMP message could cause us to return leaving a stuck lock on an stcb. b) The iterator needed some tweaks to fix its lock ordering. c) The ITERATOR_LOCK is no longer needed in the freeing of a stcb. Now that the timer based one is gone we don't have a multiple resume situation. Add to that that there was somewhere a path out of the freeing of an assoc that did NOT release the iterator_lock.. it was time to clean this old code up and in the process fix the lock bug.
MFC after: 1 week
|
208970 |
09-Jun-2010 |
rrs |
Found by Michael. In cases where we run out of memory (no more inp space) we don't properly NULL the INP on return.
Obtained from: tuexen MFC after: 3 Days
|
208953 |
09-Jun-2010 |
rrs |
Fix several bugs all having to do with freeing an sctp_inpcb: 1) Make sure not to remove the flag on the PCB until after the close() caller is back in control with the lock. Otherwise a quickly freeing assoc could kill the inpcb and cause a panic.
2) Make sure all calls to log_closing have not released the locks before calling the log function, we don't want the logging function to crash us due to a freed inpcb.
3) Make sure that when we get to the end, we release all locks (after removing them from view) and as long as we are NOT the inp-kill timer removing the inp, call the callout_drain() function so a racing timer won't later call in and cause a racing crash. MFC after: 1 week
|
208952 |
09-Jun-2010 |
rrs |
BUG: Turns out we need to use both bit maps to calculate the cum-ack (we were not doing it for the NR-Sack case). With this fix NR-sack should now work correctly. MFC after: 1 week
|
208902 |
08-Jun-2010 |
rrs |
2 Bugs:
1) Only use both mapping arrays when NR sack is off. This way we can hold off moving the cumack (not the best but workable) when NR-sack is on.
2) We must make sure to just return on the move of the bit to the NR array if the cum-ack has already gone past the TSN. This prevents marking a bit behind the array and hitting the invariant code that panics us.
MFC after: 1 week
|
208897 |
07-Jun-2010 |
rrs |
This fixes a BUG in the handling of the cum-ack calculation. We were only paying attention to the nr-mapping-array. Which seems to make sense on the surface, by definition things up to the cum-ack should be deliverable thus in the nr-mapping-array. However (there is always a gotcha) that's not true when it comes to large messages. The stack may hold the message while re-assembling it and not deliver it based on several thresholds. If that happens (which it would for smaller large messages) then the cum-ack is figured wrong. We now properly use both arrays in the cum-ack calculation.
MFC after: 1 week.
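Using both arrays can be pictured as scanning the bitwise OR of the two bitmaps for the first gap. The sketch below is an illustration under assumptions, not the SCTP code: the function name is hypothetical, and bit 0 of byte 0 is assumed to represent the first TSN after the current base.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch: count how many consecutive TSNs, starting at
 * the base, have been received. A TSN counts as received if its bit
 * is set in either the renegable or the non-renegable mapping array.
 */
static size_t
contiguous_received(const uint8_t *mapping, const uint8_t *nr_mapping,
    size_t nbytes)
{
	size_t i, count = 0;

	for (i = 0; i < nbytes; i++) {
		uint8_t both = mapping[i] | nr_mapping[i];

		if (both == 0xff) {
			count += 8;	/* whole byte received, keep going */
			continue;
		}
		/* Count the low-order run of set bits, then stop at the gap. */
		while (both & 1) {
			count++;
			both >>= 1;
		}
		break;
	}
	return (count);
}
```

Looking at only one of the arrays would stop at the first TSN recorded in the other one, which gives a cum-ack that is too low when a partially reassembled message is held back.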
|
208891 |
07-Jun-2010 |
rrs |
Oops... my bad.. we don't need a SOCK_UNLOCK() after calling socantrcvmore_locked() since it will unlock the lock for you.
MFC after: 1 week
|
208883 |
07-Jun-2010 |
rrs |
Fix so we call socantrcvmore_locked so we don't see a race where we unlock to call the non-locked version and have the socket go away.
MFC after: 1 week
|
208879 |
06-Jun-2010 |
rrs |
1) Optimize the cleanup and don't always depend on the timer. This is done by considering the locks we will destroy and if they are contended we consider it the same as a reference count being up. Fixing this appears to clean up another crash that was appearing with all the timers where the socket buf lock got corrupted.
2) Fix the sysctl code to take a lot more care when looking at INP's that are in the GONE or ALLGONE state.
MFC after: 1 week
|
208878 |
06-Jun-2010 |
rrs |
Ok, yet another bug in killing off all the hundreds of apitesters.. Basically we end up with attempting to destroy a lock that's contended on. A cookie echo arrives at the same time that the close is happening. The close gets the lock but the cookie echo has already passed the check for the gone flag and is then locked waiting on the create lock.. when we go to destroy it bam. For now we do the timer destroy for all calls to close.. We can probably optimize this later so that we check what's being contended on and if there is contention then do the timer thing. But this is probably safest since the inp has been removed from all lists and references and only the timer can find it.. once the locks are released all other places will instantly see the GONE flag and bail (that's what the change in sctp_input is: one place that was lacking the bail code).
MFC after: 1 week
|
208876 |
06-Jun-2010 |
rrs |
1) Further enhance the INVARIANT lock validation (that no locks are held) by checking the create and inp locks as well.
2) Fix a bug: when a socket is closed and an INIT-ACK is returned, we do NOT unlock the locked_tcb unless it's different (an unlikely scenario). If we blindly unlock as we were doing before, we can end up unlocking the actual stcb that's about to be sent down to the free function, which requires the lock be held.
MFC after: 1 week
|
208875 |
06-Jun-2010 |
rrs |
Fix a bug in sctp_inpcb_free. Basically, if the socket was set up to do an abortive close, an association that was in the accept_queue could get stuck and never be freed. Now we properly start the kill timer on the socket and turn off the flag (same thing we do for the graceful close method).
MFC after: 1 week
|
208874 |
06-Jun-2010 |
rrs |
Fix a bug in sctp_abort_assoc(). DON'T call the sctp_inpcb_free when the gone flag is set. You don't know what locks the caller has set and there is already a kill timer running.
MFC after: 1 week
|
208864 |
06-Jun-2010 |
rrs |
Hopefully this fixes a LOR by making it so we only hold the iterator lock during updates to the iterator's work.
MFC after: 1 week
|
208863 |
06-Jun-2010 |
rrs |
Bruce's fix for some returns in error legs.
MFC after: 1 week
|
208857 |
05-Jun-2010 |
rrs |
Purge out a Windows def that somehow slipped past the scrubber.
MFC after: 1 Week
|
208856 |
05-Jun-2010 |
rrs |
Spacing issues
MFC after: 1 Week
|
208855 |
05-Jun-2010 |
rrs |
This change does the following:
1) Fix the alignment of a comment.
2) Fix a BUG where we were NOT paying attention to the RESEND marking on retransmitting control chunks, and worse, we were not decrementing the retran count, which could cause us to loop forever.
3) Add in the validate_no_lock function on invariants so that we will really check all ways out to be sure a lock does not slip out locked.
MFC after: 1 week.
|
208854 |
05-Jun-2010 |
rrs |
Use the proper increment macro when increasing the number on sent_queue_retran_cnt.
MFC after: 1 week
|
208853 |
05-Jun-2010 |
rrs |
This does two changes: 1) Makes it so that the INVARIANT function validate nolocks is available anywhere. 2) Fixes a BUG where a close has been done on a collision socket and the cookie processing would return leaving a lock held. MFC after: 1 week
|
208852 |
05-Jun-2010 |
rrs |
This fixes a bug in the close up of a socket that had un-accepted assoc's. Basically the assoc (and inp) would get stuck and never get cleaned up.
MFC after: 1 week
|
208744 |
02-Jun-2010 |
zec |
Virtualize the IPv4 multicast routing code.
Submitted by: iprebeg Reviewed by: bms, bz, Pavlin Radoslavov MFC after: 30 days
|
208553 |
25-May-2010 |
qingli |
This patch fixes the problem where proxy ARP entries cannot be added over the if_ng interface.
MFC after: 3 days
|
208160 |
16-May-2010 |
rrs |
This adds back the Iterator to the sctp code base. We now properly have ONE thread that services all VNET's. Also we purge out the old timer based iterator code which had multiple LOR's and other issues.
MFC after: 3 days
|
207985 |
12-May-2010 |
rrs |
Fix a long-standing bug in generating a FWD-TSN. This would appear when more TSNs than fit in an mbuf would need to be skipped.
MFC after: 3 days
|
207983 |
12-May-2010 |
rrs |
More PR-SCTP bugs: - Make sure that when you kick the streams you add correctly using a 16 bit unsigned. - Make sure when sending out you allow FWD-TSN to skip over and list the ACKED chunks in the stream/seq list (so the rcv will kick the stream) MFC after: 3 days
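A minimal sketch of the "add correctly using a 16 bit unsigned" point, assuming the issue is wraparound of the 16-bit stream sequence number; the helper `next_stream_seq` is hypothetical and is not a function in the SCTP sources.

```python
SEQ_MASK = 0xFFFF  # SCTP stream sequence numbers are 16-bit unsigned values


def next_stream_seq(seq, skipped=1):
    """Advance a stream sequence number with 16-bit unsigned wraparound."""
    return (seq + skipped) & SEQ_MASK


print(next_stream_seq(65535))     # wraps to the start of the sequence space
print(next_stream_seq(65534, 3))  # skipping several messages also wraps
```

Without the mask, a wider integer would keep counting past 65535 and no longer match the sequence number the peer expects.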
|
207966 |
12-May-2010 |
tuexen |
Get rid of unused constants.
MFC after: 3 days.
|
207963 |
12-May-2010 |
rrs |
This fixes PR-SCTP issues: - Slide the map at the proper place. - Mark the bits in the nr_array ONLY if there is no marking. - When generating a FWD-TSN we allow us to skip past ACKED chunks too.
MFC after: 1 week
|
207924 |
11-May-2010 |
rrs |
This fixes a bug with the one-2-one model socket when a user sets up a socket to a server, sends data, and closes the socket before the server has called accept(). It used to NOT work at all. Now we add a flag to the assoc and defer assoc cleanup so that the accept will succeed.
|
207369 |
29-Apr-2010 |
bz |
MFP4: @176978-176982, 176984, 176990-176994, 177441
"Whitespace" churn after the VIMAGE/VNET whirls.
Remove the need for some "init" functions within the network stack, like pim6_init(), icmp_init() or significantly shorten others like ip6_init() and nd6_init(), using static initialization again where possible and formerly missed.
Move (most) variables back to the place they used to be before the container structs and VIMAGE_GLOBALS (before r185088) and try to reduce the diff to stable/7 and earlier as much as possible, to help out-of-tree consumers to update from 6.x or 7.x to 8 or 9.
This also removes some header file pollution for putatively static global variables.
Revert VIMAGE specific changes in ipfilter::ip_auth.c, that are no longer needed.
Reviewed by: jhb Discussed with: rwatson Sponsored by: The FreeBSD Foundation Sponsored by: CK Software GmbH MFC after: 6 days
|
207277 |
27-Apr-2010 |
bz |
Enhance the historic behaviour of raw sockets and jails in a way that we allow all possible jail IPs as source address rather than forcing the "primary". While IPv6 naturally has source address selection, for legacy IP we do not go through the pain in case IP_HDRINCL was not set. People should bind(2) for that.
This will, for example, allow ping(|6) -S to work correctly for non-primary addresses.
Reported by: (ten 211.ru) Tested by: (ten 211.ru) MFC after: 4 days
|
207275 |
27-Apr-2010 |
bms |
Fix a regression where DVMRP diagnostic traffic, such as that used by mrinfo and mtrace, was dropped by the IGMP TTL check. IGMP control traffic must always have a TTL of 1.
Submitted by: Matthew Luckie MFC after: 3 days
|
207197 |
25-Apr-2010 |
tuexen |
Sending a FWDTSN chunk should not affect the retran count.
MFC after: 3 days.
|
207191 |
25-Apr-2010 |
tuexen |
Undo my latest fix since that wasn't one at all.
MFC after: 3 days.
|
207099 |
23-Apr-2010 |
tuexen |
* Fix compilation when using SCTP_AUDITING_ENABLED.
* Fix delaying of SACK by taking out old optimization code which does not optimize anymore.
* Fix fast retransmission of chunks abandoned by the "number of retransmissions" policy.
MFC after: 3 days.
|
206989 |
21-Apr-2010 |
bz |
Avoid memory access after free. Use the (shortened) copy for the ipsec mtu lookup as well.
PR: kern/145736 Submitted by: Peter Molnar (peter molnar.cc) MFC after: 3 days
|
206892 |
20-Apr-2010 |
tuexen |
Update highest_tsn variables when sliding mapping arrays.
|
206891 |
20-Apr-2010 |
tuexen |
Really print the nr_mapping array when it should be printed.
MFC after: 3 days.
|
206845 |
19-Apr-2010 |
luigi |
whitespace fixes (trailing whitespace, bad indentation after a merge, etc.)
|
206844 |
19-Apr-2010 |
ken |
Don't clear other flags (e.g. CSUM_TCP) when setting CSUM_TSO. This was causing TSO to break for the Xen netfront driver.
Reviewed by: gibbs, rwatson MFC after: 7 days
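One common way a flag gets lost is plain assignment where a bitwise OR was intended. The sketch below illustrates that pattern only; it is an assumption about the shape of the bug, the bit values are illustrative rather than copied from sys/mbuf.h, and the helper names are hypothetical.

```python
CSUM_TCP = 0x0004  # illustrative bit values, not taken from sys/mbuf.h
CSUM_TSO = 0x0020


def set_tso_overwrite(csum_flags):
    """Assignment-style update: every previously set flag is lost."""
    return CSUM_TSO


def set_tso_preserve(csum_flags):
    """OR-style update: CSUM_TSO is added and the other flags survive."""
    return csum_flags | CSUM_TSO


flags = CSUM_TCP
print(hex(set_tso_overwrite(flags)))
print(hex(set_tso_preserve(flags)))
```

With the preserving form, a consumer that still needs to see CSUM_TCP alongside CSUM_TSO finds both bits set.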
|
206840 |
19-Apr-2010 |
tuexen |
Get delayed SACK working again.
MFC after: 3 days.
|
206758 |
17-Apr-2010 |
tuexen |
Fix a bug where SACKs are not sent when they should be. Move some protection code to INVARIANTS. Cleanups.
MFC after: 3 days.
|
206481 |
11-Apr-2010 |
bz |
Plug reference leaks in the link-layer code ("new-arp") that previously prevented the link-layer entry from being freed.
In both in.c and in6.c (though that code path seems to be basically dead) plug a reference leak in case of a pending callout being drained.
In if_ether.c consistently add a reference before resetting the callout and in case we canceled a pending one remove the reference for that. In the final case in arptimer, before freeing the expired entry, remove the reference again and explicitly call callout_stop() to clear the active flag.
In nd6.c:nd6_free() we are only ever called from the callout function and thus need to remove the reference there as well before calling into llentry_free().
In if_llatbl.c when freeing entire tables make sure that in case we cancel a pending callout to remove the reference as well.
Reviewed by: qingli (earlier version) MFC after: 10 days Problem observed, patch tested by: simon on ipv6gw.f.o, Christian Kratzer (ck cksoft.de), Evgenii Davidov (dado korolev-net.ru) PR: kern/144564 Configurations still affected: with options FLOWTABLE
|
206461 |
10-Apr-2010 |
bz |
Try to help with a virtualized dummynet after r206428.
This adds the explicit include (so far probably included through one of the few "hidden" includes in other header files) for vnet.h and adds a cast to unbreak LINT-VIMAGE.
|
206456 |
10-Apr-2010 |
rpaulo |
Honor the CE bit even when the CWR bit is set.
PR: 145600 Submitted by: Richard Scheffenegger <rs at netapp.com> MFC after: 1 week
|
206452 |
10-Apr-2010 |
bms |
Fix a few issues related to the legacy 4.4 BSD multicast APIs.
IPv4 addresses can and do change during normal operation. Testing by pfSense developers exposed an issue where OpenOSPFD was using the IPv4 address to leave the OSPF link-scope multicast groups on a dynamic OpenVPN tun interface, rather than using RFC 3678 with the interface index, which won't be raced when the interface's addresses change.
In inp_join_group(): If we are already a member of an ASM group, and IP_ADD_MEMBERSHIP or MCAST_JOIN_GROUP ioctls are re-issued, return EADDRINUSE as per the legacy 4.4BSD multicast API. This bends RFC 3678 slightly, but does not violate POLA for apps using the old API. It also stops us falling through to kicking IGMP state transactions in what is otherwise a no-op case. [This has already been dealt with in HEAD, but make it explicit before we MFC the change to 8.]
In inp_leave_group(): Fix a bogus conditional. Move the ifp null check to ioctls MCAST_LEAVE* in the switch..case where it actually belongs. If an interface was specified, by primary IPv4 address, for ioctl IP_DROP_MEMBERSHIP or MCAST_LEAVE_GROUP (an ASM full leave operation), then and only then should we look up the ifp from the IPv4 address in mreqs.imr_interface. If not, we fall through to imo_match_group() as before, but only in the IP_DROP_MEMBERSHIP case.
With these changes, the legacy 4.4BSD multicast API idempotence should be mostly preserved in the SSM enabled IPv4 stack.
Found by: ermal (with pfSense) MFC after: 3 days
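The re-join behaviour described for inp_join_group() can be sketched with a toy membership table. This is an illustration under stated assumptions, not the kernel logic: the class `GroupMemberships` and its `(group, ifindex)` key are hypothetical, and only the EADDRINUSE result for a repeated any-source join comes from the text above.

```python
import errno


class GroupMemberships:
    """Toy per-socket table of any-source multicast (ASM) memberships."""

    def __init__(self):
        self._joined = set()

    def join(self, group, ifindex):
        key = (group, ifindex)
        if key in self._joined:
            # Re-issuing the join is reported as an error instead of
            # silently falling through to protocol state changes.
            raise OSError(errno.EADDRINUSE, "already a member of this group")
        self._joined.add(key)


memberships = GroupMemberships()
memberships.join("224.0.0.5", 3)
try:
    memberships.join("224.0.0.5", 3)
except OSError as exc:
    print(errno.errorcode[exc.errno])
```

Keying on the interface index rather than an interface address mirrors the point made above: an index does not change when the interface's addresses do.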
|
206428 |
09-Apr-2010 |
luigi |
This commit enables partial operation of dummynet with kernels compiled with "options VIMAGE". As it is now, there is still a single instance of the pipes, and it is only usable from vnet0 (the main instance). Trying to use a pipe from a different vimage does not crash the system as it did before, but the traffic coming out from the pipe goes to the wrong place, and i still need to figure out where.
Support for per-vimage pipes is almost there (just a matter of uncommenting the VNET_* definitions for dn_cfg, plus putting into the structure the remaining static variables), however i need first to figure out how init/uninit work, and also to understand where packets are ending up on exit from a pipe.
In summary: vimage support for dummynet is not complete yet, but we are getting there.
|
206425 |
09-Apr-2010 |
luigi |
no need to pass an argument to dn_compat_calc_size()
MFC after: 3 days
|
206339 |
07-Apr-2010 |
luigi |
Hopefully fix the recent breakage in rule deletion. A few more tests and this will also go into -stable where the problem is more critical.
|
206281 |
06-Apr-2010 |
tuexen |
Fix an off-by-one bug in zeroing out the mapping arrays. Fix sctp_print_mapping_array().
MFC after: 1 week
|
206151 |
04-Apr-2010 |
tuexen |
Use also SCTP/IPv6 checksum offloading in special cases.
MFC after: 2 weeks
|
206137 |
03-Apr-2010 |
tuexen |
* Fix some race condition in SACK/NR-SACK processing.
* Fix handling of mapping arrays when draining mbufs or processing FORWARD-TSN chunks.
* Cleanup code (no duplicate code anymore for SACKs and NR-SACKs).
Part of this code was developed together with rrs.
MFC after: 2 weeks.
|
206022 |
31-Mar-2010 |
delphij |
Add definition of IPv6 mobility header's protocol number, as assigned by IANA and defined in RFC 3775.
Obtained from: KAME
|
205955 |
31-Mar-2010 |
luigi |
fix bug in previous commit related to rule deletion (stable/8 just fixed moments ago)
|
205831 |
29-Mar-2010 |
luigi |
remove a leftover debugging message
|
205830 |
29-Mar-2010 |
luigi |
Fix handling of set manipulations. This patch has two fixes for potential kernel panics (one wrong index, one access to the wrong lock) and two fixes to wrong logic in a conditional. The potential panics are also on stable/8, so I am going to MFC the fix quickly.
|
205629 |
24-Mar-2010 |
rrs |
Adds the option of keeping per-cpu statistics in SCTP. This may be useful since it gets rid of atomics but I want it to remain an option until I can do further testing on if it really speeds things up.
|
205628 |
24-Mar-2010 |
rrs |
lagging file I forgot to commit with my nr-sack fixes... oops
Reviewed by: tuexen@freebsd.org
|
205627 |
24-Mar-2010 |
rrs |
Fix for NR-Sack code. The code was NOT working properly when enabled. Basically most of the operations were incorrect, causing bad sacks when you enabled nr-sack. The fixes range across 4 files and unify most of the processing so that we only test nr_sack flags to decide which type of sack to generate.
Optimization left for this is to combine the sack generation code and make it capable of generating either sack thus shrinking out a routine.
Reviewed by: tuexen@freebsd.org
|
205602 |
24-Mar-2010 |
luigi |
Honor ip.fw.one_pass when a packet comes out of a pipe without being delayed. I forgot to handle this case when i did the mtag cleanup three months ago.
PR: 145004
|
205502 |
23-Mar-2010 |
rrs |
Fixes a bug where SACKs in the face of mapping_array expansion would break. Basically once we expanded the array we no longer had both mapping arrays in sync, which the sack processing code depends on. This would mean we were randomly referring to memory that was probably not there. This mostly just gave us bad sack results going back to the peer. If INVARIANTS was on, of course, we would hit the panic routine in the sack_check call.
We also add a print routine for the place where one would panic in INVARIANTS so one can see what the main mapping array holds.
Reviewed by: tuexen@freebsd.org MFC after: 2 weeks
|
205488 |
22-Mar-2010 |
kmacy |
- boot-time size the ipv4 flowtable and the maximum number of flows
- increase flow cleaning frequency and decrease flow caching time when near the flow limit
- stop allocating new flows when within 3% of maxflows; don't start allocating again until below 12.5%
MFC after: 7 days
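The stop/resume thresholds above amount to a hysteresis. The sketch below is one possible reading, not the flowtable code: it assumes "within 3%" means less than 1/32 of maxflows below the limit and that allocation resumes once the count is more than 12.5% (1/8) below the limit. The class `FlowAdmission` and its attributes are hypothetical.

```python
class FlowAdmission:
    """Hysteresis on flow allocation (assumed interpretation)."""

    def __init__(self, maxflows):
        self.stop_above = maxflows - maxflows // 32   # about 3% below the limit
        self.resume_below = maxflows - maxflows // 8  # 12.5% below the limit
        self.full = False

    def may_allocate(self, count):
        """Update the full/not-full state for the current flow count."""
        if self.full:
            if count < self.resume_below:
                self.full = False  # enough flows were cleaned; resume
        elif count > self.stop_above:
            self.full = True       # too close to the limit; stop
        return not self.full


gate = FlowAdmission(1000)
for count in (900, 970, 900, 874):
    print(count, gate.may_allocate(count))
```

The gap between the two thresholds keeps the allocator from flapping between "full" and "not full" when the flow count hovers near the limit.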
|
205417 |
21-Mar-2010 |
luigi |
Add a priority-based packet scheduler.
Sponsored by: The ONELAB2 Project Submitted by: Riccardo Panicucci
|
205415 |
21-Mar-2010 |
luigi |
no need for ipfw_flush_tables(), we just need ipfw_destroy_tables()
|
205414 |
21-Mar-2010 |
luigi |
revise documentation
|
205391 |
20-Mar-2010 |
kmacy |
- spread tcp timer callout load evenly across cpus if net.inet.tcp.per_cpu_timers is set to 1
- don't default to acquiring tcbinfo lock exclusively in rexmt
MFC after: 7 days
|
205251 |
17-Mar-2010 |
bz |
Add pcb reference counting to the pcblist sysctl handler functions to ensure type stability while caching the pcb pointers for the copyout.
Reviewed by: rwatson MFC after: 7 days
|
205178 |
15-Mar-2010 |
luigi |
small fixes to estimate the buffer size when requesting all pipes/flows.
|
205173 |
15-Mar-2010 |
luigi |
+ implement (two lines) the kernel side of 'lookup dscp N' to use the dscp as a search key in table lookups;
+ (re)implement a sysctl variable to control the expire frequency of pipes and queues when they become empty;
+ add 'queue number' as optional part of the flow_id. This can be enabled with the command
queue X config mask queue ...
and makes it possible to support priority-based schedulers, where packets should be grouped according to the priority and not some fields in the 5-tuple. This is implemented as follows: - redefine a field in the ipfw_flow_id (in sys/netinet/ip_fw.h) but without changing the size or shape of the structure, so there are no ABI changes. On passing, also document how other fields are used, and remove some useless assignments in ip_fw2.c
- implement small changes in the userland code to set/read the field;
- revise the functions in ip_dummynet.c to manipulate masks so they also handle the additional field;
There are no ABI changes in this commit.
|
205157 |
14-Mar-2010 |
rwatson |
Abstract out initialization of most aspects of struct inpcbinfo from their calling contexts in {IP divert, raw IP sockets, TCP, UDP} and create new helper functions: in_pcbinfo_init() and in_pcbinfo_destroy() to do this work in a central spot. As inpcbinfo becomes more complex due to ongoing work to add connection groups, this will reduce code duplication.
MFC after: 1 month Reviewed by: bz Sponsored by: Juniper Networks
|
205104 |
12-Mar-2010 |
rrs |
The proper fix for the delayed SCTP checksum is to have the delayed function take an argument as to the offset to the SCTP header. This allows it to work for V4 and V6. This of course means changing all callers of the function to either pass the header len, if they have it, or create it (ip_hl << 2 or sizeof(ip6_hdr)). PR: 144529 MFC after: 2 weeks
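The two offsets mentioned above (ip_hl << 2 for IPv4, the size of the fixed IPv6 header for IPv6) can be sketched as follows. This is an illustration that parses raw header bytes, not the kernel's delayed-checksum function; `sctp_header_offset` is a hypothetical name, and IPv6 extension headers are deliberately ignored.

```python
IP6_HDR_LEN = 40  # size of the fixed IPv6 header in bytes


def sctp_header_offset(packet: bytes) -> int:
    """Offset of the transport header from the start of the IP header."""
    version = packet[0] >> 4
    if version == 4:
        ip_hl = packet[0] & 0x0F  # IPv4 header length in 32-bit words
        return ip_hl << 2         # convert words to bytes
    if version == 6:
        return IP6_HDR_LEN        # no extension headers considered
    raise ValueError("not an IPv4 or IPv6 packet")


print(sctp_header_offset(bytes([0x45]) + bytes(19)))  # IPv4 without options
print(sctp_header_offset(bytes([0x46]) + bytes(23)))  # IPv4 with 4 option bytes
print(sctp_header_offset(bytes([0x60]) + bytes(39)))  # IPv6
```

Passing the offset in, rather than deriving it inside the checksum routine, is what lets one routine serve both address families.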
|
205066 |
12-Mar-2010 |
kmacy |
- restructure flowtable to support ipv6
- add a name argument to flowtable_alloc for printing with ddb commands
- extend ddb commands to print destination address or 4-tuples
- don't parse ports in ulp header if FL_HASH_ALL is not passed
- add kern_flowtable_insert to enable more generic use of flowtable (e.g. system calls for adding entries)
- don't hash loopback addresses
- cleanup whitespace
- keep statistics per-cpu for per-cpu flowtables to avoid cache line contention
- add sysctls to accumulate stats and report aggregate
MFC after: 7 days
|
205050 |
11-Mar-2010 |
luigi |
implement listing of a subset of pipes/queues/schedulers. The filtering of the output is done in the kernel instead of userland to reduce the amount of data transferred.
|
204954 |
10-Mar-2010 |
luigi |
fix handling of commands issued by RELENG_7 version of /sbin/ipfw.
Submitted by: Riccardo Panicucci
|
204902 |
09-Mar-2010 |
qingli |
One of the advantages of enabling ECMP (a.k.a RADIX_MPATH) is to allow for connection load balancing across interfaces. Currently the address alias handling method is colliding with the ECMP code. For example, when two interfaces are configured on the same prefix, only one prefix route is installed. So connection load balancing among the available interfaces is not possible.
The other advantage of ECMP is for failover. The issue with the current code, is that the interface link-state is not reflected in the route entry. For example, if there are two interfaces on the same prefix, the cable on one interface is unplugged, new and existing connections should switch over to the other interface. This is not done today and packets go into a black hole.
Also, there is a small bug in the kernel where deleting ECMP routes in the userland will always return an error even though the command is successfully executed.
MFC after: 5 days
|
204866 |
08-Mar-2010 |
luigi |
cosmetic changes and C++ compatibility
|
204865 |
08-Mar-2010 |
luigi |
don't use C++ keywords as variable names
|
204862 |
08-Mar-2010 |
luigi |
do not report an error unnecessarily
|
204838 |
07-Mar-2010 |
bz |
Destroy TCP UMA zones (empty or not) upon network stack teardown to not leak them, otherwise making UMA/vmstat unhappy with every stopped vnet. We will still leak pages (especially for zones marked NOFREE).
Reshuffle cleanup order in tcp_destroy() to get rid of what we can easily free first.
Sponsored by: ISPsystem Reviewed by: rwatson MFC after: 5 days
|
204837 |
07-Mar-2010 |
bz |
Not only flush the ipfw tables when unloading ipfw or tearing down a virtual network stack, but also free the Radix Node Head.
Sponsored by: ISPsystem Reviewed by: julian MFC after: 5 days
|
204830 |
07-Mar-2010 |
rwatson |
Locking the tcbinfo structure should not be necessary in tcp_timer_delack(), so don't.
MFC after: 1 week Reviewed by: bz Sponsored by: Juniper Networks
|
204829 |
07-Mar-2010 |
rwatson |
Add comment in tcp_discardcb() talking about how we don't, but should, address TCP races relating to not calling tcp_drain() on stopped callouts.
Discussed with: bz
|
204826 |
07-Mar-2010 |
rwatson |
Make udp_set_kernel_tunneling() less forgiving when its invariants are violated: so_pcb can never be NULL for a valid UDP socket, and it is always SOCK_DGRAM. Use sotoinpcb() as the rest of the UDP code does.
MFC after: 1 week Reviewed by: bz Sponsored by: Juniper Networks
|
204810 |
06-Mar-2010 |
rwatson |
Remove unnecessary locking of divcbinfo lock from div_output(): this has not been required since FreeBSD 7.0 when the so_pcb pointer leading to inp was guaranteed to be stable when a valid socket reference is held (as it is in the output path).
MFC after: 1 week Reviewed by: bz Sponsored by: Juniper Networks
|
204809 |
06-Mar-2010 |
rwatson |
Add a comment to tcp_usr_accept() to indicate why it is we acquire the tcbinfo lock there: r175612, which re-added it, masked a race between sonewconn(2) and accept(2) that could allow an incompletely initialized address on a newly-created socket on a listen queue to be exposed. Full details can be found in that commit message.
MFC after: 1 week Sponsored by: Juniper Networks
|
204807 |
06-Mar-2010 |
bz |
Destroy UDP UMA zones (empty or not) upon network stack teardown to not leak them, making the VM subsystem unhappy with every stopped vnet(*). We will still leak pages (especially as zones are marked NOFREE).
(*) This will also keep vmstat -z more usable.
Sponsored by: ISPsystem MFC after: 5 days
|
204806 |
06-Mar-2010 |
rwatson |
Wrap use of rw_try_upgrade() on pcbinfo with macro INP_INFO_TRY_UPGRADE() to match other pcbinfo locking macros.
MFC after: 1 week
|
204763 |
05-Mar-2010 |
luigi |
plug a memory leak on pipe's reconfiguration
|
204754 |
05-Mar-2010 |
luigi |
fix a memory leak when deleting RED queues
|
204736 |
04-Mar-2010 |
luigi |
portability fixes
|
204735 |
04-Mar-2010 |
luigi |
don't use keywords as variable names.
|
204714 |
04-Mar-2010 |
luigi |
use callout_drain() (outside the lock) when unloading the module. This prevents a potential deadlock.
Submitted by: Francesco Magno
|
204713 |
04-Mar-2010 |
luigi |
improve compatibility with RELENG_7.2
|
204591 |
02-Mar-2010 |
luigi |
Bring in the most recent version of ipfw and dummynet, developed and tested over the past two months in the ipfw3-head branch. This also happens to be the same code available in the Linux and Windows ports of ipfw and dummynet.
The major enhancement is a completely restructured version of dummynet, with support for different packet scheduling algorithms (loadable at runtime), faster queue/pipe lookup, and a much cleaner internal architecture and kernel/userland ABI which simplifies future extensions.
In addition to the existing schedulers (FIFO and WF2Q+), we include a Deficit Round Robin (DRR or RR for brevity) scheduler, and a new, very fast version of WF2Q+ called QFQ.
Some test code is also present (in sys/netinet/ipfw/test) that lets you build and test schedulers in userland.
Also, we have added a compatibility layer that understands requests from the RELENG_7 and RELENG_8 versions of the /sbin/ipfw binaries, and replies correctly (at least, it does its best; sometimes you just cannot tell who sent the request and how to answer). The compatibility layer should make it possible to MFC this code in a relatively short time.
Some minor glitches (e.g. handling of ipfw set enable/disable, and a workaround for a bug in RELENG_7's /sbin/ipfw) will be fixed with separate commits.
CREDITS: This work has been partly supported by the ONELAB2 project, and mostly developed by Riccardo Panicucci and myself. The code for the qfq scheduler is mostly from Fabio Checconi, and Marta Carbone and Francesco Magno have helped with testing, debugging and some bug fixes.
|
204522 |
01-Mar-2010 |
joel |
The NetBSD Foundation has granted permission to remove clause 3 and 4 from their software.
Obtained from: NetBSD
|
204143 |
20-Feb-2010 |
bz |
Upon virtual network stack teardown properly release the TCP syncache resources.
Sponsored by: ISPsystem Reviewed by: rwatson MFC After: 5 days
|
204141 |
20-Feb-2010 |
tuexen |
Fix handling of SHUTDOWN-ACK chunk in COOKIE_WAIT and COOKIE_ECHOED.
MFC after: 1 week
|
204140 |
20-Feb-2010 |
bz |
Split up ip_drain() into an outer lock and iterator part and a "locked" version that will only handle a single network stack instance. The latter is called directly from ip_destroy().
Hook up an ip_destroy() function to release resources from the legacy IP network layer upon virtual network stack teardown.
Sponsored by: ISPsystem Reviewed by: rwatson MFC After: 5 days
|
204096 |
19-Feb-2010 |
tuexen |
* Fix another u_long -> uint32_t issue.
* Remove an unused global variable.
* Fix an issue reported by Bruce Cran related to reusing SCTP sockets which were connected.
MFC after: 1 week
|
204068 |
18-Feb-2010 |
pjd |
No need to include security/mac/mac_framework.h here.
|
204040 |
18-Feb-2010 |
tuexen |
Use uint32_t instead of u_long.
MFC after: 1 week
|
204003 |
17-Feb-2010 |
luigi |
remove recursive lock/unlock calls, we do them already before entering the switch.
Reported by: Marta Carbone
|
203847 |
13-Feb-2010 |
tuexen |
Add missing SCTP_PACKED. Spotted by Irene Ruengeler.
MFC after: 1 week
|
203724 |
09-Feb-2010 |
bz |
Properly free resources when destroying the TCP hostcache while tearing down a network stack (in the VIMAGE jail+vnet case).
For that break out the logic from tcp_hc_purge() into an internal function we can call from both, the sysctl handler and the tcp_hc_destroy().
Sponsored by: ISPsystem Reviewed by: silby, lstewart MFC After: 8 days
|
203503 |
04-Feb-2010 |
tuexen |
Restore the checksum received before processing the packet.
MFC after: 1 week
|
203401 |
02-Feb-2010 |
qingli |
Some of the existing ppp and vpn related scripts create and set the IP addresses of the tunnel end points to the same value. In these cases the loopback route is not installed for the local end.
Verified by: avg MFC after: 5 days
|
203343 |
01-Feb-2010 |
luigi |
use u_char instead of u_int for short bitfields.
For our compiler the two constructs are completely equivalent, but some compilers (including MSC and tcc) use the base type for alignment, which in the cases touched here result in aligning the bitfields to 32 bit instead of the 8 bit that is meant here.
Note that almost all other headers where small bitfields are used have u_int8_t instead of u_int.
MFC after: 3 days
|
202782 |
22-Jan-2010 |
tuexen |
Use [] instead of [0] for flexible arrays.
Obtained from: Bruce Cran MFC after: 1 week
|
202526 |
17-Jan-2010 |
tuexen |
Get rid of a lot of duplicated code for NR-SACK handling. Generalize the SACK code to also handle NR-SACKs.
|
202523 |
17-Jan-2010 |
rrs |
Bug fix: If the allocation of a socket failed and we freed the inpcb, it was possible to not set the proper flags on the pcb (i.e. the socket is not there). This is HIGHLY unlikely since no one else should be able to find the socket.. but for consistency we do the proper loop thing to make sure that we mark the socket as gone on the PCB.
|
202521 |
17-Jan-2010 |
rrs |
Pulls out another leaked windows ifdef that somehow made its way through the scrubber.
|
202520 |
17-Jan-2010 |
rrs |
This change syncs up the socketAPI stream-reset values to match those in linux and the I-D just released to the IETF.
|
202518 |
17-Jan-2010 |
rrs |
More leaked ifdefs for APPLE and its mobility stuff.
|
202517 |
17-Jan-2010 |
rrs |
Remove another set of "leaked" ifdefs that somehow found their way into FreeBSD.
|
202516 |
17-Jan-2010 |
rrs |
Remove strange APPLE define that leaked through the scrubber scripts. Scripts are now fixed so this won't happen again.
|
202469 |
17-Jan-2010 |
bz |
Garbage collect references to the no longer implemented tcp_fasttimo().
Discussed with: rwatson MFC after: 5 days
|
202468 |
17-Jan-2010 |
bz |
Add ip4.saddrsel/ip4.nosaddrsel (and equivalent for ip6) to control whether to use source address selection (default) or the primary jail address for unbound outgoing connections.
This is intended to be used by people upgrading from single-IP jails to multi-IP jails who do not want to change firewall rules, application ACLs, ... but want to force their connections (unless otherwise changed) to the primary jail IP they had been using for years, as well as by people preferring to implement similar policies.
Note that for IPv6, if configured incorrectly, this might lead to scope violations, which single-IPv6 jails could as well, as by the design of jails. [1]
Reviewed by: jamie, hrs (ipv6 part) Pointed out by: hrs [1] MFC After: 2 weeks Asked for by: Jase Thew (bazerka beardz.net)
|
202459 |
17-Jan-2010 |
ume |
Change 'me' to match any IPv6 address configured on an interface in the system as well as any IPv4 address.
Reviewed by: David Horn <dhorn2000__at__gmail.com>, luigi, qingli MFC after: 2 weeks
|
202449 |
16-Jan-2010 |
tuexen |
Get rid of support of an old version of the SCTP-AUTH draft. Get rid of unused MD5 code.
MFC after: 1 week
|
201811 |
08-Jan-2010 |
qingli |
Ensure an address is removed from the interface address list when the installation of that address fails.
PR: 139559
|
201801 |
08-Jan-2010 |
ru |
Complete the swap of carp(4) log levels and document the change.
MFC after: 3 days
|
201758 |
07-Jan-2010 |
mbr |
Remove extraneous semicolons, no functional changes.
Submitted by: Marc Balmer <marc@msys.ch> MFC after: 1 week
|
201745 |
07-Jan-2010 |
luigi |
we don't use dummynet_drain!
|
201740 |
07-Jan-2010 |
luigi |
check that we have an ipv4 packet before swapping ip_len and ip_off. This should fix the handling of ipv6 packets which i broke when i made ipfw operate on packets in network format.
Reported by: Hajimu UMEMOTO
|
201735 |
07-Jan-2010 |
luigi |
Following up on a request from Ermal Luci to make ip_divert work as a client of pf(4), make ip_divert not depend on ipfw.
This is achieved by moving to ip_var.h the struct ipfw_rule_ref (which is part of the mtag for all reinjected packets) and other declarations of global variables, and moving to raw_ip.c global variables for filter and divert hooks.
Note that names and locations could be made more generic (ipfw_rule_ref is really a generic reference robust to reconfigurations; the packet filter is not necessarily ipfw; filters and their clients are not necessarily limited to ipv4), but _right now_ most of this stuff works on ipfw and ipv4, so i don't feel like doing a gratuitous renaming, at least for the time being.
|
201732 |
07-Jan-2010 |
luigi |
some header shuffling to help decoupling ip_divert from ipfw
|
201722 |
07-Jan-2010 |
luigi |
put ip_len in correct order for ip_output(). This prevents a panic when ipfw generates packets on its own (such as reject or keepalives for dynamic rules).
Reported by: Chagin Dmitry
|
201568 |
05-Jan-2010 |
luigi |
this file does not require ip_dummynet.h
|
201544 |
05-Jan-2010 |
qingli |
An existing incomplete ARP entry would expire a subsequent statically configured entry of the same host. This bug occurred because the expiration timer was not cancelled when installing the static entry. Since there exists a potential race condition with respect to timer cancellation, simply check for the LLE_STATIC bit inside the expiration function instead of cancelling the active timer.
MFC after: 5 days
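The approach of checking the flag in the expiration path, rather than cancelling the timer, can be sketched with a toy table. This is not the if_ether.c code: `Entry`, `arp_timer_expired`, and the dictionary-based table are hypothetical, and the flag value is illustrative.

```python
LLE_STATIC = 0x0002  # illustrative flag value


class Entry:
    def __init__(self, flags=0):
        self.flags = flags


def arp_timer_expired(table, addr):
    """Expiration callback: drop the entry unless it is now static.

    Returns True when the entry was removed.
    """
    entry = table.get(addr)
    if entry is None:
        return False
    if entry.flags & LLE_STATIC:
        # The entry was made static after the timer was armed; the stale
        # timer firing must not expire it.
        return False
    del table[addr]
    return True


table = {"192.0.2.1": Entry(), "192.0.2.2": Entry()}
table["192.0.2.2"].flags |= LLE_STATIC  # static entry installed later
print(arp_timer_expired(table, "192.0.2.1"))
print(arp_timer_expired(table, "192.0.2.2"))
```

Because the check happens when the timer fires, it does not matter whether an attempt to cancel the timer would have raced with the callback.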
|
201527 |
04-Jan-2010 |
luigi |
Various cleanup done in ipfw3-head branch including: - use a uniform mtag format for all packets that exit and re-enter the firewall in the middle of a rulechain. On reentry, all tags containing reinject info are renamed to MTAG_IPFW_RULE so the processing is simpler.
- make ipfw and dummynet use ip_len and ip_off in network format everywhere. Conversion is done only once instead of tracking the format in every place.
- use a macro FREE_PKT to dispose of mbufs. This eases portability.
In passing I also removed a few typos, made variables static or local, removed useless declarations and fixed other minor things.
Overall the code shrinks a bit and is hopefully more readable.
I have tested functionality for all but ng_ipfw and if_bridge/if_ethersubr. For ng_ipfw i am actually waiting for feedback from glebius@ because we might have some small changes to make. For if_bridge and if_ethersubr feedback would be welcome (there are still some redundant parts in these two modules that I would like to remove, but first i need to check functionality).
|
201523 |
04-Jan-2010 |
tuexen |
Correct usage of parentheses.
PR: kern/142066 Approved by: rrs (mentor) Obtained from: Henning Petersen, Bruce Cran. MFC after: 2 weeks
|
201416 |
03-Jan-2010 |
np |
Avoid NULL dereference in arpresolve.
|
201285 |
30-Dec-2009 |
qingli |
Consolidate the route message generation code for when address aliases are added or deleted. The announced route entry for an address alias is no longer empty because this empty route entry was causing some routing daemons to fail and exit abnormally.
MFC after: 5 days
|
201282 |
30-Dec-2009 |
qingli |
Proxy ARP entries could not be added into the system over IFF_POINTOPOINT link types. The reason was that the routing entry returned from the kernel covering the remote end is of an interface type that does not support ARP. This patch fixes this problem by providing a hint to the kernel routing code, which indicates that the prefix route instead of the PPP host route should be returned to the caller. Since a host route to the local end point is also added into the routing table, and there could be multiple such instantiations because multiple PPP links can be created with the same local end IP address, this patch also fixes the loopback route installation failure problem observed prior to this patch. The reference count of the loopback route to the local end is now incremented or decremented as needed: the first instantiation creates the entry and the last removal deletes the route entry.
MFC after: 5 days
|
201254 |
30-Dec-2009 |
syrinx |
Make sure the multicast forwarding cache entry's stall queue is properly initialized before trying to insert an entry into it.
PR: kern/142052 Reviewed by: bms MFC after: now
|
201150 |
29-Dec-2009 |
luigi |
we really need htonl() here, see the comment a few lines above in the code.
|
201145 |
28-Dec-2009 |
antoine |
(S)LIST_HEAD_INITIALIZER takes a (S)LIST_HEAD as an argument. Fix some wrong usages. Note: this does not affect generated binaries as this argument is not used.
PR: 137213 Submitted by: Eygene Ryabinkin (initial version) MFC after: 1 month
|
201141 |
28-Dec-2009 |
bz |
Make the compiler happy after r201125:
- + remove two unnecessary initializations in ip_output;
+ + remove one unnecessary initialization in ip_output;
|
201131 |
28-Dec-2009 |
luigi |
introduce a local variable rte acting as a cache of ro->ro_rt within ip_output, achieving (in random order of importance):
- a reduction of the number of 'r's in the source code;
- improved legibility;
- a reduction of 64 bytes in the .text
|
201125 |
28-Dec-2009 |
luigi |
+ remove an unused #define print_ip;
+ remove two unnecessary initializations in ip_output;
+ localize 'len';
+ introduce a temporary variable n to count the number of fragments, the compiler seems unable to identify a common subexpression (written 3 times, used twice);
+ document some assumptions on ip_len and ip_hl
|
201124 |
28-Dec-2009 |
luigi |
bring the NGM_IPFW_COOKIE back into ng_ipfw.h, libnetgraph expects to find it there. Unfortunately this reintroduces the dependency on ip_fw_pfil.c
|
201122 |
28-Dec-2009 |
luigi |
bring in several cleanups tested in ipfw3-head branch, namely:
r201011 - move most of ng_ipfw.h into ip_fw_private.h, as this code is ipfw-specific. This removes a dependency on ng_ipfw.h from some files.
- move many equivalent definitions of direction (IN, OUT) for reinjected packets into ip_fw_private.h
- document the structure of the packet tags used for dummynet and netgraph;
r201049 - merge some common code to attach/detach hooks into a single function.
r201055 - remove some duplicated code in ip_fw_pfil. The input and output processing uses almost exactly the same code so there is no need to use two separate hooks. ip_fw_pfil.o goes from 2096 to 1382 bytes of .text
r201057 (see the svn log for full details) - macros to make the conversion of ip_len and ip_off between host and network format more explicit
r201113 (the remaining parts) - readability fixes -- put braces around some large for() blocks, localize variables so the compiler does not think they are uninitialized, do not insist on precise allocation size if we have more than we need.
r201119 - when doing a lookup, keys must be in big endian format because this is what the radix code expects (this fixes a bug in the recently-introduced 'lookup' option)
No ABI changes in this commit.
MFC after: 1 week
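The r201119 note above (lookup keys must be big endian) can be illustrated with a small sketch. The helper name `demo_make_key()` is hypothetical and this is not the ipfw code itself; it only shows that `htonl()` yields the same most-significant-byte-first key on any host, which is the property a byte-wise radix comparison would rely on.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: turn a host-order 32-bit value (a port, a uid,
 * an address) into a 4-byte key laid out most significant byte first.
 */
static void
demo_make_key(uint32_t host_value, unsigned char out[4])
{
	uint32_t be;

	assert(out != NULL);
	be = htonl(host_value);		/* host order -> network order */
	memcpy(out, &be, sizeof(be));
}
```

Without the conversion, a little-endian host would store the least significant byte first, so prefix matching on the key bytes would compare the wrong end of the value.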
|
201121 |
28-Dec-2009 |
luigi |
readability fixes -- add braces on large blocks, remove unnecessary initializations
|
201120 |
28-Dec-2009 |
luigi |
explain details of operation of table lookups, and improve portability
|
201046 |
27-Dec-2009 |
luigi |
diverted packet must re-enter _after_ the matching rule, or we create loops. The divert cookie (that can be set from userland too) contains the matching rule nr, so we must start from nr+1.
Reported by: Joe Marcus Clarke
|
200951 |
24-Dec-2009 |
luigi |
fix poor indentation resulting from a merge
|
200909 |
23-Dec-2009 |
luigi |
mostly style changes, such as removal of trailing whitespace, reformatting to avoid unnecessary line breaks, small block restructuring to avoid unnecessary nesting, replace macros with function calls, etc.
As a side effect of code restructuring, this commit fixes one bug: previously, if a realloc() failed, memory was leaked. Now, the realloc is not there anymore, as we first count how much memory we need and then do a single malloc.
|
200897 |
23-Dec-2009 |
luigi |
fix build with the new fast lookup structure. Also remove some unnecessary headers
|
200896 |
23-Dec-2009 |
luigi |
fix build on 64-bit architectures. Also fix the indentation on a few lines.
|
200855 |
22-Dec-2009 |
luigi |
merge code from ipfw3-head to reduce contention on the ipfw lock and remove all O(N) sequences from kernel critical sections in ipfw.
In detail:
1. introduce a IPFW_UH_LOCK to arbitrate requests from the upper half of the kernel. Some things, such as 'ipfw show', can be done holding this lock in read mode, whereas insert and delete require IPFW_UH_WLOCK.
2. introduce a mapping structure to keep rules together. This replaces the 'next' chain currently used in ipfw rules. At the moment the map is a simple array (sorted by rule number and then rule_id), so we can find a rule quickly instead of having to scan the list. This reduces many expensive lookups from O(N) to O(log N).
3. when an expensive operation (such as insert or delete) is done by userland, we grab IPFW_UH_WLOCK, create a new copy of the map without blocking the bottom half of the kernel, then acquire IPFW_WLOCK and quickly update pointers to the map and related info. After dropping IPFW_LOCK we can then continue the cleanup protected by IPFW_UH_LOCK. So userland still costs O(N) but the kernel side is only blocked for O(1).
4. do not pass pointers to rules through dummynet, netgraph, divert etc, but rather pass a <slot, chain_id, rulenum, rule_id> tuple. We validate the slot index (in the array of #2) with chain_id, and if successful do a O(1) dereference; otherwise, we can find the rule in O(log N) through <rulenum, rule_id>
All the above does not change the userland/kernel ABI, though there are some disgusting casts between pointers and uint32_t
Operation costs now are as follows:
Function                                Old     Now       Planned
-------------------------------------------------------------------
+ skipto X, non cached                  O(N)    O(log N)
+ skipto X, cached                      O(1)    O(1)
XXX dynamic rule lookup                 O(1)    O(log N)  O(1)
+ skipto tablearg                       O(N)    O(1)
+ reinject, non cached                  O(N)    O(log N)
+ reinject, cached                      O(1)    O(1)
+ kernel blocked during setsockopt()    O(N)    O(1)
-------------------------------------------------------------------
The only (very small) regression is on dynamic rule lookup and this will be fixed in a day or two, without changing the userland/kernel ABI
Supported by: Valeria Paoli MFC after: 1 month
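Points 2 and 4 above (a sorted array of rules, plus a <slot, chain_id, rulenum, rule_id> reference) can be sketched as below. All names (`demo_rule`, `demo_chain`, `demo_ref`, `demo_find()`, `demo_resolve()`) are hypothetical and the structures are reduced to the fields needed for the illustration; this is an assumption-laden sketch, not the ipfw implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct demo_rule {
	uint32_t rulenum;
	uint32_t rule_id;
};

struct demo_chain {
	struct demo_rule **map;	/* sorted by rulenum, then rule_id */
	int n_rules;
	uint32_t id;		/* assumed to change when the map is replaced */
};

/* Reference carried with a packet instead of a raw rule pointer. */
struct demo_ref {
	int slot;
	uint32_t chain_id;
	uint32_t rulenum;
	uint32_t rule_id;
};

/* O(log N) binary search; returns the slot index or -1 if absent. */
static int
demo_find(const struct demo_chain *c, uint32_t rulenum, uint32_t rule_id)
{
	int lo = 0, hi;

	assert(c != NULL);
	hi = c->n_rules - 1;
	while (lo <= hi) {
		int mid = lo + (hi - lo) / 2;
		const struct demo_rule *r = c->map[mid];

		if (r->rulenum == rulenum && r->rule_id == rule_id)
			return (mid);
		if (r->rulenum < rulenum ||
		    (r->rulenum == rulenum && r->rule_id < rule_id))
			lo = mid + 1;
		else
			hi = mid - 1;
	}
	return (-1);
}

/* O(1) when the cached slot is still valid, O(log N) otherwise. */
static int
demo_resolve(const struct demo_chain *c, const struct demo_ref *ref)
{
	if (ref->chain_id == c->id && ref->slot >= 0 && ref->slot < c->n_rules)
		return (ref->slot);
	return (demo_find(c, ref->rulenum, ref->rule_id));
}
```

Because the reference never holds a pointer, replacing the whole map only invalidates the cached slot; the rule can still be located by its number and id.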
|
200847 |
22-Dec-2009 |
jhb |
- Rename the __tcpi_(snd|rcv)_mss fields of the tcp_info structure to remove the leading underscores since they are now implemented.
- Implement the tcpi_rto and tcpi_last_data_recv fields in the tcp_info structure.
Reviewed by: rwatson MFC after: 2 weeks
|
200838 |
22-Dec-2009 |
luigi |
some mostly cosmetic changes in preparation for upcoming work:
+ in many places, replace &V_layer3_chain with a local variable chain;
+ bring the counter of rules and static_len within ip_fw_chain replacing static variables;
+ remove some spurious comments and extern declaration;
+ document which lock protects certain data structures
|
200673 |
18-Dec-2009 |
ru |
Added proper attribution.
Requested by: luigi
|
200654 |
17-Dec-2009 |
luigi |
Add some experimental code to log traffic with tcpdump, similar to pflog(4). To use the feature, just put the 'log' options on rules you are interested in, e.g.
ipfw add 5000 count log ....
and run tcpdump -ni ipfw0 ...
net.inet.ip.fw.verbose=0 enables logging to ipfw0, net.inet.ip.fw.verbose=1 sends logging to syslog as before.
More features can be added, similar to pflog(), to store in the MAC header metadata such as rule numbers and actions. Manpage to come once features are settled.
|
200634 |
17-Dec-2009 |
luigi |
simplify and document lookup_next_rule()
|
200629 |
17-Dec-2009 |
luigi |
simplify the code that finds the next rule after reinjections
MFC after: 1 week
|
200610 |
16-Dec-2009 |
luigi |
remove a duplicate sysctl entry
|
200603 |
16-Dec-2009 |
luigi |
bring back a couple of #include that are supplied by nesting, and explain why they are used.
|
200601 |
16-Dec-2009 |
luigi |
Various cosmetic cleanup of the files:
- move global variables around to reduce the scope and make them static if possible;
- add an ipfw_ prefix to all public functions to prevent conflicts (the same should be done for variables);
- try to pack variable declarations in a uniform way across files;
- clarify some comments;
- remove some misspelling of names (#define V_foo VNET(bar)) that slipped in due to cut&paste;
- remove duplicate static variables in different files;
MFC after: 1 month
|
200598 |
16-Dec-2009 |
imp |
Quick fix to make this compile: Remove redundant extern declarations. If the maintainer has a better fix, then feel free to back this out.
|
200590 |
15-Dec-2009 |
luigi |
more splitting of ip_fw2.c, now extract the 'table' routines and the sockopt routines (the upper half of the kernel).
Whoever is the author of the 'table' code (Ruslan/glebius/oleg ?) please change the attribution in ip_fw_table.c. I have copied the copyright line from ip_fw2.c but it carries my name and I have neither written nor designed the feature so I don't deserve the credit.
MFC after: 1 month
|
200580 |
15-Dec-2009 |
luigi |
Start splitting ip_fw2.c and ip_fw.h into smaller components. At this time we pull out from ip_fw2.c the logging functions, and support for dynamic rules, and move kernel-only stuff into netinet/ipfw/ip_fw_private.h
No ABI change involved in this commit, unless I made some mistake. ip_fw.h has changed, though not in the userland-visible part.
Files touched by this commit:
conf/files now references the two new source files
netinet/ip_fw.h remove kernel-only definitions gone into netinet/ipfw/ip_fw_private.h.
netinet/ipfw/ip_fw_private.h new file with kernel-specific ipfw definitions
netinet/ipfw/ip_fw_log.c ipfw_log and related functions
netinet/ipfw/ip_fw_dynamic.c code related to dynamic rules
netinet/ipfw/ip_fw2.c removed the pieces that go in the new files
netinet/ipfw/ip_fw_nat.c minor rearrangement to remove LOOKUP_NAT from the main headers. This requires a new function pointer.
A bunch of other kernel files that included netinet/ip_fw.h now require netinet/ipfw/ip_fw_private.h as well. Not 100% sure i caught all of them.
MFC after: 1 month
|
200567 |
15-Dec-2009 |
luigi |
implement a new match option,
lookup {dst-ip|src-ip|dst-port|src-port|uid|jail} N
which searches the specified field in table N and sets tablearg accordingly. With dst-ip or src-ip the option replicates two existing options. When used with other arguments, the option can be useful to quickly dispatch traffic based on other fields.
Work supported by the Onelab project.
MFC after: 1 week
|
200473 |
13-Dec-2009 |
bz |
Throughout the network stack we have a few places of if (jailed(cred)) left. If you are running with a vnet (virtual network stack) those will return true and defer you to classic IP-jails handling and thus things will be "denied" or returned with an error.
Work around this problem by introducing another "jailed()" function, jailed_without_vnet(), that also takes vnets into account, and permits the calls, should the jail from the given cred have its own virtual network stack.
We cannot change the classic jailed() call to do that, as it is used outside the network stack as well.
Discussed with: julian, zec, jamie, rwatson (back in Sept) MFC after: 5 days
|
200361 |
10-Dec-2009 |
luigi |
use div64 when converting back the burst value for userland
|
200360 |
10-Dec-2009 |
luigi |
when draining a flowset free the entire chain, not just one packet.
|
200358 |
10-Dec-2009 |
luigi |
centralize the code to free a packet (or a chain) while in dummynet. Remove an old macro and its stale comment.
|
200170 |
05-Dec-2009 |
oleg |
Fix burst processing for WF2Q pipes - do not increase available burst size unless pipe is idle. This should fix the following issues:
- 'dummynet: OUCH! pipe should have been idle!' log messages.
- exceeding configured pipe bandwidth.
MFC after: 1 week
|
200118 |
05-Dec-2009 |
luigi |
adjust comment in previous commit after Julian's explanation
|
200116 |
05-Dec-2009 |
luigi |
remove a dead block of code, document how the ipfw clients are hooked and the difference in handling the 'enable' variable for layer2 and layer3. The latter needs fixing once i figure out how it worked pre-vnet.
MFC after: 7 days
|
200113 |
05-Dec-2009 |
luigi |
fix build with VNET enabled
Reported by: David Wolfskill
|
200102 |
04-Dec-2009 |
ume |
Use INET_ADDRSTRLEN and INET6_ADDRSTRLEN rather than hard coded number.
Spotted by: bz
|
200059 |
03-Dec-2009 |
luigi |
preparation work to replace the monster switch in ipfw_chk() with table of functions.
This commit (which is heavily based on work done by Marta Carbone in this year's GSOC project), removes the goto's and explicit return from the inner switch(), so we will have an easier time when putting the blocks into individual functions.
MFC after: 3 weeks
|
200055 |
03-Dec-2009 |
ume |
Teach the debug prints about IPv6.
|
200040 |
02-Dec-2009 |
luigi |
- initialize src_ip in the main loop to prevent a compiler warning (gcc 4.x under linux, not sure how real is the complaint).
- rename a macro argument to prevent name clashes.
- add the macro name on a couple of #endif
- add a blank line for readability.
MFC after: 3 days
|
200034 |
02-Dec-2009 |
luigi |
Dispatch sockopt calls to ipfw and dummynet using the new option numbers, IP_FW3 and IP_DUMMYNET3. Right now the modules return an error if called with those arguments so there is no danger of unwanted behaviour.
MFC after: 3 days
|
200029 |
02-Dec-2009 |
luigi |
small changes for portability and diff reduction wrt/ FreeBSD 7. No functional differences.
- use the div64() macro to wrap 64 bit divisions (which almost always are 64 / 32 bits) so they are easier to handle with compilers or OS that do not have native support for 64bit divisions;
- use a local variable for p_numbytes even if not strictly necessary on HEAD, as it reduces diffs with FreeBSD7
- in dummynet_send() check that a tag is present before dereferencing the pointer.
- add a couple of blank lines for readability near the end of a function
MFC after: 3 days
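The idea of wrapping 64-bit divisions in a single macro, as described above, can be sketched as follows. The macro body shown here is an assumption made for illustration (named `demo_div64` to make that clear), as is the helper `demo_scale_down()`; a port to a platform without native 64-bit division would only need to replace the macro.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical wrapper: on platforms with native 64-bit division this
 * is a plain divide; elsewhere it could map to a helper routine.
 */
#define demo_div64(a, b)	((int64_t)(a) / (int64_t)(b))

/* Example caller: every 64 / 32 bit division goes through the macro. */
static int64_t
demo_scale_down(int64_t total, int32_t divisor)
{
	assert(divisor != 0);
	return (demo_div64(total, divisor));
}
```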
|
200027 |
02-Dec-2009 |
ume |
Teach send_pkt() and ipfw_tick() about IPv6. This fixes the issue where keep-alives do not work for IPv6.
PR: kern/117234 Submitted by: mlaier, Joost Bekkers <joost__at__jodocus.org> MFC after: 1 month
|
200026 |
02-Dec-2009 |
glebius |
Until this moment carp(4) used a strange logging priority. It used debug priority for such important information as MASTER/BACKUP state change, and used a normal logging priority for such innocent messages as receiving a short packet (which is a normal VRRP packet between some other routers) or receiving a CARP packet on a non-carp interface (someone else running CARP).
This commit shifts message logging priorities to a more sane default.
|
200023 |
02-Dec-2009 |
luigi |
Add new sockopt names for ipfw and dummynet.
This commit is just grabbing entries for the new names that will be used in the future, so you don't need to rebuild anything now.
MFC after: 3 days
|
200020 |
02-Dec-2009 |
luigi |
change the type of the opcode from enum *:8 to u_int8_t so the size and alignment of the ipfw_insn is not compiler dependent. No changes in the code generated by gcc.
There was only one instance of this kind in our entire source tree, so i suspect the old definition was a poor choice (which i made).
MFC after: 3 days
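The layout concern above can be illustrated with a reduced instruction header. The struct below is a hypothetical stand-in (`demo_insn`, with assumed field names), not the real ipfw_insn; it only shows that with fixed-width integer fields the size no longer depends on how a compiler lays out an enum bit-field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opcode values can still come from an enum ... */
enum demo_opcode {
	DEMO_O_NOP = 0,
	DEMO_O_IP_SRC = 1
};

/* ... but they are stored in a fixed-width field. */
struct demo_insn {
	uint8_t  opcode;	/* holds an enum demo_opcode value */
	uint8_t  len;
	uint16_t arg1;
};

static size_t
demo_insn_size(void)
{
	return (sizeof(struct demo_insn));
}
```

On common ABIs this header is 4 bytes, which is the kind of property a userland/kernel interface needs to hold across compilers.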
|
199866 |
27-Nov-2009 |
tuexen |
Use the default stack size for the iterator thread. This fixes a crash reported by Irene Ruengeler.
Approved by: rrs (mentor) MFC after: 1 month
|
199525 |
19-Nov-2009 |
bms |
Correct a comment.
MFC after: 1 day
|
199477 |
18-Nov-2009 |
tuexen |
Fix a bug where the system panics when a SHUTDOWN is received with an illegal TSN.
Approved by: rrs (mentor) MFC after: ASAP
|
199459 |
17-Nov-2009 |
tuexen |
Get rid of the field addr_over, which is never really used, only copied around.
Approved by: rrs (mentor)
|
199437 |
17-Nov-2009 |
tuexen |
Always use LIST_EMPTY instead of sometimes using SCTP_LIST_EMPTY, which is defined as LIST_EMPTY.
Approved by: rrs (mentor) MFC after: 1 month
|
199374 |
17-Nov-2009 |
tuexen |
Fix a bug where queued ASCONF messages are not sent out.
Approved by: rrs (mentor) Obtained from: Irene Ruengeler MFC after: 1 month
|
199373 |
17-Nov-2009 |
tuexen |
Fix a memory leak when destroying an SCTP stack. Clean up sctp_pcb_finish().
Approved by: rrs (mentor) MFC after: 1 month
|
199372 |
17-Nov-2009 |
tuexen |
Do not start the iterator when there are no associations. This fixes a bug found by Irene Ruengeler.
Approved by: rrs (mentor) MFC after: 1 month
|
199371 |
17-Nov-2009 |
tuexen |
Temporarily disable the thread-based iterator. It does not work with vnet.
Approved by: rrs (mentor)
|
199370 |
17-Nov-2009 |
tuexen |
Allow the UMA to free data. This resolves the UMA related bug reported by Julian.
Approved by: rrs (mentor) MFC after: 1 month
|
199369 |
17-Nov-2009 |
tuexen |
Do not hold the lock longer than necessary.
Approved by: rrs (mentor) MFC after: 1 month
|
199287 |
15-Nov-2009 |
bms |
Fix a functional regression in multicast.
Userland daemons need to see IGMP traffic regardless of the group; omit the imo filter check if the proto is IGMP. The kernel part of IGMP will have already filtered appropriately at this point.
MFC after: ASAP Submitted by: Franz Struwig Reported by: Ivor Prebeg, Franz Struwig
|
199208 |
12-Nov-2009 |
attilio |
Move inet_aton() (the counterpart of inet_ntoa(), already present in libkern) into libkern in order to make it usable by modules other than alias_proxy.
Obtained from: Sandvine Incorporated Sponsored by: Sandvine Incorporated MFC: 1 week
|
199102 |
09-Nov-2009 |
trasz |
Remove ifdefed out part of code, which seems to have originated a decade ago in OpenBSD. As it is now, there is no way for this to be useful, since IPsec is free to forward packets via whatever interface it wants, so checking capabilities of the interface passed from ip_output (fetched from the routing table) serves no purpose.
Discussed with: sam@
|
199073 |
09-Nov-2009 |
oleg |
style(9): add missing parentheses
|
198990 |
06-Nov-2009 |
jhb |
Several years ago a feature was added to TCP that caused soreceive() to send an ACK right away if data was drained from a TCP socket that had previously advertised a zero-sized window. The current code requires the receive window to be exactly zero for this to kick in. If window scaling is enabled and the window is smaller than the scale, then the effective window that is advertised is zero. However, in that case the zero-sized window handling is not enabled because the window is not exactly zero. The fix changes the code to check the raw window value against zero.
Reviewed by: bz MFC after: 1 week
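The condition described above, where a non-zero receive window is still advertised as zero once window scaling is applied, can be shown with a small sketch. The function name `demo_advertises_zero_window()` is hypothetical and this is not the patch itself, only the arithmetic behind the problem.

```c
#include <assert.h>
#include <stdint.h>

/*
 * With window scaling, the value placed in the TCP header is the
 * receive window shifted right by the scale factor. Any window
 * smaller than (1 << rcv_scale) therefore goes on the wire as zero.
 */
static int
demo_advertises_zero_window(uint32_t recv_win, unsigned int rcv_scale)
{
	assert(rcv_scale < 32);
	return ((recv_win >> rcv_scale) == 0);
}
```

For example, with a scale of 7 a 100-byte window is advertised as zero even though `recv_win == 0` is false, which is why a test for an exactly-zero window misses this case.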
|
198845 |
03-Nov-2009 |
oleg |
Fix two issues that can lead to exceeding configured pipe bandwidth:
- do not expire queues which are not ready to be expired.
- properly calculate available burst size.
MFC after: 3 days
|
198621 |
29-Oct-2009 |
tuexen |
Improve round robin stream scheduler and cleanup some code.
Approved by: rrs (mentor) MFC after: 3 days
|
198539 |
28-Oct-2009 |
brueffer |
Close a stream file descriptor leak.
PR: 138130 Submitted by: Patroklos Argyroudis <argp@census-labs.com> MFC after: 1 week
|
198522 |
27-Oct-2009 |
tuexen |
Bugfix: Use formula from section 7.2.3 of RFC 4960. Reported by Martin Becke.
Approved by: rrs (mentor) MFC after: 3 days
|
198499 |
26-Oct-2009 |
tuexen |
Improve the round robin stream scheduler.
Approved by: rrs (mentor) MFC after: 3 days
|
198438 |
24-Oct-2009 |
rwatson |
Correct spelling typo in ip_input comment.
Pointed out by: N.J. Mann <njm at njm.me.uk>, John Nielsen <john at jnielsen.net>, julian (!), lstewart MFC after: 2 days
|
198418 |
23-Oct-2009 |
qingli |
Use the correct option name in the preprocessor command to enable or disable diagnostic messages.
Reviewed by: ru MFC after: 3 days
|
198393 |
23-Oct-2009 |
rwatson |
Improve grammar in ip_input comment while attempting to maintain what might be its meaning.
MFC after: 3 days
|
198301 |
20-Oct-2009 |
qingli |
In the ARP callout timer expiration function, the current time_second is compared against the entry expiration time value (that was set based on time_second) to check if the current time is larger than the set expiration time. Due to the +/- timer granularity value, the comparison returns false, causing the alternative code to be executed. The alternative code path freed the memory without removing that entry from the table list, causing a use-after-free bug.
Reviewed by: discussed with kmacy MFC after: immediately Verified by: rnoland, yongari
|
198196 |
18-Oct-2009 |
rwatson |
Rewrap ip_input() comment so that it prints more nicely.
MFC after: 3 days
|
198111 |
15-Oct-2009 |
qingli |
This patch fixes the following issues in the ARP operation:
1. There is a regression issue in the ARP code. The incomplete ARP entry was timing out too quickly (1 second timeout), as such, a new entry is created each time arpresolve() is called. Therefore the maximum attempts made is always 1. Consequently the error code returned to the application is always 0.
2. Set the expiration of each incomplete entry to a 20-second lifetime.
3. Return "incomplete" entries to the application.
Reviewed by: kmacy MFC after: 3 days
|
198050 |
13-Oct-2009 |
bz |
Compare pointer to NULL rather than 0.
MFC after: 1 month
|
197955 |
11-Oct-2009 |
tuexen |
Fix a race condition where a mutex was destroyed while sleeping on it. Found while analyzing a report from julian. It might fix his bug.
Approved by: rrs (mentor) MFC after: 3 days
|
197952 |
11-Oct-2009 |
julian |
Virtualize the pfil hooks so that different jails may choose different packet filters. Also allows ipfw to be enabled on one jail and disabled on another. In 8.0 it's a global setting.
Sitting around in tree waiting to commit for: 2 months MFC after: 2 months
|
197929 |
10-Oct-2009 |
tuexen |
Correct include order as indicated by bz.
Approved by: re (mentor) MFC after: 3 days
|
197914 |
09-Oct-2009 |
tuexen |
Do not include vnet.h twice.
Approved by: rrs (mentor) MFC after: 3 days
|
197868 |
08-Oct-2009 |
tuexen |
Use correct arguments when calling SCTP_RTALLOC().
Approved by: rrs (mentor) MFC after: 0 days
|
197856 |
08-Oct-2009 |
rrs |
Fix so that round robin stream scheduling works as advertised
MFC after: 0 days
|
197814 |
06-Oct-2009 |
rwatson |
Remove tcp_input lock statistics; these are intended for debugging only and are not intended to ship in 8.0 as they dirty additional cache lines in a performance-critical per-packet path.
MFC after: 3 days
|
197795 |
05-Oct-2009 |
rwatson |
In tcp_input(), we acquire a global write lock at first only if a segment is likely to trigger a TCP state change (i.e., FIN/RST/SYN). If we later have to upgrade the lock, we acquire an inpcb reference and drop both global/inpcb locks before reacquiring in-order. In that gap, the connection may transition into TIMEWAIT, so we need to loop back and reevaluate the inpcb after relocking.
MFC after: 3 days Reported by: Kamigishi Rei <spambox at haruhiism.net> Reviewed by: bz
|
197696 |
02-Oct-2009 |
qingli |
Remove a log message from production code. This log message can be triggered by a misconfigured host that is sending out gratuitous ARPs. This log message can also be triggered during a network renumbering event when multiple prefixes co-exist on a single network segment.
MFC after: immediately
|
197695 |
02-Oct-2009 |
qingli |
Previously, if an address alias is configured on an interface, and this address alias has a prefix matching that of another address configured on the same interface, then the ARP entry for the alias is not deleted from the ARP table when that address alias is removed. This patch fixes the aforementioned issue.
PR: kern/139113 MFC after: 3 days
|
197342 |
20-Sep-2009 |
tuexen |
Fix handling of sctp_drain().
Approved by: rrs (mentor) MFC after: 2 month
|
197341 |
20-Sep-2009 |
tuexen |
Fix errnos.
Approved by: rrs(mentor) MFC after: 3 days.
|
197328 |
19-Sep-2009 |
tuexen |
Use appropriate locking when using interface list.
Approved by: rrs (mentor) MFC after: 1 month.
|
197327 |
19-Sep-2009 |
tuexen |
Fix the disabling of sctp_drain().
Approved by: rrs (mentor) MFC after: 1 month.
|
197326 |
19-Sep-2009 |
tuexen |
Get SCTP working in combination with VIMAGE. Contains code from bz.
Approved by: rrs (mentor) MFC after: 1 month.
|
197314 |
18-Sep-2009 |
bms |
Return ENOBUFS consistently if user attempts to exceed in_mcast_maxsocksrc resource limit.
Submitted by: syrinx MFC after: 3 days
|
197288 |
17-Sep-2009 |
rrs |
Support for VNET in SCTP (hopefully)
|
197257 |
16-Sep-2009 |
tuexen |
Fix a bug reported by Daniel Mentz: When authenticating DATA chunks some DATA chunks might get stuck when the MTU gets decreased via an ICMP message.
Approved by: rrs (mentor) MFC after: immediately
|
197244 |
16-Sep-2009 |
silby |
Add the ability to see TCP timers via netstat -x. This can be a useful feature when you have a seemingly stuck socket and want to figure out why it has not been closed yet.
No plans to MFC this, as it changes the netstat sysctl ABI.
Reviewed by: andre, rwatson, Eric Van Gyzen
|
197236 |
15-Sep-2009 |
andre |
Put the optimized soreceive_stream() under a compile time option called TCP_SORECEIVE_STREAM for the time being.
Requested by: brooks
Once compiled in make it easily switchable for testers by using a tuneable net.inet.tcp.soreceive_stream and a corresponding read-only sysctl to report the current state.
Suggested by: rwatson
MFC after: 2 days
M sys/conf/options M sys/kern/uipc_socket.c M sys/netinet/tcp_subr.c M sys/netinet/tcp_usrreq.c
|
197227 |
15-Sep-2009 |
qingli |
Self pointing routes are installed for configured interface addresses and address aliases. After an interface is brought down and brought back up again, those self pointing routes disappeared. This patch ensures after an interface is brought back up, the loopback routes are reinstalled properly.
Reviewed by: bz MFC after: immediately
|
197225 |
15-Sep-2009 |
qingli |
This patch enables the node to respond to ARP requests for configured proxy ARP entries.
Reviewed by: bz MFC after: immediately
|
197210 |
15-Sep-2009 |
qingli |
The bootp code installs an interface address and the nfs client module tries to install the same address again. This extra code is removed, which was discovered by the removal of a call to in_ifscrub() in r196714. This call to in_ifscrub is put back here because the SIOCAIFADDR command can be used to change the prefix length of an existing alias.
Reviewed by: kmacy
|
197203 |
14-Sep-2009 |
qingli |
Previously, the local end of a point-to-point interface was not reachable within the system that owns the interface. Packets destined to the local end point leaked to the wire towards the default gateway if one existed. This behavior is changed as part of the L2/L3 rewrite efforts. The local end point is now reachable within the system. The inpcb code needs to consider this fact during the address selection process.
Reviewed by: bz MFC after: immediately
|
197173 |
13-Sep-2009 |
rrs |
Fixes two bugs:
1) A lock issue, if we ever had to try again we would double lock the INP lock.
2) We were allowing (at wrap) associd 0... which really we cannot allow since 0 normally means in most socket API calls that we are wishing to affect something on the INP not TCB.
MFC after: 1 week
|
197148 |
13-Sep-2009 |
bms |
In expire_mfc(), add an assert on the multicast forwarding cache mutex.
PR: 138666
|
197136 |
12-Sep-2009 |
bms |
Comment some flawed assumptions in inp_join_group() about mixing SSM full-state and delta-based APIs.
ENOTIME to fix right now. No functional changes.
MFC after: 5 days
|
197135 |
12-Sep-2009 |
bms |
Don't allow joins w/o source on an existing group. This is almost always pilot error.
We don't need to check for group filter UNDEFINED state at t1, because we only ever allocate filters with their groups, so we unconditionally reject such calls with EINVAL. Trying to change the active filter mode w/o going through IP_MSFILTER is also disallowed.
Deals with the case described in PR 137164 upfront, cumulative with the fix in svn rev 197132 which only calls imo_match_source() if the source address family was not unspecified.
PR: 137164 MFC after: 5 days
|
197132 |
12-Sep-2009 |
bms |
Tighten input checking in inp_join_group():
* Don't try to use the source address, when its family is unspecified.
* If we get a join without a source, on an existing inclusive mode group, this is an error, as it would change the filter mode.
Fix a problem with the handling of in_mfilter for new memberships:
* Do not rely on imf being NULL; it is explicitly initialized to a non-NULL pointer when constructing a membership.
* Explicitly initialize *imf to EX mode when the source address is unspecified.
This fixes a problem with in_mfilter slot recycling in the join path.
PR: 138690 Submitted by: Stef Walter MFC after: 5 days
|
197130 |
12-Sep-2009 |
bms |
Fix an obvious logic error in the IPv4 multicast leave processing, where the filter mode vector was not updated correctly after the leave.
PR: 138691 Submitted by: Stef Walter MFC after: 5 days
|
197129 |
12-Sep-2009 |
bms |
Fix an API issue in leave processing for IPv4 multicast groups. * Do not assume that the group lookup performed by imo_match_group() is valid when ifp is NULL in this case. * Instead, return EADDRNOTAVAIL if the ifp cannot be resolved for the membership we are being asked to leave.
Caveat user: * The way IPv4 multicast memberships are implemented in the inpcb layer at the moment has the side-effect that struct ip_moptions will still hold the membership, under the old ifp, until ip_freemoptions() is called for the parent inpcb. * The underlying issue is: the inpcb layer does not get notification of an ifp being detached or going away in a thread-safe manner. This is non-trivial to fix.
But hey, at least the kernel shouldn't panic when you unplug a card.
PR: 138689 Submitted by: Stef Walter MFC after: 5 days
|
196995 |
08-Sep-2009 |
np |
Add arp_update_event. This replaces route_arp_update_event, which has not worked since the arp-v2 rewrite.
The event handler will be called with the llentry write-locked and can examine la_flags to determine whether the entry is being added or removed.
Reviewed by: gnn, kmacy Approved by: gnn (mentor) MFC after: 1 month
|
196967 |
08-Sep-2009 |
phk |
Move the duplicate definition of struct sockaddr_storage to its own include file, and include this where the previous duplicate definitions were.
Static program checkers like FlexeLint rightfully take a dim view of duplicate definitions, even if they currently are identical.
|
196932 |
07-Sep-2009 |
syrinx |
When joining a multicast group, the inp_lookup_mcast_ifp call does a KASSERT that the group address is multicast, so the check that this is indeed true, returning EINVAL if not, should be done before calling inp_lookup_mcast_ifp. This fixes a kernel crash when calling setsockopt(sock, IPPROTO_IP, IP_ADD_MEMBERSHIP, ...) with an invalid group address.
Reviewed by: bms Approved by: bms
MFC after: 3 days
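The ordering described above (validate first, then call the routine that asserts) can be sketched in userland C. This is only an illustration, not the committed kernel change: join_group_sketch() and lookup_mcast_ifp_sketch() are hypothetical names, and the lookup is a stub that merely counts calls.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>

/* Counts how often the (stub) lookup was reached. */
static int lookup_calls;

/*
 * Hypothetical stand-in for a lookup routine that assumes its
 * argument is a multicast group. Not the kernel's function.
 */
static void
lookup_mcast_ifp_sketch(struct in_addr group)
{
	(void)group;
	lookup_calls++;
}

/* Reject non-multicast groups before the lookup ever sees them. */
static int
join_group_sketch(const char *dotted)
{
	struct in_addr group;

	if (inet_pton(AF_INET, dotted, &group) != 1)
		return (EINVAL);
	/* IN_MULTICAST() expects the address in host byte order. */
	if (!IN_MULTICAST(ntohl(group.s_addr)))
		return (EINVAL);
	lookup_mcast_ifp_sketch(group);
	return (0);
}
```

With this ordering, a unicast address such as 10.0.0.1 is rejected with EINVAL and the stub lookup is never entered.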
|
196881 |
06-Sep-2009 |
pjd |
Correct comment.
|
196797 |
03-Sep-2009 |
gnn |
Add ARP statistics to the kernel and netstat.
New counters now exist for: requests sent, replies sent, requests received, replies received, packets received, total packets dropped due to no ARP entry, entries timed out, and duplicate IPs seen.
The new statistics are seen in the netstat command when it is given the -s command line switch.
MFC after: 2 weeks In collaboration with: bz
|
196738 |
01-Sep-2009 |
bz |
In case an upper layer protocol tries to send a packet but the L2 code does not have the ethernet address for the destination within the broadcast domain in the table, we remember the original mbuf in `la_hold' in arpresolve() and send out a different packet with an arp request. In case there will be more upper layer packets to send we will free an earlier one held in `la_hold' and queue the new one.
Once we get a packet in, with which we can perfect our arp table entry we send out the original 'on hold' packet, should there be any. Rather than continuing to process the packet that we received, we returned without freeing the packet that came in, which basically means that we leaked an mbuf for every arp request we sent.
Rather than freeing the received packet and returning, continue to process the incoming arp packet as well. This should (a) improve some setups, also proxy-arp, in case it was an incoming arp request and (b) resemble the behaviour FreeBSD had from day 1, which aligns with RFC826 "Packet reception" (merge case).
Rename 'm0' to 'hold' to make the code more understandable as well as diffable to earlier versions more easily.
Handle the link-layer entry 'la' lock completely in the block where needed and release it as early as possible, rather than holding it longer, down to the end of the function.
Found by: pointyhat, ns1 Bug hunting session with: erwin, simon, rwatson Tested by: simon on cluster machines Reviewed by: ratson, kmacy, julian MFC after: 3 days
|
196714 |
31-Aug-2009 |
qingli |
This patch fixes the following issues:
- Routing messages are not generated when adding and removing interface address aliases. - Loopback route installed for an interface address alias is not deleted from the routing table when that address alias is removed from the associated interface. - Function in_ifscrub() is called extraneously.
Reviewed by: gnn, kmacy, sam MFC after: 3 days
|
196610 |
28-Aug-2009 |
tuexen |
Fix a bug where vlan interfaces are not supported by SCTP.
Approved by: rrs (mentor) MFC after: 3 days
|
196608 |
28-Aug-2009 |
qingli |
Do not try to free the rt_lle entry of the cached route in ip_output() if the cached route was not initialized from the flow-table. The rt_lle entry is invalid unless it has been initialized through the flow-table.
Reviewed by: kmacy, rwatson MFC after: immediately
|
196535 |
25-Aug-2009 |
rwatson |
Use locks specific to the lltable code, rather than borrow the ifnet list/index locks, to protect link layer address tables. This avoids lock order issues during interface teardown, but maintains the bug that sysctl copy routines may be called while a non-sleepable lock is held.
Reviewed by: bz, kmacy MFC after: 3 days
|
196509 |
24-Aug-2009 |
tuexen |
This fixes a bug where the value set by SCTP_PARTIAL_DELIVERY_POINT was not honored, if the socket buffer size was not 4 times that large.
Approved by: rrs (mentor) MFC after: 3 days.
|
196507 |
24-Aug-2009 |
rrs |
This fixes two bugs in the NR-Sack code: 1) When calculating the table offset for sliding the sack array, the two byte values must be "ored" together in order for us to do the correct sliding of the arrays. 2) We were NOT properly doing CC and other changes to things only NR-Sacked. The solution here is to make a separate function that will actually do both CC/updates and free things if it's NR sack'd. This actually shrinks out common code from three places (much better).
MFC after: 3 days
|
196502 |
24-Aug-2009 |
zec |
Introduce a div_destroy() function which takes over per-vnet cleanup tasks from the existing modevent / MOD_UNLOAD handler, and register div_destroy() in protosw as per-vnet .pr_destroy() handler for options VIMAGE builds. In nooptions VIMAGE builds, div_destroy() will be invoked from the modevent handler, resulting in effectively identical operation as it was prior to this change. div_destroy() also tears down hashtables used by ipdivert, which were previously left behind on ipdivert kldunloads.
For options VIMAGE builds only, temporarily disable kldunloading of ipdivert, because without introducing additional locking logic it is impossible to atomically check whether all ipdivert instances in all vnets are idle, and proceed with cleanup without opening a race window for a vnet to open an ipdivert socket while ipdivert tear-down is in progress.
While here, staticize div_init(), because it is not used outside of ip_divert.c.
In cooperation with: julian Approved by: re (rwatson), julian (mentor) MFC after: 3 days
|
196481 |
23-Aug-2009 |
rwatson |
Rework global locks for interface list and index management, correcting several critical bugs, including race conditions and lock order issues:
Replace the single rwlock, ifnet_lock, with two locks, an rwlock and an sxlock. Either can be held to stabilize the lists and indexes, but both are required to write. This allows the list to be held stable in both network interrupt contexts and sleepable user threads across sleeping memory allocations or device driver interactions. As before, writes to the interface list must occur from sleepable contexts.
Reviewed by: bz, julian MFC after: 3 days
|
196453 |
23-Aug-2009 |
julian |
Fix another typo right next to the previous one, that amazingly, I did not see before.
MFC after: 1 week
|
196451 |
23-Aug-2009 |
julian |
Fix typo in comment that has been bugging me for days.
MFC after: 1 week
|
196423 |
21-Aug-2009 |
julian |
Fix ipfw's initialization functions to get the correct order of evaluation to allow vnet and non vnet operation. Move some functions from ip_fw_pfil.c to ip_fw2.c and move to mostly using the SYSINIT and VNET_SYSINIT handlers instead of the modevent handler. Correct some spelling errors in comments in the affected code. Note this fixes a crash in NON VIMAGE kernels when ipfw is unloaded.
This patch is a minimal patch for 8.0. I have a much larger patch that actually fixes the underlying problems that will be applied after 8.0.
Reviewed by: zec@, rwatson@, bz@(earlier version) Approved by: re (rwatson) MFC after: Immediately
|
196410 |
20-Aug-2009 |
peter |
Fix signed comparison bug when ticks goes negative after 24 days of uptime. This causes the tcp time_wait state code to fail to expire sockets in timewait state.
Approved by: re (kensmith)
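The class of bug described above can be illustrated with a small userland sketch. It is not the committed change: expired_naive() and expired_wrapsafe() are hypothetical helpers, and the cast of the unsigned difference back to int assumes the usual two's complement behaviour. The idea is that a signed tick counter wraps (after roughly 24 days at 1000 ticks per second), so deadlines must be compared by the sign of their difference rather than directly.

```c
#include <limits.h>

/* Naive comparison: gives wrong answers once the counter wraps. */
static int
expired_naive(int now, int deadline)
{
	return (now >= deadline);
}

/*
 * Wrap-safe comparison: subtract in unsigned arithmetic, where
 * overflow is well defined, then interpret the difference as signed.
 */
static int
expired_wrapsafe(int now, int deadline)
{
	return ((int)((unsigned int)now - (unsigned int)deadline) >= 0);
}
```

For example, with a deadline of INT_MAX - 10 and a current tick value that has wrapped to INT_MIN + 5, the naive test never reports expiry, while the wrap-safe test does.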
|
196397 |
20-Aug-2009 |
will |
Fix CARP memory leaks on carp_if's malloc'd using M_CARP. This occurs when CARP tries to free them using M_IFADDR after the last address for a virtual host is removed and when detaching from the parent interface.
Reviewed by: mlaier Approved by: re (kib), ken (mentor)
|
196376 |
19-Aug-2009 |
tuexen |
Fix a bug in the handling of unreliable messages which results in stalled associations.
Approved by: re, rrs (mentor) MFC after: immediately
|
196368 |
18-Aug-2009 |
kmacy |
- change the interface to flowtable_lookup so that we don't rely on the mbuf for obtaining the fib index - check that a cached flow corresponds to the same fib index as the packet for which we are doing the lookup - at interface detach time flush any flows referencing stale rtentrys associated with the interface that is going away (fixes reported panics) - reduce the time between cleans in case the cleaner is running at the time the eventhandler is called and the wakeup is missed, so that less time will elapse before the eventhandler returns - separate per-vnet initialization from global initialization (pointed out by jeli@)
Reviewed by: sam@ Approved by: re@
|
196364 |
18-Aug-2009 |
tuexen |
Fix a crash when using one-to-one style socket in non-blocking mode and there is no listening server. PR: 137795 Approved by: re, rrs (mentor) MFC after: immediately.
|
196322 |
17-Aug-2009 |
jhb |
Purge mergeinfo in sys/ that is either empty or a subset of the parent mergeinfo on sys/ itself.
Approved by: re (mergeinfo blanket)
|
196260 |
15-Aug-2009 |
tuexen |
* Fix a bug where PR-SCTP settings are ignored when using implicit association setup. * Fix a bug where messages with illegal stream ids are not deleted. * Fix a crash when reporting back unsent messages from the send_queue. * Fix a bug related to INIT retransmission when the socket is already closed. * Fix a bug where associations were stalled when partial delivery API was enabled. * Fix a bug where the receive buffer size was smaller than the partial_delivery_point.
Approved by: re, rrs (mentor) MFC after: One day.
|
196234 |
14-Aug-2009 |
qingli |
In function ip_output(), the cached route is flushed when there is a mismatch between the cached entry and the intended destination. The cached rtentry{} is flushed but the associated llentry{} is not. This causes the wrong destination MAC address to be used in the output packets. The fix is to flush the llentry{} when rtentry{} is cleared.
Reviewed by: kmacy, rwatson Approved by: re
|
196229 |
14-Aug-2009 |
zec |
SCTP is not yet compatible with options VIMAGE kernels although it compiles with VIMAGE defined, so explicitly disallow building such kernels.
Reviewed by: rrs Approved by: re (rwatson), julian (mentor)
|
196201 |
14-Aug-2009 |
julian |
Fix ipfw crash on uid or gid check. Receiving any ip packet for which there is no existing socket will crash if ipfw has a uid or gid test rule, as the uid/gid of the non existent owner of said non existent socket is tested. Brooks introduced this error as part of his >16 gids patch. It appears to be a cut-n-paste error from similar code a few lines before. The old code used the 'pcb' variable here, but the new code switched to the 'inp' variable, which is often NULL and is what is tested in the code further up. The rest of the multi-gid patch for ipfw seems solid (and cleaner than previous code).
Reviewed by: brooks Approved by: re (rwatson)
|
196041 |
02-Aug-2009 |
rwatson |
Add padding to struct inpcb, missed during our padding sweep earlier in the release cycle.
Approved by: re (kensmith)
|
196039 |
02-Aug-2009 |
rwatson |
Many network stack subsystems use a single global data structure to hold all pertinent statistics for the subsystem. These structures are sometimes "borrowed" by kernel modules that require a place to store statistics for similar events.
Add KPI accessor functions for statistics structures referenced by kernel modules so that they no longer encode certain specifics of how the data structures are named and stored. This change is intended to make it easier to move to per-CPU network stats following 8.0-RELEASE.
The following modules are affected by this change:
if_bridge, if_cxgb, if_gif, ip_mroute, ipdivert, pf
In practice, most of these statistics consumers should, in fact, maintain their own statistics data structures rather than borrowing structures from the base network stack. However, that change is too aggressive for this point in the release cycle.
Reviewed by: bz Approved by: re (kib)
|
196019 |
01-Aug-2009 |
rwatson |
Merge the remainder of kern_vimage.c and vimage.h into vnet.c and vnet.h, we now use jails (rather than vimages) as the abstraction for virtualization management, and what remained was specific to virtual network stacks. Minor cleanups are done in the process, and comments updated to reflect these changes.
Reviewed by: bz Approved by: re (vimage blanket)
|
195976 |
30-Jul-2009 |
delphij |
Show interface name which received short CARP packet (e.g. a VRRP packet), in order to match other codepaths nearby. This makes troubleshooting easier.
Approved by: re (kib) MFC after: 1 month
|
195923 |
28-Jul-2009 |
julian |
Start up the vnet part of initialization a bit after the global part. Fixes crash on boot if ipfw is compiled in.
Submitted by: tegge@ Reviewed by: tegge@ Approved by: re (kib)
|
195922 |
28-Jul-2009 |
julian |
Somewhere along the line accept sockets stopped honoring the FIB selected for them. Fix this.
Reviewed by: ambrisko Approved by: re (kib) MFC after: 3 days
|
195919 |
28-Jul-2009 |
tuexen |
Fix a bug where a wrong initialization value is used for an SCTP specific sysctl variable.
Approved by: re, rrs (mentor). MFC after: 2 weeks.
|
195918 |
28-Jul-2009 |
rrs |
It turns out that when a receiver forwards through its TSNs, the processing code holds the read lock (when processing a FWD-TSN for pr-sctp). If it finds stranded data that can be given to the application, it calls sctp_add_to_readq(). The readq function also grabs this lock. So if INVAR is on we get a double recurse on a non-recursive lock and panic.
This fix changes it so that the readq() function gets a flag telling it whether the lock is held; if so, it does not get the lock.
Approved by: re@freebsd.org (Kostik Belousov) MFC after: 1 week
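The "caller tells the callee whether the lock is already held" pattern from this commit can be sketched as follows. This is a toy model under stated assumptions, not the SCTP code: struct toy_lock, struct toy_readq, the READQ_* constants and both functions are hypothetical, and the lock is a plain flag whose assertions play the role of a non-recursive lock check.

```c
#include <assert.h>

/* Toy non-recursive lock: acquiring it twice trips the assertion. */
struct toy_lock {
	int held;
};

static void
toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);
	l->held = 1;
}

static void
toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = 0;
}

#define READQ_LOCK_NOT_HELD	0
#define READQ_LOCK_HELD		1

struct toy_readq {
	struct toy_lock lock;
	int entries;
};

/* Takes the lock only when the caller does not already hold it. */
static void
add_to_readq_sketch(struct toy_readq *q, int lock_held)
{
	if (!lock_held)
		toy_lock_acquire(&q->lock);
	q->entries++;
	if (!lock_held)
		toy_lock_release(&q->lock);
}

/* Caller that already holds the lock and says so via the flag. */
static void
process_fwd_tsn_sketch(struct toy_readq *q)
{
	toy_lock_acquire(&q->lock);
	add_to_readq_sketch(q, READQ_LOCK_HELD);
	toy_lock_release(&q->lock);
}
```

Passing READQ_LOCK_NOT_HELD from process_fwd_tsn_sketch() would trip the assertion in toy_lock_acquire(), which mirrors the panic described above.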
|
195914 |
27-Jul-2009 |
qingli |
This patch does the following:
- Allow loopback route to be installed for address assigned to interface of IFF_POINTOPOINT type. - Install loopback route for an IPv4 interface address when the "useloopback" sysctl variable is enabled. Similarly, install loopback route for an IPv6 interface address when the sysctl variable "nd6_useloopback" is enabled. Deleting loopback routes for interface addresses is unconditional in case these sysctl variables were disabled after an interface address has been assigned.
Reviewed by: bz Approved by: re
|
195906 |
27-Jul-2009 |
tuexen |
Fix the handling of unordered messages when using PR-SCTP.
Approved by: re, rrs (mentor) MFC after: 3 weeks.
|
195904 |
27-Jul-2009 |
tuexen |
Get rid of unused field. This will also be deleted in the official specification of the SCTP socket API.
Approved by: re, rrs (mentor)
|
195894 |
26-Jul-2009 |
tuexen |
Add a missing unlock for the inp lock when returning early from sctp_add_to_readq().
Approved by: re, rrs (mentor) MFC after: 2 weeks.
|
195862 |
25-Jul-2009 |
julian |
Catch ipfw up to the rest of the vimage code. It got left behind when it moved to its new location.
Approved by: re (kensmith)
|
195837 |
23-Jul-2009 |
rwatson |
Introduce and use a sysinit-based initialization scheme for virtual network stacks, VNET_SYSINIT:
- Add VNET_SYSINIT and VNET_SYSUNINIT macros to declare events that will occur each time a network stack is instantiated and destroyed. In the !VIMAGE case, these are simply mapped into regular SYSINIT/SYSUNINIT. For the VIMAGE case, we instead use SYSINIT's to track their order and properties on registration, using them for each vnet when created/ destroyed, or immediately on module load for already-started vnets. - Remove vnet_modinfo mechanism that existed to serve this purpose previously, as well as its dependency scheme: we now just use the SYSINIT ordering scheme. - Implement VNET_DOMAIN_SET() to allow protocol domains to declare that they want init functions to be called for each virtual network stack rather than just once at boot, compiling down to DOMAIN_SET() in the non-VIMAGE case. - Walk all virtualized kernel subsystems and make use of these instead of modinfo or DOMAIN_SET() for init/uninit events. In some cases, convert modular components from using modevent to using sysinit (where appropriate). In some cases, do minor rejuggling of SYSINIT ordering to make room for or better manage events.
Portions submitted by: jhb (VNET_SYSINIT), bz (cleanup) Discussed with: jhb, bz, julian, zec Reviewed by: bz Approved by: re (VIMAGE blanket)
|
195814 |
21-Jul-2009 |
bz |
sysctl_msec_to_ticks is used with both virtualized and non-virtualized sysctls so we cannot use one common function.
Add a macro to vnet.h to convert the arg1 in the virtualized case, so as not to expose the maths all over the code.
Add a wrapper for the single virtualized call, properly handling arg1 and call the default implementation from there.
Convert the two other places to use the new macro.
Reviewed by: rwatson Approved by: re (kib)
|
195788 |
20-Jul-2009 |
rwatson |
Back out the moving in r195782 of V_ip_id's initialization from the top back to the bottom of ip_init() as found in 7.x. I missed the fact that the bottom half of the init routine only runs in the !VNET case.
Submitted by: zec Approved by: re (vimage blanket)
|
195782 |
20-Jul-2009 |
rwatson |
Garbage collect vnet module registrations that have neither constructors nor destructors, as there's no actual work to do.
In most cases, the constructors weren't needed because of the existing protocol initialization functions run by net_init_domain() as part of VNET_MOD_NET, or they were eliminated when support for static initialization of virtualized globals was added.
Garbage collect dependency references to modules without constructors or destructors, notably VNET_MOD_INET and VNET_MOD_INET6.
Reviewed by: bz Approved by: re (vimage blanket)
|
195760 |
19-Jul-2009 |
rwatson |
Reimplement and/or implement vnet list locking by replacing a mostly unused custom mutex/condvar-based sleep lock with two locks: an rwlock (for non-sleeping use) and sxlock (for sleeping use). Either acquired for read is sufficient to stabilize the vnet list, but both must be acquired for write to modify the list.
Replace previous no-op read locking macros, used in various places in the stack, with actual locking to prevent race conditions. Callers must declare when they may perform unbounded sleeps or not when selecting how to lock.
Refactor vnet sysinits so that the vnet list and locks are initialized before kernel modules are linked, as the kernel linker will use them for modules loaded by the boot loader.
Update various consumers of these KPIs based on whether they may sleep or not.
Reviewed by: bz Approved by: re (kib)
|
195727 |
16-Jul-2009 |
rwatson |
Remove unused VNET_SET() and related macros; only VNET_GET() is ever actually used. Rename VNET_GET() to VNET() to shorten variable references.
Discussed with: bz, julian Reviewed by: bz Approved by: re (kensmith, kib)
|
195699 |
14-Jul-2009 |
rwatson |
Build on Jeff Roberson's linker-set based dynamic per-CPU allocator (DPCPU), as suggested by Peter Wemm, and implement a new per-virtual network stack memory allocator. Modify vnet to use the allocator instead of monolithic global container structures (vinet, ...). This change solves many binary compatibility problems associated with VIMAGE, and restores ELF symbols for virtualized global variables.
Each virtualized global variable exists as a "reference copy", and also once per virtual network stack. Virtualized global variables are tagged at compile-time, placing them in a special linker set, which is loaded into a contiguous region of kernel memory. Virtualized global variables in the base kernel are linked as normal, but those in modules are copied and relocated to a reserved portion of the kernel's vnet region with the help of the kernel linker.
Virtualized global variables exist in per-vnet memory set up when the network stack instance is created, and are initialized statically from the reference copy. Run-time access occurs via an accessor macro, which converts from the current vnet and requested symbol to a per-vnet address. When "options VIMAGE" is not compiled into the kernel, normal global ELF symbols will be used instead and indirection is avoided.
This change restores static initialization for network stack global variables, restores support for non-global symbols and types, eliminates the need for many subsystem constructors, eliminates large per-subsystem structures that caused many binary compatibility issues both for monitoring applications (netstat) and kernel modules, removes the per-function INIT_VNET_*() macros throughout the stack, eliminates the need for vnet_symmap ksym(2) munging, and eliminates duplicate definitions of virtualized globals under VIMAGE_GLOBALS.
Bump __FreeBSD_version and update UPDATING.
Portions submitted by: bz Reviewed by: bz, zec Discussed with: gnn, jamie, jeff, jhb, julian, sam Suggested by: peter Approved by: re (kensmith)
|
195655 |
13-Jul-2009 |
lstewart |
Fix a race in the manipulation of the V_tcp_sack_globalholes global variable, which is currently not protected by any type of lock. When triggered, the bug would sometimes cause a panic when the TCP activity to an affected machine eventually slowed during a lull. The panic only occurs if INVARIANTS is compiled into the kernel, and has lain dormant for some time as a result of INVARIANTS being off by default except in FreeBSD-CURRENT.
Switch to atomic operations in the locations where the variable is changed. Reads have not been updated to be protected by atomics, so there is a possibility of accounting errors in any given calculation where the variable is read. This is considered unlikely to occur in the wild, and will not cause serious harm on rare occasions where it does.
Thanks to Robert Watson for debugging help.
Reported by: Kamigishi Rei <spambox at haruhiism dot net> Tested by: Kamigishi Rei <spambox at haruhiism dot net> Reviewed by: silby Approved by: re (rwatson), kensmith (mentor temporarily unavailable)
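The switch to atomic updates of a shared counter can be sketched in userland with C11 atomics. This is an assumption-laden illustration, not the kernel change, which would use the kernel's own atomic(9) primitives: the variable and function names are hypothetical, and unlike the commit (which left reads unprotected) the read here is atomic simply because the C11 type makes it so.

```c
#include <stdatomic.h>

/* Hypothetical global counter of outstanding SACK holes. */
static atomic_int sack_globalholes_sketch;

/* Account for a newly allocated hole without taking a lock. */
static void
hole_alloc_sketch(void)
{
	atomic_fetch_add(&sack_globalholes_sketch, 1);
}

/* Account for a freed hole without taking a lock. */
static void
hole_free_sketch(void)
{
	atomic_fetch_sub(&sack_globalholes_sketch, 1);
}

/* Read the current count. */
static int
holes_read_sketch(void)
{
	return (atomic_load(&sack_globalholes_sketch));
}
```

Because each update is a single atomic read-modify-write, concurrent increments and decrements cannot lose an update, which is the accounting error an unprotected `counter++` or `counter--` can produce.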
|
195654 |
13-Jul-2009 |
lstewart |
Replace struct tcpopt with a proxy toeopt struct in the TOE driver interface to the TCP syncache. This returns struct tcpopt to being private within the TCP implementation, thus allowing it to be modified without ABI concerns.
The patch breaks the ABI. Bump __FreeBSD_version to 800103 accordingly. The cxgb driver is the only TOE consumer affected by this change, and needs to be recompiled along with the kernel.
Suggested by: rwatson Reviewed by: rwatson, kmacy Approved by: re (kensmith), kensmith (mentor temporarily unavailable)
|
195634 |
12-Jul-2009 |
lstewart |
Pad the following TCP related structs to allow MFCs of upcoming features/fixes back to the 8 branch:
tcp_var.h - struct sackhint - struct tcpcb - struct tcpstat
The patch breaks the ABI. Bump __FreeBSD_version to 800102 accordingly. User space tools that rely on the size of any of these structs (e.g. sockstat) need to be recompiled.
Reviewed by: rpaulo, sam, andre, rwatson Approved by: re & mentor (gnn)
|
195023 |
26-Jun-2009 |
rwatson |
Update various IPFW-related modules to use if_addr_rlock()/if_addr_runlock() rather than IF_ADDR_LOCK()/IF_ADDR_UNLOCK().
MFC after: 6 weeks
|
194971 |
25-Jun-2009 |
rwatson |
Add address list locking for in6_ifaddrhead/ia_link: as with locking for in_ifaddrhead, we stick with an rwlock for the time being, which we will revisit in the future with a possible move to rmlocks.
Some pieces of code require significant further reworking to be safe from all classes of writer-writer races.
Reviewed by: bz MFC after: 6 weeks
|
194962 |
25-Jun-2009 |
rwatson |
Initialize in_ifaddr_lock using RW_SYSINIT() instead of in ip_init(), so that it doesn't run multiple times if VIMAGE is being used.
Discussed with: bz MFC after: 6 weeks
|
194951 |
25-Jun-2009 |
rwatson |
Add a new global rwlock, in_ifaddr_lock, which will synchronize use of the in_ifaddrhead and INADDR_HASH address lists.
Previously, these lists were used unsynchronized as they were effectively never changed in steady state, but we've seen increasing reports of writer-writer races on very busy VPN servers as core count has gone up (and similar configurations where address lists change frequently and concurrently).
For the time being, use rwlocks rather than rmlocks in order to take advantage of their better lock debugging support. As a result, we don't enable ip_input()'s read-locking of INADDR_HASH until an rmlock conversion is complete and a performance analysis has been done. This means that one class of reader-writer races still exists.
MFC after: 6 weeks Reviewed by: bz
|
194930 |
24-Jun-2009 |
oleg |
- fix dummynet 'fast' mode for WF2Q case. - fix printing of pipe profile data. - introduce new pipe parameter: 'burst' - how much data can be sent through pipe bypassing bandwidth limit.
|
194912 |
24-Jun-2009 |
rwatson |
Fix CARP build.
Reported by: bz
|
194907 |
24-Jun-2009 |
rwatson |
Convert netinet6 to using queue(9) rather than hand-crafted linked lists for the global IPv6 address list (in6_ifaddr -> in6_ifaddrhead). Adopt the code styles and conventions present in netinet where possible.
Reviewed by: gnn, bz MFC after: 6 weeks (possibly not MFCable?)
|
194837 |
24-Jun-2009 |
rwatson |
Add missing unlock of if_addr_mtx when an unmatched ARP packet is received.
Reported by: lstewart MFC after: 6 weeks
|
194835 |
24-Jun-2009 |
rwatson |
Clear 'ia' after iterating if_addrhead for unicast address matching: since 'ifa' was used as the TAILQ_FOREACH() iterator argument, and 'ia' was just derived from it, it could be left non-NULL which confused later conditional freeing code. This could cause kernel panics if multicast IP packets were received. [1]
Call 'struct in_ifaddr *' in ip_rtaddr() 'ia', not 'ifa' in keeping with normal conventions.
When 'ipstealth' is enabled and ip_input returns early, properly release the 'ia' reference.
Reported by: lstewart, sam [1] MFC after: 6 weeks
|
194820 |
24-Jun-2009 |
rwatson |
In ARP input, more consistently acquire and release ifaddr references.
MFC after: 6 weeks
|
194777 |
23-Jun-2009 |
bz |
Make callers to in6_selectsrc() and in6_pcbladdr() pass in memory to save the selected source address rather than returning an unreferenced copy to a pointer that might long be gone by the time we use the pointer for anything meaningful.
Asked for by: rwatson Reviewed by: rwatson
|
194760 |
23-Jun-2009 |
rwatson |
Modify most routines returning 'struct ifaddr *' to return references rather than pointers, requiring callers to properly dispose of those references. The following routines now return references:
ifaddr_byindex, ifa_ifwithaddr, ifa_ifwithbroadaddr, ifa_ifwithdstaddr, ifa_ifwithnet, ifaof_ifpforaddr, ifa_ifwithroute, ifa_ifwithroute_fib, rt_getifa, rt_getifa_fib, IFP_TO_IA, ip_rtaddr, in6_ifawithifp, in6ifa_ifpforlinklocal, in6ifa_ifpwithaddr, in6_ifadd, carp_iamatch6, ip6_getdstifaddr
Remove unused macro which didn't have required referencing:
IFP_TO_IA6
This closes many small races in which changes to interface or address lists while an ifaddr was in use could lead to use of freed memory (etc). In a few cases, add missing if_addr_list locking required to safely acquire references.
Because of a lack of deep copying support, we accept a race in which an in6_ifaddr pointed to by mbuf tags and extracted with ip6_getdstifaddr() doesn't hold a reference while in transmit. Once we have mbuf tag deep copy support, this can be fixed.
Reviewed by: bz Obtained from: Apple, Inc. (portions) MFC after: 6 weeks (portions)
|
194739 |
23-Jun-2009 |
bz |
After cleaning up rt_tables from vnet.h and cleaning up opt_route.h a lot of files no longer need route.h either. Garbage collect them. While here remove now unneeded vnet.h #includes as well.
|
194672 |
22-Jun-2009 |
andre |
Add soreceive_stream(), an optimized version of soreceive() for stream (TCP) sockets.
It is functionally identical to generic soreceive() but has a number of stream-specific optimizations: o does only one sockbuf unlock/lock per receive independent of the length of data to be moved into the uio compared to soreceive() which unlocks/locks per *mbuf*. o uses m_mbuftouio() instead of its own copy(out) variant. o much more compact code flow as a large number of special cases is removed. o much improved readability.
It offers significantly reduced CPU usage and lock contention when receiving fast TCP streams. Additional gains are obtained when the receiving application is using SO_RCVLOWAT to batch up some data before a read (and wakeup) is done.
This function was written by "reverse engineering" and is not just a stripped down variant of soreceive().
It is not yet enabled by default on TCP sockets. Instead it is commented out in the protocol initialization in tcp_usrreq.c until more widespread testing has been done.
Testers, especially with 10GigE gear, are welcome.
MFP4: r164817 //depot/user/andre/soreceive_stream/
|
194660 |
22-Jun-2009 |
zec |
V_irtualize flowtable state.
This change should make options VIMAGE kernel builds usable again, to some extent at least.
Note that the size of struct vnet_inet has changed, though in accordance with one-bump-per-day policy we didn't update the __FreeBSD_version number, given that it has already been touched by r194640 a few hours ago. Reviewed by: bz Approved by: julian (mentor)
|
194622 |
22-Jun-2009 |
rwatson |
Add a new function, ifa_ifwithaddr_check(), which rather than returning a pointer to an ifaddr matching the passed socket address, returns a boolean indicating whether one was present. In the (near) future, ifa_ifwithaddr() will return a referenced ifaddr rather than a raw ifaddr pointer, and the new wrapper will allow callers that care only about the boolean condition to avoid having to free that reference.
MFC after: 3 weeks
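The motivation for a boolean wrapper around a reference-returning lookup can be sketched as below. Everything here is hypothetical (the table, the struct, and all function names) and the reference count is a plain int for brevity. It only shows why callers that need a yes/no answer benefit from a wrapper that acquires and drops the reference for them.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical address entry with a simple reference count. */
struct addr_sketch {
	unsigned int addr;
	int refs;
};

static struct addr_sketch table_sketch[] = {
	{ 0x0a000001u, 0 },
	{ 0x0a000002u, 0 },
};

/* Returns a referenced entry; the caller must release it. */
static struct addr_sketch *
addr_lookup_sketch(unsigned int addr)
{
	size_t i;

	for (i = 0; i < sizeof(table_sketch) / sizeof(table_sketch[0]); i++) {
		if (table_sketch[i].addr == addr) {
			table_sketch[i].refs++;
			return (&table_sketch[i]);
		}
	}
	return (NULL);
}

static void
addr_release_sketch(struct addr_sketch *a)
{
	a->refs--;
}

/* Boolean wrapper: callers never see, or leak, the reference. */
static bool
addr_check_sketch(unsigned int addr)
{
	struct addr_sketch *a = addr_lookup_sketch(addr);

	if (a == NULL)
		return (false);
	addr_release_sketch(a);
	return (true);
}
```

A caller that only wants to know whether an address is configured calls addr_check_sketch() and has nothing to free afterwards.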
|
194616 |
22-Jun-2009 |
bz |
Remove a hack from r186086 so that IPsec via loopback routes continued working. It was targeted for stable/7 compatibility and actually never did anything in HEAD.
Reminded by: rwatson X-MFC after: never
|
194602 |
21-Jun-2009 |
rwatson |
Clean up common ifaddr management:
- Unify reference count and lock initialization in a single function, ifa_init(). - Move tear-down from a macro (IFAFREE) to a function ifa_free(). - Move reference count bump from a macro (IFAREF) to a function ifa_ref(). - Switch from a u_int protected by a mutex to refcount(9) for reference count management.
The ifa_mtx is now used for exactly one ioctl, and possibly should be removed.
MFC after: 3 weeks
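The move from a mutex-protected counter to lock-free reference counting can be sketched in userland with C11 atomics. This is a sketch only: the kernel uses refcount(9), not <stdatomic.h>, and struct ifaddr_sketch and the *_sketch functions are hypothetical stand-ins for the real ifa_init(), ifa_ref() and ifa_free().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Counts how many objects were actually released. */
static int frees_sketch;

struct ifaddr_sketch {
	atomic_uint refcnt;
};

/* Allocate an object holding one initial reference. */
static struct ifaddr_sketch *
ifa_alloc_sketch(void)
{
	struct ifaddr_sketch *ifa = malloc(sizeof(*ifa));

	if (ifa != NULL)
		atomic_init(&ifa->refcnt, 1);
	return (ifa);
}

/* Acquire an additional reference; no mutex required. */
static void
ifa_ref_sketch(struct ifaddr_sketch *ifa)
{
	atomic_fetch_add(&ifa->refcnt, 1);
}

/* Drop a reference; the thread dropping the last one frees. */
static void
ifa_free_sketch(struct ifaddr_sketch *ifa)
{
	unsigned int old = atomic_fetch_sub(&ifa->refcnt, 1);

	assert(old > 0);
	if (old == 1) {
		free(ifa);
		frees_sketch++;
	}
}
```

Because atomic_fetch_sub() returns the previous value, exactly one caller observes the transition from 1 to 0 and performs the free.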
|
194581 |
21-Jun-2009 |
rdivacky |
Switch cmd argument to u_long. This matches what if_ethersubr.c does and allows the code to compile cleanly on amd64 with clang.
Reviewed by: rwatson Approved by: ed (mentor)
|
194498 |
19-Jun-2009 |
brooks |
Rework the credential code to support larger values of NGROUPS and NGROUPS_MAX, eliminate ABI dependencies on them, and raise them to 1024 and 1023 respectively. (Previously they were equal, but under a close reading of POSIX, NGROUPS_MAX was defined to be too large by 1 since it is the number of supplemental groups, not total number of groups.)
The bulk of the change consists of converting the struct ucred member cr_groups from a static array to a pointer. Do the equivalent in kinfo_proc.
Introduce new interfaces crcopysafe() and crsetgroups() for duplicating a process credential before modifying it and for setting group lists respectively. Both interfaces take care of the details of allocating the groups array. crsetgroups() takes care of truncating the group list to the current maximum (NGROUPS) if necessary. In the future, crsetgroups() may be responsible for ensuring invariants such as sorting the supplemental groups to allow groupmember() to be implemented as a binary search.
Because we cannot change struct xucred without breaking application ABIs, we leave it alone and introduce a new XU_NGROUPS value which is always 16. XU_NGROUPS or NGRPS is to be used, as appropriate, for things such as NFS which need to use no more than 16 groups. When feasible, truncate the group list rather than generating an error.
Minor changes:
- Reduce the number of hand rolled versions of groupmember().
- Do not assign to both cr_gid and cr_groups[0].
- Modify ipfw to cache ucreds instead of part of their contents since they are immutable once referenced by more than one entity.
Submitted by: Isilon Systems (initial implementation) X-MFC after: never PR: bin/113398 kern/133867
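Illustration (not part of the commit): a sketch of exporting a variable-length group list into a fixed-size record by truncating rather than failing. XU_CAP, struct cred_sim, struct xcred_sim and cred_export() are hypothetical names; only the cap of 16 comes from the text above.

```c
#include <stddef.h>
#include <string.h>

#define XU_CAP 16	/* fixed array size in the exported record */

/* Hypothetical credential with a pointer to a variable-length group list. */
struct cred_sim {
	const unsigned *groups;
	size_t ngroups;
};

/* Hypothetical fixed-layout record whose size must never change. */
struct xcred_sim {
	size_t ngroups;
	unsigned groups[XU_CAP];
};

/* Copy at most XU_CAP groups; longer lists are silently truncated. */
void
cred_export(const struct cred_sim *cr, struct xcred_sim *xc)
{
	size_t n = cr->ngroups < XU_CAP ? cr->ngroups : XU_CAP;

	memset(xc, 0, sizeof(*xc));
	if (n > 0)
		memcpy(xc->groups, cr->groups, n * sizeof(xc->groups[0]));
	xc->ngroups = n;
}
```

The fixed record keeps its layout no matter how large the in-kernel list grows, which is the ABI property the commit message is after.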
|
194368 |
17-Jun-2009 |
bz |
Add explicit includes for jail.h to the files that need them and remove the "hidden" one from vimage.h.
|
194357 |
17-Jun-2009 |
bz |
Add the explicit include of vimage.h to another five .c files still missing it.
Remove the "hidden" kernel only include of vimage.h from ip_var.h added with the very first Vimage commit r181803 to avoid further kernel poisoning.
|
194355 |
17-Jun-2009 |
rrs |
Changes to the NR-Sack code so that: 1) The All bit disappears. 2) The two sets of gaps (nr and non-nr) are disjoint; you don't have gaps struck in both places.
This adjusts us to correspond to the new draft. Still to do: clean up the code so that there is only one set of sack routines (original NR-Sack done by E cloned all sack code).
|
194305 |
16-Jun-2009 |
jhb |
Trim extra sets of ()'s.
Requested by: bde
|
194304 |
16-Jun-2009 |
jhb |
Fix edge cases with ticks wrapping from INT_MAX to INT_MIN in the handling of the per-tcpcb t_badrxtwin.
Submitted by: bde
|
194303 |
16-Jun-2009 |
jhb |
- Change members of tcpcb that cache values of ticks from int to u_int: t_rcvtime, t_starttime, t_rtttime, t_bw_rtttime, ts_recent_age, t_badrxtwin.
- Change t_recent in struct timewait from u_long to u_int32_t to match the type of the field it shadows from tcpcb: ts_recent.
- Change t_starttime in struct timewait from u_long to u_int to match the t_starttime field in tcpcb.
Requested by: bde (1, 3)
|
194252 |
15-Jun-2009 |
jamie |
Get vnets from creds instead of threads where they're available, and from passed threads instead of curthread.
Reviewed by: zec, julian Approved by: bz (mentor)
|
194245 |
15-Jun-2009 |
oleg |
Since dn_pipe.numbytes is int64_t now - remove unnecessary overflow detection code in ready_event_wfq().
|
194076 |
12-Jun-2009 |
bz |
Move the kernel option FLOWTABLE checking from the header file to the actual implementation. Remove the accessor functions for the compiled out case, just returning "unavail" values. Remove the kernel conditional from the header file as it is no longer needed, only leaving the externs. Hide the improperly virtualized SYSCTL/TUNABLE for the flowtable size under the kernel option as well.
Reviewed by: rwatson
|
194062 |
12-Jun-2009 |
vanhu |
Added support for NAT-Traversal (RFC 3948) in IPsec stack.
Thanks to (no special order) Emmanuel Dreyfus (manu@netbsd.org), Larry Baird (lab@gta.com), gnn, bz, and other FreeBSD devs, Julien Vanherzeele (julien.vanherzeele@netasq.com, for years of bug reporting), the PFSense team, and all people who used / tried the NAT-T patch for years and reported bugs, patches, etc...
X-MFC: never
Reviewed by: bz Approved by: gnn(mentor) Obtained from: NETASQ
|
194003 |
11-Jun-2009 |
jhb |
Correct printf format type mismatches.
|
194002 |
11-Jun-2009 |
jhb |
Trim extra ()'s.
Submitted by: bde
|
193941 |
10-Jun-2009 |
jhb |
Change a few members of tcpcb that store cached copies of ticks to be ints instead of unsigned longs. This fixes a few overflow edge cases on 64-bit platforms. Specifically, if an idle connection receives a packet shortly before 2^31 clock ticks of uptime (about 25 days with hz=1000) and the keep alive timer fires after 2^31 clock ticks, the keep alive timer will think that the connection has been idle for a very long time and will immediately drop the connection instead of sending a keep alive probe.
Reviewed by: silby, gnn, lstewart MFC after: 1 week
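Illustration (not part of the commit): a sketch of the arithmetic behind the overflow described above, under one plausible reading of the bug. The function names are hypothetical. A 32-bit tick sample cached in a 64-bit unsigned field compares badly once the live counter wraps negative, while a same-width unsigned subtraction stays correct.

```c
#include <stdint.h>

/*
 * Elapsed ticks between two samples of a wrapping 32-bit tick counter.
 * Unsigned subtraction is defined modulo 2^32, so the result is correct
 * across the INT_MAX -> INT_MIN wrap as long as fewer than 2^32 ticks
 * separate the two samples.
 */
uint32_t
ticks_elapsed(int32_t now, int32_t then)
{
	return ((uint32_t)now - (uint32_t)then);
}

/*
 * Faulty variant for comparison: the cached sample lives in a 64-bit
 * unsigned field (as an unsigned long does on LP64), so a wrapped,
 * negative "now" is sign-extended and the difference becomes enormous.
 */
uint64_t
ticks_elapsed_wide(int32_t now, uint64_t then)
{
	return ((uint64_t)(int64_t)now - then);
}

/* Whole days until a tick counter reaches 2^31; hz must be non-zero. */
unsigned
days_until_wrap(unsigned hz)
{
	return ((unsigned)((UINT64_C(1) << 31) / hz / 86400));
}
```

With hz=1000, 2^31 ticks is 2147483.648 seconds, or a little under 25 days, which matches the figure quoted in the commit message.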
|
193938 |
10-Jun-2009 |
imp |
These are no longer referenced in the tree, so can be safely removed.
Reviewed by: bms@
|
193896 |
10-Jun-2009 |
luigi |
in ip_dn_ctl(), do not allocate a large structure on the stack, and use malloc() instead if/when it is necessary.
The problem is less relevant in previous versions because the variable involved (tmp_pipe) is much smaller there. Still worth fixing though.
Submitted by: Marta Carbone (GSOC) MFC after: 3 days
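Illustration (not part of the commit): a userland sketch of the pattern described above, where a large scratch structure is heap-allocated only on the path that needs it instead of being declared on the stack. struct big_cfg and handle_request() are hypothetical, and calloc()/free() stand in for the kernel allocator.

```c
#include <stdlib.h>

/* Hypothetical large record, too big to place on a small kernel stack. */
struct big_cfg {
	char payload[8192];
	int id;
};

/*
 * Handle a request, allocating the scratch record only when required.
 * Returns 0 on success and -1 if the allocation fails.
 */
int
handle_request(int needs_scratch, int id, int *out_id)
{
	struct big_cfg *tmp;

	if (!needs_scratch) {
		*out_id = id;		/* common path: no allocation at all */
		return (0);
	}
	tmp = calloc(1, sizeof(*tmp));
	if (tmp == NULL)
		return (-1);
	tmp->id = id + 1;		/* placeholder for the real work */
	*out_id = tmp->id;
	free(tmp);
	return (0);
}
```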
|
193895 |
10-Jun-2009 |
bz |
Remove the "The option TCPDEBUG requires option INET." requirement. In case of !INET we will not have a timestamp on the trace for now but that might only affect spx debugging as long as INET6 requires INET.
Reviewed by: rwatson (earlier version)
|
193894 |
10-Jun-2009 |
luigi |
small simplifications to the code in charge of reaping deleted rules: - clear the head pointer immediately before using it, so there is no chance of mistakes; - call reap_rules() unconditionally. The function can handle a NULL argument just fine, and the cost of the extra call is hardly significant given that we do it rarely and outside the lock.
MFC after: 3 days
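Illustration (not part of the commit): a sketch of the reaping pattern described above. The shared head pointer is cleared as soon as the list is detached, and the free routine accepts NULL so it can be called unconditionally. struct rule, struct chain, take_reap_list() and reap_list() are hypothetical names.

```c
#include <stddef.h>
#include <stdlib.h>

struct rule {
	struct rule *next;
};

struct chain {
	struct rule *reap;	/* rules unlinked from the ruleset, awaiting free */
};

/* Free a list of rules; a NULL head is a no-op. Returns the count freed. */
size_t
reap_list(struct rule *head)
{
	size_t n = 0;

	while (head != NULL) {
		struct rule *next = head->next;

		free(head);
		head = next;
		n++;
	}
	return (n);
}

/*
 * Detach the reap list from the chain. In a real system this would run
 * with the chain lock held; the returned list is freed after unlocking.
 */
struct rule *
take_reap_list(struct chain *c)
{
	struct rule *head = c->reap;

	c->reap = NULL;		/* clear the shared pointer right away */
	return (head);
}
```

A caller writes reap_list(take_reap_list(&c)) without testing for an empty list first.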
|
193859 |
09-Jun-2009 |
oleg |
Close a long-standing race with net.inet.ip.fw.one_pass = 0: if a packet leaves ipfw for another kernel subsystem (dummynet, netgraph, etc.), it carries a pointer to the matching ipfw rule. If the packet is then reinjected back into ipfw, ruleset processing starts from that rule. If the rule was deleted in the meantime, the race could cause a panic (as well as other odd effects, like parsing rules in the 'reap list').
P.S. this commit changes ABI so userland ipfw related binaries should be recompiled.
MFC after: 1 month Tested by: Mikolaj Golub
|
193744 |
08-Jun-2009 |
bz |
After r193232 rt_tables in vnet.h are no longer indirectly dependent on the ROUTETABLES kernel option thus there is no need to include opt_route.h anymore in all consumers of vnet.h and no longer depend on it for module builds.
Remove the hidden include in flowtable.h as well and leave the two explicit #includes in ip_input.c and ip_output.c.
|
193731 |
08-Jun-2009 |
zec |
Introduce an infrastructure for dismantling vnet instances.
Vnet modules and protocol domains may now register destructor functions to clean up and release per-module state. The destructor mechanisms can be triggered by invoking "vimage -d", or a future equivalent command which will be provided via the new jail framework.
While this patch introduces numerous placeholder destructor functions, many of those are currently incomplete, thus leaking memory or (even worse) failing to stop all running timers. Many of these issues are already known and will be fixed over the next weeks in smaller incremental commits.
Apart from introducing new fields in structs ifnet, domain, protosw and vnet_net, which requires the kernel and modules to be rebuilt, this change should have no impact on nooptions VIMAGE builds, since vnet destructors can only be called in VIMAGE kernels. Moreover, destructor functions should be in general compiled in only in options VIMAGE builds, except for kernel modules which can be safely kldunloaded at run time.
Bump __FreeBSD_version to 800097. Reviewed by: bz, julian Approved by: rwatson, kib (re), julian (mentor)
|
193664 |
07-Jun-2009 |
hrs |
Fix and add a workaround on an issue of EtherIP packet with reversed version field sent via gif(4)+if_bridge(4). The EtherIP implementation found on FreeBSD 6.1, 6.2, 6.3, 7.0, 7.1, and 7.2 had an interoperability issue because it sent the incorrect EtherIP packets and discarded the correct ones.
This change introduces the following two flags to gif(4):
accept_rev_ethip_ver: accepts both correct EtherIP packets and ones with reversed version field, if enabled. If disabled, the gif accepts the correct packets only. This flag is enabled by default.
send_rev_ethip_ver: sends EtherIP packets with reversed version field intentionally, if enabled. If disabled, the gif sends the correct packets only. This flag is disabled by default.
These flags are stored in struct gif_softc and can be set by ifconfig(8) on per-interface basis.
Note that this is an incompatible change of EtherIP with the older FreeBSD releases. If you need to interoperate between older FreeBSD boxes and versions after this commit, setting "send_rev_ethip_ver" is needed.
Reviewed by: thompsa and rwatson Spotted by: Shunsuke SHINOMIYA PR: kern/125003 MFC after: 2 weeks
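Illustration (not part of the commit): a sketch of how the two flags could gate the version byte. RFC 3378 places version 3 in the high four bits of the first header byte. The constant and function names are hypothetical, and matching on the whole first byte is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

#define EIP_VER_CORRECT		0x30	/* version 3 in the high nibble */
#define EIP_VER_REVERSED	0x03	/* nibble-swapped form sent by old releases */

/* Decide whether an inbound packet's first header byte is acceptable. */
bool
eip_accept(uint8_t first_byte, bool accept_rev)
{
	if (first_byte == EIP_VER_CORRECT)
		return (true);
	return (accept_rev && first_byte == EIP_VER_REVERSED);
}

/* Choose the first header byte for an outbound packet. */
uint8_t
eip_send_byte(bool send_rev)
{
	return (send_rev ? EIP_VER_REVERSED : EIP_VER_CORRECT);
}
```

With the defaults described above (accept enabled, send disabled), a new host still accepts packets from an old peer, but the old peer drops the new host's correct packets unless send_rev_ethip_ver is set.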
|
193582 |
06-Jun-2009 |
zec |
Unbreak options VIMAGE build.
Submitted by: julian (mentor) Approved by: julian (mentor)
|
193550 |
05-Jun-2009 |
pjd |
Only four out of nine arguments for ip_ipsec_output() are actually used. Kill unused arguments except for 'ifp' as it might be used in the future for detecting IPsec-capable interfaces.
|
193532 |
05-Jun-2009 |
luigi |
move kernel ipfw-related sources to a separate directory, adjust conf/files and modules' Makefiles accordingly.
No code or ABI changes so this and most of previous related changes can be easily MFC'ed
MFC after: 5 days
|
193516 |
05-Jun-2009 |
luigi |
Several ipfw options and actions use a 16-bit argument to indicate pipes, queues, tags, rule numbers and so on. These are all different namespaces, and the only thing they have in common is the fact they use a 16-bit slot to represent the argument.
There is some confusion in the code, mostly for historical reasons, on how the values 0 and 65535 should be used. At the moment, 0 is forbidden almost everywhere, while 65535 is used to represent a 'tablearg' argument, i.e. the result of the most recent table() lookup.
For now, try to use explicit constants for the min and max allowed values, and do not overload the default rule number for that.
Also, make the MTAG_IPFW declaration only visible to the kernel.
NOTE: I think the issue needs to be revisited before 8.0 is out: the 2^16 namespace limit for rule numbers and pipe/queue is annoying, and we can easily bump the limit to 2^32 which gives a lot more flexibility in partitioning the namespace.
MFC after: 5 days
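Illustration (not part of the commit): a sketch of replacing magic numbers in a 16-bit argument slot with explicit constants. The names ARG_MIN, ARG_MAX and ARG_TABLEARG are hypothetical, and the choice of 1 and 65534 as the ordinary bounds is an assumption drawn from the description above (0 forbidden, 65535 reserved for 'tablearg').

```c
#include <stdbool.h>
#include <stdint.h>

#define ARG_MIN		1	/* 0 is forbidden almost everywhere */
#define ARG_MAX		65534	/* largest ordinary value */
#define ARG_TABLEARG	65535	/* use the result of the last table() lookup */

/* True if arg is an ordinary literal value rather than a reserved one. */
bool
arg_is_literal(uint16_t arg)
{
	return (arg >= ARG_MIN && arg <= ARG_MAX);
}

/* Resolve an argument, substituting the table lookup result if asked. */
uint16_t
arg_resolve(uint16_t arg, uint16_t tablearg)
{
	return (arg == ARG_TABLEARG ? tablearg : arg);
}
```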
|
193511 |
05-Jun-2009 |
rwatson |
Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC and used in a large number of files, but also because an increasing number of incorrect uses of MAC calls were sneaking in due to copy-and-paste of MAC-aware code without the associated opt_mac.h include.
Discussed with: pjd
|
193510 |
05-Jun-2009 |
rwatson |
Unifdef MAC label pointer in syncache entries -- in general, ifdef'd structure contents are a bad idea in the kernel for binary compatibility reasons, and this is a single pointer that is now included in compiles by default anyway due to options MAC being in GENERIC.
|
193502 |
05-Jun-2009 |
luigi |
More cleanup in preparation of ipfw relocation (no actual code change):
+ move ipfw and dummynet hooks declarations to raw_ip.c (definitions in ip_var.h) same as for most other global variables. This removes some dependencies from ip_input.c;
+ remove the IPFW_LOADED macro, just test ip_fw_chk_ptr directly;
+ remove the DUMMYNET_LOADED macro, just test ip_dn_io_ptr directly;
+ move ip_dn_ruledel_ptr to ip_fw2.c which is the only file using it;
To be merged together with rev 193497
MFC after: 5 days
|
193497 |
05-Jun-2009 |
luigi |
Small changes (no actual code changes) in preparation of moving ipfw-related stuff to its own directory, and cleaning headers and dependencies:
In this commit: + remove one use of a typedef; + document dn_rule_delete(); + replace one usage of the DUMMYNET_LOADED macro with its value;
No MFC planned until the cleanup is complete.
|
193435 |
04-Jun-2009 |
luigi |
fix a bug introduced in rev.190865 related to the signedness of the credit of a pipe. In passing, also use explicit signed/unsigned types for two other fields. Noticed by Oleg Bulyzhin and Maxim Ignatenko long ago, I forgot to commit the fix.
Does not affect RELENG_7.
|
193391 |
03-Jun-2009 |
rwatson |
Continue work to optimize performance of "options MAC" when no MAC policy modules are loaded by avoiding mbuf label lookups when policies aren't loaded, pushing further socket locking into MAC policy modules, and avoiding locking MAC ifnet locks when no policies are loaded:
- Check mac_policy_count before looking for mbuf MAC label m_tags in MAC Framework entry points. We will still pay label lookup costs if MAC policies are present but don't require labels (typically a single mbuf header field read, but perhaps further indirection if IPSEC or other m_tag consumers are in use).
- Further push socket locking for socket-related access control checks and events into MAC policies from the MAC Framework, so that sockets are only locked if a policy specifically requires a lock to protect a label. This resolves lock order issues during sonewconn() and also in local domain socket cross-connect where multiple socket locks could not be held at once for the purposes of propagating MAC labels across multiple sockets. Eliminate mac_policy_count check in some entry points where it no longer avoids locking.
- Add mac_policy_count checking in some entry points relating to network interfaces that otherwise lock a global MAC ifnet lock used to protect ifnet labels.
Obtained from: TrustedBSD Project
|
193332 |
02-Jun-2009 |
rwatson |
Add internal 'mac_policy_count' counter to the MAC Framework, which is a count of the number of registered policies.
Rather than unconditionally locking sockets before passing them into MAC, lock them in the MAC entry points only if mac_policy_count is non-zero.
This avoids locking overhead for a number of socket system calls when no policies are registered, eliminating measurable overhead for the MAC Framework for the socket subsystem when there are no active policies.
Possibly socket locks should be acquired by policies if they are required for socket labels, which would further avoid locking overhead when there are policies but they don't require labeling of sockets, or possibly don't even implement socket controls.
Obtained from: TrustedBSD Project
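Illustration (not part of the commit): a sketch of the fast path described above, where an entry point skips locking entirely when no policies are registered. struct framework and check_socket_op() are hypothetical, and the counters exist only to make the behaviour observable.

```c
/* Hypothetical framework state. */
struct framework {
	unsigned policy_count;		/* number of registered policies */
	unsigned lock_acquisitions;	/* instrumentation for the sketch */
	unsigned checks_run;		/* instrumentation for the sketch */
};

/*
 * Entry point: take the (simulated) socket lock only when at least one
 * policy could need it. Returns 0, meaning "access permitted".
 */
int
check_socket_op(struct framework *fw)
{
	if (fw->policy_count == 0)
		return (0);		/* fast path: no lock, no policy calls */
	fw->lock_acquisitions++;	/* stands in for locking the socket */
	fw->checks_run += fw->policy_count;
	return (0);
}
```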
|
193272 |
01-Jun-2009 |
jhb |
Rework socket upcalls to close some races with setup/teardown of upcalls.
- Each socket upcall is now invoked with the appropriate socket buffer locked. It is not permissible to call soisconnected() with this lock held, however, so socket upcalls now return an integer value. The two possible values are SU_OK and SU_ISCONNECTED. If an upcall returns SU_ISCONNECTED, then soisconnected() will be invoked on the socket after the socket buffer lock is dropped.
- A new API is provided for setting and clearing socket upcalls. The API consists of soupcall_set() and soupcall_clear().
- To simplify locking, each socket buffer now has a separate upcall.
- When a socket upcall returns SU_ISCONNECTED, the upcall is cleared from the receive socket buffer automatically. Note that a SO_SND upcall should never return SU_ISCONNECTED.
- All this means that accept filters should now return SU_ISCONNECTED instead of calling soisconnected() directly. They also no longer need to explicitly clear the upcall on the new socket.
- The HTTP accept filter still uses soupcall_set() to manage its internal state machine, but other accept filters no longer have any explicit knowledge of socket upcall internals aside from their return value.
- The various RPC client upcalls currently drop the socket buffer lock while invoking soreceive() as a temporary band-aid. The plan for the future is to add a new flag to allow soreceive() to be called with the socket buffer locked.
- The AIO callback for socket I/O is now also invoked with the socket buffer locked. Previously sowakeup() would drop the socket buffer lock only to call aio_swake() which immediately re-acquired the socket buffer lock for the duration of the function call.
Discussed with: rwatson, rmacklem
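Illustration (not part of the commit): a userland sketch of the "return a value, act after unlock" protocol described above. All names are hypothetical mirrors (UPCALL_ISCONNECTED for SU_ISCONNECTED and so on), and a boolean stands in for the socket buffer lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum upcall_result { UPCALL_OK, UPCALL_ISCONNECTED };

struct sockbuf_sim {
	bool locked;				/* stands in for the buffer lock */
	enum upcall_result (*upcall)(void *);
	void *arg;
	bool connected;
};

/* Must never run with the buffer lock held. */
void
mark_connected(struct sockbuf_sim *sb)
{
	assert(!sb->locked);
	sb->connected = true;
}

/* Invoke the upcall locked; defer the connect step until after unlock. */
void
run_upcall(struct sockbuf_sim *sb)
{
	enum upcall_result r;

	if (sb->upcall == NULL)
		return;
	sb->locked = true;
	r = sb->upcall(sb->arg);
	if (r == UPCALL_ISCONNECTED)
		sb->upcall = NULL;	/* cleared automatically */
	sb->locked = false;
	if (r == UPCALL_ISCONNECTED)
		mark_connected(sb);
}

/* Example accept-filter style upcall: ready once 4 bytes are pending. */
enum upcall_result
ready_after_4_bytes(void *arg)
{
	return (*(int *)arg >= 4 ? UPCALL_ISCONNECTED : UPCALL_OK);
}
```

The filter only reports readiness; it never calls the connect step itself, so it cannot run it with the lock held.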
|
193232 |
01-Jun-2009 |
bz |
Convert the two dimensional array to be malloced and introduce an accessor function to get the correct rnh pointer back.
Update netstat to get the correct pointer using kvm_read() as well.
This not only fixes the ABI problem depending on the kernel option but also permits the tunable to overwrite the kernel option at boot time up to MAXFIBS, enlarging the number of FIBs without having to recompile. So people could just use GENERIC now.
Reviewed by: julian, rwatson, zec X-MFC: not possible
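Illustration (not part of the commit): a sketch of replacing a compile-time two-dimensional array with a run-time allocation reached through an accessor. struct rt_tables_sim and its functions are hypothetical; the point is that consumers no longer depend on the array dimensions at compile time.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical table of per-(fib, family) heads, sized at run time. */
struct rt_tables_sim {
	void **heads;
	unsigned nfibs;
	unsigned nfamilies;
};

/* Allocate a zeroed nfibs x nfamilies table. Returns 0, or -1 on failure. */
int
rt_tables_init(struct rt_tables_sim *t, unsigned nfibs, unsigned nfamilies)
{
	t->heads = calloc((size_t)nfibs * nfamilies, sizeof(*t->heads));
	if (t->heads == NULL)
		return (-1);
	t->nfibs = nfibs;
	t->nfamilies = nfamilies;
	return (0);
}

/* Accessor: address of the slot, or NULL when an index is out of range. */
void **
rt_tables_slot(struct rt_tables_sim *t, unsigned fib, unsigned family)
{
	if (fib >= t->nfibs || family >= t->nfamilies)
		return (NULL);
	return (&t->heads[(size_t)fib * t->nfamilies + family]);
}
```

Since only the accessor knows the layout, the number of FIBs can come from a boot-time tunable rather than a kernel option.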
|
193231 |
01-Jun-2009 |
bms |
Merge fixes from p4:
* Tighten v1 query input processing.
* Borrow changes from MLDv2 for how general queries are processed.
* Do address field validation upfront before accepting input.
* Do NOT switch protocol version if old querier present timer active.
* Always clear IGMPv3 state in igmp_v3_cancel_link_timers().
* Update comments.
Tested by: deeptech71 at gmail dot com
|
193219 |
01-Jun-2009 |
rwatson |
Reimplement the netisr framework in order to support parallel netisr threads:
- Support up to one netisr thread per CPU, each processing its own workstream, or set of per-protocol queues. Threads may be bound to specific CPUs, or allowed to migrate, based on a global policy.
In the future it would be desirable to support topology-centric policies, such as "one netisr per package".
- Allow each protocol to advertise an ordering policy, which can currently be one of:
NETISR_POLICY_SOURCE: packets must maintain ordering with respect to an implicit or explicit source (such as an interface or socket).
NETISR_POLICY_FLOW: make use of mbuf flow identifiers to place work, as well as allowing protocols to provide a flow generation function for mbufs without flow identifiers (m2flow). Falls back on NETISR_POLICY_SOURCE if no flow ID is available.
NETISR_POLICY_CPU: allow protocols to inspect and assign a CPU for each packet handled by netisr (m2cpuid).
- Provide utility functions for querying the number of workstreams being used, as well as a mapping function from workstream to CPU ID, which protocols may use in work placement decisions.
- Add explicit interfaces to get and set per-protocol queue limits, and get and clear drop counters, which query data or apply changes across all workstreams.
- Add a more extensible netisr registration interface, in which protocols declare 'struct netisr_handler' structures for each registered NETISR_ type. These include name, handler function, optional mbuf to flow ID function, optional mbuf to CPU ID function, queue limit, and ordering policy. Padding is present to allow these to be expanded in the future. If no queue limit is declared, then a default is used.
- Queue limits are now per-workstream, and raised from the previous IFQ_MAXLEN default of 50 to 256.
- All protocols are updated to use the new registration interface, and with the exception of netnatm, default queue limits. Most protocols register as NETISR_POLICY_SOURCE, except IPv4 and IPv6, which use NETISR_POLICY_FLOW, and will therefore take advantage of driver-generated flow IDs if present.
- Formalize a non-packet based interface between interface polling and the netisr, rather than having polling pretend to be two protocols. Provide two explicit hooks in the netisr worker for start and end events for runs: netisr_poll() and netisr_pollmore(), as well as a function, netisr_sched_poll(), to allow the polling code to schedule netisr execution. DEVICE_POLLING still embeds single-netisr assumptions in its implementation, so for now if it is compiled into the kernel, a single and un-bound netisr thread is enforced regardless of tunable configuration.
In the default configuration, the new netisr implementation maintains the same basic assumptions as the previous implementation: a single, un-bound worker thread processes all deferred work, and direct dispatch is enabled by default wherever possible.
Performance measurement shows a marginal performance improvement over the old implementation due to the use of batched dequeue.
An rmlock is used to synchronize use and registration/unregistration using the framework; currently, synchronized use is disabled (replicating current netisr policy) due to a measurable 3%-6% hit in ping-pong micro-benchmarking. It will be enabled once further rmlock optimization has taken place. However, in practice, netisrs are rarely registered or unregistered at runtime.
A new man page for netisr will follow, but since one doesn't currently exist, it hasn't been updated.
This change is not appropriate for MFC, although the polling shutdown handler should be merged to 7-STABLE.
Bump __FreeBSD_version.
Reviewed by: bz
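Illustration (not part of the commit): a sketch of how an ordering policy could map a packet to a workstream, including the fallback from flow to source when no flow ID is present. The types, names and the modulo placement are assumptions of this sketch, not the netisr implementation.

```c
#include <stdbool.h>
#include <stdint.h>

enum policy { POLICY_SOURCE, POLICY_FLOW };

/* Hypothetical packet metadata. */
struct pkt {
	bool has_flowid;
	uint32_t flowid;
	uint32_t source_id;	/* e.g. an interface index */
};

/* Pick a workstream in [0, nws); nws must be non-zero. */
unsigned
select_workstream(enum policy p, const struct pkt *m, unsigned nws)
{
	if (p == POLICY_FLOW && m->has_flowid)
		return (m->flowid % nws);
	return (m->source_id % nws);	/* SOURCE policy, or FLOW fallback */
}
```

Packets of one flow (or one source) always land on the same workstream, which preserves their relative ordering while letting different flows run in parallel.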
|
193217 |
01-Jun-2009 |
pjd |
- Rename the IP_NONLOCALOK IP socket option to IP_BINDANY, to be more consistent with OpenBSD (and BSD/OS originally). We can't easily make it a SOL_SOCKET option as there is no more space for more SOL_SOCKET options, but this option also fits better as an IP socket option, it seems.
- Implement this functionality also for IPv6 and RAW IP sockets.
- Always compile it in (don't use additional kernel options).
- Remove the sysctl to turn this functionality on and off.
- Introduce a new privilege, PRIV_NETINET_BINDANY, which allows use of this functionality (currently only unjailed root can use it).
Discussed with: julian, adrian, jhb, rwatson, kmacy
|
193090 |
30-May-2009 |
rrs |
Adds missing sysctl to manage the vtag_time_wait time. This will even allow disabling time-wait altogether if you set the value to 0 (not advisable actually). The default remains the same, i.e. 60 seconds.
|
193089 |
30-May-2009 |
rrs |
Fix a small memory leak from the nr-sack code - the mapping array was not being freed at term of association. Also get rid of the MICHAELS_EXP code.
|
193088 |
30-May-2009 |
rrs |
Make sctp_uio user to kernel structure match the socket-api draft. Two fields were uint32_t when they should have been uint16_t.
Reported by Jonathan Leighton at U-del.
|
192912 |
27-May-2009 |
zml |
Correct handling of SYN packets that are to the left of the current window of an ESTABLISHED connection.
Reviewed by: net@, gnn Approved by: dfr (mentor)
|
192895 |
27-May-2009 |
jamie |
Add hierarchical jails. A jail may further virtualize its environment by creating a child jail, which is visible to that jail and to any parent jails. Child jails may be restricted more than their parents, but never less. Jail names reflect this hierarchy, being MIB-style dot-separated strings.
Every thread now points to a jail, the default being prison0, which contains information about the physical system. Prison0's root directory is the same as rootvnode; its hostname is the same as the global hostname, and its securelevel replaces the global securelevel. Note that the variable "securelevel" has actually gone away, which should not cause any problems for code that properly uses securelevel_gt() and securelevel_ge().
Some jail-related permissions that were kept in global variables and set via sysctls are now per-jail settings. The sysctls still exist for backward compatibility, used only by the now-deprecated jail(2) system call.
Approved by: bz (mentor)
|
192893 |
27-May-2009 |
trasz |
Don't discard packets with 'Destination Unreachable' at the beginning of ip_forward(), if IPSEC is compiled in. It is possible that there is an SPD that these packets will go through, even if there is no matching route. If not, ICMP will be sent anyway, after ip_output().
This is somewhat similar in purpose to r191621, except that one was for the packets sent from the host, while this one is for packets being forwarded by the host.
Reviewed by: bz@ Sponsored by: Wheel Sp. z o.o. (http://www.wheel.pl)
|
192848 |
26-May-2009 |
jhb |
Correct the sense of a test so that this filter always waits for the full request to arrive. Previously it would end up returning as soon as the request length stored in the first two bytes had arrived.
Reviewed by: dwmalone MFC after: 1 week
|
192761 |
25-May-2009 |
rwatson |
Remove comment about moving tcp_reass() to its own file named tcp_reass.c; that happened a while ago.
MFC after: 3 days
|
192651 |
23-May-2009 |
bz |
For UDP with introducing the UDP control block, the uma zone had to be named "udp_inpcb" to avoid a naming conflict with tcp[1]. For consistency rename the uma zone for TCP from "inpcb" to "tcp_inpcb".
Found by: rwatson [1] Discussed with: rwatson
|
192649 |
23-May-2009 |
bz |
Implement UDP control block support.
So far the udp_tun_func_t had been (ab)using inp_ppcb for udp in kernel tunneling callbacks. Move that into the udpcb and add a field for flags there to be used by upcoming changes instead of sticking udp only flags into in_pcb flags2.
Bump __FreeBSD_version for ports to detect it and because of vnet* struct size changes.
Submitted by: jhb (7.x version) Reviewed by: rwatson
|
192648 |
23-May-2009 |
bz |
Add sysctls to toggle the behaviour of the (former) IPSEC_FILTERTUNNEL kernel option. This also permits tuning of the option per virtual network stack, as well as separately per inet, inet6.
The kernel option is left for a transition period, marked deprecated, and will be removed soon.
Initially requested by: phk (1 year 1 day ago) MFC after: 4 weeks
|
192612 |
22-May-2009 |
bz |
If including vnet.h one has to include opt_route.h as well. This is because struct vnet_net holds the rt_tables[][] for MRT and array size is compile time dependent. If you had ROUTETABLES set to >1 after r192011 V_loif was pointing into nonsense leading to strange results or even panics for some people.
Reviewed by: mz
|
192528 |
21-May-2009 |
rwatson |
Consolidate and clean up the first section of ip_output.c in light of the last year or two's work on routing:
- Combine iproute initialization and flowtable lookup blocks, eliminating unnecessary tests for known-zero'd iproute fields.
- Add a comment indicating (a) why the route entry returned by the flowtable is considered stable and (b) that the flowtable lookup must occur after the setup of the mbuf flow ID.
- Assert the inpcb lock before any use of inpcb fields.
Reviewed by: kmacy
|
192476 |
20-May-2009 |
qingli |
When an interface address is removed and the last prefix route is also being deleted, the link-layer address table (arp or nd6) will flush those L2 llinfo entries that match the removed prefix.
Reviewed by: kmacy
|
192351 |
18-May-2009 |
bz |
Revert the logical change of r192341.
net.inet.ip.fw.one_pass is a classic ip_input.c variable and is used in the pfil and bridge code as well. As ipfw is loadable we need to always provide it. That is the reason why it lives in struct vnet_inet and not in struct vnet_ipfw.
|
192341 |
18-May-2009 |
jhb |
- Fix typo in description of 'net.inet.ip.fw.autoinc_step'. - Use 'vnet_ipfw' instead of 'vnet_inet' for 'net.inet.ip.fw.one_pass'.
|
192262 |
17-May-2009 |
bz |
Unbreak options VIMAGE builds, in a followup to r192011 which did not introduce INIT_VNET_NET() initializers necessary for accessing V_loif.
Submitted by: zec Reviewed by: julian
|
192116 |
14-May-2009 |
rwatson |
Staticize two functions not used outside of in_pcb.c: in_pcbremlists() and db_print_inpcb().
MFC after: 1 month
|
192085 |
14-May-2009 |
qingli |
Ignore the INADDR_ANY address inserted/deleted by DHCP when installing a loopback route to the interface address.
|
192011 |
12-May-2009 |
qingli |
This patch adds a host route to an interface address (that is assigned to a non loopback/ppp link types) through the loopback interface. Prior to the new L2/L3 rewrite, this host route is implicitly added by the L2 code during RTM_RESOLVE of that interface address. This host route is deleted when that interface is removed.
Reviewed by: kmacy
|
191943 |
09-May-2009 |
imp |
Remove bogus comment.
|
191932 |
09-May-2009 |
jhb |
Convert IPFW_DEFAULT_TO_ACCEPT into a loader tunable 'net.inet.ip.fw.default_to_accept'. The current value can also be queried via a read-only sysctl of the same name.
Requested by: plosher MFC after: 1 week
|
191917 |
08-May-2009 |
zec |
A NOP change: style / whitespace cleanup of the noise that slipped into r191816.
Spotted by: bz Approved by: julian (mentor) (an earlier version of the diff)
|
191916 |
08-May-2009 |
zec |
Remove a bogus check that unintentionally slipped in r191816.
This change has no functional impact on nooptions VIMAGE builds. Submitted by: bz
|
191891 |
07-May-2009 |
rrs |
repository sync to multi-OS repo ... spacing change
|
191890 |
07-May-2009 |
rrs |
ABI expansions to hopefully future-proof our MIB/netstat code for 8.0
|
191846 |
06-May-2009 |
zec |
Remove unnecessary CURVNET_SET() calls where curvnet context is (i.e. seems to be) already set.
This should reduce console noise due to curvnet recursion reports.
This change has no impact on nooptions VIMAGE builds. Approved by: julian (mentor)
|
191845 |
06-May-2009 |
zec |
Unbreak options VIMAGE kernel builds.
Approved by: julian (mentor)
|
191816 |
05-May-2009 |
zec |
Change the curvnet variable from a global const struct vnet *, previously always pointing to the default vnet context, to a dynamically changing thread-local one. The curvnet context should be set on entry to networking code via CURVNET_SET() macros, and reverted to previous state via CURVNET_RESTORE(). Recursions on curvnet are permitted, though strongly discouraged.
This change should have no functional impact on nooptions VIMAGE kernel builds, where CURVNET_* macros expand to whitespace.
The curthread->td_vnet (aka curvnet) variable's purpose is to be an indicator of the vnet context in which the current network-related operation takes place, in case we cannot deduce the current vnet context from any other source, such as by looking at mbuf's m->m_pkthdr.rcvif->if_vnet, sockets's so->so_vnet etc. Moreover, so far curvnet has turned out to be an invaluable consistency checking aid: it helps to catch cases when sockets, ifnets or any other vnet-aware structures may have leaked from one vnet to another.
The exact placement of the CURVNET_SET() / CURVNET_RESTORE() macros was a result of an empirical iterative process, with an aim to reduce recursions on CURVNET_SET() to a minimum, while still reducing the scope of CURVNET_SET() to networking only operations - the alternative would be calling CURVNET_SET() on each system call entry. In general, curvnet has to be set in three typical cases: when processing socket-related requests from userspace or from within the kernel; when processing inbound traffic flowing from device drivers to upper layers of the networking stack, and when executing timer-driven networking functions.
This change also introduces a DDB subcommand to show the list of all vnet instances.
Approved by: julian (mentor)
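Illustration (not part of the commit): a userland sketch of a thread-local "current context" with paired set/restore macros, in the spirit of CURVNET_SET() / CURVNET_RESTORE(). The macro and type names are hypothetical, and C11 _Thread_local stands in for curthread->td_vnet.

```c
#include <stddef.h>

struct vnet_ctx {
	int id;
};

/* Per-thread current context; NULL outside networking code. */
static _Thread_local struct vnet_ctx *cur_ctx;

/* Save the previous context in a local, then switch; restore on exit. */
#define CTX_SET(v)	struct vnet_ctx *saved_ctx = cur_ctx; cur_ctx = (v)
#define CTX_RESTORE()	cur_ctx = saved_ctx

struct vnet_ctx *
ctx_current(void)
{
	return (cur_ctx);
}

/* Leaf operation that runs in the context it is handed. */
int
inner_op(struct vnet_ctx *v)
{
	CTX_SET(v);
	int id = cur_ctx->id;
	CTX_RESTORE();
	return (id);
}

/* Nested use: recursion works because each frame saves its own copy. */
int
outer_op(struct vnet_ctx *outer, struct vnet_ctx *inner)
{
	CTX_SET(outer);
	int inner_id = inner_op(inner);
	int back = cur_ctx->id;		/* outer context is current again */
	CTX_RESTORE();
	return (inner_id * 100 + back);
}
```

Each CTX_SET() keeps the previous value in the caller's own stack frame, which is why nested sets unwind correctly.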
|
191738 |
02-May-2009 |
zec |
Make indentation more uniform across vnet container structs.
This is a purely cosmetic / NOP change.
Reviewed by: bz Approved by: julian (mentor) Verified by: svn diff -x -w producing no output
|
191734 |
02-May-2009 |
zec |
Unbreak options VIMAGE + nooptions INVARIANTS kernel builds.
Submitted by: julian Approved by: julian (mentor)
|
191688 |
30-Apr-2009 |
zec |
Permit building kernels with options VIMAGE, restricted to only a single active network stack instance. Turning on options VIMAGE at compile time yields the following changes relative to default kernel build:
1) V_ accessor macros for virtualized variables resolve to structure fields via base pointers, instead of being resolved as fields in global structs or plain global variables. As an example, V_ifnet becomes:
options VIMAGE: ((struct vnet_net *) vnet_net)->_ifnet
default build: vnet_net_0._ifnet
options VIMAGE_GLOBALS: ifnet
2) INIT_VNET_* macros will declare and set up base pointers to be used by V_ accessor macros, instead of resolving to whitespace:
INIT_VNET_NET(ifp->if_vnet); becomes
struct vnet_net *vnet_net = (ifp->if_vnet)->mod_data[VNET_MOD_NET];
3) Memory for vnet modules registered via vnet_mod_register() is now allocated at run time in sys/kern/kern_vimage.c, instead of per vnet module structs being declared as globals. If required, vnet modules can now request the framework to provide them with allocated bzeroed memory by filling in the vmi_size field in their vmi_modinfo structures.
4) structs socket, ifnet, inpcbinfo, tcpcb and syncache_head are extended to hold a pointer to the parent vnet. options VIMAGE builds will fill in those fields as required.
5) curvnet is introduced as a new global variable in options VIMAGE builds, always pointing to the default and only struct vnet.
6) struct sysctl_oid has been extended with additional two fields to store major and minor virtualization module identifiers, oid_v_subs and oid_v_mod. SYSCTL_V_* family of macros will fill in those fields accordingly, and store the offset in the appropriate vnet container struct in oid_arg1. In sysctl handlers dealing with virtualized sysctls, the SYSCTL_RESOLVE_V_ARG1() macro will compute the address of the target variable and make it available in arg1 variable for further processing.
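Items 1) and 2) can be sketched as a pair of macros that resolve differently depending on a build switch. The reduced structs, the SKETCH_VIMAGE switch and the helper functions are assumptions for illustration; only the expansion shapes are taken from the text above.

```c
#include <assert.h>

/* Hypothetical, heavily reduced container for virtualized net globals. */
struct vnet_net {
	int _ifnet;
};

#define VNET_MOD_NET	0
#define VNET_MOD_MAX	1

/* Hypothetical reduced struct vnet holding per-module data pointers. */
struct vnet {
	void *mod_data[VNET_MOD_MAX];
};

/* Assumed switch standing in for "options VIMAGE". */
#define SKETCH_VIMAGE	1

#if SKETCH_VIMAGE
/* Declare and set up the base pointer used by the V_ accessor. */
#define INIT_VNET_NET(vnet)					\
	struct vnet_net *vnet_net = (vnet)->mod_data[VNET_MOD_NET]
#define V_ifnet	(((struct vnet_net *)vnet_net)->_ifnet)
#else
/* Default build: one global container, initializer is whitespace. */
static struct vnet_net vnet_net_0;
#define INIT_VNET_NET(vnet)
#define V_ifnet	(vnet_net_0._ifnet)
#endif

static void
write_ifnet(struct vnet *vnet, int value)
{
	INIT_VNET_NET(vnet);
	V_ifnet = value;
}

static int
read_ifnet(struct vnet *vnet)
{
	INIT_VNET_NET(vnet);
	return (V_ifnet);
}
```

The function bodies are identical in both configurations; only the macro expansions change, which is what lets the same source build with or without the option.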
Unused fields in structs vnet_inet, vnet_inet6 and vnet_ipfw have been deleted.
Reviewed by: bz, rwatson Approved by: julian (mentor)
|
191672 |
29-Apr-2009 |
bms |
Bite the bullet, and make the IPv6 SSM and MLDv2 mega-commit: import from p4 bms_netdev. Summary of changes:
* Connect netinet6/in6_mcast.c to build. The legacy KAME KPIs are mostly preserved.
* Eliminate now dead code from ip6_output.c. Don't do mbuf bingo, we are not going to do RFC 2292 style CMSG tricks for multicast options as they are not required by any current IPv6 normative reference.
* Refactor transports (UDP, raw_ip6) to do own mcast filtering. SCTP, TCP unaffected by this change.
* Add ip6_msource, in6_msource structs to in6_var.h.
* Hookup mld_ifinfo state to in6_ifextra, allocate from domifattach path.
* Eliminate IN6_LOOKUP_MULTI(), it is no longer referenced. Kernel consumers which need this should use in6m_lookup().
* Refactor IPv6 socket group memberships to use a vector (like IPv4).
* Update ifmcstat(8) for IPv6 SSM.
* Add witness lock order for IN6_MULTI_LOCK.
* Move IN6_MULTI_LOCK out of lower ip6_output()/ip6_input() paths.
* Introduce IP6STAT_ADD/SUB/INC/DEC as per rwatson's IPv4 cleanup.
* Update carp(4) for new IPv6 SSM KPIs.
* Virtualize ip6_mrouter socket. Changes mostly localized to IPv6 MROUTING.
* Don't do a local group lookup in MROUTING.
* Kill unused KAME prototypes in6_purgemkludge(), in6_restoremkludge().
* Preserve KAME DAD timer jitter behaviour in MLDv1 compatibility mode.
* Bump __FreeBSD_version to 800084.
* Update UPDATING.
NOTE WELL:
* This code hasn't been tested against real MLDv2 queriers (yet), although the on-wire protocol has been verified in Wireshark.
* There are a few unresolved issues in the socket layer APIs to do with scope ID propagation.
* There is a LOR present in ip6_output()'s use of in6_setscope() which needs to be resolved. See comments in mld6.c. This is believed to be benign and can't be avoided for the moment without re-introducing an indirect netisr.
This work was mostly derived from the IGMPv3 implementation, and has been sponsored by a third party.
|
191661 |
29-Apr-2009 |
bms |
Add MLDv2 prototypes and defines.
|
191660 |
29-Apr-2009 |
bms |
Use KTR_INET for MROUTING CTRs.
|
191659 |
29-Apr-2009 |
bms |
Cut over to KTR_INET for CTR. For clarity, put pointer increment/size decrement on own line when copying out in-mode source filters to userland.
|
191658 |
29-Apr-2009 |
bms |
Do not assume that ip6_moptions is always set, it is a lazy-allocated structure.
|
191657 |
29-Apr-2009 |
bms |
Fix a problem whereby enqueued IGMPv3 filter list changes would be incorrectly output, if the RB-tree enumeration happened to reuse the same chain for a mode switch: that is, both ALLOW and BLOCK records were appended for the same group, in the same mbuf packet chain.
This was introduced during an mbuf chain layout bug fix involving m_getptr(), which obviously cannot count from offset 0 on the second pass through the RB-tree when serializing the IGMPv3 group records into the pending mbuf chain.
Cut over to KTR_INET for IGMPv3 CTR usage.
|
191621 |
28-Apr-2009 |
trasz |
Don't require packet to match a route (any route; this information wasn't used anyway, so a typical workaround was to add a dummy route) if it's going to be sent through IPSec tunnel.
Reviewed by: bz
|
191570 |
27-Apr-2009 |
oleg |
Optimize packet flow: if net.inet.ip.fw.one_pass != 0 and packet was processed by ipfw once - avoid second ipfw_chk() call. This saves us from unnecessary IPFW_RLOCK(), m_tag_find() calls and ip/tcp/udp header parsing.
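The short-circuit can be sketched as follows. The struct, the ipfw_done marker and all sketch_ names are hypothetical; the real code tracks the "already processed" state differently, and this only illustrates the control flow.

```c
#include <assert.h>

/* Hypothetical packet carrying a "seen by ipfw already" marker. */
struct sketch_pkt {
	int ipfw_done;
};

static int sketch_one_pass = 1;	/* stand-in for net.inet.ip.fw.one_pass */
static int sketch_chk_calls;	/* counts full rule evaluations */

/* Stand-in for the expensive path: locking, tag lookup, header parsing. */
static int
sketch_ipfw_chk(struct sketch_pkt *p)
{
	sketch_chk_calls++;
	p->ipfw_done = 1;
	return (0);		/* 0 means "accept" in this sketch */
}

/* Hook entry: skip the second evaluation when one_pass is set. */
static int
sketch_hook(struct sketch_pkt *p)
{
	if (sketch_one_pass != 0 && p->ipfw_done)
		return (0);
	return (sketch_ipfw_chk(p));
}
```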
MFC after: 2 month
|
191548 |
26-Apr-2009 |
zec |
In preparation for turning on options VIMAGE in next commits, rearrange / replace / adjust several INIT_VNET_* initializer macros, all of which currently resolve to whitespace.
Reviewed by: bz (an older version of the patch) Approved by: julian (mentor)
|
191528 |
26-Apr-2009 |
rwatson |
Acquire IF_ADDR_LOCK() around most iterations over ifp->if_addrhead (colloquially known as if_addrlist). Currently not acquired around interface address loops that call out to the routing code due to potential lock order issues.
MFC after: 3 weeks
|
191500 |
25-Apr-2009 |
rwatson |
Expand coverage of IF_ADDR_LOCK() in in_control() from point of initial lookup of 'ia' from if_addrhead through most use. Note that we currently have to drop it prematurely in some cases due to calls out to the routing and interface code while using 'ia', but this closes many races. Annotate several potential races that persist after this change. Move to using M_NOWAIT for allocating new interface addresses due to lock(s) being held.
MFC after: 3 weeks
|
191476 |
24-Apr-2009 |
rwatson |
In in_purgemaddrs(), remove the inm being freed from the address list before freeing it, rather than vice versa, to avoid potential use after free.
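A minimal sketch of the ordering, using the sys/queue.h list macros. The struct maddr type and both helpers are hypothetical and are not the in_purgemaddrs() code.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical list element standing in for a multicast address record. */
struct maddr {
	int group;
	LIST_ENTRY(maddr) link;
};
LIST_HEAD(maddr_head, maddr);

static int
add_group(struct maddr_head *head, int group)
{
	struct maddr *m = malloc(sizeof(*m));

	if (m == NULL)
		return (-1);
	m->group = group;
	LIST_INSERT_HEAD(head, m, link);
	return (0);
}

/* Purge every entry; returns how many were freed. */
static int
purge_all(struct maddr_head *head)
{
	struct maddr *m;
	int freed = 0;

	while ((m = LIST_FIRST(head)) != NULL) {
		LIST_REMOVE(m, link);	/* unlink while m is still valid ... */
		free(m);		/* ... and only then release it */
		freed++;
	}
	return (freed);
}
```

LIST_REMOVE() reads and writes the element's own link fields, so calling it after free() would touch freed memory.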
Reviewed by: bms
|
191456 |
24-Apr-2009 |
rwatson |
Relocate permissions checking code in in_control() to before the body of the implementation of ioctls. This makes the mapping of ioctls to specific privileges more explicit, and also simplifies the implementation by reducing the use of FALLTHROUGH handling in switch.
While this is not intended to be a functional change, it does mean that certain privilege checks are now performed earlier, so EPERM might be returned in preference to EADDRNOTAVAIL for management ioctls that could have failed for both reasons.
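The change in error precedence can be sketched as below. The command names and the function are hypothetical; the point is only that a privilege check ahead of the body wins over a later lookup failure.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical commands: one read-only, one management operation. */
enum sketch_cmd { SKETCH_GET_ADDR, SKETCH_SET_ADDR };

static int
sketch_in_control(enum sketch_cmd cmd, int privileged, int addr_exists)
{
	/* Step 1: explicit ioctl-to-privilege mapping, before any lookup. */
	switch (cmd) {
	case SKETCH_SET_ADDR:
		if (!privileged)
			return (EPERM);
		break;
	case SKETCH_GET_ADDR:
		break;
	}

	/* Step 2: the ioctl body, which may fail for its own reasons. */
	if (!addr_exists)
		return (EADDRNOTAVAIL);
	return (0);
}
```

An unprivileged SKETCH_SET_ADDR on a missing address returns EPERM here, where a check placed inside the body could have returned EADDRNOTAVAIL first.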
MFC after: 3 weeks
|
191443 |
23-Apr-2009 |
rwatson |
Reorganize in_control() so that invariants are more obvious, and so that it is easier to lock:
- Handle the unsupported ioctl case at the beginning of in_control(), handing off to ifp->if_ioctl, rather than looking up interfaces and addresses unnecessarily in this case.
- Make it an invariant that ifp is always non-NULL when running in_control()-implemented ioctls, simplifying the code structure.
MFC after: 3 weeks
|
191356 |
21-Apr-2009 |
bms |
Bracket struct mfc and struct rtdetq with #ifdef _KERNEL. Match the bracketing in netstat. Since the cleanup of MROUTING, ports have broken because they expect to include <netinet/ip_mroute.h> without including <sys/queue.h>. Fix breakage at source.
The real fix, of course, is to fix the MROUTING APIs by blowing them away and replacing them with something else...
|
191348 |
21-Apr-2009 |
bms |
remove IFF_ASSERTGIANT
|
191338 |
20-Apr-2009 |
rwatson |
Prefer actual field names (if_addrhead, ifa_link) to macros aliasing those field names in FreeBSD code.
MFC after: 2 weeks
|
191314 |
20-Apr-2009 |
rwatson |
In ip_input(), cache the received mbuf's network interface in a local variable. Acquire the interface address list lock when iterating over the interface address list searching for a matching received broadcast address.
MFC after: 2 weeks
|
191311 |
20-Apr-2009 |
rwatson |
In icmp_reflect(), acquire the interface address list lock when searching for a source address to use.
MFC after: 2 weeks Reviewed by: bz
|
191288 |
19-Apr-2009 |
rwatson |
Lock the interface address list when searching for a matching interface by address, or when implementing 'me' rules on IPv6. Prefer the field name if_addrhead to the macro if_addrlist.
MFC after: 2 weeks
|
191287 |
19-Apr-2009 |
rwatson |
In divert_packet(), lock the interface address list before iterating over it in search of an address.
MFC after: 2 weeks
|
191286 |
19-Apr-2009 |
rwatson |
Lock interface address lists in in_pcbladdr() when searching for a source address for a connection and there's no route or no interface for the route.
MFC after: 2 weeks
|
191285 |
19-Apr-2009 |
rwatson |
Protect against some writer-writer races in in_control() by acquiring the interface address list lock around interface address list modifications. More to do here.
MFC after: 2 weeks
|
191264 |
19-Apr-2009 |
bms |
Now that IFF_NEEDSGIANT has been removed from the network stack, catch up with this in IGMPv3 and remove dead code. This has the side-effect of not being back-portable to RELENG_7 w/o further changes.
|
191259 |
19-Apr-2009 |
kmacy |
- Allocate a small flowtable in ip_input.c (changeable by tunable)
- Use for accelerating ip_output
|
191160 |
16-Apr-2009 |
kmacy |
s/void/void */
|
191158 |
16-Apr-2009 |
kmacy |
restore spare pointers for MFCing
|
191148 |
16-Apr-2009 |
kmacy |
Change if_output to take a struct route as its fourth argument in order to allow passing a cached struct llentry * down to L2
Reviewed by: rwatson
|
191129 |
15-Apr-2009 |
kmacy |
- convert pspare pointers in inpcb to an llentry and rtentry cache - add flags to indicate their validity
|
191126 |
15-Apr-2009 |
kmacy |
- add second flags field to inpcb
- update comments in vflag
|
191125 |
15-Apr-2009 |
kmacy |
provide additional convenience macros for inpcb locking (upgrade, downgrade, exclusive)
|
191120 |
15-Apr-2009 |
kmacy |
make LLTABLE visible to netinet
|
191117 |
15-Apr-2009 |
kmacy |
add an llentry to struct route{_in6} to allow it to be passed around with the rtentry
|
191073 |
14-Apr-2009 |
rrs |
Add missing address lock when we look at the ifa list
|
191049 |
14-Apr-2009 |
rrs |
Move the flight size reduction to right after we recognize it's a retransmit, ahead of the PR-SCTP work. Without this fix, we end up NOT reducing flight size and causing a miscalculation when PR-SCTP is active and data is skipped.
Obtained from: Michael Tuexen.
|
190978 |
12-Apr-2009 |
rwatson |
Put TCPSTAT_ADD() and TCPSTAT_INC() behind _KERNEL.
MFC after: 3 days
|
190968 |
12-Apr-2009 |
rwatson |
Update stats in struct carpstats using two new macros: CARPSTATS_ADD() and CARPSTATS_INC(), rather than directly manipulating the fields of the structure. This will make it easier to change the implementation of these statistics, such as using per-CPU versions of the data structure.
MFC after: 3 days
|
190967 |
12-Apr-2009 |
rwatson |
Update stats in struct pimstat using two new macros: PIMSTAT_ADD() and PIMSTAT_INC(), rather than directly manipulating the fields of the structure. This will make it easier to change the implementation of these statistics, such as using per-CPU versions of the data structure.
MFC after: 3 days
|
190966 |
12-Apr-2009 |
rwatson |
Update stats in struct mrtstat using two new macros: MRTSTAT_ADD() and MRTSTAT_INC(), rather than directly manipulating the fields of the structure. This will make it easier to change the implementation of these statistics, such as using per-CPU versions of the data structure.
MFC after: 3 days
|
190965 |
12-Apr-2009 |
rwatson |
Update stats in struct igmpstat using two new macros: IGMPSTAT_ADD() and IGMPSTAT_INC(), rather than directly manipulating the fields of the structure. This will make it easier to change the implementation of these statistics, such as using per-CPU versions of the data structures.
MFC after: 3 days
|
190964 |
12-Apr-2009 |
rwatson |
Update stats in struct icmpstat and icmp6stat using four new macros: ICMPSTAT_ADD(), ICMPSTAT_INC(), ICMP6STAT_ADD(), and ICMP6STAT_INC(), rather than directly manipulating the fields of these structures across the kernel. This will make it easier to change the implementation of these statistics, such as using per-CPU versions of the data structures.
In one case, icmp6stat members are manipulated indirectly, by icmp6_errcount(), and this will require further work to fix for per-CPU stats.
MFC after: 3 days
|
190962 |
12-Apr-2009 |
rwatson |
Update stats in struct udpstat using two new macros, UDPSTAT_ADD() and UDPSTAT_INC(), rather than directly manipulating the fields across the kernel. This will make it easier to change the implementation of these statistics, such as using per-CPU versions of the data structures.
MFC after: 3 days
|
190951 |
11-Apr-2009 |
rwatson |
Update stats in struct ipstat using four new macros, IPSTAT_ADD(), IPSTAT_INC(), IPSTAT_SUB(), and IPSTAT_DEC(), rather than directly manipulating the fields across the kernel. This will make it easier to change the implementation of these statistics, such as using per-CPU versions of the data structures.
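The macro indirection used across these statistics commits can be sketched like this. The SKETCH_ names and the reduced struct are assumptions; the kernel's macro bodies may differ.

```c
#include <assert.h>

/* Hypothetical, reduced stand-in for struct ipstat. */
struct sketch_ipstat {
	unsigned long ips_total;
	unsigned long ips_fragments;
};

static struct sketch_ipstat sketch_ipstat;

/*
 * Callers name a field and an amount. How the counter is stored stays
 * behind the macros, so it could later become per-CPU without touching
 * any call site.
 */
#define SKETCH_IPSTAT_ADD(name, val)	(sketch_ipstat.name += (val))
#define SKETCH_IPSTAT_SUB(name, val)	(sketch_ipstat.name -= (val))
#define SKETCH_IPSTAT_INC(name)		SKETCH_IPSTAT_ADD(name, 1)
#define SKETCH_IPSTAT_DEC(name)		SKETCH_IPSTAT_SUB(name, 1)

/* Example call site: account for one datagram made of nfrags fragments. */
static void
sketch_receive(unsigned long nfrags)
{
	SKETCH_IPSTAT_INC(ips_total);
	SKETCH_IPSTAT_ADD(ips_fragments, nfrags);
}
```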
MFC after: 3 days
|
190948 |
11-Apr-2009 |
rwatson |
Update stats in struct tcpstat using two new macros, TCPSTAT_ADD() and TCPSTAT_INC(), rather than directly manipulating the fields across the kernel. This will make it easier to change the implementation of these statistics, such as using per-CPU versions of the data structures.
MFC after: 3 days
|
190941 |
11-Apr-2009 |
piso |
What's the point of adjusting a checksum if we are going to toss the packet? Anticipate the check/return code.
|
190938 |
11-Apr-2009 |
piso |
Plug two bugs introduced with modules conversion:
-UdpAliasIn(): correctly check return code after modules ran. -alias_nbt: in case of malformed packets (or some other unrecoverable error), toss the packet.
|
190935 |
11-Apr-2009 |
piso |
Remove stale comments.
|
190909 |
11-Apr-2009 |
zec |
Introduce vnet module registration / initialization framework with dependency tracking and ordering enforcement.
With this change, per-vnet initialization functions introduced with r190787 are no longer directly called from traditional initialization functions (which the compiler in most cases inlined into pre-r190787 code), but are instead registered via the vnet framework first, and are invoked only after all prerequisite modules have been initialized. In the long run, this framework should allow us to both initialize and dismantle multiple vnet instances in a correct order.
The problem this change aims to solve is how to replay the initialization sequence of various network stack components, which have been traditionally triggered via different mechanisms (SYSINIT, protosw). Note that this initialization sequence was and still can be subtly different depending on whether certain pieces of code have been statically compiled into the kernel, loaded as modules by boot loader, or kldloaded at run time.
The approach is simple - we record the initialization sequence established by the traditional mechanisms whenever vnet_mod_register() is called for a particular vnet module. The vnet_mod_register_multi() variant allows a single initializer function to be registered multiple times but with different arguments - currently this is only used in kern/uipc_domain.c by net_add_domain() with different struct domain * as arguments, which allows for protosw-registered initialization routines to be invoked in a correct order by the new vnet initialization framework.
For the purpose of identifying vnet modules, each vnet module has to have a unique ID, which is statically assigned in sys/vimage.h. Dynamic assignment of vnet module IDs is not supported yet.
A vnet module may specify a single prerequisite module at registration time by filling in the vmi_dependson field of its vnet_modinfo struct with the ID of the module it depends on. Unless specified otherwise, all vnet modules depend on VNET_MOD_NET (container for ifnet list head, rt_tables etc.), which thus has to and will always be initialized first. The framework will panic if it detects any unresolved dependencies before completing system initialization. Detection of unresolved dependencies for vnet modules registered after boot (kldloaded modules) is not provided.
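The single-prerequisite rule can be sketched as a small registry that defers a module until the module it depends on has been initialized. All names and the table layout are hypothetical; this is not the kern_vimage.c implementation, and it reports unresolved modules instead of panicking.

```c
#include <assert.h>

#define SKETCH_MOD_MAX	8
#define SKETCH_MOD_NONE	(-1)

/* Hypothetical reduced modinfo: an ID plus at most one prerequisite. */
struct sketch_modinfo {
	int vmi_id;
	int vmi_dependson;	/* prerequisite ID, or SKETCH_MOD_NONE */
};

static int registered[SKETCH_MOD_MAX];
static int depends[SKETCH_MOD_MAX];
static int inited[SKETCH_MOD_MAX];
static int init_order[SKETCH_MOD_MAX];	/* IDs in initialization order */
static int init_count;

/* Initialize id if its prerequisite is ready, then retry its dependants. */
static void
sketch_try_init(int id)
{
	int dep = depends[id];

	if (inited[id] || (dep != SKETCH_MOD_NONE && !inited[dep]))
		return;
	inited[id] = 1;
	init_order[init_count++] = id;
	for (int i = 0; i < SKETCH_MOD_MAX; i++)
		if (registered[i] && !inited[i] && depends[i] == id)
			sketch_try_init(i);
}

static void
sketch_mod_register(const struct sketch_modinfo *vmi)
{
	registered[vmi->vmi_id] = 1;
	depends[vmi->vmi_id] = vmi->vmi_dependson;
	sketch_try_init(vmi->vmi_id);
}

/* Number of registered modules whose prerequisite never showed up. */
static int
sketch_unresolved(void)
{
	int n = 0;

	for (int i = 0; i < SKETCH_MOD_MAX; i++)
		if (registered[i] && !inited[i])
			n++;
	return (n);
}
```

Registering a dependant before its prerequisite only delays it, so registration order and initialization order can differ.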
Note that the fact that each module can specify only a single prerequisite may become problematic in the long run. In particular, INET6 depends on INET being already instantiated, due to TCP / UDP structures residing in INET container. IPSEC also depends on INET, which will in turn additionally complicate making INET6-only kernel configs a reality.
The entire registration framework can be compiled out by turning on the VIMAGE_GLOBALS kernel config option.
Reviewed by: bz Approved by: julian (mentor)
|
190880 |
10-Apr-2009 |
kmacy |
Import "flowid" support for serializing flows across transmit queues
Reviewed by: rwatson and jeli
|
190865 |
09-Apr-2009 |
luigi |
Add emulation of delay profiles, which lets you model various types of MAC overheads such as preambles, link level retransmissions and more.
Note- this commit changes the userland/kernel ABI for pipes (but not for ordinary firewall rules) so you need to rebuild kernel and /sbin/ipfw to use dummynet features.
Please check the manpage for details on the new feature.
The MFC would be trivial but it breaks the ABI, so it will be postponed until after 7.2 is released.
Interested users are welcome to apply the patch manually to their RELENG_7 tree.
Work supported by the European Commission, Projects Onelab and Onelab2 (contract 224263).
|
190843 |
08-Apr-2009 |
rrs |
Fix an FR bug: when doing PR-SCTP with the number of retransmissions set to a low value, the check for skipping was in the wrong place, which meant we would FR chunks we should not. MFC after: 1 Month
|
190842 |
08-Apr-2009 |
rrs |
Add more padding and a new variable. This will help us be able to keep ABI compatibility between 8 and 9. MFC after: Never
|
190841 |
08-Apr-2009 |
piso |
-don't pass down, to module's fingerprint function, unused data like a pointer to the ip header. -style -spacing
|
190800 |
07-Apr-2009 |
bz |
With the right comparison we get a proper wscale value and thus more adequate TCP performance with IPv6.
Changes for IPv4, r166403 and r172795, both ignored the IPv6 counterpart and left it in the state of art of year 2000.
The same logic in syncache already shares code between v4 and v6 so things do not need to be adapted there.
Reported by: Steinar Haug (sthaug nethelp.no) Tested by: Steinar Haug (sthaug nethelp.no) MFC after: 3 days
|
190787 |
06-Apr-2009 |
zec |
First pass at separating per-vnet initializer functions from existing functions for initializing global state.
At this stage, the new per-vnet initializer functions are directly called from the existing global initialization code, which should in most cases result in compiler inlining those new functions, hence yielding a near-zero functional change.
Modify the existing initializer functions which are invoked via protosw, like ip_init() et al., to allow them to be invoked multiple times, i.e. once per vnet. Global state, if any, is initialized only if such functions are called within the context of vnet0, which will be determined via the IS_DEFAULT_VNET(curvnet) check (currently always true).
While here, V_irtualize a few remaining global UMA zones used by net/netinet/netipsec networking code. While it is not yet clear to me or anybody else whether this is the right thing to do, at this stage this makes the code more readable, and makes it easier to track uncollected UMA-zone-backed objects on vnet removal. In the long run, it's quite possible that some form of shared use of UMA zone pools among multiple vnets should be considered.
Bump __FreeBSD_version due to changes in layout of structs vnet_ipfw, vnet_inet and vnet_net.
Approved by: julian (mentor)
|
190753 |
05-Apr-2009 |
kan |
If KTR_SUBSYS is compiled in, it does not necessarily mean that user is interested in being spammed by mcast-related printfs.
Use proper check against ktr_mask instead of KTR_COMPILE.
|
190692 |
04-Apr-2009 |
bms |
Fix mbuf chain layout pessimization: in the case where a single mbuf is allocated due to m_getcl() returning NULL, we already call MH_ALIGN, so do not increment m->m_data in this case.
Found during MLDv2 port.
|
190691 |
04-Apr-2009 |
bms |
Do not obliterate QQI with MAXRESP.
Found during MLDv2 port.
|
190689 |
04-Apr-2009 |
rrs |
Many bug fixes (from the IETF hack-fest):
- PR-SCTP had major issues when skipping through a multi-part message.
  o Did not look at socket buffer.
  o Did not properly handle the reassembly queue.
  o The MARKED segments could interfere and un-skip a chunk causing a problem with the proper FWD-TSN.
  o No FR of FWD-TSN's was being done.
- NR-Sack code was basically disabled. It needed fixes that never got into the real code.
- CMT code had issues when the two paths were NOT the same b/w. We found a few small bugs, but also the critical one here was not dividing the rwnd amongst the paths.
Obtained from: Michael Tuexen and myself at the IETF hack-fest ;-)
|
190633 |
01-Apr-2009 |
piso |
Implement an ipfw action to reassemble ip packets: reass.
|
190354 |
24-Mar-2009 |
bms |
Don't call m_freem() after ip_output(), as it always consumes the mbuf chain provided to it.
Found by: Pierre Guinoiseau
|
190233 |
22-Mar-2009 |
jmallett |
Remove local in6_addr variables for local and foreign addresses in sysctl_drop, they were passed uninitialized to in6_pcblookup_hash. Instead, do as is done for IPv4 and use the addresses within the sockaddr structure, which are correctly populated.
This fixes tcpdrop(8) for IPv6 address pairs.
Reviewed by: bz
|
190148 |
20-Mar-2009 |
bms |
Fix brainos introduced during mechanical KTR change.
Pointy hat to: bms
|
190054 |
19-Mar-2009 |
bms |
Cleanup: Nuke debug.mrtdebug, and replace it with KTR.
|
190012 |
19-Mar-2009 |
bms |
Introduce a number of changes to the MROUTING code. This is purely a forwarding plane cleanup; no control plane code is involved.
Summary:
* Split IPv4 and IPv6 MROUTING support. The static compile-time kernel option remains the same, however, the modules may now be built for IPv4 and IPv6 separately as ip_mroute_mod and ip6_mroute_mod.
* Clean up the IPv4 multicast forwarding code to use BSD queue and hash table constructs. Don't build our own timer abstractions when ratecheck() and timevalclear() etc will do.
* Expose the multicast forwarding cache (MFC) and virtual interface table (VIF) as sysctls, to reduce netstat's dependence on libkvm for this information for running kernels.
  * bandwidth meters however still require libkvm.
* Make the MFC hash table size a boot/load-time tunable ULONG, net.inet.ip.mfchashsize (defaults to 256).
* Remove unused members from struct vif and struct mfc.
* Kill RSVP support, as no current RSVP implementation uses it. These stubs could be moved to raw_ip.c.
* Don't share locks or initialization between IPv4 and IPv6.
* Don't use a static struct route_in6 in ip6_mroute.c. The v6 code is still using a cached struct route_in6, this is moved to mif6 for the time being.
* More cleanup remains to be merged from ip_mroute.c to ip6_mroute.c.
v4 path tested using ports/net/mcast-tools. v6 changes are mostly mechanical locking and *have not* been tested. As these changes partially break some kernel ABIs, they will not be MFCed. There is a lot more work to be done here.
Reviewed by: Pavlin Radoslavov
|
190011 |
19-Mar-2009 |
bms |
Comment IGMP_PIM as being very historic, as in, don't use.
|
189931 |
17-Mar-2009 |
bms |
Deal with the case where ifma_protospec may be NULL, during any IPv4 multicast operations which reference it.
There is a potential race because ifma_protospec is set to NULL when we discover the underlying ifnet has gone away. This write is not covered by the IF_ADDR_LOCK, and it's difficult to widen its scope without making it a recursive lock. It isn't clear why this manifests more quickly with 802.11 interfaces, but does not seem to manifest at all with wired interfaces.
With this change, the 802.11 related panics reported by sam@ and cokane@ should go away. It is not the right fix, that requires more thought before 8.0.
Idea from: sam Tested by: cokane
|
189851 |
15-Mar-2009 |
rwatson |
Remove IFF_NEEDSGIANT, a compatibility infrastructure introduced in FreeBSD 5.x to allow network device drivers to run with Giant despite the network stack being Giant-free. This significantly simplifies calls into ioctl() on network interfaces, especially in the multicast code, as well as eliminates deferred invocation of interface if_start routines.
Disable the build on device drivers still depending on IFF_NEEDSGIANT as they no longer compile. They will be removed in a few weeks if they haven't been made MPSAFE in that time. Disabled drivers:
if_ar if_axe if_aue if_cdce if_cue if_kue if_ray if_rue if_rum if_sr if_udav if_ural if_zyd
Drivers that were already disabled because of tty changes:
if_ppp if_sl
Discussed on: arch@
|
189848 |
15-Mar-2009 |
rwatson |
Correct a number of evolved problems with inp_vflag and inp_flags: certain flags that should have been in inp_flags ended up in inp_vflag, meaning that they were inconsistently locked, and in one case, interpreted. Move the following flags from inp_vflag to gaps in the inp_flags space (and clean up the inp_flags constants to make gaps more obvious to future takers):
INP_TIMEWAIT INP_SOCKREF INP_ONESBCAST INP_DROPPED
Some aspects of this change have no effect on kernel ABI at all, as these are UDP/TCP/IP-internal uses; however, netstat and sockstat detect INP_TIMEWAIT when listing TCP sockets, so any MFC will need to take this into account.
MFC after: 1 week (or after dependencies are MFC'd) Reviewed by: bz
|
189836 |
14-Mar-2009 |
rrs |
Oops.. I missed a file on the commit :-)
|
189829 |
14-Mar-2009 |
das |
Namespace: Defining htonl() and friends here instead of arpa/inet.h is a BSD extension.
|
189790 |
14-Mar-2009 |
rrs |
Fixes several PR-SCTP related bugs.
- When sending large PR-SCTP messages over a lossy link we would incorrectly calculate the fwd-tsn
- When receiving large multipart pr-sctp packets we would incorrectly send back a SACK that would renege improperly on already received packets thus causing unneeded retransmissions.
|
189657 |
11-Mar-2009 |
rwatson |
Add INP_INHASHLIST flag for inpcb->inp_flags to indicate whether or not the inpcb is currently on various hash lookup lists, rather than using (lport != 0) to detect this. This means that the full 4-tuple of a connection can be retained after close, which should lead to more sensible netstat output in the window between TCP close and socket close.
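The difference between an explicit membership flag and the (lport != 0) test can be sketched as follows. The struct, the bit value and the helpers are hypothetical.

```c
#include <assert.h>

#define SKETCH_INP_INHASHLIST	0x01	/* hypothetical bit value */

/* Hypothetical, reduced stand-in for struct inpcb. */
struct sketch_inpcb {
	unsigned short inp_lport;
	int inp_flags;
};

static void
sketch_insert_hash(struct sketch_inpcb *inp, unsigned short lport)
{
	inp->inp_lport = lport;
	inp->inp_flags |= SKETCH_INP_INHASHLIST;
}

/* Leave the hash lists without having to zero the port as a marker. */
static void
sketch_remove_hash(struct sketch_inpcb *inp)
{
	if (inp->inp_flags & SKETCH_INP_INHASHLIST) {
		/* the actual unlinking from the lists would happen here */
		inp->inp_flags &= ~SKETCH_INP_INHASHLIST;
	}
}

static int
sketch_on_hash(const struct sketch_inpcb *inp)
{
	return ((inp->inp_flags & SKETCH_INP_INHASHLIST) != 0);
}
```

Because membership no longer depends on the port being non-zero, the port survives removal and can still be reported afterwards.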
MFC after: 2 weeks
|
189637 |
10-Mar-2009 |
rwatson |
Remove unused v6 macro aliases for inpcb fields:
in6p_ip6_nxt in6p_vflag in6p_flags in6p_socket in6p_lport in6p_fport in6p_ppcb
Remove unused v6 macro aliases for inpcb flags:
IN6P_HIGHPORT IN6P_LOWPORT IN6P_ANONPORT IN6P_RECVIF IN6P_MTUDISC IN6P_FAITH IN6P_CONTROLOPTS
References to in6p_lport and in6_fport in sockstat are also replaced with normal inp_lport and inp_fport references.
MFC after: 3 days Reviewed by: bz
|
189635 |
10-Mar-2009 |
bms |
Don't print inm_print() chatter when KTR_IGMPV3 is not enabled in the KTR_COMPILE mask.
Found by: gnn
|
189615 |
10-Mar-2009 |
rwatson |
Remove now-unused INP_UNMAPPABLEOPTS.
MFC after: 3 days Discussed with: bz
|
189603 |
09-Mar-2009 |
bms |
Fix uninitialized use of ifp for ii.
Found by: Peter Holm
|
189592 |
09-Mar-2009 |
bms |
Merge IGMPv3 and Source-Specific Multicast (SSM) to the FreeBSD IPv4 stack.
Diffs are minimized against p4. PCS has been used for some protocol verification, more widespread testing of recorded sources in Group-and-Source queries is needed. sizeof(struct igmpstat) has changed.
__FreeBSD_version is bumped to 800070.
|
189494 |
07-Mar-2009 |
marius |
On architectures with strict alignment requirements compensate the misalignment of the IP header that prepending the EtherIP header might have caused.
PR: 131921 MFC after: 1 week
|
189444 |
06-Mar-2009 |
rrs |
Fixes for window probes:
1) WP should never be marked unless flight size is 0
2) When recovering from wp if the peer ack's it we don't mark for retran
3) When recovering, we must assure a timer is still running.
|
189371 |
04-Mar-2009 |
rrs |
- PR-SCTP bug, where the CUM-ACK was not being updated into the advance_peer_ack point so we would incorrectly send a wrong value in the FWD-TSN
- PR-SCTP bug, where a PR packet is used for a window probe which could incorrectly get the packet moved back into the send_queue, which will cause major issues and should not happen.
- Fix a trace to use the proper macro.
|
189359 |
04-Mar-2009 |
bms |
In ip_output(), do not acquire the IN_MULTI_LOCK(), and do not attempt to perform a group lookup. This is a socket layer lock, and the bottom half of IP really has no business taking it.
Use the value of the in_mcast_loop sysctl to determine if we should loop back by default, in the absence of any multicast socket options. Because the check on group membership is now deferred to the input path, an m_copym() is now required.
This should increase multicast send performance where the source has not requested loopback, although this has not been benchmarked or measured.
It is also a necessary change for IN_MULTI_LOCK to become non-recursive, which is required in order to implement IGMPv3 in a thread-safe way.
|
189357 |
04-Mar-2009 |
bms |
Add sysctl net.inet.ip.mcast.loop. This controls whether or not IPv4 multicast sends are looped back to senders by default on a stack-wide basis, rather than relying on the socket option. Note that the sysctl only applies to newly created multicast sockets.
|
189347 |
04-Mar-2009 |
bms |
Merge header file definitions used by the new IGMPv3 implementation. This is a partial merge. Compatibility defines are retained for the existing IGMPv2 implementation.
|
189346 |
04-Mar-2009 |
bms |
Add various defines/macros required by IGMPv3:
* MCAST_UNDEFINED state.
* in_allhosts() macro (group is 224.0.0.1). This uses a const endian comparison.
* IP_MAX_GROUP_SRC_FILTER, IP_MAX_SOCK_SRC_FILTER default resource limits.
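A "const endian comparison" can be sketched as below: the constant is converted to network byte order rather than the address being converted to host order. The sketch_ names are assumptions and this is not necessarily the exact macro body that was committed.

```c
#include <assert.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* 224.0.0.1 in host byte order; hypothetical name for this sketch. */
#define SKETCH_ALLHOSTS_GROUP	0xe0000001U

/*
 * s_addr is stored in network byte order, so swap the constant once
 * instead of swapping every address being tested.
 */
#define sketch_in_allhosts(x)					\
	((x).s_addr == htonl(SKETCH_ALLHOSTS_GROUP))

/* Convenience wrapper: parse a dotted quad and test it. */
static int
sketch_is_allhosts(const char *text)
{
	struct in_addr a;

	if (inet_pton(AF_INET, text, &a) != 1)
		return (0);
	return (sketch_in_allhosts(a));
}
```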
|
189343 |
04-Mar-2009 |
bms |
Add function ip_checkrouteralert(), which will be used by IGMPv3 to check for the IPv4 Router Alert [RFC2113] option in a pulled-up IP mbuf chain.
|
189303 |
03-Mar-2009 |
bz |
Start removing IPv6 Type 0 Routing header code. RH0 was deprecated by RFC 5095.
While most of the code had been disabled by #if 0 already, leave a bit of infrastructure for possible RH2 code and a log message under BURN_BRIDGES in case a user still tries to send RH0 packets.
Reviewed by: gnn (a bit back, earlier version)
|
189289 |
02-Mar-2009 |
luigi |
curr_time is a 64 bit variable so SYSCTL_LONG is not appropriate as a handler. The variable was exported only for debugging, but there is little reason to do it now that the timekeeping is supported by various other variables. For the time being just comment out the sysctl, but I think this should go away.
|
189288 |
02-Mar-2009 |
luigi |
fw_debug has been unused for ages, so remove it from the list of sysctl_variables. I would also remove it from the VNET record but I am unsure if there is any ABI issue -- so for the time being just mark it as unused in ip_fw.h, and then we will collect the garbage at some appropriate time in the future.
MFC after: 3 days
|
189225 |
01-Mar-2009 |
bz |
Add size-guards evaluated at compile-time to the main struct vnet_* which are not in a module of their own like gif.
Single kernel compiles and universe will fail if the size of the struct changes. The expected values are given in sys/vimage.h. See the comments there for how to handle this.
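A compile-time size guard of this kind can be sketched with the negative-array-size idiom. The struct, the expected-size constant and the SKETCH_CTASSERT macro are hypothetical; they only illustrate how a size change turns into a build failure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container whose layout must not change silently. */
struct sketch_vnet_foo {
	int a;
	int b;
};

/* Expected size, recorded separately in the way the text describes. */
#define SKETCH_SIZEOF_vnet_foo	(2 * sizeof(int))

/* A false condition yields an array of size -1, which fails to compile. */
#define SKETCH_CTASSERT(cond, name)				\
	typedef char sketch_ctassert_##name[(cond) ? 1 : -1]

SKETCH_CTASSERT(sizeof(struct sketch_vnet_foo) == SKETCH_SIZEOF_vnet_foo,
    vnet_foo_size);

/* Run-time view of the same quantity, for inspection only. */
static size_t
sketch_vnet_foo_size(void)
{
	return (sizeof(struct sketch_vnet_foo));
}
```

Adding a member to the struct without updating the recorded size stops the build at the SKETCH_CTASSERT line.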
Requested by: peter
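A compile-time size guard of the kind described above can be sketched as follows. This uses C11 _Static_assert rather than whatever macro the commit itself uses, and struct sketch_vnet with its expected size is a made-up example (it assumes a 4-byte int), not one of the real struct vnet_* containers.

```c
#include <stddef.h>

/* Hypothetical container; the real struct vnet_* layouts differ. */
struct sketch_vnet {
	int	counters[4];
	char	name[16];
};

/* Expected size, kept next to the guard (assumes 4-byte int). */
#define SKETCH_VNET_SIZE	32

/* The build stops here if the struct grows or shrinks. */
_Static_assert(sizeof(struct sketch_vnet) == SKETCH_VNET_SIZE,
    "struct sketch_vnet changed size; update SKETCH_VNET_SIZE");
```

The point of such a guard is that an accidental layout change is caught at build time instead of showing up later as an ABI mismatch.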
|
189196 |
28-Feb-2009 |
rwatson |
Remove unreachable code for generating RST segments from tcp_twcheck(); this code became stale when T/TCP support was removed.
Discussed with: bz, sam MFC after: 1 month
|
189121 |
27-Feb-2009 |
rrs |
Fix the add stream feature of strm-reset to really work:
- Fix the copy, we can't do a blind copy but must transfer the data from the old to the new.
- Fix the ACK processing so we properly stop retransmitting the thing.
- Fix it so if we get a retran we will properly reply with the saved response without doing anything.
MFC after: 1 month
|
189106 |
27-Feb-2009 |
bz |
For all files including net/vnet.h directly include opt_route.h and net/route.h.
Remove the hidden include of opt_route.h and net/route.h from net/vnet.h.
We need to make sure that both opt_route.h and net/route.h are included before net/vnet.h because of the way MRT figures out the number of FIBs from the kernel option. If we do not, we end up with the default number of 1 when including net/vnet.h and array sizes are wrong.
This does not change the list of files which depend on opt_route.h but we can identify them now more easily.
|
189004 |
24-Feb-2009 |
rdivacky |
Change the functions to ANSI in those cases where it breaks promotion to int rule. See ISO C Standard: §6.7.5.3:15.
Approved by: kib (mentor) Reviewed by: warner Tested by: silence on -current
|
188992 |
24-Feb-2009 |
rwatson |
In tcp_usr_shutdown() and tcp_usr_send(), I missed converting NULL checks for the tcpcb, previously used to detect complete disconnection, to INP_DROPPED checks. Correct that, preventing shutdown() from improperly generating a TCP segment with destination IP and port of 0.0.0.0:0.
PR: kern/132050 Reported by: david gueluy <david.gueluy at netasq.com> MFC after: 3 weeks
|
188962 |
23-Feb-2009 |
rwatson |
In in_rtqkill(), assert the radix head lock, and pass RTF_RNH_LOCKED to in_rtrequest(); the radix head lock is already acquired before rnh_walktree is called in in_rtqtimo_one(). This avoids a recursive acquisition that is no longer permitted in 8.x due to use of an rwlock for the radix head lock.
Reported by: dikshie <dikshie at gmail.com> MFC after: 3 days
|
188854 |
20-Feb-2009 |
rrs |
Add the add-stream capability. Still needs more testing..
MFC after: 1 month
|
188852 |
20-Feb-2009 |
rrs |
Fix a bug. The sending was being restricted improperly by the max_burst. It should only be gated by cwnd in the lower level send.
Obtained from: Michael Tuexen MFC after: 1 week.
|
188676 |
16-Feb-2009 |
luigi |
correct some #include
|
188673 |
16-Feb-2009 |
luigi |
remove dependency on eventhandler.h, we only need a forward declaration
|
188672 |
16-Feb-2009 |
luigi |
remove dependency on net/if.h of this header
|
188669 |
16-Feb-2009 |
luigi |
use a const format string in the log message so we can check the arguments (if/when we enable those checks)
|
188626 |
15-Feb-2009 |
luigi |
remove unnecessary #include from vnet.h and vinet.h
Approved by: Marko Zec
|
188605 |
14-Feb-2009 |
rrs |
This commit fixes the issue with alias_sctp.c. No longer do we require SCTP to be in the kernel for the lib to be able to handle SCTP. We do this by moving the CRC32c checksum into libkern/crc32.c and then adjusting all routines to use the common methods. Note that this will improve the performance of iSCSI since they were using the old single 256 bit table lookup versus the slicing 8 algorithm (which gives a 4x speed up in CRC32c calculation :-D)
Reviewed by: rwatson, gnn, scottl, paolo MFC after: 4 week? (assuming we MFC the alias_sctp changes)
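For reference, CRC32c can be computed one bit at a time as in the sketch below. This is the slow textbook form, not the table-driven or slicing-by-8 code that the commit moves into libkern/crc32.c; the function name sketch_crc32c is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit-reflected form of the Castagnoli polynomial used by CRC32c. */
#define SKETCH_CRC32C_POLY	0x82f63b78U

/* Bit-at-a-time CRC32c over buf[0..len-1]. */
static uint32_t
sketch_crc32c(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	uint32_t crc = 0xffffffffU;	/* initial value */

	while (len-- > 0) {
		crc ^= *p++;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^
			    ((crc & 1) ? SKETCH_CRC32C_POLY : 0);
	}
	return (crc ^ 0xffffffffU);	/* final inversion */
}
```

Table-driven variants precompute the inner loop for every byte value; slicing-by-8 extends that to eight tables so that eight input bytes are consumed per step.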
|
188590 |
13-Feb-2009 |
rrs |
Have the jail code pass on the error returned instead of constant errors. Obtained from: jamie@freebsd.org
|
188580 |
13-Feb-2009 |
luigi |
remove unnecessary #include, and document some of the others
|
188578 |
13-Feb-2009 |
luigi |
Use uint32_t instead of n_long and n_time, and uint16_t instead of n_short. Add a note next to fields in network format.
The n_* types are not enough for compiler checks on endianness, and their use often requires an otherwise unnecessary #include <netinet/in_systm.h>
The typedefs in in_systm.h are still there.
|
188577 |
13-Feb-2009 |
rrs |
Move the new rwnd field down to the very end of the xsctp structure. This is where all new fields belong (not that we will be ABI compatible with 7.x anyway.. sigh).
|
188398 |
09-Feb-2009 |
rrs |
Add padding to the end of the xsctp_xxx structures to allow future changes to be able to maintain ABI compatibility
|
188388 |
09-Feb-2009 |
rrs |
Fix minor spacing problem found by s9indent from last commit.
|
188387 |
09-Feb-2009 |
rrs |
Fix INET only build breakage with SCTP - pointy hat to me :-)
|
188306 |
08-Feb-2009 |
bz |
Try to remove/assimilate as much of formerly IPv4/6 specific (duplicate) code in sys/netipsec/ipsec.c and fold it into common, INET/6 independent functions.
The file local functions ipsec4_setspidx_inpcb() and ipsec6_setspidx_inpcb() were 1:1 identical after the change in r186528. Rename to ipsec_setspidx_inpcb() and remove the duplicate.
Public functions ipsec[46]_get_policy() were 1:1 identical. Remove one copy and merge in the factored out code from ipsec_get_policy() into the other. The public function left is now called ipsec_get_policy() and callers were adapted.
Public functions ipsec[46]_set_policy() were 1:1 identical. Rename file local ipsec_set_policy() function to ipsec_set_policy_internal(). Remove one copy of the public functions, rename the other to ipsec_set_policy() and adapt callers.
Public functions ipsec[46]_hdrsiz() were logically identical (ignoring one questionable assert in the v6 version). Rename the file local ipsec_hdrsiz() to ipsec_hdrsiz_internal(), the public function to ipsec_hdrsiz(), remove the duplicate copy and adapt the callers. The v6 version had been unused anyway. Cleanup comments.
Public functions ipsec[46]_in_reject() were logically identical apart from statistics. Move the common code into a file local ipsec46_in_reject() leaving vimage+statistics in small AF specific wrapper functions. Note: unfortunately we already have a public ipsec_in_reject().
Reviewed by: sam Discussed with: rwatson (renaming to *_internal) MFC after: 26 days X-MFC: keep wrapper functions for public symbols?
|
188299 |
08-Feb-2009 |
piso |
Silence LINT: add 2 stubs (update_crc32 and sctp_finalize_crc32) to fix LIBALIAS + SCTP_NO_CSUM case.
|
188294 |
07-Feb-2009 |
piso |
Add SCTP NAT support.
Submitted by: CAIA (http://caia.swin.edu.au)
|
188148 |
05-Feb-2009 |
jamie |
Remove redundant calls of prison_local_ip4 in in_pcbbind_setup, and of prison_local_ip6 in in6_pcbbind.
Approved by: bz (mentor)
|
188144 |
05-Feb-2009 |
jamie |
Standardize the various prison_foo_ip[46] functions and prison_if to return zero on success and an error code otherwise. The possible errors are EADDRNOTAVAIL if an address being checked for doesn't match the prison, and EAFNOSUPPORT if the prison doesn't have any addresses in that address family. For most callers of these functions, use the returned error code instead of e.g. a hard-coded EADDRNOTAVAIL or EINVAL.
Always include a jailed() check in these functions, where a non-jailed cred always returns success (and makes no changes). Remove the explicit jailed() checks that preceded many of the function calls.
Approved by: bz (mentor)
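The return convention described above can be sketched as below. The struct and function names are hypothetical, the prison is reduced to a bare list of IPv4 addresses, and a NULL prison stands in for a non-jailed credential; the real prison_*_ip4() functions take credentials and do more.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified prison: just a list of IPv4 addresses. */
struct sketch_prison {
	const uint32_t	*ip4;
	size_t		 ip4_count;
};

/*
 * Returns 0 on success, EAFNOSUPPORT if the prison has no IPv4
 * addresses, and EADDRNOTAVAIL if addr is not one of them.  A NULL
 * prison models a non-jailed caller, which always succeeds.
 */
static int
sketch_prison_check_ip4(const struct sketch_prison *pr, uint32_t addr)
{
	if (pr == NULL)
		return (0);
	if (pr->ip4_count == 0)
		return (EAFNOSUPPORT);
	for (size_t i = 0; i < pr->ip4_count; i++)
		if (pr->ip4[i] == addr)
			return (0);
	return (EADDRNOTAVAIL);
}
```

A caller can then propagate the result directly instead of substituting a hard-coded EADDRNOTAVAIL or EINVAL.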
|
188100 |
03-Feb-2009 |
rrs |
LOR fix - Lock only when calling the actual code that is messing with the UDP tunnel. This means that if two users actually tried to change the tunnel port at the same time interesting things COULD result, but it's probably very unlikely to happen :-)
|
188067 |
03-Feb-2009 |
rrs |
- Cleanup checksum code.
- Prepare for CRC offloading, add MIB counters (RS/MT).
- Bugfix: Disable CRC computation for IPv6 addresses with local scope (MT).
- Bugfix: Handle close() with SO_LINGER correctly when notifications are generated during the close() call (MT).
- Bugfix: Generate DRY event when sender is dry during subscription. Only for 1-to-1 style sockets (RS/MT)
- Bugfix: Put vtags for the correct amount of time into time-wait (MT).
- Bugfix: Clear vtag entries correctly on expiration (MT).
- Bugfix: shutdown() indicates ENOTCONN when called for unconnected 1-to-1 style sockets (MT).
- Bugfix: In sctp Auth code (PL).
- Add support for devices that support SCTP csum offload (igb).
- Add missing sctp_associd to mib sysctl xsctp_tcb structure (RS)
Obtained from: With help from Peter Lei and Michael Tuexen
|
188066 |
03-Feb-2009 |
rrs |
Adds support for SCTP checksum offload. This means we, like TCP and UDP, move the checksum calculation into the IP routines; when there is no hardware support we call into the normal SCTP checksum routine.
The next round of SCTP updates will use this functionality. Of course the IGB driver needs a few updates to support the new Intel controller set that actually does SCTP csum offload too.
Reviewed by: gnn, rwatson, kmacy
|
187822 |
28-Jan-2009 |
luigi |
initialize a couple of variables, gcc 4.2.4-4 (linux) reports some possible uninitialized uses and the warning does make sense.
|
187821 |
28-Jan-2009 |
luigi |
For some reason (probably dating ages ago) an #ifdef SYSCTL_NODE / #endif section included a lot of stuff that did not belong there. So split the block in multiple components each around the relevant stuff.
This said, I wonder if building a kernel where SYSCTL_NODE is not defined is supported at all.
Submitted by: Marta Carbone
|
187684 |
25-Jan-2009 |
bz |
For consistency with prison_{local,remote,check}_ipN rename prison_getipN to prison_get_ipN.
Submitted by: jamie (as part of a larger patch) MFC after: 1 week
|
187585 |
22-Jan-2009 |
bz |
Add externs to fix build with VIMAGE_GLOBALS after r187289.
|
187380 |
18-Jan-2009 |
sam |
remove too noisy DIAGNOSTIC code
Reviewed by: qingli
|
187304 |
15-Jan-2009 |
piso |
Silence userland warnings about missing prototypes.
Submitted by: Roman Divacky <rdivacky@freebsd.org>
|
187289 |
15-Jan-2009 |
lstewart |
Add TCP Appropriate Byte Counting (RFC 3465) support to kernel.
The new behaviour is on by default, and can be disabled by setting the net.inet.tcp.rfc3465 sysctl to 0 to obtain previous behaviour.
The patch changes struct tcpcb in sys/netinet/tcp_var.h which breaks the ABI. Bump __FreeBSD_version to 800061 accordingly. User space tools that rely on the size of struct tcpcb (e.g. sockstat) need to be recompiled.
Reviewed by: rpaulo, gnn Approved by: gnn, kmacy (mentors) Sponsored by: FreeBSD Foundation
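The RFC 3465 rules can be sketched as below: in slow start cwnd grows by the number of bytes newly acknowledged, capped at L*SMSS per ACK, and in congestion avoidance cwnd grows by one SMSS once a full cwnd of bytes has been acknowledged. The struct and function names are hypothetical, L is fixed at 2 here, and this is not the tcp_input.c code from the commit.

```c
#include <stdint.h>

#define SKETCH_ABC_L	2	/* per-ACK slow start limit, in segments */

/* Hypothetical per-connection state, all values in bytes. */
struct sketch_abc {
	uint32_t cwnd;		/* congestion window */
	uint32_t ssthresh;	/* slow start threshold */
	uint32_t bytes_acked;	/* ACKed since the last cwnd increase */
	uint32_t smss;		/* sender maximum segment size */
};

/* Process an ACK that newly acknowledges 'acked' bytes. */
static void
sketch_abc_ack(struct sketch_abc *s, uint32_t acked)
{
	if (s->cwnd < s->ssthresh) {
		/* Slow start: grow by bytes ACKed, at most L*SMSS. */
		uint32_t limit = SKETCH_ABC_L * s->smss;

		s->cwnd += (acked < limit) ? acked : limit;
	} else {
		/* Congestion avoidance: one SMSS per cwnd of data. */
		s->bytes_acked += acked;
		if (s->bytes_acked >= s->cwnd) {
			s->bytes_acked -= s->cwnd;
			s->cwnd += s->smss;
		}
	}
}
```

Counting bytes rather than ACKs means a receiver that acknowledges every other segment no longer slows window growth.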
|
187062 |
11-Jan-2009 |
rwatson |
Since we allow conditional allocation of labels on syncache entries, remove historic assertion that labels are always present.
|
186980 |
09-Jan-2009 |
bz |
Restrict arp, ndp and theoretically the FIB listing (if not read with libkvm) to the addresses of a prison, when inside a jail. [1] As the patch from the PR was pre-'new-arp', add checks to the llt_dump handlers as well.
While touching RTM_GET in route_output(), consistently use curthread credentials rather than the creds from the socket there. [2]
PR: kern/68189 Submitted by: Mark Delany <sxcg2-fuwxj@qmda.emu.st> [1] Discussed with: rwatson [2] Reviewed by: rwatson MFC after: 4 weeks
|
186963 |
09-Jan-2009 |
adrian |
Fix fat-fingered comment.
Noticed-by: julian
|
186961 |
09-Jan-2009 |
adrian |
Fix indentation; add FALLTHROUGH.
Thanks Max!
|
186960 |
09-Jan-2009 |
adrian |
Better comment what the socket option does. Thanks to Sam Leffler for suggesting this.
|
186959 |
09-Jan-2009 |
adrian |
Comment some potentially confusing logic.
Nitpicking by: mlaier
MFC after: 2 weeks
|
186955 |
09-Jan-2009 |
adrian |
Implement a new IP option (not compiled/enabled by default) to allow applications to specify a non-local IP address when bind()'ing a socket to a local endpoint.
This allows applications to spoof the client IP address of connections if (obviously!) they somehow are able to receive the traffic normally destined to said clients.
This patch doesn't include any changes to ipfw or the bridging code to redirect the client traffic through the PCB checks so TCP gets a shot at it. The normal behaviour is that packets with a non-local destination IP address are not handled locally. This can be dealt with using some IPFW hackery; modifications to IPFW to make this less hacky will occur in subsequent commits.
Thanks to Julian Elischer and others at Ironport. This work was approved and donated before Cisco acquired them.
Obtained from: Julian Elischer and others MFC after: 2 weeks
|
186948 |
09-Jan-2009 |
bz |
Make SIOCGIFADDR and related, as well as SIOCGIFADDR_IN6 and related jail-aware. Up to now we returned the first address of the interface for SIOCGIFADDR w/o an ifr_addr in the query. This caused problems for programs querying for an address but running inside a jail, as the address returned usually did not belong to the jail. Like for v6, if there was an ifr_addr given on v4, you could probe for more addresses on the interfaces that you were not allowed to see from inside a jail. Return an error (EADDRNOTAVAIL) in that case now unless the address is on the given interface and valid for the jail.
PR: kern/114325 Reviewed by: rwatson MFC after: 4 weeks
|
186935 |
09-Jan-2009 |
harti |
Set a minimum of information in the routing message (like version and type) so that generic routing message parsing code can parse the messages for L2 info that are retrieved via the sysctl interface.
|
186821 |
06-Jan-2009 |
rrs |
Addresses Robert's comments on comments. Also adds the KASSERT and checks suggested.
Reviewed by: The udp tunneling was discussed on net@ under the thread entitled "Heads up -- Thinking about UDP and tunneling"
|
186813 |
06-Jan-2009 |
rrs |
Add the ability of an alternate transport protocol to easily tunnel over udp by providing a hook function that will be called instead of appending to the socket buffer.
|
186717 |
03-Jan-2009 |
rwatson |
Allow the IP_MINTTL socket option to be set to 0 so that it can be disabled entirely, which is its default state before set to a non-zero value.
PR: 128790 Submitted by: Nick Hilliard <nick at foobar dot org> MFC after: 3 weeks
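A sketch of the range check this implies, assuming 0 means "disabled" and 1 to 255 are valid thresholds. The function name and SKETCH_MAXTTL are hypothetical and the real option handling is not reproduced here.

```c
#include <errno.h>

#define SKETCH_MAXTTL	255	/* largest value an 8-bit IP TTL can hold */

/*
 * Hypothetical validator: 0 disables the minimum-TTL check,
 * 1..255 enable it, anything else is rejected with EINVAL.
 */
static int
sketch_set_minttl(int optval, unsigned char *minttl)
{
	if (optval < 0 || optval > SKETCH_MAXTTL)
		return (EINVAL);
	*minttl = (unsigned char)optval;
	return (0);
}
```

Rejecting 0 would leave no way to turn the check off again once it has been enabled on a socket.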
|
186708 |
03-Jan-2009 |
qingli |
Some modules such as SCTP supply a valid route entry as an input argument to ip_output(). The destination is represented in a sockaddr{} object that may contain other pieces of information, e.g., port number. This same destination sockaddr{} object may be passed into L2 code, which could be used to create a L2 entry. Since there exists a L2 table per address family, the L2 lookup function can make address family specific comparison instead of the generic bcmp() operation over the entire sockaddr{} structure.
Note in the IPv6 case the sin6_scope_id is not compared because the address is currently stored in the embedded form inside the kernel. The in6_lltable_lookup() has to account for the scope-id if this storage format were to change in the future.
|
186544 |
28-Dec-2008 |
bz |
For consistency use LLE_IS_VALID() in this 4th place that is actually interested in the (void *)-1 return value hack. This way we can easily identify those special parts of the code.
|
186500 |
26-Dec-2008 |
qingli |
This checkin addresses a couple of issues:
1. The "route" command allows route insertion through the interface-direct option "-iface". During if_attach(), a sockaddr_dl{} entry is created for the interface and is part of the interface address list. This sockaddr_dl{} entry describes the interface in detail. The "route" command selects this entry as the "gateway" object when the "-iface" option is present. The "arp" and "ndp" commands also interact with the kernel through the routing socket when adding and removing static L2 entries. The static L2 information is also provided through the "gateway" object with an AF_LINK family type, similar to what is provided by the "route" command. In order to differentiate between these two types of operations, a RTF_LLDATA flag is introduced. This flag is set by the "arp" and "ndp" commands when issuing the add and delete commands. This flag is also set in each L2 entry returned by the kernel. The "arp" and "ndp" commands follow a convention where a RTM_GET is issued first followed by a RTM_ADD/DELETE. This RTM_GET request fills in the fields for a "rtm" object, which is reinjected into the kernel by a subsequent RTM_ADD/DELETE command. The entry returned from RTM_GET is a prefix route, so the RTF_LLDATA flag must be specified when issuing the RTM_ADD/DELETE messages.
2. Enforce the convention that NET_RT_FLAGS with a 0 w_arg is the specification for retrieving L2 information. Also optimized the code logic.
Reviewed by: julian
|
186474 |
24-Dec-2008 |
kmacy |
Fix missed unlock and reference drop of lle
Found by: pho
|
186437 |
23-Dec-2008 |
bz |
Remove long unused netinet/ipprotosw.h (basically since r82884).
Discussed with: rwatson MFC after: 4 weeks
|
186411 |
23-Dec-2008 |
qingli |
Don't create a bogus ARP entry for 0.0.0.0.
|
186317 |
19-Dec-2008 |
qingli |
The proxy-arp code was broken and responds to ARP requests for addresses that are not proxied locally.
|
186223 |
17-Dec-2008 |
bz |
Another step assimilating IPv[46] PCB code: normalize IN6P_* compat flags usage to their equivalent INP_* counterpart.
Discussed with: rwatson Reviewed by: rwatson MFC after: 4 weeks
|
186222 |
17-Dec-2008 |
bz |
Use inc_flags instead of the inc_isipv6 alias which so far had been the only flag with random usage patterns. Switch inc_flags to be used as a real bit field by using INC_ISIPV6 with bitops to check for the 'isipv6' condition.
While here fix a place or two where in case of v4 inc_flags were not properly initialized before.[1]
Found by: rwatson during review [1] Discussed with: rwatson Reviewed by: rwatson MFC after: 4 weeks
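Treating inc_flags as a real bit field means testing the bit with a mask rather than reading the whole field as a boolean. A sketch, with a hypothetical struct and an assumed flag value (the real INC_ISIPV6 definition is not shown in this log):

```c
#include <stdint.h>

#define SKETCH_INC_ISIPV6	0x01	/* assumed value, for illustration */

/* Hypothetical stand-in for the connection info structure. */
struct sketch_conninfo {
	uint8_t	inc_flags;
};

/* True when the IPv6 bit is set, whatever other bits are set. */
static inline int
sketch_is_ipv6(const struct sketch_conninfo *inc)
{
	return ((inc->inc_flags & SKETCH_INC_ISIPV6) != 0);
}
```

With the mask in place, further flag bits can be added to the field later without disturbing the IPv6 test.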
|
186200 |
17-Dec-2008 |
kmacy |
default to doing lla_lookup with shared afdata lock and returning a shared lock on the lle - thus restoring parallel performance to pre-arpv2 level
|
186180 |
16-Dec-2008 |
rwatson |
IPFW's pfil hook/unhook code ignores the return values of pfil_add_hook() and pfil_remove_hook(), so cast them to (void).
MFC after: pretty soon
|
186178 |
16-Dec-2008 |
kmacy |
ipfw doesn't use the radix node head lock to protect the radix tree - remove acquisition
|
186164 |
16-Dec-2008 |
kmacy |
check pointer against NULL add new line after declaration for style
|
186161 |
16-Dec-2008 |
kmacy |
don't unlock lle if it is NULL
|
186150 |
16-Dec-2008 |
kmacy |
unlock and destroy an llentry's lock before freeing
Found by: sam
|
186141 |
15-Dec-2008 |
bz |
Another step assimilating IPv[46] PCB code - directly use the inpcb names rather than the following IPv6 compat macros: in6pcb,in6p_sp, in6p_ip6_nxt,in6p_flowinfo,in6p_vflag, in6p_flags,in6p_socket,in6p_lport,in6p_fport,in6p_ppcb and sotoin6pcb().
Apart from removing duplicate code in netipsec, this is a pure whitespace, not a functional change.
Discussed with: rwatson Reviewed by: rwatson (version before review requested changes) MFC after: 4 weeks (set the timer and see then)
|
186119 |
15-Dec-2008 |
qingli |
The main goals of this project are:
1. separating L2 tables (ARP, NDP) from the L3 routing tables
2. removing as much locking dependencies among these layers as possible to allow for some parallelism in the search operations
3. simplify the logic in the routing code
The most notable end result is the obsolescence of the route cloning (RTF_CLONING) concept, which translated into code reduction in both IPv4 ARP and IPv6 NDP related modules, and size reduction in struct rtentry{}. The change in design obsoletes the semantics of RTF_CLONING, RTF_WASCLONE and RTF_LLINFO routing flags. The userland applications such as "arp" and "ndp" have been modified to reflect those changes. The output from "netstat -r" shows only the routing entries.
Quite a few developers have contributed to this project in the past: Glebius Smirnoff, Luigi Rizzo, Alessandro Cerri, and Andre Oppermann. And most recently:
- Kip Macy revised the locking code completely, thus completing the last piece of the puzzle, Kip has also been conducting active functional testing
- Sam Leffler has helped me improving/refactoring the code, and provided valuable reviews
- Julian Elischer setup the perforce tree for me and has helped me maintaining that branch before the svn conversion
|
186086 |
14-Dec-2008 |
bz |
Add a check, that is currently under discussion for 8 but that we need to keep for 7-STABLE when MFCing in_pcbladdr() to not change the behaviour there.
With this a destination route via a loopback interface is treated as a valid and reachable thing for IPv4 source address selection, even though nothing of that network is ever directly reachable, but it is more like a blackhole route. With this the source address will be selected and IPsec can grab the packets before we would discard them at a later point, encapsulate them and send them out from a different tunnel endpoint IP.
Discussed on: net Reported by: Frank Behrens <frank@harz.behrens.de> Tested by: Frank Behrens <frank@harz.behrens.de> MFC after: 4 weeks (just so that I get the mail)
|
186057 |
13-Dec-2008 |
bz |
De-virtualize the MD5 context for TCP initial seq number generation and make it a function local variable like we do almost everywhere inside the kernel.
Discussed with: rwatson, silby MFC after: 4 weeks
|
186054 |
13-Dec-2008 |
kmacy |
version that will compile
|
186053 |
13-Dec-2008 |
kmacy |
radix node head lock needs to be held when calling rnh_addaddr
|
186052 |
13-Dec-2008 |
kmacy |
don't acquire lock recursively
|
186048 |
13-Dec-2008 |
bz |
Second round of putting global variables, which were virtualized but formerly missed under VIMAGE_GLOBAL.
Put the extern declarations of the virtualized globals under VIMAGE_GLOBAL as the globals themselves are already. This will help by the time when we are going to remove the globals entirely.
Sponsored by: The FreeBSD Foundation
|
185937 |
11-Dec-2008 |
bz |
Put global variables, which were virtualized but formerly missed, under VIMAGE_GLOBAL.
Start putting the extern declarations of the virtualized globals under VIMAGE_GLOBAL as the globals themselves are already. This will help by the time when we are going to remove the globals entirely.
While there garbage collect a few dead externs from ip6_var.h.
Sponsored by: The FreeBSD Foundation
|
185934 |
11-Dec-2008 |
bz |
Use the correct INIT_VNET_INET() as the virtualized variables here are in vinet.h not in vinet6.h
Sponsored by: The FreeBSD Foundation
|
185895 |
10-Dec-2008 |
zec |
Conditionally compile out V_ globals while instantiating the appropriate container structures, depending on VIMAGE_GLOBALS compile time option.
Make VIMAGE_GLOBALS a new compile-time option, which by default will not be defined, resulting in instantiations of global variables selected for V_irtualization (enclosed in #ifdef VIMAGE_GLOBALS blocks) to be effectively compiled out. Instantiate new global container structures to hold V_irtualized variables: vnet_net_0, vnet_inet_0, vnet_inet6_0, vnet_ipsec_0, vnet_netgraph_0, and vnet_gif_0.
Update the VSYM() macro so that depending on VIMAGE_GLOBALS the V_ macros resolve either to the original globals, or to fields inside container structures, i.e. effectively
    #ifdef VIMAGE_GLOBALS
    #define V_rt_tables rt_tables
    #else
    #define V_rt_tables vnet_net_0._rt_tables
    #endif
Update SYSCTL_V_*() macros to operate either on globals or on fields inside container structs.
Extend the internal kldsym() lookups with the ability to resolve selected fields inside the virtualization container structs. This applies only to the fields which are explicitly registered for kldsym() visibility via VNET_MOD_DECLARE() and vnet_mod_register(), currently this is done only in sys/net/if.c.
Fix a few broken instances of MODULE_GLOBAL() macro use in SCTP code, and modify the MODULE_GLOBAL() macro to resolve to V_ macros, which in turn result in proper code being generated depending on VIMAGE_GLOBALS.
De-virtualize local static variables in sys/contrib/pf/net/pf_subr.c which were prematurely V_irtualized by automated V_ prepending scripts during earlier merging steps. PF virtualization will be done separately, most probably after next PF import.
Convert a few variable initializations at instantiation to initialization in init functions, most notably in ipfw. Also convert TUNABLE_INT() initializers for V_ variables to TUNABLE_FETCH_INT() in initializer functions.
Discussed at: devsummit Strassburg Reviewed by: bz, julian Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation
|
185858 |
10-Dec-2008 |
rwatson |
Remove inconsistent white space from in_pcballoc().
MFC after: pretty soon
|
185857 |
10-Dec-2008 |
rwatson |
Move syncache flag definitions below data structure, compress some vertical whitespace.
MFC after: pretty soon
|
185855 |
10-Dec-2008 |
rwatson |
Move flag definitions for t_flags and t_oobflags below the definition of struct tcpcb so that the structure definition is a bit more vertically compact. Can't yet fit it on one printed page, though.
MFC after: pretty soon
|
185845 |
10-Dec-2008 |
kmacy |
unlock when done
|
185844 |
10-Dec-2008 |
kmacy |
don't reference if_addr_mtx directly
|
185813 |
10-Dec-2008 |
rwatson |
Update comment on INP_TIMEWAIT to say what it's about, as we caution regarding the misplacement of flags in inp_vflag in an earlier comment.
MFC after: pretty soon
|
185795 |
09-Dec-2008 |
rwatson |
Enhance one comment relating to recent TCP locking changes, and fix a typo in another.
MFC after: 6 weeks
|
185791 |
09-Dec-2008 |
rwatson |
Move macros defining flags and shortcuts to nested structure fields in inpcbinfo below the structure definition in order to make inpcbinfo fit on a single printed page; related style tweaks.
MFC after: pretty soon
|
185775 |
08-Dec-2008 |
rwatson |
Move from solely write-locking the global tcbinfo in tcp_input() to read-locking in the TCP input path, allowing greater TCP input parallelism where multiple ithreads or ithread and netisr are able to run in parallel. Previously, most TCP input paths held a write lock on the global tcbinfo lock, effectively serializing TCP input.
Before looking up the connection, acquire a write lock if a potentially state-changing flag is set on the TCP segment header (FIN, RST, SYN), and otherwise a read lock. We may later have to upgrade to a write lock in certain cases (ACKs received by the syncache or during TIMEWAIT) in order to support global state transitions, but this is never required for steady-state packets.
Upgrading from a read lock to a write lock must be done as a trylock operation to avoid deadlocks, and actually violates the lock order as the tcbinfo lock precedes the inpcb lock held at the time of upgrade. If the trylock fails, we bump the refcount on the inpcb, drop both locks, and re-acquire in-order. If another thread has freed the connection while the locks are dropped, we free the inpcb and repeat the lookup (this should hardly ever or never happen in practice).
For now, maintain a number of new counters measuring how many times various cases execute, and in particular whether various optimistic assumptions about when read locks can be used, whether upgrades are done using the fast path, and whether connections close in practice in the above-described race, actually occur.
MFC after: 6 weeks Discussed with: kmacy Reviewed by: bz, gnn, kmacy Tested by: kmacy
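The initial lock choice described above depends only on the flags in the TCP header. A sketch of that decision, with a hypothetical function name and locally defined flag constants (the bit values are the standard TCP header flag bits); the later upgrade path is not modelled:

```c
#include <stdbool.h>
#include <stdint.h>

/* TCP header flag bits (RFC 793). */
#define SKETCH_TH_FIN	0x01
#define SKETCH_TH_SYN	0x02
#define SKETCH_TH_RST	0x04
#define SKETCH_TH_ACK	0x10

/*
 * Hypothetical helper: segments that may change global connection
 * state (FIN, SYN, RST) take the write lock up front; everything
 * else starts with a read lock.
 */
static bool
sketch_needs_wlock(uint8_t th_flags)
{
	return ((th_flags &
	    (SKETCH_TH_FIN | SKETCH_TH_SYN | SKETCH_TH_RST)) != 0);
}
```

Steady-state data and pure ACK segments carry none of these flags, so under this rule they would only ever need the read lock.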
|
185773 |
08-Dec-2008 |
rwatson |
Add a reference count to struct inpcb, which may be explicitly incremented using in_pcbref(), and decremented using in_pcbfree() or in_pcbrele(). Protocols using only current in_pcballoc() and in_pcbfree() calls will see the same semantics, but it is now possible for TCP to call in_pcbref() and in_pcbrele() to prevent an inpcb from being freed when both tcbinfo and per-inpcb locks are released. This makes it possible to safely transition from holding only the inpcb lock to both tcbinfo and inpcb lock without re-looking up a connection in the input path, timer path, etc.
Notice that in_pcbrele() does not unlock the connection after decrementing the refcount, if the connection remains, so that the caller can continue to use it; in_pcbrele() returns a flag indicating whether or not the inpcb pointer is still valid, and in_pcbfree() is now a simple wrapper around in_pcbrele().
MFC after: 1 month Discussed with: bz, kmacy Reviewed by: bz, gnn, kmacy Tested by: kmacy
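The reference-counting pattern can be sketched as below. All names are hypothetical, locking is omitted entirely, and the convention that the release function returns true when the object was freed is an assumption of this sketch rather than something stated in the commit message.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical control block carrying only a reference count. */
struct sketch_pcb {
	unsigned int refcount;
};

/* Allocate with one reference, owned by the caller. */
static struct sketch_pcb *
sketch_pcb_alloc(void)
{
	struct sketch_pcb *p = calloc(1, sizeof(*p));

	if (p != NULL)
		p->refcount = 1;
	return (p);
}

/* Take an extra reference so the object survives a lock drop. */
static void
sketch_pcb_ref(struct sketch_pcb *p)
{
	p->refcount++;
}

/*
 * Drop one reference.  Returns true if that was the last one and
 * the object was freed, false if the pointer is still valid.
 */
static bool
sketch_pcb_rele(struct sketch_pcb *p)
{
	if (--p->refcount > 0)
		return (false);
	free(p);
	return (true);
}
```

A caller that must drop its locks takes a reference first, re-acquires the locks in order, and then uses the return value of the release call to decide whether the pointer may still be used.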
|
185713 |
06-Dec-2008 |
csjp |
in_rtalloc1(9) returns a locked route, so make sure that we use RTFREE_LOCKED() here. This macro makes sure the reference count on the route is being managed properly. This eliminates another case which results in the following message being printed to the console:
rtfree: 0xc841ee88 has 1 refs
Reviewed by: bz MFC after: 2 weeks
|
185694 |
06-Dec-2008 |
rrs |
Code from the hack-session known as the IETF (and a bit of debugging afterwards):
- Fix protection code for notification generation.
- Decouple associd from vtag
- Allow vtags to have less stringent requirements in non-uniqueness.
  o don't pre-hash them when you issue one in a cookie.
  o Allow duplicates and use addresses and ports to discriminate amongst the duplicates during lookup.
- Add support for the NAT draft draft-ietf-behave-sctpnat-00, this is still experimental and needs more extensive testing with the Jason Butt ipfw changes.
- Support for the SENDER_DRY event to get DTLS in OpenSSL working with a set of patches from Michael Tuexen (hopefully heading to OpenSSL soon).
- Update the support of SCTP-AUTH by Peter Lei.
- Use macros for refcounting.
- Fix MTU for UDP encapsulation.
- Fix reporting back of unsent data.
- Update assoc send counter handling to be consistent with endpoint sent counter.
- Fix a bug in PR-SCTP.
- Fix so we only send another FWD-TSN when a SACK arrives IF and only if the adv-peer-ack point progressed. However we still make sure a timer is running if we do have an adv_peer_ack point.
- Fix PR-SCTP bug where chunks were retransmitted if they are sent unreliable but not abandoned yet.
With the help of: Michael Tuexen and Peter Lei :-) MFC after: 4 weeks
|
185636 |
05-Dec-2008 |
glebius |
In a case of CARP status change run through the if_link_state_change() routine, so that devd(8) and others are notified about link state change.
|
185571 |
02-Dec-2008 |
bz |
Rather than using hidden includes (with circular dependencies), directly include only the header files needed. This reduces the unneeded spamming of various headers into lots of files.
For now, this leaves us with very few modules including vnet.h and thus needing to depend on opt_route.h.
Reviewed by: brooks, gnn, des, zec, imp Sponsored by: The FreeBSD Foundation
|
185435 |
29-Nov-2008 |
bz |
MFp4: Bring in updated jail support from bz_jail branch.
This enhances the current jail implementation to permit multiple addresses per jail. In addition to IPv4, IPv6 is supported as well. Due to updated checks it is even possible to have jails without an IP address at all, which basically gives one a chroot with restricted process view, no networking,..
SCTP support was updated and supports IPv6 in jails as well.
Cpuset support permits jails to be bound to specific processor sets after creation.
Jails can have an unrestricted (no duplicate protection, etc.) name in addition to the hostname. The jail name cannot be changed from within a jail and is considered to be used for management purposes or as audit-token in the future.
DDB 'show jails' command was added to aid debugging.
Proper compat support permits 32bit jail binaries to be used on 64bit systems to manage jails. Also backward compatibility was preserved where possible: for jail v1 syscalls, as well as with user space management utilities.
Both jail as well as prison version were updated for the new features. A gap was intentionally left as the intermediate versions had been used by various patches floating around the last years.
Bump __FreeBSD_version for the aforementioned and in-kernel changes.
Special thanks to: - Pawel Jakub Dawidek (pjd) for his multi-IPv4 patches and Olivier Houchard (cognet) for initial single-IPv6 patches. - Jeff Roberson (jeff) and Randall Stewart (rrs) for their help, ideas and review on cpuset and SCTP support. - Robert Watson (rwatson) for lots and lots of help, discussions, suggestions and review of most of the patch at various stages. - John Baldwin (jhb) for his help. - Simon L. Nielsen (simon) as early adopter testing changes on cluster machines as well as all the testers and people who provided feedback the last months on freebsd-jail and other channels. - My employer, CK Software GmbH, for the support so I could work on this.
Reviewed by: (see above) MFC after: 3 months (this is just so that I get the mail) X-MFC Before: 7.2-RELEASE if possible
|
185420 |
28-Nov-2008 |
zec |
Add an essential .h file that skipped from the last commit (r185419).
Pointy hat #1 on...
Pointed out by: bz
|
185419 |
28-Nov-2008 |
zec |
Unhide declarations of network stack virtualization structs from underneath #ifdef VIMAGE blocks.
This change introduces some churn in #include ordering and nesting throughout the network stack and drivers but is not expected to cause any additional issues.
In the next step this will allow us to instantiate the virtualization container structures and switch from using global variables to their "containerized" counterparts.
Reviewed by: bz, julian Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation
|
185382 |
28-Nov-2008 |
des |
missing V_
|
185371 |
27-Nov-2008 |
bz |
Replace most INP_CHECK_SOCKAF() uses checking if it is an IPv6 socket by comparing a constant inp vflag. This is expected to help to reduce extra locking.
Suggested by: rwatson Reviewed by: rwatson MFC after: 6 weeks
|
185370 |
27-Nov-2008 |
bz |
Merge in6_pcbfree() into in_pcbfree() which after the previous IPsec change in r185366 only differed in two additional IPv6 lines. Rather than splattering conditional code everywhere add the v6 check centrally at this single place.
Reviewed by: rwatson (as part of a larger changeset) MFC after: 6 weeks (*) (*) possibly need to leave a stub wrapper in 7 to keep the symbol.
|
185366 |
27-Nov-2008 |
bz |
Unify ipsec[46]_delete_pcbpolicy in ipsec_delete_pcbpolicy. Ignoring different names because of macros (in6pcb, in6p_sp) and inp vs. in6p variable name both functions were entirely identical.
Reviewed by: rwatson (as part of a larger changeset) MFC after: 6 weeks (*) (*) possibly need to leave stub wrappers in 7 to keep the symbols.
|
185348 |
26-Nov-2008 |
zec |
Merge more of currently non-functional (i.e. resolving to whitespace) macros from p4/vimage branch.
Do a better job at enclosing all instantiations of globals scheduled for virtualization in #ifdef VIMAGE_GLOBALS blocks.
De-virtualize and mark as const saorder_state_alive and saorder_state_any arrays from ipsec code, given that they are never updated at runtime, so virtualizing them would be pointless.
Reviewed by: bz, julian Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation
|
185344 |
26-Nov-2008 |
bz |
Remove in6_pcbdetach() as it is exactly the same function as in_pcbdetach() and we don't need the code twice.
Reviewed by: rwatson MFC after: 6 weeks (*) (*) possibly need to leave a stub wrapper in 7 to keep the symbol.
|
185333 |
26-Nov-2008 |
bz |
Unify the v4 and v6 versions of pcbdetach and pcbfree as well as possible so that they are easily diffable.
No functional changes.
Reviewed by: rwatson MFC after: 6 weeks
|
185101 |
19-Nov-2008 |
julian |
Fix a scope problem in the multiple routing table code that stopped the SO_SETFIB socket option from working correctly.
Obtained from: Ironport MFC after: 3 days
|
185088 |
19-Nov-2008 |
zec |
Change the initialization methodology for global variables scheduled for virtualization.
Instead of initializing the affected global variables at instantiation, assign initial values to them in initializer functions. As a rule, initialization at instantiation for such variables should never be introduced again from now on. Furthermore, enclose all instantiations of such global variables in #ifdef VIMAGE_GLOBALS blocks.
Essentially, this change should have zero functional impact. In the next phase of merging network stack virtualization infrastructure from p4/vimage branch, the new initialization methodology will allow us to switch between using global variables and their counterparts residing in virtualization containers with minimum code churn, and in the long run allow us to initialize multiple instances of such container structures.
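As a rough illustration of the pattern described above (not code from the commit; `example_state`, its fields and the default values are hypothetical), moving defaults out of static initializers and into an initializer function lets the same routine set up one global instance today and any number of container instances later:

```c
/* Hypothetical per-stack state; the names and defaults are illustrative only. */
struct example_state {
	int ttl;
	int maxfrags;
};

/*
 * Before: "static int example_ttl = 64;" tied the default to one global.
 * After: defaults are assigned at run time, so any instance can be set up.
 */
void
example_state_init(struct example_state *st)
{
	st->ttl = 64;
	st->maxfrags = 16;
}
```

Calling `example_state_init()` on two separate structures leaves both with the same defaults, which a static initializer on a single global cannot provide.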
Discussed at: devsummit Strassburg Reviewed by: bz, julian Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation
|
184883 |
12-Nov-2008 |
rrs |
-Improvement: Add '\n' on debug output in sctp_lower_sosend(). -Improvement: panic() on INVARIANTS kernels if memory allocation fails for a tagblock in sctp_add_vtag_to_timewait(). -Bugfix: Protect code in sctp_is_in_timewait() by SCTP_INP_INFO_WLOCK/SCTP_INP_INFO_WUNLOCK. -Cleanup: Get rid of unused variable now in sctp_init_asoc(). -Bugfix: Reuse the correct vtag in sctp_add_vtag_to_timewait(). -Cleanup: Get rid of unused constant SCTP_TIME_WAIT_SHORT in sctp_constants.h. -Improvement: Use all hash buckets of the vtag hash table. -Cleanup: Get rid of then unused constant SCTP_STACK_VTAG_HASH_SIZE_A. -Bugfix: Handle SHUTDOWN;SACK packet correctly. -Bugfix: Last TSN in a gap ack block was not being "ack'd" in the internal scoreboard. Obtained from: (with help from Michael Tuexen)
|
184797 |
09-Nov-2008 |
bz |
For consistency work on the local object passed into the function for the lock operation instead using the global name.
Submitted by: ganbold MFC after: 2 months
|
184731 |
06-Nov-2008 |
bz |
Fix typo and while here another one.
Reviewed by: keramida Reported by: keramida MFC after: 2 months (with r184720)
|
184722 |
06-Nov-2008 |
bz |
Fix a bug introduced with r182851 splitting tcp_mss() into tcp_mss() and tcp_mss_update() so that tcp_mtudisc() could re-use the same code.
Move the TSO logic back to tcp_mss() and out of tcp_mss_update(). We tried to avoid that initially but if we are called from tcp_output() with EMSGSIZE, we cleared the TSO flag on the tcpcb there, called into tcp_mtudisc() and tcp_mss_update() which then would reenable TSO on the tcpcb based on TSO capabilities of the interface as learnt in tcp_maxmtu/6(). So if TSO was enabled on the (possibly new) outgoing interface it was turned back on, which led to an endless loop between tcp_output() and tcp_mtudisc() until we overflowed the stack.
Reported by: kmacy MFC after: 2 months (along with r182851)
|
184721 |
06-Nov-2008 |
bz |
Adapt the comment for tcp_maxmtu(); we are returning a number not a pointer. While here update the rest of the comment to better match what we have these days.
MFC after: 2 months
|
184720 |
06-Nov-2008 |
bz |
Fix a bug introduced with r182851 splitting tcp_mss() into tcp_mss() and tcp_mss_update() so that tcp_mtudisc() could re-use the same code.
In case we return early and got a metricptr to pass the hostcache info back to the caller we need to initialize the data to a defined state (zero it) as tcp_hc_get() would do if there was no hit. Without that the caller would check on random stack garbage which could lead to undefined results.
This only affected tcp_mss() if there was no routing entry for the peer, tcp_mtudisc() was not affected.
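A minimal sketch of the fix described above, assuming a simplified lookup routine (`example_mss_lookup`, `example_metrics` and the placeholder values are hypothetical, not the real tcp_mss_update() code): on the early-return path the caller's metrics are zeroed, so the caller never inspects stack garbage.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the host cache metrics handed back to the caller. */
struct example_metrics {
	unsigned long mtu;
	unsigned long rtt;
};

/* Returns 1 if metrics were looked up, 0 on the early-return path. */
int
example_mss_lookup(int have_route, struct example_metrics *metricptr)
{
	if (!have_route) {
		/* Early return: hand back a defined (all-zero) state. */
		if (metricptr != NULL)
			memset(metricptr, 0, sizeof(*metricptr));
		return (0);
	}
	if (metricptr != NULL) {
		metricptr->mtu = 1500;	/* placeholder values for the sketch */
		metricptr->rtt = 0;
	}
	return (1);
}
```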
MFC after: 2 months (along with r182851)
|
184414 |
28-Oct-2008 |
oleg |
Type of q_time (start of queue idle time) has changed: uint32_t -> uint64_t. This should fix q_time overflow, which happens after 2^32/(86400*hz) days of uptime (~50 days for hz = 1000). q_time overflow causes the following: - traffic shaping may not work in 'fast' mode (not enabled by default). - incorrect average queue length calculation in RED/GRED algorithm.
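The overflow arithmetic above can be checked with a small sketch (the helper names are hypothetical and this is not the dummynet code): a 32-bit tick counter wraps after 2^32/hz seconds, and a 32-bit field keeps only the tick count modulo 2^32.

```c
#include <stdint.h>

/* Seconds until a 32-bit tick counter wraps at the given tick rate. */
uint64_t
example_wrap_seconds(uint32_t hz)
{
	return ((UINT64_C(1) << 32) / hz);
}

/* What a 32-bit field keeps of a 64-bit tick count: the value modulo 2^32. */
uint32_t
example_truncate_ticks(uint64_t ticks)
{
	return ((uint32_t)ticks);
}
```

For hz = 1000 the first helper gives 4294967 seconds, which is roughly 49.7 days and matches the "~50 days" figure in the message.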
NB: due to ABI change this change is not applicable to stable.
PR: kern/128401
|
184340 |
27-Oct-2008 |
rrs |
More issues with pre-blocking: a) Need for EEOR mode to take the min of the socket buffer size and the add more threshold, otherwise if you are so silly as to set a send buf size less than the add-more you could block forever in eeor mode.
b) We were incorrectly using the sysctl vs the calculated value. This causes us to block forever if the addmore threshold is larger than the socket buffer size.
|
184336 |
27-Oct-2008 |
rrs |
Two inter-related bugs. - If we send EXACTLY the size left in the send buffer and then send again, we end up with exactly 0 bytes and don't hit the pre-block code to wait for more space. - If we fall into the loop with our max_len == 0 (the bug above) we then call in to copy out the data, setup the length of the waiting to transmit data to 0 and call the mbuf copy routine which 0 indicates copy all the data to the mbuf chain.. which it does. This then leaves a "stuck" message on the stream queue with its size exactly 0 bytes but all the data there and thus nothing left in the uio structure. We then reach a stuck forever state never being able to send data.
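The interaction of the two bugs can be sketched as follows. This is a simplified model with hypothetical names, not the SCTP send path: a copy helper treats a length of 0 as "copy everything", so the caller has to stop and wait when no space is left instead of passing 0 through.

```c
#include <stddef.h>

/* Hypothetical copy helper: a requested length of 0 means "copy everything". */
size_t
example_copy_len(size_t requested, size_t pending)
{
	if (requested == 0 || requested > pending)
		return (pending);
	return (requested);
}

/*
 * Returns how many bytes may be queued now. With no space left the caller
 * must wait for space instead of asking the copy helper for 0 bytes.
 */
size_t
example_queue_now(size_t space_left, size_t pending)
{
	if (space_left == 0)
		return (0);		/* pre-block: wait for more space */
	return (example_copy_len(space_left, pending));
}
```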
|
184334 |
27-Oct-2008 |
rrs |
Get rid of ifdef for vimage on version 8 comparison. Now the scrubbing program properly takes care of this.
|
184333 |
27-Oct-2008 |
rrs |
Invariants changes that make more sense.
|
184304 |
26-Oct-2008 |
rwatson |
In both dropwithreset paths in tcp_input.c, drop the tcbinfo lock sooner to decomplicate locking and eliminate the need for a rather chatty comment about why we have to handle the global lock in a special way for the benefit of ipfw and pf cred rules.
MFC after: 3 days
|
184298 |
26-Oct-2008 |
rwatson |
Remove endearing but syntactically unnecessary "return;" statements directly before the final closing brackets of some TCP functions.
MFC after: 3 days
|
184295 |
26-Oct-2008 |
bz |
Style changes only: - Consistently add parentheses to return statements. - Use NULL instead of 0 when comparing pointers, also avoiding unnecessary casts. - Do not use pointers as booleans.
Reviewed by: rwatson (earlier version) MFC after: 2 months
|
184214 |
23-Oct-2008 |
des |
Fix a number of style issues in the MALLOC / FREE commit. I've tried to be careful not to fix anything that was already broken; the NFSv4 code is particularly bad in this respect.
|
184205 |
23-Oct-2008 |
des |
Retire the MALLOC and FREE macros. They are an abomination unto style(9).
MFC after: 3 months
|
184097 |
20-Oct-2008 |
bz |
Update a comment which to my reading had been misplaced in rev. 1.12 already (but probably had been way above as the code was there twice) and describe what was last changed in rev. 1.199 there (which now is in sync with in6_src.c r184096).
Pointed at by: mlaier MFC after: 2 months
|
184096 |
20-Oct-2008 |
bz |
Bring over the change switching from using sequential to random ephemeral port allocation as implemented in netinet/in_pcb.c rev. 1.143 (initially from OpenBSD) and follow-up commits during the last four and a half years including rev. 1.157, 1.162 and 1.199. This now is relying on the same infrastructure as has been implemented in in_pcb.c since rev. 1.199.
Reviewed by: silby, rpaulo, mlaier MFC after: 2 months
|
184031 |
18-Oct-2008 |
rrs |
The flags value was not always being copied out in the recv routine like it should be. Obtained from: Michael Tuexen
|
184030 |
18-Oct-2008 |
rrs |
New sockets (accepted) were not inheriting the proper snd/rcv buffer value.
Obtained from: Michael Tuexen
|
184029 |
18-Oct-2008 |
rrs |
- Peers rwnd is now available for the MIB. Obtained from: Michael Tuexen
|
184028 |
18-Oct-2008 |
rrs |
- Adapt layer indication was always being given (it should only be given when the user has enabled it). (Michael Tuexen) - Sack Immediately was not being set properly on the actual chunk, it was only put in the rcvd_flags which is incorrect. (Michael Tuexen) - added an ifndef userspace to one of the already present macro's for inet (Brad Penoff) Obtained from: Michael Tuexen and Brad Penoff MFC after: 4 weeks
|
184027 |
18-Oct-2008 |
rrs |
Reported by Yehuda Weinraub (yehudasa@gamil.com) - CRC32C algorithm uses incorrect init_bytes value. It SHOULD have the number of bytes to get to a 4 byte boundary.
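A minimal sketch of the alignment calculation the report refers to, assuming the count is derived from the buffer address (`example_init_bytes` is a hypothetical name, not the function in the SCTP CRC32C code): it returns how many leading bytes must be processed singly before the pointer reaches a 4-byte boundary.

```c
#include <stdint.h>

/* Bytes to consume one at a time before addr sits on a 4-byte boundary. */
unsigned int
example_init_bytes(uintptr_t addr)
{
	return ((unsigned int)((4 - (addr & 3)) & 3));
}
```

An already aligned address yields 0, and an address one byte past a boundary yields 3.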
PR: 128134 MFC after: 4 weeks
|
183982 |
17-Oct-2008 |
bz |
Add cr_canseeinpcb() doing checks using the cached socket credentials from inp_cred which is also available after the socket is gone. Switch cr_canseesocket consumers to cr_canseeinpcb. This removes an extra acquisition of the socket lock.
Reviewed by: rwatson MFC after: 3 months (set timer; decide then)
|
183954 |
16-Oct-2008 |
zec |
Remove a useless global static variable.
Approved by: bz (ad-hoc mentor)
|
183887 |
14-Oct-2008 |
maxim |
o Remove unnecessary parentheses and restore indentation.
Prodded by: mlaier
|
183881 |
14-Oct-2008 |
maxim |
o Reformat ipfw nat get|setsockopt code to make it more style(9) compliant. No functional changes.
|
183744 |
10-Oct-2008 |
rwatson |
Fix content and spelling of comment on _ipfw_insn.len -- a count of 32-bit words, not 32-byte words.
MFC after: 3 days
|
183662 |
07-Oct-2008 |
rwatson |
Don't pass curthread to sbreserve_locked() in tcp_do_segment(), as the netisr or ithread's socket buffer size limit is not the right limit to use. Instead, pass NULL as the other two calls to sbreserve_locked() in the TCP input path (tcp_mss()) do.
In practice, this is a no-op, as ithreads and the netisr run without a process limit on socket buffer use, and a NULL thread pointer leads to not using the process's limit, if any. However, if tcp_input() is called in other contexts that do have limits, this may prevent the incorrect limit from being used.
MFC after: 3 days
|
183610 |
04-Oct-2008 |
bz |
Remove an INP_RUNLOCK() missed in SVN r183606, cvs rev. 1.195 raw_ip.c when transitioning from so_cred to inp_cred.
MFC after: 6 weeks
|
183606 |
04-Oct-2008 |
bz |
Cache so_cred as inp_cred in the inpcb. This means that inp_cred is always there, even after the socket has gone away. It also means that it is constant for the lifetime of the inp. Both facts lead to simpler code and possibly less locking.
Suggested by: rwatson Reviewed by: rwatson MFC after: 6 weeks X-MFC Note: use a inp_pspare for inp_cred
|
183571 |
03-Oct-2008 |
bz |
Implement IPv4 source address selection for unbound sockets.
For the jail case we are already looping over the interface addresses before falling back to the only IP address of a jail in case of no match. This is in preparation for the upcoming multi-IPv4/v6/no-IP jail patch this change was developed with initially.
This also changes the semantics of selecting the IP for processes within a jail as it now uses the same logic as outside the jail (with additional checks) but no longer is on a mutually exclusive code path.
Benchmarks had shown no difference at 95.0% confidence for either the plain or the jail case (even with the additional overhead). See: http://lists.freebsd.org/pipermail/freebsd-net/2008-September/019531.html
Inspired by a patch from: Yahoo! (partially) Tested by: latest multi-IP jail patch users (implicitly) Discussed with: rwatson (general things around this) Reviewed by: mostly silence (feedback from bms) Help with benchmarking from: kris MFC after: 2 months
|
183550 |
02-Oct-2008 |
zec |
Step 1.5 of importing the network stack virtualization infrastructure from the vimage project, as per plan established at devsummit 08/08: http://wiki.freebsd.org/Image/Notes200808DevSummit
Introduce INIT_VNET_*() initializer macros, VNET_FOREACH() iterator macros, and CURVNET_SET() context setting macros, all currently resolving to NOPs.
Prepare for virtualization of selected SYSCTL objects by introducing a family of SYSCTL_V_*() macros, currently resolving to their global counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT().
Move selected #defines from sys/sys/vimage.h to newly introduced header files specific to virtualized subsystems (sys/net/vnet.h, sys/netinet/vinet.h etc.).
All the changes are verified to have zero functional impact at this point in time by doing MD5 comparison between pre- and post-change object files(*).
(*) netipsec/keysock.c did not validate depending on compile time options.
Implemented by: julian, bz, brooks, zec Reviewed by: julian, bz, brooks, kris, rwatson, ... Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation
|
183461 |
29-Sep-2008 |
rwatson |
Expand comments relating various detach/free/drop inpcb routines.
MFC after: 3 days
|
183460 |
29-Sep-2008 |
rwatson |
Fix typo in comment.
MFC after: 3 days
|
183418 |
27-Sep-2008 |
rwatson |
When an inpcb doesn't have a socket but the inpcb is passed to ipfw in the transmit path, such as TCPS_TIMEWAIT, fail the credential extraction immediately rather than acquiring locks and looking up the inpcb on the global lists in order to reach the conclusion that the credential extraction has failed.
This is more efficient, but more importantly, it avoids lock recursion on the inpcbinfo, which is no longer allowed with rwlocks. This appears to have been responsible for at least two reported panics.
MFC after: 3 days Reported by: ganbold
|
183398 |
27-Sep-2008 |
rwatson |
Rather than shadowing global variable 'lookup' in check_uidgid(), rename it to ugid_lookupp. This should make debugging issues with ipfw uid rules easier.
MFC after: 3 days
|
183388 |
26-Sep-2008 |
emaste |
Move CTASSERT from header file to source file, per implementation note now in the CTASSERT man page.
Submitted by: Ryan Stone
|
183356 |
25-Sep-2008 |
rwatson |
As a follow-on to r183323, correct another case where ip_output() was called without an inpcb pointer despite holding the tcbinfo global lock, which led to a deadlock or panic when ipfw tried to further acquire it recursively.
Reported by: Stefan Ehmann <shoesoft at gmx dot net> MFC after: 3 days
|
183323 |
24-Sep-2008 |
rwatson |
When dropping a packet and issuing a reset during TCP segment handling, unconditionally drop the tcbinfo lock (after all, we assert it lines before), but call tcp_dropwithreset() under both inpcb and inpcbinfo locks only if we pass in a tcpcb. Otherwise, if the pointer is NULL, firewall code may later recurse the global tcbinfo lock trying to look up an inpcb.
This is an instance where a layering violation leads not only potentially to code reentrance and recursion, but also to lock recursion, and was revealed by the conversion to rwlocks because acquiring a read lock on an rwlock already held with a write lock is forbidden. When these locks were mutexes, they simply recursed.
Reported by: Stefan Ehmann <shoesoft at gmx dot net> MFC after: 3 days
|
183240 |
21-Sep-2008 |
rik |
Export IPFW_TABLES_MAX value for compiled in defaults.
|
183015 |
14-Sep-2008 |
rik |
Export IPFW_TABLES_MAX via sysctl. Part of PR: 127058.
PR: 127058
|
183014 |
14-Sep-2008 |
julian |
oops commit the version that compiles
|
183013 |
14-Sep-2008 |
julian |
Revert a part of the MRT commit that proved un-needed. rt_check() in its original form proved to be sufficient and rt_check_fib() can go away (as can its evil twin in_rt_check()).
I believe this does NOT address the crashes people have been seeing in rt_check.
MFC after: 1 week
|
183012 |
14-Sep-2008 |
rik |
Make the comment for the default rule number more clear.
Submitted by: yar@
|
183001 |
13-Sep-2008 |
bz |
Implement IPv6 support for TCP MD5 Signature Option (RFC 2385) the same way it has been implemented for IPv4.
Reviewed by: bms (skimmed) Tested by: Nick Hilliard (nick netability.ie) (with more changes) MFC after: 2 months
|
182885 |
09-Sep-2008 |
bz |
Work around an integer division resulting in 0 and thus the congestion window not being incremented, if cwnd > maxseg^2. As suggested in RFC2581 increment the cwnd by 1 in this case.
See http://caia.swin.edu.au/reports/080829A/CAIA-TR-080829A.pdf for more details.
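The workaround above can be sketched as follows (a standalone illustration with a hypothetical function name, not the tcp_input.c code): the congestion-avoidance increment maxseg*maxseg/cwnd is computed with integer division and bumped to 1 when it would otherwise be 0.

```c
#include <stdint.h>

/* Congestion-avoidance increment in bytes; cwnd must be non-zero. */
uint32_t
example_cwnd_incr(uint32_t cwnd, uint32_t maxseg)
{
	uint32_t incr;

	incr = (uint32_t)(((uint64_t)maxseg * maxseg) / cwnd);
	if (incr == 0)
		incr = 1;	/* integer division rounded down to zero */
	return (incr);
}
```

With maxseg = 1460, a cwnd of 14600 gives an increment of 146 bytes, while any cwnd above 1460^2 = 2131600 would give 0 without the final check.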
Submitted by: Alana Huebner, Lawrence Stewart, Grenville Armitage (caia.swin.edu.au) Reviewed by: dwmalone, gnn, rpaulo MFC After: 3 days
|
182855 |
07-Sep-2008 |
bz |
To my reading there are no real consumers of ip6_plen (IPv6 Payload Length) as set in tcpip_fillheaders(). ip6_output() will calculate it based of the length from the mbuf packet header itself. So initialize the value in tcpip_fillheaders() in correct (network) byte order.
With the above change, to my reading, all places calling tcp_trace() pass in the ip6 header via ipgen as serialized in the mbuf and with ip6_plen in network byte order. Thus convert the IPv6 payload length to host byte order before printing.
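A small sketch of the byte-order handling described above, assuming a cut-down header with only a payload length field (`example_ip6_hdr` and both helpers are hypothetical): the field is stored with htons() and converted back with ntohs() before printing.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Hypothetical cut-down header carrying only the payload length field. */
struct example_ip6_hdr {
	uint16_t plen;	/* stored in network byte order */
};

/* Store the payload length the way it appears on the wire. */
void
example_fill_plen(struct example_ip6_hdr *h, uint16_t payload_len)
{
	h->plen = htons(payload_len);
}

/* Convert back to host byte order before printing or comparing. */
unsigned int
example_plen_for_print(const struct example_ip6_hdr *h)
{
	return (ntohs(h->plen));
}
```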
MFC after: 2 months
|
182851 |
07-Sep-2008 |
bz |
Split tcp_mss() in tcp_mss() and tcp_mss_update() where the former calls the latter.
Merge tcp_mss_update() with code from tcp_mtudisc() basically doing the same thing.
This gives us one central place where we calculate and check mss values to update t_maxopd (maximum mss + options length) instead of two slightly different but almost equal implementations to maintain.
PR: kern/118455 Reviewed by: silby (back in March) MFC after: 2 months
|
182848 |
07-Sep-2008 |
bz |
V_irtualize SVN r182846 tcp_mssdflt/tcp_v6mssdflt procedure based sysctl implementations for VIMAGE the same way we did elsewhere: update the implementation but leave the globals and the SYSCTL statement untouched.
|
182846 |
07-Sep-2008 |
bz |
Convert SYSCTL_INTs for tcp_mssdflt and tcp_v6mssdflt to SYSCTL_PROCs and check that the default mss for neither v4 nor v6 goes below the minimum MSS constant (216).
This prevents people from shooting themselves in the foot.
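The kind of check a procedure-based sysctl makes possible can be sketched like this. It is a userland illustration only: `example_set_mssdflt` and `EXAMPLE_MINMSS` are hypothetical names, and rejecting the value with EINVAL is an assumption about the behaviour, not something the message states.

```c
#include <errno.h>

#define	EXAMPLE_MINMSS	216	/* minimum MSS value quoted in the message */

/* Accept a new default MSS only if it is not below the minimum. */
int
example_set_mssdflt(int *mssdflt, int newval)
{
	if (newval < EXAMPLE_MINMSS)
		return (EINVAL);	/* assumed error code; old value is kept */
	*mssdflt = newval;
	return (0);
}
```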
PR: kern/118455 (remotely related) Reviewed by: silby (as part of a larger patch in March) MFC after: 2 months
|
182841 |
07-Sep-2008 |
bz |
Add a second KASSERT checking for len >= 0 in the tcp output path.
This is different to the first one (as len gets updated between those two) and would have caught various edge cases (read bugs) at a well defined place I had been debugging the last months instead of triggering (random) panics further down the call graph.
MFC after: 2 months
|
182818 |
06-Sep-2008 |
rik |
Export the IPFW_DEFAULT_RULE outside ip_fw2.c. This number is not only the default rule number but also the maximum rule number. User space software such as ipfw and natd should be aware of its value. The software that already includes ip_fw.h should use the defined value. All others are expected to use sysctl (as discussed on net@).
MFC after: 5 days. Discussed on: net@
|
182775 |
05-Sep-2008 |
keramida |
Slightly reword comment and remove typos.
|
182733 |
03-Sep-2008 |
julian |
whitespace nit
|
182633 |
01-Sep-2008 |
brooks |
Wrap an 81 column SYSCTL_NODE declaration.
Obtained from: //depot/projects/vimage-commit2/...
|
182591 |
01-Sep-2008 |
kmacy |
Don't check if an interface can do tcp offload if there are no offload devices registered on the system.
Suggested by: rwatson MFC after: 3 days
|
182563 |
31-Aug-2008 |
julian |
fix tiny nit in comment
|
182488 |
30-Aug-2008 |
csjp |
Improve the entropy of the source port randomization for network address translation. It turns out this is useful for applications which require source port randomization for security (i.e. dns servers).
Discussed with: secteam Requested by: mlaier MFC after: 2 weeks
|
182463 |
29-Aug-2008 |
gnn |
Fix a bug whereby multicast packets that are looped back locally wind up with the incorrect checksum on the wire when transmitted via devices that do checksum offloading.
PR: kern/119635 Reviewed by: rwatson MFC after: 5 days
|
182411 |
28-Aug-2008 |
rpaulo |
Fix typo in comment.
|
182405 |
28-Aug-2008 |
rrs |
OK, make the function non-static and put it in the .h so that when we do an INVARIANTS compile the compiler will not complain about the function that is not used. Hmm, maybe I should have made it ifndef INVARIANTS..
|
182403 |
28-Aug-2008 |
rrs |
Fixes compile error when INVARIANTs is on. Adds an empty goto to keep the compiler happy.
|
182367 |
28-Aug-2008 |
rrs |
- Make strict-sacks be the default. - Change it so that without INVARIANTs there are no panics in SCTP. - sctp_timer changes so that we have a recovery mechanism when the sent list is out of order.
|
182311 |
27-Aug-2008 |
csjp |
Fix a panic in MAC kernels that was a result of un-initialized label storage. We can safely remove the label copying operations since M_MOVE_PKTHDR will move the mbuf tags (which contain MAC labels) to the destination mbuf.
MFC after: 1 week Discussed with: rwatson
|
182268 |
27-Aug-2008 |
rrs |
- When we close a socket with pending assoc's that are still shutting down, NULL out the socket pointer so we won't ever refer to a dead socket.
Obtained from: Neil Wilson
|
182148 |
25-Aug-2008 |
julian |
Another missed V_ instance
|
182146 |
25-Aug-2008 |
julian |
Another V_ forgotten
|
182145 |
25-Aug-2008 |
julian |
We left out V_static_len from ip_fw2.c (also a whitespace diff that I'd rather fix here than break in the vimage branch.)
|
182129 |
25-Aug-2008 |
julian |
Move some struct defs around. This is a prep step for Vimage. No real effect of this at this time.
|
182114 |
24-Aug-2008 |
bz |
Make the kernel compile with SCTP and SCTP_DEBUG but no INET6 defined.
|
182089 |
24-Aug-2008 |
kmacy |
Don't calculate checksum if it has already been validated
Obtained from: Chelsio Inc. MFC after: 3 days
|
182056 |
23-Aug-2008 |
bz |
Cache the cred locally in _syncache_add() while holding the locks, so we can be sure that it's valid. In case we abort early free it again else put it into the syncache.
We need the cred in the syncache to be able to restrict what will be exported by the sysctl helper function syncache_pcblist() (to netstat) within jails.
PR: kern/126493 Reviewed by: rwatson (earlier versions) MFC after: 3 days
|
182045 |
23-Aug-2008 |
bz |
Add an explicit comment why we NULLify the two variables.
Reviewed by: rwatson MFC after: 3 days
|
181966 |
21-Aug-2008 |
rwatson |
Remove comments and #ifdef notyet'd code relating to directly dispatching the IP multicast input code from the output path; we don't allow reentrance of the input path from the IP output path, it must use the netisr due to potential lock recursion.
MFC after: 3 days
|
181888 |
20-Aug-2008 |
julian |
Fix some of the formatting fixes.. It's amazing how some things stand out in a commit message.
|
181887 |
20-Aug-2008 |
julian |
A bunch of formatting fixes brought to light by, or created by the Vimage commit a few days ago.
|
181824 |
18-Aug-2008 |
philip |
Fix ARP in bridging scenarios where the bridge shares its MAC address with one of its members (see my r180140).
Pointy hat to: philip Submitted by: Eygene Ryabinkin <rea-fbsd@codelabs.ru> MFC after: 3 days
|
181803 |
17-Aug-2008 |
bz |
Commit step 1 of the vimage project, (network stack) virtualization work done by Marko Zec (zec@).
This is the first in a series of commits over the course of the next few weeks.
Mark all uses of global variables to be virtualized with a V_ prefix. Use macros to map them back to their global names for now, so this is a NOP change only.
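The V_ marking can be pictured with a tiny sketch (`example_counter` is a hypothetical global, not one of the variables touched by the commit): the V_ name is a macro that maps straight back to the global, so code using it compiles to the same thing as before.

```c
/* Hypothetical global scheduled for virtualization. */
int example_counter;

/* For now the V_ name maps straight back to the global, so nothing changes. */
#define	V_example_counter	example_counter

void
example_bump(void)
{
	V_example_counter++;
}
```

Later, only the macro definition would need to change to point at a per-instance container field; call sites such as `example_bump()` stay untouched.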
We hope to have caught at least 85-90% of what is needed so we do not invalidate a lot of outstanding patches again.
Obtained from: //depot/projects/vimage-commit2/... Reviewed by: brooks, des, ed, mav, julian, jamie, kris, rwatson, zec, ... (various people I forgot, different versions) md5 (with a bit of help) Sponsored by: NLnet Foundation, The FreeBSD Foundation X-MFC after: never V_Commit_Message_Reviewed_By: more people than the patch
|
181782 |
16-Aug-2008 |
bz |
Fix a regression introduced in r179289 splitting up ip6_savecontrol() into v4-only vs. v6-only inp_flags processing. When ip6_savecontrol_v4() is called from ip6_savecontrol() we were not passing back the **mp thus the information will be missing in userland. Instead of going with a *** as suggested in the PR we are returning **mp now and passing in the v4only flag as a pointer argument.
PR: kern/126349 Reviewed by: rwatson, dwmalone
|
181464 |
09-Aug-2008 |
des |
Nit
|
181365 |
07-Aug-2008 |
rwatson |
Minor white space tweaks.
MFC after: 1 week
|
181364 |
07-Aug-2008 |
rwatson |
Correct comment typo.
MFC after: 1 week (after inpcb rwlocking)
|
181337 |
05-Aug-2008 |
jhb |
Minor style tweaks.
|
181139 |
01-Aug-2008 |
julian |
The IPFW code accepts the use of the tablearg keyword along with the skipto keyword. But it doesn't work. Two options.. make it no longer accept it, or actually make it work.. I chose the 2nd..
Allow the tablearg to be used to specify a skipto destination.
This is actually a very powerful construct if used correctly, or a sink of cpu cycles if used badly.
Changes to the man page will follow.
|
181056 |
31-Jul-2008 |
rpaulo |
MFp4 (//depot/projects/tcpecn/):
TCP ECN support. Merge of my GSoC 2006 work for NetBSD. TCP ECN is defined in RFC 3168.
Partly reviewed by: dwmalone, silby Obtained from: NetBSD
|
181054 |
31-Jul-2008 |
rrs |
Adds support for the SCTP_PORT_REUSE option Fixes a refcount bug found in the process
Obtained from: With the help of Michael Tuexen
|
180956 |
29-Jul-2008 |
rrs |
Fix build breakage - kthread_exit() in 8 now has no arguments MFC after: 1 week
|
180955 |
29-Jul-2008 |
rrs |
- Out with some printfs. - Fix a initialization of last_tsn_used - Fix handling of mapped IPv4 addresses Obtained from: Michael Tuexen and I :-) MFC after: 1 week
|
180874 |
28-Jul-2008 |
mav |
Some style and assertion fixes to the previous commits hinted by rwatson. There are no functional changes.
|
180851 |
27-Jul-2008 |
mav |
According to in_pcb.h protocol binding information has double locking. It allows access to it while list traversing holding only global pcbinfo lock.
|
180836 |
26-Jul-2008 |
mav |
Increase UDBHASHSIZE from 16 to 128 items. Previous value was chosen 10 years ago and not very effective now. This change gives several percents speedup on 1000 L2TP mpd links.
|
180833 |
26-Jul-2008 |
mav |
According to in_pcb.h protocol binding information has double locking. It allows access to it while list traversing holding only global pcbinfo lock. This relaxed locking noticeably increases receive socket lookup performance.
|
180828 |
26-Jul-2008 |
mav |
Add hash table lookup for a fully connected raw sockets.
This gives significant performance improvements when many raw sockets are used. Benchmarks of mpd handling 1000 simultaneous PPTP connections show up to 50% performance boost. With higher number of connections benefit becomes even bigger. PopTop and others should also get some benefits.
|
180683 |
22-Jul-2008 |
avatar |
Trying to fix compilation bustage: - removing 'const' qualifier from an input parameter to conform to the type required by rw_assert(); - using in_addr->s_addr to retrieve 32 bits address value.
Observed by: tinderbox
|
180678 |
21-Jul-2008 |
kmacy |
make new accessor functions consistent with existing style
|
180674 |
21-Jul-2008 |
kmacy |
- Switch to INP_WLOCK macro from inp_wlock - calling sodisconnect after tcp_twstart is both gratuitous and unsafe - remove
Submitted by: rwatson
|
180648 |
21-Jul-2008 |
kmacy |
Add versions of tcp_twstart, tcp_close, and tcp_drop that hide the acquisition of the tcbinfo lock.
MFC after: 1 week
|
180645 |
21-Jul-2008 |
kmacy |
add interface for external consumers to syncache_expand - rename syncache_add in a manner consistent with other bits intended for offload
|
180641 |
21-Jul-2008 |
kmacy |
Add accessor functions for socket fields.
MFC after: 1 week
|
180640 |
21-Jul-2008 |
kmacy |
add inpcb accessor functions for fields needed by TOE devices
|
180631 |
20-Jul-2008 |
trhodes |
Document a few sysctls.
Reviewed by: rwatson
|
180629 |
20-Jul-2008 |
bz |
ia is a pointer thus use NULL rather than 0 for initialization and in comparisons to make this more obvious.
MFC after: 5 days
|
180624 |
20-Jul-2008 |
kmacy |
remove unused toedev functions and add comments for rest
|
180593 |
18-Jul-2008 |
dwmalone |
Add an accept filter for TCP based DNS requests. It waits until the whole first request is present before returning from accept.
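A rough sketch of the readiness test such a filter needs, in Python rather than kernel C. It relies on the fact that DNS over TCP prefixes each message with a two-byte, network-order length field; the function name is hypothetical and this is not the filter's actual code.

```python
import struct


def dns_request_ready(buf: bytes) -> bool:
    """Return True once buf holds one complete length-prefixed DNS message."""
    if len(buf) < 2:
        # The two-byte length prefix itself has not fully arrived yet.
        return False
    (msg_len,) = struct.unpack("!H", buf[:2])
    # Ready only when the whole message body follows the prefix.
    return len(buf) >= 2 + msg_len
```

Until this condition holds, accept would keep the connection pending, so the application never sees a half-received first request.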
|
180589 |
18-Jul-2008 |
rwatson |
Eliminate use of the global ripsrc which was being used to pass address information from rip_input() to rip_append(). Instead, pass the source address for an IP datagram to rip_append() using a stack-allocated sockaddr_in, similar to udp_input() and udp_append().
Prior to the move to rwlocks for inpcbinfo, this was not a problem, as use of the global was synchronized using the ripcbinfo mutex, but with read-locking there is the potential for a race during concurrent receive.
This problem is not present in the IPv6 raw IP socket code, which already used a stack variable for the address.
Spotted by: mav MFC after: 1 week (before inpcbinfo rwlock changes)
|
180558 |
16-Jul-2008 |
rwatson |
Fix error in comment.
MFC after: 3 weeks
|
180536 |
15-Jul-2008 |
rwatson |
Merge last of a series of rwlock conversion changes to UDP, which completes the move to a fully parallel UDP transmit path by using global read, rather than write, locking of inpcbinfo in further semi-connected cases:
- Add macros to allow try-locking of inpcb and inpcbinfo.
- Always acquire an inpcb read lock in udp_output(), which stabilizes the local inpcb address and port bindings in order to determine what further locking is required:
  - If the inpcb is currently not bound (at all) and we are implicitly connecting, we require inpcbinfo and inpcb write locks, so drop the read lock and re-acquire.
  - If the inpcb is bound for at least one of the port or address, but an explicit source or destination is requested, trylock the inpcbinfo lock, and if that fails, drop the inpcb lock, lock the global lock, and relock the inpcb lock.
  - Otherwise, no further locking is required (common case).
- Update comments.
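The three cases in the list above can be sketched as a small decision function. This is an illustrative model in Python, not the udp_output() code: the enum, the function name, and the two boolean inputs are assumptions chosen to mirror the wording of the list.

```python
from enum import Enum


class Locking(Enum):
    READ_ONLY = "inpcb read lock only (common case)"
    TRYLOCK_GLOBAL = "inpcb lock plus try-locked inpcbinfo"
    WRITE_BOTH = "inpcbinfo and inpcb write locks"


def udp_output_locking(bound: bool, explicit_addr: bool) -> Locking:
    """Pick the locking needed after the initial inpcb read lock.

    bound: the inpcb has at least a local port or address binding.
    explicit_addr: the caller requested an explicit source or destination.
    """
    if not bound and explicit_addr:
        # Unbound and implicitly connecting: state will be modified.
        return Locking.WRITE_BOTH
    if bound and explicit_addr:
        # Bound, but the send overrides source or destination.
        return Locking.TRYLOCK_GLOBAL
    # Nothing to change: the read lock already held is sufficient.
    return Locking.READ_ONLY
```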
In practice, this means that the vast majority of consumers of UDP sockets will not acquire any exclusive locks at the socket or UDP levels of the network stack. This leads to a marked performance improvement in several important workloads, including BIND, nsd, and memcached over UDP, as well as significant improvements in pps microbenchmarks.
The plan is to MFC all of the rwlock changes to RELENG_7 once they have settled for a few weeks in the tree.
Tested by: ps, kris (older revision), bde MFC after: 3 weeks
|
180535 |
15-Jul-2008 |
rpaulo |
Fix typo in comment.
M tcp_output.c
|
180513 |
14-Jul-2008 |
eri |
Fix carp(4) panics that can occur during carp interface configuration.
Approved by: mlaier (mentor) Reported by: Scott Ullrich MFC after: 1 week
|
180429 |
10-Jul-2008 |
rwatson |
Slightly rearrange validation of UDP arguments and jail processing in udp_output() so that argument validation occurs before jail processing.
Add additional comments explaining what's going on when we process addresses and binding during udp_output().
MFC after: 3 weeks
|
180427 |
10-Jul-2008 |
bz |
Pass the ucred along into in{,6}_pcblookup_local for upcoming prison checks.
Reviewed by: rwatson
|
180425 |
10-Jul-2008 |
bz |
For consistency take lport as u_short in in{,6}_pcblookup_local. All callers either pass in a u_short or u_int16_t.
Reviewed by: rwatson
|
180422 |
10-Jul-2008 |
rwatson |
Apply the MAC label to an outgoing UDP packet when other inpcb properties are processed, meaning that we avoid the cost of MAC label assignment if we're going to drop the packet due to mbuf exhaustion, etc.
MFC after: 3 weeks
|
180392 |
09-Jul-2008 |
bz |
For consistency with the rest of the function use the locally cached pointer pcbinfo rather than inp->inp_pcbinfo.
MFC after: 3 weeks
|
180387 |
09-Jul-2008 |
rrs |
1) Adds the rest of the VIMAGE change macros 2) Adds some __UserSpace__ on some of the common defines that the user space code needs 3) Fixes a bug when we send up data to a user that failed. We need to a) trim off the data chunk headers, if present, and b) make sure the frag bit is communicated properly for the msgs coming off the stream queues... i.e. we see if some of the msg has been taken.
Obtained from: jeli contributed the VIMAGE changes on this pass. Thanks Julian!
|
180368 |
08-Jul-2008 |
rwatson |
Provide some initial chicken-scratching annotations of locking for struct inpcb.
Prodded by: bz MFC after: 3 days
|
180348 |
07-Jul-2008 |
rwatson |
Allow udp_notify() to accept read, as well as write, locks on the passed inpcb. When directly invoking udp_notify() from udp_ctlinput(), acquire only a read lock; we may still see write locks in udp_notify() as the in_pcbnotifyall() routine is shared with TCP and always uses a write lock on the inpcb being notified.
MFC after: 1 month
|
180346 |
07-Jul-2008 |
rwatson |
Add additional udbinfo and inpcb locking assertions to udp_output(); for some code paths, global or inpcb write locks are required, but for other code paths, read locks or no locking at all are sufficient for the data structures.
MFC after: 1 month
|
180344 |
07-Jul-2008 |
rwatson |
First step towards parallel transmit in UDP: if neither a specific source nor a specific destination address is requested as part of a send on a UDP socket, read lock the inpcb rather than write lock it. This will allow fully parallel transmit down to the IP layer when sending simultaneously from multiple threads on a connected UDP socket.
Parallel transmit for more complex cases, such as when sendto(2) is invoked with an address and there's already a local binding, will follow.
MFC after: 1 month
|
180338 |
07-Jul-2008 |
rwatson |
Drop read lock on udbinfo earlier during delivery to the last matching UDP socket for a datagram; the inpcb read lock is sufficient to provide inpcb stability during udp_append().
MFC after: 1 month
|
180306 |
05-Jul-2008 |
rwatson |
Rename raw_append() to rip_append(): the raw_ prefix is generally used for functions in the generic raw socket library (raw_cb.c, raw_usrreq.c), and they are not used for IPv4 raw sockets.
MFC after: 3 days
|
180305 |
05-Jul-2008 |
rwatson |
Improve approximation of style(9) in raw socket code.
|
180264 |
04-Jul-2008 |
gonzo |
Enqueue de-capsulated packet instead of performing direct dispatch. It's possible to exhaust and garble the stack with a packet that contains a couple of hundred nested encapsulation levels.
Submitted by: Ming Fu <fming@borderware.com> Reviewed by: rwatson PR: kern/85320
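A toy model of the difference between direct dispatch and queueing, in Python. Nested encapsulation is represented as nested tuples; all names are hypothetical and this is not the kernel code, only a sketch of why queueing keeps stack depth constant.

```python
from collections import deque


def make_nested(levels, payload="data"):
    """Wrap payload in the given number of encapsulation layers."""
    pkt = payload
    for _ in range(levels):
        pkt = ("encap", pkt)
    return pkt


def direct_dispatch(pkt):
    """Handle the inner packet by calling straight back into input.

    Each encapsulation level adds one stack frame.
    """
    if isinstance(pkt, tuple):
        return direct_dispatch(pkt[1])
    return pkt


def queued_dispatch(pkt):
    """Put the decapsulated packet on a queue and return to the loop.

    Stack depth stays constant no matter how deep the nesting is.
    """
    queue = deque([pkt])
    while queue:
        cur = queue.popleft()
        if isinstance(cur, tuple):
            queue.append(cur[1])
        else:
            return cur
    return None
```

In this model, `queued_dispatch(make_nested(10000))` completes normally, whereas the recursive variant needs one frame per level and is bounded by the available stack.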
|
180239 |
04-Jul-2008 |
rwatson |
Remove NETISR_MPSAFE, which allows specific netisr handlers to be directly dispatched without Giant, and add NETISR_FORCEQUEUE, which allows specific netisr handlers to always be dispatched via a queue (deferred). Mark the usb and if_ppp netisr handlers as NETISR_FORCEQUEUE, and explicitly acquire Giant in those handlers.
Previously, any netisr handler not marked NETISR_MPSAFE would necessarily run deferred and with Giant acquired. This change removes Giant scaffolding from the netisr infrastructure, but NETISR_FORCEQUEUE allows non-MPSAFE handlers to continue to force deferred dispatch so as to avoid lock order reversals between their acquisition of Giant and any calling context.
It is likely we will be able to remove NETISR_FORCEQUEUE once IFF_NEEDSGIANT is removed, as non-MPSAFE usb and if_ppp drivers will no longer be supported.
Reviewed by: bz MFC after: 1 month X-MFC note: We can't remove NETISR_MPSAFE from stable/7 for KPI reasons, but the rest can go back.
|
180215 |
03-Jul-2008 |
bz |
Remove a bogusly introduced rtalloc_ign() in rev. 1.335/SVN 178029, generating an RTM_MISS for every IP packet forwarded making user space routing daemons unhappy.
PR: kern/123621, kern/124540, kern/122338 Reported by: Paul <paul gtcomm.net>, Mike Tancsa <mike sentex.net> on net@ Tested by: Paul and Mike Reviewed by: andre MFC after: 3 days
|
180198 |
02-Jul-2008 |
rwatson |
Add soreceive_dgram(9), an optimized socket receive function for use by datagram-only protocols, such as UDP. This version removes use of sblock(), which is not required due to an inability to interlace data improperly with datagrams, as well as avoiding some of the larger loops and state management that don't apply on datagram sockets.
This is experimental code, so hook it up only for UDPv4 for testing; if there are problems we may need to revise it or turn it off by default, but it offers *significant* performance improvements for threaded UDP applications such as BIND9, nsd, and memcached using UDP.
Tested by: kris, ps
|
180127 |
30-Jun-2008 |
rwatson |
In udp_append() and udp_input(), make use of read locking on inpcbs rather than write locking: while we need to maintain a valid reference to the inpcb and fix its state, no protocol layer state is modified during an IPv4 UDP receive -- there are only changes at the socket layer, which is separately protected by socket locking.
While parallel concurrent receive on a single UDP socket is currently relatively unusual, introducing read locking in the transmit path, allowing concurrent receive and transmit, will significantly improve performance for loads such as BIND, memcached, etc.
MFC after: 2 months Tested by: gnn, kris, ps
|
179971 |
24-Jun-2008 |
gonzo |
In case of interface initialization failure, remove struct in_ifaddr* from in_ifaddrhashtbl in in_ifinit, because the error handler in in_control removes entries only for AF_INET addresses. If in_ifinit is called for a cloned interface that has just been created, its address family is not AF_INET and therefore LIST_REMOVE is not called for the respective LIST_INSERT_HEAD, so freed entries remain in in_ifaddrhashtbl and lead to memory corruption.
PR: kern/124384
|
179924 |
22-Jun-2008 |
mav |
Partially revert previous commit. DeleteLink() does not delete permanent links, so we should be aware of it and try to delete every link only once or we will loop forever.
|
179920 |
21-Jun-2008 |
mav |
Implement UDP transparent proxy support.
PR: bin/54274 Submitted by: Nicolai Petri <nicolai@petri.cc>
|
179912 |
21-Jun-2008 |
mav |
Add support for PORT/EPRT FTP commands in lowercase. Use strncasecmp() instead of huge local implementation to reduce code size. Check space presence after command/code.
PR: kern/73034
|
179833 |
16-Jun-2008 |
ups |
Change incorrect stale cookie detection in syncookie_lookup() that prematurely declared a cookie as expired.
Reviewed by: andre@, silby@ Reported by: Yahoo!
|
179832 |
16-Jun-2008 |
ups |
Fix a check in SYN cache expansion (syncache_expand()) to accept packets that arrive in the receive window instead of just on the left edge of the receive window. This is needed for correct behavior when packets are lost or reordered.
PR: kern/123950 Reviewed by: andre@, silby@ Reported by: Yahoo!, Wang Jin MFC after: 1 week
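A simplified sketch of the two acceptance tests, using 32-bit sequence arithmetic in Python. The function names are hypothetical, and the exact boundary conditions used by syncache_expand() are not given in this log, so the half-open window below is an assumption.

```python
MOD = 1 << 32  # TCP sequence numbers wrap at 2**32


def seq_diff(a, b):
    """Signed distance a - b in 32-bit sequence space."""
    d = (a - b) % MOD
    return d - MOD if d >= MOD // 2 else d


def on_left_edge(seq, rcv_nxt):
    """Old behaviour: accept only the next expected sequence number."""
    return seq == rcv_nxt


def in_receive_window(seq, rcv_nxt, rcv_wnd):
    """New behaviour: accept anything inside the receive window."""
    return 0 <= seq_diff(seq, rcv_nxt) < rcv_wnd
```

A segment that arrives after an earlier one was lost or reordered starts beyond the left edge, so only the window test accepts it.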
|
179803 |
15-Jun-2008 |
rrs |
More prep for Vimage:
- only one function to destroy an SCTP stack: sctp_finish()
- Make it so this function also arranges for any threads created by the image to do a kthread_exit()
|
179786 |
14-Jun-2008 |
rrs |
- Fixes foobar on my part. Some missing virtualization macros from specific logging cases.
|
179783 |
14-Jun-2008 |
rrs |
- Macro-izes the packed declaration in all headers.
- Vimage prep - these are major restructures to move all global variables to be accessed via a macro or two. The variables all go into a single structure.
- Asconf address addition tweaks (add_or_del Interfaces)
- Fix rwnd calculation to be more conservative.
- Support SACK_IMMEDIATE flag to skip delayed sack by demand of peer.
- Comment updates in the sack mapping calculations
- Invariants panic added.
- Pre-support for UDP tunneling (we can do this on MAC but will need added support from UDP to get a "pipe" of UDP packets in).
- clear trace buffer sysctl added when local tracing on.
Note the majority of this huge patch is all the vimage prep stuff :-)
|
179737 |
11-Jun-2008 |
jfv |
Add generic TCP LRO into netinet
|
179490 |
02-Jun-2008 |
mlaier |
Sort IP addresses before hashing them for the signature. Otherwise carp is sensitive to address configuration order.
PR: kern/121574 Reported by: Douglas K. Rand, Wouter de Jong Obtained from: OpenBSD (rev 1.114 + fixes) MFC after: 2 weeks
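A small sketch of why sorting matters, in Python. Plain SHA-1 over the packed addresses stands in for carp's real signature computation, which is an assumption made for brevity; the function name is hypothetical.

```python
import hashlib
import ipaddress


def signature(addrs, sort_first):
    """Hash a list of IPv4 addresses, optionally in sorted order."""
    ips = [ipaddress.IPv4Address(a) for a in addrs]
    if sort_first:
        ips = sorted(ips)
    digest = hashlib.sha1()
    for ip in ips:
        digest.update(ip.packed)
    return digest.hexdigest()
```

Two hosts that configure the same addresses in a different order produce different digests without sorting and identical digests with it, which is the order sensitivity the commit removes.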
|
179487 |
02-Jun-2008 |
rwatson |
When allocating temporary storage to hold a TCP/IP packet header template, use an M_TEMP malloc(9) allocation rather than an mbuf with mtod(9) and dtom(9). This eliminates the last use of dtom(9) in TCP.
MFC after: 3 weeks
|
179480 |
01-Jun-2008 |
mav |
Increase LINK_TABLE_OUT_SIZE from 101 to 4001 like LINK_TABLE_IN_SIZE to reduce performance degradation under heavy outgoing scan/flood. Scalability is now much more important than several kilobytes of RAM.
Remove unneeded TCP-specific expiration handling. Before this, connected TCP sessions could never expire. Now connected TCP sessions will expire after 24 hours of inactivity.
Simplify HouseKeeping() to avoid several mul/div-s per packet. Taking into account the increased LINK_TABLE_OUT_SIZE, precision is still much more than required.
|
179478 |
01-Jun-2008 |
mav |
Make m_megapullup() more intelligent: - to increase performance do not reallocate mbuf when possible, - to support up to 16K packets (was 2K max) use mbuf cluster of proper size. This change depends on recent ng_nat and ip_fw_nat changes.
|
179473 |
01-Jun-2008 |
mav |
PKT_ALIAS_FOUND_HEADER_FRAGMENT result is not an error, so pass that packet. This fixes packet fragmentation handling.
Pass the really available buffer size to libalias instead of the MCLBYTES constant. The MCLBYTES constant was used in the belief that m_megapullup() always moves data into a fresh cluster, which is sometimes not the case.
|
179472 |
01-Jun-2008 |
mav |
Fix packet fragmentation support broken by copy/paste error in rev.1.60. ip_id should be u_short, not u_char.
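The IPv4 Identification field is 16 bits wide, so holding it in an 8-bit variable discards the upper byte. A minimal Python sketch of the effect, with hypothetical helper names that mimic C's unsigned truncation:

```python
def store_u_char(value):
    """Model assignment to an 8-bit unsigned variable."""
    return value & 0xFF


def store_u_short(value):
    """Model assignment to a 16-bit unsigned variable."""
    return value & 0xFFFF


first_id = 0x1234   # example Identification values of two
second_id = 0x5634  # unrelated datagrams
```

Here `store_u_char` maps both example IDs to the same value, so fragments of different datagrams become indistinguishable, while `store_u_short` keeps them apart.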
|
179414 |
29-May-2008 |
rwatson |
Read lock rather than write lock TCP inpcbs in monitoring sysctls. In some cases, add explicit inpcb locking rather than relying on the global lock, as we dereference inp_socket, but also allowing us to drop the global lock more quickly.
MFC after: 1 week
|
179412 |
29-May-2008 |
rwatson |
Employ read locks on UDP inpcbs, rather than write locks, when monitoring UDP connections using sysctls. In some cases, add previously missing locking of inpcbs, as inp_socket is followed, which also allows us to drop global locks more quickly.
MFC after: 1 week
|
179289 |
24-May-2008 |
bz |
Factor out the v4-only vs. the v6-only inp_flags processing in ip6_savecontrol in preparation for udp_append() to no longer need a WLOCK as we will no longer be modifying socket options.
Requested by: rwatson Reviewed by: gnn MFC after: 10 days
|
179201 |
22-May-2008 |
rwatson |
Consistently check IPFW and DUMMYNET privileges in the configuration routines for those modules, rather than in the raw socket code. This allows each privilege check to occur in exactly one place and avoids duplicate checks across layers.
MFC after: 3 weeks Sponsored by: nCircle Network Security, Inc.
|
179180 |
21-May-2008 |
rrs |
- sctputil.c - If debug is on, the INPKILL timer can deref a freed value. Change so that we save off a type field for display and NULL inp just for good measure.
- sctp_output.c - Fix it so in sending to the loopback we use the src address of the inbound INIT. We don't want to do this for non-local addresses since otherwise we might be ingress filtered, so we need to use the best src address and list the address sent to.
Obtained from: time bug - Neil Wilson MFC after: 1 week
|
179157 |
20-May-2008 |
rrs |
- Adds support for the multi-asconf (From Kozuka-san)
- Adds some prepwork (Not all yet) for vimage; in particular, support for deleting the sctppcbinfo.xx structs. There is still a leak in here if it were to be called, plus we still need the regrouping (From Me and Michael Tuexen)
- Adds support for UDP tunneling. For BSD there is no socket yet set up so it's disabled, but major argument changes are in here to encompass the passing of the port number (zero when you don't have a udp tunnel, the default for BSD). Will add some hooks in UDP here shortly (discussed with Robert) that will allow easy tunneling. (Mainly from Peter Lei and Michael Tuexen with some BSD work from me :-D)
- Some ease for windows: evidently leave is reserved by their compiler, so move label leave: -> out:
MFC after: 1 week
|
179141 |
20-May-2008 |
rrs |
- Define changes in sctp.h
- Bug in CA that does not get us incrementing the PBA properly which made us more conservative.
- comment updated in sctp_input.c
- memsets added before we log
- added arg to hmac id's
MFC after: 2 weeks
|
178960 |
12-May-2008 |
gnn |
Fix the loopback interface. Cleaning up some code with new macros was a tad too aggressive.
PR: kern/123568 Submitted by: Vladimir Ermakov <samflanker at gmail dot com> Obtained from: antoine
|
178888 |
09-May-2008 |
julian |
Add code to allow the system to handle multiple routing tables. This particular implementation is designed to be fully backwards compatible and to be MFC-able to 7.x (and 6.x)
Currently the only protocol that can make use of the multiple tables is IPv4. Similar functionality exists in OpenBSD and Linux.
From my notes:
-----
One thing where FreeBSD has been falling behind, and which by chance I have some time to work on is "policy based routing", which allows different packet streams to be routed by more than just the destination address.
Constraints:
------------
I want to make some form of this available in the 6.x tree (and by extension 7.x) , but FreeBSD in general needs it so I might as well do it in -current and back port the portions I need.
One of the ways that this can be done is to have the ability to instantiate multiple kernel routing tables (which I will now refer to as "Forwarding Information Bases" or "FIBs" for political correctness reasons). Which FIB a particular packet uses to make the next hop decision can be decided by a number of mechanisms. The policies these mechanisms implement are the "Policies" referred to in "Policy based routing".
One of the constraints I have if I try to back port this work to 6.x is that it must be implemented as an EXTENSION to the existing ABIs in 6.x so that third party applications do not need to be recompiled in the timespan of the branch.
This first version will not have some of the bells and whistles that will come with later versions. It will, for example, be limited to 16 tables in the first commit.
Implementation method, Compatible version. (part 1)
-------------------------------
For this reason I have implemented a "sufficient subset" of a multiple routing table solution in Perforce, and back-ported it to 6.x. (also in Perforce though not always caught up with what I have done in -current/P4). The subset allows a number of FIBs to be defined at compile time (8 is sufficient for my purposes in 6.x) and implements the changes needed to allow IPV4 to use them. I have not done the changes for ipv6 simply because I do not need it, and I do not have enough knowledge of ipv6 (e.g. neighbor discovery) needed to do it.
Other protocol families are left untouched and should there be users with proprietary protocol families, they should continue to work and be oblivious to the existence of the extra FIBs.
To understand how this is done, one must know that the current FIB code starts everything off with a single dimensional array of pointers to FIB head structures (One per protocol family), each of which in turn points to the trie of routes available to that family.
The basic change in the ABI compatible version of the change is to extend that array to be a 2 dimensional array, so that instead of protocol family X looking at rt_tables[X] for the table it needs, it looks at rt_tables[Y][X], where for all protocol families except ipv4 Y is always 0. Code that is unaware of the change always just sees the first row of the table, which of course looks just like the one dimensional array that existed before.
The entry points rtrequest(), rtalloc(), rtalloc1(), rtalloc_ign() are all maintained, but refer only to the first row of the array, so that existing callers in proprietary protocols can continue to do the "right thing". Some new entry points are added, for the exclusive use of ipv4 code called in_rtrequest(), in_rtalloc(), in_rtalloc1() and in_rtalloc_ign(), which have an extra argument which refers the code to the correct row.
In addition, there are some new entry points (currently called rtalloc_fib() and friends) that check the Address family being looked up and call either rtalloc() (and friends) if the protocol is not IPv4 forcing the action to row 0 or to the appropriate row if it IS IPv4 (and that info is available). These are for calling from code that is not specific to any particular protocol. The way these are implemented would change in the non ABI preserving code to be added later.
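The table layout and dispatch rule described in the last three paragraphs can be sketched as follows. This is a Python model, not the kernel code: the array sizes, the AF_INET value used here, and the helper names are placeholders chosen for illustration.

```python
NUM_FIBS = 4   # number of FIBs "compiled in" (placeholder value)
AF_MAX = 4     # toy number of protocol families
AF_INET = 2    # the IPv4 family index in this model

# rt_tables[Y][X]: row Y is the FIB number, column X the protocol family.
rt_tables = [[{} for _ in range(AF_MAX)] for _ in range(NUM_FIBS)]


def legacy_table_for(family):
    """Old-style entry points only ever see the first row."""
    return rt_tables[0][family]


def table_for(family, fib=0):
    """FIB-aware lookup: only IPv4 honours the fib argument."""
    row = fib if family == AF_INET else 0
    return rt_tables[row][family]
```

Code that is unaware of FIBs keeps using `legacy_table_for`, which behaves exactly like the former one-dimensional array, while the FIB-aware path forces every non-IPv4 family to row 0.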
One feature of the first version of the code is that for ipv4, the interface routes show up automatically on all the FIBs, so that no matter what FIB you select you always have the basic direct attached hosts available to you. (rtinit() does this automatically).
You CAN delete an interface route from one FIB should you want to but by default it's there. ARP information is also available in each FIB. It's assumed that the same machine would have the same MAC address, regardless of which FIB you are using to get to it.
This brings us as to how the correct FIB is selected for an outgoing IPV4 packet.
Firstly, all packets have a FIB associated with them. If nothing has been done to change it, it will be FIB 0. The FIB is changed in the following ways.
Packets fall into one of a number of classes.
1/ locally generated packets, coming from a socket/PCB. Such packets select a FIB from a number associated with the socket/PCB. This in turn is inherited from the process, but can be changed by a socket option. The process in turn inherits it on fork. I have written a utility called setfib that acts a bit like nice.
setfib -3 ping target.example.com # will use fib 3 for ping.
It is an obvious extension to make it a property of a jail but I have not done so. It can be achieved by combining the setfib and jail commands.
2/ packets received on an interface for forwarding. By default these packets would use table 0 (or possibly a number settable in a sysctl (not yet)), but prior to routing the firewall can inspect them (see below). (Possibly in the future you may be able to associate a FIB with packets received on an interface. An ifconfig arg, but not yet.)
3/ packets inspected by a packet classifier, which can arbitrarily associate a fib with it on a packet by packet basis. A fib assigned to a packet by a packet classifier (such as ipfw) would over-ride a fib associated by a more default source. (such as cases 1 or 2).
4/ a tcp listen socket associated with a fib will generate accept sockets that are associated with that same fib.
5/ Packets generated in response to some other packet (e.g. reset or icmp packets). These should use the FIB associated with the packet being responded to.
6/ Packets generated during encapsulation. gif, tun and other tunnel interfaces will encapsulate using the FIB that was in effect with the process that set up the tunnel. Thus setfib 1 ifconfig gif0 [tunnel instructions] will set the fib for the tunnel to use to be fib 1.
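The precedence among the first three classes can be sketched as a small function. This is an illustrative Python model with hypothetical names; it only covers the override order stated above (classifier over socket/process over the default of 0), not the accept, response, or tunnel cases.

```python
def select_fib(socket_fib=None, classifier_fib=None, default_fib=0):
    """Pick the FIB for a packet.

    classifier_fib: set by a packet classifier such as ipfw, if any.
    socket_fib: inherited from the socket/PCB (and thus the process), if any.
    """
    if classifier_fib is not None:
        # A classifier's choice overrides the more default sources.
        return classifier_fib
    if socket_fib is not None:
        return socket_fib
    return default_fib
```

The explicit `is not None` tests matter because FIB 0 is itself a valid choice for a classifier to assign.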
Routing messages would be associated with their process, and thus select one FIB or another. Messages from the kernel would be associated with the fib they refer to and would only be received by a routing socket associated with that fib. (not yet implemented)
In addition Netstat has been edited to be able to cope with the fact that the array is now 2 dimensional. (It looks in system memory using libkvm (!)). Old versions of netstat see only the first FIB.
In addition two sysctls are added to give: a) the number of FIBs compiled in (active) b) the default FIB of the calling process.
Early testing experience:
-------------------------
Basically our (IronPort's) appliance does this functionality already using ipfw fwd but that method has some drawbacks.
For example, It can't fully simulate a routing table because it can't influence the socket's choice of local address when a connect() is done.
Testing during the generating of these changes has been remarkably smooth so far. Multiple tables have co-existed with no notable side effects, and packets have been routed accordingly.
ipfw has grown 2 new keywords:
setfib N ip from any to any
count ip from any to any fib N
In pf there seems to be a requirement to be able to give symbolic names to the fibs but I do not have that capacity. I am not sure if it is required.
SCTP has interestingly enough built in support for this, called VRFs in Cisco parlance. it will be interesting to see how that handles it when it suddenly actually does something.
Where to next:
--------------------
After committing the ABI compatible version and MFCing it, I'd like to proceed in a forward direction in -current. this will result in some roto-tilling in the routing code.
Firstly: the current code's idea of having a separate tree per protocol family, all of the same format, and pointed to by the 1 dimensional array is a bit silly. Especially when one considers that there is code that makes assumptions about every protocol having the same internal structures there. Some protocols don't WANT that sort of structure. (for example the whole idea of a netmask is foreign to appletalk). This needs to be made opaque to the external code.
My suggested first change is to add routing method pointers to the 'domain' structure, along with information pointing to the data. Instead of having an array of pointers to uniform structures, there would be an array pointing to the 'domain' structures for each protocol address domain (protocol family), and the methods thus reached would be called. The methods would have an argument that gives the FIB number, but the protocol would be free to ignore it.
When the ABI can be changed it raises the possibility of the addition of a fib entry into the "struct route". Currently, the structure contains the sockaddr of the destination, and the resulting fib entry. To make this work fully, one could add a fib number so that given an address and a fib, one can find the third element, the fib entry.
Interaction with the ARP layer/ LL layer would need to be revisited as well. Qing Li has been working on this already.
This work was sponsored by Ironport Systems/Cisco
Reviewed by: several including rwatson, bz and mlaier (parts each) Obtained from: Ironport systems/Cisco
|
178862 |
08-May-2008 |
jhb |
Always bump tcpstat.tcps_badrst if we get a RST for a connection in the syncache that has an invalid SEQ instead of only doing it when we succeed in mallocing space for the log message.
MFC after: 1 week Reviewed by: sam, bz
|
178801 |
05-May-2008 |
kmacy |
replace spaces added in last change with tabs
|
178793 |
05-May-2008 |
kmacy |
add rcv_nxt, snd_nxt, and toe offload id to FreeBSD-specific extension fields for tcp_info
|
178730 |
02-May-2008 |
marck |
Fix build, together with a bit of style breakage.
|
178673 |
29-Apr-2008 |
rwatson |
Fix a comment typo.
MFC after: 3 days
|
178377 |
21-Apr-2008 |
rwatson |
With IPv4 raw sockets, read lock rather than write lock the inpcb when receiving or transmitting.
With IPv6 raw sockets, read lock rather than write lock the inpcb when receiving. Unfortunately, IPv6 source address selection appears to require a write lock on the inpcb for the time being.
MFC after: 3 months
|
178376 |
21-Apr-2008 |
rwatson |
Read lock, rather than write lock, the inpcb when transmitting with or delivering to an IP divert socket.
MFC after: 3 months
|
178349 |
20-Apr-2008 |
bz |
Revert to rev. 1.161 - switch back to optimized TCP options ordering.
A lot of testing has shown that the problem people were seeing was due to invalid padding after the end of option list option, which was corrected in tcp_output.c rev. 1.146.
Thanks to: anders@, s3raphi, Matt Reimer Thanks to: Doug Hardie and Randy Rose, John Mayer, Susan Guzzardi Special thanks to: dwhite@ and BitGravity Discussed with: silby MFC after: 1 day
|
178325 |
20-Apr-2008 |
rwatson |
Teach pf and ipfw to use read locks on inpcbs rather than write locks when reading credential data from sockets.
Teach pf to unlock the pcbinfo more quickly once it has acquired an inpcb lock, as the inpcb lock is sufficient to protect the reference.
Assert locks, rather than read locks or write locks, on inpcbs in subroutines--this is necessary as the inpcb may be passed down with a write lock from the protocol, or may be passed down with a read lock from the firewall lookup routine, and either is sufficient.
MFC after: 3 months
|
178319 |
19-Apr-2008 |
rwatson |
In ip_output(), allow a read lock as well as a write lock when asserting a lock on the passed inpcb.
MFC after: 3 months
|
178318 |
19-Apr-2008 |
rwatson |
When querying the local or foreign address from an IP socket, acquire only a read lock on the inpcb.
When an external module requests a read lock, acquire only a read lock.
MFC after: 3 months
|
178303 |
19-Apr-2008 |
kmacy |
move tcbinfo lock acquisition in to syncache
|
178302 |
19-Apr-2008 |
kmacy |
- move cxgb_lt2.[ch] from NIC to TOE
- move most offload functionality from NIC to TOE
- factor out all socket and inpcb direct access
- factor out access to locking in inpcb, pcbinfo, and sockbuf
|
178290 |
17-Apr-2008 |
gnn |
Add in check for loopback as well, which was missing from the original patch.
PR: 120958 Submitted by: James Snow <snow at teardrop.org> MFC after: 2 weeks
|
178285 |
17-Apr-2008 |
rwatson |
Convert pcbinfo and inpcb mutexes to rwlocks, and modify macros to explicitly select write locking for all use of the inpcb mutex. Update some pcbinfo lock assertions to assert locked rather than write-locked, although in practice almost all uses of the pcbinfo rwlock remain exclusive, and all instances of inpcb lock acquisition are exclusive.
This change should introduce (ideally) little functional change. However, it lays the groundwork for significantly increased parallelism in the TCP/IP code.
MFC after: 3 months Tested by: kris (superset of committed patch)
|
178280 |
17-Apr-2008 |
gnn |
Clean up the code that checks the types of address so that it is done by understandable macros.
Fix the bug that prevented the system from responding on interfaces with link local addresses assigned.
PR: 120958 Submitted by: James Snow <snow at teardrop.org> MFC after: 2 weeks
|
178251 |
16-Apr-2008 |
rrs |
Allow SCTP to compile without INET6. PR: 116816 Obtained from: tuexen@fh-muenster.de MFC after: 2 weeks
|
178202 |
14-Apr-2008 |
rrs |
Use the pru_flush infrastructure to avoid a panic
PR: 122710 MFC after: 1 week
|
178198 |
14-Apr-2008 |
rrs |
Protection against errant sender sending a stream seq number out of order with no missing TSN's (a cisco box has this problem which will make a ssn be held forever). MFC after: 1 week
|
178197 |
14-Apr-2008 |
rrs |
New logging values.
|
178196 |
14-Apr-2008 |
rrs |
1) adds some additional logging 2) changes to use a inqueue_bytes calculated value in max_len calc's. MFC after: 1 week
|
178167 |
13-Apr-2008 |
qingli |
This patch provides the back end support for equal-cost multi-path (ECMP) for both IPv4 and IPv6. Previously, multipath route insertion was disallowed. For example,
route add -net 192.103.54.0/24 10.9.44.1
route add -net 192.103.54.0/24 10.9.44.2
The second route insertion will trigger an error message of "add net 192.103.54.0/24: gateway 10.2.5.2: route already in table"
Multiple default routes can also be inserted. Here is the netstat output:
default 10.2.5.1 UGS 0 3074 bge0 =>
default 10.2.5.2 UGS 0 0 bge0
When multipath routes exist, the "route delete" command requires a specific gateway to be specified or else an error message would be displayed. For example,
route delete default
would fail and trigger the following error message:
"route: writing to routing socket: No such process" "delete net default: not in table"
On the other hand,
route delete default 10.2.5.2
would be successful: "delete net default: gateway 10.2.5.2"
One does not have to specify a gateway if there is only a single route for a particular destination.
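The add and delete rules described above can be sketched as a toy model. This is only an illustration of the behaviour the message describes, not the kernel's radix-tree code; the `RouteTable` class and its method names are hypothetical, and the error strings are borrowed from the messages quoted above.

```python
class RouteTable:
    """Toy model of the route add/delete rules with and without multipath."""

    def __init__(self, multipath: bool):
        self.multipath = multipath      # stands in for "options RADIX_MPATH"
        self.routes = {}                # destination -> list of gateways

    def add(self, dest: str, gateway: str) -> None:
        gateways = self.routes.setdefault(dest, [])
        # Without multipath, a second route to the same destination is refused.
        if gateway in gateways or (gateways and not self.multipath):
            raise ValueError("route already in table")
        gateways.append(gateway)

    def delete(self, dest: str, gateway: str = None) -> str:
        gateways = self.routes.get(dest, [])
        if gateway is None:
            # A gateway may be omitted only when exactly one route exists.
            if len(gateways) != 1:
                raise LookupError("not in table")
            gateway = gateways[0]
        if gateway not in gateways:
            raise LookupError("not in table")
        gateways.remove(gateway)
        if not gateways:
            del self.routes[dest]
        return gateway
```

With `multipath=True`, two `add("default", ...)` calls with different gateways both succeed, `delete("default")` raises, and `delete("default", "10.2.5.2")` succeeds.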
I need to perform more testing on address aliases and multiple interfaces that have the same IP prefixes. This patch as it stands today is not yet ready for prime time. Therefore, the ECMP code fragments are fully guarded by the RADIX_MPATH macro. Include the "options RADIX_MPATH" in the kernel configuration to enable this feature.
Reviewed by: robert, sam, gnn, julian, kmacy
|
178029 |
09-Apr-2008 |
bz |
Take the route mtu into account, if available, when sending an ICMP unreach, frag needed. Up to now we only looked at the interface MTU. Make sure to only use the minimum of the two.
In case IPSEC is compiled in, loop the mtu through ip_ipsec_mtu() to avoid any further conditional maths.
Without this, PMTU was broken in those cases when there was a route with a lower MTU than the MTU of the outgoing interface.
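The selection rule described above can be sketched as follows. This is a minimal illustration, assuming a route MTU of 0 or None means "not set"; the function name `next_hop_mtu` is hypothetical and the ip_ipsec_mtu() adjustment is not modelled.

```python
def next_hop_mtu(if_mtu: int, route_mtu: int = None) -> int:
    """MTU to report in an ICMP 'frag needed' message.

    Uses the smaller of the interface MTU and the route MTU when a
    route MTU is available; otherwise falls back to the interface MTU.
    """
    if route_mtu:                       # None or 0: no route MTU available
        return min(if_mtu, route_mtu)
    return if_mtu
```

For example, an interface MTU of 1500 with a route MTU of 1400 yields 1400, which is the case the message says was broken before the change.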
PR: kern/122338 Tested by: Mark Cammidge mark peralex.com Reviewed by: silence on net@ MFC after: 2 weeks
|
177988 |
07-Apr-2008 |
andre |
Remove TCP options ordering assumptions in tcp_addoptions(). Ordering was changed in rev. 1.161 of tcp_var.h. All options now test for sufficient space in TCP header before getting added.
Reported by: Mark Atkinson <atkin901-at-yahoo.com> Tested by: Mark Atkinson <atkin901-at-yahoo.com> MFC after: 1 week
|
177987 |
07-Apr-2008 |
andre |
Remove now unnecessary comment.
|
177986 |
07-Apr-2008 |
andre |
Use #defines for TCP options padding after EOL to be consistent.
Reviewed by: bz
|
177978 |
07-Apr-2008 |
rwatson |
Add further TCP inpcb locking assertions to some TCP input code paths.
MFC after: 1 month
|
177961 |
06-Apr-2008 |
rwatson |
In in_pcbnotifyall() and in6_pcbnotify(), use LIST_FOREACH_SAFE() and eliminate unnecessary local variable caching of the list head pointer, making the code a bit easier to read.
MFC after: 3 weeks
|
177599 |
25-Mar-2008 |
ru |
Replaced the misleading uses of a historical artefact M_TRYWAIT with M_WAIT. Removed dead code that assumed that M_TRYWAIT can return NULL; it's not true since the advent of MBUMA.
Reviewed by: arch
There are ongoing disputes as to whether we want to switch to directly using UMA flags M_WAITOK/M_NOWAIT for mbuf(9) allocation.
|
177575 |
24-Mar-2008 |
kmacy |
change inp_wlock_assert to inp_lock_assert
|
177536 |
24-Mar-2008 |
kmacy |
Label inp as unused in the non-INVARIANTS case
|
177530 |
23-Mar-2008 |
kmacy |
Insulate inpcb consumers outside the stack from the lock type and offset within the pcb by adding accessor functions.
Reviewed by: rwatson MFC after: 3 weeks
|
177382 |
19-Mar-2008 |
piso |
Make the newpacket size explicit.
Bug pointed out by: many Pointy hat to: me :(
|
177326 |
17-Mar-2008 |
piso |
Don't cache ptr to nat rule in case of tablearg argument.
Bug spotted by: Dyadchenko Mihail
|
177323 |
17-Mar-2008 |
piso |
Don't abuse stack space while in kernel land, use heap instead.
|
177300 |
17-Mar-2008 |
rwatson |
Fix indentation for a closing brace in in_pcballoc().
MFC after: 3 days
|
177175 |
14-Mar-2008 |
bz |
Correct IPsec behaviour with a 'use' level in SP but no SA available. In that case return and continue processing the packet without IPsec.
PR: 121384 MFC after: 5 days Reported by: Cyrus Rahman (crahman gmail.com) Tested by: Cyrus Rahman (crahman gmail.com) [slightly older version]
|
177098 |
12-Mar-2008 |
piso |
- Don't pass down the entire pkt to ProtoAliasIn, ProtoAliasOut, FragmentIn and FragmentOut.
- Axe the old PacketAlias API: it has been deprecated since 5.x.
|
176978 |
09-Mar-2008 |
bz |
Padding after EOL option must be zeros according to RFC793 but the NOPs used are 0x01. While we could simply pad with EOLs (which are 0x00), rather use an explicit 0x00 constant there to not confuse people with 'EOL padding'. Put in a comment saying just that.
Problem discussed on: src-committers with andre, silby, dwhite as follow up to the rev. 1.161 commit of tcp_var.h. MFC after: 11 days
|
176884 |
06-Mar-2008 |
piso |
MFP4: restrict the utilization of direct pointers to the content of ip packet. These modifications are functionally nop()s thus can be merged with no side effects.
|
176805 |
04-Mar-2008 |
rpaulo |
Change the default port range for outgoing connections by introducing IPPORT_EPHEMERALFIRST and IPPORT_EPHEMERALLAST with values 10000 and 65535 respectively. The rationale is that it makes the attacker's life more difficult if he/she wants to guess the ephemeral port range and also lowers the probability of a port collision (described in draft-ietf-tsvwg-port-randomization-01.txt).
While there, remove code duplication in in_pcbbind_setup().
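As a rough illustration of the range introduced above, the sketch below counts the ports in it and draws one at random. The two constants take their values from the message; the helper names and the use of a uniform random draw are assumptions for illustration only and do not describe how in_pcbbind_setup() selects ports.

```python
import random

IPPORT_EPHEMERALFIRST = 10000   # value from the commit message
IPPORT_EPHEMERALLAST = 65535    # value from the commit message

def ephemeral_port_count(first: int = IPPORT_EPHEMERALFIRST,
                         last: int = IPPORT_EPHEMERALLAST) -> int:
    """Number of ports in the inclusive range [first, last]."""
    return last - first + 1

def pick_ephemeral_port(rng: random.Random,
                        first: int = IPPORT_EPHEMERALFIRST,
                        last: int = IPPORT_EPHEMERALLAST) -> int:
    """Uniform random port in [first, last] (illustrative only)."""
    return rng.randint(first, last)
```

The range holds 55536 ports, so a blind guess at a single connection's source port has a 1 in 55536 chance under this uniform-draw assumption.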
Submitted by: Fernando Gont <fernando at gont.com.ar> Approved by: njl (mentor) Reviewed by: silby, bms Discussed on: freebsd-net
|
176778 |
03-Mar-2008 |
piso |
When unloading kld, don't forget to flush the nat pointers.
|
176765 |
03-Mar-2008 |
piso |
Raise a bit ipfw kld priority.
Discussed on: net-, ipfw-.
|
176736 |
02-Mar-2008 |
bz |
Some "cleanup" of tcp_mss():
- Move the assignment of the socket down before we first need it. No need to do it at the beginning and then drop out of the function by one of the returns before using it 100 lines further down.
- Use t_maxopd which was assigned the "tcp_mssdflt" for the correct AF already instead of another #ifdef ? : #endif block doing the same.
- Remove an unneeded (duplicate) assignment of mss to t_maxseg just before we possibly change mss and re-do the assignment without using t_maxseg in between.
Reviewed by: silby No objections: net@ (silence) MFC after: 5 days
|
176716 |
01-Mar-2008 |
bz |
Fix indentation (whitespace changes only).
MFC after: 6 days
|
176669 |
29-Feb-2008 |
piso |
Move ipfw's nat code into its own kld: ipfw_nat.
|
176626 |
27-Feb-2008 |
dwmalone |
Dummynet has a limit of 100 slots queue size (or 1MB, if you give the limit in bytes) hard coded into both the kernel and userland. Make both these limits a sysctl, so it is easy to change the limit. If the userland part of ipfw finds that the sysctls don't exist, it will just fall back to the traditional limits.
(100 packets is quite a small limit these days. If you want to test TCP at 100Mbps, 100 packets can only accommodate a DBP of 12ms.)
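The 12ms figure above can be reproduced with a short calculation. This sketch assumes 1500-byte packets (a typical Ethernet MTU); the message does not state a packet size, and the function names are hypothetical.

```python
import math

PACKET_BYTES = 1500  # assumption: full-size Ethernet frames

def queue_drain_time_ms(slots: int, link_bps: int,
                        packet_bytes: int = PACKET_BYTES) -> float:
    """Milliseconds needed to drain a full queue of `slots` packets."""
    queue_bits = slots * packet_bytes * 8
    return queue_bits * 1000 / link_bps

def slots_for_delay(delay_ms: float, link_bps: int,
                    packet_bytes: int = PACKET_BYTES) -> int:
    """Smallest queue size, in packets, that holds `delay_ms` of traffic."""
    bits_in_flight = link_bps * delay_ms / 1000
    return math.ceil(bits_in_flight / (packet_bytes * 8))
```

Under that assumption, `queue_drain_time_ms(100, 100_000_000)` gives 12.0, and covering a 50ms path at the same rate would need 417 slots.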
Note these sysctls in the man page and warn against increasing them without thinking first.
MFC after: 3 weeks
|
176517 |
24-Feb-2008 |
piso |
Add table/tablearg support to ipfw's nat.
MFC After: 1 week
|
176502 |
24-Feb-2008 |
silby |
Change FreeBSD 7 so that it returns TCP options in the same order that FreeBSD 6 and before did. Doug White and the other bloodhounds at ISC discovered that while FreeBSD 7's ordering of options was more efficient, it caused some cable modem routers to ignore the SYN-ACKs ordered in this fashion.
The placement of sackOK after the timestamp option seems to be the critical difference:
FreeBSD 6: <mss 1460,nop,wscale 1,nop,nop,timestamp 3512155768 0,sackOK,eol>
FreeBSD 7.0: <mss 1460,nop,wscale 3,sackOK,timestamp 1370692577 0>
FreeBSD 7.0 + this change: <mss 1460,nop,wscale 3,nop,nop,timestamp 7371813 0,sackOK,eol>
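The size difference between the orderings shown above can be checked with a small calculation. This is a sketch using the standard on-wire option lengths; the helper names are hypothetical, and the parser only handles the tcpdump-style strings quoted in this message.

```python
# On-wire length in bytes of each TCP option kind that appears above.
OPTION_LEN = {"mss": 4, "nop": 1, "wscale": 3,
              "sackOK": 2, "timestamp": 10, "eol": 1}
TCP_MAXOLEN = 40  # maximum space for options in a TCP header

def option_names(tcpdump_opts: str) -> list:
    """Extract option names from a string such as '<mss 1460,nop,eol>'."""
    return [item.split()[0] for item in tcpdump_opts.strip("<>").split(",")]

def options_length(names: list) -> tuple:
    """Return (raw, padded) byte counts; padded is rounded up to 4 bytes."""
    raw = sum(OPTION_LEN[name] for name in names)
    padded = (raw + 3) & ~3
    if padded > TCP_MAXOLEN:
        raise ValueError("options do not fit in the TCP header")
    return raw, padded

FREEBSD_6 = "<mss 1460,nop,wscale 1,nop,nop,timestamp 3512155768 0,sackOK,eol>"
FREEBSD_7_0 = "<mss 1460,nop,wscale 3,sackOK,timestamp 1370692577 0>"
```

The FreeBSD 6 ordering comes to 23 bytes, padded to 24, while the FreeBSD 7.0 ordering fits in exactly 20, which is the efficiency the message refers to.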
MFC after: 1 week
|
176464 |
22-Feb-2008 |
rrs |
Fixes a memory leak when VRF's are in play.
Submitted by: Prasad Narasimha (snprasad@cisco.com) Reviewed by: rrs
|
176463 |
22-Feb-2008 |
rrs |
- Takes out stray ifdef code that should not have been present.
|
176093 |
07-Feb-2008 |
glebius |
If the vhid is already present, return EEXIST instead of non-informative EINVAL.
|
176086 |
07-Feb-2008 |
glebius |
Remove unused structure member from struct in_ifadown_arg.
|
176042 |
06-Feb-2008 |
silby |
Replace the random IP ID generation code we obtained from OpenBSD with an algorithm suggested by Amit Klein. The OpenBSD algorithm has a few flaws; see Amit's paper for more information.
For a description of how this algorithm works, please see the comments within the code.
Note that this commit does not yet enable random IP ID generation by default. There are still some concerns that doing so will adversely affect performance.
Reviewed by: rwatson MFC After: 2 weeks
|
175892 |
02-Feb-2008 |
bz |
Rather than passing around a cached 'priv', pass in an ucred to ipsec*_set_policy and do the privilege check only if needed.
Try to assimilate both ip*_ctloutput code blocks calling ipsec*_set_policy.
Reviewed by: rwatson
|
175845 |
31-Jan-2008 |
rwatson |
Correct two problems relating to sorflush(), which is called to flush read socket buffers in shutdown() and close():
- Call socantrcvmore() before sblock() to dislodge any threads that might be sleeping (potentially indefinitely) while holding sblock(), such as a thread blocked in recv().
- Flag the sblock() call as non-interruptible so that a signal delivered to the thread calling sorflush() doesn't cause sblock() to fail. The sblock() is required to ensure that all other socket consumer threads have, in fact, left, and do not enter, the socket buffer until we're done flushing it.
To implement the latter, change the 'flags' argument to sblock() to accept two flags, SBL_WAIT and SBL_NOINTR, rather than one M_WAITOK flag. When SBL_NOINTR is set, it forces a non-interruptible sx acquisition, regardless of the setting of the disposition of SB_NOINTR on the socket buffer; without this change it would be possible for another thread to clear SB_NOINTR between when the socket buffer mutex is released and sblock() is invoked.
Reviewed by: bz, kmacy Reported by: Jos Backus <jos at catnook dot com>
|
175752 |
28-Jan-2008 |
rrs |
- Fix a comment about prison.
- Fix it so the VRF is captured while locks are held.
MFC after: 1 week
|
175751 |
28-Jan-2008 |
rrs |
- Change back to using priority 0, which means don't change the priority when running the thread (this is for the sctp_iterator thread).
MFC after: 1 week
|
175750 |
28-Jan-2008 |
rrs |
- Fix a bug where the socket may have been closed which could cause a crash in the auth code. Obtained from: Michael Tuexen MFC after: 1 week
|
175748 |
28-Jan-2008 |
rrs |
- Fixes a comparison wrap issue with sack gap ack blocks that span the 32 bit roll over mark.
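A comparison that survives the 32-bit rollover is usually written with serial number arithmetic. The sketch below illustrates that general technique; it is not the code from this commit, and the names `seq_gt` and `MASK32` are hypothetical.

```python
MASK32 = 0xFFFFFFFF
HALF_RANGE = 0x80000000

def seq_gt(a: int, b: int) -> bool:
    """True if 32-bit sequence number a comes after b, allowing for wrap."""
    diff = (a - b) & MASK32
    return diff != 0 and diff < HALF_RANGE
```

A plain `>` says 5 is not greater than 0xFFFFFFF0, while `seq_gt(5, 0xFFFFFFF0)` is true because 5 lies 21 steps past the rollover.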
|
175659 |
25-Jan-2008 |
rwatson |
Hide ipfw internal data structures behind IPFW_INTERNAL rather than exposing them to all consumers of ip_fw.h. These structures are used in both ipfw(8) and ipfw(4), but not part of the user<->kernel interface for other applications to use, rather, shared implementation.
MFC after: 3 days Reported by: Paul Vixie <paul at vix dot com>
|
175630 |
24-Jan-2008 |
bz |
Replace the last suser() calls in netinet6/ with privilege checks.
Introduce a new privilege allowing to set certain IP header options (hop-by-hop, routing headers).
Leave a few comments to be addressed later.
Reviewed by: rwatson (older version, before addressing his comments)
|
175626 |
24-Jan-2008 |
bz |
Differentiate between addifaddr and delifaddr for the privilege check.
Reviewed by: rwatson MFC after: 2 weeks
|
175612 |
23-Jan-2008 |
rwatson |
tcp_usrreq.c:1.313 removed tcbinfo locking from tcp_usr_accept(), which while in principle a good idea, opened us up to a race inherent to the syncache's direct insertion of incoming TCP connections into the "completed connection" listen queue, as it transpires that the socket is inserted before the inpcb is fully filled in by syncache_expand(). The bug manifested with the occasional returning of 0.0.0.0:0 in the address returned by the accept() system call, which occurred if accept managed to execute tcp_usr_accept() before syncache_expand() had copied the endpoint addresses into inpcb connection state.
Re-add tcbinfo locking around the address copyout, which has the effect of delaying the copy until syncache_expand() has finished running, as it is run while the tcbinfo lock is held. This is undesirable in that it increases contention on tcbinfo further, but a more significant change will be required to how the syncache inserts new sockets in order to fix this and keep more granular locking here. In particular, either more state needs to be passed into sonewconn() so that pru_attach() can fill in the fields *before* the socket is inserted, or the socket needs to be inserted in the incomplete connection queue until it is actually ready to be used.
Reported by: glebius (and kris) Tested by: glebius
|
175438 |
18-Jan-2008 |
rwatson |
In tcp_ctloutput(), don't hold the inpcb lock over sooptcopyin(), rather, drop the lock and then re-acquire it, revalidating TCP connection state assumptions when we do so. This avoids a potential lock order reversal (and potential deadlock, although none have been reported) due to the inpcb lock being held over a page fault.
MFC after: 1 week PR: 102752 Reviewed by: bz Reported by: Václav Haisman <v dot haisman at sh dot cvut dot cz>
|
175025 |
31-Dec-2007 |
julian |
Don't duplicate the whole of arpresolve to arpresolve2 for the sake of two compares against 0. The negative effect of cache flushing is probably more than the gain by not doing the two compares (the value is almost certainly in register or at worst, cache). Note that the uses of m_freem() are in error cases and m_freem() handles NULL anyhow. So fast-path really isn't changed much at all.
|
174893 |
25-Dec-2007 |
oleg |
Workaround p->numbytes overflow, which can result in infinite loop inside dummynet module (prerequisite is using queues with "fat" pipe).
PR: kern/113548
|
174857 |
22-Dec-2007 |
rwatson |
When IPSEC fails to allocate policy state for an inpcb, and MAC is in use, free the MAC label on the inpcb before freeing the inpcb.
MFC after: 3 days Submitted by: tanyong <tanyong at ercist dot iscas dot ac dot cn>, zhouzhouyi
|
174775 |
19-Dec-2007 |
ru |
Fix bugs in the TCP syncache timeout code, including:
When system ticks are positive, for entries in the cache bucket, syncache_timer() ran on every tick (doing nothing useful) instead of the supposed 3, 6, 12, and 24 seconds later (when it's time to retransmit SYN,ACK).
When ticks are negative, syncache_timer() was scheduled for the too far future (up to ~25 days on systems with HZ=1000), no SYN,ACK retransmits were attempted at all, and syncache entries added in that period that correspond to non-established connections stay there forever.
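The numbers quoted above can be reproduced with a small model. This sketch assumes `ticks` is a signed 32-bit counter and that the retransmit schedule doubles from 3 seconds, as the listed values suggest; the helper names are hypothetical, and the wrap-safe comparison shows a common idiom rather than the code of this fix.

```python
def half_wrap_days(hz: int) -> float:
    """Days covered by 2**31 ticks at `hz` ticks per second."""
    return 2 ** 31 / hz / 86400

def retransmit_offsets(first: int = 3, count: int = 4) -> list:
    """Offsets in seconds between successive SYN,ACK retransmits (doubling)."""
    return [first * 2 ** n for n in range(count)]

def to_int32(x: int) -> int:
    """Wrap an integer into the signed 32-bit range."""
    x &= 0xFFFFFFFF
    return x - 2 ** 32 if x >= 2 ** 31 else x

def timer_expired(now_ticks: int, deadline_ticks: int) -> bool:
    """Wrap-safe check: compare the signed difference, not the raw values."""
    return to_int32(now_ticks - deadline_ticks) >= 0
```

With HZ=1000, `half_wrap_days(1000)` is about 24.9, matching the "~25 days" above, and comparing the signed difference keeps a deadline that has wrapped to a negative value from looking as if it had already passed.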
Only HEAD and RELENG_7 are affected.
Reviewed by: silby, kmacy (earlier version) Submitted by: Maxim Dounin, ru
|
174768 |
19-Dec-2007 |
kmacy |
Remove extraneous debug statements.
Noticed by: Andrey Chernov
|
174757 |
18-Dec-2007 |
kmacy |
Incorporate TCP offload hooks into core TCP code.
- Rename output routines tcp_gen_* -> tcp_output_*.
- Rename notification routines that turn into no-ops in the absence of TOE from tcp_gen_* -> tcp_offload_*.
- Fix some minor comment nits.
- Add a /* FALLTHROUGH */
Reviewed by: Sam Leffler, Robert Watson, and Mike Silbersack
|
174736 |
18-Dec-2007 |
rrs |
- sctp-iterator should run at PI_NET priority ...not 0.
MFC after: 1 week
|
174704 |
17-Dec-2007 |
kmacy |
Incorporate feedback since initial commit:
- rename tcp_ofld.[ch] to tcp_offload.[ch]
- document usage and locking conventions of the functions in the toe_usrreqs function vector
- document tcpcb, inpcb, and socket fields used by toe
- widen the listen interface into 2 functions
- rename DISABLE_TCP_OFFLOAD to TCP_OFFLOAD_DISABLE
- shrink conditional compilation to reduce the likelihood of bitrot
- replace sc->sc_toepcb checks in tcp_syncache.c with TOEPCB_ISSET
|
174703 |
17-Dec-2007 |
kmacy |
widen the routing event interface (arp update, redirect, and eventually pmtu change) into separate functions
revert previous commit's changes to arpresolve and add a new interface arpresolve2 which does arp resolution without an mbuf
|
174699 |
17-Dec-2007 |
kmacy |
Don't panic in arpresolve if we're given a null mbuf. We could insist that the caller just pass in an initialized mbuf even if it didn't have any data - but that seems rather contrived.
|
174651 |
16-Dec-2007 |
kmacy |
Update tod_connect call to reflect updated interface
|
174648 |
16-Dec-2007 |
kmacy |
Move arp update upcall to always be called for ARP replies - previous invocation would not always get called at the appropriate times
|
174642 |
16-Dec-2007 |
kmacy |
Update the toedev's connect interface to reflect the fact that the inpcb doesn't cache the rtentry in HEAD.
|
174636 |
16-Dec-2007 |
kmacy |
Add socket option for setting and retrieving the congestion control algorithm. The name used is to allow compatibility with Linux.
|
174623 |
15-Dec-2007 |
kmacy |
make naming prefixes consistent across tom_info
|
174569 |
13-Dec-2007 |
kmacy |
Fix error in previous commit - the style fix changed flag name without changing references to the flag
|
174560 |
12-Dec-2007 |
kmacy |
Fix style issues with initial TCP offload commit
Requested by: rwatson Submitted by: rwatson
|
174559 |
12-Dec-2007 |
kmacy |
add interface for allowing consumers to register for ARP updates, redirects, and path MTU changes
Reviewed by: silby
|
174558 |
12-Dec-2007 |
kmacy |
Add interface for tcp offload to syncache:
- make necessary changes to release offload resources when a syncache entry is removed before connection establishment
- disable checks for offloaded connection where insufficient information is available
Reviewed by: silby
|
174556 |
12-Dec-2007 |
kmacy |
Add driver independent interface to offload active established TCP connections
Reviewed by: silby
|
174545 |
12-Dec-2007 |
kmacy |
Remove spurious timestamp check. RFC 1323 explicitly states that timestamps MAY be transmitted if negotiated.
|
174479 |
09-Dec-2007 |
dwmalone |
If we are walking the IPv6 header chain and we hit an IPPROTO_NONE header, then don't try to pullup anything, because there is no next header if we hit IPPROTO_NONE. Set ulp to a non-NULL value so the search for an upper layer header terminates.
This is based on Pekka's diagnosis, but I chose a simpler fix.
PR: 115261 Submitted by: Pekka Savola <pekkas@netcore.fi> Reviewed by: mlaier MFC after: 2 weeks
|
174388 |
07-Dec-2007 |
kmacy |
Add padding for anticipated functionality:
- vimage
- TOE
- multiq
- host rtentry caching
Rename spare used by 80211 to if_llsoftc
Reviewed by: rwatson, gnn MFC after: 1 day
|
174387 |
07-Dec-2007 |
rrs |
- More fixes for lock misses on the transfer of data to the sent_queue. Sometimes I wonder why any code ever works :-)
- Fix the pad of the last mbuf routine. It was working improperly on non-4 byte aligned chunks, which could cause memory overruns.
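SCTP chunks are padded to a multiple of 4 bytes, which is the alignment the second item refers to. The sketch below shows the usual padding arithmetic; it is a general illustration rather than the routine fixed here, and the function names are hypothetical.

```python
def sctp_pad_len(chunk_len: int) -> int:
    """Bytes of padding needed to reach the next 4-byte boundary."""
    return (4 - chunk_len % 4) % 4

def padded_len(chunk_len: int) -> int:
    """Chunk length after padding to a multiple of 4 bytes."""
    return chunk_len + sctp_pad_len(chunk_len)
```

A 17-byte chunk needs 3 bytes of padding and a 16-byte chunk needs none; the outer `% 4` is what keeps the aligned case from adding a spurious 4 bytes.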
MFC after: 1 week
|
174348 |
06-Dec-2007 |
des |
Simpler version of the previous commit.
|
174323 |
06-Dec-2007 |
rrs |
- Optimize the initialization of the SB max variables.
- Missing lock when sending data and moving it to the outqueue.
- If a mbuf alloc fails during moving to outqueue the reassembly of the old mbuf chain was incorrect.
- some_taken becomes a counter in sctputil.c instead of a set to 1.
- Fix a panic to be only under INVARIANTS and have a proper recovery.
- msg_flags needed to be set to the value collected, not or'd.
MFC after: 1 week
|
174266 |
04-Dec-2007 |
rrs |
- More fixes for the non-blocking msg send, had the skip of the pre-block test incorrect. - Fix the initial buf calculation to be more friendly, calc is the same but we use different variable to make it easier amongst the different code versions.
MFC after: 1 week
|
174258 |
04-Dec-2007 |
rrs |
- Oops, signedness issue with one of the new vars (this is an issue mainly in apple but with the right -Wall it could affect us too).
MFC after: 1 week
|
174257 |
04-Dec-2007 |
rrs |
- Found a problem in non-blocking sends. When sending, once the locks are all unlocked to do the copies in, it's possible that other events could then raise the number of bytes outstanding, pushing it so that not all of the message would fit. This would then cause us to send only part of the message. This fix makes it so we keep a "reserved" amount that can be kept in mind when making calculations to send.
- rcv msg args with a NULL/NULL for to/tolen will return an error incorrectly for the 1-2-1 model.
- We were not doing 0 len return correctly and not setting cantrcv more correctly. Previously we "fixed" this area by taking out the socantrcv since we then could not get the data out. The correct fix is to still flag the socket but allow a by-pass route to continue to read until all data is consumed.
MFC after: 1 week
|
174256 |
04-Dec-2007 |
yar |
For the sake of convenience, print the name of the network interface IPv4 address duplication was detected on.
Idea by: marck
|
174248 |
04-Dec-2007 |
silby |
Fix SACK negotiation that was broken in rev 1.105.
Before this fix, FreeBSD would negotiate SACK on outgoing connections, but would always fail to negotiate it on incoming connections.
Discovered by: James Healy and Lawrence Stewart Submitted by: James Healy and Lawrence Stewart MFC after: 3 days
|
174171 |
02-Dec-2007 |
guido |
Consider the following situation:
1. A packet comes in that is to be forwarded
2. The destination of the packet is rewritten by some firewall code
3. The next link's MTU is too small
4. The packet has the DF bit set
Then the current code is such that instead of setting the next link's MTU in the ICMP error, ip_next_mtu() is called and a guess is sent as to which MTU is supposed to be tried next. This is because in this case ip_forward() is called with srcrt set to 1. In that case the ia pointer remains NULL but it is needed to get the MTU of the interface the packet is to be sent out from. Thus, we always set ia to the outgoing interface.
MFC after: 2 weeks
|
174120 |
30-Nov-2007 |
bz |
Centralize and correct computation of TCP-MD5 signature offset within the packet (tcp header options field).
Reviewed by: tools/regression/netinet/tcpconnect MFC after: 3 days Tested by: Nick Hilliard (see net@)
|
174119 |
30-Nov-2007 |
bz |
Move call to tcp_signature_compute() after we adjusted the payload offset in the tcp header. With relevant parts of the tcp header changing after the 'signature' was computed, the signature becomes invalid.
Reviewed by: tools/regression/netinet/tcpconnect MFC after: 3 days Tested by: Nick Hilliard (see net@)
|
174023 |
28-Nov-2007 |
bz |
Let opt be an array. Though &opt[0] == opt == &opt, &opt is highly confusing and hard to understand so change it to just opt and remove the extra cast no longer/not needed.
Discussed with: rwatson MFC after: 3 days
|
174022 |
28-Nov-2007 |
bz |
Correctly get the authentication key for TCP-MD5 from the SA.
Submitted by: Nick Hilliard on net@ MFC after: 8 weeks
|
173884 |
24-Nov-2007 |
rwatson |
More carefully handle various cases in sysctl_drop(), such as unlocking the inpcb when there's an inpcb without associated timewait state, and not unlocking when the inpcb has been freed. This avoids a kernel panic when tcpdrop(8) is run on a socket in the TIMEWAIT state.
MFC after: 3 days Reported by: Rako <rako29 at gmail dot com>
|
173874 |
23-Nov-2007 |
jb |
Fix strict alias warnings.
|
173835 |
21-Nov-2007 |
bz |
Make TSO work with IPSEC compiled into the kernel.
The lookup hurts a bit for connections but had been there anyway if IPSEC was compiled in. So moving the lookup up a bit gives us TSO support at no extra cost.
PR: kern/115586 Tested by: gallatin Discussed with: kmacy MFC after: 2 months
|
173771 |
20-Nov-2007 |
silby |
Comment out the syncache's test which ensures that hosts which negotiate TCP timestamps in the initial SYN packet actually use them in the rest of the connection. Unfortunately, during the 7.0 testing cycle users have already found network devices that violate this constraint.
RFC 1323 states 'and may send a TSopt in other segments' rather than 'and MUST send', so we must allow it.
Discovered by: Rob Zietlow Tracked down by: Kip Macy PR: bin/118005
|
173706 |
17-Nov-2007 |
oleg |
- New sysctl variable: net.inet.ip.dummynet.io_fast. If it is set to zero (the default), the dummynet module will try to emulate a real link as closely as possible (bandwidth and latency): a packet will not leave the pipe faster than it would on a real link with the given bandwidth. (This is the original behaviour of dummynet, which was altered in the previous commit.) If it is set to a non-zero value, only bandwidth is enforced: a packet's latency can be lower than on a real link with the given bandwidth.
- Document recently introduced dummynet(4) sysctl variables.
Requested by: luigi, julian MFC after: 3 months
|
173509 |
10-Nov-2007 |
rrs |
- Fix a bug in sctp_calc_rwnd() which resulted in wrong rwnd predictions. - Fix a signedness problem that shows up in some 64 bit platforms (macos).
MFC after: 1 week
|
173399 |
06-Nov-2007 |
oleg |
1) dummynet_io() declaration has changed.
2) Alter packet flow inside dummynet: allow certain packets to bypass the dummynet scheduler. Benefits are:
- lower latency: if packet flow does not exceed pipe bandwidth, packets will not be delayed by up to one tick (due to the granularity of dummynet's scheduler).
- lower overhead: if a packet avoids the dummynet scheduler it shouldn't reenter the IP stack later. Such packets can be fastforwarded.
- recursion (which can lead to kernel stack exhaustion) eliminated. This fixes a long-standing panic, which can be triggered this way:
kldload dummynet
sysctl net.inet.ip.fw.one_pass=0
ipfw pipe 1 config bw 0
for i in `jot 30`; do ipfw add 1 pipe 1 icmp from any to any; done
ping -c 1 localhost
3) Three new sysctl nodes are added:
net.inet.ip.dummynet.io_pkt - packets passed to dummynet
net.inet.ip.dummynet.io_pkt_fast - packets that avoided the dummynet scheduler
net.inet.ip.dummynet.io_pkt_drop - packets dropped by dummynet
P.S. Above comments are true only for layer 3 packets. Layer 2 packet flow is not changed yet.
MFC after: 3 months
|
173398 |
06-Nov-2007 |
oleg |
style(9) cleanup.
MFC after: 3 months
|
173179 |
30-Oct-2007 |
rrs |
- Change the Time Wait of vtags value to match the cookie-life
- Select a tag gains ability to optionally save new tags off in the timewait system.
- When looking up associations do not give back a stcb that is in the about-to-be-freed state, and instead continue looking for other candidates.
- New function to query to see if value is in time-wait.
- Timewait had a time comparison error that caused very few vtags to actually stay in time-wait.
- When setting tags in time-wait, we now use the time requested NOT a fixed constant value.
- sstat now gets the proper associd when we do the query.
- When we process an association, we expect the tag chosen (if we have one from a cookie) to be in time-wait. Before we would NOT allow the assoc up by checking if its good. In theory this should have caused almost all assoc not to come up except for the time-comparison bug above (this bug was hidden by the time comparison bug :-D).
- Don't save tags for nonce values in the time-wait cache since these are used only during cookie collisions and do not matter if they are unique or not.
MFC after: 1 week
|
173102 |
28-Oct-2007 |
rwatson |
Continue to move from generic network entry points in the TrustedBSD MAC Framework by moving from mac_mbuf_create_netlayer() to more specific entry points for specific network services:
- mac_netinet_firewall_reply() to be used when replying to in-bound TCP segments in pf and ipfw (etc).
- Rename mac_netinet_icmp_reply() to mac_netinet_icmp_replyinplace() and add mac_netinet_icmp_reply(), reflecting that in some cases we overwrite a label in place, but in others we apply the label to a new mbuf.
Obtained from: TrustedBSD Project
|
173095 |
28-Oct-2007 |
rwatson |
Move towards more explicit support for various network protocol stacks in the TrustedBSD MAC Framework:
- Add mac_atalk.c and add explicit entry point mac_netatalk_aarp_send() for AARP packet labeling, rather than using a generic link layer entry point.
- Add mac_inet6.c and add explicit entry point mac_netinet6_nd6_send() for ND6 packet labeling, rather than using a generic link layer entry point.
- Add explicit entry point mac_netinet_arp_send() for ARP packet labeling, and mac_netinet_igmp_send() for IGMP packet labeling, rather than using a generic link layer entry point.
- Remove previous generic link layer entry point, mac_mbuf_create_linklayer() as it is no longer used.
- Add implementations of new entry points to various policies, largely by replicating the existing link layer entry point for them; remove old link layer entry point implementation.
- Make MAC_IFNET_LOCK(), MAC_IFNET_UNLOCK(), and mac_ifnet_mtx global to the MAC Framework rather than static to mac_net.c as it is now needed outside of mac_net.c.
Obtained from: TrustedBSD Project
|
173018 |
26-Oct-2007 |
rwatson |
Rename 'mac_mbuf_create_from_firewall' to 'mac_netinet_firewall_send' as we move towards netinet as a pseudo-object for the MAC Framework.
Rename 'mac_create_mbuf_linklayer' to 'mac_mbuf_create_linklayer' to reflect general object-first ordering preference.
Sponsored by: SPARTA (original patches against Mac OS X) Obtained from: TrustedBSD Project, Apple Computer
|
172970 |
25-Oct-2007 |
rwatson |
Normalize TCP syncache-related MAC Framework entry points to match most other entry points in the form mac_<object>_method().
Discussed with: csjp Obtained from: TrustedBSD Project
|
172930 |
24-Oct-2007 |
rwatson |
Merge first in a series of TrustedBSD MAC Framework KPI changes from Mac OS X Leopard--rationalize naming for entry points to the following general forms:
mac_<object>_<method/action> mac_<object>_check_<method/action>
The previous naming scheme was inconsistent and mostly reversed from the new scheme. Also, make object types more consistent and remove spaces from object types that contain multiple parts ("posix_sem" -> "posixsem") to make mechanical parsing easier. Introduce a new "netinet" object type for certain IPv4/IPv6-related methods. Also simplify, slightly, some entry point names.
All MAC policy modules will need to be recompiled, and modules not updated as part of this commit will need to be modified to conform to the new KPI.
Sponsored by: SPARTA (original patches against Mac OS X) Obtained from: TrustedBSD Project, Apple Computer
|
172836 |
20-Oct-2007 |
julian |
Rename the kthread_xxx (e.g. kthread_create()) calls to kproc_xxx as they actually make whole processes. This makes way for us to add REAL kthread_create() and friends that actually make threads. It turns out that most of these calls actually end up being moved back to the thread version when it's added, but we need to make this cosmetic change first.
I'd LOVE to do this rename in 7.0 so that we can eventually MFC the new kthread_xxx() calls.
|
172800 |
19-Oct-2007 |
rpaulo |
Remove IPTOS_CE and IPTOS_ECT constants. They were defined in RFC 2481 but later obsoleted by RFC 3168. Discussed on freebsd-net with no objections.
Approved by: njl (mentor), rwatson
|
172795 |
19-Oct-2007 |
silby |
Pick the smallest possible TCP window scaling factor that will still allow us to scale up to sb_max, aka kern.ipc.maxsockbuf.
We do this because there are broken firewalls that will corrupt the window scale option, leading to the other endpoint believing that our advertised window is unscaled. At scale factors larger than 5 the unscaled window will drop below 1500 bytes, leading to serious problems when traversing these broken firewalls.
With the default maxsockbuf of 256K, a scale factor of 3 will be chosen by this algorithm. Those who choose a larger maxsockbuf should watch out for the compatibility problems mentioned above.
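The selection rule and both figures quoted above can be checked with a short sketch. It assumes the rule is "the smallest shift for which the largest 16-bit window, scaled, reaches sb_max"; the function names are hypothetical, and the constants mirror the 16-bit window field and the maximum shift of 14 from RFC 1323.

```python
TCP_MAXWIN = 65535        # largest value of the 16-bit window field
TCP_MAX_WINSHIFT = 14     # largest window shift allowed by RFC 1323

def pick_window_scale(sb_max: int) -> int:
    """Smallest shift such that (TCP_MAXWIN << shift) covers sb_max."""
    scale = 0
    while scale < TCP_MAX_WINSHIFT and (TCP_MAXWIN << scale) < sb_max:
        scale += 1
    return scale

def window_seen_without_scaling(scale: int, window: int = TCP_MAXWIN) -> int:
    """Window a peer believes we advertise if the scale option is lost."""
    return window >> scale
```

`pick_window_scale(256 * 1024)` returns 3, as the message says. For a real window of about 64 KB, a peer that ignores the scale sees 2047 bytes at a shift of 5 but only 1023 at a shift of 6, which is below a 1500-byte packet.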
Reviewed by: andre
|
172703 |
16-Oct-2007 |
rrs |
- fix sctp_ifn initial refcount issue (prevents deletion)
- fix a bug during cookie collision that prevented an association from coming up in a specific restart case.
- Fix it so the shutdown-pending flag gets removed (this is more for correctness than needed) when we enter shutdown-sent or shutdown-ack-sent states.
- Fix a bug that caused the receiver to sometimes NOT send a SACK when a duplicate TSN arrived. Without this fix it was possible for the association to fall down if the
- Deleted primary destination is also stored when SCTP_MOBILITY_BASE. (Previously, it is stored when only SCTP_MOBILITY_FASTHANDOFF)
- Fix a locking issue where we might call send_initiate_ack() and incorrectly state the lock held/not held. Also fix it so that when we release the lock the inp cannot be deleted on us.
- Add the debug option that can cause the stack to panic instead of aborting an assoc. This does not and should never show up in options but is useful for debugging unexpected aborts.
- Add cumack_log sent to track sending cumack information for the debug case where we are running a special log per assoc.
- Added extra () around sctp_sbspace macro to avoid compile warnings.
MFC after: 1 week
|
172568 |
12-Oct-2007 |
kevlo |
Spelling fix for interupt -> interrupt
|
172467 |
07-Oct-2007 |
silby |
Add FBSDID to all files in netinet so that people can more easily include file version information in bug reports.
Approved by: re (kensmith)
|
172464 |
07-Oct-2007 |
silby |
Improve the debugging message:
TCP: [X.X.X.X]:X to [X.X.X.X]:X tcpflags 0x18<PUSH,ACK>; tcp_do_segment: FIN_WAIT_2: Received data after socket was closed, sending RST and removing tcpcb
So that it also includes how many bytes of data were received. It now looks like this:
TCP: [X.X.X.X]:X to [X.X.X.X]:X tcpflags 0x18<PUSH,ACK>; tcp_do_segment: FIN_WAIT_2: Received X bytes of data after socket was closed, sending RST and removing tcpcb
Approved by: re (gnn)
|
172458 |
06-Oct-2007 |
rrs |
- Fix the one-2-one model to properly do a socantrecv()
Approved by: re@freeBSD.org (Ken Smith)
|
172454 |
05-Oct-2007 |
rwatson |
Disable TCP syncache debug logging by default. While useful in debugging problems with the syncache, it produces a lot of console noise and has led to quite a few false positive bug reports. It can be selectively re-enabled when debugging specific problems by frobbing the same sysctl.
Discussed with: silby Approved by: re (gnn)
|
172437 |
04-Oct-2007 |
rrs |
- We should return error = 0 and the upper processing would return a zero length read. Otherwise we don't return the right error indication.
Approved by: re@freebsd.org (gnn)
|
172396 |
01-Oct-2007 |
rrs |
- Bug fix managing congestion parameter on immediate retransmission by handover event (fast mobility code)
- Fixed problem of mobility code which is caused by remaining parameters in the deleted primary destination.
- Add a missing lock. When a peer sends an INIT, and while we are processing it to send an INIT-ACK the socket is closed, we did not hold a lock to keep the socket from going away. Add protection for this case.
- Fix so that arwnd always uses the minimal rwnd if the user has set the socket buffer smaller. Found this when the test org decided to see what happens when you set in a rwnd of 10 bytes (which is not allowed per RFC .. 4k is minimum).
- Fixes so a cookie-echo ootb will NOT cause an abort to be sent. This was happening in a MPI collision case.
- Examined all panics and unless there was no recovery, moved any that were not already to INVARIANTS.
Approved by: re@freebsd.org (gnn)
|
172387 |
29-Sep-2007 |
maxim |
o For dynamic rules log a parent rule number. Prefix a log message by 'ipfw: '.
PR: kern/115755 Submitted by: sem Approved by: re (gnn) MFC after: 4 weeks
|
172312 |
24-Sep-2007 |
kib |
Revert rev. 1.94. After recent tcp backouts, tcp_close() may return NULL. Check the return value of tcp_close() being NULL before dereferencing it in #ifdef TCPDEBUG block.
Reviewed by: rwatson Approved by: re (gnn)
|
172309 |
24-Sep-2007 |
silby |
Two changes:
- Reintegrate the ANSI C function declaration change from tcp_timer.c rev 1.92
- Reorganize the tcpcb structure so that it has a single pointer to the "tcp_timer" structure which contains all of the tcp timer callouts. This change means that when the single tcp timer change is reintegrated, tcpcb will not change in size, and therefore the ABI between netstat and the kernel will not change.
Neither of these changes should have any functional impact.
Reviewed by: bmah, rrs Approved by: re (bmah)
|
172307 |
23-Sep-2007 |
csjp |
Certain consumers of rtalloc like gif(4) and if_stf(4) lookup the route and once they are done with it, call rtfree(). rtfree() should only be used when we are certain we hold the last reference to the route. This bug results in console messages like the following:
rtfree: 0xc40f7000 has 1 refs
This patch switches the rtfree() to use RTFREE_LOCKED() instead, which should handle the reference counting on the route better.
Approved by: re@ (gnn) Reviewed by: bms Reported by: many via net@ and current@ Tested by: many
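The difference between the two release paths can be sketched with a toy reference count. This is only a model under stated assumptions: the class and function names are hypothetical, locking is omitted, and the exact behaviour of rtfree() and RTFREE_LOCKED() is not reproduced here.

```python
class Route:
    """Toy stand-in for a route entry with a reference count."""

    def __init__(self, refs: int):
        self.refs = refs
        self.freed = False


def free_expecting_last_ref(rt: Route):
    """Drop one reference and free; complain if other holders remain."""
    rt.refs -= 1
    if rt.refs > 0:
        # The caller was not the last holder, so nothing is freed.
        return f"route still has {rt.refs} refs"
    rt.freed = True
    return None


def release_ref(rt: Route):
    """Drop one reference; only take the free path for the last holder."""
    if rt.refs <= 1:
        return free_expecting_last_ref(rt)
    rt.refs -= 1
    return None
```

A consumer that looked up a shared route and then calls `free_expecting_last_ref` gets the warning, while `release_ref` quietly drops its own reference and frees the route only when the count reaches zero.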
|
172266 |
21-Sep-2007 |
rrs |
- fix (global) address handling in the presence of duplicates, the last interface should own the address, but the current code fumbles the handoff. This fixes that.
- move address related debugs to PCB4 and add additional ones to help in debugging address problems.
Approved by: re@freebsd.org (K Smith)
|
172218 |
18-Sep-2007 |
rrs |
- The address lock is changed to a rwlock. This also involves macro changes to have a RLOCK and a WLOCK and placing the correct version within the code.
- The INP-INFO lock is changed to a rwlock.
- When sctp_shutdown() is called on Mac OS X, the socket lock is held. So call sctp_chunk_output with SCTP_SO_LOCKED and not SCTP_SO_NOT_LOCKED.
- Add SCTP_IPI_ADDR_[RW]LOCK and SCTP_IPI_ADDR_[RW]UNLOCK for Mac OS X.
- u_int64_t -> uint64_t
- add missing addr unlock for error return path
Approved by: re@freebsd.org (K Smith)
|
172203 |
16-Sep-2007 |
rrs |
- For the 1-to-1 model, fix an off by one error that allowed an extra connection over the backlog (by one)
Approved by: re@freebsd.org (B. Mah)
|
172190 |
15-Sep-2007 |
rrs |
- Get rid of unused constants for sysctl variables.
- Fix panic from mutex unlock on freed lock when ASCONF-ACK aborts an assoc
- Fix panic from addr lock recursion when ASCONFs are queued in the front states
- ASCONFs "queued" in the front states should really be bundled after the COOKIE-ACK, not in front of it
- Fix issue with addresses deleted in the front states from being sent with ASCONF(DELETE); replaced sctp_asconf_queue_add_sa() with delete specific function
- Comment change in sctp.h: the drafts are now RFCs
Approved by: re@freebsd.org (B Mah)
|
172157 |
13-Sep-2007 |
rrs |
- DF bit was on for COOKIE-ECHO chunks. This is incorrect and should be OFF letting IP fragment large cookie-echos.
- Rename sysctl variable logging to log_level.
- Fix description of sysctl variable stats.
- Add sysctl variable log to make sctp_log readable via sysctl mechanism (this is by compile switch and targets non KTR platforms or when someone wants to do performance wise tracing).
- Removed debug code
Approved by: re@freebsd.org (B Mah)
|
172156 |
13-Sep-2007 |
rrs |
- Incorrect error EAGAIN returned for invalid send on a locked stream (using EEOR mode). Changed to EINVAL (in sctp_output.c)
- Static analysis comments added
- fix in mobility code to return a value (static analysis found).
- sctp6_notify function made visible instead of static (this is needed for Panda).
Approved by: re@freebsd.org (B Mah)
|
172137 |
10-Sep-2007 |
rrs |
- Removed debug code and more C++ style comments in the mobility code in sctp_asconf.c
Approved by: re@freebsd.org (B Mah)
|
172118 |
10-Sep-2007 |
rrs |
- Added some comments to tell where the htcp code comes from.
- Fix a LOR on Mac OS X: Do not hold an stcb lock when calling soisconnected for a socket which has the SS_INCOMP bit set on so_state.
- fix a comment to be non c++ style.
Approved by: re@freebsd.org (B Mah)
|
172116 |
10-Sep-2007 |
kensmith |
Make sure that either inp is NULL or we have obtained a lock on it before jumping to dropunlock to avoid a panic. While here move the calls to ipsec4_in_reject() and ipsec6_in_reject() so they are after we obtain the lock on inp.
Original patch to avoid panic: pjd Review of locking adjustments: gnn, sam Approved by: re (rwatson)
|
172114 |
10-Sep-2007 |
rwatson |
Further UDPv4 cleanup:
- Resort includes a bit.
- Correct typos and wording problems in comments.
- Rename udpcksum to udp_cksum to be consistent with other UDP-related configuration variables.
- Remove indirection of udp_notify through local notify variable in udp_ctlinput(), which is presumably due to copying and pasting from TCP, where multiple notify routines exist.
Approved by: re (kensmith)
|
172091 |
08-Sep-2007 |
rrs |
- send call has a reference to uio->uio_resid in the recent send code, but uio may be NULL on sendfile calls. Change to use sndlen variable. - EMSGSIZE is not being returned in non-blocking mode and needs a small tweak to look if the msg would ever fit when returning EWOULDBLOCK. - FWD-TSN has a bug in stream processing which could cause a panic. This is a follow on to the codenomicon fix. - PDAPI level 1 and 2 do not work unless the reader gets his returned buffer full. Fix so we can break out when at level 1 or 2. - Fix fast-handoff features to copy across properly on accepted sockets - Fix sctp_peeloff() system call when no true system call exists to screen arguments for errors. In cases where a real system call exists the system call itself does this. - Fix raddr leak in recent add-ip code change for bundled asconfs (even when non-bundled asconfs are received) - Make sure ipi_addr lock is held when walking global addr list. Need to change this lock type to a rwlock(). - Add don't wake flag on both input and output when the socket is closing. - When deleting an address verify the interface is correct before allowing the delete to process. This protects panda and unnumbered. - Clean up old sysctl stuff and get rid of the old Open/Net BSD structures. - Add a function to watch the ranges in the sysctl sets. - When appending in the reassembly queue, validate that the assoc has not gone to about to be freed. If so (in the middle) abort out. Note this especially effects MAC I think due to the lock/unlock they do (or with LOCK testing in place). - Netstat patch to get rid of warnings. - Make sure that no data gets queued to inactive/unconfirmed destinations. This especially effect CMT but also makes a impact on regular SCTP as well. - During init collision when we detect seq number out of sync we need to treat it like Case C and discard the cookie (no invarient needed here). - Atomic access to the random store. 
- When we declare a vtag good, we need to shove it into the time wait hash to prevent further use. When the tag is put into the assoc hash, we need to remove it from the twait hash (where it will surely be). This prevents duplicate tag assignments. - Move decr-ref count to better protect sysctl out of data. - ltrace error corrections in sctp6_usrreq.c - Add hook for interface up/down to be sent to us. - Make sysctl() exported structures independent of processor architecture. - Fix route and src addr cache clearing for delete address case. - Make sure address marked SCTP_DEL_IP_ADDRESS is never selected as src addr. - in icmp handling fixed so we actually look at the icmp codes to figure out what to do. - Modified mobility code. Reception of DELETE IP ADDRESS for a primary destination and SET PRIMARY for a new primary destination is used for retransmission trigger to the new primary destination. Also, in this case, destination of chunks in send_queue are changed to the new primary destination. - Fix so that we disallow sending by mbuf to ever have EEOR mode set upon it.
Approved by: re@freebsd.org (B Mah)
|
172090 |
08-Sep-2007 |
rrs |
- Locking compatibility changes. This involves adding additional flags to many function calls. The flags only get used in BSD when we compile with lock testing. These flags allow Apple to escape the "giant" lock it holds on the socket and have more fine-grained locking in the NKE. It also allows us to test (with witness) the locking used by Apple via a compile switch (manually applied).
Approved by: re@freebsd.org(B Mah)
|
172074 |
07-Sep-2007 |
rwatson |
Back out tcp_timer.c:1.93 and associated changes that reimplemented the many TCP timers as a single timer, but retain the API changes necessary to reintroduce this change. This will back out the source of at least two reported problems: lock leaks in certain timer edge cases, and TCP timers continuing to fire after a connection has closed (a bug previously fixed and then reintroduced with the timer rewrite).
In a follow-up commit, some minor restylings and comment changes performed after the TCP timer rewrite will be reapplied, and a further change to allow the TCP timer rewrite to be added back without disturbing the ABI. The new design is believed to be a good thing, but the outstanding issues are leading to significant stability/correctness problems that are holding up 7.0.
This patch was generated by silby, but is being committed by proxy due to poor network connectivity for silby this week.
Approved by: re (kensmith) Submitted by: silby Tested by: rwatson, kris Problems reported by: peter, kris, others
|
172006 |
29-Aug-2007 |
green |
Repair ALTQ-tagging rules in IPFW which got broken in the last PF import. The PF mbuf-tagging support routines changed to link the allocated tags into the provided mbuf themselves, so the left-over m_tag_prepend() was trying to add a bogus (usually NULL) tag.
Reviewed by: mlaier Approved by: re
|
171990 |
27-Aug-2007 |
rrs |
- During shutdown pending, when the last sack came in and the last message on the send stream was "null" but still there, a state we allow, we could get hung and not clean it up and wait for the shutdown guard timer to clear the association without a graceful close. Fix this so that that we properly clean up. - Added support for Multiple ASCONF per new RFC. We only (so far) accept input of these and cannot yet generate a multi-asconf. - Sysctl'd support for experimental Fast Handover feature. Always disabled unless sysctl or socket option changes to enable. - Error case in add-ip where the peer supports AUTH and ADD-IP but does NOT require AUTH of ASCONF/ASCONF-ACK. We need to ABORT in this case. - According to the Kyoto summit of socket api developers (Solaris, Linux, BSD). We need to have: o non-eeor mode messages be atomic - Fixed o Allow implicit setup of an assoc in 1-2-1 model if using the sctp_**() send calls - Fixed o Get rid of HAVE_XXX declarations - Done o add a sctp_pr_policy in hole in sndrcvinfo structure - Done o add a PR_SCTP_POLICY_VALID type flag - yet to-do in a future patch! - Optimize sctp6 calls to reuse code in sctp_usrreq. Also optimize when we close sending out the data and disabling Nagle. - Change key concatenation order to match the auth RFC - When sending OOTB shutdown_complete always do csum. - Don't send PKT-DROP to a PKT-DROP - For abort chunks just always checksums same for shutdown-complete. - inpcb_free front state had a bug where in queue data could wedge an assoc. We need to just abandon ones in front states (free_assoc). - If a peer sends us a 64k abort, we would try to assemble a response packet which may be larger than 64k. This then would be dropped by IP. Instead make a "minimum" size for us 64k-2k (we want at least 2k for our initack). If we receive such an init discard it early without all the processing. - When we peel off we must increment the tcb ref count to keep it from being freed from underneath us. 
- handling fwd-tsn had bugs that caused memory overwrites when given faulty data, fixed so can't happen and we also stop at the first bad stream no. - Fixed so comm-up generates the adaption indication. - peeloff did not get the hmac params copied. - fix it so we lock the addr list when doing src-addr selection (in future we need to use a multi-reader/one writer lock here) - During lowlevel output, we could end up with a _l_addr set to null if the iterator is calling the output routine. This means we would possibly crash when we gather the MTU info. Fix so we only do the gather where we have a src address cached. - we need to be sure to set abort flag on conn state when we receive an abort. - peeloff could leak a socket. Moved code so the close will find the socket if the peeloff fails (uipc_syscalls.c)
Approved by: re@freebsd.org(Ken Smith)
|
171989 |
26-Aug-2007 |
maxim |
o Fix bug I introduced in the previous commit (ipfw set extension): pack a set number correctly.
Submitted by: oleg
o Plug a memory leak.
Submitted by: oleg and Andrey V. Elsukov Approved by: re (kensmith) MFC after: 1 week
|
171943 |
24-Aug-2007 |
rrs |
- Fix address add handling to clear cached routes and source addresses when peer acks the add in case the routing table changes. - Fix sctp_lower_sosend to send shutdown chunk for mbuf send case when sndlen = 0 and sinfoflag = SCTP_EOF - Fix sctp_lower_sosend for SCTP_ABORT mbuf send case with null data, So that it does not send the "null" data mbuf out and cause it to get freed twice. - Fix so auto-asconf sysctl actually effect the socket's asconf state. - Do not allow SCTP_AUTO_ASCONF option to be used on subset bound sockets. - Memset bug in sctp_output.c (arguments were reversed) submitted found and reported by Dave Jones (davej@codemonkey.org.uk). - PD-API point needs to be invoked >= not just > to conform to socket api draft this fixes sctp_indata.c in the two places need to be >=. - move M_NOTIFICATION to use M_PROTO5. - PEER_ADDR_PARAMS did not fail properly if you specify an address that is not in the association with a valid assoc_id. This meant you got or set the stcb level values instead of the destination you thought you were going to get/set. Now validate if the stcb is non-null and the net is NULL that the sa_family is set and the address is unspecified otherwise return an error. - The thread based iterator could crash if associations were freed at the exact time it was running. rework the worker thread to use the increment/decrement to prevent this and no longer use the markers that the timer based iterator uses. - Fix the memleak in sctp_add_addr_to_vrf() for the case when it is detected that ifa is already pointing to a ifn. - Fix it so that if someone is so insane that they drop the send window below the minimal add mark, they still can send. - Changed all state for associations to use mask safe macro. - During front states in association freeing in sctp_inpcbfree, we had a locking problem where locks were not in place where they should have been. - Free association calls were not testing the return value in sctp_inpcb_free() properly... 
others should be cast void returns where we don't care about the return value. - If a reference count is held on an assoc, even from the "force free" we should not do the actual free.. but instead let the timer free it. - When we enter sctp_input(), if the SCTP_ASOC_ABOUT_TO_BE_FREED flag is set, we must NOT process the packet but handle it like ootb. This is because while freeing an assoc we release the locks to get all the higher order locks so we can purge all the hash tables. This leaves a hole if a packet comes in just at that point. Now sctp_common_input_processing() will call the ootb code in such a case. - Change MBUF M_NOTIFICATION to use M_PROTO5 (per Sam L). This makes it so we don't have a conflict (I think this is a covertity change). We made this change AFTER some conversation and looking to make sure that M_PROTO5 does not have a problem between SCTP and the 802.11 stuff (which is the only other place its used). - Fixed lock order reversal and missing atomic protection around locked_tcb during association lookup and the 1-2-1 model. - Added debug to source address selection. - V6 output must always do checksum even for loopback. - Remove more locks around inp that are not needed for an atomically added/subtracted ref count. - slight optimization in the way we zero the array in sctp_sack_check() - It was possible to respond to a ABORT() with bad checksum with a PKT-DROP. This lead to a PKT-DROP/ABORT war. Add code to NOT send a PKT-DROP to any ABORT(). - Add an option for local logging (useful for macintosh or when you need better performing during debugging). Note no commands are here to get the log info, you must just use kgdb. - The timer code needs to be aware of if it needs to call sctp_sack_check() to slide the maps and adjust the cum-ack. This is because it may be out of sync cum-ack wise. - Added threshold managment logging. 
- If the user picked just the right size, that just filled the send window minus one mtu, we would enter a forever loop not copying and at the same time not blocking. Change from < to <= solves this. - Sysctl added to control the fragment interleave level which defaults to 1. - My rwnd control was not being used to control the rwnd properly (we did not add and subtract to it :-() this is now fixed so we handle small messages (1 byte etc) better to bring our rwnd down more slowly.
Approved by: re@freebsd.org (Bruce Mah)
|
171858 |
16-Aug-2007 |
rrs |
- Remove extra comment for 7.0 (no GIANT here).
- Remove unneeded WLOCK/UNLOCK of inp for getting TCB lock.
- Fix panic that may occur when freeing an assoc that has partial delivery in progress (may dereference null socket pointer when queuing partial delivery aborted notification)
- Some spacing and comment fixes.
- Fix address add handling to clear cached routes and source addresses when peer acks the add in case the routing table changes.
Approved by: re@freebsd.org (Bruce Mah)
|
171857 |
16-Aug-2007 |
qingli |
Use the sequence number comparison macro to compare projected_offset against isn_offset to account for wrap around.
Reviewed by: gnn, kmacy, silby Submitted by: yusheng.huang@bluecoat.com Approved by: re MFC: 3 days
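Why a plain `<` is not enough for sequence-space values can be shown with a small sketch. It assumes 32-bit sequence numbers and the conventional signed-difference comparison; the helper names are illustrative and are not the kernel macros themselves.

```python
MASK32 = 0xFFFFFFFF
HALF32 = 0x80000000


def seq_lt(a: int, b: int) -> bool:
    """True if a precedes b in 32-bit sequence space.

    Equivalent to treating (a - b) as a signed 32-bit value and testing < 0.
    """
    return ((a - b) & MASK32) >= HALF32


def seq_gt(a: int, b: int) -> bool:
    """True if a follows b in 32-bit sequence space (signed difference > 0)."""
    return 0 < ((a - b) & MASK32) < HALF32
```

For an offset that has just wrapped, for example `a = 0x10` and `b = 0xFFFFFFF0`, the integer test `a > b` is false, while `seq_gt(a, b)` is true because `a` is only 0x20 ahead of `b` modulo 2^32.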
|
171746 |
06-Aug-2007 |
csjp |
Over the past couple of years, there have been a number of reports relating the use of divert sockets to deadlocks. A number of LORs have been reported between divert and a number of other network subsystems including: IPSEC, Pfil, multicast, ipfw and others. Other deadlocks could occur because of recursive entry into the IP stack. This change should take care of most if not all of these issues.
A summary of the changes follows:
- We disallow multicast operations on divert sockets. It really doesn't make semantic sense to allow this, since typically you would set multicast parameters on multicast end points.
NOTE: As a part of this change, we actually disallow multicast options on any socket that IS a divert socket OR IS NOT a SOCK_RAW or SOCK_DGRAM family
- We check to see if there are any socket options that have been specified on the socket, and if there are (which is very uncommon and also probably doesn't make sense to support) we duplicate the mbuf carrying the options.
- We then drop the INP/INFO locks over the call to ip_output(). It should be noted that since we no longer support multicast operations on divert sockets and we have duplicated any socket options, we no longer need the reference to the pcb to be coherent.
- Finally, we replaced the call to ip_input() to use netisr queuing. This should remove the recursive entry into the IP stack from divert.
By dropping the locks over the call to ip_output() we eliminate all the lock ordering issues above. By switching over to netisr on the inbound path, we can no longer recursively enter the ip_input() code via divert.
I have tested this change by using the following command:
ipfwpcap -r 8000 - | tcpdump -r - -nn -v
This should exercise the input and re-injection (outbound) path, which is very similar to the work load performed by natd(8). Additionally, I have run some ospf daemons which have a heavy reliance on raw sockets and multicast.
Approved by: re@ (kensmith) MFC after: 1 month LOR: 163 LOR: 181 LOR: 202 LOR: 203 Discussed with: julian, andre et al (on freebsd-net) In collaboration with: bms [1], rwatson [2]
[1] bms helped out with the multicast decisions [2] rwatson submitted the original netisr patches and came up with some of the original ideas on how to combat this issue.
|
171745 |
06-Aug-2007 |
rrs |
- change number assignments for SHA225-512 (match artisync for bakeoff.. using the next sequential ones)
- In cookie processing 1-2-1, we did not increment the stcb refcnt before releasing the tcb lock. We need to do this to keep the tcb from being freed by an abort or ?? unlikely but worth doing. Also get rid of unneeded INP_WLOCK.
- extra receive info included the rcvinfo which killed the padding/alignment. We now redefine all the fields properly so they both align properly to 128 bytes.
- A peeled off socket would not close without an error due to its misguided idea that sctp_disconnect() was not supported on it. This fixes it so it goes through the proper path.
- When an assoc was being deleted after abort (via a timer) a small race condition exists where we might take a packet for the old assoc (since we are waiting for a cleanup timer). This state especially happens in mac. We now add a state in the asoc so these can properly handle the packet as OOTB.
Approved by: re@freebsd.org(Ken Smith)
|
171744 |
06-Aug-2007 |
rwatson |
Remove the now-unused NET_{LOCK,UNLOCK,ASSERT}_GIANT() macros, which previously conditionally acquired Giant based on debug.mpsafenet. As that has now been removed, they are no longer required. Removing them significantly simplifies error-handling in the socket layer, eliminating quite a bit of unwinding of locking in error cases.
While here clean up the now unneeded opt_net.h, which previously was used for the NET_WITH_GIANT kernel option. Clean up some related gotos for consistency.
Reviewed by: bz, csjp Tested by: kris Approved by: re (kensmith)
|
171732 |
05-Aug-2007 |
bz |
Rename option IPSEC_FILTERGIF to IPSEC_FILTERTUNNEL. Also rename the related functions in a similar way. There are no functional changes.
For a packet coming in with IPsec tunnel mode, the default is to only call into the firewall with the "outer" IP header and payload.
With this option turned on, in addition to the "outer" parts, the "inner" IP header and payload are passed to the firewall too when going through ip_input() the second time.
The option was never only related to a gif(4) tunnel within an IPsec tunnel and thus the name was very misleading.
Discussed at: BSDCan 2007 Best new name suggested by: rwatson Reviewed by: rwatson Approved by: re (bmah)
|
171677 |
31-Jul-2007 |
peter |
Change TCPTV_MIN to be independent of HZ. While it was documented to be in ticks "for algorithm stability" when originally committed, it turns out that it has a significant impact in timing out connections. When we changed HZ from 100 to 1000, this had a big effect on reducing the time before dropping connections.
To demonstrate, boot with kern.hz=100. ssh to a box on local ethernet and establish a reliable round-trip-time (ie: type a few commands). Then unplug the ethernet and press a key. Time how long it takes to drop the connection.
The old behavior (with hz=100) caused the connection to typically drop between 90 and 110 seconds of getting no response.
Now boot with kern.hz=1000 (default). The same test causes the ssh session to drop after just 9-10 seconds. This is a big deal on a wifi connection.
With kern.hz=1000, change sysctl net.inet.tcp.rexmit_min from 3 to 30. Note how it behaves the same as when HZ was 100. Also, note that when booting with hz=100, net.inet.tcp.rexmit_min *used* to be 30.
This commit changes TCPTV_MIN to be scaled with hz. rexmit_min should always be about 30. If you set hz to Really Slow(TM), there is a safety feature to prevent a value of 0 being used.
This may be revised in the future, but for the time being, it restores the old, pre-hz=1000 behavior, which is significantly less annoying.
As a workaround, to avoid rebooting or rebuilding a kernel, you can run "sysctl net.inet.tcp.rexmit_min=30" and add "net.inet.tcp.rexmit_min=30" to /etc/sysctl.conf. This is safe to run from 6.0 onwards.
Approved by: re (rwatson) Reviewed by: andre, silby
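The arithmetic behind the change can be sketched as below. This is an illustration of the scaling idea only, assuming a 30 ms target and a floor of one tick; the names and the exact rounding are not taken from the committed code.

```python
REXMIT_MIN_MS = 30  # target floor from the text above ("should always be about 30")


def ticks_to_ms(ticks: int, hz: int) -> int:
    """Wall-clock duration of a tick count at a given kernel hz."""
    return ticks * 1000 // hz


def rexmit_min_ticks(hz: int) -> int:
    """Minimum retransmit timeout in ticks, scaled so it stays near 30 ms.

    The floor of one tick models the safety feature that prevents a value
    of 0 when hz is very low.
    """
    return max(hz * REXMIT_MIN_MS // 1000, 1)
```

With a fixed 3 ticks, `ticks_to_ms(3, 100)` is 30 but `ticks_to_ms(3, 1000)` is only 3, which is the regression described above. With scaling, `rexmit_min_ticks(100)` is 3 and `rexmit_min_ticks(1000)` is 30, both about 30 ms, and `rexmit_min_ticks(10)` is clamped to 1.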
|
171656 |
30-Jul-2007 |
des |
Make tcpstates[] static, and make sure TCPSTATES is defined before <netinet/tcp_fsm.h> is included into any compilation unit that needs tcpstates[]. Also remove incorrect extern declarations and TCPDEBUG conditionals. This allows kernels both with and without TCPDEBUG to build, and unbreaks the tinderbox.
Approved by: re (rwatson)
|
171652 |
29-Jul-2007 |
bmah |
Fix a typo in a log message: s/Reveived/Received/.
Approved by: re (rwatson)
|
171648 |
29-Jul-2007 |
mjacob |
Fix compilation problems: tcpstates is only available if TCPDEBUG is set.
Approved by: re (in spirit)
|
171643 |
28-Jul-2007 |
silby |
Fix a panic introduced in rev 1.126.
Approved by: re (rwatson)
|
171640 |
28-Jul-2007 |
andre |
Provide a sysctl to toggle reporting of TCP debug logging:
sys.net.inet.tcp.log_debug = 1
It defaults to enabled for the moment and is to be turned off for the next release like other diagnostics from development branches.
It is important to note that sysctl sys.net.inet.tcp.log_in_vain uses the same logging function as log_debug. Enabling of the former also causes the latter to engage, but not vice versa.
Use consistent terminology in tcp log messages:
"ignored" means a segment contains invalid flags/information and is dropped without changing state or issuing a reply.
"rejected" means a segment contains invalid flags/information but is causing a reply (usually RST) and may cause a state change.
Approved by: re (rwatson)
|
171639 |
28-Jul-2007 |
andre |
o Move setting/resetting logic of syncache timer from macro SYNCACHE_TIMEOUT to new function syncache_timeout().
o Fix inverted timeout callout engagement logic to actually enable the timer for the bucket row. Before SYN|ACK was not retransmitted.
o Simplify SYN|ACK retransmit timeout backoff calculation.
o Improve logging of retransmit and timeout events.
o Reset timeout when duplicate SYN arrives.
o Add comments.
o Rearrange SYN cookie statistics counting.
Bug found by: silby Submitted by: silby (different version) Approved by: re (rwatson)
|
171638 |
28-Jul-2007 |
andre |
o Move all detailed checks for RST in LISTEN state from tcp_input() to syncache_rst().
o Fix tests for flag combinations of RST and SYN, ACK, FIN. Before a RST for a connection in syncache did not properly free the entry.
o Add more detailed logging.
Approved by: re (rwatson)
|
171637 |
28-Jul-2007 |
rwatson |
Replace references to NET_CALLOUT_MPSAFE with CALLOUT_MPSAFE, and remove definition of NET_CALLOUT_MPSAFE, which is no longer required now that debug.mpsafenet has been removed.
The once over: bz Approved by: re (kensmith)
|
171605 |
27-Jul-2007 |
silby |
Export the contents of the syncache to netstat.
Approved by: re (kensmith) MFC after: 2 weeks
|
171591 |
25-Jul-2007 |
andre |
Fix comments in tcp_do_segment().
Approved by: re (kensmith)
|
171572 |
24-Jul-2007 |
rrs |
- take out a needless panic under invariants for sctp_output.c
- Fix addrs's error checking of sctp_sendx(3) when addrcnt is less than SCTP_SMALL_IOVEC_SIZE
- re-add inpcb_bind local address check bypass capability
- Fix it so sctp_opt_info is independent of assoc_id position.
- Fix cookie life set to use MSEC_TO_TICKS() macro.
- asconf changes
o More comment changes/clarifications related to the old local address "not" list which is now an explicit restricted list.
o Rename some functions for clarity:
- sctp_add/del_local_addr_assoc to xxx_local_addr_restricted()
- asconf related iterator functions to sctp_asconf_iterator_xxx()
o Fix bug when the same address is deleted and added (and removed from the asconf queue) where the ifa is "freed" twice refcount wise, possibly freeing it completely.
o Fix bug in output where the first ASCONF would not go out after the last address is changed (e.g. only goes out when retransmitted).
o Fix bug where multiple ASCONFs can be bundled in the same packet with the same serial numbers.
o Fix asconf stcb iterator to not send ASCONF until after all work queue entries have been processed.
o Change behavior so that when the last address is deleted (auto asconf on a bound all endpoint) no action is taken until an address is added; at that time, an ASCONF add+delete is sent (if the assoc is still up).
o Fix local address counting so that address scoping is taken into account.
o #ifdef SCTP_TIMER_BASED_ASCONF the old timer triggered sending of ASCONF (after an RTO). The default now is to send ASCONF immediately (except for the case of changing/deleting the last usable address).
Approved by: re(ken smith)@freebsd.org
|
171531 |
21-Jul-2007 |
rrs |
- remove duplicate code from sctp_asconf.c
- remove duplicate #include <sys/priv.h> that is not under #ifdef FreeBSD version to allow compile on 6.1
- static analysis changes per the cisco SA tool including:
o some SA_IGNORE comments
o some checks for NULL before unlock.
o type corrections int -> size_t
- Fix it so sctp_alloc_asoc takes a thread/proc argument. Without this we pass a NULL in to bind on implicit assoc setup and crash :-(
Approved by: re@freebsd.org(Ken Smith)
|
171508 |
19-Jul-2007 |
rwatson |
Attempt to improve feature parity between UDPv4 and UDPv6 by merging UDPv4 features to UDPv6:
- Add MAC checks on delivery and MAC labeling on transmit. - Check for (and reject) datagrams with destination port 0. - For multicast delivery, check the source port only if the socket being considered as a destination has been connected. - Implement UDP blackholing based on net.inet.udp.blackhole. - Add a new ICMPv6 unreachable reply rate limiting category for failed delivery attempts and implement rate limiting for UDPv6 (submitted by bz).
Approved by: re (kensmith) Reviewed by: bz
|
171477 |
17-Jul-2007 |
rrs |
- added pre-checks to the bindx call. - use proper tick gathering macro instead of ticks directly. - Placed reasonable boundaries on sets that a user can do that are converted to ticks from ms. - Fix CMT_PF to always check to be sure CMT is on. - Fix ticks use of CMT_PF. - put back code to allow asconfs to be queued while INITs are in flight and before the assoc is established. - During window probes, an ack'd packet might be left with the window probe mark on it causing it to be retransmitted. Change so that the flight decrease macro clears the window_probe mark. - Additional logging flight size/reading and ASOC LOG. This is only enabled if you manually insert things into opt_sctp.h since it's a set of debug code only. - Found an interesting SMP race in the way data was appended which could cause a reader to lose a part of a message, had to reorder when we marked the message was complete to after the data was appended. - bug in ADD-IP for the subset bound socket case when the peer has only one address - fix ASCONF implicit success/error handling case - proper support of jails in FreeBSD 6> - copy out the timeval for the 64 bit sparc world on cookie-echo (alignment error crashes without this). Approved by: re(Ken Smith)
|
171440 |
14-Jul-2007 |
rrs |
- Modular congestion control, with RFC2581 being the default. - CMT_PF states added (w/sysctl to turn the PF version on) - sctp_input.c had a missing incr of cookie case when the auth was bad. This meant a free was called without an increment to refcnt, added increment like rest of code. - There was a case, unlikely, when the scope of the destination changed (this is a TSNH case). In that case, it would not free the alloc'ed asoc (in sctp_input.c). - When listed addresses found a colliding cookie/Init, then the collided upon tcb was not unlocked in sctp_pcb.c - Add error checking on arguments of sctp_sendx(3) to prevent it from referencing a NULL pointer. - Fix an error return of sctp_sendx(3), it was returning ENOMEM not -1. - Get assoc id was changed to use the sanctified socket api method for getting an assoc id (PEER_ADDR_INFO instead of PEER_ADDR_PARAMS). - Fix it so a peeled off socket will get a proper error return if it tries to send to a different address than it is connected to. - Fix so that select_a_stream can avoid an endless loop that could hang a caller. - time_entered (state set time) was not being set in all cases to the time we went established. Approved by: re(ken smith)
|
171339 |
10-Jul-2007 |
rwatson |
Further cleanup of UDPv4:
- Move udp_sendspace and udp_recvspace global variables and associated sysctls to the top of the file where most other such things are present.
- Rename static variable 'blackhole' to 'udp_blackhole' and unstaticize so that we can add blackhole support for UDPv6 using the same MIB variable.
- Move udp_append() above udp_input() to match the function order in udp6_usrreq.c.
Approved by: re (kensmith)
|
171317 |
09-Jul-2007 |
bms |
Fix a regression in IPv4 multicast join path (IP_ADD_MEMBERSHIP).
With the in_mcast.c code, if an interface for an IPv4 multicast join was not specified, and a route did not exist for the specified group in the unicast forwarding tables, the join would be rejected with the error EADDRNOTAVAIL. This change restores the old behaviour whereby if no interface is specified, and no route exists for the group destination, the IPv4 address list is walked to find a non-loopback, multicast-capable interface to satisfy the join request. This should resolve problems with starting multicast services during system boot or when a default forwarding entry does not exist.
Approved by: re (rwatson)
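
The fallback described above (no interface given, no route for the group) amounts to a linear scan for the first usable address. A minimal sketch of that selection rule, assuming a simplified address record; `struct ex_ifaddr`, the `EX_IFF_*` flags and `pick_mcast_ifaddr()` are hypothetical names for illustration, not the kernel's in_mcast.c code:

```c
#include <stddef.h>

/* Hypothetical flag bits standing in for IFF_LOOPBACK / IFF_MULTICAST. */
#define EX_IFF_LOOPBACK		0x1
#define EX_IFF_MULTICAST	0x2

/* Hypothetical, simplified stand-in for an entry in the IPv4 address list. */
struct ex_ifaddr {
	const char	*name;
	unsigned int	 flags;
};

/*
 * Walk the address list and return the first entry that is not a
 * loopback interface and is multicast-capable, or NULL if none exists.
 */
const struct ex_ifaddr *
pick_mcast_ifaddr(const struct ex_ifaddr *list, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if ((list[i].flags & EX_IFF_LOOPBACK) != 0)
			continue;
		if ((list[i].flags & EX_IFF_MULTICAST) == 0)
			continue;
		return (&list[i]);
	}
	return (NULL);
}
```

A NULL result corresponds to the case where the join still has to fail because no suitable interface is configured.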
|
171290 |
07-Jul-2007 |
rwatson |
Minor UDPv4 cleanup: capitalize comment, move statistics update after mbuf free to be consistent with other error handling, and release socket buffer lock before freeing mbufs and statistics updates rather than after.
Approved by: re (kensmith)
|
171230 |
05-Jul-2007 |
peter |
Fix a second warning, introduced by my last "fix". I committed the wrong diff from the wrong machine.
Pointy hat to: peter Approved by: re (rwatson - blanket, several days ago)
|
171229 |
05-Jul-2007 |
peter |
Fix cast-qualifiers warning when INET6 is not present
Approved by: re (rwatson)
|
171173 |
03-Jul-2007 |
mlaier |
Link pf 4.1 to the build: - move ftp-proxy from libexec to usr.sbin - add tftp-proxy - new altq mtag link
Approved by: re (kensmith)
|
171167 |
03-Jul-2007 |
gnn |
Commit the change from FAST_IPSEC to IPSEC. The FAST_IPSEC option is now deprecated, as well as the KAME IPsec code. What was FAST_IPSEC is now IPSEC.
Approved by: re Sponsored by: Secure Computing
|
171158 |
02-Jul-2007 |
rrs |
- Consolidate the code that free's chunks to actually also call the sctp_free_remote_address() function. - Assure that when we allocate a chunk the whoTo is NULL, also when we free it and place it into the cache we NULL it (that way the consolidation code will always work). - Fix a small race, when a empty data holder is left on the stream out queue, and both sides do a shutdown, the empty data holder would prevent us from sending a SHUTDOWN-ACK and at the same time we never would cleanup the empty holder (since nothing was ever in queue). We now add a utility function that a) cleans up empty holders and b) properly determines if there are still pending data chunks on the stream out wheel. Approved by: re@freebsd.org (Ken Smith)
|
171157 |
02-Jul-2007 |
rwatson |
Continue pre-7.0 privilege cleanup: update suser(9) comments to be priv(9) comments.
Approved by: re (bmah)
|
171139 |
01-Jul-2007 |
gnn |
Fix a dangling netinet6 to netipsec transition for SCTP include files.
Approved by: re
|
171133 |
01-Jul-2007 |
gnn |
Commit IPv6 support for FAST_IPSEC to the tree. This commit includes only the kernel files, the rest of the files will follow in a second commit.
Reviewed by: bz Approved by: re Supported by: Secure Computing
|
171088 |
29-Jun-2007 |
rrs |
- When a SCTP socket is closed, but the last data SACK is lost, we would incorrectly abort the association instead of retransmitting the SACK. Approved by: re@freebsd.org (Ken Smith)
|
171032 |
25-Jun-2007 |
rrs |
- Update bindx address checking to properly screen out address per the socket api, adding port validation. We allow port 0 or the already bound port number and no others.
Approved by: re@freebsd.org (Ken Smith)
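
The port rule stated above can be written as a one-line predicate. This is only a sketch of the rule as the message describes it; `bindx_port_ok()` is a hypothetical helper, not a function from the SCTP sources:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: a port passed to bindx is acceptable when it is
 * 0 (unspecified) or equal to the port the socket is already bound to.
 * Both arguments are assumed to be in the same byte order.
 */
bool
bindx_port_ok(uint16_t requested, uint16_t bound)
{
	return (requested == 0 || requested == bound);
}
```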
|
170994 |
22-Jun-2007 |
rrs |
- Fix type casts in calling sctp_m_getptr, it expects an int not an unsigned (returned by sizeof); also add cast to comparison check for size bounds. Approved by: re(bmah@freebsd.org)
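
Why the cast on the size-bounds comparison matters: when an `int` is compared against `sizeof`, the `int` is converted to the unsigned type `size_t`, so a negative length turns into a very large value and passes a "too short" check. A small sketch, assuming a hypothetical 4-byte header `struct chunk_hdr`; the first function spells out the implicit conversion explicitly:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical 4-byte wire header used only for this illustration. */
struct chunk_hdr {
	unsigned char	type;
	unsigned char	flags;
	unsigned short	length;
};

/*
 * What "len < sizeof(...)" effectively does: len is converted to size_t,
 * so a negative len becomes huge and is NOT reported as too short.
 */
bool
too_short_unsigned(int len)
{
	return ((size_t)len < sizeof(struct chunk_hdr));
}

/* With the cast on sizeof, the comparison stays signed. */
bool
too_short_signed(int len)
{
	return (len < (int)sizeof(struct chunk_hdr));
}
```

With `len == -1` the first check reports "long enough" while the second correctly reports "too short".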
|
170992 |
22-Jun-2007 |
rrs |
- Fix stream reset so it limits the number of streams that can be listed - Fix fwd-tsn to use proper accessor so it does not overrun mbufs - Fix stream reset error reporting to actually work (it has always been broken if the peer rejects a stream reset) - Some 64 bit friendly changes
Approved by: re(bmah@freebsd.org)
|
170943 |
18-Jun-2007 |
rrs |
- Two more static analysis bugs found by cisco's tool on a subsequent run.
|
170931 |
18-Jun-2007 |
rrs |
- Fixes static issues found by cisco sa tool (missing frees and such on error legs) - align sctp_sockstore to 64 bit boundary ..
|
170923 |
18-Jun-2007 |
maxim |
o Make ipfw set more robust -- now it is possible: - to show a specific set: ipfw set 3 show - to delete rules from the set: ipfw set 9 delete 100 200 300 - to flush the set: ipfw set 4 flush - to reset rules counters in the set: ipfw set 1 zero
PR: kern/113388 Submitted by: Andrey V. Elsukov Approved by: re (kensmith) MFC after: 6 weeks
|
170921 |
18-Jun-2007 |
rrs |
Add additional logging level mask for packet_logging too.
|
170899 |
17-Jun-2007 |
rrs |
- The packet log needs to copy all of the buffer not to the end.
|
170894 |
17-Jun-2007 |
rrs |
Back out last change to inpcb_free. Turns out we need to hold off freeing if there is data pending ... someone might do send/close. Which means we want the data to go and then close it after startup. Added comments to the code as well to note that this is done for a reason.
|
170861 |
17-Jun-2007 |
mjacob |
Make gcc4.2 happy and zero save_ip for the unlikely (blackhole != 0) codepath.
|
170859 |
17-Jun-2007 |
rrs |
- For sctp_input/sctp6_input add announcement when a packet arrives (debug) - re-factor the packet drop in sctp_output a bit more, we don't need the trim after all, but the size calc is now corrected. - When an assoc is in the COOKIE-ECHO/COOKIE-WAIT state and the user closes, it should not matter if data is queued, the assoc should be purged. - In error leg a missing free_chunk when iph comes in NULL (should not happen but just in case).
|
170856 |
17-Jun-2007 |
mjacob |
Replace incorrect local OFFSET_OF macro with the correct and generic offsetof macro.
|
170855 |
17-Jun-2007 |
mjacob |
Simplification to quiet a gcc4.2 warning. Just by setting match.s_addr to nonzero you fulfill the same function as the variable 'cmp'. so you might as well zero match and test against it later.
Reviewed by: timeout on review request
|
170824 |
16-Jun-2007 |
rrs |
- Better handle sending large pkt-drops. We were not trimming the data with m_adj if a large pkt arrived with a bad csum; some systems can't handle you not trimming the tail (think panda :-D)
|
170814 |
16-Jun-2007 |
rrs |
- Raise max range of sctp_logging sysctl so panda does not disallow us to turn on logging levels.
|
170806 |
16-Jun-2007 |
rrs |
- Matthew's changes to get inlines out, plus a few of my own to deal with the VRF inline function -> becomes a macro now. Submitted by: Matthew Jacobs
|
170800 |
15-Jun-2007 |
mjacob |
Garbage collect some debug code that not only no longer could work but in fact probably causes random pointer dereferences. Garbage collect the tp variable too.
|
170791 |
15-Jun-2007 |
rrs |
Name change SCTP_KTR_SUBSYS -> KTR_SCTP
|
170790 |
15-Jun-2007 |
rrs |
Remove extraneous extern (it's gotten from sctp_sysctl.h)
|
170788 |
15-Jun-2007 |
rrs |
When removing a stream from the output-stream-wheel, if it's the first stream we saw we must update the starting point in the wheel, else we may loop in an endless loop.
|
170786 |
15-Jun-2007 |
rrs |
- Update the comment lines in sctp_input.c - We need to init the INP_LOCK since otherwise for non-SMP kernels you crash when you set the TOS.
|
170785 |
15-Jun-2007 |
bms |
Stub out imported IGMPv3 definitions which clash with those of the XORP router; the IGMPv3 definitions will be updated at a later point in time when IGMPv3/MLDv2 support is fully merged.
|
170781 |
15-Jun-2007 |
rrs |
- Issue one, new stack reduction left packet_drop handling still thinking it had the whole chunk. This could cause a crash if a large packet drop came in. Fixed by adjusting the trunc length down to the limit. - Large sacks with lots of segments could also have same issue. Changed duplicate and segment handling to use proper get_m_ptr function to pull each block from mbuf chains.
|
170751 |
15-Jun-2007 |
rrs |
- Add VRF id to sctp_ifa structure, needed mainly in panda but useful during deletes of ifa's in diff VRF's when applicable.
|
170747 |
15-Jun-2007 |
rrs |
KTR_GEN -> KTR_SUBSYS (for Kris).
|
170744 |
14-Jun-2007 |
rrs |
- Fix so ifn's are properly deleted when the ref count goes to 0. - Fix so VRF's will clean themselves up when no references are around. - Allow sctp_ifa to be passed into inpcb_bind, addr_mgmt_ep_sa to bypass normal validation checks. - turn auto-asconf off for subset bound sockets - Moves all logging to use KTR. This gets rid of most of the logging #ifdef's with a few exceptions reducing the number of config options for SCTP.
|
170665 |
13-Jun-2007 |
rrs |
- fix bindx to check addresses against socket's protocol family
|
170664 |
13-Jun-2007 |
rwatson |
Remove IPX over IP tunneling support, which allows IPX routing over IP tunnels, and was not MPSAFE. The code can be easily restored in the event that someone with an IPX over IP tunnel configuration can work with me to test patches.
This removes one of five remaining consumers of NET_NEEDS_GIANT.
Approved by: re (kensmith)
|
170642 |
13-Jun-2007 |
rrs |
- Fixed cookie handling to calc an RTO when it's an INIT collision case. - Fixed RTO calc to maintain a separate variable to track if a RTO calc has been done, this allows the RTO var to be doubled during initial timeouts. - Reduces the amount of stack used by process control. - Use a constant for the peer chunk overhead. - Name change to spell candidate correctly.
|
170613 |
12-Jun-2007 |
bms |
Import rewrite of IPv4 socket multicast layer to support source-specific and protocol-independent host mode multicast. The code is written to accommodate IPv6, IGMPv3 and MLDv2 with only a little additional work.
This change only pertains to FreeBSD's use as a multicast end-station and does not concern multicast routing; for an IGMPv3/MLDv2 router implementation, consider the XORP project.
The work is based on Wilbert de Graaf's IGMPv3 code drop for FreeBSD 4.6, which is available at: http://www.kloosterhof.com/wilbert/igmpv3.html
Summary * IPv4 multicast socket processing is now moved out of ip_output.c into a new module, in_mcast.c. * The in_mcast.c module implements the IPv4 legacy any-source API in terms of the protocol-independent source-specific API. * Source filters are lazy allocated as the common case does not use them. They are part of per inpcb state and are covered by the inpcb lock. * struct ip_mreqn is now supported to allow applications to specify multicast joins by interface index in the legacy IPv4 any-source API. * In UDP, an incoming multicast datagram only requires that the source port matches the 4-tuple if the socket was already bound by source port. An unbound socket SHOULD be able to receive multicasts sent from an ephemeral source port. * The UDP socket multicast filter mode defaults to exclusive, that is, sources present in the per-socket list will be blocked from delivery. * The RFC 3678 userland functions have been added to libc: setsourcefilter, getsourcefilter, setipv4sourcefilter, getipv4sourcefilter. * Definitions for IGMPv3 are merged but not yet used. * struct sockaddr_storage is now referenced from <netinet/in.h>. It is therefore defined there if not already declared in the same way as for the C99 types. * The RFC 1724 hack (specify 0.0.0.0/8 addresses to IP_MULTICAST_IF which are then interpreted as interface indexes) is now deprecated. * A patch for the Rhyolite.com routed in the FreeBSD base system is available in the -net archives. This only affects individuals running RIPv1 or RIPv2 via point-to-point and/or unnumbered interfaces. * Make IPv6 detach path similar to IPv4's in code flow; functionally same. * Bump __FreeBSD_version to 700048; see UPDATING.
This work was financially supported by another FreeBSD committer.
Obtained from: p4://bms_netdev Submitted by: Wilbert de Graaf (original work) Reviewed by: rwatson (locking), silence from fenner, net@ (but with encouragement)
|
170606 |
12-Jun-2007 |
rrs |
- Restructure so bindx functions are not done inline to socket option but are a separate call that can be re-used if needed. - 64 bit issues o re-arrange cookie so it is better 64 bit aligned o For wire level things we need the packed attribute.
|
170587 |
12-Jun-2007 |
rwatson |
Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in some cases, move to priv_check() if it was an operation on a thread and no other flags were present.
Eliminate caller-side jail exception checking (also now-unused); jail privilege exception code now goes solely in kern_jail.c.
We can't yet eliminate suser() due to some cases in the KAME code where a privilege check is performed and then used in many different deferred paths. Do, however, move those prototypes to priv.h.
Reviewed by: csjp Obtained from: TrustedBSD Project
|
170516 |
10-Jun-2007 |
andre |
Fix a case in tcp_do_segment() where tcp_update_sack_list() would be called with an incorrect segment end value. tcp_reass() may trim segments when they overlap with already existing ones in the reassembly queue. Instead of saving the segment end value before the call to tcp_reass() compute it on the fly based on the effective segment length afterwards.
This bug was not really problematic as no information got lost and the eventual SACK information computation was correct nonetheless.
MFC after: 1 week
|
170515 |
10-Jun-2007 |
andre |
Fix style for comments, be more verbose and add some more.
|
170470 |
09-Jun-2007 |
andre |
Make the handling of the tcp window explicit for the SYN_SENT case in tcp_output(). This is currently not strictly necessary but paves the way to simplify the entire SYN options handling quite a bit. Clarify comment. No change in effective behaviour with this commit.
RFC1323 requires the window field in a SYN (i.e., a <SYN> or <SYN,ACK>) segment itself never be scaled.
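
The RFC1323 rule quoted above can be illustrated with a small helper that computes the 16-bit window field for an outgoing segment. This is a sketch under stated assumptions, not the code in tcp_output(): `tcp_window_field()` is a hypothetical name, and `TH_SYN` and `TCP_MAXWIN` are defined locally with their conventional values.

```c
#include <stdint.h>

#define TH_SYN		0x02	/* SYN flag bit in the TCP header flags */
#define TCP_MAXWIN	65535	/* largest value the 16-bit field can carry */

/*
 * Hypothetical helper: return the value to place in the TCP window field.
 * Segments carrying SYN (<SYN> or <SYN,ACK>) advertise the unscaled window;
 * all other segments advertise the window shifted right by rcv_scale.
 */
uint16_t
tcp_window_field(uint32_t recwin, uint8_t th_flags, uint8_t rcv_scale)
{
	uint32_t win;

	if ((th_flags & TH_SYN) != 0)
		win = recwin;			/* never scaled in a SYN */
	else
		win = recwin >> rcv_scale;
	if (win > TCP_MAXWIN)
		win = TCP_MAXWIN;		/* clamp to the field width */
	return ((uint16_t)win);
}
```

For a 256 KB receive window and a scale factor of 3, a SYN advertises the clamped value 65535, while a later ACK advertises 32768, which the peer shifts back up.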
|
170469 |
09-Jun-2007 |
andre |
Remove some bogosity from the SYN_SENT case in tcp_do_segment and simplify handling of the send/receive window scaling. No change in effective behaviour.
RFC1323 requires the window field in a SYN (i.e., a <SYN> or <SYN,ACK>) segment itself never be scaled.
Noticed by: yar
|
170467 |
09-Jun-2007 |
andre |
Don't send pure window updates when the peer has closed the connection and won't ever send more data.
|
170464 |
09-Jun-2007 |
andre |
Handle a race condition on >2 core machines in tcp_timer() when a timer issues a shutdown and a simultaneous close on the socket happens. This race condition is inherent in the current socket/ inpcb life cycle system but can be handled well.
Reported by: kris Tested by: kris (on 8-core machine)
|
170463 |
09-Jun-2007 |
rrs |
- Oops.. takes out debug printfs I accidentally left in :-(
|
170462 |
09-Jun-2007 |
rrs |
- fix send_failed notification contents - Reorder send failed to be in correct order. - Fixed calculation of init-ack to be right off mbuf lengths instead of the precalculated value. This will fix one 64 bit platform issue.
|
170435 |
08-Jun-2007 |
yar |
Replace a constant with an already defined symbolic name for it.
Tested with: md5(1)
|
170434 |
08-Jun-2007 |
yar |
Add a sysctl for the purge run interval so that it can be tuned along with the rest of hostcache parameters. The new sysctl name is `net.inet.tcp.hostcache.prune'.
|
170428 |
08-Jun-2007 |
rrs |
- RTO was not being initialized to 0, thus the rtt calculation algorithm would not go through the proper initialization. - The initialization was incorrect as well, causing problems in sat networks with > 1sec RTT - Get rid of magic numbers in RTT calculations.
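
For context, the SCTP specification (RFC 4960, section 6.3.1) distinguishes the first RTT measurement from later ones, which is why an uninitialized state breaks the calculation. The sketch below follows that RFC text with named constants instead of magic numbers; it is not the kernel code, `struct rto_state` and the function names are hypothetical, and the RTO.Min/RTO.Max clamping is left out for brevity.

```c
/* Smoothing gains from RFC 4960: RTO.Alpha = 1/8, RTO.Beta = 1/4. */
#define RTO_ALPHA	0.125
#define RTO_BETA	0.25
#define RTO_K		4.0	/* variance multiplier in the RTO formula */

/* Hypothetical per-destination state; all times are in milliseconds. */
struct rto_state {
	int	have_rtt;	/* 0 until the first measurement is taken */
	double	srtt;
	double	rttvar;
	double	rto;
};

void
rto_init(struct rto_state *s, double rto_initial)
{
	s->have_rtt = 0;
	s->srtt = 0.0;
	s->rttvar = 0.0;
	s->rto = rto_initial;
}

/* Fold one RTT measurement r into the state and recompute the RTO. */
void
rto_update(struct rto_state *s, double r)
{
	if (!s->have_rtt) {
		/* First measurement: SRTT = R, RTTVAR = R/2. */
		s->srtt = r;
		s->rttvar = r / 2.0;
		s->have_rtt = 1;
	} else {
		double diff = s->srtt - r;

		if (diff < 0.0)
			diff = -diff;
		/* RTTVAR is updated using the old SRTT, then SRTT itself. */
		s->rttvar = (1.0 - RTO_BETA) * s->rttvar + RTO_BETA * diff;
		s->srtt = (1.0 - RTO_ALPHA) * s->srtt + RTO_ALPHA * r;
	}
	s->rto = s->srtt + RTO_K * s->rttvar;
}
```

Keeping `have_rtt` separate from the RTO value itself mirrors the idea of tracking whether a calculation has been done, so that the RTO can still be doubled on initial timeouts without being mistaken for a measured value.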
|
170405 |
07-Jun-2007 |
andre |
In tcp_hc_insert() we may have the case where we have hit the global cache size limit but this bucket row is empty. Normally we want to recycle the oldest entry in the bucket row. If there isn't any the TAILQ_REMOVE leads to a panic by trying to remove a non-existing element. Fix this by just returning NULL and failing the insert. This is not a problem as the TCP hostcache is only advisory.
Submitted by: jhb
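
The failure mode and the fix can be reproduced in miniature with the `<sys/queue.h>` tail queue macros. This is a simplified sketch, not tcp_hc_insert() itself: the `hc_*` types and the single global limit are assumptions, and the real function also handles per-bucket limits and locking.

```c
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical, simplified cache entry and bucket row. */
struct hc_entry {
	TAILQ_ENTRY(hc_entry)	link;
	unsigned int		addr;
};
TAILQ_HEAD(hc_qhead, hc_entry);

struct hc_bucket {
	struct hc_qhead	q;
};

struct hc_cache {
	int	count;	/* entries currently in the whole cache */
	int	limit;	/* global cache size limit */
};

/*
 * Insert an entry into bucket row b.  When the global limit is reached,
 * recycle the oldest entry of this row; if the row is empty there is
 * nothing to recycle, so fail the insert instead of calling TAILQ_REMOVE
 * on a NULL element.
 */
struct hc_entry *
hc_insert(struct hc_cache *c, struct hc_bucket *b, unsigned int addr)
{
	struct hc_entry *e;

	if (c->count >= c->limit) {
		e = TAILQ_LAST(&b->q, hc_qhead);
		if (e == NULL)
			return (NULL);	/* empty row: give up, cache is advisory */
		TAILQ_REMOVE(&b->q, e, link);
		c->count--;
	} else {
		e = malloc(sizeof(*e));
		if (e == NULL)
			return (NULL);
	}
	e->addr = addr;
	TAILQ_INSERT_HEAD(&b->q, e, link);
	c->count++;
	return (e);
}
```

Because the cache is only advisory, callers are expected to tolerate a NULL return.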
|
170385 |
06-Jun-2007 |
andre |
Correctly print SEQ and IRS in the corresponding log message in syncache_expand().
|
170373 |
06-Jun-2007 |
glebius |
Do not leak lock in the case of EEXIST error.
PR: kern/92776 Submitted by: Ed Schouten <Ed.Schouten tunix.nl>
|
170354 |
06-Jun-2007 |
rrs |
- Fixes a case where doing a sysctl would leave locks held when copying out association data. - Fixes a small bug that prevented the SCTP_UNORDERED indication from going up to the app on a recv in the sinfo_flags field.
|
170289 |
04-Jun-2007 |
dwmalone |
Despite several examples in the kernel, the third argument of sysctl_handle_int is not sizeof the int type you want to export. The type must always be an int or an unsigned int.
Remove the instances where a sizeof(variable) is passed to stop people accidentally cutting and pasting these examples.
In a few places sysctl_handle_int was being used on 64 bit types, which would truncate the value to be exported. In these cases use sysctl_handle_quad to export them and change the format to Q so that sysctl(1) can still print them.
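
The truncation can be demonstrated in userspace. The two functions below are hypothetical stand-ins that only model how many bytes an int-sized and a quad-sized handler copy out; they are not the kernel's sysctl_handle_int() or sysctl_handle_quad().

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for an int-sized export: always copies sizeof(int) bytes. */
size_t
export_int(const void *var, void *out)
{
	memcpy(out, var, sizeof(int));
	return (sizeof(int));
}

/* Stand-in for a quad-sized export: copies the full 64-bit value. */
size_t
export_quad(const void *var, void *out)
{
	memcpy(out, var, sizeof(int64_t));
	return (sizeof(int64_t));
}
```

Pointing `export_int()` at a 64-bit counter such as 0x100000002 copies only part of the value (which part depends on byte order), so the reader never sees the real number; `export_quad()` preserves it.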
|
170205 |
02-Jun-2007 |
rrs |
- fix initial pcb vrf setting when the initial vrf is not the default_vrf_id - Missing lock/unlock of inp added as well in the v6 side. - IFN hash table moves to sctppcbinfo since indexes are unique across systems (including different VRFs) this makes it easier to do ifn lookups.
|
170181 |
01-Jun-2007 |
rrs |
- Take out the broken table-id concept. Panda Routers have a M-VRF concept that is NOT well thought out for a multi-homed transport protocol. So the useless table-id entries passed around need to be removed. - Add a event timer for the zero copy api. - Fix a bug in sctp_timer.c when searching for an alternate with the largest ssthresh (the compare was wrong).
|
170174 |
01-Jun-2007 |
jeff |
- Move rusage from being per-process in struct pstats to per-thread in td_ru. This removes the requirement for per-process synchronization in statclock() and mi_switch(). This was previously supported by sched_lock which is going away. All modifications to rusage are now done in the context of the owning thread. Reads proceed without locks. - Aggregate exiting threads rusage in thread_exit() such that the exiting thread's rusage is not lost. - Provide a new routine, rufetch() to fetch an aggregate of all rusage structures from all threads in a process. This routine must be used in any place requiring a rusage from a process prior to its exit. The exited process's rusage is still available via p_ru. - Aggregate tick statistics only on demand via rufetch() or when a thread exits. Tick statistics are kept in the thread and protected by sched_lock until it exits.
Initial patch by: attilio Reviewed by: attilio, bde (some objections), arch (mostly silent)
|
170153 |
31-May-2007 |
rwatson |
(1) In tcp_usrclosed(), tp can never become NULL, so don't test for NULL before handling the socket disconnection case.
(2) Clean up surrounding comments and formatting.
Found with: Coverity Prevent(tm) (1) CID: 2203
|
170140 |
30-May-2007 |
rrs |
- Fixed (Apple) compiler warnings in sctp_input.c, sctputil.c, sctp_output.c - Fixed a LOR in handling a cookie. Turns out create lock is applied. And if we abort processing, this causes LOR. Changed to force the timer to clean up, that way create lock is released.
|
170138 |
30-May-2007 |
rrs |
- Fix a memory overwrite when the mapping array is expanded, size of expansion was not taken into consideration. - Fix so vtag hash is 1 bigger so that it modulo's out correctly, avoids a panic when restart with right modulo happens. - do not dereference stcb when control->do_not_ref_stcb is set - Fix up packet logging to not often use a lock and also to add to options. - Fix some logging option duplication in the sctputil.h
|
170099 |
29-May-2007 |
rrs |
Adds gcc attribute to prevent inlining of a function. If it goes inline we may well blow the stack if witness and such are enabled.
|
170094 |
29-May-2007 |
rrs |
- Fix spelling errors in comments per Ruslan (.. thanks... )
|
170091 |
29-May-2007 |
rrs |
- Fixes so we won't try to start a timer when we hold a wq lock for the iterator. Panda uses a silly recursive lock they hold through the timer. - Add poor mans wireshark compile option.. - Allocate and start using SCTP_M_XXX for all SCTP_MALLOC() calls. - sysctl now will get back the refcnt for viewing by onlookers.
Reviewed by: gnn
|
170078 |
28-May-2007 |
andre |
Make log messages more verbose and simpler to understand for non-experts. Update comments to be more conscious, verbose and fully reflect reality.
|
170058 |
28-May-2007 |
andre |
Fix indentation of the syncache_expand() section in tcp_input().
|
170056 |
28-May-2007 |
rrs |
- fixed autoclose to not allow setting on 1-2-1 model. - bounded cookie-life to 1 second minimum in socket option set. - Delayed_ack_time becomes delayed_ack per new socket api document. - Improve port number selection, we now use low/high bounds and no chance of an endless loop. Only one call to random per bind as well. - fixes so set_peer_primary pre-screens addresses to be valid to this host. - maxseg did not allow setting on an assoc basis. We needed to thus track and use an association value instead of a inp value. - Fixed ep get of HB status to report back properly. - use settings flag to tell if assoc level hb is on off not the timer.. since the timer may still run if unconf address are present. - check for crazy ENABLE/DISABLE conditions. - set and get of pmtud (fixed path mtu) not always taking into account ovh. - Getting PMTU info on stcb only needs to return PMTUD_ENABLED if any net is doing PMTU discovery. - Panic or warning fixed to not do so when a valid ip frag is taking place. - sndrcvinfo appearing in both inp and stcb was full size, instead of the non-pad version. This saves about 92 bytes from each struct by carefully converting to use the smaller version. - one-2-one model get(maxseg) would always get ep value, never the tcb's value. - The delayed ack time could be under a tick, this fixes so it bounds it to at least 1 tick for platforms whose tick is more than a ms. - Fragment interleave level set to wrong default value. - Fragment interleave could not set level 0. - Deferred stream reset was broken due to a guard check and ntohl issue. - Found two lock order reversals and fixed. - Tighten up address checking, if the user gives an address the sa_len had better be set properly. - Get asoc by assoc-id would return a locked tcb when it was asked not to if the tcb was in the restart hash. - sysctl to dig down and get more association details
Reviewed by: gnn
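
The delayed-ack item above comes down to integer arithmetic: on a platform whose tick is longer than a millisecond, a small millisecond value divides down to zero ticks. A sketch of a conversion with a one-tick floor, assuming a hypothetical helper `msec_to_ticks_min1()` and an explicit `hz` parameter rather than the kernel's MSEC_TO_TICKS() macro:

```c
#include <stdint.h>

/*
 * Hypothetical helper: convert milliseconds to clock ticks for a clock
 * running at hz ticks per second, never returning less than one tick.
 */
uint32_t
msec_to_ticks_min1(uint32_t ms, uint32_t hz)
{
	/* 64-bit intermediate so ms * hz cannot overflow. */
	uint64_t t = ((uint64_t)ms * hz) / 1000;

	if (t == 0)
		t = 1;			/* bound to at least one tick */
	if (t > UINT32_MAX)
		t = UINT32_MAX;		/* saturate instead of wrapping */
	return ((uint32_t)t);
}
```

With `hz = 100` (a 10 ms tick), a 1 ms request would otherwise become 0 ticks; with the floor it becomes 1.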
|
170055 |
28-May-2007 |
andre |
Refactor and rewrite in parts the SYN handling code on listen sockets in tcp_input():
o tighten the checks on allowed TCP flags to be RFC793 and tcp-secure conformant o log check failures to syslog at LOG_DEBUG level o rearrange the code flow to be easier to follow o add KASSERTs to validate assumptions of the code flow
Add sysctl net.inet.tcp.syncache.rst_on_sock_fail, enabled by default, that controls the behavior on socket creation failure for an otherwise successful 3-way handshake. The socket creation can fail due to global memory shortage, listen queue limits and file descriptor limits. The sysctl allows choosing between two options to deal with this. One is to send a reset to the other endpoint to notify it about the failure (default). The other one is to ignore and treat the failure as a transient error and have the other endpoint retransmit for another try.
Reviewed by: rwatson (in general)
|
170030 |
27-May-2007 |
rwatson |
Normalize spelling and grammar in TCP hostcache comments.
|
170024 |
27-May-2007 |
rwatson |
In tcp_timer_2msl(), tp can never become NULL, so don't check it for NULL before entering tcp_trace().
Found with: Coverity Prevent(tm) CID: 1840
|
170019 |
27-May-2007 |
rwatson |
Don't assign sp to the value of s when we're about to assign it instead to s + strlen(s).
Found with: Coverity Prevent(tm) CID: 2243
|
169997 |
25-May-2007 |
andre |
The printf %b list in PRINT_TH_FLAGS has to be in octal numbering. Thus convert \8 to \10 and the warnings go away.
Pointed out by: sam, ru, thompsa
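
Background on why the bit numbers must be octal: in a %b description string each flag name is preceded by a single character whose value is the 1-based bit number, and C string escapes of the form `\N` are octal, so bit 8 has to be written `\10` (`\8` is not a valid octal escape). The userspace sketch below decodes such a string; `decode_bits()` is a hypothetical helper, and `TH_FLAGS_SPEC` is modelled on the PRINT_TH_FLAGS list rather than copied from the header.

```c
#include <stddef.h>

/* First character is the output base (\20 = 16); then bit number + name. */
#define TH_FLAGS_SPEC "\20\1FIN\2SYN\3RST\4PUSH\5ACK\6URG\7ECE\10CWR"

/* Append one character, always leaving room for the terminating NUL. */
static void
put_char(char *buf, size_t len, size_t *n, char c)
{
	if (*n + 1 < len) {
		buf[(*n)++] = c;
		buf[*n] = '\0';
	}
}

/*
 * Write the names of the bits set in value into buf as "<A,B>".
 * buf is left empty when no listed bit is set.
 */
void
decode_bits(unsigned int value, const char *spec, char *buf, size_t len)
{
	const char *p = spec + 1;	/* skip the base character */
	size_t n = 0;
	int any = 0;

	if (len == 0)
		return;
	buf[0] = '\0';
	while (*p != '\0') {
		int bit = *p++;		/* 1-based bit number */
		int set = (value & (1u << (bit - 1))) != 0;

		if (set)
			put_char(buf, len, &n, any ? ',' : '<');
		while (*p > ' ') {	/* name characters are above 32 */
			if (set)
				put_char(buf, len, &n, *p);
			p++;
		}
		if (set)
			any = 1;
	}
	if (any)
		put_char(buf, len, &n, '>');
}
```

Decoding 0x12 with this string yields `<SYN,ACK>`, and 0x80 yields `<CWR>`, which only works because the CWR entry is introduced by the octal escape `\10`.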
|
169914 |
23-May-2007 |
andre |
Add CWR back into the PRINT_TH_FLAGS list as gcc42 doesn't complain about \8 in a string anymore.
|
169913 |
23-May-2007 |
andre |
In tcp_log_addrs(): o add the hex output of the th_flags field to the example log line in comments o simplify the log line length calculation and make it less evil o correct the test for the length panic; the line isn't on the stack but malloc'ed
|
169686 |
18-May-2007 |
andre |
Be more restrictive with segment validity checks in syncache_expand() and log check failures to syslog at LOG_DEBUG level.
Always prefill the sc->sc_ts field to use it in the checks.
|
169685 |
18-May-2007 |
andre |
o Add syslog logging under LOG_DEBUG to various failures caused by bogus segments o Add more KASSERT()s o Update comments
|
169683 |
18-May-2007 |
andre |
Add tcp_log_addrs() function to generate a standardized TCP log line for use throughout the tcp subsystem.
It is IPv4 and IPv6 aware and creates a line in the following format:
"TCP: [1.2.3.4]:50332 to [1.2.3.4]:80 tcpflags <RST>"
A "\n" is not included at the end. The caller is supposed to add further information after the standard tcp log header.
The function returns a NUL terminated string which the caller has to free(s, M_TCPLOG) after use. All memory allocation is done with M_NOWAIT and the return value may be NULL in memory shortage situations.
Either struct in_conninfo || (struct tcphdr && (struct ip || struct ip6_hdr)) have to be supplied.
Due to ip[6].h header inclusion limitations and ordering issues the struct ip and struct ip6_hdr parameters have to be casted and passed as void * pointers.
tcp_log_addrs(struct in_conninfo *inc, struct tcphdr *th, void *ip4hdr, void *ip6hdr)
Usage example:
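    struct ip *ip;
    char *tcplog;

    /* tcp_log_addrs() allocates with M_NOWAIT and may return NULL. */
    if ((tcplog = tcp_log_addrs(NULL, th, (void *)ip, NULL)) != NULL) {
        log(LOG_DEBUG, "%s; %s: Connection attempt to closed port\n",
            tcplog, __func__);
        /* The caller owns the returned string and must free it. */
        free(tcplog, M_TCPLOG);
    }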
|
169682 |
18-May-2007 |
jhb |
Fix statistical accounting for bytes and packets during sack retransmits.
MFC after: 1 week Submitted by: mohans
|
169664 |
17-May-2007 |
jinmei |
- Disabled responding to NI queries from a global address by default as specified in RFC4620. A new flag for icmp6_nodeinfo was added to enable the feature. - Also cleaned up the code so that the semantics of the icmp6_nodeinfo flags is clearer (i.e., defined specific macro names instead of using hard-coded values).
Approved by: gnn (mentor) MFC after: 1 week
|
169655 |
17-May-2007 |
rrs |
- Fixed 1-2-1 model to not worry about associd in sockopts - Fixed RTOinfo for bounding. - Fixed connect() to return ECONNREFUSED when an ABORT is received. - Added comments to direct Static Analysis not to look at some things it does not understand (comments are /* sa_ignore XXXXX */) - Bind when colliding was broken, missing not_found = 1 before checking to see if the port was in use caused endless bind loop. - Cookie life needs to be in milliseconds to conform to socket api. - Cookie life is not supposed to change if it's 0. On the assoc level set we changed it to 0, oops. - Two more static analysis issues identified by the cisco tool. Null checks needed. - An issue for sendfile(). Need to validate the correct input argument. - When sending failed due to a no route to host, we leaked the mbuf chain failing to call m_freem(). - Fix #ifdef issue for getting hash block len when HAVE_SHA2 is NOT defined Reviewed by: gnn
|
169635 |
17-May-2007 |
oleg |
Unbreak IPv4 kernel build.
|
169625 |
16-May-2007 |
rwatson |
Remove leading spaces before tabs spotted thanks to silby using kwrite to read ip_input.c.
|
169613 |
16-May-2007 |
andre |
Remove now unused stuff forgotten in the previous commit.
|
169608 |
16-May-2007 |
andre |
Move TIME_WAIT related functions and timer handling from files other than repo copied tcp_subr.c into tcp_timewait.c#1.284:
tcp_input.c#1.350 tcp_timewait() -> tcp_twcheck()
tcp_timer.c#1.92 tcp_timer_2msl_reset() -> tcp_tw_2msl_reset() tcp_timer.c#1.92 tcp_timer_2msl_stop() -> tcp_tw_2msl_stop() tcp_timer.c#1.92 tcp_timer_2msl_tw() -> tcp_tw_2msl_scan()
This is a mechanical move with appropriate renames and making them static if used only locally.
The tcp_tw_2msl_scan() cleanup function is still run from the tcp_slowtimo() in tcp_timer.c.
|
169598 |
16-May-2007 |
dwmalone |
When verifying the IPv4 UDP checksum, don't overwrite the checksum value in the mbuf with the result of the calculation. Previously, if we chose to return an ICMP message, the quoted UDP checksum bytes would be different to what was sent.
PR: 112471 Submitted by: Matthew Luckie <mluckie@cs.waikato.ac.nz> MFC after: 3 weeks
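The idea behind this fix can be shown outside the kernel: compute the checksum into a local variable and compare it, rather than storing the result back into the packet. The following is a minimal userland sketch, not the committed change. `inet_cksum16()` and `cksum_ok()` are hypothetical helpers, and the UDP pseudo-header is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: 16-bit ones'-complement checksum over a buffer. */
static uint16_t
inet_cksum16(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	assert(buf != NULL || len == 0);
	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)			/* pad a trailing odd byte with zero */
		sum += (uint32_t)buf[len - 1] << 8;
	while (sum >> 16)		/* fold carries back into 16 bits */
		sum = (sum & 0xffff) + (sum >> 16);
	return ((uint16_t)~sum);
}

/*
 * Verify a buffer that already carries its checksum. The result lives
 * only in a local value; the packet bytes are never written, so they can
 * still be quoted verbatim in an error message later.
 */
static int
cksum_ok(const uint8_t *pkt, size_t len)
{
	return (inet_cksum16(pkt, len) == 0);
}
```

Because `pkt` is `const`, the compiler rejects any attempt to write the computed value back into the packet.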
|
169541 |
13-May-2007 |
andre |
Complete the (mechanical) move of the TCP reassembly and timewait functions from their original place to their own files.
TCP Reassembly from tcp_input.c -> tcp_reass.c
TCP Timewait from tcp_subr.c -> tcp_timewait.c
|
169482 |
11-May-2007 |
andre |
Drop everything that doesn't belong into this new file. It's neither functional nor connected to the build yet.
|
169481 |
11-May-2007 |
andre |
Drop everything that doesn't belong into this new file. It's neither functional nor connected to the build yet.
|
169480 |
11-May-2007 |
andre |
Make the TCP timer callout obtain Giant if the network stack is marked as non-mpsafe.
This change is to be removed when all protocols are mp-safe.
|
169477 |
11-May-2007 |
andre |
Add the timestamp offset to struct tcptw so we can generate proper ACKs in TIME_WAIT state that don't get dropped by the PAWS check on the receiver.
|
169469 |
11-May-2007 |
rwatson |
Coalesce two identical UCB licenses into a single license instance with one set of copyright years.
White space and comment cleanup.
Export $FreeBSD$ via __FBSDID.
|
169467 |
11-May-2007 |
rwatson |
Minor white space and style cleanups.
|
169466 |
11-May-2007 |
rwatson |
White space and style cleanup.
|
169465 |
11-May-2007 |
rwatson |
Minor white space/style normalization.
|
169464 |
11-May-2007 |
rwatson |
Normalize style a bit: reduce pseudo-randomness of comment layout and white space. Remove 'register'.
|
169462 |
11-May-2007 |
rwatson |
Reduce network stack oddness: implement .pru_sockaddr and .pru_peeraddr protocol entry points using functions named proto_getsockaddr and proto_getpeeraddr rather than proto_setsockaddr and proto_setpeeraddr. While it's true that sockaddrs are allocated and set, the net effect is to retrieve (get) the socket address or peer address from a socket, not set it, so align names to that intent.
|
169461 |
11-May-2007 |
rwatson |
Remove unneeded wrappers for in_setsockaddr() and in_setpeeraddr(), which used to exist so pcbinfo locks could be acquired, but are no longer required as a result of socket/pcb reference model refinements.
|
169457 |
10-May-2007 |
andre |
Fix an incorrect replace of a timer reference made during the TCP timer rewrite in rev. 1.132. This unmasked yet another bug that causes certain connections to get indefinitely stuck in LAST_ACK state.
|
169454 |
10-May-2007 |
rwatson |
Move universally to ANSI C function declarations, with relatively consistent style(9)-ish layout.
|
169420 |
09-May-2007 |
rrs |
Two major items here:
- All printf that was surrounded by #ifdef SCTP_DEBUG moves to a macro that does all of this. This removes all printfs from the code and makes the code more portable and easier to read.
- Static Analysis (cisco) - found a few bugs, but mostly we add checks for NULL pointers and such to make the tool happy. We now pass the Cisco SA tools checks except for where it does not understand tailq/lists. We still need to look at the coverity tools output too (this is like the cisco SA tool) and see if it wants us to fix any other items. Hopefully this will be the last major churn in the code other than bug fixes.
|
169417 |
09-May-2007 |
maxim |
o Fix style(9) bugs introduced in the last commit.
Pointed out by: bde
|
169405 |
09-May-2007 |
maxim |
o Unbreak "options TCPDEBUG" && "nooptions INET6" kernel build.
PR: kern/112517 Submitted by: vd
|
169382 |
08-May-2007 |
rrs |
- Copyright change, cisco's silly tool wants it to say: "Copyright (c) 2001-2007, by Cisco Systems," instead of "Copyright (c) 2001-2007, Cisco Systems,"
- Also fix a few stragglers that were still in 2006.
|
169380 |
08-May-2007 |
rrs |
- Get rid of the sctp_inpcb_free() "magic numbers", now they are sensible defines that tell what you are directing the function to do.
|
169378 |
08-May-2007 |
rrs |
- Static analysis fixes for cisco's commit (this is equivalent to the coverity tool.. may even be the same one.. not sure).
- A bug in the way sctp_abort() and friends were setting the IP_CLOSE flag.. and NOT passing the last argument as a (,1)... so that things would get freed..
|
169352 |
08-May-2007 |
rrs |
- More macros for OS compatibility
- PR-SCTP would ignore FWD-TSN's above a rwnd's worth of TSN's (1 byte msgs).. this left the peer hopelessly out of sync.. or an attacker. So now we abort the assoc.
- New IFN hash, also rename hashes to match addr/ifn now that the vrf has multiple.
- Do not enable SCTP_PCB_FLAGS_RECVDATAIOEVNT per default as defined in the Socket API ID.
- Export MTU information via sysctl.
- Vrf's need table id's. This is default for BSD, but may be other things later when BSD fully supports VRFs.
- Additional stream reset bug (caught by cisco dev-test).
- Additional validations for the address in sending a message (socket api).
-------- and -----
- Fix association notifications not to give the active open side false notifications.
- Fix so sendfile and SENDALL will work properly (missing flag to say socket sender is done).
- Fix Bug that prevented COOKIES from being retransmitted.
- Break out connectx into helper sub-models so that iox routines can reuse the helpers.
- When an address is added during system init (non-dynamic mode) make sure that the "defer use" flag is not set.
** its compiling on XR now :-D **
Reviewed by: gnn
|
169350 |
07-May-2007 |
rwatson |
Rather than selectively zeroing fields in the tcp_debug structure throughout tcp_trace(), zero the entire structure up front.
Minor style fixes.
|
169349 |
07-May-2007 |
rwatson |
Since udp_peeraddr() and udp_sockaddr() directly wrap in_setpeeraddr() and in_setsockaddr(), containing only stale comments on why they exist, remove them and initialize the protosw for UDP to directly reference in_setpeeraddr() and in_setsockaddr().
|
169348 |
07-May-2007 |
rwatson |
Minor style tweaks.
|
169347 |
07-May-2007 |
rwatson |
When setting up timewait state for a TCP connection, don't hold the socket lock over a crhold() of so_cred: so_cred is constant after socket creation, so doesn't require locking to read.
|
169318 |
06-May-2007 |
andre |
Remove unused requested_s_scale from struct tcpcb.
|
169317 |
06-May-2007 |
andre |
Use existing TF_SACK_PERMIT flag in struct tcpcb t_flags field instead of a dedicated sack_enable int for this bool. Change all users accordingly.
|
169316 |
06-May-2007 |
andre |
o Remove redundant tcp reassembly check in header prediction code
o Rearrange code to make intent in TCPS_SYN_SENT case more clear
o Assorted style cleanup
o Comment clarification for tcp_dropwithreset()
|
169315 |
06-May-2007 |
andre |
Reorder the TCP header prediction test to check for the most volatile values first to spend less time on a fallback to normal processing.
|
169314 |
06-May-2007 |
andre |
Remove the defunct remains of the TCPS_TIME_WAIT cases from tcp_do_segment and change it to a void function.
We use a compressed structure for TCPS_TIME_WAIT to save memory. Any late segments arriving for such a connection are handled directly in the TW code.
|
169309 |
06-May-2007 |
andre |
Fix two comments.
|
169295 |
06-May-2007 |
rrs |
Two bugs:
- Locks were not being unlocked when an invalid size chunk is sent in.
- When a notification comes in, we cannot use it to look up the fragment interleave stream information since it's not on a stream.
|
169272 |
04-May-2007 |
rwatson |
Add global mutex tcp_debug_mtx, which will protect global TCP debugging state tcp_debug, tcp_debx. Acquire and drop as required in tcp_trace().
Move to ANSI C function header, correct prototype types so that short TCP state is no longer promoted to int unnecessarily.
Add comments.
MFC after: 3 weeks
|
169268 |
04-May-2007 |
rwatson |
Tweak comment at end of tcp_input() when calling into tcp_do_segment(): the pcbinfo lock will be released as well, not just the pcb lock.
|
169254 |
04-May-2007 |
rrs |
Fixes a missing unlock in the one-2-one hash table: if it was full and a collision occurred, we would leave an inp locked. Also fixes a missing inp unlock if IPSEC was on and it failed during the attach. Bug found by Weongyo Jeong.
|
169245 |
04-May-2007 |
bz |
Add support for filtering on Routing Header Type 0 and Mobile IPv6 Routing Header Type 2 in addition to filter on the non-differentiated presence of any Routing Header.
MFC after: 3 weeks
|
169236 |
03-May-2007 |
rwatson |
sblock() implements a sleep lock by interlocking SB_WANT and SB_LOCK flags on each socket buffer with the socket buffer's mutex. This sleep lock is used to serialize I/O on sockets in order to prevent I/O interlacing.
This change replaces the custom sleep lock with an sx(9) lock, which results in marginally better performance, better handling of contention during simultaneous socket I/O across multiple threads, and a cleaner separation between the different layers of locking in socket buffers. Specifically, the socket buffer mutex is now solely responsible for serializing simultaneous operation on the socket buffer data structure, and not for I/O serialization.
While here, fix two historic bugs:
(1) a bug allowing I/O to be occasionally interlaced during long I/O operations (discovered by Isilon).
(2) a bug in which failed non-blocking acquisition of the socket buffer I/O serialization lock might be ignored (discovered by sam).
SCTP portion of this patch submitted by rrs.
|
169208 |
02-May-2007 |
rrs |
- Somehow the disable fragment option got lost. We could set/clear it but would not do it. Now we will.
- Moved to latest socket api for extended sndrcv info struct.
- Moved to support all new levels of fragment interleave (0-2).
- Codenomicon security test updates - length checks and such.
- Bug in stream reset (2 actually).
- setpeerprimary could unlock a null pointer, fixed.
- Added a flag in the pcb so netstat can see if we are listening easier.
Obtained from: (some of the Listen changes from Weongyo Jeong)
|
169179 |
01-May-2007 |
rwatson |
Remove unused pcbinfo arguments to in_setsockaddr() and in_setpeeraddr().
|
169154 |
30-Apr-2007 |
rwatson |
Rename some fields of struct inpcbinfo to have the ipi_ prefix, consistent with the naming of other structure field members, and reducing improper grep matches. Clean up and comment structure fields in structure definition.
|
169149 |
30-Apr-2007 |
maxim |
o Kill EOLWS while I'm here.
|
169148 |
30-Apr-2007 |
maxim |
o Fix strtoul() error conditions check.
PR: kern/108211 Submitted by: Yong Tang MFC after: 2 weeks
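For reference, a complete strtoul() error check has to look at three things: whether any digits were consumed, whether anything was left over, and whether errno reports a range error. The sketch below is a generic illustration of that pattern and is not the code touched by this commit. `parse_ulong()` is a hypothetical helper name.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: parse a decimal number; 0 on success, -1 on error. */
static int
parse_ulong(const char *s, unsigned long *out)
{
	char *end;
	unsigned long v;

	assert(s != NULL && out != NULL);
	errno = 0;			/* strtoul() only sets errno on failure */
	v = strtoul(s, &end, 10);
	if (end == s)			/* no digits were consumed */
		return (-1);
	if (*end != '\0')		/* trailing characters after the number */
		return (-1);
	if (errno == ERANGE)		/* value did not fit in unsigned long */
		return (-1);
	*out = v;
	return (0);
}
```

Checking only the return value is not enough, because 0 and ULONG_MAX are both legitimate results as well as error indicators.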
|
168986 |
23-Apr-2007 |
andre |
o Fix INP lock leak in the minttl case
o Remove indirection in the decision of unlocking inp
o Further annotation of locking in tcp_input()
|
168961 |
23-Apr-2007 |
rrs |
Fixes cut and paste bug using wrong pointer reference.
|
168945 |
22-Apr-2007 |
rrs |
Moves the PCB features and flags from sctp_pcb.h to sctp.h so that netstat can access and display these values.
|
168943 |
22-Apr-2007 |
rrs |
- Somehow the disable fragment option got lost. We could set/clear it but would not do it. Now we will.
- Moved to latest socket api for extended sndrcv info struct.
- Moved to support all new levels of fragment interleave.
|
168906 |
20-Apr-2007 |
andre |
o Remove unnecessary TOF_SIGLEN flag from struct tcpopt
o Correctly set to->to_signature in tcp_dooptions()
o Update comments
|
168905 |
20-Apr-2007 |
andre |
Add more KASSERT's.
|
168904 |
20-Apr-2007 |
andre |
o Remove unused and redundant TCP option definitions o Replace usage of MAX_TCPOPTLEN with the correctly constructed and derived MAX_TCPOPTLEN
|
168903 |
20-Apr-2007 |
andre |
Remove bogus check for accept queue length and associated failure handling from the incoming SYN handling section of tcp_input().
Enforcement of the accept queue limits is done by sonewconn() after the 3WHS is completed. It is not necessary to have an earlier check before a connection request enters the SYN cache awaiting the full handshake. It rather limits the effectiveness of the syncache by preventing legit and illegit connections from entering it and having them shaken out before we hit the real limit which may have vanished by then.
Change return value of syncache_add() to void. No status communication is required.
|
168902 |
20-Apr-2007 |
andre |
Simplify syncache_expand() and clarify its semantics. Zero is returned when the ACK is invalid and doesn't belong to any registered connection, either in syncache or through SYN cookies. True but a NULL struct socket is returned when the 3WHS completed but the socket could not be created due to insufficient resources or limits reached.
For both cases an RST is sent back in tcp_input().
A logic error leading to a panic is fixed where syncache_expand() would free the mbuf on socket allocation failure but tcp_input() later supplies it to tcp_dropwithreset() to issue a RST to the peer.
Reported by: kris (the panic)
|
168901 |
20-Apr-2007 |
andre |
Only update TCP timestamp on SYN duplication if it is present on current SYN in syncache_add(). Otherwise disable timestamps.
|
168900 |
20-Apr-2007 |
andre |
o Plug memory leak in syncache_add() on MAC label allocation failure.
o Simplify code flow with 'done' goto label.
o Remove mbuf argument from syncache_respond(). It doesn't make use of it.
|
168859 |
19-Apr-2007 |
rrs |
- More work on making send lock contention.
- Removed free-oqueue cache.
- Fix counter for sq entries
- Increased the amount of information retained on ASOC_TSN logging on the association.
- Made it so with the ASOC_TSN logging on sending or receiving an abort we dump the log.
- Went through and added invariants around some panics that needed them.
- decrements went to atomic_subtract_int instead of add -1
- Removed residual count increment that threw off a strm oq count.
- Track and complain if we don't have a LAST fragment and clean up the sp structure.
- Track a new stat that counts number of abandoned msgs that happen if you close without reading.
- Fix lookup of frag point to be aware of a 0 assoc-id.
Reviewed by: gnn
|
168845 |
18-Apr-2007 |
andre |
Make tcp_twrespond() use tcp_addoptions() instead of a home grown version.
|
168817 |
17-Apr-2007 |
andre |
When we run into the syncache entry limits syncache_add() tries to free the oldest entry in the current bucket row. The global entry limit may be smaller than the bucket rows and their limit combined however. Thus only try to free a syncache entry if we found one in this bucket row.
Reported by: kris
|
168812 |
17-Apr-2007 |
rwatson |
Shorten text string for ip_fw2 dynamic rules zone by removing the word "zone", which is generally not present in zone names. This reduces the incidence of line-wrapping in "vmstat -z" using 80-column displays.
MFC after: 3 days
|
168769 |
15-Apr-2007 |
rwatson |
Remove unused variable tcbinfo_mtx.
|
168757 |
15-Apr-2007 |
rrs |
Fix stupid syntax error - Pointy hat to me :-(
|
168755 |
15-Apr-2007 |
rrs |
- Add more comments to sctps_stats structure in sctp_uio.h
- Fix bug that prevented EEOR mode from working and simplified the can_we_split code in the process.
- Reduce lock contention for the tcb_send_lock. I did this especially for EEOR mode, still need to look at why I need a lock when removing from the tailq and the ->next is NOT null. A lock fixes it but it implies a bug yet exists.
- Activated Andre's proposed changes to better use the mbuf infrastructure.
- Fixed places that were not using the alloc macros to take advantage of the per assoc cache.
- Adds ifdef fix so any logging will enable stat_logging to get the right data structures in place (suggested by Max Laier).
|
168731 |
14-Apr-2007 |
mlaier |
Fix a typeo - unbreak the build.
|
168709 |
14-Apr-2007 |
rrs |
- fix source address selection when picking an acceptable address
- name change of prefered -> preferred
- CMT fast recover code added.
- Comment fixes in CMT.
- We were not giving a reason of cant_start_asoc per socket api if we failed to get init/or/cookie to bring up an assoc. Change so we don't just give a generic "comm lost" but look at actual states of dying assoc.
- change "crc32" arguments to "crc32c" to silence strict/noisy compiler warnings when crc32() is also declared
- A few minor tweaks to get the portable stuff truly portable for sctp6_usrreq.c :-D
- one-2-one style vrf match problem.
- window recovery would leave chks marked for retran during window probes on the sent queue. This would then cause an out-of-order problem and assure that the flight size "problem" would occur.
- Solves a flight size logging issue that caused rwnd overruns, flight size off as well as false retransmissions.
- Macroize the up and down of flight size.
- Fix a ECNE bug in its counting.
- The strict_sacks options was causing aborts when window probing was active, fix to make strict sacks a bit smarter about what the next unsent TSN is.
- Fixes a one-2-one wakeup bug found by Martin Kulas.
- If-defed out form, Andre's copy routines pending his commit of at least m_last().. need to adjust for 6.2 as well.. since m_last won't exist.
Reviewed by: gnn
|
168621 |
11-Apr-2007 |
ru |
Make "struct tcp_timer" visible only to the kernel, and unbreak world.
|
168615 |
11-Apr-2007 |
andre |
Change the TCP timer system from using the callout system five times directly to a merged model where only one callout, the next to fire, is registered.
Instead of callout_reset(9) and callout_stop(9) the new function tcp_timer_activate() is used which then internally manages the callout.
The single new callout is a mutex callout on inpcb simplifying the locking a bit.
tcp_timer() is the called function which handles all race conditions in one place and then dispatches the individual timer functions.
Reviewed by: rwatson (earlier version)
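The merged model described above can be sketched in a few lines of userland C: every logical timer keeps its own deadline, but only the earliest one is ever registered, and a single handler dispatches whatever has expired. This is an illustration under assumptions and not the kernel code. The `T_*` slot names, `struct conn_timers`, and the `timers_*()` functions are hypothetical, and the callout itself is reduced to a stored tick value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical timer slots; the names are illustrative only. */
enum { T_DELACK, T_REXMT, T_PERSIST, T_KEEP, T_2MSL, T_COUNT };

struct conn_timers {
	uint64_t deadline[T_COUNT];	/* absolute tick; 0 means inactive */
	uint64_t callout_at;		/* when the one callout fires; 0 = none */
};

/* Re-derive the single callout from the earliest active deadline. */
static void
timers_rearm(struct conn_timers *ct)
{
	uint64_t next = 0;
	int i;

	for (i = 0; i < T_COUNT; i++)
		if (ct->deadline[i] != 0 &&
		    (next == 0 || ct->deadline[i] < next))
			next = ct->deadline[i];
	ct->callout_at = next;
}

/* Arm timer `which` to fire `delta` ticks after `now`; delta 0 disarms it. */
static void
timers_activate(struct conn_timers *ct, int which, uint64_t now,
    uint64_t delta)
{
	assert(which >= 0 && which < T_COUNT);
	ct->deadline[which] = (delta != 0) ? now + delta : 0;
	timers_rearm(ct);
}

/* Single entry point: expire every due timer and return them as a bitmask. */
static unsigned
timers_fire(struct conn_timers *ct, uint64_t now)
{
	unsigned fired = 0;
	int i;

	for (i = 0; i < T_COUNT; i++) {
		if (ct->deadline[i] != 0 && ct->deadline[i] <= now) {
			ct->deadline[i] = 0;
			fired |= 1u << i;
		}
	}
	timers_rearm(ct);
	return (fired);
}
```

Keeping the dispatch in one function is what lets races between arming, disarming, and firing be handled in a single place.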
|
168590 |
10-Apr-2007 |
rwatson |
Add a new privilege, PRIV_NETINET_REUSEPORT, which will replace superuser checks to see whether bind() can reuse a port/address combination while it's already in use (for some definition of use).
|
168459 |
07-Apr-2007 |
piso |
Prevent the usage of an uninitialized variable: do not accept StartMediaTx message before an OpnRcvChnAck message was received.
Reviewed by: glebius Approved by: glebius (mentor) MFC after: 3 days Found with: Coverity Prevent(tm) CID: 498
|
168458 |
07-Apr-2007 |
piso |
Silence Coverity about an unused variable.
Reviewed by: glebius Approved by: glebius (mentor) MFC after: 3 days CID: 538
|
168369 |
04-Apr-2007 |
andre |
Add INP_INFO_UNLOCK_ASSERT() and use it in tcp_input(). Also add some further INP_INFO_WLOCK_ASSERT() while there.
|
168368 |
04-Apr-2007 |
andre |
Move last tcpcb initialization for the inbound connection case from tcp_input() to syncache_socket() where it belongs and the majority of it already happens.
The "tp->snd_up = tp->snd_una" is removed as it is done with the tcp_sendseqinit() macro a few lines earlier.
|
168365 |
04-Apr-2007 |
andre |
Some local and style(9) cleanups.
|
168364 |
04-Apr-2007 |
andre |
Retire unused TCP_SACK_DEBUG.
|
168363 |
04-Apr-2007 |
andre |
In tcp_dooptions() skip over SACK options if it is a SYN segment.
|
168346 |
04-Apr-2007 |
kan |
Include string.h for non-kernel builds to get proper memcpy prototype.
|
168344 |
04-Apr-2007 |
kan |
Include string.h for non-kernel builds to get proper strcpy, strlen prototypes.
|
168342 |
04-Apr-2007 |
kan |
Do not assign result of (char *) cast to u_char * variable.
|
168328 |
03-Apr-2007 |
julian |
Since we switched to using monotonically increasing timestamps, they have been reported back to the userland as being in 1970. Add boot time to the timestamp to give the time in the scale of the 'current' real timescale. Not perfect if you change the time a lot but good enough to keep all the rules correct relative to each other in terms of time relative to "now".
|
168299 |
03-Apr-2007 |
rrs |
- fixed several places where we did not release INP locks.
- fixed a refcount bug in the new ifa structures.
- use vrf's from default stcb or inp whenever possible.
- Address limits raised to account for a full IP fragmented packet (1000 addresses).
- flight size correcting updated to include one message only and to handle case where the peer does not cumack the next segment aka lists 1/1 in sack blocks..
- Various bad init/init-ack handling could cause a panic since we tried to unlock the destroyed mutex. Fixes so we properly exit when we need to destroy an assoc. (Found by Cisco DevTest team :D)
- name rename in src-addr-selection from pass to sifa.
- route structure typedef'd to allow different platforms and updated into sctp_os_bsd file.
- Max retransmissions a chunk can be made added.
Reviewed by: gnn
|
168124 |
31-Mar-2007 |
rrs |
- Found bug in min split point bundling which caused incorrect, non-bundlable fragmentation.
- Added min residual to better control split points for both how big a msg must be as well as how much needs to be left over.
- With our new algo in place, we need to implicitly set "end of msg" on the sp-> structure otherwise we end up with "hung" associations.
- Room reserved up front in IP header by pushing IP header to back of mbuf.
- Fix so FR's peg count of retransmissions needed.
- Fix so an unlucky chunk that never gets across will kill the assoc via the kill timer and send an abort too.
- Fix bug in sctp_input which can result in a crash.
- Do not strip off IP options anymore.
- Clean up sctp_calculate_rto().
- Get rid of unused sysctl.
- Fixed so we discard all M-Cast
- Fixed so port check done AFTER checksum
- Fixed bug in fragmentation code that prevented us from fragmenting a small complete message when we needed to.
- Window probes were not marked back to unsent and flight adjusted when a sack came in with no window change or accepting of the probe data. We now fix this with having a mark on the net and the chunk so we can clear it out when the sack arrives forcing it to retran just like it was "new" this improves the handling of window probes, which were dropped by the receiver.
- Tighten AUTH protocol error checks during INIT/INIT-ACK exchange
|
168032 |
29-Mar-2007 |
bms |
Fix a bug in IPv4 address configuration exposed by refcounting.
* Join the IPv4 all-hosts multicast group 224.0.0.1 once only; that is, when an IPv4 address is first configured on an interface.
* Do not join it for subsequent IPv4 addresses as this violates IGMP.
* Be sure to leave the group when all IPv4 addresses have been removed from the interface.
* Add two DIAGNOSTIC printfs related to the issue.
Further care and attention is needed in this area; it is suggested that netinet's attachment to the ifnet structure be compartmentalized and non-implicit.
Bug found by: andre MFC after: 1 month
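The join-once, leave-on-last rule described in this entry amounts to tying group membership to the 0 -> 1 and 1 -> 0 transitions of the per-interface address count. A minimal sketch of that rule follows. `struct ifstate` and both functions are hypothetical and only model the bookkeeping; they do not call any real multicast API.

```c
#include <assert.h>

struct ifstate {
	int naddrs;		/* IPv4 addresses currently configured */
	int allhosts_joined;	/* 1 while a member of 224.0.0.1 */
	int join_calls;		/* joins issued so far, for illustration */
};

/* Called when an IPv4 address is configured on the interface. */
static void
ifaddr_added(struct ifstate *ifs)
{
	if (ifs->naddrs++ == 0) {	/* first address: join exactly once */
		ifs->allhosts_joined = 1;
		ifs->join_calls++;
	}
}

/* Called when an IPv4 address is removed from the interface. */
static void
ifaddr_removed(struct ifstate *ifs)
{
	assert(ifs->naddrs > 0);
	if (--ifs->naddrs == 0)		/* last address gone: leave the group */
		ifs->allhosts_joined = 0;
}
```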
|
167989 |
28-Mar-2007 |
andre |
When blackholing do a 'dropunlock' in the new world order to prevent the INP_INFO_LOCK from leaking.
Reported by: ache Found by: rwatson
|
167960 |
28-Mar-2007 |
rwatson |
Remove stale comment about not enabling inpcb and inpcbinfo lock assertions when IPv6 is enabled.
MFC after: 3 days
|
167888 |
25-Mar-2007 |
andre |
In tcp_sack_doack() remove too tight KASSERT() added in last revision. This function may be called without any TCP SACK option blocks present. Protect iteration over SACK option blocks by checking for SACK options present flag first.
Bug reported by: wkoszek, keramida, Nicolas Blais
|
167886 |
25-Mar-2007 |
rwatson |
Replace a comment about RSVP/mrouting with a different but similar comment explaining that some more locking is needed. The routing pieces are done, but there is an interlocking issue between optionally compiled code and mandatory code.
Spotted by: kris
|
167873 |
24-Mar-2007 |
maxim |
o Use a define for a buffer size.
Prodded by: db
o Add missed vars for TCPDEBUG in tcp_do_segment().
Prodded by: tinderbox
|
167839 |
23-Mar-2007 |
andre |
Split tcp_input() into its two functional parts:
o tcp_input() now handles TCP segment sanity checks and preparations including the INPCB lookup and syncache.
o tcp_do_segment() handles all data and ACK processing and is IPv4/v6 agnostic.
Change all KASSERT() messages to ("%s: ", __func__).
The changes in this commit are primarily of mechanical nature and no functional changes besides the function split are made.
Discussed with: rwatson
|
167834 |
23-Mar-2007 |
andre |
Tidy up some code to conform better to surroundings and style(9), 0 = NULL and space/tab.
|
167833 |
23-Mar-2007 |
andre |
Bring SACK option handling in tcp_dooptions() in line with all other options and adjust users accordingly.
|
167831 |
23-Mar-2007 |
bms |
Purge two redundant case labels.
|
167796 |
22-Mar-2007 |
glebius |
Remove global list of all llinfo_arp entries and use a callout per instance expiry of the ARP entries. Since we no longer abuse the IPv4 radix head lock, we can now enter arp_rtrequest() with a lock held on an arbitrary rt_entry.
Reviewed by: bms
|
167785 |
21-Mar-2007 |
andre |
ANSIfy function declarations and remove register keywords for variables. Consistently apply style to all function declarations.
|
167784 |
21-Mar-2007 |
andre |
Match up SYSCTL declarations in style.
|
167780 |
21-Mar-2007 |
andre |
Subtract optlen in the maximum length check for TSO and finally avoid slightly oversized TSO mbuf chains.
Submitted by: kmacy
|
167779 |
21-Mar-2007 |
andre |
Tidy up IPFIREWALL_FORWARD sections and comments.
|
167778 |
21-Mar-2007 |
andre |
Update and clarify comments in first section of tcp_input().
|
167777 |
21-Mar-2007 |
andre |
Tidy up the ACCEPTCONN section of tcp_input(), adjust comments and remove old dead T/TCP code.
|
167775 |
21-Mar-2007 |
andre |
Tidy up tcp_log_in_vain and blackhole.
|
167774 |
21-Mar-2007 |
andre |
Make TCP_DROP_SYNFIN a standard part of TCP. Disabled by default it doesn't impede normal operation negatively and is only a few lines of code. Its close relatives blackhole and log_in_vain aren't options either.
|
167772 |
21-Mar-2007 |
andre |
Remove tcp_minmssoverload DoS detection logic. The problem it tried to protect us from wasn't really there and it only bloats the code. Should the problem surface in the future we can simply resurrect it from cvs history.
|
167739 |
20-Mar-2007 |
bms |
Increase default size of raw IP send and receive buffers to the same as udp_sendspace, to avoid a situation where jumbograms (datagrams > 9KB) are unnecessarily fragmented.
A common use case for this is OSPF link-state database synchronization during adjacency bringup on a high speed network with a large MTU.
It is not possible to auto-tune this setting until a socket is bound to a given interface, and because the laddr part of the inpcb tuple may be overridden, it makes no sense to do so. Applications may request a larger socket buffer size by using the SO_SNDBUF and SO_RCVBUF socket options.
Certain applications such as Quagga ospfd do not probe for interface MTU and therefore do not increase SO_SNDBUF in this use case. XORP is not affected by this problem as it preemptively uses SO_SNDBUF and SO_RCVBUF to account for any possible additional latency in XRL IPC.
PR: kern/108375 Requested by: Vladimir Ivanov MFC after: 1 week
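An application that does not want to rely on the defaults can ask for larger buffers itself. The standard socket options for this are spelled SO_SNDBUF and SO_RCVBUF. The sketch below shows the call pattern only. `set_sock_bufs()` is a hypothetical helper, and the size an application should request depends on its interface MTU, which is not computed here.

```c
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: request `bytes` for both the send and the receive
 * buffer of `fd`. Returns 0 on success, -1 if either request fails.
 */
static int
set_sock_bufs(int fd, int bytes)
{
	if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes,
	    sizeof(bytes)) == -1)
		return (-1);
	if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes,
	    sizeof(bytes)) == -1)
		return (-1);
	return (0);
}
```

The kernel may round or cap the requested size, so a caller that needs a guarantee should read the value back with getsockopt().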
|
167736 |
20-Mar-2007 |
rrs |
- window update sacks sent incorrectly after shutdown which caused extra abort from peer.
- RTT time calculation was not being done in express sack handling since it referred to an unused variable (rto_pending). Removed variable.
- socket buffer high water access macro-ized.
|
167729 |
20-Mar-2007 |
bms |
Implement reference counting for ifmultiaddr, in_multi, and in6_multi structures. Detect when ifnet instances are detached from the network stack and perform appropriate cleanup to prevent memory leaks.
This has been implemented in such a way as to be backwards ABI compatible. Kernel consumers are changed to use if_delmulti_ifma(); in_delmulti() is unable to detect interface removal by design, as it performs searches on structures which are removed with the interface.
With this architectural change, the panics FreeBSD users have experienced with carp and pfsync should be resolved.
Obtained from: p4 branch bms_netdev Reviewed by: andre Sponsored by: Garance A Drosehn Idea from: NetBSD MFC after: 1 month
|
167721 |
19-Mar-2007 |
andre |
Match up SYSCTL declaration style.
|
167718 |
19-Mar-2007 |
andre |
Match up SYSCTL_INT declarations in style.
|
167715 |
19-Mar-2007 |
andre |
Maintain a pointer and offset pair into the socket buffer mbuf chain to avoid traversal of the entire socket buffer for larger offsets on stream sockets.
Adjust tcp_output() make use of it.
Tested by: gallatin
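The benefit of caching a pointer and offset pair can be seen with a plain linked chain of segments: a lookup at or beyond the cached offset resumes from the cached segment instead of walking from the head. This is a userland sketch under assumptions and not the socket buffer code. `struct seg`, `struct sbuf`, and `sb_locate()` are hypothetical names, and the sketch ignores invalidating the hint when data is dropped from the front of the chain.

```c
#include <assert.h>
#include <stddef.h>

struct seg {
	struct seg *next;
	size_t len;		/* bytes held in this segment */
};

struct sbuf {
	struct seg *head;
	struct seg *hint;	/* last segment found, or NULL */
	size_t hint_off;	/* byte offset at which `hint` starts */
};

/*
 * Find the segment containing byte offset `off`. On success the offset
 * within that segment is stored in *segoff and the hint is updated.
 * Returns NULL when `off` is past the end of the chain.
 */
static struct seg *
sb_locate(struct sbuf *sb, size_t off, size_t *segoff)
{
	struct seg *s = sb->head;
	size_t base = 0;

	assert(sb != NULL && segoff != NULL);
	/* Resume from the hint when the target is not behind it. */
	if (sb->hint != NULL && off >= sb->hint_off) {
		s = sb->hint;
		base = sb->hint_off;
	}
	while (s != NULL && off >= base + s->len) {
		base += s->len;
		s = s->next;
	}
	if (s == NULL)
		return (NULL);
	sb->hint = s;
	sb->hint_off = base;
	*segoff = off - base;
	return (s);
}
```

For a sender that mostly reads at increasing offsets, each lookup then costs a step or two rather than a walk over the whole buffer.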
|
167698 |
19-Mar-2007 |
rrs |
Adds a hash table to speed local address lookup on a per VRF basis (BSD has only one VRF currently). Hash table is sized to 16 but may need to be adjusted for machines with large numbers of addresses. Reviewed by: gnn
|
167695 |
19-Mar-2007 |
rrs |
- errno -> becomes error in sctp_output.c and sctputil.c
- SB_CLEAR macro defined and used for sb clearing.
- Fix for CMT express_sack_handling did not do proper pseudo-cumack updates.
- Get rid of extraneous function that was never used ip_2_ip6_hdr()
- Fixed source address selection bug (initialization problem).
- Source address selection debug added.
|
167682 |
18-Mar-2007 |
bms |
In IPv4 fast forwarding path, send ICMP unreachable messages for routes which have RTF_REJECT set *and* a zero expiry timer.
PR: kern/109246 MFC after: 10 days Submitted by: Ingo Flaschberger
|
167659 |
17-Mar-2007 |
andre |
Unbreak IPv6 after consolidation of TCP options insertion.
Submitted by: tegge
|
167658 |
17-Mar-2007 |
kmacy |
Fix the most obvious of the bugs introduced by recent syncache changes
- *ip is not initialized in the case of inet6 connection, but ip->ip_len is being changed anyway
Now the question is, why does it think an ipv4 connection is an ipv6 connection? xemacs still doesn't work over X11 forwarding, but the kernel no longer panics.
|
167636 |
16-Mar-2007 |
rwatson |
Remove unused and #if 0'd net.inet.tcp.tcp_rttdflt sysctl.
|
167606 |
15-Mar-2007 |
andre |
Consolidate insertion of TCP options into a segment from within tcp_output() and syncache_respond() into its own generic function tcp_addoptions().
tcp_addoptions() is alignment agnostic and does optimal packing in all cases.
In struct tcpopt rename to_requested_s_scale to just to_wscale.
Add a comment with quote from RFC1323: "The Window field in a SYN (i.e., a <SYN> or <SYN,ACK>) segment itself is never scaled."
Reviewed by: silby, mohans, julian Sponsored by: TCP/IP Optimization Fundraise 2005
|
167598 |
15-Mar-2007 |
rrs |
- Sysctl's move to separate file
- moved away from ifn/ifa access to sctp_ifa/sctp_ifn built and managed by the add-ip code.
- cleaned up add-ip code to use the iterator
- made iterator be a thread, which enables auto-asconf now.
- rewrote and cleaned up source address selection (also made it use new structures).
- Fixed a couple of memory leaks.
- DACK now settable as to how many packets to delay as well as time.
- connectx() to latest socket API, new associd arg.
- Fixed issue with revoking and losing potential to send when we inflate the flight size. We now inflate the cwnd too and deflate it later when the revoked chunk is sent or acked.
- Got rid of some temp debug code
- src addr selection moved to a common file (sctp_output.c)
- Support for simple VRF's (we have support for multi-vrf via compile switch that is scrubbed from BSD but we won't need multi-vrf until we first get VRF :-D)
- Rest of mib work for address information now done
- Limit number of addresses in INIT/INIT-ACK to a #def (30).
Reviewed by: gnn
|
167593 |
15-Mar-2007 |
bms |
Diff reduction with NetBSD; use IN_LOCAL_GROUP() to check if an address is within the locally scoped multicast range 224.0.0.0/24.
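The range test itself is a single mask and compare on a host-byte-order address: an address is in 224.0.0.0/24 when its top 24 bits equal 224.0.0. The function below is a standalone sketch of that test. `in_local_group()` is a hypothetical name used here so that it does not depend on any system header providing the macro.

```c
#include <stdint.h>

/*
 * Return 1 if `addr` (host byte order) lies in the locally scoped
 * multicast range 224.0.0.0/24, else 0.
 */
static int
in_local_group(uint32_t addr)
{
	return ((addr & 0xffffff00u) == 0xe0000000u);
}
```

An address taken from a packet or a `struct in_addr` is in network byte order and needs ntohl() before being passed in.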
|
167342 |
08-Mar-2007 |
bms |
Fix IP_SENDSRCADDR semantics.
* To use this option with a UDP socket, it must be bound to a local port, and INADDR_ANY, to disallow possible collisions with existing udp inpcbs bound to the same port on other interfaces at send time.
* If the socket is bound to INADDR_ANY, specifying IP_SENDSRCADDR with INADDR_ANY will be rejected as it is ambiguous.
* If the socket is bound to an address other than INADDR_ANY, specifying IP_SENDSRCADDR with INADDR_ANY will be disallowed by in_pcbbind_setup().
Reviewed by: silence on -net Tested with: src/tools/regression/netinet/ipbroadcast MFC after: 4 days
|
167310 |
07-Mar-2007 |
qingli |
This patch is provided to fix a couple of deployment issues observed in the field. In one situation, one end of the TCP connection sends a back-to-back RST packet; with delayed ACK, the last_ack_sent variable has not been updated yet. When tcp_insecure_rst is turned off, the code treats the RST as invalid because last_ack_sent instead of rcv_nxt is compared against th_seq. Apparently there is some kind of firewall that sits in between the two ends and that RST packet is the only RST packet received. With short-lived HTTP connections, the symptom is a large accumulation of connections over a short period of time.
The +/-(1) factor is to take care of implementations out there that generate RST packets with these types of sequence numbers. This behavior has also been observed in live environments.
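The check described above can be sketched roughly as follows. This is an illustration in Python, not the committed C code: the function names are hypothetical, the receive-window test that precedes the strict check is omitted, and accepting a match against either rcv_nxt or last_ack_sent is an assumption drawn from the description.

```python
SEQ_SPACE = 1 << 32  # TCP sequence numbers are 32-bit and wrap around


def seq_diff(a, b):
    """Signed distance a - b in 32-bit sequence space."""
    d = (a - b) % SEQ_SPACE
    return d - SEQ_SPACE if d >= (1 << 31) else d


def within_one(seq, ref):
    """True if seq is ref - 1, ref, or ref + 1 (the +/-1 tolerance)."""
    return abs(seq_diff(seq, ref)) <= 1


def strict_rst_ok(th_seq, rcv_nxt, last_ack_sent):
    """Hypothetical strict RST check used when tcp_insecure_rst is off.

    Assumption: the RST is accepted when its sequence number is within
    one of rcv_nxt, or within one of last_ack_sent.
    """
    return within_one(th_seq, rcv_nxt) or within_one(th_seq, last_ack_sent)
```

With a delayed ACK pending, rcv_nxt has advanced while last_ack_sent has not, so a RST carrying rcv_nxt passes this check even though it does not match last_ack_sent.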
Reviewed by: silby, Mike Karels MFC after: 1 week
|
167205 |
04-Mar-2007 |
bms |
Purge an out-of-date comment.
|
167141 |
01-Mar-2007 |
bms |
Fix undirected broadcast sends for the case where SO_DONTROUTE has also been set at the socket layer, in our somewhat convoluted IPv4 source selection logic in ip_output().
IP_ONESBCAST is actually a special case of SO_DONTROUTE, as 255.255.255.255 must always be delivered on a local link with a TTL of 1.
If IP_ONESBCAST has been set at the socket layer, also perform destination interface lookup for point-to-point interfaces based on the destination address of the link; previously it was not possible to use the option with such interfaces; also, the destination/broadcast address fields map to the same field within struct ifnet, which doesn't help matters.
One more valid fix going forward for these issues is to treat 255.255.255.255 as a destination in its own right in the forwarding trie. Other implementations do this. It fits with the use of multiple paths, though it then becomes necessary to specify interface preference. This hack will eventually go away when that comes to pass.
Reviewed by: andre MFC after: 1 week
|
167139 |
01-Mar-2007 |
andre |
Prevent TSO mbuf chain from overflowing a few bytes by subtracting the TCP options size before the TSO total length calculation.
Bug found by: kmacy
|
167120 |
28-Feb-2007 |
mohans |
In the SYN_SENT case, initialize the snd_wnd before the call to tcp_mss(). The TCP hostcache logic in tcp_mss() depends on the snd_wnd being initialized.
|
167116 |
28-Feb-2007 |
bms |
Style: Move declaration of subsystem mutex to where other mutexes are in this file, and use macros for dealing with it.
|
167107 |
28-Feb-2007 |
glebius |
Add EHOSTDOWN and ENETUNREACH to the list of soft errors, that shouldn't be returned up to the caller.
PR: 100172 Submitted by: "Andrew - Supernews" <andrew supernews.net> Reviewed by: rwatson, bms
|
167106 |
28-Feb-2007 |
glebius |
Toss the code, that handles errors from ip_output(), to make it more readable: - Merge two embedded if() into one. - Introduce switch() block to handle different kinds of errors.
Reviewed by: rwatson, bms
|
167072 |
27-Feb-2007 |
bms |
Add INADDR_ALLRPTS_GROUP define for 224.0.0.22 for future IGMPv3 support.
Obtained from: OpenSolaris
|
167036 |
26-Feb-2007 |
mohans |
Reap FIN_WAIT_2 connections marked SOCANTRCVMORE faster. This mitigates potential issues where the peer does not close, potentially leaving thousands of connections in FIN_WAIT_2. This is controlled by a new sysctl fast_finwait2_recycle, which is disabled by default.
Reviewed by: gnn, silby.
|
166972 |
25-Feb-2007 |
bms |
Unlock a mutex which should be unlocked before returning.
MFC after: 1 week
|
166938 |
24-Feb-2007 |
bms |
Make IPv6 multicast forwarding dynamically loadable from a GENERIC kernel. It is built in the same module as IPv4 multicast forwarding, i.e. ip_mroute.ko, if and only if IPv6 support is enabled for loadable modules. Export IPv6 forwarding structs to userland netstat(1) via sysctl(9).
|
166842 |
20-Feb-2007 |
rwatson |
Rename two identically named log_in_vain variables: tcp_input.c's static log_in_vain to tcp_log_in_vain, and udp_usrreq's global log_in_vain to udp_log_in_vain.
MFC after: 1 week
|
166841 |
20-Feb-2007 |
rwatson |
Gratuitous UDP restyling toward style(9) in 7.x.
|
166811 |
18-Feb-2007 |
rwatson |
#ifdef INET6 printing of inpcb IPv6 addresses in DDB. Patch committed with minor adjustments.
Submitted by: Florian C. Smeets <flo at kasimir dot com>
|
166807 |
17-Feb-2007 |
rwatson |
Add "show inpcb", "show tcpcb" DDB commands, which should come in handy for debugging sblock and other network panics.
|
166793 |
16-Feb-2007 |
rwatson |
Remove unused inp6_ifindex field from inpcb, as well as unused macro shortcut for it.
|
166792 |
16-Feb-2007 |
rwatson |
Remove unused in6p_ip6_hlim macro shortcut for non-present inp_depend6.inp6_hlim field in the inpcb.
|
166675 |
12-Feb-2007 |
rrs |
- Copyright updates (aka 2007)
- ZONE get now also takes a type cast so it does the cast like mtod does.
- New macro SCTP_LIST_EMPTY, which in bsd is just LIST_EMPTY
- Removal of const in some of the static hmac functions (not needed)
- Store length changes to allow for new fields in auth
- Auth code updated to current draft (this should be the RFC version we think).
- use uint8_t instead of u_char in LOOPBACK address comparison
- Some u_int32_t converted to uint32_t (in crc code)
- A bug was found in the mib counts for ordered/unordered count, this was fixed (was referencing a freed mbuf).
- SCTP_ASOCLOG_OF_TSNS added (code will probably disappear after my testing completes). It allows us to keep a small log on each assoc of the last 40 TSN's in/out and stream assignment. It is NOT in options and so is only good for private builds.
- Some CMT changes in prep for Jana fixing his problem with reneging when CMT is enabled (Concurrent Multipath Transfer = CMT).
- Some missing mib stats added.
- Correction to number of open assoc's count in mib
- Correction to os_bsd.h to get right sha2 macros
- Add of special AUTH_04 flags so you can compile the code with the old format (in case the peer does not yet support the latest auth code).
- Nonce sum was incorrectly being set when ecn_nonce was NOT on.
- LOR in listen with implicit bind found and fixed.
- Moved away from using mbuf's for socket options to using just data pointers. The mbufs were used to harmonize NetBSD code since both Net and Open used this method. We have decided to move away from that and more conform to FreeBSD style (which makes more sense).
- Very very nasty bug found in some of my "debug" code. The cookie_how collision case tracking had an endless loop in it if you got a second retransmission of a cookie collision case. This would lock up a CPU .. ugly..
- auth function goes to using size_t instead of int which conforms to socketapi better
- Found the nasty bug that happens after 9 days of testing.. you get the data chunk, deliver it and due to the reference to a ch-> that every now and then has been deleted (depending on the position in the mbuf) you have an invalid ch->ch.flags.. and thus you don't advance the stream sequence number.. so you block the stream permanently. The fix is to make local variables of these guys and set them up before you have any chance of trimming the mbuf.
- style fix in sctp_util.h, not sure how this got bad maybe in the last patch? (aka it may not be in the real source).
- Found interesting bug when using the extended snd/rcv info where we would get an error on receiving with this. That's because it was NOT padded to the same size as the snd_rcv info. We increase (add the pad) so the two structs are the same size in sctp_uio.h
- In sctp_usrreq.c one of the most common things we did for socket options was to cast the pointer and validate the size. This has been macro-ized to help make the code more readable.
- in sctputil.c two things, the socketapi class found a missing flag type (the next msg is a notification) and a missing scope recovery was also fixed.
Reviewed by: gnn
|
166629 |
10-Feb-2007 |
bms |
Use MAXTTL.
Obtained from: NetBSD
|
166623 |
10-Feb-2007 |
bms |
If the rendezvous point for a group is not specified, do not send IGMPMSG_WHOLEPKT notifications to the userland PIM routing daemon, as an optimization to mitigate the effects of high multicast forwarding load.
This is an experimental change, therefore it must be explicitly enabled by setting the sysctl/tunable net.inet.pim.squelch_wholepkt to a non-zero value. The tunable may be set from the loader or from within the kernel environment when loading ip_mroute.ko as a module.
Submitted by: edrt <edrt at citiz.net> See also: http://mailman.icsi.berkeley.edu/pipermail/xorp-users/2005-June/000639.html
|
166622 |
10-Feb-2007 |
bms |
Build PIM by default as part of the IPv4 multicast forwarding path. Make PIM dynamically loadable by using encap_attach_func(). PIM may now be loaded into a GENERIC kernel.
Tested with: ports/net/pimdd && tcpreplay && wireshark Reviewed by: Pavlin Radoslavov
|
166576 |
08-Feb-2007 |
bms |
Store the cached route in vifp in the normal send_packet() case. The VIFF_TUNNEL case no longer exists, therefore this field is free to use, and its use eliminates a static data member.
|
166575 |
08-Feb-2007 |
bms |
Nuke the token bucket filter code. Attempting to request rate limiting by the token bucket filter will result in EINVAL being returned.
If you want to rate-limit traffic in future, use ALTQ or dummynet; this isn't a general purpose QoS engine.
Preserve the now unused fields in struct vif so as to avoid having to recompile netstat(1) and other tools.
Reviewed by: Pavlin Radoslavov, Bill Fenner
|
166555 |
07-Feb-2007 |
bms |
eliminate redundant macro MC_SEND()
|
166549 |
07-Feb-2007 |
bms |
Remove support for IPIP tunnels in IPv4 multicast forwarding. XORP has never used them; with mrouted, their functionality may be replaced by explicitly configuring gif(4) instances and specifying them with the 'phyint' keyword.
Bump __FreeBSD_version to 700030, and update UPDATING. A doc update is forthcoming.
Discussed on: net Reviewed by: fenner MFC after: 3 months
|
166507 |
05-Feb-2007 |
bms |
When fast-forwarding is enabled, do not forward directed IPv4 broadcasts to locally attached broadcast networks.
Note well: This relies on the layer 2 route cloning behaviour in BSD.
PR: 98799 Tested by: Dmitry Sergienko MFC after: 1 week
|
166479 |
03-Feb-2007 |
alc |
Include opt_ipdivert.h so that the message announcing ipfw correctly describes the state of IPDIVERT.
|
166452 |
03-Feb-2007 |
bms |
In fast forwarding path, defer processing of 169.254.0.0/16 to ip_input(). See RFC 3927 section 2.7.
|
166450 |
03-Feb-2007 |
bms |
In regular forwarding path, reject packets destined for 169.254.0.0/16 link-local addresses. See RFC 3927 section 2.7.
|
166436 |
02-Feb-2007 |
bms |
Comply with RFC 3927, by forcing ARP replies which contain a source address within the link-local IPv4 prefix 169.254.0.0/16, to be broadcast at link layer.
Reviewed by: fenner MFC after: 2 weeks
|
166433 |
02-Feb-2007 |
bms |
Expose smoothed RTT and RTT variance measurements to userland via socket option TCP_INFO. Note that the units used in the original Linux API are in microseconds, so use a 64-bit mantissa to convert FreeBSD's internal measurements from struct tcpcb from ticks.
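A rough sketch of the tick-to-microsecond conversion mentioned above, written in Python for illustration only. The scaling constants (srtt kept as ticks shifted left by 5, rttvar by 4), the hz value, and the function name are assumptions, not taken from the commit; the mask only mimics a 64-bit intermediate, since Python integers do not overflow.

```python
TCP_RTT_SHIFT = 5      # assumed fixed-point scale of the smoothed RTT
TCP_RTTVAR_SHIFT = 4   # assumed fixed-point scale of the RTT variance
U64_MASK = (1 << 64) - 1


def scaled_ticks_to_usec(scaled_ticks, shift, hz=1000):
    """Convert a fixed-point tick count to microseconds.

    The multiplication is done first, in a 64-bit wide intermediate,
    so the fractional bits are not thrown away before scaling.
    """
    usec_per_tick = 1_000_000 // hz
    wide = (scaled_ticks * usec_per_tick) & U64_MASK
    return wide >> shift
```

For example, a smoothed RTT of 100 ticks at hz=1000 is stored as 100 << 5 under these assumptions and converts to 100000 microseconds.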
|
166423 |
02-Feb-2007 |
glebius |
Since rev. 1.94 of netinet/in.c, the netinet layer frees all its multicast memberships, when interface is detached. Thus, when an underlying interface is detached, we do not need to free our multicast memberships.
Reviewed by: bms
|
166405 |
01-Feb-2007 |
andre |
Auto sizing TCP socket buffers.
Normally the socket buffers are static (either derived from global defaults or set with setsockopt) and do not adapt to real network conditions. Two things happen: a) your socket buffers are too small and you can't reach the full potential of the network between both hosts; b) your socket buffers are too big and you waste a lot of kernel memory for data just sitting around.
With automatic TCP send and receive socket buffers we can start with a small buffer and quickly grow it in parallel with the TCP congestion window to match real network conditions.
FreeBSD has a default 32K send socket buffer. This supports a maximal transfer rate of only slightly more than 2Mbit/s on a 100ms RTT trans-continental link. Or at 200ms just above 1Mbit/s. With TCP send buffer auto scaling and the default values below it supports 20Mbit/s at 100ms and 10Mbit/s at 200ms. That's an improvement of factor 10, or 1000%. For the receive side it looks slightly better with a default of 64K buffer size.
New sysctls are:
net.inet.tcp.sendbuf_auto=1 (enabled)
net.inet.tcp.sendbuf_inc=8192 (8K, step size)
net.inet.tcp.sendbuf_max=262144 (256K, growth limit)
net.inet.tcp.recvbuf_auto=1 (enabled)
net.inet.tcp.recvbuf_inc=16384 (16K, step size)
net.inet.tcp.recvbuf_max=262144 (256K, growth limit)
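The transfer rates quoted above follow from the rule that at most one buffer's worth of data can be in flight per round trip. A small Python sketch of that arithmetic (the function name is illustrative and protocol overhead is ignored):

```python
def max_rate_mbit(buffer_bytes, rtt_ms):
    """Upper bound on throughput in Mbit/s when one full socket buffer
    can be in flight per round trip."""
    bits_per_rtt = buffer_bytes * 8
    return bits_per_rtt / (rtt_ms * 1000)


default_sendbuf = 32 * 1024    # 32K default send buffer
sendbuf_max = 262144           # net.inet.tcp.sendbuf_max

for rtt_ms in (100, 200):
    print(rtt_ms, "ms:",
          round(max_rate_mbit(default_sendbuf, rtt_ms), 2), "->",
          round(max_rate_mbit(sendbuf_max, rtt_ms), 2), "Mbit/s")
```

This gives about 2.62 and 1.31 Mbit/s for the 32K buffer at 100ms and 200ms, and about 20.97 and 10.49 Mbit/s for the 256K limit.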
Tested by: many (on HEAD and RELENG_6) Approved by: re MFC after: 1 month
|
166403 |
01-Feb-2007 |
andre |
Change the way the advertized TCP window scaling is computed. Instead of upper-bounding it to the size of the initial socket buffer lower-bound it to the smallest MSS we accept. Ideally we'd use the actual MSS information here but it is not available yet.
For socket buffer auto sizing to be effective we need room to grow the receive window. The window scale shift is determined at connection setup and can't be changed afterwards. The previous, original, method effectively just did a power of two roundup of the socket buffer size at connection setup severely limiting the headroom for larger socket buffers.
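The difference between the two methods can be sketched as below. This is a Python illustration based on a reading of the description, not the committed code: the function names are hypothetical, and the minimum MSS of 216 used in the note is an assumed value for the smallest MSS accepted.

```python
TCP_MAXWIN = 65535        # largest window expressible without scaling
TCP_MAX_WINSHIFT = 14     # largest window scale shift TCP allows


def old_scale(sockbuf_bytes):
    """Previous method: smallest shift whose scaled window covers the
    initial socket buffer (a power-of-two roundup of the buffer size)."""
    shift = 0
    while shift < TCP_MAX_WINSHIFT and (TCP_MAXWIN << shift) < sockbuf_bytes:
        shift += 1
    return shift


def new_scale(min_mss):
    """New method (assumed form): smallest shift whose window granularity
    (1 << shift) is not below the smallest MSS accepted."""
    shift = 0
    while shift < TCP_MAX_WINSHIFT and (1 << shift) < min_mss:
        shift += 1
    return shift
```

With a 64K receive buffer, old_scale(65536) yields a shift of 1, capping the window near 128K for the life of the connection. Under the assumed minimum MSS, new_scale(216) yields 8, which leaves room to grow the window to roughly 16M.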
Tested by: many (as part of the socket buffer auto sizing patch) MFC after: 1 month
|
166368 |
31-Jan-2007 |
bms |
Import macros IN_LINKLOCAL(), IN_PRIVATE(), IN_LOCAL_GROUP(), IN_ANY_LOCAL(). This is not a functional change.
IN_LINKLOCAL() tests if an address falls within the IPv4 link-local prefix. IN_PRIVATE() tests if an address falls within an RFC 1918 private prefix. IN_LOCAL_GROUP() tests if an address falls within the statically assigned link-local multicast scope specified in RFC 2365. IN_ANY_LOCAL() tests for either of IN_LINKLOCAL() or IN_LOCAL_GROUP().
As with the existing macros in the FreeBSD netinet stack, comparisons are performed in host-byte order.
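To illustrate what the four tests check, here is a Python sketch operating on host-byte-order integers. The function names mirror the macros but are illustrative; the masks come from RFC 3927 (169.254.0.0/16), RFC 1918 (10/8, 172.16/12, 192.168/16) and the 224.0.0.0/24 local multicast range, not from the header itself.

```python
import ipaddress


def host_order(addr):
    """Dotted-quad string to a host-byte-order integer."""
    return int(ipaddress.IPv4Address(addr))


def in_linklocal(a):
    return (a & 0xFFFF0000) == 0xA9FE0000          # 169.254.0.0/16


def in_private(a):
    return ((a & 0xFF000000) == 0x0A000000 or      # 10.0.0.0/8
            (a & 0xFFF00000) == 0xAC100000 or      # 172.16.0.0/12
            (a & 0xFFFF0000) == 0xC0A80000)        # 192.168.0.0/16


def in_local_group(a):
    return (a & 0xFFFFFF00) == 0xE0000000          # 224.0.0.0/24


def in_any_local(a):
    return in_linklocal(a) or in_local_group(a)
```

An address read from a packet is in network byte order and would need converting (ntohl() in C) before being passed to tests written this way.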
See also: RFC 1918, RFC 2365, RFC 3927 Obtained from: NetBSD (dyoung@) MFC after: 2 weeks
|
166228 |
25-Jan-2007 |
glebius |
Make it possible for carpdetach() to unlock on return. Then, in carp_clone_destroy() we are on the safe side: we don't need to unlock the cif, which may already be non-existent at this point.
Reported by: Anton Yuzhaninov <citrin rambler-co.ru>
|
166226 |
25-Jan-2007 |
glebius |
Spacing.
|
166086 |
18-Jan-2007 |
rrs |
- Almost all includes (#include <>) migrate to the sctp_os_bsd.h file
- Finally all splxx() are removed
- Count error fixed in mapping array which might cause a wrong cumack generation.
- Invariants around panic for case D + printf when no invariants.
- one-to-one model race condition fixed by using a pre-formed connection and then completing the work so accept won't happen on a non-formed association.
- Some additional paranoia checks in sctp_output.
- Locks that were missing in the accept code.
Approved by: gnn
|
166023 |
15-Jan-2007 |
rrs |
- Macroizes the V6ONLY flag check.
- Added a short time wait (not used yet) constant
- Corrected the type of the crc32c table (it was unsigned long and really is a uint32_t)
- Got rid of the use of MHeaders until they are truly needed by lower layers.
- Fixed an initialization problem in the readq structure (ordering was off).
- Found yet another collision bug when the random number generator returns two numbers on one side (during a collision) that are the same. Also added some tracking of cookies that will go away when we know that we have the last collision bug gone.
- Fixed an init bug for book_size_scale, that was causing Early FR code to run when it should not.
- Fixed a flight size tracking bug that was associated with Early FR but due to above bug also affected all FR's
- Fixed it so Max Burst also will apply to Fast Retransmit.
- Fixed a bug in the temporary logging code that allowed a static log array overflow
- hashinit_flags is now used.
- Two last mcopym's were converted to the macro sctp_m_copym that has always been used by all other places
- macro sctp_m_copym was converted to upper case.
- We now validate sinfo_flags on input (we did not before).
- Fixed a bug that prevented a user from sending data and immediately shutting down with one send operation.
- Moved to use hashdestroy instead of free() in our macros.
- Fixed an init problem in our timed_wait vtag where we did not fully initialize our time-wait blocks.
- Timer stops were re-positioned.
- A pcb cleanup method was added, however this probably will not be used in BSD.. unless we make module loadable protocols
- I think this fixes the mysterious timer bug.. it was an ordering of locks problem in the way we did timers. It now conforms to the timeout(9) manual (except for the _drain part, we had to do this a different way due to locks).
- Fixed error return code so we get either CONNREUSED or CONNRESET depending on where one is in progression
- Purged an unused clone macro.
- Fixed a read error code issue where we were NOT getting the proper error when the connection was reset.
Approved by: gnn
|
166010 |
14-Jan-2007 |
maxim |
o Increment the requests counter right before actually sending out an ARP query. Otherwise the code could lead to spurious EHOSTDOWN errors.
PR: kern/107807 Submitted by: Dmitrij Tejblum MFC after: 1 month
|
165966 |
12-Jan-2007 |
imp |
Marking this as __packed was needed to get the alignment and offset of members right. However, it also said it was aligned(1), which meant that gcc generated really bad code. Mark this as aligned(4). This makes things a little faster on arm (a couple percent), but also saves about 30k on the size of the kernel for arm.
I talked about doing this with bde, but didn't check with him before the commit, so I'm hesitant to say 'reviewed by: bde'.
|
165919 |
09-Jan-2007 |
julian |
Remove two lines that somehow snuck back in after testing. ip is now an argument to the function ipfw_log()
|
165831 |
06-Jan-2007 |
maxim |
o One more typo in the comment.
PR: kern/107609 Submitted by: Dr. Markus Waldeck
|
165802 |
05-Jan-2007 |
piso |
Prevent adding a rule with a nat action in case IPFIREWALL_NAT was not defined.
Reviewed: luigi
|
165750 |
03-Jan-2007 |
piso |
Wrap ipfw nat support in a new kernel config option named "IPFIREWALL_NAT": this way nat is turned off by default and POLA is preserved.
Reviewed by: rwatson
|
165738 |
02-Jan-2007 |
julian |
Remove a bunch of dependencies in the IP header being the first thing in the mbuf. First moves toward being able to cope better with having layer 2 (or other encapsulation data) before the IP header in the packet being examined. More commits to come to round out this functionality. This commit should have no practical effect but clears the way for what is coming.
Reviewed by: luigi, yar MFC After: 2 weeks
|
165710 |
01-Jan-2007 |
imp |
Fix typo in comment.
Submitted by: remko
|
165709 |
31-Dec-2006 |
imp |
Add comment about udp checksums being off in BSD 4.2 compatibility mode.
Submitted by: Dr. Markus Waldeck PR: kern/106657
|
165657 |
30-Dec-2006 |
jhb |
Whitespace fix and remove an extra cast.
|
165648 |
29-Dec-2006 |
piso |
Summer of Code 2005: improve libalias - part 2 of 2
With the second (and last) part of my previous Summer of Code work, we get:
-ipfw's in kernel nat
-redirect_* and LSNAT support
General information about nat syntax and some examples are available in the ipfw (8) man page. The redirect and LSNAT syntax are identical to natd, so please refer to natd (8) man page.
To enable in kernel nat in rc.conf, two options were added:
o firewall_nat_enable: equivalent to natd_enable
o firewall_nat_interface: equivalent to natd_interface
Remember to set net.inet.ip.fw.one_pass to 0, if you want the packet to continue being checked by the firewall ruleset after being (de)aliased.
NOTA BENE: due to some problems with libalias architecture, in kernel nat won't work with TSO enabled nic, thus you have to disable TSO via ifconfig (ifconfig foo0 -tso).
Approved by: glebius (mentor)
|
165647 |
29-Dec-2006 |
rrs |
a) macro-ization of all mbuf and random number access plus timers. This makes the code more portable and able to change out the mbuf or timer system used more easily ;-)
b) removal of all use of pkt-hdr's until only the places we need them (before ip_output routines).
c) remove a bunch of code not needed due to (b), aka worrying about pkthdr's :-)
d) There was one last reorder problem, it looks like, where if a restart occurs and we release and relock (at the point where we setup our alias vtag) we would end up possibly getting the wrong TSN in place. The code that fixed the TSN's just needed to be shifted around BEFORE the release of the lock.. also code that set the state (since this also could contribute).
Approved by: gnn
|
165634 |
29-Dec-2006 |
jhb |
Some whitespace nits and remove a few casts.
|
165243 |
15-Dec-2006 |
piso |
o made in kernel libalias mpsafe
o fixed a comment
o made in kernel libalias a bit less verbose (disabled automatic logging every time a new link is added or deleted)
Approved by: glebius (mentor)
|
165220 |
14-Dec-2006 |
rrs |
1) Fixes on a number of different collision case LOR's.
2) Fix all "magic numbers" to be constants.
3) A collision case that would generate two associations to the same peer due to a missing lock is fixed.
4) Added tracking of where timers are stopped.
Approved by: gnn
|
165149 |
13-Dec-2006 |
csjp |
Fix LOR between the syncache and inpcb locks when MAC is present in the kernel. This LOR snuck in with some of the recent syncache changes. To fix this, the inpcb handling was changed:
- Hang a MAC label off the syncache object
- When the syncache entry is initially created, the PCB lock is held because we extract information from the PCB while initializing the syncache entry. While we do this, copy the MAC label associated with the PCB and use it for the syncache entry.
- When the packet is transmitted, copy the label from the syncache entry to the mbuf so it can be processed by security policies which analyze mbuf labels.
This change required that the MAC framework be extended to support the label copy operations from the PCB to the syncache entry, and then from the syncache entry to the mbuf.
These functions really should be referencing the syncache structure instead of the label. However, due to some of the complexities associated with exposing this syncache structure we operate directly on its label pointer. This should be OK since we aren't making any access control decisions within this code directly; we are merely allocating and copying label storage so we can properly initialize mbuf labels for any packets the syncache code might create.
This also has a nice side effect of caching. Prior to this change, the PCB would be looked up/locked for each packet transmitted. Now the label is cached at the time the syncache entry is initialized.
Submitted by: andre [1] Discussed with: rwatson
[1] andre submitted the tcp_syncache.c changes
|
165123 |
12-Dec-2006 |
bz |
In ip6_sprintf no longer use and return one of eight static buffers for printing/logging ipv6 addresses. The caller now has to hand in a sufficiently large buffer as first argument.
This is the "+ one more change" missed in the original commit.
Noticed by: tinderbox Pointy hat to: me (#1)
|
165118 |
12-Dec-2006 |
bz |
MFp4: 92972, 98913 + one more change
In ip6_sprintf no longer use and return one of eight static buffers for printing/logging ipv6 addresses. The caller now has to hand in a sufficiently large buffer as first argument.
|
165082 |
10-Dec-2006 |
bms |
Back out revision 1.264.
Fixing the IP accounting issue, if we plan to do so, needs to be better thought out; the 'fix' introduces a hash lookup and a possible kernel panic.
Reported by: Mark Tinguely
|
164863 |
04-Dec-2006 |
rwatson |
Improve style(9) conformance of igmp.c.
|
164808 |
01-Dec-2006 |
imp |
Make sure that carp_header is 36 bytes long
|
164798 |
01-Dec-2006 |
piso |
Make libalias.conf parsing a bit smarter. This closes PR kern/106112.
While here, add mbuf's #includes I forgot in the previous commit.
Approved by: gleb
|
164797 |
01-Dec-2006 |
piso |
Remove m_megapullup from ng_nat and put it under libalias.
Approved by: gleb
|
164768 |
30-Nov-2006 |
rwatson |
Consistently use #ifdef INET6 rather than mixing and matching with #if defined(INET6).
Don't comment the end of short #ifdef blocks.
Comment cleanup.
Line wrap.
|
164516 |
22-Nov-2006 |
sam |
Change error codes returned by protocol operations when an inpcb is marked INP_DROPPED or INP_TIMEWAIT:
o return ECONNRESET instead of EINVAL for close, disconnect, shutdown, rcvd, rcvoob, and send operations
o return ECONNABORTED instead of EINVAL for accept
These changes should reduce confusion in applications since EINVAL is normally interpreted to mean an invalid file descriptor. This change does not conflict with POSIX or other standards I checked. The return of EINVAL has always been possible but rare; it's become more common with recent changes to the socket/inpcb handling and with finer-grained locking and preemption.
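The mapping described above can be summarized in a small table. The Python below is only an illustration; the dictionary and function names are hypothetical, and the EINVAL fallback stands in for the instances the commit says were left unchanged.

```python
import errno

# Operations listed in the commit message and the error each now returns
# when the inpcb is marked INP_DROPPED or INP_TIMEWAIT.
DROPPED_ERRNO = {
    op: errno.ECONNRESET
    for op in ("close", "disconnect", "shutdown", "rcvd", "rcvoob", "send")
}
DROPPED_ERRNO["accept"] = errno.ECONNABORTED


def errno_for_dropped_inpcb(operation):
    """Error for an operation on a dropped inpcb; operations not covered
    by the change keep returning EINVAL in this sketch."""
    return DROPPED_ERRNO.get(operation, errno.EINVAL)
```

An application can then tell a connection that went away (ECONNRESET, ECONNABORTED) apart from a genuinely bad argument (EINVAL).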
Note: there are other instances of EINVAL for this state that were left unchanged; they should be reviewed.
Reviewed by: rwatson, andre, ru MFC after: 1 month
|
164258 |
13-Nov-2006 |
bz |
Add SCTP as a known upper layer protocol over v6. We are not yet aware of the protocol internals but this way SCTP traffic over v6 will not be discarded.
Reported by: Peter Lei via rrs Tested by: Peter Lei <peterlei cisco.com>
|
164205 |
11-Nov-2006 |
rrs |
In a true restart case, the send_lock was not being acquired. This meant that when we cleanup the outbound we may have one in transit to be added with the old sequence number. This is bad since then we lose a message :(
Also the report_outbound needed to have the right lock when it's called, which it did not.. I added the lock with of course a flag since we want to have the lock before we call it in the restart case.
This also fixed the FIX ME case where, in the cookie collision case, we mark for retransmit any that were bundled with the cookie that was dropped. This also means changes to the output routine so we can assure getting the COOKIE-ACK sent BEFORE we retransmit the Data.
Approved by: gnn
|
164181 |
11-Nov-2006 |
rrs |
Turns out we would reset the TSN seq counter during a colliding INIT. This is fine except when we have data outstanding... we basically reset it to the previous value it was.. so then we end up assigning the same TSN to two different data chunks. This patch:
1) Finds a missing lock for when we change the stream numbers during COOKIE and INIT-ACK processing.. we were NOT locking the send_buffer.. which COULD cause problems (found by inspection looking for <2>)
2) Fixes a case during a colliding INIT where we incorrectly reset the sending Sequence thus in some cases duplicately assigning a TSN.
3) Additional enhancements to logging so we can see strm/tsn in the receiver AND new tracking to watch what the sender is doing with TSN and STRM seq's.
Approved by: gnn
|
164144 |
10-Nov-2006 |
rrs |
This patch fixes a LOR that happens during INIT-ACK collision. We were calling select_a_tag() inside sctp_send_initiate_ack(). During collision cases we have a stcb and thus a SCTP_LOCK. When we call select_a_tag it (below it) locks the INFO lock. We now 1) pre-select the nonce-tie-tags in sctputil.c during setup of a tcb. 2) In the other case where we have to select tags, we unlock after incr the ref cnt (so assoc won't go away) and then do the tag selection followed by a relock and decr the refcnt.
Approved by: gnn
|
164139 |
09-Nov-2006 |
rrs |
Fixes an issue with handling of stream reset. When a reset comes in we need to calculate the length and therefore the number of listed streams (if any) based on the TLV type. Otherwise if we get a retran we could in theory panic by sending a notification to a user with an incorrect list and thus no memory listing the streams. Found in IOS by devtest :-)
Approved by: gnn
|
164085 |
08-Nov-2006 |
rrs |
- Fixes first of all the getcred on IPv6 and V4. The copies were incorrect and so was the locking.
- A bug was also found that would create a race and panic when an abort arrived on a socket being read from.
- Also fix the reader to get MSG_TRUNC when a partial delivery is aborted.
- Also addresses a couple of coverity caught error path memory leaks and a couple of other valid complaints
Approved by: gnn
|
164075 |
07-Nov-2006 |
marcus |
Fix TFTP NAT support by making sure the appropriate fingerprinting checks are done.
Reviewed by: piso
|
164039 |
06-Nov-2006 |
rwatson |
Convert three new suser(9) calls introduced between when the priv(9) patch was prepared and committed to priv(9) calls. Add XXX comments as, in each case, the semantics appear to differ from the TCP/UDP versions of the calls with respect to jail, and because cr_canseecred() is not used to validate the query.
Obtained from: TrustedBSD Project
|
164038 |
06-Nov-2006 |
rrs |
This change tracks down the EEOR->NonEEOR mode failure to wakeup on close of the sender. It basically moves the return (when the asoc has a reader/writer) further down and gets the wakeup and assoc appending (of the PD-API event) moved up before the return. It also moves the flag set right before the return so we can assure only once adding the PD-API events.
Approved by: gnn
|
164033 |
06-Nov-2006 |
rwatson |
Sweep kernel replacing suser(9) calls with priv(9) calls, assigning specific privilege names to a broad range of privileges. These may require some future tweaking.
Sponsored by: nCircle Network Security, Inc. Obtained from: TrustedBSD Project Discussed on: arch@ Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri, Alex Lyashkov <umka at sevcity dot net>, Skip Ford <skip dot ford at verizon dot net>, Antoine Brodin <antoine dot brodin at laposte dot net>
|
163998 |
05-Nov-2006 |
ru |
Revert previous commit, and instead make the expression in rev. 1.2 match the style of this file.
OK'ed by: rrs
|
163996 |
05-Nov-2006 |
rrs |
Tons of fixes to get all the 64bit issues removed. This also moves two 16 bit int's to become 32 bit values so we do not have to use atomic_add_16. Most of the changes are %p, casts and other various nasties that were in the original code base. With this commit my machine will now do a build universe.. however I have not yet tested on a 64bit machine .. it may not work :-(
|
163980 |
04-Nov-2006 |
ru |
Fix pointer arithmetic to be 64-bit friendly.
|
163979 |
04-Nov-2006 |
ru |
Remove bogus casts that Randall for some reason didn't borrow from my supplied patch.
|
163974 |
04-Nov-2006 |
jb |
Remove a bogus cast in an attempt to fix the tinderbox builds on lots of arches.
|
163964 |
03-Nov-2006 |
rrs |
More 64 bit pointer fun. %p changed in multiple prints; the mtod() was also fixed.
|
163959 |
03-Nov-2006 |
rrs |
Fix two of the 64bit errors on the printfs.
|
163957 |
03-Nov-2006 |
rrs |
Somehow I missed this one. The sys/cdefs.h include was out of order with respect to the __FBSDID..
|
163954 |
03-Nov-2006 |
rrs |
Oops... in my fix up of all the $FreeBSD:$-> $FreeBSD$ I inserted a few to the new files.. but I failed to add the #include <sys/cdefs.h>
Which causes a compile error.. sorry about that... got it now :-)
Approved by: gnn
|
163953 |
03-Nov-2006 |
rrs |
Ok, here it is, we finally add SCTP to current. Note that this work is not just mine, but is also the work of Peter Lei and Michael Tuexen. They are my two other key developers working on the project.. and they need attaboys too: **** peterlei@cisco.com tuexen@fh-muenster.de **** I did do a make sysent which updated the syscalls and sysproto.. I hope that is correct... without it you don't build since we have new syscalls for SCTP :-0
So go out and look at the NOTES, add option SCTP (make sure inet and inet6 are present too) and play with SCTP.
I will see about committing some test tools I have after I figure out where I should place them. I also have a lib (libsctp.a) that adds some of the missing socketapi functions that I need to put into libs.. I will talk to George about this :-)
There may still be some 64 bit issues in here, none of us have a 64 bit processor to test with yet.. Michael may have a Mac but that's another beast too..
If you have a Mac and want to use SCTP contact Michael, he maintains a web site with a loadable module with this code :-)
Reviewed by: gnn Approved by: gnn
|
163758 |
29-Oct-2006 |
oleg |
- Use non-recursive mutex. MTX_RECURSE is unnecessary since rev. 1.70 - Pay respect to net.isr.direct: use netisr_dispatch() instead of ip_input()
Reviewed by: glebius, rwatson
- purge_flow_set(): - Do not leak memory while purging queues which are not bound to pipe. - style(9) cleanup
MFC after: 2 months
|
163721 |
27-Oct-2006 |
oleg |
- Convert net.inet.ip.dummynet.curr_time net.inet.ip.dummynet.searches net.inet.ip.dummynet.search_steps to SYSCTL_LONG nodes. It will prevent frequent wrap around on 64bit archs.
- Implement simple mechanics for dummynet(4) internal time correction. Under certain circumstances (system high load, dummynet lock contention, etc) dummynet's tick counter can be significantly slower than it should be. (I've observed up to 25% difference on one of my production servers). Since this counter is used for packet scheduling, its accuracy is vital for precise bandwidth limitation.
Introduce new sysctl nodes under net.inet.ip.dummynet:
- tick_lost - number of ticks coalesced by taskqueue thread.
- tick_adjustment - number of time corrections done.
- tick_diff - adjusted vs non-adjusted tick counter difference.
- tick_delta - last vs 'standard' tick difference (usec).
- tick_delta_sum - accumulated (and not corrected yet) time difference (usec).
Reviewed by: glebius MFC after: 2 months
|
163720 |
27-Oct-2006 |
oleg |
Use separate thread for servicing dummynet(4). Utilize taskqueue(9) API.
Submitted by: glebius MFC after: 2 months
|
163717 |
27-Oct-2006 |
oleg |
style(9) cleanup.
MFC after: 2 months
|
163606 |
22-Oct-2006 |
rwatson |
Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now contains the userspace and user<->kernel API and definitions, with all in-kernel interfaces moved to mac_framework.h, which is now included across most of the kernel instead.
This change is the first step in a larger cleanup and sweep of MAC Framework interfaces in the kernel, and will not be MFC'd.
Obtained from: TrustedBSD Project Sponsored by: SPARTA
|
163548 |
21-Oct-2006 |
julian |
revert last change.. premature.. need to wait until if_ethersubr.c uses pfil to get to ipfw.
|
163545 |
20-Oct-2006 |
julian |
Move some variables to a more likely place and remove "temporary" stuff that is not needed any more.
|
163237 |
11-Oct-2006 |
maxim |
o Do not do args->f_id.addr_type == 6 when there is IS_IP6_FLOW_ID() exactly for that.
|
163236 |
11-Oct-2006 |
maxim |
o Kill a nit in the comment.
|
163235 |
11-Oct-2006 |
maxim |
o Extend not very informative ipfw(4) message 'drop session, too many entries' by src:port and dst:port pairs. IPv6 part is non-functional as ``limit'' does not support IPv6 flows.
PR: kern/103967 Submitted by: based on Bruce Campbell patch MFC after: 1 month
|
163224 |
11-Oct-2006 |
ru |
Merge the rest of my changes.
|
163127 |
08-Oct-2006 |
piso |
Various mdoc and grammar fixes.
Approved by: glebius Reviewed by: glebius, ru
|
163069 |
07-Oct-2006 |
bz |
Set scope on MC address so IPv6 carp advertisement will not get dropped in ip6_output. In case this fails handle the error directly and log it[1]. In addition permit CARP over v6 in ip_fw2.
PR: kern/98622 Similar patch by: suz Discussed with: glebius [1] Tested by: Paul.Dekkers surfnet.nl, Philippe.Pegon crc.u-strasbg.fr MFC after: 3 days
|
163006 |
04-Oct-2006 |
glebius |
Save space on stack moving token ring stuff to its own hack block.
|
163005 |
04-Oct-2006 |
glebius |
Style rev. 1.152.
|
162798 |
29-Sep-2006 |
andre |
Remove stone-aged and irrelevant "#ifndef notdef".
|
162797 |
29-Sep-2006 |
bms |
Nits.
Submitted by: ru
|
162794 |
29-Sep-2006 |
bms |
Push removal of mrouted down to the rest of the tree.
|
162768 |
29-Sep-2006 |
maxim |
o Convert w/spaces to tabs in the previous commit.
|
162767 |
29-Sep-2006 |
silby |
Rather than autoscaling the number of TIME_WAIT sockets to maxsockets / 5, scale it to min(ephemeral port range / 2, maxsockets / 5) so that people with large gobs of memory and/or large maxsockets settings will not exhaust their entire ephemeral port range with sockets in the TIME_WAIT state during periods of heavy load.
Those who wish to tweak the size of the TIME_WAIT zone can still do so with net.inet.tcp.maxtcptw.
Reviewed by: glebius, ru
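The scaling rule described in this commit can be sketched in plain C as follows. This is a minimal userland illustration, not the kernel code: the function name `timewait_autoscale` and its parameters are hypothetical, and the ephemeral port range is assumed to be computed as last port minus first port.

```c
#include <stdint.h>

/*
 * Hypothetical helper illustrating the rule from the commit message:
 * limit = min(ephemeral port range / 2, maxsockets / 5).
 * Names and the exact range arithmetic are assumptions for illustration.
 */
static uint32_t
timewait_autoscale(uint32_t port_first, uint32_t port_last,
    uint32_t maxsockets)
{
	/* Tolerate a reversed range, since first may be larger than last. */
	uint32_t port_range = (port_last >= port_first) ?
	    port_last - port_first : port_first - port_last;
	uint32_t by_ports = port_range / 2;
	uint32_t by_sockets = maxsockets / 5;

	return (by_ports < by_sockets ? by_ports : by_sockets);
}
```

With a small maxsockets the socket-based bound wins; with a very large maxsockets the port-based bound caps the zone, which is the behaviour the commit is after.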
|
162739 |
28-Sep-2006 |
andre |
When tcp_output() receives an error upon sending a packet it reverts parts of its internal state to ignore the failed send and try again a bit later. If the error is EPERM the packet got blocked by the local firewall and the revert may cause the session to get stuck and retry indefinitely. This way we treat it like a packet loss and let the retransmit timer and timeouts do their work over time.
The correct behavior is to drop a connection that gets an EPERM error. However this _may_ introduce some POLA problems and a two commit approach was chosen.
Discussed with: glebius PR: kern/25986 PR: kern/102653
|
162725 |
28-Sep-2006 |
andre |
When doing TSO, correctly do the check to prevent a maximum sized IP packet from overflowing.
|
162719 |
28-Sep-2006 |
bms |
Fix the IPv4 multicast routing detach path. On interface detach whilst the MROUTER is running, the system would panic as described in the PR.
The fix in the PR is a good start, however, the other state associated with the multicast forwarding cache has to be freed in order to avoid leaking memory and other possible panics.
More care and attention is needed in this area.
PR: kern/82882 MFC after: 1 week
|
162718 |
28-Sep-2006 |
bms |
The IPv4 code should clean up multicast group state when an interface goes away. Without this change, it leaks in_multi (and often ether_multi state) if many clonable interfaces are created and destroyed in quick succession.
The concept of this fix is borrowed from KAME. Detailed information about this behaviour, as well as test cases, are available in the PR.
PR: kern/78227 MFC after: 1 week
|
162685 |
27-Sep-2006 |
piso |
Compilation.
|
162674 |
26-Sep-2006 |
piso |
Summer of Code 2005: improve libalias - part 1 of 2
With the first part of my previous Summer of Code work, we get:
-made libalias modular:
-support for 'particular' protocols (like ftp/irc/etcetc) is no longer hardcoded inside libalias, but is available through external modules loadable at runtime
-modules are available both in kernel (/boot/kernel/alias_*.ko) and user land (/lib/libalias_*)
-protocols/applications modularized are: cuseeme, ftp, irc, nbt, pptp, skinny and smedia
-added logging support for kernel side
-cleanup
After a buildworld, do a 'mergemaster -i' to install the file libalias.conf in /etc or manually copy it.
During startup (and after every HUP signal) user land applications running the new libalias will try to read a file in /etc called libalias.conf: that file contains the list of modules to load.
User land applications affected by this commit are ppp and natd: if libalias.conf is present in /etc you won't notice any difference.
The only kernel land bit affected by this commit is ng_nat: if you are using ng_nat, and it doesn't correctly handle ftp/irc/etcetc sessions anymore, remember to kldload the corresponding module (i.e. kldload alias_ftp).
General information and details about the inner workings are available in the libalias man page under the section 'MODULAR ARCHITECTURE (AND ipfw(4) SUPPORT)'.
NOTA BENE: this commit affects _ONLY_ libalias, ipfw in-kernel nat support will be part of the next libalias-related commit.
Approved by: glebius Reviewed by: glebius, ru
|
162642 |
26-Sep-2006 |
jmg |
fix calculating to_tsecr... This prevents the rtt calculations from going all wonky...
|
162627 |
25-Sep-2006 |
bms |
Fix an incompatibility between CARP and IPv4 multicast routing, whereby the VRRPv2 advertisements will originate from the wrong source address. This only affects kernels compiled with MROUTING and after the MRT_INIT ioctl() has been issued. Set imo_multicast_vif in carp's softc to the invalid value -1 after it is zeroed by softc allocation, to stop the ip_output() path looking up the incorrect source address thinking a vif is set.
PR: kern/100532 Submitted by: Bohus Plucinsky MFC after: 1 week
|
162625 |
25-Sep-2006 |
bms |
Spleling
Submitted by: pjd
|
162615 |
25-Sep-2006 |
bms |
Account for output IP datagrams on the ifaddr where they originated from, *not* the first ifaddr on the ifp. This is similar to what NetBSD does.
PR: kern/72936 Submitted by: alfred Reviewed by: andre
|
162612 |
25-Sep-2006 |
jmg |
if min is greater than max, prefer max over min... I managed to get a retransmit timer that was going to take 19 days to trigger...
Reviewed by: silby
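One way to get the "prefer max over min" behaviour is simply to apply the upper bound last. The sketch below is an assumption about the shape of such a clamp, not the actual kernel macro; `clamp_prefer_max` is a hypothetical name.

```c
/*
 * Hypothetical range clamp. The lower bound is applied first and the
 * upper bound last, so if tvmin > tvmax the result is tvmax.
 */
static int
clamp_prefer_max(int value, int tvmin, int tvmax)
{
	if (value < tvmin)
		value = tvmin;
	if (value > tvmax)
		value = tvmax;
	return (value);
}
```

With consistent bounds this is an ordinary clamp; with inverted bounds the timer can never exceed the configured maximum.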
|
162586 |
23-Sep-2006 |
jmg |
now that we don't automagically increase the MTU of host routes, when we copy the loopback interface, copy its mtu also.. This means that we again have large mtu support for local ip addresses...
|
162580 |
23-Sep-2006 |
bms |
Always set the IP version in the TCP input path, to preserve the header field for possible later IPSEC SPD lookup, even when the kernel is built without 'options INET6'.
PR: kern/57760 MFC after: 1 week Submitted by: Joachim Schueth
|
162376 |
17-Sep-2006 |
andre |
Make tcp_usr_send() free the passed mbufs on error in all cases as the comment to it claims.
Sponsored by: TCP/IP Optimization Fundraise 2005
|
162351 |
16-Sep-2006 |
jhay |
Handle a list of IPv6 src and dst addresses correctly, eg. ipfw add allow ip6 from any to 2000::/16,2002::/16
PR: 102422 (part 3) Submitted by: Andrey V. Elsukov <bu7cher at yandex dot ru> MFC after: 5 days
|
162325 |
15-Sep-2006 |
andre |
When doing TSO subtract hdrlen from TCP_MAXWIN to prevent ip->ip_len from wrapping when we generate a maximally sized packet for later segmentation.
Noticed by: gallatin Sponsored by: TCP/IP Optimization Fundraise 2005
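The wrap this commit guards against follows from ip_len being a 16-bit field. A minimal arithmetic sketch, assuming a maximum window of 65535 and using hypothetical helper names (this is not the tcp_output() code):

```c
#include <stdint.h>

#define	SKETCH_TCP_MAXWIN	65535u	/* assumed largest unscaled window */

/* Largest TSO payload that still fits once headers are added. */
static uint32_t
sketch_tso_max_payload(uint32_t hdrlen)
{
	return (SKETCH_TCP_MAXWIN - hdrlen);
}

/* Total length as it would be stored in a 16-bit ip_len field. */
static uint16_t
sketch_ip_len(uint32_t payload, uint32_t hdrlen)
{
	return ((uint16_t)(payload + hdrlen));
}
```

A payload of 65535 bytes plus 40 bytes of headers truncates to a tiny length, while a payload limited to 65535 minus the header length fills the field exactly.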
|
162306 |
14-Sep-2006 |
ache |
Add missing #ifdef INET6 (can't be compiled)
|
162278 |
13-Sep-2006 |
andre |
Remove unnecessary includes and follow common ordering style.
|
162277 |
13-Sep-2006 |
andre |
Rewrite of TCP syncookies to remove locking requirements and to enhance functionality:
- Remove a rwlock acquisition/release per generated syncookie. Locking is now integrated with the bucket row locking of syncache itself and syncookies no longer add any additional lock overhead.
- Syncookie secrets are different for and stored per syncache bucket row. Secrets expire after 16 seconds and are reseeded on-demand.
- The computational overhead for syncookie generation and verification is one MD5 hash computation as before.
- Syncache can be turned off and run with syncookies only by setting the sysctl net.inet.tcp.syncookies_only=1.
This implementation extends the original idea and first implementation of FreeBSD by using not only the initial sequence number field to store information but also the timestamp field if present. This way we can keep track of the entire state we need to know to recreate the session in its original form. Almost all TCP speakers implement RFC1323 timestamps these days. For those that do not we still have to live with the known shortcomings of the ISN only SYN cookies. The use of the timestamp field causes the timestamps to be randomized if syncookies are enabled.
The idea of SYN cookies is to encode and include all necessary information about the connection setup state within the SYN-ACK we send back and thus to get along without keeping any local state until the ACK to the SYN-ACK arrives (if ever). Everything we need to know should be available from the information we encoded in the SYN-ACK.
A detailed description of the inner working of the syncookies mechanism is included in the comments in tcp_syncache.c.
Reviewed by: silby (slightly earlier version) Sponsored by: TCP/IP Optimization Fundraise 2005
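The general SYN cookie idea described above can be sketched as follows. This is a toy illustration under loud assumptions and is not the tcp_syncache.c implementation: the keyed mix below is a simple stand-in for the MD5 hash and is not cryptographically secure, the MSS table values are invented, only the ISN is used (no timestamp field), and all names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical MSS table; the values are illustrative only. */
static const uint16_t sketch_mss_table[8] = {
	216, 536, 1200, 1360, 1400, 1440, 1452, 1460
};

/* Toy keyed mix standing in for the MD5 computation. NOT secure. */
static uint32_t
toy_mac(uint32_t secret, uint32_t src, uint32_t dst,
    uint16_t sport, uint16_t dport, uint32_t mss_idx)
{
	uint32_t words[4] = {
		src, dst, ((uint32_t)sport << 16) | dport, mss_idx
	};
	uint32_t h = 2166136261u ^ secret;
	int i;

	for (i = 0; i < 4; i++) {
		h ^= words[i];
		h *= 16777619u;
	}
	return (h);
}

/* Build an ISN: MAC in the upper 29 bits, MSS index in the low 3. */
static uint32_t
sketch_cookie_generate(uint32_t secret, uint32_t src, uint32_t dst,
    uint16_t sport, uint16_t dport, uint32_t mss_idx)
{
	mss_idx &= 7u;
	return ((toy_mac(secret, src, dst, sport, dport, mss_idx) & ~7u) |
	    mss_idx);
}

/* Validate a returned cookie; yields the encoded MSS, or 0 if invalid. */
static uint16_t
sketch_cookie_check(uint32_t secret, uint32_t src, uint32_t dst,
    uint16_t sport, uint16_t dport, uint32_t cookie)
{
	uint32_t mss_idx = cookie & 7u;
	uint32_t mac = toy_mac(secret, src, dst, sport, dport, mss_idx);

	if ((mac & ~7u) != (cookie & ~7u))
		return (0);
	return (sketch_mss_table[mss_idx]);
}
```

The point of the design is that the server stores nothing between sending the SYN-ACK and receiving the ACK: everything needed to rebuild the connection state is recomputed or decoded from the acknowledged sequence number.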
|
162238 |
12-Sep-2006 |
csjp |
Introduce a new entry point, mac_create_mbuf_from_firewall. This entry point exists to allow the mandatory access control policy to properly initialize mbufs generated by the firewall. An example where this might happen is keep alive packets, or ICMP error packets in response to other packets.
This takes care of kernel panics associated with uninitialized mbuf labels when the firewall generates packets.
[1] I modified this patch from its original version. The initial patch introduced a number of entry points which were programmatically equivalent, so I introduced only one. Instead, we should leverage mac_create_mbuf_netlayer() which is used for similar situations, an example being icmp_error()
This will minimize the impact associated with the MFC
Submitted by: mlaier [1] MFC after: 1 week
This is a RELENG_6 candidate
|
162231 |
11-Sep-2006 |
andre |
Fix a NULL pointer dereference of ro->ro_rt->rt_flags by checking for the validity of ro->ro_rt first. This prevents crashing on any non-normally routed IP packet.
Coverity CID: 162 (incorrectly, it was re-introduced by previous commit)
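The pattern of the fix, checking the route pointer before reading its flags, looks roughly like the sketch below. The struct and function names are hypothetical stand-ins for illustration and do not reproduce the kernel's struct route or struct rtentry.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down stand-ins for the kernel structures. */
struct sketch_rtentry {
	int	rt_flags;
};

struct sketch_route {
	struct sketch_rtentry	*ro_rt;
};

/* Test a route flag only after confirming that a route is attached. */
static int
sketch_route_has_flag(const struct sketch_route *ro, int flag)
{
	if (ro == NULL || ro->ro_rt == NULL)
		return (0);
	return ((ro->ro_rt->rt_flags & flag) != 0);
}
```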
|
162205 |
10-Sep-2006 |
jmg |
make use of the host route's mtu for processing. This means we can now support a network w/ split mtu's by assigning each host route the correct mtu. an aspiring programmer could write a daemon to probe hosts and find out if they support a larger mtu.
|
162151 |
08-Sep-2006 |
glebius |
Add a sysctl net.inet.tcp.nolocaltimewait that allows suppressing the creation of compressed TIME_WAIT states if both connection endpoints are local. Default is off.
|
162111 |
07-Sep-2006 |
ru |
Back when we had T/TCP support, we used to apply different timeouts for TCP and T/TCP connections in the TIME_WAIT state, and we had two separate timed wait queues for them. Now that it has gone, the timeout is always 2*MSL again, and there is no reason to keep two queues (the first was unused anyway!).
Also, reimplement the remaining queue using a TAILQ (it was technically impossible before, with two queues).
|
162110 |
07-Sep-2006 |
andre |
Second step of TSO (TCP segmentation offload) support in our network stack.
TSO is only used if we are in a pure bulk sending state. The presence of TCP-MD5, SACK retransmits, SACK advertizements, IPSEC and IP options prevent using TSO. With TSO the TCP header is the same (except for the sequence number) for all generated packets. This makes it impossible to transmit any options which vary per generated segment or packet.
The length of TSO bursts is limited to TCP_MAXWIN.
The sysctl net.inet.tcp.tso globally controls the use of TSO and is enabled.
TSO enabled sends originating from tcp_output() have the CSUM_TCP and CSUM_TSO flags set, m_pkthdr.csum_data filled with the header pseudo-checksum and m_pkthdr.tso_segsz set to the segment size (net payload size, not counting IP+TCP headers or TCP options).
IPv6 currently lacks a pseudo-header checksum function and thus doesn't support TSO yet.
Tested by: Jack Vogel <jfvogel-at-gmail.com> Sponsored by: TCP/IP Optimization Fundraise 2005
|
162108 |
07-Sep-2006 |
ru |
Remove a microoptimization for i386 that was a micropessimization for amd64.
|
162084 |
06-Sep-2006 |
andre |
First step of TSO (TCP segmentation offload) support in our network stack.
o add IFCAP_TSO[46] for drivers to announce this capability for IPv4 and IPv6 o add CSUM_TSO flag to mbuf pkthdr csum_flags field o add tso_segsz field to mbuf pkthdr o enhance ip_output() packet length check to allow for large TSO packets o extend tcp_maxmtu[46]() with a flag pointer to pass interface capabilities o adjust all callers of tcp_maxmtu[46]() accordingly
Discussed on: -current, -net Sponsored by: TCP/IP Optimization Fundraise 2005
|
162071 |
06-Sep-2006 |
andre |
Check inp_flags instead of inp_vflag for INP_ONESBCAST flag.
PR: kern/99558 Tested by: Andrey V. Elsukov <bu7cher-at-yandex.ru> Sponsored by: TCP/IP Optimization Fundraise 2005 MFC after: 3 days
|
162068 |
06-Sep-2006 |
andre |
Fix the socket option IP_ONESBCAST by giving it its own case in ip_output() and skip over the normal IP processing.
Add a supporting function ifa_ifwithbroadaddr() to verify and validate the supplied subnet broadcast address.
PR: kern/99558 Tested by: Andrey V. Elsukov <bu7cher-at-yandex.ru> Sponsored by: TCP/IP Optimization Fundraise 2005 MFC after: 3 days
|
162064 |
06-Sep-2006 |
glebius |
o Backout rev. 1.125 of in_pcb.c. It appeared to behave extremely badly under high load. For example with 40k sockets and 25k tcptw entries, the connect() syscall can run for seconds. Debugging showed that it iterates the cycle millions of times and purges thousands of tcptw entries at a time. Besides practical unusability this change is architecturally wrong. First, in_pcblookup_local() is used in connect() and bind() syscalls. No purging of stale entries should be done here. Second, it is a layering violation.
o Return the tcptw purging cycle to tcp_timer_2msl_tw(), which was removed in rev. 1.78 by rwatson. The commit log of this revision tells nothing about the reason the cycle was removed. Now we need this cycle, since the major cleaner of stale tcptw structures is removed.
o Disable probably necessary, but now unused tcp_twrecycleable() function.
Reviewed by: ru
|
162035 |
05-Sep-2006 |
glebius |
Finally fix rev. 1.256
Pointy hat to: glebius
|
162033 |
05-Sep-2006 |
glebius |
Remove extra parenthesis in last commit.
Nitpicked by: ru
|
162031 |
05-Sep-2006 |
glebius |
- Make net.inet.tcp.maxtcptw modifiable at run time. - If net.inet.tcp.maxtcptw was ever set explicitly, do not change it if kern.ipc.maxsockets is changed.
|
161974 |
04-Sep-2006 |
thomas |
Fix typo in comment.
|
161767 |
31-Aug-2006 |
jhay |
Recognise IPv6 PIM packets.
MFC after: 1 week
|
161645 |
26-Aug-2006 |
mohans |
Fix for a bug that causes the computation of "len" in tcp_output() to get messed up, resulting in an inconsistency between the TCP state and so_snd.
|
161456 |
18-Aug-2006 |
julian |
comply with style police
Submitted by: ru MFC after: 1 month
|
161424 |
17-Aug-2006 |
julian |
Allow ipfw to forward to a destination that is specified by a table. for example: fwd tablearg ip from any to table(1) where table 1 has entries of the form: 1.1.1.0/24 10.2.3.4 208.23.2.0/24 router2
This allows trivial implementation of a secondary routing table implemented in the firewall layer.
I expect more work (under discussion with Glebius) to follow this to clean up some of the messy parts of ipfw related to tables.
Reviewed by: Glebius MFC after: 1 month
|
161380 |
17-Aug-2006 |
julian |
Remove the IPFIREWALL_FORWARD_EXTENDED option and make it on by default as it always was in older versions of FreeBSD. This option is pointless as it is needed in just about every interesting usage of forward that I have ever seen. It doesn't make the system any safer and just wastes huge amounts of developer time when the system doesn't behave as expected when code is moved from 4.x to 6.x or 7.x. Reviewed by: glebius MFC after: 1 week
|
161226 |
11-Aug-2006 |
mohans |
Fixes an edge case bug in timewait handling where ticks rolling over, causing the timewait expiry to be exactly 0, corrupts the timewait queues (and that entry). Reviewed by: silby
|
160981 |
04-Aug-2006 |
brooks |
With exception of the if_name() macro, all definitions in net_osdep.h were unused or already in if_var.h so add if_name() to if_var.h and remove net_osdep.h along with all references to it.
Longer term we may want to kill off if_name() entirely since all modern BSDs have if_xname variables rendering it unnecessary.
|
160966 |
04-Aug-2006 |
oleg |
Remove useless NULL pointer check: we are using M_WAITOK flag for memory allocation.
Submitted by: Andrey Elsukov <bu7cher at yandex dot ru> Approved by: glebius (mentor) MFC after: 1 week
|
160925 |
02-Aug-2006 |
rwatson |
Move soisdisconnected() in tcp_discardcb() to one of its calling contexts, tcp_twstart(), but not to the other, tcp_detach(), as the socket is already being torn down and therefore there are no listeners. This avoids a panic if kqueue state is registered on the socket at close(), and eliminates two XXX comments. There is one case remaining in which tcp_discardcb() reaches up to the socket layer as part of the TCP host cache, which would be good to avoid.
Reported by: Goran Gajic <ggajic at afrodita dot rcub dot bg dot ac dot yu>
|
160920 |
02-Aug-2006 |
oleg |
Do not leak memory while flushing rules.
Noticed by: yar Approved by: glebius (mentor) MFC after: 1 week
|
160549 |
21-Jul-2006 |
rwatson |
Change semantics of socket close and detach. Add a new protocol switch function, pru_close, to notify protocols that the file descriptor or other consumer of a socket is closing the socket. pru_abort is now a notification of close also, and no longer detaches. pru_detach is no longer used to notify of close, and will be called during socket tear-down by sofree() when all references to a socket evaporate after an earlier call to abort or close the socket. This means detach is now an unconditional teardown of a socket, whereas previously sockets could persist after detach if the protocol retained a reference.
This facilitates sharing mutexes between layers of the network stack as the mutex is required during the checking and removal of references at the head of sofree(). With this change, pru_detach can now assume that the mutex will no longer be required by the socket layer after completion, whereas before this was not necessarily true.
Reviewed by: gnn
|
160491 |
18-Jul-2006 |
ups |
Fix race conditions on enumerating pcb lists by moving the initialization ( and where appropriate the destruction) of the pcb mutex to the init/finit functions of the pcb zones. This allows locking of the pcb entries and race condition free comparison of the generation count. Rearrange locking a bit to avoid extra locking operation to update the generation count in in_pcballoc(). (in_pcballoc now returns the pcb locked)
I am planning to convert pcb list handling from a type safe to a reference count model soon. ( As this allows really freeing the PCBs)
Reviewed by: rwatson@, mohans@ MFC after: 1 week
|
160195 |
09-Jul-2006 |
sam |
Revise network interface cloning to take an optional opaque parameter that can specify configuration parameters: o rev cloner api's to add optional parameter block o add SIOCCREATE2 that accepts parameter data o rev vlan support to use new api (maintain old code)
Reviewed by: arch@
|
160164 |
08-Jul-2006 |
mlaier |
Make in-kernel multicast protocols for pfsync and carp work after enabling dynamic resizing of multicast membership array.
Reported and testing by: Maxim Konovalov, Scott Ullrich Reminded by: thompsa MFC after: 2 weeks
|
160134 |
06-Jul-2006 |
rwatson |
Remove unneeded mac.h include.
MFC after: 3 days
|
160123 |
05-Jul-2006 |
oleg |
Complete timebase (time_second -> time_uptime) conversion.
PR: kern/94249 Reviewed by: andre (few months ago) Approved by: glebius (mentor)
|
160097 |
04-Jul-2006 |
maxim |
o Kill BUGS section as it is not valid since rev. 1.4 alias_pptp.c.
Spotted by: ru.unix.bsd activists MFC after: 1 week
|
160038 |
29-Jun-2006 |
yar |
There is a consensus that ifaddr.ifa_addr should never be NULL, except in places dealing with ifaddr creation or destruction; and in such special places incomplete ifaddrs should never be linked to system-wide data structures. Therefore we can eliminate all the superfluous checks for "ifa->ifa_addr != NULL" and get ready for the system crashing honestly instead of masking possible bugs.
Suggested by: glebius, jhb, ru
|
160032 |
29-Jun-2006 |
yar |
Use TAILQ_FOREACH consistently.
|
160027 |
29-Jun-2006 |
glebius |
Fix URL to Bellovin's paper.
Submitted by: Anton Yuzhaninov <citrin rambler-co.ru>
|
160025 |
29-Jun-2006 |
bz |
Eliminate the offset argument from send_reject. It's not been used since FreeBSD-SA-06:04.ipfw. Adapt send_reject6 to what had been done for legacy IP: no longer send or permit sending rejects for any but the first fragment.
Discussed with: oleg, csjp (some weeks ago)
|
160024 |
29-Jun-2006 |
bz |
Use INPLOOKUP_WILDCARD instead of just 1 more consistently.
OKed by: rwatson (some weeks ago)
|
159976 |
27-Jun-2006 |
pjd |
- Use suser_cred(9) instead of directly checking cr_uid. - Change the order of conditions to first verify that we actually need to check for privileges and then eventually check them.
Reviewed by: rwatson
|
159955 |
26-Jun-2006 |
andre |
In syncache_respond() do not reply with a MSS that is larger than what the peer announced to us but make it at least tcp_minmss in size.
Sponsored by: TCP/IP Optimization Fundraise 2005
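The clamping rule in this commit can be illustrated with a short sketch. It is an assumption about the shape of the logic rather than the syncache_respond() code, and the function and parameter names are hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper: never advertise more than the peer announced,
 * but never go below the configured minimum MSS.
 */
static uint16_t
sketch_reply_mss(uint16_t our_mss, uint16_t peer_mss, uint16_t min_mss)
{
	uint16_t mss = (our_mss < peer_mss) ? our_mss : peer_mss;

	return (mss < min_mss ? min_mss : mss);
}
```

The lower bound is applied last, so a peer announcing an unusably small MSS still gets a reply of at least the minimum.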
|
159950 |
26-Jun-2006 |
andre |
Some cleanups and janitorial work to tcp_syncache:
o don't assign remote/local host/port information manually between provided struct in_conninfo and struct syncache, bcopy() it instead
o rename sc_tsrecent to sc_tsreflect in struct syncache to better capture the purpose of this field
o rename sc_request_r_scale to sc_requested_r_scale for ditto reasons
o fix IPSEC error case printf's to report correct function name
o in syncache_socket() only transpose enhanced tcp options parameters to struct tcpcb when the inpcb doesn't have TF_NOOPT set
o in syncache_respond() reorder stack variables
o in syncache_respond() remove bogus KASSERT()
No functional changes.
Sponsored by: TCP/IP Optimization Fundraise 2005
|
159949 |
26-Jun-2006 |
andre |
Some cleanups and janitorial work to tcp_dooptions():
o redefine the parameter 'is_syn' to 'flags', add TO_SYN flag and adjust its usage accordingly o update the comments to the tcp_dooptions() invocation in tcp_input():after_listen to reflect reality o move the logic checking the echoed timestamp out of tcp_dooptions() to the only place that uses it next to the invocation described in the previous item o adjust parsing of TCPOPT_SACK_PERMITTED to use the same style as the others o add comments in to struct tcpopt.to_flags #defines
No functional changes.
Sponsored by: TCP/IP Optimization Fundraise 2005
|
159945 |
26-Jun-2006 |
andre |
Reverse the source/destination parameters to in[6]_pcblookup_hash() in syncache_respond() for the #ifdef MAC case.
Submitted by: Tai-hwa Liang <avatar-at-mmlab.cse.yzu.edu.tw>
|
159944 |
26-Jun-2006 |
rwatson |
In tcp6_usr_attach(), return immediately if SS_ISDISCONNECTED, to avoid dereferencing an uninitialized inp variable.
Submitted by: Michiel Boland <michiel at boland dot org> MFC after: 1 month
|
159922 |
25-Jun-2006 |
andre |
Decrement the global syncache counter in syncache_expand() when the entry is removed from the bucket. This fixes the syncache statistics.
|
159859 |
22-Jun-2006 |
andre |
Move the syncookie MD5 context from globals to the stack to make it MP safe.
|
159857 |
22-Jun-2006 |
ume |
- Pullup even when the extension header is unknown, to prevent infinite loop with net.inet6.ip6.fw.deny_unknown_exthdrs=0.
- Teach ipv6 and ipencap as they appear in an IPv4/IPv6 over IPv6 tunnel.
- Test the next extension header even when the routing header type is unknown with net.inet6.ip6.fw.deny_unknown_exthdrs=0.
Found by: xcast-fan-club MFC after: 1 week
|
159787 |
20-Jun-2006 |
andre |
Allocate a zero'ed syncache hashtable. mtx_init() tests the supplied memory location for already existing/initialized mutexes. With random data in the memory location this fails (ie. after a soft reboot).
Reported by: brueffer, YAMAMOTO Shigeru Submitted by: YAMAMOTO Shigeru <shigeru-at-iij.ad.jp>
|
159772 |
19-Jun-2006 |
dwmalone |
When we receive an out-of-window SYN for an "ESTABLISHED" connection, ACK the SYN as required by RFC793, rather than ignoring it. NetBSD have had a similar change since 1999.
PR: 93236 Submitted by: Grant Edwards <grante@visi.com> MFC after: 1 month
|
159733 |
18-Jun-2006 |
andre |
Remove T/TCP RFC1644 Connection Count comparison macros. They are no longer used or needed.
Sponsored by: TCP/IP Optimization Fundraise 2005
|
159727 |
18-Jun-2006 |
andre |
Do not access syncache entry before it was allocated for the TF_NOOPT case in syncache_add().
Found by: Coverity Prevent CID: 1473
|
159725 |
18-Jun-2006 |
andre |
Move all syncache related structures to tcp_syncache.c. They are only used there.
This unbreaks userland programs that include tcp_var.h.
Discussed with: rwatson
|
159722 |
18-Jun-2006 |
andre |
Remove double lock acquisition in syncookie_lookup() which came from last minute conversions to macros.
Pointy hat to: andre
|
159701 |
17-Jun-2006 |
andre |
Fix the !INET6 compile.
Reported by: alc
|
159698 |
17-Jun-2006 |
andre |
Rearrange fields in struct syncache and syncache_head to make them more cache line friendly.
Sponsored by: TCP/IP Optimization Fundraise 2005
|
159697 |
17-Jun-2006 |
andre |
ANSIfy and tidy up comments.
Sponsored by: TCP/IP Optimization Fundraise 2005
|
159695 |
17-Jun-2006 |
andre |
Add locking to TCP syncache and drop the global tcpinfo lock as early as possible for the syncache_add() case. The syncache timer no longer acquires the tcpinfo lock and timeout/retransmit runs can happen in parallel with bucket granularity.
On a P4 the additional locks cause a slight degradation of 0.7% in tcp connections per second. When IP and TCP input are deserialized and can run in parallel this little overhead can be neglected. The syncookie handling still leaves room for improvement and its random salts may be moved to the syncache bucket head structures to remove the second lock operation currently required for it. However this would be a more involved change from the way syncookies work at the moment.
Reviewed by: rwatson Tested by: rwatson, ps (earlier version) Sponsored by: TCP/IP Optimization Fundraise 2005
|
159636 |
15-Jun-2006 |
oleg |
Add support of 'tablearg' feature for: - 'tag' & 'untag' action parameters. - 'tagged' & 'limit' rule options. Rule examples: pipe 1 tag tablearg ip from table(1) to any allow ip from any to table(2) tagged tablearg allow tcp from table(3) to any 25 setup limit src-addr tablearg
sbin/ipfw/ipfw2.c: 1) new macros GET_UINT_ARG - support of 'tablearg' keyword, argument range checking. PRINT_UINT_ARG - support of 'tablearg' keyword. 2) strtoport(): do not silently truncate/accept invalid port list expressions like: '1,2-abc' or '1,2-3-4' or '1,2-3x4'. style(9) cleanup.
Approved by: glebius (mentor) MFC after: 1 month
|
159635 |
15-Jun-2006 |
oleg |
install_state(): style(9) cleanup
Approved by: glebius (mentor) MFC after: 1 month
|
159448 |
09-Jun-2006 |
thompsa |
Enable proxy ARP answers on any of the bridged interfaces if proxy record belongs to another interface within the bridge group.
PR: kern/94408 Submitted by: Eygene A. Ryabinkin MFC after: 1 month
|
159398 |
08-Jun-2006 |
oleg |
install_state() should properly initialize 'addr_type' field of newly created flows for O_LIMIT rules. Otherwise 'ipfw -d show' is unable to display PARENT rules properly. (This bug was exposed by ipfw2.c rev.1.90)
Approved by: glebius (mentor) MFC after: 2 weeks
|
159397 |
08-Jun-2006 |
oleg |
Fix following rules: pipe X (tag|altq) Y ...
Approved by: glebius (mentor) MFC after: 2 weeks
|
159218 |
04-Jun-2006 |
rwatson |
Push acquisition of pcbinfo lock out of tcp_usr_attach() into tcp_attach() after the call to soreserve(), as it doesn't require the global lock. Rearrange inpcb locking here also.
MFC after: 1 month
|
159199 |
03-Jun-2006 |
rwatson |
When entering a timer on a tcpcb, don't continue processing if it has been dropped. This prevents a bug introduced during the socket/pcb refcounting work from occurring, in which occasionally the retransmit timer may fire after a connection has been reset, resulting in the R|A TCP packet having a source port of 0, as the port reservation has been released.
While here, fix up some RUNLOCK->WUNLOCK bugs.
MFC after: 1 month
|
159198 |
03-Jun-2006 |
rwatson |
Acquire udbinfo lock after call to soreserve() rather than before, as it is not required. This simplifies error-handling, and reduces the time that this lock is held.
MFC after: 1 month
|
159180 |
02-Jun-2006 |
csjp |
Fix the following bpf(4) race condition which can result in a panic:
(1) bpf peer attaches to interface netif0
(2) Packet is received by netif0
(3) ifp->if_bpf pointer is checked and handed off to bpf
(4) bpf peer detaches from netif0 resulting in ifp->if_bpf being initialized to NULL.
(5) ifp->if_bpf is dereferenced by bpf machinery
(6) Kaboom
This race condition likely explains the various different kernel panics reported around sending SIGINT to tcpdump or dhclient processes. But really this race can result in kernel panics anywhere you have frequent bpf attach and detach operations with high packet per second load.
Summary of changes:
- Remove the bpf interface's "driverp" member
- When we attach bpf interfaces, we now set the ifp->if_bpf member to the bpf interface structure. Once this is done, ifp->if_bpf should never be NULL. [1]
- Introduce bpf_peers_present function, an inline operation which will do a lockless read of the bpf peer list associated with the interface. It should be noted that the bpf code will pick up the bpf_interface lock before adding or removing bpf peers. This should serialize the access to the bpf descriptor list, removing the race.
- Expose the bpf_if structure in bpf.h so that the bpf_peers_present function can use it. This also removes the struct bpf_if; hack that was there.
- Adjust all consumers of the raw if_bpf structure to use bpf_peers_present
Now what happens is:
(1) Packet is received by netif0
(2) Check to see if bpf descriptor list is empty
(3) Pick up the bpf interface lock
(4) Hand packet off to process
From the attach/detach side:
(1) Pick up the bpf interface lock
(2) Add/remove from bpf descriptor list
Now that we are storing the bpf interface structure with the ifnet, there is no need to walk the bpf interface list to locate the correct bpf interface. We now simply look up the interface, and initialize the pointer. This has a nice side effect of changing a bpf interface attach operation from O(N) (where N is the number of bpf interfaces), to O(1).
[1] From now on, we can no longer check ifp->if_bpf to tell us whether or not we have any bpf peers that might be interested in receiving packets.
In collaboration with: sam@ MFC after: 1 month
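The receive and attach/detach sequences described in this entry can be illustrated with a small userspace sketch. It assumes a pthread mutex in place of the kernel's bpf interface lock, and every identifier prefixed with `sketch_` is hypothetical, not the kernel API. The point it shows is that the per-interface structure is always present, so the fast path only asks whether the peer list is empty and takes the lock before walking it.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for a bpf descriptor and the per-interface structure. */
struct sketch_peer {
	struct sketch_peer *next;
	unsigned long       pkts;	/* packets delivered to this peer */
};

struct sketch_bpf_if {
	struct sketch_peer *peers;	/* descriptor list, NULL when empty */
	pthread_mutex_t     lock;	/* serializes list changes and delivery */
};

/*
 * Lockless hint: is anyone listening?  The structure itself is never NULL.
 * A plain read like this is only a sketch; portable multi-threaded code
 * would use an atomic load here.
 */
static int
sketch_peers_present(const struct sketch_bpf_if *bp)
{
	return (bp->peers != NULL);
}

/* Attach side: pick up the lock, then add to the descriptor list. */
static void
sketch_attach(struct sketch_bpf_if *bp, struct sketch_peer *p)
{
	pthread_mutex_lock(&bp->lock);
	p->next = bp->peers;
	bp->peers = p;
	pthread_mutex_unlock(&bp->lock);
}

/* Detach side: pick up the lock, then remove from the descriptor list. */
static void
sketch_detach(struct sketch_bpf_if *bp, struct sketch_peer *p)
{
	struct sketch_peer **pp;

	pthread_mutex_lock(&bp->lock);
	for (pp = &bp->peers; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == p) {
			*pp = p->next;
			break;
		}
	}
	pthread_mutex_unlock(&bp->lock);
}

/* Receive side: returns the number of peers that saw the packet. */
static int
sketch_tap(struct sketch_bpf_if *bp)
{
	struct sketch_peer *p;
	int n = 0;

	assert(bp != NULL);		/* the invariant from footnote [1] */
	if (!sketch_peers_present(bp))
		return (0);		/* common case: nobody attached */
	pthread_mutex_lock(&bp->lock);
	for (p = bp->peers; p != NULL; p = p->next) {
		p->pkts++;
		n++;
	}
	pthread_mutex_unlock(&bp->lock);
	return (n);
}
```

If the last peer detaches between the lockless check and the lock acquisition, the locked walk simply finds an empty list, which is why the check can stay outside the lock.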
|
159163 |
02-Jun-2006 |
rwatson |
Minor restyling and cleanup around ipport_tick().
MFC after: 1 month
|
158879 |
24-May-2006 |
oleg |
Implement internal (i.e. inside kernel) packet tagging using mbuf_tags(9). Since tags are kept while packet resides in kernelspace, it's possible to use other kernel facilities (like netgraph nodes) for altering those tags.
Submitted by: Andrey Elsukov <bu7cher at yandex dot ru> Submitted by: Vadim Goncharov <vadimnuclight at tpu dot ru> Approved by: glebius (mentor) Idea from: OpenBSD PF MFC after: 1 month
|
158800 |
21-May-2006 |
maxim |
o In udp|rip_disconnect() acquire a socket lock before the socket state modification. To prevent races do that while holding inpcb lock.
Reviewed by: rwatson
|
158799 |
21-May-2006 |
maxim |
o Add missed error check: in ip_ctloutput() sooptcopyin() returns a result but we never examine it.
Reviewed by: rwatson MFC after: 2 weeks
|
158729 |
18-May-2006 |
bms |
Initialize the new members of struct ip_moptions as a defensive programming measure.
Note that whilst these members are not used by the ip_output() path, we are passing an instance of struct ip_moptions here which is declared on the stack (which could be considered a bad thing).
ip_output() does not consume struct ip_moptions, but in case it does in future, declare an in_multi vector on the stack too to behave more like ip_findmoptions() does.
|
158645 |
16-May-2006 |
glebius |
Since m_pullup() can return a new mbuf, change gre_input2() to return mbuf back to gre_input(). If the former returns mbuf back to the latter, then pass it to raw_input().
Coverity ID: 829
|
158644 |
16-May-2006 |
glebius |
- Backout one line from 1.78. The tp can be freed by tcp_drop(). - Style next line.
Coverity ID: 912
|
158588 |
15-May-2006 |
maxim |
o In rip_disconnect() do not call rip_abort(), just mark a socket as not connected. In soclose() case rip_detach() will kill inpcb for us later.
This stops the rawconnect regression test from panicking the system.
Reviewed by: rwatson X-MFC after: with all 1st April inpcb changes
|
158580 |
14-May-2006 |
mlaier |
Use only the lower 64 bits of src/dest (and src/dest port) for hashing of IPv6 connections and get rid of the flow_id, as it is not guaranteed to be stable; some (most?) current implementations seem to just zero it out.
PR: kern/88664 Reported by: jylefort Submitted by: Joost Bekkers (w/ changes) Tested by: "regisr" <regisrApoboxDcom>
|
158563 |
14-May-2006 |
bms |
Fix a long-standing limitation in IPv4 multicast group membership.
By making the imo_membership array a dynamically allocated vector, this minimizes disruption to existing IPv4 multicast code. This change breaks the ABI for the kernel module ip_mroute.ko, and may cause a small amount of churn for folks working on the IGMPv3 merge.
Previously, sockets were subject to a compile-time limitation on the number of IPv4 group memberships, which was hard-coded to 20. The imo_membership relationship, however, is 1:1 with regards to a tuple of multicast group address and interface address. Users who ran routing protocols such as OSPF ran into this limitation on machines with a large system interface tree.
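As a rough illustration of replacing a fixed 20-entry array with a dynamically allocated vector, the following userspace sketch grows the membership table on demand. All names prefixed with `sketch_` or `SKETCH_` are hypothetical, the doubling growth policy is an assumption, and `realloc()` stands in for whatever kernel allocator the real change uses.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* The old compile-time limit, reused here as the initial vector size. */
#define SKETCH_MIN_MEMBERSHIPS	20

/* One membership per (group address, interface address) tuple. */
struct sketch_membership {
	unsigned int group;
	unsigned int ifaddr;
};

struct sketch_moptions {
	struct sketch_membership *memb;	/* dynamically allocated vector */
	size_t num;			/* entries in use */
	size_t max;			/* entries allocated */
};

/* Join a group on an interface; returns 0 or an errno value. */
static int
sketch_join(struct sketch_moptions *mo, unsigned int group, unsigned int ifaddr)
{
	struct sketch_membership *n;
	size_t i, nmax;

	for (i = 0; i < mo->num; i++) {
		if (mo->memb[i].group == group && mo->memb[i].ifaddr == ifaddr)
			return (EADDRINUSE);	/* already a member */
	}
	if (mo->num == mo->max) {
		/* Grow instead of failing at a hard-coded limit. */
		nmax = (mo->max != 0) ? mo->max * 2 : SKETCH_MIN_MEMBERSHIPS;
		n = realloc(mo->memb, nmax * sizeof(*n));
		if (n == NULL)
			return (ENOMEM);
		mo->memb = n;
		mo->max = nmax;
	}
	assert(mo->num < mo->max);
	mo->memb[mo->num].group = group;
	mo->memb[mo->num].ifaddr = ifaddr;
	mo->num++;
	return (0);
}
```

A routing daemon joining the same group on many interfaces adds one entry per interface, which is the case that used to run into the 20-entry limit.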
|
158500 |
12-May-2006 |
mlaier |
Remove ip6fw. Since ipfw has fully functional IPv6 support now and - in contrast to ip6fw - is properly locked, it is time to retire ip6fw.
|
158470 |
12-May-2006 |
mlaier |
Reintroduce net.inet6.ip6.fw.enable sysctl to dis/enable the IPv6 processing separately. Also use pfil hook/unhook instead of keeping the check functions in pfil just to return there based on the sysctl. While here fix some whitespace on a nearby SYSCTL_ macro.
|
158433 |
11-May-2006 |
mlaier |
Don't claim "(+ipv6)" if we didn't build with INET6.
|
158332 |
06-May-2006 |
rwatson |
Modify UDP to use sosend_dgram() instead of sosend(). This allows for significantly optimized UDP socket I/O when using a single UDP socket from many threads or processes that share it, by avoiding significant locking and other overhead in the general sosend() path that isn't necessary for simple datagram sockets. Specifically, this change results in a significant performance improvement for threaded name service in BIND9 under load.
Suggested by: Jinmei_Tatsuya at isc dot org
|
158305 |
05-May-2006 |
bz |
Make sure the ip data pointer is correct before touching it again after ipsec4_output processing else KAME IPSec using the handbook configuration with gif(4) will panic the kernel.
Problem reported by: t. patterson <tp lot.org> Tested by: t. patterson <tp lot.org>
|
158304 |
05-May-2006 |
rwatson |
Only return (tw) from tcp_twclose() if reuse is passed, otherwise return NULL. In principle this shouldn't change the behavior, but avoids returning a potentially invalid/inappropriate pointer to the caller.
Found with: Coverity Prevent (tm) Submitted by: pjd MFC after: 3 months
|
158302 |
05-May-2006 |
pjd |
/tmp/cvsTXPIwQ
|
158036 |
25-Apr-2006 |
marcel |
In in_pcbdrop(), fix !INVARIANTS build.
|
158021 |
25-Apr-2006 |
rwatson |
Rename 'last' to 'inp' in udp_append(): the name 'last' is due to the fact that the loop through inpcb's in udp_input() tracks the last inpcb while looping. We keep that name in the calling loop but not in the delivery routine itself.
MFC after: 3 months
|
158009 |
25-Apr-2006 |
rwatson |
Abstract inpcb drop logic, previously just setting of INP_DROPPED in TCP, into in_pcbdrop(). Expand logic to detach the inpcb from its bound address/port so that dropping a TCP connection releases the inpcb resource reservation, which since the introduction of socket/pcb reference count updates, has been persisting until the socket closed rather than being released implicitly due to prior freeing of the inpcb on TCP drop.
MFC after: 3 months
|
157993 |
24-Apr-2006 |
rwatson |
Instead of calling tcp_usr_detach() from tcp_usr_abort(), break out common pcb tear-down logic into tcp_detach(), which is called from either. Invoke tcp_drop() from the tcp_usr_abort() path rather than tcp_disconnect(), as we want to drop it immediately not perform a FIN sequence. This is one reason why some people were experiencing panics in sodealloc(), as the netisr and aborting thread were simultaneously trying to tear down the socket. This bug could often be reproduced using repeated runs of the listenclose regression test.
MFC after: 3 months PR: 96090 Reported by: Peter Kostouros <kpeter at melbpc dot org dot au>, kris Tested by: Peter Kostouros <kpeter at melbpc dot org dot au>, kris
|
157977 |
23-Apr-2006 |
rwatson |
Replace isn_mtx direct use with ISN_*() lock macros so that locking details/strategy can be changed without touching every use.
MFC after: 3 months
|
157967 |
22-Apr-2006 |
rwatson |
Introduce a new TCP mutex, isn_mtx, which protects the initial sequence number state, rather than re-using pcbinfo. This introduces some additional mutex operations during isn query, but avoids hitting the TCP pcbinfo lock out of yet another frequently firing TCP timer.
MFC after: 3 months
|
157966 |
22-Apr-2006 |
rwatson |
Assert the inpcb lock when rehashing an inpcb.
Improve consistency of style around some current assertions.
MFC after: 3 months
|
157965 |
22-Apr-2006 |
rwatson |
Remove pcbinfo locking from in_setsockaddr() and in_setpeeraddr(); holding the inpcb lock is sufficient to prevent races in reading the address and port, as both the inpcb lock and pcbinfo lock are required to change the address/port.
Improve consistency of spelling in assertions about inp != NULL.
MFC after: 3 months
|
157927 |
21-Apr-2006 |
ps |
Allow for nmbclusters and maxsockets to be increased via sysctl. An eventhandler is used to update all the various zones that depend on these values.
|
157833 |
18-Apr-2006 |
glebius |
Merge rev. 1.240 of ip_output.c, so that IPFIREWALL_FORWARD_EXTENDED kernel option will affect both forwarding methods - classic and fast.
|
157609 |
09-Apr-2006 |
rwatson |
Modify tcp_timewait() to accept an inpcb reference, not a tcptw reference. For now, we allow the possibility that the in_ppcb pointer in the inpcb may be NULL if a timewait socket has had its tcptw structure recycled. This allows tcp_timewait() to consistently unlock the inpcb.
Reported by: Kazuaki Oda <kaakun at highway dot ne dot jp> MFC after: 3 months
|
157569 |
06-Apr-2006 |
mohans |
Eliminate debug code that catches bugs in the hinting of sack variables (tcp_sack_output_debug checks cached hints against computed values by walking the scoreboard and reports discrepancies). The sack hinting code has been stable for many months now so it is time for the debug code to go. Leaving tcp_sack_output_debug ifdef'ed out in case we need to resurrect it at a later point.
|
157534 |
05-Apr-2006 |
rwatson |
Don't unlock a timewait structure if the pointer is NULL in tcp_timewait(). This corrects a bug (or lack of fixing of a bug) in tcp_input.c:1.295.
Submitted by: Kazuaki Oda <kaakun at highway dot ne dot jp> MFC after: 3 months
|
157526 |
05-Apr-2006 |
mohans |
Certain (bad) values of sack blocks can end up corrupting the sack scoreboard. Make the checks in tcp_sack_doack() more robust to prevent this.
Submitted by: Raja Mukerji (raja@mukerji.com) Reviewed by: Mohan Srinivasan
|
157478 |
04-Apr-2006 |
glebius |
Add a tunable net.inet.tcp.maxtcptw, that allows setting a limit on the tcptw zone independently from setting a limit on the socket zone.
|
157474 |
04-Apr-2006 |
rwatson |
Before dereferencing intotw() when INP_TIMEWAIT, check for inp_ppcb being NULL. We currently do allow this to happen, but may want to remove that possibility in the future. This case can occur when a socket is left open after TCP wraps up, and the timewait state is recycled. This will be cleaned up in the future.
Found by: Kazuaki Oda <kaakun at highway dot ne dot jp> MFC after: 3 months
|
157433 |
03-Apr-2006 |
rwatson |
In TCP notify routines, check inpcb for INP_TIMEWAIT and INP_DROPPED. The INP_DROPPED check replaces the current NULL checks; the INP_TIMEWAIT checks appear to have always been required, but not been there, which is/was a bug. This avoids unconditionally casting of in_ppcb to a tcpcb, when it may be a twtcb, which may have resulted in obscure ICMP-related panics in earlier releases.
MFC after: 3 months
|
157432 |
03-Apr-2006 |
rwatson |
Change inp_ppcb from caddr_t to void *, fix/remove associated casts.
Consistently use intotw() to cast inp_ppcb pointers to struct tcptw * pointers.
Consistently use intotcpcb() to cast inp_ppcb pointers to struct tcpcb * pointers.
Don't assign tp to the result of intotcpcb() during variable declaration at the top of functions, as that is before the asserts relating to locking have been performed. Do this later in the function after appropriate assertions have run to allow that operation to be considered safe.
MFC after: 3 months
|
157431 |
03-Apr-2006 |
rwatson |
Style tweaks: convert to ANSI from K&R function prototypes.
MFC after: 3 months
|
157430 |
03-Apr-2006 |
rwatson |
Update comment on tcp_close() for new world order.
MFC after: 3 months
|
157429 |
03-Apr-2006 |
rwatson |
Clarify comment on handling of non-timewait TCP states in tcp_usr_detach().
MFC after: 3 months
|
157427 |
03-Apr-2006 |
rwatson |
Fix up locking surrounding tcp_drop sysctl: in the new world order, we don't free inpcbs until after the socket is closed, so we always need to unlock an inpcb after calling tcp_drop() on it.
MFC after: 3 months
|
157424 |
03-Apr-2006 |
rwatson |
After checking for SO_ISDISCONNECTED in tcp_usr_accept(), return immediately rather than jumping to the normal output handling, which assumes we've pulled out the inpcb, which hasn't happened at this point (and isn't necessary).
Return ECONNABORTED instead of EINVAL when the inpcb has entered INP_TIMEWAIT or INP_DROPPED, as this is the documented error value.
This may correct the panic seen by Ganbold.
MFC after: 1 month Reported by: Ganbold <ganbold at micom dot mng dot net>
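The error-value change in this entry can be pictured with a minimal sketch, assuming the flags live in an `inp_vflag` bitfield as described elsewhere in this log. The flag values and every `sketch_`/`SKETCH_` name are hypothetical, and the function is only a stand-in for the check, not the actual tcp_usr_accept() code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical bit values; the real definitions are not shown in this log. */
#define SKETCH_INP_TIMEWAIT	0x01
#define SKETCH_INP_DROPPED	0x02

struct sketch_inpcb {
	int inp_vflag;
};

/*
 * Returns 0 if the accept may proceed, or ECONNABORTED (rather than
 * EINVAL) when the connection has entered time wait or been dropped.
 */
static int
sketch_accept_check(const struct sketch_inpcb *inp)
{
	assert(inp != NULL);	/* so_pcb is expected to be valid here */
	if (inp->inp_vflag & (SKETCH_INP_TIMEWAIT | SKETCH_INP_DROPPED))
		return (ECONNABORTED);
	return (0);
}
```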
|
157423 |
03-Apr-2006 |
rwatson |
Correct incorrect assertion in div_bind(): inp must not be NULL here.
Reported by: tegge MFC after: 3 months
|
157410 |
02-Apr-2006 |
rwatson |
During reformulation of tcp_usr_detach(), the call to initiate TCP disconnect for fully connected sockets was dropped, meaning that if the socket was closed while the connection was alive, it would be leaked. Structure tcp_usr_detach() so that there are two clear parts: initiating disconnect, and reclaiming state, and reintroduce the tcp_disconnect() call in the first part.
MFC after: 3 months
|
157386 |
01-Apr-2006 |
rwatson |
Properly handle an edge case previously not handled correctly: a socket can have a tcp connection that has entered time wait attached to it, in the event that shutdown() is called on the socket and the FINs properly exchange before close(). In this case we don't detach or free the inpcb, just leave the tcptw detached and freed, but we must release the inpcb lock (which we didn't previously).
MFC after: 3 months
|
157376 |
01-Apr-2006 |
rwatson |
Update TCP for infrastructural changes to the socket/pcb refcount model, pru_abort(), pru_detach(), and in_pcbdetach():
- Universally support and enforce the invariant that so_pcb is never NULL, converting dozens of unnecessary NULL checks into assertions, and eliminating dozens of unnecessary error handling cases in protocol code.
- In some cases, eliminate unnecessary pcbinfo locking, as it is no longer required to ensure so_pcb != NULL. For example, the receive code no longer requires the pcbinfo lock, and the send code only requires it if building a new connection on an otherwise unconnected socket triggered via sendto() with an address. This should significantly reduce tcbinfo lock contention in the receive and send cases.
- In order to support the invariant that so_pcb != NULL, it is now necessary for the TCP code to not discard the tcpcb any time a connection is dropped, but instead leave the tcpcb until the socket is shutdown. This case is handled by setting INP_DROPPED, to substitute for using a NULL so_pcb to indicate that the connection has been dropped. This requires the inpcb lock, but not the pcbinfo lock.
- Unlike all other protocols in the tree, TCP may need to retain access to the socket after the file descriptor has been closed. Set SS_PROTOREF in tcp_detach() in order to prevent the socket from being freed, and add a flag, INP_SOCKREF, so that the TCP code knows whether or not it needs to free the socket when the connection finally does close. The typical case where this occurs is if close() is called on a TCP socket before all sent data in the send socket buffer has been transmitted or acknowledged. If INP_SOCKREF is found when the connection is dropped, we release the inpcb, tcpcb, and socket instead of flagging INP_DROPPED.
- Abort and detach protocol switch methods no longer return failures, nor attempt to free sockets, as the socket layer does this.
- Annotate the existence of a long-standing race in the TCP timer code, in which timers are stopped but not drained when the socket is freed, as waiting for drain may lead to deadlocks, or have to occur in a context where waiting is not permitted. This race has been handled by testing to see if the tcpcb pointer in the inpcb is NULL (and vice versa), which is not normally permitted, but may be true if an inpcb and tcpcb have been freed. Add a counter to test how often this race has actually occurred, and a large comment for each instance where we compare potentially freed memory with NULL. This will have to be fixed in the near future, but requires us to further address how to handle the timer shutdown issue.
- Several TCP calls no longer potentially free the passed inpcb/tcpcb, so no longer need to return a pointer to indicate whether the argument passed in is still valid.
- Un-macroize debugging and locking setup for various protocol switch methods for TCP, as it led to more obscurity, and as locking becomes more customized to the methods, offers less benefit.
- Assert copyright on tcp_usrreq.c due to significant modifications that have been made as part of this work.
These changes significantly modify the memory management and connection logic of our TCP implementation, and are (as such) High Risk Changes, and likely to contain serious bugs. Please report problems to the current@ mailing list ASAP, ideally with simple test cases, and optionally, packet traces.
MFC after: 3 months
|
157374 |
01-Apr-2006 |
rwatson |
Update in_pcb-derived basic socket types following changes to pru_abort(), pru_detach(), and in_pcbdetach():
- Universally support and enforce the invariant that so_pcb is never NULL, converting dozens of unnecessary NULL checks into assertions, and eliminating dozens of unnecessary error handling cases in protocol code.
- In some cases, eliminate unnecessary pcbinfo locking, as it is no longer required to ensure so_pcb != NULL. For example, in protocol shutdown methods, and in raw IP send.
- Abort and detach protocol switch methods no longer return failures, nor attempt to free sockets, as the socket layer does this.
- Invoke in_pcbfree() after in_pcbdetach() in order to free the detached in_pcb structure for a socket.
MFC after: 3 months
|
157373 |
01-Apr-2006 |
rwatson |
Break out in_pcbdetach() into two functions:
- in_pcbdetach(), which removes the link between an inpcb and its socket.
- in_pcbfree(), which frees a detached pcb.
Unlike the previous in_pcbdetach(), neither of these functions will attempt to conditionally free the socket, as they are responsible only for managing in_pcb memory. Mirror these changes into in6_pcbdetach() by breaking it into in6_pcbdetach() and in6_pcbfree().
While here, eliminate undesired checks for NULL inpcb pointers in sockets, as we will now have as an invariant that sockets will always have valid so_pcb pointers.
MFC after: 3 months
|
157370 |
01-Apr-2006 |
rwatson |
Change protocol switch method pru_detach() so that it returns void rather than an error. Detaches do not "fail"; they either occur or the protocol flags SS_PROTOREF to take ownership of the socket.
soclose() no longer looks at so_pcb to see if it's NULL, relying entirely on the protocol to decide whether it's time to free the socket or not using SS_PROTOREF. so_pcb is now entirely owned and managed by the protocol code. Likewise, no longer test so_pcb in other socket functions, such as soreceive(), which have no business digging into protocol internals.
Protocol detach routines no longer try to free the socket on detach, this is performed in the socket code if the protocol permits it.
In rts_detach(), no longer test for rp != NULL in detach, and likewise in other protocols that don't permit a NULL so_pcb, reduce the incidence of testing for it during detach.
netinet and netinet6 are not fully updated to this change, which will be in an upcoming commit. In their current state they may leak memory or panic.
MFC after: 3 months
|
157366 |
01-Apr-2006 |
rwatson |
Change protocol switch pru_abort() API so that it returns void rather than an int, as an error here is not meaningful. Modify soabort() to unconditionally free the socket on the return of pru_abort(), and modify most protocols to no longer conditionally free the socket, since the caller will do this.
This commit likely leaves parts of netinet and netinet6 in a situation where they may panic or leak memory, as they are not fully updated by this commit. This will be corrected shortly in followup commits to these components.
MFC after: 3 months
|
157143 |
26-Mar-2006 |
rwatson |
Define two new inpcb flags in the inp_vflag field, which for whatever reason, seems to be where new flags are getting defined:
INP_DROPPED - The protocol has terminated this connection and the socket is not reusable: when the socket code enters the protocol, an error is immediately returned. This will substitute for NULLing the so_pcb socket field, helping to implement the invariant that all valid sockets have valid pcb's in TCP.
INP_SOCKREF - The protocol has become the owner of the socket reference, and will need to free it when freeing the pcb, which will be used when a TCP socket is closed but still has queued data.
MFC after: 1 month
|
157142 |
26-Mar-2006 |
rwatson |
Minor style tweak: tab after #define, not space.
MFC after: 1 month
|
157136 |
26-Mar-2006 |
rwatson |
Explicitly assert socket pointer is non-NULL in tcp_input() so as to provide better debugging information.
Prefer explicit comparison to NULL for tcpcb pointers rather than treating them as booleans.
MFC after: 1 month
|
156947 |
21-Mar-2006 |
glebius |
o Introduce carp_multicast_cleanup(), which removes and frees multicast addresses from carp interface. [1]
o Rewrite carpdetach(), so that it does the following things: [1]
  - Stops callouts.
  - Decrements carp_suppress_preempt, if needed.
  - Downs interface and sets CARP state to INIT.
  - Calls carp_multicast_cleanup().
  - Detaches softc from carp_if and if we are the last frees the carp_if.
o Use new carpdetach() in carp_clone_destroy().
o In carp_ifdetach() acquire the carp_if lock and cleanup all interfaces hanging on carp_if. [1]
o Make carp_ifdetach() static and use EVENT(9) to call it from if_detach(). [2]
o In carp_setrun() exit if the softc doesn't have a valid pointer to parent. [1]
Obtained from: OpenBSD [1] Submitted by: Dan Lukes <dan obluda.cz> [2] PR: kern/82908 [2]
|
156926 |
20-Mar-2006 |
keramida |
Add descriptions for the sysctls:
net.inet.icmp.drop_redirect net.inet.icmp.log_redirect net.inet.icmp.icmplim net.inet.icmp.icmplim_output
Approved & text by: andre
|
156877 |
19-Mar-2006 |
dwmalone |
Make net.inet.ip.portrange.reservedhigh and net.inet.ip.portrange.reservedlow apply to IPv6 as well as IPv4.
We could have made new sysctls for IPv6, but that potentially makes things complicated for mapped addresses. This seems like the least confusing option and least likely to cause obscure problems in the future.
This change makes the mac_portacl module useful with IPv6 apps.
Reviewed by: ume MFC after: 1 month
|
156763 |
16-Mar-2006 |
rwatson |
Change soabort() from returning int to returning void, since all consumers ignore the return value, soabort() is required to succeed, and protocols produce errors here to report multiple freeing of the pcb, which we hope to eliminate.
|
156409 |
07-Mar-2006 |
thompsa |
Further refine the bridge hack in the arp code. Only do the special arp handling for interfaces which are actually in the bridge group, ignore all others.
MFC after: 3 days
|
156240 |
03-Mar-2006 |
glebius |
- Do not leak read lock in IP_FW_TABLE_GETSIZE case of ipfw_ctl(). - Acquire read (not write) lock in case of IP_FW_TABLE_LIST.
In collaboration with: ru
|
156125 |
28-Feb-2006 |
andre |
Rework TCP window scaling (RFC1323) to properly scale the send window right from the beginning and partly clean up the differences in handling between SYN_SENT and SYN_RCVD (syncache).
Further changes to this code to come. This is a first incremental step to a general overhaul and streamlining of the TCP code.
PR: kern/15095 PR: kern/92690 (partly) Reviewed by: qingli (and tested with ANVL) Sponsored by: TCP/IP Optimization Fundraise 2005
|
155961 |
23-Feb-2006 |
qingli |
This patch fixes the problem where the current TCP code can not handle simultaneous open. Both the bug and the patch were verified using the ANVL test suite.
PR: kern/74935 Submitted by: qingli (before I became committer) Reviewed by: andre MFC after: 5 days
|
155861 |
20-Feb-2006 |
ume |
Obey opt_inet6.h in kernel build directory.
Reported by: Peter Losher <plosher-keyword-freebsd.a36e57__at__plosh.net> MFC after: 3 days
|
155819 |
18-Feb-2006 |
andre |
Remove unneeded includes and provide more accurate description to others.
Submitted by: garys PR: kern/86437
|
155817 |
18-Feb-2006 |
andre |
Add missing TH_PUSH to the TH_FLAGS enumeration.
Submitted by: Andre Albsmeier <Andre.Albsmeier-at-siemens.com> PR: kern/85203
|
155767 |
16-Feb-2006 |
andre |
Have TCP Inflight disable itself if the RTT is below a certain threshold. Inflight doesn't make sense on a LAN as it has trouble figuring out the maximal bandwidth because of the coarse tick granularity.
The sysctl net.inet.tcp.inflight.rttthresh specifies the threshold in milliseconds below which inflight will disengage. It defaults to 10ms.
Tested by: Joao Barros <joao.barros-at-gmail.com>, Rich Murphey <rich-at-whiteoaklabs.com> Sponsored by: TCP/IP Optimization Fundraise 2005
|
155759 |
16-Feb-2006 |
andre |
In in_pcbconnect_setup() reduce code duplication and use ip_rtaddr() to find the outgoing interface for this connection.
Sponsored by: TCP/IP Optimization Fundraise 2005 MFC after: 2 weeks
|
155758 |
16-Feb-2006 |
andre |
Make sysctl_msec_to_ticks(SYSCTL_HANDLER_ARGS) generally available instead of being private to tcp_timer.c.
Sponsored by: TCP/IP Optimization Fundraise 2005 MFC after: 3 days
|
155659 |
14-Feb-2006 |
ru |
When sending a packet from dummynet, indicate that we're forwarding it so that ip_id etc. don't get overwritten. This fixes forwarding of fragmented IP packets through a dummynet pipe -- fragments came out with modified and different(!) ip_id's, making it impossible to reassemble a datagram at the receiver side.
Submitted by: Alexander Karptsov (reworked by me) MFC after: 3 days
|
155487 |
09-Feb-2006 |
qingli |
Set the M_ZERO flag when calling uma_zalloc() to allocate a syncache entry.
Reviewed by: andre, glebius MFC after: 3 days
|
155463 |
08-Feb-2006 |
qingli |
Redo the previous fix by setting the UMA_ZONE_ZINIT bit in the syncache zone, eliminating the need to call bzero() after each syncache entry allocation.
Suggested by: glebius Reviewed by: andre MFC after: 3 days
|
155439 |
07-Feb-2006 |
qingli |
Fix a crash: the memory of the newly allocated syncache entry in syncache_lookup() is not cleared, which may lead to an arbitrary and bogus rtentry pointer which later gets free'd.
Reviewed by: andre MFC after: 3 days
|
155425 |
07-Feb-2006 |
oleg |
Fix a five-year-old bug in ip_reass(): if we are using 'full' (i.e. including pseudo header) hardware rx checksum offloading, ip_reass() fails to calculate the TCP/UDP checksum for the reassembled packet correctly. This should also fix the recent 'NFS over UDP over bge' issue exposed by if_bge.c rev. 1.123
Reviewed by: sam (earlier version), bde Approved by: glebius (mentor) MFC after: 2 weeks
|
155277 |
04-Feb-2006 |
ume |
Never select the PCB that has INP_IPV6 flag and is bound to :: if we have another PCB which is bound to 0.0.0.0. If a PCB has the INP_IPV6 flag, then we set its cost higher than IPv4 only PCBs.
Submitted by: Keiichi SHIMA <keiichi__at__iijlab.net> Obtained from: KAME MFC after: 1 week
|
155248 |
03-Feb-2006 |
glebius |
Dropping the lock in the transmit_event() is not safe, because we store some pipe pointers on stack. If user reconfigures dummynet in the interlock gap, we can work with freed pipes after relock.
To fix this, we decided not to send packets in transmit_event(), but fill a queue. At the end of dummynet() and dummynet_io(), after the lock is dropped, if there is something in the queue we run dummynet_send() to process the queue.
In collaboration with: ru
|
155245 |
03-Feb-2006 |
glebius |
Axe unused function.
|
155221 |
02-Feb-2006 |
csjp |
Use PFIL_HOOKED macros in if_bridge and pass the right argument to rw_assert. This un-breaks the build.
Submitted by: Kostik Belousov Pointy hat to: csjp
|
155201 |
02-Feb-2006 |
csjp |
Somewhat re-factor the read/write locking mechanism associated with the packet filtering mechanisms to use the new rwlock(9) locking API:
- Drop the variables stored in the pfil_head structure which were specific to conditions and the home rolled read/write locking mechanism.
- Drop some includes which were used for condition variables
- Drop the inline functions, and convert them to macros. Also, move these macros into pfil.h
- Move pfil list locking macros into pfil.h as well
- Rename ph_busy_count to ph_nhooks. This variable will represent the number of IN/OUT hooks registered with the pfil head structure
- Define PFIL_HOOKED macro which evaluates to true if there are any hooks to be run by pfil_run_hooks
- In the IP/IP6 stacks, change the ph_busy_count comparison to use the new PFIL_HOOKED macro.
- Drop optimization in pfil_run_hooks which checks to see if there are any hooks to be run, and returns if not. This check is already performed by the IP stacks when they call:

      if (!PFIL_HOOKED(ph))
              goto skip_hooks;

- Drop in assertion which makes sure that the number of hooks never drops below 0 for good measure. This in theory should never happen, and if it does then there are problems somewhere
- Drop special logic around PFIL_WAITOK because rw_wlock(9) does not sleep
- Drop variables which support home rolled read/write locking mechanism from the IPFW firewall chain structure.
- Swap out the read/write firewall chain lock internal to use the rwlock(9) API instead of our home rolled version
- Convert the inlined functions to macros
Reviewed by: mlaier, andre, glebius Thanks to: jhb for the new locking API
|
155179 |
01-Feb-2006 |
andre |
Move the IPSEC related code blocks to their own file to unclutter and significantly improve the readability of ip_input() and ip_output() again.
The resulting IPSEC hooks in ip_input() and ip_output() may be used later on for making IPSEC loadable.
This move is mostly mechanical and should preserve current IPSEC behaviour as-is. Nothing shall prevent improvements in the way IPSEC interacts with the IPv4 stack.
Discussed with: bz, gnn, rwatson; (earlier version)
|
155166 |
01-Feb-2006 |
ru |
Brain-o (use standard int types now).
|
155152 |
31-Jan-2006 |
ru |
Fix multicast routing on 64-bit platforms.
Tested on: amd64 MFC after: 3 days
|
155145 |
31-Jan-2006 |
thompsa |
Now that the bridge also processes Ethernet frames as itself, two arp replies will be sent if there is an address on the bridge. Exclude the bridge from the special arp handling.
This has been tested with all combinations of addresses on the bridge and members.
Pointed out by: Michal Mertl
|
155037 |
30-Jan-2006 |
glebius |
Add some initial locking to gif(4). It doesn't cover the whole driver, however IPv4-in-IPv4 tunnels are now stable on SMP. Details:
- Add per-softc mutex. - Hold the mutex on output.
The main problem was the rtentry, placed in softc. It could be freed by ip_output(). Meanwhile, another thread being in in_gif_output() can read and write this rtentry.
Reported by: many Tested by: Alexander Shiryaev <aixp mail.ru>
|
155018 |
29-Jan-2006 |
thompsa |
Back out of r1.148, it causes two arp replies to be sent with different mac addresses. One for the bridged interface with the IP address assigned but then another with the mac for the bridge itself.
|
154780 |
24-Jan-2006 |
andre |
When doing IP forwarding with [FAST_]IPSEC compiled into the kernel ip_forward() would report back a zero MTU in ICMP needfrag messages because on an IPSEC SP lookup failure no MTU got computed.
Fix this by changing the logic to compute a new MTU in any case if IPSEC didn't do it.
Change MTU computation logic to use egress interface MTU if available or the next smaller MTU compared to the current packet size instead of falling back to a very small fixed MTU.
Fix associated comment.
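The fallback order described above can be sketched as follows. This is an illustration under stated assumptions, not the kernel logic: the function names are hypothetical, and the table holds plateau values in the style of RFC 1191 rather than values taken from this commit.

```python
# Descending table of common link MTUs; the values are illustrative assumptions.
MTU_TABLE = [65535, 32000, 17914, 8166, 4352, 2002, 1492, 1006, 508, 296, 68]


def next_smaller_mtu(pkt_len):
    """Return the largest table entry strictly below pkt_len (floor at the last entry)."""
    for mtu in MTU_TABLE:
        if mtu < pkt_len:
            return mtu
    return MTU_TABLE[-1]


def needfrag_mtu(ipsec_mtu, egress_if_mtu, pkt_len):
    """Pick the MTU to report in an ICMP needfrag message; 0 means 'not known'."""
    if ipsec_mtu:            # IPSEC already computed one
        return ipsec_mtu
    if egress_if_mtu:        # prefer the egress interface MTU when available
        return egress_if_mtu
    return next_smaller_mtu(pkt_len)
```

With these definitions a zero MTU is never returned, which is the failure mode the commit message describes.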
PR: kern/91412 MFC after: 3 days
|
154779 |
24-Jan-2006 |
andre |
In ip_mdq() compute the TV_DELTA the correct way around.
PR: kern/91851 Submitted by: SAKAI Hiroaki <sakai.hiroaki-at-jp.fujitsu.com> MFC after: 3 days
|
154777 |
24-Jan-2006 |
andre |
In in_control() remove the temporary in_ifaddr structure from the ia_hash only if it actually is an AF_INET address. All other places test for sa_family == AF_INET but this one.
PR: kern/92091 Submitted by: Seth Kingsley <sethk-at-meowfishies.com> MFC after: 3 days
|
154769 |
24-Jan-2006 |
oleg |
Fix minor bug in uRPF: If net.link.ether.inet.useloopback=1 and we send broadcast packet using our own source ip address it may be rejected by uRPF rules.
Same bug was fixed for IPv6 in rev. 1.115 by suz.
PR: kern/76971 Approved by: glebius (mentor) MFC after: 3 days
|
154767 |
24-Jan-2006 |
glebius |
Implement 'ipfw fwd laddr,port' feature for UDP. According to ipfw(8) it should work, however it never did. People expect it to work.
PR: kern/90834
|
154733 |
23-Jan-2006 |
glebius |
Fix build.
|
154728 |
23-Jan-2006 |
andre |
Simplify ip_next_mtu() and make its logic easier to follow while silencing code analysis tools.
Found by: Coverity Prevent(tm) Coverity ID: CID341 Sponsored by: TCP/IP Optimization Fundraise 2005
|
154666 |
22-Jan-2006 |
rwatson |
Convert remaining functions to ANSI C function declarations; remove 'register' where present.
MFC after: 1 week
|
154665 |
22-Jan-2006 |
rwatson |
Convert last remaining function in ip_gre.c to ANSI C function declaration.
MFC after: 1 week
|
154625 |
21-Jan-2006 |
bz |
Fix stack corruptions on amd64.
Vararg functions have a different calling convention than regular functions on amd64. Casting a vararg function to a regular one to match the function pointer declaration will hide the varargs from the caller and we will end up with an incorrectly setup stack.
Entirely remove the varargs from these functions and change the functions to match the declaration of the function pointers. Remove the now unnecessary casts.
Lots of explanations and help from: peter Reviewed by: peter PR: amd64/89261 MFC after: 6 days
|
154567 |
20-Jan-2006 |
csjp |
- Change the return type for init_tables from void to int so we can propagate errors from rn_inithead back to the ipfw initialization function.
- Check return value of rn_inithead for failure, if table allocation has failed for any reason, free up any tables we have created and return ENOMEM
- In ipfw_init check the return value of init_tables and free up any mutexes or UMA zones which may have been created.
- Assert that the supplied table is not NULL before attempting to dereference.
This fixes panics which were a result of invalid memory accesses due to failed table allocation. This is an issue mainly because the R_Zalloc function is a malloc(M_NOWAIT) wrapper, thus making it possible for allocations to fail.
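The allocate-or-unwind pattern described above can be sketched like this. It is a minimal user-space model: the `alloc` and `free` callbacks are hypothetical stand-ins for rn_inithead() and the matching release routine, with `None` modelling a failed M_NOWAIT allocation.

```python
import errno


def init_tables(count, alloc, free):
    """Allocate `count` tables; on any failure free the ones created and return ENOMEM."""
    tables = []
    for _ in range(count):
        t = alloc()          # stand-in for rn_inithead(); None means failure
        if t is None:
            for made in tables:
                free(made)   # unwind everything allocated so far
            return errno.ENOMEM, []
        tables.append(t)
    return 0, tables
```

The caller then checks the returned error code and releases its own resources on failure, instead of continuing with a partially initialized table set.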
Found by: Coverity Prevent (tm) Coverity ID: CID79 MFC after: 1 week
|
154563 |
20-Jan-2006 |
csjp |
Destroy the dynamic rule zone in the event that we fail to insert the initial default rule.
MFC after: 1 week
|
154528 |
18-Jan-2006 |
andre |
Do not dereference the ip header pointer in the IPv6 case. This fixes a bug in the previous commit.
Found by: Coverity Prevent(tm) Coverity ID: CID253 Sponsored by: TCP/IP Optimization Fundraise 2005 MFC after: 3 days
|
154526 |
18-Jan-2006 |
andre |
In in_delayed_cksum() we can't perform a m_pullup() as it may change the mbuf pointer and we don't have any way of passing it back to the callers. Instead just fail silently without updating the checksum but leaving the mbuf+chain intact.
A search in our GNATS database did not turn up any match for the existing warning message when this case is encountered.
Found by: Coverity Prevent(tm) Coverity ID: CID779 Sponsored by: TCP/IP Optimization Fundraise 2005 MFC after: 3 days
|
154524 |
18-Jan-2006 |
andre |
In syncache_expand() insert a proper syncache_free() to fix a case that currently can't be triggered. But better be safe than sorry later on. Additionally it properly silences Coverity Prevent for future tests.
Found by: Coverity Prevent(tm) Coverity ID: CID802 Sponsored by: TCP/IP Optimization Fundraise 2005 MFC after: 3 days
|
154520 |
18-Jan-2006 |
andre |
Prevent dereferencing a NULL route pointer when trying to update the route MTU.
This bug is very difficult to reach and not remotely exploitable.
Found by: Coverity Prevent(tm) Coverity ID: CID162 Sponsored by: TCP/IP Optimization Fundraise 2005 MFC after: 3 days
|
154518 |
18-Jan-2006 |
andre |
Return mbuf pointer or NULL from ip_fastforward() as the mbuf pointer may have changed by m_pullup() during fastforward processing.
While this is a bug it is actually never triggered in real world situations and it is not remotely exploitable.
Found by: Coverity Prevent(tm) Coverity ID: CID780 Sponsored by: TCP/IP Optimization Fundraise 2005
|
154400 |
15-Jan-2006 |
rwatson |
Modify the IP fragment reassembly code so that it uses a new UMA zone, ipq_zone, to allocate fragment headers from, rather than using cast mbuf storage. This was one of the few remaining uses of mbuf storage for local data structures that relied on dtom(). Implement the resource limit on ipq's using UMA zone limits, but preserve current sysctl semantics using a sysctl proc.
MFC after: 3 weeks
|
154395 |
15-Jan-2006 |
rwatson |
Staticize ipqlock, since it is local to ip_input.c.
MFC after: 3 days
|
154366 |
14-Jan-2006 |
gnn |
Check the correct TTL in both the IPv6 and IPv4 cases.
Submitted by: glebius Reviewed by: gnn, bz Found with: Coverity Prevent(tm)
|
154355 |
14-Jan-2006 |
glebius |
UMA can return NULL not only in case when our zone is full, but also in case of generic memory shortage. In the latter case we may not find an old entry.
Found with: Coverity Prevent(tm)
|
154349 |
14-Jan-2006 |
rwatson |
Remove dead code: 'opts' is not used in udp_append(), only in udp_input(), so no need to assign it to NULL or conditionally free it.
Found with: Coverity Prevent(tm) MFC after: 3 days
|
154271 |
12-Jan-2006 |
thompsa |
Include the bridge interface itself in the special arp handling.
PR: 90973 MFC after: 1 week
|
154216 |
11-Jan-2006 |
cperciva |
Correct insecure temporary file usage in texindex. [06:01] Correct insecure temporary file usage in ee. [06:02] Correct a race condition when setting file permissions, sanitize file names by default, and fix a buffer overflow when handling files larger than 4GB in cpio. [06:03] Fix an error in the handling of IP fragments in ipfw which can cause a kernel panic. [06:04]
Security: FreeBSD-SA-06:01.texindex Security: FreeBSD-SA-06:02.ee Security: FreeBSD-SA-06:03.cpio Security: FreeBSD-SA-06:04.ipfw
|
153621 |
21-Dec-2005 |
thompsa |
Add RFC 3378 EtherIP support. This change makes it possible to add gif interfaces to bridges, which will then send and receive IP protocol 97 packets. Packets are Ethernet frames with an EtherIP header prepended.
Obtained from: NetBSD MFC after: 2 weeks
|
153553 |
20-Dec-2005 |
delphij |
Use consistent indent character as other IPPROTO_* lines did.
|
153552 |
20-Dec-2005 |
gnn |
Add protocol number for SCTP.
Submitted by: Randall Stewart rrs at cisco.com MFC after: 1 week
|
153513 |
18-Dec-2005 |
glebius |
Add a knob to suppress logging of attempts to modify permanent ARP entries.
Submitted by: Andrew Alcheyev <buddy telenet.ru>
|
153478 |
16-Dec-2005 |
emaste |
Add descriptions for sysctl -d.
Approved by: glebius Silence from: rwatson (mentor)
|
153476 |
16-Dec-2005 |
glebius |
Cleanup __FreeBSD_version.
|
153461 |
15-Dec-2005 |
jhb |
Use %t (ptrdiff_t modifier) to print a couple of pointer differences rather than casting them to int.
|
153427 |
14-Dec-2005 |
mux |
Fix a bunch of SYSCTL_INT() that should have been SYSCTL_ULONG() to match the type of the variable they are exporting.
Spotted by: Thomas Hurst <tom@hur.st> MFC after: 3 days
|
153374 |
13-Dec-2005 |
glebius |
Add a new feature for optimizing ipfw rulesets - substitution of the action argument with the value obtained from table lookup. The feature is now applicable only to "pipe", "queue", "divert", "tee", "netgraph" and "ngtee" rules.
An example usage:
    ipfw pipe 1000 config bw 1000Kbyte/s
    ipfw pipe 4000 config bw 4000Kbyte/s
    ipfw table 1 add x.x.x.x 1000
    ipfw table 1 add x.x.x.y 4000
    ipfw pipe tablearg ip from table(1) to any
In the example above the rule will throw different packets to different pipes.
TODO:
- Support "skipto" action, but without searching all rules.
- Improve parser, so that it warns about bad rules. These are:
  - "tablearg" argument to action, but no "table" in the rule. All traffic will be blocked.
  - "tablearg" argument to action, but "table" searches for entry with a specific value. All traffic will be blocked.
  - "tablearg" argument to action, and two "table" lookups, one for src and one for dst. The last lookup will match.
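The idea behind the example rule, where the value stored in the table entry becomes the pipe number, can be modelled in a few lines. This is only a sketch: `lookup` and `pipe_for` are hypothetical helpers, and a linear longest-prefix search stands in for the kernel radix lookup.

```python
import ipaddress


def lookup(table, addr):
    """Longest-prefix match of addr in {network: value}; None when nothing matches."""
    ip = ipaddress.ip_address(addr)
    best = None
    for net, value in table.items():
        if ip in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, value)
    return None if best is None else best[1]


def pipe_for(table, src):
    """Model of 'pipe tablearg ip from table(1) to any': the table value picks the pipe."""
    return lookup(table, src)
```

One rule thus replaces a separate "pipe N" rule per address, since the per-address argument lives in the table.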
|
153164 |
06-Dec-2005 |
glebius |
When we drop packet due to no space in output interface output queue, also increase the ifp->if_snd.ifq_drops.
PR: 72440 Submitted by: ikob
|
153163 |
06-Dec-2005 |
glebius |
Optimize parallel processing of ipfw(4) rulesets eliminating the locking of the radix lookup tables. Since several rnh_lookup() can run in parallel on the same table, we can piggyback on the shared locking provided by ipfw(4). However, the single entry cache in the ip_fw_table can't be used lockless, so it is removed. This pessimizes two cases: processing of bursts of similar packets and matching one packet against the same table several times during one ipfw_chk() lookup. To optimize the processing of similar packet bursts administrator should use stateful firewall. To optimize the second problem a solution will be provided soon.
Details:
o Since we piggyback on the ipfw(4) locking, and the latter is per-chain, the tables are moved from the global declaration to the struct ip_fw_chain.
o The struct ip_fw_table is shrunk to one entry and thus vanished.
o All table manipulating functions are extended to accept the struct ip_fw_chain * argument.
o All table modifying functions use IPFW_WLOCK_ASSERT().
|
153072 |
04-Dec-2005 |
ru |
Fix -Wundef.
|
152928 |
29-Nov-2005 |
ume |
obey opt_inet6.h and opt_ipsec.h in kernel build directory.
Requested by: hrs
|
152917 |
29-Nov-2005 |
glebius |
Garbage-collect now unused struct _ipfw_insn_pipe and flush_pipe_ptrs(), thus removing a few XXXes. Document the ABI breakage in UPDATING.
|
152910 |
29-Nov-2005 |
glebius |
First step in removing welding between ipfw(4) and dummynet.
o Do not use ipfw_insn_pipe->pipe_ptr in locate_flowset(). The _ipfw_insn_pipe isn't touched by this commit to preserve ABI compatibility. o To optimize the lookup of the pipe/flowset in locate_flowset() introduce hashes for pipes and queues: - To preserve ABI compatibility utilize the place of global list pointer for SLIST_ENTRY. - Introduce locate_flowset(queue nr) and locate_pipe(pipe nr). o Rework all the dummynet code to deal with the hashes, not global lists. Also did some style(9) changes in the code blocks that were touched by this sweep: - Be conservative about flowset and pipe variable names on stack, use "fs" and "pipe" everywhere. - Cleanup whitespaces. - Sort variables. - Give variables more meaningful names. - Uppercase and dots in comments. - ENOMEM when malloc(9) failed.
|
152767 |
24-Nov-2005 |
ru |
Fix prototype.
|
152655 |
21-Nov-2005 |
ps |
Fix for a bug that causes SACK scoreboard corruption when the limit on holes per connection is reached.
Reported by: Patrik Roos Submitted by: Mohan Srinivasan Reviewed by: Raja Mukerji, Noritoshi Demizu
|
152612 |
19-Nov-2005 |
andre |
Remove 'ipprintfs' which were protected under DIAGNOSTIC. It doesn't have any knob to enable it from userland and could only be enabled by either setting it to 1 at compile time or through the kernel debugger.
In the future it may be brought back as KTR tracing points.
Discussed with: rwatson Sponsored by: TCP/IP Optimization Fundraise 2005
|
152608 |
19-Nov-2005 |
andre |
Move MAX_IPOPTLEN and struct ipoption back into ip_var.h as userland programs depend on it.
Pointed out by: le Sponsored by: TCP/IP Optimization Fundraise 2005
|
152592 |
18-Nov-2005 |
andre |
Consolidate all IP Options handling functions into ip_options.[ch] and include ip_options.h into all files making use of IP Options functions.
From ip_input.c rev 1.306: ip_dooptions(struct mbuf *m, int pass) save_rte(m, option, dst) ip_srcroute(m0) ip_stripoptions(m, mopt)
From ip_output.c rev 1.249: ip_insertoptions(m, opt, phlen) ip_optcopy(ip, jp) ip_pcbopts(struct inpcb *inp, int optname, struct mbuf *m)
No functional changes in this commit.
Discussed with: rwatson Sponsored by: TCP/IP Optimization Fundraise 2005
|
152583 |
18-Nov-2005 |
andre |
Purge layer specific mbuf flags on layer crossings to avoid confusing upper or lower layers.
Sponsored by: TCP/IP Optimization Fundraise 2005
|
152582 |
18-Nov-2005 |
andre |
Rework icmp_error() to deal with truncated IP packets from ip_forward() when doing extended quoting in error messages.
Sponsored by: TCP/IP Optimization Fundraise 2005
|
152581 |
18-Nov-2005 |
andre |
In ip_forward() copy as much into the temporary error mbuf as we have free space in it. Allocate correct mbuf from the beginning. This allows icmp_error() to quote the entire TCP header in error messages.
Sponsored by: TCP/IP Optimization Fundraise 2005
|
152550 |
17-Nov-2005 |
glebius |
MFOpenBSD 1.62:
Prevent backup CARP hosts from replying to arp requests, fixes strangeness with some layer-3 switches. From Bill Marquette.
Tested by: Kazuaki Oda <kaakun highway.ne.jp>
|
152410 |
14-Nov-2005 |
ru |
Unbreak for !INET6 case.
|
152315 |
11-Nov-2005 |
ru |
- Store pointer to the link-level address right in "struct ifnet" rather than in ifindex_table[]; all (except one) accesses are through ifp anyway. IF_LLADDR() works faster, and all (except one) ifaddr_byindex() users were converted to use ifp->if_addr.
- Stop storing a (pointer to) Ethernet address in "struct arpcom", and drop the IFP2ENADDR() macro; all users have been converted to use IF_LLADDR() instead.
|
152288 |
10-Nov-2005 |
suz |
Fix a bug where uRPF does not work properly for an IPv6 packet bound for the sending machine itself (this bug was introduced by a change in ip6_input.c rev. 1.83).
Pointed out by: Sean McNeil and J.R.Oldroyd MFC after: 3 days
|
152242 |
09-Nov-2005 |
ru |
Use sparse initializers for "struct domain" and "struct protosw", so they are easier to follow for the human being.
|
152209 |
08-Nov-2005 |
thompsa |
Move the cloned interface list management in to if_clone. For some drivers the softc lists and associated mutex are now unused so these have been removed.
Calling if_clone_detach() will now destroy all the cloned interfaces for the driver and in most cases is all that's needed to unload.
Idea by: brooks Reviewed by: brooks
|
152188 |
08-Nov-2005 |
glebius |
Rework ARP retransmission algorithm so that ARP requests are retransmitted without suppression, while there is demand for such ARP entry. As before, retransmission is rate limited to one packet per second. Details:
- Remove net.link.ether.inet.host_down_time
- Do not set/clear RTF_REJECT flag on route, to avoid rt_check() returning error. We will generate error ourselves.
- Return EWOULDBLOCK on first arp_maxtries failed requests, and return EHOSTDOWN/EHOSTUNREACH on further requests.
- Retransmit ARP request always, independently from return code. Ratelimit to 1 pps.
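The retransmission policy described above can be sketched as a small state machine. This is a hedged model, not the kernel code: `ArpEntry`, its fields and the default of 5 tries are assumptions for illustration, and only EHOSTDOWN is modelled for the "further requests" case.

```python
import errno


class ArpEntry:
    def __init__(self, maxtries=5):
        self.maxtries = maxtries   # stand-in for arp_maxtries; 5 is an assumed value
        self.asked = 0             # requests sent so far
        self.last_sent = None      # time of the last request, in seconds

    def resolve_unresolved(self, now):
        """Called when a packet needs this still unresolved entry.

        Returns (errno, sent), where sent says whether a request went out.
        """
        sent = False
        if self.last_sent is None or now - self.last_sent >= 1:
            self.last_sent = now   # rate limit: at most one request per second
            self.asked += 1
            sent = True
        # The request is retransmitted regardless of which error is returned.
        err = errno.EWOULDBLOCK if self.asked <= self.maxtries else errno.EHOSTDOWN
        return err, sent
```

Note that the error code only changes what the caller is told; demand for the entry keeps requests flowing at the limited rate.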
|
151967 |
02-Nov-2005 |
andre |
Retire MT_HEADER mbuf type and change its users to use MT_DATA.
Having an additional MT_HEADER mbuf type is superfluous and redundant as nothing depends on it. It only adds a layer of confusion. The distinction between header mbuf's and data mbuf's is solely done through the m->m_flags M_PKTHDR flag.
Non-native code is not changed in this commit. For compatibility MT_HEADER is mapped to MT_DATA.
Sponsored by: TCP/IP Optimization Fundraise 2005
|
151897 |
31-Oct-2005 |
rwatson |
Normalize a significant number of kernel malloc type names:
- Prefer '_' to ' ', as it results in more easily parsed results in memory monitoring tools such as vmstat.
- Remove punctuation that is incompatible with using memory type names as file names, such as '/' characters.
- Disambiguate some collisions by adding subsystem prefixes to some memory types.
- Generally prefer lower case to upper case.
- If the same type is defined in multiple architecture directories, attempt to use the same name in additional cases.
Not all instances were caught in this change, so more work is required to finish this conversion. Similar changes are required for UMA zone names.
|
151888 |
30-Oct-2005 |
rwatson |
Push the assignment of a new or updated so_qlimit from solisten() following the protocol pru_listen() call to solisten_proto(), so that it occurs under the socket lock acquisition that also sets SO_ACCEPTCONN. This requires passing the new backlog parameter to the protocol, which also allows the protocol to be aware of changes in queue limit should it wish to do something about the new queue limit. This continues a move towards the socket layer acting as a library for the protocol.
Bump __FreeBSD_version due to a change in the in-kernel protocol interface. This change has been tested with IPv4 and UNIX domain sockets, but not other protocols.
|
151824 |
28-Oct-2005 |
glebius |
First fill in structure with valid values, and only then attach it to the global list.
Reviewed by: rwatson
|
151688 |
26-Oct-2005 |
yar |
Since carp(4) interfaces presently are kinda fake yet possess IP addresses, mark them with LOOPBACK so that routing daemons take them easy for link-state routing protocols.
Reviewed by: glebius
|
151556 |
22-Oct-2005 |
mlaier |
Fix build after in6_joingroup change. It remains unclear if DAD breaks CARP or not.
|
151555 |
22-Oct-2005 |
glebius |
In in_addprefix() compare not only route addresses, but their masks, too. This fixes problem when connected prefixes overlap.
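Why the mask matters can be shown with two overlapping prefixes. The sketch below uses Python's ipaddress module purely for illustration; both helper names are hypothetical and neither mirrors the in_addprefix() signature.

```python
import ipaddress


def same_prefix(a, b):
    """True only when both the network address and the mask match."""
    na, nb = ipaddress.ip_network(a), ipaddress.ip_network(b)
    return na.network_address == nb.network_address and na.netmask == nb.netmask


def same_prefix_addr_only(a, b):
    """The weaker comparison: masks are ignored."""
    na, nb = ipaddress.ip_network(a), ipaddress.ip_network(b)
    return na.network_address == nb.network_address
```

With the address-only test, 10.0.0.0/8 and 10.0.0.0/24 are treated as the same prefix, although they are distinct connected routes.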
Obtained from: OpenBSD (rev. 1.40 by claudio); [ I came to this fix myself, and then found out that OpenBSD had already fixed it the same way.]
|
151539 |
21-Oct-2005 |
suz |
sync with KAME regarding NDP
- introduced fine-grain-timer to manage ND-caches and IPv6 Multicast-Listeners
- supports Router-Preference <draft-ietf-ipv6-router-selection-07.txt>
- better prefix lifetime management
- more spec-conformant DAD advertisement
- updated RFC/internet-draft revisions
Obtained from: KAME Reviewed by: ume, gnn MFC after: 2 month
|
151464 |
19-Oct-2005 |
rwatson |
Convert if (tp->t_state == TCPS_LISTEN) panic() into a KASSERT.
MFC after: 2 weeks
|
151266 |
12-Oct-2005 |
thompsa |
Change the reference counting to count the number of cloned interfaces for each cloner. This ensures that ifc->ifc_units is not prematurely freed in if_clone_detach() before the clones are destroyed, resulting in memory modified after free. This could be triggered with if_vlan.
Assert that all cloners have been destroyed when freeing the memory.
Change all simple cloners to destroy their clones with ifc_simple_destroy() on module unload so the reference count is properly updated. This also cleans up the interface destroy routines and allows future optimisation.
Discussed with: brooks, pjd, -current Reviewed by: brooks
|
151263 |
12-Oct-2005 |
maxim |
o INP_ONESBCAST is inpcb.inp_vflag flag not inp_flags. The confusion with IP_PORTRANGE_HIGH leads to the incorrect checksum calculation.
PR: kern/87306 Submitted by: Rickard Lind Reviewed by: bms MFC after: 2 weeks
|
151254 |
12-Oct-2005 |
philip |
Unbreak the net.inet6.tcp6.getcred sysctl.
This makes inetd/auth work again in IPv6 setups.
Pointy hat to: ume/KAME
|
150942 |
04-Oct-2005 |
thompsa |
When bridging is enabled and an ARP request is received on a member interface, the arp code will search all local interfaces for a match. This triggers a kernel log if the bridge has been assigned an address.
arp: ac:de:48:18:83:3d is using my IP address 192.168.0.142!
bridge0: flags=8041<UP,RUNNING,MULTICAST> mtu 1500 inet 192.168.0.142 netmask 0xffffff00 ether ac:de:48:18:83:3d
Silence this warning for 6.0 to stop unnecessary bug reports, the code will need to be reworked.
Approved by: mlaier (mentor) MFC after: 3 days
|
150941 |
04-Oct-2005 |
andre |
Correct brainfart in SO_BINTIME test.
Pointed out by: nate Pointy hat to: andre
|
150940 |
04-Oct-2005 |
andre |
Make SO_BINTIME timestamps available on raw_ip sockets.
Sponsored by: TCP/IP Optimization Fundraise 2005
|
150853 |
03-Oct-2005 |
rwatson |
Unlock Giant symmetrically with respect to lock acquire order as that's generally nicer.
Spotted by: johan MFC after: 1 week
|
150852 |
03-Oct-2005 |
rwatson |
Acquire Giant conditionally in in_addmulti() and in_delmulti() based on whether the interface being accessed is IFF_NEEDSGIANT or not. This avoids lock order reversals when calling into the interface ioctl handler, which could potentially lead to deadlock.
The long term solution is to eliminate non-MPSAFE network drivers.
Discussed with: jhb MFC after: 1 week
|
150804 |
02-Oct-2005 |
maxim |
o Teach sysctl_drop() how to deal with the sockets in TIME_WAIT state. This is a special case because tcp_twstart() destroys a tcp control block via tcp_discardcb() so we cannot call tcp_drop(struct *tcpcb) on such connections. Use tcp_twclose() instead.
MFC after: 5 days
|
150636 |
27-Sep-2005 |
mlaier |
Remove bridge(4) from the tree. if_bridge(4) is a fully functional replacement and has additional features which make it superior.
Discussed on: -arch Reviewed by: thompsa X-MFC-after: never (RELENG_6 as transition period)
|
150594 |
26-Sep-2005 |
andre |
Implement IP_DONTFRAG IP socket option enabling the Don't Fragment flag on IP packets. Currently this option is only respected on udp and raw ip sockets. On tcp sockets the DF flag is controlled by the path MTU discovery option.
Sending a packet larger than the MTU size of the egress interface returns an EMSGSIZE error.
Discussed with: rwatson Sponsored by: TCP/IP Optimization Fundraise 2005
|
150351 |
19-Sep-2005 |
andre |
Use monotonic 'time_uptime' instead of 'time_second' as timebase for rt->rt_rmx.rmx_expire.
|
150350 |
19-Sep-2005 |
andre |
Use monotonic 'time_uptime' instead of 'time_second' as timebase for timeouts.
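The reason for a monotonic timebase is that a deadline computed from wall-clock time moves when the clock is stepped. A minimal user-space analogue, assuming Python's time.monotonic() as the counterpart of 'time_uptime' (the helper names are hypothetical):

```python
import time


def make_deadline(timeout, clock=time.monotonic):
    """Compute an absolute deadline on a clock that never jumps backwards."""
    return clock() + timeout


def expired(deadline, clock=time.monotonic):
    """True once the monotonic clock has reached the deadline."""
    return clock() >= deadline
```

Passing the clock as a parameter keeps the helpers testable with a fake clock.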
|
150296 |
18-Sep-2005 |
rwatson |
Take a first cut at cleaning up ifnet removal and multicast socket panics, which occur when stale ifnet pointers are left in struct moptions hung off of inpcbs:
- Add in_ifdetach(), which matches in6_ifdetach(), and allows the protocol to perform early tear-down on the interface early in if_detach().
- Annotate that if_detach() needs careful consideration.
- Remove calls to in_pcbpurgeif0() in the handling of SIOCDIFADDR -- this is not the place to detect interface removal! This also removes what is basically a nasty (and now unnecessary) hack.
- Invoke in_pcbpurgeif0() from in_ifdetach(), in both raw and UDP IPv4 sockets.
It is now possible to run the msocket_ifnet_remove regression test using HEAD without panicking.
MFC after: 3 days
|
150131 |
14-Sep-2005 |
andre |
Do not ignore all other TCP options (eg. timestamp, window scaling) when responding to TCP SYN packets with TCP_MD5 enabled and set.
PR: kern/82963 Submitted by: <demizu at dd.iij4u.or.jp> MFC after: 3 days
|
150122 |
14-Sep-2005 |
bz |
Fix panic when kernel compiled without INET6 by rejecting IPv6 opcodes which are behind #if(n)def INET6 now.
PR: kern/85826 MFC after: 3 days
|
149929 |
10-Sep-2005 |
andre |
In tcp_ctlinput() do not swap ip->ip_len a second time. It has been done in icmp_input() already.
This fixes the ICMP_UNREACH_NEEDFRAG case where no MTU was proposed in the ICMP reply.
PR: kern/81813 Submitted by: Vitezslav Novy <vita at fio.cz> MFC after: 3 days
|
149909 |
09-Sep-2005 |
glebius |
- Do not hold route entry lock, when calling arprequest(). One such call was introduced by me in 1.139, the other one was present before. - Do all manipulations with rtentry and la before dropping the lock. - Copy interface address from route into local variable before dropping the lock. Supply this copy as argument to arprequest()
LORs fixed: http://sources.zabbadoz.net/freebsd/lor/003.html http://sources.zabbadoz.net/freebsd/lor/037.html http://sources.zabbadoz.net/freebsd/lor/061.html http://sources.zabbadoz.net/freebsd/lor/062.html http://sources.zabbadoz.net/freebsd/lor/064.html http://sources.zabbadoz.net/freebsd/lor/068.html http://sources.zabbadoz.net/freebsd/lor/071.html http://sources.zabbadoz.net/freebsd/lor/074.html http://sources.zabbadoz.net/freebsd/lor/077.html http://sources.zabbadoz.net/freebsd/lor/093.html http://sources.zabbadoz.net/freebsd/lor/135.html http://sources.zabbadoz.net/freebsd/lor/140.html http://sources.zabbadoz.net/freebsd/lor/142.html http://sources.zabbadoz.net/freebsd/lor/145.html http://sources.zabbadoz.net/freebsd/lor/152.html http://sources.zabbadoz.net/freebsd/lor/158.html
|
149907 |
09-Sep-2005 |
glebius |
When a carp(4) interface is being destroyed and is in a promiscuous mode, first interface is detached from parent and then bpfdetach() is called. If the interface was the last carp(4) interface attached to parent, then the mutex on parent is destroyed. When bpfdetach() calls if_setflags() we panic on destroyed mutex.
To prevent the above scenario, clear pointer to parent, when we detach ourselves from parent.
|
149783 |
04-Sep-2005 |
sam |
clear lock on error in O_LIMIT case of install_state
Submitted by: Ted Unangst MFC after: 3 days
|
149635 |
30-Aug-2005 |
andre |
Use the correct mbuf type for MGET().
|
149506 |
26-Aug-2005 |
glebius |
Add newline to debugging printf.
PR: kern/85271 Submitted by: Simon Morgan
|
149455 |
25-Aug-2005 |
glebius |
- Refuse hashsize of 0, since it is invalid. - Use defined constant instead of 512.
|
149451 |
25-Aug-2005 |
glebius |
When we have a published ARP entry for some IP address, reply to ARP requests only on the network to which this IP address belongs.
Before this change we replied on all interfaces. This could lead to an IP address conflict with the host we are doing ARP proxy for.
PR: kern/75634 Reviewed by: andre
|
149404 |
24-Aug-2005 |
ps |
Remove a KASSERT in the sack path that fails because of an interaction between sack and a bug in the "bad retransmit recovery" logic. This is a workaround, the underlying bug will be fixed later.
Submitted by: Mohan Srinivasan, Noritoshi Demizu
|
149403 |
24-Aug-2005 |
ps |
Fix up the comment for MAX_SACK_BLKS.
Submitted by: Noritoshi Demizu
|
149391 |
23-Aug-2005 |
andre |
Remove unnecessary IPSEC includes.
MFC after: 2 weeks Sponsored by: TCP/IP Optimization Fundraise 2005
|
149378 |
22-Aug-2005 |
andre |
o Fix a logic error when not doing mbuf cluster allocation. o Change an old panic() to a clean function exit.
MFC after: 2 weeks Sponsored by: TCP/IP Optimization Fundraise 2005
|
149371 |
22-Aug-2005 |
andre |
Add socketoption IP_MINTTL. May be used to set the minimum acceptable TTL a packet must have when received on a socket. All packets with a lower TTL are silently dropped. Works on already connected/connecting and listening sockets for RAW/UDP/TCP.
This option is only really useful when set to 255 preventing packets from outside the directly connected networks reaching local listeners on sockets.
Allows userland implementation of 'The Generalized TTL Security Mechanism (GTSM)' according to RFC3682. Examples of such use include the Cisco IOS BGP implementation command "neighbor ttl-security".
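Why 255 is the only really useful value can be seen from a small model of the check. This is a sketch of the filtering rule only, with hypothetical function names; it does not call the socket API.

```python
def ttl_after_hops(initial_ttl, hops):
    """TTL seen by the receiver after `hops` routers each decrement it by one."""
    return initial_ttl - hops


def accept_packet(ttl, min_ttl=0):
    """Drop (return False) when the received TTL is below the socket's minimum."""
    return ttl >= min_ttl
```

A peer on a directly connected network that sends with TTL 255 arrives with 255 and passes a minimum of 255. A packet that crossed even one router arrives with at most 254 and is dropped, so it cannot be forged from further away.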
MFC after: 2 weeks Sponsored by: TCP/IP Optimization Fundraise 2005
|
149370 |
22-Aug-2005 |
andre |
Always quote the entire TCP header when responding and allocate an mbuf cluster if needed.
Fixes the TCP issues raised in I-D draft-gont-icmp-payload-00.txt.
This aids in-the-wild debugging a lot and allows the receiver to do more elaborate checks on the validity of the response.
MFC after: 2 weeks Sponsored by: TCP/IP Optimization Fundraise 2005
|
149369 |
22-Aug-2005 |
andre |
Handle pure layer 2 broad- and multicasts properly and simplify related checks.
PR: kern/85052 Submitted by: Dmitrij Tejblum <tejblum at yandex-team.ru> MFC after: 3 days
|
149350 |
21-Aug-2005 |
andre |
Commit correct version of the change and note the name of the new sysctl: it is named net.inet.icmp.quotelen and defaults to 8 bytes.
Pointy hat to: andre
|
149349 |
21-Aug-2005 |
andre |
Add a sysctl to change the length of the quotation of the original packet in an ICMP reply. The minimum of 8 bytes is internally enforced. The maximum quotation is the remaining space in the reply mbuf.
This option is added in response to the issues raised in I-D draft-gont-icmp-payload-00.txt.
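The bounds described above amount to a clamp. The sketch below is an assumption-laden model: `quote_length` is a hypothetical helper, and the extra cap at the length of the original payload is my own addition (one cannot quote more than exists), not something stated in the commit message.

```python
ICMP_MIN_QUOTE = 8   # bytes; the minimum named in the commit message


def quote_length(requested, space_left, payload_len):
    """Clamp the configured quote length to [8, space left in the reply buffer]."""
    n = max(requested, ICMP_MIN_QUOTE)   # enforce the 8-byte minimum
    n = min(n, space_left)               # never exceed the remaining buffer space
    return min(n, payload_len)           # assumption: cannot quote more than exists
```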
MFC after: 2 weeks Sponsored by: TCP/IP Optimizations Fundraise 2005
|
149347 |
21-Aug-2005 |
andre |
Add an option to have ICMP replies to non-local packets generated with the IP address of the interface the packet came in through. This is useful for routers to show in traceroutes the actual path a packet has taken instead of the possibly different return path.
The new sysctl is named net.inet.icmp.reply_from_interface and defaults to off.
MFC after: 2 weeks
|
149221 |
18-Aug-2005 |
glebius |
In order to support CARP interfaces kernel was taught to handle more than one interface in one subnet. However, some userland apps rely on the belief that this configuration is impossible.
Add a sysctl switch net.inet.ip.same_prefix_carp_only. If the switch is on, then kernel will refuse to add an additional interface to already connected subnet unless the interface is CARP. Default value is off.
PR: bin/82306 In collaboration with: mlaier
|
149052 |
14-Aug-2005 |
bz |
Fix broken build of rev. 1.108 in case of no INET6 and IPFIREWALL compiled into kernel.
Spotted and tested by: Michal Mertl <mime at traveller.cz>
|
149020 |
13-Aug-2005 |
bz |
* Add dynamic sysctl for net.inet6.ip6.fw. * Correct handling of IPv6 Extension Headers. * Add unreach6 code. * Add logging for IPv6.
Submitted by: sysctl handling derived from patch from ume needed for ip6fw Obtained from: is_icmp6_query and send_reject6 derived from similar functions of netinet6,ip6fw Reviewed by: ume, gnn; silence on ipfw@ Test setup provided by: CK Software GmbH MFC after: 6 days
|
148980 |
12-Aug-2005 |
rodrigc |
Add NATM_LOCK() and NATM_UNLOCK() in places where npcb_add() and npcb_free() are called, in order to eliminate witness panics. This was overlooked in removal of GIANT from ATM.
Reviewed by: rwatson
|
148955 |
11-Aug-2005 |
glebius |
o Fix a race between three threads: output path, incoming ARP packet and route request adding/removing ARP entries. The root of the problem is that struct llinfo_arp was accessed without any locks. To close race we will use locking provided by rtentry, that references this llinfo_arp:
  - Make arplookup() return a locked rtentry.
  - In arpresolve() hold the lock provided by rt_check()/arplookup() until the end of function, covering all accesses to the rtentry itself and llinfo_arp it refers to.
  - In in_arpinput() do not drop lock provided by arplookup() during first part of the function.
  - Simplify logic in the first part of in_arpinput(), removing one level of indentation.
  - In the second part of in_arpinput() hold rtentry lock while copying address.
o Fix a condition when route entry is destroyed, while another thread is contested on its lock:
  - When storing a pointer to rtentry in llinfo_arp list, always add a reference to this rtentry, to prevent rtentry being destroyed via RTM_DELETE request.
  - Remove this reference when removing entry from llinfo_arp list.
o Further cleanup of arptimer():
  - Inline arptfree() into arptimer().
  - Use official queue(3) way to pass LIST.
  - Hold rtentry lock while reading its structure.
  - Do not check that sdl_family is AF_LINK, but assert this.
Reviewed by: sam Stress test: http://www.holm.cc/stress/log/cons141.html Stress test: http://people.freebsd.org/~pho/stress/log/cons144.html
|
148920 |
10-Aug-2005 |
obrien |
Remove public declarations of variables that were forgotten when they were made static.
|
148918 |
10-Aug-2005 |
obrien |
Match IPv6 and use a static struct pr_usrreqs nousrreqs.
|
148903 |
09-Aug-2005 |
rwatson |
Add helper function ip_findmoptions(), which accepts an inpcb, and attempts to atomically return either an existing set of IP multicast options for the PCB, or a newly allocated set with default values. The inpcb is returned locked. This function may sleep.
Call ip_moptions() to acquire a reference to a PCB's socket options, and perform the update of the options while holding the PCB lock. Release the lock before returning.
Remove garbage collection of multicast options when values return to the default, as this complicates locking substantially. Most applications allocate a socket either to be multicast, or not, and don't tend to keep around sockets that have previously been used for multicast, then used for unicast.
This closes a number of race conditions involving multiple threads or processes modifying the IP multicast state of a socket simultaneously.
MFC after: 7 days
|
148887 |
09-Aug-2005 |
rwatson |
Propagate rename of IFF_OACTIVE and IFF_RUNNING to IFF_DRV_OACTIVE and IFF_DRV_RUNNING, as well as the move from ifnet.if_flags to ifnet.if_drv_flags. Device drivers are now responsible for synchronizing access to these flags, as they are in if_drv_flags. This helps prevent races between the network stack and device driver in maintaining the interface flags field.
Many __FreeBSD__ and __FreeBSD_version checks maintained and continued; some less so.
Reviewed by: pjd, bz MFC after: 7 days
|
148883 |
09-Aug-2005 |
glebius |
In preparation for fixing races in ARP (and probably in other L2/L3 mappings) make rt_check() return a locked rtentry.
|
148682 |
03-Aug-2005 |
rwatson |
Introduce in_multi_mtx, which will protect IPv4-layer multicast address lists, as well as accessor macros. For now, this is a recursive mutex due to code sequences where IPv4 multicast calls into IGMP calls into ip_output(), which then tests for a multicast forwarding case.
For support macros in in_var.h to check multicast address lists, assert that in_multi_mtx is held.
Acquire in_multi_mtx around iteration over the IPv4 multicast address lists, such as in ip_input() and ip_output().
Acquire in_multi_mtx when manipulating the IPv4 layer multicast addresses, as well as over the manipulation of ifnet multicast address lists in order to keep the two layers in sync.
Lock down accesses to IPv4 multicast addresses in IGMP, or assert the lock when performing IGMP join/leave events.
Eliminate spl's associated with IPv4 multicast addresses, portions of IGMP that weren't previously expunged by IGMP locking.
Add in_multi_mtx, igmp_mtx, and if_addr_mtx lock order to hard-coded lock order in WITNESS, in that order.
Problem reported by: Ed Maste <emaste at phaedrus dot sandvine dot ca> MFC after: 10 days
|
148653 |
02-Aug-2005 |
rwatson |
Modify network protocol consumers of the ifnet multicast address lists to lock if_addr_mtx.
Problem reported by: Ed Maste <emaste at phaedrus dot sandvine dot ca> MFC after: 1 week
|
148616 |
01-Aug-2005 |
ume |
recover the line which wrongly disappeared during scope cleanup. tcpdrop(8) should work for IPv6 again.
|
148613 |
01-Aug-2005 |
bz |
Add support for IPv6 over GRE [1]. PR kern/80340 includes the FreeBSD specific ip_newid() changes NetBSD does not have. Correct handling of non AF_INET packets passed to bpf [2].
PR: kern/80340[1], NetBSD PRs 29150[1], 30844[2] Obtained from: NetBSD ip_gre.c rev. 1.34,1.35, if_gre.c rev. 1.56 Submitted by: Gert Doering <gert at greenie.muc.de>[2] MFC after: 4 days
|
148414 |
26-Jul-2005 |
ume |
include scope6_var.h for in6_clearscope().
|
148387 |
25-Jul-2005 |
ume |
include netinet6/scope6_var.h.
|
148385 |
25-Jul-2005 |
ume |
scope cleanup. with this change
- most of the kernel code will not care about the actual encoding of scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current *BSD code allows a packet with src=::1 and dst=(some global IPv6 address) to be sent outside of the node, if the application does:
    s = socket(AF_INET6);
    bind(s, "::1");
    sendto(s, some_global_IPv6_addr);
  This is clearly wrong, since ::1 is only meaningful within a single node, but the current implementation of the *BSD kernel cannot reject this attempt.
Submitted by: JINMEI Tatuya <jinmei__at__isl.rdc.toshiba.co.jp> Obtained from: KAME
|
148324 |
23-Jul-2005 |
keramida |
Misc spelling and/or English fixes in comments.
Reviewed by: glebius, andre
|
148176 |
20-Jul-2005 |
ume |
move RFC3542 related definitions into ip6.h.
Submitted by: Keiichi SHIMA <keiichi__at__iijlab.net> Reviewed by: mlaier Obtained from: KAME
|
148171 |
20-Jul-2005 |
ume |
add missing RFC3542 definition.
Submitted by: Keiichi SHIMA <keiichi__at__iijlab.net> Obtained from: KAME
|
148169 |
20-Jul-2005 |
ume |
update comments: - RFC2292bis -> RFC3542 - typo fixes
Submitted by: Keiichi SHIMA <keiichi__at__iijlab.net> Obtained from: KAME
|
148157 |
19-Jul-2005 |
rwatson |
Remove no-op spl references in in_pcb.c, since in_pcb locking has been basically complete for several years now. Update one spl comment to reference the locking strategy.
MFC after: 3 days
|
148156 |
19-Jul-2005 |
rwatson |
Remove no-op spl's and most comment references to spls, as TCP locking is believed to be basically done (modulo any remaining bugs).
MFC after: 3 days
|
148155 |
19-Jul-2005 |
rwatson |
Remove spl() calls from ip_slowtimo(), as IP fragment queue locking was merged several years ago.
Submitted by: gnn MFC after: 1 day
|
148015 |
14-Jul-2005 |
mlaier |
Export pfsyncstats via sysctl "net.inet.pfsync" in order to print them with netstat (separate commit).
Requested by: glebius MFC after: 1 week
|
147785 |
05-Jul-2005 |
rwatson |
Eliminate MAC entry point mac_create_mbuf_from_mbuf(), which is redundant with respect to existing mbuf copy label routines. Expose a new mac_copy_mbuf() routine at the top end of the Framework and use that; use the existing mpo_copy_mbuf_label() routine on the bottom end.
Obtained from: TrustedBSD Project Sponsored by: SPARTA, SPAWAR Approved by: re (scottl)
|
147781 |
05-Jul-2005 |
ps |
Fix for a bug in newreno partial ack handling where if a large amount of data is partial acked, snd_cwnd underflows, causing a burst.
Found, Submitted by: Noritoshi Demizu Approved by: re
|
147758 |
03-Jul-2005 |
mlaier |
Remove ambiguity from hlen. IPv4 is now indicated by is_ipv4 and we need a proper hlen value for IPv6 to implement O_REJECT and O_LOG.
Reviewed by: glebius, brooks, gnn Approved by: re (scottl)
|
147744 |
02-Jul-2005 |
thompsa |
Check the alignment of the IP header before passing the packet up to the packet filter. This would cause a panic on architectures that require strict alignment such as sparc64 (tier1) and ia64/ppc (tier2).
This adds two new macros that check the alignment. They are compile-time dependent on __NO_STRICT_ALIGNMENT, which is set for i386 and amd64, where alignment isn't needed, so the cost is avoided there.
IP_HDR_ALIGNED_P() IP6_HDR_ALIGNED_P()
Move bridge_ip_checkbasic()/bridge_ip6_checkbasic() up so that the alignment is checked for ipfw and dummynet too.
PR: ia64/81284 Obtained from: NetBSD Approved by: re (dwhite), mlaier (mentor)
|
147735 |
01-Jul-2005 |
ps |
Fix for a bug in the change that defers sack option processing until after PAWS checks. The symptom of this is an inconsistency in the cached sack state, caused by the fact that the sack scoreboard was not being updated for an ACK handled in the header prediction path.
Found by: Andrey Chernov. Submitted by: Noritoshi Demizu, Raja Mukerji. Approved by: re
|
147734 |
01-Jul-2005 |
ps |
Fix for a SACK crash caused by a bug in tcp_reass(). tcp_reass() does not clear tlen and frees the mbuf (leaving th pointing at freed memory), if the data segment is a complete duplicate. This change works around that bug. A fix for the tcp_reass() bug will appear later (that bug is benign for now, as neither th nor tlen is referenced in tcp_input() after the call to tcp_reass()).
Found by: Pawel Jakub Dawidek. Submitted by: Raja Mukerji, Noritoshi Demizu. Approved by: re
|
147718 |
01-Jul-2005 |
glebius |
When doing ARP load balancing, the source IP is taken in network byte order, so the residue of division is the same for all hosts on the net, and thus only one VHID answers. Convert the source IP to host byte order.
Reviewed by: mlaier Approved by: re (scottl)
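The byte-order effect can be reproduced in a few lines. This is a hedged illustration only: it assumes a little-endian machine, a plain modulo over two balancing hosts, and a hypothetical helper `vhid_index`; the actual CARP hashing code is not shown in this log.

```python
import socket


def vhid_index(ip: str, n_hosts: int, byteorder: str) -> int:
    """Pick a balancing slot from the address bytes read with the given
    byte order ("little" mimics using a network-order value unconverted
    on a little-endian CPU, "big" mimics the converted value)."""
    raw = socket.inet_aton(ip)  # 4 bytes in network (big-endian) order
    return int.from_bytes(raw, byteorder) % n_hosts


hosts = [f"192.168.1.{i}" for i in range(1, 255)]

# Unconverted: the low-order byte is the constant 192, so every host on
# the subnet lands in the same slot.
unconverted_slots = {vhid_index(h, 2, "little") for h in hosts}

# Converted: the low-order byte is the host part, so slots are spread.
converted_slots = {vhid_index(h, 2, "big") for h in hosts}
```

With the unconverted value the varying host byte ends up in the most significant position, which a small divisor such as 2 never sees.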
|
147666 |
29-Jun-2005 |
simon |
Fix ipfw packet matching errors with address tables.
The ipfw tables lookup code caches the result of the last query. The kernel may process multiple packets concurrently, performing several concurrent table lookups. Due to an insufficient locking, a cached result can become corrupted that could cause some addresses to be incorrectly matched against a lookup table.
Submitted by: ru Reviewed by: csjp, mlaier Security: CAN-2005-2019 Security: FreeBSD-SA-05:13.ipfw
Correct bzip2 permission race condition vulnerability.
Obtained from: Steve Grubb via RedHat Security: CAN-2005-0953 Security: FreeBSD-SA-05:14.bzip2 Approved by: obrien
Correct TCP connection stall denial of service vulnerability.
A TCP packet with the SYN flag set is accepted for established connections, allowing an attacker to overwrite certain TCP options.
Submitted by: Noritoshi Demizu Reviewed by: andre, Mohan Srinivasan Security: CAN-2005-2068 Security: FreeBSD-SA-05:15.tcp
Approved by: re (security blanket), cperciva
|
147637 |
27-Jun-2005 |
ps |
- Postpone SACK option processing until after PAWS checks. SACK option processing is now done in the ACK processing case. - Merge tcp_sack_option() and tcp_del_sackholes() into a new function called tcp_sack_doack(). - Test (SEG.ACK < SND.MAX) before processing the ACK.
Submitted by: Noritoshi Demizu Reviewed by: Mohan Srinivasan, Raja Mukerji Approved by: re
|
147636 |
27-Jun-2005 |
phk |
Libalias incorrectly applies proxy rules to the global divert socket: it should only look for existing translation entries, not create new ones (no matter how it got the idea).
Approved by: re(scottl)
|
147623 |
27-Jun-2005 |
glebius |
Disable checksum processing in LibAlias, when it works as a kernel module. LibAlias is not aware of checksum offloading, so the caller should provide checksum calculation. (The only current consumer is ng_nat(4)). When TCP packet internals have been changed and checksum recalculation is required, a cookie is set in the th_x2 field of the TCP packet, to inform the caller that it needs to recalculate the checksum. This ugly hack will be removed when LibAlias is made more kernel friendly.
Incremental checksum updates are left as is, since they don't conflict with offloading.
Approved by: re (scottl)
|
147611 |
26-Jun-2005 |
dwmalone |
Fix some long standing bugs in writing to the BPF device attached to a DLT_NULL interface. In particular:
1) Consistently use type u_int32_t for the header of a DLT_NULL device - it continues to represent the address family as always.
2) In the DLT_NULL case get bpf_movein to store the u_int32_t in a sockaddr rather than in the mbuf, to be consistent with all the DLT types.
3) Consequently fix a bug in bpf_movein/bpfwrite which only permitted packets up to 4 bytes less than the MTU to be written.
4) Fix all DLT_NULL devices to have the code required to allow writing to their bpf devices.
5) Move the code to allow writing to if_lo from if_simloop to looutput, because it only applies to DLT_NULL devices but was being applied to other devices that use if_simloop possibly incorrectly.
PR: 82157 Submitted by: Matthew Luckie <mjl@luckie.org.nz> Approved by: re (scottl)
|
147605 |
25-Jun-2005 |
ups |
Fix a timer ticks wrap around bug for minmssoverload processing.
Approved by: re (scottl,dwhite) MFC after: 4 weeks
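The log entry does not spell out the exact defect, so the following is only a general sketch of the usual wrap-safe way to compare tick counters, assuming a 32-bit counter. The helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def to_int32(x: int) -> int:
    """Interpret the low 32 bits of x as a signed two's-complement value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def expired_naive(now: int, deadline: int) -> bool:
    """Direct comparison of raw counter values; breaks when the deadline
    has wrapped around but the current tick count has not yet."""
    return now >= deadline


def expired_wrap_safe(now: int, deadline: int) -> bool:
    """Compare via the signed difference, which stays correct as long as
    the two values are less than 2**31 ticks apart."""
    return to_int32(now - deadline) >= 0


# A deadline set 0x20 ticks after 0xFFFFFFF0 wraps around to 0x10.
deadline = (0xFFFFFFF0 + 0x20) & MASK32
now = 0xFFFFFFF8  # 0x18 ticks before the deadline
```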
|
147549 |
23-Jun-2005 |
imp |
Add back missing copyright and license statement. This is identical to the statement in ip_mroute.h, as well as being the same as what OpenBSD has done with this file. It matches the copyright in NetBSD's 1.1 through 1.14 versions of the file as well, which they subsequently added back.
It appears to have been lost in the 4.4-lite1 import for FreeBSD 2.0, but where and why I've not investigated further. OpenBSD had the same problem. NetBSD had a copyright notice until Multicast 3.5 was integrated verbatim back in 1995. This appears to be the version that made it into 4.4-lite1.
Approved by: re (scottl) MFC after: 3 days
|
147535 |
23-Jun-2005 |
ps |
Fix for a bug in tcp_sack_option() causing crashes.
Submitted by: Noritoshi Demizu, Mohan Srinivasan. Approved by: re (scottl blanket SACK)
|
147503 |
20-Jun-2005 |
bz |
Fix IP(v6) over IP tunneling most likely broken with ifnet changes.
Reviewed by: gnn Approved by: re (dwhite), rwatson (mentor)
|
147501 |
20-Jun-2005 |
glebius |
- Don't use legacy function in a non-legacy one. This gives us possibility to compile libalias without legacy support. - Use correct way to mark variable as unused.
Approved by: re (dwhite)
|
147418 |
16-Jun-2005 |
mlaier |
In verify_rev_path6(): - do not use static memory as we are under a shared lock only - properly rtfree routes allocated with rtalloc - rename to verify_path6() - implement the full functionality of the IPv4 version
Also make O_ANTISPOOF work with IPv6.
Reviewed by: gnn Approved by: re (blanket)
|
147415 |
16-Jun-2005 |
mlaier |
Fix indentation in INET6 section in preparation of more serious work.
Approved by: re (blanket ip6fw removal)
|
147319 |
12-Jun-2005 |
mlaier |
When doing matching based on dst_ip/src_ip make sure we are really looking on an IPv4 packet as these variables are uninitialized if not. This used to allow arbitrary IPv6 packets depending on the value in the uninitialized variables.
Some opcodes (most notably O_REJECT) do not support IPv6 at all right now.
Reviewed by: brooks, glebius Security: IPFW might pass IPv6 packets depending on stack contents. Approved by: re (blanket)
|
147256 |
10-Jun-2005 |
brooks |
Stop embedding struct ifnet at the top of driver softcs. Instead the struct ifnet or the layer 2 common structure it was embedded in have been replaced with a struct ifnet pointer to be filled by a call to the new function, if_alloc(). The layer 2 common structure is also allocated via if_alloc() based on the interface type. It is hung off the new struct ifnet member, if_l2com.
This change removes the size of these structures from the kernel ABI and will allow us to better manage them as interfaces come and go.
Other changes of note: - Struct arpcom is no longer referenced in normal interface code. Instead the Ethernet address is accessed via the IFP2ENADDR() macro. To enforce this ac_enaddr has been renamed to _ac_enaddr. - The second argument to ether_ifattach is now always the mac address from driver private storage rather than sometimes being ac_enaddr.
Reviewed by: sobomax, sam
|
147247 |
10-Jun-2005 |
green |
Modify send_pkt() to return the generated packet and have the caller do the subsequent ip_output() in IPFW. In ipfw_tick(), the keep-alive packets must be generated from the data that resides under the stateful lock, but they must not be sent at that time, as this would cause a lock order reversal with the normal ordering (interface's lock, then locks belonging to the pfil hooks).
In practice, this caused deadlocks when using IPFW and if_bridge(4) together to do stateful transparent filtering.
MFC after: 1 week
|
147205 |
10-Jun-2005 |
thompsa |
Add dummynet(4) support to if_bridge, this code is largely based on bridge.c.
This is the final piece to match bridge.c in functionality, we can now be a drop-in replacement.
Approved by: mlaier (mentor)
|
147180 |
09-Jun-2005 |
ps |
Fix a mis-merge. Remove a redundant call to tcp_sackhole_insert
Submitted by: Mohan Srinivasan
|
147169 |
09-Jun-2005 |
ps |
Fix for a crash in tcp_sack_option() caused by hitting the limit on the number of sack holes.
Reported by: Andrey Chernov Submitted by: Noritoshi Demizu Reviewed by: Raja Mukerji
|
147061 |
06-Jun-2005 |
ps |
Fix for a bug in the change that walks the scoreboard backwards from the tail (in tcp_sack_option()). The bug was caused by incorrect accounting of the retransmitted bytes in the sackhint.
Reported by: Kris Kennaway. Submitted by: Noritoshi Demizu.
|
146986 |
05-Jun-2005 |
thompsa |
Add hooks into the networking layer to support if_bridge. This changes struct ifnet so a buildworld is necessary.
Approved by: mlaier (mentor) Obtained from: NetBSD
|
146962 |
04-Jun-2005 |
green |
Better explain, then actually implement the IPFW ALTQ-rule first-match policy. It may be used to provide more detailed classification of traffic without actually having to decide its fate at the time of classification.
MFC after: 1 week
|
146953 |
04-Jun-2005 |
ps |
Changes to tcp_sack_option() that: - Walk the scoreboard backwards from the tail to reduce the number of comparisons for each sack option received. - Introduce functions to add/remove sack scoreboard elements, making the code more readable.
Submitted by: Noritoshi Demizu Reviewed by: Raja Mukerji, Mohan Srinivasan
|
146894 |
03-Jun-2005 |
mlaier |
Add support for IPv4 only rules to IPFW2 now that it supports IPv6 as well. This is the last requirement before we can retire ip6fw.
Reviewed by: dwhite, brooks(earlier version) Submitted by: dwhite (manpage) Silence from: -ipfw
|
146883 |
02-Jun-2005 |
iedowse |
Use IFF_LOCKGIANT/IFF_UNLOCKGIANT around calls to the interface if_ioctl routine. This should fix a number of code paths through soo_ioctl() that could call into Giant-locked network drivers without first acquiring Giant.
|
146866 |
01-Jun-2005 |
rwatson |
When aborting tcp_attach() due to a problem allocating or attaching the tcpcb, lock the inpcb before calling in_pcbdetach() or in6_pcbdetach(), as they expect the inpcb to be passed locked.
MFC after: 7 days
|
146865 |
01-Jun-2005 |
rwatson |
Assert tcbinfo lock, inpcb lock in tcp_disconnect(). Assert tcbinfo lock, inpcb lock in tcp_usrclosed().
MFC after: 7 days
|
146864 |
01-Jun-2005 |
rwatson |
Assert tcbinfo lock in tcp_drop() due to its call of tcp_close() Assert tcbinfo lock in tcp_close() due to its call to in{,6}_detach() Assert tcbinfo lock in tcp_drop_syn_sent() due to its call to tcp_drop()
MFC after: 7 days
|
146863 |
01-Jun-2005 |
rwatson |
Assert that tcbinfo is locked in tcp_input() before calling into tcp_drop().
MFC after: 7 days
|
146862 |
01-Jun-2005 |
rwatson |
Assert the tcbinfo lock whenever tcp_close() is to be called by tcp_input().
MFC after: 7 days
|
146861 |
01-Jun-2005 |
rwatson |
Assert tcbinfo lock in tcp_attach(), as it is required; the caller (tcp_usr_attach()) currently grabs it.
MFC after: 7 days
|
146860 |
01-Jun-2005 |
rwatson |
Commit correct version of previous commit (in_pcb.c:1.164). Use the local variables as currently named.
MFC after: 7 days
|
146859 |
01-Jun-2005 |
rwatson |
Assert pcbinfo lock in in_pcbdisconnect() and in_pcbdetach(), as the global pcb lists are modified.
MFC after: 7 days
|
146858 |
01-Jun-2005 |
rwatson |
Slight white space tweak.
MFC after: 7 days
|
146854 |
01-Jun-2005 |
rwatson |
De-spl UDP.
MFC after: 3 days
|
146704 |
28-May-2005 |
tanimura |
Let OSPFv3 go through ipfw. Some more additional checks would be desirable, though.
|
146630 |
25-May-2005 |
ps |
This conforms with the terminology in
M.Mathis and J.Mahdavi, "Forward Acknowledgement: Refining TCP Congestion Control" SIGCOMM'96, August 1996.
Submitted by: Noritoshi Demizu, Raja Mukerji
|
146552 |
23-May-2005 |
ps |
Rewrite of tcp_sack_option(). Kentaro Kurahone (NetBSD) pointed out that if we sort the incoming SACK blocks, we can update the scoreboard in one pass of the scoreboard. The added overhead of sorting up to 4 sack blocks is much lower than traversing (potentially) large scoreboards multiple times. The code was updating the scoreboard with multiple passes over it (once for each sack option). The rewrite fixes that, reducing the complexity of the main loop from O(n^2) to O(n).
Submitted by: Mohan Srinivasan, Noritoshi Demizu. Reviewed by: Raja Mukerji.
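The sort-then-single-pass idea can be sketched on a simplified model. The code below is not the kernel routine: it assumes the scoreboard is a sorted list of non-overlapping half-open holes `(start, end)`, ignores sequence-number wraparound, and only shrinks existing holes (it never creates new ones). The function name `apply_sack_blocks` is hypothetical.

```python
def apply_sack_blocks(holes, blocks):
    """Remove the ranges covered by SACK blocks from the hole list.

    holes  -- sorted, non-overlapping (start, end) ranges still missing
    blocks -- (start, end) ranges reported as received, in any order
    """
    blocks = sorted(blocks)        # at most a handful of blocks: cheap
    remaining = []
    bi = 0                         # block cursor, never moves backwards
    for hole_start, hole_end in holes:
        cur = hole_start
        # Skip blocks that lie entirely before this hole.
        while bi < len(blocks) and blocks[bi][1] <= cur:
            bi += 1
        j = bi
        while j < len(blocks) and blocks[j][0] < hole_end:
            block_start, block_end = blocks[j]
            if block_start > cur:
                remaining.append((cur, block_start))  # uncovered prefix
            cur = max(cur, block_end)
            if block_end >= hole_end:
                break              # block may also cover the next hole
            j += 1
        if cur < hole_end:
            remaining.append((cur, hole_end))         # uncovered tail
        bi = j
    return remaining
```

Because both lists are walked with cursors that only advance, the cost is proportional to the number of holes plus the number of blocks, rather than one full scoreboard traversal per block.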
|
146463 |
21-May-2005 |
ps |
Replace t_force with a t_flag (TF_FORCEDATA).
Submitted by: Raja Mukerji. Reviewed by: Mohan, Silby, Andre Opperman.
|
146304 |
16-May-2005 |
ps |
Introduce routines to alloc/free sack holes. This cleans up the code considerably.
Submitted by: Noritoshi Demizu. Reviewed by: Raja Mukerji, Mohan Srinivasan.
|
146226 |
15-May-2005 |
glebius |
- When carp interface is destroyed, and it affects global preemption suppression counter, decrease the latter. [1] - Add sysctl to monitor preemption suppression.
PR: kern/80972 [1] Submitted by: Frank Volf [1] MFC after: 1 week
|
146193 |
13-May-2005 |
ps |
Fix for a bug where the "nexthole" sack hint is out of sync with the real next hole to retransmit from the scoreboard, caused by a bug which did not update the "nexthole" hint in one case in tcp_sack_option().
Reported by: Daniel Eriksson Submitted by: Mohan Srinivasan
|
146182 |
13-May-2005 |
glebius |
In div_output() explicitly set m->m_nextpkt to NULL. If divert socket is not userland, but ng_ksocket, then m->m_nextpkt may be non-NULL. In this case we would panic in sbappend.
|
146123 |
11-May-2005 |
ps |
When looking for the next hole to retransmit from the scoreboard, or to compute the total retransmitted bytes in this sack recovery episode, the scoreboard is traversed. While in sack recovery, this traversal occurs on every call to tcp_output(), every dupack and every partial ack. The scoreboard could potentially get quite large, making this traversal expensive.
This change optimizes this by storing hints (for the next hole to retransmit and the total retransmitted bytes in this sack recovery episode) reducing the complexity to find these values from O(n) to constant time.
The debug code that sanity checks the hints against the computed value will be removed eventually.
Submitted by: Mohan Srinivasan, Noritoshi Demizu, Raja Mukerji.
|
145978 |
07-May-2005 |
cperciva |
Fix two issues which were missed in FreeBSD-SA-05:08.kmem.
Reported by: Uwe Doering
|
145963 |
06-May-2005 |
glebius |
Add a workaround for 64-bit archs: store unsigned long return value in temporary variable, check it and then cast to in_addr_t.
|
145961 |
06-May-2005 |
glebius |
s/DEBUG/LIBALIAS_DEBUG/, since DEBUG is defined in LINT and not supported for kernel build.
|
145953 |
06-May-2005 |
cperciva |
If we are going to 1. Copy a NULL-terminated string into a fixed-length buffer, and 2. copyout that buffer to userland, we really ought to 0. Zero the entire buffer first.
Security: FreeBSD-SA-05:08.kmem
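The disclosure pattern behind this advisory can be modelled in userland. The sketch below is an analogy only, with a `bytearray` standing in for a fixed-length kernel buffer that still holds stale data; the function names and buffer contents are made up for illustration.

```python
def copy_name_unsafe(buf: bytearray, name: bytes) -> bytes:
    """Copy a NUL-terminated string into buf and return the whole buffer,
    as a copyout of the full fixed-length buffer would."""
    data = name + b"\0"
    buf[:len(data)] = data         # bytes after the terminator are untouched
    return bytes(buf)


def copy_name_safe(buf: bytearray, name: bytes) -> bytes:
    """Same as above, but zero the entire buffer first (step 0)."""
    buf[:] = bytes(len(buf))       # overwrite every byte with zero
    data = name + b"\0"
    buf[:len(data)] = data
    return bytes(buf)


stale = b"SECRETSECRETSECR"        # 16 bytes of leftover memory contents
leaked = copy_name_unsafe(bytearray(stale), b"em0")
clean = copy_name_safe(bytearray(stale), b"em0")
```

In the unsafe variant, everything after the terminator is handed to the caller unchanged, which is exactly the leftover data the fix prevents from leaking.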
|
145933 |
05-May-2005 |
glebius |
More bits for kernel version: - copy inet_aton() from libc - disable getservbyname() lookup and accept only numeric port
|
145932 |
05-May-2005 |
glebius |
Always include alias.h before alias_local.h
|
145931 |
05-May-2005 |
glebius |
When used in kernel define NO_FW_PUNCH, NO_LOGGING, NO_USE_SOCKETS.
|
145930 |
05-May-2005 |
glebius |
Fix argument order for bcopy() in last commit.
Noticed by: njl Pointy hat to: glebius
|
145929 |
05-May-2005 |
glebius |
Use bcopy() instead of memmove().
|
145928 |
05-May-2005 |
glebius |
Hide fflush(3) under ifdef DEBUG.
|
145927 |
05-May-2005 |
glebius |
Things required to build libalias as kernel module:
- kernel module declarations and handler.
- macros to map malloc(3) calls to malloc(9) ones.
- malloc(9) declarations.
- call finishoff() from module handler MOD_UNLOAD case instead of atexit(3).
- use panic(9) instead of abort(3)
- take time from time_second instead of gettimeofday(2)
- define INADDR_NONE
|
145926 |
05-May-2005 |
glebius |
Add NO_USE_SOCKETS knob, which cuts off functionality of socket binding.
|
145925 |
05-May-2005 |
glebius |
Add NO_LOGGING knob, which cuts off functionality of debug logging to a file.
|
145921 |
05-May-2005 |
glebius |
Play with includes so that libalias can be compiled both as userland library and kernel module.
|
145869 |
04-May-2005 |
andre |
If we don't get a suggested MTU during path MTU discovery look up the packet size of the packet that generated the response, step down the MTU by one step through ip_next_mtu() and try again.
Suggested by: dwmalone
|
145868 |
04-May-2005 |
glebius |
Cleanup IPFW2 ifdefs.
|
145867 |
04-May-2005 |
glebius |
Makefile is not needed here.
|
145866 |
04-May-2005 |
andre |
Add another step of 1280 (gif(4) tunnels) to ip_next_mtu().
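Stepping down through a table of common MTU values can be sketched as below. The plateau values are the ones listed in RFC 1191 with the 1280 step from this entry added; the function `next_lower_mtu` is a hypothetical stand-in and does not reproduce the signature of the kernel's ip_next_mtu().

```python
# RFC 1191 plateau table, plus the 1280 step mentioned in this entry.
MTU_PLATEAUS = (65535, 32000, 17914, 8166, 4352, 2002,
                1492, 1280, 1006, 508, 296, 68)


def next_lower_mtu(mtu: int) -> int:
    """Return the largest plateau strictly below mtu, or 0 if none is left."""
    for plateau in MTU_PLATEAUS:   # table is sorted in descending order
        if plateau < mtu:
            return plateau
    return 0
```

For example, a 1500-byte packet that triggers a response without a suggested MTU would be retried at 1492, then 1280, then 1006.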
|
145864 |
04-May-2005 |
glebius |
IPFW version 2 is the only option in HEAD and RELENG_5. Thus, clean up the now unnecessary ifdefs.
|
145863 |
04-May-2005 |
andre |
Pass icmp_error() the MTU argument directly instead of an interface pointer. This simplifies a couple of uses and removes some XXX workarounds.
|
145773 |
01-May-2005 |
rwatson |
Remove now unused inirw variable from previous use of COMMON_END().
Reported by: csjp
|
145771 |
01-May-2005 |
grehan |
Fix typo in last commit.
Approved by: rwatson
|
145766 |
01-May-2005 |
rwatson |
Slide unlocking of the tcbinfo lock earlier in tcp_usr_send(), as it's needed only for implicit connect cases. Under load, especially on SMP, this can greatly reduce contention on the tcbinfo lock.
NB: Ambiguities about the state of so_pcb need to be resolved so that all use of the tcbinfo lock in non-implicit connection cases can be eliminated.
Submitted by: Kazuaki Oda <kaakun at highway dot ne dot jp>
|
145565 |
26-Apr-2005 |
brooks |
Introduce a struct icmphdr which contains the type, code, and cksum fields of an ICMP packet.
Use this to allow ipfw to pullup only these values since it does not use the rest of the packet and it was failing on ICMP packets because they were not long enough.
struct icmp should probably be modified to use these at some point, but that will break a fair bit of code so it can wait for another day.
On the off chance that adding this struct breaks something in ports, bump __FreeBSD_version.
Reported by: Randy Bush <randy at psg dot com> Tested by: Randy Bush <randy at psg dot com>
|
145373 |
21-Apr-2005 |
ps |
Remove some code that snuck in by accident.
Submitted by: Mohan Srinivasan
|
145372 |
21-Apr-2005 |
ps |
Fix for interaction problems between TCP SACK and TCP Signature. If TCP Signatures are enabled, the maximum allowed sack blocks aren't going to fit. The fix is to compute how many sack blocks fit and tack these on last. Also on SYNs, defer padding until after the SACK PERMITTED option has been added.
Found by: Mohan Srinivasan. Submitted by: Mohan Srinivasan, Noritoshi Demizu. Reviewed by: Raja Mukerji.
|
145371 |
21-Apr-2005 |
ps |
Undo rev 1.71 as it is the wrong change.
|
145370 |
21-Apr-2005 |
ps |
- Make the sack scoreboard logic use the TAILQ macros. This improves code readability and facilitates some anticipated optimizations in tcp_sack_option(). - Remove tcp_print_holes() and TCP_SACK_DEBUG.
Submitted by: Raja Mukerji. Reviewed by: Mohan Srinivasan, Noritoshi Demizu.
|
145369 |
21-Apr-2005 |
ps |
Fix for 2 bugs related to TCP Signatures : - If the peer sends the Signature option in the SYN, use of Timestamps and Window Scaling were disabled (even if the peer supports them). - The sender must not disable signatures if the option is absent in the received SYN. (See comment in syncache_add()).
Found, Submitted by: Noritoshi Demizu <demizu at dd dot ij4u dot or dot jp>. Reviewed by: Mohan Srinivasan <mohans at yahoo-inc dot com>.
|
145360 |
21-Apr-2005 |
andre |
Move Path MTU discovery ICMP processing from icmp_input() to tcp_ctlinput() and subject it to active tcpcb and sequence number checking. Previously any ICMP unreachable/needfrag message would cause an update to the TCP hostcache. Now only ICMP PMTU messages belonging to an active TCP session with the correct src/dst/port and sequence number will update the hostcache and complete the path MTU discovery process.
Note that we don't entirely implement the recommended counter measures of Section 7.2 of the paper. However we close down the possible degradation vector from trivially easy to really complex and resource intensive. In addition we have limited the smallest acceptable MTU with net.inet.tcp.minmss sysctl for some time already, further reducing the effect of any degradation due to an attack.
Security: draft-gont-tcpm-icmp-attacks-03.txt Section 7.2 MFC after: 3 days
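The sequence-number part of this check can be sketched with wrap-aware comparisons. This is an assumption-laden illustration: it supposes the quoted sequence number must fall inside the window of sent but unacknowledged data, and the names `icmp_matches_session`, `snd_una` and `snd_max` are used here only as labels for that window, not as a copy of the kernel code.

```python
MASK32 = 0xFFFFFFFF


def seq_lt(a: int, b: int) -> bool:
    """True if sequence number a is before b, modulo 2**32."""
    return ((a - b) & MASK32) >= 0x80000000


def seq_geq(a: int, b: int) -> bool:
    """True if sequence number a is at or after b, modulo 2**32."""
    return not seq_lt(a, b)


def icmp_matches_session(icmp_seq: int, snd_una: int, snd_max: int) -> bool:
    """Accept the ICMP message only if the TCP sequence number quoted in
    it refers to data that is currently in flight on this connection."""
    return seq_geq(icmp_seq, snd_una) and seq_lt(icmp_seq, snd_max)
```

An off-path attacker then has to guess a sequence number inside the in-flight window, in addition to the addresses and ports, before a forged message can influence the connection.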
|
145355 |
21-Apr-2005 |
andre |
Ignore ICMP Source Quench messages for TCP sessions. Source Quench is ineffective, deprecated and can be abused to degrade the performance of active TCP sessions if spoofed.
Replace a bogus call to tcp_quench() in tcp_output() with the direct equivalent tcpcb variable assignment.
Security: draft-gont-tcpm-icmp-attacks-03.txt Section 7.1 MFC after: 3 days
|
145321 |
20-Apr-2005 |
glebius |
Remove anti-LOR bandaid, it is not needed now.
Sponsored by: Rambler
|
145268 |
19-Apr-2005 |
phk |
Make DUMMYNET compile without INET6
|
145267 |
19-Apr-2005 |
phk |
typo
|
145266 |
19-Apr-2005 |
phk |
Make IPFIREWALL compile without INET6
|
145246 |
18-Apr-2005 |
brooks |
Add IPv6 support to IPFW and Dummynet.
Submitted by: Mariano Tortoriello and Raffaele De Lorenzo (via luigi)
|
145244 |
18-Apr-2005 |
ps |
Rewrite of tcp_update_sack_list() to make it simpler and more readable than our original OpenBSD derived version.
Submitted by: Noritoshi Demizu Reviewed by: Mohan Srinivasan, Raja Mukerji
|
145093 |
15-Apr-2005 |
brooks |
Centralized finding the protocol header in IP packets in preparation for IPv6 support. The header in IPv6 is more complex than in IPv4 so we want to handle skipping over it in one location.
Submitted by: Mariano Tortoriello and Raffaele De Lorenzo (via luigi)
|
145087 |
14-Apr-2005 |
ps |
Fix for a TCP SACK bug where more than (win/2) bytes could have been in flight in SACK recovery.
Found by: Noritoshi Demizu Submitted by: Mohan Srinivasan <mohans at yahoo-inc dot com> Noritoshi Demizu <demizu at dd dot ij4u dot or dot jp> Raja Mukerji <raja at moselle dot com>
|
144858 |
10-Apr-2005 |
ps |
- Tighten up the Timestamp checks to prevent a spoofed segment from setting ts_recent to an arbitrary value, stopping further communication between the two hosts. - If the Echoed Timestamp is greater than the current time, fall back to the non RFC 1323 RTT calculation.
Submitted by: Raja Mukerji (raja at moselle dot com) Reviewed by: Noritoshi Demizu, Mohan Srinivasan
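The second point of the commit above (distrust an echoed timestamp that lies in the future) can be sketched as follows. This is a minimal illustration, not the committed code: rtt_from_tsecr() and its arguments are hypothetical names, and the -1 return value is only a convention chosen here to signal "fall back to the non RFC 1323 RTT calculation".

```c
#include <stdint.h>

/*
 * Illustrative only: rtt_from_tsecr(), now and tsecr are hypothetical
 * names, not taken from the kernel sources.
 * Returns the RTT sample in ticks, or -1 when the echoed timestamp is
 * ahead of our clock and the caller should use the non-timestamp estimate.
 */
static int32_t
rtt_from_tsecr(uint32_t now, uint32_t tsecr)
{
	/* A signed difference tolerates wraparound of the 32-bit tick counter. */
	int32_t delta = (int32_t)(now - tsecr);

	if (delta < 0)
		return (-1);	/* echo is "newer" than our clock: distrust it */
	return (delta);
}
```

The signed subtraction keeps the comparison correct when the tick counter wraps between sending the timestamp and receiving its echo.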
|
144857 |
10-Apr-2005 |
ps |
- If the reassembly queue limit was reached or if we couldn't allocate a reassembly queue state structure, don't update (receiver) sack report. - Similarly, if tcp_drain() is called, freeing up all items on the reassembly queue, clean the sack report.
Found, Submitted by: Noritoshi Demizu <demizu at dd dot iij4u dot or dot jp> Reviewed by: Mohan Srinivasan (mohans at yahoo-inc dot com), Raja Mukerji (raja at moselle dot com).
|
144856 |
10-Apr-2005 |
ps |
When the rightmost SACK block expands, rcv_lastsack should be updated. (Fix for kern/78226).
Submitted by : Noritoshi Demizu <demizu at dd dot iij4u dot or dot jp> Reviewed by : Mohan Srinivasan (mohans at yahoo-inc dot com), Raja Mukerji (raja at moselle dot com).
|
144855 |
10-Apr-2005 |
ps |
Remove some unused sack fields.
Submitted by : Noritoshi Demizu, Mohan Srinivasan.
|
144792 |
08-Apr-2005 |
maxim |
o Nano optimize ip_reass() code path for the first fragment: do not try to reassemble the packet from the fragments queue with the only fragment, finish with the first fragment as soon as we create a queue.
Spotted by: Vijay Singh
o Drop the fragment if maxfragsperpacket == 0, no chances we will be able to reassemble the packet in future.
Reviewed by: silby
|
144786 |
08-Apr-2005 |
maxim |
o Tweak the comment a bit.
|
144785 |
08-Apr-2005 |
maxim |
o Disable random port allocation when ip.portrange.first == ip.portrange.last and there is only one port available because: a) it is not wise; b) it leads to a panic in the random ip port allocation code. In general we need to disable ip port allocation randomization if the last - first delta is ridiculously small.
PR: kern/79342 Spotted by: Anjali Kulkarni Glanced at by: silby MFC after: 2 weeks
|
144712 |
06-Apr-2005 |
glebius |
When a packet has been reinjected into ipfw(4) after dummynet(4) processing we have a non-NULL args.rule. If the same packet later is subject to "tee" rule, its original is sent again into ipfw_chk() and it reenters at the same rule. This leads to infinite loop and frozen router.
Assign args.rule to NULL, any time we are going to send packet back to ipfw_chk() after a tee rule. This is a temporary workaround, which we will leave for RELENG_5. In HEAD we are going to make divert(4) save next rule the same way as dummynet(4) does.
PR: kern/79546 Submitted by: Oleg Bulyzhin Reviewed by: maxim, andre MFC after: 3 days
|
144693 |
06-Apr-2005 |
brooks |
Use ACTION_PTR(r) instead of (r->cmd + r->act_ofs).
Reviewed by: md5
|
144691 |
05-Apr-2005 |
brooks |
Make dummynet_flush() match its prototype.
|
144666 |
05-Apr-2005 |
phk |
natd core dumps when -reverse switch is used because of a bug in libalias.
In /usr/src/lib/libalias/alias.c, the functions LibAliasIn and LibAliasOutTry call the legacy PacketAliasIn/PacketAliasOut instead of LibAliasIn/LibAliasOut when the PKT_ALIAS_REVERSE option is set. In this case, the context variable "la" gets lost because the legacy compatibility routines expect "la" to be global. This was obviously an oversight when rewriting the PacketAlias* functions to the LibAlias* functions.
The fix (as shown in the patch below) is to remove the legacy subroutine calls and replace with the new ones using the "la" struct as the first arg.
Submitted by: Gil Kloepfer <fgil@kloepfer.org> Confirmed by: <nicolai@catpipe.net> PR: 76839 MFC after: 3 days
|
144329 |
30-Mar-2005 |
glebius |
When several carp interfaces are attached to Ethernet interface, carp_carpdev_state_locked() is called every time carp interface is attached. The first call backs up flags of the first interface, and the second call backs up them again, erasing correct values. To solve this, a carp_sc_state_locked() function is introduced. It is called when interface is attached to parent, instead of calling carp_carpdev_state_locked. carp_carpdev_state_locked() calls carp_sc_state_locked() for each sc in chain.
Reported by: Yuriy N. Shkandybin, sem
|
144301 |
29-Mar-2005 |
glebius |
- Don't free mbuf, passed to interface output method if the latter returns error. In this case mbuf has already been freed. [1] - Remove redundant declaration.
PR: kern/78893 [1] Submitted by: Liang Yi [1] Reviewed by: sam MFC after: 1 day
|
144260 |
29-Mar-2005 |
sam |
eliminate extraneous null ptr checks
Noticed by: Coverity Prevent analysis tool
|
144163 |
26-Mar-2005 |
sam |
deal with malloc failures
Noticed by: Coverity Prevent analysis tool Together with: mdodd
|
144016 |
23-Mar-2005 |
maxim |
o Document net.inet.ip.portrange.random* sysctls. o Correct a comment about random port allocation threshold implementation.
Reviewed by: silby, ru MFC after: 3 days
|
143881 |
20-Mar-2005 |
glebius |
ifma_protospec is a pointer. Use NULL when assigning or comparing it.
|
143868 |
20-Mar-2005 |
glebius |
Remove a workaround from previous revision. It proved to be incorrect. Add two other workarounds for carp(4) interfaces: - do not add connected route when address is assigned to carp(4) interface - do not add connected route when other interface goes down
Embrace workarounds with #ifdef DEV_CARP
|
143806 |
18-Mar-2005 |
glebius |
If vhid exists return more informative EEXIST instead of EINVAL. While here remove redundant brackets.
|
143804 |
18-Mar-2005 |
glebius |
Fix a potential crash that could occur when CARP_LOG is being used.
Obtained from: OpenBSD (pat)
|
143676 |
16-Mar-2005 |
sam |
plug resource leak
Noticed by: Coverity Prevent analysis tool
|
143610 |
14-Mar-2005 |
rwatson |
In tcp_usr_send(), broaden coverage of the socket buffer lock in the non-OOB case so that the sbspace() check is performed under the same lock instance as the append to the send socket buffer.
MFC after: 1 week
|
143491 |
13-Mar-2005 |
glebius |
Embrace with #ifdef DEV_CARP carp-related code.
|
143374 |
10-Mar-2005 |
glebius |
Add antifootshooting workaround, which will make all routes "connected" to carp(4) interfaces host routes. This prevents a problem, when connected network is routed to carp(4) interface.
|
143339 |
09-Mar-2005 |
ps |
Add limits on the number of elements in the sack scoreboard both per-connection and globally. This eliminates potential DoS attacks where SACK scoreboard elements tie up too much memory.
Submitted by: Raja Mukerji (raja at moselle dot com). Reviewed by: Mohan Srinivasan (mohans at yahoo-inc dot com).
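A two-level limit of the kind described above can be sketched as below. This is only an assumption about the shape of such a check: the names hole_alloc(), hole_free(), struct conn and both limit values are hypothetical, and the log does not say how the real limits are named or tuned.

```c
#include <stdbool.h>

/* Hypothetical limits; the real values and sysctl names are not given in the log. */
#define HOLES_PER_CONN_MAX	4
#define HOLES_GLOBAL_MAX	6

struct conn {
	int nholes;		/* scoreboard elements held by this connection */
};

static int global_holes;	/* scoreboard elements held by all connections */

/* Account for one new scoreboard element, or refuse if either limit is hit. */
static bool
hole_alloc(struct conn *c)
{
	if (c->nholes >= HOLES_PER_CONN_MAX ||
	    global_holes >= HOLES_GLOBAL_MAX)
		return (false);
	c->nholes++;
	global_holes++;
	return (true);
}

/* Release one element and return it to both budgets. */
static void
hole_free(struct conn *c)
{
	if (c->nholes > 0) {
		c->nholes--;
		global_holes--;
	}
}
```

Checking both counters before allocating means a single connection cannot exhaust the global budget, and many connections together cannot exceed it either.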
|
143314 |
09-Mar-2005 |
glebius |
Make ARP not complain about wrong interface if correct interface is a carp one and address matched it.
Reviewed by: brooks
|
143083 |
03-Mar-2005 |
marcus |
Fix a problem in the Skinny ALG where a specially crafted packet could cause a libalias application (e.g. natd, ppp, etc.) to crash. Note: Skinny support is not enabled in natd or ppp by default.
Approved by: secteam (nectar) MFC after: 1 day Security: This fixes a remote DoS exploit
|
142996 |
02-Mar-2005 |
glebius |
Fix typo. Unbreak build. Take pointy hat.
|
142914 |
01-Mar-2005 |
glebius |
Add more locking when reading/writing to carp softc. When carp softc is attached to a parent interface we use its mutex to lock the softc. This means that in several places like carp_ioctl() we lock softc conditionally. This should be redesigned.
To avoid LORs when MII announces us a link state change, we schedule a quick callout and call carp_carpdev_state_locked() from it.
Initialize callouts using NET_CALLOUT_MPSAFE.
Sponsored by: Rambler Reviewed by: mlaier
|
142911 |
01-Mar-2005 |
glebius |
- Add carp_mtx. Use it to protect list of all carp interfaces. - In carp_send_ad_all() walk through list of all carp interfaces instead of walking through list of all interfaces.
Sponsored by: Rambler Reviewed by: mlaier
|
142906 |
01-Mar-2005 |
glebius |
Use NET_CALLOUT_MPSAFE macro.
|
142901 |
01-Mar-2005 |
glebius |
Revert change to struct ifnet. Use ifnet pointer in softc. Embedding ifnet into smth will soon be removed.
Requested by: brooks
|
142897 |
01-Mar-2005 |
glebius |
Remove debugging printf.
Reviewed by: mlaier
|
142798 |
28-Feb-2005 |
yar |
Support running carp(4) over a vlan(4) parent interface.
Encouraged by: glebius
|
142785 |
28-Feb-2005 |
glebius |
Remove unused field from carp softc.
OK'ed by: mcbride@OpenBSD
|
142784 |
28-Feb-2005 |
glebius |
Fix tcpdump(8) on carp(4) interface: - Use our loop DLT type, not OpenBSD. [1] - The fields that are converted to network byte order are not 32-bit fields but 16-bit fields, so htons should be used instead of htonl. [1] - Secondly, ip_input changes ip->ip_len into its value without the ip-header length. So, restore the length to make bpf happy. [1] - Use bpf_mtap2(), use temporary af1, since bpf_mtap2 doesn't understand uint8_t af identifier.
Submitted by: Frank Volf [1]
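The byte-order point in the commit above can be illustrated with a small sketch. put_net16() is a hypothetical helper, not a function from the carp(4) sources; it only shows that a 16-bit field is converted with htons().

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: store a 16-bit host-order value into a buffer in
 * network byte order. htonl() would be the wrong tool here, since it
 * converts a 32-bit quantity.
 */
static void
put_net16(uint8_t out[2], uint16_t host_value)
{
	uint16_t wire = htons(host_value);

	memcpy(out, &wire, sizeof(wire));
}
```

On any host, put_net16(buf, 0x1234) leaves the most significant byte first in buf, which is what a packet capture tool expects to see.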
|
142688 |
27-Feb-2005 |
ps |
If the receiver sends an ack that is out of [snd_una, snd_max], ignore the sack options in that segment. Else we'd end up corrupting the scoreboard.
Found by: Raja Mukerji (raja at moselle dot com) Submitted by: Mohan Srinivasan
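The range test described above can be sketched with wraparound-safe sequence comparisons. The macros below follow the style of the BSD SEQ_* macros; ack_in_window() is a hypothetical name used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe sequence comparisons, in the style of the BSD SEQ_* macros. */
#define SEQ_LT(a, b)	((int32_t)((a) - (b)) < 0)
#define SEQ_GT(a, b)	((int32_t)((a) - (b)) > 0)

/*
 * Hypothetical helper: true when ack lies within [snd_una, snd_max].
 * A caller would skip SACK option processing when this returns false.
 */
static bool
ack_in_window(uint32_t ack, uint32_t snd_una, uint32_t snd_max)
{
	return (!SEQ_LT(ack, snd_una) && !SEQ_GT(ack, snd_max));
}
```

Plain `<` and `>` would give wrong answers once the 32-bit sequence space wraps, which is why the comparison is done on the signed difference.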
|
142641 |
27-Feb-2005 |
mlaier |
Unbreak the build. carp_iamatch6 and carp_macmatch6 are not supposed to be static as they are used elsewhere.
|
142564 |
26-Feb-2005 |
glebius |
Remove carp_softc.sc_ifp member in favor of union pointers in struct ifnet.
Obtained from: OpenBSD
|
142559 |
26-Feb-2005 |
glebius |
Staticize local functions.
|
142452 |
25-Feb-2005 |
glebius |
New lines when logging.
|
142451 |
25-Feb-2005 |
glebius |
Embrace macros with do {} while (0)
Submitted by: maxim
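Why multi-statement macros are wrapped this way can be seen in a short sketch. COUNT_AND_FLAG() and classify() are hypothetical and are not the macros touched by the commit.

```c
/* Hypothetical two-statement macro, wrapped so it expands to one statement. */
#define COUNT_AND_FLAG(cnt, flag) do {		\
	(cnt)++;				\
	(flag) = 1;				\
} while (0)

static int
classify(int value, int *count, int *seen)
{
	if (value > 0)
		COUNT_AND_FLAG(*count, *seen);	/* safe without braces */
	else
		return (-1);
	return (0);
}
```

Without the do {} while (0) wrapper, a bare `{ ... }` body followed by the caller's semicolon would end the if statement early and leave the else without a matching if, so the function above would not compile.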
|
142447 |
25-Feb-2005 |
glebius |
Call carp_carpdev_state() from carp_set_addr6(). See log for rev 1.4.
Sponsored by: Rambler
|
142446 |
25-Feb-2005 |
glebius |
Improve logging: - Simplify CARP_LOG() and make it work (we don't have addlog in FreeBSD). - Introduce CARP_DEBUG() which logs with LOG_DEBUG severity when net.inet.carp.log > 1 - Use CARP_DEBUG to log state changes of carp interfaces.
After CARP_LOG() cleanup it appeared that carp_input_c() does not need sc argument. Remove it.
Sponsored by: Rambler
|
142371 |
24-Feb-2005 |
glebius |
Fix problem when master comes up with one interface down, and preempts mastering on all other interfaces:
- call carp_carpdev_state() on initialize instead of just setting to INIT - in carp_carpdev_state() check that interface is UP, instead of checking that it is not DOWN, because a rebooted machine may have interface in UNKNOWN state.
Sponsored by: Rambler Obtained from: OpenBSD (partially)
|
142268 |
23-Feb-2005 |
sam |
fix potential invalid index into ip_protox array
Noticed by: Coverity Prevent analysis tool
|
142266 |
23-Feb-2005 |
mux |
Unbreak CARP build on 64-bit architectures.
Tested on: sparc64
|
142248 |
22-Feb-2005 |
andre |
Bring back the full packet destination manipulation for 'ipfw fwd' with the kernel compile time option:
options IPFIREWALL_FORWARD_EXTENDED
This option has to be specified in addition to IPFIREWALL_FORWARD.
With this option even packets targeted for an IP address local to the host can be redirected. All restrictions to ensure proper behaviour for locally generated packets are turned off. Firewall rules have to be carefully crafted to make sure that things like PMTU discovery do not break.
Document the two kernel options.
PR: kern/71910 PR: kern/73129 MFC after: 1 week
|
142243 |
22-Feb-2005 |
glebius |
Remove promisc counter from parent interface in carp_clone_destroy(), so that parent interface is not left in promiscuous mode after carp interface is destroyed.
This is not perfect, since promisc counter is added when carp interface is assigned an IP address. However, when address is removed parent interface is still in promiscuous mode. Only removal of carp interface removes promisc from parent. Same way in OpenBSD.
Sponsored by: Rambler
|
142215 |
22-Feb-2005 |
glebius |
Add CARP (Common Address Redundancy Protocol), which allows multiple hosts to share an IP address, providing high availability and load balancing.
Original work on CARP done by Michael Shalayeff, with many additions by Marco Pfatschbacher and Ryan McBride.
FreeBSD port done solely by Max Laier.
Patch by: mlaier Obtained from: OpenBSD (mickey, mcbride)
|
142212 |
22-Feb-2005 |
glebius |
We can make code simpler after last change.
Noticed by: Andrew Thompson
|
142207 |
22-Feb-2005 |
glebius |
In in_pcbconnect_setup() jailed sockets are treated specially: if local address is not supplied, then jail IP is chosen and in_pcbbind() is called. Since udp_output() does not save local addr after call to in_pcbconnect_setup(), in_pcbbind() is called for each packet, and this is incorrect.
So, we shall treat jailed sockets specially in udp_output(), we will save their local address.
This fixes a long standing bug with broken sendto() system call in jails.
PR: kern/26506 Reviewed by: rwatson MFC after: 2 weeks
|
142206 |
22-Feb-2005 |
glebius |
In in_pcbconnect_setup() remove a check that route points at loopback interface. Nobody has explained to me the sense of this check. It breaks connect() system call to a destination address which is loopback routed (e.g. blackholed).
Reviewed by: silence on net@ MFC after: 2 weeks
|
142190 |
21-Feb-2005 |
rwatson |
In the current world order, solisten() implements the state transition of a socket from a regular socket to a listening socket able to accept new connections. As part of this state transition, solisten() calls into the protocol to update protocol-layer state. There were several bugs in this implementation that could result in a race wherein a TCP SYN received in the interval between the protocol state transition and the shortly following socket layer transition would result in a panic in the TCP code, as the socket would be in the TCPS_LISTEN state, but the socket would not have the SO_ACCEPTCONN flag set.
This change does the following:
- Pushes the socket state transition from the socket layer solisten() to socket "library" routines called from the protocol. This permits the socket routines to be called while holding the protocol mutexes, preventing a race exposing the incomplete socket state transition to TCP after the TCP state transition has completed. The check for a socket layer state transition is performed by solisten_proto_check(), and the actual transition is performed by solisten_proto().
- Holds the socket lock for the duration of the socket state test and set, and over the protocol layer state transition, which is now possible as the socket lock is acquired by the protocol layer, rather than vice versa. This prevents additional state related races in the socket layer.
This permits the dual transition of socket layer and protocol layer state to occur while holding locks for both layers, making the two changes atomic with respect to one another. Similar changes are likely required elsewhere in the socket/protocol code.
Reported by: Peter Holm <peter@holm.cc> Review and fixes from: emax, Antoine Brodin <antoine.brodin@laposte.net> Philosophical head nod: gnn
|
142031 |
17-Feb-2005 |
ps |
Remove 2 (SACK) fields from the tcpcb. These are only used by a function that is called from tcp_input(), so they oughta be passed on the stack instead of stuck in the tcpcb.
Submitted by: Mohan Srinivasan
|
141961 |
16-Feb-2005 |
ps |
Fix for a SACK (receiver) bug where incorrect SACK blocks are reported to the sender - in the case where the sender sends data outside the window (as WinXP does :().
Reported by: Sam Jensen <sam at wand dot net dot nz> Submitted by: Mohan Srinivasan
|
141928 |
14-Feb-2005 |
ps |
- Retransmit just one segment on initiation of SACK recovery. Remove the SACK "initburst" sysctl. - Fix bugs in SACK dupack and partialack handling that can cause large bursts while in SACK recovery.
Submitted by: Mohan Srinivasan
|
141886 |
14-Feb-2005 |
maxim |
o Add handling of an IPv4-mapped IPv6 address. o Use SYSCTL_IN() macro instead of direct call of copyin(9).
Submitted by: ume
o Move sysctl_drop() implementation to sys/netinet/tcp_subr.c where most of tcp sysctls live. o There are net.inet[6].tcp[6].getcred sysctls already, no needs in a separate struct tcp_ident_mapping.
Suggested by: ume
|
141383 |
06-Feb-2005 |
glebius |
Jump to common action checks after doing specific ones. This fixes adding of divert rules, which I broke in the previous commit.
Pointy hat to: glebius
|
141381 |
06-Feb-2005 |
maxim |
o Implement net.inet.tcp.drop sysctl and userland part, tcpdrop(8) utility:
The tcpdrop command drops the TCP connection specified by the local address laddr, port lport and the foreign address faddr, port fport.
Obtained from: OpenBSD Reviewed by: rwatson (locking), ru (man page), -current MFC after: 1 month
|
141351 |
05-Feb-2005 |
glebius |
Add a ng_ipfw node, implementing a quick and simple interface between ipfw(4) and netgraph(4) facilities.
Reviewed by: andre, brooks, julian
|
141282 |
04-Feb-2005 |
ume |
teach scope of IPv6 address to net.inet6.tcp6.getcred.
MFC after: 1 week
|
141078 |
31-Jan-2005 |
rwatson |
Update an additional reference to the rate of ISN tick callouts that was missed in tcp_subr.c:1.216: projected_offset must also reflect how often the tcp_isn_tick() callout will fire.
MFC after: 2 weeks Submitted by: silby
|
141076 |
31-Jan-2005 |
csjp |
Change the state allocator from using regular malloc to using a UMA zone instead. This should eliminate a bit of the locking overhead associated with malloc and reduce the memory consumption associated with each new state.
Reviewed by: rwatson, andre Silence on: ipfw@ MFC after: 1 week
|
141072 |
30-Jan-2005 |
rwatson |
Have tcp_isn_tick() fire 100 times a second, rather than HZ times a second; since the default hz has changed to 1000 times a second, this resulted in unnecessary work being performed.
MFC after: 2 weeks Discussed with: phk, cperciva General head nod: silby
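The arithmetic behind a fixed 100 Hz callout can be sketched as below. isn_callout_interval() is a hypothetical helper, and the clamp to one tick for hz below 100 is an assumption added here for safety, not something the log states.

```c
/* Desired firing rate of the ISN tick, per the commit message above. */
#define ISN_TICKS_PER_SECOND	100

/*
 * Hypothetical helper: number of clock ticks to wait between callouts
 * so that the handler fires about 100 times a second regardless of hz.
 */
static int
isn_callout_interval(int hz)
{
	int interval = hz / ISN_TICKS_PER_SECOND;

	/* Assumption: never schedule with a zero interval when hz < 100. */
	return (interval > 0 ? interval : 1);
}
```

With hz at 1000 the handler is rescheduled every 10 ticks instead of every tick, which is the tenfold reduction in work the commit describes.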
|
141064 |
30-Jan-2005 |
rwatson |
Prefer (NULL) spelling of (0) for pointers.
MFC after: 3 days
|
141063 |
30-Jan-2005 |
rwatson |
Remove clause three from tcp_syncache.c license per permission of McAfee. Update copyright to McAfee from NETA.
|
140675 |
23-Jan-2005 |
alc |
Correctly move the packet header in ip_insertoptions().
Reported by: Anupam Chanda Reviewed by: sam@ MFC after: 2 weeks
|
140505 |
20-Jan-2005 |
ru |
Sort sections.
|
140345 |
16-Jan-2005 |
glebius |
- Reduce number of arguments passed to dummynet_io(), we already have cookie in struct ip_fw_args itself. - Remove redundant &= 0xffff from dummynet_io().
|
140224 |
14-Jan-2005 |
glebius |
o Clean up interface between ip_fw_chk() and its callers:
- ip_fw_chk() returns action as function return value. Field retval is removed from args structure. Action is not a flag any more. It is one of integer constants. - Any action-specific cookies are returned either in new "cookie" field in args structure (dummynet, future netgraph glue), or in mbuf tag attached to packet (divert, tee, some future action).
o Convert parsing of return value from ip_fw_chk() in ipfw_check_{in,out}() to a switch structure, so that the functions are more readable, and future actions can be added with fewer modifications.
Approved by: andre MFC after: 2 months
|
140138 |
12-Jan-2005 |
ps |
Fix a TCP SACK related crash resulting from incorrect computation of len in tcp_output(), in the case where the FIN has already been transmitted. The mis-computation of len is because of a gcc optimization issue, which this change works around.
Submitted by: Mohan Srinivasan
|
139976 |
10-Jan-2005 |
brian |
include "alias.h", not <alias.h>
MFC after: 3 days
|
139823 |
07-Jan-2005 |
imp |
/* -> /*- for license, minor formatting changes
|
139606 |
03-Jan-2005 |
silby |
Add a sysctl (net.inet.tcp.insecure_rst) which allows one to specify that the RFC 793 specification for accepting RST packets should be followed. When followed, this makes one vulnerable to the attacks described in "slipping in the window", but it may be necessary in some odd circumstances.
|
139558 |
02-Jan-2005 |
silby |
Port randomization leads to extremely fast port reuse at high connection rates, which is causing problems for some users.
To retain the security advantage of random ports and ensure correct operation for high connection rate users, disable port randomization during periods of high connection rates.
Whenever the connection rate exceeds randomcps (10 by default), randomization will be disabled for randomtime (45 by default) seconds. These thresholds may be tuned via sysctl.
Many thanks to Igor Sysoev, who proved the necessity of this change and tested many preliminary versions of the patch.
MFC After: 20 seconds
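The threshold behaviour described above can be sketched as a small state machine. Only randomcps and randomtime come from the log; struct rand_state, port_should_randomize() and port_rand_tick() are hypothetical names, and the once-per-second tick is an assumption about how the rate is measured.

```c
#include <stdbool.h>

struct rand_state {
	int randomcps;		/* connections/second threshold (10 by default) */
	int randomtime;		/* seconds to keep randomization off (45 by default) */
	int conns_this_second;	/* connections seen in the current second */
	int hold_remaining;	/* seconds left before randomization resumes */
};

/* Hypothetical: called for each new connection; true means pick a random port. */
static bool
port_should_randomize(struct rand_state *rs)
{
	rs->conns_this_second++;
	if (rs->conns_this_second > rs->randomcps)
		rs->hold_remaining = rs->randomtime;	/* (re)start the hold-off */
	return (rs->hold_remaining == 0);
}

/* Hypothetical: called once per second to age the hold-off and reset the count. */
static void
port_rand_tick(struct rand_state *rs)
{
	if (rs->hold_remaining > 0)
		rs->hold_remaining--;
	rs->conns_this_second = 0;
}
```

In this sketch a sustained burst keeps re-arming the hold-off, so randomization only resumes once the rate has stayed at or below the threshold for the full randomtime period.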
|
139310 |
25-Dec-2004 |
rwatson |
Remove an errant blank line apparently introduced in ip_output.c:1.194.
|
139298 |
25-Dec-2004 |
rwatson |
In the dropafterack case of tcp_input(), it's OK to release the TCP pcbinfo lock before calling tcp_output(), as holding just the inpcb lock is sufficient to prevent garbage collection.
|
139297 |
25-Dec-2004 |
rwatson |
Revert parts of tcp_input.c:1.255 associated with the header predicted cases for tcp_input():
While it is true that the pcbinfo lock provides a pseudo-reference to inpcbs, both the inpcb and pcbinfo locks are required to free an un-referenced inpcb. As such, we can release the pcbinfo lock as long as the inpcb remains locked with the confidence that it will not be garbage-collected. This leads to a less conservative locking strategy that should reduce contention on the TCP pcbinfo lock.
Discussed with: sam
|
139222 |
23-Dec-2004 |
rwatson |
Attempt to consistently use () around return values in calls to return() in newer code (sysctl, ISN, timewait).
|
139221 |
23-Dec-2004 |
rwatson |
Remove an XXXRW comment relating to whether or not the TCP timers are MPSAFE: they are now believed to be.
Correct a typo in a second comment.
MFC after: 2 weeks
|
139220 |
23-Dec-2004 |
rwatson |
Remove the now unused tcp_canceltimers() function. tcpcb timers are now stopped as part of tcp_discardcb().
MFC after: 2 weeks
|
139219 |
23-Dec-2004 |
rwatson |
Remove an annotation of a minor race relating to the update of multiple MIB entries using sysctl in short order, which might result in unexpected values for tcp_maxidle being generated by tcp_slowtimo. In practice, this will not happen, or at least, doesn't require an explicit comment.
MFC after: 2 weeks
|
138653 |
10-Dec-2004 |
glebius |
In certain cases ip_output() can free our route, so check for its presence before RTFREE().
Noticed by: ru
|
138652 |
10-Dec-2004 |
glebius |
Revert last change.
Andre: First let's get major new features into the kernel in a clean and nice way, and then start optimizing. In this case we don't have any obfuscation that makes later profiling and/or optimizing difficult in any way.
Requested by: csjp, sam
|
138642 |
10-Dec-2004 |
csjp |
This commit adds a shared locking mechanism very similar to the mechanism used by pfil. This shared locking mechanism will remove a nasty lock order reversal which occurs when ucred based rules are used which results in hard locks while mpsafenet=1.
So this removes the debug.mpsafenet=0 requirement when using ucred based rules with IPFW.
It should be noted that this locking mechanism does not guarantee fairness between read and write locks, and that it will favor firewall chain readers over writers. This seemed acceptable since write operations to firewall chains protected by this lock tend to be less frequent than reads.
Reviewed by: andre, rwatson Tested by: myself, seanc Silence on: ipfw@ MFC after: 1 month
|
138631 |
09-Dec-2004 |
glebius |
Check that DUMMYNET_LOADED before seeking dummynet m_tag.
Reviewed by: andre MFC after: 1 week
|
138615 |
09-Dec-2004 |
mlaier |
More fixing of multiple addresses in the same prefix. This time do not try to arp resolve "secondary" local addresses.
Found and submitted by: ru With additions from: OpenBSD (rev. 1.47) Reviewed by: ru
|
138499 |
06-Dec-2004 |
ru |
Time out routes created by redirect.
|
138470 |
06-Dec-2004 |
glebius |
- Make route caching optional, configurable via IFF_LINK0 flag. - Turn it off by default.
Requested by: many Reviewed by: andre Approved by: julian (mentor) MFC after: 3 days
|
138416 |
05-Dec-2004 |
rwatson |
Assert the tcptw inpcb lock in tcp_timer_2msl_reset(), as fields in the tcptw undergo non-atomic read-modify-writes.
MFC after: 2 weeks
|
138410 |
05-Dec-2004 |
rwatson |
Assert inpcb lock in:
tcpip_fillheaders() tcp_discardcb() tcp_close() tcp_notify() tcp_new_isn() tcp_xmit_bandwidth_limit()
Fix a locking comment in tcp_twstart(): the pcbinfo will be locked (and is asserted).
MFC after: 2 weeks
|
138409 |
05-Dec-2004 |
rwatson |
Minor grammar fix in comment.
|
138408 |
05-Dec-2004 |
rwatson |
Pass the inpcb reference into ip_getmoptions() rather than just the inp->inp_moptions pointer, so that ip_getmoptions() can perform necessary locking when doing non-atomic reads.
Lock the inpcb by default to copy any data to local variables, then unlock before performing sooptcopyout().
MFC after: 2 weeks
|
138407 |
05-Dec-2004 |
rwatson |
Define INP_UNLOCK_ASSERT() to assert that an inpcb is unlocked.
MFC after: 2 weeks
|
138404 |
05-Dec-2004 |
rwatson |
Push the inpcb argument into ip_setmoptions() when setting IP multicast socket options, so that it is available for locking.
|
138397 |
05-Dec-2004 |
rwatson |
Start working through inpcb locking for ip_ctloutput() by cleaning up modifications to the inpcb IP options mbuf:
- Lock the inpcb before passing it into ip_pcbopts() in order to prevent simultaneous reads and read-modify-writes that could result in races. - Pass the inpcb reference into ip_pcbopts() instead of the option chain pointer in the inpcb. - Assert the inpcb lock in ip_pcbopts(). - Convert one or two uses of a pointer as a boolean or an integer comparison to a comparison with NULL for readability.
|
138199 |
29-Nov-2004 |
ps |
Fixes a bug in SACK causing us to send data beyond the receive window.
Found by: Pawel Worach and Daniel Hartmeier Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com
|
138148 |
28-Nov-2004 |
rwatson |
Assert the inpcb lock in tcp_xmit_timer() as it performs read-modify- write of various time/rtt-related fields in the tcpcb.
|
138147 |
28-Nov-2004 |
rwatson |
Expand coverage of the receive socket buffer lock when handling urgent pointer updates: test available space while holding the socket buffer mutex, and continue to hold until the pointer update has been performed.
MFC after: 2 weeks
|
138136 |
27-Nov-2004 |
rwatson |
Do export the advertised receive window via the tcpi_rcv_space field of struct tcp_info.
|
138118 |
26-Nov-2004 |
rwatson |
Implement parts of the TCP_INFO socket option as found in Linux 2.6. This socket option allows processes to query a TCP socket for some low level transmission details, such as the current send, bandwidth, and congestion windows. Linux provides a 'struct tcpinfo' structure containing various variables, rather than separate socket options; this makes the API somewhat fragile as it makes it difficult to add new entries of interest as requirements and implementation evolve. As such, I've included a large pad at the end of the structure. Right now, relatively few of the Linux API fields are filled in, and some contain no logical equivalent on FreeBSD. I've included __'d entries in the structure to make it easier to figure out what is and isn't omitted. This API/ABI should be considered unstable for the time being.
|
138098 |
25-Nov-2004 |
silby |
Fix a problem where our TCP stack would ignore RST packets if the receive window was 0 bytes in size. This may have been the cause of unsolved "connection not closing" reports over the years.
Thanks to Michiel Boland for providing the fix and providing a concise test program for the problem.
Submitted by: Michiel Boland MFC after: 2 weeks
|
138040 |
23-Nov-2004 |
rwatson |
In tcp_reass(), assert the inpcb lock on the passed tcpcb, since the contents of the tcpcb are read and modified in volume.
In tcp_input(), replace th comparison with 0 with a comparison with NULL.
At the 'findpcb', 'dropafterack', and 'dropwithreset' labels in tcp_input(), assert 'headlocked'. Try to improve consistency between various assertions regarding headlocked to be more informative.
MFC after: 2 weeks
|
138025 |
23-Nov-2004 |
rwatson |
tcp_timewait() performs multiple non-atomic reads on the tcptw structure, so assert the inpcb lock associated with the tcptw. Also assert the tcbinfo lock, as tcp_timewait() may call tcp_twclose() or tcp_timer_2msl_reset(), which require it. Since tcp_timewait() is already called with that lock from tcp_input(), this doesn't change current locking, merely documents reasons for it.
In tcp_twstart(), assert the tcbinfo lock, as tcp_timer_2msl_reset() is called, which requires that lock.
In tcp_twclose(), assert the tcbinfo lock, as tcp_timer_2msl_stop() is called, which requires that lock.
Document the locking strategy for the time wait queues in tcp_timer.c, which consists of protecting the time wait queues in the same manner as the tcbinfo structure (using the tcbinfo lock).
In tcp_timer_2msl_reset(), assert the tcbinfo lock, as the time wait queues are modified.
In tcp_timer_2msl_stop(), assert the tcbinfo lock, as the time wait queues may be modified.
In tcp_timer_2msl_tw(), assert the tcbinfo lock, as the time wait queues may be modified.
MFC after: 2 weeks
|
138024 |
23-Nov-2004 |
rwatson |
De-spl tcp_slowtimo; tcp_maxidle assignment is subject to possible but unlikely races that could be corrected by having tcp_keepcnt and tcp_keepintvl modifications go through handler functions via sysctl, but probably is not worth doing. Updates to multiple sysctls within evaluation of a single addition are unlikely.
Annotate that tcp_canceltimers() is currently unused.
De-spl tcp_timer_delack().
De-spl tcp_timer_2msl().
MFC after: 2 weeks
|
138020 |
23-Nov-2004 |
rwatson |
Assert the inpcb lock in tcp_twstart(), which not only does read-modify-write on the tcpcb, but also calls into tcp_close() and tcp_twrespond().
Annotate that tcp_twrecycleable() requires the inpcb lock because it does a series of non-atomic reads of the tcpcb, but is currently called without the inpcb lock by the caller. This is a bug.
Assert the inpcb lock in tcp_twclose() as it performs a read-modify-write of the timewait structure/inpcb, and calls in_pcbdetach() which requires the lock.
Assert the inpcb lock in tcp_twrespond(), as it performs multiple non-atomic reads of the tcptw and inpcb structures, as well as calling mac_create_mbuf_from_inpcb(), tcpip_fillheaders(), which require the inpcb lock.
MFC after: 2 weeks
|
138019 |
23-Nov-2004 |
rwatson |
Assert inpcb lock in tcp_quench(), tcp_drop_syn_sent(), tcp_mtudisc(), and tcp_drop(), due to read-modify-write of TCP state variables.
MFC after: 2 weeks
|
138018 |
23-Nov-2004 |
rwatson |
Assert the tcbinfo write lock in tcp_new_isn(), as the tcbinfo lock protects access to the ISN state variables.
Acquire the tcbinfo write lock in tcp_isn_tick() to synchronize timer-driven isn bumping.
Staticize internal ISN variables since they're not used outside of tcp_subr.c.
MFC after: 2 weeks
|
137988 |
22-Nov-2004 |
rwatson |
Remove "Unlocked read" annotations associated with previously unlocked use of socket buffer fields in the TCP input code. These references are now protected by use of the receive socket buffer lock.
MFC after: 1 week
|
137971 |
21-Nov-2004 |
rwatson |
s/send/sent/ in comment describing TCPS_SYN_RECEIVED.
|
137860 |
18-Nov-2004 |
glebius |
- Since divert protocol is not connection oriented, remove SS_ISCONNECTED flag from divert sockets.
- Remove div_disconnect() method, since it shouldn't be called now.
- Remove div_abort() method. It was never called directly, since protocol doesn't have listen queue. It was called only from div_disconnect(), which is removed now.
Reviewed by: rwatson, maxim Approved by: julian (mentor) MT5 after: 1 week MT4 after: 1 month
|
137833 |
17-Nov-2004 |
mlaier |
Fix host route addition for more than one address to a loopback interface after allowing more than one address with the same prefix.
Reported by: Vladimir Grebenschikov <vova NO fbsd SPAM ru> Submitted by: ru (also NetBSD rev. 1.83) Pointyhat to: mlaier
|
137668 |
13-Nov-2004 |
mlaier |
Merge copyright notices.
Requested by: njl
|
137630 |
12-Nov-2004 |
glebius |
Fix ng_ksocket(4) operation as a divert socket, which is pretty useful and has been broken twice:
- in the beginning of div_output() replace KASSERT with assignment, as it was in rev. 1.83. [1] [to be MFCed]
- refactor changes introduced in rev. 1.100: do not prepend a new tag unconditionally. Before doing this check whether we have one. [2]
A small note for all hacking in this area: when divert socket is not a real userland, but ng_ksocket(4), we receive _the same_ mbufs, that we transmitted to socket. These mbufs have rcvif, the tags we've put on them. And we should treat them correctly.
Discussed with: mlaier [1] Silence from: green [2] Reviewed by: maxim Approved by: julian (mentor) MFC after: 1 week
|
137628 |
12-Nov-2004 |
mlaier |
Change the way we automatically add prefix routes when adding a new address. This makes it possible to have more than one address with the same prefix. The first address added is used for the route. On deletion of an address with IFA_ROUTE set, we try to find a "fallback" address and hand over the route if possible. I plan to MFC this in 4 weeks, hence I keep the - now obsolete - argument to in_ifscrub as it must be considered KAPI as it is not static in in.c. I will clean this after the MFC.
Discussed on: arch, net Tested by: many testers of the CARP patches Nits from: ru, Andrea Campi <andrea+freebsd_arch webcom it> Obtained from: WIDE via OpenBSD MFC after: 1 month
|
137584 |
11-Nov-2004 |
phk |
Add missing '='
Spotted by: obrien
|
137450 |
09-Nov-2004 |
andre |
Fix a double-free in the 'hlen > m->m_len' sanity check.
Bug report by: <james@towardex.com> MFC after: 2 weeks
|
137396 |
08-Nov-2004 |
suz |
support TCP-MD5(IPv4) in KAME-IPSEC, too.
MFC after: 3 weeks
|
137386 |
08-Nov-2004 |
phk |
Initialize struct pr_userreqs in new/sparse style and fill in common default elements in net_init_domain().
This makes it possible to grep these structures and see any bogosities.
|
137349 |
07-Nov-2004 |
rwatson |
Do some re-sorting of TCP pcbinfo locking and assertions: make sure to retain the pcbinfo lock until we're done using a pcb in the in-bound path, as the pcbinfo lock acts as a pseudo-reference to prevent the pcb from potentially being recycled. Clean up assertions and make sure to assert that the pcbinfo is locked at the head of code subsections where it is needed. Free the mbuf at the end of tcp_input after releasing any held locks to reduce the time the locks are held.
MFC after: 3 weeks
|
137302 |
06-Nov-2004 |
andre |
Fix a double-free in the 'm->m_len < sizeof (struct ip)' sanity check.
Bug report by: <james@towardex.com> MFC after: 2 weeks
|
137183 |
04-Nov-2004 |
phk |
Hide udp_in6 behind #ifdef INET6
|
137179 |
04-Nov-2004 |
bms |
When performing IP fast forwarding, immediately drop traffic which is destined for a blackhole route.
This also means that blackhole routes do not need to be bound to lo(4) or disc(4) interfaces for the net.inet.ip.fastforwarding=1 case.
Submitted by: james at towardex dot com Sponsored by: eXtensible Open Router Project <URL:http://www.xorp.org/> MFC after: 3 weeks
|
137176 |
04-Nov-2004 |
rwatson |
Until this change, the UDP input code used global variables udp_in, udp_in6, and udp_ip6 to pass socket address state between udp_input(), udp_append(), and soappendaddr_locked(). While fine in the default configuration, when running with multiple netisrs or direct ithread dispatch, this can result in races wherein user processes using recvmsg() get back the wrong source IP/port. To correct this and related races:
- Eliminate udp_ip6, which is believed to be generated but then never used. Eliminate ip_2_ip6_hdr() as it is now unneeded.
- Eliminate setting, testing, and existence of 'init' status fields for the IPv6 structures. While with multiple UDP delivery this could lead to amortization of IPv4 -> IPv6 conversion when delivering an IPv4 UDP packet to an IPv6 socket, it added substantial complexity and side effects.
- Move global structures into the stack, declaring udp_in in udp_input(), and udp_in6 in udp_append() to be used if a conversion is required. Pass &udp_in into udp_append().
- Re-annotate comments to reflect updates.
With this change, UDP appears to operate correctly in the presence of substantial inbound processing parallelism. This solution avoids introducing additional synchronization, but does increase the potential stack depth.
Discovered by: kris (Bug Magnet) MFC after: 3 weeks
|
137139 |
02-Nov-2004 |
andre |
Remove RFC1644 T/TCP support from the TCP side of the network stack.
A complete rationale and discussion is given in this message and the resulting discussion:
http://docs.freebsd.org/cgi/mid.cgi?4177C8AD.6060706
Note that this commit removes only the functional part of T/TCP from the tcp_* related functions in the kernel. Other features introduced with RFC1644 are left intact (socket layer changes, sendmsg(2) on connection oriented protocols) and are meant to be reused by a simpler and less intrusive reimplementation of the previous T/TCP functionality.
Discussed on: -arch
|
137066 |
30-Oct-2004 |
rwatson |
Correct a bug in TCP SACK that could result in wedging of the TCP stack under high load: only set function state to loop and continue sending if there is no data left to send.
RELENG_5_3 candidate.
Feet provided: Peter Losher <Peter underscore Losher at isc dot org> Diagnosed by: Daniel Hartmeier <daniel at benzedrine dot cx> Submitted by: mohan <mohans at yahoo-inc dot com>
|
136967 |
26-Oct-2004 |
rwatson |
Add a matching tunable for net.inet.tcp.sack.enable sysctl.
|
136960 |
26-Oct-2004 |
bms |
Check that rt_mask(rt) is non-NULL before dereferencing it, in the RTM_ADD case, thus avoiding a panic.
Submitted by: Iasen Kostov
|
136953 |
25-Oct-2004 |
andre |
IPDIVERT is a module now; tell the other parts of the kernel about it. IPDIVERT depends on IPFIREWALL being loaded or compiled into the kernel.
|
136910 |
24-Oct-2004 |
ru |
For variables that are only checked with defined(), don't provide any fake value.
|
136792 |
22-Oct-2004 |
andre |
Shave 40 unused bytes from struct tcpcb.
|
136790 |
22-Oct-2004 |
andre |
When printing the initialization string and IPDIVERT is not compiled into the kernel refer to it as "loadable" instead of "disabled".
|
136788 |
22-Oct-2004 |
andre |
Refuse to unload the ipdivert module unless the 'force' flag is given to kldunload.
Reflect the fact that IPDIVERT is a loadable module in the divert(4) and ipfw(8) man pages.
|
136717 |
19-Oct-2004 |
andre |
Destroy the UMA zone on unload.
|
136716 |
19-Oct-2004 |
andre |
Slightly extend the locking during unload to fully cover the protocol deregistration. This does not entirely close the race but narrows the even previously extremely small chance of a race some more.
|
136715 |
19-Oct-2004 |
rwatson |
Annotate a newly introduced race present due to the unloading of protocols: it is possible for sockets to be created and attached to the divert protocol between the test for sockets present and successful unload of the registration handler. We will need to explore more mature APIs for unregistering the protocol and then draining consumers, or an atomic test-and-unregister mechanism.
|
136714 |
19-Oct-2004 |
andre |
Convert IPDIVERT into a loadable module. This makes use of the dynamic loadability of protocols. The call to divert_packet() is done through a function pointer. All semantics of IPDIVERT remain intact. If IPDIVERT is not loaded ipfw will refuse to install divert rules and natd will complain about 'protocol not supported'. Once it is loaded both will work and accept rules and open the divert socket. The module can only be unloaded if no divert sockets are open. It does not close any divert sockets when an unload is requested but will return EBUSY instead.
|
136713 |
19-Oct-2004 |
andre |
Properly declare the "net.inet" sysctl subtree.
|
136712 |
19-Oct-2004 |
andre |
Pre-emptively define IPPROTO_SPACER to 32767, the same value as PROTO_SPACER to document that this value is globally assigned for a special purpose and may not be reused within the IPPROTO number space.
|
136695 |
19-Oct-2004 |
andre |
Make use of the PROTO_SPACER functionality for dynamically loadable protocols in inetsw[] and define initially eight spacer slots.
Remove conflicting declaration 'struct pr_usrreqs nousrreqs'. It is now declared and initialized in kern/uipc_domain.c.
|
136694 |
19-Oct-2004 |
andre |
Support for dynamically loadable and unloadable IP protocols in the ipmux.
With pr_proto_register() it has become possible to dynamically load protocols within the PF_INET domain. However the PF_INET domain has a second important structure called ip_protox[] that is derived from the 'struct protosw inetsw[]' and takes care of the de-multiplexing of the various protocols that ride on top of IP packets.
The functions ipproto_[un]register() allow the ip_protox[] array mux to be adjusted dynamically in a consistent and easy way. To register a protocol within ip_protox[] the existence of a corresponding and matching protocol definition in inetsw[] is required. The function does not allow overwriting an already registered protocol. The unregister function simply replaces the mux slot with the default index pointer to IPPROTO_RAW as it was previously.
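The register/unregister rules described above can be sketched in user space roughly as follows. This is a minimal illustration, not the kernel code: the toy_* names, the table sizes, and the chosen errno values are all assumptions made for the example.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, simplified stand-ins for inetsw[] and ip_protox[]. */
#define TOY_PROTO_MAX 256
#define TOY_NSW       4
#define TOY_RAW_IDX   3                       /* index of the raw (default) entry */

int toy_sw[TOY_NSW] = { 6, 17, 254, 255 };    /* protocol number of each switch entry */
unsigned char toy_mux[TOY_PROTO_MAX];

/* Point every mux slot at the default raw entry. */
void
toy_mux_init(void)
{
    int i;

    for (i = 0; i < TOY_PROTO_MAX; i++)
        toy_mux[i] = TOY_RAW_IDX;
}

/* Register a protocol; refuses to overwrite an existing registration. */
int
toy_register(int proto)
{
    int i;

    if (proto <= 0 || proto >= TOY_PROTO_MAX)
        return EINVAL;                        /* outside the mux table */
    if (toy_mux[proto] != TOY_RAW_IDX)
        return EEXIST;                        /* slot already taken */
    for (i = 0; i < TOY_NSW; i++) {
        if (toy_sw[i] == proto) {
            toy_mux[proto] = (unsigned char)i;
            return 0;
        }
    }
    return ENOENT;                            /* no matching switch entry */
}

/* Unregister a protocol by restoring the default raw index. */
int
toy_unregister(int proto)
{
    if (proto <= 0 || proto >= TOY_PROTO_MAX)
        return EINVAL;
    if (toy_mux[proto] == TOY_RAW_IDX)
        return ENOENT;                        /* nothing registered here */
    toy_mux[proto] = TOY_RAW_IDX;
    return 0;
}
```

The sketch uses "slot equals the raw index" as its only notion of "not registered", which is a simplification chosen for brevity.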
|
136691 |
19-Oct-2004 |
andre |
Add a macro for the destruction of INP_INFO_LOCK's used by loadable modules.
|
136690 |
19-Oct-2004 |
andre |
Make comments more clear. Change the order of one if() statement to check the more likely variable first.
|
136682 |
18-Oct-2004 |
rwatson |
Push acquisition of the accept mutex out of sofree() into the caller (sorele()/sotryfree()):
- This permits the caller to acquire the accept mutex before the socket mutex, avoiding sofree() having to drop the socket mutex and re-order, which could lead to races permitting more than one thread to enter sofree() after a socket is ready to be free'd.
- This also covers clearing of the so_pcb weak socket reference from the protocol to the socket, preventing races in clearing and evaluation of the reference such that sofree() might be called more than once on the same socket.
This appears to close a race I was able to easily trigger by repeatedly opening and resetting TCP connections to a host, in which the tcp_close() code called as a result of the RST raced with the close() of the accepted socket in the user process resulting in simultaneous attempts to de-allocate the same socket. The new locking increases the overhead for operations that may potentially free the socket, so we will want to revise the synchronization strategy here as we normalize the reference counting model for sockets. The use of the accept mutex in freeing of sockets that are not listen sockets is primarily motivated by the potential need to remove the socket from the incomplete connection queue on its parent (listen) socket, so cleaning up the reference model here may allow us to substantially weaken the synchronization requirements.
RELENG_5_3 candidate.
MFC after: 3 days Reviewed by: dwhite Discussed with: gnn, dwhite, green Reported by: Marc UBM Bocklet <ubm at u-boot-man dot de> Reported by: Vlad <marchenko at gmail dot com>
|
136449 |
12-Oct-2004 |
rwatson |
Don't release the udbinfo lock until after the last use of UDP inpcb in udp_input(), since the udbinfo lock is used to prevent removal of the inpcb while in use (i.e., as a form of reference count) in the in-bound path.
RELENG_5 candidate.
|
136441 |
12-Oct-2004 |
rwatson |
Modify the thrilling "%D is using my IP address %s!" message so that it isn't printed if the IP address in question is '0.0.0.0', which is used by nodes performing DHCP lookup, and so constitute a false positive as a report of misconfiguration.
|
136440 |
12-Oct-2004 |
rwatson |
When the access control on creating raw sockets was modified so that processes in jail could create raw sockets, additional access control checks were added to raw IP sockets to limit the ways in which those sockets could be used. Specifically, only the socket option IP_HDRINCL was permitted in rip_ctloutput(). Other socket options were protected by a call to suser(). This change was required to prevent processes in a Jail from modifying system properties such as multicast routing and firewall rule sets.
However, it also introduced a regression: processes that create a raw socket with root privilege, but then downgraded credential (i.e., a daemon giving up root, or a setuid process switching back to the real uid) could no longer issue other unprivileged generic IP socket option operations, such as IP_TOS, IP_TTL, and the multicast group membership options, which prevented multicast routing daemons (and some other tools) from operating correctly.
This change pushes the access control decision down to the granularity of individual socket options, rather than all socket options, on raw IP sockets. When rip_ctloutput() doesn't implement an option, it will now pass the request directly to in_control() without an access control check. This should restore the functionality of the generic IP socket options for raw sockets in the above-described scenarios, which may be confirmed with the ipsockopt regression test.
RELENG_5 candidate.
Reviewed by: csjp
|
136327 |
09-Oct-2004 |
rwatson |
Acquire the send socket buffer lock around tcp_output() activities reaching into the socket buffer. This prevents a number of potential races, including dereferencing of sb_mb while unlocked leading to a NULL pointer deref (how I found it). Potentially this might also explain other "odd" TCP behavior on SMP boxes (although haven't seen it reported).
RELENG_5 candidate.
|
136226 |
07-Oct-2004 |
rwatson |
When running with debug.mpsafenet=0, initialize IP multicast routing callouts as non-CALLOUT_MPSAFE. Otherwise, they may trigger an assertion regarding Giant if they enter other parts of the stack from the callout.
MFC after: 3 days Reported by: Dikshie < dikshie at ppk dot itb dot ac dot id >
|
136151 |
05-Oct-2004 |
ps |
- Estimate the amount of data in flight in sack recovery and use it to control the packets injected while in sack recovery (for both retransmissions and new data).
- Cleanups to the sack codepaths in tcp_output.c and tcp_sack.c.
- Add a new sysctl (net.inet.tcp.sack.initburst) that controls the number of sack retransmissions done upon initiation of sack recovery.
Submitted by: Mohan Srinivasan <mohans@yahoo-inc.com>
|
136075 |
03-Oct-2004 |
green |
Add support to IPFW for matching by TCP data length.
|
136073 |
03-Oct-2004 |
green |
Add support to IPFW for classification based on "diverted" status (that is, input via a divert socket).
|
136071 |
03-Oct-2004 |
green |
Add to IPFW the ability to do ALTQ classification/tagging.
|
135977 |
30-Sep-2004 |
green |
Validate the action pointer to be within the rule size, so that trying to add corrupt ipfw rules would not potentially panic the system or worse.
|
135920 |
29-Sep-2004 |
mlaier |
Add an additional struct inpcb * argument to pfil(9) in order to enable passing along socket information. This is required to work around a LOR with the socket code which results in an easily reproducible hard lockup with debug.mpsafenet=1. This commit does *not* fix the LOR, but enables us to do so later. The missing piece is to turn the filter locking into a leaf lock and will follow in a separate (later) commit.
This will hopefully be MT5'ed in order to fix the problem for RELENG_5 in the foreseeable future.
Suggested by: rwatson A lot of work by: csjp (he'd be even more helpful w/o mentor-reviews ;) Reviewed by: rwatson, csjp Tested by: -pf, -ipfw, LINT, csjp and myself MFC after: 3 days
LOR IDs: 14 - 17 (not fixed yet)
|
135919 |
29-Sep-2004 |
rwatson |
Assign so_pcb to NULL rather than 0 as it's a pointer.
Spotted by: dwhite
|
135731 |
24-Sep-2004 |
maxim |
o Turn net.inet.ip.check_interface sysctl off by default.
When net.inet.ip.check_interface was MFCed to RELENG_4 3+ years ago in rev. 1.130.2.17 ip_input.c it was 1 by default but shortly changed to 0 (accidentally?) in rev. 1.130.2.20 in RELENG_4 only. Along with the fact that this knob is not documented, it breaks POLA, especially in bridge environments.
OK'ed by: andre Reviewed by: -current
|
135318 |
16-Sep-2004 |
andre |
Fix an out of bounds write during the initialization of the PF_INET protocol family to the ip_protox[] array. The protocol number of IPPROTO_DIVERT is larger than IPPROTO_MAX and was initializing memory beyond the array. Catch all these kinds of errors by ignoring protocols that are higher than IPPROTO_MAX or 0 (zero).
Add more comments to ip_init().
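A bounds-checked table initialization of the kind this entry describes could be sketched as follows. All toy_* names and constants are hypothetical, and treating "at or beyond the table size" as out of range is an assumption of the sketch rather than a statement about the exact kernel condition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the kernel's definitions. */
#define TOY_IPPROTO_MAX    256   /* size of the demux table */
#define TOY_IPPROTO_DIVERT 258   /* deliberately larger than the table */

struct toy_protosw {
    int pr_protocol;             /* IP protocol number of this entry */
};

unsigned char toy_protox[TOY_IPPROTO_MAX];

/*
 * Fill the demux table from the protocol switch. Every slot first gets
 * the index of the raw entry; entries whose protocol number is 0 or does
 * not fit in the table are skipped instead of being written.
 * Returns the number of skipped entries.
 */
int
toy_protox_init(const struct toy_protosw *sw, size_t nsw, size_t raw_index)
{
    size_t i;
    int skipped = 0;

    assert(sw != NULL && raw_index < nsw);
    for (i = 0; i < TOY_IPPROTO_MAX; i++)
        toy_protox[i] = (unsigned char)raw_index;

    for (i = 0; i < nsw; i++) {
        int p = sw[i].pr_protocol;

        if (p <= 0 || p >= TOY_IPPROTO_MAX) {
            skipped++;           /* would have indexed outside the table */
            continue;
        }
        toy_protox[p] = (unsigned char)i;
    }
    return skipped;
}
```

Without the range check, the entry numbered 258 would store one byte two slots past the end of a 256-element array, which is the class of bug the commit message reports.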
|
135275 |
15-Sep-2004 |
andre |
Clarify some comments for the M_FASTFWD_OURS case in ip_input().
|
135274 |
15-Sep-2004 |
andre |
Remove the last two global variables that are used to store packet state while it travels through the IP stack. This wasn't much of a problem because IP source routing is disabled by default but when enabled together with SMP and preemption it would have very likely cross-corrupted the IP options in transit.
The IP source route options of a packet are now stored in a mtag instead of the global variable.
|
135168 |
13-Sep-2004 |
andre |
Do not allow 'ipfw fwd' command when IPFIREWALL_FORWARD is not compiled into the kernel. Return EINVAL instead.
|
135167 |
13-Sep-2004 |
andre |
If we have to 'ipfw fwd'-tag a packet the second time in ipfw_pfil_out() don't prepend an already existing tag again. Instead unlink it and prepend it again to have it as the first tag in the chain.
PR: kern/71380
|
135160 |
13-Sep-2004 |
andre |
Make comments more clear for the packet changed cases after pfil hooks.
|
135158 |
13-Sep-2004 |
andre |
Fix ip_input() fallback for the destination modified cases (from the packet filters). After the ipfw to pfil move ip_input() expects M_FASTFWD_OURS tagged packets to have ip_len and ip_off in host byte order instead of network byte order.
PR: kern/71652 Submitted by: mlaier (patch)
|
135154 |
13-Sep-2004 |
andre |
Make 'ipfw tee' behave as intended and designed. A tee'd packet is copied and sent to the DIVERT socket while the original packet continues with the next rule. Unlike a normally diverted packet no IP reassembly attempts are made on tee'd packets and they are passed upwards totally unmodified.
Note: This will not be MFC'd to 4.x because of major infrastructure changes.
PR: kern/64240 (and many others collapsed into that one)
|
134991 |
09-Sep-2004 |
glebius |
Always check the do_bridge flag, even if the kernel was compiled without BRIDGE support. This makes a dynamically loaded bridge.ko work.
Reviewed by: sam Approved by: julian (mentor) MFC after: 1 week
|
134852 |
06-Sep-2004 |
jmg |
revert comment from rev1.158 now that rev1.225 backed it out..
MFC after: 3 days
|
134823 |
05-Sep-2004 |
glebius |
Recover normal behavior: return EINVAL on an attempt to add a divert rule when the module is built without IPDIVERT.
Silence from: andre Approved by: julian (mentor)
|
134793 |
05-Sep-2004 |
jmg |
fix up socket/ip layer violation... don't assume/know that SO_DONTROUTE == IP_ROUTETOIF and SO_BROADCAST == IP_ALLOWBROADCAST...
|
134391 |
27-Aug-2004 |
andre |
Apply error and success logic consistently to the function netisr_queue() and its users.
netisr_queue() now returns (0) on success and ERRNO on failure. At the moment ENXIO (netisr queue not functional) and ENOBUFS (netisr queue full) are supported.
Previously it would return (1) on success but the return value of IF_HANDOFF() was interpreted wrongly and (0) was actually returned on success. Due to this schednetisr() was never called to kick the scheduling of the isr. However this was masked by other normal packets coming through netisr_dispatch() causing the dequeueing of waiting packets.
PR: kern/70988 Found by: MOROHOSHI Akihiko <moro@remus.dti.ne.jp> MFC after: 3 days
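The "0 on success, errno on failure" convention described in this entry can be illustrated with a small user-space queue. Everything named toy_* is hypothetical; the kick counter merely stands in for scheduling the consumer and is not how the kernel does it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_QLEN 2

/* Hypothetical queue; 'enabled == 0' models a non-functional queue. */
struct toy_queue {
    int   enabled;
    int   len;
    void *slots[TOY_QLEN];
    int   kicks;               /* how often the consumer was scheduled */
};

/* Returns 0 on success, or an errno value describing the failure. */
int
toy_queue_put(struct toy_queue *q, void *pkt)
{
    assert(q != NULL);
    if (!q->enabled)
        return ENXIO;          /* queue not functional */
    if (q->len == TOY_QLEN)
        return ENOBUFS;        /* queue full */
    q->slots[q->len++] = pkt;
    q->kicks++;                /* only kick the consumer on success */
    return 0;
}
```

A caller can then write `if ((error = toy_queue_put(&q, pkt)) != 0)` and treat every non-zero value as a failure, which avoids the inverted success test the entry describes.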
|
134385 |
27-Aug-2004 |
andre |
When the destination of a packet was changed by the packet filter to point to a local IP address, and the packet was sourced from this host, we fill in the m_pkthdr.rcvif with a pointer to the loopback interface.
Before the function ifunit("lo0") was used to obtain the ifp. However this is sub-optimal from a performance point of view and might be dangerous if the loopback interface has been renamed. Use the global variable 'loif' instead which always points to the loopback interface.
Submitted by: brooks
|
134384 |
27-Aug-2004 |
andre |
Remove a junk line left over from the recent IPFW to PFIL_HOOKS conversion.
|
134383 |
27-Aug-2004 |
andre |
Always compile PFIL_HOOKS into the kernel and remove the associated kernel compile option. All FreeBSD packet filters now use the PFIL_HOOKS API and thus it becomes a standard part of the network stack.
If no hooks are connected the entire packet filter hooks section and related activities are jumped over. This removes any performance impact if no hooks are active.
Both OpenBSD and DragonFlyBSD have integrated PFIL_HOOKS permanently as well.
|
134346 |
26-Aug-2004 |
ru |
Revert the last change to sys/modules/ipfw/Makefile and fix a standalone module build in a better way.
Silence from: andre MFC after: 3 days
|
134290 |
25-Aug-2004 |
pjd |
Allocate memory when dumping pipes with M_WAITOK flag. On a system with a huge number of pipes, M_NOWAIT fails almost always, because of memory fragmentation. My fix is different than the patch proposed by Pawel Malachowski, because in FreeBSD 5.x we cannot sleep while holding dummynet mutex (in 4.x there is no such lock). My fix is also ugly, but there is no easy way to prepare a nice and clean fix.
PR: kern/46557 Submitted by: Eugene Grosbein <eugen@grosbein.pp.ru> Reviewed by: mlaier
|
134172 |
22-Aug-2004 |
mlaier |
Allow early drop for non-ALTQ enabled queues in an ALTQ-enabled kernel. Previously the early drop was disabled unconditionally for ALTQ-enabled kernels.
This should give some benefit for the normal gateway + LAN-server case with a busy LAN leg and an ALTQ managed uplink.
Reviewed and style help from: cperciva, pjd
|
134142 |
22-Aug-2004 |
rwatson |
When sliding the m_data pointer forward, update m_pkthdr.len as well as m_len, or the pkthdr length will be inconsistent with the actual length of data in the mbuf chain. The symptom of this occurring was "out of data" warnings from in_cksum_skip() on large UDP packets sent via the loopback interface.
Foot shot: green
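The invariant behind this fix (the packet header length must equal the sum of the per-buffer lengths) can be shown with a toy buffer chain. The structure below is a hypothetical simplification and not the real mbuf layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, flattened stand-in for an mbuf with a packet header. */
struct toy_mbuf {
    char            *m_data;      /* start of valid data in this buffer */
    int              m_len;       /* bytes of data in this buffer */
    int              pkthdr_len;  /* total packet length (first buffer only) */
    struct toy_mbuf *m_next;
};

/* Slide the data pointer of the first buffer forward by n bytes. */
void
toy_adj_front(struct toy_mbuf *m, int n)
{
    assert(m != NULL && n >= 0 && n <= m->m_len);
    m->m_data += n;
    m->m_len -= n;
    m->pkthdr_len -= n;           /* the update whose omission caused the bug */
}

/* Sum the per-buffer lengths along the chain. */
int
toy_chain_len(const struct toy_mbuf *m)
{
    int total = 0;

    for (; m != NULL; m = m->m_next)
        total += m->m_len;
    return total;
}
```

If the `pkthdr_len` adjustment is dropped, the header claims more data than the chain holds, which matches the "out of data" symptom mentioned in the entry.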
|
134122 |
21-Aug-2004 |
csjp |
When a prison is given the ability to create raw sockets (when the security.jail.allow_raw_sockets sysctl MIB is set to 1) where privileged access to jails is given out, it is possible for prison root to manipulate various network parameters which affect the host environment. This commit plugs a number of security holes associated with the use of raw sockets and prisons.
This commit makes the following changes:
- Add a comment to rtioctl warning developers that if they add any ioctl commands, they should use super-user checks where necessary, as it is possible for PRISON root to make it this far in execution.
- Add super-user checks for the execution of the SIOCGETVIFCNT and SIOCGETSGCNT IP multicast ioctl commands.
- Add a super-user check to rip_ctloutput(). If the calling cred is PRISON root, make sure the socket option name is IP_HDRINCL, otherwise deny the request.
Although this patch corrects a number of security problems associated with raw sockets and prisons, the warning in jail(8) should still apply, and by default we should keep the default value of security.jail.allow_raw_sockets MIB to 0 (or disabled) until we are certain that we have tracked down all the problems.
Looking forward, we will probably want to eliminate the references to curthread.
This may be a MFC candidate for RELENG_5.
Reviewed by: rwatson Approved by: bmilekic (mentor)
|
134119 |
21-Aug-2004 |
rwatson |
When prepending space onto outgoing UDP datagram payloads to hold the UDP/IP header, make sure that space is also allocated for the link layer header. If an mbuf must be allocated to hold the UDP/IP header (very likely), then this will avoid an additional mbuf allocation at the link layer. This trick is also used by TCP and other protocols to avoid extra calls to the mbuf allocator in the ethernet (and related) output routines.
|
134055 |
20-Aug-2004 |
andre |
Fix a stupid typo which prevented an ipfw KLD unload from successfully cleaning up its remains. Do not terminate 'if' lines with ';'.
Spotted by: claudio@OpenBSD.ORG (sitting 3m from my desk) Pointy hat to: andre
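The log does not show the offending line, so the following is only an assumed shape of this class of typo: a stray ';' after the condition makes the 'if' control an empty statement, and the statement that follows runs unconditionally. The toy_* names are hypothetical.

```c
#include <assert.h>

int toy_destroyed = 0;         /* set to 1 once cleanup has run */

/* Buggy shape: cleanup is never reached, even when unhooking succeeded. */
int
toy_unload_buggy(int unhook_err)
{
    if (unhook_err != 0);      /* stray ';': the if controls an empty statement */
        return unhook_err;     /* so this return always executes */
    toy_destroyed = 1;         /* never reached */
    return 0;
}

/* Fixed shape: return early only when unhooking actually failed. */
int
toy_unload_fixed(int unhook_err)
{
    if (unhook_err != 0)
        return unhook_err;
    toy_destroyed = 1;
    return 0;
}
```

Compilers typically warn about the buggy form (empty body, misleading indentation) only when extra warnings are enabled, which is one reason such typos survive.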
|
134049 |
19-Aug-2004 |
andre |
When unloading ipfw module use callout_drain() to make absolutely sure that all callouts are stopped and finished. Move it before IPFW_LOCK() to avoid deadlocking when draining callouts.
|
134041 |
19-Aug-2004 |
andre |
For IPv6 access pointer to tcpcb only after we have checked it is valid.
Found by: Coverity's automated analysis (via Ted Unangst)
|
134026 |
19-Aug-2004 |
andre |
Give a useful error message if someone tries to compile IPFIREWALL into the kernel without specifying PFIL_HOOKS as well.
|
134023 |
19-Aug-2004 |
andre |
Do not unconditionally ignore IPDIVERT and IPFIREWALL_FORWARD when building the ipfw KLD.
For IPFIREWALL_FORWARD this does not have any side effects. If the module has it but not the kernel it just doesn't do anything.
For IPDIVERT the KLD will fail to load if the kernel doesn't have IPDIVERT compiled in too. However this is the least disturbing behaviour. The user can just recompile either module or the kernel to match the other one. The access to the machine is not denied if ipfw refuses to load.
|
134022 |
19-Aug-2004 |
andre |
Bring back the sysctl 'net.inet.ip.fw.enable' to unbreak the startup scripts and to be able to disable ipfw if it was compiled directly into the kernel.
|
133994 |
19-Aug-2004 |
rwatson |
Push down pcbinfo and inpcb locking from udp_send() into udp_output(). This provides greater context for the locking and allows us to avoid locking the pcbinfo structure if no binding operations will take place (i.e., already bound, connected, and no explicit sendto() address).
|
133993 |
19-Aug-2004 |
rwatson |
In in_pcbrehash(), do assert the inpcb lock as well as the pcbinfo lock.
|
133923 |
18-Aug-2004 |
rwatson |
Fix build of ip_input.c with "options IPSEC" -- the "pass:" label is used with both FAST_IPSEC and IPSEC, but was defined for only FAST_IPSEC.
|
133922 |
18-Aug-2004 |
peter |
Make the kernel compile again if you are not using PFIL_HOOKS
|
133920 |
17-Aug-2004 |
andre |
Convert ipfw to use PFIL_HOOKS. This change is transparent to userland and preserves the ipfw ABI. The ipfw core packet inspection and filtering functions have not been changed, only how ipfw is invoked is different.
However there are many changes to how ipfw and its add-ons are handled:
In general ipfw is now called through the PFIL_HOOKS and most associated magic, that was in ip_input() or ip_output() previously, is now done in ipfw_check_[in|out]() in the ipfw PFIL handler.
IPDIVERT is entirely handled within the ipfw PFIL handlers. A packet to be diverted is checked if it is fragmented, if yes, ip_reass() gets in for reassembly. If not, or all fragments arrived and the packet is complete, divert_packet is called directly. For 'tee' no reassembly attempt is made and a copy of the packet is sent to the divert socket unmodified. The original packet continues its way through ip_input/output().
ipfw 'forward' is done via m_tag's. The ipfw PFIL handlers tag the packet with the new destination sockaddr_in. A check if the new destination is a local IP address is made and the m_flags are set appropriately. ip_input() and ip_output() have some more work to do here. For ip_input() the m_flags are checked and a packet for us is directly sent to the 'ours' section for further processing. Destination changes on the input path are only tagged and the 'srcrt' flag to ip_forward() is set to disable destination checks and ICMP replies at this stage. The tag is going to be handled on output. ip_output() again checks for m_flags and the 'ours' tag. If found, the packet will be dropped back to the IP netisr where it is going to be picked up by ip_input() again and then directly sent to the 'ours' section. When only the destination changes, the route's 'dst' is overwritten with the new destination from the forward m_tag. Then it jumps back at the route lookup again and skips the firewall check because it has been marked with M_SKIP_FIREWALL. ipfw 'forward' has to be compiled into the kernel with 'option IPFIREWALL_FORWARD' to enable it.
DUMMYNET is entirely handled within the ipfw PFIL handlers. A packet for a dummynet pipe or queue is directly sent to dummynet_io(). Dummynet will then inject it back into ip_input/ip_output() after it has served its time. Dummynet packets are tagged and will continue from the next rule when they hit the ipfw PFIL handlers again after re-injection.
BRIDGING and IPFW_ETHER are not changed yet and use ipfw_chk() directly as they did before. Later this will be changed to dedicated ETHER PFIL_HOOKS.
More detailed changes to the code:
conf/files Add netinet/ip_fw_pfil.c.
conf/options Add IPFIREWALL_FORWARD option.
modules/ipfw/Makefile Add ip_fw_pfil.c.
net/bridge.c Disable PFIL_HOOKS if ipfw for bridging is active. Bridging ipfw is still directly invoked to handle layer2 headers and packets would get a double ipfw when run through PFIL_HOOKS as well.
netinet/ip_divert.c Removed divert_clone() function. It is no longer used.
netinet/ip_dummynet.[ch] Neither the route 'ro' nor the destination 'dst' need to be stored while in dummynet transit. Structure members and associated macros are removed.
netinet/ip_fastfwd.c Removed all direct ipfw handling code and replace it with the new 'ipfw forward' handling code.
netinet/ip_fw.h Removed 'ro' and 'dst' from struct ip_fw_args.
netinet/ip_fw2.c (Re)moved some global variables and the module handling.
netinet/ip_fw_pfil.c New file containing the ipfw PFIL handlers and module initialization.
netinet/ip_input.c Removed all direct ipfw handling code and replace it with the new 'ipfw forward' handling code. ip_forward() no longer requires the 'next_hop' struct sockaddr_in argument. Disable early checks if 'srcrt' is set.
netinet/ip_output.c Removed all direct ipfw handling code and replace it with the new 'ipfw forward' handling code.
netinet/ip_var.h Add ip_reass() as general function. (Used from ipfw PFIL handlers for IPDIVERT.)
netinet/raw_ip.c Directly check if ipfw and dummynet control pointers are active.
netinet/tcp_input.c Rework the 'ipfw forward' to local code to work with the new way of forward tags.
netinet/tcp_sack.c Remove include 'opt_ipfw.h' which is not needed here.
sys/mbuf.h Remove m_claim_next() macro which was exclusively for ipfw 'forward' and is no longer needed.
Approved by: re (scottl)
|
133874 |
16-Aug-2004 |
rwatson |
White space cleanup for netinet before branch:
- Trailing tab/space cleanup
- Remove spurious spaces between or before tabs
This change avoids touching files that Andre likely has in his working set for PFIL hooks changes for IPFW/DUMMYNET.
Approved by: re (scottl) Submitted by: Xin LI <delphij@frontfree.net>
|
133849 |
16-Aug-2004 |
obrien |
Put the 'antispoof' opcode in the proper place in the opcode list such that it doesn't break the ipfw2 ABI.
|
133720 |
14-Aug-2004 |
dwmalone |
Get rid of the RANDOM_IP_ID option and make it a sysctl. NetBSD have already done this, so I have styled the patch on their work:
1) introduce an ip_newid() static inline function that checks the sysctl and then decides if it should return a sequential or random IP ID.
2) named the sysctl net.inet.ip.random_id
3) IPv6 flow IDs and fragment IDs are now always random. Flow IDs and frag IDs are significantly less common in the IPv6 world (ie. rarely generated per-packet), so there should be smaller performance concerns.
The sysctl defaults to 0 (sequential IP IDs).
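The selection logic described above can be sketched in userland C. This is only an illustration: the variable ip_random_id stands in for the net.inet.ip.random_id sysctl, ip_id_counter is a hypothetical counter, and rand() is a placeholder for whatever random source the kernel actually uses.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the net.inet.ip.random_id sysctl (0 = sequential). */
static int ip_random_id = 0;

/* Hypothetical counter used while random IDs are disabled. */
static uint16_t ip_id_counter = 0;

/* Return the next IP ID: random if the knob is set, sequential otherwise. */
static inline uint16_t
ip_newid(void)
{
	if (ip_random_id)
		return ((uint16_t)(rand() & 0xffff));	/* placeholder RNG */
	return (ip_id_counter++);
}
```

With the knob left at its default of 0, successive calls return consecutive values; setting it to a non-zero value switches every later call to the random path.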
Reviewed by: andre, silby, mlaier, ume Based on: NetBSD MFC after: 2 months
|
133719 |
14-Aug-2004 |
phk |
Fix outgoing ICMP on global instance.
|
133600 |
12-Aug-2004 |
csjp |
Add the ability to associate ipfw rules with a specific prison ID. Since the only thing truly unique about a prison is its ID, I figured this would be the most granular way of handling this.
This commit makes the following changes:
- Adds tokenizing and parsing for the ``jail'' command line option to the ipfw(8) userspace utility.
- Append the ipfw opcode list with O_JAIL.
- While I am here, add a comment informing others that if they want to add additional opcodes, they should append them to the end of the list to avoid ABI breakage.
- Add ``fw_prid'' to the ipfw ucred cache structure.
- When initializing ucred cache, if the process is jailed, set fw_prid to the prison ID, otherwise set it to -1.
- Update man page to reflect these changes.
This change was a strong motivator behind the ucred caching mechanism in ipfw.
A sample usage of this new functionality could be:
ipfw add count ip from any to any jail 2
It should be noted that because ucred based constraints are only implemented for TCP and UDP packets, the same applies for jail associations.
Conceptual head nod by: pjd Reviewed by: rwatson Approved by: bmilekic (mentor)
|
133591 |
12-Aug-2004 |
dwmalone |
In tcp6_ctlinput, lock tcbinfo around the call to syncache_unreach so that the locks held are the same as the IPv4 case.
Reviewed by: rwatson
|
133557 |
12-Aug-2004 |
andre |
Fix two cases of incorrect IPQ_UNLOCK'ing in the merged ip_reass() function. The first one was going to 'dropfrag', which unlocks the IPQ, before the lock was acquired; the second one was doing an unlock and then a 'goto dropfrag', which led to a double-unlock.
Tripped over by: des
|
133532 |
12-Aug-2004 |
rwatson |
When udp_send() fails, make sure to free the control mbufs as well as the data mbuf. This was done in most error cases, but not the case where the inpcb pointer is surprisingly NULL.
|
133517 |
11-Aug-2004 |
andre |
Back out removal of the UMA_ZONE_NOFREE flag for all zones which are established for structures with timers in them. A timer might fire even when the associated structure has already been free'd. Having type-stable storage in this case is beneficial for graceful failure handling and debugging.
Discussed with: bosko, tegge, rwatson
|
133509 |
11-Aug-2004 |
andre |
Remove the UMA_ZONE_NOFREE flag from all uma_zcreate() calls in the IP and TCP code. This flag would have prevented giving back excessive free slabs to the global pool after a transient peak usage.
|
133497 |
11-Aug-2004 |
andre |
Make use of in_localip() function and replace previous direct LIST_FOREACH loops over INADDR_HASH.
|
133486 |
11-Aug-2004 |
andre |
Add the function in_localip() which returns 1 if an internet address is for the local host and configured on one of its interfaces.
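A minimal userland sketch of the lookup follows. The kernel version walks its per-interface address hash; here a hypothetical fixed table (local_addrs) stands in for that, and the function is named in_localip_sketch to make clear it is not the kernel routine.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table standing in for the addresses configured on interfaces. */
static const uint32_t local_addrs[] = {
	0x7f000001u,	/* 127.0.0.1 */
	0xc0a80001u	/* 192.168.0.1 */
};

/* Return 1 if addr is configured locally, 0 otherwise. */
static int
in_localip_sketch(uint32_t addr)
{
	size_t i;

	for (i = 0; i < sizeof(local_addrs) / sizeof(local_addrs[0]); i++)
		if (local_addrs[i] == addr)
			return (1);
	return (0);
}
```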
|
133485 |
11-Aug-2004 |
andre |
Only invoke verify_path() for verrevpath and versrcreach when we have an IP packet.
|
133482 |
11-Aug-2004 |
andre |
Only check for local broadcast addresses if the mbuf is flagged with M_BCAST.
|
133481 |
11-Aug-2004 |
andre |
Consistently use NULL for pointer comparisons.
|
133480 |
11-Aug-2004 |
andre |
Make IP fastforwarding ALTQ-aware by adding the input traffic conditioner check and disabling the early output interface queue length check.
|
133477 |
11-Aug-2004 |
andre |
Correct the displayed bandwidth calculation for a readout via sysctl. The saved value does not have to be scaled with HZ; it is already in bytes per second. Only the multiply by eight remains to show bits per second (bps).
|
133469 |
11-Aug-2004 |
rwatson |
Assert the locks of inpcbinfo's and inpcb's passed into in_pcbconnect() and in_pcbconnect_setup(), since these functions frob the port and address state of inpcbs.
|
133390 |
09-Aug-2004 |
andre |
Make a comment that IP source routing is not SMP and PREEMPTION safe.
|
133389 |
09-Aug-2004 |
andre |
Make a comment that "ipfw forward" is not SMP and PREEMPTION safe.
|
133387 |
09-Aug-2004 |
andre |
New ipfw option "antispoof":
For incoming packets, the packet's source address is checked to see whether it belongs to a directly connected network. If the network is directly connected, then the interface the packet came in on is compared to the interface the network is connected to. When the incoming interface and the directly connected interface are not the same, the packet does not match.
Usage example:
ipfw add deny ip from any to any not antispoof in
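The matching rule can be sketched as below. The struct connected_net table and the function name are hypothetical, and the sketch assumes that a source address on no directly connected network counts as a match, which the description above does not spell out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one directly connected network. */
struct connected_net {
	uint32_t net;		/* network address, host byte order */
	uint32_t mask;		/* netmask, host byte order */
	int	 ifindex;	/* interface the network is attached to */
};

/*
 * Return 1 if the packet matches "antispoof", 0 otherwise.
 * Assumption: a source on no directly connected network matches.
 */
static int
antispoof_sketch(uint32_t src, int recv_ifindex,
    const struct connected_net *nets, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if ((src & nets[i].mask) == nets[i].net)
			return (nets[i].ifindex == recv_ifindex);
	}
	return (1);
}
```

Under the usage example above, a packet for which this returns 0 is the one that "not antispoof" selects and the rule denies.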
Manpage education by: ru
|
133192 |
06-Aug-2004 |
rwatson |
Pass pcbinfo structures to in6_pcbnotify() rather than pcbhead structures, allowing in6_pcbnotify() to lock the pcbinfo and each inpcb that it notifies of ICMPv6 events. This prevents inpcb assertions from firing when IPv6 generates and delivers event notifications for inpcbs.
Reported by: kuriyama Tested by: kuriyama
|
133189 |
06-Aug-2004 |
rwatson |
When iterating the UDP inpcb list processing an inbound broadcast or multicast packet, we don't need to acquire the inpcb mutex unless we are actually using inpcb fields other than the bound port and address. Since we hold the pcbinfo lock already, these can't change. Defer acquiring the inpcb mutex until we have a high chance of a match. This avoids about 120 mutex operations per UDP broadcast packet received on one of my work systems.
Reviewed by: sam
|
133128 |
04-Aug-2004 |
rwatson |
Now that IPv6 performs basic in6pcb and inpcb locking, enable inpcb lock assertions even if IPv6 is compiled into the kernel. Previously, inclusion of IPv6 and locking assertions would result in a rapid assertion failure as IPv6 was not properly locking inpcbs.
|
133121 |
04-Aug-2004 |
marcus |
Fix Skinny and PPTP NAT'ing after the introduction of the {ip,tcp,udp}_next functions. Basically, the ip_next() function was used to get the PPTP and Skinny headers when tcp_next() should have been used instead. Symptoms of this included a segfault in natd when trying to process a PPTP or Skinny packet.
Approved by: des
|
133074 |
03-Aug-2004 |
andre |
o Delayed checksums are now calculated in divert_packet() for diverted packets. Remove the XXX-escaped code that did it in ip_output()'s IPHACK section.
|
133072 |
03-Aug-2004 |
andre |
o Move the inflight sysctls to their own sub-tree under net.inet.tcp to be more consistent with the other sysctls around it.
|
133069 |
03-Aug-2004 |
andre |
o Move all parts of the IP reassembly process into the function ip_reass() to make it fully self-contained.
o ip_reass() now returns a new mbuf with the reassembled packet and ip->ip_len including the IP header.
o Computation of the delayed checksum is moved into divert_packet().
Reviewed by: silby
|
133046 |
03-Aug-2004 |
hsu |
Fix bug with tracking the previous element in a list.
Found by: edrt@citiz.net Submitted by: pavlin@icir.org
|
132794 |
28-Jul-2004 |
yar |
Disallow a particular kind of port theft described by the following scenario:
Alice is too lazy to write a server application in PF-independent manner. Therefore she knocks up the server using PF_INET6 only and allows the IPv6 socket to accept mapped IPv4 as well. An evil hacker known on IRC as cheshire_cat has an account in the same system. He starts a process listening on the same port as used by Alice's server, but in PF_INET. As a consequence, cheshire_cat will distract all IPv4 traffic supposed to go to Alice's server.
Such sort of port theft was initially enabled by copying the code that implemented the RFC 2553 semantics on IPv4/6 sockets (see inet6(4)) for the implied case of the same owner for both connections. After this change, the above scenario will be impossible. In the same setting, the user who attempts to start his server last will get EADDRINUSE.
Of course, using IPv4 mapped to IPv6 leads to security complications in the first place, but there is no reason to make it even more unsafe.
This change doesn't apply to KAME since it affects a FreeBSD-specific part of the code. It doesn't modify the out-of-box behaviour of the TCP/IP stack either as long as mapping IPv4 to IPv6 is off by default.
MFC after: 1 month
|
132717 |
28-Jul-2004 |
jayanth |
Fix a bug in the sack code that was causing data to be retransmitted with the FIN bit set for all segments, if a FIN has already been sent before. The fix will allow the FIN bit to be set for only the last segment, in case it has to be retransmitted.
Fix another bug that would have caused snd_nxt to be pulled by len if there was an error from ip_output. snd_nxt should not be touched during sack retransmissions.
|
132676 |
26-Jul-2004 |
jayanth |
Fix for a SACK bug where the very last segment retransmitted from the SACK scoreboard could result in the next (untransmitted) segment being skipped.
|
132675 |
26-Jul-2004 |
jmg |
compare pointer against NULL, not 0
when inpcb is NULL, this is no longer invalid since jlemon added the tcp_twstart function... this prevents close "failing" w/ EINVAL when it really was successful...
Reviewed by: jeremy (NetBSD)
|
132653 |
26-Jul-2004 |
cperciva |
Rename suser_cred()'s PRISON_ROOT flag to SUSER_ALLOWJAIL. This is somewhat clearer, but more importantly allows for a consistent naming scheme for suser_cred flags.
The old name is still defined, but will be removed in a few days (unless I hear any complaints...)
Discussed with: rwatson, scottl Requested by: jhb
|
132510 |
21-Jul-2004 |
andre |
Extend versrcreach by checking against the rt_flags for RTF_REJECT and RTF_BLACKHOLE as well.
To quote the submitter:
The uRPF loose-check implementation by the industry vendors, at least on Cisco and possibly Juniper, will fail the check if the route of the source address is pointed to Null0 (on Juniper, discard or reject route). What this means is, even if uRPF Loose-check finds the route, if the route is pointed to blackhole, uRPF loose-check must fail. This allows people to utilize uRPF loose-check mode as a pseudo-packet-firewall without using any manual filtering configuration -- one can simply inject a IGP or BGP prefix with next-hop set to a static route that directs to null/discard facility. This results in uRPF Loose-check failing on all packets with source addresses that are within the range of the nullroute.
Submitted by: James Jun <james@towardex.com>
|
132469 |
20-Jul-2004 |
rwatson |
M_PREPEND() the IP header on to the front of an outgoing raw IP packet using M_DONTWAIT rather than M_WAITOK to avoid sleeping on memory while holding a mutex.
|
132418 |
19-Jul-2004 |
jayanth |
Let the IN_FASTRECOVERY macro decide if we are in recovery mode.
Nuke sackhole_limit for now. We need to add it back to limit the total number of sack blocks in the system.
|
132417 |
19-Jul-2004 |
jayanth |
Fix a potential panic in the SACK code that was causing 1) data to be sent to the right of snd_recover and 2) more data to be sent than what's in the send buffer.
The fix is to postpone sack retransmit to a subsequent recovery episode if the current retransmit pointer is beyond snd_recover.
Thanks to Mohan Srinivasan for helping fix the bug.
Submitted by: Daniel Lang
|
132315 |
17-Jul-2004 |
dwmalone |
Fix the !INET6 build.
Reported by: alc
|
132307 |
17-Jul-2004 |
dwmalone |
The tcp syncache code was leaving the IPv6 flowlabel uninitialised for the SYN|ACK packet and then letting in6_pcbconnect set the flowlabel later. Arrange for the syncache/syncookie code to set and recall the flow label so that the flowlabel used for the SYN|ACK is consistent. This is done by using some of the cookie (when tcp cookies are enabled) and by stashing the flowlabel in syncache.
Tested and Discovered by: Orla McGann <orly@cnri.dit.ie> Approved by: ume, silby MFC after: 1 month
|
132280 |
17-Jul-2004 |
mlaier |
Define semantic of M_SKIP_FIREWALL more precisely, i.e. also pass associated icmp_error() packets. While here retire PACKET_TAG_PF_GENERATED (which served the same purpose) and use M_SKIP_FIREWALL in pf as well. This should speed up things a bit as we get rid of the tag allocations.
Discussed with: juli
|
132274 |
17-Jul-2004 |
jmallett |
Make M_SKIP_FIREWALL a global (and semantic) flag, preventing anything from using M_PROTO6 and possibly shooting someone's foot, as well as allowing the firewall to be used in multiple passes, or with a packet classifier frontend, that may need to explicitly allow a certain packet. Presently this is handled in the ipfw_chk code as before, though I have run with it moved to upper layers, and possibly it should apply to ipfilter and pf as well, though this has not been investigated.
Discussed with: luigi, rwatson
|
132259 |
16-Jul-2004 |
ume |
when IN6P_AUTOFLOWLABEL is set, the flowlabel is not set on outgoing tcp connections.
Reported by: Orla McGann <orly@cnri.dit.ie> Reviewed by: Orla McGann <orly@cnri.dit.ie> Obtained from: KAME
|
132199 |
15-Jul-2004 |
phk |
Do a pass over all modules in the kernel and make them return EOPNOTSUPP for unknown events.
A number of modules return EINVAL in this instance, and I have left those alone for now and instead taught MOD_QUIESCE to accept this as "didn't do anything".
|
132107 |
13-Jul-2004 |
stefanf |
Remove erroneous semicolons.
|
132044 |
12-Jul-2004 |
rwatson |
After each label in tcp_input(), assert the inpcbinfo and inpcb lock state that we expect.
|
131840 |
08-Jul-2004 |
brian |
Change the following environment variables to kernel options:
bootp -> BOOTP
bootp.nfsroot -> BOOTP_NFSROOT
bootp.nfsv3 -> BOOTP_NFSV3
bootp.compat -> BOOTP_COMPAT
bootp.wired_to -> BOOTP_WIRED_TO
- i.e. back out the previous commit. It's already possible to pxeboot(8) with a GENERIC kernel.
Pointed out by: dwmalone
|
131814 |
08-Jul-2004 |
brian |
Change the following kernel options to environment variables:
BOOTP -> bootp
BOOTP_NFSROOT -> bootp.nfsroot
BOOTP_NFSV3 -> bootp.nfsv3
BOOTP_COMPAT -> bootp.compat
BOOTP_WIRED_TO -> bootp.wired_to
This lets you PXE boot with a GENERIC kernel by putting this sort of thing in loader.conf:
bootp="YES"
bootp.nfsroot="YES"
bootp.nfsv3="YES"
bootp.wired_to="bge1"
or even setting the variables manually from the OK prompt.
|
131700 |
06-Jul-2004 |
des |
Push WARNS back up to 6, but define NO_WERROR; I want the warts out in the open where people can see them and hopefully fix them.
|
131699 |
06-Jul-2004 |
des |
Introduce inline {ip,udp,tcp}_next() functions which take a pointer to an {ip,udp,tcp} header and return a void * pointing to the payload (i.e. the first byte past the end of the header and any required padding). Use them consistently throughout libalias to a) reduce code duplication, b) improve code legibility, c) get rid of a bunch of alignment warnings.
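As an illustration of the idea, the helpers below compute the payload pointer from raw header bytes. They are a sketch under stated assumptions, not the libalias definitions: the _sketch names are hypothetical, and the headers are read byte by byte (IPv4 IHL in the low nibble of byte 0, TCP data offset in the high nibble of byte 12, fixed 8-byte UDP header) rather than through struct ip, struct tcphdr and struct udphdr.

```c
#include <stdint.h>

/* Payload of an IPv4 header: skip IHL 32-bit words (options and padding included). */
static inline void *
ip_next_sketch(void *iphdr)
{
	uint8_t *p = iphdr;

	return (p + ((p[0] & 0x0f) << 2));
}

/* Payload of a TCP header: skip data-offset 32-bit words. */
static inline void *
tcp_next_sketch(void *tcphdr)
{
	uint8_t *p = tcphdr;

	return (p + ((p[12] >> 4) << 2));
}

/* Payload of a UDP header: the header is always 8 bytes. */
static inline void *
udp_next_sketch(void *udphdr)
{
	return ((uint8_t *)udphdr + 8);
}
```

Returning void * lets callers assign the result to any header pointer type without a cast, which is one way such helpers can cut down on alignment warnings.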
|
131693 |
06-Jul-2004 |
des |
Rewrite twowords() to access its argument through a char pointer and not a short pointer. The previous implementation seems to be in a gray zone of the C standard, and GCC generates incorrect code for it at -O2 or higher on some platforms.
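A sketch of the byte-wise access pattern is shown below. It is not the libalias code: the name twowords_sketch is hypothetical, and the sketch assumes the two 16-bit words are stored in network byte order, which may differ from what the real function does.

```c
#include <stdint.h>

/*
 * Sum two consecutive 16-bit words, reading them one byte at a time.
 * Going through a byte pointer avoids the aliasing and alignment
 * questions raised by dereferencing a short pointer.
 * Assumption: the words are in network (big-endian) byte order.
 */
static int
twowords_sketch(const void *p)
{
	const uint8_t *c = p;
	uint16_t w1 = (uint16_t)((c[0] << 8) | c[1]);
	uint16_t w2 = (uint16_t)((c[2] << 8) | c[3]);

	return (w1 + w2);
}
```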
|
131690 |
06-Jul-2004 |
des |
Temporarily lower WARNS to 3 while I figure out the alignment issues on alpha.
|
131614 |
05-Jul-2004 |
des |
Make libalias WARNS?=6-clean. This mostly involves renaming variables named link, foo_link or link_foo to lnk, foo_lnk or lnk_foo, fixing signed / unsigned comparisons, and shoving unused function arguments under the carpet.
I was hoping WARNS?=6 might reveal more serious problems, and perhaps the source of the -O2 breakage, but found no smoking gun.
|
131613 |
05-Jul-2004 |
des |
Parenthesize return values.
|
131612 |
05-Jul-2004 |
des |
Mechanical whitespace cleanup.
|
131566 |
04-Jul-2004 |
phk |
Add LibAliasOutTry() which checks a packet for a hit in the tables, but does not create a new entry if none is found.
|
131504 |
02-Jul-2004 |
ru |
Mechanically kill hard sentence breaks.
|
131427 |
01-Jul-2004 |
jayanth |
On receiving 3 duplicate acknowledgements, SACK recovery was not being entered correctly. Fix this problem by separating out the SACK and the newreno cases. Also, check if we are in FASTRECOVERY for the sack case and if so, turn off dupacks.
Fix an issue where the congestion window was not being incremented by ssthresh.
Thanks to Mohan Srinivasan for finding this problem.
|
131420 |
01-Jul-2004 |
ru |
Bumped document date. Fixed markup. Fixed examples to match the new API.
|
131208 |
27-Jun-2004 |
phk |
Rwatson, write 100 times for tomorrow:
First unlock, then assign NULL to pointer.
|
131178 |
27-Jun-2004 |
pjd |
Those are unneeded too.
|
131177 |
27-Jun-2004 |
pjd |
Add two missing includes and remove two unneeded ones. This is quite a serious fix, because even with the MAC framework compiled in, MAC entry points in those two files were simply ignored.
|
131151 |
26-Jun-2004 |
rwatson |
Reduce the number of unnecessary unlock-relocks on socket buffer mutexes associated with performing a wakeup on the socket buffer:
- When performing an sbappend*() followed by a so[rw]wakeup(), explicitly acquire the socket buffer lock and use the _locked() variants of both calls. Note that the _locked() sowakeup() versions unlock the mutex on return. This is done in uipc_send(), divert_packet(), mroute socket_send(), raw_append(), tcp_reass(), tcp_input(), and udp_append().
- When the socket buffer lock is dropped before a sowakeup(), remove the explicit unlock and use the _locked() sowakeup() variant. This is done in soisdisconnecting(), soisdisconnected() when setting the can't send/receive flags and dropping data, and in uipc_rcvd() when adjusting back-pressure on the sockets.
For UNIX domain sockets running mpsafe with a contention-intensive SMP mysql benchmark, this results in a 1.6% query rate improvement due to reduced mutex costs.
|
131147 |
26-Jun-2004 |
rwatson |
Remove spl's from TCP protocol entry points. While not all locking is merged here yet, this will ease the merge process by bringing the locked and unlocked versions into sync.
|
131079 |
25-Jun-2004 |
ps |
White space & spelling fixes
Submitted by: Xin LI <delphij@frontfree.net>
|
131078 |
25-Jun-2004 |
bms |
Whitespace.
|
131018 |
24-Jun-2004 |
rwatson |
Broaden scope of the socket buffer lock when processing an ACK so that the read and write of sb_cc are atomic. Call sbdrop_locked() instead of sbdrop() since we already hold the socket buffer lock.
|
131017 |
24-Jun-2004 |
rwatson |
Protect so_oobmark with SOCKBUF_LOCK(&so->so_rcv), and broaden locking in tcp_input() for TCP packets with urgent data pointers to hold the socket buffer lock across testing and updating oobmark from just protecting sb_state.
Update socket locking annotations
|
131012 |
24-Jun-2004 |
rwatson |
In ip_ctloutput(), acquire the inpcb lock around some of the basic inpcb flag and status updates.
|
131011 |
24-Jun-2004 |
rwatson |
When asserting non-Giant locks in the network stack, also assert Giant if debug.mpsafenet=0, as any points that require synchronization in the SMPng world also required it in the Giant-world:
- inpcb locks (including IPv6)
- inpcbinfo locks (including IPv6)
- dummynet subsystem lock
- ipfw2 subsystem lock
|
131006 |
24-Jun-2004 |
rwatson |
Introduce sbreserve_locked(), which asserts the socket buffer lock on the socket buffer having its limits adjusted. sbreserve() now acquires the lock before calling sbreserve_locked(). In soreserve(), acquire socket buffer locks across read-modify-writes of socket buffer fields, and calls into sbreserve/sbrelease; make sure to acquire in keeping with the socket buffer lock order. In tcp_mss(), acquire the socket buffer lock in the calling context so that we have atomic read-modify-write on buffer sizes.
|
130993 |
23-Jun-2004 |
ps |
Move the sack sysctl's under net.inet.tcp.sack
net.inet.tcp.do_sack -> net.inet.tcp.sack.enable
net.inet.tcp.sackhole_limit -> net.inet.tcp.sack.sackhole_limit
Requested by: wollman
|
130989 |
23-Jun-2004 |
ps |
Add support for TCP Selective Acknowledgements. The work for this originated on RELENG_4 and was ported to -CURRENT.
The scoreboarding code was obtained from OpenBSD, and many of the remaining changes were inspired by OpenBSD, but not taken directly from there.
You can enable/disable sack using net.inet.tcp.do_sack. You can also limit the number of sack holes that all senders can have in the scoreboard with net.inet.tcp.sackhole_limit.
Reviewed by: gnn Obtained from: Yahoo! (Mohan Srinivasan, Jayanth Vijayaraghavan)
|
130901 |
22-Jun-2004 |
rwatson |
Acquire socket lock around frobbing of socket state in divert sockets.
|
130900 |
22-Jun-2004 |
rwatson |
Prefer use of the inpcb as a MAC label source for outgoing packets sent via divert sockets, when available.
|
130821 |
20-Jun-2004 |
rwatson |
If debug.mpsafenet is set, initialize TCP callouts as CALLOUT_MPSAFE.
|
130811 |
20-Jun-2004 |
rwatson |
Assert the inpcb lock before letting MAC check whether we can deliver to the inpcb in tcp_input().
|
130810 |
20-Jun-2004 |
rwatson |
IP multicast code no longer needs to acquire Giant before appending an mbuf onto a socket buffer. This is left over from debug.mpsafenet affecting the forwarding/bridging plane only.
|
130701 |
18-Jun-2004 |
rwatson |
In tcp_ctloutput(), don't hold the inpcb lock over a call to ip_ctloutput(), as it may need to perform blocking memory allocations. This also improves consistency with locking relative to other points that call into ip_ctloutput().
Bumped into by: Grover Lines <grover@ceribus.net>
|
130685 |
18-Jun-2004 |
bms |
Check that m->m_pkthdr.rcvif is not NULL before checking if a packet was received on a broadcast address on the input path. Under certain circumstances this could result in a panic, notably for locally-generated packets which do not have m_pkthdr.rcvif set.
This is a similar situation to that which is solved by src/sys/netinet/ip_icmp.c rev 1.66.
PR: kern/52935
|
130683 |
18-Jun-2004 |
bms |
Appease GCC.
|
130666 |
18-Jun-2004 |
bms |
If SO_DEBUG is enabled for a TCP socket, and a received segment is encapsulated within an IPv6 datagram, do not abuse the 'ipov' pointer when registering trace records. 'ipov' is specific to IPv4, and will therefore be uninitialized.
[This fandango is only necessary in the first place because of our host-byte-order IP field pessimization.]
PR: kern/60856 Submitted by: Galois Zheng
|
130664 |
18-Jun-2004 |
bms |
Don't set FIN on a retransmitted segment after a FIN has been sent, unless the segment really contains the last of the data for the stream.
PR: kern/34619 Obtained from: OpenBSD (tcp_output.c rev 1.47) Noticed by: Joseph Ishac Reviewed by: George Neville-Neil
|
130662 |
18-Jun-2004 |
bms |
Ensure that dst is bzeroed before calling rtalloc_ign(), to avoid possible routing table corruption.
PR: kern/40563, freebsd4/432 (KAME) Obtained from: NetBSD (in_gif.c rev 1.26.10.1) Requested by: Jean-Luc Richier
|
130613 |
16-Jun-2004 |
mlaier |
Commit pf version 3.5 and link additional files to the kernel build.
Version 3.5 brings:
- Atomic commits of ruleset changes (reduce the chance of ending up in an inconsistent state).
- A 30% reduction in the size of state table entries.
- Source-tracking (limit number of clients and states per client).
- Sticky-address (the flexibility of round-robin with the benefits of source-hash).
- Significant improvements to interface handling.
- and many more ...
|
130609 |
16-Jun-2004 |
mlaier |
Prepare for pf 3.5 import:
- Remove pflog and pfsync modules. Things will change in such a fashion that there will be one module with pf+pflog that can be loaded into GENERIC without problems (which is what most people want). pfsync is no longer possible as a module.
- Add multicast address for in-kernel multicast pfsync protocol. Protocol glue will follow once the import is done.
- Add one more mbuf tag
|
130590 |
16-Jun-2004 |
maxim |
o connect(2): if there is no route to the destination, do not pick up the first local IP address for the source IP address; return ENETUNREACH instead.
Submitted by: Gleb Smirnoff Reviewed by: -current (silence)
|
130584 |
16-Jun-2004 |
bms |
Fix build for IPSEC && !INET6
PR: kern/66125 Submitted by: Cyrille Lefevre
|
130583 |
16-Jun-2004 |
bms |
Reverse a patch which has no effect on -CURRENT and should probably be applied directly to -STABLE.
Noticed by: iedowse Pointy hat to: bms
|
130581 |
16-Jun-2004 |
bms |
In ip_forward(), when calculating the MTU in effect for an IPSEC transport mode tunnel, take the per-route MTU into account, *if* and *only if* it is non-zero (as found in struct rt_metrics/rt_metrics_lite).
PR: kern/42727 Obtained from: NetBSD (ip_input.c rev 1.151)
|
130580 |
16-Jun-2004 |
bms |
In ip_forward(), set m->m_pkthdr.len correctly such that the mbuf chain is sane, and ipsec4_getpolicybyaddr() will therefore complete.
PR: kern/42727 Obtained from: KAME (kame/freebsd4/sys/netinet/ip_input.c rev 1.42)
|
130559 |
16-Jun-2004 |
bms |
Disconnect a temporarily-connected UDP socket in out-of-mbufs case. This fixes the problem of UDP sockets getting wedged in a connected state (and bound to their destination) under heavy load. Temporary bind/connect should probably be deleted in future as an optimization, as described in "A Faster UDP" [Partridge/Pink 1993].
Notes:
- INP_LOCK() is already held in udp_output(). The connection is in effect happening at a layer lower than the socket layer, therefore in theory socket locking should not be needed.
- Inlining the in_pcbdisconnect() operation buys us nothing (in the case of the current state of the code), as laddr is not part of the inpcb hash or the udbinfo hash. Therefore there should be no need to rehash after restoring laddr in the error case (this was a concern of the original author of the patch).
PR: kern/41765 Requested by: gnn Submitted by: Jinmei Tatuya (with cleanups) Tested by: spray(8)
|
130555 |
16-Jun-2004 |
rwatson |
Convert GIANT_REQUIRED to NET_ASSERT_GIANT for socket access.
|
130513 |
15-Jun-2004 |
rwatson |
Grab the socket buffer send or receive mutex when performing a read-modify-write on the sb_state field. This commit catches only the "easy" ones where it doesn't interact with as yet unmerged locking.
|
130480 |
14-Jun-2004 |
rwatson |
The socket field so_state is used to hold a variety of socket related flags relating to several aspects of socket functionality. This change breaks out several bits relating to send and receive operation into a new per-socket buffer field, sb_state, in order to facilitate locking. This is required because, in order to provide more granular locking of sockets, different state fields have different locking properties. The following fields are moved to sb_state:
SS_CANTRCVMORE (so_state)
SS_CANTSENDMORE (so_state)
SS_RCVATMARK (so_state)
Rename respectively to:
SBS_CANTRCVMORE (so_rcv.sb_state)
SBS_CANTSENDMORE (so_snd.sb_state)
SBS_RCVATMARK (so_rcv.sb_state)
This facilitates locking by isolating fields to be located with other identically locked fields, and permits greater granularity in socket locking by avoiding storing fields with different locking semantics in the same short (avoiding locking conflicts). In the future, we may wish to coalesce sb_state and sb_flags; for the time being I leave them separate and there is no additional memory overhead due to the packing/alignment of shorts in the socket buffer structure.
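The resulting layout can be pictured with the following sketch. The struct names, the helper names and the flag values are illustrative only and are not copied from the kernel headers; the point is that each flag now lives in the sb_state of the buffer whose lock protects it.

```c
/* Flag values are illustrative, not taken from the kernel headers. */
#define	SBS_CANTSENDMORE	0x0001
#define	SBS_CANTRCVMORE		0x0002
#define	SBS_RCVATMARK		0x0004

/* Hypothetical, trimmed-down socket buffer. */
struct sockbuf_sketch {
	short	sb_state;	/* per-buffer state, under the buffer lock */
	short	sb_flags;
};

/* Hypothetical, trimmed-down socket. */
struct socket_sketch {
	short			so_state;	/* no longer holds the three flags */
	struct sockbuf_sketch	so_rcv;
	struct sockbuf_sketch	so_snd;
};

/* In the kernel the receive buffer lock would be held around this update. */
static void
socantrcvmore_sketch(struct socket_sketch *so)
{
	so->so_rcv.sb_state |= SBS_CANTRCVMORE;
}

/* In the kernel the send buffer lock would be held around this update. */
static void
socantsendmore_sketch(struct socket_sketch *so)
{
	so->so_snd.sb_state |= SBS_CANTSENDMORE;
}
```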
|
130416 |
13-Jun-2004 |
mlaier |
Link ALTQ to the build and break with ABI for struct ifnet. Please recompile your (network) modules as well as any userland that might make sense of sizeof(struct ifnet). This does not change the queueing yet. These changes will follow in a separate commit. Same with the driver changes, which need case by case evaluation.
__FreeBSD_version bump will follow.
Tested-by: (i386)LINT
|
130407 |
13-Jun-2004 |
dfr |
Add a new driver to support IP over firewire. This driver is intended to conform to the RFC 2734 and RFC 3146 standards for IP over firewire and should eventually supersede the fwe driver. Right now the broadcast channel number is hardwired and we don't support MCAP for multicast channel allocation - more infrastructure is required in the firewire code itself to fix these problems.
|
130398 |
13-Jun-2004 |
rwatson |
Socket MAC labels so_label and so_peerlabel are now protected by SOCK_LOCK(so):
- Hold socket lock over calls to MAC entry points reading or manipulating socket labels.
- Assert socket lock in MAC entry point implementations.
- When externalizing the socket label, first make a thread-local copy while holding the socket lock, then release the socket lock to externalize to userspace.
|
130387 |
12-Jun-2004 |
rwatson |
Extend coverage of SOCK_LOCK(so) to include so_count, the socket reference count:
- Assert SOCK_LOCK(so) macros that directly manipulate so_count: soref(), sorele().
- Assert SOCK_LOCK(so) in macros/functions that rely on the state of so_count: sofree(), sotryfree().
- Acquire SOCK_LOCK(so) before calling these functions or macros in various contexts in the stack, both at the socket and protocol layers.
- In some cases, perform soisdisconnected() before sotryfree(), as this could result in frobbing of a non-present socket if sotryfree() actually frees the socket.
- Note that sofree()/sotryfree() will release the socket lock even if they don't free the socket.
Submitted by: sam Sponsored by: FreeBSD Foundation Obtained from: BSD/OS
|
130363 |
11-Jun-2004 |
csjp |
Modify ipfw so that whenever UID or GID constraints exist in a ruleset, the pcb is looked up once per ipfw_chk() activation.
This is done by extracting the required information out of the PCB and caching it to the ipfw_chk() stack. This should greatly reduce PCB looking contention and speed up the processing of UID/GID based firewall rules (especially with large UID/GID rulesets).
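The caching idea can be sketched as follows. All names here are hypothetical (ugid_cache_sketch, pcb_lookup_sketch, check_uid_sketch, the pcb_lookups counter and the fixed credentials); the sketch only shows that the expensive lookup runs at most once per cache, however many UID rules are evaluated against it.

```c
#include <sys/types.h>

/* Counts how often the (simulated) PCB lookup runs. */
static int pcb_lookups;

/* Hypothetical per-activation credential cache kept on the stack. */
struct ugid_cache_sketch {
	int	valid;		/* non-zero once the lookup has been done */
	uid_t	fw_uid;
	gid_t	fw_gid;
};

/* Simulated PCB lookup; the fixed credentials are placeholders. */
static void
pcb_lookup_sketch(uid_t *uid, gid_t *gid)
{
	pcb_lookups++;
	*uid = 1001;
	*gid = 1001;
}

/* Evaluate one UID constraint, filling the cache on first use. */
static int
check_uid_sketch(struct ugid_cache_sketch *c, uid_t want)
{
	if (!c->valid) {
		pcb_lookup_sketch(&c->fw_uid, &c->fw_gid);
		c->valid = 1;
	}
	return (c->fw_uid == want);
}
```

A caller would zero one cache per packet check and pass it to every UID or GID rule it evaluates, so a ruleset with many such rules still costs a single lookup.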
Some very basic benchmarks were taken which compare the number of in_pcblookup_hash(9) activations to the number of firewall rules containing UID/GID based constraints before and after this patch.
The results can be viewed here: o http://people.freebsd.org/~csjp/ip_fw_pcb.png
Reviewed by: andre, luigi, rwatson Approved by: bmilekic (mentor)
|
130337 |
11-Jun-2004 |
rwatson |
Remove unneeded Giant acquisition in divert_packet(), which is left over from debug.mpsafenet affecting only the forwarding plane. Giant is now acquired in the ithread/netisr or in the system call code.
|
130333 |
11-Jun-2004 |
rwatson |
Lock down parallel router_info list for tracking multicast IGMP versions of various routers seen:
- Introduce igmp_mtx.
- Protect global variable 'router_info_head' and list fields in struct router_info with this mutex, as well as igmp_timers_are_running.
- find_rti() asserts that the caller acquires igmp_mtx.
- Annotate a failure to check the return value of MALLOC(..., M_NOWAIT).
|
130311 |
10-Jun-2004 |
ru |
init_tables() must be run after sys/net/route.c:route_init().
|
130281 |
09-Jun-2004 |
ru |
Introduce a new feature to IPFW2: lookup tables. These are useful for handling large sparse address sets. Initial implementation by Vsevolod Lobko <seva@ip.net.ua>, refined by me.
MFC after: 1 week
|
130183 |
07-Jun-2004 |
ume |
do not send icmp response if the original packet is encrypted.
Obtained from: KAME MFC after: 1 week
|
130024 |
03-Jun-2004 |
bmilekic |
Move the locking of the pcb into raw_output(). Organize code so that m_prepend() is not called with possibility to wait while the pcb lock is held. What still needs revisiting is whether the ripcbinfo lock is really required here.
Discussed with: rwatson
|
129880 |
30-May-2004 |
phk |
add missing #include <sys/module.h>
|
129876 |
30-May-2004 |
phk |
Add some missing <sys/module.h> includes which are masked by the one on death-row in <sys/kernel.h>
|
129720 |
25-May-2004 |
csjp |
Add a super-user check to ipfw_ctl() to make sure that the calling process is a non-prison root. The security.jail.allow_raw_sockets sysctl variable is disabled by default, however if the user enables raw sockets in prisons, prison-root should not be able to interact with firewall rule sets.
Approved by: rwatson, bmilekic (mentor)
|
129465 |
20-May-2004 |
yar |
When checking for possible port theft, skip over a TCP inpcb unless it's in the closed or listening state (remote address == INADDR_ANY).
If a TCP inpcb is in any other state, it's impossible to steal its local port or use it for port theft. And if there are both closed/listening and connected TCP inpcbs on the same localIP:port couple, the call to in_pcblookup_local() will find the former due to the design of that function.
No objections raised in: -net, -arch MFC after: 1 month
|
129126 |
11-May-2004 |
maxim |
o Calculate a number of bytes to copy (cnt) correctly:
+----+-+-+-+-+----+----+- - - - - - - - - - - - -+----+
|    | |C| | |    |    |                         |    |
| IP |N|O|L|P|    | IP |                         | IP |
| #1 |O|D|E|T|    | #2 |                         | #n |
|    |P|E|N|R|    |    |                         |    |
+----+-+-+-+-+----+----+- - - - - - - - - - - - -+----+
             ^    ^<---- cnt - (IPOPT_MINOFF - 1) ---->|
             |    |
        src  |    +-- cp[IPOPT_OFF + 1] + sizeof(struct in_addr)
             |
        dst  +-- cp[IPOPT_OFF + 1]
PR: kern/66386 Submitted by: Andrei Iltchenko MFC after: 3 weeks
|
129019 |
07-May-2004 |
maxim |
o IFNAMSIZ does include the trailing \0.
Approved by: andre
o Document net.inet.icmp.reply_src.
|
129017 |
06-May-2004 |
andre |
Provide the sysctl net.inet.ip.process_options to control the processing of IP options.
net.inet.ip.process_options=0 Ignore IP options and pass packets unmodified.
net.inet.ip.process_options=1 Process all IP options (default).
net.inet.ip.process_options=2 Reject all packets with IP options with ICMP filter prohibited message.
This sysctl affects packets destined for the local host as well as those only transiting through the host (routing).
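The three settings can be read as a small decision table. The following is a minimal sketch only, not the committed code: the enum names, the function ip_options_decide(), the treatment of unknown values as the default, and the "header longer than 20 bytes means options are present" test are all assumptions made for illustration.

```c
#include <stdbool.h>

/* Hypothetical names mirroring the three documented sysctl values. */
enum ip_options_policy {
	IP_OPTIONS_IGNORE  = 0,	/* pass packets through, options untouched */
	IP_OPTIONS_PROCESS = 1,	/* process all IP options (default) */
	IP_OPTIONS_REJECT  = 2	/* drop, answer with ICMP filter prohibited */
};

enum ip_options_verdict {
	VERDICT_SKIP_OPTIONS,
	VERDICT_PROCESS_OPTIONS,
	VERDICT_REJECT_PACKET
};

/*
 * Decide what to do with a packet, given the sysctl value and the IP
 * header length in bytes.  A header longer than the 20-byte minimum
 * carries options.
 */
static enum ip_options_verdict
ip_options_decide(int process_options, unsigned int ip_header_len)
{
	bool has_options = ip_header_len > 20;

	if (!has_options)
		return (VERDICT_SKIP_OPTIONS);

	switch (process_options) {
	case IP_OPTIONS_IGNORE:
		return (VERDICT_SKIP_OPTIONS);
	case IP_OPTIONS_REJECT:
		return (VERDICT_REJECT_PACKET);
	case IP_OPTIONS_PROCESS:
	default:
		/* Assumption: unknown values fall back to the default. */
		return (VERDICT_PROCESS_OPTIONS);
	}
}
```

Because the sysctl applies to both local and transit traffic, a caller in this sketch would consult ip_options_decide() before the local-delivery versus forwarding decision.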
IP options do not have any legitimate purpose anymore and are only used to circumvent firewalls or to exploit certain behaviours or bugs in TCP/IP stacks.
Reviewed by: sam (mentor)
|
128905 |
04-May-2004 |
rwatson |
Switch to using the inpcb MAC label instead of socket MAC label when labeling new mbufs created from sockets/inpcbs in IPv4. This helps avoid the need for socket layer locking in the lower level network paths where inpcb locks are already frequently held where needed. In particular:
- Use the inpcb for label instead of socket in raw_append().
- Use the inpcb for label instead of socket in tcp_output().
- Use the inpcb for label instead of socket in tcp_respond().
- Use the inpcb for label instead of socket in tcp_twrespond().
- Use the inpcb for label instead of socket in syncache_respond().
While here, modify tcp_respond() to avoid assigning NULL to a stack variable and centralize assertions about the inpcb when inp is assigned.
Obtained from: TrustedBSD Project Sponsored by: DARPA, McAfee Research
|
128904 |
04-May-2004 |
rwatson |
Assert inpcb lock in udp_append().
Obtained from: TrustedBSD Project Sponsored by: DARPA, McAfee Research
|
128903 |
04-May-2004 |
rwatson |
Assert the inpcb lock on 'last' in udp_append(), since it's always called with it, and also requires it.
Obtained from: TrustedBSD Project Sponsored by: DARPA, McAfee Research
|
128880 |
03-May-2004 |
maxim |
o Fix misindentation in the previous commit.
|
128877 |
03-May-2004 |
andre |
Back out a change that slipped into the previous commit for which other supporting parts have not yet been committed.
Remove premature IP options ignoring option.
|
128872 |
03-May-2004 |
andre |
Optimize IP fastforwarding some more:
o New function ip_findroute() to reduce code duplication for the route lookup cases. (luigi)
o Store ip_len in host byte order on the stack instead of using it via indirection from the mbuf. This allows deferring the host byte conversion to a later point and makes for a quicker fallback to normal ip_input() processing. (luigi)
o Check if route is dampened with RTF_REJECT flag and drop packet already here when ARP is unable to resolve destination address. An ICMP unreachable is sent to inform the sender.
o Check if interface output queue is full and drop packet already here. No ICMP notification is sent because signalling source quench is deprecated.
o Check if media_state is down (used for ethernet type interfaces) and drop the packet already here. An ICMP unreachable is sent to inform the sender.
o Do not account sent packets to the interface address counters. They are only for packets with that 'ia' as source address.
o Update and clarify some comments.
Submitted by: luigi (most of it)
|
128829 |
02-May-2004 |
darrenr |
Rename m_claim_next_hop() to m_claim_next(), as suggested by Max Laier.
|
128828 |
02-May-2004 |
darrenr |
oops, I forgot this file in a prior commit (change was still sitting here, uncommitted):
Rename ip_claim_next_hop() to m_claim_next_hop(), give it an extra arg (the type of tag to claim) and push it out of ip_var.h into mbuf.h alongside all of the other macros that work on mbufs and tags.
|
128816 |
02-May-2004 |
darrenr |
Rename ip_claim_next_hop() to m_claim_next_hop(), give it an extra arg (the type of tag to claim) and push it out of ip_var.h into mbuf.h alongside all of the other macros that work on mbufs and tags.
|
128664 |
26-Apr-2004 |
bmilekic |
Give jail(8) the feature to allow raw sockets from within a jail, which is less restrictive but allows for more flexible jail usage (for those who are willing to make the sacrifice). The default is off, but allowing raw sockets within jails can now be accomplished by tuning security.jail.allow_raw_sockets to 1.
Turning this on will allow you to use things like ping(8) or traceroute(8) from within a jail.
The patch being committed is not identical to the patch in the PR. The committed version is more friendly to APIs which pjd is working on, so it should integrate into his work quite nicely. This change has also been presented and addressed on the freebsd-hackers mailing list.
Submitted by: Christian S.J. Peron <maneo@bsdpro.com> PR: kern/65800
|
128653 |
26-Apr-2004 |
silby |
Tighten up reset handling in order to make reset attacks as difficult as possible while maintaining compatibility with the widest range of TCP stacks.
The algorithm is as follows:
--- For connections in the ESTABLISHED state, only resets with sequence numbers exactly matching last_ack_sent will cause a reset, all other segments will be silently dropped.
For connections in all other states, a reset anywhere in the window will cause the connection to be reset. All other segments will be silently dropped. ---
The necessity of accepting all in-window resets was discovered by jayanth and jlemon, both of whom have seen TCP stacks that will respond to FIN-ACK packets with resets not meeting the strict last_ack_sent check.
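The two rules above can be sketched as a single predicate. This is an illustrative sketch under stated assumptions, not the committed tcp_input() code: the function name rst_acceptable(), the two-value state enum, and the definition of "in the window" as the half-open range starting at last_ack_sent and spanning rcv_wnd bytes are all hypothetical. The comparison macros follow the usual wraparound-safe idiom for 32-bit sequence numbers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe sequence number comparisons. */
#define SEQ_GEQ(a, b)	((int32_t)((a) - (b)) >= 0)
#define SEQ_LT(a, b)	((int32_t)((a) - (b)) < 0)

/* Hypothetical, reduced state set: ESTABLISHED versus everything else. */
enum conn_state { STATE_ESTABLISHED, STATE_OTHER };

/*
 * Return true if a segment carrying RST should reset the connection,
 * false if it should be silently dropped.
 */
static bool
rst_acceptable(enum conn_state state, uint32_t seg_seq,
    uint32_t last_ack_sent, uint32_t rcv_wnd)
{
	/* ESTABLISHED: only an exact match on last_ack_sent is honoured. */
	if (state == STATE_ESTABLISHED)
		return (seg_seq == last_ack_sent);

	/* All other states: any sequence number inside the window. */
	return (SEQ_GEQ(seg_seq, last_ack_sent) &&
	    SEQ_LT(seg_seq, last_ack_sent + rcv_wnd));
}
```

The exact-match rule shrinks the space a blind attacker must guess from the window size down to a single sequence number, while the looser rule for other states keeps compatibility with the stacks mentioned above.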
Idea by: Darren Reed Reviewed by: truckman, jlemon, others(?)
|
128645 |
25-Apr-2004 |
luigi |
Another small set of changes to reduce diffs with the new arp code.
|
128642 |
25-Apr-2004 |
luigi |
remove a stale comment on the behaviour of arpresolve
|
128641 |
25-Apr-2004 |
luigi |
Start the arp timer at init time. It runs so rarely that it makes no sense to wait until the first request.
|
128636 |
25-Apr-2004 |
luigi |
This commit does two things:
1. rt_check() cleanup: rt_check() is only necessary for some address families to gain access to the corresponding arp entry, so call it only in/near the *resolve() routines where it is actually used -- at the moment this is arpresolve(), nd6_storelladdr() (the call is embedded here), and atmresolve() (the call is just before atmresolve to reduce the number of changes). This change will make it a lot easier to decouple the arp table from the routing table.
There is an extra call to rt_check() in if_iso88025subr.c to determine the routing info length. I have left it alone for the time being.
The interface of arpresolve() and nd6_storelladdr() now changes slightly:
+ the 'rtentry' parameter (really a hint from the upper level layer) is now passed unchanged from *_output(), so it becomes the route to the final destination and not to the gateway.
+ the routines will return 0 if resolution is possible, non-zero otherwise.
+ arpresolve() returns EWOULDBLOCK in case the mbuf is being held waiting for an arp reply -- in this case the error code is masked in the caller so the upper layer protocol will not see a failure.
2. arpcom untangling
Where possible, use 'struct ifnet' instead of 'struct arpcom' variables, and use the IFP2AC macro to access arpcom fields. This mostly affects the netatalk code.
=== Detailed changes: ===
net/if_arcsubr.c rt_check() cleanup, remove a useless variable
net/if_atmsubr.c rt_check() cleanup
net/if_ethersubr.c rt_check() cleanup, arpcom untangling
net/if_fddisubr.c rt_check() cleanup, arpcom untangling
net/if_iso88025subr.c rt_check() cleanup
netatalk/aarp.c arpcom untangling, remove a block of duplicated code
netatalk/at_extern.h arpcom untangling
netinet/if_ether.c rt_check() cleanup (change arpresolve)
netinet6/nd6.c rt_check() cleanup (change nd6_storelladdr)
|
128593 |
23-Apr-2004 |
silby |
Wrap two long lines in the previous commit.
|
128592 |
23-Apr-2004 |
andre |
Correct an edge case in tcp_mss() where the cached path MTU from tcp_hostcache would have overridden a (now) lower MTU of an interface or route that changed since first PMTU discovery. The bug would have caused TCP to redo the PMTU discovery when not strictly necessary.
Make a comment about already pre-initialized default values more clear.
Reviewed by: sam
|
128575 |
23-Apr-2004 |
andre |
Add the option versrcreach to verify that a valid route to the source address of a packet exists in the routing table. The default route is ignored because it would match everything and render the check pointless.
This option is very useful for routers with a complete view of the Internet (BGP) in the routing table to reject packets with spoofed or unrouteable source addresses.
Example:
ipfw add 1000 deny ip from any to any not versrcreach
also known in Cisco-speak as:
ip verify unicast source reachable-via any
Reviewed by: luigi
|
128574 |
23-Apr-2004 |
andre |
Fix a potential race when purging expired hostcache entries.
Spotted by: luigi
|
128548 |
22-Apr-2004 |
silby |
Take out an unneeded variable I forgot to remove in the last commit, and make two small whitespace fixes so that diffs vs rev 1.142 are minimal.
|
128547 |
22-Apr-2004 |
silby |
Simplify random port allocation, and add net.inet.ip.portrange.randomized, which can be used to turn off randomized port allocation if so desired.
Requested by: alfred
|
128493 |
20-Apr-2004 |
bms |
Fix a typo in a comment.
|
128453 |
20-Apr-2004 |
silby |
Switch from using sequential to random ephemeral port allocation, implementation taken directly from OpenBSD.
I've resisted committing this for quite some time because of concern over TIME_WAIT recycling breakage (sequential allocation ensures that there is a long time before ports are recycled), but recent testing has shown me that my fears were unwarranted.
|
128452 |
20-Apr-2004 |
silby |
Enhance our RFC1948 implementation to perform better in some pathological TIME_WAIT recycling cases I was able to generate with http testing tools.
In short, as the old algorithm relied on ticks to create the time offset component of an ISN, two connections with the exact same host, port pair that were generated between timer ticks would have the exact same sequence number. As a result, the second connection would fail to pass the TIME_WAIT check on the server side, and the SYN would never be acknowledged.
I've "fixed" this by adding random positive increments to the time component between clock ticks so that ISNs will *always* be increasing, no matter how quickly the port is recycled.
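The idea of the fix can be sketched as follows. This is a minimal illustration and not the committed code: struct isn_clock, isn_time_component(), and both constants are hypothetical, and rand() stands in for whatever random source the kernel actually uses. A full RFC 1948 ISN would add a keyed hash of the connection tuple to this time component, which the sketch leaves out.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical tuning values, chosen only for illustration. */
#define ISN_TICK_STEP		4096u	/* growth per elapsed clock tick */
#define ISN_RANDOM_STEP_MAX	64	/* upper bound of per-call increment */

struct isn_clock {
	uint32_t last_tick;	/* tick counter seen on the previous call */
	uint32_t offset;	/* time component, only ever grows */
};

/*
 * Return the time component for a new ISN.  Elapsed ticks advance the
 * offset as before; in addition, every call adds a random increment of
 * at least 1, so two calls within the same tick never return the same
 * value.
 */
static uint32_t
isn_time_component(struct isn_clock *clk, uint32_t now_ticks)
{
	clk->offset += (now_ticks - clk->last_tick) * ISN_TICK_STEP;
	clk->last_tick = now_ticks;
	clk->offset += 1u + (uint32_t)(rand() % ISN_RANDOM_STEP_MAX);
	return (clk->offset);
}
```

With a purely tick-driven offset, two connections on the same host and port pair created within one tick would get identical values; the per-call increment is what removes that collision.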
Except in such contrived benchmarking situations, this problem should never come up in normal usage... until networks get faster.
No MFC planned, 4.x is missing other optimizations that are needed to even create the situation in which such quick port recycling will occur.
|
128398 |
18-Apr-2004 |
luigi |
Replace Bcopy with 'the real thing' as in the rest of the file.
|
128210 |
14-Apr-2004 |
luigi |
In an effort to simplify the routing code, try to deprecate rtalloc() in favour of rtalloc_ign(), which is what would end up being called anyways.
There are 25 more instances of rtalloc() in net*/ and about 10 instances of rtalloc_ign()
|
128019 |
07-Apr-2004 |
imp |
Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999 and email from Peter Wemm, Alan Cox and Robert Watson.
Approved by: core, peter, alc, rwatson
|
128003 |
07-Apr-2004 |
ru |
Fixed a bug in previous revision: compute the payload checksum before we convert ip_len into network byte order; in_delayed_cksum() still expects it in host byte order.
The symptom was the ``in_cksum_skip: out of data by %d'' complaints from the kernel.
To add to the previous commit log: these fixes make tcpdump(1) happy by not complaining about UDP/TCP checksum being bad for looped back IP multicast when multicast router is deactivated.
Reported by: Vsevolod Lobko
|
127936 |
06-Apr-2004 |
bde |
Fixed misspelling of IPPORT_MAX as USHRT_MAX. Don't include <sys/limits.h> to implement this mistake.
Fixed some nearby style bugs (initialization in declaration, misformatting of this initialization, missing blank line after the declaration, and comparison of the non-boolean result of the initialization with 0 using "!". In KNF, "!" is not even used to compare booleans with 0).
|
127871 |
05-Apr-2004 |
rwatson |
Two missed in previous commit -- compare pointer with NULL rather than using it as a boolean.
|
127870 |
05-Apr-2004 |
rwatson |
Prefer NULL to 0 when checking pointer values as integers or booleans.
|
127862 |
04-Apr-2004 |
pjd |
Fix a panic possibility caused by returning without releasing locks. It was fixed by moving problematic checks, as well as checks that don't need locking, before locks are acquired.
Submitted by: Ryan Sommers <ryans@gamersimpact.com> In co-operation with: cperciva, maxim, mlaier, sam Tested by: submitter (previous patch), me (current patch) Reviewed by: cperciva, mlaier (previous patch), sam (current patch) Approved by: sam Dedicated to: enough!
|
127828 |
04-Apr-2004 |
luigi |
+ arpresolve(): remove an unused argument
+ struct ifnet: remove unused fields, move ipv6-related fields close to each other, add a pointer to l3<->l2 translation tables (arp, nd6, etc.) for future use.
+ struct route: remove an unused field, move close to each other some fields that might likely go away in the future
|
127757 |
02-Apr-2004 |
deischen |
Unbreak natd.
Reported and submitted by: Sean McNeil (sean at mcneil.com)
|
127690 |
31-Mar-2004 |
des |
Raise WARNS level to 2.
|
127689 |
31-Mar-2004 |
des |
Deal with aliasing warnings.
Reviewed by: ru Approved by: silence on the lists
|
127535 |
28-Mar-2004 |
rwatson |
Invert the logic of NET_LOCK_GIANT(), and remove the one reference to it. Previously, Giant would be grabbed at entry to the IP local delivery code when debug.mpsafenet was set to true, as that implied Giant wouldn't be grabbed in the driver path. Now, we will use this primitive to conditionally grab Giant in the event the entire network stack isn't running MPSAFE (debug.mpsafenet == 0).
|
127526 |
28-Mar-2004 |
pjd |
Remove unused argument.
|
127505 |
27-Mar-2004 |
pjd |
Reduce 'td' argument to 'cred' (struct ucred) argument in those functions:
- in_pcbbind(),
- in_pcbbind_setup(),
- in_pcbconnect(),
- in_pcbconnect_setup(),
- in6_pcbbind(),
- in6_pcbconnect(),
- in6_pcbsetport().
"It should simplify/clarify things a great deal." --rwatson
Requested by: rwatson Reviewed by: rwatson, ume
|
127504 |
27-Mar-2004 |
pjd |
Remove unused argument.
Reviewed by: ume
|
127463 |
26-Mar-2004 |
ume |
Validate IPv6 socket options more carefully to avoid a panic.
PR: kern/61513 Reviewed by: cperciva, nectar
|
127408 |
25-Mar-2004 |
pjd |
Remove unused function. It was used in FreeBSD 4.x, but now we're using cr_canseesocket().
|
127396 |
25-Mar-2004 |
ru |
Untangle IP multicast routing interaction with delayed payload checksums.
Compute the payload checksum for a locally originated IP multicast where God intended, in ip_mloopback(), rather than doing it in ip_output() and only when multicast router is active. This is more correct as we do not fool ip_input() that the packet has the correct payload checksum when in fact it does not (when multicast router is inactive). This is also more efficient if we don't join the multicast group we send to, thus allowing the hardware to checksum the payload.
|
127307 |
22-Mar-2004 |
rwatson |
Lock down global variables in if_gre:
- Add gre_mtx to protect global softc list.
- Hold gre_mtx over various list operations (insert, delete).
- Centralize if_gre interface teardown in gre_destroy(), and call this from modevent unload and gre_clone_destroy().
- Export gre_mtx to ip_gre.c, which walks the gre list to look up gre interfaces during encapsulation. Add a wonking comment on how we need some sort of drain/reference count mechanism to keep gre references alive while in use and simultaneous destroy.
This commit does not lockdown softc data, which follows in a future commit.
|
127277 |
21-Mar-2004 |
mdodd |
- Fix indentation lost by 'diff -b'.
- Un-wrap short line.
|
127261 |
21-Mar-2004 |
mdodd |
Remove interface type specific code from arprequest(), and in_arpinput().
The AF_ARP case in the (*if_output)() routine will handle the interface type specific bits.
Obtained from: NetBSD
|
127094 |
16-Mar-2004 |
des |
Run through indent(1) so I can read the code without getting a headache. The result isn't quite knf, but it's knfer than the original, and far more consistent.
|
126936 |
14-Mar-2004 |
mdodd |
De-register.
|
126792 |
10-Mar-2004 |
rwatson |
Lock down IP-layer encapsulation library:
- Add encapmtx to protect ip_encap.c global variables (encapsulation list).
- Unifdef #ifdef 0 pieces of encap_init() which was (and now really is) basically a no-op.
- Lock encapmtx when walking encaptab, modifying it, comparing entries, etc.
- Remove spl's.
Note that currently there's no facility to make sure outstanding use of encapsulation methods on a table entry have drained before we allow a table entry to be removed. As such, it's currently the caller's responsibility to make sure that draining takes place.
Reviewed by: mlaier
|
126791 |
10-Mar-2004 |
rwatson |
Scrub unused variable zeroin_addr.
|
126741 |
08-Mar-2004 |
hsu |
To comply with the spec, do not copy the TOS from the outer IP header to the inner IP header of the PIM Register if this is a PIM Null-Register message.
Submitted by: Pavlin Radoslavov <pavlin@icir.org>
|
126740 |
08-Mar-2004 |
hsu |
Include <sys/types.h> for autoconf/automake detection.
Submitted by: Pavlin Radoslavov <pavlin@icir.org>
|
126513 |
03-Mar-2004 |
mlaier |
Add some missing DUMMYNET_UNLOCK() in config_pipe().
Noticed by: Simon Coggins Approved by: bms(mentor)
|
126486 |
02-Mar-2004 |
mlaier |
Two minor follow-ups on the MT_TAG removal: ifp is now passed explicitly to ether_demux; no need to look it up again. Make mtag a global var in ip_input.
Noticed by: rwatson Approved by: bms(mentor)
|
126467 |
01-Mar-2004 |
rwatson |
Rename NET_PICKUP_GIANT() to NET_LOCK_GIANT(), and NET_DROP_GIANT() to NET_UNLOCK_GIANT(). While they are used in similar ways, the semantics are quite different -- NET_LOCK_GIANT() and NET_UNLOCK_GIANT() directly wrap mutex lock and unlock operations, whereas drop/pickup special case the handling of Giant recursion. Add a comment saying as much.
Add NET_ASSERT_GIANT(), which conditionally asserts Giant based on the value of debug_mpsafenet.
|
126456 |
01-Mar-2004 |
ume |
fix -O0 compilation without INET6.
Pointed out by: ru
|
126368 |
28-Feb-2004 |
rwatson |
Remove unneeded {} originally used to hold local variables for dummynet in a code block, as the variable is now gone.
Submitted by: sam
|
126351 |
28-Feb-2004 |
rwatson |
Remove now unneeded arguments to tcp_twrespond() -- so and msrc. These were needed by the MAC Framework until inpcbs gained labels.
Submitted by: sam
|
126264 |
26-Feb-2004 |
mlaier |
Bring eventhandler callbacks for pf. This enables pf to track dynamic address changes on interfaces (dialup) with the "on (<ifname>)"-syntax. This also brings hooks in anticipation of tracking cloned interfaces, which will be in future versions of pf.
Approved by: bms(mentor)
|
126263 |
26-Feb-2004 |
mlaier |
Tweak existing header and other build infrastructure to be able to build pf/pflog/pfsync as modules. Do not list them in NOTES or modules/Makefile (i.e. do not connect it to any (automatic) builds - yet).
Approved by: bms(mentor)
|
126253 |
26-Feb-2004 |
truckman |
Split the mlock() kernel code into two parts, mlock(), which unpacks the syscall arguments and does the suser() permission check, and kern_mlock(), which does the resource limit checking and calls vm_map_wire(). Split munlock() in a similar way.
Enable the RLIMIT_MEMLOCK checking code in kern_mlock().
Replace calls to vslock() and vsunlock() in the sysctl code with calls to kern_mlock() and kern_munlock() so that the sysctl code will obey the wired memory limits.
Nuke the vslock() and vsunlock() implementations, which are no longer used.
Add a member to struct sysctl_req to track the amount of memory that is wired to handle the request.
Modify sysctl_wire_old_buffer() to return an error if its call to kern_mlock() fails. Only wire the minimum of the length specified in the sysctl request and the length specified in its argument list. It is recommended that sysctl handlers that use sysctl_wire_old_buffer() should specify reasonable estimates for the amount of data they want to return so that only the minimum amount of memory is wired no matter what length has been specified by the request.
Modify the callers of sysctl_wire_old_buffer() to look for the error return.
Modify sysctl_old_user to obey the wired buffer length and clean up its implementation.
Reviewed by: bms
|
126239 |
25-Feb-2004 |
mlaier |
Re-remove MT_TAGs. The problems with dummynet have been fixed now.
Tested by: -current, bms(mentor), me Approved by: bms(mentor), sam
|
126226 |
25-Feb-2004 |
bde |
Fixed namespace pollution in rev.1.74. Implementation of the syncache increased <netinet/tcp_var>'s already large set of prerequisites, and this was handled badly. Just don't declare the complete syncache struct unless <netinet/pcb.h> is included before <netinet/tcp_var.h>.
Approved by: jlemon (years ago, for a more invasive fix)
|
126225 |
25-Feb-2004 |
bde |
Don't use the negatively-opaque type uma_zone_t or be chummy with <vm/uma.h>'s idempotency identifier or its misspelling.
|
126220 |
25-Feb-2004 |
hsu |
Relax a KASSERT condition to allow for a valid corner case where the FIN on the last segment consumes an extra sequence number.
Spurious panic reported by Mike Silbersack <silby@silby.com>.
|
126193 |
24-Feb-2004 |
andre |
Convert the tcp segment reassembly queue to UMA and limit the maximum amount of segments it will hold.
The following tuneables and sysctls control the behaviour of the tcp segment reassembly queue:
net.inet.tcp.reass.maxsegments (loader tuneable) specifies the maximum number of segments all tcp reassembly queues can hold (defaults to 1/16 of nmbclusters).
net.inet.tcp.reass.maxqlen specifies the maximum number of segments any individual tcp session queue can hold (defaults to 48).
net.inet.tcp.reass.cursegments (readonly) counts the number of segments currently in all reassembly queues.
net.inet.tcp.reass.overflows (readonly) counts how often either the global or local queue limit has been reached.
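How the four knobs relate can be sketched as a small admission check. This is an assumption-laden illustration rather than the committed code: struct reass_limits, reass_default_maxsegments(), and reass_admit() are hypothetical names, and the real queue may treat some segments specially in ways not described here.

```c
#include <stdbool.h>

/* Hypothetical container for the values named by the sysctls above. */
struct reass_limits {
	int maxsegments;	/* global cap across all queues */
	int maxqlen;		/* cap for any single connection's queue */
	int cursegments;	/* segments currently queued, globally */
	int overflows;		/* times either cap was hit */
};

/* Default global cap: 1/16 of nmbclusters, as stated above. */
static int
reass_default_maxsegments(int nmbclusters)
{
	return (nmbclusters / 16);
}

/*
 * Try to account for one more queued segment on a connection whose
 * current queue length is *conn_qlen.  Returns false, and counts an
 * overflow, when the global or the per-connection limit is reached.
 */
static bool
reass_admit(struct reass_limits *lim, int *conn_qlen)
{
	if (lim->cursegments >= lim->maxsegments ||
	    *conn_qlen >= lim->maxqlen) {
		lim->overflows++;
		return (false);
	}
	lim->cursegments++;
	(*conn_qlen)++;
	return (true);
}
```

The per-connection cap keeps one session from monopolising the shared pool, while the global cap bounds total memory use regardless of how many sessions exist.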
Tested by: bms, silby Reviewed by: bms, silby
|
126002 |
19-Feb-2004 |
pjd |
Fixed ucred structure leak.
Approved by: scottl (mentor) PR: 54163 MFC after: 3 days
|
125952 |
18-Feb-2004 |
mlaier |
Backout MT_TAG removal (i.e. bring back MT_TAGs) for now, as dummynet is not working properly with the patch in place.
Approved by: bms(mentor)
|
125941 |
17-Feb-2004 |
ume |
IPSEC and FAST_IPSEC have the same internal API now; so merge these (IPSEC has an extra ipsecstat)
Submitted by: "Bjoern A. Zeeb" <bzeeb+freebsd@zabbadoz.net>
|
125890 |
16-Feb-2004 |
bms |
Shorten the name of the socket option used to enable TCP-MD5 packet treatment.
Submitted by: Vincent Jardin
|
125875 |
16-Feb-2004 |
ume |
don't update outgoing ifp, if ipsec tunnel mode encapsulation was not made.
Obtained from: KAME
|
125870 |
16-Feb-2004 |
bms |
Spell types consistently throughout this file. Do not use the __packed attribute, as we are often #include'd from userland without <sys/cdefs.h> in front of us, and it is not strictly necessary.
Noticed by: Sascha Blank
|
125819 |
14-Feb-2004 |
bms |
Final brucification pass. Spell types consistently (u_int). Remove bogus casts. Remove unnecessary parenthesis.
Submitted by: bde
|
125791 |
13-Feb-2004 |
mlaier |
Do not expose ip_dn_find_rule inline function to userland and unbreak world.
|
125785 |
13-Feb-2004 |
mlaier |
Do not check receive interface when pfil(9) hook changed address.
Approved by: bms(mentor)
|
125784 |
13-Feb-2004 |
mlaier |
This set of changes eliminates the use of MT_TAG "pseudo mbufs", replacing them mostly with packet tags (one case is handled by using an mbuf flag since the linkage between "caller" and "callee" is direct and there's no need to incur the overhead of a packet tag).
This is (mostly) work from: sam
Silence from: -arch Approved by: bms(mentor), sam, rwatson
|
125783 |
13-Feb-2004 |
bms |
Brucification.
Submitted by: bde
|
125776 |
13-Feb-2004 |
ume |
supported IPV6_RECVPATHMTU socket option.
Obtained from: KAME
|
125742 |
12-Feb-2004 |
bms |
Update the prototype for tcpsignature_apply() to reflect the spelling of the types used by m_apply()'s callback function, f, as documented in mbuf(9).
Noticed by: njl
|
125741 |
12-Feb-2004 |
bms |
style(9) pass; whitespace and comments.
Submitted by: njl
|
125740 |
12-Feb-2004 |
bms |
Remove an unnecessary initialization that crept in from the code which verifies TCP-MD5 digests.
Noticed by: njl
|
125698 |
11-Feb-2004 |
bms |
Fix a typo; left out preprocessor conditional for sigoff variable, which is only used by TCP_SIGNATURE code.
Noticed by: Roop Nanuwa
|
125680 |
11-Feb-2004 |
bms |
Initial import of RFC 2385 (TCP-MD5) digest support.
This is the first of two commits; bringing in the kernel support first. This can be enabled by compiling a kernel with options TCP_SIGNATURE and FAST_IPSEC.
For the uninitiated, this is a TCP option which provides for a means of authenticating TCP sessions which came into being before IPSEC. It is still relevant today, however, as it is used by many commercial router vendors, particularly with BGP, and as such has become a requirement for interconnect at many major Internet points of presence.
Several parts of the TCP and IP headers, including the segment payload, are digested with MD5, including a shared secret. The PF_KEY interface is used to manage the secrets using security associations in the SADB.
There is a limitation here in that as there is no way to map a TCP flow per-port back to an SPI without polluting tcpcb or using the SPD; the code to do the latter is unstable at this time. Therefore this code only supports per-host keying granularity.
Whilst FAST_IPSEC is mutually exclusive with KAME IPSEC (and thus IPv6), TCP_SIGNATURE applies only to IPv4. For the vast majority of prospective users of this feature, this will not pose any problem.
This implementation is output-only; that is, the option is honoured when responding to a host initiating a TCP session, but no effort is made [yet] to authenticate inbound traffic. This is, however, sufficient to interwork with Cisco equipment.
Tested with a Cisco 2501 running IOS 12.0(27), and Quagga 0.96.4 with local patches. Patches for tcpdump to validate TCP-MD5 sessions are also available from me upon request.
Sponsored by: sentex.net
|
125396 |
03-Feb-2004 |
ume |
pass pcb rather than so. it is expected that per socket policy works again.
|
125360 |
02-Feb-2004 |
andre |
Add sysctl net.inet.icmp.reply_src to specify the interface name used for the ICMP reply source in response to packets which are not directly addressed to us. By default continue with normal source selection.
Reviewed by: bms
|
125349 |
02-Feb-2004 |
andre |
More verbose description of the source ip address selection for ICMP replies.
Reviewed by: bms
|
125264 |
31-Jan-2004 |
phk |
Introduce the SO_BINTIME option which takes a high-resolution timestamp at packet arrival.
For benchmarking purposes SO_BINTIME is preferable to SO_TIMEVAL since it has higher resolution and lower overhead. Simultaneous use of the two options is possible and they will return consistent timestamps.
This introduces an extra test and a function call for SO_TIMEVAL, but I have not been able to measure that.
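The resolution difference can be illustrated with a conversion from a binary-fraction timestamp to microseconds. The sketch below defines its own struct toy_bintime (seconds plus a 64-bit binary fraction of a second) and toy_bintime_usec(); both names are hypothetical stand-ins, and the layout is an assumption modelled on the kernel's bintime type rather than a copy of any header.

```c
#include <stdint.h>

/* Hypothetical stand-in: whole seconds plus a 64-bit binary fraction. */
struct toy_bintime {
	int64_t  sec;	/* whole seconds */
	uint64_t frac;	/* fraction of a second, in units of 2^-64 s */
};

/*
 * Convert the fractional part to microseconds.  Only the top 32 bits
 * of the fraction are needed for microsecond precision, which shows
 * how much finer the binary fraction is than a timeval.
 */
static uint32_t
toy_bintime_usec(const struct toy_bintime *bt)
{
	return ((uint32_t)((UINT64_C(1000000) *
	    (uint32_t)(bt->frac >> 32)) >> 32));
}
```

A fraction of 0x8000000000000000 is exactly half a second, so the conversion yields 500000 microseconds; the low 32 bits that the conversion discards are resolution a timeval cannot represent.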
|
125226 |
30-Jan-2004 |
sobomax |
Remove NetBSD'isms (add FreeBSD'isms?), which makes gre(4) working again.
|
125118 |
27-Jan-2004 |
ru |
Correct the descriptions of the net.inet.{udp,raw}.recvspace sysctls.
|
125024 |
26-Jan-2004 |
sobomax |
Add support for WCCPv2. It should be enabled manually using link2 ifconfig(8) flag since header for version 2 is the same but IP payload is prepended with additional 4-byte field.
Inspired by: Roman Synyuk <roman@univ.kiev.ua> MFC after: 2 weeks
|
125020 |
26-Jan-2004 |
sobomax |
(whitespace-only)
Kill trailing spaces.
|
124851 |
23-Jan-2004 |
andre |
Remove leftover FREE() from changes in rev 1.50.
Noticed by: Jun Kuriyama <kuriyama@imgsrc.co.jp>
|
124849 |
22-Jan-2004 |
andre |
Split the overloaded variable 'win' into two for their specific purposes: recwin and sendwin. This removes a big source of confusion and makes following the code much easier.
Reviewed by: sam (mentor) Obtained from: DragonFlyBSD rev 1.6 (hsu)
|
124848 |
22-Jan-2004 |
andre |
Move the reduction by one of the syncache limit after the zone has been allocated.
Reviewed by: sam (mentor) Obtained from: DragonFlyBSD rev 1.6 (hsu)
|
124847 |
22-Jan-2004 |
andre |
Remove an unused variable and put the sockaddr_in6 onto the stack instead of malloc'ing it.
Reviewed by: sam (mentor) Obtained from: DragonFlyBSD rev 1.6 (hsu)
|
124761 |
20-Jan-2004 |
hsu |
Merge from DragonFlyBSD rev 1.10:
date: 2003/09/02 10:04:47; author: hsu; state: Exp; lines: +5 -6 Account for when Limited Transmit is not congestion window limited.
Obtained from: DragonFlyBSD
|
124621 |
17-Jan-2004 |
phk |
Mostly mechanical rework of libalias:
Makes it possible to have multiple packet aliasing instances in a single process by moving all static and global variables into an instance structure called "struct libalias".
Redefine a new API based on s/PacketAlias/LibAlias/g
Add new "instance" argument to all functions in the new API.
Implement old API in terms of the new API.
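The pattern described above (state moved into an instance structure, new functions taking that instance, old functions kept as thin wrappers) can be sketched as follows. Every identifier here is a toy stand-in invented for illustration, deliberately prefixed with Toy or toy_; the sketch does not reproduce the real libalias function signatures or the contents of struct libalias.

```c
#include <stdint.h>
#include <stdlib.h>

/* Toy instance structure: formerly static/global state lives here. */
struct toy_alias {
	uint32_t alias_address;
};

/* New-style API: every function takes the instance as first argument. */
static struct toy_alias *
ToyLibAliasInit(void)
{
	return (calloc(1, sizeof(struct toy_alias)));
}

static void
ToyLibAliasSetAddress(struct toy_alias *la, uint32_t addr)
{
	la->alias_address = addr;
}

static uint32_t
ToyLibAliasGetAddress(const struct toy_alias *la)
{
	return (la->alias_address);
}

static void
ToyLibAliasUninit(struct toy_alias *la)
{
	free(la);
}

/* Old-style API: implemented on top of one hidden default instance. */
static struct toy_alias *toy_default_instance;

static void
ToyPacketAliasSetAddress(uint32_t addr)
{
	if (toy_default_instance == NULL)
		toy_default_instance = ToyLibAliasInit();
	if (toy_default_instance != NULL)
		ToyLibAliasSetAddress(toy_default_instance, addr);
}

static uint32_t
ToyPacketAliasGetAddress(void)
{
	if (toy_default_instance == NULL)
		return (0);
	return (ToyLibAliasGetAddress(toy_default_instance));
}
```

Existing callers of the old-style functions keep working unchanged, while new callers can create as many independent instances as they need in one process.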
|
124464 |
13-Jan-2004 |
ume |
do not deref freed pointer
Submitted by: "Bjoern A. Zeeb" <bzeeb+freebsd@zabbadoz.net> Reviewed by: itojun
|
124437 |
12-Jan-2004 |
andre |
Disable the minmssoverload connection drop by default until the detection logic is refined.
|
124336 |
10-Jan-2004 |
truckman |
Check that sa_len is the appropriate value in tcp_usr_bind(), tcp6_usr_bind(), tcp_usr_connect(), and tcp6_usr_connect() before checking to see whether the address is multicast so that the proper errno value will be returned if sa_len is incorrect. The checks are identical to the ones in in_pcbbind_setup(), in6_pcbbind(), and in6_pcbladdr(), which are called after the multicast address check passes.
MFC after: 30 days
|
124290 |
09-Jan-2004 |
andre |
Reduce TCP_MINMSS default to 216. The AX.25 protocol (packet radio) is frequently used with an MTU of 256 because of slow speeds and a high packet loss rate.
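The new default is consistent with subtracting minimal headers from a 256-byte MTU. A small sketch of that arithmetic, assuming 20-byte IPv4 and TCP headers without options (the header sizes and the helper name `mss_for_mtu` are assumptions for illustration, not taken from the commit):

```python
# Assumed header sizes (no IP or TCP options); not stated in the commit message.
IPV4_HEADER = 20
TCP_HEADER = 20


def mss_for_mtu(mtu: int) -> int:
    """Largest TCP payload that fits in one packet of the given MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


ax25_mtu = 256
print(mss_for_mtu(ax25_mtu))  # 216
```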
|
124258 |
08-Jan-2004 |
andre |
Limiters and sanity checks for TCP MSS (maximum segment size) resource exhaustion attacks.
For network link optimization TCP can adjust its MSS and thus packet size according to the observed path MTU. This is done dynamically based on feedback from the remote host and network components along the packet path. This information can be abused to pretend an extremely low path MTU.
The resource exhaustion works in two ways:
o during tcp connection setup the advertized local MSS is exchanged between the endpoints. The remote endpoint can set this arbitrarily low (except for a minimum MTU of 64 octets enforced in the BSD code). When the local host is sending data it is forced to send many small IP packets instead of a large one.
For example instead of the normal TCP payload size of 1448 it forces TCP payload size of 12 (MTU 64) and thus we have a 120 times increase in workload and packets. On fast links this quickly saturates the local CPU and may also hit pps processing limits of network components along the path.
This type of attack is particularly effective for servers where the attacker can download large files (WWW and FTP).
We mitigate it by enforcing a minimum MTU settable by sysctl net.inet.tcp.minmss defaulting to 256 octets.
o the local host is receiving data on a TCP connection from the remote host. The local host has no control over the packet size the remote host is sending. The remote host may choose to do what is described in the first attack and send the data in packets with a TCP payload of at least one byte. For each packet the tcp_input() function will be entered, the packet is processed and a sowakeup() is signalled to the connected process.
For example an attack with 2 Mbit/s gives 4716 packets per second and the same amount of sowakeup()s to the process (and context switches).
This type of attack is particularly effective for servers where the attacker can upload large amounts of data. Normally this is the case with WWW server where large POSTs can be made.
We mitigate this by calculating the average MSS payload per second. If it goes below 'net.inet.tcp.minmss' and the pps rate is above 'net.inet.tcp.minmssoverload' defaulting to 1000 this particular TCP connection is reset and dropped.
MITRE CVE: CAN-2004-0002 Reviewed by: sam (mentor) MFC after: 1 day
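The figures quoted in this entry can be reproduced with a short calculation. The sketch below assumes 52 bytes of headers per packet (20 IPv4 + 20 TCP + 12 bytes of TCP options), which is inferred from the numbers 1448 and 12 rather than stated in the commit. `should_drop` is a hypothetical illustration of the second mitigation as described in the text, not the kernel implementation.

```python
# Assumption: 52 bytes of headers per packet (20 IPv4 + 20 TCP + 12 TCP options).
# Inferred from the figures in the commit message, not stated there.
HEADER_OVERHEAD = 52


def payload_for_mtu(mtu: int) -> int:
    """TCP payload bytes that fit in one packet of the given MTU."""
    return mtu - HEADER_OVERHEAD


normal = payload_for_mtu(1500)          # 1448
attack = payload_for_mtu(64)            # 12
packet_multiplier = normal // attack    # 120 times more packets for the same data


def packets_per_second(bits_per_second: int, packet_bytes: int) -> int:
    """Whole packets of the given size that fit into the link rate."""
    return (bits_per_second // 8) // packet_bytes


# One payload byte per packet on a 2 Mbit/s link.
one_byte_pps = packets_per_second(2_000_000, HEADER_OVERHEAD + 1)  # 4716


def should_drop(payload_bytes_last_second: int,
                packets_last_second: int,
                minmss: int = 256,
                minmssoverload: int = 1000) -> bool:
    """Hypothetical check: average payload too small AND packet rate too high."""
    if packets_last_second == 0:
        return False
    average_payload = payload_bytes_last_second / packets_last_second
    return average_payload < minmss and packets_last_second > minmssoverload


print(packet_multiplier, one_byte_pps)
print(should_drop(one_byte_pps, one_byte_pps))  # tiny segments at a high rate
```

Both conditions must hold in this sketch, so a slow interactive session that sends small segments at a low packet rate is not flagged.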
|
124248 |
08-Jan-2004 |
andre |
If path mtu discovery is enabled set the DF bit in all cases we send packets on a tcp connection.
PR: kern/60889 Tested by: Richard Wendland <richard@wendland.org.uk> Approved by: re (scottl)
|
124247 |
08-Jan-2004 |
andre |
Do not set the ip_id to zero when DF is set on packet and restore the general pre-randomid behaviour.
Setting the ip_id to zero causes several problems with packet reassembly when a device along the path removes the DF bit for some reason.
Other BSD and Linux have found and fixed the same issues.
PR: kern/60889 Tested by: Richard Wendland <richard@wendland.org.uk> Approved by: re (scottl)
|
124199 |
06-Jan-2004 |
andre |
Enable the following TCP options by default to give it more exposure:
rfc3042: Limited retransmit; rfc3390: Increasing TCP's initial congestion window; inflight: TCP inflight bandwidth limiting.
All my production servers have it enabled and there have been no issues. I am confident about having them on by default and it gives us better overall TCP performance.
Reviewed by: sam (mentor)
|
124198 |
06-Jan-2004 |
andre |
According to RFC1812 we have to ignore ICMP redirects when we are acting as router (ipforwarding enabled).
This doesn't fix the problem that host routes from ICMP redirects are never removed from the kernel routing table but removes the problem for machines doing packet forwarding.
Reviewed by: sam (mentor)
|
123998 |
30-Dec-2003 |
ru |
Document the net.inet.ip.subnets_are_local sysctl.
|
123992 |
30-Dec-2003 |
sobomax |
Sync with NetBSD:
if_gre.c rev.1.41-1.49
o Spell output with two ts. o Remove assigned-to but not used variable. o fix grammatical error in a diagnostic message. o u_short -> u_int16_t. o gi_len is ip_len, so it has to be network byteorder.
if_gre.h rev.1.11-1.13
o prototype must not have variable name. o u_short -> u_int16_t. o Spell address with two d's.
ip_gre.c rev.1.22-1.29
o KNF - return is not a function. o The "osrc" variable in gre_mobile_input() is only ever set but not referenced; remove it. o correct (false) assumptions on mbuf chain. not sure if it really helps, but anyways, it is necessary to perform m_pullup. o correct arg to m_pullup (need to count IP header size as well). o remove redundant adjustment of m->m_pkthdr.len. o clear m_flags just for safety. o tabify. o u_short -> u_int16_t.
MFC after: 2 weeks
|
123922 |
28-Dec-2003 |
sam |
o eliminate widespread on-stack mbuf use for bpf by introducing a new bpf_mtap2 routine that does the right thing for an mbuf and a variable-length chunk of data that should be prepended. o while we're sweeping the drivers, use u_int32_t uniformly when prepending the address family (several places were assuming sizeof(int) was 4) o return M_ASSERTVALID to BPF_MTAP* now that all stack-allocated mbufs have been eliminated; this may better be moved to the bpf routines
Reviewed by: arch@ and several others
|
123893 |
27-Dec-2003 |
maxim |
o Fix a comment: softticks lives in sys/kern/kern_timeout.c.
PR: kern/60613 Submitted by: Gleb Smirnoff MFC after: 3 days
|
123809 |
24-Dec-2003 |
ume |
NULL is not 0.
Submitted by: "Bjoern A. Zeeb" <bzeeb-lists@lists.zabbadoz.net>
|
123768 |
23-Dec-2003 |
ru |
I didn't notice it right away, but check the right length too.
|
123765 |
23-Dec-2003 |
ru |
Fix a problem introduced in revision 1.84: m_pullup() does not necessarily return the same mbuf chain so we need to recompute mtod() consumers after pulling up.
|
123740 |
23-Dec-2003 |
peter |
Catch a few places where NULL (pointer) was used where 0 (integer) was expected.
|
123690 |
20-Dec-2003 |
sam |
o move mutex init/destroy logic to the module load/unload hooks; otherwise they are initialized twice when the code is statically configured in the kernel because the module load method gets invoked before the user application calls ip_mrouter_init o add a mutex to synchronize the module init/done operations; this sort of was done using the value of ip_mroute but X_ip_mrouter_done sets it to NULL very early on which can lead to a race against ip_mrouter_init--using the additional mutex means this is safe now o don't call ip_mrouter_reset from ip_mrouter_init; this now happens once at module load and X_ip_mrouter_done does the appropriate cleanup work to ensure the data structures are in a consistent state so that a subsequent init operation inherits good state
Reviewed by: juli
|
123608 |
17-Dec-2003 |
jhb |
Fix some becuase -> because typos.
Reported by: Marco Wertejuk <wertejuk@mwcis.com>
|
123607 |
17-Dec-2003 |
rwatson |
Switch TCP over to using the inpcb label when responding in timed wait, rather than the socket label. This avoids reaching up to the socket layer during connection close, which requires locking changes. To do this, introduce MAC Framework entry point mac_create_mbuf_from_inpcb(), which is called from tcp_twrespond() instead of calling mac_create_mbuf_from_socket() or mac_create_mbuf_netlayer(). Introduce MAC Policy entry point mpo_create_mbuf_from_inpcb(), and implementations for various policies, which generally just copy label data from the inpcb to the mbuf. Assert the inpcb lock in the entry point since we require consistency for the inpcb label reference.
Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
|
123572 |
16-Dec-2003 |
maxim |
o IN_MULTICAST wants an address in host byte order.
PR: kern/60304 Submitted by: demon MFC after: 1 week
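A minimal sketch of why byte order matters here. `in_multicast` mirrors the classic class D test (top four bits equal to 1110); the little-endian load simulates passing a network-order address to the macro without conversion. The function and variable names are illustrative only.

```python
import socket
import struct


def in_multicast(addr_host_order: int) -> bool:
    """Class D test: the top four bits of a host-order address are 1110."""
    return (addr_host_order & 0xF0000000) == 0xE0000000


wire = socket.inet_aton("224.0.0.1")               # 4 bytes, network byte order
host_order = struct.unpack("!I", wire)[0]          # what ntohl() would yield
raw_little_endian = struct.unpack("<I", wire)[0]   # unconverted load on a little-endian CPU

print(in_multicast(host_order))          # True
print(in_multicast(raw_little_endian))   # False
```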
|
123169 |
06-Dec-2003 |
emax |
Do not panic when flushing dummynet firewall rules
Reviewed by: andre Approved by: re (scottl)
|
123113 |
02-Dec-2003 |
andre |
Swap destination and source arguments of two bcopy() calls.
Before committing the initial tcp_hostcache I changed them from memcpy() to conform with FreeBSD style without realizing the difference in argument definition.
This fixes hostcache operation for IPv6 (in general and explicitly IPv6 path mtu discovery) and T/TCP (RFC1644).
Submitted by: Taku YAMAMOTO <taku@cent.saitama-u.ac.jp> Approved by: re (rwatson)
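The pitfall is that C's memcpy() takes the destination first while bcopy() takes the source first. The sketch below uses Python stand-ins with the same argument order (the buffers and names are hypothetical) to show what happens when only the function name is changed:

```python
def memcpy(dst: bytearray, src: bytearray, n: int) -> None:
    """Stand-in for C memcpy(dst, src, n): destination first."""
    dst[:n] = src[:n]


def bcopy(src: bytearray, dst: bytearray, n: int) -> None:
    """Stand-in for C bcopy(src, dst, n): source first."""
    dst[:n] = src[:n]


# Renaming memcpy(cache, addr, 4) to bcopy(cache, addr, 4) without swapping
# the arguments copies in the wrong direction and clobbers the source.
addr_wrong = bytearray(b"\x0a\x00\x00\x01")
cache_wrong = bytearray(4)
bcopy(cache_wrong, addr_wrong, 4)
print(addr_wrong.hex(), cache_wrong.hex())   # 00000000 00000000

# Correct conversion: swap the first two arguments along with the name.
addr_ok = bytearray(b"\x0a\x00\x00\x01")
cache_ok = bytearray(4)
bcopy(addr_ok, cache_ok, 4)
print(addr_ok.hex(), cache_ok.hex())         # 0a000001 0a000001
```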
|
123096 |
02-Dec-2003 |
sam |
Include opt_ipsec.h so IPSEC/FAST_IPSEC is defined and the appropriate code is compiled in to support the O_IPSEC operator. Previously no support was included and ipsec rules were always matching. Note that we do not return an error when an ipsec rule is added and the kernel does not have IPsec support compiled in; this is done intentionally but we may want to revisit this (document this in the man page).
PR: 58899 Submitted by: Bjoern A. Zeeb Approved by: re (rwatson)
|
123028 |
28-Nov-2003 |
andre |
Fix an optimization where I made an ifdef'd out section too broad.
When the hostcache bucket limit is reached the last bucket wasn't removed from the bucket row but inserted a few lines later at the bucket row head again. This leads to an infinite loop when the same bucket row is accessed the next time for a lookup/insert or purge action.
Tested by: imp, Matt Smith Approved by: re (rwatson)
|
123000 |
27-Nov-2003 |
andre |
Fix verify_rev_path() function. The author of this function tried to cut corners which completely broke down when the routing table locking was introduced.
Reviewed by: sam (mentor) Approved by: re (rwatson)
|
122996 |
26-Nov-2003 |
andre |
Make sure all uses of stack allocated struct route's are properly zeroed. Doing a bzero on the entire struct route is not more expensive than assigning NULL to ro.ro_rt and bzero of ro.ro_dst.
Reviewed by: sam (mentor) Approved by: re (scottl)
|
122991 |
26-Nov-2003 |
sam |
Split the "inp" mutex class into separate classes for each of divert, raw, tcp, udp, raw6, and udp6 sockets to avoid spurious witness complaints.
Reviewed by: rwatson Approved by: re (rwatson)
|
122987 |
25-Nov-2003 |
andre |
Restructure a too broad ifdef which was disabling the setting of the tcp flightsize sysctl value for local networks in the !INET6 case.
Approved by: re (scottl)
|
122971 |
24-Nov-2003 |
sam |
Correct a problem where ipfw-generated packets were being returned for ipfw processing w/o an indication the packets were generated by ipfw--and so should not be processed (this manifested itself as a LOR.) The flag bit in the mbuf that was used to mark the packets was not listed in M_COPYFLAGS so if a packet had a header prepended (as done by IPsec) the flag was lost. Correct this by defining a new M_PROTO6 flag and use it to mark packets that need this processing.
Reviewed by: bms Approved by: re (rwatson) MFC after: 2 weeks
|
122966 |
23-Nov-2003 |
sam |
Use MPSAFE callouts only when debug.mpsafenet is 1. Both timer routines potentially transmit packets that may enter KAME IPsec w/o Giant if the callouts are marked MPSAFE.
Reviewed by: ume Approved by: re (rwatson)
|
122960 |
23-Nov-2003 |
tmm |
bzero() the sockaddr used for the destination address for rtalloc_ign() in in_pcbconnect_setup() before it is filled out. Otherwise, stack junk would be left in sin_zero, which could cause host routes to be ignored because they failed the comparison in rn_match(). This should fix the wrong source address selection for connect() to 127.0.0.1, among other things.
Reviewed by: sam Approved by: re (rwatson)
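A sketch of the failure mode, assuming a 16-byte BSD-style sockaddr_in layout (sin_len, sin_family, sin_port, sin_addr, then 8 bytes of sin_zero) and treating the lookup as a plain byte-wise comparison, which is a simplification of what rn_match() does. The helper name is hypothetical.

```python
import struct


def fill_sockaddr_in(buf: bytearray, addr: bytes, port: int = 0) -> None:
    """Fill the named fields only; the trailing 8 bytes (sin_zero) are left untouched."""
    # sin_len = 16, sin_family = AF_INET (2), sin_port, sin_addr: first 8 bytes.
    struct.pack_into("!BBH4s", buf, 0, 16, 2, port, addr)


loopback = bytes([127, 0, 0, 1])

route_key = bytearray(16)               # key stored in the table, cleanly zeroed
fill_sockaddr_in(route_key, loopback)

stack_junk = bytearray(b"\xa5" * 16)    # simulated uninitialized stack memory
fill_sockaddr_in(stack_junk, loopback)

zeroed = bytearray(b"\xa5" * 16)
zeroed[:] = bytes(16)                   # the bzero() step added by this commit
fill_sockaddr_in(zeroed, loopback)

print(stack_junk == route_key)   # False: junk in sin_zero breaks the match
print(zeroed == route_key)       # True
```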
|
122922 |
20-Nov-2003 |
andre |
Introduce tcp_hostcache and remove the tcp specific metrics from the routing table. Move all usage and references in the tcp stack from the routing table metrics to the tcp hostcache.
It caches measured parameters of past tcp sessions to provide better initial start values for following connections from or to the same source or destination. Depending on the network parameters to/from the remote host this can lead to significant speedups for new tcp connections after the first one because they inherit and shortcut the learning curve.
tcp_hostcache is designed for multiple concurrent access in SMP environments with high contention and is hash indexed by remote ip address.
It removes significant locking requirements from the tcp stack with regard to the routing table.
Reviewed by: sam (mentor), bms Reviewed by: -net, -current, core@kame.net (IPv6 parts) Approved by: re (scottl)
|
122921 |
20-Nov-2003 |
andre |
Remove RTF_PRCLONING from routing table and adjust users of it accordingly. The define is left intact for ABI compatibility with userland.
This is a pre-step for the introduction of tcp_hostcache. The network stack remains fully useable with this change.
Reviewed by: sam (mentor), bms Reviewed by: -net, -current, core@kame.net (IPv6 parts) Approved by: re (scottl)
|
122915 |
20-Nov-2003 |
maxim |
Fix an arguments order in check_uidgid() call.
PR: kern/59314 Submitted by: Andrey V. Shytov Approved by: re (rwatson, jhb)
|
122875 |
18-Nov-2003 |
rwatson |
Introduce a MAC label reference in 'struct inpcb', which caches the MAC label referenced from 'struct socket' in the IPv4 and IPv6-based protocols. This permits MAC labels to be checked during network delivery operations without dereferencing inp->inp_socket to get to so->so_label, which will eventually avoid our having to grab the socket lock during delivery at the network layer.
This change introduces 'struct inpcb' as a labeled object to the MAC Framework, along with the normal circus of entry points: initialization, creation from socket, destruction, as well as a delivery access control check.
For most policies, the inpcb label will simply be a cache of the socket label, so a new protocol switch method is introduced, pr_sosetlabel() to notify protocols that the socket layer label has been updated so that the cache can be updated while holding appropriate locks. Most protocols implement this using pru_sosetlabel_null(), but IPv4/IPv6 protocols using inpcbs use the worker function in_pcbsosetlabel(), which calls into the MAC Framework to perform a cache update.
Biba, LOMAC, and MLS implement these entry points, as do the stub policy, and test policy.
Reviewed by: sam, bms Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
|
122867 |
17-Nov-2003 |
cognet |
In rip_abort(), unlock the inpcb if we didn't detach it, or we may recurse on the lock before destroying the mutex.
Submitted by: sam
|
122828 |
17-Nov-2003 |
green |
Fix a few cases where MT_TAG-type "fake mbufs" are created on the stack, but do not have mh_nextpkt initialized. Sometimes what's there is "1", and the ip_input() code pukes trying to m_free() it, rendering divert sockets and such broken. This really underscores the need to get rid of MT_TAG.
Reviewed by: rwatson
|
122797 |
16-Nov-2003 |
andre |
Make two casts correct for all types of 64bit platforms.
Explained by: bde
|
122759 |
15-Nov-2003 |
andre |
Correct a cast to make it compile on 64bit platforms (noticed by tinderbox) and remove two unnecessary variable initializations. Make the introduction comment more clear with regard to which parts of the packet are touched.
Requested by: luigi
|
122723 |
15-Nov-2003 |
andre |
Make ipstealth global as we need it in ip_fastforward too.
|
122708 |
14-Nov-2003 |
andre |
Remove the global one-level rtcache variable and associated complex locking and rework ip_rtaddr() to do its own rtlookup. Adapt all its callers to this and make ip_output() callable with NULL rt pointer.
Reviewed by: sam (mentor)
|
122702 |
14-Nov-2003 |
andre |
Introduce ip_fastforward and remove ip_flow.
Short description of ip_fastforward:
o adds full direct process-to-completion IPv4 forwarding code o handles ip fragmentation incl. hw support (ip_flow did not) o sends icmp needfrag to source if DF is set (ip_flow did not) o supports ipfw and ipfilter (ip_flow did not) o supports divert, ipfw fwd and ipfilter nat (ip_flow did not) o returns anything it can't handle back to normal ip_input
Enable with sysctl -w net.inet.ip.fastforwarding=1
Reviewed by: sam (mentor)
|
122599 |
13-Nov-2003 |
sam |
add missing inpcb lock before call to tcp_twclose (which reclaims the inpcb)
Supported by: FreeBSD Foundation
|
122598 |
13-Nov-2003 |
sam |
o reorder some locking asserts to reflect the order of the locks o correct a read-lock assert in in_pcblookup_local that should be a write-lock assert (since time wait close cleanups may alter state)
Supported by: FreeBSD Foundation
|
122593 |
13-Nov-2003 |
andre |
Move global variables for icmp_input() to its stack. With SMP or preemption two CPUs can be in the same function at the same time and clobber each others variables. Remove register declaration from local variables.
Reviewed by: sam (mentor)
|
122588 |
12-Nov-2003 |
andre |
Do not fragment a packet with hardware assistance if it has the DF bit set.
Reviewed by: sam (mentor)
|
122579 |
12-Nov-2003 |
bms |
Add a new sysctl knob, net.inet.udp.strict_mcast_mship, to the udp_input path.
This switch toggles between strict multicast delivery, and traditional multicast delivery.
The traditional (default) behaviour is to deliver multicast datagrams to all sockets which are members of that group, regardless of the network interface where the datagrams were received.
The strict behaviour is to deliver multicast datagrams received on a particular interface only to sockets whose membership is bound to that interface.
Note that as a matter of course, multicast consumers specifying INADDR_ANY for their interface get joined on the interface where the default route happens to be bound. This switch has no effect if the interface which the consumer specifies for IP_ADD_MEMBERSHIP is not UP and RUNNING.
The original patch has been cleaned up somewhat from that submitted. It has been tested on a multihomed machine with multiple QuickTime RTP streams running over the local switch, which doesn't do IGMP snooping.
PR: kern/58359 Submitted by: William A. Carrel Reviewed by: rwatson MFC after: 1 week
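The two delivery modes described in this entry can be sketched as a filter over group memberships. The data model below (`Membership`, `recipients`, the interface names) is hypothetical and only illustrates the stated semantics, not the udp_input code.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Membership:
    socket_name: str   # which socket joined
    group: str         # multicast group address
    interface: str     # interface the membership is bound to


def recipients(memberships: List[Membership], group: str,
               rx_interface: str, strict: bool) -> List[str]:
    """Return the sockets that receive a datagram for `group` arriving on `rx_interface`."""
    names: List[str] = []
    for m in memberships:
        if m.group != group:
            continue
        # Strict mode additionally requires the membership to be on the receiving interface.
        if strict and m.interface != rx_interface:
            continue
        if m.socket_name not in names:
            names.append(m.socket_name)
    return names


members = [
    Membership("a", "239.1.1.1", "em0"),
    Membership("b", "239.1.1.1", "em1"),
]
print(recipients(members, "239.1.1.1", "em0", strict=False))  # ['a', 'b']
print(recipients(members, "239.1.1.1", "em0", strict=True))   # ['a']
```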
|
122576 |
12-Nov-2003 |
andre |
dropwithreset is not needed in this case as tcp_drop() is already notifying the other side. Before we were sending two RST packets.
|
122524 |
12-Nov-2003 |
rwatson |
Modify the MAC Framework so that instead of embedding a (struct label) in various kernel objects to represent security data, we embed a (struct label *) pointer, which now references labels allocated using a UMA zone (mac_label.c). This allows the size and shape of struct label to be varied without changing the size and shape of these kernel objects, which become part of the frozen ABI with 5-STABLE. This opens the door for boot-time selection of the number of label slots, and hence changes to the bound on the number of simultaneous labeled policies at boot-time instead of compile-time. This also makes it easier to embed label references in new objects as required for locking/caching with fine-grained network stack locking, such as inpcb structures.
This change also moves us further in the direction of hiding the structure of kernel objects from MAC policy modules, not to mention dramatically reducing the number of '&' symbols appearing in both the MAC Framework and MAC policy modules, and improving readability.
While this results in minimal performance change with MAC enabled, it will observably shrink the size of a number of critical kernel data structures for the !MAC case, and should have a small (but measurable) performance benefit (i.e., struct vnode, struct socket) due to memory conservation and reduced cost of zeroing memory.
NOTE: Users of MAC must recompile their kernel and all MAC modules as a result of this change. Because this is an API change, third party MAC modules will also need to be updated to make less use of the '&' symbol.
Suggestions from: bmilekic Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
|
122501 |
11-Nov-2003 |
sam |
correct typos
Pointed out by: Mike Silbersack
|
122496 |
11-Nov-2003 |
sam |
o add missing inpcb locking in tcp_respond o replace spl's with lock assertions
Supported by: FreeBSD Foundation
|
122449 |
10-Nov-2003 |
sam |
use Giant-less callouts when debug_mpsafenet is non-zero
Supported by: FreeBSD Foundation
|
122446 |
10-Nov-2003 |
iedowse |
In in_pcbconnect_setup(), don't use the cached inp->inp_route unless it is marked as RTF_UP. This appears to fix a crash that was sometimes triggered when dhclient(8) tried to send a packet after an interface had been detached.
Reviewed by: sam
|
122437 |
10-Nov-2003 |
hsu |
Mark TCP syncache timer as not Giant-free ready yet.
|
122334 |
08-Nov-2003 |
sam |
replace explicit changes to rt_refcnt by RT_ADDREF and RT_REMREF macros that expand to include assertions when the system is built with INVARIANTS
Supported by: FreeBSD Foundation
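The idea of wrapping reference-count changes in helpers that carry debug-only assertions can be sketched as follows. The class, the function names, and the specific checks are assumptions for illustration; the commit does not spell out what the macros assert.

```python
INVARIANTS = True  # stand-in for the kernel build option of the same name


class Route:
    """Hypothetical routing entry with a reference count."""

    def __init__(self) -> None:
        self.refcnt = 0


def rt_addref(rt: Route) -> None:
    """Take a reference; with INVARIANTS, check the count is sane first."""
    if INVARIANTS:
        assert rt.refcnt >= 0, "negative refcnt"
    rt.refcnt += 1


def rt_remref(rt: Route) -> None:
    """Drop a reference; with INVARIANTS, catch a release with no reference held."""
    if INVARIANTS:
        assert rt.refcnt > 0, "releasing route with zero refcnt"
    rt.refcnt -= 1


rt = Route()
rt_addref(rt)
rt_addref(rt)
rt_remref(rt)
print(rt.refcnt)  # 1
```

Routing every change through the two helpers means a bad release is caught at the call site in debug builds instead of surfacing later as a use-after-free.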
|
122331 |
08-Nov-2003 |
sam |
divert socket fixups:
o pickup Giant in divert_packet to protect sbappendaddr since it can be entered through MPSAFE callouts or through ip_input when mpsafenet is 1 o add missing locking on output o add locking to abort and shutdown o add a ctlinput handler to invalidate held routing table references on an ICMP redirect (may not be needed)
Supported by: FreeBSD Foundation
|
122330 |
08-Nov-2003 |
sam |
assert optional inpcb is passed in locked
Supported by: FreeBSD Foundation
|
122329 |
08-Nov-2003 |
sam |
add locking assertions
Supported by: FreeBSD Foundation
|
122328 |
08-Nov-2003 |
sam |
assert inpcb is locked in udp_output
Supported by: FreeBSD Foundation
|
122327 |
08-Nov-2003 |
sam |
o correct locking problem: the inpcb must be held across tcp_respond o add assertions in tcp_respond to validate inpcb locking assumptions o use local variable instead of chasing pointers in tcp_respond
Supported by: FreeBSD Foundation
|
122326 |
08-Nov-2003 |
sam |
use local values instead of chasing pointers
Supported by: FreeBSD Foundation
|
122325 |
08-Nov-2003 |
sam |
replace mtx_assert by INP_LOCK_ASSERT
Supported by: FreeBSD Foundation
|
122324 |
08-Nov-2003 |
sam |
add some missing locking
Supported by: FreeBSD Foundation
|
122323 |
08-Nov-2003 |
sam |
the sbappendaddr call in socket_send must be protected by Giant because it can happen from an MPSAFE callout
Supported by: FreeBSD Foundation
|
122322 |
08-Nov-2003 |
sam |
add locking assertions that turn into noops if INET6 is configured; this is necessary because the ipv6 code shares the in_pcb code with ipv4 but (presently) lacks proper locking
Supported by: FreeBSD Foundation
|
122320 |
08-Nov-2003 |
sam |
o add a flags parameter to netisr_register that is used to specify whether or not the isr needs to hold Giant when running; Giant-less operation is also controlled by the setting of debug_mpsafenet o mark all netisr's except NETISR_IP as needing Giant o add a GIANT_REQUIRED assertion to the top of netisr's that need Giant o pickup Giant (when debug_mpsafenet is 1) inside ip_input before calling up with a packet o change netisr handling so swi_net runs w/o Giant; instead we grab Giant before invoking handlers based on whether the handler needs Giant o change netisr handling so that netisr's that are marked MPSAFE may have multiple instances active at a time o add netisr statistics for packets dropped because the isr is inactive
Supported by: FreeBSD Foundation
|
122271 |
08-Nov-2003 |
sam |
unbreak compilation of FAST_IPSEC
Supported by: FreeBSD Foundation
|
122267 |
07-Nov-2003 |
sam |
MFp4: reminder that random id code is not reentrant
Supported by: FreeBSD Foundation
|
122265 |
07-Nov-2003 |
sam |
Move uid/gid checking logic out of line and lock inpcb usage. This has a LOR between IPFW inpcb locks but I'm committing it now as the lesser of two evils (the other being unlocked use of in_pcblookup).
Supported by: FreeBSD Foundation
|
122242 |
07-Nov-2003 |
ume |
use ipsec_getnhist() instead of obsoleted ipsec_gethist().
Submitted by: "Bjoern A. Zeeb" <bzeeb-lists@lists.zabbadoz.net> Reviewed by: Ari Suutari <ari@suutari.iki.fi> (ipfw@)
|
122179 |
07-Nov-2003 |
sam |
Fix locking of the ip forwarding cache. We were holding a reference to a routing table entry w/o bumping the reference count or locking against the entry being free'd. This caused major havoc (for some reason it appeared most frequently for folks running natd). Fix is to bump the reference count whenever we copy the route cache contents into a private copy so the entry cannot be reclaimed out from under us. This is a short term fix as the forthcoming routing table changes will eliminate this cache entirely.
Supported by: FreeBSD Foundation
|
122062 |
04-Nov-2003 |
ume |
- cleanup SP refcnt issue. - share policy-on-socket for listening socket. - don't copy policy-on-socket at all. secpolicy no longer contain spidx, which saves a lot of memory. - deep-copy pcb policy if it is an ipsec policy. assign ID field to all SPD entries. make it possible for racoon to grab SPD entry on pcb. - fixed the order of searching SA table for packets. - fixed to get a security association header. a mode is always needed to compare them. - fixed that the incorrect time was set to sadb_comb_{hard|soft}_usetime. - disallow port spec for tunnel mode policy (as we don't reassemble). - a user can define a policy-id. - clear enc/auth key before freeing. - fixed that the kernel crashed when key_spdacquire() was called because key_spdacquire() had been implemented incompletely. - preparation for 64bit sequence number. - maintain ordered list of SA, based on SA id. - cleanup secasvar management; refcnt is key.c responsibility; alloc/free is keydb.c responsibility. - cleanup, avoid double-loop. - use hash for spi-based lookup. - mark persistent SP "persistent". XXX in theory refcnt should do the right thing, however, we have "spdflush" which would touch all SPs. another solution would be to de-register persistent SPs from sptree. - u_short -> u_int16_t - reduce kernel stack usage by auto variable secasindex. - clarify function name confusion. ipsec_*_policy -> ipsec_*_pcbpolicy. - avoid variable name confusion. (struct inpcbpolicy *)pcb_sp, spp (struct secpolicy **), sp (struct secpolicy *) - count number of ipsec encapsulations on ipsec4_output, so that we can tell ip_output() how to handle the packet further. - When the value of the ul_proto is ICMP or ICMPV6, the port field in "src" of the spidx specifies ICMP type, and the port field in "dst" of the spidx specifies ICMP code. - avoid applying IPsec transport mode to the packets when the kernel forwards the packets.
Tested by: nork Obtained from: KAME
|
121972 |
03-Nov-2003 |
rwatson |
Note that when ip_output() is called from ip_forward(), it will already have its options inserted, so the opt argument to ip_output() must be NULL.
|
121971 |
03-Nov-2003 |
rwatson |
Remove comment about desire for eventual explicit labeling of ICMP header copy made on input path: this is now handled differently.
Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
|
121929 |
03-Nov-2003 |
sam |
Remove bogus RTFREE that was added in rev 1.47. The rmx code operates directly on the radix tree and does not hold any routing table references. This fixes the reference counting problems that manifested themselves as a panic during unmount of filesystems that were mounted by NFS over an interface that had been removed.
Supported by: FreeBSD Foundation
|
121922 |
03-Nov-2003 |
sam |
Correct rev 1.56 which (incorrectly) reversed the test used to decide if in_pcbpurgeif0 should be invoked.
Supported by: FreeBSD Foundation
|
121884 |
02-Nov-2003 |
silby |
Add an additional check to the tcp_twrecycleable function; I had previously only considered the send sequence space. Unfortunately, some OSes (windows) still use a random positive increments scheme for their syn-ack ISNs, so I must consider receive sequence space as well.
The value of 250000 bytes / second for Microsoft's ISN rate of increase was determined by testing with an XP machine.
|
121850 |
01-Nov-2003 |
silby |
- Add a new function tcp_twrecycleable, which tells us if the ISN which we will generate for a given ip/port tuple has advanced far enough for the time_wait socket in question to be safely recycled.
- Have in_pcblookup_local use tcp_twrecycleable to determine if time_wait sockets which are hogging local ports can be safely freed.
This change preserves proper TIME_WAIT behavior under normal circumstances while allowing for safe and fast recycling whenever ephemeral port space is scarce.
|
121816 |
31-Oct-2003 |
brooks |
Replace the if_name and if_unit members of struct ifnet with new members if_xname, if_dname, and if_dunit. if_xname is the name of the interface and if_dname/unit are the driver name and instance.
This change paves the way for interface renaming and enhanced pseudo device creation and configuration semantics.
Approved By: re (in principle) Reviewed By: njl, imp Tested On: i386, amd64, sparc64 Obtained From: NetBSD (if_xname)
|
121770 |
30-Oct-2003 |
sam |
Overhaul routing table entry cleanup by introducing a new rtexpunge routine that takes a locked routing table reference and removes all references to the entry in the various data structures. This eliminates instances of recursive locking and also closes races where the lock on the entry had to be dropped prior to calling rtrequest(RTM_DELETE). This also cleans up confusion where the caller held a reference to an entry that might have been reclaimed (and in some cases used that reference).
Supported by: FreeBSD Foundation
|
121700 |
29-Oct-2003 |
sam |
Potential fix for races shutting down callouts when unloading the module. Previously we grabbed the mutex used by the callouts, then stopped the callout with callout_stop, but if the callout was already active and blocked by the mutex then it would continue later and reference the mutex after it was destroyed. Instead stop the callout first then lock.
Supported by: FreeBSD Foundation
|
121699 |
29-Oct-2003 |
sam |
o add locking to protect routing table refcnt manipulations o add some more debugging help for figuring out why folks are getting complaints about releasing routing table entries with a zero refcnt o fix comment that talked about spl's o remove duplicate define of DUMMYNET_DEBUG
Supported by: FreeBSD Foundation
|
121684 |
29-Oct-2003 |
ume |
add ECN support in layer-3. - implement the tunnel egress rule in ip_ecn_egress() in ip_ecn.c. make ip{,6}_ecn_egress() return integer to tell the caller that this packet should be dropped. - handle ECN at fragment reassembly in ip_input.c and frag6.c.
Obtained from: KAME
|
121674 |
29-Oct-2003 |
ume |
ip6_savecontrol() argument is redundant
|
121645 |
29-Oct-2003 |
sam |
Introduce the notion of "persistent mbuf tags"; these are tags that stay with an mbuf until it is reclaimed. This is in contrast to tags that vanish when an mbuf chain passes through an interface. Persistent tags are used, for example, by MAC labels.
Add an m_tag_delete_nonpersistent function to strip non-persistent tags from mbufs and use it to strip such tags from packets as they pass through the loopback interface and when turned around by icmp. This fixes problems with "tag leakage".
Pointed out by: Jonathan Stone Reviewed by: Robert Watson
|
121628 |
28-Oct-2003 |
sam |
speedup stream socket recv handling by tracking the tail of the mbuf chain instead of walking the list for each append
Submitted by: ps/jayanth Obtained from: netbsd (jason thorpe)
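The tail-tracking technique can be sketched in user space as follows. This is an illustration under assumed names (struct buf, struct chain, chain_append()), not the socket buffer code itself: caching a pointer to the last element makes each append O(1) instead of a walk over the whole list.

```c
#include <stddef.h>

struct buf {
	struct buf *next;
	int len;
};

struct chain {
	struct buf *head;
	struct buf *tail;	/* cached last element, NULL when empty */
	int total;		/* running byte count */
};

/* Append in O(1) using the cached tail rather than walking from head. */
void
chain_append(struct chain *c, struct buf *b)
{
	b->next = NULL;
	if (c->tail == NULL)
		c->head = b;
	else
		c->tail->next = b;
	c->tail = b;
	c->total += b->len;
}
```

The cost of the approach is that every path removing the last element must also update the cached tail, which is the usual trade-off for this optimization.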
|
121499 |
25-Oct-2003 |
ume |
revert following unwanted changes:
- __packed to __attribute__((__packed__))
- uintN_t back to u_intN_t
Reported by: bde
|
121498 |
25-Oct-2003 |
ume |
correct namespace pollution.
Submitted by: bde
|
121478 |
24-Oct-2003 |
ume |
remove the ip6r0_addr and ip6r0_slmap members from ip6_rthdr0{} according to rfc2292bis.
Obtained from: KAME
|
121477 |
24-Oct-2003 |
ume |
correct tab and order.
|
121472 |
24-Oct-2003 |
ume |
Switch Advanced Sockets API for IPv6 from RFC2292 to RFC3542 (aka RFC2292bis). Though I believe this commit doesn't break backward compatibility against existing binaries, it breaks backward compatibility of the API. Now, the applications which use the Advanced Sockets API, such as telnet, ping6, mld6query and traceroute6, use the RFC3542 API.
Obtained from: KAME
|
121453 |
24-Oct-2003 |
silby |
Reduce the number of tcp time_wait structs to maxsockets / 5; this ensures that at most 20% of sockets can be in time_wait at one time, ensuring that time_wait sockets do not starve real connections from inpcb structures.
No implementation change is needed, jlemon already implemented a nice LRU-ish algorithm for tcp_tw structure recycling.
This should reduce the need for sysadmins to lower the default msl on busy servers.
|
121446 |
24-Oct-2003 |
sam |
o restructure initialization code so data structures are setup when loaded as a module
o cleanup data structures on module unload when no application has been started (i.e. kldload, kldunload w/o mrtd)
o remove extraneous unlocks immediately prior to destroying them
Supported by: FreeBSD Foundation
|
121307 |
21-Oct-2003 |
silby |
Change all SYSCTLS which are readonly and have a related TUNABLE from CTLFLAG_RD to CTLFLAG_RDTUN so that sysctl(8) can provide more useful error messages.
|
121285 |
20-Oct-2003 |
ume |
enclose IPv6 part with ifdef INET6.
Obtained from: KAME
|
121283 |
20-Oct-2003 |
ume |
correct linkmtu handling.
Obtained from: KAME
|
121161 |
17-Oct-2003 |
ume |
- add dom_if{attach,detach} framework.
- transition to use ifp->if_afdata.
Obtained from: KAME
|
121141 |
16-Oct-2003 |
sam |
pfil hooks can modify packet contents so check if the destination address has been changed when PFIL_HOOKS is enabled and, if it has, arrange for the proper action by ip*_forward.
Supported by: FreeBSD Foundation Submitted by: Pyun YongHyeon
|
121140 |
16-Oct-2003 |
sam |
Drop dummynet lock when calling back into the network stack to deliver packets. This eliminates a LOR with Giant that caused outbound pipes to fail.
Supported by: FreeBSD Foundation
|
121123 |
16-Oct-2003 |
mckusick |
Malloc buckets of size 128 have been having their 64-byte offset trashed after being freed. This has caused several panics including kern/42277 related to soft updates. Jim Kuhn tracked the problem down to ipfw limit rule processing. In the expiry of dynamic rules, it is possible for an O_LIMIT_PARENT rule to be removed when it still has live children. When the children eventually do expire, a pointer to the (long gone) parent is dereferenced and a count decremented. Since this memory can be, and is, allocated for other purposes (in the case of kern/42277 an inodedep structure), chaos ensues. The offset in question in inodedep is the offset of the 16 bit count field in the ipfw2 ipfw_dyn_rule.
Submitted by: Jim Kuhn <jkuhn@sandvine.com> Reviewed by: "Evgueni V. Gavrilov" <aquatique@rusunix.org> Reviewed by: Ben Pfountz <netprince@vt.edu> MFC after: 1 week
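One way to picture the invariant this fix restores is the sketch below: a parent rule must not be reclaimed while its child count is non-zero. The struct layout and the function dyn_can_expire() are hypothetical and only illustrate the check; the log does not show how the actual patch is written.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for a dynamic rule. */
struct dyn_rule {
	int is_parent;		/* non-zero for a limit-parent style rule */
	uint16_t count;		/* live children pointing at this parent */
	uint32_t expire;	/* time at which the rule becomes stale */
};

/*
 * Return 1 if the rule may be freed at time `now`, else 0.
 * A parent with live children is kept even if its timer has run out,
 * so children never dereference freed memory when they expire later.
 */
int
dyn_can_expire(const struct dyn_rule *r, uint32_t now)
{
	if (r->expire > now)
		return (0);
	if (r->is_parent && r->count > 0)
		return (0);
	return (1);
}
```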
|
121119 |
15-Oct-2003 |
sam |
purge extraneous ';'s
Supported by: FreeBSD Foundation Noticed by: bde
|
121093 |
14-Oct-2003 |
sam |
Lock ip forwarding route cache. While we're at it, remove the global variable ipforward_rt by introducing an ip_forward_cacheinval() call to use to invalidate the cache.
Supported by: FreeBSD Foundation
|
121091 |
14-Oct-2003 |
sam |
remove dangling ';'s that were harmless
Supported by: FreeBSD Foundation
|
120891 |
07-Oct-2003 |
ume |
- fix typo in comment.
- style.
Obtained from: KAME
|
120887 |
07-Oct-2003 |
ume |
nuke unused ICMPV6CTL_NAMES and KEYCTL_NAMES macros.
|
120885 |
07-Oct-2003 |
ume |
return(code) -> return (code)
Obtained from: KAME
|
120727 |
04-Oct-2003 |
sam |
Locking for updates to routing table entries. Each rtentry gets a mutex that covers updates to the contents. Note this is separate from holding a reference and/or locking the routing table itself.
Other/related changes:
o rtredirect loses the final parameter by which an rtentry reference may be returned; this was never used and added unwarranted complexity for locking.
o minor style cleanups to routing code (e.g. ansi-fy function decls)
o remove the logic to bump the refcnt on the parent of cloned routes, we assume the parent will remain as long as the clone; doing this avoids a circularity in locking during delete
o convert some timeouts to MPSAFE callouts
Notes:
1. rt_mtx in struct rtentry is guarded by #ifdef _KERNEL as user-level applications cannot/do not know about mutexes. Doing this requires that the mutex be the last element in the structure. A better solution is to introduce an externalized version of struct rtentry but this is a major task because of the intertwining of rtentry and other data structures that are visible to user applications.
2. There are known LORs that are expected to go away with forthcoming work to eliminate many held references. If not, these will be resolved prior to release.
3. ATM changes are untested.
Sponsored by: FreeBSD Foundation Obtained from: BSD/OS (partly)
|
120721 |
03-Oct-2003 |
sam |
hookup ctlinput for fast ipsec versions of esp+ah protocols
Supported by: FreeBSD Foundation
|
120714 |
03-Oct-2003 |
sam |
place some kernel-specific data structures under #ifdef _KERNEL
Sponsored by: FreeBSD Foundation
|
120699 |
03-Oct-2003 |
bms |
Shorten 'bad gateway' AF_LINK message.
Submitted by: green
|
120698 |
03-Oct-2003 |
bms |
Make arp_rtrequest()'s 'bad gateway' messages slightly more informative, to aid me in tracking down LLINFO inconsistencies in the routing table.
Discussed with: fenner
|
120685 |
03-Oct-2003 |
bms |
Only delete the route if arplookup() tried to create it. Do not delete RTF_STATIC routes. Do not check for RTF_HOST so as to avoid being DoSed when an RTF_GENMASK route exists in the table.
Add a more verbose comment about exactly what this code does.
Submitted by: ru
|
120626 |
01-Oct-2003 |
ru |
By popular demand, added the "static ARP" per-interface option.
|
120435 |
25-Sep-2003 |
ume |
add /*CONSTCOND*/ to reduce diffs against latest KAME.
Obtained from: KAME
|
120418 |
24-Sep-2003 |
bms |
Fix a logic error in the check to see if arplookup() should free the route.
Noticed by: Mike Hogsett Reviewed by: ru
|
120386 |
23-Sep-2003 |
sam |
o update PFIL_HOOKS support to current API used by netbsd
o revamp IPv4+IPv6+bridge usage to match API changes
o remove pfil_head instances from protosw entries (no longer used)
o add locking
o bump FreeBSD version for 3rd party modules
Heavy lifting by: "Max Laier" <max@love2party.net> Supported by: FreeBSD Foundation Obtained from: NetBSD (bits of pfil.h and pfil.c)
|
120383 |
23-Sep-2003 |
bms |
Fix a bug in arplookup(), whereby a hostile party on a locally attached network could exhaust kernel memory, and cause a system panic, by sending a flood of spoofed ARP requests.
Approved by: jake (mentor) Reported by: Apple Product Security <product-security@apple.com>
|
120373 |
23-Sep-2003 |
marcus |
Grrr...add the Skinny alias code forgotten in the last commit.
|
120372 |
23-Sep-2003 |
marcus |
Add Cisco Skinny Station protocol support to libalias, natd, and ppp. Skinny is the protocol used by Cisco IP phones to talk to Cisco Call Managers. With this code, one can use a Cisco IP phone behind a FreeBSD NAT gateway.
Currently, having the Call Manager behind the NAT gateway is not supported. More information on enabling Skinny support in libalias, natd, and ppp can be found in those applications' manpages.
PR: 55843 Reviewed by: ru Approved by: ru MFC after: 30 days
|
120182 |
17-Sep-2003 |
sam |
Bandaid locking change: mark static rule mutex recursive so re-entry when sending an ICMP packet doesn't cause a panic. A better solution is needed; possibly deferring the transmit to a dedicated thread.
Observed by: "Aaron Wohl" <freebsd@soith.com>
|
120181 |
17-Sep-2003 |
sam |
shuffle code so we don't "continue" and miss a needed unlock operation
Observed by: Wiktor Niesiobedzki <w@evip.pl>
|
120141 |
17-Sep-2003 |
sam |
Add locking.
o change timeout to MPSAFE callout
o restructure rule deletion to deal with locking requirements
o replace static buffer used for ipfw control operations with malloc'd storage
Sponsored by: FreeBSD Foundation
|
120140 |
17-Sep-2003 |
sam |
Minor fixups + add locking.
o change time to MPSAFE callout
o make debug printfs conditional on DUMMYNET_DEBUG and runtime controllable by net.inet.ip.dummynet.debug
o make boot-time printf dependent on bootverbose
Sponsored by: FreeBSD Foundation
|
119995 |
11-Sep-2003 |
ru |
Fix a bunch of off-by-one errors in the range checking code.
|
119932 |
09-Sep-2003 |
ru |
Fixed -Wpointer-arith warning.
Submitted by: Stefan Farfeleder PR: bin/56653
|
119893 |
08-Sep-2003 |
ru |
mdoc(7): Use the new feature of the .In macro.
|
119792 |
06-Sep-2003 |
sam |
Add locking.
Special thanks to Pavlin Radoslavov <pavlin@icir.org> for testing and fixing numerous problems.
Sponsored by: FreeBSD Foundation Reviewed by: Pavlin Radoslavov <pavlin@icir.org>
|
119753 |
05-Sep-2003 |
sam |
lock ip fragment queues
Submitted by: Robert Watson <rwatson@freebsd.org> Obtained from: BSD/OS
|
119752 |
05-Sep-2003 |
sam |
o add locking
o move the global divsrc socket address to a local variable instead of locking it
Sponsored by: FreeBSD Foundation
|
119705 |
03-Sep-2003 |
bms |
PR: kern/56343 Reviewed by: tjr Approved by: jake (mentor)
|
119644 |
01-Sep-2003 |
silby |
Implement MBUF_STRESS_TEST mark II.
Changes from the original implementation:
- Fragmentation is handled by the function m_fragment, which can be called from wherever fragmentation is needed. Note that this function is wrapped in #ifdef MBUF_STRESS_TEST to discourage non-testing use.
- m_fragment works slightly differently from the old fragmentation code in that it allocates a separate mbuf cluster for each fragment. This defeats dma_map_load_mbuf/buffer's feature of coalescing adjacent fragments. While that is a nice feature in practice, it nerfed the usefulness of mbuf_stress_test.
- Add two modes of random fragmentation. Chains with fragments all of the same random length and chains with fragments that are each uniquely random in length may now be requested.
|
119640 |
01-Sep-2003 |
sam |
add locking
NB: There is a known LOR on the forwarding path; this needs to be resolved together with a similar issue in the bridge. For the moment it is believed to be benign.
Sponsored by: FreeBSD Foundation
|
119635 |
01-Sep-2003 |
sam |
remove warning about use of old divert sockets; this was marked for removal before 5.2
Reviewed by: silence on -net and -arch
|
119634 |
01-Sep-2003 |
sam |
add locking
Sponsored by: FreeBSD Foundation
|
119541 |
28-Aug-2003 |
rwatson |
Remove redundant initialization of rti; SLIST_FOREACH does that for us.
|
119489 |
26-Aug-2003 |
rwatson |
M_PREPEND() with an argument of M_TRYWAIT can fail, meaning the returned mbuf can be NULL. Check for NULL in rip_output() when prepending an IP header. This prevents mbuf exhaustion from causing a local kernel panic when sending raw IP packets.
PR: kern/55886 Reported by: Pawel Malachowski <pawmal-posting@freebsd.lublin.pl> MFC after: 3 days
|
119401 |
24-Aug-2003 |
hsu |
Remove redundant bzero.
Submitted by: Pavlin Radoslavov <pavlin@icir.org>
|
119245 |
21-Aug-2003 |
rwatson |
Introduce two new MAC Framework and MAC policy entry points:
mac_reflect_mbuf_icmp() mac_reflect_mbuf_tcp()
These entry points permit MAC policies to do "update in place" changes to the labels on ICMP and TCP mbuf headers when an ICMP or TCP response is generated to a packet outside of the context of an existing socket. For example, in response to a ping, or a RST packet sent in response to a SYN on a closed port.
Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
|
119181 |
20-Aug-2003 |
rwatson |
Before digging into IGMP locking, do a whitespace and prototype cleanup: prefer tabs to 8 spaces, focus on consistent indentation, prefer modern C function prototypes. Not all the way to style(9), but substantially closer.
|
119180 |
20-Aug-2003 |
rwatson |
Move from a custom-crafted singly-linked list to the SLIST_* macros from queue(3).
Improve vertical compactness by using an IGMP_PRINTF() macro rather than #ifdefing IGMP_DEBUG around a large number of debugging printfs.
Reviewed by: mdodd (SLIST changes)
|
119178 |
20-Aug-2003 |
bms |
Add the IP_ONESBCAST option, to enable undirected IP broadcasts to be sent on specific interfaces. This is required by aodvd, and may in future help us in getting rid of the requirement for BPF from our import of isc-dhcp.
Suggested by: fenestro Obtained from: BSD/OS Reviewed by: mini, sam Approved by: jake (mentor)
|
119137 |
19-Aug-2003 |
sam |
Change instances of callout_init that specify MPSAFE behaviour to use CALLOUT_MPSAFE instead of "1" for the second parameter. This does not change the behaviour; it just makes the intent more clear.
|
119134 |
19-Aug-2003 |
hsu |
* Bug fix in bw_meter_process(): the periodically processed bins of bw_meter entries were processed up to one second ahead. After an inappropriate rescheduling of some of the bw_meter entries, the upcalls weren't delivered.
* pim_register_prepare() uses the appropriate sw_csum flag to call ip_fragment() so the IP checksum is computed properly.
* Modify pim_register_prepare() to take care of IP packets that don't need fragmentation.
* Add-back in_delayed_cksum() to encap_send(), because it seems it should be there.
Submitted by: Pavlin Radoslavov <pavlin@icir.org>
|
119132 |
19-Aug-2003 |
sam |
add missing unlock when in_pcballoc returns an error
|
119071 |
18-Aug-2003 |
obrien |
style.Makefile(5)
|
119017 |
17-Aug-2003 |
gordon |
Stage 3 of dynamic root support. Make all the libraries needed to run binaries in /bin and /sbin installed in /lib. Only the versioned files reside in /lib, the .so symlink continues to live in /usr/lib so the toolchain doesn't need to be modified.
|
118864 |
13-Aug-2003 |
harti |
The syncache has made use of TCPDEBUG problematic, because the SYN segments are lost for the application. This broke, for example, ports/benchmarks/dbs which needs the SYN segment to filter the contents of the trace buffer for the connection it is interested in.
This patch makes the SYN segments available again. Unfortunately they are now associated with the listening socket instead of the new one, so a change to applications is required, but without this patch it wouldn't work at all.
PR: kern/45966
|
118862 |
13-Aug-2003 |
harti |
The tcp_trace call needs the length of the header. Unfortunately the code has rotted a bit so that the header length is not correct at the point when tcp_trace is called. Temporarily compute the correct value before the call and restore the old value after. This makes ports/benchmarks/dbs almost work.
This is a NOP unless you compile with TCPDEBUG.
|
118861 |
13-Aug-2003 |
harti |
A number of patches in the last years have created new return paths in tcp_input that leave the function before hitting the tcp_trace function call for the TCPDEBUG option. This has made TCPDEBUG mostly useless (and tools like ports/benchmarks/dbs not working). Add tcp_trace calls to the return paths that could be identified in this maze.
This is a NOP unless you compile with TCPDEBUG.
|
118823 |
12-Aug-2003 |
harti |
Change the code that enables/disables the ATM channel to use the new ATMIOCOPENVCC/CLOSEVCC. This allows us to not only use UBR channels for IP over ATM, but also CBR, VBR and ABR. Change the format of the link layer address to specify the channel characteristics. The old format is still supported and opens UBR channels.
|
118623 |
07-Aug-2003 |
hsu |
New PIM header files.
Submitted by: Pavlin Radoslavov <pavlin@icir.org>
|
118622 |
07-Aug-2003 |
hsu |
1. Basic PIM kernel support. Disabled by default. To enable it, the new "options PIM" must be added to the kernel configuration file (in addition to MROUTING):
options MROUTING # Multicast routing
options PIM # Protocol Independent Multicast
2. Add support for advanced multicast API setup/configuration and extensibility.
3. Add support for kernel-level PIM Register encapsulation. Disabled by default. Can be enabled by the advanced multicast API.
4. Implement a mechanism for "multicast bandwidth monitoring and upcalls".
Submitted by: Pavlin Radoslavov <pavlin@icir.org>
|
118607 |
07-Aug-2003 |
jhb |
Consistently use the BSD u_int and u_short instead of the SYSV uint and ushort. In most of these files, there was a mixture of both styles and this change just makes them self-consistent.
Requested by: bde (kern_ktrace.c)
|
118552 |
06-Aug-2003 |
harti |
Ups. I forgot this one in the SIOCATMENA/SIOCATMDIS removal commit.
This change allows one to specify almost the complete traffic parameters for IPoverATM channels through the routing table. Up to now we used 4 byte DL addresses (flag, vpi, vciH, vciL). This format is still allowed. If the address is longer, however, the 5th byte is interpreted as the traffic class (UBR, CBR, VBR or ABR) and the remaining bytes are the parameters for this traffic class:
UBR: 0 byte or 3 byte PCR
CBR: 3 byte PCR
VBR: 3 byte PCR, 3 byte SCR, 3 byte MBS
ABR: 3 byte PCR, 3 byte MCR, 3 byte ICR, 3 byte TBE, 1 byte NRM, 1 byte TRM, 2 bytes ADTF, 1 byte RIF, 1 byte RDF and 1 byte CDF
A script to generate the corresponding 'route add' arguments will follow soon.
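The address layout described above can be sketched as a small parser. This is only an illustration covering the UBR, CBR and VBR cases: the numeric class codes (TC_UBR and so on), the struct and function names, and the big-endian order of the 3-byte fields are all assumptions, since the log gives the field sizes but not those details.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical class codes; the log does not give the real values. */
enum tclass { TC_UBR = 0, TC_CBR = 1, TC_VBR = 2 };

struct vcc_params {
	uint8_t flags;
	uint8_t vpi;
	uint16_t vci;
	enum tclass tclass;
	uint32_t pcr, scr, mbs;	/* 0 when not present */
};

/* Read a 3-byte field, assuming big-endian order (an assumption). */
static uint32_t
get3(const uint8_t *p)
{
	return ((uint32_t)p[0] << 16 | (uint32_t)p[1] << 8 | p[2]);
}

/* Returns 0 on success, -1 if the length does not fit the class. */
int
vcc_parse(const uint8_t *a, size_t len, struct vcc_params *out)
{
	if (len < 4)
		return (-1);
	out->flags = a[0];
	out->vpi = a[1];
	out->vci = (uint16_t)(a[2] << 8 | a[3]);	/* vciH, vciL */
	out->tclass = TC_UBR;
	out->pcr = out->scr = out->mbs = 0;
	if (len == 4)			/* old 4-byte format: plain UBR */
		return (0);

	const uint8_t *p = a + 5;	/* parameters follow the class byte */
	size_t rest = len - 5;

	switch (a[4]) {
	case TC_UBR:
	case TC_CBR:
		if (a[4] == TC_UBR && rest == 0)
			return (0);	/* UBR without a PCR */
		if (rest != 3)
			return (-1);
		out->tclass = (enum tclass)a[4];
		out->pcr = get3(p);
		return (0);
	case TC_VBR:
		if (rest != 9)
			return (-1);
		out->tclass = TC_VBR;
		out->pcr = get3(p);
		out->scr = get3(p + 3);
		out->mbs = get3(p + 6);
		return (0);
	default:
		return (-1);		/* ABR and unknown classes not handled here */
	}
}
```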
|
118501 |
05-Aug-2003 |
hsu |
* makes mfc[MFCTBLSIZ] and vif[MAXVIFS] tables accessible via sysctl:
- sysctlbyname("net.inet.ip.mfctable", ...)
- sysctlbyname("net.inet.ip.viftable", ...)
This change is needed so netstat can use sysctlbyname() to read the data from those tables. Otherwise, in some cases "netstat -g" may fail to report the multicast forwarding information (e.g., if we run a multicast router on PicoBSD).
* Bug fix: when sending IGMPMSG_WRONGVIF upcall to the multicast routing daemon, set properly "im->im_vif" to the receiving incoming interface of the packet that triggered that upcall rather than to the expected incoming interface of that packet.
* Bug fix: add missing increment of counter "mrtstat.mrts_upcalls"
* Few formatting nits (e.g., replace extra spaces with TABs)
Submitted by: Pavlin Radoslavov <pavlin@icir.org>
|
118499 |
05-Aug-2003 |
harti |
When adding a channel for INET failed at the device level (ioctl) the code used to call rtrequest(RTM_DELETE, ...). This is a problem, because the function that has just called us (route_output) is not really happy with the route it is just creating being ripped out from under it. Unfortunately we also cannot return an error from ifa_rtrequest. Therefore just mark the route as RTF_REJECT.
|
118497 |
05-Aug-2003 |
harti |
Make this file conform more closely to style(9) before really touching it.
|
118259 |
31-Jul-2003 |
maxim |
o Fix a typo in previous commit.
|
118008 |
25-Jul-2003 |
maxim |
o Do not overwrite saved interrupt priority level by alloc_hash(), use a separate variable.
o Restore interrupt priority level before return (no-op in HEAD).
Spotted by: Don Bowman <don@sandvine.com> MFC after: 5 days
|
117897 |
22-Jul-2003 |
sam |
add IPSEC_FILTERGIF support for FAST_IPSEC
PR: kern/51922 Submitted by: Eric Masson <e-masson@kisoft-services.com> MFC after: 1 week
|
117765 |
19-Jul-2003 |
silby |
Minor fix to the MBUF_STRESS_TEST code so that it keeps pkthdr.len consistent at all times. (Some debugging code I'm working on is tripped otherwise.)
MFC after: 3 days
|
117737 |
18-Jul-2003 |
rwatson |
Add a comment above rip_ctloutput() documenting that the privilege check for raw IP system management operations is often (although not always) implicit due to the namespacing of raw IP sockets. I.e., you have to have privilege to get a raw IP socket, so much of the management code sitting on raw IP sockets assumes that any requests on the socket should be granted privilege.
Obtained from: TrustedBSD Project Product of: France
|
117686 |
17-Jul-2003 |
hsu |
Drop Giant around syncache timer processing.
|
117654 |
15-Jul-2003 |
luigi |
Allow set 31 to be used for rules other than 65535. Set 31 is still special because rules belonging to it are not deleted by the "ipfw flush" command, but must be deleted explicitly with "ipfw delete set 31" or by individual rule numbers.
This implements a flexible form of "persistent rules" which you might want to have available even after an "ipfw flush". Note that this change does not violate POLA, because you could not use set 31 in a ruleset before this change.
sbin/ipfw changes to allow manipulation of set 31 will follow shortly.
Suggested by: Paul Richards
|
117650 |
15-Jul-2003 |
hsu |
Unify the "send high" and "recover" variables as specified in the latest rev of the spec. Use an explicit flag for Fast Recovery. [1]
Fix bug with exiting Fast Recovery on a retransmit timeout diagnosed by Lu Guohan. [2]
Reviewed by: Thomas Henderson <thomas.r.henderson@boeing.com> Reported and tested by: Lu Guohan <lguohan00@mails.tsinghua.edu.cn> [2] Approved by: Thomas Henderson <thomas.r.henderson@boeing.com>, Sally Floyd <floyd@acm.org> [1]
|
117468 |
12-Jul-2003 |
luigi |
Implement comments embedded into ipfw2 instructions.
Since we already had 'O_NOP' instructions which always match, all I needed to do is allow the NOP command to have arbitrary length (i.e. move its label in a different part of the switch() which validates instructions).
The kernel must know nothing about comments, everything else is done in userland (which will be described in the upcoming ipfw2.c commit).
|
117327 |
08-Jul-2003 |
luigi |
Merge the handlers of O_IP_SRC_MASK and O_IP_DST_MASK opcodes, and support matching a list of addr/mask pairs so one can write more efficient rulesets which were not possible before e.g.
add 100 skipto 1000 not src-ip 10.0.0.0/8,127.0.0.1/8,192.168.0.0/16
The change is fully backward compatible. ipfw2 and manpage commit to follow.
MFC after: 3 days
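Matching one address against a list of addr/mask pairs, as the merged handler is described as doing, can be sketched like this. The struct and function names are hypothetical and addresses are kept in host byte order for simplicity; the real opcode encoding is not shown in the log.

```c
#include <stddef.h>
#include <stdint.h>

struct addr_mask {
	uint32_t addr;	/* network address, host byte order */
	uint32_t mask;	/* netmask, host byte order */
};

/* Return 1 if ip falls inside any addr/mask pair in the list, else 0. */
int
match_addr_list(uint32_t ip, const struct addr_mask *list, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if ((ip & list[i].mask) == (list[i].addr & list[i].mask))
			return (1);
	return (0);
}
```

A single pass over one list replaces several separate rules, which is where the efficiency gain mentioned above would come from.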
|
117241 |
04-Jul-2003 |
luigi |
Implement the 'ipsec' option to match packets coming out of an ipsec tunnel. Should work with both regular and fast ipsec (mutually exclusive). See manpage for more details.
Submitted by: Ari Suutari (ari.suutari@syncrontech.com) Revised by: sam MFC after: 1 week
|
117240 |
04-Jul-2003 |
luigi |
Correct some comments, add opcode O_IPSEC to match packets coming out of an ipsec tunnel.
|
116982 |
28-Jun-2003 |
luigi |
Remove a stale comment, fix indentation.
|
116981 |
28-Jun-2003 |
luigi |
whitespace fix
|
116778 |
24-Jun-2003 |
luigi |
remove unused file (ipfw2 is the default in RELENG_5 and above; the old ipfw1 has been unused and unmaintained for a long time).
|
116764 |
23-Jun-2003 |
luigi |
Fix typo in a (commented out) debugging string.
Spotted by: diff
|
116763 |
23-Jun-2003 |
luigi |
Remove whitespace at end of line.
|
116690 |
22-Jun-2003 |
luigi |
Add support for multiple values and ranges for the "iplen", "ipttl", "ipid" options. This feature has been requested by several users. In passing, fix some minor bugs in the parser. This change is fully backward compatible so if you have an old /sbin/ipfw and a new kernel you are not in trouble (but you need to update /sbin/ipfw if you want to use the new features).
Document the changes in the manpage.
Now you can write things like
ipfw add skipto 1000 iplen 0-500
which some people were asking to give preferential treatment to short packets.
The 'MFC after' is just set as a reminder, because I still need to merge the Alpha/Sparc64 fixes for ipfw2 (which unfortunately change the size of certain kernel structures; not that it matters a lot since ipfw2 is entirely optional and not the default...)
PR: bin/48015
MFC after: 1 week
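Checking a 16-bit header field against a list of values and ranges can be sketched as below. The representation (inclusive lo/hi pairs, with a single value stored as lo == hi) and the names are assumptions made for illustration, not the encoding the kernel uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Inclusive range; a single value is stored with lo == hi. */
struct range16 {
	uint16_t lo;
	uint16_t hi;
};

/* Return 1 if v lies inside any of the n ranges, else 0. */
int
match_ranges(uint16_t v, const struct range16 *r, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (v >= r[i].lo && v <= r[i].hi)
			return (1);
	return (0);
}
```

With this representation, "iplen 0-500" becomes the single pair {0, 500}.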
|
116462 |
17-Jun-2003 |
silby |
Map icmp time exceeded responses to EHOSTUNREACH rather than 0 (no error); this makes connect act more sensibly in these cases.
PR: 50839 Submitted by: Barney Wolff <barney@pit.databus.com> Patch delayed by laziness of: silby MFC after: 1 week
|
116315 |
13-Jun-2003 |
ru |
In the PKT_ALIAS_PROXY_ONLY mode, make sure to preserve the original source IP address, as promised in the manual page.
Spotted by: Vaclav Petricek
|
116314 |
13-Jun-2003 |
ru |
Removed a couple of .Xo/.Xc that are leftovers of the "ninth-argument limit" mdoc(7) atavism.
|
116313 |
13-Jun-2003 |
ru |
Clarify that original address and port when doing transparent proxying are _destination_ address and port.
|
116312 |
13-Jun-2003 |
ru |
Added myself to the AUTHORS section.
|
116020 |
08-Jun-2003 |
charnier |
The .Fn function
|
115909 |
06-Jun-2003 |
rwatson |
When setting fragment queue pointers to NULL, or comparing them with NULL, use NULL rather than 0 to improve readability.
|
115824 |
04-Jun-2003 |
hsu |
Compensate for decreasing the minimum retransmit timeout.
Reviewed by: jlemon
|
115793 |
04-Jun-2003 |
ticso |
Change handling to support strong alignment architectures such as alpha and sparc64.
PR: alpha/50658 Submitted by: rizzo Tested on: alpha
|
115750 |
02-Jun-2003 |
kbyanc |
Account for packets processed at layer-2 (i.e. net.link.ether.ipfw=1).
MFC after: 2 weeks
|
115650 |
01-Jun-2003 |
ru |
A new API function PacketAliasRedirectDynamic() can be used to mark a fully specified static link as dynamic; i.e. make it a one-time link.
|
115648 |
01-Jun-2003 |
ru |
Make the PacketAliasSetAddress() function call optional. If it is not called, and no static rules match an outgoing packet, the latter retains its source IP address. This is in support of the "static NAT only" mode.
|
115612 |
01-Jun-2003 |
phk |
Remove unused variables.
Found by: FlexeLint
|
115503 |
31-May-2003 |
phk |
Add /* FALLTHROUGH */
Found by: FlexeLint
|
115471 |
31-May-2003 |
wollman |
Don't generate an ip_id for packets with the DF bit set; ip_id is only meaningful for fragments. Also don't bother to byte-swap the ip_id when we do generate it; it is only used at the receiver as a nonce. I tried several different permutations of this code with no measurable difference to each other or to the unmodified version, so I've settled on the one for which gcc seems to generate the best code. (If anyone cares to microoptimize this differently for an architecture where it actually matters, feel free.)
Suggested by: Steve Bellovin's paper in IMW'02
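The decision described above can be sketched as follows. This assumes that "not generating" an ID means leaving the field at zero, and uses a plain counter as a stand-in for whatever ID source the stack really uses; choose_ip_id() is a hypothetical name.

```c
#include <stdint.h>

#define IP_DF 0x4000	/* "don't fragment" bit in the flags/offset field */

/*
 * Pick the ID for an outgoing packet. With DF set the packet can
 * never be fragmented, so no ID is consumed (assumed to be 0 here).
 */
uint16_t
choose_ip_id(uint16_t ip_off, uint16_t *counter)
{
	if (ip_off & IP_DF)
		return (0);
	return ((*counter)++);
}
```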
|
114794 |
07-May-2003 |
rwatson |
Correct a bug introduced with reduced TCP state handling; make sure that the MAC label on TCP responses during TIMEWAIT is properly set from either the socket (if available), or the mbuf that it's responding to.
Unfortunately, this is made somewhat difficult by the TCP code, as tcp_twstart() calls tcp_twrespond() after discarding the socket but without a reference to the mbuf that causes the "response". Passing both the socket and the mbuf works around this--eventually it might be good to make sure the mbuf always gets passed in in "response" scenarios but working through this proved to complicate things too much.
Approved by: re (scottl) Reviewed by: hsu Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
|
114788 |
06-May-2003 |
rwatson |
Trim a call to mac_create_mbuf_from_mbuf() since m_tag meta-data copying for mbuf headers now works properly in m_dup_pkthdr(), so we don't need to do an explicit copy.
Approved by: re (jhb) Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
|
114259 |
29-Apr-2003 |
mdodd |
Add definitions for IN6ADDR_LINKLOCAL_ALLMDNS_INIT and INADDR_ALLMDNS_GROUP.
|
114258 |
29-Apr-2003 |
mdodd |
IP_RECVTTL socket option.
Reviewed by: Stuart Cheshire <cheshire@apple.com>
|
114216 |
29-Apr-2003 |
kan |
Deprecate machine/limits.h in favor of new sys/limits.h. Change all in-tree consumers to include <sys/limits.h>
Discussed on: standards@ Partially submitted by: Craig Rodrigues <rodrigc@attbi.com>
|
113799 |
21-Apr-2003 |
obrien |
Explicitly declare 'int' parameters.
|
113755 |
20-Apr-2003 |
obrien |
style.Makefile(5)
|
113384 |
12-Apr-2003 |
silby |
Rename MBUF_FRAG_TEST to MBUF_STRESS_TEST as it will be extended to include more than just frag tests.
|
113345 |
10-Apr-2003 |
rwatson |
Remove a potential panic condition introduced by reduced TCP wait state. Those changes attempted to work around the changed invariant that inp->in_socket was sometimes now NULL, but the logic wasn't quite right, meaning that inp->in_socket would be dereferenced by cr_canseesocket() if security.bsd.see_other_uids, jail, or MAC were in use. Attempt to clarify and correct the logic.
Note: the work-around originally introduced with the reduced TCP wait state handling to use cr_cansee() instead of cr_canseesocket() in this case isn't really right, although it "Does the right thing" for most of the cases in the base system. We'll need to address this at some point in the future.
Pointed out by: dcs Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
|
113255 |
08-Apr-2003 |
des |
Introduce an M_ASSERTPKTHDR() macro which performs the very common task of asserting that an mbuf has a packet header. Use it instead of hand-rolled versions wherever applicable.
Submitted by: Hiten Pandya <hiten@unixdaemons.com>
|
113074 |
04-Apr-2003 |
des |
Replace memcpy() and ovbcopy() with bcopy(); ditch some caddr_t usage.
|
112985 |
02-Apr-2003 |
mdodd |
Back out support for RFC3514.
RFC3514 poses an unacceptable risk to compliant systems.
|
112983 |
02-Apr-2003 |
mdodd |
- Use the correct constant define.
- Add a missing break.
|
112973 |
02-Apr-2003 |
mdodd |
Sync constant define with NetBSD.
Requested by: Tom Spindler <dogcow@babymeat.com>
|
112957 |
01-Apr-2003 |
hsu |
Observe conservation of packets when entering Fast Recovery while doing Limited Transmit. Only artificially inflate the congestion window by 1 segment instead of the usual 3 to take into account the 2 already sent by Limited Transmit.
Approved in principle by: Mark Allman <mallman@grc.nasa.gov>, Hari Balakrishnan <hari@nms.lcs.mit.edu>, Sally Floyd <floyd@icir.org>
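The arithmetic behind this change can be sketched as below, assuming the conventional rule that the congestion window on entering Fast Recovery is ssthresh plus an inflation of whole segments. The function name and the boolean parameter are hypothetical; the real code tracks Limited Transmit state differently.

```c
#include <stdint.h>

/*
 * Congestion window on entering Fast Recovery. Normally inflated by
 * 3 segments for the 3 duplicate ACKs; if Limited Transmit already
 * sent 2 new segments, inflate by only 1 to conserve packets.
 */
uint32_t
fast_recovery_cwnd(uint32_t ssthresh, uint32_t maxseg, int limited_transmit)
{
	uint32_t inflate = limited_transmit ? 1 : 3;

	return (ssthresh + inflate * maxseg);
}
```

For example, with ssthresh of 10000 bytes and a 1460-byte segment, the window is 14380 bytes in the usual case and 11460 bytes when Limited Transmit was in effect.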
|
112929 |
01-Apr-2003 |
mdodd |
Implement support for RFC 3514 (The Security Flag in the IPv4 Header). (See: ftp://ftp.rfc-editor.org/in-notes/rfc3514.txt)
This fulfills the host requirements for userland support by way of the setsockopt() IP_EVIL_INTENT message.
There are three sysctl tunables provided to govern system behavior.
net.inet.ip.rfc3514:
Enables support for rfc3514. As this is an Informational RFC and support is not yet widespread this option is disabled by default.
net.inet.ip.hear_no_evil
If set the host will discard all received evil packets.
net.inet.ip.speak_no_evil
If set the host will discard all transmitted evil packets.
The IP statistics counter 'ips_evil' (available via 'netstat') provides information on the number of 'evil' packets received.
For reference, the '-E' option to 'ping' has been provided to demonstrate and test the implementation.
|
112711 |
27-Mar-2003 |
maxim |
Fix indentation.
|
112710 |
27-Mar-2003 |
maxim |
o Protect set_fs_param() by splimp(9).
Quote from kern/37573:
There is an obvious race in netinet/ip_dummynet.c:config_pipe(). Interrupts are not blocked when changing the params of an existing pipe. The specific crash observed:
... -> config_pipe -> set_fs_parms -> config_red
malloc a new w_q_lookup table but take an interrupt before initializing it, interrupt handler does:
... -> dummynet_io -> red_drops
red_drops dereferences the uninitialized (zeroed) w_q_lookup table.
o Flush accumulated credits for idle pipes.
o Flush accumulated credits when changing pipe characteristics.
o Change dn_flow_queue.numbytes type to unsigned long.
Overflowing dn_flow_queue->numbytes in ready_event() makes numbytes negative, and the SET_TICKS() macro then returns a very big value. heap_insert() overflows dn_key again and inserts the queue into the ready heap with a sched_time that points to the past. That leads to an infinite loop.
PR: kern/33234, kern/37573, misc/42459, kern/43133, kern/44045, kern/48099 Submitted by: Mike Hibler <mike@cs.utah.edu> (kern/37573) MFC after: 6 weeks
|
112675 |
26-Mar-2003 |
rwatson |
Modify the mac_init_ipq() MAC Framework entry point to accept an additional flags argument to indicate blocking disposition, and pass in M_NOWAIT from the IP reassembly code to indicate that blocking is not OK when labeling a new IP fragment reassembly queue. This should eliminate some of the WITNESS warnings that have started popping up since fine-grained IP stack locking started going in; if memory allocation fails, the creation of the fragment queue will be aborted.
Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
|
112650 |
25-Mar-2003 |
mux |
Try to make the MBUF_FRAG_TEST code work better.
- Don't try to fragment the packet if it's smaller than mbuf_frag_size.
- Preserve the size of the mbuf chain which is modified by m_split().
- Check that m_split() didn't return NULL.
- Make it so we don't end up with two M_PKTHDR mbuf in the chain.
- Use m->m_pkthdr.len instead of m->m_len so that we fragment the whole chain and not just the first mbuf.
- Fix a nearby style bug and rework the logic of the loops so that it's more clear.
This is still not quite right, because we're clearly abusing m_split() to do something it was not designed for, but at least it works now. We should probably move this code into a m_fragment() function when it's correct.
|
112591 |
25-Mar-2003 |
silby |
Add the MBUF_FRAG_TEST option. When compiled in, this option allows you to tell ip_output to fragment all outgoing packets into mbuf fragments of size net.inet.ip.mbuf_frag_size bytes. This is an excellent way to test if network drivers can properly handle long mbuf chains being passed to them.
net.inet.ip.mbuf_frag_size defaults to 0 (no fragmentation) so that you can at least boot before your network driver dies. :)
|
112482 |
22-Mar-2003 |
mux |
Use __packed instead of __attribute__((__packed__)).
|
112465 |
21-Mar-2003 |
mdodd |
Add a sysctl node allowing the specification of an address mask to use when replying to ICMP Address Mask Request packets.
|
112464 |
21-Mar-2003 |
mdodd |
Add comments regarding the ICMP timestamp fields.
|
112250 |
15-Mar-2003 |
cjc |
Add a 'verrevpath' option that verifies the interface that a packet comes in on is the same interface that we would route out of to get to the packet's source address. Essentially automates an anti-spoofing check using the information in the routing table.
Experimental. The usage and rule format for the feature may still be subject to change.
|
112191 |
13-Mar-2003 |
hsu |
Greatly simplify the unlocking logic by holding the TCP protocol lock until after FIN_WAIT_2 processing.
Helped with debugging: Doug Barton
|
112171 |
13-Mar-2003 |
hsu |
Add support for RFC 3390, which allows for a variable-sized initial congestion window.
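For reference, RFC 3390 bounds the initial window by min(4*MSS, max(2*MSS, 4380 bytes)). The sketch below only evaluates that formula; the function name is hypothetical and it is not the code added by this revision.

```c
#include <stdint.h>

/*
 * Hypothetical helper: upper bound on the initial congestion window
 * in bytes, following the RFC 3390 formula
 *     min(4 * MSS, max(2 * MSS, 4380)).
 */
uint32_t
rfc3390_initial_window(uint32_t mss)
{
	uint32_t lower = (2 * mss > 4380) ? 2 * mss : 4380;
	uint32_t upper = 4 * mss;

	return (upper < lower) ? upper : lower;
}
```

For a typical Ethernet MSS of 1460 bytes this yields 4380 bytes, that is, three full-sized segments.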
|
112162 |
12-Mar-2003 |
hsu |
Implement the Limited Transmit algorithm (RFC 3042).
|
112148 |
12-Mar-2003 |
sam |
correct two more flag misuses; m_tag* use malloc flags
|
112010 |
08-Mar-2003 |
jlemon |
Remove check for t_state == TCPS_TIME_WAIT and introduce the tw structure.
Sponsored by: DARPA, NAI Labs
|
112009 |
08-Mar-2003 |
jlemon |
Remove a panic(); if the zone allocator can't provide more timewait structures, reuse the oldest one. Also move the expiry timer from a per-structure callout to the tcp slow timer.
Sponsored by: DARPA, NAI Labs
|
111926 |
05-Mar-2003 |
peter |
Finish driving a stake through the heart of netns and the associated ifdefs scattered around the place - it's dead, Jim!
The SMB stuff had stolen AF_NS, make it official.
|
111888 |
04-Mar-2003 |
jlemon |
Update netisr handling; Each SWI now registers its queue, and all queue drain routines are done by swi_net, which allows for better queue control at some future point. Packets may also be directly dispatched to a netisr instead of queued, this may be of interest at some installations, but currently defaults to off.
Reviewed by: hsu, silby, jayanth, sam Sponsored by: DARPA, NAI Labs
|
111748 |
02-Mar-2003 |
des |
More low-hanging fruit: kill caddr_t in calls to wakeup(9) / [mt]sleep(9).
|
111560 |
26-Feb-2003 |
jlemon |
In timewait state, if the incoming segment is a pure in-sequence ack that matches snd_max, then do not respond with an ack, just drop the segment. This fixes a problem where a simultaneous close results in an ack loop between two time-wait states.
Test case supplied by: Tim Robbins <tjr@FreeBSD.ORG> Sponsored by: DARPA, NAI Labs
|
111549 |
26-Feb-2003 |
jlemon |
The TCP protocol lock may still be held if the reassembly queue dropped FIN. Detect this case and drop the lock accordingly.
Sponsored by: DARPA, NAI Labs
|
111541 |
26-Feb-2003 |
silby |
Fix a condition so that ip reassembly queues are emptied immediately when maxfragpackets is dropped to 0.
Noticed by: bmah
|
111483 |
25-Feb-2003 |
rwatson |
When generating a TCP response to a connection, not only test if the tcpcb is NULL, but also its connected inpcb, since we now allow elements of a TCP connection to hang around after other state, such as the socket, has been recycled.
Tested by: dcs Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
|
111479 |
25-Feb-2003 |
maxim |
style(9): join lines.
|
111478 |
25-Feb-2003 |
maxim |
The IP reassembly queue structure has ipq_nfrags now. Count the number of dropped IP fragments precisely.
Reviewed by: silby
|
111459 |
25-Feb-2003 |
hsu |
Hold the TCP protocol lock while modifying the connection hash table.
|
111405 |
24-Feb-2003 |
silby |
Fix a comment which didn't match the new cookie behavior.
Submitted by: Scott Renfro <scott@renfro.org> MFC after: 1 day
|
111389 |
24-Feb-2003 |
hsu |
tcp_twstart() needs to be called with the TCP protocol lock held to avoid a race condition with the TCP timer routines.
|
111386 |
24-Feb-2003 |
hsu |
Pass the right function to callout_reset() for a compressed TIME-WAIT control block.
|
111338 |
23-Feb-2003 |
silby |
Improve the security and performance of syncookies:
Security improvements:
- Increase the size of each syncookie secret from 32 to 128 bits in order to make brute force attacks on the secrets much more difficult.
- Always return the lowest order dword from the MD5 hash; this allows us to expose 2 more bits of the cookie and makes ACK floods which seek to guess the cookie value more difficult.
Performance improvements:
- Increase the lifetime of each syncookie from 4 seconds to 16 seconds. This increases the usefulness of syncookies during an attack.
- From Yahoo!: Reduce the number of calls to MD5Update; this results in a ~17% increase in cookie generation time here.
Reviewed by: hsu, jayanth, jlemon, nectar MFC After: 15 seconds
|
111319 |
23-Feb-2003 |
jlemon |
Yesterday just wasn't my day. Remove testing delta that crept into the diff.
Pointy hat provided by: sam
|
111275 |
23-Feb-2003 |
sam |
Add a new config option IPSEC_FILTERGIF to control whether or not packets coming out of a GIF tunnel are re-processed by ipfw, et al. By default they are not reprocessed. With the option they are.
This reverts 1.214. Prior to that change packets were not re-processed. After they were which caused problems because packets do not have distinguishing characteristics (like a special network if) that allows them to be filtered specially.
This is really a stopgap measure designed for immediate MFC so that 4.8 has consistent handling to what was in 4.7.
PR: 48159 Reviewed by: Guido van Rooij <guido@gvr.org> MFC after: 1 day
|
111266 |
22-Feb-2003 |
jlemon |
Check to see if the TF_DELACK flag is set before returning from tcp_input(). This unbreaks delack handling, while still preserving correct T/TCP behavior
Tested by: maxim Sponsored by: DARPA, NAI Labs
|
111244 |
22-Feb-2003 |
silby |
Add the ability to limit the number of IP fragments allowed per packet, and enable it by default, with a limit of 16.
At the same time, tweak maxfragpackets downward so that in the worst possible case, IP reassembly can use only 1/2 of all mbuf clusters.
MFC after: 3 days Reviewed by: hsu Liked by: bmah
|
111231 |
21-Feb-2003 |
phk |
- m = m_gethdr(M_NOWAIT, MT_HEADER); + m = m_gethdr(M_DONTWAIT, MT_HEADER);
'nuff said.
|
111205 |
21-Feb-2003 |
cjc |
The ancient and outdated concept of "privileged ports" in UNIX-type OSes has probably caused more problems than it ever solved. Allow the user to retire the old behavior by specifying their own privileged range with,
net.inet.ip.portrange.reservedhigh default = IPPORT_RESERVED - 1 net.inet.ip.portrange.reservedlo default = 0
Now you can run that webserver without ever needing root at all. Or just imagine, an ftpd that can really drop privileges, rather than just set the euid, and still do PORT data transfers from 20/tcp.
Two edge cases to note,
# sysctl net.inet.ip.portrange.reservedhigh=0
Opens all ports to everyone, and,
# sysctl net.inet.ip.portrange.reservedhigh=65535
Locks all network activity to root only (which could actually have been achieved before with ipfw(8), but is somewhat more complicated).
For those who stick to the old religion that 0-1023 belong to root and root alone, don't touch the knobs (or even lock them by raising securelevel(8)), and nothing changes.
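The range check implied by the two knobs can be sketched as below. This is an illustration under assumptions, not the kernel code: the function and parameter names are hypothetical, and it assumes a bind requires privilege exactly when the port lies inside the inclusive range between the low and high knobs.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: does binding to `port` require privilege,
 * given an inclusive reserved range [reserved_lo, reserved_hi]?
 */
bool
port_needs_privilege(uint16_t port, int reserved_lo, int reserved_hi)
{
	return port >= reserved_lo && port <= reserved_hi;
}
```

With the defaults (0 and IPPORT_RESERVED - 1, i.e. 1023), port 80 is reserved and port 8080 is not. Setting the high knob to 0 leaves no usable port inside the range, and setting it to 65535 puts every port inside it, which corresponds to the two edge cases noted in the entry.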
|
111186 |
20-Feb-2003 |
jlemon |
Remove unused variables in the IPSEC case.
Submitted by: Lars Eggert <larse@ISI.EDU>
|
111153 |
19-Feb-2003 |
jlemon |
Unbreak non-IPV6 compilation.
Caught by: phk Sponsored by: DARPA, NAI Labs
|
111145 |
19-Feb-2003 |
jlemon |
Add a TCP TIMEWAIT state which uses less space than a fullblown TCP control block. Allow the socket and tcpcb structures to be freed earlier than inpcb. Update code to understand an inp w/o a socket.
Reviewed by: hsu, silby, jayanth Sponsored by: DARPA, NAI Labs
|
111144 |
19-Feb-2003 |
jlemon |
Convert tcp_fillheaders(tp, ...) -> tcpip_fillheaders(inp, ...) so the routine does not require a tcpcb to operate. Since we no longer keep template mbufs around, move pseudo checksum out of this routine, and merge it with the length update.
Sponsored by: DARPA, NAI Labs
|
111140 |
19-Feb-2003 |
jlemon |
Correct comments.
|
111139 |
19-Feb-2003 |
jlemon |
Clean up delayed acks and T/TCP interactions:
- delay acks for T/TCP regardless of delack setting
- fix bug where a single pass through tcp_input might not delay acks
- use callout_active() instead of callout_pending()
Sponsored by: DARPA, NAI Labs
|
111119 |
19-Feb-2003 |
imp |
Back out M_* changes, per decision of the TRB.
Approved by: trb
|
111037 |
17-Feb-2003 |
maxim |
o Fix ipfw uid rules: socheckuid() returns 0 when uid matches a socket cr_uid.
Note: we do not have socheckuid() in RELENG_4, ip_fw2.c uses its own macro for a similar purpose that is why ipfw2 in RELENG_4 processes uid rules correctly. I will MFC the diff for code consistency.
Reported by: Oleg Baranov <ol@csa.ru> Reviewed by: luigi MFC after: 1 month
|
110896 |
15-Feb-2003 |
hsu |
Take advantage of pre-existing lock-free synchronization and type stable memory to avoid acquiring SMP locks during expensive copyout process.
|
110830 |
13-Feb-2003 |
hsu |
The protocol lock is always held in the dropafterack case, so we don't need to check for it at runtime.
|
110775 |
12-Feb-2003 |
hsu |
in_pcbnotifyall() requires an exclusive protocol lock for notify functions which modify the connection list, namely, tcp_notify().
|
110737 |
12-Feb-2003 |
hsu |
Properly document that syncache timer processing requires an exclusive TCP protocol lock.
|
110683 |
11-Feb-2003 |
tanimura |
s/IPSSEC/IPSEC/
|
110656 |
10-Feb-2003 |
hsu |
Get cosmetic changes out of the way before I add routing table SMP locks.
|
110544 |
08-Feb-2003 |
orion |
Avoid multiply for preemptive arp calculation since it hits every ethernet packet sent.
Prompted by: Jeffrey Hsu <hsu@FreeBSD.org>
|
110308 |
04-Feb-2003 |
orion |
MFS 1.64.2.22: Re-enable non pre-emptive ARP requests.
Submitted by: "Diomidis Spinellis" <dds@aueb.gr> PR: kern/46116
|
110251 |
02-Feb-2003 |
cjc |
Add the TCP flags to the log message whenever log_in_vain is 1, not just when set to 2.
PR: kern/43348 MFC after: 5 days
|
110178 |
01-Feb-2003 |
silby |
Move a comment and optimize the frag timeout code a slight bit.
Submitted by: maxim MFC with: The previous two revisions
|
110074 |
30-Jan-2003 |
sam |
FAST_IPSEC bandaid: act like KAME and ignore ENOENT error codes from ipsec4_process_packet; they happen when a packet is dropped because an SA acquire is initiated
Submitted by: Doug Ambrisko <ambrisko@verniernetworks.com>
|
110073 |
30-Jan-2003 |
sam |
remove the restriction on building a kernel with FAST_IPSEC and INET6; you still don't want to use the two together, but it's ok to have them in the same kernel (the problem that initiated this bandaid has long since been fixed)
|
110023 |
29-Jan-2003 |
silby |
Fix a bug with syncookies; previously, the syncache's MSS size was not initialized until after a syncookie was generated. As a result, all connections resulting from a returned cookie would end up using a MSS of ~512 bytes. Now larger packets will be used where possible.
MFC after: 5 days
|
110008 |
28-Jan-2003 |
phk |
Check bounds for index before dereferencing memory past end of array.
Found by: FlexeLint
|
109996 |
28-Jan-2003 |
hsu |
Avoid lock order reversal by expanding the scope of the AF_INET radix tree lock to cover the ARP data structures.
|
109965 |
28-Jan-2003 |
silby |
A few fixes to rev 1.221
- Honor the previous behavior of maxfragpackets = 0 or -1
- Take a better stab at fragment statistics
- Move / correct a comment
Suggested by: maxim@ MFC after: 7 days
|
109843 |
26-Jan-2003 |
silby |
Merge the best parts of maxfragpackets and maxnipq together. (Both functions implemented approximately the same limits on fragment memory usage, but in different fashions.)
End user visible changes: - Fragment reassembly queues are freed in a FIFO manner when maxfragpackets has been reached, rather than all reassembly stopping.
MFC after: 5 days
|
109623 |
21-Jan-2003 |
alfred |
Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0. Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
|
109569 |
20-Jan-2003 |
maxim |
De-anonymize a couple of messages I missed in a previous sweep. Move one of them under the DEB macro.
Noticed by: Wiktor Niesiobedzki <w@evip.pl>
|
109566 |
20-Jan-2003 |
maxim |
If the first action is O_LOG adjust a pointer to the real one, unbreaks skipto + log rules.
Reported by: Wiktor Niesiobedzki <w@evip.pl> MFC after: 1 week
|
109492 |
18-Jan-2003 |
hsu |
Optimize away call to bzero() in the common case by directly checking if a connection has any cached TAO information.
|
109451 |
18-Jan-2003 |
hsu |
Fix long-standing bug predating FreeBSD where calling connect() twice on a raw ip socket will crash the system with a null-dereference.
|
109409 |
17-Jan-2003 |
hsu |
SMP locking for ARP.
|
109246 |
14-Jan-2003 |
dillon |
Introduce the ability to flag a sysctl for operation at secure level 2 or 3 in addition to secure level 1. The mask supports up to a secure level of 8 but only add defines through CTLFLAG_SECURE3 for now.
As per the missive in the log entry for 1.11 of ip_fw2.c which added the secure flag to the IPFW sysctls in the first place, change the secure level requirement from 1 to 3 now that we have support for it.
Reviewed by: imp With Design Suggestions by: imp
|
109175 |
13-Jan-2003 |
hsu |
Fix NewReno.
Reviewed by: Tom Henderson <thomas.r.henderson@boeing.com>
|
109035 |
10-Jan-2003 |
tmm |
Clear the target hardware address field when generating an ARP request.
Reviewed by: nectar MFC after: 1 week
|
108703 |
05-Jan-2003 |
hsu |
Validate inp before de-referencing it.
Submitted by: pb
|
108533 |
01-Jan-2003 |
schweikh |
Correct typos, mostly s/ a / an / where appropriate. Some whitespace cleanup, especially in troff files.
|
108466 |
30-Dec-2002 |
sam |
Correct mbuf packet header propagation. Previously, packet headers were sometimes propagated using M_COPY_PKTHDR which actually did something between a "move" and a "copy" operation. This is replaced by M_MOVE_PKTHDR (which copies the pkthdr contents and "removes" it from the source mbuf) and m_dup_pkthdr which copies the packet header contents including any m_tag chain. This corrects numerous problems whereby mbuf tags could be lost during packet manipulations.
These changes also introduce arguments to m_tag_copy and m_tag_copy_chain to specify if the tag copy work should potentially block. This introduces an incompatibility with openbsd which we may want to revisit.
Note that move/dup of packet headers does not handle target mbufs that have a cluster bound to them. We may want to support this; for now we watch for it with an assert.
Finally, M_COPYFLAGS was updated to include M_FIRSTFRAG|M_LASTFRAG.
Supported by: Vernier Networks Reviewed by: Robert Watson <rwatson@FreeBSD.org>
|
108464 |
30-Dec-2002 |
dillon |
Remove the PAWS ack-on-ack debugging printf().
Note that the original RFC 1323 (PAWS) says in 4.2.1 that the out of order / reverse-time-indexed packet should be acknowledged as specified in RFC-793 page 69 then dropped. The original PAWS code in FreeBSD (1994) simply acknowledged the segment unconditionally, which is incorrect, and was fixed in 1.183 (2002). At the moment we do not do checks for SYN or FIN in addition to (tlen != 0), which may or may not be correct, but the worst that ought to happen should be a retry by the sender.
|
108461 |
30-Dec-2002 |
sam |
correct style bogons
|
108327 |
27-Dec-2002 |
iedowse |
Bridged packets are supplied to the firewall with their IP header in network byte order, but icmp_error() expects the IP header to be in host order and the code here did not perform the necessary swapping for the bridged case. This bug causes an "icmp_error: bad length" panic when certain length IP packets (e.g. ip_len == 0x100) are rejected by the firewall with an ICMP response.
MFC after: 3 days
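The effect of a skipped byte swap on a 16-bit length field can be illustrated as follows. The helper name is hypothetical and the example does not reproduce the kernel's length check; it only shows how the value 0x100 mentioned above changes when the two bytes are read in the opposite order.

```c
#include <stdint.h>

/* Hypothetical helper: exchange the two bytes of a 16-bit value. */
uint16_t
swap16(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}
```

A length of 0x0100 (256 bytes) read in the wrong byte order becomes 0x0001, which is shorter than any IP header, so code that trusts the field without swapping sees an implausible length.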
|
108265 |
24-Dec-2002 |
hsu |
Validate inp to prevent a use-after-free.
|
108258 |
24-Dec-2002 |
maxim |
o De-anonymize dummynet(4) and ipfw(4) messages, prepend them with 'dummynet: ' and 'ipfw: ' prefixes.
PR: kern/41609
|
108250 |
24-Dec-2002 |
hsu |
SMP locking for radix nodes.
|
108180 |
22-Dec-2002 |
pb |
Remove forgotten INP_UNLOCK(inp) in my previous commit. Reported by: hsu
|
108160 |
21-Dec-2002 |
pb |
In syncache_timer(), don't attempt to lock the inpcb structure associated with the syncache entry: in case tcp_close() has been called on the corresponding listening socket, the lock has been destroyed as a side effect of in_pcbdetach(), causing a panic when we attempt to lock on it.
Reviewed by: hsu
|
108144 |
21-Dec-2002 |
sam |
replace the special-purpose rate-limiting code with the general facility just added; this tries to maintain the same behaviour vis-a-vis printing the rate-limiting messages but needs tweaking
|
108125 |
20-Dec-2002 |
hsu |
Eliminate a goto. Fix some line breaks.
|
108123 |
20-Dec-2002 |
hsu |
Unravel a nested conditional. Remove an unneeded local variable.
|
108112 |
20-Dec-2002 |
hsu |
Expand scope of TCP protocol lock to cover syncache data structures.
|
108107 |
19-Dec-2002 |
bmilekic |
o Untangle the confusion with the malloc flags {M_WAITOK, M_NOWAIT} and the mbuf allocator flags {M_TRYWAIT, M_DONTWAIT}.
o Fix a bpf_compat issue where malloc() was defined to just call bpf_alloc() and pass the 'canwait' flag(s) along. It's been changed to call bpf_alloc() but pass the corresponding M_TRYWAIT or M_DONTWAIT flag (and only one of those two).
Submitted by: Hiten Pandya <hiten@unixdaemons.com> (hiten->commit_count++)
|
108033 |
18-Dec-2002 |
hsu |
Lock up ifaddr reference counts.
|
107983 |
17-Dec-2002 |
phk |
Remove unused and incorrectly maintained variable "in_interfaces"
|
107961 |
17-Dec-2002 |
dillon |
Fix syntax in last commit.
|
107900 |
15-Dec-2002 |
maxim |
o Trim EOL whitespaces.
MFC after: 1 week
|
107899 |
15-Dec-2002 |
maxim |
o s/if_name[16]/if_name[IFNAMSIZ]/
Reviewed by: luigi MFC after: 1 week
|
107898 |
15-Dec-2002 |
maxim |
o M_DONTWAIT is mbuf(9) flag: malloc(M_DONTWAIT) -> malloc(M_NOWAIT). The bug does not affect anything because M_NOWAIT == M_DONTWAIT.
Reviewed by: luigi MFC after: 1 week
|
107897 |
15-Dec-2002 |
maxim |
o Fix byte order logging issue: sa.sin_port is already in host byte order.
PR: kern/45964 Submitted by: Sascha Blank <sblank@tiscali.de> Reviewed by: luigi MFC after: 1 week
|
107881 |
14-Dec-2002 |
dillon |
Change tcp.inflight_min from 1024 to a production default of 6144. Create a sysctl for the stabilization value for the bandwidth delay product (inflight) algorithm and document it.
MFC after: 3 days
|
107854 |
14-Dec-2002 |
dillon |
Bruce forwarded this tidbit from an analysis Van Jacobson did on an apparent ack-on-ack problem with FreeBSD. Prof. Jacobson noticed a case in our TCP stack which would acknowledge a received ack-only packet, which is not legal in TCP.
Submitted by: Van Jacobson <van@packetdesign.com>, bmah@packetdesign.com (Bruce A. Mah) MFC after: 7 days
|
107670 |
07-Dec-2002 |
sobomax |
MFS: recognize gre packets used in the WCCP protocol.
Approved by: re
|
107114 |
20-Nov-2002 |
luigi |
Move fw_one_pass from ip_fw2.c to ip_input.c so that neither bridge.c nor if_ethersubr.c depend on IPFIREWALL. Restore the use of fw_one_pass in if_ethersubr.c
ipfw.8 will be updated with a separate commit.
Approved by: re
|
107113 |
20-Nov-2002 |
luigi |
Back out some style changes. They are not urgent, I will put them back in after 5.0 is out.
Requested by: sam Approved by: re
|
107112 |
20-Nov-2002 |
luigi |
Back out the ip_fragment() code -- it is not urgent to have it in now, I will put it back in in a better form after 5.0 is out.
Requested by: sam, rwatson, luigi (on second thought) Approved by: re
|
107081 |
19-Nov-2002 |
silby |
Add a sysctl to control the generation of source quench packets, and set it to 0 by default.
Partially obtained from: NetBSD Suggested by: David Gilbert MFC after: 5 days
|
107022 |
17-Nov-2002 |
luigi |
Fix function headers and remove 'register' variable declarations.
|
107020 |
17-Nov-2002 |
luigi |
Move the ip_fragment code from ip_output() to a separate function, so that it can be reused elsewhere (there are a number of places where it can be useful). This also trims some 200 lines from the body of ip_output(), which helps readability a bit.
(This change was discussed a few weeks ago on the mailing lists, Julian agreed, silence from others. It is not a functional change, so i expect it to be ok to commit it now but i am happy to back it out if there are objections).
While at it, fix some function headers and replace m_copy() with m_copypacket() where applicable.
MFC after: 1 week
|
107018 |
17-Nov-2002 |
luigi |
Minor documentation changes and indentation fix.
Replace m_copy() with m_copypacket() where applicable.
While at it, fix some function headers and remove 'register' from variable declarations.
|
107017 |
17-Nov-2002 |
luigi |
Cleanup some of the comments, and reformat long lines.
Replace m_copy() with m_copypacket() where applicable.
Replace "if (a.s_addr ...)" with "if (a.s_addr != INADDR_ANY ...)" to make it clear what the code means.
While at it, fix some function headers and remove 'register' from variable declarations.
MFC after: 3 days
|
106968 |
15-Nov-2002 |
luigi |
Massive cleanup of the ip_mroute code.
No functional changes, but:
+ the mrouting module now should behave the same as the compiled-in version (it did not before, some of the rsvp code was not loaded properly);
+ netinet/ip_mroute.c is now truly optional;
+ removed some redundant/unused code;
+ changed many instances of '0' to NULL and INADDR_ANY as appropriate;
+ removed several static variables to make the code more SMP-friendly;
+ fixed some minor bugs in the mrouting code (mostly, incorrect return values from functions).
This commit is also a prerequisite to the addition of support for PIM, which i would like to put in before DP2 (it does not change any of the existing APIs, anyways).
Note, in the process we found out that some device drivers fail to properly handle changes in IFF_ALLMULTI, leading to interesting behaviour when a multicast router is started. This bug is not corrected by this commit, and will be fixed with a separate commit.
Detailed changes:
--------------------
netinet/ip_mroute.c     all the above.
conf/files              make ip_mroute.c optional
net/route.c             fix mrt_ioctl hook
netinet/ip_input.c      fix ip_mforward hook, move rsvp_input() here together with other rsvp code, and a couple of indentation fixes.
netinet/ip_output.c     fix ip_mforward and ip_mcast_src hooks
netinet/ip_var.h        rsvp function hooks
netinet/raw_ip.c        hooks for mrouting and rsvp functions, plus interface cleanup.
netinet/ip_mroute.h     remove an unused and optional field from a struct
Most of the code is from Pavlin Radoslavov and the XORP project
Reviewed by: sam MFC after: 1 week
|
106935 |
14-Nov-2002 |
sam |
track changes to not strip the Ethernet header from input packets
Reviewed by: many Approved by: re
|
106934 |
14-Nov-2002 |
sam |
track bpf changes
Reviewed by: many Approved by: re
|
106846 |
13-Nov-2002 |
maxim |
Due to memory alignment, sizeof(struct ipfw_flow_id) is bigger than the combined size of the structure's members, so bcmp(3) also compares padding bytes and may fail to compare two structures properly. Compare members of these structures instead.
PR: kern/44078 Submitted by: Oleg Bulyzhin <oleg@rinet.ru> Reviewed by: luigi MFC after: 2 weeks
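The padding problem can be sketched with a stand-in structure. Everything below is hypothetical: `struct flow_id` and its fields are illustrative and are not the real `struct ipfw_flow_id` layout. On common ABIs the members occupy 13 bytes while `sizeof` reports 16, so a bytewise comparison also inspects three padding bytes whose contents are not guaranteed.

```c
#include <stdint.h>

/* Hypothetical stand-in for a flow identifier with trailing padding. */
struct flow_id {
	uint32_t dst_ip;
	uint32_t src_ip;
	uint16_t dst_port;
	uint16_t src_port;
	uint8_t  proto;
	/* the compiler typically adds padding here */
};

/* Compare member by member so that padding bytes are never examined. */
int
flow_id_equal(const struct flow_id *a, const struct flow_id *b)
{
	return a->dst_ip == b->dst_ip &&
	    a->src_ip == b->src_ip &&
	    a->dst_port == b->dst_port &&
	    a->src_port == b->src_port &&
	    a->proto == b->proto;
}
```

The member-wise comparison costs a few more instructions than a single bcmp(3) call, but its result depends only on the fields that carry meaning.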
|
106824 |
12-Nov-2002 |
hsu |
Turn off duplicate lock checking for inp locks because udp_input() intentionally locks two inp records simultaneously.
|
106736 |
10-Nov-2002 |
sam |
a better solution to building FAST_IPSEC w/o INET6
Submitted by: Jeffrey Hsu <hsu@FreeBSD.org>
|
106696 |
09-Nov-2002 |
alfred |
Fix instances of macros with improperly parenthesized arguments.
Verified by: md5
|
106681 |
08-Nov-2002 |
sam |
temporarily disallow FAST_IPSEC and INET6 to avoid potential panics; will correct this before 5.0 release
|
106680 |
08-Nov-2002 |
sam |
FAST_IPSEC fixups:
o fix #ifdef typo
o must use "bounce functions" when dispatched from the protosw table
don't know how this stuff was missed in my testing; must've committed the wrong bits
Pointy hat: sam Submitted by: "Doug Ambrisko" <ambrisko@verniernetworks.com>
|
106679 |
08-Nov-2002 |
sam |
fixup FAST_IPSEC build w/o INET6
|
106678 |
08-Nov-2002 |
sam |
correct fast ipsec logic: compare destination ip address against the contents of the SA, not the SP
Submitted by: "Doug Ambrisko" <ambrisko@verniernetworks.com>
|
106625 |
08-Nov-2002 |
jhb |
Cast a ptrdiff_t to an int to printf.
|
106271 |
31-Oct-2002 |
jeff |
- Consistently update snd_wl1, snd_wl2, and rcv_up in the header prediction code. Previously, 2GB worth of header predicted data could leave these variables too far out of sequence which would cause problems after receiving a packet that did not match the header prediction.
Submitted by: Bill Baumann <bbaumann@isilon.com> Sponsored by: Isilon Systems, Inc. Reviewed by: hsu, pete@isilon.com, neal@isilon.com, aaronp@isilon.com
|
106198 |
30-Oct-2002 |
hsu |
Don't need to check if SO_OOBINLINE is defined. Don't need to protect isipv6 conditional with INET6. Fix leading indentation in 2 lines.
|
106152 |
29-Oct-2002 |
fenner |
Renumber IPPROTO_DIVERT out of the range of valid IP protocol numbers. This allows socket() to return an error when the kernel is not built with IPDIVERT, and doesn't prevent future applications from using the "borrowed" IP protocol number. The sysctl net.inet.raw.olddiverterror controls whether opening a socket with the "borrowed" IP protocol fails with an accompanying kernel printf; this code should last only a couple of releases.
Approved by: re
|
106118 |
29-Oct-2002 |
maxim |
Lower the priority of "session drop" messages.
Requested by: Eugene Grosbein <eugen@kuzbass.ru> MFC after: 3 days
|
105899 |
24-Oct-2002 |
mux |
Oops, forgot to commit this file. This is part of the fix for ipfw2 panics on sparc64.
|
105887 |
24-Oct-2002 |
mux |
Fix ipfw2 panics on 64-bit platforms.
Quoting luigi:
In order to make the userland code fully 64-bit clean it may be necessary to commit other changes that may or may not cause a minor change in the ABI.
Reviewed by: luigi
|
105886 |
24-Oct-2002 |
luigi |
src and dst address were erroneously swapped in SRC_SET and DST_SET commands. Use the correct one. Also affects ipfw2 in -stable.
|
105856 |
24-Oct-2002 |
mux |
Fix kernel build on sparc64 in the IPDIVERT case.
|
105840 |
24-Oct-2002 |
iedowse |
Unbreak the automatic remapping of an INADDR_ANY destination address to the primary local IP address when doing a TCP connect(). The tcp_connect() code was relying on in_pcbconnect (actually in_pcbladdr) modifying the passed-in sockaddr, and I failed to notice this in the recent change that added in_pcbconnect_setup(). As a result, tcp_connect() was ending up using the unmodified sockaddr address instead of the munged version.
There are two cases to handle: if in_pcbconnect_setup() succeeds, then the PCB has already been updated with the correct destination address as we pass it pointers to inp_faddr and inp_fport directly. If in_pcbconnect_setup() fails due to an existing but dead connection, then copy the destination address from the old connection.
|
105775 |
23-Oct-2002 |
maxim |
Kill EOL spaces.
Approved by: luigi MFC after: 1 week
|
105774 |
23-Oct-2002 |
maxim |
Use syslog for messages about dropped sessions, do not flood a console.
Suggested by: Eugene Grosbein <eugen@kuzbass.ru> Approved by: luigi MFC after: 1 week
|
105748 |
22-Oct-2002 |
suz |
fixed a kernel crash caused by "ifconfig stf0 inet 1.2.3.4" MFC after: 1 week
|
105651 |
21-Oct-2002 |
iedowse |
Implement a new IP_SENDSRCADDR ancillary message type that permits a server process bound to a wildcard UDP socket to select the IP address from which outgoing packets are sent on a per-datagram basis. When combined with IP_RECVDSTADDR, such a server process can guarantee to reply to an incoming request using the same source IP address as the destination IP address of the request, without having to open one socket per server IP address.
Discussed on: -net Approved by: re
|
105649 |
21-Oct-2002 |
iedowse |
Remove the "temporary connection" hack in udp_output(). In order to send datagrams from an unconnected socket, we used to first block input, then connect the socket to the sendmsg/sendto destination, send the datagram, and finally disconnect the socket and unblock input.
We now use in_pcbconnect_setup() to check if a connect() would have succeeded, but we never record the connection in the PCB (local anonymous port allocation is still recorded, though). The result from in_pcbconnect_setup() authorises the sending of the datagram and selects the local address and port to use, so we just construct the header and call ip_output().
Discussed on: -net Approved by: re
|
105629 |
21-Oct-2002 |
iedowse |
Replace in_pcbladdr() with a more generic inner subroutine for in_pcbconnect() called in_pcbconnect_setup(). This version performs all of the functions of in_pcbconnect() except for the final committing of changes to the PCB. In the case of an EADDRINUSE error it can also provide to the caller the PCB of the duplicate connection, avoiding an extra in_pcblookup_hash() lookup in tcp_connect().
This change will allow the "temporary connect" hack in udp_output() to be removed and is part of the preparation for adding the IP_SENDSRCADDR control message.
Discussed on: -net Approved by: re
|
105586 |
20-Oct-2002 |
phk |
Fix two instances of variant struct definitions in sys/netinet:
Remove the never completed _IP_VHL version, it has not caught on anywhere and it would make us incompatible with other BSD netstacks to retain this version.
Add a CTASSERT protecting sizeof(struct ip) == 20.
Don't let the size of struct ipq depend on the IPDIVERT option.
This is a functional no-op commit.
Approved by: re
|
105570 |
20-Oct-2002 |
rwatson |
When a packet is multicast encapsulated, give labeled policies the opportunity to preserve the label.
Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
|
105565 |
20-Oct-2002 |
iedowse |
Split out most of the logic from in_pcbbind() into a new function called in_pcbbind_setup() that does everything except commit the changes to the PCB. There should be no functional change here, but in_pcbbind_setup() will be used by the soon-to-appear IP_SENDSRCADDR control message implementation to check or allocate the source address and port.
Discussed on: -net Approved by: re
|
105440 |
19-Oct-2002 |
mux |
Several malloc() calls were passing the M_DONTWAIT flag which is an mbuf allocation flag. Use the correct M_NOWAIT malloc() flag. Fortunately, both were defined to 1, so this commit is a no-op.
|
105340 |
17-Oct-2002 |
ume |
last arg of in6?_gif_output() is not used any more.
Obtained from: KAME MFC after: 3 weeks
|
105301 |
16-Oct-2002 |
alfred |
de-__P().
|
105295 |
16-Oct-2002 |
ume |
use encapcheck.
Obtained from: KAME MFC after: 3 weeks
|
105293 |
16-Oct-2002 |
ume |
- after gif_set_tunnel(), psrc/pdst may be null. set IFF_RUNNING accordingly.
- set IFF_UP on SIOCSIFADDR. be consistent with others.
- set if_addrlen explicitly (just in case)
- multi destination mode is long gone.
- missing break statement
- add gif_set_tunnel(), so that we can set tunnel address from within the kernel at ease.
- encap_attach/detach dynamically on ioctls
- move encap_attach() to dedicated function in in*_gif.c
Obtained from: KAME MFC after: 3 weeks
|
105291 |
16-Oct-2002 |
dillon |
Fix oops in my last commit, I was calculating a new length but then not using it. (The code is already correct in -stable).
Found by: silby
|
105218 |
16-Oct-2002 |
guido |
Get rid of checking for ip sec history. It is true that packets are not supposed to be checked by the firewall rules twice. However, because the various ipsec handlers never call ip_input(), this never happens anyway.
This fixes the situation where a gif tunnel is encrypted with IPsec. In such a case, after IPsec processing, the unencrypted contents from the GIF tunnel are fed back to the ipintrq and subsequently handled by ip_input(). Yet, since there still is IPsec history attached, the packets coming out from the gif device are never fed into the filtering code. This fix was sent to Itojun, and he pointed towards http://www.netbsd.org/Documentation/network/ipsec/#ipf-interaction. This patch actually implements what is stated there (specifically: Packet came from tunnel devices (gif(4) and ipip(4)) will still go through ipf(4). You may need to identify these packets by using interface name directive in ipf.conf(5)).
Reviewed by: rwatson MFC after: 3 weeks
|
105201 |
16-Oct-2002 |
sam |
correct PCB locking in broadcast/multicast case that was exposed by change to use udp_append
Reviewed by: hsu
|
105199 |
16-Oct-2002 |
sam |
Tie new "Fast IPsec" code into the build. This involves the usual configuration stuff as well as conditional code in the IPv4 and IPv6 areas. Everything is conditional on FAST_IPSEC which is mutually exclusive with IPSEC (KAME IPsec implementation).
As noted previously, don't use FAST_IPSEC with INET6 at the moment.
Reviewed by: KAME, rwatson Approved by: silence Supported by: Vernier Networks
|
105194 |
16-Oct-2002 |
sam |
Replace aux mbufs with packet tags:
o instead of a list of mbufs use a list of m_tag structures a la openbsd
o for netgraph et. al. extend the stock openbsd m_tag to include a 32-bit ABI/module number cookie
o for openbsd compatibility define a well-known cookie MTAG_ABI_COMPAT and use this in defining openbsd-compatible m_tag_find and m_tag_get routines
o rewrite KAME use of aux mbufs in terms of packet tags
o eliminate the most heavily used aux mbufs by adding an additional struct inpcb parameter to ip_output and ip6_output to allow the IPsec code to locate the security policy to apply to outbound packets
o bump __FreeBSD_version so code can be conditionalized
o fixup ipfilter's call to ip_output based on __FreeBSD_version
Reviewed by: julian, luigi (silent), -arch, -net, darren Approved by: julian, silence from everyone else Obtained from: openbsd (mostly) MFC after: 1 month
|
104975 |
12-Oct-2002 |
seanc |
Increase the max dummynet hash size from 1024 to 65536. Default is still 1024.
Silence on: -net, -ipfw 4weeks+ Reviewed by: dd Approved by: knu (mentor) MFC after: 3 weeks
|
104825 |
10-Oct-2002 |
dillon |
turn off debugging by default if bandwidth delay product limiting is turned on (it is already off in -stable).
|
104815 |
10-Oct-2002 |
dillon |
Update various comments mainly related to retransmit/FIN that I documented while working on a previous bug.
Fix a PERSIST bug. Properly account for a FIN sent during a PERSIST.
MFC after: 7 days
|
104774 |
10-Oct-2002 |
maxim |
Fix IPOPT_TS processing: do not overwrite IP address by timestamp.
PR: misc/42121 Submitted by: Praveen Khurjekar <praveen@codito.com> Reviewed by: silence on -net MFC after: 1 month
|
104366 |
02-Oct-2002 |
sobomax |
Since bpf is no longer an optional component, remove associated ifdef's.
Submitted by: don't quite remember - the name of the sender disappeared with the rest of my inbox. :(
|
104343 |
02-Oct-2002 |
mike |
Include <sys/cdefs.h> so the visibility conditionals are available. (This should have been included with the previous revision.)
|
104342 |
02-Oct-2002 |
mike |
Use visibility conditionals. Only TCP_NODELAY ends up being defined in the standards case.
|
104226 |
30-Sep-2002 |
dillon |
Guido found another bug. There is a situation with timestamped TCP packets where FreeBSD will send DATA+FIN and A W2K box will ack just the DATA portion. If this occurs after FreeBSD has done a (NewReno) fast-retransmit and is recovering it (dupacks > threshold) it triggers a case in tcp_newreno_partial_ack() (tcp_newreno() in stable) where tcp_output() is called with the expectation that the retransmit timer will be reloaded. But tcp_output() falls through and returns without doing anything, causing the persist timer to be loaded instead. This causes the connection to hang until W2K gives up. This occurs because in the case where only the FIN must be acked, the 'len' calculation in tcp_output() will be 0, a lot of checks will be skipped, and the FIN check will also be skipped because it is designed to handle FIN retransmits, not forced transmits from tcp_newreno().
The solution is to simply set TF_ACKNOW before calling tcp_output() to absolutely guarantee that it will run the send code and reset the retransmit timer. TF_ACKNOW is already used for this purpose in other cases.
For some unknown reason this patch also seems to greatly reduce the number of duplicate acks received when Guido runs his tests over a lossy network. It is quite possible that there are other tcp_newreno{_partial_ack()} cases which were not generating the expected output which this patch also fixes.
X-MFC after: Will be MFC'd after the freeze is over
|
104094 |
28-Sep-2002 |
phk |
Be consistent about "static" functions: if the function is marked static in its prototype, mark it static at the definition too.
Inspired by: FlexeLint warning #512
|
104073 |
28-Sep-2002 |
peter |
Zap now-unused SHLIB_MINOR
|
103852 |
23-Sep-2002 |
maxim |
Slightly rearrange the code from rev. 1.164:
o Move len initialization closer to the place of its first usage.
o Compare len with 0 to improve readability.
o Explicitly zero out phlen in ip_insertoptions() in the failure case.
Suggested by: jhb Reviewed by: jhb MFC after: 2 weeks
|
103842 |
23-Sep-2002 |
alfred |
s/__attribute__((__packed__))/__packed/g
|
103776 |
22-Sep-2002 |
silby |
Fix issue where shutdown(socket, SHUT_RD) was effectively ignored for TCP sockets.
NetBSD PR: 18185 Submitted by: Sean Boudreau <seanb@qnx.com> MFC after: 3 days
|
103553 |
18-Sep-2002 |
phk |
Use m_fixhdr() rather than roll our own.
|
103505 |
17-Sep-2002 |
dillon |
Guido reported an interesting bug where an FTP connection between a Windows 2000 box and a FreeBSD box could stall. The problem turned out to be a timestamp reply bug in the W2K TCP stack. FreeBSD sends a timestamp with the SYN, W2K returns a timestamp of 0 in the SYN+ACK causing FreeBSD to calculate an insane SRTT and RTT, resulting in a maximal retransmit timeout (60 seconds). If there is any packet loss on the connection for the first six or so packets the retransmit case may be hit (the window will still be too small for fast-retransmit), causing a 60+ second pause. The W2K box gives up and closes the connection.
This commit works around the W2K bug.
15:04:59.374588 FREEBSD.20 > W2K.1036: S 1420807004:1420807004(0) win 65535 <mss 1460,nop,wscale 2,nop,nop,timestamp 188297344 0> (DF) [tos 0x8]
15:04:59.377558 W2K.1036 > FREEBSD.20: S 4134611565:4134611565(0) ack 1420807005 win 17520 <mss 1460,nop,wscale 0,nop,nop,timestamp 0 0> (DF)
Bug reported by: Guido van Rooij <guido@gvr.org>
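The commit text does not say how the workaround is implemented. As a minimal sketch of one plausible approach (an assumption, not the committed code), a timestamp echo of 0 can be treated as "no usable echo" so that it never feeds the RTT estimator. The function name and tick values below are hypothetical; the timestamp 188297344 is taken from the trace above.

```python
from typing import Optional

def rtt_sample_from_echo(now_ticks: int, ts_ecr: int) -> Optional[int]:
    """Return an RTT sample in ticks, or None when the echo is unusable.

    Assumption: a peer that echoes 0 (as in the SYN+ACK shown in the trace)
    did not really echo our timestamp, so the sample is discarded.
    """
    if ts_ecr == 0:
        return None
    return now_ticks - ts_ecr

sent_ts = 188297344          # timestamp FreeBSD put in its SYN
now = sent_ts + 3            # hypothetical arrival time of the SYN+ACK

good = rtt_sample_from_echo(now, sent_ts)   # peer echoed correctly
bad = rtt_sample_from_echo(now, 0)          # peer echoed 0
naive = now - 0                             # what an unchecked subtraction yields
```

Without the check, the "sample" equals the whole timestamp clock value, which is how an absurd SRTT and a maximal retransmit timeout can arise.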
|
103481 |
17-Sep-2002 |
sobomax |
Remove __RCSID().
Submitted by: bde
|
103479 |
17-Sep-2002 |
maxim |
Explicitly clear M_FRAG flag on a mbuf with the last fragment to unbreak ip fragments reassembling for loopback interface.
Discussed with: bde, jlemon Reviewed by: silence on -net MFC after: 2 weeks
|
103478 |
17-Sep-2002 |
maxim |
In rare cases when there is no room for ip options ip_insertoptions() can fail and corrupt a header length. Initialize len and check what ip_insertoptions() returns.
Reviewed by: archie, silence on -net MFC after: 5 days
|
103444 |
17-Sep-2002 |
jennifer |
Temporary fix for inet6. The final fix is to change in6_pcbnotify to take pcbinfo instead of pcbhead. It is on the way.
|
103176 |
10-Sep-2002 |
sobomax |
Remove superfluous break.
|
103124 |
09-Sep-2002 |
sobomax |
Since from now on encap_input() also catches IPPROTO_MOBILE and IPPROTO_GRE packets in addition to IPPROTO_IPV4 and IPPROTO_IPV6, explicitly specify IPPROTO_IPV4 or IPPROTO_IPV6 instead of -1 when calling encap_attach().
MFC after: 28 days (along with other if_gre changes)
|
103032 |
06-Sep-2002 |
sobomax |
Reduce namespace pollution by staticizing everything, which doesn't need to be visible from outside of the module.
|
103026 |
06-Sep-2002 |
sobomax |
Add a new gre(4) driver, which could be used to create GRE (RFC1701) and MOBILE (RFC2004) IP tunnels.
Obtained from: NetBSD
|
102981 |
05-Sep-2002 |
bde |
Fixed namespace pollution in uma changes: - use `struct uma_zone *' instead of uma_zone_t, so that <sys/uma.h> isn't a prerequisite. - don't include <sys/uma.h>. Namespace pollution makes "opaque" types like uma_zone_t perfectly non-opaque. Such types should never be used (see style(9)).
Fixed subsequently grown dependencies of this header on its own pollution: - include <sys/_mutex.h> and its prerequisite <sys/_lock.h> instead of depending on namespace pollution 2 layers deep in <sys/uma.h>.
|
102967 |
05-Sep-2002 |
bde |
Include <sys/mutex.h> and its prerequisite <sys/lock.h> instead of depending on namespace pollution 4 layers deep in <netinet/in_pcb.h>.
Removed unused includes. Sorted includes.
|
102925 |
04-Sep-2002 |
sobomax |
Add in_hosteq() and in_nullhost() macros to make life of developers porting NetBSD code a little bit easier.
Obtained from: NetBSD
|
102575 |
29-Aug-2002 |
darrenr |
some ipfilter files that accidentally got imported here
|
102515 |
28-Aug-2002 |
darrenr |
This commit was generated by cvs2svn to compensate for changes in r102514, which included commits to RCS files with non-trunk default branches.
|
102412 |
25-Aug-2002 |
charnier |
Replace various spellings with FALLTHROUGH, which is lint()able
|
102397 |
25-Aug-2002 |
cjc |
Lock the sysctl(8) knobs that turn ip{,6}fw(8) firewalling and firewall logging on and off when at elevated securelevel(8). It would be nice to be able to only lock these at securelevel >= 3, like rules are, but there is no such functionality at present. I don't see reason to be adding features to securelevel(8) with MAC being merged into 5.0.
PR: kern/39396 Reviewed by: luigi MFC after: 1 week
|
102368 |
24-Aug-2002 |
dillon |
Correct bug in t_bw_rtttime rollover, #undef USERTT
|
102291 |
22-Aug-2002 |
archie |
Replace (ab)uses of "NULL" where "0" is really meant.
|
102227 |
21-Aug-2002 |
mike |
o Merge <machine/ansi.h> and <machine/types.h> into a new header called <machine/_types.h>.
o <machine/ansi.h> will continue to live so it can define MD clock macros, which are only MD because of gratuitous differences between architectures.
o Change all headers to make use of this. This mainly involves changing:
#ifdef _BSD_FOO_T_
typedef _BSD_FOO_T_ foo_t;
#undef _BSD_FOO_T_
#endif
to:
#ifndef _FOO_T_DECLARED
typedef __foo_t foo_t;
#define _FOO_T_DECLARED
#endif
Concept by: bde Reviewed by: jake, obrien
|
102218 |
21-Aug-2002 |
truckman |
Create new functions in_sockaddr(), in6_sockaddr(), and in6_v4mapsin6_sockaddr() which allocate the appropriate sockaddr_in* structure and initialize it with the address and port information passed as arguments. Use calls to these new functions to replace code that is replicated multiple times in in_setsockaddr(), in_setpeeraddr(), in6_setsockaddr(), in6_setpeeraddr(), in6_mapped_sockaddr(), and in6_mapped_peeraddr(). Inline COMMON_END in tcp_usr_accept() so that we can call in_sockaddr() with temporary copies of the address and port after the PCB is unlocked.
Fix the lock violation in tcp6_usr_accept() (caused by calling MALLOC() inside in6_mapped_peeraddr() while the PCB is locked) by changing the implementation of tcp6_usr_accept() to match tcp_usr_accept().
Reviewed by: suz
|
102131 |
19-Aug-2002 |
jmallett |
Enclose IPv6 addresses in brackets when they are printed together with a TCP/UDP port separated by a colon. This is for the log_in_vain facility.
Pointed out by: Edward J. M. Brocklesby Reviewed by: ume MFC after: 2 weeks
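The bracket convention can be illustrated outside the kernel. This is a small sketch using Python's standard ipaddress module; format_endpoint is a hypothetical helper name, not something from the commit.

```python
import ipaddress

def format_endpoint(address: str, port: int) -> str:
    """Render address:port, bracketing IPv6 so the port colon is unambiguous."""
    ip = ipaddress.ip_address(address)
    if ip.version == 6:
        return f"[{ip}]:{port}"
    return f"{ip}:{port}"

v6 = format_endpoint("2001:db8::1", 80)
v4 = format_endpoint("192.0.2.1", 22)
```

Without brackets, "2001:db8::1:80" could be read as a longer IPv6 address rather than an address plus port.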
|
102086 |
19-Aug-2002 |
luigi |
Raise limit for port lists to 30 entries/ranges.
Remove a duplicate "logging" message, and identify the firewall as ipfw2 in the boot message.
|
102017 |
17-Aug-2002 |
dillon |
Implement TCP bandwidth delay product window limiting, similar to (but not meant to duplicate) TCP/Vegas. Add four sysctls and default the implementation to 'off'.
net.inet.tcp.inflight_enable enable algorithm (defaults to 0=off)
net.inet.tcp.inflight_debug debugging (defaults to 1=on)
net.inet.tcp.inflight_min minimum window limit
net.inet.tcp.inflight_max maximum window limit
MFC after: 1 week
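The general idea of bandwidth delay product limiting can be sketched as follows. This is an illustration under stated assumptions, not the kernel algorithm: the function name, units, and sample numbers are hypothetical, and only the min/max clamping corresponds to the sysctls listed above.

```python
def inflight_window(bandwidth_bytes_per_sec: int, rtt_ms: int,
                    inflight_min: int, inflight_max: int) -> int:
    """Limit the send window to roughly bandwidth * RTT, within [min, max]."""
    bdp = bandwidth_bytes_per_sec * rtt_ms // 1000   # bytes "in flight" on the path
    return max(inflight_min, min(bdp, inflight_max))

# Hypothetical limits standing in for inflight_min / inflight_max.
LOW, HIGH = 6144, 1_000_000

typical = inflight_window(1_250_000, 40, LOW, HIGH)      # ~10 Mbit/s path, 40 ms RTT
tiny = inflight_window(1_000, 10, LOW, HIGH)             # clamped up to the minimum
huge = inflight_window(125_000_000, 100, LOW, HIGH)      # clamped down to the maximum
```

Keeping the window near the bandwidth delay product avoids queueing more data than the path can hold, which is the effect the commit describes as similar to TCP/Vegas.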
|
102002 |
17-Aug-2002 |
hsu |
Cosmetic-only changes for readability.
Reviewed by: (early form passed by) bde Approved by: itojun (from core@kame.net)
|
101978 |
16-Aug-2002 |
luigi |
sys/netinet/ip_fw2.c:
Implement the M_SKIP_FIREWALL bit in m_flags to avoid loops for firewall-generated packets (the constant has to go in sys/mbuf.h).
Better comments on keepalive generation, and enforce dyn_rst_lifetime and dyn_fin_lifetime to be less than dyn_keepalive_period.
Enforce limits (up to 64k) on the number of dynamic buckets, and retry allocation with smaller sizes.
Raise default number of dynamic rules to 4096.
Improved handling of set of rules -- now you can atomically enable/disable multiple sets, move rules from one set to another, and swap sets.
sbin/ipfw/ipfw2.c:
userland support for "noerror" pipe attribute.
userland support for sets of rules.
minor improvements on rule parsing and printing.
sbin/ipfw/ipfw.8:
more documentation on ipfw2 extensions, differences from ipfw1 (so we can use the same manpage for both), stateful rules, and some additional examples. Feedback and more examples needed here.
|
101975 |
16-Aug-2002 |
alfred |
make the strings for tcptimers, tanames and prurequests const to silence warnings.
|
101948 |
15-Aug-2002 |
rwatson |
Code formatting sync to trustedbsd_mac: don't perform an assignment in an if clause.
|
101934 |
15-Aug-2002 |
rwatson |
Rename mac_check_socket_receive() to mac_check_socket_deliver() so that we can use the names _receive() and _send() for the receive() and send() checks. Rename related constants, policy implementations, etc.
Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
101928 |
15-Aug-2002 |
hsu |
Reset dupack count in header prediction. Follow-on to rev 1.39.
Reviewed by: jayanth, Thomas R Henderson <thomas.r.henderson@boeing.com>, silby, dillon
|
101927 |
15-Aug-2002 |
luigi |
Kernel support for a dummynet option: When a pipe or queue has the "noerror" attribute, do not report drops to the caller (ip_output() and friends). (2 lines to implement it, 2 lines to document it.)
This will let you simulate losses on the sender side as if they happened in the middle of the network, i.e. with no explicit feedback to the sender.
manpage and ipfw2.c changes to follow shortly, together with other ipfw2 changes.
Requested by: silby MFC after: 3 days
|
101921 |
15-Aug-2002 |
rwatson |
It's now sufficient to rely on a nested include of _label.h to make sure all structures in ip_var.h are defined, so remove include of mac.h.
Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
101920 |
15-Aug-2002 |
rwatson |
Perform a nested include of _label.h if #ifdef _KERNEL. This will satisfy consumers of ip_var.h that need a complete definition of struct ipq and don't include mac.h.
Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
101919 |
15-Aug-2002 |
rwatson |
Add mac.h -- raw_ip.c was depending on nested inclusion of mac.h which is no longer present.
Pointed out by: bmilekic Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
101843 |
13-Aug-2002 |
phk |
remove spurious printf
|
101713 |
12-Aug-2002 |
jennifer |
Assert that the inpcb lock is held when calling tcp_output().
Approved by: hsu
|
101628 |
10-Aug-2002 |
luigi |
One bugfix and one new feature.
The bugfix (ipfw2.c) makes the handling of port numbers with a dash in the name, e.g. ftp-data, consistent with old ipfw: use \\ before the - to consider it as part of the name and not a range separator.
The new feature (all this description will go in the manpage):
each rule now belongs to one of 32 different sets, which can be optionally specified in the following form:
ipfw add 100 set 23 allow ip from any to any
If "set N" is not specified, the rule belongs to set 0.
Individual sets can be disabled, enabled, and deleted with the commands:
ipfw disable set N
ipfw enable set N
ipfw delete set N
Enabling/disabling of a set is atomic. Rules belonging to a disabled set are skipped during packet matching, and they are not listed unless you use the '-S' flag in the show/list commands. Note that dynamic rules, once created, are always active until they expire or their parent rule is deleted. Set 31 is reserved for the default rule and cannot be disabled.
All sets are enabled by default. The enable/disable status of the sets can be shown with the command
ipfw show sets
Hopefully, this feature will make life easier to those who want to have atomic ruleset addition/deletion/tests. Examples:
To add a set of rules atomically:
ipfw disable set 18
ipfw add ... set 18 ... # repeat as needed
ipfw enable set 18
To delete a set of rules atomically:
ipfw disable set 18
ipfw delete set 18
ipfw enable set 18
To test a ruleset and disable it and regain control if something goes wrong:
ipfw disable set 18
ipfw add ... set 18 ... # repeat as needed
ipfw enable set 18 ; echo "done "; sleep 30 && ipfw disable set 18
Here, if everything goes well, you press control-C before the "sleep" terminates, and your ruleset will be left active. Otherwise, e.g. if you cannot access your box, the ruleset will be disabled after the sleep terminates.
I think there is only one more thing that one might want, namely a command to assign all rules in set X to set Y, so one can test a ruleset using the above mechanisms, and once it is considered acceptable, make it part of an existing ruleset.
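The set semantics described in this commit message can be modelled in a few lines. This is a toy model, not the ipfw2 implementation: the class, its method names, and the rule bodies are hypothetical, and representing the enabled sets as a 32-bit mask is an assumption chosen so that enabling or disabling a set is a single update.

```python
RESERVED_SET = 31           # holds the default rule; cannot be disabled
ALL_SETS = (1 << 32) - 1    # one bit per set, all enabled by default

class Ruleset:
    def __init__(self):
        self.enabled = ALL_SETS
        self.rules = []                       # (rule number, set, body)

    def add(self, number, body, set_id=0):    # no "set N" means set 0
        self.rules.append((number, set_id, body))

    def disable(self, set_id):
        if set_id == RESERVED_SET:
            raise ValueError("set 31 cannot be disabled")
        self.enabled &= ~(1 << set_id)        # one mask update hides every rule in the set

    def enable(self, set_id):
        self.enabled |= 1 << set_id

    def delete_set(self, set_id):
        self.rules = [r for r in self.rules if r[1] != set_id]

    def active(self):
        """Rules considered during matching: those whose set is enabled."""
        live = [r for r in self.rules if (self.enabled >> r[1]) & 1]
        return sorted(live, key=lambda r: r[0])

fw = Ruleset()
fw.add(100, "allow ip from any to any")
fw.disable(18)                                 # stage new rules while the set is off
fw.add(200, "hypothetical rule A", set_id=18)
fw.add(210, "hypothetical rule B", set_id=18)
hidden = [r[0] for r in fw.active()]
fw.enable(18)                                  # both rules take effect together
visible = [r[0] for r in fw.active()]
```

Because matching consults only the mask, rules added to a disabled set never apply one at a time; they all become active at the moment the set is enabled.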
|
101405 |
05-Aug-2002 |
silby |
Handle PMTU discovery in syn-ack packets slightly differently; rely on syncache flags instead of directly accessing the route entry.
MFC after: 3 days
|
101335 |
04-Aug-2002 |
luigi |
bugfix: move check for udp_blackhole before the one for icmp_bandlim.
MFC after: 3 days
|
101268 |
03-Aug-2002 |
luigi |
Fix handling of packets which matched an "ipfw fwd" rule on the input side.
|
101239 |
02-Aug-2002 |
rwatson |
When preserving the IP header in extra mbuf in the IP forwarding case, also preserve the MAC label. Note that this mbuf allocation is fairly non-optimal, but not my fault.
Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
101233 |
02-Aug-2002 |
rwatson |
Work to fix LINT build.
Reported by: phk
|
101185 |
01-Aug-2002 |
rwatson |
Introduce support for Mandatory Access Control and extensible kernel access control.
Add MAC support for the UDP protocol. Invoke appropriate MAC entry points to label packets that are generated by local UDP sockets, and to authorize delivery of mbufs to local sockets both in the multicast/broadcast case and the unicast case.
Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
101137 |
01-Aug-2002 |
rwatson |
Document the undocumented assumption that at least one of the PCB pointer and incoming mbuf pointer will be non-NULL in tcp_respond(). This is relied on by the MAC code for correctness, as well as existing code.
Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
101136 |
01-Aug-2002 |
rwatson |
Introduce support for Mandatory Access Control and extensible kernel access control.
Add support for labeling most out-going ICMP messages using an appropriate MAC entry point. Currently, we do not explicitly label packet reflect (timestamp, echo request) ICMP events, implicitly using the originating packet label since the mbuf is reused. This will be made explicit at some point.
Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
101106 |
31-Jul-2002 |
rwatson |
Introduce support for Mandatory Access Control and extensible kernel access control.
Instrument the TCP socket code for packet generation and delivery: label outgoing mbufs with the label of the socket, and check socket and mbuf labels before permitting delivery to a socket. Assign labels to newly accepted connections when the syncache/cookie code has done its business. Also set peer labels as convenient. Currently, MAC policies cannot influence the PCB matching algorithm, so cannot implement polyinstantiation. Note that there is at least one case where a PCB is not available due to the TCP packet not being associated with any socket, so we don't label in that case, but need to handle it in a special manner.
Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
101103 |
31-Jul-2002 |
rwatson |
Introduce support for Mandatory Access Control and extensible kernel access control.
Instrument the raw IP socket code for packet generation and delivery: label outgoing mbufs with the label of the socket, and check the socket and mbuf labels before permitting delivery to a socket, permitting MAC policies to selectively allow delivery of raw IP mbufs to various raw IP sockets that may be open. Restructure the policy checking code to compose IPsec and MAC results in a more readable manner.
Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
101096 |
31-Jul-2002 |
rwatson |
Introduce support for Mandatory Access Control and extensible kernel access control.
When fragmenting an IP datagram, invoke an appropriate MAC entry point so that MAC labels may be copied (...) to the individual IP fragment mbufs by MAC policies.
When IP options are inserted into an IP datagram when leaving a host, preserve the label if we need to reallocate the mbuf for alignment or size reasons.
Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
101095 |
31-Jul-2002 |
rwatson |
Introduce support for Mandatory Access Control and extensible kernel access control.
Instrument the code managing IP fragment reassembly queues (struct ipq) to invoke appropriate MAC entry points to maintain a MAC label on each queue. Permit MAC policies to associate information with a queue based on the mbuf that caused it to be created, update that information based on further mbufs accepted by the queue, influence the decision making process by which mbufs are accepted to the queue, and set the label of the mbuf holding the reassembled datagram following reassembly completion.
Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
101091 |
31-Jul-2002 |
rwatson |
Introduce support for Mandatory Access Control and extensible kernel access control.
When generating an IGMP message, invoke a MAC entry point to permit the MAC framework to label its mbuf appropriately for the target interface.
Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
101090 |
31-Jul-2002 |
rwatson |
Introduce support for Mandatory Access Control and extensible kernel access control.
When generating an ARP query, invoke a MAC entry point to permit the MAC framework to label its mbuf appropriately for the interface.
Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
101088 |
31-Jul-2002 |
rwatson |
Introduce support for Mandatory Access Control and extensible kernel access control.
Invoke the MAC framework to label mbuf created using divert sockets. These labels may later be used for access control on delivery to another socket, or to an interface.
Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
100993 |
30-Jul-2002 |
rwatson |
Introduce support for Mandatory Access Control and extensible kernel access control.
Label IP fragment reassembly queues, permitting security features to be maintained on those objects. ipq_label will be used to manage the reassembly of fragments into IP datagrams using security properties. This permits policies to deny the reassembly of fragments, as well as influence the resulting label of a datagram following reassembly.
Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
100871 |
29-Jul-2002 |
maxim |
Use a common way to release locks before exit.
Reviewed by: hsu
|
100831 |
28-Jul-2002 |
truckman |
Wire the sysctl output buffer before grabbing any locks to prevent SYSCTL_OUT() from blocking while locks are held. This should only be done when it would be inconvenient to make a temporary copy of the data and defer calling SYSCTL_OUT() until after the locks are released.
|
100685 |
25-Jul-2002 |
ume |
make setsockopt(IPV6_V6ONLY, 0) actually work for tcp6.
MFC after: 1 week
|
100683 |
25-Jul-2002 |
ume |
cleanup usage of ip6_mapped_addr_on and ip6_v6only. now, ip6_mapped_addr_on is unified into ip6_v6only.
MFC after: 1 week
|
100589 |
24-Jul-2002 |
luigi |
Only log things if net.inet.ip.fw.verbose is set
|
100537 |
23-Jul-2002 |
ru |
Don't forget to recalculate the IP checksum of the original IP datagram embedded into ICMP error message.
Spotted by: tcpdump 3.7.1 (-vvv) MFC after: 3 days
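For readers unfamiliar with the header checksum being recalculated here, the following is a sketch of the standard Internet checksum (RFC 1071) applied to a 20-byte IPv4 header. It is not the kernel code; the helper names and the sample header bytes are illustrative assumptions.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """One's-complement of the one's-complement sum of 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                       # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def refresh_ip_checksum(header: bytes) -> bytes:
    """Return a copy of an IPv4 header with bytes 10-11 (the checksum) recomputed."""
    zeroed = header[:10] + b"\x00\x00" + header[12:]
    return zeroed[:10] + struct.pack("!H", internet_checksum(zeroed)) + zeroed[12:]

# Sample header whose checksum field is stale (zero) after modification.
stale = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
fixed = refresh_ip_checksum(stale)
```

A receiver or a tool such as tcpdump verifies a header by summing it with the checksum in place; a correct header yields zero, which is why a stale embedded checksum is flagged.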
|
100534 |
22-Jul-2002 |
ru |
Don't shrink socket buffers in tcp_mss(), application might have already configured them with setsockopt(SO_*BUF), for RFC1323's scaled windows.
PR: kern/11966 MFC after: 1 week
|
100508 |
22-Jul-2002 |
ume |
do not refer to IN6P_BINDV6ONLY anymore.
Obtained from: KAME MFC after: 1 week
|
100420 |
20-Jul-2002 |
jdp |
Fix overflows in intermediate calculations in sysctl_msec_to_ticks(). At hz values of 1000 and above the overflows caused net.inet.tcp.keepidle to be reported as negative.
MFC after: 3 days
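The overflow can be reproduced with simulated 32-bit arithmetic. This sketch assumes the conversion multiplies by 1000 before dividing by hz and that the intermediate is a 32-bit signed int; the function names and the two-hour idle value are illustrative assumptions, and the widened variant is one possible remedy rather than the committed fix.

```python
def to_int32(value: int) -> int:
    """Wrap a Python integer the way a 32-bit two's-complement int would."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value

def c_div(a: int, b: int) -> int:
    """Integer division truncating toward zero, as in C."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q

def ticks_to_msec_naive(ticks: int, hz: int) -> int:
    """Multiply first in 32 bits: the intermediate product can wrap."""
    return c_div(to_int32(ticks * 1000), hz)

def ticks_to_msec_wide(ticks: int, hz: int) -> int:
    """Same conversion with an intermediate wide enough not to wrap."""
    return ticks * 1000 // hz

idle_seconds = 2 * 60 * 60                     # assumed example: a two-hour idle time
at_hz_1000 = ticks_to_msec_naive(idle_seconds * 1000, 1000)
at_hz_100 = ticks_to_msec_naive(idle_seconds * 100, 100)
widened = ticks_to_msec_wide(idle_seconds * 1000, 1000)
```

At hz=100 the product stays below 2^31, so the naive version is correct; at hz=1000 it exceeds 2^31 and the reported value goes negative.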
|
100419 |
20-Jul-2002 |
rwatson |
Don't export 'struct ipq' from kernel, instead #ifdef _KERNEL. As kernel data structures pick up security and synchronization primitives, it becomes increasingly desirable not to arbitrarily export them via include files to userland, as the userland applications pick up new #include dependencies.
Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
100373 |
19-Jul-2002 |
dillon |
Add the tcps_sndrexmitbad statistic, keep track of late acks that caused unnecessary retransmissions.
|
100335 |
18-Jul-2002 |
dillon |
Introduce two new sysctl's:
net.inet.tcp.rexmit_min (default 3 ticks equiv)
This sysctl is the retransmit timer RTO minimum, specified in milliseconds. This value is designed for algorithmic stability only.
net.inet.tcp.rexmit_slop (default 200ms)
This sysctl is the retransmit timer RTO slop which is added to every retransmit timeout and is designed to handle protocol stack overheads and delayed ack issues.
Note that the *original* code applied a 1-second RTO minimum but never applied real slop to the RTO calculation, so any RTO calculation over one second would have no slop and thus not account for protocol stack overheads (TCP timestamps are not a measure of protocol turnaround!). Essentially, the original code made the RTO calculation almost completely irrelevant.
Please note that the 200ms slop is debatable. This commit is not meant to be a line in the sand, and if the community winds up deciding that increasing it is the correct solution then it's easy to do. Note that larger values will destroy performance on lossy networks while smaller values may result in a greater number of unnecessary retransmits.
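The difference between the old and new behaviour can be sketched numerically. The estimator srtt + 4 * rttvar is the textbook RTO formula and is an assumption here, as are the function names, the 30 ms minimum (3 ticks at an assumed hz of 100), and the exact order in which the minimum and the slop are applied.

```python
def rto_original_ms(srtt_ms: int, rttvar_ms: int) -> int:
    """Old behaviour as described: 1-second floor, no slop added."""
    return max(srtt_ms + 4 * rttvar_ms, 1000)

def rto_with_slop_ms(srtt_ms: int, rttvar_ms: int,
                     rexmit_min_ms: int = 30, rexmit_slop_ms: int = 200) -> int:
    """New behaviour as described: small floor, slop added to every timeout."""
    return max(srtt_ms + 4 * rttvar_ms, rexmit_min_ms) + rexmit_slop_ms

fast_old = rto_original_ms(10, 2)        # LAN-like path: the floor dominates
fast_new = rto_with_slop_ms(10, 2)
slow_old = rto_original_ms(1500, 100)    # long path: computed RTO, zero slop
slow_new = rto_with_slop_ms(1500, 100)
```

On the fast path the timeout drops from a full second to a fraction of one, and on the slow path the timeout now carries the slop that the original calculation never applied above one second.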
|
100288 |
18-Jul-2002 |
luigi |
Move IPFW2 definition before including ip_fw.h
Make indentation of new parts consistent with the style used for this file.
|
100270 |
17-Jul-2002 |
dillon |
I don't know how the minimum retransmit timeout managed to get set to one second but it badly breaks throughput on networks with minor packet loss.
Complaints by: at least two people tracked down to this. MFC after: 3 days
|
100228 |
17-Jul-2002 |
luigi |
Fix a panic when doing "ipfw add pipe 1 log ..."
Also synchronize ip_dummynet.c with the version in RELENG_4 to ease MFC's.
|
100004 |
14-Jul-2002 |
luigi |
Implement keepalives for dynamic rules, so they will not expire just because you leave your session idle.
Also, put in a fix for 64-bit architectures (to be revised).
In detail:
ip_fw.h
* Reorder fields in struct ip_fw to avoid alignment problems on 64-bit machines. This only masks the problem, I am still not sure whether I am doing something wrong in the code or there is a problem elsewhere (e.g. different alignment of structures between userland and kernel because of pragmas etc.)
* added fields in dyn_rule to store ack numbers, so we can generate keepalives when the dynamic rule is about to expire
ip_fw2.c
* use a local function, send_pkt(), to generate TCP RST for Reset rules;
* save about 250 bytes by cleaning up the various snprintf() in ipfw_log() ...
* ... and use twice as many bytes to implement keepalives (this seems to be working, but i have not tested it extensively).
Keepalives are generated once every 5 seconds for the last 20 seconds of the lifetime of a dynamic rule for an established TCP flow. The packets are sent to both sides, so if at least one of the endpoints is responding, the timeout is refreshed and the rule will not expire.
You can disable this feature with
sysctl net.inet.ip.fw.dyn_keepalive=0
(the default is 1, to have them enabled).
MFC after: 1 day
(just kidding... I will supply an updated version of ipfw2 for RELENG_4 tomorrow).
|
99891 |
12-Jul-2002 |
luigi |
Avoid dereferencing a null pointer in ro_rt.
This was always broken in HEAD (the offending statement was introduced in rev. 1.123 for HEAD), while RELENG_4 included this fix (in rev. 1.99.2.12 for RELENG_4) and I inadvertently deleted it in 1.99.2.30.
So I am also restoring these two lines in RELENG_4 now. We might need another few things from 1.99.2.30.
|
99869 |
12-Jul-2002 |
truckman |
Back out the previous change, since it looks like locking udbinfo provides sufficient protection.
|
99863 |
12-Jul-2002 |
truckman |
Lock inp while we're accessing it.
|
99838 |
11-Jul-2002 |
truckman |
Defer calling SYSCTL_OUT() until after the locks have been released.
|
99837 |
11-Jul-2002 |
truckman |
Reduce the nesting level of a code block that doesn't need to be in an else clause.
|
99642 |
09-Jul-2002 |
luigi |
Change one variable to make it easier to switch between ipfw and ipfw2
|
99623 |
08-Jul-2002 |
luigi |
Fix a bug caused by dereferencing an invalid pointer when no punch_fw was used. Fix another couple of bugs which prevented rules from being installed properly.
In passing, use IPFW2 instead of NEW_IPFW to compile the new code, and slightly simplify the instruction generation code.
|
99622 |
08-Jul-2002 |
luigi |
No functional changes, but:
Following Darren's suggestion, make Dijkstra happy and rewrite the ipfw_chk() main loop removing a lot of goto's and using instead a variable to store match status.
Add a lot of comments to explain what instructions are supposed to do and how -- this should ease auditing of the code and make people more confident with it.
In terms of code size: the entire file takes about 12700 bytes of text, about 3K of which are for the main function, ipfw_chk(), and 2K (ouch!) for ipfw_log().
|
99621 |
08-Jul-2002 |
luigi |
Remove one unused command name.
|
99620 |
08-Jul-2002 |
luigi |
Forgot to update one field name in one of the latest commits.
|
99475 |
05-Jul-2002 |
luigi |
Implement the last 2-3 missing instructions for ipfw, now it should support all the instructions of the old ipfw.
Fix some bugs in the user interface, /sbin/ipfw.
Please check this code against your rulesets, so I can fix the remaining bugs (if any, I think they will be mostly in /sbin/ipfw).
Once we have done a bit of testing, this code is ready to be MFC'ed, together with a bunch of other changes (glue to ipfw, and also the removal of some global variables) which have been in -current for a couple of weeks now.
MFC after: 7 days
|
99207 |
01-Jul-2002 |
brian |
Remove trailing whitespace
|
99156 |
30-Jun-2002 |
jesper |
Extend the effect of the sysctl net.inet.tcp.icmp_may_rst so that, if we receive an ICMP "time to live exceeded in transit" (type 11, code 0) for a TCP connection in SYN-SENT state, we close the connection.
MFC after: 2 weeks
|
98982 |
28-Jun-2002 |
jlemon |
One possible code path for syncache_respond() is:
syncache_respond(A), ip_output(), ip_input(), tcp_input(), syncache_badack(B)
Which winds up deleting a different entry from the syncache. Handle this by not utilizing the next entry in the timer chain until after syncache_respond() completes. The case of A == B should not be possible.
Problem found by: Don Bowman <don@sandvine.com>
|
98965 |
28-Jun-2002 |
dfr |
Fix warning.
Reviewed by: luigi
|
98943 |
27-Jun-2002 |
luigi |
The new ipfw code.
This code makes use of variable-size kernel representation of rules (exactly the same concept of BPF instructions, as used in the BSDI's firewall), which makes firewall operation a lot faster, and the code more readable and easier to extend and debug.
The interface with the rest of the system is unchanged, as witnessed by this commit. The only extra kernel files that I am touching are if_fw.h and ip_dummynet.c, which is quite tied to ipfw. In userland I only had to touch those programs which manipulate the internal representation of firewall rules.
The code is almost entirely new (and I believe I have written the vast majority of those sections which were taken from the former ip_fw.c), so rather than modifying the old ip_fw.c I decided to create a new file, sys/netinet/ip_fw2.c . Same for the user interface, which is in sbin/ipfw/ipfw2.c (it still compiles to /sbin/ipfw). The old files are still there, and will be removed in due time.
I have not renamed the header file because it would have required a one-line change to a number of kernel files.
In terms of user interface, the new "ipfw" is supposed to accept the old syntax for ipfw rules (and produce the same output with "ipfw show"). Only a couple of the old options (out of some 30 of them) have not been implemented, but they will be soon.
On the other hand, the new code has some very powerful extensions. First, you can put "or" connectives between match fields (and soon also between options), and write things like
ipfw add allow ip from { 1.2.3.4/27 or 5.6.7.8/30 } 10-23,25,1024-3000 to any
This should make rulesets slightly more compact (and lines longer!), by condensing 2 or more of the old rules into single ones.
Also, as an example of how easy the rules can be extended, I have implemented an 'address set' match pattern, where you can specify an IP address in a format like this:
10.20.30.0/26{18,44,33,22,9}
which will match the set of hosts listed in braces belonging to the subnet 10.20.30.0/26 . The match is done using a bitmap, so it is essentially a constant time operation requiring a handful of CPU instructions (and a very small amount of memory -- for a full /24 subnet, the instruction only consumes 40 bytes).
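The bitmap match described above can be modelled roughly as below. This is a Python sketch under stated assumptions, not the ipfw2 implementation: the helper names make_address_set and matches are hypothetical, and the numbers in braces are read as host offsets within the subnet.

```python
import ipaddress


def make_address_set(subnet, hosts):
    """Build (network, bitmap) where bit N is set if host offset N is listed."""
    net = ipaddress.ip_network(subnet)
    bitmap = 0
    for h in hosts:
        if not 0 <= h < net.num_addresses:
            raise ValueError(f"host {h} is outside {net}")
        bitmap |= 1 << h
    return net, bitmap


def matches(addr, net, bitmap):
    """Constant-time test: one subnet check plus one bit lookup."""
    ip = ipaddress.ip_address(addr)
    if ip not in net:
        return False
    offset = int(ip) - int(net.network_address)
    return bool((bitmap >> offset) & 1)


# Models the pattern 10.20.30.0/26{18,44,33,22,9} from the text.
net, bitmap = make_address_set("10.20.30.0/26", [18, 44, 33, 22, 9])
print(matches("10.20.30.44", net, bitmap))  # listed host
print(matches("10.20.30.45", net, bitmap))  # same subnet, not listed
```

For a full /24 the bitmap alone needs 256 bits, i.e. 32 bytes; the remainder of the 40 bytes quoted above is presumably instruction header and base address, though the message does not break that down.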
Again, in this commit I have focused on functionality and tried to minimize changes to the other parts of the system. Some performance improvement can be achieved with minor changes to the interface of ip_fw_chk_t. This will be done later when this code is settled.
The code is meant to compile unmodified on RELENG_4 (once the PACKET_TAG_* changes have been merged), for this reason you will see #ifdef __FreeBSD_version in a couple of places. This should minimize errors when (hopefully soon) it will be time to do the MFC.
|
98904 |
27-Jun-2002 |
mux |
Warning fixes for 64-bit platforms. With this last fix, I can build a GENERIC sparc64 kernel with -Werror.
Reviewed by: luigi
|
98894 |
26-Jun-2002 |
luigi |
Just a comment on some additional consistency checks that could be added here.
|
98849 |
26-Jun-2002 |
ken |
At long last, commit the zero copy sockets code.
MAKEDEV: Add MAKEDEV glue for the ti(4) device nodes.
ti.4: Update the ti(4) man page to include information on the TI_JUMBO_HDRSPLIT and TI_PRIVATE_JUMBOS kernel options, and also include information about the new character device interface and the associated ioctls.
man9/Makefile: Add jumbo.9 and zero_copy.9 man pages and associated links.
jumbo.9: New man page describing the jumbo buffer allocator interface and operation.
zero_copy.9: New man page describing the general characteristics of the zero copy send and receive code, and what an application author should do to take advantage of the zero copy functionality.
NOTES: Add entries for ZERO_COPY_SOCKETS, TI_PRIVATE_JUMBOS, TI_JUMBO_HDRSPLIT, MSIZE, and MCLSHIFT.
conf/files: Add uipc_jumbo.c and uipc_cow.c.
conf/options: Add the 5 options mentioned above.
kern_subr.c: Receive side zero copy implementation. This takes "disposable" pages attached to an mbuf, gives them to a user process, and then recycles the user's page. This is only active when ZERO_COPY_SOCKETS is turned on and the kern.ipc.zero_copy.receive sysctl variable is set to 1.
uipc_cow.c: Send side zero copy functions. Takes a page written by the user and maps it copy on write and assigns it kernel virtual address space. Removes copy on write mapping once the buffer has been freed by the network stack.
uipc_jumbo.c: Jumbo disposable page allocator code. This allocates (optionally) disposable pages for network drivers that want to give the user the option of doing zero copy receive.
uipc_socket.c: Add kern.ipc.zero_copy.{send,receive} sysctls that are enabled if ZERO_COPY_SOCKETS is turned on.
Add zero copy send support to sosend() -- pages get mapped into the kernel instead of getting copied if they meet size and alignment restrictions.
uipc_syscalls.c:Un-staticize some of the sf* functions so that they can be used elsewhere. (uipc_cow.c)
if_media.c: In the SIOCGIFMEDIA ioctl in ifmedia_ioctl(), avoid calling malloc() with M_WAITOK. Return an error if the M_NOWAIT malloc fails.
The ti(4) driver and the wi(4) driver, at least, call this with a mutex held. This causes witness warnings for 'ifconfig -a' with a wi(4) or ti(4) board in the system. (I've only verified for ti(4)).
ip_output.c: Fragment large datagrams so that each segment contains a multiple of PAGE_SIZE amount of data plus headers. This allows the receiver to potentially do page flipping on receives.
if_ti.c: Add zero copy receive support to the ti(4) driver. If TI_PRIVATE_JUMBOS is not defined, it now uses the jumbo(9) buffer allocator for jumbo receive buffers.
Add a new character device interface for the ti(4) driver for the new debugging interface. This allows (a patched version of) gdb to talk to the Tigon board and debug the firmware. There are also a few additional debugging ioctls available through this interface.
Add header splitting support to the ti(4) driver.
Tweak some of the default interrupt coalescing parameters to more useful defaults.
Add hooks for supporting transmit flow control, but leave it turned off with a comment describing why it is turned off.
if_tireg.h: Change the firmware rev to 12.4.11, since we're really at 12.4.11 plus fixes from 12.4.13.
Add defines needed for debugging.
Remove the ti_stats structure, it is now defined in sys/tiio.h.
ti_fw.h: 12.4.11 firmware.
ti_fw2.h: 12.4.11 firmware, plus selected fixes from 12.4.13, and my header splitting patches. Revision 12.4.13 doesn't handle 10/100 negotiation properly. (This firmware is the same as what was in the tree previously, with the addition of header splitting support.)
sys/jumbo.h: Jumbo buffer allocator interface.
sys/mbuf.h: Add a new external mbuf type, EXT_DISPOSABLE, to indicate that the payload buffer can be thrown away / flipped to a userland process.
socketvar.h: Add prototype for socow_setup.
tiio.h: ioctl interface to the character portion of the ti(4) driver, plus associated structure/type definitions.
uio.h: Change prototype for uiomoveco() so that we'll know whether the source page is disposable.
ufs_readwrite.c:Update for new prototype of uiomoveco().
vm_fault.c: In vm_fault(), check to see whether we need to do a page based copy on write fault.
vm_object.c: Add a new function, vm_object_allocate_wait(). This does the same thing that vm_object_allocate() does, except that it gives the caller the opportunity to specify whether it should wait on the uma_zalloc() of the object structure.
This allows vm objects to be allocated while holding a mutex. (Without generating WITNESS warnings.)
vm_object_allocate() is implemented as a call to vm_object_allocate_wait() with the malloc flag set to M_WAITOK.
vm_object.h: Add prototype for vm_object_allocate_wait().
vm_page.c: Add page-based copy on write setup, clear and fault routines.
vm_page.h: Add page based COW function prototypes and variable in the vm_page structure.
Many thanks to Drew Gallatin, who wrote the zero copy send and receive code, and to all the other folks who have tested and reviewed this code over the years.
|
98781 |
24-Jun-2002 |
hsu |
Avoid unlocking the inp twice if badport_bandlim() returns -1.
Reported by: jlemon
|
98769 |
24-Jun-2002 |
hsu |
Style bug: fix 4 space indentations that should have been tabs.
Submitted by: jlemon
|
98704 |
23-Jun-2002 |
luigi |
Slightly restructure the #ifdef INET6 sections to make the code more readable.
Remove the six "register" attributes from variables in tcp_output(); the compiler surely knows well how to allocate them.
|
98703 |
23-Jun-2002 |
luigi |
Move two global variables to automatic variables within the only function where they are used (they are used with TCPDEBUG only).
|
98701 |
23-Jun-2002 |
luigi |
Move some global variables to more appropriate places.
Add XXX comments to mark places which need to be taken care of if we want to remove this part of the kernel from Giant.
Add a comment on a potential performance problem with ip_forward()
|
98666 |
23-Jun-2002 |
luigi |
fix bad indentation and whitespace resulting from cut&paste
|
98665 |
23-Jun-2002 |
luigi |
fix indentation of a comment
|
98664 |
23-Jun-2002 |
luigi |
fix a typo in a comment
|
98663 |
23-Jun-2002 |
luigi |
Remove ip_fw_fwd_addr (forgotten in previous commit) and remove some extra whitespace.
|
98613 |
22-Jun-2002 |
luigi |
Remove (almost all) global variables that were used to hold packet forwarding state ("annotations") during ip processing. The code is considerably cleaner now.
The variables removed by this change are:
ip_divert_cookie used by divert sockets
ip_fw_fwd_addr used for transparent ip redirection
last_pkt used by dynamic pipes in dummynet
Removal of the first two has been done by carrying the annotations into volatile structs prepended to the mbuf chains, and adding appropriate code to add/remove annotations in the routines which make use of them, i.e. ip_input(), ip_output(), tcp_input(), bdg_forward(), ether_demux(), ether_output_frame(), div_output().
In passing, remove a bug in divert handling of fragmented packets. Now it is the fragment at offset 0 which sets the divert status of the whole packet, whereas formerly it was the last incoming fragment to decide.
Removal of last_pkt required a change in the interface of ip_fw_chk() and dummynet_io(). In passing, use the same mechanism for dummynet annotations and for divert/forward annotations.
option IPFIREWALL_FORWARD is effectively useless, the code to implement it is very small and is now in by default to avoid the obfuscation of conditionally compiled code.
NOTES: * there is at least one global variable left, sro_fwd, in ip_output(). I am not sure if/how this can be removed.
* I have deliberately avoided gratuitous style changes in this commit to avoid cluttering the diffs. Minor style cleanup will likely be necessary.
* this commit only focused on the IP layer. I am sure there is a number of global variables used in the TCP and maybe UDP stack.
* despite the number of files touched, there are absolutely no API's or data structures changed by this commit (except the interfaces of ip_fw_chk() and dummynet_io(), which are internal anyways), so an MFC is quite safe and unintrusive (and desirable, given the improved readability of the code).
MFC after: 10 days
|
98598 |
21-Jun-2002 |
hsu |
Fix logic which resulted in missing a call to INP_UNLOCK().
Submitted by: jlemon, mux
|
98596 |
21-Jun-2002 |
hsu |
TCP notify functions can change the pcb list.
|
98459 |
20-Jun-2002 |
peter |
Solve the 'unregistered netisr 18' information notice with a sledgehammer. Register the ISR early, but do not actually kick off the timer until we see some activity. This still saves us from running the arp timers on a system with no network cards.
|
98385 |
18-Jun-2002 |
tanimura |
Remove so*_locked(), which were backed out by mistake.
|
98211 |
14-Jun-2002 |
hsu |
Notify functions can destroy the pcb, so they have to return an indication of whether this happened so the calling function knows whether or not to unlock the pcb.
Submitted by: Jennifer Yang (yangjihui@yahoo.com) Bug reported by: Sid Carter (sidcarter@symonds.net)
|
98204 |
14-Jun-2002 |
silby |
Re-commit w/fix:
Ensure that the syn cache's syn-ack packets contain the same ip_tos, ip_ttl, and DF bits as all other tcp packets.
PR: 39141 MFC after: 2 weeks
This time, make sure that ipv4 specific code (aka all of the above) is only run in the ipv4 case.
|
98203 |
14-Jun-2002 |
silby |
Back out ip_tos/ip_ttl/DF "fix", it just panic'd my box. :)
Pointy-hat to: silby
|
98202 |
14-Jun-2002 |
silby |
Ensure that the syn cache's syn-ack packets contain the same ip_tos, ip_ttl, and DF bits as all other tcp packets.
PR: 39141 MFC after: 2 weeks
|
98191 |
13-Jun-2002 |
hsu |
Because we're holding an exclusive write lock on the head, references to the new inp cannot leak out even though it has been placed on the head list.
|
98147 |
12-Jun-2002 |
hsu |
The UDP head was unlocked too early in one unicast case.
Submitted by: bug reported by arr
|
98135 |
12-Jun-2002 |
hsu |
Fix logic which resulted in missing a call to INP_UNLOCK().
|
98134 |
12-Jun-2002 |
hsu |
Fix typo where INP_INFO_RLOCK should be INP_INFO_RUNLOCK. Submitted by: tegge, jlemon
Prefer LIST_FOREACH macro. Submitted by: jlemon
|
98115 |
11-Jun-2002 |
hsu |
Remember to initialize the control block head mutex.
|
98114 |
11-Jun-2002 |
hsu |
Fix typo.
Submitted by: Kyunghwan Kim <redjade@atropos.snu.ac.kr>
|
98108 |
10-Jun-2002 |
hsu |
Every array elt is initialized in the following loop, so remove unnecessary M_ZERO.
|
98102 |
10-Jun-2002 |
hsu |
Lock up inpcb.
Submitted by: Jennifer Yang <yangjihui@yahoo.com>
|
97658 |
31-May-2002 |
tanimura |
Back out my last commit of locking down a socket; it conflicts with hsu's work.
Requested by: hsu
|
97627 |
30-May-2002 |
wollman |
Avoid unintentional trigraph.
|
97074 |
21-May-2002 |
arr |
- Change the newly turned INVARIANTS #ifdef blocks (they were changed from DIAGNOSTIC yesterday) into KASSERT()'s as these help to increase code readability.
|
97020 |
20-May-2002 |
arr |
- Turn a few DIAGNOSTIC into INVARIANTS since they are really sanity checks.
|
97019 |
20-May-2002 |
arr |
- Turn a DIAGNOSTIC into an INVARIANTS since it's a sanity check. Use proper ``if'' statement style.
|
97018 |
20-May-2002 |
arr |
- Turn a #ifdef DIAGNOSTIC to #ifdef INVARIANTS as the code from this line through the #endif is really a sanity check.
Reviewed by: jake
|
96972 |
20-May-2002 |
tanimura |
Lock down a socket, milestone 1.
o Add a mutex (sb_mtx) to struct sockbuf. This protects the data in a socket buffer. The mutex in the receive buffer also protects the data in struct socket.
o Determine the lock strategy for each member in struct socket.
o Lock down the following members:
- so_count
- so_options
- so_linger
- so_state
o Remove *_locked() socket APIs. Make the following socket APIs touching the members above now require a locked socket:
- sodisconnect()
- soisconnected()
- soisconnecting()
- soisdisconnected()
- soisdisconnecting()
- sofree()
- soref()
- sorele()
- sorwakeup()
- sotryfree()
- sowakeup()
- sowwakeup()
Reviewed by: alfred
|
96624 |
15-May-2002 |
kbyanc |
Reset token-ring source routing control field on receipt of ethernet frame without source routing information. This restores the behaviour in this scenario to that of prior to my last commit.
|
96602 |
14-May-2002 |
rwatson |
Modify the arguments to syncache_socket() to include the mbuf (m) that results in the syncache entry being turned into a socket. While it's not used in the main tree, this is required in the MAC tree so that labels can be propagated from the mbuf to the socket. This is also useful if you're doing things like transparent IP connection hijacking and you want to use the syncache/cookie mechanism, but we won't go there.
Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
96511 |
13-May-2002 |
luigi |
Add ipfw hooks to ether_demux() and ether_output_frame(). Ipfw processing of frames at layer 2 can be enabled by the sysctl variable
net.link.ether.ipfw=1
Consider this feature experimental, because right now, the firewall is invoked in the places indicated below, and controlled by the sysctl variables listed on the right. As a consequence, a packet can be filtered from 1 to 4 times depending on the path it follows, which might make a ruleset a bit hard to follow.
I will add an ipfw option to tell if we want a given rule to apply to ether_demux() and ether_output_frame(), but we have run out of flags in the struct ip_fw so I need to think a bit on how to implement this.
               to upper layers
                    |        |
      +----------->-----------+
      ^                       V
  [ip_input]            [ip_output]     net.inet.ip.fw.enable=1
      |                       |
      ^                       V
 [ether_demux]    [ether_output_frame]  net.link.ether.ipfw=1
      |                       |
      +->- [bdg_forward]-->---+         net.link.ether.bridge_ipfw=1
      ^                       V
      |                       |
           to devices
|
96509 |
13-May-2002 |
luigi |
Remove custom definitions (IP_FW_TCPF_SYN etc.) of TCP header flags which are the same as the original ones (TH_SYN etc.)
|
96474 |
12-May-2002 |
luigi |
Add code to match MAC header fields (at the moment supported on bridged packets only, soon to come also for packets on ordinary ether_input() and ether_output() paths). The syntax is
ipfw add <action> MAC dst src type
where dst and src can be "any" or a MAC address optionally followed by a mask, e.g.
10:20:30:40:50
10:20:30:40:50/32
10:20:30:40:50&ff:ff:ff:f0:ff:0f
and type can be a single ethernet type, a range, or a type followed by a mask (values are always in hexadecimal) e.g.
0800
0800-0806
0800/8
0800&03ff
Note, I am still uncertain on what is the best format for inputting these values, having the values in hexadecimal is convenient in most cases but can be confusing sometimes. Suggestions welcome.
Implement suggestion from PR 37778 to allow "not me" on destination and source IP. The code in the PR was slightly wrong and interfered with the normal handling of IP addresses. This version hopefully is correct.
Minor cleanup of the code, in some places moving the indentation to 4 spaces because the code was becoming too deep. Eventually, in a separate commit, I will move the whole file to 4 space indent.
|
96432 |
12-May-2002 |
dd |
s/demon/daemon/
|
96431 |
11-May-2002 |
mike |
Remove some duplicate types that should have been removed as part of the rearranging in the previous revision.
Pointy hat to: cvs update (merging), mike (for not noticing)
|
96245 |
09-May-2002 |
luigi |
Cleanup the interface to ip_fw_chk, two of the input arguments were totally useless and have been removed.
ip_input.c, ip_output.c: Properly initialize the "ip" pointer in case the firewall does an m_pullup() on the packet.
Remove some debugging code forgotten long ago.
ip_fw.[ch], bridge.c: Prepare the grounds for matching MAC header fields in bridged packets, so we can have 'etherfw' functionality without a lot of kernel and userland bloat.
|
96184 |
07-May-2002 |
kbyanc |
Move ISO88025 source routing information into sockaddr_dl's sdl_data field. This returns the sdl_data field to a variable-length field. More importantly, this prevents an easily reproducible data-corruption bug when the interface name plus the hardware address exceed the sdl_data field's original 12 byte limit. However, token-ring interfaces may still overflow the new sdl_data field's 46 byte limit if the interface name exceeds 6 characters (since 6 characters for interface name plus 6 for hardware address plus 34 for source routing = the size of sdl_data). Further refinements could overcome this limitation but would break binary compatibility; this commit only addresses fixing the bug for commonly occurring cases without breaking binary compatibility with the intention that the functionality can be MFC'ed to -stable.
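The size budget quoted above can be checked with a small Python sketch. The constant and function names are hypothetical; only the byte counts (12, 46, 6 and 34) come from the commit message, and the interface names used are made-up examples.

```python
SDL_DATA_OLD = 12      # original sdl_data limit, in bytes
SDL_DATA_NEW = 46      # new sdl_data limit, in bytes
HWADDR_LEN = 6         # hardware address length
SOURCE_ROUTE_LEN = 34  # ISO88025 source routing information


def fits(ifname, limit, with_source_route):
    """True if interface name + hardware address (+ source routing) fit."""
    needed = len(ifname) + HWADDR_LEN
    if with_source_route:
        needed += SOURCE_ROUTE_LEN
    return needed <= limit


print(fits("abcdef", SDL_DATA_NEW, True))    # 6 + 6 + 34 = 46, exactly fits
print(fits("abcdefg", SDL_DATA_NEW, True))   # 7 + 6 + 34 = 47, overflows
print(fits("abcdefg", SDL_DATA_OLD, False))  # 7 + 6 = 13, overflowed the old field
```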
See message ID's (both sent to -arch): 20020421013332.F87395-100000@gateway.posi.net 20020430181359.G11009-300000@gateway.posi.net for a more thorough description of the bug addressed and how to reproduce it.
Approved by: silence on -arch and -net Sponsored by: NTT Multimedia Communications Labs MFC after: 1 week
|
96116 |
06-May-2002 |
ume |
Revised MLD-related definitions
- Used mld_xxx and MLD_xxx instead of mld6_xxx and MLD6_xxx according to the official definitions in rfc2292bis (macro definitions for backward compatibility were provided)
- Changed the first member of mld_hdr{} from mld_hdr to mld_icmp6_hdr to avoid name space conflict in C++
This change makes ports/net/pchar compilable again under -CURRENT.
Obtained from: KAME
|
96077 |
05-May-2002 |
luigi |
Indentation and comments cleanup, no functional change.
MFC after: 3 days
|
95883 |
01-May-2002 |
alfred |
Redo the sigio locking.
Turn the sigio sx into a mutex.
Sigio lock is really only needed to protect interrupts from dereferencing the sigio pointer in an object when the sigio itself is being destroyed.
In order to do this in the most unintrusive manner change pgsigio's sigio * argument into a **, that way we can lock internally to the function.
|
95867 |
01-May-2002 |
alfred |
Fix some edge cases where bad string handling could occur.
Submitted by: ps
|
95865 |
01-May-2002 |
alfred |
cleanup: fix line wraps, add some comments, fix macro definitions, fix for(;;) loops.
|
95858 |
01-May-2002 |
cjc |
Enlighten those who read the FINE POINTS of the documentation a bit more on how ipfw(8) deals with tiny fragments. While we're at it, add a quick log message to even let people know we dropped a packet. (Note that the second FINE POINT is somewhat redundant given the first, but since the code is there, leave the docs for it.)
MFC after: 1 day
|
95759 |
30-Apr-2002 |
tanimura |
Revert the change of #includes in sys/filedesc.h and sys/socketvar.h.
Requested by: bde
Since locking sigio_lock is usually followed by calling pgsigio(), move the declaration of sigio_lock and the definitions of SIGIO_*() to sys/signalvar.h.
While I am here, sort include files alphabetically, where possible.
|
95552 |
27-Apr-2002 |
tanimura |
Add a global sx sigio_lock to protect the pointer to the sigio object of a socket. This avoids lock order reversal caused by locking a process in pgsigio().
sowakeup() and the callers of it (sowwakeup, soisconnected, etc.) now require sigio_lock to be locked. Provide sowwakeup_locked(), soisconnected_locked(), and so on in case where we have to modify a socket and wake up a process atomically.
|
95336 |
24-Apr-2002 |
mike |
Rearrange <netinet/in.h> so that it is easier to conditionalize sections for various standards. Conditionalize sections for various standards. Use standards conforming spelling for types in the sockaddr_in structure.
|
95099 |
20-Apr-2002 |
mike |
Add sa_family_t type to <sys/_types.h> and typedefs to <netinet/in.h> and <sys/socket.h>. Previously, sa_family_t was only typedef'd in <sys/socket.h>.
|
95023 |
19-Apr-2002 |
suz |
just merged cosmetic changes from KAME to ease sync between KAME and FreeBSD. (based on freebsd4-snap-20020128)
Reviewed by: ume MFC after: 1 week
|
94394 |
11-Apr-2002 |
suz |
initialize local variable explicitly Reviewed by: ume Obtained from: Fujitsu guys MFC after: 1 week
|
94390 |
10-Apr-2002 |
silby |
Remove some ISN generation code which has been unused since the syncache went in.
MFC after: 3 days
|
94379 |
10-Apr-2002 |
silby |
Totally nuke IPPORT_USERRESERVED, it is no longer used anywhere, update remaining comments to reflect new ephemeral port range.
Reminded by: Maxim Konovalov <maxim@macomnet.ru> MFC after: 3 days
|
94357 |
10-Apr-2002 |
mike |
Unconditionalize the definition of INET_ADDRSTRLEN and INET6_ADDRSTRLEN. Doing this helps expose bogus redefinitions in 3rd party software.
|
94327 |
10-Apr-2002 |
brian |
Remove the code that masks an EEXIST returned from rtinit() when calling ioctl(SIOC[AS]IFADDR).
This allows the following:
ifconfig xx0 inet 1.2.3.1 netmask 0xffffff00
ifconfig xx0 inet 1.2.3.17 netmask 0xfffffff0 alias
ifconfig xx0 inet 1.2.3.25 netmask 0xfffffff8 alias
ifconfig xx0 inet 1.2.3.26 netmask 0xffffffff alias
but would (given the above) reject this:
ifconfig xx0 inet 1.2.3.27 netmask 0xfffffff8 alias
due to the conflicting netmasks. I would assert that it's wrong to mask the EEXIST returned from rtinit() as in the above scenario, the deletion of the 1.2.3.25 address will leave the 1.2.3.27 address as unroutable as it was in the first place.
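One way to read the scenario above is that each address needs a route for its own network, and an address is rejected when that network route already exists. The Python sketch below models that reading only; it is an assumption about why rtinit() returns EEXIST, and the names network_of, conflicts and CONFIGURED are hypothetical.

```python
import ipaddress

# The four addresses accepted in the example above, as (address, netmask).
CONFIGURED = [
    ("1.2.3.1", 0xFFFFFF00),
    ("1.2.3.17", 0xFFFFFFF0),
    ("1.2.3.25", 0xFFFFFFF8),
    ("1.2.3.26", 0xFFFFFFFF),
]


def network_of(addr, netmask):
    """Network that a route for this address/netmask pair would cover."""
    mask = ipaddress.IPv4Address(netmask)
    return ipaddress.ip_interface(f"{addr}/{mask}").network


def conflicts(addr, netmask, configured=CONFIGURED):
    """Already-configured addresses whose network route is identical."""
    target = network_of(addr, netmask)
    return [a for a, m in configured if network_of(a, m) == target]


print(network_of("1.2.3.27", 0xFFFFFFF8))
print(conflicts("1.2.3.27", 0xFFFFFFF8))
```

Under this model 1.2.3.27 with netmask 0xfffffff8 lands on the same network as 1.2.3.25, which matches the observation that deleting 1.2.3.25 would take the shared route away with it.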
Offered for review on: -arch, -net Discussed with: stephen macmanus <stephenm@bayarea.net> MFC after: 3 weeks
|
94326 |
10-Apr-2002 |
brian |
Don't add host routes for interface addresses of 0.0.0.0/8 -> 0.255.255.255.
This change allows bootp to work with more than one interface, at the expense of some rather ``wrong'' looking code. I plan to MFC this in place of luigi's recent #ifdef BOOTP stuff that was committed to this file in -stable, as that's slightly more wrong than this is.
Offered for review on: -arch, -net MFC after: 2 weeks
|
94304 |
09-Apr-2002 |
jhb |
Change the first argument of prison_xinpcb() to be a thread pointer instead of a proc pointer so that prison_xinpcb() can use td_ucred.
|
94291 |
09-Apr-2002 |
silby |
Update comments to reflect the recent ephemeral port range change.
Noticed by: ru MFC After: 1 day
|
93904 |
05-Apr-2002 |
mdodd |
Retire this copy; it now lives in sys/net/fddi.h.
|
93818 |
04-Apr-2002 |
jhb |
Change callers of mtx_init() to pass in an appropriate lock type name. In most cases NULL is passed, but in some cases such as network driver locks (which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used.
Tested on: i386, alpha, sparc64
|
93593 |
01-Apr-2002 |
jhb |
Change the suser() API to take advantage of td_ucred as well as do a general cleanup of the API. The entire API now consists of two functions similar to the pre-KSE API. The suser() function takes a thread pointer as its only argument. The td_ucred member of this thread must be valid so the only valid thread pointers are curthread and a few kernel threads such as thread0. The suser_cred() function takes a pointer to a struct ucred as its first argument and an integer flag as its second argument. The flag is currently only used for the PRISON_ROOT flag.
Discussed on: smp@
|
93514 |
01-Apr-2002 |
mike |
o Implement <sys/_types.h>, a new header for storing types that are MI, not required to be a fixed size, and used in multiple headers. This will grow in time, as more things move here from <sys/types.h> and <machine/ansi.h>.
o Add missing type definitions (uint16_t and uint32_t) to <arpa/inet.h> and <netinet/in.h>.
o Reduce pollution in <sys/types.h> by using `#if _FOO_T_DECLARED' widgets to avoid including <sys/stdint.h>.
o Add some missing type definitions to <unistd.h> and note the ones that still need to be added.
o Make use of <sys/_types.h> primitives in <grp.h> and <sys/types.h>.
Reviewed by: bde
|
93085 |
24-Mar-2002 |
bde |
Fixed some style bugs in the removal of __P(()). Continuation lines were not outdented to preserve non-KNF lining up of code with parentheses. Switch to KNF formatting.
|
92976 |
22-Mar-2002 |
rwatson |
Merge from TrustedBSD MAC branch:
Move the network code from using cr_cansee() to check whether a socket is visible to a requesting credential to using a new function, cr_canseesocket(), which accepts a subject credential and object socket. Implement cr_canseesocket() so that it does a prison check, a uid check, and add a comment where shortly a MAC hook will go. This will allow MAC policies to separately instrument the visibility of sockets from the visibility of processes.
Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
92960 |
22-Mar-2002 |
ru |
Prevent icmp_reflect() from calling ip_output() with a NULL route pointer which will then result in the allocated route's reference count never being decremented. Just flood ping the localhost and watch refcnt of the 127.0.0.1 route with netstat(1).
Submitted by: jayanth
Back out ip_output.c,v 1.143 and ip_mroute.c,v 1.69 that allowed ip_output() to be called with a NULL route pointer. The previous paragraph shows why this was a bad idea in the first place.
MFC after: 0 days
|
92926 |
22-Mar-2002 |
silby |
Change the ephemeral port range from 1024-5000 to 49152-65535. This increases the number of concurrent outgoing connections from ~4000 to ~16000. Other OSes (Solaris, OS X, NetBSD) and many other NAT products have already made this change without ill effects, so we should not run into any problems.
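The approximate figures above follow directly from the sizes of the two ranges. A quick Python check (port_count is a hypothetical helper; both ranges are treated as inclusive):

```python
def port_count(low, high):
    """Number of ports in the inclusive range [low, high]."""
    return high - low + 1


old_range = port_count(1024, 5000)
new_range = port_count(49152, 65535)
print(old_range, new_range)
```

The old range holds 3977 ports and the new one 16384, which is where the "~4000" and "~16000" figures come from.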
MFC after: 1 week
|
92802 |
20-Mar-2002 |
orion |
Send periodic ARP requests when ARP entries for hosts we are sending to are about to expire. This prevents high packet rate flows from experiencing packet drops at the sender following ARP cache entry timeout.
PR: kern/25517 Reviewed by: luigi MFC after: 7 days
|
92760 |
20-Mar-2002 |
jeff |
Switch vm_zone.h with uma.h. Change over to uma interfaces.
|
92723 |
19-Mar-2002 |
alfred |
Remove __P.
|
92654 |
19-Mar-2002 |
jeff |
This is the first part of the new kernel memory allocator. This replaces malloc(9) and vm_zone with a slab like allocator.
Reviewed by: arch@
|
92275 |
14-Mar-2002 |
rwatson |
NAI DBA update
|
91984 |
10-Mar-2002 |
mike |
o Add INET_ADDRSTRLEN and INET6_ADDRSTRLEN defines to <arpa/inet.h> for POSIX.1-2001 conformance.
o Add magic to <netinet/in.h> and <netinet6/in6.h> to prevent redefining INET_ADDRSTRLEN and INET6_ADDRSTRLEN.
o Add a note about missing typedefs in <arpa/inet.h>.
|
91959 |
09-Mar-2002 |
mike |
o Don't require long long support in bswap64() functions. o In i386's <machine/endian.h>, macros have some advantages over inlines, so change some inlines to macros. o In i386's <machine/endian.h>, ungarbage collect word_swap_int() (previously __uint16_swap_uint32), it has some uses on i386's with PDP endianness.
Submitted by: bde
o Move a comment up in <machine/endian.h> that was accidentally moved down a few revisions ago. o Reenable userland's use of optimized inline-asm versions of byteorder(3) functions. o Fix ordering of prototypes vs. redefinition of byteorder(3) functions, so that the non-GCC (libc asm) case has proper prototypes. o Add proper prototypes for byteorder(3) functions in <sys/param.h>. o Prevent redundant duplicate prototypes by making use of the _BYTEORDER_PROTOTYPED define. o Move the bswap16(), bswap32(), bswap64() C functions into MD space for platforms in which asm versions don't exist. This significantly reduces the complexity of some things at the cost of duplicate code.
Reviewed by: bde
|
91492 |
28-Feb-2002 |
ume |
- Set inc_isipv6 in tcp6_usr_connect(). - When making a pcb from a sync cache, do not forget to copy inc_isipv6.
Obtained from: KAME MFC After: 1 week
|
91406 |
27-Feb-2002 |
jhb |
Simple p_ucred -> td_ucred changes to start using the per-thread ucred reference.
|
91374 |
27-Feb-2002 |
cjc |
Change the wording of the inline comments from the previous commit.
Objection from: ru
|
91357 |
27-Feb-2002 |
alfred |
More IPV6 const fixes.
|
91354 |
27-Feb-2002 |
dd |
Introduce a version field to `struct xucred' in place of one of the spares (the size of the field was changed from u_short to u_int to reflect what it really ends up being). Accordingly, change users of xucred to set and check this field as appropriate. In the kernel, this is being done inside the new cru2x() routine which takes a `struct ucred' and fills out a `struct xucred' according to the former. This also has the pleasant side effect of removing some duplicate code.
Reviewed by: rwatson
|
91324 |
26-Feb-2002 |
brooks |
Staticize an extern that no one else used.
|
91271 |
26-Feb-2002 |
jedgar |
Enforce inbound IPsec SPD
Reviewed by: fenner
|
91236 |
25-Feb-2002 |
alfred |
Document what inpcb->inp_vflag is for.
Submitted by: Marco Molteni <molter@tin.it>
|
91234 |
25-Feb-2002 |
cjc |
The TCP code did not do sufficient checks on whether incoming packets were destined for a broadcast IP address. All TCP packets with a broadcast destination must be ignored. The system only ignored packets that were _link-layer_ broadcasts or multicast. We need to check the IP address too since it is quite possible for a broadcast IP address to come in with a unicast link-layer address.
Note that the check existed prior to CSRG revision 7.35, but was removed. This commit effectively backs out that nine-year-old change.
PR: misc/35022
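The distinction between a link-layer broadcast and a broadcast IP destination can be sketched as follows. This is a simplified illustration rather than the kernel code: the function names and the `local_networks` parameter are assumptions, and only the limited broadcast and per-network directed broadcast addresses are considered.

```python
import ipaddress

LIMITED_BROADCAST = ipaddress.ip_address("255.255.255.255")

def is_ip_broadcast(dst, local_networks):
    """True if dst is the limited broadcast address or the directed
    broadcast address of one of the locally configured networks."""
    dst = ipaddress.ip_address(dst)
    if dst == LIMITED_BROADCAST:
        return True
    return any(dst == net.broadcast_address for net in local_networks)

def accept_tcp(dst, link_layer_bcast_or_mcast, local_networks):
    """Hypothetical acceptance test: drop on a link-layer broadcast or
    multicast (the old check) and also on a broadcast IP destination
    (the check this commit restores)."""
    if link_layer_bcast_or_mcast:
        return False
    return not is_ip_broadcast(dst, local_networks)
```

With only the link-layer test, a segment addressed to 192.168.1.255 but delivered in a unicast frame would have been accepted; the second test rejects it.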
|
90988 |
20-Feb-2002 |
luigi |
BUGFIX: make use of the pointer to the target of skipto rules, so that after the first time we can follow the pointer instead of having to scan the list. This was the intended behaviour from day one.
PR: 34639 MFC-after: 3 days
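The caching idea can be sketched as below. The names `Rule`, `next_rule`, and `skipto_target` are hypothetical and do not reflect ipfw's actual structures; the sketch assumes rules are kept sorted by number and that skipto jumps to the first rule whose number is not less than the target.

```python
class Rule:
    def __init__(self, number, skipto=None):
        self.number = number
        self.skipto = skipto    # target rule number, or None
        self.next_rule = None   # cached index of the skipto target

def skipto_target(rules, i, stats):
    """Return the index of the rule that rules[i] skips to.

    The list is scanned only on the first call; later calls follow
    the cached pointer. stats["scans"] counts how often a scan ran.
    """
    rule = rules[i]
    if rule.next_rule is None:
        stats["scans"] += 1
        j = i + 1
        while j < len(rules) and rules[j].number < rule.skipto:
            j += 1
        rule.next_rule = j
    return rule.next_rule
```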
|
90982 |
20-Feb-2002 |
jlemon |
When expanding a syncache entry into a socket, inherit the socket options from the current listen socket instead of the cached (and possibly stale) TCB pointer.
|
90868 |
18-Feb-2002 |
mike |
o Move NTOHL() and associated macros into <sys/param.h>. These are deprecated in favor of the POSIX-defined lowercase variants. o Change all occurrences of NTOHL() and associated macros in the source tree to use the lowercase function variants. o Add missing license bits to sparc64's <machine/endian.h>. Approved by: jake o Clean up <machine/endian.h> files. o Remove unused __uint16_swap_uint32() from i386's <machine/endian.h>. o Remove prototypes for non-existent bswapXX() functions. o Include <machine/endian.h> in <arpa/inet.h> to define the POSIX-required ntohl() family of functions. o Do similar things to expose the ntohl() family in libstand, <netinet/in.h>, and <sys/param.h>. o Prepend underscores to the ntohl() family to help deal with complexities associated with having MD (asm and inline) versions, and having to prevent exposure of these functions in other headers that happen to make use of endian-specific defines. o Create weak aliases to the canonical function name to help deal with third-party software forgetting to include an appropriate header. o Remove some now unneeded pollution from <sys/types.h>. o Add missing <arpa/inet.h> includes in userland.
Tested on: alpha, i386 Reviewed by: bde, jake, tmm
|
90698 |
15-Feb-2002 |
ru |
Moved the 127/8 check below so that IPF redirects have a chance of working.
MFC after: 1 day
|
90556 |
12-Feb-2002 |
jlemon |
When a duplicate SYN arrives which matches an entry in the syncache, update our lazy reference to the inpcb structure, as it may have changed.
Found by: dima
|
90493 |
10-Feb-2002 |
dd |
Silence unused variable warning in the !KLD_MODULE case.
Submitted by: archie
|
90361 |
07-Feb-2002 |
julian |
Pre-KSE/M3 commit. This is a low-functionality change that changes the kernel to access the main thread of a process via the linked list of threads rather than assuming that it is embedded in the process. It IS still embedded there but remove all the code that assumes that in preparation for the next commit which will actually move it out.
Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, benno rice,
|
90198 |
04-Feb-2002 |
ume |
In tcp_respond(), correctly reset returned IPv6 header. This is essential when the original packet contains an IPv6 extension header.
Obtained from: KAME MFC after: 1 week
|
90137 |
03-Feb-2002 |
markm |
WARNS=n and lint(1) silencer. Declare an array of (const) strings as const char.
|
89809 |
26-Jan-2002 |
cjc |
The ipfw(8) 'tee' action simply hasn't worked on incoming packets for some time. _All_ packets, regardless of destination, were accepted by the machine as if addressed to it.
Jump back to 'pass' processing for a teed packet instead of falling through as if it was ours.
PR: kern/31130 Reviewed by: -net, luigi MFC after: 2 weeks
|
89667 |
22-Jan-2002 |
jlemon |
The ENDPTS_EQ macro was comparing one of the fports to itself. Fix.
Submitted by: emy@boostworks.com
|
89624 |
21-Jan-2002 |
ume |
- Check the address family of the destination cached in a PCB. - Clear the cached destination before getting another cached route. Otherwise, garbage in the padding space (which might be filled in if it was used for IPv4) could annoy rtalloc.
Obtained from: KAME
|
89614 |
21-Jan-2002 |
ru |
RFC1122 requires that addresses of the form { 127, <any> } MUST NOT appear outside a host.
PR: 30792, 33996 Obtained from: ip_input.c MFC after: 1 week
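The RFC 1122 rule amounts to a membership test against 127.0.0.0/8 for any packet that is not travelling over a loopback interface. A minimal sketch, assuming a hypothetical `must_drop` helper and a boolean describing the interface (neither is taken from the commit):

```python
import ipaddress

LOOPBACK_NET = ipaddress.ip_network("127.0.0.0/8")

def must_drop(src, dst, ifp_is_loopback):
    """True if a packet with a 127/8 source or destination would
    appear outside the host, i.e. on a non-loopback interface."""
    if ifp_is_loopback:
        return False
    return (ipaddress.ip_address(src) in LOOPBACK_NET
            or ipaddress.ip_address(dst) in LOOPBACK_NET)
```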
|
89253 |
11-Jan-2002 |
ru |
Fix a panic condition in icmp_reflect() introduced in rev. 1.61. (We should be able to handle locally originated IP packets, and these do not have m_pkthdr.rcvif set.)
PR: kern/32806, kern/33766 Reviewed by: luigi Fix tested by: Maxim Konovalov <maxim@macomnet.ru>, Erwin Lansing <erwin@lansing.dk>
|
89069 |
08-Jan-2002 |
msmith |
Initialise the intrq_present fields at runtime, not link time. This allows us to load protocols at runtime, and avoids the use of common variables.
Also fix the ip6_intrq assignment so that it works at all.
|
88991 |
07-Jan-2002 |
cjc |
Fix a missing "ipfw:" in a syslog message.
MFC after: 1 day
|
88931 |
05-Jan-2002 |
fenner |
Pre-calculate the checksum for multicast packets sourced on a multicast router. This is overkill; it should be possible to delay to hardware interfaces and only pre-calculate when forwarding to a tunnel.
|
88884 |
04-Jan-2002 |
rwatson |
o Spelling fix in comment: tcp_ouput -> tcp_output
|
88665 |
29-Dec-2001 |
yar |
Don't reveal a router in the IPSTEALTH mode through IP options. The following steps are involved: a) the IP options related to routing (LSRR and SSRR) are processed as though the router were a host, b) the other IP options are processed as usual only if the packet is destined for the router; otherwise they are ignored.
PR: kern/23123 Discussed in: freebsd-hackers
|
88593 |
28-Dec-2001 |
julian |
Fix ipfw fwd so that it acts as the docs say when forwarding an incoming packet to another machine.
Obtained from: Vicor Production tree MFC after: 3 weeks
|
88359 |
21-Dec-2001 |
yar |
Implement matching IP precedence in ipfw(4).
Submitted by: Igor Timkin <ivt@gamma.ru>
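IP precedence occupies the three most significant bits of the IPv4 TOS byte, so matching on it reduces to a shift and a mask. A small sketch (the function names are illustrative and are not the ipfw implementation):

```python
def ip_precedence(tos):
    """Extract the 3-bit precedence field from an IPv4 TOS byte."""
    return (tos >> 5) & 0x07

def match_precedence(tos, wanted):
    """True if the packet's precedence equals the wanted value (0-7)."""
    return ip_precedence(tos) == wanted
```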
|
88331 |
21-Dec-2001 |
jlemon |
Remove a change that snuck in from my private tree.
|
88330 |
21-Dec-2001 |
jlemon |
If syncookies are disabled (net.inet.tcp.syncookies) then use the faster arc4random() routine to generate ISNs instead of creating them with MD5().
Suggested by: silby
|
88195 |
19-Dec-2001 |
jlemon |
When storing an int value in a void *, use intptr_t as the cast type (instead of int) to keep the 64 bit platforms happy.
|
88190 |
19-Dec-2001 |
yar |
Don't try to free a NULL route when doing IPFIREWALL_FORWARD. An old route will be NULL at that point if a packet were initially routed to an interface (using the IP_ROUTETOIF flag.)
Submitted by: Igor Timkin <ivt@gamma.ru>
|
88180 |
19-Dec-2001 |
jlemon |
Extend the SYN DoS defense by adding syncookies to the syncache. All TCP ISNs that are sent out are valid cookies, which allows entries in the syncache to be dropped and still have the ACK accepted later. As all entries pass through the syncache, there is no sudden switchover from cache -> cookies when the cache is full; instead, syncache entries simply have a reduced lifetime. More details may be found in the "Resisting DoS attacks with a SYN cache" paper in the Usenix BSDCon 2002 conference proceedings.
Sponsored by: DARPA, NAI Labs
|
88132 |
18-Dec-2001 |
ru |
Fixed the bug in transparent TCP proxying with the "encode_ip_hdr" option -- TcpAliasOut() did not catch the IP header length change.
Submitted by: Stepachev Andrey <aka50@mail.ru>
|
87919 |
14-Dec-2001 |
rwatson |
o Add IPOPT_ESO for the 'Extended Security' IP option (RFC1108)
Obtained from: TrustedBSD Project
|
87917 |
14-Dec-2001 |
rwatson |
o Add definition for IPOPT_CIPSO, the commercial security IP option number.
Submitted by: Ilmar S. Habibulin <ilmar@watson.org> Obtained from: TrustedBSD Project
|
87916 |
14-Dec-2001 |
jlemon |
whitespace and style fixes recovered from -stable.
|
87915 |
14-Dec-2001 |
jlemon |
minor style and whitespace fixes.
|
87914 |
14-Dec-2001 |
jlemon |
whitespace fixes.
|
87913 |
14-Dec-2001 |
jlemon |
minor whitespace fixes.
|
87903 |
14-Dec-2001 |
silby |
Reduce the local network slowstart flightsize from infinity to 4 packets.
Now that we've increased the size of our send / receive buffers, bursting an entire window onto the network may cause congestion. As a result, we will slow start beginning with a flightsize of 4 packets.
Problem reported by: Thomas Zenker <thz@Lennartz-electronic.de>
MFC after: 3 days
|
87780 |
13-Dec-2001 |
jlemon |
Undo one of my last minute changes; move sc_iss up earlier so it is initialized in case we take the T/TCP path.
|
87779 |
13-Dec-2001 |
jlemon |
Fix up tabs from cut&n&paste.
|
87778 |
13-Dec-2001 |
jlemon |
Fix up tabs in comments.
|
87777 |
13-Dec-2001 |
jlemon |
Minor style fixes.
|
87776 |
13-Dec-2001 |
jlemon |
Minor style fix.
|
87599 |
10-Dec-2001 |
obrien |
Update to C99, s/__FUNCTION__/__func__/, also don't use ANSI string concatenation.
|
87499 |
07-Dec-2001 |
rwatson |
o Our current userland boot code (due to rc.conf and rc.network) always enables TCP keepalives using the net.inet.tcp.always_keepalive by default. Synchronize the kernel default with the userland default.
|
87410 |
05-Dec-2001 |
ru |
Fixed remotely exploitable DoS in arpresolve().
Easily exploitable by flood pinging the target host over an interface with the IFF_NOARP flag set (all you need to know is the target host's MAC address).
MFC after: 0 days
|
87275 |
03-Dec-2001 |
rwatson |
o Introduce pr_mtx into struct prison, providing protection for the mutable contents of struct prison (hostname, securelevel, refcount, pr_linux, ...) o Generally introduce mtx_lock()/mtx_unlock() calls throughout kern/ so as to enforce these protections, in particular, in kern_mib.c protection sysctl access to the hostname and securelevel, as well as kern_prot.c access to the securelevel for access control purposes. o Rewrite linux emulator abstractions for accessing per-jail linux mib entries (osname, osrelease, osversion) so that they don't return a pointer to the text in the struct linux_prison, rather, a copy to an array passed into the calls. Likewise, update linprocfs to use these primitives. o Update in_pcb.c to always use prison_getip() rather than directly accessing struct prison.
Reviewed by: jhb
|
87193 |
02-Dec-2001 |
dillon |
Fix a bug with transmitter restart after receiving a 0 window. The receiver was not sending an immediate ack with delayed acks turned on when the input buffer is drained, preventing the transmitter from restarting immediately.
Propagate the TCP_NODELAY option to accept()ed sockets. (Helps tbench and is a good idea anyway).
Some cleanup. Identify additional issues in comments.
MFC after: 1 day
|
87167 |
01-Dec-2001 |
ru |
Allow for ip_output() to be called with a NULL route pointer. This fixes a panic I introduced yesterday in ip_icmp.c,v 1.64.
|
87158 |
01-Dec-2001 |
mike |
o Stop abusing MD headers with non-MD types. o Hide nonstandard functions and types in <netinet/in.h> when _POSIX_SOURCE is defined. o Add some missing types (required by POSIX.1-200x) to <netinet/in.h>. o Restore vendor ID from Rev 1.1 in <netinet/in.h> and make use of new __FBSDID() macro. o Fix some miscellaneous issues in <arpa/inet.h>. o Correct final argument for the inet_ntop() function (POSIX.1-200x). o Get rid of the namespace pollution from <sys/types.h> in <arpa/inet.h>.
Reviewed by: fenner Partially submitted by: bde
|
87145 |
30-Nov-2001 |
dillon |
The transmit burst limit for newreno completely breaks TCP's performance if the receive side is using delayed acks. Temporarily remove it.
MFC after: 0 days
|
87124 |
30-Nov-2001 |
brian |
During SIOCAIFADDR, if in_ifinit() fails and we've already added an interface address, blow the address away again before returning the error.
In in_ifinit(), if we get an error from rtinit() and we've also got a destination address, return the error rather than masking EEXIST. Failing to create a host route when configuring an interface should be treated as an error.
|
87120 |
30-Nov-2001 |
ru |
- Make ip_rtaddr() global, and use it to look up the correct source address in icmp_reflect(). - Two new "struct icmpstat" members: icps_badaddr and icps_noroute.
PR: kern/31575 Obtained from: BSD/OS MFC after: 1 week
|
87003 |
27-Nov-2001 |
dd |
ipfw_modevent(): Don't use an unnatural block to define a variable (fcp) that's already defined in the outer block and isn't used anywhere else. This silences -Wunused.
Reviewed by: md5(1)
|
87002 |
27-Nov-2001 |
dd |
Remove debugging printfs that weren't conditional on any debugging options in handling MOD_{UN,}LOAD (they weren't very useful, anyway).
|
86999 |
27-Nov-2001 |
dd |
In icmp_reflect(): If the packet was not addressed to us and was received on an interface without an IP address, try to find a non-loopback AF_INET address to use. If that fails, drop it. Previously, we used the address at the top of the in_ifaddrhead list, which didn't make much sense, and would cause a panic if there were no AF_INET addresses configured on the system.
PR: 29337, 30524 Reviewed by: ru, jlemon Obtained from: NetBSD
|
86991 |
27-Nov-2001 |
rwatson |
Add include of net/route.h, as structures moved around due to the syncache rely on 'struct route' being defined. This fixes the LINT build some.
|
86958 |
27-Nov-2001 |
tanimura |
Clear a new syncache entry first, followed by filling in values. This fixes route breakage due to uncleared garbage on my box.
|
86953 |
27-Nov-2001 |
ru |
When servicing an internal FTP server, punch ipfirewall(4) holes for passive mode data connections (PASV/EPSV -> 227/229). Well, the actual punching happens a bit later, when the aliasing link becomes fully specified.
Prodded by: Danny Carroll <dannycarroll@hotmail.com> MFC after: 1 week
|
86910 |
26-Nov-2001 |
ru |
Restore the ability to use IP_FW_ADD with setsockopt(2) that got broken in revision 1.86. This broke natd(8)'s -punch_fw option.
Reported by: Daniel Rock <D.Rock@t-online.de>, setantae <setantae@submonkey.net>
|
86814 |
23-Nov-2001 |
bde |
Fixed a buffer overrun. In my kernel configuration, tcp_syncache happens to be followed by nfsnodehashtbl, so bzeroing callouts beyond the end of tcp_syncache soon caused a null pointer panic when nfsnodehashtbl was accessed.
|
86764 |
22-Nov-2001 |
jlemon |
Introduce a syncache, which enables FreeBSD to withstand a SYN flood DoS in an improved fashion over the existing code.
Reviewed by: silby (in a previous iteration) Sponsored by: DARPA, NAI Labs
|
86744 |
21-Nov-2001 |
jlemon |
Move initialization of snd_recover into tcp_sendseqinit().
|
86487 |
17-Nov-2001 |
dillon |
Give struct socket structures a ref counting interface similar to vnodes. This will hopefully serve as a base from which we can expand the MP code. We currently do not attempt to obtain any mutex or SX locks, but the door is open to add them when we nail down exactly how that part of it is going to work.
|
86183 |
08-Nov-2001 |
rwatson |
o Replace reference to 'struct proc' with 'struct thread' in 'struct sysctl_req', which describes in-progress sysctl requests. This permits sysctl handlers to have access to the current thread, permitting work on implementing td->td_ucred, migration of suser() to using struct thread to derive the appropriate ucred, and allowing struct thread to be passed down to other code, such as network code where td is not currently available (and curproc is used).
o Note: netncp and netsmb are not updated to reflect this change, as they are not currently KSE-adapted.
Reviewed by: julian Obtained from: TrustedBSD Project
|
86117 |
06-Nov-2001 |
arr |
- Fixes non-zero'd out sin_zero field problem so that the padding is used as it is supposed to be.
Inspired by: PR #31704 Approved by: jdp Reviewed by: jhb, -net@
|
86106 |
05-Nov-2001 |
phk |
3.5 years ago Wollman wrote: "[...] and removes the hostcache code from standard kernels---the code that depends on it is not going to happen any time soon, I'm afraid." Time to clean up.
|
86047 |
04-Nov-2001 |
luigi |
MFS: sync the ipfw/dummynet/bridge code with the one recently merged into stable (mostly, but not only, formatting and comments changes).
|
86031 |
04-Nov-2001 |
luigi |
s/FREE/free/
|
85964 |
03-Nov-2001 |
brian |
cmott@scientech.com -> cm@linktel.net
Requested by: Charles Mott <cmott@scientech.com>
|
85741 |
30-Oct-2001 |
wpaul |
Fix a (long standing?) bug in ip_output(): if ip_insertoptions() is called and ip_output() encounters an error and bails (i.e. host unreachable), we will leak an mbuf. This is because the code calls m_freem(m0) after jumping to the bad: label at the end of the function, when it should be calling m_freem(m). (m0 is the original mbuf list _without_ the options mbuf prepended.)
Obtained from: NetBSD
|
85740 |
30-Oct-2001 |
des |
Make sure the netmask always has an address family. This fixes Linux ifconfig, which expects the address returned by the SIOCGIFNETMASK ioctl to have a valid sa_family. Similar changes may be necessary for IPv6.
While we're here, get rid of an unnecessary temp variable.
MFC after: 2 weeks
|
85732 |
30-Oct-2001 |
jlemon |
When dropping a packet because there is no room in the queue (which itself is somewhat bogus), update the statistics to indicate something was dropped.
PR: 13740
|
85689 |
29-Oct-2001 |
joe |
A few more style changes picked up whilst working on an MFC to -stable.
|
85687 |
29-Oct-2001 |
joe |
Fix some whitespace, and a comment that I missed in the last commit.
|
85665 |
29-Oct-2001 |
joe |
Clean up the style of this header file.
|
85658 |
29-Oct-2001 |
dillon |
fix int argument used in printf w/ %ld (cast to long)
|
85467 |
25-Oct-2001 |
jlemon |
Don't use the ip_timestamp structure to access timestamp options, as the compiler may cause an unaligned access to be generated in some cases.
PR: 30982
|
85466 |
25-Oct-2001 |
jlemon |
If we are bridging, fall back to using any inet address in the system, irrespective of receive interface, as a last resort.
Submitted by: ru
|
85465 |
25-Oct-2001 |
jlemon |
Relocate the KASSERT for a null recvif to a location where it will actually do some good.
Pointed out by: ru
|
85315 |
22-Oct-2001 |
ume |
Restore the data of the IP header after the extended UDP header and data checksum is calculated. Leaving it modified caused trouble in code that expects the IP header to be unmodified; for example, inbound policy lookup failed.
Obtained from: KAME MFC after: 1 week
|
85223 |
20-Oct-2001 |
jlemon |
Only examine inet addresses of the interface. This was broken in r1.83, with the result that the system would reply to an ARP request of 0.0.0.0
|
85074 |
17-Oct-2001 |
ru |
Pull post-4.4BSD change to sys/net/route.c from BSD/OS 4.2.
Have sys/net/route.c:rtrequest1(), which takes ``rt_addrinfo *'' as the argument. Pass rt_addrinfo all the way down to rtrequest1 and ifa->ifa_rtrequest. 3rd argument of ifa->ifa_rtrequest is now ``rt_addrinfo *'' instead of ``sockaddr *'' (almost no one is using it anyways).
Benefit: the following command now works. Previously we needed two route(8) invocations, "add" then "change". # route add -inet6 default ::1 -ifp gif0
Remove unsafe typecast in rtrequest(), from ``rtentry *'' to ``sockaddr *''. It was introduced by 4.3BSD-Reno and never corrected.
Obtained from: BSD/OS, NetBSD MFC after: 1 month PR: kern/28360
|
84931 |
14-Oct-2001 |
fjoe |
bring in ARP support for variable length link level addresses
Reviewed by: jdp Approved by: jdp Obtained from: NetBSD MFC after: 6 weeks
|
84736 |
09-Oct-2001 |
rwatson |
- Combine kern.ps_showallprocs and kern.ipc.showallsockets into a single kern.security.seeotheruids_permitted, described as: "Unprivileged processes may see subjects/objects with different real uid" NOTE: kern.ps_showallprocs exists in -STABLE, and therefore there is an API change. kern.ipc.showallsockets does not. - Check kern.security.seeotheruids_permitted in cr_cansee(). - Replace visibility calls to socheckuid() with cr_cansee() (retain the change to socheckuid() in ipfw, where it is used for rule-matching). - Remove prison_unpcb() and make use of cr_cansee() against the UNIX domain socket credential instead of comparing root vnodes for the UDS and the process. This allows multiple jails to share the same chroot() and not see each other's UNIX domain sockets. - Remove unused socheckproc().
Now that cr_cansee() is used universally for socket visibility, a variety of policies are more consistently enforced, including uid-based restrictions and jail-based restrictions. This also better-supports the introduction of additional MAC models.
Reviewed by: ps, billf Obtained from: TrustedBSD Project
|
84564 |
05-Oct-2001 |
jayanth |
Add a flag TF_LASTIDLE, that forces a previously idle connection to send all its data, especially when the data is less than one MSS. This fixes an issue where the stack was delaying the sending of data, even though there was enough window to send all the data and the sending of data was emptying the socket buffer.
Problem found by Yoshihiro Tsuchiya (tsuchiya@flab.fujitsu.co.jp)
Submitted by: Jayanth Vijayaraghavan
|
84527 |
05-Oct-2001 |
ps |
Only allow users to see their own socket connections if kern.ipc.showallsockets is set to 0.
Submitted by: billf (with modifications by me) Inspired by: Dave McKay (aka pm aka Packet Magnet) Reviewed by: peter MFC after: 2 weeks
|
84516 |
05-Oct-2001 |
ps |
Make it so dummynet and bridge can be loaded as modules.
Submitted by: billf
|
84317 |
01-Oct-2001 |
jlemon |
in_ifinit apparently can be used to rewrite an ip address; recalculate the correct hash bucket for the entry.
Submitted by: iedowse (with some munging by me)
|
84315 |
01-Oct-2001 |
luigi |
Fix a problem with unnumbered rules introduced in latest commit. Reported by: des
|
84306 |
01-Oct-2001 |
ru |
mdoc(7) police: Use the new .In macro for #include statements.
|
84195 |
30-Sep-2001 |
dillon |
Add __FBSDID's to libalias
|
84137 |
29-Sep-2001 |
jlemon |
Nuke unused (and incorrect) #define of INADDR_HMASK.
Spotted by: ru
|
84109 |
29-Sep-2001 |
jlemon |
Make the INADDR_TO_IFP macro use the IP address hash lookup instead of walking the entire list of IP addresses.
Pointed out by: bfumerola
|
84102 |
29-Sep-2001 |
jlemon |
Add a hash table that contains the list of internet addresses, and use this in place of the in_ifaddr list when appropriate. This improves performance on hosts which have a large number of IP aliases.
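The benefit comes from replacing a walk over every configured address with a walk over one hash bucket. A sketch of the idea, assuming a simple modulo hash and hypothetical names (`AddrHash`, `add`, `lookup`); the kernel's actual hash function and data layout are not described in this message:

```python
import ipaddress

class AddrHash:
    """Hash table mapping IPv4 addresses to interface names."""

    def __init__(self, nbuckets=64):
        self.buckets = [[] for _ in range(nbuckets)]

    def _bucket(self, addr):
        # Assumed hash: the 32-bit address modulo the bucket count.
        key = int(ipaddress.IPv4Address(addr))
        return self.buckets[key % len(self.buckets)]

    def add(self, addr, ifname):
        self._bucket(addr).append((ipaddress.IPv4Address(addr), ifname))

    def lookup(self, addr):
        # Only the entries in one bucket are compared, not the full list.
        key = ipaddress.IPv4Address(addr)
        for candidate, ifname in self._bucket(addr):
            if candidate == key:
                return ifname
        return None
```

On a host with thousands of IP aliases, each lookup then touches only the handful of addresses that share a bucket.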
|
84101 |
29-Sep-2001 |
jlemon |
Centralize satosin(), sintosa() and ifatoia() macros in <netinet/in.h> Remove local definitions.
|
84058 |
27-Sep-2001 |
luigi |
Two main changes here: + implement "limit" rules, which make it possible to limit the number of sessions between certain host pairs (according to masks). These are a special type of stateful rules, which might be of interest in some cases. See the ipfw manpage for details.
+ merge the list pointers and ipfw rule descriptors in the kernel, so the code is smaller, faster and more readable. This patch basically consists in replacing "foo->rule->bar" with "rule->bar" all over the place. I have been willing to do this for ages!
MFC after: 1 week
|
84023 |
27-Sep-2001 |
luigi |
Remove unused (and duplicate) struct ip_opts which is never used, not referenced in Stevens, and does not compile with g++. There is an equivalent structure, struct ipoption in ip_var.h which is actually used in various parts of the kernel, and also referenced in Stevens.
Bill Fenner also says: ... if you want the trivia, struct ip_opts was introduced in in.h SCCS revision 7.9, on 6/28/1990, by Mike Karels. struct ipoption was introduced in ip_var.h SCCS revision 6.5, on 9/16/1985, by... Mike Karels.
MFC-after: 3 days
|
83994 |
26-Sep-2001 |
brooks |
Include sys/proc.h for the definition of securelevel_ge().
Submitted by: LINT
|
83970 |
26-Sep-2001 |
rwatson |
o Modify IPFW and DUMMYNET administrative setsockopt() calls to use securelevel_gt() to check the securelevel, rather than direct access to the securelevel variable.
Obtained from: TrustedBSD Project
|
83934 |
25-Sep-2001 |
brooks |
Make faith loadable, unloadable, and clonable.
|
83873 |
24-Sep-2001 |
luigi |
Fix a null pointer dereference introduced in the last commit, plus remove a useless assignment and move a comment.
Submitted by: Thomas Moestl
|
83771 |
21-Sep-2001 |
ru |
Fixed the bug that prevented communication with FTP servers behind NAT in extended passive mode if the server's public IP address was different from the main NAT address. This caused a wrong aliasing link to be created that did not route the incoming packets back to the original IP address of the server.
natd -v -n pub0 -redirect_address localFTP publicFTP
Note that even if localFTP == publicFTP, one still needs to supply the -redirect_address directive. It is needed as a helper because extended passive mode's 229 reply does not contain the IP address.
MFC after: 1 week
|
83742 |
20-Sep-2001 |
rwatson |
o Rename u_cansee() to cr_cansee(), making the name more comprehensible in the face of a rename of ucred to cred, and possibly generally.
Obtained from: TrustedBSD Project
|
83725 |
20-Sep-2001 |
luigi |
A bunch of minor changes to the code (see below) for readability, code size and speed. No new functionality added (yet) apart from a bugfix. MFC will occur in due time and probably in stages.
BUGFIX: fix a problem in old code which prevented reallocation of the hash table for dynamic rules (there is a PR on this).
OTHER CHANGES: minor changes to the internal struct for static and dynamic rules. Requires rebuild of ipfw binary.
Add comments to show how data structures are linked together. (It probably makes no sense to keep the chain pointers separate from actual rule descriptors. They will be hopefully merged soon.)
keep a (sysctl-readable) counter for the number of static rules, to speed up IP_FW_GET operations
initial support for a "grace time" for expired connections, so we can set timeouts for closing connections to much shorter times.
merge zero_entry() and resetlog_entry(), they use basically the same code.
clean up and reduce replication of code for removing rules, both for readability and code size.
introduce a separate lifetime for dynamic UDP rules.
fix a problem in old code which prevented reallocation of the hash table for dynamic rules (PR ...)
restructure dynamic rule descriptors
introduce some local variables to avoid multiple dereferencing of pointer chains (reduces code size and hopefully increases speed).
|
83708 |
20-Sep-2001 |
sumikawa |
Fixed comment: ipip_input -> mroute_encapcheck.
Reported by: bde
|
83615 |
18-Sep-2001 |
sumikawa |
Removed ipip_input(). No code calls it anymore due to ip_encap.c's encapsulation support.
|
83366 |
12-Sep-2001 |
julian |
KSE Milestone 2. Note: ALL MODULES MUST BE RECOMPILED. Make the kernel aware that there are smaller units of scheduling than the process (but only allow one thread per process at this time). This is functionally equivalent to the previous -current except that there is a thread associated with each process.
Sorry john! (your next MFC will be a doosie!)
Reviewed by: peter@freebsd.org, dillon@freebsd.org
X-MFC after: ha ha ha ha
|
83188 |
07-Sep-2001 |
julian |
Remove some un-needed code that was accidentally included in the 2nd previous KAME patch.
Submitted by: SUMIKAWA Munechika <sumikawa@ebina.hitachi.co.jp>
|
83187 |
07-Sep-2001 |
julian |
Patches from KAME to remove usage of Varargs in existing IPV4 code. For now they will still have some in the developing stuff (IPv6)
Submitted by: Keiichi SHIMA / <keiichi@iij.ad.jp> Obtained from: KAME
|
83130 |
06-Sep-2001 |
jlemon |
Wrap array accesses in macros, which also happen to be lvalues:
ifnet_addrs[i - 1] -> ifaddr_byindex(i) ifindex2ifnet[i] -> ifnet_byindex(i)
This is intended to ease the conversion to SMPng.
|
82966 |
04-Sep-2001 |
alfred |
Fix sysctl comment field, s/the the/then the
Pointed out by: ru
|
82893 |
03-Sep-2001 |
alfred |
Allow disabling of "arp moved" messages.
Submitted by: Stephen Hurd <deuce@lordlegacy.org>
|
82892 |
03-Sep-2001 |
julian |
I really hope this is the right answer. call ip_input directly but take the offset off the packet first if it's an IPV4 packet encapsulated.
|
82891 |
03-Sep-2001 |
julian |
Call ip_input() instead of ipip_input() when decoding encapsulated ipv4 packets. (allows line to compile again)
|
82890 |
03-Sep-2001 |
julian |
One caller of rip_input failed to be converted in the last commit.
|
82884 |
03-Sep-2001 |
julian |
Patches from Keiichi SHIMA <keiichi@iij.ad.jp> to make ip use the standard protosw structure again.
Obtained from: Well, KAME I guess.
|
82529 |
29-Aug-2001 |
jayanth |
when newreno is turned on, if dupacks = 1 or dupacks = 2 and new data is acknowledged, reset the dupacks to 0. The problem was spotted when a connection had its send buffer full because the congestion window was only 1 MSS and was not being incremented because dupacks was not reset to 0.
Obtained from: Yahoo!
|
82445 |
27-Aug-2001 |
jesper |
When net.inet.tcp.icmp_may_rst is enabled, report ECONNREFUSED not ENETRESET to the application as a RST would, this way we're compatible with most applications.
MFC candidate.
Submitted by: Scott Renfro <scott@renfro.org> Reviewed by: Mike Silbersack <silby@silby.com>
|
82345 |
26-Aug-2001 |
billf |
the IP_FW_GET code in ip_fw_ctl() sizes a buffer to hold information about rules and dynamic rules. it later fills this buffer with these rules.
it also takes the opportunity to compare the expiration of the dynamic rules with the current time and either marks them for deletion or simply charges the countdown.
unfortunately it does this all (the sizing, the buffer copying, and the expiration GC) with no spl protection whatsoever. it was possible for the dynamic rule(s) to be ripped out from under the request before it had completed, resulting in corrupt memory dereferencing.
Reviewed by: ps MFC before: 4.4-RELEASE, hopefully.
|
82238 |
23-Aug-2001 |
dd |
Correct a typo in a comment: FIN_WAIT2 -> FIN_WAIT_2
PR: 29970 Submitted by: Joseph Mallett <jmallett@xMach.org>
|
82122 |
22-Aug-2001 |
silby |
Much delayed but now present: RFC 1948 style sequence numbers
In order to ensure security and functionality, RFC 1948 style initial sequence number generation has been implemented. Barring any major cryptographic breakthroughs, this algorithm should be unbreakable. In addition, the problems with TIME_WAIT recycling which affect our currently used algorithm are not present.
Reviewed by: jesper
|
82069 |
21-Aug-2001 |
ru |
Added TFTP support.
Submitted by: Joe Clarke <marcus@marcuscom.com> MFC after: 2 weeks
|
82050 |
21-Aug-2001 |
ru |
Close the "IRC DCC" security breach reported recently on Bugtraq.
Submitted by: Makoto MATSUSHITA <matusita@jp.FreeBSD.org>
|
82001 |
20-Aug-2001 |
brian |
Make the copyright consistent.
Previously approved by: Charles Mott <cmott@scientech.com>
|
81962 |
20-Aug-2001 |
brian |
Handle snprintf() returning -1
MFC after: 2 weeks
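A minimal sketch of the kind of defensive handling this entry refers to, assuming a hypothetical helper named safe_format(); the commit itself does not show how the affected code was changed.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical helper: format into buf and return the number of
 * characters actually stored (excluding the NUL), never a negative value.
 */
static size_t
safe_format(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    if (size == 0)
        return (0);
    va_start(ap, fmt);
    n = vsnprintf(buf, size, fmt, ap);
    va_end(ap);
    if (n < 0) {                /* output or encoding error */
        buf[0] = '\0';
        return (0);
    }
    if ((size_t)n >= size)      /* output was truncated */
        return (size - 1);
    return ((size_t)n);
}
```

Callers that advance a write pointer by the return value can then do so without ever moving it backwards or past the end of the buffer.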
|
81501 |
10-Aug-2001 |
julian |
Make the protoswitch definitions checkable in the same way that cdevsw entries have been for a long time. Discover that we now have two versions of the same structure. I will shoot one of them shortly when I figure out why someone thinks they need it. (And I can prove they don't) (netinet/ipprotosw.h should GO AWAY)
|
81251 |
07-Aug-2001 |
ru |
mdoc(7) police:
Avoid using parenthesis enclosure macros (.Pq and .Po/.Pc) with plain text. Not only this slows down the mdoc(7) processing significantly, but it also has an undesired (in this case) effect of disabling hyphenation within the entire enclosed block.
|
81127 |
04-Aug-2001 |
ume |
When an application has joined a multicast address, the network card is then removed, and the application is killed, imo_membership[].inm_ifp still refers to the interface pointer after the interface has been removed. Killing the application releases the socket and imo_membership, and imo_membership uses the interface pointer that no longer exists. The kernel then panics.
PR: 29345 Submitted by: Inoue Yuichi <inoue@nd.net.fujitsu.co.jp> Obtained from: KAME MFC after: 3 days
|
81111 |
03-Aug-2001 |
dcs |
MFS: Avoid dropping fragments in the absence of an interface address.
Noticed by: fenner Submitted by: iedowse Not committed to current by: iedowse ;-)
|
80429 |
27-Jul-2001 |
peter |
Fix a warning.
|
80428 |
27-Jul-2001 |
peter |
Patch up some style(9) stuff in tcp_new_isn()
|
80427 |
27-Jul-2001 |
peter |
s/OpemBSD/OpenBSD/
|
80406 |
26-Jul-2001 |
ume |
move ipsec security policy allocation into in_pcballoc, before making pcbs available to the outside world. otherwise, we will see inpcb without ipsec security policy attached (-> panic() in ipsec.c).
Obtained from: KAME MFC after: 3 days
|
80354 |
25-Jul-2001 |
fenner |
Somewhat modernize ip_mroute.c:
- Use sysctl to export stats
- Use ip_encap.c's encapsulation support
- Update lkm to kld (is 6 years a record for a broken module?)
- Remove some unused cruft
|
80211 |
23-Jul-2001 |
ru |
Avoid a NULL pointer dereference introduced in rev. 1.129.
Problem noticed by: bde, gcc(1) Panic caught by: mjacob Patch tested by: mjacob
|
79934 |
19-Jul-2001 |
ru |
Backout non-functional changes from revision 1.128.
Not objected to by: dcs
|
79830 |
17-Jul-2001 |
dcs |
Skip the route checking in the case of multicast packets with known interfaces.
Reviewed by: people at that channel Approved by: silence on -net
|
79821 |
17-Jul-2001 |
ru |
Backout damage to the INADDR_TO_IFP() macro in revision 1.7.
This macro was supposed to only match local IP addresses of interfaces, and all consumers of this macro assume this as well. (See IP_MULTICAST_IF and IP_ADD_MEMBERSHIP socket options in the ip(4) manpage.)
This fixes a major security breach in IPFW-based firewalls where the `me' keyword would match the other end of a P2P link.
PR: kern/28567
|
79685 |
13-Jul-2001 |
obrien |
Bump net.inet.tcp.sendspace to 32k and net.inet.tcp.recvspace to 65k. This should help us in naive benchmark "tests".
It seems a wide number of people think 32k buffers would not cause major issues, and is in fact in use by many other OS's at this time. The receive buffers can be bumped higher as buffers are hardly used and several research papers indicate that receive buffers rarely use much space at all.
Submitted by: Leo Bicknell <bicknell@ufp.org> <20010713101107.B9559@ussenterprise.ufp.org> Agreed to in principle by: dillon (at the 32k level)
|
79531 |
10-Jul-2001 |
ru |
mdoc(7) police: removed HISTORY info from the .Os call.
|
79413 |
08-Jul-2001 |
silby |
Temporary feature: Runtime tuneable tcp initial sequence number generation scheme. Users may now select between the currently used OpenBSD algorithm and the older random positive increment method.
While the OpenBSD algorithm is more secure, it also breaks TIME_WAIT handling; this is causing trouble for an increasing number of folks.
To switch between generation schemes, one sets the sysctl net.inet.tcp.tcp_seq_genscheme. 0 = random positive increments, 1 = the OpenBSD algorithm. 1 is still the default.
Once a secure _and_ compatible algorithm is implemented, this sysctl will be removed.
Reviewed by: jlemon Tested by: numerous subscribers of -net
|
79106 |
02-Jul-2001 |
brooks |
gif(4) and stf(4) modernization:
- Remove gif dependencies from stf.
- Make gif and stf into modules
- Make gif cloneable.
PR: kern/27983 Reviewed by: ru, ume Obtained from: NetBSD MFC after: 1 week
|
79092 |
02-Jul-2001 |
cjc |
While in there fixing a fragment logging bug, fix it so we log fragments "right." Log fragment information tcpdump(8)-style,
Jul 1 19:38:45 bubbles /boot/kernel/kernel: ipfw: 1000 Accept ICMP:8.0 192.168.64.60 192.168.64.20 in via ep0 (frag 53113:1480@0+)
That is, instead of the old,
... Fragment = <offset/8>
Do,
... (frag <IP ID>:<data len>@<offset>[+])
PR: kern/23446 Approved by: ru MFC after: 1 week
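The tcpdump(8)-style fragment suffix described in this entry can be sketched as follows. The function name format_frag() and the FRAG_* constants are hypothetical; the sketch assumes all fields are already in host byte order and that the stored fragment offset counts 8-byte units, as in the IPv4 header.

```c
#include <stdint.h>
#include <stdio.h>

#define FRAG_MF      0x2000     /* "more fragments" flag bit */
#define FRAG_OFFMASK 0x1fff     /* fragment offset, in 8-byte units */

/*
 * Hypothetical formatter. ip_len is the total datagram length, hlen the
 * IP header length in bytes, ip_off the flags/offset word.
 */
static int
format_frag(char *buf, size_t size, uint16_t ip_id, uint16_t ip_len,
    unsigned hlen, uint16_t ip_off)
{
    return (snprintf(buf, size, " (frag %u:%u@%u%s)",
        (unsigned)ip_id,
        (unsigned)ip_len - hlen,                    /* payload bytes */
        (unsigned)(ip_off & FRAG_OFFMASK) << 3,     /* offset in bytes */
        (ip_off & FRAG_MF) ? "+" : ""));
}
```

For the logged example, a 1500-byte datagram with a 20-byte header, ID 53113, offset 0 and the more-fragments bit set yields "(frag 53113:1480@0+)".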
|
78964 |
29-Jun-2001 |
ru |
Backout CSRG revision 7.22 to this file (if in_losing notices an RTF_DYNAMIC route, it got freed twice). I am not sure what was the actual problem in 1992, but the current behavior is memory leak if PCB holds a reference to a dynamically created/modified routing table entry. (rt_refcnt>0 and we don't call rtfree().)
My test bed was:
1. Set net.inet.tcp.msl to a low value (for test purposes), e.g., 5 seconds, to speed up the transition of TCP connection to a "closed" state.
2. Add a network route which causes ICMP redirect from the gateway.
3. ping(8) host H that matches this route; this creates RTF_DYNAMIC RTF_HOST route to H. (I was forced to use ICMP to cause gateway to generate ICMP host redirect, because gateway in question is a 4.2-STABLE system vulnerable to a problem that was fixed later in ip_icmp.c,v 1.39.2.6, and TCP packets with DF bit set were triggering this bug.)
4. telnet(1) to H
5. Block access to H with ipfw(8)
6. Send something in telnet(1) session; this causes EPERM, followed by an in_losing() call in a few seconds.
7. Delete ipfw(8) rule blocking access to H, and wait for TCP connection moving to a CLOSED state; PCB is freed.
8. Delete host route to H.
9. Watch with netstat(1) that `rttrash' increased.
10. Repeat steps 3-9, and watch `rttrash' increases.
PR: kern/25421 MFC after: 2 weeks
|
78886 |
27-Jun-2001 |
ru |
Fixed the brain-o in rev. 1.10: the logic check was reversed.
Reported by: Bernd Fuerwitt <bf@fuerwitt.de>
|
78805 |
26-Jun-2001 |
ru |
Bring in fix from NetBSD's revision 1.16:
Pass the correct destination address for the route-to-gateway case.
PR: kern/10607 MFC after: 2 weeks
|
78697 |
24-Jun-2001 |
dwmalone |
Allow getcred sysctl to work in jailed root processes. Processes can only do getcred calls for sockets which were created in the same jail. This should allow the ident to work in a reasonable way within jails.
PR: 28107 Approved by: des, rwatson
|
78671 |
23-Jun-2001 |
jlemon |
Replace bzero() of struct ip with explicit zeroing of structure members, which is faster.
|
78667 |
23-Jun-2001 |
ru |
Add netstat(1) knob to reset net.inet.{ip|icmp|tcp|udp|igmp}.stats. For example, ``netstat -s -p ip -z'' will show and reset IP stats.
PR: bin/17338
|
78642 |
23-Jun-2001 |
silby |
Eliminate the allocation of a tcp template structure for each connection. The information contained in a tcptemp can be reconstructed from a tcpcb when needed.
Previously, tcp templates required the allocation of one mbuf per connection. On large systems, this change should free up a large number of mbufs.
Reviewed by: bmilekic, jlemon, ru MFC after: 2 weeks
|
78539 |
21-Jun-2001 |
sumikawa |
- Renumber KAME local ICMP types and NDP options numbers because they are duplicated by newly defined types/options in RFC3121
- We have no backward compatibility issue. There are no apps in our distribution which use the above types/options.
Obtained from: KAME MFC after: 2 weeks
|
78492 |
20-Jun-2001 |
ume |
made sure to use the correct sa_len for rtalloc(). sizeof(ro_dst) is not necessarily the correct one. this change would also fix the recent path MTU discovery problem for the destination of an incoming TCP connection.
Submitted by: JINMEI Tatuya <jinmei@kame.net> Obtained from: KAME MFC after: 2 weeks
|
78295 |
15-Jun-2001 |
jlemon |
Do not perform arp send/resolve on an interface marked NOARP.
PR: 25006 MFC after: 2 weeks
|
78243 |
15-Jun-2001 |
peter |
Fix a stack of KAME netinet6/in6.h warnings:
592: warning: `struct mbuf' declared inside parameter list
595: warning: `struct ifnet' declared inside parameter list
|
78064 |
11-Jun-2001 |
ume |
Sync with recent KAME. This work was based on kame-20010528-freebsd43-snap.tgz, and some critical problems found after the snap was out were fixed. There are many, many changes since the last KAME merge.
TODO:
- The definitions of SADB_* in sys/net/pfkeyv2.h are still different from RFC2407/IANA assignment because of binary compatibility issue. It should be fixed under 5-CURRENT.
- ip6po_m member of struct ip6_pktopts is no longer used. But, it is still there because of binary compatibility issue. It should be removed under 5-CURRENT.
Reviewed by: itojun Obtained from: KAME MFC after: 3 weeks
|
77969 |
10-Jun-2001 |
jesper |
Make the default value of net.inet.ip.maxfragpackets and net.inet6.ip6.maxfragpackets dependent on nmbclusters, defaulting to nmbclusters / 4
Reviewed by: bde MFC after: 1 week
|
77900 |
08-Jun-2001 |
peter |
"Fix" the previous initial attempt at fixing TUNABLE_INT(). This time around, use a common function for looking up and extracting the tunables from the kernel environment. This saves duplicating the same function over and over again. This way typically has an overhead of 8 bytes + the path string, versus about 26 bytes + the path string.
|
77859 |
07-Jun-2001 |
jlemon |
Move IPFilter into contrib.
|
77853 |
07-Jun-2001 |
peter |
Back out part of my previous commit. This was a last minute change and I botched testing. This is a perfect example of how NOT to do this sort of thing. :-(
|
77843 |
06-Jun-2001 |
peter |
Make the TUNABLE_*() macros look and behave more consistently like the SYSCTL_*() macros. TUNABLE_INT_DECL() was an odd name because it didn't actually declare the int, which is what the name suggests it would do.
|
77830 |
06-Jun-2001 |
jesper |
Silby's take one on increasing FreeBSD's resistance to SYN floods:
One way we can reduce the amount of traffic we send in response to a SYN flood is to eliminate the RST we send when removing a connection from the listen queue. Since we are being flooded, we can assume that the majority of connections in the queue are bogus. Our RST is unwanted by these hosts, just as our SYN-ACK was. Genuine connection attempts will result in hosts responding to our SYN-ACK with an ACK packet. We will automatically return a RST response to their ACK when it gets to us if the connection has been dropped, so the early RST doesn't serve the genuine class of connections much. In summary, we can reduce the number of packets we send by a factor of two without any loss in functionality by ensuring that RST packets are not sent when dropping a connection from the listen queue.
Submitted by: Mike Silbersack <silby@silby.com> Reviewed by: jesper MFC after: 2 weeks
|
77701 |
04-Jun-2001 |
brian |
Add BSD-style copyright headers
Approved by: Charles Mott <cmott@scientech.com>
|
77696 |
04-Jun-2001 |
brian |
Change to a standard BSD-style copyright
Approved by: Atsushi Murai <amurai@spec.co.jp>
|
77665 |
03-Jun-2001 |
jesper |
Prevent denial of service using bogus fragmented IPv4 packets.
An attacker sending a lot of bogus fragmented packets to the target (with different IPv4 identification field - ip_id), may be able to put the target machine into mbuf starvation state.
By setting an upper limit on the number of reassembly queues we prevent this situation.
This upper limit is controlled by the new sysctl net.inet.ip.maxfragpackets which defaults to 200, as in the IPv6 case. This should be sufficient for most systems, but you might want to increase it if you have lots of TCP sessions. I'm working on making the default value dependent on nmbclusters.
If you want old behaviour (no upper limit) set this sysctl to a negative value.
If you don't want to accept any fragments (not recommended) set the sysctl to 0 (zero).
Obtained from: NetBSD MFC after: 1 week
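The three settings of net.inet.ip.maxfragpackets described in this entry can be summarised in a small sketch. The function name frag_queue_allowed() and its parameters are hypothetical and only model the documented semantics, not the actual reassembly code.

```c
#include <stdbool.h>

/*
 * Hypothetical admission check for a new reassembly queue.
 *   limit < 0  : no upper limit (old behaviour)
 *   limit == 0 : accept no fragments at all
 *   limit > 0  : at most `limit` queues at any one time
 */
static bool
frag_queue_allowed(int limit, int nqueues)
{
    if (limit < 0)
        return (true);
    return (nqueues < limit);
}
```

With the default of 200 stated in this entry, the 201st concurrent reassembly queue would be refused in this model.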
|
77574 |
01-Jun-2001 |
kris |
Add ``options RANDOM_IP_ID'' which randomizes the ID field of IP packets. This closes a minor information leak which allows a remote observer to determine the rate at which the machine is generating packets, since the default behaviour is to increment a counter for each packet sent.
Reviewed by: -net Obtained from: OpenBSD
|
77572 |
01-Jun-2001 |
obrien |
Back out jesper's 2001/05/31 14:58:11 PDT commit. It does not compile.
|
77545 |
31-May-2001 |
jesper |
Prevent denial of service using bogus fragmented IPv4 packets.
An attacker sending a lot of bogus fragmented packets to the target (with different IPv4 identification field - ip_id), may be able to put the target machine into mbuf starvation state.
By setting an upper limit on the number of reassembly queues we prevent this situation.
This upper limit is controlled by the new sysctl net.inet.ip.maxfragpackets which defaults to NMBCLUSTERS/4
If you want old behaviour (no upper limit) set this sysctl to a negative value.
If you don't want to accept any fragments (not recommended) set the sysctl to 0 (zero)
Obtained from: NetBSD (partially) MFC after: 1 week
|
77539 |
31-May-2001 |
jesper |
Disable rfc1323 and rfc1644 TCP extensions if we haven't got any response to our third SYN to work-around some broken terminal servers (most of which have hopefully been retired) that have bad VJ header compression code which trashes TCP segments containing unknown-to-them TCP options.
PR: kern/1689 Submitted by: jesper Reviewed by: wollman MFC after: 2 weeks
|
77485 |
30-May-2001 |
ru |
Add an integer field to keep protocol-specific flags with links.
For FTP control connection, keep the CRLF end-of-line termination status in there.
Fixed the bug when the first FTP command in a session was ignored.
PR: 24048 MFC after: 1 week
|
77427 |
29-May-2001 |
jesper |
Inline TCP_REASS() in the single location where it's used, just as OpenBSD and NetBSD have done.
No functional difference.
MFC after: 2 weeks
|
77421 |
29-May-2001 |
jesper |
properly delay acks in half-closed TCP connections
PR: 24962 Submitted by: Tony Finch <dot@dotat.at> MFC after: 2 weeks
|
76469 |
11-May-2001 |
ru |
In in_ifadown(), differentiate between whether the interface goes down or interface address is deleted. Only delete static routes in the latter case.
Reported by: Alexander Leidinger <Alexander@leidinger.net>
|
76166 |
01-May-2001 |
markm |
Undo part of the tangle of having sys/lock.h and sys/mutex.h included in other "system" header files.
Also help the deprecation of lockmgr.h by making it a sub-include of sys/lock.h and removing sys/lockmgr.h from kernel .c files.
Sort sys/*.h includes where possible in affected files.
OK'ed by: bde (with reservations)
|
75733 |
20-Apr-2001 |
jesper |
Say goodbye to TCP_COMPAT_42
Reviewed by: wollman Requested by: wollman
|
75619 |
17-Apr-2001 |
kris |
Randomize the TCP initial sequence numbers more thoroughly.
Obtained from: OpenBSD Reviewed by: jesper, peter, -developers
|
75262 |
06-Apr-2001 |
darrenr |
fix security hole created by fragment cache
|
75255 |
06-Apr-2001 |
billf |
pipe/queue are the only consumers of flow_id, so only set it in those cases
|
74937 |
28-Mar-2001 |
jesper |
MFC candidate.
Change code from PRC_UNREACH_ADMIN_PROHIB to PRC_UNREACH_PORT for ICMP_UNREACH_PROTOCOL and ICMP_UNREACH_PORT
And let TCP treat PRC_UNREACH_PORT like PRC_UNREACH_ADMIN_PROHIB
This should fix the case where port unreachables for udp returned ENETRESET instead of ECONNREFUSED
Problem found by: Bill Fenner <fenner@research.att.com> Reviewed by: jlemon
|
74870 |
27-Mar-2001 |
ru |
MAN[1-9] -> MAN.
|
74851 |
27-Mar-2001 |
yar |
Add a missing m_pullup() before a mtod() in in_arpinput().
PR: kern/22177 Reviewed by: wollman
|
74839 |
27-Mar-2001 |
simokawa |
Replace dyn_fin_lifetime with dyn_ack_lifetime for half-closed state. Half-closed state could last long for some connections and fin_lifetime (default 20sec) is too short for that.
OK'ed by: luigi
|
74810 |
26-Mar-2001 |
phk |
Send the remains (such as I have located) of "block major numbers" to the bit-bucket.
|
74778 |
25-Mar-2001 |
brian |
Make header files conform to style(9).
Reviewed by (*): bde
(*) alias_local.h only got a cursory glance.
|
74768 |
25-Mar-2001 |
brian |
Remove an extraneous declaration.
|
74700 |
23-Mar-2001 |
ume |
IPv4 address is not unsigned int. This change introduces in_addr_t.
PR: 9982 Adviced by: des Reviewed by: -alpha and -net (no objection) Obtained from: OpenBSD
|
74651 |
22-Mar-2001 |
brian |
Remove (non-protected) variable names from function prototypes.
|
74551 |
21-Mar-2001 |
paul |
Only flush rules that have a rule number above that set by a new sysctl, net.inet.ip.fw.permanent_rules.
This allows you to install rules that are persistent across flushes, which is very useful if you want a default set of rules that maintains your access to remote machines while you're reconfiguring the other rules.
Reviewed by: Mark Murray <markm@FreeBSD.org>
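The flush behaviour described in this entry can be modelled with the sketch below. The struct fw_rule layout and the flush_rules() helper are hypothetical stand-ins and do not reflect ipfw's real data structures.

```c
#include <stddef.h>

struct fw_rule {
    unsigned number;    /* rule number */
    int      active;    /* nonzero while the rule is installed */
};

/*
 * Hypothetical flush: remove only rules numbered above permanent_rules
 * and return how many were removed.
 */
static size_t
flush_rules(struct fw_rule *rules, size_t n, unsigned permanent_rules)
{
    size_t flushed = 0;

    for (size_t i = 0; i < n; i++) {
        if (rules[i].active && rules[i].number > permanent_rules) {
            rules[i].active = 0;
            flushed++;
        }
    }
    return (flushed);
}
```

Setting the threshold to, say, 150 would keep a rule numbered 100 in place across a flush while removing rules 200 and above.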
|
74494 |
19-Mar-2001 |
des |
Axe TCP_RESTRICT_RST. It was never a particularly good idea except for a few very specific scenarios, and now that we have had net.inet.tcp.blackhole for quite some time there is really no reason to use it any more.
(last of three commits)
|
74454 |
19-Mar-2001 |
ru |
Invalidate cached forwarding route (ipforward_rt) whenever a new route is added to the routing table, otherwise we may end up using the wrong route when forwarding.
PR: kern/10778 Reviewed by: silence on -net
|
74415 |
18-Mar-2001 |
ru |
Make sure the cached forwarding route (ipforward_rt) is still up before using it. Not checking this may have caused the wrong IP address to be used when processing certain IP options (see example below). This also caused the wrong route to be passed to ip_output() when forwarding, but fortunately ip_output() is smart enough to detect this.
This example demonstrates the wrong behavior of the Record Route option observed with this bug. Host ``freebsd'' is acting as the gateway for the ``sysv''.
1. On the gateway, we add the route to the destination. The new route will use the primary address of the loopback interface, 127.0.0.1:
: freebsd# route add 10.0.0.66 -iface lo0 -reject
: add host 10.0.0.66: gateway lo0
2. From the client, we ping the destination. We see the correct replies. Please note that this also causes the relevant route on the ``freebsd'' gateway to be cached in ipforward_rt variable:
: sysv# ping -snv 10.0.0.66
: PING 10.0.0.66: 56 data bytes
: ICMP Host Unreachable from gateway 192.168.0.115
: ICMP Host Unreachable from gateway 192.168.0.115
: ICMP Host Unreachable from gateway 192.168.0.115
:
: ----10.0.0.66 PING Statistics----
: 3 packets transmitted, 0 packets received, 100% packet loss
3. On the gateway, we delete the route to the destination, thus making the destination reachable through the `default' route:
: freebsd# route delete 10.0.0.66
: delete host 10.0.0.66
4. From the client, we ping the destination again, now with the RR option turned on. The surprise here is the 127.0.0.1 in the first reply. This is caused by the bug in ip_rtaddr() not checking that the cached route is still up before use. The debug code also shows that the wrong (down) route is further passed to ip_output(). The latter detects that the route is down, and replaces the bogus route with the valid one, so we see the correct replies (192.168.0.115) on further probes:
: sysv# ping -snRv 10.0.0.66
: PING 10.0.0.66: 56 data bytes
: 64 bytes from 10.0.0.66: icmp_seq=0. time=10. ms
: IP options: <record route> 127.0.0.1, 10.0.0.65, 10.0.0.66,
: 192.168.0.65, 192.168.0.115, 192.168.0.120,
: 0.0.0.0(Current), 0.0.0.0, 0.0.0.0
: 64 bytes from 10.0.0.66: icmp_seq=1. time=0. ms
: IP options: <record route> 192.168.0.115, 10.0.0.65, 10.0.0.66,
: 192.168.0.65, 192.168.0.115, 192.168.0.120,
: 0.0.0.0(Current), 0.0.0.0, 0.0.0.0
: 64 bytes from 10.0.0.66: icmp_seq=2. time=0. ms
: IP options: <record route> 192.168.0.115, 10.0.0.65, 10.0.0.66,
: 192.168.0.65, 192.168.0.115, 192.168.0.120,
: 0.0.0.0(Current), 0.0.0.0, 0.0.0.0
:
: ----10.0.0.66 PING Statistics----
: 3 packets transmitted, 3 packets received, 0% packet loss
: round-trip (ms) min/avg/max = 0/3/10
|
74362 |
16-Mar-2001 |
phk |
<sys/queue.h> makeover.
|
74361 |
16-Mar-2001 |
phk |
Fix a style(9) nit.
|
74299 |
15-Mar-2001 |
ru |
net/route.c:
A route generated from an RTF_CLONING route had the RTF_WASCLONED flag set but did not have a reference to the parent route, as documented in the rtentry(9) manpage. This prevented such routes from being deleted when their parent route is deleted.
Now, for example, if you delete an IP address from a network interface, all ARP entries that were cloned from this interface route are flushed.
This also has an impact on netstat(1) output. Previously, dynamically created ARP cache entries (RTF_STATIC flag is unset) were displayed as part of the routing table display (-r). Now, they are only printed if the -a option is given.
netinet/in.c, netinet/in_rmx.c:
When address is removed from an interface, also delete all routes that point to this interface and address. Previously, for example, if you changed the address on an interface, outgoing IP datagrams might still use the old address. The only solution was to delete and re-add some routes. (The problem is easily observed with the route(8) command.)
Note, that if the socket was already bound to the local address before this address is removed, new datagrams generated from this socket will still be sent from the old address.
PR: kern/20785, kern/21914 Reviewed by: wollman (the idea)
|
74213 |
13-Mar-2001 |
ru |
RFC768 (UDP) requires that "if the computed checksum is zero, it is transmitted as all ones". This got broken after introduction of delayed checksums as follows. Some guys (including Jonathan) think that it is allowed to transmit all ones in place of a zero checksum for TCP the same way as for UDP. (The discussion still takes place on -net.) Thus, the 0 -> 0xffff checksum fixup was first moved from udp_output() (see udp_usrreq.c, 1.64 -> 1.65) to in_cksum_skip() (see sys/i386/i386/in_cksum.c, 1.17 -> 1.18, INVERT expression). Besides that I disagree that it is valid for TCP, there was no real problem until in_cksum.c,v 1.20, where the in_cksum() was made just a special version of in_cksum_skip(). The side effect was that now every incoming IP datagram failed to pass the checksum test (in_cksum() returned 0xffff when it should actually return zero). It was fixed next day in revision 1.21, by removing the INVERT expression. The latter also broke the 0 -> 0xffff fixup for UDP checksums.
Before this change:
: tcpdump: listening on lo0
: 127.0.0.1.33005 > 127.0.0.1.33006: udp 0 (ttl 64, id 1)
: 4500 001c 0001 0000 4011 7cce 7f00 0001
: 7f00 0001 80ed 80ee 0008 0000
After this change:
: tcpdump: listening on lo0
: 127.0.0.1.33005 > 127.0.0.1.33006: udp 0 (ttl 64, id 1)
: 4500 001c 0001 0000 4011 7cce 7f00 0001
: 7f00 0001 80ed 80ee 0008 ffff
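The RFC768 rule quoted in this entry can be illustrated with a small user-space sketch. The names cksum16() and udp_cksum_fixup() are hypothetical and are not the kernel's in_cksum() routines; the input is assumed to be the UDP pseudo-header followed by the UDP header and payload, as raw bytes in network order.

```c
#include <stddef.h>
#include <stdint.h>

/* One's-complement sum of 16-bit big-endian words, folded and inverted. */
static uint16_t
cksum16(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;

    while (len > 1) {
        sum += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    if (len == 1)                       /* pad a trailing odd byte */
        sum += (uint32_t)p[0] << 8;
    while (sum >> 16)                   /* fold carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return ((uint16_t)~sum);
}

/* RFC768: a computed checksum of zero is transmitted as all ones. */
static uint16_t
udp_cksum_fixup(uint16_t computed)
{
    return (computed == 0 ? 0xffff : computed);
}
```

For the datagram in the dumps above (127.0.0.1.33005 to 127.0.0.1.33006, empty payload), the sum over the pseudo-header and UDP header comes out as zero, which is why the corrected output ends in ffff rather than 0000.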
|
74209 |
13-Mar-2001 |
ru |
Count and show incoming UDP datagrams with no checksum.
|
74183 |
12-Mar-2001 |
phk |
Correctly cleanup in case of failure to bind a pcb.
PR: 25751 Submitted by: <unicorn@Forest.Od.UA>
|
74134 |
12-Mar-2001 |
jlemon |
Unbreak LINT.
Pointed out by: phk
|
74111 |
11-Mar-2001 |
iedowse |
In ip_output(), initialise `ia' in the case where the packet has come from a dummynet pipe. Without this, the code which increments the per-ifaddr stats can dereference an uninitialised pointer. This should make dummynet usable again.
Reported by: "Dmitry A. Yanko" <fm@astral.ntu-kpi.kiev.ua> Reviewed by: luigi, joe
|
74024 |
09-Mar-2001 |
ru |
Make it possible to use IP_TTL and IP_TOS setsockopt(2) options on certain types of SOCK_RAW sockets. Also, use the ip.ttl MIB variable instead of MAXTTL constant as the default time-to-live value for outgoing IP packets all over the place, as we already do this for TCP and UDP.
Reviewed by: wollman
|
74018 |
09-Mar-2001 |
jlemon |
Push the test for a disconnected socket when accept()ing down to the protocol layer. Not all protocols behave identically. This fixes the brokenness observed with unix-domain sockets (and postfix)
|
74017 |
09-Mar-2001 |
jlemon |
The TCP sequence number used for sending a RST with the ipfw reset rule is already in host byte order, so do not swap it again.
Reviewed by: bfumerola
|
73996 |
08-Mar-2001 |
iedowse |
It was possible for ip_forward() to supply to icmp_error() an IP header with ip_len in network byte order. For certain values of ip_len, this could cause icmp_error() to write beyond the end of an mbuf, causing mbuf free-list corruption. This problem was observed during generation of ICMP redirects.
We now make quite sure that the copy of the IP header kept for icmp_error() is stored in a non-shared mbuf header so that it will not be modified by ip_output().
Also:
- Calculate the correct number of bytes that need to be retained for icmp_error(), instead of assuming that 64 is enough (it's not).
- In icmp_error(), use m_copydata instead of bcopy() to copy from the supplied mbuf chain, in case the first 8 bytes of IP payload are not stored directly after the IP header.
- Sanity-check ip_len in icmp_error(), and panic if it is less than sizeof(struct ip). Incoming packets with bad ip_len values are discarded in ip_input(), so this should only be triggered by bugs in the code, not by bad packets.
This patch results from code and suggestions from Ruslan, Bosko, Jonathan Lemon and Matt Dillon, with important testing by Mike Tancsa, who could reproduce this problem at will.
Reported by: Mike Tancsa <mike@sentex.net> Reviewed by: ru, bmilekic, jlemon, dillon
|
73791 |
05-Mar-2001 |
truckman |
Modify the comments to more closely resemble the English language.
|
73626 |
05-Mar-2001 |
truckman |
Move the loopback net check closer to the beginning of ip_input() so that it doesn't block packets whose destination address has been translated to the loopback net by ipnat.
Add warning comments about the ip_checkinterface feature.
|
73540 |
04-Mar-2001 |
bmilekic |
During a flood, we don't call rtfree(), but we remove the entry ourselves. However, if the RTF_DELCLONE and RTF_WASCLONED condition passes, but the ref count is > 1, we won't decrement the count at all. This could lead to route entries never being deleted.
Here, we call rtfree() not only if the initial two conditions fail, but also if the ref count is > 1 (and we therefore don't immediately delete the route, but let rtfree() handle it).
This is an urgent MFC candidate. Thanks go to Mike Silbersack for the fix, once again. :-)
Submitted by: Mike Silbersack <silby@silby.com>
|
73402 |
04-Mar-2001 |
truckman |
Disable interface checking for packets subject to "ipfw fwd".
Chris Johnson <cjohnson@palomine.net> tested this fix in -stable.
|
73399 |
04-Mar-2001 |
truckman |
Disable interface checking when IP forwarding is engaged so that packets addressed to the interface on the other side of the box follow their historical path.
Explicitly block packets sent to the loopback network sent from the outside, which is consistent with the behavior of the forwarding path between interfaces as implemented in in_canforward().
Always check the arrival interface when matching the packet destination against the interface broadcast addresses. This bug allowed TCP connections to be made to the broadcast address of an interface on the far side of the system because the M_BCAST flag was not set because the packet was unicast to the interface on the near side. This was broken when the directed broadcast code was removed from revision 1.32. If the directed broadcast code was still present, the destination would not have been recognized as local until the packet was forwarded to the output interface and ether_output() looped a copy back to ip_input() with M_BCAST set and the receive interface set to the output interface.
Optimize the order of the tests.
Reviewed by: jlemon
|
73357 |
02-Mar-2001 |
jlemon |
Add a new sysctl net.inet.ip.check_interface, which will verify that an incoming packet arrives on an interface that has an address matching the packet's address. This is turned on by default.
|
73217 |
28-Feb-2001 |
phk |
Fix jails.
|
73172 |
27-Feb-2001 |
jlemon |
When iterating over our list of interface addresses in order to determine if an arriving packet belongs to us, also check that the packet arrived through the correct interface. Skip this check if the packet was locally generated.
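A rough model of the check described in this entry is sketched below. The struct layouts and packet_is_ours() are hypothetical, and representing a locally generated packet by a NULL receive interface is an assumption of this sketch rather than how the kernel marks such packets.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct iface { int index; };            /* stand-in for struct ifnet */
struct iaddr {
    uint32_t            addr;           /* address assigned to ifp */
    const struct iface *ifp;
};

/*
 * Return true if dst is one of our addresses and, unless the packet was
 * generated locally (rcvif == NULL here), it arrived on the interface
 * that owns that address.
 */
static bool
packet_is_ours(const struct iaddr *tab, size_t n, uint32_t dst,
    const struct iface *rcvif)
{
    for (size_t i = 0; i < n; i++) {
        if (tab[i].addr != dst)
            continue;
        if (rcvif == NULL || tab[i].ifp == rcvif)
            return (true);
    }
    return (false);
}
```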
|
73142 |
27-Feb-2001 |
billf |
The TCP header-specific section suffered a little bit of bitrot recently:
When we receive a fragmented TCP packet (other than the first) we can't extract header information (we don't have state to reference). In a rather inelegant fashion we just move on and assume a non-match.
Recent additions to the TCP header-specific section of the code neglected to add the logic to the fragment code so in those cases the match was assumed to be positive and those parts of the rule (which should have resulted in a non-match/continue) were instead skipped (which means the processing of the rule continued even though it had already not matched).
Fault can be spread out over Rich Steenbergen (tcpoptions) and myself (tcp{seq,ack,win}).
rwatson sent me a patch that got me thinking about this whole situation (but what I'm committing / this description is mine so don't blame him).
|
73110 |
26-Feb-2001 |
jlemon |
Use more aggressive retransmit timeouts for the initial SYN packet. As we currently drop the connection after 4 retransmits + 2 ICMP errors, this allows initial connection attempts to be dropped much faster.
|
73109 |
26-Feb-2001 |
jlemon |
Remove in_pcbnotify and use in_pcblookup_hash to find the cb directly.
For TCP, verify that the sequence number in the ICMP packet falls within the tcp receive window before performing any actions indicated by the icmp packet.
Clean up some layering violations (access to tcp internals from in_pcb)
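The sequence-number validation mentioned in this entry needs a comparison that survives 32-bit wraparound. The sketch below shows one common way to write it; the function icmp_seq_plausible() and its generic window parameters are hypothetical, since the entry does not state which connection fields bound the window.

```c
#include <stdbool.h>
#include <stdint.h>

/* Modular comparisons on 32-bit sequence numbers. */
#define SEQ_LT(a, b)   ((int32_t)((a) - (b)) < 0)
#define SEQ_GEQ(a, b)  ((int32_t)((a) - (b)) >= 0)

/*
 * Accept the sequence number echoed in an ICMP error only if it lies in
 * [win_start, win_start + win_len), taking wraparound into account.
 */
static bool
icmp_seq_plausible(uint32_t icmp_seq, uint32_t win_start, uint32_t win_len)
{
    return (SEQ_GEQ(icmp_seq, win_start) &&
        SEQ_LT(icmp_seq, win_start + win_len));
}
```

Rejecting values outside the window makes it harder for an off-path sender to affect a connection with a forged ICMP message, because it would have to guess a plausible sequence number.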
|
73103 |
26-Feb-2001 |
asmodai |
Remove struct full_tcpiphdr{}.
This piece of code has not been referenced since it was put there in 1995. Also done a codebased search on popular networking libraries and third-party applications. This is an orphan.
Reviewed by: jesper
|
73102 |
26-Feb-2001 |
asmodai |
Remove conditionals for vax support. People who care much about this are welcome to try 2.11BSD. :)
Noticed by: luigi Reviewed by: jesper
|
73036 |
25-Feb-2001 |
jesper |
Remove tcp_drop_all_states, which is unneeded after jlemon removed it from tcp_subr.c in rev 1.92
|
73031 |
25-Feb-2001 |
jlemon |
Do not delay a new ack if there already is a delayed ack pending on the connection, but send it immediately. Prior to this change, it was possible to delay a delayed-ack multiple times, resulting in degraded TCP behavior in certain corner cases.
|
72960 |
23-Feb-2001 |
jlemon |
When converting a soft error into a hard error, drop the connection. The error will be passed up to the user, who will close the connection, so it does not appear to make sense to leave the connection open.
This also fixes a bug with kqueue, where the filter does not set EOF on the connection, because the connection is still open.
Also remove calls to so{rw}wakeup, as we aren't doing anything with them at the moment anyway.
Reviewed by: alfred, jesper
|
72959 |
23-Feb-2001 |
jlemon |
Allow ICMP unreachables which map into PRC_UNREACH_ADMIN_PROHIB to reset TCP connections which are in the SYN_SENT state, if the sequence number in the echoed ICMP reply is correct. This behavior can be controlled by the sysctl net.inet.tcp.icmp_may_rst.
Currently, only subtypes 2,3,10,11,12 are treated as such (port, protocol and administrative unreachables).
Associate an error code with these resets which is reported to the user application: ENETRESET.
Disallow resetting TCP sessions which are not in a SYN_SENT state.
Reviewed by: jesper, -net
|
72922 |
22-Feb-2001 |
jesper |
Redo the security update done in rev 1.54 of src/sys/netinet/tcp_subr.c and 1.84 of src/sys/netinet/udp_usrreq.c
The changes broken down:
- remove 0 as a wildcard for addresses and port numbers in src/sys/netinet/in_pcb.c:in_pcbnotify()
- add src/sys/netinet/in_pcb.c:in_pcbnotifyall() used to notify all sessions with the specific remote address.
- change
  - src/sys/netinet/udp_usrreq.c:udp_ctlinput()
  - src/sys/netinet/tcp_subr.c:tcp_ctlinput()
  to use in_pcbnotifyall() to notify multiple sessions, instead of using in_pcbnotify() with 0 as src address and as port numbers.
- remove check for src port == 0 in
  - src/sys/netinet/tcp_subr.c:tcp_ctlinput()
  - src/sys/netinet/udp_usrreq.c:udp_ctlinput()
  as they are no longer needed.
- move handling of redirects and host dead from in_pcbnotify() to udp_ctlinput() and tcp_ctlinput(), so they will call in_pcbnotifyall() to notify all sessions with the specific remote address.
Approved by: jlemon Inspired by: NetBSD
|
72803 |
21-Feb-2001 |
jesper |
Backout change in 1.153, as it violates rfc1122 section 3.2.1.3.
Requested by: jlemon,ru
|
72786 |
21-Feb-2001 |
rwatson |
o Move per-process jail pointer (p->pr_prison) to inside of the subject credential structure, ucred (cr->cr_prison).
o Allow jail inheritance to be a function of credential inheritance.
o Abstract prison structure reference counting behind pr_hold() and pr_free(), invoked by the similarly named credential reference management functions, removing this code from per-ABI fork/exit code.
o Modify various jail() functions to use struct ucred arguments instead of struct proc arguments.
o Introduce jailed() function to determine if a credential is jailed, rather than directly checking pointers all over the place.
o Convert PRISON_CHECK() macro to prison_check() function.
o Move jail() function prototypes to jail.h.
o Emulate the P_JAILED flag in fill_kinfo_proc() and no longer set the flag in the process flags field itself.
o Eliminate that "const" qualifier from suser/p_can/etc to reflect mutex use.
Notes:
o Some further cleanup of the linux/jail code is still required.
o It's now possible to consider resolving some of the process vs credential based permission checking confusion in the socket code.
o Mutex protection of struct prison is still not present, and is required to protect the reference count plus some fields in the structure.
Reviewed by: freebsd-arch Obtained from: TrustedBSD Project
|
72778 |
20-Feb-2001 |
jesper |
Only call in_pcbnotify if the src port number != 0, as we treat 0 as a wildcard in src/sys/netinet/in_pcb.c:in_pcbnotify()
It's sufficient to check for src|local port, as we'll have no sessions with src|local port == 0
Without this an attacker sending ICMP messages, where the attached IP header (+ 8 bytes) has the address and port numbers == 0, would have the ICMP message applied to all sessions.
PR: kern/25195 Submitted by: originally by jesper, reimplemented on jlemon's advice Reviewed by: jlemon Approved by: jlemon
|
72775 |
20-Feb-2001 |
jesper |
Send an ICMP unreachable instead of silently dropping the packet, if we receive a packet not for us and forwarding is disabled.
PR: kern/24512 Reviewed by: jlemon Approved by: jlemon
|
72774 |
20-Feb-2001 |
jesper |
Remove unneeded loop increment in src/sys/netinet/in_pcb.c:in_pcbnotify
Forgotten by phk, when committing fix in kern/23986
PR: kern/23986 Reviewed by: phk Approved by: phk
|
72650 |
18-Feb-2001 |
green |
Switch to using a struct xucred instead of a struct ucred when not actually in the kernel. This structure is a different size than what is currently in -CURRENT, but should hopefully be the last time any application breakage is caused there. As soon as any major inconveniences are removed, the definition of the in-kernel struct ucred should be conditionalized upon defined(_KERNEL).
This also changes struct export_args to remove dependency on the constantly-changing struct ucred, as well as limiting the bounds of the size fields to the correct size. This means: a) mountd and friends won't break all the time, b) mountd and friends won't crash the kernel all the time if they don't know what they're doing wrt actual struct export_args layout.
Reviewed by: bde
|
72638 |
18-Feb-2001 |
phk |
Remove unneeded loop increment in src/sys/netinet/in_pcb.c:in_pcbnotify
Add new PRC_UNREACH_ADMIN_PROHIB in sys/sys/protosw.h
Remove condition on TCP in src/sys/netinet/ip_icmp.c:icmp_input
In src/sys/netinet/ip_icmp.c:icmp_input set code = PRC_UNREACH_ADMIN_PROHIB or PRC_UNREACH_HOST for all unreachables except ICMP_UNREACH_NEEDFRAG
Rename sysctl icmp_admin_prohib_like_rst to icmp_unreach_like_rst to reflect the fact that we also react on ICMP unreachables that are not administrative prohibited. Also update the comments to reflect this.
In sys/netinet/tcp_subr.c:tcp_ctlinput add code to treat PRC_UNREACH_ADMIN_PROHIB and PRC_UNREACH_HOST different.
PR: 23986 Submitted by: Jesper Skriver <jesper@skriver.dk>
|
72631 |
18-Feb-2001 |
luigi |
remove unused data structure definition, and corresponding macro into*()
|
72526 |
15-Feb-2001 |
jlemon |
Clean up warning.
|
72486 |
14-Feb-2001 |
asmodai |
Add definitions for IPPROTO numbers 55-57.
|
72440 |
13-Feb-2001 |
phk |
Introduce a new feature in IPFW: Check if the source or destination address is configured on an interface. This is useful for routers with dynamic interfaces. It is now possible to say:
0100 allow tcp from any to any established
0200 skipto 1000 tcp from any to any
0300 allow ip from any to any
1000 allow tcp from 1.2.3.4 to me 22
1010 deny tcp from any to me 22
1020 allow tcp from any to any
and not have to worry about the behaviour if dynamic interfaces configure new IP numbers later on.
The check is semi expensive (traverses the interface address list) so it should be protected as in the above example if high performance is a requirement.
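As a rough illustration of why the "me" check costs more than a plain address comparison, the sketch below models it as a linear scan over the addresses currently configured on the interfaces. It is a simplified Python model, not the ipfw implementation; the function name `is_me` and the list-of-strings representation are assumptions.

```python
from ipaddress import ip_address


def is_me(addr, local_addrs):
    """Hypothetical model of the 'me' match.

    Walks every configured address on each lookup, standing in for the
    traversal of the interface address list described above.
    """
    target = ip_address(addr)
    return any(target == ip_address(a) for a in local_addrs)
```

Because the list is consulted at match time, an address that appears later on a dynamic interface is picked up without changing the rules: `is_me("10.0.0.7", addrs)` starts returning True as soon as "10.0.0.7" is appended to `addrs`.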
|
72357 |
11-Feb-2001 |
bmilekic |
Clean up RST ratelimiting. Previously, ratelimiting occurred before tests were performed to determine if the received packet should be reset. This created erroneous ratelimiting and false alarms in some cases. The code has now been reorganized so that the checks for validity come before the call to badport_bandlim. Additionally, a few changes in the symbolic names of the bandlim types have been made, as well as a clarification of exactly which type each RST case falls under.
Submitted by: Mike Silbersack <silby@silby.com>
|
72270 |
10-Feb-2001 |
luigi |
Sync with the bridge/dummynet/ipfw code already tested in stable.
In ip_fw.[ch] change a couple of variable and field names to avoid having types, variables and fields with the same name.
|
72091 |
06-Feb-2001 |
asmodai |
Fix typo: seperate -> separate.
Seperate does not exist in the english language.
|
72084 |
06-Feb-2001 |
phk |
Convert if_multiaddrs from LIST to TAILQ so that it can be traversed backwards in the three drivers which want to do that.
Reviewed by: mikeh
|
72056 |
05-Feb-2001 |
julian |
Fix bad patch from a few days ago. It broke some bridging.
|
72012 |
04-Feb-2001 |
phk |
Another round of the <sys/queue.h> FOREACH transmogriffer.
Created with: sed(1) Reviewed by: md5(1)
|
72010 |
04-Feb-2001 |
darrenr |
fix duplicate rcsid
|
72006 |
04-Feb-2001 |
darrenr |
fix conflicts
|
71999 |
04-Feb-2001 |
phk |
Mechanical change to use <sys/queue.h> macro API instead of fondling implementation details.
Created with: sed(1) Reviewed by: md5(1)
|
71998 |
04-Feb-2001 |
phk |
Use <sys/queue.h> macro API.
|
71963 |
03-Feb-2001 |
julian |
Make the code act the same in the case of BRIDGE being defined, but not turned on, and the case of it not being defined at all. i.e. Disabling bridging re-enables some of the checks it disables.
Submitted by: "Rogier R. Mulhuijzen" <drwilco@drwilco.net>
|
71937 |
02-Feb-2001 |
jlemon |
When turning off TCP_NOPUSH, call tcp_output to immediately flush out any data pending in the buffer.
Submitted by: Tony Finch <dot@dotat.at>
|
71909 |
02-Feb-2001 |
luigi |
MFS: bridge/ipfw/dummynet fixes (bridge.c will be committed separately)
|
71796 |
29-Jan-2001 |
brian |
Add a few ``const''s to silence some -Wwrite-strings warnings
|
71763 |
29-Jan-2001 |
brian |
Ignore leading whitespace in the string given to PacketAliasProxyRule().
|
71700 |
27-Jan-2001 |
luigi |
Make sure we do not follow an invalid pointer in ipfw_report when we get an incomplete packet or m_pullup fails.
|
71686 |
26-Jan-2001 |
luigi |
Minor cleanups after yesterday's patch. The code (bridging and dummynet) actually worked fine!
|
71667 |
26-Jan-2001 |
luigi |
Bring dummynet in line with the code that now works in -STABLE. It compiles, but I cannot test functionality yet.
|
71613 |
25-Jan-2001 |
luigi |
Pass up errors returned by dummynet. The same should be done with divert.
|
71594 |
24-Jan-2001 |
wollman |
Correct a comment.
|
71415 |
23-Jan-2001 |
wes |
When attempting to bind to an ephemeral port, if no such port is available, the error return should be EADDRNOTAVAIL rather than EAGAIN.
PR: 14181 Submitted by: Dima Dorfman <dima@unixfreak.org> Reviewed by: Garrett Wollman <wollman@khavrinen.lcs.mit.edu>
|
71395 |
22-Jan-2001 |
luigi |
Change critical section protection for dummynet from splnet() to splimp() -- we need it because dummynet can be invoked by the bridging code at splimp().
This should cure the pipe "stalls" that several people have been reporting on -stable while using bridging+dummynet (the problem would not affect routers using dummynet).
|
71350 |
21-Jan-2001 |
des |
First step towards an MP-safe zone allocator:
- have zalloc() and zfree() always lock the vm_zone.
- remove zalloci() and zfreei(), which are now redundant.
Reviewed by: bmilekic, jasone
|
71137 |
17-Jan-2001 |
luigi |
Document data structures and operation on dummynet so next time I or someone else browse through this code I do not have a hard time understanding what is going on.
|
71133 |
16-Jan-2001 |
luigi |
Some dummynet patches that I forgot to commit last summer. One of them fixes a potential panic when bridging is used and you run out of mbufs (though i have no idea if the bug has ever hit anyone).
|
70951 |
12-Jan-2001 |
bmilekic |
Prototype inet_ntoa_r and thereby silence a warning from GCC. The function is prototyped immediately under inet_ntoa, which is also from libkern.
|
70854 |
09-Jan-2001 |
rwatson |
o Minor style(9)ism to make consistent with -STABLE
|
70826 |
09-Jan-2001 |
rwatson |
o IPFW incorrectly handled filtering in the presence of previously reserved and now allocated TCP flags in incoming packets. This patch stops overloading those bits in the IP firewall rules, and moves colliding flags to a separate field, ipflg. The IPFW userland management tool, ipfw(8), is updated to reflect this change. New TCP flags related to ECN are now included in tcp.h for reference, although we don't currently implement TCP+ECN.
o To use this fix without completely rebuilding, it is sufficient to copy ip_fw.h and tcp.h into your appropriate include directory, then rebuild the ipfw kernel module, and ipfw tool, and install both. Note that a mismatch between module and userland tool will result in incorrect installation of firewall rules that may have unexpected effects. This is an MFC candidate, following shakedown. This bug does not appear to affect ipfilter.
Reviewed by: security-officer, billf Reported by: Aragon Gouveia <aragon@phat.za.net>
|
70699 |
06-Jan-2001 |
alfred |
provide a sysctl 'net.link.ether.inet.log_arp_wrong_iface' to allow one to suppress logging when ARP replies arrive on the wrong interface: "/kernel: arp: 1.2.3.4 is on dc0 but got reply from 00:00:c5:79:d0:0c on dc1"
the default is to log just to give notice about possibly incorrectly configured networks.
|
70643 |
03-Jan-2001 |
alfred |
Fix incorrect logic which wouldn't disconnect incoming connections that had been disconnected because they were not full.
Submitted by: David Filo
|
70391 |
27-Dec-2000 |
assar |
include tcp header files to get the prototype for tcp_seq_vs_sess
|
70330 |
24-Dec-2000 |
phk |
Update the "icmp_admin_prohib_like_rst" code to check the tcp-window and to be configurable with respect to acting only in SYN or in all TCP states.
PR: 23665 Submitted by: Jesper Skriver <jesper@skriver.dk>
|
70254 |
21-Dec-2000 |
bmilekic |
* Rename M_WAIT mbuf subsystem flag to M_TRYWAIT. This is because calls with M_WAIT (now M_TRYWAIT) may not wait forever when nothing is available for allocation, and may end up returning NULL. Hopefully we now communicate more of the right thing to developers and make it very clear that it's necessary to check whether calls with M_(TRY)WAIT also resulted in a failed allocation. M_TRYWAIT basically means "try harder, block if necessary, but don't necessarily wait forever." The time spent blocking is tunable with the kern.ipc.mbuf_wait sysctl. M_WAIT is now deprecated but still defined for the next little while.
* Fix a typo in a comment in mbuf.h
* Fix some code that was actually passing the mbuf subsystem's M_WAIT to malloc(). Made it pass M_WAITOK instead. If we were ever to redefine the value of the M_WAIT flag, this could have become a big problem.
|
70105 |
16-Dec-2000 |
billf |
Use getmicrotime() instead of microtime() when timestamping ICMP packets, the former is quicker and accurate enough for use here.
Submitted by: Jason Slagle <raistlin@toledolink.com> (on IRC) Reviewed by: phk
|
70103 |
16-Dec-2000 |
phk |
We currently do not react to ICMP administratively prohibited messages sent by routers when they deny our traffic; this causes a timeout when trying to connect to TCP ports/services on a remote host which is blocked by routers or firewalls.
rfc1122 (Requirements for Internet Hosts) section 3.2.2.1 actually requires that we treat such a message for a TCP session as if we had received a RST.
quote begin.
A Destination Unreachable message that is received MUST be reported to the transport layer. The transport layer SHOULD use the information appropriately; for example, see Sections 4.1.3.3, 4.2.3.9, and 4.2.4 below. A transport protocol that has its own mechanism for notifying the sender that a port is unreachable (e.g., TCP, which sends RST segments) MUST nevertheless accept an ICMP Port Unreachable for the same purpose.
quote end.
I've written a small extension that implements this; it also creates a sysctl "net.inet.tcp.icmp_admin_prohib_like_rst" to control if this new behaviour is activated.
When it's activated (set to 1) we'll treat an ICMP administratively prohibited message (icmp type 3 code 9, 10 and 13) for a TCP session as if we received a TCP RST, but only if the TCP session is in SYN_SENT state.
The reason for only reacting when in SYN_SENT state is that this will solve the problem, and at the same time minimize the risk of this being abused.
I suggest that we enable this new behaviour by default, but it would be a change of current behaviour, so if people prefer to leave it disabled by default, at least for now, this would be ok for me; the attached diff actually has the sysctl set to 0 by default.
PR: 23086 Submitted by: Jesper Skriver <jesper@skriver.dk>
|
70070 |
15-Dec-2000 |
bmilekic |
Change the following:
1. ICMP ECHO and TSTAMP replies are now rate limited.
2. RSTs generated due to packets sent to open and unopen ports are now limited by separate counters.
3. Each rate limiting queue now has its own description, as follows:
Limiting icmp unreach response from 439 to 200 packets per second
Limiting closed port RST response from 283 to 200 packets per second
Limiting open port RST response from 18724 to 200 packets per second
Limiting icmp ping response from 211 to 200 packets per second
Limiting icmp tstamp response from 394 to 200 packets per second
Submitted by: Mike Silbersack <silby@silby.com>
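The per-queue behaviour described above can be modelled roughly as one packet counter per queue that is reset every second. The sketch below is a simplified Python model under that assumption, not the kernel's badport_bandlim code; the class name `RateLimiter`, the explicit `now` argument and the in-memory `log` list are illustrative choices.

```python
class RateLimiter:
    """Hypothetical per-queue packets-per-second limiter."""

    def __init__(self, limit_pps=200):
        self.limit = limit_pps
        self.window_start = {}  # queue name -> start time of current window
        self.count = {}         # queue name -> packets seen in current window
        self.log = []           # messages produced when a window overflowed

    def allow(self, queue, now):
        """Return True if a response on this queue may be sent at time now."""
        start = self.window_start.get(queue)
        if start is None or now - start >= 1.0:
            # A new one-second window begins; report the old one if it
            # exceeded the limit, then reset the counter.
            seen = self.count.get(queue, 0)
            if seen > self.limit:
                self.log.append(
                    f"Limiting {queue} response from {seen} "
                    f"to {self.limit} packets per second"
                )
            self.window_start[queue] = now
            self.count[queue] = 0
        self.count[queue] += 1
        return self.count[queue] <= self.limit
```

Because each queue keeps its own counter, a flood on one queue (say "icmp unreach") does not consume the budget of another (say "closed port RST").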
|
69781 |
08-Dec-2000 |
dwmalone |
Convert more malloc+bzero to malloc+M_ZERO.
Submitted by: josh@zipperup.org Submitted by: Robert Drehmel <robd@gmx.net>
|
69774 |
08-Dec-2000 |
phk |
Staticize some malloc M_ instances.
|
69152 |
25-Nov-2000 |
jlemon |
Lock down the network interface queues. The queue mutex must be obtained before adding/removing packets from the queue. Also, the if_obytes and if_omcasts fields should only be manipulated under protection of the mutex.
IF_ENQUEUE, IF_PREPEND, and IF_DEQUEUE perform all necessary locking on the queue. An IF_LOCK macro is provided, as well as the old (mutex-less) versions of the macros in the form _IF_ENQUEUE, _IF_QFULL, for code which needs them, but their use is discouraged.
Two new macros are introduced: IF_DRAIN() to drain a queue, and IF_HANDOFF, which takes care of locking/enqueue, and also statistics updating/start if necessary.
|
69147 |
25-Nov-2000 |
jlemon |
Revert the last commit to the callout interface, and add a flag to callout_init() indicating whether the callout is safe or not. Update the callers of callout_init() to reflect the new interface.
Okayed by: Jake
|
69099 |
23-Nov-2000 |
bmilekic |
Fixup (hopefully) bridging + ipfw + dummynet together...
* Some dummynet code incorrectly handled a malloc()-allocated pseudo-mbuf header structure, called "pkt," and could consequently pollute the mbuf free list if it was ever passed to m_freem(). The fix involved passing not pkt, but essentially pkt->m_next (which is a real mbuf) to the mbuf utility routines.
* Also, for dummynet, in bdg_forward(), made the code copy the ethernet header back into the mbuf (prepended) because the dummynet code that follows expects it to be there but it is, unfortunately for dummynet, passed to bdg_forward as a separate argument.
PRs: kern/19551 ; misc/21534 ; kern/23010 Submitted by: Thomas Moestl <tmoestl@gmx.net> Reviewed by: bmilekic Approved by: luigi
|
69025 |
22-Nov-2000 |
ru |
mdoc(7) police: use the new feature of the An macro.
|
68619 |
11-Nov-2000 |
bmilekic |
While I'm here, get rid of (now useless) MCLISREFERENCED and use MEXT_IS_REF instead. Also, fix a small set of "avail." If we're setting `avail,' we shouldn't be re-checking whether m_flags is M_EXT, because we know that it is, as if it wasn't, we would have already returned several lines above.
Reviewed by: jlemon
|
68431 |
07-Nov-2000 |
ru |
Fixed the security breach I introduced in rev 1.145. Disallow getsockopt(IP_FW_ADD) if securelevel >= 3.
PR: 22600
|
68318 |
04-Nov-2000 |
jlemon |
tp->snd_recover is part of the New Reno recovery algorithm, and should only be checked if the system is currently performing New Reno style fast recovery. However, this value was being checked regardless of the NR state, with the end result being that the congestion window was never opened.
Change the logic to check t_dupack instead; the only code path that allows it to be nonzero at this point is NewReno, so if it is nonzero, we are in fast recovery mode and should not touch the congestion window.
Tested by: phk
|
68231 |
02-Nov-2000 |
ru |
Fixed the bug I have introduced in icmp_error() in revision 1.44. The amount of data we copy from the original IP datagram into the ICMP message was computed incorrectly for IP packets with payload less than 8 bytes.
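An ICMP error message quotes the original IP header plus at most the first 8 bytes of its payload, so a short payload has to cap the copy. The sketch below shows that arithmetic in Python; it is an illustration rather than the icmp_error() code, and the function name `icmp_quoted_len` and the assumption that the total length includes the IP header are mine.

```python
def icmp_quoted_len(ip_hlen, ip_total_len):
    """Bytes of the original datagram to quote in an ICMP error.

    ip_hlen is the IP header length in bytes; ip_total_len is assumed to
    be the datagram length including that header.
    """
    payload = ip_total_len - ip_hlen
    if payload < 0:
        raise ValueError("total length is shorter than the header")
    # Quote the whole header, then up to 8 bytes of payload.
    return ip_hlen + min(8, payload)
```

For a 20-byte header with only 4 bytes of payload this gives 24 rather than 28; for a full-sized datagram it gives the usual 28.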
|
68179 |
01-Nov-2000 |
ru |
Wrong checksum may have been computed for certain UDP packets.
Reviewed by: jlemon
|
68169 |
01-Nov-2000 |
ru |
Wrong checksum used for certain reassembled IP packets before diverting.
|
68150 |
01-Nov-2000 |
joe |
It's no longer true that "nobody uses ia beyond here"; it's now used to keep address based if_data statistics in.
Submitted by: ru
|
68056 |
31-Oct-2000 |
ru |
Do not waste time saving a copy of IP header if we are certainly not going to send an ICMP error message (net.inet.udp.blackhole=1).
|
67980 |
30-Oct-2000 |
ru |
Added boolean argument to link searching functions, indicating whether they should create a link if lookup has failed or not.
|
67966 |
30-Oct-2000 |
ru |
A significant rewrite of PPTP aliasing code.
PPTP links are no longer dropped by simple (and inappropriate in this case) "inactivity timeout" procedure, only when requested through the control connection.
It is now possible to have multiple PPTP servers running behind NAT. Just redirect the incoming TCP traffic to port 1723, everything else is done transparently.
Problems were reported and the fix was tested by: Michael Adler <Michael.Adler@compaq.com>, David Andersen <dga@lcs.mit.edu>
|
67893 |
29-Oct-2000 |
phk |
Move suser() and suser_xxx() prototypes and a related #define from <sys/proc.h> to <sys/systm.h>.
Correctly document the #includes needed in the manpage.
Add one now needed #include of <sys/systm.h>. Remove the consequent 48 unused #includes of <sys/proc.h>.
|
67882 |
29-Oct-2000 |
phk |
Remove unneeded #include <sys/proc.h> lines.
|
67853 |
29-Oct-2000 |
darrenr |
Fix conflicts created by import.
|
67833 |
29-Oct-2000 |
joe |
Count per-address statistics for IP fragments.
Requested by: ru Obtained from: BSD/OS
|
67711 |
27-Oct-2000 |
obrien |
Include sys/param.h for `__FreeBSD_version' rather than the non-existent osreldate.h.
Submitted by: dougb
|
67708 |
27-Oct-2000 |
phk |
Convert all users of fldoff() to offsetof(). fldoff() is bad because it only takes a struct tag which makes it impossible to use unions, typedefs etc.
Define __offsetof() in <machine/ansi.h>
Define offsetof() in terms of __offsetof() in <stddef.h> and <sys/types.h>
Remove myriad of local offsetof() definitions.
Remove includes of <stddef.h> in kernel code.
NB: Kernelcode should *never* include from /usr/include !
Make <sys/queue.h> include <machine/ansi.h> to avoid polluting the API.
Deprecate <struct.h> with a warning. The warning turns into an error on 01-12-2000 and the file gets removed entirely on 01-01-2001.
Partial reviews by: various. Significant brucifications by: bde
|
67692 |
27-Oct-2000 |
ru |
Fetch the protocol header (TCP, UDP, ICMP) only from the first fragment of IP datagram. This fixes the problem when firewall denied fragmented packets whose last fragment was less than minimum protocol header size.
Found by: Harti Brandt <brandt@fokus.gmd.de> PR: kern/22309
|
67620 |
26-Oct-2000 |
ru |
RFC 791 says that IP_RF bit should always be zero, but nothing in the code enforces this. So, do not check for and attempt a false reassembly if only IP_RF is set.
Also, removed the dead code, since we no longer use dtom() on return from ip_reass().
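The distinction drawn above can be sketched as a mask test on the flags-and-offset word of the IP header: only the more-fragments bit and the fragment offset indicate that reassembly is needed. This is a small Python model rather than the ip_input() code; the constant values are the standard IPv4 header bit layout, and the function name `needs_reassembly` is an assumption.

```python
IP_RF = 0x8000       # reserved fragment flag
IP_DF = 0x4000       # don't fragment flag
IP_MF = 0x2000       # more fragments flag
IP_OFFMASK = 0x1FFF  # fragment offset, in 8-byte units


def needs_reassembly(ip_off):
    """True if the host-order flags/offset word marks a fragment.

    IP_RF and IP_DF are deliberately ignored: neither says anything
    about whether the datagram is part of a fragmented packet.
    """
    return (ip_off & (IP_MF | IP_OFFMASK)) != 0
```

A packet with only IP_RF set therefore passes straight through, while any packet with IP_MF set or a non-zero offset is handed to reassembly.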
|
67614 |
26-Oct-2000 |
darrenr |
fix conflicts from rcsids
|
67609 |
26-Oct-2000 |
ru |
Wrong header length used for certain reassembled IP packets. This was first fixed in rev 1.82 but then broken in rev 1.125.
PR: 6177
|
67596 |
26-Oct-2000 |
luigi |
Close PR22152 and PR19511 -- correct the naming of a variable
|
67564 |
25-Oct-2000 |
ru |
We now keep the ip_id field in network byte order all the time, so there is no need to make the distinction between ip_output() and ip_input() cases.
Reviewed by: silence on freebsd-net
|
67456 |
23-Oct-2000 |
itojun |
be careful on mbuf overrun on ctlinput. short icmp6 packet may be able to panic the kernel. sync with kame.
|
67375 |
20-Oct-2000 |
ru |
Save a few CPU cycles in IP fragmentation code.
|
67334 |
19-Oct-2000 |
joe |
Augment the 'ifaddr' structure with a 'struct if_data' to keep statistics on a per network address basis.
Teach the IPv4 and IPv6 input/output routines to log packets/bytes against the network address connected to the flow.
Teach netstat to display the per-address stats for IP protocols when 'netstat -i' is invoked, instead of displaying the per-interface stats.
|
67316 |
19-Oct-2000 |
ru |
A failure to allocate memory for auxiliary TCP data is now fatal. This fixes a null pointer dereference problem that is unlikely to happen in normal circumstances.
|
67287 |
18-Oct-2000 |
ru |
If we do not byte-swap the ip_id in the first place, don't do it in the second. NetBSD (from where I've taken this originally) needs to fix this too.
|
67026 |
12-Oct-2000 |
ru |
Backout my wrong attempt to fix the compilation warning in ip_input.c and instead reapply the revision 1.49 of mbuf.h, i.e.
Fixed regression of the type of the `header' member of struct pkthdr from `void *' to caddr_t in rev.1.51. This mainly caused an annoying warning for compiling ip_input.c.
Requested by: bde
|
67009 |
12-Oct-2000 |
ru |
Fix the compilation warning.
|
67003 |
12-Oct-2000 |
ru |
Allow for IP_FW_ADD to be used in getsockopt(2) incarnation as well, in which case return the rule number back into userland.
PR: bin/18351 Reviewed by: archie, luigi
|
66798 |
07-Oct-2000 |
alfred |
Remove headers not needed.
Pointed out by: phk
|
66744 |
06-Oct-2000 |
ru |
As we now may check the TCP header window field, make sure we pullup enough into the mbuf data area. Solve this problem once and for all by pulling up the entire (standard) header for TCP and UDP, and four bytes of header for ICMP (enough for type, code and cksum fields).
|
66582 |
03-Oct-2000 |
ru |
Added the missing ntohs() conversion when matching IP packet with the IP_FW_IF_IPID rule. (We have recently decided to keep the ip_id field in network byte order inside the kernel, see revision 1.140 of src/sys/netinet/ip_input.c).
I did not like to have the conversion happen in userland, and I think that the similar conversions for fw_tcp(seq|ack|win) should be moved out of userland (src/sbin/ipfw/ipfw.c) into the kernel.
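The effect of a missing ntohs() can be shown with a two-byte field: the header carries the ID in network (big-endian) byte order, and comparing it with a host-order rule value without conversion gives the wrong answer on little-endian machines. The sketch below is a Python illustration only; the function name `ipid_matches` and the bytes-based representation of the header field are assumptions.

```python
def ipid_matches(header_id_bytes, rule_id):
    """Compare the 2-byte network-order ip_id field with a host-order value.

    int.from_bytes(..., "big") plays the role of ntohs() here.
    """
    return int.from_bytes(header_id_bytes, "big") == rule_id


def ipid_matches_unconverted(header_id_bytes, rule_id):
    """What a little-endian host sees if the conversion is forgotten."""
    return int.from_bytes(header_id_bytes, "little") == rule_id
```

For the wire bytes 0x12 0x34, a rule for ID 0x1234 matches with the conversion and fails without it, where the field is misread as 0x3412.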
|
66552 |
02-Oct-2000 |
jlemon |
If TCPDEBUG is defined, we could dereference a tp which was freed.
|
66545 |
02-Oct-2000 |
ru |
A bit of indentation reformatting.
|
66523 |
02-Oct-2000 |
billf |
Add new fields for more granularity:
IP: version, tos, ttl, len, id
TCP: seq#, ack#, window size
Reviewed by: silence on freebsd-{net,ipfw}
|
66521 |
02-Oct-2000 |
billf |
Add new fields for more granularity:
IP: version, tos, ttl, len, id
TCP: seq#, ack#, window size
Reviewed by: silence on freebsd-{net,ipfw}
|
66445 |
29-Sep-2000 |
ru |
Document that net.inet.ip.fw.one_pass only affects dummynet(4).
Noticed by: Peter Jeremy<peter.jeremy@alcatel.com.au>
|
66433 |
29-Sep-2000 |
kris |
Use stronger random number generation for TCP_ISSINCR and tcp_iss.
Reviewed by: peter, jlemon
|
66376 |
25-Sep-2000 |
bmilekic |
Finally make do_tcpdrain sysctl live under correct parent, _net_inet_tcp, as opposed to _debug. Like before, default value remains 1.
|
66157 |
21-Sep-2000 |
ru |
Fixed the calculations with UDP header length field. The field is in network byte order and contains the size of the header.
Reviewed by: brian
|
65986 |
17-Sep-2000 |
kjc |
change the evaluation order of the rsvp socket in rsvp_input() in favor of the new-style per-vif socket.
this does not affect the behavior of the ISI rsvpd but allows another rsvp implementation (e.g., KOM rsvp) to take advantage of the new style for particular sockets while using the old style for others.
in the future, rsvp support should be replaced by more generic router-alert support.
PR: kern/20984 Submitted by: Martin Karsten <Martin.Karsten@KOM.tu-darmstadt.de> Reviewed by: kjc
|
65985 |
17-Sep-2000 |
phk |
Properly jail UDP sockets. This is quite a bit more tricky than TCP.
This fixes a !root userland panic, and some cases where the wrong interface was chosen for a jailed UDP socket.
PR: 20167, 19839, 20946
|
65984 |
17-Sep-2000 |
phk |
Reverse last commit, a better fix has been found.
|
65976 |
17-Sep-2000 |
phk |
Make sure UDP sockets are explicitly bind(2)'ed [sic] before we connect(2) them.
PR: 20946 Isolated by: Aaron Gifford <agifford@infowest.com>
|
65906 |
16-Sep-2000 |
jlemon |
It is possible for a TCP callout to be removed from the timing wheel, but have a network interrupt arrive and deactivate the timeout before the callout routine runs. Check for this case in the callout routine; it should only run if the callout is active and not on the wheel.
|
65892 |
15-Sep-2000 |
ru |
Add -Wmissing-prototypes.
|
65859 |
14-Sep-2000 |
jlemon |
m_cat() can free its second argument, so collect the checksum information from the fragment before calling m_cat().
|
65837 |
14-Sep-2000 |
ru |
Follow BSD/OS and NetBSD, keep the ip_id field in network order all the time.
Requested by: wollman
|
65765 |
12-Sep-2000 |
billf |
Fix screwup in previous commit.
|
65751 |
11-Sep-2000 |
archie |
Don't do snd_nxt rollback optimization (rev. 1.46) for SYN packets. It causes a panic when/if snd_una is incremented elsewhere (this is a conservative change, because originally no rollback occurred for any packets at all).
Submitted by: Vivek Sadananda Pai <vivek@imimic.com>
|
65643 |
09-Sep-2000 |
alfred |
Forgot to include sysctl.h
Submitted by: des
|
65534 |
06-Sep-2000 |
alfred |
Accept filter maintenance
Update copyrights.
Introduce a new sysctl node: net.inet.accf
Although accept filters need refcounting to be properly (safely) unloaded, as a temporary hack allow them to be unloaded if the sysctl net.inet.accf.unloadable is set; this is really for developers who want to work on their own filters.
A near complete re-write of the accf_http filter:
1) Check whether the request is HTTP/1.0 or HTTP/1.1; if not, dump to the application. Because of the performance implications of this there is a sysctl 'net.inet.accf.http.parsehttpversion' that when set to non-zero parses the HTTP version. The default is to parse the version.
2) Check if a socket has filled and dump to the listener
3) optimize the way that mbuf boundaries are handled using some voodoo
4) even though you'd expect accept filters to only be used on TCP connections that don't use m_nextpkt I've fixed the accept filter for socket connections that use this.
This rewrite of accf_http should allow someone to use them and maintain full HTTP compliance as long as net.inet.accf.http.parsehttpversion is set.
|
65504 |
06-Sep-2000 |
billf |
1. IP_FW_F_{UID,GID} are _not_ commands, they are extras. The sanity checking for them does not belong in the IP_FW_F_COMMAND switch, that mask doesn't even apply to them(!).
2. You cannot add a uid/gid rule to something that isn't TCP, UDP, or IP.
XXX - this should be handled in ipfw(8) as well (for more diagnostic output), but this at least protects bogus rules from being added.
Pointy hat: green
|
65332 |
01-Sep-2000 |
ru |
Match IPPROTO_ICMP with IP protocol field of the original IP datagram embedded into ICMP error message, not with protocol field of ICMP message itself (which is always IPPROTO_ICMP).
Pointed by: Erik Salander <erik@whistle.com>
|
65327 |
01-Sep-2000 |
ru |
Fixed broken ICMP error generation, unified conversion of IP header fields between host and network byte order. The details:
o icmp_error() now does not add IP header length. This fixes the problem when icmp_error() is called from ip_forward(). In this case the ip_len of the original IP datagram returned with ICMP error was wrong.
o icmp_error() expects all three fields, ip_len, ip_id and ip_off in host byte order, so DTRT and convert these fields back to network byte order before sending a message. This fixes the problem described in PR 16240 and PR 20877 (ip_id field was returned in host byte order).
o ip_ttl decrement operation in ip_forward() was moved down to make sure that it does not corrupt the copy of original IP datagram passed later to icmp_error().
o A copy of original IP datagram in ip_forward() was made a read-write, independent copy. This fixes the problem I first reported to Garrett Wollman and Bill Fenner and later put in audit trail of PR 16240: ip_output() (not always) converts fields of original datagram to network byte order, but because copy (mcopy) and its original (m) most likely share the same mbuf cluster, ip_output()'s manipulations on original also corrupted the copy.
o ip_output() now expects all three fields, ip_len, ip_off and (what is significant) ip_id in host byte order. It was a headache for years that ip_id was handled differently. The only compatibility issue here is the raw IP socket interface with IP_HDRINCL socket option set and a non-zero ip_id field, but ip.4 manual page was unclear on whether in this case ip_id field should be in host or network byte order.
|
65317 |
01-Sep-2000 |
ru |
Changed the way we handle outgoing ICMP error messages -- do not alias `ip_src' unless it comes from the host an original datagram that triggered this error message was destined for.
PR: 20712 Reviewed by: brian, Charles Mott <cmott@scientech.com>
|
65281 |
31-Aug-2000 |
ru |
Grab ADJUST_CHECKSUM() macro from alias_local.h.
|
65280 |
31-Aug-2000 |
ru |
Create aliasing links for incoming ICMP echo/timestamp requests. This makes outgoing ICMP echo/timestamp replies to be de-aliased with the right source IP, not exactly the primary aliasing IP.
|
65260 |
30-Aug-2000 |
ru |
Fixed the bug that div_bind() always returned zero even if there was an error (broken in rev 1.9).
|
65248 |
30-Aug-2000 |
ru |
Backout the hack in rev 1.71, I am working on a better patch that should cover almost all inconsistencies in ICMP error generation.
|
65221 |
29-Aug-2000 |
ache |
strtok -> strsep (no strtok allowed in libraries)
add unsigned char cast to ctype macro
|
65197 |
29-Aug-2000 |
darrenr |
Apply appropriate patch.
PR: 20877 Submitted by: Frank Volf (volf@oasis.IAEhv.nl)
|
64902 |
22-Aug-2000 |
archie |
Remove obsolete comment.
|
64853 |
19-Aug-2000 |
bde |
Fixed a missing splx() in if_addmulti(). Was broken in rev.1.28.
|
64658 |
15-Aug-2000 |
itojun |
repair endianness issue in IN_MULTICAST(). again, *BSD difference...
From: Nick Sayer <nsayer@quack.kfu.com>
|
64644 |
14-Aug-2000 |
ru |
Fixed PunchFW code segmentation violation bug.
Reported by: Christian Schade <chris@cube.sax.de>
|
64643 |
14-Aug-2000 |
ru |
Use queue(3) LIST_* macros for doubly-linked lists.
|
64580 |
13-Aug-2000 |
darrenr |
resolve conflicts
|
64452 |
09-Aug-2000 |
ru |
- Do not modify Peer's Call ID in outgoing Incoming-Call-Connected PPTP control messages.
- Cosmetics: replace `GRE link' with `PPTP link'.
Reviewed by: Erik Salander <erik@whistle.com>
|
64334 |
07-Aug-2000 |
ru |
Adjust TCP checksum rather than compute it afresh.
Submitted by: Erik Salander <erik@whistle.com>
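The idea of adjusting a checksum instead of recomputing it can be sketched with the incremental update from RFC 1624 (eqn. 3). This is an illustration only, not the libalias code: the helper names `fold16`, `cksum16` and `cksum_adjust` are hypothetical, and the sketch assumes the changed field is a single aligned 16-bit word.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit one's-complement sum. */
static uint16_t
fold16(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Full Internet checksum over nwords 16-bit words (hypothetical helper). */
static uint16_t
cksum16(const uint16_t *words, size_t nwords)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i < nwords; i++)
        sum += words[i];
    return (uint16_t)~fold16(sum);
}

/*
 * Incremental update, RFC 1624 eqn. 3: HC' = ~(~HC + ~m + m'),
 * where m is the old 16-bit word and m' is its replacement.
 */
static uint16_t
cksum_adjust(uint16_t old_cksum, uint16_t old_word, uint16_t new_word)
{
    uint32_t sum;

    sum = (uint16_t)~old_cksum;
    sum += (uint16_t)~old_word;
    sum += new_word;
    return (uint16_t)~fold16(sum);
}
```

The adjustment touches three 16-bit values regardless of segment size, which is the point of the change: the cost no longer depends on the payload length.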
|
64213 |
03-Aug-2000 |
archie |
Improve performance in the case where ip_output() returns an error. When this happens, we know for sure that the packet data was not received by the peer. Therefore, back out any advancing of the transmit sequence number so that we send the same data the next time we transmit a packet, avoiding a guaranteed missed packet and its resulting TCP transmit slowdown.
In most systems ip_output() probably never returns an error, and so this problem is never seen. However, it is more likely to occur with device drivers having short output queues (causing ENOBUFS to be returned when they are full), not to mention low memory situations.
Moreover, because of this problem writers of slow devices were required to make an unfortunate choice between (a) having a relatively short output queue (with low latency but low TCP bandwidth because of this problem) or (b) a long output queue (with high latency and high TCP bandwidth). In my particular application (ISDN) it took an output queue equal to ~5 seconds of transmission to avoid ENOBUFS. A more reasonable output queue of 0.5 seconds resulted in only about 50% TCP throughput. With this patch full throughput was restored in the latter case.
Reviewed by: freebsd-net
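The back-out described above can be sketched as follows. This is a simplified model, not the tcp_output() code: `struct conn`, `send_segment` and the two stub output routines are hypothetical names introduced for illustration, and only the next-sequence counter is modelled.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical, minimal connection state: next sequence number to send. */
struct conn {
    uint32_t snd_nxt;
};

typedef int (*output_fn)(const void *data, uint32_t len);

/* Stub output routines standing in for the IP layer. */
static int
output_ok(const void *data, uint32_t len)
{
    (void)data;
    (void)len;
    return 0;
}

static int
output_full(const void *data, uint32_t len)
{
    (void)data;
    (void)len;
    return ENOBUFS;     /* e.g. the driver's output queue is full */
}

static int
send_segment(struct conn *c, const void *data, uint32_t len, output_fn output)
{
    uint32_t saved_nxt = c->snd_nxt;
    int error;

    c->snd_nxt += len;              /* advance as if the data went out */
    error = output(data, len);
    if (error != 0)
        c->snd_nxt = saved_nxt;     /* never left the host: resend the same data */
    return error;
}
```

Without the restore step, the failed segment would be treated as sent and only recovered later through loss detection.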
|
64192 |
03-Aug-2000 |
ru |
Make netstat(1) aware of divert(4) sockets.
|
64105 |
01-Aug-2000 |
roberto |
Change __FreeBSD_Version into the proper __FreeBSD_version.
Submitted by: Alain.Thivillon@hsc.fr (Alain Thivillon) (for ip_fil.c)
|
64078 |
01-Aug-2000 |
ache |
Add missing '0' to FreeBSD_version test: 50011 -> 500011
|
64075 |
31-Jul-2000 |
ache |
Nonexistent <sys/pfil.h> -> <net/pfil.h>
Kernel 'make depend' fails otherwise
|
64061 |
31-Jul-2000 |
sheldonh |
Whitespace only:
Fix an overlong line and trailing whitespace that crept in, in the previous commit.
|
64060 |
31-Jul-2000 |
darrenr |
activate pfil_hooks and convert ipfilter to use it
|
63899 |
26-Jul-2000 |
archie |
Add address translation support for RTSP/RTP used by RealPlayer and Quicktime streaming media applications.
Add a BUGS section to the man page.
Submitted by: Erik Salander <erik@whistle.com>
|
63745 |
21-Jul-2000 |
jayanth |
When a connection is being dropped due to a listen queue overflow, delete the cloned route that is associated with the connection. This does not exhaust the routing table memory when the system is under a SYN flood attack. The route entry is not deleted if there is any prior information cached in it.
Reviewed by: Peter Wemm,asmodai
|
63523 |
19-Jul-2000 |
darrenr |
fix conflicts
|
63431 |
18-Jul-2000 |
sheldonh |
Fix a comment which was broken in rev 1.36.
PR: 19947 Submitted by: Tetsuya Isaki <isaki@net.ipc.hiroshima-u.ac.jp>
|
63330 |
17-Jul-2000 |
luigi |
close PR 19544 - ipfw pipe delete causes panic when no pipes defined
PR: 19544
|
63080 |
13-Jul-2000 |
dwmalone |
Extra sanity check when arp proxyall is enabled. Don't send an arp reply if the requesting machine isn't on the interface we believe it should be. Prevents arp wars when you plug cables in the wrong way around.
PR: 9848 Submitted by: Ian Dowse <iedowse@maths.tcd.ie> Not objected to by: wollman
|
63048 |
12-Jul-2000 |
jayanth |
re-enable the tcp newreno code.
|
63024 |
12-Jul-2000 |
itojun |
remove m_pulldown statistics, which is highly experimental and does not belong to *bsd-merged tree
|
62846 |
09-Jul-2000 |
itojun |
be more cautious about tcp option length field. drop bogus ones earlier. not sure if there is a real threat or not, but it seems that there's possibility for overrun/underrun (like non-NOP option with optlen > cnt).
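A minimal sketch of this kind of option-length validation follows. It is not the committed code: the function name `tcp_options_valid` and the `OPT_*` constants are hypothetical, and the sketch only checks that the option list is well-formed without interpreting any option.

```c
#include <stddef.h>
#include <stdint.h>

#define OPT_EOL 0   /* end of option list */
#define OPT_NOP 1   /* one-byte padding option */

/* Returns 0 if the option list is well-formed, -1 on a bogus length. */
static int
tcp_options_valid(const uint8_t *cp, size_t cnt)
{
    size_t optlen;

    while (cnt > 0) {
        uint8_t opt = cp[0];

        if (opt == OPT_EOL)
            break;
        if (opt == OPT_NOP) {
            optlen = 1;
        } else {
            if (cnt < 2)
                return -1;      /* no room for the length byte */
            optlen = cp[1];
            if (optlen < 2 || optlen > cnt)
                return -1;      /* would underrun or overrun the buffer */
        }
        cp += optlen;
        cnt -= optlen;
    }
    return 0;
}
```

Rejecting the list before any option is interpreted keeps later code from reading past the end of the header.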
|
62587 |
04-Jul-2000 |
itojun |
sync with kame tree as of july00. tons of bug fixes/improvements.
API changes:
- additional IPv6 ioctls
- IPsec PF_KEY API was changed, it is mandatory to upgrade setkey(8). (also syntax change)
|
62573 |
04-Jul-2000 |
phk |
Previous commit changing SYSCTL_HANDLER_ARGS violated KNF.
Pointed out by: bde
|
62454 |
03-Jul-2000 |
phk |
Style police catches up with rev 1.26 of src/sys/sys/sysctl.h:
Sanitize SYSCTL_HANDLER_ARGS so that simplistic tools can grok our sources:
-sysctl_vm_zone SYSCTL_HANDLER_ARGS
+sysctl_vm_zone (SYSCTL_HANDLER_ARGS)
|
62159 |
27-Jun-2000 |
ru |
Fixed PunchFWHole():
- ipfw always rejected rule with `neither in nor out' diagnostics.
- number of src/dst ports was not set properly.
|
61865 |
20-Jun-2000 |
ru |
- Removed PacketAliasPptp() API function.
- SHLIB_MAJOR++.
|
61861 |
20-Jun-2000 |
ru |
Added true support for PPTP aliasing. Some nice features include:
- Multiple PPTP clients behind NAT to the same or different servers.
- Single PPTP server behind NAT -- you just need to redirect TCP port 1723 to a local machine. Multiple servers behind NAT is possible but would require a simple API change.
- No API changes!
For more information on how this works see comments at the start of the alias_pptp.c.
PacketAliasPptp() is no longer necessary and will be removed soon.
Submitted by: Erik Salander <erik@whistle.com> Reviewed by: ru Rewritten by: ru Reviewed by: Erik Salander <erik@whistle.com>
|
61837 |
20-Jun-2000 |
alfred |
return of the accept filter part II
accept filters are now loadable as well as able to be compiled into the kernel.
two accept filters are provided, one that returns sockets when data arrives the other when an http request is completed (doesn't work with 0.9 requests)
Reviewed by: jmg
|
61735 |
16-Jun-2000 |
ru |
- Improved passive mode FTP support by aliasing 229 replies.
- Stricter checking of PORT/EPRT/227/229 messages format.
- Moved all security checks into one place.
|
61677 |
14-Jun-2000 |
ru |
- Added support for passive mode FTP by aliasing 227 replies. It does mean that it is now possible to run passive-mode FTP server behind NAT.
- SECURITY: FTP aliasing engine now ensures that:
  o the segment preceding a PORT/227 segment terminates with a \r\n;
  o the IP address in the PORT/227 matches the source IP address of the packet;
  o the port number in the PORT command or 227 reply is greater than or equal to 1024.
Submitted by: Erik Salander <erik@whistle.com> Reviewed by: ru
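Two of the checks listed above (address match and port >= 1024) can be sketched as below. This is a hypothetical stand-alone helper, not the libalias implementation: `port_cmd_ok` is an invented name, it handles only the `PORT h1,h2,h3,h4,p1,p2` form, and it takes the packet's source address in host byte order.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Returns 1 if the PORT command names src_ip (host byte order) and an
 * unprivileged port, 0 otherwise.
 */
static int
port_cmd_ok(const char *line, uint32_t src_ip)
{
    unsigned h1, h2, h3, h4, p1, p2;
    uint32_t ip;

    if (sscanf(line, "PORT %u,%u,%u,%u,%u,%u",
        &h1, &h2, &h3, &h4, &p1, &p2) != 6)
        return 0;
    if (h1 > 255 || h2 > 255 || h3 > 255 || h4 > 255 ||
        p1 > 255 || p2 > 255)
        return 0;
    ip = ((uint32_t)h1 << 24) | ((uint32_t)h2 << 16) |
        ((uint32_t)h3 << 8) | (uint32_t)h4;
    if (ip != src_ip)
        return 0;               /* address must match the packet source */
    return ((p1 << 8) | p2) >= 1024;
}
```

Refusing addresses other than the sender's and ports below 1024 keeps a client from steering the aliasing engine toward third-party hosts or privileged services.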
|
61657 |
14-Jun-2000 |
luigi |
Fix behaviour of "ipfw pipe show" -- previous code gave ambiguous data to the userland program (kernel operation was safe, anyways).
|
61420 |
08-Jun-2000 |
dan |
Add tcpoptions to ipfw. This works much in the same way as ipoptions do. It also squashes 99% of packet kiddie synflood orgies. For example, to rate syn packets without MSS,
ipfw pipe 10 config 56Kbit/s queue 10Packets
ipfw add pipe 10 tcp from any to any in setup tcpoptions !mss
Submitted by: Richard A. Steenbergen <ras@e-gerbil.net>
|
61413 |
08-Jun-2000 |
luigi |
Implement WF2Q+ in dummynet.
|
61183 |
02-Jun-2000 |
jlemon |
Add boundary checks against IP options.
Obtained from: OpenBSD
|
61179 |
02-Jun-2000 |
jlemon |
When attempting to transmit a packet, if the system fails to allocate a mbuf, it may return without setting any timers. If no more data is scheduled to be transmitted (this was a FIN) the system will sit in LAST_ACK state forever.
Thus, when mbuf allocation fails, set the retransmit timer if neither the retransmit or persist timer is already pending.
Problem discovered by: Mike Silbersack (silby@silby.com) Pushed for a fix by: Bosko Milekic <bmilekic@dsuper.net> Reviewed by: jayanth
|
60944 |
26-May-2000 |
darrenr |
define CSUM_DELAY_DATA to match merge
|
60938 |
26-May-2000 |
jake |
Back out the previous change to the queue(3) interface. It was not discussed and should probably not happen.
Requested by: msmith and others
|
60925 |
25-May-2000 |
darrenr |
fix up #ifdef jungle for FreeBSD
|
60923 |
25-May-2000 |
darrenr |
remove duplicate prototypes
|
60910 |
25-May-2000 |
jlemon |
Mark the checksum as complete when looping back multicast packets.
Submitted by: Jeff Gibbons <jgibbons@n2.net>
|
60889 |
24-May-2000 |
archie |
Just need to pass the address family to if_simloop(), not the whole sockaddr.
|
60883 |
24-May-2000 |
darrenr |
fix duplicate rcsid's
|
60872 |
24-May-2000 |
bde |
Fixed some style bugs (mainly convoluted logic for blackhole processing).
|
60865 |
24-May-2000 |
peter |
It would have been nice if this actually compiled. Close the header comment */.
|
60857 |
24-May-2000 |
darrenr |
fix up conflicts
|
60855 |
24-May-2000 |
darrenr |
fix conflicts
|
60854 |
24-May-2000 |
darrenr |
fix conflicts
|
60853 |
24-May-2000 |
darrenr |
fix conflicts
|
60852 |
24-May-2000 |
darrenr |
fix conflicts
|
60851 |
24-May-2000 |
darrenr |
fix conflicts
|
60850 |
24-May-2000 |
darrenr |
fix conflicts
|
60833 |
23-May-2000 |
jake |
Change the way that the queue(3) structures are declared; don't assume that the type argument to *_HEAD and *_ENTRY is a struct.
Suggested by: phk Reviewed by: phk Approved by: mdodd
|
60798 |
22-May-2000 |
dan |
sysctl'ize ICMP_BANDLIM and ICMP_BANDLIM_SUPPRESS_OUTPUT.
Suggested by: des/nbm
|
60797 |
22-May-2000 |
dan |
Add option ICMP_BANDLIM_SUPPRESS_OUTPUT to the mix. With this option, badport_bandlim() will not muck up your console with printf() messages.
|
60765 |
21-May-2000 |
jlemon |
Compute the checksum before handing the packet off to IPFilter.
Tested by: Cy Schubert <Cy.Schubert@uumail.gov.bc.ca>
|
60690 |
19-May-2000 |
peter |
Return ECONNRESET instead of EINVAL if the connection has been shot down as a result of a reset. Returning EINVAL in that case makes no sense at all and just confuses people as to what happened. It could be argued that we should save the original address somewhere so that getsockname() etc can tell us what it used to be so we know where the problem connection attempts are coming from.
|
60687 |
18-May-2000 |
jayanth |
snd_cwnd was updated twice in the tcp_newreno function.
|
60662 |
17-May-2000 |
jayanth |
Sigh, fix a rookie patch merge error.
Also-missed-by: peter
|
60661 |
17-May-2000 |
jlemon |
Cast sizeof() calls to be of type (int) when they appear in a signed integer expression. Otherwise the sizeof() call will force the expression to be evaluated as unsigned, which is not the intended behavior.
Obtained from: NetBSD (in a different form)
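The pitfall described above can be reproduced in a few lines. This is an illustrative sketch, not the committed change: `struct hdr` and both function names are hypothetical, and it assumes size_t is at least as wide as int, as on common platforms.

```c
#include <stddef.h>

/* Hypothetical 20-byte header. */
struct hdr {
    char bytes[20];
};

/*
 * Buggy: sizeof yields size_t, so the subtraction is done in unsigned
 * arithmetic and a short packet wraps around to a huge positive value.
 */
static int
has_payload_unsigned(int len)
{
    return (len - sizeof(struct hdr) > 0);
}

/* Fixed: the cast keeps the whole expression in signed int. */
static int
has_payload_signed(int len)
{
    return (len - (int)sizeof(struct hdr) > 0);
}
```

With `len` equal to 10, the first version reports a payload although the packet is shorter than the header; the second correctly reports none.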
|
60619 |
16-May-2000 |
jayanth |
snd_una was being updated incorrectly, this resulted in the newreno code retransmitting data from the wrong offset.
As a footnote, the newreno code was partially derived from NetBSD and Tom Henderson <tomh@cs.berkeley.edu>
|
60612 |
15-May-2000 |
ru |
Do not call icmp_error() if ipfirewall(4) denied packet.
PR: kern/10747, kern/18382
|
60536 |
14-May-2000 |
archie |
Move code to handle BPF and bridging for incoming Ethernet packets out of the individual drivers and into the common routine ether_input(). Also, remove the (incomplete) hack for matching ethernet headers in the ip_fw code.
The good news: net result of 1016 lines removed, and this should make bridging now work with *all* Ethernet drivers.
The bad news: it's nearly impossible to test every driver, especially for bridging, and I was unable to get much testing help on the mailing lists.
Reviewed by: freebsd-net
|
60408 |
11-May-2000 |
jayanth |
Temporarily turn off the newreno flag until we can track down the known data corruption problem.
|
60363 |
11-May-2000 |
brian |
Revert the default behaviour for incoming connections so that they (once again) go to the target machine rather than the alias address.
PR: 18354 Submitted by: ru
|
60304 |
10-May-2000 |
itojun |
correct more out-of-bounds memory access, if cnt == 1 and optlen > 1. similar to recent fix to sys/netinet/ipf.c (by darren).
|
60295 |
09-May-2000 |
darrenr |
Fix bug in dealing with "hlen == 1 and opt > 1"
|
60265 |
09-May-2000 |
ps |
Add missing include machine/in_cksum.h.
Submitted by: n_hibma
|
60214 |
08-May-2000 |
ken |
Include machine/in_cksum.h to unbreak options MROUTING.
|
60105 |
06-May-2000 |
jlemon |
Add #include <machine/in_cksum.h>, in order to pick up the checksum inline functions and prototypes.
|
60067 |
06-May-2000 |
jlemon |
Implement TCP NewReno, as documented in RFC 2582. This allows better recovery for multiple packet losses in a single window. The algorithm can be toggled via the sysctl net.inet.tcp.newreno, which defaults to "on".
Submitted by: Jayanth Vijayaraghavan <jayanth@yahoo-inc.com>
|
59909 |
02-May-2000 |
paul |
Force the address of the socket to be INADDR_ANY immediately before calling in_pcbbind so that in_pcbbind sees a valid address if no address was specified (since divert sockets ignore them).
PR: 17552 Reviewed by: Brian
|
59898 |
02-May-2000 |
luigi |
Remove an unnecessary error message
|
59874 |
01-May-2000 |
peter |
Add $FreeBSD$
|
59726 |
28-Apr-2000 |
ru |
Replace PacketAliasRedirectPptp() (which had nothing specific to PPTP) with more generic PacketAliasRedirectProto().
Major number is not bumped because it is believed that no one has started using PacketAliasRedirectPptp() yet.
|
59704 |
27-Apr-2000 |
ru |
Spell PacketAliasRedirectAddr() correctly.
|
59702 |
27-Apr-2000 |
ru |
Load Sharing using IP Network Address Translation (RFC 2391, LSNAT).
LSNAT links are first created by either PacketAliasRedirectPort() or PacketAliasRedirectAddress() and then set up by one or more calls to PacketAliasAddServer().
|
59392 |
19-Apr-2000 |
shin |
Initialize th_sum before in6_cksum(), again. Without this fix, every IPv6 TCP RST packet has a wrong checksum value, so an IPv6 connect() attempt to a 5.0 machine does not fail until the TCP connect timeout, when it should fail immediately.
Thanks to haro@tk.kubota.co.jp (Munehiro Matsuda) for his much debugging help and detailed info.
|
59391 |
19-Apr-2000 |
phk |
Remove ~25 unneeded #include <sys/conf.h>
Remove ~60 unneeded #include <sys/malloc.h>
|
59356 |
18-Apr-2000 |
ru |
Add support for multiple PPTP sessions:
- new API function: PacketAliasRedirectPptp()
- new mode bit: PKT_ALIAS_DENY_PPTP
Please see manual page for details.
|
59334 |
17-Apr-2000 |
sumikawa |
ND6_HINT() should not be called unless the connection status is ESTABLISHED.
Obtained from: KAME Project
|
59237 |
14-Apr-2000 |
ru |
Apply TCP_EXPIRE_CONNECTED (86400 seconds) timeout only to established connections, after SYN packets were seen from both ends. Before this, it would get applied right after the first SYN packet was seen (either from client or server). With broken TCP connection attempts, when the remote end does not respond with SYNACK nor with RST, this resulted in having a useless (ie, no actual TCP connection associated with it) TCP link with 86400 seconds TTL, wasting system memory. With high rate of such broken connection attempts (for example, remote end simply blocks these connection attempts with ipfw(8) without sending RST back), this could result in a denial-of-service.
PR: bin/17963
|
59202 |
13-Apr-2000 |
ru |
A complete reformatting of manual page.
|
59181 |
12-Apr-2000 |
ru |
Make partially specified permanent links without `dst_addr' but with `dst_port' work for outgoing packets.
This case was not handled properly when I first fixed this in revision 1.17.
This change is also required for the upcoming improved PPTP support patches -- that is how I found the problem.
Before this change:
# natd -v -a aliasIP \
    -redirect_port tcp localIP:localPORT publicIP:publicPORT 0:remotePORT
Out [TCP] [TCP] localIP:localPORT -> remoteIP:remotePORT aliased to
    [TCP] aliasIP:localPORT -> remoteIP:remotePORT
After this change:
# natd -v -a aliasIP \
    -redirect_port tcp localIP:localPORT publicIP:publicPORT 0:remotePORT
Out [TCP] [TCP] localIP:localPORT -> remoteIP:remotePORT aliased to
    [TCP] publicIP:publicPORT -> remoteIP:remotePORT
|
59143 |
11-Apr-2000 |
wes |
PR: kern/17872 Submitted by: csg@waterspout.com (C. Stephen Gunn)
|
59075 |
06-Apr-2000 |
ru |
- Add support for FTP EPRT (RFC 2428) command.
- Minor optimizations.
- Minor spelling fixes.
PR: 14305 Submitted by: ume Rewritten by: ru
|
59047 |
05-Apr-2000 |
ru |
- Remove unused includes.
- Minor spelling fixes.
- Make IcmpAliasOut2() really work.
Before this change:
# natd -v -n PUB_IFACE -p 12345 -redirect_address 192.168.1.1 P.P.P.P
natd[87923]: Aliasing to A.A.A.A, mtu 1500 bytes
In [UDP] [UDP] X.X.X.X:49562 -> P.P.P.P:50000 aliased to
    [UDP] X.X.X.X:49562 -> 192.168.1.1:50000
Out [ICMP] [ICMP] 192.168.1.1 -> X.X.X.X 3(3) aliased to
    [ICMP] A.A.A.A -> X.X.X.X 3(3)
# tcpdump -n -t -i PUB_IFACE host X.X.X.X and "(udp or icmp)"
tcpdump: listening on PUB_IFACE
X.X.X.X.49562 > P.P.P.P.50000: udp 3
A.A.A.A > X.X.X.X: icmp: A.A.A.A udp port 50000 unreachable
After this change:
# natd -v -n PUB_IFACE -p 12345 -redirect_address 192.168.1.1 P.P.P.P
natd[89360]: Aliasing to A.A.A.A, mtu 1500 bytes
In [UDP] [UDP] X.X.X.X:49563 -> P.P.P.P:50000 aliased to
    [UDP] X.X.X.X:49563 -> 192.168.1.1:50000
Out [ICMP] [ICMP] 192.168.1.1 -> X.X.X.X 3(3) aliased to
    [ICMP] P.P.P.P -> X.X.X.X 3(3)
# tcpdump -n -t -i PUB_IFACE host X.X.X.X and "(udp or icmp)"
tcpdump: listening on PUB_IFACE
X.X.X.X.49563 > P.P.P.P.50000: udp 3
P.P.P.P > X.X.X.X: icmp: P.P.P.P udp port 50000 unreachable
|
59046 |
05-Apr-2000 |
ru |
- Moved NULL definition into private include file.
- Minor spelling fixes.
|
59031 |
05-Apr-2000 |
ru |
Minor spelling fixes.
|
58943 |
02-Apr-2000 |
brian |
Correct Charles Mott's email address
Requested by: Charles Mott <cmott@scientech.com>
|
58936 |
02-Apr-2000 |
shin |
Move htons() ip_len to after the in_delayed_cksum() call. This should stop cksum error messages on IPsec communication which was reported on freebsd-current.
Reviewed by: jlemon
|
58911 |
02-Apr-2000 |
ps |
Try and make the kernel build again without INET6.
|
58907 |
01-Apr-2000 |
shin |
Support per socket based IPv4 mapped IPv6 addr enable/disable control.
Submitted by: ume
|
58895 |
01-Apr-2000 |
jlemon |
Calculate any delayed checksums before handing an mbuf off to a divert socket. This fixes a problem with ppp/natd.
Reviewed by: bsd (Brian Dean, gotta love that login name)
|
58877 |
31-Mar-2000 |
brian |
Allow PacketAliasSetTarget() to be passed the following:
  INADDR_NONE: Incoming packets go to the alias address (the default)
  INADDR_ANY: Incoming packets are not NAT'd (direct access to the internal network from outside)
  anything else: Incoming packets go to the specified address
Change a few inaddr::s_addr == 0 to inaddr::s_addr == INADDR_ANY while I'm there.
|
58866 |
31-Mar-2000 |
brian |
When an incoming packet is received that is not specifically redirected and when no target address has been specified, NAT the destination address to the alias address rather than allowing people direct access to your internal network from outside.
|
58806 |
30-Mar-2000 |
jlemon |
If `ipfw fwd' loops an mbuf back to ip_input from ip_output and the mbuf is marked for delayed checksums, then additionally mark the packet as having its checksums computed. This allows us to bypass computing/checking the checksum entirely, which isn't really needed as the packet has never hit the wire.
Reviewed by: green
|
58770 |
29-Mar-2000 |
joerg |
Peter Johnson found another log() call without a trailing newline. All three of them have been introduced in rev 1.64, so i guess i've got all of them now. :)
Submitted by: Peter Johnson <locke@mcs.net>
|
58758 |
28-Mar-2000 |
joerg |
Added two missing newlines in calls to log(9).
Reported in Usenet by: locke@mcs.net (Peter Johnson)
While i was at it, prepended a 0x to the %D output, to make it clear that the printed value is in hex (i assume %D has been chosen over %#x to obey network byte order).
|
58698 |
27-Mar-2000 |
jlemon |
Add support for offloading IP/TCP/UDP checksums to NIC hardware which supports them.
|
58499 |
23-Mar-2000 |
dillon |
Fix parens in m_pullup() line in arp handling code. The code was improperly doing the equivalent of (m = (function() == NULL)) instead of ((m = function()) == NULL).
This fixes a NULL pointer dereference panic with runt arp packets.
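The effect of the misplaced parentheses can be shown with an integer stand-in. This is an illustration, not the ARP code: `get_handle`, `buggy` and `fixed` are hypothetical names, and an int handle replaces the mbuf pointer so that the sketch compiles cleanly; with a pointer, the analogous outcome is a null pointer after a successful call.

```c
/* Hypothetical stand-in for a call that returns 0 on failure. */
static int
get_handle(int ok)
{
    return ok ? 42 : 0;
}

/* Misplaced parentheses: h receives the comparison result (0 or 1). */
static int
buggy(int ok)
{
    int h;

    if ((h = (get_handle(ok) == 0)))
        return -1;
    return h;       /* always 0 here: the real handle has been lost */
}

/* Intended form: assign first, then compare. */
static int
fixed(int ok)
{
    int h;

    if ((h = get_handle(ok)) == 0)
        return -1;
    return h;
}
```

Both versions detect the failure case, which is why such a bug can go unnoticed until the success path actually uses the assigned value.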
|
58452 |
22-Mar-2000 |
green |
in6_pcb.c: Remove a bogus (redundant, just weird, etc.) key_freeso(so). There are no consumers of it now, nor does it seem there ever will be.
in6?_pcb.c: Add an if (inp->in6?p_sp != NULL) before the call to ipsec[46]_delete_pcbpolicy(inp). In low-memory conditions this can cause a crash because in6?_sp can be NULL...
|
58313 |
19-Mar-2000 |
lile |
o Replace most magic numbers related to token ring with #defines from iso88025.h.
o Add minimal llc support to iso88025_input.
o Clean up most of the source routing code.
* Submitted by: Nikolai Saoukh <nms@otdel-1.org>
|
58279 |
19-Mar-2000 |
brian |
Make _FindLinkIn() static and only define GetDestPort when NO_FW_PUNCH isn't defined.
|
58057 |
14-Mar-2000 |
ru |
Fix reporting of src and dst IP addresses for ICMP and generic IP packets.
PR: 17319 Submitted by: Mike Heffner <spock@techfour.net>
|
57920 |
11-Mar-2000 |
shin |
Disable IPv4 over IPv4 tunnel on the 6to4 interface for better security.
Approved by: jkh
|
57903 |
11-Mar-2000 |
shin |
IPv6 6to4 support.
The biggest problem with IPv6 right now is getting an IPv6 address assignment. 6to4 solves this problem. A 6to4 address is defined as follows:
2002: 4byte v4 addr : 2byte SLA ID : 8byte interface ID
The most important point of the address format is that an IPv4 address is embedded in it. So any user who has an IPv4 address can get an IPv6 address block with a 2-byte subnet space. Also, the IPv4 address is used for semi-automatic IPv6 over IPv4 tunneling.
With 6to4, getting an IPv6 address becomes dramatically easier. The attached patch enables the 6to4 extension and was confirmed to work between "Richard Seaman, Jr." <dick@tar.com> and me.
Approved by: jkh
Reviewed by: itojun
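The address layout given in this entry (2002, then the 4-byte IPv4 address, a 2-byte SLA ID and an 8-byte interface ID) can be sketched as a byte-level construction. The function name `make_6to4` is hypothetical and the code is not taken from the patch.

```c
#include <stdint.h>
#include <string.h>

/*
 * Build a 16-byte 6to4 address:
 *   2002 : IPv4 address (4 bytes) : SLA ID (2 bytes) : interface ID (8 bytes)
 */
static void
make_6to4(uint8_t out[16], const uint8_t v4[4], uint16_t sla_id,
    const uint8_t ifid[8])
{
    out[0] = 0x20;                      /* fixed 2002::/16 prefix */
    out[1] = 0x02;
    memcpy(&out[2], v4, 4);             /* embedded IPv4 address */
    out[6] = (uint8_t)(sla_id >> 8);    /* SLA ID, big-endian */
    out[7] = (uint8_t)(sla_id & 0xff);
    memcpy(&out[8], ifid, 8);           /* interface ID */
}
```

For example, IPv4 address 192.0.2.1 with SLA ID 1 yields the prefix 2002:c000:0201:0001::/64.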
|
57900 |
11-Mar-2000 |
rwatson |
The function arpintr() incorrectly checks m->m_len to detect incomplete ARP packets. This can incorrectly reject complete frames since the frame could be stored in more than one mbuf.
The following patches fix the length comparison, and add several diagnostic log messages to the interrupt handler for out-of-the-norm ARP packets. This should make ARP problems easier to detect, diagnose and fix.
Submitted by: C. Stephen Gunn <csg@waterspout.com> Approved by: jkh Reviewed by: rwatson
|
57855 |
09-Mar-2000 |
shin |
Initialize mbuf pointer at getting ipsec policy. Without this, kernel will panic at getsockopt() of IPSEC_POLICY. Also make compilable libipsec/test-policy.c which tries getsockopt() of IPSEC_POLICY.
Approved by: jkh
Submitted by: sakane@kame.net
|
57686 |
02-Mar-2000 |
sheldonh |
Remove single-space hard sentence breaks. These degrade the quality of the typeset output, tend to make diffs harder to read and provide bad examples for new-comers to mdoc.
|
57631 |
29-Feb-2000 |
luigi |
Fix panic when doing keep-state and "forward". Removed a redundant check. Also move check for expired rules before using them. Sorry for the whitespace changes.
Approved-by: jordan
|
57576 |
28-Feb-2000 |
ps |
Limit the maximum permissible TCP window size to 65535 octets if window scaling is disabled.
PR: kern/16914 Submitted by: Jayanth Vijayaraghavan <jayanth@yahoo-inc.com> Reviewed by: wollman Approved by: jkh
|
57544 |
28-Feb-2000 |
alfred |
-it do, among other things, clear out any
+it does, amongst other things, clear out any
The old sentence didn't seem to make sense.
|
57401 |
23-Feb-2000 |
guido |
Remove option IPFILTER_KLD. In case you wanted to kldload ipfilter, the module would only work in kernels built with this option.
Approved by: jkh
|
57178 |
13-Feb-2000 |
peter |
Clean up some loose ends in the network code, including the X.25 and ISO #ifdefs. Clean out unused netisr's and leftover netisr linker set gunk. Tested on x86 and alpha, including world.
Approved by: jkh
|
57140 |
11-Feb-2000 |
luigi |
Forgot one line: don't try to match flags when looking for a flow.
Approved-by: jordan
|
57126 |
10-Feb-2000 |
guido |
Re add rev 1.11 diffs to ip_fil.h Also discover that I did not undefine CVS_FUBAR (which no longer exists) and thus forgot to add $FreeBSD's. Add them.
Approved by: jkh (is part of ipfilter upgrade)
|
57120 |
10-Feb-2000 |
shin |
Forbid inclusion of some inet6 header files from the wrong place.
KAME put INET6 related stuff into the sys/netinet6 dir, but the IPv6 standard API (RFC 2553) requires the following files to be under sys/netinet:
  netinet/ip6.h
  netinet/icmp6.h
Now those header files just include the following files, respectively:
  netinet6/ip6.h
  netinet6/icmp6.h
Also KAME has netinet6/in6.h for easy sharing of common INET6 defs between different BSDs, but RFC 2553 requires that only netinet/in.h be included from userland. So netinet/in.h also includes netinet6/in6.h inside.
To keep apps portable, apps should not directly include the above files from the netinet6 dir. Ideally, all contents of netinet6/ip6.h, netinet6/icmp6.h and netinet6/in6.h should be moved into netinet/ip6.h, netinet/icmp6.h and netinet/in.h, but to avoid big changes at this stage, add a hack:
- Put a special macro define into those files under netinet.
- Let files under netinet6 cause an error if they are included from an app and the special macro is not defined (it would have been defined if the files under netinet were included).
- Let them print an error message which gives the correct name of the include file to use.
Also fix apps which includes invalid header files.
Approved by: jkh
Obtained from: KAME project
|
57117 |
10-Feb-2000 |
luigi |
Move definition of fw_enable from ip_fw.c to ip_input.c so we can compile kernels without IPFIREWALL .
Reported-by: Robert Watson Approved-by: jordan
|
57116 |
10-Feb-2000 |
luigi |
Whoops... forgot braces in a conditional
Revealed-by: diff with -STABLE version (the advantage of having multiple lines of development...) Approved-by: jordan
|
57114 |
10-Feb-2000 |
luigi |
Support the net.inet.ip.fw.enable variable, part of the recent ipfw modifications.
Approved-by: jordan
|
57113 |
10-Feb-2000 |
luigi |
Support for stateful (dynamic) ipfw rules. They are very similar to ipfilter's keep-state.
Look at the updated ipfw(8) manpage for details.
Approved-by: jordan
|
57096 |
09-Feb-2000 |
guido |
Bring over ipfilter v3_3_8 kernel sources, including merging the local modifications. Also fix initializing fr_running in KLD case. Rename ipl_inited to fr_running in mlfk_ipl.
Approved by: jkh
|
57068 |
09-Feb-2000 |
shin |
Avoid kernel panic when tcp rfc1323 and rfc1644 options are enabled at the same time.
When rfc1323 and rfc1644 option are enabled by sysctl, and tcp over IPv6 is tried, kernel panic happens by the following check in tcp_output(), because now hdrlen is bigger in such case than before.
/*#ifdef DIAGNOSTIC*/
if (max_linkhdr + hdrlen > MHLEN)
	panic("tcphdr too big");
/*#endif*/
So change the above check to compare with MCLBYTES in #ifdef INET6 case. Also, allocate a mbuf cluster for the header mbuf, in that case.
Bug reported at KAME environment. Approved by: jkh
Reviewed by: sumikawa Obtained from: KAME project
|
56991 |
04-Feb-2000 |
luigi |
Fix a (mostly harmless) scheduling-in-the-past problem with dummynet (already fixed in -stable, was waiting for Jordan's approval due to the code freeze).
Reported-By: Mike Tancsa Approved-By: Jordan
|
56968 |
02-Feb-2000 |
archie |
The flags PKT_ALIAS_PUNCH_FW and PKT_ALIAS_PROXY_ONLY were both being defined as 0x40. Change the former to be 0x100.
Submitted by: Erik Salander <erik@whistle.com> Approved by: jkh
|
56967 |
02-Feb-2000 |
brian |
Mention what PKT_ALIAS_PROXY_ONLY does.
Prompted by: archie
|
56801 |
29-Jan-2000 |
shin |
Sorry for this commit just before the code freeze. This is a fix to usr.sbin/trpt and tcp_debug.[ch]. I thought of putting this in after 4.0, but...
- There was a bug that when INET6 is defined, IPv4 sockets are not traced by trpt.
- I received a request from a person who distributes a program which uses the tcp_debug interface and prints performance statistics, asking to:
  - keep compatibility with old programs as much as possible
  - use the same interface as other OSes
So, I talked with itojun, and synced API with netbsd IPv6 extension.
makeworld check, kernel build check(includes GENERIC) is done.
But if any problem shows up, please let me know and I will back out this change promptly.
|
56724 |
28-Jan-2000 |
imp |
Mitigate the stream.c attacks
o Drop all broadcast and multicast source addresses in tcp_input.
o Enable ICMP_BANDLIM in GENERIC.
o Change default to 200/s from 100/s. This will still stop the attack, but is conservative enough to do this close to code freeze.
This is not the optimal patch for the problem, but is likely the least intrusive patch that can be made for this.
Obtained from: Don Lewis and Matt Dillon. Reviewed by: freebsd-security
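The source-address check in the first item can be sketched in userland C. This is an assumption-laden illustration, not the tcp_input() change: `bogus_tcp_source` is a hypothetical name, and the sketch covers only multicast and the limited broadcast address 255.255.255.255, not interface-specific directed broadcasts. Note that IN_MULTICAST() expects its argument in host byte order, hence the ntohl().

```c
#include <stdint.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/*
 * Returns nonzero if a TCP segment with this source address (network
 * byte order, as found in the IP header) should be dropped.
 */
static int
bogus_tcp_source(struct in_addr src)
{
    uint32_t host = ntohl(src.s_addr);

    return (IN_MULTICAST(host) || host == INADDR_BROADCAST);
}
```

A legitimate TCP peer is always a unicast host, so segments claiming such sources can be discarded without generating a reply.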
|
56565 |
25-Jan-2000 |
shin |
Avoid m_len and m_pkthdr.len inconsistency when changing m_len for an mbuf whose M_PKTHDR is set.
PR: related to kern/15175 Reviewed by: archie
|
56564 |
25-Jan-2000 |
shin |
Fix the bug that IPv4 ttl is not initialized when an AF_INET6 socket is used for IPv4 communication (IPv4 mapped IPv6 addr). Also removed IPv6 hoplimit initialization because it is always done at tcp_output.
Confirmed by: Bernd Walter <ticso@cicely5.cicely.de>
|
56555 |
24-Jan-2000 |
brian |
Move the *intrq variables into net/intrq.c and unconditionally include this in all kernels. Declare some const *intrq_present variables that can be checked by a module prior to using *intrq to queue data.
Make the if_tun module capable of processing atm, ip, ip6, ipx, natm and netatalk packets when TUNSIFHEAD is ioctl()d on.
Review not required by: freebsd-hackers
|
56041 |
15-Jan-2000 |
shin |
Fixed the problem that an IPsec connection hangs when larger data is sent.
- opt_ipsec.h was missing in some tcp files (sorry for the basic mistake)
- made buildable with the above fix
- also added some missing IPv4-mapped IPv6 address handling to ipsec4_getpolicybysock
|
56039 |
15-Jan-2000 |
shin |
Added missing 'else' for 'if (isipv6)' at the IPv6 length setting in tcp_respond(). Because of this bug, IPv6 resets were not sent. (I checked for the same kind of bug elsewhere, but found no others.)
|
56019 |
15-Jan-2000 |
shin |
Removed wrong (unnecessary) & operators on pointers in ipsec_hdrsiz_tcp(). This must be one of the reasons why connections over IPsec hang for bigger packets (which was reported on freebsd-current@freebsd.org).
But there still seems to be another bug, and the problem is not yet fixed.
|
56016 |
15-Jan-2000 |
shin |
add forward declarations, and small cosmetic changes.
Submitted by: bde
|
55990 |
14-Jan-2000 |
guido |
Apply patches in rev 1.2 and 1.9 that I forgot
Pointed out by: bde
|
55955 |
14-Jan-2000 |
rgrimes |
Replace beforeinstall target with new variables used by .mk system.
Reviewed by: marcel, and make world
|
55929 |
13-Jan-2000 |
guido |
Bring over ipfilter kernel sources, including merging the local modifications.
|
55917 |
13-Jan-2000 |
shin |
Change the struct sockaddr_storage member names, because the following change is very likely to become the consensus, judging by recent discussion on the ietf/ipng mailing list. The recent KAME repository and other KAME-patched BSDs have also applied it.
s/__ss_family/ss_family/ s/__ss_len/ss_len/
Makeworld is confirmed, and no application should be affected by this change yet.
|
55913 |
13-Jan-2000 |
shin |
Clear rt after RTFREE. This might sometimes have caused a kernel panic in rtfree() in INET6-enabled environments.
|
55875 |
13-Jan-2000 |
shin |
add a comment for some possible? IPv4 option processing.
|
55874 |
13-Jan-2000 |
shin |
removed incorrect ip6 length setting for IPv6 tcp reset packet.
|
55777 |
10-Jan-2000 |
ru |
MGETHDR() does not initialize m_pkthdr.rcvif, do it here.
This fixes page fault panic observed when diverting packets with IP options (e.g. ping -R remoteIP over natd).
PR: kern/8596, kern/11199
|
55679 |
09-Jan-2000 |
shin |
tcp updates to support IPv6. also a small patch to sys/nfs/nfs_socket.c, as max_hdr size change.
Reviewed by: freebsd-arch, cvs-committers Obtained from: KAME project
|
55632 |
09-Jan-2000 |
shin |
enable IPsec over DUMMYNET again
Submitted by: luigi Reviewed by: luigi
|
55601 |
08-Jan-2000 |
shin |
prevent kernel panic which happens when either of IPSEC and IPDIVERT is enabled.
Confirmed by: Eugene M. Kim <ab@astralblue.com>
|
55599 |
08-Jan-2000 |
luigi |
Add ipfw hooks for the new dummynet features.
Support masks on TCP/UDP ports.
Minor cleanup of ip_fw_chk() to avoid repeated calls to PULLUP_TO at each rule.
|
55598 |
08-Jan-2000 |
luigi |
Cleanup dummynet call interface so it should now work on the Alpha as well. Also (probably) fix a bug introduced during the IPv6 import.
|
55597 |
08-Jan-2000 |
luigi |
Implement per-flow queueing. Using a single pipe config rule, now you can dynamically create rate-limited queues for different flows using masks on dst/src IP, port and protocols. Read the ipfw(8) manpage for details and examples.
Restructure the internals of the traffic shaper to use heaps, so that it efficiently manages a large number of queues.
Fix a bug present in previous versions which could, under certain infrequent conditions, cause very large bursts of traffic to be sent out.
All in all, this new code is much cleaner than the previous one and should also perform better.
Work supported by Akamba Corp.
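
The idea of creating queues dynamically from a mask can be sketched like this. It is a toy model, not dummynet's data structures: the mask values, `flow_key`, and `enqueue` are hypothetical, only addresses are masked (the real feature also covers ports and protocols), and rate limiting and the heap-based scheduling are left out.

```python
from collections import defaultdict, deque
from ipaddress import IPv4Address

# Hypothetical mask: keep the full destination address, ignore the source.
DST_MASK = 0xFFFFFFFF
SRC_MASK = 0x00000000

# One queue per distinct masked flow id, created on first use.
queues = defaultdict(deque)


def flow_key(src: str, dst: str) -> tuple:
    """Reduce a packet's addresses to the masked flow id that selects its queue."""
    return (int(IPv4Address(src)) & SRC_MASK, int(IPv4Address(dst)) & DST_MASK)


def enqueue(src: str, dst: str, payload: bytes) -> None:
    """Append the packet to the queue for its masked flow id."""
    queues[flow_key(src, dst)].append(payload)


enqueue("10.0.0.1", "192.0.2.10", b"a")
enqueue("10.0.0.2", "192.0.2.10", b"b")  # same destination: same queue
enqueue("10.0.0.1", "192.0.2.20", b"c")  # new destination: new queue
```

With this mask, a single configuration yields one queue per destination host without listing the hosts in advance.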
|
55460 |
05-Jan-2000 |
eivind |
KERNEL -> _KERNEL
|
55205 |
29-Dec-1999 |
peter |
Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL" is an application space macro and the applications are supposed to be free to use it as they please (but cannot). This is consistent with the other BSDs, which made this change quite some time ago. More commits to come.
|
55198 |
28-Dec-1999 |
msmith |
Make tcp_drain() actually do something. When invoked (usually as a desperation measure in low-memory situations), walk the tcpbs and flush the reassembly queues.
This behaviour is currently controlled by the debug.do_tcpdrain sysctl (defaults to on).
Submitted by: Bosko Milekic <bmilekic@dsuper.net> Reviewed by: wollman
|
55009 |
22-Dec-1999 |
shin |
IPSEC support in the kernel. pr_input() routines prototype is also changed to support IPSEC and IPV6 chained protocol headers.
Reviewed by: freebsd-arch, cvs-committers Obtained from: KAME project
|
54952 |
21-Dec-1999 |
eivind |
Change incorrect NULLs to 0s
|
54892 |
20-Dec-1999 |
peter |
The ipfilter module name wasn't exactly conventional..
|
54799 |
19-Dec-1999 |
green |
M_PREPEND-related cleanups (unregisterifying struct mbuf *s).
|
54601 |
14-Dec-1999 |
jlemon |
Use SEQ_* macros for comparing sequence space numbers.
Reviewed by: truckman
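
For context, a sketch of why sequence numbers need dedicated comparison helpers: TCP sequence space wraps modulo 2**32, so a plain `<` gives the wrong answer across the wrap. The Python functions below imitate the usual signed-difference technique; their names and the `_signed32` helper are illustrative, not the kernel's macros.

```python
def _signed32(x: int) -> int:
    """Interpret the low 32 bits of x as a signed 32-bit integer."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x & 0x80000000 else x


def seq_lt(a: int, b: int) -> bool:
    """True if sequence number a is before b in wrapped 32-bit space."""
    return _signed32(a - b) < 0


def seq_leq(a: int, b: int) -> bool:
    return _signed32(a - b) <= 0


def seq_gt(a: int, b: int) -> bool:
    return _signed32(a - b) > 0


def seq_geq(a: int, b: int) -> bool:
    return _signed32(a - b) >= 0
```

For example, 0xFFFFFFF0 is "before" 0x10 once the counter has wrapped, even though it is numerically larger.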
|
54526 |
13-Dec-1999 |
shin |
Always set INP_IPV4 flag for IPv4 pcb entries, because netstat needs it to print out protocol specific pcb info.
A patch submitted by guido@gvr.org, and asmodai@wxs.nl also reported the problem. Thanks and sorry for your troubles.
Submitted by: guido@gvr.org Reviewed by: shin
|
54421 |
11-Dec-1999 |
jlemon |
According to RFC 793, a reset should be honored if the sequence number is within the receive window. Follow this behavior, instead of only allowing resets at last_ack_sent.
Pointed out by: jayanth@yahoo-inc.com
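
The two acceptance rules can be compared with a small sketch. The function names are hypothetical; `rcv_nxt` and `rcv_wnd` follow the RFC 793 variable names RCV.NXT and RCV.WND, and the zero-window case is deliberately not modelled.

```python
SEQ_MOD = 1 << 32  # TCP sequence numbers wrap modulo 2**32


def rst_acceptable_old(seq: int, last_ack_sent: int) -> bool:
    """Older behaviour per the log entry: accept a reset only at last_ack_sent."""
    return seq == last_ack_sent


def rst_acceptable_window(seq: int, rcv_nxt: int, rcv_wnd: int) -> bool:
    """RFC 793 style: accept a reset whose sequence number lies in
    [rcv_nxt, rcv_nxt + rcv_wnd), computed modulo 2**32.
    Assumes rcv_wnd > 0."""
    return (seq - rcv_nxt) % SEQ_MOD < rcv_wnd
```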
|
54415 |
10-Dec-1999 |
archie |
Fix a '&&' that should have been a '&'.
Submitted by: Erik Salander <erik@whistle.com>
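
The log does not quote the affected expression, so the following is only a generic illustration of this class of bug, with hypothetical flag values. In C, `flags && FLAG` is true whenever both operands are nonzero, while `flags & FLAG` tests the specific bit; the Python functions mirror the two behaviours.

```python
FLAG_A = 0x01  # hypothetical flag bits
FLAG_B = 0x02


def has_flag_buggy(flags: int, flag: int) -> bool:
    """Mirrors C's `flags && flag`: true whenever both operands are nonzero."""
    return bool(flags) and bool(flag)


def has_flag_fixed(flags: int, flag: int) -> bool:
    """Mirrors C's `flags & flag`: true only when the requested bit is set."""
    return (flags & flag) != 0
```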
|
54376 |
09-Dec-1999 |
archie |
Fix several typos.
Submitted by: Erik Salander <erik@whistle.com>
|
54304 |
08-Dec-1999 |
shin |
Make this buildable with MROUTING defined.
Specified by: eivind, phk
|
54263 |
07-Dec-1999 |
shin |
udp IPv6 support, IPv6/IPv4 tunneling support in kernel, packet divert at kernel for IPv6/IPv4 translator daemon
This includes queue related patch submitted by jburkhol@home.com.
Submitted by: queue related patch from jburkhol@home.com Reviewed by: freebsd-arch, cvs-committers Obtained from: KAME project
|
54228 |
06-Dec-1999 |
guido |
Last minute patch that I forgot to apply: check return code of iplattach()
|
54221 |
06-Dec-1999 |
guido |
Revive mlfk_ipl here. This version is slightly changed from the old one: an unnecessary define (KLD_MODULE) has been deleted and the initialisation of the module is done after domaininit was called to be sure inet is running.
Some slight changes were made to ip_auth.c and ip_state.c in order to ensure that sys/systm.h is included in case we make a kld.
Make sure ip_fil does not include osreldate in kernel mode.
Remove mlfk_ipl.c from here: no sources allowed in these directories!
|
54175 |
06-Dec-1999 |
archie |
Miscellaneous fixes/cleanups relating to ipfw and divert(4):
- Implement 'ipfw tee' (finally)
- Divert packets by calling new function divert_packet() directly instead of going through protosw[].
- Replace kludgey global variable 'ip_divert_port' with a function parameter to divert_packet()
- Replace kludgey global variable 'frag_divert_port' with a function parameter to ip_reass()
- style(9) fixes
Reviewed by: julian, green
|
54018 |
02-Dec-1999 |
jlemon |
Change the delayed ack time from 200ms to 100ms.
This results in closer behavior to earlier versions, where the fixed 200ms timer actually resulted in a delay anywhere from 1..200ms, with the average delay being 100ms.
Pointed out by: dg
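
The averaging argument can be checked with a few lines of arithmetic, assuming ACK-triggering segments arrive at uniformly distributed offsets within a free-running 200 ms timer period and are acknowledged at the next timer tick. The variable names are illustrative.

```python
PERIOD_MS = 200  # period of the old fixed timer, in milliseconds

# A segment arriving `offset` ms into the period waits until the next tick.
delays = [PERIOD_MS - offset for offset in range(PERIOD_MS)]  # 200, 199, ..., 1

shortest = min(delays)
longest = max(delays)
average = sum(delays) / len(delays)
```

Under this model the delay ranges from 1 to 200 ms and averages 100.5 ms, which matches the "about 100 ms" figure given in the message.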
|
53716 |
26-Nov-1999 |
luigi |
RTFREE the correct route entry in dummynet_io(). The previous code failed in handling things like "forward" actions.
Reported-and-tested-by: Jean-Hugues ROYER jhroyer@joher.com
|
53645 |
23-Nov-1999 |
guido |
Get rid of useless osreldate include for KLD/LKM modules (sys/param.h already carries what is needed). This is needed for the KLD support.
|
53642 |
23-Nov-1999 |
guido |
Add kernel parts of revived ipfilter (3.3.3.)
|
53541 |
22-Nov-1999 |
shin |
KAME netinet6 basic part(no IPsec,no V6 Multicast Forwarding, no UDP/TCP for IPv6 yet)
With this patch, you can assign an IPv6 address automatically and reply to IPv6 pings.
Reviewed by: freebsd-arch, cvs-committers Obtained from: KAME project
|
53353 |
18-Nov-1999 |
peter |
Fix a warning and a potential panic if TCPDEBUG is active. (tp is a wild pointer and used by TCPDEBUG2())
|
53295 |
17-Nov-1999 |
phk |
The logic for blackhole processing does not free mbufs if the blackhole flag is set.
PR: 14958 Submitted by: Larry Baird <lab@gta.com> Reviewed by: phk
|
53187 |
15-Nov-1999 |
jmb |
add two more codes to ICMP error 12 (Parameter Problem). these two are detailed in RFC1700.
Reviewed by: Garrett Wollman <wollman@khavrinen.lcs.mit.edu>
|
53038 |
09-Nov-1999 |
phantom |
Restore sub-chapters order.
PR: docs/14766 Submitted by: Kazutoshi Kubota <kazu@iworks.co.jp>
|
52952 |
07-Nov-1999 |
jlemon |
Undo rev 1.10, which took out TH_FIN from the CLOSING state. This breaks simultaneous closes.
|
52904 |
05-Nov-1999 |
shin |
KAME related header files additions and merges. (only those which don't affect c source files so much)
Reviewed by: cvs-committers Obtained from: KAME project
|
52377 |
18-Oct-1999 |
sheldonh |
Append missing newline to log() message for permanent ARP modification attempt warning, which was added in rev 1.48.
PR: 14371 Submitted by: sec@pi.musin.de (Stefan `Sec` Zehl)
|
52089 |
10-Oct-1999 |
peter |
Nuke the old antique copy of ipfilter from the tree. This is old enough to be dangerous. It will better serve us as a port building a KLD, ala SKIP.
The hooks are staying although it would be better to port and use the NetBSD pfil interface rather than have custom hooks.
|
52070 |
09-Oct-1999 |
green |
Implement RLIMIT_SBSIZE in the kernel. This is a per-uid sockbuf total usage limit.
|
51727 |
27-Sep-1999 |
ru |
Properly handle the case when either the aliasing or source address of the link are equal to the default aliasing address. Do not zero them!
This will fix the problem with non-working links added with the source and/or aliasing address equal to the default aliasing address, but the default aliasing address is set later, after the link has been set up, like both natd(8) and ppp(8) do (for objective reasons).
Reviewed by: Brian Somers <brian@FreeBSD.org>, Eivind Eklund <eivind@FreeBSD.org>, Charles Mott <cmott@srv.net>
|
51658 |
25-Sep-1999 |
phk |
Remove five now unused fields from struct cdevsw. They should never have been there in the first place. A GENERIC kernel shrinks almost 1k.
Add a slightly different safetybelt under nostop for tty drivers.
Add some missing FreeBSD tags
|
51550 |
22-Sep-1999 |
ru |
ReLink() partial links in FindLinkOut() in the same manner as we do it in FindLinkIn(). This will make TcpMonitorIn()/TcpMonitorOut() happy.
Reviewed by: eivind
|
51506 |
21-Sep-1999 |
ru |
Restore previous version of FindLinkIn().
Instead, natd(8) should be fixed to call PacketAliasSetAddress() as part of initialization, as required by libalias(3).
|
51494 |
21-Sep-1999 |
ru |
- Make partially specified permanent links (without `dst_addr' and/or `dst_port') work for outgoing packets.
- Make permanent links whose `alias_addr' matches the primary aliasing address `aliasAddress' work for incoming packets.
- Typo fixes.
Reviewed by: brian, eivind
|
51491 |
21-Sep-1999 |
brian |
sys/errno.h -> errno.h
|
51381 |
19-Sep-1999 |
green |
Change so_cred's type to a ucred, not a pcred. This makes more sense, actually. Make a sonewconn3() which takes an extra argument (proc) so new sockets created with sonewconn() from a user's system call get the correct credentials, not just the parent's credentials.
|
51320 |
16-Sep-1999 |
lile |
Re-arrange the arp code so that fddi arps work properly.
|
51282 |
14-Sep-1999 |
des |
Reorder.
|
51279 |
14-Sep-1999 |
des |
Fix some more disordering, as well as the description string for the net.inet.tcp.drop_synfin sysctl, which for some mysterious reason said "Drop TCP packets with FIN+ACK set" (instead of "...with SYN+FIN set")
|
51209 |
12-Sep-1999 |
des |
Add the net.inet.tcp.restrict_rst and net.inet.tcp.drop_synfin sysctl variables, conditional on the TCP_RESTRICT_RST and TCP_DROP_SYNFIN kernel options, respectively. See the comments in LINT for details.
|
51125 |
10-Sep-1999 |
ru |
- Optimization to the previous (rev 1.15) commit.
Requested by: eivind Discussed with: eivind Reviewed by: brian, eivind
|
51107 |
09-Sep-1999 |
ru |
Handle TCP reset sequence properly.
In the words of originator:
:If an incoming connection is initiated through natd and deny_incoming is
:not set, then a new alias_link structure is created to handle the link.
:If there is nothing listening for the incoming connection, then the kernel
:responds with a RST for the connection. However, this is not processed
:correctly in libalias/alias.c:TcpMonitor{In,Out} and
:libalias/alias_db.c:SetState{In,Out} as it thinks a connection
:has been established and therefore applies a timeout of 86400 seconds
:to the link.
:
:If many of these half-connections are initiated (during, for example, a
:port scan of the host), then many thousands of unnecessary links are
:created and the resident size of natd balloons to 20MB or more.
PR: 13639 Reviewed by: brian
|
51091 |
08-Sep-1999 |
ru |
Fix typo.
|
50705 |
31-Aug-1999 |
jlemon |
Simplify, and return an error if the user attempts to set a TCP time value which results in < 1 tick.
Suggested by: bde
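
A minimal sketch of the kind of check described, assuming a simple truncating conversion between milliseconds and ticks. The function names and the `hz` parameter are hypothetical and the kernel's actual rounding may differ.

```python
def ms_to_ticks(ms: int, hz: int) -> int:
    """Convert a user-supplied time in milliseconds to clock ticks.

    Rejects values that would truncate to less than one tick.
    """
    ticks = ms * hz // 1000
    if ticks < 1:
        raise ValueError(f"{ms} ms is shorter than one tick at hz={hz}")
    return ticks


def ticks_to_ms(ticks: int, hz: int) -> int:
    """Convert ticks back to milliseconds for display to the user."""
    return ticks * 1000 // hz
```

With `hz=100`, a value of 15 ms is stored as 1 tick and reads back as 10 ms, a small example of the rounding effects such a conversion can introduce.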
|
50704 |
31-Aug-1999 |
jlemon |
Remove conversion macros that were used during development.
|
50682 |
31-Aug-1999 |
jlemon |
Add a SYSCTL_PROC so that TCP timer values are now expressed to the user in ms, while they are stored internally as ticks. Note that there probably are rounding bogons here, especially on the alpha.
|
50673 |
30-Aug-1999 |
jlemon |
Restructure TCP timeout handling:
- eliminate the fast/slow timeout lists for TCP and instead use a callout entry for each timer.
- increase the TCP timer granularity to HZ
- implement "bad retransmit" recovery, as presented in "On Estimating End-to-End Network Path Properties", by Allman and Paxson.
Submitted by: jlemon, wollmann
|
50597 |
29-Aug-1999 |
billf |
Add $FreeBSD$ and spell Eklund properly.
Approved by: brian (well, he approved adding $Id$)
|
50596 |
29-Aug-1999 |
obrien |
Remove extra indenting of `break' statements introduced in rev 1.89, plus wrap some long lines from that revision.
While here, wrap some other long lines.
|
50561 |
29-Aug-1999 |
des |
Include the correct header for the IPSTEALTH option.
|
50556 |
29-Aug-1999 |
bde |
Oops, I missed a cast in rev.1.119.
|
50512 |
28-Aug-1999 |
lile |
It is much easier to arp if you don't truncate your arp-reply's. [affects token-ring only]
|
50496 |
28-Aug-1999 |
green |
Also make the "other" packets counter resettable.
|
50477 |
28-Aug-1999 |
peter |
$Id$ -> $FreeBSD$
|
50476 |
28-Aug-1999 |
peter |
$Id$ -> $FreeBSD$
|
50474 |
27-Aug-1999 |
green |
Correction: uid -> gid (comment)
|
50426 |
26-Aug-1999 |
jlemon |
Add readonly OID ``net.inet.tcp.tcbhashsize'' so it is possible to discover the size of the TCB hashtable on a running system.
|
50273 |
24-Aug-1999 |
bde |
Cast pointers to [u]intptr_t instead of casting them to [u_]long. Don't depend on gcc's feature of casting lvalues, especially for direct assignment where it doesn't even simplify the syntax. Cosmetic.
|
50194 |
22-Aug-1999 |
brian |
Allow ppp to work with Nortel Networks Extranet Switch product and Windows NT tunneling.
Submitted by: Chain Lee <chain@nortelnetworks.com>
|
50175 |
22-Aug-1999 |
hoek |
Typo: 102 => 192 (PR: docs/13310 - Maxim Sobolev <sobomax@altavista.net>)
|
50129 |
21-Aug-1999 |
green |
To christen the brand new security category for syslog, we get IPFW using syslog(3) (log(9)) for its various purposes! This long-awaited change also includes such nice things as:
* macros expanding into _two_ comma-delimited arguments!
* snprintf!
* more snprintf!
* linting and criticism by more people than you can shake a stick at!
* a slightly more uniform message style than before!
and last but not least
* no less than 5 rewrites!
Reviewed by: committers
|
50043 |
19-Aug-1999 |
csgr |
Fix breakage if blackhole=1 and tiflags & TH_SYN, plus style(9) fixes
Submitted by: Jonathon Lemon
|
50015 |
18-Aug-1999 |
csgr |
Slight tweak to tcp.blackhole to add optional behaviour to drop any segment arriving at a closed port.
tcp.blackhole=1 - only drop SYN without RST
tcp.blackhole=2 - drop everything without RST
tcp.blackhole=0 - always send RST - default behaviour
This confuses nmap -sF or -sX or -sN quite badly.
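
The three settings can be summarised as a small decision function. This is a sketch of the behaviour as described in the message, for a segment arriving at a closed port; the function name is hypothetical, and rejecting other values is an assumption of the sketch, not something the log specifies.

```python
def send_rst(blackhole: int, is_syn: bool) -> bool:
    """Decide whether a segment to a closed port is answered with a RST.

    blackhole=0: always send RST (default behaviour)
    blackhole=1: drop SYN segments silently, answer everything else
    blackhole=2: drop every segment silently
    """
    if blackhole == 0:
        return True
    if blackhole == 1:
        return not is_syn
    if blackhole == 2:
        return False
    # Assumption of this sketch: other values are not modelled.
    raise ValueError(f"unsupported blackhole value: {blackhole}")
```

Scans that rely on a RST coming back for non-SYN probes get no answer at all with the value 2, which is consistent with the remark about nmap.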
|
49988 |
17-Aug-1999 |
billf |
Fix a printf() formatter to match its variable.
Reviewed by: bde, luigi
|
49968 |
17-Aug-1999 |
csgr |
Add net.inet.tcp.blackhole and net.inet.udp.blackhole sysctl knobs.
With these knobs on, refused connection attempts are dropped without sending a RST, or Port unreachable in the UDP case. In the TCP case, sending of RST is inhibited iff the incoming segment was a SYN.
Docs and rc.conf settings to follow.
|
49828 |
15-Aug-1999 |
mpp |
Various man page cleanup:
- Sort xrefs
- FreeBSD.ORG -> FreeBSD.org
- Be consistent with section names as outlined in mdoc(7)
- Other misc mdoc cleanup.
PR: doc/13144 Submitted by: Alexy M. Zelkin <phantom@cris.net>
|
49630 |
11-Aug-1999 |
luigi |
Implement probabilistic rule match in ipfw. Each rule can be associated with a match probability to achieve non-deterministic behaviour of the firewall. This can be extremely useful for testing purposes such as simulating random packet drop without having to use dummynet (which already does the same thing), and simulating multipath effects and the associated out-of-order delivery (this time in conjunction with dummynet).
The overhead on normal rules is just one comparison with 0.
Since it would have been trivial to implement this by just adding a field to the ip_fw structure, I decided to do it in a backward-compatible way (i.e. struct ip_fw is unchanged, and as a consequence you don't need to recompile ipfw if you don't want to use this feature), since this was also useful for -STABLE.
When, at some point, someone decides to change struct ip_fw, please add a length field and a version number at the beginning, so userland apps can keep working even if they are out of sync with the kernel.
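
A toy model of a probabilistic match, under a loudly stated assumption: a stored probability of 0 is taken to mean "feature not used, rule always matches", which is one reading of the remark that normal rules cost a single comparison with 0. The function name and the encoding are hypothetical and do not describe the actual ip_fw layout.

```python
import random


def rule_fires(match_prob: float, rng: random.Random) -> bool:
    """Return True if a rule whose other criteria matched should take effect.

    match_prob == 0 is assumed to mean the feature is unused (always match);
    otherwise the rule takes effect with probability match_prob.
    """
    if match_prob == 0:  # the single extra comparison for ordinary rules
        return True
    return rng.random() < match_prob
```

A deny rule evaluated this way with `match_prob=0.3` would drop roughly 30% of the packets it matches, which is the random-loss use case the message mentions.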
|
49628 |
11-Aug-1999 |
luigi |
Add spl() protection to prevent the timer from being invoked multiple times, which resulted in higher bandwidth and lower delays.
Reported-by: Jamshid Madhavi
|
49603 |
10-Aug-1999 |
des |
Add net.inet.icmp.log_redirect and net.inet.icmp.drop_redirect, for respectively logging and dropping ICMP REDIRECT packets.
Note that there is no rate limiting on the log messages, so log_redirect should be used with caution (preferably only for debugging purposes).
|
49350 |
01-Aug-1999 |
green |
Make ipfw's logging more dynamic. Now, log will use the default limit _or_ you may specify "log logamount number" to set logging specifically for the rule. In addition, "ipfw resetlog" has been added, which will reset the logging counters on any/all rule(s). ipfw resetlog does not affect the packet/byte counters (as ipfw reset does), and is the only "set" command that can be run at securelevel >= 3. This should address complaints about not being able to set logging amounts, not being able to restart logging at a high securelevel, and not being able to just reset logging without resetting all of the counters in a rule.
|
49194 |
28-Jul-1999 |
green |
8 -> NBBY
|
49193 |
28-Jul-1999 |
green |
Correct a really gross comment format.
|
48886 |
18-Jul-1999 |
jmb |
fix comment re: RST received in TIME_WAIT to match the code.
|
48788 |
12-Jul-1999 |
green |
Correct a mistake in so_cred changes. In practice, I don't think that it would make a difference. However, my previous diff _did_ change the behavior in some way (not necessarily break it), so I'm fixing it.
Found by: bde Submitted by: bde
|
48758 |
11-Jul-1999 |
green |
Two new sysctls: net.inet.tcp.getcred and net.inet.udp.getcred. These take a sockaddr_in[2] (local, then remote) and return a struct ucred. Example code for these is at: http://www.FreeBSD.org/~green/inetd_ident.patch http://www.FreeBSD.org/~green/freebsd4.c (for pidentd)
Reviewed by: bde
|
48578 |
05-Jul-1999 |
msmith |
Use the new tunable macros for the net.inet.tcp.tcbhashsize tunable.
|
48224 |
25-Jun-1999 |
pb |
In in_pcbconnect(), check the return value from in_pcbbind() and exit on errors.
If we don't, in_pcbrehash() is called without a preceding in_pcbinshash(), causing a crash.
There are apparently several conditions that could cause the crash; PR misc/12256 is only one of these.
PR: misc/12256
|
48102 |
22-Jun-1999 |
brian |
Don't get caught in an infinite recursion when PKT_ALIAS_REVERSE is set. Document PKT_ALIAS_REVERSE.
Pointed out by: Jonathan Hanna <jh@cr1003333-a.crdva1.bc.home.com> PR: 12304
|
48023 |
19-Jun-1999 |
green |
This is the much-awaited cleaned up version of IPFW [ug]id support. All relevant changes have been made (including ipfw.8).
|
48015 |
19-Jun-1999 |
green |
Add RCS strings to kernel ipfilter files.
|
48013 |
19-Jun-1999 |
green |
This should fix ipfilter for everyone it was broken for. CDEV_MAJOR is _not_ -1.
Noticed by: users on freebsd-current
|
47992 |
17-Jun-1999 |
green |
Reviewed by: the cast of thousands
This is the change to struct sockets that gets rid of so_uid and replaces it with a much more useful struct pcred *so_cred. This is here to be able to do socket-level credential checks (i.e. IPFW uid/gid support, to be added to HEAD soon). Along with this comes an update to pidentd which greatly simplifies the code necessary to get a uid from a socket. Soon to come: a sysctl() interface to finding individual sockets' credentials.
|
47960 |
16-Jun-1999 |
tegge |
Close a race window where a tcp socket is closed while tcp_pcblist is copying out tcp socket info, causing a NULL pointer to be dereferenced.
|
47877 |
11-Jun-1999 |
ru |
Don't accept divert/tee/pipe rules without corresponding option.
PR: 10324 Reviewed by: luigi
|
47720 |
04-Jun-1999 |
peter |
Plug a mbuf leak in tcp_usr_send(). pru_send() routines are expected to either enqueue or free their mbuf chains, but tcp_usr_send() was dropping them on the floor if the tcpcb/inpcb has been torn down in the middle of a send/write attempt. This has been responsible for a wide variety of mbuf leak patterns, ranging from slow gradual leakage to rather rapid exhaustion. This has been a problem since before 2.2 was branched and appears to have been fixed in rev 1.16 and lost in 1.23/1.28.
Thanks to Jayanth Vijayaraghavan <jayanth@yahoo-inc.com> for checking (extensively) into this on a live production 2.2.x system and confirming that it was the actual cause of the leak and that this looks like it fixes it. The machine in question was losing (from memory) about 150 mbufs per hour under load and a change similar to this stopped it. (Don't blame Jayanth for this patch though)
An alternative approach to this would be to recheck SS_CANTSENDMORE etc inside the splnet() right before calling pru_send() after all the potential sleeps, interrupts and delays have happened. However, this would mean exposing knowledge of the tcp stack's reset handling and removal of the pcb to the generic code. There are other things that call pru_send() directly though.
Problem originally noted by: John Plevyak <jplevyak@inktomi.com>
|
47640 |
31-May-1999 |
phk |
Simplify cdevsw registration.
The cdevsw_add() function now finds the major number(s) in the struct cdevsw passed to it. cdevsw_add_generic() is no longer needed, cdevsw_add() does the same thing.
cdevsw_add() will print a message if the d_maj field looks bogus.
Remove nblkdev and nchrdev variables. Most places they were used bogusly. Instead check a dev_t for validity by seeing if devsw() or bdevsw() returns NULL.
Move bdevsw() and devsw() functions to kern/kern_conf.c
Bump __FreeBSD_version to 400006
This commit removes: 72 bogus makedev() calls 26 bogus SYSINIT functions
if_xe.c bogusly accessed cdevsw[], author/maintainer please fix.
I4b and vinum not changed. Patches emailed to authors. LINT probably broken until they catch up.
|
47625 |
30-May-1999 |
phk |
This commit should be an extensive NO-OP:
Reformat and initialize correctly all "struct cdevsw".
Initialize the d_maj and d_bmaj fields.
The d_reset field was not removed, although it is never used.
I used a program to do most of this, so all the files now use the same consistent format. Please keep it that way.
Vinum and i4b not modified, patches emailed to respective authors.
|
47547 |
27-May-1999 |
dg |
Added net.inet.tcp.path_mtu_discovery variable which when set to 0 (default 1) disables PMTUD globally. Although PMTUD can be disabled in the standard case by locking the MTU on a static route (including the default route), this method doesn't work in the face of dynamic routing protocols like gated.
|
47546 |
27-May-1999 |
dg |
Made net.inet.ip.intr_queue_maxlen writeable.
|
47455 |
24-May-1999 |
luigi |
close pr 10889:
+ add a missing call to dn_rule_delete() when flushing firewall rules, thus preventing possible panics due to dangling pointers (this was already done for single rule deletes).
+ improve "usage" output in ipfw(8)
+ add a few checks to ipfw pipe parameters and make it a bit more tolerant of common mistakes (such as specifying kbit instead of Kbit)
PR: kern/10889 Submitted by: Ruslan Ermilov
|
47427 |
23-May-1999 |
brian |
brucify
Mentioned by: sprice@hiwaay.net
|
47344 |
20-May-1999 |
eivind |
Make incoming packets work as keepalives, too. This should fix problems for some games.
Notified of problem by: tim@turbinegames.com
|
47023 |
11-May-1999 |
peter |
"fix" warning. This still needs to be kld-ified some day (or removed).
|
46696 |
08-May-1999 |
peter |
Pre-declare struct proc to avoid 'inside param list' warnings.
|
46594 |
06-May-1999 |
peter |
Fix two warnings; and note a problem where a pointer is stored in an int variable - this can't work on an Alpha.
|
46568 |
06-May-1999 |
peter |
Add sufficient braces to keep egcs happy about potentially ambiguous if/else nesting.
|
46420 |
04-May-1999 |
luigi |
Free the dummynet descriptor in ip_dummynet, not in the called routines. The descriptor contains parameters which could be used within those routines (eg. ip_output() ).
On passing, add IPPROTO_PGM entry to netinet/in.h
|
46395 |
04-May-1999 |
brian |
Add missing ``.''.
|
46393 |
04-May-1999 |
luigi |
Forgot to pass the right pointer to dst to dummynet_io() (-stable and releng2 were already safe).
Debugged-By: phk
|
46385 |
04-May-1999 |
luigi |
assorted dummynet cleanup:
+ plug an mbuf leak when dummynet used with bridging
+ make prototype of dummynet_io consistent with usage
+ code cleanup so that now bandwidth regulation is precise to the bit/s and not to (8*HZ) bit/s as before.
|
46381 |
03-May-1999 |
billf |
Add sysctl descriptions to many SYSCTL_XXXs
PR: kern/11197 Submitted by: Adrian Chadd <adrian@FreeBSD.org> Reviewed by: billf(spelling/style/minor nits) Looked at by: bde(style)
|
46155 |
28-Apr-1999 |
phk |
This Implements the mumbled about "Jail" feature.
This is a seriously beefed up chroot kind of thing. The process is jailed along the same lines as a chroot does it, but with additional tough restrictions imposed on what the superuser can do.
For all I know, it is safe to hand over the root bit inside a prison to the customer living in that prison, this is what it was developed for in fact: "real virtual servers".
Each prison has an ip number associated with it, which all IP communications will be coerced to use and each prison has its own hostname.
Needless to say, you need more RAM this way, but the advantage is that each customer can run their own particular version of apache and not stomp on the toes of their neighbors.
It generally does what one would expect, but setting up a jail still takes a little knowledge.
A few notes:
I have no scripts for setting up a jail, don't ask me for them.
The IP number should be an alias on one of the interfaces.
mount a /proc in each jail, it will make ps more useable.
/proc/<pid>/status tells the hostname of the prison for jailed processes.
Quotas are only sensible if you have a mountpoint per prison.
There are no provisions for stopping resource-hogging.
Some "#ifdef INET" and similar may be missing (send patches!)
If somebody wants to take it from here and develop it into more of a "virtual machine" they should be most welcome!
Tools, comments, patches & documentation most welcome.
Have fun...
Sponsored by: http://www.rndassociates.com/ Run for almost a year by: http://www.servetheweb.com/
|
46153 |
28-Apr-1999 |
dt |
s/static foo_devsw_installed = 0;/static int foo_devsw_installed;/. (Edited automatically)
|
46112 |
27-Apr-1999 |
phk |
Suser() simplification:
1: s/suser/suser_xxx/
2: Add new function: suser(struct proc *), prototyped in <sys/proc.h>.
3: s/suser_xxx(\([a-zA-Z0-9_]*\)->p_ucred, \&\1->p_acflag)/suser(\1)/
The remaining suser_xxx() calls will be scrutinized and dealt with later.
There may be some unneeded #include <sys/cred.h>, but they are left as an exercise for Bruce.
More changes to the suser() API will come along with the "jail" code.
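
The substitution in step 3 can be tried out with a Python translation of the sed expression. The translation is an illustration: sed's `\(` `\)` groups become plain parentheses, the literal parentheses are escaped, and `\&` (a literal ampersand) becomes `&`. The `simplify` helper is hypothetical.

```python
import re

# The backreference \1 requires the same identifier on both sides, so only
# calls of the form suser_xxx(X->p_ucred, &X->p_acflag) are rewritten.
PATTERN = re.compile(r"suser_xxx\(([a-zA-Z0-9_]*)->p_ucred, &\1->p_acflag\)")


def simplify(line: str) -> str:
    """Rewrite matching suser_xxx() calls on one source line to suser()."""
    return PATTERN.sub(r"suser(\1)", line)
```

Calls whose two arguments name different variables do not match, which fits the note that the remaining suser_xxx() calls would be examined later.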
|
46095 |
26-Apr-1999 |
luigi |
Make one pass through the firewall the default. Multiple pass (which only affects dummynet) is too confusing.
|
46016 |
24-Apr-1999 |
ache |
so_linger is in seconds, not in 1/HZ
PR: 11252 Submitted by: Martin Kammerhofer <dada@sbox.tu-graz.ac.at>
|
45998 |
24-Apr-1999 |
dt |
Use pointer arithmetic as appropriate.
|
45997 |
24-Apr-1999 |
luigi |
postpone the sending of IGMP LEAVE msg to after deleting the mc address from the address list. The latter operation on some hardware resets the card, potentially canceling the pending LEAVE pkt.
|
45926 |
21-Apr-1999 |
luoqi |
Work around an egcs optimizer bug (i386). This should fix the active ftp hang problem. A bug report has been sent to cygnus.
|
45871 |
20-Apr-1999 |
peter |
s/IPFIREWALL_MODULE/KLD_MODULE/
|
45869 |
20-Apr-1999 |
peter |
Tidy up some stray / unused stuff in the IPFW package and friends.
- unifdef -DCOMPAT_IPFW (this was on by default already)
- remove traces of in-kernel ip_nat package, it was never committed.
- Make IPFW and DUMMYNET initialize themselves rather than depend on compiled-in hooks in ip_init(). This means they initialize the same way both in-kernel and as kld modules. (IPFW initializes now :-)
|
45822 |
19-Apr-1999 |
peter |
Zap LKM option and support. Farewell old friend.
|
45743 |
17-Apr-1999 |
peter |
Convert the dummynet lkm code to be kld aware (this isn't actually used anywhere that I can see).
|
45740 |
17-Apr-1999 |
peter |
Oops, forgot this part of lkm code that's been replaced with kld.
|
45705 |
15-Apr-1999 |
eivind |
Better handling for ARP/source routing on Token Ring
Submitted by: Larry Lile <lile@stdio.com>
|
45573 |
11-Apr-1999 |
eivind |
Staticize.
|
45439 |
07-Apr-1999 |
julian |
Two cosmetic changes, one a typo and the other, a clarification.
|
45165 |
30-Mar-1999 |
nsayer |
Merge from RELENG_2_2, per luigi. Fixes the ntoh?() issue for the firewall code when called from the bridge code.
PR: 10818 Submitted by: nsayer Obtained from: luigi
|
45048 |
26-Mar-1999 |
luigi |
Use the correct length from the mbuf header instead of the one from the IP header (this would not work for bridged packets). This has been fixed long ago in the 2.2 branch.
Problem noticed by: a few people Fix suggested by: Remy Nonnenmacher
|
45025 |
25-Mar-1999 |
brian |
PacketAliasProxyRule takes a const char * Reminded by: bde
|
45008 |
24-Mar-1999 |
brian |
Add a ``const'' and remove some inconsistent prototype args.
|
44993 |
24-Mar-1999 |
luigi |
add missing #include "opt_bdg.h"
|
44979 |
23-Mar-1999 |
billf |
Remove duplicate line.
Reviewed by: eivind
|
44797 |
16-Mar-1999 |
luigi |
Fix a dummynet bug caused by passing a bad next hop address (the symptom was the msg "arp failure -- host is not on local network" that some users have seen on multihomed machines). Bug tracked down by Emmanuel Duros
|
44677 |
12-Mar-1999 |
julian |
Fix the 'fwd' option to ipfw when asked to divert to another machine. also rely less on other modules clearing static values, and clear them in a few cases we missed before. Submitted by: Matthew Reimer <mreimer@vpop.net>
|
44627 |
10-Mar-1999 |
julian |
Submitted by: Larry Lile Move the Olicom token ring driver to the officially sanctioned location of /sys/contrib. Also fix some brokenness in the generic token ring support.
Be warned that if_dl.h has been changed and SOME programs might like recompilation.
|
44616 |
09-Mar-1999 |
brian |
Remove all diagnostics to stdout/stderr with #ifdef DEBUG Statify functions in alias_nbt.c
|
44556 |
07-Mar-1999 |
brian |
Document PacketAliasPptp() and allow it to be disabled by passing INADDR_NONE.
|
44548 |
07-Mar-1999 |
brian |
Remove unused function stubs.
|
44546 |
07-Mar-1999 |
brian |
Mention that PacketAliasProxyRule() doesn't accept host names, just IP numbers.
|
44528 |
06-Mar-1999 |
archie |
When an incoming packet is reflected back as an ICMP reply, make sure we zero "m->m_pkthdr.rcvif", otherwise ipfw may wrongly match the outgoing packet. PR: kern/9723 Submitted by: David Malone <dwmalone@maths.tcd.ie>
|
44526 |
06-Mar-1999 |
brian |
Document PacketAliasProxyRule() and fix a typo.
|
44511 |
06-Mar-1999 |
wollman |
Move kernel-only declaration inside #ifdef KERNEL section.
|
44456 |
04-Mar-1999 |
wpaul |
arprequest() allocates an mbuf with m_gethdr() but does not initialize m->m_pkthdr.rcvif to NULL. Bad arprequest(). No biscuit.
|
44307 |
27-Feb-1999 |
brian |
Version 3.0: January 1, 1999 - Transparent proxying support added. - PPTP redirecting support added based on patches contributed by Dru Nelson <dnelson@redwoodsoft.com>.
Submitted by: Charles Mott <cmott@srv.net>
|
44219 |
22-Feb-1999 |
des |
Add support for stealth forwarding (forwarding packets without touching their ttl). This can be used - in combination with the proper ipfw incantations - to make a firewall or router invisible to traceroute and other exploration tools.
This behaviour is controlled by a sysctl variable (net.inet.ip.stealth) and hidden behind a kernel option (IPSTEALTH).
Reviewed by: eivind, bde
|
44165 |
20-Feb-1999 |
julian |
World, I'd like you to meet the first FreeBSD token Ring driver. This is for various Olicom cards. An IBM driver is following. This patch also adds support to tcpdump to decode packets on tokenring. Congratulations to the proud father.. (below)
Submitted by: Larry Lile <lile@stdio.com>
|
44154 |
19-Feb-1999 |
luigi |
avoid panic with pkts larger than MTU and DF set coming out of a pipe.
|
44078 |
16-Feb-1999 |
dfr |
* Change sysctl from using linker_set to construct its tree using SLISTs. This makes it possible to change the sysctl tree at runtime.
* Change KLD to find and register any sysctl nodes contained in the loaded file and to unregister them when the file is unloaded.
Reviewed by: Archie Cobbs <archie@whistle.com>, Peter Wemm <peter@netplex.com.au> (well they looked at it anyway)
|
43802 |
09-Feb-1999 |
wollman |
After wading in the cesspool of ip_input for an hour, I have managed to convince myself that nothing will break if we permit IP input while interface addresses are unconfigured. (At worst, they will hit some ULP's PCB scan and fail if nobody is listening.) So, remove the restriction that addresses must be configured before packets can be input. Assume that any unicast packet we receive while unconfigured is potentially ours.
|
43764 |
08-Feb-1999 |
julian |
remove leftover garbage line.
|
43763 |
08-Feb-1999 |
julian |
Fix for PR 9309. Divert was not feeding clean data to ifa_ifwithaddr() so it was giving bad results. Submitted by: kseel <kseel@utcorp.com>, Ruslan Ermilov <ru@ucb.crimea.ua>
|
43691 |
06-Feb-1999 |
fenner |
Use snd_nxt, not rcv_nxt, when calculating the ISS during TIME_WAIT. This was missed in the 4.4-Lite2 merge.
Noticed by: Mohan Parthasarathy <Mohan.Parthasarathy@eng.Sun.COM> and jayanth@loc201.tandem.com (vijayaraghavan_jayanth) on the tcp-impl mailing list.
|
43576 |
04-Feb-1999 |
msmith |
Nuke all the stupid ffs() stuff and use powerof2() instead. Submitted by: Bruce Evans <bde@zeta.org.au>
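For illustration, a minimal sketch of the kind of power-of-2 check referred to here. The bit trick is the one used by the powerof2() macro in <sys/param.h>; the POWEROF2 name and the valid_hash_size() helper are hypothetical and are not taken from this commit.

```c
#include <stdbool.h>

/*
 * Same bit trick as powerof2(): a power of two has exactly one bit set,
 * so clearing the lowest set bit with (x - 1) & x leaves zero.
 * Note that the expression is also true for x == 0.
 */
#define POWEROF2(x) ((((x) - 1) & (x)) == 0)

/* Hypothetical helper: accept only non-zero power-of-2 hash sizes. */
static bool
valid_hash_size(unsigned int n)
{
	return n != 0 && POWEROF2(n);
}
```

A caller validating a tunable such as a hash table size would reject any value for which valid_hash_size() returns false, since a hash mask of (size - 1) only works for powers of two.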
|
43575 |
04-Feb-1999 |
msmith |
Fix power-of-2 check for the TCB hash size.
Submitted by: Brian Feldman <green@unixhelp.org>
|
43562 |
03-Feb-1999 |
msmith |
Make TCBHASHSIZE a boot-time tunable as well, taking its value from the variable net.inet.tcp.tcbhashsize.
Requested by: David Filo <filo@yahoo-inc.com>
|
43305 |
27-Jan-1999 |
dillon |
Fix warnings in preparation for adding -Wall -Wcast-qual to the kernel compile
|
43112 |
23-Jan-1999 |
archie |
Move kernel-only declarations to within #ifdef KERNEL Prompted by: gcc warnings when compiling /sbin/ipfw
|
43066 |
22-Jan-1999 |
wollman |
Don't forward unicast packets received via link-layer multicast.
Suggested by: fenner Original complaint: Shiva Shenoy <Shiva.Shenoy@yagosys.com>
|
42902 |
20-Jan-1999 |
fenner |
Add a flag, passed to pru_send routines, PRUS_MORETOCOME. This flag means that there is more data to be put into the socket buffer. Use it in TCP to reduce the interaction between mbuf sizes and the Nagle algorithm.
Based on: "Justin C. Walker" <justin@apple.com>'s description of Apple's fix for this problem.
|
42866 |
19-Jan-1999 |
fenner |
Fix bug in last commit (la was used uninitialized if no route was passed in).
|
42777 |
18-Jan-1999 |
fenner |
Use dynamic memory allocation instead of mbuf's for multicast routing state.
Note: this requires a recompilation of netstat (but netstat has been broken since rev 1.52 of ip_mroute.c anyway)
Obtained from: Significantly based on Steve McCanne's <mccanne@cs.berkeley.edu> work for BSD/OS
|
42776 |
18-Jan-1999 |
fenner |
Rename igmp's MALLOC; it doesn't have anything to do with multicast routing.
|
42775 |
18-Jan-1999 |
fenner |
If arpresolve() gets passed a route with a null llinfo, call arplookup() to try again. This gets rid of at least one user's "arpresolve: can't allocate llinfo" errors, and arplookup() gives better error messages to help track down the problem if there really is a problem with the routing table.
|
42592 |
12-Jan-1999 |
eivind |
... _and_ the (void*) casts for %p. Next, I'll forget my own name :-(
|
42591 |
12-Jan-1999 |
eivind |
Avoid unnecessary GCCism - I hadn't noticed the __unused macro.
|
42578 |
12-Jan-1999 |
eivind |
* Print pointers using the correct type (%p) instead of %x. * Use the correct type for timeout function. * Add missing #include.
|
42574 |
12-Jan-1999 |
eivind |
Add #ifdef's to avoid unused label warning in some cases.
|
42572 |
12-Jan-1999 |
eivind |
Remove unused statics.
|
42516 |
11-Jan-1999 |
luigi |
Add a missing bzero which could be the source of instability problems reported recently (the rtentry pointer in the dummynet queue was not initialized in all cases, resulting in spurious rt_refcnt decreases in the lucky cases, and memory trashing in other cases).
|
42486 |
10-Jan-1999 |
luigi |
Remove check from where arp replies are coming from -- when doing bridging, interfaces are used in clusters so the check does not apply.
|
42454 |
10-Jan-1999 |
brian |
If we can't open alias.log, don't try to write to the resulting NULL FILE *. PR: 9403
|
42194 |
31-Dec-1998 |
luigi |
Partial fix for when ipfw is used with bridging. Bridged packets have all fields in network order, whereas ipfw expects some to be in host order. This resulted in some incorrect matching, e.g. some packets being identified as fragments, or bandwidth not being correctly enforced. (NOTE: this only affects bridge+ipfw; normal ipfw usage was already correct.)
Reported-By: Dave Alden and others.
|
42193 |
31-Dec-1998 |
luigi |
Remove some unused variables.
|
42019 |
22-Dec-1998 |
luigi |
'ip_fw_head' and 'M_IPFW' are also used in ip_dummynet so cannot be static... Reported by: Dave Alden
|
41993 |
21-Dec-1998 |
luigi |
Recover from previous dummynet screwup
|
41990 |
21-Dec-1998 |
luigi |
Restore 1.82->1.83 change deleted by mistake, per Bruce's suggestion
|
41878 |
16-Dec-1998 |
fenner |
Add missing "break"s to allow multicast routing to work.
Submitted by: Amancio Hasty <hasty@rah.star-gate.com>
|
41793 |
14-Dec-1998 |
luigi |
Last bits (i think) of dummynet for -current.
|
41759 |
14-Dec-1998 |
dillon |
Reviewed by: freebsd-current
Add bounds checking to netbios NS packet resolving code. This should prevent natd from crashing on badly formed netbios packets (as might be heard when the machine is sitting on a cable modem or certain DSL networks), and also closes potential security holes that might have exploited the lack of bounds checking in the previous version of the code.
|
41702 |
12-Dec-1998 |
dillon |
PR: kern/8990
If timer calculation results in degenerate value (0), force it to 1 to avoid divide-by-zero panic later on in calls to IGMP_RANDOM_DELAY(). I considered simply adding 1 to the timer calculation, but was unsure if the calculation was part of the IGMP standard or not so did not want to mess with it for all cases.
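A minimal sketch of the clamp described above, under stated assumptions: the FASTHZ and TIMER_SCALE constants, the integer-division timer formula, and the random_delay() helper (modelled on a "random() % X + 1" style macro) are illustrative stand-ins, not the kernel's actual definitions.

```c
#include <stdlib.h>

/* Assumed values, for illustration only. */
#define FASTHZ      5	/* fast timeout ticks per second */
#define TIMER_SCALE 10	/* response code units per second */

/*
 * Convert a response-time code to timer ticks. Integer division can
 * yield 0 for small codes, so force the result to at least 1.
 */
static int
igmp_timer_ticks(int code)
{
	int timer = code * FASTHZ / TIMER_SCALE;

	if (timer == 0)
		timer = 1;	/* avoid a modulo-by-zero below */
	return timer;
}

/* Hypothetical delay picker: returns a value in [1, ticks]. */
static int
random_delay(int ticks)
{
	return rand() % ticks + 1;
}
```

With the clamp in place, a code of 0 or 1 yields a timer of 1 rather than 0, so the modulo in random_delay() is always well defined.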
|
41591 |
07-Dec-1998 |
archie |
The "easy" fixes for compiling the kernel -Wunused: remove unreferenced static and local variables, goto labels, and functions declared but not defined.
|
41575 |
07-Dec-1998 |
eivind |
Clean up some pointer usage.
|
41514 |
04-Dec-1998 |
archie |
Examine all occurrences of sprintf(), strcat(), and str[n]cpy() for possible buffer overflow problems. Replaced most sprintf()'s with snprintf(); for others cases, added terminating NUL bytes where appropriate, replaced constants like "16" with sizeof(), etc.
These changes include several bug fixes, but most changes are for maintainability's sake. Any instance where it wasn't "immediately obvious" that a buffer overflow could not occur was made safer.
Reviewed by: Bruce Evans <bde@zeta.org.au> Reviewed by: Matthew Dillon <dillon@apollo.backplane.com> Reviewed by: Mike Spengler <mks@networkcs.com>
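The pattern described above can be sketched as follows. The function names and buffers are hypothetical examples, not code from the commit; only snprintf() and strncpy() are real library calls.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Bounded formatting: snprintf() never writes more than buflen bytes
 * and always NUL-terminates when buflen > 0, unlike sprintf().
 */
static void
format_unit(char *buf, size_t buflen, const char *name, int unit)
{
	snprintf(buf, buflen, "%s%d", name, unit);
}

/*
 * strncpy() does not terminate the destination when the source is too
 * long, so add the terminating NUL explicitly.
 */
static void
copy_name(char *dst, size_t dstlen, const char *src)
{
	if (dstlen == 0)
		return;
	strncpy(dst, src, dstlen - 1);
	dst[dstlen - 1] = '\0';
}
```

Callers would pass sizeof(buf) rather than a literal such as 16, so the bound follows the buffer if its declaration changes.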
|
41497 |
04-Dec-1998 |
dillon |
Cleanup icmp_var.h, make icmp bandlim sysctl permanent but if ICMP_BANDLIM option not defined the sysctl int value is set to -1 and read-only.
#ifdef KERNEL's added appropriately to wall off visibility of kernel routines from user code.
|
41496 |
04-Dec-1998 |
dillon |
Obtained from: "Andrey A. Chernov" <ache@nagual.pp.ru>
Quick add #ifdef KERNEL for ICMP_BANDLIM option so userland program can #include icmp_var.h
|
41487 |
03-Dec-1998 |
dillon |
Reviewed by: freebsd-current
Add ICMP_BANDLIM option and 'net.inet.icmp.icmplim' sysctl. If option is specified in kernel config, icmplim defaults to 100 pps. Setting it to 0 will disable the feature. This feature limits ICMP error responses for packets sent to bad tcp or udp ports, which does a lot to help the machine handle network D.O.S. attacks.
The kernel will report packet rates that exceed the limit at a rate of one kernel printf per second. There is one issue in regards to the 'tail end' of an attack... the kernel will not output the last report until some unrelated and valid icmp error packet is returned at some point after the attack is over. This is a minor reporting issue only.
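As a rough illustration of a packets-per-second limit of this kind, here is a minimal fixed-window sketch. The struct and function names are hypothetical and the logic is an assumption about the general technique, not the code committed here.

```c
/* Hypothetical per-second counter state. */
struct bandlim {
	long window_start;	/* second in which counting began */
	int  count;		/* responses sent in that second */
};

/*
 * Return 1 if a response may be sent at time `now` (in seconds),
 * 0 if the limit for the current second is exhausted.
 * A limit of 0 disables limiting, as described in the log message.
 */
static int
bandlim_allow(struct bandlim *bl, long now, int limit)
{
	if (limit <= 0)
		return 1;
	if (now != bl->window_start) {
		/* New one-second window: reset the counter. */
		bl->window_start = now;
		bl->count = 0;
	}
	return ++bl->count <= limit;
}
```

Because the counter is only examined when a response is about to be generated, state from the last busy second lingers until the next such event, which is consistent with the delayed final report mentioned above.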
|
41363 |
26-Nov-1998 |
eivind |
Staticize some more.
|
41252 |
19-Nov-1998 |
jdp |
Fix a couple of typos.
|
41208 |
17-Nov-1998 |
dfr |
Remove stale references to ih_next and ih_prev.
Pointed out by: Roman V. Palagin <romanp@wuppy.rcs.ru>
|
41201 |
16-Nov-1998 |
dfr |
Make the previous fix more portable.
Requested by: bde
|
41187 |
15-Nov-1998 |
guido |
The below patch helps to reduce the leakage of internal socket information when a TCP "stealth" scan is directed at a *BSD box by ensuring the window is 0 for all RST packets generated through tcp_respond() Reviewed by: Don Lewis <Don.Lewis@tsc.tdk.com> Obtained from: Bugtraq (from: Darren Reed <avalon@COOMBS.ANU.EDU.AU>)
|
41177 |
15-Nov-1998 |
dfr |
Fix printf format errors on alpha.
|
41173 |
15-Nov-1998 |
bde |
Finished updating module event handlers to be compatible with modeventhand_t.
|
41096 |
11-Nov-1998 |
dg |
Be sure to pullup entire IP header when dealing with fragment packets.
|
41059 |
10-Nov-1998 |
peter |
add #include <sys/kernel.h> where it's needed by MALLOC_DEFINE()
|
40670 |
27-Oct-1998 |
dfr |
Some optimisations to the fragment reassembly code.
Submitted by: Don Lewis <Don.Lewis@tsc.tdk.com>
|
40669 |
27-Oct-1998 |
dfr |
Fix a bug in the new fragment reassembly code which was tickled by receiving a fragment which wholly overlapped one or more existing fragments.
Submitted by: Don Lewis <Don.Lewis@tsc.tdk.com>
|
40435 |
16-Oct-1998 |
peter |
*gulp*. Jordan specifically OK'ed this..
This is the bulk of the support for doing kld modules. Two linker_sets were replaced by SYSINIT()'s. VFS's and exec handlers are self registered. kld is now a superset of lkm. I have converted most of them, they will follow as a separate commit as samples. This all still works as a static a.out kernel using LKM's.
|
39681 |
26-Sep-1998 |
dfr |
Dike out some obsolete defines which referenced ih_next and ih_prev from struct ipovly (they don't exist anymore because they don't work when pointers are 64bit).
|
39426 |
17-Sep-1998 |
fenner |
Fix the bind security fix introduced in rev 1.38 to work with multicast: - Don't bother checking for conflicting sockets if we're binding to a multicast address. - Don't return an error if we're binding to INADDR_ANY, the conflicting socket is bound to INADDR_ANY, and the conflicting socket has SO_REUSEPORT set.
PR: kern/7713
|
39389 |
17-Sep-1998 |
fenner |
Prevent modification of permanent ARP entries (PR kern/7649) Ignore ARP replies from the wrong interface (discussion on mailing list) Add interface name to a couple of error messages
|
39267 |
15-Sep-1998 |
jkoshy |
Turn off replies to ICMP echo requests for broadcast and multicast addresses by default.
Add a knob "icmp_bmcastecho" to "rc.network" to allow this behaviour to be controlled from "rc.conf".
Document the controlling sysctl variable "net.inet.icmp.bmcastecho" in sysctl(3).
Reviewed by: dg, jkh Reminded on -hackers by: Steinar Haug <sthaug@nethelp.no>
|
39119 |
12-Sep-1998 |
luigi |
Bring in new files for dummynet support
|
39078 |
11-Sep-1998 |
wollman |
Fix RST validation.
PR: 7892 Submitted by: Don.Lewis@tsc.tdk.com
|
39043 |
10-Sep-1998 |
dfr |
Ensure that m_nextpkt is set to NULL after reassembling fragments.
|
38875 |
06-Sep-1998 |
phk |
RFC 1644 has the status "Experimental Protocol", which means:
4.1.4. Experimental Protocol
A system should not implement an experimental protocol unless it is participating in the experiment and has coordinated its use of the protocol with the developer of the protocol.
Pointed out by: Steinar Haug <sthaug@nethelp.no>
|
38760 |
02-Sep-1998 |
phk |
Widen and change the layout of the IPFW structures flag element.
This will allow us to add dummynet to 3.0
Recompile /sbin/ipfw AND your kernel.
|
38754 |
02-Sep-1998 |
wollman |
Properly fragment multicast packets.
PR: 7802 Submitted by: Steve McCanne <mccanne@cs.berkeley.edu>
|
38681 |
31-Aug-1998 |
brian |
Remove OpenBSD build support - let the Makefile vary per OS rather than making it a mess and potentially screwing up cross builds. Suggested by: bde
Add Id keyword.
|
38663 |
30-Aug-1998 |
brian |
Add OpenBSD build support
|
38513 |
24-Aug-1998 |
dfr |
Re-implement tcp and ip fragment reassembly to not store pointers in the ip header which can't work on alpha since pointers are too big.
Reviewed by: Garrett Wollman <wollman@khavrinen.lcs.mit.edu>
|
38482 |
23-Aug-1998 |
wollman |
Yow! Completely change the way socket options are handled, eliminating another specialized mbuf type in the process. Also clean up some of the cruft surrounding IPFW, multicast routing, RSVP, and other ill-explored corners.
|
38373 |
17-Aug-1998 |
bde |
Fixed printf format errors.
|
38342 |
15-Aug-1998 |
bde |
Made some disgusting ifdefs even more disgusting to enable the support for `u_long cmd' ioctl args if __FreeBSD_version >= 300003. Some ioctls were broken on machines with 32-bit ints and 64-bit longs.
|
38249 |
11-Aug-1998 |
bde |
Fixed printf format errors (ntohl() returns in_addr_t = u_int32_t != long on some 64-bit systems). print_ip() should use inet_ntoa() instead of bloated inline code with 4 ntohl()s.
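A minimal userland sketch of the inet_ntoa() approach suggested above. The ip_to_text() wrapper is hypothetical; inet_ntoa() itself is the standard call and returns a pointer to a static buffer that is overwritten by the next call.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>

/*
 * Format an IPv4 address held in network byte order. This avoids
 * printing four ntohl() results with a %ld format, which mismatches
 * when the result type is 32 bits and long is 64 bits.
 */
static const char *
ip_to_text(uint32_t net_order_addr)
{
	struct in_addr a;

	a.s_addr = net_order_addr;
	return inet_ntoa(a);
}
```

The returned string can be passed directly to a "%s" conversion, so no integer format specifier depends on the width of long.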
|
38128 |
05-Aug-1998 |
bde |
Converted the last instance of hzto() to tvtohz().
|
38057 |
03-Aug-1998 |
dfr |
Use explicitly sized types when digging through packet headers.
Reviewed by: Julian Elischer <julian@whistle.com>
|
37996 |
01-Aug-1998 |
peter |
Fix a compile error if IPFIREWALL_FORWARD active without IPDIVERT.
|
37939 |
29-Jul-1998 |
kjc |
update ATM driver. (base version: midway.c 1.67 --> 1.68)
several new features are added: - support vc/vp shaping - support pvc shadow interface
code cleanup: - remove WMAYBE related code. ENI WMAYBE DMA doesn't work. - remove updating if_lastchange for every packet. - BPF related code is moved to midway.c as it should be. (bpfwrite should work if atm_pseudohdr and LLC/SNAP are prepended.) - BPF link type is changed to DLT_ATM_RFC1483. BPF now understands only LLC/SNAP!! (because bpf can't handle variable link header length.) It is recommended to use LLC/SNAP instead of NULL encapsulation for various reasons. (BPF, IPv6, interoperability, etc.)
the code has been used for months in ALTQ and KAME IPv6.
OKed by phk long time ago.
|
37745 |
18-Jul-1998 |
alex |
Don't log ICMP type and subtype for non-zero offset packet fragments.
|
37625 |
13-Jul-1998 |
bde |
Removed a bogus forward struct declaration.
Cleaned up ifdefs.
|
37624 |
13-Jul-1998 |
bde |
Fixed some longs that should have been fixed-sized types.
|
37623 |
13-Jul-1998 |
bde |
Fixed overflow and sign extension bugs in `len = min(so->so_snd.sb_cc, win) - off;'. min() has type u_int and `off' has type int, so when min() is 0 and `off' is 1, the RHS overflows to 0U - 1 = UINT_MAX. `len' has type long, so when sizeof(long) == sizeof(int), the LHS normally overflows to the correct value of -1, but when sizeof(long) > sizeof(int), the LHS is UINT_MAX.
Fixed some u_long's that should have been fixed-sized types.
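The arithmetic described above can be reproduced with fixed-width types. This is a sketch assuming a 32-bit int; min_u32(), len_buggy() and len_fixed() are hypothetical names, and the widening cast shown is one possible remedy rather than necessarily the change made in this commit.

```c
#include <stdint.h>

/* Stand-in for a min() that returns an unsigned 32-bit value. */
static uint32_t
min_u32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/*
 * Buggy form with a 64-bit result: the subtraction is carried out in
 * unsigned 32-bit arithmetic, wraps to UINT32_MAX, and is then
 * zero-extended into the wider signed type.
 */
static int64_t
len_buggy(uint32_t sb_cc, uint32_t win, int32_t off)
{
	return min_u32(sb_cc, win) - off;
}

/* Widen to the signed result type before subtracting. */
static int64_t
len_fixed(uint32_t sb_cc, uint32_t win, int32_t off)
{
	return (int64_t)min_u32(sb_cc, win) - off;
}
```

With sb_cc = 0 and off = 1, len_buggy() yields 4294967295 while len_fixed() yields -1, which is the difference between the 64-bit and 32-bit long cases described in the message.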
|
37622 |
13-Jul-1998 |
bde |
Declare tcp_seq and tcp_cc as fixed-size types. Half fixed type mismatches exposed by this (the prototype for tcp_respond() didn't match the function definition lexically, and still depends on a gcc feature to match if ints have more than 32 bits).
|
37621 |
13-Jul-1998 |
bde |
Declare id_mask as a fixed-size type.
|
37620 |
13-Jul-1998 |
bde |
Declare n_short, n_long and n_time as fixed-sized types. Don't ifdef n_long or n_short specially for alphas.
|
37498 |
08-Jul-1998 |
dg |
When not acting as a router (ipforwarding=0), silently discard source routed packets that aren't destined for us, as required by RFC-1122. PR: 7191
|
37434 |
06-Jul-1998 |
julian |
oops ended comment before the comment ended..
|
37433 |
06-Jul-1998 |
julian |
Bring back some slight cleanups from 2.2
|
37413 |
06-Jul-1998 |
julian |
Don't expect the new code to be used without the right option file being included.
|
37412 |
06-Jul-1998 |
julian |
Fix braino in switching to TAILQ macro.
|
37409 |
06-Jul-1998 |
julian |
Support for IPFW based transparent forwarding. Any packet that can be matched by a ipfw rule can be redirected transparently to another port or machine. Redirection to another port mostly makes sense with tcp, where a session can be set up between a proxy and an unsuspecting client. Redirection to another machine requires that the other machine also be expecting to receive the forwarded packets, as their headers will not have been modified.
/sbin/ipfw must be recompiled!!!
Reviewed by: Peter Wemm <peter@freebsd.org> Submitted by: Chrisy Luke <chrisy@flix.net>
|
37334 |
02-Jul-1998 |
julian |
Remove out of date comment.
|
37332 |
02-Jul-1998 |
julian |
Remove the option to keep IPFW diversion backwards compatible WRT diversion reinjection. No-one has been bitten by the new behaviour that I know of.
|
37288 |
30-Jun-1998 |
phk |
Byte count statistics of multicast vifs are invalid. The problem is caused by a wrong endianness in the sum.
PR: 7115 Submitted by: Joao Carlos Mendes Luis <jonny@jonny.eng.br>
|
37183 |
27-Jun-1998 |
jhay |
Only make struct xtcpcb visible if _NETINET_IN_PCB_H_ and _SYS_SOCKETVAR_H_ are defined. Reviewed by: bde
|
37131 |
24-Jun-1998 |
brian |
Add CUSEEME support. This has *not* been tested, nor could I find anyone to test it, so please report any problems to me.
|
37094 |
21-Jun-1998 |
bde |
Removed unused includes.
|
37077 |
20-Jun-1998 |
peter |
Merge ipfilter 3.2.3 -> 3.2.7 changes onto mainline.
|
37072 |
20-Jun-1998 |
peter |
This commit was generated by cvs2svn to compensate for changes in r37071, which included commits to RCS files with non-trunk default branches.
|
36995 |
15-Jun-1998 |
julian |
fix another typo
|
36992 |
14-Jun-1998 |
julian |
Try to narrow down the culprit sending undefined packet types through the loopback
|
36933 |
12-Jun-1998 |
julian |
Remove 3 occurrences of __FUNCTION__
|
36908 |
12-Jun-1998 |
julian |
Go through the loopback code with a broom.. Remove lots'o'hacks. looutput is now static.
Other callers who want to use loopback to allow shortcutting should call the special entrypoint for this, if_simloop(), which is specifically designed for this purpose. Using looutput for this purpose was problematic, particularly with bpf and trying to keep track of whether one should be using the characteristics of the loopback interface or the interface (e.g. if_ethersubr.c) that was requesting the loopback. There was a whole class of errors due to this mis-use, each of which had hacks to cover them up.
Consists largely of hack removal :-)
|
36906 |
12-Jun-1998 |
julian |
include opt_ipdivert.h so we get correct options
|
36903 |
12-Jun-1998 |
julian |
Allow diverted packets from the transmit side to remember if they had a recv interface and allow that state to be available after re-injection for further tests.
|
36834 |
10-Jun-1998 |
brian |
Quieten gcc 2.8.1
|
36767 |
08-Jun-1998 |
bde |
Fixed pedantic semantics errors (bitfields not of type int, signed int or unsigned int (this doesn't change the struct layout, size or alignment in any of the files changed in this commit, at least for gcc on i386's. Using bitfields of type u_char may affect size and alignment but not packing)).
|
36752 |
08-Jun-1998 |
bde |
ip_fil.h has 9 separate declarations of iplioctl() in a disgusting ifdef tangle. The previous commit to ip_fil.h didn't change the one that actually applies to the current FreeBSD kernel, of course. Fixed.
Fixed style bugs in previous commit to ip_fil.h.
|
36735 |
07-Jun-1998 |
dfr |
This commit fixes various 64bit portability problems required for FreeBSD/alpha. The most significant item is to change the command argument to ioctl functions from int to u_long. This change brings us in line with various other BSD versions. Driver writers may like to use (__FreeBSD_version == 300003) to detect this change.
The prototype FreeBSD/alpha machdep will follow in a couple of days time.
|
36725 |
07-Jun-1998 |
bde |
Fixed pedantic semantics errors (bitfields not of type int, signed int or unsigned int).
|
36711 |
06-Jun-1998 |
brian |
Don't call PunchFWHole() ifdef NO_FW_PUNCH Pointed out by: "Steve Sims" <SimsS@IBM.Net>
|
36710 |
06-Jun-1998 |
julian |
Make sure the default value of a dummy variable is 0 so that it doesn't do anything.
|
36708 |
06-Jun-1998 |
julian |
Fix wrong data type for a pointer.
|
36707 |
06-Jun-1998 |
julian |
clean up the changes made to ipfw over the last weeks (should make the ipfw lkm work again)
|
36692 |
06-Jun-1998 |
jkoshy |
Spelling corrections.
PR: 6868 Submitted by: Josh Gilliam <josh@quick.net>
|
36681 |
05-Jun-1998 |
julian |
Reviewed by: Kirk Mckusick (mckusick@mckusick.com) Submitted by: luoqi Chen fix a type in fsck. (also add a comment that got picked up by mistake but is worth adding)
|
36678 |
05-Jun-1998 |
julian |
Reverse the default sense of the IPFW/DIVERT reinjection code so that the new behaviour is now default. Solves the "infinite loop in diversion" problem when more than one diversion is active. Man page changes follow.
The new code is in -stable as the NON default option.
|
36529 |
31-May-1998 |
peter |
Let the sowwakeup macro decide when to call sowakeup rather than have tcp "know" about it. A pending upcall would be missed, eg: used by NFS.
Obtained from: NetBSD
|
36393 |
26-May-1998 |
dg |
Fixed logic in the test to drop ICMP echo and timestamp packets when net.inet.icmp.bmcastecho = 0 by removing the extra check for the address being a multicast address. The test now relies on the link layer flags that indicate it was received via multicast. The previous logic was broken and replied to ICMP echo/timestamp broadcasts even when the sysctl option disallowed them. Reviewed by: wollman
|
36369 |
25-May-1998 |
julian |
Add optional code to change the way that divert and ipfw work together. Prior to this change, Accidental recursion protection was done by the diverted daemon feeding back the divert port number it got the packet on, as the port number on a sendto(). IPFW knew not to redivert a packet to this port (again). Processing of the ruleset started at the beginning again, skipping that divert port.
The new semantic (which is how we should have done it the first time) is that the port number in the sendto() is the rule number AFTER which processing should restart, and on a recvfrom(), the port number is the rule number which caused the diversion. This is much more flexible, and also more intuitive. If the user uses the same sockaddr received when resending, processing resumes at the rule number following the one that caused the diversion. The user can however select to resume rule processing at any rule. (0 is restart at the beginning)
To enable the new code use
option IPFW_DIVERT_RESTART
This should become the default as soon as people have looked at it a bit
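The restart semantic described above can be sketched roughly as follows. The struct rule type and first_rule_after() are hypothetical illustrations of "resume after rule N" over a list sorted by rule number, not the ipfw implementation.

```c
/* Hypothetical rule record; only the number matters for this sketch. */
struct rule {
	unsigned short number;
};

/*
 * Given rules sorted by ascending number, return the index of the
 * first rule whose number is greater than `restart`, or -1 if none.
 * restart == 0 means processing restarts at the beginning.
 */
static int
first_rule_after(const struct rule *rules, int nrules, unsigned short restart)
{
	int i;

	for (i = 0; i < nrules; i++) {
		if (rules[i].number > restart)
			return i;
	}
	return -1;
}
```

A daemon that passes back the same sockaddr it received would supply the number of the diverting rule, so matching continues with the rule that follows it and the packet is not diverted to the same socket again.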
|
36364 |
25-May-1998 |
julian |
Hide the interface name in the sin_zero section of the sockaddr_in passed to the user process for incoming packets. When the sockaddr_in is passed back to the divert socket later, use this as the primary interface lookup and only revert to the IP address when the name fails. This solves a long standing bug with divert sockets: When two interfaces had the same address (P2P for example) the interface "assigned" to the reinjected packet was sometimes incorrect. Probably we should define a "sockaddr_div" to officially hold this extended information in the same manner as sockaddr_dl.
|
36363 |
25-May-1998 |
julian |
Take the user's "IGNORE_DIVERT" argument from where the user put it and not from the PCB which HAPPENS to contain the same number most of the time, but not always.
|
36335 |
24-May-1998 |
fenner |
Take IP options into account when calculating the allowable length of the TCP payload. See RFC1122 section 4.2.2.6 . This allows Path MTU discovery to be used along with IP options.
PR: problem discovered by Kevin Lahey <kml@nas.nasa.gov>
|
36330 |
24-May-1998 |
dg |
The ipt_ptr field is 1-based (see TCP/IP Illustrated, Vol. 1, pp. 91-95), so it must be adjusted (minus 1) before using it to do the length check. I'm not sure who to give the credit to, but the bug was reported by Jennifer Dawn Myers <jdm@enteract.com>, who also supplied a patch. It was also fixed in OpenBSD previously by andreas.gunnarsson@emw.ericsson.se, and of course I did the homework to verify that the fix was correct per the specification. PR: 6738
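To illustrate why the 1-based pointer must be adjusted, here is a minimal sketch of the length check for an IP timestamp option. The ts_slot_fits() helper and its parameters are hypothetical; the layout assumed is a 4-byte option header followed by 4-byte timestamp slots, with 5 as the smallest legal pointer value.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * opt_len: total option length in bytes.
 * opt_ptr: 1-based offset of the next free timestamp slot.
 * Returns true if a 4-byte timestamp fits at that slot.
 */
static bool
ts_slot_fits(uint8_t opt_len, uint8_t opt_ptr)
{
	if (opt_ptr < 5)
		return false;	/* would point inside the option header */
	/* Convert to a 0-based offset before comparing with the length. */
	return (unsigned int)(opt_ptr - 1) + sizeof(uint32_t) <= opt_len;
}
```

For an 8-byte option with the pointer at 5, the 0-based offset is 4 and the slot ends exactly at byte 8, so it fits; without the minus-1 adjustment the check would compute 9 and wrongly reject the last slot.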
|
36321 |
24-May-1998 |
amurai |
Primary version of NetBIOS over TCP/IP. Now you can connect Windows DOMAIN as DOMAIN user through NAT function. See also RFC1002 for further detail of SMB structure.
Submitted by: Atsushi Murai <amurai@spec.co.jp>
|
36308 |
23-May-1998 |
phk |
Get more details on the "arpresolve: can't allocate llinfo" bogon.
PR: 2570 Reviewed by: phk Submitted by: fenner
|
36196 |
19-May-1998 |
jdp |
Fix a typo-bug in ipflow_reap that could cause a NULL pointer dereference. I have also sent this fix to Matt Thomas.
|
36194 |
19-May-1998 |
pb |
Move (private) struct ipflow out of ip_var.h, to reduce dependencies (for ipfw for example) on internal implementation details. Add $Id$ where missing.
|
36193 |
19-May-1998 |
dg |
Moved #define of IPFLOW_HASHBITS to ip_flow.c where I think it belongs.
|
36192 |
19-May-1998 |
dg |
Added fast IP forwarding code by Matt Thomas <matt@3am-software.com> via NetBSD, ported to FreeBSD by Pierre Beyssac <pb@fasterix.freenix.org> and minorly tweaked by me. This is a standard part of FreeBSD, but must be enabled with: "sysctl -w net.inet.ip.fastforwarding=1" ...and of course forwarding must also be enabled. This should probably be modified to use the zone allocator for speed and space efficiency. The current algorithm also appears to lose if the number of active paths exceeds IPFLOW_MAX (256), in which case it wastes lots of time trying to figure out which cache entry to drop.
|
36161 |
18-May-1998 |
guido |
Grumble...It seems I'm suffering from some mental disease. Do it correct now.
|
36159 |
18-May-1998 |
guido |
Add some parentheses for clarity and fix a bug Pointed out by: Garrett Wollman
|
36079 |
15-May-1998 |
wollman |
Convert socket structures to be type-stable and add a version number.
Define a parameter which indicates the maximum number of sockets in a system, and use this to size the zone allocators used for sockets and for certain PCBs.
Convert PF_LOCAL PCB structures to be type-stable and add a version number.
Define an external format for information about socket structures and use it in several places.
Define a mechanism to get all PF_LOCAL and PF_INET PCB lists through sysctl(3) without blocking network interrupts for an unreasonable length of time. This probably still has some bugs and/or race conditions, but it seems to work well enough on my machines.
It is now possible for `netstat' to get almost all of its information via the sysctl(3) interface rather than reading kmem (changes to follow).
|
35919 |
10-May-1998 |
jb |
Treat all internet addresses as u_int32_t.
|
35823 |
07-May-1998 |
msmith |
In the words of the submitter:
--------- Make callers of namei() responsible for releasing references or locks instead of having the underlying filesystems do it. This eliminates redundancy in all terminal filesystems and makes it possible for stacked transport layers such as umapfs or nullfs to operate correctly.
Quality testing was done with testvn, and lat_fs from the lmbench suite.
Some NFS client testing courtesy of Patrik Kudo.
vop_mknod and vop_symlink still release the returned vpp. vop_rename still releases 4 vnode arguments before it returns. These remaining cases will be corrected in the next set of patches. ---------
Submitted by: Michael Hancock <michaelh@cet.co.jp>
|
35698 |
04-May-1998 |
guido |
Refuse accelerated opens on listening sockets that have not set the TCP_NOPUSH socket option. This disables TAO for those services that do not know about T/TCP.
Reviewed by: Garrett Wollman Submitted by: Peter Wemm
|
35421 |
24-Apr-1998 |
dg |
At the request of Garrett, changed sysctl:
net.inet.tcp.delack_enabled -> net.inet.tcp.delayed_ack
|
35419 |
24-Apr-1998 |
dg |
Ensure that TCP_REXMTVAL doesn't return a value less than t_rttmin. This is believed to have been broken with the Brakmo/Peterson srtt calculation changes. The result of this bug is that TCP connections could time out extremely quickly (in 12 seconds). Also backed out jdp's partial fix for this problem in rev 1.17 of tcp_timer.c as it is obsoleted by this commit. Bug was pointed out by Kevin Lahey <kml@roller.nas.nasa.gov>.
PR: 6068
|
35370 |
21-Apr-1998 |
julian |
Remove the artificial limit on the size of the ipfw filter structure. This allows the addition of extra fields if we need them (I have plans).
|
35314 |
19-Apr-1998 |
brian |
o Support a compile-time -DNO_FW_PUNCH for portability (and those of us that don't want the functionality).
o Don't assume sizeof(long) == 4.
Ok'd by: Charles Mott <cmott@srv.net>
|
35304 |
19-Apr-1998 |
phk |
According to:
ftp://ftp.isi.edu/in-notes/iana/assignments/port-numbers
port numbers are divided into three ranges:
0 - 1023 Well Known Ports
1024 - 49151 Registered Ports
49152 - 65535 Dynamic and/or Private Ports
This patch changes the "local port range" from 40000-44999 to the range shown above (plus fix the comment in in_pcb.c).
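The three IANA ranges quoted in this entry can be expressed as a small classifier. This is only an illustration of the boundaries listed in the message; the function name and the returned labels are made up for the example.

```python
def port_class(port):
    """Classify a TCP/UDP port number into the three IANA ranges."""
    if not 0 <= port <= 65535:
        raise ValueError("port out of range: %r" % (port,))
    if port <= 1023:
        return "well-known"
    if port <= 49151:
        return "registered"
    return "dynamic/private"
```

For example, `port_class(40000)` reports "registered", while `port_class(49152)` reports "dynamic/private".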
WARNING: This may have an impact on firewall configurations!
PR: 5402 Reviewed by: phk Submitted by: Stephen J. Roznowski <sjr@home.net>
|
35256 |
17-Apr-1998 |
des |
Seventy-odd "its" / "it's" typos in comments fixed as per kern/6108.
|
35210 |
15-Apr-1998 |
bde |
Support compiling with `gcc -ansi'.
|
35174 |
13-Apr-1998 |
phk |
Wrong header length used for certain reassembled IP packets. PR: 6177 Reviewed by: phk, wollman Submitted by: Eric Sprinkle <eric@ennovatenetworks.com>
|
35065 |
06-Apr-1998 |
phk |
Use read_random()
|
35056 |
06-Apr-1998 |
phk |
Remove the last traces of TUBA.
Inspired by: PR kern/3317
|
34961 |
30-Mar-1998 |
phk |
Eradicate the variable "time" from the kernel, using various measures. "time" wasn't an atomic variable, so splfoo() protection was needed around any access to it, unless you just wanted the seconds part.
Most uses of time.tv_sec now use the new variable time_second instead.
gettime() changed to getmicrotime().
Remove a couple of unneeded splfoo() protections, the new getmicrotime() is atomic, (until Bruce sets a breakpoint in it).
A couple of places needed random data, so use read_random() instead of mucking about with time which isn't random.
Add a new nfs_curusec() function.
Mark a couple of bogosities involving the now disappeared time variable.
Update ffs_update() to avoid the weird "== &time" checks, by fixing the one remaining call that passed &time as args.
Change profiling in ncr.c to use ticks instead of time. Resolution is the same.
Add new function "tvtohz()" to avoid the bogus "splfoo(), add time, call hzto() which subtracts time" sequences.
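A rough Python model of converting a relative interval directly into clock ticks, which is what lets callers skip the "add time, then subtract time" round trip. The name `tv_to_ticks`, the default `hz=100`, and the exact rounding (round the microseconds up, then add one tick for the partly elapsed current tick) are assumptions made for this sketch, not a transcription of the kernel function.

```python
def tv_to_ticks(sec, usec, hz=100):
    """Convert a relative (sec, usec) interval to a tick count.

    Assumes hz divides one million evenly; rounding is an assumption.
    """
    tick = 1_000_000 // hz                 # microseconds per clock tick
    whole = sec * hz                       # ticks from the seconds part
    partial = (usec + tick - 1) // tick    # round microseconds up
    return whole + partial + 1             # current tick is partly used
```

Because the input is already relative, no access to the current time (and hence no spl protection) is needed to compute the timeout.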
Reviewed by: bde
|
34924 |
28-Mar-1998 |
bde |
Moved some #includes from <sys/param.h> nearer to where they are actually used.
|
34923 |
28-Mar-1998 |
bde |
Fixed style bugs (mostly) in previous commit.
|
34922 |
28-Mar-1998 |
bde |
Get socket and locking stuff by including <sys/socket.h> and <sys/lock.h>, not by including <sys/mount.h> and depending on namespace pollution in it.
|
34916 |
27-Mar-1998 |
peter |
When building into the kernel rather than as a LKM, don't compile all the LKM load/unload junk, and don't forget to register the SYSINIT so that the cdevsw entry is attached.
BTW: I think the way it builds its /dev nodes on the fly as an LKM with vnode ops is kinda cute - I guess that'd be one way to solve the devfs persistence problems.. :-) (ie: have the drivers make the nodes in /dev on disk directly if they are missing, but leave them alone if present).
|
34915 |
27-Mar-1998 |
peter |
allow open on all minors
|
34914 |
27-Mar-1998 |
peter |
A fix for a link down route cleanup panic, when the route cleanup pulls the rug out from underneath itself.
Obtained from: wollman (a few months ago, I've been using this for ages)
|
34881 |
24-Mar-1998 |
wollman |
Use the zone allocator to allocate inpcbs and tcpcbs. Each protocol creates its own zone; this is used particularly by TCP which allocates both inpcb and tcpcb in a single allocation. (Some hackery ensures that the tcpcb is reasonably aligned.) Also keep track of the number of pcbs of each type allocated, and keep a generation count (instance version number) for future use.
|
34815 |
23-Mar-1998 |
bde |
FixedSpellingErrorInAFunctionname.
|
34756 |
21-Mar-1998 |
peter |
Make it compile.. missing "opt_ipfilter.h" and missing <sys/malloc.h>
|
34751 |
21-Mar-1998 |
peter |
Some patchups for when this code is compiled in userland (!).
|
34747 |
21-Mar-1998 |
peter |
replaced by FreeBSD specific version
|
34746 |
21-Mar-1998 |
peter |
Make this compile.. There are some unpleasing hacks in here. A major unifdef session is sorely tempting but would destroy any remaining chance of tracking the original sources.
|
34745 |
21-Mar-1998 |
peter |
Merge vendor changes from 3.2.1 -> 3.2.3 onto mainline
|
34743 |
21-Mar-1998 |
peter |
This commit was generated by cvs2svn to compensate for changes in r34742, which included commits to RCS files with non-trunk default branches.
|
34697 |
20-Mar-1998 |
fenner |
Remove the check for SYN in SYN_RECEIVED state; it breaks simultaneous connect. This check was added as part of the defense against the "land" attack, to prevent attacks which guess the ISS from going into ESTABLISHED. The "src == dst" check will still prevent the single-homed case of the "land" attack, and guessing ISS's should be hard anyway.
Submitted by: David Borman <dab@bsdi.com>
|
34586 |
15-Mar-1998 |
alex |
Allow ICMP unreachable messages to be sent in response to ICMP query packets (as per Stevens volume 1 section 6.2).
|
33955 |
01-Mar-1998 |
guido |
Make sure that you can only bind a more specific address when it is done by the same uid. Obtained from: OpenBSD
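A hedged sketch of the rule this entry describes: a more specific bind on a port that is already bound to the wildcard address is only allowed for the same uid. The tuple layout and the function name are hypothetical, and the real check lives in the PCB bind path with more conditions than shown here.

```python
WILDCARD = "0.0.0.0"

def may_bind(existing, new_addr, new_port, new_uid):
    """Return False if another uid already holds the wildcard bind.

    existing is a list of (addr, port, uid) tuples (hypothetical layout).
    """
    for addr, port, uid in existing:
        same_port = port == new_port
        more_specific = addr == WILDCARD and new_addr != WILDCARD
        if same_port and more_specific and uid != new_uid:
            return False
    return True
```

The intent is that an unprivileged user cannot hijack traffic for one address of a service that another user bound to all addresses.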
|
33897 |
27-Feb-1998 |
brian |
1) in CleanupAliasData, don't nullify entry in linkTableOut since there might be permanent entries still left after calls to DeleteLink (it will be nullified by DeleteLink if all entries are deleted, won't it ?)
2) in PacketAliasSetAddress, set the aliasing address even when PKT_ALIAS_RESET_ON_ADDR_CHANGE is in effect. Just don't clean up links in this case.
Submitted by: Ari Suutari <ari@suutari.iki.fi> via: Charles Mott <cmott@srv.net> PR: 5041
|
33851 |
26-Feb-1998 |
dima |
NetBSD PR# 2772
Reviewed by: David Greenman
|
33846 |
26-Feb-1998 |
dg |
Changes to support the addition of a new sysctl variable: net.inet.tcp.delack_enabled Which defaults to 1 and can be set to 0 to disable TCP delayed-ack processing (i.e. all acks are immediate).
|
33814 |
25-Feb-1998 |
julian |
OOPs typo TCF, not TCP....
|
33804 |
25-Feb-1998 |
julian |
Bring our in.h up to date with respect to allocated IP protocol numbers. It is possible that the names may require tuning, but the numbers represent what is in rfc1700 which is the present active RFC.
|
33678 |
20-Feb-1998 |
bde |
Don't depend on "implicit int".
|
33440 |
16-Feb-1998 |
guido |
Add new sysctl variable: net.inet.ip.accept_sourceroute. It controls whether the system accepts source routed packets. Previously, regardless of the setting of net.inet.ip.sourceroute, source routed packets destined for us would be accepted. This is now controllable, with the default set to NOT accept them.
|
33268 |
12-Feb-1998 |
ache |
Replace non-existent ip_forwarding with ipforwarding (compilation error)
|
33260 |
12-Feb-1998 |
alex |
Alter ipfw's behavior with respect to fragmented packets when the packet offset is non-zero:
- Do not match fragmented packets if the rule specifies a port or TCP flags
- Match fragmented packets if the rule does not specify a port and TCP flags
Since ipfw cannot examine port numbers or TCP flags for such packets, it is now illegal to specify the 'frag' option with either ports or tcpflags. Both kernel and ipfw userland utility will reject rules containing a combination of these options.
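The matching and validation rules described in this entry can be modelled in a few lines. This is a sketch only: the function and parameter names are invented, and a real rule has many more fields than the three booleans used here.

```python
def fragment_can_match(rule_has_ports, rule_has_tcpflags, ip_offset):
    """Can a rule be considered for a packet with this fragment offset?"""
    if ip_offset == 0:
        return True   # first fragment carries the transport header
    # Later fragments have no ports or TCP flags to examine.
    return not (rule_has_ports or rule_has_tcpflags)

def rule_is_legal(uses_frag, rule_has_ports, rule_has_tcpflags):
    """Reject rules combining 'frag' with ports or tcpflags."""
    return not (uses_frag and (rule_has_ports or rule_has_tcpflags))
```

Both the kernel and the userland utility would apply a check like `rule_is_legal` so that an impossible rule is refused up front instead of silently never matching.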
BEWARE: packets that were previously passed may now be rejected, and vice versa.
Reviewed by: Archie Cobbs <archie@whistle.com>
|
33249 |
11-Feb-1998 |
guido |
Only forward source routed packets when ip_forwarding is set to 1. This means that a FreeBSD system will only forward source routed packets when both net.inet.ip.forwarding and net.inet.ip.sourceroute are set to 1.
You can hit me now ;-) Submitted by: Thomas Ptacek
|
33181 |
09-Feb-1998 |
eivind |
Staticize.
|
33134 |
06-Feb-1998 |
eivind |
Back out DIAGNOSTIC changes.
|
33130 |
06-Feb-1998 |
alex |
Don't attempt to display information which we don't have: specifically, TCP and UDP port numbers in fragmented packets when IP offset != 0.
2.2.6 candidate.
Discovered by: Marc Slemko <marcs@znep.com> Submitted by: Archie Cobbs <archie@whistle.com> w/fix from me
|
33108 |
04-Feb-1998 |
eivind |
Turn DIAGNOSTIC into a new-style option.
|
33067 |
04-Feb-1998 |
eivind |
Add #include "opt_devfs.h"
|
33058 |
03-Feb-1998 |
bde |
Added #include of <sys/queue.h> so that this file is more "self"-sufficient.
|
33054 |
03-Feb-1998 |
bde |
Forward declare some structs so that this file is more self-sufficient.
|
32925 |
31-Jan-1998 |
eivind |
Make POWERFAIL_NMI, PPS_SYNC and NATM new style options.
This also fixes a couple of defunct options; submitted by bde.
|
32920 |
31-Jan-1998 |
eivind |
Add #include "opt_devfs.h".
|
32821 |
27-Jan-1998 |
dg |
Improved connection establishment performance by doing local port lookups via a hashed port list. In the new scheme, in_pcblookup() goes away and is replaced by a new routine, in_pcblookup_local() for doing the local port check. Note that this implementation is space inefficient in that the PCB struct is now too large to fit into 128 bytes. I might deal with this in the future by using the new zone allocator, but I wanted these changes to be extensively tested in their current form first.
Also:
1) Fixed off-by-one errors in the port lookup loops in in_pcbbind().
2) Got rid of some unneeded rehashing. Added a new routine, in_pcbinshash(), to do the initial hash insertion.
3) Renamed in_pcblookuphash() to in_pcblookup_hash() for easier readability.
4) Added a new routine, in_pcbremlists(), to remove the PCB from the various hash lists.
5) Added/deleted comments where appropriate.
6) Removed unnecessary splnet() locking. In general, the PCB functions should be called at splnet()...there are unfortunately a few exceptions, however.
7) Reorganized a few structs for better cache line behavior.
8) Killed my TCP_ACK_HACK kludge. It may come back in a different form in the future, however.
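To illustrate why a hashed port list speeds up the "is this local port already taken?" check, here is a toy Python model. The class and method names are hypothetical and a Python dict stands in for the kernel's hash buckets; it shows only the idea of indexing PCBs by local port instead of walking one long list.

```python
from collections import defaultdict

class PortTable:
    """Toy index of bound local addresses keyed by local port."""

    def __init__(self):
        self.by_port = defaultdict(list)

    def insert(self, laddr, lport):
        self.by_port[lport].append(laddr)

    def remove(self, laddr, lport):
        self.by_port[lport].remove(laddr)
        if not self.by_port[lport]:
            del self.by_port[lport]      # drop empty buckets

    def in_use(self, lport):
        # One hash probe instead of a scan over every PCB.
        return lport in self.by_port
```

When an ephemeral port is being chosen, each candidate costs one probe, so the search no longer grows with the total number of connections.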
These changes have been tested on wcarchive for more than a month. In tests done here, connection establishment overhead is reduced by more than 50 times, thus getting rid of one of the major networking scalability problems.
Still to do: make tcp_fasttimo/tcp_slowtimo scale well for systems with a large number of connections. tcp_fasttimo is easy; tcp_slowtimo is difficult.
WARNING: Anything that knows about inpcb and tcpcb structs will have to be recompiled; at the very least, this includes netstat(1).
|
32773 |
25-Jan-1998 |
steve |
Fix a couple of operator precedence bugs.
PR: 5450 Submitted by: Sakari Jalovaara <sja@tekla.fi>
|
32752 |
25-Jan-1998 |
eivind |
Make TCP_COMPAT_42 a new style option.
|
32662 |
21-Jan-1998 |
fenner |
A more complete fix for the "land" attack, removing the "quick fix" from rev 1.66. This fix contains both belt and suspenders.
Belt: ignore packets where src == dst and srcport == dstport in TCPS_LISTEN. These packets can only legitimately occur when connecting a socket to itself, which doesn't go through TCPS_LISTEN (it goes CLOSED->SYN_SENT->SYN_RCVD->ESTABLISHED). This prevents the "standard" "land" attack, although it doesn't prevent the multi-homed variation.
Suspenders: send a RST in response to a SYN/ACK in SYN_RECEIVED state. The only packets we should get in SYN_RECEIVED are 1. A retransmitted SYN, or 2. An ack of our SYN/ACK. The "land" attack depends on us accepting our own SYN/ACK as an ACK in SYN_RECEIVED state; this should prevent all "land" attacks.
We also move up the sequence number check for the ACK in SYN_RECEIVED. This neither helps nor hurts with respect to the "land" attack, but puts more of the validation checking in one spot.
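The "belt" and "suspenders" checks can be sketched as two small predicates. The `Segment` tuple, the flag representation and the return strings are assumptions for the example; the real logic is part of the TCP input state machine.

```python
from collections import namedtuple

# Hypothetical, simplified view of an incoming TCP segment.
Segment = namedtuple("Segment", "src dst sport dport flags")

def drop_in_listen(seg):
    """Belt: in LISTEN, ignore segments addressed from ourselves."""
    return seg.src == seg.dst and seg.sport == seg.dport

def action_in_syn_received(seg):
    """Suspenders: a SYN/ACK in SYN_RECEIVED is answered with a RST."""
    if "SYN" in seg.flags and "ACK" in seg.flags:
        return "RST"
    return "continue"
```

A classic "land" packet such as `Segment("10.0.0.1", "10.0.0.1", 139, 139, {"SYN"})` is dropped by the first check; a reflected SYN/ACK that slips past it is reset by the second.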
PR: kern/5103
|
32561 |
16-Jan-1998 |
bde |
Fixed a missing #include in the synopsis. Fixed some wrong prototypes. Fixed a misspelled function name.
The owner of this file should add a copyright and an Id.
|
32560 |
16-Jan-1998 |
bde |
Added prototypes for functions that were documented in libalias.3 but not prototyped here.
|
32498 |
14-Jan-1998 |
brian |
Remove __libalias_version. Ppp no longer uses it.
|
32443 |
11-Jan-1998 |
eivind |
Remove use of <osreldate.h>.
Screwed up by: myself
|
32398 |
10-Jan-1998 |
steve |
Put back __libalias_version so ppp(8) builds again.
|
32396 |
10-Jan-1998 |
alex |
Sync with ipfw interface change: fw_pts is now part of a union (a necessary evil due to the 108 byte setsockopt() limit).
|
32392 |
10-Jan-1998 |
jkh |
include <net/if.h> and restore this to sanity.
|
32377 |
09-Jan-1998 |
eivind |
Teach libalias to work with IPFW firewalls (controlled by a flag).
Obtained from: Yes development tree (+ 10 lines of patches from Charles Mott, original libalias author)
|
32358 |
09-Jan-1998 |
eivind |
Make the BOOTP family new-style options (in opt_bootp.h)
|
32350 |
08-Jan-1998 |
eivind |
Make INET a proper option.
This will not change any of the object files that LINT creates; there might be differences with INET disabled, but hardly anything compiled before without INET anyway. Now the 'obvious' things will give a proper error if compiled without inet - ipx_ip, ipfw, tcp_debug. The only thing that _should_ work (but can't be made to compile reasonably easily) is sppp :-(
This commit moves struct arpcom from <netinet/if_ether.h> to <net/if_arp.h>.
|
32330 |
08-Jan-1998 |
alex |
Bump up packet and byte counters to 64-bit unsigned ints. As a consequence, ipfw's list command now adjusts its output at runtime based on the largest packet/byte counter values.
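One way to picture "adjusts its output at runtime based on the largest packet/byte counter values" is to size each column from the widest value present. This is an illustrative sketch only; the tuple layout and the output format are invented and do not reproduce the real listing.

```python
def format_counters(rules):
    """Format (rule_number, packets, bytes) tuples in aligned columns.

    Column widths are derived from the largest counters, so 64-bit
    values do not break the alignment. rules must be non-empty.
    """
    pw = max(len(str(p)) for _, p, _ in rules)
    bw = max(len(str(b)) for _, _, b in rules)
    return [f"{num:05d} {p:>{pw}} {b:>{bw}}" for num, p, b in rules]
```

With small counters the columns stay narrow, and they widen only when a counter actually needs the extra digits.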
NOTE: o The ipfw struct has changed requiring a recompile of both kernel and userland ipfw utility.
o This probably should not be brought into 2.2.
PR: 3738
|
32264 |
05-Jan-1998 |
alex |
Use LIST_FIRST/LIST_NEXT macros instead of accessing the fields lh_first and le_next.
|
32260 |
05-Jan-1998 |
alex |
Added missing parens from previous commit.
|
32257 |
05-Jan-1998 |
alex |
Bound the ICMP type bitmap now that it doesn't cover all possible ICMP type values.
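A minimal sketch of a bounded bitmap test, assuming the 32-bit bitmap size mentioned in the r27981 entry of this log. The constant and function names are hypothetical; the point is that type values beyond the bitmap must be rejected before indexing rather than read past its end.

```python
ICMP_BITMAP_BITS = 32   # assumed size, per the r27981 description

def icmp_type_selected(bitmap, icmp_type):
    """Return True if icmp_type is set in bitmap, False if out of range."""
    if not 0 <= icmp_type < ICMP_BITMAP_BITS:
        return False                      # bound check added first
    return bool((bitmap >> icmp_type) & 1)
```

For instance, with only bit 8 (echo request) set, type 8 is selected while type 40 is safely reported as not selected.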
|
32254 |
04-Jan-1998 |
alex |
Reduce the amount of time that network interrupts are blocked while zeroing & deleting rules.
Return EINVAL when zeroing a nonexistent entry.
|
32022 |
27-Dec-1997 |
alex |
Bring back part of rev 1.44 which was commented out by rev 1.58.
Reviewed by: nate
|
31987 |
25-Dec-1997 |
dg |
The spl fixes in in_setsockaddr and in_setpeeraddr that were meant to fix PR#3618 weren't sufficient since malloc() can block - allowing the net interrupts in and leading to the same problem mentioned in the PR (a panic). The order of operations has been changed so that this is no longer a problem. Needs to be brought into the 2.2.x branch. PR: 3618
|
31941 |
23-Dec-1997 |
alex |
Removed unnecessary setting of 'error' -- binding to a privileged port by a non-root user always returns EACCES.
|
31884 |
20-Dec-1997 |
bde |
Fixed gratuitous ANSIisms.
|
31882 |
19-Dec-1997 |
bde |
Don't use ANSI string concatenation to misformat a string.
|
31881 |
19-Dec-1997 |
bde |
Removed a stale comment. (We don't declare ip_len and ip_offset as short. I guess we depend on bogus ANSI value-preserving extension of u_short to int to avoid unsigned comparison bugs.)
|
31848 |
19-Dec-1997 |
julian |
Fix an incredibly horrible bug in the ipfw code where if you are using the "reset tcp" firewall command, the kernel would write ethernet headers onto random kernel stack locations.
Fought to the death by: terry, julian, archie. fix valid for 2.2 series as well.
|
31840 |
18-Dec-1997 |
dg |
Fixed a missing splx(s) bug in tcp_usr_send().
|
31838 |
18-Dec-1997 |
dg |
Call in_pcballoc() at splnet(). As near as I can tell, this won't fix any instability problems, but it was wrong nonetheless and will be required in an upcoming round of PCB changes.
|
31742 |
15-Dec-1997 |
eivind |
Throw options IPX, IPXIP and IPTUNNEL into opt_ipx.h.
The #ifdef IPXIP in netipx/ipx_if.h is OK (used from ipx_usrreq.c and ifconfig.c only).
I also fixed a typo IPXTUNNEL -> IPTUNNEL (and #ifdef'ed out the code inside, as it never could have compiled - doh.)
|
31323 |
20-Nov-1997 |
wollman |
Add Matt Dillon's quick fix hack for the self-connect DoS.
PR: 5103
|
31188 |
16-Nov-1997 |
peter |
This commit was generated by cvs2svn to compensate for changes in r31187, which included commits to RCS files with non-trunk default branches.
|
31163 |
13-Nov-1997 |
julian |
Submitted by: Archie Cobbs (IPDIVERT author)
Close a small security hole where an attacker could send packets with the IPDIVERT protocol and select how they would be diverted, thus bypassing the ipfirewall. Discovered by inspection rather than attack. (You'd have to know EXACTLY how the firewall was configured to make use of this, but...)
|
31017 |
07-Nov-1997 |
phk |
Rename some local variables to avoid shadowing other local variables.
Found by: -Wshadow
|
31016 |
07-Nov-1997 |
phk |
Remove a bunch of variables which were unused both in GENERIC and LINT.
Found by: -Wunused
|
30966 |
05-Nov-1997 |
joerg |
Make IPDIVERT a supported option. Alas, in_var.h depends on it; I hope I've found all files that actually depend on this dependency. IMHO, it's not very good practice to change the size of internal structs depending on kernel options.
|
30948 |
05-Nov-1997 |
julian |
Return the entire if info, rather than just the index number. (at least try) Interface index numbers are an abomination that should go away (at least in that form)
|
30816 |
28-Oct-1997 |
guido |
Fix bugs from my previous commit Submitted by: Bruce Evans
|
30813 |
28-Oct-1997 |
bde |
Removed unused #includes.
|
30790 |
27-Oct-1997 |
guido |
When dosourcerouting is set do not sourceroute....
|
30354 |
12-Oct-1997 |
phk |
Last major round (Unless Bruce thinks of something :-) of malloc changes.
Distribute all but the most fundamental malloc types. This time I also remembered the trick to making things static: Put "static" in front of them.
A couple of finer points by: bde
|
30309 |
11-Oct-1997 |
phk |
Distribute and staticize a lot of the malloc M_* types.
Substantial input from: bde
|
30209 |
07-Oct-1997 |
fenner |
Don't allow the window to be increased beyond what is possible to represent in the TCP header. The old code did effectively:
win = min(win, MAX_ALLOWED);
win = max(win, what_i_think_i_advertised_last_time);
so if what_i_think_i_advertised_last_time is bigger than can be represented in the header (e.g. large buffers and no window scaling) then we stuff a too-big number into a short. This fix reverses the order of the comparisons.
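The effect of the comparison order can be seen in a short Python model. `MAX_ALLOWED` is fixed at 65535 here on the assumption that no window scaling is in use, and the function names are invented for the example.

```python
MAX_ALLOWED = 65535   # assumption: 16-bit window field, no scaling

def old_window(win, last_advertised):
    win = min(win, MAX_ALLOWED)
    win = max(win, last_advertised)   # can exceed MAX_ALLOWED again
    return win

def fixed_window(win, last_advertised):
    win = max(win, last_advertised)
    win = min(win, MAX_ALLOWED)       # clamp applied last
    return win

def as_header_field(win):
    """What a 16-bit header field would actually carry."""
    return win & 0xFFFF
```

With `win = 1000` and `last_advertised = 70000`, the old order yields 70000, which truncates to 4464 in the header; the fixed order yields 65535.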
PR: kern/4712
|
30052 |
02-Oct-1997 |
dg |
Killed the SYN_RECEIVED addition from rev 1.52. It results in legitimate RST's being ignored, keeping a connection around until it times out, and thus has the opposite effect of what was intended (which is to make the system more robust to DoS attacks).
|
30005 |
30-Sep-1997 |
fenner |
Don't consider a SYN/ACK with CC but no CCECHO a proper T/TCP handshake.
Reviewed by: Rich Stevens <rstevens@kohala.com>
|
29838 |
25-Sep-1997 |
wollman |
Export ipstat via sysctl. Don't understand why this wasn't done before.
|
29681 |
21-Sep-1997 |
gibbs |
Update for new callout interface.
|
29514 |
16-Sep-1997 |
joerg |
Make TCPDEBUG a new-style option.
|
29506 |
16-Sep-1997 |
bde |
Fixed gratuitous ANSIisms.
|
29480 |
15-Sep-1997 |
ache |
Prevent overflow with fragmented packets Reviewed by: wollman
|
29366 |
14-Sep-1997 |
peter |
Update network code to use poll support.
|
29327 |
13-Sep-1997 |
peter |
Some mbuf -> sockaddr changes seem to have been missed here.
|
29268 |
10-Sep-1997 |
peter |
Allow a compile-time override of the ipfw deny rule. For a 'firewall' you don't want this (and the documentation explains why), but if you use ipfw as an as-needed casual filter which normally runs as 'allow all' then having the kernel and /sbin/ipfw get out of sync is a *MAJOR* pain in the behind.
PR: 4141 Submitted by: Heikki Suonsivu <hsu@mail.clinet.fi>
|
29179 |
07-Sep-1997 |
bde |
Some staticized variables were still declared to be extern.
|
29162 |
06-Sep-1997 |
brian |
Upgrade to 2.4 (Fix -PKT_ALIAS_UNREGISTERED_ONLY) Submitted by: Charles Mott <cmott@srv.net>
Add __libalias_version so that ppp can derive the correct library name for dlopen()
|
29024 |
02-Sep-1997 |
bde |
Added used #include - don't depend on <sys/mbuf.h> including <sys/malloc.h> (unless we only use the bogusly shared M*WAIT flags).
|
28723 |
25-Aug-1997 |
wollman |
ICMP Timestamp Request messages could have harbored the same sort of problem as Echo Requests when broad/multicast. When multicast echo responses are disabled, also do the same for timestamp responses.
|
28683 |
25-Aug-1997 |
wollman |
Configurably don't reply to broadcast or multicast echos. There are still potential problems with other automatic-reply ICMPs, but some of them may depend on broadcast/multicast to operate. (This code can simply be moved to the `reflect' label to generalize it.)
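A sketch of the decision this entry makes configurable, modelled with Python's standard `ipaddress` module. The function name, the boolean switch and the way broadcast is detected (limited broadcast or the directed broadcast of one interface network) are assumptions; the kernel decides this from flags on the received packet rather than by re-deriving it from addresses.

```python
import ipaddress

LIMITED_BROADCAST = ipaddress.IPv4Address("255.255.255.255")

def should_reply_to_echo(dst, iface_network, reply_to_bmcast):
    """Decide whether to answer an ICMP echo sent to dst."""
    addr = ipaddress.IPv4Address(dst)
    net = ipaddress.IPv4Network(iface_network)
    is_bmcast = (
        addr.is_multicast
        or addr == net.broadcast_address
        or addr == LIMITED_BROADCAST
    )
    # Unicast echoes are always answered; others only if enabled.
    return reply_to_bmcast or not is_bmcast
```

Turning the switch off keeps the host from being used as an amplifier by echoes sent to a broadcast or multicast address.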
|
28616 |
23-Aug-1997 |
alex |
Fixed logging of verbose limited packets.
PR: 4351 Submitted by: Ron Bickers <rbickers@intercenter.net>
|
28270 |
16-Aug-1997 |
wollman |
Fix all areas of the system (or at least all those in LINT) to avoid storing socket addresses in mbufs. (Socket buffers are the one exception.) A number of kernel APIs needed to get fixed in order to make this happen. Also, fix three protocol families which kept PCBs in mbufs to now malloc them instead. Delete some old compatibility cruft while we're at it, and add some new routines in the in_cksum family.
|
28084 |
11-Aug-1997 |
brian |
Fix file descriptor leak.
Submitted by: Charles Mott <cmott@srv.net> Identified by: Gordon Burditt
|
27981 |
08-Aug-1997 |
alex |
Support interface names up to 15 characters in length. In order to accommodate the expanded name, the ICMP types bitmap has been reduced from 256 bits to 32.
A recompile of kernel and user level ipfw is required.
To be merged into 2.2 after a brief period in -current.
PR: bin/4209 Reviewed by: Archie Cobbs <archie@whistle.com>
|
27926 |
06-Aug-1997 |
alex |
Ensure that the interface name is terminated.
|
27864 |
03-Aug-1997 |
brian |
Update to version 2.2. Only the PacketAlias*() functions should now be used. The old 2.1 stuff is there for backwards compatibility. Submitted by: Charles Mott <cmott@snake.srv.net>
|
27845 |
02-Aug-1997 |
bde |
Removed unused #includes.
|
27669 |
25-Jul-1997 |
brian |
Recalculate ip_sum before passing a re-assembled packet to a divert port. Pointed-out by: Ari Suutari <ari@suutari.iki.fi>
|
27529 |
19-Jul-1997 |
fenner |
Remove crufty LBL ifdef that only applies to Suns.
Submitted by: Craig Leres <leres@ee.lbl.gov>
|
27135 |
01-Jul-1997 |
jdp |
Fix a bug (apparently very old) that can cause a TCP connection to be dropped when it has an unusual traffic pattern. For full details as well as a test case that demonstrates the failure, see the referenced PR.
Under certain circumstances involving the persist state, it is possible for the receive side's tp->rcv_nxt to advance beyond its tp->rcv_adv. This causes (tp->rcv_adv - tp->rcv_nxt) to become negative. However, in the code affected by this fix, that difference was interpreted as an unsigned number by max(). Since it was negative, it was taken as a huge unsigned number. The effect was to cause the receiver to believe that its receive window had negative size, thereby rejecting all received segments including ACKs. As the test case shows, this led to fruitless retransmissions and eventually to a dropped connection. Even connections using the loopback interface could be dropped. The fix substitutes the signed imax() for the unsigned max() function.
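The signed-versus-unsigned problem described here is easy to reproduce in a small model that emulates 32-bit unsigned arithmetic with a mask. The helper names are invented; only `max()` versus `imax()` and the `rcv_adv - rcv_nxt` difference come from the commit message.

```python
MASK32 = 0xFFFFFFFF

def as_u32(x):
    """Reinterpret a (possibly negative) integer as a 32-bit unsigned."""
    return x & MASK32

def unsigned_max(a, b):
    """Model of an unsigned max(): negatives become huge values."""
    return max(as_u32(a), as_u32(b))

def imax(a, b):
    """Model of the signed imax() the fix substitutes."""
    return max(a, b)
```

If `rcv_nxt` has advanced 500 bytes past `rcv_adv`, the difference is -500: `unsigned_max(0, -500)` yields 4294966796, while `imax(0, -500)` yields 0, which is the sane window value.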
PR: closes kern/3998 Reviewed by: davidg, fenner, wollman
|
26706 |
18-Jun-1997 |
wollman |
Add for public examination the beginnings of the per-host cache support which will form the basis of RTF_PRCLONING's more efficient, better-designed replacement.
|
26451 |
04-Jun-1997 |
julian |
make it compile with -Wall Submitted by: Archie Cobbs, archie@whistle.com
|
26359 |
02-Jun-1997 |
julian |
Submitted by: Whistle Communications (archie Cobbs)
these are quite extensive additions to the ipfw code. they include a change to the API because the old method was broken, but the user view is kept the same.
The new code allows a particular match to skip forward to a particular line number, so that blocks of rules can be used without checking all the intervening rules. There are also many more ways of rejecting connections especially TCP related, and many many more ...
see the man page for a complete description.
|
26345 |
01-Jun-1997 |
peter |
typo fix, s/imp/inp/; move lookup call inside splnet since there were comments on it being outside.
|
26147 |
26-May-1997 |
peter |
Uninitialised inp variable in div_bind().
Submitted by: Åge Røbekk <aagero@aage.priv.no>
|
26125 |
25-May-1997 |
darrenr |
This commit was generated by cvs2svn to compensate for changes in r26124, which included commits to RCS files with non-trunk default branches.
|
26113 |
25-May-1997 |
peter |
Connect the ipdivert div_usrreqs struct to the ip proto switch table
|
26096 |
24-May-1997 |
peter |
Attempt to convert the ip_divert code to use the new-style protocol request switch. I needed 'LINT' to compile for other reasons so I kinda got the blood on my hands. Note: I don't know how to test this, I don't know if it works correctly.
|
26079 |
23-May-1997 |
julian |
submitted by: archie@whistle.com
Don't search for interface addresses matching interface "NULL"; it's likely to cause a page fault. This can be triggered by the ipfw code rejecting a locally generated packet (e.g. you decide to make some network unreachable by local users).
|
26026 |
23-May-1997 |
brian |
Create the alias library. This is currently only used by ppp (or will be shortly). Natd can now be updated to use this library rather than carrying its own version of the code.
Submitted by: Charles Mott <cmott@srv.net>
|
26008 |
22-May-1997 |
fenner |
Disallow writing raw IP packets shorter than the IP header.
|
25907 |
19-May-1997 |
tegge |
Break apart initialization of s and inp from the declarations in in_setsockaddr and in_setpeeraddr. Suggested by: Justin T. Gibbs <gibbs@plutotech.com>
|
25904 |
19-May-1997 |
tegge |
Disallow network interrupts while the address is found and copied in in_setsockaddr and in_setpeeraddr. Handle the case where the socket was disconnected before the network interrupts were disabled. Reviewed by: Garrett Wollman <wollman@khavrinen.lcs.mit.edu>
|
25822 |
14-May-1997 |
tegge |
Don't send arp request for the ip address 0.0.0.0.
|
25723 |
11-May-1997 |
tegge |
Bring in some kernel bootp support. This removes the need for netboot to fill in the nfs_diskless structure, at the cost of some kernel bloat. The advantage is that this code works on a wider range of network adapters than netboot. Several new kernel options are documented in LINT. Obtained from: parts of the code come from NetBSD.
|
25604 |
09-May-1997 |
kjc |
This commit was generated by cvs2svn to compensate for changes in r25603, which included commits to RCS files with non-trunk default branches.
|
25516 |
06-May-1997 |
fenner |
Pull up the IP header in ip_mloopback(). This makes sure that the operations on the header inside ip_mloopback() are performed on a private copy instead of a shared cluster.
PR: kern/3410
|
25502 |
06-May-1997 |
alex |
Create the default rule with flags IP_FW_F_IN | IP_FW_F_OUT. Closes PR#3100.
|
25201 |
27-Apr-1997 |
wollman |
The long-awaited mega-massive-network-code-cleanup. Part I.
This commit includes the following changes: 1) Old-style (pr_usrreq()) protocols are no longer supported, the compatibility glue for them is deleted, and the kernel will panic on boot if any are compiled in.
2) Certain protocol entry points are modified to take a process structure, so that they can easily tell whether or not it is possible to sleep, and also to access credentials.
3) SS_PRIV is no more, and with it goes the SO_PRIVSTATE setsockopt() call. Protocols should use the process pointer they are now passed.
4) The PF_LOCAL and PF_ROUTE families have been updated to use the new style, as has the `raw' skeleton family.
5) PF_LOCAL sockets now obey the process's umask when creating a socket in the filesystem.
As a result, LINT is now broken. I'm hoping that some enterprising hacker with a bit more time will either make the broken bits work (should be easy for netipx) or dike them out.
|
24674 |
06-Apr-1997 |
dufault |
Make MOD_* macros almost consistent:
Use the name argument almost the same in all LKM types. Maintain the current behavior for the external (e.g., modstat) name for DEV, EXEC, and MISC types being #name ## "_mod" and SYSCALL and VFS only #name. This is a candidate for change and I vote just the name without the "_mod".
Change the DISPATCH macro to MOD_DISPATCH for consistency with the other macros.
Add an LKM_ANON #define to eliminate the magic -1 and associated signed/unsigned warnings.
Add MOD_PRIVATE to support wcd.c's poking around in the lkm structure.
Change source in tree to use the new interface.
Reviewed by: Bruce Evans
|
24590 |
03-Apr-1997 |
darrenr |
Resolve conflicts created by import.
|
24587 |
03-Apr-1997 |
darrenr |
This commit was generated by cvs2svn to compensate for changes in r24586, which included commits to RCS files with non-trunk default branches.
|
24570 |
03-Apr-1997 |
dg |
Reorganize elements of the inpcb struct to take better advantage of cache lines. Removed the struct ip proto since only a couple of chars were actually being used in it. Changed the order of compares in the PCB hash lookup to take advantage of partial cache line fills (on PPro).
Discussed-with: wollman
|
24204 |
24-Mar-1997 |
bde |
Don't include <sys/ioctl.h> in the kernel. Stage 2: include <sys/sockio.h> instead of <sys/ioctl.h> in network files.
|
24203 |
24-Mar-1997 |
bde |
Don't include <sys/ioctl.h> in the kernel. Stage 1: don't include it when it is not used. In most cases, the reasons for including it went away when the special ioctl headers became self-sufficient.
|
23324 |
03-Mar-1997 |
dg |
Improved performance of hash algorithm while (hopefully) not reducing the quality of the hash distribution. This does not fix a problem dealing with poor distribution when using lots of IP aliases and listening on the same port on every one of them...some other day perhaps; fixing that requires significant code changes. The use of xor was inspired by David S. Miller <davem@jenolan.rutgers.edu>
|
23286 |
02-Mar-1997 |
peter |
This commit was generated by cvs2svn to compensate for changes in r23285, which included commits to RCS files with non-trunk default branches.
|
23283 |
02-Mar-1997 |
peter |
This commit was generated by cvs2svn to compensate for changes in r23282, which included commits to RCS files with non-trunk default branches.
|
23221 |
28-Feb-1997 |
fenner |
Fix a comment and some commented-out code in ip_mloopback to reflect how multicast loopback really works.
|
23082 |
24-Feb-1997 |
wollman |
Fix #include order.
|
22975 |
22-Feb-1997 |
peter |
Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not ready for it yet.
|
22967 |
21-Feb-1997 |
wollman |
Properly notice error returns from if_allmulti().
|
22962 |
21-Feb-1997 |
wollman |
Fix potential crash where a user attempts to perform an implied connect in TCP while sending urgent data. It is not clear what purpose is served by doing this, but there's no good reason why it shouldn't work.
Submitted by: tjevans@raleigh.ibm.com via wpaul
|
22952 |
20-Feb-1997 |
wollman |
Fix the parameters of a call to in_setsockaddr().
|
22927 |
19-Feb-1997 |
darrenr |
change IP Filter hooks to match new 3.1.8 patches for FreeBSD
|
22900 |
18-Feb-1997 |
wollman |
Convert raw IP from mondo-switch-statement-from-Hell to pr_usrreqs. Collapse duplicates with udp_usrreq.c and tcp_usrreq.c (calling the generic routines in uipc_socket2.c and in_pcb.c). Calling sockaddr() or peeraddr() on a detached socket now traps, rather than harmlessly returning an error; this should never happen. Allow the raw IP buffer sizes to be controlled via sysctl.
|
22719 |
14-Feb-1997 |
wollman |
Fix the mechanism for choosing whether to save the slow-start threshold in the route. This allows us to remove the unconditional setting of the pipesize in the route, which should mean that SO_SNDBUF and SO_RCVBUF should actually work again. While we're at it:
- Convert udp_usrreq from `mondo switch statement from Hell' to new-style.
- Delete old TCP mondo switch statement from Hell, which had previously been diked out.
|
22672 |
13-Feb-1997 |
wollman |
Provide PRC_IFDOWN and PRC_IFUP support for IP. Now, when an interface is administratively downed, all routes to that interface (including the interface route itself) which are not static will be deleted. When it comes back up, any addresses remaining will have their interface routes re-added. This solves the problem where, for example, an Ethernet interface is downed but traffic continues to flow by way of ARP entries.
|
22531 |
10-Feb-1997 |
darrenr |
Add IP Filter hooks (from patches).
|
22333 |
06-Feb-1997 |
brian |
Don't zero ip->ip_sum during sum validation. This should only affect programs that sit on top of divert(4) sockets. The multicast routing code already unconditionally zeros the sum before recalculating.
Any code that unconditionally sums a packet without first zeroing the sum (assuming that it's already zero'd) will break. No such code seems to exist.
|
22212 |
02-Feb-1997 |
brian |
Reset ip_divert_ignore to zero immediately after use - also, set it in the first place, independent of whether sin->sin_port is set.
The result is that diverted packets that are being forwarded will be diverted once and only once on the way in (ip_input()) and again, once and only once on the way out (ip_output()) - twice in total. ICMP packets that don't contain a port will now also be diverted.
|
21932 |
21-Jan-1997 |
wollman |
Count multicast packets received for groups of which we are not a member separately from generic ``can't forward'' packets. This would have helped me find the previous bug much faster.
|
21929 |
21-Jan-1997 |
wollman |
Who had the conical hat? Correct a typo, hidden by a bad cast, which prevented IP multicast reception from happening.
|
21830 |
17-Jan-1997 |
joerg |
This mega-merge brings Matt Thomas' 960801 FDDI driver (almost) up to -current.
Thanks goes to Ulrike Nitzsche <ulrike@ifw-dresden.de> for giving me a chance to test this. Only the PCI driver is tested though.
One final patch will follow in a separate commit. This is so that everything up to here can be dragged into 2.2, if we decide so.
Reviewed by: joerg Submitted by: Matt Thomas <matt@3am-software.com>
|
21785 |
16-Jan-1997 |
adam |
implement "not" keyword for inverting the address logic
|
21673 |
14-Jan-1997 |
jkh |
Make the long-awaited change from $Id$ to $FreeBSD$
This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long.
Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.
|
21666 |
13-Jan-1997 |
wollman |
Use the new if_multiaddrs list for multicast addresses rather than the previous hackery involving struct in_ifaddr and arpcom. Get rid of the abominable multi_kludge. Update all network interfaces to use the new mechanism. Distressingly few Ethernet drivers program the multicast filter properly (assuming the hardware has one, which it usually does).
|
21261 |
03-Jan-1997 |
wollman |
Expose more of these structures to the user so that netstat doesn't walk around with its KERNEL exposed.
More commits to follow...
|
21260 |
03-Jan-1997 |
wollman |
Move the ethertypes from <netinet/if_ether.h> to <net/ethernet.h>. Many programs need the numbers but don't need the internals of ARP.
More commits to follow...
|
21098 |
30-Dec-1996 |
peter |
Add INADDR_LOOPBACK, moved from <rpc/rpc.h>
|
20532 |
15-Dec-1996 |
wollman |
Some days, it just doesn't pay to get out of bed. Fix another broken reference to the now-dead-for-real-this-time ia_next field.
Reminded by: Russell Vincent
|
20527 |
15-Dec-1996 |
wollman |
Somehow the removal of ia_next didn't make it in the last time. Hope it makes it in this time, and remember not to commit changes next time late on a Friday evening!
|
20525 |
15-Dec-1996 |
bde |
Attempt to complete the fix in the previous revision. This version fixes the problem reported by max.
|
20448 |
14-Dec-1996 |
dyson |
Missing TAILQ mod.
|
20407 |
13-Dec-1996 |
wollman |
Convert the interface address and IP interface address structures to TAILQs. Fix places which referenced these for no good reason that I can see (the references remain, but were fixed to compile again; they are still questionable).
|
20337 |
11-Dec-1996 |
wollman |
Use queue macros for the list of interfaces. Next stop: ifaddrs!
|
20330 |
11-Dec-1996 |
wollman |
Include <net/if_arp.h> in the one header that requires it, <netinet/if_ether.h>, rather than in <net/if.h>, most of whose callers have no need of it.
Pointed-out-by: bde
|
20308 |
11-Dec-1996 |
dg |
Only pay attention to the offset and the IP_MF flag in ip_off. Pointed out by Nathaniel D. Daw (daw@panix.com), but fixed differently by me.
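The check described above can be sketched as a mask over the fragment field. This is only an illustration of the idea, not the code from this revision: the helper name `ip_is_fragment` is hypothetical, and the constants are written out here (with their usual values) so the block stands alone.

```c
#include <stdint.h>

/* Bits of the IP header's fragment field, in host byte order. */
#define IP_RF      0x8000  /* reserved bit */
#define IP_DF      0x4000  /* don't fragment */
#define IP_MF      0x2000  /* more fragments follow */
#define IP_OFFMASK 0x1fff  /* fragment offset, in 8-byte units */

/*
 * Hypothetical helper: returns nonzero if the packet belongs to a
 * fragmented datagram. Only the offset and IP_MF are examined, so
 * IP_DF and the reserved bit cannot make a whole packet look like
 * a fragment.
 */
static int
ip_is_fragment(uint16_t ip_off)
{
	return (ip_off & (IP_MF | IP_OFFMASK)) != 0;
}
```

With this mask, a packet that only has IP_DF set is treated as unfragmented, while a last fragment (IP_MF clear, nonzero offset) is still recognised.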
|
19940 |
23-Nov-1996 |
fenner |
Allocate a header mbuf for the start of the encapsulated packet. The rest of the code was treating it as a header mbuf, but it was allocated as a normal mbuf.
This fixes the panic: ip_output no HDR when you have a multicast tunnel configured.
|
19794 |
15-Nov-1996 |
fenner |
Reword two messages:
duplicate ip address 204.162.228.7! sent from ethernet address: 08:00:20:09:7b:1d
changed to
arp: 08:00:20:09:7b:1d is using my IP address 204.162.228.7!
and
arp info overwritten for 204.162.228.2 by 08:00:20:09:7b:1d
changed to
arp: 204.162.228.2 moved from 08:00:20:07:b6:a0 to 08:00:20:09:7b:1d
I think the new wordings are more clear and could save some support questions.
|
19669 |
12-Nov-1996 |
bde |
Forward-declare `struct inpcb' so that including this file doesn't cause lots of warnings.
Should be in 2.2. Previous version shouldn't have been in 2.2.
|
19622 |
11-Nov-1996 |
fenner |
Add the IP_RECVIF socket option, which supplies a packet's incoming interface using a sockaddr_dl.
Fix the other packet-information socket options (SO_TIMESTAMP, IP_RECVDSTADDR) to work for multicast UDP and raw sockets as well. (They previously only worked for unicast UDP).
|
19597 |
10-Nov-1996 |
fenner |
Re-enable the TCP SYN-attack protection code. I was the one who didn't understand the socket state flag.
2.2 candidate.
|
19262 |
30-Oct-1996 |
peter |
Fix braino on my part. When we have three different port ranges (default, "high" and "secure"), we can't use a single variable to track the most recently used port in all three ranges.. :-] This caused the next transient port to be allocated from the start of the range more often than it should.
|
19183 |
25-Oct-1996 |
fenner |
Don't allow reassembly to create packets bigger than IP_MAXPACKET, and count attempts to do so. Don't allow users to source packets bigger than IP_MAXPACKET. Make UDP length and ipovly's protocol length unsigned short.
Reviewed by: wollman Submitted by: (partly by) kml@nas.nasa.gov (Kevin Lahey)
|
19136 |
23-Oct-1996 |
wollman |
Give ip_len and ip_off more natural, unsigned types.
|
19113 |
22-Oct-1996 |
sos |
Changed args to the nat functions.
|
19035 |
19-Oct-1996 |
alex |
Reword two comments.
|
18940 |
15-Oct-1996 |
bde |
Forward-declared `struct route' for the KERNEL case so that <net/route.h> isn't a prerequisite.
Fixed style of ifdefs.
|
18892 |
12-Oct-1996 |
bde |
Removed nested include of <sys/socket.h> from <net/if.h> and <net/if_arp.h> and fixed the things that depended on it. The nested include just allowed unportable programs to compile and made my simple #include checking program report that networking code doesn't need to include <sys/socket.h>.
|
18891 |
12-Oct-1996 |
alex |
Log the interface name which received the packet.
Suggested by: Hal Snyder <hsndyer@thoughtport.com>
|
18874 |
11-Oct-1996 |
pst |
Fix two bugs I accidentally put into the syn code at the last minute (yes I had tested the hell out of this).
I've also temporarily disabled the code so that it behaves as it previously did (tail drop's the syns) pending discussion with fenner about some socket state flags that I don't fully understand.
Submitted by: fenner
|
18797 |
07-Oct-1996 |
wollman |
All three files: make COMPAT_IPFW==0 case work again.
ip_input.c:
- delete some dusty code
- _IP_VHL
- use fast inline header checksum when possible
|
18795 |
07-Oct-1996 |
dg |
Improved in_pcblookuphash() to support wildcarding, and changed relevant callers of it to take advantage of this. This reduces new connection request overhead in the face of a large number of PCBs in the system. Thanks to David Filo <filo@yahoo.com> for suggesting this and providing a sample implementation (which wasn't used, but showed that it could be done).
Reviewed by: wollman
|
18787 |
07-Oct-1996 |
pst |
Increase robustness of FreeBSD against high-rate connection attempt denial of service attacks.
Reviewed by: bde,wollman,olah Inspired by: vjs@sgi.com
|
18437 |
21-Sep-1996 |
pst |
I don't understand, I committed this fix (move a counter and fixed a typo) this evening.
I think I'm going insane.
|
18436 |
21-Sep-1996 |
ache |
Syntax error: so_incom -> so_incomp
|
18431 |
20-Sep-1996 |
pst |
If the incomplete listen queue for a given socket is full, drop the oldest entry in the queue.
There was a fair bit of discussion as to whether or not the proper action is to drop a random entry in the queue. It's my conclusion that a random drop is better than a head drop, however profiling this section of code (done by John Capo) shows that a head-drop results in a significant performance increase.
There are scenarios where a random drop is more appropriate. If I find one in reality, I'll add the random drop code under a conditional.
Obtained from: discussions and code done by Vernon Schryver (vjs@sgi.com).
|
18416 |
20-Sep-1996 |
pst |
Handle ICMP codes defined in RFC1812 more appropriately
|
18281 |
13-Sep-1996 |
pst |
Move TCPCTL_KEEPINIT to end of MIB list (sigh)
|
18280 |
13-Sep-1996 |
pst |
Make the misnamed tcp initial keepalive timer value (which is really the time, in seconds, that state for non-established TCP sessions stays about) a sysctl modifiable variable.
[part 1 of two commits, I just realized I can't play with the indices as I was typing this commit message.]
|
18278 |
13-Sep-1996 |
pst |
Receipt of two SYNs is sufficient to set the t_timer[TCPT_KEEP] to "keepidle". This should not occur unless the connection has been established via the 3-way handshake, which requires an ACK.
Submitted by: jmb Obtained from: problem discussed in Stevens vol. 3
|
18193 |
09-Sep-1996 |
wollman |
Set subnetsarelocal to false. In a classless world, the other case is almost never useful. (This is only a quick hack; someone should go back and delete the entire subnetsarelocal==1 code path.)
|
18160 |
08-Sep-1996 |
dg |
Dequeue mbuf before freeing it. Fixes mbuf leak and a potential crash when handling IP fragments.
Submitted by: Darren Reed <avalon@coombs.anu.edu.au>
|
17977 |
31-Aug-1996 |
alex |
Fix the visibility of the sysctl variables.
Submitted by: phk
|
17851 |
27-Aug-1996 |
sos |
Oops, send the operation type, not the name to the NAT code...
|
17795 |
23-Aug-1996 |
phk |
Mark sockets where the kernel chose the port# for. This can be used by netstat to behave more intelligently.
|
17758 |
21-Aug-1996 |
sos |
Add hooks for an IP NAT module, much like the firewall stuff... Move the sockopt definitions for the firewall code from ip_fw.h to in.h where it belongs.
|
17720 |
20-Aug-1996 |
fenner |
Add #define's for RFC1716/RFC1812 new ICMP UNREACHABLE types.
Obtained from: LBL's tcpdump distribution
|
17587 |
13-Aug-1996 |
pst |
Completely rewrite handling of protocol field for firewalls, things are now completely consistent across all IP protocols and should be quite a bit faster.
Discussed with: fenner & alex
|
17541 |
12-Aug-1996 |
peter |
Add two more portrange sysctls, which control the area of the below IPPORT_RESERVED that is used for selection when bind() is told to allocate a reserved port.
Also, implement simple sanity checking for all the addresses set, to make it a little harder for a user/sysadmin to shoot themselves in the feet.
|
17455 |
06-Aug-1996 |
phk |
Megacommit to straighten out ETHER_ mess.
I'm pretty convinced after looking at this that the majority of our drivers are confused about the in/exclusion of ETHER_CRC_LEN :-(
|
17440 |
05-Aug-1996 |
alex |
Filter by IP protocol.
Submitted by: fenner (with modifications by me)
Use a common prefix string for all warning messages generated during ip_fw_ctl.
|
17269 |
24-Jul-1996 |
wollman |
Eliminate some more references to separate ip_v and ip_hl fields.
|
17227 |
20-Jul-1996 |
alex |
Removed extraneous return.
|
17172 |
14-Jul-1996 |
alex |
Switch back to logging accepted packets with the text "Allow" instead of "Accept"
|
17138 |
12-Jul-1996 |
dg |
Fixed two bugs in previous commit: be sure to include tcp_debug.h when TCPDEBUG is defined, and fix typo in TCPDEBUG2() macro.
|
17137 |
12-Jul-1996 |
fenner |
Fix braino in rev 1.30 fix; m_copy() the mbuf that has the header pulled up already. This bug can cause the first packet from a source to a group to be corrupted when it is delivered to a process listening on the mrouter.
|
17108 |
12-Jul-1996 |
bde |
Don't use NULL in non-pointer contexts.
|
17096 |
11-Jul-1996 |
wollman |
Modify the kernel to use the new pr_usrreqs interface rather than the old pr_usrreq mechanism which was poorly designed and error-prone. This commit renames pr_usrreq to pr_ousrreq so that old code which depended on it would break in an obvious manner. This commit also implements the new interface for TCP, although the old function is left as an example (#ifdef'ed out). This commit ALSO fixes a longstanding bug in the TCP timer processing (introduced by davidg on 1995/04/12) which caused timer processing on a TCB to always stop after a single timer had expired (because it misinterpreted the return value from tcp_usrreq() to indicate that the TCB had been deleted). Finally, some code related to polling has been deleted from if.c because it is not relevant to -current and doesn't look at all like my current code.
|
17072 |
10-Jul-1996 |
julian |
Adding changes to ipfw and the kernel to support ip packet diversion.. This stuff should not be too destructive if the IPDIVERT is not compiled in.. be aware that this changes the size of the ip_fw struct so ipfw needs to be recompiled to use it.. more changes coming to clean this up.
|
17048 |
09-Jul-1996 |
nate |
Functionality for IPFIREWALL_VERBOSE logging:
- State when we've reached the limit on a particular rule in the kernel logfile
- State when a rule or all rules have been zero'd.
This gives a log of all actions that occur w/regard to the firewall occurrences, and can explain why a particular break-in attempt might not get logged due to the limit being reached.
Reviewed by: alex
|
16827 |
29-Jun-1996 |
alex |
Reject rules which try to mix ports with incompatible protocols.
|
16678 |
25-Jun-1996 |
alex |
Allow fragment checking to work with specific protocols. Reviewed by: phk
Reject the addition of rules that will never match (for example, 1.2.3.4:255.255.255.0). User level utilities specify the policy by either masking the IP address for the user (as ipfw(8) does) or rejecting the entry with an error. In either case, the kernel should not modify chain entries to make them work.
|
16619 |
23-Jun-1996 |
bde |
Use IPFIREWALL_MODULE instead of ACTUALLY_LKM_NOT_KERNEL to indicate LKM'ness. ACTUALLY_LKM_NOT_KERNEL is supposed to be so ugly that it only gets used until <machine/conf.h> goes away. bsd.kmod.mk should define a better-named general macro for this. Some places use PSEUDO_LKM. This is another bad name.
Makefile: Added IPFIREWALL_VERBOSE_LIMIT option (commented out).
|
16576 |
21-Jun-1996 |
peter |
Set the rmx.rmx_expire to 0 when creating fake ethernet addresses for the broadcast and multicast routes, otherwise they will be expired by arptimeout after a few minutes, reverting to " (incomplete)". This makes the work done by rev 1.27 stay around until the route itself is deleted. This is mainly cosmetic for 'arp' and 'netstat -r'.
|
16557 |
20-Jun-1996 |
fenner |
Use the route that's guaranteed to exist when picking a source address for ARP requests.
The NetBSD version of this patch (see NetBSD PR kern/2381) has this change already. This should close our PR kern/1140 .
Although it's not quite what he submitted, I got the idea from him so Submitted by: Jin Guojun <jin@george.lbl.gov>
|
16548 |
20-Jun-1996 |
fenner |
Remove one last rip_output from inetsw (gpalmer missed it in rev 1.30)
|
16542 |
20-Jun-1996 |
nate |
Put the 'debug' messages of the type:
/kernel: in_rtqtimo: adjusted rtq_reallyold to 1066
/kernel: in_rtqtimo: adjusted rtq_reallyold to 710
inside of #ifdef DIAGNOSTIC to avoid the support questions from folks asking what this means.
|
16413 |
17-Jun-1996 |
alex |
Fix chain numbering bug when the highest line number installed >= 65435 and the rule being added has no explicit line number set.
Submitted by: Archie Cobbs <archie@whistle.com>
|
16367 |
14-Jun-1996 |
wollman |
Better selection of initial retransmit timeout when no cached RTT information is available.
Submitted by: kbracey@art.acorn.co.uk (Kevin Bracey) (slightly modified by me)
|
16349 |
13-Jun-1996 |
gpalmer |
Don't try to include opt_ipfw.h in LKMs
Submitted by: Ollivier Robert <roberto@keltia.freenix.fr>
|
16341 |
13-Jun-1996 |
dg |
Keep ether_type in network order for BPF to be consistent with other systems.
Submitted by: Ted Lemon, Matt Thomas, and others. Retrofitted for -current by me.
|
16333 |
12-Jun-1996 |
gpalmer |
Convert ipfw to use opt_ipfw.h
|
16322 |
12-Jun-1996 |
gpalmer |
Clean up -Wunused warnings.
Reviewed by: bde
|
16266 |
09-Jun-1996 |
alex |
Big sweep over ipfw, picking up where Poul left off:
- Log ICMP type during verbose output.
- Added IPFIREWALL_VERBOSE_LIMIT option to prevent denial of service attacks via syslog flooding.
- Filter based on ICMP type.
- Timestamp chain entries when they are matched.
- Interfaces can now be matched with a wildcard specification (i.e. will match any interface unit for a given name).
- Prevent the firewall chain from being manipulated when securelevel is greater than 2.
- Fixed bug that allowed the default policy to be deleted.
- Ability to zero individual accounting entries.
- Remove definitions of old_chk_ptr and old_ctl_ptr when compiling ipfw as a lkm.
- Remove some redundant code shared between ip_fw_init and ipfw_load.
Closes PRs: 1192, 1219, and 1267.
|
16206 |
08-Jun-1996 |
bde |
Changed some memcpy()'s back to bcopy()'s.
gcc only inlines memcpy()'s whose count is constant and didn't inline these. I want memcpy() in the kernel go away so that it's obvious that it doesn't need to be optimized. Now it is only used for one struct copy in si.c.
|
16143 |
05-Jun-1996 |
wollman |
Instrument UDP PCB hashing to see how often the hash lookup is effective for incoming packets.
|
16141 |
05-Jun-1996 |
wollman |
Correct formula for TCP RTO calculation. Also try to do a better job in filling in a new PCB's rttvar (but this is not the last word on the subject). And get rid of `#ifdef RTV_RTT', it's been true for four years now...
|
16099 |
03-Jun-1996 |
jdp |
Fix a bug in the handling of the "persist" state which, under certain circumstances, caused perfectly good connections to be dropped. This happened for connections over a LAN, where the retransmit timer calculation TCP_REXMTVAL(tp) returned 0. If sending was blocked by flow control for long enough, the old code dropped the connection, even though timely replies were being received for all window probes.
Reviewed by: W. Richard Stevens <rstevens@noao.edu>
|
16065 |
02-Jun-1996 |
gpalmer |
Correct spelling error in comment
|
16035 |
31-May-1996 |
peter |
More closely preserve the original operation of rresvport() when using IP_PORTRANGE_LOW.
|
15869 |
22-May-1996 |
wollman |
Conditionalize calls to IPFW code on COMPAT_IPFW. This is done slightly unconventionally: If COMPAT_IPFW is not defined, or if it is defined to 1, enable; otherwise, disable.
This means that these changes actually have no effect on anyone at the moment. (It just makes it easier for me to keep my code in sync.) In the future, the `not defined' part of the hack should be eliminated, but doing this now would require everyone to change their config files.
The same conditionals need to be made in ip_input.c as well for this to have any useful effect, but I'm not ready to do that right now.
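The "not defined, or defined to 1" rule described above can be expressed with a single preprocessor test. This is a minimal sketch of the rule only, not the code from this revision; the names `IPFW_HOOKS_ENABLED` and `ipfw_hooks_enabled` are hypothetical.

```c
/*
 * Hypothetical helper macro: the IPFW calls are compiled in when
 * COMPAT_IPFW is not defined at all, or when it is defined to 1.
 * Any other value (for example 0) disables them.
 */
#if !defined(COMPAT_IPFW) || COMPAT_IPFW == 1
#define IPFW_HOOKS_ENABLED 1
#else
#define IPFW_HOOKS_ENABLED 0
#endif

/* Reports which way the conditional was resolved at compile time. */
static int
ipfw_hooks_enabled(void)
{
	return IPFW_HOOKS_ENABLED;
}
```

Because the undefined case counts as enabled, existing config files keep their behaviour, which matches the stated goal of the change having no effect on anyone at the moment.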
|
15850 |
21-May-1996 |
peter |
Fix an embarrassing error on my part that made the IP_PORTRANGE options return a failure code (even though it worked). This commit brought to you by the 'C' keyword "break".. :-)
|
15701 |
09-May-1996 |
wollman |
Make it possible to return more than one piece of control information (PR #1178). Define a new SO_TIMESTAMP socket option for datagram sockets to return packet-arrival timestamps as control information (PR #1179).
Submitted by: Louis Mamakos <loiue@TransSys.com>
|
15681 |
08-May-1996 |
gpalmer |
Remove useless entries from the inetsw structure initialiser which only produced compile-time warnings.
Reviewed/Tested by: Bill Fenner <fenner@parc.xerox.com>
|
15680 |
08-May-1996 |
gpalmer |
Clean up various compiler warnings. Most (if not all) were benign
Reviewed by: bde
|
15653 |
06-May-1996 |
phk |
Several locations in sys/netinet/ip_fw.c are lacking or incorrectly use spl() functions.
Reviewed by: phk Submitted by: Alex Nash <alex@zen.nash.org>
|
15652 |
06-May-1996 |
wollman |
Add three new route flags to help determine what sort of address the destination represents. For IP:
- Iff it is a host route, RTF_LOCAL and RTF_BROADCAST indicate local (belongs to this host) and broadcast addresses, respectively.
- For all routes, RTF_MULTICAST is set if the destination is multicast.
The RTF_BROADCAST flag is used by ip_output() to eliminate a call to in_broadcast() in a common case; this gives about 1% in our packet-generation experiments. All three flags might be used (although they aren't now) to determine whether a packet can be forwarded; a given host route can represent a forwardable address if:
(rt->rt_flags & (RTF_HOST | RTF_LOCAL | RTF_BROADCAST | RTF_MULTICAST)) == RTF_HOST
Obviously, one still has to do all the work if a host route is not present, but this code allows one to cache the results of such a lookup if rtalloc1() is called without masking RTF_PRCLONING.
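The forwardability test quoted above can be wrapped in a small predicate. This is a sketch under stated assumptions: the helper name `host_route_is_forwardable` is hypothetical, and the flag values below are illustrative stand-ins for the real definitions in <net/route.h>.

```c
/* Illustrative flag values; the real ones come from <net/route.h>. */
#define RTF_HOST      0x4
#define RTF_LOCAL     0x200000
#define RTF_BROADCAST 0x400000
#define RTF_MULTICAST 0x800000

/*
 * Hypothetical helper: a host route names a forwardable address only
 * if RTF_HOST is set and none of the local, broadcast, or multicast
 * flags are set.
 */
static int
host_route_is_forwardable(unsigned long rt_flags)
{
	return (rt_flags & (RTF_HOST | RTF_LOCAL | RTF_BROADCAST |
	    RTF_MULTICAST)) == RTF_HOST;
}
```

Note that a non-host route also fails this test, which is consistent with the remark that all the work still has to be done when no host route is present.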
|
15525 |
02-May-1996 |
fenner |
Back out my stupid braino; I was thinking strlen and not sizeof.
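The strlen/sizeof distinction behind this back-out can be checked directly: `sizeof "123"` counts the terminating NUL and is 4, so the `4*sizeof "123"` expression from r15524 gives 16 bytes, which is exactly enough for a 15-character dotted quad plus its NUL. The names `ADDR_BUF_SIZE` and `dotted_quad_fits` below are hypothetical, used only for this sketch.

```c
#include <string.h>

/* The buffer size expression under discussion: 4 * 4 == 16 bytes. */
#define ADDR_BUF_SIZE (4 * sizeof "123")

/*
 * Hypothetical helper: nonzero if the string, including its
 * terminating NUL, fits in a buffer of ADDR_BUF_SIZE bytes.
 */
static int
dotted_quad_fits(const char *addr)
{
	return strlen(addr) + 1 <= ADDR_BUF_SIZE;
}
```

Had the expression used `strlen("123")` instead, the result would be 12 bytes, which is too small for a full-length address.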
|
15524 |
02-May-1996 |
fenner |
Size temp var correctly; buf[4*sizeof "123"] is not long enough to store "192.252.119.189\0".
|
15414 |
27-Apr-1996 |
ache |
inet_ntoa buffer was evaluated twice in log_in_vain, fix it. Thanx to: jdp
|
15396 |
26-Apr-1996 |
wollman |
Delete #ifdef notdef blocks containing old method of srtt calculation.
Requested by: davidg
|
15395 |
26-Apr-1996 |
wollman |
Delete #if 0 block containing remnants of pre-MTU discovery rmx_mtu initialization.
|
15394 |
26-Apr-1996 |
wollman |
Delete #if 0 block containing unused definitions for ARPANET/DDN IMP and HYPERchannel link layers.
|
15335 |
21-Apr-1996 |
bde |
Fixed in-line IP header checksumming. It was performed on the wrong header in one case.
|
15295 |
18-Apr-1996 |
wollman |
Three speed-ups in the output path (two small, one substantial):
1) Require all callers to pass a valid route pointer to ip_output() so that we don't have to check and allocate one off the stack as was done before. This eliminates one test and some stack bloat from the common (UDP and TCP) case.
2) Perform the IP header checksum in-line if it's of the usual length. This results in about a 5% speed-up in my packet-generation test.
3) Use ip_vhl field rather than ip_v and ip_hl bitfields.
|
15294 |
18-Apr-1996 |
wollman |
Define a few macros useful in the _IP_VHL case.
|
15293 |
18-Apr-1996 |
wollman |
Fix a warning by not referencing ip_output() as a pr_output() member.
|
15292 |
18-Apr-1996 |
wollman |
Always call ip_output() with a valid route pointer. For igmp, also get the multicast option structure off the stack rather than malloc.
|
15262 |
15-Apr-1996 |
dg |
Two fixes from Rich Stevens:
1) Set the persist timer to help time-out connections in the CLOSING state.
2) Honor the keep-alive timer in the CLOSING state.
This fixes problems with connections getting "stuck" due to incompletion of the final connection shutdown which can be a BIG problem on busy WWW servers.
|
15238 |
13-Apr-1996 |
bde |
Eliminated sloppy common-style declarations. Now there are no duplicated common labels for LINT. There are still some common declarations for the !KERNEL case in tcp_debug.h and spx_debug.h. trpt depends on the ones in tcp_debug.h.
|
15211 |
12-Apr-1996 |
phk |
Fix a bogon I introduced with my last change.
Submitted by: Andreas Klemm <andreas@knobel.gun.de>
|
15154 |
09-Apr-1996 |
pst |
Logging UDP and TCP connection attempts should not be enabled by default. It's trivial to create a denial of service attack on a box so enabled.
These messages, if enabled at all, must be rate-limited. (!)
|
15092 |
07-Apr-1996 |
dg |
Added proper splnet protection while modifying the interface address list. This fixes a panic that occurs when ifconfig ioctl(s) were interrupted by IP traffic at the wrong time - resulting in a NULL pointer dereference. This was originally noticed on a FreeBSD 1.0 system, but the problem still exists in current sources.
|
15039 |
04-Apr-1996 |
phk |
Add a sysctl (net.inet.tcp.always_keepalive: 0) that when set will force keepalive on all tcp sessions. Setsockopt(2) cannot override this setting. Maybe another one is needed that just changes the default for SO_KEEPALIVE ? Requested by: Joe Greco <jgreco@brasil.moneng.mei.com>
|
15038 |
04-Apr-1996 |
phk |
Log TCP syn packets for ports we don't listen on. Controlled by: sysctl net.inet.tcp.log_in_vain: 1
Log UDP syn packets for ports we don't listen on. Controlled by: sysctl net.inet.udp.log_in_vain: 1
Suggested by: Warren Toomey <wkt@cs.adfa.oz.au>
|
15028 |
03-Apr-1996 |
wollman |
Always pass a route structure when calling ip_output().
|
15026 |
03-Apr-1996 |
phk |
Add feature for tcp "established". Change interface between netinet and ip_fw to be more general, and thus hopefully also support other ip filtering implementations.
|
14998 |
02-Apr-1996 |
phk |
Fix two cases where ia->ia_ifp could be NULL.
|
14841 |
27-Mar-1996 |
wollman |
In tcp_respond(), check that ro->ro_rt is non-null before RTFREEing it.
|
14824 |
26-Mar-1996 |
fenner |
Make rip_input() take the header length.
Move ipip_input() and rsvp_input() prototypes to ip_var.h.
Remove unused prototype for rip_ip_input() from ip_var.h.
Remove unused variable *opts from rip_output().
|
14823 |
26-Mar-1996 |
fenner |
Add missing splx(s) in IP_MULTICAST_IF
Submitted by: Jim Binkley <jrb@cs.pdx.edu>
|
14819 |
25-Mar-1996 |
wollman |
Slight modification of RTO floor calculation.
|
14817 |
25-Mar-1996 |
phk |
Check the validity of ia->ia_ifp before we dereference it.
|
14761 |
23-Mar-1996 |
fenner |
Send ARP's for aliased subnets with the proper source address. Get rid of ac->ac_ipaddr and arpwhohas() since they assume that an interface has only one address.
Obtained from: BSD/OS 2.1, via Rich Stevens <rstevens@noao.edu>
|
14754 |
22-Mar-1996 |
wollman |
Make sure tcp_respond() always calls ip_output() with a valid route pointer. This has no effect in the current ip_output(), but my version requires that ip_output() always be passed a route.
|
14753 |
22-Mar-1996 |
wollman |
A number of performance-reducing flaws fixed based on comments from Larry Peterson &co. at Arizona:
- Header prediction for ACKs did not exclude Fast Retransmit/Recovery.
- srtt calculation tended to get ``stuck'' and could never decrease when below 8. It still can't, but the scaling factors are adjusted so that this artifact does not cause as bad an effect on the RTO value as it used to.
The paper also points out the incr/8 error that has been long since fixed, and the problems with ACKing frequency resulting from the use of options which I suspect to be fixed already as well (as part of the T/TCP work).
Obtained from: Brakmo & Peterson, ``Performance Problems in BSD4.4 TCP''
|
14632 |
15-Mar-1996 |
fenner |
Allow SIOCGIFBRDADDR and SIOCGIFNETMASK to return information about aliases, if the alias address was passed in the struct ifreq. Default to first address on the list, for backwards compatibility.
|
14622 |
14-Mar-1996 |
fenner |
IGMPv2 routines rewritten, to be more compact and to fully comply with the IGMPv2 Internet Draft (including Router Alert IP option)
|
14611 |
13-Mar-1996 |
pst |
Fix ip option processing for raw IP sockets. This whole thing is a compromise between ignoring options specified in the setsockopt call if IP_HDRINCL is set (the UCB choice when VJ's code was brought in) vs allowing them (what everyone else did, and what is assumed by programs everywhere...sigh).
Also perform some checking of the passed down packet to avoid running off the end of a mbuf chain.
Reviewed by: fenner
|
14549 |
11-Mar-1996 |
fenner |
Cleaned up uninitialized 'rt' warning properly.
Make a copy of the header of a packet that gets queued due to lack of forwarding cache entry, so that nobody else can step on it. Thanks to Mike Karels <karels@bsdi.com> for pointing this one out.
|
14546 |
11-Mar-1996 |
dg |
Move or add #include <queue.h> in preparation for upcoming struct socket changes.
|
14328 |
02-Mar-1996 |
peter |
Add more options into the conf/options and i386/conf/options.i386 files and the #include hooks so that 'make depend' is more useful. This covers most of the options I regularly use (but not all) and some other easy ones.
|
14293 |
28-Feb-1996 |
phk |
Forgot to remove this file.
|
14281 |
27-Feb-1996 |
bde |
Spell tcp_listendrop consistently so that tcp_input.c and netstat compile.
|
14268 |
26-Feb-1996 |
guido |
Add a counter for the number of times the listen queue was overflowed to the tcpstat structure. (netstat -s) Reviewed by: wollman Obtained from: Stevens, TCP/IP Ill. vol.3, page 189
|
14266 |
26-Feb-1996 |
phk |
Fix wrong logic, certain rules never matched.
|
14232 |
24-Feb-1996 |
phk |
Make getsockopt() capable of handling more than one mbuf worth of data. Use this to read rules out of ipfw. Add the lkm code to ipfw.c
|
14230 |
24-Feb-1996 |
phk |
The new firewall functionality: Filter on the direction (in/out). Filter on fragment/not fragment.
|
14226 |
23-Feb-1996 |
phk |
I overlooked this one.
|
14209 |
23-Feb-1996 |
phk |
Big sweep over the IPFIREWALL and IPACCT code.
Close the ip-fragment hole.
Waste less memory.
Rewrite to contemporary more readable style.
Kill separate IPACCT facility, use "accept" rules in IPFIREWALL.
Filter incoming >and< outgoing packets.
Replace "policy" by sticky "deny all" rule.
Rules have numbers used for ordering and deletion.
Remove "reorder" code entirely.
Count packet & bytecount matches for rules.
Code in -current & -stable is now the same.
|
14195 |
22-Feb-1996 |
peter |
Make the default behavior of local port assignment match traditional systems (my last change did not mix well with some firewall configurations). As much as I dislike firewalls, this is one thing I was not prepared to break by default.. :-)
Allow the user to nominate one of three ranges of port numbers as candidates for selecting a local address to replace a zero port number. The ranges are selected via a setsockopt(s, IPPROTO_IP, IP_PORTRANGE, &arg) call. The three ranges are: default, high (to bypass firewalls) and low (to get a port below 1024).
The default and high port ranges are sysctl settable under sysctl net.inet.ip.portrange.*
This code also fixes a potential deadlock if the system accidentally ran out of local port addresses. It'd drop into an infinite while loop.
The secure port selection (for root) should reduce overheads and increase reliability of rlogin/rlogind/rsh/rshd if they are modified to take advantage of it.
Partly suggested by: pst Reviewed by: wollman
|
14181 |
22-Feb-1996 |
dg |
Fixed bug in Path MTU Discovery that caused the system to have to rediscover the Path MTU for each connection if the connecting host didn't offer an initial MSS.
Submitted by: davidg & olah
|
14163 |
20-Feb-1996 |
fenner |
Make the "arpresolve: can't allocate llinfo" error message more useful by printing out the IP address it was trying to resolve, since we're seeing so many complaints about this error.
|
13971 |
08-Feb-1996 |
wollman |
#if out unsupported IMP code.
|
13929 |
05-Feb-1996 |
wollman |
Provide a direct entry point for IP input. This actually results in a slight decrease in performance, but will lead to better performance later.
|
13926 |
05-Feb-1996 |
wollman |
Fill in the corresponding ether address of multicast and broadcast pseudo-``ARP entries'' so arp(8) doesn't show them as `unresolved'.
|
13879 |
03-Feb-1996 |
phk |
Make the sorting of IPFW rules an option. You don't want it to sort them. >>>WARNING<<< you may have to revisit your firewall setup.
|
13779 |
31-Jan-1996 |
olah |
Fix a bug related to the interworking of T/TCP and window scaling: when a connection enters the ESTBLS state using T/TCP, then window scaling wasn't properly handled. The fix is twofold.
1) When the 3WHS completes, make sure that we update our window scaling state variables.
2) When setting the `virtual advertized window', then make sure that we do not try to offer a window that is larger than the maximum window without scaling (TCP_MAXWIN).
Reviewed by: davidg Reported by: Jerry Chen <chen@Ipsilon.COM>
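The second part of the fix above can be sketched in a few lines. This is only an illustration, not the committed kernel code: `clamp_unscaled_window` is a hypothetical helper, and `TCP_MAXWIN` is assumed here to be 65535, the largest value the 16-bit TCP window field can carry when no scale factor applies.

```c
#include <stdint.h>

/* Largest window expressible in the 16-bit header field (assumed value). */
#define TCP_MAXWIN 65535u

/*
 * Hypothetical helper: limit the window we would like to advertise to
 * what can be represented without window scaling.
 */
static uint32_t
clamp_unscaled_window(uint32_t wanted)
{
	return (wanted > TCP_MAXWIN) ? TCP_MAXWIN : wanted;
}
```

A caller that wants to offer 100000 bytes before scaling is in effect would end up advertising 65535.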
|
13765 |
30-Jan-1996 |
mpp |
Fix a bunch of spelling errors in the comment fields of a bunch of system include files.
|
13638 |
26-Jan-1996 |
phk |
The last part of the ether_sprint -> %6D change. Sorry for the delay. (%D is for hexdumping.)
|
13619 |
24-Jan-1996 |
phk |
Use new printf features rather than local kludges.
|
13581 |
23-Jan-1996 |
fenner |
First piece of fixing ppp/proxy arp problem:
If an attempt to add a route fails because an "ARP table" entry is in the way, remove the ARP entry and retry the add.
Reviewed by: nate
|
13492 |
19-Jan-1996 |
peter |
remove tcp_lastport - it has not been used for quite a while (at least since the hashed pcb's I think).
|
13491 |
19-Jan-1996 |
peter |
Change the default local address range for IP from 1024 through 5000 to 20000 through 30000. These numbers are used for local IP port numbers when an explicit address is not specified.
The values are sysctl modifiable under: net.inet.ip.port_{first|last}_auto
These numbers do not overlap with any known server addresses, without going above 32768 which are "negative" on some other implementations.
20000 through 30000 is 2.5 times larger than the old range, but some have suggested even that may not be enough... (gasp!) Setting a low address of 10000 should be plenty.. :-)
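The "2.5 times larger" figure can be checked with a small sketch. `port_range_size` is a hypothetical helper that counts the ports in an inclusive range; it is not part of the commit.

```c
/* Number of ports in the inclusive range [first, last]. */
static unsigned
port_range_size(unsigned first, unsigned last)
{
	return last - first + 1;
}
```

The old range 1024 through 5000 holds 3977 ports and the new range 20000 through 30000 holds 10001, a ratio of roughly 2.5.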
|
13486 |
19-Jan-1996 |
fenner |
Add definitions for ICMP router discovery.
Reviewed by: wollman
|
13475 |
17-Jan-1996 |
olah |
Be more conservative when T/TCP extensions are disabled. In particular, do not send data and/or FIN on SYN segments in this case.
|
13357 |
09-Jan-1996 |
dg |
Fix logic bug (!= should be ==) in recent P2P/multicast kludge.
Reviewed by: Bill Fenner <fenner@parc.xerox.com> Submitted by: Dave Marquardt <marquard@austin.ibm.com>
|
13351 |
08-Jan-1996 |
guido |
Fix a bug where having a process listening to both an INADDR_ANY and a local address, that was assigned with ifconfig alias and netmask 0xffffffff, would receive duplicate udp packets. This behaviour can easily be seen by having named run, and using the alias address as the name server. This solution is not the prettiest one, but after talk with Garreth, it is seen as the easiest one.
|
13266 |
05-Jan-1996 |
wollman |
Finally demolished the last, tottering remnants of GATEWAY. If you want to enable IP forwarding, use sysctl(8). Also did the same for IPX, which involved inventing a completely new MIB from whole cloth (which I may not quite have correct); be aware of this if you use IPX forwarding. (The two should never have been controlled by the same option anyway.)
|
13229 |
04-Jan-1996 |
olah |
Reverse the modification which caused the annoying m_copydata crash: set the TF_ACKNOW flag when the REXMT timer goes off to force a retransmission. In certain situations pulling snd_nxt back to snd_una is not sufficient.
|
13200 |
03-Jan-1996 |
wollman |
Try to make multicast routing work correctly over point-to-point links (which was broken previously by the support for half-routers).
Submitted by: Bill Fenner <fenner@parc.xerox.com>
|
13091 |
29-Dec-1995 |
dg |
Remove some bogus externs.
|
12956 |
21-Dec-1995 |
wollman |
If _IP_VHL is defined, declare a single ip_vhl member in struct ip rather than separate ip_v and ip_hl members. Should have no effect on current code, but I'd eventually like to get rid of those obnoxious bitfields completely.
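To illustrate what a single `ip_vhl` member replaces, here is a minimal sketch of packing and unpacking the version and header length with shifts and masks instead of bitfields. The helper names are hypothetical and chosen for this example only.

```c
#include <stdint.h>

/* Pack a 4-bit version and a 4-bit header length (in 32-bit words). */
static uint8_t
ip_make_vhl(unsigned version, unsigned hlen_words)
{
	return (uint8_t)(((version & 0x0f) << 4) | (hlen_words & 0x0f));
}

/* Extract the version from the high nibble. */
static unsigned
ip_vhl_version(uint8_t vhl)
{
	return vhl >> 4;
}

/* Extract the header length (in 32-bit words) from the low nibble. */
static unsigned
ip_vhl_hlen(uint8_t vhl)
{
	return vhl & 0x0f;
}
```

An IPv4 header without options (version 4, five words) packs to the familiar first byte 0x45, independent of how a compiler would lay out bitfields.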
|
12955 |
21-Dec-1995 |
wollman |
Delete old-style-broadcast-address compatibility cruft in IP input path. If users want to use the old-style broadcast addresses, they will have to correctly configure their systems.
|
12942 |
20-Dec-1995 |
wollman |
in_proto.c: spell ``Internet'' right and put whitespace after commas.
others: start to populate the link-layer branch of the net mib, by moving ARP to its proper place. (ARP is not a protocol family, it's an interface layer between a medium-access layer and a protocol family.) sysctl(8) needs to be taught about the structure of this branch, unless Poul-Henning implements dynamic MIB exploration soon.
|
12940 |
20-Dec-1995 |
wollman |
Demolish DIRECTED_BROADCAST. It was always a bad idea, and nobody uses it.
|
12939 |
20-Dec-1995 |
wollman |
Fix a nagging divide-by-zero error resulting from the MTU discovery code getting triggered at a bad time.
|
12934 |
19-Dec-1995 |
wollman |
Added a comment about why trying to make a one-behind cache for the route in ip_output() is a bad idea.
|
12933 |
19-Dec-1995 |
wollman |
Actually call in_rtqdrain() as was originally intended.
|
12881 |
16-Dec-1995 |
bde |
Uniformized pr_ctlinput protosw functions. The third arg is now `void *' instead of caddr_t and it isn't optional (it never was). Most of the netipx (and netns) pr_ctlinput functions abuse the second arg instead of using the third arg but fixing this is beyond the scope of this round of changes.
|
12877 |
16-Dec-1995 |
bde |
Added a prototype.
|
12820 |
14-Dec-1995 |
phk |
Another mega commit to staticize things.
|
12704 |
09-Dec-1995 |
phk |
Staticize.
|
12693 |
09-Dec-1995 |
phk |
Remove old ballast, clean up a little bit, staticize. Add six sysctl variables that you should probably never tweak:
net.arp.t_prune: 300
net.arp.t_keep: 1200
net.arp.t_down: 20
net.arp.maxtries: 5
net.arp.useloopback: 1
net.arp.proxyall: 0
(It's net.arp because arp isn't limited to inet, though our present implementation surely is).
|
12676 |
08-Dec-1995 |
wollman |
Added a conditionalized printf for debugging MTU discovery.
|
12657 |
06-Dec-1995 |
bde |
Removed unnecessary #includes of vm stuff. Most of them were once prerequisites for <sys/sysctl.h>.
subr_prof.c: Also replaced #include of <sys/user.h> by #include of <sys/resourcevar.h>.
|
12644 |
05-Dec-1995 |
bde |
Added explicit include of <sys/queue.h>. Currently, some things only compile because <vm/vm.h> happens to be gratuitously included before <netinet/in_pcb.h> and <vm/vm.h> happens to include <sys/queue.h>.
|
12635 |
05-Dec-1995 |
wollman |
Path MTU Discovery is now standard.
|
12628 |
05-Dec-1995 |
dg |
all: Removed ifnet.if_init and ifnet.if_reset as they are generally unused. Change the parameter passed to if_watchdog to be a ifnet * rather than a unit number. All of this is an attempt to move toward not needing an array of softc pointers (which is usually static in size) to point to the driver softc.
if_ed.c: Changed some of the argument passing to some functions to make a little more sense.
if_ep.c, if_vx.c: Killed completely bogus use of if_timer. It was being set in such a way that the interface was being reset once per second (blech!).
|
12579 |
02-Dec-1995 |
bde |
Completed function declarations and/or added prototypes.
|
12426 |
20-Nov-1995 |
phk |
fix #includes & warnings.
|
12376 |
18-Nov-1995 |
bde |
Fixed the type of a function pointer.
|
12325 |
16-Nov-1995 |
bde |
Fixed recent staticizations. Some prototypes for static functions were left in headers and not staticized.
|
12296 |
14-Nov-1995 |
phk |
New style sysctl & staticize a lot of stuff.
|
12172 |
09-Nov-1995 |
phk |
Start adding new style sysctl here too.
|
12047 |
03-Nov-1995 |
olah |
Cosmetic changes to processing of segments in the SYN_SENT state: - remove a redundant condition; - complete all validity checks on segment before calling soisconnected(so).
Reviewed by: Richard Stevens, davidg, wollman
|
12046 |
03-Nov-1995 |
olah |
Setting the TF_ACKNOW flag was redundant in the REXMT timeout because tcp_output() checks for the condition snd_nxt == snd_una.
Reviewed by: davidg, wollman, olah Suggested by: Richard Stevens
|
12045 |
03-Nov-1995 |
olah |
Fix a logical error in T/TCP: when we actively open a connection, we have to decide whether to send a CC or CCnew option in our SYN segment depending on the contents of our TAO cache. This decision has to be made once when the connection starts. The earlier code delayed this decision until the segment was assembled in tcp_output() and retransmitted SYN segments could have different CC options.
Reviewed by: Richard Stevens, davidg, wollman
|
12003 |
01-Nov-1995 |
wollman |
Instrument the IP input queue with two new read-only MIB entries: net.inet.ip.intr-queue-maxlen (=== ipintrq.ifq_maxlen) and net.inet.ip.intr-queue-drops (=== ipintrq.ifq_drops)
There should probably be a standard way of getting the same information going the other way.
|
11928 |
29-Oct-1995 |
olah |
Start the 2MSL timer when the socket is closed and the TCP connection is in the FIN_WAIT_2 state in order to prevent the conn. hanging there forever.
Reviewed by: davidg, olah Submitted by: Arne Henrik Juul <arnej@imf.unit.no> Obtained from: bugs@netbsd.org
|
11921 |
29-Oct-1995 |
phk |
Second batch of cleanup changes. This time mostly making a lot of things static and some unused variables here and there.
|
11819 |
26-Oct-1995 |
julian |
Reviewed by: julian and jhay@mikom.csir.co.za Submitted by: Mike Mitchell, supervisor@alb.asctmd.com
This is a bulk import of Mike's IPX/SPX protocol stacks and all the related gunf that goes with it.. it is not guaranteed to work 100% correctly at this time but as we had several people trying to work on it I figured it would be better to get it checked in so they could all get the same thing to work on..
Mike's been using it for a year or so but on 2.0
more changes and stuff will be merged in from other developers now that this is in.
Mike Mitchell, Network Engineer AMTECH Systems Corporation, Technology and Manufacturing 8600 Jefferson Street, Albuquerque, New Mexico 87113 (505) 856-8000 supervisor@alb.asctmd.com
|
11706 |
23-Oct-1995 |
ugen |
Support all the tcpflag options in firewall. Add reading options from file: ipfw <filename> now reads commands from the file string after string; each string has the same form as the command line interface.
|
11680 |
22-Oct-1995 |
phk |
Remove the last trace of arptnew()
|
11603 |
21-Oct-1995 |
dg |
Fix panic caused by PRU_CONTROL not being dealt with properly. Bug pointed out by David Maltz <dmaltz@orval.mach.cs.cmu.edu>, but this fix is by me.
|
11537 |
16-Oct-1995 |
wollman |
The ability to administratively change the MTU of an interface presents a few new wrinkles for MTU discovery which tcp_output() had better be prepared to handle. ip_output() is also modified to do something helpful in this case, since it has already calculated the information we need.
|
11458 |
13-Oct-1995 |
wollman |
Routes can be asymmetric. Always offer to /accept/ an MSS of up to the capacity of the link, even if the route's MTU indicates that we cannot send that much in their direction. (This might actually make it possible to test Path MTU discovery in a useful variety of cases.)
|
11450 |
12-Oct-1995 |
wollman |
The additional checks involving sequence numbers in MTU discovery resends turned out not to be necessary; simply watching for MTU decreases (which we already did) automagically eliminates all the cases we were trying to protect against.
|
11415 |
10-Oct-1995 |
wollman |
More MTU discovery: avoid over-retransmission if route changes in the middle of a fully-open window. Also, keep track of how many retransmits we do as a result of MTU discovery. This may actually do more work than necessary, but it's an unusual condition...
Suggested by: Janey Hoe <janey@lcs.mit.edu>
|
11284 |
06-Oct-1995 |
wollman |
Put newline at end of log()ed messages so syslog can't fill up your /var quite as fast.
|
11225 |
05-Oct-1995 |
wollman |
Convert ARP to use queue.h macros rather than insque/remque. While we're at it, eliminate obsolete exposure of `struct llinfo_arp' to the world. (This dates back to when ARP entries were not stored in the routing table, and there was no other way for the `arp' program to read the whole table than to grovel around in /dev/kmem.)
|
11187 |
04-Oct-1995 |
wollman |
Make a whole bunch of PCB variables ints rather than shorts. There appear to be no ill effects, and so far as I know none of the variables in question depend on 16-bit wraparound behavior. (The sizes are in many cases relics from when a PCB had to fit inside a 128-byte mbuf. PCBs are no longer stored in that way, and the old structure would not have fit, either.)
|
11150 |
03-Oct-1995 |
wollman |
Finish 4.4-Lite-2 merge: randomize TCP initial sequence numbers to make ISS-guessing spoofing attacks harder.
|
11119 |
01-Oct-1995 |
ugen |
Well..finally..this is the first part..it should take care of matching IP options..Check and test this - I made only a couple of rough tests and this could be buggy.. Ipaccounting can't use IP Options (and I don't see any need to count packets with specific options either..) More to come...
|
10965 |
22-Sep-1995 |
wollman |
Merge 4.4-Lite-2: update version number (we already have the same fixes).
Obtained from: 4.4BSD-Lite-2
|
10961 |
22-Sep-1995 |
wollman |
Merge 4.4-Lite-2: always check the UDP checksum if it is present, even if we are not generating checksums. (Save a test in the input path.)
|
10956 |
22-Sep-1995 |
wollman |
Correct spelling error in MTUDISC code.
|
10950 |
22-Sep-1995 |
peter |
Remove duplicate definition for tcps_persistdrop, as added by davidg some time ago. I left in Garrett's one, because his was in the 4.4-Lite-2 location, making any diffs just that little bit smaller.
I presume this choice means that netstat needs to be recompiled before "netstat -s" will give a meaningful answer on tcp stats.
|
10944 |
21-Sep-1995 |
wollman |
Merge with 4.4-Lite-2: fix bug that caused getsockopt of IP_HDRINCL to fail.
Obtained from: 4.4BSD-Lite-2
|
10942 |
21-Sep-1995 |
wollman |
Merge 4.4-Lite-2 by updating the version number.
Obtained from: 4.4BSD-Lite-2
|
10941 |
21-Sep-1995 |
wollman |
Merge 4.4-Lite-2: update some declarations that we don't support anyway.
Obtained from: 4.4BSD-Lite-2
|
10940 |
21-Sep-1995 |
wollman |
Merge 4.4-Lite-2: use M_NOWAIT in in_pcballoc(), and return EACCES rather than EPERM on illegal attempt to bind a reserved port.
Obtained from: 4.4BSD-Lite-2
|
10939 |
21-Sep-1995 |
wollman |
Merge with 4.4-Lite-2. This is actually a 64-bit fix; the second parameter to in_control() is sometimes a pointer, and sometimes an integer, so use u_long rather than int.
Obtained from: 4.4BSD-Lite-2
|
10938 |
21-Sep-1995 |
wollman |
Merge with 4.4-Lite-2. This involves changing the version number and moving a declaration around.
Obtained from: 4.4BSD-Lite-2
|
10937 |
21-Sep-1995 |
wollman |
Merge with 4.4-Lite-2. This just adds a couple of tcpstat entries which we don't currently set, but might in the future.
|
10930 |
20-Sep-1995 |
wollman |
Add support in TCP for Path MTU discovery. This is highly experimental and gated on `options MTUDISC' in the source. It is also practically untested because (sniff!) I don't have easy access to a network with an MTU of less than an Ethernet. If you have a small MTU network, please try it and tell me if it works!
|
10881 |
18-Sep-1995 |
wollman |
Initial back-end support for IP MTU discovery, gated on MTUDISC. The support for TCP has yet to be written.
|
10714 |
13-Sep-1995 |
wollman |
Don't leak mbufs in an unusual error case in tcp_usrreq().
Reviewed by: Andras Olah <olah@freebsd.org> Obtained from: Lite-2
|
10712 |
13-Sep-1995 |
wollman |
If tcp_output() is unable to allocate space for a copy of the data waiting to be sent, just clean up and return ENOBUFS rather than silently proceeding without sending any of the data. This makes it consistent with the `#ifdef notyet' case immediately above.
Reviewed by: Andras Olah <olah@freebsd.org> Obtained from: Lite-2
|
10421 |
29-Aug-1995 |
wollman |
Fix long-standing bug in ICMPPRINTFS code where NTOHL was used instead of ntohl for printing IP addresses, by instead substituting inet_ntoa() to produce human-readable output.
Obtained from: 4.4-Lite-2
|
10203 |
23-Aug-1995 |
wollman |
Fix some problems with multicast forwarding:
Garrett,
Here are some patches for the rate limiting code. It should be faster, and in particular it doesn't leak malloc'd memory any more when rate_limit'ing a phyint.
It now uses an mbuf chain at each vif, instead of the static queue array. This means that the MAXQSIZE is now variable per vif (although there is no interface to change it other than a debugger); this is an area for more experimentation.
Bill
Submitted by: Bill Fenner <fenner@parc.xerox.com>
|
10095 |
17-Aug-1995 |
olah |
Add a sanity check for the UDP length field in order to prevent malformed UDP packets from panicking the kernel. Reviewed by: davidg, wollman Obtained from: dab@berserkly.cray.com (David A. Borman) via end2end list
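The log message does not say exactly what the check looks like, so the following is only a sketch of one plausible form under stated assumptions: the UDP length must cover at least the 8-byte UDP header and must not claim more bytes than actually followed the IP header. `udp_length_ok` and `UDP_HDR_LEN` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define UDP_HDR_LEN 8u	/* fixed size of a UDP header in bytes */

/*
 * Hypothetical check: ulen is the UDP length field in host byte order,
 * ip_payload_len is the number of bytes that followed the IP header.
 */
static bool
udp_length_ok(uint16_t ulen, size_t ip_payload_len)
{
	return ulen >= UDP_HDR_LEN && ulen <= ip_payload_len;
}
```

A datagram whose length field is 4, or one claiming 200 bytes inside a 100-byte IP payload, would be dropped before any further processing.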
|
9820 |
31-Jul-1995 |
gpalmer |
Try to make the `syn' blocking code act a bit more sensibly - don't block `syn' packets that have `ack' set.
|
9818 |
31-Jul-1995 |
olah |
Remove a redundant `if' from tcp_reass().
Correct a typo in a comment (SEND_SYN -> NEEDSYN).
Reviewed by: David Greenman
|
9773 |
29-Jul-1995 |
dg |
Add connection drop capability for persist timeouts.
Reviewed by: Andras Olah Obtained from: 4.4BSD-lite2 via W. Richard Stevens
|
9728 |
26-Jul-1995 |
wollman |
Fix test for determining when RSVP is inactive in a router. (In this case, multicast options are not passed to ip_mforward().) The previous version had a wrong test, thus causing RSVP mrouters to forward RSVP messages in violation of the spec.
|
9682 |
24-Jul-1995 |
wollman |
Declare rsvp_input() to take the correct set of arguments and figure out the receipt interface in the correct way.
|
9680 |
24-Jul-1995 |
wollman |
Completely turn off RSVP intercept when a socket being used for that purpose is PRU_DETACHed. This solves the problem that RSVP would not come up in raw mode if previously killed.
|
9661 |
23-Jul-1995 |
dg |
Added $Id$.
|
9575 |
18-Jul-1995 |
peter |
Change the compile-time option of DIRECTED_BROADCAST into a sysctl variable underneath ip, "directed-broadcast". Reviewed by: David Greenman Obtained from: NetBSD, by Darren Reed.
|
9563 |
17-Jul-1995 |
wollman |
Return EDESTADDRREQ rather than EADDRNOTAVAIL if the user attempts to half-configure a point-to-point interface.
Submitted by: Jonathan M. Bresler <jmb@kryten.atinc.com>
|
9472 |
10-Jul-1995 |
wollman |
ICMP messages received from broken hosts which reply to multicast packets were mistakenly delivered, rather than getting thrown out, which caused substantial lossage.
Submitted by: Bill Fenner <fenner@parc.xerox.com>
|
9470 |
10-Jul-1995 |
wollman |
tcp_input.c - keep track of how many times a route contained a cached rtt or ssthresh that we were able to use
tcp_var.h - declare tcpstat entries for above; declare tcp_{send,recv}space
in_rmx.c - fill in the MTU and pipe sizes with the defaults TCP would have used anyway in the absence of values here
|
9460 |
09-Jul-1995 |
dg |
Fixed panic that occurs on certain firewall rejected packets that was caused by dtom() being used on an mbuf cluster. The fix involves passing around the mbuf pointer.
Submitted by: Bill Fenner
|
9392 |
04-Jul-1995 |
dg |
Added some spaces for KNF. Moved some zero-initialized pointers into the kernel's .bss.
|
9391 |
04-Jul-1995 |
dg |
This is the end result of about a dozen passes through this code to fix incorrect indents, a variety of poor coding practices such as comparing pointers to constants ('0'), poor code structuring, etc, etc. This brings the code up to the minimum standards for inclusion in FreeBSD.
|
9390 |
04-Jul-1995 |
dg |
Define TRUE and FALSE.
|
9389 |
04-Jul-1995 |
dg |
1) Removed bogus #include
2) Rewrote "bad_packet" code to be less buggy and more readable.
3) Removed a pile of goto's; the code is now somewhat less reminiscent of a certain Italian pasta.
4) Changed all boolean returns of "0" and "1" to FALSE/TRUE.
|
9386 |
02-Jul-1995 |
joerg |
Slightly modify my previous change to return EINVAL instead of EFAULT.
Submitted by: Peter Wemm
|
9383 |
01-Jul-1995 |
joerg |
I saw a very low-key commit message on the netbsd mailing lists and figured out what the problem was.. Anyway, I rate it as "highly serious".
Submitted by: peter@haywire.DIALix.COM (Peter Wemm)
|
9373 |
29-Jun-1995 |
wollman |
Keep track of the number of samples through the srtt filter so that we know better when to cache values in the route, rather than relying on a heuristic involving sequence numbers that broke when tcp_sendspace was increased to 16k.
|
9359 |
28-Jun-1995 |
gpalmer |
Add a missing `goto' statement so that this compiles yet again.
|
9347 |
28-Jun-1995 |
dg |
Added function prototypes for ip_rsvp_vif_init, ip_rsvp_vif_done, and ip_rsvp_force_done.
|
9339 |
27-Jun-1995 |
wollman |
Delete obsolete #if 0 block.
|
9338 |
27-Jun-1995 |
guido |
reject option in ip_fw used to panic the system. This fixes it.
-Guido
|
9334 |
26-Jun-1995 |
wollman |
From Bill Fenner:
> Also, I don't remember if I sent you this; it affects PIM assert processing.
Submitted by: Bill Fenner <fenner@parc.xerox.com>
|
9333 |
26-Jun-1995 |
wollman |
Corrected a bug that caused protocol-4 tunnels (used for multicast forwarding between networks that aren't directly connected) not to work by intercepting the wrong protocol number. This should fix a bug reported previously by someone I don't remember.
|
9279 |
21-Jun-1995 |
wollman |
Fix an error in the comparison direction of the ap->updating case of in_rtqkill().
Submitted by: W. Richard Stevens
|
9266 |
19-Jun-1995 |
wollman |
Fix a resource allocation bug where multicast forwarding would leak mbufs in certain cases when allocation of another mbuf has already failed.
Submitted by: Bill Fenner <fenner@parc.xerox.com>
|
9263 |
19-Jun-1995 |
wollman |
Now that we've gone to all sorts of effort to allow TCP to cache some of its connection parameters, we want to keep statistics on how often this actually happens to see whether there is any work that needs to be done in TCP itself.
Suggested by: John Wroclawski <jtw@lcs.mit.edu>
|
9209 |
13-Jun-1995 |
wollman |
Kernel side of 3.5 multicast routing code, based on work by Bill Fenner and other work done here. The LKM support is probably broken, but it still compiles and will be fixed later.
|
9202 |
11-Jun-1995 |
rgrimes |
Merge RELENG_2_0_5 into HEAD
|
8876 |
30-May-1995 |
rgrimes |
Remove trailing whitespace.
|
8546 |
16-May-1995 |
dg |
These diffs modify the behaviour of multicast clients to conform with the IGMPv2 spec. This fixes the following bugs:
o ntohs() on a char provides silly results
o timer needs to be scaled to units of PR_FASTHZ; this was being done inconsistently so now it gets done when it is initialized.
Reviewed by: Garrett Wollman Submitted by: Bill Fenner <fenner@parc.xerox.com>
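Both bugs can be shown with a short sketch. It is not the submitted patch: `swap16` imitates what `ntohs()` does on a little-endian host, `igmp_code_to_ticks` is a hypothetical helper, and the values of `PR_FASTHZ` (5 per second) and `IGMP_TIMER_SCALE` (IGMPv2 response times are in tenths of a second) are assumptions for the example.

```c
#include <stdint.h>

#define PR_FASTHZ        5	/* fast timeouts per second (assumed value) */
#define IGMP_TIMER_SCALE 10	/* code is in tenths of a second (assumed name) */

/* What ntohs() does on a little-endian host: swap the two bytes. */
static uint16_t
swap16(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}

/* Convert an 8-bit IGMPv2 max-response code to fast-timeout ticks. */
static unsigned
igmp_code_to_ticks(uint8_t code)
{
	return (unsigned)code * PR_FASTHZ / IGMP_TIMER_SCALE;
}
```

Byte-swapping a one-byte code of 100 yields 25600 rather than 100, which is the "silly result"; scaling it once at initialization gives 50 ticks, i.e. ten seconds at five ticks per second.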
|
8483 |
12-May-1995 |
ache |
Fix getsockopt(IP_ACCT_*) to not panic kernel Submitted by: Bill Fenner <fenner@parc.xerox.com>
|
8456 |
11-May-1995 |
rgrimes |
Fix -Wformat warnings from LINT kernel.
|
8429 |
11-May-1995 |
dg |
#ifdef'd my Nagle/ACK hack with "TCP_ACK_HACK", disabled by default. I'm currently considering reducing the TCP fasttimo to 100ms to help improve things, but this would be done as a separate step at some point in the future. This was done because it was causing some sometimes serious performance problems with T/TCP.
|
8426 |
11-May-1995 |
wollman |
Make networking domains drop-ins, through the magic of GNU ld. (Some day, there may even be LKMs.) Also, change the internal name of `unixdomain' to `localdomain' since AF_LOCAL is now the preferred name of this family. Declare netisr correctly and in the right place.
|
8384 |
09-May-1995 |
dg |
Replaced some bcopy()'s with memcpy()'s so that gcc will inline/optimize.
|
8377 |
09-May-1995 |
olah |
Fix a misspelled constant in tcp_input.c.
On Tue, 09 May 1995 04:35:27 PDT, Richard Stevens wrote:
> In tcp_dooptions() under the case TCPOPT_CC there is an assignment
>
> to->to_flag |= TCPOPT_CC;
>
> that should be
>
> to->to_flag |= TOF_CC;
>
> I haven't thought through the ramifications of what's been happening ...
>
> Rich Stevens
Submitted by: rstevens@noao.edu (Richard Stevens)
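A small sketch shows why OR-ing the option kind into a flag word is harmful. `TCPOPT_CC` is the on-the-wire option kind 11 from RFC 1644; the `TOF_*` bit values below are assumptions made for this illustration, and the two helper functions are hypothetical.

```c
#include <stdint.h>

#define TCPOPT_CC 11		/* option kind on the wire (RFC 1644) */

/* Flag bits for to_flag; these values are assumed for illustration. */
#define TOF_TS     0x0001
#define TOF_CC     0x0002
#define TOF_CCNEW  0x0004
#define TOF_CCECHO 0x0008

/* The misspelled version: ORs in the option kind, not a flag bit. */
static uint32_t
buggy_mark_cc(uint32_t flags)
{
	return flags | TCPOPT_CC;
}

/* The corrected version: sets only the CC flag. */
static uint32_t
fixed_mark_cc(uint32_t flags)
{
	return flags | TOF_CC;
}
```

With these assumed bit values, 11 is binary 1011, so the misspelled constant would also mark the timestamp and CC.ECHO options as present.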
|
8293 |
05-May-1995 |
ache |
Add IPTOS_MINCOST according to RFC 1349. Change IPTOS_PREC_ROUTINE to 0 (it conflicted with IPTOS_LOWDELAY) according to RFC 791 (unchanged since then) and BSDI 2.0 style. Submitted by: Igor Sviridov <siac@ua.net>
|
8235 |
03-May-1995 |
dg |
Changed in_pcblookuphash() to not automatically call in_pcblookup() if the lookup fails. Updated callers to deal with this. Call in_pcblookuphash instead of in_pcblookup() in in_pcbconnect; this improves performance of UDP output by about 17% in the standard case.
|
8090 |
26-Apr-1995 |
pst |
Cleanup loopback interface support. Reviewed by: wollman
|
8071 |
25-Apr-1995 |
wollman |
Disallow half-configured point-to-point interfaces. It's still possible to get into a half-configured state by using the old-style ioctls; this may be a feature.
|
7933 |
19-Apr-1995 |
olah |
Include <sys/queue.h> because <netinet/in_pcb.h> (also included later in tcp_debug.c) requires it due to the pcb changes of DavidG.
|
7770 |
12-Apr-1995 |
dg |
Fixed bug I introduced when changing PCB list to use 4.4BSD style queue macros. Basically, detect 'tp' going away differently.
|
7738 |
10-Apr-1995 |
dg |
Further satisfy my paranoia by making sure that the ACKNOW is only set when ti_len is non-zero.
|
7737 |
10-Apr-1995 |
dg |
Fixed bug I introduced with my Nagle hack which caused tcp_input and tcp_output to loop endlessly. This was freefall's problem during the past day.
|
7735 |
10-Apr-1995 |
dg |
Added splnet protections for PCB list manipulations and traversals.
|
7728 |
10-Apr-1995 |
dg |
Backed out Jordan's #include of queue.h
|
7720 |
09-Apr-1995 |
jkh |
#include <sys/queue.h> or die horribly.
|
7684 |
09-Apr-1995 |
dg |
Implemented PCB hashing. Includes new functions in_pcbinshash, in_pcbrehash, and in_pcblookuphash.
|
7634 |
05-Apr-1995 |
olah |
Fix a bug in tcp_input reported by Rick Jones <raj@hpisrdq.cup.hp.com>.
If a goto findpcb occurred during the processing of a segment, the TCP and IP headers were dropped twice from the mbuf which resulted in data acked by TCP but not delivered to the user. Reviewed by: davidg
|
7593 |
02-Apr-1995 |
bde |
Remove redundant declarations.
|
7575 |
02-Apr-1995 |
wpaul |
Add declaration for struct ether_addr (this is where Sun documents it to go).
|
7504 |
30-Mar-1995 |
dg |
Backed out changes in rev 1.5 that prevent sending FIN if in CLOSING state. This causes an infinite loop in some rare cases (probably caused by some other, much more difficult to find bug).
|
7417 |
27-Mar-1995 |
dg |
Re-apply my "breakage" to the Nagle congestion avoidance. This version differs slightly in the logic from the previous version; packets are now acked immediately if the sender set PUSH.
|
7280 |
23-Mar-1995 |
wollman |
in_var.h: in_multi structures now form a queue(3)-style LIST structure
in.c: when an interface address is deleted, keep its multicast membership records (attached to a struct multi_kludge) for attachment to the next address on the same interface. Also, in_multi structures now gain a reference to the ifaddr so that they won't point off into freed memory if an interface goes away and doesn't come back before the last socket reference drops. This is analogous to how it is done for routes, and seems to make the most sense.
|
7191 |
20-Mar-1995 |
wollman |
This should be splimp() rather than splnet() since ifaddrs might go away as a result of link-layer processing.
|
7190 |
20-Mar-1995 |
wollman |
Fix race conditions involved in setting IP multicast options. This should fix Dennis Fortin's problem for good, if I've got it figured out right.
(The problem was that a `struct ifaddr' could get deleted out from under the current requester, thus leaving him with an invalid interface pointer and causing even more bogus accesses.)
|
7170 |
19-Mar-1995 |
dg |
Removed redundant newlines that were in some panic strings.
|
7091 |
16-Mar-1995 |
wollman |
Reject source routes unless configured on by administrator.
|
7090 |
16-Mar-1995 |
bde |
Add and move declarations to fix all of the warnings from `gcc -Wimplicit' (except in netccitt, netiso and netns) and most of the warnings from `gcc -Wnested-externs'. Fix all the bugs found. There were no serious ones.
|
7088 |
16-Mar-1995 |
wollman |
Add inet_ntoa() and replace ARP's private routine with same.
|
7083 |
16-Mar-1995 |
wollman |
This set of patches enables IP multicasting to work under FreeBSD. I am submitting them as context diffs for the following files:
sys/netinet/ip_mroute.c sys/netinet/ip_var.h sys/netinet/raw_ip.c usr.sbin/mrouted/igmp.c usr.sbin/mrouted/prune.c
The routine rip_ip_input in raw_ip.c is suggested by Mark Tinguely (tinguely@plains.nodak.edu). I have been running mrouted with these patches for over a week and nothing has seemed seriously wrong. It is being run in two places on our network as a tunnel on one and a subnet querier on the other. The only problem I have run into is that mrouted on the tunnel must start up last or the pruning isn't done correctly and multicast packets flood your subnets.
Submitted by: Soochon Radee <slr@mitre.org>
|
7060 |
14-Mar-1995 |
dg |
pcb allocations are not always done on behalf of a process; it is not okay to wait.
|
7055 |
14-Mar-1995 |
dg |
Added support for generic FDDI and the DEC DEFEA and DEFPA FDDI adapters.
Submitted by: Matt Thomas
|
7035 |
12-Mar-1995 |
ugen |
Allocate memory as M_IPFW; now we can watch firewall memory usage in vmstat..
|
6922 |
06-Mar-1995 |
nate |
Removed unnecessary define for TCPOUTFLAGS since they are not used.
|
6835 |
02-Mar-1995 |
dg |
Move exact match pcb's to the head of the list to improve lookup performance.
|
6690 |
24-Feb-1995 |
ugen |
Allow "via" to be specified either as an IP address or as interface name/unit...
|
6616 |
22-Feb-1995 |
bde |
Fix benign type mismatch.
|
6568 |
20-Feb-1995 |
dg |
Added missing newlines to calls to log().
|
6510 |
17-Feb-1995 |
wollman |
Include missing <sys/kernel.h> for `hz'.
Submitted by: David Greenman, Rod Grimes, Christoph Kukulies
|
6483 |
16-Feb-1995 |
wollman |
Don't need to retransmit FIN bit in CLOSING state.
Obtained from: Stevens, vol. 2, exercise 29.5 (solution p. 1090)
|
6482 |
16-Feb-1995 |
wollman |
spl back down in unusual out-of-memory condition in udp_output().
Obtained from: Stevens, vol. 2, exercise 23.4 (solution p. 1083)
|
6481 |
16-Feb-1995 |
wollman |
Correctly initialize so_linger in ticks (not seconds).
Obtained from: Stevens, vol. 2, p. 1010
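The unit mismatch fixed here can be illustrated with a small conversion helper. A minimal sketch, assuming a clock rate of 100 ticks per second in the usage note; the function name `linger_ticks` is hypothetical and is not taken from the kernel source.

```c
#include <assert.h>

/*
 * Convert a linger time in seconds to clock ticks. hz is the number
 * of clock ticks per second; callers pass the system's value.
 */
static int
linger_ticks(int seconds, int hz)
{
	assert(seconds >= 0 && hz > 0);
	return seconds * hz;
}
```

With hz assumed to be 100, storing a 120-second linger time without the conversion would be read back as 120 ticks, or 1.2 seconds, whereas `linger_ticks(120, 100)` gives 12000 ticks.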
|
6480 |
16-Feb-1995 |
wollman |
Avoid deadlock situation described by Stevens using his suggested replacement code.
Obtained from: Stevens, vol. 2, pp. 959-960
|
6479 |
16-Feb-1995 |
wollman |
Don't add back in the IP header length to ip_len; icmp_error will do it for us.
Obtained from: Stevens, vol. 2, p. 774
|
6475 |
16-Feb-1995 |
wollman |
Transaction TCP support now standard. Hack away!
|
6472 |
16-Feb-1995 |
wollman |
Add lots of useful MIB variables and a few not-so-useful ones for completeness.
|
6400 |
14-Feb-1995 |
wollman |
After dynamically reducing rtq_reallyold, have in_rtqkill() reduce the expiration timer of anything which would expire later than that. (There should be a way to call this from ip_sysctl() as well, but there currently isn't.)
|
6399 |
14-Feb-1995 |
wollman |
Attempt to make the host route cache a bit smarter under conditions of high load:
1) If there are ever more than net.inet.ip.rtmaxcache entries in the cache, in_rtqtimo() will reduce net.inet.ip.rtexpire by 1/3 and do another round, unless net.inet.ip.rtexpire is less than net.inet.ip.rtminexpire, and never more than once in ten minutes (rtq_timeout).
2) If net.inet.ip.rtexpire is set to zero, don't bother to cache anything.
|
6363 |
14-Feb-1995 |
phk |
YFfix.
|
6362 |
14-Feb-1995 |
phk |
YPfix
|
6348 |
14-Feb-1995 |
wollman |
Get rid of some unneeded #ifdef TTCP lines. Also, get rid of some bogus commons declared in header files.
|
6283 |
09-Feb-1995 |
wollman |
Merge Transaction TCP, courtesy of Andras Olah <olah@cs.utwente.nl> and Bob Braden <braden@isi.edu>.
NB: This has not had David's TCP ACK hack re-integrated. It is not clear what the correct solution to this problem is, if any. If a better solution doesn't pop up in response to this message, I'll put David's code back in (or he's welcome to do so himself).
|
6257 |
09-Feb-1995 |
dg |
Fixed another TTCP ifdef problem...there isn't any tcp_sysctl field in !TTCP.
|
6256 |
09-Feb-1995 |
dg |
Fix/#ifdef prototype for tcp_mss...apparently overlooked by Garrett.
|
6248 |
08-Feb-1995 |
wollman |
T/TCP changes to generic IP code. This is all ifdefed TTCP so should have no effect on most users for now. (Eventually, once this code is fully tested, the ifdefs will go away.)
|
6247 |
08-Feb-1995 |
wollman |
Merge in T/TCP TCP header file changes.
|
6237 |
07-Feb-1995 |
gpalmer |
Remove a possible loophole - previously the code wouldn't pass packets destined to the loopback address to the packet filter.
Reviewed by: "Ugen J.S.Antsilevich" <ugen@netvision.net.il>
|
6224 |
07-Feb-1995 |
wollman |
Make sure to disable RSVP intercept when the socket is closed.
|
5941 |
26-Jan-1995 |
wollman |
Correct long-standing error in the RSVP hooks (would initialize but never return success).
|
5936 |
26-Jan-1995 |
ugen |
ip_fwdef.c was missing some assignments, which caused the bug where the firewall code did not work when configured into the kernel and worked only as an LKM. This should be fixed now. Sorry, guys.
|
5919 |
26-Jan-1995 |
dg |
Kill previous commit as it isn't necessary.
|
5835 |
24-Jan-1995 |
dg |
Extended the previous change to cover the non-options case, too.
|
5802 |
23-Jan-1995 |
dg |
Applied fix from Andreas Schulz with a different comment by me. Fixes a bug where TCP connections are closed prematurely.
Submitted by: Andreas Schulz
|
5792 |
23-Jan-1995 |
wollman |
Change caching strategy somewhat: 1) Don't clone routes to multicast destinations; there is nothing useful to be gained in this case. 2) Reduce default expiration timer to one hour. Busy sites will still likely want to reduce this, but for ordinary users this is a reasonable value to use.
|
5543 |
12-Jan-1995 |
ugen |
Actual firewall change. 1) The firewall is no longer subdivided into forwarding and blocking chains. Only one chain is left: the former blocking one. 2) LKM support. ip_fwdef.c holds the function pointer definitions and goes into the kernel along with all the INET stuff.
|
5534 |
12-Jan-1995 |
dg |
Fixed mbuf lossage when level != IPPROTO_IP. Problem reported by Robert Dobbs, hint from Charles Hannum, fix by me.
|
5196 |
22-Dec-1994 |
wollman |
Make arp_rtrequest() static since nobody needs to reference it any more.
|
5195 |
22-Dec-1994 |
wollman |
Move ARP interface initialization into if_ether.c:arp_ifinit().
|
5180 |
21-Dec-1994 |
wollman |
Avoid a serious race by blocking netisrs while walking the route tree. (IWBRNI we could just block IP netisrs...)
|
5179 |
21-Dec-1994 |
wollman |
Correct sysctl info so that net.inet.ip.rtexpire is actually accessible.
|
5112 |
15-Dec-1994 |
wollman |
Fix PR 59: don't allow TCP connections with multicast addresses at either end.
|
5109 |
14-Dec-1994 |
wollman |
Make rtq_reallyold user-configurable via sysctl.
|
5105 |
13-Dec-1994 |
wollman |
Call rtalloc_ign() so that protocol cloning will not occur at the IP layer.
|
5101 |
13-Dec-1994 |
wollman |
Update calls to rtalloc1(). Also merge rt_prflags with rt_flags.
|
5089 |
13-Dec-1994 |
ugen |
Add a control to clear a single accounting entry. Structure fields changed to look more standard.
|
5086 |
12-Dec-1994 |
ugen |
Late patch for delete control..
|
5085 |
12-Dec-1994 |
ugen |
Add match by interface from which the packet arrived (via). Handle fragmented packets correctly. Remove the checking option from the kernel.
|
5045 |
11-Dec-1994 |
wollman |
Advanced route cache management is now an official part of IP support.
|
4909 |
02-Dec-1994 |
wollman |
Delete old, confusing comment.
|
4896 |
02-Dec-1994 |
wollman |
Add a check to make sure that we don't fiddle with the NFS routing tables as well (bleah!). Also, increase the interval to the real-life value and eliminate debugging printfs. This will be standard once tested by others.
|
4893 |
01-Dec-1994 |
wollman |
Add latest version of ``advanced route metric management'' :-) As before, this is currently conditionalized on options IN_RMX until I'm sure it's working.
|
4849 |
28-Nov-1994 |
ugen |
Added: ICMP reply, TCP SYN check, logging.
|
4523 |
16-Nov-1994 |
jkh |
Ugen J.S.Antsilevich's latest, happiest, IP firewall code. Poul: Please take this into BETA. It's non-intrusive, and a rather substantial improvement over what was there before.
|
4286 |
08-Nov-1994 |
jkh |
Ugen makes it in with 10 seconds to spare with a one-char diff. Some people are born lucky..
Submitted by: ugen
|
4277 |
08-Nov-1994 |
jkh |
Almost 12th hour (the 11th hour was almost an hour ago :-) patches from Ugen.
|
4234 |
07-Nov-1994 |
jkh |
2 11th-hour fixes from Ugen (not Uben, sorry!) J.S.Antsilevich. I think it's time for Ugen to get a freefall account, just so I can direct mail at him directly and let him drop off patches for us here. Ugen? Done!
Submitted by: ugen
|
4127 |
03-Nov-1994 |
wollman |
Fix off-by-one error reported to NetBSD by Karl Fox in <9411031449.AA11102@gefilte.MorningStar.Com>.
|
4105 |
03-Nov-1994 |
wollman |
Completely replace JTW's idea with my (incompletely implemented) original idea. This is less likely to crash your machine. As before, this code is only enabled under `options IN_RMX'.
|
4074 |
02-Nov-1994 |
wollman |
This is the file that actually implements the smarter behavior.
|
4073 |
02-Nov-1994 |
wollman |
Add code to be a bit smarter about IP routes, conditioned on the option IN_RMX. (Eventually this will be standard, but I just wrote the code today and don't want to break anyone.)
|
4069 |
02-Nov-1994 |
wollman |
Clean up ARP error messages: format IP addresses, explain arplookup() failures in English.
|
4036 |
31-Oct-1994 |
jkh |
Latest changes from Uben.
Submitted by: uben
|
4028 |
31-Oct-1994 |
pst |
Detect old-style multicast routers and interoperate properly
|
3969 |
28-Oct-1994 |
jkh |
IP Firewall code from Daniel Boulet and J.S.Antsilevich
Submitted by: danny ugen
|
3865 |
25-Oct-1994 |
swallace |
Patch for proper multicast support on point-to-point links.
Submitted by: apg@demos.su (Paul Antonov) - patch020
|
3747 |
21-Oct-1994 |
wollman |
Bug fixes from John Brezak.
|
3571 |
13-Oct-1994 |
wollman |
Fix some endianness and packet header bugs found in BSDi's port of this code. (From mbone mailing-list.)
|
3561 |
13-Oct-1994 |
wollman |
As suggested by Sally Floyd, don't add the ``small fraction of the window size'' when doing congestion avoidance.
Submitted by: Mark Andrews
|
3514 |
11-Oct-1994 |
wollman |
Fix a bug which caused panics when attempting to change just the flags of a route. (This still doesn't work, but it doesn't panic now.) It looks like there may be a number of incipient bugs in this code.
Also, get ready for the time when all IP gateway routes are cloning, which is necessary to keep proper TCP statistics.
|
3497 |
10-Oct-1994 |
phk |
Cosmetics. Silence gcc -Wall.
|
3444 |
08-Oct-1994 |
phk |
Cosmetics: silences gcc -Wall.
|
3311 |
02-Oct-1994 |
phk |
GCC cleanup.
|
3282 |
01-Oct-1994 |
wollman |
Implement full proxy ARP, gated on option ARP_PROXYALL. This allows a FreeBSD box to do proxy ARP as easily as most commercial routers do, without messing around with (potentially variable) Ethernet addresses. This code is really quite simple; I'm not at all sure why it wasn't implemented in 4.4.
It might be worth stealing an interface flag (maybe IFF_LINK1) to use for finer-grained control over which interfaces get proxy treatment. For the moment, it's all or nothing.
|
2822 |
16-Sep-1994 |
phk |
Made the kernel compile even without "ether".
|
2788 |
15-Sep-1994 |
dg |
Made TCPDEBUG truly optional. Based on changes I made in FreeBSD 1.1.5. Fixed somebody's idea of a joke - about the first half of the lines in in_proto.c were spaced over by one space.
|
2763 |
14-Sep-1994 |
wollman |
Add code to make multicast routing be an LKM.
|
2754 |
14-Sep-1994 |
wollman |
Shuffle some functions and variables around to make it possible for multicast routing to be implemented as an LKM. (There's still a bit of work to do in this area.)
|
2628 |
09-Sep-1994 |
wollman |
Disable IPMULTICAST_VIF socket option when MROUTING is not defined, since it doesn't make any sense for non-routers.
|
2531 |
06-Sep-1994 |
wollman |
Initial get-the-easy-case-working upgrade of the multicast code to something more recent than the ancient 1.2 release contained in 4.4. This code has the following advantages as compared to previous versions (culled from the README file for the SunOS release):
- True multicast delivery
- Configurable rate-limiting of forwarded multicast traffic on each physical interface or tunnel, using a token-bucket limiter.
- Simplistic classification of packets for prioritized dropping.
- Administrative scoping of multicast address ranges.
- Faster detection of hosts leaving groups.
- Support for multicast traceroute (code not yet available).
- Support for RSVP, the Resource Reservation Protocol.
What still needs to be done:
- The multicast forwarder needs testing.
- The multicast routing daemon needs to be ported.
- Network interface drivers need to have the `#ifdef MULTICAST' goop ripped out of them.
- The IGMP code should probably be bogon-tested.
Some notes about the porting process:
In some cases, the Berkeley people decided to incorporate functionality from later releases of the multicast code, but then had to do things differently. As a result, if you look at Deering's patches, and then look at our code, it is not always obvious whether the patch even applies. Let the reader beware.
I ran ip_mroute.c through several passes of `unifdef' to get rid of useless grot, and to permanently enable the RSVP support, which we will include as standard.
Ported by: Garrett Wollman
Submitted by: Steve Deering and Ajit Thyagarajan (among others)
|
2304 |
26-Aug-1994 |
wollman |
Obey RFC 793, section 3.4:
Several examples of connection initiation follow. Although these examples do not show connection synchronization using data-carrying segments, this is perfectly legitimate, so long as the receiving TCP doesn't deliver the data to the user until it is clear the data is valid (i.e., the data must be buffered at the receiver until the connection reaches the ESTABLISHED state).
|
2169 |
21-Aug-1994 |
paul |
Made idempotent.
Submitted by: Paul
|
2112 |
18-Aug-1994 |
wollman |
Fix up some sloppy coding practices:
- Delete redundant declarations.
- Add -Wredundant-declarations to Makefile.i386 so they don't come back.
- Delete sloppy COMMON-style declarations of uninitialized data in header files.
- Add a few prototypes.
- Clean up warnings resulting from the above.
NB: ioconf.c will still generate a redundant-declaration warning, which is unavoidable unless somebody volunteers to make `config' smarter.
|
1817 |
02-Aug-1994 |
dg |
Added $Id$
|
1813 |
01-Aug-1994 |
dg |
fixed bug where large amounts of unidirectional UDP traffic would fill the interface output queue and further udp packets would be fragmented and only partially sent - keeping the output queue full and jamming the network, but not actually getting any real work done (because you can't send just 'part' of a udp packet - if you fragment it, you must send the whole thing). The fix involves adding a check to make sure that the output queue has sufficient space for all of the fragments.
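The check described above amounts to counting the fragments a datagram will need and comparing that with the free slots in the output queue. A minimal sketch, assuming hypothetical helper names and plain integer queue counters; the kernel's actual test in the IP output path is not reproduced here.

```c
#include <assert.h>

/*
 * Number of fragments needed for an IP payload of payload_len bytes
 * on a link with the given MTU and IP header length. Every fragment
 * except the last must carry a multiple of 8 bytes of data.
 */
static int
frag_count(int payload_len, int mtu, int hlen)
{
	int per_frag = (mtu - hlen) & ~7;

	assert(payload_len > 0 && per_frag > 0);
	return (payload_len + per_frag - 1) / per_frag;
}

/*
 * Queue only if every fragment fits; sending part of a datagram
 * wastes bandwidth because the receiver cannot reassemble it.
 */
static int
queue_has_room(int qlen, int qmaxlen, int nfrags)
{
	return nfrags <= qmaxlen - qlen;
}
```

For example, an 8192-byte UDP write (8200 bytes of IP payload with the UDP header) on a 1500-byte MTU needs 6 fragments, so a queue with only 5 free slots would reject the whole datagram rather than send an unusable part of it.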
|
1812 |
01-Aug-1994 |
dg |
Fixed bug with Nagle congestion avoidance where a TCP connection would stall unnecessarily - always send an ACK when a packet of length < mss is received.
|
1621 |
29-May-1994 |
dg |
Increased tcp_send/recvspace to 16k, and added TCP_SMALLSPACE ifdef to set it to 4k.
|
1565 |
26-May-1994 |
dg |
Added missing ntohl()'s that are needed before calling IN_MULTICAST in a couple of places.
Submitted by: Johannes Helander
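The reason for the conversion is that IN_MULTICAST tests the high-order bits of a host-order value, while addresses in packets and in `struct in_addr` are stored in network byte order. A minimal user-space sketch, assuming the standard `<netinet/in.h>` macro; the helper names are hypothetical.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/*
 * Return 1 if addr (network byte order, as stored in struct in_addr)
 * is an IPv4 multicast address, 0 otherwise. IN_MULTICAST expects a
 * host-order value, hence the ntohl().
 */
static int
is_multicast(struct in_addr addr)
{
	return IN_MULTICAST(ntohl(addr.s_addr)) ? 1 : 0;
}

/* Small self-check: 224.0.0.1 is multicast, 10.0.0.1 is not. */
static void
is_multicast_selfcheck(void)
{
	struct in_addr a;

	a.s_addr = htonl(0xE0000001u);	/* 224.0.0.1 */
	assert(is_multicast(a) == 1);
	a.s_addr = htonl(0x0A000001u);	/* 10.0.0.1 */
	assert(is_multicast(a) == 0);
}
```

On a little-endian machine, omitting the ntohl() makes the macro inspect the wrong byte, so 224.0.0.1 would not be recognized as multicast.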
|
1549 |
25-May-1994 |
rgrimes |
The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.
Reviewed by: Rodney W. Grimes
Submitted by: John Dyson and David Greenman
|
1542 |
24-May-1994 |
rgrimes |
This commit was generated by cvs2svn to compensate for changes in r1541, which included commits to RCS files with non-trunk default branches.
|
1541 |
24-May-1994 |
rgrimes |
BSD 4.4 Lite Kernel Sources
|