#
332821 |
|
20-Apr-2018 |
jtl |
MFC r331309: If the INP lock is uncontested, avoid taking a reference and jumping through the lock-switching hoops.
A few of the INP lookup operations that lock INPs after the lookup do so using this mechanism (to maintain lock ordering):
1. Lock lookup structure. 2. Find INP. 3. Acquire reference on INP. 4. Drop lock on lookup structure. 5. Acquire INP lock. 6. Drop reference on INP.
This change provides a slightly shorter path for cases where the INP lock is uncontested:
1. Lock lookup structure. 2. Find INP. 3. Try to acquire the INP lock. 4. If successful, drop lock on lookup structure.
Of course, if the INP lock is contested, the functions will need to revert to the previous way of switching locks safely.
This saves a few atomic operations when the INP lock is uncontested.
Sponsored by: Netflix, Inc.
|
#
331722 |
|
29-Mar-2018 |
eadler |
Revert r330897:
This was intended to be a non-functional change. It wasn't. The commit message was thus wrong. In addition it broke arm, and merged crypto related code.
Revert with prejudice.
This revert skips files touched in r316370 since that commit was since MFCed. This revert also skips files that require $FreeBSD$ property changes.
Thank you to those who helped me get out of this mess including but not limited to gonzo, kevans, rgrimes.
Requested by: gjb (re)
|
#
331643 |
|
27-Mar-2018 |
dim |
MFC r314568 (by emaste):
kern_sig.c: ANSIfy and remove archaic register keyword
Sponsored by: The FreeBSD Foundation
MFC r318389 (by emaste):
Remove register keyword from sys/ and ANSIfy prototypes
A long long time ago the register keyword told the compiler to store the corresponding variable in a CPU register, but it is not relevant for any compiler used in the FreeBSD world today.
ANSIfy related prototypes while here.
Reviewed by: cem, jhb Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D10193
|
#
330897 |
|
14-Mar-2018 |
eadler |
Partial merge of the SPDX changes
These changes are incomplete but are making it difficult to determine what other changes can/should be merged.
No objections from: pfg
|
#
302408 |
|
07-Jul-2016 |
gjb |
Copy head@r302406 to stable/11 as part of the 11.0-RELEASE cycle. Prune svn:mergeinfo from the new branch, as nothing has been merged here.
Additional commits post-branch will follow.
Approved by: re (implicit) Sponsored by: The FreeBSD Foundation |
#
301217 |
|
02-Jun-2016 |
gnn |
This change re-adds L2 caching for TCP and UDP, as originally added in D4306 but removed due to other changes in the system. Restore the llentry pointer to the "struct route", and use it to cache the L2 lookup (ARP or ND6) as appropriate.
Submitted by: Mike Karels Differential Revision: https://reviews.freebsd.org/D6262
|
#
297225 |
|
24-Mar-2016 |
gnn |
FreeBSD previously provided route caching for TCP (and UDP). Re-add route caching for TCP, with some improvements. In particular, invalidate the route cache if a new route is added, which might be a better match. The cache is automatically invalidated if the old route is deleted.
Submitted by: Mike Karels Reviewed by: gnn Differential Revision: https://reviews.freebsd.org/D4306
|
#
293633 |
|
10-Jan-2016 |
melifaro |
Split in6_selectsrc() into in6_selectsrc_addr() and in6_selectsrc_socket().
in6_selectsrc() has 2 class of users: socket-based one (raw/udp/pcb/etc) and socket-less (ND code). The main reason for that change is inability to specify non-default FIB for callers w/o socket since (internally) inpcb is used to determine fib.
As as result, add 2 wrappers for in6_selectsrc() (making in6_selectsrc() static): 1) in6_selectsrc_socket() for the former class. Embed scope_ambiguous check along with returning hop limit when needed. 2) in6_selectsrc_addr() for the latter case. Add 'fibnum' argument and pass IPv6 address w/ explicitly specified scope as separate argument.
Reviewed by: ae (previous version)
|
#
293101 |
|
03-Jan-2016 |
melifaro |
Remove 'struct route_int6' argument from in6_selectsrc() and in6_selectif().
The main task of in6_selectsrc() is to return IPv6 SAS (along with output interface used for scope checks). No data-path code uses route argument for caching. The only users are icmp6 (reflect code), ND6 ns/na generation code. All this fucntions are control-plane, so there is no reason to try to 'optimize' something by passing cached route into to ip6_output(). Given that, simplify code by eliminating in6_selectsrc() 'struct route_in6' argument. Since in6_selectif() is used only by in6_selectsrc(), eliminate its 'struct route_in6' argument, too. While here, reshape rte-related code inside in6_selectif() to free lookup result immediately after saving all the needed fields.
|
#
286227 |
|
03-Aug-2015 |
jch |
Decompose TCP INP_INFO lock to increase short-lived TCP connections scalability:
- The existing TCP INP_INFO lock continues to protect the global inpcb list stability during full list traversal (e.g. tcp_pcblist()).
- A new INP_LIST lock protects inpcb list actual modifications (inp allocation and free) and inpcb global counters.
It allows to use TCP INP_INFO_RLOCK lock in critical paths (e.g. tcp_input()) and INP_INFO_WLOCK only in occasional operations that walk all connections.
PR: 183659 Differential Revision: https://reviews.freebsd.org/D2599 Reviewed by: jhb, adrian Tested by: adrian, nitroboost-gmail.com Sponsored by: Verisign, Inc.
|
#
279684 |
|
06-Mar-2015 |
ae |
tcp6_ctlinput() doesn't pass MTU value to in6_pcbnotify(). Check cmdarg isn't NULL before dereference, this check was in the ip6_notify_pmtu() before r279588.
Reported by: Florian Smeets MFC after: 1 week
|
#
279588 |
|
04-Mar-2015 |
ae |
Fix deadlock in IPv6 PCB code.
When several threads are trying to send datagram to the same destination, but fragmentation is disabled and datagram size exceeds link MTU, ip6_output() calls pfctlinput2(PRC_MSGSIZE). It does notify all sockets wanted to know MTU to this destination. And since all threads hold PCB lock while sending, taking the lock for each PCB in the in6_pcbnotify() leads to deadlock.
RFC 3542 p.11.3 suggests notify all application wanted to receive IPV6_PATHMTU ancillary data for each ICMPv6 packet too big message. But it doesn't require this, when we don't receive ICMPv6 message.
Change ip6_notify_pmtu() function to be able use it directly from ip6_output() to notify only one socket, and to notify all sockets when ICMPv6 packet too big message received.
PR: 197059 Differential Revision: https://reviews.freebsd.org/D1949 Reviewed by: no objection from #network Obtained from: Yandex LLC MFC after: 1 week Sponsored by: Yandex LLC
|
#
275358 |
|
01-Dec-2014 |
hselasky |
Start process of removing the use of the deprecated "M_FLOWID" flag from the FreeBSD network code. The flag is still kept around in the "sys/mbuf.h" header file, but does no longer have any users. Instead the "m_pkthdr.rsstype" field in the mbuf structure is now used to decide the meaning of the "m_pkthdr.flowid" field. To modify the "m_pkthdr.rsstype" field please use the existing "M_HASHTYPE_XXX" macros as defined in the "sys/mbuf.h" header file.
This patch introduces new behaviour in the transmit direction. Previously network drivers checked if "M_FLOWID" was set in "m_flags" before using the "m_pkthdr.flowid" field. This check has now now been replaced by checking if "M_HASHTYPE_GET(m)" is different from "M_HASHTYPE_NONE". In the future more hashtypes will be added, for example hashtypes for hardware dedicated flows.
"M_HASHTYPE_OPAQUE" indicates that the "m_pkthdr.flowid" value is valid and has no particular type. This change removes the need for an "if" statement in TCP transmit code checking for the presence of a valid flowid value. The "if" statement mentioned above is now a direct variable assignment which is then later checked by the respective network drivers like before.
Additional notes: - The SCTP code changes will be committed as a separate patch. - Removal of the "M_FLOWID" flag will also be done separately. - The FreeBSD version has been bumped.
MFC after: 1 month Sponsored by: Mellanox Technologies
|
#
274331 |
|
09-Nov-2014 |
melifaro |
Renove faith(4) and faithd(8) from base. It looks like industry have chosen different (and more traditional) stateless/statuful NAT64 as translation mechanism. Last non-trivial commits to both faith(4) and faithd(8) happened more than 12 years ago, so I assume it is time to drop RFC3142 in FreeBSD.
No objections from: net@
|
#
271391 |
|
10-Sep-2014 |
ae |
Make in6_pcblookup_hash_locked and in6_pcbladdr static.
Obtained from: Yandex LLC Sponsored by: Yandex LLC
|
#
271386 |
|
10-Sep-2014 |
ae |
Introduce INP6_PCBHASHKEY macro. Replace usage of hardcoded part of IPv6 address as hash key in all places.
Obtained from: Yandex LLC
|
#
268562 |
|
12-Jul-2014 |
adrian |
Add IPv6 flowid, bindmulti and RSS awareness.
|
#
263198 |
|
14-Mar-2014 |
rwatson |
Several years after initial development, merge prototype support for linking NIC Receive Side Scaling (RSS) to the network stack's connection-group implementation. This prototype (and derived patches) are in use at Juniper and several other FreeBSD-using companies, so despite some reservations about its maturity, merge the patch to the base tree so that it can be iteratively refined in collaboration rather than maintained as a set of gradually diverging patch sets.
(1) Merge a software implementation of the Toeplitz hash specified in RSS implemented by David Malone. This is used to allow suitable pcbgroup placement of connections before the first packet is received from the NIC. Software hashing is generally avoided, however, due to high cost of the hash on general-purpose CPUs.
(2) In in_rss.c, maintain authoritative versions of RSS state intended to be pushed to each NIC, including keying material, hash algorithm/ configuration, and buckets. Provide software-facing interfaces to hash 2- and 4-tuples for IPv4 and IPv6 using both the RSS standardised Toeplitz and a 'naive' variation with a hash efficient in software but with poor distribution properties. Implement rss_m2cpuid()to be used by netisr and other load balancing code to look up the CPU on which an mbuf should be processed.
(3) In the Ethernet link layer, allow netisr distribution using RSS as a source of policy as an alternative to source ordering; continue to default to direct dispatch (i.e., don't try and requeue packets for processing on the 'right' CPU if they arrive in a directly dispatchable context).
(4) Allow RSS to control tuning of connection groups in order to align groups with RSS buckets. If a packet arrives on a protocol using connection groups, and contains a suitable hardware-generated hash, use that hash value to select the connection group for pcb lookup for both IPv4 and IPv6. If no hardware-generated Toeplitz hash is available, we fall back on regular PCB lookup risking contention rather than pay the cost of Toeplitz in software -- this is a less scalable but, at my last measurement, faster approach. As core counts go up, we may want to revise this strategy despite CPU overhead.
Where device drivers suitably configure NICs, and connection groups / RSS are enabled, this should avoid both lock and line contention during connection lookup for TCP. This commit does not modify any device drivers to tune device RSS configuration to the global RSS configuration; patches are in circulation to do this for at least Chelsio T3 and Intel 1G/10G drivers. Currently, the KPI for device drivers is not particularly robust, nor aware of more advanced features such as runtime reconfiguration/rebalancing. This will hopefully prove a useful starting point for refinement.
No MFC is scheduled as we will first want to nail down a more mature and maintainable KPI/KBI for device drivers.
Sponsored by: Juniper Networks (original work) Sponsored by: EMC/Isilon (patch update and merge)
|
#
257176 |
|
26-Oct-2013 |
glebius |
The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare to this event, adding if_var.h to files that do need it. Also, include all includes that now are included due to implicit pollution via if_var.h
Sponsored by: Netflix Sponsored by: Nginx, Inc.
|
#
253282 |
|
12-Jul-2013 |
trociny |
A complete duplication of binding should be allowed if on both new and duplicated sockets a multicast address is bound and either SO_REUSEPORT or SO_REUSEADDR is set.
But actually it works for the following combinations:
* SO_REUSEPORT is set for the fist socket and SO_REUSEPORT for the new; * SO_REUSEADDR is set for the fist socket and SO_REUSEADDR for the new; * SO_REUSEPORT is set for the fist socket and SO_REUSEADDR for the new;
and fails for this:
* SO_REUSEADDR is set for the fist socket and SO_REUSEPORT for the new.
Fix the last case.
PR: 179901 MFC after: 1 month
|
#
252710 |
|
04-Jul-2013 |
trociny |
In r227207, to fix the issue with possible NULL inp_socket pointer dereferencing, when checking for SO_REUSEPORT option (and SO_REUSEADDR for multicast), INP_REUSEPORT flag was introduced to cache the socket option. It was decided then that one flag would be enough to cache both SO_REUSEPORT and SO_REUSEADDR: when processing SO_REUSEADDR setsockopt(2), it was checked if it was called for a multicast address and INP_REUSEPORT was set accordingly.
Unfortunately that approach does not work when setsockopt(2) is called before binding to a multicast address: the multicast check fails and INP_REUSEPORT is not set.
Fix this by adding INP_REUSEADDR flag to unconditionally cache SO_REUSEADDR.
PR: 179901 Submitted by: Michael Gmelin freebsd grem.de (initial version) Reviewed by: rwatson MFC after: 1 week
|
#
249837 |
|
24-Apr-2013 |
ae |
Remove unused variable.
MFC after: 1 week
|
#
233272 |
|
21-Mar-2012 |
glebius |
in6_pcblookup_local() still can return a pcb with NULL inp_socket. To avoid panic, do not dereference inp_socket, but obtain reuse port option from inp_flags2, like this is done after next call to in_pcblookup_local() a few lines down below.
Submitted by: rwatson
|
#
227449 |
|
11-Nov-2011 |
trociny |
Fix false positive EADDRINUSE that could be returned by bind, due to the typo made in r227207.
Reported by: kib Tested by: kib
|
#
227207 |
|
06-Nov-2011 |
trociny |
Cache SO_REUSEPORT socket option in inpcb-layer in order to avoid inp_socket->so_options dereference when we may not acquire the lock on the inpcb.
This fixes the crash due to NULL pointer dereference in in_pcbbind_setup() when inp_socket->so_options in a pcb returned by in_pcblookup_local() was checked.
Reported by: dave jones <s.dave.jones@gmail.com>, Arnaud Lacombe <lacombar@gmail.com> Suggested by: rwatson Glanced by: rwatson Tested by: dave jones <s.dave.jones@gmail.com>
|
#
227206 |
|
06-Nov-2011 |
trociny |
Before dereferencing intotw() check for NULL, the same way as it is done for in_pcb (see r157474).
MFC after: 1 week
|
#
222748 |
|
06-Jun-2011 |
rwatson |
Implement a CPU-affine TCP and UDP connection lookup data structure, struct inpcbgroup. pcbgroups, or "connection groups", supplement the existing inpcbinfo connection hash table, which when pcbgroups are enabled, might now be thought of more usefully as a per-protocol 4-tuple reservation table.
Connections are assigned to connection groups base on a hash of their 4-tuple; wildcard sockets require special handling, and are members of all connection groups. During a connection lookup, a per-connection group lock is employed rather than the global pcbinfo lock. By aligning connection groups with input path processing, connection groups take on an effective CPU affinity, especially when aligned with RSS work placement (see a forthcoming commit for details). This eliminates cache line migration associated with global, protocol-layer data structures in steady state TCP and UDP processing (with the exception of protocol-layer statistics; further commit to follow).
Elements of this approach were inspired by Willman, Rixner, and Cox's 2006 USENIX paper, "An Evaluation of Network Stack Parallelization Strategies in Modern Operating Systems". However, there are also significant differences: we maintain the inpcb lock, rather than using the connection group lock for per-connection state.
Likewise, the focus of this implementation is alignment with NIC packet distribution strategies such as RSS, rather than pure software strategies. Despite that focus, software distribution is supported through the parallel netisr implementation, and works well in configurations where the number of hardware threads is greater than the number of NIC input queues, such as in the RMI XLR threaded MIPS architecture.
Another important difference is the continued maintenance of existing hash tables as "reservation tables" -- these are useful both to distinguish the resource allocation aspect of protocol name management and the more common-case lookup aspect. In configurations where connection tables are aligned with hardware hashes, it is desirable to use the traditional lookup tables for loopback or encapsulated traffic rather than take the expense of hardware hashes that are hard to implement efficiently in software (such as RSS Toeplitz).
Connection group support is enabled by compiling "options PCBGROUP" into your kernel configuration; for the time being, this is an experimental feature, and hence is not enabled by default.
Subject to the limited MFCability of change dependencies in inpcb, and its change to the inpcbinfo init function signature, this change in principle could be merged to FreeBSD 8.x.
Reviewed by: bz Sponsored by: Juniper Networks, Inc.
|
#
222691 |
|
04-Jun-2011 |
rwatson |
Add _mbuf() variants of various inpcb-related interfaces, including lookup, hash install, etc. For now, these are arguments are unused, but as we add RSS support, we will want to use hashes extracted from mbufs, rather than manually calculated hashes of header fields, due to the expensive of the software version of Toeplitz (and similar hashes).
Add notes that it would be nice to be able to pass mbufs into lookup routines in pf(4), optimising firewall lookup in the same way, but the code structure there doesn't facilitate that currently.
(In principle there is no reason this couldn't be MFCed -- the change extends rather than modifies the KBI. However, it won't be useful without other previous possibly less MFCable changes.)
Reviewed by: bz Sponsored by: Juniper Networks, Inc.
|
#
222488 |
|
30-May-2011 |
rwatson |
Decompose the current single inpcbinfo lock into two locks:
- The existing ipi_lock continues to protect the global inpcb list and inpcb counter. This lock is now relegated to a small number of allocation and free operations, and occasional operations that walk all connections (including, awkwardly, certain UDP multicast receive operations -- something to revisit).
- A new ipi_hash_lock protects the two inpcbinfo hash tables for looking up connections and bound sockets, manipulated using new INP_HASH_*() macros. This lock, combined with inpcb locks, protects the 4-tuple address space.
Unlike the current ipi_lock, ipi_hash_lock follows the individual inpcb connection locks, so may be acquired while manipulating a connection on which a lock is already held, avoiding the need to acquire the inpcbinfo lock preemptively when a binding change might later be required. As a result, however, lookup operations necessarily go through a reference acquire while holding the lookup lock, later acquiring an inpcb lock -- if required.
A new function in_pcblookup() looks up connections, and accepts flags indicating how to return the inpcb. Due to lock order changes, callers no longer need acquire locks before performing a lookup: the lookup routine will acquire the ipi_hash_lock as needed. In the future, it will also be able to use alternative lookup and locking strategies transparently to callers, such as pcbgroup lookup. New lookup flags are, supplementing the existing INPLOOKUP_WILDCARD flag:
INPLOOKUP_RLOCKPCB - Acquire a read lock on the returned inpcb INPLOOKUP_WLOCKPCB - Acquire a write lock on the returned inpcb
Callers must pass exactly one of these flags (for the time being).
Some notes:
- All protocols are updated to work within the new regime; especially, TCP, UDPv4, and UDPv6. pcbinfo ipi_lock acquisitions are largely eliminated, and global hash lock hold times are dramatically reduced compared to previous locking. - The TCP syncache still relies on the pcbinfo lock, something that we may want to revisit. - Support for reverting to the FreeBSD 7.x locking strategy in TCP input is no longer available -- hash lookup locks are now held only very briefly during inpcb lookup, rather than for potentially extended periods. However, the pcbinfo ipi_lock will still be acquired if a connection state might change such that a connection is added or removed. - Raw IP sockets continue to use the pcbinfo ipi_lock for protection, due to maintaining their own hash tables. - The interface in6_pcblookup_hash_locked() is maintained, which allows callers to acquire hash locks and perform one or more lookups atomically with 4-tuple allocation: this is required only for TCPv6, as there is no in6_pcbconnect_setup(), which there should be. - UDPv6 locking remains significantly more conservative than UDPv4 locking, which relates to source address selection. This needs attention, as it likely significantly reduces parallelism in this code for multithreaded socket use (such as in BIND). - In the UDPv4 and UDPv6 multicast cases, we need to revisit locking somewhat, as they relied on ipi_lock to stablise 4-tuple matches, which is no longer sufficient. A second check once the inpcb lock is held should do the trick, keeping the general case from requiring the inpcb lock for every inpcb visited. - This work reminds us that we need to revisit locking of the v4/v6 flags, which may be accessed lock-free both before and after this change. - Right now, a single lock name is used for the pcbhash lock -- this is undesirable, and probably another argument is required to take care of this (or a char array name field in the pcbinfo?).
This is not an MFC candidate for 8.x due to its impact on lookup and locking semantics. It's possible some of these issues could be worked around with compatibility wrappers, if necessary.
Reviewed by: bz Sponsored by: Juniper Networks, Inc.
|
#
222215 |
|
23-May-2011 |
rwatson |
Move from passing a wildcard boolean to a general set up lookup flags into in_pcb_lport(), in_pcblookup_local(), and in_pcblookup_hash(), and similarly for IPv6 functions. In the future, we would like to support other flags relating to locking strategy.
This change doesn't appear to modify the KBI in practice, as callers already passed in INPLOOKUP_WILDCARD rather than a simple boolean.
MFC after: 3 weeks Reviewed by: bz Sponsored by: Juniper Networks, Inc.
|
#
221247 |
|
30-Apr-2011 |
bz |
Make the PCB code compile without INET support by adding #ifdef INETs and correcting few #includes.
Reviewed by: gnn Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems MFC after: 4 days
|
#
219570 |
|
12-Mar-2011 |
bz |
Push a possible "unbind" in some situation from in6_pcbsetport() to callers. This also fixes a problem when the prison call could set the inp->in6p_laddr (laddr) and a following priv_check_cred() call would return an error and will allow us to merge the IPv4 and IPv6 implementation.
MFC after: 2 weeks
|
#
204072 |
|
18-Feb-2010 |
pjd |
No need to include security/mac/mac_framework.h here.
|
#
202915 |
|
24-Jan-2010 |
bz |
Correct a typo.
Submitted by: kensmith MFC after: 3 days
|
#
196019 |
|
01-Aug-2009 |
rwatson |
Merge the remainder of kern_vimage.c and vimage.h into vnet.c and vnet.h, we now use jails (rather than vimages) as the abstraction for virtualization management, and what remained was specific to virtual network stacks. Minor cleanups are done in the process, and comments updated to reflect these changes.
Reviewed by: bz Approved by: re (vimage blanket)
|
#
195699 |
|
14-Jul-2009 |
rwatson |
Build on Jeff Roberson's linker-set based dynamic per-CPU allocator (DPCPU), as suggested by Peter Wemm, and implement a new per-virtual network stack memory allocator. Modify vnet to use the allocator instead of monolithic global container structures (vinet, ...). This change solves many binary compatibility problems associated with VIMAGE, and restores ELF symbols for virtualized global variables.
Each virtualized global variable exists as a "reference copy", and also once per virtual network stack. Virtualized global variables are tagged at compile-time, placing the in a special linker set, which is loaded into a contiguous region of kernel memory. Virtualized global variables in the base kernel are linked as normal, but those in modules are copied and relocated to a reserved portion of the kernel's vnet region with the help of a the kernel linker.
Virtualized global variables exist in per-vnet memory set up when the network stack instance is created, and are initialized statically from the reference copy. Run-time access occurs via an accessor macro, which converts from the current vnet and requested symbol to a per-vnet address. When "options VIMAGE" is not compiled into the kernel, normal global ELF symbols will be used instead and indirection is avoided.
This change restores static initialization for network stack global variables, restores support for non-global symbols and types, eliminates the need for many subsystem constructors, eliminates large per-subsystem structures that caused many binary compatibility issues both for monitoring applications (netstat) and kernel modules, removes the per-function INIT_VNET_*() macros throughout the stack, eliminates the need for vnet_symmap ksym(2) munging, and eliminates duplicate definitions of virtualized globals under VIMAGE_GLOBALS.
Bump __FreeBSD_version and update UPDATING.
Portions submitted by: bz Reviewed by: bz, zec Discussed with: gnn, jamie, jeff, jhb, julian, sam Suggested by: peter Approved by: re (kensmith)
|
#
194907 |
|
24-Jun-2009 |
rwatson |
Convert netinet6 to using queue(9) rather than hand-crafted linked lists for the global IPv6 address list (in6_ifaddr -> in6_ifaddrhead). Adopt the code styles and conventions present in netinet where possible.
Reviewed by: gnn, bz MFC after: 6 weeks (possibly not MFCable?)
|
#
194777 |
|
23-Jun-2009 |
bz |
Make callers to in6_selectsrc() and in6_pcbladdr() pass in memory to save the selected source address rather than returning an unreferenced copy to a pointer that might long be gone by the time we use the pointer for anything meaningful.
Asked for by: rwatson Reviewed by: rwatson
|
#
194760 |
|
23-Jun-2009 |
rwatson |
Modify most routines returning 'struct ifaddr *' to return references rather than pointers, requiring callers to properly dispose of those references. The following routines now return references:
ifaddr_byindex ifa_ifwithaddr ifa_ifwithbroadaddr ifa_ifwithdstaddr ifa_ifwithnet ifaof_ifpforaddr ifa_ifwithroute ifa_ifwithroute_fib rt_getifa rt_getifa_fib IFP_TO_IA ip_rtaddr in6_ifawithifp in6ifa_ifpforlinklocal in6ifa_ifpwithaddr in6_ifadd carp_iamatch6 ip6_getdstifaddr
Remove unused macro which didn't have required referencing:
IFP_TO_IA6
This closes many small races in which changes to interface or address lists while an ifaddr was in use could lead to use of freed memory (etc). In a few cases, add missing if_addr_list locking required to safely acquire references.
Because of a lack of deep copying support, we accept a race in which an in6_ifaddr pointed to by mbuf tags and extracted with ip6_getdstifaddr() doesn't hold a reference while in transmit. Once we have mbuf tag deep copy support, this can be fixed.
Reviewed by: bz Obtained from: Apple, Inc. (portions) MFC after: 6 weeks (portions)
|
#
193511 |
|
05-Jun-2009 |
rwatson |
Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC and used in a large number of files, but also because an increasing number of incorrect uses of MAC calls were sneaking in due to copy-and-paste of MAC-aware code without the associated opt_mac.h include.
Discussed with: pjd
|
#
193217 |
|
01-Jun-2009 |
pjd |
- Rename IP_NONLOCALOK IP socket option to IP_BINDANY, to be more consistent with OpenBSD (and BSD/OS originally). We can't easly do it SOL_SOCKET option as there is no more space for more SOL_SOCKET options, but this option also fits better as an IP socket option, it seems. - Implement this functionality also for IPv6 and RAW IP sockets. - Always compile it in (don't use additional kernel options). - Remove sysctl to turn this functionality on and off. - Introduce new privilege - PRIV_NETINET_BINDANY, which allows to use this functionality (currently only unjail root can use it).
Discussed with: julian, adrian, jhb, rwatson, kmacy
|
#
192895 |
|
27-May-2009 |
jamie |
Add hierarchical jails. A jail may further virtualize its environment by creating a child jail, which is visible to that jail and to any parent jails. Child jails may be restricted more than their parents, but never less. Jail names reflect this hierarchy, being MIB-style dot-separated strings.
Every thread now points to a jail, the default being prison0, which contains information about the physical system. Prison0's root directory is the same as rootvnode; its hostname is the same as the global hostname, and its securelevel replaces the global securelevel. Note that the variable "securelevel" has actually gone away, which should not cause any problems for code that properly uses securelevel_gt() and securelevel_ge().
Some jail-related permissions that were kept in global variables and set via sysctls are now per-jail settings. The sysctls still exist for backward compatibility, used only by the now-deprecated jail(2) system call.
Approved by: bz (mentor)
|
#
191672 |
|
29-Apr-2009 |
bms |
Bite the bullet, and make the IPv6 SSM and MLDv2 mega-commit: import from p4 bms_netdev. Summary of changes:
* Connect netinet6/in6_mcast.c to build. The legacy KAME KPIs are mostly preserved. * Eliminate now dead code from ip6_output.c. Don't do mbuf bingo, we are not going to do RFC 2292 style CMSG tricks for multicast options as they are not required by any current IPv6 normative reference. * Refactor transports (UDP, raw_ip6) to do own mcast filtering. SCTP, TCP unaffected by this change. * Add ip6_msource, in6_msource structs to in6_var.h. * Hookup mld_ifinfo state to in6_ifextra, allocate from domifattach path. * Eliminate IN6_LOOKUP_MULTI(), it is no longer referenced. Kernel consumers which need this should use in6m_lookup(). * Refactor IPv6 socket group memberships to use a vector (like IPv4). * Update ifmcstat(8) for IPv6 SSM. * Add witness lock order for IN6_MULTI_LOCK. * Move IN6_MULTI_LOCK out of lower ip6_output()/ip6_input() paths. * Introduce IP6STAT_ADD/SUB/INC/DEC as per rwatson's IPv4 cleanup. * Update carp(4) for new IPv6 SSM KPIs. * Virtualize ip6_mrouter socket. Changes mostly localized to IPv6 MROUTING. * Don't do a local group lookup in MROUTING. * Kill unused KAME prototypes in6_purgemkludge(), in6_restoremkludge(). * Preserve KAME DAD timer jitter behaviour in MLDv1 compatibility mode. * Bump __FreeBSD_version to 800084. * Update UPDATING.
NOTE WELL: * This code hasn't been tested against real MLDv2 queriers (yet), although the on-wire protocol has been verified in Wireshark. * There are a few unresolved issues in the socket layer APIs to do with scope ID propagation. * There is a LOR present in ip6_output()'s use of in6_setscope() which needs to be resolved. See comments in mld6.c. This is believed to be benign and can't be avoided for the moment without re-introducing an indirect netisr.
This work was mostly derived from the IGMPv3 implementation, and has been sponsored by a third party.
|
#
189848 |
|
15-Mar-2009 |
rwatson |
Correct a number of evolved problems with inp_vflag and inp_flags: certain flags that should have been in inp_flags ended up in inp_vflag, meaning that they were inconsistently locked, and in one case, interpreted. Move the following flags from inp_vflag to gaps in the inp_flags space (and clean up the inp_flags constants to make gaps more obvious to future takers):
INP_TIMEWAIT INP_SOCKREF INP_ONESBCAST INP_DROPPED
Some aspects of this change have no effect on kernel ABI at all, as these are UDP/TCP/IP-internal uses; however, netstat and sockstat detect INP_TIMEWAIT when listing TCP sockets, so any MFC will need to take this into account.
MFC after: 1 week (or after dependencies are MFC'd) Reviewed by: bz
|
#
188148 |
|
05-Feb-2009 |
jamie |
Remove redundant calls of prison_local_ip4 in in_pcbbind_setup, and of prison_local_ip6 in in6_pcbbind.
Approved by: bz (mentor)
|
#
188144 |
|
05-Feb-2009 |
jamie |
Standardize the various prison_foo_ip[46] functions and prison_if to return zero on success and an error code otherwise. The possible errors are EADDRNOTAVAIL if an address being checked for doesn't match the prison, and EAFNOSUPPORT if the prison doesn't have any addresses in that address family. For most callers of these functions, use the returned error code instead of e.g. a hard-coded EADDRNOTAVAIL or EINVAL.
Always include a jailed() check in these functions, where a non-jailed cred always returns success (and makes no changes). Remove the explicit jailed() checks that preceded many of the function calls.
Approved by: bz (mentor)
|
#
186141 |
|
15-Dec-2008 |
bz |
Another step assimilating IPv[46] PCB code - directly use the inpcb names rather than the following IPv6 compat macros: in6pcb,in6p_sp, in6p_ip6_nxt,in6p_flowinfo,in6p_vflag, in6p_flags,in6p_socket,in6p_lport,in6p_fport,in6p_ppcb and sotoin6pcb().
Apart from removing duplicate code in netipsec, this is a pure whitespace, not a functional change.
Discussed with: rwatson Reviewed by: rwatson (version before review requested changes) MFC after: 4 weeks (set the timer and see then)
|
#
185571 |
|
02-Dec-2008 |
bz |
Rather than using hidden includes (with cicular dependencies), directly include only the header files needed. This reduces the unneeded spamming of various headers into lots of files.
For now, this leaves us with very few modules including vnet.h and thus needing to depend on opt_route.h.
Reviewed by: brooks, gnn, des, zec, imp Sponsored by: The FreeBSD Foundation
|
#
185435 |
|
29-Nov-2008 |
bz |
MFp4: Bring in updated jail support from bz_jail branch.
This enhances the current jail implementation to permit multiple addresses per jail. In addtion to IPv4, IPv6 is supported as well. Due to updated checks it is even possible to have jails without an IP address at all, which basically gives one a chroot with restricted process view, no networking,..
SCTP support was updated and supports IPv6 in jails as well.
Cpuset support permits jails to be bound to specific processor sets after creation.
Jails can have an unrestricted (no duplicate protection, etc.) name in addition to the hostname. The jail name cannot be changed from within a jail and is considered to be used for management purposes or as audit-token in the future.
DDB 'show jails' command was added to aid debugging.
Proper compat support permits 32bit jail binaries to be used on 64bit systems to manage jails. Also backward compatibility was preserved where possible: for jail v1 syscalls, as well as with user space management utilities.
Both jail as well as prison version were updated for the new features. A gap was intentionally left as the intermediate versions had been used by various patches floating around the last years.
Bump __FreeBSD_version for the afore mentioned and in kernel changes.
Special thanks to: - Pawel Jakub Dawidek (pjd) for his multi-IPv4 patches and Olivier Houchard (cognet) for initial single-IPv6 patches. - Jeff Roberson (jeff) and Randall Stewart (rrs) for their help, ideas and review on cpuset and SCTP support. - Robert Watson (rwatson) for lots and lots of help, discussions, suggestions and review of most of the patch at various stages. - John Baldwin (jhb) for his help. - Simon L. Nielsen (simon) as early adopter testing changes on cluster machines as well as all the testers and people who provided feedback the last months on freebsd-jail and other channels. - My employer, CK Software GmbH, for the support so I could work on this.
Reviewed by: (see above) MFC after: 3 months (this is just so that I get the mail) X-MFC Before: 7.2-RELEASE if possible
|
#
185370 |
|
27-Nov-2008 |
bz |
Merge in6_pcbfree() into in_pcbfree() which after the previous IPsec change in r185366 only differed in two additonal IPv6 lines. Rather than splattering conditional code everywhere add the v6 check centrally at this single place.
Reviewed by: rwatson (as part of a larger changset) MFC after: 6 weeks (*) (*) possibly need to leave a stub wrapper in 7 to keep the symbol.
|
#
185366 |
|
27-Nov-2008 |
bz |
Unify ipsec[46]_delete_pcbpolicy in ipsec_delete_pcbpolicy. Ignoring different names because of macros (in6pcb, in6p_sp) and inp vs. in6p variable name both functions were entirely identical.
Reviewed by: rwatson (as part of a larger changeset) MFC after: 6 weeks (*) (*) possibly need to leave a stub wrappers in 7 to keep the symbols.
|
#
185344 |
|
26-Nov-2008 |
bz |
Remove in6_pcbdetach() as it is exactly the same function as in_pcbdetach() and we don't need the code twice.
Reviewed by: rwatson MFC after: 6 weeks (*) (*) possibly need to leave a stub wrapper in 7 to keep the symbol.
|
#
185333 |
|
26-Nov-2008 |
bz |
Unify the v4 and v6 versions of pcbdetach and pcbfree as good as possible so that they are easily diffable.
No functional changes.
Reviewed by: rwatson MFC after: 6 weeks
|
#
185332 |
|
26-Nov-2008 |
bz |
Plug a credential leak in case the inpcb is freed by in6_pcbfree() instead of in_pcbfree(); missed in r183606.
Reviewed by: rwatson MFC after: 3 days (instantly for 7.1-RC?)
|
#
184205 |
|
23-Oct-2008 |
des |
Retire the MALLOC and FREE macros. They are an abomination unto style(9).
MFC after: 3 months
|
#
183611 |
|
04-Oct-2008 |
bz |
Style changes: compare pointer to NULL and move a }.
MFC after: 6 weeks
|
#
183606 |
|
04-Oct-2008 |
bz |
Cache so_cred as inp_cred in the inpcb. This means that inp_cred is always there, even after the socket has gone away. It also means that it is constant for the lifetime of the inp. Both facts lead to simpler code and possibly less locking.
Suggested by: rwatson Reviewed by: rwatson MFC after: 6 weeks X-MFC Note: use a inp_pspare for inp_cred
|
#
183550 |
|
02-Oct-2008 |
zec |
Step 1.5 of importing the network stack virtualization infrastructure from the vimage project, as per plan established at devsummit 08/08: http://wiki.freebsd.org/Image/Notes200808DevSummit
Introduce INIT_VNET_*() initializer macros, VNET_FOREACH() iterator macros, and CURVNET_SET() context setting macros, all currently resolving to NOPs.
Prepare for virtualization of selected SYSCTL objects by introducing a family of SYSCTL_V_*() macros, currently resolving to their global counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT().
Move selected #defines from sys/sys/vimage.h to newly introduced header files specific to virtualized subsystems (sys/net/vnet.h, sys/netinet/vinet.h etc.).
All the changes are verified to have zero functional impact at this point in time by doing MD5 comparision between pre- and post-change object files(*).
(*) netipsec/keysock.c did not validate depending on compile time options.
Implemented by: julian, bz, brooks, zec Reviewed by: julian, bz, brooks, kris, rwatson, ... Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation
|
#
181887 |
|
19-Aug-2008 |
julian |
A bunch of formatting fixes brough to light by, or created by the Vimage commit a few days ago.
|
#
181803 |
|
17-Aug-2008 |
bz |
Commit step 1 of the vimage project, (network stack) virtualization work done by Marko Zec (zec@).
This is the first in a series of commits over the course of the next few weeks.
Mark all uses of global variables to be virtualized with a V_ prefix. Use macros to map them back to their global names for now, so this is a NOP change only.
We hope to have caught at least 85-90% of what is needed so we do not invalidate a lot of outstanding patches again.
Obtained from: //depot/projects/vimage-commit2/... Reviewed by: brooks, des, ed, mav, julian, jamie, kris, rwatson, zec, ... (various people I forgot, different versions) md5 (with a bit of help) Sponsored by: NLnet Foundation, The FreeBSD Foundation X-MFC after: never V_Commit_Message_Reviewed_By: more people than the patch
|
#
180427 |
|
10-Jul-2008 |
bz |
Pass the ucred along into in{,6}_pcblookup_local for upcoming prison checks.
Reviewed by: rwatson
|
#
180425 |
|
10-Jul-2008 |
bz |
For consistency take lport as u_short in in{,6}_pcblookup_local. All callers either pass in an u_short or u_int16_t.
Reviewed by: rwatson
|
#
180371 |
|
08-Jul-2008 |
bz |
Change the parameters to in6_selectsrc(): - pass in the inp instead of both in6p_moptions and laddr. - pass in cred for upcoming prison checks.
Reviewed by: rwatson
|
#
178320 |
|
19-Apr-2008 |
rwatson |
When querying a local or remote address on an IPv6 socket, use only a read lock on the inpcb.
MFC after: 3 months
|
#
178285 |
|
17-Apr-2008 |
rwatson |
Convert pcbinfo and inpcb mutexes to rwlocks, and modify macros to explicitly select write locking for all use of the inpcb mutex. Update some pcbinfo lock assertions to assert locked rather than write-locked, although in practice almost all uses of the pcbinfo rwlock main exclusive, and all instances of inpcb lock acquisition are exclusive.
This change should introduce (ideally) little functional change. However, it lays the groundwork for significantly increased parallelism in the TCP/IP code.
MFC after: 3 months Tested by: kris (superset of committered patch)
|
#
177961 |
|
06-Apr-2008 |
rwatson |
In in_pcbnotifyall() and in6_pcbnotify(), use LIST_FOREACH_SAFE() and eliminate unnecessary local variable caching of the list head pointer, making the code a bit easier to read.
MFC after: 3 weeks
|
#
175162 |
|
08-Jan-2008 |
obrien |
un-__P()
|
#
174717 |
|
17-Dec-2007 |
rwatson |
Fix leaking MAC labels for IPv6 inpcbs by adding missing MAC label destroy call; this transpired because the inpcb alloc path for IPv4/IPv6 is the same code, but IPv6 has a separate free path. The results was that as new IPv6 TCP connections were created, kernel memory would gradually leak.
MFC after: 3 days Reported by: tanyong <tanyong at ercist dot iscas dot ac dot cn>, zhouzhouyi
|
#
174510 |
|
10-Dec-2007 |
obrien |
Clean up VCS Ids.
|
#
171260 |
|
05-Jul-2007 |
delphij |
Space cleanup
Approved by: re (rwatson)
|
#
171259 |
|
05-Jul-2007 |
delphij |
ANSIfy[1] plus some style cleanup nearby.
Discussed with: gnn, rwatson Submitted by: Karl Sj?dahl - dunceor <dunceor gmail com> [1] Approved by: re (rwatson)
|
#
171167 |
|
03-Jul-2007 |
gnn |
Commit the change from FAST_IPSEC to IPSEC. The FAST_IPSEC option is now deprecated, as well as the KAME IPsec code. What was FAST_IPSEC is now IPSEC.
Approved by: re Sponsored by: Secure Computing
|
#
171133 |
|
01-Jul-2007 |
gnn |
Commit IPv6 support for FAST_IPSEC to the tree. This commit includes only the kernel files, the rest of the files will follow in a second commit.
Reviewed by: bz Approved by: re Supported by: Secure Computing
|
#
170613 |
|
12-Jun-2007 |
bms |
Import rewrite of IPv4 socket multicast layer to support source-specific and protocol-independent host mode multicast. The code is written to accomodate IPv6, IGMPv3 and MLDv2 with only a little additional work.
This change only pertains to FreeBSD's use as a multicast end-station and does not concern multicast routing; for an IGMPv3/MLDv2 router implementation, consider the XORP project.
The work is based on Wilbert de Graaf's IGMPv3 code drop for FreeBSD 4.6, which is available at: http://www.kloosterhof.com/wilbert/igmpv3.html
Summary * IPv4 multicast socket processing is now moved out of ip_output.c into a new module, in_mcast.c. * The in_mcast.c module implements the IPv4 legacy any-source API in terms of the protocol-independent source-specific API. * Source filters are lazy allocated as the common case does not use them. They are part of per inpcb state and are covered by the inpcb lock. * struct ip_mreqn is now supported to allow applications to specify multicast joins by interface index in the legacy IPv4 any-source API. * In UDP, an incoming multicast datagram only requires that the source port matches the 4-tuple if the socket was already bound by source port. An unbound socket SHOULD be able to receive multicasts sent from an ephemeral source port. * The UDP socket multicast filter mode defaults to exclusive, that is, sources present in the per-socket list will be blocked from delivery. * The RFC 3678 userland functions have been added to libc: setsourcefilter, getsourcefilter, setipv4sourcefilter, getipv4sourcefilter. * Definitions for IGMPv3 are merged but not yet used. * struct sockaddr_storage is now referenced from <netinet/in.h>. It is therefore defined there if not already declared in the same way as for the C99 types. * The RFC 1724 hack (specify 0.0.0.0/8 addresses to IP_MULTICAST_IF which are then interpreted as interface indexes) is now deprecated. * A patch for the Rhyolite.com routed in the FreeBSD base system is available in the -net archives. This only affects individuals running RIPv1 or RIPv2 via point-to-point and/or unnumbered interfaces. * Make IPv6 detach path similar to IPv4's in code flow; functionally same. * Bump __FreeBSD_version to 700048; see UPDATING.
This work was financially supported by another FreeBSD committer.
Obtained from: p4://bms_netdev Submitted by: Wilbert de Graaf (original work) Reviewed by: rwatson (locking), silence from fenner, net@ (but with encouragement)
|
#
170587 |
|
11-Jun-2007 |
rwatson |
Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in some cases, move to priv_check() if it was an operation on a thread and no other flags were present.
Eliminate caller-side jail exception checking (also now-unused); jail privilege exception code now goes solely in kern_jail.c.
We can't yet eliminate suser() due to some cases in the KAME code where a privilege check is performed and then used in many different deferred paths. Do, however, move those prototypes to priv.h.
Reviewed by: csjp Obtained from: TrustedBSD Project
|
#
169462 |
|
11-May-2007 |
rwatson |
Reduce network stack oddness: implement .pru_sockaddr and .pru_peeraddr protocol entry points using functions named proto_getsockaddr and proto_getpeeraddr rather than proto_setsockaddr and proto_setpeeraddr. While it's true that sockaddrs are allocated and set, the net effect is to retrieve (get) the socket address or peer address from a socket, not set it, so align names to that intent.
|
#
169179 |
|
01-May-2007 |
rwatson |
Remove unused pcbinfo arguments to in_setsockaddr() and in_setpeeraddr().
|
#
169154 |
|
30-Apr-2007 |
rwatson |
Rename some fields of struct inpcbinfo to have the ipi_ prefix, consistent with the naming of other structure field members, and reducing improper grep matches. Clean up and comment structure fields in structure definition.
|
#
168932 |
|
21-Apr-2007 |
rwatson |
Teach netinet6 to use PRIV_NETINET_REUSEPORT.
|
#
164033 |
|
06-Nov-2006 |
rwatson |
Sweep kernel replacing suser(9) calls with priv(9) calls, assigning specific privilege names to a broad range of privileges. These may require some future tweaking.
Sponsored by: nCircle Network Security, Inc. Obtained from: TrustedBSD Project Discussed on: arch@ Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri, Alex Lyashkov <umka at sevcity dot net>, Skip Ford <skip dot ford at verizon dot net>, Antoine Brodin <antoine dot brodin at laposte dot net>
|
#
160491 |
|
18-Jul-2006 |
ups |
Fix race conditions on enumerating pcb lists by moving the initialization ( and where appropriate the destruction) of the pcb mutex to the init/finit functions of the pcb zones. This allows locking of the pcb entries and race condition free comparison of the generation count. Rearrange locking a bit to avoid extra locking operation to update the generation count in in_pcballoc(). (in_pcballoc now returns the pcb locked)
I am planning to convert pcb list handling from a type safe to a reference count model soon. ( As this allows really freeing the PCBs)
Reviewed by: rwatson@, mohans@ MFC after: 1 week
|
#
160024 |
|
29-Jun-2006 |
bz |
Use INPLOOKUP_WILDCARD instead of just 1 more consistently.
OKed by: rwatson (some weeks ago)
|
#
159976 |
|
27-Jun-2006 |
pjd |
- Use suser_cred(9) instead of directly checking cr_uid. - Change the order of conditions to first verify that we actually need to check for privileges and then eventually check them.
Reviewed by: rwatson
|
#
158011 |
|
25-Apr-2006 |
rwatson |
Move lock assertions to top of in6_pcbladdr(): we still want them to run even if we're going to return an argument-based error.
Assert pcbinfo lock in in6_pcblookup_local(), in6_pcblookup_hash(), since they walk pcbinfo inpcb lists.
Assert inpcb and pcbinfo locks in in6_pcbsetport(), since port reservations are changing.
MFC after: 3 months
|
#
157978 |
|
23-Apr-2006 |
rwatson |
Modify in6_pcbpurgeif0() to accept a pcbinfo structure rather than a pcb list head structure; this improves congruence to IPv4, and also allows in6_pcbpurgeif0() to lock the pcbinfo. Modify in6_pcbpurgeif0() to lock the pcbinfo before iterating the pcb list, use queue(9)'s LIST_FOREACH() for the iteration, and to lock individual inpcb's while manipulating them.
MFC after: 3 months
|
#
157767 |
|
15-Apr-2006 |
rwatson |
Mirror IPv4 pcb locking into in6_setsockaddr() and in6_setpeeraddr(): acquire inpcb lock when reading inpcb port+address in order to prevent races with other threads that may be changing them.
MFC after: 3 months
|
#
157673 |
|
12-Apr-2006 |
rwatson |
Remove spl use from IPv6 inpcb code.
In various inpcb methods for IPv6 sockets, don't check of so_pcb is NULL, assert it isn't.
MFC after: 3 months
|
#
157373 |
|
01-Apr-2006 |
rwatson |
Break out in_pcbdetach() into two functions:
- in_pcbdetach(), which removes the link between an inpcb and its socket.
- in_pcbfree(), which frees a detached pcb.
Unlike the previous in_pcbdetach(), neither of these functions will attempt to conditionally free the socket, as they are responsible only for managing in_pcb memory. Mirror these changes into in6_pcbdetach() by breaking it into in6_pcbdetach() and in6_pcbfree().
While here, eliminate undesired checks for NULL inpcb pointers in sockets, as we will now have as an invariant that sockets will always have valid so_pcb pointers.
MFC after: 3 months
|
#
156877 |
|
19-Mar-2006 |
dwmalone |
Make net.inet.ip.portrange.reservedhigh and net.inet.ip.portrange.reservedlow apply to IPv6 aswell as IPv4.
We could have made new sysctls for IPv6, but that potentially makes things complicated for mapped addresses. This seems like the least confusing option and least likely to cause obscure problems in the future.
This change makes the mac_portacl module useful with IPv6 apps.
Reviewed by: ume MFC after: 1 month
|
#
149849 |
|
07-Sep-2005 |
obrien |
IPv6 was improperly defining its malloc type the same as IPv4 (M_IPMADDR, M_IPMOPTS, M_MRTABLE). Thus we had conflicting instantiations. Create an IPv6-specific type to overcome this.
|
#
148385 |
|
25-Jul-2005 |
ume |
scope cleanup. with this change - most of the kernel code will not care about the actual encoding of scope zone IDs and won't touch "s6_addr16[1]" directly. - similarly, most of the kernel code will not care about link-local scoped addresses as a special case. - scope boundary check will be stricter. For example, the current *BSD code allows a packet with src=::1 and dst=(some global IPv6 address) to be sent outside of the node, if the application do: s = socket(AF_INET6); bind(s, "::1"); sendto(s, some_global_IPv6_addr); This is clearly wrong, since ::1 is only meaningful within a single node, but the current implementation of the *BSD kernel cannot reject this attempt.
Submitted by: JINMEI Tatuya <jinmei__at__isl.rdc.toshiba.co.jp> Obtained from: KAME
|
#
139826 |
|
07-Jan-2005 |
imp |
/* -> /*- for license, minor formatting changes, separate for KAME
|
#
136682 |
|
18-Oct-2004 |
rwatson |
Push acquisition of the accept mutex out of sofree() into the caller (sorele()/sotryfree()):
- This permits the caller to acquire the accept mutex before the socket mutex, avoiding sofree() having to drop the socket mutex and re-order, which could lead to races permitting more than one thread to enter sofree() after a socket is ready to be free'd.
- This also covers clearing of the so_pcb weak socket reference from the protocol to the socket, preventing races in clearing and evaluation of the reference such that sofree() might be called more than once on the same socket.
This appears to close a race I was able to easily trigger by repeatedly opening and resetting TCP connections to a host, in which the tcp_close() code called as a result of the RST raced with the close() of the accepted socket in the user process resulting in simultaneous attempts to de-allocate the same socket. The new locking increases the overhead for operations that may potentially free the socket, so we will want to revise the synchronization strategy here as we normalize the reference counting model for sockets. The use of the accept mutex in freeing of sockets that are not listen sockets is primarily motivated by the potential need to remove the socket from the incomplete connection queue on its parent (listen) socket, so cleaning up the reference model here may allow us to substantially weaken the synchronization requirements.
RELENG_5_3 candidate.
MFC after: 3 days Reviewed by: dwhite Discussed with: gnn, dwhite, green Reported by: Marc UBM Bocklet <ubm at u-boot-man dot de> Reported by: Vlad <marchenko at gmail dot com>
|
#
134121 |
|
21-Aug-2004 |
rwatson |
When notifying protocol components of an event on an in6pcb, use the result of the notify() function to decide if we need to unlock the in6pcb or not, rather than always unlocking. Otherwise, we may unlock and already unlocked in6pcb.
Reported by: kuriyama, Gordon Bergling <gbergling at 0xfce3.net> Tested by: kuriyama, Gordon Bergling <gbergling at 0xfce3.net> Discussed with: mdodd
|
#
133720 |
|
14-Aug-2004 |
dwmalone |
Get rid of the RANDOM_IP_ID option and make it a sysctl. NetBSD have already done this, so I have styled the patch on their work:
1) introduce a ip_newid() static inline function that checks the sysctl and then decides if it should return a sequential or random IP ID.
2) named the sysctl net.inet.ip.random_id
3) IPv6 flow IDs and fragment IDs are now always random. Flow IDs and frag IDs are significantly less common in the IPv6 world (ie. rarely generated per-packet), so there should be smaller performance concerns.
The sysctl defaults to 0 (sequential IP IDs).
Reviewed by: andre, silby, mlaier, ume Based on: NetBSD MFC after: 2 months
|
#
133192 |
|
06-Aug-2004 |
rwatson |
Pass pcbinfo structures to in6_pcbnotify() rather than pcbhead structures, allowing in6_pcbnotify() to lock the pcbinfo and each inpcb that it notifies of ICMPv6 events. This prevents inpcb assertions from firing when IPv6 generates and delievers event notifications for inpcbs.
Reported by: kuriyama Tested by: kuriyama
|
#
132794 |
|
28-Jul-2004 |
yar |
Disallow a particular kind of port theft described by the following scenario:
Alice is too lazy to write a server application in PF-independent manner. Therefore she knocks up the server using PF_INET6 only and allows the IPv6 socket to accept mapped IPv4 as well. An evil hacker known on IRC as cheshire_cat has an account in the same system. He starts a process listening on the same port as used by Alice's server, but in PF_INET. As a consequence, cheshire_cat will distract all IPv4 traffic supposed to go to Alice's server.
Such sort of port theft was initially enabled by copying the code that implemented the RFC 2553 semantics on IPv4/6 sockets (see inet6(4)) for the implied case of the same owner for both connections. After this change, the above scenario will be impossible. In the same setting, the user who attempts to start his server last will get EADDRINUSE.
Of course, using IPv4 mapped to IPv6 leads to security complications in the first place, but there is no reason to make it even more unsafe.
This change doesn't apply to KAME since it affects a FreeBSD-specific part of the code. It doesn't modify the out-of-box behaviour of the TCP/IP stack either as long as mapping IPv4 to IPv6 is off by default.
MFC after: 1 month
|
#
132715 |
|
27-Jul-2004 |
rwatson |
Forced commit to note that the previous commit was:
Approved by: suz (core@kame.net)
|
#
132714 |
|
27-Jul-2004 |
rwatson |
Commit a first pass at in6pcb and pcbinfo locking for IPv6, synchronizing IPv6 protocol control blocks and lists. These changes are modeled on the inpcb locking for IPv4, submitted by Jennifer Yang, and committed by Jeffrey Hsu. With these locking changes, IPv6 use of inpcbs is now substantially more MPSAFE, and permits IPv4 inpcb locking assertions to be run in the presence of IPv6 compiled into the kernel.
|
#
132699 |
|
27-Jul-2004 |
yar |
Don't consider TCP connections beyond LISTEN state (i.e. with the foreign address being not wildcard) when checking for possible port theft since such connections cannot be stolen.
The port theft check is FreeBSD-specific and isn't in the KAME tree.
PR: bin/65928 (in the audit trail) Reviewed by: -net, -hackers (silence) Tested by: Nick Leuta <skynick at mail.sc.ru> MFC after: 1 month
|
#
132653 |
|
26-Jul-2004 |
cperciva |
Rename suser_cred()'s PRISON_ROOT flag to SUSER_ALLOWJAIL. This is somewhat clearer, but more importantly allows for a consistent naming scheme for suser_cred flags.
The old name is still defined, but will be removed in a few days (unless I hear any complaints...)
Discussed with: rwatson, scottl Requested by: jhb
|
#
130388 |
|
12-Jun-2004 |
rwatson |
Missed directory in previous commit; need to hold SOCK_LOCK(so) before calling sotryfree().
-- Body of earlier bulk commit this belonged with --
Log: Extend coverage of SOCK_LOCK(so) to include so_count, the socket reference count:
- Assert SOCK_LOCK(so) macros that directly manipulate so_count: soref(), sorele().
- Assert SOCK_LOCK(so) in macros/functions that rely on the state of so_count: sofree(), sotryfree().
- Acquire SOCK_LOCK(so) before calling these functions or macros in various contexts in the stack, both at the socket and protocol layers.
- In some cases, perform soisdisconnected() before sotryfree(), as this could result in frobbing of a non-present socket if sotryfree() actually frees the socket.
- Note that sofree()/sotryfree() will release the socket lock even if they don't free the socket.
Submitted by: sam Sponsored by: FreeBSD Foundation Obtained from: BSD/OS
|
#
128019 |
|
07-Apr-2004 |
imp |
Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999 and email from Peter Wemm, Alan Cox and Robert Watson.
Approved by: core, peter, alc, rwatson
|
#
127505 |
|
27-Mar-2004 |
pjd |
Reduce 'td' argument to 'cred' (struct ucred) argument in those functions: - in_pcbbind(), - in_pcbbind_setup(), - in_pcbconnect(), - in_pcbconnect_setup(), - in6_pcbbind(), - in6_pcbconnect(), - in6_pcbsetport(). "It should simplify/clarify things a great deal." --rwatson
Requested by: rwatson Reviewed by: rwatson, ume
|
#
125776 |
|
13-Feb-2004 |
ume |
supported IPV6_RECVPATHMTU socket option.
Obtained from: KAME
|
#
125626 |
|
09-Feb-2004 |
ume |
fix build with FAST_IPSEC.
Reported by: cjc
|
#
124465 |
|
13-Jan-2004 |
ume |
call ipsec_pcbconn()/ipsec_pcbdisconn() from in6_pcbconnect().
Obtained from: KAME
|
#
124332 |
|
10-Jan-2004 |
ume |
in set{peer, sock}addr, do not convert the unspecified address (::) to the mapped address form.
PR: kern/22868 Obtained from: KAME MFC after: 3 days
|
#
122922 |
|
20-Nov-2003 |
andre |
Introduce tcp_hostcache and remove the tcp specific metrics from the routing table. Move all usage and references in the tcp stack from the routing table metrics to the tcp hostcache.
It caches measured parameters of past tcp sessions to provide better initial start values for following connections from or to the same source or destination. Depending on the network parameters to/from the remote host this can lead to significant speedups for new tcp connections after the first one because they inherit and shortcut the learning curve.
tcp_hostcache is designed for multiple concurrent access in SMP environments with high contention and is hash indexed by remote ip address.
It removes significant locking requirements from the tcp stack with regard to the routing table.
Reviewed by: sam (mentor), bms Reviewed by: -net, -current, core@kame.net (IPv6 parts) Approved by: re (scottl)
|
#
121770 |
|
30-Oct-2003 |
sam |
Overhaul routing table entry cleanup by introducing a new rtexpunge routine that takes a locked routing table reference and removes all references to the entry in the various data structures. This eliminates instances of recursive locking and also closes races where the lock on the entry had to be dropped prior to calling rtrequest(RTM_DELETE). This also cleans up confusion where the caller held a reference to an entry that might have been reclaimed (and in some cases used that reference).
Supported by: FreeBSD Foundation
|
#
121472 |
|
24-Oct-2003 |
ume |
Switch Advanced Sockets API for IPv6 from RFC2292 to RFC3542 (aka RFC2292bis). Though I believe this commit doesn't break backward compatibility againt existing binaries, it breaks backward compatibility of API. Now, the applications which use Advanced Sockets API such as telnet, ping6, mld6query and traceroute6 use RFC3542 API.
Obtained from: KAME
|
#
120913 |
|
08-Oct-2003 |
ume |
- fix typo in comments. - style. - NULL is not 0. - some variables were renamed. - nuke unused logic. (there is no functional change.)
Obtained from: KAME
|
#
120856 |
|
06-Oct-2003 |
ume |
return(code) -> return (code) (reduce diffs against KAME)
|
#
120727 |
|
04-Oct-2003 |
sam |
Locking for updates to routing table entries. Each rtentry gets a mutex that covers updates to the contents. Note this is separate from holding a reference and/or locking the routing table itself.
Other/related changes:
o rtredirect loses the final parameter by which an rtentry reference may be returned; this was never used and added unwarranted complexity for locking. o minor style cleanups to routing code (e.g. ansi-fy function decls) o remove the logic to bump the refcnt on the parent of cloned routes, we assume the parent will remain as long as the clone; doing this avoids a circularity in locking during delete o convert some timeouts to MPSAFE callouts
Notes:
1. rt_mtx in struct rtentry is guarded by #ifdef _KERNEL as user-level applications cannot/do-no know about mutex's. Doing this requires that the mutex be the last element in the structure. A better solution is to introduce an externalized version of struct rtentry but this is a major task because of the intertwining of rtentry and other data structures that are visible to user applications. 2. There are known LOR's that are expected to go away with forthcoming work to eliminate many held references. If not these will be resolved prior to release. 3. ATM changes are untested.
Sponsored by: FreeBSD Foundation Obtained from: BSD/OS (partly)
|
#
120649 |
|
01-Oct-2003 |
ume |
randomize IPv6 flowlabel when RANDOM_IP_ID is defined.
Obtained from: KAME
|
#
119995 |
|
11-Sep-2003 |
ru |
Fix a bunch of off-by-one errors in the range checking code.
|
#
116453 |
|
16-Jun-2003 |
cognet |
Do not attempt to access to inp_socket fields if the socket is in the TIME_WAIT state, as inp_socket will then be NULL. This fixes a panic that occurs when one tries to bind a port that was previously binded with remaining TIME_WAIT sockets.
|
#
111145 |
|
19-Feb-2003 |
jlemon |
Add a TCP TIMEWAIT state which uses less space than a fullblown TCP control block. Allow the socket and tcpcb structures to be freed earlier than inpcb. Update code to understand an inp w/o a socket.
Reviewed by: hsu, silby, jayanth Sponsored by: DARPA, NAI Labs
|
#
111119 |
|
19-Feb-2003 |
imp |
Back out M_* changes, per decision of the TRB.
Approved by: trb
|
#
109623 |
|
21-Jan-2003 |
alfred |
Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0. Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
|
#
105199 |
|
16-Oct-2002 |
sam |
Tie new "Fast IPsec" code into the build. This involves the usual configuration stuff as well as conditional code in the IPv4 and IPv6 areas. Everything is conditional on FAST_IPSEC which is mutually exclusive with IPSEC (KAME IPsec implmentation).
As noted previously, don't use FAST_IPSEC with INET6 at the moment.
Reviewed by: KAME, rwatson Approved by: silence Supported by: Vernier Networks
|
#
102218 |
|
21-Aug-2002 |
truckman |
Create new functions in_sockaddr(), in6_sockaddr(), and in6_v4mapsin6_sockaddr() which allocate the appropriate sockaddr_in* structure and initialize it with the address and port information passed as arguments. Use calls to these new functions to replace code that is replicated multiple times in in_setsockaddr(), in_setpeeraddr(), in6_setsockaddr(), in6_setpeeraddr(), in6_mapped_sockaddr(), and in6_mapped_peeraddr(). Inline COMMON_END in tcp_usr_accept() so that we can call in_sockaddr() with temporary copies of the address and port after the PCB is unlocked.
Fix the lock violation in tcp6_usr_accept() (caused by calling MALLOC() inside in6_mapped_peeraddr() while the PCB is locked) by changing the implementation of tcp6_usr_accept() to match tcp_usr_accept().
Reviewed by: suz
|
#
98211 |
|
14-Jun-2002 |
hsu |
Notify functions can destroy the pcb, so they have to return an indication of whether this happenned so the calling function knows whether or not to unlock the pcb.
Submitted by: Jennifer Yang (yangjihui@yahoo.com) Bug reported by: Sid Carter (sidcarter@symonds.net)
|
#
98141 |
|
12-Jun-2002 |
hsu |
As a stop-gap measure, add one INP_LOCK_DESTROY() to in6_pcbdetach() to get kernel compiled with INET6 to boot.
|
#
98102 |
|
10-Jun-2002 |
hsu |
Lock up inpcb.
Submitted by: Jennifer Yang <yangjihui@yahoo.com>
|
#
97658 |
|
31-May-2002 |
tanimura |
Back out my lats commit of locking down a socket, it conflicts with hsu's work.
Requested by: hsu
|
#
96972 |
|
20-May-2002 |
tanimura |
Lock down a socket, milestone 1.
o Add a mutex (sb_mtx) to struct sockbuf. This protects the data in a socket buffer. The mutex in the receive buffer also protects the data in struct socket.
o Determine the lock strategy for each members in struct socket.
o Lock down the following members:
- so_count - so_options - so_linger - so_state
o Remove *_locked() socket APIs. Make the following socket APIs touching the members above now require a locked socket:
- sodisconnect() - soisconnected() - soisconnecting() - soisdisconnected() - soisdisconnecting() - sofree() - soref() - sorele() - sorwakeup() - sotryfree() - sowakeup() - sowwakeup()
Reviewed by: alfred
|
#
95023 |
|
19-Apr-2002 |
suz |
just merged cosmetic changes from KAME to ease sync between KAME and FreeBSD. (based on freebsd4-snap-20020128)
Reviewed by: ume MFC after: 1 week
|
#
93593 |
|
01-Apr-2002 |
jhb |
Change the suser() API to take advantage of td_ucred as well as do a general cleanup of the API. The entire API now consists of two functions similar to the pre-KSE API. The suser() function takes a thread pointer as its only argument. The td_ucred member of this thread must be valid so the only valid thread pointers are curthread and a few kernel threads such as thread0. The suser_cred() function takes a pointer to a struct ucred as its first argument and an integer flag as its second argument. The flag is currently only used for the PRISON_ROOT flag.
Discussed on: smp@
|
#
92767 |
|
20-Mar-2002 |
jeff |
Remove references to vm_zone.h and switch over to the new uma API.
|
#
91346 |
|
27-Feb-2002 |
alfred |
Fix warnings caused by discarding const.
Hairy Eyeball At: peter
|
#
86487 |
|
17-Nov-2001 |
dillon |
Give struct socket structures a ref counting interface similar to vnodes. This will hopefully serve as a base from which we can expand the MP code. We currently do not attempt to obtain any mutex or SX locks, but the door is open to add them when we nail down exactly how that part of it is going to work.
|
#
85074 |
|
17-Oct-2001 |
ru |
Pull post-4.4BSD change to sys/net/route.c from BSD/OS 4.2.
Have sys/net/route.c:rtrequest1(), which takes ``rt_addrinfo *'' as the argument. Pass rt_addrinfo all the way down to rtrequest1 and ifa->ifa_rtrequest. 3rd argument of ifa->ifa_rtrequest is now ``rt_addrinfo *'' instead of ``sockaddr *'' (almost noone is using it anyways).
Benefit: the following command now works. Previously we needed two route(8) invocations, "add" then "change". # route add -inet6 default ::1 -ifp gif0
Remove unsafe typecast in rtrequest(), from ``rtentry *'' to ``sockaddr *''. It was introduced by 4.3BSD-Reno and never corrected.
Obtained from: BSD/OS, NetBSD MFC after: 1 month PR: kern/28360
|
#
85072 |
|
17-Oct-2001 |
ru |
Pull fix for memory leak in in6_losing() from netinet/in_pcb.c,v 1.85.
MFC after: 1 week
|
#
83934 |
|
25-Sep-2001 |
brooks |
Make faith loadable, unloadable, and clonable.
|
#
83366 |
|
12-Sep-2001 |
julian |
KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to teh previousl -current except that there is a thread associated with each process.
Sorry john! (your next MFC will be a doosie!)
Reviewed by: peter@freebsd.org, dillon@freebsd.org
X-MFC after: ha ha ha ha
|
#
83130 |
|
06-Sep-2001 |
jlemon |
Wrap array accesses in macros, which also happen to be lvalues:
ifnet_addrs[i - 1] -> ifaddr_byindex(i) ifindex2ifnet[i] -> ifnet_byindex(i)
This is intended to ease the conversion to SMPng.
|
#
81127 |
|
04-Aug-2001 |
ume |
When running aplication joined multicast address, removing network card, and kill aplication. imo_membership[].inm_ifp refer interface pointer after removing interface. When kill aplication, release socket,and imo_membership. imo_membership use already not exist interface pointer. Then, kernel panic.
PR: 29345 Submitted by: Inoue Yuichi <inoue@nd.net.fujitsu.co.jp> Obtained from: KAME MFC after: 3 days
|
#
78064 |
|
11-Jun-2001 |
ume |
Sync with recent KAME. This work was based on kame-20010528-freebsd43-snap.tgz and some critical problem after the snap was out were fixed. There are many many changes since last KAME merge.
TODO: - The definitions of SADB_* in sys/net/pfkeyv2.h are still different from RFC2407/IANA assignment because of binary compatibility issue. It should be fixed under 5-CURRENT. - ip6po_m member of struct ip6_pktopts is no longer used. But, it is still there because of binary compatibility issue. It should be removed under 5-CURRENT.
Reviewed by: itojun Obtained from: KAME MFC after: 3 weeks
|
#
71350 |
|
21-Jan-2001 |
des |
First step towards an MP-safe zone allocator: - have zalloc() and zfree() always lock the vm_zone. - remove zalloci() and zfreei(), which are now redundant.
Reviewed by: bmilekic, jasone
|
#
62587 |
|
04-Jul-2000 |
itojun |
sync with kame tree as of july00. tons of bug fixes/improvements.
API changes: - additional IPv6 ioctls - IPsec PF_KEY API was changed, it is mandatory to upgrade setkey(8). (also syntax change)
|
#
58907 |
|
01-Apr-2000 |
shin |
Support per socket based IPv4 mapped IPv6 addr enable/disable control.
Submitted by: ume
|
#
58452 |
|
22-Mar-2000 |
green |
in6_pcb.c: Remove a bogus (redundant, just weird, etc.) key_freeso(so). There are no consumers of it now, nor does it seem there ever will be.
in6?_pcb.c: Add an if (inp->in6?p_sp != NULL) before the call to ipsec[46]_delete_pcbpolicy(inp). In low-memory conditions this can cause a crash because in6?_sp can be NULL...
|
#
56117 |
|
16-Jan-2000 |
shin |
fix kernel panic at rtfree() in INET6 enabled envrionment. This is probably due to twice rtfree() in in6_pcbdetach(), one for inp->in6p_route.ro_rt, and another one for inp->inp_route.ro_rt. But these 2 are actually shared in inpcb, so 2nd rtfree() is not necessary.
|
#
55919 |
|
13-Jan-2000 |
shin |
fix wrong name which is hidden by wrong ifdef. Sorry for build failure. There was a mistake when I moved the patch from my build check machine to commit machine.
Specified by: peter
|
#
55871 |
|
13-Jan-2000 |
shin |
remove unnecessary "$ifdef INET6"
|
#
55679 |
|
09-Jan-2000 |
shin |
tcp updates to support IPv6. also a small patch to sys/nfs/nfs_socket.c, as max_hdr size change.
Reviewed by: freebsd-arch, cvs-committers Obtained from: KAME project
|
#
55009 |
|
22-Dec-1999 |
shin |
IPSEC support in the kernel. pr_input() routines prototype is also changed to support IPSEC and IPV6 chained protocol headers.
Reviewed by: freebsd-arch, cvs-committers Obtained from: KAME project
|
#
54952 |
|
21-Dec-1999 |
eivind |
Change incorrect NULLs to 0s
|
#
54350 |
|
09-Dec-1999 |
shin |
rtcalloc() is removed because it turned out not to be necessary for FreeBSD. (It was added as a part of KAME patch)
Specified by: jdp@polstra.com
|
#
54263 |
|
07-Dec-1999 |
shin |
udp IPv6 support, IPv6/IPv4 tunneling support in kernel, packet divert at kernel for IPv6/IPv4 translater daemon
This includes queue related patch submitted by jburkhol@home.com.
Submitted by: queue related patch from jburkhol@home.com Reviewed by: freebsd-arch, cvs-committers Obtained from: KAME project
|
#
53626 |
|
23-Nov-1999 |
shin |
Removed IPSEC and IPV6FIREWALL because they are not ready yet.
|
#
53541 |
|
22-Nov-1999 |
shin |
KAME netinet6 basic part(no IPsec,no V6 Multicast Forwarding, no UDP/TCP for IPv6 yet)
With this patch, you can assigne IPv6 addr automatically, and can reply to IPv6 ping.
Reviewed by: freebsd-arch, cvs-committers Obtained from: KAME project
|