#
296373 |
|
04-Mar-2016 |
marius |
- Copy stable/10@296371 to releng/10.3 in preparation for 10.3-RC1 builds. - Update newvers.sh to reflect RC1. - Update __FreeBSD_version to reflect 10.3. - Update default pkg(8) configuration to use the quarterly branch.
Approved by: re (implicit) |
#
274043 |
|
03-Nov-2014 |
hselasky |
MFC r271946 and r272595: Improve transmit sending offload, TSO, algorithm in general. This change allows all HCAs from Mellanox Technologies to function properly when TSO is enabled. See r271946 and r272595 for more details about this commit.
Sponsored by: Mellanox Technologies
|
#
268432 |
|
08-Jul-2014 |
delphij |
Fix kernel memory disclosure in control message and SCTP notifications.
Security: FreeBSD-SA-14:17.kmem Security: CVE-2014-3952, CVE-2014-3953
|
#
263820 |
|
27-Mar-2014 |
asomers |
MFC r262867
Fix PR kern/185813 "SOCK_SEQPACKET AF_UNIX sockets with asymmetrical buffers drop packets". It was caused by a check for the space available in a sockbuf, but it was checking the wrong sockbuf.
sys/sys/sockbuf.h sys/kern/uipc_sockbuf.c Add sbappendaddr_nospacecheck_locked(), which is just like sbappendaddr_locked but doesn't validate the receiving socket's space. Factor out common code into sbappendaddr_locked_internal(). We shouldn't simply make sbappendaddr_locked check the space and then call sbappendaddr_nospacecheck_locked, because that would cause the O(n) function m_length to be called twice.
sys/kern/uipc_usrreq.c Use sbappendaddr_nospacecheck_locked for SOCK_SEQPACKET sockets, because the receiving sockbuf's size limit is irrelevant.
tests/sys/kern/unix_seqpacket_test.c Now that 185813 is fixed, pipe_128k_8k fails intermittently due to 185812. Make it fail every time by adding a usleep after starting the writer thread and before starting the reader thread in test_pipe. That gives the writer time to fill up its send buffer. Also, clear the expected failure message due to 185813. It actually said "185812", but that was a typo.
PR: kern/185813
|
#
256281 |
|
10-Oct-2013 |
gjb |
Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.
Approved by: re (implicit) Sponsored by: The FreeBSD Foundation
|
#
256185 |
|
09-Oct-2013 |
glebius |
- Substitute sbdrop_internal() with sbcut_internal(). The latter doesn't free mbufs, but return chain of free mbufs to a caller. Caller can either reuse them or return to allocator in a batch manner. - Implement sbdrop()/sbdrop_locked() as a wrapper around sbcut_internal(). - Expose sbcut_locked() for outside usage.
Sponsored by: Netflix Sponsored by: Nginx, Inc. Approved by: re (marius)
|
#
255138 |
|
01-Sep-2013 |
davide |
Fix socket buffer timeouts precision using the new sbintime_t KPI instead of relying on the tvtohz() workaround. The latter has been introduced lately by jhb@ (r254699) in order to have a fix that can be backported to STABLE.
Reported by: Vitja Makarov <vitja.makarov at gmail dot com> Reviewed by: jhb (earlier version)
|
#
251984 |
|
19-Jun-2013 |
lstewart |
When a previous call to sbsndptr() leaves sb->sb_sndptroff at the start of an mbuf that was fully consumed by the previous call, the mbuf ptr returned by the current call ends up being the previous mbuf in the sb chain to the one that contains the data we want.
This does not cause any observable issues because the mbuf copy routines happily walk the mbuf chain to get to the data at the moff offset, which in this case means they effectively skip over the mbuf returned by sbsndptr().
We can't adjust sb->sb_sndptr during the previous call for this case because the next mbuf in the chain may not exist yet. We therefore need to detect the condition and make the adjustment during the current call.
Fix by detecting the special case of moff being at the start of the next mbuf in the chain and adjust the required accounting variables accordingly.
Reviewed by: andre MFC after: 2 weeks
|
#
248886 |
|
29-Mar-2013 |
glebius |
Once ng_ksocket(4) is fixed, re-apply r194662. See this revision for longer description.
Discussed with: andre, rwatson Sponsored by: Nginx, Inc.
|
#
248318 |
|
15-Mar-2013 |
glebius |
Use m_get() and m_getcl() instead of compat macros.
|
#
243882 |
|
05-Dec-2012 |
glebius |
Mechanically substitute flags from historic mbuf allocator with malloc(9) flags within sys.
Exceptions:
- sys/contrib not touched - sys/mbuf.h edited manually
|
#
228449 |
|
13-Dec-2011 |
eadler |
Document a large number of currently undocumented sysctls. While here fix some style(9) issues and reduce redundancy.
PR: kern/155491 PR: kern/155490 PR: kern/155489 Submitted by: Galimov Albert <wtfcrap@mail.ru> Approved by: bde Reviewed by: jhb MFC after: 1 week
|
#
225169 |
|
25-Aug-2011 |
bz |
Increase the defaults for the maximum socket buffer limit, and the maximum TCP send and receive buffer limits from 256kB to 2MB.
For sb_max_adj we need to add the cast as already used in the sysctl handler to not overflow the type doing the maths.
Note that this is just the defaults. They will allow more memory to be consumed per socket/connection if needed but not change the default "idle" memory consumption. All values are still tunable by sysctls.
Suggested by: gnn Discussed on: arch (Mar and Aug 2011) MFC after: 3 weeks Approved by: re (kib)
|
#
220622 |
|
14-Apr-2011 |
glebius |
Revert r194662, since it breaks ng_ksocket(4) and may break other socket consumers with alternate sb_upcall.
PR: kern/154676 Submitted by: Arnaud Lacombe <lacombar gmail.com> MFC after: 7 days
|
#
194662 |
|
22-Jun-2009 |
andre |
In sbappendstream_locked() demote all incoming packet mbufs (and chains) to pure data mbufs using m_demote(). This removes the packet header and all m_tag information as they are not meaningful anymore on a stream socket where mbufs are linked through m->m_next. Strictly speaking a packet header can be only ever valid on the first mbuf in an m_next chain.
sbcompress() was doing this already when the mbuf chain layout lent itself to it (e.g. header splitting or merge-append), just not consistently.
This frees resources at socket buffer append time instead of at sbdrop_internal() time after data has been read from the socket.
For MAC the per packet information has done its duty and during socket buffer appending the policy of the socket itself takes over. With the append the packet boundaries disappear naturally and with it any context that was based on it. None of the residual information from mbuf headers in the socket buffer on stream sockets was looked at.
|
#
193272 |
|
01-Jun-2009 |
jhb |
Rework socket upcalls to close some races with setup/teardown of upcalls. - Each socket upcall is now invoked with the appropriate socket buffer locked. It is not permissible to call soisconnected() with this lock held; however, so socket upcalls now return an integer value. The two possible values are SU_OK and SU_ISCONNECTED. If an upcall returns SU_ISCONNECTED, then the soisconnected() will be invoked on the socket after the socket buffer lock is dropped. - A new API is provided for setting and clearing socket upcalls. The API consists of soupcall_set() and soupcall_clear(). - To simplify locking, each socket buffer now has a separate upcall. - When a socket upcall returns SU_ISCONNECTED, the upcall is cleared from the receive socket buffer automatically. Note that a SO_SND upcall should never return SU_ISCONNECTED. - All this means that accept filters should now return SU_ISCONNECTED instead of calling soisconnected() directly. They also no longer need to explicitly clear the upcall on the new socket. - The HTTP accept filter still uses soupcall_set() to manage its internal state machine, but other accept filters no longer have any explicit knowlege of socket upcall internals aside from their return value. - The various RPC client upcalls currently drop the socket buffer lock while invoking soreceive() as a temporary band-aid. The plan for the future is to add a new flag to allow soreceive() to be called with the socket buffer locked. - The AIO callback for socket I/O is now also invoked with the socket buffer locked. Previously sowakeup() would drop the socket buffer lock only to call aio_swake() which immediately re-acquired the socket buffer lock for the duration of the function call.
Discussed with: rwatson, rmacklem
|
#
191366 |
|
21-Apr-2009 |
emax |
Fix sbappendrecord_locked().
The main problem is that sbappendrecord_locked() relies on sbcompress() to set sb_mbtail. This will not happen if sbappendrecord_locked() is called with mbuf chain made of exactly one mbuf (i.e. m0->m_next == NULL). In this case sbcompress() will be called with m == NULL and will do nothing. I'm not entirely sure if m == NULL is a valid argument for sbcompress(), and, it rather pointless to call it like that, but keep calling it so it can do SBLASTMBUFCHK().
The problem is triggered by the SOCKBUF_DEBUG kernel option that enables SBLASTRECORDCHK() and SBLASTMBUFCHK() checks.
PR: kern/126742 Investigated by: pluknet < pluknet -at- gmail -dot- com > No response from: freebsd-current@, freebsd-bluetooth@ MFC after: 3 days
|
#
183663 |
|
07-Oct-2008 |
rwatson |
Rewrite sbreserve_locked()'s comment on NULL thread pointers, eliminating an XXXRW about the comment being stale.
MFC after: 3 days
|
#
182842 |
|
07-Sep-2008 |
bz |
Catch a possible NULL pointer deref in case the offsets got mangled somehow. As a consequence we may now get an unexpected result(*). Catch that error cases with a well defined panic giving appropriate pointers to ease debugging.
(*) While the concensus was that the case should never happen unless there was a bug, noone was definitively sure.
Discussed with: kmacy (about 8 months back) Reviewed by: silby (as part of a larger patch in March) MFC after: 2 months
|
#
179027 |
|
15-May-2008 |
gnn |
Update the kernel to count the number of mbufs and clusters (all types) used per socket buffer.
Add support to netstat to print out all of the socket buffer statistics.
Update the netstat manual page to describe the new -x flag which gives the extended output.
Reviewed by: rwatson, julian
|
#
175968 |
|
04-Feb-2008 |
rwatson |
Further clean up sorflush:
- Expose sbrelease_internal(), a variant of sbrelease() with no expectations about the validity of locks in the socket buffer. - Use sbrelease_internel() in sorflush(), and as a result avoid intializing and destroying a socket buffer lock for the temporary stack copy of the actual buffer, asb. - Add a comment indicating why we do what we do, and remove an XXX since things have gotten less ugly in sorflush() lately.
This makes socket close cleaner, and possibly also marginally faster.
MFC after: 3 weeks
|
#
175845 |
|
31-Jan-2008 |
rwatson |
Correct two problems relating to sorflush(), which is called to flush read socket buffers in shutdown() and close():
- Call socantrcvmore() before sblock() to dislodge any threads that might be sleeping (potentially indefinitely) while holding sblock(), such as a thread blocked in recv().
- Flag the sblock() call as non-interruptible so that a signal delivered to the thread calling sorflush() doesn't cause sblock() to fail. The sblock() is required to ensure that all other socket consumer threads have, in fact, left, and do not enter, the socket buffer until we're done flushin it.
To implement the latter, change the 'flags' argument to sblock() to accept two flags, SBL_WAIT and SBL_NOINTR, rather than one M_WAITOK flag. When SBL_NOINTR is set, it forces a non-interruptible sx acquisition, regardless of the setting of the disposition of SB_NOINTR on the socket buffer; without this change it would be possible for another thread to clear SB_NOINTR between when the socket buffer mutex is released and sblock() is invoked.
Reviewed by: bz, kmacy Reported by: Jos Backus <jos at catnook dot com>
|
#
174711 |
|
17-Dec-2007 |
kmacy |
Add SB_NOCOALESCE flag to disable socket buffer update in place
|
#
174647 |
|
16-Dec-2007 |
jeff |
Refactor select to reduce contention and hide internal implementation details from consumers.
- Track individual selecters on a per-descriptor basis such that there are no longer collisions and after sleeping for events only those descriptors which triggered events must be rescaned. - Protect the selinfo (per descriptor) structure with a mtx pool mutex. mtx pool mutexes were chosen to preserve api compatibility with existing code which does nothing but bzero() to setup selinfo structures. - Use a per-thread wait channel rather than a global wait channel. - Hide select implementation details in a seltd structure which is opaque to the rest of the kernel. - Provide a 'selsocket' interface for those kernel consumers who wish to select on a socket when they have no fd so they no longer have to be aware of select implementation details.
Tested by: kris Reviewed on: arch
|
#
172557 |
|
12-Oct-2007 |
mohans |
Set the NFS server sockbuf high watermarks to the system defaults (up form 32KB). The low highwatermark setting caused UDP fullsock request drops, throttling thruput greatly. Reported by: Kris Kennaway Approved by: re@ (Ken Smith)
|
#
170151 |
|
31-May-2007 |
rwatson |
Now that sx(9) locks support an interruptible lock acquire primitive, properly observe the SB_NOINTR flag in sblock. This restores the required behavior that lock acquisition be interruptible on the socket buffer I/O serialization lock to allow threads waiting for I/O to be signaled even if they aren't the thread currently holding the I/O lock. With this change, the sblock regression test is again passed.
Reported by: alfred sx(9) handiwork: attilio
|
#
169624 |
|
16-May-2007 |
rwatson |
Generally migrate to ANSI function headers, and remove 'register' use.
|
#
169236 |
|
03-May-2007 |
rwatson |
sblock() implements a sleep lock by interlocking SB_WANT and SB_LOCK flags on each socket buffer with the socket buffer's mutex. This sleep lock is used to serialize I/O on sockets in order to prevent I/O interlacing.
This change replaces the custom sleep lock with an sx(9) lock, which results in marginally better performance, better handling of contention during simultaneous socket I/O across multiple threads, and a cleaner separation between the different layers of locking in socket buffers. Specifically, the socket buffer mutex is now solely responsible for serializing simultaneous operation on the socket buffer data structure, and not for I/O serialization.
While here, fix two historic bugs:
(1) a bug allowing I/O to be occasionally interlaced during long I/O operations (discovere by Isilon).
(2) a bug in which failed non-blocking acquisition of the socket buffer I/O serialization lock might be ignored (discovered by sam).
SCTP portion of this patch submitted by rrs.
|
#
167902 |
|
26-Mar-2007 |
rwatson |
Following movement of functions from uipc_socket2.c to uipc_socket.c and uipc_sockbuf.c, clean up and update comments.
|
#
167895 |
|
26-Mar-2007 |
rwatson |
Complete removal of uipc_socket2.c by moving the last few functions to other C files:
- Move sbcreatecontrol() and sbtoxsockbuf() to uipc_sockbuf.c. While sbcreatecontrol() is really an mbuf allocation routine, it does its work with awareness of the layout of socket buffer memory.
- Move pru_*() protocol switch stubs to uipc_socket.c where the non-stub versions of several of these functions live. Likewise, move socket state transition calls (soisconnecting(), etc) to uipc_socket.c. Moveo sodupsockaddr() and sotoxsocket().
|
#
167715 |
|
19-Mar-2007 |
andre |
Maintain a pointer and offset pair into the socket buffer mbuf chain to avoid traversal of the entire socket buffer for larger offsets on stream sockets.
Adjust tcp_output() make use of it.
Tested by: gallatin
|
#
162086 |
|
06-Sep-2006 |
jhb |
Use sysctl_handle_long() instead of duplicating it's logic for kern.ipc.maxsockbuf so that this sysctl works for 32-bit binaries running on amd64 via compat/freebsd32.
MFC after: 3 days
|
#
160915 |
|
02-Aug-2006 |
rwatson |
Remove 'register'. Use ANSI C prototypes/function headers. More deterministically line wrap comments.
|
#
160875 |
|
01-Aug-2006 |
rwatson |
Reimplement socket buffer tear-down in sofree(): as the socket is no longer referenced by other threads (hence our freeing it), we don't need to set the can't send and can't receive flags, wake up the consumers, perform two levels of locking, etc. Implement a fast-path teardown, sbdestroy(), which flushes and releases each socket buffer. A manual dom_dispose of the receive buffer is still required explicitly to GC any in-flight file descriptors, etc, before flushing the buffer.
This results in a 9% UP performance improvement and 16% SMP performance improvement on a tight loop of socket();close(); in micro-benchmarking, but will likely also affect CPU-bound macro-benchmark performance.
|
#
160621 |
|
24-Jul-2006 |
rwatson |
Remove non-socket buffer routines from uipc_sockbuf.c, and socket buffer specific routines from uipc_socket2.c following repo-copy. We might rethink the location of one or two at some point, but the division was relatively clean. uipc_sockbuf.c is now the home of routines that manipulate socket buffers.
|
#
160281 |
|
11-Jul-2006 |
rwatson |
Several protocol switch functions (pru_abort, pru_detach, pru_sosetlabel) return void, so don't implement no-op versions of these functions. Instead, consistently check if those switch pointers are NULL before invoking them.
|
#
160135 |
|
06-Jul-2006 |
rwatson |
Remove now unneeded opt_mac.h and mac.h includes.
|
#
159706 |
|
17-Jun-2006 |
rwatson |
Remove sbinsertoob(), sbinsertoob_locked(). They violate (and have basically always violated) invariannts of soreceive(), which assume that the first mbuf pointer in a receive socket buffer can't change while the SB_LOCK sleepable lock is held on the socket buffer, which is precisely what these functions do. No current protocols invoke these functions, and removing them will help discourage them from ever being used. I should have removed them years ago, but lost track of it.
MFC after: 1 week Prodded almost by accident by: peter
|
#
159481 |
|
10-Jun-2006 |
rwatson |
Move some functions and definitions from uipc_socket2.c to uipc_socket.c:
- Move sonewconn(), which creates new sockets for incoming connections on listen sockets, so that all socket allocate code is together in uipc_socket.c.
- Move 'maxsockets' and associated sysctls to uipc_socket.c with the socket allocation code.
- Move kern.ipc sysctl node to uipc_socket.c, add a SYSCTL_DECL() for it to sysctl.h and remove lots of scattered implementations in various IPC modules.
- Sort sodealloc() after soalloc() in uipc_socket.c for dependency order reasons. Statisticize soalloc() and sodealloc() as they are now required only in uipc_socket.c, and are internal to the socket implementation.
After this change, socket allocation and deallocation is entirely centralized in one file, and uipc_socket2.c consists entirely of socket buffer manipulation and default protocol switch functions.
MFC after: 1 month
|
#
157927 |
|
21-Apr-2006 |
ps |
Allow for nmbclusters and maxsockets to be increased via sysctl. An eventhandler is used to update all the various zones that depend on these values.
|
#
157370 |
|
01-Apr-2006 |
rwatson |
Chance protocol switch method pru_detach() so that it returns void rather than an error. Detaches do not "fail", they other occur or the protocol flags SS_PROTOREF to take ownership of the socket.
soclose() no longer looks at so_pcb to see if it's NULL, relying entirely on the protocol to decide whether it's time to free the socket or not using SS_PROTOREF. so_pcb is now entirely owned and managed by the protocol code. Likewise, no longer test so_pcb in other socket functions, such as soreceive(), which have no business digging into protocol internals.
Protocol detach routines no longer try to free the socket on detach, this is performed in the socket code if the protocol permits it.
In rts_detach(), no longer test for rp != NULL in detach, and likewise in other protocols that don't permit a NULL so_pcb, reduce the incidence of testing for it during detach.
netinet and netinet6 are not fully updated to this change, which will be in an upcoming commit. In their current state they may leak memory or panic.
MFC after: 3 months
|
#
157366 |
|
01-Apr-2006 |
rwatson |
Change protocol switch pru_abort() API so that it returns void rather than an int, as an error here is not meaningful. Modify soabort() to unconditionally free the socket on the return of pru_abort(), and modify most protocols to no longer conditionally free the socket, since the caller will do this.
This commit likely leaves parts of netinet and netinet6 in a situation where they may panic or leak memory, as they have not are not fully updated by this commit. This will be corrected shortly in followup commits to these components.
MFC after: 3 months
|
#
157158 |
|
26-Mar-2006 |
rwatson |
Add a sysctl, regression.sonewconn_earlytest, which when options REGRESSION is enabled, allows user space to dictate that sonewconn() should skip it's "skip the hard work" check to see if the listen queue is full, and instead proceed with allocation of a socket and trimming of the overflowed queue. This makes it easier to test the queue overflow logic.
MFC after: 1 month
|
#
156763 |
|
16-Mar-2006 |
rwatson |
Change soabort() from returning int to returning void, since all consumers ignore the return value, soabort() is required to succeed, and protocols produce errors here to report multiple freeing of the pcb, which we hope to eliminate.
|
#
152672 |
|
22-Nov-2005 |
jdp |
Fix a bug in the loop in sonewconn that makes room on the incomplete connection queue for a new connection. It was removing connections from the wrong list.
Submitted by: Paul Mikesell Sponsored by: Isilon Systems MFC after: 1 week
|
#
151967 |
|
02-Nov-2005 |
andre |
Retire MT_HEADER mbuf type and change its users to use MT_DATA.
Having an additional MT_HEADER mbuf type is superfluous and redundant as nothing depends on it. It only adds a layer of confusion. The distinction between header mbuf's and data mbuf's is solely done through the m->m_flags M_PKTHDR flag.
Non-native code is not changed in this commit. For compatibility MT_HEADER is mapped to MT_DATA.
Sponsored by: TCP/IP Optimization Fundraise 2005
|
#
151888 |
|
30-Oct-2005 |
rwatson |
Push the assignment of a new or updated so_qlimit from solisten() following the protocol pru_listen() call to solisten_proto(), so that it occurs under the socket lock acquisition that also sets SO_ACCEPTCONN. This requires passing the new backlog parameter to the protocol, which also allows the protocol to be aware of changes in queue limit should it wish to do something about the new queue limit. This continues a move towards the socket layer acting as a library for the protocol.
Bump __FreeBSD_version due to a change in the in-kernel protocol interface. This change has been tested with IPv4 and UNIX domain sockets, but not other protocols.
|
#
150280 |
|
18-Sep-2005 |
rwatson |
Re-comment sbcompress() to explain what it is it does; it took me quite a bit of reading to figure it out, and I want to avoid figuring it out again.
Convert an if (foo) else printf("this is almost a panic") into a KASSERT.
MFC after: 3 days
|
#
147730 |
|
01-Jul-2005 |
ssouhlal |
Fix the recent panics/LORs/hangs created by my kqueue commit by:
- Introducing the possibility of using locks different than mutexes for the knlist locking. In order to do this, we add three arguments to knlist_init() to specify the functions to use to lock, unlock and check if the lock is owned. If these arguments are NULL, we assume mtx_lock, mtx_unlock and mtx_owned, respectively.
- Using the vnode lock for the knlist locking, when doing kqueue operations on a vnode. This way, we don't have to lock the vnode while holding a mutex, in filt_vfsread.
Reviewed by: jmg Approved by: re (scottl), scottl (mentor override) Pointyhat to: ssouhlal Will be happy: everyone
|
#
146688 |
|
27-May-2005 |
rwatson |
In the current world order, each socket has two mutexes: a mutex that protects socket and receive socket buffer state, and a second mutex to protect send socket buffer state. In some places, the mutex shared between the socket and receive socket buffer will be acquired twice, once by each layer, resulting in some inconsistency, but providing the abstraction benefit of being able to more easily separate the two mutexes in the future if desired.
When transitioning a socket to the SS_ISDISCONNECTING or SS_ISDISCONNECTED states, grab the socket/receive socket buffer lock once rather than grabbing it as the socket lock, modifying socket state, then grabbing a second time as the receive lock in order to modify the socket buffer state to indicate no further data can be read. This change is believed to close a race between the change in socket state and the change in socket buffer state, which for a remotely initiated close on a UNIX domain socket, resulted in soreceive() returning ENOTCONN rather than an EOF condition.
A similar race still exists in the case of send, however, and is harder to fix as the socket and send socket buffer mutexes are not the same, and we would like to avoid holding combinations of socket mutexes over sb_upcall until we've finished clarifying the locking protocol for upcalls.
This change has the side affect of reducing the number of mutex operations to initiate disconnect or perform disconnect on a socket by two.
PR: 78824 Rerported by: Marc Olzheim <marcolz@stack.nl> MFC after: 2 weeks
|
#
143465 |
|
12-Mar-2005 |
rwatson |
Extend the coverage of the accept and socket mutexes in soisconnected() so that the socket lock is held over the test-and-set removal of the accept filter option during connect, and the two socket mutex regions (transition to connected, perform accept filter) are combined.
|
#
143245 |
|
07-Mar-2005 |
rwatson |
When upcalling from a socket in soisconnected() for an accept filter, call with flag M_DONTWAIT rather than M_TRYWAIT, as we don't want to do blocking memory allocation (etc) in the netisr.
MFC after: 3 days
|
#
142128 |
|
20-Feb-2005 |
rwatson |
Prefer NULL to returning 0 cast to a pointer type.
MFC after: 3 days
|
#
142015 |
|
17-Feb-2005 |
rwatson |
In sonewconn(), set the new socket's state to show the protocol-provided connection status before inserting the new socket into the listen socket's accept queue, or there might be a race in which another thread wakes up when the accept lock is released, and sees the socket before its state is set correctly. The wakeup still occurs after the accept lock is released. There have been no diagnoses of this bug in real-world systems (as yet).
MFC after: 3 days
|
#
139804 |
|
06-Jan-2005 |
imp |
/* -> /*- for copyright notices, minor format tweaks as necessary
|
#
139217 |
|
23-Dec-2004 |
rwatson |
In sonewconn(), the s/if/while/ change to wait for room at the tail of the accept queue is a feature, not a bug/issue, so remove the XXXRW from the comment.
|
#
136987 |
|
27-Oct-2004 |
maxim |
Fix a typo in a comparison appeared in rev. 1.125.
Submitted by: JINMEI Tatuya
|
#
136692 |
|
19-Oct-2004 |
andre |
Support for dynamically loadable and unloadable protocols within existing protocol families.
The protosw[] array of any particular protocol family ("domain") is of fixed size defined at compile time. This made it impossible to dynamically add or remove any protocols to or from it. We work around this by introducing so called SPACER's which are embedded into the protosw[] array at compile time. The SPACER's have a special protocol number (32767) to indicate the fact that they are SPACER's but are otherwise NULL. Only as many protocols can be dynamically loaded as SPACER's are provided in the protosw[] structure.
The pr_usrreqs structure is treated more special and contains pointers to dummy functions only returning EOPNOTSUPP. This is needed because the use of those functions pointers is usually not checked within the kernel because until now it was assumed to be a valid function pointer. Instead of fixing all potential callers we just return a proper error code.
Two new functions provide a clean API to register and unregister a protocol. The register function expects a pointer to a valid and complete struct protosw including a pointer to struct pru_usrreqs provided by the caller. Upon successful registration the pr_init() function will be called to finish initialization of the protocol. The unregister function restores the SPACER in place of the protocol again. It is the responseability of the caller to ensure proper closing of all sockets and freeing of memory allocation by the unloading protocol.
sys/protosw.h
o Define generic PROTO_SPACER to be 32767 o Prototypes for all pru_*_notsupp() functions o Prototypes for pf_proto_[un]register() functions
kern/uipc_domain.c
o Global struct pr_usrreqs nousrreqs containing valid pointers to the pru_*_notsupp() functions o New functions pf_proto_[un]register()
kern/uipc_socket2.c
o New functions bodies for all pru_*_notsupp() functions
|
#
133741 |
|
15-Aug-2004 |
jmg |
Add locking to the kqueue subsystem. This also makes the kqueue subsystem a more complete subsystem, and removes the knowlege of how things are implemented from the drivers. Include locking around filter ops, so a module like aio will know when not to be unloaded if there are outstanding knotes using it's filter ops.
Currently, it uses the MTX_DUPOK even though it is not always safe to aquire duplicate locks. Witness currently doesn't support the ability to discover if a dup lock is ok (in some cases).
Reviewed by: green, rwatson (both earlier versions)
|
#
131151 |
|
26-Jun-2004 |
rwatson |
Reduce the number of unnecessary unlock-relocks on socket buffer mutexes associated with performing a wakeup on the socket buffer:
- When performing an sbappend*() followed by a so[rw]wakeup(), explicitly acquire the socket buffer lock and use the _locked() variants of both calls. Note that the _locked() sowakeup() versions unlock the mutex on return. This is done in uipc_send(), divert_packet(), mroute socket_send(), raw_append(), tcp_reass(), tcp_input(), and udp_append().
- When the socket buffer lock is dropped before a sowakeup(), remove the explicit unlock and use the _locked() sowakeup() variant. This is done in soisdisconnecting(), soisdisconnected() when setting the can't send/ receive flags and dropping data, and in uipc_rcvd() which adjusting back-pressure on the sockets.
For UNIX domain sockets running mpsafe with a contention-intensive SMP mysql benchmark, this results in a 1.6% query rate improvement due to reduce mutex costs.
|
#
131006 |
|
24-Jun-2004 |
rwatson |
Introduce sbreserve_locked(), which asserts the socket buffer lock on the socket buffer having its limits adjusted. sbreserve() now acquires the lock before calling sbreserve_locked(). In soreserve(), acquire socket buffer locks across read-modify-writes of socket buffer fields, and calls into sbreserve/sbrelease; make sure to acquire in keeping with the socket buffer lock order. In tcp_mss(), acquire the socket buffer lock in the calling context so that we have atomic read-modify -write on buffer sizes.
|
#
130831 |
|
21-Jun-2004 |
rwatson |
Merge next step in socket buffer locking:
- sowakeup() now asserts the socket buffer lock on entry. Move the call to KNOTE higher in sowakeup() so that it is made with the socket buffer lock held for consistency with other calls. Release the socket buffer lock prior to calling into pgsigio(), so_upcall(), or aio_swake(). Locking for this event management will need revisiting in the future, but this model avoids lock order reversals when upcalls into other subsystems result in socket/socket buffer operations. Assert that the socket buffer lock is not held at the end of the function.
- Wrapper macros for sowakeup(), sorwakeup() and sowwakeup(), now have _locked versions which assert the socket buffer lock on entry. If a wakeup is required by sb_notify(), invoke sowakeup(); otherwise, unconditionally release the socket buffer lock. This results in the socket buffer lock being released whether a wakeup is required or not.
- Break out socantsendmore() into socantsendmore_locked() that asserts the socket buffer lock. socantsendmore() unconditionally locks the socket buffer before calling socantsendmore_locked(). Note that both functions return with the socket buffer unlocked as socantsendmore_locked() calls sowwakeup_locked() which has the same properties. Assert that the socket buffer is unlocked on return.
- Break out socantrcvmore() into socantrcvmore_locked() that asserts the socket buffer lock. socantrcvmore() unconditionally locks the socket buffer before calling socantrcvmore_locked(). Note that both functions return with the socket buffer unlocked as socantrcvmore_locked() calls sorwakeup_locked() which has similar properties. Assert that the socket buffer is unlocked on return.
- Break out sbrelease() into a sbrelease_locked() that asserts the socket buffer lock. sbrelease() unconditionally locks the socket buffer before calling sbrelease_locked(). sbrelease_locked() now invokes sbflush_locked() instead of sbflush().
- Assert the socket buffer lock in socket buffer sanity check functions sblastrecordchk(), sblastmbufchk().
- Assert the socket buffer lock in SBLINKRECORD().
- Break out various sbappend() functions into sbappend_locked() (and variations on that name) that assert the socket buffer lock. The !_locked() variations unconditionally lock the socket buffer before calling their _locked counterparts. Internally, make sure to call _locked() support routines, etc, if already holding the socket buffer lock.
- Break out sbinsertoob() into sbinsertoob_locked() that asserts the socket buffer lock. sbinsertoob() unconditionally locks the socket buffer before calling sbinsertoob_locked().
- Break out sbflush() into sbflush_locked() that asserts the socket buffer lock. sbflush() unconditionally locks the socket buffer before calling sbflush_locked(). Update panic strings for new function names.
- Break out sbdrop() into sbdrop_locked() that asserts the socket buffer lock. sbdrop() unconditionally locks the socket buffer before calling sbdrop_locked().
- Break out sbdroprecord() into sbdroprecord_locked() that asserts the socket buffer lock. sbdroprecord() unconditionally locks the socket buffer before calling sbdroprecord_locked().
- sofree() now calls socantsendmore_locked() and re-acquires the socket buffer lock on return. It also now calls sbrelease_locked().
- sorflush() now calls socantrcvmore_locked() and re-acquires the socket buffer lock on return. Clean up/mess up other behavior in sorflush() relating to the temporary stack copy of the socket buffer used with dom_dispose by more properly initializing the temporary copy, and selectively bzeroing/copying more carefully to prevent WITNESS from getting confused by improperly initialized mutexes. Annotate why that's necessary, or at least, needed.
- soisconnected() now calls sbdrop_locked() before unlocking the socket buffer to avoid locking overhead.
Some parts of this change were:
Submitted by: sam Sponsored by: FreeBSD Foundation Obtained from: BSD/OS
|
#
130705 |
|
19-Jun-2004 |
rwatson |
Assert socket buffer lock in sb_lock() to protect socket buffer sleep lock state. Convert tsleep() into msleep() with socket buffer mutex as argument. Hold socket buffer lock over sbunlock() to protect sleep lock state.
Assert socket buffer lock in sbwait() to protect the socket buffer wait state. Convert tsleep() into msleep() with socket buffer mutex as argument.
Modify sofree(), sosend(), and soreceive() to acquire SOCKBUF_LOCK() in order to call into these functions with the lock, as well as to start protecting other socket buffer use in their implementation. Drop the socket buffer mutexes around calls into the protocol layer, around potentially blocking operations, for copying to/from user space, and VM operations relating to zero-copy. Assert the socket buffer mutex strategically after code sections or at the beginning of loops. In some cases, modify return code to ensure locks are properly dropped.
Convert the potentially blocking allocation of storage for the remote address in soreceive() into a non-blocking allocation; we may wish to move the allocation earlier so that it can block prior to acquisition of the socket buffer lock.
Drop some spl use.
NOTE: Some races exist in the current structuring of sosend() and soreceive(). This commit only merges basic socket locking in this code; follow-up commits will close additional races. As merged, these changes are not sufficient to run without Giant safely.
Reviewed by: juli, tjr
|
#
130653 |
|
17-Jun-2004 |
rwatson |
Merge additional socket buffer locking from rwatson_netperf:
- Lock down low hanging fruit use of sb_flags with socket buffer lock.
- Lock down low hanging fruit use of so_state with socket lock.
- Lock down low hanging fruit use of so_options.
- Lock down low-hanging fruit use of sb_lowwat and sb_hiwat with socket buffer lock.
- Annotate situations in which we unlock the socket lock and then grab the receive socket buffer lock, which are currently actually the same lock. Depending on how we want to play our cards, we may want to coallesce these lock uses to reduce overhead.
- Convert a if()->panic() into a KASSERT relating to so_state in soaccept().
- Remove a number of splnet()/splx() references.
More complex merging of socket and socket buffer locking to follow.
|
#
130513 |
|
15-Jun-2004 |
rwatson |
Grab the socket buffer send or receive mutex when performing a read-modify-write on the sb_state field. This commit catches only the "easy" ones where it doesn't interact with as yet unmerged locking.
|
#
130480 |
|
14-Jun-2004 |
rwatson |
The socket field so_state is used to hold a variety of socket related flags relating to several aspects of socket functionality. This change breaks out several bits relating to send and receive operation into a new per-socket buffer field, sb_state, in order to facilitate locking. This is required because, in order to provide more granular locking of sockets, different state fields have different locking properties. The following fields are moved to sb_state:
SS_CANTRCVMORE (so_state) SS_CANTSENDMORE (so_state) SS_RCVATMARK (so_state)
Rename respectively to:
SBS_CANTRCVMORE (so_rcv.sb_state) SBS_CANTSENDMORE (so_snd.sb_state) SBS_RCVATMARK (so_rcv.sb_state)
This facilitates locking by isolating fields to be located with other identically locked fields, and permits greater granularity in socket locking by avoiding storing fields with different locking semantics in the same short (avoiding locking conflicts). In the future, we may wish to coallesce sb_state and sb_flags; for the time being I leave them separate and there is no additional memory overhead due to the packing/alignment of shorts in the socket buffer structure.
|
#
130398 |
|
13-Jun-2004 |
rwatson |
Socket MAC labels so_label and so_peerlabel are now protected by SOCK_LOCK(so):
- Hold socket lock over calls to MAC entry points reading or manipulating socket labels.
- Assert socket lock in MAC entry point implementations.
- When externalizing the socket label, first make a thread-local copy while holding the socket lock, then release the socket lock to externalize to userspace.
|
#
130050 |
|
04-Jun-2004 |
rwatson |
Mark sun_noname as const since it's immutable. Update definitions of functions that potentially accept &sun_noname (sbappendaddr(), et al) to accept a const sockaddr pointer.
|
#
129979 |
|
02-Jun-2004 |
rwatson |
Integrate accept locking from rwatson_netperf, introducing a new global mutex, accept_mtx, which serializes access to the following fields across all sockets:
so_qlen so_incqlen so_qstate so_comp so_incomp so_list so_head
While providing only coarse granularity, this approach avoids lock order issues between sockets by avoiding ownership of the fields by a specific socket and its per-socket mutexes.
While here, rewrite soclose(), sofree(), soaccept(), and sonewconn() to add assertions, close additional races and address lock order concerns. In particular:
- Reorganize the optimistic concurrency behavior in accept1() to always allocate a file descriptor with falloc() so that if we do find a socket, we don't have to encounter the "Oh, there wasn't a socket" race that can occur if falloc() sleeps in the current code, which broke inbound accept() ordering, not to mention requiring backing out socket state changes in a way that raced with the protocol level. We may want to add a lockless read of the queue state if polling of empty queues proves to be important to optimize.
- In accept1(), soref() the socket while holding the accept lock so that the socket cannot be free'd in a race with the protocol layer. Likewise in netgraph equivilents of the accept1() code.
- In sonewconn(), loop waiting for the queue to be small enough to insert our new socket once we've committed to inserting it, or races can occur that cause the incomplete socket queue to overfill. In the previously implementation, it was sufficient to simply tested once since calling soabort() didn't release synchronization permitting another thread to insert a socket as we discard a previous one.
- In soclose()/sofree()/et al, it is the responsibility of the caller to remove a socket from the incomplete connection queue before calling soabort(), which prevents soabort() from having to walk into the accept socket to release the socket from its queue, and avoids races when releasing the accept mutex to enter soabort(), permitting soabort() to avoid lock ordering issues with the caller.
- Generally cluster accept queue related operations together throughout these functions in order to facilitate locking.
Annotate new locking in socketvar.h.
|
#
129916 |
|
01-Jun-2004 |
rwatson |
The SS_COMP and SS_INCOMP flags in the so_state field indicate whether the socket is on an accept queue of a listen socket. This change renames the flags to SQ_COMP and SQ_INCOMP, and moves them to a new state field on the socket, so_qstate, as the locking for these flags is substantially different for the locking on the remainder of the flags in so_state.
|
#
129906 |
|
31-May-2004 |
bmilekic |
Bring in mbuma to replace mballoc.
mbuma is an Mbuf & Cluster allocator built on top of a number of extensions to the UMA framework, all included herein.
Extensions to UMA worth noting: - Better layering between slab <-> zone caches; introduce Keg structure which splits off slab cache away from the zone structure and allows multiple zones to be stacked on top of a single Keg (single type of slab cache); perhaps we should look into defining a subset API on top of the Keg for special use by malloc(9), for example. - UMA_ZONE_REFCNT zones can now be added, and reference counters automagically allocated for them within the end of the associated slab structures. uma_find_refcnt() does a kextract to fetch the slab struct reference from the underlying page, and lookup the corresponding refcnt.
mbuma things worth noting: - integrates mbuf & cluster allocations with extended UMA and provides caches for commonly-allocated items; defines several zones (two primary, one secondary) and two kegs. - change up certain code paths that always used to do: m_get() + m_clget() to instead just use m_getcl() and try to take advantage of the newly defined secondary Packet zone. - netstat(1) and systat(1) quickly hacked up to do basic stat reporting but additional stats work needs to be done once some other details within UMA have been taken care of and it becomes clearer to how stats will work within the modified framework.
From the user perspective, one implication is that the NMBCLUSTERS compile-time option is no longer used. The maximum number of clusters is still capped off according to maxusers, but it can be made unlimited by setting the kern.ipc.nmbclusters boot-time tunable to zero. Work should be done to write an appropriate sysctl handler allowing dynamic tuning of kern.ipc.nmbclusters at runtime.
Additional things worth noting/known issues (READ): - One report of 'ips' (ServeRAID) driver acting really slow in conjunction with mbuma. Need more data. Latest report is that ips is equally sucking with and without mbuma. - Giant leak in NFS code sometimes occurs, can't reproduce but currently analyzing; brueffer is able to reproduce but THIS IS NOT an mbuma-specific problem and currently occurs even WITHOUT mbuma. - Issues in network locking: there is at least one code path in the rip code where one or more locks are acquired and we end up in m_prepend() with M_WAITOK, which causes WITNESS to whine from within UMA. Current temporary solution: force all UMA allocations to be M_NOWAIT from within UMA for now to avoid deadlocks unless WITNESS is defined and we can determine with certainty that we're not holding any locks when we're M_WAITOK. - I've seen at least one weird socketbuffer empty-but- mbuf-still-attached panic. I don't believe this to be related to mbuma but please keep your eyes open, turn on debugging, and capture crash dumps.
This change removes more code than it adds.
A paper is available detailing the change and considering various performance issues, it was presented at BSDCan2004: http://www.unixdaemons.com/~bmilekic/netbuf_bmilekic.pdf Please read the paper for Future Work and implementation details, as well as credits.
Testing and Debugging: rwatson, brueffer, Ketrien I. Saihr-Kesenchedra, ... Reviewed by: Lots of people (for different parts)
|
#
129410 |
|
19-May-2004 |
ps |
syncache broke rev 1.23 which was done to fix the "thundering herd" problem in Apache. Fix it.
Reviewed by: peter
|
#
127911 |
|
05-Apr-2004 |
imp |
Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999.
Approved by: core
|
#
127299 |
|
22-Mar-2004 |
ps |
Remove some netbsd debug code that crept into rev 1.116
|
#
126425 |
|
01-Mar-2004 |
rwatson |
Rename dup_sockaddr() to sodupsockaddr() for consistency with other functions in kern_socket.c.
Rename the "canwait" field to "mflags" and pass M_WAITOK and M_NOWAIT in from the caller context rather than "1" or "0".
Correct mflags pass into mac_init_socket() from previous commit to not include M_ZERO.
Submitted by: sam
|
#
126411 |
|
29-Feb-2004 |
rwatson |
Modify soalloc() API so that it accepts a malloc flags argument rather than a "waitok" argument. Callers now passing M_WAITOK or M_NOWAIT rather than 0 or 1. This simplifies the soalloc() logic, and also makes the waiting behavior of soalloc() more clear in the calling context.
Submitted by: sam
|
#
125454 |
|
04-Feb-2004 |
jhb |
Locking for the per-process resource limits structure. - struct plimit includes a mutex to protect a reference count. The plimit structure is treated similarly to struct ucred in that is is always copy on write, so having a reference to a structure is sufficient to read from it without needing a further lock. - The proc lock protects the p_limit pointer and must be held while reading limits from a process to keep the limit structure from changing out from under you while reading from it. - Various global limits that are ints are not protected by a lock since int writes are atomic on all the archs we support and thus a lock wouldn't buy us anything. - All accesses to individual resource limits from a process are abstracted behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return either an rlimit, or the current or max individual limit of the specified resource from a process. - dosetrlimit() was renamed to kern_setrlimit() to match existing style of other similar syscall helper functions. - The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit() (it didn't used the stackgap when it should have) but uses lim_rlimit() and kern_setrlimit() instead. - The svr4 compat no longer uses the stackgap for resource limits calls, but uses lim_rlimit() and kern_setrlimit() instead. - The ibcs2 compat no longer uses the stackgap for resource limits. It also no longer uses the stackgap for accessing sysctl's for the ibcs2_sysconf() syscall but uses kernel_sysctl() instead. As a result, ibcs2_sysconf() no longer needs Giant. - The p_rlimit macro no longer exists.
Submitted by: mtm (mostly, I only did a few cleanups and catchups) Tested on: i386 Compiled on: alpha, amd64
|
#
122875 |
|
18-Nov-2003 |
rwatson |
Introduce a MAC label reference in 'struct inpcb', which caches the MAC label referenced from 'struct socket' in the IPv4 and IPv6-based protocols. This permits MAC labels to be checked during network delivery operations without dereferencing inp->inp_socket to get to so->so_label, which will eventually avoid our having to grab the socket lock during delivery at the network layer.
This change introduces 'struct inpcb' as a labeled object to the MAC Framework, along with the normal circus of entry points: initialization, creation from socket, destruction, as well as a delivery access control check.
For most policies, the inpcb label will simply be a cache of the socket label, so a new protocol switch method is introduced, pr_sosetlabel() to notify protocols that the socket layer label has been updated so that the cache can be updated while holding appropriate locks. Most protocols implement this using pru_sosetlabel_null(), but IPv4/IPv6 protocols using inpcbs use the the worker function in_pcbsosetlabel(), which calls into the MAC Framework to perform a cache update.
Biba, LOMAC, and MLS implement these entry points, as do the stub policy, and test policy.
Reviewed by: sam, bms Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
|
#
122352 |
|
09-Nov-2003 |
tanimura |
- Implement selwakeuppri() which allows raising the priority of a thread being waken up. The thread waken up can run at a priority as high as after tsleep().
- Replace selwakeup()s with selwakeuppri()s and pass appropriate priorities.
- Add cv_broadcastpri() which raises the priority of the broadcast threads. Used by selwakeuppri() if collision occurs.
Not objected in: -arch, -current
|
#
121628 |
|
28-Oct-2003 |
sam |
speedup stream socket recv handling by tracking the tail of the mbuf chain instead of walking the list for each append
Submitted by: ps/jayanth Obtained from: netbsd (jason thorpe)
|
#
121307 |
|
21-Oct-2003 |
silby |
Change all SYSCTLS which are readonly and have a related TUNABLE from CTLFLAG_RD to CTLFLAG_RDTUN so that sysctl(8) can provide more useful error messages.
|
#
118045 |
|
26-Jul-2003 |
scottl |
Guard against MLEN growing larger than a uint8_t due to MSIZE grwoing to a value of 512 in LINT. This keeps gcc from complaining.
|
#
116182 |
|
11-Jun-2003 |
obrien |
Use __FBSDID().
|
#
114293 |
|
30-Apr-2003 |
markm |
Fix some easy, global, lint warnings. In most cases, this means making some local variables static. In a couple of cases, this means removing an unused variable.
|
#
111227 |
|
21-Feb-2003 |
peter |
Missing M_TRYWAIT from so_upcall third argument.
|
#
111119 |
|
19-Feb-2003 |
imp |
Back out M_* changes, per decision of the TRB.
Approved by: trb
|
#
110268 |
|
03-Feb-2003 |
harti |
Make the variable types, the sysctl macros and the sysctl handler for kern.ipc.{maxsockbuf,sockbuf_waste_factor} to agree that those variables are of type unsigned long.
PR: sparc64/47389 Approved by: jake (mentor)
|
#
109623 |
|
21-Jan-2003 |
alfred |
Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0. Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
|
#
109098 |
|
11-Jan-2003 |
tjr |
Don't count mbufs with m_type == MT_HEADER or MT_OOBDATA as control data in sballoc(), sbcompress(), sbdrop() and sbfree(). Fixes fstat() st_size reporting and kevent() EVFILT_READ on TCP sockets.
|
#
106473 |
|
05-Nov-2002 |
kbyanc |
Spotted a couple of places where the socket buffer's counters were being manipulated directly (rather than using sballoc()/sbfree()); update them to tweak the new sb_ctl field too.
Sponsored by: NTT Multimedia Communications Labs
|
#
106326 |
|
02-Nov-2002 |
alc |
Revert the change in revision 1.77 of kern/uipc_socket2.c. It is causing a panic because the socket's state isn't as expected by sofree().
Discussed with: dillon, fenner
|
#
103554 |
|
18-Sep-2002 |
phk |
Use m_length() instead of home-rolled versions.
|
#
101996 |
|
16-Aug-2002 |
dg |
Further improved the performance of sbreserve() by moving the calculation of the adjusted sb_max into a sysctl handler for sb_max and assigning it to a variable that is used instead. This eliminates the 32bit multiply and divide from the fast path that was being done previously.
|
#
101966 |
|
16-Aug-2002 |
dg |
Rewrote the space check algorithm in sbreserve() so that the extremely expensive (!) 64bit multiply, divide, and comparison aren't necessary (this came in originally from rev 1.19 to fix an overflow with large sb_max or MCLBYTES). The 64bit math in this function was measured in some kernel profiles as being as much as 5-8% of the total overhead of the TCP/IP stack and is eliminated with this commit. There is a harmless rounding error (of about .4% with the standard values) introduced with this change, however this is in the conservative direction (downward toward a slightly smaller maximum socket buffer size).
MFC after: 3 days
|
#
101173 |
|
01-Aug-2002 |
rwatson |
Include file cleanup; mac.h and malloc.h at one point had ordering relationship requirements, and no longer do.
Reminded by: bde
|
#
101013 |
|
31-Jul-2002 |
rwatson |
Introduce support for Mandatory Access Control and extensible kernel access control.
Invoke the necessary MAC entry points to maintain labels on sockets. In particular, invoke entry points during socket allocation and destruction, as well as creation by a process or during an accept-scenario (sonewconn). For UNIX domain sockets, also assign a peer label. As the socket code isn't locked down yet, locking interactions are not yet clear. Various protocol stack socket operations (such as peer label assignment for IPv4) will follow.
Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
#
100775 |
|
27-Jul-2002 |
dwmalone |
If a socket is disconnected for some reason (like a TCP connection not responding) then drop any data on the outgoing queue in soisdisconnected because there is no way to get it to its destination any longer.
The only objection to this patch I got on -net was from Terry, who wasn't sure that the condition in question could arise, so I provided some example code.
|
#
100712 |
|
26-Jul-2002 |
robert |
Fix -Werror build for sparc64: Use the appropriate conversion specifier for an 'unsigned int' argument.
|
#
98998 |
|
29-Jun-2002 |
alfred |
More caddr_t removal. Change struct knote's kn_hook from caddr_t to void *.
|
#
98385 |
|
18-Jun-2002 |
tanimura |
Remove so*_locked(), which were backed out by mistake.
|
#
97658 |
|
31-May-2002 |
tanimura |
Back out my lats commit of locking down a socket, it conflicts with hsu's work.
Requested by: hsu
|
#
97004 |
|
20-May-2002 |
silby |
Subtle fix to the accept filter LRU code. In some cases, a newly initialized socket with no qlimit was being passed in. In order to handle this case properly, we must not use >= when comparing queue sizes to qlimit. As a result of this improper handling, a panic could result in certain cases.
PR: 38325 MFC after: 3 days
|
#
96972 |
|
20-May-2002 |
tanimura |
Lock down a socket, milestone 1.
o Add a mutex (sb_mtx) to struct sockbuf. This protects the data in a socket buffer. The mutex in the receive buffer also protects the data in struct socket.
o Determine the lock strategy for each members in struct socket.
o Lock down the following members:
- so_count - so_options - so_linger - so_state
o Remove *_locked() socket APIs. Make the following socket APIs touching the members above now require a locked socket:
- sodisconnect() - soisconnected() - soisconnecting() - soisdisconnected() - soisdisconnecting() - sofree() - soref() - sorele() - sorwakeup() - sotryfree() - sowakeup() - sowwakeup()
Reviewed by: alfred
|
#
96165 |
|
07-May-2002 |
tanimura |
Do not forget to increase the number of completely connected sockets in soisconnected_locked().
Forgotten by: tanimura
|
#
95883 |
|
01-May-2002 |
alfred |
Redo the sigio locking.
Turn the sigio sx into a mutex.
Sigio lock is really only needed to protect interrupts from dereferencing the sigio pointer in an object when the sigio itself is being destroyed.
In order to do this in the most unintrusive manner change pgsigio's sigio * argument into a **, that way we can lock internally to the function.
|
#
95759 |
|
30-Apr-2002 |
tanimura |
Revert the change of #includes in sys/filedesc.h and sys/socketvar.h.
Requested by: bde
Since locking sigio_lock is usually followed by calling pgsigio(), move the declaration of sigio_lock and the definitions of SIGIO_*() to sys/signalvar.h.
While I am here, sort include files alphabetically, where possible.
|
#
95553 |
|
27-Apr-2002 |
tanimura |
Fix the code fragment clobbered in my last commit.
|
#
95552 |
|
27-Apr-2002 |
tanimura |
Add a global sx sigio_lock to protect the pointer to the sigio object of a socket. This avoids lock order reversal caused by locking a process in pgsigio().
sowakeup() and the callers of it (sowwakeup, soisconnected, etc.) now require sigio_lock to be locked. Provide sowwakeup_locked(), soisconnected_locked(), and so on in case where we have to modify a socket and wake up a process atomically.
|
#
95478 |
|
26-Apr-2002 |
silby |
Make sure that sockets undergoing accept filtering are aborted in a LRU fashion when the listen queue fills up. Previously, there was no mechanism to kick out old sockets, leading to an easy DoS of daemons using accept filtering.
Reviewed by: alfred MFC after: 3 days
|
#
95343 |
|
24-Apr-2002 |
silby |
Remove sodropablereq - this function hasn't been used since the syncache went in.
MFC after: 3 days
|
#
92754 |
|
20-Mar-2002 |
jeff |
Backout part of my previous commit; I was wrong about vm_zone's handling of limits on zones w/o objects.
|
#
92751 |
|
20-Mar-2002 |
jeff |
Remove references to vm_zone.h and switch over to the new uma API.
Also, remove maxsockets. If you look carefully you'll notice that the old zone allocator never honored this anyway.
|
#
90227 |
|
05-Feb-2002 |
dillon |
Get rid of the twisted MFREE() macro entirely.
Reviewed by: dg, bmilekic MFC after: 3 days
|
#
89109 |
|
09-Jan-2002 |
silby |
Revert 1.81; 1.19 fixed this already in a different way.
|
#
88949 |
|
06-Jan-2002 |
silby |
Reorder a calculation in sbreserve so that it does not overflow with multi-megabyte socket buffer sizes.
PR: 7420 MFC after: 3 weeks
|
#
88633 |
|
29-Dec-2001 |
alfred |
Make AIO a loadable module.
Remove the explicit call to aio_proc_rundown() from exit1(), instead AIO will use at_exit(9).
Add functions at_exec(9), rm_at_exec(9) which function nearly the same as at_exec(9) and rm_at_exec(9), these functions are called on behalf of modules at the time of execve(2) after the image activator has run.
Use a modified version of tegge's suggestion via at_exec(9) to close an exploitable race in AIO.
Fix SYSCALL_MODULE_HELPER such that it's archetecuterally neutral, the problem was that one had to pass it a paramater indicating the number of arguments which were actually the number of "int". Fix it by using an inline version of the AS macro against the syscall arguments. (AS should be available globally but we'll get to that later.)
Add a primative system for dynamically adding kqueue ops, it's really not as sophisticated as it should be, but I'll discuss with jlemon when he's around.
|
#
88329 |
|
21-Dec-2001 |
peter |
Avoid an interaction between syncache and accept filters. The syncache code only passed up the connection to the tcp stack when it was complete, so it went directly into the so_comp (complete) queue. However, with accept filters, there is an additional phase before calling it "complete".
Reviewed by: jlemon
|
#
87821 |
|
13-Dec-2001 |
rwatson |
o Back out portions of 1.50 and 1.47, eliminating sonewconn3() and always deriving the credential for a newly accepted connection from the listen socket. Previously, the selection of the credential depended on the protocol: UNIX domain sockets would use the connecting process's credential, and protocols supporting a creation of the socket before the receiving end called accept() would use the listening socket. After this change, it is always the listening credential.
Reviewed by: green
|
#
86487 |
|
17-Nov-2001 |
dillon |
Give struct socket structures a ref counting interface similar to vnodes. This will hopefully serve as a base from which we can expand the MP code. We currently do not attempt to obtain any mutex or SX locks, but the door is open to add them when we nail down exactly how that part of it is going to work.
|
#
84827 |
|
11-Oct-2001 |
jhb |
Change the kernel's ucred API as follows: - crhold() returns a reference to the ucred whose refcount it bumps. - crcopy() now simply copies the credentials from one credential to another and has no return value. - a new crshared() primitive is added which returns true if a ucred's refcount is > 1 and false (0) otherwise.
|
#
84468 |
|
04-Oct-2001 |
dwmalone |
Allow sbcreatecontrol to make cluster sized control messages.
|
#
83366 |
|
12-Sep-2001 |
julian |
KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to teh previousl -current except that there is a thread associated with each process.
Sorry john! (your next MFC will be a doosie!)
Reviewed by: peter@freebsd.org, dillon@freebsd.org
X-MFC after: ha ha ha ha
|
#
78945 |
|
29-Jun-2001 |
jlemon |
Fix up indentation.
|
#
77900 |
|
08-Jun-2001 |
peter |
"Fix" the previous initial attempt at fixing TUNABLE_INT(). This time around, use a common function for looking up and extracting the tunables from the kernel environment. This saves duplicating the same function over and over again. This way typically has an overhead of 8 bytes + the path string, versus about 26 bytes + the path string.
|
#
77853 |
|
07-Jun-2001 |
peter |
Back out part of my previous commit. This was a last minute change and I botched testing. This is a perfect example of how NOT to do this sort of thing. :-(
|
#
77843 |
|
06-Jun-2001 |
peter |
Make the TUNABLE_*() macros look and behave more consistantly like the SYSCTL_*() macros. TUNABLE_INT_DECL() was an odd name because it didn't actually declare the int, which is what the name suggests it would do.
|
#
77598 |
|
01-Jun-2001 |
jesper |
Revert the last bits of my bogus move of NMBCLUSTERS to <sys/param.h>
|
#
77544 |
|
31-May-2001 |
jesper |
Move the definition of NMBCLUSTERS from src/sys/kern/uipc_mbuf.c to <sys/param.h>, so it's available to src/sys/netinet/ip_input.c, and remove the now unneeded includes of "opt_param.h".
MFC after: 1 week
|
#
76166 |
|
01-May-2001 |
markm |
Undo part of the tangle of having sys/lock.h and sys/mutex.h included in other "system" header files.
Also help the deprecation of lockmgr.h by making it a sub-include of sys/lock.h and removing sys/lockmgr.h form kernel .c files.
Sort sys/*.h includes where possible in affected files.
OK'ed by: bde (with reservations)
|
#
68918 |
|
19-Nov-2000 |
dwmalone |
Make sbcompress use the new M_WRITABLE macro. Previously sbcompress could not compress into clusters. This could result in lots of wasted clusters while recieving small packets from an interface that uses clusters for all it's packets.
Patch is partially from BSDi (limiting the size of the copy) and based on a patch for 4.1 by Ian Dowse <iedowse@maths.tcd.ie> and myself.
Reviewed by: bmilekic Obtained From: BSDi Submitted by: iedowse
|
#
65495 |
|
05-Sep-2000 |
truckman |
Remove uidinfo hash table lookup and maintenance out of chgproccnt() and chgsbsize(), which are called rather frequently and may be called from an interrupt context in the case of chgsbsize(). Instead, do the hash table lookup and maintenance when credentials are changed, which is a lot less frequent. Add pointers to the uidinfo structures to the ucred and pcred structures for fast access. Pass a pointer to the credential to chgproccnt() and chgsbsize() instead of passing the uid. Add a reference count to the uidinfo structure and use it to decide when to free the structure rather than freeing the structure when the resource consumption drops to zero. Move the resource tracking code from kern_proc.c to kern_resource.c. Move some duplicate code sequences in kern_prot.c to separate helper functions. Change KASSERTs in this code to unconditional tests and calls to panic().
|
#
65278 |
|
31-Aug-2000 |
green |
Fix hangs caused by overzealous code removal.
Thanks, Nickolay, for figuring out this is the problem.
Submitted by: Nickolay Dudorov <nnd@mail.nsk.ru>
|
#
65229 |
|
30-Aug-2000 |
green |
Remove an extraneous setting of sb_hiwat.
|
#
65198 |
|
29-Aug-2000 |
green |
Remove any possibility of hiwat-related race conditions by changing the chgsbsize() call to use a "subject" pointer (&sb.sb_hiwat) and a u_long target to set it to. The whole thing is splnet().
This fixes a problem that jdp has been able to provoke.
|
#
64046 |
|
31-Jul-2000 |
ps |
Remove unnecessary call to splnet when setting an accept filter since we are already at splnet.
|
#
61976 |
|
22-Jun-2000 |
alfred |
fix races in the uidinfo subsystem, several problems existed:
1) while allocating a uidinfo struct malloc is called with M_WAITOK, it's possible that while asleep another process by the same user could have woken up earlier and inserted an entry into the uid hash table. Having redundant entries causes inconsistancies that we can't handle.
fix: do a non-waiting malloc, and if that fails then do a blocking malloc, after waking up check that no one else has inserted an entry for us already.
2) Because many checks for sbsize were done as "test then set" in a non atomic manner it was possible to exceed the limits put up via races.
fix: instead of querying the count then setting, we just attempt to set the count and leave it up to the function to return success or failure.
3) The uidinfo code was inlining and repeating, lookups and insertions and deletions needed to be in their own functions for clarity.
Reviewed by: green
|
#
61837 |
|
20-Jun-2000 |
alfred |
return of the accept filter part II
accept filters are now loadable as well as able to be compiled into the kernel.
two accept filters are provided, one that returns sockets when data arrives the other when an http request is completed (doesn't work with 0.9 requests)
Reviewed by: jmg
|
#
61799 |
|
18-Jun-2000 |
alfred |
backout accept optimizations.
Requested by: jmg, dcs, jdp, nate
|
#
61714 |
|
15-Jun-2000 |
alfred |
add socketoptions DELAYACCEPT and HTTPACCEPT which will not allow an accept() until the incoming connection has either data waiting or what looks like a HTTP request header already in the socketbuffer. This ought to reduce the context switch time and overhead for processing requests.
The initial idea and code for HTTPACCEPT came from Yahoo engineers and has been cleaned up and a more lightweight DELAYACCEPT for non-http servers has been added
Reviewed by: silence on hackers.
|
#
59288 |
|
16-Apr-2000 |
jlemon |
Introduce kqueue() and kevent(), a kernel event notification facility.
|
#
57719 |
|
03-Mar-2000 |
shin |
CMSG_XXX macros alignment fixes to follow RFC2292.
Approved by: jkh
Submitted by: Partly from tech@openbsd Reviewed by: itojun
|
#
57441 |
|
24-Feb-2000 |
shin |
Add length check to sbcreatecontrol().
Now this check is necessary because IPv6 source routing might use control data bigger than MLEN. (e.g. 16bytes IPv6 addr x 23 hops) Actually mbuf cluster should be used in uipc_socket.c:sbcreatecontrol() and uipc_syscalls.c:sockargs() when data size is bigger then MLEN, and such patches were already in KAME environment and have been confirmed to work well. I just forgot to merge them into 4.0, sorry.
For safety, I'll postpone such patches until after 4.0 release. The effect of postponement is followings. -Ping6 source routing hops are limitted to around 6 or so. -If some apps do setsockopt IPV6_RTHDR and try to receive incoming IPv6 source routing info, it can't receive more than 6 hops source routing info. (But currently, no apps seems to be doing it.)
Approved by: jkh
|
#
55943 |
|
14-Jan-2000 |
jasone |
Add aio_waitcomplete(). Make aio work correctly for socket descriptors. Make gratuitous style(9) fixes (me, not the submitter) to make the aio code more readable.
PR: kern/12053 Submitted by: Chris Sedore <cmsedore@maxwell.syr.edu>
|
#
52070 |
|
09-Oct-1999 |
green |
Implement RLIMIT_SBSIZE in the kernel. This is a per-uid sockbuf total usage limit.
|
#
51757 |
|
28-Sep-1999 |
pb |
In sbflush(), don't exit the while loop too early: this can cause an empty mbuf to stay in the queue, then causing a needless panic because sb_cc == 0 and sb_mbcnt != 0.
But we still need to panic rather than endlessly looping if, for some reason, sb_cc == 0 and there are non-empty mbufs in the queue.
PR: kern/11988 Reviewed by: fenner
|
#
51381 |
|
19-Sep-1999 |
green |
Change so_cred's type to a ucred, not a pcred. THis makes more sense, actually. Make a sonewconn3() which takes an extra argument (proc) so new sockets created with sonewconn() from a user's system call get the correct credentials, not just the parent's credentials.
|
#
50477 |
|
28-Aug-1999 |
peter |
$Id$ -> $FreeBSD$
|
#
48579 |
|
05-Jul-1999 |
msmith |
Move the initialisation/tuning of nmbclusters from param.c/machdep.c into uipc_mbuf.c. This reduces three sets of identical tunable code to one set, and puts the initialisation with the mbuf code proper.
Make NMBUFs tunable as well.
Move the nmbclusters sysctl here as well.
Move the initialisation of maxsockets from param.c to uipc_socket2.c, next to its corresponding sysctl.
Use the new tunable macros for the kern.vm.kmem.size tunable (this should have been in a separate commit, whoops).
|
#
47992 |
|
17-Jun-1999 |
green |
Reviewed by: the cast of thousands
This is the change to struct sockets that gets rid of so_uid and replaces it with a much more useful struct pcred *so_cred. This is here to be able to do socket-level credential checks (i.e. IPFW uid/gid support, to be added to HEAD soon). Along with this comes an update to pidentd which greatly simplifies the code necessary to get a uid from a socket. Soon to come: a sysctl() interface to finding individual sockets' credentials.
|
#
46922 |
|
10-May-1999 |
peter |
Update one set of comments.. s/so_q0/so_incomp/ and s/so_q/so_comp/ (that's incomplete and complete connections I think)
|
#
46381 |
|
03-May-1999 |
billf |
Add sysctl descriptions to many SYSCTL_XXXs
PR: kern/11197 Submitted by: Adrian Chadd <adrian@FreeBSD.org> Reviewed by: billf(spelling/style/minor nits) Looked at by: bde(style)
|
#
43196 |
|
25-Jan-1999 |
fenner |
Port NetBSD's 19990120-accept bug fix. This works around the race condition where select(2) can return that a listening socket has a connected socket queued, the connection is broken, and the user calls accept(2), which then blocks because there are no connections queued.
Reviewed by: wollman Obtained from: NetBSD (ftp://ftp.NetBSD.ORG/pub/NetBSD/misc/security/patches/19990120-accept)
|
#
41591 |
|
07-Dec-1998 |
archie |
The "easy" fixes for compiling the kernel -Wunused: remove unreferenced static and local variables, goto labels, and functions declared but not defined.
|
#
41298 |
|
23-Nov-1998 |
truckman |
We can't call fsetown() from sonewconn() because sonewconn() is be called from an interrupt context and fsetown() wants to peek at curproc, call malloc(..., M_WAITOK), and fiddle with various unprotected data structures. The fix is to move the code that duplicates the F_SETOWN/FIOSETOWN state of the original socket to the new socket from sonewconn() to accept1(), since accept1() runs in the correct context. Deferring this until the process calls accept() is harmless since the process can't do anything useful with SIGIO on the new socket until it has the descriptor for that socket.
One could make the case for not bothering to duplicate the F_SETOWN/FIOSETOWN state and requiring the process to explicitly make the fcntl() or ioctl() call on the new socket, but this would be incompatible with the previous implementation and might break programs which rely on the old semantics.
This bug was discovered by Andrew Gallatin <gallatin@cs.duke.edu>.
|
#
41086 |
|
11-Nov-1998 |
truckman |
Installed the second patch attached to kern/7899 with some changes suggested by bde, a few other tweaks to get the patch to apply cleanly again and some improvements to the comments.
This change closes some fairly minor security holes associated with F_SETOWN, fixes a few bugs, and removes some limitations that F_SETOWN had on tty devices. For more details, see the description on the PR.
Because this patch increases the size of the proc and pgrp structures, it is necessary to re-install the includes and recompile libkvm, the vinum lkm, fstat, gcore, gdb, ipfilter, ps, top, and w.
PR: kern/7899 Reviewed by: bde, elvind
|
#
40913 |
|
04-Nov-1998 |
fenner |
Fix sbcheck() to check all packets on socket buffer. Also fix data types and printf formats while I'm here.
PR: misc/8494
Panic instead of looping forever in sbflush(). If sb_mbcnt counts more mbufs than sb_cc counts bytes, the original code can turn into an infinite loop of removing 0 bytes from the socket buffer until it's empty.
|
#
38860 |
|
05-Sep-1998 |
bde |
Fixed recently perpetrated printf format errors.
|
#
38808 |
|
04-Sep-1998 |
ache |
make sbflush panic messages more descriptive
|
#
36735 |
|
07-Jun-1998 |
dfr |
This commit fixes various 64bit portability problems required for FreeBSD/alpha. The most significant item is to change the command argument to ioctl functions from int to u_long. This change brings us inline with various other BSD versions. Driver writers may like to use (__FreeBSD_version == 300003) to detect this change.
The prototype FreeBSD/alpha machdep will follow in a couple of days time.
|
#
36528 |
|
31-May-1998 |
peter |
Have the wakeup routine do the upcall if needed.
Obtained from: NetBSD
|
#
36119 |
|
17-May-1998 |
phk |
s/nanoruntime/nanouptime/g s/microruntime/microuptime/g
Reviewed by: bde
|
#
36079 |
|
15-May-1998 |
wollman |
Convert socket structures to be type-stable and add a version number.
Define a parameter which indicates the maximum number of sockets in a system, and use this to size the zone allocators used for sockets and for certain PCBs.
Convert PF_LOCAL PCB structures to be type-stable and add a version number.
Define an external format for infomation about socket structures and use it in several places.
Define a mechanism to get all PF_LOCAL and PF_INET PCB lists through sysctl(3) without blocking network interrupts for an unreasonable length of time. This probably still has some bugs and/or race conditions, but it seems to work well enough on my machines.
It is now possible for `netstat' to get almost all of its information via the sysctl(3) interface rather than reading kmem (changes to follow).
|
#
35413 |
|
24-Apr-1998 |
dg |
Added kern.ipc.nmbclusters
|
#
35029 |
|
04-Apr-1998 |
phk |
Time changes mark 2:
* Figure out UTC relative to boottime. Four new functions provide time relative to boottime.
* move "runtime" into struct proc. This helps fix the calcru() problem in SMP.
* kill mono_time.
* add timespec{add|sub|cmp} macros to time.h. (XXX: These may change!)
* nanosleep, select & poll takes long sleeps one day at a time
Reviewed by: bde Tested by: ache and others
|
#
33955 |
|
01-Mar-1998 |
guido |
Make sure that you can only bind a more specific address when it is done by the same uid. Obtained from: OpenBSD
|
#
29210 |
|
07-Sep-1997 |
bde |
Removed trailing semicolons from the definitions of the sysctl declaration macros so that a semicolon can be added when the macros are invoked without giving a (pedantic) syntax error. Invocations need to be followed by a semicolon so that programs like indent and gtags don't get confused.
Fixed the one invocation that wasn't followed by a trailing semicolon.
|
#
29111 |
|
04-Sep-1997 |
tegge |
sonewconn no longer passes curproc to the protocol attach method since that might cause in_pcballoc to call MALLOC with M_WAITOK during a software interrupt. Reviewed by: Garrett Wollman <wollman@khavrinen.lcs.mit.edu>
|
#
29041 |
|
02-Sep-1997 |
bde |
Removed unused #includes.
|
#
28270 |
|
16-Aug-1997 |
wollman |
Fix all areas of the system (or at least all those in LINT) to avoid storing socket addresses in mbufs. (Socket buffers are the one exception.) A number of kernel APIs needed to get fixed in order to make this happen. Also, fix three protocol families which kept PCBs in mbufs to not malloc them instead. Delete some old compatibility cruft while we're at it, and add some new routines in the in_cksum family.
|
#
27531 |
|
19-Jul-1997 |
fenner |
Remove sonewconn() macro kludge, introduced in 4.3-Reno to catch argument mismatches. Prototypes do a much better job these days.
Noticed by: bde
|
#
26096 |
|
24-May-1997 |
peter |
Attempt to convert the ip_divert code to use the new-style protocol request switch. I needed 'LINT' to compile for other reasons so I kinda got the blood on my hands. Note: I don't know how to test this, I don't know if it works correctly.
|
#
25201 |
|
27-Apr-1997 |
wollman |
The long-awaited mega-massive-network-code- cleanup. Part I.
This commit includes the following changes: 1) Old-style (pr_usrreq()) protocols are no longer supported, the compatibility glue for them is deleted, and the kernel will panic on boot if any are compiled in.
2) Certain protocol entry points are modified to take a process structure, so they they can easily tell whether or not it is possible to sleep, and also to access credentials.
3) SS_PRIV is no more, and with it goes the SO_PRIVSTATE setsockopt() call. Protocols should use the process pointer they are now passed.
4) The PF_LOCAL and PF_ROUTE families have been updated to use the new style, as has the `raw' skeleton family.
5) PF_LOCAL sockets now obey the process's umask when creating a socket in the filesystem.
As a result, LINT is now broken. I'm hoping that some enterprising hacker with a bit more time will either make the broken bits work (should be easy for netipx) or dike them out.
|
#
24442 |
|
31-Mar-1997 |
dg |
In accept1(), falloc() is called after the process has awoken, but prior to removing the connection from the queue. The problem here is that falloc() may block and this would allow another process to accept the connection instead. If this happens to leave the queue empty, then the system will panic with an "accept: nothing queued".
Also changed a wakeup() to a wakeup_one() to avoid the "thundering herd" problem on new connections in Apache (or any other application that has multiple processes blocked in accept() for the same socket).
|
#
23081 |
|
24-Feb-1997 |
wollman |
Create a new branch of the kernel MIB, kern.ipc, to store all of the configurables and instrumentation related to inter-process communication mechanisms. Some variables, like mbuf statistics, are instrumented here for the first time.
For mbuf statistics: also keep track of m_copym() and m_pullup() failures, and provide for the user's inspection the compiled-in values of MSIZE, MHLEN, MCLBYTES, and MINCLSIZE.
|
#
22936 |
|
19-Feb-1997 |
wollman |
Make the operation of sonewconn1() a bit clearer by calling pru_attach() before putting the new connection on the connection queue.
|
#
22899 |
|
18-Feb-1997 |
wollman |
uipc_mbuf.c: do a better job of counting how often we have to wait for memory, or are denied a cluster.
uipc_socket2.c: define some generic ``operation-not-supported'' entry points for pr_usrreqs.
|
#
22658 |
|
13-Feb-1997 |
wollman |
For large values of sb_max or MCLBYTES, it was possible for the expression sb_max * MCLBYTES / (MSIZE + MCLBYTES) used in sbreserve() to overflow, causing all socket creation attempts to fail. Force the calculation to use u_quad_t's, which makes overflow less likely.
|
#
21673 |
|
14-Jan-1997 |
jkh |
Make the long-awaited change from $Id$ to $FreeBSD$
This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long.
Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.
|
#
19622 |
|
11-Nov-1996 |
fenner |
Add the IP_RECVIF socket option, which supplies a packet's incoming interface using a sockaddr_dl.
Fix the other packet-information socket options (SO_TIMESTAMP, IP_RECVDSTADDR) to work for multicast UDP and raw sockets as well. (They previously only worked for unicast UDP).
|
#
18874 |
|
11-Oct-1996 |
pst |
Fix two bugs I accidently put into the syn code at the last minute (yes I had tested the hell out of this).
I've also temporarily disabled the code so that it behaves as it previously did (tail drop's the syns) pending discussion with fenner about some socket state flags that I don't fully understand.
Submitted by: fenner
|
#
18787 |
|
07-Oct-1996 |
pst |
Increase robustness of FreeBSD against high-rate connection attempt denial of service attacks.
Reviewed by: bde,wollman,olah Inspired by: vjs@sgi.com
|
#
18365 |
|
19-Sep-1996 |
pst |
Add a new sysctl variable kern.sominqueue to override the MINIMUM queue specified in a listen(2) system call.
|
#
17675 |
|
19-Aug-1996 |
julian |
for kern_conf.c, start allocating dynamic major numbers half way through the range rather than possibly colliding with fixed elements. Increase the size of the arrays to take this into account.. remember that each element in the array is now only 1 ponter so this isn't that much..
also note a possible bug in debugging code in uipc_socket2.c (add XXX)
|
#
17096 |
|
11-Jul-1996 |
wollman |
Modify the kernel to use the new pr_usrreqs interface rather than the old pr_usrreq mechanism which was poorly designed and error-prone. This commit renames pr_usrreq to pr_ousrreq so that old code which depended on it would break in an obvious manner. This commit also implements the new interface for TCP, although the old function is left as an example (#ifdef'ed out). This commit ALSO fixes a longstanding bug in the TCP timer processing (introduced by davidg on 1995/04/12) which caused timer processing on a TCB to always stop after a single timer had expired (because it misinterpreted the return value from tcp_usrreq() to indicate that the TCB had been deleted). Finally, some code related to polling has been deleted from if.c because it is not relevant t -current and doesn't look at all like my current code.
|
#
17047 |
|
09-Jul-1996 |
wollman |
This is a proposal-in-code for a substantial modification of the way the high kernel calls into a protocol stack to perform requests on the user's behalf. We replace the pr_usrreq() entry in struct protosw with a pointer to a structure containing pointers to functions which implement the various reuqests; each function is declared with the correct type and number of arguments. (This is unlike the current scheme in which a quarter of the requests take arguments of type other than (struct mbuf *) and the difference is papered over with casts.) There are a few benefits to this new scheme:
1) Arguments are passed with their correct types, and null-pointer dummies are no longer necessary.
2) There should be slightly better caching effects from eliminating the prximity to extraneous code and th switch in pr_usrreq().
3) It becomes much easier to change the types of the arguments to something other than `struct mbuf *' (e.g.,pushing the work of sosend() into the protocol as advocated by Van Jacobson).
There is one principal drawback: existing protocol stacks need to be modified. This is alleviated by compatibility code in uipc_socket2.c and uipc_domain.c which emulates the new interface in terms of the old and vice versa.
This idea is not original to me. I read about what Jacobson did in one of his papers and have tried to implement the first steps towards something like that here. Much work remains to be done.
|
#
16322 |
|
12-Jun-1996 |
gpalmer |
Clean up -Wunused warnings.
Reviewed by: bde
|
#
14547 |
|
11-Mar-1996 |
dg |
Changed socket code to use 4.4BSD queue macros. This includes removing the obsolete soqinsque and soqremque functions as well as collapsing so_q0len and so_qlen into a single queue length of unaccepted connections. Now the queue of unaccepted & complete connections is checked directly for queued sockets. The new code should be functionally equivilent to the old while being substantially faster - especially in cases where large numbers of connections are often queued for accept (e.g. http).
|
#
13267 |
|
05-Jan-1996 |
wollman |
Eliminate the dramatic TCP performance decrease observed for writes in the range [210:260] by sweeping the problem under the rug. This change has the following effects:
1) A new MIB variable in the kern branch is defined to allow modification of the socket buffer layer's ``wastage factor'' (which determines how much unused-but-allocated space in mbufs and mbuf clusters is allowed in a socket buffer).
2) The default value of the wastage factor is changed from 2 to 8. The original value was chosen when MINCLSIZE was 7*MLEN (!), and is not appropriate for an environment where MINCLSIZE is much less.
The real solution to this problem is to scrap both mbufs and sockbufs and completely redesign the buffering mechanism used at both levels.
|
#
12843 |
|
14-Dec-1995 |
bde |
Nuked ambiguous sleep message strings: old: new: netcls[] = "netcls" "soclos" netcon[] = "netcon" "accept", "connec" netio[] = "netio" "sblock", "sbwait"
|
#
12041 |
|
03-Nov-1995 |
wollman |
Make somaxconn (maximum backlog in a listen(2) request) and sb_max (maximum size of a socket buffer) tunable.
Permit callers of listen(2) to specify a negative backlog, which is translated into somaxconn. Previously, a negative backlog was silently translated into 0.
|
#
8876 |
|
30-May-1995 |
rgrimes |
Remove trailing whitespace.
|
#
3308 |
|
02-Oct-1994 |
phk |
All of this is cosmetic. prototypes, #includes, printfs and so on. Makes GCC a lot more silent.
|
#
1817 |
|
02-Aug-1994 |
dg |
Added $Id$
|
#
1549 |
|
25-May-1994 |
rgrimes |
The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.
Reviewed by: Rodney W. Grimes Submitted by: John Dyson and David Greenman
|
#
1541 |
|
24-May-1994 |
rgrimes |
BSD 4.4 Lite Kernel Sources
|