Cross Reference: /freebsd-10.0-release/sys/kern/sys

History log of /freebsd-10.0-release/sys/kern/sys_pipe.c
Revision	Date	Author	Comments (<<< Hide modified files) (Show modified files >>>)
# 259065	07-Dec-2013	gjb	- Copy stable/10 (r259064) to releng/10.0 as part of the 10.0-RELEASE cycle. - Update __FreeBSD_version [1] - Set branch name to -RC1 [1] 10.0-CURRENT __FreeBSD_version value ended at '55', so start releng/10.0 at '100' so the branch is started with a value ending in zero. Approved by: re (implicit) Sponsored by: The FreeBSD Foundation /freebsd-10.0-release /freebsd-10.0-release/sys/conf/newvers.sh /freebsd-10.0-release/sys/sys/param.h
# 256281	10-Oct-2013	gjb	Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle. Approved by: re (implicit) Sponsored by: The FreeBSD Foundation
# 255426	09-Sep-2013	jhb	Add a mmap flag (MAP_32BIT) on 64-bit platforms to request that a mapping use an address in the first 2GB of the process's address space. This flag should have the same semantics as the same flag on Linux. To facilitate this, add a new parameter to vm_map_find() that specifies an optional maximum virtual address. While here, fix several callers of vm_map_find() to use a VMFS_* constant for the findspace argument instead of TRUE and FALSE. Reviewed by: alc Approved by: re (kib)
# 254356	15-Aug-2013	glebius	Make sendfile() a method in the struct fileops. Currently only vnode backed file descriptors have this method implemented. Reviewed by: kib Sponsored by: Nginx, Inc. Sponsored by: Netflix
# 250159	01-May-2013	jilles	Add pipe2() system call. The pipe2() function is similar to pipe() but allows setting FD_CLOEXEC and O_NONBLOCK (on both sides) as part of the function. If p points to two writable ints, pipe2(p, 0) is equivalent to pipe(p). If the pointer is not valid, behaviour differs: pipe2() writes into the array from the kernel like socketpair() does, while pipe() writes into the array from an architecture-specific assembler wrapper. Reviewed by: kan, kib
# 248951	31-Mar-2013	jilles	Rename do_pipe() to kern_pipe2() and declare it properly.
# 246907	17-Feb-2013	pjd	Remove redundant space.
# 238936	31-Jul-2012	davidxu	I am comparing current pipe code with the one in 8.3-STABLE r236165, I found 8.3 is a history BSD version using socket to implement FIFO pipe, it uses per-file seqcount to compare with writer generation stored in per-pipe object. The concept is after all writers are gone, the pipe enters next generation, all old readers have not closed the pipe should get the indication that the pipe is disconnected, result is they should get EPIPE, SIGPIPE or get POLLHUP in poll(). But newcomer should not know that previous writters were gone, it should treat it as a fresh session. I am trying to bring back FIFO pipe to history behavior. It is still unclear that if single EOF flag can represent SBS_CANTSENDMORE and SBS_CANTRCVMORE which socket-based version is using, but I have run the poll regression test in tool directory, output is same as the one on 8.3-STABLE now. I think the output "not ok 18 FIFO state 6b: poll result 0 expected 1. expected POLLHUP; got 0" might be bogus, because newcomer should not know that old writers were gone. I got the same behavior on Linux. Our implementation always return POLLIN for disconnected pipe even it should return POLLHUP, but I think it is not wise to remove POLLIN for compatible reason, this is our history behavior. Regression test: /usr/src/tools/regression/poll
# 238928	31-Jul-2012	davidxu	When a thread is blocked in direct write state, it only sets PIPE_DIRECTW flag but not PIPE_WANTW, but FIFO pipe code does not understand this internal state, when a FIFO peer reader closes the pipe, it wants to notify the writer, it checks PIPE_WANTW, if not set, it skips calling wakeup(), so blocked writer never noticed the case, but in general, the writer should return from the syscall with EPIPE error code and may get SIGPIPE signal. Setting the PIPE_WANTW fixed problem, or you can turn off direct write, it should fix the problem too. This bug is found by PR/170203. Another bug in FIFO pipe code is when peer closes the pipe, another end which is being blocked in select() or poll() is not notified, it missed to call pipeselwakeup(). Third problem is found in poll regression test, the existing code can not pass 6b,6c,6d tests, but FreeBSD-4 works. This commit does not fix the problem, I still need to study more to find the cause. PR: 170203 Tested by: Garrett Copper < yanegomi at gmail dot com >
# 234352	16-Apr-2012	jkim	- Implement pipe2 syscall for Linuxulator. This syscall appeared in 2.6.27 but GNU libc used it without checking its kernel version, e. g., Fedora 10. - Move pipe(2) implementation for Linuxulator from MD files to MI file, sys/compat/linux/linux_file.c. There is no MD code for this syscall at all. - Correct an argument type for pipe() from l_ulong * to l_int *. Probably this was the source of MI/MD confusion. Reviewed by: emulation
# 232821	11-Mar-2012	kib	Remove fifo.h. The only used function declaration from the header is migrated to sys/vnode.h. Submitted by: gianni
# 232641	07-Mar-2012	kib	The pipe_poll() performs lockless access to the vnode to test fifo_iseof() condition, allowing the v_fifoinfo to be reset and freed by fifo_cleanup(). Precalculate EOF at the places were fo_wgen is changed, and cache the state in a new pipe state flag PIPE_SAMEWGEN. Reported and tested by: bf Submitted by: gianni MFC after: 1 week (a backport)
# 232495	04-Mar-2012	kib	pipe_read(): change the type of size to int, and remove signed clamp. pipe_write(): change the type of desiredsize back to int, its value fits. Requested by: bde MFC after: 3 weeks
# 232271	28-Feb-2012	dim	Change definition of pipe_chmod() from K&R to C99, to avoid the following clang warning: sys/kern/sys_pipe.c:1556:10: error: promoted type 'int' of K&R function parameter is not compatible with the parameter type 'mode_t' (aka 'unsigned short') declared in a previous prototype [-Werror] mode_t mode; ^ sys/kern/sys_pipe.c:155:19: note: previous declaration is here static fo_chmod_t pipe_chmod; ^
# 232183	26-Feb-2012	jilles	Fix fchmod() and fchown() on fifos. The new fifo implementation in r232055 broke fchmod() and fchown() on fifos. Postfix needs this. Submitted by: gianni Reported by: dougb
# 232055	23-Feb-2012	kmacy	merge pipe and fifo implementations Also reviewed by: jhb, jilles (initial revision) Tested by: pho, jilles Submitted by: gianni Reviewed by: bde
# 231949	20-Feb-2012	kib	Fix found places where uio_resid is truncated to int. Add the sysctl debug.iosize_max_clamp, enabled by default. Setting the sysctl to zero allows to perform the SSIZE_MAX-sized i/o requests from the usermode. Discussed with: bde, das (previous versions) MFC after: 1 month
# 228510	14-Dec-2011	jilles	Fix select/poll/kqueue for write on reverse direction before first write. The reverse direction of a pipe is lazily allocated on the first write in that direction (because pipes are usually used in one direction only). A special case is needed to ensure the pipe appears writable before the first write because there are 0 bytes of pending data in 0 bytes of buffer space at that point, leaving 0 bytes of data that can be written with the normal code. Note that the first write returns [ENOMEM] if kern.ipc.maxpipekva is exceeded and does not block or return [EAGAIN], so selecting true for write is correct even in that case. PR: kern/93685 Submitted by: gianni MFC after: 2 weeks
# 228306	06-Dec-2011	kib	Most users of pipe(2) do not call fstat(2) on the returned pipe descriptors. Optimize for the case, by lazily allocating the pipe inode number at the fstat(2) time. If alloc_unr(9) returns failure, do not fail fstat(2), since uses of inode numbers are even rare then fstat(2), but provide zero inode forever. Note that alloc_unr() failure is unlikely due to total number of pipes in the system limited by the number of file descriptors. Based on the submission by: gianni MFC after: 2 weeks
# 228178	01-Dec-2011	kib	If alloc_unr() call in the pipe_create() failed, then pipe->pipe_ino is -1. But, because ino_t is unsigned, this case was not covered by the test ino > 0 in pipeclose(), leading to the free_unr(-1). Fix it by explicitely comparing with 0 and -1. [1] Do no access freed memory, the inode number was cached to prevent access to cpipe after it possibly was freed, but I failed to commit the right patch. Noted by: gianni [1] Pointy hat to: kib MFC after: 3 days
# 226042	05-Oct-2011	kib	Supply unique (st_dev, st_ino) value pair for the fstat(2) done on the pipes. Reviewed by: jhb, Peter Jeremy <peterjeremy acm org> MFC after: 2 weeks
# 225617	16-Sep-2011	kmacy	In order to maximize the re-usability of kernel code in user space this patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls. Reviewed by: rwatson Approved by: re (bz)
# 225177	25-Aug-2011	attilio	Fix a deficiency in the selinfo interface: If a selinfo object is recorded (via selrecord()) and then it is quickly destroyed, with the waiters missing the opportunity to awake, at the next iteration they will find the selinfo object destroyed, causing a PF#. That happens because the selinfo interface has no way to drain the waiters before to destroy the registered selinfo object. Also this race is quite rare to get in practice, because it would require a selrecord(), a poll request by another thread and a quick destruction of the selrecord()'ed selinfo object. Fix this by adding the seldrain() routine which should be called before to destroy the selinfo objects (in order to avoid such case), and fix the present cases where it might have already been called. Sometimes, the context is safe enough to prevent this type of race, like it happens in device drivers which installs selinfo objects on poll callbacks. There, the destruction of the selinfo object happens at driver detach time, when all the filedescriptors should be already closed, thus there cannot be a race. For this case, mfi(4) device driver can be set as an example, as it implements a full correct logic for preventing this from happening. Sponsored by: Sandvine Incorporated Reported by: rstone Tested by: pluknet Reviewed by: jhb, kib Approved by: re (bz) MFC after: 3 weeks
# 224914	16-Aug-2011	kib	Add the fo_chown and fo_chmod methods to struct fileops and use them to implement fchown(2) and fchmod(2) support for several file types that previously lacked it. Add MAC entries for chown/chmod done on posix shared memory and (old) in-kernel posix semaphores. Based on the submission by: glebius Reviewed by: rwatson Approved by: re (bz)
# 220245	01-Apr-2011	kib	After the r219999 is merged to stable/8, rename fallocf(9) to falloc(9) and remove the falloc() version that lacks flag argument. This is done to reduce the KPI bloat. Requested by: jhb X-MFC-note: do not
# 219801	20-Mar-2011	alc	Update a comment. The sending process has not mapped the buffer pages since before r127501. Strictly speaking, the buffer pages are not "wired". They remain in the paging queues. However, they are pinned in memory using vm_page_hold().
# 216699	25-Dec-2010	alc	Introduce and use a new VM interface for temporarily pinning pages. This new interface replaces the combined use of vm_fault_quick() and pmap_extract_and_hold() throughout the kernel. In collaboration with: kib@
# 216511	17-Dec-2010	alc	Implement and use a single optimized function for unholding a set of pages. Reviewed by: kib@
# 207805	08-May-2010	alc	Update a comment: It no longer makes sense to talk about the page queues lock here.
# 207410	29-Apr-2010	kmacy	On Alan's advice, rather than do a wholesale conversion on a single architecture from page queue lock to a hashed array of page locks (based on a patch by Jeff Roberson), I've implemented page lock support in the MI code and have only moved vm_page's hold_count out from under page queue mutex to page lock. This changes pmap_extract_and_hold on all pmaps. Supported by: Bitgravity Inc. Discussed with: alc, jeffr, and kib
# 205792	28-Mar-2010	ed	Rename st_timespec fields to st_tim for POSIX 2008 compliance. A nice thing about POSIX 2008 is that it finally standardizes a way to obtain file access/modification/change times in sub-second precision, namely using struct timespec, which we already have for a very long time. Unfortunately POSIX uses different names. This commit adds compatibility macros, so existing code should still build properly. Also change all source code in the kernel to work without any of the compatibility macros. This makes it all a less ambiguous. I am also renaming st_birthtime to st_birthtim, even though it was a local extension anyway. It seems Cygwin also has a st_birthtim.
# 197134	12-Sep-2009	rwatson	Use C99 initialization for struct filterops. Obtained from: Mac OS X Sponsored by: Apple Inc. MFC after: 3 weeks
# 195423	07-Jul-2009	kib	Fix poll(2) and select(2) for named pipes to return "ready for read" when all writers, observed by reader, exited. Use writer generation counter for fifo, and store the snapshot of the fifo generation in the f_seqcount field of struct file, that is otherwise unused for fifos. Set FreeBSD-undocumented POLLINIGNEOF flag only when file f_seqcount is equal to fifo' fi_wgen, and revert r89376. Fix POLLINIGNEOF for sockets and pipes, and return POLLHUP for them. Note that the patch does not fix not returning POLLHUP for fifos. PR: kern/94772 Submitted by: bde (original version) Reviewed by: rwatson, jilles Approved by: re (kensmith) MFC after: 6 weeks (might be)
# 193951	10-Jun-2009	kib	Adapt vfs kqfilter to the shared vnode lock used by zfs write vop. Use vnode interlock to protect the knote fields [1]. The locking assumes that shared vnode lock is held, thus we get exclusive access to knote either by exclusive vnode lock protection, or by shared vnode lock + vnode interlock. Do not use kl_locked() method to assert either lock ownership or the fact that curthread does not own the lock. For shared locks, ownership is not recorded, e.g. VOP_ISLOCKED can return LK_SHARED for the shared lock not owned by curthread, causing false positives in kqueue subsystem assertions about knlist lock. Remove kl_locked method from knlist lock vector, and add two separate assertion methods kl_assert_locked and kl_assert_unlocked, that are supposed to use proper asserts. Change knlist_init accordingly. Add convenience function knlist_init_mtx to reduce number of arguments for typical knlist initialization. Submitted by: jhb [1] Noted by: jhb [2] Reviewed by: jhb Tested by: rnoland
# 193893	10-Jun-2009	cperciva	Prevent integer overflow in direct pipe write code from circumventing virtual-to-physical page lookups. [09:09] Add missing permissions check for SIOCSIFINFO_IN6 ioctl. [09:10] Fix buffer overflow in "autokey" negotiation in ntpd(8). [09:11] Approved by: so (cperciva) Approved by: re (not really, but SVN wants this...) Security: FreeBSD-SA-09:09.pipe Security: FreeBSD-SA-09:10.ipv6 Security: FreeBSD-SA-09:11.ntpd
# 193511	05-Jun-2009	rwatson	Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC and used in a large number of files, but also because an increasing number of incorrect uses of MAC calls were sneaking in due to copy-and-paste of MAC-aware code without the associated opt_mac.h include. Discussed with: pjd
# 189649	10-Mar-2009	jhb	- Make maxpipekva a signed long rather than an unsigned long as overflow is more likely to be noticed with signed types. - Make amountpipekva a long as well to match maxpipekva. Discussed with: bde
# 189595	09-Mar-2009	jhb	Adjust some variables (mostly related to the buffer cache) that hold address space sizes to be longs instead of ints. Specifically, the follow values are now longs: runningbufspace, bufspace, maxbufspace, bufmallocspace, maxbufmallocspace, lobufspace, hibufspace, lorunningspace, hirunningspace, maxswzone, maxbcache, and maxpipekva. Previously, a relatively small number (~ 44000) of buffers set in kern.nbuf would result in integer overflows resulting either in hangs or bogus values of hidirtybuffers and lodirtybuffers. Now one has to overflow a long to see such problems. There was a check for a nbuf setting that would cause overflows in the auto-tuning of nbuf. I've changed it to always check and cap nbuf but warn if a user-supplied tunable would cause overflow. Note that this changes the ABI of several sysctls that are used by things like top(1), etc., so any MFC would probably require a some gross shims to allow for that. MFC after: 1 month
# 184849	11-Nov-2008	ed	Several cleanups related to pipe(2). - Use `fildes[2]' instead of `*fildes' to make more clear that pipe(2) fills an array with two descriptors. - Remove EFAULT from the manual page. Because of the current calling convention, pipe(2) raises a segmentation fault when an invalid address is passed. - Introduce kern_pipe() to make it easier for binary emulations to implement pipe(2). - Make Linux binary emulation use kern_pipe(), which means we don't have to recover td_retval after calling the FreeBSD system call. Approved by: rdivacky Discussed on: arch
# 179243	23-May-2008	kib	Another problem caused by the knlist_cleardel() potentially dropping PIPE_MTX(). Since the pipe_present is cleared before (potentially) sleeping, the second thread may enter the pipeclose() for the reciprocal pipe end. The test at the end of the pipeclose() for the pipe_present == 0 would succeed, allowing the second thread to free the pipe memory. First threads then accesses the freed memory after being woken up. Properly track the closing state of the pipe in the pipe_present. Introduce the intermediate state that marks the pipe as mostly dismantled but might be sleeping waiting for the knote list to be cleared. Free the pipe pair memory only when both ends pass that point. Debugging help and tested by: pho Discussed with: jmg MFC after: 2 weeks
# 179242	23-May-2008	kib	Destruction of the pipe calls knlist_cleardel() to remove the knotes monitoring the pipe. The code sets pipe_present = 0 and enters knlist_cleardel(), where the PIPE_MTX might be dropped when knl->kl_list cannot be cleared due to influx knotes. If the following often encountered code fragment if (!(kn->kn_status & KN_DETACHED)) kn->kn_fop->f_detach(kn); knote_drop(kn, td); [1] is executed while the knlist lock is dropped, then the knote memory is freed by the knote_drop() without knote being removed from the knlist, since the filt_pipedetach() contains the following: if (kn->kn_filter == EVFILT_WRITE) { if (!cpipe->pipe_peer->pipe_present) { PIPE_UNLOCK(cpipe); return; Now, the memory may be reused in the zone, causing the access to the freed memory. I got the panics caused by the marker knote appearing on the knlist, that, I believe, manifestation of the issue. In the Peter Holm test scenarious, we got unkillable processes too. The pipe_peer that has the knote for write shall be present. Ignore the pipe_present value for EVFILT_WRITE in filt_pipedetach(). Debugging help and tested by: pho Discussed with: jmg MFC after: 2 weeks
# 175140	07-Jan-2008	jhb	Make ftruncate a 'struct file' operation rather than a vnode operation. This makes it possible to support ftruncate() on non-vnode file types in the future. - 'struct fileops' grows a 'fo_truncate' method to handle an ftruncate() on a given file descriptor. - ftruncate() moves to kern/sys_generic.c and now just fetches a file object and invokes fo_truncate(). - The vnode-specific portions of ftruncate() move to vn_truncate() in vfs_vnops.c which implements fo_truncate() for vnode file types. - Non-vnode file types return EINVAL in their fo_truncate() method. Submitted by: rwatson
# 174988	29-Dec-2007	jeff	Remove explicit locking of struct file. - Introduce a finit() which is used to initailize the fields of struct file in such a way that the ops vector is only valid after the data, type, and flags are valid. - Protect f_flag and f_count with atomic operations. - Remove the global list of all files and associated accounting. - Rewrite the unp garbage collection such that it no longer requires the global list of all files and instead uses a list of all unp sockets. - Mark sockets in the accept queue so we don't incorrectly gc them. Tested by: kris, pho
# 174647	16-Dec-2007	jeff	Refactor select to reduce contention and hide internal implementation details from consumers. - Track individual selecters on a per-descriptor basis such that there are no longer collisions and after sleeping for events only those descriptors which triggered events must be rescaned. - Protect the selinfo (per descriptor) structure with a mtx pool mutex. mtx pool mutexes were chosen to preserve api compatibility with existing code which does nothing but bzero() to setup selinfo structures. - Use a per-thread wait channel rather than a global wait channel. - Hide select implementation details in a seltd structure which is opaque to the rest of the kernel. - Provide a 'selsocket' interface for those kernel consumers who wish to select on a socket when they have no fd so they no longer have to be aware of select implementation details. Tested by: kris Reviewed on: arch
# 173750	19-Nov-2007	dumbbell	The kernel uses two ways to write data on a pipe: o buffered write, for chunks smaller than PIPE_MINDIRECT bytes o direct write, for everything else A call to writev(2) may receive struct iov of various size and the kernel may have to switch from one solution to the other. Before doing this, it must wake reader processes and any select/poll/kqueue up. This commit fixes a bug where select/poll/kqueue are not triggered when switching from buffered write to direct write. It adds calls to pipeselwakeup(). I give more details on freebsd-arch@: http://lists.freebsd.org/pipermail/freebsd-arch/2007-September/006790.html This should fix issues with Erlang (lang/erlang) and kqueue. Reported by: Rickard Green (Erlang)
# 172930	24-Oct-2007	rwatson	Merge first in a series of TrustedBSD MAC Framework KPI changes from Mac OS X Leopard--rationalize naming for entry points to the following general forms: mac_<object>_<method/action> mac_<object>_check_<method/action> The previous naming scheme was inconsistent and mostly reversed from the new scheme. Also, make object types more consistent and remove spaces from object types that contain multiple parts ("posix_sem" -> "posixsem") to make mechanical parsing easier. Introduce a new "netinet" object type for certain IPv4/IPv6-related methods. Also simplify, slightly, some entry point names. All MAC policy modules will need to be recompiled, and modules not updates as part of this commit will need to be modified to conform to the new KPI. Sponsored by: SPARTA (original patches against Mac OS X) Obtained from: TrustedBSD Project, Apple Computer
# 170022	27-May-2007	rwatson	Remove amountpipes counter for pipes -- this replicates the function of existing UMA statistics for pipes, and allows us to get rid of both the per-pipe dtor and two atomic operations per pipe required to maintain the counter.
# 167232	05-Mar-2007	rwatson	Further system call comment cleanup: - Remove also "MP SAFE" after prior "MPSAFE" pass. (suggested by bde) - Remove extra blank lines in some cases. - Add extra blank lines in some cases. - Remove no-op comments consisting solely of the function name, the word "syscall", or the system call name. - Add punctuation. - Re-wrap some comments.
# 165347	19-Dec-2006	pjd	Use pipe_direct_write() optimization only if the data is in process' memory. This fixes sending data through pipe from the kernel. Fix suggested by: rwatson
# 163606	22-Oct-2006	rwatson	Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now contains the userspace and user<->kernel API and definitions, with all in-kernel interfaces moved to mac_framework.h, which is now included across most of the kernel instead. This change is the first step in a larger cleanup and sweep of MAC Framework interfaces in the kernel, and will not be MFC'd. Obtained from: TrustedBSD Project Sponsored by: SPARTA
# 159481	10-Jun-2006	rwatson	Move some functions and definitions from uipc_socket2.c to uipc_socket.c: - Move sonewconn(), which creates new sockets for incoming connections on listen sockets, so that all socket allocate code is together in uipc_socket.c. - Move 'maxsockets' and associated sysctls to uipc_socket.c with the socket allocation code. - Move kern.ipc sysctl node to uipc_socket.c, add a SYSCTL_DECL() for it to sysctl.h and remove lots of scattered implementations in various IPC modules. - Sort sodealloc() after soalloc() in uipc_socket.c for dependency order reasons. Statisticize soalloc() and sodealloc() as they are now required only in uipc_socket.c, and are internal to the socket implementation. After this change, socket allocation and deallocation is entirely centralized in one file, and uipc_socket2.c consists entirely of socket buffer manipulation and default protocol switch functions. MFC after: 1 month
# 155035	30-Jan-2006	glebius	- In pipe() return the error returned by pipe_create(), rather then hardcoded ENFILES, which is incorrect. pipe_create() can fail due to ENOMEM. - Update manual page, describing ENOMEM return code. Reviewed by: arch
# 153484	16-Dec-2005	delphij	In pipe_write(): when uiomove() fails, do not spin on it forever. Submitted by: Kostik Belousov <kostikbel at gmail.com> on -current@ Message-ID: <20051216151016.GE84442@deviant.zoral.local> MFC After: 3 weeks
# 147730	01-Jul-2005	ssouhlal	Fix the recent panics/LORs/hangs created by my kqueue commit by: - Introducing the possibility of using locks different than mutexes for the knlist locking. In order to do this, we add three arguments to knlist_init() to specify the functions to use to lock, unlock and check if the lock is owned. If these arguments are NULL, we assume mtx_lock, mtx_unlock and mtx_owned, respectively. - Using the vnode lock for the knlist locking, when doing kqueue operations on a vnode. This way, we don't have to lock the vnode while holding a mutex, in filt_vfsread. Reviewed by: jmg Approved by: re (scottl), scottl (mentor override) Pointyhat to: ssouhlal Will be happy: everyone
# 140369	17-Jan-2005	silby	Rearrange the kninit calls for both directions of a pipe so that they both happen before pipe backing allocation occurs. Previously, a pipe memory shortage would cause a panic due to a KNOTE call on an uninitialized si_note. Reported by: Peter Holm MFC after: 1 week
# 139804	06-Jan-2005	imp	/* -> /*- for copyright notices, minor format tweaks as necessary
# 138032	23-Nov-2004	rwatson	Correct a bug introduced in sys_pipe.c:1.179: in pipe_ioctl(), release the pipe mutex before calling fsetown(), as fsetown() may block. The sigio code protects the pipe sigio data using its own mutex, and the pipe reference count held by the caller will prevent the pipe from being prematurely garbage-collected. Discovered by: imp
# 137764	16-Nov-2004	phk	Add missing break.
# 137752	15-Nov-2004	phk	Straighten the ioctl function out to have only one exit point.
# 137355	07-Nov-2004	phk	Introduce fdclose() which will clean an entry in a filedesc. Replace homerolled versions with call to fdclose(). Make fdunused() static to kern_descrip.c
# 133790	15-Aug-2004	silby	Major enhancements to pipe memory usage: - pipespace is now able to resize non-empty pipes; this allows for many more resizing opportunities - Backing is no longer pre-allocated for the reverse direction of pipes. This direction is rarely (if ever) used, so this cuts the amount of map space allocated to a pipe in half. - Pipe growth is now much more dynamic; a pipe will now grow when the total amount of data it contains and the size of the write are larger than the size of pipe. Previously, only individual writes greater than the size of the pipe would cause growth. - In low memory situations, pipes will now shrink during both read and write operations, where possible. Once the memory shortage ends, the growth code will cause these pipes to grow back to an appropriate size. - If the full PIPE_SIZE allocation fails when a new pipe is created, the allocation will be retried with SMALL_PIPE_SIZE. This helps to deal with the situation of a fragmented map after a low memory period has ended. - Minor documentation + code changes to support the above. In total, these changes increase the total number of pipes that can be allocated simultaneously, drastically reducing the chances that pipe allocation will fail. Performance appears unchanged due to dynamic resizing.
# 133741	15-Aug-2004	jmg	Add locking to the kqueue subsystem. This also makes the kqueue subsystem a more complete subsystem, and removes the knowlege of how things are implemented from the drivers. Include locking around filter ops, so a module like aio will know when not to be unloaded if there are outstanding knotes using it's filter ops. Currently, it uses the MTX_DUPOK even though it is not always safe to aquire duplicate locks. Witness currently doesn't support the ability to discover if a dup lock is ok (in some cases). Reviewed by: green, rwatson (both earlier versions)
# 133049	03-Aug-2004	silby	Standardize pipe locking, ensuring that everything is locked via pipelock(), not via a mixture of mutexes and pipelock(). Additionally, add a few KASSERTS, and change some statements that should have been KASSERTS into KASSERTS. As a result of these cleanups, some segments of code have become significantly shorter and/or easier to read.
# 132987	01-Aug-2004	green	* Add a "how" argument to uma_zone constructors and initialization functions so that they know whether the allocation is supposed to be able to sleep or not. * Allow uma_zone constructors and initialation functions to return either success or error. Almost all of the ones in the tree currently return success unconditionally, but mbuf is a notable exception: the packet zone constructor wants to be able to fail if it cannot suballocate an mbuf cluster, and the mbuf allocators want to be able to fail in general in a MAC kernel if the MAC mbuf initializer fails. This fixes the panics people are seeing when they run out of memory for mbuf clusters. * Allow debug.nosleepwithlocks on WITNESS to be disabled, without changing the default. Both bmilekic and jeff have reviewed the changes made to make failable zone allocations work.
# 132579	23-Jul-2004	rwatson	Don't perform pipe endpoint locking during pipe_create(), as the pipe can't yet be referenced by other threads. In microbenchmarks, this appears to reduce the cost of pipe();close();close() on UP by 10%, and SMP by 7%. The vast majority of the cost of allocating a pipe remains VM magic. Suggested by: silby
# 132436	20-Jul-2004	silby	Fix a minor error in pipe_stat - st_size was always reported as 0 when direct writes kicked in. Whether this affected any applications is unknown.
# 127501	27-Mar-2004	alc	Revise the direct or optimized case to use uiomove_fromphys() by the reader instead of ephemeral mappings using pmap_qenter() by the writer. The writer is still, however, responsible for wiring the pages, just not mapping them. Consequently, the allocation of KVA for the direct case is unnecessary. Remove it and the sysctls limiting it, i.e., kern.ipc.maxpipekvawired and kern.ipc.amountpipekvawired. The number of temporarily wired pages is still, however, limited by kern.ipc.maxpipekva. Note: On platforms lacking a direct virtual-to-physical mapping, uiomove_fromphys() uses sf_bufs to cache ephemeral mappings. Thus, the number of available sf_bufs can influence the performance of pipes on platforms such i386. Surprisingly, I saw the greatest gain from this change on such a machine: lmbench's pipe bandwidth result increased from ~1050MB/s to ~1850MB/s on my 2.4GHz, 400MHz FSB P4 Xeon.
# 126252	25-Feb-2004	rwatson	Assert pipe mutex in pipeselwakeup(), as we manipulate pipe_state in a non-atomic manner. It appears to always be called with the mutex (good).
# 126249	25-Feb-2004	rwatson	Update comment regarding MAC labels: we no longer pass endpoints into the MAC Framework, just the pipe pair. GC 'hadpeer' used in pipedestroy(), which is no longer needed as we check pipe_present flags on the pair.
# 126131	22-Feb-2004	green	Correct some major SMP-harmful problems in the pipe implementation. First of all, PIPE_EOF is not checked pervasively after everything that can drop the pipe mutex and msleep(), so fix. Additionally, though it might not harm anything, pipelock() and pipeunlock() are not used consistently. Third, the kqueue support functions do not use the pipe mutex correctly. Last, but absolutely not least, is a race: if pipe_busy is not set on the closing side of the pipe, the other side that is trying to write to that will crash BECAUSE PIPE_EOF IS NOT SET! Unconditionally set PIPE_EOF, and get rid of all the lockups/crashes I have seen trying to build ports.
# 125367	03-Feb-2004	rwatson	Don't dec/inc the amountpipes counter every time we resize a pipe -- instead, just dec/inc in the ctor/dtor. For now, increment/decrement in two's, since we're now performing the operation once per pair, not once per pipe. Not really any measurable performance change in my micro-benchmarks, but doing less work is good, especially when it comes to atomic operations. Suggested by: alc
# 125364	03-Feb-2004	rwatson	Catch instances of (pipe == NULL) that were obsoleted with recent changes to jointly allocated pipe pairs. Replace these checks with pipe_present checks. This avoids a NULL pointer dereference when a pipe is half-closed. Submitted by: Peter Edwards <peter.edwards@openet-telecom.com>
# 125293	01-Feb-2004	rwatson	Coalesce pipe allocations and frees. Previously, the pipe code would allocate two 'struct pipe's from the pipe zone, and malloc a mutex. - Create a new "struct pipepair" object holding the two 'struct pipe' instances, struct mutex, and struct label reference. Pipe structures now have a back-pointer to the pipe pair, and a 'pipe_present' flag to indicate whether the half has been closed. - Perform mutex init/destroy in zone init/destroy, avoiding reallocating the mutex for each pipe. Perform most pipe structure setup in zone constructor. - VM memory mappings for pageable buffers are still done outside of the UMA zone. - Change MAC API to speak 'struct pipepair' instead of 'struct pipe', update many policies. MAC labels are also handled outside of the UMA zone for now. Label-only policy modules don't have to be recompiled, but if a module is recompiled, its pipe entry points will need to be updated. If a module actually reached into the pipe structures (unlikely), that would also need to be modified. These changes substantially simplify failure handling in the pipe code as there are many fewer possible failure modes. On half-close, pipes no longer free the 'struct pipe' for the closed half until a full-close takes place. However, VM mapped buffers are still released on half-close. Some code refactoring is now possible to clean up some of the back references, etc; this patch attempts not to change the structure of most of the pipe implementation, only allocation/free code paths, so as to avoid introducing bugs (hopefully). This cuts about 8%-9% off the cost of sequential pipe allocation and free in system call tests on UP and SMP in my micro-benchmarks. May or may not make a difference in macro-benchmarks, but doing less work is good. Reviewed by: juli, tjr Testing help: dwhite, fenestro, scottl, et al
# 125281	31-Jan-2004	rwatson	Fix an error in a KASSERT string: it's pipe_free_kmem(), not pipespace(), that contains this KASSERT.
# 124548	15-Jan-2004	des	New file descriptor allocation code, derived from similar code introduced in OpenBSD by Niels Provos. The patch introduces a bitmap of allocated file descriptors which is used to locate available descriptors when a new one is needed. It also moves the task of growing the file descriptor table out of fdalloc(), reducing complexity in both fdalloc() and do_dup(). Debts of gratitude are owed to tjr@ (who provided the original patch on which this work is based), grog@ (for the gdb(4) man page) and rwatson@ (for assistance with pxeboot(8)).
# 124399	11-Jan-2004	des	Back out 1.160, which was committed by mistake.
# 124394	11-Jan-2004	des	Mechanical whitespace cleanup.
# 124391	11-Jan-2004	des	Mechanical whitespace cleanup + minor style nits.
# 123915	27-Dec-2003	silby	Fix the maxpipekva warning message so that it points to the correct sysctl, and shorten the message. Noticed by: bde
# 122352	09-Nov-2003	tanimura	- Implement selwakeuppri() which allows raising the priority of a thread being waken up. The thread waken up can run at a priority as high as after tsleep(). - Replace selwakeup()s with selwakeuppri()s and pass appropriate priorities. - Add cv_broadcastpri() which raises the priority of the broadcast threads. Used by selwakeuppri() if collision occurs. Not objected in: -arch, -current
# 122164	06-Nov-2003	alc	- Delay the allocation of memory for the pipe mutex until we need it. This avoids the need to free said memory in various error cases along the way.
# 122163	06-Nov-2003	alc	- Simplify pipespace() by eliminating the explicit creation of vm objects. Instead, let the vm objects be lazily instantiated at fault time. This results in the allocation of fewer vm objects and vm map entries due to aggregation in the vm system.
# 121970	03-Nov-2003	rwatson	Unlock pipe mutex when failing MAC pipe ioctl access control check. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
# 121307	21-Oct-2003	silby	Change all SYSCTLS which are readonly and have a related TUNABLE from CTLFLAG_RD to CTLFLAG_RDTUN so that sysctl(8) can provide more useful error messages.
# 121256	19-Oct-2003	dwmalone	falloc allocates a file structure and adds it to the file descriptor table, acquiring the necessary locks as it works. It usually returns two references to the new descriptor: one in the descriptor table and one via a pointer argument. As falloc releases the FILEDESC lock before returning, there is a potential for a process to close the reference in the file descriptor table before falloc's caller gets to use the file. I don't think this can happen in practice at the moment, because Giant indirectly protects closes. To stop the file being completly closed in this situation, this change makes falloc set the refcount to two when both references are returned. This makes life easier for several of falloc's callers, because the first thing they previously did was grab an extra reference on the file. Reviewed by: iedowse Idea run past: jhb
# 121018	12-Oct-2003	jmg	fix a problem referencing free'd memory. This is only a problem for kqueue write events on a socket and you regularly create tons of pipes which overwrites the structure causing a panic when removing the knote from the list. If the peer has gone away (and it's a write knote), then don't bother trying to remove the knote from the list. Submitted by: Brian Buchanan and myself Obtained from: nCircle
# 120000	12-Sep-2003	alc	pipe_build_write_buffer() only requires read access of the page that it obtains from pmap_extract_and_hold().
# 119872	08-Sep-2003	alc	Use pmap_extract_and_hold() in pipe_build_write_buffer(). Consequently, pipe_build_write_buffer() no longer requires Giant on entry. Reviewed by: tegge
# 119811	06-Sep-2003	alc	Giant is no longer required by pipe_destroy_write_buffer(). Reduce unnecessary white space from pipe_destroy_write_buffer().
# 118929	15-Aug-2003	jmg	if we got this far, we definately don't have an EBADF. Return a more sane result of EPIPE. Reported by: nCircle dev team MFC after: 3 day
# 118880	13-Aug-2003	alc	- The vm_object pointer in pipe_buffer is unused. Remove it. - Check for successful initialization of pipe_zone in pipeinit() rather than every call to pipe(2).
# 118799	11-Aug-2003	alc	Pipespace() no longer requires Giant.
# 118764	11-Aug-2003	silby	More pipe changes: From alc: Move pageable pipe memory to a seperate kernel submap to avoid awkward vm map interlocking issues. (Bad explanation provided by me.) From me: Rework pipespace accounting code to handle this new layout, and adjust our default values to account for the fact that we now have a solid limit on allocations. Also, remove the "maxpipes" limit, as it no longer has a purpose. (The limit on kva usage solves the problem of having two many pipes.)
# 118757	10-Aug-2003	alc	Use vm_page_hold() instead of vm_page_wire(). Otherwise, a multithreaded application could cause a wired page to be freed. In general, vm_page_hold() should be preferred for ephemeral kernel mappings of pages borrowed from a user-level address space. (vm_page_wire() should really be reserved for indefinite duration pinning by the "owner" of the page.) Discussed with: silby Submitted by: tegge
# 118677	08-Aug-2003	alc	- Remove GIANT_REQUIRED from pipespace(). - Remove a duplicate initialization from pipe_create().
# 118572	07-Aug-2003	alc	- Remove GIANT_REQUIRED from pipe_free_kmem(). - Remove the acquisition and release of Giant around pipe_kmem_free() and uma_zfree() in pipeclose().
# 118230	30-Jul-2003	pb	Remove test in pipe_write() which causes write(2) to return EAGAIN on a non-blocking pipe in cases where select(2) returns the file descriptor as ready for write. This in turns causes libc_r, for one, to busy wait in such cases. Note: it is a quick performance fix, a more complex fix might be required in case this turns out to have unexpected side effects. Reviewed by: silby MFC after: 3 days
# 118220	30-Jul-2003	alc	The introduction of vm object locking has caused witness to reveal a long-standing mistake in the way a portion of a pipe's KVA is allocated. Specifically, kmem_alloc_pageable() is inappropriate for use in the "direct" case because it allows a preceding vm map entry and vm object to be extended to support the new KVA allocation. However, the direct case KVA allocation should not have a backing vm object. This is corrected by using kmem_alloc_nofault(). Submitted by: tegge (with the above explanation by me)
# 117364	09-Jul-2003	silby	A few minor changes: - Use atomic ops to update the bigpipe count - Make the bigpipe count sysctl readable - Remove a duplicate comparison in an if statement - Comment two SYSCTLs.
# 117325	08-Jul-2003	silby	Put some concrete limits on pipe memory consumption: - Limit the total number of pipes so that we do not exhaust all vm objects in the kernel map. When this limit is reached, a ratelimited message will be printed to the console. - Put a soft limit on the amount of memory consumable by pipes. Once the limit has been reached, all new pipes will be limited to 4K in size, rather than the default of 16K. - Put a limit on the number of pages that may be used for high speed page flipping in order to reduce the amount of wired memory. Pipe writes that occur while this limit is exceeded will fall back to non-page flipping mode. The above values are auto-tuned in subr_param.c and are scaled to take into account both the size of physical memory and the size of the kernel map. These limits help to reduce the "kernel resources exhausted" panics that could be caused by opening a large number of pipes. (Pipes alone are no longer able to exhaust all resources, but other kernel memory hogs in league with pipes may still be able to do so.) PR: 53627 Ideas / comments from: hsu, tjr, dillon@apollo.backplane.com MFC after: 1 week
# 116546	18-Jun-2003	phk	Initialize struct fileops with C99 sparse initialization.
# 116182	10-Jun-2003	obrien	Use __FBSDID().
# 116127	09-Jun-2003	mux	style(9).
# 112981	02-Apr-2003	hsu	Need to hold the same SMP lock for (knote) list traversal as for list manipulation. This lock also protects read-modify-write operations on the pipe_state field.
# 112569	24-Mar-2003	jake	- Add vm_paddr_t, a physical address type. This is required for systems where physical addresses larger than virtual addresses, such as i386s with PAE. - Use this to represent physical addresses in the MI vm system and in the i386 pmap code. This also changes the paddr parameter to d_mmap_t. - Fix printf formats to handle physical addresses >4G in the i386 memory detection code, and due to kvtop returning vm_paddr_t instead of u_long. Note that this is a name change only; vm_paddr_t is still the same as vm_offset_t on all currently supported platforms. Sponsored by: DARPA, Network Associates Laboratories Discussed with: re, phk (cdevsw change)
# 111119	19-Feb-2003	imp	Back out M_* changes, per decision of the TRB. Approved by: trb
# 110908	15-Feb-2003	alfred	Do not allow kqueues to be passed via unix domain sockets.
# 110816	13-Feb-2003	alc	Use atomic ops to update amountpipekva. Amountpipekva represents the total kernel virtual address space used by all pipes. It is, thus, outside the scope of any individual pipe lock.
# 109623	21-Jan-2003	alfred	Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0. Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
# 109153	12-Jan-2003	dillon	Bow to the whining masses and change a union back into void *. Retain removal of unnecessary casts and throw in some minor cleanups to see if anyone complains, just for the hell of it.
# 109123	11-Jan-2003	dillon	Change struct file f_data to un_data, a union of the correct struct pointer types, and remove a huge number of casts from code using it. Change struct xfile xf_data to xun_data (ABI is still compatible). If we need to add a #define for f_data and xf_data we can, but I don't think it will be necessary. There are no operational changes in this commit.
# 108255	24-Dec-2002	phk	White-space changes.
# 108238	23-Dec-2002	phk	Detediousficate declaration of fileops array members by introducing typedefs for them.
# 105132	14-Oct-2002	alfred	Remove a KASSERT I added in 1.73 to catch uninitialized pipes. It must be removed because it is done without the pipe being locked via pipelock() and therefore is vulnerable to races with pipespace() erroneously triggering it by temporarily zero'ing out the structure backing the pipe. It looks as if this assertion is not needed because all manipulation of the data changed by pipespace() _is_ protected by pipelock(). Reported by: kris, mckusick
# 105009	12-Oct-2002	alfred	whitespace fixes.
# 104908	11-Oct-2002	mike	Change iov_base's type from `char ' to the standard `void '. All uses of iov_base which assume its type is `char ' (in order to do pointer arithmetic) have been updated to cast iov_base to `char '.
# 104393	03-Oct-2002	truckman	In an SMP environment post-Giant it is no longer safe to blindly dereference the struct sigio pointer without any locking. Change fgetown() to take a reference to the pointer instead of a copy of the pointer and call SIGIO_LOCK() before copying the pointer and dereferencing it. Reviewed by: rwatson
# 104269	01-Oct-2002	rwatson	Improve locking of pipe mutexes in the context of MAC: (1) Where previously the pipe mutex was selectively grabbed during pipe_ioctl(), now always grab it and then release if if not needed. This protects the call to mac_check_pipe_ioctl() to make sure the label remains consistent. (Note: it looks like sigio locking may be incorrect for fgetown() since we call it not-by-reference and sigio locking assumes call by reference). (2) In pipe_stat(), lock the pipe if MAC is compiled in so that the call to mac_check_pipe_stat() gets a locked pipe to protect label consistency. We still release the lock before returning actual stat() data, risking inconsistency, but apparently our pipe locking model accepts that risk. (3) In various pipe MAC authorization checks, assert that the pipe lock is held. (4) Grab the lock when performing a pipe relabel operation, and assert it a little deeper in the stack. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
# 104094	28-Sep-2002	phk	Be consistent about "static" functions: if the function is marked static in its prototype, mark it static at the definition too. Inspired by: FlexeLint warning #512
# 102241	21-Aug-2002	archie	Don't use "NULL" when "0" is really meant.
# 102115	19-Aug-2002	rwatson	Break out mac_check_pipe_op() into component check entry points: mac_check_pipe_poll(), mac_check_pipe_read(), mac_check_pipe_stat(), and mac_check_pipe_write(). This is improves consistency with other access control entry points and permits security modules to only control the object methods that they are interested in, avoiding switch statements. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
# 102003	17-Aug-2002	rwatson	In continuation of early fileop credential changes, modify fo_ioctl() to accept an 'active_cred' argument reflecting the credential of the thread initiating the ioctl operation. - Change fo_ioctl() to accept active_cred; change consumers of the fo_ioctl() interface to generally pass active_cred from td->td_ucred. - In fifofs, initialize filetmp.f_cred to ap->a_cred so that the invocations of soo_ioctl() are provided access to the calling f_cred. Pass ap->a_td->td_ucred as the active_cred, but note that this is required because we don't yet distinguish file_cred and active_cred in invoking VOP's. - Update kqueue_ioctl() for its new argument. - Update pipe_ioctl() for its new argument, pass active_cred rather than td_ucred to MAC for authorization. - Update soo_ioctl() for its new argument. - Update vn_ioctl() for its new argument, use active_cred rather than td->td_ucred to authorize VOP_IOCTL() and the associated VOP_GETATTR(). Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
# 101987	16-Aug-2002	rwatson	Correct white space nits that crept in during my recent merges of trustedbsd_mac material.
# 101983	16-Aug-2002	rwatson	Make similar changes to fo_stat() and fo_poll() as made earlier to fo_read() and fo_write(): explicitly use the cred argument to fo_poll() as "active_cred" using the passed file descriptor's f_cred reference to provide access to the file credential. Add an active_cred argument to fo_stat() so that implementers have access to the active credential as well as the file credential. Generally modify callers of fo_stat() to pass in td->td_ucred rather than fp->f_cred, which was redundantly provided via the fp argument. This set of modifications also permits threads to perform these operations on behalf of another thread without modifying their credential. Trickle this change down into fo_stat/poll() implementations: - badfo_poll(), badfo_stat(): modify/add arguments. - kqueue_poll(), kqueue_stat(): modify arguments. - pipe_poll(), pipe_stat(): modify/add arguments, pass active_cred to MAC checks rather than td->td_ucred. - soo_poll(), soo_stat(): modify/add arguments, pass fp->f_cred rather than cred to pru_sopoll() to maintain current semantics. - sopoll(): moidfy arguments. - vn_poll(), vn_statfile(): modify/add arguments, pass new arguments to vn_stat(). Pass active_cred to MAC and fp->f_cred to VOP_POLL() to maintian current semantics. - vn_close(): rename cred to file_cred to reflect reality while I'm here. - vn_stat(): Add active_cred and file_cred arguments to vn_stat() and consumers so that this distinction is maintained at the VFS as well as 'struct file' layer. Pass active_cred instead of td->td_ucred to MAC and to VOP_GETATTR() to maintain current semantics. - fifofs: modify the creation of a "filetemp" so that the file credential is properly initialized and can be used in the socket code if desired. Pass ap->a_td->td_ucred as the active credential to soo_poll(). If we teach the vnop interface about the distinction between file and active credentials, we would use the active credential here. Note that current inconsistent passing of active_cred vs. file_cred to VOP's is maintained. It's not clear why GETATTR would be authorized using active_cred while POLL would be authorized using file_cred at the file system level. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
# 101941	15-Aug-2002	rwatson	In order to better support flexible and extensible access control, make a series of modifications to the credential arguments relating to file read and write operations to cliarfy which credential is used for what: - Change fo_read() and fo_write() to accept "active_cred" instead of "cred", and change the semantics of consumers of fo_read() and fo_write() to pass the active credential of the thread requesting an operation rather than the cached file cred. The cached file cred is still available in fo_read() and fo_write() consumers via fp->f_cred. These changes largely in sys_generic.c. For each implementation of fo_read() and fo_write(), update cred usage to reflect this change and maintain current semantics: - badfo_readwrite() unchanged - kqueue_read/write() unchanged pipe_read/write() now authorize MAC using active_cred rather than td->td_ucred - soo_read/write() unchanged - vn_read/write() now authorize MAC using active_cred but VOP_READ/WRITE() with fp->f_cred Modify vn_rdwr() to accept two credential arguments instead of a single credential: active_cred and file_cred. Use active_cred for MAC authorization, and select a credential for use in VOP_READ/WRITE() based on whether file_cred is NULL or not. If file_cred is provided, authorize the VOP using that cred, otherwise the active credential, matching current semantics. Modify current vn_rdwr() consumers to pass a file_cred if used in the context of a struct file, and to always pass active_cred. When vn_rdwr() is used without a file_cred, pass NOCRED. These changes should maintain current semantics for read/write, but avoid a redundant passing of fp->f_cred, as well as making it more clear what the origin of each credential is in file descriptor read/write operations. Follow-up commits will make similar changes to other file descriptor operations, and modify the MAC framework to pass both credentials to MAC policy modules so they can implement either semantic for revocation. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
# 101768	13-Aug-2002	rwatson	Introduce support for labeling and access control of pipe objects as part of the TrustedBSD MAC framework. Instrument the creation and destruction of pipes, as well as relevant operations, with necessary calls to the MAC framework. Note that the locking here is probably not quite right yet, but fixes will be forthcoming. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
# 101382	05-Aug-2002	des	Check the far end before registering an EVFILT_WRITE filter on a pipe.
# 100527	22-Jul-2002	alfred	Remove unneeded caddr_t casts.
# 99899	13-Jul-2002	alc	o Lock accesses to the page queues. o Add a comment explaining why hoisting the page queue lock outside of a particular loop is not possible.
# 99009	28-Jun-2002	alfred	More caddr_t removal, make fo_ioctl take a void * instead of a caddr_t.
# 98989	28-Jun-2002	alfred	document that the pipe fo_stat routine doesn't need locks because it's a read operation. Requested by: rwatson
# 96122	06-May-2002	alfred	Make funsetown() take a 'struct sigio **' so that the locking can be done internally. Ensure that no one can fsetown() to a dying process/pgrp. We need to check the process for P_WEXIT to see if it's exiting. Process groups are already safe because there is no such thing as a pgrp zombie, therefore the proctree lock completely protects the pgrp from having sigio structures associated with it after it runs funsetownlst. Add sigio lock to witness list under proctree and allproc, but over proc and pgrp. Seigo Tanimura helped with this.
# 95883	01-May-2002	alfred	Redo the sigio locking. Turn the sigio sx into a mutex. Sigio lock is really only needed to protect interrupts from dereferencing the sigio pointer in an object when the sigio itself is being destroyed. In order to do this in the most unintrusive manner change pgsigio's sigio * argument into a **, that way we can lock internally to the function.
# 94608	13-Apr-2002	tmm	Use pmap_extract() instead of pmap_kextract() to retrieve the physical address associated with a user virtual address in pipe_build_write_buffer(). Reviewed by: alc
# 94566	12-Apr-2002	tmm	Back out the last revision - it does not work correctly when one of the pages in question is not in the top-level vm object, but in one of the shadow ones. Pointed out by: alc Pointy hat to: tmm
# 94539	12-Apr-2002	tmm	Do not use pmap_kextract() to find out the physical address of a user belong to a user virtual address; while this happens to work on some architectures, it can't on sparc64, since user and kernel virtual address spaces overlap there (the distinction between them is done via separate address space identifiers). Instead, look up the page in the vm_map of the process in question. Reviewed by: jake
# 93818	04-Apr-2002	jhb	Change callers of mtx_init() to pass in an appropriate lock type name. In most cases NULL is passed, but in some cases such as network driver locks (which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used. Tested on: i386, alpha, sparc64
# 93296	27-Mar-2002	alc	Allow resursion on the pipe mutex because filt_piperead() and filt_pipewrite() can be called both with and without the pipe mutex held. (For example, if called by pipeselwakeup(), it is held. Whereas, if called by kqueue_scan(), it is not.) Reviewed by: alfred
# 92959	22-Mar-2002	alfred	When "cloning" a pipe's buffer bcopy the data after dropping the pipe's lock as the data may be paged out and cause a fault.
# 92751	20-Mar-2002	jeff	Remove references to vm_zone.h and switch over to the new uma API. Also, remove maxsockets. If you look carefully you'll notice that the old zone allocator never honored this anyway.
# 92654	19-Mar-2002	jeff	This is the first part of the new kernel memory allocator. This replaces malloc(9) and vm_zone with a slab like allocator. Reviewed by: arch@
# 92305	15-Mar-2002	alfred	Bug fixes: Missed a place where the pipe sleep lock was needed in order to safely grab Giant, fix it and add an assertion to make sure this doesn't happen again. Fix typos in the PIPE_GET_GIANT/PIPE_DROP_GIANT that could cause the wrong mutex to get passed to PIPE_LOCK/PIPE_UNLOCK. Fix a location where the wrong pipe was being passed to PIPE_GET_GIANT/PIPE_DROP_GIANT.
# 91968	09-Mar-2002	alfred	Don't deref NULL mutex pointer when pipeclose()'ing a pipe that is not fully instaniated. Revert the logic in pipeclose so that we don't have the entire function pretty much under a single if() statement, instead invert the test and just return if it fails. Submitted (in different form) by: bde Don't use pool mutexes for pipes. We can not use pool mutexes because we will need to grab the select lock while holding a pipe lock which is not allowed because you may not aquire additional mutexes when holding a pool mutex. Instead malloc(9) space for the mutex that is shared between the pipes.
# 91653	04-Mar-2002	tanimura	Track the number of wired pages to avoid unwiring unwired pages. Reviewed by: alfred
# 91413	27-Feb-2002	alfred	kill __P.
# 91412	27-Feb-2002	alfred	add assertions in the places where giant is required to catch when the pipe is locked and shouldn't be. initialize pipe->pipe_mtxp to NULL when creating pipes in order not to trip the above assertions. swap pipe lock with giant around calls to pipe_destroy_write_buffer() pipe_destroy_write_buffer issue noticed by: jhb
# 91395	27-Feb-2002	alfred	Fix a NULL deref panic in pipe_write, we can't blindly lock pipe->pipe_peer->pipe_mtxp because it may be NULL, so lock the passed in pipe's mutex instead.
# 91372	27-Feb-2002	alfred	MPsafe fixes: use SYSINIT to initialize pipe_zone. use PIPE_LOCK to protect kevent ops.
# 91362	27-Feb-2002	alfred	First rev at making pipe(2) pipe's MPsafe. Both ends of the pipe share a pool_mutex, this makes allocation and deadlock avoidance easy. Remove some un-needed FILE_LOCK ops while I'm here. There are some issues wrt to select and the f{s,g}etown code that we'll have to deal with, I think we may also need to move the calls to vfs_timestamp outside of the sections covered by PIPE_LOCK.
# 89306	13-Jan-2002	alfred	SMP Lock struct file, filedesc and the global file list. Seigo Tanimura (tanimura) posted the initial delta. I've polished it quite a bit reducing the need for locking and adapting it for KSE. Locks: 1 mutex in each filedesc protects all the fields. protects "struct file" initialization, while a struct file is being changed from &badfileops -> &pipeops or something the filedesc should be locked. 1 mutex in each struct file protects the refcount fields. doesn't protect anything else. the flags used for garbage collection have been moved to f_gcflag which was the FILLER short, this doesn't need locking because the garbage collection is a single threaded container. could likely be made to use a pool mutex. 1 sx lock for the global filelist. struct file * fhold(struct file fp); / increments reference count on a file / struct file fhold_locked(struct file fp); / like fhold but expects file to locked / struct file ffind_hold(struct thread , int fd); / finds the struct file in thread, adds one reference and returns it unlocked / struct file ffind_lock(struct thread , int fd); / ffind_hold, but returns file locked */ I still have to smp-safe the fget cruft, I'll get to that asap.
# 86598	19-Nov-2001	sobomax	Make kevents on pipes work as described in the manpage - when the last reader/writer disconnects, ensure that anybody who is waiting for the kevent on the other end of the pipe gets EV_EOF. MFC after: 2 weeks
# 83805	21-Sep-2001	jhb	Use the passed in thread to selrecord() instead of curthread.
# 83366	12-Sep-2001	julian	KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to teh previousl -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doosie!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha
# 79225	04-Jul-2001	dillon	cleanup: GIANT macros, rename DEPRECIATE to DEPRECATE Move p_giant_optional to proc zero'd section Remove (old) XXX zfree comment in pipe code
# 79224	04-Jul-2001	dillon	With Alfred's permission, remove vm_mtx in favor of a fine-grained approach (this commit is just the first stage). Also add various GIANT_ macros to formalize the removal of Giant, making it easy to test in a more piecemeal fashion. These macros will allow us to test fine-grained locks to a degree before removing Giant, and also after, and to remove Giant in a piecemeal fashion via sysctl's on those subsystems which the authors believe can operate without Giant.
# 78292	15-Jun-2001	jlemon	Correctly hook up the write kqfilter to pipes. Submitted by: Niels Provos <provos@citi.umich.edu>
# 77676	04-Jun-2001	dillon	The pipe_write() code was locking the pipe without busying it first in certain cases, and a close() by another process could potentially rip the pipe out from under the (blocked) locking operation. Reported-by: Alexander Viro <viro@math.psu.edu>
# 77140	24-May-2001	alfred	whitespace/style
# 77035	23-May-2001	alfred	aquire vm_mutex a little bit earlier to protect a pmap call.
# 76940	21-May-2001	jhb	- Assert that the vm mutex is held in pipe_free_kmem(). - Don't release the vm mutex early in pipespace() but instead hold it across vm_object_deallocate() if vm_map_find() returns an error and across pipe_free_kmem() if vm_map_find() succeeds. - Add a XXX above a zfree() since zalloc already has its own locking, one would hope that zfree() wouldn't need the vm lock.
# 76827	18-May-2001	alfred	Introduce a global lock for the vm subsystem (vm_mtx). vm_mtx does not recurse and is required for most low level vm operations. faults can not be taken without holding Giant. Memory subsystems can now call the base page allocators safely. Almost all atomic ops were removed as they are covered under the vm mutex. Alpha and ia64 now need to catch up to i386's trap handlers. FFS and NFS have been tested, other filesystems will need minor changes (grabbing the vm lock when twiddling page properties). Reviewed (partially) by: jake, jhb
# 76760	17-May-2001	alfred	Cleanup Remove comment about setting error for reads on EOF, read returns 0 on EOF so the code should be ok. Remove non-effective priority boost, PRIO+1 doesn't do anything (according to McKusick), if a real priority boost is needed it should have been +4. Style fixes: .) return foo -> return (foo) .) FLAG1\|FlAG2 -> FLAG1 \| FlAG2 .) wrap long lines .) unwrap short lines .) for(i=0;i=foo;i++) -> for (i = 0; i=foo; i++) .) remove braces for some conditionals with a single statement .) fix continuation lines. md5 couldn't verify the binary because some code had to be shuffled around to address the style issues.
# 76756	17-May-2001	alfred	initialize pipe pointers
# 76754	17-May-2001	alfred	pipe_create has to zero out the select record earlier to avoid returning a half-initialized pipe and causing pipeclose() to follow a junk pointer. Discovered by: "Nick S" <snicko@noid.org>
# 76364	08-May-2001	alfred	Remove an 'optimization' I hope to never see again. The pipe code could not handle running out of kva, it would panic if that happened. Instead return ENFILE to the application which is an acceptable error return from pipe(2). There was some slightly tricky things that needed to be worked on, namely that the pipe code can 'realloc' the size of the buffer if it detects that the pipe could use a bit more room. However if it failed the reallocation it could not cope and would panic. Fix this by attempting to grow the pipe while holding onto our old resources. If all goes well free the old resources and use the new ones, otherwise continue to use the smaller buffer already allocated. While I'm here add a few blank lines for style(9) and remove 'register'.
# 76166	01-May-2001	markm	Undo part of the tangle of having sys/lock.h and sys/mutex.h included in other "system" header files. Also help the deprecation of lockmgr.h by making it a sub-include of sys/lock.h and removing sys/lockmgr.h form kernel .c files. Sort sys/*.h includes where possible in affected files. OK'ed by: bde (with reservations)
# 72521	15-Feb-2001	jlemon	Extend kqueue down to the device layer. Backwards compatible approach suggested by: peter
# 70915	10-Jan-2001	dwmalone	Style improvements for last fix. Should be functionally the same. Submitted by: bde
# 70834	09-Jan-2001	wollman	select() DKI is now in <sys/selinfo.h>.
# 70803	08-Jan-2001	dwmalone	If we failed to allocate the file discriptor for the write end of the pipe, then we were corrupting the pipe_zone free list by calling pipeclose on rpipe twice. NULL out rpipe to avoid this. Reviewed by: dillon Reviewed by: iedowse
# 68883	18-Nov-2000	dillon	This patchset fixes a large number of file descriptor race conditions. Pre-rfork code assumed inherent locking of a process's file descriptor array. However, with the advent of rfork() the file descriptor table could be shared between processes. This patch closes over a dozen serious race conditions related to one thread manipulating the table (e.g. closing or dup()ing a descriptor) while another is blocked in an open(), close(), fcntl(), read(), write(), etc... PR: kern/11629 Discussed with: Alexander Viro <viro@math.psu.edu>
# 65855	14-Sep-2000	jlemon	Pipes are not writeable while a direct write is in progress. However, the kqueue filter got the sense of the test reversed, so fix it. Spotted by: Michael Elkins <me@sigpipe.org>
# 60938	26-May-2000	jake	Back out the previous change to the queue(3) interface. It was not discussed and should probably not happen. Requested by: msmith and others
# 60833	23-May-2000	jake	Change the way that the queue(3) structures are declared; don't assume that the type argument to _HEAD and _ENTRY is a struct. Suggested by: phk Reviewed by: phk Approved by: mdodd
# 60404	11-May-2000	chris	Include UID and GID information for stat() calls using the values filled into the file descriptor data by falloc(). Reviewed by: phk
# 59288	16-Apr-2000	jlemon	Introduce kqueue() and kevent(), a kernel event notification facility.
# 58505	23-Mar-2000	dillon	Fix in-kernel infinite loop in pipe_write() when the reader goes away at just the wrong time.
# 55112	26-Dec-1999	bde	Use vfs_timestamp() instead of getnanotime() to set timestamps. This fixee incoherency of pipe timestamps relative to file timestamps in the usual case where getnanotime() is not used for the latter. (File and pipe timestamps are still incoherent relative to real time unless the vfs_timestamp_precision sysctl is set to 2 or 3).
# 54534	13-Dec-1999	tegge	Fix two problems with pipe_write(): 1. Data written beyond end of pipe buffer, causing kernel memory corruption. - Check that space is still valid after obtaining the pipe lock. - Defer the calculation of transfer size until the pipe lock has been obtained. - Update the pipe buffer pointers while holding the pipe lock. 2. Writes of size <= PIPE_BUF not always atomic. - Allow an internal write to span two contiguous segments, so writes of size <= PIPE_BUF can be kept atomic when wrapping around from the end to the start of the pipe buffer. PR: 15235 Reviewed by: Matt Dillon <dillon@FreeBSD.org>
# 52983	08-Nov-1999	peter	Update pipe code for fo_stat() entry point - pipe_stat() is now no longer used outside the pipe code.
# 52635	29-Oct-1999	phk	useracc() the prequel: Merge the contents (less some trivial bordering the silly comments) of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>. This puts the #defines for the vm_inherit_t and vm_prot_t types next to their typedefs. This paves the road for the commit to follow shortly: change useracc() to use VM_PROT_{READ\|WRITE} rather than B_{READ\|WRITE} as argument.
# 51474	20-Sep-1999	dillon	Fix bug in pipe code relating to writes of mmap'd but illegal address spaces which cross a segment boundry in the page table. pmap_kextract() is not designed for access to the user space portion of the page table and cannot handle the null-page-directory-entry case. The fix is to have vm_fault_quick() return a success or failure which is then used to avoid calling pmap_kextract().
# 51418	19-Sep-1999	green	This is what was "fdfix2.patch," a fix for fd sharing. It's pretty far-reaching in fd-land, so you'll want to consult the code for changes. The biggest change is that now, you don't use fp->f_ops->fo_foo(fp, bar) but instead fo_foo(fp, bar), which increments and decrements the fp refcount upon entry and exit. Two new calls, fhold() and fdrop(), are provided. Each does what it seems like it should, and if fdrop() brings the refcount to zero, the fd is freed as well. Thanks to peter ("to hell with it, it looks ok to me.") for his review. Thanks to msmith for keeping me from putting locks everywhere :) Reviewed by: peter
# 50477	27-Aug-1999	peter	$Id$ -> $FreeBSD$
# 49413	04-Aug-1999	green	Fix fd race conditions (during shared fd table usage.) Badfileops is now used in f_ops in place of NULL, and modifications to the files are more carefully ordered. f_ops should also be set to &badfileops upon "close" of a file. This does not fix other problems mentioned in this PR than the first one. PR: 11629 Reviewed by: peter
# 47748	05-Jun-1999	alc	Restructure pipe_read in order to eliminate several race conditions. Submitted by: Matthew Dillon <dillon@apollo.backplane.com> and myself
# 45311	04-Apr-1999	dt	Add standard padding argument to pread and pwrite syscall. That should make them NetBSD compatible. Add parameter to fo_read and fo_write. (The only flag FOF_OFFSET mean that the offset is set in the struct uio). Factor out some common code from read/pread/write/pwrite syscalls.
# 43623	04-Feb-1999	dillon	Fix race in pipe read code whereby a blocked lock can allow another process to sneak in and write to or close the pipe. The read code enters a 'piperd' state after doing the lock operation without checking to see if the state changed, which can cause the process to wait forever. The code has also been documented more.
# 43311	27-Jan-1999	dillon	Fix warnings in preparation for adding -Wall -Wcast-qual to the kernel compile
# 43301	27-Jan-1999	dillon	Fix warnings in preparation for adding -Wall -Wcast-qual to the kernel compile
# 43278	27-Jan-1999	bde	Include <sys/select.h> -- don't depend on pollution in <sys/proc.h>.
# 41591	07-Dec-1998	archie	The "easy" fixes for compiling the kernel -Wunused: remove unreferenced static and local variables, goto labels, and functions declared but not defined.
# 41086	11-Nov-1998	truckman	Installed the second patch attached to kern/7899 with some changes suggested by bde, a few other tweaks to get the patch to apply cleanly again and some improvements to the comments. This change closes some fairly minor security holes associated with F_SETOWN, fixes a few bugs, and removes some limitations that F_SETOWN had on tty devices. For more details, see the description on the PR. Because this patch increases the size of the proc and pgrp structures, it is necessary to re-install the includes and recompile libkvm, the vinum lkm, fstat, gcore, gdb, ipfilter, ps, top, and w. PR: kern/7899 Reviewed by: bde, elvind
# 40700	28-Oct-1998	dg	Added a second argument, "activate" to the vm_page_unwire() call so that the caller can select either inactive or active queue to put the page on.
# 40286	13-Oct-1998	dg	Fixed two potentially serious classes of bugs: 1) The vnode pager wasn't properly tracking the file size due to "size" being page rounded in some cases and not in others. This sometimes resulted in corrupted files. First noticed by Terry Lambert. Fixed by changing the "size" pager_alloc parameter to be a 64bit byte value (as opposed to a 32bit page index) and changing the pagers and their callers to deal with this properly. 2) Fixed a bogus type cast in round_page() and trunc_page() that caused some 64bit offsets and sizes to be scrambled. Removing the cast required adding casts at a few dozen callers. There may be problems with other bogus casts in close-by macros. A quick check seemed to indicate that those were okay, however.
# 36735	07-Jun-1998	dfr	This commit fixes various 64bit portability problems required for FreeBSD/alpha. The most significant item is to change the command argument to ioctl functions from int to u_long. This change brings us inline with various other BSD versions. Driver writers may like to use (__FreeBSD_version == 300003) to detect this change. The prototype FreeBSD/alpha machdep will follow in a couple of days time.
# 34924	28-Mar-1998	bde	Moved some #includes from <sys/param.h> nearer to where they are actually used.
# 34901	26-Mar-1998	phk	Add two new functions, get{micro\|nano}time. They are atomic, but return in essence what is in the "time" variable. gettime() is now a macro front for getmicrotime(). Various patches to use the two new functions instead of the various hacks used in their absence. Some puntuation and grammer patches from Bruce. A couple of XXX comments.
# 33181	09-Feb-1998	eivind	Staticize.
# 33134	06-Feb-1998	eivind	Back out DIAGNOSTIC changes.
# 33108	04-Feb-1998	eivind	Turn DIAGNOSTIC into a new-style option.
# 31016	07-Nov-1997	phk	Remove a bunch of variables which were unused both in GENERIC and LINT. Found by: -Wunused
# 30994	06-Nov-1997	phk	Move the "retval" (3rd) parameter from all syscall functions and put it in struct proc instead. This fixes a boatload of compiler warning, and removes a lot of cruft from the sources. I have not removed the /ARGSUSED/, they will require some looking at. libkvm, ps and other userland struct proc frobbing programs will need recompiled.
# 30164	06-Oct-1997	peter	Ack! Fix excessive cut/paste blunder during poll mods. Who had the pointy hat last? :-] When one is selecting (or polling) for write, it helps if we use the write side of the pipe when requesting wakeups instead of the read side. This broke ghostview (at least) - I'm suprised it wasn't noticed for so long. Reviewed by: Greg Lehey <grog@lemis.com>
# 29356	14-Sep-1997	peter	Implement the poll backend for the pipe file type.
# 29041	02-Sep-1997	bde	Removed unused #includes.
# 27923	05-Aug-1997	dyson	Another attempt at cleaning up the new memory allocator.
# 27900	04-Aug-1997	dyson	Fix up come cruft that I left on a previous commit.
# 27899	04-Aug-1997	dyson	Get rid of the ad-hoc memory allocator for vm_map_entries, in lieu of a simple, clean zone type allocator. This new allocator will also be used for machine dependent pmap PV entries.
# 24752	09-Apr-1997	bde	Removed support for OLD_PIPE. <sys/stat.h> is now missing the hack that supported nameless pipes being indistinguishable from fifos. We're not going back.
# 24206	24-Mar-1997	bde	Don't include <sys/ioctl.h> in the kernel. Stage 4: include <sys/ttycom.h> and sometimes <sys/filio.h> instead of <sys/ioctl.h> in miscellaneous files. Most of these files have nothing to do with ttys but need to include <sys/ttycom.h> to get the definitions of TIOC[SG]PGRP which are (ab)used to convert F[SG]ETOWN fcntls into ioctls.
# 24131	23-Mar-1997	bde	Don't #include <sys/fcntl.h> in <sys/file.h> if KERNEL is defined. Fixed everything that depended on getting fcntl.h stuff from the wrong place. Most things don't depend on file.h stuff at all.
# 24101	22-Mar-1997	bde	Fixed some invalid (non-atomic) accesses to `time', mostly ones of the form `tv = time'. Use a new function gettime(). The current version just forces atomicicity without fixing precision or efficiency bugs. Simplified some related valid accesses by using the central function.
# 22975	22-Feb-1997	peter	Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not ready for it yet.
# 22521	10-Feb-1997	dyson	This is the kernel Lite/2 commit. There are some requisite userland changes, so don't expect to be able to run the kernel as-is (very well) without the appropriate Lite/2 userland changes. The system boots and can mount UFS filesystems. Untested: ext2fs, msdosfs, NFS Known problems: Incorrect Berkeley ID strings in some files. Mount_std mounts will not work until the getfsent library routine is changed. Reviewed by: various people Submitted by: Jeffery Hsu <hsu@freebsd.org>
# 21673	14-Jan-1997	jkh	Make the long-awaited change from $Id$ to $FreeBSD$ This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long. Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.
# 18863	11-Oct-1996	dyson	Mostly some fixes from bde to start support for ASYNC I/O (SIGIO). Submitted by: bde
# 17163	13-Jul-1996	dyson	A few minor mods (improvements) to support more efficient pipe operations for large transfers. There are essentially no differences for small transfers, but big transfers should perform about 20% better.
# 17124	12-Jul-1996	bde	Staticized some variables. Fixed initialization of pipe_pgid - don't default to pid 0 (swapper) for SIGIO. Added comments about other implicit initializations, mostly for struct stat. Fixed initialization of st_mode. S_IFSOCK was for when pipes were sockets. It is probably safe to fix the bogus S_ISFIFO() now that pipes can be distinguished from sockets in all cases. Don't return ENOSYS for inappropriate ioctls.
# 16960	04-Jul-1996	dyson	Get rid of PIPE_NBIO, cleaning up the code a bit. Reviewed by: bde
# 16416	17-Jun-1996	dyson	Disable direct writes for non-blocking output.
# 16322	12-Jun-1996	gpalmer	Clean up -Wunused warnings. Reviewed by: bde
# 14802	24-Mar-1996	dyson	Various pipe error return fixes, and a significant typeo fix. From Bruce Evans (of course :-)). Submitted by: bde
# 14644	17-Mar-1996	dyson	Yet another fix from BDE for the new pipe code. This fixes a potential deadlock due to mismanagement of busy counters. Reviewed by: dyson Submitted by: bde
# 14177	22-Feb-1996	dyson	Fix a problem that select did not work with direct writes. Make wakeup channels more consistant also.
# 14122	17-Feb-1996	peter	Add missing prototype for pipeselwakeup (a recently added function) - gcc bitches about it..
# 14037	11-Feb-1996	dyson	Add ifdefs for non-freebsd system usage. Add missing select wakeups, and make the select wakup code a little neater.
# 13992	09-Feb-1996	dyson	Add some missing requests for the read-side to wakeup the write-side. Also add some missing wakeups by the write side to the read side.
# 13951	07-Feb-1996	dyson	Apparent fix for a pipe hang problem.
# 13913	05-Feb-1996	dyson	More fixes from bde. Only modify times on success. splhigh() around time variable usage. Make atomic writes more posix compliant. Spelling errors. Submitted by: bde
# 13912	05-Feb-1996	dyson	Kva space allocated for direct buffer wasn't quite big enough. The system can panic easily without this patch.
# 13909	04-Feb-1996	dyson	Changed vm_fault_quick in vm_machdep.c to be global. Needed for new pipe code.
# 13907	04-Feb-1996	dyson	Improve the performance for pipe(2) again. Also include some fixes for previous version of new pipes from Bruce Evans. This new version: Supports more properly the semantics of select (BDE). Supports "OLD_PIPE" correctly (kern_descrip.c, BDE). Eliminates incorrect EPIPE returns (bash 'pipe broken' messages.) Much faster yet, currently tuned relatively conservatively -- but now gives approx 50% more perf than the new pipes code did originally. (That was about 50% more perf than the original BSD pipe code.) Known bugs outstanding: No support for async io (SIGIO). Will be included soon. Next to do: Merge support for FIFOs. Submitted by: bde
# 13776	31-Jan-1996	dyson	Fix another problem with the new pipe code, pointed out by Bruce Evans. This one fixes a problem with interactions with signals.
# 13774	31-Jan-1996	dyson	Fix some problems with return codes on the new pipe stuff. Bruce Evans found the problems, and this commit will fix the "first batch" :-).
# 13688	29-Jan-1996	dyson	Fixed an uninitialized variable (argument to vm_map_find) -- problem that DG detected, and promptly found a fix. Submitted by: davidg
# 13675	28-Jan-1996	dyson	Added new files to support the new fast pipes. After the follow-on commits, pipe performance should increase significantly. The pipe(2) system call is currently supported, while fifofs will be added later.