History log of /freebsd-10.0-release/sys/kern/sys_pipe.c
Revision Date Author Comments
# 259065 07-Dec-2013 gjb

- Copy stable/10 (r259064) to releng/10.0 as part of the
10.0-RELEASE cycle.
- Update __FreeBSD_version [1]
- Set branch name to -RC1

[1] 10.0-CURRENT __FreeBSD_version value ended at '55', so
start releng/10.0 at '100' so the branch is started with
a value ending in zero.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation

# 256281 10-Oct-2013 gjb

Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


# 255426 09-Sep-2013 jhb

Add a mmap flag (MAP_32BIT) on 64-bit platforms to request that a mapping use
an address in the first 2GB of the process's address space. This flag should
have the same semantics as the same flag on Linux.

To facilitate this, add a new parameter to vm_map_find() that specifies an
optional maximum virtual address. While here, fix several callers of
vm_map_find() to use a VMFS_* constant for the findspace argument instead of
TRUE and FALSE.

Reviewed by: alc
Approved by: re (kib)


# 254356 15-Aug-2013 glebius

Make sendfile() a method in the struct fileops. Currently only
vnode backed file descriptors have this method implemented.

Reviewed by: kib
Sponsored by: Nginx, Inc.
Sponsored by: Netflix


# 250159 01-May-2013 jilles

Add pipe2() system call.

The pipe2() function is similar to pipe() but allows setting FD_CLOEXEC and
O_NONBLOCK (on both sides) as part of the same call.

If p points to two writable ints, pipe2(p, 0) is equivalent to pipe(p).

If the pointer is not valid, behaviour differs: pipe2() writes into the
array from the kernel like socketpair() does, while pipe() writes into the
array from an architecture-specific assembler wrapper.

Reviewed by: kan, kib
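A minimal userland sketch of the pipe2() call described in r250159. It assumes FreeBSD 10 or later, or a glibc-based system; the helper names are hypothetical. It requests FD_CLOEXEC and O_NONBLOCK when the pipe is created and then checks both ends with fcntl(2).

```c
#define _GNU_SOURCE          /* needed for the pipe2() declaration with glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Create a pipe whose two ends are both close-on-exec and non-blocking,
 * setting the flags atomically at creation time.
 * Returns 0 on success, -1 with errno set on failure. */
static int make_cloexec_nonblocking_pipe(int fds[2])
{
    return pipe2(fds, O_CLOEXEC | O_NONBLOCK);
}

/* Returns 1 if fd has both FD_CLOEXEC and O_NONBLOCK set, 0 otherwise. */
static int has_cloexec_and_nonblock(int fd)
{
    int fdflags = fcntl(fd, F_GETFD);
    int flflags = fcntl(fd, F_GETFL);

    if (fdflags == -1 || flflags == -1)
        return 0;
    return (fdflags & FD_CLOEXEC) != 0 && (flflags & O_NONBLOCK) != 0;
}

/* Reading from an empty non-blocking pipe fails with EAGAIN
 * instead of blocking the caller. */
static int empty_read_would_block(int rfd)
{
    char c;

    return read(rfd, &c, 1) == -1 && errno == EAGAIN;
}
```

Setting the flags at creation avoids the window in which another thread could fork and exec between pipe() and a separate fcntl() call. Passing 0 as the flags argument behaves like pipe(fds) for a valid array.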


# 248951 31-Mar-2013 jilles

Rename do_pipe() to kern_pipe2() and declare it properly.


# 246907 17-Feb-2013 pjd

Remove redundant space.


# 238936 31-Jul-2012 davidxu

I compared the current pipe code with the code in 8.3-STABLE r236165.
8.3 is a historical BSD version that uses sockets to implement FIFO
pipes; it compares a per-file seqcount with a writer generation
stored in the per-pipe object. The idea is that once all writers are gone,
the pipe enters the next generation, and all old readers that have not
closed the pipe should get an indication that the pipe is disconnected:
they should get EPIPE or SIGPIPE, or POLLHUP from poll().
But a newcomer should not learn that the previous writers are gone; it
should treat the pipe as a fresh session.
I am trying to restore the historical behavior of FIFO pipes. It is still
unclear whether a single EOF flag can represent both SBS_CANTSENDMORE and
SBS_CANTRCVMORE, which the socket-based version uses, but I have run
the poll regression test in the tools directory, and its output now matches
the output on 8.3-STABLE.
I think the output "not ok 18 FIFO state 6b: poll result 0 expected 1.
expected POLLHUP; got 0" might be bogus, because a newcomer should not
know that old writers were gone. I got the same behavior on Linux.
Our implementation always returns POLLIN for a disconnected pipe even when it
should return POLLHUP, but I think removing POLLIN would be unwise for
compatibility reasons; this is our historical behavior.

Regression test: /usr/src/tools/regression/poll
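The generation scheme described in r238936 can be modeled in a few lines. This is a hypothetical, simplified model: the struct and function names are invented, and the real code tracks more state. It only shows why an old reader sees a disconnect while a newcomer sees a fresh session.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of FIFO writer generations; not kernel structures. */
struct fifo_model {
    unsigned wgen;     /* bumped each time the last writer goes away */
    int writers;       /* currently open writer descriptors */
};

struct reader_model {
    unsigned seen_gen; /* generation snapshot taken when the reader opened */
};

static void writer_open(struct fifo_model *f)
{
    f->writers++;
}

static void writer_close(struct fifo_model *f)
{
    assert(f->writers > 0);
    if (--f->writers == 0)
        f->wgen++;     /* all writers gone: enter the next generation */
}

static struct reader_model reader_open(const struct fifo_model *f)
{
    struct reader_model r = { f->wgen };
    return r;
}

/* In this simplification, a reader sees a hangup (EOF/POLLHUP) when no
 * writer is present and its generation has ended. A newcomer that opens
 * after the writers left shares the current generation, so it does not. */
static bool reader_sees_hangup(const struct fifo_model *f,
                               const struct reader_model *r)
{
    return f->writers == 0 && r->seen_gen != f->wgen;
}
```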


# 238928 31-Jul-2012 davidxu

When a thread is blocked in the direct write state, it sets only the
PIPE_DIRECTW flag, not PIPE_WANTW. The FIFO pipe code does not understand
this internal state: when a FIFO peer reader closes the pipe and wants to
notify the writer, it checks PIPE_WANTW and, if it is not set, skips calling
wakeup(). The blocked writer therefore never notices the close, although in
general the writer should return from the syscall with EPIPE and may
receive SIGPIPE. Setting PIPE_WANTW fixes the problem; turning off direct
writes should fix it too. This bug was reported in PR/170203.

Another bug in the FIFO pipe code: when the peer closes the pipe, the other
end, blocked in select() or poll(), is not notified, because the code did
not call pipeselwakeup().

A third problem was found by the poll regression test: the existing code
cannot pass tests 6b, 6c, and 6d, although FreeBSD 4 passes them. This
commit does not fix that problem; I still need to investigate the cause.

PR: 170203
Tested by: Garrett Cooper <yanegomi at gmail dot com>
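A hypothetical model of the missed wakeup described in r238928. The flag names are stand-ins, not the kernel's PIPE_* values, and the closing reader's logic is reduced to the single PIPE_WANTW test.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values, not the kernel's PIPE_* constants. */
enum {
    MODEL_WANTW   = 0x1,   /* a writer is sleeping and wants a wakeup */
    MODEL_DIRECTW = 0x2,   /* a writer is blocked in a direct write */
};

/* The closing FIFO reader's check, reduced to the one test described
 * above: a wakeup is issued only if MODEL_WANTW is set. */
static bool closer_wakes_writer(unsigned writer_flags)
{
    return (writer_flags & MODEL_WANTW) != 0;
}

/* Flags a writer leaves set while blocked in direct-write mode:
 * only MODEL_DIRECTW before the fix, MODEL_DIRECTW | MODEL_WANTW after. */
static unsigned direct_writer_flags(bool fixed)
{
    return fixed ? (MODEL_DIRECTW | MODEL_WANTW) : MODEL_DIRECTW;
}
```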


# 234352 16-Apr-2012 jkim

- Implement the pipe2 syscall for the Linuxulator. This syscall appeared in 2.6.27,
but GNU libc used it without checking the kernel version, e.g., on Fedora 10.
- Move pipe(2) implementation for Linuxulator from MD files to MI file,
sys/compat/linux/linux_file.c. There is no MD code for this syscall at all.
- Correct an argument type for pipe() from l_ulong * to l_int *. Probably
this was the source of MI/MD confusion.

Reviewed by: emulation


# 232821 11-Mar-2012 kib

Remove fifo.h. The only used function declaration from the header is
migrated to sys/vnode.h.

Submitted by: gianni


# 232641 07-Mar-2012 kib

pipe_poll() performs lockless access to the vnode to test the
fifo_iseof() condition, while v_fifoinfo can be reset and freed
concurrently by fifo_cleanup().

Precalculate EOF at the places where fo_wgen is changed, and cache the
state in a new pipe state flag PIPE_SAMEWGEN.

Reported and tested by: bf
Submitted by: gianni
MFC after: 1 week (a backport)


# 232495 04-Mar-2012 kib

pipe_read(): change the type of size to int, and remove the signed clamp.
pipe_write(): change the type of desiredsize back to int, since its value fits.

Requested by: bde
MFC after: 3 weeks


# 232271 28-Feb-2012 dim

Change definition of pipe_chmod() from K&R to C99, to avoid the
following clang warning:

sys/kern/sys_pipe.c:1556:10: error: promoted type 'int' of K&R function parameter is not compatible with the parameter type 'mode_t'
(aka 'unsigned short') declared in a previous prototype [-Werror]
mode_t mode;
^
sys/kern/sys_pipe.c:155:19: note: previous declaration is here
static fo_chmod_t pipe_chmod;
^


# 232183 26-Feb-2012 jilles

Fix fchmod() and fchown() on fifos.

The new fifo implementation in r232055 broke fchmod() and fchown() on fifos.
Postfix needs this.

Submitted by: gianni
Reported by: dougb


# 232055 23-Feb-2012 kmacy

merge pipe and fifo implementations

Also reviewed by: jhb, jilles (initial revision)
Tested by: pho, jilles

Submitted by: gianni
Reviewed by: bde


# 231949 20-Feb-2012 kib

Fix the places found where uio_resid is truncated to int.

Add the sysctl debug.iosize_max_clamp, enabled by default. Setting the
sysctl to zero allows usermode to perform I/O requests of up to
SSIZE_MAX bytes.

Discussed with: bde, das (previous versions)
MFC after: 1 month


# 228510 14-Dec-2011 jilles

Fix select/poll/kqueue for write on reverse direction before first write.

The reverse direction of a pipe is lazily allocated on the first write in
that direction (because pipes are usually used in one direction only). A
special case is needed to ensure the pipe appears writable before the first
write because there are 0 bytes of pending data in 0 bytes of buffer space
at that point, leaving 0 bytes of data that can be written with the normal
code.

Note that the first write returns [ENOMEM] if kern.ipc.maxpipekva is
exceeded and does not block or return [EAGAIN], so selecting true for write
is correct even in that case.

PR: kern/93685
Submitted by: gianni
MFC after: 2 weeks
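A simplified sketch of the special case added in r228510, assuming a pipe direction's buffer can be represented by just its size and byte count. The struct and constant are hypothetical; the threshold stands in for the minimum free space the poll code requires, assumed here to be 512 bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed minimum free space before a direction is reported writable
 * (assumed to correspond to PIPE_BUF in the kernel's check). */
#define SKETCH_MIN_WRITE_SPACE 512

/* Hypothetical, simplified view of one pipe direction's buffer. */
struct pipebuf_model {
    size_t size;  /* allocated buffer bytes; 0 until the first write */
    size_t cnt;   /* bytes currently queued, never more than size */
};

/* Writability as reported to select/poll/kqueue. The normal rule
 * (enough free space) fails for a never-written direction: 0 bytes
 * queued in a 0-byte buffer leaves 0 bytes of space. The unallocated
 * buffer is therefore special-cased as writable, because the first
 * write allocates it (or fails with ENOMEM rather than blocking). */
static bool direction_writable(const struct pipebuf_model *b)
{
    assert(b->cnt <= b->size);
    if (b->size == 0)
        return true;
    return b->size - b->cnt >= SKETCH_MIN_WRITE_SPACE;
}
```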


# 228306 06-Dec-2011 kib

Most users of pipe(2) do not call fstat(2) on the returned pipe descriptors.
Optimize for this case by lazily allocating the pipe inode number at
fstat(2) time. If alloc_unr(9) returns failure, do not fail fstat(2), since
uses of inode numbers are even rarer than fstat(2) calls, but provide a zero
inode number forever. Note that alloc_unr() failure is unlikely because the
total number of pipes in the system is limited by the number of file descriptors.

Based on the submission by: gianni
MFC after: 2 weeks
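A hypothetical sketch of the lazy allocation in r228306. The allocator is a stand-in for alloc_unr(9) with an artificially small limit, and all names are invented. The first fstat() allocates, later calls reuse the cached value, and a failed allocation reports 0 from then on.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-pipe state for a lazily assigned inode number. */
struct pipe_ino_model {
    uint64_t ino;   /* 0 until the first fstat(); stays 0 if allocation failed */
    int tried;      /* nonzero once an allocation has been attempted */
};

/* Stand-in for alloc_unr(9): hands out 1, 2, 3 and then reports
 * exhaustion with -1 (the tiny limit is artificial). */
static int next_unit = 1;
static const int unit_limit = 3;

static int fake_alloc_unr(void)
{
    return next_unit <= unit_limit ? next_unit++ : -1;
}

/* fstat()-time lookup: allocate on first use only. A failed allocation
 * is not an fstat() error; the pipe simply reports inode 0 from then on. */
static uint64_t pipe_stat_ino(struct pipe_ino_model *p)
{
    if (!p->tried) {
        int u = fake_alloc_unr();

        p->tried = 1;
        p->ino = (u == -1) ? 0 : (uint64_t)u;
    }
    return p->ino;
}
```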


# 228178 01-Dec-2011 kib

If the alloc_unr() call in pipe_create() failed, then pipe->pipe_ino is
-1. But, because ino_t is unsigned, this case was not covered by the
test ino > 0 in pipeclose(), leading to free_unr(-1). Fix it by
explicitly comparing with 0 and -1. [1]

Do not access freed memory: the inode number was cached to prevent access
to cpipe after it was possibly freed, but I failed to commit the right
patch.

Noted by: gianni [1]
Pointy hat to: kib
MFC after: 3 days
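The comparison bug fixed in r228178 can be reproduced in isolation. This sketch assumes only that ino_t is an unsigned type, which holds on FreeBSD and glibc; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>   /* ino_t, an unsigned type */

/* Pre-fix style test: for an unsigned ino_t, (ino_t)-1 is the largest
 * value, so "ino > 0" is also true for the failure marker. */
static bool should_free_buggy(ino_t ino)
{
    return ino > 0;
}

/* Fixed test: compare explicitly against both 0 and the -1 marker. */
static bool should_free_fixed(ino_t ino)
{
    return ino != 0 && ino != (ino_t)-1;
}
```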


# 226042 05-Oct-2011 kib

Supply unique (st_dev, st_ino) value pair for the fstat(2) done on the pipes.

Reviewed by: jhb, Peter Jeremy <peterjeremy acm org>
MFC after: 2 weeks


# 225617 16-Sep-2011 kmacy

In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.

Reviewed by: rwatson
Approved by: re (bz)


# 225177 25-Aug-2011 attilio

Fix a deficiency in the selinfo interface:
If a selinfo object is recorded (via selrecord()) and then quickly
destroyed, with the waiters missing the opportunity to wake up, on the
next iteration they will find the selinfo object destroyed, causing a
page fault (#PF).

That happens because the selinfo interface has no way to drain the
waiters before destroying the registered selinfo object. This race is
quite rare in practice, however, because it requires a selrecord(), a
poll request by another thread, and a quick destruction of the
selrecord()'ed selinfo object.

Fix this by adding the seldrain() routine, which should be called
before destroying selinfo objects (in order to avoid such cases),
and fix the existing places where it should be called.
Sometimes the context is safe enough to prevent this type of race,
as in device drivers that install selinfo objects in poll callbacks.
There, the selinfo object is destroyed at driver detach time, when all
file descriptors should already be closed, so there cannot be a race.
The mfi(4) device driver is an example of this case, as it implements
fully correct logic to prevent this from happening.

Sponsored by: Sandvine Incorporated
Reported by: rstone
Tested by: pluknet
Reviewed by: jhb, kib
Approved by: re (bz)
MFC after: 3 weeks


# 224914 16-Aug-2011 kib

Add the fo_chown and fo_chmod methods to struct fileops and use them
to implement fchown(2) and fchmod(2) support for several file types
that previously lacked it. Add MAC entries for chown/chmod done on
posix shared memory and (old) in-kernel posix semaphores.

Based on the submission by: glebius
Reviewed by: rwatson
Approved by: re (bz)


# 220245 01-Apr-2011 kib

After r219999 is merged to stable/8, rename fallocf(9) to falloc(9)
and remove the falloc() version that lacks the flags argument. This is done
to reduce KPI bloat.

Requested by: jhb
X-MFC-note: do not


# 219801 20-Mar-2011 alc

Update a comment. The sending process has not mapped the buffer pages
since before r127501. Strictly speaking, the buffer pages are not
"wired". They remain in the paging queues. However, they are pinned in
memory using vm_page_hold().


# 216699 25-Dec-2010 alc

Introduce and use a new VM interface for temporarily pinning pages. This
new interface replaces the combined use of vm_fault_quick() and
pmap_extract_and_hold() throughout the kernel.

In collaboration with: kib@


# 216511 17-Dec-2010 alc

Implement and use a single optimized function for unholding a set of pages.

Reviewed by: kib@


# 207805 08-May-2010 alc

Update a comment: It no longer makes sense to talk about the page queues
lock here.


# 207410 29-Apr-2010 kmacy

On Alan's advice, rather than do a wholesale conversion on a single
architecture from page queue lock to a hashed array of page locks
(based on a patch by Jeff Roberson), I've implemented page lock
support in the MI code and have only moved vm_page's hold_count
out from under page queue mutex to page lock. This changes
pmap_extract_and_hold on all pmaps.

Supported by: Bitgravity Inc.

Discussed with: alc, jeffr, and kib


# 205792 28-Mar-2010 ed

Rename st_*timespec fields to st_*tim for POSIX 2008 compliance.

A nice thing about POSIX 2008 is that it finally standardizes a way to
obtain file access/modification/change times in sub-second precision,
namely using struct timespec, which we already have for a very long
time. Unfortunately POSIX uses different names.

This commit adds compatibility macros, so existing code should still
build properly. Also change all source code in the kernel to work
without any of the compatibility macros. This makes it all a bit less
ambiguous.

I am also renaming st_birthtime to st_birthtim, even though it was a
local extension anyway. It seems Cygwin also has a st_birthtim.


# 197134 12-Sep-2009 rwatson

Use C99 initialization for struct filterops.

Obtained from: Mac OS X
Sponsored by: Apple Inc.
MFC after: 3 weeks


# 195423 07-Jul-2009 kib

Fix poll(2) and select(2) for named pipes to return "ready for read"
when all writers, observed by reader, exited. Use writer generation
counter for fifo, and store the snapshot of the fifo generation in the
f_seqcount field of struct file, that is otherwise unused for fifos.
Set the undocumented FreeBSD POLLINIGNEOF flag only when the file's
f_seqcount is equal to the fifo's fi_wgen, and revert r89376.

Fix POLLINIGNEOF for sockets and pipes, and return POLLHUP for them.
Note that the patch does not fix the failure to return POLLHUP for fifos.

PR: kern/94772
Submitted by: bde (original version)
Reviewed by: rwatson, jilles
Approved by: re (kensmith)
MFC after: 6 weeks (might be)


# 193951 10-Jun-2009 kib

Adapt vfs kqfilter to the shared vnode lock used by zfs write vop. Use
vnode interlock to protect the knote fields [1]. The locking assumes
that shared vnode lock is held, thus we get exclusive access to knote
either by exclusive vnode lock protection, or by shared vnode lock +
vnode interlock.

Do not use kl_locked() method to assert either lock ownership or the
fact that curthread does not own the lock. For shared locks, ownership
is not recorded, e.g. VOP_ISLOCKED can return LK_SHARED for the shared
lock not owned by curthread, causing false positives in kqueue subsystem
assertions about knlist lock.

Remove kl_locked method from knlist lock vector, and add two separate
assertion methods kl_assert_locked and kl_assert_unlocked, that are
supposed to use proper asserts. Change knlist_init accordingly.

Add convenience function knlist_init_mtx to reduce number of arguments
for typical knlist initialization.

Submitted by: jhb [1]
Noted by: jhb [2]
Reviewed by: jhb
Tested by: rnoland


# 193893 10-Jun-2009 cperciva

Prevent integer overflow in direct pipe write code from circumventing
virtual-to-physical page lookups. [09:09]

Add missing permissions check for SIOCSIFINFO_IN6 ioctl. [09:10]

Fix buffer overflow in "autokey" negotiation in ntpd(8). [09:11]

Approved by: so (cperciva)
Approved by: re (not really, but SVN wants this...)
Security: FreeBSD-SA-09:09.pipe
Security: FreeBSD-SA-09:10.ipv6
Security: FreeBSD-SA-09:11.ntpd
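The commit text does not show the patch for FreeBSD-SA-09:09, so the following is only a generic, hypothetical sketch of the class of check involved: computing the page range of a user buffer while rejecting address ranges that wrap around. The page size is assumed to be 4096 and the helper names are invented; this is not the actual fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size for this sketch; kernel code uses PAGE_SIZE. */
#define SKETCH_PAGE_SIZE ((uintptr_t)4096)

static uintptr_t trunc_pg(uintptr_t a)
{
    return a & ~(SKETCH_PAGE_SIZE - 1);
}

/* Compute the number of pages spanned by [addr, addr + len).
 * Returns false if the range wraps around the address space; an
 * unchecked end-address computation would silently yield a short
 * page range, so some of the buffer would never be looked up. */
static bool page_span(uintptr_t addr, size_t len, size_t *npages)
{
    if (len == 0) {
        *npages = 0;
        return true;
    }
    uintptr_t last = addr + (len - 1);   /* last byte; unsigned wrap is defined */
    if (last < addr)
        return false;                    /* wrapped: reject the request */
    *npages = (size_t)((trunc_pg(last) - trunc_pg(addr)) / SKETCH_PAGE_SIZE) + 1;
    return true;
}
```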


# 193511 05-Jun-2009 rwatson

Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC
and used in a large number of files, but also because an increasing number
of incorrect uses of MAC calls were sneaking in due to copy-and-paste of
MAC-aware code without the associated opt_mac.h include.

Discussed with: pjd


# 189649 10-Mar-2009 jhb

- Make maxpipekva a signed long rather than an unsigned long as overflow
is more likely to be noticed with signed types.
- Make amountpipekva a long as well to match maxpipekva.

Discussed with: bde


# 189595 09-Mar-2009 jhb

Adjust some variables (mostly related to the buffer cache) that hold
address space sizes to be longs instead of ints. Specifically, the following
values are now longs: runningbufspace, bufspace, maxbufspace,
bufmallocspace, maxbufmallocspace, lobufspace, hibufspace, lorunningspace,
hirunningspace, maxswzone, maxbcache, and maxpipekva. Previously, a
relatively small number (~44000) of buffers set in kern.nbuf would result
in integer overflows, causing either hangs or bogus values of
hidirtybuffers and lodirtybuffers. Now one has to overflow a long to see
such problems. There was a check for an nbuf setting that would cause
overflows in the auto-tuning of nbuf. I've changed it to always check and
cap nbuf, but warn if a user-supplied tunable would cause overflow.

Note that this changes the ABI of several sysctls that are used by things
like top(1), etc., so any MFC would probably require some gross shims
to allow for that.

MFC after: 1 month


# 184849 11-Nov-2008 ed

Several cleanups related to pipe(2).

- Use `fildes[2]' instead of `*fildes' to make more clear that pipe(2)
fills an array with two descriptors.

- Remove EFAULT from the manual page. Because of the current calling
convention, pipe(2) raises a segmentation fault when an invalid
address is passed.

- Introduce kern_pipe() to make it easier for binary emulations to
implement pipe(2).

- Make Linux binary emulation use kern_pipe(), which means we don't have
to recover td_retval after calling the FreeBSD system call.

Approved by: rdivacky
Discussed on: arch


# 179243 23-May-2008 kib

Another problem caused by the knlist_cleardel() potentially dropping
PIPE_MTX().

Since the pipe_present is cleared before (potentially) sleeping, the
second thread may enter the pipeclose() for the reciprocal pipe end.
The test for pipe_present == 0 at the end of pipeclose() would
succeed, allowing the second thread to free the pipe memory. The first
thread then accesses the freed memory after being woken up.

Properly track the closing state of the pipe in the pipe_present.
Introduce the intermediate state that marks the pipe as mostly
dismantled but might be sleeping waiting for the knote list to be
cleared. Free the pipe pair memory only when both ends pass that point.

Debugging help and tested by: pho
Discussed with: jmg
MFC after: 2 weeks
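The intermediate closing state introduced in r179243 can be sketched as a small state machine. The enum values and struct are hypothetical stand-ins; the point is that the pair is freed only by whichever end finalizes last, so an end still asleep in the closing state keeps the memory alive.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical three-state model of one pipe end's lifecycle. */
enum end_state {
    END_ACTIVE,     /* open and usable */
    END_CLOSING,    /* being dismantled; may sleep waiting for knotes */
    END_FINALIZED,  /* fully dismantled; this end will not touch it again */
};

struct pair_model {
    enum end_state end[2];
    bool freed;
};

/* First half of close: mark the end as closing before anything that
 * can drop the lock and sleep. */
static void close_begin(struct pair_model *p, int side)
{
    assert(p->end[side] == END_ACTIVE);
    p->end[side] = END_CLOSING;
}

/* Second half of close, after any sleep: finalize this end and free
 * the pair only if the peer is finalized too. Returns true if freed. */
static bool close_finish(struct pair_model *p, int side)
{
    assert(p->end[side] == END_CLOSING);
    p->end[side] = END_FINALIZED;
    if (p->end[1 - side] == END_FINALIZED)
        p->freed = true;
    return p->freed;
}
```

In the race described above, a check for "peer not present" would let the second thread free the pair while the first is still closing; here the second thread's close_finish() sees the peer in END_CLOSING and leaves the memory alone.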


# 179242 23-May-2008 kib

Destruction of the pipe calls knlist_cleardel() to remove the knotes
monitoring the pipe. The code sets pipe_present = 0 and enters
knlist_cleardel(), where the PIPE_MTX might be dropped when knl->kl_list
cannot be cleared due to influx knotes.

If the following often-encountered code fragment

    if (!(kn->kn_status & KN_DETACHED))
        kn->kn_fop->f_detach(kn);
    knote_drop(kn, td); [1]

is executed while the knlist lock is dropped, then the knote memory is freed
by knote_drop() without the knote being removed from the knlist, since
filt_pipedetach() contains the following:

    if (kn->kn_filter == EVFILT_WRITE) {
        if (!cpipe->pipe_peer->pipe_present) {
            PIPE_UNLOCK(cpipe);
            return;

Now, the memory may be reused in the zone, causing access to
freed memory. I got panics caused by the marker knote appearing on
the knlist, which I believe is a manifestation of the issue. In Peter
Holm's test scenarios, we got unkillable processes too.

The pipe_peer that has the knote for write shall be present. Ignore the
pipe_present value for EVFILT_WRITE in filt_pipedetach().

Debugging help and tested by: pho
Discussed with: jmg
MFC after: 2 weeks


# 175140 07-Jan-2008 jhb

Make ftruncate a 'struct file' operation rather than a vnode operation.
This makes it possible to support ftruncate() on non-vnode file types in
the future.
- 'struct fileops' grows a 'fo_truncate' method to handle an ftruncate() on
a given file descriptor.
- ftruncate() moves to kern/sys_generic.c and now just fetches a file
object and invokes fo_truncate().
- The vnode-specific portions of ftruncate() move to vn_truncate() in
vfs_vnops.c which implements fo_truncate() for vnode file types.
- Non-vnode file types return EINVAL in their fo_truncate() method.

Submitted by: rwatson
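A cut-down, hypothetical illustration of the dispatch pattern from r175140: a per-type ops vector with an fo_truncate slot, where the vnode-like type implements truncation and other types return EINVAL. The sk_ names are invented, and the real struct fileops has many more methods.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>

struct sk_file;

/* Hypothetical, cut-down ops vector with a single method. */
struct sk_fileops {
    int (*fo_truncate)(struct sk_file *fp, off_t length);
};

/* Hypothetical, cut-down file object. */
struct sk_file {
    const struct sk_fileops *f_ops;
    off_t size;                 /* only used by the vnode-like type */
};

/* vnode-like files implement truncation ... */
static int sk_vn_truncate(struct sk_file *fp, off_t length)
{
    if (length < 0)
        return EINVAL;
    fp->size = length;
    return 0;
}

/* ... while non-vnode types such as pipes reject it. */
static int sk_pipe_truncate(struct sk_file *fp, off_t length)
{
    (void)fp;
    (void)length;
    return EINVAL;
}

static const struct sk_fileops sk_vnops   = { .fo_truncate = sk_vn_truncate };
static const struct sk_fileops sk_pipeops = { .fo_truncate = sk_pipe_truncate };

/* Generic entry point: dispatch through the object's ops vector
 * without knowing the underlying file type. */
static int sk_ftruncate(struct sk_file *fp, off_t length)
{
    return fp->f_ops->fo_truncate(fp, length);
}
```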


# 174988 29-Dec-2007 jeff

Remove explicit locking of struct file.
- Introduce a finit() which is used to initialize the fields of struct file
in such a way that the ops vector is only valid after the data, type,
and flags are valid.
- Protect f_flag and f_count with atomic operations.
- Remove the global list of all files and associated accounting.
- Rewrite the unp garbage collection such that it no longer requires
the global list of all files and instead uses a list of all unp sockets.
- Mark sockets in the accept queue so we don't incorrectly gc them.

Tested by: kris, pho


# 174647 16-Dec-2007 jeff

Refactor select to reduce contention and hide internal implementation
details from consumers.

- Track individual selecters on a per-descriptor basis such that there
are no longer collisions and, after sleeping for events, only those
descriptors which triggered events must be rescanned.
- Protect the selinfo (per descriptor) structure with a mtx pool mutex.
mtx pool mutexes were chosen to preserve api compatibility with
existing code which does nothing but bzero() to setup selinfo
structures.
- Use a per-thread wait channel rather than a global wait channel.
- Hide select implementation details in a seltd structure which is
opaque to the rest of the kernel.
- Provide a 'selsocket' interface for those kernel consumers who wish to
select on a socket when they have no fd so they no longer have to
be aware of select implementation details.

Tested by: kris
Reviewed on: arch


# 173750 19-Nov-2007 dumbbell

The kernel uses two ways to write data on a pipe:
o buffered write, for chunks smaller than PIPE_MINDIRECT bytes
o direct write, for everything else

A call to writev(2) may receive iovec entries of various sizes, and the
kernel may have to switch from one method to the other. Before doing
this, it must wake up reader processes and any select/poll/kqueue waiters.

This commit fixes a bug where select/poll/kqueue are not triggered
when switching from buffered write to direct write. It adds calls to
pipeselwakeup().

I give more details on freebsd-arch@:
http://lists.freebsd.org/pipermail/freebsd-arch/2007-September/006790.html

This should fix issues with Erlang (lang/erlang) and kqueue.

Reported by: Rickard Green (Erlang)
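A hypothetical sketch of the writev(2) situation in r173750. Each iovec entry is classified as buffered or direct using an assumed PIPE_MINDIRECT of 8192 bytes, and every change of mode is a point where readers and select/poll/kqueue waiters must be woken. The real decision also depends on other conditions (for example, the data being in user space, per r165347), which the sketch ignores.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed threshold; sys/pipe.h defines the real PIPE_MINDIRECT. */
#define SKETCH_MINDIRECT 8192

enum wmode { WM_NONE, WM_BUFFERED, WM_DIRECT };

/* Classify one iovec entry by size only (other conditions ignored). */
static enum wmode mode_for(size_t len)
{
    return len >= SKETCH_MINDIRECT ? WM_DIRECT : WM_BUFFERED;
}

/* Walk the iovec lengths of one writev(2) call and count the mode
 * switches. Each switch is a point where the writer must first wake
 * readers and select/poll/kqueue waiters; r173750 added the missing
 * selector wakeup for the buffered-to-direct case. */
static int mode_switches(const size_t *iov_lens, size_t niov)
{
    enum wmode cur = WM_NONE;
    int switches = 0;

    for (size_t i = 0; i < niov; i++) {
        enum wmode next = mode_for(iov_lens[i]);

        if (cur != WM_NONE && next != cur)
            switches++;
        cur = next;
    }
    return switches;
}
```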


# 172930 24-Oct-2007 rwatson

Merge first in a series of TrustedBSD MAC Framework KPI changes
from Mac OS X Leopard--rationalize naming for entry points to
the following general forms:

mac_<object>_<method/action>
mac_<object>_check_<method/action>

The previous naming scheme was inconsistent and mostly
reversed from the new scheme. Also, make object types more
consistent and remove spaces from object types that contain
multiple parts ("posix_sem" -> "posixsem") to make mechanical
parsing easier. Introduce a new "netinet" object type for
certain IPv4/IPv6-related methods. Also simplify, slightly,
some entry point names.

All MAC policy modules will need to be recompiled, and modules
not updated as part of this commit will need to be modified to
conform to the new KPI.

Sponsored by: SPARTA (original patches against Mac OS X)
Obtained from: TrustedBSD Project, Apple Computer


# 170022 27-May-2007 rwatson

Remove amountpipes counter for pipes -- this replicates the function of
existing UMA statistics for pipes, and allows us to get rid of both the
per-pipe dtor and two atomic operations per pipe required to maintain
the counter.


# 167232 05-Mar-2007 rwatson

Further system call comment cleanup:

- Remove also "MP SAFE" after prior "MPSAFE" pass. (suggested by bde)
- Remove extra blank lines in some cases.
- Add extra blank lines in some cases.
- Remove no-op comments consisting solely of the function name, the word
"syscall", or the system call name.
- Add punctuation.
- Re-wrap some comments.


# 165347 19-Dec-2006 pjd

Use the pipe_direct_write() optimization only if the data is in the
process's memory. This fixes sending data through a pipe from the kernel.

Fix suggested by: rwatson


# 163606 22-Oct-2006 rwatson

Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h
begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now
contains the userspace and user<->kernel API and definitions, with all
in-kernel interfaces moved to mac_framework.h, which is now included
across most of the kernel instead.

This change is the first step in a larger cleanup and sweep of MAC
Framework interfaces in the kernel, and will not be MFC'd.

Obtained from: TrustedBSD Project
Sponsored by: SPARTA


# 159481 10-Jun-2006 rwatson

Move some functions and definitions from uipc_socket2.c to uipc_socket.c:

- Move sonewconn(), which creates new sockets for incoming connections on
listen sockets, so that all socket allocate code is together in
uipc_socket.c.

- Move 'maxsockets' and associated sysctls to uipc_socket.c with the
socket allocation code.

- Move kern.ipc sysctl node to uipc_socket.c, add a SYSCTL_DECL() for it
to sysctl.h and remove lots of scattered implementations in various
IPC modules.

- Sort sodealloc() after soalloc() in uipc_socket.c for dependency order
reasons. Staticize soalloc() and sodealloc() as they are now
required only in uipc_socket.c, and are internal to the socket
implementation.

After this change, socket allocation and deallocation is entirely
centralized in one file, and uipc_socket2.c consists entirely of socket
buffer manipulation and default protocol switch functions.

MFC after: 1 month


# 155035 30-Jan-2006 glebius

- In pipe(), return the error returned by pipe_create(), rather than the
hardcoded ENFILE, which is incorrect: pipe_create() can fail due
to ENOMEM.
- Update manual page, describing ENOMEM return code.

Reviewed by: arch


# 153484 16-Dec-2005 delphij

In pipe_write(): when uiomove() fails, do not spin on it forever.

Submitted by: Kostik Belousov <kostikbel at gmail.com> on -current@
Message-ID: <20051216151016.GE84442@deviant.zoral.local>
MFC After: 3 weeks


# 147730 01-Jul-2005 ssouhlal

Fix the recent panics/LORs/hangs created by my kqueue commit by:

- Introducing the possibility of using locks different than mutexes
for the knlist locking. In order to do this, we add three arguments to
knlist_init() to specify the functions to use to lock, unlock and
check if the lock is owned. If these arguments are NULL, we assume
mtx_lock, mtx_unlock and mtx_owned, respectively.

- Using the vnode lock for the knlist locking, when doing kqueue operations
on a vnode. This way, we don't have to lock the vnode while holding a
mutex, in filt_vfsread.

Reviewed by: jmg
Approved by: re (scottl), scottl (mentor override)
Pointyhat to: ssouhlal
Will be happy: everyone


# 140369 17-Jan-2005 silby

Rearrange the kninit calls for both directions of a pipe so that
they both happen before pipe backing allocation occurs. Previously,
a pipe memory shortage would cause a panic due to a KNOTE call
on an uninitialized si_note.

Reported by: Peter Holm
MFC after: 1 week


# 139804 06-Jan-2005 imp

/* -> /*- for copyright notices, minor format tweaks as necessary


# 138032 23-Nov-2004 rwatson

Correct a bug introduced in sys_pipe.c:1.179: in pipe_ioctl(),
release the pipe mutex before calling fsetown(), as fsetown()
may block. The sigio code protects the pipe sigio data using
its own mutex, and the pipe reference count held by the caller
will prevent the pipe from being prematurely garbage-collected.

Discovered by: imp


# 137764 16-Nov-2004 phk

Add missing break.


# 137752 15-Nov-2004 phk

Straighten the ioctl function out to have only one exit point.


# 137355 07-Nov-2004 phk

Introduce fdclose() which will clean an entry in a filedesc.

Replace homerolled versions with call to fdclose().

Make fdunused() static to kern_descrip.c


# 133790 15-Aug-2004 silby

Major enhancements to pipe memory usage:

- pipespace is now able to resize non-empty pipes; this allows
for many more resizing opportunities

- Backing is no longer pre-allocated for the reverse direction
of pipes. This direction is rarely (if ever) used, so this cuts the
amount of map space allocated to a pipe in half.

- Pipe growth is now much more dynamic; a pipe will now grow when
the total amount of data it contains and the size of the write are
larger than the size of pipe. Previously, only individual writes greater
than the size of the pipe would cause growth.

- In low memory situations, pipes will now shrink during both read
and write operations, where possible. Once the memory shortage
ends, the growth code will cause these pipes to grow back to an appropriate
size.

- If the full PIPE_SIZE allocation fails when a new pipe is created, the
allocation will be retried with SMALL_PIPE_SIZE. This helps to deal
with the situation of a fragmented map after a low memory period has
ended.

- Minor documentation + code changes to support the above.

In total, these changes increase the total number of pipes that
can be allocated simultaneously, drastically reducing the chances that
pipe allocation will fail.

Performance appears unchanged due to dynamic resizing.
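A simplified, hypothetical sketch of the r133790 growth rule: grow when the queued data plus the new write exceeds the current size. The sizes are assumed values, and doubling up to a cap is an assumption of this sketch rather than a description of pipespace().

```c
#include <assert.h>
#include <stddef.h>

/* Assumed sizes for this sketch only; sys/pipe.h defines the real
 * SMALL_PIPE_SIZE, PIPE_SIZE and BIG_PIPE_SIZE. */
#define SK_SMALL_PIPE_SIZE  4096
#define SK_PIPE_SIZE        16384
#define SK_BIG_PIPE_SIZE    65536

/* Size the buffer should have after accepting a write of wlen bytes
 * when queued bytes are already buffered. Growth is triggered by the
 * combined amount (queued + wlen), not just by one large write.
 * Doubling up to a cap is an assumption of this sketch. */
static size_t size_after_write(size_t cur_size, size_t queued, size_t wlen)
{
    size_t want = queued + wlen;
    size_t sz = cur_size < SK_SMALL_PIPE_SIZE ? SK_SMALL_PIPE_SIZE : cur_size;

    while (sz < want && sz < SK_BIG_PIPE_SIZE)
        sz *= 2;
    return sz;
}
```

With 10000 bytes queued, a further 10000-byte write grows a 16384-byte buffer under this rule, whereas the previous rule, which looked only at the size of the individual write, would not have grown it.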


# 133741 15-Aug-2004 jmg

Add locking to the kqueue subsystem. This also makes the kqueue subsystem
a more complete subsystem, and removes the knowledge of how things are
implemented from the drivers. Include locking around filter ops, so a
module like aio will know when not to be unloaded if there are outstanding
knotes using its filter ops.

Currently, it uses MTX_DUPOK even though it is not always safe to
acquire duplicate locks. Witness currently doesn't support the ability
to discover if a dup lock is ok (in some cases).

Reviewed by: green, rwatson (both earlier versions)


# 133049 03-Aug-2004 silby

Standardize pipe locking, ensuring that everything is locked via
pipelock(), not via a mixture of mutexes and pipelock(). Additionally,
add a few KASSERTS, and change some statements that should have been
KASSERTS into KASSERTS.

As a result of these cleanups, some segments of code have become
significantly shorter and/or easier to read.


# 132987 01-Aug-2004 green

* Add a "how" argument to uma_zone constructors and initialization functions
so that they know whether the allocation is supposed to be able to sleep
or not.
* Allow uma_zone constructors and initialization functions to return either
success or error. Almost all of the ones in the tree currently return
success unconditionally, but mbuf is a notable exception: the packet
zone constructor wants to be able to fail if it cannot suballocate an
mbuf cluster, and the mbuf allocators want to be able to fail in general
in a MAC kernel if the MAC mbuf initializer fails. This fixes the
panics people are seeing when they run out of memory for mbuf clusters.
* Allow debug.nosleepwithlocks on WITNESS to be disabled, without changing
the default.

Both bmilekic and jeff have reviewed the changes made to make failable
zone allocations work.


# 132579 23-Jul-2004 rwatson

Don't perform pipe endpoint locking during pipe_create(), as the pipe
can't yet be referenced by other threads.

In microbenchmarks, this appears to reduce the cost of
pipe();close();close() on UP by 10%, and SMP by 7%. The vast majority
of the cost of allocating a pipe remains VM magic.

Suggested by: silby


# 132436 20-Jul-2004 silby

Fix a minor error in pipe_stat - st_size was always reported as 0
when direct writes kicked in. Whether this affected any applications
is unknown.


# 127501 27-Mar-2004 alc

Revise the direct or optimized case to use uiomove_fromphys() by the reader
instead of ephemeral mappings using pmap_qenter() by the writer. The
writer is still, however, responsible for wiring the pages, just not
mapping them. Consequently, the allocation of KVA for the direct case is
unnecessary. Remove it and the sysctls limiting it, i.e.,
kern.ipc.maxpipekvawired and kern.ipc.amountpipekvawired. The number
of temporarily wired pages is still, however, limited by
kern.ipc.maxpipekva.

Note: On platforms lacking a direct virtual-to-physical mapping,
uiomove_fromphys() uses sf_bufs to cache ephemeral mappings. Thus,
the number of available sf_bufs can influence the performance of pipes
on platforms such as i386. Surprisingly, I saw the greatest gain from this
change on such a machine: lmbench's pipe bandwidth result increased from
~1050MB/s to ~1850MB/s on my 2.4GHz, 400MHz FSB P4 Xeon.


# 126252 25-Feb-2004 rwatson

Assert pipe mutex in pipeselwakeup(), as we manipulate pipe_state
in a non-atomic manner. It appears to always be called with the
mutex (good).


# 126249 25-Feb-2004 rwatson

Update comment regarding MAC labels: we no longer pass endpoints
into the MAC Framework, just the pipe pair.

GC 'hadpeer' used in pipedestroy(), which is no longer needed as
we check pipe_present flags on the pair.


# 126131 22-Feb-2004 green

Correct some major SMP-harmful problems in the pipe implementation. First
of all, PIPE_EOF is not checked pervasively after everything that can drop
the pipe mutex and msleep(), so fix. Additionally, though it might not
harm anything, pipelock() and pipeunlock() are not used consistently.
Third, the kqueue support functions do not use the pipe mutex correctly.
Last, but absolutely not least, is a race: if pipe_busy is not set on
the closing side of the pipe, the other side that is trying to write to
that will crash BECAUSE PIPE_EOF IS NOT SET! Unconditionally set
PIPE_EOF, and get rid of all the lockups/crashes I have seen trying
to build ports.


# 125367 03-Feb-2004 rwatson

Don't dec/inc the amountpipes counter every time we resize a pipe --
instead, just dec/inc in the ctor/dtor. For now, increment/decrement
in two's, since we're now performing the operation once per pair,
not once per pipe. Not really any measurable performance change
in my micro-benchmarks, but doing less work is good, especially when
it comes to atomic operations.

Suggested by: alc


# 125364 03-Feb-2004 rwatson

Catch instances of (pipe == NULL) that were obsoleted with recent
changes to jointly allocated pipe pairs. Replace these checks
with pipe_present checks. This avoids a NULL pointer dereference
when a pipe is half-closed.

Submitted by: Peter Edwards <peter.edwards@openet-telecom.com>


# 125293 01-Feb-2004 rwatson

Coalesce pipe allocations and frees. Previously, the pipe code
would allocate two 'struct pipe's from the pipe zone, and malloc a
mutex.

- Create a new "struct pipepair" object holding the two 'struct
pipe' instances, struct mutex, and struct label reference. Pipe
structures now have a back-pointer to the pipe pair, and a
'pipe_present' flag to indicate whether the half has been
closed.

- Perform mutex init/destroy in zone init/destroy, avoiding
reallocating the mutex for each pipe. Perform most pipe structure
setup in zone constructor.

- VM memory mappings for pageable buffers are still done outside of
the UMA zone.

- Change MAC API to speak 'struct pipepair' instead of 'struct pipe',
update many policies. MAC labels are also handled outside of the
UMA zone for now. Label-only policy modules don't have to be
recompiled, but if a module is recompiled, its pipe entry points
will need to be updated. If a module actually reached into the
pipe structures (unlikely), that would also need to be modified.

These changes substantially simplify failure handling in the pipe
code as there are many fewer possible failure modes.

On half-close, pipes no longer free the 'struct pipe' for the closed
half until a full-close takes place. However, VM mapped buffers
are still released on half-close.

Some code refactoring is now possible to clean up some of the back
references, etc; this patch attempts not to change the structure
of most of the pipe implementation, only allocation/free code
paths, so as to avoid introducing bugs (hopefully).

This cuts about 8%-9% off the cost of sequential pipe allocation
and free in system call tests on UP and SMP in my micro-benchmarks.
May or may not make a difference in macro-benchmarks, but doing
less work is good.

Reviewed by: juli, tjr
Testing help: dwhite, fenestro, scottl, et al
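To make the new layout concrete, here is a simplified userland model of a jointly allocated pipe pair. It is a sketch only: the mpipe/mpipepair types and helpers are hypothetical, and the real struct pipepair also carries a mutex, a MAC label and buffer state. What it shows is that one allocation holds both halves, each half has a pipe_present flag and a back-pointer to the pair, and the pair is released only after both halves are closed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of the jointly allocated pair; not the kernel's
 * definitions. */
struct mpipepair;

struct mpipe {
    bool pipe_present;            /* false once this half is closed */
    struct mpipepair *pipe_pair;  /* back-pointer to the containing pair */
};

struct mpipepair {
    struct mpipe pp_rpipe;
    struct mpipe pp_wpipe;
};

/* One allocation covers both halves. */
static struct mpipepair *mpipepair_alloc(void)
{
    struct mpipepair *pp = malloc(sizeof(*pp));

    if (pp == NULL)
        return NULL;
    pp->pp_rpipe = (struct mpipe){ true, pp };
    pp->pp_wpipe = (struct mpipe){ true, pp };
    return pp;
}

/* Close one half. The pair is freed only when both halves are closed;
 * returns true if this call freed it. */
static bool mpipe_close(struct mpipe *p)
{
    struct mpipepair *pp = p->pipe_pair;

    assert(p->pipe_present);      /* catch a double close */
    p->pipe_present = false;
    if (!pp->pp_rpipe.pipe_present && !pp->pp_wpipe.pipe_present) {
        free(pp);
        return true;
    }
    return false;
}
```

After a half-close the peer is checked through pipe_present rather than a NULL pointer, which is the same substitution r125364 makes.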


# 125281 31-Jan-2004 rwatson

Fix an error in a KASSERT string: it's pipe_free_kmem(), not
pipespace(), that contains this KASSERT.


# 124548 15-Jan-2004 des

New file descriptor allocation code, derived from similar code introduced
in OpenBSD by Niels Provos. The patch introduces a bitmap of allocated
file descriptors which is used to locate available descriptors when a new
one is needed. It also moves the task of growing the file descriptor table
out of fdalloc(), reducing complexity in both fdalloc() and do_dup().

Debts of gratitude are owed to tjr@ (who provided the original patch on
which this work is based), grog@ (for the gdb(4) man page) and rwatson@
(for assistance with pxeboot(8)).


# 124399 11-Jan-2004 des

Back out 1.160, which was committed by mistake.


# 124394 11-Jan-2004 des

Mechanical whitespace cleanup.


# 124391 11-Jan-2004 des

Mechanical whitespace cleanup + minor style nits.


# 123915 27-Dec-2003 silby

Fix the maxpipekva warning message so that it points to the correct
sysctl, and shorten the message.

Noticed by: bde


# 122352 09-Nov-2003 tanimura

- Implement selwakeuppri() which allows raising the priority of a
thread being woken up. The woken thread can run at a priority as
high as after tsleep().

- Replace selwakeup()s with selwakeuppri()s and pass appropriate
priorities.

- Add cv_broadcastpri() which raises the priority of the broadcast
threads. Used by selwakeuppri() if collision occurs.

Not objected in: -arch, -current


# 122164 06-Nov-2003 alc

- Delay the allocation of memory for the pipe mutex until we need it.
This avoids the need to free said memory in various error cases along
the way.


# 122163 06-Nov-2003 alc

- Simplify pipespace() by eliminating the explicit creation of vm objects.
Instead, let the vm objects be lazily instantiated at fault time. This
results in the allocation of fewer vm objects and vm map entries due to
aggregation in the vm system.


# 121970 03-Nov-2003 rwatson

Unlock pipe mutex when failing MAC pipe ioctl access control check.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories


# 121307 21-Oct-2003 silby

Change all SYSCTLS which are readonly and have a related TUNABLE
from CTLFLAG_RD to CTLFLAG_RDTUN so that sysctl(8) can provide
more useful error messages.


# 121256 19-Oct-2003 dwmalone

falloc allocates a file structure and adds it to the file descriptor
table, acquiring the necessary locks as it works. It usually returns
two references to the new descriptor: one in the descriptor table
and one via a pointer argument.

As falloc releases the FILEDESC lock before returning, there is a
potential for a process to close the reference in the file descriptor
table before falloc's caller gets to use the file. I don't think this
can happen in practice at the moment, because Giant indirectly protects
closes.

To stop the file being completely closed in this situation, this change
makes falloc set the refcount to two when both references are returned.
This makes life easier for several of falloc's callers, because the
first thing they previously did was grab an extra reference on the
file.

Reviewed by: iedowse
Idea run past: jhb
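The reference counting argument can be modelled in a few lines of C. This is a hypothetical sketch, not the kernel's struct file, falloc() or fdrop(): the object starts with two references (one for the descriptor table, one for the caller), so a close through the table by another thread leaves the caller's reference valid until the caller drops it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified reference-counted file object. */
struct mfile {
    int refcount;
};

static int mfiles_freed;  /* number of objects released, for illustration */

/* Model of falloc(): one reference for the descriptor table and one for
 * the caller, so the count starts at two. */
static struct mfile *model_falloc(void)
{
    struct mfile *fp = malloc(sizeof(*fp));

    if (fp != NULL)
        fp->refcount = 2;
    return fp;
}

/* Drop one reference; free the object when the last one goes away. */
static void model_fdrop(struct mfile *fp)
{
    assert(fp->refcount > 0);
    if (--fp->refcount == 0) {
        free(fp);
        mfiles_freed++;
    }
}
```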


# 121018 12-Oct-2003 jmg

fix a problem referencing free'd memory. This is only a problem if you
have kqueue write events on a socket and regularly create tons of pipes,
which overwrite the structure, causing a panic when removing the knote
from the list. If the peer has gone away (and it's a write knote), then
don't bother trying to remove the knote from the list.

Submitted by: Brian Buchanan and myself
Obtained from: nCircle


# 120000 12-Sep-2003 alc

pipe_build_write_buffer() only requires read access of the page that it
obtains from pmap_extract_and_hold().


# 119872 08-Sep-2003 alc

Use pmap_extract_and_hold() in pipe_build_write_buffer(). Consequently,
pipe_build_write_buffer() no longer requires Giant on entry.

Reviewed by: tegge


# 119811 06-Sep-2003 alc

Giant is no longer required by pipe_destroy_write_buffer(). Reduce
unnecessary white space from pipe_destroy_write_buffer().


# 118929 15-Aug-2003 jmg

if we got this far, we definitely don't have an EBADF. Return a more
sane result of EPIPE.

Reported by: nCircle dev team
MFC after: 3 days


# 118880 13-Aug-2003 alc

- The vm_object pointer in pipe_buffer is unused. Remove it.
- Check for successful initialization of pipe_zone in pipeinit()
rather than every call to pipe(2).


# 118799 11-Aug-2003 alc

Pipespace() no longer requires Giant.


# 118764 11-Aug-2003 silby

More pipe changes:

From alc:
Move pageable pipe memory to a separate kernel submap to avoid awkward
vm map interlocking issues. (Bad explanation provided by me.)

From me:
Rework pipespace accounting code to handle this new layout, and adjust
our default values to account for the fact that we now have a solid
limit on allocations.

Also, remove the "maxpipes" limit, as it no longer has a purpose.
(The limit on kva usage solves the problem of having too many pipes.)


# 118757 10-Aug-2003 alc

Use vm_page_hold() instead of vm_page_wire(). Otherwise, a multithreaded
application could cause a wired page to be freed. In general,
vm_page_hold() should be preferred for ephemeral kernel mappings of pages
borrowed from a user-level address space. (vm_page_wire() should really be
reserved for indefinite duration pinning by the "owner" of the page.)

Discussed with: silby
Submitted by: tegge


# 118677 08-Aug-2003 alc

- Remove GIANT_REQUIRED from pipespace().
- Remove a duplicate initialization from pipe_create().


# 118572 07-Aug-2003 alc

- Remove GIANT_REQUIRED from pipe_free_kmem().
- Remove the acquisition and release of Giant around pipe_kmem_free() and
uma_zfree() in pipeclose().


# 118230 30-Jul-2003 pb

Remove test in pipe_write() which causes write(2) to return EAGAIN
on a non-blocking pipe in cases where select(2) returns the file
descriptor as ready for write. This in turn causes libc_r, for
one, to busy wait in such cases.

Note: it is a quick performance fix, a more complex fix might be
required in case this turns out to have unexpected side effects.

Reviewed by: silby
MFC after: 3 days
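The behaviour this change restores can be checked from userland on a POSIX system: once select(2) reports the write end of a non-blocking pipe as writable, a small write(2) is expected to succeed instead of failing with EAGAIN. The helper name write_after_select() is hypothetical.

```c
#include <fcntl.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Returns the result of a one-byte write(2) issued after select(2)
 * reports the non-blocking write end as ready, or -1 on setup failure. */
static ssize_t write_after_select(void)
{
    int fds[2];
    ssize_t n = -1;

    if (pipe(fds) == -1)
        return -1;

    /* Put the write end into non-blocking mode. */
    int flags = fcntl(fds[1], F_GETFL);
    if (flags != -1 && fcntl(fds[1], F_SETFL, flags | O_NONBLOCK) != -1) {
        fd_set wset;
        struct timeval tv = { 0, 0 };  /* zero timeout: poll only */

        FD_ZERO(&wset);
        FD_SET(fds[1], &wset);
        if (select(fds[1] + 1, NULL, &wset, NULL, &tv) == 1 &&
            FD_ISSET(fds[1], &wset))
            n = write(fds[1], "x", 1);  /* expected: 1, not EAGAIN */
    }
    close(fds[0]);
    close(fds[1]);
    return n;
}
```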


# 118220 30-Jul-2003 alc

The introduction of vm object locking has caused witness to reveal
a long-standing mistake in the way a portion of a pipe's KVA is
allocated. Specifically, kmem_alloc_pageable() is inappropriate
for use in the "direct" case because it allows a preceding vm map entry
and vm object to be extended to support the new KVA allocation.
However, the direct case KVA allocation should not have a backing
vm object. This is corrected by using kmem_alloc_nofault().

Submitted by: tegge (with the above explanation by me)


# 117364 09-Jul-2003 silby

A few minor changes:

- Use atomic ops to update the bigpipe count
- Make the bigpipe count sysctl readable
- Remove a duplicate comparison in an if statement
- Comment two SYSCTLs.


# 117325 08-Jul-2003 silby

Put some concrete limits on pipe memory consumption:

- Limit the total number of pipes so that we do not
exhaust all vm objects in the kernel map. When
this limit is reached, a ratelimited message will
be printed to the console.

- Put a soft limit on the amount of memory consumable
by pipes. Once the limit has been reached, all new
pipes will be limited to 4K in size, rather than the
default of 16K.

- Put a limit on the number of pages that may be used
for high speed page flipping in order to reduce the
amount of wired memory. Pipe writes that occur
while this limit is exceeded will fall back to
non-page flipping mode.

The above values are auto-tuned in subr_param.c and
are scaled to take into account both the size of
physical memory and the size of the kernel map.

These limits help to reduce the "kernel resources exhausted"
panics that could be caused by opening a large
number of pipes. (Pipes alone are no longer able
to exhaust all resources, but other kernel memory hogs
in league with pipes may still be able to do so.)

PR: 53627
Ideas / comments from: hsu, tjr, dillon@apollo.backplane.com
MFC after: 1 week
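The soft limit can be sketched as a size choice made when a pipe is created. This is an illustrative model only: the function and macro names are hypothetical, and the real limits are auto-tuned in subr_param.c rather than passed in as a parameter. The 16K and 4K sizes are the ones given above.

```c
#include <stddef.h>

#define MODEL_PIPE_SIZE       (16 * 1024)  /* default buffer size */
#define MODEL_SMALL_PIPE_SIZE (4 * 1024)   /* size once the limit is hit */

/* Choose a buffer size for a new pipe given the memory already consumed
 * by all pipes and the (hypothetical) soft limit. */
static size_t model_pipe_bufsize(size_t amount_in_use, size_t soft_limit)
{
    if (amount_in_use >= soft_limit)
        return MODEL_SMALL_PIPE_SIZE;
    return MODEL_PIPE_SIZE;
}
```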


# 116546 18-Jun-2003 phk

Initialize struct fileops with C99 sparse initialization.


# 116182 10-Jun-2003 obrien

Use __FBSDID().


# 116127 09-Jun-2003 mux

style(9).


# 112981 02-Apr-2003 hsu

Need to hold the same SMP lock for (knote) list traversal as for
list manipulation. This lock also protects read-modify-write operations
on the pipe_state field.


# 112569 24-Mar-2003 jake

- Add vm_paddr_t, a physical address type. This is required for systems
where physical addresses are larger than virtual addresses, such as i386s
with PAE.
- Use this to represent physical addresses in the MI vm system and in the
i386 pmap code. This also changes the paddr parameter to d_mmap_t.
- Fix printf formats to handle physical addresses >4G in the i386 memory
detection code, and due to kvtop returning vm_paddr_t instead of u_long.

Note that this is a name change only; vm_paddr_t is still the same as
vm_offset_t on all currently supported platforms.

Sponsored by: DARPA, Network Associates Laboratories
Discussed with: re, phk (cdevsw change)


# 111119 19-Feb-2003 imp

Back out M_* changes, per decision of the TRB.

Approved by: trb


# 110908 15-Feb-2003 alfred

Do not allow kqueues to be passed via unix domain sockets.


# 110816 13-Feb-2003 alc

Use atomic ops to update amountpipekva. Amountpipekva represents the
total kernel virtual address space used by all pipes. It is, thus, outside
the scope of any individual pipe lock.
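A minimal sketch of that accounting pattern, using C11 <stdatomic.h> as a stand-in for the kernel's atomic(9) operations. model_amountpipekva and its helpers are hypothetical names, not the kernel's variables.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Global total shared by every pipe: no single pipe lock covers it, so
 * each update is a single atomic read-modify-write. */
static atomic_size_t model_amountpipekva;

static void model_pipekva_add(size_t bytes)
{
    atomic_fetch_add(&model_amountpipekva, bytes);
}

static void model_pipekva_sub(size_t bytes)
{
    atomic_fetch_sub(&model_amountpipekva, bytes);
}
```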


# 109623 21-Jan-2003 alfred

Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.


# 109153 12-Jan-2003 dillon

Bow to the whining masses and change a union back into void *. Retain
removal of unnecessary casts and throw in some minor cleanups to see if
anyone complains, just for the hell of it.


# 109123 11-Jan-2003 dillon

Change struct file f_data to un_data, a union of the correct struct
pointer types, and remove a huge number of casts from code using it.

Change struct xfile xf_data to xun_data (ABI is still compatible).

If we need to add a #define for f_data and xf_data we can, but I don't
think it will be necessary. There are no operational changes in this
commit.


# 108255 24-Dec-2002 phk

White-space changes.


# 108238 23-Dec-2002 phk

Detediousficate declaration of fileops array members by introducing
typedefs for them.


# 105132 14-Oct-2002 alfred

Remove a KASSERT I added in 1.73 to catch uninitialized pipes.

It must be removed because it is done without the pipe being locked
via pipelock() and therefore is vulnerable to races with pipespace()
erroneously triggering it by temporarily zero'ing out the structure
backing the pipe.

It looks as if this assertion is not needed because all manipulation
of the data changed by pipespace() _is_ protected by pipelock().

Reported by: kris, mckusick


# 105009 12-Oct-2002 alfred

whitespace fixes.


# 104908 11-Oct-2002 mike

Change iov_base's type from `char *' to the standard `void *'. All
uses of iov_base which assume its type is `char *' (in order to do
pointer arithmetic) have been updated to cast iov_base to `char *'.


# 104393 03-Oct-2002 truckman

In an SMP environment post-Giant it is no longer safe to blindly
dereference the struct sigio pointer without any locking. Change
fgetown() to take a reference to the pointer instead of a copy of the
pointer and call SIGIO_LOCK() before copying the pointer and
dereferencing it.

Reviewed by: rwatson


# 104269 01-Oct-2002 rwatson

Improve locking of pipe mutexes in the context of MAC:

(1) Where previously the pipe mutex was selectively grabbed during
pipe_ioctl(), now always grab it and then release it if not
needed. This protects the call to mac_check_pipe_ioctl() to
make sure the label remains consistent. (Note: it looks
like sigio locking may be incorrect for fgetown() since we
call it not-by-reference and sigio locking assumes call by
reference).

(2) In pipe_stat(), lock the pipe if MAC is compiled in so that
the call to mac_check_pipe_stat() gets a locked pipe to
protect label consistency. We still release the lock before
returning actual stat() data, risking inconsistency, but
apparently our pipe locking model accepts that risk.

(3) In various pipe MAC authorization checks, assert that the pipe
lock is held.

(4) Grab the lock when performing a pipe relabel operation, and
assert it a little deeper in the stack.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories


# 104094 28-Sep-2002 phk

Be consistent about "static" functions: if the function is marked
static in its prototype, mark it static at the definition too.

Inspired by: FlexeLint warning #512


# 102241 21-Aug-2002 archie

Don't use "NULL" when "0" is really meant.


# 102115 19-Aug-2002 rwatson

Break out mac_check_pipe_op() into component check entry points:
mac_check_pipe_poll(), mac_check_pipe_read(), mac_check_pipe_stat(),
and mac_check_pipe_write(). This improves consistency with other
access control entry points and permits security modules to only
control the object methods that they are interested in, avoiding
switch statements.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


# 102003 17-Aug-2002 rwatson

In continuation of early fileop credential changes, modify fo_ioctl() to
accept an 'active_cred' argument reflecting the credential of the thread
initiating the ioctl operation.

- Change fo_ioctl() to accept active_cred; change consumers of the
fo_ioctl() interface to generally pass active_cred from td->td_ucred.
- In fifofs, initialize filetmp.f_cred to ap->a_cred so that the
invocations of soo_ioctl() are provided access to the calling f_cred.
Pass ap->a_td->td_ucred as the active_cred, but note that this is
required because we don't yet distinguish file_cred and active_cred
in invoking VOP's.
- Update kqueue_ioctl() for its new argument.
- Update pipe_ioctl() for its new argument, pass active_cred rather
than td_ucred to MAC for authorization.
- Update soo_ioctl() for its new argument.
- Update vn_ioctl() for its new argument, use active_cred rather than
td->td_ucred to authorize VOP_IOCTL() and the associated VOP_GETATTR().

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


# 101987 16-Aug-2002 rwatson

Correct white space nits that crept in during my recent merges of
trustedbsd_mac material.


# 101983 16-Aug-2002 rwatson

Make similar changes to fo_stat() and fo_poll() as made earlier to
fo_read() and fo_write(): explicitly use the cred argument to fo_poll()
as "active_cred" using the passed file descriptor's f_cred reference
to provide access to the file credential. Add an active_cred
argument to fo_stat() so that implementers have access to the active
credential as well as the file credential. Generally modify callers
of fo_stat() to pass in td->td_ucred rather than fp->f_cred, which
was redundantly provided via the fp argument. This set of modifications
also permits threads to perform these operations on behalf of another
thread without modifying their credential.

Trickle this change down into fo_stat/poll() implementations:

- badfo_poll(), badfo_stat(): modify/add arguments.
- kqueue_poll(), kqueue_stat(): modify arguments.
- pipe_poll(), pipe_stat(): modify/add arguments, pass active_cred to
MAC checks rather than td->td_ucred.
- soo_poll(), soo_stat(): modify/add arguments, pass fp->f_cred rather
than cred to pru_sopoll() to maintain current semantics.
- sopoll(): modify arguments.
- vn_poll(), vn_statfile(): modify/add arguments, pass new arguments
to vn_stat(). Pass active_cred to MAC and fp->f_cred to VOP_POLL()
to maintain current semantics.
- vn_close(): rename cred to file_cred to reflect reality while I'm here.
- vn_stat(): Add active_cred and file_cred arguments to vn_stat()
and consumers so that this distinction is maintained at the VFS
as well as 'struct file' layer. Pass active_cred instead of
td->td_ucred to MAC and to VOP_GETATTR() to maintain current semantics.

- fifofs: modify the creation of a "filetemp" so that the file
credential is properly initialized and can be used in the socket
code if desired. Pass ap->a_td->td_ucred as the active
credential to soo_poll(). If we teach the vnop interface about
the distinction between file and active credentials, we would use
the active credential here.

Note that current inconsistent passing of active_cred vs. file_cred to
VOP's is maintained. It's not clear why GETATTR would be authorized
using active_cred while POLL would be authorized using file_cred at
the file system level.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


# 101941 15-Aug-2002 rwatson

In order to better support flexible and extensible access control,
make a series of modifications to the credential arguments relating
to file read and write operations to clarify which credential is
used for what:

- Change fo_read() and fo_write() to accept "active_cred" instead of
"cred", and change the semantics of consumers of fo_read() and
fo_write() to pass the active credential of the thread requesting
an operation rather than the cached file cred. The cached file
cred is still available in fo_read() and fo_write() consumers
via fp->f_cred. These changes largely in sys_generic.c.

For each implementation of fo_read() and fo_write(), update cred
usage to reflect this change and maintain current semantics:

- badfo_readwrite() unchanged
- kqueue_read/write() unchanged
- pipe_read/write() now authorize MAC using active_cred rather
than td->td_ucred
- soo_read/write() unchanged
- vn_read/write() now authorize MAC using active_cred but
VOP_READ/WRITE() with fp->f_cred

Modify vn_rdwr() to accept two credential arguments instead of a
single credential: active_cred and file_cred. Use active_cred
for MAC authorization, and select a credential for use in
VOP_READ/WRITE() based on whether file_cred is NULL or not. If
file_cred is provided, authorize the VOP using that cred,
otherwise the active credential, matching current semantics.

Modify current vn_rdwr() consumers to pass a file_cred if used
in the context of a struct file, and to always pass active_cred.
When vn_rdwr() is used without a file_cred, pass NOCRED.

These changes should maintain current semantics for read/write,
but avoid a redundant passing of fp->f_cred, as well as making
it more clear what the origin of each credential is in file
descriptor read/write operations.

Follow-up commits will make similar changes to other file descriptor
operations, and modify the MAC framework to pass both credentials
to MAC policy modules so they can implement either semantic for
revocation.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


# 101768 13-Aug-2002 rwatson

Introduce support for labeling and access control of pipe objects
as part of the TrustedBSD MAC framework. Instrument the creation
and destruction of pipes, as well as relevant operations, with
necessary calls to the MAC framework. Note that the locking
here is probably not quite right yet, but fixes will be forthcoming.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


# 101382 05-Aug-2002 des

Check the far end before registering an EVFILT_WRITE filter on a pipe.


# 100527 22-Jul-2002 alfred

Remove unneeded caddr_t casts.


# 99899 13-Jul-2002 alc

o Lock accesses to the page queues.
o Add a comment explaining why hoisting the page queue lock outside
of a particular loop is not possible.


# 99009 28-Jun-2002 alfred

More caddr_t removal, make fo_ioctl take a void * instead of a caddr_t.


# 98989 28-Jun-2002 alfred

document that the pipe fo_stat routine doesn't need locks because it's
a read operation.

Requested by: rwatson


# 96122 06-May-2002 alfred

Make funsetown() take a 'struct sigio **' so that the locking can
be done internally.

Ensure that no one can fsetown() to a dying process/pgrp. We need
to check the process for P_WEXIT to see if it's exiting. Process
groups are already safe because there is no such thing as a pgrp
zombie, therefore the proctree lock completely protects the pgrp
from having sigio structures associated with it after it runs
funsetownlst.

Add sigio lock to witness list under proctree and allproc, but over
proc and pgrp.

Seigo Tanimura helped with this.


# 95883 01-May-2002 alfred

Redo the sigio locking.

Turn the sigio sx into a mutex.

Sigio lock is really only needed to protect interrupts from dereferencing
the sigio pointer in an object when the sigio itself is being destroyed.

In order to do this in the most unintrusive manner change pgsigio's
sigio * argument into a **, that way we can lock internally to the
function.


# 94608 13-Apr-2002 tmm

Use pmap_extract() instead of pmap_kextract() to retrieve the physical
address associated with a user virtual address in
pipe_build_write_buffer().

Reviewed by: alc


# 94566 12-Apr-2002 tmm

Back out the last revision - it does not work correctly when one of
the pages in question is not in the top-level vm object, but in
one of the shadow ones.

Pointed out by: alc
Pointy hat to: tmm


# 94539 12-Apr-2002 tmm

Do not use pmap_kextract() to find out the physical address of a page
belonging to a user virtual address; while this happens to work on some
architectures, it can't on sparc64, since user and kernel virtual
address spaces overlap there (the distinction between them is done via
separate address space identifiers).

Instead, look up the page in the vm_map of the process in question.

Reviewed by: jake


# 93818 04-Apr-2002 jhb

Change callers of mtx_init() to pass in an appropriate lock type name. In
most cases NULL is passed, but in some cases such as network driver locks
(which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used.

Tested on: i386, alpha, sparc64


# 93296 27-Mar-2002 alc

Allow recursion on the pipe mutex because filt_piperead() and filt_pipewrite()
can be called both with and without the pipe mutex held. (For example,
if called by pipeselwakeup(), it is held. Whereas, if called by kqueue_scan(),
it is not.)

Reviewed by: alfred
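A userland analogue, assuming POSIX threads: a mutex of type PTHREAD_MUTEX_RECURSIVE plays the role of a recursable kernel mutex, so a routine that always takes the lock can be called both with and without the lock already held by the same thread. The model_* names are hypothetical stand-ins for the pipe mutex, the filter routines and pipeselwakeup().

```c
#define _XOPEN_SOURCE 700
#include <pthread.h>

static pthread_mutex_t model_pipe_mtx;

/* Initialize the mutex as recursive; returns 0 on success. */
static int model_mtx_init(void)
{
    pthread_mutexattr_t attr;
    int error;

    if ((error = pthread_mutexattr_init(&attr)) != 0)
        return error;
    error = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    if (error == 0)
        error = pthread_mutex_init(&model_pipe_mtx, &attr);
    pthread_mutexattr_destroy(&attr);
    return error;
}

/* Stand-in for a filter routine that always takes the lock itself. */
static int model_filt(void)
{
    int error = pthread_mutex_lock(&model_pipe_mtx);

    if (error == 0)
        pthread_mutex_unlock(&model_pipe_mtx);
    return error;
}

/* Stand-in for a wakeup path that calls the filter with the lock held. */
static int model_wakeup(void)
{
    int error = pthread_mutex_lock(&model_pipe_mtx);

    if (error != 0)
        return error;
    error = model_filt();  /* nested acquisition: allowed when recursive */
    pthread_mutex_unlock(&model_pipe_mtx);
    return error;
}
```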


# 92959 22-Mar-2002 alfred

When "cloning" a pipe's buffer bcopy the data after dropping the pipe's
lock as the data may be paged out and cause a fault.


# 92751 20-Mar-2002 jeff

Remove references to vm_zone.h and switch over to the new uma API.

Also, remove maxsockets. If you look carefully you'll notice that the old
zone allocator never honored this anyway.


# 92654 19-Mar-2002 jeff

This is the first part of the new kernel memory allocator. This replaces
malloc(9) and vm_zone with a slab like allocator.

Reviewed by: arch@


# 92305 15-Mar-2002 alfred

Bug fixes:

Missed a place where the pipe sleep lock was needed in order to safely grab
Giant, fix it and add an assertion to make sure this doesn't happen again.

Fix typos in the PIPE_GET_GIANT/PIPE_DROP_GIANT that could cause the
wrong mutex to get passed to PIPE_LOCK/PIPE_UNLOCK.

Fix a location where the wrong pipe was being passed to
PIPE_GET_GIANT/PIPE_DROP_GIANT.


# 91968 09-Mar-2002 alfred

Don't deref NULL mutex pointer when pipeclose()'ing a pipe that is not
fully instantiated.

Revert the logic in pipeclose so that we don't have the entire function
pretty much under a single if() statement, instead invert the test and
just return if it fails.

Submitted (in different form) by: bde

Don't use pool mutexes for pipes. We can not use pool mutexes
because we will need to grab the select lock while holding a pipe
lock which is not allowed because you may not aquire additional
mutexes when holding a pool mutex.

Instead malloc(9) space for the mutex that is shared between the
pipes.


# 91653 04-Mar-2002 tanimura

Track the number of wired pages to avoid unwiring unwired pages.

Reviewed by: alfred


# 91413 27-Feb-2002 alfred

kill __P.


# 91412 27-Feb-2002 alfred

add assertions in the places where giant is required to catch when
the pipe is locked and shouldn't be.

initialize pipe->pipe_mtxp to NULL when creating pipes in order not
to trip the above assertions.

swap pipe lock with giant around calls to pipe_destroy_write_buffer()

pipe_destroy_write_buffer issue noticed by: jhb


# 91395 27-Feb-2002 alfred

Fix a NULL deref panic in pipe_write, we can't blindly lock
pipe->pipe_peer->pipe_mtxp because it may be NULL, so lock the
passed in pipe's mutex instead.


# 91372 27-Feb-2002 alfred

MPsafe fixes:

use SYSINIT to initialize pipe_zone.
use PIPE_LOCK to protect kevent ops.


# 91362 27-Feb-2002 alfred

First rev at making pipe(2) pipes MPsafe.

Both ends of the pipe share a pool_mutex, this makes allocation
and deadlock avoidance easy.

Remove some un-needed FILE_LOCK ops while I'm here.

There are some issues wrt to select and the f{s,g}etown code that
we'll have to deal with, I think we may also need to move the calls
to vfs_timestamp outside of the sections covered by PIPE_LOCK.


# 89306 13-Jan-2002 alfred

SMP Lock struct file, filedesc and the global file list.

Seigo Tanimura (tanimura) posted the initial delta.

I've polished it quite a bit reducing the need for locking and
adapting it for KSE.

Locks:

1 mutex in each filedesc
protects all the fields.
protects "struct file" initialization, while a struct file
is being changed from &badfileops -> &pipeops or something
the filedesc should be locked.

1 mutex in each struct file
protects the refcount fields.
doesn't protect anything else.
the flags used for garbage collection have been moved to
f_gcflag which was the FILLER short, this doesn't need
locking because the garbage collection is a single threaded
container.
could likely be made to use a pool mutex.

1 sx lock for the global filelist.

struct file * fhold(struct file *fp);
/* increments reference count on a file */

struct file * fhold_locked(struct file *fp);
/* like fhold but expects file to be locked */

struct file * ffind_hold(struct thread *, int fd);
/* finds the struct file in thread, adds one reference and
returns it unlocked */

struct file * ffind_lock(struct thread *, int fd);
/* ffind_hold, but returns file locked */

I still have to smp-safe the fget cruft, I'll get to that asap.


# 86598 19-Nov-2001 sobomax

Make kevents on pipes work as described in the manpage - when the last
reader/writer disconnects, ensure that anybody who is waiting for the
kevent on the other end of the pipe gets EV_EOF.

MFC after: 2 weeks


# 83805 21-Sep-2001 jhb

Use the passed in thread to selrecord() instead of curthread.


# 83366 12-Sep-2001 julian

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to the previous -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doozy!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


# 79225 04-Jul-2001 dillon

cleanup: GIANT macros, rename DEPRECIATE to DEPRECATE
Move p_giant_optional to proc zero'd section
Remove (old) XXX zfree comment in pipe code


# 79224 04-Jul-2001 dillon

With Alfred's permission, remove vm_mtx in favor of a fine-grained approach
(this commit is just the first stage). Also add various GIANT_ macros to
formalize the removal of Giant, making it easy to test in a more piecemeal
fashion. These macros will allow us to test fine-grained locks to a degree
before removing Giant, and also after, and to remove Giant in a piecemeal
fashion via sysctl's on those subsystems which the authors believe can
operate without Giant.


# 78292 15-Jun-2001 jlemon

Correctly hook up the write kqfilter to pipes.

Submitted by: Niels Provos <provos@citi.umich.edu>


# 77676 04-Jun-2001 dillon

The pipe_write() code was locking the pipe without busying it first in
certain cases, and a close() by another process could potentially rip the
pipe out from under the (blocked) locking operation.

Reported-by: Alexander Viro <viro@math.psu.edu>


# 77140 24-May-2001 alfred

whitespace/style


# 77035 23-May-2001 alfred

acquire vm_mutex a little bit earlier to protect a pmap call.


# 76940 21-May-2001 jhb

- Assert that the vm mutex is held in pipe_free_kmem().
- Don't release the vm mutex early in pipespace() but instead hold it
across vm_object_deallocate() if vm_map_find() returns an error and
across pipe_free_kmem() if vm_map_find() succeeds.
- Add an XXX above a zfree() since zalloc already has its own locking,
one would hope that zfree() wouldn't need the vm lock.


# 76827 18-May-2001 alfred

Introduce a global lock for the vm subsystem (vm_mtx).

vm_mtx does not recurse and is required for most low level
vm operations.

faults cannot be taken without holding Giant.

Memory subsystems can now call the base page allocators safely.

Almost all atomic ops were removed as they are covered under the
vm mutex.

Alpha and ia64 now need to catch up to i386's trap handlers.

FFS and NFS have been tested, other filesystems will need minor
changes (grabbing the vm lock when twiddling page properties).

Reviewed (partially) by: jake, jhb


# 76760 17-May-2001 alfred

Cleanup

Remove comment about setting error for reads on EOF, read returns 0 on
EOF so the code should be ok.

Remove non-effective priority boost, PRIO+1 doesn't do anything
(according to McKusick), if a real priority boost is needed it should
have been +4.

Style fixes:
.) return foo -> return (foo)
.) FLAG1|FLAG2 -> FLAG1 | FLAG2
.) wrap long lines
.) unwrap short lines
.) for(i=0;i=foo;i++) -> for (i = 0; i=foo; i++)
.) remove braces for some conditionals with a single statement
.) fix continuation lines.

md5 couldn't verify the binary because some code had to
be shuffled around to address the style issues.


# 76756 17-May-2001 alfred

initialize pipe pointers


# 76754 17-May-2001 alfred

pipe_create has to zero out the select record earlier to avoid
returning a half-initialized pipe and causing pipeclose() to follow
a junk pointer.

Discovered by: "Nick S" <snicko@noid.org>


# 76364 08-May-2001 alfred

Remove an 'optimization' I hope to never see again.

The pipe code could not handle running out of kva, it would panic
if that happened. Instead return ENFILE to the application which
is an acceptable error return from pipe(2).

There were some slightly tricky things that needed to be worked on,
namely that the pipe code can 'realloc' the size of the buffer if
it detects that the pipe could use a bit more room. However if it
failed the reallocation it could not cope and would panic. Fix
this by attempting to grow the pipe while holding onto our old
resources. If all goes well free the old resources and use the
new ones, otherwise continue to use the smaller buffer already
allocated.
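The grow-or-keep strategy described above can be illustrated with a minimal userland sketch in C. It is not the kernel's pipespace() code: struct pbuf and pbuf_grow() are hypothetical names, malloc()/free() stand in for the kernel's KVA allocation, and the sketch assumes the buffered bytes sit contiguously at the start of the old buffer.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified pipe buffer; not the kernel's struct pipebuf. */
struct pbuf {
    char   *buf;
    size_t  size;   /* allocated size */
    size_t  cnt;    /* bytes stored, assumed contiguous at buf[0] */
};

/*
 * Try to grow b to newsize. The old buffer stays owned by b until the
 * new allocation has succeeded, so a failure leaves b untouched.
 * Returns 0 on success (or if no growth is needed), -1 if the
 * allocation failed and the caller should keep using the old buffer.
 */
static int pbuf_grow(struct pbuf *b, size_t newsize)
{
    char *nbuf;

    if (newsize <= b->size)
        return 0;
    nbuf = malloc(newsize);
    if (nbuf == NULL)
        return -1;                      /* keep the smaller buffer */
    if (b->cnt > 0)
        memcpy(nbuf, b->buf, b->cnt);   /* carry the buffered data over */
    free(b->buf);                       /* release old resources only now */
    b->buf = nbuf;
    b->size = newsize;
    return 0;
}
```

A caller treats a -1 return as non-fatal and simply continues with the old, smaller buffer, which is the behaviour the commit substitutes for a panic.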

While I'm here add a few blank lines for style(9) and remove
'register'.


# 76166 01-May-2001 markm

Undo part of the tangle of having sys/lock.h and sys/mutex.h included in
other "system" header files.

Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h from kernel .c files.

Sort sys/*.h includes where possible in affected files.

OK'ed by: bde (with reservations)


# 72521 15-Feb-2001 jlemon

Extend kqueue down to the device layer.

Backwards compatible approach suggested by: peter


# 70915 10-Jan-2001 dwmalone

Style improvements for last fix. Should be functionally the same.

Submitted by: bde


# 70834 09-Jan-2001 wollman

select() DKI is now in <sys/selinfo.h>.


# 70803 08-Jan-2001 dwmalone

If we failed to allocate the file descriptor for the write end of
the pipe, then we were corrupting the pipe_zone free list by calling
pipeclose on rpipe twice. NULL out rpipe to avoid this.

Reviewed by: dillon
Reviewed by: iedowse


# 68883 18-Nov-2000 dillon

This patchset fixes a large number of file descriptor race conditions.
Pre-rfork code assumed inherent locking of a process's file descriptor
array. However, with the advent of rfork() the file descriptor table
could be shared between processes. This patch closes over a dozen
serious race conditions related to one thread manipulating the table
(e.g. closing or dup()ing a descriptor) while another is blocked in
an open(), close(), fcntl(), read(), write(), etc...

PR: kern/11629
Discussed with: Alexander Viro <viro@math.psu.edu>


# 65855 14-Sep-2000 jlemon

Pipes are not writeable while a direct write is in progress. However,
the kqueue filter got the sense of the test reversed, so fix it.

Spotted by: Michael Elkins <me@sigpipe.org>


# 60938 26-May-2000 jake

Back out the previous change to the queue(3) interface.
It was not discussed and should probably not happen.

Requested by: msmith and others


# 60833 23-May-2000 jake

Change the way that the queue(3) structures are declared; don't assume that
the type argument to *_HEAD and *_ENTRY is a struct.

Suggested by: phk
Reviewed by: phk
Approved by: mdodd


# 60404 11-May-2000 chris

Include UID and GID information for stat() calls using the values filled
into the file descriptor data by falloc().

Reviewed by: phk


# 59288 16-Apr-2000 jlemon

Introduce kqueue() and kevent(), a kernel event notification facility.


# 58505 23-Mar-2000 dillon

Fix in-kernel infinite loop in pipe_write() when the reader goes away
at just the wrong time.


# 55112 26-Dec-1999 bde

Use vfs_timestamp() instead of getnanotime() to set timestamps. This
fixes incoherency of pipe timestamps relative to file timestamps in
the usual case where getnanotime() is not used for the latter. (File
and pipe timestamps are still incoherent relative to real time unless
the vfs_timestamp_precision sysctl is set to 2 or 3).


# 54534 13-Dec-1999 tegge

Fix two problems with pipe_write():

1. Data written beyond end of pipe buffer, causing kernel memory corruption.

- Check that space is still valid after obtaining the pipe lock.

- Defer the calculation of transfer size until the pipe
lock has been obtained.

- Update the pipe buffer pointers while holding the pipe lock.

2. Writes of size <= PIPE_BUF not always atomic.

- Allow an internal write to span two contiguous segments,
so writes of size <= PIPE_BUF can be kept atomic
when wrapping around from the end to the start of the
pipe buffer.
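A minimal sketch of the second fix, assuming a hypothetical fixed-size ring buffer (struct ring, RB_SIZE and ring_write_atomic() are illustrative names, not kernel identifiers): a write is accepted only if it fits entirely, and when it would run off the end of the array it is copied as two contiguous segments, so a reader never sees a partial small write.

```c
#include <stddef.h>
#include <string.h>

#define RB_SIZE 16  /* hypothetical tiny buffer size, for illustration only */

struct ring {
    char   data[RB_SIZE];
    size_t in;    /* next write position */
    size_t out;   /* next read position */
    size_t cnt;   /* bytes currently stored */
};

/*
 * Copy len bytes into the ring as one unit. If the free space is smaller
 * than len, nothing is written (a real writer would sleep and retry),
 * which is what keeps small writes atomic. If the data would run past the
 * end of the array, it is split into two contiguous segments: the tail of
 * the array first, then the start of the array.
 */
static int ring_write_atomic(struct ring *r, const char *src, size_t len)
{
    size_t first;

    if (RB_SIZE - r->cnt < len)
        return -1;
    first = RB_SIZE - r->in;            /* room before wrapping */
    if (first > len)
        first = len;
    memcpy(&r->data[r->in], src, first);
    /* Second segment; zero bytes when the write does not wrap. */
    memcpy(&r->data[0], src + first, len - first);
    r->in = (r->in + len) % RB_SIZE;
    r->cnt += len;
    return 0;
}
```

In the real pipe code the space check and the pointer updates must happen while the pipe lock is held, which is the point of the first fix; this single-threaded sketch omits locking.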

PR: 15235
Reviewed by: Matt Dillon <dillon@FreeBSD.org>


# 52983 08-Nov-1999 peter

Update pipe code for fo_stat() entry point - pipe_stat() is now no longer
used outside the pipe code.


# 52635 29-Oct-1999 phk

useracc() the prequel:

Merge the contents (less some trivial bordering the silly comments)
of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>. This puts
the #defines for the vm_inherit_t and vm_prot_t types next to their
typedefs.

This paves the road for the commit to follow shortly: change
useracc() to use VM_PROT_{READ|WRITE} rather than B_{READ|WRITE}
as argument.


# 51474 20-Sep-1999 dillon

Fix bug in pipe code relating to writes of mmap'd but illegal address
spaces which cross a segment boundary in the page table. pmap_kextract()
is not designed for access to the user space portion of the page
table and cannot handle the null-page-directory-entry case.

The fix is to have vm_fault_quick() return a success or failure which
is then used to avoid calling pmap_kextract().


# 51418 19-Sep-1999 green

This is what was "fdfix2.patch," a fix for fd sharing. It's pretty
far-reaching in fd-land, so you'll want to consult the code for
changes. The biggest change is that now, you don't use
fp->f_ops->fo_foo(fp, bar)
but instead
fo_foo(fp, bar),
which increments and decrements the fp refcount upon entry and exit.
Two new calls, fhold() and fdrop(), are provided. Each does what it
seems like it should, and if fdrop() brings the refcount to zero, the
fd is freed as well.
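The hold/drop pattern can be sketched in C11 as follows. struct xfile, xfile_new(), xhold(), xdrop() and xfo_call() are hypothetical stand-ins for struct file, fhold(), fdrop() and the fo_*() wrappers, and C11 atomics stand in for whatever locking the kernel actually uses. In the kernel the hold must also be taken while the descriptor table is protected, otherwise the file could be freed between lookup and hold; the sketch does not model that.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct file; the real one has many more fields. */
struct xfile {
    atomic_int refcount;   /* starts at 1 for the descriptor table's reference */
};

static struct xfile *xfile_new(void)
{
    struct xfile *fp = malloc(sizeof(*fp));

    if (fp != NULL)
        atomic_init(&fp->refcount, 1);
    return fp;
}

/* Take an extra reference (the role fhold() plays). */
static void xhold(struct xfile *fp)
{
    atomic_fetch_add(&fp->refcount, 1);
}

/* Drop a reference; the last one frees the object. Returns 1 if freed. */
static int xdrop(struct xfile *fp)
{
    if (atomic_fetch_sub(&fp->refcount, 1) != 1)
        return 0;
    free(fp);
    return 1;
}

/* Wrap an operation in hold/drop so a concurrent close cannot free fp mid-call. */
static int xfo_call(struct xfile *fp, int (*op)(struct xfile *))
{
    int error;

    xhold(fp);
    error = op(fp);
    xdrop(fp);
    return error;
}
```

Because every call runs with its own reference, a close() that drops the table's reference in the meantime only lowers the count; the object is freed by whichever xdrop() brings it to zero.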

Thanks to peter ("to hell with it, it looks ok to me.") for his review.
Thanks to msmith for keeping me from putting locks everywhere :)

Reviewed by: peter


# 50477 27-Aug-1999 peter

$Id$ -> $FreeBSD$


# 49413 04-Aug-1999 green

Fix fd race conditions (during shared fd table usage.) Badfileops is
now used in f_ops in place of NULL, and modifications to the files
are more carefully ordered. f_ops should also be set to &badfileops
upon "close" of a file.

This fixes only the first of the problems mentioned in this PR.

PR: 11629
Reviewed by: peter


# 47748 05-Jun-1999 alc

Restructure pipe_read in order to eliminate several race conditions.

Submitted by: Matthew Dillon <dillon@apollo.backplane.com> and myself


# 45311 04-Apr-1999 dt

Add standard padding argument to pread and pwrite syscalls. That should make them
NetBSD compatible.

Add a flags parameter to fo_read and fo_write. (The only flag, FOF_OFFSET, means that
the offset is set in the struct uio.)

Factor out some common code from read/pread/write/pwrite syscalls.


# 43623 04-Feb-1999 dillon

Fix race in pipe read code whereby a blocked lock can allow another
process to sneak in and write to or close the pipe. The read code
enters a 'piperd' state after doing the lock operation without
checking to see if the state changed, which can cause the process
to wait forever.

The code has also been documented more.


# 43311 27-Jan-1999 dillon

Fix warnings in preparation for adding -Wall -Wcast-qual to the
kernel compile


# 43301 27-Jan-1999 dillon

Fix warnings in preparation for adding -Wall -Wcast-qual to the
kernel compile


# 43278 27-Jan-1999 bde

Include <sys/select.h> -- don't depend on pollution in <sys/proc.h>.


# 41591 07-Dec-1998 archie

The "easy" fixes for compiling the kernel -Wunused: remove unreferenced static
and local variables, goto labels, and functions declared but not defined.


# 41086 11-Nov-1998 truckman

Installed the second patch attached to kern/7899 with some changes suggested
by bde, a few other tweaks to get the patch to apply cleanly again and
some improvements to the comments.

This change closes some fairly minor security holes associated with
F_SETOWN, fixes a few bugs, and removes some limitations that F_SETOWN
had on tty devices. For more details, see the description on the PR.

Because this patch increases the size of the proc and pgrp structures,
it is necessary to re-install the includes and recompile libkvm,
the vinum lkm, fstat, gcore, gdb, ipfilter, ps, top, and w.

PR: kern/7899
Reviewed by: bde, elvind


# 40700 28-Oct-1998 dg

Added a second argument, "activate" to the vm_page_unwire() call so that
the caller can select either inactive or active queue to put the page on.


# 40286 13-Oct-1998 dg

Fixed two potentially serious classes of bugs:

1) The vnode pager wasn't properly tracking the file size due to
"size" being page rounded in some cases and not in others.
This sometimes resulted in corrupted files. First noticed by
Terry Lambert.
Fixed by changing the "size" pager_alloc parameter to be a 64bit
byte value (as opposed to a 32bit page index) and changing the
pagers and their callers to deal with this properly.
2) Fixed a bogus type cast in round_page() and trunc_page() that
caused some 64bit offsets and sizes to be scrambled. Removing
the cast required adding casts at a few dozen callers.
There may be problems with other bogus casts in close-by
macros. A quick check seemed to indicate that those were okay,
however.


# 36735 07-Jun-1998 dfr

This commit fixes various 64bit portability problems required for
FreeBSD/alpha. The most significant item is to change the command
argument to ioctl functions from int to u_long. This change brings us
in line with various other BSD versions. Driver writers may like to
use (__FreeBSD_version == 300003) to detect this change.

The prototype FreeBSD/alpha machdep will follow in a couple of days
time.


# 34924 28-Mar-1998 bde

Moved some #includes from <sys/param.h> nearer to where they are actually
used.


# 34901 26-Mar-1998 phk

Add two new functions, get{micro|nano}time.

They are atomic, but return in essence what is in the "time" variable.
gettime() is now a macro front for getmicrotime().

Various patches to use the two new functions instead of the various
hacks used in their absence.

Some punctuation and grammar patches from Bruce.

A couple of XXX comments.


# 33181 09-Feb-1998 eivind

Staticize.


# 33134 06-Feb-1998 eivind

Back out DIAGNOSTIC changes.


# 33108 04-Feb-1998 eivind

Turn DIAGNOSTIC into a new-style option.


# 31016 07-Nov-1997 phk

Remove a bunch of variables which were unused both in GENERIC and LINT.

Found by: -Wunused


# 30994 06-Nov-1997 phk

Move the "retval" (3rd) parameter from all syscall functions and put
it in struct proc instead.

This fixes a boatload of compiler warnings, and removes a lot of cruft
from the sources.

I have not removed the /*ARGSUSED*/, they will require some looking at.

libkvm, ps and other userland struct proc frobbing programs will need to be
recompiled.


# 30164 06-Oct-1997 peter

Ack! Fix excessive cut/paste blunder during poll mods. Who had the
pointy hat last? :-]

When one is selecting (or polling) for write, it helps if we use the
write side of the pipe when requesting wakeups instead of the read side.
This broke ghostview (at least) - I'm surprised it wasn't noticed for
so long.

Reviewed by: Greg Lehey <grog@lemis.com>


# 29356 14-Sep-1997 peter

Implement the poll backend for the pipe file type.


# 29041 02-Sep-1997 bde

Removed unused #includes.


# 27923 05-Aug-1997 dyson

Another attempt at cleaning up the new memory allocator.


# 27900 04-Aug-1997 dyson

Fix up some cruft that I left in a previous commit.


# 27899 04-Aug-1997 dyson

Get rid of the ad-hoc memory allocator for vm_map_entries, in favor of
a simple, clean zone type allocator. This new allocator will also be
used for machine dependent pmap PV entries.


# 24752 09-Apr-1997 bde

Removed support for OLD_PIPE. <sys/stat.h> is now missing the hack that
supported nameless pipes being indistinguishable from fifos. We're not
going back.


# 24206 24-Mar-1997 bde

Don't include <sys/ioctl.h> in the kernel. Stage 4: include
<sys/ttycom.h> and sometimes <sys/filio.h> instead of <sys/ioctl.h>
in miscellaneous files. Most of these files have nothing to do
with ttys but need to include <sys/ttycom.h> to get the definitions
of TIOC[SG]PGRP which are (ab)used to convert F[SG]ETOWN fcntls into
ioctls.


# 24131 23-Mar-1997 bde

Don't #include <sys/fcntl.h> in <sys/file.h> if KERNEL is defined.
Fixed everything that depended on getting fcntl.h stuff from the wrong
place. Most things don't depend on file.h stuff at all.


# 24101 22-Mar-1997 bde

Fixed some invalid (non-atomic) accesses to `time', mostly ones of the
form `tv = time'. Use a new function gettime(). The current version
just forces atomicity without fixing precision or efficiency bugs.
Simplified some related valid accesses by using the central function.


# 22975 22-Feb-1997 peter

Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.


# 22521 10-Feb-1997 dyson

This is the kernel Lite/2 commit. There are some requisite userland
changes, so don't expect to be able to run the kernel as-is (very well)
without the appropriate Lite/2 userland changes.

The system boots and can mount UFS filesystems.

Untested: ext2fs, msdosfs, NFS
Known problems: Incorrect Berkeley ID strings in some files.
Mount_std mounts will not work until the getfsent
library routine is changed.

Reviewed by: various people
Submitted by: Jeffery Hsu <hsu@freebsd.org>


# 21673 14-Jan-1997 jkh

Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.


# 18863 11-Oct-1996 dyson

Mostly some fixes from bde to start support for ASYNC I/O (SIGIO).
Submitted by: bde


# 17163 13-Jul-1996 dyson

A few minor mods (improvements) to support more efficient pipe
operations for large transfers. There are essentially no differences
for small transfers, but big transfers should perform about 20%
better.


# 17124 12-Jul-1996 bde

Staticized some variables.

Fixed initialization of pipe_pgid - don't default to pid 0 (swapper) for
SIGIO.

Added comments about other implicit initializations, mostly for struct
stat.

Fixed initialization of st_mode. S_IFSOCK was for when pipes were sockets.
It is probably safe to fix the bogus S_ISFIFO() now that pipes can be
distinguished from sockets in all cases.

Don't return ENOSYS for inappropriate ioctls.


# 16960 04-Jul-1996 dyson

Get rid of PIPE_NBIO, cleaning up the code a bit.
Reviewed by: bde


# 16416 17-Jun-1996 dyson

Disable direct writes for non-blocking output.


# 16322 12-Jun-1996 gpalmer

Clean up -Wunused warnings.

Reviewed by: bde


# 14802 24-Mar-1996 dyson

Various pipe error return fixes, and a significant typo fix. From
Bruce Evans (of course :-)).
Submitted by: bde


# 14644 17-Mar-1996 dyson

Yet another fix from BDE for the new pipe code. This fixes a potential
deadlock due to mismanagement of busy counters.

Reviewed by: dyson
Submitted by: bde


# 14177 22-Feb-1996 dyson

Fix a problem that select did not work with direct writes. Make
wakeup channels more consistent also.


# 14122 17-Feb-1996 peter

Add missing prototype for pipeselwakeup (a recently added function) - gcc
bitches about it.


# 14037 11-Feb-1996 dyson

Add ifdefs for non-freebsd system usage. Add missing select wakeups,
and make the select wakeup code a little neater.


# 13992 09-Feb-1996 dyson

Add some missing requests for the read-side to wakeup the write-side. Also
add some missing wakeups by the write side to the read side.


# 13951 07-Feb-1996 dyson

Apparent fix for a pipe hang problem.


# 13913 05-Feb-1996 dyson

More fixes from bde.
Only modify times on success.
splhigh() around time variable usage.
Make atomic writes more POSIX compliant.
Spelling errors.
Submitted by: bde


# 13912 05-Feb-1996 dyson

Kva space allocated for direct buffer wasn't quite big enough. The
system can panic easily without this patch.


# 13909 04-Feb-1996 dyson

Changed vm_fault_quick in vm_machdep.c to be global. Needed for
new pipe code.


# 13907 04-Feb-1996 dyson

Improve the performance for pipe(2) again. Also include some
fixes for previous version of new pipes from Bruce Evans. This
new version:

Supports more properly the semantics of select (BDE).
Supports "OLD_PIPE" correctly (kern_descrip.c, BDE).
Eliminates incorrect EPIPE returns (bash 'pipe broken' messages.)
Much faster yet, currently tuned relatively conservatively -- but now
gives approx 50% more perf than the new pipes code did originally.
(That was about 50% more perf than the original BSD pipe code.)

Known bugs outstanding:
No support for async io (SIGIO). Will be included soon.

Next to do:
Merge support for FIFOs.

Submitted by: bde


# 13776 31-Jan-1996 dyson

Fix another problem with the new pipe code, pointed out by Bruce Evans.
This one fixes a problem with interactions with signals.


# 13774 31-Jan-1996 dyson

Fix some problems with return codes on the new pipe stuff. Bruce Evans
found the problems, and this commit will fix the "first batch" :-).


# 13688 29-Jan-1996 dyson

Fixed an uninitialized variable (argument to vm_map_find) -- problem
that DG detected, and promptly found a fix.
Submitted by: davidg


# 13675 28-Jan-1996 dyson

Added new files to support the new fast pipes. After the follow-on
commits, pipe performance should increase significantly. The pipe(2)
system call is currently supported, while fifofs will be added later.