History log of /freebsd-9.3-release/sys/nfsserver/nfs.h
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
# 267654 19-Jun-2014 gjb

Copy stable/9 to releng/9.3 as part of the 9.3-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation

# 225736 22-Sep-2011 kensmith

Copy head to stable/9 as part of 9.0-RELEASE release cycle.

Approved by: re (implicit)


# 216632 21-Dec-2010 pjd

- Move pubflag and lockflag handling from nfsrv_fhtovp() to nfs_namei() -
this is the only place that is different from all the other nfsrv_fhtovp()
consumers.
This simplifies nfsrv_fhtovp() a bit and also eliminates one
vn_lock/VOP_UNLOCK() cycle in case of NFSv3.
- Implement NFSRV_FLAG_BUSY flag for nfsrv_fhtovp() that tells it to leave
mount point busy.

Reviewed by: kib
MFC after: 5 days


# 203732 09-Feb-2010 marius

- Move nfs_realign() from the NFS client to the shared NFS code and
remove the NFS server version in order to reduce code duplication.
The shared version now uses a second parameter how, which is passed
on to m_get(9) and m_getcl(9) as the server used M_WAIT while the
client requires M_DONTWAIT, and replaces the the previously unused
parameter hsiz.
- Change nfs_realign() to use nfsm_aligned() so as with other NFS code
the alignment check isn't actually performed on platforms without
strict alignment requirements for performance reasons because as the
comment suggests unaligned data only occasionally occurs with TCP.
- Change fha_extract_info() to use nfs_realign() with M_DONTWAIT rather
than M_WAIT because it's called with the RPC sp_lock held.

Reviewed by: jhb, rmacklem
MFC after: 1 week


# 201899 09-Jan-2010 marius

Some style(9) fixes in order to fabricate a commit to denote that
the commit message for r201896 actually should have read:

As nfsm_srvmtofh_xx() assumes the 4-byte alignment required by XDR
ensure the mbuf data is aligned accordingly by calling nfs_realign()
in fha_extract_info(). This fix is orthogonal to the problem solved
by r199274/r199284.

PR: 142102 (second part)
MFC after: 1 week


# 201896 09-Jan-2010 marius

Exclude options COMPAT_FREEBSD4 now that the MD freebsd4_sigreturn()
is gone since r201396 and which is also in line with the fact that
FreeBSD 4 didn't supported sparc64.

PR: 142102 (second part)
MFC after: 1 week


# 195202 30-Jun-2009 dfr

Remove the old kernel RPC implementation and the NFS_LEGACYRPC option.

Approved by: re


# 193272 01-Jun-2009 jhb

Rework socket upcalls to close some races with setup/teardown of upcalls.
- Each socket upcall is now invoked with the appropriate socket buffer
locked. It is not permissible to call soisconnected() with this lock
held; however, so socket upcalls now return an integer value. The two
possible values are SU_OK and SU_ISCONNECTED. If an upcall returns
SU_ISCONNECTED, then the soisconnected() will be invoked on the
socket after the socket buffer lock is dropped.
- A new API is provided for setting and clearing socket upcalls. The
API consists of soupcall_set() and soupcall_clear().
- To simplify locking, each socket buffer now has a separate upcall.
- When a socket upcall returns SU_ISCONNECTED, the upcall is cleared from
the receive socket buffer automatically. Note that a SO_SND upcall
should never return SU_ISCONNECTED.
- All this means that accept filters should now return SU_ISCONNECTED
instead of calling soisconnected() directly. They also no longer need
to explicitly clear the upcall on the new socket.
- The HTTP accept filter still uses soupcall_set() to manage its internal
state machine, but other accept filters no longer have any explicit
knowlege of socket upcall internals aside from their return value.
- The various RPC client upcalls currently drop the socket buffer lock
while invoking soreceive() as a temporary band-aid. The plan for
the future is to add a new flag to allow soreceive() to be called with
the socket buffer locked.
- The AIO callback for socket I/O is now also invoked with the socket
buffer locked. Previously sowakeup() would drop the socket buffer
lock only to call aio_swake() which immediately re-acquired the socket
buffer lock for the duration of the function call.

Discussed with: rwatson, rmacklem


# 190971 12-Apr-2009 rmacklem

Change nfsserver so that it uses the nfssvc() system call provided
in sys/nfs/nfs_nfssvc.c by registering with it using the
nfsd_call_nfsserver function pointer. Also, add the build glue for
nfs_nfssvc.c optionally based on "nfsserver" and also as a loadable
module.

Submitted by: rmacklem
Reviewed by: kib
Approved by: kib (mentor)


# 184969 14-Nov-2008 dfr

Switch the default rpc implementation for NFS back to the new code. I believe
I have fixed the reported problems - if you still have trouble with it, please
contact me with as much detail as possible so that I can track down any other
issues as quickly as possible.


# 184920 13-Nov-2008 dfr

Temporarily switch NFS back to the old RPC code while I try to diagnose and
fix the problems a few people have noticed with the new code. People who want
to continue testing the new code or who need RPCSEC_GSS support should use
the new option NFS_NEWRPC to select it.


# 184588 03-Nov-2008 dfr

Implement support for RPCSEC_GSS authentication to both the NFS client
and server. This replaces the RPC implementation of the NFS client and
server with the newer RPC implementation originally developed
(actually ported from the userland sunrpc code) to support the NFS
Lock Manager. I have tested this code extensively and I believe it is
stable and that performance is at least equal to the legacy RPC
implementation.

The NFS code currently contains support for both the new RPC
implementation and the older legacy implementation inherited from the
original NFS codebase. The default is to use the new implementation -
add the NFS_LEGACYRPC option to fall back to the old code. When I
merge this support back to RELENG_7, I will probably change this so
that users have to 'opt in' to get the new code.

To use RPCSEC_GSS on either client or server, you must build a kernel
which includes the KGSSAPI option and the crypto device. On the
userland side, you must build at least a new libc, mountd, mount_nfs
and gssd. You must install new versions of /etc/rc.d/gssd and
/etc/rc.d/nfsd and add 'gssd_enable=YES' to /etc/rc.conf.

As long as gssd is running, you should be able to mount an NFS
filesystem from a server that requires RPCSEC_GSS authentication. The
mount itself can happen without any kerberos credentials but all
access to the filesystem will be denied unless the accessing user has
a valid ticket file in the standard place (/tmp/krb5cc_<uid>). There
is currently no support for situations where the ticket file is in a
different place, such as when the user logged in via SSH and has
delegated credentials from that login. This restriction is also
present in Solaris and Linux. In theory, we could improve this in
future, possibly using Brooks Davis' implementation of variant
symlinks.

Supporting RPCSEC_GSS on a server is nearly as simple. You must create
service creds for the server in the form 'nfs/<fqdn>@<REALM>' and
install them in /etc/krb5.keytab. The standard heimdal utility ktutil
makes this fairly easy. After the service creds have been created, you
can add a '-sec=krb5' option to /etc/exports and restart both mountd
and nfsd.

The only other difference an administrator should notice is that nfsd
doesn't fork to create service threads any more. In normal operation,
there will be two nfsd processes, one in userland waiting for TCP
connections and one in the kernel handling requests. The latter
process will create as many kthreads as required - these should be
visible via 'top -H'. The code has some support for varying the number
of service threads according to load but initially at least, nfsd uses
a fixed number of threads according to the value supplied to its '-n'
option.

Sponsored by: Isilon Systems
MFC after: 1 month


# 183103 16-Sep-2008 attilio

Decontext-alize the nfsserver module.
Now, only some few places still require thread passing (mostly the ones which
access to VOP_* functions) and will be fixed once the primitive also will be.

Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>


# 173334 04-Nov-2007 rwatson

Garbage collect now-unused nfsrv_setcred() -- it's not only unused, but
also a purveyor of unfortunate (and now unsupported) direct frobbing of
struct ucred.

MFC after: 3 days


# 167665 17-Mar-2007 jeff

- Turn all explicit giant acquires into conditional VFS_LOCK_GIANTs.
Only ops which used namei still remained.
- Implement a scheme for reducing the overhead of tracking which vops
require giant by constantly reducing the number of recursive giant
acquires to one, leaving us with only one vfslocked variable.
- Remove all NFSD lock acquisition and release from the individual nfs
ops. Careful examination has shown that they are not required. This
greatly simplifies the code.

Sponsored by: Isilon Systems, Inc.
Discussed with: rwatson
Tested by: kkenn
Approved by: re


# 164585 24-Nov-2006 rwatson

Push Giant a bit further off the NFS server in a number of straight
forward cases by converting from unconditional acquisition of Giant
around vnode operations to conditional acquisition:

- Remove nfsrv_access_withgiant(), and cause nfsrv_access() to now
assert that Giant will be held if it is required for the vnode.

- Add nfsrv_fhtovp_locked(), which will drop the NFS server lock if
required, and modify nfsrv_fhtovp() to conditionally acquire
Giant if required.

- In the VOP's not dealing with more than one vnode at a time (i.e.,
not involving a lookup), conditionally acquire Giant.

This removes Giant use for MPSAFE file systems for a number of quite
important RPCs, including getattr, read, write. It leaves
unconditional Giant acquisitions in vnode operations that interact
with the name space or more than one vnode at a time as these
require further work.

Tested by: kris
Reviewed by: kib


# 160881 01-Aug-2006 jhb

- Add a new function nfsrv_destroycache() to tear down the server request
cache when unloading the nfsserver module. This fixes a memory leak and
a stale pointer.
- Use callout_drain() rather than callout_stop() when unloading the
nfsserver module.

MFC after: 3 days


# 154960 28-Jan-2006 csjp

Manage the ucred for the NFS server using the crget/crfree API defined in
kern_prot.c. This API handles reference counting among many other things.
Notably, if MAC is compiled into the kernel, it will properly initialize the
MAC labels when the ucred is allocated.

This work is in preparation for a new MAC entry point which will be responsible
for properly initializing policy specific labels for the NFS server credential.
Utilization of the crfree/crget APIs reduce the complexity associated with
this label's management.

Submitted by: green (with changes) [1]
Obtained from: TrustedBSD Project
Discussed with: rwatson, alfred

[1] I moved the ucred allocation outside the scope of the NFS server lock to
prevent M_WAIKOK allocations from occurring with non-sleep-able locks held.
Additionally, to reduce complexity, the ucred persist as long as the NFS
server descriptor.


# 145197 17-Apr-2005 rwatson

NFS write gathering defers execution of NFS server write requests to wait
to see if additional write requests will arrive that can be coalesced and
clustered with earlier ones. When doing so, it must determine whether
the two requests are made by credentials with the same access writes, so
as not to coalesce improperly. NFSW_SAMECRED() implements a test of two
credentials using a binary compare.

Replace NFSW_SAMECRED() macro with nfsrv_samecred() function, which is
aware of the contents and layout of a struct ucred, rather than a simple
binary compare. While the binary compare works when ucred is simply a
zero'd and embedded 'struct ucred' in the NFS descriptor, it will work
less well when the ucred associated with an NFS descriptor is "real", so
has defined and populated reference count, mutex, etc.

MFC after: 1 week
Obtained from: TrustedBSD Project


# 140770 24-Jan-2005 phk

Don't try to create vnode_pager objects on other filesystems vnodes,
either they did it themselves or it won't happen.


# 139823 06-Jan-2005 imp

/* -> /*- for license, minor formatting changes


# 129639 24-May-2004 rwatson

The socket code upcalls into the NFS server using the so_upcall
mechanism so that early processing on mbufs can be performed before
a context switch to the NFS server threads. Because of this, if
the socket code is running without Giant, the NFS server also needs
to be able to run the upcall code without relying on the presence on
Giant. This change modifies the NFS server to run using a "giant
code lock" covering operation of the whole subsystem. Work is in
progress to move to data-based locking as part of the NFSv4 server
changes.

Introduce an NFS server subsystem lock, 'nfsd_mtx', and a set of
macros to operate on the lock:

NFSD_LOCK_ASSERT() Assert nfsd_mtx owned by current thread
NFSD_UNLOCK_ASSERT() Assert nfsd_mtx not owned by current thread
NFSD_LOCK_DONTCARE() Advisory: this function doesn't care
NFSD_LOCK() Lock nfsd_mtx
NFSD_UNLOCK() Unlock nfsd_mtx

Constify a number of global variables/structures in the NFS server
code, as they are not modified and contain constants only:

nfsrvv2_procid nfsrv_nfsv3_procid nonidempotent
nfsv2_repstat nfsv2_type nfsrv_nfsv3_procid
nfsrvv2_procid nfsrv_v2errmap nfsv3err_null
nfsv3err_getattr nfsv3err_setattr nfsv3err_lookup
nfsv3err_access nfsv3err_readlink nfsv3err_read
nfsv3err_write nfsv3err_create nfsv3err_mkdir
nfsv3err_symlink nfsv3err_mknod nfsv3err_remove
nfsv3err_rmdir nfsv3err_rename nfsv3err_link
nfsv3err_readdir nfsv3err_readdirplus nfsv3err_fsstat
nfsv3err_fsinfo nfsv3err_pathconf nfsv3err_commit
nfsrv_v3errmap

There are additional structures that should be constified but due
to their being passed into general purpose functions without const
arguments, I have not yet converted.

In general, acquire nfsd_mtx when accessing any of the global NFS
structures, including struct nfssvc_sock, struct nfsd, struct
nfsrv_descript.

Release nfsd_mtx whenever calling into VFS, and acquire Giant for
calls into VFS. Giant is not required for any part of the
operation of the NFS server with the exception of calls into VFS.
Giant will never by acquired in the upcall code path. However, it
may operate entirely covered by Giant, or not. If debug.mpsafenet
is set to 0, the system calls will acquire Giant across all
operations, and the upcall will assert Giant. As such, by default,
this enables locking and allows us to test assertions, but should not
cause any substantial new amount of code to be run without Giant.
Bugs should manifest in the form of lock assertion failures for now.

This approach is similar (but not identical) to modifications to the
BSD/OS NFS server code snapshot provided by BSDi as part of their
SMPng snapshot. The strategy is almost the same (single lock over
the NFS server), but differs in the following ways:

- Our NFS client and server code bases don't overlap, which means
both fewer bugs and easier locking (thanks Peter!). Also means
NFSD_*() as opposed to NFS_*().

- We make broad use of assertions, whereas the BSD/OS code does not.

- Made slightly different choices about how to handle macros building
packets but operating with side effects.

- We acquire Giant only when entering VFS from the NFS server daemon
threads.

- Serious bugs in BSD/OS implementation corrected -- the snapshot we
received was clearly a work in progress.

Based on ideas from: BSDi SMPng Snapshot
Reviewed by: rick@snowhite.cis.uoguelph.ca
Extensive testing by: kris


# 128112 11-Apr-2004 peadar

Don't let the NFS server module be unloaded as long as there are
nfsd processes running

Reviewed By: iedowse
PR: 16299


# 127977 07-Apr-2004 imp

Remove advertising clause from University of California Regent's
license, per letter dated July 22, 1999 and email from Peter Wemm,
Alan Cox and Robert Watson.

Approved by: core, peter, alc, rwatson


# 127924 05-Apr-2004 rwatson

Add imperfect comments identifying the function of various nfs socket
condition flags. Corrections, if appropriate, welcome.


# 126962 14-Mar-2004 peter

Calculate NFS timeouts in units of 10ms, not 5ms. This matches the default
clock precision on i386. This is a NOP change on i386. But this stops
the mount_nfs units from suddenly changing to units of 1/20 of a second
(vs the normal 1/10 of a second) if HZ is increased.


# 126723 07-Mar-2004 kan

Convert from timeout to callout API.

Submitted by: rwatson


# 115301 25-May-2003 truckman

Beat vnode locking in the NFS server code into submission. This change
is not pretty, but it fixes the code so that it no longer violates the
vnode locking rules in the VFS API and doesn't trip any of the locking
assertions enabled by the DEBUG_VFS_LOCKS kernel configuration option.
There is one report that this patch fixed a "locking against myself"
panic on an NFS server that was tripped by a diskless client.

Approved by: re (scottl)


# 100641 24-Jul-2002 peter

Fully exterminate nfsd_srvargs and nfsd_cargs. They were either unused
or giant NOP's. There was a credential in srvargs that was giving
rwatson some heartburn. :-)


# 100134 15-Jul-2002 alfred

Add IPv6 support.

Submitted by: Jean-Luc Richier <Jean-Luc.Richier@imag.fr>


# 96755 16-May-2002 trhodes

More s/file system/filesystem/g


# 89094 08-Jan-2002 msmith

Rename some variables that end up shadowing their namesakes in the NFS client
code.

Reviewed by: peter


# 84079 28-Sep-2001 peter

Unwind some more macros. NFSMADV() was kinda silly since it was right
next to equivalent m_len adjustments. Move the nfsm_subs.h macros
into groups depending on which phase they are used in, since that
affects the error recovery requirements. Collect some of the common error
checking into a single macro as preparation for unwinding some more.
Have nfs_rephead return a value instead of secretly modifying args.
Remove some unused function arguments that were being passed around.
Clarify nfsm_reply()'s error handling (I hope).


# 83651 18-Sep-2001 peter

Cleanup and split of nfs client and server code.
This builds on the top of several repo-copies.


# 83493 15-Sep-2001 peter

Sync some differences that were different between the copies of the files
that were in nfs/nfs.h and nfsserver/nfs.h in the p4 tree.


# 83366 12-Sep-2001 julian

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


# 83291 10-Sep-2001 kris

Fix some signed/unsigned integer confusion, and add bounds checking of
arguments to some functions.

Obtained from: NetBSD
Reviewed by: peter
MFC after: 2 weeks


# 75631 17-Apr-2001 alfred

Implement client side NFS locks.

Obtained from: BSD/os
Import Ok'd by: mckusick, jkh, motd on builder.freebsd.org


# 74384 17-Mar-2001 peter

Use a generic implementation of the Fowler/Noll/Vo hash (FNV hash).
Make the name cache hash as well as the nfsnode hash use it.

As a special tweak, create an unsigned version of register_t. This allows
us to use a special tweak for the 64 bit versions that significantly
speeds up the i386 version (ie: int64 XOR int64 is slower than int64
XOR int32).

The code layout is a little strange for the string function, but I was
able to get between 5 to 10% improvement over the original version I
started with. The layout affects gcc code generation choices and this way
was fastest on x86 and alpha.

Note that 'CPUTYPE=p3' etc makes a fair difference to this. It is
around 45% faster with -march=pentiumpro on a p6 cpu.


# 72650 18-Feb-2001 green

Switch to using a struct xucred instead of a struct xucred when not
actually in the kernel. This structure is a different size than
what is currently in -CURRENT, but should hopefully be the last time
any application breakage is caused there. As soon as any major
inconveniences are removed, the definition of the in-kernel struct
ucred should be conditionalized upon defined(_KERNEL).

This also changes struct export_args to remove dependency on the
constantly-changing struct ucred, as well as limiting the bounds
of the size fields to the correct size. This means: a) mountd and
friends won't break all the time, b) mountd and friends won't crash
the kernel all the time if they don't know what they're doing wrt
actual struct export_args layout.

Reviewed by: bde


# 67486 24-Oct-2000 dwmalone

Problem to avoid processes getting stuck in "vmopar". From Ian's
mail:

The problem seems to originate with NFS's postop_attr
information that is returned with a read or write RPC.
Within a vm_fault context, the code cannot deal with
vnode_pager_setsize() shrinking a vnode.

The workaround in the patch below stops the nfsm_postop_attr()
macro from ever shrinking a vnode. If the new size in the
postop_attr information is smaller, then it just sets the
nfsnode n_attrstamp to 0 to stop the wrong size getting
used in the future. This change only affects postop_attr
attributes; the nfsm_loadattr() macro works as normal.

The change is implemented by adding a new argument to
nfs_loadattrcache() called 'dontshrink'. When this is
non-zero, nfs_loadattrcache() will never reduce the
vnode/nfsnode size; instead it zeros n_attrstamp.

There remain other was processes can get stuck in vmopar.

Submitted by: Ian Dowse <iedowse@maths.tcd.ie>
Reviewed by: dillon
Tested by: Vadim Belman <voland@lflat.org>


# 60938 26-May-2000 jake

Back out the previous change to the queue(3) interface.
It was not discussed and should probably not happen.

Requested by: msmith and others


# 60833 23-May-2000 jake

Change the way that the queue(3) structures are declared; don't assume that
the type argument to *_HEAD and *_ENTRY is a struct.

Suggested by: phk
Reviewed by: phk
Approved by: mdodd


# 55934 13-Jan-2000 dillon

The alpha build cuases the 'nfsuid bloated' warning to occur. Well,
there is nothing we can do about it. In fact, after further review
there simply are not very many instances of the two structures NFS
checks for 'bloat' so I've decided to simply rip the checks out entirely.

Submitted by: Andrew Gallatin <gallatin@cs.duke.edu>


# 55206 29-Dec-1999 peter

Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL"
is an application space macro and the applications are supposed to be free
to use it as they please (but cannot). This is consistant with the other
BSD's who made this change quite some time ago. More commits to come.


# 52492 25-Oct-1999 dillon

Move NFS access cache hits/misses into nfsstats structure so
/usr/bin/nfsstat can get to it easily.


# 51791 29-Sep-1999 marcel

sigset_t change (part 2 of 5)
-----------------------------

The core of the signalling code has been rewritten to operate
on the new sigset_t. No methodological changes have been made.
Most references to a sigset_t object are through macros (see
signalvar.h) to create a level of abstraction and to provide
a basis for further improvements.

The NSIG constant has not been changed to reflect the maximum
number of signals possible. The reason is that it breaks
programs (especially shells) which assume that all signals
have a non-null name in sys_signame. See src/bin/sh/trap.c
for an example. Instead _SIG_MAXSIG has been introduced to
hold the maximum signal possible with the new sigset_t.

struct sigprop has been moved from signalvar.h to kern_sig.c
because a) it is only used there, and b) access must be done
though function sigprop(). The latter because the table doesn't
holds properties for all signals, but only for the first NSIG
signals.

signal.h has been reorganized to make reading easier and to
add the new and/or modified structures. The "old" structures
are moved to signalvar.h to prevent namespace polution.

Especially the coda filesystem suffers from the change, because
it contained lines like (p->p_sigmask == SIGIO), which is easy
to do for integral types, but not for compound types.

NOTE: kdump (and port linux_kdump) must be recompiled.

Thanks to Garrett Wollman and Daniel Eischen for pressing the
importance of changing sigreturn as well.


# 51344 17-Sep-1999 dillon

Asynchronized client-side nfs_commit. NFS commit operations were
previously issued synchronously even if async daemons (nfsiod's) were
available. The commit has been moved from the strategy code to the doio
code in order to asynchronize it.

Removed use of lastr in preparation for removal of vnode->v_lastr. It
has been replaced with seqcount, which is already supported by the system
and, in fact, gives us a better heuristic for sequential detection then
lastr ever did.

Made major performance improvements to the server side commit. The
server previously fsync'd the entire file for each commit rpc. The
server now bawrite()s only those buffers related to the offset/size
specified in the commit rpc.

Note that we do not commit the meta-data yet. This works still needs
to be done.

Note that a further optimization can be done (and has not yet been done)
on the client: we can merge multiple potential commit rpc's into a
single rpc with a greater file offset/size range and greatly reduce
rpc traffic.

Reviewed by: Alan Cox <alc@cs.rice.edu>, David Greenman <dg@root.com>


# 50477 27-Aug-1999 peter

$Id$ -> $FreeBSD$


# 46580 06-May-1999 phk

remove b_proc from struct buf, it's (now) unused.

Reviewed by: dillon, bde


# 46349 02-May-1999 alc

The VFS/BIO subsystem contained a number of hacks in order to optimize
piecemeal, middle-of-file writes for NFS. These hacks have caused no
end of trouble, especially when combined with mmap(). I've removed
them. Instead, NFS will issue a read-before-write to fully
instantiate the struct buf containing the write. NFS does, however,
optimize piecemeal appends to files. For most common file operations,
you will not notice the difference. The sole remaining fragment in
the VFS/BIO system is b_dirtyoff/end, which NFS uses to avoid cache
coherency issues with read-merge-write style operations. NFS also
optimizes the write-covers-entire-buffer case by avoiding the
read-before-write. There is quite a bit of room for further
optimization in these areas.

The VM system marks pages fully-valid (AKA vm_page_t->valid =
VM_PAGE_BITS_ALL) in several places, most noteably in vm_fault. This
is not correct operation. The vm_pager_get_pages() code is now
responsible for marking VM pages all-valid. A number of VM helper
routines have been added to aid in zeroing-out the invalid portions of
a VM page prior to the page being marked all-valid. This operation is
necessary to properly support mmap(). The zeroing occurs most often
when dealing with file-EOF situations. Several bugs have been fixed
in the NFS subsystem, including bits handling file and directory EOF
situations and buf->b_flags consistancy issues relating to clearing
B_ERROR & B_INVAL, and handling B_DONE.

getblk() and allocbuf() have been rewritten. B_CACHE operation is now
formally defined in comments and more straightforward in
implementation. B_CACHE for VMIO buffers is based on the validity of
the backing store. B_CACHE for non-VMIO buffers is based simply on
whether the buffer is B_INVAL or not (B_CACHE set if B_INVAL clear,
and vise-versa). biodone() is now responsible for setting B_CACHE
when a successful read completes. B_CACHE is also set when a bdwrite()
is initiated and when a bwrite() is initiated. VFS VOP_BWRITE
routines (there are only two - nfs_bwrite() and bwrite()) are now
expected to set B_CACHE. This means that bowrite() and bawrite() also
set B_CACHE indirectly.

There are a number of places in the code which were previously using
buf->b_bufsize (which is DEV_BSIZE aligned) when they should have
been using buf->b_bcount. These have been fixed. getblk() now clears
B_DONE on return because the rest of the system is so bad about
dealing with B_DONE.

Major fixes to NFS/TCP have been made. A server-side bug could cause
requests to be lost by the server due to nfs_realign() overwriting
other rpc's in the same TCP mbuf chain. The server's kernel must be
recompiled to get the benefit of the fixes.

Submitted by: Matthew Dillon <dillon@apollo.backplane.com>


# 44246 24-Feb-1999 peter

Untangle the nfs send and receive queue locking a little. One lock
routine was [ab]used for two different things, and you couldn't tell from
the wait channel which one had wedged.
Catch a few things missing from NFS_NOSERVER.


# 38894 07-Sep-1998 bde

Made unloading of the nfs LKM sort of work. This is mainly to test
detachment of vfs sysctls. Unloading of vfs LKMs doesn't actually
work for any vfs, since it leaves garbage pointers to memory
allocation control structures.


# 38482 23-Aug-1998 wollman

Yow! Completely change the way socket options are handled, eliminating
another specialized mbuf type in the process. Also clean up some
of the cruft surrounding IPFW, multicast routing, RSVP, and other
ill-explored corners.


# 37291 30-Jun-1998 jmg

fix buildworld hopefully be3fore anyone complains...

NFS_*TIMO should possibly be converted to sysctl vars (jkh's suggestion),
but in some cases it looks like nfs keeps a copy of the value in a struct

hash sizes are already ifdef'd KERNEL, so there aren't userland inpact
from them...


# 37272 30-Jun-1998 jmg

convert some nfs tunables to options, these are:
NFS_MINATTRTIMO VREG attrib cache timeout in sec
NFS_MAXATTRTIMO
NFS_MINDIRATTRTIMO VDIR attrib cache timeout in sec
NFS_MAXDIRATTRTIMO
NFS_GATHERDELAY Default write gather delay (msec)
NFS_UIDHASHSIZ Tune the size of nfssvc_sock with this
NFS_WDELAYHASHSIZ and with this
NFS_MUIDHASHSIZ Tune the size of nfsmount with this
NFS_NOSERVER (already documented in LINT)
NFS_DEBUG turn on NFS debugging

also, because NFS_ROOT is used by very different files, it has been
renamed to opt_nfsroot.h instead of the old opt_nfs.h....


# 36541 31-May-1998 peter

For the on-the-wire protocol, u_long -> u_int32_t; long -> int32_t;
int -> int32_t; u_short -> u_int16_t. Also, use mode_t instead of u_short
for storing modes (mode_t is a u_int16_t).

Obtained from: NetBSD


# 36540 31-May-1998 peter

Support 'mount -u' remounts. This may require disconnecting and rebinding
the socket. Certain mode changes are not allowed.

Obtained from: NetBSD


# 36511 31-May-1998 peter

Some const's

Obtained from: NetBSD


# 36503 31-May-1998 peter

NFS Jumbo commit part 1. Cosmetic and structural changes only. The aim
of this part of commits is to minimize unnecessary differences between
the other NFS's of similar origin. Yes, there are gratuitous changes here
that the style folks won't like, but it makes the catch-up less difficult.


# 36329 24-May-1998 peter

Convert a couple of large allocations to use zones rather than malloc
for better packing. This means that we can choose better values for the
various hash entries without having to try and get it all to fit within
an artificial power of two limit for malloc's sake.


# 36176 19-May-1998 peter

Allow control of the attribute cache timeouts at mount time.

We had run out of bits in the nfs mount flags, I have moved the internal
state flags into a seperate variable. These are no longer visible via
statfs(), but I don't know of anything that looks at them.


# 34961 30-Mar-1998 phk

Eradicate the variable "time" from the kernel, using various measures.
"time" wasn't a atomic variable, so splfoo() protection were needed
around any access to it, unless you just wanted the seconds part.

Most uses of time.tv_sec now uses the new variable time_second instead.

gettime() changed to getmicrotime(0.

Remove a couple of unneeded splfoo() protections, the new getmicrotime()
is atomic, (until Bruce sets a breakpoint in it).

A couple of places needed random data, so use read_random() instead
of mucking about with time which isn't random.

Add a new nfs_curusec() function.

Mark a couple of bogosities involving the now disappeard time variable.

Update ffs_update() to avoid the weird "== &time" checks, by fixing the
one remaining call that passwd &time as args.

Change profiling in ncr.c to use ticks instead of time. Resolution is
the same.

Add new function "tvtohz()" to avoid the bogus "splfoo(), add time, call
hzto() which subtracts time" sequences.

Reviewed by: bde


# 32998 01-Feb-1998 bde

Moved declaration of `union nethostadr' outside of the KERNEL section,
to give pollution compatible with <nfs/nqfs.h>. At least mount_nfs.c
previously had to #define KERNEL before including <nfs/nfs.h> to get
this pollution, but this gave other pollution.

Moved comment about NFSINT_SIGMASK to immediately before the code that
it applies to.


# 30354 12-Oct-1997 phk

Last major round (Unless Bruce thinks of somthing :-) of malloc changes.

Distribute all but the most fundamental malloc types. This time I also
remembered the trick to making things static: Put "static" in front of
them.

A couple of finer points by: bde


# 29288 10-Sep-1997 phk

unifdef -U__NetBSD__ -D__FreeBSD__


# 28270 16-Aug-1997 wollman

Fix all areas of the system (or at least all those in LINT) to avoid storing
socket addresses in mbufs. (Socket buffers are the one exception.) A number
of kernel APIs needed to get fixed in order to make this happen. Also,
fix three protocol families which kept PCBs in mbufs to not malloc them
instead. Delete some old compatibility cruft while we're at it, and add
some new routines in the in_cksum family.


# 27446 16-Jul-1997 dfr

Merge WebNFS changes from NetBSD.

Obtained from: NetBSD


# 26420 03-Jun-1997 dfr

Various fixes from NetBSD:

Use u_int for rpc procedure numbers.
Some fixes to NQNFS.
A rare NULL pointer dereference.
Ignore NFSMNT_NOCONN for TCP mounts.

Obtained from: NetBSD


# 25930 19-May-1997 dfr

Fix a few bugs with NFS and mmap caused by NFS' use of b_validoff
and b_validend. The changes to vfs_bio.c are a bit ugly but hopefully
can be tidied up later by a slight redesign.

PR: kern/2573, kern/2754, kern/3046 (possibly)
Reviewed by: dyson


# 25781 13-May-1997 dfr

Don't keep addresses in mbuf chains. This should simplify the next round
of network changes from Garret.

Reviewed by: Garrett Wollman <wollman@khavrinen.lcs.mit.edu>


# 25663 10-May-1997 dfr

Fix a nasty hang connected with write gathering. Also add debug print
statements to bits of the server which helped me find the hang.


# 24378 29-Mar-1997 bde

Define our own version of DIRBLKSIZ instead of (ab)using ufs's value.
Use the same value of 512 (ufs actually uses DEV_BSIZE). There are
too many versions of DIRBLKSIZ, one for ufs, one for ext2fs, one for
nfs, one for ibcs2, one for linux, one for applications, ... I think
nfs's DIRBLKSIZ needs to be a divisor of the directory blocks sizes
of all supported file systems. There is also NFS_DIRBLKSIZ, which is
different from nfs's DIRBLKSIZ but is sometimes confused with it in
comments.

Removed a bogus #ifdef KERNEL that hid the tunable constants for nfs.
This came in undocumented with the Lite2 merge although it isn't in
Lite2. It required more-bogus #define KERNEL's in fstat and pstat
to make the constants visible.

Restored a spelling fix from rev.1.17.

Removed duplicate #defines of all the the NFS mount option flags.


# 24330 27-Mar-1997 guido

Add code that will reject nfs requests in teh kernel from nonprivileged
ports. This option will be automatically set/cleraed when mount is run
without/with the -n option.
Reviewed by: Doug Rabson


# 22975 22-Feb-1997 peter

Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.


# 22521 10-Feb-1997 dyson

This is the kernel Lite/2 commit. There are some requisite userland
changes, so don't expect to be able to run the kernel as-is (very well)
without the appropriate Lite/2 userland changes.

The system boots and can mount UFS filesystems.

Untested: ext2fs, msdosfs, NFS
Known problems: Incorrect Berkeley ID strings in some files.
Mount_std mounts will not work until the getfsent
library routine is changed.

Reviewed by: various people
Submitted by: Jeffery Hsu <hsu@freebsd.org>


# 21673 14-Jan-1997 jkh

Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.


# 19449 06-Nov-1996 dfr

Improve the queuing algorithms used by NFS' asynchronous i/o. The
existing mechanism uses a global queue for some buffers and the
vp->b_dirtyblkhd queue for others. This turns sequential writes into
randomly ordered writes to the server, affecting both read and write
performance. The existing mechanism also copes badly with hung
servers, tending to block accesses to other servers when all the iods
are waiting for a hung server.

The new mechanism uses a queue for each mount point. All asynchronous
i/o goes through this queue which preserves the ordering of requests.
A simple mechanism ensures that the iods are shared out fairly between
active mount points. This removes the sysctl variable vfs.nfs.dwrite
since the new queueing mechanism removes the old delayed write code
completely.

This should go into the 2.2 branch.


# 17761 21-Aug-1996 dyson

Even though this looks like it, this is not a complex code change.
The interface into the "VMIO" system has changed to be more consistant
and robust. Essentially, it is now no longer necessary to call vn_open
to get merged VM/Buffer cache operation, and exceptional conditions
such as merged operation of VBLK devices is simpler and more correct.

This code corrects a potentially large set of problems including the
problems with ktrace output and loaded systems, file create/deletes,
etc.

Most of the changes to NFS are cosmetic and name changes, eliminating
a layer of subroutine calls. The direct calls to vput/vrele have
been re-instituted for better cross platform compatibility.

Reviewed by: davidg


# 13765 30-Jan-1996 mpp

Fix a bunch of spelling errors in the comment fields of
a bunch of system include files.


# 12911 17-Dec-1995 phk

Staticize.


# 12588 03-Dec-1995 bde

Completed function declarations and/or added prototypes and/or moved
prototypes to the right place.


# 12453 21-Nov-1995 bde

Completed function declarations and/or added prototypes.


# 11982 31-Oct-1995 joerg

Include a prerequisite header (so this is consistent again with the
NFSv2 state).


# 11921 29-Oct-1995 phk

Second batch of cleanup changes.
This time mostly making a lot of things static and some unused
variables here and there.


# 9759 29-Jul-1995 bde

Eliminate sloppy common-style declarations. There should be none left for
the LINT configuation.


# 9336 27-Jun-1995 dfr

Changes to support version 3 of the NFS protocol.
The version 2 support has been tested (client+server) against FreeBSD-2.0,
IRIX 5.3 and FreeBSD-current (using a loopback mount). The version 2 support
is stable AFAIK.
The version 3 support has been tested with a loopback mount and minimally
against an IRIX 5.3 server. It needs more testing and may have problems.
I have patched amd to support the new variable length filehandles although
it will still only use version 2 of the protocol.

Before booting a kernel with these changes, nfs clients will need to at least
build and install /usr/sbin/mount_nfs. Servers will need to build and
install /usr/sbin/mountd.

NFS diskless support is untested.

Obtained from: Rick Macklem <rick@snowhite.cis.uoguelph.ca>


# 6361 14-Feb-1995 phk

YFfix
+int nfsrv_vput __P(( struct vnode * ));
+int nfsrv_vrele __P(( struct vnode * ));
+int nfsrv_vmio __P(( struct vnode * ));


# 4067 01-Nov-1994 wollman

Forward-declare a few structures to avoid warning messages.


# 3820 23-Oct-1994 wollman

Implement fs.nfs MIB variables.


# 3664 17-Oct-1994 phk

This is a bunch of changes from NetBSD. There are a couple of bug-fixes.
But mostly it is changes to use the list-maintenance macros instead of
doing the pointer-gymnastics by hand.

Obtained from: NetBSD


# 3305 02-Oct-1994 phk

Prototyping and general gcc-shutting up. Gcc has one warning now which looks
bad, I will get to it eventually, unless somebody beats me to it.


# 2175 21-Aug-1994 paul

More idempotency....... this is fun :-)


# 1828 04-Aug-1994 dg

Made NFS attribute cache timeouts kernel config file tunable via
NFS_MINATTRTIMO and NFS_MAXATTRTIMO.


# 1817 02-Aug-1994 dg

Added $Id$


# 1541 24-May-1994 rgrimes

BSD 4.4 Lite Kernel Sources