Cross Reference: /freebsd-10-stable/sys/nfsclient/

History log of /freebsd-10-stable/sys/nfsclient/
Revision	Date	Author	Comments (<<< Hide modified files) (Show modified files >>>)
317525	27-Apr-2017	rmacklem	MFC: r316792 Add an NFSv4.1 mount option for "use one openowner". Some NFSv4.1 servers such as AmazonEFS can only support a small fixed number of open_owner4s. This patch adds a mount option called "oneopenown" that can be used for NFSv4.1 mounts to make the client do all Opens with the same open_owner4 string. This option can only be used with NFSv4.1 and may not work correctly when Delegations are is use. Differential Revision: https://reviews.freebsd.org/D8988 /freebsd-10-stable/sys/fs/nfs/nfs_var.h /freebsd-10-stable/sys/fs/nfs/nfsport.h /freebsd-10-stable/sys/fs/nfsclient/nfs_clport.c /freebsd-10-stable/sys/fs/nfsclient/nfs_clrpcops.c /freebsd-10-stable/sys/fs/nfsclient/nfs_clstate.c /freebsd-10-stable/sys/fs/nfsclient/nfs_clvfsops.c nfsargs.h
276500	01-Jan-2015	kib	MFC r275897: Set NOCACHE flag for CREATE namei() calls, do not specially handle MAKEENTRY in VOP_LOOKUP(). /freebsd-10-stable/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c /freebsd-10-stable/sys/fs/ext2fs/ext2_lookup.c /freebsd-10-stable/sys/fs/fuse/fuse_vnops.c /freebsd-10-stable/sys/fs/msdosfs/msdosfs_lookup.c /freebsd-10-stable/sys/fs/nandfs/nandfs_vnops.c /freebsd-10-stable/sys/fs/nfsclient/nfs_clvnops.c /freebsd-10-stable/sys/fs/nfsserver/nfs_nfsdserv.c /freebsd-10-stable/sys/fs/tmpfs/tmpfs_vnops.c /freebsd-10-stable/sys/fs/unionfs/union_subr.c /freebsd-10-stable/sys/fs/unionfs/union_vnops.c /freebsd-10-stable/sys/kern/uipc_usrreq.c /freebsd-10-stable/sys/kern/vfs_syscalls.c /freebsd-10-stable/sys/kern/vfs_vnops.c nfs_vnops.c /freebsd-10-stable/sys/nfsserver/nfs_serv.c /freebsd-10-stable/sys/ufs/ffs/ffs_snapshot.c /freebsd-10-stable/sys/ufs/ufs/ufs_lookup.c
260107	30-Dec-2013	rmacklem	MFC: r259084 For software builds, the NFS client does many small synchronous (with FILE_SYNC) writes because non-contiguous byte ranges in the same buffer cache block are being written. This patch adds a new mount option "noncontigwr" which allows the non-contiguous byte ranges to be combined, with the dirty byte range becoming the superset of the bytes that are dirty, if the file has not been file locked. This reduces the number of writes significantly for software builds. The only case where this change might break existing applications is where an application is writing non-overlapping byte ranges within the same buffer cache block of a file from multiple clients concurrently. Since such an application would normally do file locking on the file, avoiding the byte range merge for files that have been file locked should be sufficient for most (maybe all?) cases. /freebsd-10-stable/sys/fs/nfsclient/nfs_clbio.c /freebsd-10-stable/sys/fs/nfsclient/nfs_clvfsops.c /freebsd-10-stable/sys/fs/nfsclient/nfs_clvnops.c /freebsd-10-stable/sys/fs/nfsclient/nfsnode.h nfsargs.h
256281	10-Oct-2013	gjb	Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle. Approved by: re (implicit) Sponsored by: The FreeBSD Foundation
252673	04-Jul-2013	rmacklem	A problem with the old NFS client where large writes to large files would sometimes result in a corrupted file was reported via email. This problem appears to have been caused by r251719 (reverting r251719 fixed the problem). Although I have not been able to reproduce this problem, I suspect it is caused by another thread increasing np->n_size after the mtx_unlock(&np->n_mtx) but before the vnode_pager_setsize() call. Since the np->n_mtx mutex serializes updates to np->n_size, doing the vnode_pager_setsize() with the mutex locked appears to avoid the problem. Unfortunately, vnode_pager_setsize() where the new size is smaller, cannot be called with a mutex held. This patch returns the semantics to be close to pre-r251719 such that the call to the vnode_pager_setsize() is only delayed until after the mutex is unlocked when np->n_size is shrinking. Since the file is growing when being written, I believe this will fix the corruption. Reported by: David G. Lawrence (dg@dglawrence.com) Tested by: David G. Lawrence (pending, to happen soon) Reviewed by: kib MFC after: 1 week
252479	01-Jul-2013	rmacklem	A recent version of the oldnfs NFS client in head/current will crash when doing a large write, since m_get2() would return NULL. This patch fixes the problem, since nfsm_uiotombuf() will allocate additional mbufs, as required. Reported by: sbruno Tested by: sbruno Discussed with: glebius
251171	31-May-2013	jeff	- Convert the bufobj lock to rwlock. - Use a shared bufobj lock in getblk() and inmem(). - Convert softdep's lk to rwlock to match the bufobj lock. - Move INFREECNT to b_flags and protect it with the buf lock. - Remove unnecessary locking around bremfree() and BKGRDINPROG. Sponsored by: EMC / Isilon Storage Division Discussed with: mckusick, kib, mdf
251089	29-May-2013	rmacklem	Add a patch analygous to r248567, r248581, r251079 to the old NFS client to avoid the panic reported in the PR by doing the vnode_pager_setsize() call after unlocking the mutex. PR: 177335 MFC after: 2 weeks
249630	18-Apr-2013	rmacklem	When an NFS unmount occurs, once vflush() writes the last dirty buffer for the last vnode on the mount back to the server, it returns. At that point, the code continues with the unmount, including freeing up the nfs specific part of the mount structure. It is possible that an nfsiod thread will try to check for an empty I/O queue in the nfs specific part of the mount structure after it has been free'd by the unmount. This patch avoids this problem by setting the iodmount entries for the mount back to NULL while holding the mutex in the unmount and checking the appropriate entry is non-NULL after acquiring the mutex in the nfsiod thread. Reported and tested by: pho Reviewed by: kib MFC after: 2 weeks
249623	18-Apr-2013	rmacklem	Both NFS clients can deadlock when using the "rdirplus" mount option. This can occur when an nfsiod thread that already holds a buffer lock attempts to acquire a vnode lock on an entry in the directory (a LOR) when another thread holding the vnode lock is waiting on an nfsiod thread. This patch avoids the deadlock by disabling readahead for this case, so the nfsiod threads never do readdirplus. Since readaheads for directories need the directory offset cookie from the previous read, they cannot normally happen in parallel. As such, testing by jhb@ and myself didn't find any performance degredation when this patch is applied. If there is a case where this results in a significant performance degradation, mounting without the "rdirplus" option can be done to re-enable readahead for directories. Reported and tested by: jhb Reviewed by: jhb MFC after: 2 weeks
248500	19-Mar-2013	emaste	Fix remainder calculation when biosize is not a power of 2 In common configurations biosize is a power of two, but is not required to be so. Thanks to markj@ for spotting an additional case beyond my original patch. Reviewed by: rmacklem@
248255	13-Mar-2013	jhb	Revert 195703 and 195821 as this special stop handling in NFS is now implemented via VFCF_SBDRY rather than passing PBDRY to individual sleep calls.
248207	12-Mar-2013	glebius	Functions m_getm2() and m_get2() have different order of arguments, and that can drive someone crazy. While m_get2() is young and not documented yet, change its order of arguments to match m_getm2(). Sorry for churn, but better now than later.
248198	12-Mar-2013	glebius	- Use m_get2() instead of nfsm_reqhead(). - Use m_get(), m_getcl() instead of historic macros. Sponsored by: Nginx, Inc.
248084	09-Mar-2013	attilio	Switch the vm_object mutex to be a rwlock. This will enable in the future further optimizations where the vm_object lock will be held in read mode most of the time the page cache resident pool of pages are accessed for reading purposes. The change is mostly mechanical but few notes are reported: * The KPI changes as follow: - VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK() - VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK() - VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK() - VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED() (in order to avoid visibility of implementation details) - The read-mode operations are added: VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(), VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED() * The vm/vm_pager.h namespace pollution avoidance (forcing requiring sys/mutex.h in consumers directly to cater its inlining functions using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h consumers now must include also sys/rwlock.h. * zfs requires a quite convoluted fix to include FreeBSD rwlocks into the compat layer because the name clash between FreeBSD and solaris versions must be avoided. At this purpose zfs redefines the vm_object locking functions directly, isolating the FreeBSD components in specific compat stubs. The KPI results heavilly broken by this commit. Thirdy part ports must be updated accordingly (I can think off-hand of VirtualBox, for example). Sponsored by: EMC / Isilon storage division Reviewed by: jeff Reviewed by: pjd (ZFS specific review) Discussed with: alc Tested by: pho
247116	21-Feb-2013	jhb	Further refine the handling of stop signals in the NFS client. The changes in r246417 were incomplete as they did not add explicit calls to sigdeferstop() around all the places that previously passed SBDRY to _sleep(). In addition, nfs_getcacheblk() could trigger a write RPC from getblk() resulting in sigdeferstop() recursing. Rather than manually deferring stop signals in specific places, change the VFS_() and VOP_() methods to defer stop signals for filesystems which request this behavior via a new VFCF_SBDRY flag. Note that this has to be a VFC flag rather than a MNTK flag so that it works properly with VFS_MOUNT() when the mount is not yet fully constructed. For now, only the NFS clients are set this new flag in VFS_SET(). A few other related changes: - Add an assertion to ensure that TDF_SBDRY doesn't leak to userland. - When a lookup request uses VOP_READLINK() to follow a symlink, mark the request as being on behalf of the thread performing the lookup (cnp_thread) rather than using a NULL thread pointer. This causes NFS to properly handle signals during this VOP on an interruptible mount. PR: kern/176179 Reported by: Russell Cattelan (sigdeferstop() recursion) Reviewed by: kib MFC after: 1 month
246417	06-Feb-2013	jhb	Rework the handling of stop signals in the NFS client. The changes in 195702, 195703, and 195821 prevented a thread from suspending while holding locks inside of NFS by forcing the thread to fail sleeps with EINTR or ERESTART but defer the thread suspension to the user boundary. However, this had the effect that stopping a process during an NFS request could abort the request and trigger EINTR errors that were visible to userland processes (previously the thread would have suspended and completed the request once it was resumed). This change instead effectively masks stop signals while in the NFS client. It uses the existing TDF_SBDRY flag to effect this since SIGSTOP cannot be masked directly. Also, instead of setting PBDRY on individual sleeps, the NFS client now sets the TDF_SBDRY flag around each NFS request and stop signals are masked for all sleeps during that region (the previous change missed sleeps in lockmgr locks). The end result is that stop signals sent to threads performing an NFS request are completely ignored until after the NFS request has finished processing and the thread prepares to return to userland. This restores the behavior of stop signals being transparent to userland processes while still preventing threads from suspending while holding NFS locks. Reviewed by: kib MFC after: 1 month
245909	25-Jan-2013	jhb	Further cleanups to use of timestamps in NFS: - Use NFSD_MONOSEC (which maps to time_uptime) instead of the seconds portion of wall-time stamps to manage timeouts on events. - Remove unused nd_starttime from the per-request structure in the new NFS server. - Use nanotime() for the modification time on a delegation to get as precise a time as possible. - Use time_second instead of extracting the second from a call to getmicrotime(). Submitted by: bde (3) Reviewed by: bde, rmacklem MFC after: 2 weeks
245611	18-Jan-2013	jhb	Use vfs_timestamp() to set file timestamps rather than invoking getmicrotime() or getnanotime() directly in NFS. Reviewed by: rmacklem, bde MFC after: 1 week
245508	16-Jan-2013	jhb	Use the VA_UTIMES_NULL flag to detect when NULL was passed to utimes() instead of comparing the desired time against the current time as a heuristic. Reviewed by: rmacklem MFC after: 1 week
245476	15-Jan-2013	jhb	- More properly handle interrupted NFS requests on an interruptible mount by returning an error of EINTR rather than EACCES. - While here, bring back some (but not all) of the NFS RPC statistics lost when krpc was committed. Reviewed by: rmacklem MFC after: 1 week
244042	08-Dec-2012	rmacklem	Move the NFSv4.1 client patches over from projects/nfsv4.1-client to head. I don't think the NFS client behaviour will change unless the new "minorversion=1" mount option is used. It includes basic NFSv4.1 support plus support for pNFS using the Files Layout only. All problems detecting during an NFSv4.1 Bakeathon testing event in June 2012 have been resolved in this code and it has been tested against the NFSv4.1 server available to me. Although not reviewed, I believe that kib@ has looked at it.
243882	05-Dec-2012	glebius	Mechanically substitute flags from historic mbuf allocator with malloc(9) flags within sys. Exceptions: - sys/contrib not touched - sys/mbuf.h edited manually
243311	19-Nov-2012	attilio	r16312 is not any longer real since many years (likely since when VFS received granular locking) but the comment present in UFS has been copied all over other filesystems code incorrectly for several times. Removes comments that makes no sense now. Reviewed by: kib MFC after: 3 days
242833	09-Nov-2012	attilio	Complete MPSAFE VFS interface and remove MNTK_MPSAFE flag. Porters should refer to __FreeBSD_version 1000021 for this change as it may have happened at the same timeframe.
239246	14-Aug-2012	kib	Do not leave invalid pages in the object after the short read for a network file systems (not only NFS proper). Short reads cause pages other then the requested one, which were not filled by read response, to stay invalid. Change the vm_page_readahead_finish() interface to not take the error code, but instead to make a decision to free or to (de)activate the page only by its validity. As result, not requested invalid pages are freed even if the read RPC indicated success. Noted and reviewed by: alc MFC after: 1 week
239065	05-Aug-2012	kib	After the PHYS_TO_VM_PAGE() function was de-inlined, the main reason to pull vm_param.h was removed. Other big dependency of vm_page.h on vm_param.h are PA_LOCK* definitions, which are only needed for in-kernel code, because modules use KBI-safe functions to lock the pages. Stop including vm_param.h into vm_page.h. Include vm_param.h explicitely for the kernel code which needs it. Suggested and reviewed by: alc MFC after: 2 weeks
239040	04-Aug-2012	kib	Reduce code duplication and exposure of direct access to struct vm_page oflags by providing helper function vm_page_readahead_finish(), which handles completed reads for pages with indexes other then the requested one, for VOP_GETPAGES(). Reviewed by: alc MFC after: 1 week
235332	12-May-2012	rmacklem	PR# 165923 reported intermittent write failures for dirty memory mapped pages being written back on an NFS mount. Since any thread can call VOP_PUTPAGES() to write back a dirty page, the credentials of that thread may not have write access to the file on an NFS server. (Often the uid is 0, which may be mapped to "nobody" in the NFS server.) Although there is no completely correct fix for this (NFS servers check access on every write RPC instead of at open/mmap time), this patch avoids the common cases by holding onto a credential that recently opened the file for writing and uses that credential for the write RPCs being done by VOP_PUTPAGES() for both NFS clients. Tested by: Joel Ray Holveck (joelh at juniper.net) PR: kern/165923 Reviewed by: kib MFC after: 2 weeks
235246	10-May-2012	mckusick	Fix mount mutex handling missed in r234386.
235052	05-May-2012	pluknet	Fix mount mutex handling missed in r234386.
234605	23-Apr-2012	trasz	Remove unused thread argument from vtruncbuf(). Reviewed by: kib
234386	17-Apr-2012	mckusick	Replace the MNT_VNODE_FOREACH interface with MNT_VNODE_FOREACH_ALL. The primary changes are that the user of the interface no longer needs to manage the mount-mutex locking and that the vnode that is returned has its mutex locked (thus avoiding the need to check to see if its is DOOMED or other possible end of life senarios). To minimize compatibility issues for third-party developers, the old MNT_VNODE_FOREACH interface will remain available so that this change can be MFC'ed to 9. Following the MFC to 9, MNT_VNODE_FOREACH will be removed in head. The reason for this update is to prepare for the addition of the MNT_VNODE_FOREACH_ACTIVE interface that will loop over just the active vnodes associated with a mount point (typically less than 1% of the vnodes associated with the mount point). Reviewed by: kib Tested by: Peter Holm MFC after: 2 weeks
232821	11-Mar-2012	kib	Remove fifo.h. The only used function declaration from the header is migrated to sys/vnode.h. Submitted by: gianni
232420	03-Mar-2012	rmacklem	Post r230394, the Lookup RPC counts for both NFS clients increased significantly. Upon investigation this was caused by name cache misses for lookups of "..". For name cache entries for non-".." directories, the cache entry serves double duty. It maps both the named directory plus ".." for the parent of the directory. As such, two ctime values (one for each of the directory and its parent) need to be saved in the name cache entry. This patch adds an entry for ctime of the parent directory to the name cache. It also adds an additional uma zone for large entries with this time value, in order to minimize memory wastage. As well, it fixes a couple of cases where the mtime of the parent directory was being saved instead of ctime for positive name cache entries. With this patch, Lookup RPC counts return to values similar to pre-r230394 kernels. Reported by: bde Discussed with: kib Reviewed by: jhb MFC after: 2 weeks
232327	01-Mar-2012	rmacklem	Fix the NFS clients so that they use copyin() instead of bcopy(), when doing direct I/O. This direct I/O code is not enabled by default. Submitted by: kib (earlier version) Reviewed by: kib MFC after: 1 week
232116	24-Feb-2012	jhb	Adjust the nfs_skip_wcc_data_onerr setting so that it does not block post-op attributes for ENOENT errors now that the name caching logic depends on working post-op attributes. MFC after: 2 weeks
231949	21-Feb-2012	kib	Fix found places where uio_resid is truncated to int. Add the sysctl debug.iosize_max_clamp, enabled by default. Setting the sysctl to zero allows to perform the SSIZE_MAX-sized i/o requests from the usermode. Discussed with: bde, das (previous versions) MFC after: 1 month
231852	17-Feb-2012	bz	Merge multi-FIB IPv6 support from projects/multi-fibv6/head/: Extend the so far IPv4-only support for multiple routing tables (FIBs) introduced in r178888 to IPv6 providing feature parity. This includes an extended rtalloc(9) KPI for IPv6, the necessary adjustments to the network stack, and user land support as in netstat. Sponsored by: Cisco Systems, Inc. Reviewed by: melifaro (basically) MFC after: 10 days
231088	06-Feb-2012	jhb	Rename cache_lookup_times() to cache_lookup() and retire the old API and ABI stub for cache_lookup().
231075	06-Feb-2012	kib	Current implementations of sync(2) and syncer vnode fsync() VOP uses mnt_noasync counter to temporary remove MNTK_ASYNC mount option, which is needed to guarantee a synchronous completion of the initiated i/o before syscall or VOP return. Global removal of MNTK_ASYNC option is harmful because not only i/o started from corresponding thread becomes synchronous, but all i/o is synchronous on the filesystem which is initiated during sync(2) or syncer activity. Instead of removing MNTK_ASYNC from mnt_kern_flag, provide a local thread flag to disable async i/o for current thread only. Use the opportunity to move DOINGASYNC() macro into sys/vnode.h and consistently use it through places which tested for MNTK_ASYNC. Some testing demonstrated 60-70% improvements in run time for the metadata-intensive operations on async-mounted UFS volumes, but still with great deviation due to other reasons. Reviewed by: mckusick Tested by: scottl MFC after: 2 weeks
230803	31-Jan-2012	rmacklem	When a "mount -u" switches an NFS mount point from TCP to UDP, any thread doing an I/O RPC with a transfer size greater than NFS_UDPMAXDATA will be hung indefinitely, retrying the RPC. After a discussion on freebsd-fs@, I decided to add a warning message for this case, as suggested by Jeremy Chadwick. Suggested by: freebsd at jdc.parodius.com (Jeremy Chadwick) MFC after: 2 weeks
230605	27-Jan-2012	rmacklem	A problem with respect to data read through the buffer cache for both NFS clients was reported to freebsd-fs@ under the subject "NFS corruption in recent HEAD" on Nov. 26, 2011. This problem occurred when a TCP mounted root fs was changed to using UDP. I believe that this problem was caused by the change in mnt_stat.f_iosize that occurred because rsize was decreased to the maximum supported by UDP. This patch fixes the problem by using v_bufobj.bo_bsize instead of f_iosize, since the latter is set to f_iosize when the vnode is allocated, but does not change for a given vnode when f_iosize changes. Reported by: pjd Reviewed by: kib MFC after: 2 weeks
230559	26-Jan-2012	rmacklem	Revert r230516, since it doesn't really fix the problem.
230552	25-Jan-2012	kib	Fix remaining calls to cache_enter() in both NFS clients to provide appropriate timestamps. Restore the assertions which verify that NCF_TS is set when timestamp is asked for. Reviewed by: jhb (previous version) MFC after: 2 weeks
230547	25-Jan-2012	jhb	Add a timeout on positive name cache entries in the NFS client. That is, we will only trust a positive name cache entry for a specified amount of time before falling back to a LOOKUP RPC, even if the ctime for the file handle matches the cached copy in the name cache entry. The timeout is configured via a new 'nametimeo' mount option and defaults to 60 seconds. It may be set to zero to disable positive name caching entirely. Reviewed by: rmacklem MFC after: 1 week
230516	25-Jan-2012	rmacklem	If a mount -u is done to either NFS client that switches it from TCP to UDP and the rsize/wsize/readdirsize is greater than NFS_MAXDGRAMDATA, it is possible for a thread doing an I/O RPC to get stuck repeatedly doing retries. This happens because the RPC will use a resize/wsize/readdirsize that won't work for UDP and, as such, it will keep failing indefinitely. This patch returns an error for this case, to avoid the problem. A discussion on freebsd-fs@ seemed to indicate that returning an error was preferable to silently ignoring the "udp"/"mntudp" option. This problem was discovered while investigating a problem reported by pjd@ via email. MFC after: 2 weeks
230394	20-Jan-2012	jhb	Close a race in NFS lookup processing that could result in stale name cache entries on one client when a directory was renamed on another client. The root cause for the stale entry being trusted is that each per-vnode nfsnode structure has a single 'n_ctime' timestamp used to validate positive name cache entries. However, if there are multiple entries for a single vnode, they all share a single timestamp. To fix this, extend the name cache to allow filesystems to optionally store a timestamp value in each name cache entry. The NFS clients now fetch the timestamp associated with each name cache entry and use that to validate cache hits instead of the timestamps previously stored in the nfsnode. Another part of the fix is that the NFS clients now use timestamps from the post-op attributes of RPCs when adding name cache entries rather than pulling the timestamps out of the file's attribute cache. The latter is subject to races with other lookups updating the attribute cache concurrently. Some more details: - Add a variant of nfsm_postop_attr() to the old NFS client that can return a vattr structure with a copy of the post-op attributes. - Handle lookups of "." as a special case in the NFS clients since the name cache does not store name cache entries for ".", so we cannot get a useful timestamp. It didn't really make much sense to recheck the attributes on the the directory to validate the namecache hit for "." anyway. - ABI compat shims for the name cache routines are present in this commit so that it is safe to MFC. MFC after: 2 weeks
230249	17-Jan-2012	mckusick	Make sure all intermediate variables holding mount flags (mnt_flag) and that all internal kernel calls passing mount flags are declared as uint64_t so that flags in the top 32-bits are not lost. MFC after: 2 weeks
228757	21-Dec-2011	rmacklem	jwd@ reported a problem via email where the old NFS client would get a reply of EEXIST from an NFS server when a Mkdir RPC was retried, for an NFS over UDP mount. Upon investigation, it was found that the client was retransmitting the Mkdir RPC request over UDP, but with a different xid. As such, the retransmitted message would miss the Duplicate Request Cache in the server, causing it to reply EEXIST. The kernel client side UDP rpc code has two timers. The first one causes a retransmit using the same xid and socket and was set to a fixed value of 3seconds. (The default can be overridden via CLSET_RETRY_TIMEOUT.) The second one creates a new socket and xid and should be larger than the first. However, both NFS clients were setting the second timer to nm_timeo ("timeout=<value>" mount argument), which defaulted to 1second, so the first timer would never time out. This patch fixes both NFS clients so that they set the first timer using nm_timeo and makes the second timer larger than the first one. Reported by: jwd Tested by: jwd Reviewed by: jhb MFC after: 2 weeks
228156	30-Nov-2011	kib	Rename vm_page_set_valid() to vm_page_set_valid_range(). The vm_page_set_valid() is the most reasonable name for the m->valid accessor. Reviewed by: attilio, alc
227690	19-Nov-2011	rmacklem	The old NFS client will crash due to the reply being m_freem()'d twice if the server bogusly returns an error with the NFSERR_RETERR bit (bit 31) set. No actual NFS error has this bit set, but it seems that amd will sometimes do this. This patch makes sure the NFSERR_RETERR bit is cleared to avoid a crash. PR: kern/153847 MFC after: 2 weeks
227507	14-Nov-2011	jhb	Finish making 'wcommitsize' an NFS client mount option. Reviewed by: rmacklem MFC after: 1 week
224733	09-Aug-2011	jhb	Merge 220876, 220877, and 221537 from the new NFS client to the old: Allow the NFS client to use a max file size larger than 1TB for v3 mounts. It now allows files up to OFF_MAX subject to whatever limit the server advertises. Reviewed by: rmacklem Approved by: re (kib) MFC after: 1 week
224604	02-Aug-2011	rmacklem	Fix a LOR in the NFS client which could cause a deadlock. This was reported to the mailing list freebsd-net@freebsd.org on July 21, 2011 under the subject "LOR with nfsclient sillyrename". The LOR occurred when nfs_inactive() called vrele(sp->s_dvp) while holding the vnode lock on the file in s_dvp. This patch modifies the client so that it performs the vrele(sp->s_dvp) as a separate task to avoid the LOR. This fix was discussed with jhb@ and kib@, who both proposed variations of it. Tested by: pho, jlott at averesystems.com Submitted by: jhb (earlier version) Reviewed by: kib Approved by: re (kib) MFC after: 2 weeks
223309	19-Jun-2011	rmacklem	Fix the kgssapi so that it can be loaded as a module. Currently the NFS subsystems use five of the rpcsec_gss/kgssapi entry points, but since it was not obvious which others might be useful, all nineteen were included. Basically the nineteen entry points are set in a structure called rpc_gss_entries and inline functions defined in sys/rpc/rpcsec_gss.h check for the entry points being non-NULL and then call them. A default value is returned otherwise. Requested by rwatson. Reviewed by: jhb MFC after: 2 weeks
222586	01-Jun-2011	kib	In the VOP_PUTPAGES() implementations, change the default error from VM_PAGER_AGAIN to VM_PAGER_ERROR for the uwritten pages. Return VM_PAGER_AGAIN for the partially written page. Always forward at least one page in the loop of vm_object_page_clean(). VM_PAGER_ERROR causes the page reactivation and does not clear the page dirty state, so the write is not lost. The change fixes an infinite loop in vm_object_page_clean() when the filesystem returns permanent errors for some page writes. Reported and tested by: gavin Reviewed by: alc, rmacklem MFC after: 1 week
222464	29-May-2011	rmacklem	Add a check for MNTK_UNMOUNTF at the beginning of nfs_sync() in the old NFS client so that a forced dismount doesn't get stuck in the VFS_SYNC() call that happens before VFS_UNMOUNT() in dounmount(). Analagous to r222329 for the new NFS client. An additional change is needed before forced dismounts will work. PR: kern/157365 MFC after: 2 weeks
222187	22-May-2011	alc	Eliminate duplicate #include's.
222075	18-May-2011	rmacklem	Add a sanity check for the existence of an "addr" option to both NFS clients. This avoids the crash reported by Sergey Kandaurov (pluknet@gmail.com) to the freebsd-fs@ list with subject "[old nfsclient] different nmount() args passed from mount vs mount_nfs" dated May 17, 2011. Tested by: pluknet at gmail.com (old nfs client) MFC after: 2 weeks
221986	16-May-2011	rmacklem	Fix a comment that got missed by r221973 which changed the sysctl naming for the old NFS client to vfs.oldnfs.
221973	15-May-2011	rmacklem	Change the sysctl naming for the old and new NFS clients to vfs.oldnfs.xxx and vfs.nfs.xxx respectively. This makes the default nfs client use vfs.nfs.xxx after r221124.
221543	06-May-2011	rmacklem	Move sys/nfsclient/nfs_kdtrace.h to sys/nfs/nfs_kdtrace.h so it can be used by the new NFS client as well as the old one.
221542	06-May-2011	rmacklem	Fix the module dependency in nfs_kdtrace.c for the old NFS client. This should fix a problem reported by Marcus Reid.
221436	04-May-2011	ru	Implemented a mount option "nocto" that disables cache coherency checking at open time. It may improve performance for read-only NFS mounts. Use deliberately. MFC after: 1 week Reviewed by: rmacklem, jhb (earlier version)
221139	27-Apr-2011	rmacklem	Fix module names and dependencies so the NFS clients will load correctly as modules after r221124.
221124	27-Apr-2011	rmacklem	This patch changes head so that the default NFS client is now the new NFS client (which I guess is no longer experimental). The fstype "newnfs" is now "nfs" and the regular/old NFS client is now fstype "oldnfs". Although mounts via fstype "nfs" will usually work without userland changes, an updated mount_nfs(8) binary is needed for kernels built with "options NFSCL" but not "options NFSCLIENT". Updated mount_nfs(8) and mount(8) binaries are needed to do mounts for fstype "oldnfs". The GENERIC kernel configs have been changed to use options NFSCL and NFSD (the new client and server) instead of NFSCLIENT and NFSSERVER. For kernels being used on diskless NFS root systems, "options NFSCL" must be in the kernel config. Discussed on freebsd-fs@.
221066	26-Apr-2011	rmacklem	Fix a kernel linking problem introduced by r221032, r221040 when building kernels that don't have "options NFS_ROOT" specified. I plan on moving the functions that use these data structures into the shared code in sys/nfs/nfs_diskless.c in a future commit. At that time, these definitions will no longer be needed in nfs_vfsops.c and nfs_clvfsops.c. MFC after: 2 weeks
221032	25-Apr-2011	rmacklem	Fix the experimental NFS client so that it does not bogusly set the f_flags field of "struct statfs". This had the interesting effect of making the NFSv4 mounts "disappear" after r221014, since NFSMNT_NFSV4 and MNT_IGNORE became the same bit. Move the files used for a diskless NFS root from sys/nfsclient to sys/nfs in preparation for them to be used by both NFS clients. Also, move the declaration of the three global data structures from sys/nfsclient/nfs_vfsops.c to sys/nfs/nfs_diskless.c so that they are defined when either client uses them. Reviewed by: jhb MFC after: 2 weeks
221014	25-Apr-2011	rmacklem	Modify the experimental NFS client so that it uses the same "struct nfs_args" as the regular NFS client. This is needed so that the old mount(2) syscall will work and it makes sharing of the diskless NFS root code easier. Eary in the porting exercise I introduced a new revision of nfs_args, but didn't actually need it, thanks to nmount(2). I re-introduced the NFSMNT_KERB flag, since it does essentially the same thing and the old one would not have been used because it never worked. I also added a few new NFSMNT_xxx flags to sys/nfsclient/nfs_args.h that are used by the experimental NFS client. MFC after: 2 weeks
220595	13-Apr-2011	ru	- Fixed nfs_printf() to use vprintf(). - Fixed vfs.nfs.acdebug sysctl's description. - Fixed panic when compiled with NFS_ACDEBUG. MFC after: 3 days
219028	25-Feb-2011	netchild	Add some FEATURE macros for various features (AUDIT/CAM/IPC/KTR/MAC/NFS/NTP/ PMC/SYSV/...). No FreeBSD version bump, the userland application to query the features will be committed last and can serve as an indication of the availablility if needed. Sponsored by: Google Summer of Code 2010 Submitted by: kibab Reviewed by: arch@ (parts by rwatson, trasz, jhb) X-MFC after: to be determined in last commit with code from this project
218757	16-Feb-2011	bz	Mfp4 CH=177274,177280,177284-177285,177297,177324-177325 VNET socket push back: try to minimize the number of places where we have to switch vnets and narrow down the time we stay switched. Add assertions to the socket code to catch possibly unset vnets as seen in r204147. While this reduces the number of vnet recursion in some places like NFS, POSIX local sockets and some netgraph, .. recursions are impossible to fix. The current expectations are documented at the beginning of uipc_socket.c along with the other information there. Sponsored by: The FreeBSD Foundation Sponsored by: CK Software GmbH Reviewed by: jhb Tested by: zec Tested by: Mikolaj Golub (to.my.trociny gmail.com) MFC after: 2 weeks
216931	03-Jan-2011	rmacklem	Fix the nlm so that it no longer depends on the regular nfs client and, as such, can be loaded for the experimental nfs client without the regular client. Reviewed by: jhb MFC after: 2 weeks
215548	19-Nov-2010	kib	Remove prtactive variable and related printf()s in the vop_inactive and vop_reclaim() methods. They seems to be unused, and the reported situation is normal for the forced unmount. MFC after: 1 week X-MFC-note: keep prtactive symbol in vfs_subr.c
214418	27-Oct-2010	jh	Add missing "readahead" to the nfs_opts list. PR: 151321 Tested by: Simon Walton MFC after: 2 weeks
214053	19-Oct-2010	rmacklem	Fix the type of the 3rd argument for nm_getinfo so that it works for architectures like sparc64. Suggested by: kib MFC after: 2 weeks
214048	19-Oct-2010	rmacklem	Modify the NFS clients and the NLM so that the NLM can be used by both clients. Since the NLM uses various fields of the nfsmount structure, those fields were extracted and put in a separate nfs_mountcommon structure stored in sys/nfs/nfs_mountcommon.h. This structure also has a function pointer for a function that extracts the required information from the mount point and nfs vnode for that particular client, for information stored differently by the clients. Reviewed by: jhb MFC after: 2 weeks
214026	18-Oct-2010	kib	Do not synchronously start the nfsiod threads at all. The r212506 fixed the issues with file descriptor locks, but the same problems are present for vnode lock/user map lock. If the nfs_asyncio() cannot find the free nfsiod, schedule task to create new nfsiod and return error. This causes fall back to the synchronous i/o for nfs_strategy(), or does not start read at all in the case of readahead. The caller that holds vnode and potentially user map lock does not wait for kproc_create() to finish, preventing the LORs. The change effectively reverts r203072, because we never hand off the request to newly created nfsiod thread anymore. Reviewed by: jhb Tested by: jhb, pluknet MFC after: 3 weeks
212506	12-Sep-2010	kib	Do not fork nfsiod directly from the vop methods. This causes LORs between vnode lock and several locks needed during fork, like fd lock. Instead, schedule the task to be executed in the taskqueue context. We still waiting for the fork to finish, but the context of the thread executing the task does not make real LORs with our vnode lock. Submitted by: pluknet at gmail com Reviewed by: jhb Tested by: pho MFC after: 3 weeks
212293	07-Sep-2010	jhb	Store the full timestamp when caching timestamps of files and directories for purposes of validating name cache entries. This closes races where two updates to a file or directory within the same second could result in stale entries in the name cache. While here, remove the 'n_expiry' field as it is no longer used. Reviewed by: rmacklem MFC after: 1 week
212123	01-Sep-2010	rmacklem	Modify nfs_diskless.c to recognize the environment variable boot.nfsroot.nfshandlelen and set the diskless root fs to use NFSv3 and this file handle length when it is set. If this environment variable is not set, the diskless root fs will use NFSv2 and the same defaults as before. This fixes the problem where the diskless nfs root fs had to be on a FreeBSD server for NFSv3 to work, because it did not know the correct file handle length and assumed the size used by FreeBSD. Until pxeboot and loader are replaced by ones built from commits coming soon, boot.nfsroot.nfshandlelen will not be set by them and the diskless root fs will use NFSv2 unless the /etc/fstab entry has the "nfsv3" option specified. Tested by: danny at cs.huji.ac.il MFC after: 2 weeks
211531	20-Aug-2010	jhb	Add dedicated routines to toggle lockmgr flags such as LK_NOSHARE and LK_CANRECURSE after a lock is created. Use them to implement macros that otherwise manipulated the flags directly. Assert that the associated lockmgr lock is exclusively locked by the current thread when manipulating these flags to ensure the flag updates are safe. This last change required some minor shuffling in a few filesystems to exclusively lock a brand new vnode slightly earlier. Reviewed by: kib MFC after: 3 days
210834	04-Aug-2010	rmacklem	Add some mutex locking on the nfsnode to the regular NFS client. Reviewed by: jhb
210455	24-Jul-2010	rmacklem	Move sys/nfsclient/nfs_lock.c into sys/nfs and build it as a separate module that can be used by both the regular and experimental nfs clients. This fixes the problem reported by jh@ where /dev/nfslock would be registered twice when both nfs clients were used. I also defined the size of the lm_fh field to be the correct value, as it should be the maximum size of an NFSv3 file handle. Reviewed by: jh MFC after: 2 weeks
210136	15-Jul-2010	jhb	Retire the NFS access cache timestamp structure. It was used in VOP_OPEN() to avoid sending multiple ACCESS/GETATTR RPCs during a single open() between VOP_LOOKUP() and VOP_OPEN(). Now we always send the RPC in VOP_LOOKUP() and not VOP_OPEN() in the cases that multiple RPCs could be sent. MFC after: 2 weeks
209948	12-Jul-2010	jhb	A previous change moved the GETATTR RPC for open() calls that hit in the name cache up into nfs_lookup() instead of nfs_open(). Continue this trend by flushing the attribute cache for leaf nodes in nfs_lookup() during an open() if we do a LOOKUP RPC. For NFSv3 this should generally be a NOP as the attributes are flushed before fetching the post-op attributes from the LOOKUP RPC which most (all?) NFSv3 servers provide, so the post-op attributes should populate the cache. Now all NFS open() calls will always clear the cached attributes during the nfs_lookup() prior to nfs_open() in the !NMODIFIED case to provide CTOC. As a result, we can remove the conditional flushing of the attribute cache from nfs_open(). Reviewed by: rmacklem, bde MFC after: 2 weeks
209946	12-Jul-2010	jhb	- Add missing locking around flushing of an NFS node's attribute cache in the NMODIFIED case of nfs_open(). - Cosmetic tweak to simplify an expression in nfs_lookup(). Reviewed by: rmacklem, bde MFC after: 1 week
209120	13-Jun-2010	kib	In NFS clients, instead of inconsistently using #ifdef DIAGNOSTIC and #ifndef DIAGNOSTIC for debug assertions, prefer KASSERT(). Also change one #ifdef DIAGNOSTIC in the new nfs server. Submitted by: Mikolaj Golub <to.my.trociny gmail com> MFC after: 2 weeks
208605	27-May-2010	delphij	Fix build: newnp represents newvp so KDTRACE_NFS_ATTRCACHE_FLUSH_DONE() on newvp instead of vp here.
208603	27-May-2010	jhb	More gracefully handle stale file handles and attributes when opening a file via NFS. Specifically, to satisfy close-to-open-consistency, the NFS client always performs at least one RPC on a file during an open(2) to see if the file has changed. Normally this RPC is an ACCESS or GETATTR RPC that is forced by flushing a file's attribute cache during nfs_open() and then requesting new attributes. However, if the file is noticed to be stale during nfs_open(), the only recourse is to fail the open(2) call with ESTALE. On the other hand, if the ACCESS or GETATTR RPC is sent during nfs_lookup(), then the NFS client can fall back to a LOOKUP RPC to obtain the new file handle in the case that a file has been replaced. This change causes the NFS client to flush the attribute cache during nfs_lookup() when validating a name cache hit if the attributes fetched during nfs_lookup() can be reused in nfs_open(). This allows the client to open a replaced file via the new file handle the first time that it notices a replaced file rather than failing with ESTALE in some cases. Reviewed by: rmacklem, bde Reviewed by: mohans (older version) MFC after: 1 week
208586	27-May-2010	cperciva	Change the current working directory to be inside the jail created by the jail(8) command. [10:04] Fix a one-NUL-byte buffer overflow in libopie. [10:05] Correctly sanity-check a buffer length in nfs mount. [10:06] Approved by: so (cperciva) Approved by: re (kensmith) Security: FreeBSD-SA-10:04.jail Security: FreeBSD-SA-10:05.opie Security: FreeBSD-SA-10:06.nfsclient
207746	07-May-2010	alc	Push down the page queues lock into vm_page_activate().
207728	06-May-2010	alc	Eliminate page queues locking around most calls to vm_page_free().
207669	05-May-2010	alc	Acquire the page lock around all remaining calls to vm_page_free() on managed pages that didn't already have that lock held. (Freeing an unmanaged page, such as the various pmaps use, doesn't require the page lock.) This allows a change in vm_page_remove()'s locking requirements. It now expects the page lock to be held instead of the page queues lock. Consequently, the page queues lock is no longer required at all by callers to vm_page_rename(). Discussed with: kib
207662	05-May-2010	trasz	Move checking against RLIMIT_FSIZE into one place, vn_rlimit_fsize(). Reviewed by: kib
207584	03-May-2010	kib	Lock the page around vm_page_activate() and vm_page_deactivate() calls where it was missed. The wrapped fragments now protect wire_count with page lock. Reviewed by: alc
204063	18-Feb-2010	pjd	Simplify code a bit.
203968	16-Feb-2010	marius	Factor out the code shared between NFS client and server into its own module. With r203732 it became apparent that creating the sysctl nodes twice causes at least a warning, however the whole code shouldn't be present twice in the first place. Discussed with: rmacklem
203732	09-Feb-2010	marius	- Move nfs_realign() from the NFS client to the shared NFS code and remove the NFS server version in order to reduce code duplication. The shared version now uses a second parameter how, which is passed on to m_get(9) and m_getcl(9) as the server used M_WAIT while the client requires M_DONTWAIT, and replaces the the previously unused parameter hsiz. - Change nfs_realign() to use nfsm_aligned() so as with other NFS code the alignment check isn't actually performed on platforms without strict alignment requirements for performance reasons because as the comment suggests unaligned data only occasionally occurs with TCP. - Change fha_extract_info() to use nfs_realign() with M_DONTWAIT rather than M_WAIT because it's called with the RPC sp_lock held. Reviewed by: jhb, rmacklem MFC after: 1 week
203731	09-Feb-2010	marius	Some style(9) fixes
203072	27-Jan-2010	rmacklem	Fix a race that can occur when nfs nfsiod threads are being created. Without this patch it was possible for a different thread that calls nfs_asyncio() to snitch a newly created nfsiod thread that was intended for another caller of nfs_asyncio(), because the nfs_iod_mtx mutex was unlocked while the new nfsiod thread was created. This patch labels the newly created nfsiod, so that it is not taken by another caller of nfs_asyncio(). This is believed to fix the problem reported on the freebsd-stable email list under the subject: FreeBSD NFS client/Linux NFS server issue. Tested by: to DOT my DOT trociny AT gmail DOT com Reviewed by: jhb MFC after: 2 weeks
202774	21-Jan-2010	rmacklem	Fix a typo in a comment introduced by r202767. MFC after: 2 weeks
202767	21-Jan-2010	rmacklem	Add a timeout for the negative name cache entries in the NFS client. This avoids a bogus negative name cache entry from persisting forever when another client creates an entry with the same name within the same NFS server time of day clock tick. The mount option negnametimeo can be used to override the default timeout interval on a per-mount-point basis. Setting negnametimeo to 0 disables negative name caching for the mount point. I also fixed one obvious typo where args.timeo should be args.maxgrouplist. Submitted by: jhb (earlier version) Reviewed by: jhb MFC after: 2 weeks
201895	09-Jan-2010	zec	Reduce recursions on curvnet and thus spamming the console with warning messages for kernels built with options VIMAGE and VNET_DEBUG enabled. Reviewed by: bz MFC after: 3 days
201758	07-Jan-2010	mbr	Remove extraneous semicolons, no functional changes. Submitted by: Marc Balmer <marc@msys.ch> MFC after: 1 week
201044	27-Dec-2009	bz	Add missing include to make LINT-VIMAGE build as well. Found by: test building an MFC candidate X-MFC with: r200471
200471	13-Dec-2009	bz	Add a few more V_hacks to nfsclient to allow machines with a VIMAGE kernel to boot from NFS. [1] Note: this is not a full virtualization of nfsclient. It is only does what advertised above and nothing more. Requested by: public demand [1] Tested by: kris, .. MFC after: 5 days
198174	16-Oct-2009	jhb	Close a race with caching of -ve name lookups in the NFS client. Specifically, clients only trust -ve cache entries while the directory remains unchanged and discard any -ve cache entries for a directory when they notice that the modification time of a directory entry changes. The race involves two concurrent lookups as follows: - Thread A does a lookup for file 'foo' which sends a lookup RPC to the server. The lookup fails and the server replies. - The 'foo' file is created (either by the same client or a different client) updating the modification time on the parent directory of 'foo'. - Thread B does a lookup for a different file 'bar' which updates the cached attributes of the parent directory of 'foo' to reflect the new modification time after 'foo' was created. - Thread A finally resumes execution to parse the reply from the NFS server. It adds a -ve cache entry and sets the cached value of the directory's modification time that is used for invalidating -ve cached lookups to the new modification time set by thread B. At this point, future lookups of 'foo' will honor the -ve cached entry until the cached entry is pushed out of the name cache's LRU or the modification time of the parent directory is changed again by some other change. The fix is to read the directory's modification time before sending the lookup RPC and use that cached modification time when setting the directory's cached modification time. Also, we do not add a -ve cache entry if another thread has added -ve cache entry that set the directory's cached modification time to a newer value than the value we read before sending the lookup RPC. Reviewed by: rmacklem MFC after: 1 week
197997	12-Oct-2009	rwatson	Add a MODULE_DEPEND() on the NFS client from dtnfsclient so that dtnfsclient can access NFS client symbols. MFC after: 3 days Discussed with: kib Reported by: markm
197235	15-Sep-2009	qingli	Reverting the previous change for now. Some users reports the patch fixes their issues but one reports a failure in NFS ROOT. Revert the change for now pending further investigation. Reviewed by: bz MFC after: immediately
197212	15-Sep-2009	qingli	Simply remove the code instead of using "#if 0". Pointed out by sam
197210	15-Sep-2009	qingli	The bootp code installs an interface address and the nfs client module tries to install the same address again. This extra code is removed, which was discovered by the removal of a call to in_ifscrub() in r196714. This call to in_ifscrub is put back here because the SIOCAIFADDR command can be used to change the prefix length of an existing alias. Reviewed by: kmacy
197048	09-Sep-2009	rmacklem	Add LK_NOWITNESS to the vn_lock() calls done on newly created nfs vnodes, since these nodes are not linked into the mount queue and, as such, the vn_lock() cannot cause a deadlock so LORs are harmless. Suggested by: kib Approved by: kib (mentor) MFC after: 3 days
196503	24-Aug-2009	zec	Fix NFS panics with options VIMAGE kernels by apropriately setting curvnet context inside the RPC code. Temporarily set td's cred to mount's cred before calling socreate() via __rpc_nconf2socket(). Submitted by: rmacklem (in part) Reviewed by: rmacklem, rwatson Discussed with: dfr, bz Approved by: re (rwatson), julian (mentor) MFC after: 3 days
196481	23-Aug-2009	rwatson	Rework global locks for interface list and index management, correcting several critical bugs, including race conditions and lock order issues: Replace the single rwlock, ifnet_lock, with two locks, an rwlock and an sxlock. Either can be held to stablize the lists and indexes, but both are required to write. This allows the list to be held stable in both network interrupt contexts and sleepable user threads across sleeping memory allocations or device driver interactions. As before, writes to the interface list must occur from sleepable contexts. Reviewed by: bz, julian MFC after: 3 days
196205	14-Aug-2009	kib	In nfs_upgrade_vnlock(), assert that the vnode is locked. It is for all pathes, as far as I see and testing seems to confirm it. Comparision of old_lock with LK_SHARED make sense only if vnode is locked by current thread. When downgrading, pass LK_RETRY to the vn_lock(), since otherwise vn_lock() unlocks the doomed vnode, causing extra unlock. Reported and tested by: pho Approved by: re (rwatson) MFC after: 3 weeks
196019	01-Aug-2009	rwatson	Merge the remainder of kern_vimage.c and vimage.h into vnet.c and vnet.h, we now use jails (rather than vimages) as the abstraction for virtualization management, and what remained was specific to virtual network stacks. Minor cleanups are done in the process, and comments updated to reflect these changes. Reviewed by: bz Approved by: re (vimage blanket)
195744	17-Jul-2009	rmacklem	Patch the regular nfs client in a manner analagous to r195704 for the experimental client. The patch avoids calling vn_lock() for the case where nfs_nget() has acquired the same vnode as dvp, since nfs_nget() has already locked the vnode. Reviewed by: kib, jhb Approved by: re (kensmith), kib (mentor)
195703	14-Jul-2009	kib	Use PBDRY flag for msleep(9) in NFS and NLM when sleeping thread owns kernel resources that block other threads, like vnode locks. The SIGSTOP sent to such thread (process, rather) shall not stop it until thread releases the resources. Tested by: pho Reviewed by: jhb Approved by: re (kensmith)
195699	14-Jul-2009	rwatson	Build on Jeff Roberson's linker-set based dynamic per-CPU allocator (DPCPU), as suggested by Peter Wemm, and implement a new per-virtual network stack memory allocator. Modify vnet to use the allocator instead of monolithic global container structures (vinet, ...). This change solves many binary compatibility problems associated with VIMAGE, and restores ELF symbols for virtualized global variables. Each virtualized global variable exists as a "reference copy", and also once per virtual network stack. Virtualized global variables are tagged at compile-time, placing the in a special linker set, which is loaded into a contiguous region of kernel memory. Virtualized global variables in the base kernel are linked as normal, but those in modules are copied and relocated to a reserved portion of the kernel's vnet region with the help of a the kernel linker. Virtualized global variables exist in per-vnet memory set up when the network stack instance is created, and are initialized statically from the reference copy. Run-time access occurs via an accessor macro, which converts from the current vnet and requested symbol to a per-vnet address. When "options VIMAGE" is not compiled into the kernel, normal global ELF symbols will be used instead and indirection is avoided. This change restores static initialization for network stack global variables, restores support for non-global symbols and types, eliminates the need for many subsystem constructors, eliminates large per-subsystem structures that caused many binary compatibility issues both for monitoring applications (netstat) and kernel modules, removes the per-function INIT_VNET_*() macros throughout the stack, eliminates the need for vnet_symmap ksym(2) munging, and eliminates duplicate definitions of virtualized globals under VIMAGE_GLOBALS. Bump __FreeBSD_version and update UPDATING. Portions submitted by: bz Reviewed by: bz, zec Discussed with: gnn, jamie, jeff, jhb, julian, sam Suggested by: peter Approved by: re (kensmith)
195294	02-Jul-2009	kib	In vn_vget_ino() and their inline equivalents, mnt_ref() the mount point around the sequence that drop vnode lock and then busies the mount point. Not having vlocked node or direct reference to the mp allows for the forced unmount to proceed, making mp unmounted or reused. Tested by: pho Reviewed by: jeff Approved by: re (kensmith) MFC after: 2 weeks
195203	30-Jun-2009	dfr	Adjust the internal NFS KPI to avoid the last traces of NFS_LEGACYRPC. Approved by: re
195202	30-Jun-2009	dfr	Remove the old kernel RPC implementation and the NFS_LEGACYRPC option. Approved by: re
195181	30-Jun-2009	jhb	Fix build with NFS_LEGACYRPC enabled after the socket upcall locking changes. Approved by: re (kensmith)
194951	25-Jun-2009	rwatson	Add a new global rwlock, in_ifaddr_lock, which will synchronize use of the in_ifaddrhead and INADDR_HASH address lists. Previously, these lists were used unsynchronized as they were effectively never changed in steady state, but we've seen increasing reports of writer-writer races on very busy VPN servers as core count has gone up (and similar configurations where address lists change frequently and concurrently). For the time being, use rwlocks rather than rmlocks in order to take advantage of their better lock debugging support. As a result, we don't enable ip_input()'s read-locking of INADDR_HASH until an rmlock conversion is complete and a performance analysis has been done. This means that one class of reader-writer races still exists. MFC after: 6 weeks Reviewed by: bz
194739	23-Jun-2009	bz	After cleaning up rt_tables from vnet.h and cleaning up opt_route.h a lot of files no longer need route.h either. Garbage collect them. While here remove now unneeded vnet.h #includes as well.
194425	18-Jun-2009	alc	Fix some of the style errors in *getpages().
194358	17-Jun-2009	kib	For dotdot lookup in nfs_lookup, inline the vn_vget_ino() to prevent operating on the unmounted mount point and freed mount data in case of forced unmount performed while dvp is unlocked to nget the target vnode. Add missed calls to m_freem(mrep) there on error exits [1]. Submitted by: rmacklem [1] Tested by: pho MFC after: 2 weeks
194118	13-Jun-2009	jamie	Rename the host-related prison fields to be the same as the host.* parameters they represent, and the variables they replaced, instead of abbreviated versions of them. Approved by: bz (mentor)
193952	10-Jun-2009	rmacklem	Add a test for VI_DOOMED just after nfs_upgrade_vnlock() in nfs_bioread_check_cons(). This is required since it is possible for the vnode to be vgonel()'d while in nfs_upgrade_vnlock() when a forced dismount is in progress. Also, move the check for VI_DOOMED in nfs_vinvalbuf() down to after nfs_upgrade_vnlock() and replace the out of date comment for it. Submitted by: jhb Tested by: pho Approved by: kib (mentor) MFC after: 1 month
193744	08-Jun-2009	bz	After r193232 rt_tables in vnet.h are no longer indirectly dependent on the ROUTETABLES kernel option thus there is no need to include opt_route.h anymore in all consumers of vnet.h and no longer depend on it for module builds. Remove the hidden include in flowtable.h as well and leave the two explicit #includes in ip_input.c and ip_output.c.
193272	01-Jun-2009	jhb	Rework socket upcalls to close some races with setup/teardown of upcalls. - Each socket upcall is now invoked with the appropriate socket buffer locked. It is not permissible to call soisconnected() with this lock held; however, so socket upcalls now return an integer value. The two possible values are SU_OK and SU_ISCONNECTED. If an upcall returns SU_ISCONNECTED, then the soisconnected() will be invoked on the socket after the socket buffer lock is dropped. - A new API is provided for setting and clearing socket upcalls. The API consists of soupcall_set() and soupcall_clear(). - To simplify locking, each socket buffer now has a separate upcall. - When a socket upcall returns SU_ISCONNECTED, the upcall is cleared from the receive socket buffer automatically. Note that a SO_SND upcall should never return SU_ISCONNECTED. - All this means that accept filters should now return SU_ISCONNECTED instead of calling soisconnected() directly. They also no longer need to explicitly clear the upcall on the new socket. - The HTTP accept filter still uses soupcall_set() to manage its internal state machine, but other accept filters no longer have any explicit knowlege of socket upcall internals aside from their return value. - The various RPC client upcalls currently drop the socket buffer lock while invoking soreceive() as a temporary band-aid. The plan for the future is to add a new flag to allow soreceive() to be called with the socket buffer locked. - The AIO callback for socket I/O is now also invoked with the socket buffer locked. Previously sowakeup() would drop the socket buffer lock only to call aio_swake() which immediately re-acquired the socket buffer lock for the duration of the function call. Discussed with: rwatson, rmacklem
193232	01-Jun-2009	bz	Convert the two dimensional array to be malloced and introduce an accessor function to get the correct rnh pointer back. Update netstat to get the correct pointer using kvm_read() as well. This not only fixes the ABI problem depending on the kernel option but also permits the tunable to overwrite the kernel option at boot time up to MAXFIBS, enlarging the number of FIBs without having to recompile. So people could just use GENERIC now. Reviewed by: julian, rwatson, zec X-MFC: not possible
193187	31-May-2009	alc	nfs_write() can use the recently introduced vfs_bio_set_valid() instead of vfs_bio_set_validclean(), thereby avoiding the page queues lock. Garbage collect vfs_bio_set_validclean(). Nothing uses it any longer.
193066	29-May-2009	jamie	Place hostnames and similar information fully under the prison system. The system hostname is now stored in prison0, and the global variable "hostname" has been removed, as has the hostname_mtx mutex. Jails may have their own host information, or they may inherit it from the parent/system. The proper way to read the hostname is via getcredhostname(), which will copy either the hostname associated with the passed cred, or the system hostname if you pass NULL. The system hostname can still be accessed directly (and without locking) at prison0.pr_host, but that should be avoided where possible. The "similar information" referred to is domainname, hostid, and hostuuid, which have also become prison parameters and had their associated global variables removed. Approved by: bz (mentor)
192986	28-May-2009	alc	Make *getpages()s' assertion on the state of each page's dirty bits stricter.
192686	24-May-2009	dfr	Make sure we feed 32bit align memory to nfsm_dissect otherwise we will fault on platforms with strict alignment requirements. In particular, this fixes the problems with the new RPC transport on the arm platform. Note: this adds yet another copy of nfs_realign(). I will attempt to refactor after NFS_LEGACYRPC is removed. Submitted by: sam
192645	23-May-2009	bz	While r192615 fixed the former problems, make this file VIMAGE compliant now as well initializing local context variables.
192615	23-May-2009	bz	It seems this file was ignored by MRT, rnh locking changes and new-arpv2. So let the V_irtualization people finally make the disabled debugging code compile again. MFC after: 2 weeks X-MFC: MRT and adapt rnh locking
192578	22-May-2009	rwatson	Remove the unmaintained University of Michigan NFSv4 client from 8.x prior to 8.0-RELEASE. Rick Macklem's new and more feature-rich NFSv234 client and server are replacing it. Discussed with: rmacklem
192134	15-May-2009	alc	Eliminate unnecessary clearing of the page's dirty mask from various getpages functions. Eliminate a stale comment.
192010	12-May-2009	alc	Eliminate gratuitous clearing of the page's dirty mask.
191990	11-May-2009	attilio	Remove the thread argument from the FSD (File-System Dependent) parts of the VFS. Now all the VFS_* functions and relating parts don't want the context as long as it always refers to curthread. In some points, in particular when dealing with VOPs and functions living in the same namespace (eg. vflush) which still need to be converted, pass curthread explicitly in order to retain the old behaviour. Such loose ends will be fixed ASAP. While here fix a bug: now, UFS_EXTATTR can be compiled alone without the UFS_EXTATTR_AUTOSTART option. VFS KPI is heavilly changed by this commit so thirdy parts modules needs to be recompiled. Bump __FreeBSD_version in order to signal such situation.
191964	10-May-2009	alc	Eliminate stale comments. Eliminate a case of unnecessary page queues locking.
191816	05-May-2009	zec	Change the curvnet variable from a global const struct vnet , previously always pointing to the default vnet context, to a dynamically changing thread-local one. The currvnet context should be set on entry to networking code via CURVNET_SET() macros, and reverted to previous state via CURVNET_RESTORE(). Recursions on curvnet are permitted, though strongly discuouraged. This change should have no functional impact on nooptions VIMAGE kernel builds, where CURVNET_ macros expand to whitespace. The curthread->td_vnet (aka curvnet) variable's purpose is to be an indicator of the vnet context in which the current network-related operation takes place, in case we cannot deduce the current vnet context from any other source, such as by looking at mbuf's m->m_pkthdr.rcvif->if_vnet, sockets's so->so_vnet etc. Moreover, so far curvnet has turned out to be an invaluable consistency checking aid: it helps to catch cases when sockets, ifnets or any other vnet-aware structures may have leaked from one vnet to another. The exact placement of the CURVNET_SET() / CURVNET_RESTORE() macros was a result of an empirical iterative process, whith an aim to reduce recursions on CURVNET_SET() to a minimum, while still reducing the scope of CURVNET_SET() to networking only operations - the alternative would be calling CURVNET_SET() on each system call entry. In general, curvnet has to be set in three typicall cases: when processing socket-related requests from userspace or from within the kernel; when processing inbound traffic flowing from device drivers to upper layers of the networking stack, and when executing timer-driven networking functions. This change also introduces a DDB subcommand to show the list of all vnet instances. Approved by: julian (mentor)
191777	04-May-2009	rwatson	Remove redundant NFSMNT_NFSV3 check in DTrace hooks for NFS RPC. MFC after: 1 month
191776	04-May-2009	rwatson	Fix typo in comment. MFC after: 1 month
191013	13-Apr-2009	kib	Remove trailing spaces
190888	10-Apr-2009	rwatson	Remove VOP_LEASE and supporting functions. This hasn't been used since the removal of NQNFS, but was left in in case it was required for NFSv4. Since our new NFSv4 client and server can't use it for their requirements, GC the old mechanism, as well as other unused lease- related code and interfaces. Due to its impact on kernel programming and binary interfaces, this change should not be MFC'd. Proposed by: jeff Reviewed by: jeff Discussed with: rmacklem, zach loafman @ isilon
190887	10-Apr-2009	kib	Cache_lookup() for DOTDOT drops dvp vnode lock, allowing dvp to be reclaimed. Check the condition and return ENOENT then. In nfs_lookup(), respect ENOENT return from cache_lookup() when it is caused by dvp reclaim. Reported and tested by: pho
190785	06-Apr-2009	jhb	When a stale file handle is encountered, purge all cached information about an NFS node including the access and attribute caches. Previously the NFS client only purged any name cache entries associated with the file. PR: kern/123755 Submitted by: Jaakko Heinonen jh of saunalahti fi Reported by: Timo Sirainen tss of iki fi Reviewed by: rwatson, rmacklem MFC after: 1 month
190783	06-Apr-2009	jhb	Change the default timeout for caching attributes of a directory in the NFS client from 30 seconds to 3 seconds. After the recent changes to add caching of negative name cache lookups, a negative name cache hit will persist until the client notices the parent directory has changed. The higher the attribute cache timeout on directories, the longer that can take, so lower the default timeout for directories to match that of regular files. Suggested by: bde, mohans MFC after: 1 month
190419	25-Mar-2009	rwatson	Move dtnfsclient.c in the cddl tree to nfs_kdtrace.c in the nfsclient directory, since it's under a BSD license, and this keeps NFS internals- aware tracing parts close to NFS. MFC after: 1 month Suggested by: jhb
190396	24-Mar-2009	rwatson	Fix two bugs in DTrace tracing of accesscache and attrcache load events: - Trace non-error loads into the access cache once, not zero times or twice. - Sometimes attr cache loads fail due to a race, in which case they are aborted leading to an invalidation; in this case, trace only the flush, not a load. MFC after: 1 month Sponsored by: Google, Inc.
190380	24-Mar-2009	rwatson	Add DTrace probes to the NFS access and attribute caches. Access cache events are: nfsclient:accesscache:flush:done nfsclient:accesscache:get:hit nfsclient:accesscache:get:miss nfsclient:accesscache:load:done They pass the vnode, uid, and requested or loaded access mode (if any); the load event may also report a load error if the RPC fails. The attribute cache events are: nfsclient:attrcache:flush:done nfsclient:attrcache:get:hit nfsclient:attrcache:get:miss nfsclient:attrcache:load:done They pass the vnode, optionally the vattr if one is present (hit or load), and in the case of a load event, also a possible RPC error. MFC after: 1 month Sponsored by: Google, Inc.
190293	22-Mar-2009	rwatson	Add dtnfsclient, a first cut at an NFSv2/v3 client reuest DTrace provider. The NFS client exposes 'start' and 'done' probes for NFSv2 and NFSv3 RPCs when using the new RPC implementation, passing in the vnode, mbuf chain, credential, and NFSv2 or NFSv3 procedure number. For 'done' probes, the error number is also available. Probes are named in the following way: ... nfsclient:nfs2:write:start nfsclient:nfs2:write:done ... nfsclient:nfs3:access:start nfsclient:nfs3:access:done ... Access to the unmarshalled arguments is not easily available at this point in the stack, but the passed probe arguments are sufficient to to a lot of interesting things in practice. Technically, these probes may cover multiple RPC retransmits, and even transactions if the transaction ID change as a result of authentication failure or a jukebox error from the server, but usefully capture the intent of a single NFS request, such as access, getattr, write, etc. Typical use might involve profiling RPC latency by system call, number of RPCs, how often a getattr leads to a call to access, when failed access control checks occur, etc. More detailed RPC information might best be provided by adding a krpc provider. It would also be useful to add NFS client probes for events such as the access cache or attribute cache satisfying requests without an RPC. Sponsored by: Google, Inc. MFC after: 1 month
190220	21-Mar-2009	rwatson	In nfs_request(), always exit using the nfsmout label once we're definitely doing an NFSv2 or NFSv3 RPC, rather than sometimes doing so and sometimes not. This makes it easier to add a DTrace return probe at a single point in the function. MFC after: 1 week
190176	20-Mar-2009	jhb	Expand the per-node access cache to cache permissions for multiple users. The number of entries in the cache defaults to 8 but is easily changed in nfsclient/nfs.h. When the cache is filled, the oldest cache entry is evicted when space is needed. I mirrored the changes to the NFSv[23] client in the NFSv4 client to fix compile breakage. However, the NFSv4 client doesn't actually use the access cache currently. Submitted by: rmacklem
189639	10-Mar-2009	jhb	- Remove code to set SAVENAME for CREATE or RENAME requests that get a -ve hit in the name cache. cache_lookup() doesn't actually return ENOENT for such requests to force the filesystem to do an explicit lookup, so this was effectively dead code. - Grab the nfsnode mutex while writing to n_dmtime. We don't grab the lock when comparing the time against the cached directory mod time (just as we don't when comparing ctime's for +ve name cache hits) since the attribute caching is already racy for NFS clients as it is. Discussed with: bde
189106	27-Feb-2009	bz	For all files including net/vnet.h directly include opt_route.h and net/route.h. Remove the hidden include of opt_route.h and net/route.h from net/vnet.h. We need to make sure that both opt_route.h and net/route.h are included before net/vnet.h because of the way MRT figures out the number of FIBs from the kernel option. If we do not, we end up with the default number of 1 when including net/vnet.h and array sizes are wrong. This does not change the list of files which depend on opt_route.h but we can identify them now more easily.
188994	24-Feb-2009	jhb	Bring back the code to prime the ACCESS cache when fetching attributes for an NFS file. Now the priming is conditional on a new vfs.nfs.prime_access_cache sysctl. For now I've left the default setting to disabling the priming. Requested by: scottl
188833	19-Feb-2009	jhb	Enable caching of negative pathname lookups in the NFS client. To avoid stale entries, we save a copy of the directory's modification time when the first negative cache entry was added in the directory's NFS node. When a negative cache entry is hit during a pathname lookup, the parent directory's modification time is checked. If it has changed, all of the negative cache entries for that parent are purged and the lookup falls back to using the RPC. This required adding a new cache_purge_negative() method to the name cache to purge only negative cache entries for a given directory. Submitted by: mohans, Rick Macklem, Ricardo Labiaga @ NetApp Reviewed by: mohans
188832	19-Feb-2009	jhb	When fetching attributes for a file for NFSv3 mounts, do not perform an opportunistic ACCESS RPC to populate both the access and attribute caches of the file and instead always use a GETATTR RPC. On many modern NFS servers, an ACCESS RPC is much more expensive to service than a GETATTR RPC. Submitted by: mohans
188831	19-Feb-2009	jhb	Don't clear the attribute cache of a file when it is closed. A subsequent open() of the same file will load fresh attributes, so they do not need to be explicitly flushed in close() to guarantee close to open consistency. However, other file desciptors may still reference this file and clearing the attributes in close() forces those other file descriptors to fetch fresh attributes the next time they need them. Reviewed by: mohans MFC after: 1 week
188751	18-Feb-2009	jhb	Reindent a small bit of code that was not 8-space indented like the rest of the nfs_lookup() function.
187830	28-Jan-2009	ed	Last step of splitting up minor and unit numbers: remove minor(). Inside the kernel, the minor() function was responsible for obtaining the device minor number of a character device. Because we made device numbers dynamically allocated and independent of the unit number passed to make_dev() a long time ago, it was actually a misnomer. If you really want to obtain the device number, you should use dev2udev(). We already converted all the drivers to use dev2unit() to obtain the device unit number, which is still used by a lot of drivers. I've noticed not a single driver passes NULL to dev2unit(). Even if they would, its behaviour would make little sense. This is why I've removed the NULL check. Ths commit removes minor(), minor2unit() and unit2minor() from the kernel. Because there was a naming collision with uminor(), we can rename umajor() and uminor() back to major() and minor(). This means that the makedev(3) manual page also applies to kernel space code now. I suspect umajor() and uminor() isn't used that often in external code, but to make it easier for other parties to port their code, I've increased __FreeBSD_version to 800062.
187812	28-Jan-2009	rodrigc	Fix parsing of acregmin, acregmax, acdirmin and acdirmax NFS mount options when passed as strings via nmount(). Submitted by: Jaakko Heinonen <jh saunalahti fi>
187526	21-Jan-2009	jhb	Move the VA_MARKATIME flag for VOP_SETATTR() out into its own VOP: VOP_MARKATIME() since unlike the rest of VOP_SETATTR(), VA_MARKATIME can be performed while holding a shared vnode lock (the same functionality is done internally by VOP_READ which can run with a shared vnode lock). Add missing locking of the vnode interlock to the ufs implementation and remove a special note and test from the NFS client about not supporting the feature. Inspired by: ups Tested by: pho
185571	02-Dec-2008	bz	Rather than using hidden includes (with cicular dependencies), directly include only the header files needed. This reduces the unneeded spamming of various headers into lots of files. For now, this leaves us with very few modules including vnet.h and thus needing to depend on opt_route.h. Reviewed by: brooks, gnn, des, zec, imp Sponsored by: The FreeBSD Foundation
184969	14-Nov-2008	dfr	Switch the default rpc implementation for NFS back to the new code. I believe I have fixed the reported problems - if you still have trouble with it, please contact me with as much detail as possible so that I can track down any other issues as quickly as possible.
184920	13-Nov-2008	dfr	Temporarily switch NFS back to the old RPC code while I try to diagnose and fix the problems a few people have noticed with the new code. People who want to continue testing the new code or who need RPCSEC_GSS support should use the new option NFS_NEWRPC to select it.
184588	03-Nov-2008	dfr	Implement support for RPCSEC_GSS authentication to both the NFS client and server. This replaces the RPC implementation of the NFS client and server with the newer RPC implementation originally developed (actually ported from the userland sunrpc code) to support the NFS Lock Manager. I have tested this code extensively and I believe it is stable and that performance is at least equal to the legacy RPC implementation. The NFS code currently contains support for both the new RPC implementation and the older legacy implementation inherited from the original NFS codebase. The default is to use the new implementation - add the NFS_LEGACYRPC option to fall back to the old code. When I merge this support back to RELENG_7, I will probably change this so that users have to 'opt in' to get the new code. To use RPCSEC_GSS on either client or server, you must build a kernel which includes the KGSSAPI option and the crypto device. On the userland side, you must build at least a new libc, mountd, mount_nfs and gssd. You must install new versions of /etc/rc.d/gssd and /etc/rc.d/nfsd and add 'gssd_enable=YES' to /etc/rc.conf. As long as gssd is running, you should be able to mount an NFS filesystem from a server that requires RPCSEC_GSS authentication. The mount itself can happen without any kerberos credentials but all access to the filesystem will be denied unless the accessing user has a valid ticket file in the standard place (/tmp/krb5cc_<uid>). There is currently no support for situations where the ticket file is in a different place, such as when the user logged in via SSH and has delegated credentials from that login. This restriction is also present in Solaris and Linux. In theory, we could improve this in future, possibly using Brooks Davis' implementation of variant symlinks. Supporting RPCSEC_GSS on a server is nearly as simple. You must create service creds for the server in the form 'nfs/<fqdn>@<REALM>' and install them in /etc/krb5.keytab. The standard heimdal utility ktutil makes this fairly easy. After the service creds have been created, you can add a '-sec=krb5' option to /etc/exports and restart both mountd and nfsd. The only other difference an administrator should notice is that nfsd doesn't fork to create service threads any more. In normal operation, there will be two nfsd processes, one in userland waiting for TCP connections and one in the kernel handling requests. The latter process will create as many kthreads as required - these should be visible via 'top -H'. The code has some support for varying the number of service threads according to load but initially at least, nfsd uses a fixed number of threads according to the value supplied to its '-n' option. Sponsored by: Isilon Systems MFC after: 1 month
184561	02-Nov-2008	trhodes	Document a few sysctls in the NFS client and server code. Minor style(9) where applicable. Approved by: alfred (slightly older version)
184554	02-Nov-2008	attilio	Improve VFS locking: - Implement real draining for vfs consumers by not relying on the mnt_lock and using instead a refcount in order to keep track of lock requesters. - Due to the change above, remove the mnt_lock lockmgr because it is now useless. - Due to the change above, vfs_busy() is no more linked to a lockmgr. Change so its KPI by removing the interlock argument and defining 2 new flags for it: MBF_NOWAIT which basically replaces the LK_NOWAIT of the old version (which was unlinked from the lockmgr alredy) and MBF_MNTLSTLOCK which provides the ability to drop the mountlist_mtx once the mnt interlock is held (ability still desired by most consumers). - The stub used into vfs_mount_destroy(), that allows to override the mnt_ref if running for more than 3 seconds, make it totally useless. Remove it as it was thought to work into older versions. If a problem of "refcount held never going away" should appear, we will need to fix properly instead than trust on such hackish solution. - Fix a bug where returning (with an error) from dounmount() was still leaving the MNTK_MWAIT flag on even if it the waiters were actually woken up. Just a place in vfs_mount_destroy() is left because it is going to recycle the structure in any case, so it doesn't matter. - Remove the markercnt refcount as it is useless. This patch modifies VFS ABI and breaks KPI for vfs_busy() so manpages and __FreeBSD_version will be modified accordingly. Discussed with: kib Tested by: pho
184413	28-Oct-2008	trasz	Introduce accmode_t. This is required for NFSv4 ACLs - it will be neccessary to add more V* constants, and the variables changed by this patch were often being assigned to mode_t variables, which is 16 bit. Approved by: rwatson (mentor)
184214	23-Oct-2008	des	Fix a number of style issues in the MALLOC / FREE commit. I've tried to be careful not to fix anything that was already broken; the NFSv4 code is particularly bad in this respect.
184205	23-Oct-2008	des	Retire the MALLOC and FREE macros. They are an abomination unto style(9). MFC after: 3 months
183754	10-Oct-2008	attilio	Remove the struct thread unuseful argument from bufobj interface. In particular following functions KPI results modified: - bufobj_invalbuf() - bufsync() and BO_SYNC() "virtual method" of the buffer objects set. Main consumers of bufobj functions are affected by this change too and, in particular, functions which changed their KPI are: - vinvalbuf() - g_vfs_close() Due to the KPI breakage, __FreeBSD_version will be bumped in a later commit. As a side note, please consider just temporary the 'curthread' argument passing to VOP_SYNC() (in bufsync()) as it will be axed out ASAP Reviewed by: kib Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
183550	02-Oct-2008	zec	Step 1.5 of importing the network stack virtualization infrastructure from the vimage project, as per plan established at devsummit 08/08: http://wiki.freebsd.org/Image/Notes200808DevSummit Introduce INIT_VNET_() initializer macros, VNET_FOREACH() iterator macros, and CURVNET_SET() context setting macros, all currently resolving to NOPs. Prepare for virtualization of selected SYSCTL objects by introducing a family of SYSCTL_V_() macros, currently resolving to their global counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT(). Move selected #defines from sys/sys/vimage.h to newly introduced header files specific to virtualized subsystems (sys/net/vnet.h, sys/netinet/vinet.h etc.). All the changes are verified to have zero functional impact at this point in time by doing MD5 comparision between pre- and post-change object files(). () netipsec/keysock.c did not validate depending on compile time options. Implemented by: julian, bz, brooks, zec Reviewed by: julian, bz, brooks, kris, rwatson, ... Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation
183330	24-Sep-2008	jhb	Part 1 of making shared lookups more resilient with respect to forced unmounts. When we upgrade a vnode lock from shared to exclusive during a name cache lookup, fail the lookup with EBADF if the vnode is invalidated while we are waiting for the exclusive lock. Also, for correctness (though I'm not sure it can occur in practice), downgrade an exclusively locked vnode if it should be share locked. Tested by: pho
183215	20-Sep-2008	kib	fdescfs, devfs, mqueuefs, nfs, portalfs, pseudofs, tmpfs and xfs initialize the vattr structure in VOP_GETATTR() with VATTR_NULL(), vattr_null() or by zeroing it. Remove these to allow preinitialization of fields work in vn_stat(). This is needed to get birthtime initialized correctly. Submitted by: Jaakko Heinonen <jh saunalahti fi> Discussed on: freebsd-fs MFC after: 1 month
183005	13-Sep-2008	rodrigc	Add code to parse NFS mount options passed as individual items of the nmount() iovec. This will allow us to move away from gathering up all the NFS mount options as a single "struct nfs_args" to be passed down through nmount(). This will make adding new NFS mount options much easier. Many, many thanks to Doug Rabson, who took my initial patches and cleaned them up. Reviewed by: dfr MFC after: 3 months
182542	31-Aug-2008	attilio	Decontextualize vfs_busy(), vfs_unbusy() and vfs_mount_alloc() functions. Manpages are updated accordingly. Tested by: Diego Sardina <siarodx at gmail dot com>
182371	28-Aug-2008	attilio	Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed thread was always curthread and totally unuseful. Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
181803	17-Aug-2008	bz	Commit step 1 of the vimage project, (network stack) virtualization work done by Marko Zec (zec@). This is the first in a series of commits over the course of the next few weeks. Mark all uses of global variables to be virtualized with a V_ prefix. Use macros to map them back to their global names for now, so this is a NOP change only. We hope to have caught at least 85-90% of what is needed so we do not invalidate a lot of outstanding patches again. Obtained from: //depot/projects/vimage-commit2/... Reviewed by: brooks, des, ed, mav, julian, jamie, kris, rwatson, zec, ... (various people I forgot, different versions) md5 (with a bit of help) Sponsored by: NLnet Foundation, The FreeBSD Foundation X-MFC after: never V_Commit_Message_Reviewed_By: more people than the patch
180780	24-Jul-2008	dfr	Try again not to use a userspace pointer in the kernel when trying to record the hostname which we need for NLM requests. The previous patch was incomplete. PR: 125849 Pointy hat: dfr
180779	24-Jul-2008	dfr	Don't use a userspace pointer in the kernel when trying to record the hostname which we need for NLM requests. PR: 125849
180724	22-Jul-2008	ed	Move the NFS/RPC code away from lbolt. The kernel has a special wchan called `lbolt', which is triggered each second. It doesn't seem to be used a lot and it seems pretty redundant, because we can specify a timeout value to the *sleep() routines. In an attempt to eventually remove lbolt, make the NFS/RPC code use a timeout of `hz' when trying to reconnect. Only the TTY code (not MPSAFE TTY) and the VFS syncer seem to use lbolt now. Reviewed by: attilio, jhb Approved by: philip (mentor), alfred, dfr
180291	05-Jul-2008	rwatson	Introduce a new lock, hostname_mtx, and use it to synchronize access to global hostname and domainname variables. Where necessary, copy to or from a stack-local buffer before performing copyin() or copyout(). A few uses, such as in cd9660 and daemon_saver, remain under-synchronized and will require further updates. Correct a bug in which a failed copyin() of domainname would leave domainname potentially corrupted. MFC after: 3 weeks
180025	26-Jun-2008	dfr	Re-implement the client side of rpc.lockd in the kernel. This implementation provides the correct semantics for flock(2) style locks which are used by the lockf(1) command line tool and the pidfile(3) library. It also implements recovery from server restarts and ensures that dirty cache blocks are written to the server before obtaining locks (allowing multiple clients to use file locking to safely share data). Sponsored by: Isilon Systems PR: 94256 MFC after: 2 weeks
179333	27-May-2008	attilio	Once the ENOLCK is detected we expect to retry the acquisition. Anyway, in the edge case the flushing happens and the while is no more executed, nfs_flush() (and nfs4_flush()) can return with a wrong err value of ENOLCK. Bring it back to 0, as we expect to have for that case. Reported by: kris Reviewed by: kib
179039	16-May-2008	benno	Allow the block size used when booting over NFS to be overridden. It defaults to 8192 bytes which is the size currently used.
178888	09-May-2008	julian	Add code to allow the system to handle multiple routing tables. This particular implementation is designed to be fully backwards compatible and to be MFC-able to 7.x (and 6.x) Currently the only protocol that can make use of the multiple tables is IPv4 Similar functionality exists in OpenBSD and Linux. From my notes: ----- One thing where FreeBSD has been falling behind, and which by chance I have some time to work on is "policy based routing", which allows different packet streams to be routed by more than just the destination address. Constraints: ------------ I want to make some form of this available in the 6.x tree (and by extension 7.x) , but FreeBSD in general needs it so I might as well do it in -current and back port the portions I need. One of the ways that this can be done is to have the ability to instantiate multiple kernel routing tables (which I will now refer to as "Forwarding Information Bases" or "FIBs" for political correctness reasons). Which FIB a particular packet uses to make the next hop decision can be decided by a number of mechanisms. The policies these mechanisms implement are the "Policies" referred to in "Policy based routing". One of the constraints I have if I try to back port this work to 6.x is that it must be implemented as a EXTENSION to the existing ABIs in 6.x so that third party applications do not need to be recompiled in timespan of the branch. This first version will not have some of the bells and whistles that will come with later versions. It will, for example, be limited to 16 tables in the first commit. Implementation method, Compatible version. (part 1) ------------------------------- For this reason I have implemented a "sufficient subset" of a multiple routing table solution in Perforce, and back-ported it to 6.x. (also in Perforce though not always caught up with what I have done in -current/P4). The subset allows a number of FIBs to be defined at compile time (8 is sufficient for my purposes in 6.x) and implements the changes needed to allow IPV4 to use them. I have not done the changes for ipv6 simply because I do not need it, and I do not have enough knowledge of ipv6 (e.g. neighbor discovery) needed to do it. Other protocol families are left untouched and should there be users with proprietary protocol families, they should continue to work and be oblivious to the existence of the extra FIBs. To understand how this is done, one must know that the current FIB code starts everything off with a single dimensional array of pointers to FIB head structures (One per protocol family), each of which in turn points to the trie of routes available to that family. The basic change in the ABI compatible version of the change is to extent that array to be a 2 dimensional array, so that instead of protocol family X looking at rt_tables[X] for the table it needs, it looks at rt_tables[Y][X] when for all protocol families except ipv4 Y is always 0. Code that is unaware of the change always just sees the first row of the table, which of course looks just like the one dimensional array that existed before. The entry points rtrequest(), rtalloc(), rtalloc1(), rtalloc_ign() are all maintained, but refer only to the first row of the array, so that existing callers in proprietary protocols can continue to do the "right thing". Some new entry points are added, for the exclusive use of ipv4 code called in_rtrequest(), in_rtalloc(), in_rtalloc1() and in_rtalloc_ign(), which have an extra argument which refers the code to the correct row. In addition, there are some new entry points (currently called rtalloc_fib() and friends) that check the Address family being looked up and call either rtalloc() (and friends) if the protocol is not IPv4 forcing the action to row 0 or to the appropriate row if it IS IPv4 (and that info is available). These are for calling from code that is not specific to any particular protocol. The way these are implemented would change in the non ABI preserving code to be added later. One feature of the first version of the code is that for ipv4, the interface routes show up automatically on all the FIBs, so that no matter what FIB you select you always have the basic direct attached hosts available to you. (rtinit() does this automatically). You CAN delete an interface route from one FIB should you want to but by default it's there. ARP information is also available in each FIB. It's assumed that the same machine would have the same MAC address, regardless of which FIB you are using to get to it. This brings us as to how the correct FIB is selected for an outgoing IPV4 packet. Firstly, all packets have a FIB associated with them. if nothing has been done to change it, it will be FIB 0. The FIB is changed in the following ways. Packets fall into one of a number of classes. 1/ locally generated packets, coming from a socket/PCB. Such packets select a FIB from a number associated with the socket/PCB. This in turn is inherited from the process, but can be changed by a socket option. The process in turn inherits it on fork. I have written a utility call setfib that acts a bit like nice.. setfib -3 ping target.example.com # will use fib 3 for ping. It is an obvious extension to make it a property of a jail but I have not done so. It can be achieved by combining the setfib and jail commands. 2/ packets received on an interface for forwarding. By default these packets would use table 0, (or possibly a number settable in a sysctl(not yet)). but prior to routing the firewall can inspect them (see below). (possibly in the future you may be able to associate a FIB with packets received on an interface.. An ifconfig arg, but not yet.) 3/ packets inspected by a packet classifier, which can arbitrarily associate a fib with it on a packet by packet basis. A fib assigned to a packet by a packet classifier (such as ipfw) would over-ride a fib associated by a more default source. (such as cases 1 or 2). 4/ a tcp listen socket associated with a fib will generate accept sockets that are associated with that same fib. 5/ Packets generated in response to some other packet (e.g. reset or icmp packets). These should use the FIB associated with the packet being reponded to. 6/ Packets generated during encapsulation. gif, tun and other tunnel interfaces will encapsulate using the FIB that was in effect withthe proces that set up the tunnel. thus setfib 1 ifconfig gif0 [tunnel instructions] will set the fib for the tunnel to use to be fib 1. Routing messages would be associated with their process, and thus select one FIB or another. messages from the kernel would be associated with the fib they refer to and would only be received by a routing socket associated with that fib. (not yet implemented) In addition Netstat has been edited to be able to cope with the fact that the array is now 2 dimensional. (It looks in system memory using libkvm (!)). Old versions of netstat see only the first FIB. In addition two sysctls are added to give: a) the number of FIBs compiled in (active) b) the default FIB of the calling process. Early testing experience: ------------------------- Basically our (IronPort's) appliance does this functionality already using ipfw fwd but that method has some drawbacks. For example, It can't fully simulate a routing table because it can't influence the socket's choice of local address when a connect() is done. Testing during the generating of these changes has been remarkably smooth so far. Multiple tables have co-existed with no notable side effects, and packets have been routes accordingly. ipfw has grown 2 new keywords: setfib N ip from anay to any count ip from any to any fib N In pf there seems to be a requirement to be able to give symbolic names to the fibs but I do not have that capacity. I am not sure if it is required. SCTP has interestingly enough built in support for this, called VRFs in Cisco parlance. it will be interesting to see how that handles it when it suddenly actually does something. Where to next: -------------------- After committing the ABI compatible version and MFCing it, I'd like to proceed in a forward direction in -current. this will result in some roto-tilling in the routing code. Firstly: the current code's idea of having a separate tree per protocol family, all of the same format, and pointed to by the 1 dimensional array is a bit silly. Especially when one considers that there is code that makes assumptions about every protocol having the same internal structures there. Some protocols don't WANT that sort of structure. (for example the whole idea of a netmask is foreign to appletalk). This needs to be made opaque to the external code. My suggested first change is to add routing method pointers to the 'domain' structure, along with information pointing the data. instead of having an array of pointers to uniform structures, there would be an array pointing to the 'domain' structures for each protocol address domain (protocol family), and the methods this reached would be called. The methods would have an argument that gives FIB number, but the protocol would be free to ignore it. When the ABI can be changed it raises the possibilty of the addition of a fib entry into the "struct route". Currently, the structure contains the sockaddr of the desination, and the resulting fib entry. To make this work fully, one could add a fib number so that given an address and a fib, one can find the third element, the fib entry. Interaction with the ARP layer/ LL layer would need to be revisited as well. Qing Li has been working on this already. This work was sponsored by Ironport Systems/Cisco Reviewed by: several including rwatson, bz and mlair (parts each) Obtained from: Ironport systems/Cisco
178429	22-Apr-2008	phk	Now that all platforms use genclock, shuffle things around slightly for better structure. Much of this is related to <sys/clock.h>, which should really have been called <sys/calendar.h>, but unless and until we need the name, the repocopy can wait. In general the kernel does not know about minutes, hours, days, timezones, daylight savings time, leap-years and such. All that is theoretically a matter for userland only. Parts of kernel code does however care: badly designed filesystems store timestamps in local time and RTC chips almost universally track time in a YY-MM-DD HH:MM:SS format, and sometimes in local timezone instead of UTC. For this we have <sys/clock.h> <sys/time.h> on the other hand, deals with time_t, timeval, timespec and so on. These know only seconds and fractions thereof. Move inittodr() and resettodr() prototypes to <sys/time.h>. Retain the names as it is one of the few surviving PDP/VAX references. Move startrtclock() to <machine/clock.h> on relevant platforms, it is a MD call between machdep.c/clock.c. Remove references to it elsewhere. Remove a lot of unnecessary <sys/clock.h> includes. Move the machdep.disable_rtc_set sysctl to subr_rtc.c where it belongs. XXX: should be kern.disable_rtc_set really, it's not MD.
178243	16-Apr-2008	kib	Move the head of byte-level advisory lock list from the filesystem-specific vnode data to the struct vnode. Provide the default implementation for the vop_advlock and vop_advlockasync. Purge the locks on the vnode reclaim by using the lf_purgelocks(). The default implementation is augmented for the nfs and smbfs. In the nfs_advlock, push the Giant inside the nfs_dolock. Before the change, the vop_advlock and vop_advlockasync have taken the unlocked vnode and dereferenced the fs-private inode data, racing with with the vnode reclamation due to forced unmount. Now, the vop_getattr under the shared vnode lock is used to obtain the inode size, and later, in the lf_advlockasync, after locking the vnode interlock, the VI_DOOMED flag is checked to prevent an operation on the doomed vnode. The implementation of the lf_purgelocks() is submitted by dfr. Reported by: kris Tested by: kris, pho Discussed with: jeff, dfr MFC after: 2 weeks
177633	26-Mar-2008	dfr	Add the new kernel-mode NFS Lock Manager. To use it instead of the user-mode lock manager, build a kernel with the NFSLOCKD option and add '-k' to 'rpc_lockd_flags' in rc.conf. Highlights include: * Thread-safe kernel RPC client - many threads can use the same RPC client handle safely with replies being de-multiplexed at the socket upcall (typically driven directly by the NIC interrupt) and handed off to whichever thread matches the reply. For UDP sockets, many RPC clients can share the same socket. This allows the use of a single privileged UDP port number to talk to an arbitrary number of remote hosts. * Single-threaded kernel RPC server. Adding support for multi-threaded server would be relatively straightforward and would follow approximately the Solaris KPI. A single thread should be sufficient for the NLM since it should rarely block in normal operation. * Kernel mode NLM server supporting cancel requests and granted callbacks. I've tested the NLM server reasonably extensively - it passes both my own tests and the NFS Connectathon locking tests running on Solaris, Mac OS X and Ubuntu Linux. * Userland NLM client supported. While the NLM server doesn't have support for the local NFS client's locking needs, it does have to field async replies and granted callbacks from remote NLMs that the local client has contacted. We relay these replies to the userland rpc.lockd over a local domain RPC socket. * Robust deadlock detection for the local lock manager. In particular it will detect deadlocks caused by a lock request that covers more than one blocking request. As required by the NLM protocol, all deadlock detection happens synchronously - a user is guaranteed that if a lock request isn't rejected immediately, the lock will eventually be granted. The old system allowed for a 'deferred deadlock' condition where a blocked lock request could wake up and find that some other deadlock-causing lock owner had beaten them to the lock. * Since both local and remote locks are managed by the same kernel locking code, local and remote processes can safely use file locks for mutual exclusion. Local processes have no fairness advantage compared to remote processes when contending to lock a region that has just been unlocked - the local lock manager enforces a strict first-come first-served model for both local and remote lockers. Sponsored by: Isilon Systems PR: 95247 107555 115524 116679 MFC after: 2 weeks
177599	25-Mar-2008	ru	Replaced the misleading uses of a historical artefact M_TRYWAIT with M_WAIT. Removed dead code that assumed that M_TRYWAIT can return NULL; it's not true since the advent of MBUMA. Reviewed by: arch There are ongoing disputes as to whether we want to switch to directly using UMA flags M_WAITOK/M_NOWAIT for mbuf(9) allocation.
177493	22-Mar-2008	jeff	- Complete part of the unfinished bufobj work by consistently using BO_LOCK/UNLOCK/MTX when manipulating the bufobj. - Create a new lock in the bufobj to lock bufobj fields independently. This leaves the vnode interlock as an 'identity' lock while the bufobj is an io lock. The bufobj lock is ordered before the vnode interlock and also before the mnt ilock. - Exploit this new lock order to simplify softdep_check_suspend(). - A few sync related functions are marked with a new XXX to note that we may not properly interlock against a non-zero bv_cnt when attempting to sync all vnodes on a mountlist. I do not believe this race is important. If I'm wrong this will make these locations easier to find. Reviewed by: kib (earlier diff) Tested by: kris, pho (earlier diff)
177253	16-Mar-2008	rwatson	In keeping with style(9)'s recommendations on macros, use a ';' after each SYSINIT() macro invocation. This makes a number of lightweight C parsers much happier with the FreeBSD kernel source, including cflow's prcc and lxr. MFC after: 1 month Discussed with: imp, rink
176824	05-Mar-2008	rodrigc	Expand the nfs_opts array to include all possible string mount options that mount_nfs could pass down, if it passed down string mount options. Right now, mount_nfs jut passes down a single mount option named "nfs_args" with a fully initialized 'struct nfs_args'. In future commits, we will add code to the kernel for parsing stringified NFS mount options, so that we can convert mount_nfs to pass string options from userspace to kernel, instead of an initialized struct nfs_args.
176823	05-Mar-2008	rodrigc	In nfs_mount(), default initialize struct nfs_args the same way that it is default initialized in revision 1.77 of mount_nfs.c. Right now, this is a no-op, because currently we initialize struct nfs_args in mount_nfs in userspace, and pass it down into the kernel via nmount(), so we overwrite whatever we initialize here with the value passed in from userspace. However, this lays the groundwork for moving away from passing struct nfs_args from userspace to kernel via nmount(), so that we can instead pass string mount options via nmount() which can be parsed in the kernel. This will make it easier to add new NFS mount options.
176559	25-Feb-2008	attilio	Axe the 'thread' argument from VOP_ISLOCKED() and lockstatus() as it is always curthread. As KPI gets broken by this patch, manpages and __FreeBSD_version will be updated by further commits. Tested by: Andrea Barberio <insomniac at slackware dot it>
176519	24-Feb-2008	attilio	Introduce some functions in the vnode locks namespace and in the ffs namespace in order to handle lockmgr fields in a controlled way instead than spreading all around bogus stubs: - VN_LOCK_AREC() allows lock recursion for a specified vnode - VN_LOCK_ASHARE() allows lock sharing for a specified vnode In FFS land: - BUF_AREC() allows lock recursion for a specified buffer lock - BUF_NOREC() disallows recursion for a specified buffer lock Side note: union_subr.c::unionfs_node_update() is the only other function directly handling lockmgr fields. As this is not simple to fix, it has been left behind as "sole" exception.
176374	17-Feb-2008	yar	Prevent the NFS client from losing MNT_ROOTFS on the root file system. In particular, stop overwriting mount point flags in nfs_mountdiskless() because now they are set elsewhere. (They were _initialized_ by that function in the 4.4BSD days, when mount structures were not allocated in a centralized manner -- see rev. 1.1 of this file.) Fix nfs_mount(), which happened to depend on the loss of MNT_ROOTFS when it came to update handling. Also note that mountnfs() no longer handles updates. Now they shouldn't reach this function, so printf a diagnostic message if that happens due to a coding error.
176249	13-Feb-2008	attilio	- Add real assertions to lockmgr locking primitives. A couple of notes for this: * WITNESS support, when enabled, is only used for shared locks in order to avoid problems with the "disowned" locks * KA_HELD and KA_UNHELD only exists in the lockmgr namespace in order to assert for a generic thread (not curthread) owning or not the lock. Really, this kind of check is bogus but it seems very widespread in the consumers code. So, for the moment, we cater this untrusted behaviour, until the consumers are not fixed and the options could be removed (hopefully during 8.0-CURRENT lifecycle) * Implementing KA_HELD and KA_UNHELD (not surported natively by WITNESS) made necessary the introduction of LA_MASKASSERT which specifies the range for default lock assertion flags * About other aspects, lockmgr_assert() follows exactly what other locking primitives offer about this operation. - Build real assertions for buffer cache locks on the top of lockmgr_assert(). They can be used with the BUF_ASSERT_*(bp) paradigm. - Add checks at lock destruction time and use a cookie for verifying lock integrity at any operation. - Redefine BUF_LOCKFREE() in order to not use a direct assert but let it rely on the aforementioned destruction time check. KPI results evidently broken, so __FreeBSD_version bumping and manpage update result necessary and will be committed soon. Side note: lockmgr_assert() will be used soon in order to implement real assertions in the vnode namespace replacing the legacy and still bogus "VOP_ISLOCKED()" way. Tested by: kris (earlier version) Reviewed by: jhb
176224	13-Feb-2008	jhb	Consolidate the code to generate a new XID for a NFS request into a nfs_xid_gen() function instead of duplicating the logic in both nfsm_rpchead() and the NFS3ERR_JUKEBOX handling in nfs_request(). MFC after: 1 week Submitted by: mohans (a long while ago)
176198	11-Feb-2008	kris	Switch the default NFS mount mode from UDP to TCP. UDP mounts are a historical relic, and are no longer appropriate for either LAN or WAN mounting. At modern (gigabit and 10 gigabit) LAN speeds packet loss from socket buffer fill events is common, and sequence numbers wrap quickly enough that data corruption is possible. TCP solves both of these problems without imposing significant overhead. MFC after: 1 month
176134	09-Feb-2008	attilio	namei() can call underlying nfs_readlink() passing a struct uio pointer owned by a NULL owner. This will lead consequent VOP_ISLOCKED() present into nfs_upgrade_vnlock() to panic as it only acquire curthread now. Fix nfs_upgrade_vnlock() and nfs_downgrade_vnlock() in order to not use more the struct thread pointer passed as argument (as it is really nomore required there as vn_lock() and VOP_UNLOCK doesn't get the lock more). Using curthread, in place, doesn't get ambiguity as LK_EXCLOTHER should be handled as a "not locked" request by both functions. Reported by: kris Tested by: kris Reviewed by: ups
176116	08-Feb-2008	attilio	Conver all explicit instances to VOP_ISLOCKED(arg, NULL) into VOP_ISLOCKED(arg, curthread). Now, VOP_ISLOCKED() and lockstatus() should only acquire curthread as argument; this will lead in axing the additional argument from both functions, making the code cleaner. Reviewed by: jeff, kib
175635	24-Jan-2008	attilio	Cleanup lockmgr interface and exported KPI: - Remove the "thread" argument from the lockmgr() function as it is always curthread now - Axe lockcount() function as it is no longer used - Axe LOCKMGR_ASSERT() as it is bogus really and no currently used. Hopefully this will be soonly replaced by something suitable for it. - Remove the prototype for dumplockinfo() as the function is no longer present Addictionally: - Introduce a KASSERT() in lockstatus() in order to let it accept only curthread or NULL as they should only be passed - Do a little bit of style(9) cleanup on lockmgr.h KPI results heavilly broken by this change, so manpages and FreeBSD_version will be modified accordingly by further commits. Tested by: matteo
175486	19-Jan-2008	attilio	- Introduce the function lockmgr_recursed() which returns true if the lockmgr lkp, when held in exclusive mode, is recursed - Introduce the function BUF_RECURSED() which does the same for bufobj locks based on the top of lockmgr_recursed() - Introduce the function BUF_ISLOCKED() which works like the counterpart VOP_ISLOCKED(9), showing the state of lockmgr linked with the bufobj BUF_RECURSED() and BUF_ISLOCKED() entirely replace the usage of bogus BUF_REFCNT() in a more explicative and SMP-compliant way. This allows us to axe out BUF_REFCNT() and leaving the function lockcount() totally unused in our stock kernel. Further commits will axe lockcount() as well as part of lockmgr() cleanup. KPI results, obviously, broken so further commits will update manpages and freebsd version. Tested by: kris (on UFS and NFS)
175294	13-Jan-2008	attilio	VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in conjuction with 'thread' argument passing which is always curthread. Remove the unuseful extra-argument and pass explicitly curthread to lower layer functions, when necessary. KPI results broken by this change, which should affect several ports, so version bumping and manpage update will be further committed. Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
175237	11-Jan-2008	jhb	The previous revision broke the case of reconnecting to a TCP NFS server via a new socket during an NFS operation as that reconnect takes place in the context of an arbitrary thread with an arbitrary credential. Ideally we would like to use the mount point's credential for the entire process of setting up the socket to connect to the NFS server. Since some of the APIs (sobind(), etc.) only take a thread pointer and infer the credential from that instead of a direct credential, work around the problem by temporarily changing the current thread's credential to that of the mount point while connecting the socket and then reverting back to the original credential when we are done. Reviewed by: rwatson Tested on: UDP, TCP, TCP with forced reconnect
175221	10-Jan-2008	jhb	Pass curthread to various socket routines (socreate(), sobind(), and soconnect()) instead of &thread0 when establishing a connection to the NFS server. Otherwise inconsistent credentials may be used when setting up the NFS socket. MFC after: 1 week Reviewed by: rwatson
175202	10-Jan-2008	attilio	vn_lock() is currently only used with the 'curthread' passed as argument. Remove this argument and pass curthread directly to underlying VOP_LOCK1() VFS method. This modify makes the code cleaner and in particular remove an annoying dependence helping next lockmgr() cleanup. KPI results, obviously, changed. Manpage and FreeBSD_version will be updated through further commits. As a side note, would be valuable to say that next commits will address a similar cleanup about VFS methods, in particular vop_lock1 and vop_unlock. Tested by: Diego Sardina <siarodx at gmail dot com>, Andrea Di Pasquale <whyx dot it at gmail dot com>
173751	19-Nov-2007	rwatson	Remove hacks from the NFSv2/3 client intended to handle a lack of a server-side RPC retranmission cache for non-idempotent operations: these hacks substituted 0 (success) for the expected EEXIST in the event that a target name already existed for LINK, SYMLINK, and MKDIR operations, under the assumption that EEXIST represented a second application of the original RPC rather than a true failure. Background: certain NFS operations (in this case, LINK, SYMLINK, and MKDIR) are not idempotent, as they leave behind persisting state on the server that prevents them from being replayed without an error;if an UDP RPC reply is lost leading to a retransmission by theclient, the second reply will return EEXIST rather than success, asthe new object has already been created. The NFS client previouslysilently mapped the EEXIST return into success to paper over thisproblem. However, in all modern NFS server implementations, a reply cache is kept in order to retransmit the original reply to a retransmitted request, rather than performing the operation a second time, allowing this hack to be avoided. This allows link()-based filelocking over NFS to operate correctly, as an application requestingthe creation of a new link for a file to tell if it succeededatomically or not. Other NFS clients, including Solaris and Linux, generally follow this behavior for the same reasons. Most clients also now default to TCP, which also helps avoid the issue of retransmitted but non-idempotent requests in most cases. Reported by: Adam McDougall <mcdouga9 at egr dot msu dot edu>, Timo Sirainen <tss at iki dot fi> Reviewed by: mohans MFC after: 1 week
173068	27-Oct-2007	rodrigc	Add the following mount options to the nfs_opts array: noatime, noexec, suiddir, nosuid, nosymfollow, union, noclusterr, noclusterw, multilabel, acls, force, update, async. These options correspond to MOPT_STDOPTS, MOPT_FORCE, MOPT_UPDATE, and MOPT_ASYNC. Currently, mount_nfs converts these "-o" options from strings to MNT_ flags via getmntopts(), and passes the flags from userspace to the kernel. This change will allow us in future to pass these mount options as strings directly to the kernel via nmount() when doing NFS mounts.
172836	20-Oct-2007	julian	Rename the kthread_xxx (e.g. kthread_create()) calls to kproc_xxx as they actually make whole processes. Thos makes way for us to add REAL kthread_create() and friends that actually make theads. it turns out that most of these calls actually end up being moved back to the thread version when it's added. but we need to make this cosmetic change first. I'd LOVE to do this rename in 7.0 so that we can eventually MFC the new kthread_xxx() calls.
172759	18-Oct-2007	jhb	Add a -z flag to nfsstat which zeros the NFS statistics after displaying them. MFC after: 1 week Requested by: ps Submitted by: ps (6 years ago)
172697	16-Oct-2007	alfred	Get rid of qaddr_t. Requested by: bde
172600	12-Oct-2007	mohans	NFS MP scaling changes. - Eliminate the hideous nfs_sndlock that serialized NFS/TCP request senders thru the sndlock. - Institute a new nfs_connectlock that serializes NFS/TCP reconnects. Add logic to wait for pending request senders to finish sending before reconnecting. Dial down the sb_timeo for NFS/TCP sockets to 1 sec. - Break out the nfs xid manipulation under a new nfs xid lock, rather than over loading the nfs request lock for this purpose. - Fix some of the locking in nfs_request. Many thanks to Kris Kennaway for his help with this and for initiating the MP scaling analysis and work. Kris also tested this patch thorougly. Approved by: re@ (Ken Smith)
172324	25-Sep-2007	mohans	Fix for a very rare race, caused by the nfsiod wakeup and nfsiod idle timeout occurring at exactly the same time. If this happens, the nfsiod exits although there may be a queued async IO request for it. Found by : Kris Kennaway Approved by: re
171744	06-Aug-2007	rwatson	Remove the now-unused NET_{LOCK,UNLOCK,ASSERT}_GIANT() macros, which previously conditionally acquired Giant based on debug.mpsafenet. As that has now been removed, they are no longer required. Removing them significantly simplifies error-handling in the socket layer, eliminated quite a bit of unwinding of locking in error cases. While here clean up the now unneeded opt_net.h, which previously was used for the NET_WITH_GIANT kernel option. Clean up some related gotos for consistency. Reviewed by: bz, csjp Tested by: kris Approved by: re (kensmith)
171190	03-Jul-2007	jhb	Fix for a race where out of order loading of NFS attrs into the nfsnode could lead to attrs being stale. One example (that we ran into) was a READDIR+, WRITE. The responses came back in order, but the attrs from the WRITE were loaded before the attrs from the READDIR+, leading to the wrong size from being read on the next stat() call. MFC after: 1 week Submitted by: mohans Approved by: re (kensmith)
171189	03-Jul-2007	jhb	Fix up NFS client write error handling. Errors are split into recoverable and unrecoverable. For the former, we redirty the buffer and hang onto it for future retries. For the latter (eg. ESTALE), we discard the buffer and return the error back to the user on the next syscall. This fixes a number of vfs panics and fixes having a large number of dirty buffers (that cannot be written out and reclaimed) from hanging around. Thanks to ups@ for discussions on this issue. Reported by: kris, Kai, others Approved by: re (kensmith)
170292	04-Jun-2007	attilio	Do proper "locking" for missing vmmeters part. Now, we assume no more sched_lock protection for some of them and use the distribuited loads method for vmmeter (distribuited through CPUs). Reviewed by: alc, bde Approved by: jeff (mentor)
170174	01-Jun-2007	jeff	- Move rusage from being per-process in struct pstats to per-thread in td_ru. This removes the requirement for per-process synchronization in statclock() and mi_switch(). This was previously supported by sched_lock which is going away. All modifications to rusage are now done in the context of the owning thread. reads proceed without locks. - Aggregate exiting threads rusage in thread_exit() such that the exiting thread's rusage is not lost. - Provide a new routine, rufetch() to fetch an aggregate of all rusage structures from all threads in a process. This routine must be used in any place requiring a rusage from a process prior to it's exit. The exited process's rusage is still available via p_ru. - Aggregate tick statistics only on demand via rufetch() or when a thread exits. Tick statistics are kept in the thread and protected by sched_lock until it exits. Initial patch by: attilio Reviewed by: attilio, bde (some objections), arch (mostly silent)
170170	31-May-2007	attilio	Revert VMCNT_* operations introduction. Probabilly, a general approach is not the better solution here, so we should solve the sched_lock protection problems separately. Requested by: alc Approved by: jeff (mentor)
169681	18-May-2007	rwatson	In nfs_down(), if rep can be NULL, which we test for, then we should lock and unlock conditionally, not just set the flag on it conditionally. In practice, this bug couldn't manifest, as in the current revision of the code, no callers pass a NULL rep. CID: 1416 Found with: Coverity Prevent(tm)
169667	18-May-2007	jeff	- define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating vmcnts. This can be used to abstract away pcpu details but also changes to use atomics for all counters now. This means sched lock is no longer responsible for protecting counts in the switch routines. Contributed by: Attilio Rao <attilio@FreeBSD.org>
169043	25-Apr-2007	jhb	Various fixes to the NFS Directio support. - Fix for a bug where a close would not wait for all (directio) dirty buffers to drain. The nfsnode was not marked NMODIFIED when there were directio dirtied buffers pending, causing this. - No reason to vhold/vrele the vp when enqueueing DirectIO requests for the nfsiods. The vnode can't really go way since the close has to wait for these requests to drain. MFC after: 1 week Submitted by: mohans
168931	21-Apr-2007	rwatson	Attempt to rationalize NFS privileges: - Replace PRIV_NFSD with PRIV_NFS_DAEMON, add PRIV_NFS_LOCKD. - Use PRIV_NFS_DAEMON in the NFS server. - In the NFS client, move the privilege check from nfslockdans(), which occurs every time a write is performed on /dev/nfslock, and instead do it in nfslock_open() just once. This allows us to avoid checking the saved uid for root, and just use the effective on open. Use PRIV_NFS_LOCKD.
167830	23-Mar-2007	delphij	Don't destroy a mutex just before we use it, instead, destroy it after we have used it.
167497	13-Mar-2007	tegge	Make insmntque() externally visibile and allow it to fail (e.g. during late stages of unmount). On failure, the vnode is recycled. Add insmntque1(), to allow for file system specific cleanup when recycling vnode on failure. Change getnewvnode() to no longer call insmntque(). Previously, embryonic vnodes were put onto the list of vnode belonging to a file system, which is unsafe for a file system marked MPSAFE. Change vfs_hash_insert() to no longer lock the vnode. The caller now has that responsibility. Change most file systems to lock the vnode and call insmntque() or insmntque1() after a new vnode has been sufficiently setup. Handle failed insmntque*() calls by propagating errors to callers, possibly after some file system specific cleanup. Approved by: re (kensmith) Reviewed by: kib In collaboration with: kib
167353	09-Mar-2007	mohans	Back out a chance to nfs_timer() that inadvertantly crept in the last checkin :(
167352	09-Mar-2007	mohans	Over NFS, an open() call could result in multiple over-the-wire GETATTRs being generated - one from lookup()/namei() and the other from nfs_open() (for cto consistency). This change eliminates the GETATTR in nfs_open() if an otw GETATTR was done from the namei() path. Instead of extending the vop interface, we timestamp each attr load, and use this to detect whether a GETATTR was done from namei() for this syscall. Introduces a thread-local variable that counts the syscalls made by the thread and uses <pid, tid, thread syscalls> as the attrload timestamp. Thanks to jhb@ and peter@ for a discussion on thread state that could be used as the timestamp with minimal overhead.
167086	27-Feb-2007	jhb	Use pause() rather than tsleep() on stack variables and function pointers.
166782	16-Feb-2007	mohans	Backing out an earlier change. It seems harmless for NFS to miss the "force unmount" flag, making the acquisition of the MNT_ILOCK in nfs_request() and nfs_sigintr() unnecessary. Pointed out by tegge@.
166636	11-Feb-2007	mohans	Add missing MNT_ILOCK around some mnt_kern_flag accesses.
166378	31-Jan-2007	mohans	Fix for a vnode lock leak in nfs_create() in the event of an error. Spotted by ups@.
166338	30-Jan-2007	kris	Instead of always hard-coding the socket type for the nfs root mount as SOCK_DGRAM (i.e. UDP), respect the value configured earlier. This allows TCP NFS root mounts using e.g. the boot.nfsroot.options="tcp" tunable. In this case some of the connection parameters like the retry timer were previously set appropriately for TCP but inappropriately for the UDP socket that was actually used, leading to e.g. extremely long recovery times (O(hours)) after a nfs server reboot. Reviewed by: mohans MFC After: 2 weeks
166218	25-Jan-2007	bde	Unstaticize nfs_iosize() in nfsclient and use it in nfs4client instead of duplicating it except for larger style bugs in the copy. Fix some nearby style bugs (including a harmless type mismatch) in and near the remaining copy. This is part of fixing collisions of the 2 nfs*client's names. Even static names should have a unique prefixes so that they can be debugged easily.
166193	23-Jan-2007	kib	Cylinder group bitmaps and blocks containing inode for a snapshot file are after snaplock, while other ffs device buffers are before snaplock in global lock order. By itself, this could cause deadlock when bdwrite() tries to flush dirty buffers on snapshotted ffs. If, during the flush, COW activity for snapshot needs to allocate block and ffs_alloccg() selects the cylinder group that is being written by bdwrite(), then kernel would panic due to recursive buffer lock acquision. Avoid dealing with buffers in bdwrite() that are from other side of snaplock divisor in the lock order then the buffer being written. Add new BOP, bop_bdwrite(), to do dirty buffer flushing for same vnode in the bdwrite(). Default implementation, bufbdflush(), refactors the code from bdwrite(). For ffs device buffers, specialized implementation is used. Reviewed by: tegge, jeff, Russell Cattelan (cattelan xfs org, xfs changes) Tested by: Peter Holm X-MFC after: 3 weeks (if ever: it changes ABI)
165104	11-Dec-2006	mohans	NetApp filers return corrupt post op attrs in the wcc on NFS error responses. This is easy to reproduce for EROFS. I am not sure if the attrs can be corrupt for other NFS error responses. For now, disabling wcc pre-op attr checks and post-op attr loads on NFS errors (sysctl'ed). Reported by: Kris Kennaway
164934	06-Dec-2006	sam	consolidate parsing of nfs root mount options in one place and handle all options (some may require fixes elsewhere) Reviewed by: jhb, mohans MFC after: 1 month
164735	29-Nov-2006	mohans	In nfs_nget(), we must initialize the fh in the nfsnode before inserting the vnode into the vfs hash. Otherwise, another thread walking the hash can trip on an nfsnode with an uninitialized or partially initialized fh. Thanks to ups@ for spotting this race.
164701	27-Nov-2006	mohans	bde@ pointed out that tprintf() acquires Giant so callers of tprintf() don't have to explicitly acquire Giant (although they need to be aware of this and not hold any locks at that point). Remove the acquisitions of Giant in the NFS client wrapping tprintf().
164684	27-Nov-2006	mohans	Fix for a bug caused by a race when 2 threads lookup the same file. Leave the loser's lock(s) initialized, so the reclaim logic can unconditionally destroy them when that race occurs (or if the vfs hash insert happened to fail for some other reason). Thanks to ups@ for a careful review of the code. Reported by : Kris Kennaway
164430	20-Nov-2006	mohans	1) Fix up locking in nfs_up() and nfs_down. 2) Reduce the acquisitions of the Giant lock in the nfs_socket.c paths significantly. - We don't need to acquire Giant before tsleeping on lbolt anymore, since jhb specialcased lbolt handling in msleep. - nfs_up() needs to acquire Giant only if printing the "server up" message. - nfs_timer() held Giant for the duration of the NFS timer processing, just because the printing of the message in nfs_down() needed it (and we acquire other locks in nfs_timer()). The acquisition of Giant is moved down into nfs_down() now, reducing the time Giant is held in that path. Reported by: Kris Kennaway
164346	16-Nov-2006	mohans	vfs_hash_insert() vputs() the losing vnode before returning, in the event of a race where a duplicate vnode is entered into the vfs hash. nfs_nget() shouldn't be releasing the vnode in that case.
164345	16-Nov-2006	mohans	Fix to readdir+ reply handling. When inserting an entry into the namecache, initialize the nfsnode's ctime. Otherwise a subsequent lookup purges the just entered namecache entry.
164063	07-Nov-2006	sam	honor nolockd flag in root mount options MFC after: 2 weeks
163830	31-Oct-2006	mohans	Make EWOULDBLOCK a recoverable error so that the request is retransmitted. This bug results in data corruption with NFS/TCP. Writes are silently dropped on EWOULDBLOCK (because socket send buffer is full and sockbuf timer fires). Reviewed by: ups@
163471	17-Oct-2006	bde	Fixed some style bugs (especially ones involving long lines and use of __P(())). There are many more.
163341	14-Oct-2006	bde	Don't do null Setattr RPCs for VA_MARK_ATIME. When we added the VA_MARK_ATIME feature to fix POSIX conformance fore execve() and mmap(), we thought that it was optimized well enough for the one file system that supports it (ffs) and harmless for other file systems (except layered ones which already get the layering for VOP_SETATTR() wrong). However, nfs_setattr() doesn't do much parameter checking, so when it gets a combination of parameters that it doesn't understand, it always does a Setattr RPC. This RPC can't do anything good, and for VA_MARK_ATIME it is null except for wasting a lot of time. This is the smallest and easiest to fix of several bugs that have increased the number of RPCs for kernel builds on nfs by more than 100% since 2004-11-05. The real-time increase depends on network latency and parallelization and can also be very large (approaching the same percentage for unparallelized operations like "make depend" on systems with fast CPUs and high-latency networks).
162954	02-Oct-2006	phk	First part of a little cleanup in the calendar/timezone/RTC handling. Move relevant variables to <sys/clock.h> and fix #includes as necessary. Use libkern's much more time- & spamce-efficient BCD routines.
162649	26-Sep-2006	tegge	Add mnt_noasync counter to better handle interleaved calls to nmount(), sync() and sync_fsync() without losing MNT_ASYNC. Add MNTK_ASYNC flag which is set only when MNT_ASYNC is set and mnt_noasync is zero, and check that flag instead of MNT_ASYNC before initiating async io.
162647	26-Sep-2006	tegge	Use mount interlock to protect all changes to mnt_flag and mnt_kern_flag. This eliminates a race where MNT_UPDATE flag could be lost when nmount() raced against sync(), sync_fsync() or quotactl().
162288	13-Sep-2006	mohans	Fixes up the handling of shared vnode lock lookups in the NFS client, adds a FS type specific flag indicating that the FS supports shared vnode lock lookups, adds some logic in vfs_lookup.c to test this flag and set lock flags appropriately. - amd on 6.x is a non-starter (without this change). Using amd under heavy load results in a deadlock (with cascading vnode locks all the way to the root) very quickly. - This change should also fix the more general problem of cascading vnode deadlocks when an NFS server goes down. Ideally, we wouldn't need these changes, as enabling shared vnode lock lookups globally would work. Unfortunately, UFS, for example isn't ready for shared vnode lock lookups, crashing pretty quickly. This change is the result of discussions with Stephan Uphoff (ups@). Reviewed by: ups@
161722	29-Aug-2006	mohans	Fix for a deadlock triggered by a 'umount -f' causing a NFS request to never retransmit (or return). Thanks to John Baldwin for helping nail this one. Found by : Kris Kennaway
161371	16-Aug-2006	thomas	Fix typos in comment.
161125	09-Aug-2006	alc	Introduce a field to struct vm_page for storing flags that are synchronized by the lock on the object containing the page. Transition PG_WANTED and PG_SWAPINPROG to use the new field, eliminating the need for holding the page queues lock when setting or clearing these flags. Rename PG_WANTED and PG_SWAPINPROG to VPO_WANTED and VPO_SWAPINPROG, respectively. Eliminate the assertion that the page queues lock is held in vm_page_io_finish(). Eliminate the acquisition and release of the page queues lock around calls to vm_page_io_finish() in kern_sendfile() and vfs_unbusy_pages().
161109	09-Aug-2006	brooks	Add a new kernel environment variable "boot.netif.mtu" which is used to set the MTU prior to mounting root via NFS. This is required if the server supports a higher than default MTU because the client will not see the responses otherwise. MFC after: 3 weeks
160619	24-Jul-2006	rwatson	soreceive_generic(), and sopoll_generic(). Add new functions sosend(), soreceive(), and sopoll(), which are wrappers for pru_sosend, pru_soreceive, and pru_sopoll, and are now used univerally by socket consumers rather than either directly invoking the old so*() functions or directly invoking the protocol switch method (about an even split prior to this commit). This completes an architectural change that was begun in 1996 to permit protocols to provide substitute implementations, as now used by UDP. Consumers now uniformly invoke sosend(), soreceive(), and sopoll() to perform these operations on sockets -- in particular, distributed file systems and socket system calls. Architectural head nod: sam, gnn, wollman
160182	08-Jul-2006	kib	Signals may be delivered to process as well as to the thread. Check the thread-delivered signals in addition to the process one. Reviewed by: mohan MFC after: 1 month Approved by: kan (mentor)
160181	08-Jul-2006	kib	Always supply curthread as argument to nfs_asyncio and nfs_doio in nfs_strategy. Otherwise, for some buffers, signals would be ignored at the intr mounts. Reviewed by: mohan MFC after: 1 month Approved by: kan (mentor)
160038	29-Jun-2006	yar	There is a consensus that ifaddr.ifa_addr should never be NULL, except in places dealing with ifaddr creation or destruction; and in such special places incomplete ifaddrs should never be linked to system-wide data structures. Therefore we can eliminate all the superfluous checks for "ifa->ifa_addr != NULL" and get ready to the system crashing honestly instead of masking possible bugs. Suggested by: glebius, jhb, ru
160029	29-Jun-2006	yar	Use the elegant TAILQ_FOREACH() in place of a hand-rolled for() loop.
159081	30-May-2006	mohans	Kris Kennaway found that for '/' NFS mounts, the MPSAFE mount flag was not being set, which means Giant would be acquired for these mounts.
158965	26-May-2006	mohans	Fix for a potential attempt to sleep while holding nm_mtx. Caught and reported by Witness (which forces the mbuf allocation flag to M_NOWAIT). Reported by: "sekes".
158915	25-May-2006	ups	Call vm_object_page_clean() with the object lock held. Submitted by: kensmith@ Reviewed by: mohans@ MFC after: 6 days
158906	25-May-2006	ups	Do not set B_NOCACHE on buffers when releasing them in flushbuflist(). If B_NOCACHE is set the pages of vm backed buffers will be invalidated. However clean buffers can be backed by dirty VM pages so invalidating them can lead to data loss. Add support for flush dirty page in the data invalidation function of some network file systems. This fixes data losses during vnode recycling (and other code paths using invalbuf(,V_SAVE,,*)) for data written using an mmaped file. Collaborative effort by: jhb@,mohans@,peter@,ps@,ups@ Reviewed by: tegge@ MFC after: 7 days
158905	24-May-2006	mohans	Since NFSv4 is not SMP safe, nfsiod needs to acquire Giant for NFSv4 mounts before doing the read/write. Reported by: Chuck Lever.
158903	24-May-2006	rwatson	Adjust minimum iod threads from 4 to 0 -- since we compile the NFS client into the kernel by default, and many users won't use NFS, don't start an extra 4 kernel threads that are unused. Once NFS becomes active, it will start nfsiod's as it needs them. We might consider mandating a minimum iod's equal to the number of active NFS mounts (truncated to some value), which would force some to remain available without having to create a new one if the file system is mostly inactive. PR: 70880 MFC after: 2 weeks Prodded by: cel Head nod: peter Pointed out by: Joe <fbsd_user at a1poweruser dot com>
158860	23-May-2006	cel	NFS over TCP retransmit behavior should default to a 60 second time out, mimicing the NFS reference implementation. NFS over TCP does not need fast retransmit timeouts, since network loss and congestion are managed by the transport (TCP), unlike with NFS over UDP. A long timeout prevents the unnecessary retransmission of non- idempotent NFS requests. Reviewed by: mohans, silby, rees? Sponsored by: Network Appliance, Incorporated
158859	23-May-2006	cel	Refactor the NFS over UDP retransmit timeout estimation logic to allow the estimator to be more easily tuned and maintained. There should be no functional change except there is now a lower limit on the retransmit timeout to prevent the client from retransmitting faster than the server's disks can fill requests, and an upper limit to prevent the estimator from taking to long to retransmit during a server outage. Reviewed by: mohan, kris, silby Sponsored by: Network Appliance, Incorporated
158855	23-May-2006	mohans	Vnode locks are recursive and the NFS client support shared vnode locks. Found by: Kris Kennaway.
158739	19-May-2006	mohans	Changes to make the NFS client MP safe. Thanks to Kris Kennaway for testing and sending lots of bugs my way.
158316	05-May-2006	mohans	Fix a snafu caused while patching the previous fix from another branch.
158315	05-May-2006	mohans	Fix for a NFS/TCP client bug which would cause the NFS/TCP stream to get out of sync under heavy loads, forcing frequent reconnets, causing EBADRPC errors etc.
157557	06-Apr-2006	mohans	Keep track of the number of in-progress async direct IO writes in the nfsnode. Make fsync/close wait until all of these drain. Add a check to nfs_getpage() and nfs_putpage().
157349	01-Apr-2006	jeff	- Busy the filesystem in nfs_statfs to prevent us from creating a new vnode after vflush() has succeeded. This would cause a dangling vnode panic at unmount time otherwise. Other filesystems may have this problem via their VFS_VGET() routines. Found by: kris Sponsored by: Isilon Systems, Inc.
157058	23-Mar-2006	kris	Fix a bug in the NFS/TCP retransmission path. The bug was that earlier, if a request was retransmitted, we would do subsequent retransmits every 10 msecs. This can cause data corruption under moderate loads by reordering operations as seen by the client NFS attribute cache, and on the server side when the retransmission occurs after the original request has left the duplicate cache, since the operation will be committed for a second time. Further work on retransmission handling is needed (e.g. they are still being done sent too often since they are scaled by HZ, and the size of the dup cache is too small and easily overwhelmed on busy servers). Submitted by: mohans
156879	19-Mar-2006	pjd	Actually I wanted 'nolockd' here instead of 'lockd'. MFC after: 2 days
156825	17-Mar-2006	cel	If an NFS server returns more than a few EJUKEBOX errors for a given RPC request, the FreeBSD NFS client will quickly back off to a excessively long wait (days, then weeks) before retrying the request. Change the behavior of the FreeBSD NFS client to match the behavior of the reference NFS client implementation (Solaris). This provides a fixed delay of 10 seconds between each retry by default. A sysctl, called nfs3_jukebox_delay, is now available to tune the delay. Unlike Solaris, the sysctl value on FreeBSD is in seconds, rather than in HZ. Sponsored by: Network Appliance, Incorporated Reviewed by: rick Approved by: silby MFC after: 3 days
156416	08-Mar-2006	cel	Fix a bug in NFSv3 READDIRPLUS reply processing The client's READDIRPLUS logic skips the attributes and filehandle of the ".." entry. If the server doesn't send attributes but does send a filehandle for "..", the client's logic doesn't account for the extra "value follows" field that indicates whether the filehandle is present, causing the remaining entries in the reply to be ignored. Sponsored by: Network Appliance, Inc. Reviewed by: rick, mohans Approved by: silby MFC after: 2 weeks
154580	20-Jan-2006	rees	Don't log an error on tcp connection reset, even if we don't get ECONNRESET. Submitted by: cel@citi.umich.edu
154487	17-Jan-2006	alfred	I ran into an nfs client panic a couple of times in a row over the last few days. I tracked it down to the fact that nfs_reclaim() is setting vp->v_data to NULL _before_ calling vnode_destroy_object(). After silence from the mailing list I checked further and discovered that ufs_reclaim() is unique among FreeBSD filesystems for calling vnode_destroy_object() early, long before tossing v_data or much of anything else, for that matter. The rest, including NFS, appear to be identical, as if they were just clones of one original routine. The enclosed patch fixes all file systems in essentially the same way, by moving the call to vnode_destroy_object() to early in the routine (before the call to vfs_hash_remove(), if any). I have only tested NFS, but I've now run for over eighteen hours with the patch where I wouldn't get past four or five without it. Submitted by: Frank Mayhar Requested by: Mohan Srinivasan MFC After: 1 week
154316	13-Jan-2006	rwatson	In nfs_dolock(), GC now under-used ioflg, rendered obsolete when we moved from using a fifo to talk to rpc.lockd to using a special device node. Noticed by: Coverity Prevent analysis tool MFC after: 3 days
154152	09-Jan-2006	tegge	Add marker vnodes to ensure that all vnodes associated with the mount point are iterated over when using MNT_VNODE_FOREACH. Reviewed by: truckman
153786	28-Dec-2005	delphij	Correct a typo
153365	12-Dec-2005	ps	Improve upon rev 1.133 where NFS/TCP would not reconnect. Submitted by: Mohan Srinivasan
152920	29-Nov-2005	ru	Unexpand LLADDR().
152657	21-Nov-2005	ps	Fix for a bug where NFS/TCP would not reconnect (in the case where the server FIN'ed). Seen with Solaris NFS servers. Reported by: TOMITA Yoshinori <yoshint@flab.fujitsu.co.jp> Submitted by: Mohan Strinivasan
152656	21-Nov-2005	ps	- Always return success from NFS strategy. nfs_doio(), in the event of an error, does the right thing, in terms of setting the error flags in the buf header. That fixes a crash from bstrategy(). - Treat ETIMEDOUT as a "recoverable" error, causing the buffer to be re-dirtied. ETIMEDOUT can occur on soft mounts, when the number of retries are exceeded, and we don't want data loss in that case. Submitted by: Mohan Srinivasan
152652	21-Nov-2005	rees	fix a problem with XID re-use when a server returns NFSERR_JUKEBOX. Submitted by: cel@citi.umich.edu Fixed by: rick@snowhite.cis.uoguelph.ca Approved by: alfred MFC after: 3 weeks
152289	10-Nov-2005	jon	fix a crash when an nfsv2 mount fails MFC after: 1 week
152019	03-Nov-2005	ps	Fix for a crash (from nfs_lookup() in an error case). Submitted by: Mohan Srinivasan
152000	03-Nov-2005	ps	In nfs_flush(), clear the NMODIFIED bit only if there are no dirty buffers and there are no buffers queued up for writing. The bug was that NMODIFIED was being cleared even while there were buffers scheduled to be written out, which leads to all sorts of interesting bugs - one where the file could shrink (because of a post-op getattr load, say) causing data in buffer(s) queued for write to be tossed, resulting in data corruption. Submitted by: Mohan Srinivasan
151998	03-Nov-2005	ps	Fix for a race between the thread transmitting the request and the thread processing the reply. Submitted by: Mohan Srinivasan
151897	31-Oct-2005	rwatson	Normalize a significant number of kernel malloc type names: - Prefer '_' to ' ', as it results in more easily parsed results in memory monitoring tools such as vmstat. - Remove punctuation that is incompatible with using memory type names as file names, such as '/' characters. - Disambiguate some collisions by adding subsystem prefixes to some memory types. - Generally prefer lower case to upper case. - If the same type is defined in multiple architecture directories, attempt to use the same name in additional cases. Not all instances were caught in this change, so more work is required to finish this conversion. Similar changes are required for UMA zone names.
151695	26-Oct-2005	glebius	- Fix leak of struct nlminfo on process exit. - Fix malloc type collision, that made the above problem difficult to understand. Reported by: Vladimir Sharun <sharun ukr.net>
151024	06-Oct-2005	pjd	- Use strsep() instead of strtok(). - strdup() uses M_WAITOK, so we don't need to check it's return value against NULL. MFC after: 2 weeks
150995	06-Oct-2005	pjd	Add boot.nfsroot.options loader tunable. It allows to specify options for NFS root file system. Currently supported options are: soft, intr, conn, lockd. I'm adding this functionality mostly for 'lockd' option, which is only honored when performing the initial mount and will be silently ignored if used while updating the mount options. This will allow to use flock(2) without the need of using varmfs or rpc.lockd and friends. Example of use: boot.nfsroot.options="intr,lockd" MFC after: 2 weeks
150335	19-Sep-2005	rwatson	Add GIANT_REQUIRED and WITNESS sleep warnings to uprintf() and tprintf(), as they both interact with the tty code (!MPSAFE) and may sleep if the tty buffer is full (per comment). Modify all consumers of uprintf() and tprintf() to hold Giant around calls into these functions. In most cases, this means adding an acquisition of Giant immediately around the function. In some cases (nfs_timer()), it means acquiring Giant higher up in the callout. With these changes, UFS no longer panics on SMP when either blocks are exhausted or inodes are exhausted under load due to races in the tty code when running without Giant. NB: Some reduction in calls to uprintf() in the svr4 code is probably desirable. NB: In the case of nfs_timer(), calling uprintf() while holding a mutex, or even in a callout at all, is a bad idea, and will generate warnings and potential upset. This needs to be fixed, but was a problem before this change. NB: uprintf()/tprintf() sleeping is generally a bad ideas, as is having non-MPSAFE tty code. MFC after: 1 week
148448	27-Jul-2005	ps	FIx for a bug in the change that made nfs_timer() MPSAFE. We need to grab Giant before calling pru_send() (if running with mpsafenet = 0). Found by: Jeremie Le Hen. Fixed by: Maxime Henrion
148447	27-Jul-2005	ps	In nfs_nget() if two threads race on the same filehandle, the loser should cause the nfsnode to get freed. This fixes a potential vnode (and nfsnode) leak in that path. Submitted by: Mohan Srinivasan Reviewed by: phk
148268	21-Jul-2005	ps	Remove the NFS client rslock. The rslock was used to serialize writers that want to extend the file. It was also used to serialize readers that might want to read the last block of the file (with a writer extending the file). Now that we support vnode locking for NFS, the rslock is unnecessary. Writers grab the exclusive vnode lock before writing and readers grab the shared (or in some cases the exclusive) lock. Submitted by: Mohan Srinivasan
148162	19-Jul-2005	ps	Make nfs_timer() MPSAFE. With this change, the bottom half of the NFS client (the interface with the protocol stack and callouts) is Giant-free. Submitted by: Mohan Srinivasan.
148111	18-Jul-2005	ps	Fix for a NFS soft mounts bug where if the number of retries exceeds the max rexmits, the request was not being bounced back with a ETIMEDOUT error. Reported by: Oliver Lehmann Submitted by: Mohan Srinivasan
148008	14-Jul-2005	ps	Fixes for NFS crashes on architectures that require strict alignment. - Fix nfsm_disct() so that after pulling up data, the remaining data is aligned if necessary. - Fix nfs_clnt_tcp_soupcall() to bcopy() the rpc length out of the mbuf (instead of casting m_data to a uint32). Submitted by: Pyun YongHyeon Reviewed by: Mohan Srinivasan
147420	16-Jun-2005	green	Ifdef out the incomplete non-blocking IO implementation for NFS pending discussion of how implementation would proceed. Applications like -lc_r expect select(3) to match the EAGAIN-status of IO functions. Approved by: re
147280	10-Jun-2005	green	Fix a serious deadlock with the NFS client. Given a large enough atomic write request, it can fill the buffer cache with the entirety of that write in order to handle retries. However, it never drops the vnode lock, or else it wouldn't be atomic, so it ends up waiting indefinitely for more buf memory that cannot be gotten as it has it all, and it waits in an uncancellable state. To fix this, hibufspace is exported and scaled to a reasonable fraction. This is used as the limit of how much of an atomic write request by the NFS client will be handled asynchronously. If the request is larger than this, it will be turned into a synchronous request which won't deadlock the system. It's possible this value is far off from what is required by some, so it shall be tunable as soon as mount_nfs(8) learns of the new field. The slowdown between an asynchronous and a synchronous write on NFS appears to be on the order of 2x-4x. General nod by: gad MFC after: 2 weeks More testing: wes PR: kern/79208
146333	17-May-2005	des	Ugh. Previous commit got the logic exactly backward. Submitted by: bland Pointy hat to: des
146316	17-May-2005	des	Revision 1.173 broke updating a mount from ro to rw. Fix that by clearing the MNT_RDONLY flag if MNT_UPDATE is set and "ro" was not specified. Suggested by: cognet
146065	10-May-2005	rees	set R_MUSTRESEND flag in mark_for_reconnect so re-connected requests get re-sent instead of timing out. don't log an error message on reconnection, which is not an error. remove unused nfs_mrep_before_tsleep. Reviewed by: Mohan Srinivasan Approved by: alfred
145879	04-May-2005	ps	Fix a bug in NFS/TCP where retransmissions would not reliably happen if the server rebooted or tore down the connection for any reason. Found by: Jonathan Noack. Submitted by: Mohan Srinivasan.
145806	02-May-2005	iedowse	Don't copy the NFSMNT_* flags into struct statfs's f_flags field, as they have no connection with the expected MNT_* flags. This bug was exposed 18 months ago when the assignments to f_flags in vfs_syscalls.c were moved to before the VFS_STATFS() call. It was fixed in the CSRG source 10 years ago, but we never picked up that change. PR: kern/80390 MFC after: 1 week
145598	27-Apr-2005	des	When NFS was converted to the new mount syscall, code was written that sets the MNT_RDONLY flag if the "ro" option was passed in from userland, and clears it otherwise. In the diskless case, the MNT_RDONLY flag is already set when this code is reached, but there are no mount options, so it was incorrectly cleared. Change the logic so the MNT_RDONLY flag is set if the "ro" option was specified, and left alone otherwise. Note that the NFS code will still happily let you mount a filesystem RW even if the server exports it RO. I'm not sure how to fix that.
145572	26-Apr-2005	des	While I'm here, list the new kenv (boot.netif.name) along with the others.
145570	26-Apr-2005	des	When netbooting, as soon as we've figured out which interface we booted from, store its name in a kenv variable.
145235	18-Apr-2005	rees	TCP reconnect is not an error. Change the message from LOG_ERR to LOG_INFO. Approved by: alfred
145060	14-Apr-2005	jeff	- cache_lookup() relocks the parent in the DOTDOT case for us. Spotted by: phk Sponsored by: Isilon Systems, Inc.
145006	13-Apr-2005	jeff	- Change all filesystems and vfs_cache to relock the dvp once the child is locked in the ISDOTDOT case. Se vfs_lookup.c r1.79 for details. Sponsored by: Isilon Systems, Inc.
144367	31-Mar-2005	jeff	- LK_NOPAUSE is a nop now. Sponsored by: Isilon Systems, Inc.
144299	29-Mar-2005	jeff	- Remove wantparent, it is no longer necessary. An assert in vfs_lookup.c prevents any callers from doing a modifying op without LOCKPARENT or WANTPARENT.
144297	29-Mar-2005	jeff	- cache_lookup() now locks the new vnode for us to prevent some races. Remove redundant code. Sponsored by: Isilon Systems, Inc.
144206	28-Mar-2005	jeff	- We no longer have to bother with PDIRUNLOCK, lookup() handles it for us. - Network filesystems are written with a special idiom that checks the cache first, and may even unlock dvp before discovering that a network round-trip is required to resolve the name. I believe dvp is prevented from being recycled even in the forced unmount case by the shared lock on the mount point. If not, this code should grow checks for VI_DOOMED after it relocks dvp or it will access NULL v_data fields. Sponsored by: Isilon Systems, Inc.
144059	24-Mar-2005	jeff	- Update vfs_root implementations to match the new prototype. None of these filesystems will support shared locks until they are explicitly modified to do so. Careful review must be done to ensure that this is safe for each individual filesystem. Sponsored by: Isilon Systems, Inc.
144040	23-Mar-2005	ps	- The NFS client was incorrectly masking SIGSTOP (which is non-maskable). - The NFS client needs to guard against spurious wakeups while waiting for the response. ltrace causes the process under question to wakeup (possibly from ptrace()), which causes NFS to wakeup from tsleep without the response being delivered. Submitted by: Mohan Srinivasan
143822	18-Mar-2005	das	Don't brelse(bp) if bp is null. Also, eliminate some redundancy and dead code. Found by: Coverity Prevent analysis tool
143693	16-Mar-2005	phk	Use vfs_hash.
143687	16-Mar-2005	jmg	MFp4: use the function to fix the packet header length instead of rolling our own...
143511	13-Mar-2005	jeff	- VOP_INACTIVE should no longer drop the vnode lock. Sponsored by: Isilon Systems, Inc.
143510	13-Mar-2005	jeff	- The VI_DOOMED flag now signals the end of a vnode's relationship with the filesystem. Check that rather than VI_XLOCK. Sponsored by: Isilon Systems, Inc.
143508	13-Mar-2005	jeff	- It is no longer necessary to lock and unlock the vnode in nfs_close() as the top level does this for us now. Sponsored by: Isilon Systems, Inc.
142568	26-Feb-2005	ps	Minor cleanup in nfs_request() and removal of a comment that doesn't reflect reality. Submitted by: Mohan Srinivasan
142233	22-Feb-2005	phk	vp->v_id is a private field for the vfs namecache and it is a big mistake that NFS ever started using it. Long time ago I added the necessary vhold()/vdrop() calls to replace it, but forgot to remove the v_id code. Do it now.
142079	19-Feb-2005	phk	Try to unbreak the vnode locking around vop_reclaim() (based mostly on patch from kan@). Pull bufobj_invalbuf() out of vinvalbuf() and make g_vfs call it on close. This is not yet a generally safe function, but for this very specific use it is safe. This solves the problem with buffers not being flushed by unmount or after failed mount attempts.
142070	18-Feb-2005	ps	Fix for a potential NFS client race where shared data is updated from base context as well as the socket callback. Submitted by: Mohan Srinivasan
141465	07-Feb-2005	jhb	Drop Giant before calling kthread_exit().
140996	29-Jan-2005	rwatson	Style cleanup for O_DIRECT sysctl comment introduced in nfs_vnops.c:1.242.
140939	28-Jan-2005	phk	Make filesystems get rid of their own vnodes vnode_pager object in VOP_RECLAIM().
140777	24-Jan-2005	phk	Create a vnode_pager object when a file is opened.
140731	24-Jan-2005	phk	Remove unused cred arg from nfs_vinvalbuf() and many bogus arguments passed for it.
140460	18-Jan-2005	peter	Mostly back out rev 1.33 from quite some time ago, and the followup fixes and tweaks. The code was actually quite broken because it discarded the upper bits of the 64 bit division. We only had a 50% chance of scaling up the blocksize for large NFS client mounts when it was needed. For 5.x and beyond, this was harmless because we could represent the result in either case. For 4.x this was a big problem though. (4.x also has a df(1) bug to compound the problem)
140220	14-Jan-2005	phk	Eliminate unused and unnecessary "cred" argument from vinvalbuf()
140122	12-Jan-2005	brian	Include opt_bootp.h for BOOTP_NFSROOT PR: 73183 Submitted by: Darrin Smith sdar at salseast dot org MFC after: 7 days
140056	11-Jan-2005	phk	Add BO_SYNC() and add a default which uses the secret vnode pointer and VOP_FSYNC() for now.
140048	11-Jan-2005	phk	Remove the unused credential argument from VOP_FSYNC() and VFS_SYNC(). I'm not sure why a credential was added to these in the first place, it is not used anywhere and it doesn't make much sense: The credentials for syncing a file (ability to write to the file) should be checked at the system call level. Credentials for syncing one or more filesystems ("none") should be checked at the system call level as well. If the filesystem implementation needs a particular credential to carry out the syncing it would logically have to the cached mount credential, or a credential cached along with any delayed write data. Discussed with: rwatson
139823	07-Jan-2005	imp	/* -> /*- for license, minor formatting changes
139744	05-Jan-2005	ps	If the NFS/TCP stream is out of sync between the client and server, and if the client (erroneously) reads the RPC length as 0 bytes, the client can loop around in the socket callback. Explicitly check for the length being 0 case and teardown/re-connect. Submitted by: Mohan Srinivasan
139247	23-Dec-2004	ps	Turn NFS directio off until the stability issues are resolved.
138919	16-Dec-2004	ps	Change the NFS sillyrename convention so that we won't run out of sillyrenames (which were limited to 58 per pid per directory, for no good reason). The new format of sillyrenames looks like .nfs.0000b31a.00d24.4 ^^^^^^^^ ^^^^^ ticks pid Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com Obtained from: Yahoo!
138899	15-Dec-2004	ps	First cut of NFS direct IO support. - NFS direct IO completely bypasses the buffer and page caches. If a file is open for direct IO all caching is disabled. - Direct IO for Directories will be addressed later. - 2 new NFS directio related sysctls are added. One is a knob to disable NFS direct IO completely (direct IO is enabled by default). The other is to disallow mmaped IO on a file that has at least one O_DIRECT open (see the comment in nfs_vnops.c for more details). The default is to allow mmaps on a file that has O_DIRECT opens. Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com Obtained from: Yahoo!
138694	11-Dec-2004	marcel	Revert rev 1.233. The null-pointer function call (a dereference on ia64) was not the result of a change in the vector operations. It was caused by the NFS locking code using a FIFO and those bypassing the vnode. This indirectly caused the panic. The NFS locking code has been changed. Requested by: phk
138645	10-Dec-2004	ps	In nfs_rename(), skip the otw rename operation if the fsync (to either src or dst) fails. This closes a potential data loss case (where the fsync failed with ENOSPC, for example). Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com Obtained from: Yahoo!
138644	10-Dec-2004	ps	Store a hint in the nfsnode to detect sequential access of the file. Kick off a readahead only when sequential access is detected. This eliminates wasteful readaheads in random file access. Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com Obtained from: Yahoo!
138529	07-Dec-2004	ps	Fix for a Lock Order Reversal in the nfs_flush() path, between the vnode interlock and the proc lock. Reported by: marcel Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com
138514	07-Dec-2004	phk	Don't clobber mnt_stat.f_mntonname
138509	07-Dec-2004	phk	The remaining part of nmount/omount/rootfs mount changes. I cannot sensibly split the conversion of the remaining three filesystems out from the root mounting changes, so in one go: cd9660: Convert to nmount. Add omount compat shims. Remove dedicated rootfs mounting code. Use vfs_mountedfrom() Rely on vfs_mount.c calling VFS_STATFS() nfs(client): Convert to nmount (the simple way, mount_nfs(8) is still necessary). Add omount compat shims. Drop COMPAT_PRELITE2 mount arg compatibility. ffs: Convert to nmount. Add omount compat shims. Remove dedicated rootfs mounting code. Use vfs_mountedfrom() Rely on vfs_mount.c calling VFS_STATFS() Remove vfs_omount() method, all filesystems are now converted. Remove MNTK_WANTRDWR, handling RO/RW conversions is a filesystem task, and they all do it now. Change rootmounting to use DEVFS trampoline: vfs_mount.c: Mount devfs on /. Devfs needs no 'from' so this is clean. symlink /dev to /. This makes it possible to lookup /dev/foo. Mount "real" root filesystem on /. Surgically move the devfs mountpoint from under the real root filesystem onto /dev in the real root filesystem. Remove now unnecessary getdiskbyname(). kern_init.c: Don't do devfs mounting and rootvnode assignment here, it was already handled by vfs_mount.c. Remove now unused bdevvp(), addaliasu() and addalias(). Put the few necessary lines in devfs where they belong. This eliminates the second-last source of bogo vnodes, leaving only the lemming-syncer. Remove rootdev variable, it doesn't give meaning in a global context and was not trustworth anyway. Correct information is provided by statfs(/).
138505	07-Dec-2004	ps	Always issue wakeups() to the NFS requestors under the mutex to close all potential cases of missed wakeups. Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com
138496	06-Dec-2004	ps	Rewrite of the NFS client's reply handling. We now have NFS socket upcalls which do RPC header parsing and match up the reply with the request. NFS calls now sleep on the nfsreq structure. This enables us to eliminate the NFS recvlock. Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com
138473	06-Dec-2004	ps	2 fixes that improve on the consistency of the NFS client cache. - Change the cached mtime to a 'struct timespec' from a time_t. Improving the precision of the cached mtime tightens up NFS' "close-to-open" consistency considerably. - Always force an over-the-wire consistency check from nfs_open() (unless the file is marked modified). This further improves NFS' "close-to-open" consistency. Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com
138469	06-Dec-2004	ps	Serialize NFS vinvalbuf operations by acquiring/upgrading to the vnode EXCLUSIVE lock. This prevents threads from adding pages to the vnode while an invalidation is in progress, closing potential races. In the bioread() path, callers acquire the SHARED vnode lock - so while an invalidate was in progress, it was possible to fault in new pages onto the vnode causing the invalidation to take a while or fail. We saw these races at Yahoo! with very large files+heavy concurrent access. Forcing an upgrade to EXCLUSIVE lock before doing the invalidation closes all these races. Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com
138463	06-Dec-2004	ps	Add non-blocking versions of nfsm_dissect() and friends, for use from socket callbacks or similar callers, from both the NFS client and the server. Instituted nfsm_dissect_nonblock(), nfsm_dissect_xx_nonblock(). And nfsm_disct() now takes an extra M_TRYWAIT/M_DONTWAIT argument. Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com
138460	06-Dec-2004	ps	- If all data has been committed to stable storage on the server, it is safe to turn off the nfsnode's NMODIFIED flag. - Move the check for signals to the top of the loop where we loop around the dirty buffers on the vnode, scheduling writes. This ensures that we'll break ouf of the flush operation on reception of a signal. Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com
138458	06-Dec-2004	rwatson	Correct a typo in a comment.
138430	06-Dec-2004	phk	For reasons unknown, the nfs locking code used a fifo to send requests to userland and a dedicated system call to get replies. The vnode-bypass of fifos broke this into a panic. Ditch all the magic and create a device /dev/nfslock instead, and use that for both directions apart from the shorter path, this is also faster because the device driver runs Giant free using the vnode bypass. Noticed by: marcel
138419	05-Dec-2004	rwatson	Convert GIANT_REQUIRED; in nfs_mountroot() to NET_ASSERT_GIANT(), and annotate that nfs_mountroot assumes it is OK to step on the values in the global NFSv3 diskless structure as the mountroot function is called during a serialized part of the boot, before any other NFS client activity occurs. MFC after: 2 weeks
138418	05-Dec-2004	rwatson	Convert a GIANT_REQUIRED; into a NET_ASSERT_GIANT();, as sockets are now only conditionally protected by Giant based on debug.mpsafenet.
138412	05-Dec-2004	phk	VFS_STATFS(mp, ...) is mostly called with &mp->mnt_stat, but a few cases doesn't. Most of the implementations have grown weeds for this so they copy some fields from mnt_stat if the passed argument isn't that. Fix this the cleaner way: Always call the implementation on mnt_stat and copy that in toto to the VFS_STATFS argument if different.
138411	05-Dec-2004	marcel	Fix null-pointer indirect function calls introduced in the previous commit. In the new world order, the transitive closure on the vector operations is not precomputed. As such, it's unsafe to actually use any of the function pointers in an indirect function call. They can be null, and we need to use the default vector in that case. This is mostly a quick fix for the four function pointers that are ed explicitly. A more generic or scalable solution is likely to see the light of day. No pathos on: current@
138290	01-Dec-2004	phk	Back when VOP_* was introduced, we did not have new-style struct initializations but we did have lofty goals and big ideals. Adjust to more contemporary circumstances and gain type checking. Replace the entire vop_t frobbing thing with properly typed structures. The only casualty is that we can not add a new VOP_ method with a loadable module. History has not given us reason to belive this would ever be feasible in the the first place. Eliminate in toto VOCALL(), vop_t, VNODEOP_SET() etc. Give coda correct prototypes and function definitions for all vop_()s. Generate a bit more data from the vnode_if.src file: a struct vop_vector and protype typedefs for all vop methods. Add a new vop_bypass() and make vop_default be a pointer to another struct vop_vector. Remove a lot of vfs_init since vop_vector is ready to use from the compiler. Cast various vop_mumble() to void * with uppercase name, for instance VOP_PANIC, VOP_NULL etc. Implement VCALL() by making vdesc_offset the offsetof() the relevant function pointer in vop_vector. This is disgusting but since the code is generated by a script comparatively safe. The alternative for nullfs etc. would be much worse. Fix up all vnode method vectors to remove casts so they become typesafe. (The bulk of this is generated by scripts)
138278	01-Dec-2004	phk	Remove redundant functions (repo-copied from nfsclient) for dealing with fifos.
138276	01-Dec-2004	phk	Scripted modification of vop_* prototypes to use typedefs.
138259	01-Dec-2004	phk	Add missing #include
138256	01-Dec-2004	ps	Fix for a race between lookup and readdirplus, that causes a deadlock (with NFS exclusive vnode locks enabled). Lookup grabs the parent's lock and wants to lock child. Readdirplus locks the child and wants to lock parent (for loading the attrs for ".."). The fix is to not load the attrs for ".." in readdirplus. Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com Reviewed by: rwatson
138255	01-Dec-2004	ps	Clean all dirty pages (dirtied by mmap'ed writes) in nfs_close(). This closes a major hole in close-to-open consistency support. Added a new sysctl so that this can be disabled for single NFS client applications with very large amounts of mmap'ed IO (for performance). Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com Reviewed by: rwatson
138254	01-Dec-2004	ps	Fix for a (blocks) underrun bug where negative values were being returned back to df from a statfs call. Causing df to print negative values. Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com Reviewed by: rwatson
138204	29-Nov-2004	ps	Fix for a bug in nfs_mkdir() that called vrele() instead of vput() in the error cases, causing panics. Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com Reviewed by: rwatson
137846	18-Nov-2004	jeff	- Eliminate the acquisition and release of the bqlock in bremfree() by setting the B_REMFREE flag in the buf. This is done to prevent lock order reversals with code that must call bremfree() with a local lock held. This also reduces overhead by removing two lock operations per buf for fsync() and similar. - Check for the B_REMFREE flag in brelse() and bqrelse() after the bqlock has been acquired so that we may remove ourself from the free-list. - Provide a bremfreef() function to immediately remove a buf from a free-list for use only by NFS. This is done because the nfsclient code overloads the b_freelist queue for its own async. io queue. - Simplify the numfreebuffers accounting by removing a switch statement that executed the same code in every possible case. - getnewbuf() can encounter locked bufs on free-lists once Giant is removed. Remove a panic associated with this condition and delay asserts that inspect the buf until after it is locked. Reviewed by: phk Sponsored by: Isilon Systems, Inc.
137480	09-Nov-2004	phk	Detect root mount attempts on the flag, not on the NULL path.
137197	04-Nov-2004	phk	Retire b_magic now, we have the bufobj containing the same hint.
136927	24-Oct-2004	phk	Move the buffer method vector (buf->b_op) to the bufobj. Extend it with a strategy method. Add bufstrategy() which do the usual VOP_SPECSTRATEGY/VOP_STRATEGY song and dance. Rename ibwrite to bufwrite(). Move the two NFS buf_ops to more sensible places, add bufstrategy to them. Add inlines for bwrite() and bstrategy() which calls through buf->b_bufobj->b_ops->b_{write,strategy}(). Replace almost all VOP_STRATEGY()/VOP_SPECSTRATEGY() calls with bstrategy().
136767	22-Oct-2004	phk	Add b_bufobj to struct buf which eventually will eliminate the need for b_vp. Initialize b_bufobj for all buffers. Make incore() and gbincore() take a bufobj instead of a vnode. Make inmem() local to vfs_bio.c Change a lot of VI_[UN]LOCK(bp->b_vp) to BO_[UN]LOCK(bp->b_bufobj) also VI_MTX() to BO_MTX(), Make buf_vlist_add() take a bufobj instead of a vnode. Eliminate other uses of bp->b_vp where bp->b_bufobj will do. Various minor polishing: remove "register", turn panic into KASSERT, use new function declarations, TAILQ_FOREACH_SAFE() etc.
136751	21-Oct-2004	phk	Move the VI_BWAIT flag into no bo_flag element of bufobj and call it BO_WWAIT Add bufobj_wref(), bufobj_wdrop() and bufobj_wwait() to handle the write count on a bufobj. Bufobj_wdrop() replaces vwakeup(). Use these functions all relevant places except in ffs_softdep.c where the use if interlocked_sleep() makes this impossible. Rename b_vnbufs to b_bobufs now that we touch all the relevant files anyway.
136517	14-Oct-2004	pjd	Add a missing newline character.
136006	01-Oct-2004	das	nfsclient/nfs_bio.c has a PHOLD() without a PRELE(). Neither should be necessary here. Also, use killproc() instead of psignal().
135874	28-Sep-2004	phk	Remove support for using NFS device nodes.
135862	27-Sep-2004	phk	Remove NFS4 vop method vector for devices: we are desupporing device nodes on anything but DEVFS and in this case it was not even used (see below). Put the NFS4 vop method for fifo's behind "#if 0" because it is unused. Add a XXX comment to say that I think the unusedness is a bug.
135860	27-Sep-2004	phk	style consistency.
135280	15-Sep-2004	phk	Remove unused B_WRITEINPROG flag
134898	07-Sep-2004	phk	Explicitly pass vnode to nfs_doio() and mountpoint to nfs_asyncio().
134281	25-Aug-2004	rwatson	In nfs_timer(), pass curthread rather than &thread0 into the protocol send routine. In IPv6 UDP, the thread will be passed to suser(), which asserts that if a thread is used for a super user check, it be curthread. Many of these protocol entry points probably need to accept credentials instead of threads. MT5 candidate. Noticed/tested by: kuriyama
132902	30-Jul-2004	phk	Put a version element in the VFS filesystem configuration structure and refuse initializing filesystems with a wrong version. This will aid maintenance activites on the 5-stable branch. s/vfs_mount/vfs_omount/ s/vfs_nmount/vfs_mount/ Name our filesystems mount function consistently. Eliminate the namiedata argument to both vfs_mount and vfs_omount. It was originally there to save stack space. A few places abused it to get hold of some credentials to pass around. Effectively it is unused. Reorganize the root filesystem selection code.
132808	28-Jul-2004	phk	Move a relic to its correct location(s): Put nfs diskless initialization calls with the code they call. (Yet another example of mindless copy&paste).
132805	28-Jul-2004	phk	Remove global variable rootdevs and rootvp, they are unused as such. Add local rootvp variables as needed. Remove checks for miniroot's in the swappartition. We never did that and most of the filesystems could never be used for that, but it had still been copy&pasted all over the place.
132640	25-Jul-2004	phk	Eliminate unused second argument to reassignbuf() and simplify it accordingly.
132084	13-Jul-2004	alfred	Turn off SO_REUSEADDR and SO_REUSEPORT, they were causing EADDRINUSE to be returned from the protocol stack. Pointy hat to me for not groking what those options _really_ mean.
132060	12-Jul-2004	dwmalone	Rename Alfred's kern_setsockopt to so_setsockopt, as this seems a a better name. I have a kern_[sg]etsockopt which I plan to commit shortly, but the arguments to these function will be quite different from so_setsockopt. Approved by: alfred
132023	12-Jul-2004	alfred	Make VFS_ROOT() and vflush() take a thread argument. This is to allow filesystems to decide based on the passed thread which vnode to return. Several filesystems used curthread, they now use the passed thread.
132018	12-Jul-2004	alfred	Use SO_REUSEADDR and SO_REUSEPORT when reconnecting NFS mounts. Tune the timeout from 5 seconds to 12 seconds. Provide a sysctl to show how many reconnects the NFS client has done. Seems to fix IPv6 from: kuriyama
131840	08-Jul-2004	brian	Change the following environment variables to kernel options: bootp -> BOOTP bootp.nfsroot -> BOOTP_NFSROOT bootp.nfsv3 -> BOOTP_NFSV3 bootp.compat -> BOOTP_COMPAT bootp.wired_to -> BOOTP_WIRED_TO - i.e. back out the previous commit. It's already possible to pxeboot(8) with a GENERIC kernel. Pointed out by: dwmalone
131814	08-Jul-2004	brian	Change the following kernel options to environment variables: BOOTP -> bootp BOOTP_NFSROOT -> bootp.nfsroot BOOTP_NFSV3 -> bootp.nfsv3 BOOTP_COMPAT -> bootp.compat BOOTP_WIRED_TO -> bootp.wired_to This lets you PXE boot with a GENERIC kernel by putting this sort of thing in loader.conf: bootp="YES" bootp.nfsroot="YES" bootp.nfsv3="YES" bootp.wired_to="bge1" or even setting the variables manually from the OK prompt.
131717	06-Jul-2004	rwatson	Acquire socket lock in nfs_connect() connection/sleep loop to protect socket state and avoid missed wakeups.
131697	06-Jul-2004	alfred	use vfs_suser() to restrict access to the nfs mount's timeout.
131694	06-Jul-2004	alfred	NFS mobility Phase VI: Export NFS mount state via sysctl. Export timeout via sysctl.
131691	06-Jul-2004	alfred	NFS mobility PHASE I, II & III (phase VI, and V pending): Rebind the client socket when we experience a timeout. This fixes the case where our IP changes for some reason. Signal a VFS event when NFS transitions from up to down and vice versa. Add a placeholder vfs_sysctl where we will put status reporting shortly. Also: Make down NFS mounts return EIO instead of EINTR when there is a soft timeout or force unmount in progress.
131551	04-Jul-2004	phk	When we traverse the vnodes on a mountpoint we need to look out for our cached 'next vnode' being removed from this mountpoint. If we find that it was recycled, we restart our traversal from the start of the list. Code to do that is in all local disk filesystems (and a few other places) and looks roughly like this: MNT_ILOCK(mp); loop: for (vp = TAILQ_FIRST(&mp...); (vp = nvp) != NULL; nvp = TAILQ_NEXT(vp,...)) { if (vp->v_mount != mp) goto loop; MNT_IUNLOCK(mp); ... MNT_ILOCK(mp); } MNT_IUNLOCK(mp); The code which takes vnodes off a mountpoint looks like this: MNT_ILOCK(vp->v_mount); ... TAILQ_REMOVE(&vp->v_mount->mnt_nvnodelist, vp, v_nmntvnodes); ... MNT_IUNLOCK(vp->v_mount); ... vp->v_mount = something; (Take a moment and try to spot the locking error before you read on.) On a SMP system, one CPU could have removed nvp from our mountlist but not yet gotten to assign a new value to vp->v_mount while another CPU simultaneously get to the top of the traversal loop where it finds that (vp->v_mount != mp) is not true despite the fact that the vnode has indeed been removed from our mountpoint. Fix: Introduce the macro MNT_VNODE_FOREACH() to traverse the list of vnodes on a mountpoint while taking into account that vnodes may be removed from the list as we go. This saves approx 65 lines of duplicated code. Split the insmntque() which potentially moves a vnode from one mount point to another into delmntque() and insmntque() which does just what the names say. Fix delmntque() to set vp->v_mount to NULL while holding the mountpoint lock.
131021	24-Jun-2004	rwatson	When updating sb_flags, acquire the socket buffer lock to prevent races.
130640	17-Jun-2004	phk	Second half of the dev_t cleanup. The big lines are: NODEV -> NULL NOUDEV -> NODEV udev_t -> dev_t udev2dev() -> findcdev() Various minor adjustments including handling of userland access to kernel space struct cdev etc.
130619	17-Jun-2004	rwatson	Remove bad cookie vp kernel printf; while it does notify about an interesting event, there's little or nothing the user can do about it.
130554	16-Jun-2004	rwatson	Convert GIANT_REQUIRED to NET_ASSERT_GIANT where Giant is used to protect socket operations. Leave one "as-is" as it also frobs rootvp.
128992	06-May-2004	alc	Make vm_page's PG_ZERO flag immutable between the time of the page's allocation and deallocation. This flag's principal use is shortly after allocation. For such cases, clearing the flag is pointless. The only unusual use of PG_ZERO is in vfs_bio_clrbuf(). However, allocbuf() never requests a prezeroed page. So, vfs_bio_clrbuf() never sees a prezeroed page. Reviewed by: tegge@
128263	14-Apr-2004	peadar	Let the NFS client notice a file's size changing as a modification. This avoids presenting invalid data to the client's applications when the file is modified, and then extended within the window of the resolution of the modifcation timestamp. Reviewed By: iedowse PR: kern/64091
128126	11-Apr-2004	marcel	Unbreak build: s/TAILQ_ISEMPTY/TAILQ_EMPTY/g
128111	11-Apr-2004	peadar	Clean up properly when unloading NFS client module. This includes a modified form of some code from Thomas Moestl (tmm@) to properly clean up the UMA zone and the "nfsnodehashtbl" hash table. Reviewed By: iedowse PR: 16299
127977	07-Apr-2004	imp	Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999 and email from Peter Wemm, Alan Cox and Robert Watson. Approved by: core, peter, alc, rwatson
127857	04-Apr-2004	rwatson	Spell 2 as SHUT_RDWR when used as an argument to soshutdown().
127797	03-Apr-2004	peadar	Flush cached access mode after modifying a files attributes for NFSv3. It's likely that modifying the attributes will affect the file's accessibility. This version of the patch is one suggested by Ian Dowse after reviewing my original attempt in the PR Reviewed By: iedowse PR: kern/44336 MFC after: 3 days
127515	28-Mar-2004	kan	Reset callout if in nfs_timeout and rpcclnt_timeout functions. Timer are supposed to continue firing as long as there is work to do, not stop after the first invocation. This is damage control after a patch that has been committed prematurely. Tested by: kris
127421	25-Mar-2004	rees	only do nfs rpc callouts if there is work to do. Submitted by: kan Approved by: alfred
127144	17-Mar-2004	pjd	Add a comment with an explanation why we don't report EPIPE errors on nfs sockets. Requested by: ru
127137	17-Mar-2004	pjd	Don't report EPIPE errors on nfs sockets. These can be due to idle tcp mounts which will be closed by netapp, solaris, etc. if left idle too long. Obtained from: NetBSD
126962	14-Mar-2004	peter	Calculate NFS timeouts in units of 10ms, not 5ms. This matches the default clock precision on i386. This is a NOP change on i386. But this stops the mount_nfs units from suddenly changing to units of 1/20 of a second (vs the normal 1/10 of a second) if HZ is increased.
126888	12-Mar-2004	brooks	Allow kernel with the BOOTP option to boot when DHCP/BOOTP sets the root path to an absolute path without a host name. Previously, there was a nasty POLA violation where a system would PXE boot until you added the BOOTP option and then it would panic instead. Reviewed by: tegge, Dirk-Willem van Gulik <dirkx at webweaving.org> (a previous version) Submitted by: tegge (getip function)
126853	11-Mar-2004	phk	Properly vector all bwrite() and BUF_WRITE() calls through the same path and s/BUF_WRITE()/bwrite()/ since it now does the same as bwrite().
126851	11-Mar-2004	phk	Remove unused second arg to vfinddev(). Don't call addaliasu() on VBLK nodes.
126425	01-Mar-2004	rwatson	Rename dup_sockaddr() to sodupsockaddr() for consistency with other functions in kern_socket.c. Rename the "canwait" field to "mflags" and pass M_WAITOK and M_NOWAIT in from the caller context rather than "1" or "0". Correct mflags pass into mac_init_socket() from previous commit to not include M_ZERO. Submitted by: sam
126330	27-Feb-2004	rees	NFSv4 fixes from Connectathon 2004: remove unused pid field of file context struct map nfs4 error codes to errnos eliminate redundant code from nfs4_request use zero stateid on setattr that doesn't set file size use same clientid on all mounts until reboot invalidate dirty bufs in nfs4_close, to play it safe open file for writing if truncating and it's not already open Approved by: alfred
126105	22-Feb-2004	cperciva	If mountnfs returns an error, it will have already freed nam; no need to free it again. Reported by: "Ted Unangst" <tedu@coverity.com> Approved by: rwatson (mentor)
125454	04-Feb-2004	jhb	Locking for the per-process resource limits structure. - struct plimit includes a mutex to protect a reference count. The plimit structure is treated similarly to struct ucred in that is is always copy on write, so having a reference to a structure is sufficient to read from it without needing a further lock. - The proc lock protects the p_limit pointer and must be held while reading limits from a process to keep the limit structure from changing out from under you while reading from it. - Various global limits that are ints are not protected by a lock since int writes are atomic on all the archs we support and thus a lock wouldn't buy us anything. - All accesses to individual resource limits from a process are abstracted behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return either an rlimit, or the current or max individual limit of the specified resource from a process. - dosetrlimit() was renamed to kern_setrlimit() to match existing style of other similar syscall helper functions. - The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit() (it didn't used the stackgap when it should have) but uses lim_rlimit() and kern_setrlimit() instead. - The svr4 compat no longer uses the stackgap for resource limits calls, but uses lim_rlimit() and kern_setrlimit() instead. - The ibcs2 compat no longer uses the stackgap for resource limits. It also no longer uses the stackgap for accessing sysctl's for the ibcs2_sysconf() syscall but uses kernel_sysctl() instead. As a result, ibcs2_sysconf() no longer needs Giant. - The p_rlimit macro no longer exists. Submitted by: mtm (mostly, I only did a few cleanups and catchups) Tested on: i386 Compiled on: alpha, amd64
125263	31-Jan-2004	obrien	Bump the NFCv3/TCP defaults for rsize and wsize from 8K to 32K to match Solaris and HP-UX. This increases read performance for large files across NFS. PR: 62024 & 26324 Submitted by: Bjoern Groenvall <bg@sics.se>
122953	22-Nov-2003	alfred	Use function pointers to remove the depenancy cross dependancy on nfs4 and the nfs3 client. Also fix some bugs that happen to be causing crashes in both v3 and v4 introduced by the v4 import. Submitted by: Jim Rees <rees@umich.edu> Approved by: re
122736	15-Nov-2003	alfred	Move the declaration for "struct nfs4_fctx" out from under #ifdef KERNEL for fstat(1).
122719	15-Nov-2003	alfred	unbreak LINT.
122698	14-Nov-2003	alfred	University of Michigan's Citi NFSv4 kernel client code. Submitted by: Jim Rees <rees@umich.edu>
122523	12-Nov-2003	kan	1. Consolidate mount struct allocation/destruction into a common code in vfs_mount_alloc/vfs_mount_destroy functions and take care to completely destroy the mount point along with its locks. Mount struct has grown in coplexity recently and depending on each failure path to destroy it completely isn't working anymore. 2. Eliminate largely identical vfs_mount and vfs_unmount question by moving the code to handle both cases into a newly introduced vfs_domount function. 3. Simplify nfs_mount_diskless to always expect an allocated mount struct and never attempt an allocation/destruction itself. The vfs_allocroot allocation was there to support 'magic' swap space configuration for diskless clients that was already removed by PHK some time ago. 4. Include a vfs_buildopts cleanups by Peter Edwards to validate the sanity of nmount parameters passed from userland. Submitted by: (4) Peter Edwards <peter.edwards@openet-telecom.com> Reviewed by: rwatson
122450	11-Nov-2003	alfred	Stop using shared locks for nfs vop locks. The reason this was done was to avoid a race to the root when an NFS server went down. However a semi-recent change to the way that the kernel's lookup() routine traverses mount points prevents this. Rev 1.39 of vfs_lookup.c changed the ordering of locks such that we aquire a shared lock on the mount point being accessed and then drop the directory vnode lock before requesting the target lock. With that in place we no longer need shared locks for NFS to prevent race to the root lockups.
122261	07-Nov-2003	sam	Assert GIANT_REQUIRED where sockets are manipulated. This is preparatory for MPSAFE network commits and ongoing socket locking work. Supported by: FreeBSD Foundation
122091	05-Nov-2003	kan	Remove mntvnode_mtx and replace it with per-mountpoint mutex. Introduce two new macros MNT_ILOCK(mp)/MNT_IUNLOCK(mp) to operate on this mutex transparently. Eventually new mutex will be protecting more fields in struct mount, not only vnode list. Discussed with: jeff
121874	02-Nov-2003	kan	Take care not to call vput if thread used in corresponding vget wasn't curthread, i.e. when we receive a thread pointer to use as a function argument. Use VOP_UNLOCK/vrele in these cases. The only case there td != curthread known at the moment is boot() calling sync with thread0 pointer. This fixes the panic on shutdown people have reported.
121816	31-Oct-2003	brooks	Replace the if_name and if_unit members of struct ifnet with new members if_xname, if_dname, and if_dunit. if_xname is the name of the interface and if_dname/unit are the driver name and instance. This change paves the way for interface renaming and enhanced pseudo device creation and configuration symantics. Approved By: re (in principle) Reviewed By: njl, imp Tested On: i386, amd64, sparc64 Obtained From: NetBSD (if_xname)
121205	18-Oct-2003	phk	DuH! bp->b_iooffset (the spot on the disk), not bp->b_offset (the offset in the file)
121201	18-Oct-2003	phk	Initialize bp->b_offset before calling VOP_STRATEGY(). Remove KASSERTS and panics with B_PHYS checks which no longer apply.
121191	18-Oct-2003	phk	We do not get B_PHYS buffers here anymore. /dev/drum is long gone.
120812	05-Oct-2003	iedowse	Since the addition of the VI_DOINGINACT flag some time ago, VOP_INACTIVE routines need not worry about their vnode getting recycled if they block. Remove the code from nfs_inactive() that used vget() to get an extra vnode reference that was held during the nfs_vinvalbuf() call.
120788	05-Oct-2003	jeff	- Remove an incorrect XXX comment. This code does respect the XLOCK since it uses vget() which will fail if the identity changes.
120787	05-Oct-2003	jeff	- Check the XLOCK before we inspect the vnode.
120786	05-Oct-2003	jeff	- We don't need to cache_purge() in nfs_reclaim(), vclean() does it for us.
120755	04-Oct-2003	jeff	- Consistently set sopt_dir. Pointed out by: pete@isilon.com
120736	04-Oct-2003	jeff	- Acquire the vnode interlock prior to dropping the mntvnode_mtx. - Make a note of the lack of XLOCK protection in this code. We would access a vnode while it is changing identities without Giant.
120730	04-Oct-2003	jeff	- Remove the backtrace() call from the *_vinvalbuf() functions. Thanks to a stack trace supplied by phk, I now understand what's going on here. The check for VI_XLOCK stops us from calling vinvalbuf once the vnode has been partially torn down in vclean(). It is not clear that this would cause a problem. Document this in nfs_bio.c, which is where the other two filesystems copied this code from.
120264	19-Sep-2003	jeff	- Remove interlock protection around VI_XLOCK. The interlock is not sufficient to guarantee that this race is not hit. The XLOCK will likely have to be redesigned due to the way reference counting and mutexes work in FreeBSD. We currently can not be guaranteed that xlock was not set and cleared while we were blocked on the interlock while waiting to check for XLOCK. This would lead us to reference a vnode which was not the vnode we requested. - Add a backtrace() call inside of INVARIANTS in the hopes of finding out if this condition is ever hit. It should not, since we should be retaining a reference to the vnode in these cases. The reference would be sufficient to block recycling.
120003	12-Sep-2003	phk	Name the vnode method vectors consistently with the rest of the filesystems. This improves the output of src/tools/tools/vop_table
119766	05-Sep-2003	phk	Remove now unused BOOTP tags related to NFS swap device.
119735	04-Sep-2003	dds	KNF: parentheses around return values. Suggested by: bde Approved by: schweikh (mentor - blanket) MFC after: 6 weeks
119687	02-Sep-2003	dds	Fix errno return values to better represent failure reasons for read and open. Approved by: schweikh (mentor) Agreed: bde MFC after: 6 weeks
118944	15-Aug-2003	phk	Remove the magic way of configuring NFS backed swap. This code dates back to the very first diskless support on FreeBSD, back when swapon(8) couldn't simply be run on a NFS backed file. Suggested replacement command sequence on the client: dd if=/dev/zero of=/swapfile bs=1k count=1 oseek=100000 swapon /swapfile rm -f /swapfile For whatever value of 100000 you want.
118639	07-Aug-2003	billf	0) preallocate per-interface context structures without the ifnet lock held 1) avoid immediately calling bzero() after malloc() by passing M_ZERO 2) do not initialize individual members of the global context to zero 3) remove an unused assignment of ifctx in bootpc_init() Reviewed by: tegge
118135	29-Jul-2003	tjr	Fix a problem that occurs when truncating files on NFSv3 mounts: we need to set np->n_size back to the desired size again after calling nfs_meta_setsize(), since it could end up in nfs_loadattrcache() getting called, which would change n_size back to the value it had before the truncate request was issued. The result of this bug is that the size info cached in the nfsnode becomes incorrect, lseek(fd, ofs, SEEK_END) seeks past the end of the file, stat() returns the wrong size, etc. PR: 41792 MFC after: 2 weeks
118094	27-Jul-2003	phk	Add fdidx argument to vn_open() and vn_open_cred() and pass -1 throughout.
117152	02-Jul-2003	phk	Change idle sleep indentifier to "-" for nfsiod
116461	17-Jun-2003	alc	Lock the vm object when freeing a page.
116412	15-Jun-2003	phk	Add the same KASSERT to all VOP_STRATEGY and VOP_SPECSTRATEGY implementations to check that the buffer points to the correct vnode.
116271	12-Jun-2003	phk	Initialize struct vfsops C99-sparsely. Submitted by: hmp Reviewed by: phk
116260	12-Jun-2003	iedowse	When removing a sillyrename file, make sure that the directory vnode has not been cleaned in the meantime, since this can happen during a forced unmount. Also add a comment that nfs_removeit() should really be locking the directory vnode before calling nfs_removerpc(). Reported by: mbr Tested by: mbr MFC after: 1 week
116189	11-Jun-2003	obrien	Use __FBSDID().
116185	11-Jun-2003	rwatson	Add the comment I meant to add about not passing in PCATCH to the tsleep(). Note the XXX.
116070	09-Jun-2003	hsu	On a socket creation error, don't close the socket.
115533	31-May-2003	phk	Remove unsed variables. Add explicit breaks to switch Found by: FlexeLint
115456	31-May-2003	phk	The IO_NOWDRAIN and B_NOWDRAIN hacks are no longer needed to prevent deadlocks with vnode backed md(4) devices because md now uses a kthread to run the bio requests instead of doing it directly from the bio down path.
115415	30-May-2003	rwatson	rpc.lockd stability workaround: remove PCATCH from the tsleep() in nfs_lock.c. Right now, if we permit a signal to interrupt the sleep, we will slip the lock and no process on that client, the server, or any other client will be able to acquire the lock. This can happen, for example, if a user hits Ctrl-C or Ctrl-T while a process is waiting for the lock. By removing PCATCH, we prevent that from happening, at the cost of not permitting a user-requested lock abort: also nasty. However, a user interface bug might be preferable to a serious semantic bug, so we go with that for now. We need to teach the rpc.lockd/kernel protocol how to abort lock requests, and rpc.lockd how to handle aborted lock requests; patches for the kernel bit are floating around, but no rpc.lockd bit yet. Approved by: re (scottl)
115172	19-May-2003	peter	Deal with the possibility of negative available space from the file server to avoid Bad Things(TM) happening (eg: df crashing with a floating point exception). Submitted by: Harold Gutch <logix@foobar.franken.de> Approved by: re (scottl)
115041	15-May-2003	rwatson	This change grabs the vnode lock for NFS client vnodes when calling VOP_SETATTR() or VOP_GETATTR(); without these locks (a) VFS_DEBUG_LOCKS will panic, and (b) it may be possible to corrupt entries in the cached vnode attributes in the nfsnode, since nfsnode attribute cache data is also protected by the vnode lock. Approved by: re (jhb) Pointed out by: VFS_DEBUG_LOCKS
114983	13-May-2003	jhb	- Merge struct procsig with struct sigacts. - Move struct sigacts out of the u-area and malloc() it using the M_SUBPROC malloc bucket. - Add a small sigacts_*() API for managing sigacts structures: sigacts_alloc(), sigacts_free(), sigacts_copy(), sigacts_share(), and sigacts_shared(). - Remove the p_sigignore, p_sigacts, and p_sigcatch macros. - Add a mutex to struct sigacts that protects all the members of the struct. - Add sigacts locking. - Remove Giant from nosys(), kill(), killpg(), and kern_sigaction() now that sigacts is locked. - Several in-kernel functions such as psignal(), tdsignal(), trapsignal(), and thread_stopped() are now MP safe. Reviewed by: arch@ Approved by: re (rwatson)
114434	01-May-2003	des	Instead of recording the Unix time in a process when it starts, record the uptime. Where necessary, convert it back to Unix time by adding boottime to it. This fixes a potential problem in the accounting code, which would compute the elapsed time incorrectly if the Unix time was stepped during the lifetime of the process.
114216	29-Apr-2003	kan	Deprecate machine/limits.h in favor of new sys/limits.h. Change all in-tree consumers to include <sys/limits.h> Discussed on: standards@ Partially submitted by: Craig Rodrigues <rodrigc@attbi.com>
113986	24-Apr-2003	truckman	VOP_FSYNC() expects to be called with the vnode locked, so lock fvp in nfs_rename() before calling VOP_FSYNC() and unlock fvp immediately after. Reviewed by: bde
113985	24-Apr-2003	peter	Fix a bug with df on large (>1TB) nfsv3 file servers on 32 bit client machines where the 'long' number of blocks in struct statfs wont fit. Instead of chosing an artificial 512 byte block size, simply scale it up until we avoid an overflow. NFSv3 reports the sizes in bytes, and the blocksize is a figment of nfsclient's imagination.
113885	23-Apr-2003	truckman	Release the vnode interlock in nfs_flush() before calling nfs_sigintr(), and grab it again later if necessary. This prevents a lock order reversal because nfs_sigintr() calls PROC_LOCK().
112892	31-Mar-2003	thomas	Revert change 1.201 (removing mapping of VAPPEND to VWRITE). Instead, use the generic vaccess() operation to determine whether an operation is permitted. This avoids embedding knowledge on vnode permission bits such as VAPPEND in the NFS client. PR: kern/46515 vaccess() patch submitted by: "Peter Edwards" <pmedwards@eircom.net> Approved by: tjr, roberto (mentor)
112888	31-Mar-2003	jeff	- Move p->p_sigmask to td->td_sigmask. Signal masks will be per thread with a follow on commit to kern_sig.c - signotify() now operates on a thread since unmasked pending signals are stored in the thread. - PS_NEEDSIGCHK moves to TDF_NEEDSIGCHK.
112685	26-Mar-2003	rwatson	Add O_NONBLOCK to the vn_open_cred() flags for NFS client locking when opening the POSIX fifo; convert ENXIO error returns to EOPNOTSUPP. This improves handling of the case where the /var/run/lock fifo exists but there is no listener: we immediately return EOPNOTSUPP rather than blocking until a listener turns up. This could occur during a diskless boot before rpc.lockd is loaded, or if the lock file persists across a reboot following the disabling of rpc.lockd. This may have suddenly started to occur due to fifo blocking fixes--previously it looks like attempts to read on a fifo with no listener would time out due to insufficient resources. Reviewed by: alfred
112657	26-Mar-2003	alfred	req can not be NULL or we'd die. Sponsored by: RED
112455	21-Mar-2003	tjr	Map VAPPEND to VWRITE in nfsspec_access() - VAPPEND is never set in the mode returned by VOP_GETATTR. This fixes incorrect "Permission denied" errors when trying to append to a file on an NFSv2 mount.
112225	14-Mar-2003	jeff	- Add a forgotten BUF_LOCK() Most sincere apologies to: jake
112178	13-Mar-2003	jeff	- Lock the buf before inspecting its contents.
111856	04-Mar-2003	jeff	- Add a new 'flags' parameter to getblk(). - Define one flag GB_LOCK_NOWAIT that tells getblk() to pass the LK_NOWAIT flag to the initial BUF_LOCK(). This will eventually be used in cases were we want to use a buffer only if it is not currently in use. - Convert all consumers of the getblk() api to use this extra parameter. Reviwed by: arch Not objected to by: mckusick
111841	03-Mar-2003	njl	Finish cleanup of vprint() which was begun with changing v_tag to a string. Remove extraneous uses of vop_null, instead defering to the default op. Rename vnode type "vfs" to the more descriptive "syncer". Fix formatting for various filesystems that use vop_print.
111748	02-Mar-2003	des	More low-hanging fruit: kill caddr_t in calls to wakeup(9) / [mt]sleep(9).
111514	26-Feb-2003	jeff	- The interlock was not being droped in nfs_flush() if the first part of an if clause was true. Break the two clauses out into seperate statements since they require different actions. Reported/Tested by: jake Spotted by: jhb
111475	25-Feb-2003	jeff	- Properly handle the vnode interlock in nfs_fsync. Reported by: phk
111463	25-Feb-2003	jeff	- Add an interlock argument to BUF_LOCK and BUF_TIMELOCK. - Remove the buftimelock mutex and acquire the buf's interlock to protect these fields instead. - Hold the vnode interlock while locking bufs on the clean/dirty queues. This reduces some cases from one BUF_LOCK with a LK_NOWAIT and another BUF_LOCK with a LK_TIMEFAIL to a single lock. Reviewed by: arch, mckusick
111119	19-Feb-2003	imp	Back out M_* changes, per decision of the TRB. Approved by: trb
111105	18-Feb-2003	peter	Get rid of a silly message I added back in Sept 2001 (1.68).
110922	15-Feb-2003	tjr	Lock proc while accessing p_siglist, p_sigmask and p_sigignore in nfs_sigintr().
109704	22-Jan-2003	dillon	Provide a sysctl to allow defaulting of the connectionless (-c) feature to mount_nfs. The sysctl defaults to 1 (paranoid mode). Setting it to 0 will allow an NFS client to receive replies on a different IP then they were sent to by default. Submitted by: Sean Eric Fagan <sef@kithrup.com>
109623	21-Jan-2003	alfred	Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0. Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
108648	04-Jan-2003	phk	Since Jeffr made the std* functions the default in rev 1.63 of kern/vfs_defaults.c it is wrong for the individual filesystems to use the std* functions as that prevents override of the default. Found by: src/tools/tools/vop_table
108589	03-Jan-2003	phk	Convert calls to BUF_STRATEGY to VOP_STRATEGY calls. This is a no-op since all BUF_STRATEGY did in the first place was call VOP_STRATEGY.
108470	30-Dec-2002	schweikh	Fix typos, mostly s/ an / a / where appropriate and a few s/an/and/ Add FreeBSD Id tag where missing.
108357	28-Dec-2002	dillon	Abstract-out the constants for the sequential heuristic. No operational changes. MFC after: 1 day
108250	24-Dec-2002	hsu	SMP locking for radix nodes.
108198	23-Dec-2002	alc	Avoid holding the vnode interlock around malloc() or free() to prevent a lock order reversal. Reviewed by: jeff
108172	22-Dec-2002	hsu	SMP locking for ifnet list.
108161	21-Dec-2002	dillon	do not try to free a mountpoint that we did not allocate. X-MFC after: immediately
107104	20-Nov-2002	alfred	reapply 1.26 through 1.28. Approved by: re
107101	20-Nov-2002	alfred	forgot about 5.x freeze, backout 1.26 through 1.28 pending re@ appoval.
107100	20-Nov-2002	alfred	remove useless casts, unused macros and cleanup a line wrap.
107099	20-Nov-2002	alfred	comment and untwist error return logic
107098	20-Nov-2002	alfred	Remove an outdated comment complaining about exporting struct ucred to userspace, I fixed it a while ago.
105568	20-Oct-2002	phk	Don't examine an un-initialized variable. Spotted by: FlexeLint.
105563	20-Oct-2002	phk	Remove extern declarations of stuff which is static in nfs_node.c Move related macro to nfs_node.c Spotted by: FlexeLint
105077	14-Oct-2002	mckusick	Regularize the vop_stdlock'ing protocol across all the filesystems that use it. Specifically, vop_stdlock uses the lock pointed to by vp->v_vnlock. By default, getnewvnode sets up vp->v_vnlock to reference vp->v_lock. Filesystems that wish to use the default do not need to allocate a lock at the front of their node structure (as some still did) or do a lockinit. They can simply start using vn_lock/VOP_UNLOCK. Filesystems that wish to manage their own locks, but still use the vop_stdlock functions (such as nullfs) can simply replace vp->v_vnlock with a pointer to the lock that they wish to have used for the vnode. Such filesystems are responsible for setting the vp->v_vnlock back to the default in their vop_reclaim routine (e.g., vp->v_vnlock = &vp->v_lock). In theory, this set of changes cleans up the existing filesystem lock interface and should have no function change to the existing locking scheme. Sponsored by: DARPA & NAI Labs.
104908	11-Oct-2002	mike	Change iov_base's type from `char ' to the standard `void '. All uses of iov_base which assume its type is `char ' (in order to do pointer arithmetic) have been updated to cast iov_base to `char '.
104354	02-Oct-2002	scottl	Some kernel threads try to do significant work, and the default KSTACK_PAGES doesn't give them enough stack to do much before blowing away the pcb. This adds MI and MD code to allow the allocation of an alternate kstack who's size can be speficied when calling kthread_create. Passing the value 0 prevents the alternate kstack from being created. Note that the ia64 MD code is missing for now, and PowerPC was only partially written due to the pmap.c being incomplete there. Though this patch does not modify anything to make use of the alternate kstack, acpi and usb are good candidates. Reviewed by: jake, peter, jhb
104306	01-Oct-2002	jmallett	Back our kernel support for reliable signal queues. Requested by: rwatson, phk, and many others
104241	30-Sep-2002	jmallett	Lock access to the signal queue, and related structures, with PROC_LOCK. Submitted by: jhb
104235	30-Sep-2002	jmallett	Convert use of p_siglist and old SIG*() macros to use <sys/ksiginfo.h> prototyped functions to get a sigset_t, and further to check for any queued signals, rather than an empty signal set, to go with the move to signal queues rather than signal sets.
104094	28-Sep-2002	phk	Be consistent about "static" functions: if the function is marked static in its prototype, mark it static at the definition too. Inspired by: FlexeLint warning #512
104024	27-Sep-2002	rwatson	Remove an errant debugging printf that got left in during my last commit. Pointed out by: guido
104016	26-Sep-2002	rwatson	Apparently pxeboot passes in a mygateway of non-zero sin length from DHCP in the event that no gateway is returned from DHCP, breaking the assumption that we skip the routing insertion of the gateway if the sin length is zero. Check also for s_addr of 0 to avoid the "Oh no, adding my default route failed" panic, making it possible to pxeboot machines on segments without default routes. Arguably this could be a bug in pxeboot, or in the TUNABLE code, but this makes my boxes boot.
103939	25-Sep-2002	jeff	- Lock access to the buf lists. - Use vrefcnt() where appropriate. - Add some locking asserts.
103770	22-Sep-2002	jake	Moved nfs_diskless setup code from autoconf.c to nfsclient/nfs_diskless.c so that it is MI. Allow nfs_mountroot to return an error if the nfs_diskless struct is not valid, rather than panicing later on. Call nfs_setup_diskless() from nfs_mountroot if NFS_ROOT is defined, like bootpc_init(). Removed legacy root mount support for sparc64, and enabled NFS_ROOT by default.
103554	18-Sep-2002	phk	Use m_length() instead of home-rolled versions.
103314	14-Sep-2002	njl	Remove all use of vnode->v_tag, replacing with appropriate substitutes. v_tag is now const char * and should only be used for debugging. Additionally: 1. All users of VT_NTS now check vfsconf->vf_type VFCF_NETWORK 2. The user of VT_PROCFS now checks for the new flag VV_PROCDEP, which is propagated by pseudofs to all child vnodes if the fs sets PFS_PROCDEP. Suggested by: phk Reviewed by: bde, rwatson (earlier version)
103099	08-Sep-2002	phk	Now that we have a cached mount credential in struct mount, use it istead of a private cached copy.
102966	05-Sep-2002	bde	Use `struct uma_zone *' instead of uma_zone_t, so that <sys/uma.h> isn't a prerequisite.
102052	18-Aug-2002	sobomax	Increase size of ifnet.if_flags from 16 bits (short) to 32 bits (int). To avoid breaking application ABI use unused ifreq.ifru_flags[1] for upper 16 bits in SIOCSIFFLAGS and SIOCGIFFLAGS ioctl's. Reviewed by: -hackers, -net
101947	15-Aug-2002	alfred	Remove a case of exposing 'struct ucred' to userspace. Use a struct xucred for LOCKD_MSG instead. Requested by: rwatson
101941	15-Aug-2002	rwatson	In order to better support flexible and extensible access control, make a series of modifications to the credential arguments relating to file read and write operations to cliarfy which credential is used for what: - Change fo_read() and fo_write() to accept "active_cred" instead of "cred", and change the semantics of consumers of fo_read() and fo_write() to pass the active credential of the thread requesting an operation rather than the cached file cred. The cached file cred is still available in fo_read() and fo_write() consumers via fp->f_cred. These changes largely in sys_generic.c. For each implementation of fo_read() and fo_write(), update cred usage to reflect this change and maintain current semantics: - badfo_readwrite() unchanged - kqueue_read/write() unchanged pipe_read/write() now authorize MAC using active_cred rather than td->td_ucred - soo_read/write() unchanged - vn_read/write() now authorize MAC using active_cred but VOP_READ/WRITE() with fp->f_cred Modify vn_rdwr() to accept two credential arguments instead of a single credential: active_cred and file_cred. Use active_cred for MAC authorization, and select a credential for use in VOP_READ/WRITE() based on whether file_cred is NULL or not. If file_cred is provided, authorize the VOP using that cred, otherwise the active credential, matching current semantics. Modify current vn_rdwr() consumers to pass a file_cred if used in the context of a struct file, and to always pass active_cred. When vn_rdwr() is used without a file_cred, pass NOCRED. These changes should maintain current semantics for read/write, but avoid a redundant passing of fp->f_cred, as well as making it more clear what the origin of each credential is in file descriptor read/write operations. Follow-up commits will make similar changes to other file descriptor operations, and modify the MAC framework to pass both credentials to MAC policy modules so they can implement either semantic for revocation. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
101777	13-Aug-2002	phk	Introduce typedefs for the member functions of struct vfsops and employ these in the main filesystems. This does not change the resulting code but makes the source a little bit more grepable. Sponsored by: DARPA and NAI Labs.
101744	12-Aug-2002	rwatson	Pass IO_NOMACCHECK to vn_rdwr() in the following checks to prevent enforcement of MAC policy on the read or write operations: - In ext2fs, don't enforce MAC on loop-back reads and writes supporting directory read operations in lookup(), directory modifications in rename(), directory write operations in mkdir(), symlink write operations in symlink(). - In the NFS client locking code, perform vn_rdwr() on the NFS locking socket without enforcing MAC, since the write is done on behalf of the kernel NFS implementation rather than the user process. - In UFS, don't enforce MAC on loop-back reads and writes supporting directory read operations in lookup(), and symlink write operations in symlink(). Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
101364	05-Aug-2002	jeff	- Add a missing VI_UNLOCK to an error case in nfs_flush.
101308	04-Aug-2002	jeff	- Replace v_flag with v_iflag and v_vflag - v_vflag is protected by the vnode lock and is used when synchronization with VOP calls is needed. - v_iflag is protected by interlock and is used for dealing with vnode management issues. These flags include X/O LOCK, FREE, DOOMED, etc. - All accesses to v_iflag and v_vflag have either been locked or marked with mp_fixme's. - Many ASSERT_VOP_LOCKED calls have been added where the locking was not clear. - Many functions in vfs_subr.c were restructured to provide for stronger locking. Idea stolen from: BSD/OS
100450	21-Jul-2002	alc	o Lock page queue accesses in nfs_getpages().
100194	16-Jul-2002	dillon	Fix a bug nfs_write() related to ^C'ing during a file write on an interruptable mount. We were returning from inside the loop without releasing the rslock. Submitted by: Mike Junk <junk@isilon.com> MFC after: 3 days
100175	16-Jul-2002	jhb	If we get a receive error in nfs_receive() and then get an error trying to obtain the send lock, we would bogusly try to unlock the send lock before returning resulting in a panic. Instead, only unlock the send lock if nfs_sndlock() succeeds and nfs_reconnect() fails. MFC after: 3 days Sponsored by: The Weather Channel
100134	15-Jul-2002	alfred	Add IPv6 support. Submitted by: Jean-Luc Richier <Jean-Luc.Richier@imag.fr>
99797	11-Jul-2002	dillon	Convert old style (type foo *)0 casts to NULLs PR: kern/40360 Requested by: Hiten PAndya via direct email
99737	10-Jul-2002	dillon	Replace the global buffer hash table with per-vnode splay trees using a methodology similar to the vm_map_entry splay and the VM splay that Alan Cox is working on. Extensive testing has appeared to have shown no increase in overhead. Disadvantages Dirties more cache lines during lookups. Not as fast as a hash table lookup (but still N log N and optimal when there is locality of reference). Advantages vnode->v_dirtyblkhd is now perfectly sorted, making fsync/sync/filesystem syncer operate more efficiently. I get to rip out all the old hacks (some of which were mine) that tried to keep the v_dirtyblkhd tailq sorted. The per-vnode splay tree should be easier to lock / SMPng pushdown on vnodes will be easier. This commit along with another that Alan is working on for the VM page global hash table will allow me to implement ranged fsync(), optimize server-side nfs commit rpcs, and implement partial syncs by the filesystem syncer (aka filesystem syncer would detect that someone is trying to get the vnode lock, remembers its place, and skip to the next vnode). Note that the buffer cache splay is somewhat more complex then other splays due to special handling of background bitmap writes (multiple buffers with the same lblkno in the same vnode), and B_INVAL discontinuities between the old hash table and the existence of the buffer on the v_cleanblkhd list. Suggested by: alc
98988	28-Jun-2002	jhb	In namei(), we use a NULL thread for uio_td when doing a VOP_READLINK(). nfs_readlink() calls nfs_bioread() which passes in uio_td as the thread argument to nfs_getcacheblk(). In nfs_getcacheblk() we dereference the thread pointer to get a process pointer to pass to nfs_sigintr(). This obviously results in a panic. :) Rather than change nfs_getcacheblk() to check if the thread pointer is NULL when calling nfs_sigintr() like other callers do, change nfs_sigintr() to take a thread as the last argument instead of a process so none of the callers have to care if the thread is NULL or not.
97658	31-May-2002	tanimura	Back out my lats commit of locking down a socket, it conflicts with hsu's work. Requested by: hsu
97333	27-May-2002	dd	Don't tsleep() with an sb_mtx held.
97211	24-May-2002	peter	Fix warning; deprecated use of label at end of compound statement
96972	20-May-2002	tanimura	Lock down a socket, milestone 1. o Add a mutex (sb_mtx) to struct sockbuf. This protects the data in a socket buffer. The mutex in the receive buffer also protects the data in struct socket. o Determine the lock strategy for each members in struct socket. o Lock down the following members: - so_count - so_options - so_linger - so_state o Remove *_locked() socket APIs. Make the following socket APIs touching the members above now require a locked socket: - sodisconnect() - soisconnected() - soisconnecting() - soisdisconnected() - soisdisconnecting() - sofree() - soref() - sorele() - sorwakeup() - sotryfree() - sowakeup() - sowwakeup() Reviewed by: alfred
96825	17-May-2002	ambrisko	Add TAG_VENDOR_INDENTIFIER (option 60) to our DHCP request done by the kernel BOOTP option. The format will be: FreeBSD:<MACHINE>:<osrelease> this way people can tune their DHCP server to server up root file systems via the OS, machine type and version. Obtained from: NetBSD MFC after: 3 weeks
96755	16-May-2002	trhodes	More s/file system/filesystem/g
95662	28-Apr-2002	phk	We don't need the arp kludge any more.
95590	27-Apr-2002	iedowse	Remove the nfs_{lock,unlock,islocked} functions and the associated definitions; they have been unused and #if 0'd out since the Lite/2 merge and we are unlikely to want them in the future.
94903	17-Apr-2002	iedowse	The recent NFS forced unmount improvements introduced a side-effect where some client operations might be unexpectedly cancelled during an unsuccessful non-forced unmount attempt. This causes problems for amd(8), because it periodically attempts a non-forced unmount to check if the filesystem is still in use. Fix this by adding a new mountpoint flag MNTK_UNMOUNTF that is set only during the operation of a forced unmount. Use this instead of MNTK_UNMOUNT to trigger the cancellation of hung NFS operations. Also correct a problem where dounmount() might inadvertently clear the MNTK_UNMOUNT flag. Reported by: simokawa MFC after: 1 week
93593	01-Apr-2002	jhb	Change the suser() API to take advantage of td_ucred as well as do a general cleanup of the API. The entire API now consists of two functions similar to the pre-KSE API. The suser() function takes a thread pointer as its only argument. The td_ucred member of this thread must be valid so the only valid thread pointers are curthread and a few kernel threads such as thread0. The suser_cred() function takes a pointer to a struct ucred as its first argument and an integer flag as its second argument. The flag is currently only used for the PRISON_ROOT flag. Discussed on: smp@
92783	20-Mar-2002	jeff	Remove references to vm_zone.h and switch over to the new uma API.
92219	13-Mar-2002	luigi	Add a readonly sysctl variable of type string, kern.bootp_cookie, which is initialized with whatever string a dhcp/bootp server passes as vendor tag 134. There is no standard tag that I know with this information, and no vendor-defined tag that applies to FreeBSD that I could find doing the same thing. The intended use is to pass information to userland for run-time configuration of a diskless client without having to run a bootp/dhcp client for the third time (after the one in pxeboot/etherboot, and the one in the kernel bootp), also because these clients generally screwup the interface configuration, which is not exactly what you want when you have your disks nfs-mounted. Manpage update to follow soon. MFC-after: 3 days
91888	08-Mar-2002	phk	vhold() our vnode while checking the remote side. This is belived to be the only place where a soft reference to a vnode is held with no sort of hard reference, consequently this change should allow us to free(9) vnodes from the freelist after properly cleaning them up. Reviewed by: dillon
91460	28-Feb-2002	peter	Fix warnings.. bootpc_init() and related.
91420	27-Feb-2002	jhb	Use thread0.td_ucred instead of proc0.p_ucred. This change is cosmetic and isn't strictly required. However, it lowers the number of false positives found when grep'ing the kernel sources for p_ucred to ensure proper locking.
91406	27-Feb-2002	jhb	Simple p_ucred -> td_ucred changes to start using the per-thread ucred reference.
90373	07-Feb-2002	peter	Fix a long line touched in previous commit (but not caused by previous commit)
90361	07-Feb-2002	julian	Pre-KSE/M3 commit. this is a low-functionality change that changes the kernel to access the main thread of a process via the linked list of threads rather than assuming that it is embedded in the process. It IS still embeded there but remove all teh code that assumes that in preparation for the next commit which will actually move it out. Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, benno rice,
89407	15-Jan-2002	peter	Revise the nfsiod auto tuning code. Now both the upper and lower limits are specifyable by sysctl and are respected. Submitted by: Maxime Henrion <mux@sneakerz.org>
89324	14-Jan-2002	peter	Implement vfs.nfs.iodmin (minimum number of nfsiod's) and vfs.nfs.iodmaxidle (idle time before nfsiod's exit). Make it adaptive so that we create nfsiod's on demand and they go away after not being used for a while. The upper limit is NFS_MAXASYNCDAEMON (currently 20). More will be done here, but this is a useful checkpoint. Submitted by: Maxime Henrion <mux@qualys.com>
89174	10-Jan-2002	iedowse	Terminate requests in nfs_sigintr() if the filesystem is in the process of being unmounted. This allows forced NFS unmounts to complete even if there are processes stuck holding the mnt_lock while the server is down. The mechanism is not ideal in that there is a small chance we might accidentally cancel requests during a failed non-forced unmount attempt on that filesystem, but this is not really a big problem. Also, move the tsleep() in nfs_nmcancelreqs() so that we do not sleep in the case where there are no requests to be cancelled.
88796	02-Jan-2002	iedowse	Permit NFS filesystems to be forcibly unmounted when the server is down, even if there are hung processes and the mount is non- interruptible. This works by having nfs_unmount call a new function nfs_nmcancelreqs() in the FORCECLOSE case. It scans the list of outstanding requests and marks as interrupted any requests belonging to the specified mount. Then it waits up to 30 seconds for all requests to terminate. A few other changes are necessary to support this: - Unconditionally set a socket timeout so that even hard mounts are guaranteed to occasionally check the R_SOFTTERM flag on requests. For hard mounts this flag can only be set by nfs_nmcancelreqs(). - Reject requests on a mount that is currently being unmounted. - Never grant the receive lock to a request that has been cancelled. This should also avoid an old problem where a forced NFS unmount could cause a crash; it occurred when a VOP on an unlocked vnode (usually VOP_GETATTR) was in progress at the time of the forced unmount.
88777	01-Jan-2002	alc	o Remove an errant ';' introduced in the last revision. o Remove an unused variable.
88770	01-Jan-2002	rwatson	o Remove premature use of nmp->nm_cred, it hasn't been initialized yet.
88746	31-Dec-2001	rwatson	o Pass td into nfs_mountroot() to eliminate an XXX'd curthread use. Since it's in the parent function anyway, might as well pass it another layer down. Obtained from: TrustedBSD Project
88745	31-Dec-2001	rwatson	o Remove premature leakage of use of td_ucred from base source tree: instead, use td->td_proc->p_ucred.
88743	31-Dec-2001	rwatson	o Add missing #include's of sys/proc.h, missed in merge, required to dereference td->td_proc->p_ucred.
88739	31-Dec-2001	rwatson	o Make the credential used by socreate() an explicit argument to socreate(), rather than getting it implicitly from the thread argument. o Make NFS cache the credential provided at mount-time, and use the cached credential (nfsmount->nm_cred) when making calls to socreate() on initially connecting, or reconnecting the socket. This fixes bugs involving NFS over TCP and ipfw uid/gid rules, as well as bugs involving NFS and mandatory access control implementations. Reviewed by: freebsd-arch
88712	30-Dec-2001	iedowse	Add a #define for the size of the nfs_backoff[] array, and use this instead of magic constants in the code.
88680	30-Dec-2001	ambrisko	Increase the buffer size to hold a bootp/DHCP reply from 256 bytes to 1222 bytes (derived as the maximum that isc-dhcpd uses). This solves the problem if a bootp/DHCP reply is over 256 bytes in which the end of the bootp/DHCP reply will not be found and then the reply will be ignored. This happens when swap and root paths are longish or many parameters are set. Reviewed by: imp Approved by: imp
88541	27-Dec-2001	dillon	nfs_nget() does no locking whatsoever when looking up a vnode. If the vget() sleeps we have to retry the operation to avoid racing against a deletion. MFC maybe: submitted to re's
88091	18-Dec-2001	iedowse	Avoid passing the variable `tl' to functions that just use it for temporary storage. In the old NFS code it wasn't at all clear if the value of `tl' was used across or after macro calls, but I'm fairly confident that the convention was to keep its use local. Each ex-macro function now uses a local version of this variable, so all of the double-indirection goes away. The only exception to the `local use' rule for `tl' is nfsm_clget(), which is left unchanged by this commit. Reviewed by: peter
87834	14-Dec-2001	dillon	This fixes a large number of bugs in our NFS client side code. A recent commit by Kirk also fixed a softupdates bug that could easily be triggered by server side NFS. * An edge case with shared R+W mmap()'s and truncate whereby the system would inappropriately clear the dirty bits on still-dirty data. (applicable to all filesystems) THIS FIX TEMPORARILY DISABLED PENDING FURTHER TESTING. see vm/vm_page.c line 1641 * The straddle case for VM pages and buffer cache buffers when truncating. (applicable to NFS client side) * Possible SMP database corruption due to vm_pager_unmap_page() not clearing the TLB for the other cpu's. (applicable to NFS client side but could effect all filesystems). Note: not considered serious since the corruption occurs beyond the file EOF. * When flusing a dirty buffer due to B_CACHE getting cleared, we were accidently setting B_CACHE again (that is, bwrite() sets B_CACHE), when we really want it to stay clear after the write is complete. This resulted in a corrupt buffer. (applicable to all filesystems but probably only triggered by NFS) * We have to call vtruncbuf() when ftruncate()ing to remove any buffer cache buffers. This is still tentitive, I may be able to remove it due to the second bug fix. (applicable to NFS client side) * vnode_pager_setsize() race against nfs_vinvalbuf()... we have to set n_size before calling nfs_vinvalbuf or the NFS code may recursively vnode_pager_setsize() to the original value before the truncate. This is what was causing the user mmap bus faults in the nfs tester program. (applicable to NFS client side) * Fix to softupdates (see ufs/ffs/ffs_inode.c 1.73, commit made by Kirk). Testing program written by: Avadis Tevanian, Jr. Testing program supplied by: jkh / Apple (see Dec2001 posting to freebsd-hackers with Subject 'NFS: How to make FreeBS fall on its face in one easy step') MFC after: 1 week
86363	14-Nov-2001	rwatson	o Modify nfslockdans() to accept a thread reference instead of a proc reference: with td->td_ucred, it will be desirable to authorize based on td->td_ucred, rather than p->p_ucred. o Since the same variable 'p' was later used with pfind() on the target process for the wakeup, introduce a new local variable 'targetp' to use instead. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
86284	12-Nov-2001	alfred	Allow users to use the 'nolockd' or -L options with mount_nfs in order to avoid the need for rpc.lockd to perform client locks. Using this option a user can revert back to using local locks for NFS mounts like we did before we had rpc.lockd.
86278	11-Nov-2001	alfred	turn vn_open() into a wrapper around vn_open_cred() which allows one to perform a vn_open using temporary/other/fake credentials. Modify the nfs client side locking code to use vn_open_cred() passing proc0's ucred instead of the old way which was to temporary raise privs while running vn_open(). This should close the race hopefully.
86089	05-Nov-2001	dillon	Implement IO_NOWDRAIN and B_NOWDRAIN - prevents the buffer cache from blocking in wdrain during a write. This flag needs to be used in devices whos strategy routines turn-around and issue another high level I/O, such as when MD turns around and issues a VOP_WRITE to vnode backing store, in order to avoid deadlocking the dirty buffer draining code. Remove a vprintf() warning from MD when the backing vnode is found to be in-use. The syncer of buf_daemon could be flushing the backing vnode at the time of an MD operation so the warning is not correct. MFC after: 1 week
85398	24-Oct-2001	rwatson	o Note an additional potential problem here: LOCKD_MSG directly exports struct ucred to userland. In 5.0-CURRENT, it is desirable to instead export struct xucred, as ucred contains mutexes, pointers, and other kernel evil. I'll add it to my work queue.
85370	23-Oct-2001	rwatson	o Add two comments identifying problems with the current nfs_lock.c implementation, so that the information doesn't get lost. (1) /var/run/lock is looked up relative to the current thread's root directory, but it's not clear that's desirable. (2) A race condition associated with live credential modification on a shared credential is present when privilege is granted for the purposes of talking to /var/run/lock.
85339	23-Oct-2001	dillon	Change the vnode list under the mount point from a LIST to a TAILQ in preparation for an implementation of limiting code for kern.maxvnodes. MFC after: 3 days
84827	11-Oct-2001	jhb	Change the kernel's ucred API as follows: - crhold() returns a reference to the ucred whose refcount it bumps. - crcopy() now simply copies the credentials from one credential to another and has no return value. - a new crshared() primitive is added which returns true if a ucred's refcount is > 1 and false (0) otherwise.
84726	09-Oct-2001	jhb	Use crhold() instead of crdup() since we aren't modifying the cred but just need to ensure it remains immutable.
84700	09-Oct-2001	peter	Make this compile after last commit. It should be: "td ? td->td_proc : NULL", not "td ? td->td_proc, NULL"
84690	08-Oct-2001	julian	Don't dereference td if it's NULL. Submitted by: Alexander N. Kabaev <ak03@gte.com>
84079	28-Sep-2001	peter	Unwind some more macros. NFSMADV() was kinda silly since it was right next to equivalent m_len adjustments. Move the nfsm_subs.h macros into groups depending on which phase they are used in, since that affects the error recovery requirements. Collect some of the common error checking into a single macro as preparation for unwinding some more. Have nfs_rephead return a value instead of secretly modifying args. Remove some unused function arguments that were being passed around. Clarify nfsm_reply()'s error handling (I hope).
84057	27-Sep-2001	peter	Make nfsm_dissect() have an obvious return value.
84002	27-Sep-2001	peter	Tidy up nfsm_build usage. This is only partially finished.
83914	25-Sep-2001	iedowse	Add a missing dereference level. This caused nfsm_postop_attr_xx() to try and extract node attributes from an RPC reply even if none were present. Reviewed by: peter
83697	20-Sep-2001	peter	Add the magic marker so that loader and kldload(2) can find this in module form automagically.
83694	20-Sep-2001	peter	Oops. Fix a missing indirection level. gcc didn't complain about it on x86, but did complain about it on alpha (since int and pointer are different sizes)
83654	18-Sep-2001	peter	Sigh, Last minute pre-merge typo. (missing quotes)
83651	18-Sep-2001	peter	Cleanup and split of nfs client and server code. This builds on the top of several repo-copies.
83629	18-Sep-2001	imp	nfs_strategy calls nfs_asyncio with td as NULL. So add a bandaid that will pass NULL as the struct proc when td is NULL. This has stopped crashing on my machine. Note: The passing of NULL may be bogus, but I'll let others fix that problem. Reviewed by: jhb
83493	15-Sep-2001	peter	Sync some differences that were different between the copies of the files that were in nfs/nfs.h and nfsserver/nfs.h in the p4 tree.
83366	12-Sep-2001	julian	KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to teh previousl -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doosie!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha
83291	10-Sep-2001	kris	Fix some signed/unsigned integer confusion, and add bounds checking of arguments to some functions. Obtained from: NetBSD Reviewed by: peter MFC after: 2 weeks
82702	31-Aug-2001	dillon	Pushdown Giant for nfs syscalls (nfssvc())
82213	23-Aug-2001	ache	Stupid error from my side in prev. commit: \|\| -> &&
82204	23-Aug-2001	ache	Implement l_len<0 per POSIX check. Check for valid l_whence too.
82194	23-Aug-2001	ache	Even better move: suppose that server is able to handle SEEK_END, so check arguments for all but not SEEK_END case, leaving SEEK_END handling for server
82193	23-Aug-2001	ache	Apparently SEEK_END locking not supported by NFS. Previous variant returns EINVAL in that case, change it to EOPNOTSUPP.
82190	23-Aug-2001	ache	Move <machine/> after <sys/> Pointed by: bde
82174	23-Aug-2001	ache	adv. lock: detect off_t overflow _before_ it occurse and return EOVERFLOW instead of EINVAL
80891	01-Aug-2001	iedowse	Fix a client-side memory leak in nfs_flush(). The code allocates a temporary array to store struct buf pointers if the list doesn't fit in a local array. Usually it frees the array when finished, but if it jumps to the 'again' label and the new list does fit in the local array then it can forget to free a previously malloc'd M_TEMP memory. Move the free() up a line so that it frees any previously allocated memory whether or not it needs to malloc a new array. Reviewed by: dillon
80672	30-Jul-2001	peter	Check the filehandle size when mounting. Obtained from: Constantine Sapuntzakis <csapuntz@openbsd.org>
79247	04-Jul-2001	jhb	- Sort includes. - Update vmmeter statistics for vnode pagein/pageouts in getpages/putpages.
79224	04-Jul-2001	dillon	With Alfred's permission, remove vm_mtx in favor of a fine-grained approach (this commit is just the first stage). Also add various GIANT_ macros to formalize the removal of Giant, making it easy to test in a more piecemeal fashion. These macros will allow us to test fine-grained locks to a degree before removing Giant, and also after, and to remove Giant in a piecemeal fashion via sysctl's on those subsystems which the authors believe can operate without Giant.
78911	28-Jun-2001	jhb	- Protect the mnt_vnode list with the mntvnode lock. - Use queue(9) macros.
77563	01-Jun-2001	jake	Unlock the process returned from pfind() if it does not return NULL. This fixes a witness lock violation for nfssvc returning with locks held. Submitted by: Jean-Luc Richier <Jean-Luc.Richier@imag.fr> PR: kern/27776
77183	25-May-2001	rwatson	o Merge contents of struct pcred into struct ucred. Specifically, add the real uid, saved uid, real gid, and saved gid to ucred, as well as the pcred->pc_uidinfo, which was associated with the real uid, only rename it to cr_ruidinfo so as not to conflict with cr_uidinfo, which corresponds to the effective uid. o Remove p_cred from struct proc; add p_ucred to struct proc, replacing original macro that pointed. p->p_ucred to p->p_cred->pc_ucred. o Universally update code so that it makes use of ucred instead of pcred, p->p_ucred instead of p->p_pcred, cr_ruidinfo instead of p_uidinfo, cr_{r,sv}{u,g}id instead of p_*, etc. o Remove pcred0 and its initialization from init_main.c; initialize cr_ruidinfo there. o Restruction many credential modification chunks to always crdup while we figure out locking and optimizations; generally speaking, this means moving to a structure like this: newcred = crdup(oldcred); ... p->p_ucred = newcred; crfree(oldcred); It's not race-free, but better than nothing. There are also races in sys_process.c, all inter-process authorization, fork, exec, and exit. o Remove sigio->sio_ruid since sigio->sio_ucred now contains the ruid; remove comments indicating that the old arrangement was a problem. o Restructure exec1() a little to use newcred/oldcred arrangement, and use improved uid management primitives. o Clean up exit1() so as to do less work in credential cleanup due to pcred removal. o Clean up fork1() so as to do less work in credential cleanup and allocation. o Clean up ktrcanset() to take into account changes, and move to using suser_xxx() instead of performing a direct uid==0 comparision. o Improve commenting in various kern_prot.c credential modification calls to better document current behavior. In a couple of places, current behavior is a little questionable and we need to check POSIX.1 to make sure it's "right". More commenting work still remains to be done. o Update credential management calls, such as crfree(), to take into account new ruidinfo reference. o Modify or add the following uid and gid helper routines: change_euid() change_egid() change_ruid() change_rgid() change_svuid() change_svgid() In each case, the call now acts on a credential not a process, and as such no longer requires more complicated process locking/etc. They now assume the caller will do any necessary allocation of an exclusive credential reference. Each is commented to document its reference requirements. o CANSIGIO() is simplified to require only credentials, not processes and pcreds. o Remove lots of (p_pcred==NULL) checks. o Add an XXX to authorization code in nfs_lock.c, since it's questionable, and needs to be considered carefully. o Simplify posix4 authorization code to require only credentials, not processes and pcreds. Note that this authorization, as well as CANSIGIO(), needs to be updated to use the p_cansignal() and p_cansched() centralized authorization routines, as they currently do not take into account some desirable restrictions that are handled by the centralized routines, as well as being inconsistent with other similar authorization instances. o Update libkvm to take these changes into account. Obtained from: TrustedBSD Project Reviewed by: green, bde, jhb, freebsd-arch, freebsd-audit
77086	23-May-2001	jhb	Assert Giant is held by the caller rather than getting it and releasing it in getpages/putpages.
77031	23-May-2001	ru	- FDESC, FIFO, NULL, PORTAL, PROC, UMAP and UNION file systems were repo-copied from sys/miscfs to sys/fs. - Renamed the following file systems and their modules: fdesc -> fdescfs, portal -> portalfs, union -> unionfs. - Renamed corresponding kernel options: FDESC -> FDESCFS, PORTAL -> PORTALFS, UNION -> UNIONFS. - Install header files for the above file systems. - Removed bogus -I${.CURDIR}/../../sys CFLAGS from userland Makefiles.
76827	19-May-2001	alfred	Introduce a global lock for the vm subsystem (vm_mtx). vm_mtx does not recurse and is required for most low level vm operations. faults can not be taken without holding Giant. Memory subsystems can now call the base page allocators safely. Almost all atomic ops were removed as they are covered under the vm mutex. Alpha and ia64 now need to catch up to i386's trap handlers. FFS and NFS have been tested, other filesystems will need minor changes (grabbing the vm lock when twiddling page properties). Reviewed (partially) by: jake, jhb
76688	16-May-2001	iedowse	Change the second argument of vflush() to an integer that specifies the number of references on the filesystem root vnode to be both expected and released. Many filesystems hold an extra reference on the filesystem root vnode, which must be accounted for when determining if the filesystem is busy and then released if it isn't busy. The old `skipvp' approach required individual filesystem xxx_unmount functions to re-implement much of vflush()'s logic to deal with the root vnode. All 9 filesystems that hold an extra reference on the root vnode got the logic wrong in the case of forced unmounts, so `umount -f' would always fail if there were any extra root vnode references. Fix this issue centrally in vflush(), now that we can. This commit also fixes a vnode reference leak in devfs, which could result in idle devfs filesystems that refuse to unmount. Reviewed by: phk, bp
76166	01-May-2001	markm	Undo part of the tangle of having sys/lock.h and sys/mutex.h included in other "system" header files. Also help the deprecation of lockmgr.h by making it a sub-include of sys/lock.h and removing sys/lockmgr.h form kernel .c files. Sort sys/*.h includes where possible in affected files. OK'ed by: bde (with reservations)
76131	29-Apr-2001	phk	Add a vop_stdbmap(), and make it part of the default vop vector. Make 7 filesystems which don't really know about VOP_BMAP rely on the default vector, rather than more or less complete local vop_nopbmap() implementations.
76118	29-Apr-2001	alfred	Remove incorrect comment. Submitted by: quinot@inf.enst.fr <quinot@inf.enst.fr> PR: kern/26893
76117	29-Apr-2001	grog	Revert consequences of changes to mount.h, part 2. Requested by: bde
75858	23-Apr-2001	grog	Correct #includes to work with fixed sys/mount.h.
75692	19-Apr-2001	alfred	vnode_pager_freepage() is really vm_page_free() in disguise, nuke vnode_pager_freepage() and replace all calls to it with vm_page_free()
75631	17-Apr-2001	alfred	Implement client side NFS locks. Obtained from: BSD/os Import Ok'd by: mckusick, jkh, motd on builder.freebsd.org
75580	17-Apr-2001	phk	This patch removes the VOP_BWRITE() vector. VOP_BWRITE() was a hack which made it possible for NFS client side to use struct buf with non-bio backing. This patch takes a more general approach and adds a bp->b_op vector where more methods can be added. The success of this patch depends on bp->b_op being initialized all relevant places for some value of "relevant" which is not easy to determine. For now the buffers have grown a b_magic element which will make such issues a tiny bit easier to debug.
75402	11-Apr-2001	peter	Create debug.hashstat.[raw]nchash and debug.hashstat.[raw]nfsnode to enable easy access to the hash chain stats. The raw prefixed versions dump an integer array to userland with the chain lengths. This cheats and calls it an array of 'struct int' rather than 'int' or sysctl -a faithfully dumps out the 128K array on an average machine. The non-raw versions return 4 integers: count, number of chains used, maximum chain length, and percentage utilization (fixed point, multiplied by 100). The raw forms are more useful for analyzing the hash distribution, while the other form can be read easily by humans and stats loggers.
75218	05-Apr-2001	rwatson	o Rather than arbitrarily construct a credential in the nfs_statfs() VFS operation, make use of the calling process's credential. This solution may not be ideal (there are a number of other possible proposals, including making use of the proc0 credential, adding a credential argument to the VFSOP, and switching from a hard-coded ucred to a hard-coded nfscred), it is simple and appears to work. The arguments against using simply crget() are fairly strong: it is the only place in the code (other than a nearly identical invocation in ncp) where crget() is invoked, other than in the process credential creation code; as ucred becomes extensible, this use of crget() without appropriate context results in less and less meaningful credential data. The implementation here will probably be tweaked as a result of experimentation and further exploration of the requirements. In the mean-time, it allows progress to be made in ucred expansion for new security models without causing a crash every time df is used on an NFS mounted file system. This code has been interop tested against FreeBSD and Solaris NFS servers. While using the process credentials should not introduce interop problems, please let me know if any turn out to exist. Reviewed by: freebsd-arch
74501	20-Mar-2001	peter	Use the same API as the example code. Allow the initial hash value to be passed in, as the examples do. Incrementally hash in the dvp->v_id (using the official api) rather than add it. This seems to help power-of-two predictable filename trees where the filenames repeat on a power-of-two cycle and the directory trees have power-of-two components in it. The simple add then mask was causing things like 12000+ entry collision chains while most other entries have between 0 and 3 entries each. This way seems to improve things.
74384	17-Mar-2001	peter	Use a generic implementation of the Fowler/Noll/Vo hash (FNV hash). Make the name cache hash as well as the nfsnode hash use it. As a special tweak, create an unsigned version of register_t. This allows us to use a special tweak for the 64 bit versions that significantly speeds up the i386 version (ie: int64 XOR int64 is slower than int64 XOR int32). The code layout is a little strange for the string function, but I was able to get between 5 to 10% improvement over the original version I started with. The layout affects gcc code generation choices and this way was fastest on x86 and alpha. Note that 'CPUTYPE=p3' etc makes a fair difference to this. It is around 45% faster with -march=pentiumpro on a p6 cpu.
74381	17-Mar-2001	peter	Dramatically improve the lame nfs_hash(). This is based on the Fowler / Noll / Vo Hash (http://www.isthe.com/chongo/tech/comp/fnv/). This improves hash coverage a massive amount. We were seeing one set of machines that were using 0.84% of their 131072 entry nfsnode hash buckets with maximum chain lengths of up to ~500 entries. The machine was spending nearly 100% of its time in 'system'. A test with this has pushed the coverage from a few perCent up to 91% utilization with a max chain length of 11. Submitted by: David Filo
73929	07-Mar-2001	jhb	Grab the process lock while calling psignal and before calling psignal.
73286	01-Mar-2001	adrian	Reviewed by: jlemon An initial tidyup of the mount() syscall and VFS mount code. This code replaces the earlier work done by jlemon in an attempt to make linux_mount() work. * the guts of the mount work has been moved into vfs_mount(). * move `type', `path' and `flags' from being userland variables into being kernel variables in vfs_mount(). `data' remains a pointer into userspace. * Attempt to verify the `type' and `path' strings passed to vfs_mount() aren't too long. * rework mount() and linux_mount() to take the userland parameters (besides data, as mentioned) and pass kernel variables to vfs_mount(). (linux_mount() already did this, I've just tidied it up a little more.) * remove the copyin() stuff for `path'. `data' still requires copyin() since its a pointer into userland. * set `mount->mnt_statf_mntonname' in vfs_mount() rather than in each filesystem. This variable is generally initialised with `path', and each filesystem can override it if they want to. * NOTE: f_mntonname is intiailised with "/" in the case of a root mount.
73211	28-Feb-2001	dillon	Fix lockup for loopback NFS mounts. The pipelined I/O limitations could be hit on the client side and prevent the server side from retiring writes. Pipeline operations turned off for all READs (no big loss since reads are usually synchronous) and for NFS writes, and left on for the default bwrite(). (MFC expected prior to 4.3 freeze) Testing by: mjacob, dillon
72650	18-Feb-2001	green	Switch to using a struct xucred instead of a struct xucred when not actually in the kernel. This structure is a different size than what is currently in -CURRENT, but should hopefully be the last time any application breakage is caused there. As soon as any major inconveniences are removed, the definition of the in-kernel struct ucred should be conditionalized upon defined(_KERNEL). This also changes struct export_args to remove dependency on the constantly-changing struct ucred, as well as limiting the bounds of the size fields to the correct size. This means: a) mountd and friends won't break all the time, b) mountd and friends won't crash the kernel all the time if they don't know what they're doing wrt actual struct export_args layout. Reviewed by: bde
71915	02-Feb-2001	tegge	Enable use of DHCP extensions. Reviewed by: Per Kristian Hove <Per.Hove@math.ntnu.no>
70674	04-Jan-2001	dillon	NFS O_EXCL file create semantics temporarily uses file attributes to store the file verifier. The NFS client is supposed to do a SETATTR after a successful O_EXCL open/create to clean up the attributes. FreeBSD's client code was generating a SETATTR rpc but was not generating an access or modification time update within that rpc, leaving the file with a broken access time that solaris chokes on (and it doesn't look very nice when you ls -lua under FreeBSD either!). Fixed.
70254	21-Dec-2000	bmilekic	* Rename M_WAIT mbuf subsystem flag to M_TRYWAIT. This is because calls with M_WAIT (now M_TRYWAIT) may not wait forever when nothing is available for allocation, and may end up returning NULL. Hopefully we now communicate more of the right thing to developers and make it very clear that it's necessary to check whether calls with M_(TRY)WAIT also resulted in a failed allocation. M_TRYWAIT basically means "try harder, block if necessary, but don't necessarily wait forever." The time spent blocking is tunable with the kern.ipc.mbuf_wait sysctl. M_WAIT is now deprecated but still defined for the next little while. * Fix a typo in a comment in mbuf.h * Fix some code that was actually passing the mbuf subsystem's M_WAIT to malloc(). Made it pass M_WAITOK instead. If we were ever to redefine the value of the M_WAIT flag, this could have became a big problem.
69781	08-Dec-2000	dwmalone	Convert more malloc+bzero to malloc+M_ZERO. Submitted by: josh@zipperup.org Submitted by: Robert Drehmel <robd@gmx.net>
69214	26-Nov-2000	phk	Simplify the tprintf() API. Loose the special <sys/tprintf.h> #include file.
68883	18-Nov-2000	dillon	This patchset fixes a large number of file descriptor race conditions. Pre-rfork code assumed inherent locking of a process's file descriptor array. However, with the advent of rfork() the file descriptor table could be shared between processes. This patch closes over a dozen serious race conditions related to one thread manipulating the table (e.g. closing or dup()ing a descriptor) while another is blocked in an open(), close(), fcntl(), read(), write(), etc... PR: kern/11629 Discussed with: Alexander Viro <viro@math.psu.edu>
68711	14-Nov-2000	mckusick	In preparation for deprecating CIRCLEQ macros in favor of TAILQ macros which provide the same functionality and are a bit more efficient, convert use of CIRCLEQ's in NFS to TAILQ's.
68186	01-Nov-2000	eivind	Give vop_mmap an untimely death. The opportunity to give it a timely death timed out in 1996.
67882	29-Oct-2000	phk	Remove unneeded #include <sys/proc.h> lines.
67834	29-Oct-2000	tegge	Reduce kernel stack usage by not having large packets on the stack. Supply correct size parameter to dhcpd. Replace some magic numbers with macro names. Handle more than one interface.
67534	24-Oct-2000	tegge	Eliminate some bitrot (nonexisting member variable names). Don't use curproc when a proc pointer is available.
67531	24-Oct-2000	tegge	Style fixes.
67529	24-Oct-2000	tegge	Make RPC timeout message more readable. Supply proc pointer to sosend.
67486	24-Oct-2000	dwmalone	Problem to avoid processes getting stuck in "vmopar". From Ian's mail: The problem seems to originate with NFS's postop_attr information that is returned with a read or write RPC. Within a vm_fault context, the code cannot deal with vnode_pager_setsize() shrinking a vnode. The workaround in the patch below stops the nfsm_postop_attr() macro from ever shrinking a vnode. If the new size in the postop_attr information is smaller, then it just sets the nfsnode n_attrstamp to 0 to stop the wrong size getting used in the future. This change only affects postop_attr attributes; the nfsm_loadattr() macro works as normal. The change is implemented by adding a new argument to nfs_loadattrcache() called 'dontshrink'. When this is non-zero, nfs_loadattrcache() will never reduce the vnode/nfsnode size; instead it zeros n_attrstamp. There remain other was processes can get stuck in vmopar. Submitted by: Ian Dowse <iedowse@maths.tcd.ie> Reviewed by: dillon Tested by: Vadim Belman <voland@lflat.org>
67152	15-Oct-2000	bp	Make nfs PDIRUNLOCK aware. Now it is possible to use nullfs mounts on top of nfs mounts, but there can be side effects because nfs uses shared locks for vnodes.
67151	15-Oct-2000	bp	Add missed vop_stdunlock() for fifo's vnops (this affects only v2 mounts). Give nfs's node lock its own name.
66615	04-Oct-2000	jasone	Convert lockmgr locks from using simple locks to using mutexes. Add lockdestroy() and appropriate invocations, which corresponds to lockinit() and must be called to clean up after a lockmgr lock is no longer needed.
66355	25-Sep-2000	bp	Add a lock structure to vnode structure. Previously it was either allocated separately (nfs, cd9660 etc) or keept as a first element of structure referenced by v_data pointer(ffs). Such organization leads to known problems with stacked filesystems. From this point vop_nolock() functions maintain only interlock lock. vop_stdlock() functions maintain built-in v_lock structure using lockmgr(). vop_sharedlock() is compatible with vop_stdunlock(), but maintains a shared lock on vnode. If filesystem wishes to export lockmgr compatible lock, it can put an address of this lock to v_vnlock field. This indicates that the upper filesystem can take advantage of it and use single lock structure for entire (or part) of stack of vnodes. This field shouldn't be examined or modified by VFS code except for initialization purposes. Reviewed in general by: mckusick
65497	05-Sep-2000	msmith	Don't scan for the "right" network interface by shooting in the dark. Assume that the nfs_diskless structure is correctly set up; the provider ought to be getting it right.
63788	24-Jul-2000	mckusick	This patch corrects the first round of panics and hangs reported with the new snapshot code. Update addaliasu to correctly implement the semantics of the old checkalias function. When a device vnode first comes into existence, check to see if an anonymous vnode for the same device was created at boot time by bdevvp(). If so, adopt the bdevvp vnode rather than creating a new vnode for the device. This corrects a problem which caused the kernel to panic when taking a snapshot of the root filesystem. Change the calling convention of vn_write_suspend_wait() to be the same as vn_start_write(). Split out softdep_flushworklist() from softdep_flushfiles() so that it can be used to clear the work queue when suspending filesystem operations. Access to buffers becomes recursive so that snapshots can recursively traverse their indirect blocks using ffs_copyonwrite() when checking for the need for copy on write when flushing one of their own indirect blocks. This eliminates a deadlock between the syncer daemon and a process taking a snapshot. Ensure that softdep_process_worklist() can never block because of a snapshot being taken. This eliminates a problem with buffer starvation. Cleanup change in ffs_sync() which did not synchronously wait when MNT_WAIT was specified. The result was an unclean filesystem panic when doing forcible unmount with heavy filesystem I/O in progress. Return a zero'ed block when reading a block that was not in use at the time that a snapshot was taken. Normally, these blocks should never be read. However, the readahead code will occationally read them which can cause unexpected behavior. Clean up the debugging code that ensures that no blocks be written on a filesystem while it is suspended. Snapshots must explicitly label the blocks that they are writing during the suspension so that they do not cause a `write on suspended filesystem' panic. Reorganize ffs_copyonwrite() to eliminate a deadlock and also to prevent a race condition that would permit the same block to be copied twice. This change eliminates an unexpected soft updates inconsistency in fsck caused by the double allocation. Use bqrelse rather than brelse for buffers that will be needed soon again by the snapshot code. This improves snapshot performance.
61619	13-Jun-2000	ps	Correctly set the Maximum DHCP Message Size. bootpd now works again as well as ISC dhcpd.
60938	26-May-2000	jake	Back out the previous change to the queue(3) interface. It was not discussed and should probably not happen. Requested by: msmith and others
60833	23-May-2000	jake	Change the way that the queue(3) structures are declared; don't assume that the type argument to _HEAD and _ENTRY is a struct. Suggested by: phk Reviewed by: phk Approved by: mdodd
60141	07-May-2000	phk	Include a RFC 1533 "Maximum DHCP Message Size" option in our request. ISC DHCP will limit the reply length to 64 bytes for bootp replies unless we explicitly tell it we can do more. We tell it that we can do 1200 bytes.
60041	05-May-2000	phk	Separate the struct bio related stuff out of <sys/buf.h> into <sys/bio.h>. <sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall not be made a nested include according to bdes teachings on the subject of nested includes. Diskdrivers and similar stuff below specfs::strategy() should no longer need to include <sys/buf.> unless they need caching of data. Still a few bogus uses of struct buf to track down. Repocopy by: peter
59794	30-Apr-2000	phk	Remove unneeded #include <vm/vm_zone.h> Generated by: src/tools/tools/kerninclude
59762	29-Apr-2000	phk	s/biowait/bufwait/g Prodded by: several.
59391	19-Apr-2000	phk	Remove ~25 unneeded #include <sys/conf.h> Remove ~60 unneeded #include <sys/malloc.h>
59249	15-Apr-2000	phk	Complete the bio/buf divorce for all code below devfs::strategy Exceptions: Vinum untouched. This means that it cannot be compiled. Greg Lehey is on the case. CCD not converted yet, casts to struct buf (still safe) atapi-cd casts to struct buf to examine B_PHYS
58934	02-Apr-2000	phk	Move B_ERROR flag to b_ioflags and call it BIO_ERROR. (Much of this done by script) Move B_ORDERED flag to b_ioflags and call it BIO_ORDERED. Move b_pblkno and b_iodone_chain to struct bio while we transition, they will be obsoleted once bio structs chain/stack. Add bio_queue field for struct bio aware disksort. Address a lot of stylistic issues brought up by bde.
58710	27-Mar-2000	dillon	Add a sysctl to specify the amount of UDP receive space NFS should reserve, in maximal NFS packets. Originally only 2 packets worth of space was reserved. The default is now 4, which appears to greatly improve performance for slow to mid-speed machines on gigabit networks. Add documentation and correct some prior documentation. Problem Researched by: Andrew Gallatin <gallatin@cs.duke.edu> Approved by: jkh
58349	20-Mar-2000	phk	Rename the existing BUF_STRATEGY() to DEV_STRATEGY() substitute BUF_WRITE(foo) for VOP_BWRITE(foo->b_vp, foo) substitute BUF_STRATEGY(foo) for VOP_STRATEGY(foo->b_vp, foo) This patch is machine generated except for the ccd.c and buf.h parts.
58345	20-Mar-2000	phk	Remove B_READ, B_WRITE and B_FREEBUF and replace them with a new field in struct buf: b_iocmd. The b_iocmd is enforced to have exactly one bit set. B_WRITE was bogusly defined as zero giving rise to obvious coding mistakes. Also eliminate the redundant struct buf flag B_CALL, it can just as efficiently be done by comparing b_iodone to NULL. Should you get a panic or drop into the debugger, complaining about "b_iocmd", don't continue. It is likely to write on your disk where it should have been reading. This change is a step in the direction towards a stackable BIO capability. A lot of this patch were machine generated (Thanks to style(9) compliance!) Vinum users: Greg has not had time to test this yet, be careful.
57178	13-Feb-2000	peter	Clean up some loose ends in the network code, including the X.25 and ISO #ifdefs. Clean out unused netisr's and leftover netisr linker set gunk. Tested on x86 and alpha, including world. Approved by: jkh
55934	13-Jan-2000	dillon	The alpha build cuases the 'nfsuid bloated' warning to occur. Well, there is nothing we can do about it. In fact, after further review there simply are not very many instances of the two structures NFS checks for 'bloat' so I've decided to simply rip the checks out entirely. Submitted by: Andrew Gallatin <gallatin@cs.duke.edu>
55679	09-Jan-2000	shin	tcp updates to support IPv6. also a small patch to sys/nfs/nfs_socket.c, as max_hdr size change. Reviewed by: freebsd-arch, cvs-committers Obtained from: KAME project
55431	05-Jan-2000	dillon	Enhance reassignbuf(). When a buffer cannot be time-optimally inserted into vnode dirtyblkhd we append it to the list instead of prepend it to the list in order to maintain a 'forward' locality of reference, which is arguably better then 'reverse'. The original algorithm did things this way to but at a huge time cost. Enhance the append interlock for NFS writes to handle intr/soft mounts better. Fix the hysteresis for NFS async daemon I/O requests to reduce the number of unnecessary context switches. Modify handling of NFS mount options. Any given user option that is too high now defaults to the kernel maximum for that option rather then the kernel default for that option. Reviewed by: Alfred Perlstein <bright@wintelcom.net>
55423	05-Jan-2000	dillon	Fix at least one source of the continued 'NFS append race'. close() was calling nfs_flush() and then clearing the NMODIFIED bit. This is not legal since there might still be dirty buffers after the nfs_flush (for example, pending commits). The clearing of this bit in turn prevented a necessary vinvalbuf() from occuring leaving left over dirty buffers even after truncating the file in a new operation. The fix is to simply not clear NMODIFIED. Also added a sysctl vfs.nfs.nfsv3_commit_on_close which, if set to 1, will cause close() to do a stage 1 write AND a stage 2 commit synchronously. By default only the stage 1 write is done synchronously. Reviewed by: Alfred Perlstein <bright@wintelcom.net>
55206	29-Dec-1999	peter	Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL" is an application space macro and the applications are supposed to be free to use it as they please (but cannot). This is consistant with the other BSD's who made this change quite some time ago. More commits to come.
54970	21-Dec-1999	alfred	make getfh a standard syscall instead of dependant on having NFSSERVER defined, useful for userland fileservers that want to use a filehandle type interface to the filesystem. Submitted by: Assar Westerlund assar@stacken.kth.se PR: kern/15452
54803	19-Dec-1999	rwatson	Second pass commit to introduce new ACL and Extended Attribute system calls, vnops, vfsops, both in /kern, and to individual file systems that require a vfsop_ array entry. Reviewed by: eivind
54799	19-Dec-1999	green	M_PREPEND-related cleanups (unregisterifying struct mbuf *s).
54655	15-Dec-1999	eivind	Introduce NDFREE (and remove VOP_ABORTOP)
54605	14-Dec-1999	dillon	Fix two problems: First, fix the append seek position race that can occur due to np->n_size potentially changing if nfs_getcacheblk() blocks in nfs_write(). Second, under -current we must supply the proper bufsize when obtaining buffers that straddle the EOF, but due to the fact that np->n_size can change out from under us it is possible that we may specify the wrong buffer size and wind up truncating dirty data written by another process. Both problems are solved by implementing nfs_rslock(), which allows us to lock around sensitive buffer cache operations such as those that occur when appending to a file. It is believed that this race is responsible for causing dirtyoff/dirtyend and (in stable) validoff/validend to exceed the buffer size. Therefore we have now added a warning printf for the dirtyoff/end case in current. However, we have introduced a new problem which we need to fix at some point, and that is that soft or intr NFS mounts may become uninterruptable from the point of view of process A which is stuck waiting on rslock while process B is stuck doing the rpc. To unstick process A, process B would have to be interrupted first. Reviewed by: Alfred Perlstein <bright@wintelcom.net>
54536	13-Dec-1999	dillon	Fix a timeout deadlock that can occur when the process holding the receive lock hasn't yet managed to send its own request. PR: kern/15055 Submitted by: Ian Dowse iedowse@maths.tcd.ie
54485	12-Dec-1999	dillon	Fix a number of server-side issues related to aborting badly formed NFS packets, mainly initializing structure pointers to NULL which are conditionally freed prior to return. PR: kern/15249 Submitted by: Ian Dowse <iedowse@maths.tcd.ie>
54480	12-Dec-1999	dillon	Synopsis of problem being fixed: Dan Nelson originally reported that blocks of zeros could wind up in a file written to over NFS by a client. The problem only occurs a few times per several gigabytes of data. This problem turned out to be bug #3 below. bug #1: B_CLUSTEROK must be cleared when an NFS buffer is reverted from stage 2 (ready for commit rpc) to stage 1 (ready for write). Reversions can occur when a dirty NFS buffer is redirtied with new data. Otherwise the VFS/BIO system may end up thinking that a stage 1 NFS buffer is clusterable. Stage 1 NFS buffers are not clusterable. bug #2: B_CLUSTEROK was inappropriately set for a 'short' NFS buffer (short buffers only occur near the EOF of the file). Change to only set when the buffer is a full biosize (usually 8K). This bug has no effect but should be fixed in -current anyway. It need not be backported. bug #3: B_NEEDCOMMIT was inappropriately set in nfs_flush() (which is typically only called by the update daemon). nfs_flush() does a multi-pass loop but due to the lack of vnode locking it is possible for new buffers to be added to the dirtyblkhd list while a flush operation is going on. This may result in nfs_flush() setting B_NEEDCOMMIT on a buffer which has NOT yet gone through its stage 1 write, causing only the commit rpc to be made and thus causing the contents of the buffer to be thrown away (never sent to the server). The patch also contains some cleanup, which only applies to the commit into -current. Reviewed by: dg, julian Originally Reported by: Dan Nelson <dnelson@emsphone.com>
54444	11-Dec-1999	eivind	Lock reporting and assertion changes. * lockstatus() and VOP_ISLOCKED() gets a new process argument and a new return value: LK_EXCLOTHER, when the lock is held exclusively by another process. * The ASSERT_VOP_(UN)LOCKED family is extended to use what this gives them * Extend the vnode_if.src format to allow more exact specification than locked/unlocked. This commit should not do any semantic changes unless you are using DEBUG_VFS_LOCKS. Discussed with: grog, mch, peter, phk Reviewed by: peter
53937	30-Nov-1999	dillon	The symlink implementation could improperly return a NULL vp along with a 0 error code. The problem occured with NFSv2 mounts and also with any NFSv3 mount returning an EEXIST error (which is translated to 0 prior to return). The reply to the rpc only contains the file handle for the no-error case under NFSv3. The error case under NFSv3 and all cases under NFSv2 do not return the file handle. The fix is to do a secondary lookup to obtain the file handle and thus be able to generate a return vnode for the situations where the rpc reply does not contain the required information. The bug was originally introduced when VOP_SYMLINK semantics were changed for -CURRENT. The NFS symlink implementation was not properly modified to go along with the change despite the fact that three people reviewed the code. It took four attempts to get the current fix correct with five people. Is NFS obfuscated? Ha! Reviewed by: Alfred Perlstein <bright@wintelcom.net> Testing and Discussion: "Viren R.Shah" <viren@rstcorp.com>, Eivind Eklund <eivind@FreeBSD.ORG>, Ian Dowse <iedowse@maths.tcd.ie>
53776	27-Nov-1999	eivind	Remap the error EEXISTS => 0 before using error to determine if we should return a vp.
53552	22-Nov-1999	dillon	nm_srtt and nm_sdrtt are arrays[4]. Remove explicit initialization of element [4] in both, which goes beyond the end of the array, leaving [0], [1], [2], and [3]. This bug did not cause any problems since the overrun fields are initialized after the bogus array init but needs to be fixed anyway. Submitted by: Ian Dowse <iedowse@maths.tcd.ie>
53463	20-Nov-1999	eivind	Fix VOP_MKNOD for loss of WILLRELE. I don't know how I could have missed this in the first place :-( Noticed by: bde
53131	13-Nov-1999	eivind	Remove WILLRELE from VOP_SYMLINK Note: Previous commit to these files (except coda_vnops and devfs_vnops) that claimed to remove WILLRELE from VOP_RENAME actually removed it from VOP_MKNOD.
53095	11-Nov-1999	dillon	Remove special case socket sharing code in order to allow nfsd to bind IP addresses to udp/cltp sockets separately. PR: kern/13049 Reviewed by: David Malone <dwmalone@maths.tcd.ie>, freebsd-current
53022	08-Nov-1999	dillon	Fix nfssvc_addsock() to not attempt to free a NULL socket structure when returning an error. Bug fix was extracted from the PR. The PR is not yet entirely resolved by this commit. PR: kern/13049 Reviewed by: Matt Dillon <dillon@freebsd.org> Submitted by: Ian Dowse <iedowse@maths.tcd.ie>
52781	01-Nov-1999	msmith	Call bootpc_init before we try to mount an NFS root, if we're configured to use BOOTP for NFS root discovery. The entire interface setup inside nfs_mountroot is evil, and should die.
52635	29-Oct-1999	phk	useracc() the prequel: Merge the contents (less some trivial bordering the silly comments) of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>. This puts the #defines for the vm_inherit_t and vm_prot_t types next to their typedefs. This paves the road for the commit to follow shortly: change useracc() to use VM_PROT_{READ\|WRITE} rather than B_{READ\|WRITE} as argument.
52492	25-Oct-1999	dillon	Move NFS access cache hits/misses into nfsstats structure so /usr/bin/nfsstat can get to it easily.
51906	03-Oct-1999	phk	Before we start to mess with the VFS name-cache clean things up a little bit: Isolate the namecache in its own file, and give it a dedicated malloc type.
51799	29-Sep-1999	marcel	Careless use of struct proc *p caused major problems. 'p' is allowed to be NULL in this function (nfs_sigintr). Reorder the statements and guard them all with a single if (p != NULL). reported, reviewed and tested by: jdp
51791	29-Sep-1999	marcel	sigset_t change (part 2 of 5) ----------------------------- The core of the signalling code has been rewritten to operate on the new sigset_t. No methodological changes have been made. Most references to a sigset_t object are through macros (see signalvar.h) to create a level of abstraction and to provide a basis for further improvements. The NSIG constant has not been changed to reflect the maximum number of signals possible. The reason is that it breaks programs (especially shells) which assume that all signals have a non-null name in sys_signame. See src/bin/sh/trap.c for an example. Instead _SIG_MAXSIG has been introduced to hold the maximum signal possible with the new sigset_t. struct sigprop has been moved from signalvar.h to kern_sig.c because a) it is only used there, and b) access must be done though function sigprop(). The latter because the table doesn't holds properties for all signals, but only for the first NSIG signals. signal.h has been reorganized to make reading easier and to add the new and/or modified structures. The "old" structures are moved to signalvar.h to prevent namespace polution. Especially the coda filesystem suffers from the change, because it contained lines like (p->p_sigmask == SIGIO), which is easy to do for integral types, but not for compound types. NOTE: kdump (and port linux_kdump) must be recompiled. Thanks to Garrett Wollman and Daniel Eischen for pressing the importance of changing sigreturn as well.
51475	20-Sep-1999	dillon	Add comment to clarify a commit rpc optimization already being performed.
51344	17-Sep-1999	dillon	Asynchronized client-side nfs_commit. NFS commit operations were previously issued synchronously even if async daemons (nfsiod's) were available. The commit has been moved from the strategy code to the doio code in order to asynchronize it. Removed use of lastr in preparation for removal of vnode->v_lastr. It has been replaced with seqcount, which is already supported by the system and, in fact, gives us a better heuristic for sequential detection then lastr ever did. Made major performance improvements to the server side commit. The server previously fsync'd the entire file for each commit rpc. The server now bawrite()s only those buffers related to the offset/size specified in the commit rpc. Note that we do not commit the meta-data yet. This works still needs to be done. Note that a further optimization can be done (and has not yet been done) on the client: we can merge multiple potential commit rpc's into a single rpc with a greater file offset/size range and greatly reduce rpc traffic. Reviewed by: Alan Cox <alc@cs.rice.edu>, David Greenman <dg@root.com>
51138	11-Sep-1999	alfred	Seperate the export check in VFS_FHTOVP, exports are now checked via VFS_CHECKEXP. Add fh(open\|stat\|stafs) syscalls to allow userland to query filesystems based on (network) filehandle. Obtained from: NetBSD
51068	07-Sep-1999	alfred	All unimplemented VFS ops now have entries in kern/vfs_default.c that return reasonable defaults. This avoids confusing and ugly casting to eopnotsupp or making dummy functions. Bogus casting of filesystem sysctls to eopnotsupp() have been removed. This should make *_vfsops.c more readable and reduce bloat. Reviewed by: msmith, eivind Approved by: phk Tested by: Jeroen Ruigrok/Asmodai <asmodai@wxs.nl>
50521	28-Aug-1999	phk	remove unused variables.
50477	28-Aug-1999	peter	$Id$ -> $FreeBSD$
50405	26-Aug-1999	phk	Simplify the handling of VCHR and VBLK vnodes using the new dev_t: Make the alias list a SLIST. Drop the "fast recycling" optimization of vnodes (including the returning of a prexisting but stale vnode from checkalias). It doesn't buy us anything now that we don't hardlimit vnodes anymore. Rename checkalias2() and checkalias() to addalias() and addaliasu() - which takes dev_t and udev_t arg respectively. Make the revoke syscalls use vcount() instead of VALIASED. Remove VALIASED flag, we don't need it now and it is faster to traverse the much shorter lists than to maintain the flag. vfs_mountedon() can check the dev_t directly, all the vnodes point to the same one. Print the devicename in specfs/vprint(). Remove a couple of stale LFS vnode flags. Remove unimplemented/unused LK_DRAINED;
50053	19-Aug-1999	peter	Convert all the nfs macros to do { blah } while (0) to ensure it works correctly in if/else etc. egcs had probably picked up most of the problems here before with "ambiguous braces" etc, but this should increase the robustness a bit. Based on an idea from Eivind Eklund.
49945	17-Aug-1999	alc	Add the (inline) function vm_page_undirty for clearing the dirty bitmask of a vm_page. Use it. Submitted by: dillon
49659	12-Aug-1999	dt	nfs_getcacheblk() can return 0 if the mount is interruptible. It need to be checked by the caller. Broken in: rev. 1.70 (1999/05/02)
49535	08-Aug-1999	phk	Decommision miscfs/specfs/specdev.h. Most of it goes into <sys/conf.h>, a few lines into <sys/vnode.h>. Add a few fields to struct specinfo, paving the way for the fun part.
49405	04-Aug-1999	peter	Don't over-allocate and over-copy shorter NFSv2 filehandles and then correct the pointers afterwards. It's kinda bogus that we generate a 24 (?) byte filehandle (2 x int32 fsid and 16 byte VFS fhandle) and pad it out to 64 bytes for NFSv3 with garbage. The whole point of NFSv3's variable filehandle length was to allow for shorter handles, both in memory and over the wire. I plan on taking a shot at fixing this shortly.
49302	31-Jul-1999	msmith	As described by the submitter: I did some tcpdumping the other day and noticed that GETATTR calls were frequently followed by an ACCESS call to the same file. The attached patch changes nfs_getattr to fill the access cache as a side effect. This is accomplished by calling ACCESS rather than GETATTR. This implies a modest overhead of 4 bytes in the request and 8 bytes in the response compared to doing a vanilla GETATTR. ... [The patch comprises two parts] The first is the "real" patch, the second counts misses and hits rather than fills and hits. The difference is subtle but important because both nfs_getattr and nfs_access now fill the cache. It also changes the default value of nfsaccess_cache_timeout to better match the attribute cache. IMHO, file timestamps change much more frequently than protection bits. Submitted by: Bjoern Groenvall <bg@sics.se> Reviewed by: dillon (partially)
49239	30-Jul-1999	wpaul	Close PR #12651: the hash calculation routine has changed in other parts of the kernel but was not updated in nfs_readdirplusrpc().
49237	30-Jul-1999	wpaul	Fix two bugs in nfs_readdirplus(). The first is that in some cases, vnodes are locked and never unlocked, which leads to processes starting to wedge up after doing a mount -o nfsv3,tcp,rdirplus foo:/fs /fs; ls /fs. The second is that sometimes cnp is accessed without having been properly initialized: cnp->cn_nameptr points to an earlier name while "len" contains the length of a current name of different size. This leads to an attempt to dereference *(cn->cn_nameptr + len) which will sometimes cause a page fault and a panic. With these two fixes, client side readdirplus works correctly with FreeBSD, IRIX 6.5.4 and Solaris 2.5.1 and 2.6 servers. Submitted by: Matthew Dillon <dillon@backplane.com>
48859	17-Jul-1999	phk	I have not one single time remembered the name of this function correctly so obviously I gave it the wrong name. s/umakedev/makeudev/g
48394	01-Jul-1999	peter	Fix warning. va_fsid is udev_t, which is int32_t. No need to use %lx.
48357	30-Jun-1999	julian	Submitted by: Conrad Minshall <conrad@apple.com> Reviewed by: Matthew Dillon <dillon@apollo.backplane.com> The following ugly hack to the exit path of nfs_readlinkrpc() circumvents an Auspex bug: for symlinks longer than 112 (0x70) they return a 1024 byte xdr string - the correct data with many nulls appended. Without this fix namei returns ENAMETOOLONG, at least it does on our source base and on FreeBSD 3.0. Note we do not (and should not) rely upon their null padding.
48317	28-Jun-1999	peter	Fix a KASSERT() that was negated and lead to: nfs_strategy: buffer 0xxxxx not locked when you attempted to write and had INVARIANTS turned on.
48274	27-Jun-1999	peter	Minor tweaks to make sure (new) prerequisites for <sys/buf.h> (mostly splbio()/splx()) are #included in time.
48225	26-Jun-1999	mckusick	Convert buffer locking from using the B_BUSY and B_WANTED flags to using lockmgr locks. This commit should be functionally equivalent to the old semantics. That is, all buffer locking is done with LK_EXCLUSIVE requests. Changes to take advantage of LK_SHARED and LK_RECURSIVE will be done in future commits.
48125	23-Jun-1999	julian	Matt's NFS fixes. Submitted by: Matt Dillon Reviewed by: David Cross, Julian Elischer, Mike Smith, Drew Gallatin 3.2 version to follow when tested
48024	19-Jun-1999	mjacob	Thanks to Bruce for noticing this.... compare against the new nfsnode's mount point for seeing whether or not the new nfsnode is already in the hash queue. We're pretty much guaranteed that the old nfsnode is already in the hash queue. Wank! Infinite Loop! Looks like just a minor typo.... (ah the influence of fortran ... np && np2... why not nfsnode_the_first && nfsnode_the_second???)...
47964	16-Jun-1999	mckusick	Add a vnode argument to VOP_BWRITE to get rid of the last vnode operator special case. Delete special case code from vnode_if.sh, vnode_if.src, umap_vnops.c, and null_vnops.c.
47938	15-Jun-1999	mjacob	If we retry this operation from the top of this routine, we need to make sure we've freed any allocated resources (to avoid a memory leak) and and do the right thing with respect to the nfs node hash lock we'd acquired.
47751	05-Jun-1999	peter	Various changes lifted from the OpenBSD cvs tree: txdr_hyper and fxdr_hyper tweaks to avoid excessive CPU order knowledge. nfs_serv.c: don't call nfsm_adj() with negative values, windows clients could crash servers when doing a readdir of a large directory. nfs_socket.c: Use IP_PORTRANGE to get a priviliged port without a spin loop trying to bind(). Don't clobber a mbuf pointer or we get panics on a NFS3ERR_JUKEBOX error from a server when reusing a freed mbuf. nfs_subs.c: Don't loose st_blocks on NFSv2 mounts when > 2GB. Obtained from: OpenBSD
47750	05-Jun-1999	peter	Fix a malloc race Obtained from: OpenBSD (csapuntz)
47749	05-Jun-1999	peter	Don't mistake a non-async block that needs to be committed for an interrupted write. Obtained from: fvdl@NetBSD.org via OpenBSD.
47028	11-May-1999	phk	Divorce "dev_t" from the "major\|minor" bitmap, which is now called udev_t in the kernel but still called dev_t in userland. Provide functions to manipulate both types: major() umajor() minor() uminor() makedev() umakedev() dev2udev() udev2dev() For now they're functions, they will become in-line functions after one of the next two steps in this process. Return major/minor/makedev to macro-hood for userland. Register a name in cdevsw[] for the "filedescriptor" driver. In the kernel the udev_t appears in places where we have the major/minor number combination, (ie: a potential device: we may not have the driver nor the device), like in inodes, vattr, cdevsw registration and so on, whereas the dev_t appears where we carry around a reference to a actual device. In the future the cdevsw and the aliased-from vnode will be hung directly from the dev_t, along with up to two softc pointers for the device driver and a few houskeeping bits. This will essentially replace the current "alias" check code (same buck, bigger bang). A little stunt has been provided to try to catch places where the wrong type is being used (dev_t vs udev_t), if you see something not working, #undef DEVT_FASCIST in kern/kern_conf.c and see if it makes a difference. If it does, please try to track it down (many hands make light work) or at least try to reproduce it as simply as possible, and describe how to do that. Without DEVT_FASCIST I belive this patch is a no-op. Stylistic/posixoid comments about the userland view of the <sys/*.h> files welcome now, from userland they now contain the end result. Next planned step: make all dev_t's refer to the same devsw[] which means convert BLK's to CHR's at the perimeter of the vnodes and other places where they enter the game (bootdev, mknod, sysctl).
46580	06-May-1999	phk	remove b_proc from struct buf, it's (now) unused. Reviewed by: dillon, bde
46568	06-May-1999	peter	Add sufficient braces to keep egcs happy about potentially ambiguous if/else nesting.
46370	03-May-1999	alc	All directory accesses must be made with NFS_DIRBLKSIZE chunks to avoid confusing the directory read cookie cache. The nfs_access implementation for v2 mounts attempts to read from the directory if root is the user so that root can't access cached files when the server remaps root to some other user. Submitted by: Doug Rabson <dfr@nlsystems.com> Reviewed by: Matthew Dillon <dillon@apollo.backplane.com>
46349	02-May-1999	alc	The VFS/BIO subsystem contained a number of hacks in order to optimize piecemeal, middle-of-file writes for NFS. These hacks have caused no end of trouble, especially when combined with mmap(). I've removed them. Instead, NFS will issue a read-before-write to fully instantiate the struct buf containing the write. NFS does, however, optimize piecemeal appends to files. For most common file operations, you will not notice the difference. The sole remaining fragment in the VFS/BIO system is b_dirtyoff/end, which NFS uses to avoid cache coherency issues with read-merge-write style operations. NFS also optimizes the write-covers-entire-buffer case by avoiding the read-before-write. There is quite a bit of room for further optimization in these areas. The VM system marks pages fully-valid (AKA vm_page_t->valid = VM_PAGE_BITS_ALL) in several places, most noteably in vm_fault. This is not correct operation. The vm_pager_get_pages() code is now responsible for marking VM pages all-valid. A number of VM helper routines have been added to aid in zeroing-out the invalid portions of a VM page prior to the page being marked all-valid. This operation is necessary to properly support mmap(). The zeroing occurs most often when dealing with file-EOF situations. Several bugs have been fixed in the NFS subsystem, including bits handling file and directory EOF situations and buf->b_flags consistancy issues relating to clearing B_ERROR & B_INVAL, and handling B_DONE. getblk() and allocbuf() have been rewritten. B_CACHE operation is now formally defined in comments and more straightforward in implementation. B_CACHE for VMIO buffers is based on the validity of the backing store. B_CACHE for non-VMIO buffers is based simply on whether the buffer is B_INVAL or not (B_CACHE set if B_INVAL clear, and vise-versa). biodone() is now responsible for setting B_CACHE when a successful read completes. B_CACHE is also set when a bdwrite() is initiated and when a bwrite() is initiated. VFS VOP_BWRITE routines (there are only two - nfs_bwrite() and bwrite()) are now expected to set B_CACHE. This means that bowrite() and bawrite() also set B_CACHE indirectly. There are a number of places in the code which were previously using buf->b_bufsize (which is DEV_BSIZE aligned) when they should have been using buf->b_bcount. These have been fixed. getblk() now clears B_DONE on return because the rest of the system is so bad about dealing with B_DONE. Major fixes to NFS/TCP have been made. A server-side bug could cause requests to be lost by the server due to nfs_realign() overwriting other rpc's in the same TCP mbuf chain. The server's kernel must be recompiled to get the benefit of the fixes. Submitted by: Matthew Dillon <dillon@apollo.backplane.com>
46112	27-Apr-1999	phk	Suser() simplification: 1: s/suser/suser_xxx/ 2: Add new function: suser(struct proc ), prototyped in <sys/proc.h>. 3: s/suser_xxx($[a-zA-Z0-9_]$->p_ucred, \&\1->p_acflag)/suser(\1)/ The remaining suser_xxx() calls will be scrutinized and dealt with later. There may be some unneeded #include <sys/cred.h>, but they are left as an exercise for Bruce. More changes to the suser() API will come along with the "jail" code.
45996	24-Apr-1999	dt	Fixed printf format errors on alpha.
45553	10-Apr-1999	peter	Close a potential mbuf and/or mbuf cluster leak in the client-side NFS statfs() code. Free the whole chain, not just the first one.
45361	06-Apr-1999	peter	Hold nfsd's upages in-core with PHOLD rather than P_NOSWAP.
45347	05-Apr-1999	julian	Catch a case spotted by Tor where files mmapped could leave garbage in the unallocated parts of the last page when the file ended on a frag but not a page boundary. Delimitted by tags PRE_MATT_MMAP_EOF and POST_MATT_MMAP_EOF, in files alpha/alpha/pmap.c i386/i386/pmap.c nfs/nfs_bio.c vm/pmap.h vm/vm_page.c vm/vm_page.h vm/vnode_pager.c miscfs/specfs/spec_vnops.c ufs/ufs/ufs_readwrite.c kern/vfs_bio.c Submitted by: Matt Dillon <dillon@freebsd.org> Reviewed by: Alan Cox <alc@freebsd.org>
44679	12-Mar-1999	julian	Reviewed by: Many at differnt times in differnt parts, including alan, john, me, luoqi, and kirk Submitted by: Matt Dillon <dillon@frebsd.org> This change implements a relatively sophisticated fix to getnewbuf(). There were two problems with getnewbuf(). First, the writerecursion can lead to a system stack overflow when you have NFS and/or VN devices in the system. Second, the free/dirty buffer accounting was completely broken. Not only did the nfs routines blow it trying to manually account for the buffer state, but the accounting that was done did not work well with the purpose of their existance: figuring out when getnewbuf() needs to sleep. The meat of the change is to kern/vfs_bio.c. The remaining diffs are all minor except for NFS, which includes both the fixes for bp interaction AND fixes for a 'biodone(): buffer already done' lockup. Sys/buf.h also contains a chaining structure which is not used by this patchset but is used by other patches that are coming soon. This patch deliniated by tags PRE_MAT_GETBUF and POST_MAT_GETBUF. (sorry for the missing T matt)
44246	25-Feb-1999	peter	Untangle the nfs send and receive queue locking a little. One lock routine was [ab]used for two different things, and you couldn't tell from the wait channel which one had wedged. Catch a few things missing from NFS_NOSERVER.
44112	18-Feb-1999	dfr	Move the declaration of the vfs.nfs sysctl node outside an ifdef so that it builds if NFS_NOSERVER is defined. Spotted by: Bruce Evans <bde@zeta.org.au>
44101	17-Feb-1999	bde	Fixed bitrot in NFS_ACDEBUG option.
44078	16-Feb-1999	dfr	* Change sysctl from using linker_set to construct its tree using SLISTs. This makes it possible to change the sysctl tree at runtime. * Change KLD to find and register any sysctl nodes contained in the loaded file and to unregister them when the file is unloaded. Reviewed by: Archie Cobbs <archie@whistle.com>, Peter Wemm <peter@netplex.com.au> (well they looked at it anyway)
43960	13-Feb-1999	dillon	General additional cleanup of VOP API for NFS ops - mainly NFS ignoring the API for freeing up cnp's. This cleanup should not effect nominal operation one way or the other since NFS VOPs just happen to be called with flags that match what it actually does to the NAMEI components it gets. Still, if an NFS error occured, there was probably some memory leakage of NAMEI components with certain NFS VOP ops.
43956	13-Feb-1999	dillon	PR: kern/9970 Remove incorrect vput() in nfs_link()
43705	06-Feb-1999	dillon	Flush delayed-write data out prior to issuing a rename rpc. This appears to fix the problem w/ NFSV3 whereby a make installworld would get into high-network-bandwidth situations continuously trying to retry nfs writes that fail with a 'stale file handle' error.
43351	28-Jan-1999	dillon	Fix warnings related to -Wall -Wcast-qual
43311	28-Jan-1999	dillon	Fix warnings in preparation for adding -Wall -Wcast-qual to the kernel compile
43309	27-Jan-1999	dillon	Fix warnings in preparation for adding -Wall -Wcast-qual to the kernel compile. This commit includes significant work to proper handle const arguments for the DDB symbol routines.
43307	27-Jan-1999	dillon	Fix nasty bug in nfs_access(). A conditional was if (a = b) instead of if (a == b).
43306	27-Jan-1999	dillon	Fix warnings in preparation for adding -Wall -Wcast-qual to the kernel compile
43305	27-Jan-1999	dillon	Fix warnings in preparation for adding -Wall -Wcast-qual to the kernel compile
42957	21-Jan-1999	dillon	This is a rather large commit that encompasses the new swapper, changes to the VM system to support the new swapper, VM bug fixes, several VM optimizations, and some additional revamping of the VM code. The specific bug fixes will be documented with additional forced commits. This commit is somewhat rough in regards to code cleanup issues. Reviewed by: "John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>
42582	12-Jan-1999	eivind	Remove two cases of unused variable sp3.
42315	05-Jan-1999	eivind	Remove the 'waslocked' parameter to vfs_object_create().
42155	30-Dec-1998	hoek	Silence -Wtrigraph. Submitted by: Bradley Dunn <bradley@dunn.org> (pr: kern/8817)
42060	25-Dec-1998	dfr	Fix for creating files on a Solaris 7 server with NFSv3 (the request was slightly garbled but older servers seemed to understand it). Reviewed by: David O'Brien <obrien@nuxi.ucdavis.edu>
41796	14-Dec-1998	dt	Added 3 new errno values, requred by various standards: EOVERFLOW, ECANCELED, EILSEQ. Fixed ibcs2 and especially linux EIDRM and ENOMSG errno mapping. Reviewed by: Dan Nelson <dnelson@emsphone.com>
41791	14-Dec-1998	dt	(Hopefully) fix support for "large" files. Mostly cast block numbers to off_t before they multiplied to block sizes.
41591	07-Dec-1998	archie	The "easy" fixes for compiling the kernel -Wunused: remove unreferenced static and local variables, goto labels, and functions declared but not defined.
41514	04-Dec-1998	archie	Examine all occurrences of sprintf(), strcat(), and str[n]cpy() for possible buffer overflow problems. Replaced most sprintf()'s with snprintf(); for others cases, added terminating NUL bytes where appropriate, replaced constants like "16" with sizeof(), etc. These changes include several bug fixes, but most changes are for maintainability's sake. Any instance where it wasn't "immediately obvious" that a buffer overflow could not occur was made safer. Reviewed by: Bruce Evans <bde@zeta.org.au> Reviewed by: Matthew Dillon <dillon@apollo.backplane.com> Reviewed by: Mike Spengler <mks@networkcs.com>
41488	03-Dec-1998	dillon	Make bootp error message slightly more verbose
41186	15-Nov-1998	msmith	Reimplement the NFS ACCESS RPC cache as an "accelerator" rather than a true cache. If the cached result lets us say "yes", then go with that. If we're not sure, or we think the answer might be "no", go to the wire to be certain. This avoids all of the possible false veto cases, and allows us to key the cached value with just the UID for which the cached value holds, reducing the bloat of the nfsnode structure from 104 bytes to just 12 bytes. Since the "yes" case is by far the most common, this should still provide a substantial performance improvement. Also default the cache to on, with a conservative timeout (2 seconds). This improves performance if NFS is loaded as a KLD module, as there's not (yet) code to parse an option out of the module arguments to set it, and sysctl doesn't work (yet) for OIDs in modules. The 'accelerator' mode was suggested by Bjoern Groenvall (bg@sics.se) Feedback on this would be appreciated as testing has been necessarily limited by Comdex, and it would be valuable to have this in 2.2.8.
41138	13-Nov-1998	msmith	Avoid a null pointer reference if the target of an NFS rename has been sillrenamed, or if the source vnode doesn't have an associated nfsnode. Bug report from Andrew Gallatin <gallatin@cs.duke.edu>
41132	13-Nov-1998	dfr	Fix a panic in nfsrv_dorec() where a NULL pointer could be passed to free() sometimes. Reviewed by: Eric Haug <ejh@eas.slu.edu>
41127	13-Nov-1998	msmith	Implement NFS ACCESS RPC result caching. This yields startling performance increases for NFS clients for many access profiles, due to the fact that ACCESS results are persistently cached in the namecache in many cases. Note that the code is somewhat conservative in that it requires an exact credential match for a cache hit. This bloats the nfsnode structure by sizeof(struct ucred) (96 bytes). Any less conservative approach opens the possibility for a false veto in eg. setuid applications. Alternative suggestions would be welcomed. The cache is normally disabled, to activate set the sysctl variable vfs.nfs.access_cache_timeout to a nonzero value. This is the time in seconds that a cached entry will be considered valid; useful values appear to be 2-10 seconds. Performance of the cache can be monitored with the vfs.nfs.access_cache_hits and vfs.nfs.access_cache_hits variables.
41026	09-Nov-1998	peter	Remove [apparently] bogus casts to u_long for the vnode_pager_setsize() second argument. np_size is a 64 bit int, so is the second arg. This might have caused needless 2G/4G file size problems. I believe it was Bruce who queried this.
40790	31-Oct-1998	peter	Use TAILQ macros for clean/dirty block list processing. Set b_xflags rather than abusing the list next pointer with a magic number.
39794	29-Sep-1998	mckusick	In nfs_link(), check for a cross-device mount before looking in the v_data field. Obtained from: Charles Hannum, via Frank van der Linden <frank@wins.uva.nl>
39793	29-Sep-1998	mckusick	Missing vput when cross-device link error is detected in nfs_link.
39792	29-Sep-1998	mckusick	During truncation, have to notify the VM about the new size of the NFS file before doing the nfs_vinvalbuf operation. Otherwise some invalid data may show up in an mmap.
39790	29-Sep-1998	mckusick	Frank sez: 'It fixes a problem with servers that return 0 values for some of the fsinfo RPC fields. It is strictly speaking not wrong to do this, as the spec says that "it is expected that a server will make a best effort at supporting all the attributes", but pretty unusual. You guessed it, it's NT servers that do it.' Obtained from: Frank van der Linden <frank@wins.uva.nl>
39789	29-Sep-1998	mckusick	Do not need (or want) to take a reference on an NFS file that is being deleted due to an forcible unmount. The problem is that vgone calls vclean() which then calls calls nfs_inactive() with VXLOCK set on the vnode. Nfs_inactive() was calling vget() to get a reference on the vnode, which in turn hung on VXLOCK. Nfs_inactive() now checks v_usecount to make sure that the vnode is not coming from vclean() before it does a vget().
39788	29-Sep-1998	mckusick	The code checks each fragment mark to see if it's valid; if the fragment is less than NFS_MINPACKET or greater than NFS_MAXPACKET in size, it barfs and, I think, drops the connection. However, there's no guarantee that in a multi-fragment RPC, all the fragments will be at least as large as NFS_MINPACKET. In fact, with the version of "tclnfs" we have here, which supports NFS over TCP, at least when built under SunOS 4.1.3 (i.e., with 4.1.3's user-mode ONC RPC library), I can repeatably cause "tclnfs" to send a request with more than one fragment, one of which is only 8 bytes long. I just do a 3877-byte write to a file, at an offset of 0. The check that "slp->ns_reclen" is greater than or equal to NFS_MINPACKET serves no useful purpose - if the NFS server code can't handle packets < NFS_MINPACKET bytes, it can't handle them over any protocol, so the check has to be done above the RPC-over-TCP layer - and should be removed. Obtained from: Fix from Guy Harris, forwarded by Rick Macklem.
39782	29-Sep-1998	mckusick	Mark directory buffers that have no valid data with B_INVAL so that they are not put in the cache.
39781	29-Sep-1998	mckusick	When adding data to a buffer, we need to clear the B_NEEDCOMMIT flag which says that the data is on server but not committed.
38909	07-Sep-1998	bde	Removed statically configured mount type numbers (MOUNT_) and all references to them. The change a couple of days ago to ignore these numbers in statically configured vfsconf structs was slightly premature because the cd9660, cfs, devfs, ext2fs, nfs vfs's still used MOUNT_ instead of the number in their vfsconf struct.
38894	07-Sep-1998	bde	Made unloading of the nfs LKM sort of work. This is mainly to test detachment of vfs sysctls. Unloading of vfs LKMs doesn't actually work for any vfs, since it leaves garbage pointers to memory allocation control structures.
38869	05-Sep-1998	bde	Ignore the statically configured vfs type numbers and assign vfs type numbers in vfs attach order (modulo incomplete reuse of old numbers after vfs LKMs are unloaded). This requires reinitializing the sysctl tree (or at least the vfs subtree) for vfs's that support sysctls (currently only nfs). sysctl_order() already handled reinitialization reasonably except it checked for annulled self references in the wrong place. Fixed sysctls for vfs LKMs.
38866	05-Sep-1998	bde	Instantiate `nfs_mount_type' in a standard file so that it is present when nfs is an LKM. Declare it in a header file. Don't forget to use it in non-Lite2 code. Initialize it to -1 instead of to 0, since 0 will soon be the mount type number for the first vfs loaded. NetBSD uses strcmp() to avoid this ugly global.
38799	04-Sep-1998	dfr	Cosmetic changes to the PAGE_XXX macros to make them consistent with the other objects in vm.
38718	01-Sep-1998	luoqi	Check for NULL pointer before freeing a struct sockaddr. m_freem() can handle NULL, buf free() can't.
38482	23-Aug-1998	wollman	Yow! Completely change the way socket options are handled, eliminating another specialized mbuf type in the process. Also clean up some of the cruft surrounding IPFW, multicast routing, RSVP, and other ill-explored corners.
38412	18-Aug-1998	bde	Fixed printf format errors.
38299	13-Aug-1998	dfr	Protect all modifications to v_numoutput with splbio().
38289	12-Aug-1998	bde	Don't configure compatibility code for pre-Lite2 mount() calls by default. This code should go away soon.
37997	01-Aug-1998	peter	If we get an ENOBUFS from the network, it's normally transient network interface congestion (eg: nfs over a ppp link, etc). Don't log these for UDP mounts, and don't cause syscalls to fail with EINTR. This stops the 'nfs send error 55' warnings. If the error is because the system is really hosed, this is the least of your problems...
37649	15-Jul-1998	bde	Cast pointers to uintptr_t/intptr_t instead of to u_long/long, respectively. Most of the longs should probably have been u_longs, but this changes is just to prevent warnings about casts between pointers and integers of different sizes, not to fix poorly chosen types.
37384	04-Jul-1998	julian	VOP_STRATEGY grows an (struct vnode *) argument as the value in b_vp is often not really what you want. (and needs to be frobbed). more cleanups will follow this. Reviewed by: Bruce Evans <bde@freebsd.org>
37291	30-Jun-1998	jmg	fix buildworld hopefully be3fore anyone complains... NFS_*TIMO should possibly be converted to sysctl vars (jkh's suggestion), but in some cases it looks like nfs keeps a copy of the value in a struct hash sizes are already ifdef'd KERNEL, so there aren't userland inpact from them...
37272	30-Jun-1998	jmg	convert some nfs tunables to options, these are: NFS_MINATTRTIMO VREG attrib cache timeout in sec NFS_MAXATTRTIMO NFS_MINDIRATTRTIMO VDIR attrib cache timeout in sec NFS_MAXDIRATTRTIMO NFS_GATHERDELAY Default write gather delay (msec) NFS_UIDHASHSIZ Tune the size of nfssvc_sock with this NFS_WDELAYHASHSIZ and with this NFS_MUIDHASHSIZ Tune the size of nfsmount with this NFS_NOSERVER (already documented in LINT) NFS_DEBUG turn on NFS debugging also, because NFS_ROOT is used by very different files, it has been renamed to opt_nfsroot.h instead of the old opt_nfs.h....
37089	21-Jun-1998	bde	Fixed typo in ifdefed code. (NFS_ACDEBUG is not in LINT. Therefore, code controlled by it did not even compile.)
36979	14-Jun-1998	bde	Avoid an egcs pessimization for 64-bit signed division on i386's. Pre-2.8 versions of gcc generate a call to __divdi3() for all 64-bit signed divisions, but egcs optimizes them to a shift and fixup when the divisor is a constant power of 2. Unfortunately, it generates a call to __cmpdi2() for the fixup, although all except possibly ancient versions of gcc and egcs do ordinary 64-bit comparisons inline.
36735	07-Jun-1998	dfr	This commit fixes various 64bit portability problems required for FreeBSD/alpha. The most significant item is to change the command argument to ioctl functions from int to u_long. This change brings us inline with various other BSD versions. Driver writers may like to use (__FreeBSD_version == 300003) to detect this change. The prototype FreeBSD/alpha machdep will follow in a couple of days time.
36563	01-Jun-1998	peter	Make sure we go a nfs_fsinfo() in get/putpages before calling readrpc/writerpc, since they assume it's already been done. This could break if the first read/write access to a nfs filesystem was an exec() or mmap() instead of a read(), write() syscall. (or statfs()). nfs_getpages() could return an errno (EOPNOTSUPP) instead of a VM_PAGER_* return code. Some layout tweaks for the get/putpages code.
36562	01-Jun-1998	peter	Fix post-test pre-commit cleanup typo.
36561	01-Jun-1998	peter	readlink() returns EINVAL rather than EPERM if called on a non-symlink.
36560	01-Jun-1998	peter	Preset the maximum file size before we get to nfs_fsinfo(), based on an (over?) conservative assumption about what the client can store in it's buffer cache using a signed 32-bit 512-byte block number index. Otherwise it's possible for some file access when maxfilesize = 0 (eg: /usr is nfs mounted and doing an execve()) Pointed out by: bde XXX It might make sense to do a preemptive nfs_fsinfo() call at mount time.
36541	31-May-1998	peter	For the on-the-wire protocol, u_long -> u_int32_t; long -> int32_t; int -> int32_t; u_short -> u_int16_t. Also, use mode_t instead of u_short for storing modes (mode_t is a u_int16_t). Obtained from: NetBSD
36540	31-May-1998	peter	Support 'mount -u' remounts. This may require disconnecting and rebinding the socket. Certain mode changes are not allowed. Obtained from: NetBSD
36538	31-May-1998	peter	xdr encode -1 properly. Obtained from: NetBSD
36537	31-May-1998	peter	Fully fill in nfsv2 write rpc requests rather than leaving garbage. Obtained from: NetBSD
36536	31-May-1998	peter	Don't silently fail to set file flags. Obtained from: NetBSD
36535	31-May-1998	peter	Don't blindly accept the server's preferences if they are too small. Obtained from: NetBSD
36534	31-May-1998	peter	Prototype support for selectively allowing non-reserved ports on a per export basis. Needs userland support yet. Obtained from: NetBSD
36531	31-May-1998	peter	Don't pass a second copy of the uid/gid in with the v2/v3 sattr structures, it just makes more work. We pass a copy of the uid/gid with the credentials. (although, this may need to be revisited if a non AUTHUNIX authentication method (such as NFSKERB) ever gets implemented). Obtained from: NetBSD
36530	31-May-1998	peter	Use the new SB_UPCALL flag, Obtained from: NetBSD (but I changed the flag clear order in case).
36526	31-May-1998	peter	NFS_SMALLFH is defined in nfsproto.h, not sys/mount.h Obtained from: NetBSD
36525	31-May-1998	peter	Don't let the user try "rmdir ." Obtained from: NetBSD
36524	31-May-1998	peter	Don't let the user try and unlink() a directory on a NFS server. Obtained from: NetBSD
36523	31-May-1998	peter	When a write rpc returns an error, break the loop. Obtained from: NetBSD
36522	31-May-1998	peter	Don't leak an mbuf when a write rpc returns zero bytes written. Obtained from: NetBSD
36521	31-May-1998	peter	#ifdef a diagnostic printf Obtained from: NetBSD
36520	31-May-1998	peter	Don't try and free mrep twice on some error conditions. Obtained from: NetBSD
36519	31-May-1998	peter	#ifdef a diagnostic panic, plus another missed costmetic change. Obtained from: NetBSD
36518	31-May-1998	peter	We have gained 2 more errno's, add them to the NFSv2 mapping table.
36517	31-May-1998	peter	Missed a cosmetic change that the other BSD's have.
36516	31-May-1998	peter	oops, nfs_msg() is called from client code too.
36515	31-May-1998	peter	When we can't reconnect a socket, don't forget to unlock before retrying or we can deadlock. Obtained from: NetBSD
36514	31-May-1998	peter	Don't log zero length reads, this can happen during normal operation. Obtained from: NetBSD
36513	31-May-1998	peter	Consider for readdir chunk sizes when tuning socket buffer reservations. Obtained from: NetBSD
36511	31-May-1998	peter	Some const's Obtained from: NetBSD
36503	31-May-1998	peter	NFS Jumbo commit part 1. Cosmetic and structural changes only. The aim of this part of commits is to minimize unnecessary differences between the other NFS's of similar origin. Yes, there are gratuitous changes here that the style folks won't like, but it makes the catch-up less difficult.
36481	31-May-1998	peter	VOP_ABORTUP() appears to be called with the wrong vnode. The other callers that I checked (eg: ufs_link()) do the ABORTOP on the directory rather than the file itself. After Michael Hancock's patches, the abortop doesn't seem all that critial now since something else will free the pathname buffer.
36473	30-May-1998	peter	When using NFSv3, use the remote server's idea of the maximum file size rather than assuming 2^64. It may not like files that big. :-) On the nfs server, calculate and report the max file size as the point that the block numbers in the cache would turn negative. (ie: 1099511627775 bytes (1TB)). One of the things I'm worried about however, is that directory offsets are really cookies on a NFSv3 server and can be rather large, especially when/if the server generates the opaque directory cookies by using a local filesystem offset in what comes out as the upper 32 bits of the 64 bit cookie. (a server is free to do this, it could save byte swapping depending on the native 64 bit byte order) Obtained from: NetBSD
36329	24-May-1998	peter	Convert a couple of large allocations to use zones rather than malloc for better packing. This means that we can choose better values for the various hash entries without having to try and get it all to fit within an artificial power of two limit for malloc's sake.
36249	20-May-1998	peter	s/flags/flag/
36248	20-May-1998	peter	A cleaner fix for PR#5102, clear nonsense flags at mount time rather than in the core of nfs_bio.c at the 11th hour. PR: 5102
36247	20-May-1998	peter	Don't change argp->flags after it's been copied.
36176	19-May-1998	peter	Allow control of the attribute cache timeouts at mount time. We had run out of bits in the nfs mount flags, I have moved the internal state flags into a seperate variable. These are no longer visible via statfs(), but I don't know of anything that looks at them.
36100	16-May-1998	bde	Get timespecs directly instead of via timevals.
36099	16-May-1998	bde	Don't abuse `+' to combine flags.
36098	16-May-1998	bde	Backed out rev.1.76. It just added style bugs.
36097	16-May-1998	bde	Get timespecs directly instead of via timevals.
36013	13-May-1998	peter	Add missing arg to vget().. Serves me right for committing a 2.2 patch to -current without testing it there.. :-( Submitted by: Michael Hancock <michaelh@cet.co.jp>
35994	13-May-1998	peter	Hold a reference to the vnode during the sillyrename cleanup. If we block in nfs_vinvalbuf() or the nfs_removeit(), we can have the nfsnode reallocated from underneath us (eg: replaced by a ufs 'struct inode') which can cause disk corruption ('freeing free block' when di_db[5] gets trashed). This is not a cheap fix, but it'll do until the nfsnodes get reference counting and/or locking. Apparently NetBSD have a similar fix (apparently from BSDI). I wish all PR's had this much useful detail. :-) PR: 6611 Submitted by: Stephen Clawson <sclawson@marker.cs.utah.edu>
35991	13-May-1998	peter	Move the *vpp initialization earlier so that it's set in all error cases. This should stop the 'panic: leaf should not be empty' nfs panic. PR: 1856 Submitted by: msaitoh@spa.is.uec.ac.jp
35823	07-May-1998	msmith	In the words of the submitter: --------- Make callers of namei() responsible for releasing references or locks instead of having the underlying filesystems do it. This eliminates redundancy in all terminal filesystems and makes it possible for stacked transport layers such as umapfs or nullfs to operate correctly. Quality testing was done with testvn, and lat_fs from the lmbench suite. Some NFS client testing courtesy of Patrik Kudo. vop_mknod and vop_symlink still release the returned vpp. vop_rename still releases 4 vnode arguments before it returns. These remaining cases will be corrected in the next set of patches. --------- Submitted by: Michael Hancock <michaelh@cet.co.jp>
35769	06-May-1998	msmith	As described by the submitter: Reverse the VFS_VRELE patch. Reference counting of vnodes does not need to be done per-fs. I noticed this while fixing vfs layering violations. Doing reference counting in generic code is also the preference cited by John Heidemann in recent discussions with him. The implementation of alternative vnode management per-fs is still a valid requirement for some filesystems but will be revisited sometime later, most likely using a different framework. Submitted by: Michael Hancock <michaelh@cet.co.jp>
35066	06-Apr-1998	phk	Use random() to find our initial xid.
34961	30-Mar-1998	phk	Eradicate the variable "time" from the kernel, using various measures. "time" wasn't a atomic variable, so splfoo() protection were needed around any access to it, unless you just wanted the seconds part. Most uses of time.tv_sec now uses the new variable time_second instead. gettime() changed to getmicrotime(0. Remove a couple of unneeded splfoo() protections, the new getmicrotime() is atomic, (until Bruce sets a breakpoint in it). A couple of places needed random data, so use read_random() instead of mucking about with time which isn't random. Add a new nfs_curusec() function. Mark a couple of bogosities involving the now disappeard time variable. Update ffs_update() to avoid the weird "== &time" checks, by fixing the one remaining call that passwd &time as args. Change profiling in ncr.c to use ticks instead of time. Resolution is the same. Add new function "tvtohz()" to avoid the bogus "splfoo(), add time, call hzto() which subtracts time" sequences. Reviewed by: bde
34930	28-Mar-1998	steve	Don't allow the readdirplus routine to be used in NFS V2. PR: 5102 Reviewed by: msmith Submitted by: Dmitry Kohmanyuk <dk@farm.org>
34926	28-Mar-1998	bde	Don't depend on <sys/mount.h> including <sys/socket.h>.
34924	28-Mar-1998	bde	Moved some #includes from <sys/param.h> nearer to where they are actually used.
34573	14-Mar-1998	tegge	Add a BOOTP_WIRED_TO option, for use on machines with multiple network cards where the first detected card should not be used for bootp. Submitted by: Doug Ambrisko <ambrisko@whistle.com>
34572	14-Mar-1998	tegge	Update workaround for limitations in the arp code. Adjust the RPC timeout message which occured when the old workaround broke to show the correct IP address.
34266	08-Mar-1998	julian	Reviewed by: dyson@freebsd.org (john Dyson), dg@root.com (david greenman) Submitted by: Kirk McKusick (mcKusick@mckusick.com) Obtained from: WHistle development tree
34206	07-Mar-1998	dyson	This mega-commit is meant to fix numerous interrelated problems. There has been some bitrot and incorrect assumptions in the vfs_bio code. These problems have manifest themselves worse on NFS type filesystems, but can still affect local filesystems under certain circumstances. Most of the problems have involved mmap consistancy, and as a side-effect broke the vfs.ioopt code. This code might have been committed seperately, but almost everything is interrelated. 1) Allow (pmap_object_init_pt) prefaulting of buffer-busy pages that are fully valid. 2) Rather than deactivating erroneously read initial (header) pages in kern_exec, we now free them. 3) Fix the rundown of non-VMIO buffers that are in an inconsistent (missing vp) state. 4) Fix the disassociation of pages from buffers in brelse. The previous code had rotted and was faulty in a couple of important circumstances. 5) Remove a gratuitious buffer wakeup in vfs_vmio_release. 6) Remove a crufty and currently unused cluster mechanism for VBLK files in vfs_bio_awrite. When the code is functional, I'll add back a cleaner version. 7) The page busy count wakeups assocated with the buffer cache usage were incorrectly cleaned up in a previous commit by me. Revert to the original, correct version, but with a cleaner implementation. 8) The cluster read code now tries to keep data associated with buffers more aggressively (without breaking the heuristics) when it is presumed that the read data (buffers) will be soon needed. 9) Change to filesystem lockmgr locks so that they use LK_NOPAUSE. The delay loop waiting is not useful for filesystem locks, due to the length of the time intervals. 10) Correct and clean-up spec_getpages. 11) Implement a fully functional nfs_getpages, nfs_putpages. 12) Fix nfs_write so that modifications are coherent with the NFS data on the server disk (at least as well as NFS seems to allow.) 13) Properly support MS_INVALIDATE on NFS. 14) Properly pass down MS_INVALIDATE to lower levels of the VM code from vm_map_clean. 15) Better support the notion of pages being busy but valid, so that fewer in-transit waits occur. (use p->busy more for pageouts instead of PG_BUSY.) Since the page is fully valid, it is still usable for reads. 16) It is possible (in error) for cached pages to be busy. Make the page allocation code handle that case correctly. (It should probably be a printf or panic, but I want the system to handle coding errors robustly. I'll probably add a printf.) 17) Correct the design and usage of vm_page_sleep. It didn't handle consistancy problems very well, so make the design a little less lofty. After vm_page_sleep, if it ever blocked, it is still important to relookup the page (if the object generation count changed), and verify it's status (always.) 18) In vm_pageout.c, vm_pageout_clean had rotted, so clean that up. 19) Push the page busy for writes and VM_PROT_READ into vm_pageout_flush. 20) Fix vm_pager_put_pages and it's descendents to support an int flag instead of a boolean, so that we can pass down the invalidate bit.
34096	06-Mar-1998	msmith	Trivial filesystem getpages/putpages implementations, set the second. These should be considered the first steps in a work-in-progress. Submitted by: Terry Lambert <terry@freebsd.org>
33964	01-Mar-1998	msmith	The intent is to get rid of WILLRELE in vnode_if.src by making a complement to all ops that return a vpp, VFS_VRELE. This is initially only for file systems that implement the following ops that do a WILLRELE: vop_create, vop_whiteout, vop_mknod, vop_remove, vop_link, vop_rename, vop_mkdir, vop_rmdir, vop_symlink This is initial DNA that doesn't do anything yet. VFS_VRELE is implemented but not called. A default vfs_vrele was created for fs implementations that use the standard vnode management routines. VFS_VRELE implementations were made for the following file systems: Standard (vfs_vrele) ffs mfs nfs msdosfs devfs ext2fs Custom union umapfs Just EOPNOTSUPP fdesc procfs kernfs portal cd9660 These implementations may change as VOP changes are implemented. In the next phase, in the vop implementations calls to vrele and the vrele part of vput will be moved to the top layer vfs_vnops and made visible to all layers. vput will be replaced by unlock in these cases. Unlocking will still be done in the per fs layer but the refcount decrement will be triggered at the top because it doesn't hurt to hold a vnode reference a little longer. This will have minimal impact on the structure of the existing code. This will only be done for vnode arguments that are released by the various fs vop implementations. Wider use of VFS_VRELE will likely require restructuring of the code. Reviewed by: phk, dyson, terry et. al. Submitted by: Michael Hancock <michaelh@cet.co.jp>
33181	09-Feb-1998	eivind	Staticize.
33134	06-Feb-1998	eivind	Back out DIAGNOSTIC changes.
33121	05-Feb-1998	dyson	Fix an omission of a line from the previous commit to this file. The problem appeared to be an NFS hang.
33108	04-Feb-1998	eivind	Turn DIAGNOSTIC into a new-style option.
33054	03-Feb-1998	bde	Forward declare some structs so that this file is more self-sufficient.
32998	01-Feb-1998	bde	Moved declaration of `union nethostadr' outside of the KERNEL section, to give pollution compatible with <nfs/nqfs.h>. At least mount_nfs.c previously had to #define KERNEL before including <nfs/nfs.h> to get this pollution, but this gave other pollution. Moved comment about NFSINT_SIGMASK to immediately before the code that it applies to.
32997	01-Feb-1998	bde	Forward declare more structs that are used in prototypes here - don't depend on <sys/types.h> forward declaring common ones. Added an underscore to `sin' in prototypes to avoid warnings for the conflict with the ANSI sin().
32912	31-Jan-1998	tegge	Release the buffer when an error occurs while reading directory entries.
32755	25-Jan-1998	dyson	Various NFS fixes: Make vfs_bio buffer mgmt work better. Buffers were being used after brelse. Make nfs_getpages work independently of other NFS interfaces. This eliminates some difficult recursion problems and decreases pagefault overhead. Remove an erroneous vfs_unbusy_pages. Fix a reentrancy problem, with nfs_vinvalbuf when vnode is already being rundown. Reassignbuf wasn't being called when needed under certain circumstances. (Thanks to Bill Paul for help.)
32754	25-Jan-1998	dyson	Various NFS fixes: Make vfs_bio buffer mgmt work better. Buffers were being used after brelse. Make nfs_getpages work independently of other NFS interfaces. This eliminates some difficult recursion problems and decreases pagefault overhead. Remove an erroneous vfs_unbusy_pages. Fix a reentrancy problem, with nfs_vinvalbuf when vnode is already being rundown. Reassignbuf wasn't being called when needed under certain circumstances. (Thanks for help from Bill Paul.)
32609	18-Jan-1998	tegge	Increase the minimum bootp reply packet size from 16 (bogus) to 300 (correct).
32358	09-Jan-1998	eivind	Make the BOOTP family new-style options (in opt_bootp.h)
32350	08-Jan-1998	eivind	Make INET a proper option. This will not make any of object files that LINT create change; there might be differences with INET disabled, but hardly anything compiled before without INET anyway. Now the 'obvious' things will give a proper error if compiled without inet - ipx_ip, ipfw, tcp_debug. The only thing that _should_ work (but can't be made to compile reasonably easily) is sppp :-( This commit move struct arpcom from <netinet/if_ether.h> to <net/if_arp.h>.
32286	06-Jan-1998	dyson	Make our v_usecount vnode reference count work identically to the original BSD code. The association between the vnode and the vm_object no longer includes reference counts. The major difference is that vm_object's are no longer freed gratuitiously from the vnode, and so once an object is created for the vnode, it will last as long as the vnode does. When a vnode object reference count is incremented, then the underlying vnode reference count is incremented also. The two "objects" are now more intimately related, and so the interactions are now much less complex. When vnodes are now normally placed onto the free queue with an object still attached. The rundown of the object happens at vnode rundown time, and happens with exactly the same filesystem semantics of the original VFS code. There is absolutely no need for vnode_pager_uncache and other travesties like that anymore. A side-effect of these changes is that SMP locking should be much simpler, the I/O copyin/copyout optimizations work, NFS should be more ponderable, and further work on layered filesystems should be less frustrating, because of the totally coherent management of the vnode objects and vnodes. Please be careful with your system while running this code, but I would greatly appreciate feedback as soon a reasonably possible.
32071	29-Dec-1997	dyson	Lots of improvements, including restructring the caching and management of vnodes and objects. There are some metadata performance improvements that come along with this. There are also a few prototypes added when the need is noticed. Changes include: 1) Cleaning up vref, vget. 2) Removal of the object cache. 3) Nuke vnode_pager_uncache and friends, because they aren't needed anymore. 4) Correct some missing LK_RETRY's in vn_lock. 5) Correct the page range in the code for msync. Be gentle, and please give me feedback asap.
32011	27-Dec-1997	bde	Unspammed nested include of <vm/vm_zone.h>.
31886	20-Dec-1997	bde	Added a used include. Fixed a gratuitous ANSIism and nearby KNF violations.
31617	08-Dec-1997	dyson	Various of the ISP users have commented that the 1.41 version of the nfs_bio.c code worked better than the 1.44. This commit reverts the important parts of 1.44 to 1.41, and we will fix it when we can get a handle on the problem.
31391	24-Nov-1997	bde	Don't call malloc(..., M_WAITOK) at splnet(). Doing so is often a mistake (since softnet interrupts may occur if malloc() waits), and doing it harmlessly but unnecessarily here interfered with detection of the mistaken cases.
31132	12-Nov-1997	julian	Reviewed by: various. Ever since I first say the way the mount flags were used I've hated the fact that modes, and events, internal and exported, and short-term and long term flags are all thrown together. Finally it's annoyed me enough.. This patch to the entire FreeBSD tree adds a second mount flag word to the mount struct. it is not exported to userspace. I have moved some of the non exported flags over to this word. this means that we now have 8 free bits in the mount flags. There are another two that might well move over, but which I'm not sure about. The only user visible change would have been in pstat -v, except that davidg has disabled it anyhow. I'd still like to move the state flags and the 'command' flags apart from each other.. e.g. MNT_FORCE really doesn't have the same semantics as MNT_RDONLY, but that's left for another day.
31017	07-Nov-1997	phk	Rename some local variables to avoid shadowing other local variables. Found by: -Wshadow
31016	07-Nov-1997	phk	Remove a bunch of variables which were unused both in GENERIC and LINT. Found by: -Wunused
30994	06-Nov-1997	phk	Move the "retval" (3rd) parameter from all syscall functions and put it in struct proc instead. This fixes a boatload of compiler warning, and removes a lot of cruft from the sources. I have not removed the /ARGSUSED/, they will require some looking at. libkvm, ps and other userland struct proc frobbing programs will need recompiled.
30813	28-Oct-1997	bde	Removed unused #includes.
30808	28-Oct-1997	bde	Don't #include <nfs/nfs.h> in <nfs/nfs_node.h> if KERNEL is defined. Fixed everything that depended on the nested include.
30780	27-Oct-1997	bde	Removed unused #includes. The need for most of them went away with recent changes (docluster* and vfs improvements).
30743	26-Oct-1997	phk	VFS interior redecoration. Rename vn_default_error to vop_defaultop all over the place. Move vn_bwrite from vfs_bio.c to vfs_default.c and call it vop_stdbwrite. Use vop_null instead of nullop. Move vop_nopoll from vfs_subr.c to vfs_default.c Move vop_sharedlock from vfs_subr.c to vfs_default.c Move vop_nolock from vfs_subr.c to vfs_default.c Move vop_nounlock from vfs_subr.c to vfs_default.c Move vop_noislocked from vfs_subr.c to vfs_default.c Use vop_ebadf instead of *_ebadf. Add vop_defaultop for getpages on master vnode in MFS.
30738	26-Oct-1997	phk	Always initialize the syscall vectors for our "private" syscalls (not just in the LKM case). Plug nqnfs_vop_lease_check directly into the default_vnodeop_p table.
30496	16-Oct-1997	phk	VFS clean up "hekto commit" 1. Add defaults for more VOPs VOP_LOCK vop_nolock VOP_ISLOCKED vop_noislocked VOP_UNLOCK vop_nounlock and remove direct reference in filesystems. 2. Rename the nfsv2 vnop tables to improve sorting order.
30492	16-Oct-1997	phk	Another VFS cleanup "kilo commit" 1. Remove VOP_UPDATE, it is (also) an UFS/{FFS,LFS,EXT2FS,MFS} intereface function, and now lives in the ufsmount structure. 2. Remove VOP_SEEK, it was unused. 3. Add mode default vops: VOP_ADVLOCK vop_einval VOP_CLOSE vop_null VOP_FSYNC vop_null VOP_IOCTL vop_enotty VOP_MMAP vop_einval VOP_OPEN vop_null VOP_PATHCONF vop_einval VOP_READLINK vop_einval VOP_REALLOCBLKS vop_eopnotsupp And remove identical functionality from filesystems 4. Add vop_stdpathconf, which returns the canonical stuff. Use it in the filesystems. (XXX: It's probably wrong that specfs and fifofs sets this vop, shouldn't it come from the "host" filesystem, for instance ufs or cd9660 ?) 5. Try to make system wide VOP functions have vop_* names. 6. Initialize the um_* vectors in LFS. (Recompile your LKMS!!!)
30474	16-Oct-1997	phk	VFS mega cleanup commit (x/N) 1. Add new file "sys/kern/vfs_default.c" where default actions for VOPs go. Implement proper defaults for ABORTOP, BWRITE, LEASE, POLL, REVOKE and STRATEGY. Various stuff spread over the entire tree belongs here. 2. Change VOP_BLKATOFF to a normal function in cd9660. 3. Kill VOP_BLKATOFF, VOP_TRUNCATE, VOP_VFREE, VOP_VALLOC. These are private interface functions between UFS and the underlying storage manager layer (FFS/LFS/MFS/EXT2FS). The functions now live in struct ufsmount instead. 4. Remove a kludge of VOP_ functions in all filesystems, that did nothing but obscure the simplicity and break the expandability. If a filesystem doesn't implement VOP_FOO, it shouldn't have an entry for it in its vnops table. The system will try to DTRT if it is not implemented. There are still some cruft left, but the bulk of it is done. 5. Fix another VCALL in vfs_cache.c (thanks Bruce!)
30439	15-Oct-1997	phk	vnops megacommit 1. Use the default function to access all the specfs operations. 2. Use the default function to access all the fifofs operations. 3. Use the default function to access all the ufs operations. 4. Fix VCALL usage in vfs_cache.c 5. Use VOCALL to access specfs functions in devfs_vnops.c 6. Staticize most of the spec and fifofs vnops functions. 7. Make UFS panic if it lacks bits of the underlying storage handling.
30434	15-Oct-1997	phk	Hmm, realign the vnops into two columns.
30431	15-Oct-1997	phk	Stylistic overhaul of vnops tables. 1. Remove comment stating the blatantly obvious. 2. Align in two columns. 3. Sort all but the default element alphabetically. 4. Remove XXX comments pointing out entries not needed.
30430	15-Oct-1997	phk	When the default vnops funtion is vn_default_error(), there is no reason to implement small functions that just return EOPNOTSUPP for things we don't do. The removed functions only apply to UFS based filesystems anyway.
30354	12-Oct-1997	phk	Last major round (Unless Bruce thinks of somthing :-) of malloc changes. Distribute all but the most fundamental malloc types. This time I also remembered the trick to making things static: Put "static" in front of them. A couple of finer points by: bde
30309	11-Oct-1997	phk	Distribute and statizice a lot of the malloc M_* types. Substantial input from: bde
30118	05-Oct-1997	phk	Reverse rev 1.56 and rev 1.59. These made NFS too flakey.
29653	21-Sep-1997	dyson	Change the M_NAMEI allocations to use the zone allocator. This change plus the previous changes to use the zone allocator decrease the useage of malloc by half. The Zone allocator will be upgradeable to be able to use per CPU-pools, and has more intelligent usage of SPLs. Additionally, it has reasonable stats gathering capabilities, while making most calls inline.
29363	14-Sep-1997	peter	select -> poll flag missing vnode op table entries
29293	10-Sep-1997	phk	Don't repeat checks done at general level.
29291	10-Sep-1997	phk	Remove a couple of stubborn NetBSD #if's.
29288	10-Sep-1997	phk	unifdef -U__NetBSD__ -D__FreeBSD__
29202	07-Sep-1997	bde	Removed more vestiges of config-time swap configuration.
29024	02-Sep-1997	bde	Added used #include - don't depend on <sys/mbuf.h> including <sys/malloc.h> (unless we only use the bogusly shared M*WAIT flags).
28787	26-Aug-1997	phk	Uncut&paste cache_lookup(). This unifies several times in theory indentical 50 lines of code. The filesystems have a new method: vop_cachedlookup, which is the meat of the lookup, and use vfs_cache_lookup() for their vop_lookup method. vfs_cache_lookup() will check the namecache and pass on to the vop_cachedlookup method in case of a miss. It's still the task of the individual filesystems to populate the namecache with cache_enter(). Filesystems that do not use the namecache will just provide the vop_lookup method as usual.
28270	16-Aug-1997	wollman	Fix all areas of the system (or at least all those in LINT) to avoid storing socket addresses in mbufs. (Socket buffers are the one exception.) A number of kernel APIs needed to get fixed in order to make this happen. Also, fix three protocol families which kept PCBs in mbufs to not malloc them instead. Delete some old compatibility cruft while we're at it, and add some new routines in the in_cksum family.
27845	02-Aug-1997	bde	Removed unused #includes.
27609	22-Jul-1997	dfr	Correct some dumb mistakes in the WebNFS stuff. Submitted by: bde
27446	16-Jul-1997	dfr	Merge WebNFS changes from NetBSD. Obtained from: NetBSD
26995	27-Jun-1997	wpaul	Fix a condition where nfs_statfs() can precipitate a panic. There is code that says this: nfsm_request(vp, NFSPROC_FSSTAT, p, cred); if (v3) nfsm_postop_attr(vp, retattr); if (!error) nfsm_dissect(sfp, struct nfs_statfs , NFSX_STATFS(v3)); The problem here is that if error != 0, nfsm_dissect() will not be called, which leaves sfp == NULL. But nfs_statfs() does not bail out at this point: it continues processing until it tries to dereference sfp, which causes a panic. I was able to generate this crash under the following conditions: 1) Set up a machine as an NFS server and NFS client, with amd running (using NIS maps). /usr/local is exported, though any exported fs can can be used to trigger the bug. 2) Log in as normal user, with home directory mounted from a SunOS 4.1.3 NFS server via amd (along with a few other NFS filesystems from same machine). 3) Su to root and type the following: # mount localhost:/usr/local /mnt # df To fix the panic, I changed the code to read: if (!error) { nfsm_dissect(sfp, struct nfs_statfs , NFSX_STATFS(v3)); } else goto nfsmout; This is a bit kludgy in that nfsmout is a label defined by the nfsm_subs.h macros, but these macros are themselves more than a little kludgy. This stops the machine from crashing, but does not fix the overall bug: 'error' somehow becomes 5 (EIO) when a statfs() is performed on the locally mounted NFS filesystem. This seems to only happen the first time the filesystem is accesed: on subsequent accesses, it seems to work fine again. Now, I know there's no practical use in mounting a local filesystem via NFS, but doing it shouldn't cause the system to melt down.
26952	25-Jun-1997	tegge	Clear nfs_iodwant[myiod] when the nfsiod process exits due to a signal.
26929	25-Jun-1997	dfr	Avoid small synchronous writes when an application does lots of random-access short writes within a block (e.g. ld).
26928	25-Jun-1997	dfr	Make nfs_lookup return a NULLVP on error so that DIAGNOSTIC kernels don't panic.
26669	16-Jun-1997	dyson	Upgrade NFS to support the new vfs_bio resource/buffer management.
26581	12-Jun-1997	tegge	Move commonly used code into static functions in order to reduce kernel bloat.
26580	12-Jun-1997	tegge	Remove unused routines.
26469	06-Jun-1997	dfr	Fix a problem caused by removing large numbers of files from a directory which could cause a bad size to be given to uiomove, causing a page fault.
26420	03-Jun-1997	dfr	Various fixes from NetBSD: Use u_int for rpc procedure numbers. Some fixes to NQNFS. A rare NULL pointer dereference. Ignore NFSMNT_NOCONN for TCP mounts. Obtained from: NetBSD
26418	03-Jun-1997	dfr	Implement the async mount option for NFSv3. This makes NFS pretend that all writes sent to the server were synchronous and therefore no commits are needed. This is the same as the vfs.nfs.async variable on the server but allows each client to choose whether to work this way. Also make the vfs.nfs.async variable do the 'right' thing for NFSv3, i.e. pretend that the write was synchronous.
26410	03-Jun-1997	dfr	Fix a problem with nfs_flush where if many B_NEEDCOMMIT buffers are attached to the vnode, some of them could be re-written synchronously (if they overflowed the fixed size array nfs_flush had for them). The fix involves mallocing an array if there are more than its limited size stack buffer. Reviewed by: Hidetoshi Shimokawa <simokawa@sat.t.u-tokyo.ac.jp>
26409	03-Jun-1997	dfr	Fix some performance problems with the NFS mmap fixes.
25952	20-May-1997	dfr	Plug a memory leak in nfs_link. PR: kern/1001
25930	19-May-1997	dfr	Fix a few bugs with NFS and mmap caused by NFS' use of b_validoff and b_validend. The changes to vfs_bio.c are a bit ugly but hopefully can be tidied up later by a slight redesign. PR: kern/2573, kern/2754, kern/3046 (possibly) Reviewed by: dyson
25877	17-May-1997	phk	Remove redundant check for vp == dvp (done in VFS before calling).
25804	14-May-1997	tegge	Use same syntax as netboot for root and swap mounts. Handle mount options. Ignore T16 (swap server address) and T6 (DNS server).
25785	13-May-1997	dfr	Check the B_CLUSTER flag when choosing whether to use unstable or filesync writes. PR: kern/3438 Submitted by: Tor Egge <Tor.Egge@idi.ntnu.no>
25781	13-May-1997	dfr	Don't keep addresses in mbuf chains. This should simplify the next round of network changes from Garret. Reviewed by: Garrett Wollman <wollman@khavrinen.lcs.mit.edu>
25755	12-May-1997	tegge	Use the old nfs arguments in the nfs_diskless structure, to be compatible with boot proms made from the 2.2 source. Convert the nfs arguments when copying to the new diskless structure. Copy the gateway field in the diskless structure.
25723	11-May-1997	tegge	Bring in some kernel bootp support. This removes the need for netboot to fill in the nfs_diskless structure, at the cost of some kernel bloat. The advantage is that this code works on a wider range of network adapters than netboot. Several new kernel options are documented in LINT. Obtained from: parts of the code comes from NetBSD.
25664	10-May-1997	dfr	Implement a separate control for write gathering on NFSv3. This is turned off for NFSv3 by default since write gathering seems to reduce performance for NFSv3 by up to 60%. Add sysctl knobs to control both variables.
25663	10-May-1997	dfr	Fix a nasty hang connected with write gathering. Also add debug print statements to bits of the server which helped me find the hang.
25611	09-May-1997	dfr	Prevent a mapped root which appears on the server as e.g. nobody from accessing files which it shouldn't be able to. This required a better approximation of VOP_ACCESS for NFSv2 (NFSv3 already has an ACCESS rpc which is a better solution) and adding a call to VOP_ACCESS from VOP_LOOKUP. PR: kern/876, kern/2635 Submitted by: David Malone <dwmalone@maths.tcd.ie> (for kern/2635)
25610	09-May-1997	dfr	Fix memory leak caused by the fact that the directory offset cookies and the sillyrename information are stored in the same place.
25459	04-May-1997	phk	Now I can even execute "df" on my diskless :-)
25453	04-May-1997	phk	1. Add a {pointer, v_id} pair to the vnode to store the reference to the ".." vnode. This is cheaper storagewise than keeping it in the namecache, and it makes more sense since it's a 1:1 mapping. 2. Also handle the case of "." more intelligently rather than stuff the namecache with pointless entries. 3. Add two lists to the vnode and hang namecache entries which go from or to this vnode. When cleaning a vnode, delete all namecache entries it invalidates. 4. Never reuse namecache enties, malloc new ones when we need it, free old ones when they die. No longer a hard limit on how many we can have. 5. Remove the upper limit on namelength of namecache entries. 6. Make a global list for negative namecache entries, limit their number to a sysctl'able (debug.ncnegfactor) fraction of the total namecache. Currently the default fraction is 1/16th. (Suggestions for better default wanted!) 7. Assign v_id correctly in the face of 32bit rollover. 8. Remove the LRU list for namecache entries, not needed. Remove the #ifdef NCH_STATISTICS stuff, it's not needed either. 9. Use the vnode freelist as a true LRU list, also for namecache accesses. 10. Reuse vnodes more aggresively but also more selectively, if we can't reuse, malloc a new one. There is no longer a hard limit on their number, they grow to the point where we don't reuse potentially usable vnodes. A vnode will not get recycled if still has pages in core or if it is the source of namecache entries (Yes, this does indeed work :-) "." and ".." are not namecache entries any longer...) 11. Do not overload the v_id field in namecache entries with whiteout information, use a char sized flags field instead, so we can get rid of the vpid and v_id fields from the namecache struct. Since we're linked to the vnodes and purged when they're cleaned, we don't have to check the v_id any more. 12. NFS knew about the limitation on name length in the namecache, it shouldn't and doesn't now. Bugs: The namecache statistics no longer includes the hits for ".." and "." hits. Performance impact: Generally in the +/- 0.5% for "normal" workstations, but I hope this will allow the system to be selftuning over a bigger range of "special" applications. The case where RAM is available but unused for cache because we don't have any vnodes should be gone. Future work: Straighten out the namecache statistics. "desiredvnodes" is still used to (bogusly ?) size hash tables in the filesystems. I have still to find a way to safely free unused vnodes back so their number can shrink when not needed. There is a few uses of the v_id field left in the filesystems, scheduled for demolition at a later time. Maybe a one slot cache for unused namecache entries should be implemented to decrease the malloc/free frequency.
25416	03-May-1997	phk	Make nfs roots (diskless) functional again. It may still not be correct, but it is functional.
25307	30-Apr-1997	dfr	Allow NULL rpcs on non-privileged ports at all times to work around broken clients. PR: kern/3298 Submitted by: Tor Egge <Tor.Egge@idi.ntnu.no>
25201	27-Apr-1997	wollman	The long-awaited mega-massive-network-code- cleanup. Part I. This commit includes the following changes: 1) Old-style (pr_usrreq()) protocols are no longer supported, the compatibility glue for them is deleted, and the kernel will panic on boot if any are compiled in. 2) Certain protocol entry points are modified to take a process structure, so they they can easily tell whether or not it is possible to sleep, and also to access credentials. 3) SS_PRIV is no more, and with it goes the SO_PRIVSTATE setsockopt() call. Protocols should use the process pointer they are now passed. 4) The PF_LOCAL and PF_ROUTE families have been updated to use the new style, as has the `raw' skeleton family. 5) PF_LOCAL sockets now obey the process's umask when creating a socket in the filesystem. As a result, LINT is now broken. I'm hoping that some enterprising hacker with a bit more time will either make the broken bits work (should be easy for netipx) or dike them out.
25089	22-Apr-1997	dfr	Fix broken usage of nm_readdirsize and increase the socket buffers for UDP to prevent possible socket overflows. 2.2 candidate. PR: kern/3304 Reviewed by: Thomas David Rivers <ponds!rivers@dg-rtp.dg.com>
25023	19-Apr-1997	dfr	Fix a bug where a program which appended many small records to a file could wind up writing zeros instead of real data when the file is on an NFSv2 mounted directory. While tracking this bug down, I noticed that nfs_asyncio was waking all the iods when a block was written instead of just one per block. Fixing this gives a 25% performance improvment for writes on v2 (less for v3). Both are 2.2 candidates. PR: kern/2774
25003	18-Apr-1997	dfr	Don't allow partial buffers to be cluster-comitted. Zero the b_dirty{off,end} after cluster-comitting a group of buffers. With these fixes, I was able to complete a 'make world' with remote src and obj directories.
24626	04-Apr-1997	dfr	Fix various bugs in the locking protocol, allowing proper shared locks to be used. This should fix the lock panics that people are seeing.
24577	03-Apr-1997	dfr	The code which recovered from a modified directory situation did not check for eof when re-caching the directory. This could cause it to loop forever if a directory was truncated.
24381	29-Mar-1997	bde	Removed #include of <ufs/ufs/dir.h>. Nfs no longer depends on any ufs features, and the one thing that it depended on (DIRBLKSIZ) now has conflicting spelling.
24378	29-Mar-1997	bde	Define our own version of DIRBLKSIZ instead of (ab)using ufs's value. Use the same value of 512 (ufs actually uses DEV_BSIZE). There are too many versions of DIRBLKSIZ, one for ufs, one for ext2fs, one for nfs, one for ibcs2, one for linux, one for applications, ... I think nfs's DIRBLKSIZ needs to be a divisor of the directory blocks sizes of all supported file systems. There is also NFS_DIRBLKSIZ, which is different from nfs's DIRBLKSIZ but is sometimes confused with it in comments. Removed a bogus #ifdef KERNEL that hid the tunable constants for nfs. This came in undocumented with the Lite2 merge although it isn't in Lite2. It required more-bogus #define KERNEL's in fstat and pstat to make the constants visible. Restored a spelling fix from rev.1.17. Removed duplicate #defines of all the the NFS mount option flags.
24330	27-Mar-1997	guido	Add code that will reject nfs requests in teh kernel from nonprivileged ports. This option will be automatically set/cleraed when mount is run without/with the -n option. Reviewed by: Doug Rabson
24204	24-Mar-1997	bde	Don't include <sys/ioctl.h> in the kernel. Stage 2: include <sys/sockio.h> instead of <sys/ioctl.h> in network files.
24101	22-Mar-1997	bde	Fixed some invalid (non-atomic) accesses to `time', mostly ones of the form `tv = time'. Use a new function gettime(). The current version just forces atomicicity without fixing precision or efficiency bugs. Simplified some related valid accesses by using the central function.
23570	09-Mar-1997	bde	YAMInTheWrongDirectionF22 (part of rev.1.28.2.3: set B_CLUSTEROK for commits).
23218	28-Feb-1997	bde	Fixed a panic in nfs_writevp(). Lite2 provided a fix for a silly missing-parentheses bug, but this exposed a misplaced vfs_busy_pages(). This bug cost a factor of 2.5-3 in nfsv3 write performance! It should be fixed in 2.2. Removed some debugging code that gets triggered often in normal operation. There are still many backwards diagnostics (#define DIAGNOSTIC gives no diagnostics). Submitted by: vfs_busy_pages() fix by dfr
22975	22-Feb-1997	peter	Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not ready for it yet.
22871	18-Feb-1997	bde	Changed `#ifdef COMPAT_PRELITE2' to `#ifndef NO_COMPAT_PRELITE2' so that old nfs mount calls are supported by default.
22521	10-Feb-1997	dyson	This is the kernel Lite/2 commit. There are some requisite userland changes, so don't expect to be able to run the kernel as-is (very well) without the appropriate Lite/2 userland changes. The system boots and can mount UFS filesystems. Untested: ext2fs, msdosfs, NFS Known problems: Incorrect Berkeley ID strings in some files. Mount_std mounts will not work until the getfsent library routine is changed. Reviewed by: various people Submitted by: Jeffery Hsu <hsu@freebsd.org>
21673	14-Jan-1997	jkh	Make the long-awaited change from $Id$ to $FreeBSD$ This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long. Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.
21124	31-Dec-1996	wpaul	Fix (properly, I hope) 'panic: sillyrename dir' crash that can happen if you do: % cd /nfsdir % mkdir -p foo/foo % mv foo/foo . nfs_sillyrename() self-destructs if you try to sillyrename a directory, however nfs_rename() can be coerced into doing just that by the above sequence of commands. To avoid this, nfs_rename() now checks that v_type of the 'destination' vnode != VDIR before attempting the sillyrename. The server correctly handles this particular situation by returning ENOTEMPTY on the rename() attempt. I asked if this was the correct fix for this on -hackers but nobody ever answered. This is a 2.2 candidate.
20407	13-Dec-1996	wollman	Convert the interface address and IP interface address structures to TAILQs. Fix places which referenced these for no good reason that I can see (the references remain, but were fixed to compile again; they are still questionable).
19449	06-Nov-1996	dfr	Improve the queuing algorithms used by NFS' asynchronous i/o. The existing mechanism uses a global queue for some buffers and the vp->b_dirtyblkhd queue for others. This turns sequential writes into randomly ordered writes to the server, affecting both read and write performance. The existing mechanism also copes badly with hung servers, tending to block accesses to other servers when all the iods are waiting for a hung server. The new mechanism uses a queue for each mount point. All asynchronous i/o goes through this queue which preserves the ordering of requests. A simple mechanism ensures that the iods are shared out fairly between active mount points. This removes the sysctl variable vfs.nfs.dwrite since the new queueing mechanism removes the old delayed write code completely. This should go into the 2.2 branch.
19070	21-Oct-1996	dfr	If a large (>4096 bytes) directory was modified, the old directory contents are discarded, including the cached seek cookies. Unfortunately, if the directory was larger than NFS_DIRBLKSIZ, then this confused nfs_readdirrpc(), making it appear as if the directory was truncated. Reviewed by: Karl Denninger <karl@Mcs.Net>
19058	20-Oct-1996	phk	Add four sysctl variables that joerg wanted.
18888	12-Oct-1996	bde	Staticized `nfs_dwrite'.
18866	11-Oct-1996	dfr	This fixes a problem with the nfs socket handling code which happens if a single process is performing a large number of requests (in this case writing a large file). The writing process could monopolise the recieve lock and prevent any other processes from recieving their replies. It also adds a new sysctl variable 'vfs.nfs.dwrite' which controls the behaviour which originally pointed out the problem. When a process writes to a file over NFS, it usually arranges for another process (the 'iod') to perform the request. If no iods are available, then it turns the write into a 'delayed write' which is later picked up by the next iod to do a write request for that file. This can cause that particular iod to do a disproportionate number of requests from a single process which can harm performance on some NFS servers. The alternative is to perform the write synchronously in the context of the original writing process if no iod is avaiable for asynchronous writing. The 'delayed write' behaviour is selected when vfs.nfs.dwrite=1 and the non-delayed behaviour is selected when vfs.nfs.dwrite=0. The default is vfs.nfs.dwrite=1; if many people tell me that performance is better if vfs.nfs.dwrite=0 then I will change the default. Submitted by: Hidetoshi Shimokawa <simokawa@sat.t.u-tokyo.ac.jp>
18397	19-Sep-1996	nate	In sys/time.h, struct timespec is defined as: /* * Structure defined by POSIX.4 to be like a timeval. / struct timespec { time_t ts_sec; / seconds / long ts_nsec; / and nanoseconds */ }; The correct names of the fields are tv_sec and tv_nsec. Reminded by: James Drobina <jdrobina@infinet.com>
17761	21-Aug-1996	dyson	Even though this looks like it, this is not a complex code change. The interface into the "VMIO" system has changed to be more consistant and robust. Essentially, it is now no longer necessary to call vn_open to get merged VM/Buffer cache operation, and exceptional conditions such as merged operation of VBLK devices is simpler and more correct. This code corrects a potentially large set of problems including the problems with ktrace output and loaded systems, file create/deletes, etc. Most of the changes to NFS are cosmetic and name changes, eliminating a layer of subroutine calls. The direct calls to vput/vrele have been re-instituted for better cross platform compatibility. Reviewed by: davidg
17186	16-Jul-1996	dfr	Various fixes from frank@fwi.uva.nl (Frank van der Linden) via rick@snowhite.cis.uoguelph.ca: 1. Clear B_NEEDCOMMIT in nfs_write to make sure that dirty data is correctly send to the server. If a buffer was dirtied when it was in the B_DELWRI+B_NEEDCOMMIT state, the state of the buffer was left unchanged and when the buffer was later cleaned, just a commit rpc was made to the server to complete the previous write. Clearing B_NEEDCOMMIT ensures that another write is made to the server. 2. If a server returned a server (for whatever reason) returned an answer to a write RPC that implied that fewer bytes than requested were written, bad things would happen. 3. The setattr operation passed on the atime in stead of the mtime to the server. The fix is trivial. 4. XIDs always started at 0, but this caused some servers (older DEC OSF/1 3.0 so I've been told) who had very long-lasting XID caches to get confused if, after a reboot of a BSD client, RPCs came in with a XID that had in the past been used before from that client. Patch is to use the current time in seconds as a starting point for XIDs. The patch below is not perfect, because it requires the root fs to be mounted first. This is because of the check BSD systems do, comparing FS time to system time. Reviewed by: Bruce Evans, Terry Lambert. Obtained from: frank@fwi.uva.nl (Frank van der Linden) via rick@snowhite.cis.uoguelph.ca
17096	11-Jul-1996	wollman	Modify the kernel to use the new pr_usrreqs interface rather than the old pr_usrreq mechanism which was poorly designed and error-prone. This commit renames pr_usrreq to pr_ousrreq so that old code which depended on it would break in an obvious manner. This commit also implements the new interface for TCP, although the old function is left as an example (#ifdef'ed out). This commit ALSO fixes a longstanding bug in the TCP timer processing (introduced by davidg on 1995/04/12) which caused timer processing on a TCB to always stop after a single timer had expired (because it misinterpreted the return value from tcp_usrreq() to indicate that the TCB had been deleted). Finally, some code related to polling has been deleted from if.c because it is not relevant t -current and doesn't look at all like my current code.
16634	23-Jun-1996	bde	Don't truncate minor or major numbers in the nfsv3 client.
16365	14-Jun-1996	phk	Fix for NFS_NOSERVER Poul mentioned that he thought this was some kind of timing problem, and that started me thinking. After a little poking around, I found that nfs_timer() was completely disabled when NFS_NOSERVER was #defined. But after looking at nfs_timer(), it seemed like it was something required by both the client and server code, and disabling it outright just didn't seem to make any sense. Parts of it relate only to the NFS server side code, so I disabled those, but I re-enabled the rest of the function and made sure that it would be called from nfs_init() (in nfs_subs.c). With nfs_timer() re-enabled, everything seems to work again. The only other changes I made were to #ifdef away some variable declarations in the NFS_NOSERVER case so that gcc would stop complaining about unused variables. Reviewed by: phk Submitted by: Bill Paul <wpaul@skynet.ctr.columbia.edu>
16312	12-Jun-1996	dg	Moved the fsnode MALLOC to before the call to getnewvnode() so that the process won't possibly block before filling in the fsnode pointer (v_data) which might be dereferenced during a sync since the vnode is put on the mnt_vnodelist by getnewvnode. Pointed out by Matt Day <mday@artisoft.com>
16192	08-Jun-1996	pst	Clear flags before using an inactive buffer. This is a kludge, but matches the code in bread(). Reviewed by: bde
15543	02-May-1996	phk	removed: CLBYTES PD_SHIFT PGSHIFT NBPG PGOFSET CLSIZELOG2 CLSIZE pdei() ptei() kvtopte() ptetov() ispt() ptetoav() &c &c new: NPDEPG Major macro cleanup.
15480	30-Apr-1996	bde	#include <sys/filedesc.h> explicitly instead of depending on it being bogusly included by <sys/socketvar.h>.
15479	30-Apr-1996	bde	Fixed nfs sysctls. They missed out on the fs -> vfs name changes from Lite2. This broke nfsstat.
14093	13-Feb-1996	wollman	Kill XNS. While we're at it, fix socreate() to take a process argument. (This was supposed to get committed days ago...)
13765	30-Jan-1996	mpp	Fix a bunch of spelling errors in the comment fields of a bunch of system include files.
13625	25-Jan-1996	bde	Fixed spelling of s_namlen so that this compiles again.
13619	24-Jan-1996	phk	Use new printf features rather than local kludges.
13612	24-Jan-1996	mpp	Add a check to prevent a computation from underflowing and causing a panic due to an attaempt to allocate a buffer for a terabyte or so of data when an attempt is made to create sparse data (e.g. a holey file) more than 1 block past the end of the file. Note: some other areas of this code need to be looked at, since they might cause problems when the file size exceeds 2GB, due to storing results in ints when the computations are being done with quad sized variables. Reviewed by: bde
13490	19-Jan-1996	dyson	Eliminated many redundant vm_map_lookup operations for vm_mmap. Speed up for vfs_bio -- addition of a routine bqrelse to greatly diminish overhead for merged cache. Efficiency improvement for vfs_cluster. It used to do alot of redundant calls to cluster_rbuild. Correct the ordering for vrele of .text and release of credentials. Use the selective tlb update for 486/586/P6. Numerous fixes to the size of objects allocated for files. Additionally, fixes in the various pagers. Fixes for proper positioning of vnode_pager_setsize in msdosfs and ext2fs. Fixes in the swap pager for exhausted resources. The pageout code will not as readily thrash. Change the page queue flags (PG_ACTIVE, PG_INACTIVE, PG_FREE, PG_CACHE) into page queue indices (PQ_ACTIVE, PQ_INACTIVE, PQ_FREE, PQ_CACHE), thereby improving efficiency of several routines. Eliminate even more unnecessary vm_page_protect operations. Significantly speed up process forks. Make vm_object_page_clean more efficient, thereby eliminating the pause that happens every 30seconds. Make sequential clustered writes B_ASYNC instead of B_DELWRI even in the case of filesystems mounted async. Fix a panic with busy pages when write clustering is done for non-VMIO buffers.
13416	13-Jan-1996	phk	Add an option NFS_NOSERVER which saves 100K in the install kernel (or any other kernel that uses it). Use with option NFS.
13084	28-Dec-1995	phk	Don't print swap server as root server. Submitted by: Mattias.Gronlund@sa.erisoft.se (Mattias Gronlund)
12970	22-Dec-1995	phk	Move fs.nfs.nfsstats sysctl var back to it's old OID.
12911	17-Dec-1995	phk	Staticize.
12662	07-Dec-1995	dg	Untangled the vm.h include file spaghetti.
12588	03-Dec-1995	bde	Completed function declarations and/or added prototypes and/or moved prototypes to the right place.
12457	21-Nov-1995	bde	Completed function declarations, added prototypes and removed redundant declarations.
12453	21-Nov-1995	bde	Completed function declarations and/or added prototypes.
12287	14-Nov-1995	phk	Get rid of hostnamelen variable.
12274	14-Nov-1995	bde	Included <sys/sysproto.h> to get central declarations for syscall args structs and prototypes for syscalls. Ifdefed duplicated decentralized declarations of args structs. It's convenient to have this visible but they are hard to maintain. Some are already different from the central declarations. 4.4lite2 puts them in comments in the function headers but I wanted to avoid the large changes for that.
12158	09-Nov-1995	bde	Introduced a type `vop_t' for vnode operation functions and used it 1138 times (:-() in casts and a few more times in declarations. This change is null for the i386. The type has to be `typedef int vop_t(void *)' and not `typedef int vop_t()' because `gcc -Wstrict-prototypes' warns about the latter. Since vnode op functions are called with args of different (struct pointer) types, neither of these function types is any use for type checking of the arg, so it would be preferable not to use the complete function type, especially since using the complete type requires adding 1138 casts to avoid compiler warnings and another 40+ casts to reverse the function pointer conversions before calling the functions.
12118	06-Nov-1995	bde	Replaced bogus macros for dummy devswitch entries by functions. These functions went away: enosys (hasn't been used for some time) enxio enodev enoioctl (was used only once, actually for a vop) if_tun.c: Continued cleaning up... conf.h: Probably fixed the type of d_reset_t. It is hard to tell the correct type because there are no non-dummy device reset functions. Removed last vestige of ambiguous sleep message strings.
11982	31-Oct-1995	joerg	Include a prerequisite header (so this is consistent again with the NFSv2 state).
11921	29-Oct-1995	phk	Second batch of cleanup changes. This time mostly making a lot of things static and some unused variables here and there.
11645	22-Oct-1995	dg	Fix order problem: unbusy pages before releasing the buffer. Submitted by: John Dyson <dyson>
11644	22-Oct-1995	dg	Moved the filesystem read-only check out of the syscalls and into the filesystem layer, as was done in lite-2. Merged in some other cosmetic changes while I was at it. Rewrote most of msdosfs_access() to be more like ufs_access() and to include the FS read-only check. Obtained from: partially from 4.4BSD-lite2
10551	04-Sep-1995	dyson	Added VOP_GETPAGES/VOP_PUTPAGES and also the "backwards" block count for VOP_BMAP. Updated affected filesystems...
10478	30-Aug-1995	dfr	Make nfs diskless work again. Reviewed by: John Hay <jhay@mikom.csir.co.za>
10223	24-Aug-1995	dg	Killed redundant declarations of nfsm_rpchead().
10222	24-Aug-1995	dfr	Some fixes found using gcc -Wall: nfsm_rpchead() has been called with the wrong number of args and misplaced args since someone added new args in the middle for nfsv3. Here's another one that would be important on 64-bit systems. VOP_READDIR takes a `u_int **cookies' arg. Submitted by: Bruce Evans <bde@zeta.org.au>
10219	24-Aug-1995	dfr	Add support for amd direct maps. Reviewed by: Thomas Graichen <graichen@sirius.physik.fu-berlin.de>
10027	11-Aug-1995	dg	Converted mountlist to a CIRCLEQ. Partially obtained from: 4.4BSD-Lite2
9842	01-Aug-1995	dg	Removed my special-case hack for VOP_LINK and fixed the problem with the wrong vp's ops vector being used by changing the VOP_LINK's argument order. The special-case hack doesn't go far enough and breaks the generic bypass routine used in some non-leaf filesystems. Pointed out by Kirk McKusick.
9759	29-Jul-1995	bde	Eliminate sloppy common-style declarations. There should be none left for the LINT configuation.
9681	24-Jul-1995	dfr	Slightly better fix than previous revision. Submitted by: Rick Macklem <rick@snowhite.cis.uoguelph.ca>
9679	24-Jul-1995	dfr	Fix a problem which appeared to truncate a file to the nearest block boundary when it is moved to an NFS filesystem from from another filesystem and /bin/mv failed to set the file ownership during the move. I believe that this bug is present in STABLE but I have not tested it. The fix would be the same in STABLE even though the code has changed quite considerably in CURRENT.
9627	22-Jul-1995	dg	Correct my cut-'n-paste job from ffs_vfsops.c and fix up the formatting to be similar. Submitted by: Bruce Evans
9604	21-Jul-1995	dg	Implemented an nfs_node hash list lock, similar to what was implemented in ffs_vget(), and for the same reason: to prevent a race condition that results in duplicate vnodes/NFSnodes being allocated.
9588	20-Jul-1995	dg	vnode_pager_alloc() never returns NULL, so don't check for it.
9522	13-Jul-1995	dfr	I believe that the following fix to nfs_vnops.c should do the trick w.r.t. the problem "when a file is truncated on the server after being written on a client under NFSv3, the client doesn't see the size drop to zero". (As you noted, the problem is that NMODIFIED wasn't being cleared by nfs_close when it flushed the buffers. After checking through the code, the only place where NMODIFIED was used to test for the possibility of dirty blocks was in nfs_setattr(). The two cases are safe to do when there aren't dirty blocks, so I just took out the tests. Unfortunately, testing for v_dirtyblkhd.lh_first being non-null is not sufficient, since there are times when the code moves blocks to the clean list and then back to the dirty list.) Submitted by: rick@snowhite.cis.uoguelph.ca
9507	13-Jul-1995	dg	NOTE: libkvm, w, ps, 'top', and any other utility which depends on struct proc or any VM system structure will have to be rebuilt!!! Much needed overhaul of the VM system. Included in this first round of changes: 1) Improved pager interfaces: init, alloc, dealloc, getpages, putpages, haspage, and sync operations are supported. The haspage interface now provides information about clusterability. All pager routines now take struct vm_object's instead of "pagers". 2) Improved data structures. In the previous paradigm, there is constant confusion caused by pagers being both a data structure ("allocate a pager") and a collection of routines. The idea of a pager structure has escentially been eliminated. Objects now have types, and this type is used to index the appropriate pager. In most cases, items in the pager structure were duplicated in the object data structure and thus were unnecessary. In the few cases that remained, a un_pager structure union was created in the object to contain these items. 3) Because of the cleanup of #1 & #2, a lot of unnecessary layering can now be removed. For instance, vm_object_enter(), vm_object_lookup(), vm_object_remove(), and the associated object hash list were some of the things that were removed. 4) simple_lock's removed. Discussion with several people reveals that the SMP locking primitives used in the VM system aren't likely the mechanism that we'll be adopting. Even if it were, the locking that was in the code was very inadequate and would have to be mostly re-done anyway. The locking in a uni-processor kernel was a no-op but went a long way toward making the code difficult to read and debug. 5) Places that attempted to kludge-up the fact that we don't have kernel thread support have been fixed to reflect the reality that we are really dealing with processes, not threads. The VM system didn't have complete thread support, so the comments and mis-named routines were just wrong. We now use tsleep and wakeup directly in the lock routines, for instance. 6) Where appropriate, the pagers have been improved, especially in the pager_alloc routines. Most of the pager_allocs have been rewritten and are now faster and easier to maintain. 7) The pagedaemon pageout clustering algorithm has been rewritten and now tries harder to output an even number of pages before and after the requested page. This is sort of the reverse of the ideal pagein algorithm and should provide better overall performance. 8) Unnecessary (incorrect) casts to caddr_t in calls to tsleep & wakeup have been removed. Some other unnecessary casts have also been removed. 9) Some almost useless debugging code removed. 10) Terminology of shadow objects vs. backing objects straightened out. The fact that the vm_object data structure escentially had this backwards really confused things. The use of "shadow" and "backing object" throughout the code is now internally consistent and correct in the Mach terminology. 11) Several minor bug fixes, including one in the vm daemon that caused 0 RSS objects to not get purged as intended. 12) A "default pager" has now been created which cleans up the transition of objects to the "swap" type. The previous checks throughout the code for swp->pg_data != NULL were really ugly. This change also provides the rudiments for future backing of "anonymous" memory by something other than the swap pager (via the vnode pager, for example), and it allows the decision about which of these pagers to use to be made dynamically (although will need some additional decision code to do this, of course). 13) (dyson) MAP_COPY has been deprecated and the corresponding "copy object" code has been removed. MAP_COPY was undocumented and non- standard. It was furthermore broken in several ways which caused its behavior to degrade to MAP_PRIVATE. Binaries that use MAP_COPY will continue to work correctly, but via the slightly different semantics of MAP_PRIVATE. 14) (dyson) Sharing maps have been removed. It's marginal usefulness in a threads design can be worked around in other ways. Both #12 and #13 were done to simplify the code and improve readability and maintain- ability. (As were most all of these changes) TODO: 1) Rewrite most of the vnode pager to use VOP_GETPAGES/PUTPAGES. Doing this will reduce the vnode pager to a mere fraction of its current size. 2) Rewrite vm_fault and the swap/vnode pagers to use the clustering information provided by the new haspage pager interface. This will substantially reduce the overhead by eliminating a large number of VOP_BMAP() calls. The VOP_BMAP() filesystem interface should be improved to provide both a "behind" and "ahead" indication of contiguousness. 3) Implement the extended features of pager_haspage in swap_pager_haspage(). It currently just says 0 pages ahead/behind. 4) Re-implement the swap device (swstrategy) in a more elegant way, perhaps via a much more general mechanism that could also be used for disk striping of regular filesystems. 5) Do something to improve the architecture of vm_object_collapse(). The fact that it makes calls into the swap pager and knows too much about how the swap pager operates really bothers me. It also doesn't allow for collapsing of non-swap pager objects ("unnamed" objects backed by other pagers).
9456	09-Jul-1995	dg	Moved call to VOP_GETATTR() out of vnode_pager_alloc() and into the places that call vnode_pager_alloc() so that a failure return can be dealt with. This fixes a panic seen on NFS clients when a file being opened is deleted on the server before the open completes.
9428	07-Jul-1995	dfr	Use a consistent blocksize for sizing bufs to avoid panicing the bio system.
9365	28-Jun-1995	dfr	Use the correct cred for nfs_commit operations.
9356	28-Jun-1995	dg	1) Converted v_vmdata to v_object. 2) Removed unnecessary vm_object_lookup()/pager_cache(object, TRUE) pairs after vnode_pager_alloc() calls - the object is already guaranteed to be persistent. 3) Removed some gratuitous casts.
9354	28-Jun-1995	dg	Fixed VOP_LINK argument order botch.
9336	27-Jun-1995	dfr	Changes to support version 3 of the NFS protocol. The version 2 support has been tested (client+server) against FreeBSD-2.0, IRIX 5.3 and FreeBSD-current (using a loopback mount). The version 2 support is stable AFAIK. The version 3 support has been tested with a loopback mount and minimally against an IRIX 5.3 server. It needs more testing and may have problems. I have patched amd to support the new variable length filehandles although it will still only use version 2 of the protocol. Before booting a kernel with these changes, nfs clients will need to at least build and install /usr/sbin/mount_nfs. Servers will need to build and install /usr/sbin/mountd. NFS diskless support is untested. Obtained from: Rick Macklem <rick@snowhite.cis.uoguelph.ca>
9221	14-Jun-1995	joerg	The duplicate information returned in fa_type and fa_mode is an ambiguity in the NFS version 2 protocol. VREG should be taken literally as a regular file. If a server intents to return some type information differently in the upper bits of the mode field (e.g. for sockets, or FIFOs), NFSv2 mandates fa_type to be VNON. Anyway, we leave the examination of the mode bits even in the VREG case to avoid breakage for bogus servers, but we make sure that there are actually type bits set in the upper part of fa_mode (and failing that, trust the va_type field). NFSv3 cleared the issue, and requires fa_mode to not contain any type information (while also introduing sockets and FIFOs for fa_type). The fix has been tested against a variety of NFS servers. It fixes problems with the ``Tropic'' NFS server for Windows, while apparently not breaking anything. Pointed-out by: scott@zorch.sf-bay.org (Scott Hazen Mueller)
9202	11-Jun-1995	rgrimes	Merge RELENG_2_0_5 into HEAD
8876	30-May-1995	rgrimes	Remove trailing whitespace.
8832	29-May-1995	dg	Fixed some serious bugs that resulted in object reference counts not being handled correctly. This would manifest itself as "object deallocated too many times" panics and perhaps other strange inconsistencies on NFS servers. Reviewed by: me, of course Submitted by: John Dyson
8692	21-May-1995	dg	Changes to fix the following bugs: 1) Files weren't properly synced on filesystems other than UFS. In some cases, this lead to lost data. Most likely would be noticed on NFS. The fix is to make the VM page sync/object_clean general rather than in each filesystem. 2) Mixing regular and mmaped file I/O on NFS was very broken. It caused chunks of files to end up as zeroes rather than the intended contents. The fix was to fix several race conditions and to kludge up the "b_dirtyoff" and "b_dirtyend" that NFS relies upon - paying attention to page modifications that occurred via the mmapping. Reviewed by: David Greenman Submitted by: John Dyson
8504	14-May-1995	dg	Changed swap partition handling/allocation so that it doesn't require specific partitions be mentioned in the kernel config file ("swap on foo" is now obsolete). From Poul-Henning: The visible effect is this: As default, unless options "NSWAPDEV=23" is in your config, you will have four swap-devices. You can swapon(2) any block device you feel like, it doesn't have to be in the kernel config. There is a performance/resource win available by getting the NSWAPDEV right (but only if you have just one swap-device ??), but using that as default would be too restrictive. The invisible effect is that: Swap-handling disappears from the $arch part of the kernel. It gets a lot simpler (-145 lines) and cleaner. Reviewed by: John Dyson, David Greenman Submitted by: Poul-Henning Kamp, with minor changes by me.
7969	21-Apr-1995	dyson	Slight re-ordering of the creation of a vmio object to fix a condition that can cause NFS I/O failures.
7871	16-Apr-1995	dg	Various fixes from John Dyson: 1) Rewrote screwy code that uses an incore buffer without making it busy. 2) Use B_CACHE instead of B_DONE in cases where it is appropriate. 3) Minor code optimization. This might fix kern/345 submitted by Heikki Suonsivu.
7275	23-Mar-1995	dg	Deleted bogus DIAGNOSTIC "nfs_fsync: dirty" message. This can and does happen normally when there is heavy write activity to a file since the vnode isn't locked (NFS plays fast and loose with vnode locks). This change "fixes" PR#267.
7095	16-Mar-1995	wollman	Add four more filesystem flags: VFCF_NETWORK (this FS goes over the net) VFCF_READONLY (read-write mounts do not make any sense) VFCF_SYNTHETIC (data in this FS is not real) VFCF_LOOPBACK (this FS aliases something else) cd9660 is readonly; nullfs, umapfs, and union are loopback; NFS is netowkr; procfs, kernfs, and fdesc are synthetic.
7090	16-Mar-1995	bde	Add and move declarations to fix all of the warnings from `gcc -Wimplicit' (except in netccitt, netiso and netns) and most of the warnings from `gcc -Wnested-externs'. Fix all the bugs found. There were no serious ones.
6875	04-Mar-1995	dg	Removed obsolete vtrace() remnants.
6420	15-Feb-1995	phk	YF fix.
6417	15-Feb-1995	dg	Fixed two more bugs related to the merged cache changes. Submitted by: John Dyson
6361	14-Feb-1995	phk	YFfix +int nfsrv_vput __P(( struct vnode * )); +int nfsrv_vrele __P(( struct vnode * )); +int nfsrv_vmio __P(( struct vnode * ));
6210	06-Feb-1995	dg	Changed order of release of vnode/object to fix a problem where the vnode is freed with an old object still attached (subsequently causing a panic). Fixes NFS server panic "object/pager mismatch". Submitted by: John Dyson
6151	03-Feb-1995	dg	Fixed bmap run-length brokeness. Use bmap run-length extension when doing clustered paging. Submitted by: John Dyson
6148	03-Feb-1995	dg	Removed a pile of vfs_unbusy_pages()...both unnecessary and wrong - resulted in serious system instability. Changed a B_INVAL to a B_NOCACHE so that buffer data is properly disposed of. Submitted by: John Dyson, Rick Macklin, and ohki@gssm.otsuka.tsukuba.ac.jp
5471	10-Jan-1995	dg	Added two missing brelse() calls. Submitted by: rick@snowhite.cis.uoguelph.ca
5455	09-Jan-1995	dg	These changes embody the support of the fully coherent merged VM buffer cache, much higher filesystem I/O performance, and much better paging performance. It represents the culmination of over 6 months of R&D. The majority of the merged VM/cache work is by John Dyson. The following highlights the most significant changes. Additionally, there are (mostly minor) changes to the various filesystem modules (nfs, msdosfs, etc) to support the new VM/buffer scheme. vfs_bio.c: Significant rewrite of most of vfs_bio to support the merged VM buffer cache scheme. The scheme is almost fully compatible with the old filesystem interface. Significant improvement in the number of opportunities for write clustering. vfs_cluster.c, vfs_subr.c Upgrade and performance enhancements in vfs layer code to support merged VM/buffer cache. Fixup of vfs_cluster to eliminate the bogus pagemove stuff. vm_object.c: Yet more improvements in the collapse code. Elimination of some windows that can cause list corruption. vm_pageout.c: Fixed it, it really works better now. Somehow in 2.0, some "enhancements" broke the code. This code has been reworked from the ground-up. vm_fault.c, vm_page.c, pmap.c, vm_object.c Support for small-block filesystems with merged VM/buffer cache scheme. pmap.c vm_map.c Dynamic kernel VM size, now we dont have to pre-allocate excessive numbers of kernel PTs. vm_glue.c Much simpler and more effective swapping code. No more gratuitous swapping. proc.h Fixed the problem that the p_lock flag was not being cleared on a fork. swap_pager.c, vnode_pager.c Removal of old vfs_bio cruft to support the past pseudo-coherency. Now the code doesn't need it anymore. machdep.c Changes to better support the parameter values for the merged VM/buffer cache scheme. machdep.c, kern_exec.c, vm_glue.c Implemented a seperate submap for temporary exec string space and another one to contain process upages. This eliminates all map fragmentation problems that previously existed. ffs_inode.c, ufs_inode.c, ufs_readwrite.c Changes for merged VM/buffer cache. Add "bypass" support for sneaking in on busy buffers. Submitted by: John Dyson and David Greenman
5353	03-Jan-1995	jkh	From Gene Stark: If nd->swap_nblks is zero in nfs_mountroot(), then the system comes up without initializing swapdev_vp to an actual vnode pointer. The swap pager assumes a non-NULL value for swapdev_vp. The fix is to try initializing local swap if no NFS swap space is specified.
5013	08-Dec-1994	phk	Would you please correct nfs/nfs_vfsops.c so that the ip address of the root filesystem is printed out correctly? It's line 299 in nfs/nfs_vfsops.c. Reviewed by: phk Submitted by: Luigi Rizzo luigi@iet.unipi.it
4067	02-Nov-1994	wollman	Forward-declare a few structures to avoid warning messages.
3820	23-Oct-1994	wollman	Implement fs.nfs MIB variables.
3797	22-Oct-1994	phk	This is where the action is. I'm still not sure that swap is 100% OK, but it seems to work.
3664	17-Oct-1994	phk	This is a bunch of changes from NetBSD. There are a couple of bug-fixes. But mostly it is changes to use the list-maintenance macros instead of doing the pointer-gymnastics by hand. Obtained from: NetBSD
3451	09-Oct-1994	dg	Got rid of map.h. It's a leftover from the rmap code, and we use rlists. Changed swapmap into swaplist.
3396	06-Oct-1994	dg	Use tsleep() rather than sleep so that 'ps' is more informative about the wait.
3305	02-Oct-1994	phk	Prototyping and general gcc-shutting up. Gcc has one warning now which looks bad, I will get to it eventually, unless somebody beats me to it.
2997	22-Sep-1994	wollman	Make NFS loadable.
2979	22-Sep-1994	wollman	More loadable VFS changes: - Make a number of filesystems work again when they are statically compiled (blush) - FIFOs are no longer optional; ``options FIFO'' removed from distributed config files.
2946	21-Sep-1994	wollman	Implemented loadable VFS modules, and made most existing filesystems loadable. (NFS is a notable exception.)
2384	29-Aug-1994	dg	"bogus" fixes from 1.1.5 to work around some cache coherency problems.
2175	21-Aug-1994	paul	More idempotency....... this is fun :-)
2152	20-Aug-1994	dg	Implemented filesystem clean bit via: machdep.c: Changed printf's a little and call vfs_unmountall() if the sync was successful. cd9660_vfsops.c, ffs_vfsops.c, nfs_vfsops.c, lfs_vfsops.c: Allow dismount of root FS. It is now disallowed at a higher level. vfs_conf.c: Removed unused rootfs global. vfs_subr.c: Added new routines vfs_unmountall and vfs_unmountroot. Filesystems are now dismounted if the machine is properly rebooted. ffs_vfsops.c: Toggle clean bit at the appropriate places. Print warning if an unclean FS is mounted. ffs_vfsops.c, lfs_vfsops.c: Fix bug in selecting proper flags for VOP_CLOSE(). vfs_syscalls.c: Disallow dismounting root FS via umount syscall.
2112	18-Aug-1994	wollman	Fix up some sloppy coding practices: - Delete redundant declarations. - Add -Wredundant-declarations to Makefile.i386 so they don't come back. - Delete sloppy COMMON-style declarations of uninitialized data in header files. - Add a few prototypes. - Clean up warnings resulting from the above. NB: ioconf.c will still generate a redundant-declaration warning, which is unavoidable unless somebody volunteers to make `config' smarter.
2010	10-Aug-1994	dg	Initialize lockf pointer. I missed this when I made NFS use the generic advlock mechanism, and not doing so results in random system crashes.
1979	09-Aug-1994	dg	Removed some padding bytes from the nfsnode struct to make the structure size a power of 2 again. The system complains otherwise - probably because it wastes space with our malloc scheme otherwise.
1960	08-Aug-1994	dg	Made lockf advisory locking code generic (rather than ufs specific), and use it in NFS. This is required both for diskless support and for POSIX compliance. Note: the support in NFS is only for the local node. Submitted by: based on work originally done by Yuval Yurom
1937	08-Aug-1994	dg	Changed B_AGE policy to work correctly in a world with relatively large buffer caches. The old policy generally ended up caching nothing.
1858	05-Aug-1994	dg	Converted 'vmunix' to 'kernel'.
1828	04-Aug-1994	dg	Made NFS attribute cache timeouts kernel config file tunable via NFS_MINATTRTIMO and NFS_MAXATTRTIMO.
1817	02-Aug-1994	dg	Added $Id$
1549	25-May-1994	rgrimes	The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch. Reviewed by: Rodney W. Grimes Submitted by: John Dyson and David Greenman
1541	24-May-1994	rgrimes	BSD 4.4 Lite Kernel Sources