Cross Reference: /freebsd-9.3-release/sys/fs/nfsclient/

History log of /freebsd-9.3-release/sys/fs/nfsclient/
Revision	Date	Author	Comments
267654	20-Jun-2014	gjb	Copy stable/9 to releng/9.3 as part of the 9.3-RELEASE cycle. Approved by: re (implicit) Sponsored by: The FreeBSD Foundation
265721	09-May-2014	rmacklem	MFC: r264842 Modify the NFSv4 client's Pathconf RPC (actually a Getattr Op.) so that it only does the RPC for names that are answered by the RPC. Doing the RPC for other names is harmless, but unnecessary.
265390	05-May-2014	rmacklem	MFC: r264738 For an NFSv4 mount with the "nocto" option, don't get the up to date file attributes upon close. This reduces the Getattr RPC count by about 65% for software builds.
265389	05-May-2014	rmacklem	MFC: r264705, r264749 Modify the NFSv4 client create/mkdir RPC so that it acquires post-create/mkdir directory attributes. This allows the RPC to name cache the newly created directory and reduces the lookup RPC count for applications creating a lot of directories.
265340	05-May-2014	rmacklem	MFC: r264681 Modify the NFSv4 client open/create RPC so that it acquires post-open/create directory attributes. This allows the RPC to name cache the newly created file and reduces the lookup RPC count by about 10% for software builds.
265339	05-May-2014	rmacklem	MFC: r264672 Modify the Lookup RPC for NFSv4 so that it acquires directory attributes. This allows the client to cache directory names when they are looked up, reducing the Lookup RPC count by about 40% for software builds.
260172	01-Jan-2014	rmacklem	MFC: r259801 The NFSv4 client was passing both the p and cred arguments to nfsv4_fillattr() as NULLs for the Getattr callback. This caused nfsv4_fillattr() to not fill in the Change attribute for the reply. I believe this was a violation of the RFC, but had little effect on server behaviour. This patch passes a non-NULL p argument to fix this.
260170	01-Jan-2014	rmacklem	MFC: r259084 For software builds, the NFS client does many small synchronous (with FILE_SYNC) writes because non-contiguous byte ranges in the same buffer cache block are being written. This patch adds a new mount option "noncontigwr" which allows the non-contiguous byte ranges to be combined, with the dirty byte range becoming the superset of the bytes that are dirty, if the file has not been file locked. This reduces the number of writes significantly for software builds. The only case where this change might break existing applications is where an application is writing non-overlapping byte ranges within the same buffer cache block of a file from multiple clients concurrently. Since such an application would normally do file locking on the file, avoiding the byte range merge for files that have been file locked should be sufficient for most (maybe all?) cases.
255769	21-Sep-2013	rmacklem	MFC: r255216 Crashes have been observed for NFSv4.1 mounts when the system is being shut down which were caused by the nfscbd_pool being destroyed before the backchannel is disabled. This patch is believed to fix the problem, by simply avoiding ever destroying the nfscbd_pool. Since the NFS client module cannot be unloaded, this should not cause a memory leak.
253124	10-Jul-2013	rmacklem	MFC: r252528 A problem with the old NFS client where large writes to large files would sometimes result in a corrupted file was reported via email. This problem appears to have been caused by r251719 (reverting r251719 fixed the problem). Although I have not been able to reproduce this problem, I suspect it is caused by another thread increasing np->n_size after the mtx_unlock(&np->n_mtx) but before the vnode_pager_setsize() call. Since the np->n_mtx mutex serializes updates to np->n_size, doing the vnode_pager_setsize() with the mutex locked appears to avoid the problem. Unfortunately, vnode_pager_setsize() where the new size is smaller, cannot be called with a mutex held. This patch returns the semantics to be close to pre-r251719 (actually pre-r248567, r248581, r248567 for the new client) such that the call to vnode_pager_setsize() is only delayed until after the mutex is unlocked when np->n_size is shrinking. Since the file is growing when being written, I believe this will fix the corruption. A better solution might be to replace the mutex with a sleep lock, but that is a non-trivial conversion, so this fix is hoped to be sufficient in the meantime. Tested by: remy.nonnenmacher@activnetworks.com
252739	05-Jul-2013	rmacklem	MFC: r252067 Since some NFSv4 servers enforce the requirement for a reserved port#, enable use of the (no)resvport mount option for NFSv4. I had thought that the RFC required that non-reserved port #s be allowed, but I couldn't find it in the RFC.
251641	11-Jun-2013	ken	MFC NFS FHA changes 249592 and 249596: ------------------------------------------------------------------------ r249592 \| ken \| 2013-04-17 15:00:22 -0600 (Wed, 17 Apr 2013) \| 180 lines Revamp the old NFS server's File Handle Affinity (FHA) code so that it will work with either the old or new server. The FHA code keeps a cache of currently active file handles for NFSv2 and v3 requests, so that read and write requests for the same file are directed to the same group of threads (reads) or thread (writes). It does not currently work for NFSv4 requests. They are more complex, and will take more work to support. This improves read-ahead performance, especially with ZFS, if the FHA tuning parameters are configured appropriately. Without the FHA code, concurrent reads that are part of a sequential read from a file will be directed to separate NFS threads. This has the effect of confusing the ZFS zfetch (prefetch) code and makes sequential reads significantly slower with clients like Linux that do a lot of prefetching. The FHA code has also been updated to direct write requests to nearby file offsets to the same thread in the same way it batches reads, and the FHA code will now also send writes to multiple threads when needed. This improves sequential write performance in ZFS, because writes to a file are now more ordered. Since NFS writes (generally less than 64K) are smaller than the typical ZFS record size (usually 128K), out of order NFS writes to the same block can trigger a read in ZFS. Sending them down the same thread increases the odds of their being in order. In order for multiple write threads per file in the FHA code to be useful, writes in the NFS server have been changed to use a LK_SHARED vnode lock, and upgrade that to LK_EXCLUSIVE if the filesystem doesn't allow multiple writers to a file at once. ZFS is currently the only filesystem that allows multiple writers to a file, because it has internal file range locking. This change does not affect the NFSv4 code. This improves random write performance to a single file in ZFS, since we can now have multiple writers inside ZFS at one time. I have changed the default tuning parameters to a 22 bit (4MB) window size (from 256K) and unlimited commands per thread as a result of my benchmarking with ZFS. The FHA code has been updated to allow configuring the tuning parameters from loader tunable variables in addition to sysctl variables. The read offset window calculation has been slightly modified as well. Instead of having separate bins, each file handle has a rolling window of bin_shift size. This minimizes glitches in throughput when shifting from one bin to another. sys/conf/files: Add nfs_fha_new.c and nfs_fha_old.c. Compile nfs_fha.c when either the old or the new NFS server is built. sys/fs/nfs/nfsport.h, sys/fs/nfs/nfs_commonport.c: Bring in changes from Rick Macklem to newnfs_realign that allow it to operate in blocking (M_WAITOK) or non-blocking (M_NOWAIT) mode. sys/fs/nfs/nfs_commonsubs.c, sys/fs/nfs/nfs_var.h: Bring in a change from Rick Macklem to allow telling nfsm_dissect() whether or not to wait for mallocs. sys/fs/nfs/nfsm_subs.h: Bring in changes from Rick Macklem to create a new nfsm_dissect_nonblock() inline function and NFSM_DISSECT_NONBLOCK() macro. sys/fs/nfs/nfs_commonkrpc.c, sys/fs/nfsclient/nfs_clkrpc.c: Add the malloc wait flag to a newnfs_realign() call. sys/fs/nfsserver/nfs_nfsdkrpc.c: Setup the new NFS server's RPC thread pool so that it will call the FHA code. Add the malloc flag argument to newnfs_realign(). Unstaticize newnfs_nfsv3_procid[] so that we can use it in the FHA code. sys/fs/nfsserver/nfs_nfsdsocket.c: In nfsrvd_dorpc(), add NFSPROC_WRITE to the list of RPC types that use the LK_SHARED lock type. sys/fs/nfsserver/nfs_nfsdport.c: In nfsd_fhtovp(), if we're starting a write, check to see whether the underlying filesystem supports shared writes. If not, upgrade the lock type from LK_SHARED to LK_EXCLUSIVE. sys/nfsserver/nfs_fha.c: Remove all code that is specific to the NFS server implementation. Anything that is server-specific is now accessed through a callback supplied by that server's FHA shim in the new softc. There are now separate sysctls and tunables for the FHA implementations for the old and new NFS servers. The new NFS server has its tunables under vfs.nfsd.fha, the old NFS server's tunables are under vfs.nfsrv.fha as before. In fha_extract_info(), use callouts for all server-specific code. Getting file handles and offsets is now done in the individual server's shim module. In fha_hash_entry_choose_thread(), change the way we decide whether two reads are in proximity to each other. Previously, the calculation was a simple shift operation to see whether the offsets were in the same power of 2 bucket. The issue was that there would be a bucket (and therefore thread) transition, even if the reads were in close proximity. When there is a thread transition, reads wind up going somewhat out of order, and ZFS gets confused. The new calculation simply tries to see whether the offsets are within 1 << bin_shift of each other. If they are, the reads will be sent to the same thread. The effect of this change is that for sequential reads, if the client doesn't exceed the max_reqs_per_nfsd parameter and the bin_shift is set to a reasonable value (22, or 4MB works well in my tests), the reads in any sequential stream will largely be confined to a single thread. Change fha_assign() so that it takes a softc argument. It is now called from the individual server's shim code, which will pass in the softc. Change fhe_stats_sysctl() so that it takes a softc parameter. It is now called from the individual server's shim code. Add the current offset to the list of things printed out about each active thread. Change the num_reads and num_writes counters in the fha_hash_entry structure to 32-bit values, and rename them num_rw and num_exclusive, respectively, to reflect their changed usage. Add an enable sysctl and tunable that allows the user to disable the FHA code (when vfs.XXX.fha.enable = 0). This is useful for before/after performance comparisons. nfs_fha.h: Move most structure definitions out of nfs_fha.c and into the header file, so that the individual server shims can see them. Change the default bin_shift to 22 (4MB) instead of 18 (256K). Allow unlimited commands per thread. sys/nfsserver/nfs_fha_old.c, sys/nfsserver/nfs_fha_old.h, sys/fs/nfsserver/nfs_fha_new.c, sys/fs/nfsserver/nfs_fha_new.h: Add shims for the old and new NFS servers to interface with the FHA code, and callbacks for the The shims contain all of the code and definitions that are specific to the NFS servers. They setup the server-specific callbacks and set the server name for the sysctl and loader tunable variables. sys/nfsserver/nfs_srvkrpc.c: Configure the RPC code to call fhaold_assign() instead of fha_assign(). sys/modules/nfsd/Makefile: Add nfs_fha.c and nfs_fha_new.c. sys/modules/nfsserver/Makefile: Add nfs_fha_old.c. Reviewed by: rmacklem Sponsored by: Spectra Logic ------------------------------------------------------------------------ r249596 \| ken \| 2013-04-17 16:42:43 -0600 (Wed, 17 Apr 2013) \| 7 lines Move the NFS FHA (File Handle Affinity) code from sys/nfsserver to sys/nfs, since it is now shared by the two NFS servers. Suggested by: rmacklem Sponsored by: Spectra Logic ------------------------------------------------------------------------ Sponsored by: Spectra Logic
251395	04-Jun-2013	rmacklem	MFC: r251079 Post-r248567, there were times when the client would return a truncated directory for some NFS servers. This turned out to be because the size of a directory reported by an NFS server can be smaller that the ufs-like directory created from the RPC XDR in the client. This patch fixes the problem by changing r248567 so that vnode_pager_setsize() is only done for regular files.
251147	30-May-2013	jhb	MFC 246417,247116,248584: Rework the handling of stop signals in the NFS client. The changes in 195702, 195703, and 195821 prevented a thread from suspending while holding locks inside of NFS by forcing the thread to fail sleeps with EINTR or ERESTART but defer the thread suspension to the user boundary. However, this had the effect that stopping a process during an NFS request could abort the request and trigger EINTR errors that were visible to userland processes (previously the thread would have suspended and completed the request once it was resumed). This change instead effectively masks stop signals while in the NFS client. It uses the existing TDF_SBDRY flag to effect this since SIGSTOP cannot be masked directly. Instead of setting PBDRY on individual sleeps, change the VFS_() and VOP_() methods to defer stop signals for filesystems which request this behavior via a new VFCF_SBDRY flag. Note that this has to be a VFC flag rather than a MNTK flag so that it works properly with VFS_MOUNT() when the mount is not yet fully constructed. For now, only the NFS clients set this new flag in VFS_SET(). A few other related changes: - Add an assertion to ensure that TDF_SBDRY doesn't leak to userland. - When a lookup request uses VOP_READLINK() to follow a symlink, mark the request as being on behalf of the thread performing the lookup (cnp_thread) rather than using a NULL thread pointer. This causes NFS to properly handle signals during this VOP on an interruptible mount. - Ignore thread suspend requests due to SIGSTOP if stop signals are currently deferred. This can occur if a process is stopped via SIGSTOP while a thread is running or runnable but before it has set TDF_SBDRY.
250996	26-May-2013	rmacklem	MFC: r250580 Add support for the eofflag to nfs_readdir() in the new NFS client so that it works under a unionfs mount.
250258	04-May-2013	rmacklem	MFC: r249630 When an NFS unmount occurs, once vflush() writes the last dirty buffer for the last vnode on the mount back to the server, it returns. At that point, the code continues with the unmount, including freeing up the nfs specific part of the mount structure. It is possible that an nfsiod thread will try to check for an empty I/O queue in the nfs specific part of the mount structure after it has been free'd by the unmount. This patch avoids this problem by setting the iodmount entries for the mount back to NULL while holding the mutex in the unmount and checking the appropriate entry is non-NULL after acquiring the mutex in the nfsiod thread.
250257	04-May-2013	rmacklem	MFC: r249623 Both NFS clients can deadlock when using the "rdirplus" mount option. This can occur when an nfsiod thread that already holds a buffer lock attempts to acquire a vnode lock on an entry in the directory (a LOR) when another thread holding the vnode lock is waiting on an nfsiod thread. This patch avoids the deadlock by disabling readahead for this case, so the nfsiod threads never do readdirplus. Since readaheads for directories need the directory offset cookie from the previous read, they cannot normally happen in parallel. As such, testing by jhb@ and myself didn't find any performance degredation when this patch is applied. If there is a case where this results in a significant performance degradation, mounting without the "rdirplus" option can be done to re-enable readahead for directories.
249078	04-Apr-2013	kib	MFC r248567: Do not call vnode_pager_setsize() while a NFS node mutex is locked. vnode_pager_setsize() might sleep waiting for the page after EOF be unbusied. Call vnode_pager_setsize() both for the regular and directory vnodes. MFC r248581: Initialize the variable to avoid (false) compiler warning about use of an uninitialized local.
249077	04-Apr-2013	kib	MFC r248967: Strip the unnneeded spaces, mostly at the end of lines.
247502	28-Feb-2013	jhb	MFC 245508,245566,245568,245611,245909: Various fixes to timestamps in NFS: - Use the VA_UTIMES_NULL flag to detect when NULL was passed to utimes() instead of comparing the desired time against the current time as a heuristic. - Remove unused nfs_curusec(). - Use vfs_timestamp() to set file timestamps rather than invoking getmicrotime() or getnanotime() directly in NFS. - Use NFSD_MONOSEC (which maps to time_uptime) instead of the seconds portion of wall-time stamps to manage timeouts on events. - Remove unused nd_starttime from the per-request structure in the new NFS server. - Use nanotime() for the modification time on a delegation to get as precise a time as possible. - Use time_second instead of extracting the second from a call to getmicrotime().
246285	03-Feb-2013	kib	MFC r245977: Be conservative and do not try to consume more bytes than was requested from the server for the read operation.
244289	16-Dec-2012	rmacklem	MFC: r243782 Add an nfssvc() option to the kernel for the new NFS client which dumps out the actual options being used by an NFS mount. This will be used to add a "-m" option to nfsstat(1).
243962	07-Dec-2012	kib	MFC r243142: In pget(9), if PGET_NOTWEXIT flag is not specified, also search the zombie list for the pid. This allows several kern.proc sysctls to report useful information for zombies. Hold the allproc_lock around all searches instead of relocking it. Remove private pfind_locked() from the new nfs client code. MFC r243528 (by pjd): Look for zombie process only if we were given process id.
241194	04-Oct-2012	rmacklem	MFC: r240720 Modify the NFSv4 client so that it can handle owner and owner_group strings that consist entirely of digits, interpreting them as the uid/gid number. This change was needed since new (>= 3.3) Linux servers reply with these strings by default. This change is mandated by the rfc3530bis draft. Reported on freebsd-stable@ under the Subject heading "Problem with Linux >= 3.3 as NFSv4 server" by Norbert Aschendorff on Aug. 20, 2012.
240977	26-Sep-2012	rmacklem	MFC: r240289 Add a simple printf() based debug facility to the new nfs client. Use it for a printf() that can be harmlessly generated for mmap()'d files. It will be used extensively for the NFSv4.1 client. Debugging printf()s are enabled by setting vfs.nfs.debuglevel to a non-zero value. The higher the value, the more debugging printf()s.
240238	08-Sep-2012	kib	MFC r239065: Stop including vm_param.h into vm_page.h. Include vm_param.h explicitely for the kernel code which needs it.
239852	29-Aug-2012	kib	MFC r237367: Enable deadlock avoidance code for NFS client.
239848	29-Aug-2012	kib	MFC r237987: Do not override an error from uiomove() with (non-)error result from bwrite().
239845	29-Aug-2012	kib	MFC r236687: Improve handling of uiomove(9) errors for the NFS client.
239554	22-Aug-2012	kib	MFC r239040: Reduce code duplication and exposure of direct access to struct vm_page oflags by providing helper function vm_page_readahead_finish(), which handles completed reads for pages with indexes other then the requested one, for VOP_GETPAGES(). MFC r239246: Do not leave invalid pages in the object after the short read for a network file systems (not only NFS proper). Short reads cause pages other then the requested one, which were not filled by read response, to stay invalid. Change the vm_page_readahead_finish() interface to not take the error code, but instead to make a decision to free or to (de)activate the page only by its validity. As result, not requested invalid pages are freed even if the read RPC indicated success.
237543	25-Jun-2012	rmacklem	MFC: r237244 Fix the NFSv4 client for the case where mmap'd files are written, but not msync'd by a process. A VOP_PUTPAGES() called when VOP_RECLAIM() happens will usually fail, since the NFSv4 Open has already been closed by VOP_INACTIVE(). Add a vm_object_page_clean() call to the NFSv4 client's VOP_INACTIVE(), so that the write happens before the NFSv4 Open is closed. kib@ suggested using vgone() instead and I will explore this, but this patch fixes things in the meantime. For some reason, the VOP_PUTPAGES() is still attaempted in VOP_RECLAIM(), but having this fail doesn't cause any problems except a "stateid0 in write" being logged.
237534	24-Jun-2012	rmacklem	MFC: r237200 Move the nfsrpc_close() call in ncl_reclaim() for the NFSv4 client to below the vnode_destroy_vobject() call, since that is where writes are flushed.
236446	02-Jun-2012	kib	MFC r236313: Capitalize start of sentence.
236096	26-May-2012	rmacklem	MFC: r235332 PR# 165923 reported intermittent write failures for dirty memory mapped pages being written back on an NFS mount. Since any thread can call VOP_PUTPAGES() to write back a dirty page, the credentials of that thread may not have write access to the file on an NFS server. (Often the uid is 0, which may be mapped to "nobody" in the NFS server.) Although there is no completely correct fix for this (NFS servers check access on every write RPC instead of at open/mmap time), this patch avoids the common cases by holding onto a credential that recently opened the file for writing and uses that credential for the write RPCs being done by VOP_PUTPAGES() for both NFS clients.
235626	18-May-2012	mckusick	MFC of 234386, 234400, 234441, 234443, 234482, 234483, 235052, 235241, 235246, and 235619 MFC: 234386 Replace the MNT_VNODE_FOREACH interface with MNT_VNODE_FOREACH_ALL. The primary changes are that the user of the interface no longer needs to manage the mount-mutex locking and that the vnode that is returned has its mutex locked (thus avoiding the need to check to see if its is DOOMED or other possible end of life senarios). To minimize compatibility issues for third-party developers, the old MNT_VNODE_FOREACH interface will remain available so that this change can be MFC'ed to 9. Following the MFC to 9, MNT_VNODE_FOREACH will be removed in head. The reason for this update is to prepare for the addition of the MNT_VNODE_FOREACH_ACTIVE interface that will loop over just the active vnodes associated with a mount point (typically less than 1% of the vnodes associated with the mount point). Reviewed by: kib Tested by: Peter Holm MFC after: 2 weeks MFC: 234400 Drop export of vdestroy() function from kern/vfs_subr.c as it is used only as a helper function in that file. Replace sole call to vbusy() with inline code in vholdl(). Replace sole calls to vfree() and vdestroy() with inline code in vdropl(). The Clang compiler already inlines these functions, so they do not show up in a kernel backtrace which is confusing. Also you cannot set their frame in kgdb which means that it is impossible to view their local variables. So, while the produced code is unchanged, the debugging should be easier. Discussed with: kib MFC after: 2 weeks MFC: 234441 Fix a memory leak of M_VNODE_MARKER introduced in 234386. Found by: Peter Holm MFC: 234443 Delete a no longer useful VNASSERT missed during changes in 234400. Suggested by: kib MFC: 234482 This change creates a new list of active vnodes associated with a mount point. Active vnodes are those with a non-zero use or hold count, e.g., those vnodes that are not on the free list. Note that this list is in addition to the list of all the vnodes associated with a mount point. To avoid adding another set of linkage pointers to the vnode structure, the active list uses the existing linkage pointers used by the free list (previously named v_freelist, now renamed v_actfreelist). This update adds the MNT_VNODE_FOREACH_ACTIVE interface that loops over just the active vnodes associated with a mount point (typically less than 1% of the vnodes associated with the mount point). Reviewed by: kib Tested by: Peter Holm MFC after: 2 weeks MFC: 234483 This update uses the MNT_VNODE_FOREACH_ACTIVE interface that loops over just the active vnodes associated with a mount point to replace MNT_VNODE_FOREACH_ALL in the vfs_msync, ffs_sync_lazy, and qsync routines. The vfs_msync routine is run every 30 seconds for every writably mounted filesystem. It ensures that any files mmap'ed from the filesystem with modified pages have those pages queued to be written back to the file from which they are mapped. The ffs_lazy_sync and qsync routines are run every 30 seconds for every writably mounted UFS/FFS filesystem. The ffs_lazy_sync routine ensures that any files that have been accessed in the previous 30 seconds have had their access times queued for updating in the filesystem. The qsync routine ensures that any files with modified quotas have those quotas queued to be written back to their associated quota file. In a system configured with 250,000 vnodes, less than 1000 are typically active at any point in time. Prior to this change all 250,000 vnodes would be locked and inspected twice every minute by the syncer. For UFS/FFS filesystems they would be locked and inspected six times every minute (twice by each of these three routines since each of these routines does its own pass over the vnodes associated with a mount point). With this change the syncer now locks and inspects only the tiny set of vnodes that are active. Reviewed by: kib Tested by: Peter Holm MFC after: 2 weeks MFC: 235052 (by pluknet) Fix mount mutex handling missed in r234386. MFC: 235241 (by pluknet) Fix mount interlock oversights from the previous change in r234386. Reported by: dougb Submitted by: Mateusz Guzik <mjguzik at gmail com> Reviewed by: Kirk McKusick Tested by: pho MFC: 235246 Fix mount mutex handling missed in r234386. MFC: 235619 Update comment to document that the vnode free-list mutex needs to be held when updating mnt_activevnodelist and mnt_activevnodelistsize.
235397	13-May-2012	rmacklem	MFC: r234742 It was reported via email that some non-FreeBSD NFS servers do not include file attributes in the reply to an NFS create RPC under certain circumstances. This resulted in a vnode of type VNON that was not usable. This patch adds an NFS getattr RPC to nfs_create() for this case, to fix the problem. It was tested by the person that reported the problem and confirmed to fix this case for their server.
233730	31-Mar-2012	kib	MFC r233101: Add sysctl vfs.nfs.nfs_keep_dirty_on_error to switch the nfs client behaviour on error from write RPC back to behaviour of old nfs client. When set to not zero, the pages for which write failed are kept dirty. PR: kern/165927
233353	23-Mar-2012	kib	MFC r231949: Fix found places where uio_resid is truncated to int. Add the sysctl debug.iosize_max_clamp, enabled by default. Setting the sysctl to zero allows to perform the SSIZE_MAX-sized i/o requests from the usermode. MFC r232493: Remove unneeded cast to u_int. The values as small enough to fit into int, beside the use of MIN macro which performs type promotions. MFC r232494: Instead of incomplete handling of read(2)/write(2) return values that does not fit into registers, declare that we do not support this case using CTASSERT(), and remove endianess-unsafe code to split return value into td_retval. While there, change the style of the sysctl debug.iosize_max_clamp definition. MFC r232495: pipe_read(): change the type of size to int, and remove signed clamp. pipe_write(): change the type of desiredsize back to int, its value fits.
233326	22-Mar-2012	jhb	MFC 230547: Add a timeout on positive name cache entries in the NFS client. That is, we will only trust a positive name cache entry for a specified amount of time before falling back to a LOOKUP RPC, even if the ctime for the file handle matches the cached copy in the name cache entry. The timeout is configured via a new 'nametimeo' mount option and defaults to 60 seconds. It may be set to zero to disable positive name caching entirely.
233285	21-Mar-2012	jhb	MFC 230394,230441,230489,230552,232420: Close a race in NFS lookup processing that could result in stale name cache entries on one client when a directory was renamed on another client. The root cause for the stale entry being trusted is that each per-vnode nfsnode structure has a single 'n_ctime' timestamp used to validate positive name cache entries. However, if there are multiple entries for a single vnode, they all share a single timestamp. To fix this, extend the name cache to allow filesystems to optionally store a timestamp value in each name cache entry. The NFS clients now fetch the timestamp associated with each name cache entry and use that to validate cache hits instead of the timestamps previously stored in the nfsnode. Another part of the fix is that the NFS clients now use timestamps from the post-op attributes of RPCs when adding name cache entries rather than pulling the timestamps out of the file's attribute cache. The latter is subject to races with other lookups updating the attribute cache concurrently.
232682	08-Mar-2012	rmacklem	MFC: r232327 Fix the NFS clients so that they use copyin() instead of bcopy(), when doing direct I/O. This direct I/O code is not enabled by default.
232292	29-Feb-2012	bz	MFC r231852,232127: Merge multi-FIB IPv6 support. Extend the so far IPv4-only support for multiple routing tables (FIBs) introduced in r178888 to IPv6 providing feature parity. This includes an extended rtalloc(9) KPI for IPv6, the necessary adjustments to the network stack, and user land support as in netstat. Sponsored by: Cisco Systems, Inc.
231936	20-Feb-2012	kib	MFC r231075: Instead of removing MNTK_ASYNC from mnt_kern_flag, provide a local thread flag to disable async i/o for current thread only. Use the opportunity to move DOINGASYNC() macro into sys/vnode.h and consistently use it through places which tested for MNTK_ASYNC. MFC r231204: Unbreak detection of the async mode for clustered writes after r231075.
231636	14-Feb-2012	rmacklem	MFC: r230803 When a "mount -u" switches an NFS mount point from TCP to UDP, any thread doing an I/O RPC with a transfer size greater than NFS_UDPMAXDATA will be hung indefinitely, retrying the RPC. After a discussion on freebsd-fs@, I decided to add a warning message for this case, as suggested by Jeremy Chadwick.
231545	12-Feb-2012	rmacklem	MFC: r231133 r228827 fixed a problem where copying of NFSv4 open credentials into a credential structure would corrupt it. This happened when the p argument was != NULL. However, I now realize that the copying of open credentials should only happen for p == NULL, since that indicates that it is a read-ahead or write-behind. This patch fixes this. After this commit, r228827 could be reverted, but I think the code is clearer and safer with the patch, so I am going to leave it in. Without this patch, it was possible that a NFSv4 VOP_SETATTR() could have changed the credentials of the caller. This would have happened if the process doing the VOP_SETATTR() did not have the file open, but some other process running as a different uid had the file open for writing at the same time.
231330	10-Feb-2012	rmacklem	MFC: r230605 A problem with respect to data read through the buffer cache for both NFS clients was reported to freebsd-fs@ under the subject "NFS corruption in recent HEAD" on Nov. 26, 2011. This problem occurred when a TCP mounted root fs was changed to using UDP. I believe that this problem was caused by the change in mnt_stat.f_iosize that occurred because rsize was decreased to the maximum supported by UDP. This patch fixes the problem by using v_bufobj.bo_bsize instead of f_iosize, since the latter is set to f_iosize when the vnode is allocated, but does not change for a given vnode when f_iosize changes.
230725	29-Jan-2012	mckusick	MFC r230249: Make sure all intermediate variables holding mount flags (mnt_flag) and that all internal kernel calls passing mount flags are declared as uint64_t so that flags in the top 32-bits are not lost. MFC r230250: There are several bugs/hangs when trying to take a snapshot on a UFS/FFS filesystem running with journaled soft updates. Until these problems have been tracked down, return ENOTSUPP when an attempt is made to take a snapshot on a filesystem running with journaled soft updates.
230446	22-Jan-2012	rmacklem	MFC: r229802 opt_inet6.h was missing from some files in the new NFS subsystem. The effect of this was, for clients mounted via inet6 addresses, that the DRC cache would never have a hit in the server. It also broke NFSv4 callbacks when an inet6 address was the only one available in the client. This patch fixes the above, plus deletes opt_inet6.h from a couple of files it is not needed for.
229953	11-Jan-2012	rmacklem	MFC: r228827 During investigation of an NFSv4 client crash reported by glebius@, jhb@ spotted that nfscl_getstateid() might modify credentials when called from nfsrpc_read() for the case where p != NULL, whereas nfsrpc_read() only did a crdup() to get new credentials for p == NULL. This bug was introduced by r195510, since pre-r195510 nfscl_getstateid() only modified credentials for the p == NULL case. This patch modifies nfsrpc_read()/nfsrpc_write() so that they do crdup() for the p != NULL case. It is conceivable that this bug caused the crash reported by glebius@, but that will not be determined for some time, since the crash occurred after about 1month of operation.
229752	07-Jan-2012	rmacklem	MFC: r228217 Post r223774, the NFSv4 client no longer has multiple instances of the same lock_owner4 string. As such, the handling of cleanup of lock_owners could be simplified. This simplification permitted the client to do a ReleaseLockOwner operation when the process that the lock_owner4 string represents, has exited. This permits the server to release any storage related to the lock_owner4 string before the associated open is closed. Without this change, it is possible to exhaust a server's storage when a long running process opens a file and then many child processes do locking on the file, because the open doesn't get closed. A similar patch was applied to the Linux NFSv4 client recently so that it wouldn't exhaust a server's storage.
229678	06-Jan-2012	rmacklem	MFC: r227796 Clean up some cruft in the NFSv4 client left over from the OpenBSD port, so that it is more readable. No logic change is made by this commit.
229674	06-Jan-2012	rmacklem	MFC: r227760 Add two arguments to the nfsrpc_rellockown() function in the NFSv4 client. This does not change the client's behaviour, but prepares the code so that nfsrpc_rellockown() can be called elsewhere in a future commit.
229604	05-Jan-2012	jhb	MFC 227507: Finish making 'wcommitsize' an NFS client mount option.
229599	05-Jan-2012	rmacklem	MFC: r227744 Since the nfscl_cleanup() function isn't used by the FreeBSD NFSv4 client, delete the code and fix up the related comments. This should not have any functional effect on the client.
229548	05-Jan-2012	rmacklem	MFC: r227743 Post r223774 the NFSv4 client never uses the linked list with the head nfsc_defunctlockowner. This patch simply removes the code that loops through this always empty list, since the code no longer does anything useful. It should not have any effect on the client's behaviour.
229287	02-Jan-2012	rmacklem	MFC: r227543 Modify the new NFS client so that nfs_fsync() only calls ncl_flush() for regular files. Since other file types don't write into the buffer cache, calling ncl_flush() is almost a no-op. However, it does clear the NMODIFIED flag and this shouldn't be done by nfs_fsync() for directories.
229263	02-Jan-2012	rmacklem	MFC: r227517 Move the setting of the default value for nm_wcommitsize to before the nfs_decode_args() call in the new NFS client, so that a specfied command line value won't be overwritten. Also, modify the calculation for small values of desiredvnodes to avoid an unusually large value or a divide by zero crash. It seems that the default value for nm_wcommitsize is very conservative and may need to change at some time.
229258	02-Jan-2012	rmacklem	MFC: r227494 Since NFSv4 byte range locking only works for regular files, add a sanity check for the vnode type to the NFSv4 client.
229172	01-Jan-2012	rmacklem	MFC: r227493 Move the assignment of default values for some mount options to before the nfs_decode_args() call in the new NFS client, so they don't overwrite the value specified on the command line.
225736	23-Sep-2011	kensmith	Copy head to stable/9 as part of 9.0-RELEASE release cycle. Approved by: re (implicit)
224778	11-Aug-2011	rwatson	Second-to-last commit implementing Capsicum capabilities in the FreeBSD kernel for FreeBSD 9.0: Add a new capability mask argument to fget(9) and friends, allowing system call code to declare what capabilities are required when an integer file descriptor is converted into an in-kernel struct file *. With options CAPABILITIES compiled into the kernel, this enforces capability protection; without, this change is effectively a no-op. Some cases require special handling, such as mmap(2), which must preserve information about the maximum rights at the time of mapping in the memory map so that they can later be enforced in mprotect(2) -- this is done by narrowing the rights in the existing max_protection field used for similar purposes with file permissions. In namei(9), we assert that the code is not reached from within capability mode, as we're not yet ready to enforce namespace capabilities there. This will follow in a later commit. Update two capability names: CAP_EVENT and CAP_KEVENT become CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they represent. Approved by: re (bz) Submitted by: jonathan Sponsored by: Google Inc
224606	02-Aug-2011	rmacklem	Fix a LOR in the NFS client which could cause a deadlock. This was reported to the mailing list freebsd-net@freebsd.org on July 21, 2011 under the subject "LOR with nfsclient sillyrename". The LOR occurred when nfs_inactive() called vrele(sp->s_dvp) while holding the vnode lock on the file in s_dvp. This patch modifies the client so that it performs the vrele(sp->s_dvp) as a separate task to avoid the LOR. This fix was discussed with jhb@ and kib@, who both proposed variations of it. Tested by: pho, jlott at averesystems.com Submitted by: jhb (earlier version) Reviewed by: kib Approved by: re (kib) MFC after: 2 weeks
224532	30-Jul-2011	rmacklem	The new NFS client failed to vput() the new vnode if a setattr failed after the file was created in nfs_create(). This would probably only happen during a forced dismount. The old NFS client does have a vput() for this case. Detected by pho during recent testing, where an open syscall returned with a vnode still locked. Tested by: pho Approved by: re (kib) MFC after: 2 weeks
224083	16-Jul-2011	zack	Simple find/replace of VOP_ISLOCKED -> NFSVOPISLOCKED. This is done so that NFSVOPISLOCKED can be modified later to add enhanced logging and assertions. Reviewed by: rmacklem Approved by: zml (mentor) MFC after: 2 weeks
224082	16-Jul-2011	zack	Simple find/replace of VOP_UNLOCK -> NFSVOPUNLOCK. This is done so that NFSVOPUNLOCK can be modified later to add enhanced logging and assertions. Reviewed by: rmacklem Approved by: zml (mentor) MFC after: 2 weeks
224081	16-Jul-2011	zack	Simple find/replace of vn_lock -> NFSVOPLOCK. This is done so that NFSVOPLOCK can be modified later to add enhanced logging and assertions. Reviewed by: rmacklem Approved by: zml (mentor) MFC after: 2 weeks
223971	13-Jul-2011	rmacklem	r222389 introduced a case where the NFSv4 client could loop in nfscl_getcl() when a forced dismount is in progress, because nfsv4_lock() will return 0 without sleeping when MNTK_UNMOUNTF is set. This patch fixes it so it won't loop calling nfsv4_lock() for this case. MFC after: 2 weeks
223774	04-Jul-2011	rmacklem	The algorithm used by nfscl_getopen() could have resulted in multiple instances of the same lock_owner when a process both inherited an open file descriptor plus opened the same file itself. Since some NFSv4 servers cannot handle multiple instances of the same lock_owner string, this patch changes the algorithm used by nfscl_getopen() in the new NFSv4 client to keep that from happening. The new algorithm is simpler, since there is no longer any need to ascend the process's parentage tree because all NFSv4 Closes for a file are done at VOP_INACTIVE()/VOP_RECLAIM(), making the Opens indistinct w.r.t. use with Lock Ops. This problem was discovered at the recent NFSv4 interoperability Bakeathon. MFC after: 2 weeks
223747	03-Jul-2011	rmacklem	Modify the new NFSv4 client so that it appends a file handle to the lock_owner4 string that goes on the wire. Also, add code to do a ReleaseLockOwner Op on the lock_owner4 string before a Close. Apparently not all NFSv4 servers handle multiple instances of the same lock_owner4 string, at least not in a compatible way. This patch avoids having multiple instances, except for one unusual case, which will be fixed by a future commit. Found at the recent NFSv4 interoperability Bakeathon. Tested by: tdh at excfb.com MFC after: 2 weeks
223657	28-Jun-2011	rmacklem	Fix the new NFSv4 client so that it doesn't fill the cached mode attribute in as 0 when doing writes. The change adds the Mode attribute plus the others except Owner and Owner_group to the list requested by the NFSv4 Write Operation. This fixed a problem where an executable file built by "cc" would get mode 0111 instead of 0755 for some NFSv4 servers. Found at the recent NFSv4 interoperability Bakeathon. Tested by: tdh at excfb.com MFC after: 2 weeks
223309	19-Jun-2011	rmacklem	Fix the kgssapi so that it can be loaded as a module. Currently the NFS subsystems use five of the rpcsec_gss/kgssapi entry points, but since it was not obvious which others might be useful, all nineteen were included. Basically the nineteen entry points are set in a structure called rpc_gss_entries and inline functions defined in sys/rpc/rpcsec_gss.h check for the entry points being non-NULL and then call them. A default value is returned otherwise. Requested by rwatson. Reviewed by: jhb MFC after: 2 weeks
223280	18-Jun-2011	rmacklem	Add DTrace support to the new NFS client. This is essentially cloned from the old NFS client, plus additions for NFSv4. A review of this code is in progress, however it was felt by the reviewer that it could go in now, before code slush. Any changes required by the review can be committed as bug fixes later.
222722	05-Jun-2011	rmacklem	Add support for flock(2) locks to the new NFSv4 client. I think this should be ok, since the client now delays NFSv4 Close operations until VOP_INACTIVE()/VOP_RECLAIM(). As such, there should be no risk that the NFSv4 Open is closed while an associated byte range lock still exists. Tested by: avg MFC after: 2 weeks
222719	05-Jun-2011	rmacklem	The new NFSv4 client was erroneously using "p" instead of "p_leader" for the "id" for POSIX byte range locking. I think this would only have affected processes created by rfork(2) with the RFTHREAD flag specified. This patch fixes that by passing the "id" down through the various functions from nfs_advlock(). MFC after: 2 weeks
222718	05-Jun-2011	rmacklem	Fix the new NFSv4 client so that it doesn't crash when a mount is done for a VIMAGE kernel. Tested by: glz at hidden-powers dot com Reviewed by: bz MFC after: 2 weeks
222586	01-Jun-2011	kib	In the VOP_PUTPAGES() implementations, change the default error from VM_PAGER_AGAIN to VM_PAGER_ERROR for the uwritten pages. Return VM_PAGER_AGAIN for the partially written page. Always forward at least one page in the loop of vm_object_page_clean(). VM_PAGER_ERROR causes the page reactivation and does not clear the page dirty state, so the write is not lost. The change fixes an infinite loop in vm_object_page_clean() when the filesystem returns permanent errors for some page writes. Reported and tested by: gavin Reviewed by: alc, rmacklem MFC after: 1 week
222540	31-May-2011	rmacklem	Fix the new NFS client so that it doesn't do an NFSv3 Pathconf RPC for cases where the reply doesn't include the answer. This fixes a problem reported by avg@ where the NFSv3 Pathconf RPC would fail when "ls -l" did an lpathconf(2) for _PC_ACL_NFS4. Tested by: avg MFC after: 2 weeks
222389	27-May-2011	rmacklem	Fix the new NFS client so that it handles NFSv4 state correctly during a forced dismount. This required that the exclusive and shared (refcnt) sleep lock functions check for MNTK_UMOUNTF before sleeping, so that they won't block while nfscl_umount() is getting rid of the state. As such, a "struct mount *" argument was added to the locking functions. I believe the only remaining case where a forced dismount can get hung in the kernel is when a thread is already attempting to do a TCP connect to a dead server when the krpc client structure called nr_client is NULL. This will only happen just after a "mount -u" with options that force a new TCP connection is done, so it shouldn't be a problem in practice. MFC after: 2 weeks
222329	26-May-2011	rmacklem	Add a check for MNTK_UNMOUNTF at the beginning of nfs_sync() in the new NFS client so that a forced dismount doesn't get stuck in the VFS_SYNC() call that happens before VFS_UNMOUNT() in dounmount(). Additional changes are needed before forced dismounts will work. MFC after: 2 weeks
222291	25-May-2011	rmacklem	Add some missing mutex locking to the new NFS client. MFC after: 2 weeks
222289	25-May-2011	rmacklem	Fix the new NFS client so that it correctly sets the "must_commit" argument for a write RPC when it succeeds for the first one and fails for a subsequent RPC within the same call to the function. This makes it compatible with the old NFS client for this case. MFC after: 2 weeks
222233	23-May-2011	rmacklem	Set the MNT_NFS4ACLS flag for an NFSv4 client mount if the NFSv4 server supports it. Requested by trasz. MFC after: 2 weeks
222187	22-May-2011	alc	Eliminate duplicate #include's.
222075	18-May-2011	rmacklem	Add a sanity check for the existence of an "addr" option to both NFS clients. This avoids the crash reported by Sergey Kandaurov (pluknet@gmail.com) to the freebsd-fs@ list with subject "[old nfsclient] different nmount() args passed from mount vs mount_nfs" dated May 17, 2011. Tested by: pluknet at gmail.com (old nfs client) MFC after: 2 weeks
221973	15-May-2011	rmacklem	Change the sysctl naming for the old and new NFS clients to vfs.oldnfs.xxx and vfs.nfs.xxx respectively. This makes the default nfs client use vfs.nfs.xxx after r221124.
221537	06-May-2011	rmacklem	Set the initial value of maxfilesize to OFF_MAX in the new NFS client. It will then be reduced to whatever the server says it can support. There might be an argument that this could be one block larger, but since NFS is a byte granular system, I chose not to do that. Suggested by: Matt Dillon Tested by: Daniel Braniss (earlier version) MFC after: 2 weeks
221467	05-May-2011	rmacklem	Fix the new NFS client so that it handles the 64bit fields that are now in "struct statfs" for NFSv3 and NFSv4. Since the ffiles value is uint64_t on the wire, I clip the value to INT64_MAX to avoid setting f_ffree negative. Tested by: kib MFC after: 2 weeks
221436	04-May-2011	ru	Implemented a mount option "nocto" that disables cache coherency checking at open time. It may improve performance for read-only NFS mounts. Use deliberately. MFC after: 1 week Reviewed by: rmacklem, jhb (earlier version)
221429	04-May-2011	ru	In ncl_printf(), call vprintf() instead of printf(). MFC after: 3 days
221205	29-Apr-2011	rmacklem	The build was broken by r221190 for 64bit arches like amd64. This patch fixes it. MFC after: 2 weeks
221190	28-Apr-2011	rmacklem	Fix the new NFS client so that it handles the "nfs_args" value in mnt_optnew. This is needed so that the old mount(2) syscall works and that is needed so that amd(8) works. The code was basically just cribbed from sys/nfsclient/nfs_vfsops.c with minor changes. This patch is mainly to fix the new NFS client so that amd(8) works with it. Thanks go to Craig Rodrigues for helping with this. Tested by: Craig Rodrigues (for amd) MFC after: 2 weeks
221139	27-Apr-2011	rmacklem	Fix module names and dependencies so the NFS clients will load correctly as modules after r221124.
221124	27-Apr-2011	rmacklem	This patch changes head so that the default NFS client is now the new NFS client (which I guess is no longer experimental). The fstype "newnfs" is now "nfs" and the regular/old NFS client is now fstype "oldnfs". Although mounts via fstype "nfs" will usually work without userland changes, an updated mount_nfs(8) binary is needed for kernels built with "options NFSCL" but not "options NFSCLIENT". Updated mount_nfs(8) and mount(8) binaries are needed to do mounts for fstype "oldnfs". The GENERIC kernel configs have been changed to use options NFSCL and NFSD (the new client and server) instead of NFSCLIENT and NFSSERVER. For kernels being used on diskless NFS root systems, "options NFSCL" must be in the kernel config. Discussed on freebsd-fs@.
221066	26-Apr-2011	rmacklem	Fix a kernel linking problem introduced by r221032, r221040 when building kernels that don't have "options NFS_ROOT" specified. I plan on moving the functions that use these data structures into the shared code in sys/nfs/nfs_diskless.c in a future commit. At that time, these definitions will no longer be needed in nfs_vfsops.c and nfs_clvfsops.c. MFC after: 2 weeks
221040	25-Apr-2011	rmacklem	Modify the experimental (newnfs) NFS client so that it uses the same diskless NFS root code as the regular client, which was moved to sys/nfs by r221032. This fixes the newnfs client so that it can do an NFSv3 diskless root file system. MFC after: 2 weeks
221018	25-Apr-2011	rmacklem	Fix the experimental NFS client so that it does not bogusly set the f_flags field of "struct statfs". This had the interesting effect of making the NFSv4 mounts "disappear" after r221014, since NFSMNT_NFSV4 and MNT_IGNORE became the same bit. MFC after: 2 weeks
221014	25-Apr-2011	rmacklem	Modify the experimental NFS client so that it uses the same "struct nfs_args" as the regular NFS client. This is needed so that the old mount(2) syscall will work and it makes sharing of the diskless NFS root code easier. Eary in the porting exercise I introduced a new revision of nfs_args, but didn't actually need it, thanks to nmount(2). I re-introduced the NFSMNT_KERB flag, since it does essentially the same thing and the old one would not have been used because it never worked. I also added a few new NFSMNT_xxx flags to sys/nfsclient/nfs_args.h that are used by the experimental NFS client. MFC after: 2 weeks
220928	21-Apr-2011	rmacklem	Remove the nm_mtx mutex locking from the test for nm_maxfilesize. This value rarely, if ever, changes and the nm_mtx mutex is locked/unlocked earlier in the function, which should be sufficient to avoid getting a stale cached value for it. There is a discussion w.r.t. what these tests should be, but I've left them basically the same as the regular NFS client for now. Suggested by: pjd MFC after: 2 weeks
220921	21-Apr-2011	rmacklem	Revert r220906, since the vp isn't always locked when nfscl_request() is called. It will need a more involved patch.
220906	20-Apr-2011	rmacklem	Add a check for VI_DOOMED at the beginning of nfscl_request() so that it won't try and use vp->v_mount to do an RPC during a forced dismount. There needs to be at least one more kernel commit, plus a change to the umount(8) command before forced dismounts will work for the experimental NFS client. MFC after: 2 weeks
220877	20-Apr-2011	rmacklem	Modify the offset + size checks for read and write in the experimental NFS client to take care of overflows for the calls above the buffer cache layer in a manner similar to r220876. Thanks go to dillon at apollo.backplane.com for providing the snippet of code that does this. MFC after: 2 weeks
220876	20-Apr-2011	rmacklem	Modify the offset + size checks for read and write in the experimental NFS client to take care of overflows. Thanks go to dillon at apollo.backplane.com for providing the snippet of code that does this. MFC after: 2 weeks
220810	19-Apr-2011	rmacklem	Fix up handling of the nfsmount structure in read and write within the experimental NFS client. Mostly add mutex locking and use the same rsize, wsize during the operation by keeping a local copy of it. This is another change that brings it closer to the regular NFS client. MFC after: 2 weeks
220807	18-Apr-2011	rmacklem	Revert r220761 since, as kib@ pointed out, the case of adding the check to nfsrpc_close() isn't useful. Also, the check in nfscl_getcl() must be more involved, since it needs to check before and after the acquisition of the refcnt on nfsc_lock, while the mutex that protects the client state data is held.
220764	18-Apr-2011	rmacklem	Add a vput() to nfs_lookitup() in the experimental NFS client for a case that will probably never happen. It can only happen if a server were to successfully lookup a file, but not return attributes for that file. Although technically allowed by the NFSv3 RFC, I doubt any server would ever do this. However, if it did, the client would have not vput()'d the new vnode when it needed to do so. MFC after: 2 weeks
220763	18-Apr-2011	rmacklem	Add vput() calls in two places in the experimental NFS client that would be needed if, in the future, nfscl_loadattrcache() were to return an error. Currently nfscl_loadattrcache() never returns an error, so these cases never currently happen. MFC after: 2 weeks
220762	18-Apr-2011	rmacklem	Change the mutex locking for several locations in the experimental NFS client's vnode op functions to make them compatible with the regular NFS client. I'll admit I'm not sure that the mutex locks around the assignments are needed, but the regular client has them, so I added them. Also, add handling of the case of partial attributes in setattr to be compatible with the regular client. MFC after: 2 weeks
220761	17-Apr-2011	rmacklem	Add checks for MNTK_UNMOUNTF at the beginning of three functions, so that threads don't get stuck in them during a forced dismount. nfs_sync/VFS_SYNC() needs this, since it is called by dounmount() before VFS_UNMOUNT(). The nfscl_nget() case makes sure that a thread doing an VOP_OPEN() or VOP_ADVLOCK() call doesn't get blocked before attempting the RPC. Attempting RPCs don't block, since they all fail once a forced dismount is in progress. The third one at the beginning of nfsrpc_close() is done so threads don't get blocked while doing VOP_INACTIVE() as the vnodes are cleared out. With these three changes plus a change to the umount(1) command so that it doesn't do "sync()" for the forced case seem to make forced dismounts work for the experimental NFS client. MFC after: 2 weeks
220751	17-Apr-2011	rmacklem	Fix up some of the sysctls for the experimental NFS client so that they use the same names as the regular client. Also add string descriptions for them. MFC after: 2 weeks
220739	17-Apr-2011	rmacklem	Change some defaults in the experimental NFS client to be the same as the regular NFS client for NFSv3. The main one is making use of a reserved port# the default. Also, set the retry limit for TCP the same and fix the code so that it doesn't disable readdirplus for NFSv4. MFC after: 2 weeks
220735	17-Apr-2011	rmacklem	Fix readdirplus in the experimental NFS client so that it skips over ".." to avoid a LOR race with nfs_lookup(). This fix is analagous to r138256 in the regular NFS client. MFC after: 2 weeks
220732	16-Apr-2011	rmacklem	Add a lktype flags argument to nfscl_nget() and ncl_nget() in the experimental NFS client so that its nfs_lookup() function can use cn_lkflags in a manner analagous to the regular NFS client. MFC after: 2 weeks
220731	16-Apr-2011	rmacklem	Add mutex locking on the nfs node in ncl_inactive() for the experimental NFS client. MFC after: 2 weeks
220683	15-Apr-2011	rmacklem	Change the experimental NFS client so that it creates nfsiod threads in the same manner as the regular NFS client after r214026 was committed. This resolves the lors fixed by r214026 and its predecessors for the regular client. Reviewed by: jhb MFC after: 2 weeks
220648	14-Apr-2011	rmacklem	Fix the experimental NFSv4 server so that it uses VOP_PATHCONF() to determine if a file system supports NFSv4 ACLs. Since VOP_PATHCONF() must be called with a locked vnode, the function is called before nfsvno_fillattr() and the result is passed in as an extra argument. MFC after: 2 weeks
220645	14-Apr-2011	rmacklem	Modify the experimental NFSv4 server so that it handles crossing of server mount points properly. The functions nfsvno_fillattr() and nfsv4_fillattr() were modified to take the extra arguments that are the mount point, a flag to indicate that it is a file system root and the mounted on fileno. The mount point argument needs to be busy when nfsvno_fillattr() is called, since the vp argument is not locked. Reviewed by: kib MFC after: 2 weeks
220611	13-Apr-2011	rmacklem	Add VOP_PATHCONF() support to the experimental NFS client so that it can, along with other things, report whether or not NFS4 ACLs are supported. MFC after: 2 weeks
220610	13-Apr-2011	rmacklem	Fix the experimental NFSv4 client so that it recognizes server mount point crossings correctly. It was testing the wrong flag. Also, try harder to make sure that the fsid is different than the one assigned to the client mount point, by hashing the server's fsid (just to create a different value deterministically) when it is the same. MFC after: 2 weeks
220152	30-Mar-2011	zack	This patch fixes the Experimental NFS client to properly deal with 32 bit or 64 bit fileid's in NFSv2 and NFSv3. Without this fix, invalid casting (and sign extension) was creating problems for any fileid greater than 2^31. We discovered this because we have test clusters with more than 2 billion allocated files and 64-bit ino_t's (and friend structures). Reviewed by: rmacklem Approved by: zml (mentor) MFC after: 2 weeks
219968	24-Mar-2011	jhb	Fix some locking nits with the p_state field of struct proc: - Hold the proc lock while changing the state from PRS_NEW to PRS_NORMAL in fork to honor the locking requirements. While here, expand the scope of the PROC_LOCK() on the new process (p2) to avoid some LORs. Previously the code was locking the new child process (p2) after it had locked the parent process (p1). However, when locking two processes, the safe order is to lock the child first, then the parent. - Fix various places that were checking p_state against PRS_NEW without having the process locked to use PROC_LOCK(). Every place was already locking the process, just after the PRS_NEW check. - Remove or reduce the use of PROC_SLOCK() for places that were checking p_state against PRS_NEW. The PROC_LOCK() alone is sufficient for reading the current state. - Reorder fill_kinfo_proc() slightly so it only acquires PROC_SLOCK() once. MFC after: 1 week
219028	25-Feb-2011	netchild	Add some FEATURE macros for various features (AUDIT/CAM/IPC/KTR/MAC/NFS/NTP/ PMC/SYSV/...). No FreeBSD version bump, the userland application to query the features will be committed last and can serve as an indication of the availablility if needed. Sponsored by: Google Summer of Code 2010 Submitted by: kibab Reviewed by: arch@ (parts by rwatson, trasz, jhb) X-MFC after: to be determined in last commit with code from this project
218757	16-Feb-2011	bz	Mfp4 CH=177274,177280,177284-177285,177297,177324-177325 VNET socket push back: try to minimize the number of places where we have to switch vnets and narrow down the time we stay switched. Add assertions to the socket code to catch possibly unset vnets as seen in r204147. While this reduces the number of vnet recursion in some places like NFS, POSIX local sockets and some netgraph, .. recursions are impossible to fix. The current expectations are documented at the beginning of uipc_socket.c along with the other information there. Sponsored by: The FreeBSD Foundation Sponsored by: CK Software GmbH Reviewed by: jhb Tested by: zec Tested by: Mikolaj Golub (to.my.trociny gmail.com) MFC after: 2 weeks
216931	03-Jan-2011	rmacklem	Fix the nlm so that it no longer depends on the regular nfs client and, as such, can be loaded for the experimental nfs client without the regular client. Reviewed by: jhb MFC after: 2 weeks
215548	19-Nov-2010	kib	Remove prtactive variable and related printf()s in the vop_inactive and vop_reclaim() methods. They seems to be unused, and the reported situation is normal for the forced unmount. MFC after: 1 week X-MFC-note: keep prtactive symbol in vfs_subr.c
214513	29-Oct-2010	rmacklem	Modify nfs_open() in the experimental NFS client to be compatible with the regular NFS client. Also, fix a couple of mutex lock issues. MFC after: 1 week
214511	29-Oct-2010	rmacklem	Add a call for nfsrpc_close() to ncl_reclaim() in the experimental NFSv4 client, since the call in ncl_inactive() might be missed because VOP_INACTIVE() is not guaranteed to be called before VOP_RECLAIM(). MFC after: 1 week
214406	26-Oct-2010	rmacklem	Add a flag to the experimental NFSv4 client to indicate when delegations are being returned for reasons other than a Recall. Also, re-organize nfscl_recalldeleg() slightly, so that it leaves clearing NMODIFIED to the ncl_flush() call and invalidates the attribute cache after flushing. It is hoped that these changes might fix the problem others have seen when using the NFSv4 client with delegations enabled, since I can't reliably reproduce the problem. These changes only affect the client when doing NFSv4 mounts with delegations enabled. MFC after: 10 days
214053	19-Oct-2010	rmacklem	Fix the type of the 3rd argument for nm_getinfo so that it works for architectures like sparc64. Suggested by: kib MFC after: 2 weeks
214048	19-Oct-2010	rmacklem	Modify the NFS clients and the NLM so that the NLM can be used by both clients. Since the NLM uses various fields of the nfsmount structure, those fields were extracted and put in a separate nfs_mountcommon structure stored in sys/nfs/nfs_mountcommon.h. This structure also has a function pointer for a function that extracts the required information from the mount point and nfs vnode for that particular client, for information stored differently by the clients. Reviewed by: jhb MFC after: 2 weeks
212362	09-Sep-2010	rmacklem	Fix the experimental NFS client so that it doesn't panic when NFSv2,3 byte range locking is attempted. A fix that allows the nlm_advlock() to work with both clients is in progress, but may take a while. As such, I am doing this commit so that the kernel doesn't panic in the meantime. Submitted by: jh MFC after: 2 weeks
212293	07-Sep-2010	jhb	Store the full timestamp when caching timestamps of files and directories for purposes of validating name cache entries. This closes races where two updates to a file or directory within the same second could result in stale entries in the name cache. While here, remove the 'n_expiry' field as it is no longer used. Reviewed by: rmacklem MFC after: 1 week
212217	05-Sep-2010	rmacklem	Change the code in ncl_bioread() in the experimental NFS client to return an error when rabp is not set, so it behaves the same way as the regular NFS client for this case. It does not affect NFSv4, since nfs_getcacheblk() only fails for "intr" mounts and NFSv4 can't use the "intr" mount option. MFC after: 2 weeks
212216	05-Sep-2010	rmacklem	Disable use of the NLM in the experimental NFS client, since it will crash the kernel because it uses the nfsmount and nfsnode structures of the regular NFS client. MFC after: 2 weeks
211531	20-Aug-2010	jhb	Add dedicated routines to toggle lockmgr flags such as LK_NOSHARE and LK_CANRECURSE after a lock is created. Use them to implement macros that otherwise manipulated the flags directly. Assert that the associated lockmgr lock is exclusively locked by the current thread when manipulating these flags to ensure the flag updates are safe. This last change required some minor shuffling in a few filesystems to exclusively lock a brand new vnode slightly earlier. Reviewed by: kib MFC after: 3 days
210786	03-Aug-2010	rmacklem	Modify the return value for nfscl_mustflush() from boolean_t, which I mistakenly thought was correct w.r.t. style(9), back to int and add the checks for != 0. This is just a stylistic modification. MFC after: 1 week
210455	24-Jul-2010	rmacklem	Move sys/nfsclient/nfs_lock.c into sys/nfs and build it as a separate module that can be used by both the regular and experimental nfs clients. This fixes the problem reported by jh@ where /dev/nfslock would be registered twice when both nfs clients were used. I also defined the size of the lm_fh field to be the correct value, as it should be the maximum size of an NFSv3 file handle. Reviewed by: jh MFC after: 2 weeks
210227	18-Jul-2010	rmacklem	Add a call to nfscl_mustflush() in nfs_close() of the experimental NFSv4 client, so that attributes are not acquired from the server when a delegation for the file is held. This can reduce the number of Getattr Ops significantly. MFC after: 2 weeks
210201	18-Jul-2010	rmacklem	Change the nfscl_mustflush() function in the experimental NFSv4 client to return a boolean_t in order to make it more compatible with style(9). MFC after: 2 weeks
210136	15-Jul-2010	jhb	Retire the NFS access cache timestamp structure. It was used in VOP_OPEN() to avoid sending multiple ACCESS/GETATTR RPCs during a single open() between VOP_LOOKUP() and VOP_OPEN(). Now we always send the RPC in VOP_LOOKUP() and not VOP_OPEN() in the cases that multiple RPCs could be sent. MFC after: 2 weeks
210135	15-Jul-2010	jhb	Merge 208603, 209946, and 209948 to the new NFS client: Move attribute cache flushes from VOP_OPEN() to VOP_LOOKUP() to provide more graceful recovery for stale filehandles and eliminate the need for conditionally clearing the attribute cache in the !NMODIFIED case in VOP_OPEN(). Reviewed by: rmacklem MFC after: 2 weeks
210034	13-Jul-2010	rmacklem	For the experimental NFSv4 client, make sure that attributes that predate the issue of a delegation are not cached once the delegation is held. This is necessary, since cached attributes remain valid while the delegation is held. MFC after: 2 weeks
210032	13-Jul-2010	rmacklem	For the experimental NFSv4 client, do not use cached attributes that were invalidated, even when a delegation for the file is held. MFC after: 2 weeks
209191	15-Jun-2010	rmacklem	Add MODULE_DEPEND() macros to the experimental NFS client and server so that the modules will load when kernels are built with none of the NFS* configuration options specified. I believe this resolves the problems reported by PR kern/144458 and the email on freebsd-stable@ posted by Dmitry Pryanishnikov on June 13. Tested by: kib PR: kern/144458 Reviewed by: kib MFC after: 1 week
209120	13-Jun-2010	kib	In NFS clients, instead of inconsistently using #ifdef DIAGNOSTIC and #ifndef DIAGNOSTIC for debug assertions, prefer KASSERT(). Also change one #ifdef DIAGNOSTIC in the new nfs server. Submitted by: Mikolaj Golub <to.my.trociny gmail com> MFC after: 2 weeks
208254	18-May-2010	rmacklem	Allow the experimental NFSv4 client to use cached attributes when a write delegation is held. Also, add a missing mtx_unlock() call for the ACL debugging code. MFC after: 5 days
208234	18-May-2010	rmacklem	Add a sanity check for a negative args.fhsize to the experimental NFS client. MFC after: 5 days
207746	07-May-2010	alc	Push down the page queues lock into vm_page_activate().
207728	06-May-2010	alc	Eliminate page queues locking around most calls to vm_page_free().
207669	05-May-2010	alc	Acquire the page lock around all remaining calls to vm_page_free() on managed pages that didn't already have that lock held. (Freeing an unmanaged page, such as the various pmaps use, doesn't require the page lock.) This allows a change in vm_page_remove()'s locking requirements. It now expects the page lock to be held instead of the page queues lock. Consequently, the page queues lock is no longer required at all by callers to vm_page_rename(). Discussed with: kib
207662	05-May-2010	trasz	Move checking against RLIMIT_FSIZE into one place, vn_rlimit_fsize(). Reviewed by: kib
207584	03-May-2010	kib	Lock the page around vm_page_activate() and vm_page_deactivate() calls where it was missed. The wrapped fragments now protect wire_count with page lock. Reviewed by: alc
207350	28-Apr-2010	rmacklem	For the experimental NFS client, it should always flush dirty buffers before closing the NFSv4 opens, as the comment states. This patch deletes the call to nfscl_mustflush() which would return 0 for the case where a delegation still exists, which was incorrect and could cause crashes during recovery from an expired lease. MFC after: 1 week
207349	28-Apr-2010	rmacklem	Delete a diagnostic statement that is no longer useful from the experimental NFS client. MFC after: 1 week
207170	24-Apr-2010	rmacklem	An NFSv4 server will reply NFSERR_GRACE for non-recovery RPCs during the grace period after startup. This grace period must be at least the lease duration, which is typically 1-2 minutes. It seems prudent for the experimental NFS client to wait a few seconds before retrying such an RPC, so that the server isn't flooded with non-recovery RPCs during recovery. This patch adds an argument to nfs_catnap() to implement a 5 second delay for this case. MFC after: 1 week
207082	22-Apr-2010	rmacklem	When the experimental NFS client is handling an NFSv4 server reboot with delegations enabled, the recovery could fail if the renew thread is trying to return a delegation, since it will not do the recovery. This patch fixes the above by having nfscl_recalldeleg() fail with the I/O operations returning EIO, so that they will be attempted later. Most of the patch consists of adding an argument to various functions to indicate the delegation recall case where this needs to be done. MFC after: 1 week
206880	20-Apr-2010	rmacklem	For the experimental NFS client doing an NFSv4 mount, set the NFSCLFLAGS_RECVRINPROG while doing recovery from an expired lease in a manner similar to r206818 for server reboot recovery. This will prevent the function that acquires stateids for I/O operations from acquiring out of date stateids during recovery. Also, fix up mutex locking on the nfsc_flags field. MFC after: 1 week
206818	18-Apr-2010	rmacklem	Avoid extraneous recovery cycles in the experimental NFS client when an NFSv4 server reboots, by doing two things. 1 - Make the function that acquires a stateid for I/O operations block until recovery is complete, so that it doesn't acquire out of date stateids. 2 - Only allow a recovery once every 1/2 of a lease duration, since the NFSv4 server must provide a recovery grace period of at least a lease duration. This should avoid recoveries caused by an out of date stateid that was acquired for an I/O op. just before a recovery cycle started. MFC after: 1 week
206690	15-Apr-2010	rmacklem	Add mutex lock calls to 2 cases in the experimental NFS client's renew thread where they were missing. MFC after: 1 week
206688	15-Apr-2010	rmacklem	The experimental NFS client was not filling in recovery credentials for opens done locally in the client when a delegation for the file was held. This could cause the client to crash in crsetgroups() when recovering from a server crash/reboot. This patch fills in the recovery credentials for this case, in order to avoid the client crash. Also, add KASSERT()s to the credential copy functions, to catch any other cases where the credentials aren't filled in correctly. MFC after: 1 week
203303	31-Jan-2010	rmacklem	Patch the experimental NFS client so that there is a timeout for negative name cache entries in a manner analogous to r202767 for the regular NFS client. Also, make the code in nfs_lookup() compatible with that of the regular client and replace the sysctl variable that enabled negative name caching with the mount point option. MFC after: 2 weeks
203119	28-Jan-2010	rmacklem	Patch the experimental NFS client in a manner analogous to r203072 for the regular NFS client. Also, delete two fields of struct nfsmount that are not used by the FreeBSD port of the client. MFC after: 2 weeks
201439	03-Jan-2010	rmacklem	Fix three related problems in the experimental nfs client when checking for conflicts w.r.t. byte range locks for NFSv4. 1 - Return 0 instead of EACCES when a conflict is found, for F_GETLK. 2 - Check for "same file" when checking for a conflict. 3 - Don't check for a conflict for the F_UNLCK case.
201345	31-Dec-2009	rmacklem	Fix the experimental NFS client so that it can create Unix domain sockets on an NFSv4 mount point. It was generating incorrect XDR in the request for this case. Tested by: infofarmer MFC after: 2 weeks
201029	26-Dec-2009	rmacklem	When porting the experimental nfs subsystem to the FreeBSD8 krpc, I added 3 functions that were already in the experimental client under different names. This patch deletes the functions in the experimental client and renames the calls to use the other set. (This is just removal of duplicated code and does not fix any bug.) MFC after: 2 weeks
200069	03-Dec-2009	trasz	Remove unneeded ifdefs. Reviewed by: rmacklem
199189	11-Nov-2009	jh	Create verifier used by FreeBSD NFS client is suboptimal because the first part of a verifier is set to the first IP address from V_in_ifaddrhead list. This address is typically the loopback address making the first part of the verifier practically non-unique. The second part of the verifier is initialized to zero making its initial value non-unique too. This commit changes the strategy for create verifier initialization: just initialize it to a random value. Also move verifier handling into its own function and use a mutex to protect the variable. This change is a candidate for porting to sys/nfsclient. Reviewed by: jhb, rmacklem Approved by: trasz (mentor)
198291	20-Oct-2009	jh	Unloading of the nfscl module is unsupported because newnfslock doesn't support unloading. It's not trivial to implement newnfslock unloading so for now just admit that unloading is unsupported and refuse to attempt unload in all nfscl module event handlers. Reviewed by: rmacklem Approved by: trasz (mentor)
198290	20-Oct-2009	jh	Fix ordering of nfscl_modevent() and ncl_uninit(). nfscl_modevent() must be called after ncl_uninit() when unloading the nfscl module because ncl_uninit() uses ncl_iod_mutex which is destroyed in nfscl_modevent(). Reviewed by: rmacklem Approved by: trasz (mentor)
198289	20-Oct-2009	jh	Fix comment typos. Reviewed by: rmacklem Approved by: trasz (mentor)
197048	09-Sep-2009	rmacklem	Add LK_NOWITNESS to the vn_lock() calls done on newly created nfs vnodes, since these nodes are not linked into the mount queue and, as such, the vn_lock() cannot cause a deadlock so LORs are harmless. Suggested by: kib Approved by: kib (mentor) MFC after: 3 days
196503	24-Aug-2009	zec	Fix NFS panics with options VIMAGE kernels by apropriately setting curvnet context inside the RPC code. Temporarily set td's cred to mount's cred before calling socreate() via __rpc_nconf2socket(). Submitted by: rmacklem (in part) Reviewed by: rmacklem, rwatson Discussed with: dfr, bz Approved by: re (rwatson), julian (mentor) MFC after: 3 days
196332	17-Aug-2009	rmacklem	Apply the same patch as r196205 for nfs_upgrade_lock() and nfs_downgrade_lock() to the experimental nfs client. Approved by: re (kensmith), kib (mentor)
195943	29-Jul-2009	rmacklem	Fix the experimental nfs client so that it only calls ncl_vinvalbuf() for NFSv2 and not NFSv4 when nfscl_mustflush() returns 0. Since nfscl_mustflush() only returns 0 when there is a valid write delegation issued to the client, it only affects the case of an NFSv4 mount with callbacks/delegations enabled. Approved by: re (kensmith), kib (mentor)
195825	22-Jul-2009	rmacklem	When vfs.newnfs.callback_addr is set to an IPv4 address, the experimental NFSv4 client might try and use it as an IPv6 address, breaking callbacks. The fix simply initializes the isinet6 variable for this case. Approved by: re (kensmith), kib (mentor)
195821	22-Jul-2009	rmacklem	Add changes to the experimental nfs client to use the PBDRY flag for msleep(9) when a vnode lock or similar may be held. The changes are just a clone of the changes applied to the regular nfs client by r195703. Approved by: re (kensmith), kib (mentor)
195819	22-Jul-2009	rmacklem	When using an NFSv4 mount in the experimental nfs client with delegations being issued from the server, there was a case where an Open issued locally based on the delegation would be released before the associated vnode became inactive. If the delegation was recalled after the open was released, an Open against the server would not have been acquired and subsequent I/O operations would need to use the special stateid of all zeros. This patch fixes that case. Approved by: re (kensmith), kib (mentor)
195762	19-Jul-2009	rmacklem	Fix two bugs in the experimental nfs client: - When the root vnode was acquired during mounting, mnt_stat.f_iosize was still set to 0, so getnewvnode() would set bo_bsize == 0. This would confuse getblk(), so that it always returned the first block causing the problem when the root directory of the mount point was greater than one block in size. It was fixed by setting mnt_stat.f_iosize to NFS_DIRBLKSIZ before calling ncl_nget() to acquire the root vnode. - NFSMNT_INT was being set temporarily while the initial connect to a server was being done. This erroneously configured the krpc for interruptible RPCs, which caused problems because signals weren't being masked off as they would have been for interruptible mounts. This code was deleted to fix the problem. Since mount_nfs does an NFS null RPC before the mount system call, connections to the server should work ok. Tested by: swell dot k at gmail dot com Approved by: re (kensmith), kib (mentor)
195704	14-Jul-2009	rmacklem	Fix the experimental nfs client so that it does not cause a "share->excl" panic when doing a lookup of dotdot at the root of a server's file system. The patch avoids calling vn_lock() for that case, since nfscl_nget() has already acquired a lock for the vnode. Approved by: re (kensmith), kib (mentor)
195699	14-Jul-2009	rwatson	Build on Jeff Roberson's linker-set based dynamic per-CPU allocator (DPCPU), as suggested by Peter Wemm, and implement a new per-virtual network stack memory allocator. Modify vnet to use the allocator instead of monolithic global container structures (vinet, ...). This change solves many binary compatibility problems associated with VIMAGE, and restores ELF symbols for virtualized global variables. Each virtualized global variable exists as a "reference copy", and also once per virtual network stack. Virtualized global variables are tagged at compile-time, placing the in a special linker set, which is loaded into a contiguous region of kernel memory. Virtualized global variables in the base kernel are linked as normal, but those in modules are copied and relocated to a reserved portion of the kernel's vnet region with the help of a the kernel linker. Virtualized global variables exist in per-vnet memory set up when the network stack instance is created, and are initialized statically from the reference copy. Run-time access occurs via an accessor macro, which converts from the current vnet and requested symbol to a per-vnet address. When "options VIMAGE" is not compiled into the kernel, normal global ELF symbols will be used instead and indirection is avoided. This change restores static initialization for network stack global variables, restores support for non-global symbols and types, eliminates the need for many subsystem constructors, eliminates large per-subsystem structures that caused many binary compatibility issues both for monitoring applications (netstat) and kernel modules, removes the per-function INIT_VNET_*() macros throughout the stack, eliminates the need for vnet_symmap ksym(2) munging, and eliminates duplicate definitions of virtualized globals under VIMAGE_GLOBALS. Bump __FreeBSD_version and update UPDATING. Portions submitted by: bz Reviewed by: bz, zec Discussed with: gnn, jamie, jeff, jhb, julian, sam Suggested by: peter Approved by: re (kensmith)
195641	12-Jul-2009	rmacklem	Fix the handling of dotdot in lookup for the experimental nfs client in a manner analagous to the change in r195294 for the regular nfs client. Approved by: re (kensmith), kib (mentor)
195510	09-Jul-2009	rmacklem	Since the nfscl_getclose() function both decremented open counts and, optionally, created a separate list of NFSv4 opens to be closed, it was possible for the associated OpenOwner to be free'd before the Open was closed. The problem was that the Open was taken off the OpenOwner list before the Close RPC was done and OpenOwners can be free'd once the list is empty. This patch separates out the case of doing the Close RPC into a separate function called nfscl_doclose() and simplifies nfsrpc_doclose() so that it closes a single open instead of a list of them. This avoids removing the Open from the OpenOwner list before doing the Close RPC. Approved by: re (kensmith), kib (mentor)
194951	25-Jun-2009	rwatson	Add a new global rwlock, in_ifaddr_lock, which will synchronize use of the in_ifaddrhead and INADDR_HASH address lists. Previously, these lists were used unsynchronized as they were effectively never changed in steady state, but we've seen increasing reports of writer-writer races on very busy VPN servers as core count has gone up (and similar configurations where address lists change frequently and concurrently). For the time being, use rwlocks rather than rmlocks in order to take advantage of their better lock debugging support. As a result, we don't enable ip_input()'s read-locking of INADDR_HASH until an rmlock conversion is complete and a performance analysis has been done. This means that one class of reader-writer races still exists. MFC after: 6 weeks Reviewed by: bz
194541	20-Jun-2009	rmacklem	Replace RPCAUTH_UNIXGIDS with NFS_MAXGRPS so that nfscbd.c will build. Approved by: kib (mentor)
194523	20-Jun-2009	rmacklem	Change the size of the nfsc_groups[] array in the experimental nfs client to RPCAUTH_UNIXGIDS + 1 (17), since that is what can go on the wire for AUTH_SYS authentication. Reviewed by: brooks Approved by: kib (mentor)
194498	19-Jun-2009	brooks	Rework the credential code to support larger values of NGROUPS and NGROUPS_MAX, eliminate ABI dependencies on them, and raise the to 1024 and 1023 respectively. (Previously they were equal, but under a close reading of POSIX, NGROUPS_MAX was defined to be too large by 1 since it is the number of supplemental groups, not total number of groups.) The bulk of the change consists of converting the struct ucred member cr_groups from a static array to a pointer. Do the equivalent in kinfo_proc. Introduce new interfaces crcopysafe() and crsetgroups() for duplicating a process credential before modifying it and for setting group lists respectively. Both interfaces take care for the details of allocating groups array. crsetgroups() takes care of truncating the group list to the current maximum (NGROUPS) if necessary. In the future, crsetgroups() may be responsible for insuring invariants such as sorting the supplemental groups to allow groupmember() to be implemented as a binary search. Because we can not change struct xucred without breaking application ABIs, we leave it alone and introduce a new XU_NGROUPS value which is always 16 and is to be used or NGRPS as appropriate for things such as NFS which need to use no more than 16 groups. When feasible, truncate the group list rather than generating an error. Minor changes: - Reduce the number of hand rolled versions of groupmember(). - Do not assign to both cr_gid and cr_groups[0]. - Modify ipfw to cache ucreds instead of part of their contents since they are immutable once referenced by more than one entity. Submitted by: Isilon Systems (initial implementation) X-MFC after: never PR: bin/113398 kern/133867
194425	18-Jun-2009	alc	Fix some of the style errors in *getpages().
194408	17-Jun-2009	rmacklem	Add the SVC_RELEASE(xprt), as required by r194407. Approved by: kib (mentor)
194368	17-Jun-2009	bz	Add explicit includes for jail.h to the files that need them and remove the "hidden" one from vimage.h.
194363	17-Jun-2009	rmacklem	Fix handling of ".." in nfs_lookup() for the forced dismount case by cribbing the change made to the regular nfs client in r194358. Approved by: kib (mentor)
194118	13-Jun-2009	jamie	Rename the host-related prison fields to be the same as the host.* parameters they represent, and the variables they replaced, instead of abbreviated versions of them. Approved by: bz (mentor)
194117	13-Jun-2009	jamie	Use getcredhostuuid instead of accessing the prison directly. Approved by: bz (mentor)
193955	10-Jun-2009	rmacklem	This commit is analagous to r193952, but for the experimental nfs subsystem. Add a test for VI_DOOMED just after ncl_upgrade_vnlock() in ncl_bioread_check_cons(). This is required since it is possible for the vnode to be vgonel()'d while in ncl_upgrade_vnlock() when a forced dismount is in progress. Also, move the check for VI_DOOMED in ncl_vinvalbuf() down to after ncl_upgrade_vnlock() and replace the out of date comment for it. Approved by: kib (mentor)
193837	09-Jun-2009	rmacklem	Since vn_lock() with the LK_RETRY flag never returns an error for FreeBSD-CURRENT, the code that checked for and returned the error was broken. Change it to check for VI_DOOMED set after vn_lock() and return an error for that case. I believe this should only happen for forced dismounts. Approved by: kib (mentor)
193735	08-Jun-2009	rmacklem	Fix nfscl_getcl() so that it doesn't crash when it is called to do an NFSv4 Close operation with the cred argument NULL. Also, clarify what NULL arguments mean in the function's comment. Approved by: kib (mentor)
193187	31-May-2009	alc	nfs_write() can use the recently introduced vfs_bio_set_valid() instead of vfs_bio_set_validclean(), thereby avoiding the page queues lock. Garbage collect vfs_bio_set_validclean(). Nothing uses it any longer.
193162	31-May-2009	zec	Unbreak options VIMAGE kernel builds. Approved by: julian (mentor)
193125	30-May-2009	rmacklem	Add a check to v_type == VREG for the recently modified code that does NFSv4 Closes in the experimental client's VOP_INACTIVE(). I also replaced a bunch of ap->a_vp with a local copy of vp, because I thought that made it more readable. Approved by: kib (mentor)
193066	29-May-2009	jamie	Place hostnames and similar information fully under the prison system. The system hostname is now stored in prison0, and the global variable "hostname" has been removed, as has the hostname_mtx mutex. Jails may have their own host information, or they may inherit it from the parent/system. The proper way to read the hostname is via getcredhostname(), which will copy either the hostname associated with the passed cred, or the system hostname if you pass NULL. The system hostname can still be accessed directly (and without locking) at prison0.pr_host, but that should be avoided where possible. The "similar information" referred to is domainname, hostid, and hostuuid, which have also become prison parameters and had their associated global variables removed. Approved by: bz (mentor)
192986	28-May-2009	alc	Make *getpages()s' assertion on the state of each page's dirty bits stricter.
192928	27-May-2009	rmacklem	Fix handling of NFSv4 Close operations in ncl_inactive(). Only do them for NFSv4 and flush writes to the server before doing the Close(s), as required. Also, use the a_td argument instead of curthread. Approved by: kib (mentor)
192675	24-May-2009	rmacklem	Fix the experimental nfsv4 client so that it works for the case of a kerberized mount without a host based principal name. This will only work for mounts being done by a user other than root. Support for a host based principal name will not work until proposed changes to the rpcsec_gss part of the krpc are committed. It now builds for "options KGSSAPI". Approved by: kib (mentor)
192613	22-May-2009	rmacklem	Change the sysctl_base argument to svcpool_create() to NULL for client side callbacks so that leaf names are not re-used, since they are already being used by the server. Approved by: kib (mentor)
192601	22-May-2009	rmacklem	Fix the name of the module common to the client and server in the experimental nfs subsystem to the correct one for the MODULE_DEPEND() macro. Approved by: kib (mentor)
192585	22-May-2009	rmacklem	Modify the mount handling code in the experimental nfs client to use the newer nmount() style arguments, as is used by mount_nfs.c. This prepares the kernel code for the use of a mount_nfs.c with changes for the experimental client integrated into it. Approved by: kib (mentor)
192582	22-May-2009	rmacklem	Change the code in the experimental nfs client to avoid flushing writes upon close when a write delegation is held by the client. This should be safe to do, now that nfsv4 Close operations are delayed until ncl_inactive() is called for the vnode. Approved by: kib (mentor)
192337	18-May-2009	rmacklem	Change the experimental NFSv4 client so that it does not do the NFSv4 Close operations until ncl_inactive(). This is necessary so that the Open StateIDs are available for doing I/O on mmap'd files after VOP_CLOSE(). I also changed some indentation for the nfscl_getclose() function. Approved by: kib (mentor)
192245	17-May-2009	alc	Merge r191964: Eliminate a case of unnecessary page queues locking.
192231	16-May-2009	rmacklem	Changed sys/fs/nfs_clbio.c in the same way Alan Cox changed sys/nfsclient/nfs_bio.c for r192134, so that the sources stay in sync. Approved by: kib (mentor)
192145	15-May-2009	rmacklem	Modify the diskless booting code in sys/fs/nfsclient to be compatible with what is in sys/nfsclient, so that it will at least build now. Approved by: kib (mentor)
192121	14-May-2009	rmacklem	Apply changes to the experimental nfs server so that it uses the security flavors as exported in FreeBSD-CURRENT. This allows it to use a slightly modified mountd.c instead of a different utility. Approved by: kib (mentor)
192065	13-May-2009	rmacklem	Apply a one line change to nfs_clbio.c (which is largely a copy of sys/nfsclient/nfs_bio.c) to track the change recently committed by acl for nfs_bio.c. Approved by: kib (mentor)
191990	11-May-2009	attilio	Remove the thread argument from the FSD (File-System Dependent) parts of the VFS. Now all the VFS_* functions and relating parts don't want the context as long as it always refers to curthread. In some points, in particular when dealing with VOPs and functions living in the same namespace (eg. vflush) which still need to be converted, pass curthread explicitly in order to retain the old behaviour. Such loose ends will be fixed ASAP. While here fix a bug: now, UFS_EXTATTR can be compiled alone without the UFS_EXTATTR_AUTOSTART option. VFS KPI is heavilly changed by this commit so thirdy parts modules needs to be recompiled. Bump __FreeBSD_version in order to signal such situation.
191783	04-May-2009	rmacklem	Add the experimental nfs subtree to the kernel, that includes support for NFSv4 as well as NFSv2 and 3. It lives in 3 subdirs under sys/fs: nfs - functions that are common to the client and server nfsclient - a mutation of sys/nfsclient that call generic functions to do RPCs and handle state. As such, it retains the buffer cache handling characteristics and vnode semantics that are found in sys/nfsclient, for the most part. nfsserver - the server. It includes a DRC designed specifically for NFSv4, that is used instead of the generic DRC in sys/rpc. The build glue will be checked in later, so at this point, it consists of 3 new subdirs that should not affect kernel building. Approved by: kib (mentor)