Cross Reference: /freebsd-9.3-release/sys/fs/nfsserver/nfs

History log of /freebsd-9.3-release/sys/fs/nfsserver/nfs_nfsdport.c
Revision	Date	Author	Comments (<<< Hide modified files) (Show modified files >>>)
# 267654	19-Jun-2014	gjb	Copy stable/9 to releng/9.3 as part of the 9.3-RELEASE cycle. Approved by: re (implicit) Sponsored by: The FreeBSD Foundation /freebsd-9.3-release
# 266447	19-May-2014	rmacklem	MFC: r227809 This patch enables the new/default NFS server's use of shared vnode locking for read, readdir, readlink, getattr and access. It is hoped that this will improve server performance for these operations, since they will no longer be serialized for a given file/vnode. PR: 167048
# 265724	08-May-2014	rmacklem	MFC: r265252 The new draft specification for NFSv4.0 specifies that a server should either accept owner and owner_group strings that are just the digits of the uid/gid or return NFS4ERR_BADOWNER. This patch adds a sysctl vfs.nfsd.enable_stringtouid, which can be set to enable the server w.r.t. accepting numeric string. It also ensures that NFS4ERR_BADOWNER is returned if numeric uid/gid strings are not enabled. This fixes the server for recent Linux nfs4 clients that use numeric uid/gid strings by default.
# 265723	08-May-2014	rmacklem	MFC: r264888 The PR reported that the old NFS server did not set uio_td == NULL for the VOP_READ() call. This patch fixes both the old and new server for this case.
# 261067	22-Jan-2014	mav	MFC r260229, r260258, r260367, r260390, r260459, r260648: Rework NFS Duplicate Request Cache cleanup logic. - Introduce additional hash to group requests by hash of sockref. This allows to process TCP acknowledgements without looping though all the cache, and as result allows to do it every time. - Indroduce additional callbacks to notify application layer about sockets disconnection. Without this last few requests processed just before socket disconnection never processed their ACKs and stuck in cache for many hours. - Implement transport-specific method for tracking reply acknowledgements. New implementation does not cross multiple stack layers to get the data and does not have race conditions that previously made some requests stuck in cache. This could be done more efficiently at sockbuf layer, but that would broke some KBIs, while I don't know other consumers for it aside NFS. - Instead of traversing all DRC twice per request, run cleaning only once per request, and except in some conditions traverse only single hash slot at a time. Together this limits NFS DRC growth only to situations of real connectivity problems. If network is working well, and so all replies are acknowledged, cache remains almost empty even after hours of heavy load. Without this change on the same test cache was growing to many thousand requests even with perfectly working local network. As another result this reduces CPU time spent on the DRC handling during SPEC NFS benchmark from about 10% to 0.5%. Sponsored by: iXsystems, Inc.
# 260174	01-Jan-2014	rmacklem	MFC: r259854 The NFSv4 server would call VOP_SETATTR() with a shared locked vnode when a Getattr for a file is done by a client other than the one that holds the file's delegation. This would only happen when delegations are enabled and the problem is fixed by this patch.
# 260173	01-Jan-2014	rmacklem	MFC: r259845 An intermittent problem with NFSv4 exporting of ZFS snapshots was reported to the freebsd-fs mailing list. I believe the problem was caused by the Readdir operation using VFS_VGET() for a snapshot file entry instead of VOP_LOOKUP(). This would not occur for NFSv3, since it will do a VFS_VGET() of "." which fails with ENOTSUPP at the beginning of the directory, whereas NFSv4 does not check "." or "..". This patch adds a call to VFS_VGET() for the directory being read to check for ENOTSUPP. I also observed that the mount_on_fileid and fsid attributes were not correct at the snapshot's auto mountpoints when looking at packet traces for the Readdir. This patch fixes the attributes by doing a check for different v_mount structure, even if the vnode v_mountedhere is not set.
# 255532	13-Sep-2013	rmacklem	MFC: r254337 Fix several performance related issues in the new NFS server's DRC for NFS over TCP. - Increase the size of the hash tables. - Create a separate mutex for each hash list of the TCP hash table. - Single thread the code that deletes stale cache entries. - Add a tunable called vfs.nfsd.tcphighwater, which can be increased to allow the cache to grow larger, avoiding the overhead of frequent scans to delete stale cache entries. (The default value will result in frequent scans to delete stale cache entries, analagous to what the pre-patched code does.) - Add a tunable called vfs.nfsd.cachetcp that can be used to disable DRC caching for NFS over TCP, since the old NFS server didn't DRC cache TCP. It also adjusts the size of nfsrc_floodlevel dynamically, so that it is always greater than vfs.nfsd.tcphighwater. For UDP the algorithm remains the same as the pre-patched code, but the tunable vfs.nfsd.udphighwater can be used to allow the cache to grow larger and reduce the overhead caused by frequent scans for stale entries. UDP also uses a larger hash table size than the pre-patched code.
# 251641	11-Jun-2013	ken	MFC NFS FHA changes 249592 and 249596: ------------------------------------------------------------------------ r249592 \| ken \| 2013-04-17 15:00:22 -0600 (Wed, 17 Apr 2013) \| 180 lines Revamp the old NFS server's File Handle Affinity (FHA) code so that it will work with either the old or new server. The FHA code keeps a cache of currently active file handles for NFSv2 and v3 requests, so that read and write requests for the same file are directed to the same group of threads (reads) or thread (writes). It does not currently work for NFSv4 requests. They are more complex, and will take more work to support. This improves read-ahead performance, especially with ZFS, if the FHA tuning parameters are configured appropriately. Without the FHA code, concurrent reads that are part of a sequential read from a file will be directed to separate NFS threads. This has the effect of confusing the ZFS zfetch (prefetch) code and makes sequential reads significantly slower with clients like Linux that do a lot of prefetching. The FHA code has also been updated to direct write requests to nearby file offsets to the same thread in the same way it batches reads, and the FHA code will now also send writes to multiple threads when needed. This improves sequential write performance in ZFS, because writes to a file are now more ordered. Since NFS writes (generally less than 64K) are smaller than the typical ZFS record size (usually 128K), out of order NFS writes to the same block can trigger a read in ZFS. Sending them down the same thread increases the odds of their being in order. In order for multiple write threads per file in the FHA code to be useful, writes in the NFS server have been changed to use a LK_SHARED vnode lock, and upgrade that to LK_EXCLUSIVE if the filesystem doesn't allow multiple writers to a file at once. ZFS is currently the only filesystem that allows multiple writers to a file, because it has internal file range locking. This change does not affect the NFSv4 code. This improves random write performance to a single file in ZFS, since we can now have multiple writers inside ZFS at one time. I have changed the default tuning parameters to a 22 bit (4MB) window size (from 256K) and unlimited commands per thread as a result of my benchmarking with ZFS. The FHA code has been updated to allow configuring the tuning parameters from loader tunable variables in addition to sysctl variables. The read offset window calculation has been slightly modified as well. Instead of having separate bins, each file handle has a rolling window of bin_shift size. This minimizes glitches in throughput when shifting from one bin to another. sys/conf/files: Add nfs_fha_new.c and nfs_fha_old.c. Compile nfs_fha.c when either the old or the new NFS server is built. sys/fs/nfs/nfsport.h, sys/fs/nfs/nfs_commonport.c: Bring in changes from Rick Macklem to newnfs_realign that allow it to operate in blocking (M_WAITOK) or non-blocking (M_NOWAIT) mode. sys/fs/nfs/nfs_commonsubs.c, sys/fs/nfs/nfs_var.h: Bring in a change from Rick Macklem to allow telling nfsm_dissect() whether or not to wait for mallocs. sys/fs/nfs/nfsm_subs.h: Bring in changes from Rick Macklem to create a new nfsm_dissect_nonblock() inline function and NFSM_DISSECT_NONBLOCK() macro. sys/fs/nfs/nfs_commonkrpc.c, sys/fs/nfsclient/nfs_clkrpc.c: Add the malloc wait flag to a newnfs_realign() call. sys/fs/nfsserver/nfs_nfsdkrpc.c: Setup the new NFS server's RPC thread pool so that it will call the FHA code. Add the malloc flag argument to newnfs_realign(). Unstaticize newnfs_nfsv3_procid[] so that we can use it in the FHA code. sys/fs/nfsserver/nfs_nfsdsocket.c: In nfsrvd_dorpc(), add NFSPROC_WRITE to the list of RPC types that use the LK_SHARED lock type. sys/fs/nfsserver/nfs_nfsdport.c: In nfsd_fhtovp(), if we're starting a write, check to see whether the underlying filesystem supports shared writes. If not, upgrade the lock type from LK_SHARED to LK_EXCLUSIVE. sys/nfsserver/nfs_fha.c: Remove all code that is specific to the NFS server implementation. Anything that is server-specific is now accessed through a callback supplied by that server's FHA shim in the new softc. There are now separate sysctls and tunables for the FHA implementations for the old and new NFS servers. The new NFS server has its tunables under vfs.nfsd.fha, the old NFS server's tunables are under vfs.nfsrv.fha as before. In fha_extract_info(), use callouts for all server-specific code. Getting file handles and offsets is now done in the individual server's shim module. In fha_hash_entry_choose_thread(), change the way we decide whether two reads are in proximity to each other. Previously, the calculation was a simple shift operation to see whether the offsets were in the same power of 2 bucket. The issue was that there would be a bucket (and therefore thread) transition, even if the reads were in close proximity. When there is a thread transition, reads wind up going somewhat out of order, and ZFS gets confused. The new calculation simply tries to see whether the offsets are within 1 << bin_shift of each other. If they are, the reads will be sent to the same thread. The effect of this change is that for sequential reads, if the client doesn't exceed the max_reqs_per_nfsd parameter and the bin_shift is set to a reasonable value (22, or 4MB works well in my tests), the reads in any sequential stream will largely be confined to a single thread. Change fha_assign() so that it takes a softc argument. It is now called from the individual server's shim code, which will pass in the softc. Change fhe_stats_sysctl() so that it takes a softc parameter. It is now called from the individual server's shim code. Add the current offset to the list of things printed out about each active thread. Change the num_reads and num_writes counters in the fha_hash_entry structure to 32-bit values, and rename them num_rw and num_exclusive, respectively, to reflect their changed usage. Add an enable sysctl and tunable that allows the user to disable the FHA code (when vfs.XXX.fha.enable = 0). This is useful for before/after performance comparisons. nfs_fha.h: Move most structure definitions out of nfs_fha.c and into the header file, so that the individual server shims can see them. Change the default bin_shift to 22 (4MB) instead of 18 (256K). Allow unlimited commands per thread. sys/nfsserver/nfs_fha_old.c, sys/nfsserver/nfs_fha_old.h, sys/fs/nfsserver/nfs_fha_new.c, sys/fs/nfsserver/nfs_fha_new.h: Add shims for the old and new NFS servers to interface with the FHA code, and callbacks for the The shims contain all of the code and definitions that are specific to the NFS servers. They setup the server-specific callbacks and set the server name for the sysctl and loader tunable variables. sys/nfsserver/nfs_srvkrpc.c: Configure the RPC code to call fhaold_assign() instead of fha_assign(). sys/modules/nfsd/Makefile: Add nfs_fha.c and nfs_fha_new.c. sys/modules/nfsserver/Makefile: Add nfs_fha_old.c. Reviewed by: rmacklem Sponsored by: Spectra Logic ------------------------------------------------------------------------ r249596 \| ken \| 2013-04-17 16:42:43 -0600 (Wed, 17 Apr 2013) \| 7 lines Move the NFS FHA (File Handle Affinity) code from sys/nfsserver to sys/nfs, since it is now shared by the two NFS servers. Suggested by: rmacklem Sponsored by: Spectra Logic ------------------------------------------------------------------------ Sponsored by: Spectra Logic
# 250060	29-Apr-2013	des	Fix a bug that allows NFS clients to issue READDIR on files. PR: kern/178016 Security: CVE-2013-3266 Security: FreeBSD-SA-13:05.nfsserver Approved by: so
# 247502	28-Feb-2013	jhb	MFC 245508,245566,245568,245611,245909: Various fixes to timestamps in NFS: - Use the VA_UTIMES_NULL flag to detect when NULL was passed to utimes() instead of comparing the desired time against the current time as a heuristic. - Remove unused nfs_curusec(). - Use vfs_timestamp() to set file timestamps rather than invoking getmicrotime() or getnanotime() directly in NFS. - Use NFSD_MONOSEC (which maps to time_uptime) instead of the seconds portion of wall-time stamps to manage timeouts on events. - Remove unused nd_starttime from the per-request structure in the new NFS server. - Use nanotime() for the modification time on a delegation to get as precise a time as possible. - Use time_second instead of extracting the second from a call to getmicrotime().
# 244658	24-Dec-2012	kib	MFC r241025: Fix the mis-handling of the VV_TEXT on the nullfs vnodes. Add a set of VOPs for the VV_TEXT query, set and clear operations, which are correctly bypassed to lower vnode.
# 243738	30-Nov-2012	rmacklem	MFC: r241561 Add two new options to the nfssvc(2) syscall that allow processes running as root to suspend/resume execution of the kernel nfsd threads. An earlier version of this patch was tested by Vincent Hoffman (vince at unsane.co.uk) and John Hickey (jh at deterlab.net).
# 241194	04-Oct-2012	rmacklem	MFC: r240720 Modify the NFSv4 client so that it can handle owner and owner_group strings that consist entirely of digits, interpreting them as the uid/gid number. This change was needed since new (>= 3.3) Linux servers reply with these strings by default. This change is mandated by the rfc3530bis draft. Reported on freebsd-stable@ under the Subject heading "Problem with Linux >= 3.3 as NFSv4 server" by Norbert Aschendorff on Aug. 20, 2012.
# 236134	26-May-2012	rmacklem	MFC: r234740 Fix a leak of namei lookup path buffers that occurs when a ZFS volume is exported via the new NFS server. The leak occurred because the new NFS server code didn't handle the case where a file system sets the SAVENAME flag in its VOP_LOOKUP() and ZFS does this for the DELETE case.
# 235626	18-May-2012	mckusick	MFC of 234386, 234400, 234441, 234443, 234482, 234483, 235052, 235241, 235246, and 235619 MFC: 234386 Replace the MNT_VNODE_FOREACH interface with MNT_VNODE_FOREACH_ALL. The primary changes are that the user of the interface no longer needs to manage the mount-mutex locking and that the vnode that is returned has its mutex locked (thus avoiding the need to check to see if its is DOOMED or other possible end of life senarios). To minimize compatibility issues for third-party developers, the old MNT_VNODE_FOREACH interface will remain available so that this change can be MFC'ed to 9. Following the MFC to 9, MNT_VNODE_FOREACH will be removed in head. The reason for this update is to prepare for the addition of the MNT_VNODE_FOREACH_ACTIVE interface that will loop over just the active vnodes associated with a mount point (typically less than 1% of the vnodes associated with the mount point). Reviewed by: kib Tested by: Peter Holm MFC after: 2 weeks MFC: 234400 Drop export of vdestroy() function from kern/vfs_subr.c as it is used only as a helper function in that file. Replace sole call to vbusy() with inline code in vholdl(). Replace sole calls to vfree() and vdestroy() with inline code in vdropl(). The Clang compiler already inlines these functions, so they do not show up in a kernel backtrace which is confusing. Also you cannot set their frame in kgdb which means that it is impossible to view their local variables. So, while the produced code is unchanged, the debugging should be easier. Discussed with: kib MFC after: 2 weeks MFC: 234441 Fix a memory leak of M_VNODE_MARKER introduced in 234386. Found by: Peter Holm MFC: 234443 Delete a no longer useful VNASSERT missed during changes in 234400. Suggested by: kib MFC: 234482 This change creates a new list of active vnodes associated with a mount point. Active vnodes are those with a non-zero use or hold count, e.g., those vnodes that are not on the free list. Note that this list is in addition to the list of all the vnodes associated with a mount point. To avoid adding another set of linkage pointers to the vnode structure, the active list uses the existing linkage pointers used by the free list (previously named v_freelist, now renamed v_actfreelist). This update adds the MNT_VNODE_FOREACH_ACTIVE interface that loops over just the active vnodes associated with a mount point (typically less than 1% of the vnodes associated with the mount point). Reviewed by: kib Tested by: Peter Holm MFC after: 2 weeks MFC: 234483 This update uses the MNT_VNODE_FOREACH_ACTIVE interface that loops over just the active vnodes associated with a mount point to replace MNT_VNODE_FOREACH_ALL in the vfs_msync, ffs_sync_lazy, and qsync routines. The vfs_msync routine is run every 30 seconds for every writably mounted filesystem. It ensures that any files mmap'ed from the filesystem with modified pages have those pages queued to be written back to the file from which they are mapped. The ffs_lazy_sync and qsync routines are run every 30 seconds for every writably mounted UFS/FFS filesystem. The ffs_lazy_sync routine ensures that any files that have been accessed in the previous 30 seconds have had their access times queued for updating in the filesystem. The qsync routine ensures that any files with modified quotas have those quotas queued to be written back to their associated quota file. In a system configured with 250,000 vnodes, less than 1000 are typically active at any point in time. Prior to this change all 250,000 vnodes would be locked and inspected twice every minute by the syncer. For UFS/FFS filesystems they would be locked and inspected six times every minute (twice by each of these three routines since each of these routines does its own pass over the vnodes associated with a mount point). With this change the syncer now locks and inspects only the tiny set of vnodes that are active. Reviewed by: kib Tested by: Peter Holm MFC after: 2 weeks MFC: 235052 (by pluknet) Fix mount mutex handling missed in r234386. MFC: 235241 (by pluknet) Fix mount interlock oversights from the previous change in r234386. Reported by: dougb Submitted by: Mateusz Guzik <mjguzik at gmail com> Reviewed by: Kirk McKusick Tested by: pho MFC: 235246 Fix mount mutex handling missed in r234386. MFC: 235619 Update comment to document that the vnode free-list mutex needs to be held when updating mnt_activevnodelist and mnt_activevnodelistsize.
# 232683	08-Mar-2012	rmacklem	MFC: r2323467 The name caching changes of r230394 exposed an intermittent bug in the new NFS server for NFSv4, where it would report ENOENT when the file actually existed on the server. This turned out to be caused by not initializing ni_topdir before calling lookup() and there was a rare case where the value on the stack location assigned to ni_topdir happened to be a pointer to a ".." entry, such that "dp == ndp->ni_topdir" succeeded in lookup(). This patch initializes ni_topdir to fix the problem.
# 232018	23-Feb-2012	rmacklem	MFC: r231805 Delete a couple of out of date comments that are no longer true in the new NFS client.
# 229827	08-Jan-2012	rmacklem	MFC: r228560 Patch the new NFS server in a manner analagous to r228520 for the old NFS server, so that it correctly handles a count == 0 argument for Commit.
# 229617	05-Jan-2012	jhb	MFC 228185: Enhance the sequential access heuristic used to perform readahead in the NFS server and reuse it for writes as well to allow writes to the backing store to be clustered.
# 225736	22-Sep-2011	kensmith	Copy head to stable/9 as part of 9.0-RELEASE release cycle. Approved by: re (implicit)
# 225617	16-Sep-2011	kmacy	In order to maximize the re-usability of kernel code in user space this patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls. Reviewed by: rwatson Approved by: re (bz)
# 225356	02-Sep-2011	rmacklem	Fix the NFS servers so that they can do a Lookup of "..", which requires that ni_strictrelative be set to 0, post-r224810. Tested by: swills (earlier version), geo dot liaskos at gmail.com Approved by: re (kib)
# 224911	16-Aug-2011	jonathan	Fix a merge conflict. r224086 added "goto out"-style error handling to nfssvc_nfsd(), in order to reliably call NFSEXITCODE() before returning. Our Capsicum changes, based on the old "return (error)" model, did not merge nicely. Approved by: re (kib), mentor (rwatson) Sponsored by: Google Inc
# 224778	11-Aug-2011	rwatson	Second-to-last commit implementing Capsicum capabilities in the FreeBSD kernel for FreeBSD 9.0: Add a new capability mask argument to fget(9) and friends, allowing system call code to declare what capabilities are required when an integer file descriptor is converted into an in-kernel struct file *. With options CAPABILITIES compiled into the kernel, this enforces capability protection; without, this change is effectively a no-op. Some cases require special handling, such as mmap(2), which must preserve information about the maximum rights at the time of mapping in the memory map so that they can later be enforced in mprotect(2) -- this is done by narrowing the rights in the existing max_protection field used for similar purposes with file permissions. In namei(9), we assert that the code is not reached from within capability mode, as we're not yet ready to enforce namespace capabilities there. This will follow in a later commit. Update two capability names: CAP_EVENT and CAP_KEVENT become CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they represent. Approved by: re (bz) Submitted by: jonathan Sponsored by: Google Inc
# 224086	16-Jul-2011	zack	Add DEXITCODE plumbing to NFS. Isilon has the concept of an in-memory exit-code ring that saves the last exit code of a function and allows for stack tracing. This is very helpful when debugging tough issues. This patch is essentially a no-op for BSD at this point, until we upstream the dexitcode logic itself. The patch adds DEXITCODE calls to every NFS function that returns an errno error code. A number of code paths were also reorganized to have single exit paths, to reduce code duplication. Submitted by: David Kwan <dkwan@isilon.com> Reviewed by: rmacklem Approved by: zml (mentor) MFC after: 2 weeks
# 224083	16-Jul-2011	zack	Simple find/replace of VOP_ISLOCKED -> NFSVOPISLOCKED. This is done so that NFSVOPISLOCKED can be modified later to add enhanced logging and assertions. Reviewed by: rmacklem Approved by: zml (mentor) MFC after: 2 weeks
# 224082	16-Jul-2011	zack	Simple find/replace of VOP_UNLOCK -> NFSVOPUNLOCK. This is done so that NFSVOPUNLOCK can be modified later to add enhanced logging and assertions. Reviewed by: rmacklem Approved by: zml (mentor) MFC after: 2 weeks
# 224081	16-Jul-2011	zack	Simple find/replace of vn_lock -> NFSVOPLOCK. This is done so that NFSVOPLOCK can be modified later to add enhanced logging and assertions. Reviewed by: rmacklem Approved by: zml (mentor) MFC after: 2 weeks
# 224080	16-Jul-2011	zack	Remove unnecessary thread pointer from VOPLOCK macros and current users. Reviewed by: rmacklem Approved by: zml (mentor) MFC after: 2 weeks
# 224078	16-Jul-2011	zack	Move nfsvno_pathconf to be accessible to sys/fs/nfs; no functionality change. Reviewed by: rmacklem Approved by: zml (mentor) MFC after: 2 weeks
# 224077	16-Jul-2011	zack	Small acl patch to return the aclerror that comes back from nfsrv_dissectacl(). This fixes a problem where ATTRNOTSUPP was being returned instead of BADOWNER. Reviewed by: rmacklem Approved by: zml (mentor) MFC after: 2 weeks
# 222663	03-Jun-2011	rmacklem	Modify the new NFS server so that the NFSv3 Pathconf RPC doesn't return an error when the underlying file system lacks support for any of the four _PC_xxx values used, by falling back to default values. Tested by: avg MFC after: 2 weeks
# 222167	21-May-2011	rmacklem	Add a lock flags argument to the VFS_FHTOVP() file system method, so that callers can indicate the minimum vnode locking requirement. This will allow some file systems to choose to return a LK_SHARED locked vnode when LK_SHARED is specified for the flags argument. This patch only adds the flag. It does not change any file system to use it and all callers specify LK_EXCLUSIVE, so file system semantics are not changed. Reviewed by: kib
# 221615	07-May-2011	rmacklem	Change the new NFS server so that it uses vfs.nfsd naming for its sysctls instead of vfs.newnfs. This separates the names from the ones used by the client.
# 221517	05-May-2011	rmacklem	Change the new NFS server so that it returns 0 when the f_bavail or f_ffree fields of "struct statfs" are negative, since the values that go on the wire are unsigned and will appear to be very large positive values otherwise. This makes the handling of a negative f_bavail compatible with the old/regular NFS server. MFC after: 2 weeks
# 220648	14-Apr-2011	rmacklem	Fix the experimental NFSv4 server so that it uses VOP_PATHCONF() to determine if a file system supports NFSv4 ACLs. Since VOP_PATHCONF() must be called with a locked vnode, the function is called before nfsvno_fillattr() and the result is passed in as an extra argument. MFC after: 2 weeks
# 220645	14-Apr-2011	rmacklem	Modify the experimental NFSv4 server so that it handles crossing of server mount points properly. The functions nfsvno_fillattr() and nfsv4_fillattr() were modified to take the extra arguments that are the mount point, a flag to indicate that it is a file system root and the mounted on fileno. The mount point argument needs to be busy when nfsvno_fillattr() is called, since the vp argument is not locked. Reviewed by: kib MFC after: 2 weeks
# 220546	11-Apr-2011	rmacklem	Vrele ni_startdir in the experimental NFS server for the case of NFSv2 getting an error return from VOP_MKNOD(). Without this patch, the server file system remains busy after an NFSv2 VOP_MKNOD() fails. MFC after: 2 weeks
# 220530	10-Apr-2011	rmacklem	Add some cleanup code to the module unload operation for the experimental NFS server, so that it doesn't leak memory when unloaded. However, unloading the NFSv4 server is not recommended, since all NFSv4 state will be lost by the unload and clients will have to recover the state after a server reload/restart as if the server crashed/rebooted. MFC after: 2 weeks
# 220507	09-Apr-2011	rmacklem	Add a VOP_UNLOCK() for the directory, when that is not what VOP_LOOKUP() returned. This fixes a bug in the experimental NFS server for the case where VFS_VGET() fails returning EOPNOTSUPP in the ReaddirPlus RPC, forcing the use of VOP_LOOKUP() instead. MFC after: 2 weeks
# 219028	25-Feb-2011	netchild	Add some FEATURE macros for various features (AUDIT/CAM/IPC/KTR/MAC/NFS/NTP/ PMC/SYSV/...). No FreeBSD version bump, the userland application to query the features will be committed last and can serve as an indication of the availablility if needed. Sponsored by: Google Summer of Code 2010 Submitted by: kibab Reviewed by: arch@ (parts by rwatson, trasz, jhb) X-MFC after: to be determined in last commit with code from this project
# 218345	05-Feb-2011	alc	Unless "cnt" exceeds MAX_COMMIT_COUNT, nfsrv_commit() and nfsvno_fsync() are incorrectly calling vm_object_page_clean(). They are passing the length of the range rather than the ending offset of the range. Perform the OFF_TO_IDX() conversion in vm_object_page_clean() rather than the callers. Reviewed by: kib MFC after: 3 weeks
# 217432	14-Jan-2011	rmacklem	Modify the experimental NFSv4 server so that it posts a SIGUSR2 signal to the master nfsd daemon whenever the stable restart file has been modified. This will allow the master nfsd daemon to maintain an up to date backup copy of the file. This is enabled via the nfssvc() syscall, so that older nfsd daemons will not be signaled. Reviewed by: jhb MFC after: 1 week
# 217335	12-Jan-2011	zack	Clean up the experimental NFS server replay cache when the module is unloaded. Reviewed by: rmacklem Approved by: zml (mentor)
# 217176	09-Jan-2011	rmacklem	Modify readdirplus in the experimental NFS server in a manner analogous to r216633 for the regular server. This change busies the file system so that VFS_VGET() is guaranteed to be using the correct mount point even during a forced dismount attempt. Since nfsd_fhtovp() is not called immediately before readdirplus, the patch is actually a clone of pjd@'s nfs_serv.c.4.patch instead of the one committed in r216633. Reviewed by: kib MFC after: 10 days
# 217063	06-Jan-2011	rmacklem	Since the VFS_LOCK_GIANT() code in the experimental NFS server is broken and the major file systems are now all mpsafe, modify the server so that it will only export mpsafe file systems. This was discussed on freebsd-fs@ and removes a fair bit of crufty code. MFC after: 12 days
# 217017	05-Jan-2011	rmacklem	Fix the experimental NFS server to use vfs_busyfs() instead of vfs_getvfs() so that the mount point is busied for the VFS_FHTOVP() call. This is analagous to r185432 for the regular NFS server. Reviewed by: kib MFC after: 12 days
# 216931	03-Jan-2011	rmacklem	Fix the nlm so that it no longer depends on the regular nfs client and, as such, can be loaded for the experimental nfs client without the regular client. Reviewed by: jhb MFC after: 2 weeks
# 216898	02-Jan-2011	rmacklem	Fix the experimental NFS server so that it doesn't leak a reference count on the directory when creating device special files. MFC after: 2 weeks
# 216894	02-Jan-2011	rmacklem	Delete some cruft from the experimental NFS server that was only used by the OpenBSD port for its pseudo-fs. MFC after: 2 weeks
# 216893	02-Jan-2011	rmacklem	Add checks for VI_DOOMED and vn_lock() failures to the experimental NFS server, to handle the case where an exported file system is forced dismounted while an RPC is in progress. Further commits will fix the cases where a mount point is used when the associated vnode isn't locked. Reviewed by: kib MFC after: 2 weeks
# 216784	28-Dec-2010	rmacklem	Delete the nfsvno_localconflict() function in the experimental NFS server since it is no longer used and is broken. MFC after: 2 weeks
# 216700	25-Dec-2010	rmacklem	Modify the experimental NFS server so that it uses LK_SHARED for RPC operations when it can. Since VFS_FHTOVP() currently always gets an exclusively locked vnode and is usually called at the beginning of each RPC, the RPCs for a given vnode will still be serialized. As such, passing a lock type argument to VFS_FHTOVP() would be preferable to doing the vn_lock() with LK_DOWNGRADE after the VFS_FHTOVP() call. Reviewed by: kib MFC after: 2 weeks
# 216693	24-Dec-2010	rmacklem	Add an argument to nfsvno_getattr() in the experimental NFS server, so that it can avoid calling VOP_ISLOCKED() when the vnode is known to be locked. This will allow LK_SHARED to be used for these cases, which happen to be all the cases that can use LK_SHARED. This does not fix any bug, but it reduces the number of calls to VOP_ISLOCKED() and prepares the code so that it can be switched to using LK_SHARED in a future patch. Reviewed by: kib MFC after: 2 weeks
# 216692	24-Dec-2010	rmacklem	Simplify vnode locking in the expeimental NFS server's readdir functions. In particular, get rid of two bogus VOP_ISLOCKED() calls. Removing the VOP_ISLOCKED() calls is the only actual bug fixed by this patch. Reviewed by: kib MFC after: 2 weeks
# 216691	24-Dec-2010	rmacklem	Since VOP_READDIR() for ZFS does not return monotonically increasing directory offset cookies, disable the UFS related loop that skips over directory entries at the beginning of the block for the experimental NFS server. This loop is required for UFS since it always returns directory entries starting at the beginning of the block that the requested directory offset is in. In discussion with pjd@ and mckusick@ it seems that this behaviour of UFS should maybe change, with this fix being an interim patch until then. This patch only fixes the experimental server, since pjd@ is working on a patch for the regular server. Discussed with: pjd, mckusick MFC after: 5 days
# 214255	23-Oct-2010	rmacklem	Modify the experimental NFSv4 server's file handle hash function to use the generic hash32_buf() function. Although adding the bytes seemed sufficient for UFS and ZFS, since most of the bytes are the same for file handles on the same volume, this might not be sufficient for other file systems. Use of a generic function also seems preferable to one specific to NFSv4. Suggested by: gleb.kurtsou at gmail.com MFC after: 10 days
# 214224	22-Oct-2010	rmacklem	Modify the file handle hash function in the experimental NFS server so that it will work better for non-UFS file systems. The new function simply sums the bytes of the fh_fid field of fhandle_t. MFC after: 10 days
# 214149	21-Oct-2010	rmacklem	Modify the experimental NFS server in a manner analagous to r214049 for the regular NFS server, so that it will not do a VOP_LOOKUP() of ".." when at the root of a file system when performing a ReaddirPlus RPC. MFC after: 10 days
# 212833	18-Sep-2010	rmacklem	Fix the experimental NFSv4 server so that it performs local VOP_ADVLOCK() unlock operations correctly. It was passing in F_SETLK instead of F_UNLCK as the operation for the unlock case. This only affected operation when local locking (vfs.newnfs.enable_locallocks=1) was enabled. MFC after: 1 week
# 209191	14-Jun-2010	rmacklem	Add MODULE_DEPEND() macros to the experimental NFS client and server so that the modules will load when kernels are built with none of the NFS* configuration options specified. I believe this resolves the problems reported by PR kern/144458 and the email on freebsd-stable@ posted by Dmitry Pryanishnikov on June 13. Tested by: kib PR: kern/144458 Reviewed by: kib MFC after: 1 week
# 206170	04-Apr-2010	rmacklem	Harden the experimental NFS server a little, by adding extra checks in the readdir functions for non-positive byte count arguments. For the negative case, set it to the maximum allowable, since it was actually a large positive value (unsigned) on the wire. Also, fix up the readdir function comment a bit. Suggested by: dillon AT apollo.backplane.com MFC after: 2 weeks
# 206063	02-Apr-2010	rmacklem	For the experimental NFS server, add a call to free the lookup path buffer for one case where it was missing when doing mkdir. This could have conceivably resulted in a leak of a buffer, but a leak was never observed during testing, so I suspect it would have occurred rarely, if ever, in practice. MFC after: 2 weeks
# 205663	25-Mar-2010	rmacklem	Patch the experimental NFS server in a manner analagous to r205661 for the regular NFS server, to ensure that ESTALE is returned to the client for all errors returned by VFS_FHTOVP(). MFC after: 2 weeks
# 205010	11-Mar-2010	rwatson	Update nfsrv_getsocksndseq() for changes in TCP internals since FreeBSD 6.x: - so_pcb is now guaranteed to be non-NULL and valid if a valid socket reference is held. - Need to check INP_TIMEWAIT and INP_DROPPED before assuming inp_ppcb is a tcpcb, as it might be a tcptw or NULL otherwise. - tp can never be NULL by the end of the function, so only check TCPS_ESTABLISHED before extracting tcpcb fields. The NFS server arguably incorporates too many assumptions about TCP internals, but fixing that is left for nother day. MFC after: 1 week Reviewed by: bz Reviewed and tested by: rmacklem Sponsored by: Juniper Networks
# 200999	25-Dec-2009	rmacklem	Modify the experimental server so that it uses VOP_ACCESSX(). This is necessary in order to enable NFSv4 ACL support. The argument to nfsvno_accchk() was changed to an accmode_t and the function nfsrv_aclaccess() was no longer needed and, therefore, deleted. Reviewed by: trasz MFC after: 2 weeks
# 199715	23-Nov-2009	rmacklem	Modify the experimental nfs server so that it falls back to using VOP_LOOKUP() when VFS_VGET() returns EOPNOTSUPP in the ReaddirPlus RPC. This patch is based upon one by pjd@ for the regular nfs server which has not yet been committed. It is needed when a ZFS volume is exported and ReaddirPlus (which almost always happens for NFSv4) is performed by a client. The patch also simplifies vnode lock handling somewhat. MFC after: 2 weeks
# 199616	20-Nov-2009	rmacklem	Patch the experimental NFS server is a manner analagous to r197525, so that the creation verifier is handled correctly in va_atime for 64bit architectures. There were two problems. One was that the code incorrectly assumed that sizeof (struct timespec) == 8 and the other was that the tv_sec field needs to be assigned from a signed 32bit integer, so that sign extension occurs on 64bit architectures. This is required for correct operation when exporting ZFS volumes. Reviewed by: pjd MFC after: 2 weeks
# 195699	14-Jul-2009	rwatson	Build on Jeff Roberson's linker-set based dynamic per-CPU allocator (DPCPU), as suggested by Peter Wemm, and implement a new per-virtual network stack memory allocator. Modify vnet to use the allocator instead of monolithic global container structures (vinet, ...). This change solves many binary compatibility problems associated with VIMAGE, and restores ELF symbols for virtualized global variables. Each virtualized global variable exists as a "reference copy", and also once per virtual network stack. Virtualized global variables are tagged at compile-time, placing the in a special linker set, which is loaded into a contiguous region of kernel memory. Virtualized global variables in the base kernel are linked as normal, but those in modules are copied and relocated to a reserved portion of the kernel's vnet region with the help of a the kernel linker. Virtualized global variables exist in per-vnet memory set up when the network stack instance is created, and are initialized statically from the reference copy. Run-time access occurs via an accessor macro, which converts from the current vnet and requested symbol to a per-vnet address. When "options VIMAGE" is not compiled into the kernel, normal global ELF symbols will be used instead and indirection is avoided. This change restores static initialization for network stack global variables, restores support for non-global symbols and types, eliminates the need for many subsystem constructors, eliminates large per-subsystem structures that caused many binary compatibility issues both for monitoring applications (netstat) and kernel modules, removes the per-function INIT_VNET_*() macros throughout the stack, eliminates the need for vnet_symmap ksym(2) munging, and eliminates duplicate definitions of virtualized globals under VIMAGE_GLOBALS. Bump __FreeBSD_version and update UPDATING. Portions submitted by: bz Reviewed by: bz, zec Discussed with: gnn, jamie, jeff, jhb, julian, sam Suggested by: peter Approved by: re (kensmith)
# 194498	19-Jun-2009	brooks	Rework the credential code to support larger values of NGROUPS and NGROUPS_MAX, eliminate ABI dependencies on them, and raise the to 1024 and 1023 respectively. (Previously they were equal, but under a close reading of POSIX, NGROUPS_MAX was defined to be too large by 1 since it is the number of supplemental groups, not total number of groups.) The bulk of the change consists of converting the struct ucred member cr_groups from a static array to a pointer. Do the equivalent in kinfo_proc. Introduce new interfaces crcopysafe() and crsetgroups() for duplicating a process credential before modifying it and for setting group lists respectively. Both interfaces take care for the details of allocating groups array. crsetgroups() takes care of truncating the group list to the current maximum (NGROUPS) if necessary. In the future, crsetgroups() may be responsible for insuring invariants such as sorting the supplemental groups to allow groupmember() to be implemented as a binary search. Because we can not change struct xucred without breaking application ABIs, we leave it alone and introduce a new XU_NGROUPS value which is always 16 and is to be used or NGRPS as appropriate for things such as NFS which need to use no more than 16 groups. When feasible, truncate the group list rather than generating an error. Minor changes: - Reduce the number of hand rolled versions of groupmember(). - Do not assign to both cr_gid and cr_groups[0]. - Modify ipfw to cache ucreds instead of part of their contents since they are immutable once referenced by more than one entity. Submitted by: Isilon Systems (initial implementation) X-MFC after: never PR: bin/113398 kern/133867
# 193162	31-May-2009	zec	Unbreak options VIMAGE kernel builds. Approved by: julian (mentor)
# 192861	26-May-2009	rmacklem	Fix the experimental nfs subsystem so that it builds with the current NFSv4 ACLs, as defined in sys/acl.h. It still needs a way to test a mount point for NFSv4 ACL support before it will work. Until then, the NFSHASNFS4ACL() macro just always returns 0. Approved by: kib (mentor)
# 192574	21-May-2009	rmacklem	Fix the experimental nfs server so that it depends on the nlm, since it now calls nlm_acquire_next_sysid(). Approved by: kib (mentor)
# 192503	21-May-2009	rmacklem	Modify sys/fs/nfsserver/nfs_nfsdport.c to use nlm_acquire_next_sysid() to set the l_sysid for locks correctly. Approved by: kib (mentor)
# 192256	17-May-2009	rmacklem	Fix the acquisition of local locks via VOP_ADVLOCK() by the experimental nfsv4 server. It was setting the a_id argument to a fixed value, but that wasn't sufficient for FreeBSD8. Instead, set l_pid and l_sysid to 0 plus set the F_REMOTE flag to indicate that these fields are used to check for same lock owner. Since, for NFSv4, a lockowner is a ClientID plus an up to 1024byte name, it can't be put in l_sysid easily. I also renamed the p variable to td, since it's a thread ptr. Approved by: kib (mentor)
# 192255	17-May-2009	rmacklem	Added a SYSCTL to sys/fs/nfsserver/nfs_nfsdport.c so that the value of nfsrv_dolocallocks can be changed via sysctl. I also added some non-empty descriptor strings and reformatted some overly long lines. Approved by: kib (mentor)
# 192121	14-May-2009	rmacklem	Apply changes to the experimental nfs server so that it uses the security flavors as exported in FreeBSD-CURRENT. This allows it to use a slightly modified mountd.c instead of a different utility. Approved by: kib (mentor)
# 192017	12-May-2009	rmacklem	Modify the experimental nfs server to use the new nfsd_nfsd_args structure for nfsd. Includes a change that clarifies the use of an empty principal name string to indicate AUTH_SYS only. Approved by: kib (mentor)
# 192000	11-May-2009	rmacklem	Change the name of the nfs server addsock structure from nfsd_args to nfsd_addsock_args, so that it is consistent with the one in sys/nfsserver/nfs.h. Approved by: kib (mentor)
# 191998	11-May-2009	rmacklem	Modify nfsvno_fhtovp() to ensure that it always sets the credp argument. Returning without credp set could result in a caller doing crfree() on garbage. Reviewed by: kan Approved by: kib (mentor)
# 191990	11-May-2009	attilio	Remove the thread argument from the FSD (File-System Dependent) parts of the VFS. Now all the VFS_* functions and relating parts don't want the context as long as it always refers to curthread. In some points, in particular when dealing with VOPs and functions living in the same namespace (eg. vflush) which still need to be converted, pass curthread explicitly in order to retain the old behaviour. Such loose ends will be fixed ASAP. While here fix a bug: now, UFS_EXTATTR can be compiled alone without the UFS_EXTATTR_AUTOSTART option. VFS KPI is heavilly changed by this commit so thirdy parts modules needs to be recompiled. Bump __FreeBSD_version in order to signal such situation.
# 191940	09-May-2009	kan	Do not embed struct ucred into larger netcred parent structures. Credential might need to hang around longer than its parent and be used outside of mnt_explock scope controlling netcred lifetime. Use separate reference-counted ucred allocated separately instead. While there, extend mnt_explock coverage in vfs_stdexpcheck and clean-up some unused declarations in new NFS code. Reported by: John Hickey PR: kern/133439 Reviewed by: dfr, kib
# 191783	04-May-2009	rmacklem	Add the experimental nfs subtree to the kernel, that includes support for NFSv4 as well as NFSv2 and 3. It lives in 3 subdirs under sys/fs: nfs - functions that are common to the client and server nfsclient - a mutation of sys/nfsclient that call generic functions to do RPCs and handle state. As such, it retains the buffer cache handling characteristics and vnode semantics that are found in sys/nfsclient, for the most part. nfsserver - the server. It includes a DRC designed specifically for NFSv4, that is used instead of the generic DRC in sys/rpc. The build glue will be checked in later, so at this point, it consists of 3 new subdirs that should not affect kernel building. Approved by: kib (mentor)