Cross Reference: /freebsd-current/sys/fs/nfsclient/nfs

History log of /freebsd-current/sys/fs/nfsclient/nfs_clport.c
Revision	Date	Author	Comments
# 656d2e83	30-Dec-2023	Konstantin Belousov <kib@FreeBSD.org>	nfsclient: eliminate ncl_writebp() Use plain bufwrite() instead. ncl_writebp() evolved to mostly repeat bufwrite() code with some ommisions, most notably runningbufspace accounting. Reviewed by: imp, markj, rmacklem Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D43249
# 196787f7	20-Oct-2023	Rick Macklem <rmacklem@FreeBSD.org>	nfscl: Use Claim_Null_FH and Claim_Deleg_Cur_FH For NFSv4.1/4.2, there are two new options for the Open operation. These two options use the file handle for the file instead of the file handle for the directory plus a file name. By doing so, the client code is simplified (it no longer needs the "nfsv4node" structure attached to the NFS vnode). It also avoids problems caused by another NFS client (or process running locally in the NFS server) doing a rename or remove of the file name between the Lookup and Open. Unfortunately, there was a bug (fixed recently by commit X) in the NFS server which mis-parsed the Claim_Deleg_Cur_FH arguments. To allow this patch to work with the broken FreeBSD NFSv4.1/4.2 server, NFSMNTP_BUGGYFBSDSRV is defined and is set when a correctly formatted Claim_Deleg_Cur_FH fails with NFSERR_EXPIRED. (This is what the old, broken NFS server does, since it erroneously uses the Getattr arguments as a stateID.) Once this flag is set, the client fills in a stateID, to make the broken NFS server happy. Tested at a recent IETF NFSv4 Bakeathon. MFC after: 1 month
# 685dc743	16-Aug-2023	Warner Losh <imp@FreeBSD.org>	sys: Remove $FreeBSD$: one-line .c pattern Remove /^[\s]__FBSDID$"\$FreeBSD\$"$;?\s*\n/
# 896516e5	16-Mar-2023	Rick Macklem <rmacklem@FreeBSD.org>	nfscl: Add a new NFSv4.1/4.2 mount option for Kerberized mounts Without this patch, a Kerberized NFSv4.1/4.2 mount must provide a Kerberos credential for the client at mount time. This credential is typically referred to as a "machine credential". It can be created one of two ways: - The user (usually root) has a valid TGT at the time the mount is done and this becomes the machine credential. There are two problems with this. 1 - The user doing the mount must have a valid TGT for a user principal at mount time. As such, the mount cannot be put in fstab(5) or similar. 2 - When the TGT expires, the mount breaks. - The client machine has a service principal in its default keytab file and this service principal (typically called a host-based initiator credential) is used as the machine credential. There are problems with this approach as well: 1 - There is a certain amount of administrative overhead creating the service principal for the NFS client, creating a keytab entry for this principal and then copying the keytab entry into the client's default keytab file via some secure means. 2 - The NFS client must have a fixed, well known, DNS name, since that FQDN is in the service principal name as the instance. This patch uses a feature of NFSv4.1/4.2 called SP4_NONE, which allows the state maintenance operations to be performed by any authentication mechanism, to do these operations via AUTH_SYS instead of RPCSEC_GSS (Kerberos). As such, neither of the above mechanisms is needed. It is hoped that this option will encourage adoption of Kerberized NFS mounts using TLS, to provide a more secure NFS mount. This new NFSv4.1/4.2 mount option, called "syskrb5" must be used with "sec=krb5[ip]" to avoid the need for either of the above Kerberos setups to be done by the client. Note that all file access/modification operations still require users on the NFS client to have a valid TGT recognized by the NFSv4.1/4.2 server. As such, this option allows, at most, a malicious client to do some sort of DOS attack. Although not required, use of "tls" with this new option is encouraged, since it provides on-the-wire encryption plus, optionally, client identity verification via a X.509 certificate provided to the server during TLS handshake. Alternately, "sec=krb5p" does provide on-the-wire encryption of file data. A mount_nfs(8) man page update will be done in a separate commit. Discussed on: freebsd-current@ MFC after: 3 months
# 357492c9	20-Feb-2023	Rick Macklem <rmacklem@FreeBSD.org>	nfscl: Add NFSD_CURVNET macros to nfsclient syscall Although the nfsclient syscall is used for client side, it does set up server side krpc for callbacks. As such, it needs to have the vnet set. This patch does this. Without this patch, the system would crash when the nfscbd(8) daemon was killed. Reported by: freebsd@walstatt-de.de MFC after: 3 months
# 39633fc1	11-Jan-2023	Rick Macklem <rmacklem@FreeBSD.org>	nfscl: Improve NFSv4 error message for NFSERR_WRONGSEC The usual reason for an NFSv4 server replying NFSERR_WRONGSEC to an operation is that a Kerberos credential is required. This patch replaces a cryptic "err=10016" with a message suggesting that a Kerberos TGT is probably needed. MFC after: 2 weeks
# 829f0bcb	19-Dec-2022	Mateusz Guzik <mjg@FreeBSD.org>	vfs: add the concept of vnode state transitions To quote from a comment above vput_final: <quote> * XXX Some filesystems pass in an exclusively locked vnode and strongly depend * on the lock being held all the way until VOP_INACTIVE. This in particular * happens with UFS which adds half-constructed vnodes to the hash, where they * can be found by other code. </quote> As is there is no mechanism which allows filesystems to denote that a vnode is fully initialized, consequently problems like the above are only found the hard way(tm). Add rudimentary support for state transitions, which in particular allow to assert the vnode is not legally unlocked until its fate is decided (either construction finishes or vgone is called to abort it). The new field lands in a 1-byte hole, thus it does not grow the struct. Bump __FreeBSD_version to 1400077 Reviewed by: kib (previous version) Tested by: pho Differential Revision: https://reviews.freebsd.org/D37759
# 6032cf3d	22-Dec-2022	Rick Macklem <rmacklem@FreeBSD.org>	nfscl: Improve the console message for NFSERR_NOFILEHANDLE Since a NFSERR_NOFILEHANDLE reply from an NFSv4 server usually means that the file system is not exported on the server, change the console log message to indicate that. MFC after: 1 week
# 7d9dc91a	15-Oct-2022	Rick Macklem <rmacklem@FreeBSD.org>	nfscl: Fix the NFSv4.0 mount so that it does not crash Commit efe58855f3ea modifies IN_LOOPBACK() so that it uses a VNET variable. Without this patch, nfscl_getmyip() uses IN_LOOPBACK() when the VNET is not set and crashes the system. nfscl_getmyip() is only called when a NFSv4.0 (not NFSv4.1/4.2) mount is done. This patch re-organizes nfscl_getmyip() so that IN_LOOPBACK() is before the CURVENT_RESTORE() macro, to avoid the crashes. Reviewed by: karels, zlei.huang_gmail.com Differential Revision: https://reviews.freebsd.org/D37008
# 068fc057	14-Apr-2022	Rick Macklem <rmacklem@FreeBSD.org>	nfscl: Clean up the code by removing unused arguments The "void *stuff" (also called fstuff and dstuff) argument was used by the Mac OSX port. For FreeBSD, this argument is always NULL, so remove it to clean up the code. This commit gets rid of "stuff" for nfscl_nget(). Future commits will do the same for other functions.
# 4ad3423b	13-Apr-2022	Rick Macklem <rmacklem@FreeBSD.org>	nfscl: Clean up the code by removing unused arguments The "void *stuff" (also called fstuff and dstuff) argument was used by the Mac OSX port. For FreeBSD, this argument is always NULL, so remove it to clean up the code. This commit gets rid of "stuff" for nfscl_loadattrcache(). Future commits will do the same for other functions.
# 5580e5bd	10-Apr-2022	Rick Macklem <rmacklem@FreeBSD.org>	nfscl: Clean up the code by removing unused arguments The "void *stuff" (also called fstuff and dstuff) argument was used by the Mac OSX port. For FreeBSD, this argument is always NULL, so remove it to clean up the code. This commit gets rid of "stuff" for nfscl_request(). Future commits will do the same for other functions.
# 38c3cf6a	09-Apr-2022	Rick Macklem <rmacklem@FreeBSD.org>	nfscl: Clean up the code by removing unused arguments The "void *stuff" (also called fstuff and dstuff) argument was used by the Mac OSX port. For FreeBSD, this argument is always NULL, so remove it to clean up the code. This commit gets rid of "stuff" for nfscl_postop_attr(). Future commits will do the same for other functions.
# 21de450a	08-Apr-2022	Rick Macklem <rmacklem@FreeBSD.org>	nfscl: Add support for a NFSv4 AppendWrite RPC For IO_APPEND VOP_WRITE()s, the code first does a Getattr RPC to acquire the file's size, before it can do the Write RPC. Although NFS does not have an append write operation, an NFSv4 compound can use a Verify operation to check that the client's notion of the file's size is correct, followed by the Write operation. This patch modifies nfscl_wcc_data() to optionally acquire the file's size, for use with an AppendWrite. Although the "stuff" arguments are always NULL (these were used for the Mac OSX port and should be cleared out someday), make the argument to nfscl_wcc_data() explicitly NULL for clarity. This patch does not cause any semantics change until the AppendWrite is added in a future commit.
# 7a9bc8a8	15-Jul-2021	Mark Johnston <markj@FreeBSD.org>	nfssvc: Zero the buffer copied out when NFSSVC_DUMPMNTOPTS is set Reported by: KMSAN MFC after: 1 week Sponsored by: The FreeBSD Foundation
# 03c81af2	07-Jun-2021	Rick Macklem <rmacklem@FreeBSD.org>	nfscl: Fix generation of va_fsid for a tree of NFSv4 server file systems Pre-r318997 the code looked like: if (vp->v_mount->mnt_stat.f_fsid.val[0] != (uint32_t)np->n_vattr.na_filesid[0]) vap->va_fsid = (uint32_t)np->n_vattr.na_filesid[0]; Doing this assignment got lost by r318997 and, as such, NFSv4 mounts of servers with trees of file systems on the server is broken, due to duplicate fileno values for the same st_dev/va_fsid. Although I could have re-introduced the assignment, since the value of na_filesid[0] is not guaranteed to be unique across the server file systems, I felt it was better to always do the hash for na_filesid[0,1]. Since dev_t (st_dev/va_fsid) is now 64bits, I switched to a 64bit hash. There is a slight chance of a hash conflict where 2 different na_filesid values map to same va_fsid, which will be documented in the BUGS section of the man page for mount_nfs(8). Using a table to keep track of mappings to catch conflicts would not easily scale to 10,000+ server file systems and, when the conflict occurs, it only results in fts(3) reporting a "directory cycle" under certain circumstances. Reviewed by: kib MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D30660
# cb07628d	10-May-2021	Rick Macklem <rmacklem@FreeBSD.org>	nfscl: Delete unneeded redundant MODULE_DEPEND() calls There are two module declarations in the nfscl.ko module for "nfscl" and "nfs". Both of these declarations had MODULE_DEPEND() calls. This patch deletes the MODULE_DEPEND() calls for "nfs" to avoid confusion with respect to what modules this module is dependent upon. The patch also adds comments explaining why there are two module declarations within the module. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D30102
# dd02d9d6	07-May-2021	Rick Macklem <rmacklem@FreeBSD.org>	nfscl: Add support for va_birthtime to NFSv4 There is a NFSv4 file attribute called TimeCreate that can be used for va_birthtime. r362175 added some support for use of TimeCreate. This patch completes support of va_birthtime by adding support for setting this attribute to the server. It also eanbles the client to acquire and set the attribute for a NFSv4 server that supports the attribute. Reviewed by: markj MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D30156
# 8bde6d15	04-May-2021	Mark Johnston <markj@FreeBSD.org>	nfsclient: Copy only initialized fields in nfs_getattr() When loading attributes from the cache, the NFS client is careful to copy only the fields that it initialized. After fetching attributes from the server, however, it would copy the entire vattr structure initialized from the RPC response, so uninitialized stack bytes would end up being copied to userspace. In particular, va_birthtime (v2 and v3) and va_gen (v3) had this problem. Use a common subroutine to copy fields provided by the NFS client, and ensure that we provide a dummy va_gen for the v3 case. Reviewed by: rmacklem Reported by: KMSAN MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D30090
# 605284b8	13-Feb-2021	Alexander V. Chernikov <melifaro@FreeBSD.org>	Enforce net epoch in in6_selectsrc(). in6_selectsrc() may call fib6_lookup() in some cases, which requires epoch. Wrap in6_selectsrc* calls into epoch inside its users. Mark it as requiring epoch by adding NET_EPOCH_ASSERT(). MFC after: 1 weeek Differential Revision: https://reviews.freebsd.org/D28647
# 6b3a9a0f	11-Jan-2021	Mateusz Guzik <mjg@FreeBSD.org>	Convert remaining cap_rights_init users to cap_rights_init_one semantic patch: @@ expression rights, r; @@ - cap_rights_init(&rights, r) + cap_rights_init_one(&rights, r)
# 586ee69f	01-Sep-2020	Mateusz Guzik <mjg@FreeBSD.org>	fs: clean up empty lines in .c and .h files
# 9d5df78e	28-May-2020	Alexander V. Chernikov <melifaro@FreeBSD.org>	Fix NOINET6 build broken by r361575. Reported by: ci, hps
# c74ce5cc	28-May-2020	Alexander V. Chernikov <melifaro@FreeBSD.org>	Make NFS address selection use fib4_lookup(). fib4_lookup_nh_ represents pre-epoch generation of fib api, providing less guarantees over pointer validness and requiring on-stack data copying. Switch call to use new fib4_lookup(), allowing to eventually deprecate old api. Differential Revision: https://reviews.freebsd.org/D24977
# b9cc3262	12-May-2020	Ryan Moeller <freqlabs@FreeBSD.org>	nfs: Remove APPLESTATIC macro It is no longer useful. Reviewed by: rmacklem Approved by: mav (mentor) MFC after: 1 week Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D24811
# 8de97f39	09-Apr-2020	Rick Macklem <rmacklem@FreeBSD.org>	Remove the old NFS lock device driver that uses Giant. This NFS lock device driver was replaced by the kernel NLM around FreeBSD7 and has not normally been used since then. To use it, the kernel had to be built without "options NFSLOCKD" and the nfslockd.ko had to be deleted as well. Since it uses Giant and is no longer used, this patch removes it. With this device driver removed, there is now a lot of unused code in the userland rpc.lockd. That will be removed on a future commit. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D22933
# 0ff51c98	22-Feb-2020	Konstantin Belousov <kib@FreeBSD.org>	Fix NFS client deadlock when read reports truncated node. If node attribute returned in the reply for read rpc indicate truncation, and it happens that the vnode is exclusively locked, update of the node attributes would try to shrink vnode size. Since during the read some vnode pages were busied by the reading thread, vnode_pager_setsize() deadlocks waiting for the busy state owned by the caller. Use a thread-local flag to indicate that NFS read owns some (s)busy pages states and postpone the call to vnode_pager_setsize() until the thread relinguishes the ownership. Diagnosed by: rlibby Tested by: pho, rlibby Sponsored by: The FreeBSD Foundation MFC after: 1 week
# b249ce48	03-Jan-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: drop the mostly unused flags argument from VOP_UNLOCK Filesystems which want to use it in limited capacity can employ the VOP_UNLOCK_FLAGS macro. Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D21427
# abd80ddb	08-Dec-2019	Mateusz Guzik <mjg@FreeBSD.org>	vfs: introduce v_irflag and make v_type smaller The current vnode layout is not smp-friendly by having frequently read data avoidably sharing cachelines with very frequently modified fields. In particular v_iflag inspected for VI_DOOMED can be found in the same line with v_usecount. Instead make it available in the same cacheline as the v_op, v_data and v_type which all get read all the time. v_type is avoidably 4 bytes while the necessary data will easily fit in 1. Shrinking it frees up 3 bytes, 2 of which get used here to introduce a new flag field with a new value: VIRF_DOOMED. Reviewed by: kib, jeff Differential Revision: https://reviews.freebsd.org/D22715
# c6ba06d8	22-Oct-2019	Konstantin Belousov <kib@FreeBSD.org>	Fix interface between nfsclient and vnode pager. Make the nfsclient always call vnode_pager_setsize() with the vnode exclusively locked. This ensures that page fault always can find the backing page if the object size check succeeded. Set VV_VMSIZEVNLOCK flag on NFS nodes. The main offender breaking the interface in nfsclient is nfs_loadattrcache(), which is used whenever server responded with updated attributes, which can happen on non-changing operations as well. Also, iod threads only have buffers locked (and even that is LK_KERNPROC), but they still may call nfs_loadattrcache() on RPC response. Instead of immediately calling vnode_pager_setsize() if server response indicated changed file size, but the vnode is not exclusively locked, set a new node flag NVNSETSZSKIP. When the vnode exclusively locked, or when we can temporary upgrade the lock to exclusive, call vnode_pager_setsize(), by providing the nfsclient VOP_LOCK() implementation. Tested by: pho Discussed with: rmacklem Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D21883
# 5d85e12f	23-Sep-2019	Rick Macklem <rmacklem@FreeBSD.org>	Replace all mtx_lock()/mtx_unlock() on n_mtx with the macros. For a long time, some places in the NFS code have locked/unlocked the NFS node lock with the macros NFSLOCKNODE()/NFSUNLOCKNODE() whereas others have simply used mtx_lock()/mtx_unlock(). Since the NFS node mutex needs to change to an sx lock so it can be held when vnode_pager_setsize() is called, replace all occurrences of mtx_lock/mtx_unlock with the macros to simply making the change to an sx lock in future commit. There is no semantic change as a result of this commit. I am not sure if the change to an sx lock will be MFC'd soon, so I put an MFC of 1 week on this commit so that it could be MFC'd with that commit. Suggested by: kib MFC after: 1 week
# 6fd58358	17-Sep-2019	Konstantin Belousov <kib@FreeBSD.org>	Further refine r352393, only call vnode_pager_setsize() outside the node lock when shrinking. This is similar to r252528, applied to the above commit. Apparently there is a race which makes necessary at least to keep the n_size and pager size consistent when extending. Current suspect is that iod threads perform vnode_pager_setsize() without taking the vnode lock, which corrupts the file content. Reported and tested by: Masachika ISHIZUKA <ish@amail.plala.or.jp> Discussed with: rmacklem (related issues) Sponsored by: The FreeBSD Foundation MFC after: 1 week
# 1246ee66	16-Sep-2019	Konstantin Belousov <kib@FreeBSD.org>	nfscl_loadattrcache: fix rest of the cases to not call vnode_pager_setsize() under the node mutex. r248567 moved some calls of vnode_pager_setsize() after the node lock is unlocked, do the rest now. Reported and tested by: peterj Sponsored by: The FreeBSD Foundation MFC after: 1 week
# eeb1f3ed	14-Apr-2019	Rick Macklem <rmacklem@FreeBSD.org>	Fix the NFSv4 client to safely find processes. r340744 broke the NFSv4 client, because it replaced pfind_locked() with a call to pfind(), since pfind() acquires the sx lock for the pid hash and the NFSv4 already holds a mutex when it does the call. The patch fixes the problem by recreating a pfind_any_locked() and adding the functions pidhash_slockall() and pidhash_sunlockall to acquire/release all of the pid hash locks. These functions are then used by the NFSv4 client instead of acquiring the allproc_lock and calling pfind(). Reviewed by: kib, mjg MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D19887
# 6c1c6ae5	04-Apr-2019	Rodney W. Grimes <rgrimes@FreeBSD.org>	Use IN_foo() macros from sys/netinet/in.h inplace of handcrafted code There are a few places that use hand crafted versions of the macros from sys/netinet/in.h making it difficult to actually alter the values in use by these macros. Correct that by replacing handcrafted code with proper macro usage. Reviewed by: karels, kristof Approved by: bde (mentor) MFC after: 3 weeks Sponsored by: John Gilmore Differential Revision: https://reviews.freebsd.org/D19317
# 756a5412	14-Jan-2019	Gleb Smirnoff <glebius@FreeBSD.org>	Allocate pager bufs from UMA instead of 80-ish mutex protected linked list. o In vm_pager_bufferinit() create pbuf_zone and start accounting on how many pbufs are we going to have set. In various subsystems that are going to utilize pbufs create private zones via call to pbuf_zsecond_create(). The latter calls uma_zsecond_create(), and sets a limit on created zone. After startup preallocate pbufs according to requirements of all pbuf zones. Subsystems that used to have a private limit with old allocator now have private pbuf zones: md(4), fusefs, NFS client, smbfs, VFS cluster, FFS, swap, vnode pager. The following subsystems use shared pbuf zone: cam(4), nvme(4), physio(9), aio(4). They should have their private limits, but changing that is out of scope of this commit. o Fetch tunable value of kern.nswbuf from init_param2() and while here move NSWBUF_MIN to opt_param.h and eliminate opt_swap.h, that was holding only this option. Default values aren't touched by this commit, but they probably should be reviewed wrt to modern hardware. This change removes a tight bottleneck from sendfile(2) operation, that uses pbufs in vnode pager. Other pagers also would benefit from faster allocation. Together with: gallatin Tested by: pho
# 53011553	21-Nov-2018	Mateusz Guzik <mjg@FreeBSD.org>	proc: convert pfind & friends to use pidhash locks and other cleanup pfind_locked is retired as it relied on allproc which unnecessarily restricts locking of the hash. Sponsored by: The FreeBSD Foundation
# c338c94d	14-Jun-2018	Rick Macklem <rmacklem@FreeBSD.org>	Move four functions in nfscl.ko to nfscommon.ko. Four functions nfscl_reqstart(), nfscl_fillsattr(), nfsm_stateidtom() and nfsmnt_mdssession() are now called from within the nfsd. As such, they needed to be moved from nfscl.ko to nfscommon.ko so that nfsd.ko would load when nfscl.ko wasn't loaded. Reported by: herbert@gojira.at
# 90d2dfab	12-Jun-2018	Rick Macklem <rmacklem@FreeBSD.org>	Merge the pNFS server code from projects/pnfs-planb-server into head. This code merge adds a pNFS service to the NFSv4.1 server. Although it is a large commit it should not affect behaviour for a non-pNFS NFS server. Some documentation on how this works can be found at: http://people.freebsd.org/~rmacklem/pnfs-planb-setup.txt and will hopefully be turned into a proper document soon. This is a merge of the kernel code. Userland and man page changes will come soon, once the dust settles on this merge. It has passed a "make universe", so I hope it will not cause build problems. It also adds NFSv4.1 server support for the "current stateid". Here is a brief overview of the pNFS service: A pNFS service separates the Read/Write oeprations from all the other NFSv4.1 Metadata operations. It is hoped that this separation allows a pNFS service to be configured that exceeds the limits of a single NFS server for either storage capacity and/or I/O bandwidth. It is possible to configure mirroring within the data servers (DSs) so that the data storage file for an MDS file will be mirrored on two or more of the DSs. When this is used, failure of a DS will not stop the pNFS service and a failed DS can be recovered once repaired while the pNFS service continues to operate. Although two way mirroring would be the norm, it is possible to set a mirroring level of up to four or the number of DSs, whichever is less. The Metadata server will always be a single point of failure, just as a single NFS server is. A Plan B pNFS service consists of a single MetaData Server (MDS) and K Data Servers (DS), all of which are recent FreeBSD systems. Clients will mount the MDS as they would a single NFS server. When files are created, the MDS creates a file tree identical to what a single NFS server creates, except that all the regular (VREG) files will be empty. As such, if you look at the exported tree on the MDS directly on the MDS server (not via an NFS mount), the files will all be of size 0. Each of these files will also have two extended attributes in the system attribute name space: pnfsd.dsfile - This extended attrbute stores the information that the MDS needs to find the data storage file(s) on DS(s) for this file. pnfsd.dsattr - This extended attribute stores the Size, AccessTime, ModifyTime and Change attributes for the file, so that the MDS doesn't need to acquire the attributes from the DS for every Getattr operation. For each regular (VREG) file, the MDS creates a data storage file on one (or more if mirroring is enabled) of the DSs in one of the "dsNN" subdirectories. The name of this file is the file handle of the file on the MDS in hexadecimal so that the name is unique. The DSs use subdirectories named "ds0" to "dsN" so that no one directory gets too large. The value of "N" is set via the sysctl vfs.nfsd.dsdirsize on the MDS, with the default being 20. For production servers that will store a lot of files, this value should probably be much larger. It can be increased when the "nfsd" daemon is not running on the MDS, once the "dsK" directories are created. For pNFS aware NFSv4.1 clients, the FreeBSD server will return two pieces of information to the client that allows it to do I/O directly to the DS. DeviceInfo - This is relatively static information that defines what a DS is. The critical bits of information returned by the FreeBSD server is the IP address of the DS and, for the Flexible File layout, that NFSv4.1 is to be used and that it is "tightly coupled". There is a "deviceid" which identifies the DeviceInfo. Layout - This is per file and can be recalled by the server when it is no longer valid. For the FreeBSD server, there is support for two types of layout, call File and Flexible File layout. Both allow the client to do I/O on the DS via NFSv4.1 I/O operations. The Flexible File layout is a more recent variant that allows specification of mirrors, where the client is expected to do writes to all mirrors to maintain them in a consistent state. The Flexible File layout also allows the client to report I/O errors for a DS back to the MDS. The Flexible File layout supports two variants referred to as "tightly coupled" vs "loosely coupled". The FreeBSD server always uses the "tightly coupled" variant where the client uses the same credentials to do I/O on the DS as it would on the MDS. For the "loosely coupled" variant, the layout specifies a synthetic user/group that the client uses to do I/O on the DS. The FreeBSD server does not do striping and always returns layouts for the entire file. The critical information in a layout is Read vs Read/Writea and DeviceID(s) that identify which DS(s) the data is stored on. At this time, the MDS generates File Layout layouts to NFSv4.1 clients that know how to do pNFS for the non-mirrored DS case unless the sysctl vfs.nfsd.default_flexfile is set non-zero, in which case Flexible File layouts are generated. The mirrored DS configuration always generates Flexible File layouts. For NFS clients that do not support NFSv4.1 pNFS, all I/O operations are done against the MDS which acts as a proxy for the appropriate DS(s). When the MDS receives an I/O RPC, it will do the RPC on the DS as a proxy. If the DS is on the same machine, the MDS/DS will do the RPC on the DS as a proxy and so on, until the machine runs out of some resource, such as session slots or mbufs. As such, DSs must be separate systems from the MDS. Tested by: james.rose@framestore.com Relnotes: yes
# cb0d9834	20-Apr-2018	Rick Macklem <rmacklem@FreeBSD.org>	Fix use of pointer after being set NULL. Using a pointer after setting it NULL is probably not a good plan. Spotted by inspection during changes for Flexible File Layout Ioerr handling. This code path obviously isn't normally executed. MFC after: 1 week
# 222daa42	25-Jan-2018	Conrad Meyer <cem@FreeBSD.org>	style: Remove remaining deprecated MALLOC/FREE macros Mechanically replace uses of MALLOC/FREE with appropriate invocations of malloc(9) / free(9) (a series of sed expressions). Something like: * MALLOC(a, b, ... -> a = malloc(... * FREE( -> free( * free((caddr_t) -> free( No functional change. For now, punt on modifying contrib ipfilter code, leaving a definition of the macro in its KMALLOC(). Reported by: jhb Reviewed by: cy, imp, markj, rmacklem Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D14035
# 51369649	20-Nov-2017	Pedro F. Giffuni <pfg@FreeBSD.org>	sys: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 3-Clause license. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.
# e5cffdd3	20-Aug-2017	Konstantin Belousov <kib@FreeBSD.org>	Do not drop NFS vnode lock when performing consistency checks. Currently several paths in the NFS client upgrade the shared vnode lock to exclusive, which might cause temporal dropping of the lock. This action appears to be fatal for nullfs mounts over NFS. If the operation is performed over nullfs vnode, then bypassed down to NFS VOP, and the lock is dropped, other thread might reclaim the upper nullfs vnode. Since on reclaim the nullfs vnode lock and NFS vnode lock are split, the original lock state of the nullfs vnode is not restored. As result, VFS operations receive not locked vnode after a VOP call. Stop upgrading the vnode lock when we check the consistency or flush buffers as result of detected inconsistency. Instead, allocate a new lockmgr lock for each NFS node, which is locked exclusive instead of the vnode lock upgrade. In other words, the other parallel modification of the vnode are excluded by either vnode lock conflict or exclusivity of the new lock when the vnode lock is shared. Also revert r316529 because now the vnode cannot be reclaimed during ncl_vinvalbuf(). In collaboration with: pho Reviewed by: rmacklem Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D12083
# 47cbff34	29-Jul-2017	Rick Macklem <rmacklem@FreeBSD.org>	Add kernel support for the NFS client forced dismount "umount -N" option. When an NFS mount is hung against an unresponsive NFS server, the "umount -f" option can be used to dismount the mount. Unfortunately, "umount -f" gets hung as well if a "umount" without "-f" has already been done. Usually, this is because of a vnode lock being held by the "umount" for the mounted-on vnode. This patch adds kernel code so that a new "-N" option can be added to "umount", allowing it to avoid getting hung for this case. It adds two flags. One indicates that a forced dismount is about to happen and the other is used, along with setting mnt_data == NULL, to handshake with the nfs_unmount() VFS call. It includes a slight change to the interface used between the client and common NFS modules, so I bumped __FreeBSD_version to ensure both modules are rebuilt. Tested by: pho Reviewed by: kib MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D11735
# 16f300fa	27-Jul-2017	Rick Macklem <rmacklem@FreeBSD.org>	Replace the checks for MNTK_UNMOUNTF with a macro that does the same thing. This patch defines a macro that checks for MNTK_UNMOUNTF and replaces explicit checks with this macro. It has no effect on semantics, but prepares the code for a future patch where there will also be a NFS specific flag for "forced dismount about to occur". Suggested by: kib MFC after: 2 weeks
# ad6eb976	28-Jun-2017	Rick Macklem <rmacklem@FreeBSD.org>	Fix an NFSv3 client case that probably never happens. If an NFSv3 server were to reply with weak cache consistency attributes, but not post operation attributes, the client would use garbage attributes from memory. This was spotted during work on the code for the NFSv4.1 client. I have never seen evidence that this happens and it wouldn't make sense for an NFSv3 server to do this, so this patch is basically "theoretical", but does fix the problem if a server were to do the above. PR: 219552 MFC after: 2 weeks
# a8e7f543	30-May-2017	Konstantin Belousov <kib@FreeBSD.org>	Fix bug in r318997: remove the line which overrides vn_fsid() calculation. Noted by: jhb Reviewed by: rmacklem Sponsored by: The FreeBSD Foundation
# 03311f11	27-May-2017	Konstantin Belousov <kib@FreeBSD.org>	Use whole mnt_stat.f_fsid bits for st_dev. Since ino64 expanded dev_t to 64bit, make VOP_GETATTR(9) provide all bits of mnt_stat.f_fsid as va_fsid for vnodes on filesystems which use f_fsid. In particular, NFSv3 and sometimes NFSv4, and ZFS use this method or reporting st_dev by stat(2). Provide a new helper vn_fsid() to avoid duplicating code to copy f_fsid to va_fsid. Note that the change is mostly cosmetic. Its motivation is to avoid sign-extension of f_fsid[0] into 64bit dev_t value which happens after dev_t becomes 64bit.. Reviewed by: avg(zfs), rmacklem (nfs) (both for previous version) Sponsored by: The FreeBSD Foundation
# 037a2012	13-Apr-2017	Rick Macklem <rmacklem@FreeBSD.org>	Add an NFSv4.1 mount option for "use one openowner". Some NFSv4.1 servers such as AmazonEFS can only support a small fixed number of open_owner4s. This patch adds a mount option called "oneopenown" that can be used for NFSv4.1 mounts to make the client do all Opens with the same open_owner4 string. This option can only be used with NFSv4.1 and may not work correctly when Delegations are is use. Reported by: cperciva Tested by: cperciva MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D8988
# 4e7dcfab	09-Apr-2017	Rick Macklem <rmacklem@FreeBSD.org>	Fix the NFSv4 client hndling of a stale write verifier in the Commit operation. When the NFSv4 client Commit operation encountered a stale write verifier, it erroneously mapped that to EIO. This could have caused recently written data to be lost when a server crashes/reboots between an UNSTABLE write and the subsequent commit. This patch fixes this. The bug was only for the NFSv4 client and did not affect NFSv3. Tested by: cperciva PR: 215887 MFC after: 2 weeks
# fbbd9655	28-Feb-2017	Warner Losh <imp@FreeBSD.org>	Renumber copyright clause 4 Renumber cluase 4 to 3, per what everybody else did when BSD granted them permission to remove clause 3. My insistance on keeping the same numbering for legal reasons is too pedantic, so give up on that point. Submitted by: Jan Schaumann <jschauma@stevens.edu> Pull Request: https://github.com/freebsd/freebsd/pull/96
# b2fc0141	23-Dec-2016	Rick Macklem <rmacklem@FreeBSD.org>	Fix NFSv4.1 client recovery from NFS4ERR_BAD_SESSION errors. For most NFSv4.1 servers, a NFS4ERR_BAD_SESSION error is a rare failure that indicates that the server has lost session/open/lock state. However, recent testing by cperciva@ against the AmazonEFS server found several problems with client recovery from this due to it generating this failure frequently. Briefly, the problems fixed are: - If all session slots were in use at the time of the failure, some processes would continue to loop waiting for a slot on the old session forever. - If an RPC that doesn't use open/lock state failed with NFS4ERR_BAD_SESSION, it would fail the RPC/syscall instead of initiating recovery and then looping to retry the RPC. - If a successful reply to an RPC for an old session wasn't processed until after a new session was created for a NFS4ERR_BAD_SESSION error, it would erroneously update the new session and corrupt it. - The use of the first element of the session list in the nfs mount structure (which is always the current metadata session) was slightly racey. With changes for the above problems it became more racey, so all uses of this head pointer was wrapped with a NFSLOCKMNT()/NFSUNLOCKMNT(). - Although the kernel malloc() usually allocates more bytes than requested and, as such, this wouldn't have caused problems, the allocation of a session structure was 1 byte smaller than it should have been. (Null termination byte for the string not included in byte count.) There are probably still problems with a pNFS data server that fails with NFS4ERR_BAD_SESSION, but I have no server that does this to test against (the AmazonEFS server doesn't do pNFS), so I can't fix these yet. Although this patch is fairly large, it should only affect the handling of NFS4ERR_BAD_SESSION error replies from an NFSv4.1 server. Thanks go to cperciva@ for the extension testing he did to help isolate/fix these problems. Reported by: cperciva Tested by: cperciva MFC after: 3 months Differential Revision: https://reviews.freebsd.org/D8745
# b6a60ae7	11-May-2016	Konstantin Belousov <kib@FreeBSD.org>	Use vfs_hash_ref(9) to eliminate LK_EXCLOTHER kludge. As a consequence, the nfs client override of VOP_LOCK1() is no longer needed. Reviewed and tested by: rmacklem Sponsored by: The FreeBSD Foundation MFC after: 1 week
# 74b8d63d	10-Apr-2016	Pedro F. Giffuni <pfg@FreeBSD.org>	Cleanup unnecessary semicolons from the kernel. Found with devel/coccinelle.
# 86767049	15-Jan-2016	Bjoern A. Zeeb <bz@FreeBSD.org>	Unbreak NOIP builds after r294084.
# d3bf8f64	15-Jan-2016	Alexander V. Chernikov <melifaro@FreeBSD.org>	Make nfscl_getmyip() use new routing KPI. * Use standard IPv6 SAS instead of rt->rt_ifa address. * Make address lookup work for IPv6 LLA. * Save address into buffer provided by caller instead of using static vars. Discussed with: rmacklem
# 43a993bb	29-Nov-2015	Kirk McKusick <mckusick@FreeBSD.org>	For performance reasons, it is useful to have a single string used as the name of a filesystem when setting it as the first parameter to the getnewvnode() function. Most filesystems call getnewvnode from just one place so can use a literal string as the first parameter. However, NFS calls getnewvnode from two places, so we create a global constant string that can be used by the two instances. This change also collapses two instances of getnewvnode() in the UFS filesystem to a single call. Reviewed by: kib Tested by: Peter Holm
# b5af3f30	05-Aug-2015	Conrad Meyer <cem@FreeBSD.org>	nfsclient: Protest loudly when GETATTR responses are invalid BROKEN NFS SERVER OR MIDDLEWARE: Certain WAN "accelerators" attempt to cache NFS GETATTR traffic, but actually corrupt it (e.g., responding to requests with attributes for totally different files). Warn very verbosely when this is detected. Linux' NFS client has a similar warning. Adds a sysctl/tunable (vfs.nfs.fileid_maxwarnings) to configure the quantity of warnings; default to 10. (Zero disables; -1 is unlimited.) Adds a failpoint to aid in validating the warning / behavior with a non-broken server. Use something like: sysctl 'debug.fail_point.nfscl_force_fileid_warning=10%return(1)' Reviewed by: rmacklem Approved by: markj (mentor) Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D3304
# 07d491de	28-Dec-2014	Rick Macklem <rmacklem@FreeBSD.org>	r245508 modified the NFS client's Setattr RPC to use VA_UTIMES_NULL to indicate whether it should set the time to the current tod on the server. This had the side effect of making the NFS client use the client's timestamp for exclusive create, starting with FreeBSD9.2. Unfortunately a bug in some Solaris NFS servers causes these servers to return NFS_OK to the Setattr RPC done during exclusive create, but not actually set the file's mode, leaving the file's mode == 0. This patch restores the NFS client's behaviour to use the server's tod for the exclusive open's Setattr RPC, to avoid the Solaris server bug and to restore the pre-FreeBSD9.2 NFS behaviour. Discussed on: freebsd-fs PR: 186293 MFC after: 3 months
# c15882f0	22-Dec-2014	Rick Macklem <rmacklem@FreeBSD.org>	Remove the old NFS client and server from head, which means that the NFSCLIENT and NFSSERVER kernel options will no longer work. This commit only removes the kernel components. Removal of unused code in the user utilities will be done later. This commit does not include an addition to UPDATING, but that will be committed in a few minutes. Discussed on: freebsd-fs
# 1f44e7c0	06-Nov-2014	Alexander V. Chernikov <melifaro@FreeBSD.org>	Convert nfsclient SAS to use new routing API. Btw original code should be rewritten not to use static variables.
# 4a144410	16-Mar-2014	Robert Watson <rwatson@FreeBSD.org>	Update kernel inclusions of capability.h to use capsicum.h instead; some further refinement is required as some device drivers intended to be portable over FreeBSD versions rely on __FreeBSD_version to decide whether to include capability.h. MFC after: 3 weeks
# 54366c0b	25-Nov-2013	Attilio Rao <attilio@FreeBSD.org>	- For kernel compiled only with KDTRACE_HOOKS and not any lock debugging option, unbreak the lock tracing release semantic by embedding calls to LOCKSTAT_PROFILE_RELEASE_LOCK() direclty in the inlined version of the releasing functions for mutex, rwlock and sxlock. Failing to do so skips the lockstat_probe_func invokation for unlocking. - As part of the LOCKSTAT support is inlined in mutex operation, for kernel compiled without lock debugging options, potentially every consumer must be compiled including opt_kdtrace.h. Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES is linked there and it is only used as a compile-time stub [0]. [0] immediately shows some new bug as DTRACE-derived support for debug in sfxge is broken and it was never really tested. As it was not including correctly opt_kdtrace.h before it was never enabled so it was kept broken for a while. Fix this by using a protection stub, leaving sfxge driver authors the responsibility for fixing it appropriately [1]. Sponsored by: EMC / Isilon storage division Discussed with: rstone [0] Reported by: rstone [1] Discussed with: philip
# 7008be5b	04-Sep-2013	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Change the cap_rights_t type from uint64_t to a structure that we can extend in the future in a backward compatible (API and ABI) way. The cap_rights_t represents capability rights. We used to use one bit to represent one right, but we are running out of spare bits. Currently the new structure provides place for 114 rights (so 50 more than the previous cap_rights_t), but it is possible to grow the structure to hold at least 285 rights, although we can make it even larger if 285 rights won't be enough. The structure definition looks like this: struct cap_rights { uint64_t cr_rights[CAP_RIGHTS_VERSION + 2]; }; The initial CAP_RIGHTS_VERSION is 0. The top two bits in the first element of the cr_rights[] array contain total number of elements in the array - 2. This means if those two bits are equal to 0, we have 2 array elements. The top two bits in all remaining array elements should be 0. The next five bits in all array elements contain array index. Only one bit is used and bit position in this five-bits range defines array index. This means there can be at most five array elements in the future. To define new right the CAPRIGHT() macro must be used. The macro takes two arguments - an array index and a bit to set, eg. #define CAP_PDKILL CAPRIGHT(1, 0x0000000000000800ULL) We still support aliases that combine few rights, but the rights have to belong to the same array element, eg: #define CAP_LOOKUP CAPRIGHT(0, 0x0000000000000400ULL) #define CAP_FCHMOD CAPRIGHT(0, 0x0000000000002000ULL) #define CAP_FCHMODAT (CAP_FCHMOD \| CAP_LOOKUP) There is new API to manage the new cap_rights_t structure: cap_rights_t cap_rights_init(cap_rights_t rights, ...); void cap_rights_set(cap_rights_t rights, ...); void cap_rights_clear(cap_rights_t rights, ...); bool cap_rights_is_set(const cap_rights_t rights, ...); bool cap_rights_is_valid(const cap_rights_t rights); void cap_rights_merge(cap_rights_t dst, const cap_rights_t src); void cap_rights_remove(cap_rights_t dst, const cap_rights_t src); bool cap_rights_contains(const cap_rights_t big, const cap_rights_t little); Capability rights to the cap_rights_init(), cap_rights_set(), cap_rights_clear() and cap_rights_is_set() functions are provided by separating them with commas, eg: cap_rights_t rights; cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT); There is no need to terminate the list of rights, as those functions are actually macros that take care of the termination, eg: #define cap_rights_set(rights, ...) \ __cap_rights_set((rights), __VA_ARGS__, 0ULL) void __cap_rights_set(cap_rights_t *rights, ...); Thanks to using one bit as an array index we can assert in those functions that there are no two rights belonging to different array elements provided together. For example this is illegal and will be detected, because CAP_LOOKUP belongs to element 0 and CAP_PDKILL to element 1: cap_rights_init(&rights, CAP_LOOKUP \| CAP_PDKILL); Providing several rights that belongs to the same array's element this way is correct, but is not advised. It should only be used for aliases definition. This commit also breaks compatibility with some existing Capsicum system calls, but I see no other way to do that. This should be fine as Capsicum is still experimental and this change is not going to 9.x. Sponsored by: The FreeBSD Foundation
# a820822e	02-Jul-2013	Rick Macklem <rmacklem@FreeBSD.org>	A problem with the old NFS client where large writes to large files would sometimes result in a corrupted file was reported via email. This problem appears to have been caused by r251719 (reverting r251719 fixed the problem). Although I have not been able to reproduce this problem, I suspect it is caused by another thread increasing np->n_size after the mtx_unlock(&np->n_mtx) but before the vnode_pager_setsize() call. Since the np->n_mtx mutex serializes updates to np->n_size, doing the vnode_pager_setsize() with the mutex locked appears to avoid the problem. Unfortunately, vnode_pager_setsize() where the new size is smaller, cannot be called with a mutex held. This patch returns the semantics to be close to pre-r251719 (actually pre-r248567, r248581, r248567 for the new client) such that the call to vnode_pager_setsize() is only delayed until after the mutex is unlocked when np->n_size is shrinking. Since the file is growing when being written, I believe this will fix the corruption. A better solution might be to replace the mutex with a sleep lock, but that is a non-trivial conversion, so this fix is hoped to be sufficient in the meantime. Reported by: David G. Lawrence (dg@dglawrence.com) Tested by: David G. Lawrence (to be done soon) Reviewed by: kib MFC after: 1 week
# 734b03c3	28-May-2013	Rick Macklem <rmacklem@FreeBSD.org>	Post-r248567, there were times when the client would return a truncated directory for some NFS servers. This turned out to be because the size of a directory reported by an NFS server can be smaller that the ufs-like directory created from the RPC XDR in the client. This patch fixes the problem by changing r248567 so that vnode_pager_setsize() is only done for regular files. Reported and tested by: hartmut.brandt@dlr.de Reviewed by: kib MFC after: 1 week
# 4d569af9	20-Mar-2013	Konstantin Belousov <kib@FreeBSD.org>	Initialize the variable to avoid (false) compiler warning about use of an uninitialized local. Reported by: Ivan Klymenko <fidaj@ukr.net> MFC after: 2 weeks
# 7157d8f7	21-Mar-2013	Konstantin Belousov <kib@FreeBSD.org>	Do not call vnode_pager_setsize() while a NFS node mutex is locked. vnode_pager_setsize() might sleep waiting for the page after EOF be unbusied. Call vnode_pager_setsize() both for the regular and directory vnodes. Reported by: mich Reviewed by: rmacklem Discussed with: avg, jhb MFC after: 2 weeks
# 2609222a	01-Mar-2013	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Merge Capsicum overhaul: - Capability is no longer separate descriptor type. Now every descriptor has set of its own capability rights. - The cap_new(2) system call is left, but it is no longer documented and should not be used in new code. - The new syscall cap_rights_limit(2) should be used instead of cap_new(2), which limits capability rights of the given descriptor without creating a new one. - The cap_getrights(2) syscall is renamed to cap_rights_get(2). - If CAP_IOCTL capability right is present we can further reduce allowed ioctls list with the new cap_ioctls_limit(2) syscall. List of allowed ioctls can be retrived with cap_ioctls_get(2) syscall. - If CAP_FCNTL capability right is present we can further reduce fcntls that can be used with the new cap_fcntls_limit(2) syscall and retrive them with cap_fcntls_get(2). - To support ioctl and fcntl white-listing the filedesc structure was heavly modified. - The audit subsystem, kdump and procstat tools were updated to recognize new syscalls. - Capability rights were revised and eventhough I tried hard to provide backward API and ABI compatibility there are some incompatible changes that are described in detail below: CAP_CREATE old behaviour: - Allow for openat(2)+O_CREAT. - Allow for linkat(2). - Allow for symlinkat(2). CAP_CREATE new behaviour: - Allow for openat(2)+O_CREAT. Added CAP_LINKAT: - Allow for linkat(2). ABI: Reuses CAP_RMDIR bit. - Allow to be target for renameat(2). Added CAP_SYMLINKAT: - Allow for symlinkat(2). Removed CAP_DELETE. Old behaviour: - Allow for unlinkat(2) when removing non-directory object. - Allow to be source for renameat(2). Removed CAP_RMDIR. Old behaviour: - Allow for unlinkat(2) when removing directory. Added CAP_RENAMEAT: - Required for source directory for the renameat(2) syscall. Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR): - Allow for unlinkat(2) on any object. - Required if target of renameat(2) exists and will be removed by this call. Removed CAP_MAPEXEC. CAP_MMAP old behaviour: - Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and PROT_WRITE. CAP_MMAP new behaviour: - Allow for mmap(2)+PROT_NONE. Added CAP_MMAP_R: - Allow for mmap(PROT_READ). Added CAP_MMAP_W: - Allow for mmap(PROT_WRITE). Added CAP_MMAP_X: - Allow for mmap(PROT_EXEC). Added CAP_MMAP_RW: - Allow for mmap(PROT_READ \| PROT_WRITE). Added CAP_MMAP_RX: - Allow for mmap(PROT_READ \| PROT_EXEC). Added CAP_MMAP_WX: - Allow for mmap(PROT_WRITE \| PROT_EXEC). Added CAP_MMAP_RWX: - Allow for mmap(PROT_READ \| PROT_WRITE \| PROT_EXEC). Renamed CAP_MKDIR to CAP_MKDIRAT. Renamed CAP_MKFIFO to CAP_MKFIFOAT. Renamed CAP_MKNODE to CAP_MKNODEAT. CAP_READ old behaviour: - Allow pread(2). - Disallow read(2), readv(2) (if there is no CAP_SEEK). CAP_READ new behaviour: - Allow read(2), readv(2). - Disallow pread(2) (CAP_SEEK was also required). CAP_WRITE old behaviour: - Allow pwrite(2). - Disallow write(2), writev(2) (if there is no CAP_SEEK). CAP_WRITE new behaviour: - Allow write(2), writev(2). - Disallow pwrite(2) (CAP_SEEK was also required). Added convinient defines: #define CAP_PREAD (CAP_SEEK \| CAP_READ) #define CAP_PWRITE (CAP_SEEK \| CAP_WRITE) #define CAP_MMAP_R (CAP_MMAP \| CAP_SEEK \| CAP_READ) #define CAP_MMAP_W (CAP_MMAP \| CAP_SEEK \| CAP_WRITE) #define CAP_MMAP_X (CAP_MMAP \| CAP_SEEK \| 0x0000000000000008ULL) #define CAP_MMAP_RW (CAP_MMAP_R \| CAP_MMAP_W) #define CAP_MMAP_RX (CAP_MMAP_R \| CAP_MMAP_X) #define CAP_MMAP_WX (CAP_MMAP_W \| CAP_MMAP_X) #define CAP_MMAP_RWX (CAP_MMAP_R \| CAP_MMAP_W \| CAP_MMAP_X) #define CAP_RECV CAP_READ #define CAP_SEND CAP_WRITE #define CAP_SOCK_CLIENT \ (CAP_CONNECT \| CAP_GETPEERNAME \| CAP_GETSOCKNAME \| CAP_GETSOCKOPT \| \ CAP_PEELOFF \| CAP_RECV \| CAP_SEND \| CAP_SETSOCKOPT \| CAP_SHUTDOWN) #define CAP_SOCK_SERVER \ (CAP_ACCEPT \| CAP_BIND \| CAP_GETPEERNAME \| CAP_GETSOCKNAME \| \ CAP_GETSOCKOPT \| CAP_LISTEN \| CAP_PEELOFF \| CAP_RECV \| CAP_SEND \| \ CAP_SETSOCKOPT \| CAP_SHUTDOWN) Added defines for backward API compatibility: #define CAP_MAPEXEC CAP_MMAP_X #define CAP_DELETE CAP_UNLINKAT #define CAP_MKDIR CAP_MKDIRAT #define CAP_RMDIR CAP_UNLINKAT #define CAP_MKFIFO CAP_MKFIFOAT #define CAP_MKNOD CAP_MKNODAT #define CAP_SOCK_ALL (CAP_SOCK_CLIENT \| CAP_SOCK_SERVER) Sponsored by: The FreeBSD Foundation Reviewed by: Christoph Mallon <christoph.mallon@gmx.de> Many aspects discussed with: rwatson, benl, jonathan ABI compatibility discussed with: kib
# 39804bc8	17-Jan-2013	John Baldwin <jhb@FreeBSD.org>	Remove a no-longer-used variable after the previous change to use VA_UTIMES_NULL. Submitted by: bde, rmacklem MFC after: 1 week
# 5055536e	16-Jan-2013	John Baldwin <jhb@FreeBSD.org>	Use the VA_UTIMES_NULL flag to detect when NULL was passed to utimes() instead of comparing the desired time against the current time as a heuristic. Reviewed by: rmacklem MFC after: 1 week
# 1f60bfd8	08-Dec-2012	Rick Macklem <rmacklem@FreeBSD.org>	Move the NFSv4.1 client patches over from projects/nfsv4.1-client to head. I don't think the NFS client behaviour will change unless the new "minorversion=1" mount option is used. It includes basic NFSv4.1 support plus support for pNFS using the Files Layout only. All problems detecting during an NFSv4.1 Bakeathon testing event in June 2012 have been resolved in this code and it has been tested against the NFSv4.1 server available to me. Although not reviewed, I believe that kib@ has looked at it.
# 99d2727d	01-Dec-2012	Rick Macklem <rmacklem@FreeBSD.org>	Add an nfssvc() option to the kernel for the new NFS client which dumps out the actual options being used by an NFS mount. This will be used to implement a "-m" option for nfsstat(1). Reviewed by: alfred MFC after: 2 weeks
# c6e0355c	19-Nov-2012	Attilio Rao <attilio@FreeBSD.org>	r16312 is not any longer real since many years (likely since when VFS received granular locking) but the comment present in UFS has been copied all over other filesystems code incorrectly for several times. Removes comments that makes no sense now. Reviewed by: kib MFC after: 3 days
# 134eb42e	16-Nov-2012	Konstantin Belousov <kib@FreeBSD.org>	In pget(9), if PGET_NOTWEXIT flag is not specified, also search the zombie list for the pid. This allows several kern.proc sysctls to report useful information for zombies. Hold the allproc_lock around all searches instead of relocking it. Remove private pfind_locked() from the new nfs client code. Requested and reviewed by: pjd Tested by: pho MFC after: 3 weeks
# 81d5d46b	03-Feb-2012	Bjoern A. Zeeb <bz@FreeBSD.org>	Add multi-FIB IPv6 support to the core network stack supplementing the original IPv4 implementation from r178888: - Use RT_DEFAULT_FIB in the IPv4 implementation where noticed. - Use rtfib() KPI with explicit RT_DEFAULT_FIB where applicable in the NFS code. - Use the new in6_rt KPI in TCP, gif(4), and the IPv6 network stack where applicable. - Split in6_rtqtimo() and in6_mtutimo() as done in IPv4 and equally prevent multiple initializations of callouts in in6_inithead(). - Use wrapper functions where needed to preserve the current KPI to ease MFCs. Use BURN_BRIDGES to indicate expected future cleanup. - Fix (related) comments (both technical or style). - Convert to rtinit() where applicable and only use custom loops where currently not possible otherwise. - Multicast group, most neighbor discovery address actions and faith(4) are locked to the default FIB. Individual IPv6 addresses will only appear in the default FIB, however redirect information and prefixes of connected subnets are automatically propagated to all FIBs by default (mimicking IPv4 behavior as closely as possible). Sponsored by: Cisco Systems, Inc.
# 7f763fc3	26-Jan-2012	Rick Macklem <rmacklem@FreeBSD.org>	A problem with respect to data read through the buffer cache for both NFS clients was reported to freebsd-fs@ under the subject "NFS corruption in recent HEAD" on Nov. 26, 2011. This problem occurred when a TCP mounted root fs was changed to using UDP. I believe that this problem was caused by the change in mnt_stat.f_iosize that occurred because rsize was decreased to the maximum supported by UDP. This patch fixes the problem by using v_bufobj.bo_bsize instead of f_iosize, since the latter is set to f_iosize when the vnode is allocated, but does not change for a given vnode when f_iosize changes. Reported by: pjd Reviewed by: kib MFC after: 2 weeks
# f7258644	07-Jan-2012	Rick Macklem <rmacklem@FreeBSD.org>	opt_inet6.h was missing from some files in the new NFS subsystem. The effect of this was, for clients mounted via inet6 addresses, that the DRC cache would never have a hit in the server. It also broke NFSv4 callbacks when an inet6 address was the only one available in the client. This patch fixes the above, plus deletes opt_inet6.h from a couple of files it is not needed for. MFC after: 2 weeks
# a9d2f8d8	10-Aug-2011	Robert Watson <rwatson@FreeBSD.org>	Second-to-last commit implementing Capsicum capabilities in the FreeBSD kernel for FreeBSD 9.0: Add a new capability mask argument to fget(9) and friends, allowing system call code to declare what capabilities are required when an integer file descriptor is converted into an in-kernel struct file *. With options CAPABILITIES compiled into the kernel, this enforces capability protection; without, this change is effectively a no-op. Some cases require special handling, such as mmap(2), which must preserve information about the maximum rights at the time of mapping in the memory map so that they can later be enforced in mprotect(2) -- this is done by narrowing the rights in the existing max_protection field used for similar purposes with file permissions. In namei(9), we assert that the code is not reached from within capability mode, as we're not yet ready to enforce namespace capabilities there. This will follow in a later commit. Update two capability names: CAP_EVENT and CAP_KEVENT become CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they represent. Approved by: re (bz) Submitted by: jonathan Sponsored by: Google Inc
# a9989634	16-Jul-2011	Zack Kirsch <zack@FreeBSD.org>	Simple find/replace of VOP_UNLOCK -> NFSVOPUNLOCK. This is done so that NFSVOPUNLOCK can be modified later to add enhanced logging and assertions. Reviewed by: rmacklem Approved by: zml (mentor) MFC after: 2 weeks
# 98f234f3	16-Jul-2011	Zack Kirsch <zack@FreeBSD.org>	Simple find/replace of vn_lock -> NFSVOPLOCK. This is done so that NFSVOPLOCK can be modified later to add enhanced logging and assertions. Reviewed by: rmacklem Approved by: zml (mentor) MFC after: 2 weeks
# 4875024b	28-Jun-2011	Rick Macklem <rmacklem@FreeBSD.org>	Fix the new NFSv4 client so that it doesn't fill the cached mode attribute in as 0 when doing writes. The change adds the Mode attribute plus the others except Owner and Owner_group to the list requested by the NFSv4 Write Operation. This fixed a problem where an executable file built by "cc" would get mode 0111 instead of 0755 for some NFSv4 servers. Found at the recent NFSv4 interoperability Bakeathon. Tested by: tdh at excfb.com MFC after: 2 weeks
# 8f0e65c9	18-Jun-2011	Rick Macklem <rmacklem@FreeBSD.org>	Add DTrace support to the new NFS client. This is essentially cloned from the old NFS client, plus additions for NFSv4. A review of this code is in progress, however it was felt by the reviewer that it could go in now, before code slush. Any changes required by the review can be committed as bug fixes later.
# fb35711d	05-Jun-2011	Rick Macklem <rmacklem@FreeBSD.org>	Add support for flock(2) locks to the new NFSv4 client. I think this should be ok, since the client now delays NFSv4 Close operations until VOP_INACTIVE()/VOP_RECLAIM(). As such, there should be no risk that the NFSv4 Open is closed while an associated byte range lock still exists. Tested by: avg MFC after: 2 weeks
# f8f4e256	05-Jun-2011	Rick Macklem <rmacklem@FreeBSD.org>	The new NFSv4 client was erroneously using "p" instead of "p_leader" for the "id" for POSIX byte range locking. I think this would only have affected processes created by rfork(2) with the RFTHREAD flag specified. This patch fixes that by passing the "id" down through the various functions from nfs_advlock(). MFC after: 2 weeks
# 2301f58f	05-Jun-2011	Rick Macklem <rmacklem@FreeBSD.org>	Fix the new NFSv4 client so that it doesn't crash when a mount is done for a VIMAGE kernel. Tested by: glz at hidden-powers dot com Reviewed by: bz MFC after: 2 weeks
# f96712c2	04-May-2011	Rick Macklem <rmacklem@FreeBSD.org>	Fix the new NFS client so that it handles the 64bit fields that are now in "struct statfs" for NFSv3 and NFSv4. Since the ffiles value is uint64_t on the wire, I clip the value to INT64_MAX to avoid setting f_ffree negative. Tested by: kib MFC after: 2 weeks
# 920ae5d9	20-Apr-2011	Rick Macklem <rmacklem@FreeBSD.org>	Revert r220906, since the vp isn't always locked when nfscl_request() is called. It will need a more involved patch.
# 69bcf845	20-Apr-2011	Rick Macklem <rmacklem@FreeBSD.org>	Add a check for VI_DOOMED at the beginning of nfscl_request() so that it won't try and use vp->v_mount to do an RPC during a forced dismount. There needs to be at least one more kernel commit, plus a change to the umount(8) command before forced dismounts will work for the experimental NFS client. MFC after: 2 weeks
# 4b3a38ec	16-Apr-2011	Rick Macklem <rmacklem@FreeBSD.org>	Add a lktype flags argument to nfscl_nget() and ncl_nget() in the experimental NFS client so that its nfs_lookup() function can use cn_lkflags in a manner analagous to the regular NFS client. MFC after: 2 weeks
# a09001a8	14-Apr-2011	Rick Macklem <rmacklem@FreeBSD.org>	Fix the experimental NFSv4 server so that it uses VOP_PATHCONF() to determine if a file system supports NFSv4 ACLs. Since VOP_PATHCONF() must be called with a locked vnode, the function is called before nfsvno_fillattr() and the result is passed in as an extra argument. MFC after: 2 weeks
# 07c0c166	14-Apr-2011	Rick Macklem <rmacklem@FreeBSD.org>	Modify the experimental NFSv4 server so that it handles crossing of server mount points properly. The functions nfsvno_fillattr() and nfsv4_fillattr() were modified to take the extra arguments that are the mount point, a flag to indicate that it is a file system root and the mounted on fileno. The mount point argument needs to be busy when nfsvno_fillattr() is called, since the vp argument is not locked. Reviewed by: kib MFC after: 2 weeks
# 3707cf89	13-Apr-2011	Rick Macklem <rmacklem@FreeBSD.org>	Fix the experimental NFSv4 client so that it recognizes server mount point crossings correctly. It was testing the wrong flag. Also, try harder to make sure that the fsid is different than the one assigned to the client mount point, by hashing the server's fsid (just to create a different value deterministically) when it is the same. MFC after: 2 weeks
# 8e6fa660	24-Mar-2011	John Baldwin <jhb@FreeBSD.org>	Fix some locking nits with the p_state field of struct proc: - Hold the proc lock while changing the state from PRS_NEW to PRS_NORMAL in fork to honor the locking requirements. While here, expand the scope of the PROC_LOCK() on the new process (p2) to avoid some LORs. Previously the code was locking the new child process (p2) after it had locked the parent process (p1). However, when locking two processes, the safe order is to lock the child first, then the parent. - Fix various places that were checking p_state against PRS_NEW without having the process locked to use PROC_LOCK(). Every place was already locking the process, just after the PRS_NEW check. - Remove or reduce the use of PROC_SLOCK() for places that were checking p_state against PRS_NEW. The PROC_LOCK() alone is sufficient for reading the current state. - Reorder fill_kinfo_proc() slightly so it only acquires PROC_SLOCK() once. MFC after: 1 week
# a7d5f7eb	19-Oct-2010	Jamie Gritton <jamie@FreeBSD.org>	A new jail(8) with a configuration file, to replace the work currently done by /etc/rc.d/jail.
# 3634d5b2	20-Aug-2010	John Baldwin <jhb@FreeBSD.org>	Add dedicated routines to toggle lockmgr flags such as LK_NOSHARE and LK_CANRECURSE after a lock is created. Use them to implement macros that otherwise manipulated the flags directly. Assert that the associated lockmgr lock is exclusively locked by the current thread when manipulating these flags to ensure the flag updates are safe. This last change required some minor shuffling in a few filesystems to exclusively lock a brand new vnode slightly earlier. Reviewed by: kib MFC after: 3 days
# f92bbff2	24-Jul-2010	Rick Macklem <rmacklem@FreeBSD.org>	Move sys/nfsclient/nfs_lock.c into sys/nfs and build it as a separate module that can be used by both the regular and experimental nfs clients. This fixes the problem reported by jh@ where /dev/nfslock would be registered twice when both nfs clients were used. I also defined the size of the lm_fh field to be the correct value, as it should be the maximum size of an NFSv3 file handle. Reviewed by: jh MFC after: 2 weeks
# 3c497fac	15-Jul-2010	John Baldwin <jhb@FreeBSD.org>	Retire the NFS access cache timestamp structure. It was used in VOP_OPEN() to avoid sending multiple ACCESS/GETATTR RPCs during a single open() between VOP_LOOKUP() and VOP_OPEN(). Now we always send the RPC in VOP_LOOKUP() and not VOP_OPEN() in the cases that multiple RPCs could be sent. MFC after: 2 weeks
# a8437c97	14-Jun-2010	Rick Macklem <rmacklem@FreeBSD.org>	Add MODULE_DEPEND() macros to the experimental NFS client and server so that the modules will load when kernels are built with none of the NFS* configuration options specified. I believe this resolves the problems reported by PR kern/144458 and the email on freebsd-stable@ posted by Dmitry Pryanishnikov on June 13. Tested by: kib PR: kern/144458 Reviewed by: kib MFC after: 1 week
# 4a9c979c	22-Apr-2010	Rick Macklem <rmacklem@FreeBSD.org>	MFC: r206688 The experimental NFS client was not filling in recovery credentials for opens done locally in the client when a delegation for the file was held. This could cause the client to crash in crsetgroups() when recovering from a server crash/reboot. This patch fills in the recovery credentials for this case, in order to avoid the client crash. Also, add KASSERT()s to the credential copy functions, to catch any other cases where the credentials aren't filled in correctly.
# 55909abf	15-Apr-2010	Rick Macklem <rmacklem@FreeBSD.org>	The experimental NFS client was not filling in recovery credentials for opens done locally in the client when a delegation for the file was held. This could cause the client to crash in crsetgroups() when recovering from a server crash/reboot. This patch fills in the recovery credentials for this case, in order to avoid the client crash. Also, add KASSERT()s to the credential copy functions, to catch any other cases where the credentials aren't filled in correctly. MFC after: 1 week
# e15f9ffc	31-Dec-2009	Jaakko Heinonen <jh@FreeBSD.org>	MFC r198291: Unloading of the nfscl module is unsupported because newnfslock doesn't support unloading. It's not trivial to implement newnfslock unloading so for now just admit that unloading is unsupported and refuse to attempt unload in all nfscl module event handlers. Approved by: trasz (mentor)
# 44aa9b63	31-Dec-2009	Jaakko Heinonen <jh@FreeBSD.org>	MFC r198290: Fix ordering of nfscl_modevent() and ncl_uninit(). nfscl_modevent() must be called after ncl_uninit() when unloading the nfscl module because ncl_uninit() uses ncl_iod_mutex which is destroyed in nfscl_modevent(). Approved by: trasz (mentor)
# dd697cbf	20-Oct-2009	Jaakko Heinonen <jh@FreeBSD.org>	Unloading of the nfscl module is unsupported because newnfslock doesn't support unloading. It's not trivial to implement newnfslock unloading so for now just admit that unloading is unsupported and refuse to attempt unload in all nfscl module event handlers. Reviewed by: rmacklem Approved by: trasz (mentor)
# 349af4de	20-Oct-2009	Jaakko Heinonen <jh@FreeBSD.org>	Fix ordering of nfscl_modevent() and ncl_uninit(). nfscl_modevent() must be called after ncl_uninit() when unloading the nfscl module because ncl_uninit() uses ncl_iod_mutex which is destroyed in nfscl_modevent(). Reviewed by: rmacklem Approved by: trasz (mentor)
# 8760cb9a	14-Sep-2009	Rick Macklem <rmacklem@FreeBSD.org>	MFC r197048: Add LK_NOWITNESS to the vn_lock() calls done on newly created nfs vnodes, since these nodes are not linked into the mount queue and, as such, the vn_lock() cannot cause a deadlock so LORs are harmless. Suggested by: kib Approved by: re (kensmith), kib (mentor)
# 8f63187e	09-Sep-2009	Rick Macklem <rmacklem@FreeBSD.org>	Add LK_NOWITNESS to the vn_lock() calls done on newly created nfs vnodes, since these nodes are not linked into the mount queue and, as such, the vn_lock() cannot cause a deadlock so LORs are harmless. Suggested by: kib Approved by: kib (mentor) MFC after: 3 days
# 65cc6600	20-Jun-2009	Rick Macklem <rmacklem@FreeBSD.org>	Replace RPCAUTH_UNIXGIDS with NFS_MAXGRPS so that nfscbd.c will build. Approved by: kib (mentor)
# 2c1e6cce	19-Jun-2009	Rick Macklem <rmacklem@FreeBSD.org>	Change the size of the nfsc_groups[] array in the experimental nfs client to RPCAUTH_UNIXGIDS + 1 (17), since that is what can go on the wire for AUTH_SYS authentication. Reviewed by: brooks Approved by: kib (mentor)
# 838d9858	19-Jun-2009	Brooks Davis <brooks@FreeBSD.org>	Rework the credential code to support larger values of NGROUPS and NGROUPS_MAX, eliminate ABI dependencies on them, and raise the to 1024 and 1023 respectively. (Previously they were equal, but under a close reading of POSIX, NGROUPS_MAX was defined to be too large by 1 since it is the number of supplemental groups, not total number of groups.) The bulk of the change consists of converting the struct ucred member cr_groups from a static array to a pointer. Do the equivalent in kinfo_proc. Introduce new interfaces crcopysafe() and crsetgroups() for duplicating a process credential before modifying it and for setting group lists respectively. Both interfaces take care for the details of allocating groups array. crsetgroups() takes care of truncating the group list to the current maximum (NGROUPS) if necessary. In the future, crsetgroups() may be responsible for insuring invariants such as sorting the supplemental groups to allow groupmember() to be implemented as a binary search. Because we can not change struct xucred without breaking application ABIs, we leave it alone and introduce a new XU_NGROUPS value which is always 16 and is to be used or NGRPS as appropriate for things such as NFS which need to use no more than 16 groups. When feasible, truncate the group list rather than generating an error. Minor changes: - Reduce the number of hand rolled versions of groupmember(). - Do not assign to both cr_gid and cr_groups[0]. - Modify ipfw to cache ucreds instead of part of their contents since they are immutable once referenced by more than one entity. Submitted by: Isilon Systems (initial implementation) X-MFC after: never PR: bin/113398 kern/133867
# 9dd7554f	22-May-2009	Rick Macklem <rmacklem@FreeBSD.org>	Fix the name of the module common to the client and server in the experimental nfs subsystem to the correct one for the MODULE_DEPEND() macro. Approved by: kib (mentor)
# 9ec7b004	04-May-2009	Rick Macklem <rmacklem@FreeBSD.org>	Add the experimental nfs subtree to the kernel, that includes support for NFSv4 as well as NFSv2 and 3. It lives in 3 subdirs under sys/fs: nfs - functions that are common to the client and server nfsclient - a mutation of sys/nfsclient that call generic functions to do RPCs and handle state. As such, it retains the buffer cache handling characteristics and vnode semantics that are found in sys/nfsclient, for the most part. nfsserver - the server. It includes a DRC designed specifically for NFSv4, that is used instead of the generic DRC in sys/rpc. The build glue will be checked in later, so at this point, it consists of 3 new subdirs that should not affect kernel building. Approved by: kib (mentor)