#
3f65000b |
|
04-May-2024 |
Rick Macklem <rmacklem@FreeBSD.org> |
nfsd: Fix Link conformance with RFC8881 for delegations RFC8881 specifies that, when a Link operation occurs on an NFSv4 file, delegations issued to other clients for that file must be recalled. Discovered during a recent discussion on nfsv4@ietf.org. Although I have not observed a problem caused by not doing the required delegation recall, it is definitely required by the RFC, so this patch makes the server do the recall. Tested during a recent NFSv4 IETF Bakeathon event. MFC after: 1 week
|
#
b068bb09 |
|
07-Jan-2024 |
Konstantin Belousov <kib@FreeBSD.org> |
Add vnode_pager_clean_{a,}sync(9) Bump __FreeBSD_version for ZFS use. Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D43356
|
#
fdafd315 |
|
24-Nov-2023 |
Warner Losh <imp@FreeBSD.org> |
sys: Automated cleanup of cdefs and other formatting Apply the following automated changes to try to eliminate no-longer-needed sys/cdefs.h includes as well as now-empty blank lines in a row. Remove /^#if.*\n#endif.*\n#include\s+<sys/cdefs.h>.*\n/ Remove /\n+#include\s+<sys/cdefs.h>.*\n+#if.*\n#endif.*\n+/ Remove /\n+#if.*\n#endif.*\n+/ Remove /^#if.*\n#endif.*\n/ Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/types.h>/ Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/param.h>/ Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/capsicum.h>/ Sponsored by: Netflix
|
#
cd5edc7d |
|
17-Oct-2023 |
Rick Macklem <rmacklem@FreeBSD.org> |
nfsd: Avoid acquiring a vnode for some NFSv4 Readdir operations Without this patch, a NFSv4 Readdir operation acquires the vnode for each entry in the directory. If only the Type, Fileid, Mounted_on_fileid and ReaddirError attributes are requested by a client, acquiring the vnode is not necessary for non-directories. Directory vnodes must be acquired to check for server file system mount points. This patch avoids acquiring the vnode, as above, resulting in a 3-8% improvement in Readdir RPC RTT for some simple tests I did. Note that only non-rdirplus NFSv4 mounts will benefit from this change. Tested during a recent IETF NFSv4 Bakeathon testing event. MFC after: 1 month
|
#
685dc743 |
|
16-Aug-2023 |
Warner Losh <imp@FreeBSD.org> |
sys: Remove $FreeBSD$: one-line .c pattern Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
|
#
ba8cc6d7 |
|
12-Mar-2023 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: use __enum_uint8 for vtype and vstate This whacks hackery around only reading v_type once. Bump __FreeBSD_version to 1400093
|
#
648a208e |
|
05-May-2023 |
Rick Macklem <rmacklem@FreeBSD.org> |
nfsd: Fix NFSv3 Readdir/ReaddirPlus reply for large i-node numbers If the i-node number (d_fileno) for a file on the server did not fit in 32bits, it would be truncated to the low order 32bits for the NFSv3 Readdir and ReaddirPlus RPC replies. This is no longer correct, given that ino_t is now 64bits. This patch fixes this by sending the full 64bits of d_fileno on the wire in the NFSv3 Readdir/ReaddirPlus RPC reply. PR: 271174 Reported by: bmueller@panasas.com Tested by: bmueller@panasas.com MFC after: 2 weeks
|
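The truncation described in 648a208e can be illustrated with a small userland sketch. This is not the nfsd code: the helper names (put_hyper, get_hyper, put_fileid_truncated, put_fileid_full) are hypothetical, and the sketch assumes the old reply carried only the low-order 32 bits of d_fileno, zero-extended into the 64-bit wire field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store a 64-bit value in big-endian (XDR) byte order. */
static void
put_hyper(uint8_t out[8], uint64_t v)
{
	assert(out != NULL);
	for (int i = 0; i < 8; i++)
		out[i] = (uint8_t)(v >> (56 - 8 * i));
}

/* Hypothetical helper: read back a big-endian 64-bit value. */
static uint64_t
get_hyper(const uint8_t in[8])
{
	uint64_t v = 0;

	assert(in != NULL);
	for (int i = 0; i < 8; i++)
		v = (v << 8) | in[i];
	return (v);
}

/* Assumed old behaviour: only the low-order 32 bits of the fileno survive. */
static void
put_fileid_truncated(uint8_t out[8], uint64_t fileno)
{
	put_hyper(out, (uint64_t)(uint32_t)fileno);
}

/* Behaviour after the fix: all 64 bits of the fileno go on the wire. */
static void
put_fileid_full(uint8_t out[8], uint64_t fileno)
{
	put_hyper(out, fileno);
}
```

With a fileno of 0x100000002, the truncated form decodes as 2 on the client, which can collide with an unrelated file, while the full form round-trips unchanged.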
#
896516e5 |
|
16-Mar-2023 |
Rick Macklem <rmacklem@FreeBSD.org> |
nfscl: Add a new NFSv4.1/4.2 mount option for Kerberized mounts Without this patch, a Kerberized NFSv4.1/4.2 mount must provide a Kerberos credential for the client at mount time. This credential is typically referred to as a "machine credential". It can be created one of two ways: - The user (usually root) has a valid TGT at the time the mount is done and this becomes the machine credential. There are two problems with this. 1 - The user doing the mount must have a valid TGT for a user principal at mount time. As such, the mount cannot be put in fstab(5) or similar. 2 - When the TGT expires, the mount breaks. - The client machine has a service principal in its default keytab file and this service principal (typically called a host-based initiator credential) is used as the machine credential. There are problems with this approach as well: 1 - There is a certain amount of administrative overhead creating the service principal for the NFS client, creating a keytab entry for this principal and then copying the keytab entry into the client's default keytab file via some secure means. 2 - The NFS client must have a fixed, well known, DNS name, since that FQDN is in the service principal name as the instance. This patch uses a feature of NFSv4.1/4.2 called SP4_NONE, which allows the state maintenance operations to be performed by any authentication mechanism, to do these operations via AUTH_SYS instead of RPCSEC_GSS (Kerberos). As such, neither of the above mechanisms is needed. It is hoped that this option will encourage adoption of Kerberized NFS mounts using TLS, to provide a more secure NFS mount. This new NFSv4.1/4.2 mount option, called "syskrb5" must be used with "sec=krb5[ip]" to avoid the need for either of the above Kerberos setups to be done by the client. Note that all file access/modification operations still require users on the NFS client to have a valid TGT recognized by the NFSv4.1/4.2 server. 
As such, this option allows, at most, a malicious client to do some sort of DOS attack. Although not required, use of "tls" with this new option is encouraged, since it provides on-the-wire encryption plus, optionally, client identity verification via a X.509 certificate provided to the server during TLS handshake. Alternately, "sec=krb5p" does provide on-the-wire encryption of file data. A mount_nfs(8) man page update will be done in a separate commit. Discussed on: freebsd-current@ MFC after: 3 months
|
#
10dff9da |
|
22-Feb-2023 |
Rick Macklem <rmacklem@FreeBSD.org> |
nfsd: Return ENXIO instead of EPERM when nfsd(8) already running The nfsd(8) daemon generates an error message that does not indicate that the nfsd daemon is already running when the nfssvc(2) syscall fails for the NFSSVC_STABLERESTART. Also, the check for running nfsd(8) in a vnet prison will return EPERM when it fails. This patch replaces EPERM with ENXIO so that the nfsd(8) daemon can generate more reasonable failure messages. The nfsd(8) daemon will be patched in a future commit. MFC after: 3 months
|
#
88175af8 |
|
21-Feb-2023 |
Rick Macklem <rmacklem@FreeBSD.org> |
vfs_export: Add mnt_exjail to control exports done in prisons If there are multiple instances of mountd(8) (in different prisons), there will be confusion if they manipulate the exports of the same file system. This patch adds mnt_exjail to "struct mount" so that the credentials (and, therefore, the prison) that did the exports for that file system can be recorded. If another prison has already exported the file system, vfs_export() will fail with an error. If mnt_exjail == NULL, the file system has not been exported. mnt_exjail is checked by the NFS server, so that exports done from within a different prison will not be used. The patch also implements vfs_exjail_destroy(), which is called from prison_cleanup() to release all the mnt_exjail credential references, so that the prison can be removed. Mainly to avoid doing a scan of the mountlist for the case where there were no exports done from within the prison, a count of how many file systems have been exported from within the prison is kept in pr_exportcnt. Reviewed by: markj Discussed with: jamie Differential Revision: https://reviews.freebsd.org/D38371 MFC after: 3 months
|
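The ownership rule described in 88175af8 can be sketched in userland as follows. Everything here is a hypothetical stand-in (demo_mount, demo_prison, demo_export, demo_prison_cleanup); the real code records a credential reference in mnt_exjail rather than a prison pointer, and the EBUSY return value is a placeholder, since the commit message only says that vfs_export() fails with an error.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the prison and mount structures. */
struct demo_prison {
	int exportcnt;			/* file systems exported from this prison */
};

struct demo_mount {
	struct demo_prison *exjail;	/* NULL means "not exported" */
};

/* Record which prison exported the mount; refuse a second, different prison. */
static int
demo_export(struct demo_mount *mp, struct demo_prison *pr)
{
	assert(mp != NULL && pr != NULL);
	if (mp->exjail != NULL && mp->exjail != pr)
		return (EBUSY);		/* placeholder error code (assumption) */
	if (mp->exjail == NULL) {
		mp->exjail = pr;
		pr->exportcnt++;
	}
	return (0);
}

/* Release all exports owned by a prison; skip the scan when the count is 0. */
static void
demo_prison_cleanup(struct demo_prison *pr, struct demo_mount *mounts, size_t n)
{
	assert(pr != NULL);
	if (pr->exportcnt == 0)
		return;
	for (size_t i = 0; i < n; i++) {
		if (mounts[i].exjail == pr) {
			mounts[i].exjail = NULL;
			pr->exportcnt--;
		}
	}
}
```

The per-prison counter plays the role the message gives to pr_exportcnt: cleanup for a prison that never exported anything returns without walking the mount list.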
#
ef6fcc5e |
|
20-Feb-2023 |
Rick Macklem <rmacklem@FreeBSD.org> |
nfsd: Add VNET_SYSUNINIT() macros for vnet cleanup Commit ed03776ca7f4 enabled the vnet front end macros. As such, kernels built with the VIMAGE option will malloc data and initialize locks on a per-vnet basis, typically via a VNET_SYSINIT(). This patch adds VNET_SYSUNINIT() macros to do the frees of the per-vnet malloc'd data and destroys of per-vnet locks. It also removes the mtx_lock/mtx_unlock calls from nfsrvd_cleancache(), since they are not needed. Discussed with: bz, jamie MFC after: 3 months
|
#
ed03776c |
|
18-Feb-2023 |
Rick Macklem <rmacklem@FreeBSD.org> |
nfsd: Enable the NFSD_VNET vnet front end macros Several commits have added front end macros for the vnet macros to the NFS server, krpc and kgssapi. These macros are now null, but this patch changes them to front end the vnet macros. With this commit, many global variables in the code become vnet'd, so that nfsd(8), nfsuserd(8), rpc.tlsservd(8) and gssd(8) can run in a vnet prison, once enabled. To run the NFS server in a vnet prison still requires a couple of patches (in D37741 and D38371) that allow mountd(8) to export file systems from within a vnet prison. Once these are committed to main, a small patch to kern_jail.c allowing "allow.nfsd" without VNET_NFSD defined will allow the NFS server to run in a vnet prison. One area that still needs to be settled is cleanup when a prison is removed. Without this, everything should work except there will be a leak of malloc'd data and mutex locks when a vnet prison is removed. MFC after: 3 months
|
#
b039ca07 |
|
15-Feb-2023 |
Rick Macklem <rmacklem@FreeBSD.org> |
nfsd: Wrap nfsstatsv1_p in the NFSD_VNET() macro Commit 7344856e3a6d added a lot of macros that will front end vnet macros so that nfsd(8) can run in vnet prison. The nfsstatsv1_p variable got missed. This patch wraps all uses of nfsstatsv1_p with the NFSD_VNET() macro. The NFSD_VNET() macro is still a null macro. MFC after: 3 months
|
#
9d329bbc |
|
13-Feb-2023 |
Rick Macklem <rmacklem@FreeBSD.org> |
nfsd: Continue adding macros so nfsd can run in a vnet prison Commit 7344856e3a6d added a lot of macros that will front end vnet macros so that nfsd(8) can run in vnet prison. This patch adds some more of them and also a lot of uses of nfsstatsv1_p instead of nfsstatsv1. nfsstatsv1_p points to nfsstatsv1 for prison0, but will point to a malloc'd structure for other prisons. It also puts nfsstatsv1_p in nfscommon.ko instead of nfsd.ko. MFC after: 3 months
|
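The nfsstatsv1_p arrangement described in 9d329bbc (a pointer that refers to the static structure for prison0 and to a malloc'd structure for other prisons) can be sketched in userland like this. The names demo_stats, demo_stats0, demo_prison, demo_prison_init and demo_prison_fini are hypothetical and do not appear in the NFS sources; calloc() and free() stand in for the kernel allocator.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical statistics structure standing in for nfsstatsv1. */
struct demo_stats {
	unsigned long rpccnt;
};

/* Static instance used by the default prison (prison0 in the message). */
static struct demo_stats demo_stats0;

struct demo_prison {
	int id;
	struct demo_stats *stats_p;	/* plays the role of nfsstatsv1_p */
};

/* Point at the static structure for id 0, allocate a zeroed one otherwise. */
static int
demo_prison_init(struct demo_prison *pr, int id)
{
	assert(pr != NULL);
	pr->id = id;
	if (id == 0) {
		pr->stats_p = &demo_stats0;
	} else {
		pr->stats_p = calloc(1, sizeof(*pr->stats_p));
		if (pr->stats_p == NULL)
			return (-1);
	}
	return (0);
}

/* Free only what was allocated; the static structure is never freed. */
static void
demo_prison_fini(struct demo_prison *pr)
{
	assert(pr != NULL);
	if (pr->stats_p != &demo_stats0)
		free(pr->stats_p);
	pr->stats_p = NULL;
}
```

Code that always goes through the pointer does not need to know which kind of prison it is running in, which is presumably why so many uses of nfsstatsv1 were converted to nfsstatsv1_p.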
#
fcfdb76e |
|
12-Feb-2023 |
Rick Macklem <rmacklem@FreeBSD.org> |
nfsd: Fix initialization broken by 7344856e3a6d Oops, although the vneting macros do not do anything yet, commit 7344856e3a6d did change where things are initialized and one of the initialization functions was not being called early enough. This patch moves nfsrvd_init(0) to the function called via (VNET_)SYSINIT() to fix this. Reported by: olivier MFC after: 3 months
|
#
4d68605f |
|
11-Feb-2023 |
Rick Macklem <rmacklem@FreeBSD.org> |
nfsd: Delete nfsrv_prison_cleanup() until vneting enabled Oops, although the vneting macros do not do anything yet, commit 7344856e3a6d enabled the prison cleanup function, that would get called and crash the system when a jail was terminated. This patch gets rid of nfsrv_prison_cleanup() for now. It can go in when the vnet macros are enabled as front ends to the vnet macros. MFC after: 3 months
|
#
7344856e |
|
11-Feb-2023 |
Rick Macklem <rmacklem@FreeBSD.org> |
nfsd: Prepare the NFS server code to run in a vnet prison This patch defines null macros that can be used to apply the vnet macros for global variables and SYSCTL flags. It also applies these macros to many of the global variables and some of the SYSCTLs. Since the macros do nothing, these changes should not result in semantics changes, although the changes are large in number. The patch does change several global variables that were arrays or structures to pointers to same. For these variables, modified initialization and cleanup code malloc's and free's the arrays/structures. This was done so that the vnet footprint would be about 300bytes when the macros are defined as vnet macros, allowing nfsd.ko to load dynamically. I believe the comments in D37519 have been addressed, although it has never been reviewed, due in part to the large size of the patch. This is the first of a series of patches that will put D37519 in main. Once everything is in main, the macros will be defined as front end macros to the vnet ones. MFC after: 3 months Differential Revision: https://reviews.freebsd.org/D37519
|
#
5fd0916c |
|
11-Feb-2023 |
Rick Macklem <rmacklem@FreeBSD.org> |
nfsd: Add a KASSERT in nfsvno_open Commit ded5f2954e1a defined done_namei to indicate that nd_repstat was set after a successful nfsvno_namei(), so that a cleanup needs to be done in nfsvno_open(). This only happens when nfsvno_namei() is done with CREATE. This patch adds a KASSERT() to check for that. PR: 268971
|
#
3e230e0c |
|
10-Feb-2023 |
Rick Macklem <rmacklem@FreeBSD.org> |
nfsd: Fix handling of the error case for nfsvno_open some more Commit ded5f2954e1a defined done_namei to indicate that nd_repstat was set after a successful nfsvno_namei(), so that a cleanup needs to be done in nfsvno_open(). However, it missed the case where a call to nfsrv_opencheck() in nfsvno_open() sets nd_repstat non-zero. This would cause panics due to a dangling locked vnode when nfsrv_opencheck() set nd_repstat, such as during grace just after a server boot. This patch fixes the problem. PR: 268971
|
#
ded5f295 |
|
08-Feb-2023 |
Rick Macklem <rmacklem@FreeBSD.org> |
nfsd: Fix handling of the error case for nfsvno_open Using done_namei instead of ni_startdir did not fix the crashes reported in the PR. Upon looking more closely at the code, the only case where the code near the end of nfsvno_open() needs to be executed is when nfsvno_namei() has succeeded, but a subsequent error was detected. This patch uses done_namei to indicate this case. Also, nfsvno_relpathbuf() should only be called for this case and not whenever nfsvno_open() is called with nd_repstat != 0. A bug was introduced here when the HASBUF flag was deleted. Reviewed by: mjg PR: 268971 Tested by: ish@amail.plala.or.jp MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D38430
|
#
dcfa3ee4 |
|
12-Jan-2023 |
Rick Macklem <rmacklem@FreeBSD.org> |
nfsserver: Fix vrele() panic in nfsvno_open() Commit 65127e982b94 removed a check for ni_startdir != NULL. This allowed the vrele(ndp->ni_dvp) to be called with a NULL argument. This patch adds a new boolean argument to nfsvno_open() that can be checked instead of ni_startdir, since mjg@ requested that ni_startdir not be used. (Discussed in PR#268828.) PR: 268828 Reviewed by: mjg Differential Revision: https://reviews.freebsd.org/D38032
|
#
6fd6a0e3 |
|
23-Dec-2022 |
Rick Macklem <rmacklem@FreeBSD.org> |
nfsd: Handle file systems without a VOP_VPTOFH() Unlike NFSv3, the NFSv4 server follows mount points within the file system tree below the NFSv4 root directory. If there is a file system mounted within this subtree that returns EOPNOTSUPP for VOP_VPTOFH(), the NFSv4 server would return an error for the mount point entry. This resulted in an "I/O error" report from the Linux NFSv4 client. It also put an error code in the Readdir reply that is not defined in the NFSv4 RFCs. For the FreeBSD NFSv4 client, the entry with the error would be ignored, which I think is reasonable behaviour for a mounted file system that can never be exported. This patch changes the NFSv4 server behaviour to ignore the mount point entry and not send it in the Readdir reply. It also changes the behaviour of Lookup for the entry so that it replies ENOENT for the mount point directory, so that it is consistent with no entry in the Readdir reply. With these two changes, the Linux client behaviour is the same as the FreeBSD client behaviour. It also avoids putting an unknown error on the wire to the client. MFC after: 1 week
|
#
65127e98 |
|
09-Nov-2022 |
Mateusz Guzik <mjg@FreeBSD.org> |
nfs: stop using SAVESTART Only the name is wanted which is already always provided. Reviewed by: rmacklem Tested by: pho, rmacklem Differential Revision: https://reviews.freebsd.org/D34470
|
#
ae781657 |
|
18-Oct-2022 |
Rick Macklem <rmacklem@FreeBSD.org> |
nfsd: Make the pNFS server update Change for Setxattr/Rmxattr When the NFS server does the Setxattr or Rmxattr operation, the Change attribute (va_filerev) needs to be updated. Without this patch, that was not happening for the pNFS server configuration. This patch does a Setattr against the DS file to make the Change attribute change. This bug was discovered during a recent IETF NFSv4 testing event, where the Change attribute wasn't changed in the operation reply. MFC after: 1 month
|
#
8cee2eba |
|
16-Oct-2022 |
Cy Schubert <cy@FreeBSD.org> |
Revert "unbound: Vendor import 1.17.0" This reverts commit 64d318ea98b7c59f5567d47a9a8474887d8b5cb8, reversing changes made to 8063dc03202fad7d6bdf34976bc8556fa3f23fa1. Revert a mismerge which reversed 8063dc03202fad7d6bdf34976bc8556fa3f23fa1.
|
#
8063dc03 |
|
16-Oct-2022 |
Rick Macklem <rmacklem@FreeBSD.org> |
nfsd: Make Setxattr/Removexattr NFSv4.2 ops IO_SYNC When the NFS server does Setxattr or Removexattr, the operations must be done IO_SYNC. If a server crashes/reboots immediately after replying it must have the extended attribute changes. Since UFS does extended attributes asynchronously by default and there is no "ioflag" argument in the VOP calls, follow the VOP calls with VOP_FSYNC(), to ensure the operation has been done synchronously. This was found by inspection while investigating a bug discovered during a recent IETF NFSv4 testing event, where the Change attribute wasn't changed in the operation reply. This bug will take further work for ZFS and the pNFS server configuration, but is now fixed for a non-pNFS UFS exported file system. MFC after: 1 month
|
#
5b5b7e2c |
|
17-Sep-2022 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: always retain path buffer after lookup This removes some of the complexity needed to maintain HASBUF and allows for removing injecting SAVENAME by filesystems. Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D36542
|
#
2b766d5e |
|
08-Jul-2022 |
Rick Macklem <rmacklem@FreeBSD.org> |
nfscl: Change the cred argument to non-NULL for pNFS proxies Commit 326bcf9394c7 added a "cred" argument to nfscl_reqstart(). For the pNFS proxy calls on the server, the argument should be "cred" instead of NULL. This patch fixes this. Since the argument is not yet used, this patch should not result in a semantics change. PR: 260011 MFC after: 2 weeks
|
#
326bcf93 |
|
08-Jul-2022 |
Rick Macklem <rmacklem@FreeBSD.org> |
nfscl: Add a cred argument to nfscl_reqstart() To deal with broken session slots caused by the use of the "soft" and/or "intr" mount options, nfsv4_sequencelookup() will be modified to track the potentially broken session slots. Then, when all session slots are potentially broken, do a DeleteSession operation, so that the NFSv4 server will reply NFSERR_BADSESSION to uses of the session. These changes will be done in future commits. However, to do the DeleteSession RPC, a "cred" argument is needed for nfscl_reqstart(). This patch adds this argument, which is unused at this time. If the argument is NULL, it indicates that DeleteSession should not be done (usually because the RPC does not use sessions). This patch should not cause any semantics change. PR: 260011 MFC after: 2 weeks
|
#
f32bf50d |
|
04-May-2022 |
Rick Macklem <rmacklem@FreeBSD.org> |
nfsd: Fix handling of Open/Create for the pNFS server When the MDS of a pNFS service receives an Open/Create and the file already exists, it must do a Setattr of size == 0. Without this patch, this was erroneously done via a VOP_SETATTR() call, which would set the length of the MDS file to 0 (which it already is, since all data lives on the DSs). This patch fixes the problem by doing a nfsvno_setattr() instead of VOP_SETATTR(), which knows to do a proxied Setattr on the DSs. For a non-pNFS server, the change has no effect, since nfsvno_setattr() only does a VOP_SETATTR() for that case. This was found during a recent IETF NFSv4 testing event. MFC after: 2 weeks
|
#
0134bbe5 |
|
13-Mar-2022 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: prefix lookup and relookup with vfs_ Reviewed by: imp, mckusick Differential Revision: https://reviews.freebsd.org/D34530
|
#
3fc3fe90 |
|
09-Mar-2022 |
Rick Macklem <rmacklem@FreeBSD.org> |
nfsd: Do not exempt NFSv3 Fsinfo from the TLS check The Fsinfo RPC is exempt from the check for Kerberized NFS being required, as recommended by RFC2623. However, there is no reason to exempt Fsinfo from the requirement to use TLS. This patch fixes the code so that the exemption only applies to Kerberized NFS and not NFS-over-TLS. This only affects NFS-over-TLS for an NFSv3 mount when it is required, but the client does not do so. MFC after: 1 month
|
#
a91a5784 |
|
11-Jan-2022 |
Rick Macklem <rmacklem@FreeBSD.org> |
nfsd: Do not accept audit/alarm ACEs for the NFSv4 server The UFS and ZFS file systems only support Allow/Deny ACEs in the NFSv4 ACLs. This patch does not allow the server to parse Audit/Alarm ACEs. The NFSv4 client is still allowed to parse Audit/Alarm ACEs, since non-FreeBSD NFSv4 servers may use them. This patch should not have a significant effect, since the UFS and ZFS file systems will not handle these ACEs anyhow. It simply serves as an additional "safety belt" for the NFSv4 server. MFC after: 2 weeks
|
#
5da9b3b0 |
|
11-Jan-2022 |
Rick Macklem <rmacklem@FreeBSD.org> |
Revert "nfscommon: Add arguments for support of the dacl attribute" This reverts commit 0fa074b53e7c22157dcb41aaa25a33abc8118f26. I now see that the implementation of the "dacl" operation requires the NFSv4 server to do "automatic inheritance" and I do not plan on doing this. As such, this patch is harmless, but unneeded.
|
#
3455c738 |
|
09-Jan-2022 |
Alexander Motin <mav@FreeBSD.org> |
nfsd: Reduce callouts rate. Before this, callouts were scheduled twice a second even if nfsd was never used. This reduces the rate to ~1Hz and only after nfsd is first started. MFC after: 2 weeks
|
#
0fa074b5 |
|
26-Dec-2021 |
Rick Macklem <rmacklem@FreeBSD.org> |
nfscommon: Add arguments for support of the dacl attribute NFSv4.1/4.2 has an alternative to the acl attribute, called dacl, that includes support for the ACL_ENTRY_INHERITED flag, called NFSV4ACE_INHERITED in NFSv4. This patch adds a dacl argument to nfsrv_buildacl(), nfsrv_dissectacl() and nfsrv_dissectace(), so that they will handle NFSV4ACE_INHERITED when dacl == true. Since these functions are always called with dacl == false for this patch, semantics should not have changed. A future patch will add support for dacl. MFC after: 2 weeks
|
#
744c2dc7 |
|
23-Dec-2021 |
Rick Macklem <rmacklem@FreeBSD.org> |
rpc: Delete AUTH_NEEDS_TLS(_MUTUAL_HOST) auth_stat values I thought that these new auth_stat values had been agreed upon by the IETF NFSv4 working group, but that no longer is the case. As such, delete them and use AUTH_TOOWEAK instead. Leave the code that uses these new auth_stat values in the sources #ifdef notnow, in case they are defined in the future. MFC after: 1 week
|
#
b214fcce |
|
13-Dec-2021 |
Alan Somers <asomers@FreeBSD.org> |
Change VOP_READDIR's cookies argument to a **uint64_t The cookies argument is only used by the NFS server. NFSv2 defines the cookie as 32 bits on the wire, but NFSv3 increased it to 64 bits. Our VOP_READDIR, however, has always defined it as u_long, which is 32 bits on some architectures. Change it to 64 bits on all architectures. This doesn't matter for any in-tree file systems, but it matters for some FUSE file systems that use 64-bit directory cookies. PR: 260375 Reviewed by: rmacklem Differential Revision: https://reviews.freebsd.org/D33404
|
#
32fbc5d8 |
|
12-Dec-2021 |
Alan Somers <asomers@FreeBSD.org> |
nfs: don't truncate directory cookies to 32-bits in the NFS server In NFSv2, the directory cookie was 32-bits. NFSv3 widened it to 64-bits and SVN r22521 widened the corresponding argument in VOP_READDIR, but FreeBSD's NFS server continued to treat the cookies as 32-bits, and 0-extended to fill the field on the wire. Nobody ever noticed, because every in-tree file system generates cookies that fit comfortably within 32-bits. Also, have better type safety for txdr_hyper. Turn it into an inline function that type-checks its arguments. Prevents warnings about shift-count-overflow. PR: 260375 MFC after: 2 weeks Reviewed by: rmacklem Differential Revision: https://reviews.freebsd.org/D33404
|
#
638b90a1 |
|
28-Nov-2021 |
Rick Macklem <rmacklem@FreeBSD.org> |
nfs: Quiet a few "unused" warnings For most of these warnings, the variable is loaded with data parsed out of an RPC message. In case the data is useful in the future, I just marked these with __unused.
|
#
7e1d3eef |
|
25-Nov-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: remove the unused thread argument from NDINIT* See b4a58fbf640409a1 ("vfs: remove cn_thread") Bump __FreeBSD_version to 1400043.
|
#
f8dc0630 |
|
08-Nov-2021 |
Rick Macklem <rmacklem@FreeBSD.org> |
nfsd: Fix the NFSv4.2 pNFS MDS server for NFSERR_NOSPC via LayoutError If a pNFS server's DS runs out of disk space, it replies NFSERR_NOSPC to the client doing writing. For the Linux client, it then sends a LayoutError RPC to the MDS server to tell it about the error and keeps retrying, doing repeated LayoutGets to the MDS and Write RPCs to the DS. The Linux client is "stuck" until disk space on the DS is free'd up unless a subsequent LayoutGet request is sent a NFSERR_NOSPC reply. The looping problem still occurs for NFSv4.1 mounts, but no fix for this is known at this time. This patch changes the pNFS MDS server to reply to LayoutGet operations with NFSERR_NOSPC once a LayoutError reports the problem, until the DS has available space. This keeps the Linux NFSv4.2 client from looping. Found during recent testing because of issues w.r.t. a DS being out of space found during a recent IETF NFSv4 working group testing event. MFC after: 2 weeks
|
#
f0c9847a |
|
06-Nov-2021 |
Rick Macklem <rmacklem@FreeBSD.org> |
vfs: Add "ioflag" and "cred" arguments to VOP_ALLOCATE When the NFSv4.2 server does a VOP_ALLOCATE(), it needs the operation to be done for the RPC's credential and not td_ucred. It also needs the writing to be done synchronously. This patch adds "ioflag" and "cred" arguments to VOP_ALLOCATE() and modifies vop_stdallocate() to use these arguments. The VOP_ALLOCATE.9 man page will be patched separately. Reviewed by: khng, kib Differential Revision: https://reviews.freebsd.org/D32865
|
#
b4a58fbf |
|
01-Oct-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: remove cn_thread It is always curthread. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D32453
|
#
93a32050 |
|
02-Oct-2021 |
Rick Macklem <rmacklem@FreeBSD.org> |
nfsd: Fix pNFS handling of Deallocate For a pNFS server configuration, an NFSv4.2 Deallocate operation is proxied to the DS(s). The code that parsed the reply for the proxy RPC is broken and did not process the pre-operation attributes. This patch fixes this problem. This bug would only affect pNFS servers built from recent main/FreeBSD14 sources.
|
#
ef7d2c1f |
|
01-Oct-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
nfs: eliminate thread argument from nfsvno_namei This is a step towards retiring struct componentname cn_thread Reviewed by: rmacklem Differential Revision: https://reviews.freebsd.org/D32267
|
#
13914e51 |
|
29-Aug-2021 |
Rick Macklem <rmacklem@FreeBSD.org> |
nfsd: Make loop calling VOP_ALLOCATE() iterate until done The NFSv4.2 Deallocate operation loops on VOP_DEALLOCATE() while progress is being made (remaining length decreasing). This patch changes the loop on VOP_ALLOCATE() for the NFSv4.2 Allocate operation to do the same, instead of stopping after an arbitrary 20 iterations. MFC after: 2 weeks
|
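The "loop while progress is being made" pattern from 13914e51 can be sketched in userland as below. demo_alloc_step() is a hypothetical stand-in for a VOP call that may complete only part of the requested range per call; it is not the VOP_ALLOCATE() interface, and returning EIO when the loop stalls is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical partial-progress step: handles at most "chunk" bytes,
 * advancing *offp and shrinking *lenp by the amount done.
 */
static int
demo_alloc_step(uint64_t *offp, uint64_t *lenp, uint64_t chunk)
{
	uint64_t n = (*lenp < chunk) ? *lenp : chunk;

	*offp += n;
	*lenp -= n;
	return (0);
}

/*
 * Keep calling the step while it succeeds and the remaining length keeps
 * decreasing, instead of giving up after a fixed number of iterations.
 */
static int
demo_alloc_all(uint64_t off, uint64_t len, uint64_t chunk, unsigned *callsp)
{
	uint64_t olen;
	int error;

	assert(callsp != NULL);
	*callsp = 0;
	do {
		olen = len;
		error = demo_alloc_step(&off, &len, chunk);
		(*callsp)++;
	} while (error == 0 && len > 0 && len < olen);
	/* Stalled with work left over: report failure (assumed error code). */
	if (error == 0 && len > 0)
		error = EIO;
	return (error);
}
```

The termination test compares the remaining length before and after each call, so a request needing more than 20 steps still completes, while a step that makes no progress ends the loop after one call.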
#
bb958dcf |
|
26-Aug-2021 |
Rick Macklem <rmacklem@FreeBSD.org> |
nfsd: Add support for the NFSv4.2 Deallocate operation The recently added VOP_DEALLOCATE(9) VOP call allows implementation of the Deallocate NFSv4.2 operation. Since the Deallocate operation is a single succeed/fail operation, the call to VOP_DEALLOCATE(9) loops so long as progress is being made. It calls maybe_yield() between loop iterations to allow other processes to preempt it. Where RFC 7862 underspecifies behaviour, the code is written to be Linux NFSv4.2 server compatible. Reviewed by: khng Differential Revision: https://reviews.freebsd.org/D31624
|
#
ee29e6f3 |
|
16-Jul-2021 |
Rick Macklem <rmacklem@FreeBSD.org> |
nfsd: Add sysctl to set maximum I/O size up to 1Mbyte Since MAXPHYS now allows the FreeBSD NFS client to do 1Mbyte I/O operations, add a sysctl called vfs.nfsd.srvmaxio so that the maximum NFS server I/O size can be set up to 1Mbyte. The Linux NFS client can also do 1Mbyte I/O operations. The default of 128Kbytes for the maximum I/O size has not been changed for two reasons: - kern.ipc.maxsockbuf must be increased to support 1Mbyte I/O - The limited benchmarking I can do actually shows a drop in I/O rate when the I/O size is above 256Kbytes. However, daveb@spectralogic.com reports seeing an increase in I/O rate for the 1Mbyte I/O size vs 128Kbytes using a Linux client. Reviewed by: asomers MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D30826
|
#
a5df139e |
|
05-Jun-2021 |
Rick Macklem <rmacklem@FreeBSD.org> |
nfsd: Fix when NFSERR_WRONGSEC may be replied to NFSv4 clients Commit d224f05fcfc1 pre-parsed the next operation number for the put file handle operations. This patch uses this next operation number, plus the type of the file handle being set by the put file handle operation, to implement the rules in RFC5661 Sec. 2.6 with respect to replying NFSERR_WRONGSEC. This patch also adds a check to see if NFSERR_WRONGSEC should be replied when about to perform Lookup, Lookupp or Open with a file name component, so that the NFSERR_WRONGSEC reply is done for these operations, as required by RFC5661 Sec. 2.6. This patch does not have any practical effect for the FreeBSD NFSv4 client and I believe that the same is true for the Linux client, since NFSERR_WRONGSEC is considered a fatal error at this time. MFC after: 2 weeks
|
#
4a21bcb2 |
|
24-Jan-2021 |
Konstantin Belousov <kib@FreeBSD.org> |
nfsserver: use VOP_VPUT_PAIR(). Apply VOP_VPUT_PAIR() to the end of vnode operations after the VOP_MKNOD(), VOP_MKDIR(), VOP_LINK(), VOP_SYMLINK(), VOP_CREATE(). Reviewed by: chs, mckusick Tested by: pho MFC after: 2 weeks Sponsored by: The FreeBSD Foundation
|
#
6b3a9a0f |
|
11-Jan-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
Convert remaining cap_rights_init users to cap_rights_init_one semantic patch: @@ expression rights, r; @@ - cap_rights_init(&rights, r) + cap_rights_init_one(&rights, r)
|
#
148a227b |
|
10-Jan-2021 |
Rick Macklem <rmacklem@FreeBSD.org> |
nfsd: add KASSERTs to nfsm_trimtrailing() for M_EXTPG mbufs Add KASSERTS to nfsm_trimtrailing() to confirm the sanity of the arguments for the M_EXTPG case. Suggested by: kib Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D28053
|
#
51a9b978 |
|
01-Jan-2021 |
Konstantin Belousov <kib@FreeBSD.org> |
nfs server: improve use of the VFS KPI In particular, do not assume that vn_start_write() returns the same mp as it was passed in, or never returns error. Also be more accurate to return NULL vp and mp when an error occurred, to catch wrong control flow easier. Stop checking for NULL mp before calling vn_finished_write(), NULL mp is handled transparently by the function. Reviewed by: rmacklem Tested by: pho MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D27881
|
#
774a3685 |
|
01-Jan-2021 |
Rick Macklem <rmacklem@FreeBSD.org> |
nfsd: fix NFS server for ERELOOKUP r367672 modified UFS such that certain VOPs, such as VOP_CREATE() will intermittently return ERELOOKUP. When this happens, the entire system call, or NFS operation in the case of the NFS server, must be redone. This patch adds that support to the NFS server by rolling back the state of the NFS request arguments and NFS reply arguments mbuf lists to the condition they were in before the operation and then redoing the operation. Tested by: pho Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D27875
|
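The roll-back-and-redo approach described in 774a3685 can be sketched in userland as follows. This is only an illustration: demo_req, demo_op and demo_do_op are hypothetical, DEMO_ERELOOKUP is a stand-in value for the kernel's ERELOOKUP, and a saved position in a flat buffer replaces the real rollback of the request and reply mbuf lists.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define	DEMO_ERELOOKUP	(-5)	/* hypothetical stand-in for ERELOOKUP */

/* Hypothetical request state: argument parse position and reply buffer. */
struct demo_req {
	size_t argpos;		/* how far the arguments have been parsed */
	char reply[64];		/* reply built so far */
	size_t replylen;
	int failures_left;	/* times the operation will ask for a redo */
};

/* An operation that consumes arguments and builds reply data before failing. */
static int
demo_op(struct demo_req *rq)
{
	assert(rq->replylen + 4 <= sizeof(rq->reply));
	rq->argpos += 4;
	memcpy(rq->reply + rq->replylen, "part", 4);
	rq->replylen += 4;
	if (rq->failures_left > 0) {
		rq->failures_left--;
		return (DEMO_ERELOOKUP);
	}
	return (0);
}

/* Save state, run the operation, and on a redo request restore and retry. */
static int
demo_do_op(struct demo_req *rq)
{
	size_t save_argpos, save_replylen;
	int error;

	assert(rq != NULL);
	save_argpos = rq->argpos;
	save_replylen = rq->replylen;
	for (;;) {
		error = demo_op(rq);
		if (error != DEMO_ERELOOKUP)
			break;
		/* Roll back to the state before the operation started. */
		rq->argpos = save_argpos;
		rq->replylen = save_replylen;
	}
	return (error);
}
```

Because both positions are restored before each retry, the reply ends up holding exactly one copy of the operation's output no matter how many times the redo was requested.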
#
586ee69f |
|
01-Sep-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
fs: clean up empty lines in .c and .h files
|
#
6e4b6ff8 |
|
27-Aug-2020 |
Rick Macklem <rmacklem@FreeBSD.org> |
Add flags to enable NFS over TLS to the NFS client and server. An Internet Draft titled "Towards Remote Procedure Call Encryption By Default" (soon to be an RFC I think) describes how Sun RPC is to use TLS with NFS as a specific application case. Various commits prepared the NFS code to use KERN_TLS, mainly enabling use of ext_pgs mbufs for large RPC messages. r364475 added TLS support to the kernel RPC. This commit (which is the final one for kernel changes required to do NFS over TLS) adds support for three export flags: MNT_EXTLS - Requires a TLS connection. MNT_EXTLSCERT - Requires a TLS connection where the client presents a valid X.509 certificate during TLS handshake. MNT_EXTLSCERTUSER - Requires a TLS connection where the client presents a valid X.509 certificate with "user@domain" in the otherName field of the SubjectAltName during TLS handshake. Without these export options, clients are permitted, but not required, to use TLS. For the client, a new nmount(2) option called "tls" makes the client do a STARTTLS Null RPC and TLS handshake for all TCP connections used for the mount. The CLSET_TLS client control option is used to indicate to the kernel RPC that this should be done. Unless the above export flags or "tls" option is used, semantics should not change for the NFS client nor server. For NFS over TLS to work, the userspace daemons rpctlscd(8) { for client } or rpctlssd(8) daemon { for server } must be running.
|
#
808306dd |
|
17-Aug-2020 |
Rick Macklem <rmacklem@FreeBSD.org> |
Delete the unused "use_ext" argument to nfscl_reqstart(). This is a partial revert of r363210, since the "use_ext" argument added by that commit is not actually useful. This patch should not result in any semantics change.
|
#
cb889ce6 |
|
31-Jul-2020 |
Rick Macklem <rmacklem@FreeBSD.org> |
Add optional support for ext_pgs mbufs to the NFS server's read, readlink and getxattr operations. This patch optionally enables generation of read, readlink and getxattr replies in ext_pgs mbufs. Since neither of ND_EXTPG or ND_TLS are currently ever set, there is no change in semantics at this time. It also corrects the message in a couple of panic()s that should never occur. This is another in the series of commits that add support to the NFS client and server for building RPC messages in ext_pgs mbufs with anonymous pages. This is useful so that the entire mbuf list does not need to be copied before calling sosend() when NFS over TLS is enabled. Use of ext_pgs mbufs will not be enabled until the kernel RPC is updated to handle TLS.
|
#
ea83d07e |
|
29-Jul-2020 |
Rick Macklem <rmacklem@FreeBSD.org> |
Add support for ext_pgs mbufs to nfsrvd_readdir() and nfsrvd_readdirplus(). This patch adds code that optionally (based on ND_TLS, never set yet) generates readdir replies in ext_pgs mbufs. To trim the list back, a new function that is ext_pgs aware called nfsm_trimtrailing() replaces newnfs_trimtrailing(). newnfs_trimtrailing() is no longer used, but will be removed in a future commit, since its removal does modify the internal kpi between the NFS modules. This is another in the series of commits that add support to the NFS client and server for building RPC messages in ext_pgs mbufs with anonymous pages. This is useful so that the entire mbuf list does not need to be copied before calling sosend() when NFS over TLS is enabled. Use of ext_pgs mbufs will not be enabled until the kernel RPC is updated to handle TLS.
|
#
2de592f6 |
|
26-Jul-2020 |
Rick Macklem <rmacklem@FreeBSD.org> |
Fix the NFS server so that it sets va_birthtime. r362490 marked that the NFSv4 attribute TimeCreate (va_birthtime) is supported, but it did not change the NFS server code to actually do it. As such, errors could occur when unrolling a tarball onto an NFSv4 mounted volume, since setting TimeCreate would fail with a NFSERR_ATTRNOTSUPP reply. This patch fixes the server so that it does TimeCreate and also makes sure that TimeCreate will not be set for a DS file for a pNFS server. A separate commit will add a check to the NFSv4 client for support of the TimeCreate attribute before attempting to set it, to avoid a problem when mounting a server that does not support the attribute. The failures will still occur for r362490 or later kernels that do not have this patch, since they indicate support for the attribute, but do not actually support the attribute.
|
#
18a48314 |
|
25-Jul-2020 |
Rick Macklem <rmacklem@FreeBSD.org> |
Add support for ext_pgs mbufs to nfsrv_adj(). This patch uses a slightly different algorithm for nfsrv_adj() since ext_pgs mbuf lists are not permitted to have m_len == 0 mbufs. As such, the code now frees mbufs after the adjustment in the list instead of setting their m_len field to 0. Since mbuf(s) may be trimmed off the tail of the list, the function now returns a pointer to the last mbuf in the list. This saves the caller from needing to use m_last() to find the last mbuf. It also implies that it might return a NULL list, which required a check for that in nfsrvd_readlink(). This is another in the series of commits that add support to the NFS client and server for building RPC messages in ext_pgs mbufs with anonymous pages. This is useful so that the entire mbuf list does not need to be copied before calling sosend() when NFS over TLS is enabled. Use of ext_pgs mbufs will not be enabled until the kernel RPC is updated to handle TLS.
|
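The trimming approach described in 18a48314 (free the emptied trailing buffers instead of leaving zero-length ones, and hand back the new tail) can be sketched as follows. This is a simplified userland model, not the kernel code: `struct sketch_buf`, `sketch_chain()` and `sketch_trim()` are hypothetical, the function takes the number of bytes to keep rather than the real nfsrv_adj() arguments, and the input chain is assumed to contain no zero-length buffers.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an mbuf: a length and a next pointer. */
struct sketch_buf {
	struct sketch_buf *next;
	int len;
};

/* Build a chain from an array of lengths (helper for illustration). */
static struct sketch_buf *
sketch_chain(const int *lens, int n)
{
	struct sketch_buf *head = NULL, **tailp = &head;

	for (int i = 0; i < n; i++) {
		struct sketch_buf *b = malloc(sizeof(*b));

		if (b == NULL)
			abort();
		b->len = lens[i];
		b->next = NULL;
		*tailp = b;
		tailp = &b->next;
	}
	return (head);
}

/*
 * Keep the first "keep" bytes of the chain. Buffers that would be left
 * empty are freed rather than kept with len == 0. Returns the last
 * remaining buffer, or NULL if nothing remains; *headp is updated.
 */
static struct sketch_buf *
sketch_trim(struct sketch_buf **headp, int keep)
{
	struct sketch_buf *b = *headp, *last = NULL, *next;

	while (b != NULL && keep > 0) {
		if (b->len > keep)
			b->len = keep;	/* shorten the final partial buffer */
		keep -= b->len;
		last = b;
		b = b->next;
	}
	/* b now points at the first buffer that would be empty. */
	if (last != NULL)
		last->next = NULL;
	else
		*headp = NULL;
	while (b != NULL) {
		next = b->next;
		free(b);
		b = next;
	}
	return (last);
}
```

Because the function can return NULL when everything is trimmed, a caller has to check for an empty list, which matches the extra check the commit mentions for nfsrvd_readlink().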
#
4476c1de |
|
25-Jun-2020 |
Rick Macklem <rmacklem@FreeBSD.org> |
Add a boolean argument to nfscl_reqstart() to indicate that ext_pgs mbufs should be used. For KERN_TLS (and possibly some other future network interface) the mbuf list passed into sosend() must be ext_pgs mbufs. The krpc could simply copy all the mbuf data into ext_pgs mbufs before calling sosend(), but that would be inefficient for large RPC messages. This patch adds an argument to nfscl_reqstart() to indicate that it should fill the RPC message into ext_pgs mbufs. It also adds fields to "struct nfsrv_descript" needed for building NFS RPC messages in ext_pgs mbufs, along with new flags for this. Since the argument is always "false", this commit should not result in any semantic change. However, this commit prepares the code for future commits that will add support for building of NFS RPC messages in ext_pgs mbufs.
|
#
1f7104d7 |
|
13-Jun-2020 |
Rick Macklem <rmacklem@FreeBSD.org> |
Fix export_args ex_flags field so that it is 64bits, the same as mnt_flags. Since mnt_flags was upgraded to 64bits there has been a quirk in "struct export_args", since it holds a copy of mnt_flags in ex_flags, which is an "int" (32bits). This happens to currently work, since all the flag bits used in ex_flags are defined in the low order 32bits. However, new export flags cannot be defined. Also, ex_anon is a "struct xucred", which limits it to 16 additional groups. This patch revises "struct export_args" to make ex_flags 64bits and replaces ex_anon with ex_uid, ex_ngroups and ex_groups (which points to a groups list, so it can be malloc'd up to NGROUPS in size). This requires that the VFS_CHECKEXP() arguments change, so I also modified the last "secflavors" argument to be an array pointer, so that the secflavors could be copied in VFS_CHECKEXP() while the export entry is locked. (Without this patch VFS_CHECKEXP() returns a pointer to the secflavors array and then it is used after being unlocked, which is potentially a problem if the exports entry is changed. In practice this does not occur when mountd is run with "-S", but I think it is worth fixing.) This patch also deleted the vfs_oexport_conv() function, since do_mount_update() does the conversion, as required by the old vfs_cmount() calls. Reviewed by: kib, freqlabs Relnotes: yes Differential Revision: https://reviews.freebsd.org/D25088
|
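The 32-bit quirk described in 1f7104d7 can be shown with a small sketch. The flag values and structure names below are hypothetical (they are not the real MNT_* definitions), and an unsigned 32-bit field stands in for the old "int" so that the truncation is well defined in portable C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag defined above the low-order 32 bits. */
#define SKETCH_EXFLAG_HIGH	(UINT64_C(1) << 40)
/* Hypothetical flag within the low-order 32 bits. */
#define SKETCH_EXFLAG_LOW	(UINT64_C(1) << 7)

/* Old layout: a 32-bit field holding a copy of the 64-bit flags. */
struct sketch_oexport {
	uint32_t ex_flags;
};

/* New layout: the field is as wide as the flags it copies. */
struct sketch_export {
	uint64_t ex_flags;
};

/* Returns nonzero if "flag" survives a copy into the old layout. */
static int
sketch_old_has(uint64_t mnt_flags, uint64_t flag)
{
	struct sketch_oexport e;

	e.ex_flags = (uint32_t)mnt_flags;	/* high bits are dropped here */
	return ((e.ex_flags & flag) != 0);
}

/* Returns nonzero if "flag" survives a copy into the new layout. */
static int
sketch_new_has(uint64_t mnt_flags, uint64_t flag)
{
	struct sketch_export e;

	e.ex_flags = mnt_flags;
	return ((e.ex_flags & flag) != 0);
}
```

Flags in the low 32 bits behave the same in both layouts, which is why the old structure "happened to work"; only a flag above bit 31 exposes the difference.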
#
245bfd34 |
|
20-May-2020 |
Ryan Moeller <freqlabs@FreeBSD.org> |
Deduplicate fsid comparisons Comparing fsid_t objects requires internal knowledge of the fsid structure and yet this is duplicated across a number of places in the code. Simplify by creating a fsidcmp function (macro). Reviewed by: mjg, rmacklem Approved by: mav (mentor) MFC after: 1 week Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D24749
|
#
3d7650f0 |
|
17-May-2020 |
Rick Macklem <rmacklem@FreeBSD.org> |
Add a function nfsm_set() to initialize "struct nfsrv_descript" for building mbuf lists. This function is currently trivial, but that will change when support for building NFS messages in ext_pgs mbufs is added. Adding support for ext_pgs mbufs is needed for KERN_TLS, which will be used to implement nfs-over-tls.
|
#
e4a458bb |
|
24-Apr-2020 |
Rick Macklem <rmacklem@FreeBSD.org> |
Remove Mac OS/X macros that did nothing for FreeBSD. The macros CAST_USER_ADDR_T() and CAST_DOWN() were used for the Mac OS/X port. The first of these macros was a no-op for FreeBSD and the second is no longer used. This patch gets rid of them. It also deletes the "mbuf_t" typedef which is no longer used in the FreeBSD code from nfskpiport.h This patch should not change semantics.
|
#
82164bdd |
|
16-Apr-2020 |
Rick Macklem <rmacklem@FreeBSD.org> |
Add a sanity check for nes_numsecflavor to the NFS server. Ryan Moeller reported crashes in the NFS server that appear to be caused by stack corruption in nfsrv_compound(). It appears that the stack got corrupted just after a NFSv4.1 Lookup that crosses a server mount point. Although it is just a "theory" at this point, the most obvious way the stack could get corrupted would be if nfsvno_checkexp() somehow acquires an export with a bogus nes_numsecflavor value. This would cause the copying of the secflavors to run off the end of the array, which is allocated on the stack below where the corruption occurs. This sanity check is simple to do and would stop the stack corruption if the theory is correct. Otherwise, doing the sanity check seems to be a reasonable safety belt to add to the code. Reported by: freqlabs MFC after: 2 weeks
|
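The kind of safety belt described in 82164bdd can be sketched as a bounds check in front of a copy into a fixed-size array. The names and the array size here are hypothetical; the real check lives in the NFS server's export handling.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical size of the destination array on the caller's stack. */
#define SKETCH_MAXSECFLAVORS	5

/*
 * Copy n security flavors into a fixed-size array only when n is in
 * range. Returns the number copied, or -1 if the count is bogus.
 */
static int
sketch_copy_flavors(int *dst, const int *src, int n)
{
	if (n < 0 || n > SKETCH_MAXSECFLAVORS)
		return (-1);	/* refuse to run off the end of dst */
	memcpy(dst, src, (size_t)n * sizeof(*dst));
	return (n);
}
```

With the check in place, a corrupted count results in an error return instead of a write past the end of the stack array.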
#
fb8ed4c5 |
|
14-Apr-2020 |
Rick Macklem <rmacklem@FreeBSD.org> |
Fix the NFSv4.2 extended attribute support to handle 0 length attributes. I did not realize that zero length attributes are allowed, but they are. This patch fixes the NFSv4.2 client and server to handle zero length extended attributes correctly. Submitted by: Frank van der Linden <fllinden@amazon.com> (earlier version) Reported by: Frank van der Linden <fllinden@amazon.com>
|
#
9f6624d3 |
|
11-Apr-2020 |
Rick Macklem <rmacklem@FreeBSD.org> |
Replace mbuf macros with the code they would generate in the NFS code. When the code was ported to Mac OS/X, mbuf handling functions were converted to using the Mac OS/X accessor functions. For FreeBSD, they are a simple set of macros in sys/fs/nfs/nfskpiport.h. Since porting to Mac OS/X is no longer a consideration, replacement of these macros with the code generated by them makes the code more readable. When support for external page mbufs is added as needed by the KERN_TLS, the patch becomes simpler if done without the macros. This patch should not result in any semantic change.
|
#
8de97f39 |
|
09-Apr-2020 |
Rick Macklem <rmacklem@FreeBSD.org> |
Remove the old NFS lock device driver that uses Giant. This NFS lock device driver was replaced by the kernel NLM around FreeBSD7 and has not normally been used since then. To use it, the kernel had to be built without "options NFSLOCKD" and the nfslockd.ko had to be deleted as well. Since it uses Giant and is no longer used, this patch removes it. With this device driver removed, there is now a lot of unused code in the userland rpc.lockd. That will be removed on a future commit. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D22933
|
#
7029da5c |
|
26-Feb-2020 |
Pawel Biernacki <kaktus@FreeBSD.org> |
Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many) r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes. This is a non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags. Mark all obvious cases as MPSAFE. All entries that haven't been marked as MPSAFE before are by default marked as NEEDGIANT. Approved by: kib (mentor, blanket) Commented by: kib, gallatin, melifaro Differential Revision: https://reviews.freebsd.org/D23718
|
#
7493134e |
|
14-Jan-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
nfs: add missing CTLFLAG_MPSAFE annotations
|
#
cc3593fb |
|
12-Jan-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: rework vnode list management The current notion of an active vnode is eliminated. Vnodes transition between 0<->1 hold counts all the time and the associated traversal between different lists induces significant scalability problems in certain workloads. Introduce a global list containing all allocated vnodes. They get unlinked only when UMA reclaims memory and are only requeued when hold count reaches 0. Sample result from an incremental make -s -j 104 bzImage on tmpfs: stock: 118.55s user 3649.73s system 7479% cpu 50.382 total patched: 122.38s user 1780.45s system 6242% cpu 30.480 total Reviewed by: jeff Tested by: pho (in a larger patch, previous version) Differential Revision: https://reviews.freebsd.org/D22997
|
#
57083d25 |
|
12-Jan-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: add per-mount vnode lazy list and use it for deferred inactive + msync This obviates the need to scan the entire active list looking for vnodes of interest. msync is handled by adding all vnodes with write count to the lazy list. deferred inactive directly adds vnodes as it sets the VI_DEFINACT flag. Vnodes get dequeued from the list when their hold count reaches 0. Newly added MNT_VNODE_FOREACH_LAZY* macros support filtering so that spurious locking is avoided in the common case. Reviewed by: jeff Tested by: pho (in a larger patch, previous version) Differential Revision: https://reviews.freebsd.org/D22995
|
#
b249ce48 |
|
03-Jan-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: drop the mostly unused flags argument from VOP_UNLOCK Filesystems which want to use it in limited capacity can employ the VOP_UNLOCK_FLAGS macro. Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D21427
|
#
f808cf72 |
|
13-Dec-2019 |
Rick Macklem <rmacklem@FreeBSD.org> |
Silence some "might not be initialized" warnings for riscv64. None of these cases were actually using the variable(s) uninitialized, but I figured that silencing the warnings via initializing them made sense. Some of these predated r355677.
|
#
bf6ac05a |
|
12-Dec-2019 |
Rick Macklem <rmacklem@FreeBSD.org> |
Add some more initializations to quiet riscv build. The one case in nfs_copy_file_range() was a legitimate case, although it would probably never occur in practice.
|
#
95bf2e52 |
|
12-Dec-2019 |
Rick Macklem <rmacklem@FreeBSD.org> |
Fix the build for MAC not defined and a couple of might not be initialized. r355677 broke the build for the not MAC defined case and a couple of might not be initialized warnings were generated for riscv. Others seem to be erroneous. Hopefully there won't be too many more build errors. Pointy hat goes on me.
|
#
c057a378 |
|
12-Dec-2019 |
Rick Macklem <rmacklem@FreeBSD.org> |
Add support for NFSv4.2 to the NFS client and server. This patch adds support for NFSv4.2 (RFC-7862) and Extended Attributes (RFC-8276) to the NFS client and server. NFSv4.2 is comprised of several optional features that can be supported in addition to NFSv4.1. This patch adds the following optional features: - posix_fadvise(POSIX_FADV_WILLNEED/POSIX_FADV_DONTNEED) - posix_fallocate() - intra server file range copying via the copy_file_range(2) syscall --> Avoiding data transfer over the wire to/from the NFS client. - lseek(SEEK_DATA/SEEK_HOLE) - Extended attribute syscalls for "user" namespace attributes as defined by RFC-8276. Although this patch is fairly large, it should not affect support for the other versions of NFS. However it does add two new sysctls that allow a sysadmin to limit which minor versions of NFSv4 a server supports, allowing a sysadmin to disable NFSv4.2. Unfortunately, when the NFS stats structure was last revised, it was assumed that there would be no additional operations added beyond what was specified in RFC-7862. However RFC-8276 did add additional operations, forcing the NFS stats structure to be revised again. It now has extra unused entries in all arrays, so that future extensions to NFSv4.2 can be accommodated without revising this structure again. A future commit will update nfsstat(1) to report counts for the new NFSv4.2 specific operations/procedures. This patch affects the internal interface between the nfscommon, nfscl and nfsd modules and, as such, they all must be upgraded simultaneously. I will do a version bump (although arguably not needed), due to this. This code has survived a "make universe" but has not been built with a recent GCC. If you encounter build problems, please email me. Relnotes: yes
|
#
abd80ddb |
|
08-Dec-2019 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: introduce v_irflag and make v_type smaller The current vnode layout is not smp-friendly by having frequently read data avoidably sharing cachelines with very frequently modified fields. In particular v_iflag inspected for VI_DOOMED can be found in the same line with v_usecount. Instead make it available in the same cacheline as the v_op, v_data and v_type which all get read all the time. v_type is avoidably 4 bytes while the necessary data will easily fit in 1. Shrinking it frees up 3 bytes, 2 of which get used here to introduce a new flag field with a new value: VIRF_DOOMED. Reviewed by: kib, jeff Differential Revision: https://reviews.freebsd.org/D22715
|
#
14eff785 |
|
21-Nov-2019 |
Rick Macklem <rmacklem@FreeBSD.org> |
Fix the pNFS server's reporting of SpaceUsed (va_bytes). The pNFS server currently reports SpaceUsed (va_bytes) for the metadata file. This is not correct, since the metadata file is always empty and, as such, va_bytes is just the allocation for the empty file. This patch adds va_bytes to the list of attributes acquired from the DS for a file, so that it includes the allocated data size and is updated when the file is written. For files created on a pNFS server before this patch is applied, the va_bytes value is estimated by rounding va_size up to a multiple of BLKDEV_IOSIZE. Once the file is written after this patch has been applied to the metadata server, the va_bytes returned for the file will be correct. This patch only affects a pNFS metadata server. Found during testing of the NFSv4.2 pNFS server for the Allocate operation. (Not yet in head/current.) MFC after: 2 weeks
|
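The fallback estimate mentioned in 14eff785 (rounding va_size up to a multiple of BLKDEV_IOSIZE) is plain round-up arithmetic. In the sketch below the block size is passed as a parameter, since the value of BLKDEV_IOSIZE is not given in the commit message; `sketch_estimate_bytes()` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round "size" up to the next multiple of "blksize" (blksize > 0).
 * This mirrors the estimate described for files created before the fix.
 */
static uint64_t
sketch_estimate_bytes(uint64_t size, uint64_t blksize)
{
	return (((size + blksize - 1) / blksize) * blksize);
}
```

With an assumed block size of 4096, a 1-byte file is estimated at 4096 bytes and a 4097-byte file at 8192 bytes, while an empty file stays at 0.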
#
67d0e293 |
|
29-Oct-2019 |
Jeff Roberson <jeff@FreeBSD.org> |
Replace OBJ_MIGHTBEDIRTY with a system using atomics. Remove the TMPFS_DIRTY flag and use the same system. This enables further fault locking improvements by allowing more faults to proceed with a shared lock. Reviewed by: kib Tested by: pho Differential Revision: https://reviews.freebsd.org/D22116
|
#
4ce21f37 |
|
05-Sep-2019 |
Rick Macklem <rmacklem@FreeBSD.org> |
Delete the unused "nd" argument for nfsrv_proxyds(). The "nd" argument for nfsrv_proxyds() is no longer used by the function. This patch deletes it. This allows a subsequent patch to delete the "nd" argument from nfsvno_getattr(), since its only use of "nd" was to pass it to nfsrv_proxyds(). Getting rid of the "nd" argument from nfsvno_getattr() avoids confusion over why it might need "nd". This patch is trivial and does not have any semantic effect.
|
#
2e670777 |
|
04-Sep-2019 |
Rick Macklem <rmacklem@FreeBSD.org> |
Delete the unused "nd" argument for nfsrv_checkdsattr(). The "nd" argument for nfsrv_checkdsattr() is no longer used by the function. This patch deletes it. This allows subsequent patches to delete the "nd" argument from nfsrv_proxyds(), since its only use of "nd" was to pass it to nfsrv_checkdsattr(). The same will then be true for nfsvno_getattr(), which passes "nd" to nfsrv_proxyds(). Getting rid of the "nd" argument from nfsvno_getattr() avoids confusion over why it might need "nd". This patch is trivial and does not have any semantic effect. Found by inspection while working on the NFSv4.2 server.
|
#
b4372164 |
|
19-Apr-2019 |
Rick Macklem <rmacklem@FreeBSD.org> |
Add support for the ModeSetMasked attribute to the NFSv4.1 server. I do not know of an extant NFSv4.1 client that currently does a Setattr operation for the ModeSetMasked, but it has been discussed on the linux-nfs mailing list. This patch adds support for doing a Setattr of ModeSetMasked, so that it will work for any future NFSv4.1 client that chooses to do so. Tested via a hacked FreeBSD NFSv4.1 client. MFC after: 2 weeks
|
#
ea5776ec |
|
18-Apr-2019 |
Rick Macklem <rmacklem@FreeBSD.org> |
Fix the NFSv4.0 server so that it does not support NFSv4.1 attributes. During inspection of a packet trace, I noticed that an NFSv4.0 mount reported that it supported attributes that are only defined for NFSv4.1. In practice, this bug appears to be benign, since NFSv4.0 clients will not use attributes that were added for NFSv4.1. However, this was not correct and this patch fixes the NFSv4.0 server so that it only supports attributes defined for NFSv4.0. It also adds a definition for NFSv4.1 attributes that can only be set, although it is only defined as 0 for now. This is in anticipation of the addition of support for the NFSv4.1 mode+mask attribute soon. MFC after: 2 weeks
|
#
2df8bd90 |
|
12-Mar-2019 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Drop unused 'p' argument to nfsv4_strtogid(). MFC after: 2 weeks Sponsored by: DARPA, AFRL
|
#
0658ac39 |
|
12-Mar-2019 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Drop unused 'p' argument to nfsv4_strtouid(). MFC after: 2 weeks Sponsored by: DARPA, AFRL
|
#
01c27978 |
|
04-Mar-2019 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Don't pass td to nfsvno_open(). MFC after: 2 weeks Sponsored by: DARPA, AFRL
|
#
127152fe |
|
04-Mar-2019 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Don't pass td to nfsvno_createsub(). MFC after: 2 weeks Sponsored by: DARPA, AFRL
|
#
5edc9102 |
|
04-Mar-2019 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Don't pass td to nfsd_fhtovp(), it's unused. Reviewed by: rmacklem (earlier version) MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D19421
|
#
af444b18 |
|
04-Mar-2019 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Push down the thread argument in NFS server code, using curthread instead of passing it explicitly. No functional changes Reviewed by: rmacklem (earlier version) MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D19419
|
#
d9463dd4 |
|
21-Jan-2019 |
Mark Johnston <markj@FreeBSD.org> |
nfs: Zero the buffers exported by NFSSVC_DUMPCLIENTS and DUMPLOCKS. Note that these interfaces are available only to root. admbugs: 765 Reported by: Vlad Tsyrklevich <vlad@tsyrklevich.net> Reviewed by: rmacklem MFC after: 1 day Security: Kernel memory disclosure Sponsored by: The FreeBSD Foundation
|
#
cc426dd3 |
|
11-Dec-2018 |
Mateusz Guzik <mjg@FreeBSD.org> |
Remove unused argument to priv_check_cred. Patch mostly generated with coccinelle: @@ expression E1,E2; @@ - priv_check_cred(E1,E2,0) + priv_check_cred(E1,E2) Sponsored by: The FreeBSD Foundation
|
#
75772b69 |
|
19-Nov-2018 |
Rick Macklem <rmacklem@FreeBSD.org> |
Improve sanity checking for the dircount hint argument to NFSv3's ReaddirPlus and NFSv4's Readdir operations. The code checked for a zero argument, but did not check for a very large value. This patch clips dircount at the server's maximum data size. MFC after: 1 week
|
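The clipping described in 75772b69 amounts to bounding a client-supplied hint by a server-side maximum. A minimal sketch, assuming a hypothetical helper name and leaving out the pre-existing zero check, whose exact handling the commit message does not spell out:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Clip a client-supplied dircount hint at the server's maximum data
 * size. Handling of a zero hint is deliberately not modelled here.
 */
static uint32_t
sketch_clip_dircount(uint32_t dircount, uint32_t maxdata)
{
	if (dircount > maxdata)
		return (maxdata);
	return (dircount);
}
```

Values at or below the maximum pass through unchanged, so well-behaved clients see no difference.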
#
ca8f3d1c |
|
22-Oct-2018 |
Andriy Gapon <avg@FreeBSD.org> |
nfsrvd_readdirplus: for some errors, do not fail the entire request Instead, a failing entry is skipped. This change consists of two logical changes. A failure to vget or lookup an entry is considered to be a result of a concurrent removal, which is the only reasonable explanation given that the filesystem is busied. So, the entry would be silently skipped. In the case of a failure to get attributes of an entry for an NFSv3 request, the entry would be silently skipped. There can be legitimate reasons for the failure, but NFSv3 does not provide any means to report the error, so we have two options: either fail the whole request or ignore the failed entry. Traditionally, the old NFS server used the latter option, so the code is reverted to it. Making the whole directory unreadable because of a single entry seems to be impractical. Additionally, some bits of code are slightly re-arranged to account for the new control flow and to honor style(9). Reviewed by: rmacklem Sponsored by: Panzura Differential Revision: https://reviews.freebsd.org/D15424
|
#
910ccc77 |
|
08-Oct-2018 |
Rick Macklem <rmacklem@FreeBSD.org> |
Fix the pNFS server's reporting of disk space usage for the "#<path>" case. The pNFS server would report the total disk space used and free for all of the DSs, even when certain DSs are assigned to the file system via the "#<path>" suffix used in the "nfsd -p" option argument. This patch fixes this case. It only reports usage for the file system that the argument vnode resides on. This is consistent with the non-pNFS NFSv4 server. In NFSv4 it is possible to have subtrees on other file systems, but these are not included in the usage information for NFSv4. Approved by: re (gjb)
|
#
3e5ba2e1 |
|
17-Aug-2018 |
Rick Macklem <rmacklem@FreeBSD.org> |
Fix LORs between vn_start_write() and vn_lock() in the pNFS server. When coding the pNFS server, I added several vn_start_write() calls done while the vnode was locked, not realizing I had introduced LORs and possible deadlock when an exported file system on the MDS is suspended. This patch fixes this by removing the added vn_start_write() calls and modifying the code so that the extant vn_start_write() call before the NFS RPC/operation is done when needed by the pNFS server. Flags are changed so that LayoutCommit and LayoutReturn now get a vn_start_write() done for them. When the pNFS server is enabled, the code now also changes the flags for Getattr, so that the vn_start_write() is done for Getattr, since it may need to do a vn_set_extattr(). The nfs_writerpc flag array was made global to the NFS server and renamed nfsrv_writerpc, which is consistent naming for globals in the NFS server. Thanks go to kib@ for reporting that doing vn_start_write() while the vnode is locked results in a LOR. This patch only affects the behaviour of the pNFS server.
|
#
9fbb0faf |
|
16-Aug-2018 |
Rick Macklem <rmacklem@FreeBSD.org> |
Don't set a file's size for the MDS file of a pNFS service. When a pNFS service is running, the size of the files created on the MDS are normally 0, since the data is written to the data files on the DS(s). However, without this patch, if a Setattr with a non-zero size was done by a client, the MDS file was set to that size. This was thought to be benign, but it turns out that files with a non-zero size plus extended attributes can cause a "ffs_truncate3" panic in UFS. Although the exact cause of this panic() has not been isolated, this patch avoids the panic() and leaves the MDS files in a consistent state of always having a size == 0. Note that these MDS files never store data. The patch also includes an unnecessary initialization of savsize in case some compiler or static analyser complains it might not be initialized. This patch only affects the NFS server when pNFS is enabled via the "-p" command line option on nfsd.
|
#
25705dd5 |
|
05-Aug-2018 |
Rick Macklem <rmacklem@FreeBSD.org> |
Copy all bits of a file handle in case there is padding in the structure. At least on x86, fhandle_t is a packed structure, so I believe an assignment will copy all the bits. However, for some current/future architectures, there might be padding in the structure that doesn't get copied via an assignment. Since NFS assumes a file handle is an opaque blob of bits that can be compared via memcmp()/bcmp(), all the bits including any padding must be copied. This patch replaces the assignments with a call to a byte copy function. Spotted during code inspection.
|
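The point made in 25705dd5 is that a handle compared with memcmp()/bcmp() must be copied byte for byte, padding included. A small sketch of that idea, using a hypothetical structure (not the real fhandle_t) whose layout will normally contain padding between its two members:

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical handle layout; most ABIs insert padding between
 * "kind" and "id".
 */
struct sketch_fh {
	char kind;
	long id;
};

/* Copy every byte of the handle, including any padding. */
static void
sketch_fh_copy(struct sketch_fh *dst, const struct sketch_fh *src)
{
	memcpy(dst, src, sizeof(*dst));
}

/* Treat the handle as an opaque blob of bytes when comparing. */
static int
sketch_fh_same(const struct sketch_fh *a, const struct sketch_fh *b)
{
	return (memcmp(a, b, sizeof(*a)) == 0);
}
```

A plain structure assignment is allowed to leave padding bytes in the destination unspecified, so only the byte copy guarantees that the later byte-wise comparison succeeds.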
#
8014c971 |
|
29-Jul-2018 |
Rick Macklem <rmacklem@FreeBSD.org> |
Silence newer gcc warnings. Newer versions of gcc generate "set, but not used" warnings in the NFS server. Add __unused macros to silence these warnings. Requested by: mmacy
|
#
8361de25 |
|
11-Jul-2018 |
Rick Macklem <rmacklem@FreeBSD.org> |
Ignore the cookie verifier for NFSv4.1 when the cookie is 0. RFC5661 states that the cookie verifier should be 0 when the cookie is 0. However, the wording is somewhat unclear and a recent discussion on the nfsv4@ietf.org mailing list indicated that the NFSv4 server should ignore the cookie verifier's value when the directory offset cookie is 0. This patch deletes the check for this that would return NFSERR_BAD_COOKIE when the verifier was not 0. This was found during testing of the ESXi client against the NFSv4.1 server. Reported by: daniel@ftml.net (via packet trace) MFC after: 2 weeks
|
#
de9a1a70 |
|
09-Jul-2018 |
Rick Macklem <rmacklem@FreeBSD.org> |
Add support for a "forced" pnfsdskill to the pNFS server kernel code. The pnfsdskill(8) command will normally fail if there is no valid mirror for the DS to be disabled. However, a system administrator may need to disable a DS which does not have a valid mirror so that the nfsd threads can be terminated. This patch adds the kernel code needed by pnfsdskill(8) to implement this "forced" case of disabling a DS. This patch only affects the pNFS server.
|
#
ed66a76b |
|
07-Jul-2018 |
Rick Macklem <rmacklem@FreeBSD.org> |
Fix handling of the hybrid DS case for a pNFS server. After the addition of the "#mds_path" suffix for a DS specification on the "-p" nfsd option, it is possible to have a mix of DSs assigned to an MDS file system and DSs that store files for all DSs. This is what I referred to as "hybrid" above. At first, I didn't think this hybrid case would be useful, but I now believe that some system administrators may find it useful. This patch modifies the file storage assignment algorithm so that it makes the "#mds_path" DSs take priority and the all file systems DSs are now only used for MDS file systems with no "#mds_path" DS servers. This only affects the pNFS server for this "hybrid" case.
|
#
2f32675c |
|
02-Jul-2018 |
Rick Macklem <rmacklem@FreeBSD.org> |
Add an optional feature to the pNFS server. Without this patch, the pNFS server distributes the data storage files across all of the specified DSs. A tester noted that it would be nice if a system administrator could control which DSs are used to store the file data for a given exported MDS file system. This patch adds the kernel support to do this. It also makes a slight semantic change to nfsv4_findmirror(), since some uses of it no longer require that the DS being searched for have a current mirror. A patch that will be committed in a few minutes will modify the nfsd daemon to support this feature. The patch should only affect sites using the pNFS server (specified via the "-p" command line option for nfsd). Suggested by: james.rose@framestore.com
|
#
1aabf3fd |
|
28-Jun-2018 |
Rick Macklem <rmacklem@FreeBSD.org> |
Fix the pNFS server for a case where mirror level equals number of DSs. If a pNFS service was set up where the number of DSs equals the mirror level and then a DS was disabled, the service would create files with duplicate entries for the same DS. This bug occurred because I didn't realize that TAILQ_FOREACH_FROM() would start at the beginning of the list when the initial value of the variable was NULL. This patch also changes the pNFS server DS file creation code so that it creates entrie(s) with 0.0.0.0 IP address when it cannot create mirror level files due to lack of DSs. The patch only affects the pNFS service and only when it was created with a number of DSs equal to the mirror level and mirroring is enabled.
|
#
90d2dfab |
|
12-Jun-2018 |
Rick Macklem <rmacklem@FreeBSD.org> |
Merge the pNFS server code from projects/pnfs-planb-server into head. This code merge adds a pNFS service to the NFSv4.1 server. Although it is a large commit it should not affect behaviour for a non-pNFS NFS server. Some documentation on how this works can be found at: http://people.freebsd.org/~rmacklem/pnfs-planb-setup.txt and will hopefully be turned into a proper document soon. This is a merge of the kernel code. Userland and man page changes will come soon, once the dust settles on this merge. It has passed a "make universe", so I hope it will not cause build problems. It also adds NFSv4.1 server support for the "current stateid". Here is a brief overview of the pNFS service: A pNFS service separates the Read/Write operations from all the other NFSv4.1 Metadata operations. It is hoped that this separation allows a pNFS service to be configured that exceeds the limits of a single NFS server for either storage capacity and/or I/O bandwidth. It is possible to configure mirroring within the data servers (DSs) so that the data storage file for an MDS file will be mirrored on two or more of the DSs. When this is used, failure of a DS will not stop the pNFS service and a failed DS can be recovered once repaired while the pNFS service continues to operate. Although two way mirroring would be the norm, it is possible to set a mirroring level of up to four or the number of DSs, whichever is less. The Metadata server will always be a single point of failure, just as a single NFS server is. A Plan B pNFS service consists of a single MetaData Server (MDS) and K Data Servers (DS), all of which are recent FreeBSD systems. Clients will mount the MDS as they would a single NFS server. When files are created, the MDS creates a file tree identical to what a single NFS server creates, except that all the regular (VREG) files will be empty. As such, if you look at the exported tree on the MDS directly on the MDS server (not via an NFS mount), the files will all be of size 0. 
Each of these files will also have two extended attributes in the system attribute name space: pnfsd.dsfile - This extended attribute stores the information that the MDS needs to find the data storage file(s) on DS(s) for this file. pnfsd.dsattr - This extended attribute stores the Size, AccessTime, ModifyTime and Change attributes for the file, so that the MDS doesn't need to acquire the attributes from the DS for every Getattr operation. For each regular (VREG) file, the MDS creates a data storage file on one (or more if mirroring is enabled) of the DSs in one of the "dsNN" subdirectories. The name of this file is the file handle of the file on the MDS in hexadecimal so that the name is unique. The DSs use subdirectories named "ds0" to "dsN" so that no one directory gets too large. The value of "N" is set via the sysctl vfs.nfsd.dsdirsize on the MDS, with the default being 20. For production servers that will store a lot of files, this value should probably be much larger. It can be increased when the "nfsd" daemon is not running on the MDS, once the "dsK" directories are created. For pNFS aware NFSv4.1 clients, the FreeBSD server will return two pieces of information to the client that allows it to do I/O directly to the DS. DeviceInfo - This is relatively static information that defines what a DS is. The critical bits of information returned by the FreeBSD server are the IP address of the DS and, for the Flexible File layout, that NFSv4.1 is to be used and that it is "tightly coupled". There is a "deviceid" which identifies the DeviceInfo. Layout - This is per file and can be recalled by the server when it is no longer valid. For the FreeBSD server, there is support for two types of layout, called File and Flexible File layout. Both allow the client to do I/O on the DS via NFSv4.1 I/O operations. 
The Flexible File layout is a more recent variant that allows specification of mirrors, where the client is expected to do writes to all mirrors to maintain them in a consistent state. The Flexible File layout also allows the client to report I/O errors for a DS back to the MDS. The Flexible File layout supports two variants referred to as "tightly coupled" vs "loosely coupled". The FreeBSD server always uses the "tightly coupled" variant where the client uses the same credentials to do I/O on the DS as it would on the MDS. For the "loosely coupled" variant, the layout specifies a synthetic user/group that the client uses to do I/O on the DS. The FreeBSD server does not do striping and always returns layouts for the entire file. The critical information in a layout is Read vs Read/Write and DeviceID(s) that identify which DS(s) the data is stored on. At this time, the MDS generates File Layout layouts to NFSv4.1 clients that know how to do pNFS for the non-mirrored DS case unless the sysctl vfs.nfsd.default_flexfile is set non-zero, in which case Flexible File layouts are generated. The mirrored DS configuration always generates Flexible File layouts. For NFS clients that do not support NFSv4.1 pNFS, all I/O operations are done against the MDS which acts as a proxy for the appropriate DS(s). When the MDS receives an I/O RPC, it will do the RPC on the DS as a proxy. If the DS is on the same machine, the MDS/DS will do the RPC on the DS as a proxy and so on, until the machine runs out of some resource, such as session slots or mbufs. As such, DSs must be separate systems from the MDS. Tested by: james.rose@framestore.com Relnotes: yes
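The data storage file naming described above (the MDS file handle in hexadecimal, placed in a "dsNN" subdirectory) can be sketched as follows. This is an illustration under assumptions, not the server's code: fh_to_dsname() and ds_path() are hypothetical helpers, lowercase hex digits are an assumption, and the subdirectory index is taken as a parameter because the message does not say how the server picks it.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hex-encode the file handle bytes into "name".
 * Returns the number of characters written, or -1 if "name" is too small.
 * Hypothetical helper; lowercase digits are an assumption.
 */
static int
fh_to_dsname(const unsigned char *fh, size_t fhlen, char *name, size_t namelen)
{
	static const char hexdigits[] = "0123456789abcdef";
	size_t i;

	if (namelen < 2 * fhlen + 1)
		return (-1);
	for (i = 0; i < fhlen; i++) {
		name[2 * i] = hexdigits[fh[i] >> 4];
		name[2 * i + 1] = hexdigits[fh[i] & 0x0f];
	}
	name[2 * fhlen] = '\0';
	return ((int)(2 * fhlen));
}

/*
 * Build "ds<dsdir>/<name>". The caller supplies the subdirectory
 * index, since the selection rule is not given in the text.
 * Returns the path length, or -1 if "path" is too small.
 */
static int
ds_path(unsigned int dsdir, const char *name, char *path, size_t pathlen)
{
	int n = snprintf(path, pathlen, "ds%u/%s", dsdir, name);

	if (n < 0 || (size_t)n >= pathlen)
		return (-1);
	return (n);
}
```

Because the name is derived from the file handle, two distinct MDS files never map to the same data storage file name, which is the uniqueness property the message relies on.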
|
#
8472f760 |
|
04-Jun-2018 |
Rick Macklem <rmacklem@FreeBSD.org> |
Revert r334586 since I now think __unused is the better way to handle this.
|
#
12c7a494 |
|
03-Jun-2018 |
Rick Macklem <rmacklem@FreeBSD.org> |
Fix a gcc8 warning about a write only variable. gcc8 warns that "verf" was set but not used. This was because the code that uses it is disabled via a "#if 0". This patch adds a "#if 0" to the variable's declaration and assignment to get rid of the warning. This way the code could be re-enabled without difficulty. Requested by: mmacy MFC after: 2 weeks
|
#
222daa42 |
|
25-Jan-2018 |
Conrad Meyer <cem@FreeBSD.org> |
style: Remove remaining deprecated MALLOC/FREE macros Mechanically replace uses of MALLOC/FREE with appropriate invocations of malloc(9) / free(9) (a series of sed expressions). Something like: * MALLOC(a, b, ... -> a = malloc(... * FREE( -> free( * free((caddr_t) -> free( No functional change. For now, punt on modifying contrib ipfilter code, leaving a definition of the macro in its KMALLOC(). Reported by: jhb Reviewed by: cy, imp, markj, rmacklem Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D14035
|
#
151ba793 |
|
24-Dec-2017 |
Alexander Kabaev <kan@FreeBSD.org> |
Do pass removing some write-only variables from the kernel. This reduces noise when kernel is compiled by newer GCC versions, such as one used by external toolchain ports. Reviewed by: kib, andrew(sys/arm and sys/arm64), emaste(partial), erj(partial) Reviewed by: jhb (sys/dev/pci/* sys/kern/vfs_aio.c and sys/kern/kern_synch.c) Differential Revision: https://reviews.freebsd.org/D10385
|
#
51369649 |
|
20-Nov-2017 |
Pedro F. Giffuni <pfg@FreeBSD.org> |
sys: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 3-Clause license. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known open source licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.
|
#
1d2fef9b |
|
19-Jul-2017 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Rename vfs.nfsd.enable_uidtostring to vfs.nfs.enable_uidtostring. It applies to both NFS client and NFS server, and is useful for both. This is different from vfs.nfsd.enable_stringtouid, which is specific to server side. Reviewed by: rmacklem@ MFC after: 2 weeks Sponsored by: DARPA, AFRL
|
#
6a3450e1 |
|
26-Jun-2017 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Add vfs.nfsd.nfsd_enable_uidtostring, which works just like vfs.nfsd.nfsd_enable_stringtouid, but in reverse - when set to 1, it forces the NFSv4 server to return numeric UIDs and GIDs instead of "user@domain" strings. This helps with clients that can't translate returned identifiers, eg when rerooting. The same can be achieved by just never running nfsuserd(8), but the sysctl is useful to toggle the behaviour back and forth without rebooting. Reviewed by: rmacklem (earlier version) MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D11326
|
#
69921123 |
|
23-May-2017 |
Konstantin Belousov <kib@FreeBSD.org> |
Commit the 64-bit inode project. Extend the ino_t, dev_t, nlink_t types to 64-bit ints. Modify struct dirent layout to add d_off, increase the size of d_fileno to 64-bits, increase the size of d_namlen to 16-bits, and change the required alignment. Increase struct statfs f_mntfromname[] and f_mntonname[] array length MNAMELEN to 1024. ABI breakage is mitigated by providing compatibility using versioned symbols, ingenious use of the existing padding in structures, and by employing other tricks. Unfortunately, not everything can be fixed, especially outside the base system. For instance, third-party APIs which pass struct stat around are broken in backward and forward incompatible ways. Kinfo sysctl MIBs ABI is changed in backward-compatible way, but there is no general mechanism to handle other sysctl MIBS which return structures where the layout has changed. It was considered that the breakage is either in the management interfaces, where we usually allow ABI slip, or is not important. Struct xvnode changed layout, no compat shims are provided. For struct xtty, dev_t tty device member was reduced to uint32_t. It was decided that keeping ABI compat in this case is more useful than reporting 64-bit dev_t, for the sake of pstat. Update note: strictly follow the instructions in UPDATING. Build and install the new kernel with COMPAT_FREEBSD11 option enabled, then reboot, and only then install new world. Credits: The 64-bit inode project, also known as ino64, started life many years ago as a project by Gleb Kurtsou (gleb). Kirk McKusick (mckusick) then picked up and updated the patch, and acted as a flag-waver. Feedback, suggestions, and discussions were carried by Ed Maste (emaste), John Baldwin (jhb), Jilles Tjoelker (jilles), and Rick Macklem (rmacklem). Kris Moore (kris) performed an initial ports investigation followed by an exp-run by Antoine Brodin (antoine). Essential and all-embracing testing was done by Peter Holm (pho). 
The heavy lifting of coordinating all these efforts and bringing the project to completion were done by Konstantin Belousov (kib). Sponsored by: The FreeBSD Foundation (emaste, kib) Differential revision: https://reviews.freebsd.org/D10439
|
#
dedec68c |
|
20-Apr-2017 |
Rick Macklem <rmacklem@FreeBSD.org> |
Fix the setting of atime for Linux client NFSv4 mounts. The FreeBSD NFSv4 server did not set the attribute bit for TimeAccess in the reply to an Open with exclusive_create, as required by the RFCs. (This is required since the FreeBSD NFS server stores the create_verifier in the va_atime attribute.) As such, the Linux NFSv4 client did not set the TimeAccess (atime) in the Setattr done in an RPC after the one with the Open/exclusive_create. This patch fixes the server to set the TimeAccess bit in the reply. I believe that storing the create_verifier in an extended attribute for file systems that support extended attributes might be a good idea, but I will wait for a discussion of this on the freebsd-fs@ email list before considering committing a patch to do this. Reported by: jim@ks.uiuc.edu Suggested by: dfr MFC after: 2 weeks
|
#
fbbd9655 |
|
28-Feb-2017 |
Warner Losh <imp@FreeBSD.org> |
Renumber copyright clause 4 Renumber clause 4 to 3, per what everybody else did when BSD granted them permission to remove clause 3. My insistence on keeping the same numbering for legal reasons is too pedantic, so give up on that point. Submitted by: Jan Schaumann <jschauma@stevens.edu> Pull Request: https://github.com/freebsd/freebsd/pull/96
|
#
b5a8f340 |
|
02-Jan-2017 |
Josh Paetzel <jpaetzel@FreeBSD.org> |
Workaround NFS bug with readdirplus when there are greater than 1 billion files in a filesystem. Reviewed by kib MFC after: 2 weeks Sponsored by: iXsystems Differential Revision: D9009
|
#
7359fdcf |
|
01-Nov-2016 |
Konstantin Belousov <kib@FreeBSD.org> |
Allow some dotdot lookups in capability mode. If dotdot lookup does not escape from the file descriptor passed as the lookup root, we can allow the component traversal. Track the directories traversed, and check the result of dotdot lookup against the recorded list of the directory vnodes. Dotdot lookups are enabled by sysctl vfs.lookup_cap_dotdot, currently disabled by default until more verification of the approach is done. Disallow non-local filesystems for dotdot, since remote server might conspire with the local process to allow it to escape the namespace. This might be too cautious, provide the knob vfs.lookup_cap_dotdot_nonlocal to override as well. Idea by: rwatson Discussed with: emaste, jonathan, rwatson Reviewed by: mjg (previous version) Tested by: pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 2 week Differential revision: https://reviews.freebsd.org/D8110
|
#
1b819cf2 |
|
12-Aug-2016 |
Rick Macklem <rmacklem@FreeBSD.org> |
Update the nfsstats structure to include the changes needed by the patch in D1626 plus changes so that it includes counts for NFSv4.1 (and the draft of NFSv4.2). Also, make all the counts uint64_t and add a vers field at the beginning, so that future revisions can easily be implemented. There is code in place to handle the old version of the nfsstats structure for backwards binary compatibility. Subsequent commits will update nfsstat(8) to use the new fields. Submitted by: will (earlier version) Reviewed by: ken MFC after: 1 month Relnotes: yes Differential Revision: https://reviews.freebsd.org/D1626
|
#
a96c9b30 |
|
29-Apr-2016 |
Pedro F. Giffuni <pfg@FreeBSD.org> |
NFS: spelling fixes on comments. No functional change.
|
#
13c581fc |
|
12-Apr-2016 |
Rick Macklem <rmacklem@FreeBSD.org> |
If the VOP_SETATTR() call that saves the exclusive create verifier failed, the NFS server would leave the newly created vnode locked. This could result in a file system that would not unmount and processes wedged, waiting for the file to be unlocked. Since this VOP_SETATTR() never fails for most file systems, this bug doesn't normally manifest itself. I found it during testing of an exported GlusterFS file system, which can fail. This patch adds the vput() and changes the error to the correct NFS one. MFC after: 2 weeks
|
#
74b8d63d |
|
10-Apr-2016 |
Pedro F. Giffuni <pfg@FreeBSD.org> |
Cleanup unnecessary semicolons from the kernel. Found with devel/coccinelle.
|
#
84be7e09 |
|
30-Nov-2015 |
Rick Macklem <rmacklem@FreeBSD.org> |
Add kernel support to the NFS server for the "-manage-gids" option that will be added to the nfsuserd daemon in a future commit. It modifies the cache used by NFSv4 for name<-->id translation (both username/uid and group/gid) to support this. When "-manage-gids" is set, the server looks up each uid for the RPC and uses the list of groups cached in the server instead of the list of groups provided in the RPC request. The cached group list is acquired for the cache by the nfsuserd daemon via getgrouplist(3). This avoids the 16 groups limit for the list in the RPC request. Since the cache is now used for every RPC when "-manage-gids" is enabled, the code also modifies the cache to use a separate mutex for each hash list instead of a single global mutex. Suggested by: jpaetzel Tested by: jpaetzel MFC after: 2 weeks
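The switch from one global mutex to a separate mutex per hash list can be illustrated with a small userland sketch. Everything here is hypothetical (the sk_ names, the table size, the modulo hash, and the use of pthread mutexes in place of kernel mutexes); it only shows the locking pattern, where a lookup locks just the one chain it walks.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define SK_HASHSIZE	16	/* hypothetical table size */

/* Hypothetical cache entry: a uid and the size of its cached group list. */
struct sk_entry {
	uint32_t uid;
	int ngroups;
	struct sk_entry *next;
};

/* Each hash chain carries its own lock instead of sharing a global one. */
struct sk_bucket {
	pthread_mutex_t lock;
	struct sk_entry *head;
};

static struct sk_bucket sk_table[SK_HASHSIZE];

static void
sk_init(void)
{
	int i;

	for (i = 0; i < SK_HASHSIZE; i++) {
		pthread_mutex_init(&sk_table[i].lock, NULL);
		sk_table[i].head = NULL;
	}
}

static void
sk_insert(struct sk_entry *e)
{
	struct sk_bucket *b = &sk_table[e->uid % SK_HASHSIZE];

	pthread_mutex_lock(&b->lock);
	e->next = b->head;
	b->head = e;
	pthread_mutex_unlock(&b->lock);
}

/* Returns the cached group count for "uid", or -1 if it is not cached. */
static int
sk_lookup_ngroups(uint32_t uid)
{
	struct sk_bucket *b = &sk_table[uid % SK_HASHSIZE];
	struct sk_entry *e;
	int ngroups = -1;

	pthread_mutex_lock(&b->lock);	/* only this chain is serialized */
	for (e = b->head; e != NULL; e = e->next) {
		if (e->uid == uid) {
			ngroups = e->ngroups;
			break;
		}
	}
	pthread_mutex_unlock(&b->lock);
	return (ngroups);
}
```

Lookups for uids that hash to different chains no longer contend with each other, which matters once the cache is consulted on every RPC.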
|
#
1f54e596 |
|
27-May-2015 |
Rick Macklem <rmacklem@FreeBSD.org> |
Make the size of the hash tables used by the NFSv4 server tunable. No appreciable change in performance was observed after increasing the sizes of these tables and then testing with a single client. However, there was an email that indicated high CPU overheads for a heavily loaded NFSv4 and it is hoped that increasing the sizes of the hash tables via these tunables might help. The tables remain the same size by default. Differential Revision: https://reviews.freebsd.org/D2596 MFC after: 2 weeks
|
#
50a220c6 |
|
19-Apr-2015 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Replace "new NFS" with just "NFS" in some sysctl description strings. Sponsored by: The FreeBSD Foundation
|
#
dda11d4a |
|
15-Apr-2015 |
Rick Macklem <rmacklem@FreeBSD.org> |
File systems that do not use the buffer cache (such as ZFS) must use VOP_FSYNC() to perform the NFS server's Commit operation. This patch adds a mnt_kern_flag called MNTK_USES_BCACHE which is set by file systems that use the buffer cache. If this flag is not set, the NFS server always does a VOP_FSYNC(). This should be ok for old file system modules that do not set MNTK_USES_BCACHE, since calling VOP_FSYNC() is correct, although it might not be optimal for file systems that use the buffer cache. Reviewed by: kib MFC after: 2 weeks
|
#
eae6da3d |
|
07-Jan-2015 |
Robert Watson <rwatson@FreeBSD.org> |
Use M_SIZE() instead of hand-crafted (and mostly correct) NFSMSIZ() macro in the NFS server; garbage collect now-unused NFSMSIZ() and M_HASCL() macros. Also garbage collect now-unused versions in headers for the removed previous NFS client and server. Reviewed by: rmacklem Sponsored by: EMC / Isilon Storage Division
|
#
52f1bb38 |
|
24-Dec-2014 |
Rick Macklem <rmacklem@FreeBSD.org> |
A deadlock in the NFSv4 server with vfs.nfsd.enable_locallocks=1 was reported via email. This was caused by a LOR between the sleep lock used to serialize the local locking (nfsrv_locklf()) and locking the vnode. I believe this patch fixes the problem by delaying relocking of the vnode until the sleep lock is unlocked (nfsrv_unlocklf()). To avoid nfsvno_advlock() having the side effect of unlocking the vnode, unlocking the vnode was moved to before the functions that call nfsvno_advlock(). It shouldn't affect the execution of the default case where vfs.nfsd.enable_locallocks=0. Reported by: loic.blot@unix-experience.fr Discussed with: kib MFC after: 1 week
|
#
d8a5961f |
|
02-Oct-2014 |
Marcelo Araujo <araujo@FreeBSD.org> |
Fix failures and warnings reported by newpynfs20090424 test tool. This fix addresses only issues with the pynfs reports; none of these issues are known to create problems for extant real clients. Submitted by: Bart Hsiao <bart.hsiao@gmail.com> Reworked by: myself Reviewed by: rmacklem Approved by: rmacklem Sponsored by: QNAP Systems Inc.
|
#
e7375b6f |
|
31-Jul-2014 |
Konstantin Belousov <kib@FreeBSD.org> |
Do not generate 1000 unique lock names for nfsrc hash chain locks. It overflows witness. Shorten the names of some nfs mutexes. Reported and tested by: pho No objections from: rmacklem, mav Sponsored by: The FreeBSD Foundation MFC after: 1 week
|
#
6c7d2293 |
|
04-Jul-2014 |
Rick Macklem <rmacklem@FreeBSD.org> |
The new NFSv3 server did not generate directory postop attributes for the reply to ReaddirPlus when the server failed within the loop that calls VFS_VGET(). This failure is most likely an error return from VFS_VGET() caused by a bogus d_fileno that was truncated to 32bits. This patch fixes the server so that it will return directory postop attributes for the failure. It does not fix the underlying issue caused by d_fileno being uint32_t when a file system like ZFS generates a fileno that is greater than 32bits. Reported by: jpaetzel Reviewed by: jpaetzel MFC after: 1 month
|
#
c59e4cc3 |
|
01-Jul-2014 |
Rick Macklem <rmacklem@FreeBSD.org> |
Merge the NFSv4.1 server code in projects/nfsv4.1-server over into head. The code is not believed to have any effect on the semantics of non-NFSv4.1 server behaviour. It is a rather large merge, but I am hoping that there will not be any regressions for the NFS server. MFC after: 1 month
|
#
4fc0f18c |
|
01-Jul-2014 |
Bryan Drewery <bdrewery@FreeBSD.org> |
Change NFS readdir() to only ignore cookies preceding the given offset for UFS rather than for all but ZFS. This code was assuming that offsets were monotonically increasing for all file systems except ZFS and that the cookies from a previous call may have been rewound to a block boundary. According to mckusick@ only UFS is known to do this, so only requests against UFS file systems should remove cookies smaller than the given offset. This fixes serving TMPFS over NFS as it too does not have monotonically increasing offsets. The comment around the code also indicated it was specific to UFS. Some of the code using 'not_zfs' is specific to ZFS snapshot handling, so add a 'is_zfs' variable for those cases. It's possible that 'is_zfs' check for VFS_VGET() support may not be specific to ZFS. This needs more research and testing. After this fix TMPFS and other file systems can be served over NFS. To test I compared the results of syncing a /usr/src tree into a tmpfs and serving that over NFS. Before the fix 3589 files were missing on the remote view. After the fix all files were successfully found. Reviewed by: rmacklem Discussed with: mckusick, rmacklem via fs@ Discussed at: http://lists.freebsd.org/pipermail/freebsd-fs/2014-April/019264.html MFC after: 2 weeks Sponsored by: EMC / Isilon Storage Division
|
#
ca20bd92 |
|
02-May-2014 |
Rick Macklem <rmacklem@FreeBSD.org> |
The new draft specification for NFSv4.0 specifies that a server should either accept owner and owner_group strings that are just the digits of the uid/gid or return NFS4ERR_BADOWNER. This patch adds a sysctl vfs.nfsd.enable_stringtouid, which can be set to enable the server w.r.t. accepting numeric string. It also ensures that NFS4ERR_BADOWNER is returned if numeric uid/gid strings are not enabled. This fixes the server for recent Linux nfs4 clients that use numeric uid/gid strings by default. Reported and tested by: craigyk@gmail.com MFC after: 2 weeks
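The decision described above (accept an all-digits owner string when the sysctl is enabled, otherwise return NFS4ERR_BADOWNER) can be sketched like this. The function name, the SK_ constants other than the protocol error value, and the overflow handling are assumptions made for illustration; the real server code is not shown in the message.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SK_NFSERR_BADOWNER	10039	/* NFS4ERR_BADOWNER protocol value */
#define SK_NOT_NUMERIC		(-1)	/* hypothetical: fall back to name mapping */

/*
 * Convert an owner string that is just digits into a uid.
 * "enable_stringtouid" stands in for the vfs.nfsd.enable_stringtouid sysctl.
 * NFSv4 strings are counted, not NUL-terminated, so a length is passed.
 * Returns 0 on success, SK_NOT_NUMERIC if the string should go through
 * the usual user@domain mapping, or SK_NFSERR_BADOWNER.
 */
static int
owner_str_to_uid(const char *s, size_t len, bool enable_stringtouid,
    uint32_t *uidp)
{
	uint64_t val = 0;
	size_t i;

	if (len == 0)
		return (SK_NFSERR_BADOWNER);
	for (i = 0; i < len; i++) {
		if (s[i] < '0' || s[i] > '9')
			return (SK_NOT_NUMERIC);
		val = val * 10 + (uint64_t)(s[i] - '0');
		if (val > UINT32_MAX)	/* assumption: reject overflow */
			return (SK_NFSERR_BADOWNER);
	}
	if (!enable_stringtouid)
		return (SK_NFSERR_BADOWNER);
	*uidp = (uint32_t)val;
	return (0);
}
```

For example, "1001" yields uid 1001 when the knob is on and NFS4ERR_BADOWNER when it is off, while "user@domain" is left for the normal name-to-id translation.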
|
#
3c53f923 |
|
24-Apr-2014 |
Rick Macklem <rmacklem@FreeBSD.org> |
The PR reported that the old NFS server did not set uio_td == NULL for the VOP_READ() call. This patch fixes both the old and new server for this case. PR: 185232 Submitted by: PR had patch for old server Reviewed by: kib MFC after: 2 weeks
|
#
4a144410 |
|
16-Mar-2014 |
Robert Watson <rwatson@FreeBSD.org> |
Update kernel inclusions of capability.h to use capsicum.h instead; some further refinement is required as some device drivers intended to be portable over FreeBSD versions rely on __FreeBSD_version to decide whether to include capability.h. MFC after: 3 weeks
|
#
d473bac7 |
|
03-Jan-2014 |
Alexander Motin <mav@FreeBSD.org> |
Rework NFS Duplicate Request Cache cleanup logic. - Introduce additional hash to group requests by hash of sockref. This allows to process TCP acknowledgements without looping through all the cache, and as result allows to do it every time. - Introduce additional callbacks to notify application layer about sockets disconnection. Without this last few requests processed just before socket disconnection never processed their ACKs and stuck in cache for many hours. - Implement transport-specific method for tracking reply acknowledgements. New implementation does not cross multiple stack layers to get the data and does not have race conditions that previously made some requests stuck in cache. This could be done more efficiently at sockbuf layer, but that would break some KBIs, while I don't know other consumers for it aside NFS. - Instead of traversing all DRC twice per request, run cleaning only once per request, and except in some conditions traverse only single hash slot at a time. Together this limits NFS DRC growth only to situations of real connectivity problems. If network is working well, and so all replies are acknowledged, cache remains almost empty even after hours of heavy load. Without this change on the same test cache was growing to many thousand requests even with perfectly working local network. As another result this reduces CPU time spent on the DRC handling during SPEC NFS benchmark from about 10% to 0.5%. Sponsored by: iXsystems, Inc.
|
#
43a213bb |
|
24-Dec-2013 |
Rick Macklem <rmacklem@FreeBSD.org> |
The NFSv4 server would call VOP_SETATTR() with a shared locked vnode when a Getattr for a file is done by a client other than the one that holds the file's delegation. This would only happen when delegations are enabled and the problem is fixed by this patch. MFC after: 1 week
|
#
0c695afb |
|
24-Dec-2013 |
Rick Macklem <rmacklem@FreeBSD.org> |
An intermittent problem with NFSv4 exporting of ZFS snapshots was reported to the freebsd-fs mailing list. I believe the problem was caused by the Readdir operation using VFS_VGET() for a snapshot file entry instead of VOP_LOOKUP(). This would not occur for NFSv3, since it will do a VFS_VGET() of "." which fails with ENOTSUPP at the beginning of the directory, whereas NFSv4 does not check "." or "..". This patch adds a call to VFS_VGET() for the directory being read to check for ENOTSUPP. I also observed that the mount_on_fileid and fsid attributes were not correct at the snapshot's auto mountpoints when looking at packet traces for the Readdir. This patch fixes the attributes by doing a check for different v_mount structure, even if the vnode v_mountedhere is not set. Reported by: jas@cse.yorku.ca Tested by: jas@cse.yorku.ca Reviewed by: asomers MFC after: 1 week
|
#
7008be5b |
|
04-Sep-2013 |
Pawel Jakub Dawidek <pjd@FreeBSD.org> |
Change the cap_rights_t type from uint64_t to a structure that we can extend in the future in a backward compatible (API and ABI) way. The cap_rights_t represents capability rights. We used to use one bit to represent one right, but we are running out of spare bits. Currently the new structure provides place for 114 rights (so 50 more than the previous cap_rights_t), but it is possible to grow the structure to hold at least 285 rights, although we can make it even larger if 285 rights won't be enough. The structure definition looks like this: struct cap_rights { uint64_t cr_rights[CAP_RIGHTS_VERSION + 2]; }; The initial CAP_RIGHTS_VERSION is 0. The top two bits in the first element of the cr_rights[] array contain total number of elements in the array - 2. This means if those two bits are equal to 0, we have 2 array elements. The top two bits in all remaining array elements should be 0. The next five bits in all array elements contain array index. Only one bit is used and bit position in this five-bits range defines array index. This means there can be at most five array elements in the future. To define new right the CAPRIGHT() macro must be used. The macro takes two arguments - an array index and a bit to set, eg. 
#define CAP_PDKILL CAPRIGHT(1, 0x0000000000000800ULL) We still support aliases that combine few rights, but the rights have to belong to the same array element, eg: #define CAP_LOOKUP CAPRIGHT(0, 0x0000000000000400ULL) #define CAP_FCHMOD CAPRIGHT(0, 0x0000000000002000ULL) #define CAP_FCHMODAT (CAP_FCHMOD | CAP_LOOKUP) There is new API to manage the new cap_rights_t structure: cap_rights_t *cap_rights_init(cap_rights_t *rights, ...); void cap_rights_set(cap_rights_t *rights, ...); void cap_rights_clear(cap_rights_t *rights, ...); bool cap_rights_is_set(const cap_rights_t *rights, ...); bool cap_rights_is_valid(const cap_rights_t *rights); void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src); void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src); bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little); Capability rights to the cap_rights_init(), cap_rights_set(), cap_rights_clear() and cap_rights_is_set() functions are provided by separating them with commas, eg: cap_rights_t rights; cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT); There is no need to terminate the list of rights, as those functions are actually macros that take care of the termination, eg: #define cap_rights_set(rights, ...) \ __cap_rights_set((rights), __VA_ARGS__, 0ULL) void __cap_rights_set(cap_rights_t *rights, ...); Thanks to using one bit as an array index we can assert in those functions that there are no two rights belonging to different array elements provided together. For example this is illegal and will be detected, because CAP_LOOKUP belongs to element 0 and CAP_PDKILL to element 1: cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL); Providing several rights that belongs to the same array's element this way is correct, but is not advised. It should only be used for aliases definition. This commit also breaks compatibility with some existing Capsicum system calls, but I see no other way to do that. 
This should be fine as Capsicum is still experimental and this change is not going to 9.x. Sponsored by: The FreeBSD Foundation
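The index-bit scheme described above can be made concrete with a short sketch. The SK_ macros and sk_right_index() are illustrative names, not the FreeBSD definitions, and mapping the lowest of the five index bits (bit 57) to array index 0 is an assumption drawn from the description of the layout. The right values are the ones quoted in the message.

```c
#include <stdint.h>

/*
 * Illustrative encoding: bits 63-62 hold (array size - 2), bits 61-57
 * hold a one-hot array index, and the remaining bits hold the right.
 * Assumption: bit 57 corresponds to array index 0.
 */
#define SK_CAPRIGHT(idx, bit)	((1ULL << (57 + (idx))) | (bit))
#define SK_IDXMASK		0x3e00000000000000ULL	/* bits 57..61 */

/* Right values as quoted in the commit message. */
#define SK_CAP_LOOKUP	SK_CAPRIGHT(0, 0x0000000000000400ULL)
#define SK_CAP_FCHMOD	SK_CAPRIGHT(0, 0x0000000000002000ULL)
#define SK_CAP_PDKILL	SK_CAPRIGHT(1, 0x0000000000000800ULL)

/*
 * Return the array index encoded in "right", or -1 if the value mixes
 * rights from different array elements (more than one index bit set)
 * or carries no index bit at all.
 */
static int
sk_right_index(uint64_t right)
{
	uint64_t idxbits = (right & SK_IDXMASK) >> 57;
	int i;

	/* Exactly one of the five index bits must be set. */
	if (idxbits == 0 || (idxbits & (idxbits - 1)) != 0)
		return (-1);
	for (i = 0; i < 5; i++) {
		if (idxbits == (1ULL << i))
			return (i);
	}
	return (-1);
}
```

OR-ing SK_CAP_FCHMOD with SK_CAP_LOOKUP keeps a single index bit set, so the alias stays valid, while OR-ing SK_CAP_LOOKUP with SK_CAP_PDKILL sets two index bits and is detected as the illegal case from the message.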
|
#
93c5875b |
|
14-Aug-2013 |
Rick Macklem <rmacklem@FreeBSD.org> |
Fix several performance related issues in the new NFS server's DRC for NFS over TCP. - Increase the size of the hash tables. - Create a separate mutex for each hash list of the TCP hash table. - Single thread the code that deletes stale cache entries. - Add a tunable called vfs.nfsd.tcphighwater, which can be increased to allow the cache to grow larger, avoiding the overhead of frequent scans to delete stale cache entries. (The default value will result in frequent scans to delete stale cache entries, analogous to what the pre-patched code does.) - Add a tunable called vfs.nfsd.cachetcp that can be used to disable DRC caching for NFS over TCP, since the old NFS server didn't DRC cache TCP. It also adjusts the size of nfsrc_floodlevel dynamically, so that it is always greater than vfs.nfsd.tcphighwater. For UDP the algorithm remains the same as the pre-patched code, but the tunable vfs.nfsd.udphighwater can be used to allow the cache to grow larger and reduce the overhead caused by frequent scans for stale entries. UDP also uses a larger hash table size than the pre-patched code. Reported by: wollman Tested by: wollman (earlier version of patch) Submitted by: ivoras (earlier patch) Reviewed by: jhb (earlier version of patch) MFC after: 1 month
|
#
22a72260 |
|
30-May-2013 |
Jeff Roberson <jeff@FreeBSD.org> |
- Convert the bufobj lock to rwlock. - Use a shared bufobj lock in getblk() and inmem(). - Convert softdep's lk to rwlock to match the bufobj lock. - Move INFREECNT to b_flags and protect it with the buf lock. - Remove unnecessary locking around bremfree() and BKGRDINPROG. Sponsored by: EMC / Isilon Storage Division Discussed with: mckusick, kib, mdf
|
#
72ccd4cc |
|
15-May-2013 |
Dag-Erling Smørgrav <des@FreeBSD.org> |
Fix typo in comment. Submitted by: Alex Weber <alexwebr@gmail.com> MFC after: 1 week
|
#
c93c82f4 |
|
29-Apr-2013 |
Dag-Erling Smørgrav <des@FreeBSD.org> |
Fix a bug that allows NFS clients to issue READDIR on files. PR: kern/178016 Security: CVE-2013-3266 Security: FreeBSD-SA-13:05.nfsserver
|
#
d96b98a3 |
|
17-Apr-2013 |
Kenneth D. Merry <ken@FreeBSD.org> |
Revamp the old NFS server's File Handle Affinity (FHA) code so that it will work with either the old or new server. The FHA code keeps a cache of currently active file handles for NFSv2 and v3 requests, so that read and write requests for the same file are directed to the same group of threads (reads) or thread (writes). It does not currently work for NFSv4 requests. They are more complex, and will take more work to support. This improves read-ahead performance, especially with ZFS, if the FHA tuning parameters are configured appropriately. Without the FHA code, concurrent reads that are part of a sequential read from a file will be directed to separate NFS threads. This has the effect of confusing the ZFS zfetch (prefetch) code and makes sequential reads significantly slower with clients like Linux that do a lot of prefetching. The FHA code has also been updated to direct write requests to nearby file offsets to the same thread in the same way it batches reads, and the FHA code will now also send writes to multiple threads when needed. This improves sequential write performance in ZFS, because writes to a file are now more ordered. Since NFS writes (generally less than 64K) are smaller than the typical ZFS record size (usually 128K), out of order NFS writes to the same block can trigger a read in ZFS. Sending them down the same thread increases the odds of their being in order. In order for multiple write threads per file in the FHA code to be useful, writes in the NFS server have been changed to use a LK_SHARED vnode lock, and upgrade that to LK_EXCLUSIVE if the filesystem doesn't allow multiple writers to a file at once. ZFS is currently the only filesystem that allows multiple writers to a file, because it has internal file range locking. This change does not affect the NFSv4 code. This improves random write performance to a single file in ZFS, since we can now have multiple writers inside ZFS at one time. 
I have changed the default tuning parameters to a 22 bit (4MB) window size (from 256K) and unlimited commands per thread as a result of my benchmarking with ZFS. The FHA code has been updated to allow configuring the tuning parameters from loader tunable variables in addition to sysctl variables. The read offset window calculation has been slightly modified as well. Instead of having separate bins, each file handle has a rolling window of bin_shift size. This minimizes glitches in throughput when shifting from one bin to another. sys/conf/files: Add nfs_fha_new.c and nfs_fha_old.c. Compile nfs_fha.c when either the old or the new NFS server is built. sys/fs/nfs/nfsport.h, sys/fs/nfs/nfs_commonport.c: Bring in changes from Rick Macklem to newnfs_realign that allow it to operate in blocking (M_WAITOK) or non-blocking (M_NOWAIT) mode. sys/fs/nfs/nfs_commonsubs.c, sys/fs/nfs/nfs_var.h: Bring in a change from Rick Macklem to allow telling nfsm_dissect() whether or not to wait for mallocs. sys/fs/nfs/nfsm_subs.h: Bring in changes from Rick Macklem to create a new nfsm_dissect_nonblock() inline function and NFSM_DISSECT_NONBLOCK() macro. sys/fs/nfs/nfs_commonkrpc.c, sys/fs/nfsclient/nfs_clkrpc.c: Add the malloc wait flag to a newnfs_realign() call. sys/fs/nfsserver/nfs_nfsdkrpc.c: Setup the new NFS server's RPC thread pool so that it will call the FHA code. Add the malloc flag argument to newnfs_realign(). Unstaticize newnfs_nfsv3_procid[] so that we can use it in the FHA code. sys/fs/nfsserver/nfs_nfsdsocket.c: In nfsrvd_dorpc(), add NFSPROC_WRITE to the list of RPC types that use the LK_SHARED lock type. sys/fs/nfsserver/nfs_nfsdport.c: In nfsd_fhtovp(), if we're starting a write, check to see whether the underlying filesystem supports shared writes. If not, upgrade the lock type from LK_SHARED to LK_EXCLUSIVE. sys/nfsserver/nfs_fha.c: Remove all code that is specific to the NFS server implementation. 
Anything that is server-specific is now accessed through a callback supplied by that server's FHA shim in the new softc. There are now separate sysctls and tunables for the FHA implementations for the old and new NFS servers. The new NFS server has its tunables under vfs.nfsd.fha, the old NFS server's tunables are under vfs.nfsrv.fha as before. In fha_extract_info(), use callouts for all server-specific code. Getting file handles and offsets is now done in the individual server's shim module. In fha_hash_entry_choose_thread(), change the way we decide whether two reads are in proximity to each other. Previously, the calculation was a simple shift operation to see whether the offsets were in the same power of 2 bucket. The issue was that there would be a bucket (and therefore thread) transition, even if the reads were in close proximity. When there is a thread transition, reads wind up going somewhat out of order, and ZFS gets confused. The new calculation simply tries to see whether the offsets are within 1 << bin_shift of each other. If they are, the reads will be sent to the same thread. The effect of this change is that for sequential reads, if the client doesn't exceed the max_reqs_per_nfsd parameter and the bin_shift is set to a reasonable value (22, or 4MB works well in my tests), the reads in any sequential stream will largely be confined to a single thread. Change fha_assign() so that it takes a softc argument. It is now called from the individual server's shim code, which will pass in the softc. Change fhe_stats_sysctl() so that it takes a softc parameter. It is now called from the individual server's shim code. Add the current offset to the list of things printed out about each active thread. Change the num_reads and num_writes counters in the fha_hash_entry structure to 32-bit values, and rename them num_rw and num_exclusive, respectively, to reflect their changed usage. 
Add an enable sysctl and tunable that allows the user to disable the FHA code (when vfs.XXX.fha.enable = 0). This is useful for before/after performance comparisons. nfs_fha.h: Move most structure definitions out of nfs_fha.c and into the header file, so that the individual server shims can see them. Change the default bin_shift to 22 (4MB) instead of 18 (256K). Allow unlimited commands per thread. sys/nfsserver/nfs_fha_old.c, sys/nfsserver/nfs_fha_old.h, sys/fs/nfsserver/nfs_fha_new.c, sys/fs/nfsserver/nfs_fha_new.h: Add shims for the old and new NFS servers to interface with the FHA code, and callbacks for the The shims contain all of the code and definitions that are specific to the NFS servers. They setup the server-specific callbacks and set the server name for the sysctl and loader tunable variables. sys/nfsserver/nfs_srvkrpc.c: Configure the RPC code to call fhaold_assign() instead of fha_assign(). sys/modules/nfsd/Makefile: Add nfs_fha.c and nfs_fha_new.c. sys/modules/nfsserver/Makefile: Add nfs_fha_old.c. Reviewed by: rmacklem Sponsored by: Spectra Logic MFC after: 2 weeks
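The change to the read proximity test in fha_hash_entry_choose_thread() can be pictured with a small sketch. This is not the code from nfs_fha.c: the helper names same_bucket_old() and within_window_new() are hypothetical, and treating "within" as a strict comparison is an assumption. Only bin_shift and the "within 1 << bin_shift" rule come from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Old rule as described in the message: two offsets count as "close"
 * only when they land in the same power-of-2 bucket.
 * (Hypothetical helper name.)
 */
static bool
same_bucket_old(uint64_t a, uint64_t b, unsigned bin_shift)
{
	return ((a >> bin_shift) == (b >> bin_shift));
}

/*
 * New rule as described in the message: two offsets count as "close"
 * when their distance is below 1 << bin_shift, wherever the bucket
 * boundary happens to fall. (Hypothetical helper name; the strict "<"
 * is an assumption.)
 */
static bool
within_window_new(uint64_t a, uint64_t b, unsigned bin_shift)
{
	uint64_t dist = (a > b) ? a - b : b - a;

	return (dist < ((uint64_t)1 << bin_shift));
}
```

With bin_shift = 22, a read ending just below the 4MB boundary and the next read starting at 4MB fall into different buckets under the old rule, but are only 4K apart under the new one, so they would stay on the same thread.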
|
#
89f6b863 |
|
08-Mar-2013 |
Attilio Rao <attilio@FreeBSD.org> |
Switch the vm_object mutex to be a rwlock. This will enable further optimizations in the future, where the vm_object lock will be held in read mode most of the time when the page cache resident pool of pages is accessed for reading purposes. The change is mostly mechanical but a few notes are reported: * The KPI changes as follows: - VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK() - VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK() - VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK() - VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED() (in order to avoid visibility of implementation details) - The read-mode operations are added: VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(), VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED() * The vm/vm_pager.h namespace pollution avoidance (which forces consumers to require sys/mutex.h directly to cater for its inline functions using VM_OBJECT_LOCK()) requires that all vm/vm_pager.h consumers now also include sys/rwlock.h. * zfs requires a quite convoluted fix to include FreeBSD rwlocks into the compat layer because the name clash between FreeBSD and solaris versions must be avoided. For this purpose zfs redefines the vm_object locking functions directly, isolating the FreeBSD components in specific compat stubs. The KPI is heavily broken by this commit. Third-party ports must be updated accordingly (I can think off-hand of VirtualBox, for example). Sponsored by: EMC / Isilon storage division Reviewed by: jeff Reviewed by: pjd (ZFS specific review) Discussed with: alc Tested by: pho
|
#
2609222a |
|
01-Mar-2013 |
Pawel Jakub Dawidek <pjd@FreeBSD.org> |
Merge Capsicum overhaul: - Capability is no longer separate descriptor type. Now every descriptor has set of its own capability rights. - The cap_new(2) system call is left, but it is no longer documented and should not be used in new code. - The new syscall cap_rights_limit(2) should be used instead of cap_new(2), which limits capability rights of the given descriptor without creating a new one. - The cap_getrights(2) syscall is renamed to cap_rights_get(2). - If CAP_IOCTL capability right is present we can further reduce allowed ioctls list with the new cap_ioctls_limit(2) syscall. List of allowed ioctls can be retrieved with cap_ioctls_get(2) syscall. - If CAP_FCNTL capability right is present we can further reduce fcntls that can be used with the new cap_fcntls_limit(2) syscall and retrieve them with cap_fcntls_get(2). - To support ioctl and fcntl white-listing the filedesc structure was heavily modified. - The audit subsystem, kdump and procstat tools were updated to recognize new syscalls. - Capability rights were revised and even though I tried hard to provide backward API and ABI compatibility there are some incompatible changes that are described in detail below: CAP_CREATE old behaviour: - Allow for openat(2)+O_CREAT. - Allow for linkat(2). - Allow for symlinkat(2). CAP_CREATE new behaviour: - Allow for openat(2)+O_CREAT. Added CAP_LINKAT: - Allow for linkat(2). ABI: Reuses CAP_RMDIR bit. - Allow to be target for renameat(2). Added CAP_SYMLINKAT: - Allow for symlinkat(2). Removed CAP_DELETE. Old behaviour: - Allow for unlinkat(2) when removing non-directory object. - Allow to be source for renameat(2). Removed CAP_RMDIR. Old behaviour: - Allow for unlinkat(2) when removing directory. Added CAP_RENAMEAT: - Required for source directory for the renameat(2) syscall. Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR): - Allow for unlinkat(2) on any object. - Required if target of renameat(2) exists and will be removed by this call. 
Removed CAP_MAPEXEC. CAP_MMAP old behaviour: - Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and PROT_WRITE. CAP_MMAP new behaviour: - Allow for mmap(2)+PROT_NONE. Added CAP_MMAP_R: - Allow for mmap(PROT_READ). Added CAP_MMAP_W: - Allow for mmap(PROT_WRITE). Added CAP_MMAP_X: - Allow for mmap(PROT_EXEC). Added CAP_MMAP_RW: - Allow for mmap(PROT_READ | PROT_WRITE). Added CAP_MMAP_RX: - Allow for mmap(PROT_READ | PROT_EXEC). Added CAP_MMAP_WX: - Allow for mmap(PROT_WRITE | PROT_EXEC). Added CAP_MMAP_RWX: - Allow for mmap(PROT_READ | PROT_WRITE | PROT_EXEC). Renamed CAP_MKDIR to CAP_MKDIRAT. Renamed CAP_MKFIFO to CAP_MKFIFOAT. Renamed CAP_MKNOD to CAP_MKNODAT. CAP_READ old behaviour: - Allow pread(2). - Disallow read(2), readv(2) (if there is no CAP_SEEK). CAP_READ new behaviour: - Allow read(2), readv(2). - Disallow pread(2) (CAP_SEEK was also required). CAP_WRITE old behaviour: - Allow pwrite(2). - Disallow write(2), writev(2) (if there is no CAP_SEEK). CAP_WRITE new behaviour: - Allow write(2), writev(2). - Disallow pwrite(2) (CAP_SEEK was also required). 
Added convenient defines: #define CAP_PREAD (CAP_SEEK | CAP_READ) #define CAP_PWRITE (CAP_SEEK | CAP_WRITE) #define CAP_MMAP_R (CAP_MMAP | CAP_SEEK | CAP_READ) #define CAP_MMAP_W (CAP_MMAP | CAP_SEEK | CAP_WRITE) #define CAP_MMAP_X (CAP_MMAP | CAP_SEEK | 0x0000000000000008ULL) #define CAP_MMAP_RW (CAP_MMAP_R | CAP_MMAP_W) #define CAP_MMAP_RX (CAP_MMAP_R | CAP_MMAP_X) #define CAP_MMAP_WX (CAP_MMAP_W | CAP_MMAP_X) #define CAP_MMAP_RWX (CAP_MMAP_R | CAP_MMAP_W | CAP_MMAP_X) #define CAP_RECV CAP_READ #define CAP_SEND CAP_WRITE #define CAP_SOCK_CLIENT \ (CAP_CONNECT | CAP_GETPEERNAME | CAP_GETSOCKNAME | CAP_GETSOCKOPT | \ CAP_PEELOFF | CAP_RECV | CAP_SEND | CAP_SETSOCKOPT | CAP_SHUTDOWN) #define CAP_SOCK_SERVER \ (CAP_ACCEPT | CAP_BIND | CAP_GETPEERNAME | CAP_GETSOCKNAME | \ CAP_GETSOCKOPT | CAP_LISTEN | CAP_PEELOFF | CAP_RECV | CAP_SEND | \ CAP_SETSOCKOPT | CAP_SHUTDOWN) Added defines for backward API compatibility: #define CAP_MAPEXEC CAP_MMAP_X #define CAP_DELETE CAP_UNLINKAT #define CAP_MKDIR CAP_MKDIRAT #define CAP_RMDIR CAP_UNLINKAT #define CAP_MKFIFO CAP_MKFIFOAT #define CAP_MKNOD CAP_MKNODAT #define CAP_SOCK_ALL (CAP_SOCK_CLIENT | CAP_SOCK_SERVER) Sponsored by: The FreeBSD Foundation Reviewed by: Christoph Mallon <christoph.mallon@gmx.de> Many aspects discussed with: rwatson, benl, jonathan ABI compatibility discussed with: kib
|
#
d177f14d |
|
18-Jan-2013 |
John Baldwin <jhb@FreeBSD.org> |
Use vfs_timestamp() to set file timestamps rather than invoking getmicrotime() or getnanotime() directly in NFS. Reviewed by: rmacklem, bde MFC after: 1 week
|
#
eb1b1807 |
|
05-Dec-2012 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Mechanically substitute flags from historic mbuf allocator with malloc(9) flags within sys. Exceptions: - sys/contrib not touched - sys/mbuf.h edited manually
|
#
5050aa86 |
|
22-Oct-2012 |
Konstantin Belousov <kib@FreeBSD.org> |
Remove the support for using non-mpsafe filesystem modules. In particular, do not lock Giant conditionally when calling into the filesystem module, remove the VFS_LOCK_GIANT() and related macros. Stop handling buffers belonging to non-mpsafe filesystems. The VFS_VERSION is bumped to indicate the interface change which does not result in the interface signatures changes. Conducted and reviewed by: attilio Tested by: pho
|
#
6001db29 |
|
14-Oct-2012 |
Rick Macklem <rmacklem@FreeBSD.org> |
Add two new options to the nfssvc(2) syscall that allow processes running as root to suspend/resume execution of the kernel nfsd threads. An earlier version of this patch was tested by Vincent Hoffman (vince at unsane.co.uk) and John Hickey (jh at deterlab.net). Reviewed by: kib MFC after: 2 weeks
|
#
877d24ac |
|
28-Sep-2012 |
Konstantin Belousov <kib@FreeBSD.org> |
Fix the mis-handling of the VV_TEXT on the nullfs vnodes. If you have a binary on a filesystem which is also mounted over by nullfs, you could execute the binary from the lower filesystem, or from the nullfs mount. When executed from lower filesystem, the lower vnode gets VV_TEXT flag set, and the file cannot be modified while the binary is active. But, if executed as the nullfs alias, only the nullfs vnode gets VV_TEXT set, and you still can open the lower vnode for write. Add a set of VOPs for the VV_TEXT query, set and clear operations, which are correctly bypassed to lower vnode. Tested by: pho (previous version) MFC after: 2 weeks
|
#
c52005a3 |
|
19-Sep-2012 |
Rick Macklem <rmacklem@FreeBSD.org> |
Modify the NFSv4 client so that it can handle owner and owner_group strings that consist entirely of digits, interpreting them as the uid/gid number. This change was needed since new (>= 3.3) Linux servers reply with these strings by default. This change is mandated by the rfc3530bis draft. Reported on freebsd-stable@ under the Subject heading "Problem with Linux >= 3.3 as NFSv4 server" by Norbert Aschendorff on Aug. 20, 2012. Tested by: norbert.aschendorff at yahoo.de Reviewed by: jhb MFC after: 2 weeks
|
#
3676a0d8 |
|
07-May-2012 |
John W. De Boskey <jwd@FreeBSD.org> |
Use the common api helper routine instead of freeing the namei buffer directly. Approved by: rmacklem (mentor) MFC after: 1 month
|
#
a607cc6d |
|
27-Apr-2012 |
Rick Macklem <rmacklem@FreeBSD.org> |
Fix a leak of namei lookup path buffers that occurs when a ZFS volume is exported via the new NFS server. The leak occurred because the new NFS server code didn't handle the case where a file system sets the SAVENAME flag in its VOP_LOOKUP() and ZFS does this for the DELETE case. Tested by: Oliver Brandmueller (ob at gruft.de), hrs PR: kern/167266 MFC after: 1 month
|
#
f257ebbb |
|
20-Apr-2012 |
Kirk McKusick <mckusick@FreeBSD.org> |
This change creates a new list of active vnodes associated with a mount point. Active vnodes are those with a non-zero use or hold count, e.g., those vnodes that are not on the free list. Note that this list is in addition to the list of all the vnodes associated with a mount point. To avoid adding another set of linkage pointers to the vnode structure, the active list uses the existing linkage pointers used by the free list (previously named v_freelist, now renamed v_actfreelist). This update adds the MNT_VNODE_FOREACH_ACTIVE interface that loops over just the active vnodes associated with a mount point (typically less than 1% of the vnodes associated with the mount point). Reviewed by: kib Tested by: Peter Holm MFC after: 2 weeks
|
#
b76ec2db |
|
03-Mar-2012 |
Rick Macklem <rmacklem@FreeBSD.org> |
The name caching changes of r230394 exposed an intermittent bug in the new NFS server for NFSv4, where it would report ENOENT when the file actually existed on the server. This turned out to be caused by not initializing ni_topdir before calling lookup() and there was a rare case where the value on the stack location assigned to ni_topdir happened to be a pointer to a ".." entry, such that "dp == ndp->ni_topdir" succeeded in lookup(). This patch initializes ni_topdir to fix the problem. MFC after: 5 days
|
#
13b2772f |
|
15-Feb-2012 |
Rick Macklem <rmacklem@FreeBSD.org> |
Delete a couple of out of date comments that are no longer true in the new NFS client. Requested by: bde MFC after: 1 week
|
#
22ea9f58 |
|
15-Dec-2011 |
Rick Macklem <rmacklem@FreeBSD.org> |
Patch the new NFS server in a manner analogous to r228520 for the old NFS server, so that it correctly handles a count == 0 argument for Commit. PR: kern/118126 MFC after: 2 weeks
|
#
574862c8 |
|
01-Dec-2011 |
John Baldwin <jhb@FreeBSD.org> |
Enhance the sequential access heuristic used to perform readahead in the NFS server and reuse it for writes as well to allow writes to the backing store to be clustered. - Use a prime number for the size of the heuristic table (1017 is not prime). - Move the logic to locate a heuristic entry from the table and compute the sequential count out of VOP_READ() and into a separate routine. - Use the logic from sequential_heuristic() in vfs_vnops.c to update the seqcount when a sequential access is performed rather than just increasing seqcount by 1. This lets the clustering count ramp up faster. - Allow for some reordering of RPCs and if it is detected leave the current seqcount as-is rather than dropping back to a seqcount of 1. Also, when out of order access is encountered, cut seqcount in half rather than dropping it all the way back to 1 to further aid with reordering. - Fix the new NFS server to properly update the next offset after a successful VOP_READ() so that the readahead actually works. Some of these changes came from an earlier patch by Bjorn Gronwall that was forwarded to me by bde@. Discussed with: bde, rmacklem, fs@ Submitted by: Bjorn Gronwall (1, 4) MFC after: 2 weeks
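The seqcount behaviour listed above (ramp up faster than +1, tolerate some reordering, halve on a real out-of-order access) can be sketched roughly as follows. This is an illustration only, not the server's code and not sequential_heuristic() from vfs_vnops.c: the struct, the function seq_update(), and the constants SEQ_MAX, SEQ_BLOCK and REORDER_WINDOW are all hypothetical, as is the choice to leave the expected offset untouched when a reordered request is seen.

```c
#include <stdint.h>

#define SEQ_MAX        127     /* hypothetical cap on the sequential count */
#define SEQ_BLOCK      16384   /* hypothetical unit used to scale the ramp */
#define REORDER_WINDOW 131072  /* hypothetical tolerance for reordered RPCs */

/* Hypothetical per-file heuristic entry. */
struct seq_entry {
	uint64_t next_off;	/* offset expected for the next sequential request */
	int	 seqcount;	/* current sequential-access estimate, at least 1 */
};

/* Update the entry for a request at [off, off + len) and return seqcount. */
static int
seq_update(struct seq_entry *e, uint64_t off, uint64_t len)
{
	uint64_t gap = (off > e->next_off) ? off - e->next_off :
	    e->next_off - off;

	if (off == e->next_off) {
		/* Sequential: ramp by the amount of data, not just by 1. */
		uint64_t step = (len + SEQ_BLOCK - 1) / SEQ_BLOCK;

		if (step > SEQ_MAX)
			step = SEQ_MAX;
		e->seqcount += (int)step;
		if (e->seqcount > SEQ_MAX)
			e->seqcount = SEQ_MAX;
		e->next_off = off + len;
	} else if (gap <= REORDER_WINDOW) {
		/* Likely a reordered RPC: leave seqcount as-is. */
	} else {
		/* Out of order: cut seqcount in half, never below 1. */
		e->seqcount /= 2;
		if (e->seqcount < 1)
			e->seqcount = 1;
		e->next_off = off + len;
	}
	return (e->seqcount);
}
```

Under these assumed constants, two back-to-back 64K requests raise seqcount from 1 to 9, a request that arrives one slot early leaves it at 9, and a jump far away in the file drops it to 4 rather than 1.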
|
#
6854d648 |
|
21-Nov-2011 |
Rick Macklem <rmacklem@FreeBSD.org> |
This patch enables the new/default NFS server's use of shared vnode locking for read, readdir, readlink, getattr and access. It is hoped that this will improve server performance for these operations, since they will no longer be serialized for a given file/vnode.
|
#
8451d0dd |
|
16-Sep-2011 |
Kip Macy <kmacy@FreeBSD.org> |
In order to maximize the re-usability of kernel code in user space this patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls. Reviewed by: rwatson Approved by: re (bz)
|
#
322c8d9e |
|
02-Sep-2011 |
Rick Macklem <rmacklem@FreeBSD.org> |
Fix the NFS servers so that they can do a Lookup of "..", which requires that ni_strictrelative be set to 0, post-r224810. Tested by: swills (earlier version), geo dot liaskos at gmail.com Approved by: re (kib)
|
#
985a88e2 |
|
16-Aug-2011 |
Jonathan Anderson <jonathan@FreeBSD.org> |
Fix a merge conflict. r224086 added "goto out"-style error handling to nfssvc_nfsd(), in order to reliably call NFSEXITCODE() before returning. Our Capsicum changes, based on the old "return (error)" model, did not merge nicely. Approved by: re (kib), mentor (rwatson) Sponsored by: Google Inc
|
#
a9d2f8d8 |
|
10-Aug-2011 |
Robert Watson <rwatson@FreeBSD.org> |
Second-to-last commit implementing Capsicum capabilities in the FreeBSD kernel for FreeBSD 9.0: Add a new capability mask argument to fget(9) and friends, allowing system call code to declare what capabilities are required when an integer file descriptor is converted into an in-kernel struct file *. With options CAPABILITIES compiled into the kernel, this enforces capability protection; without, this change is effectively a no-op. Some cases require special handling, such as mmap(2), which must preserve information about the maximum rights at the time of mapping in the memory map so that they can later be enforced in mprotect(2) -- this is done by narrowing the rights in the existing max_protection field used for similar purposes with file permissions. In namei(9), we assert that the code is not reached from within capability mode, as we're not yet ready to enforce namespace capabilities there. This will follow in a later commit. Update two capability names: CAP_EVENT and CAP_KEVENT become CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they represent. Approved by: re (bz) Submitted by: jonathan Sponsored by: Google Inc
|
#
a9285ae5 |
|
16-Jul-2011 |
Zack Kirsch <zack@FreeBSD.org> |
Add DEXITCODE plumbing to NFS. Isilon has the concept of an in-memory exit-code ring that saves the last exit code of a function and allows for stack tracing. This is very helpful when debugging tough issues. This patch is essentially a no-op for BSD at this point, until we upstream the dexitcode logic itself. The patch adds DEXITCODE calls to every NFS function that returns an errno error code. A number of code paths were also reorganized to have single exit paths, to reduce code duplication. Submitted by: David Kwan <dkwan@isilon.com> Reviewed by: rmacklem Approved by: zml (mentor) MFC after: 2 weeks
|
#
68347a92 |
|
16-Jul-2011 |
Zack Kirsch <zack@FreeBSD.org> |
Simple find/replace of VOP_ISLOCKED -> NFSVOPISLOCKED. This is done so that NFSVOPISLOCKED can be modified later to add enhanced logging and assertions. Reviewed by: rmacklem Approved by: zml (mentor) MFC after: 2 weeks
|
#
a9989634 |
|
16-Jul-2011 |
Zack Kirsch <zack@FreeBSD.org> |
Simple find/replace of VOP_UNLOCK -> NFSVOPUNLOCK. This is done so that NFSVOPUNLOCK can be modified later to add enhanced logging and assertions. Reviewed by: rmacklem Approved by: zml (mentor) MFC after: 2 weeks
|
#
98f234f3 |
|
16-Jul-2011 |
Zack Kirsch <zack@FreeBSD.org> |
Simple find/replace of vn_lock -> NFSVOPLOCK. This is done so that NFSVOPLOCK can be modified later to add enhanced logging and assertions. Reviewed by: rmacklem Approved by: zml (mentor) MFC after: 2 weeks
|
#
c383087c |
|
16-Jul-2011 |
Zack Kirsch <zack@FreeBSD.org> |
Remove unnecessary thread pointer from VOPLOCK macros and current users. Reviewed by: rmacklem Approved by: zml (mentor) MFC after: 2 weeks
|
#
40435b74 |
|
16-Jul-2011 |
Zack Kirsch <zack@FreeBSD.org> |
Move nfsvno_pathconf to be accessible to sys/fs/nfs; no functionality change. Reviewed by: rmacklem Approved by: zml (mentor) MFC after: 2 weeks
|
#
b008a72c |
|
16-Jul-2011 |
Zack Kirsch <zack@FreeBSD.org> |
Small acl patch to return the aclerror that comes back from nfsrv_dissectacl(). This fixes a problem where ATTRNOTSUPP was being returned instead of BADOWNER. Reviewed by: rmacklem Approved by: zml (mentor) MFC after: 2 weeks
|
#
c5c142f6 |
|
03-Jun-2011 |
Rick Macklem <rmacklem@FreeBSD.org> |
Modify the new NFS server so that the NFSv3 Pathconf RPC doesn't return an error when the underlying file system lacks support for any of the four _PC_xxx values used, by falling back to default values. Tested by: avg MFC after: 2 weeks
|
#
694a586a |
|
21-May-2011 |
Rick Macklem <rmacklem@FreeBSD.org> |
Add a lock flags argument to the VFS_FHTOVP() file system method, so that callers can indicate the minimum vnode locking requirement. This will allow some file systems to choose to return a LK_SHARED locked vnode when LK_SHARED is specified for the flags argument. This patch only adds the flag. It does not change any file system to use it and all callers specify LK_EXCLUSIVE, so file system semantics are not changed. Reviewed by: kib
|
#
a0c2c369 |
|
07-May-2011 |
Rick Macklem <rmacklem@FreeBSD.org> |
Change the new NFS server so that it uses vfs.nfsd naming for its sysctls instead of vfs.newnfs. This separates the names from the ones used by the client.
|
#
78e4b1f8 |
|
05-May-2011 |
Rick Macklem <rmacklem@FreeBSD.org> |
Change the new NFS server so that it returns 0 when the f_bavail or f_ffree fields of "struct statfs" are negative, since the values that go on the wire are unsigned and will appear to be very large positive values otherwise. This makes the handling of a negative f_bavail compatible with the old/regular NFS server. MFC after: 2 weeks
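Why a negative count matters can be shown with a minimal sketch. The helper name clamp_for_wire() is hypothetical and this is not the server's code; it only illustrates that a signed negative value reinterpreted as unsigned becomes a very large number, and that clamping to 0 first avoids that.

```c
#include <stdint.h>

/*
 * Hypothetical helper: clamp a signed block or file count to 0 before
 * it is encoded as an unsigned value on the wire.
 */
static uint64_t
clamp_for_wire(int64_t count)
{
	return (count < 0 ? 0 : (uint64_t)count);
}
```

Without the clamp, a value such as -5 converted directly to uint64_t would be 2^64 - 5, which a client would read as an enormous amount of free space.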
|
#
a09001a8 |
|
14-Apr-2011 |
Rick Macklem <rmacklem@FreeBSD.org> |
Fix the experimental NFSv4 server so that it uses VOP_PATHCONF() to determine if a file system supports NFSv4 ACLs. Since VOP_PATHCONF() must be called with a locked vnode, the function is called before nfsvno_fillattr() and the result is passed in as an extra argument. MFC after: 2 weeks
|
#
07c0c166 |
|
14-Apr-2011 |
Rick Macklem <rmacklem@FreeBSD.org> |
Modify the experimental NFSv4 server so that it handles crossing of server mount points properly. The functions nfsvno_fillattr() and nfsv4_fillattr() were modified to take the extra arguments that are the mount point, a flag to indicate that it is a file system root and the mounted on fileno. The mount point argument needs to be busy when nfsvno_fillattr() is called, since the vp argument is not locked. Reviewed by: kib MFC after: 2 weeks
|
#
f659876f |
|
11-Apr-2011 |
Rick Macklem <rmacklem@FreeBSD.org> |
Vrele ni_startdir in the experimental NFS server for the case of NFSv2 getting an error return from VOP_MKNOD(). Without this patch, the server file system remains busy after an NFSv2 VOP_MKNOD() fails. MFC after: 2 weeks
|
#
806e2e4b |
|
10-Apr-2011 |
Rick Macklem <rmacklem@FreeBSD.org> |
Add some cleanup code to the module unload operation for the experimental NFS server, so that it doesn't leak memory when unloaded. However, unloading the NFSv4 server is not recommended, since all NFSv4 state will be lost by the unload and clients will have to recover the state after a server reload/restart as if the server crashed/rebooted. MFC after: 2 weeks
|
#
8d2f180e |
|
09-Apr-2011 |
Rick Macklem <rmacklem@FreeBSD.org> |
Add a VOP_UNLOCK() for the directory, when that is not what VOP_LOOKUP() returned. This fixes a bug in the experimental NFS server for the case where VFS_VGET() fails returning EOPNOTSUPP in the ReaddirPlus RPC, forcing the use of VOP_LOOKUP() instead. MFC after: 2 weeks
|
#
de5b1952 |
|
25-Feb-2011 |
Alexander Leidinger <netchild@FreeBSD.org> |
Add some FEATURE macros for various features (AUDIT/CAM/IPC/KTR/MAC/NFS/NTP/PMC/SYSV/...). No FreeBSD version bump, the userland application to query the features will be committed last and can serve as an indication of the availability if needed. Sponsored by: Google Summer of Code 2010 Submitted by: kibab Reviewed by: arch@ (parts by rwatson, trasz, jhb) X-MFC after: to be determined in last commit with code from this project
|
#
17f3095d |
|
05-Feb-2011 |
Alan Cox <alc@FreeBSD.org> |
Unless "cnt" exceeds MAX_COMMIT_COUNT, nfsrv_commit() and nfsvno_fsync() are incorrectly calling vm_object_page_clean(). They are passing the length of the range rather than the ending offset of the range. Perform the OFF_TO_IDX() conversion in vm_object_page_clean() rather than the callers. Reviewed by: kib MFC after: 3 weeks
|
#
5f73287a |
|
14-Jan-2011 |
Rick Macklem <rmacklem@FreeBSD.org> |
Modify the experimental NFSv4 server so that it posts a SIGUSR2 signal to the master nfsd daemon whenever the stable restart file has been modified. This will allow the master nfsd daemon to maintain an up to date backup copy of the file. This is enabled via the nfssvc() syscall, so that older nfsd daemons will not be signaled. Reviewed by: jhb MFC after: 1 week
|
#
52776c50 |
|
12-Jan-2011 |
Zack Kirsch <zack@FreeBSD.org> |
Clean up the experimental NFS server replay cache when the module is unloaded. Reviewed by: rmacklem Approved by: zml (mentor)
|
#
f9266eb1 |
|
08-Jan-2011 |
Rick Macklem <rmacklem@FreeBSD.org> |
Modify readdirplus in the experimental NFS server in a manner analogous to r216633 for the regular server. This change busies the file system so that VFS_VGET() is guaranteed to be using the correct mount point even during a forced dismount attempt. Since nfsd_fhtovp() is not called immediately before readdirplus, the patch is actually a clone of pjd@'s nfs_serv.c.4.patch instead of the one committed in r216633. Reviewed by: kib MFC after: 10 days
|
#
8974bc2f |
|
06-Jan-2011 |
Rick Macklem <rmacklem@FreeBSD.org> |
Since the VFS_LOCK_GIANT() code in the experimental NFS server is broken and the major file systems are now all mpsafe, modify the server so that it will only export mpsafe file systems. This was discussed on freebsd-fs@ and removes a fair bit of crufty code. MFC after: 12 days
|
#
47524363 |
|
05-Jan-2011 |
Rick Macklem <rmacklem@FreeBSD.org> |
Fix the experimental NFS server to use vfs_busyfs() instead of vfs_getvfs() so that the mount point is busied for the VFS_FHTOVP() call. This is analogous to r185432 for the regular NFS server. Reviewed by: kib MFC after: 12 days
|
#
90305aa3 |
|
03-Jan-2011 |
Rick Macklem <rmacklem@FreeBSD.org> |
Fix the nlm so that it no longer depends on the regular nfs client and, as such, can be loaded for the experimental nfs client without the regular client. Reviewed by: jhb MFC after: 2 weeks
|
#
fa5ecdd3 |
|
02-Jan-2011 |
Rick Macklem <rmacklem@FreeBSD.org> |
Fix the experimental NFS server so that it doesn't leak a reference count on the directory when creating device special files. MFC after: 2 weeks
|
#
c9aad40f |
|
02-Jan-2011 |
Rick Macklem <rmacklem@FreeBSD.org> |
Delete some cruft from the experimental NFS server that was only used by the OpenBSD port for its pseudo-fs. MFC after: 2 weeks
|
#
629fa50e |
|
02-Jan-2011 |
Rick Macklem <rmacklem@FreeBSD.org> |
Add checks for VI_DOOMED and vn_lock() failures to the experimental NFS server, to handle the case where an exported file system is forced dismounted while an RPC is in progress. Further commits will fix the cases where a mount point is used when the associated vnode isn't locked. Reviewed by: kib MFC after: 2 weeks
|
#
bd2fa726 |
|
28-Dec-2010 |
Rick Macklem <rmacklem@FreeBSD.org> |
Delete the nfsvno_localconflict() function in the experimental NFS server since it is no longer used and is broken. MFC after: 2 weeks
|
#
17891d00 |
|
25-Dec-2010 |
Rick Macklem <rmacklem@FreeBSD.org> |
Modify the experimental NFS server so that it uses LK_SHARED for RPC operations when it can. Since VFS_FHTOVP() currently always gets an exclusively locked vnode and is usually called at the beginning of each RPC, the RPCs for a given vnode will still be serialized. As such, passing a lock type argument to VFS_FHTOVP() would be preferable to doing the vn_lock() with LK_DOWNGRADE after the VFS_FHTOVP() call. Reviewed by: kib MFC after: 2 weeks
|
#
0cf42b62 |
|
24-Dec-2010 |
Rick Macklem <rmacklem@FreeBSD.org> |
Add an argument to nfsvno_getattr() in the experimental NFS server, so that it can avoid calling VOP_ISLOCKED() when the vnode is known to be locked. This will allow LK_SHARED to be used for these cases, which happen to be all the cases that can use LK_SHARED. This does not fix any bug, but it reduces the number of calls to VOP_ISLOCKED() and prepares the code so that it can be switched to using LK_SHARED in a future patch. Reviewed by: kib MFC after: 2 weeks
|
#
a852f40b |
|
24-Dec-2010 |
Rick Macklem <rmacklem@FreeBSD.org> |
Simplify vnode locking in the experimental NFS server's readdir functions. In particular, get rid of two bogus VOP_ISLOCKED() calls. Removing the VOP_ISLOCKED() calls is the only actual bug fixed by this patch. Reviewed by: kib MFC after: 2 weeks
|
#
63e1cb43 |
|
24-Dec-2010 |
Rick Macklem <rmacklem@FreeBSD.org> |
Since VOP_READDIR() for ZFS does not return monotonically increasing directory offset cookies, disable the UFS related loop that skips over directory entries at the beginning of the block for the experimental NFS server. This loop is required for UFS since it always returns directory entries starting at the beginning of the block that the requested directory offset is in. In discussion with pjd@ and mckusick@ it seems that this behaviour of UFS should maybe change, with this fix being an interim patch until then. This patch only fixes the experimental server, since pjd@ is working on a patch for the regular server. Discussed with: pjd, mckusick MFC after: 5 days
|
#
377c50f6 |
|
23-Oct-2010 |
Rick Macklem <rmacklem@FreeBSD.org> |
Modify the experimental NFSv4 server's file handle hash function to use the generic hash32_buf() function. Although adding the bytes seemed sufficient for UFS and ZFS, since most of the bytes are the same for file handles on the same volume, this might not be sufficient for other file systems. Use of a generic function also seems preferable to one specific to NFSv4. Suggested by: gleb.kurtsou at gmail.com MFC after: 10 days
|
#
91027b4e |
|
22-Oct-2010 |
Rick Macklem <rmacklem@FreeBSD.org> |
Modify the file handle hash function in the experimental NFS server so that it will work better for non-UFS file systems. The new function simply sums the bytes of the fh_fid field of fhandle_t. MFC after: 10 days
|
#
8a1b5ade |
|
21-Oct-2010 |
Rick Macklem <rmacklem@FreeBSD.org> |
Modify the experimental NFS server in a manner analogous to r214049 for the regular NFS server, so that it will not do a VOP_LOOKUP() of ".." when at the root of a file system when performing a ReaddirPlus RPC. MFC after: 10 days
|
#
a7d5f7eb |
|
19-Oct-2010 |
Jamie Gritton <jamie@FreeBSD.org> |
A new jail(8) with a configuration file, to replace the work currently done by /etc/rc.d/jail.
|
#
c7aafc24 |
|
18-Sep-2010 |
Rick Macklem <rmacklem@FreeBSD.org> |
Fix the experimental NFSv4 server so that it performs local VOP_ADVLOCK() unlock operations correctly. It was passing in F_SETLK instead of F_UNLCK as the operation for the unlock case. This only affected operation when local locking (vfs.newnfs.enable_locallocks=1) was enabled. MFC after: 1 week
|
#
a8437c97 |
|
14-Jun-2010 |
Rick Macklem <rmacklem@FreeBSD.org> |
Add MODULE_DEPEND() macros to the experimental NFS client and server so that the modules will load when kernels are built with none of the NFS* configuration options specified. I believe this resolves the problems reported by PR kern/144458 and the email on freebsd-stable@ posted by Dmitry Pryanishnikov on June 13. Tested by: kib PR: kern/144458 Reviewed by: kib MFC after: 1 week
|
#
ef9bfed8 |
|
03-Jun-2010 |
Robert Watson <rwatson@FreeBSD.org> |
Merge r205010 from head to stable/8: Update nfsrv_getsocksndseq() for changes in TCP internals since FreeBSD 6.x: - so_pcb is now guaranteed to be non-NULL and valid if a valid socket reference is held. - Need to check INP_TIMEWAIT and INP_DROPPED before assuming inp_ppcb is a tcpcb, as it might be a tcptw or NULL otherwise. - tp can never be NULL by the end of the function, so only check TCPS_ESTABLISHED before extracting tcpcb fields. The NFS server arguably incorporates too many assumptions about TCP internals, but fixing that is left for another day. Reviewed by: bz Reviewed and tested by: rmacklem Sponsored by: Juniper Networks Approved by: re (kib)
|
#
cf66cfa0 |
|
18-Apr-2010 |
Rick Macklem <rmacklem@FreeBSD.org> |
MFC: r206170 Harden the experimental NFS server a little, by adding extra checks in the readdir functions for non-positive byte count arguments. For the negative case, set it to the maximum allowable, since it was actually a large positive value (unsigned) on the wire. Also, fix up the readdir function comment a bit.
|
#
4e76f296 |
|
15-Apr-2010 |
Rick Macklem <rmacklem@FreeBSD.org> |
MFC: r206063 For the experimental NFS server, add a call to free the lookup path buffer for one case where it was missing when doing mkdir. This could have conceivably resulted in a leak of a buffer, but a leak was never observed during testing, so I suspect it would have occurred rarely, if ever, in practice.
|
#
5b418052 |
|
08-Apr-2010 |
Rick Macklem <rmacklem@FreeBSD.org> |
MFC: r205663 Patch the experimental NFS server in a manner analogous to r205661 for the regular NFS server, to ensure that ESTALE is returned to the client for all errors returned by VFS_FHTOVP().
|
#
54bde1fa |
|
04-Apr-2010 |
Rick Macklem <rmacklem@FreeBSD.org> |
Harden the experimental NFS server a little, by adding extra checks in the readdir functions for non-positive byte count arguments. For the negative case, set it to the maximum allowable, since it was actually a large positive value (unsigned) on the wire. Also, fix up the readdir function comment a bit. Suggested by: dillon AT apollo.backplane.com MFC after: 2 weeks
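A minimal sketch of the kind of check described here, assuming the count arrives as an unsigned 32-bit value that the server holds in a signed integer. The function name is hypothetical, and rejecting a zero count and clamping counts above the maximum are assumptions made for the sketch; the message only states that a negative count is set to the maximum allowable.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Returns the byte count to use, or -1 if the request should be rejected.
 * Rejecting zero and clamping oversized counts are assumptions of this sketch.
 */
int32_t
readdir_clamp_count(int32_t wire_count, int32_t max_count)
{
	assert(max_count > 0);
	if (wire_count == 0)
		return (-1);
	/* A negative value was a large unsigned count on the wire. */
	if (wire_count < 0 || wire_count > max_count)
		return (max_count);
	return (wire_count);
}
```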
|
#
15b28cb8 |
|
01-Apr-2010 |
Rick Macklem <rmacklem@FreeBSD.org> |
For the experimental NFS server, add a call to free the lookup path buffer for one case where it was missing when doing mkdir. This could have conceivably resulted in a leak of a buffer, but a leak was never observed during testing, so I suspect it would have occurred rarely, if ever, in practice. MFC after: 2 weeks
|
#
7482701c |
|
25-Mar-2010 |
Rick Macklem <rmacklem@FreeBSD.org> |
Patch the experimental NFS server in a manner analogous to r205661 for the regular NFS server, to ensure that ESTALE is returned to the client for all errors returned by VFS_FHTOVP(). MFC after: 2 weeks
|
#
2684bef6 |
|
11-Mar-2010 |
Robert Watson <rwatson@FreeBSD.org> |
Update nfsrv_getsocksndseq() for changes in TCP internals since FreeBSD 6.x: - so_pcb is now guaranteed to be non-NULL and valid if a valid socket reference is held. - Need to check INP_TIMEWAIT and INP_DROPPED before assuming inp_ppcb is a tcpcb, as it might be a tcptw or NULL otherwise. - tp can never be NULL by the end of the function, so only check TCPS_ESTABLISHED before extracting tcpcb fields. The NFS server arguably incorporates too many assumptions about TCP internals, but fixing that is left for another day. MFC after: 1 week Reviewed by: bz Reviewed and tested by: rmacklem Sponsored by: Juniper Networks
|
#
d3db09cb |
|
08-Jan-2010 |
Rick Macklem <rmacklem@FreeBSD.org> |
MFC: r200999 Modify the experimental server so that it uses VOP_ACCESSX(). This is necessary in order to enable NFSv4 ACL support. The argument to nfsvno_accchk() was changed to an accmode_t and the function nfsrv_aclaccess() was no longer needed and, therefore, deleted. Reviewed by: trasz
|
#
8da45f2c |
|
25-Dec-2009 |
Rick Macklem <rmacklem@FreeBSD.org> |
Modify the experimental server so that it uses VOP_ACCESSX(). This is necessary in order to enable NFSv4 ACL support. The argument to nfsvno_accchk() was changed to an accmode_t and the function nfsrv_aclaccess() was no longer needed and, therefore, deleted. Reviewed by: trasz MFC after: 2 weeks
|
#
6d54ecd3 |
|
08-Dec-2009 |
Rick Macklem <rmacklem@FreeBSD.org> |
MFC: r199715 Modify the experimental nfs server so that it falls back to using VOP_LOOKUP() when VFS_VGET() returns EOPNOTSUPP in the ReaddirPlus RPC. This patch is based upon one by pjd@ for the regular nfs server which has not yet been committed. It is needed when a ZFS volume is exported and ReaddirPlus (which almost always happens for NFSv4) is performed by a client. The patch also simplifies vnode lock handling somewhat. Tested by: gerrit at pmp.uni-hannover.de
|
#
52b239b0 |
|
08-Dec-2009 |
Rick Macklem <rmacklem@FreeBSD.org> |
MFC: r199616 Patch the experimental NFS server in a manner analogous to r197525, so that the creation verifier is handled correctly in va_atime for 64bit architectures. There were two problems. One was that the code incorrectly assumed that sizeof (struct timespec) == 8 and the other was that the tv_sec field needs to be assigned from a signed 32bit integer, so that sign extension occurs on 64bit architectures. This is required for correct operation when exporting ZFS volumes. Tested by: gerrit at pmp.uni-hannover.de Reviewed by: pjd
|
#
38e3ea69 |
|
23-Nov-2009 |
Rick Macklem <rmacklem@FreeBSD.org> |
Modify the experimental nfs server so that it falls back to using VOP_LOOKUP() when VFS_VGET() returns EOPNOTSUPP in the ReaddirPlus RPC. This patch is based upon one by pjd@ for the regular nfs server which has not yet been committed. It is needed when a ZFS volume is exported and ReaddirPlus (which almost always happens for NFSv4) is performed by a client. The patch also simplifies vnode lock handling somewhat. MFC after: 2 weeks
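The fallback pattern can be sketched in userland C as follows. This is not the kernel code: the two function-pointer types stand in for a lookup by file id and a lookup by name, and every name in the block (`entry_to_node`, the `sample_*` stubs, `struct node`) is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct node { unsigned long fileid; };

typedef int (*get_by_id_fn)(unsigned long, struct node **);
typedef int (*get_by_name_fn)(const char *, struct node **);

static struct node sample_node = { 42 };

/* Stand-in for a file system that cannot look up by file id. */
int
sample_get_by_id_unsupported(unsigned long fileid, struct node **out)
{
	(void)fileid;
	*out = NULL;
	return (EOPNOTSUPP);
}

/* Stand-in for a name lookup that knows a single entry. */
int
sample_get_by_name(const char *name, struct node **out)
{
	if (strcmp(name, "file") != 0) {
		*out = NULL;
		return (ENOENT);
	}
	*out = &sample_node;
	return (0);
}

/* Try the by-id path first; fall back to by-name only on EOPNOTSUPP. */
int
entry_to_node(get_by_id_fn by_id, get_by_name_fn by_name,
    unsigned long fileid, const char *name, struct node **out)
{
	int error;

	assert(out != NULL);
	error = by_id(fileid, out);
	if (error == EOPNOTSUPP)
		error = by_name(name, out);
	return (error);
}
```

Any other error from the by-id path is returned unchanged, so the slower name lookup is only used when the file system reports that the faster path is unsupported.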
|
#
086f6e0c |
|
20-Nov-2009 |
Rick Macklem <rmacklem@FreeBSD.org> |
Patch the experimental NFS server in a manner analogous to r197525, so that the creation verifier is handled correctly in va_atime for 64bit architectures. There were two problems. One was that the code incorrectly assumed that sizeof (struct timespec) == 8 and the other was that the tv_sec field needs to be assigned from a signed 32bit integer, so that sign extension occurs on 64bit architectures. This is required for correct operation when exporting ZFS volumes. Reviewed by: pjd MFC after: 2 weeks
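The two problems can be illustrated with a small userland sketch, assuming an 8-byte verifier that is split into two 32-bit words. The helper names are hypothetical and the code is not taken from the server; it only shows field-by-field assignment (instead of copying 8 bytes over a struct timespec that may be 16 bytes) and assignment of tv_sec from a signed 32-bit value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Store an 8-byte verifier in a timespec, one field at a time. */
void
verf_to_atime(const unsigned char verf[8], struct timespec *ts)
{
	int32_t words[2];

	assert(ts != NULL);
	memcpy(words, verf, sizeof(words));
	ts->tv_sec = words[0];	/* int32_t to time_t: sign extension */
	ts->tv_nsec = words[1];
}

/* Compare a stored timespec against a verifier using the same conversion. */
int
verf_matches(const unsigned char verf[8], const struct timespec *ts)
{
	int32_t words[2];

	assert(ts != NULL);
	memcpy(words, verf, sizeof(words));
	return (ts->tv_sec == words[0] && ts->tv_nsec == words[1]);
}
```

Because both the store and the comparison go through the same signed 32-bit conversion, a verifier with the high bit set compares equal regardless of whether time_t is 32 or 64 bits wide.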
|
#
eddfbb76 |
|
14-Jul-2009 |
Robert Watson <rwatson@FreeBSD.org> |
Build on Jeff Roberson's linker-set based dynamic per-CPU allocator (DPCPU), as suggested by Peter Wemm, and implement a new per-virtual network stack memory allocator. Modify vnet to use the allocator instead of monolithic global container structures (vinet, ...). This change solves many binary compatibility problems associated with VIMAGE, and restores ELF symbols for virtualized global variables. Each virtualized global variable exists as a "reference copy", and also once per virtual network stack. Virtualized global variables are tagged at compile-time, placing them in a special linker set, which is loaded into a contiguous region of kernel memory. Virtualized global variables in the base kernel are linked as normal, but those in modules are copied and relocated to a reserved portion of the kernel's vnet region with the help of the kernel linker. Virtualized global variables exist in per-vnet memory set up when the network stack instance is created, and are initialized statically from the reference copy. Run-time access occurs via an accessor macro, which converts from the current vnet and requested symbol to a per-vnet address. When "options VIMAGE" is not compiled into the kernel, normal global ELF symbols will be used instead and indirection is avoided. This change restores static initialization for network stack global variables, restores support for non-global symbols and types, eliminates the need for many subsystem constructors, eliminates large per-subsystem structures that caused many binary compatibility issues both for monitoring applications (netstat) and kernel modules, removes the per-function INIT_VNET_*() macros throughout the stack, eliminates the need for vnet_symmap ksym(2) munging, and eliminates duplicate definitions of virtualized globals under VIMAGE_GLOBALS. Bump __FreeBSD_version and update UPDATING. 
Portions submitted by: bz Reviewed by: bz, zec Discussed with: gnn, jamie, jeff, jhb, julian, sam Suggested by: peter Approved by: re (kensmith)
|
#
838d9858 |
|
19-Jun-2009 |
Brooks Davis <brooks@FreeBSD.org> |
Rework the credential code to support larger values of NGROUPS and NGROUPS_MAX, eliminate ABI dependencies on them, and raise them to 1024 and 1023 respectively. (Previously they were equal, but under a close reading of POSIX, NGROUPS_MAX was defined to be too large by 1 since it is the number of supplemental groups, not total number of groups.) The bulk of the change consists of converting the struct ucred member cr_groups from a static array to a pointer. Do the equivalent in kinfo_proc. Introduce new interfaces crcopysafe() and crsetgroups() for duplicating a process credential before modifying it and for setting group lists respectively. Both interfaces take care of the details of allocating the groups array. crsetgroups() takes care of truncating the group list to the current maximum (NGROUPS) if necessary. In the future, crsetgroups() may be responsible for ensuring invariants such as sorting the supplemental groups to allow groupmember() to be implemented as a binary search. Because we can not change struct xucred without breaking application ABIs, we leave it alone and introduce a new XU_NGROUPS value which is always 16 and is to be used or NGRPS as appropriate for things such as NFS which need to use no more than 16 groups. When feasible, truncate the group list rather than generating an error. Minor changes: - Reduce the number of hand rolled versions of groupmember(). - Do not assign to both cr_gid and cr_groups[0]. - Modify ipfw to cache ucreds instead of part of their contents since they are immutable once referenced by more than one entity. Submitted by: Isilon Systems (initial implementation) X-MFC after: never PR: bin/113398 kern/133867
|
#
705fe7ce |
|
31-May-2009 |
Marko Zec <zec@FreeBSD.org> |
Unbreak options VIMAGE kernel builds. Approved by: julian (mentor)
|
#
c3e22f83 |
|
26-May-2009 |
Rick Macklem <rmacklem@FreeBSD.org> |
Fix the experimental nfs subsystem so that it builds with the current NFSv4 ACLs, as defined in sys/acl.h. It still needs a way to test a mount point for NFSv4 ACL support before it will work. Until then, the NFSHASNFS4ACL() macro just always returns 0. Approved by: kib (mentor)
|
#
92f7f12b |
|
21-May-2009 |
Rick Macklem <rmacklem@FreeBSD.org> |
Fix the experimental nfs server so that it depends on the nlm, since it now calls nlm_acquire_next_sysid(). Approved by: kib (mentor)
|
#
b839e625 |
|
20-May-2009 |
Rick Macklem <rmacklem@FreeBSD.org> |
Modify sys/fs/nfsserver/nfs_nfsdport.c to use nlm_acquire_next_sysid() to set the l_sysid for locks correctly. Approved by: kib (mentor)
|
#
2c1b26b9 |
|
17-May-2009 |
Rick Macklem <rmacklem@FreeBSD.org> |
Fix the acquisition of local locks via VOP_ADVLOCK() by the experimental nfsv4 server. It was setting the a_id argument to a fixed value, but that wasn't sufficient for FreeBSD8. Instead, set l_pid and l_sysid to 0 plus set the F_REMOTE flag to indicate that these fields are used to check for same lock owner. Since, for NFSv4, a lockowner is a ClientID plus an up to 1024byte name, it can't be put in l_sysid easily. I also renamed the p variable to td, since it's a thread ptr. Approved by: kib (mentor)
|
#
57d1e464 |
|
17-May-2009 |
Rick Macklem <rmacklem@FreeBSD.org> |
Added a SYSCTL to sys/fs/nfsserver/nfs_nfsdport.c so that the value of nfsrv_dolocallocks can be changed via sysctl. I also added some non-empty descriptor strings and reformatted some overly long lines. Approved by: kib (mentor)
|
#
98ad4453 |
|
14-May-2009 |
Rick Macklem <rmacklem@FreeBSD.org> |
Apply changes to the experimental nfs server so that it uses the security flavors as exported in FreeBSD-CURRENT. This allows it to use a slightly modified mountd.c instead of a different utility. Approved by: kib (mentor)
|
#
7e745519 |
|
12-May-2009 |
Rick Macklem <rmacklem@FreeBSD.org> |
Modify the experimental nfs server to use the new nfsd_nfsd_args structure for nfsd. Includes a change that clarifies the use of an empty principal name string to indicate AUTH_SYS only. Approved by: kib (mentor)
|
#
1c6c0ed9 |
|
11-May-2009 |
Rick Macklem <rmacklem@FreeBSD.org> |
Change the name of the nfs server addsock structure from nfsd_args to nfsd_addsock_args, so that it is consistent with the one in sys/nfsserver/nfs.h. Approved by: kib (mentor)
|
#
70839889 |
|
11-May-2009 |
Rick Macklem <rmacklem@FreeBSD.org> |
Modify nfsvno_fhtovp() to ensure that it always sets the credp argument. Returning without credp set could result in a caller doing crfree() on garbage. Reviewed by: kan Approved by: kib (mentor)
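The hazard described here is a general one for out-parameters, and a minimal userland sketch of the fix looks like this. The names `lookup_with_cred`, `cred_release` and `struct cred` are hypothetical stand-ins, not the kernel's interfaces; in particular, whether a given release routine tolerates NULL is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct cred { int refs; };

/* Hypothetical release helper; like free(), it accepts NULL. */
void
cred_release(struct cred *cr)
{
	free(cr);
}

/*
 * Always assigns *credp, even on failure, so a caller that releases
 * the credential unconditionally never operates on stale stack data.
 */
int
lookup_with_cred(int handle_ok, struct cred **credp)
{
	assert(credp != NULL);
	*credp = NULL;		/* set before any early return */
	if (!handle_ok)
		return (ESTALE);
	*credp = calloc(1, sizeof(**credp));
	if (*credp == NULL)
		return (ENOMEM);
	(*credp)->refs = 1;
	return (0);
}
```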
|
#
dfd233ed |
|
11-May-2009 |
Attilio Rao <attilio@FreeBSD.org> |
Remove the thread argument from the FSD (File-System Dependent) parts of the VFS. Now all the VFS_* functions and related parts don't want the context as long as it always refers to curthread. In some points, in particular when dealing with VOPs and functions living in the same namespace (eg. vflush) which still need to be converted, pass curthread explicitly in order to retain the old behaviour. Such loose ends will be fixed ASAP. While here fix a bug: now, UFS_EXTATTR can be compiled alone without the UFS_EXTATTR_AUTOSTART option. VFS KPI is heavily changed by this commit so third-party modules need to be recompiled. Bump __FreeBSD_version in order to signal such situation.
|
#
5679fe19 |
|
09-May-2009 |
Alexander Kabaev <kan@FreeBSD.org> |
Do not embed struct ucred into larger netcred parent structures. Credential might need to hang around longer than its parent and be used outside of mnt_explock scope controlling netcred lifetime. Use separate reference-counted ucred allocated separately instead. While there, extend mnt_explock coverage in vfs_stdexpcheck and clean-up some unused declarations in new NFS code. Reported by: John Hickey PR: kern/133439 Reviewed by: dfr, kib
|
#
9ec7b004 |
|
04-May-2009 |
Rick Macklem <rmacklem@FreeBSD.org> |
Add the experimental nfs subtree to the kernel, that includes support for NFSv4 as well as NFSv2 and 3. It lives in 3 subdirs under sys/fs: nfs - functions that are common to the client and server nfsclient - a mutation of sys/nfsclient that calls generic functions to do RPCs and handle state. As such, it retains the buffer cache handling characteristics and vnode semantics that are found in sys/nfsclient, for the most part. nfsserver - the server. It includes a DRC designed specifically for NFSv4, that is used instead of the generic DRC in sys/rpc. The build glue will be checked in later, so at this point, it consists of 3 new subdirs that should not affect kernel building. Approved by: kib (mentor)
|