History log of /freebsd-10-stable/sys/fs/nfsserver/nfs_nfsdport.c
Revision Date Author Comments
# 347040 03-May-2019 rmacklem

MFC: r346365
Fix the NFSv4.0 server so that it does not support NFSv4.1 attributes.

During inspection of a packet trace, I noticed that an NFSv4.0 mount
reported that it supported attributes that are only defined for NFSv4.1.
In practice, this bug appears to be benign, since NFSv4.0 clients will
not use attributes that were added for NFSv4.1.
However, this was not correct and this patch fixes the NFSv4.0 server
so that it only supports attributes defined for NFSv4.0.
It also adds a definition for NFSv4.1 attributes that can only be set,
although it is only defined as 0 for now.
This is anticipation of the addition of support for the NFSv4.1 mode+mask
attribute soon.


# 340184 06-Nov-2018 avg

MFC r339595: nfsrvd_readdirplus: for some errors, do not fail the entire request

Sponsored by: Panzura


# 337006 31-Jul-2018 rmacklem

MFC: r336215
Ignore the cookie verifier for NFSv4.1 when the cookie is 0.

RFC5661 states that the cookie verifier should be 0 when the cookie is 0.
However, the wording is somewhat unclear and a recent discussion on the
nfsv4@ietf.org mailing list indicated that the NFSv4 server should ignore
the cookie verifier's value when the dirctory offset cookie is 0.
This patch deletes the check for this that would return NFSERR_BAD_COOKIE
when the verifier was not 0.
This was found during testing of the ESXi client against the NFSv4.1 server.


# 317914 07-May-2017 rmacklem

MFC: r317236
Fix the setting of atime for Linux client NFSv4 mounts.

The FreeBSD NFSv4 server did not set the attribute bit for TimeAccess in
the reply to an Open with exclusive_create, as required by the RFCs.
(This is required since the FreeBSD NFS server stores the create_verifier
in the va_atime attribute.)
As such, the Linux NFSv4 client did not set the TimeAccess (atime) in
the Setattr done in an RPC after the one with the Open/exclusive_create.
This patch fixes the server to set the TimeAccess bit in the reply.

I believe that storing the create_verifier in an extended attribute for
file systems that support extended attributes might be a good idea,
but I will wait for a discussion of this on the freebsd-fs@ email list
before considering committing a patch to do this.


# 312421 19-Jan-2017 jpaetzel

MFC 311122

Workaround NFS bug with readdirplus when there are greater than 1 billion files in a filesystem.

Reviewed by: kib


# 306663 03-Oct-2016 rmacklem

Revert r306659 since the userland changes won't merge and this would
break the build.


# 306659 03-Oct-2016 rmacklem

MFC: r304026
Update the nfsstats structure to include the changes needed by
the patch in D1626 plus changes so that it includes counts for
NFSv4.1 (and the draft of NFSv4.2).
Also, make all the counts uint64_t and add a vers field at the
beginning, so that future revisions can easily be implemented.
There is code in place to handle the old vesion of the nfsstats
structure for backwards binary compatibility.

Subsequent commits will update nfsstat(8) to use the new fields.


# 299208 06-May-2016 rmacklem

MFC: r297869
If the VOP_SETATTR() call that saves the exclusive create verifier failed,
the NFS server would leave the newly created vnode locked. This could
result in a file system that would not unmount and processes wedged,
waiting for the file to be unlocked.
Since this VOP_SETATTR() never fails for most file systems, this bug
doesn't normally manifest itself. I found it during testing of an
exported GlusterFS file system, which can fail.
This patch adds the vput() and changes the error to the correct NFS one.


# 292223 14-Dec-2015 rmacklem

MFC: r291527
Add kernel support to the NFS server for the "-manage-gids"
option that will be added to the nfsuserd daemon in a future
commit. It modifies the cache used by NFSv4 for name<-->id
translation (both username/uid and group/gid) to support this.
When "-manage-gids" is set, the server looks up each uid
for the RPC and uses the list of groups cached in the server
instead of the list of groups provided in the RPC request.
The cached group list is acquired for the cache by the nfsuserd
daemon via getgrouplist(3).
This avoids the 16 groups limit for the list in the RPC request.
Since the cache is now used for every RPC when "-manage-gids"
is enabled, the code also modifies the cache to use a separate
mutex for each hash list instead of a single global mutex.


# 284216 10-Jun-2015 rmacklem

MFC: r283635
Make the size of the hash tables used by the NFSv4 server tunable.
No appreciable change in performance was observed after increasing
the sizes of these tables and then testing with a single client.
However, there was an email that indicated high CPU overheads for
a heavily loaded NFSv4 and it is hoped that increasing the sizes
of the hash tables via these tunables might help.
The tables remain the same size by default.


# 282270 30-Apr-2015 rmacklem

MFC: r281562
File systems that do not use the buffer cache (such as ZFS) must
use VOP_FSYNC() to perform the NFS server's Commit operation.
This patch adds a mnt_kern_flag called MNTK_USES_BCACHE which
is set by file systems that use the buffer cache. If this flag
is not set, the NFS server always does a VOP_FSYNC().
This should be ok for old file system modules that do not set
MNTK_USES_BCACHE, since calling VOP_FSYNC() is correct, although
it might not be optimal for file systems that use the buffer cache.


# 280258 19-Mar-2015 rwatson

Merge r263233 from HEAD to stable/10:

Update kernel inclusions of capability.h to use capsicum.h instead; some
further refinement is required as some device drivers intended to be
portable over FreeBSD versions rely on __FreeBSD_version to decide whether
to include capability.h.

Sponsored by: Google, Inc.


# 276437 30-Dec-2014 rmacklem

MFC: r276193
A deadlock in the NFSv4 server with vfs.nfsd.enable_locallocks=1
was reported via email. This was caused by a LOR between the
sleep lock used to serialize the local locking (nfsrv_locklf())
and locking the vnode. I believe this patch fixes the problem
by delaying relocking of the vnode until the sleep lock is
unlocked (nfsrv_unlocklf()). To avoid nfsvno_advlock() having the side
effect of unlocking the vnode, unlocking the vnode was moved to before
the functions that call nfsvno_advlock().
It shouldn't affect the execution of the default case where
vfs.nfsd.enable_locallocks=0.


# 269655 07-Aug-2014 kib

MFC r269347:
Do not generate 1000 unique lock names for nfsrc hash chain locks.
Shorten the names of some nfs mutexes.


# 269452 02-Aug-2014 rmacklem

MFC: r268273
The new NFSv3 server did not generate directory postop attributes for
the reply to ReaddirPlus when the server failed within the loop
that calls VFS_VGET(). This failure is most likely an error
return from VFS_VGET() caused by a bogus d_fileno that was
truncated to 32bits.
This patch fixes the server so that it will return directory postop
attributes for the failure. It does not fix the underlying issue caused
by d_fileno being uint32_t when a file system like ZFS generates
a fileno that is greater than 32bits.


# 269398 01-Aug-2014 rmacklem

MFC: r268115
Merge the NFSv4.1 server code in projects/nfsv4.1-server over
into head. The code is not believed to have any effect
on the semantics of non-NFSv4.1 server behaviour.
It is a rather large merge, but I am hoping that there will
not be any regressions for the NFS server.


# 268961 21-Jul-2014 bdrewery

MFC r268114:

Change NFS readdir() to only ignore cookies preceding the given offset for
UFS rather than for all but ZFS.


# 265714 08-May-2014 rmacklem

MFC: r265252
The new draft specification for NFSv4.0 specifies that a server
should either accept owner and owner_group strings that are just
the digits of the uid/gid or return NFS4ERR_BADOWNER.
This patch adds a sysctl vfs.nfsd.enable_stringtouid, which can
be set to enable the server w.r.t. accepting numeric string. It
also ensures that NFS4ERR_BADOWNER is returned if numeric uid/gid
strings are not enabled. This fixes the server for recent Linux
nfs4 clients that use numeric uid/gid strings by default.


# 265667 08-May-2014 rmacklem

MFC: r264888
The PR reported that the old NFS server did not set uio_td == NULL
for the VOP_READ() call. This patch fixes both the old and new
server for this case.


# 261055 22-Jan-2014 mav

MFC r260229, r260258, r260367, r260390, r260459, r260648:
Rework NFS Duplicate Request Cache cleanup logic.

- Introduce additional hash to group requests by hash of sockref. This
allows to process TCP acknowledgements without looping though all the cache,
and as result allows to do it every time.
- Indroduce additional callbacks to notify application layer about sockets
disconnection. Without this last few requests processed just before socket
disconnection never processed their ACKs and stuck in cache for many hours.
- Implement transport-specific method for tracking reply acknowledgements.
New implementation does not cross multiple stack layers to get the data and
does not have race conditions that previously made some requests stuck
in cache. This could be done more efficiently at sockbuf layer, but that
would broke some KBIs, while I don't know other consumers for it aside NFS.
- Instead of traversing all DRC twice per request, run cleaning only once
per request, and except in some conditions traverse only single hash slot
at a time.

Together this limits NFS DRC growth only to situations of real connectivity
problems. If network is working well, and so all replies are acknowledged,
cache remains almost empty even after hours of heavy load. Without this
change on the same test cache was growing to many thousand requests even
with perfectly working local network.

As another result this reduces CPU time spent on the DRC handling during
SPEC NFS benchmark from about 10% to 0.5%.

Sponsored by: iXsystems, Inc.


# 260159 01-Jan-2014 rmacklem

MFC: r259854
The NFSv4 server would call VOP_SETATTR() with a shared locked vnode
when a Getattr for a file is done by a client other than the one that
holds the file's delegation. This would only happen when delegations
are enabled and the problem is fixed by this patch.


# 260144 31-Dec-2013 rmacklem

MFC: r259845
An intermittent problem with NFSv4 exporting of ZFS snapshots was
reported to the freebsd-fs mailing list. I believe the problem was
caused by the Readdir operation using VFS_VGET() for a snapshot file entry
instead of VOP_LOOKUP(). This would not occur for NFSv3, since it
will do a VFS_VGET() of "." which fails with ENOTSUPP at the beginning
of the directory, whereas NFSv4 does not check "." or "..". This
patch adds a call to VFS_VGET() for the directory being read to check
for ENOTSUPP.
I also observed that the mount_on_fileid and fsid attributes were
not correct at the snapshot's auto mountpoints when looking at packet
traces for the Readdir. This patch fixes the attributes by doing a check
for different v_mount structure, even if the vnode v_mountedhere is not
set.


# 284216 10-Jun-2015 rmacklem

MFC: r283635
Make the size of the hash tables used by the NFSv4 server tunable.
No appreciable change in performance was observed after increasing
the sizes of these tables and then testing with a single client.
However, there was an email that indicated high CPU overheads for
a heavily loaded NFSv4 and it is hoped that increasing the sizes
of the hash tables via these tunables might help.
The tables remain the same size by default.


# 282270 30-Apr-2015 rmacklem

MFC: r281562
File systems that do not use the buffer cache (such as ZFS) must
use VOP_FSYNC() to perform the NFS server's Commit operation.
This patch adds a mnt_kern_flag called MNTK_USES_BCACHE which
is set by file systems that use the buffer cache. If this flag
is not set, the NFS server always does a VOP_FSYNC().
This should be ok for old file system modules that do not set
MNTK_USES_BCACHE, since calling VOP_FSYNC() is correct, although
it might not be optimal for file systems that use the buffer cache.


# 280258 19-Mar-2015 rwatson

Merge r263233 from HEAD to stable/10:

Update kernel inclusions of capability.h to use capsicum.h instead; some
further refinement is required as some device drivers intended to be
portable over FreeBSD versions rely on __FreeBSD_version to decide whether
to include capability.h.

Sponsored by: Google, Inc.


# 276437 30-Dec-2014 rmacklem

MFC: r276193
A deadlock in the NFSv4 server with vfs.nfsd.enable_locallocks=1
was reported via email. This was caused by a LOR between the
sleep lock used to serialize the local locking (nfsrv_locklf())
and locking the vnode. I believe this patch fixes the problem
by delaying relocking of the vnode until the sleep lock is
unlocked (nfsrv_unlocklf()). To avoid nfsvno_advlock() having the side
effect of unlocking the vnode, unlocking the vnode was moved to before
the functions that call nfsvno_advlock().
It shouldn't affect the execution of the default case where
vfs.nfsd.enable_locallocks=0.


# 269655 07-Aug-2014 kib

MFC r269347:
Do not generate 1000 unique lock names for nfsrc hash chain locks.
Shorten the names of some nfs mutexes.


# 269452 02-Aug-2014 rmacklem

MFC: r268273
The new NFSv3 server did not generate directory postop attributes for
the reply to ReaddirPlus when the server failed within the loop
that calls VFS_VGET(). This failure is most likely an error
return from VFS_VGET() caused by a bogus d_fileno that was
truncated to 32bits.
This patch fixes the server so that it will return directory postop
attributes for the failure. It does not fix the underlying issue caused
by d_fileno being uint32_t when a file system like ZFS generates
a fileno that is greater than 32bits.


# 269398 01-Aug-2014 rmacklem

MFC: r268115
Merge the NFSv4.1 server code in projects/nfsv4.1-server over
into head. The code is not believed to have any effect
on the semantics of non-NFSv4.1 server behaviour.
It is a rather large merge, but I am hoping that there will
not be any regressions for the NFS server.


# 268961 21-Jul-2014 bdrewery

MFC r268114:

Change NFS readdir() to only ignore cookies preceding the given offset for
UFS rather than for all but ZFS.


# 265714 08-May-2014 rmacklem

MFC: r265252
The new draft specification for NFSv4.0 specifies that a server
should either accept owner and owner_group strings that are just
the digits of the uid/gid or return NFS4ERR_BADOWNER.
This patch adds a sysctl vfs.nfsd.enable_stringtouid, which can
be set to enable the server w.r.t. accepting numeric string. It
also ensures that NFS4ERR_BADOWNER is returned if numeric uid/gid
strings are not enabled. This fixes the server for recent Linux
nfs4 clients that use numeric uid/gid strings by default.


# 265667 08-May-2014 rmacklem

MFC: r264888
The PR reported that the old NFS server did not set uio_td == NULL
for the VOP_READ() call. This patch fixes both the old and new
server for this case.


# 261055 22-Jan-2014 mav

MFC r260229, r260258, r260367, r260390, r260459, r260648:
Rework NFS Duplicate Request Cache cleanup logic.

- Introduce additional hash to group requests by hash of sockref. This
allows to process TCP acknowledgements without looping though all the cache,
and as result allows to do it every time.
- Indroduce additional callbacks to notify application layer about sockets
disconnection. Without this last few requests processed just before socket
disconnection never processed their ACKs and stuck in cache for many hours.
- Implement transport-specific method for tracking reply acknowledgements.
New implementation does not cross multiple stack layers to get the data and
does not have race conditions that previously made some requests stuck
in cache. This could be done more efficiently at sockbuf layer, but that
would broke some KBIs, while I don't know other consumers for it aside NFS.
- Instead of traversing all DRC twice per request, run cleaning only once
per request, and except in some conditions traverse only single hash slot
at a time.

Together this limits NFS DRC growth only to situations of real connectivity
problems. If network is working well, and so all replies are acknowledged,
cache remains almost empty even after hours of heavy load. Without this
change on the same test cache was growing to many thousand requests even
with perfectly working local network.

As another result this reduces CPU time spent on the DRC handling during
SPEC NFS benchmark from about 10% to 0.5%.

Sponsored by: iXsystems, Inc.


# 260159 01-Jan-2014 rmacklem

MFC: r259854
The NFSv4 server would call VOP_SETATTR() with a shared locked vnode
when a Getattr for a file is done by a client other than the one that
holds the file's delegation. This would only happen when delegations
are enabled and the problem is fixed by this patch.


# 260144 31-Dec-2013 rmacklem

MFC: r259845
An intermittent problem with NFSv4 exporting of ZFS snapshots was
reported to the freebsd-fs mailing list. I believe the problem was
caused by the Readdir operation using VFS_VGET() for a snapshot file entry
instead of VOP_LOOKUP(). This would not occur for NFSv3, since it
will do a VFS_VGET() of "." which fails with ENOTSUPP at the beginning
of the directory, whereas NFSv4 does not check "." or "..". This
patch adds a call to VFS_VGET() for the directory being read to check
for ENOTSUPP.
I also observed that the mount_on_fileid and fsid attributes were
not correct at the snapshot's auto mountpoints when looking at packet
traces for the Readdir. This patch fixes the attributes by doing a check
for different v_mount structure, even if the vnode v_mountedhere is not
set.