History log of /freebsd-current/sys/fs/nfsclient/nfs_clvnops.c
Revision Date Author Comments
# 03a39a17 27-Apr-2024 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Clear out a lot of cruft related to B_DIRECT

There is only one place in the unpatched sources where B_DIRECT is
set in the NFS client and this code is never executed. As such, this patch
removes this code that is never executed, since B_DIRECT should never
be set.

During a IETF testing event this week, I saw a crash in ncl_doio_directwrite(),
but this function is only called if B_DIRECT is set.
I cannot explain how ncl_doio_directwrite() got called, but once this patch
was applied to the sources, the crash did not recur. This is not surprising,
since this patch deleted the function.

Reviewed by: kib, markj
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D44980


# d00c64bb 11-Apr-2024 Zaphrod Beeblebrox <zbeeble@gmail.com>

nfscl: Purge name cache when readdir_plus is done

The author reported that this patch was needed to avoid
crashes on a fairly busy RISC-V system. The author did not
provide details w.r.t. the crashes. Although I
have not seen any such crash, the patch looks reasonable
and I have not found any regressions when testing it.

Since "rdirplus" is not a default option, the patch is
only needed if you are doing NFS mounts with the "rdirplus"
mount option and seeing crashes related to the name cache.

MFC after: 1 week


# cc760de2 11-Jan-2024 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Only update atime for Copy when noatime is not specified

Commit 57ce37f9dcd0 modified the NFSv4.2 Copy operation so that
it will update atime on the infd file whenever possible.
This is done by adding a Setattr of TimeAccess for the
input file.

This patch disables this change for the case of an NFSv4.2
mount with the "noatime" mount option, which avoids the
additional Setattr of TimeAccess operation.

MFC after: 1 week


# b068bb09 07-Jan-2024 Konstantin Belousov <kib@FreeBSD.org>

Add vnode_pager_clean_{a,}sync(9)

Bump __FreeBSD_version for ZFS use.

Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D43356


# 656d2e83 30-Dec-2023 Konstantin Belousov <kib@FreeBSD.org>

nfsclient: eliminate ncl_writebp()

Use plain bufwrite() instead. ncl_writebp() evolved to mostly repeat
bufwrite() code with some ommisions, most notably runningbufspace
accounting.

Reviewed by: imp, markj, rmacklem
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D43249


# 47ec00d9 29-Dec-2023 Konstantin Belousov <kib@FreeBSD.org>

nfsclient: flush dirty pages of the vnode

before ncl_flush() when done to ensure that the server sees our cached
data, because it potentially changes the server response. This is
relevant for copy_file_range(), seek(), and allocate().

Convert LK_SHARED invp lock into LK_EXCLUSIVE if needed to properly call
vm_object_page_clean().

Reported by: asomers
PR: 276002
Noted and reviewed by: rmacklem
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D43250


# 7dae1467 29-Dec-2023 Konstantin Belousov <kib@FreeBSD.org>

nfsclient copy_file_range(): flush dst vnode data

Otherwise server-side copy makes the client cache inconsistent with the
server data.

Reported by: asomers
PR: 276002
Reviewed by: rmacklem
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D43250


# 6fa843f6 11-Dec-2023 Mark Johnston <markj@FreeBSD.org>

nfsclient: Propagate copyin() errors from nfsm_uiombuf()

Approved by: so
Security: SA-23:18.nfsclient
Reviewed by: rmacklem
Sponsored by: The FreeBSD Foundation


# c5405d1c 18-Nov-2023 Konstantin Belousov <kib@FreeBSD.org>

vn_copy_file_range(): provide ENOSYS fallback to vn_generic_copy_file_range()

Reviewed by: markj, Olivier Certner <olce.freebsd@certner.fr>
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D42603


# 196787f7 20-Oct-2023 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Use Claim_Null_FH and Claim_Deleg_Cur_FH

For NFSv4.1/4.2, there are two new options for the Open operation.
These two options use the file handle for the file instead of the
file handle for the directory plus a file name. By doing so, the
client code is simplified (it no longer needs the "nfsv4node" structure
attached to the NFS vnode). It also avoids problems caused by another
NFS client (or process running locally in the NFS server) doing a
rename or remove of the file name between the Lookup and Open.

Unfortunately, there was a bug (fixed recently by commit X)
in the NFS server which mis-parsed the Claim_Deleg_Cur_FH
arguments. To allow this patch to work with the broken FreeBSD
NFSv4.1/4.2 server, NFSMNTP_BUGGYFBSDSRV is defined and is set
when a correctly formatted Claim_Deleg_Cur_FH fails with NFSERR_EXPIRED.
(This is what the old, broken NFS server does, since it erroneously
uses the Getattr arguments as a stateID.) Once this flag is set,
the client fills in a stateID, to make the broken NFS server happy.

Tested at a recent IETF NFSv4 Bakeathon.

MFC after: 1 month


# 57ce37f9 18-Oct-2023 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Make NFSv4.2 Copy set atime on infd

RFC7862 does not specify infile atime behaviour when a NFSv4.2 Copy
operation is performed. Since the collective opinion of a mailing
list discussion (on freebsd-hackers@) seemed to indicate that
copy_file_range(2) should update atime on the infd,
even if there is no data copied, this
patch attempts to ensure that behaviour.

For Copy, it preceeds the Copy operation with a Setattr of
TimeAccess_Set(NFSv4. speak for atime) for the invp. For the case
where no data will be copied, it does a Setattr RPC to set
TimeAccess_Set for the invp.

A __FreeBSD_version bump will be done as a separate commit, since
this patch changes the internal interface between the nfscommon and
nfscl modules.

MFC after: 1 month


# 969071be 06-Sep-2023 Martin Matuska <mm@FreeBSD.org>

vfs: copy_file_range() between multiple mountpoints of the same fs type

VOP_COPY_FILE_RANGE(9) is now caled when source and target vnodes
reside on the same filesystem type (not just on the same mountpoint).
The check if vnodes are on the same mountpoint must be done in the
filesystem code. There are currently only three users - fusefs(5) already
has this check, ZFS can handle multiple mountpoints and a check has been
added to NFS client.

ZFS block cloning is now possible between all snapshots and datasets
of the same ZFS pool.

MFC after: 1 week
Reviewed by: rmacklem
Differential Revision: https://reviews.freebsd.org/D41721


# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# 1512579a 27-Mar-2023 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Make coverity happy

Coverity does not like code that checks a function's
return value sometimes. Add "(void)" in front of the
function when the return value does not matter to try
and make it happy.

Reported by: emaste
MFC after: 3 months


# 896516e5 16-Mar-2023 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Add a new NFSv4.1/4.2 mount option for Kerberized mounts

Without this patch, a Kerberized NFSv4.1/4.2 mount must provide
a Kerberos credential for the client at mount time. This credential
is typically referred to as a "machine credential". It can be
created one of two ways:
- The user (usually root) has a valid TGT at the time the mount
is done and this becomes the machine credential.
There are two problems with this.
1 - The user doing the mount must have a valid TGT for a user
principal at mount time. As such, the mount cannot be put
in fstab(5) or similar.
2 - When the TGT expires, the mount breaks.
- The client machine has a service principal in its default keytab
file and this service principal (typically called a host-based
initiator credential) is used as the machine credential.
There are problems with this approach as well:
1 - There is a certain amount of administrative overhead creating
the service principal for the NFS client, creating a keytab
entry for this principal and then copying the keytab entry
into the client's default keytab file via some secure means.
2 - The NFS client must have a fixed, well known, DNS name, since
that FQDN is in the service principal name as the instance.

This patch uses a feature of NFSv4.1/4.2 called SP4_NONE, which
allows the state maintenance operations to be performed by any
authentication mechanism, to do these operations via AUTH_SYS
instead of RPCSEC_GSS (Kerberos). As such, neither of the above
mechanisms is needed.

It is hoped that this option will encourage adoption of Kerberized
NFS mounts using TLS, to provide a more secure NFS mount.

This new NFSv4.1/4.2 mount option, called "syskrb5" must be used
with "sec=krb5[ip]" to avoid the need for either of the above
Kerberos setups to be done by the client.

Note that all file access/modification operations still require
users on the NFS client to have a valid TGT recognized by the
NFSv4.1/4.2 server. As such, this option allows, at most, a
malicious client to do some sort of DOS attack.

Although not required, use of "tls" with this new option is
encouraged, since it provides on-the-wire encryption plus,
optionally, client identity verification via a X.509
certificate provided to the server during TLS handshake.
Alternately, "sec=krb5p" does provide on-the-wire
encryption of file data.

A mount_nfs(8) man page update will be done in a separate commit.

Discussed on: freebsd-current@
MFC after: 3 months


# 847967bc 08-Feb-2023 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Fix interaction between mmap'd and VOP_WRITE file updates

asomers@ found a problem with the NFS client, where a write to
an NFS mounted file done via mmap(2) was lost when fspacectl(2)
was done before it. This turned out to be caused by clearing the
dirty bit on pages when the client was doing commit RPCs,
due to the second argument to vfs_busy_pages() being set to 1.
Commit RPCs tell the server to commit previously written data to
stable storage. However, Commit RPCs do not write data from the
client to the server. As such, if the dirty bit on the page has
been set by a mmap'd write to an address in the page, it should
not be cleared. Clearing it causes the mmap'd write to by lost.

This patch fixes the problem by changing the 2nd argument to
vfs_busy_pages() to 0 for this case.

I doubt this bug has affected many, since it was inherited from
the old NFS client and was in 4.3 FreeBSD twenty years ago.
Although fspacectl(2) is FreeBSD 14 specific, a write(2) would
cause the same failure.

Reviewed by: kib
Tested by: asomers
PR: 269328
MFC after: 1 week


# a82308ab 01-Oct-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfs_clvnops.c: Fix access to v_mount when vnode unlocked

Commit ab17854f974b fixed access to v_mount when the
vnode is unlocked for nfs_copy_file_range().

This patch does the same for nfs_advlockasync().

MFC after: 1 week


# bffb3d94 01-Oct-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfs_clvnops.c: Fix access to v_mount when vnode unlocked

Commit ab17854f974b fixed access to v_mount when the
vnode is unlocked for nfs_copy_file_range().

This patch does the same for nfs_ioctl().

Reviewed by: kib, markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D36846


# ab17854f 26-Sep-2022 Konstantin Belousov <kib@FreeBSD.org>

nfsclient: access v_mount only after the vnode is locked

and we checked that it is not reclaimed.

Reviewed by: markj, rmacklem
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D36722


# 52360ca3 25-Sep-2022 Alan Somers <asomers@FreeBSD.org>

copy_file_range: truncate write if it would exceed RLIMIT_FSIZE

PR: 266611
MFC after: 2 weeks
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D36706


# 5b5b7e2c 17-Sep-2022 Mateusz Guzik <mjg@FreeBSD.org>

vfs: always retain path buffer after lookup

This removes some of the complexity needed to maintain HASBUF and
allows for removing injecting SAVENAME by filesystems.

Reviewed by: kib (previous version)
Differential Revision: https://reviews.freebsd.org/D36542


# 33721eb9 04-Sep-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Allow "nolockd" to work for NFSv4 mounts

Commit 40ada74ee1da modified the NFSv4.1/4.2 client so
that it would issue a DestroySession to the server when
all session slots are marked bad. This handles the
case where session slots get broken when "intr" or "soft"
NFSv4 fairly well.1/4.2 mounts are done.

There are two other cases where having an NFSv4.1/4.2
RPC attempt terminate without completion can leave
state in a non-determinate condition.

One is file locking RPCs. If the "nolockd" option is
used, this avoids file locking RPCs by doing locking
locally within the client.

The other is Open locks, but since all FreeBSD Open
locks are done with OPEN_SHARE_DENY_NONE, the locking
state for these should not be critical.

This patch enables use of "nolockd" for NFSv4 mounts,
so that it can be combined with "intr" and/or "soft",
making the latter more usable.

Use of "intr" or "soft" NFSv4 mounts are still not
recommended, but when combined with "nolockd" should
now work fairly well.

A man page update will be done as a separate commit.

MFC after: 2 weeks


# 3c4266ed 17-Jun-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Clean up the code by removing unused arguments

The "void *stuff" (also called fstuff and dstuff) argument
was used by the Mac OSX port. For FreeBSD, this argument
is always NULL, so remove it to clean up the code.

This commit gets rid of "stuff" for assorted functions
defined in nfs_clrpcops.c and called in nfs_clvnops.c and
nfs_clstate.c.

This commit should not result in a semantics change.


# 1e70163c 17-Jun-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Clean up the code by removing unused arguments

The "void *stuff" (also called fstuff and dstuff) argument
was used by the Mac OSX port. For FreeBSD, this argument
is always NULL, so remove it to clean up the code.

This commit gets rid of "stuff" for assorted functions
defined in nfs_clrpcops.c and called in nfs_clvnops.c and
nfs_clvfsops.c. Future commits will do the same for other functions.

This commit should not result in a semantics change.


# c692ea40 16-Jun-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Clean up the code by removing unused arguments

The "void *stuff" (also called fstuff and dstuff) argument
was used by the Mac OSX port. For FreeBSD, this argument
is always NULL, so remove it to clean up the code.

This commit gets rid of "stuff" for assorted functions
defined in nfs_clrpcops.c and called in nfs_clvnops.c.
Future commits will do the same for other functions.

This commit should not result in a semantics change.


# af6665e0 16-Jun-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Clean up the code by removing unused arguments

The "void *stuff" (also called fstuff and dstuff) argument
was used by the Mac OSX port. For FreeBSD, this argument
is always NULL, so remove it to clean up the code.

This commit gets rid of "stuff" for assorted functions
defined in nfs_clrpcops.c and called in nfs_clvnops.c.
Future commits will do the same for other functions.

This commit should not result in a semantics change.


# 8cb42d69 15-Jun-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Clean up the code by removing unused arguments

The "void *stuff" (also called fstuff and dstuff) argument
was used by the Mac OSX port. For FreeBSD, this argument
is always NULL, so remove it to clean up the code.

This commit gets rid of "stuff" for assorted functions
defined in nfs_clrpcops.c and called in nfs_clvnops.c.
Future commits will do the same for other functions.

This commit should not result in a semantics change.


# da47c186 15-Jun-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Clean up the code by removing unused arguments

The "void *stuff" (also called fstuff and dstuff) argument
was used by the Mac OSX port. For FreeBSD, this argument
is always NULL, so remove it to clean up the code.

This commit gets rid of "stuff" for assorted functions
defined in nfs_clrpcops.c and called in nfs_clvnops.c.
Future commits will do the same for other functions.

This commit should not result in a semantics change.


# 1c665e95 14-Jun-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Clean up the code by removing unused arguments

The "void *stuff" (also called fstuff and dstuff) argument
was used by the Mac OSX port. For FreeBSD, this argument
is always NULL, so remove it to clean up the code.

This commit gets rid of "stuff" for assorted functions
defined in nfs_clrpcops.c and called in nfs_clvnops.c.
Future commits will do the same for other functions.

This commit should not result in a semantics change.


# 41c029d5 13-Jun-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Clean up the code by removing unused arguments

The "void *stuff" (also called fstuff and dstuff) argument
was used by the Mac OSX port. For FreeBSD, this argument
is always NULL, so remove it to clean up the code.

This commit gets rid of "stuff" for assorted functions
defined in nfs_clrpcops.c and called in nfs_clvnops.c.
Future commits will do the same for other functions.

This commit should not result in a semantics change.


# 9792c7d3 31-May-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Enable support for the Lookup+Open RPC

Commits 3ad1e1c1ce20 and 57014f21e754 added a Lookup+Open
RPC for NFSv4.1/4.2, which can reduce the RPC count by
10-20% for some loads. This has now received a fair amount
of testing, so I think it is ok to enable it.

Note that the Lookup+Open RPC is only used when the
"oneopenown" mount option is specified. As such, this
change won't affect most NFSv4.1/4.2 mounts.


# 5218d82c 30-Apr-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Add support for a NFSv4 AppendWrite RPC

For IO_APPEND VOP_WRITE()s, the code first does a
Getattr RPC to acquire the file's size, before it
can do the Write RPC.

Although NFS does not have an append write operation,
an NFSv4 compound can use a Verify operation to check
that the client's notion of the file's size is
correct, followed by the Write operation.

This patch modifies the NFSv4 client to use an Appendwrite
RPC, which does a Verify to check the file's size before
doing the Write. This avoids the need for a Getattr RPC
to preceed this RPC and reduces the RPC count by half for
IO_APPEND writes, so long as the client knows the file's
size.

The nfsd structure was moved from the stack to be malloc()'d,
since the kernel stack limit was being exceeded.

While here, fix the types of a few variables, although
there should not be any semantics change caused by these
type changes.


# 068fc057 14-Apr-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Clean up the code by removing unused arguments

The "void *stuff" (also called fstuff and dstuff) argument
was used by the Mac OSX port. For FreeBSD, this argument
is always NULL, so remove it to clean up the code.

This commit gets rid of "stuff" for nfscl_nget().
Future commits will do the same for other functions.


# 4ad3423b 13-Apr-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Clean up the code by removing unused arguments

The "void *stuff" (also called fstuff and dstuff) argument
was used by the Mac OSX port. For FreeBSD, this argument
is always NULL, so remove it to clean up the code.

This commit gets rid of "stuff" for nfscl_loadattrcache().
Future commits will do the same for other functions.


# f37dc50d 17-Mar-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Do not do a Lookup+Open for pNFS mounts

A NFSv4.1/4.2 pNFS mount needs to do a
separate Open+LayoutGet RPC, so do not do
a Lookup+Open RPC for these mounts.

The Lookup+Open RPCs are still disabled,
until further testing is done, so this patch
has no effect at this time.


# 150da1e3 16-Dec-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Partially revert commit 867c27c23a5c

Commit 867c27c23a5c enabled the n_directio_opens code
in open/close, which sets/clears NNONCACHE, for
IO_APPEND. This code should not be enabled unless
newnfs_directio_enable is non-zero.

This patch reverts that part of commit 867c27c23a5c.

A future patch that fixes the case where the
file that is being written IO_APPEND is mmap()'d.

MFC after: 3 months


# 867c27c2 15-Dec-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Change IO_APPEND writes to direct I/O

IO_APPEND writes have always been very slow over NFS, due to
the need to acquire an up to date file size after flushing
all writes to the NFS server.

This patch switches the IO_APPEND writes to use direct I/O,
bypassing the buffer cache. As such, flushing of writes
normally only occurs when the open(..O_APPEND..) is done.
It does imply that all writes must be done synchronously
and must be committed to stable storage on the file server
(NFSWRITE_FILESYNC).

For a simple test program that does 10,000 IO_APPEND writes
in a loop, performance improved significantly with this patch.

For a UFS exported file system, the test ran 12x faster.
This drops to 3x faster when the open(2)/close(2) are done
for each loop iteration.
For a ZFS exported file system, the test ran 40% faster.

The much smaller improvement may have been because the ZFS
file system I tested against does not have a ZIL log and
does have "sync" enabled.

Note that IO_APPEND write performance is still much slower
than when done on local file systems.

Although this is a simple patch, it does result in a
significant semantics change, so I have given it a
large MFC time.

Tested by: otis
MFC after: 3 months


# fe04c911 13-Dec-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: add a filesize limit check to nfs_allocate()

As reported in PR#260343, nfs_allocate() did not check
the filesize rlimit. This patch adds that check.

PR: 260343
Reviewed by: asomers
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D33422


# c3134a6a 27-Nov-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Disable use of the LookupOpen RPC

The LookupOpen RPC reduces the number of Open RPCs
needed. Unfortunately, it breaks certain software
builds over NFS, so disable it until this is fixed.

The LookupOpen RPC is only used for NFSv4.1/4.2
mounts when the "oneopenown" mount option is
specified, so this should not affect many users.


# 8ef0c11e 15-Nov-2021 Konstantin Belousov <kib@FreeBSD.org>

nfsclient: upgrade vnode lock in VOP_OPEN()/VOP_CLOSE() if we need to flush buffers

VOP_FSYNC() asserts that the vnode is exclusively locked for NFS.
If we try to execute file with recently modified content, the assert is
triggered.

Reviewed by: rmacklem
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D32999


# f0c9847a 06-Nov-2021 Rick Macklem <rmacklem@FreeBSD.org>

vfs: Add "ioflag" and "cred" arguments to VOP_ALLOCATE

When the NFSv4.2 server does a VOP_ALLOCATE(), it needs
the operation to be done for the RPC's credential and not
td_ucred. It also needs the writing to be done synchronously.

This patch adds "ioflag" and "cred" arguments to VOP_ALLOCATE()
and modifies vop_stdallocate() to use these arguments.

The VOP_ALLOCATE.9 man page will be patched separately.

Reviewed by: khng, kib
Differential Revision: https://reviews.freebsd.org/D32865


# 6b677534 03-Nov-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Fix forced dismount from looping on commit

When a forced dismount is in progress, it is possible to
end up looping, retrying commits that fail.
This patch fixes the problem by pretending
that commits succeeded when a forced dismount is in prgress.

MFC after: 2 weeks


# 50dcff08 30-Oct-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Add setting n_localmodtime to the Write RPC code

Similar to commit 2be417843a, I believe there could be a race between
the NFS client VOP_LOOKUP() and file Writing that could result in stale
file attributes being loaded into the NFS vnode by VOP_LOOKUP().

I have not been able to reproduce a failure due to this race, but
I believe that there are two possibilities:

The Lookup RPC happens while VOP_WRITE() is being executed and loads
stale file attributes after VOP_WRITE() returns when it has already
completed the Write/Commit RPC(s).
--> For this case, setting the local modify timestamp at the end of
VOP_WRITE() should ensure that stale file attributes are not loaded.

The Lookup RPC occurs after VOP_WRITE() has returned, while
asynchronous Write/Commit RPCs are in progress and then is
blocked by the vnode held by VOP_OPEN/VOP_CLOSE/VOP_FSYNC which
will flush writes via ncl_flush() or ncl_vinvalbuf(), clearing the
NMODIFIED flag (which indicates Writes-in-progress). The VOP_LOOKUP()
then acquires the NFS vnode lock and fills in stale file attributes.
--> Setting the local modify timestamp in ncl_flsuh() and ncl_vinvalbuf()
when they clear NMODIFIED should ensure that stale file attributes
are not loaded.

This patch does the above.

PR: 259071
Reviewed by: asomers
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D32677


# ab87c39c 30-Oct-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Set n_localmodtime in Deallocate

Commit 2be417843a04 added n_localmodtime, which is used by Lookup
and ReaddirPlus to check to see if the file attributes in an RPC
reply might be stale. This patch sets n_localmodtime in Deallocate.
Done as a separate commit, since Deallocate is not in stable/13.

PR: 259071
Reviewed by: asomers
Differential Revision: https://reviews.freebsd.org/D32635


# 2be41784 30-Oct-2021 Rick Macklem <rmacklem@FreeBSD.org>

PR#259071 provides a test program that fails for the NFS client.
Testing with it, there appears to be a race between Lookup
and VOPs like Setattr-of-size, where Lookup ends up loading
stale attributes (including what might be the wrong file size)
into the NFS vnode's attribute cache.

The race occurs when the modifying VOP (which holds a lock
on the vnode), blocks the acquisition of the vnode in Lookup,
after the RPC (with now potentially stale attributes).

Here's what seems to happen:
Child Parent

does stat(), which does
VOP_LOOKUP(), doing the Lookup
RPC with the directory vnode
locked, acquiring file attributes
valid at this point in time

blocks waiting for locked file does ftruncate(), which
vnode does VOP_SETATTR() of Size,
changing the file's size
while holding an exclusive
lock on the file's vnode
releases the vnode lock
acquires file vnode and fills in
now stale attributes including
the old wrong Size
does a read() which returns
wrong data size

This patch fixes the problem by saving a timestamp in the NFS vnode
in the VOPs that modify the file (Setattr-of-size, Allocate).
Then lookup/readdirplus compares that timestamp with the time just
before starting the RPC after it has acquired the file's vnode.
If the modifying RPC occurred during the Lookup, the attributes
in the RPC reply are discarded, since they might be stale.

With this patch the test program works as expected.

Note that the test program does not fail on a July stable/12,
although this race is in the NFS client code. I suspect a
fairly recent change to the name caching code exposed this
bug.

PR: 259071
Reviewed by: asomers
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D32635


# b4a58fbf 01-Oct-2021 Mateusz Guzik <mjg@FreeBSD.org>

vfs: remove cn_thread

It is always curthread.

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D32453


# 235891a1 10-Oct-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Fix NFS VOP_ALLOCATE for mounts without Allocate support

Without this patch, nfs_allocate() fell back on using vop_stdallocate()
for NFS mounts without Allocate operation support. This was incorrect,
since some file systems, such as ZFS, cannot do allocate via
vop_stdallocate(), which uses writes to try and allocate blocks.

Also, fix nfs_allocate() to return EINVAL when mounts cannot do Allocate,
since that is the correct error for posix_fallocate(2).
Note that Allocate is only supported by some NFSv4.2 servers.

MFC after: 2 weeks


# ad6dc365 18-Sep-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Use vfs.nfs.maxalloclen to limit Deallocate RPC RTT

Unlike Copy, the NFSv4.2 Allocate and Deallocate operations do not
allow a reply with partial completion. As such, the only way to
limit the time the operation takes to provide a reasonable RPC RTT
is to limit the size of the allocation/deallocation in the NFSv4.2
client.

This patch uses the sysctl vfs.nfs.maxalloclen to set
the limit on the size of the Deallocate operation.
There is no way to know how long a server will take to do an
deallocate operation, but 64Mbytes results in a reasonable
RPC RTT for the slow hardware I test on.

For an 8Gbyte deallocation, the elapsed time for doing it in 64Mbyte
chunks was the same (within margin of variability) as the
elapsed time taken for a single large deallocation
operation for a FreeBSD server with a UFS file system.


# 9ebe4b8c 15-Sep-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Add vfs.nfs.maxalloclen to limit Allocate/Deallocate RPC RTT

Unlike Copy, the NFSv4.2 Allocate and Deallocate operations do not
allow a reply with partial completion. As such, the only way to
limit the time the operation takes to provide a reasonable RPC RTT
is to limit the size of the allocation/deallocation in the NFSv4.2
client.

This patch adds a sysctl called vfs.nfs.maxalloclen to set
the limit on the size of the Allocate operation.
There is no way to know how long a server will take to do an
allocate operation, but 64Mbytes results in a reasonable
RPC RTT for the slow hardware I test on, so that is what
the default value for vfs.nfs.maxalloclen is set to.

For an 8Gbyte allocation, the elapsed time for doing it in 64Mbyte
chunks was the same as the elapsed time taken for a single large
allocation operation for a FreeBSD server with a UFS file system.

MFC after: 2 weeks


# 08b9cc31 27-Aug-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Add a VOP_DEALLOCATE() for the NFSv4.2 client

This patch adds a VOP_DEALLOCATE() to the NFS client.
For NFSv4.2 servers that support the Deallocate operation,
it is used. Otherwise, it falls back on calling
vop_stddeallocate().

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D31640


# 3ad1e1c1 11-Aug-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Add a Lookup+Open RPC for NFSv4.1/4.2

This patch adds a Lookup+Open compound RPC to the NFSv4.1/4.2
NFS client, which can be used by nfs_lookup() so that a
subsequent Open RPC is not required.
It uses the cn_flags OPENREAD, OPENWRITE added by commit c18c74a87c15.
This reduced the number of RPCs by about 15% for a kernel
build over NFS.

For now, use of Lookup+Open is only done when the "oneopenown"
mount option is used. It may be possible for Lookup+Open to
be used for non-oneopenown NFSv4.1/4.2 mounts, but that will
require extensive further testing to determine if it works.

While here, I've added the changes to the nfscommon module
that are needed to implement the Deallocate NFSv4.2 operation.
This avoids needing another cycle of changes to the internal
KAPI between the NFS modules.

This commit has changed the internal KAPI between the NFS
modules and, as such, all need to be rebuilt from sources.
I have not bumped __FreeBSD_version, since it was bumped a
few days ago.


# efea1bc1 28-Jul-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Cache an open stateid for the "oneopenown" mount option

For NFSv4.1/4.2, if the "oneopenown" mount option is used,
there is, at most, only one open stateid for each NFS vnode.
When an open stateid for a file is acquired, set a pointer to
the open structure in the NFS vnode. This pointer can be used to
acquire the open stateid without searching the open linked list
when the following is true:
- No delegations have been issued for the file. Since delegations
can outlive an NFS vnode for a file, use the global
NFSMNTP_DELEGISSUED flag on the mount to determine this.
- No lock stateid has been issued for the file. To determine
this, a new NFS vnode flag called NMIGHTBELOCKED is set when a lock
stateid is issued, which can then be tested.

When this open structure pointer can be used, it avoids the need to
acquire the NFSCLSTATELOCK() and searching the open structure list for
an open. The NFSCLSTATELOCK() can be highly contended when there are
a lot of opens issued for the NFSv4.1/4.2 mount.

This patch only affects NFSv4.1/4.2 mounts when the "oneopenown"
mount option is used.

MFC after: 2 weeks


# dd02d9d6 07-May-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Add support for va_birthtime to NFSv4

There is a NFSv4 file attribute called TimeCreate
that can be used for va_birthtime.
r362175 added some support for use of TimeCreate.
This patch completes support of va_birthtime by adding
support for setting this attribute to the server.
It also eanbles the client to
acquire and set the attribute for a NFSv4
server that supports the attribute.

Reviewed by: markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D30156


# 8bde6d15 04-May-2021 Mark Johnston <markj@FreeBSD.org>

nfsclient: Copy only initialized fields in nfs_getattr()

When loading attributes from the cache, the NFS client is careful to
copy only the fields that it initialized. After fetching attributes
from the server, however, it would copy the entire vattr structure
initialized from the RPC response, so uninitialized stack bytes would
end up being copied to userspace. In particular, va_birthtime (v2 and
v3) and va_gen (v3) had this problem.

Use a common subroutine to copy fields provided by the NFS client, and
ensure that we provide a dummy va_gen for the v3 case.

Reviewed by: rmacklem
Reported by: KMSAN
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30090


# 15bed8c4 28-Feb-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfsclient: add nfs node locking around uses of n_direofoffset

During code inspection I noticed that the n_direofoffset field
of the NFS node was being manipulated without any lock being
held to make it SMP safe.
This patch adds locking of the NFS node's mutex around
handling of n_direofoffset to make it SMP safe.

I have not seen any failure that could be attributed to n_direofoffset
being manipulated concurrently by multiple processors, but I think this
is possible, since directories are read with shared vnode
locking, plus locks only on individual buffer cache blocks.
However, there have been as yet unexplained issues w.r.t reading
large directories over NFS that could have conceivably been caused
by concurrent manipulation of n_direofoffset.

MFC after: 2 weeks


# 3e04ab36 28-Feb-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfsclient: add checks for a server returning the current directory

Commit 3fe2c68ba20f dealt with a panic in cache_enter_time() where
the vnode referred to the directory argument.
It would also be possible to get these panics if a broken
NFS server were to return the directory as an new object being
created within the directory or in a Lookup reply.

This patch adds checks to avoid the panics and logs
messages to indicate that the server is broken for the
file object creation cases.

Reviewd by: kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D28987


# 9f669985 30-Sep-2020 Rick Macklem <rmacklem@FreeBSD.org>

Modify the NFSv4.2 VOP_COPY_FILE_RANGE() client call to return after one
successful RPC.

Without this patch, the NFSv4.2 VOP_COPY_FILE_RANGE() client call would
loop until the copy "len" was completed. The problem with doing this is
that it might take a considerable time to complete for a large "len".
By returning after a single successful Copy RPC that copied some of the
data, the application that did the copy_file_range(2) syscall will be
more responsive to signal delivery for large "len" copies.


# 586ee69f 01-Sep-2020 Mateusz Guzik <mjg@FreeBSD.org>

fs: clean up empty lines in .c and .h files


# d292b194 05-Aug-2020 Mateusz Guzik <mjg@FreeBSD.org>

vfs: remove the obsolete privused argument from vaccess

This brings argument count down to 6, which is passable without the
stack on amd64.


# eea79fde 17-Jun-2020 Alan Somers <asomers@FreeBSD.org>

Remove vfs_statfs and vnode_mount macros from NFS

These macro definitions are no longer needed as the NFS OSX port is long
dead. The vfs_statfs macro conflicts with the vfsops field of the same
name.

Submitted by: shivank@
Reviewed by: rmacklem
MFC after: 2 weeks
Sponsored by: Google, Inc. (GSoC 2020)
Differential Revision: https://reviews.freebsd.org/D25263


# fb8ed4c5 14-Apr-2020 Rick Macklem <rmacklem@FreeBSD.org>

Fix the NFSv2 extended attribute support to handle 0 length attributes.

I did not realize that zero length attributes are allowed, but they are.
This patch fixes the NFSv4.2 client and server to handle zero length
extended attributes correctly.

Submitted by: Frank van der Linden <fllinden@amazon.com> (earlier version)
Reported by: Frank van der Linden <fllinder@amazon.com>


# 6a5abb1e 02-Feb-2020 Kyle Evans <kevans@FreeBSD.org>

Provide O_SEARCH

O_SEARCH is defined by POSIX [0] to open a directory for searching, skipping
permissions checks on the directory itself after the initial open(). This is
close to the semantics we've historically applied for O_EXEC on a directory,
which is UB according to POSIX. Conveniently, O_SEARCH on a file is also
explicitly undefined behavior according to POSIX, so O_EXEC would be a fine
choice. The spec goes on to state that O_SEARCH and O_EXEC need not be
distinct values, but they're not defined to be the same value.

This was pointed out as an incompatibility with other systems that had made
its way into libarchive, which had assumed that O_EXEC was an alias for
O_SEARCH.

This defines compatibility O_SEARCH/FSEARCH (equivalent to O_EXEC and FEXEC
respectively) and expands our UB for O_EXEC on a directory. O_EXEC on a
directory is checked in vn_open_vnode already, so for completeness we add a
NOEXECCHECK when O_SEARCH has been specified on the top-level fd and do not
re-check that when descending in namei.

[0] https://pubs.opengroup.org/onlinepubs/9699919799/

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D23247


# b249ce48 03-Jan-2020 Mateusz Guzik <mjg@FreeBSD.org>

vfs: drop the mostly unused flags argument from VOP_UNLOCK

Filesystems which want to use it in limited capacity can employ the
VOP_UNLOCK_FLAGS macro.

Reviewed by: kib (previous version)
Differential Revision: https://reviews.freebsd.org/D21427


# 6fa079fc 15-Dec-2019 Mateusz Guzik <mjg@FreeBSD.org>

vfs: flatten vop vectors

This eliminates the following loop from all VOP calls:

while(vop != NULL && \
vop->vop_spare2 == NULL && vop->vop_bypass == NULL)
vop = vop->vop_default;

Reviewed by: jeff
Tesetd by: pho
Differential Revision: https://reviews.freebsd.org/D22738


# bf6ac05a 12-Dec-2019 Rick Macklem <rmacklem@FreeBSD.org>

Add some more initializations to quiet riscv build.

The one case in nfs_copy_file_range() was a legitimate case, although
it would probably never occur in practice.


# c057a378 12-Dec-2019 Rick Macklem <rmacklem@FreeBSD.org>

Add support for NFSv4.2 to the NFS client and server.

This patch adds support for NFSv4.2 (RFC-7862) and Extended Attributes
(RFC-8276) to the NFS client and server.
NFSv4.2 is comprised of several optional features that can be supported
in addition to NFSv4.1. This patch adds the following optional features:
- posix_fadvise(POSIX_FADV_WILLNEED/POSIX_FADV_DONTNEED)
- posix_fallocate()
- intra server file range copying via the copy_file_range(2) syscall
--> Avoiding data tranfer over the wire to/from the NFS client.
- lseek(SEEK_DATA/SEEK_HOLE)
- Extended attribute syscalls for "user" namespace attributes as defined
by RFC-8276.

Although this patch is fairly large, it should not affect support for
the other versions of NFS. However it does add two new sysctls that allow
a sysadmin to limit which minor versions of NFSv4 a server supports, allowing
a sysadmin to disable NFSv4.2.

Unfortunately, when the NFS stats structure was last revised, it was assumed
that there would be no additional operations added beyond what was
specified in RFC-7862. However RFC-8276 did add additional operations,
forcing the NFS stats structure to revised again. It now has extra unused
entries in all arrays, so that future extensions to NFSv4.2 can be
accomodated without revising this structure again.

A future commit will update nfsstat(1) to report counts for the new NFSv4.2
specific operations/procedures.

This patch affects the internal interface between the nfscommon, nfscl and
nfsd modules and, as such, they all must be upgraded simultaneously.
I will do a version bump (although arguably not needed), due to this.

This code has survived a "make universe" but has not been built with a
recent GCC. If you encounter build problems, please email me.

Relnotes: yes


# abd80ddb 08-Dec-2019 Mateusz Guzik <mjg@FreeBSD.org>

vfs: introduce v_irflag and make v_type smaller

The current vnode layout is not smp-friendly by having frequently read data
avoidably sharing cachelines with very frequently modified fields. In
particular v_iflag inspected for VI_DOOMED can be found in the same line with
v_usecount. Instead make it available in the same cacheline as the v_op, v_data
and v_type which all get read all the time.

v_type is avoidably 4 bytes while the necessary data will easily fit in 1.
Shrinking it frees up 3 bytes, 2 of which get used here to introduce a new
flag field with a new value: VIRF_DOOMED.

Reviewed by: kib, jeff
Differential Revision: https://reviews.freebsd.org/D22715


# 9698d992 29-Nov-2019 Konstantin Belousov <kib@FreeBSD.org>

In nfs_lock(), recheck vp->v_data after lock before accessing it.

We might race with reclaim, and then this is no longer a nfs vnode, in
which case we do not need to handle deferred vnode_pager_setsize()
either.

Reported by: rk@ronald.org
PR: 242184
Sponsored by: The FreeBSD Foundation
MFC after: 3 days


# 67d0e293 29-Oct-2019 Jeff Roberson <jeff@FreeBSD.org>

Replace OBJ_MIGHTBEDIRTY with a system using atomics. Remove the TMPFS_DIRTY
flag and use the same system.

This enables further fault locking improvements by allowing more faults to
proceed with a shared lock.

Reviewed by: kib
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D22116


# c6ba06d8 22-Oct-2019 Konstantin Belousov <kib@FreeBSD.org>

Fix interface between nfsclient and vnode pager.

Make the nfsclient always call vnode_pager_setsize() with the vnode
exclusively locked. This ensures that page fault always can find the
backing page if the object size check succeeded. Set VV_VMSIZEVNLOCK
flag on NFS nodes.

The main offender breaking the interface in nfsclient is
nfs_loadattrcache(), which is used whenever server responded with
updated attributes, which can happen on non-changing operations as
well. Also, iod threads only have buffers locked (and even that is
LK_KERNPROC), but they still may call nfs_loadattrcache() on RPC
response.

Instead of immediately calling vnode_pager_setsize() if server
response indicated changed file size, but the vnode is not exclusively
locked, set a new node flag NVNSETSZSKIP. When the vnode exclusively
locked, or when we can temporary upgrade the lock to exclusive, call
vnode_pager_setsize(), by providing the nfsclient VOP_LOCK() implementation.

Tested by: pho
Discussed with: rmacklem
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D21883


# 5d85e12f 23-Sep-2019 Rick Macklem <rmacklem@FreeBSD.org>

Replace all mtx_lock()/mtx_unlock() on n_mtx with the macros.

For a long time, some places in the NFS code have locked/unlocked the
NFS node lock with the macros NFSLOCKNODE()/NFSUNLOCKNODE() whereas
others have simply used mtx_lock()/mtx_unlock().
Since the NFS node mutex needs to change to an sx lock so it can be held when
vnode_pager_setsize() is called, replace all occurrences of mtx_lock/mtx_unlock
with the macros to simply making the change to an sx lock in future commit.
There is no semantic change as a result of this commit.

I am not sure if the change to an sx lock will be MFC'd soon, so I put
an MFC of 1 week on this commit so that it could be MFC'd with that commit.

Suggested by: kib
MFC after: 1 week


# a6935d08 05-Sep-2019 Conrad Meyer <cem@FreeBSD.org>

Remove long-dead BUF_ASSERT_{,UN}HELD assertions

These were fully neutered in r177676 (2008), but not removed at the time for
unclear reasons. They're totally dead code, so go ahead and yank them now.

No functional change.


# 6aab442a 30-May-2019 Rick Macklem <rmacklem@FreeBSD.org>

Get rid of extraneous initialization.

Get rid of an extraneous initialization, mainly to keep a static analyser
happy. No semantic change.

PR: 238167
Submitted by: Alexey Dokuchaev


# 26fd36b2 30-May-2019 Rick Macklem <rmacklem@FreeBSD.org>

Clean up silly code case.

This silly code segment has existed in the sources since it was brought
into FreeBSD 10 years ago. I honestly have no idea why this was done.
It was possible that I thought that it might have been better to not
set B_ASYNC for the "else" case, but I can't remember.
Anyhow, this patch gets rid of the if/else that does the same thing
either way, since it looks silly and upsets a static analyser.
This will have no semantic effect on the NFS client.

PR: 238167


# 65417f5e 24-May-2019 Alan Somers <asomers@FreeBSD.org>

Remove "struct ucred*" argument from vtruncbuf

vtruncbuf takes a "struct ucred*" argument. AFAICT, it's been unused ever
since that function was first added in r34611. Remove it. Also, remove some
"struct ucred" arguments from fuse and nfs functions that were only used by
vtruncbuf.

Reviewed by: cem
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D20377


# 391918a3 06-May-2019 Konstantin Belousov <kib@FreeBSD.org>

Do not flush NFS node from NFS VOP_SET_TEXT().

The more appropriate place to do the flushing is VOP_OPEN(). This was
uncovered because VOP_SET_TEXT() is now called with the vnode'
vm_object rlocked, which is incompatible with the flush operations.

After the move, there is no need for NFS-specific VOP_SET_TEXT
overload.

Sponsored by: The FreeBSD Foundation
MFC after: 30 days


# 78022527 05-May-2019 Konstantin Belousov <kib@FreeBSD.org>

Switch to use shared vnode locks for text files during image activation.

kern_execve() locks text vnode exclusive to be able to set and clear
VV_TEXT flag. VV_TEXT is mutually exclusive with the v_writecount > 0
condition.

The change removes VV_TEXT, replacing it with the condition
v_writecount <= -1, and puts v_writecount under the vnode interlock.
Each text reference decrements v_writecount. To clear the text
reference when the segment is unmapped, it is recorded in the
vm_map_entry backed by the text file as MAP_ENTRY_VN_TEXT flag, and
v_writecount is incremented on the map entry removal

The operations like VOP_ADD_WRITECOUNT() and VOP_SET_TEXT() check that
v_writecount does not contradict the desired change. vn_writecheck()
is now racy and its use was eliminated everywhere except access.
Atomic check for writeability and increment of v_writecount is
performed by the VOP. vn_truncate() now increments v_writecount
around VOP_SETATTR() call, lack of which is arguably a bug on its own.

nullfs bypasses v_writecount to the lower vnode always, so nullfs
vnode has its own v_writecount correct, and lower vnode gets all
references, since object->handle is always lower vnode.

On the text vnode' vm object dealloc, the v_writecount value is reset
to zero, and deadfs vop_unset_text short-circuit the operation.
Reclamation of lowervp always reclaims all nullfs vnodes referencing
lowervp first, so no stray references are left.

Reviewed by: markj, trasz
Tested by: mjg, pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 month
Differential revision: https://reviews.freebsd.org/D19923


# f5fdf82d 11-Mar-2019 Simon J. Gerraty <sjg@FreeBSD.org>

Add _PC_ACL_* to vop_stdpathconf

This avoid EINVAL from tmpfs etc.

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D19512


# 6ad8a6ea 06-Nov-2018 Rick Macklem <rmacklem@FreeBSD.org>

Change nfs_advlock() so that the NFSVOPUNLOCK() is mostly done at the end.

Prior to this patch, nfs_advlock() did NFSVOPUNLOCK(); return (error);
in many places. This patch replaces these code sequenences with a "goto out;"
and does the NFSVOPUNLOCK(); return (error); at the end of the function
in order to make the vnode locking simpler.
This patch does not change the semantics of nfs_advlock().

Suggested by: kib
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D17853


# 881a9516 01-Nov-2018 Rick Macklem <rmacklem@FreeBSD.org>

Fix NFS client vnode locking to avoid a crash during forced dismount.

A crash was reported where the crash occurred in nfs_advlock() when the
NFS_ISV4(vp) macro was being executed. This was caused by the vnode
being VI_DOOMED due to a forced dismount in progress.
This patch fixes the problem by locking the vnode before executing the
NFS_ISV4() macro.

Tested by: rlibby
PR: 232673
Reviewed by: kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D17757


# 8ff7fad1 23-Oct-2018 Konstantin Belousov <kib@FreeBSD.org>

Only call sigdeferstop() for NFS.

Use bypass to catch any NFS VOP dispatch and route it through the
wrapper which does sigdeferstop() and then dispatches original
VOP. NFS does not need a bypass below it, which is not supported.

The vop offset in the vop_vector is added since otherwise it is
impossible to get vop_op_t from the internal table, and I did not
wanted to create the layered fs only to wrap NFS VOPs.

VFS_OP()s wrap is straightforward.

Requested and reviewed by: mjg (previous version)
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D17658


# 222daa42 25-Jan-2018 Conrad Meyer <cem@FreeBSD.org>

style: Remove remaining deprecated MALLOC/FREE macros

Mechanically replace uses of MALLOC/FREE with appropriate invocations of
malloc(9) / free(9) (a series of sed expressions). Something like:

* MALLOC(a, b, ... -> a = malloc(...
* FREE( -> free(
* free((caddr_t) -> free(

No functional change.

For now, punt on modifying contrib ipfilter code, leaving a definition of
the macro in its KMALLOC().

Reported by: jhb
Reviewed by: cy, imp, markj, rmacklem
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D14035


# d821d364 21-Jan-2018 Pedro F. Giffuni <pfg@FreeBSD.org>

Unsign some values related to allocation.

When allocating memory through malloc(9), we always expect the amount of
memory requested to be unsigned as a negative value would either stand for
an error or an overflow.
Unsign some values, found when considering the use of mallocarray(9), to
avoid unnecessary casting. Also consider that indexes should be of
at least the same size/type as the upper limit they pretend to index.

MFC after: 3 weeks


# ac2fffa4 21-Jan-2018 Pedro F. Giffuni <pfg@FreeBSD.org>

Revert r327828, r327949, r327953, r328016-r328026, r328041:
Uses of mallocarray(9).

The use of mallocarray(9) has rocketed the required swap to build FreeBSD.
This is likely caused by the allocation size attributes which put extra pressure
on the compiler.

Given that most of these checks are superfluous we have to choose better
where to use mallocarray(9). We still have more uses of mallocarray(9) but
hopefully this is enough to bring swap usage to a reasonable level.

Reported by: wosch
PR: 225197


# d48d1a64 15-Jan-2018 Pedro F. Giffuni <pfg@FreeBSD.org>

nfsclient: make some use of mallocarray(9).

Focus on code where we are doing multiplications within malloc(9). None of
these ire likely to overflow, however the change is still useful as some
static checkers can benefit from the allocation attributes we use for
mallocarray.

This initial sweep only covers malloc(9) calls with M_NOWAIT. No good
reason but I started doing the changes before r327796 and at that time it
was convenient to make sure the sorrounding code could handle NULL values.

X-Differential revision: https://reviews.freebsd.org/D13837


# 95c8838c 21-Dec-2017 Konstantin Belousov <kib@FreeBSD.org>

Fix build for LP64 arches with gcc.

gcc complaints that the comparision is always false due to the value
range, and the cast does not prevent the analysis. Split the LP64
vs. ILP32 clamping as a workaround.

Sponsored by: The FreeBSD Foundation


# b501cc5d 19-Dec-2017 John Baldwin <jhb@FreeBSD.org>

Rework pathconf handling for FIFOs.

On the one hand, FIFOs should respect other variables not supported by
the fifofs vnode operation (such as _PC_NAME_MAX, _PC_LINK_MAX, etc.).
These values are fs-specific and must come from a fs-specific method.
On the other hand, filesystems that support FIFOs are required to
support _PC_PIPE_BUF on directory vnodes that can contain FIFOs.
Given this latter requirement, once the fs-specific VOP_PATHCONF
method supports _PC_PIPE_BUF for directories, it is also suitable for
FIFOs permitting a single VOP_PATHCONF method to be used for both
FIFOs and non-FIFOs.

To that end, retire all of the FIFO-specific pathconf methods from
filesystems and change FIFO-specific vnode operation switches to use
the existing fs-specific VOP_PATHCONF method. For fifofs, set it's
VOP_PATHCONF to VOP_PANIC since it should no longer be used.

While here, move _PC_PIPE_BUF handling out of vop_stdpathconf() so that
only filesystems supporting FIFOs will report a value. In addition,
only report a valid _PC_PIPE_BUF for directories and FIFOs.

Discussed with: bde
Reviewed by: kib (part of a larger patch)
MFC after: 1 month
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D12572


# a0a073b1 19-Dec-2017 John Baldwin <jhb@FreeBSD.org>

Update NFS to handle larger link counts post ino64.

- Define a NFS_LINK_MAX as UINT32_MAX to match the wire protocol.
- Use NFS_LINK_MAX instead of LINK_MAX as the fallback value reported
for a PATHCONF RPC by the NFS server.
- Use NFS_LINK_MAX instead of LINK_MAX as the default value reported
by the NFS client pathconf() if not overridden by the NFS server.
- When reading the link count out of an RPC reply, read the full 32
bits instead of the lower 16 bits.

Reviewed by: rmacklem (earlier version)
Sponsored by: Chelsio Communications


# 51369649 20-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys: further adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.


# e5cffdd3 20-Aug-2017 Konstantin Belousov <kib@FreeBSD.org>

Do not drop NFS vnode lock when performing consistency checks.

Currently several paths in the NFS client upgrade the shared vnode
lock to exclusive, which might cause temporal dropping of the lock.
This action appears to be fatal for nullfs mounts over NFS. If the
operation is performed over nullfs vnode, then bypassed down to NFS
VOP, and the lock is dropped, other thread might reclaim the upper
nullfs vnode. Since on reclaim the nullfs vnode lock and NFS vnode
lock are split, the original lock state of the nullfs vnode is not
restored. As result, VFS operations receive not locked vnode after a
VOP call.

Stop upgrading the vnode lock when we check the consistency or flush
buffers as result of detected inconsistency. Instead, allocate a new
lockmgr lock for each NFS node, which is locked exclusive instead of
the vnode lock upgrade. In other words, the other parallel
modification of the vnode are excluded by either vnode lock conflict
or exclusivity of the new lock when the vnode lock is shared.

Also revert r316529 because now the vnode cannot be reclaimed during
ncl_vinvalbuf().

In collaboration with: pho
Reviewed by: rmacklem
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D12083


# 16f300fa 27-Jul-2017 Rick Macklem <rmacklem@FreeBSD.org>

Replace the checks for MNTK_UNMOUNTF with a macro that does the same thing.

This patch defines a macro that checks for MNTK_UNMOUNTF and replaces
explicit checks with this macro. It has no effect on semantics, but
prepares the code for a future patch where there will also be a
NFS specific flag for "forced dismount about to occur".

Suggested by: kib
MFC after: 2 weeks


# 15a88f81 11-Jul-2017 John Baldwin <jhb@FreeBSD.org>

Consistently use vop_stdpathconf() for default pathconf values.

Update filesystems not currently using vop_stdpathconf() in pathconf
VOPs to use vop_stdpathconf() for any configuration variables that do
not have filesystem-specific values. vop_stdpathconf() is used for
variables that have system-wide settings as well as providing default
values for some values based on system limits. Filesystems can still
explicitly override individual settings.

PR: 219851
Reported by: cem
Reviewed by: cem, kib, ngie
MFC after: 1 month
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D11541


# 81b07aac 25-Jun-2017 Rick Macklem <rmacklem@FreeBSD.org>

Add support to the NFSv4.1/pNFS client for commits through the DS.

A NFSv4.1/pNFS server using File Layout can specify that Commit operations
are to be done against the DS instead of MDS. Since no extant pNFS
server did this, the code was untested and "#ifdef notyet".
The FreeBSD pNFS server I am developing does specify that Commits be done
through the DS, so the code has been enabled/tested.
This patch should only affect the case of a pNFS server that specfies
Commits through the DS.

PR: 219551
MFC after: 2 weeks


# 69921123 23-May-2017 Konstantin Belousov <kib@FreeBSD.org>

Commit the 64-bit inode project.

Extend the ino_t, dev_t, nlink_t types to 64-bit ints. Modify
struct dirent layout to add d_off, increase the size of d_fileno
to 64-bits, increase the size of d_namlen to 16-bits, and change
the required alignment. Increase struct statfs f_mntfromname[] and
f_mntonname[] array length MNAMELEN to 1024.

ABI breakage is mitigated by providing compatibility using versioned
symbols, ingenious use of the existing padding in structures, and
by employing other tricks. Unfortunately, not everything can be
fixed, especially outside the base system. For instance, third-party
APIs which pass struct stat around are broken in backward and
forward incompatible ways.

Kinfo sysctl MIBs ABI is changed in backward-compatible way, but
there is no general mechanism to handle other sysctl MIBS which
return structures where the layout has changed. It was considered
that the breakage is either in the management interfaces, where we
usually allow ABI slip, or is not important.

Struct xvnode changed layout, no compat shims are provided.

For struct xtty, dev_t tty device member was reduced to uint32_t.
It was decided that keeping ABI compat in this case is more useful
than reporting 64-bit dev_t, for the sake of pstat.

Update note: strictly follow the instructions in UPDATING. Build
and install the new kernel with COMPAT_FREEBSD11 option enabled,
then reboot, and only then install new world.

Credits: The 64-bit inode project, also known as ino64, started life
many years ago as a project by Gleb Kurtsou (gleb). Kirk McKusick
(mckusick) then picked up and updated the patch, and acted as a
flag-waver. Feedback, suggestions, and discussions were carried
by Ed Maste (emaste), John Baldwin (jhb), Jilles Tjoelker (jilles),
and Rick Macklem (rmacklem). Kris Moore (kris) performed an initial
ports investigation followed by an exp-run by Antoine Brodin (antoine).
Essential and all-embracing testing was done by Peter Holm (pho).
The heavy lifting of coordinating all these efforts and bringing the
project to completion were done by Konstantin Belousov (kib).

Sponsored by: The FreeBSD Foundation (emaste, kib)
Differential revision: https://reviews.freebsd.org/D10439


# 6d4377c1 14-Apr-2017 Rick Macklem <rmacklem@FreeBSD.org>

Remove unused "cred" argument to ncl_flush().

The "cred" argument of ncl_flush() is unused and it was confusing to have
the code passing in NULL for this argument in some cases. This patch deletes
this argument.
There is no semantic change because of this patch.

MFC after: 2 weeks


# 0649fcae 12-Apr-2017 Rick Macklem <rmacklem@FreeBSD.org>

Fix the NFS client for "text file modified, process killed" mmap'd case.

When an mmap'd text file is written and then executed immediately
afterwards, it was possible that the modify time would change after the
text file was executing, resulting in the process executing the file
being killed. This was usually only observed when the file system's
times were set to higher resolution, but could have occurred for any
time resolution.
This was reported on a recent email list discussion.
This patch adds a VOP_SET_TEXT() to the NFS client which flushed all
dirty pages to the NFS server and then makes sure that n_mtime is up
to date to avoid this from occurring.
Thanks go to kib@ and pho@ for their help with developing this patch.

Tested by: pho
Reviewed by: kib
MFC after: 2 weeks


# 6627d919 11-Apr-2017 Konstantin Belousov <kib@FreeBSD.org>

Remove debugging printf.

Instead, issue a diagnostic and return appropriate error if
ncl_flush() was unable to clean buffer queue after the specified
number or retries.

Reviewed by: rmacklem
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# a046da7e 05-Apr-2017 Konstantin Belousov <kib@FreeBSD.org>

Remove spl*() calls from the nfsclient code. Style adjustments in the
related lines in ncl_writebp().

Reviewed by: rmacklem
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 48fe9263 05-Apr-2017 Konstantin Belousov <kib@FreeBSD.org>

Handle possible vnode reclamation after ncl_vinvalbuf() call.

ncl_vinvalbuf() might need to upgrade vnode lock, allowing the vnode
to be reclaimed by other thread. Handle the situation, indicated by
the returned error zero and VI_DOOMED iflag set, converting it into
EBADF. Handle all calls, even where the vnode is exclusively locked
right now.

Reviewed by: markj, rmacklem
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
X-Differential revision: https://reviews.freebsd.org/D10241


# fbbd9655 28-Feb-2017 Warner Losh <imp@FreeBSD.org>

Renumber copyright clause 4

Renumber cluase 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistance on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.

Submitted by: Jan Schaumann <jschauma@stevens.edu>
Pull Request: https://github.com/freebsd/freebsd/pull/96


# 1c324569 06-Jan-2017 Konstantin Belousov <kib@FreeBSD.org>

Use type-independent formats for printing nlink_t and ino_t.

Extracted from: ino64 work by gleb, mckusick
Discussed with: mckusick
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 753a007f 22-Nov-2016 Konstantin Belousov <kib@FreeBSD.org>

Use buffer pager for NFS.

The pager, due to its construction, implements clustering for the
page-ins. In particular, buildworld load demonstrates reduction of
the READ RPCs from 39k down to 24k. No change in real or CPU time was
observed.

Discussed with, and measured by: bde
No objections from: rmacklem
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# fc2c3afe 22-Nov-2016 Konstantin Belousov <kib@FreeBSD.org>

Minor cleanup, remove unneeded XXX comments and unused re-define.

Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 1b819cf2 12-Aug-2016 Rick Macklem <rmacklem@FreeBSD.org>

Update the nfsstats structure to include the changes needed by
the patch in D1626 plus changes so that it includes counts for
NFSv4.1 (and the draft of NFSv4.2).
Also, make all the counts uint64_t and add a vers field at the
beginning, so that future revisions can easily be implemented.
There is code in place to handle the old vesion of the nfsstats
structure for backwards binary compatibility.

Subsequent commits will update nfsstat(8) to use the new fields.

Submitted by: will (earlier version)
Reviewed by: ken
MFC after: 1 month
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D1626


# ad600ac8 03-Aug-2016 Konstantin Belousov <kib@FreeBSD.org>

Remove ncl_printf(), use printf(9) directly. After r303710 the
function duplicates printf().

Correct function names in the messages [*].

Noted by: bde [*]
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# e37dfd3d 19-Jun-2016 Konstantin Belousov <kib@FreeBSD.org>

Do not access NFS data for reclaimed vnode.

Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Approved by: re (delphij)


# b6a60ae7 11-May-2016 Konstantin Belousov <kib@FreeBSD.org>

Use vfs_hash_ref(9) to eliminate LK_EXCLOTHER kludge. As a
consequence, the nfs client override of VOP_LOCK1() is no longer
needed.

Reviewed and tested by: rmacklem
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# a96c9b30 29-Apr-2016 Pedro F. Giffuni <pfg@FreeBSD.org>

NFS: spelling fixes on comments.

No funcional change.


# 74b8d63d 10-Apr-2016 Pedro F. Giffuni <pfg@FreeBSD.org>

Cleanup unnecessary semicolons from the kernel.

Found with devel/coccinelle.


# 86b9457f 21-May-2015 Rick Macklem <rmacklem@FreeBSD.org>

The NFS client wasn't handling getdirentries(2) requests for sizes
that are not an exact multiple of DIRBLKSIZ correctly. Fortunately
readdir(3) always uses an exact multiple of DIRBLKSIZ, so few applications
were affected. This patch fixes this problem by reducing the size
of the directory read to an exact multiple of DIRBLKSIZ.

Tested by: trasz
Reported by: trasz
Reviewed by: trasz
MFC after: 2 weeks


# 2f88b3d2 25-Dec-2014 Rick Macklem <rmacklem@FreeBSD.org>

Delete some duplicate code that was harmless because
exactly the same code is at the end of the nfscl_checksattr()
function that is called just before it. As such, this code
had already been executed and didn't do anything.

MFC after: 1 week


# 6c21f6ed 18-Dec-2014 Konstantin Belousov <kib@FreeBSD.org>

The VOP_LOOKUP() implementations for CREATE op do not put the name
into namecache, to avoid cache trashing when doing large operations.
E.g., tar archive extraction is not usually followed by access to many
of the files created.

Right now, each VOP_LOOKUP() implementation explicitely knowns about
this quirk and tests for both MAKEENTRY flag presence and op != CREATE
to make the call to cache_enter(). Centralize the handling of the
quirk into VFS, by deciding to cache only by MAKEENTRY flag in VOP.
VFS now sets NOCACHE flag for CREATE namei() calls.

Note that the change in semantic is backward-compatible and could be
merged to the stable branch, and is compatible with non-changed
third-party filesystems which correctly handle MAKEENTRY.

Suggested by: Chris Torek <torek@pi-coral.com>
Reviewed by: mckusick
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# 65589a29 16-Jul-2014 Konstantin Belousov <kib@FreeBSD.org>

Check for the cross-device cross-link attempt in the VFS, instead of
forcing filesystem VOP_LINK() methods to repeat the code. In
tmpfs_link(), remove redundand check for the type of the source,
already done by VFS.

Note that NFS server already performs this check before calling
VOP_LINK().

Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# c0990eda 23-Apr-2014 Rick Macklem <rmacklem@FreeBSD.org>

Modify the NFSv4 client's Pathconf RPC (actually a Getattr Op.)
so that it only does the RPC for names that are answered by the RPC.
Doing the RPC for other names is harmless, but unnecessary.

MFC after: 2 weeks


# c7b560b9 21-Apr-2014 Rick Macklem <rmacklem@FreeBSD.org>

For an NFSv4 mount with the "nocto" option, don't get the
up to date file attributes upon close. This reduces the
Getattr RPC count by about 65% for software builds.

MFC after: 2 weeks


# cf766161 07-Dec-2013 Rick Macklem <rmacklem@FreeBSD.org>

For software builds, the NFS client does many small
synchronous (with FILE_SYNC) writes because non-contiguous
byte ranges in the same buffer cache block are being
written. This patch adds a new mount option "noncontigwr"
which allows the non-contiguous byte ranges to be combined,
with the dirty byte range becoming the superset of the bytes
that are dirty, if the file has not been file locked.
This reduces the number of writes significantly for software
builds. The only case where this change might break existing
applications is where an application is writing
non-overlapping byte ranges within the same buffer cache block
of a file from multiple clients concurrently.
Since such an application would normally do file locking on
the file, avoiding the byte range merge for files that have
been file locked should be sufficient for most (maybe all?) cases.

Submitted by: jhb (earlier version)
Reviewed by: kib
MFC after: 3 weeks


# 54366c0b 25-Nov-2013 Attilio Rao <attilio@FreeBSD.org>

- For kernel compiled only with KDTRACE_HOOKS and not any lock debugging
option, unbreak the lock tracing release semantic by embedding
calls to LOCKSTAT_PROFILE_RELEASE_LOCK() direclty in the inlined
version of the releasing functions for mutex, rwlock and sxlock.
Failing to do so skips the lockstat_probe_func invokation for
unlocking.
- As part of the LOCKSTAT support is inlined in mutex operation, for
kernel compiled without lock debugging options, potentially every
consumer must be compiled including opt_kdtrace.h.
Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the
dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES
is linked there and it is only used as a compile-time stub [0].

[0] immediately shows some new bug as DTRACE-derived support for debug
in sfxge is broken and it was never really tested. As it was not
including correctly opt_kdtrace.h before it was never enabled so it
was kept broken for a while. Fix this by using a protection stub,
leaving sfxge driver authors the responsibility for fixing it
appropriately [1].

Sponsored by: EMC / Isilon storage division
Discussed with: rstone
[0] Reported by: rstone
[1] Discussed with: philip


# 56239558 21-Jun-2013 Rick Macklem <rmacklem@FreeBSD.org>

When the NFSv4.1 client is writing to a pNFS Data Server (DS), the
file's size attribute does not get updated. As such, it is necessary
to invalidate the attribute cache before clearing NMODIFIED for pNFS.

MFC after: 2 weeks


# 22a72260 30-May-2013 Jeff Roberson <jeff@FreeBSD.org>

- Convert the bufobj lock to rwlock.
- Use a shared bufobj lock in getblk() and inmem().
- Convert softdep's lk to rwlock to match the bufobj lock.
- Move INFREECNT to b_flags and protect it with the buf lock.
- Remove unnecessary locking around bremfree() and BKGRDINPROG.

Sponsored by: EMC / Isilon Storage Division
Discussed with: mckusick, kib, mdf


# 77a03c14 12-May-2013 Rick Macklem <rmacklem@FreeBSD.org>

Add support for the eofflag to nfs_readdir() in the new NFS
client so that it works under a unionfs mount.

Submitted by: Jared Yanovich (slovichon@gmail.com)
Reviewed by: kib
MFC after: 2 weeks


# 3b14c753 13-Mar-2013 John Baldwin <jhb@FreeBSD.org>

Revert 195703 and 195821 as this special stop handling in NFS is now
implemented via VFCF_SBDRY rather than passing PBDRY to individual
sleep calls.


# 89f6b863 08-Mar-2013 Attilio Rao <attilio@FreeBSD.org>

Switch the vm_object mutex to be a rwlock. This will enable in the
future further optimizations where the vm_object lock will be held
in read mode most of the time the page cache resident pool of pages
are accessed for reading purposes.

The change is mostly mechanical but few notes are reported:
* The KPI changes as follow:
- VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK()
- VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK()
- VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK()
- VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED()
(in order to avoid visibility of implementation details)
- The read-mode operations are added:
VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(),
VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED()
* The vm/vm_pager.h namespace pollution avoidance (forcing requiring
sys/mutex.h in consumers directly to cater its inlining functions
using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h
consumers now must include also sys/rwlock.h.
* zfs requires a quite convoluted fix to include FreeBSD rwlocks into
the compat layer because the name clash between FreeBSD and solaris
versions must be avoided.
At this purpose zfs redefines the vm_object locking functions
directly, isolating the FreeBSD components in specific compat stubs.

The KPI results heavilly broken by this commit. Thirdy part ports must
be updated accordingly (I can think off-hand of VirtualBox, for example).

Sponsored by: EMC / Isilon storage division
Reviewed by: jeff
Reviewed by: pjd (ZFS specific review)
Discussed with: alc
Tested by: pho


# d177f14d 18-Jan-2013 John Baldwin <jhb@FreeBSD.org>

Use vfs_timestamp() to set file timestamps rather than invoking
getmicrotime() or getnanotime() directly in NFS.

Reviewed by: rmacklem, bde
MFC after: 1 week


# 1f60bfd8 08-Dec-2012 Rick Macklem <rmacklem@FreeBSD.org>

Move the NFSv4.1 client patches over from projects/nfsv4.1-client
to head. I don't think the NFS client behaviour will change unless
the new "minorversion=1" mount option is used. It includes basic
NFSv4.1 support plus support for pNFS using the Files Layout only.
All problems detecting during an NFSv4.1 Bakeathon testing event
in June 2012 have been resolved in this code and it has been tested
against the NFSv4.1 server available to me.
Although not reviewed, I believe that kib@ has looked at it.


# 7af1242a 11-May-2012 Rick Macklem <rmacklem@FreeBSD.org>

PR# 165923 reported intermittent write failures for dirty
memory mapped pages being written back on an NFS mount.
Since any thread can call VOP_PUTPAGES() to write back a
dirty page, the credentials of that thread may not have
write access to the file on an NFS server. (Often the uid
is 0, which may be mapped to "nobody" in the NFS server.)
Although there is no completely correct fix for this
(NFS servers check access on every write RPC instead of at
open/mmap time), this patch avoids the common cases by
holding onto a credential that recently opened the file
for writing and uses that credential for the write RPCs
being done by VOP_PUTPAGES() for both NFS clients.

Tested by: Joel Ray Holveck (joelh at juniper.net)
PR: kern/165923
Reviewed by: kib
MFC after: 2 weeks


# 4964d807 27-Apr-2012 Rick Macklem <rmacklem@FreeBSD.org>

It was reported via email that some non-FreeBSD NFS servers
do not include file attributes in the reply to an NFS create RPC
under certain circumstances.
This resulted in a vnode of type VNON that was not usable.
This patch adds an NFS getattr RPC to nfs_create() for this case,
to fix the problem. It was tested by the person that reported
the problem and confirmed to fix this case for their server.

Tested by: Steven Haber (steven.haber at isilon.com)
MFC after: 2 weeks


# a53373fa 17-Mar-2012 Konstantin Belousov <kib@FreeBSD.org>

Add sysctl vfs.nfs.nfs_keep_dirty_on_error to switch the nfs client
behaviour on error from write RPC back to behaviour of old nfs client.
When set to not zero, the pages for which write failed are kept dirty.

PR: kern/165927
Reviewed by: alc
MFC after: 2 weeks


# 5e99212d 02-Mar-2012 Rick Macklem <rmacklem@FreeBSD.org>

Post r230394, the Lookup RPC counts for both NFS clients increased
significantly. Upon investigation this was caused by name cache
misses for lookups of "..". For name cache entries for non-".."
directories, the cache entry serves double duty. It maps both the
named directory plus ".." for the parent of the directory. As such,
two ctime values (one for each of the directory and its parent) need
to be saved in the name cache entry.
This patch adds an entry for ctime of the parent directory to the
name cache. It also adds an additional uma zone for large entries
with this time value, in order to minimize memory wastage.
As well, it fixes a couple of cases where the mtime of the parent
directory was being saved instead of ctime for positive name cache
entries. With this patch, Lookup RPC counts return to values similar
to pre-r230394 kernels.

Reported by: bde
Discussed with: kib
Reviewed by: jhb
MFC after: 2 weeks


# 526d0bd5 20-Feb-2012 Konstantin Belousov <kib@FreeBSD.org>

Fix found places where uio_resid is truncated to int.

Add the sysctl debug.iosize_max_clamp, enabled by default. Setting the
sysctl to zero allows to perform the SSIZE_MAX-sized i/o requests from
the usermode.

Discussed with: bde, das (previous versions)
MFC after: 1 month


# bf40d24a 06-Feb-2012 John Baldwin <jhb@FreeBSD.org>

Rename cache_lookup_times() to cache_lookup() and retire the old API and
ABI stub for cache_lookup().


# c480f781 06-Feb-2012 Konstantin Belousov <kib@FreeBSD.org>

Current implementations of sync(2) and syncer vnode fsync() VOP uses
mnt_noasync counter to temporary remove MNTK_ASYNC mount option, which
is needed to guarantee a synchronous completion of the initiated i/o
before syscall or VOP return. Global removal of MNTK_ASYNC option is
harmful because not only i/o started from corresponding thread becomes
synchronous, but all i/o is synchronous on the filesystem which is
initiated during sync(2) or syncer activity.

Instead of removing MNTK_ASYNC from mnt_kern_flag, provide a local
thread flag to disable async i/o for current thread only. Use the
opportunity to move DOINGASYNC() macro into sys/vnode.h and
consistently use it through places which tested for MNTK_ASYNC.

Some testing demonstrated 60-70% improvements in run time for the
metadata-intensive operations on async-mounted UFS volumes, but still
with great deviation due to other reasons.

Reviewed by: mckusick
Tested by: scottl
MFC after: 2 weeks


# d5210589 25-Jan-2012 Konstantin Belousov <kib@FreeBSD.org>

Fix remaining calls to cache_enter() in both NFS clients to provide
appropriate timestamps. Restore the assertions which verify that
NCF_TS is set when timestamp is asked for.

Reviewed by: jhb (previous version)
MFC after: 2 weeks


# 0b17c7be 25-Jan-2012 John Baldwin <jhb@FreeBSD.org>

Add a timeout on positive name cache entries in the NFS client. That is,
we will only trust a positive name cache entry for a specified amount of
time before falling back to a LOOKUP RPC, even if the ctime for the file
handle matches the cached copy in the name cache entry. The timeout is
configured via a new 'nametimeo' mount option and defaults to 60 seconds.
It may be set to zero to disable positive name caching entirely.

Reviewed by: rmacklem
MFC after: 1 week


# 5aefb4cb 20-Jan-2012 John Baldwin <jhb@FreeBSD.org>

Close a race in NFS lookup processing that could result in stale name cache
entries on one client when a directory was renamed on another client. The
root cause for the stale entry being trusted is that each per-vnode nfsnode
structure has a single 'n_ctime' timestamp used to validate positive name
cache entries. However, if there are multiple entries for a single vnode,
they all share a single timestamp. To fix this, extend the name cache
to allow filesystems to optionally store a timestamp value in each name
cache entry. The NFS clients now fetch the timestamp associated with
each name cache entry and use that to validate cache hits instead of the
timestamps previously stored in the nfsnode. Another part of the fix is
that the NFS clients now use timestamps from the post-op attributes of
RPCs when adding name cache entries rather than pulling the timestamps out
of the file's attribute cache. The latter is subject to races with other
lookups updating the attribute cache concurrently. Some more details:
- Add a variant of nfsm_postop_attr() to the old NFS client that can return
a vattr structure with a copy of the post-op attributes.
- Handle lookups of "." as a special case in the NFS clients since the name
cache does not store name cache entries for ".", so we cannot get a
useful timestamp. It didn't really make much sense to recheck the
attributes on the the directory to validate the namecache hit for "."
anyway.
- ABI compat shims for the name cache routines are present in this commit
so that it is safe to MFC.

MFC after: 2 weeks


# a5e583ee 15-Nov-2011 Rick Macklem <rmacklem@FreeBSD.org>

Modify the new NFS client so that nfs_fsync() only calls ncl_flush()
for regular files. Since other file types don't write into the
buffer cache, calling ncl_flush() is almost a no-op. However, it does
clear the NMODIFIED flag and this shouldn't be done by nfs_fsync() for
directories.

MFC after: 2 weeks


# 670bf6f1 13-Nov-2011 Rick Macklem <rmacklem@FreeBSD.org>

Since NFSv4 byte range locking only works for regular files,
add a sanity check for the vnode type to the NFSv4 client.

MFC after: 2 weeks


# d1907de2 30-Jul-2011 Rick Macklem <rmacklem@FreeBSD.org>

The new NFS client failed to vput() the new vnode if a setattr
failed after the file was created in nfs_create(). This would
probably only happen during a forced dismount. The old NFS client
does have a vput() for this case. Detected by pho during recent
testing, where an open syscall returned with a vnode still locked.

Tested by: pho
Approved by: re (kib)
MFC after: 2 weeks


# 68347a92 16-Jul-2011 Zack Kirsch <zack@FreeBSD.org>

Simple find/replace of VOP_ISLOCKED -> NFSVOPISLOCKED. This is done so that NFSVOPISLOCKED can be modified later to add enhanced logging and assertions.

Reviewed by: rmacklem
Approved by: zml (mentor)
MFC after: 2 weeks


# a9989634 16-Jul-2011 Zack Kirsch <zack@FreeBSD.org>

Simple find/replace of VOP_UNLOCK -> NFSVOPUNLOCK. This is done so that NFSVOPUNLOCK can be modified later to add enhanced logging and assertions.

Reviewed by: rmacklem
Approved by: zml (mentor)
MFC after: 2 weeks


# 98f234f3 16-Jul-2011 Zack Kirsch <zack@FreeBSD.org>

Simple find/replace of vn_lock -> NFSVOPLOCK. This is done so that NFSVOPLOCK can be modified later to add enhanced logging and assertions.

Reviewed by: rmacklem
Approved by: zml (mentor)
MFC after: 2 weeks


# 8f0e65c9 18-Jun-2011 Rick Macklem <rmacklem@FreeBSD.org>

Add DTrace support to the new NFS client. This is essentially
cloned from the old NFS client, plus additions for NFSv4. A
review of this code is in progress, however it was felt by the
reviewer that it could go in now, before code slush. Any changes
required by the review can be committed as bug fixes later.


# fb35711d 05-Jun-2011 Rick Macklem <rmacklem@FreeBSD.org>

Add support for flock(2) locks to the new NFSv4 client. I think this
should be ok, since the client now delays NFSv4 Close operations
until VOP_INACTIVE()/VOP_RECLAIM(). As such, there should be no
risk that the NFSv4 Open is closed while an associated byte range lock
still exists.

Tested by: avg
MFC after: 2 weeks


# f8f4e256 05-Jun-2011 Rick Macklem <rmacklem@FreeBSD.org>

The new NFSv4 client was erroneously using "p" instead of
"p_leader" for the "id" for POSIX byte range locking. I think
this would only have affected processes created by rfork(2)
with the RFTHREAD flag specified. This patch fixes that by
passing the "id" down through the various functions from
nfs_advlock().

MFC after: 2 weeks


# b398d106 31-May-2011 Rick Macklem <rmacklem@FreeBSD.org>

Fix the new NFS client so that it doesn't do an NFSv3
Pathconf RPC for cases where the reply doesn't include
the answer. This fixes a problem reported by avg@ where
the NFSv3 Pathconf RPC would fail when "ls -l" did an
lpathconf(2) for _PC_ACL_NFS4.

Tested by: avg
MFC after: 2 weeks


# 81ddb192 25-May-2011 Rick Macklem <rmacklem@FreeBSD.org>

Add some missing mutex locking to the new NFS client.

MFC after: 2 weeks


# 147206ae 25-May-2011 Rick Macklem <rmacklem@FreeBSD.org>

Fix the new NFS client so that it correctly sets the "must_commit"
argument for a write RPC when it succeeds for the first one and
fails for a subsequent RPC within the same call to the function.
This makes it compatible with the old NFS client for this case.

MFC after: 2 weeks


# 76036f2b 22-May-2011 Alan Cox <alc@FreeBSD.org>

Eliminate duplicate #include's.


# 1f376590 15-May-2011 Rick Macklem <rmacklem@FreeBSD.org>

Change the sysctl naming for the old and new NFS clients
to vfs.oldnfs.xxx and vfs.nfs.xxx respectively. This makes
the default nfs client use vfs.nfs.xxx after r221124.


# e2f2b370 04-May-2011 Ruslan Ermilov <ru@FreeBSD.org>

Implemented a mount option "nocto" that disables cache coherency
checking at open time. It may improve performance for read-only
NFS mounts. Use deliberately.

MFC after: 1 week
Reviewed by: rmacklem, jhb (earlier version)


# bc62b5cf 17-Apr-2011 Rick Macklem <rmacklem@FreeBSD.org>

Add a vput() to nfs_lookitup() in the experimental NFS client
for a case that will probably never happen. It can only
happen if a server were to successfully lookup a file, but not
return attributes for that file. Although technically allowed
by the NFSv3 RFC, I doubt any server would ever do this.
However, if it did, the client would have not vput()'d the
new vnode when it needed to do so.

MFC after: 2 weeks


# ab42af27 17-Apr-2011 Rick Macklem <rmacklem@FreeBSD.org>

Add vput() calls in two places in the experimental NFS client
that would be needed if, in the future, nfscl_loadattrcache()
were to return an error. Currently nfscl_loadattrcache()
never returns an error, so these cases never currently happen.

MFC after: 2 weeks


# 78d8a600 17-Apr-2011 Rick Macklem <rmacklem@FreeBSD.org>

Change the mutex locking for several locations in the
experimental NFS client's vnode op functions to make
them compatible with the regular NFS client. I'll admit
I'm not sure that the mutex locks around the assignments
are needed, but the regular client has them, so I added them.
Also, add handling of the case of partial attributes in
setattr to be compatible with the regular client.

MFC after: 2 weeks


# 0a9f005d 17-Apr-2011 Rick Macklem <rmacklem@FreeBSD.org>

Fix up some of the sysctls for the experimental NFS client so
that they use the same names as the regular client. Also add
string descriptions for them.

MFC after: 2 weeks


# 4b3a38ec 16-Apr-2011 Rick Macklem <rmacklem@FreeBSD.org>

Add a lktype flags argument to nfscl_nget() and ncl_nget() in the
experimental NFS client so that its nfs_lookup() function can use
cn_lkflags in a manner analagous to the regular NFS client.

MFC after: 2 weeks


# 149ce102 13-Apr-2011 Rick Macklem <rmacklem@FreeBSD.org>

Add VOP_PATHCONF() support to the experimental NFS client
so that it can, along with other things, report whether or
not NFS4 ACLs are supported.

MFC after: 2 weeks


# f93d95cb 29-Oct-2010 Rick Macklem <rmacklem@FreeBSD.org>

Modify nfs_open() in the experimental NFS client to be compatible
with the regular NFS client. Also, fix a couple of mutex lock issues.

MFC after: 1 week


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# ca27c028 18-Oct-2010 Rick Macklem <rmacklem@FreeBSD.org>

Modify the NFS clients and the NLM so that the NLM can be used
by both clients. Since the NLM uses various fields of the
nfsmount structure, those fields were extracted and put in a
separate nfs_mountcommon structure stored in sys/nfs/nfs_mountcommon.h.
This structure also has a function pointer for a function that
extracts the required information from the mount point and nfs vnode
for that particular client, for information stored differently by the
clients.

Reviewed by: jhb
MFC after: 2 weeks


# a8c0af59 09-Sep-2010 Rick Macklem <rmacklem@FreeBSD.org>

Fix the experimental NFS client so that it doesn't panic when
NFSv2,3 byte range locking is attempted. A fix that allows the
nlm_advlock() to work with both clients is in progress, but
may take a while. As such, I am doing this commit so that
the kernel doesn't panic in the meantime.

Submitted by: jh
MFC after: 2 weeks


# 8e27c182 07-Sep-2010 John Baldwin <jhb@FreeBSD.org>

Store the full timestamp when caching timestamps of files and
directories for purposes of validating name cache entries. This
closes races where two updates to a file or directory within the same
second could result in stale entries in the name cache. While here,
remove the 'n_expiry' field as it is no longer used.

Reviewed by: rmacklem
MFC after: 1 week


# 0372f5f4 04-Sep-2010 Rick Macklem <rmacklem@FreeBSD.org>

Disable use of the NLM in the experimental NFS client, since
it will crash the kernel because it uses the nfsmount and
nfsnode structures of the regular NFS client.

MFC after: 2 weeks


# e3649d5a 02-Aug-2010 Rick Macklem <rmacklem@FreeBSD.org>

Modify the return value for nfscl_mustflush() from boolean_t,
which I mistakenly thought was correct w.r.t. style(9), back
to int and add the checks for != 0. This is just a stylistic
modification.

MFC after: 1 week


# f92bbff2 24-Jul-2010 Rick Macklem <rmacklem@FreeBSD.org>

Move sys/nfsclient/nfs_lock.c into sys/nfs and build it as a separate
module that can be used by both the regular and experimental nfs
clients. This fixes the problem reported by jh@ where /dev/nfslock
would be registered twice when both nfs clients were used.
I also defined the size of the lm_fh field to be the correct value,
as it should be the maximum size of an NFSv3 file handle.

Reviewed by: jh
MFC after: 2 weeks


# 6ec1ef63 18-Jul-2010 Rick Macklem <rmacklem@FreeBSD.org>

Add a call to nfscl_mustflush() in nfs_close() of the experimental
NFSv4 client, so that attributes are not acquired from the server
when a delegation for the file is held. This can reduce the number
of Getattr Ops significantly.

MFC after: 2 weeks


# f9b1a4a3 15-Jul-2010 John Baldwin <jhb@FreeBSD.org>

Merge 208603, 209946, and 209948 to the new NFS client:
Move attribute cache flushes from VOP_OPEN() to VOP_LOOKUP() to provide
more graceful recovery for stale filehandles and eliminate the need for
conditionally clearing the attribute cache in the !NMODIFIED case in
VOP_OPEN().

Reviewed by: rmacklem
MFC after: 2 weeks


# b38f7723 12-Jun-2010 Konstantin Belousov <kib@FreeBSD.org>

In NFS clients, instead of inconsistently using #ifdef
DIAGNOSTIC and #ifndef DIAGNOSTIC for debug assertions, prefer
KASSERT(). Also change one #ifdef DIAGNOSTIC in the new nfs server.

Submitted by: Mikolaj Golub <to.my.trociny gmail com>
MFC after: 2 weeks


# 227b9ebe 30-Apr-2010 Rick Macklem <rmacklem@FreeBSD.org>

MFC: r207170
An NFSv4 server will reply NFSERR_GRACE for non-recovery RPCs
during the grace period after startup. This grace period must
be at least the lease duration, which is typically 1-2 minutes.
It seems prudent for the experimental NFS client to wait a few
seconds before retrying such an RPC, so that the server isn't
flooded with non-recovery RPCs during recovery. This patch adds
an argument to nfs_catnap() to implement a 5 second delay
for this case.


# 52d8bf89 29-Apr-2010 Rick Macklem <rmacklem@FreeBSD.org>

MFC: r207082
When the experimental NFS client is handling an NFSv4 server reboot
with delegations enabled, the recovery could fail if the renew
thread is trying to return a delegation, since it will not do the
recovery. This patch fixes the above by having nfscl_recalldeleg()
fail with the I/O operations returning EIO, so that they will be
attempted later. Most of the patch consists of adding an argument
to various functions to indicate the delegation recall case where
this needs to be done.


# 23f929df 24-Apr-2010 Rick Macklem <rmacklem@FreeBSD.org>

An NFSv4 server will reply NFSERR_GRACE for non-recovery RPCs
during the grace period after startup. This grace period must
be at least the lease duration, which is typically 1-2 minutes.
It seems prudent for the experimental NFS client to wait a few
seconds before retrying such an RPC, so that the server isn't
flooded with non-recovery RPCs during recovery. This patch adds
an argument to nfs_catnap() to implement a 5 second delay
for this case.

MFC after: 1 week


# 67c5c2d2 22-Apr-2010 Rick Macklem <rmacklem@FreeBSD.org>

When the experimental NFS client is handling an NFSv4 server reboot
with delegations enabled, the recovery could fail if the renew
thread is trying to return a delegation, since it will not do the
recovery. This patch fixes the above by having nfscl_recalldeleg()
fail with the I/O operations returning EIO, so that they will be
attempted later. Most of the patch consists of adding an argument
to various functions to indicate the delegation recall case where
this needs to be done.

MFC after: 1 week


# 16a11310 13-Feb-2010 Rick Macklem <rmacklem@FreeBSD.org>

MFC: r203303
Patch the experimental NFS client so that there is a timeout
for negative name cache entries in a manner analogous to
r202767 for the regular NFS client. Also, make the code in
nfs_lookup() compatible with that of the regular client
and replace the sysctl variable that enabled negative name
caching with the mount point option.


# dd5b5a94 31-Jan-2010 Rick Macklem <rmacklem@FreeBSD.org>

Patch the experimental NFS client so that there is a timeout
for negative name cache entries in a manner analogous to
r202767 for the regular NFS client. Also, make the code in
nfs_lookup() compatible with that of the regular client
and replace the sysctl variable that enabled negative name
caching with the mount point option.

MFC after: 2 weeks


# 6049e7fc 11-Jan-2010 Rick Macklem <rmacklem@FreeBSD.org>

MFC: r201029
When porting the experimental nfs subsystem to the FreeBSD8 krpc,
I added 3 functions that were already in the experimental client
under different names. This patch deletes the functions in the
experimental client and renames the calls to use the other set.
(This is just removal of duplicated code and does not fix any bug.)


# 4a8e2176 26-Dec-2009 Rick Macklem <rmacklem@FreeBSD.org>

When porting the experimental nfs subsystem to the FreeBSD8 krpc,
I added 3 functions that were already in the experimental client
under different names. This patch deletes the functions in the
experimental client and renames the calls to use the other set.
(This is just removal of duplicated code and does not fix any bug.)

MFC after: 2 weeks


# 74991298 03-Dec-2009 Edward Tomasz Napierala <trasz@FreeBSD.org>

Remove unneeded ifdefs.

Reviewed by: rmacklem


# 1bb015c0 11-Nov-2009 Jaakko Heinonen <jh@FreeBSD.org>

Create verifier used by FreeBSD NFS client is suboptimal because the
first part of a verifier is set to the first IP address from
V_in_ifaddrhead list. This address is typically the loopback address
making the first part of the verifier practically non-unique. The second
part of the verifier is initialized to zero making its initial value
non-unique too.

This commit changes the strategy for create verifier initialization:
just initialize it to a random value. Also move verifier handling into
its own function and use a mutex to protect the variable.

This change is a candidate for porting to sys/nfsclient.

Reviewed by: jhb, rmacklem
Approved by: trasz (mentor)


# 83864c81 28-Aug-2009 Marko Zec <zec@FreeBSD.org>

MFC r196503:

Fix NFS panics with options VIMAGE kernels by apropriately setting curvnet
context inside the RPC code.

Temporarily set td's cred to mount's cred before calling socreate() via
__rpc_nconf2socket().

Submitted by: rmacklem (in part)
Reviewed by: rmacklem, rwatson
Discussed with: dfr, bz
Approved by: re (rwatson), julian (mentor)

Approved by: re (rwatson)


# 0348c661 24-Aug-2009 Marko Zec <zec@FreeBSD.org>

Fix NFS panics with options VIMAGE kernels by apropriately setting curvnet
context inside the RPC code.

Temporarily set td's cred to mount's cred before calling socreate() via
__rpc_nconf2socket().

Submitted by: rmacklem (in part)
Reviewed by: rmacklem, rwatson
Discussed with: dfr, bz
Approved by: re (rwatson), julian (mentor)
MFC after: 3 days


# 6a795918 29-Jul-2009 Rick Macklem <rmacklem@FreeBSD.org>

Fix the experimental nfs client so that it only calls ncl_vinvalbuf()
for NFSv2 and not NFSv4 when nfscl_mustflush() returns 0. Since
nfscl_mustflush() only returns 0 when there is a valid write delegation
issued to the client, it only affects the case of an NFSv4 mount with
callbacks/delegations enabled.

Approved by: re (kensmith), kib (mentor)


# c79e6976 22-Jul-2009 Rick Macklem <rmacklem@FreeBSD.org>

Add changes to the experimental nfs client to use the PBDRY flag for
msleep(9) when a vnode lock or similar may be held. The changes are
just a clone of the changes applied to the regular nfs client by
r195703.

Approved by: re (kensmith), kib (mentor)


# 405229f9 14-Jul-2009 Rick Macklem <rmacklem@FreeBSD.org>

Fix the experimental nfs client so that it does not cause a
"share->excl" panic when doing a lookup of dotdot at the root
of a server's file system. The patch avoids calling vn_lock()
for that case, since nfscl_nget() has already acquired a lock
for the vnode.

Approved by: re (kensmith), kib (mentor)


# eddfbb76 14-Jul-2009 Robert Watson <rwatson@FreeBSD.org>

Build on Jeff Roberson's linker-set based dynamic per-CPU allocator
(DPCPU), as suggested by Peter Wemm, and implement a new per-virtual
network stack memory allocator. Modify vnet to use the allocator
instead of monolithic global container structures (vinet, ...). This
change solves many binary compatibility problems associated with
VIMAGE, and restores ELF symbols for virtualized global variables.

Each virtualized global variable exists as a "reference copy", and also
once per virtual network stack. Virtualized global variables are
tagged at compile-time, placing the in a special linker set, which is
loaded into a contiguous region of kernel memory. Virtualized global
variables in the base kernel are linked as normal, but those in modules
are copied and relocated to a reserved portion of the kernel's vnet
region with the help of a the kernel linker.

Virtualized global variables exist in per-vnet memory set up when the
network stack instance is created, and are initialized statically from
the reference copy. Run-time access occurs via an accessor macro, which
converts from the current vnet and requested symbol to a per-vnet
address. When "options VIMAGE" is not compiled into the kernel, normal
global ELF symbols will be used instead and indirection is avoided.

This change restores static initialization for network stack global
variables, restores support for non-global symbols and types, eliminates
the need for many subsystem constructors, eliminates large per-subsystem
structures that caused many binary compatibility issues both for
monitoring applications (netstat) and kernel modules, removes the
per-function INIT_VNET_*() macros throughout the stack, eliminates the
need for vnet_symmap ksym(2) munging, and eliminates duplicate
definitions of virtualized globals under VIMAGE_GLOBALS.

Bump __FreeBSD_version and update UPDATING.

Portions submitted by: bz
Reviewed by: bz, zec
Discussed with: gnn, jamie, jeff, jhb, julian, sam
Suggested by: peter
Approved by: re (kensmith)


# ad86aef9 12-Jul-2009 Rick Macklem <rmacklem@FreeBSD.org>

Fix the handling of dotdot in lookup for the experimental nfs client
in a manner analagous to the change in r195294 for the regular nfs client.

Approved by: re (kensmith), kib (mentor)


# 2d9cfaba 25-Jun-2009 Robert Watson <rwatson@FreeBSD.org>

Add a new global rwlock, in_ifaddr_lock, which will synchronize use of the
in_ifaddrhead and INADDR_HASH address lists.

Previously, these lists were used unsynchronized as they were effectively
never changed in steady state, but we've seen increasing reports of
writer-writer races on very busy VPN servers as core count has gone up
(and similar configurations where address lists change frequently and
concurrently).

For the time being, use rwlocks rather than rmlocks in order to take
advantage of their better lock debugging support. As a result, we don't
enable ip_input()'s read-locking of INADDR_HASH until an rmlock conversion
is complete and a performance analysis has been done. This means that one
class of reader-writer races still exists.

MFC after: 6 weeks
Reviewed by: bz


# ebd8672c 17-Jun-2009 Bjoern A. Zeeb <bz@FreeBSD.org>

Add explicit includes for jail.h to the files that need them and
remove the "hidden" one from vimage.h.


# 81e3c4fc 17-Jun-2009 Rick Macklem <rmacklem@FreeBSD.org>

Fix handling of ".." in nfs_lookup() for the forced dismount case
by cribbing the change made to the regular nfs client in r194358.

Approved by: kib (mentor)


# 410654ec 09-Jun-2009 Rick Macklem <rmacklem@FreeBSD.org>

Since vn_lock() with the LK_RETRY flag never returns an error
for FreeBSD-CURRENT, the code that checked for and returned the
error was broken. Change it to check for VI_DOOMED set after
vn_lock() and return an error for that case. I believe this
should only happen for forced dismounts.

Approved by: kib (mentor)


# 705fe7ce 31-May-2009 Marko Zec <zec@FreeBSD.org>

Unbreak options VIMAGE kernel builds.

Approved by: julian (mentor)


# 6ffb637d 22-May-2009 Rick Macklem <rmacklem@FreeBSD.org>

Change the code in the experimental nfs client to avoid flushing
writes upon close when a write delegation is held by the client.
This should be safe to do, now that nfsv4 Close operations are
delayed until ncl_inactive() is called for the vnode.

Approved by: kib (mentor)


# 47a59856 18-May-2009 Rick Macklem <rmacklem@FreeBSD.org>

Change the experimental NFSv4 client so that it does not do
the NFSv4 Close operations until ncl_inactive(). This is
necessary so that the Open StateIDs are available for doing
I/O on mmap'd files after VOP_CLOSE(). I also changed some
indentation for the nfscl_getclose() function.

Approved by: kib (mentor)


# 9ec7b004 04-May-2009 Rick Macklem <rmacklem@FreeBSD.org>

Add the experimental nfs subtree to the kernel, that includes
support for NFSv4 as well as NFSv2 and 3.
It lives in 3 subdirs under sys/fs:
nfs - functions that are common to the client and server
nfsclient - a mutation of sys/nfsclient that call generic functions
to do RPCs and handle state. As such, it retains the
buffer cache handling characteristics and vnode semantics that
are found in sys/nfsclient, for the most part.
nfsserver - the server. It includes a DRC designed specifically for
NFSv4, that is used instead of the generic DRC in sys/rpc.
The build glue will be checked in later, so at this point, it
consists of 3 new subdirs that should not affect kernel building.

Approved by: kib (mentor)