History log of /linux-master/fs/nfsd/nfs4proc.c
Revision Date Author Comments
# 24d92de9 15-Feb-2024 Trond Myklebust <trond.myklebust@hammerspace.com>

nfsd: Fix NFSv3 atomicity bugs in nfsd_setattr()

The main point of the guarded SETATTR is to prevent races with other
WRITE and SETATTR calls. That requires that the check of the guard time
against the inode ctime be done after taking the inode lock.

Furthermore, we need to take into account the 32-bit nature of
timestamps in NFSv3, and the possibility that files may change at a
faster rate than once a second.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
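
The ordering this commit describes can be sketched in user space. This is a minimal illustration, not the kernel code: struct sketch_inode, struct sketch_time, guard_matches() and guarded_set_mode() are hypothetical names, a pthread mutex stands in for the inode lock, and comparing the nanoseconds field is an assumed reading of "files may change at a faster rate than once a second".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <pthread.h>

/* Hypothetical stand-ins for the kernel's timestamp and inode types. */
struct sketch_time {
	int64_t sec;
	long    nsec;
};

struct sketch_inode {
	pthread_mutex_t    lock;   /* plays the role of the inode lock */
	struct sketch_time ctime;
	unsigned int       mode;
};

/*
 * NFSv3 carries 32-bit seconds on the wire, so only the low 32 bits of
 * the seconds field are compared. Nanoseconds are compared too
 * (assumption) so that two changes within one second are told apart.
 */
static bool guard_matches(const struct sketch_time *guard,
			  const struct sketch_time *ctime)
{
	return (uint32_t)guard->sec == (uint32_t)ctime->sec &&
	       guard->nsec == ctime->nsec;
}

/* Returns 0 on success, -1 if the guard does not match the ctime. */
static int guarded_set_mode(struct sketch_inode *inode,
			    const struct sketch_time *guard,
			    unsigned int new_mode)
{
	int ret = 0;

	assert(inode != NULL);
	pthread_mutex_lock(&inode->lock);
	/* The guard is evaluated only once the lock is held. */
	if (guard && !guard_matches(guard, &inode->ctime))
		ret = -1;
	else
		inode->mode = new_mode;
	pthread_mutex_unlock(&inode->lock);
	return ret;
}
```

Because the comparison happens inside the locked region, a concurrent writer cannot change the ctime between the check and the update.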


# 6412e44c 15-Feb-2024 Trond Myklebust <trond.myklebust@hammerspace.com>

nfsd: Fix a regression in nfsd_setattr()

Commit bb4d53d66e4b ("NFSD: use (un)lock_inode instead of
fh_(un)lock for file operations") broke the NFSv3 pre/post op
attributes behaviour when doing a SETATTR rpc call by stripping out
the calls to fh_fill_pre_attrs() and fh_fill_post_attrs().

Fixes: bb4d53d66e4b ("NFSD: use (un)lock_inode instead of fh_(un)lock for file operations")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Message-ID: <20240216012451.22725-1-trondmy@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 4b148854 26-Jan-2024 Josef Bacik <josef@toxicpanda.com>

nfsd: make all of the nfsd stats per-network namespace

We have a global set of counters that we modify for all of the nfsd
operations, but now that we're exposing these stats across all network
namespaces we need to make the stats also be per-network namespace. We
already have some caching stats that are per-network namespace, so move
these definitions into the same counter and then adjust all the helpers
and users of these stats to provide the appropriate nfsd_net struct so
that the stats are maintained for the per-network namespace objects.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# a2c91753 17-Nov-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Modify NFSv4 to use nfsd_read_splice_ok()

Avoid the use of an atomic bitop, and prepare for adding a run-time
switch for using splice reads.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 11fec9b9 04-Oct-2023 Jeff Layton <jlayton@kernel.org>

nfsd: convert to new timestamp accessors

Convert to using the new inode timestamp accessor functions.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20231004185347.80880-50-jlayton@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>


# 21d316a7 09-Oct-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Clean up nfsd4_encode_copy_notify()

Replace open-coded encoding logic with the use of conventional XDR
utility functions.

Note that if we replace the cpn_sec and cpn_nsec fields with a
single struct timespec64 field, the encoder can use
nfsd4_encode_nfstime4(), as that is the data type specified by the
XDR spec.

NFS4ERR_INVAL seems inappropriate if the encoder doesn't support
encoding the response. Instead use NFS4ERR_SERVERFAULT, since this
condition is a software bug on the server.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 92d82e99 12-Oct-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Remove a layering violation when encoding lock_denied

An XDR encoder is responsible for marshaling results, not releasing
memory that was allocated by the upper layer. We have .op_release
for that purpose.

Move the release of the ld_owner.data string to op_release functions
for LOCK and LOCKT.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# cc313f80 25-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Clean up nfsd4_encode_layoutcommit()

Adopt the use of conventional XDR utility functions. Restructure
the encoder to better align with the XDR definition of the result.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# fa341560 11-Sep-2023 NeilBrown <neilb@suse.de>

SUNRPC: change how svc threads are asked to exit.

svc threads are currently stopped using kthread_stop(). This requires
identifying a specific thread. However we don't care which thread
stops, just as long as one does.

So instead, set a flag in the svc_pool to say that a thread needs to
die, and have each thread check this flag instead of calling
kthread_should_stop(). The first thread to find and clear this flag
then moves towards exiting.

This removes an explicit dependency on sp_all_threads which will make a
future patch simpler.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
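
The "first thread to find and clear the flag" idea can be sketched with C11 atomics. This is an illustration only: struct sketch_pool and both helper names are hypothetical, and the kernel uses its own bit operations on the svc_pool rather than <stdatomic.h>.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pool: one flag means "some thread should exit". */
struct sketch_pool {
	atomic_bool victim_wanted;
};

/* Called by whoever wants the pool to shrink by one thread. */
static void pool_request_exit(struct sketch_pool *pool)
{
	assert(pool != NULL);
	atomic_store(&pool->victim_wanted, true);
}

/*
 * Each worker polls this in its main loop. atomic_exchange() reads and
 * clears the flag in a single step, so exactly one caller observes
 * "true" for each request, no matter which thread gets there first.
 */
static bool pool_thread_should_exit(struct sketch_pool *pool)
{
	assert(pool != NULL);
	return atomic_exchange(&pool->victim_wanted, false);
}
```

The requester never has to name a particular thread, which matches the stated goal of not caring which thread stops.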


# 5896a870 31-Aug-2023 Dai Ngo <dai.ngo@oracle.com>

NFSD: add trace points to track server copy progress

Add trace points on destination server to track inter and intra
server copy operations.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Tested-by: Chen Hanxiao <chenhx.fnst@fujitsu.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 15d1975b 30-Aug-2023 Dai Ngo <dai.ngo@oracle.com>

NFSD: initialize copy->cp_clp early in nfsd4_copy for use by trace point

Prepare for adding server copy trace points.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Tested-by: Chen Hanxiao <chenhx.fnst@fujitsu.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# fdd2630a 09-Sep-2023 Jeff Layton <jlayton@kernel.org>

nfsd: fix change_info in NFSv4 RENAME replies

nfsd sends the transposed directory change info in the RENAME reply. The
source directory is in save_fh and the target is in current_fh.

Reported-by: Zhi Li <yieli@redhat.com>
Reported-by: Benjamin Coddington <bcodding@redhat.com>
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2218844
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 39039024 18-Jul-2023 NeilBrown <neilb@suse.de>

nfsd: don't allow nfsd threads to be signalled.

The original implementation of nfsd used signals to stop threads during
shutdown.
In Linux 2.3.46pre5 nfsd gained the ability to shutdown threads
internally if it was asked to run "0" threads. After this, user-space
transitioned to using "rpc.nfsd 0" to stop nfsd and sending signals to
threads was no longer an important part of the API.

In commit 3ebdbe5203a8 ("SUNRPC: discard svo_setup and rename
svc_set_num_threads_sync()") (v5.17-rc1~75^2~41) we finally removed the
use of signals for stopping threads, using kthread_stop() instead.

This patch makes the "obvious" next step and removes the ability to
signal nfsd threads - or any svc threads. nfsd stops allowing signals
and we don't check for their delivery any more.

This will allow for some simplification in later patches.

A change worth noting is in nfsd4_ssc_setup_dul(). There was previously
a signal_pending() check which would only succeed when the thread was
being shut down. It should really have tested kthread_should_stop() as
well. Now it just does the latter, not the former.

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# f2b7019d 24-Jul-2023 Jeff Layton <jlayton@kernel.org>

nfsd: set missing after_change as before_change + 1

In the event that we can't fetch post_op_attr attributes, we still need
to set a value for the after_change. The operation has already happened,
so we're not able to return an error at that point, but we do want to
ensure that the client knows that its cache should be invalidated.

If we weren't able to fetch post-op attrs, then just set the
after_change to before_change + 1. The atomic flag should already be
clear in this case.

Suggested-by: Neil Brown <neilb@suse.de>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
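
The fallback rule can be sketched as follows. The struct and function names are hypothetical and the real set_change_info() takes different arguments; the sketch also clears the atomic flag explicitly, whereas the commit message expects it to be clear already.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of an NFSv4 change_info4 result. */
struct sketch_change_info {
	bool     atomic;
	uint64_t before_change;
	uint64_t after_change;
};

/*
 * Fill in the change info for a directory-modifying operation.
 * If the post-op attributes could not be fetched, report a value that
 * differs from before_change so the client invalidates its cache.
 */
static void sketch_set_change_info(struct sketch_change_info *cinfo,
				   uint64_t pre_change,
				   bool post_saved,
				   uint64_t post_change)
{
	assert(cinfo != NULL);
	cinfo->before_change = pre_change;
	if (post_saved) {
		cinfo->atomic = true;
		cinfo->after_change = post_change;
	} else {
		/* Explicitly cleared here for clarity (assumption). */
		cinfo->atomic = false;
		cinfo->after_change = pre_change + 1;
	}
}
```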


# 97662607 21-Jul-2023 Jeff Layton <jlayton@kernel.org>

nfsd: remove unsafe BUG_ON from set_change_info

At one time, nfsd would scrape inode information directly out of struct
inode in order to populate the change_info4. At that time, the BUG_ON in
set_change_info made some sense, since having it unset meant a coding
error.

More recently, it calls vfs_getattr to get this information, which can
fail. If that fails, fh_pre_saved can end up not being set. While this
situation is unfortunate, we don't need to crash the box.

Move set_change_info to nfs4proc.c since all of the callers are there.
Revise the condition for setting "atomic" to also check for
fh_pre_saved. Drop the BUG_ON and just have it zero out both
change_attr4s when this occurs.

Reported-by: Boyang Xue <bxue@redhat.com>
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2223560
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# a332018a 21-Jul-2023 Jeff Layton <jlayton@kernel.org>

nfsd: handle failure to collect pre/post-op attrs more sanely

Collecting pre_op_attrs can fail, in which case it's probably best to
fail the whole operation.

Change fh_fill_pre_attrs and fh_fill_both_attrs to return __be32, and
have the callers check the return code and abort the operation if it's
not nfs_ok.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 81e72297 31-Jan-2023 Dai Ngo <dai.ngo@oracle.com>

NFSD: fix problems with cleanup on errors in nfsd4_copy

When nfsd4_copy fails to allocate memory for async_copy->cp_src, or
nfs4_init_copy_state fails, it calls cleanup_async_copy to do the
cleanup for the async_copy which causes page fault since async_copy
is not yet initialized.

This patch rearranges the order of initializing the fields in
async_copy and adds checks in cleanup_async_copy to skip un-initialized
fields.

Fixes: ce0887ac96d3 ("NFSD add nfs4 inter ssc to nfsd4_copy")
Fixes: 87689df69491 ("NFSD: Shrink size of struct nfsd4_copy")
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 34e8f9ec 23-Jan-2023 Dai Ngo <dai.ngo@oracle.com>

NFSD: fix leaked reference count of nfsd4_ssc_umount_item

The reference count of nfsd4_ssc_umount_item is not decremented
on error conditions. This prevents the laundromat from unmounting
the vfsmount of the source file.

This patch decrements the reference count of nfsd4_ssc_umount_item
on error.

Fixes: f4e44b393389 ("NFSD: delay unmount source's export after inter-server copy completed.")
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 6ba434cb 17-Jan-2023 Jeff Layton <jlayton@kernel.org>

nfsd: clean up potential nfsd_file refcount leaks in COPY codepath

There are two different flavors of the nfsd4_copy struct. One is
embedded in the compound and is used directly in synchronous copies. The
other is dynamically allocated, refcounted and tracked in the client
structure. For the embedded one, the cleanup just involves releasing any
nfsd_files held on its behalf. For the async one, the cleanup is a bit
more involved, and we need to dequeue it from lists, unhash it, etc.

There is at least one potential refcount leak in this code now. If the
kthread_create call fails, then both the src and dst nfsd_files in the
original nfsd4_copy object are leaked.

The cleanup in this codepath is also sort of weird. In the async copy
case, we'll have up to four nfsd_file references (src and dst for both
flavors of copy structure). They are both put at the end of
nfsd4_do_async_copy, even though the ones held on behalf of the embedded
one outlive that structure.

Change it so that we always clean up the nfsd_file refs held by the
embedded copy structure before nfsd4_copy returns. Rework
cleanup_async_copy to handle both inter and intra copies. Eliminate
nfsd4_cleanup_intra_ssc since it now becomes a no-op.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 1f0001d4 17-Jan-2023 Jeff Layton <jlayton@kernel.org>

nfsd: zero out pointers after putting nfsd_files on COPY setup error

At first, I thought this might be a source of nfsd_file overputs, but
the current callers seem to avoid an extra put when nfsd4_verify_copy
returns an error.

Still, it's "bad form" to leave the pointers filled out when we don't
have a reference to them anymore, and that might lead to bugs later.
Zero them out as a defensive coding measure.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# fcb53097 17-Jan-2023 Jeff Layton <jlayton@kernel.org>

nfsd: don't take nfsd4_copy ref for OP_OFFLOAD_STATUS

We're not doing any blocking operations for OP_OFFLOAD_STATUS, so taking
and putting a reference is a waste of effort. Take the client lock,
search for the copy and fetch the wr_bytes_written field and return.

Also, make find_async_copy a static function.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 65ba3d24 10-Jan-2023 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Use per-CPU counters to tally server RPC counts

- Improves counting accuracy
- Reduces cross-CPU memory traffic

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# df24ac7a 18-Dec-2022 Dai Ngo <dai.ngo@oracle.com>

NFSD: enhance inter-server copy cleanup

Currently nfsd4_setup_inter_ssc returns the vfsmount of the source
server's export when the mount completes. After the copy is done
nfsd4_cleanup_inter_ssc is called with the vfsmount of the source
server and it searches nfsd_ssc_mount_list for a matching entry
to do the clean up.

The problems with this approach are (1) the need to search the
nfsd_ssc_mount_list and (2) the code has to handle the case where
the matching entry is not found which looks ugly.

The enhancement is instead of nfsd4_setup_inter_ssc returning the
vfsmount, it returns the nfsd4_ssc_umount_item which has the
vfsmount embedded in it. When nfsd4_cleanup_inter_ssc is called
it's passed with the nfsd4_ssc_umount_item directly to do the
clean up so no searching is needed and there is no need to handle
the 'not found' case.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
[ cel: adjusted whitespace and variable/function names ]
Reviewed-by: Olga Kornievskaia <kolga@netapp.com>


# e6cf91b7 11-Jan-2023 Xingyuan Mo <hdthky0@gmail.com>

NFSD: fix use-after-free in nfsd4_ssc_setup_dul()

If signal_pending() returns true, schedule_timeout() will not be executed,
causing the waiting task to remain in the wait queue.
Fixed by adding a call to finish_wait(), which ensures that the waiting
task will always be removed from the wait queue.

Fixes: f4e44b393389 ("NFSD: delay unmount source's export after inter-server copy completed.")
Signed-off-by: Xingyuan Mo <hdthky0@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 7827c81f 05-Jan-2023 Chuck Lever <chuck.lever@oracle.com>

Revert "SUNRPC: Use RMW bitops in single-threaded hot paths"

The premise that "Once an svc thread is scheduled and executing an
RPC, no other processes will touch svc_rqst::rq_flags" is false.
svc_xprt_enqueue() examines the RQ_BUSY flag in scheduled nfsd
threads when determining which thread to wake up next.

Found via KCSAN.

Fixes: 28df0988815f ("SUNRPC: Use RMW bitops in single-threaded hot paths")
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 75333d48 12-Dec-2022 Dai Ngo <dai.ngo@oracle.com>

NFSD: fix use-after-free in __nfs42_ssc_open()

The problem is caused by the source's vfsmount being unmounted while it
remains on the delayed unmount list. This happens when nfs42_ssc_open()
returns errors.

Fixed by removing nfsd4_interssc_connect() and leaving the vfsmount
for the laundromat to unmount when the idle time expires.

We don't need to call nfs_do_sb_deactive when nfs42_ssc_open
returns errors since the file was not opened, so nfs_server->active
was not incremented. The same applies in nfsd4_copy: if we fail to
launch the nfsd4_do_async_copy thread then there's no need to
call nfs_do_sb_deactive.

Reported-by: Xingyuan Mo <hdthky0@gmail.com>
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Tested-by: Xingyuan Mo <hdthky0@gmail.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 79a1d88a 16-Nov-2022 Brian Foster <bfoster@redhat.com>

NFSD: pass range end to vfs_fsync_range() instead of count

_nfsd_copy_file_range() calls vfs_fsync_range() with an offset and
count (bytes written), but the former wants the start and end bytes
of the range to sync. Fix it up.

Fixes: eac0b17a77fb ("NFSD add vfs_fsync after async copy is done")
Signed-off-by: Brian Foster <bfoster@redhat.com>
Tested-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
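
The offset/count versus start/end mix-up comes down to one line of arithmetic. The helper below is a hypothetical illustration, assuming the end of the range is an inclusive byte offset, as the kernel documentation for vfs_fsync_range() describes it.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert (start, count) into the inclusive end offset that a
 * start/end style range API expects. Passing "count" where "end" is
 * expected would sync the wrong range whenever start is non-zero.
 */
static int64_t sync_range_end(int64_t start, int64_t count)
{
	assert(start >= 0 && count > 0);
	return start + count - 1;
}
```

For example, 100 bytes written at offset 4096 cover bytes 4096 through 4195, so the range to sync is (4096, 4195), not (4096, 100).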


# 01d53a88 07-Nov-2022 Jeff Layton <jlayton@kernel.org>

nfsd: return error if nfs4_setacl fails

With the addition of POSIX ACLs to struct nfsd_attrs, we no longer
return an error if setting the ACL fails. Ensure we return the na_aclerr
error on SETATTR if there is one.

Fixes: c0cbe70742f4 ("NFSD: add posix ACLs to struct nfsd_attrs")
Cc: Neil Brown <neilb@suse.de>
Reported-by: Yongcheng Yang <yoyang@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# eeff73f7 28-Oct-2022 Chuck Lever <chuck.lever@oracle.com>

NFSD: Clean up nfs4_preprocess_stateid_op() call sites

Remove the lame-duck dprintk()s around nfs4_preprocess_stateid_op()
call sites.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>


# c2528490 28-Oct-2022 Chuck Lever <chuck.lever@oracle.com>

NFSD: Pass the target nfsd_file to nfsd_commit()

In a moment I'm going to introduce separate nfsd_file types, one of
which is garbage-collected; the other, not. The garbage-collected
variety is to be used by NFSv2 and v3, and the non-garbage-collected
variety is to be used by NFSv4.

nfsd_commit() is invoked by both NFSv3 and NFSv4 consumers. We want
nfsd_commit() to find and use the correct variety of cached
nfsd_file object for the NFS version that is in use.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>


# 76ce4dce 01-Sep-2022 Chuck Lever <chuck.lever@oracle.com>

NFSD: Cap rsize_bop result based on send buffer size

Since before the git era, NFSD has conserved the number of pages
held by each nfsd thread by combining the RPC receive and send
buffers into a single array of pages. This works because there are
no cases where an operation needs a large RPC Call message and a
large RPC Reply at the same time.

Once an RPC Call has been received, svc_process() updates
svc_rqst::rq_res to describe the part of rq_pages that can be
used for constructing the Reply. This means that the send buffer
(rq_res) shrinks when the received RPC record containing the RPC
Call is large.

Add an NFSv4 helper that computes the size of the send buffer. It
replaces svc_max_payload() in spots where svc_max_payload() returns
a value that might be larger than the remaining send buffer space.
Callers who need to know the transport's actual maximum payload size
will continue to use svc_max_payload().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 781fde1a 22-Sep-2022 Chuck Lever <chuck.lever@oracle.com>

NFSD: Rename the fields in copy_stateid_t

Code maintenance: The name of the copy_stateid_t::sc_count field
collides with the sc_count field in struct nfs4_stid, making the
latter difficult to grep for when auditing stateid reference
counting.

No behavior change expected.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 6604148c 12-Sep-2022 Chuck Lever <chuck.lever@oracle.com>

NFSD: Remove "inline" directives on op_rsize_bop helpers

These helpers are always invoked indirectly, so the compiler can't
inline these anyway. While we're updating the synopses of these
helpers, defensively convert their parameters to const pointers.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 3fdc5464 12-Sep-2022 Chuck Lever <chuck.lever@oracle.com>

NFSD: Reduce amount of struct nfsd4_compoundargs that needs clearing

Have SunRPC clear everything except for the iops array. Then have
each NFSv4 XDR decoder clear its own argument before decoding.

Now individual operations may have a large argument struct while not
penalizing the vast majority of operations with a small struct.

And, clearing the argument structure occurs as the argument fields
are initialized, enabling the CPU to do write combining on that
memory. In some cases, clearing is not even necessary because all
of the fields in the argument structure are initialized by the
decoder.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 103cc1fa 12-Sep-2022 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Parametrize how much of argsize should be zeroed

Currently, SUNRPC clears the whole of .pc_argsize before processing
each incoming RPC transaction. Add an extra parameter to struct
svc_procedure to enable upper layers to reduce the amount of each
operation's argument structure that is zeroed by SUNRPC.

The size of struct nfsd4_compoundargs, in particular, is a lot to
clear on each incoming RPC Call. A subsequent patch will cut this
down to something closer to what NFSv2 and NFSv3 use.

This patch should cause no behavior changes.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 1035d654 08-Sep-2022 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add tracepoints to report NFSv4 callback completions

Wireshark has always been lousy about dissecting NFSv4 callbacks,
especially NFSv4.0 backchannel requests. Add tracepoints so we
can surgically capture these events in the trace log.

Tracepoints are time-stamped and ordered so that we can now observe
the timing relationship between a CB_RECALL Reply and the client's
DELEGRETURN Call. Example:

nfsd-1153 [002] 211.986391: nfsd_cb_recall: addr=192.168.1.67:45767 client 62ea82e4:fee7492a stateid 00000003:00000001

nfsd-1153 [002] 212.095634: nfsd_compound: xid=0x0000002c opcnt=2
nfsd-1153 [002] 212.095647: nfsd_compound_status: op=1/2 OP_PUTFH status=0
nfsd-1153 [002] 212.095658: nfsd_file_put: hash=0xf72 inode=0xffff9291148c7410 ref=3 flags=HASHED|REFERENCED may=READ file=0xffff929103b3ea00
nfsd-1153 [002] 212.095661: nfsd_compound_status: op=2/2 OP_DELEGRETURN status=0
kworker/u25:8-148 [002] 212.096713: nfsd_cb_recall_done: client 62ea82e4:fee7492a stateid 00000003:00000001 status=0

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>


# de29cf7e 08-Sep-2022 Chuck Lever <chuck.lever@oracle.com>

NFSD: Trace NFSv4 COMPOUND tags

The Linux NFSv4 client implementation does not use COMPOUND tags,
but the Solaris and MacOS implementations do, and so does pynfs.
Record these eye-catchers in the server's trace buffer to annotate
client requests while troubleshooting.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>


# 7518a3dc 05-Sep-2022 Chuck Lever <chuck.lever@oracle.com>

NFSD: Fix handling of oversized NFSv4 COMPOUND requests

If an NFS server returns NFS4ERR_RESOURCE on the first operation in
an NFSv4 COMPOUND, there's no way for a client to know where the
problem is and then simplify the compound to make forward progress.

So instead, make NFSD process as many operations in an oversized
COMPOUND as it can and then return NFS4ERR_RESOURCE on the first
operation it did not process.

pynfs NFSv4.0 COMP6 exercises this case, but checks only for the
COMPOUND status code, not whether the server has processed any
of the operations.

pynfs NFSv4.1 SEQ6 and SEQ7 exercise the NFSv4.1 case, which detects
too many operations per COMPOUND by checking against the limits
negotiated when the session was created.

Suggested-by: Bruce Fields <bfields@fieldses.org>
Fixes: 0078117c6d91 ("nfsd: return RESOURCE not GARBAGE_ARGS on too many ops")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
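
The behaviour described above can be sketched as a loop over the operations of a COMPOUND. The function and macro names are hypothetical, every processed operation is assumed to succeed, and 10018 is the NFS4ERR_RESOURCE value from the NFSv4 protocol definition.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NFS4_OK            0
#define SKETCH_NFS4ERR_RESOURCE   10018

/*
 * Process up to max_ops operations of a COMPOUND containing opcnt
 * operations. status[] must have room for opcnt entries. The first
 * operation beyond the limit gets NFS4ERR_RESOURCE and processing
 * stops there. Returns the number of results placed in the reply.
 */
static size_t sketch_process_compound(size_t opcnt, size_t max_ops,
				      int *status)
{
	size_t i;

	assert(status != NULL);
	for (i = 0; i < opcnt; i++) {
		if (i >= max_ops) {
			/* First op that was not processed. */
			status[i] = SKETCH_NFS4ERR_RESOURCE;
			return i + 1;
		}
		status[i] = SKETCH_NFS4_OK; /* assume the op succeeds */
	}
	return opcnt;
}
```

With this shape a client can see how many operations completed and resend only the remainder, instead of having the whole COMPOUND rejected on its first operation.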


# 4ab3442c 31-Aug-2022 Jinpeng Cui <cui.jinpeng2@zte.com.cn>

NFSD: remove redundant variable status

Return the value directly from fh_verify(), do_open_permission(), and
exp_pseudoroot() instead of getting it from the redundant variable
status.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Jinpeng Cui <cui.jinpeng2@zte.com.cn>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 754035ff 19-Aug-2022 Olga Kornievskaia <kolga@netapp.com>

NFSD enforce filehandle check for source file in COPY

If the passed in filehandle for the source file in the COPY operation
is not a regular file, the server MUST return NFS4ERR_WRONG_TYPE.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 72f78ae0 18-Aug-2022 Wolfram Sang <wsa+renesas@sang-engineering.com>

NFSD: move from strlcpy with unused retval to strscpy

Follow the advice of the below link and prefer 'strscpy' in this
subsystem. Conversion is 1:1 because the return value is not used.
Generated by a coccinelle script.

Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# debf16f0 26-Jul-2022 NeilBrown <neilb@suse.de>

NFSD: use explicit lock/unlock for directory ops

When creating or unlinking a name in a directory use explicit
inode_lock_nested() instead of fh_lock(), and explicit calls to
fh_fill_pre_attrs() and fh_fill_post_attrs(). This is already done
for renames, with lock_rename() as the explicit locking.

Also move the 'fill' calls closer to the operation that might change the
attributes. This way they are avoided on some error paths.

For the v2-only code in nfsproc.c, the fill calls are not replaced as
they aren't needed.

Making the locking explicit will simplify proposed future changes to
locking for directories. It also makes it easily visible exactly where
pre/post attributes are used - not all callers of fh_lock() actually
need the pre/post attributes.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 19d008b4 26-Jul-2022 NeilBrown <neilb@suse.de>

NFSD: reduce locking in nfsd_lookup()

nfsd_lookup() takes an exclusive lock on the parent inode, but no
callers want the lock and it may not be needed at all if the
result is in the dcache.

Change nfsd_lookup_dentry() to not take the lock, and call
lookup_one_len_locked() which takes lock only if needed.

nfsd4_open() currently expects the lock to still be held, but that isn't
necessary as nfsd_validate_delegated_dentry() provides required
guarantees without the lock.

NOTE: NFSv4 requires directory changeinfo for OPEN even when a create
wasn't requested and no change happened. Now that nfsd_lookup()
doesn't use fh_lock(), we need to explicitly fill the attributes
when no create happens. A new fh_fill_both_attrs() is provided
for that task.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# b677c0c6 26-Jul-2022 NeilBrown <neilb@suse.de>

NFSD: always drop directory lock in nfsd_unlink()

Some error paths in nfsd_unlink() allow it to exit without unlocking the
directory. This is not a problem in practice as the directory will be
unlocked with an fh_put(), but it is untidy and potentially confusing.

This allows us to remove all the fh_unlock() calls that are immediately
after nfsd_unlink() calls.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 927bfc56 26-Jul-2022 NeilBrown <neilb@suse.de>

NFSD: change nfsd_create()/nfsd_symlink() to unlock directory before returning.

nfsd_create() usually returns with the directory still locked.
nfsd_symlink() usually returns with it unlocked. This is clumsy.

Until recently nfsd_create() needed to keep the directory locked until
ACLs and security label had been set. These are now set inside
nfsd_create() (in nfsd_setattr()) so this need is gone.

So change nfsd_create() and nfsd_symlink() to always unlock, and remove
any fh_unlock() calls that follow calls to these functions.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# c0cbe707 26-Jul-2022 NeilBrown <neilb@suse.de>

NFSD: add posix ACLs to struct nfsd_attrs

pacl and dpacl pointers are added to struct nfsd_attrs, which requires
that we have an nfsd_attrs_free() function to free them.
Those nfsv4 functions that can set ACLs now set up these pointers
based on the passed in NFSv4 ACL.

nfsd_setattr() sets the acls as appropriate.

Errors are handled as with security labels.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# d6a97d3f 26-Jul-2022 NeilBrown <neilb@suse.de>

NFSD: add security label to struct nfsd_attrs

nfsd_setattr() now sets a security label if provided, and nfsv4 provides
it in the 'open' and 'create' paths and the 'setattr' path.
If setting the label failed (including because the kernel doesn't
support labels), an error field in 'struct nfsd_attrs' is set, and the
caller can respond. The open/create callers clear
FATTR4_WORD2_SECURITY_LABEL in the returned attr set in this case.
The setattr caller returns the error.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 93adc1e3 26-Jul-2022 NeilBrown <neilb@suse.de>

NFSD: set attributes when creating symlinks

The NFS protocol includes attributes when creating symlinks.
Linux does store attributes for symlinks and allows them to be set,
though they are not used for permission checking.

NFSD currently doesn't set standard (struct iattr) attributes when
creating symlinks, but for NFSv4 it does set ACLs and security labels.
This is inconsistent.

To improve consistency, pass the provided attributes into nfsd_symlink()
and call nfsd_create_setattr() to set them.

NOTE: this results in a behaviour change for all NFS versions when the
client sends non-default attributes with a SYMLINK request. With the
Linux client, the only attributes are:
attr.ia_mode = S_IFLNK | S_IRWXUGO;
attr.ia_valid = ATTR_MODE;
so the final outcome will be unchanged. Other clients might send
different attributes, and if they did they probably expect them to be
honoured.

We ignore any error from nfsd_create_setattr(). It isn't really clear
what should be done if a file is successfully created, but the
attributes cannot be set. NFS doesn't allow partial success to be
reported. Reporting failure is probably more misleading than reporting
success, so the status is ignored.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 7fe2a71d 26-Jul-2022 NeilBrown <neilb@suse.de>

NFSD: introduce struct nfsd_attrs

The attributes that nfsd might want to set on a file include 'struct
iattr' as well as an ACL and security label.
The latter two are passed around quite separately from the first, in
part because they are only needed for NFSv4. This leads to some
clumsiness in the code, such as the attributes NOT being set in
nfsd_create_setattr().

We need to keep the directory locked until all attributes are set to
ensure the file is never visible without all its attributes. This need
combined with the inconsistent handling of attributes leads to more
clumsiness.

As a first step towards tidying this up, introduce 'struct nfsd_attrs'.
This is passed (by reference) to vfs.c functions that work with
attributes, and is assembled by the various nfs*proc functions which
call them. As yet only iattr is included, but future patches will
expand this.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 876c553c 26-Jul-2022 Jeff Layton <jlayton@kernel.org>

NFSD: verify the opened dentry after setting a delegation

Between opening a file and setting a delegation on it, someone could
rename or unlink the dentry. If this happens, we do not want to grant a
delegation on the open.

On a CLAIM_NULL open, we're opening by filename, and we may (in the
non-create case) or may not (in the create case) be holding i_rwsem
when attempting to set a delegation. The latter case allows a
race.

After getting a lease, redo the lookup of the file being opened and
validate that the resulting dentry matches the one in the open file
description.

To properly redo the lookup we need an rqst pointer to pass to
nfsd_lookup_dentry(), so make sure that is available.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# a11ada99 27-Jul-2022 Chuck Lever <chuck.lever@oracle.com>

NFSD: Move copy offload callback arguments into a separate structure

Refactor so that CB_OFFLOAD arguments can be passed without
allocating a whole struct nfsd4_copy object. On my system (x86_64)
this removes another 96 bytes from struct nfsd4_copy.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# e72f9bc0 27-Jul-2022 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add nfsd4_send_cb_offload()

Refactor for legibility.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# ad1e46c9 27-Jul-2022 Chuck Lever <chuck.lever@oracle.com>

NFSD: Remove kmalloc from nfsd4_do_async_copy()

Instead of manufacturing a phony struct nfsd_file, pass the
struct file returned by nfs42_ssc_open() directly to
nfsd4_do_copy().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 3b7bf593 27-Jul-2022 Chuck Lever <chuck.lever@oracle.com>

NFSD: Refactor nfsd4_do_copy()

Refactor: Now that nfsd4_do_copy() no longer calls the cleanup
helpers, plumb the use of struct file pointers all the way down to
_nfsd_copy_file_range().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 478ed7b1 27-Jul-2022 Chuck Lever <chuck.lever@oracle.com>

NFSD: Refactor nfsd4_cleanup_inter_ssc() (2/2)

Move the nfsd4_cleanup_*() call sites out of nfsd4_do_copy(). A
subsequent patch will modify one of the new call sites to avoid
the need to manufacture the phony struct nfsd_file.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 24d796ea 27-Jul-2022 Chuck Lever <chuck.lever@oracle.com>

NFSD: Refactor nfsd4_cleanup_inter_ssc() (1/2)

The @src parameter is sometimes a pointer to a struct nfsd_file and
sometimes a pointer to struct file hiding in a phony struct
nfsd_file. Refactor nfsd4_cleanup_inter_ssc() so the @src parameter
is always an explicit struct file.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 1913cdf5 27-Jul-2022 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace boolean fields in struct nfsd4_copy

Clean up: saves 8 bytes, and we can replace check_and_set_stop_copy()
with an atomic bitop.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
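
As an illustration of the entry above, here is a minimal userspace sketch of how one flags word plus an atomic test-and-set can stand in for separate boolean members and a lock-protected "check and set" helper. It uses C11 atomics rather than the kernel's bitops, and every name in it (struct sketch_copy, the SKETCH_COPY_F_* bit numbers, sketch_stop_copy()) is hypothetical, not taken from the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit numbers; the real flag names may differ. */
enum {
	SKETCH_COPY_F_STOPPED = 0,
	SKETCH_COPY_F_SYNCHRONOUS = 1,
};

struct sketch_copy {
	atomic_ulong flags;	/* one word replaces several bool members */
};

/* Set bit nr and report whether it was already set, as one atomic step. */
static bool sketch_test_and_set_bit(int nr, atomic_ulong *word)
{
	unsigned long mask;

	assert(nr >= 0 && nr < (int)(sizeof(unsigned long) * CHAR_BIT));
	mask = 1UL << nr;
	return (atomic_fetch_or(word, mask) & mask) != 0;
}

/* True only for the caller that actually moves the copy to "stopped". */
static bool sketch_stop_copy(struct sketch_copy *copy)
{
	return !sketch_test_and_set_bit(SKETCH_COPY_F_STOPPED, &copy->flags);
}
```

Because the read and the write happen in a single atomic operation, two racing callers cannot both observe the bit as clear, which is the property a separately locked boolean would otherwise have to provide.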


# 8ea6e2c9 27-Jul-2022 Chuck Lever <chuck.lever@oracle.com>

NFSD: Make nfs4_put_copy() static

Clean up: All call sites are in fs/nfsd/nfs4proc.c.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 87689df6 27-Jul-2022 Chuck Lever <chuck.lever@oracle.com>

NFSD: Shrink size of struct nfsd4_copy

struct nfsd4_copy is part of struct nfsd4_op, which resides in an
8-element array.

sizeof(struct nfsd4_op):
Before: /* size: 1696, cachelines: 27, members: 5 */
After: /* size: 672, cachelines: 11, members: 5 */

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 09426ef2 27-Jul-2022 Chuck Lever <chuck.lever@oracle.com>

NFSD: Shrink size of struct nfsd4_copy_notify

struct nfsd4_copy_notify is part of struct nfsd4_op, which resides
in an 8-element array.

sizeof(struct nfsd4_op):
Before: /* size: 2208, cachelines: 35, members: 5 */
After: /* size: 1696, cachelines: 27, members: 5 */

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 53048779 27-Jul-2022 Chuck Lever <chuck.lever@oracle.com>

NFSD: Fix strncpy() fortify warning

In function ‘strncpy’,
inlined from ‘nfsd4_ssc_setup_dul’ at /home/cel/src/linux/manet/fs/nfsd/nfs4proc.c:1392:3,
inlined from ‘nfsd4_interssc_connect’ at /home/cel/src/linux/manet/fs/nfsd/nfs4proc.c:1489:11:
/home/cel/src/linux/manet/include/linux/fortify-string.h:52:33: warning: ‘__builtin_strncpy’ specified bound 63 equals destination size [-Wstringop-truncation]
52 | #define __underlying_strncpy __builtin_strncpy
| ^
/home/cel/src/linux/manet/include/linux/fortify-string.h:89:16: note: in expansion of macro ‘__underlying_strncpy’
89 | return __underlying_strncpy(p, q, size);
| ^~~~~~~~~~~~~~~~~~~~

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
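
The warning quoted above fires because the bound passed to strncpy() equals the destination size, so a source of that length or longer leaves the destination without a terminating NUL. The following is a userspace sketch of a copy that always terminates; sketch_copy_bounded() is a hypothetical helper and is not claimed to be the change this commit made.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy src into dst, writing at most dst_size bytes including the
 * terminating NUL. Returns true if the whole string fit.
 */
static bool sketch_copy_bounded(char *dst, size_t dst_size, const char *src)
{
	size_t len = strlen(src);
	bool fits = len < dst_size;

	assert(dst_size > 0);
	if (!fits)
		len = dst_size - 1;	/* leave room for the terminator */
	memcpy(dst, src, len);
	dst[len] = '\0';
	return fits;
}
```

Returning whether the string fit lets a caller decide if truncation is acceptable instead of silently storing a shortened address.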


# ca3f9acb 08-Jul-2022 Chuck Lever <chuck.lever@oracle.com>

NFSD: Demote a WARN to a pr_warn()

The call trace doesn't add much value, but it sure is noisy.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# f532c9ff 23-Jun-2022 Zhang Jiaming <jiaming@nfschina.com>

NFSD: Fix space and spelling mistake

Add a blank space after ','.
Change 'succesful' to 'successful'.

Signed-off-by: Zhang Jiaming <jiaming@nfschina.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 28df0988 29-Apr-2022 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Use RMW bitops in single-threaded hot paths

I noticed CPU pipeline stalls while using perf.

Once an svc thread is scheduled and executing an RPC, no other
processes will touch svc_rqst::rq_flags. Thus bus-locked atomics are
not needed outside the svc thread scheduler.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 7e2ce0cc 23-Mar-2022 Chuck Lever <chuck.lever@oracle.com>

NFSD: Move documenting comment for nfsd4_process_open2()

Clean up nfsd4_open() by converting a large comment at the only
call site for nfsd4_process_open2() to a kerneldoc comment in
front of that function.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 26320d7e 21-Mar-2022 Chuck Lever <chuck.lever@oracle.com>

NFSD: Fix whitespace

Clean up: Pull case arms back one tab stop to conform to every other
switch statement in fs/nfsd/nfs4proc.c.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# f67a16b1 30-Mar-2022 Chuck Lever <chuck.lever@oracle.com>

NFSD: Remove dprintk call sites from tail of nfsd4_open()

Clean up: These relics are not likely to benefit server
administrators.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# fb70bf12 30-Mar-2022 Chuck Lever <chuck.lever@oracle.com>

NFSD: Instantiate a struct file when creating a regular NFSv4 file

There have been reports of races that cause NFSv4 OPEN(CREATE) to
return an error even though the requested file was created. NFSv4
does not provide a status code for this case.

To mitigate some of these problems, reorganize the NFSv4
OPEN(CREATE) logic to allocate resources before the file is actually
created, and open the new file while the parent directory is still
locked.

Two new APIs are added:

+ Add an API that works like nfsd_file_acquire() but does not open
the underlying file. The OPEN(CREATE) path can use this API when it
already has an open file.

+ Add an API that is kin to dentry_open(). NFSD needs to create a
file and grab an open "struct file *" atomically. The
alloc_empty_file() has to be done before the inode create. If it
fails (for example, because the NFS server has exceeded its
max_files limit), we avoid creating the file and can still return
an error to the NFS client.

BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=382
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: JianHong Yin <jiyin@redhat.com>


# 254454a5 28-Mar-2022 Chuck Lever <chuck.lever@oracle.com>

NFSD: Refactor NFSv4 OPEN(CREATE)

Copy do_nfsd_create() to nfs4proc.c and remove NFSv3-specific logic.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 6260d9a5 25-Jan-2022 Chuck Lever <chuck.lever@oracle.com>

NFSD: Clamp WRITE offsets

Ensure that a client cannot specify a WRITE range that falls in a
byte range outside what the kernel's internal types (such as loff_t,
which is signed) can represent. The kiocb iterators, invoked in
nfsd_vfs_write(), should properly limit write operations to within
the underlying file system's s_maxbytes.

Cc: stable@vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
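
A minimal sketch of the range check described above, written as standalone C with fixed-width types. SKETCH_OFFSET_MAX stands in for the largest value a signed 64-bit offset can hold, and sketch_write_range_ok() is a hypothetical name; the sketch only shows one way to test the range without letting the addition wrap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the largest offset a signed 64-bit type can represent. */
#define SKETCH_OFFSET_MAX ((uint64_t)INT64_MAX)

/* The wire type is wider than the internal one, hence the need to check. */
static_assert(SKETCH_OFFSET_MAX < UINT64_MAX,
	      "wire offsets can exceed the internal maximum");

/*
 * Hypothetical helper: true when a WRITE of len bytes at wire offset
 * `offset` stays within the representable range.
 */
static bool sketch_write_range_ok(uint64_t offset, uint32_t len)
{
	if (offset > SKETCH_OFFSET_MAX)
		return false;
	/* Compare against the remaining room instead of adding, so nothing wraps. */
	return (uint64_t)len <= SKETCH_OFFSET_MAX - offset;
}
```

A caller would reject the request with an appropriate status when the helper returns false, before any signed arithmetic is done on the offset.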


# 0cb4d23a 04-Feb-2022 Chuck Lever <chuck.lever@oracle.com>

NFSD: Fix the behavior of READ near OFFSET_MAX

Dan Aloni reports:
> Due to commit 8cfb9015280d ("NFS: Always provide aligned buffers to
> the RPC read layers") on the client, a read of 0xfff is aligned up
> to server rsize of 0x1000.
>
> As a result, in a test where the server has a file of size
> 0x7fffffffffffffff, and the client tries to read from the offset
> 0x7ffffffffffff000, the read causes loff_t overflow in the server
> and it returns an NFS code of EINVAL to the client. The client as
> a result indefinitely retries the request.

The Linux NFS client does not handle NFS?ERR_INVAL, even though all
NFS specifications permit servers to return that status code for a
READ.

Instead of NFS?ERR_INVAL, have out-of-range READ requests succeed
and return a short result. Set the EOF flag in the result to prevent
the client from retrying the READ request. This behavior appears to
be consistent with Solaris NFS servers.

Note that NFSv3 and NFSv4 use u64 offset values on the wire. These
must be converted to loff_t internally before use -- an implicit
type cast is not adequate for this purpose. Otherwise VFS checks
against sb->s_maxbytes do not work properly.

Reported-by: Dan Aloni <dan.aloni@vastdata.com>
Cc: stable@vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
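
To make the conversion point above concrete, here is a userspace sketch that clamps a wire-format READ range before it would be handed to code using a signed 64-bit offset. The struct and function names are hypothetical; the sketch assumes the intended behavior is a shortened read rather than an error, as the message describes.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the largest offset a signed 64-bit type can represent. */
#define SKETCH_OFFSET_MAX ((uint64_t)INT64_MAX)

struct sketch_read_args {
	uint64_t offset;	/* as received on the wire (unsigned) */
	uint32_t count;
};

/* Clamp so that offset and offset + count both fit in a signed 64-bit offset. */
static void sketch_clamp_read(struct sketch_read_args *args)
{
	assert(args);
	if (args->offset > SKETCH_OFFSET_MAX)
		args->offset = SKETCH_OFFSET_MAX;
	if (args->count > SKETCH_OFFSET_MAX - args->offset)
		args->count = (uint32_t)(SKETCH_OFFSET_MAX - args->offset);
}
```

With the values from the report (offset 0x7ffffffffffff000, count 0x1000), the count is reduced to 0xfff, so the request can complete as a short read instead of overflowing.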


# fcb5e3fa 24-Dec-2021 Chuck Lever <chuck.lever@oracle.com>

NFSD: Move fill_pre_wcc() and fill_post_wcc()

These functions are related to file handle processing and have
nothing to do with XDR encoding or decoding. Also they are no longer
NFSv3-specific. As a clean-up, move their definitions to a more
appropriate location. WCC is also an NFSv3-specific term, so rename
them as general-purpose helpers.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 3988a578 30-Dec-2021 Chuck Lever <chuck.lever@oracle.com>

NFSD: Rename boot verifier functions

Clean up: These functions handle what the specs call a write
verifier, which in the Linux NFS server implementation is now
divorced from the server's boot instance.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# a2f4c3fa 18-Dec-2021 Trond Myklebust <trond.myklebust@hammerspace.com>

nfsd: Add a tracepoint for errors in nfsd4_clone_file_range()

Since a clone error commit can cause the boot verifier to change,
we should trace those errors.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
[ cel: Addressed a checkpatch.pl splat in fs/nfsd/vfs.h ]


# 555dbf1a 18-Dec-2021 Trond Myklebust <trond.myklebust@hammerspace.com>

nfsd: Replace use of rwsem with errseq_t

The nfsd_file nf_rwsem is currently being used to separate file write
and commit instances to ensure that we catch errors and apply them to
the correct write/commit.
We can improve scalability at the expense of a little accuracy (some
extra false positives) by replacing the nf_rwsem with more careful
use of the errseq_t mechanism to track errors across the different
operations.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
[ cel: rebased on zero-verifier fix ]


# c2f1c4bd 13-Oct-2021 Chuck Lever <chuck.lever@oracle.com>

NFSD: Fix sparse warning

/home/cel/src/linux/linux/fs/nfsd/nfs4proc.c:1539:24: warning: incorrect type in assignment (different base types)
/home/cel/src/linux/linux/fs/nfsd/nfs4proc.c:1539:24: expected restricted __be32 [usertype] status
/home/cel/src/linux/linux/fs/nfsd/nfs4proc.c:1539:24: got int

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 3b0ebb25 13-Oct-2021 Chuck Lever <chuck.lever@oracle.com>

NFSD: Save location of NFSv4 COMPOUND status

Refactor: Currently nfs4svc_encode_compoundres() relies on the NFS
dispatcher to pass in the buffer location of the COMPOUND status.
Instead, save that buffer location in struct nfsd4_compoundres.

The compound tag follows immediately after.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# dae9a6ca 30-Sep-2021 Chuck Lever <chuck.lever@oracle.com>

NFSD: Have legacy NFSD WRITE decoders use xdr_stream_subsegment()

Refactor.

Now that the NFSv2 and NFSv3 XDR decoders have been converted to
use xdr_streams, the WRITE decoder functions can use
xdr_stream_subsegment() to extract the WRITE payload into its own
xdr_buf, just as the NFSv4 WRITE XDR decoder currently does.

That makes it possible to pass the first kvec, pages array + length,
page_base, and total payload length via a single function parameter.

The payload's page_base is not yet assigned or used, but will be in
subsequent patches.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 8e70bf27 25-Sep-2021 Colin Ian King <colin.king@intel.com>

NFSD: Initialize pointer ni with NULL and not plain integer 0

Pointer ni is being initialized with plain integer zero. Fix
this by initializing with NULL.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# d8b26071 01-Sep-2021 NeilBrown <neilb@suse.de>

NFSD: simplify struct nfsfh

Most of the fields in 'struct knfsd_fh' are 2 levels deep (a union and a
struct) and are accessed using macros like:

#define fh_FOO fh_base.fh_new.fb_FOO

This patch makes the union and struct anonymous, so that "fh_FOO" can be
a name directly within 'struct knfsd_fh' and the #defines aren't needed.

The file handle as a whole is sometimes accessed as "fh_base" or
"fh_base.fh_pad", neither of which are particularly helpful names.
As the struct holding the filehandle is now anonymous, we
cannot use the name of that, so we union it with 'fh_raw' and use that
where the raw filehandle is needed. fh_raw also ensures the structure is
large enough for the largest possible filehandle.

fh_raw is a 'char' array, removing any need to cast it for memcpy etc.

SVCFH_fmt() is simplified using the "%ph" printk format. This
changes the appearance of filehandles in dprintk() debugging, making
them a little more precise.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
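
The layout change described above can be sketched in standalone C11 as follows. The struct name, the SKETCH_FHSIZE constant and the exact member list are assumptions made for illustration; only the idea (an anonymous union of a raw char array with an anonymous struct of named fields) comes from the message.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_FHSIZE 128	/* hypothetical maximum handle size */

struct sketch_fh {
	unsigned int fh_size;	/* number of valid bytes in the handle */
	union {
		char fh_raw[SKETCH_FHSIZE];	/* whole handle as bytes */
		struct {
			unsigned char fh_version;
			unsigned char fh_auth_type;
			unsigned char fh_fsid_type;
			unsigned char fh_fileid_type;
		};
	};
};

/* The byte array guarantees room for the largest handle. */
static_assert(sizeof(struct sketch_fh) >= SKETCH_FHSIZE,
	      "fh_raw must size the handle");

/* Copy a handle through fh_raw; a char array needs no cast for memcpy(). */
static void sketch_fh_copy(struct sketch_fh *dst, const struct sketch_fh *src)
{
	assert(src->fh_size <= SKETCH_FHSIZE);
	dst->fh_size = src->fh_size;
	memcpy(dst->fh_raw, src->fh_raw, src->fh_size);
}
```

Because the union and the inner struct are anonymous, fields such as fh_version are reachable directly on the outer struct, which is what removes the need for accessor #defines.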


# e34c0ce9 13-May-2021 Colin Ian King <colin.king@canonical.com>

nfsd: remove redundant assignment to pointer 'this'

The pointer 'this' is being initialized with a value that is never read
and it is being updated later with a new value. The initialization is
redundant and can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 54185267 04-Jun-2021 Wei Yongjun <weiyongjun1@huawei.com>

NFSD: Fix error return code in nfsd4_interssc_connect()

'status' is overwritten with 0 after nfsd4_ssc_setup_dul(), which
causes 0 to be returned in the vfs_kern_mount() error case. Fix this
by returning nfserr_nodev in that case.

Fixes: f4e44b393389 ("NFSD: delay unmount source's export after inter-server copy completed.")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# f47dc2d3 03-Jun-2021 Dai Ngo <dai.ngo@oracle.com>

nfsd: fix kernel test robot warning in SSC code

Fix by initializing pointer nfsd4_ssc_umount_item with NULL instead of 0.
Replace return value of nfsd4_ssc_setup_dul with __be32 instead of int.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# f4e44b39 21-May-2021 Dai Ngo <dai.ngo@oracle.com>

NFSD: delay unmount source's export after inter-server copy completed.

Currently the source's export is mounted and unmounted on every
inter-server copy operation. This patch is an enhancement to delay
the unmount of the source export for a certain period of time to
eliminate the mount and unmount overhead on subsequent copy operations.

After a copy operation completes, a work entry is added to the
delayed unmount list with an expiration time. This list is serviced
by the laundromat thread to unmount the export of the expired entries.
Each time the export is being used again, its expiration time is
extended and the entry is re-inserted to the tail of the list.

The unmount task and the mount operation of the copy request are
synced to make sure the export is not unmounted while it's being
used.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
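
The delayed-unmount scheme described above can be sketched as an expiry-ordered list that a periodic pass walks from the head. This is a single-threaded userspace model with hypothetical names throughout; it assumes a constant delay (so tail insertion keeps the list ordered) and leaves out the locking that synchronizes the unmount task with new mounts.

```c
#include <assert.h>

struct sketch_item {
	struct sketch_item *prev, *next;
	long expire;	/* earliest time the export may be unmounted */
	int users;	/* copy operations currently using the mount */
};

/* Circular list with a sentinel head, ordered oldest expiry first. */
struct sketch_list {
	struct sketch_item head;
};

static void sketch_item_init(struct sketch_item *it)
{
	it->prev = it->next = it;
	it->expire = 0;
	it->users = 0;
}

static void sketch_list_init(struct sketch_list *l)
{
	sketch_item_init(&l->head);
}

static void sketch_unlink(struct sketch_item *it)
{
	it->prev->next = it->next;
	it->next->prev = it->prev;
	it->prev = it->next = it;
}

static void sketch_add_tail(struct sketch_list *l, struct sketch_item *it)
{
	it->prev = l->head.prev;
	it->next = &l->head;
	l->head.prev->next = it;
	l->head.prev = it;
}

/* A copy begins using the mount; busy entries are never reaped. */
static void sketch_copy_start(struct sketch_item *it)
{
	it->users++;
}

/* A copy finished: extend the expiry and move the entry to the tail. */
static void sketch_copy_done(struct sketch_list *l, struct sketch_item *it,
			     long now, long delay)
{
	assert(it->users > 0);
	it->users--;
	sketch_unlink(it);
	it->expire = now + delay;
	sketch_add_tail(l, it);
}

/* Periodic pass: drop idle entries whose expiry has passed; returns how many. */
static int sketch_reap(struct sketch_list *l, long now)
{
	struct sketch_item *it = l->head.next;
	int reaped = 0;

	while (it != &l->head && it->expire <= now) {
		struct sketch_item *next = it->next;

		if (it->users == 0) {
			sketch_unlink(it);	/* a real server would unmount here */
			reaped++;
		}
		it = next;
	}
	return reaped;
}
```

Keeping the list ordered lets the periodic pass stop at the first entry that has not yet expired, so its cost is proportional to the number of expired entries rather than to the number of cached mounts.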


# eac0b17a 19-May-2021 Olga Kornievskaia <kolga@netapp.com>

NFSD add vfs_fsync after async copy is done

Currently, the server does all copies as NFS_UNSTABLE. For synchronous
copies the Linux client will append a COMMIT to the COPY compound, but for
async copies it does not (because COMMIT needs to be done after all
bytes are copied and not as a reply to the COPY operation).

However, in order to save the client doing a COMMIT as a separate
rpc, the server can reply back with NFS_FILE_SYNC copy. This patch
adds a vfs_fsync() call at the end of the async copy.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 87512386 14-May-2021 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add an nfsd_cb_offload tracepoint

Record the arguments of CB_OFFLOAD callbacks so we can better
observe asynchronous copy-offload behavior. For example:

nfsd-995 [008] 7721.934222: nfsd_cb_offload:
addr=192.168.2.51:0 client 6092a47c:35a43fc1 fh_hash=0x8739113a
count=116528 status=0

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Olga Kornievskaia <kolga@netapp.com>
Cc: Dai Ngo <Dai.Ngo@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# fa60ce2c 06-May-2021 Masahiro Yamada <masahiroy@kernel.org>

treewide: remove editor modelines and cruft

The section "19) Editor modelines and other cruft" in
Documentation/process/coding-style.rst clearly says, "Do not include any
of these in source files."

I recently received a patch to explicitly add a new one.

Let's do treewide cleanups, otherwise some people follow the existing code
and attempt to upstream their favorite editor setups.

It is even nicer if scripts/checkpatch.pl can check it.

If we like to impose coding style in an editor-independent manner, I think
editorconfig (patch [1]) is a saner solution.

[1] https://lore.kernel.org/lkml/20200703073143.423557-1-danny@kdrag0n.dev/

Link: https://lkml.kernel.org/r/20210324054457.1477489-1-masahiroy@kernel.org
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Miguel Ojeda <ojeda@kernel.org> [auxdisplay]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# e739b120 30-Mar-2021 Olga Kornievskaia <kolga@netapp.com>

NFSv4.2: fix copy stateid copying for the async copy

This patch fixes Dan Carpenter's report that the static checker
found a problem where memcpy() was copying into too small of a buffer.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: e0639dc5805a ("NFSD introduce async copy feature")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Dai Ngo <dai.ngo@oracle.com>


# e7a833e9 18-Mar-2021 J. Bruce Fields <bfields@redhat.com>

nfsd: don't ignore high bits of copy count

Note size_t is 32-bit on a 32-bit architecture, but cp_count is defined
by the protocol to be 64 bit, so we could be turning a large copy into a
0-length copy here.

Reported-by: <radchenkoy@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
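
The truncation described above is easy to reproduce in a userspace sketch that models a 32-bit size_t with uint32_t. The cap SKETCH_CHUNK_MAX and both function names are hypothetical; the point is only the order of the narrowing conversion and the clamp.

```c
#include <assert.h>
#include <stdint.h>

/* Models size_t on a 32-bit architecture. */
typedef uint32_t sketch_size32_t;

/* Hypothetical per-call cap, small enough to fit the narrow type. */
#define SKETCH_CHUNK_MAX 0x7ffff000u

static_assert(SKETCH_CHUNK_MAX <= UINT32_MAX, "cap must fit the narrow type");

/* Buggy shape: narrow first, then clamp. The high 32 bits are lost. */
static sketch_size32_t sketch_chunk_truncating(uint64_t remaining)
{
	sketch_size32_t n = (sketch_size32_t)remaining;

	return n < SKETCH_CHUNK_MAX ? n : SKETCH_CHUNK_MAX;
}

/* Safer shape: clamp in 64 bits, then narrow. */
static sketch_size32_t sketch_chunk_clamped(uint64_t remaining)
{
	uint64_t n = remaining < SKETCH_CHUNK_MAX ? remaining : SKETCH_CHUNK_MAX;

	return (sketch_size32_t)n;
}
```

A 4 GiB request (0x100000000) becomes a 0-length chunk in the first function and a full-size chunk in the second, which matches the failure mode the message warns about.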


# 792a5112 18-Mar-2021 J. Bruce Fields <bfields@redhat.com>

nfsd: COPY with length 0 should copy to end of file

From https://tools.ietf.org/html/rfc7862#page-65

A count of 0 (zero) requests that all bytes from ca_src_offset
through EOF be copied to the destination.

Reported-by: <radchenkoy@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
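
The rule quoted above can be sketched as a small helper that turns the wire count into the number of bytes to copy. The function name and the explicit src_size parameter are assumptions made for this illustration; it covers only the zero-count case and leaves nonzero counts unchanged, without attempting any other validation.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: a count of 0 means "from src_offset through EOF";
 * any other count is passed through as-is.
 */
static uint64_t sketch_effective_count(uint64_t src_offset, uint64_t count,
				       uint64_t src_size)
{
	uint64_t to_eof = src_size > src_offset ? src_size - src_offset : 0;

	assert(to_eof <= src_size);	/* the subtraction cannot have wrapped */
	return count != 0 ? count : to_eof;
}
```

An equivalent approach is to substitute the largest representable count for zero and let the copy loop stop at end of file; which of the two a server uses is an implementation choice.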


# bddfdbcd 27-Oct-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Extract the svcxdr_init_encode() helper

NFSD initializes an encode xdr_stream only after the RPC layer has
already inserted the RPC Reply header. Thus it behaves differently
than xdr_init_encode does, which assumes the passed-in xdr_buf is
entirely devoid of content.

nfs4proc.c has this server-side stream initialization helper, but
it is visible only to the NFSv4 code. Move this helper to a place
that can be accessed by NFSv2 and NFSv3 server XDR functions.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 614c9750 09-Mar-2021 Olga Kornievskaia <kolga@netapp.com>

NFSD: fix dest to src mount in inter-server COPY

A cleanup of the inter SSC copy needs to call fput() of the source
file handle to make sure that file structure is freed as well as
drop the reference on the superblock to unmount the source server.

Fixes: 36e1e5ba90fb ("NFSD: Fix use-after-free warning when doing inter-server copy")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Dai Ngo <dai.ngo@oracle.com>


# ec59659b 21-Jan-2021 J. Bruce Fields <bfields@redhat.com>

nfsd: cstate->session->se_client -> cstate->clp

I'm not sure why we're writing this out the hard way in so many places.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 1722b046 21-Jan-2021 J. Bruce Fields <bfields@redhat.com>

nfsd: simplify nfsd4_check_open_reclaim

The set_client() was already taken care of by process_open1().

The comments here are mostly redundant with the code.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# e567b98c 06-Jan-2021 Amir Goldstein <amir73il@gmail.com>

nfsd: protect concurrent access to nfsd stats counters

nfsd stats counters can be updated by concurrent nfsd threads without any
protection.

Convert some nfsd_stats and nfsd_net struct members to use percpu counters.

The longest_chain* members of struct nfsd_net remain unprotected.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 2289e87b 17-Sep-2020 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Make trace_svc_process() display the RPC procedure symbolically

The next few patches will employ these strings to help make server-
side trace logs more human-readable. A similar technique is already
in use in kernel RPC client code.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# d6c9e436 17-Dec-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Fix sparse warning in nfssvc.c

fs/nfsd/nfssvc.c:36:6: warning: symbol 'inter_copy_offload_enable' was not declared. Should it be static?

The parameter was added by commit ce0887ac96d3 ("NFSD add nfs4 inter
ssc to nfsd4_copy"). Relocate it into the source file that uses it,
and make it static. This approach is similar to the
nfs4_disable_idmapping, cltrack_prog, and cltrack_legacy_disable
module parameters.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# eb162e17 30-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Fix sparse warning in nfs4proc.c

linux/fs/nfsd/nfs4proc.c:1542:24: warning: incorrect type in assignment (different base types)
linux/fs/nfsd/nfs4proc.c:1542:24: expected restricted __be32 [assigned] [usertype] status
linux/fs/nfsd/nfs4proc.c:1542:24: got int

Clean-up: The dup_copy_fields() function returns only zero, so make
it return void for now, and get rid of the return code check.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 3a237b4a 21-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Make nfsd4_ops::opnum a u32

Avoid passing a "pointer to int" argument to xdr_stream_decode_u32.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 1708e50b 16-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add helper to decode OPEN's open_claim4 argument

Refactor for clarity.

Note that op_fname is the only instance of an NFSv4 filename stored
in a struct xdr_netobj. Convert it to a u32/char * pair so that the
new nfsd4_decode_filename() helper can be used.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# c1346a12 03-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace the internals of the READ_BUF() macro

Convert the READ_BUF macro in nfs4xdr.c from open code to instead
use the new xdr_stream-style decoders already in use by the encode
side (and by the in-kernel NFS client implementation). Once this
conversion is done, each individual NFSv4 argument decoder can be
independently cleaned up to replace these macros with C code.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 788f7183 05-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add common helpers to decode void args and encode void results

Start off the conversion to xdr_stream by de-duplicating the functions
that decode void arguments and encode void results.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 0ae4c3e8 11-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Add xdr_set_scratch_page() and xdr_reset_scratch_buffer()

Clean up: De-duplicate some frequently-used code.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 49a36132 29-Oct-2020 Dai Ngo <dai.ngo@oracle.com>

NFSD: fix missing refcount in nfsd4_copy by nfsd4_do_async_copy

Need to initialize nfsd4_copy's refcount to 1 to avoid use-after-free
warning when nfs4_put_copy is called from nfsd4_cb_offload_release.

Fixes: ce0887ac96d3 ("NFSD add nfs4 inter ssc to nfsd4_copy")
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 36e1e5ba 29-Oct-2020 Dai Ngo <dai.ngo@oracle.com>

NFSD: Fix use-after-free warning when doing inter-server copy

The source file nfsd_file is not constructed the same as other
nfsd_file's via nfsd_file_alloc. nfsd_file_put should not be
called to free the object; nfsd_file_put is not the inverse of
kzalloc, instead kfree is called by nfsd4_do_async_copy when done.

Fixes: ce0887ac96d3 ("NFSD add nfs4 inter ssc to nfsd4_copy")
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 0cfcd405 18-Oct-2020 Dai Ngo <dai.ngo@oracle.com>

NFSv4.2: Fix NFS4ERR_STALE error when doing inter server copy

With NFS_FS=y as a dependency of CONFIG_NFSD_V4_2_INTER_SSC, there are
still build errors, and some configs with NFSD=m get an NFS4ERR_STALE
error when doing inter-server copy.

Add an ops table in nfs_common for knfsd to access NFS client modules.

Fixes: 3ac3711adb88 ("NFSD: Fix NFS server build errors")
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 528b8493 28-Sep-2020 Anna Schumaker <Anna.Schumaker@Netapp.com>

NFSD: Add READ_PLUS data support

This patch adds READ_PLUS support for returning a single
NFS4_CONTENT_DATA segment to the client. This is basically the same as
the READ operation, only with the extra information about data segments.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# cc028a10 02-Oct-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Hoist status code encoding into XDR encoder functions

The original intent was presumably to reduce code duplication. The
trade-off was:

- No support for an NFSD proc function returning a non-success
RPC accept_stat value.
- No support for void NFS replies to non-NULL procedures.
- Everyone pays for the deduplication with a few extra conditional
branches in a hot path.

In addition, nfsd_dispatch() leaves *statp uninitialized in the
success path, unlike svc_generic_dispatch().

Address all of these problems by moving the logic for encoding
the NFS status code into the NFS XDR encoders themselves. Then
update the NFS .pc_func methods to return an RPC accept_stat
value.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# dcc46991 01-Oct-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Encoder and decoder functions are always present

nfsd_dispatch() is a hot path. Let's optimize the XDR method calls
for the by-far common case, which is that the XDR methods are indeed
present.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 44b49aa6 10-Sep-2020 Zheng Bin <zhengbin13@huawei.com>

nfsd: fix comparison to bool warning

Fixes coccicheck warning:

fs/nfsd/nfs4proc.c:3234:5-29: WARNING: Comparison to bool

Signed-off-by: Zheng Bin <zhengbin13@huawei.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# df561f66 23-Aug-2020 Gustavo A. R. Silva <gustavoars@kernel.org>

treewide: Use fallthrough pseudo-keyword

Replace the existing /* fall through */ comments and their variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings where they occur.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
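
For readers unfamiliar with the pseudo-keyword mentioned above, here is a standalone sketch of how such a macro can be defined and used. The macro is named sketch_fallthrough to keep it clearly separate from the kernel's own definition, and the switch body is invented for the example.

```c
#include <assert.h>

/* Use the compiler attribute when available, else an empty statement. */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define sketch_fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef sketch_fallthrough
# define sketch_fallthrough do {} while (0)	/* fallthrough */
#endif

/* Higher levels accumulate the bits of every lower level. */
static int sketch_level_bits(int level)
{
	int bits = 0;

	switch (level) {
	case 2:
		bits |= 4;
		sketch_fallthrough;
	case 1:
		bits |= 2;
		sketch_fallthrough;
	case 0:
		bits |= 1;
		break;
	default:
		break;
	}
	/* Sanity check: only the three low bits are ever set. */
	assert(bits >= 0 && bits <= 7);
	return bits;
}
```

Unlike a comment, the attribute form is checked by the compiler, so -Wimplicit-fallthrough can flag any case that falls through without the marker.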


# 23e50fe3 23-Jun-2020 Frank van der Linden <fllinden@amazon.com>

nfsd: implement the xattr functions and en/decode logic

Implement the main entry points for the *XATTR operations.

Add functions to calculate the reply size for the user extended attribute
operations, and implement the XDR encode / decode logic for these
operations.

Add the user extended attributes operations to nfsd4_ops.

Signed-off-by: Frank van der Linden <fllinden@amazon.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# c11d7fd1 23-Jun-2020 Frank van der Linden <fllinden@amazon.com>

nfsd: take xattr bits into account for permission checks

Since the NFSv4.2 extended attributes extension defines 3 new access
bits for xattr operations, take them into account when validating
what the client is asking for, and when checking permissions.

Signed-off-by: Frank van der Linden <fllinden@amazon.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# f2453978 06-Apr-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Fix improperly-formatted Doxygen comments

fs/nfsd/nfsctl.c:256: warning: Function parameter or member 'file' not described in 'write_unlock_ip'
fs/nfsd/nfsctl.c:256: warning: Function parameter or member 'buf' not described in 'write_unlock_ip'
fs/nfsd/nfsctl.c:256: warning: Function parameter or member 'size' not described in 'write_unlock_ip'
fs/nfsd/nfsctl.c:295: warning: Function parameter or member 'file' not described in 'write_unlock_fs'
fs/nfsd/nfsctl.c:295: warning: Function parameter or member 'buf' not described in 'write_unlock_fs'
fs/nfsd/nfsctl.c:295: warning: Function parameter or member 'size' not described in 'write_unlock_fs'
fs/nfsd/nfsctl.c:352: warning: Function parameter or member 'file' not described in 'write_filehandle'
fs/nfsd/nfsctl.c:352: warning: Function parameter or member 'buf' not described in 'write_filehandle'
fs/nfsd/nfsctl.c:352: warning: Function parameter or member 'size' not described in 'write_filehandle'
fs/nfsd/nfsctl.c:434: warning: Function parameter or member 'file' not described in 'write_threads'
fs/nfsd/nfsctl.c:434: warning: Function parameter or member 'buf' not described in 'write_threads'
fs/nfsd/nfsctl.c:434: warning: Function parameter or member 'size' not described in 'write_threads'
fs/nfsd/nfsctl.c:478: warning: Function parameter or member 'file' not described in 'write_pool_threads'
fs/nfsd/nfsctl.c:478: warning: Function parameter or member 'buf' not described in 'write_pool_threads'
fs/nfsd/nfsctl.c:478: warning: Function parameter or member 'size' not described in 'write_pool_threads'
fs/nfsd/nfsctl.c:697: warning: Function parameter or member 'file' not described in 'write_versions'
fs/nfsd/nfsctl.c:697: warning: Function parameter or member 'buf' not described in 'write_versions'
fs/nfsd/nfsctl.c:697: warning: Function parameter or member 'size' not described in 'write_versions'
fs/nfsd/nfsctl.c:858: warning: Function parameter or member 'file' not described in 'write_ports'
fs/nfsd/nfsctl.c:858: warning: Function parameter or member 'buf' not described in 'write_ports'
fs/nfsd/nfsctl.c:858: warning: Function parameter or member 'size' not described in 'write_ports'
fs/nfsd/nfsctl.c:892: warning: Function parameter or member 'file' not described in 'write_maxblksize'
fs/nfsd/nfsctl.c:892: warning: Function parameter or member 'buf' not described in 'write_maxblksize'
fs/nfsd/nfsctl.c:892: warning: Function parameter or member 'size' not described in 'write_maxblksize'
fs/nfsd/nfsctl.c:941: warning: Function parameter or member 'file' not described in 'write_maxconn'
fs/nfsd/nfsctl.c:941: warning: Function parameter or member 'buf' not described in 'write_maxconn'
fs/nfsd/nfsctl.c:941: warning: Function parameter or member 'size' not described in 'write_maxconn'
fs/nfsd/nfsctl.c:1023: warning: Function parameter or member 'file' not described in 'write_leasetime'
fs/nfsd/nfsctl.c:1023: warning: Function parameter or member 'buf' not described in 'write_leasetime'
fs/nfsd/nfsctl.c:1023: warning: Function parameter or member 'size' not described in 'write_leasetime'
fs/nfsd/nfsctl.c:1039: warning: Function parameter or member 'file' not described in 'write_gracetime'
fs/nfsd/nfsctl.c:1039: warning: Function parameter or member 'buf' not described in 'write_gracetime'
fs/nfsd/nfsctl.c:1039: warning: Function parameter or member 'size' not described in 'write_gracetime'
fs/nfsd/nfsctl.c:1094: warning: Function parameter or member 'file' not described in 'write_recoverydir'
fs/nfsd/nfsctl.c:1094: warning: Function parameter or member 'buf' not described in 'write_recoverydir'
fs/nfsd/nfsctl.c:1094: warning: Function parameter or member 'size' not described in 'write_recoverydir'
fs/nfsd/nfsctl.c:1125: warning: Function parameter or member 'file' not described in 'write_v4_end_grace'
fs/nfsd/nfsctl.c:1125: warning: Function parameter or member 'buf' not described in 'write_v4_end_grace'
fs/nfsd/nfsctl.c:1125: warning: Function parameter or member 'size' not described in 'write_v4_end_grace'

fs/nfsd/nfs4proc.c:1164: warning: Function parameter or member 'nss' not described in 'nfsd4_interssc_connect'
fs/nfsd/nfs4proc.c:1164: warning: Function parameter or member 'rqstp' not described in 'nfsd4_interssc_connect'
fs/nfsd/nfs4proc.c:1164: warning: Function parameter or member 'mount' not described in 'nfsd4_interssc_connect'
fs/nfsd/nfs4proc.c:1262: warning: Function parameter or member 'rqstp' not described in 'nfsd4_setup_inter_ssc'
fs/nfsd/nfs4proc.c:1262: warning: Function parameter or member 'cstate' not described in 'nfsd4_setup_inter_ssc'
fs/nfsd/nfs4proc.c:1262: warning: Function parameter or member 'copy' not described in 'nfsd4_setup_inter_ssc'
fs/nfsd/nfs4proc.c:1262: warning: Function parameter or member 'mount' not described in 'nfsd4_setup_inter_ssc'

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 28df3d15 28-Jul-2017 J. Bruce Fields <bfields@redhat.com>

nfsd: clients don't need to break their own delegations

We currently revoke read delegations on any write open or any operation
that modifies file data or metadata (including rename, link, and
unlink). But if the delegation in question is the only read delegation
and is held by the client performing the operation, that's not really
necessary.

It's not always possible to prevent this in the NFSv4.0 case, because
there's not always a way to determine which client an NFSv4.0 delegation
came from. (In theory we could try to guess this from the transport
layer, e.g., by assuming all traffic on a given TCP connection comes
from the same client. But that's not really correct.)

In the NFSv4.1 case the session layer always tells us the client.

This patch should remove such self-conflicts in all cases where we can
reliably determine the client from the compound.

To do that we need to track "who" is performing a given (possibly
lease-breaking) file operation. We're doing that by storing the
information in the svc_rqst and using kthread_data() to map the current
task back to a svc_rqst.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 91fd3c3e 13-Jan-2020 Dan Carpenter <dan.carpenter@oracle.com>

nfsd4: fix double free in nfsd4_do_async_copy()

This frees "copy->nf_src" before and again after the goto.

Fixes: ce0887ac96d3 ("NFSD add nfs4 inter ssc to nfsd4_copy")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 19e0663f 06-Jan-2020 Trond Myklebust <trondmy@gmail.com>

nfsd: Ensure sampling of the write verifier is atomic with the write

When doing an unstable write, we need to ensure that we sample the
write verifier before releasing the lock, and allowing a commit to
the same file to proceed.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 524ff1af 06-Jan-2020 Trond Myklebust <trondmy@gmail.com>

nfsd: Ensure sampling of the commit verifier is atomic with the commit

When we have a successful commit, ensure we sample the commit verifier
before releasing the lock.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# b66ae6dd 06-Jan-2020 Trond Myklebust <trondmy@gmail.com>

nfsd: Pass the nfsd_file as arguments to nfsd4_clone_file_range()

Needed in order to fix exclusion w.r.t. writes.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 16f8f894 06-Jan-2020 Trond Myklebust <trondmy@gmail.com>

nfsd: Allow nfsd_vfs_write() to take the nfsd_file as an argument

Needed in order to fix stable writes.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 500c2481 24-Dec-2019 zhengbin <zhengbin13@huawei.com>

nfsd: use true,false for bool variable in nfs4proc.c

Fixes coccicheck warning:

fs/nfsd/nfs4proc.c:235:1-18: WARNING: Assignment of 0/1 to bool variable
fs/nfsd/nfs4proc.c:368:1-17: WARNING: Assignment of 0/1 to bool variable

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 2a1aa489 03-Nov-2019 Arnd Bergmann <arnd@arndb.de>

nfsd: pass a 64-bit guardtime to nfsd_setattr()

Guardtime handling in nfs3 differs between 32-bit and 64-bit
architectures, and uses the deprecated time_t type.

Change it to using time64_t, which behaves the same way on
64-bit and 32-bit architectures, treating the number as an
unsigned 32-bit entity with a range of year 1970 to 2106
consistently, and avoiding the y2038 overflow.
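
As a rough illustration of the unsigned interpretation described above,
here is a minimal userspace sketch. The helper name guardtime_to_time64()
is hypothetical and is not taken from the patch; it only shows why
widening an unsigned 32-bit value gives the same 1970 to 2106 range on
every architecture.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not the kernel's code: widen a 32-bit NFSv3
 * guard time (seconds since 1970) to a signed 64-bit value.
 * Treating the wire value as unsigned keeps the result non-negative,
 * so it covers 1970 through 2106 on 32-bit and 64-bit builds alike.
 */
static int64_t guardtime_to_time64(uint32_t wire_seconds)
{
	return (int64_t)wire_seconds;
}
```

With this interpretation a wire value of 0x80000000 becomes 2147483648,
just past the point where a signed 32-bit time would have wrapped to a
negative number.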

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# d781e3df 06-Dec-2019 J. Bruce Fields <bfields@redhat.com>

nfsd4: avoid NULL dereference on strange COPY compounds

With cross-server COPY we've introduced the possibility that the current
or saved filehandle might not have fh_dentry/fh_export filled in, but we
missed a place that assumed it was. I think this could be triggered by
a compound like:

PUTFH(foreign filehandle)
GETATTR
SAVEFH
COPY

First, check_if_stalefh_allowed sets no_verify on the first (PUTFH) op.
Then op_func = nfsd4_putfh runs and leaves current_fh->fh_export NULL.
need_wrongsec_check returns true, since this PUTFH has OP_IS_PUTFH_LIKE
set and GETATTR does not have OP_HANDLES_WRONGSEC set.

We should probably also consider tightening the checks in
check_if_stalefh_allowed and double-checking that we don't assume the
filehandle is verified elsewhere in the compound. But I think this
fixes the immediate issue.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 4e48f1cccab3 "NFSD: allow inter server COPY to have... "
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 2e577f0f 04-Dec-2019 Olga Kornievskaia <olga.kornievskaia@gmail.com>

NFSD fixing possible null pointer derefering in copy offload

Static checker revealed possible error path leading to possible
NULL pointer dereferencing.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: e0639dc5805a: ("NFSD introduce async copy feature")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# b8290ca2 04-Dec-2019 Olga Kornievskaia <olga.kornievskaia@gmail.com>

NFSD fix nfserro errno mismatch

There is a mismatch between __be32 and u32 in nfserr and errno.

Reported-by: kbuild test robot <lkp@intel.com>
Fixes: d5e54eeb0e3d ("NFSD add nfs4 inter ssc to nfsd4_copy")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 3f9544ca 04-Dec-2019 Olga Kornievskaia <olga.kornievskaia@gmail.com>

NFSD: fix seqid in copy stateid

s_stid->si_generation is a u32, copy->stateid.seqid is a __be32, so we
should be byte-swapping here if necessary.

This effectively undoes the byte-swap performed when reading
s_stid->si_generation in nfsd4_decode_copy(). Without this second swap,
the stateid we sent to the source in READ could be different from the
one the client provided us in the COPY. We didn't spot this in testing
since our implementation always uses a 0 in the seqid field. But other
implementations might not do that.

You'd think we should just skip the byte-swapping entirely, but the
s_stid field can be used for either our own stateids (in the
intra-server case) or foreign stateids (in the inter-server case), and
the former are interpreted by us and need byte-swapping.
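
The decode-then-re-encode round trip described above can be sketched in
userspace C. This is only an illustration under assumptions: the helper
names are hypothetical, and htonl()/ntohl() stand in for the kernel's
own byte-order helpers.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Decode step: wire order (big-endian) to host order. */
static uint32_t seqid_from_wire(uint32_t wire)
{
	return ntohl(wire);
}

/* Re-encode step: host order back to wire order before forwarding. */
static uint32_t seqid_to_wire(uint32_t host)
{
	return htonl(host);
}
```

Passing a value through both helpers returns the original wire bytes on
any host. A seqid of 0 looks the same in either byte order, which is
consistent with the missing swap going unnoticed in testing.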

Reported-by: kbuild test robot <lkp@intel.com>
Fixes: d5e54eeb0e3d ("NFSD add nfs4 inter ssc to nfsd4_copy")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# ce0887ac 09-Oct-2019 Olga Kornievskaia <kolga@netapp.com>

NFSD add nfs4 inter ssc to nfsd4_copy

Given a universal address, mount the source server from the destination
server. Use an internal mount. Call the NFS client nfs42_ssc_open to
obtain the NFS struct file suitable for nfsd_copy_range.

The ability to do "inter" server-to-server copy depends on the nfsd
kernel parameter "inter_copy_offload_enable".

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>


# b9e8638e 07-Oct-2019 Olga Kornievskaia <olga.kornievskaia@gmail.com>

NFSD: allow inter server COPY to have a STALE source server fh

The inter server to server COPY source server filehandle
is a foreign filehandle as the COPY is sent to the destination
server.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>


# 51100d2b 13-Sep-2018 Olga Kornievskaia <kolga@netapp.com>

NFSD generalize nfsd4_compound_state flag names

Allow the sid_flag field to be used for non-stateid purposes.

Signed-off-by: Andy Adamson <andros@netapp.com>


# 624322f1 04-Oct-2019 Olga Kornievskaia <olga.kornievskaia@gmail.com>

NFSD add COPY_NOTIFY operation

Introducing the COPY_NOTIFY operation.

Create a new unique stateid that will keep track of the copy
state and the upcoming READs that will use that stateid.
Each associated parent stateid has a list of copy
notify stateids. A copy notify structure makes a copy of
the parent stateid and a clientid and will use it to look
up the parent stateid during the READ request (suggested
by Trond Myklebust <trond.myklebust@hammerspace.com>).

At nfs4_put_stid() time, we walk the list of the associated
copy notify stateids and delete them.

The laundromat thread will traverse the globally stored copy notify
stateids in the idr and remove any that haven't been referenced within
the lease period.

Return single netaddr to advertise to the copy.

Suggested-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Andy Adamson <andros@netapp.com>


# 51911868 08-Aug-2019 Olga Kornievskaia <olga.kornievskaia@gmail.com>

NFSD COPY_NOTIFY xdr

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>


# 18f428d4 04-Dec-2019 Olga Kornievskaia <olga.kornievskaia@gmail.com>

NFSD fixing possible null pointer derefering in copy offload

Static checker revealed possible error path leading to possible
NULL pointer dereferencing.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: e0639dc5805a: ("NFSD introduce async copy feature")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# a25e3726 27-Nov-2019 Trond Myklebust <trondmy@gmail.com>

nfsd: Ensure CLONE persists data and metadata changes to the target file

The NFSv4.2 CLONE operation has implicit persistence requirements on the
target file, since there is no protocol requirement that the client issue
a separate operation to persist data.
For that reason, we should call vfs_fsync_range() on the destination file
after a successful call to vfs_clone_file_range().

Fixes: ffa0160a1039 ("nfsd: implement the NFSv4.2 CLONE operation")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: stable@vger.kernel.org # v4.5+
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 27c438f5 02-Sep-2019 Trond Myklebust <trondmy@gmail.com>

nfsd: Support the server resetting the boot verifier

Add support to allow the server to reset the boot verifier in order to
force clients to resend I/O after a timeout failure.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 5c4583b2 18-Aug-2019 Jeff Layton <jeff.layton@primarydata.com>

nfsd: hook up nfs4_preprocess_stateid_op to the nfsd_file cache

Have nfs4_preprocess_stateid_op pass back a nfsd_file instead of a filp.
Since we now presume that the struct file will be persistent in most
cases, we can stop fiddling with the raparms in the read code. This
also means that we don't really care about the rd_tmp_file field
anymore.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# e333f3bb 09-Apr-2019 Trond Myklebust <trondmy@gmail.com>

nfsd: Allow containers to set supported nfs versions

Support use of the --nfs-version/--no-nfs-version arguments to rpc.nfsd
in containers.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 0a4c9265 23-Jan-2019 Gustavo A. R. Silva <gustavo@embeddedor.com>

fs: mark expected switch fall-throughs

In preparation for enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

This patch fixes the following warnings:

fs/affs/affs.h:124:38: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/configfs/dir.c:1692:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/configfs/dir.c:1694:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ceph/file.c:249:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/hash.c:233:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/hash.c:246:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext2/inode.c:1237:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext2/inode.c:1244:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/indirect.c:1182:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/indirect.c:1188:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/indirect.c:1432:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/indirect.c:1440:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/f2fs/node.c:618:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/f2fs/node.c:620:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/btrfs/ref-verify.c:522:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/gfs2/bmap.c:711:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/gfs2/bmap.c:722:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/jffs2/fs.c:339:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/nfsd/nfs4proc.c:429:12: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ufs/util.h:62:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ufs/util.h:43:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/fcntl.c:770:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/seq_file.c:319:10: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/libfs.c:148:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/libfs.c:150:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/signalfd.c:178:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/locks.c:1473:16: warning: this statement may fall through [-Wimplicit-fallthrough=]

Warning level 3 was used: -Wimplicit-fallthrough=3

This patch is part of the ongoing efforts to enable
-Wimplicit-fallthrough.
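
For readers unfamiliar with the annotation, a small sketch of a marked
fall-through follows. The function and its cases are hypothetical and
do not come from any of the files listed above; only the comment style
is the point.

```c
/* Hypothetical example: higher levels include everything lower ones do. */
static int steps_for_level(int level)
{
	int steps = 0;

	switch (level) {
	case 2:
		steps++;
		/* fall through */
	case 1:
		steps++;
		/* fall through */
	case 0:
		steps++;
		break;
	default:
		steps = -1;	/* unknown level */
		break;
	}
	return steps;
}
```

Without the comments, each case that runs into the next one would
trigger the "this statement may fall through" warning shown above.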

Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>


# 03b31f48 04-Dec-2018 Olga Kornievskaia <kolga@netapp.com>

NFSD remove OP_CACHEME from 4.2 op_flags

OP_CACHEME is only for the 4.0 operations.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 0d4d6720 06-Nov-2018 J. Bruce Fields <bfields@redhat.com>

nfsd4: skip unused assignment

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# f8f71d00 06-Nov-2018 J. Bruce Fields <bfields@redhat.com>

nfsd4: forbid all renames during grace period

The idea here was that renaming a file on a nosubtreecheck export would
make lookups of the old filehandle return STALE, making it impossible
for clients to reclaim opens.

But during the grace period I think we should also hold off on
operations that would break delegations.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# fdec6114 15-Nov-2018 J. Bruce Fields <bfields@redhat.com>

nfsd4: zero-length WRITE should succeed

Zero-length writes are legal; from RFC 5661 section 18.32.3: "If the count
is zero, the WRITE will succeed and return a count of zero subject to
permissions checking".

This check is unnecessary and is causing zero-length writes to return
EINVAL.

Cc: stable@vger.kernel.org
Fixes: 3fd9557aec91 "NFSD: Refactor the generic write vector fill helper"
Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 01310bb7 08-Nov-2018 Scott Mayhew <smayhew@redhat.com>

nfsd: COPY and CLONE operations require the saved filehandle to be set

Make sure we have a saved filehandle, otherwise we'll oops with a null
pointer dereference in nfs4_preprocess_stateid_op().

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# e0639dc5 20-Jul-2018 Olga Kornievskaia <kolga@netapp.com>

NFSD introduce async copy feature

Upon receiving a request for async copy, create a new kthread. If we
get an asynchronous request, make sure to copy the needed arguments/state
from the stack before starting the copy. Then start the thread and reply
back to the client indicating copy is asynchronous.

nfsd_copy_file_range() will copy in a loop until the total number of
bytes requested has been copied. In case a failure happens in the middle,
we ignore the error and return how much we copied so far. Once done,
create a work item for the callback workqueue and send CB_OFFLOAD with
the results.
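
The "copy in a loop, report partial progress on failure" behaviour can
be sketched as follows. This is a userspace illustration under
assumptions, not the nfsd code: copy_in_chunks(), copy_chunk_fn and
demo_chunk() are all hypothetical names, and the callback stands in for
whatever actually moves the bytes.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical callback: copy up to len bytes at offset, return count or <0. */
typedef ssize_t (*copy_chunk_fn)(size_t offset, size_t len, void *ctx);

/* Copy total bytes in chunks; on failure return what was copied so far. */
static size_t copy_in_chunks(size_t total, size_t chunk,
			     copy_chunk_fn fn, void *ctx)
{
	size_t done = 0;

	while (done < total) {
		size_t want = total - done < chunk ? total - done : chunk;
		ssize_t n = fn(done, want, ctx);

		if (n <= 0)
			break;	/* error or no progress: keep partial result */
		done += (size_t)n;
	}
	return done;
}

/* Demo callback: the pretend source fails once *ctx bytes have been read. */
static ssize_t demo_chunk(size_t offset, size_t len, void *ctx)
{
	size_t limit = *(size_t *)ctx;

	if (offset >= limit)
		return -1;
	return (ssize_t)(len < limit - offset ? len : limit - offset);
}
```

With a 25-byte request, 4-byte chunks, and a source that fails after 10
bytes, the loop stops early and reports 10 rather than an error.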

The lifetime of the copy stateid is bound to the vfs copy. This way we
don't need to keep the nfsd_net structure for the callback. We could
keep it around longer so that an OFFLOAD_STATUS that came late would
still get results, but clients should be able to deal without that.

We handle OFFLOAD_CANCEL by sending a signal to the copy thread and
calling kthread_stop.

A client should cancel any ongoing copies before calling DESTROY_CLIENT;
if not, we return a CLIENT_BUSY error.

If the client is destroyed for some other reason (lease expiration, or
server shutdown), we must clean up any ongoing copies ourselves.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
[colin.king@canonical.com: fix leak in error case]
[bfields@fieldses.org: remove signalling, merge patches]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 885e2bf3 20-Jul-2018 Olga Kornievskaia <kolga@netapp.com>

NFSD OFFLOAD_CANCEL xdr

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 6308bc98 20-Jul-2018 Olga Kornievskaia <kolga@netapp.com>

NFSD OFFLOAD_STATUS xdr

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 3fd9557a 27-Jul-2018 Chuck Lever <chuck.lever@oracle.com>

NFSD: Refactor the generic write vector fill helper

fill_in_write_vector() is nearly the same logic as
svc_fill_write_vector(), but there are a few differences so that
the former can handle multiple WRITE payloads in a single COMPOUND.

svc_fill_write_vector() can be adjusted so that it can be used in
the NFSv4 WRITE code path too. Instead of assuming the pages are
coming from rq_args.pages, have the caller pass in the page list.

The immediate benefit is a reduction of code duplication. It also
prevents the NFSv4 WRITE decoder from passing an empty vector
element when the transport has provided the payload in the xdr_buf's
page array.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 5b7b15ae 13-Jun-2018 J. Bruce Fields <bfields@redhat.com>

nfsd: fix corrupted reply to badly ordered compound

We're encoding a single op in the reply but leaving the number of ops
zero, so the reply makes no sense.

Somewhat academic as this isn't a case any real client will hit, though
in theory perhaps that could change in a future protocol extension.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 7a04cfda 13-Jun-2018 J. Bruce Fields <bfields@redhat.com>

nfsd: clarify check_op_ordering

Document a couple things that confused me on a recent reading.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 03f318ca 07-Jun-2018 J. Bruce Fields <bfields@redhat.com>

nfsd4: extend reclaim period for reclaiming clients

If the client is only renewing state a little sooner than once a lease
period, then it might not discover the server has restarted till close
to the end of the grace period, and might run out of time to do the
actual reclaim.

Extend the grace period by a second each time we notice there are
clients still trying to reclaim, up to a limit of another whole lease
period.
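
One way to picture the capped extension is the predicate below. It is a
simplified sketch under assumptions: the function name, its parameters,
and the once-per-check evaluation are all hypothetical, and the real
server tracks more state than this.

```c
#include <stdbool.h>

/*
 * Hypothetical predicate: should the server stay in grace?
 * lease and elapsed are in seconds; elapsed counts from the start of
 * the grace period.
 */
static bool grace_should_continue(unsigned int lease, unsigned int elapsed,
				  bool clients_reclaiming)
{
	if (elapsed < lease)
		return true;		/* normal grace period */
	if (!clients_reclaiming)
		return false;		/* nobody needs more time */
	return elapsed < 2 * lease;	/* extension capped at one more lease */
}
```

Evaluated once a second, this keeps the grace period open only while
reclaims are still arriving, and never beyond two lease periods in
total.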

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 880a3a53 21-Mar-2018 J. Bruce Fields <bfields@redhat.com>

nfsd: fix incorrect umasks

We're neglecting to clear the umask after it's set, which can cause a
later unrelated rpc to (incorrectly) use the same umask if it happens to
be processed by the same thread.

There's a more subtle problem here too:

An NFSv4 compound request is decoded all in one pass before any
operations are executed.

Currently we're setting current->fs->umask at the time we decode the
compound. In theory a single compound could contain multiple creates
each setting a umask. In that case we'd end up using whichever umask
was passed in the *last* operation as the umask for all the creates,
whether that was correct or not.

So, we should just be saving the umask at decode time and waiting to set
it until we actually process the corresponding operation.

In practice it's unlikely any client would do multiple creates in a
single compound. And even if it did they'd likely be from the same
process (hence carry the same umask). So this is a little academic, but
we should get it right anyway.
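
The "save at decode time, apply at execution time" idea can be sketched
like this. Everything here is hypothetical: thread_umask stands in for
current->fs->umask, struct create_op is a toy, and restoring the
previous value is just one way to make sure nothing leaks into later
requests handled by the same thread.

```c
#define NO_UMASK (-1)

/* Toy operation: the umask is recorded by the decoder, not applied there. */
struct create_op {
	int saved_umask;	/* NO_UMASK if the client sent none */
	int effective_umask;	/* what the "create" actually ran with */
};

static int thread_umask = 022;	/* stand-in for per-thread umask state */

static void execute_create(struct create_op *op)
{
	int old = thread_umask;

	if (op->saved_umask != NO_UMASK)
		thread_umask = op->saved_umask;
	op->effective_umask = thread_umask;	/* the create happens here */
	thread_umask = old;	/* do not leak into later operations */
}
```

Two creates in one compound then each run with their own umask, and the
thread's state is unchanged once both have finished.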

Fixes: 47057abde515 (nfsd: add support for the umask attribute)
Cc: stable@vger.kernel.org
Reported-by: Lucash Stach <l.stach@pengutronix.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# fff4080b 27-Mar-2018 Chuck Lever <chuck.lever@oracle.com>

nfsd: Trace NFSv4 COMPOUND execution

This helps record the identity and timing of the ops in each NFSv4
COMPOUND, replacing dprintk calls that did much the same thing.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 87c5942e 28-Mar-2018 Chuck Lever <chuck.lever@oracle.com>

nfsd: Add I/O trace points in the NFSv4 read proc

NFSv4 read compound processing invokes nfsd_splice_read and
nfsd_readv directly, so the trace points currently in nfsd_read are
not invoked for NFSv4 reads.

Move the NFSD READ trace points to common helpers so that NFSv4
reads are captured.

Also, record any local I/O error that occurs, the total count of
bytes that were actually returned, and whether splice or vectored
read was used.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# d890be15 27-Mar-2018 Chuck Lever <chuck.lever@oracle.com>

nfsd: Add I/O trace points in the NFSv4 write path

NFSv4 write compound processing invokes nfsd_vfs_write directly. The
trace points currently in nfsd_write are not effective for NFSv4
writes.

Move the trace points into the shared nfsd_vfs_write() helper.

After the I/O, we also want to record any local I/O error that
might have occurred, and the total count of bytes that were actually
moved (rather than the requested number).

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# f394b62b 27-Mar-2018 Chuck Lever <chuck.lever@oracle.com>

nfsd: Add "nfsd_" to trace point names

Follow naming convention used in client and in sunrpc layers.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# edcc8452 07-Mar-2018 J. Bruce Fields <bfields@redhat.com>

nfsd: remove unused "cp_consecutive" field

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 0078117c 14-Nov-2017 J. Bruce Fields <bfields@redhat.com>

nfsd: return RESOURCE not GARBAGE_ARGS on too many ops

A client that sends more than a hundred ops in a single compound
currently gets an rpc-level GARBAGE_ARGS error.

It would be more helpful to return NFS4ERR_RESOURCE, since that gives
the client a better idea how to recover (for example by splitting up the
compound into smaller compounds).

This is all a bit academic since we've never actually seen a reason for
clients to send such long compounds, but we may as well fix it.

While we're there, just use NFSD4_MAX_OPS_PER_COMPOUND == 16, the
constant we already use in the 4.1 case, instead of hard-coding 100.
Chances anyone actually uses even 16 ops per compound are small enough
that I think there's a negligible risk of any regression.

This fixes pynfs test COMP6.
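
The limit check described above amounts to something like the sketch
below. The function name and the local macro names are hypothetical;
the limit of 16 is the value quoted in this message, and 10018 is the
numeric value the NFSv4 specification assigns to NFS4ERR_RESOURCE.

```c
#define MAX_OPS_PER_COMPOUND	16	/* value quoted in the message above */
#define NFS4_OK			0
#define NFS4ERR_RESOURCE	10018	/* value from the NFSv4 specification */

/* Reject over-long compounds with an NFS-level error the client can act on. */
static int check_compound_op_count(unsigned int opcnt)
{
	if (opcnt > MAX_OPS_PER_COMPOUND)
		return NFS4ERR_RESOURCE;
	return NFS4_OK;
}
```

Returning an NFS status rather than failing at the RPC layer lets the
client retry with smaller compounds.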

Reported-by: "Lu, Xinyu" <luxy.fnst@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 66282ec1 19-Dec-2017 Benjamin Coddington <bcodding@redhat.com>

nfsd4: permit layoutget of executable-only files

Clients must be able to read a file in order to execute it, and for pNFS
that means the client needs to be able to perform a LAYOUTGET on the file.

This behavior for executable-only files was added for OPEN in commit
a043226bc140 "nfsd4: permit read opens of executable-only files".

This fixes up xfstests generic/126 on block/scsi layouts.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 256a89fa 18-Oct-2017 Arnd Bergmann <arnd@arndb.de>

nfsd: avoid gettimeofday for nfssvc_boot time

do_gettimeofday() is deprecated and we should generally use time64_t
based functions instead.

In case of nfsd, all three users of nfssvc_boot only use the initial
time as a unique token, and are not affected by it overflowing, so they
are not affected by the y2038 overflow.

This converts the structure to timespec64 anyway and adds comments
to all uses, to document that we have thought about it and avoid
having to look at it again.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# ec572b9e 29-Sep-2017 Eryu Guan <eguan@redhat.com>

nfsd4: define nfsd4_secinfo_no_name_release()

Commit 34b1744c91cc ("nfsd4: define ->op_release for compound ops")
defined a couple ->op_release functions and run them if necessary.

But the problem is that it reused
nfsd4_secinfo_release() as the op_release of OP_SECINFO_NO_NAME, and
caused a leak on struct nfsd4_secinfo_no_name in
nfsd4_encode_secinfo_no_name(), because there's no .si_exp field in
struct nfsd4_secinfo_no_name.

I found this because I was unable to umount an ext4 partition after
exporting it via NFS and running fsstress on the nfs mount. A simplified
reproducer would be:

# mount a local-fs device at /mnt/test, and export it via NFS with
# fsid=0 export option (this is required)
mount /dev/sda5 /mnt/test
echo "/mnt/test *(rw,no_root_squash,fsid=0)" >> /etc/exports
service nfs restart

# locally mount the nfs export with all default, note that I have
# nfsv4.1 configured as the default nfs version, because of the
# fsid export option, v4 mount would fail and fall back to v3
mount localhost:/mnt/test /mnt/nfs

# try to umount the underlying device, but got EBUSY
umount /mnt/nfs
service nfs stop
umount /mnt/test <=== EBUSY here

Fixed it by defining a separate nfsd4_secinfo_no_name_release()
function as the op_release method of OP_SECINFO_NO_NAME that
releases the correct nfsd4_secinfo_no_name structure.

Fixes: 34b1744c91cc ("nfsd4: define ->op_release for compound ops")
Signed-off-by: Eryu Guan <eguan@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 95424460 15-Sep-2017 J. Bruce Fields <bfields@redhat.com>

nfsd: remove unnecessary nofilehandle checks

These checks should have already been done centrally in
nfsd4_proc_compound, the checks in each individual operation are
unnecessary.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# b7571e4c 06-May-2017 J. Bruce Fields <bfields@redhat.com>

nfsd4: skip encoder in trivial error cases

Most encoders do nothing in the error case. But they can still screw
things up in that case: most errors happen very early in rpc processing,
possibly before argument fields are filled in and bounds-tested, so
encoders that do anything other than immediately bail on error can
easily crash in odd error cases.

So just handle errors centrally most of the time to remove the chance of
error.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 34b1744c 05-May-2017 J. Bruce Fields <bfields@redhat.com>

nfsd4: define ->op_release for compound ops

Run a separate ->op_release function if necessary instead of depending
on the xdr encoder to do this.
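
A minimal sketch of the idea, not the kernel code: each operation
descriptor carries an optional release hook that the compound loop calls
itself, so cleanup no longer depends on the encoder running. All names
below (struct op_state, struct op_desc, run_op) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-operation state; the kernel uses union nfsd4_op_u. */
struct op_state {
    char *buf;      /* resource acquired by the operation */
    int released;   /* set once the release hook has run */
};

/* Hypothetical operation descriptor with an optional release hook. */
struct op_desc {
    int  (*op_func)(struct op_state *st);
    void (*op_release)(struct op_state *st);   /* may be NULL */
};

static int read_func(struct op_state *st)
{
    st->buf = malloc(16);
    return st->buf ? 0 : -1;
}

static void read_release(struct op_state *st)
{
    free(st->buf);
    st->buf = NULL;
    st->released = 1;
}

static int noop_func(struct op_state *st)
{
    (void)st;
    return 0;
}

static const struct op_desc read_desc = {
    .op_func = read_func, .op_release = read_release,
};
static const struct op_desc noop_desc = {
    .op_func = noop_func, .op_release = NULL,
};

/* Run one operation and release its resources even if encoding is skipped. */
static int run_op(const struct op_desc *desc, struct op_state *st,
                  int skip_encode)
{
    int status;

    assert(desc != NULL && desc->op_func != NULL);
    status = desc->op_func(st);
    if (!skip_encode) {
        /* the reply would be encoded here */
    }
    if (desc->op_release)
        desc->op_release(st);
    return status;
}
```

Calling run_op(&read_desc, &st, 1) frees the buffer even though the
encode step is skipped, which is the property the separate hook is meant
to provide.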

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# f4f9ef4a 06-Jul-2017 J. Bruce Fields <bfields@redhat.com>

nfsd4: opdesc will be useful outside nfs4proc.c

Trivial cleanup, no change in behavior.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 0020939f 06-May-2017 J. Bruce Fields <bfields@redhat.com>

nfsd4: move some nfsd4 op definitions to xdr4.h

I want code in nfs4xdr.c to have access to this stuff.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 800222f8 08-May-2017 Christoph Hellwig <hch@lst.de>

nfsd4: const-ify nfsd4_ops

nfsd4_ops contains function pointers, and marking it as constant avoids
it being able to be used as an attack vector for code injections.

Signed-off-by: Christoph Hellwig <hch@lst.de>


# aa8217d5 12-May-2017 Christoph Hellwig <hch@lst.de>

sunrpc: mark all struct svc_version instances as const

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>


# b9c744c1 12-May-2017 Christoph Hellwig <hch@lst.de>

sunrpc: mark all struct svc_procinfo instances as const

struct svc_procinfo contains function pointers, and marking it as
constant avoids it being able to be used as an attack vector for
code injections.

Signed-off-by: Christoph Hellwig <hch@lst.de>


# 0becc118 08-May-2017 Christoph Hellwig <hch@lst.de>

sunrpc: move pc_count out of struct svc_procinfo

pc_count is the only writable member of struct svc_procinfo, which is
a good candidate to be const-ified as it contains function pointers.

This patch moves it out of struct svc_procinfo, and into a
separate writable array that is pointed to by struct svc_version.
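
The layout described above can be sketched as follows. This is an
illustration under assumed names (struct procinfo, struct version,
dispatch), not the sunrpc definitions: the handler table is const, and
the only data written at dispatch time is a parallel counter array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct svc_procinfo: never written at run time. */
struct procinfo {
    int (*pc_func)(int arg);
    const char *pc_name;
};

/* Hypothetical stand-in for struct svc_version. */
struct version {
    unsigned int nproc;
    const struct procinfo *procs;   /* const table of handlers */
    unsigned long *counts;          /* separate writable call counters */
};

static int proc_null(int arg) { (void)arg; return 0; }
static int proc_echo(int arg) { return arg; }

static const struct procinfo procs_v1[] = {
    { .pc_func = proc_null, .pc_name = "NULL" },
    { .pc_func = proc_echo, .pc_name = "ECHO" },
};
static unsigned long counts_v1[2];

static const struct version version1 = {
    .nproc = 2, .procs = procs_v1, .counts = counts_v1,
};

/* Count the call in the writable array, then invoke the const handler. */
static int dispatch(const struct version *v, unsigned int proc, int arg)
{
    assert(proc < v->nproc);
    v->counts[proc]++;
    return v->procs[proc].pc_func(arg);
}
```

Because procs_v1 is const, a toolchain may place the function pointers
in read-only memory, while counts_v1 stays in ordinary writable data.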

Signed-off-by: Christoph Hellwig <hch@lst.de>


# 72edc37a 08-May-2017 Christoph Hellwig <hch@lst.de>

nfsd4: properly type op_func callbacks

Pass union nfsd4_op_u to the op_func callbacks instead of using unsafe
function pointer casts.

It also adds two missing structures to struct nfsd4_op.u to facilitate
this.
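
As a rough illustration of the approach (with made-up argument structs
and a hypothetical union op_u standing in for union nfsd4_op_u), every
handler takes the same union pointer and picks out its own member, so
the dispatch table needs no function pointer casts:

```c
#include <assert.h>

/* Made-up per-operation argument structures. */
struct getattr_args { unsigned int bitmask; };
struct read_args    { unsigned long offset; unsigned int length; };

/* Hypothetical named union, similar in spirit to union nfsd4_op_u. */
union op_u {
    struct getattr_args getattr;
    struct read_args    read;
};

/* One prototype shared by every handler. */
typedef int (*op_func_t)(union op_u *u);

static int do_getattr(union op_u *u)
{
    struct getattr_args *args = &u->getattr;

    return args->bitmask != 0 ? 0 : -1;
}

static int do_read(union op_u *u)
{
    struct read_args *args = &u->read;

    return (int)args->length;
}

/* The table holds correctly typed pointers; no casts are involved. */
static const op_func_t op_table[] = { do_getattr, do_read };

static int call_op(unsigned int opnum, union op_u *u)
{
    assert(opnum < sizeof(op_table) / sizeof(op_table[0]));
    return op_table[opnum](u);
}
```

The compiler can now check each handler against op_func_t, which it
could not do when the pointers were cast to a generic type.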

Signed-off-by: Christoph Hellwig <hch@lst.de>


# 62bbf8bb 08-May-2017 Christoph Hellwig <hch@lst.de>

nfsd4: remove nfsd4op_rsize

Except for a lot of unnecessary casts this typedef only has one user,
so remove the casts and expand it in struct nfsd4_operation.

Signed-off-by: Christoph Hellwig <hch@lst.de>


# c2a1102a 08-May-2017 Christoph Hellwig <hch@lst.de>

nfsd4: properly type op_get_currentstateid callbacks

Pass union nfsd4_op_u to the op_get_currentstateid callbacks instead of
using unsafe function pointer casts.

Signed-off-by: Christoph Hellwig <hch@lst.de>


# 6c9600a7 08-May-2017 Christoph Hellwig <hch@lst.de>

nfsd4: properly type op_set_currentstateid callbacks

Give the args union in struct nfsd4_op a name, and pass it to the
op_set_currentstateid callbacks instead of using unsafe function
pointer casts.

Signed-off-by: Christoph Hellwig <hch@lst.de>


# d16d1867 08-May-2017 Christoph Hellwig <hch@lst.de>

sunrpc: properly type pc_encode callbacks

Drop the resp argument as it can trivially be derived from the rqstp
argument. With that all functions now have the same prototype, and we
can remove the unsafe casting to kxdrproc_t.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>


# cc6acc20 08-May-2017 Christoph Hellwig <hch@lst.de>

sunrpc: properly type pc_decode callbacks

Drop the argp argument as it can trivially be derived from the rqstp
argument. With that all functions now have the same prototype, and we
can remove the unsafe casting to kxdrproc_t.

Signed-off-by: Christoph Hellwig <hch@lst.de>


# 1c8a5409 08-May-2017 Christoph Hellwig <hch@lst.de>

sunrpc: properly type pc_func callbacks

Drop the argp and resp arguments as they can trivially be derived from
the rqstp argument. With that all functions now have the same prototype,
and we can remove the unsafe casting to svc_procfunc as well as the
svc_procfunc typedef itself.

Signed-off-by: Christoph Hellwig <hch@lst.de>


# ec7e8cae 08-May-2017 Christoph Hellwig <hch@lst.de>

nfsd: use named initializers in PROC()

Signed-off-by: Christoph Hellwig <hch@lst.de>


# 9a307403 22-May-2017 J. Bruce Fields <bfields@redhat.com>

nfsd4: fix null dereference on replay

If we receive a compound such that:

- the sessionid, slot, and sequence number in the SEQUENCE op
match a cached successful reply with N ops, and
- the Nth operation of the compound is a PUTFH, PUTPUBFH,
PUTROOTFH, or RESTOREFH,

then nfsd4_sequence will return 0 and set cstate->status to
nfserr_replay_cache. The current filehandle will not be set. This will
cause us to call check_nfsd_access with first argument NULL.

To nfsd4_compound it looks like we just successfully executed an
operation that set a filehandle, but the current filehandle is not set.

Fix this by moving the nfserr_replay_cache check earlier. There was never
any reason to have it after the encode_op label, since the only case
where we hit that is when opdesc->op_func sets it.

Note that there are two ways we could hit this case:

- a client is resending a previously sent compound that ended
with one of the four PUTFH-like operations, or
- a client is sending a *new* compound that (incorrectly) shares
sessionid, slot, and sequence number with a previously sent
compound, and the length of the previously sent compound
happens to match the position of a PUTFH-like operation in the
new compound.

The second is obviously incorrect client behavior. The first is also
very strange--the only purpose of a PUTFH-like operation is to set the
current filehandle to be used by the following operation, so there's no
point in having it as the last in a compound.

So it's likely this requires a buggy or malicious client to reproduce.

Reported-by: Scott Mayhew <smayhew@redhat.com>
Cc: stable@kernel.vger.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# bb2a8b0c 08-May-2017 Christoph Hellwig <hch@lst.de>

nfsd4: const-ify nfsd4_ops

nfsd4_ops contains function pointers, and marking it as constant avoids
it being able to be used as an attack vector for code injections.

Signed-off-by: Christoph Hellwig <hch@lst.de>


# e9679189 12-May-2017 Christoph Hellwig <hch@lst.de>

sunrpc: mark all struct svc_version instances as const

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>


# 860bda29 12-May-2017 Christoph Hellwig <hch@lst.de>

sunrpc: mark all struct svc_procinfo instances as const

struct svc_procinfo contains function pointers, and marking it as
constant avoids it being able to be used as an attack vector for
code injections.

Signed-off-by: Christoph Hellwig <hch@lst.de>


# 7fd38af9 08-May-2017 Christoph Hellwig <hch@lst.de>

sunrpc: move pc_count out of struct svc_procinfo

pc_count is the only writable member of struct svc_procinfo, which is
a good candidate to be const-ified as it contains function pointers.

This patch moves it out of struct svc_procinfo, and into a
separate writable array that is pointed to by struct svc_version.

Signed-off-by: Christoph Hellwig <hch@lst.de>


# eb69853d 08-May-2017 Christoph Hellwig <hch@lst.de>

nfsd4: properly type op_func callbacks

Pass union nfsd4_op_u to the op_func callbacks instead of using unsafe
function pointer casts.

It also adds two missing structures to struct nfsd4_op.u to facilitate
this.

Signed-off-by: Christoph Hellwig <hch@lst.de>


# 1c122638 08-May-2017 Christoph Hellwig <hch@lst.de>

nfsd4: remove nfsd4op_rsize

Except for a lot of unnecessary casts this typedef only has one user,
so remove the casts and expand it in struct nfsd4_operation.

Signed-off-by: Christoph Hellwig <hch@lst.de>


# 57832e7b 08-May-2017 Christoph Hellwig <hch@lst.de>

nfsd4: properly type op_get_currentstateid callbacks

Pass union nfsd4_op_u to the op_get_currentstateid callbacks instead of
using unsafe function pointer casts.

Signed-off-by: Christoph Hellwig <hch@lst.de>


# b60e9859 08-May-2017 Christoph Hellwig <hch@lst.de>

nfsd4: properly type op_set_currentstateid callbacks

Give the args union in struct nfsd4_op a name, and pass it to the
op_set_currentstateid callbacks instead of using unsafe function
pointer casts.

Signed-off-by: Christoph Hellwig <hch@lst.de>


# 63f8de37 08-May-2017 Christoph Hellwig <hch@lst.de>

sunrpc: properly type pc_encode callbacks

Drop the resp argument as it can trivially be derived from the rqstp
argument. With that all functions now have the same prototype, and we
can remove the unsafe casting to kxdrproc_t.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>


# 026fec7e 08-May-2017 Christoph Hellwig <hch@lst.de>

sunrpc: properly type pc_decode callbacks

Drop the argp argument as it can trivially be derived from the rqstp
argument. With that all functions now have the same prototype, and we
can remove the unsafe casting to kxdrproc_t.

Signed-off-by: Christoph Hellwig <hch@lst.de>


# a6beb732 08-May-2017 Christoph Hellwig <hch@lst.de>

sunrpc: properly type pc_func callbacks

Drop the argp and resp arguments as they can trivially be derived from
the rqstp argument. With that all functions now have the same prototype,
and we can remove the unsafe casting to svc_procfunc as well as the
svc_procfunc typedef itself.

Signed-off-by: Christoph Hellwig <hch@lst.de>


# f7235b6b 08-May-2017 Christoph Hellwig <hch@lst.de>

nfsd: use named initializers in PROC()

Signed-off-by: Christoph Hellwig <hch@lst.de>


# b550a32e 05-May-2017 Ari Kauppi <ari@synopsys.com>

nfsd: fix undefined behavior in nfsd4_layout_verify

UBSAN: Undefined behaviour in fs/nfsd/nfs4proc.c:1262:34
shift exponent 128 is too large for 32-bit type 'int'

Depending on compiler+architecture, this may cause the check for
layout_type to succeed for overly large values (which seems to be the
case with amd64). The large value will be later used in de-referencing
nfsd4_layout_ops for function pointers.
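
A sketch of the kind of check involved, assuming a 32-bit mask of
supported layout types. EXAMPLE_LAYOUT_TYPE_MAX and
layout_type_supported() are illustrative names and values, not the
kernel's; the point is only that the range check must come before the
shift.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed value for illustration; the kernel has its own LAYOUT_TYPE_MAX. */
#define EXAMPLE_LAYOUT_TYPE_MAX 5u

static_assert(EXAMPLE_LAYOUT_TYPE_MAX <= 32,
              "every valid layout type must be a legal 32-bit shift count");

/*
 * Return true only if layout_type is in range and its bit is set in
 * supported_mask. Rejecting out-of-range values first keeps the shift
 * count below the width of the shifted type, so the shift is defined.
 */
static bool layout_type_supported(uint32_t supported_mask,
                                  uint32_t layout_type)
{
    if (layout_type >= EXAMPLE_LAYOUT_TYPE_MAX)
        return false;
    return (supported_mask & (UINT32_C(1) << layout_type)) != 0;
}
```

With this ordering a client-supplied value such as 128 is rejected
without ever being used as a shift exponent or as a table index.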

Reported-by: Jani Tuovila <tuovila@synopsys.com>
Signed-off-by: Ari Kauppi <ari@synopsys.com>
[colin.king@canonical.com: use LAYOUT_TYPE_MAX instead of 32]
Cc: stable@vger.kernel.org
Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 05b7278d 23-Mar-2017 Olga Kornievskaia <aglo@umich.edu>

nfsd: fix oops on unsupported operation

I'm hitting the BUG in nfsd4_max_reply() at fs/nfsd/nfs4proc.c:2495 when
the client sends an operation the server doesn't support.

nfsd4_max_reply() checks for a NULL rsize_bop, but an unsupported
operation wouldn't have that set.

Cc: Kinglong Mee <kinglongmee@gmail.com>
Fixes: 2282cd2c05e2 "NFSD: Get response size before operation..."
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 5283b03e 24-Feb-2017 Jeff Layton <jlayton@kernel.org>

nfs/nfsd/sunrpc: enforce transport requirements for NFSv4

NFSv4 requires a transport "that is specified to avoid network
congestion" (RFC 7530, section 3.1, paragraph 2). In practical terms,
that means that you should not run NFSv4 over UDP. The server has never
enforced that requirement, however.

This patchset fixes this by adding a new flag to the svc_version that
states that it has these transport requirements. With that, we can check
that the transport has XPT_CONG_CTRL set before processing an RPC. If it
doesn't we reject it with RPC_PROG_MISMATCH.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 05a45a2d 24-Feb-2017 Jeff Layton <jlayton@kernel.org>

sunrpc: turn bitfield flags in svc_version into bools

It's just simpler to read this way, IMO. Also, no need to explicitly
set vs_hidden to false in the nfsacl ones.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 2282cd2c 03-Feb-2017 Kinglong Mee <kinglongmee@gmail.com>

NFSD: Get response size before operation for all RPCs

NFSD uses PAGE_SIZE as the reply size estimate for RPCs which don't
support op_rsize_bop(). PAGE_SIZE (4096) is larger than many real
response sizes, e.g. access (op_encode_hdr_size + 2) and seek
(op_encode_hdr_size + 3).

This patch adds op_rsize_bop() for all RPCs so that each one reports
its own response size estimate.

An overestimate is generally safe but the tighter estimates are probably
better.
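
To make the difference concrete, here is a small sketch with assumed
numbers. EX_HDR_WORDS is a made-up header size in 32-bit XDR words, and
rsize_fn and max_reply() are hypothetical; the fallback branch models
the old PAGE_SIZE behaviour for operations without an estimator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_PAGE_SIZE 4096u
#define EX_HDR_WORDS 2u   /* assumed reply header size, in 32-bit words */

static_assert(sizeof(uint32_t) == 4, "XDR words are four bytes");

/* Hypothetical per-operation estimator, returning a size in bytes. */
typedef uint32_t (*rsize_fn)(void);

static uint32_t access_rsize(void)
{
    return (uint32_t)((EX_HDR_WORDS + 2) * sizeof(uint32_t));
}

static uint32_t seek_rsize(void)
{
    return (uint32_t)((EX_HDR_WORDS + 3) * sizeof(uint32_t));
}

/* Old behaviour: fall back to a whole page when no estimator exists. */
static uint32_t max_reply(rsize_fn fn)
{
    return fn ? fn() : EX_PAGE_SIZE;
}
```

Under these assumptions the per-operation estimates are a few dozen
bytes, against 4096 for the fallback.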

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 32ddd944 02-Jan-2017 J. Bruce Fields <bfields@redhat.com>

nfsd: opt in to labeled nfs per export

Currently turning on NFSv4.2 results in 4.2 clients suddenly seeing the
individual file labels as they're set on the server. This is not what
they've previously seen, and not appropriate in many cases. (In
particular, if clients have heterogeneous security policies then one
client's labels may not even make sense to another.) Labeled NFS should
be opted in only in those cases when the administrator knows it makes
sense.

It's helpful to be able to turn 4.2 on by default, and otherwise the
protocol upgrade seems free of regressions. So, default labeled NFS to
off and provide an export flag to reenable it.

Users wanting labeled NFS support on an export will henceforth need to:

- make sure 4.2 support is enabled on client and server (as
before),
- upgrade the server nfs-utils to a version supporting the new
"security_label" export flag, and
- set that "security_label" flag on the export.

This commit may be seen as a regression to anyone currently depending
on security labels. We believe those cases are currently rare.

Reported-by: tibbs@math.uh.edu
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 54bbb7d2 31-Dec-2016 Kinglong Mee <kinglongmee@gmail.com>

NFSD: pass an integer for stable type to nfsd_vfs_write

After fae5096ad217 "nfsd: assume writeable exportabled filesystems have
f_sync" we no longer modify this argument.

This is just cleanup, no change in functionality.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 47057abd 12-Jan-2016 Andreas Gruenbacher <agruenba@redhat.com>

nfsd: add support for the umask attribute

Clients can set the umask attribute when creating files to cause the
server to apply it always except when inheriting permissions from the
parent directory. That way, the new files will end up with the same
permissions as files created locally.

See https://tools.ietf.org/html/draft-ietf-nfsv4-umask-02 for more
details.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 916d2d84 18-Oct-2016 J. Bruce Fields <bfields@redhat.com>

nfsd: clean up supported attribute handling

Minor cleanup, no change in behavior.

Provide helpers for some common attribute bitmap operations. Drop some
comments that just echo the code.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 29ae7f9d 07-Sep-2016 Anna Schumaker <Anna.Schumaker@netapp.com>

NFSD: Implement the COPY call

I only implemented the sync version of this call, since it's the
easiest. I can simply call vfs_copy_file_range() and have the vfs do the
right thing for the filesystem being exported.

Signed-off-by: Anna Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# fa08139d 21-Jul-2016 J. Bruce Fields <bfields@redhat.com>

nfsd: drop unnecessary MAY_EXEC check from create

We need an fh_verify to make sure we at least have a dentry, but actual
permission checks happen later.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 7eed34f1 14-Jul-2016 Oleg Drokin <green@linuxhacker.ru>

nfsd: Make creates return EEXIST instead of EACCES

When doing a create (mkdir/mknod) on a name, it's worth
checking whether the name exists first before returning EACCES in case
the directory is not writeable by the user.
This makes return values on the client more consistent
regardless of whether the entry is cached in the local
cache or not.
Another positive side effect is that certain programs only expect
EEXIST in that case, even though POSIX allows any valid
error to be returned.

Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 8a4c3926 10-Jul-2016 Jeff Layton <jlayton@kernel.org>

nfsd: allow nfsd to advertise multiple layout types

If the underlying filesystem supports multiple layout types, then there
is little reason not to advertise that fact to clients and let them
choose what type to use.

Turn the ex_layout_type field into a bitfield. For each supported
layout type, we set a bit in that field. When the client requests a
layout, ensure that the bit for that layout type is set. When the
client requests attributes, send back a list of supported types.
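
The bitfield handling could look roughly like this. The helper names
and EX_LAYOUT_TYPE_MAX are assumptions for illustration, not the nfsd
code: one helper sets a bit per supported type, one checks a requested
type, and one builds the list sent back to the client.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_LAYOUT_TYPE_MAX 5u   /* assumed upper bound on layout types */

/* Mark one layout type as supported in the export's mask. */
static uint32_t layout_mask_add(uint32_t mask, uint32_t type)
{
    assert(type < EX_LAYOUT_TYPE_MAX);
    return mask | (UINT32_C(1) << type);
}

/* Check a client-requested type; out-of-range values are unsupported. */
static int layout_mask_has(uint32_t mask, uint32_t type)
{
    return type < EX_LAYOUT_TYPE_MAX &&
           (mask & (UINT32_C(1) << type)) != 0;
}

/* Fill out[] with supported types in ascending order; return the count. */
static size_t layout_mask_list(uint32_t mask, uint32_t *out, size_t cap)
{
    size_t n = 0;

    for (uint32_t t = 0; t < EX_LAYOUT_TYPE_MAX && n < cap; t++)
        if (mask & (UINT32_C(1) << t))
            out[n++] = t;
    return n;
}
```

An export that supports types 1 and 4 would carry the mask 0x12, accept
a request for either type, and list both when attributes are requested.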

Signed-off-by: Jeff Layton <jlayton@poochiereds.net>
Reviewed-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# d7c920d1 14-Jun-2016 Tom Haynes <thomas.haynes@primarydata.com>

nfsd: flex file device id encoding will need the server address

Signed-off-by: Tom Haynes <loghyr@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@poochiereds.net>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# ed941643 14-Jun-2016 Andrew Elble <aweits@rit.edu>

nfsd: implement machine credential support for some operations

This addresses the conundrum referenced in RFC5661 18.35.3,
and will allow clients to return state to the server using the
machine credentials.

The biggest part of the problem is that we need to allow the client
to send a compound op with integrity/privacy on mounts that don't
have it enabled.

Add server support for properly decoding and using spo_must_enforce
and spo_must_allow bits. Add support for machine credentials to be
used for CLOSE, OPEN_DOWNGRADE, LOCKU, DELEGRETURN,
and TEST/FREE STATEID.
Implement a check so as to not throw WRONGSEC errors when these
operations are used if integrity/privacy isn't turned on.

Without this, Linux clients with credentials that expired while holding
delegations were getting stuck in an endless loop.

Signed-off-by: Andrew Elble <aweits@rit.edu>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# f99d4fbd 04-Mar-2016 Christoph Hellwig <hch@lst.de>

nfsd: add SCSI layout support

This is a simple extension to the block layout driver to use SCSI
persistent reservations for access control and fencing, as well as
SCSI VPD pages for device identification.

For this we need to pass the nfs4_client to the proc_getdeviceinfo method
to generate the reservation key, and add a new fence_client method
to allow for fence actions in the layout driver.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 2f6fc056 02-Mar-2016 J. Bruce Fields <bfields@redhat.com>

nfsd: fix deadlock secinfo+readdir compound

nfsd_lookup_dentry exits with the parent filehandle locked. fh_put also
unlocks if necessary (nfsd filehandle locking is probably too lenient),
so it gets unlocked eventually, but if the following op in the compound
needs to lock it again, we can deadlock.

A fuzzer ran into this; normal clients don't send a secinfo followed by
a readdir in the same compound.

Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 0f1738a1 02-Mar-2016 J. Bruce Fields <bfields@redhat.com>

nfsd4: resfh unused in nfsd4_secinfo

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 5955102c 22-Jan-2016 Al Viro <viro@zeniv.linux.org.uk>

wrappers for ->i_mutex access

parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
inode_foo(inode) being mutex_foo(&inode->i_mutex).

Please, use those for access to ->i_mutex; over the coming cycle
->i_mutex will become rwsem, with ->lookup() done with it held
only shared.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# ffa0160a 02-Dec-2015 Christoph Hellwig <hch@lst.de>

nfsd: implement the NFSv4.2 CLONE operation

This is basically a remote version of the btrfs CLONE operation,
so the implementation is fairly trivial. Made even more trivial
by stealing the XDR code and general framework from Anna Schumaker's
COPY prototype.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# aa0d6aed 02-Dec-2015 Anna Schumaker <Anna.Schumaker@netapp.com>

nfsd: Pass filehandle to nfs4_preprocess_stateid_op()

This will be needed so COPY can look up the saved_fh in addition to the
current_fh.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# cc8a5532 17-Sep-2015 Jeff Layton <jlayton@kernel.org>

nfsd: serialize layout stateid morphing operations

In order to allow the client to make a sane determination of what
happened with racing LAYOUTGET/LAYOUTRETURN/CB_LAYOUTRECALL calls, we
must ensure that the seqids returned accurately represent the order of
operations. The simplest way to do that is to ensure that operations on
a single stateid are serialized.

This patch adds a mutex to the layout stateid, and locks it when
checking the layout stateid's seqid. The mutex is held over the entire
operation and released after the seqid is bumped.

Note that in the case of CB_LAYOUTRECALL we must move the increment of
the seqid and setting into a new cb "prepare" operation. The lease
infrastructure will call the lm_break callback with a spinlock held, so
we can't take the mutex in that codepath.

Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# ead8fb8c 30-Jul-2015 Kinglong Mee <kinglongmee@gmail.com>

NFSD: Set the attributes used to store the verifier for EXCLUSIVE4_1

According to rfc5661 18.16.4,
"If EXCLUSIVE4_1 was used, the client determines the attributes
used for the verifier by comparing attrset with cva_attrs.attrmask;"

So, EXCLUSIVE4_1 also needs those bitmasks used to store the verifier.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# c87fb4a3 05-Aug-2015 J. Bruce Fields <bfields@redhat.com>

lockd: NLM grace period shouldn't block NFSv4 opens

NLM locks don't conflict with NFSv4 share reservations, so we're not
going to learn anything new by waiting for them.

They do conflict with NFSv4 locks and with delegations.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 6cd22668 13-Jul-2015 Kinglong Mee <kinglongmee@gmail.com>

nfsd: Remove unneeded values in nfsd4_open()

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# d8398fc1 06-Jul-2015 Kinglong Mee <kinglongmee@gmail.com>

nfsd: Set lc_size_chg before ops->proc_layoutcommit

After proc_layoutcommit succeeds, i_size_read(inode) is always >= new_size.
Just set lc_size_chg before calling proc_layoutcommit; if proc_layoutcommit
fails, nfsd will ignore lc_size_chg, so there's no harm.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 96bcad50 18-Jun-2015 Christoph Hellwig <hch@lst.de>

nfsd: fput rd_file from XDR encode context

Remove the hack where we fput the read-specific file in generic code.
Instead we can do it in nfsd4_encode_read as that gets called for all
error cases as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# af90f707 18-Jun-2015 Christoph Hellwig <hch@lst.de>

nfsd: take struct file setup fully into nfs4_preprocess_stateid_op

This patch changes nfs4_preprocess_stateid_op so it always returns
a valid struct file if it has been asked for that. For that we
now allocate a temporary struct file for special stateids, and check
permissions if we got the file structure from the stateid. This
ensures that all callers will get their handling of special stateids
right, and avoids code duplication.

There is a little wart in here because the read code needs to know
if we allocated a file structure so that it can copy around the
read-ahead parameters. In the long run we should probably aim to
cache full file structures used with special stateids instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 980608fb 21-Apr-2015 J. Bruce Fields <bfields@redhat.com>

nfsd4: disallow SEEK with special stateids

If the client uses a special stateid then we'll pass a NULL file to
vfs_llseek.

Fixes: 24bab491220f "NFSD: Implement SEEK"
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: stable@vger.kernel.org
Reported-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 5ba4a25a 03-Apr-2015 J. Bruce Fields <bfields@redhat.com>

nfsd4: disallow ALLOCATE with special stateids

vfs_fallocate will hit a NULL dereference if the client tries an
ALLOCATE or DEALLOCATE with a special stateid. Fix that. (We also
depend on the open to have broken any conflicting leases or delegations
for us.)

(If it turns out we need to allow special stateids then we could do a
temporary open here in the special-stateid case, as we do for read and
write. For now I'm assuming it's not necessary.)

Fixes: 95d871f03cae "nfsd: Add ALLOCATE support"
Cc: stable@vger.kernel.org
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 2b0143b5 17-Mar-2015 David Howells <dhowells@redhat.com>

VFS: normal filesystems (and lustre): d_inode() annotations

that's the bulk of filesystem drivers dealing with inodes of their own

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 1ec8c0c4 28-Mar-2015 Kinglong Mee <kinglongmee@gmail.com>

nfsd: Remove duplicate macro define for max sec label length

NFS4_MAXLABELLEN is already defined as the maximum security label
length, so use it directly.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 42297899 23-Mar-2015 Jeff Layton <jlayton@kernel.org>

nfsd: remove unused status arg to nfsd4_cleanup_open_state

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# beaca234 15-Mar-2015 Kinglong Mee <kinglongmee@gmail.com>

NFSD: Use correct reply size calculating function

ALLOCATE/DEALLOCATE reply to the client with only a status value,
so use nfsd4_only_status_rsize to calculate the reply size.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Anna Schumaker <Anna.Schumaker@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# a1420384 15-Mar-2015 Kinglong Mee <kinglongmee@gmail.com>

NFSD: Put exports after nfsd4_layout_verify fail

Fix commit 9cf514ccfa (nfsd: implement pNFS operations).

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 31ef83dc 16-Aug-2014 Christoph Hellwig <hch@lst.de>

nfsd: add trace events

For now just a few simple events to trace the layout stateid lifetime, but
these already were enough to find several bugs in the Linux client layout
stateid handling.

Signed-off-by: Christoph Hellwig <hch@lst.de>


# c5c707f9 22-Sep-2014 Christoph Hellwig <hch@lst.de>

nfsd: implement pNFS layout recalls

Add support to issue layout recalls to clients. For now we only support
full-file recalls to get a simple and stable implementation. This allows
us to embed a nfsd4_callback structure in the layout_state and thus avoid
any memory allocations under spinlocks during a recall. For normal
use cases that do not intend to share a single file between multiple
clients this implementation is fully sufficient.

To ensure layouts are recalled on local filesystem access each layout
state registers a new FL_LAYOUT lease with the kernel file locking code,
which filesystems that support pNFS exports that require recalls need
to break on conflicting access patterns.

The XDR code is based on the old pNFS server implementation by
Andy Adamson, Benny Halevy, Boaz Harrosh, Dean Hildebrand, Fred Isaman,
Marc Eshel, Mike Sager and Ricardo Labiaga.

Signed-off-by: Christoph Hellwig <hch@lst.de>


# 9cf514cc 05-May-2014 Christoph Hellwig <hch@lst.de>

nfsd: implement pNFS operations

Add support for the GETDEVICEINFO, LAYOUTGET, LAYOUTCOMMIT and
LAYOUTRETURN NFSv4.1 operations, as well as backing code to manage
outstanding layouts and devices.

Layout management is very straightforward, with a nfs4_layout_stateid
structure that extends nfs4_stid to manage layout stateids as the
top-level structure. It is linked into the nfs4_file and nfs4_client
structures like the other stateids, and contains a linked list of
layouts that hang off the stateid. The actual layout operations are
implemented in layout drivers that are not part of this commit, but
will be added later.

The worst part of this commit is the management of the pNFS device IDs,
which suffers from a specification that is not sanely implementable due
to the fact that the device-IDs are global and not bound to an export,
and have a small enough size so that we can't store the fsid portion of
a file handle, and must never be reused. As we still need to perform all
export authentication and validation checks on a device ID passed to
GETDEVICEINFO we are caught between a rock and a hard place. To work
around this issue we add a new hash that maps from a 64-bit integer to a
fsid so that we can look up the export to authenticate against it,
a 32-bit integer as a generation that we can bump when changing the device,
and a currently unused 32-bit integer that could be used in the future
to handle more than a single device per export. Entries in this hash
table are never deleted as we can't reuse the ids anyway, and would have
a severe lifetime problem anyway as Linux export structures are temporary
structures that can go away under load.

Parts of the XDR data, structures and marshaling/unmarshaling code, as
well as many concepts are derived from the old pNFS server implementation
from Andy Adamson, Benny Halevy, Dean Hildebrand, Marc Eshel, Fred Isaman,
Mike Sager, Ricardo Labiaga and many others.

Signed-off-by: Christoph Hellwig <hch@lst.de>


# 779fb0f3 19-Nov-2014 Jeff Layton <jlayton@kernel.org>

sunrpc: move rq_splice_ok flag into rq_flags

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 30660e04b 19-Nov-2014 Jeff Layton <jlayton@kernel.org>

sunrpc: move rq_usedeferral flag to rq_flags

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# b0cb9085 07-Nov-2014 Anna Schumaker <Anna.Schumaker@Netapp.com>

nfsd: Add DEALLOCATE support

DEALLOCATE only returns a status value, meaning we can use the noop()
xdr encoder to reply to the client.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 95d871f0 07-Nov-2014 Anna Schumaker <Anna.Schumaker@Netapp.com>

nfsd: Add ALLOCATE support

The ALLOCATE operation is used to preallocate space in a file. I can do
this by using vfs_fallocate() to do the actual preallocation.

ALLOCATE only returns a status indicator, so we don't need to write a
special encode() function.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 51904b08 22-Oct-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: fix crash on unknown operation number

Unknown operation numbers are caught in nfsd4_decode_compound() which
sets op->opnum to OP_ILLEGAL and op->status to nfserr_op_illegal. The
error causes the main loop in nfsd4_proc_compound() to skip most
processing. But nfsd4_proc_compound also peeks ahead at the next
operation in one case and doesn't take similar precautions there.

Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
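
The class of bug described above can be sketched as follows. This is a simplified model, not the kernel code: the table, the flag OP_FLAG_EXAMPLE and the helpers op_desc_for() and next_op_has_flag() are assumptions made for illustration. The point is that the peek at the next operation has to apply the same range check as the main loop before indexing a table by opnum.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative opcode values; OP_ILLEGAL lies far outside the table. */
enum { OP_GETATTR = 9, OP_READ = 25, OP_LAST = 25, OP_ILLEGAL = 10044 };

#define OP_FLAG_EXAMPLE 0x1	/* hypothetical per-op flag */

struct op_desc { int flags; };
struct op { unsigned int opnum; };

static const struct op_desc op_table[OP_LAST + 1] = {
	[OP_GETATTR] = { 0 },
	[OP_READ]    = { OP_FLAG_EXAMPLE },
};

/* Return the descriptor for an op, or NULL if opnum is not a valid index. */
static const struct op_desc *op_desc_for(const struct op *op)
{
	if (op->opnum > OP_LAST)	/* also rejects OP_ILLEGAL */
		return NULL;
	return &op_table[op->opnum];
}

/* Peek at the op after `cur`, applying the same validity check. */
static int next_op_has_flag(const struct op *ops, size_t nops,
			    size_t cur, int flag)
{
	const struct op_desc *d;

	if (cur + 1 >= nops)
		return 0;
	d = op_desc_for(&ops[cur + 1]);
	return d != NULL && (d->flags & flag) != 0;
}
```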


# d1d84c96 21-Aug-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: fix response size estimation for OP_SEQUENCE

We added this new estimator function but forgot to hook it up. The
effect is that NFSv4.1 (and greater) won't do zero-copy reads.

The estimate was also wrong by 8 bytes.

Fixes: ccae70a9ee41 "nfsd4: estimate sequence response size"
Cc: stable@vger.kernel.org
Reported-by: Chuck Lever <chucklever@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 24bab491 26-Sep-2014 Anna Schumaker <Anna.Schumaker@netapp.com>

NFSD: Implement SEEK

This patch adds server support for the NFS v4.2 operation SEEK, which
returns the position of the next hole or data segment in a file.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
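
For readers unfamiliar with hole/data seeking, here is a userspace sketch of the query SEEK exposes, using lseek(2) with SEEK_DATA and SEEK_HOLE. The helper names are invented for the example, and the fallback definitions assume the Linux values of those constants.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Linux values; only used if the libc headers did not expose them. */
#ifndef SEEK_DATA
#define SEEK_DATA 3
#define SEEK_HOLE 4
#endif

/*
 * Hypothetical helper: find the next data segment (want_data != 0)
 * or the next hole (want_data == 0) at or after `offset`.
 * Returns the position, or -1 with errno set.
 */
static off_t next_segment(int fd, off_t offset, int want_data)
{
	return lseek(fd, offset, want_data ? SEEK_DATA : SEEK_HOLE);
}

/*
 * Write `len` bytes to a scratch file and report where its first
 * hole starts. A file without holes has an implicit hole at EOF.
 */
static off_t first_hole_in_scratch_file(size_t len)
{
	FILE *f = tmpfile();
	off_t pos;
	size_t i;

	assert(f != NULL);
	for (i = 0; i < len; i++)
		fputc('x', f);
	fflush(f);
	pos = next_segment(fileno(f), 0, 0);
	fclose(f);
	return pos;
}
```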


# 3234975f 30-Jul-2014 Trond Myklebust <trond.myklebust@primarydata.com>

nfsd: Remove nfs4_lock_state(): nfsd4_open and nfsd4_open_confirm

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 58fb12e6 29-Jul-2014 Jeff Layton <jlayton@kernel.org>

nfsd: Add a mutex to protect the NFSv4.0 open owner replay cache

We don't want to rely on the client_mutex for protection in the case of
NFSv4 open owners. Instead, we add a mutex that will only be taken for
NFSv4.0 state mutating operations, and that will be released once the
entire compound is done.

Also, ensure that nfsd4_cstate_assign_replay/nfsd4_cstate_clear_replay
take a reference to the stateowner when they are using it for NFSv4.0
open and lock replay caching.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# b3fbfe0e 29-Jul-2014 Jeff Layton <jlayton@kernel.org>

nfsd: print status when nfsd4_open fails to open file it just created

It's possible for nfsd to fail opening a file that it has just created.
When that happens, we throw a WARN but it doesn't include any info about
the error code. Print the status code to give us a bit more info.

Our QA group hit some of these warnings under some very heavy stress
testing. My suspicion is that they hit the file-max limit, but it's hard
to know for sure. Go ahead and add a -ENFILE mapping to
nfserr_serverfault to make the error more distinct (and correct).

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 0fe492db 30-Jun-2014 Trond Myklebust <trond.myklebust@primarydata.com>

nfsd: Convert nfs4_check_open_reclaim() to work with lookup_clientid()

lookup_clientid is preferable to find_confirmed_client since it's able
to use the cached client in the compound state.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 1e444f5b 01-Jul-2014 Kinglong Mee <kinglongmee@gmail.com>

NFSD: Remove iattr parameter from nfsd_symlink()

Commit db2e747b1499 (vfs: remove mode parameter from vfs_symlink())
removed the mode parameter from vfs_symlink().
iattr is therefore no longer needed by nfsd_symlink(), so just remove it.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 7fb84306 24-Jun-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: rename cr_linkname->cr_data

The name of a link is currently stored in cr_name and cr_namelen, and
the content in cr_linkname and cr_linklen. That's confusing.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 52ee0433 20-Jun-2014 J. Bruce Fields <bfields@redhat.com>

nfsd: let nfsd_symlink assume null-terminated data

Currently nfsd_symlink has a weird hack to serve callers who don't
null-terminate symlink data: it looks ahead at the next byte to see if
it's zero, and copies it to a new buffer to null-terminate if not.

That means callers don't have to null-terminate, but they *do* have to
ensure that the byte following the end of the data is theirs to read.

That's a bit subtle, and the NFSv4 code actually got this wrong.

So let's just throw out that code and let callers pass null-terminated
strings; we've already fixed them to do that.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# b829e919 19-Jun-2014 J. Bruce Fields <bfields@redhat.com>

nfsd: fix rare symlink decoding bug

An NFS operation that creates a new symlink includes the symlink data,
which is xdr-encoded as a length followed by the data plus 0 to 3 bytes
of zero-padding as required to reach a 4-byte boundary.

The vfs, on the other hand, wants null-terminated data.

The simple way to handle this would be by copying the data into a newly
allocated buffer with space for the final null.

The current nfsd_symlink code tries to be more clever by skipping that
step in the (likely) case where the byte following the string is already
0.

But that assumes that the byte following the string is ours to look at.
In fact, it might be the first byte of a page that we can't read, or of
some object that another task might modify.

Worse, the NFSv4 code tries to fix the problem by actually writing to
that byte.

In the NFSv2/v3 cases this actually appears to be safe:

- nfs3svc_decode_symlinkargs explicitly null-terminates the data
(after first checking its length and copying it to a new
page).
- NFSv2 limits symlinks to 1k. The buffer holding the rpc
request is always at least a page, and the link data (and
previous fields) have maximum lengths that prevent the request
from reaching the end of a page.

In the NFSv4 case the CREATE op is potentially just one part of a long
compound so can end up on the end of a page if you're unlucky.

The minimal fix here is to copy and null-terminate in the NFSv4 case.
The nfsd_symlink() interface here seems too fragile, though. It should
really either do the copy itself every time or just require a
null-terminated string.

Reported-by: Jeff Layton <jlayton@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
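
The "simple way" mentioned above, copying into a fresh buffer with room for the terminator, might look like this in isolation. It is a userspace sketch with invented names (xdr_padded_len(), dup_link_data()), not the code the patch adds.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* XDR pads opaque data with zero bytes up to the next 4-byte boundary. */
static size_t xdr_padded_len(size_t len)
{
	return (len + 3) & ~(size_t)3;
}

/*
 * Copy `len` bytes of link data into a fresh buffer and add the
 * terminating NUL ourselves. Only data[0..len-1] is read, so it
 * does not matter who owns the byte after the string.
 * Returns NULL on allocation failure; the caller frees the result.
 */
static char *dup_link_data(const char *data, size_t len)
{
	char *copy = malloc(len + 1);

	if (!copy)
		return NULL;
	memcpy(copy, data, len);
	copy[len] = '\0';
	return copy;
}
```

Note that a string whose length is already a multiple of four gets no padding at all, so in that case the byte after the data belongs to whatever follows it in the request.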


# 76f47128 19-Jun-2014 J. Bruce Fields <bfields@redhat.com>

nfsd: fix rare symlink decoding bug

An NFS operation that creates a new symlink includes the symlink data,
which is xdr-encoded as a length followed by the data plus 0 to 3 bytes
of zero-padding as required to reach a 4-byte boundary.

The vfs, on the other hand, wants null-terminated data.

The simple way to handle this would be by copying the data into a newly
allocated buffer with space for the final null.

The current nfsd_symlink code tries to be more clever by skipping that
step in the (likely) case where the byte following the string is already
0.

But that assumes that the byte following the string is ours to look at.
In fact, it might be the first byte of a page that we can't read, or of
some object that another task might modify.

Worse, the NFSv4 code tries to fix the problem by actually writing to
that byte.

In the NFSv2/v3 cases this actually appears to be safe:

- nfs3svc_decode_symlinkargs explicitly null-terminates the data
(after first checking its length and copying it to a new
page).
- NFSv2 limits symlinks to 1k. The buffer holding the rpc
request is always at least a page, and the link data (and
previous fields) have maximum lengths that prevent the request
from reaching the end of a page.

In the NFSv4 case the CREATE op is potentially just one part of a long
compound so can end up on the end of a page if you're unlucky.

The minimal fix here is to copy and null-terminate in the NFSv4 case.
The nfsd_symlink() interface here seems too fragile, though. It should
really either do the copy itself every time or just require a
null-terminated string.

Reported-by: Jeff Layton <jlayton@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# f419992c 17-Jun-2014 Jeff Layton <jlayton@kernel.org>

nfsd: add __force to opaque verifier field casts

sparse complains that we're stuffing non-byte-swapped values into
__be32's here. Since they're supposed to be opaque, it doesn't matter
much. Just add __force to make sparse happy.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# bf18f163 10-Jun-2014 Kinglong Mee <kinglongmee@gmail.com>

NFSD: Using exp_get for export getting

Don't use cache_get outside of export.h; use exp_get to get an export.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# f15a5cf9 10-Jun-2014 Kinglong Mee <kinglongmee@gmail.com>

SUNRPC/NFSD: Change to type of bool for rq_usedeferral and rq_splice_ok

rq_usedeferral and rq_splice_ok are only ever set to 0 or 1, so define them as bool.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 3c7aa15d 10-Jun-2014 Kinglong Mee <kinglongmee@gmail.com>

NFSD: Using min/max/min_t/max_t for calculate

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 05638dc7 01-Jun-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: simplify server xdr->next_page use

The rpc code makes available to the NFS server an array of pages to
encode into. The server represents its reply as an xdr buf, with the
head pointing into the first page in that array, the pages ** array
starting just after that, and the tail (if any) sharing any leftover
space in the page used by the head.

While encoding, we use xdr_stream->page_ptr to keep track of which page
we're currently using.

Currently we set xdr_stream->page_ptr to buf->pages, which makes the
head a weird exception to the rule that page_ptr always points to the
page we're currently encoding into. So, instead set it to buf->pages -
1 (the page actually containing the head), and remove the need for a
little unintuitive logic in xdr_get_next_encode_buffer() and
xdr_truncate_encode.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 7025005d 30-May-2014 Jeff Layton <jlayton@kernel.org>

nfsd: remove unneeded zeroing of fields in nfsd4_proc_compound

The memset of resp in svc_process_common should ensure that these are
already zeroed by the time they get here.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# ba5378b6 30-May-2014 Jeff Layton <jlayton@kernel.org>

nfsd: fix setting of NFS4_OO_CONFIRMED in nfsd4_open

In the NFS4_OPEN_CLAIM_PREVIOUS case, we should only mark it confirmed
if the nfs4_check_open_reclaim check succeeds.

In the NFS4_OPEN_CLAIM_DELEG_PREV_FH and NFS4_OPEN_CLAIM_DELEGATE_PREV
cases, I see no point in declaring the openowner confirmed when the
operation is going to fail anyway, and doing so might allow the client
to game things such that it wouldn't need to confirm a subsequent open
with the same owner.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# a5cddc88 12-May-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: better reservation of head space for krb5

RPC_MAX_AUTH_SIZE is scattered around several places. Better to set it
once in the auth code, where this kind of estimate should be made. And
while we're at it we can leave it zero when we're not using krb5i or
krb5p.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# ccae70a9 23-Mar-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: estimate sequence response size

Otherwise a following patch would turn off all 4.1 zero-copy reads.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# b86cef60 22-Mar-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: better estimate of getattr response size

We plan to use this estimate to decide whether or not to allow zero-copy
reads. Currently we're assuming all getattr's are a page, which can be
both too small (ACLs e.g. may be arbitrarily long) and too large (after
an upcoming read patch this will unnecessarily prevent zero copy reads
in any read compound also containing a getattr).

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 561f0ed4 20-Jan-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: allow large readdirs

Currently we limit readdir results to a single page. This can result in
a performance regression compared to NFSv3 when reading large
directories.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 4f0cefbf 11-Mar-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: more precise nfsd4_max_reply

It will turn out to be useful to have a more accurate estimate of reply
size; so, piggyback on the existing op reply-size estimators.

Also move nfsd4_max_reply to nfs4proc.c to get easier access to struct
nfsd4_operation and friends. (Thanks to Christoph Hellwig for pointing
out that simplification.)

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 8c7424cf 09-Mar-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: don't try to encode conflicting owner if low on space

I ran into this corner case in testing: in theory clients can provide
state owners up to 1024 bytes long. In the sessions case there might be
a risk of this pushing us over the DRC slot size.

The conflicting owner isn't really that important, so let's humor a
client that provides a small maxresponsesize_cached by allowing ourselves
to return without the conflicting owner instead of outright failing the
operation.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 2825a7f9 26-Aug-2013 J. Bruce Fields <bfields@redhat.com>

nfsd4: allow encoding across page boundaries

After this we can handle for example getattr of very large ACLs.

Read, readdir, readlink are still special cases with their own limits.

Also we can't handle a new operation starting close to the end of a
page.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# a8095f7e 11-Mar-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: size-checking cleanup

Better variable name, some comments, etc.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# ea8d7720 08-Mar-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: remove redundant encode buffer size checking

Now that all op encoders can handle running out of space, we no longer
need to check the remaining size for every operation; only nonidempotent
operations need that check, and that can be done by
nfsd4_check_resp_size.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# d0a381dd 30-Jan-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: teach encoders to handle reserve_space failures

We've tried to prevent running out of space with COMPOUND_SLACK_SPACE
and special checking in those operations (getattr) whose result can vary
enormously.

However:
- COMPOUND_SLACK_SPACE may be difficult to maintain as we add
more protocol.
- BUG_ON or page faulting on failure seems overly fragile.
- Especially in the 4.1 case, we prefer not to fail compounds
just because the returned result came *close* to session
limits. (Though perfect enforcement here may be difficult.)
- I'd prefer encoding to be uniform for all encoders instead of
having special exceptions for encoders containing, for
example, attributes.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 6ac90391 26-Feb-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: keep xdr buf length updated

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# d3f627c8 26-Feb-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: use xdr_stream throughout compound encoding

Note this makes ADJUST_ARGS useless; we'll remove it in the following
patch.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# ddd1ea56 27-Aug-2013 J. Bruce Fields <bfields@redhat.com>

nfsd4: use xdr_reserve_space in attribute encoding

This is a cosmetic change for now; no change in behavior.

Note we're just depending on xdr_reserve_space to do the bounds checking
for us, we're not really depending on its adjustment of iovec or xdr_buf
lengths yet, as those are fixed up as necessary after the fact by
read-like operations and by nfs4svc_encode_compoundres. However we do
have to update xdr->iov on read-like operations to prevent
xdr_reserve_space from messing with the already-fixed-up length of
the head.

When the attribute encoding fails partway through we have to undo the
length adjustments made so far. We do it manually for now, but later
patches will add an xdr_truncate_encode() helper to handle cases like
this.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 07d1f802 06-Mar-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: fix encoding of out-of-space replies

If nfsd4_check_resp_size() returns an error then we should really be
truncating the reply here, otherwise we may leave extra garbage at the
end of the rpc reply.

Also add a warning to catch any cases where our reply-size estimates may
be wrong in the case of a non-idempotent operation.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 1802a678 21-Jan-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: reserve head space for krb5 integ/priv info

Currently if the nfs-level part of a reply would be too large, we'll
return an error to the client. But if the nfs-level part fits and
leaves no room for krb5p or krb5i stuff, then we just drop the request
entirely.

That's no good. Instead, reserve some slack space at the end of the
buffer and make sure we fail outright if we'd come close.

The slack space here is a massive overestimate of what's required; we
should probably try for a tighter limit at some point.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 2d124dfaa 15-Jan-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: move proc_compound xdr encode init to helper

Mechanical transformation with no change of behavior.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# d5184658 26-Aug-2013 J. Bruce Fields <bfields@redhat.com>

nfsd4: tweak nfsd4_encode_getattr to take xdr_stream

Just change the nfsd4_encode_getattr api. Not changing any code or
adding any new functionality yet.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 4aea24b2 15-Jan-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: embed xdr_stream in nfsd4_compoundres

This is a mechanical transformation with no change in behavior.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# e372ba60 18-May-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: decoding errors can still be cached and require space

Currently a non-idempotent op reply may be cached if it fails in the
proc code but not if it fails at xdr decoding. I doubt there are any
xdr-decoding-time errors that would make this a problem in practice, so
this probably isn't a serious bug.

The space estimates should also take into account space required for
encoding of error returns. Again, not a practical problem, though it
would become one after future patches which will tighten the space
estimates.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# f34e432b 16-May-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: fix write reply size estimate

The write reply also includes count and stable_how.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 622f560e 16-May-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: read size estimate should include padding

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
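
To make the padding point concrete, here is a sketch of reply-size arithmetic based on the XDR layout of the NFSv4 READ and WRITE results. The function names are invented and the kernel's own estimators may be written differently; only the wire layout is taken from the protocol.

```c
#include <assert.h>
#include <stdint.h>

#define XDR_UNIT 4u

/* Round a byte count up to a whole number of 4-byte XDR words. */
static uint32_t xdr_align(uint32_t bytes)
{
	return (bytes + XDR_UNIT - 1) & ~(XDR_UNIT - 1);
}

/* Every op result starts with the op number and a status word. */
static uint32_t op_header_size(void)
{
	return 2 * XDR_UNIT;
}

/* READ result: eof flag, data length, then the data padded to 4 bytes. */
static uint32_t read_reply_size(uint32_t count)
{
	return op_header_size() + 2 * XDR_UNIT + xdr_align(count);
}

/* WRITE result: count, stable_how and an 8-byte write verifier. */
static uint32_t write_reply_size(void)
{
	return op_header_size() + 2 * XDR_UNIT + 8;
}
```

A 5-byte read therefore occupies 8 bytes of data on the wire; an estimate that ignores padding can come out up to 3 bytes short.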


# 5b648699 07-Mar-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: READ, READDIR, etc., are idempotent

OP_MODIFIES_SOMETHING flags operations that we should be careful not to
initiate without being sure we have the buffer space to encode a reply.

None of these ops fall into that category.

We could probably remove a few more, but this isn't a very important
problem at least for ops whose reply size is easy to estimate.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 14bcab1a 18-Apr-2014 Trond Myklebust <trond.myklebust@primarydata.com>

NFSd: Clean up nfs4_preprocess_stateid_op

Move the state locking and file descriptor reference out from the
callers and into nfs4_preprocess_stateid_op() itself.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 2336745e 28-Mar-2014 Kinglong Mee <kinglongmee@gmail.com>

NFSD: Clear wcc data between compound ops

Testing NFSv4.0 with pynfs, I got messages such as
"nfsd: inode locked twice during operation."

When one compound RPC contains two or more ops that lock
the filehandle, the second op triggers the message.

For example, with two SETATTR ops: after the first SETATTR, nfsd does
not call fh_put() to release the current filehandle, so the filehandle
is left unlocked with fh_post_saved = 1.
The second SETATTR finds fh_post_saved = 1 and printks the message.

v2: introduce helper fh_clear_wcc().

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 480efaee 10-Mar-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: fix setclientid encode size

Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 4daeed25 26-Mar-2014 Kinglong Mee <kinglongmee@gmail.com>

NFSD: simplify saved/current fh uses in nfsd4_proc_compound

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 4c69d585 28-Jan-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: session needs room for following op to error out

Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 4335723e 24-Jan-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: fix delegation-unlink/rename race

If a file is unlinked or renamed between the time when we do the local
open and the time when we get the delegation, then we will return to the
client indicating that it holds a delegation even though the file no
longer exists under the name it was open under.

But a client performing an open-by-name, when it is returned a
delegation, must be able to assume that the file is still linked at the
name it was opened under.

So, hold the parent i_mutex for longer to prevent concurrent renames or
unlinks.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# c0e6bee4 27-Jan-2012 J. Bruce Fields <bfields@redhat.com>

nfsd4: delay setting current_fh in open

This is basically a no-op, to simplify a following patch.

Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 4ac7249e 20-Dec-2013 Christoph Hellwig <hch@infradead.org>

nfsd: use get_acl and ->set_acl

Remove the boilerplate code to marshal and unmarshal ACL objects into
xattrs and operate on the posix_acl objects directly. Also move all
the ACL handling code into nfs?acl.c where it belongs.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 41ae6e71 21-Aug-2013 J. Bruce Fields <bfields@redhat.com>

nfsd4: better VERIFY comment

This confuses me every time.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 7e55b59b 30-Dec-2013 Kinglong Mee <kinglongmee@gmail.com>

SUNRPC/NFSD: Support a new option for ignoring the result of svc_register

NFSv4 clients can contact port 2049 directly instead of needing the
portmapper.

Therefore a failure to register to the portmapper when starting an
NFSv4-only server isn't really a problem.

But Gareth Williams reports that an attempt to start an NFSv4-only
server without starting portmap fails:

#rpc.nfsd -N 2 -N 3
rpc.nfsd: writing fd to kernel failed: errno 111 (Connection refused)
rpc.nfsd: unable to set any sockets for nfsd

Add a flag to svc_version to tell the rpc layer it can safely ignore an
rpcbind failure in the NFSv4-only case.

Reported-by: Gareth Williams <gareth@garethwilliams.me.uk>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# a8bb84bc 10-Dec-2013 Kinglong Mee <kinglongmee@gmail.com>

nfsd: calculate the missing length of bitmap in EXCHANGE_ID

Commit 58cd57bfd9db3bc213bf9d6a10920f82095f0114
"nfsd: Fix SP4_MACH_CRED negotiation in EXCHANGE_ID"
missed calculating the length of the bitmaps for spo_must_enforce and spo_must_allow.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 58cd57bf 05-Aug-2013 Weston Andros Adamson <dros@netapp.com>

nfsd: Fix SP4_MACH_CRED negotiation in EXCHANGE_ID

- don't BUG_ON() when not SP4_NONE
- calculate recv and send reserve sizes correctly

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 35f7a14f 08-Jul-2013 J. Bruce Fields <bfields@redhat.com>

nfsd4: fix minorversion support interface

You can turn on or off support for minorversions using e.g.

echo "-4.2" >/proc/fs/nfsd/versions

However, the current implementation is a little wonky. For example, the
above will turn off 4.2 support, but it will also turn *on* 4.1 support.

This didn't matter as long as we only had 2 minorversions, which was
true till very recently.

And do a little cleanup here.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 89f6c336 19-Jun-2013 J. Bruce Fields <bfields@redhat.com>

nfsd4: delegation-based open reclaims should bypass permissions

We saw a v4.0 client's create fail as follows:

- open create succeeds and gets a read delegation
- client attempts to set mode on new file, gets DELAY while
server recalls delegation.
- client attempts a CLAIM_DELEGATE_CUR open using the
delegation, gets error because of new file mode.

This probably can't happen on a recent kernel since we're no longer
giving out delegations on create opens. Nevertheless, it's a
bug--reclaim opens should bypass permission checks.

Reported-by: Steve Dickson <steved@redhat.com>
Reported-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 18032ca0 02-May-2013 David Quigley <dpquigl@davequigley.com>

NFSD: Server implementation of MAC Labeling

Implement labeled NFS on the server: encoding and decoding, and writing
and reading, of file labels.

Enabled with CONFIG_NFSD_V4_SECURITY_LABEL.

Signed-off-by: Matthew N. Dodd <Matthew.Dodd@sparta.com>
Signed-off-by: Miguel Rodel Felipe <Rodel_FM@dsi.a-star.edu.sg>
Signed-off-by: Phua Eu Gene <PHUA_Eu_Gene@dsi.a-star.edu.sg>
Signed-off-by: Khin Mi Mi Aung <Mi_Mi_AUNG@dsi.a-star.edu.sg>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 9f415eb2 03-May-2013 J. Bruce Fields <bfields@redhat.com>

nfsd4: don't allow owner override on 4.1 CLAIM_FH opens

The Linux client is using CLAIM_FH to implement regular opens, not just
recovery cases, so it depends on the server to check permissions
correctly.

Therefore the owner override, which may make sense in the delegation
recovery case, isn't right in the CLAIM_FH case.

Symptoms: on a client with 49f9a0fafd844c32f2abada047c0b9a5ba0d6255
"NFSv4.1: Enable open-by-filehandle", Bryan noticed this:

touch test.txt
chmod 000 test.txt
echo test > test.txt

succeeding.

Cc: stable@kernel.org
Reported-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 2a6cf944 30-Apr-2013 J. Bruce Fields <bfields@redhat.com>

nfsd4: don't remap EISDIR errors in rename

We're going out of our way here to remap an error to make rfc 3530
happy--but the rfc itself (nor rfc 1813, which has similar language)
gives no justification. And disagrees with local filesystem behavior,
with Linux and posix man pages, and knfsd's implemented behavior for v2
and v3.

And the documented behavior seems better, in that it gives a little more
information--you could implement the 3530 behavior using the posix
behavior, but not the other way around.

Also, the Linux client makes no attempt to remap this error in the v4
case, so it can end up just returning EEXIST to the application in a
case where it should return EISDIR.

So honestly I think the rfc's are just buggy here--or in any case it
doesn't seem worth the trouble to remap this error.

Reported-by: Frank S Filz <ffilz@us.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# bbc9c36c 22-Mar-2013 J. Bruce Fields <bfields@redhat.com>

nfsd4: more sessions/open-owner-replay cleanup

More logic that's unnecessary in the 4.1 case.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 3d74e6a5 22-Mar-2013 J. Bruce Fields <bfields@redhat.com>

nfsd4: no need for replay_owner in sessions case

The replay_owner will never be used in the sessions case.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 9411b1d4 01-Apr-2013 J. Bruce Fields <bfields@redhat.com>

nfsd4: cleanup handling of nfsv4.0 closed stateid's

Closed stateid's are kept around a little while to handle close replays
in the 4.0 case. So we stash them in the last-used stateid in the
oo_last_closed_stateid field of the open owner. We can free that in
encode_seqid_op_tail once the seqid on the open owner is next
incremented. But we don't want to do that on the close itself; so we
set the NFS4_OO_PURGE_CLOSE flag on the open owner, skip freeing it the
first time through encode_seqid_op_tail, then when we see that flag set
next time we free it.

This is unnecessarily baroque.

Instead, just move the logic that increments the seqid out of the xdr
code and into the operation code itself.

The justification given for the current placement is that we need to
wait till the last minute to be sure we know whether the status is a
sequence-id-mutating error or not, but examination of the code shows
that can't actually happen.

Reported-by: Yanchuan Nian <ycnian@gmail.com>
Tested-by: Yanchuan Nian <ycnian@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# b022032e 01-Apr-2013 fanchaoting <fanchaoting@cn.fujitsu.com>

nfsd: don't run get_file if nfs4_preprocess_stateid_op return error

We should return the error status directly when nfs4_preprocess_stateid_op
returns an error.

Signed-off-by: fanchaoting <fanchaoting@cn.fujitsu.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 9d313b17 28-Feb-2013 J. Bruce Fields <bfields@redhat.com>

nfsd4: handle seqid-mutating open errors from xdr decoding

If a client sets an owner (or group_owner or acl) attribute on open for
create, and the mapping of that owner to an id fails, then we return
BAD_OWNER. But BAD_OWNER is a seqid-mutating error, so we can't
shortcut the open processing in that case: we have to at least look up the
owner so we can find the seqid to bump.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# b600de7a 28-Feb-2013 J. Bruce Fields <bfields@redhat.com>

nfsd4: remove BUG_ON

This BUG_ON just crashes the thread a little earlier than it would
otherwise--it doesn't seem useful.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 84822d0b 14-Dec-2012 J. Bruce Fields <bfields@redhat.com>

nfsd4: simplify nfsd4_encode_fattr interface slightly

It seems slightly simpler to make nfsd4_encode_fattr rather than its
callers responsible for advancing the write pointer on success.

(Also: the count == 0 check in the verify case looks superfluous.
Running out of buffer space is really the only reason fattr encoding
should fail with eresource.)

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# a1dc6955 17-Dec-2012 J. Bruce Fields <bfields@redhat.com>

nfsd4: free_stateid can use the current stateid

Cc: Tigran Mkrtchyan <kofemann@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 9b3234b9 04-Dec-2012 J. Bruce Fields <bfields@redhat.com>

nfsd4: disable zero-copy on non-final read ops

To ensure ordering of read data with any following operations, turn off
zero copy if the read is not the final operation in the compound.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# b9c0ef85 06-Dec-2012 Stanislav Kinsbursky <skinsbursky@parallels.com>

nfsd: make NFSd service boot time per-net

This is simple: an NFSd service can be started at different times in
different network environments. So, its "boot time" has to be assigned
per net.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 7007c90f 07-Dec-2012 Neil Brown <neilb@suse.de>

nfsd: avoid permission checks on EXCLUSIVE_CREATE replay

With NFSv4, if we create a file then open it we explicitly avoid checking
the permissions on the file during the open because the fact that we
created it ensures we should be allowed to open it (the create and the
open should appear to be a single operation).

However if the reply to an EXCLUSIVE create gets lost and the client
resends the create, the current code will perform the permission check -
because it doesn't realise that it did the open already.

This patch should fix this.

Note that I haven't actually seen this cause a problem. I was just
looking at the code trying to figure out a different EXCLUSIVE open
related issue, and this looked wrong.

(Fix confirmed with pynfs 4.0 test OPEN4--bfields)

Cc: stable@kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>
[bfields: use OWNER_OVERRIDE and update for 4.1]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# ffe1137b 15-Nov-2012 J. Bruce Fields <bfields@redhat.com>

nfsd4: delay filling in write iovec array till after xdr decoding

Our server rejects compounds containing more than one write operation.
It's unclear whether this is really permitted by the spec; with 4.0,
it's possibly OK, with 4.1 (which has clearer limits on compound
parameters), it's probably not OK. No client that we're aware of has
ever done this, but in theory it could be useful.

The source of the limitation: we need an array of iovecs to pass to the
write operation. In the worst case that array of iovecs could have
hundreds of elements (the maximum rwsize divided by the page size), so
it's too big to put on the stack, or in each compound op. So we instead
keep a single such array in the compound argument.

We fill in that array at the time we decode the xdr operation.

But we decode every op in the compound before executing any of them. So
once we've used that array we can't decode another write.

If we instead delay filling in that array till the time we actually
perform the write, we can reuse it.

Another option might be to switch to decoding compound ops one at a
time. I considered doing that, but it has a number of other side
effects, and I'd rather fix just this one problem for now.
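
A minimal userspace sketch of the idea described above, not the actual nfsd
code: decoding records only where the write payload lives, and the shared
vector array is filled just before each write executes, so one array can
serve several writes in a compound. All names (sketch_iovec, sketch_write,
sketch_compound, sketch_fill_write_vec) and the sizes are assumptions made
for illustration.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size for the sketch */
#define SKETCH_MAX_VECS  8	/* assumed rwsize / page size */

struct sketch_iovec { char *base; size_t len; };

/* What decoding keeps per write op: only the payload location. */
struct sketch_write { char *data; size_t len; };

/* One shared vector array per compound, as in the commit message. */
struct sketch_compound { struct sketch_iovec vec[SKETCH_MAX_VECS]; };

/*
 * Fill the shared array for one write, at execution time.
 * Returns the number of vectors used, or -1 if the write is too large.
 */
static int sketch_fill_write_vec(struct sketch_compound *c,
				 const struct sketch_write *w)
{
	char *p = w->data;
	size_t left = w->len;
	int n = 0;

	while (left > 0) {
		size_t chunk = left < SKETCH_PAGE_SIZE ? left : SKETCH_PAGE_SIZE;

		if (n == SKETCH_MAX_VECS)
			return -1;
		c->vec[n].base = p;
		c->vec[n].len = chunk;
		p += chunk;
		left -= chunk;
		n++;
	}
	return n;
}
```

Because the array is only touched when a write runs, a second write in the
same compound simply overwrites the entries left by the first.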

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 3320fef19 14-Nov-2012 Stanislav Kinsbursky <skinsbursky@parallels.com>

nfsd: use service net instead of hard-coded init_net

This patch replaces init_net by SVC_NET(), where possible and also passes
proper context to nested functions where required.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# cb73a9f4 01-Nov-2012 J. Bruce Fields <bfields@redhat.com>

nfsd4: implement backchannel_ctl operation

This operation is mandatory for servers to implement.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# d15c077e 13-Sep-2012 J. Bruce Fields <bfields@redhat.com>

nfsd4: enforce per-client sessions/no-sessions distinction

Something like creating a client with setclientid and then trying to
confirm it with create_session may not crash the server, but I'm not
completely positive of that, and in any case it's obviously bad client
behavior.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 24ff99c6 14-Aug-2012 Bryan Schumaker <bjschuma@netapp.com>

NFSD: Swap the struct nfs4_operation getter and setter

stateid_setter should be matched to op_set_currentstateid, rather than
op_get_currentstateid.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 5ccb0066 25-Jul-2012 Stanislav Kinsbursky <skinsbursky@parallels.com>

LockD: pass actual network namespace to grace period management functions

Passed network namespace replaced hard-coded init_net

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 96f6f985 12-Apr-2012 Al Viro <viro@zeniv.linux.org.uk>

nfsd: fix b0rken error value for setattr on read-only mount

..._want_write() returns -EROFS on failure, _not_ an NFS error value.
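
As an illustration only (this is not the nfsd helper itself), the distinction
can be sketched as a translation step between a negative errno and a protocol
status. The enum and sketch_nfserrno() are hypothetical names; the values 5
and 30 match the NFSv3 IO and ROFS status codes, and byte-order conversion is
deliberately left out.

```c
#include <errno.h>

/* Hypothetical status codes for this sketch (host byte order only). */
enum sketch_nfsstat {
	SKETCH_NFS_OK = 0,
	SKETCH_NFSERR_IO = 5,
	SKETCH_NFSERR_ROFS = 30,
};

/* Translate a negative kernel-style errno into a protocol status. */
static enum sketch_nfsstat sketch_nfserrno(int err)
{
	switch (err) {
	case 0:
		return SKETCH_NFS_OK;
	case -EROFS:
		return SKETCH_NFSERR_ROFS;
	default:
		/* Anything unrecognised is reported as a generic I/O error. */
		return SKETCH_NFSERR_IO;
	}
}
```

Returning the raw negative errno to the client would put a value on the wire
that is not a valid status at all, which is the bug this commit addresses.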

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 9dc4e6c4 09-Apr-2012 J. Bruce Fields <bfields@redhat.com>

nfsd: don't fail unchecked creates of non-special files

Allow a v3 unchecked open of a non-regular file to succeed as if it were a
lookup; typically a client in such a case will want to fall back on a
local open, so succeeding and giving it the filehandle is more useful
than failing with nfserr_exist, which makes it appear that nothing at
all exists by that name.

Similarly for v4, on an open-create, return the same errors we would on
an attempt to open a non-regular file, instead of returning
nfserr_exist.

This fixes a problem found doing a v4 open of a symlink with
O_RDONLY|O_CREAT, which resulted in the current client returning EEXIST.

Thanks also to Trond for analysis.

Cc: stable@kernel.org
Reported-by: Orion Poplawski <orion@cora.nwra.com>
Tested-by: Orion Poplawski <orion@cora.nwra.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# a52d726b 21-Mar-2012 Jeff Layton <jlayton@kernel.org>

nfsd: convert nfs4_client->cl_cb_flags to a generic flags field

We'll need a way to flag the nfs4_client as already being recorded on
stable storage so that we don't continually upcall. Currently, that's
recorded in the cl_firststate field of the client struct. Using an
entire u32 to store a flag is rather wasteful though.

The cl_cb_flags field is only using 2 bits right now, so repurpose that
to a generic flags field. Rename NFSD4_CLIENT_KILL to
NFSD4_CLIENT_CB_KILL to make it evident that it's part of the callback
flags. Add a mask that we can use for existing checks that look to see
whether any flags are set, so that the new flags don't interfere.

Convert all references to cl_firststate to the NFSD4_CLIENT_STABLE flag,
and add a new NFSD4_CLIENT_RECLAIM_COMPLETE flag.
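
A rough sketch of the layout described above, with hypothetical names and bit
positions (the real definitions live in the nfsd headers and may differ): the
callback bits and the new state bits share one word, and a mask keeps the
existing "any callback flag set?" checks from seeing the new bits.

```c
#include <stdbool.h>

/* Hypothetical bit assignments for this sketch. */
#define SKETCH_CLIENT_CB_UPDATE        (1ul << 0)
#define SKETCH_CLIENT_CB_KILL          (1ul << 1)
#define SKETCH_CLIENT_STABLE           (1ul << 2)
#define SKETCH_CLIENT_RECLAIM_COMPLETE (1ul << 3)

/* Mask covering only the callback-related bits. */
#define SKETCH_CLIENT_CB_FLAG_MASK \
	(SKETCH_CLIENT_CB_UPDATE | SKETCH_CLIENT_CB_KILL)

struct sketch_client { unsigned long flags; };

/* Existing-style check: is any callback flag set? New bits are ignored. */
static bool sketch_cb_work_pending(const struct sketch_client *clp)
{
	return (clp->flags & SKETCH_CLIENT_CB_FLAG_MASK) != 0;
}

/*
 * Mark the client as recorded on stable storage.
 * Returns true only the first time, i.e. when an upcall would be needed.
 */
static bool sketch_mark_stable(struct sketch_client *clp)
{
	bool was_set = (clp->flags & SKETCH_CLIENT_STABLE) != 0;

	clp->flags |= SKETCH_CLIENT_STABLE;
	return !was_set;
}
```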

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# ab4684d1 02-Mar-2012 Chuck Lever <chuck.lever@oracle.com>

NFSD: Fix nfs4_verifier memory alignment

Clean up due to code review.

The nfs4_verifier's data field is not guaranteed to be u32-aligned.
Casting an array of chars to a u32 * is considered generally
hazardous.

We can fix most of this by using a __be32 array to generate the
verifier's contents and then byte-copying it into the verifier field.

However, there is one spot where there is a backwards compatibility
constraint: the do_nfsd_create() call expects a verifier which is
32-bit aligned. Fix this spot by forcing the alignment of the create
verifier in the nfsd4_open args structure.

Also, sizeof(nfs4_verifier) is the size of the in-core verifier data
structure, but NFS4_VERIFIER_SIZE is the number of octets in an XDR'd
verifier. The two are not interchangeable, even if they happen to
have the same value.
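
The byte-copy approach can be sketched in plain C as follows. This is an
illustration under assumptions, not the patch itself: sketch_verifier and
sketch_gen_verifier() are hypothetical, uint32_t stands in for __be32, and no
byte-order conversion is done.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_VERIFIER_SIZE 8	/* assumed size of the opaque verifier */

/* The data field is a char array, so it has no 32-bit alignment guarantee. */
struct sketch_verifier { char data[SKETCH_VERIFIER_SIZE]; };

/*
 * Build the verifier in a properly aligned local array, then byte-copy it
 * into the opaque field instead of casting data to a uint32_t pointer.
 */
static void sketch_gen_verifier(struct sketch_verifier *verf,
				uint32_t hi, uint32_t lo)
{
	uint32_t words[2] = { hi, lo };

	memcpy(verf->data, words, sizeof(verf->data));
}
```

Reading the verifier back should go through memcpy() as well, for the same
alignment reason.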

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 8f199b82 20-Mar-2012 Trond Myklebust <Trond.Myklebust@netapp.com>

NFSD: Fix warnings when NFSD_DEBUG is not defined

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 59deeb9e 27-Jan-2012 J. Bruce Fields <bfields@redhat.com>

nfsd4: reduce do_open_lookup() stack usage

I get 320 bytes for struct svc_fh on x86_64, really a little large to be
putting on the stack; kmalloc() instead.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 41fd1e42 27-Jan-2012 J. Bruce Fields <bfields@redhat.com>

nfsd4: delay setting current filehandle till success

Compound processing stops on error, so the current filehandle won't be
used on error. Thus the order here doesn't really matter. It'll be
more convenient to do it later, though.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 2c8bd7e0 16-Feb-2012 Benny Halevy <benny@tonian.com>

nfsd41: split out share_access want and signal flags while decoding

Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 37c593c5 13-Feb-2012 Tigran Mkrtchyan <kofemann@gmail.com>

nfsd41: use current stateid by value

Signed-off-by: Tigran Mkrtchyan <kofemann@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 9428fe1a 13-Feb-2012 Tigran Mkrtchyan <kofemann@gmail.com>

nfsd41: consume current stateid on DELEGRETURN and OPENDOWNGRADE

Signed-off-by: Tigran Mkrtchyan <kofemann@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 1e97b519 13-Feb-2012 Tigran Mkrtchyan <kofemann@gmail.com>

nfsd41: handle current stateid in SETATTR and FREE_STATEID

Signed-off-by: Tigran Mkrtchyan <kofemann@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# d1471053 13-Feb-2012 Tigran Mkrtchyan <kofemann@gmail.com>

nfsd41: mark LOOKUP, LOOKUPP and CREATE to invalidate current stateid

Signed-off-by: Tigran Mkrtchyan <kofemann@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 83071114 13-Feb-2012 Tigran Mkrtchyan <kofemann@gmail.com>

nfsd41: save and restore current stateid with current fh

Signed-off-by: Tigran Mkrtchyan <kofemann@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 80e01cc1 13-Feb-2012 Tigran Mkrtchyan <kofemann@gmail.com>

nfsd41: mark PUTFH, PUTPUBFH and PUTROOTFH to clear current stateid

Signed-off-by: Tigran Mkrtchyan <kofemann@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 30813e27 13-Feb-2012 Tigran Mkrtchyan <kofemann@gmail.com>

nfsd41: consume current stateid on read and write

Signed-off-by: Tigran Mkrtchyan <kofemann@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 62cd4a59 13-Feb-2012 Tigran Mkrtchyan <kofemann@gmail.com>

nfsd41: handle current stateid on lock and locku

Signed-off-by: Tigran Mkrtchyan <kofemann@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 8b70484c 13-Feb-2012 Tigran Mkrtchyan <kofemann@gmail.com>

nfsd41: handle current stateid in open and close

Signed-off-by: Tigran Mkrtchyan <kofemann@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# bad0dcff 22-Nov-2011 Al Viro <viro@zeniv.linux.org.uk>

new helpers: fh_{want,drop}_write()

A bunch of places in nfsd do mnt_{want,drop}_write on vfsmount of
export of given fhandle. Switched to obvious inlined helpers...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 0cf99b91 22-Nov-2011 Mi Jinlong <mijinlong@cn.fujitsu.com>

nfsd41: allow non-reclaim open-by-fh's in 4.1

With NFSv4.0 it was safe to assume that open-by-filehandles were always
reclaims.

With NFSv4.1 there are non-reclaim open-by-filehandle operations, so we
should ensure we're only insisting on reclaims in the
OPEN_CLAIM_PREVIOUS case.

Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 345c2842 20-Oct-2011 Mi Jinlong <mijinlong@cn.fujitsu.com>

nfs41: implement DESTROY_CLIENTID operation

According to rfc5661 18.50, implement DESTROY_CLIENTID operation.

Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 8b289b2c 19-Oct-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: implement new 4.1 open reclaim types

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 856121b2 13-Oct-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: warn on open failure after create

If we create the object and then return failure to the client, we're
left with an unexpected file in the filesystem.

I'm trying to eliminate such cases but not 100% sure I have so an
assertion might be helpful for now.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# d29b20cd 13-Oct-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: clean up open owners on OPEN failure

If process_open1() creates a new open owner, but the open later fails,
the current code will leave the open owner around. It won't be on the
close_lru list, and the client isn't expected to send a CLOSE, so it
will hang around as long as the client does.

Similarly, if process_open1() removes an existing open owner from the
close lru, anticipating that an open owner that previously had no
associated stateid's now will, but the open subsequently fails, then
we'll again be left with the same leak.

Fix both problems.

Reported-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# b6d2f1ca 10-Oct-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: more robust ignoring of WANT bits in OPEN

Mask out the WANT bits right at the start instead of on each use.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# c856694e 20-Sep-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: make op_cacheresult another flag

I'm not sure why I used a new field for this originally.

Also, the differences between some of these flags are a little subtle;
add some comments to explain.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# dad1c067 11-Sep-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: replace oo_confirmed by flag bit

I want at least one more bit here. So, let's haul out the caps lock key
and add a flags field.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 58e7b33a 28-Aug-2011 Mi Jinlong <mijinlong@cn.fujitsu.com>

nfsd41: try to check reply size before operation

To check the size of the reply before calling an operation,
we need to try to get the maximum size of the operation's reply.

v3: using new method as Bruce said,

"we could handle operations in two different ways:

- For operations that actually change something (write, rename,
open, close, ...), do it the way we're doing it now: be
very careful to estimate the size of the response before even
processing the operation.
- For operations that don't change anything (read, getattr, ...)
just go ahead and do the operation. If you realize after the
fact that the response is too large, then return the error at
that point.

So we'd add another flag to op_flags: say, OP_MODIFIES_SOMETHING. And for
operations with OP_MODIFIES_SOMETHING set, we'd do the first thing. For
operations without it set, we'd do the second."
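
The two-way handling quoted above could look roughly like this. It is a
sketch under assumptions: the struct, the status names and sketch_run_op()
are hypothetical, and the "operation" is reduced to a pair of sizes.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_OP_MODIFIES_SOMETHING 0x1u

enum sketch_status { SKETCH_OK, SKETCH_ERR_REP_TOO_BIG };

struct sketch_op {
	unsigned int flags;
	size_t max_reply;	/* worst-case estimate, known up front */
	size_t actual_reply;	/* size the op produces when run */
};

/*
 * Modifying ops are refused before they run if the worst case cannot fit.
 * Other ops run first and fail afterwards if the real reply is too large.
 * *executed reports whether the operation was actually performed.
 */
static enum sketch_status sketch_run_op(const struct sketch_op *op,
					size_t space_left, bool *executed)
{
	*executed = false;
	if ((op->flags & SKETCH_OP_MODIFIES_SOMETHING) &&
	    op->max_reply > space_left)
		return SKETCH_ERR_REP_TOO_BIG;

	*executed = true;
	if (op->actual_reply > space_left)
		return SKETCH_ERR_REP_TOO_BIG;
	return SKETCH_OK;
}
```

The point of the split is that a modifying operation is never performed and
then reported as failed, while read-only operations avoid pessimistic size
estimates.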

Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
[bfields@redhat.com: crash, don't attempt to handle, undefined op_rsize_bop]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# fe0750e5 30-Jul-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: split stateowners into open and lockowners

The stateowner has some fields that only make sense for openowners, and
some that only make sense for lockowners, and I find it a lot clearer if
those are separated out.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 7c13f344 30-Aug-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: drop most stateowner refcounting

Maybe we'll bring it back some day, but we don't have much real use for
it now.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 5ec094c1 30-Aug-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: extend state lock over seqid replay logic

There are currently a couple races in the seqid replay code: a
retransmission could come while we're still encoding the original reply,
or a new seqid-mutating call could come as we're encoding a replay.

So, extend the state lock over the encoding (both encoding of a replayed
reply and caching of the original encoded reply).

I really hate doing this, and previously added the stateowner
reference-counting code to avoid it (which was insufficient)--but I
don't see a less complicated alternative at the moment.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 3e772463 10-Aug-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: stop using nfserr_resource for transitory errors

The server is returning nfserr_resource for both permanent errors and
for errors (like allocation failures) that might be resolved by retrying
later. Save nfserr_resource for the former and use delay/jukebox for
the latter.
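
A small sketch of the split, assuming hypothetical status names and using
-E2BIG purely as a stand-in for "a fixed server limit was exceeded": failures
that may clear up on retry map to a delay status, permanent ones to a
resource status.

```c
#include <errno.h>

/* Hypothetical status names for this sketch. */
enum sketch_status {
	SKETCH_OK,
	SKETCH_ERR_DELAY,	/* transient: client should retry later */
	SKETCH_ERR_RESOURCE,	/* permanent: retrying will not help */
	SKETCH_ERR_OTHER,
};

static enum sketch_status sketch_map_failure(int err)
{
	switch (err) {
	case 0:
		return SKETCH_OK;
	case -ENOMEM:
		/* Allocation failures may succeed on a later attempt. */
		return SKETCH_ERR_DELAY;
	case -E2BIG:
		/* Assumed stand-in for exceeding a fixed server limit. */
		return SKETCH_ERR_RESOURCE;
	default:
		return SKETCH_ERR_OTHER;
	}
}
```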

Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# a043226b 25-Aug-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: permit read opens of executable-only files

A client that wants to execute a file must be able to read it. Read
opens over nfs are therefore implicitly allowed for executable files
even when those files are not readable.

NFSv2/v3 get this right by using a passed-in NFSD_MAY_OWNER_OVERRIDE on
read requests, but NFSv4 has gotten this wrong ever since
dc730e173785e29b297aa605786c94adaffe2544 "nfsd4: fix owner-override on
open", when we realized that the file owner shouldn't override
permissions on non-reclaim NFSv4 opens.

So we can't use NFSD_MAY_OWNER_OVERRIDE to tell nfsd_permission to allow
reads of executable files.

So, do the same thing we do whenever we encounter another weird NFS
permission nit: define yet another NFSD_MAY_* flag.

The industry's future standardization on 128-bit processors will be
motivated primarily by the need for integers with enough bits for all
the NFSD_MAY_* flags.

Reported-by: Leonardo Borda <leonardoborda@gmail.com>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 75c096f7 15-Aug-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: it's OK to return nfserr_symlink

The nfsd4 code has a bunch of special exceptions for error returns which
map nfserr_symlink to other errors.

In fact, the spec makes it clear that nfserr_symlink is to be preferred
over less specific errors where possible.

The patch that introduced it back in 2.6.4 is "kNFSd: correct symlink
related error returns.", which claims that these special exceptions
represent an NFSv4 break from v2/v3 tradition--when in fact the symlink
error was introduced with v4.

I suspect what happened was pynfs tests were written that were overly
faithful to the (known-incomplete) rfc3530 error return lists, and then
code was fixed up mindlessly to make the tests pass.

Delete these unnecessary exceptions.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# aadab6c6 15-Aug-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: return nfserr_symlink on v4 OPEN of non-regular file

Without this, an attempt to open a device special file without first
stat'ing it will fail.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 832023bf 08-Aug-2011 Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>

nfsd4: Remove check for a 32-bit cookie in nfsd4_readdir()

Fan Yong <yong.fan@whamcloud.com> noticed setting
FMODE_32bithash wouldn't work with nfsd v4, as
nfsd4_readdir() checks for 32 bit cookies. However, according to RFC 3530
cookies have a 64 bit type and cookies are also defined as u64 in
'struct nfsd4_readdir'. So remove the test for >32-bit values.

Cc: stable@kernel.org
Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 1091006c 23-Jan-2011 J. Bruce Fields <bfields@redhat.com>

nfsd: turn on reply cache for NFSv4

It's sort of ridiculous that we've never had a working reply cache for
NFSv4.

On the other hand, we may still not: our current reply cache is likely
not very good, especially in the TCP case (which is the only case that
matters for v4). What we really need here is some serious testing.

Anyway, here's a start.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 3e98abff 16-Jul-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: call nfsd4_release_compoundargs from pc_release

This simplifies cleanup a bit.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# ab1350b2 14-Jul-2011 Mi Jinlong <mijinlong@cn.fujitsu.com>

nfsd41: Deny new lock before RECLAIM_COMPLETE done

Until an NFSv4.1 client's RECLAIM_COMPLETE is done, the NFS server should
deny any new locks or opens.

rfc5661:

" Whenever a client establishes a new client ID and before it does
the first non-reclaim operation that obtains a lock, it MUST send a
RECLAIM_COMPLETE with rca_one_fs set to FALSE, even if there are no
locks to reclaim. If non-reclaim locking operations are done before
the RECLAIM_COMPLETE, an NFS4ERR_GRACE error will be returned. "

Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 17456804 13-Jul-2011 Bryan Schumaker <bjschuma@netapp.com>

NFSD: Added TEST_STATEID operation

This operation is used by the client to check the validity of a list of
stateids.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# e1ca12df 13-Jul-2011 Bryan Schumaker <bjschuma@netapp.com>

NFSD: added FREE_STATEID operation

This operation is used by the client to tell the server to free a
stateid.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 094b5d74 16-Jun-2011 Benny Halevy <benny@tonian.com>

NFSD: allow OP_DESTROY_CLIENTID to be only op in COMPOUND

DESTROY_CLIENTID MAY be preceded with a SEQUENCE operation as long as
the client ID derived from the session ID of SEQUENCE is not the same
as the client ID to be destroyed. If the client IDs are the same,
then the server MUST return NFS4ERR_CLIENTID_BUSY.

(that's not implemented yet)

If DESTROY_CLIENTID is not prefixed by SEQUENCE, it MUST be the only
operation in the COMPOUND request (otherwise, the server MUST return
NFS4ERR_NOT_ONLY_OP).
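
The two rules quoted above can be sketched as a single check. This follows
the spec text rather than what this patch implements (the client ID
comparison is noted as not implemented yet); every name below is
hypothetical, and the function assumes it is only called for a compound that
contains DESTROY_CLIENTID.

```c
#include <stddef.h>

enum sketch_op {
	SKETCH_OP_SEQUENCE,
	SKETCH_OP_DESTROY_CLIENTID,
	SKETCH_OP_OTHER,
};

enum sketch_status {
	SKETCH_OK,
	SKETCH_ERR_NOT_ONLY_OP,
	SKETCH_ERR_CLIENTID_BUSY,
};

static enum sketch_status
sketch_check_destroy_clientid(const enum sketch_op *ops, size_t nops,
			      unsigned long session_clid,
			      unsigned long target_clid)
{
	if (nops == 0)
		return SKETCH_OK;

	if (ops[0] == SKETCH_OP_SEQUENCE) {
		/* Prefixed by SEQUENCE: must not destroy the session's own ID. */
		if (session_clid == target_clid)
			return SKETCH_ERR_CLIENTID_BUSY;
		return SKETCH_OK;
	}

	/* No SEQUENCE prefix: DESTROY_CLIENTID must be the only operation. */
	if (nops != 1)
		return SKETCH_ERR_NOT_ONLY_OP;
	return SKETCH_OK;
}
```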

This fixes the error return; before, we returned
NFS4ERR_OP_NOT_IN_SESSION; after this patch, we return NFS4ERR_NOTSUPP.

Signed-off-by: Benny Halevy <benny@tonian.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# ac6721a1 20-Apr-2011 Mi Jinlong <mijinlong@cn.fujitsu.com>

nfsd41: make sure nfs server process OPEN with EXCLUSIVE4_1 correctly

The NFS server uses nfsd_create_v3 to handle EXCLUSIVE4_1 opens, but
that function is not prepared to handle them.

Rename nfsd_create_v3() to do_nfsd_create(), and add handling of
EXCLUSIVE4_1.

Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 68d93184 08-Apr-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: fix wrongsec handling for PUTFH + op cases

When PUTFH is followed by an operation that uses the filehandle, and
when the current client is using a security flavor that is inconsistent
with the given filehandle, we have a choice: we can return WRONGSEC
either when the current filehandle is set using the PUTFH, or when the
filehandle is first used by the following operation.

Follow the recommendations of RFC 5661 in making this choice.

(Our current behavior prevented the client from doing security
negotiation by returning WRONGSEC on PUTFH+SECINFO_NO_NAME.)

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 29a78a3e 09-Apr-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: make fh_verify responsibility of nfsd_lookup_dentry caller

The secinfo caller actually won't want this.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 22b03214 09-Apr-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: introduce OPDESC helper

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 5ece3caf 17-Feb-2011 Mi Jinlong <mijinlong@cn.fujitsu.com>

nfsd41: modify the members value of nfsd4_op_flags

The nfsd4_op_flags value (ALLOWED_WITHOUT_FH | ALLOWED_ON_ABSENT_FS)
equals ALLOWED_AS_FIRST_OP, which is probably not what we want.

OP_PUTROOTFH, with op_flags = ALLOWED_WITHOUT_FH | ALLOWED_ON_ABSENT_FS,
can't appear as the first operation without a SEQUENCE op.

This patch fixes the wrong values of ALLOWED_WITHOUT_FH etc. which
were introduced by f9bb94c4.
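
To illustrate the kind of clash described above, here is a sketch with
assumed numeric values (1, 2, 3 before; distinct bits after). The message
only states that the OR of the first two equalled the third; the exact
numbers and all SKETCH_ names are assumptions.

```c
#include <stdbool.h>

/* Assumed "before" values: the third is not a distinct bit. */
enum sketch_old_flags {
	SKETCH_OLD_ALLOWED_WITHOUT_FH   = 1,
	SKETCH_OLD_ALLOWED_ON_ABSENT_FS = 2,
	SKETCH_OLD_ALLOWED_AS_FIRST_OP  = 3,
};

/* Assumed "after" values: one bit per flag. */
enum sketch_new_flags {
	SKETCH_NEW_ALLOWED_WITHOUT_FH   = 1 << 0,
	SKETCH_NEW_ALLOWED_ON_ABSENT_FS = 1 << 1,
	SKETCH_NEW_ALLOWED_AS_FIRST_OP  = 1 << 2,
};

/* Generic flag test, as a caller checking "may this op come first?" would do. */
static bool sketch_flag_set(unsigned int op_flags, unsigned int flag)
{
	return (op_flags & flag) != 0;
}
```

With the old values, an op carrying only the first two flags also passes the
first-op test; with one bit per flag it does not.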

Cc: stable@kernel.org
Reviewed-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 1d1bc8f2 04-Oct-2010 J. Bruce Fields <bfields@redhat.com>

nfsd4: support BIND_CONN_TO_SESSION

Basic xdr and processing for BIND_CONN_TO_SESSION. This adds a
connection to the list of connections associated with a session.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# da165dd6 02-Jan-2011 J. Bruce Fields <bfields@redhat.com>

nfsd: remove some unnecessary dropit handling

We no longer need a few of these special cases.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 04f4ad16 16-Dec-2010 J. Bruce Fields <bfields@redhat.com>

nfsd4: implement secinfo_no_name

Implementation of this operation is mandatory for NFSv4.1.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 0ff7ab46 16-Dec-2010 J. Bruce Fields <bfields@redhat.com>

nfsd4: move guts of nfsd4_lookupp into helper

We'll reuse this code in secinfo_no_name.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 56560b9a 16-Dec-2010 J. Bruce Fields <bfields@redhat.com>

nfsd4: 4.1 SECINFO should consume filehandle

See the referenced spec language; an attempt by a 4.1 client to use the
current filehandle after a secinfo call should result in a NOFILEHANDLE
error.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 8ff30fa4 12-Aug-2010 NeilBrown <neilb@suse.de>

nfsd: disable deferral for NFSv4

Now that a slight delay in getting a reply to an upcall doesn't
require deferring of requests, disable request deferral for all NFSv4
requests - the concept doesn't really fit with the v4 model.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 4dc6ec00 19-Apr-2010 J. Bruce Fields <bfields@citi.umich.edu>

nfsd4: implement reclaim_complete

This is a mandatory operation. Also, here (not in open) is where we
should be committing the reboot recovery information.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 26c0c75e 24-Apr-2010 J. Bruce Fields <bfields@citi.umich.edu>

nfsd4: fix unlikely race in session replay case

In the replay case, the

renew_client(session->se_client);

happens after we've dropped the sessionid_lock, and without holding a
reference on the session; so there's nothing preventing the session
being freed before we get here.

Thanks to Benny Halevy for catching a bug in an earlier version of this
patch.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Acked-by: Benny Halevy <bhalevy@panasas.com>


# 57716355 20-Apr-2010 J. Bruce Fields <bfields@citi.umich.edu>

nfsd4: complete enforcement of 4.1 op ordering

Enforce the rules about compound op ordering.

Motivated by implementing RECLAIM_COMPLETE, for which the client is
implicit in the current session, so it is important to ensure a
successful SEQUENCE precedes the RECLAIM_COMPLETE.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 5a0e3ad6 24-Mar-2010 Tejun Heo <tj@kernel.org>

include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities to include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the following.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and tries to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
widely available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build tests were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>


# 7663dacd 04-Dec-2009 J. Bruce Fields <bfields@citi.umich.edu>

nfsd: remove pointless paths in file headers

The new .h files have paths at the top that are now out of date. While
we're here, just remove all of those from fs/nfsd; they never served any
purpose.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 9a74af21 03-Dec-2009 Boaz Harrosh <bharrosh@panasas.com>

nfsd: Move private headers to source directory

Lots of include/linux/nfsd/* headers are only used by
nfsd module. Move them to the source directory

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 341eb184 03-Dec-2009 Boaz Harrosh <bharrosh@panasas.com>

nfsd: Source files #include cleanups

Now that the headers are fixed and carry their own weight, all fs/nfsd/
source files can include a minimal set of headers and still compile just
fine.

This patch should improve the compilation speed of the nfsd module.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 57ecb34f 01-Dec-2009 J. Bruce Fields <bfields@citi.umich.edu>

nfsd4: fix share mode permissions

NFSv4 opens may function as locks denying other NFSv4 users the rights
to open a file.

We're requiring a user to have write permissions before they can deny
write. We're *not* requiring a user to have write permissions to deny
read, which is if anything a more drastic denial.

What was intended was to require write permissions for DENY_READ.
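
The intended rule can be sketched in userspace C as below. This is only an
illustration, not the nfsd code: the SHARE_* values follow the NFSv4 OPEN
share constants, while the MAY_* flags and required_access() are hypothetical
names. The sketch models only the DENY_READ rule stated above and leaves
DENY_WRITE unmodeled.

```c
#include <assert.h>

/* NFSv4 share_access / share_deny bits (values as in RFC 3530). */
#define SHARE_ACCESS_READ  0x1u
#define SHARE_ACCESS_WRITE 0x2u
#define SHARE_DENY_READ    0x1u
#define SHARE_DENY_WRITE   0x2u

/* Hypothetical permission flags, for illustration only. */
#define MAY_READ  0x1u
#define MAY_WRITE 0x2u

/* Return the permissions an opener must hold for the requested share mode. */
static unsigned int required_access(unsigned int access, unsigned int deny)
{
	unsigned int need = 0;

	/* Only the two low bits are defined for each field. */
	assert((access & ~0x3u) == 0 && (deny & ~0x3u) == 0);

	if (access & SHARE_ACCESS_READ)
		need |= MAY_READ;
	if (access & SHARE_ACCESS_WRITE)
		need |= MAY_WRITE;
	/* Denying read to everyone else requires write permission. */
	if (deny & SHARE_DENY_READ)
		need |= MAY_WRITE;
	return need;
}
```

With this rule, a read-only open that also asks for DENY_READ needs both
read and write permission, while a plain read-only open needs read alone.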

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 0a3adade 04-Nov-2009 J. Bruce Fields <bfields@citi.umich.edu>

nfsd: make fs/nfsd/vfs.h for common includes

None of this stuff is used outside nfsd, so move it out of the common
linux include directory.

Actually, probably none of the stuff in include/linux/nfsd/nfsd.h really
belongs there, so later we may remove that file entirely.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# a06b1261 31-Aug-2009 Trond Myklebust <Trond.Myklebust@netapp.com>

NFSD: Fix a bug in the NFSv4 'supported attrs' mandatory attribute

The fact that the filesystem doesn't currently list any alternate
locations does _not_ imply that the fs_locations attribute should be
marked as "unsupported".

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# abfabf8c 23-Jul-2009 Andy Adamson <andros@netapp.com>

nfsd41: encode replay sequence from the slot values

The sequence operation is not cached; always encode the sequence operation on
a replay from the slot table and session values. This simplifies the sessions
replay logic in nfsd4_proc_compound.

If this is a replay of a compound that was specified not to be cached, return
NFS4ERR_RETRY_UNCACHED_REP.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# c8647947 23-Jul-2009 Andy Adamson <andros@netapp.com>

nfsd41: rename nfsd4_enc_uncached_replay

This function is only used for SEQUENCE replay.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 49557cc7 23-Jul-2009 Andy Adamson <andros@netapp.com>

nfsd41: Use separate DRC for setclientid

Instead of trying to share the generic 4.1 reply cache code for the
CREATE_SESSION reply cache, it's simpler to handle CREATE_SESSION
separately.

The nfs41 single slot clientid DRC holds the results of create session
processing. CREATE_SESSION can be preceded by a SEQUENCE operation
(an embedded CREATE_SESSION) and the create session single slot cache must be
maintained. nfsd4_replay_cache_entry() and nfsd4_store_cache_entry() do not
implement the replay of an embedded CREATE_SESSION.

The clientid DRC slot does not need the inuse, cachethis or other fields that
the multiple slot session cache uses. Replace the clientid DRC cache struct
nfs4_slot cache with a new nfsd4_clid_slot cache. Save the xdr struct
nfsd4_create_session into the cache at the end of processing, and on a replay,
replace the struct for the replay request with the cached version all while
under the state lock.

nfsd4_proc_compound will handle both the solo and embedded CREATE_SESSION case
via the normal use of encode_operation.

Errors that do not change the create session cache:
A create session NFS4ERR_STALE_CLIENTID error means that a client record
(and associated create session slot) could not be found and therefore can't
be changed. NFSERR_SEQ_MISORDERED errors do not change the slot cache.

All other errors get cached.

Remove the clientid DRC specific check in nfs4svc_encode_compoundres to
put the session only if cstate.session is set which will now always be true.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 9208faf2 06-Jul-2009 Yu Zhiguo <yuzg@cn.fujitsu.com>

NFSv4: ACL in operations 'open' and 'create' should be used

ACL in operations 'open' and 'create' is decoded but never used.
It should be set as the initial ACL for the object according to RFC3530.
If error occurs when setting the ACL, just clear the ACL bit in the
returned attr bitmap.

Signed-off-by: Yu Zhiguo <yuzg@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 0a93a47f 19-May-2009 Yu Zhiguo <yuzg@cn.fujitsu.com>

NFSv4: kill off complicated macro 'PROC'

J. Bruce Fields wrote:
...
> (This is extremely confusing code to track down: note that
> proc->pc_decode is set to nfs4svc_decode_compoundargs() by the PROC()
> macro at the end of fs/nfsd/nfs4proc.c. Which means, for example, that
> grepping for nfs4svc_decode_compoundargs() gets you nowhere. Patches to
> kill off that macro would be welcomed....)

The macro 'PROC' is complicated and obscure; it had better
be killed off in order to make the code clearer.

Signed-off-by: Yu Zhiguo <yuzg@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 3c8e0316 16-May-2009 Yu Zhiguo <yuzg@cn.fujitsu.com>

NFSv4: do exact check about attribute specified

Server should return NFS4ERR_ATTRNOTSUPP if an attribute specified is
not supported in current environment.
Operations CREATE, NVERIFY, OPEN, SETATTR and VERIFY should do this check.

This bug was found while running the newpynfs tests. The tests that failed
are the following:
CR12 NVF7a NVF7b NVF7c NVF7d NVF7f NVF7r NVF7s
OPEN15 VF7a VF7b VF7c VF7d VF7f VF7r VF7s

Add function do_check_fattr() to do exact check:
1, Check attribute specified is supported by the NFSv4 server or not.
2, Check FATTR4_WORD0_ACL & FATTR4_WORD0_FS_LOCATIONS are supported
in current environment or not.
3, Check attribute specified is writable or not.

Steps 1 and 3 were previously done in nfsd4_decode_fattr() but are now
moved to this function.
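
The three checks can be sketched as bitmap subset tests. This is a userspace
illustration under assumptions, not the do_check_fattr() from the patch: the
two-word bitmap size, the result codes, and the names bm_is_subset() and
check_fattr() are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BM_WORDS 2

/* Hypothetical result codes standing in for the NFSv4 status values. */
enum check_result { CHECK_OK, CHECK_ATTRNOTSUPP, CHECK_NOT_WRITABLE };

/* True if every bit set in a is also set in b. */
static bool bm_is_subset(const uint32_t *a, const uint32_t *b)
{
	for (size_t i = 0; i < BM_WORDS; i++)
		if (a[i] & ~b[i])
			return false;
	return true;
}

/*
 * req       - attributes named in the request
 * supported - attributes the server implements at all      (check 1)
 * available - attributes usable in the current environment (check 2)
 * writable  - attributes a client may set, or NULL to skip  (check 3)
 */
static enum check_result check_fattr(const uint32_t *req,
				     const uint32_t *supported,
				     const uint32_t *available,
				     const uint32_t *writable)
{
	assert(req && supported && available);

	if (!bm_is_subset(req, supported))
		return CHECK_ATTRNOTSUPP;
	if (!bm_is_subset(req, available))
		return CHECK_ATTRNOTSUPP;
	if (writable && !bm_is_subset(req, writable))
		return CHECK_NOT_WRITABLE;
	return CHECK_OK;
}
```

Passing NULL for writable corresponds to operations such as VERIFY that only
compare attributes and never set them.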

Signed-off-by: Yu Zhiguo <yuzg@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 79fb54ab 02-Apr-2009 Benny Halevy <bhalevy@panasas.com>

nfsd41: CREATE_EXCLUSIVE4_1

Implement the CREATE_EXCLUSIVE4_1 open mode conforming to
http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion1-26

This mode allows the client to atomically create a file
if it doesn't exist while setting some of its attributes.

It must be implemented if the server supports persistent
reply cache and/or pnfs.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 7e705706 02-Apr-2009 Andy Adamson <andros@netapp.com>

nfsd41: support for 3-word long attribute bitmask

Also, use client minorversion to generate supported attrs

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 95ec28cd 02-Apr-2009 Benny Halevy <bhalevy@panasas.com>

nfsd: dynamically skip encoded fattr bitmap in _nfsd4_verify

_nfsd4_verify currently skips 3 words from the encoded buffer beginning.
With support for 3-word attr bitmaps in nfsd41, nfsd4_encode_fattr
may encode 1, 2, or 3 words, and not always 2 as it used to be, hence
we need to find out where to skip using the encoded bitmap length.
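
A minimal sketch of the dynamic skip, assuming the encoded buffer starts with
a bitmap-length word followed by that many bitmap words and then the
attribute-length word. The helper name skip_fattr_bitmap() and the bounds
check are assumptions for illustration, not the code in the patch.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Given an XDR-encoded fattr4 of nwords 32-bit words, return a pointer to
 * the attribute-length word that follows the bitmap. Returns NULL if the
 * encoded bitmap length does not fit in the buffer.
 */
static const uint32_t *skip_fattr_bitmap(const uint32_t *buf, size_t nwords)
{
	uint32_t bmlen;

	assert(buf != NULL);
	if (nwords < 1)
		return NULL;
	bmlen = ntohl(buf[0]);
	/* Need room for bmlen bitmap words plus the attribute-length word. */
	if (bmlen > nwords - 1 || nwords - 1 - bmlen < 1)
		return NULL;
	return buf + 1 + bmlen;
}
```

For a two-word bitmap this skips 3 words, matching the old fixed offset; for
a three-word bitmap it skips 4.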

Note: This patch may be applied over pre-nfsd41 nfsd.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 8daf220a 02-Apr-2009 Benny Halevy <bhalevy@panasas.com>

nfsd41: control nfsv4.1 svc via /proc/fs/nfsd/versions

Support enabling and disabling nfsv4.1 via /proc/fs/nfsd/versions
by writing the strings "+4.1" or "-4.1" correspondingly.

Use user mode nfs-utils (rpc.nfsd option) to enable.
This will allow us to get rid of CONFIG_NFSD_V4_1
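
The token format ("+4.1", "-4.1") can be illustrated with a small userspace
parser. This is a sketch of the string format only; struct version_req and
parse_version_token() are hypothetical names and do not reflect how nfsd
itself parses the file.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

struct version_req {
	bool enable;         /* '+' enables, '-' disables */
	unsigned int major;
	unsigned int minor;  /* 0 when no ".minor" part was given */
};

/* Parse one token such as "+4.1" or "-3". Returns false on malformed input. */
static bool parse_version_token(const char *tok, struct version_req *out)
{
	char *end;

	assert(tok != NULL && out != NULL);

	if (tok[0] != '+' && tok[0] != '-')
		return false;
	out->enable = (tok[0] == '+');

	if (!isdigit((unsigned char)tok[1]))
		return false;
	out->major = (unsigned int)strtoul(tok + 1, &end, 10);
	out->minor = 0;

	if (*end == '.') {
		if (!isdigit((unsigned char)end[1]))
			return false;
		out->minor = (unsigned int)strtoul(end + 1, &end, 10);
	}
	/* Anything left over after the number(s) makes the token invalid. */
	return *end == '\0';
}
```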

[nfsd41: disable support for minorversion by default]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# d87a8ade 02-Apr-2009 Andy Adamson <andros@netapp.com>

nfsd41: access_valid

For nfs41, the open share flags are used also for
delegation "wants" and "signals". Check that they are valid.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 60adfc50 02-Apr-2009 Andy Adamson <andros@netapp.com>

nfsd41: clientid handling

Extract the clientid from sessionid to set the op_clientid on open.
Verify that the clid for other stateful ops is zero for minorversion != 0
Do all other checks for stateful ops without sessions.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Andy Adamson <andros@netapp.com>
[fixed whitespace indent]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41 remove sl_session from nfsd4_open]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 6668958f 02-Apr-2009 Andy Adamson <andros@netapp.com>

nfsd41: stateid handling

When sessions are used, stateful operation sequenceid and stateid handling
are not used. When sessions are used, on the first open set the seqid to 1,
mark state confirmed and skip seqid processing.

When sessions are used the stateid generation number is ignored when it is zero
whereas without sessions bad_stateid or stale stateid is returned.

Add flags to propagate session use to all stateful ops and down to
check_stateid_generation.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Andy Adamson <andros@netapp.com>
[nfsd4_has_session should return a boolean, not u32]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: pass nfsd4_compoundres * to nfsd4_process_open1]
[nfsd41: calculate HAS_SESSION in nfs4_preprocess_stateid_op]
[nfsd41: calculate HAS_SESSION in nfs4_preprocess_seqid_op]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# dd453dfd 02-Apr-2009 Benny Halevy <bhalevy@panasas.com>

nfsd: pass nfsd4_compound_state* to nfs4_preprocess_{state,seq}id_op

Currently we only use cstate->current_fh,
will also be used by nfsd41 code.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# bf864a31 02-Apr-2009 Andy Adamson <andros@netapp.com>

nfsd41: non-page DRC for solo sequence responses

A session inactivity time compound (lease renewal) or a compound where the
sequence operation has sa_cachethis set to FALSE do not require any pages
to be held in the v4.1 DRC. This is because struct nfsd4_slot is already
caching the session information.

Add logic to the nfs41 server to not cache response pages for solo sequence
responses.

Return nfserr_replay_uncached_rep on the operation following the sequence
operation when sa_cachethis is FALSE.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: use cstate session in nfsd4_replay_cache_entry]
[nfsd41: rename nfsd4_no_page_in_cache]
[nfsd41 rename nfsd4_enc_no_page_replay]
[nfsd41 nfsd4_is_solo_sequence]
[nfsd41 change nfsd4_not_cached return]
Signed-off-by: Andy Adamson <andros@netapp.com>
[changed return type to bool]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41 drop parens in nfsd4_is_solo_sequence call]
Signed-off-by: Andy Adamson <andros@netapp.com>
[changed "== 0" to "!"]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# da3846a2 02-Apr-2009 Andy Adamson <andros@netapp.com>

nfsd41: nfsd DRC logic

Replay a request in nfsd4_sequence.
Add a minorversion to struct nfsd4_compound_state.

Pass the current slot to nfs4svc_encode_compoundres via struct
nfsd4_compoundres to set an NFSv4.1 DRC entry.

Signed-off-by: Andy Adamson<andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: use bool inuse for slot state]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: use cstate session in nfs4svc_encode_compoundres]
[nfsd41 replace nfsd4_set_cache_entry]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# f9bb94c4 02-Apr-2009 Andy Adamson <andros@netapp.com>

nfsd41: enforce NFS4ERR_SEQUENCE_POS operation order rules for minorversion != 0 only.

Signed-off-by: Andy Adamson<andros@netapp.com>
[nfsd41: do not verify nfserr_sequence_pos for minorversion 0]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 069b6ad4 02-Apr-2009 Andy Adamson <andros@netapp.com>

nfsd41: proc stubs

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 2f425878 02-Apr-2009 Andy Adamson <andros@netapp.com>

nfsd: don't use the deferral service, return NFS4ERR_DELAY

On an NFSv4.1 server cache miss that causes an upcall, NFS4ERR_DELAY will be
returned. It is up to the NFSv4.1 client to resend only the operations that
have not been processed.

Initialize rq_usedeferral to 1 in svc_process(). It will be turned off in
nfsd4_proc_compound() only when NFSv4.1 Sessions are used.

Note: this isn't an adequate solution on its own. It's acceptable as a way
to get some minimal 4.1 up and working, but we're going to have to find a
way to avoid returning DELAY in all common cases before 4.1 can really be
considered ready.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: reverse rq_nodeferral negative logic]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[sunrpc: initialize rq_usedeferral]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 20766016 28-Mar-2009 Benny Halevy <bhalevy@panasas.com>

nfsd: remove nfsd4_ops array size

There's no need for it.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# e354d571 28-Mar-2009 Andy Adamson <andros@netapp.com>

nfsd: embed nfsd4_current_state in nfsd4_compoundres

Remove the allocation of struct nfsd4_compound_state.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 5cb031b0 14-Mar-2009 J. Bruce Fields <bfields@citi.umich.edu>

nfsd4: remove redundant check from nfsd4_open

Note that we already checked for this invalid case at the top of this
function.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# a1c8c4d1 08-Mar-2009 J. Bruce Fields <bfields@citi.umich.edu>

nfsd4: support putpubfh operation

Currently putpubfh returns NFSERR_OPNOTSUPP, which isn't actually
allowed for v4. The right error is probably NFSERR_NOTSUPP.

But let's just implement it; though rarely seen, it can be used by
Solaris (with a special mount option), is mandated by the rfc, and is
trivial for us to support.

Thanks to Yang Hongyang for pointing out the original problem, and to
Mike Eisler, Tom Talpey, Trond Myklebust, and Dave Noveck for further
argument....

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 31dec253 05-Mar-2009 David Shaw <dshaw@jabberwocky.com>

Short write in nfsd becomes a full write to the client

If a filesystem being written to via NFS returns a short write count
(as opposed to an error) to nfsd, nfsd treats that as a success for
the entire write, rather than the short count that actually succeeded.

For example, given a 8192 byte write, if the underlying filesystem
only writes 4096 bytes, nfsd will ack back to the nfs client that all
8192 bytes were written. The nfs client does have retry logic for
short writes, but this is never called as the client is told the
complete write succeeded.

There are probably other ways it could happen, but in my case it
happened with a fuse (filesystem in userspace) filesystem which can
rather easily have a partial write.

Here is a patch to properly return the short write count to the
client.
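
The behaviour described above can be sketched in userspace C. This is an
illustration, not the patch itself: backend_write() is a hypothetical
stand-in for a filesystem that completes only part of a request, and
handle_write() is a hypothetical server-side handler.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical backend: writes at most `limit` bytes of a `len`-byte request. */
static ssize_t backend_write(size_t len, size_t limit)
{
	return (ssize_t)(len < limit ? len : limit);
}

/*
 * Handle a write request of *cnt bytes. On success, *cnt is updated to the
 * number of bytes actually written, so the reply never overstates it.
 * Returns 0 on success, -1 on error.
 */
static int handle_write(size_t *cnt, size_t backend_limit)
{
	ssize_t done;

	assert(cnt != NULL);

	done = backend_write(*cnt, backend_limit);
	if (done < 0)
		return -1;
	*cnt = (size_t)done;	/* report the short count, not the request size */
	return 0;
}
```

For the 8192-byte example, a backend limited to 4096 bytes leaves *cnt at
4096, which lets the client's short-write retry logic run.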

Signed-off-by: David Shaw <dshaw@jabberwocky.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 6150ef0d 21-Feb-2009 J. Bruce Fields <bfields@citi.umich.edu>

nfsd4: remove unused CHECK_FH flag

All users now pass this, so it's meaningless.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# a4773c08 02-Feb-2009 J. Bruce Fields <bfields@citi.umich.edu>

nfsd4: use helper for copying filehandles for replay

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 13024b7b 02-Feb-2009 J. Bruce Fields <bfields@citi.umich.edu>

nfsd4: fix misplaced comment

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 99f88726 02-Feb-2009 J. Bruce Fields <bfields@citi.umich.edu>

nfsd: clarify exclusive create bitmask result.

The use of |= is confusing--the bitmask is always initialized to zero in
this case, so we're effectively just doing an assignment here.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 0407717d 15-Dec-2008 Benny Halevy <bhalevy@panasas.com>

nfsd: dprint each op status in nfsd4_proc_compound

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# af558e33 05-Sep-2007 J. Bruce Fields <bfields@citi.umich.edu>

nfsd: common grace period control

Rewrite grace period code to unify management of grace period across
lockd and nfsd. The current code has lockd and nfsd cooperate to
compute a grace period which is satisfactory to them both, and then
individually enforce it. This creates a slight race condition, since
the enforcement is not coordinated. It's also more complicated than
necessary.

Here instead we have lockd and nfsd each inform common code when they
enter the grace period, and when they're ready to leave the grace
period, and allow normal locking only after both of them are ready to
leave.
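
The coordination idea can be sketched with a simple counter. This is a
simplified userspace illustration, not the kernel interface: start_grace(),
end_grace(), and in_grace() are hypothetical names, and a plain counter is
assumed in place of whatever bookkeeping the common code really uses.

```c
#include <assert.h>
#include <stdbool.h>

/* Number of lock managers (e.g. lockd, nfsd) currently in their grace period. */
static unsigned int grace_count;

/* Called by a lock manager when it enters its grace period. */
static void start_grace(void)
{
	grace_count++;
}

/* Called by a lock manager when it is ready to leave its grace period. */
static void end_grace(void)
{
	assert(grace_count > 0);
	grace_count--;
}

/* Normal locking is allowed only once every manager has left grace. */
static bool in_grace(void)
{
	return grace_count != 0;
}
```

Because a single shared state answers in_grace(), the two services cannot
disagree about when the grace period ends. A counter assumes each manager
calls end_grace() exactly once per start_grace().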

We also expect the locks_start_grace()/locks_end_grace() interface here
to be simpler to build on for future cluster/high-availability work,
which may require (for example) putting individual filesystems into
grace, or enforcing grace periods across multiple cluster nodes.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# c228c24b 21-Aug-2008 Andy Adamson <andros@netapp.com>

nfsd: fix compound state allocation error handling

Move the cstate_alloc call so that if it fails, the response is setup to
encode the NFS error. The out label now means that the
nfsd4_compound_state has not been allocated.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# f1c7f79b 08-Aug-2008 Adrian Bunk <bunk@kernel.org>

[NFSD] uninline nfsd4_op_name()

There doesn't seem to be a compelling reason why nfsd4_op_name() is
marked as "inline":

It's only used in a dprintk(), and as long as it has only one caller
non-ancient gcc versions anyway inline it automatically.

This patch fixes the following compile error with gcc 3.4:

...
CC fs/nfsd/nfs4proc.o
nfs4proc.c: In function `nfsd4_proc_compound':
nfs4proc.c:854: sorry, unimplemented: inlining failed in call to
nfs4proc.c:897: sorry, unimplemented: called from here
make[3]: *** [fs/nfsd/nfs4proc.o] Error 1

Reported-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
[ Also made it "const char *" - Linus]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# b001a1b6 02-Jul-2008 Benny Halevy <bhalevy@panasas.com>

nfsd: dprint operation names

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 8837abca 16-Jun-2008 Miklos Szeredi <mszeredi@suse.cz>

nfsd: rename MAY_ flags

Rename nfsd_permission() specific MAY_* flags to NFSD_MAY_* to make it
clear that these are not used outside nfsd, and to avoid name and
number space conflicts with the VFS.

[comment from hch: rename MAY_READ, MAY_WRITE and MAY_EXEC as well]

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 3b12cd98 05-May-2008 J. Bruce Fields <bfields@citi.umich.edu>

nfsd: add dprintk of compound return

We already print each operation of the compound when debugging is turned
on; printing the result could also help with remote debugging.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 18f335af 15-Feb-2008 Dave Hansen <haveblue@us.ibm.com>

[PATCH] r/o bind mounts: elevate write count for xattr_permission() callers

This basically audits the callers of xattr_permission(), which calls
permission() and can perform writes to the filesystem.

[AV: add missing parts - removexattr() and nfsd posix acls, plug for a leak
spotted by Miklos]

Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 406a7ea9 27-Nov-2007 Frank Filz <ffilzlnx@us.ibm.com>

nfsd: Allow AIX client to read dir containing mountpoints

This patch addresses a compatibility issue with a Linux NFS server and
AIX NFS client.

I have exported /export as fsid=0 with sec=krb5:krb5i
I have mount --bind /home onto /export/home
I have exported /export/home with sec=krb5i

The AIX client mounts / -o sec=krb5:krb5i onto /mnt

If I do an ls /mnt, the AIX client gets a permission error. Looking at
the network trace, we see a READDIR looking for attributes
FATTR4_RDATTR_ERROR and FATTR4_MOUNTED_ON_FILEID. The response gives a
NFS4ERR_WRONGSEC which the AIX client is not expecting.

Since the AIX client is only asking for an attribute that is an
attribute of the parent file system (pseudo root in my example), it
seems reasonable that there should not be an error.

In discussing this issue with Bruce Fields, I initially proposed
ignoring the error in nfsd4_encode_dirent_fattr() if all that was being
asked for was FATTR4_RDATTR_ERROR and FATTR4_MOUNTED_ON_FILEID, however,
Bruce suggested that we avoid calling cross_mnt() if only these
attributes are requested.

The following patch implements bypassing cross_mnt() if only
FATTR4_RDATTR_ERROR and FATTR4_MOUNTED_ON_FILEID are called. Since there
is some complexity in the code in nfsd4_encode_fattr(), I didn't want to
duplicate code (and introduce a maintenance nightmare), so I added a
parameter to nfsd4_encode_fattr() that indicates whether it should
ignore cross mounts and simply fill in the attribute using the passed in
dentry as opposed to its parent.

Signed-off-by: Frank Filz <ffilzlnx@us.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 2fdada03 27-Jul-2007 J. Bruce Fields <bfields@citi.umich.edu>

knfsd: demote some printk()s to dprintk()s

To quote a recent mail from Andrew Morton:

Look: if there's a way in which an unprivileged user can trigger
a printk we fix it, end of story.

OK. I assume that goes double for printk()s that might be triggered by
random hosts on the internet. So, disable some printk()s that look like
they could be triggered by malfunctioning or malicious clients. For
now, just downgrade them to dprintk()s.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Acked-by: Neil Brown <neilb@suse.de>


# 749997e5 31-Jul-2007 Jeff Layton <jlayton@kernel.org>

knfsd: set the response bitmask for NFS4_CREATE_EXCLUSIVE

RFC 3530 says:

If the server uses an attribute to store the exclusive create verifier, it
will signify which attribute by setting the appropriate bit in the attribute
mask that is returned in the results.

Linux uses the atime and mtime to store the verifier, but sends a zeroed out
bitmask back to the client. This patch makes sure that we set the correct
bits in the bitmask in this situation.
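
As a sketch of what "setting the correct bits" means: an NFSv4 attribute
number maps to word (n / 32) and bit (n % 32) of the attribute bitmap. The
attribute numbers below are those of time_access and time_modify in RFC 3530;
bitmap_set() is a hypothetical helper, not the code in the patch.

```c
#include <assert.h>
#include <stdint.h>

#define ATTR_TIME_ACCESS 47u	/* fattr4 time_access attribute number */
#define ATTR_TIME_MODIFY 53u	/* fattr4 time_modify attribute number */

/* Set the bit for attribute number `attr` in a two-word attribute bitmap. */
static void bitmap_set(uint32_t bmval[2], unsigned int attr)
{
	assert(attr < 64);
	bmval[attr / 32] |= UINT32_C(1) << (attr % 32);
}
```

Both attributes land in the second bitmap word, at bits 15 and 21.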

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# dcb488a3 17-Jul-2007 Andy Adamson <andros@citi.umich.edu>

knfsd: nfsd4: implement secinfo

Implement the secinfo operation.

(Thanks to Usha Ketineni, who wrote an earlier version of this support.)

Cc: Usha Ketineni <uketinen@us.ibm.com>
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# df547efb 17-Jul-2007 J. Bruce Fields <bfields@citi.umich.edu>

knfsd: nfsd4: simplify exp_pseudoroot arguments

We're passing three arguments to exp_pseudoroot, two of which are just fields
of the svc_rqst. Soon we'll want to pass in a third field as well. So let's
just give up and pass in the whole struct svc_rqst.

Also sneak in some minor style cleanups while we're at it.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 27d630ec 13-Dec-2006 J.Bruce Fields <bfields@fieldses.org>

[PATCH] knfsd: nfsd4: simplify filehandle check

Kill another big "if" clause.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# eeac294e 13-Dec-2006 J.Bruce Fields <bfields@fieldses.org>

[PATCH] knfsd: nfsd4: simplify migration op check

I'm not too fond of these big if conditions. Replace them by checks of a flag
in the operation descriptor. To my eye this makes the code a bit more
self-documenting, and makes the complicated part of the code (proc_compound) a
little more compact.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# b591480b 13-Dec-2006 J.Bruce Fields <bfields@fieldses.org>

[PATCH] knfsd: nfsd4: reorganize compound ops

Define an op descriptor struct, use it to simplify nfsd4_proc_compound().
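
The general shape of a descriptor-table dispatch can be sketched as follows.
This is a self-contained illustration under assumptions, not the nfsd
structures: struct op_desc, the flag OP_ALLOWED_WITHOUT_FH, the two sample
operations, and run_op() are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum op { OP_GETATTR, OP_PUTROOTFH, OP_COUNT };

#define OP_ALLOWED_WITHOUT_FH 0x1u	/* hypothetical per-op flag */
#define ERR_NOFILEHANDLE (-1)		/* hypothetical error code */

struct compound_state {
	int have_fh;	/* nonzero once a current filehandle is set */
};

typedef int (*op_func)(struct compound_state *);

/* One descriptor per operation: handler, flags, and a name for debugging. */
struct op_desc {
	op_func func;
	unsigned int flags;
	const char *name;
};

static int do_getattr(struct compound_state *cs)
{
	(void)cs;
	return 0;
}

static int do_putrootfh(struct compound_state *cs)
{
	cs->have_fh = 1;
	return 0;
}

static const struct op_desc ops[OP_COUNT] = {
	[OP_GETATTR]   = { do_getattr,   0,                     "GETATTR" },
	[OP_PUTROOTFH] = { do_putrootfh, OP_ALLOWED_WITHOUT_FH, "PUTROOTFH" },
};

/* Look up the descriptor, apply the flag-driven check, then call the handler. */
static int run_op(enum op num, struct compound_state *cs)
{
	const struct op_desc *d;

	assert(num < OP_COUNT && cs != NULL);
	d = &ops[num];
	if (!cs->have_fh && !(d->flags & OP_ALLOWED_WITHOUT_FH))
		return ERR_NOFILEHANDLE;
	return d->func(cs);
}
```

Per-operation rules become flag bits in the table rather than long
conditions in the dispatch loop, and the switch statement reduces to one
indirect call.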

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# c954e2a5 13-Dec-2006 J.Bruce Fields <bfields@fieldses.org>

[PATCH] knfsd: nfsd4: make verify and nverify wrappers

Make wrappers for verify and nverify, for consistency with other ops.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 7191155b 13-Dec-2006 J.Bruce Fields <bfields@fieldses.org>

[PATCH] knfsd: nfsd4: don't inline nfsd4 compound op functions

The inlining contributes to bloating the stack of nfsd4_compound, and I want
to change the compound op functions to function pointers anyway.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# a4f1706a9 13-Dec-2006 J.Bruce Fields <bfields@fieldses.org>

[PATCH] knfsd: nfsd4: move replay_owner to cstate

Tuck away the replay_owner in the cstate while we're at it.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# d9e626f1 13-Dec-2006 J.Bruce Fields <bfields@fieldses.org>

[PATCH] knfsd: nfsd4: remove spurious replay_owner check

OK, this is embarrassing--I've even looked back at the history, and cannot for
the life of me figure out why I added this check.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# ca364317 13-Dec-2006 J.Bruce Fields <bfields@fieldses.org>

[PATCH] knfsd: nfsd4: pass saved and current fh together into nfsd4 operations

Pass the saved and current filehandles together into all the nfsd4 compound
operations.

I want a unified interface to these operations so we can just call them by
pointer and throw out the huge switch statement.

Also I'll eventually want a structure like this--that holds the state used
during compound processing--for deferral.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# e5710199 13-Dec-2006 J.Bruce Fields <bfields@fieldses.org>

[PATCH] knfsd: nfsd4: clarify units of COMPOUND_SLACK_SPACE

A comment here incorrectly states that "slack_space" is measured in words, not
bytes. Remove the comment, and adjust a variable name and a few comments to
clarify the situation.

This is pure cleanup; there should be no change in functionality.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 81ac95c5 08-Nov-2006 J. Bruce Fields <bfields@fieldses.org>

[PATCH] nfsd4: fix open-create permissions

In the case where an open creates the file, we shouldn't be rechecking
permissions to open the file; the open succeeds regardless of what the new
file's mode bits say.

This patch fixes the problem, but only by introducing yet another parameter
to nfsd_create_v3. This is ugly. This will be fixed by later patches.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Acked-by: Neil Brown <neilb@suse.de>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# af85852d 08-Nov-2006 J. Bruce Fields <bfields@fieldses.org>

[PATCH] nfsd4: reindent do_open_lookup()

Minor rearrangement, cleanup of do_open_lookup(). No change in behavior.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Acked-by: Neil Brown <neilb@suse.de>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# a90b061c 20-Oct-2006 Al Viro <viro@ftp.linux.org.uk>

[PATCH] nfsd: nfs_replay_me

We are using NFS_REPLAY_ME as a special error value that is never leaked to
clients. That works fine; the only problem is mixing host- and network-
endian values in the same objects. Network-endian equivalent would work just
as fine; switch to it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# b37ad28b 20-Oct-2006 Al Viro <viro@ftp.linux.org.uk>

[PATCH] nfsd: nfs4 code returns error values in net-endian

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 2ebbc012 20-Oct-2006 Al Viro <viro@ftp.linux.org.uk>

[PATCH] xdr annotations: NFSv4 server

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 7111c66e 20-Oct-2006 Al Viro <viro@ftp.linux.org.uk>

[PATCH] fix svc_procfunc declaration

svc_procfunc instances return __be32, not int

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 9801d8a3 17-Oct-2006 J. Bruce Fields <bfields@fieldses.org>

[PATCH] knfsd: nfsd4: fix open permission checking

We weren't actually checking for SHARE_ACCESS_WRITE, with the result that the
owner could open a non-writeable file for write!

Continue to allow DENY_WRITE only with write access.

Thanks to Jim Rees for reporting the bug.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
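
Illustration for the entry above: a minimal sketch of deriving the
permissions an OPEN must be checked against from its share bits. The share
constants follow the NFSv4 protocol values; the SK_MAY_* bits and the
function name are hypothetical and exist only for this example.

```c
#include <assert.h>

/* Share bits as defined by the NFSv4 protocol. */
#define SK_SHARE_ACCESS_READ   0x1u
#define SK_SHARE_ACCESS_WRITE  0x2u
#define SK_SHARE_ACCESS_BOTH   0x3u
#define SK_SHARE_DENY_NONE     0x0u
#define SK_SHARE_DENY_WRITE    0x2u

/* Hypothetical permission bits used only by this sketch. */
#define SK_MAY_READ   0x4u
#define SK_MAY_WRITE  0x2u

/* Return the permissions the opener must hold for the requested open. */
static unsigned int sk_open_access_mask(unsigned int share_access,
					unsigned int share_deny)
{
	unsigned int mask = 0;

	/* Callers are assumed to have rejected invalid share_access values. */
	assert(share_access != 0 &&
	       (share_access & ~SK_SHARE_ACCESS_BOTH) == 0);

	if (share_access & SK_SHARE_ACCESS_READ)
		mask |= SK_MAY_READ;
	/* The check the entry above says was missing. */
	if (share_access & SK_SHARE_ACCESS_WRITE)
		mask |= SK_MAY_WRITE;
	/* Denying writes to others is allowed only with write access. */
	if (share_deny & SK_SHARE_DENY_WRITE)
		mask |= SK_MAY_WRITE;

	return mask;
}
```

With this mapping, an open for write on a non-writeable file requests
SK_MAY_WRITE, so a subsequent permission check has something to refuse.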


# dc730e17 17-Oct-2006 J. Bruce Fields <bfields@fieldses.org>

[PATCH] knfsd: nfsd4: fix owner-override on open

If a client creates a file using an open which sets the mode to 000, or if a
chmod changes permissions after a file is opened, then situations may arise
where an NFS client knows that some IO is permitted (because a process holds
the file open), but the NFS server does not (because it doesn't know about the
open, and only sees that the IO conflicts with the current mode of the file).

As a hack to solve this problem, NFS servers normally allow the owner to
override permissions on IO. The client can still enforce correct
permissions-checking on open by performing an explicit access check.

In NFSv4 the client can rely on the explicit on-the-wire open instead of an
access check.

Therefore we should not be allowing the owner to override permissions on an
over-the-wire open!

However, we should still allow the owner to override permissions in the case
where the client is claiming an open that it already made either before a
reboot, or while it was holding a delegation.

Thanks to Jim Rees for reporting the bug.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 42ca0993 04-Oct-2006 J. Bruce Fields <bfields@fieldses.org>

[PATCH] knfsd: nfsd4: actually use all the pieces to implement referrals

Use all the pieces set up so far to implement referral support, allowing
return of NFS4ERR_MOVED and fs_locations attribute.

Signed-off-by: Manoj Naik <manoj@almaden.ibm.com>
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 3cc03b16 04-Oct-2006 NeilBrown <neilb@suse.de>

[PATCH] knfsd: Avoid excess stack usage in svc_tcp_recvfrom

.. by allocating the array of 'kvec' in 'struct svc_rqst'.

As we plan to increase RPCSVC_MAXPAGES from 8 up to 256, we can no longer
allocate an array of this size on the stack. So we allocate it in 'struct
svc_rqst'.

However svc_rqst contains (indirectly) an array of the same type and size
(actually several, but they are in a union). So rather than waste space, we
move those arrays out of the separately allocated union and into svc_rqst to
share with the kvec moved out of svc_tcp_recvfrom (various arrays are used at
different times, so there is no conflict).

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 3e3b4800 02-Oct-2006 Greg Banks <gnb@melbourne.sgi.com>

[PATCH] knfsd: add some missing newlines in printks

Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# e2b20950 10-Jul-2006 Shankar Anand <shanand@novell.com>

[PATCH] knfsd: nfsd4: add per-operation server stats

Add an nfs4 operations count array to nfsd_stats structure. The count is
incremented in nfsd4_proc_compound() where all the operations are handled
by the nfsv4 server. This count of individual nfsv4 operations is also
entered into the /proc filesystem.

Signed-off-by: Shankar Anand <shanand@novell.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# f0e2993e 10-Apr-2006 NeilBrown <neilb@suse.de>

[PATCH] knfsd: nfsd4: remove nfsd_setuser from putrootfh

Since nfsd_setuser() is already called from any operation that uses the
current filehandle (because it's called from fh_verify), there's no reason to
call it from putrootfh.

Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 7775f4c8 10-Apr-2006 NeilBrown <neilb@suse.de>

[PATCH] knfsd: Correct reserved reply space for read requests.

NFSd makes sure there is enough space to hold the maximum possible reply
before accepting a request. The unit for this maximum is (4-byte) words.
However in three places, particularly for read requests, the number given is
a number of bytes.

This means too much space is reserved which is slightly wasteful.

This is the sort of patch that could uncover a deeper bug, and it is not
critical, so it would be best for it to spend a while in -mm before going
in to mainline.

(akpm: target 2.6.17-rc2, 2.6.16.3 (approx))

Discovered-by: "Eivind Sarto" <ivan@kasenna.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
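
Illustration for the entry above: a minimal sketch of the unit mismatch.
A reply-size limit expressed in 4-byte words must have byte counts converted
before they are added to it. The helper names and the header/data split are
assumptions made for this example, not the kernel's interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_XDR_UNIT 4u   /* bytes per XDR word */

/* Round a byte count up to a whole number of 4-byte words. */
static size_t bytes_to_xdr_words(size_t bytes)
{
	assert(bytes <= SIZE_MAX - (SK_XDR_UNIT - 1));   /* avoid wraparound */
	return (bytes + SK_XDR_UNIT - 1) / SK_XDR_UNIT;
}

/*
 * Words to reserve for a read reply: a fixed header already counted in
 * words, plus the payload, which the request specifies in bytes.
 */
static size_t read_reply_words(size_t hdr_words, size_t payload_bytes)
{
	return hdr_words + bytes_to_xdr_words(payload_bytes);
}
```

Passing payload_bytes through unconverted would reserve roughly four times
the space needed, which is the wasteful behaviour the entry describes.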


# cbd0d51a 07-Feb-2006 J. Bruce Fields <bfields@fieldses.org>

[PATCH] knfsd: fix nfs4_open lock leak

I just noticed that my patch "don't create on open that fails due to
ERR_GRACE" (recently committed as fb553c0f17444e090db951b96df4d2d71b4f4b6b)
had an obvious problem that causes a deadlock on reboot recovery. Sending
in this now since it seems like a clear 2.6.16 candidate.--b.

We're returning with a lock held in some error cases.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 52748819 18-Jan-2006 Fred Isaman <iisaman@citi.umich.edu>

[PATCH] nfsd4: clean up settattr code

Clean up some unnecessary special-casing in the setattr code.

Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# fb553c0f 18-Jan-2006 J. Bruce Fields <bfields@citi.umich.edu>

[PATCH] nfsd4: don't create on open that fails due to ERR_GRACE

In an earlier patch (commit b648330a1d741d5df8a5076b2a0a2519c69c8f41) I noted
that a too-early grace-period check was preventing us from bumping the
sequence id on open. Unfortunately in that patch I stupidly moved the
grace-period check back too far, so now an open for create can successfully
create the file while still returning ERR_GRACE.

The correct place for that check is after we've set the open_owner and handled
any replays, but before we actually start mucking with the filesystem.

Thanks to Avishay Traeger for reporting the bug.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
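
Illustration for the entry above: a minimal sketch of the ordering it
describes, with the grace-period check placed after the open owner is set
and before any filesystem work. The struct, enum and function names are
hypothetical, and replay handling is reduced to a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sk_status { SK_OK, SK_ERR_GRACE };

struct sk_open {
	bool reclaim;        /* request is a reclaim of earlier state */
	bool create;         /* request asks for the file to be created */
	bool owner_set;      /* set once the open owner is known */
	bool file_created;   /* set once the filesystem has been touched */
};

static enum sk_status sk_process_open(struct sk_open *op, bool in_grace)
{
	assert(op != NULL);

	/* 1. Set the open owner so the seqid can be bumped even on error. */
	op->owner_set = true;

	/* 2. Replay handling would happen here (omitted in this sketch). */

	/* 3. Grace check: after steps 1 and 2, before any filesystem work. */
	if (in_grace && !op->reclaim)
		return SK_ERR_GRACE;

	/* 4. Only now touch the filesystem. */
	if (op->create)
		op->file_created = true;

	return SK_OK;
}
```

In this order a non-reclaim open for create during the grace period fails
with the owner recorded and without the file having been created.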


# 375c5547 18-Jan-2006 J. Bruce Fields <bfields@citi.umich.edu>

[PATCH] nfsd4: nfs4state.c miscellaneous goto removals

Remove some goto's that made the logic here a little more tortuous than
necessary.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# a525825d 18-Jan-2006 J. Bruce Fields <bfields@citi.umich.edu>

[PATCH] nfsd4: handle replays of failed open reclaims

We need to make sure open reclaims are marked confirmed immediately so that we
can handle replays even if they fail (e.g. with a seqid-incrementing error).
(See 8.1.8.)

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# fd445277 18-Jan-2006 J. Bruce Fields <bfields@citi.umich.edu>

[PATCH] nfsd4: operation debugging

Simple, useful debugging printk: print the number of each op as we process it.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# f2327d9a 13-Sep-2005 Neil Brown <neilb@suse.de>

[PATCH] nfsd4: move replay_owner

It seems more natural to move the setting of the replay_owner into the
relevant procedure instead of doing it in nfsv4_proc_compound.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# b648330a 07-Jul-2005 NeilBrown <neilb@cse.unsw.edu.au>

[PATCH] nfsd4: ERR_GRACE should bump seqid on open

The GRACE and NOGRACE errors should bump the sequence id on open. So we delay
the handling of these errors until nfsd4_process_open2, at which point we've
set the open owner, so the encode routine will be able to bump the sequence
id.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 7e06b7f9 23-Jun-2005 NeilBrown <neilb@cse.unsw.edu.au>

[PATCH] knfsd: nfs4: hold filp while reading or writing

We're trying to read and write from a struct file that we may not hold a
reference to any more (since a close could be processed as soon as we drop the
state lock).

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# c815afc7 23-Jun-2005 NeilBrown <neilb@cse.unsw.edu.au>

[PATCH] nfsd4: block metadata ops during grace period

We currently return err_grace if a user attempts a non-reclaim open during the
grace period. But we also need to prevent renames and removes, at least, to
ensure clients have the chance to recover state on files before they are moved
or deleted.

Of course, local users could also do renames and removes during the lease
period, and there's not much we can do about that. This at least will help
with remote users.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 0dd3c192 23-Jun-2005 NeilBrown <neilb@cse.unsw.edu.au>

[PATCH] nfsd4: support CLAIM_DELEGATE_CUR

Add OPEN claim type NFS4_OPEN_CLAIM_DELEGATE_CUR to nfsd4_open().

A delegation stateid and a name are provided. OPEN with O_CREAT is not legal
with this claim type; otherwise, use the NFS4_OPEN_CLAIM_NULL code path to
look up the filename to be opened.

Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 1da177e4 16-Apr-2005 Linus Torvalds <torvalds@ppc970.osdl.org>

Linux-2.6.12-rc2

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!