History log of /linux-master/fs/nfsd/nfs4xdr.c
Revision Date Author Comments
# f488138b 11-Apr-2024 Vasily Gorbik <gor@linux.ibm.com>

NFSD: fix endianness issue in nfsd4_encode_fattr4

The nfs4 mount fails with EIO on 64-bit big endian architectures since
v6.7. The issue arises from employing a union in the nfsd4_encode_fattr4()
function to overlay a 32-bit array with a 64-bit values based bitmap,
which does not function as intended. Address the endianness issue by
utilizing bitmap_from_arr32() to copy 32-bit attribute masks into a
bitmap in an endianness-agnostic manner.
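
The failure mode can be reproduced in user space. The following is a minimal sketch, assuming 64-bit bitmap words. The names demo_arr32_to_bitmap64(), demo_arr32_overlay64() and demo_bitmap64_test() are hypothetical stand-ins written for illustration; they are not the kernel's bitmap_from_arr32() or the union that was removed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical analog of an endianness-agnostic copy: 32-bit word i
 * supplies bits 32*i .. 32*i+31 of the bitmap on any host byte order.
 */
void demo_arr32_to_bitmap64(uint64_t *bitmap, const uint32_t *arr,
			    size_t nwords32)
{
	size_t i;

	assert(bitmap && arr);
	memset(bitmap, 0, ((nwords32 + 1) / 2) * sizeof(*bitmap));
	for (i = 0; i < nwords32; i++)
		bitmap[i / 2] |= (uint64_t)arr[i] << (32 * (i % 2));
}

/*
 * What overlaying the two types amounts to: the bytes of the 32-bit
 * array are reinterpreted as 64-bit words. On a big-endian host the
 * two halves of each 64-bit word end up swapped.
 */
void demo_arr32_overlay64(uint64_t *bitmap, const uint32_t *arr,
			  size_t nwords32)
{
	assert(bitmap && arr);
	memset(bitmap, 0, ((nwords32 + 1) / 2) * sizeof(*bitmap));
	memcpy(bitmap, arr, nwords32 * sizeof(*arr));
}

/* Test bit 'nr' in a bitmap made of 64-bit words. */
int demo_bitmap64_test(const uint64_t *bitmap, unsigned int nr)
{
	return (int)((bitmap[nr / 64] >> (nr % 64)) & 1);
}
```

With a three-word mask such as { 1 << 4, 1 << 1, 0 }, the shift-based copy reports bits 4 and 33 as set on every host, while the overlay gives the same answer only on little-endian machines.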

Cc: stable@vger.kernel.org
Fixes: fce7913b13d0 ("NFSD: Use a bitmask loop to encode FATTR4 results")
Link: https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/2060217
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 9b350d3e 04-Mar-2024 Chuck Lever <chuck.lever@oracle.com>

NFSD: Clean up nfsd4_encode_replay()

Replace open-coded encoding logic with the use of conventional XDR
utility functions. Add a tracepoint to make replays observable in
field troubleshooting situations.

The WARN_ON is removed. A stack trace is of little use, as there is
only one call site for nfsd4_encode_replay(), and a buffer length
shortage here is unlikely.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# c5967721 15-Feb-2024 Dai Ngo <dai.ngo@oracle.com>

NFSD: handle GETATTR conflict with write delegation

If a GETATTR request targets a file that has a write delegation in
effect, and the requested attributes include the change info and size
attributes, then the request is handled as follows:

Server sends CB_GETATTR to client to get the latest change info and file
size. If these values are the same as the server's cached values then
the GETATTR proceeds as normal.

If either the change info or file size is different from the server's
cached values, or the file was already marked as modified, then:

. update time_modify and time_metadata into file's metadata
with current time

. encode GETATTR as normal except the file size is encoded with
the value returned from CB_GETATTR

. mark the file as modified

If the CB_GETATTR fails for any reason, the delegation is recalled
and NFS4ERR_DELAY is returned for the GETATTR.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 31e4bb8f 25-Jan-2024 Jorge Mora <jmora1300@gmail.com>

NFSD: fix LISTXATTRS returning more bytes than maxcount

The maxcount is the maximum number of bytes for the LISTXATTRS4resok
result. This includes the cookie and the count for the name array,
thus subtract 12 bytes from the maxcount: 8 (cookie) + 4 (array count)
when filling up the name array.
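
The arithmetic can be sketched as follows. This is an illustration only: the constant names and demo_listxattrs_name_budget() are hypothetical, and clamping to zero when maxcount is smaller than the fixed fields is an assumption made for the sketch, not something the commit message states.

```c
#include <assert.h>
#include <stdint.h>

enum {
	DEMO_COOKIE_BYTES = 8,		/* 64-bit cookie */
	DEMO_ARRAY_COUNT_BYTES = 4,	/* 32-bit count of the name array */
};

static_assert(DEMO_COOKIE_BYTES + DEMO_ARRAY_COUNT_BYTES == 12,
	      "fixed fields take 12 bytes");

/*
 * Bytes that remain for the name array after the fixed fields are
 * accounted for. Returns 0 when maxcount cannot hold even those.
 */
uint32_t demo_listxattrs_name_budget(uint32_t maxcount)
{
	const uint32_t fixed = DEMO_COOKIE_BYTES + DEMO_ARRAY_COUNT_BYTES;

	return maxcount > fixed ? maxcount - fixed : 0;
}
```

For example, a maxcount of 20 leaves 8 bytes for names under these assumptions.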

Fixes: 23e50fe3a5e6 ("nfsd: implement the xattr functions and en/decode logic")
Signed-off-by: Jorge Mora <mora@netapp.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 2f73f37d 25-Jan-2024 Jorge Mora <jmora1300@gmail.com>

NFSD: fix LISTXATTRS returning a short list with eof=TRUE

If the XDR buffer is not large enough to fit all attributes
and the number of bytes left in the XDR buffer (xdrleft) is
equal to the number of bytes for the current attribute, then
the loop will prematurely exit without setting eof to FALSE.
Also in this case, adding the eof flag to the buffer will
make the reply 4 bytes larger than lsxa_maxcount.

Check that there are enough bytes to fit both the next attribute
name and the eof flag.
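
A minimal sketch of that check, assuming each name is sent as an XDR string (4-byte length plus data padded to a 4-byte boundary) and that xdrleft counts the bytes available for names plus the trailing eof. demo_xdr_name_size() and demo_listxattrs_fit() are hypothetical helpers, not the server code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* XDR string size: 4-byte length, then data padded to 4 bytes. */
static uint32_t demo_xdr_name_size(const char *name)
{
	uint32_t len = (uint32_t)strlen(name);

	return 4 + ((len + 3) & ~3u);
}

/*
 * Count how many names fit in 'xdrleft' bytes while always keeping
 * 4 bytes in reserve for the eof flag. *eof is true only when every
 * name fit.
 */
size_t demo_listxattrs_fit(const char *const *names, size_t nnames,
			   uint32_t xdrleft, bool *eof)
{
	size_t i;

	assert(names && eof);
	*eof = true;
	for (i = 0; i < nnames; i++) {
		uint32_t size = demo_xdr_name_size(names[i]);

		/* Room is needed for this name and for the eof flag. */
		if (size + 4 > xdrleft) {
			*eof = false;
			break;
		}
		xdrleft -= size;
	}
	return i;
}
```

With two 8-byte names and xdrleft = 16, the second name would exactly fill the remaining 8 bytes, so the sketch stops after one name and reports eof as false instead of overrunning the budget.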

Fixes: 23e50fe3a5e6 ("nfsd: implement the xattr functions and en/decode logic")
Signed-off-by: Jorge Mora <mora@netapp.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 61ab5e07 25-Jan-2024 Jorge Mora <jmora1300@gmail.com>

NFSD: change LISTXATTRS cookie encoding to big-endian

nfsd4_listxattr_validate_cookie() expects the cookie to be an
offset into the list, so it needs to be encoded in big-endian.
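
For illustration, a 64-bit value can be written in big-endian order without depending on host byte order as sketched below. demo_put_be64() and demo_get_be64() are hypothetical user-space helpers, not the XDR utilities used by the server.

```c
#include <assert.h>
#include <stdint.h>

/* Store 'v' most significant byte first. */
void demo_put_be64(uint8_t out[8], uint64_t v)
{
	int i;

	assert(out);
	for (i = 0; i < 8; i++)
		out[i] = (uint8_t)(v >> (56 - 8 * i));
}

/* Read back a value stored by demo_put_be64(). */
uint64_t demo_get_be64(const uint8_t in[8])
{
	uint64_t v = 0;
	int i;

	assert(in);
	for (i = 0; i < 8; i++)
		v = (v << 8) | in[i];
	return v;
}
```

A cookie encoded this way decodes to the same offset on any host, which is what an offset-based validity check relies on.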

Fixes: 23e50fe3a5e6 ("nfsd: implement the xattr functions and en/decode logic")
Signed-off-by: Jorge Mora <mora@netapp.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 52a357db 25-Jan-2024 Jorge Mora <jmora1300@gmail.com>

NFSD: fix nfsd4_listxattr_validate_cookie

If LISTXATTRS is sent with a correct cookie but a small maxcount,
nfsd4_listxattr_validate_cookie() can return NFS4ERR_BAD_COOKIE.
If maxcount = 20, then the second check in the function gives
RHS = 3, thus any cookie larger than 3 returns
NFS4ERR_BAD_COOKIE.

There is no need to validate the cookie against the return XDR
buffer, since the attribute referenced by the cookie will be the
first in the return buffer.

Fixes: 23e50fe3a5e6 ("nfsd: implement the xattr functions and en/decode logic")
Signed-off-by: Jorge Mora <mora@netapp.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# a2c91753 17-Nov-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Modify NFSv4 to use nfsd_read_splice_ok()

Avoid the use of an atomic bitop, and prepare for adding a run-time
switch for using splice reads.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 862bee84 16-Dec-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Revert 6c41d9a9bd0298002805758216a9c44e38a8500d

For some reason, the wait_on_bit() in nfsd4_deleg_getattr_conflict()
is waiting forever, preventing a clean server shutdown. The
requesting client might also hang waiting for a reply to the
conflicting GETATTR.

Invoking wait_on_bit() in an nfsd thread context is a hazard. The
correct fix is to replace this wait_on_bit() call site with a
mechanism that defers the conflicting GETATTR until the CB_GETATTR
completes or is known to have failed.

That will require some surgery and extended testing and it's late
in the v6.7-rc cycle, so I'm reverting now in favor of trying again
in a subsequent kernel release.

This is my fault: I should have recognized the ramifications of
calling wait_on_bit() in here before accepting this patch.

Thanks to Dai Ngo <dai.ngo@oracle.com> for diagnosing the issue.

Reported-by: Wolfgang Walter <linux-nfs@stwm.de>
Closes: https://lore.kernel.org/linux-nfs/e3d43ecdad554fbdcaa7181833834f78@stwm.de/
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 1f121e2d 09-Oct-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Clean up nfsd4_encode_seek()

Use modern XDR encoder utilities.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# b609ad60 09-Oct-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Clean up nfsd4_encode_offset_status()

Use modern XDR encoder utilities.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 21d316a7 09-Oct-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Clean up nfsd4_encode_copy_notify()

Replace open-coded encoding logic with the use of conventional XDR
utility functions.

Note that if we replace the cpn_sec and cpn_nsec fields with a
single struct timespec64 field, the encoder can use
nfsd4_encode_nfstime4(), as that is the data type specified by the
XDR spec.

NFS4ERR_INVAL seems inappropriate if the encoder doesn't support
encoding the response. Instead use NFS4ERR_SERVERFAULT, since this
condition is a software bug on the server.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 02e0297f 09-Oct-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Clean up nfsd4_encode_copy()

Restructure this function using conventional XDR utility functions
so that it aligns better with the XDR in the specification.

I've also moved nfsd4_encode_copy() closer to the data type encoders
that only it uses.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 08b4436a 09-Oct-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Clean up nfsd4_encode_test_stateid()

Use conventional XDR utilities.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# abef972c 09-Oct-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Clean up nfsd4_encode_exchange_id()

Restructure nfsd4_encode_exchange_id() so that it will be more
straightforward to add support for SSV one day. Also, adopt the use
of the conventional XDR utility functions.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 91c7a905 09-Oct-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Clean up nfsd4_do_encode_secinfo()

Refactor nfsd4_encode_secinfo() so it is more clear what XDR data
item is being encoded by which piece of code.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# d38e570f 09-Oct-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Clean up nfsd4_encode_access()

Convert nfsd4_encode_access() to use modern XDR utility functions.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 25c307ac 04-Oct-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Clean up nfsd4_encode_readdir()

Untangle nfsd4_encode_readdir() so it is more clear what XDR data
item is being encoded by which piece of code.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# a1aee9aa 04-Oct-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Clean up nfsd4_encode_entry4()

Reshape nfsd4_encode_entry4() to be more like the legacy dirent
encoders, which were recently rewritten to use xdr_stream.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 3fc5048c 04-Oct-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add an nfsd4_encode_nfs_cookie4() helper

De-duplicate the entry4 cookie encoder, similar to the arrangement
for the NFSv2 and NFSv3 directory entry encoders.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# a0d042f8 04-Oct-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Clean up nfsd4_encode_rdattr_error()

No need for specialized code here, as this function is invoked only
rarely. Convert it to encode to xdr_stream using conventional XDR
helpers.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# a0f3c835 04-Oct-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Rename nfsd4_encode_dirent()

Rename nfsd4_encode_dirent() to match the naming convention already
used in the NFSv2 and NFSv3 readdir paths. The new name reflects the
name of the spec-defined XDR data type for an NFSv4 directory entry.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 6621b88b 02-Oct-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Clean up nfsd4_encode_sequence()

De-duplicate open-coded encoding of the sessionid, and convert the
rest of the function to use conventional XDR utility functions.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# b0c1b1ba 02-Oct-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Restructure nfsd4_encode_create_session()

Convert nfsd4_encode_create_session() to use the conventional XDR
encoding utilities.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 150990f4 02-Oct-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add nfsd4_encode_channel_attr4()

De-duplicate the encoding of the fore channel and backchannel
attributes.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 65baa609 02-Oct-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add a utility function for encoding sessionid4 objects

There is more than one NFSv4 operation that needs to encode a
sessionid4, so extract that data type into a separate helper.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 841735b3 29-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Clean up nfsd4_encode_open()

Finish cleaning up nfsd4_encode_open().

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 802e1913 29-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add nfsd4_encode_open_delegation4()

To better align our implementation with the XDR specification,
refactor the part of nfsd4_encode_open() that encodes delegation
metadata.

As part of that refactor, remove an unnecessary BUG() call site and
a comment that appears to be stale.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 6dd43c6d 29-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add nfsd4_encode_open_none_delegation4()

To better align our implementation with the XDR specification,
refactor the part of nfsd4_encode_open() that encodes the
open_none_delegation4 type.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 32efa674 29-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add nfsd4_encode_open_write_delegation4()

Make it easier to adjust the XDR encoder to handle new features
related to write delegations.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# e4ad7ce7 29-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add nfsd4_encode_open_read_delegation4()

Refactor nfsd4_encode_open() so the open_read_delegation4 type is
encoded in a separate function. This makes it more straightforward
to later add support for returning an nfsace4 in OPEN responses that
offer a delegation.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# c5641782 29-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Refactor nfsd4_encode_lock_denied()

Use the modern XDR utility functions.

The LOCK and LOCKT encoder functions need to return nfserr_denied
when a lock is denied, but nfsd4_encode_lock4denied() should return
a status code that is consistent with other XDR encoders.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# c4a29c52 29-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add nfsd4_encode_lock_owner4()

To improve readability and better align the LOCK encoders with the
XDR specification, add an explicit encoder named for the lock_owner4
type.

In particular, to avoid code duplication, use
nfsd4_encode_clientid4() to encode the clientid in the lock owner
rather than open-coding it.

It looks to me like nfs4_set_lock_denied() already clears the
clientid if it won't return an owner (cf: the nevermind: label). The
code in the XDR encoder appears to be redundant and can safely be
removed.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 92d82e99 12-Oct-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Remove a layering violation when encoding lock_denied

An XDR encoder is responsible for marshaling results, not releasing
memory that was allocated by the upper layer. We have .op_release
for that purpose.

Move the release of the ld_owner.data string to op_release functions
for LOCK and LOCKT.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 4bbe42e8 25-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Clean up nfsd4_encode_getdeviceinfo()

Adopt the conventional XDR utility functions. Also, restructure to
make the function align more closely with the spec -- there doesn't
seem to be a performance need for speciality code, so prioritize
readability.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 85dbc978 25-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Clean up nfsd4_encode_layoutreturn()

Adopt the use of conventional XDR utility functions. Restructure
the encoder to better align with the XDR definition of the result.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# cc313f80 25-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Clean up nfsd4_encode_layoutcommit()

Adopt the use of conventional XDR utility functions. Restructure
the encoder to better align with the XDR definition of the result.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 69f5f019 25-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Clean up nfsd4_encode_layoutget()

De-duplicate the open-coded stateid4 encoder. Adopt the use of the
conventional current XDR encoding helpers. Refactor the encoder to
align with the XDR specification.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 40bb2baa 25-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Clean up nfsd4_encode_stateid()

Update the encoder function name to match the type name, as is the
convention with other such encoder utility functions, and with
nfsd4_decode_stateid4().

Make the @stateid argument a const so that callers of
nfsd4_encode_stateid4() in the future can be passed const pointers
to structures.

Since the compiler is allowed to add padding to structs, use the
wire (spec-defined) size when reserving buffer space.
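
A minimal sketch of the idea, assuming the stateid4 wire layout of a 32-bit seqid followed by 12 opaque bytes. struct demo_stateid and demo_encode_stateid4() are hypothetical; the point is only that the reservation uses a constant derived from the wire format rather than sizeof() on an in-memory struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Wire size: 4-byte seqid + 12 opaque bytes. */
enum { DEMO_STATEID4_WIRE_SIZE = 4 + 12 };

/* Hypothetical in-memory form; its sizeof() is up to the compiler. */
struct demo_stateid {
	uint32_t seqid;
	uint8_t other[12];
};

/*
 * Encode field by field into 'buf'. Returns the number of bytes
 * written, or 0 if the buffer cannot hold the wire representation.
 */
size_t demo_encode_stateid4(uint8_t *buf, size_t buflen,
			    const struct demo_stateid *sid)
{
	assert(buf && sid);
	if (buflen < DEMO_STATEID4_WIRE_SIZE)
		return 0;
	buf[0] = (uint8_t)(sid->seqid >> 24);
	buf[1] = (uint8_t)(sid->seqid >> 16);
	buf[2] = (uint8_t)(sid->seqid >> 8);
	buf[3] = (uint8_t)sid->seqid;
	memcpy(buf + 4, sid->other, sizeof(sid->other));
	return DEMO_STATEID4_WIRE_SIZE;
}
```

Because the reservation never consults sizeof(struct demo_stateid), any padding the compiler adds to the struct cannot change the number of bytes placed on the wire.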

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 76bebcc7 25-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add nfsd4_encode_count4()

This is a synonym for nfsd4_encode_uint32_t() that matches the
name of the XDR type. It will get at least one more use in a
subsequent patch.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# ae1131d4 18-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Rename nfsd4_encode_fattr()

For better alignment with the specification, NFSD's encoder function
name should match the name of the XDR data type.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# fce7913b 18-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Use a bitmask loop to encode FATTR4 results

The fattr4 encoder is now structured like the COMPOUND op encoder:
one function for each individual attribute, called by bit number.
Benefits include:

- The individual attributes are now guaranteed to be encoded in
bitmask order into the send buffer

- There can be no unwanted side effects between attribute encoders

- The code now clearly documents which attributes are /not/
implemented on this server
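
The structure described above can be sketched as a table of per-attribute encoders indexed by bit number. Everything named demo_* is hypothetical, the encoded values are placeholders, and skipping a NULL table entry is an assumption of this sketch; only the bit numbers (1 for type, 4 for size, 33 for mode) follow the NFSv4 attribute numbering.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int (*demo_encoder_fn)(uint32_t *out, size_t cap, size_t *pos);

/* Append one word, failing if the output buffer is full. */
static int demo_put(uint32_t *out, size_t cap, size_t *pos, uint32_t v)
{
	if (*pos >= cap)
		return -1;
	out[(*pos)++] = v;
	return 0;
}

static int demo_encode_type(uint32_t *out, size_t cap, size_t *pos)
{
	return demo_put(out, cap, pos, 100);	/* placeholder value */
}

static int demo_encode_size(uint32_t *out, size_t cap, size_t *pos)
{
	return demo_put(out, cap, pos, 200);	/* placeholder value */
}

static int demo_encode_mode(uint32_t *out, size_t cap, size_t *pos)
{
	return demo_put(out, cap, pos, 300);	/* placeholder value */
}

/* One slot per bit; a NULL slot marks an unimplemented attribute. */
static const demo_encoder_fn demo_ops[64] = {
	[1] = demo_encode_type,
	[4] = demo_encode_size,
	[33] = demo_encode_mode,
};

/* Walk the mask from bit 0 upward so output is in bitmask order. */
int demo_encode_attrs(uint64_t mask, uint32_t *out, size_t cap,
		      size_t *pos)
{
	unsigned int bit;

	assert(out && pos);
	for (bit = 0; bit < 64; bit++) {
		if (!(mask & ((uint64_t)1 << bit)))
			continue;
		if (!demo_ops[bit])
			continue;
		if (demo_ops[bit](out, cap, pos))
			return -1;
	}
	return 0;
}
```

Since each encoder sees only the output cursor, one attribute's encoder cannot disturb another's state, and the table itself shows at a glance which bits have no encoder.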

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# b3dbf4e4 18-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add nfsd4_encode_fattr4_xattr_support()

Refactor the encoder for FATTR4_XATTR_SUPPORT into a helper. In a
subsequent patch, this helper will be called from a bitmask loop.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# f59388a5 18-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add nfsd4_encode_fattr4_sec_label()

Refactor the encoder for FATTR4_SEC_LABEL into a helper. In a
subsequent patch, this helper will be called from a bitmask loop.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 345c3877 18-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add nfsd4_encode_fattr4_suppattr_exclcreat()

Refactor the encoder for FATTR4_SUPPATTR_EXCLCREAT into a helper. In
a subsequent patch, this helper will be called from a bitmask loop.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 4c584731 18-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add nfsd4_encode_fattr4_layout_blksize()

Refactor the encoder for FATTR4_LAYOUT_BLKSIZE into a helper. In a
subsequent patch, this helper will be called from a bitmask loop.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 4c15878e 18-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add nfsd4_encode_fattr4_layout_types()

Refactor the encoder for FATTR4_LAYOUT_TYPES into a helper. In a
subsequent patch, this helper will be called from a bitmask loop.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# e7a5b1b2 18-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add nfsd4_encode_fattr4_fs_layout_types()

Refactor the encoder for FATTR4_FS_LAYOUT_TYPES into a helper. In a
subsequent patch, this helper will be called from a bitmask loop.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 1b9097e3 18-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add nfsd4_encode_fattr4_mounted_on_fileid()

Refactor the encoder for FATTR4_MOUNTED_ON_FILEID into a helper. In
a subsequent patch, this helper will be called from a bitmask loop.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# d1828611 18-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add nfsd4_encode_fattr4_time_modify()

Refactor the encoder for FATTR4_TIME_MODIFY into a helper. In a
subsequent patch, this helper will be called from a bitmask loop.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 673720bc 18-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add nfsd4_encode_fattr4_time_metadata()

Refactor the encoder for FATTR4_TIME_METADATA into a helper. In a
subsequent patch, this helper will be called from a bitmask loop.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 993474e8 18-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add nfsd4_encode_fattr4_time_delta()

Refactor the encoder for FATTR4_TIME_DELTA into a helper. In a
subsequent patch, this helper will be called from a bitmask loop.

fattr4_time_delta is specified as an nfstime4, so de-duplicate this
encoder.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 2e38722d 18-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add nfsd4_encode_fattr4_time_create()

Refactor the encoder for FATTR4_TIME_CREATE into a helper. In a
subsequent patch, this helper will be called from a bitmask loop.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# eed4d1ad 18-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add nfsd4_encode_fattr4_time_access()

Refactor the encoder for FATTR4_TIME_ACCESS into a helper. In a
subsequent patch, this helper will be called from a bitmask loop.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 6d37ac3a 18-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add nfsd4_encode_fattr4_space_used()

Refactor the encoder for FATTR4_SPACE_USED into a helper. In a
subsequent patch, this helper will be called from a bitmask loop.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# d0cde979 18-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add nfsd4_encode_fattr4_space_total()

Refactor the encoder for FATTR4_SPACE_TOTAL into a helper. In a
subsequent patch, this helper will be called from a bitmask loop.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 74ebc697 18-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add nfsd4_encode_fattr4_space_free()

Refactor the encoder for FATTR4_SPACE_FREE into a helper. In a
subsequent patch, this helper will be called from a bitmask loop.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 83afa091 18-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add nfsd4_encode_fattr4_space_avail()

Refactor the encoder for FATTR4_SPACE_AVAIL into a helper. In a
subsequent patch, this helper will be called from a bitmask loop.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# a460cda2 18-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add nfsd4_encode_fattr4_rawdev()

Refactor the encoder for FATTR4_RAWDEV into a helper. In a
subsequent patch, this helper will be called from a bitmask loop.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 62f31e56 18-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add nfsd4_encode_fattr4_owner_group()

Refactor the encoder for FATTR4_OWNER_GROUP into a helper. In a
subsequent patch, this helper will be called from a bitmask loop.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# fa51a520 18-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add nfsd4_encode_fattr4_owner()

Refactor the encoder for FATTR4_OWNER into a helper. In a
subsequent patch, this helper will be called from a bitmask loop.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 9f329fea 18-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add nfsd4_encode_fattr4_numlinks()

Refactor the encoder for FATTR4_NUMLINKS into a helper. In a
subsequent patch, this helper will be called from a bitmask loop.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# f4cf5042 18-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add nfsd4_encode_fattr4_mode()

Refactor the encoder for FATTR4_MODE into a helper. In a subsequent
patch, this helper will be called from a bitmask loop.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 951378dc 18-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add nfsd4_encode_fattr4_maxwrite()

Refactor the encoder for FATTR4_MAXWRITE into a helper. In a
subsequent patch, this helper will be called from a bitmask loop.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# c17195c3 18-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add nfsd4_encode_fattr4_maxread()

Refactor the encoder for FATTR4_MAXREAD into a helper. In a
subsequent patch, this helper will be called from a bitmask loop.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 9c1adacc 18-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add nfsd4_encode_fattr4_maxname()

Refactor the encoder for FATTR4_MAXNAME into a helper. In a
subsequent patch, this helper will be called from a bitmask loop.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# b066aa5c 18-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add nfsd4_encode_fattr4_maxlink()

Refactor the encoder for FATTR4_MAXLINK into a helper. In a
subsequent patch, this helper will be called from a bitmask loop.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 7c605dcc 18-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add nfsd4_encode_fattr4_maxfilesize()

Refactor the encoder for FATTR4_MAXFILESIZE into a helper. In a
subsequent patch, this helper will be called from a bitmask loop.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# a1469a37 18-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add nfsd4_encode_fattr4_fs_locations()

Refactor the encoder for FATTR4_FS_LOCATIONS into a helper. In a
subsequent patch, this helper will be called from a bitmask loop.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# b56b7526 18-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add nfsd4_encode_fattr4_files_total()

Refactor the encoder for FATTR4_FILES_TOTAL into a helper. In a
subsequent patch, this helper will be called from a bitmask loop.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 74361e2b 18-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add nfsd4_encode_fattr4_files_free()

Refactor the encoder for FATTR4_FILES_FREE into a helper. In a
subsequent patch, this helper will be called from a bitmask loop.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# b0c3a5f8 18-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add nfsd4_encode_fattr4_files_avail()

Refactor the encoder for FATTR4_FILES_AVAIL into a helper. In a
subsequent patch, this helper will be called from a bitmask loop.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# eb7ece81 18-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add nfsd4_encode_fattr4_fileid()

Refactor the encoder for FATTR4_FILEID into a helper. In a
subsequent patch, this helper will be called from a bitmask loop.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 3283bf64 18-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add nfsd4_encode_fattr4_filehandle()

Refactor the encoder for FATTR4_FILEHANDLE into a helper. In a
subsequent patch, this helper will be called from a bitmask loop.

We can de-duplicate the other filehandle encoder (in GETFH) using
our new helper.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 07455dc4 18-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add nfsd4_encode_fattr4_acl()

Refactor the encoder for FATTR4_ACL into a helper. In a subsequent
patch, this helper will be called from a bitmask loop.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 0207ee08 18-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add nfsd4_encode_nfsace4()

Refactor the ACE encoding helper so that it can eventually be reused
for encoding OPEN results that contain delegation ACEs.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 6515b7d7 18-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add nfsd4_encode_fattr4_aclsupport()

Refactor the encoder for FATTR4_ACLSUPPORT into a helper. In a
subsequent patch, this helper will be called from a bitmask loop.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 782448e1 18-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add nfsd4_encode_fattr4_rdattr_error()

Refactor the encoder for FATTR4_RDATTR_ERROR into a helper. In a
subsequent patch, this helper will be called from a bitmask loop.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 1252b283a 18-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add nfsd4_encode_fattr4_lease_time()

Refactor the encoder for FATTR4_LEASE_TIME into a helper. In a
subsequent patch, this helper will be called from a bitmask loop.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# b6b62595 18-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add nfsd4_encode_fattr4_fsid()

Refactor the encoder for FATTR4_FSID into a helper. In a subsequent
patch, this helper will be called from a bitmask loop.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# d0b28aad 18-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add nfsd4_encode_fattr4_size()

Refactor the encoder for FATTR4_SIZE into a helper. In a subsequent
patch, this helper will be called from a bitmask loop.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 263453d9 18-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add nfsd4_encode_fattr4_change()

Refactor the encoder for FATTR4_CHANGE into a helper. In a
subsequent patch, this helper will be called from a bitmask loop.

The code is restructured a bit to use the modern xdr_stream flow,
and the encoded cinfo value is made const so that callers of the
encoders can be passed a const cinfo.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 36ed7e64 18-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add nfsd4_encode_fattr4_fh_expire_type()

Refactor the encoder for FATTR4_FH_EXPIRE_TYPE into a helper. In a
subsequent patch, this helper will be called from a bitmask loop.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# b06cf375 18-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add nfsd4_encode_fattr4_type()

Refactor the encoder for FATTR4_TYPE into a helper. In a subsequent
patch, this helper will be called from a bitmask loop.

In addition, restructure the code so that byte-swapping is done on
constant values rather than at run time. Run-time swapping can be
costly on some platforms, and "type" is a frequently-requested
attribute.
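
The idea of swapping constants instead of run-time values can be sketched in user-space C as below. This is an illustration only, not the kernel code: CONST_BE32, enum toy_kind and toy_ftype_to_wire() are hypothetical names, the macro assumes the GCC/Clang predefined __BYTE_ORDER__ macros, and the NF4* values are the nfs_ftype4 constants from the NFSv4 specification.

```c
#include <stdint.h>

/* Compile-time conversion to big-endian (assumes GCC/Clang macros). */
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define CONST_BE32(x) \
	((uint32_t)((((uint32_t)(x) & 0x000000ffu) << 24) | \
		    (((uint32_t)(x) & 0x0000ff00u) << 8)  | \
		    (((uint32_t)(x) & 0x00ff0000u) >> 8)  | \
		    (((uint32_t)(x) & 0xff000000u) >> 24)))
#else
#define CONST_BE32(x) ((uint32_t)(x))
#endif

/* nfs_ftype4 values from the NFSv4 specification. */
enum { NF4REG = 1, NF4DIR = 2, NF4LNK = 5 };

/* Hypothetical local file kinds, standing in for the inode mode. */
enum toy_kind { TOY_KIND_REG, TOY_KIND_DIR, TOY_KIND_LNK };

/* Each case returns a value the compiler has already swapped,
 * so no byte-swap instruction runs per request. */
static uint32_t toy_ftype_to_wire(enum toy_kind kind)
{
	switch (kind) {
	case TOY_KIND_REG:
		return CONST_BE32(NF4REG);
	case TOY_KIND_DIR:
		return CONST_BE32(NF4DIR);
	case TOY_KIND_LNK:
		return CONST_BE32(NF4LNK);
	}
	return 0;	/* not reached for the kinds in this sketch */
}
```

The returned word can be stored into the reply buffer as-is; its in-memory byte layout is already the on-the-wire layout.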

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# c9090e27 18-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add nfsd4_encode_fattr4_supported_attrs()

Refactor the encoder for FATTR4_SUPPORTED_ATTRS into a helper. In a
subsequent patch, this helper will be called from a bitmask loop.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 8c442288 18-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add nfsd4_encode_fattr4__false()

Add an encoding helper that encodes a single boolean "false" value.
Attributes that always return "false" can use this helper.

In a subsequent patch, this helper will be called from a bitmask
loop, so it is given a standardized synopsis.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# c88cb472 18-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add nfsd4_encode_fattr4__true()

Add an encoding helper that encodes a single boolean "true" value.
Attributes that always return "true" can use this helper.

In a subsequent patch, this helper will be called from a bitmask
loop, so it is given a standardized synopsis.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 83ab8678 18-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add struct nfsd4_fattr_args

I'm about to split nfsd4_encode_fattr() into a number of smaller
functions. Instead of passing a large number of arguments to each of
the smaller functions, create a struct that can gather the common
argument variables into something with a convenient handle on it.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# c3dcb45b 18-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Clean up nfsd4_encode_setattr()

De-duplicate the encoding of bitmap4 results in
nfsd4_encode_setattr().

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# e64301f5 18-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Rename nfsd4_encode_bitmap()

For alignment with the specification, the name of NFSD's encoder
function should match the name of the XDR type.

I've also replaced a few "naked integers" with symbolic constants
that better reflect the usage of these values.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 6c41d9a9 13-Sep-2023 Dai Ngo <dai.ngo@oracle.com>

NFSD: handle GETATTR conflict with write delegation

If the GETATTR request on a file that has write delegation in effect
and the request attributes include the change info and size attribute
then the request is handled as below:

Server sends CB_GETATTR to client to get the latest change info and file
size. If these values are the same as the server's cached values then
the GETATTR proceeds as normal.

If either the change info or file size is different from the server's
cached values, or the file was already marked as modified, then:

. update time_modify and time_metadata into file's metadata
with current time

. encode GETATTR as normal except the file size is encoded with
the value returned from CB_GETATTR

. mark the file as modified

If the CB_GETATTR fails for any reason, the delegation is recalled
and NFS4ERR_DELAY is returned for the GETATTR.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 0d32a6bb 27-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Fix zero NFSv4 READ results when RQ_SPLICE_OK is not set

nfsd4_encode_readv() uses xdr->buf->page_len as a starting point for
the nfsd_iter_read() sink buffer -- page_len is going to be offset
by the parts of the COMPOUND that have already been encoded into
xdr->buf->pages.

However, that value must be captured /before/
xdr_reserve_space_vec() advances page_len by the expected size of
the read payload. Otherwise, the whole front part of the first
page of the payload in the reply will be uninitialized.
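
The ordering problem described above can be sketched with a toy buffer. The names toy_buf, toy_reserve() and toy_prepare_read() are hypothetical and only mimic the shape of the real code: the reserve step advances page_len, so the starting offset has to be sampled first.

```c
#include <stddef.h>

struct toy_buf {
	size_t page_len;	/* bytes already encoded into the pages */
};

/* Stand-in for the reserve call: advances page_len by nbytes. */
static void toy_reserve(struct toy_buf *buf, size_t nbytes)
{
	buf->page_len += nbytes;
}

/* Returns the offset at which the READ payload should be placed. */
static size_t toy_prepare_read(struct toy_buf *buf, size_t maxcount)
{
	size_t base = buf->page_len;	/* capture BEFORE reserving */

	toy_reserve(buf, maxcount);
	return base;
}
```

Sampling page_len after toy_reserve() would return an offset past the reserved region, which corresponds to the uninitialized front part of the payload described in this commit.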

Mantas hit this because sec=krb5i forces RQ_SPLICE_OK off, which
invokes the readv part of the nfsd4_encode_read() path. Also,
older Linux NFS clients appear to send shorter READ requests
for files smaller than a page, whereas newer clients just send
page-sized requests and let the server send as many bytes as
are in the file.

Reported-by: Mantas Mikulėnas <grawity@gmail.com>
Closes: https://lore.kernel.org/linux-nfs/f1d0b234-e650-0f6e-0f5d-126b3d51d1eb@gmail.com/
Fixes: 703d75215555 ("NFSD: Hoist rq_vec preparation into nfsd_read() [step two]")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 6372e2ee 16-Aug-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: da_addr_body field missing in some GETDEVICEINFO replies

The XDR specification in RFC 8881 looks like this:

struct device_addr4 {
        layouttype4             da_layout_type;
        opaque                  da_addr_body<>;
};

struct GETDEVICEINFO4resok {
        device_addr4            gdir_device_addr;
        bitmap4                 gdir_notification;
};

union GETDEVICEINFO4res switch (nfsstat4 gdir_status) {
case NFS4_OK:
        GETDEVICEINFO4resok     gdir_resok4;
case NFS4ERR_TOOSMALL:
        count4                  gdir_mincount;
default:
        void;
};

Looking at nfsd4_encode_getdeviceinfo() ....

When the client provides a zero gd_maxcount, then the Linux NFS
server implementation encodes the da_layout_type field and then
skips the da_addr_body field completely, proceeding directly to
encode gdir_notification field.

There does not appear to be an option in the specification to skip
encoding da_addr_body. Moreover, Section 18.40.3 says:

> If the client wants to just update or turn off notifications, it
> MAY send a GETDEVICEINFO operation with gdia_maxcount set to zero.
> In that event, if the device ID is valid, the reply's da_addr_body
> field of the gdir_device_addr field will be of zero length.

Since the layout drivers are responsible for encoding the
da_addr_body field, put this fix inside the ->encode_getdeviceinfo
methods.
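
A minimal user-space sketch of why the field cannot simply be skipped: an XDR opaque<> always carries a 4-byte length word, even when the body is empty. The helpers toy_put_be32(), toy_put_opaque() and toy_put_device_addr4() are hypothetical and are not the layout drivers' actual encoders.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store a 32-bit value in XDR (big-endian) byte order. */
static size_t toy_put_be32(unsigned char *out, uint32_t v)
{
	out[0] = (unsigned char)(v >> 24);
	out[1] = (unsigned char)(v >> 16);
	out[2] = (unsigned char)(v >> 8);
	out[3] = (unsigned char)v;
	return 4;
}

/* Encode opaque<>: length word, data, then zero padding to 4 bytes. */
static size_t toy_put_opaque(unsigned char *out, const void *data,
			     uint32_t len)
{
	size_t pad = (4 - (len & 3)) & 3;
	size_t n = toy_put_be32(out, len);

	if (len)
		memcpy(out + n, data, len);
	memset(out + n + len, 0, pad);
	return n + len + pad;
}

/* device_addr4: layout type followed by the (possibly empty) body. */
static size_t toy_put_device_addr4(unsigned char *out, uint32_t layout_type,
				   const void *body, uint32_t body_len)
{
	size_t n = toy_put_be32(out, layout_type);

	return n + toy_put_opaque(out + n, body, body_len);
}
```

With body_len set to zero the encoder still emits eight bytes: the layout type and a zero length word. A decoder therefore finds gdir_notification at the offset it expects.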

Fixes: 9cf514ccfacb ("nfsd: implement pNFS operations")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Tom Haynes <loghyr@gmail.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 50bce06f 19-Jul-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Report zero space limit for write delegations

Replace the -1 (no limit) with a zero (no reserved space).

This prevents certain non-deterministic client behavior, such as
silly-renaming a file when the only open reference is a write
delegation. Such a rename can leave unexpected .nfs files in a
directory that is otherwise supposed to be empty.

Note that other server implementations that support write delegation
also set this field to zero.

Suggested-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# fd19ca36 29-Jun-2023 Dai Ngo <dai.ngo@oracle.com>

NFSD: handle GETATTR conflict with write delegation

If a GETATTR request targets a file that has a write delegation in
effect, and the requested attributes include the change info and size
attributes, then the write delegation is recalled. If the delegation is
returned within 30ms, the GETATTR is serviced as normal; otherwise the
NFS4ERR_DELAY error is returned for the GETATTR.

Add a counter for write delegation recalls caused by a conflicting
GETATTR. This is used to evaluate the need to implement CB_GETATTR to
avoid recalling the delegation on a conflicting GETATTR.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# d7dbed45 23-Jun-2023 Tavian Barnes <tavianator@tavianator.com>

nfsd: Fix creation time serialization order

In nfsd4_encode_fattr(), TIME_CREATE was being written out after all
other times. However, they should be written out in an order that
matches the bit flags in bmval1, which in this case are

#define FATTR4_WORD1_TIME_ACCESS (1UL << 15)
#define FATTR4_WORD1_TIME_CREATE (1UL << 18)
#define FATTR4_WORD1_TIME_DELTA (1UL << 19)
#define FATTR4_WORD1_TIME_METADATA (1UL << 20)
#define FATTR4_WORD1_TIME_MODIFY (1UL << 21)

so TIME_CREATE should come second.

I noticed this on a FreeBSD NFSv4.2 client, which supports creation
times. On this client, file times were weirdly permuted. With this
patch applied on the server, times looked normal on the client.
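
The ordering rule can be illustrated with a small sketch: attribute values follow the bitmap in ascending bit order, so walking the mask from bit 0 upward yields the required encoding order. The function toy_attr_encode_order() is hypothetical; the bit positions are the ones quoted above.

```c
#include <stddef.h>
#include <stdint.h>

#define FATTR4_WORD1_TIME_ACCESS	(1UL << 15)
#define FATTR4_WORD1_TIME_CREATE	(1UL << 18)
#define FATTR4_WORD1_TIME_DELTA		(1UL << 19)
#define FATTR4_WORD1_TIME_METADATA	(1UL << 20)
#define FATTR4_WORD1_TIME_MODIFY	(1UL << 21)

/*
 * Fill order[] with the bit numbers set in bmval1, lowest first.
 * order[] must have room for 32 entries. Returns the entry count.
 */
static size_t toy_attr_encode_order(uint32_t bmval1, unsigned int *order)
{
	size_t n = 0;
	unsigned int bit;

	for (bit = 0; bit < 32; bit++)
		if (bmval1 & (UINT32_C(1) << bit))
			order[n++] = bit;
	return n;
}
```

For a mask containing TIME_ACCESS, TIME_CREATE and TIME_MODIFY this produces 15, 18, 21, which places TIME_CREATE second, as the commit message states.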

Fixes: e377a3e698fb ("nfsd: Add support for the birth time attribute")
Link: https://unix.stackexchange.com/q/749605/56202
Signed-off-by: Tavian Barnes <tavianator@tavianator.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 26217679 12-Jun-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add an nfsd4_encode_nfstime4() helper

Clean up: de-duplicate some common code.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Tom Talpey <tom@talpey.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 58f5d894 06-Jun-2023 Dai Ngo <dai.ngo@oracle.com>

NFSD: add encoding of op_recall flag for write delegation

Modified nfsd4_encode_open to encode the op_recall flag properly
for OPEN result with write delegation granted.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: stable@vger.kernel.org


# 703d7521 18-May-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Hoist rq_vec preparation into nfsd_read() [step two]

Now that the preparation of an rq_vec has been removed from the
generic read path, nfsd_splice_read() no longer needs to reset
rq_next_page.

nfsd4_encode_read() calls nfsd_splice_read() directly. As far as I
can ascertain, resetting rq_next_page for NFSv4 splice reads is
unnecessary because rq_next_page is already set correctly.

Moreover, resetting it might even be incorrect if previous
operations in the COMPOUND have already consumed at least a page of
the send buffer. I would expect that the result would be encoding
the READ payload over previously-encoded results.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# ed4a567a 18-May-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Update rq_next_page between COMPOUND operations

A GETATTR with a large result can advance xdr->page_ptr without
updating rq_next_page. If a splice READ follows that GETATTR in the
COMPOUND, nfsd_splice_actor can start splicing at the wrong page.

I've also seen READLINK and READDIR leave rq_next_page in an
unmodified state.

There are potentially a myriad of combinations like this, so play it
safe: move the rq_next_page update to nfsd4_encode_operation.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# ba21e20b 18-May-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Use svcxdr_encode_opaque_pages() in nfsd4_encode_splice_read()

Commit 15b23ef5d348 ("nfsd4: fix corruption of NFSv4 read data")
encountered exactly the same issue: after a splice read, a
filesystem-owned page is left in rq_pages[]; the symptoms are the
same as described there.

If the computed number of pages in nfsd4_encode_splice_read() is not
exactly the same as the actual number of pages that were consumed by
nfsd_splice_actor() (say, because of a bug) then hilarity ensues.

Instead of recomputing the page offset based on the size of the
payload, use rq_next_page, which is already properly updated by
nfsd_splice_actor(), to cause svc_rqst_release_pages() to operate
correctly in every instance.

This is a defensive change since we believe that after commit
27c934dd8832 ("nfsd: don't replace page in rq_pages if it's a
continuation of last page") has been applied, there are no known
opportunities for nfsd_splice_actor() to screw up. So I'm not
marking it for stable backport.

Reported-by: Andy Zlotek <andy.zlotek@oracle.com>
Suggested-by: Calum Mackay <calum.mackay@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 66a21db7 16-May-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace encode_cinfo()

De-duplicate "reserve_space; encode_cinfo".

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# adaa7a50 16-May-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add encoders for NFSv4 clientids and verifiers

Deduplicate some common code.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 831be973 01-Feb-2023 Christian Brauner <brauner@kernel.org>

xattr: remove unused argument

This helper is really just used to check for user.* xattr support, so
don't make it pointlessly generic.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>


# 15a8b55d 27-Mar-2023 Jeff Layton <jlayton@kernel.org>

nfsd: call op_release, even when op_func returns an error

For ops with "trivial" replies, nfsd4_encode_operation will shortcut
most of the encoding work and skip to just marshalling up the status.
One of the things it skips is calling op_release. This could cause a
memory leak in the layoutget codepath if there is an error at an
inopportune time.

Have the compound processing engine always call op_release, even when
op_func sets an error in op->status. With this change, we also need
nfsd4_block_get_device_info_scsi to set the gd_device pointer to NULL
on error to avoid a double free.

Reported-by: Zhi Li <yieli@redhat.com>
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2181403
Fixes: 34b1744c91cc ("nfsd4: define ->op_release for compound ops")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 804d8e0a 31-Mar-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Avoid calling OPDESC() with ops->opnum == OP_ILLEGAL

OPDESC() simply indexes into nfsd4_ops[] by the op's operation
number, without range checking that value. It assumes callers are
careful to avoid calling it with an out-of-bounds opnum value.

nfsd4_decode_compound() is not so careful, and can invoke OPDESC()
with opnum set to OP_ILLEGAL, which is 10044 -- well beyond the end
of nfsd4_ops[].
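
One way to guard such a table lookup is sketched below. This is not the change made by this commit, which is not shown in the log; it only illustrates why an unchecked index of 10044 is a problem. The names toy_ops, toy_opdesc() and TOY_LAST_OP are hypothetical, and the table is truncated to two entries.

```c
#include <stddef.h>

enum {
	OP_ACCESS = 3,
	OP_CLOSE = 4,
	TOY_LAST_OP = OP_CLOSE,	/* last index present in the toy table */
	OP_ILLEGAL = 10044
};

struct toy_opdesc {
	const char *name;
};

static const struct toy_opdesc toy_ops[TOY_LAST_OP + 1] = {
	[OP_ACCESS] = { "OP_ACCESS" },
	[OP_CLOSE]  = { "OP_CLOSE" },
};

/* Return the descriptor for opnum, or NULL if it is out of range
 * or has no entry. */
static const struct toy_opdesc *toy_opdesc(unsigned int opnum)
{
	if (opnum > TOY_LAST_OP || !toy_ops[opnum].name)
		return NULL;
	return &toy_ops[opnum];
}
```

A caller that receives NULL can treat the operation as illegal instead of reading memory far beyond the end of the array.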

Reported-by: Jeff Layton <jlayton@kernel.org>
Fixes: f4f9ef4a1b0a ("nfsd4: opdesc will be useful outside nfs4proc.c")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 638e3e7d 07-Sep-2022 Jeff Layton <jlayton@kernel.org>

nfsd: use the getattr operation to fetch i_version

Now that we can call into vfs_getattr to get the i_version field, use
that facility to fetch it instead of doing it in nfsd4_change_attribute.

Neil also pointed out recently that IS_I_VERSION directory operations
are always logged, and so we only need to mitigate the rollback problem
on regular files. Also, we don't need to factor in the ctime when
reexporting NFS or Ceph.

Set the STATX_CHANGE_COOKIE (and BTIME) bits in the request when we're
dealing with a v4 request. Then, instead of looking at IS_I_VERSION when
generating the change attr, look at the result mask and only use it if
STATX_CHANGE_COOKIE is set.

Change nfsd4_change_attribute to only factor in the ctime if it's a
regular file and the fs doesn't advertise STATX_ATTR_CHANGE_MONOTONIC.

Acked-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jeff Layton <jlayton@kernel.org>


# 7827c81f 05-Jan-2023 Chuck Lever <chuck.lever@oracle.com>

Revert "SUNRPC: Use RMW bitops in single-threaded hot paths"

The premise that "Once an svc thread is scheduled and executing an
RPC, no other processes will touch svc_rqst::rq_flags" is false.
svc_xprt_enqueue() examines the RQ_BUSY flag in scheduled nfsd
threads when determining which thread to wake up next.

Found via KCSAN.

Fixes: 28df0988815f ("SUNRPC: Use RMW bitops in single-threaded hot paths")
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# cad85337 13-Dec-2022 Jeff Layton <jlayton@kernel.org>

nfsd: fix handling of readdir in v4root vs. mount upcall timeout

If v4 READDIR operation hits a mountpoint and gets back an error,
then it will include that entry in the reply and set RDATTR_ERROR for it
to the error.

That's fine for "normal" exported filesystems, but on the v4root, we
need to be more careful to only expose the existence of dentries that
lead to exports.

If the mountd upcall times out while checking to see whether a
mountpoint on the v4root is exported, then we have no recourse other
than to fail the whole operation.

Cc: Steve Dickson <steved@redhat.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216777
Reported-by: JianHong Yin <yin-jianhong@163.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: <stable@vger.kernel.org>


# e78e274e 02-Dec-2022 Kees Cook <keescook@chromium.org>

NFSD: Avoid clashing function prototypes

When built with Control Flow Integrity, function prototypes between
caller and function declaration must match. These mismatches are visible
at compile time with the new -Wcast-function-type-strict in Clang[1].

There were 97 warnings produced by NFS. For example:

fs/nfsd/nfs4xdr.c:2228:17: warning: cast from '__be32 (*)(struct nfsd4_compoundargs *, struct nfsd4_access *)' (aka 'unsigned int (*)(struct nfsd4_compoundargs *, struct nfsd4_access *)') to 'nfsd4_dec' (aka 'unsigned int (*)(struct nfsd4_compoundargs *, void *)') converts to incompatible function type [-Wcast-function-type-strict]
[OP_ACCESS] = (nfsd4_dec)nfsd4_decode_access,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The enc/dec callbacks were defined as passing "void *" as the second
argument, but were being implicitly cast to a new type. Replace the
argument with union nfsd4_op_u, and perform explicit member selection
in the function body. There are no resulting binary differences.
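
The resulting pattern can be sketched as follows. All toy_* names are hypothetical stand-ins for the nfsd4 types: every callback has exactly the prototype of the function-pointer type, and each one selects its own union member explicitly, so the table needs no casts.

```c
#include <stdint.h>

struct toy_access {
	uint32_t ac_req_access;
};

struct toy_close {
	uint32_t cl_seqid;
};

union toy_op_u {
	struct toy_access access;
	struct toy_close close;
};

/* Every decoder shares this exact prototype. */
typedef uint32_t (*toy_dec)(const uint32_t *wire, union toy_op_u *u);

static uint32_t toy_decode_access(const uint32_t *wire, union toy_op_u *u)
{
	struct toy_access *access = &u->access;	/* explicit member selection */

	access->ac_req_access = wire[0];
	return 0;
}

static uint32_t toy_decode_close(const uint32_t *wire, union toy_op_u *u)
{
	struct toy_close *close = &u->close;

	close->cl_seqid = wire[0];
	return 0;
}

enum { TOY_OP_ACCESS, TOY_OP_CLOSE, TOY_OP_MAX };

/* No casts: the initializers already have type toy_dec. */
static const toy_dec toy_dec_ops[TOY_OP_MAX] = {
	[TOY_OP_ACCESS] = toy_decode_access,
	[TOY_OP_CLOSE]  = toy_decode_close,
};
```

Because the indirect call type and the callee type match, a CFI-enabled build has nothing to complain about, and the compiler can check each table entry.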

Changes were made mechanically using the following Coccinelle script,
with minor by-hand fixes for members that didn't already match their
existing argument name:

@find@
identifier func;
type T, opsT;
identifier ops, N;
@@

opsT ops[] = {
[N] = (T) func,
};

@already_void@
identifier find.func;
identifier name;
@@

func(...,
-void
+union nfsd4_op_u
*name)
{
...
}

@proto depends on !already_void@
identifier find.func;
type T;
identifier name;
position p;
@@

func@p(...,
T name
) {
...
}

@script:python get_member@
type_name << proto.T;
member;
@@

coccinelle.member = cocci.make_ident(type_name.split("_", 1)[1].split(' ',1)[0])

@convert@
identifier find.func;
type proto.T;
identifier proto.name;
position proto.p;
identifier get_member.member;
@@

func@p(...,
- T name
+ union nfsd4_op_u *u
) {
+ T name = &u->member;
...
}

@cast@
identifier find.func;
type T, opsT;
identifier ops, N;
@@

opsT ops[] = {
[N] =
- (T)
func,
};

Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
Cc: linux-nfs@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# eeadcb75 13-Sep-2022 Anna Schumaker <Anna.Schumaker@Netapp.com>

NFSD: Simplify READ_PLUS

Chuck had suggested reverting READ_PLUS so it returns a single DATA
segment covering the requested read range. This prepares the server for
a future "sparse read" function so support can easily be added without
needing to rip out the old READ_PLUS code at the same time.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 9993a663 12-Sep-2022 Chuck Lever <chuck.lever@oracle.com>

NFSD: Clean up nfs4svc_encode_compoundres()

In today's Linux NFS server implementation, the NFS dispatcher
initializes each XDR result stream, and the NFSv4 .pc_func and
.pc_encode methods all use xdr_stream-based encoding. This keeps
rq_res.len automatically updated. There is no longer a need for
the WARN_ON_ONCE() check in nfs4svc_encode_compoundres().

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 3fdc5464 12-Sep-2022 Chuck Lever <chuck.lever@oracle.com>

NFSD: Reduce amount of struct nfsd4_compoundargs that needs clearing

Have SunRPC clear everything except for the iops array. Then have
each NFSv4 XDR decoder clear its own argument before decoding.

Now individual operations may have a large argument struct while not
penalizing the vast majority of operations with a small struct.

And, clearing the argument structure occurs as the argument fields
are initialized, enabling the CPU to do write combining on that
memory. In some cases, clearing is not even necessary because all
of the fields in the argument structure are initialized by the
decoder.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 06981d56 13-Sep-2022 Anna Schumaker <Anna.Schumaker@Netapp.com>

NFSD: Return nfserr_serverfault if splice_ok but buf->pages have data

This was discussed with Chuck as part of this patch set. Returning
nfserr_resource was decided to not be the best error message here, and
he suggested changing to nfserr_serverfault instead.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Link: https://lore.kernel.org/linux-nfs/20220907195259.926736-1-anna@kernel.org/T/#t
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 6106d911 07-Sep-2022 Jeff Layton <jlayton@kernel.org>

nfsd: clean up mounted_on_fileid handling

We only need the inode number for this, not a full rack of attributes.
Rename this function, make it take a pointer to a u64 instead of
struct kstat, and change it to just request STATX_INO.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
[ cel: renamed get_mounted_on_ino() ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 7518a3dc 05-Sep-2022 Chuck Lever <chuck.lever@oracle.com>

NFSD: Fix handling of oversized NFSv4 COMPOUND requests

If an NFS server returns NFS4ERR_RESOURCE on the first operation in
an NFSv4 COMPOUND, there's no way for a client to know where the
problem is and then simplify the compound to make forward progress.

So instead, make NFSD process as many operations in an oversized
COMPOUND as it can and then return NFS4ERR_RESOURCE on the first
operation it did not process.
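
The behavior described above can be sketched as a loop that processes operations up to a limit and reports the error on the first one it skips. TOY_MAX_OPS and toy_process_compound() are hypothetical; only the status value 10018 for NFS4ERR_RESOURCE comes from the NFSv4 specification.

```c
#include <stddef.h>

enum toy_status {
	TOY_OK = 0,
	TOY_ERR_RESOURCE = 10018	/* NFS4ERR_RESOURCE */
};

#define TOY_MAX_OPS 4	/* illustrative limit, not the kernel's value */

/*
 * Process up to TOY_MAX_OPS operations. The first operation beyond
 * the limit gets TOY_ERR_RESOURCE and ends the COMPOUND.
 * status[] needs room for min(opcnt, TOY_MAX_OPS + 1) entries.
 * Returns the number of results placed in status[].
 */
static size_t toy_process_compound(size_t opcnt, enum toy_status *status)
{
	size_t i;

	for (i = 0; i < opcnt; i++) {
		if (i == TOY_MAX_OPS) {
			status[i] = TOY_ERR_RESOURCE;
			return i + 1;
		}
		status[i] = TOY_OK;	/* pretend the operation succeeded */
	}
	return i;
}
```

A client that sees results for the first four operations followed by the error knows exactly where to split the COMPOUND on retry.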

pynfs NFSv4.0 COMP6 exercises this case, but checks only for the
COMPOUND status code, not whether the server has processed any
of the operations.

pynfs NFSv4.1 SEQ6 and SEQ7 exercise the NFSv4.1 case, which detects
too many operations per COMPOUND by checking against the limits
negotiated when the session was created.

Suggested-by: Bruce Fields <bfields@fieldses.org>
Fixes: 0078117c6d91 ("nfsd: return RESOURCE not GARBAGE_ARGS on too many ops")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 80e591ce 02-Sep-2022 Chuck Lever <chuck.lever@oracle.com>

NFSD: Increase NFSD_MAX_OPS_PER_COMPOUND

When attempting an NFSv4 mount, a Solaris NFSv4 client builds a
single large COMPOUND that chains a series of LOOKUPs to get to the
pseudo filesystem root directory that is to be mounted. The Linux
NFS server's current maximum of 16 operations per NFSv4 COMPOUND is
not large enough to ensure that this works for paths that are more
than a few components deep.

Since NFSD_MAX_OPS_PER_COMPOUND is mostly a sanity check, and most
NFSv4 COMPOUNDS are between 3 and 6 operations (thus they do not
trigger any re-allocation of the operation array on the server),
increasing this maximum should result in little to no impact.

The ops array can get large now, so allocate it via vmalloc() to
help ensure memory fragmentation won't cause an allocation failure.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=216383
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 1913cdf5 27-Jul-2022 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace boolean fields in struct nfsd4_copy

Clean up: saves 8 bytes, and we can replace check_and_set_stop_copy()
with an atomic bitop.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 87689df6 27-Jul-2022 Chuck Lever <chuck.lever@oracle.com>

NFSD: Shrink size of struct nfsd4_copy

struct nfsd4_copy is part of struct nfsd4_op, which resides in an
8-element array.

sizeof(struct nfsd4_op):
Before: /* size: 1696, cachelines: 27, members: 5 */
After: /* size: 672, cachelines: 11, members: 5 */

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 09426ef2 27-Jul-2022 Chuck Lever <chuck.lever@oracle.com>

NFSD: Shrink size of struct nfsd4_copy_notify

struct nfsd4_copy_notify is part of struct nfsd4_op, which resides
in an 8-element array.

sizeof(struct nfsd4_op):
Before: /* size: 2208, cachelines: 35, members: 5 */
After: /* size: 1696, cachelines: 27, members: 5 */

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# bb4d8427 27-Jul-2022 Chuck Lever <chuck.lever@oracle.com>

NFSD: nfserrno(-ENOMEM) is nfserr_jukebox

Suggested-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 99b002a1 22-Jul-2022 Chuck Lever <chuck.lever@oracle.com>

NFSD: Clean up nfsd4_encode_readlink()

Similar changes to nfsd4_encode_readv(), all bundled into a single
patch.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 5e64d85c 22-Jul-2022 Chuck Lever <chuck.lever@oracle.com>

NFSD: Use xdr_pad_size()

Clean up: Use a helper instead of open-coding the calculation of
the XDR pad size.
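
For reference, the calculation in question rounds a length up to the next multiple of four bytes. The function below is a stand-alone reimplementation for illustration, named toy_xdr_pad_size() to make clear that it is not the kernel helper itself.

```c
#include <stddef.h>

/* Number of zero bytes needed to pad n bytes to a 4-byte boundary. */
static size_t toy_xdr_pad_size(size_t n)
{
	return (4 - (n & 3)) & 3;
}
```

The outer mask makes already-aligned lengths yield zero rather than four.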

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 071ae99f 22-Jul-2022 Chuck Lever <chuck.lever@oracle.com>

NFSD: Simplify starting_len

Clean-up: Now that nfsd4_encode_readv() does not have to encode the
EOF or rd_length values, it no longer needs to subtract 8 from
@starting_len.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 28d5bc46 22-Jul-2022 Chuck Lever <chuck.lever@oracle.com>

NFSD: Optimize nfsd4_encode_readv()

write_bytes_to_xdr_buf() is pretty expensive to use for inserting
an XDR data item that is always 1 XDR_UNIT at an address that is
always XDR word-aligned.

Since both the readv and splice read paths encode EOF and maxcount
values, move both to a common code path.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 24c7fb85 22-Jul-2022 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add an nfsd4_read::rd_eof field

Refactor: Make the EOF result available in the entire NFSv4 READ
path.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# c738b218 22-Jul-2022 Chuck Lever <chuck.lever@oracle.com>

NFSD: Clean up SPLICE_OK in nfsd4_encode_read()

Do the test_bit() once -- this reduces the number of locked-bus
operations and makes the function a little easier to read.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# ab04de60 22-Jul-2022 Chuck Lever <chuck.lever@oracle.com>

NFSD: Optimize nfsd4_encode_fattr()

write_bytes_to_xdr_buf() is a generic way to place a variable-length
data item in an already-reserved spot in the encoding buffer.

However, it is costly. In nfsd4_encode_fattr(), it is unnecessary
because the data item is fixed in size and the buffer destination
address is always word-aligned.
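
The optimization can be sketched as follows: when a length word is reserved at a word-aligned position, a pointer to that slot can be kept and filled in with a plain store once the length is known. The names toy_cpu_to_be32() and toy_encode_attrs() are hypothetical, and this user-space sketch only mirrors the shape of the kernel code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Portable conversion to big-endian, done via a byte array. */
static uint32_t toy_cpu_to_be32(uint32_t v)
{
	unsigned char b[4] = {
		(unsigned char)(v >> 24), (unsigned char)(v >> 16),
		(unsigned char)(v >> 8), (unsigned char)v
	};
	uint32_t r;

	memcpy(&r, b, sizeof(r));
	return r;
}

/*
 * Encode a length-prefixed list of 32-bit values into words[].
 * Returns the number of words written.
 */
static size_t toy_encode_attrs(uint32_t *words, const uint32_t *vals,
			       size_t nvals)
{
	uint32_t *lenp = words++;	/* reserve the length slot */
	size_t i;

	for (i = 0; i < nvals; i++)
		*words++ = toy_cpu_to_be32(vals[i]);

	/* Fixed size, aligned slot: a direct store is sufficient. */
	*lenp = toy_cpu_to_be32((uint32_t)(nvals * sizeof(uint32_t)));
	return nvals + 1;
}
```

No generic byte-copy routine is involved in back-filling the length; the saved pointer already addresses the destination.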

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 095a764b 22-Jul-2022 Chuck Lever <chuck.lever@oracle.com>

NFSD: Optimize nfsd4_encode_operation()

write_bytes_to_xdr_buf() is a generic way to place a variable-length
data item in an already-reserved spot in the encoding buffer.
However, it is costly, and here, it is unnecessary because the
data item is fixed in size, the buffer destination address is
always word-aligned, and the destination location is already in
@p.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 5b2f3e07 10-Jul-2022 Chuck Lever <chuck.lever@oracle.com>

NFSD: Decode NFSv4 birth time attribute

NFSD has advertised support for the NFSv4 time_create attribute
since commit e377a3e698fb ("nfsd: Add support for the birth time
attribute").

Igor Mammedov reports that Mac OS clients attempt to set the NFSv4
birth time attribute via OPEN(CREATE) and SETATTR if the server
indicates that it supports it, but since the above commit was
merged, those attempts now fail.

Table 5 in RFC 8881 lists the time_create attribute as one that can
be both set and retrieved, but the above commit did not add server
support for clients to provide a time_create attribute. IMO that's
a bug in our implementation of the NFSv4 protocol, which this commit
addresses.

Whether NFSD silently ignores the new birth time or actually sets it
is another matter. I haven't found another filesystem service in the
Linux kernel that enables users or clients to modify a file's birth
time attribute.

This commit reflects my (perhaps incorrect) understanding of whether
Linux users can set a file's birth time. NFSD will now recognize a
time_create attribute but it ignores its value. It clears the
time_create bit in the returned attribute bitmask to indicate that
the value was not used.

Reported-by: Igor Mammedov <imammedo@redhat.com>
Fixes: e377a3e698fb ("nfsd: Add support for the birth time attribute")
Tested-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 28df0988 29-Apr-2022 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Use RMW bitops in single-threaded hot paths

I noticed CPU pipeline stalls while using perf.

Once an svc thread is scheduled and executing an RPC, no other
processes will touch svc_rqst::rq_flags. Thus bus-locked atomics are
not needed outside the svc thread scheduler.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# e377a3e6 11-Jan-2022 Ondrej Valousek <ondrej.valousek.xm@renesas.com>

nfsd: Add support for the birth time attribute

For filesystems that support the "btime" timestamp (most modern
filesystems do), we share it via kernel nfsd. Btime support for the NFS
client has already been added by Trond recently.

Suggested-by: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Ondrej Valousek <ondrej.valousek.xm@renesas.com>
[ cel: addressed some whitespace/checkpatch nits ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# c306d737 25-Jan-2022 Chuck Lever <chuck.lever@oracle.com>

NFSD: Deprecate NFS_OFFSET_MAX

NFS_OFFSET_MAX was introduced way back in Linux v2.3.y before there
was a kernel-wide OFFSET_MAX value. As a clean up, replace the last
few uses of it with its generic equivalent, and get rid of it.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 0cb4d23a 04-Feb-2022 Chuck Lever <chuck.lever@oracle.com>

NFSD: Fix the behavior of READ near OFFSET_MAX

Dan Aloni reports:
> Due to commit 8cfb9015280d ("NFS: Always provide aligned buffers to
> the RPC read layers") on the client, a read of 0xfff is aligned up
> to server rsize of 0x1000.
>
> As a result, in a test where the server has a file of size
> 0x7fffffffffffffff, and the client tries to read from the offset
> 0x7ffffffffffff000, the read causes loff_t overflow in the server
> and it returns an NFS code of EINVAL to the client. The client as
> a result indefinitely retries the request.

The Linux NFS client does not handle NFS?ERR_INVAL, even though all
NFS specifications permit servers to return that status code for a
READ.

Instead of NFS?ERR_INVAL, have out-of-range READ requests succeed
and return a short result. Set the EOF flag in the result to prevent
the client from retrying the READ request. This behavior appears to
be consistent with Solaris NFS servers.

Note that NFSv3 and NFSv4 use u64 offset values on the wire. These
must be converted to loff_t internally before use -- an implicit
type cast is not adequate for this purpose. Otherwise VFS checks
against sb->s_maxbytes do not work properly.
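
As a rough userspace illustration of the conversion described above, the
sketch below clamps an on-the-wire unsigned 64-bit offset to the largest
signed file offset instead of casting it. The names SKETCH_OFFSET_MAX and
sketch_wire_offset_to_loff() are hypothetical and do not appear in the
kernel; this is not claimed to be the code the patch adds.

```c
#include <stdint.h>

/* Assumed stand-in for the kernel's OFFSET_MAX (largest loff_t value). */
#define SKETCH_OFFSET_MAX INT64_MAX

/*
 * Hypothetical helper: convert a u64 wire offset to a signed offset.
 * Values above the signed maximum are clamped rather than cast, so the
 * result can never turn negative.
 */
static int64_t sketch_wire_offset_to_loff(uint64_t wire)
{
	if (wire > (uint64_t)SKETCH_OFFSET_MAX)
		return SKETCH_OFFSET_MAX;
	return (int64_t)wire;
}
```

With a clamp like this, later range checks see a large positive offset
rather than a wrapped negative one.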

Reported-by: Dan Aloni <dan.aloni@vastdata.com>
Cc: stable@vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# cd2e999c 13-Dec-2021 Chuck Lever <chuck.lever@oracle.com>

NFSD: De-duplicate nfsd4_decode_bitmap4()

Clean up. Trond points out that xdr_stream_decode_uint32_array()
does the same thing as nfsd4_decode_bitmap4().

Suggested-by: Trond Myklebust <trondmy@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 1e37d0e5 02-Dec-2021 Jiapeng Chong <jiapeng.chong@linux.alibaba.com>

NFSD: Fix inconsistent indenting

Eliminate the following smatch warning:

fs/nfsd/nfs4xdr.c:4766 nfsd4_encode_read_plus_hole() warn: inconsistent
indenting.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# c0019b7d 14-Nov-2021 Chuck Lever <chuck.lever@oracle.com>

NFSD: Fix exposure in nfsd4_decode_bitmap()

rtm@csail.mit.edu reports:
> nfsd4_decode_bitmap4() will write beyond bmval[bmlen-1] if the RPC
> directs it to do so. This can cause nfsd4_decode_state_protect4_a()
> to write client-supplied data beyond the end of
> nfsd4_exchange_id.spo_must_allow[] when called by
> nfsd4_decode_exchange_id().

Rewrite the loops so nfsd4_decode_bitmap() cannot iterate beyond
@bmlen.
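
A minimal userspace sketch of a bounded bitmap decode follows. It assumes
the input is an array of host-order words whose first word is the count
(real XDR data is big-endian and would need byte swapping), and
sketch_decode_bitmap() is a hypothetical name, not the kernel function.
The point it shows is that the loop index is limited by the destination
length, never by the count the peer supplied.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical decoder. src[0] holds the number of bitmap words that
 * follow. At most bmlen words are stored in bmval; missing words are
 * zero-filled and extra words from the peer are ignored.
 * Returns 0 on success, -1 if src is too short for its own count.
 */
static int sketch_decode_bitmap(const uint32_t *src, size_t src_words,
				uint32_t *bmval, uint32_t bmlen)
{
	uint32_t count, i;

	if (src_words < 1)
		return -1;
	count = src[0];
	if ((size_t)count > src_words - 1)
		return -1;

	/* The loop bound is the destination size, not the wire count. */
	for (i = 0; i < bmlen; i++)
		bmval[i] = (i < count) ? src[1 + i] : 0;
	return 0;
}
```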

Reported-by: rtm@csail.mit.edu
Fixes: d1c263a031e8 ("NFSD: Replace READ* macros in nfsd4_decode_fattr()")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 130e2054 13-Oct-2021 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Change return value type of .pc_encode

Returning an undecorated integer is an age-old trope, but it's
not clear (even to previous experts in this code) that the only
valid return values are 1 and 0. These functions do not return
a negative errno, rpc_stat value, or a positive length.

Document there are only two valid return values by having
.pc_encode return only true or false.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# fda49441 13-Oct-2021 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Replace the "__be32 *p" parameter to .pc_encode

The passed-in value of the "__be32 *p" parameter is now unused in
every server-side XDR encoder, and can be removed.

Note also that there is a line in each encoder that sets up a local
pointer to a struct xdr_stream. Passing that pointer from the
dispatcher instead saves one line per encoder function.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 3b0ebb25 13-Oct-2021 Chuck Lever <chuck.lever@oracle.com>

NFSD: Save location of NFSv4 COMPOUND status

Refactor: Currently nfs4svc_encode_compoundres() relies on the NFS
dispatcher to pass in the buffer location of the COMPOUND status.
Instead, save that buffer location in struct nfsd4_compoundres.

The compound tag follows immediately after.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# c44b31c2 12-Oct-2021 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Change return value type of .pc_decode

Returning an undecorated integer is an age-old trope, but it's
not clear (even to previous experts in this code) that the only
valid return values are 1 and 0. These functions do not return
a negative errno, rpc_stat value, or a positive length.

Document there are only two valid return values by having
.pc_decode return only true or false.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 16c66364 12-Oct-2021 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Replace the "__be32 *p" parameter to .pc_decode

The passed-in value of the "__be32 *p" parameter is now unused in
every server-side XDR decoder, and can be removed.

Note also that there is a line in each decoder that sets up a local
pointer to a struct xdr_stream. Passing that pointer from the
dispatcher instead saves one line per decoder function.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# d8b26071 01-Sep-2021 NeilBrown <neilb@suse.de>

NFSD: simplify struct nfsfh

Most of the fields in 'struct knfsd_fh' are 2 levels deep (a union and a
struct) and are accessed using macros like:

#define fh_FOO fh_base.fh_new.fb_FOO

This patch makes the union and struct anonymous, so that "fh_FOO" can be
a name directly within 'struct knfsd_fh' and the #defines aren't needed.

The file handle as a whole is sometimes accessed as "fh_base" or
"fh_base.fh_pad", neither of which are particularly helpful names.
As the struct holding the filehandle is now anonymous, we
cannot use the name of that, so we union it with 'fh_raw' and use that
where the raw filehandle is needed. fh_raw also ensures the structure is
large enough for the largest possible filehandle.

fh_raw is a 'char' array, removing any need to cast it for memcpy etc.

SVCFH_fmt() is simplified using the "%ph" printk format. This
changes the appearance of filehandles in dprintk() debugging, making
them a little more precise.
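
The layout idea can be sketched in standalone C11 as below. The structure
name, the size constant, and the reduced field list are illustrative
assumptions, not the kernel definition. The sketch shows an anonymous
struct inside an anonymous union, overlaid with a char array that both
gives raw access and fixes the overall size.

```c
#include <string.h>

#define SKETCH_FHSIZE 128	/* assumed maximum filehandle size */

struct sketch_fh {
	unsigned int fh_size;	/* number of valid bytes in the handle */
	union {
		/* Raw view: a char array, so memcpy() needs no cast. */
		char fh_raw[SKETCH_FHSIZE];
		/* Structured view: members are reachable as fh.fh_version etc. */
		struct {
			unsigned char fh_version;
			unsigned char fh_auth_type;
			unsigned char fh_fsid_type;
			unsigned char fh_fileid_type;
		};
	};
};

/* Hypothetical helper: copy the raw bytes of one handle into another. */
static void sketch_fh_copy(struct sketch_fh *dst, const struct sketch_fh *src)
{
	dst->fh_size = src->fh_size;
	memcpy(dst->fh_raw, src->fh_raw, sizeof(dst->fh_raw));
}
```

Because the inner struct and the union are anonymous, no accessor macros
are needed to reach the nested members.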

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# f2e717d6 30-Sep-2021 Trond Myklebust <trond.myklebust@hammerspace.com>

nfsd4: Handle the NFSv4 READDIR 'dircount' hint being zero

RFC3530 notes that the 'dircount' field may be zero, in which case the
recommendation is to ignore it, and only enforce the 'maxcount' field.
In RFC5661, this recommendation to ignore a zero valued field becomes a
requirement.
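
The rule above can be sketched as a small predicate. The function name
and parameters are hypothetical, and the kernel's actual enforcement
logic is not reproduced here; the sketch only shows that a zero hint
disables the dircount limit.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: has the directory data encoded so far exceeded
 * the client's dircount hint? A hint of zero means "no hint", so the
 * limit is never considered exceeded and only maxcount applies.
 */
static bool sketch_dircount_exceeded(uint32_t dircount, uint32_t bytes_used)
{
	if (dircount == 0)
		return false;
	return bytes_used > dircount;
}
```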

Fixes: aee377644146 ("nfsd4: fix rd_dircount enforcement")
Cc: <stable@vger.kernel.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# fa60ce2c 06-May-2021 Masahiro Yamada <masahiroy@kernel.org>

treewide: remove editor modelines and cruft

The section "19) Editor modelines and other cruft" in
Documentation/process/coding-style.rst clearly says, "Do not include any
of these in source files."

I recently received a patch to explicitly add a new one.

Let's do treewide cleanups, otherwise some people follow the existing code
and attempt to upstream their favorite editor setups.

It is even nicer if scripts/checkpatch.pl can check it.

If we like to impose coding style in an editor-independent manner, I think
editorconfig (patch [1]) is a saner solution.

[1] https://lore.kernel.org/lkml/20200703073143.423557-1-danny@kdrag0n.dev/

Link: https://lkml.kernel.org/r/20210324054457.1477489-1-masahiroy@kernel.org
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Miguel Ojeda <ojeda@kernel.org> [auxdisplay]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# bddfdbcd 27-Oct-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Extract the svcxdr_init_encode() helper

NFSD initializes an encode xdr_stream only after the RPC layer has
already inserted the RPC Reply header. Thus it behaves differently
than xdr_init_encode does, which assumes the passed-in xdr_buf is
entirely devoid of content.

nfs4proc.c has this server-side stream initialization helper, but
it is visible only to the NFSv4 code. Move this helper to a place
that can be accessed by NFSv2 and NFSv3 server XDR functions.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 7b723008 17-Dec-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Restore NFSv4 decoding's SAVEMEM functionality

While converting the NFSv4 decoder to use xdr_stream-based XDR
processing, I removed the old SAVEMEM() macro. This macro wrapped
a bit of logic that avoided a memory allocation by recognizing when
the decoded item resides in a linear section of the Receive buffer.
In that case, it returned a pointer into that buffer instead of
allocating a bounce buffer.

The bounce buffer is necessary only when xdr_inline_decode() has
placed the decoded item in the xdr_stream's scratch buffer, which
disappears the next time xdr_inline_decode() is called with that
xdr_stream. That happens only if the data item crosses a page
boundary in the receive buffer, an exceedingly rare occurrence.

Allocating a bounce buffer every time results in a minor performance
regression that was introduced by the recent NFSv4 decoder overhaul.
Let's restore the previous behavior. On average, it saves about 1.5
kmalloc() calls per COMPOUND.
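
The decision described above can be sketched in userspace C as follows.
The structure and function names are hypothetical, malloc() stands in for
the kernel allocator, and the check assumes a decoded item that landed in
the scratch buffer starts at the beginning of that buffer.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a stream that owns one scratch buffer. */
struct sketch_stream {
	const void *scratch;
};

/*
 * Return a pointer to len bytes that stays valid after further decoding.
 * If p points into the receive buffer it is returned unchanged and
 * *bounce is NULL. If p is the scratch buffer, the bytes are copied into
 * a new allocation, which is returned and also stored in *bounce so the
 * caller can free it. Returns NULL if the allocation fails.
 */
static const void *sketch_savemem(const struct sketch_stream *xdr,
				  const void *p, size_t len, void **bounce)
{
	void *copy;

	*bounce = NULL;
	if (p != xdr->scratch)
		return p;	/* common case: no allocation needed */

	copy = malloc(len ? len : 1);
	if (!copy)
		return NULL;
	memcpy(copy, p, len);
	*bounce = copy;
	return copy;
}
```

The allocation happens only on the rare scratch-buffer path, which is
where the savings described above come from.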

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# b68f0cbd 10-Dec-2020 Trond Myklebust <trond.myklebust@hammerspace.com>

nfsd: Don't set eof on a truncated READ_PLUS

If the READ_PLUS operation was truncated due to an error, then ensure we
clear the 'eof' flag.

Fixes: 9f0b5792f07d ("NFSD: Encode a full READ_PLUS reply")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 72d78717 10-Dec-2020 Trond Myklebust <trond.myklebust@hammerspace.com>

nfsd: Fixes for nfsd4_encode_read_plus_data()

Ensure that we encode the data payload + padding, and that we truncate
the preallocated buffer to the actual read size.

Fixes: 528b84934eb9 ("NFSD: Add READ_PLUS data support")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 1631087b 30-Nov-2020 J. Bruce Fields <bfields@redhat.com>

Revert "nfsd4: support change_attr_type attribute"

This reverts commit a85857633b04d57f4524cca0a2bfaf87b2543f9f.

We're still factoring ctime into our change attribute even in the
IS_I_VERSION case. If someone sets the system time backwards, a client
could see the change attribute go backwards. Maybe we can just say
"well, don't do that", but there's some question whether that's good
enough, or whether we need a better guarantee.

Also, the client still isn't actually using the attribute.

While we're still figuring this out, let's just stop returning this
attribute.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# b2140338 30-Nov-2020 J. Bruce Fields <bfields@redhat.com>

nfsd: simplify nfsd4_change_info

It doesn't make sense to carry all these extra fields around. Just
make everything into change attribute from the start.

This is just cleanup, there should be no change in behavior.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 70b87f77 30-Nov-2020 J. Bruce Fields <bfields@redhat.com>

nfsd: only call inode_query_iversion in the I_VERSION case

inode_query_iversion() can modify i_version. Depending on the exported
filesystem, that may not be safe. For example, if you're re-exporting
NFS, NFS stores the server's change attribute in i_version and does not
expect it to be modified locally. This has been observed causing
unnecessary cache invalidations.

The way a filesystem indicates that it's OK to call
inode_query_iversion() is by setting SB_I_VERSION.

So, move the I_VERSION check out of encode_change(), where it's used
only in GETATTR responses, to nfsd4_change_attribute(), which is
also called for pre- and post- operation attributes.

(Note we could also pull the NFSEXP_V4ROOT case into
nfsd4_change_attribute() as well. That would actually be a no-op,
since pre/post attrs are only used for metadata-modifying operations,
and V4ROOT exports are read-only. But we might make the change in
the future just for simplicity.)

Reported-by: Daire Byrne <daire@dneg.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 5cfc822f 04-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Remove macros that are no longer used

Now that all the NFSv4 decoder functions have been converted to
make direct calls to the xdr helpers, remove the unused C macros.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# d9b74bda 04-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros in nfsd4_decode_compound()

And clean-up: Now that we have removed the DECODE_TAIL macro from
nfsd4_decode_compound(), we observe that there's no benefit for
nfsd4_decode_compound() to return nfs_ok or nfserr_bad_xdr only to
have its sole caller convert those values to one or zero,
respectively. Have nfsd4_decode_compound() return 1/0 instead.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 3a237b4a 21-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Make nfsd4_ops::opnum a u32

Avoid passing a "pointer to int" argument to xdr_stream_decode_u32.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 2212036c 04-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros in nfsd4_decode_listxattrs()

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 403366a7 04-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros in nfsd4_decode_setxattr()

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 830c7150 04-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros in nfsd4_decode_xattr_name()

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 3dfd0b0e 04-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros in nfsd4_decode_clone()

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 9d32b412 04-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros in nfsd4_decode_seek()

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 2846bb05 21-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros in nfsd4_decode_offload_status()

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# f9a953fb 21-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros in nfsd4_decode_copy_notify()

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# e8febea7 04-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros in nfsd4_decode_copy()

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# f49e4b4d 16-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros in nfsd4_decode_nl4_server()

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 6aef27aa 04-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros in nfsd4_decode_fallocate()

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 0d646784 03-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros in nfsd4_decode_reclaim_complete()

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# c95f2ec3 04-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros in nfsd4_decode_destroy_clientid()

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# b7a0c8f6 03-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros in nfsd4_decode_test_stateid()

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# cf907b11 03-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros in nfsd4_decode_sequence()

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 53d70873 03-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros in nfsd4_decode_secinfo_no_name()

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 645fcad3 04-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros in nfsd4_decode_layoutreturn()

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# c8e88e3a 03-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros in nfsd4_decode_layoutget()

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 5185980d 04-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros in nfsd4_decode_layoutcommit()

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 04495971 03-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros in nfsd4_decode_getdeviceinfo()

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# aec387d5 01-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros in nfsd4_decode_free_stateid()

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 94e254af 04-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros in nfsd4_decode_destroy_session()

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 81243e3f 03-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros in nfsd4_decode_create_session()

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 3a3f1fba 16-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add a helper to decode channel_attrs4

De-duplicate some code.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 10ff8422 16-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add a helper to decode nfs_impl_id4

Refactor for clarity.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 523ec6ed 02-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add a helper to decode state_protect4_a

Refactor for clarity.

Also, remove a stale comment. Commit ed94164398c9 ("nfsd: implement
machine credential support for some operations") added support for
SP4_MACH_CRED, so state_protect_a is no longer completely ignored.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 547bfeb4 03-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add a separate decoder for ssv_sp_parms

Refactor for clarity.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 2548aa78 03-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add a separate decoder to handle state_protect_ops

Refactor for clarity and de-duplication of code.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 571e0451 03-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros in nfsd4_decode_bind_conn_to_session()

A dedicated sessionid4 decoder is introduced that will be used by
other operation decoders in subsequent patches.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 0f81d960 03-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros in nfsd4_decode_backchannel_ctl()

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 1a994408 03-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros in nfsd4_decode_cb_sec()

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# a4a80c15 04-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros in nfsd4_decode_release_lockowner()

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 244e2bef 03-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros in nfsd4_decode_write()

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 67cd453e 03-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros in nfsd4_decode_verify()

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# d1ca5514 04-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros in nfsd4_decode_setclientid_confirm()

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 92fa6c08 03-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros in nfsd4_decode_setclientid()

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 44592fe9 21-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros in nfsd4_decode_setattr()

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# d0abdae5 04-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros in nfsd4_decode_secinfo()

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# d12f9045 04-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros in nfsd4_decode_renew()

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# ba881a0a 04-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros in nfsd4_decode_rename()

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# b7f5fbf2 04-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros in nfsd4_decode_remove()

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 0dfaf2a3 03-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros in nfsd4_decode_readdir()

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 3909c3bc 03-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros in nfsd4_decode_read()

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# a73bed98 03-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros in nfsd4_decode_putfh()

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# dca71651 03-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros in nfsd4_decode_open_downgrade()

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 06bee693 03-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros in nfsd4_decode_open_confirm()

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 61e5e0b3 31-Oct-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros in nfsd4_decode_open()

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 1708e50b 16-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add helper to decode OPEN's open_claim4 argument

Refactor for clarity.

Note that op_fname is the only instance of an NFSv4 filename stored
in a struct xdr_netobj. Convert it to a u32/char * pair so that the
new nfsd4_decode_filename() helper can be used.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# b07bebd9 16-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros in nfsd4_decode_share_deny()

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 9aa62f51 16-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros in nfsd4_decode_share_access()

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# e6ec04b2 16-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add helper to decode OPEN's openflag4 argument

Refactor for clarity.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# bf33bab3 16-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add helper to decode OPEN's createhow4 argument

Refactor for clarity.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 796dd1c6 16-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add helper to decode NFSv4 verifiers

This helper will be used to simplify decoders in subsequent
patches.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 3d5877e8 04-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros in nfsd4_decode_lookup()

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# ca9cf9fc 03-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros in nfsd4_decode_locku()

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 0a146f04 03-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros in nfsd4_decode_lockt()

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 7c59deed 03-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros in nfsd4_decode_lock()

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 8918cc0d 16-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add helper for decoding locker4

Refactor for clarity.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 144e8269 16-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add helpers to decode a clientid4 and an NFSv4 state owner

These helpers will also be used to simplify decoders in subsequent
patches.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 5dcbfabb 04-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Relocate nfsd4_decode_opaque()

Enable nfsd4_decode_opaque() to be used in more decoders, and
replace the READ* macros in nfsd4_decode_opaque().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 5c505d12 04-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros in nfsd4_decode_link()

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# f759eff2 19-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros in nfsd4_decode_getattr()

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 95e6482c 21-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros in nfsd4_decode_delegreturn()

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 000dfa18 03-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros in nfsd4_decode_create()

A dedicated decoder for component4 is introduced here, which will be
used by other operation decoders in subsequent patches.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# d1c263a0 02-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros in nfsd4_decode_fattr()

Let's be more careful to avoid overrunning the memory that backs
the bitmap array. This requires updating the synopsis of
nfsd4_decode_fattr().

Bruce points out that a server needs to be careful to return nfs_ok
when a client presents bitmap bits the server doesn't support. This
includes bits in bitmap words the server might not yet support.

The current READ* based implementation is good about that, but that
requirement hasn't been documented.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 66f0476c 19-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros that decode the fattr4 umask attribute

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# dabe9182 19-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros that decode the fattr4 security label attribute

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 1c3eff7e 19-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros that decode the fattr4 time_set attributes

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 393c31dd 19-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros that decode the fattr4 owner_group attribute

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 9853a5ac 19-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros that decode the fattr4 owner attribute

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 1c8f0ad7 19-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros that decode the fattr4 mode attribute

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# c941a968 19-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros that decode the fattr4 acl attribute

Refactor for clarity and to move infrequently-used code out of line.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 2ac1b9b2 19-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros that decode the fattr4 size attribute

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 081d53fe 19-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Change the way the expected length of a fattr4 is checked

Because the fattr4 is now managed in an xdr_stream, all that is
needed is to store the initial position of the stream before
decoding the attribute list. Then the actual length of the list
is computed using the final stream position, after decoding is
complete.

No behavior change is expected.
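
A simplified sketch of the position-based length check is shown below.
The stream type and function names are hypothetical, and the attribute
decoders are modeled as a single skip of a known byte count; the real
xdr_stream API is not used here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a decode stream with a byte position. */
struct sketch_stream {
	const uint8_t *buf;
	size_t len;
	size_t pos;
};

/* Consume nbytes from the stream; fail if not enough data remains. */
static bool sketch_skip(struct sketch_stream *xdr, size_t nbytes)
{
	if (nbytes > xdr->len - xdr->pos)
		return false;
	xdr->pos += nbytes;
	return true;
}

/*
 * Record the starting position, let the "decoders" consume their data
 * (modeled here as decoded_bytes), then compare the distance travelled
 * with the length the sender claimed for the attribute list.
 */
static bool sketch_attrlist_len_ok(struct sketch_stream *xdr,
				   uint32_t claimed_len, size_t decoded_bytes)
{
	size_t start = xdr->pos;

	if (!sketch_skip(xdr, decoded_bytes))
		return false;
	return xdr->pos - start == claimed_len;
}
```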

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# cbd9abb3 03-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros in nfsd4_decode_commit()

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# d3d2f381 03-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros in nfsd4_decode_close()

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# d169a6a9 03-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace READ* macros in nfsd4_decode_access()

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# c1346a12 03-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Replace the internals of the READ_BUF() macro

Convert the READ_BUF macro in nfs4xdr.c from open code to instead
use the new xdr_stream-style decoders already in use by the encode
side (and by the in-kernel NFS client implementation). Once this
conversion is done, each individual NFSv4 argument decoder can be
independently cleaned up to replace these macros with C code.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 08281341 21-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add tracepoints in nfsd4_decode/encode_compound()

For troubleshooting purposes, record failures to decode NFSv4
operation arguments and encode operation results.

trace_nfsd_compound_decode_err() replaces the dprintk() call sites
that are embedded in READ_* macros that are about to be removed.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 788f7183 05-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add common helpers to decode void args and encode void results

Start off the conversion to xdr_stream by de-duplicating the functions
that decode void arguments and encode void results.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 25fef48b 01-Nov-2020 Tom Rix <trix@redhat.com>

NFSD: A semicolon is not needed after a switch statement.

Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 76e5492b 05-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Invoke svc_encode_result_payload() in "read" NFSD encoders

Have the NFSD encoders annotate the boundaries of every
direct-data-placement eligible result data payload. Then change
svcrdma to use that annotation instead of the xdr->page_len
when handling Write chunks.

For NFSv4 on RDMA, that enables the ability to recognize multiple
result payloads per compound. This is a pre-requisite for supporting
multiple Write chunks per RPC transaction.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 03493bca 10-Jun-2020 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Rename svc_encode_read_payload()

Clean up: "result payload" is a less confusing name for these
payloads. "READ payload" reflects only the NFS usage.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 9f0b5792 28-Sep-2020 Anna Schumaker <Anna.Schumaker@Netapp.com>

NFSD: Encode a full READ_PLUS reply

Reply to the client with multiple hole and data segments. I use the
result of the first vfs_llseek() call for encoding as an optimization so
we don't have to immediately repeat the call. This also lets us encode
any remaining reply as data if we get an unexpected result while trying
to calculate a hole.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 278765ea 28-Sep-2020 Anna Schumaker <Anna.Schumaker@Netapp.com>

NFSD: Return both a hole and a data segment

But only one of each right now. We'll expand on this in the next patch.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 2db27992 28-Sep-2020 Anna Schumaker <Anna.Schumaker@Netapp.com>

NFSD: Add READ_PLUS hole segment encoding

However, we still only reply to the READ_PLUS call with a single segment
at this time.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 528b8493 28-Sep-2020 Anna Schumaker <Anna.Schumaker@Netapp.com>

NFSD: Add READ_PLUS data support

This patch adds READ_PLUS support for returning a single
NFS4_CONTENT_DATA segment to the client. This is basically the same as
the READ operation, only with the extra information about data segments.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# cc028a10 02-Oct-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Hoist status code encoding into XDR encoder functions

The original intent was presumably to reduce code duplication. The
trade-off was:

- No support for an NFSD proc function returning a non-success
RPC accept_stat value.
- No support for void NFS replies to non-NULL procedures.
- Everyone pays for the deduplication with a few extra conditional
branches in a hot path.

In addition, nfsd_dispatch() leaves *statp uninitialized in the
success path, unlike svc_generic_dispatch().

Address all of these problems by moving the logic for encoding
the NFS status code into the NFS XDR encoders themselves. Then
update the NFS .pc_func methods to return an RPC accept_stat
value.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# dcc46991 01-Oct-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Encoder and decoder functions are always present

nfsd_dispatch() is a hot path. Let's optimize the XDR method calls
for the by-far common case, which is that the XDR methods are indeed
present.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 5aff7d08 11-Sep-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Correct type annotations in COPY XDR functions

Squelch some sparse warnings:

/home/cel/src/linux/linux/fs/nfsd/nfs4xdr.c:1860:16: warning: incorrect type in assignment (different base types)
/home/cel/src/linux/linux/fs/nfsd/nfs4xdr.c:1860:16: expected int status
/home/cel/src/linux/linux/fs/nfsd/nfs4xdr.c:1860:16: got restricted __be32
/home/cel/src/linux/linux/fs/nfsd/nfs4xdr.c:1862:24: warning: incorrect type in return expression (different base types)
/home/cel/src/linux/linux/fs/nfsd/nfs4xdr.c:1862:24: expected restricted __be32
/home/cel/src/linux/linux/fs/nfsd/nfs4xdr.c:1862:24: got int status

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# b9a49237 11-Sep-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Correct type annotations in user xattr XDR functions

Squelch some sparse warnings:

/home/cel/src/linux/linux/fs/nfsd/nfs4xdr.c:4692:24: warning: incorrect type in return expression (different base types)
/home/cel/src/linux/linux/fs/nfsd/nfs4xdr.c:4692:24: expected int
/home/cel/src/linux/linux/fs/nfsd/nfs4xdr.c:4692:24: got restricted __be32 [usertype]
/home/cel/src/linux/linux/fs/nfsd/nfs4xdr.c:4702:32: warning: incorrect type in return expression (different base types)
/home/cel/src/linux/linux/fs/nfsd/nfs4xdr.c:4702:32: expected int
/home/cel/src/linux/linux/fs/nfsd/nfs4xdr.c:4702:32: got restricted __be32 [usertype]
/home/cel/src/linux/linux/fs/nfsd/nfs4xdr.c:4739:13: warning: incorrect type in assignment (different base types)
/home/cel/src/linux/linux/fs/nfsd/nfs4xdr.c:4739:13: expected restricted __be32 [usertype] err
/home/cel/src/linux/linux/fs/nfsd/nfs4xdr.c:4739:13: got int
/home/cel/src/linux/linux/fs/nfsd/nfs4xdr.c:4891:15: warning: incorrect type in assignment (different base types)
/home/cel/src/linux/linux/fs/nfsd/nfs4xdr.c:4891:15: expected unsigned int [assigned] [usertype] count
/home/cel/src/linux/linux/fs/nfsd/nfs4xdr.c:4891:15: got restricted __be32 [usertype]

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 403217f3 16-Aug-2020 Anna Schumaker <Anna.Schumaker@Netapp.com>

SUNRPC/NFSD: Implement xdr_reserve_space_vec()

Reserving space for a large READ payload requires special handling when
reserving space in the xdr buffer pages. One problem we can have is use
of the scratch buffer, which is used to get a pointer to a contiguous
region of data up to PAGE_SIZE. When using the scratch buffer, calls to
xdr_commit_encode() shift the data to its proper alignment in the xdr
buffer. If we've reserved several pages in a vector, then this could
potentially invalidate earlier pointers and result in incorrect READ
data being sent to the client.

I get around this by looking at the amount of space left in the current
page, and never reserve more than that for each entry in the read
vector. This lets us place data directly where it needs to go in the
buffer pages.
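
The per-page reservation rule described above can be sketched in user-space
C. This is an illustration only, not the kernel's xdr_reserve_space_vec():
split_by_page() and its parameters are hypothetical names, and the buffer is
modeled as a flat byte offset with a fixed page size.

```c
#include <stddef.h>

/*
 * Hypothetical helper: split a reservation of nbytes, starting at byte
 * offset start_offset in a paged buffer, into chunks that never cross a
 * page boundary. Returns the number of chunks written to chunks[].
 */
static size_t split_by_page(size_t start_offset, size_t nbytes,
                            size_t page_size, size_t *chunks,
                            size_t max_chunks)
{
    size_t n = 0;
    size_t pos = start_offset;

    if (page_size == 0)
        return 0;

    while (nbytes > 0 && n < max_chunks) {
        /* Space left in the page that contains pos. */
        size_t room = page_size - (pos % page_size);
        size_t take = nbytes < room ? nbytes : room;

        chunks[n++] = take;
        pos += take;
        nbytes -= take;
    }
    return n;
}
```

Because no chunk is larger than the space left in its page, each vector
entry can point straight into the buffer pages and no scratch-buffer copy is
needed in this model.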

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# e2a1840e 12-Aug-2020 Alex Dewar <alex.dewar90@gmail.com>

nfsd: Remove unnecessary assignment in nfs4xdr.c

In nfsd4_encode_listxattrs(), the variable p is assigned to at one point
but this value is never used before p is reassigned. Fix this.

Addresses-Coverity: ("Unused value")
Signed-off-by: Alex Dewar <alex.dewar90@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 4cce11fa 17-Aug-2020 Alex Dewar <alex.dewar90@gmail.com>

nfsd: Fix typo in comment

Missing "is".

Signed-off-by: Alex Dewar <alex.dewar90@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 0e885e84 23-Jun-2020 Frank van der Linden <fllinden@amazon.com>

nfsd: add fattr support for user extended attributes

Check if user extended attributes are supported for an inode,
and return the answer when being queried for file attributes.

An exported filesystem can now signal its RFC8276 user extended
attributes capability.

Signed-off-by: Frank van der Linden <fllinden@amazon.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 23e50fe3 23-Jun-2020 Frank van der Linden <fllinden@amazon.com>

nfsd: implement the xattr functions and en/decode logic

Implement the main entry points for the *XATTR operations.

Add functions to calculate the reply size for the user extended attribute
operations, and implement the XDR encode / decode logic for these
operations.

Add the user extended attributes operations to nfsd4_ops.

Signed-off-by: Frank van der Linden <fllinden@amazon.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 874c7b8e 23-Jun-2020 Frank van der Linden <fllinden@amazon.com>

nfsd: split off the write decode code into a separate function

nfs4_decode_write has code to parse incoming XDR write data in to
a kvec head, and a list of pages.

Put this code in to a separate function, so that it can be used
later by the xattr code, for setxattr. No functional change.

Signed-off-by: Frank van der Linden <fllinden@amazon.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 7dcf4ab9 02-Mar-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Clean up nfsd4_encode_readv

Address some minor nits I noticed while working on this function.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 41205539 02-Mar-2020 Chuck Lever <chuck.lever@oracle.com>

nfsd: Fix NFSv4 READ on RDMA when using readv

svcrdma expects that the payload falls precisely into the xdr_buf
page vector. This does not seem to be the case for
nfsd4_encode_readv().

This code is called only when fops->splice_read is missing or when
RQ_SPLICE_OK is clear, so it's not a noticeable problem in many
common cases.

Add new transport method: ->xpo_read_payload so that when a READ
payload does not fit exactly in rq_res's page vector, the XDR
encoder can inform the RPC transport exactly where that payload is,
without the payload's XDR pad.

That way, when a Write chunk is present, the transport knows what
byte range in the Reply message is supposed to be matched with the
chunk.

Note that the Linux NFS server implementation of NFS/RDMA can
currently handle only one Write chunk per RPC-over-RDMA message.
This simplifies the implementation of this fix.

Fixes: b04209806384 ("nfsd4: allow exotic read compounds")
Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=198053
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 7627d7dc 19-Feb-2020 Scott Mayhew <smayhew@redhat.com>

nfsd: set the server_scope during service startup

Currently, nfsd4_encode_exchange_id() encodes the utsname nodename
string in the server_scope field. In a multi-host container
environment, if an nfsd container is restarted on a different host than
it was originally running on, clients will see a server_scope mismatch
and will not attempt to reclaim opens.

Instead, set the server_scope while we're in a process context during
service startup, so we get the utsname nodename of the current process
and store that in nfsd_net.

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
[bfields: fix up major_id too]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# e4598e38 31-Oct-2019 Arnd Bergmann <arnd@arndb.de>

nfsd: use timespec64 in encode_time_delta

The values in encode_time_delta are always small and don't
overflow the range of 'struct timespec', so changing it has
no effect.

Change it to timespec64 as a prerequisite for removing the
timespec definition later.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# fc1b2065 17-Dec-2019 Aditya Pakki <pakki001@umn.edu>

nfsd: remove unnecessary assertion in nfsd4_encode_replay

The replay variable is set in the only caller of nfsd4_encode_replay.
The assertion is unnecessary and the patch removes this check.

Signed-off-by: Aditya Pakki <pakki001@umn.edu>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 51911868 08-Aug-2019 Olga Kornievskaia <olga.kornievskaia@gmail.com>

NFSD COPY_NOTIFY xdr

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>


# 84e1b21d 13-Sep-2019 Olga Kornievskaia <olga.kornievskaia@gmail.com>

NFSD add ca_source_server<> to COPY

Decode the ca_source_server list that's sent but only use the
first one. Presence of non-zero list indicates an "inter" copy.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>


# 6c2d4798 30-Oct-2019 Al Viro <viro@zeniv.linux.org.uk>

new helper: lookup_positive_unlocked()

Most of the callers of lookup_one_len_unlocked() treat negatives as
ERR_PTR(-ENOENT). Provide a helper that would do just that. Note
that a pinned positive dentry remains positive - its ->d_inode is
stable, etc.; a pinned _negative_ dentry can become positive at any
point as long as you are not holding its parent at least shared.
So using lookup_one_len_unlocked() needs to be careful;
lookup_positive_unlocked() is safer and that's what the callers
end up open-coding anyway.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 19a1aad8 27-Sep-2019 YueHaibing <yuehaibing@huawei.com>

nfsd: remove set but not used variable 'len'

Fixes gcc '-Wunused-but-set-variable' warning:

fs/nfsd/nfs4xdr.c: In function nfsd4_encode_splice_read:
fs/nfsd/nfs4xdr.c:3464:7: warning: variable len set but not used [-Wunused-but-set-variable]

It is not used since commit 83a63072c815 ("nfsd: fix nfs read eof detection")

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 83a63072 26-Aug-2019 Trond Myklebust <trondmy@gmail.com>

nfsd: fix nfs read eof detection

Currently, the knfsd server assumes that a short read indicates an
end of file. That assumption is incorrect. The short read means that
either we've hit the end of file, or we've hit a read error.

In the case of a read error, the client may want to retry (as per the
implementation recommendations in RFC1813 and RFC7530), but currently it
is being told that it hit an eof.

Move the code to detect eof from version specific code into the generic
nfsd read.

Report eof only in the two following cases:
1) read() returns a zero length short read with no error.
2) the offset+length of the read is >= the file size.
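
The two eof conditions listed above can be written as a small predicate. This
is a user-space sketch, not the kernel code: read_hit_eof() and its
parameters are hypothetical names, and "length" is assumed here to mean the
number of bytes actually read.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical predicate: report eof only when the read succeeded and
 * either returned nothing for a non-empty request, or reached the end
 * of the file.
 */
static bool read_hit_eof(int err, uint64_t offset, uint64_t requested,
                         uint64_t bytes_read, uint64_t file_size)
{
    if (err)
        return false;           /* a read error is never an eof */
    if (requested != 0 && bytes_read == 0)
        return true;            /* case 1: zero-length short read */
    return offset + bytes_read >= file_size;   /* case 2 */
}
```

A short read that stops before the end of the file returns false here, so the
client is free to retry rather than being told it reached eof.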

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 2b86e3aa 28-Aug-2019 J. Bruce Fields <bfields@redhat.com>

nfsd: eliminate an unnecessary acl size limit

We're unnecessarily limiting the size of an ACL to less than what most
filesystems will support. Some users do hit the limit and it's
confusing and unnecessary.

It still seems prudent to impose some limit on the number of ACEs the
client gives us before passing it straight to kmalloc(). So, let's just
limit it to the maximum number that would be possible given the amount
of data left in the argument buffer.

That will still leave one limit beyond whatever the filesystem imposes:
the client and server negotiate a limit on the size of a request, which
we have to respect.

But we're no longer imposing any additional arbitrary limit.

struct nfs4_ace is 20 bytes on my system and the maximum call size we'll
negotiate is about a megabyte, so in practice this is limiting the
allocation here to about a megabyte.
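
A limit derived from the remaining argument data might look like the sketch
below. It is an illustration under stated assumptions: ace_count_plausible()
is a hypothetical name, and the 20-byte minimum encoded ACE size (three
32-bit fields, a length word, and at least one padded 4-byte name) is an
assumption of this sketch rather than something stated above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed smallest possible on-the-wire ACE, in bytes (sketch only). */
#define SKETCH_MIN_WIRE_ACE_BYTES 20u

/*
 * Hypothetical check: a claimed ACE count is plausible only if that many
 * minimum-size ACEs could fit in what is left of the argument buffer.
 */
static bool ace_count_plausible(uint32_t nace, size_t bytes_left)
{
    return nace <= bytes_left / SKETCH_MIN_WIRE_ACE_BYTES;
}
```

With this kind of check the allocation size is bounded by the negotiated
request size instead of by a fixed constant.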

Reported-by: "de Vandiere, Louis" <louis.devandiere@atos.net>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# ed992753 18-Aug-2019 Trond Myklebust <trondmy@gmail.com>

nfsd: Fix the documentation for svcxdr_tmpalloc()

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# b96811cd 18-Aug-2019 Trond Myklebust <trondmy@gmail.com>

nfsd: Fix up some unused variable warnings

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 5c4583b2 18-Aug-2019 Jeff Layton <jeff.layton@primarydata.com>

nfsd: hook up nfs4_preprocess_stateid_op to the nfsd_file cache

Have nfs4_preprocess_stateid_op pass back a nfsd_file instead of a filp.
Since we now presume that the struct file will be persistent in most
cases, we can stop fiddling with the raparms in the read code. This
also means that we don't really care about the rd_tmp_file field
anymore.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 79123444 04-Jun-2019 J. Bruce Fields <bfields@redhat.com>

nfsd: decode implementation id

Decode the implementation ID and display it in nfsd/clients/#/info. It
may help identify the client. It won't be used otherwise.

(When this went into the protocol, I thought the implementation ID would
be a slippery slope towards implementation-specific workarounds as with
the http user-agent. But I guess I was wrong, the risk seems pretty low
now.)

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 30498dcc 04-Jun-2019 J. Bruce Fields <bfields@redhat.com>

nfsd4: remove outdated nfsd4_decode_time comment

Commit bf8d909705e "nfsd: Decode and send 64bit time values" fixed the
code without updating the comment.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# bdba5368 05-Jun-2019 J. Bruce Fields <bfields@redhat.com>

nfsd: use 64-bit seconds fields in nfsd v4 code

After commit 95582b008388 "vfs: change inode times to use struct
timespec64" there are spots in the NFSv4 decoding where we decode the
protocol into a struct timeval and then convert that into a timeval64.

That's unnecessary in the NFSv4 case since the on-the-wire protocol also
uses 64-bit values. So just fix up our code to use timeval64 everywhere.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# e45d1a18 08-Apr-2019 Trond Myklebust <trondmy@gmail.com>

nfsd: knfsd must use the container user namespace

Convert knfsd to use the user namespace of the container that started
the server processes.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 0ab88ca4 22-Mar-2019 Arnd Bergmann <arnd@arndb.de>

nfsd: avoid uninitialized variable warning

clang warns that 'contextlen' may be accessed without an initialization:

fs/nfsd/nfs4xdr.c:2911:9: error: variable 'contextlen' is uninitialized when used here [-Werror,-Wuninitialized]
contextlen);
^~~~~~~~~~
fs/nfsd/nfs4xdr.c:2424:16: note: initialize the variable 'contextlen' to silence this warning
int contextlen;
^
= 0

Presumably this cannot happen, as FATTR4_WORD2_SECURITY_LABEL is
set if CONFIG_NFSD_V4_SECURITY_LABEL is enabled.
Adding another #ifdef like the other two in this function
avoids the warning.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# e0639dc5 20-Jul-2018 Olga Kornievskaia <kolga@netapp.com>

NFSD introduce async copy feature

Upon receiving a request for async copy, create a new kthread. If we
get an asynchronous request, make sure to copy the needed arguments/state
from the stack before starting the copy. Then start the thread and reply
back to the client indicating copy is asynchronous.

nfsd_copy_file_range() will copy in a loop until the total number of
bytes requested has been copied. In case a failure happens in the middle,
we ignore the error and return how much we copied so far. Once done,
create a workitem for the callback workqueue and send CB_OFFLOAD with
the results.

The lifetime of the copy stateid is bound to the vfs copy. This way we
don't need to keep the nfsd_net structure for the callback. We could
keep it around longer so that an OFFLOAD_STATUS that came late would
still get results, but clients should be able to deal without that.

We handle OFFLOAD_CANCEL by sending a signal to the copy thread and
calling kthread_stop.

A client should cancel any ongoing copies before calling DESTROY_CLIENT;
if not, we return a CLIENT_BUSY error.

If the client is destroyed for some other reason (lease expiration, or
server shutdown), we must clean up any ongoing copies ourselves.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
[colin.king@canonical.com: fix leak in error case]
[bfields@fieldses.org: remove signalling, merge patches]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 885e2bf3 20-Jul-2018 Olga Kornievskaia <kolga@netapp.com>

NFSD OFFLOAD_CANCEL xdr

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 6308bc98 20-Jul-2018 Olga Kornievskaia <kolga@netapp.com>

NFSD OFFLOAD_STATUS xdr

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 5ed96bc5 22-Jul-2018 nixiaoming <nixiaoming@huawei.com>

fs/nfsd: Delete invalid assignment statements in nfsd4_decode_exchange_id

READ_BUF(8);
dummy = be32_to_cpup(p++);
dummy = be32_to_cpup(p++);
...
READ_BUF(4);
dummy = be32_to_cpup(p++);

Assigning value to "dummy" here, but that stored value
is overwritten before it can be used.
At the same time READ_BUF() will re-update the pointer p.

delete invalid assignment statements

Signed-off-by: nixiaoming <nixiaoming@huawei.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trondmy@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# a8585763 25-Apr-2018 J. Bruce Fields <bfields@redhat.com>

nfsd4: support change_attr_type attribute

The change attribute is what is used by clients to revalidate their
caches. Our server may use i_version or ctime for that purpose. Those
choices behave slightly differently, and it may be useful to the client
to know which we're using. This attribute tells the client that. The
Linux client doesn't use this attribute yet, though.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 16945141 25-Apr-2018 J. Bruce Fields <bfields@redhat.com>

nfsd: fix NFSv4 time_delta attribute

Currently we return the worst-case value of 1 second in the time delta
attribute. That's not terribly useful. Instead, return a value
calculated from the time granularity supported by the filesystem and the
system clock.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 3171822f 08-Jun-2018 Scott Mayhew <smayhew@redhat.com>

nfsd: fix potential use-after-free in nfsd4_decode_getdeviceinfo

When running a fuzz tester against a KASAN-enabled kernel, the following
splat periodically occurs.

The problem occurs when the test sends a GETDEVICEINFO request with a
malformed xdr array (size but no data) for gdia_notify_types and the
array size is > 0x3fffffff, which results in an overflow in the value of
nbytes which is passed to read_buf().

If the array size is 0x40000000, 0x80000000, or 0xc0000000, then after
the overflow occurs, the value of nbytes is 0, and when that happens the
pointer returned by read_buf() points to the end of the xdr data (i.e.
argp->end) when really it should be returning NULL.

Fix this by returning NFS4ERR_BAD_XDR if the array size is > 1000 (this
value is arbitrary, but it's the same threshold used by
nfsd4_decode_bitmap()... it could really be any value >= 1 since it's
expected to get at most a single bitmap in gdia_notify_types).
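
The wraparound described above can be reproduced in user-space C. This is a
sketch, not the decoder itself: nbytes_unchecked(), notify_count_ok() and
SKETCH_MAX_NOTIFY_WORDS are hypothetical names, and the byte count is
assumed to be computed as a 32-bit multiplication of the word count by 4.

```c
#include <stdbool.h>
#include <stdint.h>

/* Threshold taken from the description above; the name is hypothetical. */
#define SKETCH_MAX_NOTIFY_WORDS 1000u

/* Unchecked byte count: wraps modulo 2^32 for large word counts. */
static uint32_t nbytes_unchecked(uint32_t num_words)
{
    return num_words * (uint32_t)sizeof(uint32_t);
}

/* Reject implausible counts before doing the multiplication. */
static bool notify_count_ok(uint32_t num_words)
{
    return num_words <= SKETCH_MAX_NOTIFY_WORDS;
}
```

For 0x40000000, 0x80000000 and 0xc0000000 the unchecked product is 0, which
is why a zero-length read appeared to succeed; the bound check rejects all
three before any buffer is touched.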

[ 119.256854] ==================================================================
[ 119.257611] BUG: KASAN: use-after-free in nfsd4_decode_getdeviceinfo+0x5a4/0x5b0 [nfsd]
[ 119.258422] Read of size 4 at addr ffff880113ada000 by task nfsd/538

[ 119.259146] CPU: 0 PID: 538 Comm: nfsd Not tainted 4.17.0+ #1
[ 119.259662] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.fc25 04/01/2014
[ 119.261202] Call Trace:
[ 119.262265] dump_stack+0x71/0xab
[ 119.263371] print_address_description+0x6a/0x270
[ 119.264609] kasan_report+0x258/0x380
[ 119.265854] ? nfsd4_decode_getdeviceinfo+0x5a4/0x5b0 [nfsd]
[ 119.267291] nfsd4_decode_getdeviceinfo+0x5a4/0x5b0 [nfsd]
[ 119.268549] ? nfs4svc_decode_compoundargs+0xa5b/0x13c0 [nfsd]
[ 119.269873] ? nfsd4_decode_sequence+0x490/0x490 [nfsd]
[ 119.271095] nfs4svc_decode_compoundargs+0xa5b/0x13c0 [nfsd]
[ 119.272393] ? nfsd4_release_compoundargs+0x1b0/0x1b0 [nfsd]
[ 119.273658] nfsd_dispatch+0x183/0x850 [nfsd]
[ 119.274918] svc_process+0x161c/0x31a0 [sunrpc]
[ 119.276172] ? svc_printk+0x190/0x190 [sunrpc]
[ 119.277386] ? svc_xprt_release+0x451/0x680 [sunrpc]
[ 119.278622] nfsd+0x2b9/0x430 [nfsd]
[ 119.279771] ? nfsd_destroy+0x1c0/0x1c0 [nfsd]
[ 119.281157] kthread+0x2db/0x390
[ 119.282347] ? kthread_create_worker_on_cpu+0xc0/0xc0
[ 119.283756] ret_from_fork+0x35/0x40

[ 119.286041] Allocated by task 436:
[ 119.287525] kasan_kmalloc+0xa0/0xd0
[ 119.288685] kmem_cache_alloc+0xe9/0x1f0
[ 119.289900] get_empty_filp+0x7b/0x410
[ 119.291037] path_openat+0xca/0x4220
[ 119.292242] do_filp_open+0x182/0x280
[ 119.293411] do_sys_open+0x216/0x360
[ 119.294555] do_syscall_64+0xa0/0x2f0
[ 119.295721] entry_SYSCALL_64_after_hwframe+0x44/0xa9

[ 119.298068] Freed by task 436:
[ 119.299271] __kasan_slab_free+0x130/0x180
[ 119.300557] kmem_cache_free+0x78/0x210
[ 119.301823] rcu_process_callbacks+0x35b/0xbd0
[ 119.303162] __do_softirq+0x192/0x5ea

[ 119.305443] The buggy address belongs to the object at ffff880113ada000
which belongs to the cache filp of size 256
[ 119.308556] The buggy address is located 0 bytes inside of
256-byte region [ffff880113ada000, ffff880113ada100)
[ 119.311376] The buggy address belongs to the page:
[ 119.312728] page:ffffea00044eb680 count:1 mapcount:0 mapping:0000000000000000 index:0xffff880113ada780
[ 119.314428] flags: 0x17ffe000000100(slab)
[ 119.315740] raw: 0017ffe000000100 0000000000000000 ffff880113ada780 00000001000c0001
[ 119.317379] raw: ffffea0004553c60 ffffea00045c11e0 ffff88011b167e00 0000000000000000
[ 119.319050] page dumped because: kasan: bad access detected

[ 119.321652] Memory state around the buggy address:
[ 119.322993] ffff880113ad9f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 119.324515] ffff880113ad9f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 119.326087] >ffff880113ada000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 119.327547] ^
[ 119.328730] ffff880113ada080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 119.330218] ffff880113ada100: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
[ 119.331740] ==================================================================

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 95582b00 08-May-2018 Deepa Dinamani <deepa.kernel@gmail.com>

vfs: change inode times to use struct timespec64

struct timespec is not y2038 safe. Transition vfs to use
y2038 safe struct timespec64 instead.

The change was made with the help of the following coccinelle
script. This catches about 80% of the changes.
All the header file and logic changes are included in the
first 5 rules. The rest are trivial substitutions.
I avoid changing any of the function signatures or any other
filesystem specific data structures to keep the patch simple
for review.

The script can be a little shorter by combining different cases.
But, this version was sufficient for my usecase.

virtual patch

@ depends on patch @
identifier now;
@@
- struct timespec
+ struct timespec64
current_time ( ... )
{
- struct timespec now = current_kernel_time();
+ struct timespec64 now = current_kernel_time64();
...
- return timespec_trunc(
+ return timespec64_trunc(
... );
}

@ depends on patch @
identifier xtime;
@@
struct \( iattr \| inode \| kstat \) {
...
- struct timespec xtime;
+ struct timespec64 xtime;
...
}

@ depends on patch @
identifier t;
@@
struct inode_operations {
...
int (*update_time) (...,
- struct timespec t,
+ struct timespec64 t,
...);
...
}

@ depends on patch @
identifier t;
identifier fn_update_time =~ "update_time$";
@@
fn_update_time (...,
- struct timespec *t,
+ struct timespec64 *t,
...) { ... }

@ depends on patch @
identifier t;
@@
lease_get_mtime( ... ,
- struct timespec *t
+ struct timespec64 *t
) { ... }

@te depends on patch forall@
identifier ts;
local idexpression struct inode *inode_node;
identifier i_xtime =~ "^i_[acm]time$";
identifier ia_xtime =~ "^ia_[acm]time$";
identifier fn_update_time =~ "update_time$";
identifier fn;
expression e, E3;
local idexpression struct inode *node1;
local idexpression struct inode *node2;
local idexpression struct iattr *attr1;
local idexpression struct iattr *attr2;
local idexpression struct iattr attr;
identifier i_xtime1 =~ "^i_[acm]time$";
identifier i_xtime2 =~ "^i_[acm]time$";
identifier ia_xtime1 =~ "^ia_[acm]time$";
identifier ia_xtime2 =~ "^ia_[acm]time$";
@@
(
(
- struct timespec ts;
+ struct timespec64 ts;
|
- struct timespec ts = current_time(inode_node);
+ struct timespec64 ts = current_time(inode_node);
)

<+... when != ts
(
- timespec_equal(&inode_node->i_xtime, &ts)
+ timespec64_equal(&inode_node->i_xtime, &ts)
|
- timespec_equal(&ts, &inode_node->i_xtime)
+ timespec64_equal(&ts, &inode_node->i_xtime)
|
- timespec_compare(&inode_node->i_xtime, &ts)
+ timespec64_compare(&inode_node->i_xtime, &ts)
|
- timespec_compare(&ts, &inode_node->i_xtime)
+ timespec64_compare(&ts, &inode_node->i_xtime)
|
ts = current_time(e)
|
fn_update_time(..., &ts,...)
|
inode_node->i_xtime = ts
|
node1->i_xtime = ts
|
ts = inode_node->i_xtime
|
<+... attr1->ia_xtime ...+> = ts
|
ts = attr1->ia_xtime
|
ts.tv_sec
|
ts.tv_nsec
|
btrfs_set_stack_timespec_sec(..., ts.tv_sec)
|
btrfs_set_stack_timespec_nsec(..., ts.tv_nsec)
|
- ts = timespec64_to_timespec(
+ ts =
...
-)
|
- ts = ktime_to_timespec(
+ ts = ktime_to_timespec64(
...)
|
- ts = E3
+ ts = timespec_to_timespec64(E3)
|
- ktime_get_real_ts(&ts)
+ ktime_get_real_ts64(&ts)
|
fn(...,
- ts
+ timespec64_to_timespec(ts)
,...)
)
...+>
(
<... when != ts
- return ts;
+ return timespec64_to_timespec(ts);
...>
)
|
- timespec_equal(&node1->i_xtime1, &node2->i_xtime2)
+ timespec64_equal(&node1->i_xtime2, &node2->i_xtime2)
|
- timespec_equal(&node1->i_xtime1, &attr2->ia_xtime2)
+ timespec64_equal(&node1->i_xtime2, &attr2->ia_xtime2)
|
- timespec_compare(&node1->i_xtime1, &node2->i_xtime2)
+ timespec64_compare(&node1->i_xtime1, &node2->i_xtime2)
|
node1->i_xtime1 =
- timespec_trunc(attr1->ia_xtime1,
+ timespec64_trunc(attr1->ia_xtime1,
...)
|
- attr1->ia_xtime1 = timespec_trunc(attr2->ia_xtime2,
+ attr1->ia_xtime1 = timespec64_trunc(attr2->ia_xtime2,
...)
|
- ktime_get_real_ts(&attr1->ia_xtime1)
+ ktime_get_real_ts64(&attr1->ia_xtime1)
|
- ktime_get_real_ts(&attr.ia_xtime1)
+ ktime_get_real_ts64(&attr.ia_xtime1)
)

@ depends on patch @
struct inode *node;
struct iattr *attr;
identifier fn;
identifier i_xtime =~ "^i_[acm]time$";
identifier ia_xtime =~ "^ia_[acm]time$";
expression e;
@@
(
- fn(node->i_xtime);
+ fn(timespec64_to_timespec(node->i_xtime));
|
fn(...,
- node->i_xtime);
+ timespec64_to_timespec(node->i_xtime));
|
- e = fn(attr->ia_xtime);
+ e = fn(timespec64_to_timespec(attr->ia_xtime));
)

@ depends on patch forall @
struct inode *node;
struct iattr *attr;
identifier i_xtime =~ "^i_[acm]time$";
identifier ia_xtime =~ "^ia_[acm]time$";
identifier fn;
@@
{
+ struct timespec ts;
<+...
(
+ ts = timespec64_to_timespec(node->i_xtime);
fn (...,
- &node->i_xtime,
+ &ts,
...);
|
+ ts = timespec64_to_timespec(attr->ia_xtime);
fn (...,
- &attr->ia_xtime,
+ &ts,
...);
)
...+>
}

@ depends on patch forall @
struct inode *node;
struct iattr *attr;
struct kstat *stat;
identifier ia_xtime =~ "^ia_[acm]time$";
identifier i_xtime =~ "^i_[acm]time$";
identifier xtime =~ "^[acm]time$";
identifier fn, ret;
@@
{
+ struct timespec ts;
<+...
(
+ ts = timespec64_to_timespec(node->i_xtime);
ret = fn (...,
- &node->i_xtime,
+ &ts,
...);
|
+ ts = timespec64_to_timespec(node->i_xtime);
ret = fn (...,
- &node->i_xtime);
+ &ts);
|
+ ts = timespec64_to_timespec(attr->ia_xtime);
ret = fn (...,
- &attr->ia_xtime,
+ &ts,
...);
|
+ ts = timespec64_to_timespec(attr->ia_xtime);
ret = fn (...,
- &attr->ia_xtime);
+ &ts);
|
+ ts = timespec64_to_timespec(stat->xtime);
ret = fn (...,
- &stat->xtime);
+ &ts);
)
...+>
}

@ depends on patch @
struct inode *node;
struct inode *node2;
identifier i_xtime1 =~ "^i_[acm]time$";
identifier i_xtime2 =~ "^i_[acm]time$";
identifier i_xtime3 =~ "^i_[acm]time$";
struct iattr *attrp;
struct iattr *attrp2;
struct iattr attr ;
identifier ia_xtime1 =~ "^ia_[acm]time$";
identifier ia_xtime2 =~ "^ia_[acm]time$";
struct kstat *stat;
struct kstat stat1;
struct timespec64 ts;
identifier xtime =~ "^[acmb]time$";
expression e;
@@
(
( node->i_xtime2 \| attrp->ia_xtime2 \| attr.ia_xtime2 \) = node->i_xtime1 ;
|
node->i_xtime2 = \( node2->i_xtime1 \| timespec64_trunc(...) \);
|
node->i_xtime2 = node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \);
|
node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \);
|
stat->xtime = node2->i_xtime1;
|
stat1.xtime = node2->i_xtime1;
|
( node->i_xtime2 \| attrp->ia_xtime2 \) = attrp->ia_xtime1 ;
|
( attrp->ia_xtime1 \| attr.ia_xtime1 \) = attrp2->ia_xtime2;
|
- e = node->i_xtime1;
+ e = timespec64_to_timespec( node->i_xtime1 );
|
- e = attrp->ia_xtime1;
+ e = timespec64_to_timespec( attrp->ia_xtime1 );
|
node->i_xtime1 = current_time(...);
|
node->i_xtime2 = node->i_xtime1 = node->i_xtime3 =
- e;
+ timespec_to_timespec64(e);
|
node->i_xtime1 = node->i_xtime3 =
- e;
+ timespec_to_timespec64(e);
|
- node->i_xtime1 = e;
+ node->i_xtime1 = timespec_to_timespec64(e);
)

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Cc: <anton@tuxera.com>
Cc: <balbi@kernel.org>
Cc: <bfields@fieldses.org>
Cc: <darrick.wong@oracle.com>
Cc: <dhowells@redhat.com>
Cc: <dsterba@suse.com>
Cc: <dwmw2@infradead.org>
Cc: <hch@lst.de>
Cc: <hirofumi@mail.parknet.co.jp>
Cc: <hubcap@omnibond.com>
Cc: <jack@suse.com>
Cc: <jaegeuk@kernel.org>
Cc: <jaharkes@cs.cmu.edu>
Cc: <jslaby@suse.com>
Cc: <keescook@chromium.org>
Cc: <mark@fasheh.com>
Cc: <miklos@szeredi.hu>
Cc: <nico@linaro.org>
Cc: <reiserfs-devel@vger.kernel.org>
Cc: <richard@nod.at>
Cc: <sage@redhat.com>
Cc: <sfrench@samba.org>
Cc: <swhiteho@redhat.com>
Cc: <tj@kernel.org>
Cc: <trond.myklebust@primarydata.com>
Cc: <tytso@mit.edu>
Cc: <viro@zeniv.linux.org.uk>


# 9c2ece6e 07-May-2018 Scott Mayhew <smayhew@redhat.com>

nfsd: restrict rd_maxcount to svc_max_payload in nfsd_encode_readdir

nfsd4_readdir_rsize restricts rd_maxcount to svc_max_payload when
estimating the size of the readdir reply, but nfsd_encode_readdir
restricts it to INT_MAX when encoding the reply. This can result in log
messages like "kernel: RPC request reserved 32896 but used 1049444".

Restrict rd_dircount similarly (no reason it should be larger than
svc_max_payload).

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
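
The clamping described in this entry can be sketched in userspace C. The
helper name clamp_count() and the numbers used with it are illustrative
assumptions, not the kernel's code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: cap a client-supplied byte count at the
 * transport's maximum payload, so the encoder never uses more space
 * than the reply-size estimate reserved.
 */
static uint32_t clamp_count(uint32_t requested, uint32_t max_payload)
{
	assert(max_payload > 0);
	return requested < max_payload ? requested : max_payload;
}
```

Applying the same cap in both the size estimate and the encoder keeps the
two from disagreeing about how large the reply may grow.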


# 880a3a53 21-Mar-2018 J. Bruce Fields <bfields@redhat.com>

nfsd: fix incorrect umasks

We're neglecting to clear the umask after it's set, which can cause a
later unrelated rpc to (incorrectly) use the same umask if it happens to
be processed by the same thread.

There's a more subtle problem here too:

An NFSv4 compound request is decoded all in one pass before any
operations are executed.

Currently we're setting current->fs->umask at the time we decode the
compound. In theory a single compound could contain multiple creates
each setting a umask. In that case we'd end up using whichever umask
was passed in the *last* operation as the umask for all the creates,
whether that was correct or not.

So, we should just be saving the umask at decode time and waiting to set
it until we actually process the corresponding operation.

In practice it's unlikely any client would do multiple creates in a
single compound. And even if it did they'd likely be from the same
process (hence carry the same umask). So this is a little academic, but
we should get it right anyway.

Fixes: 47057abde515 (nfsd: add support for the umask attribute)
Cc: stable@vger.kernel.org
Reported-by: Lucash Stach <l.stach@pengutronix.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
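
A minimal userspace sketch of the "save at decode, apply at execute" idea
from this entry. All names here (create_op, worker, NO_UMASK) are
hypothetical, and restoring the previous value is one possible way to
avoid leaking state to a later request; the kernel code may differ.

```c
#include <assert.h>
#include <stddef.h>

#define NO_UMASK (-1)	/* hypothetical marker: the op carried no umask */

struct create_op {
	int saved_umask;	/* recorded at decode time, not yet applied */
	int mode;		/* mode bits requested by the client */
	int result_mode;	/* mode after the umask has been applied */
};

struct worker {
	int umask;		/* per-thread state, stand-in for current->fs->umask */
};

/* Decode phase: only record the value inside the operation itself. */
static void decode_create(struct create_op *op, int mode, int umask)
{
	assert(op != NULL);
	op->saved_umask = umask;
	op->mode = mode;
	op->result_mode = 0;
}

/*
 * Execute phase: apply this operation's own umask, then put the thread
 * state back so an unrelated later request cannot inherit it.
 */
static void execute_create(struct worker *w, struct create_op *op)
{
	int old = w->umask;

	if (op->saved_umask != NO_UMASK)
		w->umask = op->saved_umask;
	op->result_mode = op->mode & ~w->umask;
	w->umask = old;
}
```

With this split, two creates decoded in one pass each keep their own
umask, rather than both using whichever value was decoded last.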


# 87c5942e 28-Mar-2018 Chuck Lever <chuck.lever@oracle.com>

nfsd: Add I/O trace points in the NFSv4 read proc

NFSv4 read compound processing invokes nfsd_splice_read and
nfs_readv directly, so the trace points currently in nfsd_read are
not invoked for NFSv4 reads.

Move the NFSD READ trace points to common helpers so that NFSv4
reads are captured.

Also, record any local I/O error that occurs, the total count of
bytes that were actually returned, and whether splice or vectored
read was used.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# edcc8452 07-Mar-2018 J. Bruce Fields <bfields@redhat.com>

nfsd: remove unused "cp_consecutive" field

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 2285ae76 22-Jan-2018 Arnd Bergmann <arnd@arndb.de>

NFSD: hide unused svcxdr_dupstr()

There is now only one caller left for svcxdr_dupstr() and this is inside
of an #ifdef, so we can get a warning when the option is disabled:

fs/nfsd/nfs4xdr.c:241:1: error: 'svcxdr_dupstr' defined but not used [-Werror=unused-function]

This changes the remaining caller to use a nicer IS_ENABLED() check,
which lets the compiler drop the unused code silently.

Fixes: e40d99e6183e ("NFSD: Clean up symlink argument XDR decoders")
Suggested-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
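
The difference between an #ifdef and a plain C condition can be sketched
as follows. FEATURE_LABELS is a hypothetical stand-in for a Kconfig
option, and dup_str()/decode_label() are illustrative names; the kernel's
IS_ENABLED() macro is more elaborate than a bare 0/1 constant.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a configuration option; 0 means disabled. */
#define FEATURE_LABELS 0

/* Always compiled, and always has a visible caller below. */
static char *dup_str(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/*
 * A C condition instead of #ifdef: with the option off the call is dead
 * code that an optimizing compiler can discard, yet dup_str() still
 * counts as used, so no unused-function warning is issued.
 */
static char *decode_label(const char *raw)
{
	assert(raw != NULL);
	if (!FEATURE_LABELS)
		return NULL;
	return dup_str(raw);
}
```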


# 39ca1bf6 03-Jan-2018 Amir Goldstein <amir73il@gmail.com>

nfsd: store stat times in fill_pre_wcc() instead of inode times

The time values in stat and inode may differ for overlayfs and stat time
values are the correct ones to use. This is also consistent with the fact
that fill_post_wcc() also stores stat time values.

This means introducing a stat call that could fail, where previously we
were just copying values out of the inode. To be conservative about
changing behavior, we fall back to copying values out of the inode in
the error case. It might be better just to clear fh_pre_saved (though
note the BUG_ON in set_change_info).

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 0078117c 14-Nov-2017 J. Bruce Fields <bfields@redhat.com>

nfsd: return RESOURCE not GARBAGE_ARGS on too many ops

A client that sends more than a hundred ops in a single compound
currently gets an rpc-level GARBAGE_ARGS error.

It would be more helpful to return NFS4ERR_RESOURCE, since that gives
the client a better idea how to recover (for example by splitting up the
compound into smaller compounds).

This is all a bit academic since we've never actually seen a reason for
clients to send such long compounds, but we may as well fix it.

While we're there, just use NFSD4_MAX_OPS_PER_COMPOUND == 16, the
constant we already use in the 4.1 case, instead of hard-coding 100.
Chances anyone actually uses even 16 ops per compound are small enough
that I think there's a negligible risk of any regression.

This fixes pynfs test COMP6.

Reported-by: "Lu, Xinyu" <luxy.fnst@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# eae03e2a 18-Aug-2017 Chuck Lever <chuck.lever@oracle.com>

nfsd: Incoming xdr_bufs may have content in tail buffer

Since the beginning, svcsock has built a received RPC Call message
by populating the xdr_buf's head, then placing the remaining
message bytes in the xdr_buf's page list. The xdr_buf's tail is
never populated.

This means that an NFSv4 COMPOUND containing an NFS WRITE operation
plus trailing operations has a page list that contains the WRITE
data payload followed by the trailing operations. NFSv4 XDR decoders
will not look in the xdr_buf's tail, ever, because svcsock never put
anything there.

To support transports that can pass the write payload in the
xdr_buf's pagelist and trailing content in the xdr_buf's tail,
introduce logic in READ_BUF that switches to the xdr_buf's tail vec
when the decoder runs out of content in rq_arg.pages.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# c1df609d 01-Aug-2017 Chuck Lever <chuck.lever@oracle.com>

nfsd: Const-ify NFSv4 encoding and decoding ops arrays

Close an attack vector by moving the arrays of encoding and decoding
methods to read-only memory.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# bac966d6 06-May-2017 J. Bruce Fields <bfields@redhat.com>

nfsd4: individual encoders no longer see error cases

With a few exceptions, most individual encoders don't handle error
cases.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# b7571e4c 06-May-2017 J. Bruce Fields <bfields@redhat.com>

nfsd4: skip encoder in trivial error cases

Most encoders do nothing in the error case. But they can still screw
things up in that case: most errors happen very early in rpc processing,
possibly before argument fields are filled in and bounds-tested, so
encoders that do anything other than immediately bail on error can
easily crash in odd error cases.

So just handle errors centrally most of the time to remove the chance of
error.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
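
A sketch of handling errors centrally so that per-operation encoders
never see them. The structure and function names are hypothetical; the
point is only that the dispatcher checks the status first and that the
encoder table can be const.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct op_result {
	uint32_t opnum;		/* index into the encoder table */
	uint32_t status;	/* 0 on success, nonzero on error */
	uint32_t value;		/* only meaningful when status == 0 */
};

typedef void (*encode_fn)(const struct op_result *res, uint32_t *out);

static void encode_value(const struct op_result *res, uint32_t *out)
{
	*out = res->value;
}

/* const table: the function pointers can be placed in read-only memory. */
static const encode_fn encoders[] = { encode_value };
#define NUM_ENCODERS (sizeof(encoders) / sizeof(encoders[0]))

/* Returns 1 if the per-op encoder ran, 0 if it was skipped. */
static int encode_operation(const struct op_result *res, uint32_t *out)
{
	assert(res != NULL && out != NULL);
	if (res->status != 0)
		return 0;	/* error: other fields may be unvalidated */
	if (res->opnum >= NUM_ENCODERS)
		return 0;	/* no encoder registered for this op */
	encoders[res->opnum](res, out);
	return 1;
}
```

Because the status check happens before any field is read, an operation
that failed early, with out-of-range arguments still in place, cannot
lead the encoder astray.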


# 34b1744c 05-May-2017 J. Bruce Fields <bfields@redhat.com>

nfsd4: define ->op_release for compound ops

Run a separate ->op_release function if necessary instead of depending
on the xdr encoder to do this.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# f4f9ef4a 06-Jul-2017 J. Bruce Fields <bfields@redhat.com>

nfsd4: opdesc will be useful outside nfs4proc.c

Trivial cleanup, no change in behavior.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# fc788f64 18-Aug-2017 Chuck Lever <chuck.lever@oracle.com>

nfsd: Limit end of page list when decoding NFSv4 WRITE

When processing an NFSv4 WRITE operation, argp->end should never
point past the end of the data in the final page of the page list.
Otherwise, nfsd4_decode_compound can walk into uninitialized memory.

More critical, nfsd4_decode_write is failing to increment argp->pagelen
when it increments argp->pagelist. This can cause later xdr decoders
to assume more data is available than really is, which can cause server
crashes on malformed requests.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# d16d1867 08-May-2017 Christoph Hellwig <hch@lst.de>

sunrpc: properly type pc_encode callbacks

Drop the resp argument as it can trivially be derived from the rqstp
argument. With that all functions now have the same prototype, and we
can remove the unsafe casting to kxdrproc_t.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>


# cc6acc20 08-May-2017 Christoph Hellwig <hch@lst.de>

sunrpc: properly type pc_decode callbacks

Drop the argp argument as it can trivially be derived from the rqstp
argument. With that all functions now have the same prototype, and we
can remove the unsafe casting to kxdrproc_t.

Signed-off-by: Christoph Hellwig <hch@lst.de>


# 1150ded8 08-May-2017 Christoph Hellwig <hch@lst.de>

sunrpc: properly type pc_release callbacks

Drop the p and resp arguments as they are always NULL or can trivially
be derived from the rqstp argument. With that all functions now have the
same prototype, and we can remove the unsafe casting to kxdrproc_t.

Signed-off-by: Christoph Hellwig <hch@lst.de>


# 630458e7 11-May-2017 J. Bruce Fields <bfields@redhat.com>

nfsd4: factor ctime into change attribute

Factoring ctime into the nfsv4 change attribute gives us better
properties than just i_version alone.

Eventually we'll likely also expose this (as opposed to raw i_version)
to userspace, at which point we'll want to move it to a common helper,
called from either userspace or individual filesystems. For now, nfsd
is the only user.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 63f8de37 08-May-2017 Christoph Hellwig <hch@lst.de>

sunrpc: properly type pc_encode callbacks

Drop the resp argument as it can trivially be derived from the rqstp
argument. With that all functions now have the same prototype, and we
can remove the unsafe casting to kxdrproc_t.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>


# 026fec7e 08-May-2017 Christoph Hellwig <hch@lst.de>

sunrpc: properly type pc_decode callbacks

Drop the argp argument as it can trivially be derived from the rqstp
argument. With that all functions now have the same prototype, and we
can remove the unsafe casting to kxdrproc_t.

Signed-off-by: Christoph Hellwig <hch@lst.de>


# 8537488b 08-May-2017 Christoph Hellwig <hch@lst.de>

sunrpc: properly type pc_release callbacks

Drop the p and resp arguments as they are always NULL or can trivially
be derived from the rqstp argument. With that all functions now have the
same prototype, and we can remove the unsafe casting to kxdrproc_t.

Signed-off-by: Christoph Hellwig <hch@lst.de>


# b26b78cb 09-May-2017 Trond Myklebust <trond.myklebust@primarydata.com>

nfsd: Fix up the "supattr_exclcreat" attributes

If an NFSv4 client asks us for the supattr_exclcreat, then we must
not return attributes that are unsupported by this minor version.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Fixes: 75976de6556f ("NFSD: Return word2 bitmask if setting security..,")
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# f961e3f2 05-May-2017 J. Bruce Fields <bfields@redhat.com>

nfsd: encoders mustn't use uninitialized values in error cases

In error cases, lgp->lg_layout_type may be out of bounds; so we
shouldn't be using it until after the check of nfserr.

This was seen to crash nfsd threads when the server receives a LAYOUTGET
request with a large layout type.

GETDEVICEINFO has the same problem.

Reported-by: Ari Kauppi <Ari.Kauppi@synopsys.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# a528d35e 31-Jan-2017 David Howells <dhowells@redhat.com>

statx: Add a system call to make enhanced file info available

Add a system call to make extended file information available, including
file creation and some attribute flags where available through the
underlying filesystem.

The getattr inode operation is altered to take two additional arguments: a
u32 request_mask and an unsigned int flags that indicate the
synchronisation mode. This change is propagated to the vfs_getattr*()
function.

Functions like vfs_stat() are now inline wrappers around new functions
vfs_statx() and vfs_statx_fd() to reduce stack usage.

========
OVERVIEW
========

The idea was initially proposed as a set of xattrs that could be retrieved
with getxattr(), but the general preference proved to be for a new syscall
with an extended stat structure.

A number of requests were gathered for features to be included. The
following have been included:

(1) Make the fields a consistent size on all arches and make them large.

(2) Spare space, request flags and information flags are provided for
future expansion.

(3) Better support for the y2038 problem [Arnd Bergmann] (tv_sec is an
__s64).

(4) Creation time: The SMB protocol carries the creation time, which could
be exported by Samba, which will in turn help CIFS make use of
FS-Cache as that can be used for coherency data (stx_btime).

This is also specified in NFSv4 as a recommended attribute and could
be exported by NFSD [Steve French].

(5) Lightweight stat: Ask for just those details of interest, and allow a
netfs (such as NFS) to approximate anything not of interest, possibly
without going to the server [Trond Myklebust, Ulrich Drepper, Andreas
Dilger] (AT_STATX_DONT_SYNC).

(6) Heavyweight stat: Force a netfs to go to the server, even if it thinks
its cached attributes are up to date [Trond Myklebust]
(AT_STATX_FORCE_SYNC).

And the following have been left out for future extension:

(7) Data version number: Could be used by userspace NFS servers [Aneesh
Kumar].

Can also be used to modify fill_post_wcc() in NFSD which retrieves
i_version directly, but has just called vfs_getattr(). It could get
it from the kstat struct if it used vfs_xgetattr() instead.

(There's disagreement on the exact semantics of a single field, since
not all filesystems do this the same way).

(8) BSD stat compatibility: Including more fields from the BSD stat such
as creation time (st_btime) and inode generation number (st_gen)
[Jeremy Allison, Bernd Schubert].

(9) Inode generation number: Useful for FUSE and userspace NFS servers
[Bernd Schubert].

(This was asked for but later deemed unnecessary with the
open-by-handle capability available and caused disagreement as to
whether it's a security hole or not).

(10) Extra coherency data may be useful in making backups [Andreas Dilger].

(No particular data were offered, but things like last backup
timestamp, the data version number and the DOS archive bit would come
into this category).

(11) Allow the filesystem to indicate what it can/cannot provide: A
filesystem can now say it doesn't support a standard stat feature if
that isn't available, so if, for instance, inode numbers or UIDs don't
exist or are fabricated locally...

(This requires a separate system call - I have an fsinfo() call idea
for this).

(12) Store a 16-byte volume ID in the superblock that can be returned in
struct xstat [Steve French].

(Deferred to fsinfo).

(13) Include granularity fields in the time data to indicate the
granularity of each of the times (NFSv4 time_delta) [Steve French].

(Deferred to fsinfo).

(14) FS_IOC_GETFLAGS value. These could be translated to BSD's st_flags.
Note that the Linux IOC flags are a mess and filesystems such as Ext4
define flags that aren't in linux/fs.h, so translation in the kernel
may be a necessity (or, possibly, we provide the filesystem type too).

(Some attributes are made available in stx_attributes, but the general
feeling was that the IOC flags were too ext[234]-specific and shouldn't
be exposed through statx this way).

(15) Mask of features available on file (eg: ACLs, seclabel) [Brad Boyer,
Michael Kerrisk].

(Deferred, probably to fsinfo. Finding out if there's an ACL or
seclabel might require extra filesystem operations).

(16) Femtosecond-resolution timestamps [Dave Chinner].

(A __reserved field has been left in the statx_timestamp struct for
this - if there proves to be a need).

(17) A set multiple attributes syscall to go with this.

===============
NEW SYSTEM CALL
===============

The new system call is:

    int ret = statx(int dfd,
                    const char *filename,
                    unsigned int flags,
                    unsigned int mask,
                    struct statx *buffer);

The dfd, filename and flags parameters indicate the file to query, in a
similar way to fstatat(). There is no equivalent of lstat() as that can be
emulated with statx() by passing AT_SYMLINK_NOFOLLOW in flags. There is
also no equivalent of fstat() as that can be emulated by passing a NULL
filename to statx() with the fd of interest in dfd.

Whether or not statx() synchronises the attributes with the backing store
can be controlled by OR'ing a value into the flags argument (this typically
only affects network filesystems):

(1) AT_STATX_SYNC_AS_STAT tells statx() to behave as stat() does in this
respect.

(2) AT_STATX_FORCE_SYNC will require a network filesystem to synchronise
its attributes with the server - which might require data writeback to
occur to get the timestamps correct.

(3) AT_STATX_DONT_SYNC will suppress synchronisation with the server in a
network filesystem. The resulting values should be considered
approximate.

mask is a bitmask indicating the fields in struct statx that are of
interest to the caller. The user should set this to STATX_BASIC_STATS to
get the basic set returned by stat(). It should be noted that asking for
more information may entail extra I/O operations.

buffer points to the destination for the data. This must be 256 bytes in
size.

======================
MAIN ATTRIBUTES RECORD
======================

The following structures are defined in which to return the main attribute
set:

    struct statx_timestamp {
        __s64   tv_sec;
        __s32   tv_nsec;
        __s32   __reserved;
    };

    struct statx {
        __u32   stx_mask;
        __u32   stx_blksize;
        __u64   stx_attributes;
        __u32   stx_nlink;
        __u32   stx_uid;
        __u32   stx_gid;
        __u16   stx_mode;
        __u16   __spare0[1];
        __u64   stx_ino;
        __u64   stx_size;
        __u64   stx_blocks;
        __u64   __spare1[1];
        struct statx_timestamp  stx_atime;
        struct statx_timestamp  stx_btime;
        struct statx_timestamp  stx_ctime;
        struct statx_timestamp  stx_mtime;
        __u32   stx_rdev_major;
        __u32   stx_rdev_minor;
        __u32   stx_dev_major;
        __u32   stx_dev_minor;
        __u64   __spare2[14];
    };

The defined bits in request_mask and stx_mask are:

    STATX_TYPE          Want/got stx_mode & S_IFMT
    STATX_MODE          Want/got stx_mode & ~S_IFMT
    STATX_NLINK         Want/got stx_nlink
    STATX_UID           Want/got stx_uid
    STATX_GID           Want/got stx_gid
    STATX_ATIME         Want/got stx_atime{,_ns}
    STATX_MTIME         Want/got stx_mtime{,_ns}
    STATX_CTIME         Want/got stx_ctime{,_ns}
    STATX_INO           Want/got stx_ino
    STATX_SIZE          Want/got stx_size
    STATX_BLOCKS        Want/got stx_blocks
    STATX_BASIC_STATS   [The stuff in the normal stat struct]
    STATX_BTIME         Want/got stx_btime{,_ns}
    STATX_ALL           [All currently available stuff]

stx_btime is the file creation time, stx_mask is a bitmask indicating the
data provided and __spares*[] are where as-yet undefined fields can be
placed.

Time fields are structures with separate seconds and nanoseconds fields
plus a reserved field in case we want to add even finer resolution. Note
that times will be negative if before 1970; in such a case, the nanosecond
fields will also be negative if not zero.

The bits defined in the stx_attributes field convey information about a
file, how it is accessed, where it is and what it does. The following
attributes map to FS_*_FL flags and are the same numerical value:

    STATX_ATTR_COMPRESSED   File is compressed by the fs
    STATX_ATTR_IMMUTABLE    File is marked immutable
    STATX_ATTR_APPEND       File is append-only
    STATX_ATTR_NODUMP       File is not to be dumped
    STATX_ATTR_ENCRYPTED    File requires key to decrypt in fs

Within the kernel, the supported flags are listed by:

KSTAT_ATTR_FS_IOC_FLAGS

[Are any other IOC flags of sufficient general interest to be exposed
through this interface?]

New flags include:

STATX_ATTR_AUTOMOUNT Object is an automount trigger

These are for the use of GUI tools that might want to mark files specially,
depending on what they are.

Fields in struct statx come in a number of classes:

(0) stx_dev_*, stx_blksize.

These are local system information and are always available.

(1) stx_mode, stx_nlink, stx_uid, stx_gid, stx_[amc]time, stx_ino,
stx_size, stx_blocks.

These will be returned whether the caller asks for them or not. The
corresponding bits in stx_mask will be set to indicate whether they
actually have valid values.

If the caller didn't ask for them, then they may be approximated. For
example, NFS won't waste any time updating them from the server,
unless as a byproduct of updating something requested.

If the values don't actually exist for the underlying object (such as
UID or GID on a DOS file), then the bit won't be set in the stx_mask,
even if the caller asked for the value. In such a case, the returned
value will be a fabrication.

Note that there are instances where the type might not be valid, for
instance Windows reparse points.

(2) stx_rdev_*.

This will be set only if stx_mode indicates we're looking at a
blockdev or a chardev, otherwise will be 0.

(3) stx_btime.

Similar to (1), except this will be set to 0 if it doesn't exist.

=======
TESTING
=======

The following test program can be used to test the statx system call:

samples/statx/test-statx.c

Just compile and run, passing it paths to the files you want to examine.
The file is built automatically if CONFIG_SAMPLES is enabled.

Here's some example output. Firstly, an NFS directory that crosses to
another FSID. Note that the AUTOMOUNT attribute is set because transiting
this directory will cause d_automount to be invoked by the VFS.

[root@andromeda ~]# /tmp/test-statx -A /warthog/data
statx(/warthog/data) = 0
results=7ff
Size: 4096 Blocks: 8 IO Block: 1048576 directory
Device: 00:26 Inode: 1703937 Links: 125
Access: (3777/drwxrwxrwx) Uid: 0 Gid: 4041
Access: 2016-11-24 09:02:12.219699527+0000
Modify: 2016-11-17 10:44:36.225653653+0000
Change: 2016-11-17 10:44:36.225653653+0000
Attributes: 0000000000001000 (-------- -------- -------- -------- -------- -------- ---m---- --------)

Secondly, the result of automounting on that directory.

[root@andromeda ~]# /tmp/test-statx /warthog/data
statx(/warthog/data) = 0
results=7ff
Size: 4096 Blocks: 8 IO Block: 1048576 directory
Device: 00:27 Inode: 2 Links: 125
Access: (3777/drwxrwxrwx) Uid: 0 Gid: 4041
Access: 2016-11-24 09:02:12.219699527+0000
Modify: 2016-11-17 10:44:36.225653653+0000
Change: 2016-11-17 10:44:36.225653653+0000

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 7323f0d2 03-Feb-2017 Kinglong Mee <kinglongmee@gmail.com>

NFSD: Reserve adequate space for LOCKT operation

After tightening the OP_LOCKT reply size estimate, we can get warnings
like:

[11512.783519] RPC request reserved 124 but used 152
[11512.813624] RPC request reserved 108 but used 136

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# b8800921 29-Jan-2017 NeilBrown <neilb@suse.com>

NFSDv4: use export cache flushtime for changeid on V4ROOT objects.

If you change the set of filesystems that are exported, then
the contents of various directories in the NFSv4 pseudo-root
is likely to change. However the change-id of those
directories is currently tied to the underlying directory,
so the client may not see the changes in a timely fashion.

This patch changes the change-id number to be derived from the
"flush_time" of the export cache. Whenever any changes are
made to the set of exported filesystems, this flush_time is
updated. The result is that clients see changes to the set
of exported filesystems much more quickly, often immediately.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 32ddd944 02-Jan-2017 J. Bruce Fields <bfields@redhat.com>

nfsd: opt in to labeled nfs per export

Currently turning on NFSv4.2 results in 4.2 clients suddenly seeing the
individual file labels as they're set on the server. This is not what
they've previously seen, and not appropriate in many cases. (In
particular, if clients have heterogeneous security policies then one
client's labels may not even make sense to another.) Labeled NFS should
be opted in only in those cases when the administrator knows it makes
sense.

It's helpful to be able to turn 4.2 on by default, and otherwise the
protocol upgrade seems free of regressions. So, default labeled NFS to
off and provide an export flag to reenable it.

Users wanting labeled NFS support on an export will henceforth need to:

- make sure 4.2 support is enabled on client and server (as
before),
- upgrade the server nfs-utils to a version supporting the new
"security_label" export flag, and
- set that "security_label" flag on the export.

This commit may be seen as a regression to anyone currently depending
on security labels. We believe those cases are currently rare.

Reported-by: tibbs@math.uh.edu
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 5cf23dbb 11-Jan-2017 J. Bruce Fields <bfields@redhat.com>

nfsd: constify nfsd_suppatttrs

To keep me from accidentally writing to this again....

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 54bbb7d2 31-Dec-2016 Kinglong Mee <kinglongmee@gmail.com>

NFSD: pass an integer for stable type to nfsd_vfs_write

After fae5096ad217 "nfsd: assume writeable exportabled filesystems have
f_sync" we no longer modify this argument.

This is just cleanup, no change in functionality.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# dcd20869 11-Jan-2017 J. Bruce Fields <bfields@redhat.com>

nfsd: fix supported attributes for acl & labels

Oops--in 916d2d844afd I moved some constants into an array for
convenience, but here I'm accidentally writing to that array.

The effect is that if you ever encounter a filesystem lacking support
for ACLs or security labels, then all queries of supported attributes
will report that attribute as unsupported from then on.

Fixes: 916d2d844afd "nfsd: clean up supported attribute handling"
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
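
The bug class described in this entry, and the usual remedy, can be
sketched as follows: keep the shared table const and mask a per-request
copy instead. The bit positions and names below are hypothetical and do
not reflect the real attribute layout.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ATTR_ACL	(UINT32_C(1) << 12)	/* hypothetical bit positions */
#define ATTR_LABEL	(UINT32_C(1) << 16)

/* Shared table: const, so an accidental write fails to compile. */
static const uint32_t supported_attrs[3] = {
	ATTR_ACL | 0x3u, 0x1u, ATTR_LABEL,
};

/*
 * Build the supported-attribute bitmap for one filesystem by masking a
 * private copy; the shared table is left untouched for later queries.
 */
static void supported_for_fs(uint32_t out[3], int fs_has_acl,
			     int fs_has_label)
{
	assert(out != NULL);
	memcpy(out, supported_attrs, sizeof(supported_attrs));
	if (!fs_has_acl)
		out[0] &= ~ATTR_ACL;
	if (!fs_has_label)
		out[2] &= ~ATTR_LABEL;
}
```

A query against a filesystem lacking ACLs then has no effect on a later
query against one that supports them.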


# 47057abd 12-Jan-2016 Andreas Gruenbacher <agruenba@redhat.com>

nfsd: add support for the umask attribute

Clients can set the umask attribute when creating files to cause the
server to apply it always except when inheriting permissions from the
parent directory. That way, the new files will end up with the same
permissions as files created locally.

See https://tools.ietf.org/html/draft-ietf-nfsv4-umask-02 for more
details.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# fd4a0edf 09-Dec-2016 Miklos Szeredi <mszeredi@redhat.com>

vfs: replace calling i_op->readlink with vfs_readlink()

Also check d_is_symlink() in callers instead of inode->i_op->readlink
because following patches will allow NULL ->readlink for symlinks.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>


# e864c189 10-Jun-2016 J. Bruce Fields <bfields@redhat.com>

nfsd: catch errors in decode_fattr earlier

3c8e03166ae2 "NFSv4: do exact check about attribute specified" fixed
some handling of unsupported-attribute errors, but it also delayed
checking for unwriteable attributes till after we decode them. This
could lead to odd behavior in the case a client attempts to set an
attribute we don't know about followed by one we try to parse. In that
case the parser for the known attribute will attempt to parse the
unknown attribute. It should fail in some safe way, but the error might
at least be incorrect (probably bad_xdr instead of inval). So, it's
better to do that check at the start.

As far as I know this doesn't cause any problems with current clients
but it might be a minor issue e.g. if we encounter a future client that
supports a new attribute that we currently don't.

Cc: Yu Zhiguo <yuzg@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 916d2d84 18-Oct-2016 J. Bruce Fields <bfields@redhat.com>

nfsd: clean up supported attribute handling

Minor cleanup, no change in behavior.

Provide helpers for some common attribute bitmap operations. Drop some
comments that just echo the code.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 29ae7f9d 07-Sep-2016 Anna Schumaker <Anna.Schumaker@netapp.com>

NFSD: Implement the COPY call

I only implemented the sync version of this call, since it's the
easiest. I can simply call vfs_copy_range() and have the vfs do the
right thing for the filesystem being exported.

Signed-off-by: Anna Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# bec782b4 22-Sep-2016 Jeff Layton <jlayton@kernel.org>

nfsd: fix dprintk in nfsd4_encode_getdeviceinfo

nfserr is big-endian, so we should convert it to host-endian before
printing it.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
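
A userspace sketch of the conversion this entry refers to. Here ntohl()
stands in for the kernel's be32_to_cpu(), and the function names and the
status value used with them are illustrative assumptions.

```c
#include <arpa/inet.h>	/* htonl(), ntohl() */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Convert a wire-order (big-endian) status code to host order. */
static uint32_t status_to_host(uint32_t status_be)
{
	return ntohl(status_be);
}

/*
 * Log the status in host order. Printing status_be directly would show
 * a byte-swapped number on little-endian machines.
 */
static void log_status(uint32_t status_be)
{
	fprintf(stderr, "status %u\n", (unsigned int)status_to_host(status_be));
}
```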


# 8a4c3926 10-Jul-2016 Jeff Layton <jlayton@kernel.org>

nfsd: allow nfsd to advertise multiple layout types

If the underlying filesystem supports multiple layout types, then there
is little reason not to advertise that fact to clients and let them
choose what type to use.

Turn the ex_layout_type field into a bitfield. For each supported
layout type, we set a bit in that field. When the client requests a
layout, ensure that the bit for that layout type is set. When the
client requests attributes, send back a list of supported types.

Signed-off-by: Jeff Layton <jlayton@poochiereds.net>
Reviewed-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# ed941643 14-Jun-2016 Andrew Elble <aweits@rit.edu>

nfsd: implement machine credential support for some operations

This addresses the conundrum referenced in RFC5661 18.35.3,
and will allow clients to return state to the server using the
machine credentials.

The biggest part of the problem is that we need to allow the client
to send a compound op with integrity/privacy on mounts that don't
have it enabled.

Add server support for properly decoding and using spo_must_enforce
and spo_must_allow bits. Add support for machine credentials to be
used for CLOSE, OPEN_DOWNGRADE, LOCKU, DELEGRETURN,
and TEST/FREE STATEID.
Implement a check so as to not throw WRONGSEC errors when these
operations are used if integrity/privacy isn't turned on.

Without this, Linux clients with credentials that expired while holding
delegations were getting stuck in an endless loop.

Signed-off-by: Andrew Elble <aweits@rit.edu>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# ac503e4a 22-Mar-2016 Benjamin Coddington <bcodding@redhat.com>

nfsd: use short read as well as i_size to set eof

Use the result of a local read to determine when to set the eof flag. This
allows us to return the location of the end of the file atomically at the
time of the read.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
[bfields: add some documentation]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 4ce85c8c 01-Mar-2016 Chuck Lever <chuck.lever@oracle.com>

nfsd: Update NFS server comments related to RDMA support

The server does indeed now support NFSv4.1 on RDMA transports. It
does not support shifting an RDMA-capable TCP transport (such as
iWARP) to RDMA mode.

Reported-by: Shirley Ma <shirley.ma@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 4aed9c46 29-Feb-2016 J. Bruce Fields <bfields@redhat.com>

nfsd4: fix bad bounds checking

A number of spots in the xdr decoding follow a pattern like

n = be32_to_cpup(p++);
READ_BUF(n + 4);

where n is a u32. The only bounds checking is done in READ_BUF itself,
but since it's checking (n + 4), it won't catch cases where n is very
large, (u32)(-4) or higher. I'm not sure exactly what the consequences
are, but we've seen crashes soon after.

Instead, just break these up into two READ_BUF()s.

Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
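
The wraparound described in 4aed9c46 above can be reproduced outside the
kernel. The sketch below is an assumption-laden stand-in: have_bytes() is a
hypothetical replacement for the READ_BUF() bounds check, and the other two
functions contrast the combined check with the split check the commit
adopts.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for READ_BUF(): true if `need` bytes remain. */
static bool have_bytes(size_t remaining, uint32_t need)
{
	return need <= remaining;
}

/*
 * Flawed pattern: the sum is truncated to 32 bits, so for n >= (u32)(-4)
 * it wraps to a tiny value and the check passes.
 */
static bool check_combined(size_t remaining, uint32_t n)
{
	return have_bytes(remaining, (uint32_t)(n + 4));
}

/* Fixed pattern: check the two pieces separately so nothing can wrap. */
static bool check_split(size_t remaining, uint32_t n)
{
	if (!have_bytes(remaining, n))
		return false;
	return have_bytes(remaining - n, 4);
}
```

With 16 bytes left and n = 0xFFFFFFFC, check_combined() accepts the request
because n + 4 wraps to 0, while check_split() rejects it.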


# bbddca8e 07-Jan-2016 NeilBrown <neilb@suse.de>

nfsd: don't hold i_mutex over userspace upcalls

We need information about exports when crossing mountpoints during
lookup or NFSv4 readdir. If we don't already have that information
cached, we may have to ask (and wait for) rpc.mountd.

In both cases we currently hold the i_mutex on the parent of the
directory we're asking rpc.mountd about. We've seen situations where
rpc.mountd performs some operation on that directory that tries to take
the i_mutex again, resulting in deadlock.

With some care, we may be able to avoid that in rpc.mountd. But it
seems better just to avoid holding a mutex while waiting on userspace.

It appears that lookup_one_len is pretty much the only operation that
needs the i_mutex. So we could just drop the i_mutex elsewhere and do
something like

mutex_lock()
lookup_one_len()
mutex_unlock()

In many cases though the lookup would have been cached and not required
the i_mutex, so it's more efficient to create a lookup_one_len() variant
that only takes the i_mutex when necessary.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# ffa0160a 02-Dec-2015 Christoph Hellwig <hch@lst.de>

nfsd: implement the NFSv4.2 CLONE operation

This is basically a remote version of the btrfs CLONE operation,
so the implementation is fairly trivial. Made even more trivial
by stealing the XDR code and general framework from Anna Schumaker's
COPY prototype.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 75976de6 30-Jul-2015 Kinglong Mee <kinglongmee@gmail.com>

NFSD: Return word2 bitmask if setting security label in OPEN/CREATE

A security label can be set in an OPEN/CREATE request; nfsd should set
the corresponding bit in word2 of the bitmask if setting it succeeds.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 7d580722 30-Jul-2015 Kinglong Mee <kinglongmee@gmail.com>

nfsd: SUPPATTR_EXCLCREAT must be encoded before SECURITY_LABEL.

The encode order should follow the order defined by the bitmask.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 6896f15a 30-Jul-2015 Kinglong Mee <kinglongmee@gmail.com>

nfsd: Fix an FS_LAYOUT_TYPES/LAYOUT_TYPES encode bug

Currently we'll respond correctly to a request for either
FS_LAYOUT_TYPES or LAYOUT_TYPES, but not to a request for both
attributes simultaneously.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 0a2050d7 30-Jul-2015 Kinglong Mee <kinglongmee@gmail.com>

NFSD: Store parent's stat in a separate value

After commit ae7095a7c4 (nfsd4: helper function for getting mounted_on
ino) we ignore the return value from get_parent_attributes().

Also, the following FATTR4_WORD2_LAYOUT_BLKSIZE uses stat.blksize, so to
avoid overwriting that, use an independent value for the parent's
attributes.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# c2227a39 06-Jul-2015 Kinglong Mee <kinglongmee@gmail.com>

nfsd: Drop BUG_ON and ignore SECLABEL on absent filesystem

On an absent filesystem (one served by another server), we need to be
able to handle requests for certain attributes (like fs_locations, so
the client can find out which server does have the filesystem), but
others we can't.

We forgot to take that into account when adding another attribute
bitmask word for the SECURITY_LABEL attribute.

As a result, an export entry with the "refer" option can trigger:

[ 88.414272] kernel BUG at fs/nfsd/nfs4xdr.c:2249!
[ 88.414828] invalid opcode: 0000 [#1] SMP
[ 88.415368] Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache nfsd xfs libcrc32c iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi iosf_mbi ppdev btrfs coretemp crct10dif_pclmul crc32_pclmul crc32c_intel xor ghash_clmulni_intel raid6_pq vmw_balloon parport_pc parport i2c_piix4 shpchp vmw_vmci acpi_cpufreq auth_rpcgss nfs_acl lockd grace sunrpc vmwgfx drm_kms_helper ttm drm mptspi mptscsih serio_raw mptbase e1000 scsi_transport_spi ata_generic pata_acpi [last unloaded: nfsd]
[ 88.417827] CPU: 0 PID: 2116 Comm: nfsd Not tainted 4.0.7-300.fc22.x86_64 #1
[ 88.418448] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/20/2014
[ 88.419093] task: ffff880079146d50 ti: ffff8800785d8000 task.ti: ffff8800785d8000
[ 88.419729] RIP: 0010:[<ffffffffa04b3c10>] [<ffffffffa04b3c10>] nfsd4_encode_fattr+0x820/0x1f00 [nfsd]
[ 88.420376] RSP: 0000:ffff8800785db998 EFLAGS: 00010206
[ 88.421027] RAX: 0000000000000001 RBX: 000000000018091a RCX: ffff88006668b980
[ 88.421676] RDX: 00000000fffef7fc RSI: 0000000000000000 RDI: ffff880078d05000
[ 88.422315] RBP: ffff8800785dbb58 R08: ffff880078d043f8 R09: ffff880078d4a000
[ 88.422968] R10: 0000000000010000 R11: 0000000000000002 R12: 0000000000b0a23a
[ 88.423612] R13: ffff880078d05000 R14: ffff880078683100 R15: ffff88006668b980
[ 88.424295] FS: 0000000000000000(0000) GS:ffff88007c600000(0000) knlGS:0000000000000000
[ 88.424944] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 88.425597] CR2: 00007f40bc370f90 CR3: 0000000035af5000 CR4: 00000000001407f0
[ 88.426285] Stack:
[ 88.426921] ffff8800785dbaa8 ffffffffa049e4af ffff8800785dba08 ffffffff813298f0
[ 88.427585] ffff880078683300 ffff8800769b0de8 0000089d00000001 0000000087f805e0
[ 88.428228] ffff880000000000 ffff880079434a00 0000000000000000 ffff88006668b980
[ 88.428877] Call Trace:
[ 88.429527] [<ffffffffa049e4af>] ? exp_get_by_name+0x7f/0xb0 [nfsd]
[ 88.430168] [<ffffffff813298f0>] ? inode_doinit_with_dentry+0x210/0x6a0
[ 88.430807] [<ffffffff8123833e>] ? d_lookup+0x2e/0x60
[ 88.431449] [<ffffffff81236133>] ? dput+0x33/0x230
[ 88.432097] [<ffffffff8123f214>] ? mntput+0x24/0x40
[ 88.432719] [<ffffffff812272b2>] ? path_put+0x22/0x30
[ 88.433340] [<ffffffffa049ac87>] ? nfsd_cross_mnt+0xb7/0x1c0 [nfsd]
[ 88.433954] [<ffffffffa04b54e0>] nfsd4_encode_dirent+0x1b0/0x3d0 [nfsd]
[ 88.434601] [<ffffffffa04b5330>] ? nfsd4_encode_getattr+0x40/0x40 [nfsd]
[ 88.435172] [<ffffffffa049c991>] nfsd_readdir+0x1c1/0x2a0 [nfsd]
[ 88.435710] [<ffffffffa049a530>] ? nfsd_direct_splice_actor+0x20/0x20 [nfsd]
[ 88.436447] [<ffffffffa04abf30>] nfsd4_encode_readdir+0x120/0x220 [nfsd]
[ 88.437011] [<ffffffffa04b58cd>] nfsd4_encode_operation+0x7d/0x190 [nfsd]
[ 88.437566] [<ffffffffa04aa6dd>] nfsd4_proc_compound+0x24d/0x6f0 [nfsd]
[ 88.438157] [<ffffffffa0496103>] nfsd_dispatch+0xc3/0x220 [nfsd]
[ 88.438680] [<ffffffffa006f0cb>] svc_process_common+0x43b/0x690 [sunrpc]
[ 88.439192] [<ffffffffa0070493>] svc_process+0x103/0x1b0 [sunrpc]
[ 88.439694] [<ffffffffa0495a57>] nfsd+0x117/0x190 [nfsd]
[ 88.440194] [<ffffffffa0495940>] ? nfsd_destroy+0x90/0x90 [nfsd]
[ 88.440697] [<ffffffff810bb728>] kthread+0xd8/0xf0
[ 88.441260] [<ffffffff810bb650>] ? kthread_worker_fn+0x180/0x180
[ 88.441762] [<ffffffff81789e58>] ret_from_fork+0x58/0x90
[ 88.442322] [<ffffffff810bb650>] ? kthread_worker_fn+0x180/0x180
[ 88.442879] Code: 0f 84 93 05 00 00 83 f8 ea c7 85 a0 fe ff ff 00 00 27 30 0f 84 ba fe ff ff 85 c0 0f 85 a5 fe ff ff e9 e3 f9 ff ff 0f 1f 44 00 00 <0f> 0b 66 0f 1f 44 00 00 be 04 00 00 00 4c 89 ef 4c 89 8d 68 fe
[ 88.444052] RIP [<ffffffffa04b3c10>] nfsd4_encode_fattr+0x820/0x1f00 [nfsd]
[ 88.444658] RSP <ffff8800785db998>
[ 88.445232] ---[ end trace 6cb9d0487d94a29f ]---

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 68e8bb03 18-Jun-2015 Christoph Hellwig <hch@lst.de>

nfsd: wrap too long lines in nfsd4_encode_read

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 96bcad50 18-Jun-2015 Christoph Hellwig <hch@lst.de>

nfsd: fput rd_file from XDR encode context

Remove the hack where we fput the read-specific file in generic code.
Instead we can do it in nfsd4_encode_read as that gets called for all
error cases as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# af90f707 18-Jun-2015 Christoph Hellwig <hch@lst.de>

nfsd: take struct file setup fully into nfs4_preprocess_stateid_op

This patch changes nfs4_preprocess_stateid_op so it always returns
a valid struct file if it has been asked for that. For that we
now allocate a temporary struct file for special stateids, and check
permissions if we got the file structure from the stateid. This
ensures that all callers will get their handling of special stateids
right, and avoids code duplication.

There is a little wart in here because the read code needs to know
if we allocated a file structure so that it can copy around the
read-ahead parameters. In the long run we should probably aim to
cache full file structures used with special stateids instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# e749a462 18-Jun-2015 Christoph Hellwig <hch@lst.de>

nfsd: clean up raparams handling

Refactor the raparam hash helpers to just deal with the raparms,
and keep opening/closing files separate from that.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 0c9d65e7 24-Apr-2015 Andreas Gruenbacher <andreas.gruenbacher@gmail.com>

nfsd: Checking for acl support does not require fetching any acls

Whether or not a file system supports acls can be determined with
IS_POSIXACL(inode) and does not require trying to fetch any acls; the code for
computing the supported_attrs and aclsupport attributes can be simplified.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 6e4891dc 03-Apr-2015 J. Bruce Fields <bfields@redhat.com>

nfsd4: fix READ permission checking

In the case we already have a struct file (derived from a stateid), we
still need to do permission-checking; otherwise an unauthorized user
could gain access to a file by sniffing or guessing somebody else's
stateid.

Cc: stable@vger.kernel.org
Fixes: dc97618ddda9 "nfsd4: separate splice and readv cases"
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 2b0143b5 17-Mar-2015 David Howells <dhowells@redhat.com>

VFS: normal filesystems (and lustre): d_inode() annotations

that's the bulk of filesystem drivers dealing with inodes of their own

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 1ec8c0c4 28-Mar-2015 Kinglong Mee <kinglongmee@gmail.com>

nfsd: Remove duplicate macro define for max sec label length

NFS4_MAXLABELLEN is already defined as the maximum sec label length; use it directly.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# b77a4b2e 15-Mar-2015 Kinglong Mee <kinglongmee@gmail.com>

NFSD: Using path_equal() for checking two paths

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 376675da 22-Mar-2015 Kinglong Mee <kinglongmee@gmail.com>

NFSD: Take care of the return value from nfsd4_encode_stateid

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# db59c0ef 19-Mar-2015 Kinglong Mee <kinglongmee@gmail.com>

NFSD: Take care of the return value from nfsd4_decode_stateid

Return status after nfsd4_decode_stateid failed.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 9cf514cc 05-May-2014 Christoph Hellwig <hch@lst.de>

nfsd: implement pNFS operations

Add support for the GETDEVICEINFO, LAYOUTGET, LAYOUTCOMMIT and
LAYOUTRETURN NFSv4.1 operations, as well as backing code to manage
outstanding layouts and devices.

Layout management is very straightforward, with a nfs4_layout_stateid
structure that extends nfs4_stid to manage layout stateids as the
top-level structure. It is linked into the nfs4_file and nfs4_client
structures like the other stateids, and contains a linked list of
layouts that hang off the stateid. The actual layout operations are
implemented in layout drivers that are not part of this commit, but
will be added later.

The worst part of this commit is the management of the pNFS device IDs,
which suffers from a specification that is not sanely implementable due
to the fact that the device-IDs are global and not bound to an export,
and have a small enough size so that we can't store the fsid portion of
a file handle, and must never be reused. As we still do need to perform all
export authentication and validation checks on a device ID passed to
GETDEVICEINFO we are caught between a rock and a hard place. To work
around this issue we add a new hash that maps from a 64-bit integer to a
fsid so that we can look up the export to authenticate against it,
a 32-bit integer as a generation that we can bump when changing the device,
and a currently unused 32-bit integer that could be used in the future
to handle more than a single device per export. Entries in this hash
table are never deleted as we can't reuse the ids anyway, and would have
a severe lifetime problem anyway as Linux export structures are temporary
structures that can go away under load.

Parts of the XDR data, structures and marshaling/unmarshaling code, as
well as many concepts are derived from the old pNFS server implementation
from Andy Adamson, Benny Halevy, Dean Hildebrand, Marc Eshel, Fred Isaman,
Mike Sager, Ricardo Labiaga and many others.

Signed-off-by: Christoph Hellwig <hch@lst.de>


# 4c94e13e 21-Jan-2015 Christoph Hellwig <hch@lst.de>

nfsd: factor out a helper to decode nfstime4 values

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 0ec016e3 19-Dec-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: tweak rd_dircount accounting

RFC 3530 14.2.24 says

This value represents the length of the names of the directory
entries and the cookie value for these entries. This length
represents the XDR encoding of the data (names and cookies)...

The "xdr encoding" of the name should probably include the 4 bytes for
the length.

But this is all just a hint so not worth e.g. backporting to stable.

Also reshuffle some lines to more clearly group together the
dircount-related code.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
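
The per-entry accounting discussed in 0ec016e3 above can be written out as
arithmetic. The sketch below assumes the usual XDR layout (8-byte cookie,
4-byte length word, name padded to a 4-byte boundary); the helper names are
hypothetical and do not come from the patch.

```c
#include <stddef.h>

/* Round a byte count up to the next multiple of 4, as XDR requires. */
static size_t xdr_padded(size_t len)
{
	return (len + 3) & ~(size_t)3;
}

/*
 * Bytes one directory entry contributes to the dircount hint:
 * the cookie, the name's length word, and the padded name itself.
 */
static size_t dirent_name_and_cookie_size(size_t namelen)
{
	return 8 + 4 + xdr_padded(namelen);
}
```

Under these assumptions a 5-character name costs 8 + 4 + 8 = 20 bytes, of
which 4 are the length word that the commit argues should be counted.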


# bf7491f1 07-Dec-2014 Benjamin Coddington <bcodding@redhat.com>

nfsd4: fix xdr4 count of server in fs_location4

Fix a bug where nfsd4_encode_components_esc() incorrectly calculates the
length of server array in fs_location4--note that it is a count of the
number of array elements, not a length in bytes.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Fixes: 082d4bd72a45 (nfsd4: "backfill" using write_bytes_to_xdr_buf)
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 5a64e569 07-Dec-2014 Benjamin Coddington <bcodding@redhat.com>

nfsd4: fix xdr4 inclusion of escaped char

Fix a bug where nfsd4_encode_components_esc() includes the esc_end char as
an additional string encoding.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Cc: stable@vger.kernel.org
Fixes: e7a0444aef4a "nfsd: add IPv6 addr escaping to fs_location hosts"
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 779fb0f3 19-Nov-2014 Jeff Layton <jlayton@kernel.org>

sunrpc: move rq_splice_ok flag into rq_flags

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# a455589f 21-Oct-2014 Al Viro <viro@zeniv.linux.org.uk>

assorted conversions to %p[dD]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# b0cb9085 07-Nov-2014 Anna Schumaker <Anna.Schumaker@Netapp.com>

nfsd: Add DEALLOCATE support

DEALLOCATE only returns a status value, meaning we can use the noop()
xdr encoder to reply to the client.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 95d871f0 07-Nov-2014 Anna Schumaker <Anna.Schumaker@Netapp.com>

nfsd: Add ALLOCATE support

The ALLOCATE operation is used to preallocate space in a file. I can do
this by using vfs_fallocate() to do the actual preallocation.

ALLOCATE only returns a status indicator, so we don't need to write a
special encode() function.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 15b23ef5 24-Sep-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: fix corruption of NFSv4 read data

The calculation of page_ptr here is wrong in the case the read doesn't
start at an offset that is a multiple of a page.

The result is that nfs4svc_encode_compoundres sets rq_next_page to a
value one too small, and then the loop in svc_free_res_pages may
incorrectly fail to clear a page pointer in rq_respages[].

Pages left in rq_respages[] are available for the next rpc request to
use, so xdr data may be written to that page, which may hold data still
waiting to be transmitted to the client or data in the page cache.

The observed result was silent data corruption seen on an NFSv4 client.

We tag this as "fixing" 05638dc73af2 because that commit exposed this
bug, though the incorrect calculation predates it.

Particular thanks to Andrea Arcangeli and David Gilbert for analysis and
testing.

Fixes: 05638dc73af2 "nfsd4: simplify server xdr->next_page use"
Cc: stable@vger.kernel.org
Reported-by: Andrea Arcangeli <aarcange@redhat.com>
Tested-by: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 24bab491 26-Sep-2014 Anna Schumaker <Anna.Schumaker@netapp.com>

NFSD: Implement SEEK

This patch adds server support for the NFS v4.2 operation SEEK, which
returns the position of the next hole or data segment in a file.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 87a15a80 26-Sep-2014 Anna Schumaker <Anna.Schumaker@netapp.com>

NFSD: Add generic v4.2 infrastructure

It's cleaner to introduce everything at once and have the server reply
with "not supported" than it would be to introduce extra operations when
implementing a specific one in the middle of the list.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# aee37764 20-Aug-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: fix rd_dircount enforcement

Commit 3b299709091b "nfsd4: enforce rd_dircount" totally misunderstood
rd_dircount; it refers to total non-attribute bytes returned, not number
of directory entries returned.

Bring the code into agreement with RFC 3530 section 14.2.24.

Cc: stable@vger.kernel.org
Fixes: 3b299709091b "nfsd4: enforce rd_dircount"
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# f7b43d0c 12-Aug-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: reserve adequate space for LOCK op

As of 8c7424cff6 "nfsd4: don't try to encode conflicting owner if low
on space", we permit the server to process a LOCK operation even if
there might not be space to return the conflicting lockowner, because
we've made returning the conflicting lockowner optional.

However, the rpc server still wants to know the most we might possibly
return, so we need to take into account the possible conflicting
lockowner in the svc_reserve_space() call here.

Symptoms were log messages like "RPC request reserved 88 but used 108".

Fixes: 8c7424cff6 "nfsd4: don't try to encode conflicting owner if low on space"
Reported-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 1383bf37 11-Aug-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: remove obsolete comment

We do what Neil suggests now.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 58fb12e6 29-Jul-2014 Jeff Layton <jlayton@kernel.org>

nfsd: Add a mutex to protect the NFSv4.0 open owner replay cache

We don't want to rely on the client_mutex for protection in the case of
NFSv4 open owners. Instead, we add a mutex that will only be taken for
NFSv4.0 state mutating operations, and that will be released once the
entire compound is done.

Also, ensure that nfsd4_cstate_assign_replay/nfsd4_cstate_clear_replay
take a reference to the stateowner when they are using it for NFSv4.0
open and lock replay caching.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# f98bac5a 07-Jul-2014 Kinglong Mee <kinglongmee@gmail.com>

NFSD: Fix crash encoding lock reply on 32-bit

Commit 8c7424cff6 "nfsd4: don't try to encode conflicting owner if low
on space" forgot to free conf->data in nfsd4_encode_lockt and before
setting conf->data to NULL in nfsd4_encode_lock_denied, causing a leak.

Worse, kfree() can be called on an uninitialized pointer in the case of
a successful lock (or one that fails for a reason other than a conflict).

(Note that lock->lk_denied.ld_owner.data appears it should be zero here,
until you notice that it's one arm of a union the other arm of which is
written to in the successful case by the

memcpy(&lock->lk_resp_stateid, &lock_stp->st_stid.sc_stateid,
sizeof(stateid_t));

in nfsd4_lock(). In the 32-bit case this overwrites ld_owner.data.)

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Fixes: 8c7424cff6 "nfsd4: don't try to encode conflicting owner if low on space"
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
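
The union aliasing behind f98bac5a above can be demonstrated with a
simplified layout. The types below are hypothetical stand-ins, not the real
nfsd structures, and in this simplified layout the two arms overlap on both
32-bit and 64-bit builds. The point is only that writing the stateid arm
leaves a non-NULL bit pattern where the other arm keeps its data pointer.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the two arms of the lock result. */
struct owner {
	uint32_t len;
	char *data;          /* freed with kfree() in the real code */
};

struct stateid {
	unsigned char bytes[16];
};

union lock_result {
	struct owner denied_owner;    /* meaningful when the lock is denied */
	struct stateid resp_stateid;  /* meaningful when the lock succeeds */
};

/* Returns 1 if writing the stateid arm clobbered the zeroed data pointer. */
static int owner_data_is_clobbered(void)
{
	union lock_result r;
	struct stateid sid;
	char *null_ptr = NULL;

	memset(&r, 0, sizeof(r));           /* data starts out as all zeroes */
	memset(&sid, 0xab, sizeof(sid));    /* arbitrary stateid contents */
	memcpy(&r.resp_stateid, &sid, sizeof(sid));

	/* Compare representations instead of dereferencing the bogus pointer. */
	return memcmp(&r.denied_owner.data, &null_ptr, sizeof(null_ptr)) != 0;
}
```

Freeing such a pointer unconditionally is what leads to the crash; the safe
pattern is to touch the data pointer only when the denied arm is the one in
use.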


# 5d6031ca 17-Jul-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: zero op arguments beyond the 8th compound op

The first 8 ops of the compound are zeroed since they're a part of the
argument that's zeroed by the

memset(rqstp->rq_argp, 0, procp->pc_argsize);

in svc_process_common(). But we handle larger compounds by allocating
the memory on the fly in nfsd4_decode_compound(). Other than code
recently fixed by 01529e3f8179 "NFSD: Fix memory leak in encoding denied
lock", I don't know of any examples of code depending on this
initialization. But it definitely seems possible, and I'd rather be
safe.

Compounds this long are unusual so I'm much more worried about failure
in these poorly tested cases than about an insignificant performance hit.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
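
The idea in 5d6031ca above, sketched in userspace C: a compound keeps a small
inline array of ops and falls back to a heap allocation for longer
compounds, and that fallback should hand back zeroed memory just like the
inline case. The struct and function names are hypothetical, and calloc()
stands in for whatever zeroing allocator the kernel code uses.

```c
#include <stdlib.h>

#define INLINE_OPS 8

/* Hypothetical per-operation argument slot. */
struct op {
	int opnum;
	void *scratch;
};

struct compound {
	struct op iops[INLINE_OPS];  /* zeroed along with the whole argument */
	struct op *ops;              /* points at iops or at a heap array */
	unsigned int opcnt;
};

/* Returns 0 on success, -1 if the heap allocation fails. */
static int compound_reserve_ops(struct compound *c, unsigned int opcnt)
{
	c->opcnt = opcnt;
	if (opcnt <= INLINE_OPS) {
		c->ops = c->iops;
		return 0;
	}
	/* calloc() zeroes the array, so ops past the 8th start out clean. */
	c->ops = calloc(opcnt, sizeof(*c->ops));
	return c->ops ? 0 : -1;
}
```

A caller must free c->ops only when it does not point at the inline array.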


# d5d5c304 09-Jul-2014 Kinglong Mee <kinglongmee@gmail.com>

NFSD: Fix bad checking of space for padding in splice read

Note that the caller has already reserved space for count and eof, so
xdr->p has already moved past them, only the padding remains.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Fixes: dc97618ddd (nfsd4: separate splice and readv cases)
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 01529e3f 07-Jul-2014 Kinglong Mee <kinglongmee@gmail.com>

NFSD: Fix memory leak in encoding denied lock

Commit 8c7424cff6 (nfsd4: don't try to encode conflicting owner if low on space)
forgot to free conf->data in nfsd4_encode_lockt and before setting conf->data to NULL
in nfsd4_encode_lock_denied.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# b607664e 30-Jun-2014 Trond Myklebust <trond.myklebust@primarydata.com>

nfsd: Cleanup nfs4svc_encode_compoundres

Move the slot return, put session etc into a helper in fs/nfsd/nfs4state.c
instead of open coding in nfs4svc_encode_compoundres.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 1055414f 29-Jun-2014 Kinglong Mee <kinglongmee@gmail.com>

NFSD: Avoid warning message when compiling on i686 arch

fs/nfsd/nfs4xdr.c: In function 'nfsd4_encode_readv':
>> fs/nfsd/nfs4xdr.c:3137:148: warning: comparison of distinct pointer types lacks a cast [enabled by default]
thislen = min(len, ((void *)xdr->end - (void *)xdr->p));

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# d5e23383 24-Jun-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: replace defer_free by svcxdr_tmpalloc

Avoid an extra allocation for the tmpbuf struct itself, and stop
ignoring some allocation failures.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# bcaab953 24-Jun-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: remove nfs4_acl_new

This is a not-that-useful kmalloc wrapper. And I'd like one of the
callers to actually use something other than kmalloc.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 29c353b3 24-Jun-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: define svcxdr_dupstr to share some common code

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# ce043ac8 24-Jun-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: remove unused defer_free argument

28e05dd8457c "knfsd: nfsd4: represent nfsv4 acl with array instead of
linked list" removed the last user that wanted a custom free function.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 7fb84306 24-Jun-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: rename cr_linkname->cr_data

The name of a link is currently stored in cr_name and cr_namelen, and
the content in cr_linkname and cr_linklen. That's confusing.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# b829e919 19-Jun-2014 J. Bruce Fields <bfields@redhat.com>

nfsd: fix rare symlink decoding bug

An NFS operation that creates a new symlink includes the symlink data,
which is xdr-encoded as a length followed by the data plus 0 to 3 bytes
of zero-padding as required to reach a 4-byte boundary.

The vfs, on the other hand, wants null-terminated data.

The simple way to handle this would be by copying the data into a newly
allocated buffer with space for the final null.

The current nfsd_symlink code tries to be more clever by skipping that
step in the (likely) case where the byte following the string is already
0.

But that assumes that the byte following the string is ours to look at.
In fact, it might be the first byte of a page that we can't read, or of
some object that another task might modify.

Worse, the NFSv4 code tries to fix the problem by actually writing to
that byte.

In the NFSv2/v3 cases this actually appears to be safe:

- nfs3svc_decode_symlinkargs explicitly null-terminates the data
(after first checking its length and copying it to a new
page).
- NFSv2 limits symlinks to 1k. The buffer holding the rpc
request is always at least a page, and the link data (and
previous fields) have maximum lengths that prevent the request
from reaching the end of a page.

In the NFSv4 case the CREATE op is potentially just one part of a long
compound so can end up on the end of a page if you're unlucky.

The minimal fix here is to copy and null-terminate in the NFSv4 case.
The nfsd_symlink() interface here seems too fragile, though. It should
really either do the copy itself every time or just require a
null-terminated string.

Reported-by: Jeff Layton <jlayton@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
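
The "minimal fix" described in b829e919 above amounts to copying the decoded
bytes into a fresh buffer with room for a terminator, instead of peeking at
or writing to the byte after the string. A userspace sketch follows; the
function name is hypothetical and malloc() stands in for the kernel's
allocator.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copy `len` bytes of XDR opaque data into a newly allocated,
 * null-terminated string. The source buffer is never read past
 * `len` bytes and never written to. Returns NULL on failure.
 */
static char *dup_opaque_as_cstring(const void *data, size_t len)
{
	char *s;

	if (len == SIZE_MAX)  /* len + 1 would overflow */
		return NULL;
	s = malloc(len + 1);
	if (!s)
		return NULL;
	memcpy(s, data, len);
	s[len] = '\0';
	return s;
}
```

The caller owns the returned buffer and frees it once the symlink has been
created.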


# c3a45617 05-Jul-2014 Kinglong Mee <kinglongmee@gmail.com>

nfsd: Fix bad reserving space for encoding rdattr_error

Introduced by commit 561f0ed498 (nfsd4: allow large readdirs).

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 69bbd9c7 26-Jun-2014 Avi Kivity <avi@cloudius-systems.com>

nfs: fix nfs4d readlink truncated packet

XDR requires 4-byte alignment; nfs4d READLINK reply writes out the padding,
but truncates the packet to the padding-less size.

Fix by taking the padding into consideration when truncating the packet.

Symptoms:

# ll /mnt/
ls: cannot read symbolic link /mnt/test: Input/output error
total 4
-rw-r--r--. 1 root root 0 Jun 14 01:21 123456
lrwxrwxrwx. 1 root root 6 Jul 2 03:33 test
drwxr-xr-x. 1 root root 0 Jul 2 23:50 tmp
drwxr-xr-x. 1 root root 60 Jul 2 23:44 tree

Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
Fixes: 476a7b1f4b2c (nfsd4: don't treat readlink like a zero-copy operation)
Reviewed-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
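
The length mismatch fixed by 69bbd9c7 above is easy to see numerically. The
sketch below uses hypothetical helper names and assumes the reply consists of
a 4-byte length word followed by the link text; it compares the truncation
point with and without the XDR padding.

```c
#include <stddef.h>

/* Round up to a 4-byte boundary, as XDR requires for opaque data. */
static size_t xdr_align4(size_t len)
{
	return (len + 3) & ~(size_t)3;
}

/* Buggy truncation point: ignores the padding after the data. */
static size_t reply_end_unpadded(size_t start, size_t datalen)
{
	return start + 4 + datalen;
}

/* Correct truncation point: length word plus padded data. */
static size_t reply_end_padded(size_t start, size_t datalen)
{
	return start + 4 + xdr_align4(datalen);
}
```

For a 6-byte link target the two results differ by 2 bytes, which is the
part of the packet the client found missing.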


# 76f47128 19-Jun-2014 J. Bruce Fields <bfields@redhat.com>

nfsd: fix rare symlink decoding bug

An NFS operation that creates a new symlink includes the symlink data,
which is xdr-encoded as a length followed by the data plus 0 to 3 bytes
of zero-padding as required to reach a 4-byte boundary.

The vfs, on the other hand, wants null-terminated data.

The simple way to handle this would be by copying the data into a newly
allocated buffer with space for the final null.

The current nfsd_symlink code tries to be more clever by skipping that
step in the (likely) case where the byte following the string is already
0.

But that assumes that the byte following the string is ours to look at.
In fact, it might be the first byte of a page that we can't read, or of
some object that another task might modify.

Worse, the NFSv4 code tries to fix the problem by actually writing to
that byte.

In the NFSv2/v3 cases this actually appears to be safe:

- nfs3svc_decode_symlinkargs explicitly null-terminates the data
(after first checking its length and copying it to a new
page).
- NFSv2 limits symlinks to 1k. The buffer holding the rpc
request is always at least a page, and the link data (and
previous fields) have maximum lengths that prevent the request
from reaching the end of a page.

In the NFSv4 case the CREATE op is potentially just one part of a long
compound so can end up on the end of a page if you're unlucky.

The minimal fix here is to copy and null-terminate in the NFSv4 case.
The nfsd_symlink() interface here seems too fragile, though. It should
really either do the copy itself every time or just require a
null-terminated string.

Reported-by: Jeff Layton <jlayton@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 3c7aa15d 10-Jun-2014 Kinglong Mee <kinglongmee@gmail.com>

NFSD: Use min/max/min_t/max_t for calculations

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# f41c5ad2 13-Jun-2014 Kinglong Mee <kinglongmee@gmail.com>

NFSD: fix bug for readdir of pseudofs

Commit 561f0ed498ca (nfsd4: allow large readdirs) introduced a bug
in readdir of the pseudofs root.

Call xdr_truncate_encode() to revert the encoded name when skipping an entry.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 542d1ab3 01-Jun-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: kill READ64

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 06553991 01-Jun-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: kill READ32

While we're here, let's kill off a couple of the read-side macros.

Leaving the more complicated ones alone for now.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# da2ebce6 30-May-2014 Jeff Layton <jlayton@kernel.org>

nfsd: make nfsd4_encode_fattr static

sparse says:

CHECK fs/nfsd/nfs4xdr.c
fs/nfsd/nfs4xdr.c:2043:1: warning: symbol 'nfsd4_encode_fattr' was not declared. Should it be static?

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 12337901 28-May-2014 Christoph Hellwig <hch@lst.de>

nfsd: getattr for FATTR4_WORD0_FILES_AVAIL needs the statfs buffer

Note nobody's ever noticed because the typical client probably never
requests FILES_AVAIL without also requesting something else on the list.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 94eb3689 23-May-2014 Kinglong Mee <kinglongmee@gmail.com>

NFSD: Adds macro EX_UUID_LEN for exports uuid's length

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# a5cddc88 12-May-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: better reservation of head space for krb5

RPC_MAX_AUTH_SIZE is scattered around several places. Better to set it
once in the auth code, where this kind of estimate should be made. And
while we're at it we can leave it zero when we're not using krb5i or
krb5p.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# d05d5744 22-Mar-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: kill write32, write64

And switch a couple other functions from the encode(&p,...) convention
to the p = encode(p,...) convention mostly used elsewhere.
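
For readers unfamiliar with the two conventions, here is a minimal
userspace sketch of the "p = encode(p, ...)" style: each helper writes
big-endian words and returns the advanced cursor. The names
sketch_encode_u32(), sketch_encode_u64() and sketch_encode_pair() are
made up for this illustration and are not the kernel helpers.

```c
#include <stddef.h>
#include <stdint.h>
#include <arpa/inet.h>

/* Write one 32-bit word in network byte order; return the next slot. */
static uint32_t *sketch_encode_u32(uint32_t *p, uint32_t v)
{
	*p++ = htonl(v);
	return p;
}

/* A 64-bit value is two 32-bit words, most significant first. */
static uint32_t *sketch_encode_u64(uint32_t *p, uint64_t v)
{
	p = sketch_encode_u32(p, (uint32_t)(v >> 32));
	return sketch_encode_u32(p, (uint32_t)v);
}

/* Caller threads the cursor through each call; returns words written. */
size_t sketch_encode_pair(uint32_t *buf, uint32_t a, uint64_t b)
{
	uint32_t *p = buf;

	p = sketch_encode_u32(p, a);
	p = sketch_encode_u64(p, b);
	return (size_t)(p - buf);
}
```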

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 0c0c267b 22-Mar-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: kill WRITEMEM

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# b64c7f3b 22-Mar-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: kill WRITE64

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# c373b0a4 22-Mar-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: kill WRITE32

These macros just obscure what's going on. Adopt the convention of the
client-side code.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# c8f13d97 08-May-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: really fix nfs4err_resource in 4.1 case

encode_getattr, for example, can return nfserr_resource to indicate it
ran out of buffer space. That's not a legal error in the 4.1 case.
And in the 4.1 case, if we ran out of buffer space, we should have
exceeded a session limit too.

(Note in 1bc49d83c37cfaf46be357757e592711e67f9809 "nfsd4: fix
nfs4err_resource in 4.1 case" we originally tried fixing this error
return before fixing the problem that we could error out while we still
had lots of available space. The result was to trade one illegal error
for another in those cases. We decided that was helpful, so reverted
the change in fc208d026be0c7d60db9118583fc62f6ca97743d, and are only
reinstating it now that we've eliminated almost all of those cases.)

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# b0420980 18-Mar-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: allow exotic read compounds

I'm not sure why a client would want to stuff multiple reads in a
single compound rpc, but it's legal for them to do it, and we should
really support it.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# fec25fa4 13-May-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: more read encoding cleanup

More cleanup, no change in functionality.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 34a78b48 13-May-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: read encoding cleanup

Trivial cleanup, no change in functionality.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# dc97618d 18-Mar-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: separate splice and readv cases

The splice and readv cases are actually quite different--for example the
former case ignores the array of vectors we build up for the latter.

It is probably clearer to separate the two cases entirely.

There's some code duplication between the split out encoders, but this
is only temporary and will be fixed by a later patch.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# b0e35fda 04-Feb-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: turn off zero-copy-read in exotic cases

We currently allow only one read per compound, with operations before
and after whose responses will require no more than about a page to
encode.

While we don't expect clients to violate those limits any time soon,
this limitation isn't really condoned by the spec, so to future proof
the server we should lift the limitation.

At the same time we'd like to continue to support zero-copy reads.

Supporting multiple zero-copy-reads per compound would require a new
data structure to replace struct xdr_buf, which can represent only one
set of included pages.

So for now we plan to modify encode_read() to support either zero-copy
or non-zero-copy reads, and use some heuristics at the start of the
compound processing to decide whether a zero-copy read will work.

This will allow us to support more exotic compounds without introducing
a performance regression in the normal case.

Later patches handle those "exotic compounds", this one just makes sure
zero-copy is turned off in those cases.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 476a7b1f 20-Jan-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: don't treat readlink like a zero-copy operation

There's no advantage to this zero-copy-style readlink encoding, and it
unnecessarily limits the kinds of compounds we can handle. (In practice
I can't see why a client would want e.g. multiple readlink calls in a
compound, but it's probably a spec violation for us not to handle it.)

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 3b299709 20-Mar-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: enforce rd_dircount

As long as we're here, let's enforce the protocol's limit on the number
of directory entries to return in a readdir.

I don't think anyone's ever noticed our lack of enforcement, but maybe
there's more of a chance they will now that we allow larger readdirs.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 561f0ed4 20-Jan-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: allow large readdirs

Currently we limit readdir results to a single page. This can result in
a performance regression compared to NFSv3 when reading large
directories.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 47ee5298 12-Mar-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: adjust buflen to session channel limit

We can simplify session limit enforcement by restricting the xdr buflen
to the session size.

Also fix a preexisting bug: we should really have been taking into
account the auth-required space when comparing against session limits,
which are limits on the size of the entire rpc reply, including any krb5
overhead.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 30596768 19-May-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: fix buflen calculation after read encoding

We don't necessarily want to assume that the buflen is the same
as the number of bytes available in the pages. We may have some reason
to set it to something less (for example, later patches will use a
smaller buflen to enforce session limits).

So, calculate the buflen relative to the previous buflen instead of
recalculating it from scratch.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 89ff884e 11-Mar-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: nfsd4_check_resp_size should check against whole buffer

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 6ff9897d 11-Mar-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: minor encode_read cleanup

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 4f0cefbf 11-Mar-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: more precise nfsd4_max_reply

It will turn out to be useful to have a more accurate estimate of reply
size; so, piggyback on the existing op reply-size estimators.

Also move nfsd4_max_reply to nfs4proc.c to get easier access to struct
nfsd4_operation and friends. (Thanks to Christoph Hellwig for pointing
out that simplification.)

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 8c7424cf 09-Mar-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: don't try to encode conflicting owner if low on space

I ran into this corner case in testing: in theory clients can provide
state owners up to 1024 bytes long. In the sessions case there might be
a risk of this pushing us over the DRC slot size.

The conflicting owner isn't really that important, so let's humor a
client that provides a small maxresponsesize_cached by allowing ourselves
to return without the conflicting owner instead of outright failing the
operation.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# f5236013 21-Mar-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: convert 4.1 replay encoding

Limits on maxresp_sz mean that we only ever need to replay rpc's that
are contained entirely in the head.

The one exception is very small zero-copy reads. That's an odd corner
case as clients wouldn't normally ask those to be cached.

In any case, this seems a little more robust.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 2825a7f9 26-Aug-2013 J. Bruce Fields <bfields@redhat.com>

nfsd4: allow encoding across page boundaries

After this we can handle for example getattr of very large ACLs.

Read, readdir, readlink are still special cases with their own limits.

Also we can't handle a new operation starting close to the end of a
page.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# a8095f7e 11-Mar-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: size-checking cleanup

Better variable name, some comments, etc.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# ea8d7720 08-Mar-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: remove redundant encode buffer size checking

Now that all op encoders can handle running out of space, we no longer
need to check the remaining size for every operation; only nonidempotent
operations need that check, and that can be done by
nfsd4_check_resp_size.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 67492c99 08-Mar-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: nfsd4_check_resp_size needn't recalculate length

We're keeping the length updated as we go now, so there's no need for
the extra calculation here.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 4e21ac4b 22-Mar-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: reserve space before inlining 0-copy pages

Once we've included page-cache pages in the encoding it's difficult to
remove them and restart encoding. (xdr_truncate_encode doesn't handle
that case.) So, make sure we'll have adequate space to finish the
operation first.

For now COMPOUND_SLACK_SPACE checks should prevent this case happening,
but we want to remove those checks.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# d0a381dd 30-Jan-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: teach encoders to handle reserve_space failures

We've tried to prevent running out of space with COMPOUND_SLACK_SPACE
and special checking in those operations (getattr) whose result can vary
enormously.

However:
- COMPOUND_SLACK_SPACE may be difficult to maintain as we add
more protocol.
- BUG_ON or page faulting on failure seems overly fragile.
- Especially in the 4.1 case, we prefer not to fail compounds
just because the returned result came *close* to session
limits. (Though perfect enforcement here may be difficult.)
- I'd prefer encoding to be uniform for all encoders instead of
having special exceptions for encoders containing, for
example, attributes.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 082d4bd7 29-Aug-2013 J. Bruce Fields <bfields@redhat.com>

nfsd4: "backfill" using write_bytes_to_xdr_buf

Normally xdr encoding proceeds in a single pass from start of a buffer
to end, but sometimes we have to write a few bytes to an earlier
position.

Use write_bytes_to_xdr_buf for these cases rather than saving a pointer
to write to. We plan to rewrite xdr_reserve_space to handle encoding
across page boundaries using a scratch buffer, and don't want to risk
writing to a pointer that was contained in a scratch buffer.

Also it will no longer be safe to calculate lengths by subtracting two
pointers, so use xdr_buf offsets instead.
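
A simplified userspace sketch of the idea, assuming a flat buffer rather
than the kernel's struct xdr_buf: the position of the length word is
remembered as an offset, and both the backfill and the length
calculation use offsets only. All names here (struct sketch_buf,
sketch_put_u32(), sketch_backfill_u32(), sketch_encode_counted()) are
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Hypothetical flat buffer standing in for a real xdr_buf. */
struct sketch_buf {
	unsigned char data[64];
	size_t len;		/* bytes encoded so far */
};

/* Append one big-endian word; 0 on success, -1 if the buffer is full. */
static int sketch_put_u32(struct sketch_buf *b, uint32_t v)
{
	uint32_t be = htonl(v);

	if (b->len + sizeof(be) > sizeof(b->data))
		return -1;
	memcpy(b->data + b->len, &be, sizeof(be));
	b->len += sizeof(be);
	return 0;
}

/* Overwrite an earlier word, addressed by offset, not by saved pointer. */
static void sketch_backfill_u32(struct sketch_buf *b, size_t off, uint32_t v)
{
	uint32_t be = htonl(v);

	memcpy(b->data + off, &be, sizeof(be));
}

/* Encode a byte-length word followed by n words of payload. */
int sketch_encode_counted(struct sketch_buf *b, const uint32_t *w, size_t n)
{
	size_t len_off = b->len;	/* where the length word lives */
	size_t i;

	if (sketch_put_u32(b, 0))	/* placeholder, filled in below */
		return -1;
	for (i = 0; i < n; i++)
		if (sketch_put_u32(b, w[i]))
			return -1;
	/* Payload length from offsets, not from pointer subtraction. */
	sketch_backfill_u32(b, len_off, (uint32_t)(b->len - len_off - 4));
	return 0;
}
```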

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 1fcea5b2 26-Feb-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: use xdr_truncate_encode

Now that lengths are reliable, we can use xdr_truncate instead of
open-coding it everywhere.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 6ac90391 26-Feb-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: keep xdr buf length updated

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# dd97fdde 26-Feb-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: no need for encode_compoundres to adjust lengths

xdr_reserve_space should now be calculating the length correctly as we
go, so there's no longer any need to fix it up here.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# f46d382a 31-Jan-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: remove ADJUST_ARGS

It's just uninteresting debugging code at this point.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# d3f627c8 26-Feb-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: use xdr_stream throughout compound encoding

Note this makes ADJUST_ARGS useless; we'll remove it in the following
patch.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# ddd1ea56 27-Aug-2013 J. Bruce Fields <bfields@redhat.com>

nfsd4: use xdr_reserve_space in attribute encoding

This is a cosmetic change for now; no change in behavior.

Note we're just depending on xdr_reserve_space to do the bounds checking
for us, we're not really depending on its adjustment of iovec or xdr_buf
lengths yet, as those are fixed up as necessary after the fact by
read-like operations and by nfs4svc_encode_compoundres. However we do
have to update xdr->iov on read-like operations to prevent
xdr_reserve_space from messing with the already-fixed-up length of the
head.

When the attribute encoding fails partway through we have to undo the
length adjustments made so far. We do it manually for now, but later
patches will add an xdr_truncate_encode() helper to handle cases like
this.
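
The "undo on partial failure" pattern can be sketched in userspace as
follows. This is only an illustration under simplifying assumptions (a
single flat buffer, host byte order); sketch_reserve() and
sketch_truncate() are hypothetical stand-ins and do not reproduce the
behavior of the real xdr_reserve_space() or xdr_truncate_encode().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stream: a tiny fixed buffer plus its encoded length. */
struct sketch_stream {
	unsigned char data[16];
	size_t len;
};

/* Claim nbytes at the tail, or return NULL if they do not fit. */
static unsigned char *sketch_reserve(struct sketch_stream *s, size_t nbytes)
{
	unsigned char *p;

	if (nbytes > sizeof(s->data) - s->len)
		return NULL;
	p = s->data + s->len;
	s->len += nbytes;
	return p;
}

/* Drop everything encoded after offset len. */
static void sketch_truncate(struct sketch_stream *s, size_t len)
{
	if (len < s->len)
		s->len = len;
}

/* Encode all attributes or none: on failure the stream is unchanged. */
int sketch_encode_attrs(struct sketch_stream *s, const uint32_t *attrs,
			size_t n)
{
	size_t start = s->len;	/* remember where this op began */
	size_t i;

	for (i = 0; i < n; i++) {
		unsigned char *p = sketch_reserve(s, sizeof(attrs[i]));

		if (!p) {
			sketch_truncate(s, start);
			return -1;
		}
		memcpy(p, &attrs[i], sizeof(attrs[i]));
	}
	return 0;
}
```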

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 5f4ab945 07-Mar-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: allow space for final error return

This post-encoding check should be taking into account the need to
encode at least an out-of-space error to the following op (if any).

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 07d1f802 06-Mar-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: fix encoding of out-of-space replies

If nfsd4_check_resp_size() returns an error then we should really be
truncating the reply here, otherwise we may leave extra garbage at the
end of the rpc reply.

Also add a warning to catch any cases where our reply-size estimates may
be wrong in the case of a non-idempotent operation.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# d5184658 26-Aug-2013 J. Bruce Fields <bfields@redhat.com>

nfsd4: tweak nfsd4_encode_getattr to take xdr_stream

Just change the nfsd4_encode_getattr api. Not changing any code or
adding any new functionality yet.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 4aea24b2 15-Jan-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: embed xdr_stream in nfsd4_compoundres

This is a mechanical transformation with no change in behavior.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# e372ba60 18-May-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: decoding errors can still be cached and require space

Currently a non-idempotent op reply may be cached if it fails in the
proc code but not if it fails at xdr decoding. I doubt there are any
xdr-decoding-time errors that would make this a problem in practice, so
this probably isn't a serious bug.

The space estimates should also take into account space required for
encoding of error returns. Again, not a practical problem, though it
would become one after future patches which will tighten the space
estimates.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# fc208d02 09-Apr-2014 J. Bruce Fields <bfields@redhat.com>

Revert "nfsd4: fix nfs4err_resource in 4.1 case"

Since we're still limiting attributes to a page, the result here is that
a large getattr result will return NFS4ERR_REP_TOO_BIG/TOO_BIG_TO_CACHE
instead of NFS4ERR_RESOURCE.

Both error returns are wrong, and the real bug here is the arbitrary
limit on getattr results, fixed by as-yet out-of-tree patches. But at a
minimum we can make life easier for clients by sticking to one broken
behavior in released kernels instead of two....

Trond says:

one immediate consequence of this patch will be that NFSv4.1
clients will now report EIO instead of EREMOTEIO if they hit the
problem. That may make debugging a little less obvious.

Another consequence will be that if we ever do try to add client
side handling of NFS4ERR_REP_TOO_BIG, then we now have to deal
with the “handle existing buggy server” syndrome.

Reported-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 18df11d0 09-Mar-2014 Yan, Zheng <zheng.z.yan@intel.com>

nfsd4: fix memory leak in nfsd4_encode_fattr()

fh_put() does not free the temporary file handle.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 1bc49d83 10-Mar-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: fix nfs4err_resource in 4.1 case

encode_getattr, for example, can return nfserr_resource to indicate it
ran out of buffer space. That's not a legal error in the 4.1 case. And
in the 4.1 case, if we ran out of buffer space, we should have exceeded
a session limit too.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 1bed92cb 20-Mar-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: remove redundant check from nfsd4_check_resp_size

cstate->slot and ->session are each set together in nfsd4_sequence. If
one is non-NULL, so is the other.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 067e1ace 21-Mar-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: update comments with obsolete function name

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# e874f9f8 10-Mar-2014 Jeff Layton <jlayton@kernel.org>

svcrpc: explicitly reject compounds that are not padded out to 4-byte multiple

We have a WARN_ON in the nfsd4_decode_write() that tells us when the
client has sent a request that is not padded out properly according to
RFC4506. A WARN_ON really isn't appropriate in this case though since
this indicates a client bug, not a server one.

Move this check out to the top-level compound decoder and have it just
explicitly return an error. Also add a dprintk() that shows the client
address and xid to help track down clients and frames that trigger it.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# a11fcce1 03-Feb-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: fix test_stateid error reply encoding

If the entire operation fails then there's nothing to encode.

Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 798df338 29-Jan-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: make set of large acl return efbig, not resource

If a client attempts to set an excessively large ACL, return
NFS4ERR_FBIG instead of NFS4ERR_RESOURCE. I'm not sure FBIG is correct,
but I'm positive RESOURCE is wrong (it isn't even a well-defined error
any more for NFS versions since 4.1).

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# de3997a7 28-Jan-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: buffer-length check for SUPPATTR_EXCLCREAT

This was an omission from 8c18f2052e756e7d5dea712fc6e7ed70c00e8a39
"nfsd41: SUPPATTR_EXCLCREAT attribute".

Cc: Benny Halevy <bhalevy@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# d50e6136 14-Jan-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: decrease nfsd4_encode_fattr stack usage

A struct svc_fh is 320 bytes on x86_64, it'd be better not to have these
on the stack.

kmalloc'ing them probably isn't ideal either, but this is the simplest
thing to do. If it turns out to be a problem in the readdir case then
we could add a svc_fh to nfsd4_readdir and pass that in.

Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 3554116d 08-Jan-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: simplify xdr encoding of nfsv4 names

We can simplify the idmapping code if it does its own encoding and
returns nfs errors.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 87915c64 16-Jan-2013 J. Bruce Fields <bfields@redhat.com>

nfsd4: encode_rdattr_error cleanup

There's a simpler way to write this.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 6b6d8137 16-Jan-2013 J. Bruce Fields <bfields@redhat.com>

nfsd4: nfsd4_encode_fattr cleanup

Remove some pointless goto's.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# dfeecc82 09-Dec-2013 Kinglong Mee <kinglongmee@gmail.com>

nfsd: get rid of unused macro definition

Since defined in Linux-2.6.12-rc2, READTIME has not been used.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# eba1c99c 09-Dec-2013 Kinglong Mee <kinglongmee@gmail.com>

nfsd: clean up unnecessary temporary variable in nfsd4_decode_fattr

host_err was only used for nfs4_acl_new.
This patch deletes it and returns nfserr_jukebox directly.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 43212cc7 09-Dec-2013 Kinglong Mee <kinglongmee@gmail.com>

nfsd: using nfsd4_encode_noop for encoding destroy_session/free_stateid

Get rid of the extra code by using nfsd4_encode_noop for encoding
destroy_session and free_stateid.
Also delete the unused argument (fr_status) in nfsd4_free_stateid.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# a9f7b4a0 09-Dec-2013 Kinglong Mee <kinglongmee@gmail.com>

nfsd: clean up an xdr reserved space calculation

We should use XDR_LEN to calculate reserved space in case the oid is not
a multiple of 4.

RESERVE_SPACE actually rounds up for us, but it's probably better to be
careful here.
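
For illustration, XDR pads opaque data to a multiple of 4 bytes, so the
space to reserve is the rounded-up length rather than the raw one. The
helpers below are a userspace sketch with hypothetical names; they are
not the kernel's XDR_LEN or RESERVE_SPACE macros.

```c
#include <stddef.h>

/* Round a byte count up to the next multiple of 4 (XDR alignment). */
size_t sketch_xdr_padded_len(size_t nbytes)
{
	return (nbytes + 3) & ~(size_t)3;
}

/* Space for a variable-length opaque: 4-byte length word plus padded data. */
size_t sketch_opaque_reserve_len(size_t nbytes)
{
	return 4 + sketch_xdr_padded_len(nbytes);
}
```

With these definitions a 9-byte oid needs 12 bytes of data space, or 16
bytes including its length word.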

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# a8bb84bc 10-Dec-2013 Kinglong Mee <kinglongmee@gmail.com>

nfsd: calculate the missing length of bitmap in EXCHANGE_ID

commit 58cd57bfd9db3bc213bf9d6a10920f82095f0114
"nfsd: Fix SP4_MACH_CRED negotiation in EXCHANGE_ID"
missed calculating the length of the bitmap for spo_must_enforce and spo_must_allow.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 2d8498db 20-Nov-2013 Christoph Hellwig <hch@infradead.org>

nfsd: start documenting some XDR handling functions

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 365da4ad 19-Nov-2013 J. Bruce Fields <bfields@redhat.com>

nfsd4: fix xdr decoding of large non-write compounds

This fixes a regression from 247500820ebd02ad87525db5d9b199e5b66f6636
"nfsd4: fix decoding of compounds across page boundaries". The previous
code was correct: argp->pagelist is initialized in
nfs4svc_decode_compoundargs to rqstp->rq_arg.pages, and is therefore a
pointer to the page *after* the page we are currently decoding.

The reason that patch nevertheless fixed a problem with decoding
compounds containing write was a bug in the write decoding introduced by
5a80a54d21c96590d013378d8c5f65f879451ab4 "nfsd4: reorganize write
decoding", after which write decoding no longer adhered to the rule that
argp->pagelist point to the next page.

Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# aea240f4 14-Nov-2013 Christoph Hellwig <hch@infradead.org>

nfsd: export proper maximum file size to the client

I noticed that we export a way too high value for the maxfilesize
attribute when debugging a client issue. The issue didn't turn
out to be related to it, but I think we should export it, so that
clients can limit what write sizes they accept before hitting
the server.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 6ff40dec 05-Nov-2013 J. Bruce Fields <bfields@redhat.com>

nfsd4: improve write performance with better sendspace reservations

Currently the rpc code conservatively refuses to accept rpc's from a
client if the sum of its worst-case estimates of the replies it owes
that client exceed the send buffer space.

Unfortunately our estimate of the worst-case reply for an NFSv4 compound
is always the maximum read size. This can unnecessarily limit the
number of operations we handle concurrently, for example in the case
most operations are writes (which have small replies).

We can do a little better if we check which ops the compound contains.

This is still a rough estimate, we'll need to improve on it some day.

Reported-by: Shyam Kaushik <shyamnfs1@gmail.com>
Tested-by: Shyam Kaushik <shyamnfs1@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 3378b7f4 01-Nov-2013 J. Bruce Fields <bfields@redhat.com>

nfsd4: fix discarded security labels on setattr

Security labels in setattr calls are currently ignored because we forget
to set label->len.

Cc: stable@vger.kernel.org
Reported-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 8217d146 30-Oct-2013 Anna Schumaker <bjschuma@netapp.com>

NFSD: Add support for NFS v4.2 operation checking

The server does allow NFS over v4.2, even if it doesn't add any new
operations yet.

I also switch to using constants to represent the last operation for
each minor version since this makes the code cleaner and easier to
understand at a quick glance.

Signed-off-by: Anna Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# e1a90ebd 30-Oct-2013 Anna Schumaker <bjschuma@netapp.com>

NFSD: Combine decode operations for v4 and v4.1

We were using a different array of function pointers to represent each
minor version. This makes adding a new minor version tedious, since it
needs a step to copy, paste and modify a new version of the same
functions.

This patch combines the v4 and v4.1 arrays into a single instance and
will check minor version support inside each decoder function.
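
A toy sketch of the resulting shape, with one shared table and the
version check living inside the decoder that needs it. The operation
numbers, struct and function names are invented for the example and do
not correspond to the real nfsd4 decoder table.

```c
#include <stddef.h>

enum { SKETCH_OK = 0, SKETCH_NOTSUPP = -1 };

/* Hypothetical per-compound state: only the minor version matters here. */
struct sketch_args {
	unsigned int minorversion;
};

typedef int (*sketch_decoder)(struct sketch_args *);

/* An op that exists in every minor version. */
static int sketch_decode_access(struct sketch_args *args)
{
	(void)args;
	return SKETCH_OK;
}

/* An op introduced in minor version 1: the decoder itself checks. */
static int sketch_decode_sequence(struct sketch_args *args)
{
	if (args->minorversion < 1)
		return SKETCH_NOTSUPP;
	return SKETCH_OK;
}

/* One table shared by all minor versions. */
static const sketch_decoder sketch_decoders[] = {
	sketch_decode_access,		/* op 0 in this sketch */
	sketch_decode_sequence,		/* op 1 in this sketch */
};

int sketch_decode_op(struct sketch_args *args, unsigned int op)
{
	if (op >= sizeof(sketch_decoders) / sizeof(sketch_decoders[0]))
		return SKETCH_NOTSUPP;
	return sketch_decoders[op](args);
}
```

Adding a minor version then means adding decoders and version checks,
not cloning the whole table.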

Signed-off-by: Anna Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 301f0268 01-Sep-2013 Al Viro <viro@zeniv.linux.org.uk>

nfsd: racy access to ->d_name in nfsd4_encode_path()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 58cd57bf 05-Aug-2013 Weston Andros Adamson <dros@netapp.com>

nfsd: Fix SP4_MACH_CRED negotiation in EXCHANGE_ID

- don't BUG_ON() when not SP4_NONE
- calculate recv and send reserve sizes correctly

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# f0f51f5c 18-Jun-2013 J. Bruce Fields <bfields@redhat.com>

nfsd4: allow destroy_session over destroyed session

RFC 5661 allows a client to destroy a session using a compound
associated with the destroyed session, as long as the DESTROY_SESSION op
is the last op of the compound.

We attempt to allow this, but testing against a Solaris client (which
does destroy sessions in this way) showed that we were failing the
DESTROY_SESSION with NFS4ERR_DELAY, because we assumed the reference
count on the session (held by us) represented another rpc in progress
over this session.

Fix this by noting that in this case the expected reference count is 1,
not 0.

Also, note as long as the session holds a reference to the compound
we're destroying, we can't free it here--instead, delay the free till
the final put in nfs4svc_encode_compoundres.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 590b7431 21-Jun-2013 J. Bruce Fields <bfields@redhat.com>

nfsd4: minor read_buf cleanup

The code to step to the next page seems reasonably self-contained.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 24750082 21-Jun-2013 J. Bruce Fields <bfields@redhat.com>

nfsd4: fix decoding of compounds across page boundaries

A freebsd NFSv4.0 client was getting rare IO errors expanding a tarball.
A network trace showed the server returning BAD_XDR on the final getattr
of a getattr+write+getattr compound. The final getattr started on a
page boundary.

I believe the Linux client ignores errors on the post-write getattr, and
that that's why we haven't seen this before.

Cc: stable@vger.kernel.org
Reported-by: Rick Macklem <rmacklem@uoguelph.ca>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 57569a70 17-May-2013 J. Bruce Fields <bfields@redhat.com>

nfsd4: allow client to send no cb_sec flavors

In testing I notice that some of the pynfs tests forget to send any
cb_sec flavors, and that we haven't necessarily errored out in that case
before.

I'll fix pynfs, but am also inclined to default to trying AUTH_NONE in
that case in case this is something clients actually do.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 57266a6e 13-Apr-2013 J. Bruce Fields <bfields@redhat.com>

nfsd4: implement minimal SP4_MACH_CRED

Do a minimal SP4_MACH_CRED implementation suggested by Trond, ignoring
the client-provided spo_must_* arrays and just enforcing credential
checks for the minimum required operations.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# ba4e55bb 15-May-2013 J. Bruce Fields <bfields@redhat.com>

nfsd4: fix compile in !CONFIG_NFSD_V4_SECURITY_LABEL case

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 18032ca0 02-May-2013 David Quigley <dpquigl@davequigley.com>

NFSD: Server implementation of MAC Labeling

Implement labeled NFS on the server: encoding and decoding, and writing
and reading, of file labels.

Enabled with CONFIG_NFSD_V4_SECURITY_LABEL.

Signed-off-by: Matthew N. Dodd <Matthew.Dodd@sparta.com>
Signed-off-by: Miguel Rodel Felipe <Rodel_FM@dsi.a-star.edu.sg>
Signed-off-by: Phua Eu Gene <PHUA_Eu_Gene@dsi.a-star.edu.sg>
Signed-off-by: Khin Mi Mi Aung <Mi_Mi_AUNG@dsi.a-star.edu.sg>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 4bdc33ed 02-May-2013 Steve Dickson <steved@redhat.com>

NFSDv4.2: Add NFS v4.2 support to the NFS server

This enables NFSv4.2 support for the server. To enable this
code do the following:
echo "+4.2" >/proc/fs/nfsd/versions

after the nfsd kernel module is loaded.

On its own this does nothing except allow the server to respond to
compounds with minorversion set to 2. All the new NFSv4.2 features are
optional, so this is perfectly legal.

Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 676e4ebd 30-Apr-2013 Chuck Lever <chuck.lever@oracle.com>

NFSD: SECINFO doesn't handle unsupported pseudoflavors correctly

If nfsd4_do_encode_secinfo() can't find GSS info that matches an
export security flavor, it assumes the flavor is not a GSS
pseudoflavor, and simply puts it on the wire.

However, if this XDR encoding logic is given a legitimate GSS
pseudoflavor but the RPC layer says it does not support that
pseudoflavor for some reason, then the server leaks GSS pseudoflavor
numbers onto the wire.

I confirmed this happens by blacklisting rpcsec_gss_krb5, then
attempted a client transition from the pseudo-fs to a Kerberos-only
share. The client received a flavor list containing the Kerberos
pseudoflavor numbers, rather than GSS tuples.

The encoder logic can check that each pseudoflavor in flavs[] is
less than MAXFLAVOR before writing it into the buffer, to prevent
this. But after "nflavs" is written into the XDR buffer, the
encoder can't skip writing flavor information into the buffer when
it discovers the RPC layer doesn't support that flavor.

So count the number of valid flavors as they are written into the
XDR buffer, then write that count into a placeholder in the XDR
buffer when all recognized flavors have been encoded.
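
The placeholder-and-backfill approach described above can be sketched in
plain C. This is only an illustration, not the kernel code: the names
sketch_encode_flavors and SKETCH_MAX_FLAVOR are hypothetical, byte-order
conversion is left out, and the caller is assumed to supply a buffer of
at least n + 1 words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cutoff; stands in for the MAXFLAVOR check in the text. */
#define SKETCH_MAX_FLAVOR 4u

/*
 * Encode the supported entries of flavs[] into buf as a counted array.
 * The count word is reserved first and filled in last, so unsupported
 * flavors can be skipped without leaving a wrong count on the wire.
 * Returns the number of 32-bit words written.
 */
size_t sketch_encode_flavors(uint32_t *buf, const uint32_t *flavs, size_t n)
{
	uint32_t *count_slot;
	uint32_t supported = 0;
	size_t i, pos = 0;

	assert(buf != NULL && flavs != NULL);

	count_slot = &buf[pos++];	/* reserve the placeholder */
	for (i = 0; i < n; i++) {
		if (flavs[i] >= SKETCH_MAX_FLAVOR)
			continue;	/* unsupported: skip, don't count */
		buf[pos++] = flavs[i];
		supported++;
	}
	*count_slot = supported;	/* backfill the real count */
	return pos;
}
```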

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# ed9411a0 30-Apr-2013 Chuck Lever <chuck.lever@oracle.com>

NFSD: Simplify GSS flavor encoding in nfsd4_do_encode_secinfo()

Clean up.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# bf8d9097 19-Apr-2013 Bryan Schumaker <bjschuma@netapp.com>

nfsd: Decode and send 64bit time values

The seconds field of an nfstime4 structure is 64bit, but we are assuming
that the first 32bits are zero-filled. So if the client tries to set
atime to a value before the epoch (touch -t 196001010101), then the
server will save the wrong value on disk.
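
As a rough illustration of why the upper word matters, the sketch below
decodes the seconds field both ways. The helper names are hypothetical
and the "low only" variant is merely one way the zero-fill assumption
could look; it is not a copy of the old decoder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read one big-endian 32-bit XDR word from a byte buffer. */
static uint32_t sketch_get_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Decode the full 64-bit seconds field from two XDR words.
 * The final cast assumes the usual two's-complement conversion.
 */
int64_t sketch_decode_seconds(const unsigned char *p)
{
	uint64_t hi, lo;

	assert(p != NULL);
	hi = sketch_get_be32(p);
	lo = sketch_get_be32(p + 4);
	return (int64_t)((hi << 32) | lo);
}

/* Assumes the first word is zero: pre-epoch times come out positive. */
int64_t sketch_decode_seconds_low_only(const unsigned char *p)
{
	assert(p != NULL);
	return (int64_t)sketch_get_be32(p + 4);
}
```

With all eight bytes set to 0xff (that is, -1 seconds), the first
function returns -1 while the second returns a large positive value.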

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 9aeb5aee 16-Apr-2013 J. Bruce Fields <bfields@redhat.com>

nfsd4: remove unused macro

Cleanup a piece I forgot to remove in
9411b1d4c7df26dca6bc6261b5dc87a5b4c81e5c "nfsd4: cleanup handling of
nfsv4.0 closed stateid's".

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 9411b1d4 01-Apr-2013 J. Bruce Fields <bfields@redhat.com>

nfsd4: cleanup handling of nfsv4.0 closed stateid's

Closed stateid's are kept around a little while to handle close replays
in the 4.0 case. So we stash them in the last-used stateid in the
oo_last_closed_stateid field of the open owner. We can free that in
encode_seqid_op_tail once the seqid on the open owner is next
incremented. But we don't want to do that on the close itself; so we
set the NFS4_OO_PURGE_CLOSE flag on the open owner, skip freeing it the
first time through encode_seqid_op_tail, then when we see that flag set
next time we free it.

This is unnecessarily baroque.

Instead, just move the logic that increments the seqid out of the xdr
code and into the operation code itself.

The justification given for the current placement is that we need to
wait till the last minute to be sure we know whether the status is a
sequence-id-mutating error or not, but examination of the code shows
that can't actually happen.

Reported-by: Yanchuan Nian <ycnian@gmail.com>
Tested-by: Yanchuan Nian <ycnian@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 221a6876 01-Apr-2013 J. Bruce Fields <bfields@redhat.com>

nfsd4: don't destroy in-use clients

When a setclientid_confirm or create_session confirms a client after a
client reboot, it also destroys any previous state held by that client.

The shutdown of that previous state must be careful not to free the
client out from under threads processing other requests that refer to
the client.

This is a particular problem in the NFSv4.1 case when we hold a
reference to a session (hence a client) throughout compound processing.

The server attempts to handle this by unhashing the client at the time
it's destroyed, then delaying the final free to the end. But this still
leaves some races in the current code.

I believe it's simpler just to fail the attempt to destroy the client by
returning NFS4ERR_DELAY. This is a case that should never happen
anyway.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# b0a9d3ab 07-Mar-2013 J. Bruce Fields <bfields@redhat.com>

nfsd4: fix race on client shutdown

Dropping the session's reference count after the client's means we leave
a window where the session's se_client pointer is NULL. An xpt_user
callback that encounters such a session may then crash:

[ 303.956011] BUG: unable to handle kernel NULL pointer dereference at 0000000000000318
[ 303.959061] IP: [<ffffffff81481a8e>] _raw_spin_lock+0x1e/0x40
[ 303.959061] PGD 37811067 PUD 3d498067 PMD 0
[ 303.959061] Oops: 0002 [#8] PREEMPT SMP
[ 303.959061] Modules linked in: md5 nfsd auth_rpcgss nfs_acl snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_page_alloc microcode psmouse snd_timer serio_raw pcspkr evdev snd soundcore i2c_piix4 i2c_core intel_agp intel_gtt processor button nfs lockd sunrpc fscache ata_generic pata_acpi ata_piix uhci_hcd libata btrfs usbcore usb_common crc32c scsi_mod libcrc32c zlib_deflate floppy virtio_balloon virtio_net virtio_pci virtio_blk virtio_ring virtio
[ 303.959061] CPU 0
[ 303.959061] Pid: 264, comm: nfsd Tainted: G D 3.8.0-ARCH+ #156 Bochs Bochs
[ 303.959061] RIP: 0010:[<ffffffff81481a8e>] [<ffffffff81481a8e>] _raw_spin_lock+0x1e/0x40
[ 303.959061] RSP: 0018:ffff880037877dd8 EFLAGS: 00010202
[ 303.959061] RAX: 0000000000000100 RBX: ffff880037a2b698 RCX: ffff88003d879278
[ 303.959061] RDX: ffff88003d879278 RSI: dead000000100100 RDI: 0000000000000318
[ 303.959061] RBP: ffff880037877dd8 R08: ffff88003c5a0f00 R09: 0000000000000002
[ 303.959061] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
[ 303.959061] R13: 0000000000000318 R14: ffff880037a2b680 R15: ffff88003c1cbe00
[ 303.959061] FS: 0000000000000000(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
[ 303.959061] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 303.959061] CR2: 0000000000000318 CR3: 000000003d49c000 CR4: 00000000000006f0
[ 303.959061] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 303.959061] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 303.959061] Process nfsd (pid: 264, threadinfo ffff880037876000, task ffff88003c1fd0a0)
[ 303.959061] Stack:
[ 303.959061] ffff880037877e08 ffffffffa03772ec ffff88003d879000 ffff88003d879278
[ 303.959061] ffff88003d879080 0000000000000000 ffff880037877e38 ffffffffa0222a1f
[ 303.959061] 0000000000107ac0 ffff88003c22e000 ffff88003d879000 ffff88003c1cbe00
[ 303.959061] Call Trace:
[ 303.959061] [<ffffffffa03772ec>] nfsd4_conn_lost+0x3c/0xa0 [nfsd]
[ 303.959061] [<ffffffffa0222a1f>] svc_delete_xprt+0x10f/0x180 [sunrpc]
[ 303.959061] [<ffffffffa0223d96>] svc_recv+0xe6/0x580 [sunrpc]
[ 303.959061] [<ffffffffa03587c5>] nfsd+0xb5/0x140 [nfsd]
[ 303.959061] [<ffffffffa0358710>] ? nfsd_destroy+0x90/0x90 [nfsd]
[ 303.959061] [<ffffffff8107ae00>] kthread+0xc0/0xd0
[ 303.959061] [<ffffffff81010000>] ? perf_trace_xen_mmu_set_pte_at+0x50/0x100
[ 303.959061] [<ffffffff8107ad40>] ? kthread_freezable_should_stop+0x70/0x70
[ 303.959061] [<ffffffff814898ec>] ret_from_fork+0x7c/0xb0
[ 303.959061] [<ffffffff8107ad40>] ? kthread_freezable_should_stop+0x70/0x70
[ 303.959061] Code: ff ff 5d c3 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 65 48 8b 04 25 f0 c6 00 00 48 89 e5 83 80 44 e0 ff ff 01 b8 00 01 00 00 <3e> 66 0f c1 07 0f b6 d4 38 c2 74 0f 66 0f 1f 44 00 00 f3 90 0f
[ 303.959061] RIP [<ffffffff81481a8e>] _raw_spin_lock+0x1e/0x40
[ 303.959061] RSP <ffff880037877dd8>
[ 303.959061] CR2: 0000000000000318
[ 304.001218] ---[ end trace 2d809cd4a7931f5a ]---
[ 304.001903] note: nfsd[264] exited with preempt_count 2

Reported-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 9d313b17 28-Feb-2013 J. Bruce Fields <bfields@redhat.com>

nfsd4: handle seqid-mutating open errors from xdr decoding

If a client sets an owner (or group_owner or acl) attribute on open for
create, and the mapping of that owner to an id fails, then we return
BAD_OWNER. But BAD_OWNER is a seqid-mutating error, so we can't
shortcut the open processing in that case: we have to at least look up the
owner so we can find the seqid to bump.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# a77c806f 16-Mar-2013 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Refactor nfsd4_do_encode_secinfo()

Clean up. This matches a similar API for the client side, and
keeps ULP fingers out of the GSS mech switch.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 64a817cf 26-Mar-2013 J. Bruce Fields <bfields@redhat.com>

nfsd4: reject "negative" acl lengths

Since we only enforce an upper bound, not a lower bound, a "negative"
length can get through here.

The symptom seen was a warning when we attempt a kmalloc with an
excessive size.
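
A minimal sketch of the problem, assuming a hypothetical limit
SKETCH_MAX_ACES and hypothetical function names: an upper-bound check on
a signed count accepts negative values, while the same check on an
unsigned count rejects them.

```c
#include <stdint.h>

#define SKETCH_MAX_ACES 170	/* hypothetical upper bound */

/* Upper bound only, on a signed value: negative counts slip through. */
int sketch_len_ok_signed(int32_t nace)
{
	return nace <= SKETCH_MAX_ACES;
}

/* Treating the wire value as unsigned rejects them with one comparison. */
int sketch_len_ok_unsigned(uint32_t nace)
{
	return nace <= SKETCH_MAX_ACES;
}
```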

Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 3dadecce 24-Jan-2013 Al Viro <viro@zeniv.linux.org.uk>

switch vfs_getattr() to struct path

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 03bc6d1c 02-Feb-2013 Eric W. Biederman <ebiederm@xmission.com>

nfsd: Modify nfsd4_cb_sec to use kuids and kgids

Change uid and gid in struct nfsd4_cb_sec to be of type kuid_t and
kgid_t.

In nfsd4_decode_cb_sec when reading uids and gids off the wire convert
them to kuids and kgids, and if they don't convert to valid kuids or
valid kgids ignore RPC_AUTH_UNIX and don't fill in any of the fields.

Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>


# ab8e4aee 02-Feb-2013 Eric W. Biederman <ebiederm@xmission.com>

nfsd: Handle kuids and kgids in the nfs4acl to posix_acl conversion

In struct nfs4_ace remove the member who and replace it with an
anonymous union holding who_uid and who_gid. This allows typesafe
storage of uids and gids.

Add a helper pace_gt for sorting posix_acl_entries.

In struct posix_user_ace_state replace uid with a union
of kuid_t uid and kgid_t gid.

Remove all initializations of the deprecated posix_acl_entry
e_id field, which is not present when user namespaces are enabled.

Split find_uid into two functions find_uid and find_gid that work
in a typesafe manner.

In nfs4xdr update nfsd4_encode_fattr to deal with the changes
in struct nfs4_ace.

Rewrite nfsd4_encode_name to take a kuid_t and a kgid_t instead
of a generic id and flag if it is a group or a uid. Replace
the group flag with a test for a valid gid.

Modify nfsd4_encode_user to take a kuid_t and call the modified
nfsd4_encode_name.

Modify nfsd4_encode_group to take a kgid_t and call the modified
nfsd4_encode_name.

Modify nfsd4_encode_aclname to take an ace instead of taking the
fields of an ace broken out. This allows it to detect if the ace is
for a user or a group and to pass the appropriate value while still
being typesafe.

Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>


# 84822d0b 14-Dec-2012 J. Bruce Fields <bfields@redhat.com>

nfsd4: simplify nfsd4_encode_fattr interface slightly

It seems slightly simpler to make nfsd4_encode_fattr rather than its
callers responsible for advancing the write pointer on success.

(Also: the count == 0 check in the verify case looks superfluous.
Running out of buffer space is really the only reason fattr encoding
should fail with eresource.)

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# afc59400 10-Dec-2012 J. Bruce Fields <bfields@redhat.com>

nfsd4: cleanup: replace rq_resused count by rq_next_page pointer

It may be a matter of personal taste, but I find this makes the code
clearer.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# d5f50b0c 04-Dec-2012 J. Bruce Fields <bfields@redhat.com>

nfsd4: fix oops on unusual readlike compound

If the argument and reply together exceed the maximum payload size, then
a reply with a read-like operation can overflow the rq_pages array.

Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# e5f95703 30-Nov-2012 J. Bruce Fields <bfields@redhat.com>

nfsd4: discard some unused nfsd4_verify xdr code

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 3d733711 27-Nov-2012 Stanislav Kinsbursky <skinsbursky@parallels.com>

nfsd: make NFSv4 lease time per net

Lease time is a part of NFSv4 state engine, which is constructed per network
namespace.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# a36b1725 25-Nov-2012 J. Bruce Fields <bfields@redhat.com>

nfsd4: return badname, not inval, on "." or "..", or "/"

The spec requires badname, not inval, in these cases.

Some callers want us to return enoent, but I can see no justification
for that.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# ffe1137b 15-Nov-2012 J. Bruce Fields <bfields@redhat.com>

nfsd4: delay filling in write iovec array till after xdr decoding

Our server rejects compounds containing more than one write operation.
It's unclear whether this is really permitted by the spec; with 4.0,
it's possibly OK, with 4.1 (which has clearer limits on compound
parameters), it's probably not OK. No client that we're aware of has
ever done this, but in theory it could be useful.

The source of the limitation: we need an array of iovecs to pass to the
write operation. In the worst case that array of iovecs could have
hundreds of elements (the maximum rwsize divided by the page size), so
it's too big to put on the stack, or in each compound op. So we instead
keep a single such array in the compound argument.

We fill in that array at the time we decode the xdr operation.

But we decode every op in the compound before executing any of them. So
once we've used that array we can't decode another write.

If we instead delay filling in that array till the time we actually
perform the write, we can reuse it.

Another option might be to switch to decoding compound ops one at a
time. I considered doing that, but it has a number of other side
effects, and I'd rather fix just this one problem for now.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 70cc7f75 16-Nov-2012 J. Bruce Fields <bfields@redhat.com>

nfsd4: move more write parameters into xdr argument

In preparation for moving some of this elsewhere.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 5a80a54d 16-Nov-2012 J. Bruce Fields <bfields@redhat.com>

nfsd4: reorganize write decoding

In preparation for moving some of it elsewhere.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 8a61b18c 16-Nov-2012 J. Bruce Fields <bfields@redhat.com>

nfsd4: simplify reading of opnum

The comment here is totally bogus:
- OP_WRITE + 1 is RELEASE_LOCKOWNER. Maybe there was some older
  version of the spec in which that served as a sort of
  OP_ILLEGAL? No idea, but it's clearly wrong now.
- In any case, I can't see that the spec says anything about
  what to do if the client sends us fewer ops than promised.
  It's clearly nutty client behavior, and we should do
  whatever's easiest: returning an xdr error (even though it
  won't be consistent with the error on the last op returned)
  seems fine to me.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 447bfcc9 16-Nov-2012 J. Bruce Fields <bfields@redhat.com>

nfsd4: no, we're not going to check tags for utf8

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 12fc3e92 05-Nov-2012 J. Bruce Fields <bfields@redhat.com>

nfsd4: backchannel should use client-provided security flavor

For now this only adds support for AUTH_NULL. (Previously we assumed
AUTH_UNIX.) We'll also need AUTH_GSS, which is trickier.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# cb73a9f4 01-Nov-2012 J. Bruce Fields <bfields@redhat.com>

nfsd4: implement backchannel_ctl operation

This operation is mandatory for servers to implement.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# acb2887e 27-Mar-2012 J. Bruce Fields <bfields@redhat.com>

nfsd4: clean up callback security parsing

Move the callback parsing into a separate function.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# ae7095a7 01-Oct-2012 J. Bruce Fields <bfields@redhat.com>

nfsd4: helper function for getting mounted_on ino

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 6e67b5d1 13-Sep-2012 J. Bruce Fields <bfields@redhat.com>

nfsd4: fix bind_conn_to_session xdr comment

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 2930d381 05-Jun-2012 J. Bruce Fields <bfields@redhat.com>

nfsd4: our filesystems are normally case sensitive

Actually, xfs and jfs can optionally be case insensitive; we'll handle
that case in later patches.

Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# e7a0444a 24-Apr-2012 Weston Andros Adamson <dros@netapp.com>

nfsd: add IPv6 addr escaping to fs_location hosts

The fs_location->hosts list is split on colons, but this doesn't work when
IPv6 addresses are used (they contain colons).
This patch adds the function nfsd4_encode_components_esc() to
allow the caller to specify escape characters when splitting on 'sep'.
In order to fix referrals, this patch must be used with the mountd patch
that similarly fixes IPv6 [] escaping.
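
The idea of splitting on 'sep' while honoring escape characters can be
sketched as below. This is a simplified stand-in, not
nfsd4_encode_components_esc() itself: it only counts components, and the
function name and parameter names are hypothetical.

```c
#include <stddef.h>

/*
 * Count the components of str when split on sep, ignoring any sep that
 * appears between esc_enter and esc_exit. Empty components are skipped.
 * Passing '\0' for both escape characters disables escaping.
 */
size_t sketch_count_components(const char *str, char sep,
			       char esc_enter, char esc_exit)
{
	size_t count = 0;
	int in_escape = 0, in_component = 0;

	for (; *str != '\0'; str++) {
		if (!in_escape && *str == esc_enter) {
			in_escape = 1;
			in_component = 1;
		} else if (in_escape && *str == esc_exit) {
			in_escape = 0;
		} else if (!in_escape && *str == sep) {
			count += in_component;	/* close the component */
			in_component = 0;
		} else {
			in_component = 1;
		}
	}
	return count + in_component;
}
```

For the host list "[fe80::1]:host2", splitting on ':' with '[' and ']'
as escapes gives two components; the same addresses without brackets
and without escaping would be cut into three.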

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 45eaa1c1 25-Apr-2012 J. Bruce Fields <bfields@redhat.com>

nfsd4: fix change attribute endianness

Though actually this doesn't matter much, as NFSv4.0 clients are
required to treat the change attribute as opaque.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# d1829b38 25-Apr-2012 J. Bruce Fields <bfields@redhat.com>

nfsd4: fix free_stateid return endianness

Cc: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 57b7b43b 25-Apr-2012 J. Bruce Fields <bfields@redhat.com>

nfsd4: int/__be32 fixes

In each of these cases there's a simple unambiguous correct choice, and
no actual bug.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 2355c596 25-Apr-2012 J. Bruce Fields <bfields@redhat.com>

nfsd4: fix missing "static"

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# afcf6792 12-Apr-2012 Al Viro <viro@zeniv.linux.org.uk>

nfsd: fix error value on allocation failure in nfsd4_decode_test_stateid()

PTR_ERR(NULL) is going to be 0...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 02f5fde5 12-Apr-2012 Al Viro <viro@zeniv.linux.org.uk>

nfsd: fix endianness breakage in TEST_STATEID handling

->ts_id_status gets nfs errno, i.e. it's already big-endian; no need
to apply htonl() to it. Broken by commit 174568 (NFSD: Added TEST_STATEID
operation) last year...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# ab4684d1 02-Mar-2012 Chuck Lever <chuck.lever@oracle.com>

NFSD: Fix nfs4_verifier memory alignment

Clean up due to code review.

The nfs4_verifier's data field is not guaranteed to be u32-aligned.
Casting an array of chars to a u32 * is considered generally
hazardous.

We can fix most of this by using a __be32 array to generate the
verifier's contents and then byte-copying it into the verifier field.

However, there is one spot where there is a backwards compatibility
constraint: the do_nfsd_create() call expects a verifier which is
32-bit aligned. Fix this spot by forcing the alignment of the create
verifier in the nfsd4_open args structure.

Also, sizeof(nfs4_verifier) is the size of the in-core verifier data
structure, but NFS4_VERIFIER_SIZE is the number of octets in an XDR'd
verifier. The two are not interchangeable, even if they happen to
have the same value.
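
A small sketch of the byte-copy approach, using hypothetical stand-ins
(struct sketch_verifier, SKETCH_VERIFIER_SIZE) rather than the kernel
definitions, and plain uint32_t where the kernel would use __be32:

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_VERIFIER_SIZE 8	/* octets in the XDR'd verifier */

struct sketch_verifier {
	char data[SKETCH_VERIFIER_SIZE];	/* no alignment guarantee */
};

/*
 * Build the contents in a properly aligned 32-bit array, then byte-copy
 * them into the verifier instead of casting data to a uint32_t pointer.
 */
void sketch_gen_verifier(struct sketch_verifier *v, uint32_t a, uint32_t b)
{
	uint32_t words[2] = { a, b };

	memcpy(v->data, words, sizeof(v->data));
}
```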

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# d24433cd 16-Feb-2012 Benny Halevy <benny@tonian.com>

nfsd41: implement NFS4_SHARE_WANT_NO_DELEG, NFS4_OPEN_DELEGATE_NONE_EXT, why_no_deleg

Respect client request for not getting a delegation in NFSv4.1
Appropriately return delegation "type" NFS4_OPEN_DELEGATE_NONE_EXT
and WND4_NOT_WANTED reason.

[nfsd41: add missing break when encoding op_why_no_deleg]
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 03cfb420 27-Jan-2012 Bryan Schumaker <bjschuma@netapp.com>

NFSD: Clean up the test_stateid function

When I initially wrote it, I didn't understand how lists worked so I
wrote something that didn't use them. I think making a list of stateids
to test is a more straightforward implementation, especially compared to
decoding stateids while simultaneously encoding
a reply to the client.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 2c8bd7e0 16-Feb-2012 Benny Halevy <benny@tonian.com>

nfsd41: split out share_access want and signal flags while decoding

Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 73e79482 13-Feb-2012 J. Bruce Fields <bfields@redhat.com>

nfsd4: rearrange struct nfsd4_slot

Combine two booleans into a single flag field, move the smaller fields
to the end.

(In practice this doesn't make the struct any smaller. But we'll be
adding another flag here soon.)

Remove some debugging code that doesn't look useful, while we're in the
neighborhood.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 67114fe6 17-Nov-2011 Thomas Meyer <thomas@m3y3r.de>

nfsd4: Use kmemdup rather than duplicating its implementation

The semantic patch that makes this change is available
in scripts/coccinelle/api/memdup.cocci.

Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# fc0d14fe 27-Oct-2011 Benny Halevy <bhalevy@tonian.com>

nfsd4: typo logical vs bitwise negate in nfsd4_decode_share_access

Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 345c2842 20-Oct-2011 Mi Jinlong <mijinlong@cn.fujitsu.com>

nfs41: implement DESTROY_CLIENTID operation

According to rfc5661 18.50, implement DESTROY_CLIENTID operation.

Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 92bac8c5 19-Oct-2011 Benny Halevy <bhalevy@tonian.com>

nfsd4: typo logical vs bitwise negate for want_mask

Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# c668fc6d 19-Oct-2011 Benny Halevy <bhalevy@tonian.com>

nfsd4: allow NFS4_SHARE_SIGNAL_DELEG_WHEN_RESRC_AVAIL | NFS4_SHARE_PUSH_DELEG_WHEN_UNCONTENDED

RFC5661 says:
The client may set one or both of
OPEN4_SHARE_ACCESS_WANT_SIGNAL_DELEG_WHEN_RESRC_AVAIL and
OPEN4_SHARE_ACCESS_WANT_PUSH_DELEG_WHEN_UNCONTENDED.

Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 8b289b2c 19-Oct-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: implement new 4.1 open reclaim types

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 01cd4afa 17-Oct-2011 Dan Carpenter <dan.carpenter@oracle.com>

nfsd4: typo logical vs bitwise negate

This should be a bitwise negate here. It silences a Sparse warning:
fs/nfsd/nfs4xdr.c:693:16: warning: dubious: x & !y
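
For illustration, the difference between the two operators when
clearing mask bits (the function names here are hypothetical):

```c
#include <stdint.h>

/* Buggy: !mask is 0 for any nonzero mask, so the result is always 0. */
uint32_t sketch_clear_logical(uint32_t x, uint32_t mask)
{
	return x & !mask;
}

/* Intended: ~mask keeps every bit of x that lies outside the mask. */
uint32_t sketch_clear_bitwise(uint32_t x, uint32_t mask)
{
	return x & ~mask;
}
```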

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# a084daf5 10-Oct-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: move name-length checks to xdr

Again, these checks are better in the xdr code.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 04f9e664 10-Oct-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: move access/deny validity checks to xdr code

I'd rather put more of these sorts of checks into standardized xdr
decoders for the various types rather than have them cluttering up the
core logic in nfs4proc.c and nfs4state.c.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 38c2f4b1 23-Sep-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: look up stateid's per clientid

Use a separate stateid idr per client, and lookup a stateid by first
finding the client, then looking up the stateid relative to that client.

Also some minor refactoring.

This allows us to improve error returns: we can return expired when the
clientid is not found and bad_stateid when the clientid is found but not
the stateid, as opposed to returning expired for both cases.

I hope this will also help to replace the state lock mostly by a
per-client lock, but that hasn't been done yet.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 36279ac1 25-Sep-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: assume test_stateid always has session

Test_stateid is 4.1-only and only allowed after a sequence operation, so
this check is unnecessary.

Cc: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 38c387b5 16-Sep-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: match close replays on stateid, not open owner id

Keep around an unhashed copy of the final stateid after the last close
using an openowner, and when identifying a replay, match against that
stateid instead of just against the open owner id. Free it the next
time the seqid is bumped or the stateowner is destroyed.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 58e7b33a 28-Aug-2011 Mi Jinlong <mijinlong@cn.fujitsu.com>

nfsd41: try to check reply size before operation

To check the size of the reply before calling an operation,
we need to try to get the maxsize of the operation's reply.

v3: using new method as Bruce said,

"we could handle operations in two different ways:

- For operations that actually change something (write, rename,
open, close, ...), do it the way we're doing it now: be
very careful to estimate the size of the response before even
processing the operation.
- For operations that don't change anything (read, getattr, ...)
just go ahead and do the operation. If you realize after the
fact that the response is too large, then return the error at
that point.

So we'd add another flag to op_flags: say, OP_MODIFIES_SOMETHING. And for
operations with OP_MODIFIES_SOMETHING set, we'd do the first thing. For
operations without it set, we'd do the second."
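
The two-way handling quoted above might be sketched roughly as follows.
Everything here is hypothetical (struct sketch_op, the flag value, the
status names); it only shows the control flow, not the nfsd dispatch
code.

```c
#include <stddef.h>

#define SKETCH_OP_MODIFIES_SOMETHING 0x1u

enum sketch_status { SKETCH_OK, SKETCH_REP_TOO_BIG };

struct sketch_op {
	unsigned int flags;
	size_t max_reply;	/* worst-case reply size estimate */
	size_t actual_reply;	/* size the reply turns out to have */
	int executed;		/* set once the operation has run */
};

enum sketch_status sketch_process(struct sketch_op *op, size_t space_left)
{
	/* State-changing ops: refuse up front if the reply might not fit. */
	if ((op->flags & SKETCH_OP_MODIFIES_SOMETHING) &&
	    op->max_reply > space_left)
		return SKETCH_REP_TOO_BIG;

	op->executed = 1;	/* stands in for running the operation */

	/* Everything else: run it, then fail if the reply was too large. */
	if (op->actual_reply > space_left)
		return SKETCH_REP_TOO_BIG;
	return SKETCH_OK;
}
```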

Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
[bfields@redhat.com: crash, don't attempt to handle, undefined op_rsize_bop]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# ed748aac 12-Sep-2011 Trond Myklebust <Trond.Myklebust@netapp.com>

NFSD: Cleanup for nfsd4_path()

The current code is sort of hackish in that it assumes a referral is always
matched to an export. When we add support for junctions, that may not be the
case.
We can replace nfsd4_path() with a function that encodes the components
directly from the dentries. Since nfsd4_path is currently the only user of
the 'ex_pathname' field in struct svc_export, this has the added benefit
of allowing us to get rid of that.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# fe0750e5 30-Jul-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: split stateowners into open and lockowners

The stateowner has some fields that only make sense for openowners, and
some that only make sense for lockowners, and I find it a lot clearer if
those are separated out.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 7c13f344 30-Aug-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: drop most stateowner refcounting

Maybe we'll bring it back some day, but we don't have much real use for
it now.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 9072d5c6 23-Aug-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: cleanup seqid op stateowner usage

Now that the replay owner is in the cstate we can remove it from a lot
of other individual operations and further simplify
nfs4_preprocess_seqid_op().

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# f3e42237 23-Aug-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: centralize handling of replay owners

Set the stateowner associated with a replay in one spot in
nfs4_preprocess_seqid_op() and keep it in cstate. This allows removing
a few lines of boilerplate from all the nfs4_preprocess_seqid_op()
callers.

Also turn ENCODE_SEQID_OP_TAIL into a function while we're here.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# b7d7ca35 31-Aug-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: fix off-by-one-error in SEQUENCE reply

The values here represent highest slotid numbers. Since slotid's are
numbered starting from zero, the highest should be one less than the
number of slots.

Reported-by: Rick Macklem <rmacklem@uoguelph.ca>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# a9004abc 23-Aug-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: cleanup and consolidate seqid_mutating_err

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 75c096f7 15-Aug-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: it's OK to return nfserr_symlink

The nfsd4 code has a bunch of special exceptions for error returns which
map nfserr_symlink to other errors.

In fact, the spec makes it clear that nfserr_symlink is to be preferred
over less specific errors where possible.

The patch that introduced it back in 2.6.4 is "kNFSd: correct symlink
related error returns.", which claims that these special exceptions
represent an NFSv4 break from v2/v3 tradition--when in fact the symlink
error was introduced with v4.

I suspect what happened was pynfs tests were written that were overly
faithful to the (known-incomplete) rfc3530 error return lists, and then
code was fixed up mindlessly to make the tests pass.

Delete these unnecessary exceptions.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 3d2544b1 15-Aug-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: clean up S_IS -> NF4 file type mapping

A slightly unconventional approach to make the code more compact I could
live with, but let's give the poor reader *some* chance.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 57616300 10-Aug-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: fix seqid_mutating_error

The set of errors here does *not* agree with the set of errors specified
in the rfc!

While we're there, turn this macro into a function, for the usual
reasons, and move it to the one place where it's actually used.

Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 1091006c 23-Jan-2011 J. Bruce Fields <bfields@redhat.com>

nfsd: turn on reply cache for NFSv4

It's sort of ridiculous that we've never had a working reply cache for
NFSv4.

On the other hand, we may still not: our current reply cache is likely
not very good, especially in the TCP case (which is the only case that
matters for v4). What we really need here is some serious testing.

Anyway, here's a start.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 3e98abff 16-Jul-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: call nfsd4_release_compoundargs from pc_release

This simplifies cleanup a bit.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 17456804 13-Jul-2011 Bryan Schumaker <bjschuma@netapp.com>

NFSD: Added TEST_STATEID operation

This operation is used by the client to check the validity of a list of
stateids.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# e1ca12df 13-Jul-2011 Bryan Schumaker <bjschuma@netapp.com>

NFSD: added FREE_STATEID operation

This operation is used by the client to tell the server to free a
stateid.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# c47d832b 16-May-2011 Daniel Mack <zonque@gmail.com>

nfsd: make local functions static

This also fixes a number of sparse warnings.

Signed-off-by: Daniel Mack <zonque@gmail.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: J. Bruce Fields <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 6ce2357f 27-Apr-2011 Bryan Schumaker <bjschuma@netapp.com>

NFSD: Remove unused variable from nfsd4_decode_bind_conn_to_session()

Compiling gave me this warning:
fs/nfsd/nfs4xdr.c: In function ‘nfsd4_decode_bind_conn_to_session’:
fs/nfsd/nfs4xdr.c:427:6: warning: variable ‘dummy’ set but not used
[-Wunused-but-set-variable]

The local variable "dummy" wasn't being used past the READ32() macro that
set it. READ_BUF() should ensure that the xdr buffer is pushed past the
data read into dummy already, so nothing needs to be read in.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
[bfields@redhat.com: minor comment fixup.]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# b7c66360 21-Apr-2011 Andy Adamson <andros@netapp.com>

nfsd v4.1 LOCKT clientid field must be ignored

RFC 5661 Section 18.11.3

The clientid field of the owner MAY be set to any value by the client
and MUST be ignored by the server. The reason the server MUST ignore
the clientid field is that the server MUST derive the client ID from
the session ID from the SEQUENCE operation of the COMPOUND request.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 5a02ab7c 10-Mar-2011 Mi Jinlong <mijinlong@cn.fujitsu.com>

nfsd: wrong index used in inner loop

We must not use dummy for index.
After the first index, READ32(dummy) will change dummy!!!!

Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
[bfields@redhat.com: Trond points out READ_BUF alone is sufficient.]
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 3ec07aa9 08-Mar-2011 roel <roel.kluin@gmail.com>

nfsd: wrong index used in inner loop

Index i was already used in the outer loop

Cc: stable@kernel.org
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 47c85291 15-Feb-2011 NeilBrown <neilb@suse.de>

nfsd: correctly handle return value from nfsd_map_name_to_*

These functions return an nfs status, not a host_err. So don't
try to convert before returning.

This is a regression introduced by
3c726023402a2f3b28f49b9d90ebf9e71151157d; I fixed up two of the callers,
but missed these two.

Cc: stable@kernel.org
Reported-by: Herbert Poetzl <herbert@13thfloor.at>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 0d7bb719 18-Nov-2010 J. Bruce Fields <bfields@redhat.com>

nfsd4: set sequence flag when backchannel is down

Implement the SEQ4_STATUS_CB_PATH_DOWN flag.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 1d1bc8f2 04-Oct-2010 J. Bruce Fields <bfields@redhat.com>

nfsd4: support BIND_CONN_TO_SESSION

Basic xdr and processing for BIND_CONN_TO_SESSION. This adds a
connection to the list of connections associated with a session.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 3c726023 04-Jan-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: return nfs errno from name_to_id functions

This avoids the need for the confusing ESRCH mapping.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 2ca72e17 04-Jan-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: move idmap and acl header files into fs/nfsd

These are internal nfsd interfaces.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# da165dd6 02-Jan-2011 J. Bruce Fields <bfields@redhat.com>

nfsd: remove some unnecessary dropit handling

We no longer need a few of these special cases.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 22b6dee8 26-Dec-2010 Mi Jinlong <mijinlong@cn.fujitsu.com>

nfsd4: fix oops on secinfo_no_name result encoding

The secinfo_no_name code oopses on encoding with

BUG: unable to handle kernel NULL pointer dereference at 00000044
IP: [<e2bd239a>] nfsd4_encode_secinfo+0x1c/0x1c1 [nfsd]

We should implement a nfsd4_encode_secinfo_no_name() instead of using
nfsd4_encode_secinfo().

Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 04f4ad16 16-Dec-2010 J. Bruce Fields <bfields@redhat.com>

nfsd4: implement secinfo_no_name

Implementation of this operation is mandatory for NFSv4.1.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 5afa040b 08-Nov-2010 Mi Jinlong <mijinlong@cn.fujitsu.com>

NFSv4.1: Make sure nfsd can decode SP4_SSV correctly at exchange_id

According to RFC, the argument of ssv_sp_parms4 is:

struct ssv_sp_parms4 {
state_protect_ops4 ssp_ops;
sec_oid4 ssp_hash_algs<>;
sec_oid4 ssp_encr_algs<>;
uint32_t ssp_window;
uint32_t ssp_num_gss_handles;
};

If a client sends an exchange_id with SP4_SSV, the server can't decode
the SP4_SSV's ssp_hash_algs and ssp_encr_algs arguments correctly.

This is because the kernel treats each of the two arguments as a single
sec_oid4 struct, when each should be a set of sec_oid4 structs.
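
As a rough user-space illustration of the difference, the sketch below walks
an XDR variable-length array of variable-length opaques, which is the wire
shape of ssp_hash_algs<> and ssp_encr_algs<>. The helper names are
hypothetical and this is not the nfsd decoder; it only assumes the standard
XDR layout (a 32-bit count, then for each element a 32-bit length followed by
data padded to a 4-byte boundary).

```c
#include <stddef.h>
#include <stdint.h>

/* Read one big-endian 32-bit XDR word; 0 on success, -1 on short buffer. */
static int xdr_read_u32(const uint8_t **p, const uint8_t *end, uint32_t *out)
{
	if ((size_t)(end - *p) < 4)
		return -1;
	*out = ((uint32_t)(*p)[0] << 24) | ((uint32_t)(*p)[1] << 16) |
	       ((uint32_t)(*p)[2] << 8) | (uint32_t)(*p)[3];
	*p += 4;
	return 0;
}

/*
 * Hypothetical helper: skip an XDR array of variable-length opaques
 * (the shape of "sec_oid4 algs<>"). Returns the element count, or -1
 * if the buffer ends before the array does.
 */
static int xdr_skip_opaque_array(const uint8_t **p, const uint8_t *end)
{
	uint32_t count, len, i;

	if (xdr_read_u32(p, end, &count))
		return -1;
	for (i = 0; i < count; i++) {
		size_t padded;

		if (xdr_read_u32(p, end, &len))
			return -1;
		/* Opaque data is padded up to a multiple of 4 bytes. */
		padded = ((size_t)len + 3) & ~(size_t)3;
		if ((size_t)(end - *p) < padded)
			return -1;
		*p += padded;
	}
	return (int)count;
}
```

Decoding only one sec_oid4 where an array is present would leave the read
pointer in the middle of the array, so the following ssp_window field would
be read from the wrong offset.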

Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 2b44f1ba 30-Sep-2010 Benny Halevy <bhalevy@panasas.com>

nfsd4: adjust buflen for encoded attrs bitmap based on actual bitmap length

The existing code adjusted it based on the worst case scenario for the returned
bitmap and the best case scenario for the supported attrs attribute.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[bfields@redhat.com: removed likely/unlikely's]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# ebabe9a9 07-Jul-2010 Christoph Hellwig <hch@lst.de>

pass a struct path to vfs_statfs

We'll need the path to implement the flags field for statvfs support.
We do have it available in all callers except:

- ecryptfs_statfs. This one doesn't actually need vfs_statfs but just
needs to do a call to the lower filesystem statfs method.
- sys_ustat. Add a non-exported statfs_by_dentry helper for it which
won't be able to fill out the flags field later on.

In addition rename the helpers for statfs vs fstatfs to do_*statfs instead
of the misleading vfs prefix.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 039a87ca 30-Jul-2010 J. Bruce Fields <bfields@redhat.com>

nfsd: minor nfsd read api cleanup

Christoph points out that the NFSv2/v3 callers know which case they want
here, so we may as well just call the file=NULL case directly instead of
making this conditional.

Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 76407f76 22-Jun-2010 J. Bruce Fields <bfields@citi.umich.edu>

nfsd4: fix session reference count leak

Note the session has to be put() here regardless of what happens to the
client.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 4dc6ec00 19-Apr-2010 J. Bruce Fields <bfields@citi.umich.edu>

nfsd4: implement reclaim_complete

This is a mandatory operation. Also, here (not in open) is where we
should be committing the reboot recovery information.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# d7682988 11-May-2010 Benny Halevy <bhalevy@panasas.com>

nfsd4: keep a reference count on client while in use

Get a refcount on the client on SEQUENCE,
Release the refcount and renew the client when all respective compounds completed.
Do not expire the client by the laundromat while in use.
If the client was expired via another path, free it when the compounds
complete and the refcount reaches 0.

Note that unhash_client_locked must call list_del_init on cl_lru as
it may be called twice for the same client (once from nfs4_laundromat
and then from expire_client)

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# dbd65a7e 03-May-2010 Benny Halevy <bhalevy@panasas.com>

nfsd4: use local variable in nfs4svc_encode_compoundres

'cs' is already computed, re-use it.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 26c0c75e 24-Apr-2010 J. Bruce Fields <bfields@citi.umich.edu>

nfsd4: fix unlikely race in session replay case

In the replay case, the

renew_client(session->se_client);

happens after we've dropped the sessionid_lock, and without holding a
reference on the session; so there's nothing preventing the session
being freed before we get here.

Thanks to Benny Halevy for catching a bug in an earlier version of this
patch.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Acked-by: Benny Halevy <bhalevy@panasas.com>


# 2bc3c117 19-Apr-2010 Neil Brown <neilb@suse.de>

nfsd4: bug in read_buf

When read_buf is called to move over to the next page in the pagelist
of an NFSv4 request, it sets argp->end to essentially a random
number, certainly not an address within the page which argp->p now
points to. So subsequent calls to READ_BUF will think there is much
more than a page of spare space (the cast to u32 ensures an unsigned
comparison) so we can expect to fall off the end of the second
page.

We never encountered this in testing because typically the only
operations which use more than two pages are write-like operations,
which have their own decoding logic. Something like a getattr after a
write may cross a page boundary, but it would be very unusual for it to
cross another boundary after that.
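
The effect of the unsigned comparison can be sketched in user space. The
function below is a hypothetical stand-in for the space check described
above, not the kernel macro itself: once "end" no longer lies beyond "p", the
signed difference is negative and the cast turns it into a very large value.

```c
#include <stdint.h>

/*
 * Hypothetical space check: is there room for nbytes between p and end?
 * The cast makes the comparison unsigned, so a stale "end" that lies
 * below p yields a huge apparent amount of free space.
 */
static int has_room(const uint32_t *p, const uint32_t *end, uint32_t nbytes)
{
	return nbytes <= (uint32_t)((const char *)end - (const char *)p);
}
```

With a correct end pointer eight words past p, a 32-byte request fits and a
33-byte one does not. With end two words before p, even a 4096-byte request
appears to fit, which is how the decoder could run off the end of the page.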

Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 5a0e3ad6 24-Mar-2010 Tejun Heo <tj@kernel.org>

include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities to include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the following.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and tries to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
widely available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build tests were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>


# cf07d2ea 28-Feb-2010 J. Bruce Fields <bfields@citi.umich.edu>

nfsd4: simplify references to nfsd4 lease time

Instead of accessing the lease time directly, some users call
nfs4_lease_time(), and some a macro, NFSD_LEASE_TIME, defined as
nfs4_lease_time(). Neither layer of indirection serves any purpose.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 462d6057 30-Jan-2010 Al Viro <viro@zeniv.linux.org.uk>

fix NFS4 handling of mountpoint stat

RFC says we need to follow the chain of mounts if there's more
than one stacked on that point.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 3ad2f3fb 02-Feb-2010 Daniel Mack <daniel@caiaq.de>

tree-wide: Assorted spelling fixes

In particular, several occurrences of funny versions of 'success',
'unknown', 'therefore', 'acknowledge', 'argument', 'achieve', 'address',
'beginning', 'desirable', 'separate' and 'necessary' are fixed.

Signed-off-by: Daniel Mack <daniel@caiaq.de>
Cc: Joe Perches <joe@perches.com>
Cc: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>


# de3cab79 11-Dec-2009 Ricardo Labiaga <Ricardo.Labiaga@netapp.com>

nfsd4: Use FIRST_NFS4_OP in nfsd4_decode_compound()

Since we're checking for LAST_NFS4_OP, use FIRST_NFS4_OP to be consistent.

Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# c551866e 11-Dec-2009 Ricardo Labiaga <Ricardo.Labiaga@netapp.com>

nfsd41: nfsd4_decode_compound() does not recognize all ops

The server incorrectly assumes that the operations in the
array start with value 0. The first operation (OP_ACCESS)
has a value of 3, causing the check in nfsd4_decode_compound
to be off.

Instead of comparing that the operation number is less than
the number of elements in the array, the server should verify
that it is less than the maximum valid operation number
defined by LAST_NFS4_OP.
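
The off-by-three can be illustrated with a small user-space sketch. Only
OP_ACCESS = 3 comes from the description above; the value used for
LAST_NFS4_OP, the NUM_OPS count and both function names are assumptions made
for the example, not the kernel's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* OP_ACCESS = 3 is from the commit text; the LAST value is assumed. */
enum {
	FIRST_NFS4_OP = 3,   /* OP_ACCESS */
	LAST_NFS4_OP  = 58,  /* assumed highest valid opcode for this sketch */
};

/* Count of valid operations, i.e. the size of a table holding only them. */
#define NUM_OPS (LAST_NFS4_OP - FIRST_NFS4_OP + 1)

/* Old-style check: behaves as if opcodes started at 0. */
static bool opnum_ok_old(uint32_t opnum)
{
	return opnum < NUM_OPS;
}

/* Range check against the first and last valid opcode. */
static bool opnum_ok_new(uint32_t opnum)
{
	return opnum >= FIRST_NFS4_OP && opnum <= LAST_NFS4_OP;
}
```

Under these assumed values the old check wrongly rejects the three highest
opcodes (56 to 58) and wrongly accepts 0 to 2, while the range check accepts
exactly 3 to 58.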

Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 3227fa41 25-Oct-2009 J. Bruce Fields <bfields@citi.umich.edu>

nfsd: filter readdir results in V4ROOT case

As with lookup, we treat every object as a mountpoint and pretend it
doesn't exist if it isn't exported.

The preexisting code here is confusing, but I haven't yet figured out
how to make it clearer.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 9a74af21 03-Dec-2009 Boaz Harrosh <bharrosh@panasas.com>

nfsd: Move private headers to source directory

Lots of include/linux/nfsd/* headers are only used by
nfsd module. Move them to the source directory

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 341eb184 03-Dec-2009 Boaz Harrosh <bharrosh@panasas.com>

nfsd: Source files #include cleanups

Now that the headers are fixed and carry their own weight, all fs/nfsd/
source files can include a minimal set of headers and still compile just
fine.

This patch should improve the compilation speed of the nfsd module.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 0a3adade 04-Nov-2009 J. Bruce Fields <bfields@citi.umich.edu>

nfsd: make fs/nfsd/vfs.h for common includes

None of this stuff is used outside nfsd, so move it out of the common
linux include directory.

Actually, probably none of the stuff in include/linux/nfsd/nfsd.h really
belongs there, so later we may remove that file entirely.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 2671a4bf 02-Sep-2009 Trond Myklebust <Trond.Myklebust@netapp.com>

NFSd: Fix filehandle leak in exp_pseudoroot() and nfsd4_path()

nfsd4_path() allocates a temporary filehandle and then fails to free it
before the function exits, leaking reference counts to the dentry and
export that it refers to.

Also, nfsd4_lookupp() puts the result of exp_pseudoroot() in a temporary
filehandle which it releases on success of exp_pseudoroot() but not on
failure; fix exp_pseudoroot to ensure that on failure it releases the
filehandle before returning.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 557ce264 28-Aug-2009 Andy Adamson <andros@netapp.com>

nfsd41: replace page based DRC with buffer based DRC

Use NFSD_SLOT_CACHE_SIZE size buffers for sessions DRC instead of holding nfsd
pages in cache.

Connectathon testing has shown that 1024 bytes for encoded compound operation
responses past the sequence operation is sufficient, 512 bytes is a little too
small. Set NFSD_SLOT_CACHE_SIZE to 1024.

Allocate memory for the session DRC in the CREATE_SESSION operation
to guarantee that the memory resource is available for caching responses.
Allocate each slot individually in preparation for slot table size negotiation.

Remove struct nfsd4_cache_entry and helper functions for the old page-based
DRC.

The iov_len calculation in nfs4svc_encode_compoundres is now always
correct. Replay is now done in nfsd4_sequence under the state lock, so
the session ref count is only bumped on non-replay. Clean up the
nfs4svc_encode_compoundres session logic.

The nfsd4_compound_state statp pointer is also not used.
Remove nfsd4_set_statp().

Move useful nfsd4_cache_entry fields into nfsd4_slot.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# a06b1261 31-Aug-2009 Trond Myklebust <Trond.Myklebust@netapp.com>

NFSD: Fix a bug in the NFSv4 'supported attrs' mandatory attribute

The fact that the filesystem doesn't currently list any alternate
locations does _not_ imply that the fs_locations attribute should be
marked as "unsupported".

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 49557cc7 23-Jul-2009 Andy Adamson <andros@netapp.com>

nfsd41: Use separate DRC for setclientid

Instead of trying to share the generic 4.1 reply cache code for the
CREATE_SESSION reply cache, it's simpler to handle CREATE_SESSION
separately.

The nfs41 single slot clientid DRC holds the results of create session
processing. CREATE_SESSION can be preceded by a SEQUENCE operation
(an embedded CREATE_SESSION) and the create session single slot cache must be
maintained. nfsd4_replay_cache_entry() and nfsd4_store_cache_entry() do not
implement the replay of an embedded CREATE_SESSION.

The clientid DRC slot does not need the inuse, cachethis or other fields that
the multiple slot session cache uses. Replace the clientid DRC cache struct
nfs4_slot cache with a new nfsd4_clid_slot cache. Save the xdr struct
nfsd4_create_session into the cache at the end of processing, and on a replay,
replace the struct for the replay request with the cached version all while
under the state lock.

nfsd4_proc_compound will handle both the solo and embedded CREATE_SESSION case
via the normal use of encode_operation.

Errors that do not change the create session cache:
A create session NFS4ERR_STALE_CLIENTID error means that a client record
(and associated create session slot) could not be found and therefore can't
be changed. NFSERR_SEQ_MISORDERED errors do not change the slot cache.

All other errors get cached.

Remove the clientid DRC specific check in nfs4svc_encode_compoundres to
put the session only if cstate.session is set which will now always be true.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 6c18ba9f 15-Jun-2009 Alexandros Batsakis <Alexandros.Batsakis@netapp.com>

nfsd41: move channel attributes from nfsd4_session to a nfsd4_channel_attr struct

the change is valid for both the forechannel and the backchannel (currently dummy)

Signed-off-by: Alexandros Batsakis <Alexandros.Batsakis@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 3c8e0316 16-May-2009 Yu Zhiguo <yuzg@cn.fujitsu.com>

NFSv4: do exact check about attribute specified

Server should return NFS4ERR_ATTRNOTSUPP if an attribute specified is
not supported in current environment.
Operations CREATE, NVERIFY, OPEN, SETATTR and VERIFY should do this check.

This bug was found when running the newpynfs tests. The names of the
failing tests are as follows:
CR12 NVF7a NVF7b NVF7c NVF7d NVF7f NVF7r NVF7s
OPEN15 VF7a VF7b VF7c VF7d VF7f VF7r VF7s

Add function do_check_fattr() to do the exact check:
1. Check whether the attribute specified is supported by the NFSv4 server.
2. Check whether FATTR4_WORD0_ACL & FATTR4_WORD0_FS_LOCATIONS are supported
in the current environment.
3. Check whether the attribute specified is writable.

Steps 1 and 3 were done in function nfsd4_decode_fattr() but are moved
to this function now.

Signed-off-by: Yu Zhiguo <yuzg@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# b2c0cea6 05-May-2009 J. Bruce Fields <bfields@citi.umich.edu>

nfsd4: check for negative dentry before use in nfsv4 readdir

After 2f9092e1020246168b1309b35e085ecd7ff9ff72 "Fix i_mutex vs. readdir
handling in nfsd" (and 14f7dd63 "Copy XFS readdir hack into nfsd code"),
an entry may be removed between the first mutex_unlock and the second
mutex_lock. In this case, lookup_one_len() will return a negative
dentry. Check for this case to avoid a NULL dereference.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Reviewed-by: J. R. Okajima <hooanon05@yahoo.co.jp>
Cc: stable@kernel.org


# 9064caae 28-Apr-2009 Randy Dunlap <randy.dunlap@oracle.com>

nfsd: use C99 struct initializers

Eliminate 56 sparse warnings like this one:

fs/nfsd/nfs4xdr.c:1331:15: warning: obsolete array initializer, use C99 syntax

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# c654b8a9 16-Apr-2009 J. Bruce Fields <bfields@citi.umich.edu>

nfsd: support ext4 i_version

ext4 supports a real NFSv4 change attribute, which is bumped whenever
the ctime would be updated, including times when two updates arrive
within a jiffy of each other. (Note that although ext4 has space for
nanosecond-precision ctime, the real resolution is lower: it actually
uses jiffies as the time-source.) This ensures clients will invalidate
their caches when they need to.

There is some fear that keeping the i_version up-to-date could have
performance drawbacks, so for now it's turned on only by a mount option.
We hope to do something better eventually.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Theodore Tso <tytso@mit.edu>


# 3352d2c2 07-Apr-2009 J. Bruce Fields <bfields@citi.umich.edu>

nfsd4: delete obsolete xdr comments

We don't need comments to tell us these macros are ugly. And we're long
past trying to share any of this code with the BSD's.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# bc749ca4 07-Apr-2009 J. Bruce Fields <bfields@citi.umich.edu>

nfsd: eliminate ENCODE_HEAD macro

This macro doesn't serve any useful purpose.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 79fb54ab 02-Apr-2009 Benny Halevy <bhalevy@panasas.com>

nfsd41: CREATE_EXCLUSIVE4_1

Implement the CREATE_EXCLUSIVE4_1 open mode conforming to
http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion1-26

This mode allows the client to atomically create a file
if it doesn't exist while setting some of its attributes.

It must be implemented if the server supports persistent
reply cache and/or pnfs.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 8c18f205 02-Apr-2009 Benny Halevy <bhalevy@panasas.com>

nfsd41: SUPPATTR_EXCLCREAT attribute

Return bitmask for supported EXCLUSIVE4_1 create attributes.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 7e705706 02-Apr-2009 Andy Adamson <andros@netapp.com>

nfsd41: support for 3-word long attribute bitmask

Also, use client minorversion to generate supported attrs

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# c0d6fc8a 02-Apr-2009 Benny Halevy <bhalevy@panasas.com>

nfsd41: pass writable attrs mask to nfsd4_decode_fattr

In preparation for EXCLUSIVE4_1

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 60adfc50 02-Apr-2009 Andy Adamson <andros@netapp.com>

nfsd41: clientid handling

Extract the clientid from sessionid to set the op_clientid on open.
Verify that the clid for other stateful ops is zero for minorversion != 0
Do all other checks for stateful ops without sessions.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Andy Adamson <andros@netapp.com>
[fixed whitespace indent]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41 remove sl_session from nfsd4_open]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 496c262c 02-Apr-2009 Andy Adamson <andros@netapp.com>

nfsd41: check encode size for sessions maxresponse cached

Calculate the space the compound response has taken after encoding the current
operation.

pad: add on 8 bytes for the next operation's op_code and status so that
there is room to cache a failure on the next operation.

Compare this length to the session se_fmaxresp_cached and return
nfserr_rep_too_big_to_cache if the length is too large.

Our se_fmaxresp_cached will always be a multiple of PAGE_SIZE, and so
will be at least a page and will therefore hold the xdr_buf head.
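
The size check described above amounts to simple arithmetic, sketched here in
user space. The helper name and the pad constant name are hypothetical, and
treating "too large" as strictly greater than the limit is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Room for the next operation's op_code and status: two 4-byte XDR words. */
#define NEXT_OP_PAD 8

/*
 * Hypothetical helper: true if a compound reply that has used
 * encoded_len bytes so far, plus the pad, still fits within the
 * session's maximum cached response size.
 */
static bool reply_fits_in_cache(size_t encoded_len, size_t maxresp_cached)
{
	return encoded_len + NEXT_OP_PAD <= maxresp_cached;
}
```

When the helper returns false, the server would fail the operation with
nfserr_rep_too_big_to_cache rather than cache a truncated reply.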

Signed-off-by: Andy Adamson <andros@netapp.com>
[nfsd41: non-page DRC for solo sequence responses]
[fixed nfsd4_check_drc_limit cosmetics]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: use cstate session in nfsd4_check_drc_limit]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 6668958f 02-Apr-2009 Andy Adamson <andros@netapp.com>

nfsd41: stateid handling

When sessions are used, stateful operation sequenceid and stateid handling
are not used. When sessions are used, on the first open set the seqid to 1,
mark state confirmed and skip seqid processing.

When sessions are used the stateid generation number is ignored when it is zero
whereas without sessions bad_stateid or stale stateid is returned.

Add flags to propagate session use to all stateful ops and down to
check_stateid_generation.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Andy Adamson <andros@netapp.com>
[nfsd4_has_session should return a boolean, not u32]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: pass nfsd4_compoundres * to nfsd4_process_open1]
[nfsd41: calculate HAS_SESSION in nfs4_preprocess_stateid_op]
[nfsd41: calculate HAS_SESSION in nfs4_preprocess_seqid_op]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# e10e0cfc 02-Apr-2009 Benny Halevy <bhalevy@panasas.com>

nfsd41: destroy_session operation

Implement the destroy_session operation conforming to
http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion1-26

[use sessionid_lock spin lock]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# bf864a31 02-Apr-2009 Andy Adamson <andros@netapp.com>

nfsd41: non-page DRC for solo sequence responses

A session inactivity time compound (lease renewal) or a compound where the
sequence operation has sa_cachethis set to FALSE do not require any pages
to be held in the v4.1 DRC. This is because struct nfsd4_slot is already
caching the session information.

Add logic to the nfs41 server to not cache response pages for solo sequence
responses.

Return nfserr_replay_uncached_rep on the operation following the sequence
operation when sa_cachethis is FALSE.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: use cstate session in nfsd4_replay_cache_entry]
[nfsd41: rename nfsd4_no_page_in_cache]
[nfsd41 rename nfsd4_enc_no_page_replay]
[nfsd41 nfsd4_is_solo_sequence]
[nfsd41 change nfsd4_not_cached return]
Signed-off-by: Andy Adamson <andros@netapp.com>
[changed return type to bool]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41 drop parens in nfsd4_is_solo_sequence call]
Signed-off-by: Andy Adamson <andros@netapp.com>
[changed "== 0" to "!"]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# ec6b5d7b 02-Apr-2009 Andy Adamson <andros@netapp.com>

nfsd41: create_session operation

Implement the create_session operation conforming to
http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion1-26

Look up the client id (generated by the server on exchange_id,
given by the client on create_session).
If neither a confirmed nor an unconfirmed client is found
then the client id is stale.
If a confirmed client is found (i.e. we already received
create_session for it) then compare the sequence id
to determine if it's a replay or possibly a mis-ordered rpc.
If the seqid is in order, update the confirmed client seqid
and proceed with updating the session parameters.

If an unconfirmed client_id is found then verify the creds
and seqid. If both match move the client id to confirmed state
and proceed with processing the create_session.

Currently, we do not support persistent sessions or RDMA.

alloc_init_session generates a new sessionid and creates
a session structure.

NFSD_PAGES_PER_SLOT is used for the max response cached calculation, and for
the counting of DRC pages using the hard limits set in struct srv_serv.

A note on NFSD_PAGES_PER_SLOT:

Other patches in this series allow for NFSD_PAGES_PER_SLOT + 1 pages to be
cached in a DRC slot when the response size is less than NFSD_PAGES_PER_SLOT *
PAGE_SIZE but xdr_buf pages are used. e.g. a READDIR operation will encode a
small amount of data in the xdr_buf head, and then the READDIR in the xdr_buf
pages. So, the hard limit calculation use of pages by a session is
underestimated by the number of cached operations using the xdr_buf pages.

Yet another patch caches no pages for the solo sequence operation, or any
compound where cache_this is False. So the hard limit calculation use of
pages by a session is overestimated by the number of these operations in the
cache.

TODO: improve resource pre-allocation and negotiate session
parameters accordingly. Respect and possibly adjust
backchannel attributes.

Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com>
[nfsd41: remove headerpadsz from channel attributes]
Our client and server only support a headerpadsz of 0.
[nfsd41: use DRC limits in fore channel init]
[nfsd41: do not change CREATE_SESSION back channel attrs]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[use sessionid_lock spin lock]
[nfsd41: use bool inuse for slot state]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41 remove sl_session from alloc_init_session]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[simplify nfsd4_encode_create_session error handling]
[nfsd41: fix comment style in init_forechannel_attrs]
[nfsd41: allocate struct nfsd4_session and slot table in one piece]
[nfsd41: no need to INIT_LIST_HEAD in alloc_init_session just prior to list_add]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# da3846a2 02-Apr-2009 Andy Adamson <andros@netapp.com>

nfsd41: nfsd DRC logic

Replay a request in nfsd4_sequence.
Add a minorversion to struct nfsd4_compound_state.

Pass the current slot to nfs4svc_encode_compound res via struct
nfsd4_compoundres to set an NFSv4.1 DRC entry.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: use bool inuse for slot state]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: use cstate session in nfs4svc_encode_compoundres]
[nfsd41 replace nfsd4_set_cache_entry]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# b85d4c01 02-Apr-2009 Benny Halevy <bhalevy@panasas.com>

nfsd41: sequence operation

Implement the sequence operation conforming to
http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion1-26

Check for stale clientid (as derived from the sessionid).
Enforce slotid range and exactly-once semantics using
the slotid and seqid.

If everything went well renew the client lease and
mark the slot INPROGRESS.

Add a struct nfsd4_slot pointer to struct nfsd4_compound_state.
To be used for sessions DRC replay.
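
The slotid range check and the seqid comparison described above can be sketched roughly as follows. This is a simplified userspace illustration, not the kernel code: struct demo_slot, the demo_seq_result values, and demo_check_slot() are hypothetical names, and treating a slot that is still in progress as "busy" is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified slot; not the kernel's struct nfsd4_slot. */
struct demo_slot {
	uint32_t seqid;  /* seqid of the last request accepted on this slot */
	bool inuse;      /* true while a request on this slot is in progress */
};

enum demo_seq_result {
	DEMO_SEQ_NEW,        /* next request in sequence: execute it */
	DEMO_SEQ_REPLAY,     /* same seqid again: answer from the reply cache */
	DEMO_SEQ_MISORDERED, /* any other seqid */
	DEMO_SEQ_BADSLOT,    /* slotid outside the slot table */
	DEMO_SEQ_BUSY        /* assumption: slot still marked in progress */
};

enum demo_seq_result demo_check_slot(struct demo_slot *table, uint32_t nslots,
				     uint32_t slotid, uint32_t seqid)
{
	struct demo_slot *slot;

	/* Enforce the slotid range first. */
	if (slotid >= nslots)
		return DEMO_SEQ_BADSLOT;

	slot = &table[slotid];
	if (slot->inuse)
		return DEMO_SEQ_BUSY;

	/* Exactly-once: only seqid + 1 is a new request (wraps at 2^32). */
	if (seqid == (uint32_t)(slot->seqid + 1)) {
		slot->seqid = seqid;
		slot->inuse = true;  /* mark the slot in progress */
		return DEMO_SEQ_NEW;
	}
	if (seqid == slot->seqid)
		return DEMO_SEQ_REPLAY;
	return DEMO_SEQ_MISORDERED;
}
```

The caller is expected to clear inuse once the reply has been cached, so that a retransmission with the same seqid is reported as a replay.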

[nfsd41: rename sequence catchthis to cachethis]
Signed-off-by: Andy Adamson <andros@netapp.com>
[pulled some code to set cstate->slot from "nfsd DRC logic"]
[use sessionid_lock spin lock]
[nfsd41: use bool inuse for slot state]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd: add a struct nfsd4_slot pointer to struct nfsd4_compound_state]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: add nfsd4_session pointer to nfsd4_compound_state]
[nfsd41: set cstate session]
[nfsd41: use cstate session in nfsd4_sequence]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[simplify nfsd4_encode_sequence error handling]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 0733d213 02-Apr-2009 Andy Adamson <andros@netapp.com>

nfsd41: exchange_id operation

Implement the exchange_id operation conforming to
http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion1-28

Based on the client provided name, hash a client id.
If a confirmed one is found, compare the op's creds and
verifier. If the creds match and the verifier is different
then expire the old client (client re-incarnated), otherwise,
if both match, assume it's a replay and ignore it.

If an unconfirmed client is found, then copy the new creds
and verifier if they need updating, otherwise assume replay.

The client is moved to a confirmed state on create_session.

In the nfs41 branch set the exchange_id flags to
EXCHGID4_FLAG_USE_NON_PNFS | EXCHGID4_FLAG_SUPP_MOVED_REFER
(pNFS is not supported, Referrals are supported,
Migration is not.).

Address various scenarios from section 18.35 of the spec:

1. Check for EXCHGID4_FLAG_UPD_CONFIRMED_REC_A and set
EXCHGID4_FLAG_CONFIRMED_R as appropriate.

2. Return error codes per 18.35.4 scenarios.

3. Update client records or generate new client ids depending on
scenario.

Note: 18.35.4 case 3 probably still needs revisiting. The handling
seems not quite right.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: use utsname for major_id (and copy to server_scope)]
[nfsd41: fix handling of various exchange id scenarios]
Signed-off-by: Mike Sager <sager@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: reverse use of EXCHGID4_INVAL_FLAG_MASK_A]
[simplify nfsd4_encode_exchange_id error handling]
[nfsd41: embed an xdr_netobj in nfsd4_exchange_id]
[nfsd41: return nfserr_serverfault for spa_how == SP4_MACH_CRED]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 2db134eb 02-Apr-2009 Andy Adamson <andros@netapp.com>

nfsd41: xdr infrastructure

Define nfsd41_dec_ops vector and add it to nfsd4_minorversion for
minorversion 1.

Note: nfsd4_enc_ops vector is shared for v4.0 and v4.1
since we don't need to filter out obsolete ops as this is
done in the decoding phase.

exchange_id, create_session, destroy_session, and sequence ops are
implemented as stubs returning nfserr_opnotsupp at this stage.

[was nfsd41: xdr stubs]
[get rid of CONFIG_NFSD_V4_1]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# a1c8c4d1 08-Mar-2009 J. Bruce Fields <bfields@citi.umich.edu>

nfsd4: support putpubfh operation

Currently putpubfh returns NFSERR_OPNOTSUPP, which isn't actually
allowed for v4. The right error is probably NFSERR_NOTSUPP.

But let's just implement it; though rarely seen, it can be used by
Solaris (with a special mount option), is mandated by the rfc, and is
trivial for us to support.

Thanks to Yang Hongyang for pointing out the original problem, and to
Mike Eisler, Tom Talpey, Trond Myklebust, and Dave Noveck for further
argument....

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 1e685ec2 04-Mar-2009 Benny Halevy <bhalevy@panasas.com>

NFSD: return nfsv4 error code nfserr_notsupp rather than nfsv[23]'s nfserr_opnotsupp

Thanks to Bill Baker at sun.com for catching this
at Connectathon 2009.

This bug was introduced in 2.6.27

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 84f09f46 04-Mar-2009 Benny Halevy <bhalevy@panasas.com>

NFSD: provide encode routine for OP_OPENATTR

Although this operation is unsupported by our implementation
we still need to provide an encode routine for it to
merely encode its (error) status back in the compound reply.

Thanks to Bill Baker at sun.com for testing with the Sun
OpenSolaris' client, finding, and reporting this bug at
Connectathon 2009.

This bug was introduced in 2.6.27

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 4e65ebf0 15-Dec-2008 Marc Eshel <eshel@almaden.ibm.com>

nfsd: delete wrong file comment from nfsd/nfs4xdr.c

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# e31a1b66 12-Aug-2008 Benny Halevy <bhalevy@panasas.com>

nfsd: nfs4xdr decode_stateid helper function

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 5bf8c691 12-Aug-2008 Benny Halevy <bhalevy@panasas.com>

nfsd: properly xdr-decode NFS4_OPEN_CLAIM_DELEGATE_CUR stateid

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 1b6b2257 12-Aug-2008 Benny Halevy <bhalevy@panasas.com>

nfsd: don't declare p in ENCODE_SEQID_OP_HEAD

After using the encode_stateid helper the "p" pointer declared
by ENCODE_SEQID_OP_HEAD is warned as unused.
In the single site where it is still needed it can be declared
separately using the ENCODE_HEAD macro.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# e2f282b9 12-Aug-2008 Benny Halevy <bhalevy@panasas.com>

nfsd: nfs4xdr encode_stateid helper function

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 5033b77a 12-Aug-2008 Benny Halevy <bhalevy@panasas.com>

nfsd: fix nfsd4_encode_open buffer space reservation

nfsd4_encode_open first reservation is currently for 36 + sizeof(stateid_t)
while it writes after the stateid a cinfo (20 bytes) and 5 more 4-bytes
words, for a total of 40 + sizeof(stateid_t).
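
The arithmetic above can be written out as a small sketch. The constant and function names are hypothetical; only the byte counts (a 20-byte cinfo plus five 4-byte words) come from the description.

```c
#include <stddef.h>

enum {
	DEMO_CINFO_BYTES = 20, /* change info written after the stateid */
	DEMO_EXTRA_WORDS = 5,  /* five more XDR words follow */
	DEMO_XDR_WORD    = 4   /* bytes per XDR word */
};

/* Bytes to reserve for a stateid of the given size plus what follows it. */
size_t demo_open_reserve(size_t stateid_bytes)
{
	return stateid_bytes + DEMO_CINFO_BYTES +
	       DEMO_EXTRA_WORDS * DEMO_XDR_WORD;
}
```

With a stateid size of 0 the function returns 40, which is the fixed part of the reservation named in the description.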

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# c47b2ca4 12-Aug-2008 Benny Halevy <bhalevy@panasas.com>

nfsd: properly xdr-encode deleg stateid returned from open

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 5108b276 17-Jul-2008 Harvey Harrison <harvey.harrison@gmail.com>

nfsd: nfs4xdr.c do-while is not a compound statement

The WRITEMEM macro produces sparse warnings of the form:
fs/nfsd/nfs4xdr.c:2668:2: warning: do-while statement is not a compound statement

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# ad1060c8 18-Jul-2008 J. Bruce Fields <bfields@citi.umich.edu>

nfsd: Use C99 initializers in fs/nfsd/nfs4xdr.c

Thanks to problem report and original patch from Harvey Harrison.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Benny Halevy <bhalevy@panasas.com>


# 695e12f8 04-Jul-2008 Benny Halevy <bhalevy@panasas.com>

nfsd: tabulate nfs4 xdr encoding functions

In preparation for minorversion 1

All encoders now return an nfserr status (typically their
nfserr argument). Unsupported ops go through nfsd4_encode_operation
too, so use nfsd4_encode_noop to encode nothing for their reply body.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# f2feb96b 02-Jul-2008 Benny Halevy <bhalevy@panasas.com>

nfsd: nfs4 minorversion decoder vectors

Have separate vectors of operation decoders for each minorversion.
Obsolete ops in newer minorversions have default implementation returning
nfserr_opnotsupp.
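
A per-minorversion decoder table of this kind might look like the sketch below. All names (demo_dec_fn, the DEMO_OP_* and DEMO_ERR_* values, demo_decode_op) are hypothetical stand-ins, and the decoders take no arguments to keep the example short.

```c
#include <stddef.h>

enum { DEMO_OP_A, DEMO_OP_B, DEMO_OP_MAX };
enum { DEMO_OK, DEMO_ERR_NOTSUPP, DEMO_ERR_ILLEGAL,
       DEMO_ERR_MINOR_VERS_MISMATCH };

typedef int (*demo_dec_fn)(void);

static int demo_decode_a(void) { return DEMO_OK; }
static int demo_decode_b(void) { return DEMO_OK; }
/* Default decoder for ops that are obsolete in a given minorversion. */
static int demo_decode_notsupp(void) { return DEMO_ERR_NOTSUPP; }

/* One vector of decoders per minorversion, indexed by operation number. */
static const demo_dec_fn demo_v0_ops[DEMO_OP_MAX] = {
	[DEMO_OP_A] = demo_decode_a,
	[DEMO_OP_B] = demo_decode_b,
};
static const demo_dec_fn demo_v1_ops[DEMO_OP_MAX] = {
	[DEMO_OP_A] = demo_decode_a,
	[DEMO_OP_B] = demo_decode_notsupp, /* obsolete in this version */
};
static const demo_dec_fn *const demo_minorversions[] = {
	demo_v0_ops, demo_v1_ops,
};

int demo_decode_op(unsigned int minorversion, unsigned int op)
{
	size_t nvers = sizeof(demo_minorversions) /
		       sizeof(demo_minorversions[0]);

	if (minorversion >= nvers)
		return DEMO_ERR_MINOR_VERS_MISMATCH;
	if (op >= DEMO_OP_MAX)
		return DEMO_ERR_ILLEGAL;
	return demo_minorversions[minorversion][op]();
}
```

Keeping the "obsolete" handling inside the table means the dispatch function needs no per-version special cases.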

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 3c375c6f 02-Jul-2008 Benny Halevy <bhalevy@panasas.com>

nfsd: unsupported nfs4 ops should fail with nfserr_opnotsupp

nfserr_opnotsupp should be returned for unsupported nfs4 ops
rather than nfserr_op_illegal.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 347e0ad9 02-Jul-2008 Benny Halevy <bhalevy@panasas.com>

nfsd: tabulate nfs4 xdr decoding functions

In preparation for minorversion 1

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 30cff1ff 02-Jul-2008 Benny Halevy <bhalevy@panasas.com>

nfsd: return nfserr_minor_vers_mismatch when compound minorversion != 0

Check minorversion once before decoding any operation and reject with
nfserr_minor_vers_mismatch if != 0 (this still happens in nfsd4_proc_compound).
In this case return a zero length resultdata array as required by RFC3530.

minorversion 1 processing will have its own vector of decoders.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 13b1867c 28-May-2008 Benny Halevy <bhalevy@panasas.com>

nfsd: make nfs4xdr WRITEMEM safe against zero count

WRITEMEM zeroes the last word in the destination buffer
for padding purposes, but this must not be done if
no bytes are to be copied, as it would result
in zeroing of the word right before the array.

The current implementation works since it's always called
with non zero nbytes or it follows an encoding of the
string (or opaque) length which, if equal to zero,
can be overwritten with zero.

Nevertheless, it seems safer to check for this case.
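
The zero-count guard can be illustrated with a plain function. This is a sketch of the idea only: demo_write_opaque() is a hypothetical helper, not the WRITEMEM macro itself.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy nbytes of opaque data to p, zero-padded to a 4-byte boundary. */
uint32_t *demo_write_opaque(uint32_t *p, const void *src, size_t nbytes)
{
	size_t nwords = (nbytes + 3) / 4; /* XDR rounds up to whole words */

	if (nbytes > 0) {
		p[nwords - 1] = 0;       /* zero the last word for padding */
		memcpy(p, src, nbytes);
	}
	/* With nbytes == 0, nwords is 0 and p[-1] is never written. */
	return p + nwords;
}
```

Without the nbytes check, the padding store would land on p[-1], the word right before the destination.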

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# e36cd4a2 24-Apr-2008 J. Bruce Fields <bfields@citi.umich.edu>

nfsd: don't allow setting ctime over v4

Presumably this is left over from earlier drafts of v4, which listed
TIME_METADATA as writeable. It's read-only in rfc 3530, and shouldn't
be modifiable anyway.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# c0ce6ec8 11-Feb-2008 J. Bruce Fields <bfields@citi.umich.edu>

nfsd: clarify readdir/mountpoint-crossing code

The code here is difficult to understand; attempt to clarify somewhat by
pulling out one of the more mystifying conditionals into a separate
function.

While we're here, also add lease_time to the list of attributes that we
don't really need to cross a mountpoint to fetch.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Peter Staubach <staubach@redhat.com>


# 54775491 14-Feb-2008 Jan Blunck <jblunck@suse.de>

Use struct path in struct svc_export

I'm embedding struct path into struct svc_export.

[akpm@linux-foundation.org: coding-style fixes]
[ezk@cs.sunysb.edu: NFSD: fix wrong mnt_writer count in rename]
Signed-off-by: Jan Blunck <jblunck@suse.de>
Acked-by: J. Bruce Fields <bfields@citi.umich.edu>
Acked-by: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 406a7ea9 27-Nov-2007 Frank Filz <ffilzlnx@us.ibm.com>

nfsd: Allow AIX client to read dir containing mountpoints

This patch addresses a compatibility issue with a Linux NFS server and
AIX NFS client.

I have exported /export as fsid=0 with sec=krb5:krb5i
I have mount --bind /home onto /export/home
I have exported /export/home with sec=krb5i

The AIX client mounts / -o sec=krb5:krb5i onto /mnt

If I do an ls /mnt, the AIX client gets a permission error. Looking at
the network trace, we see a READDIR looking for attributes
FATTR4_RDATTR_ERROR and FATTR4_MOUNTED_ON_FILEID. The response gives a
NFS4ERR_WRONGSEC which the AIX client is not expecting.

Since the AIX client is only asking for an attribute that is an
attribute of the parent file system (pseudo root in my example), it
seems reasonable that there should not be an error.

In discussing this issue with Bruce Fields, I initially proposed
ignoring the error in nfsd4_encode_dirent_fattr() if all that was being
asked for was FATTR4_RDATTR_ERROR and FATTR4_MOUNTED_ON_FILEID, however,
Bruce suggested that we avoid calling cross_mnt() if only these
attributes are requested.

The following patch implements bypassing cross_mnt() if only
FATTR4_RDATTR_ERROR and FATTR4_MOUNTED_ON_FILEID are called. Since there
is some complexity in the code in nfsd4_encode_fattr(), I didn't want to
duplicate code (and introduce a maintenance nightmare), so I added a
parameter to nfsd4_encode_fattr() that indicates whether it should
ignore cross mounts and simply fill in the attribute using the passed in
dentry as opposed to its parent.

Signed-off-by: Frank Filz <ffilzlnx@us.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# ca2a05aa 11-Nov-2007 J. Bruce Fields <bfields@citi.umich.edu>

nfsd: Fix handling of negative lengths in read_buf()

The length "nbytes" passed into read_buf should never be negative, but
we check only for too-large values of "nbytes", not for too-small
values. Make nbytes unsigned, so it's clear that the former tests are
sufficient. (Despite this read_buf() currently correctly returns an xdr
error in the case of a negative length, thanks to an unsigned
comparison with size_of() and bounds-checking in kmalloc(). This seems
very fragile, though.)
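
Why an unsigned length makes the upper-bound test sufficient can be shown with a small sketch. Both functions are hypothetical and only model the comparison, not read_buf() itself.

```c
#include <stdbool.h>

/* Signed length: only "too large" is tested, so -1 slips through. */
bool demo_len_ok_signed(int nbytes, int avail)
{
	return nbytes <= avail;
}

/* Unsigned length: a negative input wraps to a huge value and fails. */
bool demo_len_ok_unsigned(unsigned int nbytes, unsigned int avail)
{
	return nbytes <= avail;
}
```

Passing -1 to the signed version is accepted, while the same bit pattern converted to unsigned int is rejected by the identical comparison.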

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# a16e92ed 28-Sep-2007 J. Bruce Fields <bfields@citi.umich.edu>

knfsd: query filesystem for NFSv4 getattr of FATTR4_MAXNAME

Without this we always return 2^32-1 as the maximum namelength.

Thanks to Andreas Gruenbacher for bug report and testing.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Andreas Gruenbacher <agruen@suse.de>


# 40ee5dc6 15-Aug-2007 Peter Staubach <staubach@redhat.com>

knfsd: 64 bit ino support for NFS server

Modify the NFS server code to support 64 bit ino's, as
appropriate for the system and the NFS protocol version.

The gist of the changes is to query the underlying file system
for attributes and not just to use the cached attributes in the
inode. For this specific purpose, the inode only contains an
ino field which is an unsigned long, which is large enough on 64 bit
platforms, but is not large enough on 32 bit platforms.

I haven't been able to find any reason why ->getattr can't be called
while i_mutex is held. The specification indicates that i_mutex is not
required to be held in order to invoke ->getattr, but it doesn't say
that i_mutex can't be held while invoking ->getattr.

I also haven't come to any conclusions regarding the value of
lease_get_mtime() and whether it should or should not be invoked
by fill_post_wcc() too. I chose not to change this because I
thought that it was safer to leave well enough alone. If we
decide to make a change, it can be done separately.

Signed-off-by: Peter Staubach <staubach@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Acked-by: Neil Brown <neilb@suse.de>


# 817cb9d4 11-Sep-2007 Chuck Lever <chuck.lever@oracle.com>

NFSD: Convert printk's to dprintk's in NFSD's nfs4xdr

Due to recent edict to remove or replace printk's that can flood the system
log.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# ca5c8cde 26-Jul-2007 Al Viro <viro@ftp.linux.org.uk>

lockd and nfsd endianness annotation fixes

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 4796f457 17-Jul-2007 J. Bruce Fields <bfields@citi.umich.edu>

knfsd: nfsd4: secinfo handling without secinfo= option

We could return some sort of error in the case where someone asks for secinfo
on an export without the secinfo= option set--that'd be no worse than what
we've been doing. But it's not really correct. So, hack up an approximate
secinfo response in that case--it may not be complete, but it'll tell the
client at least one acceptable security flavor.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# dcb488a3 17-Jul-2007 Andy Adamson <andros@citi.umich.edu>

knfsd: nfsd4: implement secinfo

Implement the secinfo operation.

(Thanks to Usha Ketineni, who wrote an earlier version of this support.)

Cc: Usha Ketineni <uketinen@us.ibm.com>
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# df547efb 17-Jul-2007 J. Bruce Fields <bfields@citi.umich.edu>

knfsd: nfsd4: simplify exp_pseudoroot arguments

We're passing three arguments to exp_pseudoroot, two of which are just fields
of the svc_rqst. Soon we'll want to pass in a third field as well. So let's
just give up and pass in the whole struct svc_rqst.

Also sneak in some minor style cleanups while we're at it.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# e63340ae 08-May-2007 Randy Dunlap <randy.dunlap@oracle.com>

header cleaning: don't include smp_lock.h when not used

Remove includes of <linux/smp_lock.h> where it is not used/needed.
Suggested by Al Viro.

Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc,
sparc64, and arm (all 59 defconfigs).

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# f34f9242 16-Feb-2007 J. Bruce Fields <bfields@citi.umich.edu>

[PATCH] knfsd: nfsd4: fix error return on unsupported acl

We should be returning ATTRNOTSUPP, not NOTSUPP, when acls are unsupported.

Also fix a comment.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# a4db5fe5 16-Feb-2007 J. Bruce Fields <bfields@snoopy.citi.umich.edu>

[PATCH] knfsd: nfsd4: fix memory leak on kmalloc failure in savemem

The wrong pointer is being kfree'd in savemem() when defer_free returns with
an error.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 28e05dd8 16-Feb-2007 J. Bruce Fields <bfields@citi.umich.edu>

[PATCH] knfsd: nfsd4: represent nfsv4 acl with array instead of linked list

Simplify the memory management and code a bit by representing acls with an
array instead of a linked list.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# af6a4e28 14-Feb-2007 NeilBrown <neilb@suse.de>

[PATCH] knfsd: add some new fsid types

Add support for using a filesystem UUID to identify an export point in the
filehandle.

For NFSv2, this UUID is xor-ed down to 4 or 8 bytes so that it doesn't take up
too much room. For NFSv3+, we use the full 16 bytes, and possibly also a
64bit inode number for exports beneath the root of a filesystem.

When generating an fsid to return in 'stat' information, use the UUID (hashed
down to size) if it is available and a small 'fsid' was not specifically
provided.
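
Folding a 16-byte UUID down to 4 or 8 bytes with XOR could look like the sketch below. The byte-wise folding order is an assumption chosen for illustration; demo_fold_uuid() is a hypothetical helper and may not match the kernel's exact scheme.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* XOR the 16 UUID bytes into an outlen-byte buffer (outlen is 4 or 8). */
void demo_fold_uuid(const uint8_t uuid[16], uint8_t *out, size_t outlen)
{
	size_t i;

	if (outlen == 0)
		return;
	memset(out, 0, outlen);
	for (i = 0; i < 16; i++)
		out[i % outlen] ^= uuid[i];
}
```

Every input byte contributes to the result, but distinct UUIDs can fold to the same short value, which is the price of the smaller filehandle.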

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# a0ad13ef 26-Jan-2007 NeilBrown <neilb@suse.de>

[PATCH] knfsd: Fix type mismatch with filldir_t used by nfsd

nfsd defines a type 'encode_dent_fn' which is much like 'filldir_t' except
that the first pointer is 'struct readdir_cd *' rather than 'void *'. It
then casts encode_dent_fn pointers to 'filldir_t' as needed. This hides any
other type mismatches between the two such as the fact that the 'ino' arg
recently changed from ino_t to u64.

So: get rid of 'encode_dent_fn', get rid of the cast of the function type,
change the first arg of various functions from 'struct readdir_cd *' to
'void *', and live with the fact that we have a little less type checking
on the calling of these functions now. Less internal (to nfsd) checking
offset by more external checking, which is more important.

Thanks to Gabriel Paubert <paubert@iram.es> for discovering this and
providing an initial patch.

Signed-off-by: Gabriel Paubert <paubert@iram.es>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 021d3a72 13-Dec-2006 J.Bruce Fields <bfields@fieldses.org>

[PATCH] knfsd: nfsd4: handling more nfsd_cross_mnt errors in nfsd4 readdir

This patch on its own causes no change in behavior, since nfsd_cross_mnt()
only returns -EAGAIN; but in the future I'd like it to also be able to return
-ETIMEDOUT, so we may as well handle any possible error here.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# b8dd7b9a 20-Oct-2006 Al Viro <viro@ftp.linux.org.uk>

[PATCH] nfsd: NFSv4 errno endianness annotations

don't use the same variable to store NFS and host error values

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# b37ad28b 20-Oct-2006 Al Viro <viro@ftp.linux.org.uk>

[PATCH] nfsd: nfs4 code returns error values in net-endian

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 2ebbc012 20-Oct-2006 Al Viro <viro@ftp.linux.org.uk>

[PATCH] xdr annotations: NFSv4 server

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# cc45f017 20-Oct-2006 Al Viro <viro@ftp.linux.org.uk>

[PATCH] bug: nfsd/nfs4xdr.c misuse of ERR_PTR()

a) ERR_PTR(nfserr_something) is a bad idea;
IS_ERR() will be false for it.
b) mixing nfserr_.... with -EOPNOTSUPP is
an even worse idea.

nfsd4_path() does both; caller expects to get NFS protocol error out of it if
anything goes wrong, but if it does we either do not notice (see (a)) or get
host-endian negative (see (b)).

IOW, that's a case when we can't use ERR_PTR() to return error, even though we
return a pointer in case of success.
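
Point (a) can be reproduced in userspace with stand-ins for the two helpers. demo_err_ptr() and demo_is_err() are hypothetical re-implementations modelled on the kernel's ERR_PTR() and IS_ERR(), and the value 10004 is just an arbitrary positive number standing in for an nfserr_* code.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095

/* Encode an error number in a pointer value. */
static inline void *demo_err_ptr(intptr_t error)
{
	return (void *)error;
}

/* Only the top DEMO_MAX_ERRNO addresses are treated as errors. */
static inline bool demo_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}
```

A small negative errno such as -95 lands in the error range, while any positive status value looks like an ordinary pointer and goes unnoticed by the caller.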

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 42ca0993 04-Oct-2006 J.Bruce Fields <bfields@fieldses.org>

[PATCH] knfsd: nfsd4: actually use all the pieces to implement referrals

Use all the pieces set up so far to implement referral support, allowing
return of NFS4ERR_MOVED and fs_locations attribute.

Signed-off-by: Manoj Naik <manoj@almaden.ibm.com>
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 81c3f413 04-Oct-2006 J.Bruce Fields <bfields@fieldses.org>

[PATCH] knfsd: nfsd4: xdr encoding for fs_locations

Encode fs_locations attribute.

Signed-off-by: Manoj Naik <manoj@almaden.ibm.com>
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 7adae489 04-Oct-2006 Greg Banks <gnb@melbourne.sgi.com>

[PATCH] knfsd: Prepare knfsd for support of rsize/wsize of up to 1MB, over TCP

The limit over UDP remains at 32K. Also, make some of the apparently
arbitrary sizing constants clearer.

The biggest change here involves replacing NFSSVC_MAXBLKSIZE by a function of
the rqstp. This allows it to be different for different protocols (udp/tcp)
and also allows it to depend on the server's declared sv_bufsz.

Note that we don't actually increase sv_bufsz for nfs yet. That comes next.

Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 3cc03b16 04-Oct-2006 NeilBrown <neilb@suse.de>

[PATCH] knfsd: Avoid excess stack usage in svc_tcp_recvfrom

.. by allocating the array of 'kvec' in 'struct svc_rqst'.

As we plan to increase RPCSVC_MAXPAGES from 8 up to 256, we can no longer
allocate an array of this size on the stack. So we allocate it in 'struct
svc_rqst'.

However svc_rqst contains (indirectly) an array of the same type and size
(actually several, but they are in a union). So rather than waste space, we
move those arrays out of the separately allocated union and into svc_rqst to
share with the kvec moved out of svc_tcp_recvfrom (various arrays are used at
different times, so there is no conflict).

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 44524359 04-Oct-2006 NeilBrown <neilb@suse.de>

[PATCH] knfsd: Replace two page lists in struct svc_rqst with one

We are planning to increase RPCSVC_MAXPAGES from about 8 to about 256. This
means we need to be a bit careful about arrays of size RPCSVC_MAXPAGES.

struct svc_rqst contains two such arrays. However, there are never more
than RPCSVC_MAXPAGES pages in the two arrays together, so only one array is
needed.

The two arrays are for the pages holding the request, and the pages holding
the reply. Instead of two arrays, we can simply keep an index into where the
first reply page is.
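
The single-array layout can be sketched as follows. struct demo_rqst, its field names, and DEMO_MAXPAGES are hypothetical and only illustrate the idea of one array plus an index for the first reply page.

```c
#include <stddef.h>

#define DEMO_MAXPAGES 8

struct demo_rqst {
	void *pages[DEMO_MAXPAGES];
	size_t first_reply; /* pages[0..first_reply) hold the request */
	size_t used;        /* pages[first_reply..used) hold the reply */
};

/* Return the slot for the next reply page, or NULL if the array is full. */
void **demo_next_reply_page(struct demo_rqst *rq)
{
	if (rq->used >= DEMO_MAXPAGES)
		return NULL;
	return &rq->pages[rq->used++];
}

/* Number of pages currently used by the reply. */
size_t demo_reply_pages(const struct demo_rqst *rq)
{
	return rq->used - rq->first_reply;
}
```

Because request and reply pages share one array, the combined bound is enforced by a single check against DEMO_MAXPAGES.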

This patch also removes a number of small inline functions that probably
serve to obscure what is going on rather than clarify it, and open-codes the
needed functionality.

Also remove the 'rq_restailpage' variable as it is *always* 0. i.e. if the
response 'xdr' structure has a non-empty tail it is always in the same pages
as the head.

check counters are initialised and incr properly
check for consistent usage of ++ etc
maybe extra some inlines for common approach
general review

Signed-off-by: Neil Brown <neilb@suse.de>
Cc: Magnus Maatta <novell@kiruna.se>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 73dff8be 03-Oct-2006 Eric Sesterhenn <snakebyte@gmx.de>

BUG_ON() conversion in fs/nfsd/

This patch converts an if () BUG(); construct to BUG_ON();
which occupies less space, uses unlikely and is safer when
BUG() is disabled.

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>


# 726c3342 23-Jun-2006 David Howells <dhowells@redhat.com>

[PATCH] VFS: Permit filesystem to perform statfs with a known root dentry

Give the statfs superblock operation a dentry pointer rather than a superblock
pointer.

This complements the get_sb() patch. That reduced the significance of
sb->s_root, allowing NFS to place a fake root there. However, NFS does
require a dentry to use as a target for the statfs operation. This permits
the root in the vfsmount to be used instead.

linux/mount.h has been added where necessary to make allyesconfig build
successfully.

Interest has also been expressed for use with the FUSE and XFS filesystems.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Nathan Scott <nathans@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# bb6e8a9f 10-Apr-2006 NeilBrown <neilb@suse.de>

[PATCH] knfsd: nfsd4: fix corruption on readdir encoding with 64k pages

Fix corruption on readdir encoding with 64k pages.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 6ed6decc 10-Apr-2006 NeilBrown <neilb@suse.de>

[PATCH] knfsd: nfsd4: fix corruption of returned data when using 64k pages

In v4 we grab an extra page just for the padding of returned data. The
formula that the rpc server uses to allocate pages for the response doesn't
take into account this extra page.

Instead of adjusting those formulae, we adopt the same solution as v2 and v3,
and put the "tail" data in the same page as the "head" data.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# b905b7b0 10-Apr-2006 NeilBrown <neilb@suse.de>

[PATCH] knfsd: nfsd4: better nfs4acl errors

We're returning -1 in a few places in the NFSv4<->POSIX acl translation code
where we could return a reasonable error.

Also allows some minor simplification elsewhere.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# e8c96f8c 24-Mar-2006 Tobias Klauser <tklauser@nuerscht.ch>

[PATCH] fs: Use ARRAY_SIZE macro

Use the ARRAY_SIZE macro instead of sizeof(x)/sizeof(x[0]) and remove a
duplicate of ARRAY_SIZE. Some trailing whitespace is also deleted.
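
For illustration only, a minimal userspace sketch of the idiom this cleanup
standardizes on. The macro below is a simplified stand-in (the kernel's own
ARRAY_SIZE also performs a type check), and demo_ops and demo_sum() are
hypothetical names that do not appear in the patch.

```c
#include <stddef.h>

/* Simplified stand-in for the kernel macro: element count of a true array. */
#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

/* Hypothetical table; the values are arbitrary. */
static const int demo_ops[] = { 3, 9, 15, 18 };

/* Sum every entry without hard-coding the number of elements. */
static int demo_sum(void)
{
	int total = 0;
	size_t i;

	for (i = 0; i < ARRAY_SIZE(demo_ops); i++)
		total += demo_ops[i];
	return total;
}
```

Because the bound is derived from the array itself, adding an entry to the
table needs no matching change to the loop.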

Signed-off-by: Tobias Klauser <tklauser@nuerscht.ch>
Cc: David Howells <dhowells@redhat.com>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Cc: Chris Mason <mason@suse.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Nathan Scott <nathans@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 34081efc 18-Jan-2006 Fred Isaman <iisaman@citi.umich.edu>

[PATCH] nfsd4: Fix bug in rdattr_error return

Fix a bug in the rdattr_error return which causes the correct error code to
be overwritten by nfserr_toosmall.

Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 3a65588a 18-Jan-2006 J. Bruce Fields <bfields@citi.umich.edu>

[PATCH] nfsd4: rename lk_stateowner

One of the things that's confusing about nfsd4_lock is that the lk_stateowner
field could be set to either of two different stateowners: the open owner or
the lock owner. Rename to lk_replay_owner and add a comment to make it clear
that it's used for whichever stateowner has its sequence id bumped for replay
detection.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# f99d49ad 07-Nov-2005 Jesper Juhl <jesper.juhl@gmail.com>

[PATCH] kfree cleanup: fs

This is the fs/ part of the big kfree cleanup patch.

Remove pointless checks for NULL prior to calling kfree() in fs/.
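
For illustration only, a userspace sketch of the same pattern. It uses the C
library's free(), which, like kfree(), accepts a NULL pointer and does
nothing. struct demo_buf and both helpers are hypothetical and are not taken
from the patch.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical owner of one heap-allocated string. */
struct demo_buf {
	char *data;
	size_t len;
};

/* Replace the stored string; returns 0 on success, -1 on allocation failure. */
static int demo_buf_set(struct demo_buf *b, const char *s)
{
	size_t len = strlen(s);
	char *copy = malloc(len + 1);

	if (!copy)
		return -1;
	memcpy(copy, s, len + 1);

	/* No "if (b->data)" guard: free(NULL) is defined to be a no-op. */
	free(b->data);
	b->data = copy;
	b->len = len;
	return 0;
}

/* Release the string; calling this twice in a row is harmless. */
static void demo_buf_release(struct demo_buf *b)
{
	free(b->data);
	b->data = NULL;
	b->len = 0;
}
```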

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# e34ac862 07-Jul-2005 NeilBrown <neilb@cse.unsw.edu.au>

[PATCH] nfsd4: fix fh_expire_type

After discussion at the recent NFSv4 bake-a-thon, I realized that my
assumption that NFS4_FH_PERSISTENT required filehandles to persist was a
misreading of the spec. This also fixes an interoperability problem with the
Solaris client.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 7fb64cee 07-Jul-2005 NeilBrown <neilb@cse.unsw.edu.au>

[PATCH] nfsd4: seqid comments

Add some comments on the use of so_seqid, in an attempt to avoid some of the
confusion outlined in the previous patch....

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# bd9aac52 07-Jul-2005 NeilBrown <neilb@cse.unsw.edu.au>

[PATCH] nfsd4: fix open_reclaim seqid

The sequence number we store in the sequence id is the last one we received
from the client. So on the next operation we'll check that the client gives
us the next higher number.

We increment sequence ids at the last moment, in encode, so that we're sure
of knowing the right error return. (The decision to increment the sequence id
depends on the exact error returned.)

However on the *first* use of a sequence number, if we set the sequence number
to the one received from the client and then let the increment happen on
encode, we'll be left with a sequence number one too high.

For that reason, ENCODE_SEQID_OP_TAIL only increments the sequence id on
*confirmed* stateowners.

This creates a problem for open reclaims, which are confirmed on first use.
Therefore the open reclaim code, as a special exception, *decrements* the
sequence id, cancelling out the undesired increment on encode. But this
prevents the sequence id from ever being incremented in the case where
multiple reclaims are sent with the same openowner. Yuch!

We could add another exception to the open reclaim code, decrementing the
sequence id only if this is the first use of the open owner.

But it's simpler by far to modify the meaning of the op_seqid field: instead
of representing the previous value sent by the client, we take op_seqid, after
encoding, to represent the *next* sequence id that we expect from the client.
This eliminates the need for special-case handling of the first use of a
stateowner.
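
For illustration only, a standalone sketch of the "next expected seqid"
convention described above. It is not the nfsd code: struct demo_owner, the
demo_* helpers and the bump flag are all hypothetical, and the decision about
which errors advance the sequence id is reduced to a boolean supplied by the
caller.

```c
#include <stdbool.h>
#include <stdint.h>

enum demo_seqid_result { DEMO_SEQID_OK, DEMO_SEQID_REPLAY, DEMO_SEQID_BAD };

/* Hypothetical stateowner: stores the NEXT seqid expected from the client. */
struct demo_owner {
	uint32_t next_seqid;
};

/* First use: start from the seqid the client sent; no adjustment needed. */
static void demo_owner_init(struct demo_owner *o, uint32_t first_seqid)
{
	o->next_seqid = first_seqid;
}

/* Classify an incoming seqid against the stored "next expected" value. */
static enum demo_seqid_result
demo_check_seqid(const struct demo_owner *o, uint32_t received)
{
	if (received == o->next_seqid)
		return DEMO_SEQID_OK;
	/* One behind the expected value: a retransmission of the last request. */
	if (received == (uint32_t)(o->next_seqid - 1))
		return DEMO_SEQID_REPLAY;
	return DEMO_SEQID_BAD;
}

/* At encode time, advance only if the reply's status bumps the seqid. */
static void demo_encode_tail(struct demo_owner *o, bool bump)
{
	if (bump)
		o->next_seqid++;
}
```

In this model the first operation and every later one take the same path
through demo_encode_tail(), so a reclaim that is confirmed on first use needs
neither a compensating decrement nor a "confirmed" test.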

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# fd39ca9a 23-Jun-2005 NeilBrown <neilb@cse.unsw.edu.au>

[PATCH] knfsd: nfsd4: make needlessly global code static

This patch contains the following possible cleanups:

- make needlessly global code static

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 7b190fec 23-Jun-2005 NeilBrown <neilb@cse.unsw.edu.au>

[PATCH] knfsd: nfsd4: delegation recovery

Allow recovery of delegations after reboot.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 49640001 23-Jun-2005 NeilBrown <neilb@cse.unsw.edu.au>

[PATCH] nfsd4: fix fh_expire_type

We're returning NFS4_FH_NOEXPIRE_WITH_OPEN | NFS4_FH_VOL_RENAME for the
fh_expire_type attribute. This is incorrect:
	1. The spec actually only allows NOEXPIRE_WITH_OPEN when
	   VOLATILE_ANY is also set.
	2. Filehandles for open files can expire, if the file is removed
	   and there is a reboot.
	3. Filehandles are only volatile on rename in the subtree check
	   case.

Unfortunately, there's no way to indicate that we only expire on remove. So
our only choice is FH4_VOLATILE_ANY. Although it's redundant, we also set
FH4_VOL_RENAME in the subtree check case, since subtreecheck does actually
cause problems in practice and it seems possibly useful to give clients some
way to distinguish that case.
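
For illustration only, a sketch of the selection described in the paragraph
above. The FH4_* values are the ones defined by the NFSv4 specification
(RFC 3530); demo_fh_expire_type() and its subtree_check parameter are
hypothetical and do not reproduce the actual encoder.

```c
#include <stdbool.h>
#include <stdint.h>

/* fh_expire_type bits as defined by the NFSv4 specification (RFC 3530). */
#define FH4_PERSISTENT		0x00000000
#define FH4_NOEXPIRE_WITH_OPEN	0x00000001
#define FH4_VOLATILE_ANY	0x00000002
#define FH4_VOL_MIGRATION	0x00000004
#define FH4_VOL_RENAME		0x00000008

/*
 * Hypothetical helper: always report VOLATILE_ANY, and add VOL_RENAME as a
 * (redundant) hint when the export uses subtree checking.
 */
static uint32_t demo_fh_expire_type(bool subtree_check)
{
	uint32_t type = FH4_VOLATILE_ANY;

	if (subtree_check)
		type |= FH4_VOL_RENAME;
	return type;
}
```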

Fix a misspelled #define while we're at it.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 1da177e4 16-Apr-2005 Linus Torvalds <torvalds@ppc970.osdl.org>

Linux-2.6.12-rc2

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!