History log of /linux-master/fs/nfsd/nfs3xdr.c
Revision Date Author Comments
# 24d92de9 15-Feb-2024 Trond Myklebust <trond.myklebust@hammerspace.com>

nfsd: Fix NFSv3 atomicity bugs in nfsd_setattr()

The main point of the guarded SETATTR is to prevent races with other
WRITE and SETATTR calls. That requires that the check of the guard time
against the inode ctime be done after taking the inode lock.

Furthermore, we need to take into account the 32-bit nature of
timestamps in NFSv3, and the possibility that files may change at a
faster rate than once a second.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 82078b98 18-May-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Ensure that xdr_write_pages updates rq_next_page

All other NFSv[23] procedures manage to keep page_ptr and
rq_next_page in lock step.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# d4da5baa 12-Sep-2022 Chuck Lever <chuck.lever@oracle.com>

NFSD: Clean up WRITE arg decoders

xdr_stream_subsegment() already returns a boolean value.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# c3d2a04f 12-Sep-2022 Chuck Lever <chuck.lever@oracle.com>

NFSD: Use xdr_inline_decode() to decode NFSv3 symlinks

Replace the check for buffer over/underflow with a helper that is
commonly used for this purpose. The helper also sets xdr->nwords
correctly after successfully linearizing the symlink argument into
the stream's scratch buffer.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# c306d737 25-Jan-2022 Chuck Lever <chuck.lever@oracle.com>

NFSD: Deprecate NFS_OFFSET_MAX

NFS_OFFSET_MAX was introduced way back in Linux v2.3.y before there
was a kernel-wide OFFSET_MAX value. As a clean up, replace the last
few uses of it with its generic equivalent, and get rid of it.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# a648fdeb 25-Jan-2022 Chuck Lever <chuck.lever@oracle.com>

NFSD: Fix NFSv3 SETATTR/CREATE's handling of large file sizes

iattr::ia_size is a loff_t, so these NFSv3 procedures must be
careful to deal with incoming client size values that are larger
than s64_max without corrupting the value.

Silently capping the value results in storing a different value
than the client passed in which is unexpected behavior, so remove
the min_t() check in decode_sattr3().

Note that RFC 1813 permits only the WRITE procedure to return
NFS3ERR_FBIG. We believe that NFSv3 reference implementations
also return NFS3ERR_FBIG when ia_size is too large.

Cc: stable@vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# fcb5e3fa 24-Dec-2021 Chuck Lever <chuck.lever@oracle.com>

NFSD: Move fill_pre_wcc() and fill_post_wcc()

These functions are related to file handle processing and have
nothing to do with XDR encoding or decoding. Also they are no longer
NFSv3-specific. As a clean-up, move their definitions to a more
appropriate location. WCC is also an NFSv3-specific term, so rename
them as general-purpose helpers.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 58f258f6 24-Dec-2021 Chuck Lever <chuck.lever@oracle.com>

Revert "nfsd: skip some unnecessary stats in the v4 case"

On the wire, I observed NFSv4 OPEN(CREATE) operations sometimes
returning a reasonable-looking value in the cinfo.before field and
zero in the cinfo.after field.

RFC 8881 Section 10.8.1 says:
> When a client is making changes to a given directory, it needs to
> determine whether there have been changes made to the directory by
> other clients. It does this by using the change attribute as
> reported before and after the directory operation in the associated
> change_info4 value returned for the operation.

and

> ... The post-operation change
> value needs to be saved as the basis for future change_info4
> comparisons.

A good quality client implementation therefore saves the zero
cinfo.after value. During a subsequent OPEN operation, it will
receive a different non-zero value in the cinfo.before field for
that directory, and it will incorrectly believe the directory has
changed, triggering an undesirable directory cache invalidation.

There are filesystem types where fs_supports_change_attribute()
returns false, tmpfs being one. On NFSv4 mounts, this means the
fh_getattr() call site in fill_pre_wcc() and fill_post_wcc() is
never invoked. Subsequently, nfsd4_change_attribute() is invoked
with an uninitialized @stat argument.

In fill_pre_wcc(), @stat contains stale stack garbage, which is
then placed on the wire. In fill_post_wcc(), ->fh_post_wc is all
zeroes, so zero is placed on the wire. Both of these values are
meaningless.

This fix can be applied immediately to stable kernels. Once there
are more regression tests in this area, this optimization can be
attempted again.

Fixes: 428a23d2bf0c ("nfsd: skip some unnecessary stats in the v4 case")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 130e2054 13-Oct-2021 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Change return value type of .pc_encode

Returning an undecorated integer is an age-old trope, but it's
not clear (even to previous experts in this code) that the only
valid return values are 1 and 0. These functions do not return
a negative errno, rpc_stat value, or a positive length.

Document there are only two valid return values by having
.pc_encode return only true or false.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# fda49441 13-Oct-2021 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Replace the "__be32 *p" parameter to .pc_encode

The passed-in value of the "__be32 *p" parameter is now unused in
every server-side XDR encoder, and can be removed.

Note also that there is a line in each encoder that sets up a local
pointer to a struct xdr_stream. Passing that pointer from the
dispatcher instead saves one line per encoder function.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# c44b31c2 12-Oct-2021 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Change return value type of .pc_decode

Returning an undecorated integer is an age-old trope, but it's
not clear (even to previous experts in this code) that the only
valid return values are 1 and 0. These functions do not return
a negative errno, rpc_stat value, or a positive length.

Document there are only two valid return values by having
.pc_decode return only true or false.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 16c66364 12-Oct-2021 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Replace the "__be32 *p" parameter to .pc_decode

The passed-in value of the "__be32 *p" parameter is now unused in
every server-side XDR decoder, and can be removed.

Note also that there is a line in each decoder that sets up a local
pointer to a struct xdr_stream. Passing that pointer from the
dispatcher instead saves one line per decoder function.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# dae9a6ca 30-Sep-2021 Chuck Lever <chuck.lever@oracle.com>

NFSD: Have legacy NFSD WRITE decoders use xdr_stream_subsegment()

Refactor.

Now that the NFSv2 and NFSv3 XDR decoders have been converted to
use xdr_streams, the WRITE decoder functions can use
xdr_stream_subsegment() to extract the WRITE payload into its own
xdr_buf, just as the NFSv4 WRITE XDR decoder currently does.

That makes it possible to pass the first kvec, pages array + length,
page_base, and total payload length via a single function parameter.

The payload's page_base is not yet assigned or used, but will be in
subsequent patches.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# d8b26071 01-Sep-2021 NeilBrown <neilb@suse.de>

NFSD: simplify struct nfsfh

Most of the fields in 'struct knfsd_fh' are 2 levels deep (a union and a
struct) and are accessed using macros like:

#define fh_FOO fh_base.fh_new.fb_FOO

This patch makes the union and struct anonymous, so that "fh_FOO" can be
a name directly within 'struct knfsd_fh' and the #defines aren't needed.

The file handle as a whole is sometimes accessed as "fh_base" or
"fh_base.fh_pad", neither of which are particularly helpful names.
As the struct holding the filehandle is now anonymous, we
cannot use the name of that, so we union it with 'fh_raw' and use that
where the raw filehandle is needed. fh_raw also ensure the structure is
large enough for the largest possible filehandle.

fh_raw is a 'char' array, removing any need to cast it for memcpy etc.

SVCFH_fmt() is simplified using the "%ph" printk format. This
changes the appearance of filehandles in dprintk() debugging, making
them a little more precise.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 219a1705 05-Mar-2021 Chuck Lever <chuck.lever@oracle.com>

NFSD: Clean up NFSDDBG_FACILITY macro

These are no longer needed because there are no dprintk() call sites
in these files.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 1416f435 15-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Clean up after updating NFSv3 ACL encoders

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 20798dfe 18-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Update the NFSv3 GETACL result encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 14119346 13-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Remove unused NFSv3 directory entry encoders

Clean up.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 7f87fc2d 22-Oct-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Update NFSv3 READDIR entry encoders to use struct xdr_stream

The benefit of the xdr_stream helpers is that they transparently
handle encoding an XDR data item that crosses page boundaries.
Most of the open-coded logic to do that here can be eliminated.

A sub-buffer and sub-stream are set up as a sink buffer for the
directory entry encoder. As an entry is encoded, it is added to
the end of the content in this buffer/stream. The total length of
the directory list is tracked in the buffer's @len field.

When it comes time to encode the Reply, the sub-buffer is merged
into rq_res's page array at the correct place using
xdr_write_pages().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# e4ccfe30 22-Oct-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Update the NFSv3 READDIR3res encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# a1409e2d 09-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Count bytes instead of pages in the NFSv3 READDIR encoder

Clean up: Counting the bytes used by each returned directory entry
seems less brittle to me than trying to measure consumed pages after
the fact.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# a161e6c7 10-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add a helper that encodes NFSv3 directory offset cookies

Refactor: De-duplicate identical code that handles encoding of
directory offset cookies across page boundaries.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 5ef2826c 22-Oct-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Update the NFSv3 COMMIT3res encoder to use struct xdr_stream

As an additional clean up, encode_wcc_data() is removed because it
is now no longer used.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# ded04a58 06-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Update the NFSv3 PATHCONF3res encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 0a139d1b 22-Oct-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Update the NFSv3 FSINFO3res encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 8b704498 06-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Update the NFSv3 FSSTAT3res encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 4d74380a 22-Oct-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Update the NFSv3 LINK3res encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 89d79e96 22-Oct-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Update the NFSv3 RENAMEv3res encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 78315b36 22-Oct-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Update the NFSv3 CREATE family of encoders to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# ecb7a085 22-Oct-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Update the NFSv3 WRITE3res encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# cc9bcdad 22-Oct-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Update the NFSv3 READ3res encode to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 9a9c8923 22-Oct-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Update the NFSv3 READLINK3res encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 70f8e839 22-Oct-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Update the NFSv3 wccstat result encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 5cf35335 22-Oct-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Update the NFSv3 LOOKUP3res encoder to use struct xdr_stream

Also, clean up: Rename the encoder function to match the name of
the result structure in RFC 1813, consistent with other encoder
function names in nfs3xdr.c. "diropres" is an NFSv2 thingie.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 907c3822 22-Oct-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Update the NFSv3 ACCESS3res encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 2c42f804 21-Oct-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Update the GETATTR3res encoder to use struct xdr_stream

As an additional clean up, some renaming is done to more closely
reflect the data type and variable names used in the NFSv3 XDR
definition provided in RFC 1813. "attrstat" is an NFSv2 thingie.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 428a23d2 29-Jan-2021 J. Bruce Fields <bfields@redhat.com>

nfsd: skip some unnecessary stats in the v4 case

In the typical case of v4 and an i_version-supporting filesystem, we can
skip a stat which is only required to fake up a change attribute from
ctime.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 9cee763e 20-Oct-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Clean up after updating NFSv3 ACL decoders

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 05027eaf 17-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Update the NFSv3 GETACL argument decoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# f8a38e2d 20-Oct-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Update the MKNOD3args decoder to use struct xdr_stream

This commit removes the last usage of the original decode_sattr3(),
so it is removed as a clean-up.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# da392016 20-Oct-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Update the SYMLINK3args decoder to use struct xdr_stream

Similar to the WRITE decoder, code that checks the sanity of the
payload size is re-wired to work with xdr_stream infrastructure.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 83374c27 20-Oct-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Update the MKDIR3args decoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 6b3a1196 20-Oct-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Update the CREATE3args decoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 9cde9360 20-Oct-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Update the SETATTR3args decoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# efaa1e7c 19-Oct-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Update the LINK3args decoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# d181e0a4 20-Oct-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Update the RENAME3args decoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 54d1d43d 20-Oct-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Update the NFSv3 DIROPargs decoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# c8d26a0a 20-Oct-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Update COMMIT3arg decoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 9cedc2e6 19-Oct-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Update READDIR3args decoders to use struct xdr_stream

As an additional clean up, neither nfsd3_proc_readdir() nor
nfsd3_proc_readdirplus() make use of the dircount argument, so
remove it from struct nfsd3_readdirargs.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 40116ebd 17-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add helper to set up the pages where the dirlist is encoded

De-duplicate some code that is used by both READDIR and READDIRPLUS
to build the dirlist in the Reply. Because this code is not related
to decoding READ arguments, it is moved to a more appropriate spot.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 224c1c89 23-Oct-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Update READLINK3arg decoder to use struct xdr_stream

The NFSv3 READLINK request takes a single filehandle, so it can
re-use GETATTR's decoder.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# c43b2f22 22-Oct-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Update WRITE3arg decoder to use struct xdr_stream

As part of the update, open code that sanity-checks the size of the
data payload against the length of the RPC Call message has to be
re-implemented to use xdr_stream infrastructure.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# be63bd2a 20-Oct-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Update READ3arg decoder to use struct xdr_stream

The code that sets up rq_vec is refactored so that it is now
adjacent to the nfsd_read() call site where it is used.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 3b921a2b 20-Oct-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Update ACCESS3arg decoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 9575363a 20-Oct-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Update GETATTR3args decoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 51b2ee7d 11-Jan-2021 J. Bruce Fields <bfields@redhat.com>

nfsd4: readdirplus shouldn't return parent of export

If you export a subdirectory of a filesystem, a READDIRPLUS on the root
of that export will return the filehandle of the parent with the ".."
entry.

The filehandle is optional, so let's just not return the filehandle for
".." if we're at the root of an export.

Note that once the client learns one filehandle outside of the export,
they can trivially access the rest of the export using further lookups.

However, it is also not very difficult to guess filehandles outside of
the export. So exporting a subdirectory of a filesystem should
considered equivalent to providing access to the entire filesystem. To
avoid confusion, we recommend only exporting entire filesystems.

Reported-by: Youjipeng <wangzhibei1999@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# daab110e 30-Nov-2020 Jeff Layton <jeff.layton@primarydata.com>

nfsd: add a new EXPORT_OP_NOWCC flag to struct export_operations

With NFSv3 nfsd will always attempt to send along WCC data to the
client. This generally involves saving off the in-core inode information
prior to doing the operation on the given filehandle, and then issuing a
vfs_getattr to it after the op.

Some filesystems (particularly clustered or networked ones) have an
expensive ->getattr inode operation. Atomicity is also often difficult
or impossible to guarantee on such filesystems. For those, we're best
off not trying to provide WCC information to the client at all, and to
simply allow it to poll for that information as needed with a GETATTR
RPC.

This patch adds a new flags field to struct export_operations, and
defines a new EXPORT_OP_NOWCC flag that filesystems can use to indicate
that nfsd should not attempt to provide WCC info in NFSv3 replies. It
also adds a blurb about the new flags field and flag to the exporting
documentation.

The server will also now skip collecting this information for NFSv2 as
well, since that info is never used there anyway.

Note that this patch does not add this flag to any filesystem
export_operations structures. This was originally developed to allow
reexporting nfs via nfsd.

Other filesystems may want to consider enabling this flag too. It's hard
to tell however which ones have export operations to enable export via
knfsd and which ones mostly rely on them for open-by-filehandle support,
so I'm leaving that up to the individual maintainers to decide. I am
cc'ing the relevant lists for those filesystems that I think may want to
consider adding this though.

Cc: HPDD-discuss@lists.01.org
Cc: ceph-devel@vger.kernel.org
Cc: cluster-devel@redhat.com
Cc: fuse-devel@lists.sourceforge.net
Cc: ocfs2-devel@oss.oracle.com
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 942b20dc 30-Nov-2020 J. Bruce Fields <bfields@redhat.com>

nfsd4: don't query change attribute in v2/v3 case

inode_query_iversion() has side effects, and there's no point calling it
when we're not even going to use it.

We check whether we're currently processing a v4 request by checking
fh_maxsize, which is arguably a little hacky; we could add a flag to
svc_fh instead.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 70b87f77 30-Nov-2020 J. Bruce Fields <bfields@redhat.com>

nfsd: only call inode_query_iversion in the I_VERSION case

inode_query_iversion() can modify i_version. Depending on the exported
filesystem, that may not be safe. For example, if you're re-exporting
NFS, NFS stores the server's change attribute in i_version and does not
expect it to be modified locally. This has been observed causing
unnecessary cache invalidations.

The way a filesystem indicates that it's OK to call
inode_query_iverson() is by setting SB_I_VERSION.

So, move the I_VERSION check out of encode_change(), where it's used
only in GETATTR responses, to nfsd4_change_attribute(), which is
also called for pre- and post- operation attributes.

(Note we could also pull the NFSEXP_V4ROOT case into
nfsd4_change_attribute() as well. That would actually be a no-op,
since pre/post attrs are only used for metadata-modifying operations,
and V4ROOT exports are read-only. But we might make the change in
the future just for simplicity.)

Reported-by: Daire Byrne <daire@dneg.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 788f7183 05-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add common helpers to decode void args and encode void results

Start off the conversion to xdr_stream by de-duplicating the functions
that decode void arguments and encode void results.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 76e5492b 05-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Invoke svc_encode_result_payload() in "read" NFSD encoders

Have the NFSD encoders annotate the boundaries of every
direct-data-placement eligible result data payload. Then change
svcrdma to use that annotation instead of the xdr->page_len
when handling Write chunks.

For NFSv4 on RDMA, that enables the ability to recognize multiple
result payloads per compound. This is a pre-requisite for supporting
multiple Write chunks per RPC transaction.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 1905cac9 23-Oct-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: NFSv3 PATHCONF Reply is improperly formed

Commit cc028a10a48c ("NFSD: Hoist status code encoding into XDR
encoder functions") missed a spot.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# cc028a10 02-Oct-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Hoist status code encoding into XDR encoder functions

The original intent was presumably to reduce code duplication. The
trade-off was:

- No support for an NFSD proc function returning a non-success
RPC accept_stat value.
- No support for void NFS replies to non-NULL procedures.
- Everyone pays for the deduplication with a few extra conditional
branches in a hot path.

In addition, nfsd_dispatch() leaves *statp uninitialized in the
success path, unlike svc_generic_dispatch().

Address all of these problems by moving the logic for encoding
the NFS status code into the NFS XDR encoders themselves. Then
update the NFS .pc_func methods to return an RPC accept_stat
value.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# dcc46991 01-Oct-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Encoder and decoder functions are always present

nfsd_dispatch() is a hot path. Let's optimize the XDR method calls
for the by-far common case, which is that the XDR methods are indeed
present.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 19e0663f 06-Jan-2020 Trond Myklebust <trondmy@gmail.com>

nfsd: Ensure sampling of the write verifier is atomic with the write

When doing an unstable write, we need to ensure that we sample the
write verifier before releasing the lock, and allowing a commit to
the same file to proceed.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 524ff1af 06-Jan-2020 Trond Myklebust <trondmy@gmail.com>

nfsd: Ensure sampling of the commit verifier is atomic with the commit

When we have a successful commit, ensure we sample the commit verifier
before releasing the lock.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 92c5e469 31-Oct-2019 Arnd Bergmann <arnd@arndb.de>

nfsd: handle nfs3 timestamps as unsigned

The decode_time3 function behaves differently on 32-bit
and 64-bit architectures: on the former, a 32-bit timestamp
gets converted into an signed number and then into a timestamp
between 1902 and 2038, while on the latter it is interpreted
as unsigned in the range 1970-2106.

Change all the remaining 'timespec' in nfsd to 'timespec64'
to make the behavior the same, and use the current interpretation
of the dominant 64-bit architectures.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 6c2d4798 30-Oct-2019 Al Viro <viro@zeniv.linux.org.uk>

new helper: lookup_positive_unlocked()

Most of the callers of lookup_one_len_unlocked() treat negatives are
ERR_PTR(-ENOENT). Provide a helper that would do just that. Note
that a pinned positive dentry remains positive - it's ->d_inode is
stable, etc.; a pinned _negative_ dentry can become positive at any
point as long as you are not holding its parent at least shared.
So using lookup_one_len_unlocked() needs to be careful;
lookup_positive_unlocked() is safer and that's what the callers
end up open-coding anyway.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 27c438f5 02-Sep-2019 Trond Myklebust <trondmy@gmail.com>

nfsd: Support the server resetting the boot verifier

Add support to allow the server to reset the boot verifier in order to
force clients to resend I/O after a timeout failure.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# e45d1a18 08-Apr-2019 Trond Myklebust <trondmy@gmail.com>

nfsd: knfsd must use the container user namespace

Convert knfsd to use the user namespace of the container that started
the server processes.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 3c86794a 04-Apr-2019 Murphy Zhou <jencce.kernel@gmail.com>

nfsd/nfsd3_proc_readdir: fix buffer count and page pointers

After this commit
f875a79 nfsd: allow nfsv3 readdir request to be larger.
nfsv3 readdir request size can be larger than PAGE_SIZE. So if the
directory been read is large enough, we can use multiple pages
in rq_respages. Update buffer count and page pointers like we do
in readdirplus to make this happen.

Now listing a directory within 3000 files will panic because we
are counting in a wrong way and would write on random page.

Fixes: f875a79 "nfsd: allow nfsv3 readdir request to be larger"
Signed-off-by: Murphy Zhou <jencce.kernel@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# f875a792 06-Mar-2019 NeilBrown <neilb@suse.com>

nfsd: allow nfsv3 readdir request to be larger.

nfsd currently reports the NFSv3 dtpref FSINFO parameter
to be PAGE_SIZE, so NFS clients will typically ask for one
page of directory entries at a time. This is needlessly restrictive
as nfsd can handle larger replies easily.

Also, a READDIR request (but not a READDIRPLUS request) has the count
size clipped to PAGE_SIE, again unnecessary.

This patch lifts these limits so that larger readdir requests can be
used.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# b602345d 03-Mar-2019 NeilBrown <neilb@suse.com>

nfsd: fix memory corruption caused by readdir

If the result of an NFSv3 readdir{,plus} request results in the
"offset" on one entry having to be split across 2 pages, and is sized
so that the next directory entry doesn't fit in the requested size,
then memory corruption can happen.

When encode_entry() is called after encoding the last entry that fits,
it notices that ->offset and ->offset1 are set, and so stores the
offset value in the two pages as required. It clears ->offset1 but
*does not* clear ->offset.

Normally this omission doesn't matter as encode_entry_baggage() will
be called, and will set ->offset to a suitable value (not on a page
boundary).
But in the case where cd->buflen < elen and nfserr_toosmall is
returned, ->offset is not reset.

This means that nfsd3proc_readdirplus will see ->offset with a value 4
bytes before the end of a page, and ->offset1 set to NULL.
It will try to write 8bytes to ->offset.
If we are lucky, the next page will be read-only, and the system will
BUG: unable to handle kernel paging request at...

If we are unlucky, some innocent page will have the first 4 bytes
corrupted.

nfsd3proc_readdir() doesn't even check for ->offset1, it just blindly
writes 8 bytes to the offset wherever it is.

Fix this by clearing ->offset after it is used, and copying the
->offset handling code from nfsd3_proc_readdirplus into
nfsd3_proc_readdir.

(Note that the commit hash in the Fixes tag is from the 'history'
tree - this bug predates git).

Fixes: 0b1d57cf7654 ("[PATCH] kNFSd: Fix nfs3 dentry encoding")
Fixes-URL: https://git.kernel.org/pub/scm/linux/kernel/git/history/history.git/commit/?id=0b1d57cf7654
Cc: stable@vger.kernel.org (v2.6.12+)
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 95582b00 08-May-2018 Deepa Dinamani <deepa.kernel@gmail.com>

vfs: change inode times to use struct timespec64

struct timespec is not y2038 safe. Transition vfs to use
y2038 safe struct timespec64 instead.

The change was made with the help of the following cocinelle
script. This catches about 80% of the changes.
All the header file and logic changes are included in the
first 5 rules. The rest are trivial substitutions.
I avoid changing any of the function signatures or any other
filesystem specific data structures to keep the patch simple
for review.

The script can be a little shorter by combining different cases.
But, this version was sufficient for my usecase.

virtual patch

@ depends on patch @
identifier now;
@@
- struct timespec
+ struct timespec64
current_time ( ... )
{
- struct timespec now = current_kernel_time();
+ struct timespec64 now = current_kernel_time64();
...
- return timespec_trunc(
+ return timespec64_trunc(
... );
}

@ depends on patch @
identifier xtime;
@@
struct \( iattr \| inode \| kstat \) {
...
- struct timespec xtime;
+ struct timespec64 xtime;
...
}

@ depends on patch @
identifier t;
@@
struct inode_operations {
...
int (*update_time) (...,
- struct timespec t,
+ struct timespec64 t,
...);
...
}

@ depends on patch @
identifier t;
identifier fn_update_time =~ "update_time$";
@@
fn_update_time (...,
- struct timespec *t,
+ struct timespec64 *t,
...) { ... }

@ depends on patch @
identifier t;
@@
lease_get_mtime( ... ,
- struct timespec *t
+ struct timespec64 *t
) { ... }

@te depends on patch forall@
identifier ts;
local idexpression struct inode *inode_node;
identifier i_xtime =~ "^i_[acm]time$";
identifier ia_xtime =~ "^ia_[acm]time$";
identifier fn_update_time =~ "update_time$";
identifier fn;
expression e, E3;
local idexpression struct inode *node1;
local idexpression struct inode *node2;
local idexpression struct iattr *attr1;
local idexpression struct iattr *attr2;
local idexpression struct iattr attr;
identifier i_xtime1 =~ "^i_[acm]time$";
identifier i_xtime2 =~ "^i_[acm]time$";
identifier ia_xtime1 =~ "^ia_[acm]time$";
identifier ia_xtime2 =~ "^ia_[acm]time$";
@@
(
(
- struct timespec ts;
+ struct timespec64 ts;
|
- struct timespec ts = current_time(inode_node);
+ struct timespec64 ts = current_time(inode_node);
)

<+... when != ts
(
- timespec_equal(&inode_node->i_xtime, &ts)
+ timespec64_equal(&inode_node->i_xtime, &ts)
|
- timespec_equal(&ts, &inode_node->i_xtime)
+ timespec64_equal(&ts, &inode_node->i_xtime)
|
- timespec_compare(&inode_node->i_xtime, &ts)
+ timespec64_compare(&inode_node->i_xtime, &ts)
|
- timespec_compare(&ts, &inode_node->i_xtime)
+ timespec64_compare(&ts, &inode_node->i_xtime)
|
ts = current_time(e)
|
fn_update_time(..., &ts,...)
|
inode_node->i_xtime = ts
|
node1->i_xtime = ts
|
ts = inode_node->i_xtime
|
<+... attr1->ia_xtime ...+> = ts
|
ts = attr1->ia_xtime
|
ts.tv_sec
|
ts.tv_nsec
|
btrfs_set_stack_timespec_sec(..., ts.tv_sec)
|
btrfs_set_stack_timespec_nsec(..., ts.tv_nsec)
|
- ts = timespec64_to_timespec(
+ ts =
...
-)
|
- ts = ktime_to_timespec(
+ ts = ktime_to_timespec64(
...)
|
- ts = E3
+ ts = timespec_to_timespec64(E3)
|
- ktime_get_real_ts(&ts)
+ ktime_get_real_ts64(&ts)
|
fn(...,
- ts
+ timespec64_to_timespec(ts)
,...)
)
...+>
(
<... when != ts
- return ts;
+ return timespec64_to_timespec(ts);
...>
)
|
- timespec_equal(&node1->i_xtime1, &node2->i_xtime2)
+ timespec64_equal(&node1->i_xtime2, &node2->i_xtime2)
|
- timespec_equal(&node1->i_xtime1, &attr2->ia_xtime2)
+ timespec64_equal(&node1->i_xtime2, &attr2->ia_xtime2)
|
- timespec_compare(&node1->i_xtime1, &node2->i_xtime2)
+ timespec64_compare(&node1->i_xtime1, &node2->i_xtime2)
|
node1->i_xtime1 =
- timespec_trunc(attr1->ia_xtime1,
+ timespec64_trunc(attr1->ia_xtime1,
...)
|
- attr1->ia_xtime1 = timespec_trunc(attr2->ia_xtime2,
+ attr1->ia_xtime1 = timespec64_trunc(attr2->ia_xtime2,
...)
|
- ktime_get_real_ts(&attr1->ia_xtime1)
+ ktime_get_real_ts64(&attr1->ia_xtime1)
|
- ktime_get_real_ts(&attr.ia_xtime1)
+ ktime_get_real_ts64(&attr.ia_xtime1)
)

@ depends on patch @
struct inode *node;
struct iattr *attr;
identifier fn;
identifier i_xtime =~ "^i_[acm]time$";
identifier ia_xtime =~ "^ia_[acm]time$";
expression e;
@@
(
- fn(node->i_xtime);
+ fn(timespec64_to_timespec(node->i_xtime));
|
fn(...,
- node->i_xtime);
+ timespec64_to_timespec(node->i_xtime));
|
- e = fn(attr->ia_xtime);
+ e = fn(timespec64_to_timespec(attr->ia_xtime));
)

@ depends on patch forall @
struct inode *node;
struct iattr *attr;
identifier i_xtime =~ "^i_[acm]time$";
identifier ia_xtime =~ "^ia_[acm]time$";
identifier fn;
@@
{
+ struct timespec ts;
<+...
(
+ ts = timespec64_to_timespec(node->i_xtime);
fn (...,
- &node->i_xtime,
+ &ts,
...);
|
+ ts = timespec64_to_timespec(attr->ia_xtime);
fn (...,
- &attr->ia_xtime,
+ &ts,
...);
)
...+>
}

@ depends on patch forall @
struct inode *node;
struct iattr *attr;
struct kstat *stat;
identifier ia_xtime =~ "^ia_[acm]time$";
identifier i_xtime =~ "^i_[acm]time$";
identifier xtime =~ "^[acm]time$";
identifier fn, ret;
@@
{
+ struct timespec ts;
<+...
(
+ ts = timespec64_to_timespec(node->i_xtime);
ret = fn (...,
- &node->i_xtime,
+ &ts,
...);
|
+ ts = timespec64_to_timespec(node->i_xtime);
ret = fn (...,
- &node->i_xtime);
+ &ts);
|
+ ts = timespec64_to_timespec(attr->ia_xtime);
ret = fn (...,
- &attr->ia_xtime,
+ &ts,
...);
|
+ ts = timespec64_to_timespec(attr->ia_xtime);
ret = fn (...,
- &attr->ia_xtime);
+ &ts);
|
+ ts = timespec64_to_timespec(stat->xtime);
ret = fn (...,
- &stat->xtime);
+ &ts);
)
...+>
}

@ depends on patch @
struct inode *node;
struct inode *node2;
identifier i_xtime1 =~ "^i_[acm]time$";
identifier i_xtime2 =~ "^i_[acm]time$";
identifier i_xtime3 =~ "^i_[acm]time$";
struct iattr *attrp;
struct iattr *attrp2;
struct iattr attr ;
identifier ia_xtime1 =~ "^ia_[acm]time$";
identifier ia_xtime2 =~ "^ia_[acm]time$";
struct kstat *stat;
struct kstat stat1;
struct timespec64 ts;
identifier xtime =~ "^[acmb]time$";
expression e;
@@
(
( node->i_xtime2 \| attrp->ia_xtime2 \| attr.ia_xtime2 \) = node->i_xtime1 ;
|
node->i_xtime2 = \( node2->i_xtime1 \| timespec64_trunc(...) \);
|
node->i_xtime2 = node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \);
|
node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \);
|
stat->xtime = node2->i_xtime1;
|
stat1.xtime = node2->i_xtime1;
|
( node->i_xtime2 \| attrp->ia_xtime2 \) = attrp->ia_xtime1 ;
|
( attrp->ia_xtime1 \| attr.ia_xtime1 \) = attrp2->ia_xtime2;
|
- e = node->i_xtime1;
+ e = timespec64_to_timespec( node->i_xtime1 );
|
- e = attrp->ia_xtime1;
+ e = timespec64_to_timespec( attrp->ia_xtime1 );
|
node->i_xtime1 = current_time(...);
|
node->i_xtime2 = node->i_xtime1 = node->i_xtime3 =
- e;
+ timespec_to_timespec64(e);
|
node->i_xtime1 = node->i_xtime3 =
- e;
+ timespec_to_timespec64(e);
|
- node->i_xtime1 = e;
+ node->i_xtime1 = timespec_to_timespec64(e);
)

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Cc: <anton@tuxera.com>
Cc: <balbi@kernel.org>
Cc: <bfields@fieldses.org>
Cc: <darrick.wong@oracle.com>
Cc: <dhowells@redhat.com>
Cc: <dsterba@suse.com>
Cc: <dwmw2@infradead.org>
Cc: <hch@lst.de>
Cc: <hirofumi@mail.parknet.co.jp>
Cc: <hubcap@omnibond.com>
Cc: <jack@suse.com>
Cc: <jaegeuk@kernel.org>
Cc: <jaharkes@cs.cmu.edu>
Cc: <jslaby@suse.com>
Cc: <keescook@chromium.org>
Cc: <mark@fasheh.com>
Cc: <miklos@szeredi.hu>
Cc: <nico@linaro.org>
Cc: <reiserfs-devel@vger.kernel.org>
Cc: <richard@nod.at>
Cc: <sage@redhat.com>
Cc: <sfrench@samba.org>
Cc: <swhiteho@redhat.com>
Cc: <tj@kernel.org>
Cc: <trond.myklebust@primarydata.com>
Cc: <tytso@mit.edu>
Cc: <viro@zeniv.linux.org.uk>


# 38a70315 27-Mar-2018 Chuck Lever <chuck.lever@oracle.com>

NFSD: Clean up legacy NFS SYMLINK argument XDR decoders

Move common code in NFSD's legacy SYMLINK decoders into a helper.
The immediate benefits include:

- one fewer data copies on transports that support DDP
- consistent error checking across all versions
- reduction of code duplication
- support for both legal forms of SYMLINK requests on RDMA
transports for all versions of NFS (in particular, NFSv2, for
completeness)

In the long term, this helper is an appropriate spot to perform a
per-transport call-out to fill the pathname argument using, say,
RDMA Reads.

Filling the pathname in the proc function also means that eventually
the incoming filehandle can be interpreted so that filesystem-
specific memory can be allocated as a sink for the pathname
argument, rather than using anonymous pages.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 8154ef27 27-Mar-2018 Chuck Lever <chuck.lever@oracle.com>

NFSD: Clean up legacy NFS WRITE argument XDR decoders

Move common code in NFSD's legacy NFS WRITE decoders into a helper.
The immediate benefit is reduction of code duplication and some nice
micro-optimizations (see below).

In the long term, this helper can perform a per-transport call-out
to fill the rq_vec (say, using RDMA Reads).

The legacy WRITE decoders and procs are changed to work like NFSv4,
which constructs the rq_vec just before it is about to call
vfs_writev.

Why? Calling a transport call-out from the proc instead of the XDR
decoder means that the incoming FH can be resolved to a particular
filesystem and file. This would allow pages from the backing file to
be presented to the transport to be filled, rather than presenting
anonymous pages and copying or flipping them into the file's page
cache later.

I also prefer using the pages in rq_arg.pages, instead of pulling
the data pages directly out of the rqstp::rq_pages array. This is
currently the way the NFSv3 write decoder works, but the other two
do not seem to take this approach. Fixing this removes the only
reference to rq_pages found in NFSD, eliminating an NFSD assumption
about how transports use the pages in rq_pages.

Lastly, avoid setting up the first element of rq_vec as a zero-
length buffer. This happens with an RDMA transport when a normal
Read chunk is present because the data payload is in rq_arg's
page list (none of it is in the head buffer).

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 39ca1bf6 03-Jan-2018 Amir Goldstein <amir73il@gmail.com>

nfsd: store stat times in fill_pre_wcc() instead of inode times

The time values in stat and inode may differ for overlayfs and stat time
values are the correct ones to use. This is also consistent with the fact
that fill_post_wcc() also stores stat time values.

This means introducing a stat call that could fail, where previously we
were just copying values out of the inode. To be conservative about
changing behavior, we fall back to copying values out of the inode in
the error case. It might be better just to clear fh_pre_saved (though
note the BUG_ON in set_change_info).

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 256a89fa 18-Oct-2017 Arnd Bergmann <arnd@arndb.de>

nfds: avoid gettimeofday for nfssvc_boot time

do_gettimeofday() is deprecated and we should generally use time64_t
based functions instead.

In case of nfsd, all three users of nfssvc_boot only use the initial
time as a unique token, and are not affected by it overflowing, so they
are not affected by the y2038 overflow.

This converts the structure to timespec64 anyway and adds comments
to all uses, to document that we have thought about it and avoid
having to look at it again.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# b2441318 01-Nov-2017 Greg Kroah-Hartman <gregkh@linuxfoundation.org>

License cleanup: add SPDX GPL-2.0 license identifier to files with no license

Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.

For non */uapi/* files that summary was:

SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139

and resulted in the first patch in this series.

If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:

SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930

and resulted in the second patch in this series.

- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:

SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1

and that resulted in the third patch in this series.

- when the two scanners agreed on the detected license(s), that became
the concluded license(s).

- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.

- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).

- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.

- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct

This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# d16d1867 08-May-2017 Christoph Hellwig <hch@lst.de>

sunrpc: properly type pc_encode callbacks

Drop the resp argument as it can trivially be derived from the rqstp
argument. With that all functions now have the same prototype, and we
can remove the unsafe casting to kxdrproc_t.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>


# cc6acc20 08-May-2017 Christoph Hellwig <hch@lst.de>

sunrpc: properly type pc_decode callbacks

Drop the argp argument as it can trivially be derived from the rqstp
argument. With that all functions now have the same prototype, and we
can remove the unsafe casting to kxdrproc_t.

Signed-off-by: Christoph Hellwig <hch@lst.de>


# 1150ded8 08-May-2017 Christoph Hellwig <hch@lst.de>

sunrpc: properly type pc_release callbacks

Drop the p and resp arguments as they are always NULL or can trivially
be derived from the rqstp argument. With that all functions now have the
same prototype, and we can remove the unsafe casting to kxdrproc_t.

Signed-off-by: Christoph Hellwig <hch@lst.de>


# 630458e7 11-May-2017 J. Bruce Fields <bfields@redhat.com>

nfsd4: factor ctime into change attribute

Factoring ctime into the nfsv4 change attribute gives us better
properties than just i_version alone.

Eventually we'll likely also expose this (as opposed to raw i_version)
to userspace, at which point we'll want to move it to a common helper,
called from either userspace or individual filesystems. For now, nfsd
is the only user.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 9512a16b 16-May-2017 J. Bruce Fields <bfields@redhat.com>

nfsd: Revert "nfsd: check for oversized NFSv2/v3 arguments"

This reverts commit 51f567777799 "nfsd: check for oversized NFSv2/v3
arguments", which breaks support for NFSv3 ACLs.

That patch was actually an earlier draft of a fix for the problem that
was eventually fixed by e6838a29ecb "nfsd: check for oversized NFSv2/v3
arguments". But somehow I accidentally left this earlier draft in the
branch that was part of my 2.12 pull request.

Reported-by: Eryu Guan <eguan@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 63f8de37 08-May-2017 Christoph Hellwig <hch@lst.de>

sunrpc: properly type pc_encode callbacks

Drop the resp argument as it can trivially be derived from the rqstp
argument. With that all functions now have the same prototype, and we
can remove the unsafe casting to kxdrproc_t.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>


# 026fec7e 08-May-2017 Christoph Hellwig <hch@lst.de>

sunrpc: properly type pc_decode callbacks

Drop the argp argument as it can trivially be derived from the rqstp
argument. With that all functions now have the same prototype, and we
can remove the unsafe casting to kxdrproc_t.

Signed-off-by: Christoph Hellwig <hch@lst.de>


# 8537488b 08-May-2017 Christoph Hellwig <hch@lst.de>

sunrpc: properly type pc_release callbacks

Drop the p and resp arguments as they are always NULL or can trivially
be derived from the rqstp argument. With that all functions now have the
same prototype, and we can remove the unsafe casting to kxdrproc_t.

Signed-off-by: Christoph Hellwig <hch@lst.de>


# 51f56777 06-Apr-2017 J. Bruce Fields <bfields@redhat.com>

nfsd: check for oversized NFSv2/v3 arguments

A client can append random data to the end of an NFSv2 or NFSv3 RPC call
without our complaining; we'll just stop parsing at the end of the
expected data and ignore the rest.

Encoded arguments and replies are stored together in an array of pages,
and if a call is too large it could leave inadequate space for the
reply. This is normally OK because NFS RPC's typically have either
short arguments and long replies (like READ) or long arguments and short
replies (like WRITE). But a client that sends an incorrectly long reply
can violate those assumptions. This was observed to cause crashes.

So, insist that the argument not be any longer than we expect.

Also, several operations increment rq_next_page in the decode routine
before checking the argument size, which can leave rq_next_page pointing
well past the end of the page array, causing trouble later in
svc_free_pages.

As followup we may also want to rewrite the encoding routines to check
more carefully that they aren't running off the end of the page array.

Reported-by: Tuomas Haanpää <thaan@synopsys.com>
Reported-by: Ari Kauppi <ari@synopsys.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 13bf9fbf 21-Apr-2017 J. Bruce Fields <bfields@redhat.com>

nfsd: stricter decoding of write-like NFSv2/v3 ops

The NFSv2/v3 code does not systematically check whether we decode past
the end of the buffer. This generally appears to be harmless, but there
are a few places where we do arithmetic on the pointers involved and
don't account for the possibility that a length could be negative. Add
checks to catch these.

Reported-by: Tuomas Haanpää <thaan@synopsys.com>
Reported-by: Ari Kauppi <ari@synopsys.com>
Reviewed-by: NeilBrown <neilb@suse.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# db44bac4 25-Apr-2017 J. Bruce Fields <bfields@redhat.com>

nfsd4: minor NFSv2/v3 write decoding cleanup

Use a couple shortcuts that will simplify a following bugfix.

Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 6625d091 04-May-2016 Chuck Lever <chuck.lever@oracle.com>

svcrdma: Do not add XDR padding to xdr_buf page vector

An xdr_buf has a head, a vector of pages, and a tail. Each
RPC request is presented to the NFS server contained in an
xdr_buf.

The RDMA transport would like to supply the NFS server with only
the NFS WRITE payload bytes in the page vector. In some common
cases, that would allow the NFS server to swap those pages right
into the target file's page cache.

Have the transport's RDMA Read logic put XDR pad bytes in the tail
iovec, and not in the pages that hold the data payload.

The NFSv3 WRITE XDR decoder is finicky about the lengths involved,
so make sure it is looking in the correct places when computing
the total length of the incoming NFS WRITE request.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# fc64005c 09-Apr-2016 Al Viro <viro@zeniv.linux.org.uk>

don't bother with ->d_inode->i_sb - it's always equal to ->d_sb

... and neither can ever be NULL

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# bbddca8e 07-Jan-2016 NeilBrown <neilb@suse.de>

nfsd: don't hold i_mutex over userspace upcalls

We need information about exports when crossing mountpoints during
lookup or NFSv4 readdir. If we don't already have that information
cached, we may have to ask (and wait for) rpc.mountd.

In both cases we currently hold the i_mutex on the parent of the
directory we're asking rpc.mountd about. We've seen situations where
rpc.mountd performs some operation on that directory that tries to take
the i_mutex again, resulting in deadlock.

With some care, we may be able to avoid that in rpc.mountd. But it
seems better just to avoid holding a mutex while waiting on userspace.

It appears that lookup_one_len is pretty much the only operation that
needs the i_mutex. So we could just drop the i_mutex elsewhere and do
something like

mutex_lock()
lookup_one_len()
mutex_unlock()

In many cases though the lookup would have been cached and not required
the i_mutex, so it's more efficient to create a lookup_one_len() variant
that only takes the i_mutex when necessary.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# aaf91ec1 17-Sep-2015 Jeff Layton <jlayton@kernel.org>

nfsd: switch unsigned char flags in svc_fh to bools

...just for clarity.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 43b0e7ea 02-May-2015 NeilBrown <neilb@suse.de>

nfsd: stop READDIRPLUS returning inconsistent attributes

The NFSv3 READDIRPLUS gets some of the returned attributes from the
readdir, and some from an inode returned from a new lookup. The two
objects could be different thanks to intervening renames.

The attributes in READDIRPLUS are optional, so let's just skip them if
we notice this case.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 2b0143b5 17-Mar-2015 David Howells <dhowells@redhat.com>

VFS: normal filesystems (and lustre): d_inode() annotations

that's the bulk of filesystem drivers dealing with inodes of their own

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 3c7aa15d 10-Jun-2014 Kinglong Mee <kinglongmee@gmail.com>

NFSD: Using min/max/min_t/max_t for calculate

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# d40aa337 22-May-2014 Benoit Taine <benoit.taine@lip6.fr>

nfsd: Remove assignments inside conditions

Assignments should not happen inside an if conditional, but in the line
before. This issue was reported by checkpatch.

The semantic patch that makes this change is as follows
(http://coccinelle.lip6.fr/):

// <smpl>

@@
identifier i1;
expression e1;
statement S;
@@
-if(!(i1 = e1)) S
+i1 = e1;
+if(!i1)
+S

// </smpl>

It has been tested by compilation.

Signed-off-by: Benoit Taine <benoit.taine@lip6.fr>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 068c34c0 09-Jan-2014 J. Bruce Fields <bfields@redhat.com>

nfsd: fix encode_entryplus_baggage stack usage

We stick an extra svc_fh in nfsd3_readdirres to save the need to
kmalloc, though maybe it would be fine to kmalloc instead.

Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 6e14b46b 17-Nov-2013 Albert Fluegel <af@muc.de>

nfsd: don't return high mode bits

The Linux NFS server replies among other things to a "Check access permission"
the following:

NFS: File type = 2 (Directory)
NFS: Mode = 040755

A netapp server replies here:
NFS: File type = 2 (Directory)
NFS: Mode = 0755

The RFC 1813 i read:
fattr3

struct fattr3 {
ftype3 type;
mode3 mode;
uint32 nlink;
...
For the mode bits only the lowest 9 are defined in the RFC

As far as I can tell, knfsd has always done this, so apparently it's harmless.
Nevertheless, it appears to be wrong.

Note this is already correct in the NFSv4 case, only v2 and v3 need
fixing.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 3dadecce 24-Jan-2013 Al Viro <viro@zeniv.linux.org.uk>

switch vfs_getattr() to struct path

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 458878a7 02-Feb-2013 Eric W. Biederman <ebiederm@xmission.com>

nfsd: Convert nfs3xdr to use kuids and kgids

When reading uids and gids off the wire convert them to kuids and
kgids.

When putting kuids and kgids onto the wire first convert them to uids
and gids the other side will understand.

Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>


# e097258f 02-Feb-2013 Eric W. Biederman <ebiederm@xmission.com>

nfsd: Remove nfsd_luid, nfsd_lgid, nfsd_ruid and nfsd_rgid

These trivial macros that don't currently do anything are the last
vestiages of an old attempt at uid mapping that was removed from the
kernel in September of 2002. Remove them to make it clear what the
code is currently doing.

Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>


# afc59400 10-Dec-2012 J. Bruce Fields <bfields@redhat.com>

nfsd4: cleanup: replace rq_resused count by rq_next_page pointer

It may be a matter of personal taste, but I find this makes the code
clearer.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# b9c0ef85 06-Dec-2012 Stanislav Kinsbursky <skinsbursky@parallels.com>

nfsd: make NFSd service boot time per-net

This is simple: an NFSd service can be started at different times in
different network environments. So, its "boot time" has to be assigned
per net.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# efe39651 12-Apr-2012 Al Viro <viro@zeniv.linux.org.uk>

nfsd: fix compose_entry_fh() failure exits

Restore the original logics ("fail on mountpoints, negatives and in
case of fh_compose() failures"). Since commit 8177e (nfsd: clean up
readdirplus encoding) that got broken -
rv = fh_compose(fhp, exp, dchild, &cd->fh);
if (rv)
goto out;
if (!dchild->d_inode)
goto out;
rv = 0;
out:
is equivalent to
rv = fh_compose(fhp, exp, dchild, &cd->fh);
out:
and the second check has no effect whatsoever...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# c47d832b 16-May-2011 Daniel Mack <zonque@gmail.com>

nfsd: make local functions static

This also fixes a number of sparse warnings.

Signed-off-by: Daniel Mack <zonque@gmail.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: J. Bruce Fields <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 25985edc 30-Mar-2011 Lucas De Marchi <lucas.demarchi@profusion.mobi>

Fix common misspellings

Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>


# c1ac3ffc 01-Dec-2010 Neil Brown <neilb@suse.de>

nfsd: Fix possible BUG_ON firing in set_change_info

If vfs_getattr in fill_post_wcc returns an error, we don't
set fh_post_change.
For NFSv4, this can result in set_change_info triggering a BUG_ON.
i.e. fh_post_saved being zero isn't really a bug.

So:
- instead of BUGging when fh_post_saved is zero, just clear ->atomic.
- if vfs_getattr fails in fill_post_wcc, take a copy of i_ctime anyway.
This will be used i seg_change_info, but not overly trusted.
- While we are there, remove the pointless 'if' statements in set_change_info.
There is no harm setting all the values.

Signed-off-by: NeilBrown <neilb@suse.de>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 7663dacd 04-Dec-2009 J. Bruce Fields <bfields@citi.umich.edu>

nfsd: remove pointless paths in file headers

The new .h files have paths at the top that are now out of date. While
we're here, just remove all of those from fs/nfsd; they never served any
purpose.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 9a74af21 03-Dec-2009 Boaz Harrosh <bharrosh@panasas.com>

nfsd: Move private headers to source directory

Lots of include/linux/nfsd/* headers are only used by
nfsd module. Move them to the source directory

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 341eb184 03-Dec-2009 Boaz Harrosh <bharrosh@panasas.com>

nfsd: Source files #include cleanups

Now that the headers are fixed and carry their own wait, all fs/nfsd/
source files can include a minimal set of headers. and still compile just
fine.

This patch should improve the compilation speed of the nfsd module.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 479c2553 14-Nov-2009 Petr Vandrovec <petr@vandrovec.name>

Fix memory corruption caused by nfsd readdir+

Commit 8177e6d6dfb9cd03d9bdeb647c32161f8f58f686 ("nfsd: clean up
readdirplus encoding") introduced single character typo in nfs3 readdir+
implementation. Unfortunately that typo has quite bad side effects:
random memory corruption, followed (on my box) with immediate
spontaneous box reboot.

Using 'p1' instead of 'p' fixes my Linux box rebooting whenever VMware
ESXi box tries to list contents of my home directory.

Signed-off-by: Petr Vandrovec <petr@vandrovec.name>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# aed100fa 04-Sep-2009 J. Bruce Fields <bfields@citi.umich.edu>

nfsd: fix leak on error in nfsv3 readdir

Note the !dchild->d_inode case can leak the filehandle.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 8177e6d6 04-Sep-2009 J. Bruce Fields <bfields@citi.umich.edu>

nfsd: clean up readdirplus encoding

Make the return from compose_entry_fh() zero or an error, even though
the returned error isn't used, just to make the meaning of the return
immediately obvious.

Move some repeated code out of main function into helper.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# c654b8a9 16-Apr-2009 J. Bruce Fields <bfields@citi.umich.edu>

nfsd: support ext4 i_version

ext4 supports a real NFSv4 change attribute, which is bumped whenever
the ctime would be updated, including times when two updates arrive
within a jiffy of each other. (Note that although ext4 has space for
nanosecond-precision ctime, the real resolution is lower: it actually
uses jiffies as the time-source.) This ensures clients will invalidate
their caches when they need to.

There is some fear that keeping the i_version up-to-date could have
performance drawbacks, so for now it's turned on only by a mount option.
We hope to do something better eventually.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Theodore Tso <tytso@mit.edu>


# 54775491 14-Feb-2008 Jan Blunck <jblunck@suse.de>

Use struct path in struct svc_export

I'm embedding struct path into struct svc_export.

[akpm@linux-foundation.org: coding-style fixes]
[ezk@cs.sunysb.edu: NFSD: fix wrong mnt_writer count in rename]
Signed-off-by: Jan Blunck <jblunck@suse.de>
Acked-by: J. Bruce Fields <bfields@citi.umich.edu>
Acked-by: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 99d965ed 21-Nov-2007 J. Bruce Fields <bfields@citi.umich.edu>

nfsd: fix encode_entryplus_baggage() indentation

Fix bizarre indentation.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 2e8138a2 15-Nov-2007 J. Bruce Fields <bfields@citi.umich.edu>

nfsd: move nfsd/auth.h into fs/nfsd

This header is used only in a few places in fs/nfsd, so there seems to
be little point to having it in include/. (Thanks to Robert Day for
pointing this out.)

Cc: Robert P. J. Day <rpjday@crashcourse.ca>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# a628f667 01-Nov-2007 Chuck Lever <chuck.lever@oracle.com>

NFSD: Fix mixed sign comparison in nfs3svc_decode_symlinkargs

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-By: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# ee1a95b3 01-Nov-2007 Chuck Lever <chuck.lever@oracle.com>

NFSD: Use unsigned length argument for decode_filename

Clean up: file name lengths are unsigned on the wire, negative lengths
are not meaningful natively either.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-By: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# ba67a39e 11-Jan-2008 NeilBrown <neilb@suse.de>

knfsd: Allow NFSv2/3 WRITE calls to succeed when krb5i etc is used.

When RPCSEC/GSS and krb5i is used, requests are padded, typically to a multiple
of 8 bytes. This can make the request look slightly longer than it
really is.

As of

f34b95689d2ce001c "The NFSv2/NFSv3 server does not handle zero
length WRITE request correctly",

the xdr decode routines for NFSv2 and NFSv3 reject requests that aren't
the right length, so krb5i (for example) WRITE requests can get lost.

This patch relaxes the appropriate test and enhances the related comment.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Peter Staubach <staubach@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 40ee5dc6 15-Aug-2007 Peter Staubach <staubach@redhat.com>

knfsd: 64 bit ino support for NFS server

Modify the NFS server code to support 64 bit ino's, as
appropriate for the system and the NFS protocol version.

The gist of the changes is to query the underlying file system
for attributes and not just to use the cached attributes in the
inode. For this specific purpose, the inode only contains an
ino field which unsigned long, which is large enough on 64 bit
platforms, but is not large enough on 32 bit platforms.

I haven't been able to find any reason why ->getattr can't be called
while i_mutex. The specification indicates that i_mutex is not
required to be held in order to invoke ->getattr, but it doesn't say
that i_mutex can't be held while invoking ->getattr.

I also haven't come to any conclusions regarding the value of
lease_get_mtime() and whether it should or should not be invoked
by fill_post_wcc() too. I chose not to change this because I
thought that it was safer to leave well enough alone. If we
decide to make a change, it can be done separately.

Signed-off-by: Peter Staubach <staubach@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Acked-by: Neil Brown <neilb@suse.de>


# 072f62ed 09-May-2007 NeilBrown <neilb@suse.de>

knfsd: various nfsd xdr cleanups

1/ decode_sattr and decode_sattr3 never return NULL, so remove
several checks for that. ditto for xdr_decode_hyper.

2/ replace some open coded XDR_QUADLEN calls with calls to
XDR_QUADLEN

3/ in decode_writeargs, simply an 'if' to use a single
calculation.
.page_len is the length of that part of the packet that did
not fit in the first page (the head).
So the length of the data part is the remainder of the
head, plus page_len.

3/ other minor cleanups.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# f34b9568 09-May-2007 Peter Staubach <staubach@redhat.com>

The NFSv2/NFSv3 server does not handle zero length WRITE requests correctly

The NFSv2 and NFSv3 servers do not handle WRITE requests for 0 bytes
correctly. The specifications indicate that the server should accept the
request, but it should mostly turn into a no-op. Currently, the server
will return an XDR decode error, which it should not.

Attached is a patch which addresses this issue. It also adds some boundary
checking to ensure that the request contains as much data as was requested
to be written. It also correctly handles an NFSv3 request which requests
to write more data than the server has stated that it is prepared to
handle. Previously, there was some support which looked like it should
work, but wasn't quite right.

Signed-off-by: Peter Staubach <staubach@redhat.com>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 598b9a56 26-Mar-2007 NeilBrown <neilb@suse.de>

[PATCH] knfsd: allow nfsd READDIR to return 64bit cookies

->readdir passes lofft_t offsets (used as nfs cookies) to
nfs3svc_encode_entry{,_plus}, but when they pass it on to encode_entry it
becomes an 'off_t', which isn't good.

So filesystems that returned 64bit offsets would lose.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# af6a4e28 14-Feb-2007 NeilBrown <neilb@suse.de>

[PATCH] knfsd: add some new fsid types

Add support for using a filesystem UUID to identify and export point in the
filehandle.

For NFSv2, this UUID is xor-ed down to 4 or 8 bytes so that it doesn't take up
too much room. For NFSv3+, we use the full 16 bytes, and possibly also a
64bit inode number for exports beneath the root of a filesystem.

When generating an fsid to return in 'stat' information, use the UUID (hashed
down to size) if it is available and a small 'fsid' was not specifically
provided.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# a0ad13ef 26-Jan-2007 NeilBrown <neilb@suse.de>

[PATCH] knfsd: Fix type mismatch with filldir_t used by nfsd

nfsd defines a type 'encode_dent_fn' which is much like 'filldir_t' except
that the first pointer is 'struct readdir_cd *' rather than 'void *'. It
then casts encode_dent_fn points to 'filldir_t' as needed. This hides any
other type mismatches between the two such as the fact that the 'ino' arg
recently changed from ino_t to u64.

So: get rid of 'encode_dent_fn', get rid of the cast of the function type,
change the first arg of various functions from 'struct readdir_cd *' to
'void *', and live with the fact that we have a little less type checking
on the calling of these functions now. Less internal (to nfsd) checking
offset by more external checking, which is more important.

Thanks to Gabriel Paubert <paubert@iram.es> for discovering this and
providing an initial patch.

Signed-off-by: Gabriel Paubert <paubert@iram.es>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 3ee6f61c 06-Dec-2006 Adrian Bunk <bunk@stusta.de>

[PATCH] remove NFSD_OPTIMIZE_SPACE

This patch removes the unused NFSD_OPTIMIZE_SPACE.

Additionally, it does differently what NFSD_OPTIMIZE_SPACE was supposed to do:

Nowadays, gcc knows best when to inline code, and CONFIG_CC_OPTIMIZE_FOR_SIZE
even tells gcc globally whether to optimize for size or for speed. Therefore,
this patch also removes all inline's from these files.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 91f07168 20-Oct-2006 Al Viro <viro@ftp.linux.org.uk>

[PATCH] xdr annotations: NFSv3 server

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 7adae489 04-Oct-2006 Greg Banks <gnb@melbourne.sgi.com>

[PATCH] knfsd: Prepare knfsd for support of rsize/wsize of up to 1MB, over TCP

The limit over UDP remains at 32K. Also, make some of the apparently
arbitrary sizing constants clearer.

The biggest change here involves replacing NFSSVC_MAXBLKSIZE by a function of
the rqstp. This allows it to be different for different protocols (udp/tcp)
and also allows it to depend on the servers declared sv_bufsiz.

Note that we don't actually increase sv_bufsz for nfs yet. That comes next.

Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 3cc03b16 04-Oct-2006 NeilBrown <neilb@suse.de>

[PATCH] knfsd: Avoid excess stack usage in svc_tcp_recvfrom

.. by allocating the array of 'kvec' in 'struct svc_rqst'.

As we plan to increase RPCSVC_MAXPAGES from 8 upto 256, we can no longer
allocate an array of this size on the stack. So we allocate it in 'struct
svc_rqst'.

However svc_rqst contains (indirectly) an array of the same type and size
(actually several, but they are in a union). So rather than waste space, we
move those arrays out of the separately allocated union and into svc_rqst to
share with the kvec moved out of svc_tcp_recvfrom (various arrays are used at
different times, so there is no conflict).

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 44524359 04-Oct-2006 NeilBrown <neilb@suse.de>

[PATCH] knfsd: Replace two page lists in struct svc_rqst with one

We are planning to increase RPCSVC_MAXPAGES from about 8 to about 256. This
means we need to be a bit careful about arrays of size RPCSVC_MAXPAGES.

struct svc_rqst contains two such arrays. However the there are never more
that RPCSVC_MAXPAGES pages in the two arrays together, so only one array is
needed.

The two arrays are for the pages holding the request, and the pages holding
the reply. Instead of two arrays, we can simply keep an index into where the
first reply page is.

This patch also removes a number of small inline functions that probably
server to obscure what is going on rather than clarify it, and opencode the
needed functionality.

Also remove the 'rq_restailpage' variable as it is *always* 0. i.e. if the
response 'xdr' structure has a non-empty tail it is always in the same pages
as the head.

check counters are initilised and incr properly
check for consistant usage of ++ etc
maybe extra some inlines for common approach
general review

Signed-off-by: Neil Brown <neilb@suse.de>
Cc: Magnus Maatta <novell@kiruna.se>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# a334de28 06-Jan-2006 David Shaw <dshaw@jabberwocky.com>

[PATCH] knfsd: check error status from vfs_getattr and i_op->fsync

Both vfs_getattr and i_op->fsync return error statuses which nfsd was
largely ignoring. This as noticed when exporting directories using fuse.

This patch cleans up most of the offences, which involves moving the call
to vfs_getattr out of the xdr encoding routines (where it is too late to
report an error) into the main NFS procedure handling routines.

There is still a called to vfs_gettattr (related to the ACL code) where the
status is ignored, and called to nfsd_sync_dir don't check return status
either.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 0ba7536d 07-Nov-2005 NeilBrown <neilb@cse.unsw.edu.au>

[PATCH] knfsd: Fix some minor sign problems in nfsd/xdr

There are a couple of tests which could possibly be confused by extremely
large numbers appearing in 'xdr' packets. I think the closest to an exploit
you could get would be writing random data from a free page into a file - i.e.
leak data out of kernel space.

I'm fairly sure they cannot be used for remote compromise.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# a257cdd0 22-Jun-2005 Andreas Gruenbacher <agruen@suse.de>

[PATCH] NFSD: Add server support for NFSv3 ACLs.

This adds functions for encoding and decoding POSIX ACLs for the NFSACL
protocol extension, and the GETACL and SETACL RPCs. The implementation is
compatible with NFSACL in Solaris.

Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 1da177e4 16-Apr-2005 Linus Torvalds <torvalds@ppc970.osdl.org>

Linux-2.6.12-rc2

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!