History log of /linux-master/fs/nfsd/nfs4layouts.c
Revision Date Author Comments
# 10bcc2f1 30-Jan-2024 Kunwu Chan <chentao@kylinos.cn>

nfsd: Simplify the allocation of slab caches in nfsd4_init_pnfs

commit 0a31bd5f2bbb ("KMEM_CACHE(): simplify slab cache creation")
introduces a new macro.
Use the new KMEM_CACHE() macro instead of direct kmem_cache_create
to simplify the creation of SLAB caches.

Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 1e33e141 29-Jan-2024 NeilBrown <neilb@suse.de>

nfsd: allow layout state to be admin-revoked.

When there is layout state on a filesystem that is being "unlocked" that
is now revoked, which involves closing the nfsd_file and releasing the
vfs lease.

To avoid races, ->ls_file can now be accessed either:
- under ->fi_lock for the state's sc_file or
- under rcu_read_lock() if nfsd_file_get() is used.
To support this, ->fence_client and nfsd4_cb_layout_fail() now take a
second argument being the nfsd_file.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 3f29cc82 29-Jan-2024 NeilBrown <neilb@suse.de>

nfsd: split sc_status out of sc_type

sc_type identifies the type of a state - open, lock, deleg, layout - and
also the status of a state - closed or revoked.

This is a bit untidy and could get worse when "admin-revoked" states are
added. So clean it up.

With this patch, the type is now all that is stored in sc_type. This is
zero when the state is first added to ->cl_stateids (causing it to be
ignored), and is then set appropriately once it is fully initialised.
It is set under ->cl_lock to ensure atomicity w.r.t lookup. It is now
never cleared.

sc_type is still a bit-set even though at most one bit is set. This allows
lookup functions to be given a bitmap of acceptable types.

sc_type is now an unsigned short rather than char. There is no value in
restricting to just 8 bits.

All the constants now start SC_TYPE_ matching the field in which they
are stored. Keeping the existing names and ensuring clear separation
from non-type flags would have required something like
NFS4_STID_TYPE_CLOSED which is cumbersome. The "NFS4" prefix is
redundant was they only appear in NFS4 code, so remove that and change
STID to SC to match the field.

The status is stored in a separate unsigned short named "sc_status". It
has two flags: SC_STATUS_CLOSED and SC_STATUS_REVOKED.
CLOSED combines NFS4_CLOSED_STID, NFS4_CLOSED_DELEG_STID, and is used
for SC_TYPE_LOCK and SC_TYPE_LAYOUT instead of setting the sc_type to zero.
These flags are only ever set, never cleared.
For deleg stateids they are set under the global state_lock.
For open and lock stateids they are set under ->cl_lock.
For layout stateids they are set under ->ls_lock

nfs4_unhash_stid() has been removed, and we never set sc_type = 0. This
was only used for LOCK and LAYOUT stids and they now use
SC_STATUS_CLOSED.

Also TRACE_DEFINE_NUM() calls for the various STID #define have been
removed because these things are not enums, and so that call is
incorrect.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 7b800101 05-Feb-2024 Jeff Layton <jlayton@kernel.org>

filelock: don't do security checks on nfsd setlease calls

Zdenek reported seeing some AVC denials due to nfsd trying to set
delegations:

type=AVC msg=audit(09.11.2023 09:03:46.411:496) : avc: denied { lease } for pid=5127 comm=rpc.nfsd capability=lease scontext=system_u:system_r:nfsd_t:s0 tcontext=system_u:system_r:nfsd_t:s0 tclass=capability permissive=0

When setting delegations on behalf of nfsd, we don't want to do all of
the normal capabilty and LSM checks. nfsd is a kernel thread and runs
with CAP_LEASE set, so the uid checks end up being a no-op in most cases
anyway.

Some nfsd functions can end up running in normal process context when
tearing down the server. At that point, the CAP_LEASE check can fail and
cause the client to not tear down delegations when expected.

Also, the way the per-fs ->setlease handlers work today is a little
convoluted. The non-trivial ones are wrappers around generic_setlease,
so when they fail due to permission problems they usually they end up
doing a little extra work only to determine that they can't set the
lease anyway. It would be more efficient to do those checks earlier.

Transplant the permission checking from generic_setlease to
vfs_setlease, which will make the permission checking happen earlier on
filesystems that have a ->setlease operation. Add a new kernel_setlease
function that bypasses these checks, and switch nfsd to use that instead
of vfs_setlease.

There is one behavioral change here: prior this patch the
setlease_notifier would fire even if the lease attempt was going to fail
the security checks later. With this change, it doesn't fire until the
caller has passed them. I think this is a desirable change overall. nfsd
is the only user of the setlease_notifier and it doesn't benefit from
being notified about failed attempts.

Cc: Ondrej Mosnáček <omosnacek@gmail.com>
Reported-by: Zdenek Pytela <zpytela@redhat.com>
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2248830
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20240205-bz2248830-v1-1-d0ec0daecba1@kernel.org
Acked-by: Tom Talpey <tom@talpey.com>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>


# c69ff407 31-Jan-2024 Jeff Layton <jlayton@kernel.org>

filelock: split leases out of struct file_lock

Add a new struct file_lease and move the lease-specific fields from
struct file_lock to it. Convert the appropriate API calls to take
struct file_lease instead, and convert the callers to use them.

There is zero overlap between the lock manager operations for file
locks and the ones for file leases, so split the lease-related
operations off into a new lease_manager_operations struct.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20240131-flsplit-v3-47-c6129007ee8d@kernel.org
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>


# 05580bbf 31-Jan-2024 Jeff Layton <jlayton@kernel.org>

nfsd: adapt to breakup of struct file_lock

Most of the existing APIs have remained the same, but subsystems that
access file_lock fields directly need to reach into struct
file_lock_core now.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20240131-flsplit-v3-42-c6129007ee8d@kernel.org
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>


# 85dbc978 25-Sep-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Clean up nfsd4_encode_layoutreturn()

Adopt the use of conventional XDR utility functions. Restructure
the encoder to better align with the XDR definition of the result.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# fb610c4d 27-Jan-2023 Benjamin Coddington <bcodding@redhat.com>

nfsd: fix race to check ls_layouts

Its possible for __break_lease to find the layout's lease before we've
added the layout to the owner's ls_layouts list. In that case, setting
ls_recalled = true without actually recalling the layout will cause the
server to never send a recall callback.

Move the check for ls_layouts before setting ls_recalled.

Fixes: c5c707f96fc9 ("nfsd: implement pNFS layout recalls")
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 1035d654 08-Sep-2022 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add tracepoints to report NFSv4 callback completions

Wireshark has always been lousy about dissecting NFSv4 callbacks,
especially NFSv4.0 backchannel requests. Add tracepoints so we
can surgically capture these events in the trace log.

Tracepoints are time-stamped and ordered so that we can now observe
the timing relationship between a CB_RECALL Reply and the client's
DELEGRETURN Call. Example:

nfsd-1153 [002] 211.986391: nfsd_cb_recall: addr=192.168.1.67:45767 client 62ea82e4:fee7492a stateid 00000003:00000001

nfsd-1153 [002] 212.095634: nfsd_compound: xid=0x0000002c opcnt=2
nfsd-1153 [002] 212.095647: nfsd_compound_status: op=1/2 OP_PUTFH status=0
nfsd-1153 [002] 212.095658: nfsd_file_put: hash=0xf72 inode=0xffff9291148c7410 ref=3 flags=HASHED|REFERENCED may=READ file=0xffff929103b3ea00
nfsd-1153 [002] 212.095661: nfsd_compound_status: op=2/2 OP_DELEGRETURN status=0
kworker/u25:8-148 [002] 212.096713: nfsd_cb_recall_done: client 62ea82e4:fee7492a stateid 00000003:00000001 status=0

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>


# 4fc5f534 19-Mar-2022 Jakob Koschel <jakobkoschel@gmail.com>

nfsd: fix using the correct variable for sizeof()

While the original code is valid, it is not the obvious choice for the
sizeof() call and in preparation to limit the scope of the list iterator
variable the sizeof should be changed to the size of the destination.

Signed-off-by: Jakob Koschel <jakobkoschel@gmail.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 8c6aabd1 21-Oct-2021 Christoph Hellwig <hch@lst.de>

nfsd/blocklayout: use ->get_unique_id instead of sending SCSI commands

Call the ->get_unique_id method to query the SCSI identifiers. This can
use the cached VPD page in the sd driver instead of sending a command
on every LAYOUTGET. It will also allow to support NVMe based volumes
if the draft for that ever takes off.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: J. Bruce Fields <bfields@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20211021060607.264371-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# df561f66 23-Aug-2020 Gustavo A. R. Silva <gustavoars@kernel.org>

treewide: Use fallthrough pseudo-keyword

Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>


# 2561c92b 03-Nov-2019 Arnd Bergmann <arnd@arndb.de>

nfsd: fix delay timer on 32-bit architectures

The nfsd4_cb_layout_done() function takes a 'time_t' value,
multiplied by NSEC_PER_SEC*2 to get a nanosecond value.

This works fine on 64-bit architectures, but on 32-bit, any
value over 1 second results in a signed integer overflow
with unexpected results.

Cast one input to a 64-bit type in order to produce the
same result that we have on 64-bit architectures, regarless
of the type of nfsd4_lease.

Fixes: 6b9b21073d3b ("nfsd: give up on CB_LAYOUTRECALLs after two lease periods")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# eb82dd39 18-Aug-2019 Jeff Layton <jeff.layton@primarydata.com>

nfsd: convert fi_deleg_file and ls_file fields to nfsd_file

Have them keep an nfsd_file reference instead of a struct file.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 1c73b9d2 02-May-2019 Scott Mayhew <smayhew@redhat.com>

nfsd: update callback done processing

Instead of having the convention where individual nfsd4_callback_ops->done
operations return -1 to indicate the callback path is down, move the check
to nfsd4_cb_done. Only mark the callback path down on transport-level
errors, not NFS-level errors.

The existing logic causes the server to set SEQ4_STATUS_CB_PATH_DOWN
just because the client returned an error to a CB_RECALL for a
delegation that the client had already done a FREE_STATEID for. But
clearly that error doesn't mean that there's anything wrong with the
backchannel.

Additionally, handle NFS4ERR_DELAY in nfsd4_cb_recall_done. The client
returns NFS4ERR_DELAY if it is already in the process of returning the
delegation.

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 8a68d3da 23-Dec-2018 Julia Lawall <Julia.Lawall@lip6.fr>

nfsd: drop useless LIST_HEAD

Drop LIST_HEAD where the variable it declares is never used.

This was introduced in c5c707f96fc9a ("nfsd: implement pNFS
layout recalls"), but was not used even in that commit.

The semantic patch that fixes this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
identifier x;
@@
- LIST_HEAD(x);
... when != x
// </smpl>

Fixes: c5c707f96fc9a ("nfsd: implement pNFS layout recalls")
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 8163496e 19-Jun-2018 Benjamin Coddington <bcodding@redhat.com>

nfsd: don't advertise a SCSI layout for an unsupported request_queue

Commit 30181faae37f ("nfsd: Check queue type before submitting a SCSI
request") did the work of ensuring that we don't send SCSI requests to a
request queue that won't support them, but that check is in the
GETDEVICEINFO path. Let's not set the SCSI layout in fs_layout_type in the
first place, and then we'll have less clients sending GETDEVICEINFO for
non-SCSI request queues and less unnecessary WARN_ONs.

While we're in here, remove some outdated comments that refer to
"overwriting" layout seletion because commit 8a4c3926889e ("nfsd: allow
nfsd to advertise multiple layout types") changed things to no longer
overwrite the layout type.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# f394b62b 27-Mar-2018 Chuck Lever <chuck.lever@oracle.com>

nfsd: Add "nfsd_" to trace point names

Follow naming convention used in client and in sunrpc layers.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# a15dfcd5 19-Oct-2017 Elena Reshetova <elena.reshetova@intel.com>

fs, nfsd: convert nfs4_stid.sc_count from atomic_t to refcount_t

atomic_t variables are currently used to implement reference
counters with the following properties:
- counter is initialized to 1 using atomic_set()
- a resource is freed upon counter reaching zero
- once counter reaches zero, its further
increments aren't allowed
- counter schema uses basic atomic operations
(set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable nfs4_stid.sc_count is used as pure reference counter.
Convert it to refcount_t and fix up the operations.

Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# b2441318 01-Nov-2017 Greg Kroah-Hartman <gregkh@linuxfoundation.org>

License cleanup: add SPDX GPL-2.0 license identifier to files with no license

Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.

For non */uapi/* files that summary was:

SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139

and resulted in the first patch in this series.

If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:

SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930

and resulted in the second patch in this series.

- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:

SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1

and that resulted in the third patch in this series.

- when the two scanners agreed on the detected license(s), that became
the concluded license(s).

- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.

- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).

- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.

- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct

This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# d19fb70d 18-Jan-2017 Kinglong Mee <kinglongmee@gmail.com>

NFSD: Fix a null reference case in find_or_create_lock_stateid()

nfsd assigns the nfs4_free_lock_stateid to .sc_free in init_lock_stateid().

If nfsd doesn't go through init_lock_stateid() and put stateid at end,
there is a NULL reference to .sc_free when calling nfs4_put_stid(ns).

This patch let the nfs4_stid.sc_free assignment to nfs4_alloc_stid().

Cc: stable@vger.kernel.org
Fixes: 356a95ece7aa "nfsd: clean up races in lock stateid searching..."
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 377e7a27 11-Dec-2016 Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Make static usermode helper binaries constant

There are a number of usermode helper binaries that are "hard coded" in
the kernel today, so mark them as "const" to make it harder for someone
to change where the variables point to.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Sailer <t.sailer@alumni.ethz.ch>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Johan Hovold <johan@kernel.org>
Cc: Alex Elder <elder@kernel.org>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Jeff Layton <jlayton@poochiereds.net>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 851238a2 19-Oct-2016 Jeff Layton <jlayton@kernel.org>

nfsd: fix error handling for clients that fail to return the layout

Currently, when the client continually returns NFS4ERR_DELAY on a
CB_LAYOUTRECALL, we'll give up trying to retransmit after two lease
periods, but leave the layout in place.

What we really need to do here is fence the client in this case. Have it
fall through to that code in that case instead of into the
NFS4ERR_NOMATCHING_LAYOUT case.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 1983a66f 11-Aug-2016 Jeff Layton <jlayton@kernel.org>

nfsd: don't set a FL_LAYOUT lease for flexfiles layouts

We currently can hit a deadlock (of sorts) when trying to use flexfiles
layouts with XFS. XFS will call break_layout when something wants to
write to the file. In the case of the (super-simple) flexfiles layout
driver in knfsd, the MDS and DS are the same machine.

The client can get a layout and then issue a v3 write to do its I/O. XFS
will then call xfs_break_layouts, which will cause a CB_LAYOUTRECALL to
be issued to the client. The client however can't return the layout
until the v3 WRITE completes, but XFS won't allow the write to proceed
until the layout is returned.

Christoph says:

XFS only cares about block-like layouts where the client has direct
access to the file blocks. I'd need to look how to propagate the
flag into break_layout, but in principle we don't need to do any
recalls on truncate ever for file and flexfile layouts.

If we're never going to recall the layout, then we don't even need to
set the lease at all. Just skip doing so on flexfiles layouts by
adding a new flag to struct nfsd4_layout_ops and skipping the lease
setting and removal when that flag is true.

Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 8a4c3926 10-Jul-2016 Jeff Layton <jlayton@kernel.org>

nfsd: allow nfsd to advertise multiple layout types

If the underlying filesystem supports multiple layout types, then there
is little reason not to advertise that fact to clients and let them
choose what type to use.

Turn the ex_layout_type field into a bitfield. For each supported
layout type, we set a bit in that field. When the client requests a
layout, ensure that the bit for that layout type is set. When the
client requests attributes, send back a list of supported types.

Signed-off-by: Jeff Layton <jlayton@poochiereds.net>
Reviewed-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 9b9960a0 14-Jun-2016 Tom Haynes <thomas.haynes@primarydata.com>

nfsd: Add a super simple flex file server

Have a simple flex file server where the mds (NFSv4.1 or NFSv4.2)
is also the ds (NFSv3). I.e., the metadata and the data file are
the exact same file.

This will allow testing of the flex file client.

Simply add the "pnfs" export option to your export
in /etc/exports and mount from a client that supports
flex files.

Signed-off-by: Tom Haynes <loghyr@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 14b7f4a1 05-May-2016 Jeff Layton <jlayton@kernel.org>

nfsd: handle seqid wraparound in nfsd4_preprocess_layout_stateid

Move the existing static function to an inline helper, and call it.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# f99d4fbd 04-Mar-2016 Christoph Hellwig <hch@lst.de>

nfsd: add SCSI layout support

This is a simple extension to the block layout driver to use SCSI
persistent reservations for access control and fencing, as well as
SCSI VPD pages for device identification.

For this we need to pass the nfs4_client to the proc_getdeviceinfo method
to generate the reservation key, and add a new fence_client method
to allow for fence actions in the layout driver.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 81c39329 04-Mar-2016 Christoph Hellwig <hch@lst.de>

nfsd: add a new config option for the block layout driver

Split the config symbols into a generic pNFS one, which is invisible
and gets selected by the layout drivers, and one for the block layout
driver.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 6b9b2107 08-Dec-2015 Jeff Layton <jlayton@kernel.org>

nfsd: give up on CB_LAYOUTRECALLs after two lease periods

Have the CB_LAYOUTRECALL code treat NFS4_OK and NFS4ERR_DELAY returns
equivalently. Change the code to periodically resend CB_LAYOUTRECALLS
until the ls_layouts list is empty or the client returns a different
error code.

If we go for two lease periods without the list being emptied or the
client sending a hard error, then we give up and clean out the list
anyway.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Tested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# be20aa00 29-Nov-2015 Jeff Layton <jlayton@kernel.org>

nfsd: don't hold ls_mutex across a layout recall

We do need to serialize layout stateid morphing operations, but we
currently hold the ls_mutex across a layout recall which is pretty
ugly. It's also unnecessary -- once we've bumped the seqid and
copied it, we don't need to serialize the rest of the CB_LAYOUTRECALL
vs. anything else. Just drop the mutex once the copy is done.

This was causing a "workqueue leaked lock or atomic" warning and an
occasional deadlock.

There's more work to be done here but this fixes the immediate
regression.

Fixes: cc8a55320b5f "nfsd: serialize layout stateid morphing operations"
Cc: stable@vger.kernel.org
Reported-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# c4cb8974 21-Nov-2015 Julia Lawall <Julia.Lawall@lip6.fr>

nfsd: constify nfsd4_callback_ops structure

The nfsd4_callback_ops structure is never modified, so declare it as const.

Done with the help of Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 9767feb2 01-Oct-2015 Jeff Layton <jlayton@kernel.org>

nfsd: ensure that seqid morphing operations are atomic wrt to copies

Bruce points out that the increment of the seqid in stateids is not
serialized in any way, so it's possible for racing calls to bump it
twice and end up sending the same stateid. While we don't have any
reports of this problem it _is_ theoretically possible, and could lead
to spurious state recovery by the client.

In the current code, update_stateid is always followed by a memcpy of
that stateid, so we can combine the two operations. For better
atomicity, we add a spinlock to the nfs4_stid and hold that when bumping
the seqid and copying the stateid.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# cc8a5532 17-Sep-2015 Jeff Layton <jlayton@kernel.org>

nfsd: serialize layout stateid morphing operations

In order to allow the client to make a sane determination of what
happened with racing LAYOUTGET/LAYOUTRETURN/CB_LAYOUTRECALL calls, we
must ensure that the seqids return accurately represent the order of
operations. The simplest way to do that is to ensure that operations on
a single stateid are serialized.

This patch adds a mutex to the layout stateid, and locks it when
checking the layout stateid's seqid. The mutex is held over the entire
operation and released after the seqid is bumped.

Note that in the case of CB_LAYOUTRECALL we must move the increment of
the seqid and setting into a new cb "prepare" operation. The lease
infrastructure will call the lm_break callback with a spinlock held, so
and we can't take the mutex in that codepath.

Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 1ca4b88e 09-Jul-2015 Kinglong Mee <kinglongmee@gmail.com>

nfsd: Fix a file leak on nfsd4_layout_setlease failure

If nfsd4_layout_setlease fails, nfsd will not put ls->ls_file.

Fix commit c5c707f96f "nfsd: implement pNFS layout recalls".

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# f3f03330 30-Mar-2015 Christoph Hellwig <hch@lst.de>

nfsd: require an explicit option to enable pNFS

Turns out sending out layouts to any client is a bad idea if they
can't get at the storage device, so require explicit admin action
to enable pNFS.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 7890203d 22-Mar-2015 Kinglong Mee <kinglongmee@gmail.com>

NFSD: Fix bad update of layout in nfsd4_return_file_layout

With return layout as, (seg is return layout, lo is record layout)
seg->offset <= lo->offset and layout_end(seg) < layout_end(lo),
nfsd should update lo's offset to seg's end,
and,
seg->offset > lo->offset and layout_end(seg) >= layout_end(lo),
nfsd should update lo's end to seg's offset.

Fixes: 9cf514ccfa ("nfsd: implement pNFS operations")
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 6f8f28ec 19-Mar-2015 Kinglong Mee <kinglongmee@gmail.com>

NFSD: Check layout type when returning client layouts

According to RFC5661:
" When lr_returntype is LAYOUTRETURN4_FSID, the current filehandle is used
to identify the file system and all layouts matching the client ID,
the fsid of the file system, lora_layout_type, and lora_iomode are
returned. When lr_returntype is LAYOUTRETURN4_ALL, all layouts
matching the client ID, lora_layout_type, and lora_iomode are
returned and the current filehandle is not used. "

When returning client layouts, always check layout type.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 715a03d2 20-Mar-2015 Kinglong Mee <kinglongmee@gmail.com>

NFSD: restore trace event lost in mismerge

31ef83dc05 "nfsd: add trace events" had a typo that dropped a trace
event and replaced it by an incorrect recursive call to
nfsd4_cb_layout_fail. 133d558216d9 "Subject: nfsd: don't recursively
call nfsd4_cb_layout_fail" fixed the crash, this restores the
tracepoint.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 133d5582 05-Mar-2015 Christoph Hellwig <hch@lst.de>

Subject: nfsd: don't recursively call nfsd4_cb_layout_fail

Due to a merge error when creating c5c707f9 ("nfsd: implement pNFS
layout recalls"), we recursively call nfsd4_cb_layout_fail from itself,
leading to stack overflows.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Fixes: c5c707f9 ("nfsd: implement pNFS layout recalls")
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
---
fs/nfsd/nfs4layouts.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/fs/nfsd/nfs4layouts.c b/fs/nfsd/nfs4layouts.c
index 3c1bfa1..1028a06 100644
--- a/fs/nfsd/nfs4layouts.c
+++ b/fs/nfsd/nfs4layouts.c
@@ -587,8 +587,6 @@ nfsd4_cb_layout_fail(struct nfs4_layout_stateid *ls)

rpc_ntop((struct sockaddr *)&clp->cl_addr, addr_str, sizeof(addr_str));

- nfsd4_cb_layout_fail(ls);
-
printk(KERN_WARNING
"nfsd: client %s failed to respond to layout recall. "
" Fencing..\n", addr_str);
--
1.9.1


# 8650b8a0 21-Jan-2015 Christoph Hellwig <hch@lst.de>

nfsd: pNFS block layout driver

Add a small shim between core nfsd and filesystems to translate the
somewhat cumbersome pNFS data structures and semantics to something
more palatable for Linux filesystems.

Thanks to Rick McNeal for the old prototype pNFS blocklayout server
code, which gave a lot of inspiration to this version even if no
code is left from it.

Signed-off-by: Christoph Hellwig <hch@lst.de>


# 31ef83dc 16-Aug-2014 Christoph Hellwig <hch@lst.de>

nfsd: add trace events

For now just a few simple events to trace the layout stateid lifetime, but
these already were enough to find several bugs in the Linux client layout
stateid handling.

Signed-off-by: Christoph Hellwig <hch@lst.de>


# c5c707f9 22-Sep-2014 Christoph Hellwig <hch@lst.de>

nfsd: implement pNFS layout recalls

Add support to issue layout recalls to clients. For now we only support
full-file recalls to get a simple and stable implementation. This allows
to embedd a nfsd4_callback structure in the layout_state and thus avoid
any memory allocations under spinlocks during a recall. For normal
use cases that do not intent to share a single file between multiple
clients this implementation is fully sufficient.

To ensure layouts are recalled on local filesystem access each layout
state registers a new FL_LAYOUT lease with the kernel file locking code,
which filesystems that support pNFS exports that require recalls need
to break on conflicting access patterns.

The XDR code is based on the old pNFS server implementation by
Andy Adamson, Benny Halevy, Boaz Harrosh, Dean Hildebrand, Fred Isaman,
Marc Eshel, Mike Sager and Ricardo Labiaga.

Signed-off-by: Christoph Hellwig <hch@lst.de>


# 9cf514cc 05-May-2014 Christoph Hellwig <hch@lst.de>

nfsd: implement pNFS operations

Add support for the GETDEVICEINFO, LAYOUTGET, LAYOUTCOMMIT and
LAYOUTRETURN NFSv4.1 operations, as well as backing code to manage
outstanding layouts and devices.

Layout management is very straight forward, with a nfs4_layout_stateid
structure that extends nfs4_stid to manage layout stateids as the
top-level structure. It is linked into the nfs4_file and nfs4_client
structures like the other stateids, and contains a linked list of
layouts that hang of the stateid. The actual layout operations are
implemented in layout drivers that are not part of this commit, but
will be added later.

The worst part of this commit is the management of the pNFS device IDs,
which suffers from a specification that is not sanely implementable due
to the fact that the device-IDs are global and not bound to an export,
and have a small enough size so that we can't store the fsid portion of
a file handle, and must never be reused. As we still do need perform all
export authentication and validation checks on a device ID passed to
GETDEVICEINFO we are caught between a rock and a hard place. To work
around this issue we add a new hash that maps from a 64-bit integer to a
fsid so that we can look up the export to authenticate against it,
a 32-bit integer as a generation that we can bump when changing the device,
and a currently unused 32-bit integer that could be used in the future
to handle more than a single device per export. Entries in this hash
table are never deleted as we can't reuse the ids anyway, and would have
a severe lifetime problem anyway as Linux export structures are temporary
structures that can go away under load.

Parts of the XDR data, structures and marshaling/unmarshaling code, as
well as many concepts are derived from the old pNFS server implementation
from Andy Adamson, Benny Halevy, Dean Hildebrand, Marc Eshel, Fred Isaman,
Mike Sager, Ricardo Labiaga and many others.

Signed-off-by: Christoph Hellwig <hch@lst.de>