History log of /linux-master/fs/nfsd/state.h
Revision Date Author Comments
# 8ddb7142 23-Apr-2024 Chuck Lever <chuck.lever@oracle.com>

Revert "NFSD: Convert the callback workqueue to use delayed_work"

This commit was a pre-requisite for commit c1ccfcf1a9bf ("NFSD:
Reschedule CB operations when backchannel rpc_clnt is shut down"),
which has already been reverted.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# c5967721 15-Feb-2024 Dai Ngo <dai.ngo@oracle.com>

NFSD: handle GETATTR conflict with write delegation

If the GETATTR request on a file that has write delegation in effect
and the request attributes include the change info and size attribute
then the request is handled as below:

Server sends CB_GETATTR to client to get the latest change info and file
size. If these values are the same as the server's cached values then
the GETATTR proceeds as normal.

If either the change info or file size is different from the server's
cached values, or the file was already marked as modified, then:

. update time_modify and time_metadata into file's metadata
with current time

. encode GETATTR as normal except the file size is encoded with
the value returned from CB_GETATTR

. mark the file as modified

If the CB_GETATTR fails for any reasons, the delegation is recalled
and NFS4ERR_DELAY is returned for the GETATTR.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 6487a13b 15-Feb-2024 Dai Ngo <dai.ngo@oracle.com>

NFSD: add support for CB_GETATTR callback

Includes:
. CB_GETATTR proc for nfs4_cb_procedures[]
. XDR encoding and decoding function for CB_GETATTR request/reply
. add nfs4_cb_fattr to nfs4_delegation for sending CB_GETATTR
and store file attributes from client's reply.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# f8104027 08-Feb-2024 Chen Hanxiao <chenhx.fnst@fujitsu.com>

nfsd: clean up comments over nfs4_client definition

nfsd fault injection has been deprecated since
commit 9d60d93198c6 ("Deprecate nfsd fault injection")
and removed by
commit e56dc9e2949e ("nfsd: remove fault injection code")

So remove the outdated parts about fault injection.

Signed-off-by: Chen Hanxiao <chenhx.fnst@fujitsu.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 1ac3629b 29-Jan-2024 NeilBrown <neilb@suse.de>

nfsd: prepare for supporting admin-revocation of state

The NFSv4 protocol allows state to be revoked by the admin and has error
codes which allow this to be communicated to the client.

This patch
- introduces a new state-id status SC_STATUS_ADMIN_REVOKED
which can be set on open, lock, or delegation state.
- reports NFS4ERR_ADMIN_REVOKED when these are accessed
- introduces a per-client counter of these states and returns
SEQ4_STATUS_ADMIN_STATE_REVOKED when the counter is not zero.
Decrements this when freeing any admin-revoked state.
- introduces stub code to find all interesting states for a given
superblock so they can be revoked via the 'unlock_filesystem'
file in /proc/fs/nfsd/
No actual states are handled yet.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 3f29cc82 29-Jan-2024 NeilBrown <neilb@suse.de>

nfsd: split sc_status out of sc_type

sc_type identifies the type of a state - open, lock, deleg, layout - and
also the status of a state - closed or revoked.

This is a bit untidy and could get worse when "admin-revoked" states are
added. So clean it up.

With this patch, the type is now all that is stored in sc_type. This is
zero when the state is first added to ->cl_stateids (causing it to be
ignored), and is then set appropriately once it is fully initialised.
It is set under ->cl_lock to ensure atomicity w.r.t lookup. It is now
never cleared.

sc_type is still a bit-set even though at most one bit is set. This allows
lookup functions to be given a bitmap of acceptable types.

sc_type is now an unsigned short rather than char. There is no value in
restricting to just 8 bits.

All the constants now start SC_TYPE_ matching the field in which they
are stored. Keeping the existing names and ensuring clear separation
from non-type flags would have required something like
NFS4_STID_TYPE_CLOSED which is cumbersome. The "NFS4" prefix is
redundant was they only appear in NFS4 code, so remove that and change
STID to SC to match the field.

The status is stored in a separate unsigned short named "sc_status". It
has two flags: SC_STATUS_CLOSED and SC_STATUS_REVOKED.
CLOSED combines NFS4_CLOSED_STID, NFS4_CLOSED_DELEG_STID, and is used
for SC_TYPE_LOCK and SC_TYPE_LAYOUT instead of setting the sc_type to zero.
These flags are only ever set, never cleared.
For deleg stateids they are set under the global state_lock.
For open and lock stateids they are set under ->cl_lock.
For layout stateids they are set under ->ls_lock

nfs4_unhash_stid() has been removed, and we never set sc_type = 0. This
was only used for LOCK and LAYOUT stids and they now use
SC_STATUS_CLOSED.

Also TRACE_DEFINE_NUM() calls for the various STID #define have been
removed because these things are not enums, and so that call is
incorrect.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# fe0e9580 25-Jan-2024 Chuck Lever <chuck.lever@oracle.com>

NFSD: Convert the callback workqueue to use delayed_work

Normally, NFSv4 callback operations are supposed to be sent to the
client as soon as they are queued up.

In a moment, I will introduce a recovery path where the server has
to wait for the client to reconnect. We don't want a hard busy wait
here -- the callback should be requeued to try again in several
milliseconds.

For now, convert nfsd4_callback from struct work_struct to struct
delayed_work, and queue with a zero delay argument. This should
avoid behavior changes for current operation.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 1227561c 15-Dec-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Revert 738401a9bd1ac34ccd5723d69640a4adbb1a4bc0

There's nothing wrong with this commit, but this is dead code now
that nothing triggers a CB_GETATTR callback. It can be re-introduced
once the issues with handling conflicting GETATTRs are resolved.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 862bee84 16-Dec-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Revert 6c41d9a9bd0298002805758216a9c44e38a8500d

For some reason, the wait_on_bit() in nfsd4_deleg_getattr_conflict()
is waiting forever, preventing a clean server shutdown. The
requesting client might also hang waiting for a reply to the
conflicting GETATTR.

Invoking wait_on_bit() in an nfsd thread context is a hazard. The
correct fix is to replace this wait_on_bit() call site with a
mechanism that defers the conflicting GETATTR until the CB_GETATTR
completes or is known to have failed.

That will require some surgery and extended testing and it's late
in the v6.7-rc cycle, so I'm reverting now in favor of trying again
in a subsequent kernel release.

This is my fault: I should have recognized the ramifications of
calling wait_on_bit() in here before accepting this patch.

Thanks to Dai Ngo <dai.ngo@oracle.com> for diagnosing the issue.

Reported-by: Wolfgang Walter <linux-nfs@stwm.de>
Closes: https://lore.kernel.org/linux-nfs/e3d43ecdad554fbdcaa7181833834f78@stwm.de/
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# bd9d6a3e 11-Sep-2023 Lorenzo Bianconi <lorenzo@kernel.org>

NFSD: add rpc_status netlink support

Introduce rpc_status netlink support for NFSD in order to dump pending
RPC requests debugging information from userspace.

Closes: https://bugzilla.linux-nfs.org/show_bug.cgi?id=366
Tested-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 6c41d9a9 13-Sep-2023 Dai Ngo <dai.ngo@oracle.com>

NFSD: handle GETATTR conflict with write delegation

If the GETATTR request on a file that has write delegation in effect
and the request attributes include the change info and size attribute
then the request is handled as below:

Server sends CB_GETATTR to client to get the latest change info and file
size. If these values are the same as the server's cached values then
the GETATTR proceeds as normal.

If either the change info or file size is different from the server's
cached values, or the file was already marked as modified, then:

. update time_modify and time_metadata into file's metadata
with current time

. encode GETATTR as normal except the file size is encoded with
the value returned from CB_GETATTR

. mark the file as modified

If the CB_GETATTR fails for any reasons, the delegation is recalled
and NFS4ERR_DELAY is returned for the GETATTR.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 738401a9 13-Sep-2023 Dai Ngo <dai.ngo@oracle.com>

NFSD: add support for CB_GETATTR callback

Includes:
. CB_GETATTR proc for nfs4_cb_procedures[]
. XDR encoding and decoding function for CB_GETATTR request/reply
. add nfs4_cb_fattr to nfs4_delegation for sending CB_GETATTR
and store file attributes from client's reply.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# fd19ca36 29-Jun-2023 Dai Ngo <dai.ngo@oracle.com>

NFSD: handle GETATTR conflict with write delegation

If the GETATTR request on a file that has write delegation in effect and
the request attributes include the change info and size attribute then
the write delegation is recalled. If the delegation is returned within
30ms then the GETATTR is serviced as normal otherwise the NFS4ERR_DELAY
error is returned for the GETATTR.

Add counter for write delegation recall due to conflict GETATTR. This is
used to evaluate the need to implement CB_GETATTR to adoid recalling the
delegation with conflit GETATTR.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# fcb53097 17-Jan-2023 Jeff Layton <jlayton@kernel.org>

nfsd: don't take nfsd4_copy ref for OP_OFFLOAD_STATUS

We're not doing any blocking operations for OP_OFFLOAD_STATUS, so taking
and putting a reference is a waste of effort. Take the client lock,
search for the copy and fetch the wr_bytes_written field and return.

Also, make find_async_copy a static function.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 44df6f43 16-Nov-2022 Dai Ngo <dai.ngo@oracle.com>

NFSD: add delegation reaper to react to low memory condition

The delegation reaper is called by nfsd memory shrinker's on
the 'count' callback. It scans the client list and sends the
courtesy CB_RECALL_ANY to the clients that hold delegations.

To avoid flooding the clients with CB_RECALL_ANY requests, the
delegation reaper sends only one CB_RECALL_ANY request to each
client per 5 seconds.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
[ cel: moved definition of RCA4_TYPE_MASK_RDATA_DLG ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 3959066b 16-Nov-2022 Dai Ngo <dai.ngo@oracle.com>

NFSD: add support for sending CB_RECALL_ANY

Add XDR encode and decode function for CB_RECALL_ANY.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# d47b295e 28-Oct-2022 Chuck Lever <chuck.lever@oracle.com>

NFSD: Use rhashtable for managing nfs4_file objects

fh_match() is costly, especially when filehandles are large (as is
the case for NFSv4). It needs to be used sparingly when searching
data structures. Unfortunately, with common workloads, I see
multiple thousands of objects stored in file_hashtbl[], which has
just 256 buckets, making its bucket hash chains quite lengthy.

Walking long hash chains with the state_lock held blocks other
activity that needs that lock. Sizable hash chains are a common
occurrance once the server has handed out some delegations, for
example -- IIUC, each delegated file is held open on the server by
an nfs4_file object.

To help mitigate the cost of searching with fh_match(), replace the
nfs4_file hash table with an rhashtable, which can dynamically
resize its bucket array to minimize hash chain length.

The result of this modification is an improvement in the latency of
NFSv4 operations, and the reduction of nfsd CPU utilization due to
eliminating the cost of multiple calls to fh_match() and reducing
the CPU cache misses incurred while walking long hash chains in the
nfs4_file hash table.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>


# b95239ca 26-Sep-2022 Jeff Layton <jlayton@kernel.org>

nfsd: make nfsd4_run_cb a bool return function

queue_work can return false and not queue anything, if the work is
already queued. If that happens in the case of a CB_RECALL, we'll have
taken an extra reference to the stid that will never be put. Ensure we
throw a warning in that case.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 781fde1a 22-Sep-2022 Chuck Lever <chuck.lever@oracle.com>

NFSD: Rename the fields in copy_stateid_t

Code maintenance: The name of the copy_stateid_t::sc_count field
collides with the sc_count field in struct nfs4_stid, making the
latter difficult to grep for when auditing stateid reference
counting.

No behavior change expected.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 18224dc5 09-Sep-2022 Gaosheng Cui <cuigaosheng1@huawei.com>

nfsd: remove nfsd4_prepare_cb_recall() declaration

nfsd4_prepare_cb_recall() has been removed since
commit 0162ac2b978e ("nfsd: introduce nfsd4_callback_ops"),
so remove it.

Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 80e591ce 02-Sep-2022 Chuck Lever <chuck.lever@oracle.com>

NFSD: Increase NFSD_MAX_OPS_PER_COMPOUND

When attempting an NFSv4 mount, a Solaris NFSv4 client builds a
single large COMPOUND that chains a series of LOOKUPs to get to the
pseudo filesystem root directory that is to be mounted. The Linux
NFS server's current maximum of 16 operations per NFSv4 COMPOUND is
not large enough to ensure that this works for paths that are more
than a few components deep.

Since NFSD_MAX_OPS_PER_COMPOUND is mostly a sanity check, and most
NFSv4 COMPOUNDS are between 3 and 6 operations (thus they do not
trigger any re-allocation of the operation array on the server),
increasing this maximum should result in little to no impact.

The ops array can get large now, so allocate it via vmalloc() to
help ensure memory fragmentation won't cause an allocation failure.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=216383
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 8ea6e2c9 27-Jul-2022 Chuck Lever <chuck.lever@oracle.com>

NFSD: Make nfs4_put_copy() static

Clean up: All call sites are in fs/nfsd/nfs4proc.c.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 66af2579 02-May-2022 Dai Ngo <dai.ngo@oracle.com>

NFSD: add courteous server support for thread with only delegation

This patch provides courteous server support for delegation only.
Only expired client with delegation but no conflict and no open
or lock state is allowed to be in COURTESY state.

Delegation conflict with COURTESY/EXPIRABLE client is resolved by
setting it to EXPIRABLE, queue work for the laundromat and return
delay to the caller. Conflict is resolved when the laudromat runs
and expires the EXIRABLE client while the NFS client retries the
OPEN request. Local thread request that gets conflict is doing the
retry in _break_lease.

Client in COURTESY or EXPIRABLE state is allowed to reconnect and
continues to have access to its state. Access to the nfs4_client by
the reconnecting thread and the laundromat is serialized via the
client_lock.

Reviewed-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 47446d74 16-Dec-2021 Vasily Averin <vvs@virtuozzo.com>

nfsd4: add refcount for nfsd4_blocked_lock

nbl allocated in nfsd4_lock can be released by a several ways:
directly in nfsd4_lock(), via nfs4_laundromat(), via another nfs
command RELEASE_LOCKOWNER or via nfsd4_callback.
This structure should be refcounted to be used and released correctly
in all these cases.

Refcount is initialized to 1 during allocation and is incremented
when nbl is added into nbl_list/nbl_lru lists.

Usually nbl is linked into both lists together, so only one refcount
is used for both lists.

However nfsd4_lock() should keep in mind that nbl can be present
in one of lists only. This can happen if nbl was handled already
by nfs4_laundromat/nfsd4_callback/etc.

Refcount is decremented if vfs_lock_file() returns FILE_LOCK_DEFERRED,
because nbl can be handled already by nfs4_laundromat/nfsd4_callback/etc.

Refcount is not changed in find_blocked_lock() because of it reuses counter
released after removing nbl from lists.

Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 3dcd1d8a 07-Dec-2021 J. Bruce Fields <bfields@redhat.com>

nfsd: improve stateid access bitmask documentation

The use of the bitmaps is confusing. Add a cross-reference to make it
easier to find the existing comment. Add an updated reference with URL
to make it quicker to look up. And a bit more editorializing about the
value of this.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# a0ce4837 16-Apr-2021 J. Bruce Fields <bfields@redhat.com>

nfsd: track filehandle aliasing in nfs4_files

It's unusual but possible for multiple filehandles to point to the same
file. In that case, we may end up with multiple nfs4_files referencing
the same inode.

For delegation purposes it will turn out to be useful to flag those
cases.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# f9b60e22 16-Apr-2021 J. Bruce Fields <bfields@redhat.com>

nfsd: hash nfs4_files by inode number

The nfs4_file structure is per-filehandle, not per-inode, because the
spec requires open and other state to be per filehandle.

But it will turn out to be convenient for nfs4_files associated with the
same inode to be hashed to the same bucket, so let's hash on the inode
instead of the filehandle.

Filehandle aliasing is rare, so that shouldn't have much performance
impact.

(If you have a ton of exported filesystems, though, and all of them have
a root with inode number 2, could that get you an overlong hash chain?
Perhaps this (and the v4 open file cache) should be hashed on the inode
pointer instead.)

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 472d155a 19-Mar-2021 NeilBrown <neilb@suse.de>

nfsd: report client confirmation status in "info" file

mountd can now monitor clients appearing and disappearing in
/proc/fs/nfsd/clients, and will log these events, in liu of the logging
of mount/unmount events for NFSv3.

Currently it cannot distinguish between unconfirmed clients (which might
be transient and totally uninteresting) and confirmed clients.

So add a "status: " line which reports either "confirmed" or
"unconfirmed", and use fsnotify to report that the info file
has been modified.

This requires a bit of infrastructure to keep the dentry for the "info"
file. There is no need to take a counted reference as the dentry must
remain around until the client is removed.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 1722b046 21-Jan-2021 J. Bruce Fields <bfields@redhat.com>

nfsd: simplify nfsd4_check_open_reclaim

The set_client() was already taken care of by process_open1().

The comments here are mostly redundant with the code.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# e56dc9e2 30-Jul-2020 J. Bruce Fields <bfields@redhat.com>

nfsd: remove fault injection code

It was an interesting idea but nobody seems to be using it, it's buggy
at this point, and nfs4state.c is already complicated enough without it.
The new nfsd/clients/ code provides some of the same functionality, and
could probably do more if desired.

This feature has been deprecated since 9d60d93198c6 ("Deprecate nfsd
fault injection").

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# dd5e3fbc 05-Apr-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add tracepoints to the NFSD state management code

Capture obvious events and replace dprintk() call sites. Introduce
infrastructure so that adding more tracepoints in this code later
is simplified.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 20b7d86f 04-Nov-2019 Arnd Bergmann <arnd@arndb.de>

nfsd: use boottime for lease expiry calculation

A couple of time_t variables are only used to track the state of the
lease time and its expiration. The code correctly uses the 'time_after()'
macro to make this work on 32-bit architectures even beyond year 2038,
but the get_seconds() function and the time_t type itself are deprecated
as they behave inconsistently between 32-bit and 64-bit architectures
and often lead to code that is not y2038 safe.

As a minor issue, using get_seconds() leads to problems with concurrent
settimeofday() or clock_settime() calls, in the worst case timeout never
triggering after the time has been set backwards.

Change nfsd to use time64_t and ktime_get_boottime_seconds() here. This
is clearly excessive, as boottime by itself means we never go beyond 32
bits, but it does mean we handle this correctly and consistently without
having to worry about corner cases and should be no more expensive than
the previous implementation on 64-bit architectures.

The max_cb_time() function gets changed in order to avoid an expensive
64-bit division operation, but as the lease time is at most one hour,
there is no change in behavior.

Also do the same for server-to-server copy expiration time.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
[bfields@redhat.com: fix up copy expiration]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 9594497f 04-Nov-2019 Arnd Bergmann <arnd@arndb.de>

nfsd: fix jiffies/time_t mixup in LRU list

The nfsd4_blocked_lock->nbl_time timestamp is recorded in jiffies,
but then compared to a CLOCK_REALTIME timestamp later on, which makes
no sense.

For consistency with the other timestamps, change this to use a time_t.

This is a change in behavior, which may cause regressions, but the
current code is not sensible. On a system with CONFIG_HZ=1000,
the 'time_after((unsigned long)nbl->nbl_time, (unsigned long)cutoff))'
check is false for roughly the first 18 days of uptime and then true
for the next 49 days.

Fixes: 7919d0a27f1e ("nfsd: add a LRU list for blocked locks")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# e29f4703 31-Oct-2019 Arnd Bergmann <arnd@arndb.de>

nfsd: print 64-bit timestamps in client_info_show

The nii_time field gets truncated to 'time_t' on 32-bit architectures
before printing.

Remove the use of 'struct timespec' to product the correct output
beyond 2038.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# ce0887ac 09-Oct-2019 Olga Kornievskaia <kolga@netapp.com>

NFSD add nfs4 inter ssc to nfsd4_copy

Given a universal address, mount the source server from the destination
server. Use an internal mount. Call the NFS client nfs42_ssc_open to
obtain the NFS struct file suitable for nfsd_copy_range.

Ability to do "inter" server-to-server depends on the an nfsd kernel
parameter "inter_copy_offload_enable".

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>


# 624322f1 04-Oct-2019 Olga Kornievskaia <olga.kornievskaia@gmail.com>

NFSD add COPY_NOTIFY operation

Introducing the COPY_NOTIFY operation.

Create a new unique stateid that will keep track of the copy
state and the upcoming READs that will use that stateid.
Each associated parent stateid has a list of copy
notify stateids. A copy notify structure makes a copy of
the parent stateid and a clientid and will use it to look
up the parent stateid during the READ request (suggested
by Trond Myklebust <trond.myklebust@hammerspace.com>).

At nfs4_put_stid() time, we walk the list of the associated
copy notify stateids and delete them.

Laundromat thread will traverse globally stored copy notify
stateid in idr and notice if any haven't been referenced in the
lease period, if so, it'll remove them.

Return single netaddr to advertise to the copy.

Suggested-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Andy Adamson <andros@netapp.com>


# 2bbfed98 23-Oct-2019 Trond Myklebust <trondmy@gmail.com>

nfsd: Fix races between nfsd4_cb_release() and nfsd4_shutdown_callback()

When we're destroying the client lease, and we call
nfsd4_shutdown_callback(), we must ensure that we do not return
before all outstanding callbacks have terminated and have
released their payloads.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 6ee95d1c 09-Sep-2019 Scott Mayhew <smayhew@redhat.com>

nfsd: add support for upcall version 2

Version 2 upcalls will allow the nfsd to include a hash of the kerberos
principal string in the Cld_Create upcall. If a principal is present in
the svc_cred, then the hash will be included in the Cld_Create upcall.
We attempt to use the svc_cred.cr_raw_principal (which is returned by
gssproxy) first, and then fall back to using the svc_cred.cr_principal
(which is returned by both gssproxy and rpc.svcgssd). Upon a subsequent
restart, the hash will be returned in the Cld_Gracestart downcall and
stored in the reclaim_str_hashtbl so it can be used when handling
reclaim opens.

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 5c4583b2 18-Aug-2019 Jeff Layton <jeff.layton@primarydata.com>

nfsd: hook up nfs4_preprocess_stateid_op to the nfsd_file cache

Have nfs4_preprocess_stateid_op pass back a nfsd_file instead of a filp.
Since we now presume that the struct file will be persistent in most
cases, we can stop fiddling with the raparms in the read code. This
also means that we don't really care about the rd_tmp_file field
anymore.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# eb82dd39 18-Aug-2019 Jeff Layton <jeff.layton@primarydata.com>

nfsd: convert fi_deleg_file and ls_file fields to nfsd_file

Have them keep an nfsd_file reference instead of a struct file.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# fd4f83fd 18-Aug-2019 Jeff Layton <jeff.layton@primarydata.com>

nfsd: convert nfs4_file->fi_fds array to use nfsd_files

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 79123444 04-Jun-2019 J. Bruce Fields <bfields@redhat.com>

nfsd: decode implementation id

Decode the implementation ID and display in nfsd/clients/#/info. It may
be help identify the client. It won't be used otherwise.

(When this went into the protocol, I thought the implementation ID would
be a slippery slope towards implementation-specific workarounds as with
the http user-agent. But I guess I was wrong, the risk seems pretty low
now.)

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# e8a79fb1 22-Mar-2019 J. Bruce Fields <bfields@redhat.com>

nfsd: add nfsd/clients directory

I plan to expose some information about nfsv4 clients here.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 59f8e91b 20-Mar-2019 J. Bruce Fields <bfields@redhat.com>

nfsd4: use reference count to free client

Keep a second reference count which is what is really used to decide
when to free the client's memory.

Next I'm going to add an nfsd/clients/ directory with a subdirectory for
each NFSv4 client. File objects under nfsd/clients/ will hold these
references.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 14ed14cc 20-Mar-2019 J. Bruce Fields <bfields@redhat.com>

nfsd: rename cl_refcount

Rename this to a more descriptive name: it counts the number of
in-progress rpc's referencing this client.

Next I'm going to add a second refcount with a slightly different use.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 15b6ff95 12-Jun-2019 Greg Kroah-Hartman <gregkh@linuxfoundation.org>

nfsd: no need to check return value of debugfs_create functions

When calling debugfs functions, there is no need to ever check the
return value. The function can work or not, but the code logic should
never do something different based on this.

Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: linux-nfs@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20190612152603.GB18440@kroah.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 6b189105 26-Mar-2019 Scott Mayhew <smayhew@redhat.com>

nfsd: make nfs4_client_reclaim use an xdr_netobj instead of a fixed char array

This will allow the reclaim_str_hashtbl to store either the recovery
directory names used by the legacy client tracking code or the full
client strings used by the nfsdcld client tracking code.

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# e6abc8ca 05-Apr-2019 Trond Myklebust <trondmy@gmail.com>

nfsd: Don't release the callback slot unless it was actually held

If there are multiple callbacks queued, waiting for the callback
slot when the callback gets shut down, then they all currently
end up acting as if they hold the slot, and call
nfsd4_cb_sequence_done() resulting in interesting side-effects.

In addition, the 'retry_nowait' path in nfsd4_cb_sequence_done()
causes a loop back to nfsd4_cb_prepare() without first freeing the
slot, which causes a deadlock when nfsd41_cb_get_slot() gets called
a second time.

This patch therefore adds a boolean to track whether or not the
callback did pick up the slot, so that it can do the right thing
in these 2 cases.

Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# a52458b4 02-Dec-2018 NeilBrown <neilb@suse.com>

NFS/NFSD/SUNRPC: replace generic creds with 'struct cred'.

SUNRPC has two sorts of credentials, both of which appear as
"struct rpc_cred".
There are "generic credentials" which are supplied by clients
such as NFS and passed in 'struct rpc_message' to indicate
which user should be used to authorize the request, and there
are low-level credentials such as AUTH_NULL, AUTH_UNIX, AUTH_GSS
which describe the credential to be sent over the wires.

This patch replaces all the generic credentials by 'struct cred'
pointers - the credential structure used throughout Linux.

For machine credentials, there is a special 'struct cred *' pointer
which is statically allocated and recognized where needed as
having a special meaning. A look-up of a low-level cred will
map this to a machine credential.

Signed-off-by: NeilBrown <neilb@suse.com>
Acked-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>


# e0639dc5 20-Jul-2018 Olga Kornievskaia <kolga@netapp.com>

NFSD introduce async copy feature

Upon receiving a request for async copy, create a new kthread. If we
get asynchronous request, make sure to copy the needed arguments/state
from the stack before starting the copy. Then start the thread and reply
back to the client indicating copy is asynchronous.

nfsd_copy_file_range() will copy in a loop over the total number of
bytes is needed to copy. In case a failure happens in the middle, we
ignore the error and return how much we copied so far. Once done
creating a workitem for the callback workqueue and send CB_OFFLOAD with
the results.

The lifetime of the copy stateid is bound to the vfs copy. This way we
don't need to keep the nfsd_net structure for the callback. We could
keep it around longer so that an OFFLOAD_STATUS that came late would
still get results, but clients should be able to deal without that.

We handle OFFLOAD_CANCEL by sending a signal to the copy thread and
calling kthread_stop.

A client should cancel any ongoing copies before calling DESTROY_CLIENT;
if not, we return a CLIENT_BUSY error.

If the client is destroyed for some other reason (lease expiration, or
server shutdown), we must clean up any ongoing copies ourselves.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
[colin.king@canonical.com: fix leak in error case]
[bfields@fieldses.org: remove signalling, merge patches]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 9eb190fca8f 20-Jul-2018 Olga Kornievskaia <kolga@netapp.com>

NFSD CB_OFFLOAD xdr

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# a26dd64f 15-Aug-2018 Chuck Lever <chuck.lever@oracle.com>

nfsd: Remove callback_cred

Clean up: The global callback_cred is no longer used, so it can be
removed.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 818a34eb 19-Oct-2017 Elena Reshetova <elena.reshetova@intel.com>

fs, nfsd: convert nfs4_file.fi_ref from atomic_t to refcount_t

atomic_t variables are currently used to implement reference
counters with the following properties:
- counter is initialized to 1 using atomic_set()
- a resource is freed upon counter reaching zero
- once counter reaches zero, its further
increments aren't allowed
- counter schema uses basic atomic operations
(set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable nfs4_file.fi_ref is used as pure reference counter.
Convert it to refcount_t and fix up the operations.

Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# cff7cb2e 19-Oct-2017 Elena Reshetova <elena.reshetova@intel.com>

fs, nfsd: convert nfs4_cntl_odstate.co_odcount from atomic_t to refcount_t

atomic_t variables are currently used to implement reference
counters with the following properties:
- counter is initialized to 1 using atomic_set()
- a resource is freed upon counter reaching zero
- once counter reaches zero, its further
increments aren't allowed
- counter schema uses basic atomic operations
(set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable nfs4_cntl_odstate.co_odcount is used as pure reference counter.
Convert it to refcount_t and fix up the operations.

Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# a15dfcd5 19-Oct-2017 Elena Reshetova <elena.reshetova@intel.com>

fs, nfsd: convert nfs4_stid.sc_count from atomic_t to refcount_t

atomic_t variables are currently used to implement reference
counters with the following properties:
- counter is initialized to 1 using atomic_set()
- a resource is freed upon counter reaching zero
- once counter reaches zero, its further
increments aren't allowed
- counter schema uses basic atomic operations
(set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable nfs4_stid.sc_count is used as pure reference counter.
Convert it to refcount_t and fix up the operations.

Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 53da6a53 17-Oct-2017 J. Bruce Fields <bfields@redhat.com>

nfsd4: catch some false session retries

The spec allows us to return NFS4ERR_SEQ_FALSE_RETRY if we notice that
the client is making a call that matches a previous (slot, seqid) pair
but that *isn't* actually a replay, because some detail of the call
doesn't actually match the previous one.

Catching every such case is difficult, but we may as well catch a few
easy ones. This also handles the case described in the previous patch,
in a different way.

The spec does however require us to catch the case where the difference
is in the rpc credentials. This prevents somebody from snooping another
user's replies by fabricating retries.

(But the practical value of the attack is limited by the fact that the
replies with the most sensitive data are READ replies, which are not
normally cached.)

Tested-by: Olga Kornievskaia <aglo@umich.edu>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 085def3a 18-Oct-2017 J. Bruce Fields <bfields@redhat.com>

nfsd4: fix cached replies to solo SEQUENCE compounds

Currently our handling of 4.1+ requests without "cachethis" set is
confusing and not quite correct.

Suppose a client sends a compound consisting of only a single SEQUENCE
op, and it matches the seqid in a session slot (so it's a retry), but
the previous request with that seqid did not have "cachethis" set.

The obvious thing to do might be to return NFS4ERR_RETRY_UNCACHED_REP,
but the protocol only allows that to be returned on the op following the
SEQUENCE, and there is no such op in this case.

The protocol permits us to cache replies even if the client didn't ask
us to. And it's easy to do so in the case of solo SEQUENCE compounds.

So, when we get a solo SEQUENCE, we can either return the previously
cached reply or NFSERR_SEQ_FALSE_RETRY if we notice it differs in some
way from the original call.

Currently, we're returning a corrupt reply in the case a solo SEQUENCE
matches a previous compound with more ops. This actually matters
because the Linux client recently started doing this as a way to recover
from lost replies to idempotent operations in the case the process doing
the original reply was killed: in that case it's difficult to keep the
original arguments around to do a real retry, and the client no longer
cares what the result is anyway, but it would like to make sure that the
slot's sequence id has been incremented, and the solo SEQUENCE assures
that: if the server never got the original reply, it will increment the
sequence id. If it did get the original reply, it won't increment, and
nothing else that about the reply really matters much. But we can at
least attempt to return valid xdr!

Tested-by: Olga Kornievskaia <aglo@umich.edu>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# f7d1ddbe 04-Feb-2017 Kinglong Mee <kinglongmee@gmail.com>

nfsd/callback: Cleanup callback cred on shutdown

The rpccred gotten from rpc_lookup_machine_cred() should be put when
state is shutdown.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# d19fb70d 18-Jan-2017 Kinglong Mee <kinglongmee@gmail.com>

NFSD: Fix a null reference case in find_or_create_lock_stateid()

nfsd assigns the nfs4_free_lock_stateid to .sc_free in init_lock_stateid().

If nfsd doesn't go through init_lock_stateid() and put stateid at end,
there is a NULL reference to .sc_free when calling nfs4_put_stid(ns).

This patch let the nfs4_stid.sc_free assignment to nfs4_alloc_stid().

Cc: stable@vger.kernel.org
Fixes: 356a95ece7aa "nfsd: clean up races in lock stateid searching..."
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 7919d0a2 16-Sep-2016 Jeff Layton <jlayton@kernel.org>

nfsd: add a LRU list for blocked locks

It's possible for a client to call in on a lock that is blocked for a
long time, but discontinue polling for it. A malicious client could
even set a lock on a file, and then spam the server with failing lock
requests from different lockowners that pile up in a DoS attack.

Add the blocked lock structures to a per-net namespace LRU when hashing
them, and timestamp them. If the lock request is not revisited after a
lease period, we'll drop it under the assumption that the client is no
longer interested.

This also gives us a mechanism to clean up these objects at server
shutdown time as well.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 76d348fa 16-Sep-2016 Jeff Layton <jlayton@kernel.org>

nfsd: have nfsd4_lock use blocking locks for v4.1+ locks

Create a new per-lockowner+per-inode structure that contains a
file_lock. Have nfsd4_lock add this structure to the lockowner's list
prior to setting the lock. Then call the vfs and request a blocking lock
(by setting FL_SLEEP). If we get anything besides FILE_LOCK_DEFERRED
back, then we dequeue the block structure and free it. When the next
lock request comes in, we'll look for an existing block for the same
filehandle and dequeue and reuse it if there is one.

When the lock comes free (a'la an lm_notify call), we dequeue it
from the lockowner's list and kick off a CB_NOTIFY_LOCK callback to
inform the client that it should retry the lock request.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# a188620e 16-Sep-2016 Jeff Layton <jlayton@kernel.org>

nfsd: plumb in a CB_NOTIFY_LOCK operation

Add the encoding/decoding for CB_NOTIFY_LOCK operations.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 89dfdc96 16-Aug-2016 Jeff Layton <jlayton@kernel.org>

nfsd: eliminate cb_minorversion field

We already have that info in the client pointer. No need to pass around
a copy.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# ed941643 14-Jun-2016 Andrew Elble <aweits@rit.edu>

nfsd: implement machine credential support for some operations

This addresses the conundrum referenced in RFC5661 18.35.3,
and will allow clients to return state to the server using the
machine credentials.

The biggest part of the problem is that we need to allow the client
to send a compound op with integrity/privacy on mounts that don't
have it enabled.

Add server support for properly decoding and using spo_must_enforce
and spo_must_allow bits. Add support for machine credentials to be
used for CLOSE, OPEN_DOWNGRADE, LOCKU, DELEGRETURN,
and TEST/FREE STATEID.
Implement a check so as to not throw WRONGSEC errors when these
operations are used if integrity/privacy isn't turned on.

Without this, Linux clients with credentials that expired while holding
delegations were getting stuck in an endless loop.

Signed-off-by: Andrew Elble <aweits@rit.edu>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# feb9dad5 14-Jun-2016 Oleg Drokin <green@linuxhacker.ru>

nfsd: Always lock state exclusively.

It used to be the case that state had an rwlock that was locked for write
by downgrades, but for read for upgrades (opens). Well, the problem is
if there are two competing opens for the same state, they step on
each other toes potentially leading to leaking file descriptors
from the state structure, since access mode is a bitmap only set once.

Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 14b7f4a1 05-May-2016 Jeff Layton <jlayton@kernel.org>

nfsd: handle seqid wraparound in nfsd4_preprocess_layout_stateid

Move the existing static function to an inline helper, and call it.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# aa0d6aed 02-Dec-2015 Anna Schumaker <Anna.Schumaker@netapp.com>

nfsd: Pass filehandle to nfs4_preprocess_stateid_op()

This will be needed so COPY can look up the saved_fh in addition to the
current_fh.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# c4cb8974 21-Nov-2015 Julia Lawall <Julia.Lawall@lip6.fr>

nfsd: constify nfsd4_callback_ops structure

The nfsd4_callback_ops structure is never modified, so declare it as const.

Done with the help of Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 9767feb2 01-Oct-2015 Jeff Layton <jlayton@kernel.org>

nfsd: ensure that seqid morphing operations are atomic wrt to copies

Bruce points out that the increment of the seqid in stateids is not
serialized in any way, so it's possible for racing calls to bump it
twice and end up sending the same stateid. While we don't have any
reports of this problem it _is_ theoretically possible, and could lead
to spurious state recovery by the client.

In the current code, update_stateid is always followed by a memcpy of
that stateid, so we can combine the two operations. For better
atomicity, we add a spinlock to the nfs4_stid and hold that when bumping
the seqid and copying the stateid.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# cc8a5532 17-Sep-2015 Jeff Layton <jlayton@kernel.org>

nfsd: serialize layout stateid morphing operations

In order to allow the client to make a sane determination of what
happened with racing LAYOUTGET/LAYOUTRETURN/CB_LAYOUTRECALL calls, we
must ensure that the seqids return accurately represent the order of
operations. The simplest way to do that is to ensure that operations on
a single stateid are serialized.

This patch adds a mutex to the layout stateid, and locks it when
checking the layout stateid's seqid. The mutex is held over the entire
operation and released after the seqid is bumped.

Note that in the case of CB_LAYOUTRECALL we must move the increment of
the seqid and setting into a new cb "prepare" operation. The lease
infrastructure will call the lm_break callback with a spinlock held, so
and we can't take the mutex in that codepath.

Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 35a92fe8 17-Sep-2015 Jeff Layton <jlayton@kernel.org>

nfsd: serialize state seqid morphing operations

Andrew was seeing a race occur when an OPEN and OPEN_DOWNGRADE were
running in parallel. The server would receive the OPEN_DOWNGRADE first
and check its seqid, but then an OPEN would race in and bump it. The
OPEN_DOWNGRADE would then complete and bump the seqid again. The result
was that the OPEN_DOWNGRADE would be applied after the OPEN, even though
it should have been rejected since the seqid changed.

The only recourse we have here I think is to serialize operations that
bump the seqid in a stateid, particularly when we're given a seqid in
the call. To address this, we add a new rw_semaphore to the
nfs4_ol_stateid struct. We do a down_write prior to checking the seqid
after looking up the stateid to ensure that nothing else is going to
bump it while we're operating on it.

In the case of OPEN, we do a down_read, as the call doesn't contain a
seqid. Those can run in parallel -- we just need to serialize them when
there is a concurrent OPEN_DOWNGRADE or CLOSE.

LOCK and LOCKU however always take the write lock as there is no
opportunity for parallelizing those.

Reported-and-Tested-by: Andrew W Elble <aweits@rit.edu>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 7ba6cad6 24-Jun-2015 Kinglong Mee <kinglongmee@gmail.com>

nfsd: New helper nfsd4_cb_sequence_done() for processing more cb errors

According to Christoph's advice, this patch introduce a new helper
nfsd4_cb_sequence_done() for processing more callback errors, following
the example of the client's nfs41_sequence_done().

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# af90f707 18-Jun-2015 Christoph Hellwig <hch@lst.de>

nfsd: take struct file setup fully into nfs4_preprocess_stateid_op

This patch changes nfs4_preprocess_stateid_op so it always returns
a valid struct file if it has been asked for that. For that we
now allocate a temporary struct file for special stateids, and check
permissions if we got the file structure from the stateid. This
ensures that all callers will get their handling of special stateids
right, and avoids code duplication.

There is a little wart in here because the read code needs to know
if we allocated a file structure so that it can copy around the
read-ahead parameters. In the long run we should probably aim to
cache full file structures used with special stateids instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 276f03e3 02-Jun-2015 Kinglong Mee <kinglongmee@gmail.com>

nfsd: Update callback sequnce id only CB_SEQUENCE success

When testing pnfs layout, nfsd got error NFS4ERR_SEQ_MISORDERED.
It is caused by nfs return NFS4ERR_DELAY before validate_seqid(),
don't update the sequnce id, but nfsd updates the sequnce id !!!

According to RFC5661 20.9.3,
" If CB_SEQUENCE returns an error, then the state of the slot
(sequence ID, cached reply) MUST NOT change. "

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# cba5f62b 30-Apr-2015 Christoph Hellwig <hch@lst.de>

nfsd: fix callback restarts

Checking the rpc_client pointer is not a reliable way to detect
backchannel changes: cl_cb_client is changed only after shutting down
the rpc client, so the condition cl_cb_client = tk_client will always be
true.

Check the RPC_TASK_KILLED flag instead, and rewrite the code to avoid
the buggy cl_callbacks list and fix the lifetime rules due to double
calls of the ->prepare callback operations method for this retry case.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# ef2a1b3e 30-Apr-2015 Christoph Hellwig <hch@lst.de>

nfsd: split transport vs operation errors for callbacks

We must only increment the sequence id if the client has seen and responded
to a request. If we failed to deliver it to the client we must resend with
the same sequence id. So just like the client track errors at the transport
level differently from those returned in the XDR.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 8287f009 27-Apr-2015 Sachin Bhamare <sachin.bhamare@primarydata.com>

nfsd: fix pNFS return on close semantics

For the sake of forgetful clients, the server should return the layouts
to the file system on 'last close' of a file (assuming that there are no
delegations outstanding to that particular client) or on delegreturn
(assuming that there are no opens on a file from that particular
client).

In theory the information is all there in current data structures, but
it's not efficiently available; nfs4_file->fi_ref includes references on
the file across all clients, but we need a per-(client, file) count.
Walking through lots of stateid's to calculate this on each close or
delegreturn would be painful.

This patch introduces infrastructure to maintain per-client opens and
delegation counters on a per-file basis.

[hch: ported to the mainline pNFS support, merged various fixes from Jeff]
Signed-off-by: Sachin Bhamare <sachin.bhamare@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# c5c707f9 22-Sep-2014 Christoph Hellwig <hch@lst.de>

nfsd: implement pNFS layout recalls

Add support to issue layout recalls to clients. For now we only support
full-file recalls to get a simple and stable implementation. This allows
to embedd a nfsd4_callback structure in the layout_state and thus avoid
any memory allocations under spinlocks during a recall. For normal
use cases that do not intent to share a single file between multiple
clients this implementation is fully sufficient.

To ensure layouts are recalled on local filesystem access each layout
state registers a new FL_LAYOUT lease with the kernel file locking code,
which filesystems that support pNFS exports that require recalls need
to break on conflicting access patterns.

The XDR code is based on the old pNFS server implementation by
Andy Adamson, Benny Halevy, Boaz Harrosh, Dean Hildebrand, Fred Isaman,
Marc Eshel, Mike Sager and Ricardo Labiaga.

Signed-off-by: Christoph Hellwig <hch@lst.de>


# 9cf514cc 05-May-2014 Christoph Hellwig <hch@lst.de>

nfsd: implement pNFS operations

Add support for the GETDEVICEINFO, LAYOUTGET, LAYOUTCOMMIT and
LAYOUTRETURN NFSv4.1 operations, as well as backing code to manage
outstanding layouts and devices.

Layout management is very straight forward, with a nfs4_layout_stateid
structure that extends nfs4_stid to manage layout stateids as the
top-level structure. It is linked into the nfs4_file and nfs4_client
structures like the other stateids, and contains a linked list of
layouts that hang of the stateid. The actual layout operations are
implemented in layout drivers that are not part of this commit, but
will be added later.

The worst part of this commit is the management of the pNFS device IDs,
which suffers from a specification that is not sanely implementable due
to the fact that the device-IDs are global and not bound to an export,
and have a small enough size so that we can't store the fsid portion of
a file handle, and must never be reused. As we still do need perform all
export authentication and validation checks on a device ID passed to
GETDEVICEINFO we are caught between a rock and a hard place. To work
around this issue we add a new hash that maps from a 64-bit integer to a
fsid so that we can look up the export to authenticate against it,
a 32-bit integer as a generation that we can bump when changing the device,
and a currently unused 32-bit integer that could be used in the future
to handle more than a single device per export. Entries in this hash
table are never deleted as we can't reuse the ids anyway, and would have
a severe lifetime problem anyway as Linux export structures are temporary
structures that can go away under load.

Parts of the XDR data, structures and marshaling/unmarshaling code, as
well as many concepts are derived from the old pNFS server implementation
from Andy Adamson, Benny Halevy, Dean Hildebrand, Marc Eshel, Fred Isaman,
Mike Sager, Ricardo Labiaga and many others.

Signed-off-by: Christoph Hellwig <hch@lst.de>


# 4d227fca 17-Aug-2014 Christoph Hellwig <hch@lst.de>

nfsd: make find_any_file available outside nfs4state.c

Signed-off-by: Christoph Hellwig <hch@lst.de>


# e6ba76e1 14-Aug-2014 Christoph Hellwig <hch@lst.de>

nfsd: make find/get/put file available outside nfs4state.c

Signed-off-by: Christoph Hellwig <hch@lst.de>


# cd61c522 14-Aug-2014 Christoph Hellwig <hch@lst.de>

nfsd: make lookup/alloc/unhash_stid available outside nfs4state.c

Signed-off-by: Christoph Hellwig <hch@lst.de>


# 67db1034 13-Dec-2014 Jeff Layton <jlayton@kernel.org>

nfsd: fi_delegees doesn't need to be an atomic_t

fi_delegees is always handled under the fi_lock, so there's no need to
use an atomic_t for this field.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 5b095e99 23-Oct-2014 Jeff Layton <jlayton@kernel.org>

nfsd: convert nfs4_file searches to use RCU

The global state_lock protects the file_hashtbl, and that has the
potential to be a scalability bottleneck.

Address this by making the file_hashtbl use RCU. Add a rcu_head to the
nfs4_file and use that when freeing ones that have been hashed. In order
to conserve space, we union the fi_rcu field with the fi_delegations
list_head which must be clear by the time the last reference to the file
is dropped.

Convert find_file_locked to use RCU lookup primitives and not to require
that the state_lock be held, and convert find_file to do a lockless
lookup. Convert find_or_add_file to attempt a lockless lookup first, and
then fall back to doing a locked search and insert if that fails to find
anything.

Also, minimize the number of times we need to calculate the hash value
by passing it in as an argument to the search and insert functions, and
optimize the order of arguments in nfsd4_init_file.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# ccc6398e 16-Oct-2014 Jeff Layton <jlayton@kernel.org>

nfsd: clean up comments over nfs4_file definition

They're a bit outdated wrt to some recent changes.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 0c637be8 21-Aug-2014 Jeff Layton <jlayton@kernel.org>

nfsd: don't keep a pointer to the lease in nfs4_file

Now that we don't need to pass in an actual lease pointer to
vfs_setlease on unlock, we can stop tracking a pointer to the lease in
the nfs4_file.

Switch all of the places that check the fi_lease to check fi_deleg_file
instead. We always set that at the same time so it will have the same
semantics.

Cc: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 34549ab0 01-Oct-2014 Jeff Layton <jlayton@kernel.org>

nfsd: eliminate "to_delegation" define

We now have cb_to_delegation and to_delegation, which do the same thing
and are defined separately in different .c files. Move the
cb_to_delegation definition into a header file and eliminate the
redundant to_delegation definition.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>


# 0162ac2b 23-Sep-2014 Christoph Hellwig <hch@lst.de>

nfsd: introduce nfsd4_callback_ops

Add a higher level abstraction than the rpc_ops for callback operations.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# f0b5de1b 23-Sep-2014 Christoph Hellwig <hch@lst.de>

nfsd: split nfsd4_callback initialization and use

Split out initializing the nfs4_callback structure from using it. For
the NULL callback this gets rid of tons of pointless re-initializations.

Note that I don't quite understand what protects us from running multiple
NULL callbacks at the same time, but at least this chance doesn't make
it worse..

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 326129d0 23-Sep-2014 Christoph Hellwig <hch@lst.de>

nfsd: introduce a generic nfsd4_cb

Add a helper to queue up a callback. CB_NULL has a bit of special casing
because it is special in the specification, but all other new callback
operations will be able to share code with this and a few more changes
to refactor the callback code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 2faf3b43 23-Sep-2014 Christoph Hellwig <hch@lst.de>

nfsd: remove nfsd4_callback.cb_op

We can always get at the private data by using container_of, no need for
a void pointer. Also introduce a little to_delegation helper to avoid
opencoding the container_of everywhere.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# d682e750 12-Sep-2014 Jeff Layton <jlayton@kernel.org>

nfsd: serialize nfsdcltrack upcalls for a particular client

In a later patch, we want to add a flag that will allow us to reduce the
need for upcalls. In order to handle that correctly, we'll need to
ensure that racing upcalls for the same client can't occur. In practice
it should be rare for this to occur with a well-behaved client, but it
is possible.

Convert one of the bits in the cl_flags field to be an upcall bitlock,
and use it to ensure that upcalls for the same client are serialized.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>


# 7f5ef2e9 12-Sep-2014 Jeff Layton <jlayton@kernel.org>

nfsd: add a v4_end_grace file to /proc/fs/nfsd

Allow a privileged userland process to end the v4 grace period early.
Writing "Y", "y", or "1" to the file will cause the v4 grace period to
be lifted. The basic idea with this will be to allow the userland
client tracking program to lift the grace period once it knows that no
more clients will be reclaiming state.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>


# 919b8049 12-Sep-2014 Jeff Layton <jlayton@kernel.org>

nfsd: remove redundant boot_time parm from grace_done client tracking op

Since it's stored in nfsd_net, we don't need to pass it in separately.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>


# 14a571a8 05-Aug-2014 Jeff Layton <jlayton@kernel.org>

nfsd: add some comments to the nfsd4 object definitions

Add some comments that describe what each of these objects is, and how
they related to one another.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# b687f686 30-Jul-2014 Jeff Layton <jlayton@kernel.org>

nfsd: remove the client_mutex and the nfs4_lock/unlock_state wrappers

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 285abdee 30-Jul-2014 Jeff Layton <jlayton@kernel.org>

nfsd: remove old fault injection infrastructure

Remove the old nfsd_for_n_state function and move nfsd_find_client
higher up into the file to get rid of forward declaration. Remove
the struct nfsd_fault_inject_op arguments from the operations as
they are no longer needed by any of them.

Finally, remove the old "standard" get and set routines, which
also eliminates the client_mutex from this code.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 98d5c7c5 30-Jul-2014 Jeff Layton <jlayton@kernel.org>

nfsd: add more granular locking to *_delegations fault injectors

...instead of relying on the client_mutex.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 82e05efa 30-Jul-2014 Jeff Layton <jlayton@kernel.org>

nfsd: add more granular locking to forget_openowners fault injector

...instead of relying on the client_mutex.

Also, fix up the printk output that is generated when the file is read.
It currently says that it's reporting the number of open files, but
it's actually reporting the number of openowners.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 016200c3 30-Jul-2014 Jeff Layton <jlayton@kernel.org>

nfsd: add more granular locking to forget_locks fault injector

...instead of relying on the client_mutex.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 69fc9edf 30-Jul-2014 Jeff Layton <jlayton@kernel.org>

nfsd: add nfsd_inject_forget_clients

...which uses the client_lock for protection instead of client_mutex.
Also remove nfsd_forget_client as there are no more callers.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# a0926d15 30-Jul-2014 Jeff Layton <jlayton@kernel.org>

nfsd: add a forget_client set_clnt routine

...that relies on the client_lock instead of client_mutex.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 7ec0e36f 30-Jul-2014 Jeff Layton <jlayton@kernel.org>

nfsd: add a forget_clients "get" routine with proper locking

Add a new "get" routine for forget_clients that relies on the
client_lock instead of the client_mutex.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 83e452fe 31-Jul-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: fix out of date comment

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# d4f0489f 29-Jul-2014 Trond Myklebust <trond.myklebust@primarydata.com>

nfsd: Move the open owner hash table into struct nfs4_client

Preparation for removing the client_mutex.

Convert the open owner hash table into a per-client table and protect it
using the nfs4_client->cl_lock spin lock.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# d3134b10 29-Jul-2014 Jeff Layton <jlayton@kernel.org>

nfsd: make openstateids hold references to their openowners

Change it so that only openstateids hold persistent references to
openowners. References can still be held by compounds in progress.

With this, we can get rid of NFS4_OO_NEW. It's possible that we
will create a new openowner in the process of doing the open, but
something later fails. In the meantime, another task could find
that openowner and start using it on a successful open. If that
occurs we don't necessarily want to tear it down, just put the
reference that the failing compound holds.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 8f4b54c5 29-Jul-2014 Jeff Layton <jlayton@kernel.org>

nfsd: add an operation for unhashing a stateowner

Allow stateowners to be unhashed and destroyed when the last reference
is put. The unhashing must be idempotent. In a future patch, we'll add
some locking around it, but for now it's only protected by the
client_mutex.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 58fb12e6 29-Jul-2014 Jeff Layton <jlayton@kernel.org>

nfsd: Add a mutex to protect the NFSv4.0 open owner replay cache

We don't want to rely on the client_mutex for protection in the case of
NFSv4 open owners. Instead, we add a mutex that will only be taken for
NFSv4.0 state mutating operations, and that will be released once the
entire compound is done.

Also, ensure that nfsd4_cstate_assign_replay/nfsd4_cstate_clear_replay
take a reference to the stateowner when they are using it for NFSv4.0
open and lock replay caching.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 6b180f0b 29-Jul-2014 Jeff Layton <jlayton@kernel.org>

nfsd: Add reference counting to state owners

The way stateowners are managed today is somewhat awkward. They need to
be explicitly destroyed, even though the stateids reference them. This
will be particularly problematic when we remove the client_mutex.

We may create a new stateowner and attempt to open a file or set a lock,
and have that fail. In the meantime, another RPC may come in that uses
that same stateowner and succeed. We can't have the first task tearing
down the stateowner in that situation.

To fix this, we need to change how stateowners are tracked altogether.
Refcount them and only destroy them once all stateids that reference
them have been destroyed. This patch starts by adding the refcounting
necessary to do that.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 11b9164a 29-Jul-2014 Trond Myklebust <trond.myklebust@primarydata.com>

nfsd: Add a struct nfs4_file field to struct nfs4_stid

All stateids are associated with a nfs4_file. Let's consolidate.
Replace delegation->dl_file with the dl_stid.sc_file, and
nfs4_ol_stateid->st_file with st_stid.sc_file.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 6011695d 29-Jul-2014 Trond Myklebust <trond.myklebust@primarydata.com>

nfsd: Add reference counting to the lock and open stateids

When we remove the client_mutex, we'll need to be able to ensure that
these objects aren't destroyed while we're not holding locks.

Add a ->free() callback to the struct nfs4_stid, so that we can
release a reference to the stid without caring about the contents.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 650ecc8f 25-Jul-2014 Jeff Layton <jlayton@kernel.org>

nfsd: remove dl_fh field from struct nfs4_delegation

Now that the nfs4_file has a filehandle in it, we no longer need to
keep a per-delegation copy of it. Switch to using the one in the
nfs4_file instead.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# f9c00c3a 23-Jul-2014 Jeff Layton <jlayton@kernel.org>

nfsd: Do not let nfs4_file pin the struct inode

Remove the fi_inode field in struct nfs4_file in order to remove the
possibility of struct nfs4_file pinning the inode when it does not have
any open state.

The only place we still need to get to an inode is in check_for_locks,
so change it to use find_any_file and use the inode from any that it
finds. If it doesn't find one, then just assume there aren't any.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# e2cf80d7 23-Jul-2014 Trond Myklebust <trond.myklebust@primarydata.com>

nfsd: Store the filehandle with the struct nfs4_file

For use when we may not have a struct inode.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 72c0b0fb 21-Jul-2014 Trond Myklebust <trond.myklebust@primarydata.com>

nfsd: Move the delegation reference counter into the struct nfs4_stid

We will want to add reference counting to the lock stateid and open
stateids too in later patches.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# b0fc29d6 16-Jul-2014 Trond Myklebust <trond.myklebust@primarydata.com>

nfsd: Ensure stateids remain unique until they are freed

Add an extra delegation state to allow the stateid to remain in the idr
tree until the last reference has been released. This will be necessary
to ensure uniqueness once the client_mutex is removed.

[jlayton: reset the sc_type under the state_lock in unhash_delegation]

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 02e1215f 16-Jul-2014 Jeff Layton <jlayton@kernel.org>

nfsd: Avoid taking state_lock while holding inode lock in nfsd_break_one_deleg

state_lock is a heavily contended global lock. We don't want to grab
that while simultaneously holding the inode->i_lock.

Add a new per-nfs4_file lock that we can use to protect the
per-nfs4_file delegation list. Hold that while walking the list in the
break_deleg callback and queue the workqueue job for each one.

The workqueue job can then take the state_lock and do the list
manipulations without the i_lock being held prior to starting the
rpc call.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# e8051c83 16-Jul-2014 Jeff Layton <jlayton@kernel.org>

nfsd: eliminate nfsd4_init_callback

It's just an obfuscated INIT_WORK call. Just make the work_func_t a
non-static symbol and use a normal INIT_WORK call.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# baeb4ff0 10-Jul-2014 Jeff Layton <jlayton@kernel.org>

nfsd: make deny mode enforcement more efficient and close races in it

The current enforcement of deny modes is both inefficient and scattered
across several places, which makes it hard to guarantee atomicity. The
inefficiency is a problem now, and the lack of atomicity will mean races
once the client_mutex is removed.

First, we address the inefficiency. We have to track deny modes on a
per-stateid basis to ensure that open downgrades are sane, but when the
server goes to enforce them it has to walk the entire list of stateids
and check against each one.

Instead of doing that, maintain a per-nfs4_file deny mode. When a file
is opened, we simply set any deny bits in that mode that were specified
in the OPEN call. We can then use that unified deny mode to do a simple
check to see whether there are any conflicts without needing to walk the
entire stateid list.

The only time we'll need to walk the entire list of stateids is when a
stateid that has a deny mode on it is being released, or one is having
its deny mode downgraded. In that case, we must walk the entire list and
recalculate the fi_share_deny field. Since deny modes are pretty rare
today, this should be very rare under normal workloads.

To address the potential for races once the client_mutex is removed,
protect fi_share_deny with the fi_lock. In nfs4_get_vfs_file, check to
make sure that any deny mode we want to apply won't conflict with
existing access. If that's ok, then have nfs4_file_get_access check that
new access to the file won't conflict with existing deny modes.

If that also passes, then get file access references, set the correct
access and deny bits in the stateid, and update the fi_share_deny field.
If opening the file or truncating it fails, then unwind the whole mess
and return the appropriate error.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# c11c591f 10-Jul-2014 Jeff Layton <jlayton@kernel.org>

nfsd: shrink st_access_bmap and st_deny_bmap

We never use anything above bit #3, so an unsigned long for each is
wasteful. Shrink them to a char each, and add some WARN_ON_ONCE calls if
we try to set or clear bits that would go outside those sizes.

Note too that because atomic bitops work on unsigned longs, we have to
abandon their use here. That shouldn't be a problem though since we
don't really care about the atomicity in this code anyway. Using them
was just a convenient way to flip bits.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# de18643d 10-Jul-2014 Trond Myklebust <trond.myklebust@primarydata.com>

nfsd: Add locking to the nfs4_file->fi_fds[] array

Preparation for removal of the client_mutex, which currently protects
this array. While we don't actually need the find_*_file_locked variants
just yet, a later patch will. So go ahead and add them now to reduce
future churn in this code.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 1d31a253 10-Jul-2014 Trond Myklebust <trond.myklebust@primarydata.com>

nfsd: Add fine grained protection for the nfs4_file->fi_stateids list

Access to this list is currently serialized by the client_mutex. Add
finer grained locking around this list in preparation for its removal.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 0fe492db 30-Jun-2014 Trond Myklebust <trond.myklebust@primarydata.com>

nfsd: Convert nfs4_check_open_reclaim() to work with lookup_clientid()

lookup_clientid is preferable to find_confirmed_client since it's able
to use the cached client in the compound state.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# d4e19e70 30-Jun-2014 Trond Myklebust <trond.myklebust@primarydata.com>

nfsd: Don't get a session reference without a client reference

If the client were to disappear from underneath us while we're holding
a session reference, things would be bad. This cleanup helps ensure
that it cannot, which will be a possibility when the client_mutex is
removed.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# fd44907c 30-Jun-2014 Jeff Layton <jlayton@kernel.org>

nfsd: clean up nfsd4_release_lockowner

Now that we know that we won't have several lockowners with the same,
owner->data, we can simplify nfsd4_release_lockowner and get rid of
the lo_list in the process.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# b3c32bcd 30-Jun-2014 Trond Myklebust <trond.myklebust@primarydata.com>

nfsd: NFSv4 lock-owners are not associated to a specific file

Just like open-owners, lock-owners are associated with a name, a clientid
and, in the case of minor version 0, a sequence id. There is no association
to a file.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 3c87b9b7 30-Jun-2014 Trond Myklebust <trond.myklebust@primarydata.com>

nfsd: lock owners are not per open stateid

In the NFSv4 spec, lock stateids are per-file objects. Lockowners are not.
This patch replaces the current list of lock owners in the open stateids
with a list of lock stateids.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# b607664e 30-Jun-2014 Trond Myklebust <trond.myklebust@primarydata.com>

nfsd: Cleanup nfs4svc_encode_compoundres

Move the slot return, put session etc into a helper in fs/nfsd/nfs4state.c
instead of open coding in nfs4svc_encode_compoundres.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 24906f32 12-Mar-2014 J. Bruce Fields <bfields@redhat.com>

nfsd4: allow larger 4.1 session drc slots

The client is actually asking for 2532 bytes. I suspect that's a
mistake. But maybe we can allow some more. In theory lock needs more
if it might return a maximum-length lockowner in the denied case.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 50cc6231 18-Apr-2014 Trond Myklebust <trond.myklebust@primarydata.com>

NFSd: Mark nfs4_free_lockowner and nfs4_free_openowner as static functions

They do not need to be used outside fs/nfsd/nfs4state.c

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 9c69de4c 06-May-2014 Christoph Hellwig <hch@lst.de>

nfsd: remove <linux/nfsd/nfsfh.h>

The only real user of this header is fs/nfsd/nfsfh.h, so merge the
two. Various lockѕ source files used it to indirectly get other
sunrpc or nfs headers, so fix those up.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 57266a6e 13-Apr-2013 J. Bruce Fields <bfields@redhat.com>

nfsd4: implement minimal SP4_MACH_CRED

Do a minimal SP4_MACH_CRED implementation suggested by Trond, ignoring
the client-provided spo_must_* arrays and just enforcing credential
checks for the minimum required operations.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 3bd64a5b 09-Apr-2013 J. Bruce Fields <bfields@redhat.com>

nfsd4: implement SEQ4_STATUS_RECALLABLE_STATE_REVOKED

A 4.1 server must notify a client that has had any state revoked using
the SEQ4_STATUS_RECALLABLE_STATE_REVOKED flag. The client can figure
out exactly which state is the problem using CHECK_STATEID and then free
it using FREE_STATEID. The status flag will be unset once all such
revoked stateids are freed.

Our server's only recallable state is delegations. So we keep with each
4.1 client a list of delegations that have timed out and been recalled,
but haven't yet been freed by FREE_STATEID.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 9411b1d4 01-Apr-2013 J. Bruce Fields <bfields@redhat.com>

nfsd4: cleanup handling of nfsv4.0 closed stateid's

Closed stateid's are kept around a little while to handle close replays
in the 4.0 case. So we stash them in the last-used stateid in the
oo_last_closed_stateid field of the open owner. We can free that in
encode_seqid_op_tail once the seqid on the open owner is next
incremented. But we don't want to do that on the close itself; so we
set NFS4_OO_PURGE_CLOSE flag set on the open owner, skip freeing it the
first time through encode_seqid_op_tail, then when we see that flag set
next time we free it.

This is unnecessarily baroque.

Instead, just move the logic that increments the seqid out of the xdr
code and into the operation code itself.

The justification given for the current placement is that we need to
wait till the last minute to be sure we know whether the status is a
sequence-id-mutating error or not, but examination of the code shows
that can't actually happen.

Reported-by: Yanchuan Nian <ycnian@gmail.com>
Tested-by: Yanchuan Nian <ycnian@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 89876f8c 02-Apr-2013 Jeff Layton <jlayton@kernel.org>

nfsd: convert the file_hashtbl to a hlist

We only ever traverse the hash chains in the forward direction, so a
double pointer list head isn't really necessary.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 66b2b9b2 18-Mar-2013 J. Bruce Fields <bfields@redhat.com>

nfsd4: don't destroy in-use session

This changes session destruction to be similar to client destruction in
that attempts to destroy a session while in use (which should be rare
corner cases) result in DELAY. This simplifies things somewhat and
helps meet a coming 4.2 requirement.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 221a6876 01-Apr-2013 J. Bruce Fields <bfields@redhat.com>

nfsd4: don't destroy in-use clients

When a setclientid_confirm or create_session confirms a client after a
client reboot, it also destroys any previous state held by that client.

The shutdown of that previous state must be careful not to free the
client out from under threads processing other requests that refer to
the client.

This is a particular problem in the NFSv4.1 case when we hold a
reference to a session (hence a client) throughout compound processing.

The server attempts to handle this by unhashing the client at the time
it's destroyed, then delaying the final free to the end. But this still
leaves some races in the current code.

I believe it's simpler just to fail the attempt to destroy the client by
returning NFS4ERR_DELAY. This is a case that should never happen
anyway.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# b0a9d3ab 07-Mar-2013 J. Bruce Fields <bfields@redhat.com>

nfsd4: fix race on client shutdown

Dropping the session's reference count after the client's means we leave
a window where the session's se_client pointer is NULL. An xpt_user
callback that encounters such a session may then crash:

[ 303.956011] BUG: unable to handle kernel NULL pointer dereference at 0000000000000318
[ 303.959061] IP: [<ffffffff81481a8e>] _raw_spin_lock+0x1e/0x40
[ 303.959061] PGD 37811067 PUD 3d498067 PMD 0
[ 303.959061] Oops: 0002 [#8] PREEMPT SMP
[ 303.959061] Modules linked in: md5 nfsd auth_rpcgss nfs_acl snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_page_alloc microcode psmouse snd_timer serio_raw pcspkr evdev snd soundcore i2c_piix4 i2c_core intel_agp intel_gtt processor button nfs lockd sunrpc fscache ata_generic pata_acpi ata_piix uhci_hcd libata btrfs usbcore usb_common crc32c scsi_mod libcrc32c zlib_deflate floppy virtio_balloon virtio_net virtio_pci virtio_blk virtio_ring virtio
[ 303.959061] CPU 0
[ 303.959061] Pid: 264, comm: nfsd Tainted: G D 3.8.0-ARCH+ #156 Bochs Bochs
[ 303.959061] RIP: 0010:[<ffffffff81481a8e>] [<ffffffff81481a8e>] _raw_spin_lock+0x1e/0x40
[ 303.959061] RSP: 0018:ffff880037877dd8 EFLAGS: 00010202
[ 303.959061] RAX: 0000000000000100 RBX: ffff880037a2b698 RCX: ffff88003d879278
[ 303.959061] RDX: ffff88003d879278 RSI: dead000000100100 RDI: 0000000000000318
[ 303.959061] RBP: ffff880037877dd8 R08: ffff88003c5a0f00 R09: 0000000000000002
[ 303.959061] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
[ 303.959061] R13: 0000000000000318 R14: ffff880037a2b680 R15: ffff88003c1cbe00
[ 303.959061] FS: 0000000000000000(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
[ 303.959061] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 303.959061] CR2: 0000000000000318 CR3: 000000003d49c000 CR4: 00000000000006f0
[ 303.959061] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 303.959061] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 303.959061] Process nfsd (pid: 264, threadinfo ffff880037876000, task ffff88003c1fd0a0)
[ 303.959061] Stack:
[ 303.959061] ffff880037877e08 ffffffffa03772ec ffff88003d879000 ffff88003d879278
[ 303.959061] ffff88003d879080 0000000000000000 ffff880037877e38 ffffffffa0222a1f
[ 303.959061] 0000000000107ac0 ffff88003c22e000 ffff88003d879000 ffff88003c1cbe00
[ 303.959061] Call Trace:
[ 303.959061] [<ffffffffa03772ec>] nfsd4_conn_lost+0x3c/0xa0 [nfsd]
[ 303.959061] [<ffffffffa0222a1f>] svc_delete_xprt+0x10f/0x180 [sunrpc]
[ 303.959061] [<ffffffffa0223d96>] svc_recv+0xe6/0x580 [sunrpc]
[ 303.959061] [<ffffffffa03587c5>] nfsd+0xb5/0x140 [nfsd]
[ 303.959061] [<ffffffffa0358710>] ? nfsd_destroy+0x90/0x90 [nfsd]
[ 303.959061] [<ffffffff8107ae00>] kthread+0xc0/0xd0
[ 303.959061] [<ffffffff81010000>] ? perf_trace_xen_mmu_set_pte_at+0x50/0x100
[ 303.959061] [<ffffffff8107ad40>] ? kthread_freezable_should_stop+0x70/0x70
[ 303.959061] [<ffffffff814898ec>] ret_from_fork+0x7c/0xb0
[ 303.959061] [<ffffffff8107ad40>] ? kthread_freezable_should_stop+0x70/0x70
[ 303.959061] Code: ff ff 5d c3 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 65 48 8b 04 25 f0 c6 00 00 48 89 e5 83 80 44 e0 ff ff 01 b8 00 01 00 00 <3e> 66 0f c1 07 0f b6 d4 38 c2 74 0f 66 0f 1f 44 00 00 f3 90 0f
[ 303.959061] RIP [<ffffffff81481a8e>] _raw_spin_lock+0x1e/0x40
[ 303.959061] RSP <ffff880037877dd8>
[ 303.959061] CR2: 0000000000000318
[ 304.001218] ---[ end trace 2d809cd4a7931f5a ]---
[ 304.001903] note: nfsd[264] exited with preempt_count 2

Reported-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 03bc6d1c 02-Feb-2013 Eric W. Biederman <ebiederm@xmission.com>

nfsd: Modify nfsd4_cb_sec to use kuids and kgids

Change uid and gid in struct nfsd4_cb_sec to be of type kuid_t and
kgid_t.

In nfsd4_decode_cb_sec when reading uids and gids off the wire convert
them to kuids and kgids, and if they don't convert to valid kuids or
valid kuids ignore RPC_AUTH_UNIX and don't fill in any of the fields.

Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>


# 6c1e82a4 29-Nov-2012 Bryan Schumaker <bjschuma@netapp.com>

NFSD: Forget state for a specific client

Write the client's ip address to any state file and all appropriate
state for that client will be forgotten.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 184c1847 29-Nov-2012 Bryan Schumaker <bjschuma@netapp.com>

NFSD: Reading a fault injection file prints a state count

I also log basic information that I can figure out about the type of
state (such as number of locks for each client IP address). This can be
useful for checking that state was actually dropped and later for
checking if the client was able to recover.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 8ce54e0d 29-Nov-2012 Bryan Schumaker <bjschuma@netapp.com>

NFSD: Fault injection operations take a per-client forget function

The eventual goal is to forget state based on ip address, so it makes
sense to call this function in a for-each-client loop until the correct
amount of state is forgotten. I also use this patch as an opportunity
to rename the forget function from "func()" to "forget()".

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# f3c7521f 27-Nov-2012 Bryan Schumaker <bjschuma@netapp.com>

NFSD: Fold fault_inject.h into state.h

There were only a small number of functions in this file and since they
all affect stored state I think it makes sense to put them in state.h
instead. I also dropped most static inline declarations since there are
no callers when fault injection is not enabled.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 12760c66 14-Nov-2012 Stanislav Kinsbursky <skinsbursky@parallels.com>

nfsd: pass nfsd_net instead of net to grace enders

Passing net context looks as overkill.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 3320fef19 14-Nov-2012 Stanislav Kinsbursky <skinsbursky@parallels.com>

nfsd: use service net instead of hard-coded init_net

This patch replaces init_net by SVC_NET(), where possible and also passes
proper context to nested functions where required.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 52e19c09 14-Nov-2012 Stanislav Kinsbursky <skinsbursky@parallels.com>

nfsd: make reclaim_str_hashtbl allocated per net

This hash holds nfs4_clients info, which are network namespace aware.
So let's make it allocated per network namespace.

Note: this hash is used only by legacy tracker. So let's allocate hash in
tracker init.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# c212cecf 14-Nov-2012 Stanislav Kinsbursky <skinsbursky@parallels.com>

nfsd: make nfs4_client network namespace dependent

And use it's net where possible.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 2216d449 12-Nov-2012 Jeff Layton <jlayton@kernel.org>

nfsd: get rid of cl_recdir field

Remove the cl_recdir field from the nfs4_client struct. Instead, just
compute it on the fly when and if it's needed, which is now only when
the legacy client tracking code is in effect.

The error handling in the legacy client tracker is also changed to
handle the case where md5 is unavailable. In that case, we'll warn
the admin with a KERN_ERR message and disable the client tracking.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# ac55fdc4 12-Nov-2012 Jeff Layton <jlayton@kernel.org>

nfsd: move the confirmed and unconfirmed hlists to a rbtree

The current code requires that we md5 hash the name in order to store
the client in the confirmed and unconfirmed trees. Change it instead
to store the clients in a pair of rbtrees, and simply compare the
cl_names directly instead of hashing them. This also necessitates that
we add a new flag to the clp->cl_flags field to indicate which tree
the client is currently in.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 0ce0c2b5 12-Nov-2012 Jeff Layton <jlayton@kernel.org>

nfsd: don't search for client by hash on legacy reboot recovery gracedone

When nfsd starts, the legacy reboot recovery code creates a tracking
struct for each directory in the v4recoverydir. When the grace period
ends, it basically does a "readdir" on the directory again, and matches
each dentry in there to an existing client id to see if it should be
removed or not. If the matching client doesn't exist, or hasn't
reclaimed its state then it will remove that dentry.

This is pretty inefficient since it involves doing a lot of hash-bucket
searching. It also means that we have to keep relying on being able to
search for a nfs4_client by md5 hashed cl_recdir name.

Instead, add a pointer to the nfs4_client that indicates the association
between the nfs4_client_reclaim and nfs4_client. When a reclaim operation
comes in, we set the pointer to make that association. On gracedone, the
legacy client tracker will keep the recdir around iff:

1/ there is a reclaim record for the directory

...and...

2/ there's an association between the reclaim record and a client record
-- that is, a create or check operation was performed on the client that
matches that directory.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 772a9bbb 12-Nov-2012 Jeff Layton <jlayton@kernel.org>

nfsd: make nfs4_client_to_reclaim return a pointer to the reclaim record

Later callers will need to make changes to the record.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# ce30e539 12-Nov-2012 Jeff Layton <jlayton@kernel.org>

nfsd: break out reclaim record removal into separate function

We'll need to be able to call this from nfs4recover.c eventually.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 278c931c 12-Nov-2012 Jeff Layton <jlayton@kernel.org>

nfsd: have nfsd4_find_reclaim_client take a char * argument

Currently, it takes a client pointer, but later we're going to need to
search for these records without knowing whether a matching client even
exists.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# a0af710a 09-Nov-2012 Jeff Layton <jlayton@kernel.org>

nfsd: remove unused argument to nfs4_has_reclaimed_state

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 12fc3e92 05-Nov-2012 J. Bruce Fields <bfields@redhat.com>

nfsd4: backchannel should use client-provided security flavor

For now this only adds support for AUTH_NULL. (Previously we assumed
AUTH_UNIX.) We'll also need AUTH_GSS, which is trickier.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 57725155 05-Nov-2012 J. Bruce Fields <bfields@redhat.com>

nfsd4: common helper to initialize callback work

I've found it confusing having the only references to
nfsd4_do_callback_rpc() in a different file.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# cb73a9f4 01-Nov-2012 J. Bruce Fields <bfields@redhat.com>

nfsd4: implement backchannel_ctl operation

This operation is mandatory for servers to implement.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# c6bb3ca2 01-Nov-2012 J. Bruce Fields <bfields@redhat.com>

nfsd4: use callback security parameters in create_session

We're currently ignoring the callback security parameters specified in
create_session, and just assuming the client wants auth_sys, because
that's all the current linux client happens to care about. But this
could cause us callbacks to fail to a client that wanted something
different.

For now, all we're doing is no longer ignoring the uid and gid passed in
the auth_sys case. Further patches will add support for auth_null and
gss (and possibly use more of the auth_sys information; the spec wants
us to use exactly the credential we're passed, though it's hard to
imagine why a client would care).

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# acb2887e 27-Mar-2012 J. Bruce Fields <bfields@redhat.com>

nfsd4: clean up callback security parsing

Move the callback parsing into a separate function.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# d15c077e 13-Sep-2012 J. Bruce Fields <bfields@redhat.com>

nfsd4: enforce per-client sessions/no-sessions distinction

Something like creating a client with setclientid and then trying to
confirm it with create_session may not crash the server, but I'm not
completely positive of that, and in any case it's obviously bad client
behavior.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 1696c47c 06-Aug-2012 Jeff Layton <jlayton@kernel.org>

nfsd: trivial comment updates

locks.c doesn't use the BKL anymore and there is no fi_perfile field.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 39307655 16-Aug-2012 J. Bruce Fields <bfields@redhat.com>

nfsd4: fix security flavor of NFSv4.0 callback

Commit d5497fc693a446ce9100fcf4117c3f795ddfd0d2 "nfsd4: move rq_flavor
into svc_cred" forgot to remove cl_flavor from the client, leaving two
places (cl_flavor and cl_cred.cr_flavor) for the flavor to be stored.
After that patch, the latter was the one that was updated, but the
former was the one that the callback used.

Symptoms were a long delay on utime(). This is because the utime()
generated a setattr which recalled a delegation, but the cb_recall was
ignored by the client because it had the wrong security flavor.

Cc: stable@vger.kernel.org
Tested-by: Jamie Heilman <jamie@audible.transient.net>
Reported-by: Jamie Heilman <jamie@audible.transient.net>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 2c142baa 25-Jul-2012 Stanislav Kinsbursky <skinsbursky@parallels.com>

NFSd: make boot_time variable per network namespace

NFSd's boot_time represents grace period start point in time.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 5ccb0066 25-Jul-2012 Stanislav Kinsbursky <skinsbursky@parallels.com>

LockD: pass actual network namespace to grace period management functions

Passed network namespace replaced hard-coded init_net

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 7df302f7 29-May-2012 Chuck Lever <chuck.lever@oracle.com>

NFSD: TEST_STATEID should not return NFS4ERR_STALE_STATEID

According to RFC 5661, the TEST_STATEID operation is not allowed to
return NFS4ERR_STALE_STATEID. In addition, RFC 5661 says:

15.1.16.5. NFS4ERR_STALE_STATEID (Error Code 10023)

A stateid generated by an earlier server instance was used. This
error is moot in NFSv4.1 because all operations that take a stateid
MUST be preceded by the SEQUENCE operation, and the earlier server
instance is detected by the session infrastructure that supports
SEQUENCE.

I triggered NFS4ERR_STALE_STATEID while testing the Linux client's
NOGRACE recovery. Bruce suggested an additional test that could be
useful to client developers.

Lastly, RFC 5661, section 18.48.3 has this:

o Special stateids are always considered invalid (they result in the
error code NFS4ERR_BAD_STATEID).

An explicit check is made for those state IDs to avoid printk noise.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 03a4e1f6 14-May-2012 J. Bruce Fields <bfields@redhat.com>

nfsd4: move principal name into svc_cred

Instead of keeping the principal name associated with a request in a
structure that's private to auth_gss and using an accessor function,
move it to svc_cred.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 2a4317c5 21-Mar-2012 Jeff Layton <jlayton@kernel.org>

nfsd: add nfsd4_client_tracking_ops struct and a way to set it

Abstract out the mechanism that we use to track clients into a set of
client name tracking functions.

This gives us a mechanism to plug in a new set of client tracking
functions without disturbing the callers. It also gives us a way to
decide on what tracking scheme to use at runtime.

For now, this just looks like pointless abstraction, but later we'll
add a new alternate scheme for tracking clients on stable storage.

Note too that this patch anticipates the eventual containerization
of this code by passing in struct net pointers in places. No attempt
is made to containerize the legacy client tracker however.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# a52d726b 21-Mar-2012 Jeff Layton <jlayton@kernel.org>

nfsd: convert nfs4_client->cl_cb_flags to a generic flags field

We'll need a way to flag the nfs4_client as already being recorded on
stable storage so that we don't continually upcall. Currently, that's
recorded in the cl_firststate field of the client struct. Using an
entire u32 to store a flag is rather wasteful though.

The cl_cb_flags field is only using 2 bits right now, so repurpose that
to a generic flags field. Rename NFSD4_CLIENT_KILL to
NFSD4_CLIENT_CB_KILL to make it evident that it's part of the callback
flags. Add a mask that we can use for existing checks that look to see
whether any flags are set, so that the new flags don't interfere.

Convert all references to cl_firstate to the NFSD4_CLIENT_STABLE flag,
and add a new NFSD4_CLIENT_RECLAIM_COMPLETE flag.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 508dc6e1 23-Feb-2012 Benny Halevy <bhalevy@tonian.com>

nfsd41: free_session/free_client must be called under the client_lock

The session client is manipulated under the client_lock hence
both free_session and nfsd4_del_conns must be called under this lock.

This patch adds a BUG_ON that checks this condition in the
respective functions and implements the missing locks.

nfsd4_{get,put}_session helpers were moved to the C file that uses them
so to prevent use from external files and an unlocked version of
nfsd4_put_session is provided for external use from nfs4xdr.c

Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# bf5c43c8 13-Feb-2012 J. Bruce Fields <bfields@redhat.com>

nfsd4: check for uninitialized slot

This fixes an oops when a buggy client tries to use an initial seqid of
0 on a new slot, which we may misinterpret as a replay.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 73e79482 13-Feb-2012 J. Bruce Fields <bfields@redhat.com>

nfsd4: rearrange struct nfsd4_slot

Combine two booleans into a single flag field, move the smaller fields
to the end.

(In practice this doesn't make the struct any smaller. But we'll be
adding another flag here soon.)

Remove some debugging code that doesn't look useful, while we're in the
neighborhood.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 7a6ef8c7 05-Jan-2012 J. Bruce Fields <bfields@redhat.com>

nfsd4: nfsd4_create_clid_dir return value is unused

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 009673b4 07-Nov-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: add a separate (lockowner, inode) lookup

Address the possible performance regression mentioned in "nfsd4: hash
lockowners to simplify RELEASE_LOCKOWNER" by providing a separate
(lockowner, inode) hash.

Really, I doubt this matters much, but I think it's likely we'll change
these data structures here and I'd rather that the need for (owner,
inode) lookups be well-documented.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 5423732a 19-Oct-2011 Benny Halevy <bhalevy@panasas.com>

nfsd41: use SEQ4_STATUS_BACKCHANNEL_FAULT when cb_sequence is invalid

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 996e0938 17-Oct-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: do idr preallocation with stateid allocation

Move idr preallocation out of stateid initialization, into stateid
allocation, so that we no longer have to handle any errors from the
former.

This is a little subtle due to the way the idr code manages these
preallocated items--document that in comments.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# d29b20cd 13-Oct-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: clean up open owners on OPEN failure

If process_open1() creates a new open owner, but the open later fails,
the current code will leave the open owner around. It won't be on the
close_lru list, and the client isn't expected to send a CLOSE, so it
will hang around as long as the client does.

Similarly, if process_open1() removes an existing open owner from the
close lru, anticipating that an open owner that previously had no
associated stateid's now will, but the open subsequently fails, then
we'll again be left with the same leak.

Fix both problems.

Reported-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 3557e43b 12-Oct-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: make is_open_owner boolean

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# b31b30e5 28-Sep-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: cleanup state.h comments

These comments are mostly out of date.

Reported-by: Bryan Schumaker <bjschuma@netapp.com>


# 6409a5a6 28-Sep-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: clean up downgrading code

In response to some review comments, get rid of the somewhat obscure
for-loop with bitops, and improve a comment.

Reported-by: Steve Dickson <steved@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 38c2f4b1 23-Sep-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: look up stateid's per clientid

Use a separate stateid idr per client, and lookup a stateid by first
finding the client, then looking up the stateid relative to that client.

Also some minor refactoring.

This allows us to improve error returns: we can return expired when the
clientid is not found and bad_stateid when the clientid is found but not
the stateid, as opposed to returning expired for both cases.

I hope this will also help to replace the state lock mostly by a
per-client lock, but that hasn't been done yet.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 36279ac1 25-Sep-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: assume test_stateid always has session

Test_stateid is 4.1-only and only allowed after a sequence operation, so
this check is unnecessary.

Cc: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 6136d2b4 23-Sep-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: use idr for stateid's

The idr system is designed exactly for generating id and looking up
integer id's. Thanks to Trond for pointing it out.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 2a74aba7 23-Sep-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: move client * to nfs4_stateid, add init_stid helper

This will be convenient.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# f7a4d872 16-Sep-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: hash closed stateid's like any other

Look up closed stateid's in the stateid hash like any other stateid
rather than searching the close lru.

This is simpler, and fixes a bug: currently we handle only the case of a
close that is the last close for a given stateowner, but not the case of
a close for a stateowner that still has active opens on other files.
Thus in a case like:

open(owner, file1)
open(owner, file2)
close(owner, file2)
close(owner, file2)

the final close won't be recognized as a retransmission.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# d3b313a4 15-Sep-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: construct stateid from clientid and counter

Including the full clientid in the on-the-wire stateid allows more
reliable detection of bad vs. expired stateid's, simplifies code, and
ensures we won't reuse the opaque part of the stateid (as we currently
do when the same openowner closes and reopens the same file).

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 38c387b5 16-Sep-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: match close replays on stateid, not open owner id

Keep around an unhashed copy of the final stateid after the last close
using an openowner, and when identifying a replay, match against that
stateid instead of just against the open owner id. Free it the next
time the seqid is bumped or the stateowner is destroyed.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# dad1c067 11-Sep-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: replace oo_confirmed by flag bit

I want at least one more bit here. So, let's haul out the caps lock key
and add a flags field.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# f459e453 09-Sep-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: hash deleg stateid's like any other

It's simpler to look up delegation stateid's in the same hash table as
any other stateid.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# d5477a8d 07-Sep-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: add common dl_stid field to delegation

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# dcef0413 07-Sep-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: move some of nfs4_stateid into a separate structure

We want delegations to share more with open/lock stateid's, so first
we'll pull out some of the common stuff we want to share.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 2288d0e3 06-Sep-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: pass around typemask instead of flags

We're only using those flags to choose lock or open stateid's at this
point.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# c0a5d93e 06-Sep-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: split preprocess_seqid, cleanup

Move most of this into helper functions. Also move the non-CONFIRM case
into caller, providing a helper function for that purpose.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# fe0750e5 30-Jul-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: split stateowners into open and lockowners

The stateowner has some fields that only make sense for openowners, and
some that only make sense for lockowners, and I find it a lot clearer if
those are separated out.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# f4dee24c 02-Sep-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: move CLOSE_STATE special case to caller

Move the CLOSE_STATE case into the unique caller that cares about it
rather than putting it in preprocess_seqid_op.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 7c13f344 30-Aug-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: drop most stateowner refcounting

Maybe we'll bring it back some day, but we don't have much real use for
it now.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 81b82965 23-Aug-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: simplify stateid generation code, fix wraparound

Follow the recommendation from rfc3530bis for stateid generation number
wraparound, simplify some code, and fix or remove incorrect comments.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 5fa0bbb4 31-Aug-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: simplify distinguishing lock & open stateid's

The trick free_stateid is using is a little cheesy, and we'll have more
uses for this field later.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# c2d8eb7a 29-Aug-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: remove typoed replay field

Wow, I wonder how long that typo's been there.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 28dde241 22-Aug-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: remove HAS_SESSION

This flag doesn't really buy us anything.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 48483bf2 26-Aug-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: simplify recovery dir setting

Move around some of this code, simplify a bit.

Reviewed-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 57616300 10-Aug-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: fix seqid_mutating_error

The set of errors here does *not* agree with the set of errors specified
in the rfc!

While we're there, turn this macros into a function, for the usual
reasons, and move it to the one place where it's actually used.

Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 17456804 13-Jul-2011 Bryan Schumaker <bjschuma@netapp.com>

NFSD: Added TEST_STATEID operation

This operation is used by the client to check the validity of a list of
stateids.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 9ae78bcc 16-Mar-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: fix comment and remove unused nfsd4_file fields

A couple fields here were left over from a previous version of a patch,
and are no longer used.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# acfdf5c3 31-Jan-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: acquire only one lease per file

Instead of acquiring one lease each time another client opens a file,
nfsd can acquire just one lease to represent all of them, and reference
count it to determine when to release it.

This fixes a regression introduced by
c45821d263a8a5109d69a9e8942b8d65bcd5f31a "locks: eliminate fl_mylease
callback": after that patch, only the struct file * is used to determine
who owns a given lease. But since we recently converted the server to
share a single struct file per open, if we acquire multiple leases on
the same file from nfsd, it then becomes impossible on unlocking a lease
to determine which of those leases (all of whom share the same struct
file *) we meant to remove.

Thanks to Takashi Iwai <tiwai@suse.de> for catching a bug in a previous
version of this patch.

Tested-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 5ce8ba25 10-Jan-2011 J. Bruce Fields <bfields@redhat.com>

nfsd4: allow restarting callbacks

If we lose the backchannel and then the client repairs the problem,
resend any callbacks.

We use a new cb_done flag to track whether there is still work to be
done for the callback or whether it can be destroyed with the rpc.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 84f5f7cc 09-Dec-2010 J. Bruce Fields <bfields@redhat.com>

nfsd4: make sure sequence flags are set after destroy_session

If this loses any backchannel, make sure we have a chance to notice that
and set the sequence flags.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 77a3569d 30-Apr-2010 J. Bruce Fields <bfields@citi.umich.edu>

nfsd4: keep finer-grained callback status

Distinguish between when the callback channel is known to be down, and
when it is not yet confirmed. This will be useful in the 4.1 case.

Also, we don't seem to be using the fact that this field is atomic.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 1d1bc8f2 04-Oct-2010 J. Bruce Fields <bfields@redhat.com>

nfsd4: support BIND_CONN_TO_SESSION

Basic xdr and processing for BIND_CONN_TO_SESSION. This adds a
connection to the list of connections associated with a session.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 6f3d772f 14-Dec-2010 Takuma Umeya <tumeya@redhat.com>

nfs4: set source address when callback is generated

when callback is generated in NFSv4 server, it doesn't set the source
address. When an alias IP is utilized on NFSv4 server and suppose the
client is accessing via that alias IP (e.g. eth0:0), the client invokes
the callback to the IP address that is set on the original device (e.g.
eth0). This behavior results in timeout of xprt.
The patch sets the IP address that the client should invoke callback to.

Signed-off-by: Takuma Umeya <tumeya@redhat.com>
[bfields@redhat.com: Simplify gen_callback arguments, use helper function]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# c84d500b 30-Oct-2010 J. Bruce Fields <bfields@redhat.com>

nfsd4: use a single struct file for delegations

When we converted to sharing struct filess between nfs4 opens I went too
far and also used the same mechanism for delegations. But keeping
a reference to the struct file ensures it will outlast the lease, and
allows us to remove the lease with the same file as we added it.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 8323c3b2 19-Oct-2010 J. Bruce Fields <bfields@redhat.com>

nfsd4: move minorversion to client

The minorversion seems more a property of the client than the callback
channel.

Some time we should probably also enforce consistent minorversion usage
from the client; for now, this is just a cosmetic change.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 5a3c9d71 19-Oct-2010 J. Bruce Fields <bfields@redhat.com>

nfsd4: separate callback change and callback probe

Only one of the nfsd4_callback_probe callers actually cares about
changing the callback information.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 8b5ce5cd 19-Oct-2010 J. Bruce Fields <bfields@redhat.com>

nfsd4: callback program number is per-session

The callback program is allowed to depend on the session which the
callback is going over.

No change in behavior yet, while we still only do callbacks over a
single session for the lifetime of the client.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# ac7c46f2 14-Jun-2010 J. Bruce Fields <bfields@citi.umich.edu>

nfsd4: make backchannel sequence number per-session

Currently we don't deal well with a client that has multiple sessions
associated with it (even simultaneously, or serially over the lifetime
of the client).

In particular, we don't attempt to keep the backchannel running after
the original session diseappears.

We will fix that soon.

Once we do that, we need the slot sequence number to be per-session;
otherwise, for example, we cannot correctly handle a case like this:

- All session 1 connections are lost.
- The client creates session 2. We use it for the backchannel
(since it's the only working choice).
- The client gives us a new connection to use with session 1.
- The client destroys session 2.

At this point our only choice is to go back to using session 1. When we
do so we must use the sequence number that is next for session 1. We
therefore need to maintain multiple sequence number streams.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 90c8145b 14-Jun-2010 J. Bruce Fields <bfields@citi.umich.edu>

nfsd4: use client pointer to backchannel session

Instead of copying the sessionid, use the new cl_cb_session pointer,
which indicates which session we're using for the backchannel.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# edd76786 14-Jun-2010 J. Bruce Fields <bfields@citi.umich.edu>

nfsd4: move callback setup into session init code

The backchannel should be associated with a session, it isn't really
global to the client.

We do, however, want a pointer global to the client which tracks which
session we're currently using for client-based callbacks.

This is a first step in that direction; for now, just reshuffling of
code with no significant change in behavior.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 19cf5c02 06-Jun-2010 J. Bruce Fields <bfields@citi.umich.edu>

nfsd4: use callbacks on svc_xprt_deletion

Remove connections from the list when they go down.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# c7662518 06-Jun-2010 J. Bruce Fields <bfields@citi.umich.edu>

nfsd4: keep per-session list of connections

The spec requires us in various places to keep track of the connections
associated with each session.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 6ff8da08 04-Jun-2010 J. Bruce Fields <bfields@citi.umich.edu>

nfsd4: Move callback setup to callback queue

Instead of creating the new rpc client from a regular server thread,
set a flag, kick off a null call, and allow the null call to do the work
of setting up the client on the callback workqueue.

Use a spinlock to ensure the callback work gets a consistent view of the
callback parameters.

This allows, for example, changing the callback from contexts where
sleeping is not allowed. I hope it will also keep the locking simple as
we add more session and trunking features, by serializing most of the
callback-specific work.

This also closes a small race where the the new cb_ident could be used
with an old connection (or vice-versa).

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# fb003923 31-May-2010 J. Bruce Fields <bfields@citi.umich.edu>

nfsd4: remove separate cb_args struct

I don't see the point of the separate struct. It seems to just be
getting in the way.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# cee277d9 26-May-2010 J. Bruce Fields <bfields@citi.umich.edu>

nfsd4: use generic callback code in null case

This will eventually allow us, for example, to kick off null callback
from contexts where we can't sleep.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 5878453d 16-May-2010 J. Bruce Fields <bfields@citi.umich.edu>

nfsd4: generic callback code

Make the recall callback code more generic, so that other callbacks
will be able to use it too.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 1c855602 26-May-2010 J. Bruce Fields <bfields@citi.umich.edu>

nfsd4: rename nfs4_rpc_args->nfsd4_cb_args

With apologies for the gratuitous rename, the new name seems more
helpful to me.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 586f3673 26-May-2010 J. Bruce Fields <bfields@citi.umich.edu>

nfsd4: combine nfs4_rpc_args and nfsd4_cb_sequence

These two structs don't really need to be distinct as far as I can tell.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 7d947842 20-Aug-2010 J. Bruce Fields <bfields@redhat.com>

nfsd4: fix downgrade/lock logic

If we already had a RW open for a file, and get a readonly open, we were
piggybacking on the existing RW open. That's inconsistent with the
downgrade logic which blows away the RW open assuming you'll still have
a readonly open.

Also, make sure there is a readonly or writeonly open available for
locking, again to prevent bad behavior in downgrade cases when any RW
open may be lost.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 18608ad4 20-Aug-2010 J. Bruce Fields <bfields@redhat.com>

nfsd4: typo fix in find_any_file

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# f9d7562f 08-Jul-2010 J. Bruce Fields <bfields@citi.umich.edu>

nfsd4: share file descriptors between stateid's

The vfs doesn't really allow us to "upgrade" a file descriptor from
read-only to read-write, and our attempt to do so in nfs4_upgrade_open
is ugly and incomplete.

Move to a different scheme where we keep multiple opens, shared between
open stateid's, in the nfs4_file struct. Each file will be opened at
most 3 times (for read, write, and read-write), and those opens will be
shared between all clients and openers. On upgrade we will do another
open if necessary instead of attempting to upgrade an existing open.
We keep count of the number of readers and writers so we know when to
close the shared files.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# d7682988 11-May-2010 Benny Halevy <bhalevy@panasas.com>

nfsd4: keep a reference count on client while in use

Get a refcount on the client on SEQUENCE,
Release the refcount and renew the client when all respective compounds completed.
Do not expire the client by the laundromat while in use.
If the client was expired via another path, free it when the compounds
complete and the refcount reaches 0.

Note that unhash_client_locked must call list_del_init on cl_lru as
it may be called twice for the same client (once from nfs4_laundromat
and then from expire_client)

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 07cd4909 11-May-2010 Benny Halevy <bhalevy@panasas.com>

nfsd4: mark_client_expired

Mark the client as expired under the client_lock so it won't be renewed
when an nfsv4.1 session is done, after it was explicitly expired
during processing of the compound.

Do not renew a client mark as expired (in particular, it is not
on the lru list anymore)

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 46583e25 11-May-2010 Benny Halevy <bhalevy@panasas.com>

nfsd4: introduce nfs4_client.cl_refcount

Currently just initialize the cl_refcount to 1
and decrement in expire_client(), conditionally freeing the
client when the refcount reaches 0.

To be used later by nfsv4.1 compounds to keep the client from
timing out while in use.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 4b21d0de 07-Mar-2010 J. Bruce Fields <bfields@citi.umich.edu>

nfsd4: allow 4.0 clients to change callback path

The rfc allows a client to change the callback parameters, but we didn't
previously implement it.

Teach the callbacks to rerun themselves (by placing themselves on a
workqueue) when they recognize that their rpc task has been killed and
that the callback connection has changed.

Then we can change the callback connection by setting up a new rpc
client, modifying the nfs4 client to point at it, waiting for any work
in progress to complete, and then shutting down the old client.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 2bf23875 07-Mar-2010 J. Bruce Fields <bfields@citi.umich.edu>

nfsd4: rearrange cb data structures

Mainly I just want to separate the arguments used for setting up the tcp
client from the rest.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# b12a05cb 04-Mar-2010 J. Bruce Fields <bfields@citi.umich.edu>

nfsd4: cl_count is unused

Now that the shutdown sequence guarantees callbacks are shut down before
the client is destroyed, we no longer have a use for cl_count.

We'll probably reinstate a reference count on the client some day, but
it will be held by users other than callbacks.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# b5a1a81e 03-Mar-2010 J. Bruce Fields <bfields@citi.umich.edu>

nfsd4: don't sleep in lease-break callback

The NFSv4 server's fl_break callback can sleep (dropping the BKL), in
order to allocate a new rpc task to send a recall to the client.

As far as I can tell this doesn't cause any races in the current code,
but the analysis is difficult. Also, the sleep here may complicate the
move away from the BKL.

So, just schedule some work to do the job for us instead. The work will
later also prove useful for restarting a call after the callback
information is changed.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 227f98d9 18-Feb-2010 J. Bruce Fields <bfields@citi.umich.edu>

nfsd4: preallocate nfs4_rpc_args

Instead of allocating this small structure, just include it in the
delegation.

The nfsd4_callback structure isn't really necessary yet, but we plan to
add to it all the information necessary to perform a callback.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 7663dacd 04-Dec-2009 J. Bruce Fields <bfields@citi.umich.edu>

nfsd: remove pointless paths in file headers

The new .h files have paths at the top that are now out of date. While
we're here, just remove all of those from fs/nfsd; they never served any
purpose.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 1557aca7 04-Dec-2009 J. Bruce Fields <bfields@citi.umich.edu>

nfsd: move most of nfsfh.h to fs/nfsd

Most of this can be trivially moved to a private header as well.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 9a74af21 03-Dec-2009 Boaz Harrosh <bharrosh@panasas.com>

nfsd: Move private headers to source directory

Lots of include/linux/nfsd/* headers are only used by
nfsd module. Move them to the source directory

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>