History log of /linux-master/fs/nfsd/nfssvc.c
Revision Date Author Comments
# 16fb9808 26-Jan-2024 Josef Bacik <josef@toxicpanda.com>

nfsd: make svc_stat per-network namespace instead of global

The final bit of stats that is global is the rpc svc_stat. Move this
into the nfsd_net struct and use that everywhere instead of the global
struct. Remove the unused global struct.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# e41ee44c 26-Jan-2024 Josef Bacik <josef@toxicpanda.com>

nfsd: remove nfsd_stats, make th_cnt a global counter

This is the last global stat, take it out of the nfsd_stats struct and
make it a global part of nfsd, report it the same as always.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 3f6ef182 26-Jan-2024 Josef Bacik <josef@toxicpanda.com>

sunrpc: remove ->pg_stats from svc_program

Now that this isn't used anywhere, remove it.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# f0943238 26-Jan-2024 Josef Bacik <josef@toxicpanda.com>

sunrpc: pass in the sv_stats struct through svc_create_pooled

Since only one service actually reports the rpc stats there's not much
of a reason to have a pointer to it in the svc_program struct. Adjust
the svc_create_pooled function to take the sv_stats as an argument and
pass the struct through there as desired instead of getting it from the
svc_program->pg_stats.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# a2214ed5 26-Jan-2024 Josef Bacik <josef@toxicpanda.com>

nfsd: stop setting ->pg_stats for unused stats

A lot of places are setting a blank svc_stats in ->pg_stats and never
utilizing these stats. Remove all of these extra structs as we're not
reporting these stats anywhere.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# ffb40259 14-Dec-2023 NeilBrown <neilb@suse.de>

nfsd: Don't leave work of closing files to a work queue

The work of closing a file can have non-trivial cost. Doing it in a
separate work queue thread means that cost isn't imposed on the nfsd
threads and an imbalance can be created. This can result in files being
queued for the work queue more quickly that the work queue can process
them, resulting in unbounded growth of the queue and memory exhaustion.

To avoid this work imbalance that exhausts memory, this patch moves all
closing of files into the nfsd threads. This means that when the work
imposes a cost, that cost appears where it would be expected - in the
work of the nfsd thread. A subsequent patch will ensure the final
__fput() is called in the same (nfsd) thread which calls filp_close().

Files opened for NFSv3 are never explicitly closed by the client and are
kept open by the server in the "filecache", which responds to memory
pressure, is garbage collected even when there is no pressure, and
sometimes closes files when there is particular need such as for rename.
These files currently have filp_close() called in a dedicated work
queue, so their __fput() can have no effect on nfsd threads.

This patch discards the work queue and instead has each nfsd thread call
flip_close() on as many as 8 files from the filecache each time it acts
on a client request (or finds there are no pending client requests). If
there are more to be closed, more threads are woken. This spreads the
work of __fput() over multiple threads and imposes any cost on those
threads.

The number 8 is somewhat arbitrary. It needs to be greater than 1 to
ensure that files are closed more quickly than they can be added to the
cache. It needs to be small enough to limit the per-request delays that
will be imposed on clients when all threads are busy closing files.

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 17419aef 14-Dec-2023 NeilBrown <neilb@suse.de>

nfsd: rename nfsd_last_thread() to nfsd_destroy_serv()

As this function now destroys the svc_serv, this is a better name.

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 1e3577a4 14-Dec-2023 NeilBrown <neilb@suse.de>

SUNRPC: discard sv_refcnt, and svc_get/svc_put

sv_refcnt is no longer useful.
lockd and nfs-cb only ever have the svc active when there are a non-zero
number of threads, so sv_refcnt mirrors sv_nrthreads.

nfsd also keeps the svc active between when a socket is added and when
the first thread is started, but we don't really need a refcount for
that. We can simply not destroy the svc while there are any permanent
sockets attached.

So remove sv_refcnt and the get/put functions.
Instead of a final call to svc_put(), call svc_destroy() instead.
This is changed to also store NULL in the passed-in pointer to make it
easier to avoid use-after-free situations.

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 7b207ccd 14-Dec-2023 NeilBrown <neilb@suse.de>

svc: don't hold reference for poolstats, only mutex.

A future patch will remove refcounting on svc_serv as it is of little
use.
It is currently used to keep the svc around while the pool_stats file is
open.
Change this to get the pointer, protected by the mutex, only in
seq_start, and the release the mutex in seq_stop.
This means that if the nfsd server is stopped and restarted while the
pool_stats file it open, then some pool stats info could be from the
first instance and some from the second. This might appear odd, but is
unlikely to be a problem in practice.

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# f3734cc4 26-Oct-2023 Oleg Nesterov <oleg@redhat.com>

NFSD: use read_seqbegin() rather than read_seqbegin_or_lock()

The usage of read_seqbegin_or_lock() in nfsd_copy_write_verifier()
is wrong. "seq" is always even and thus "or_lock" has no effect,
this code can never take ->writeverf_lock for writing.

I guess this is fine, nfsd_copy_write_verifier() just copies 8 bytes
and nfsd_reset_write_verifier() is supposed to be very rare operation
so we do not need the adaptive locking in this case.

Yet the code looks wrong and sub-optimal, it can use read_seqbegin()
without changing the behaviour.

[ cel: Note also that it eliminates this Sparse warning:

fs/nfsd/nfssvc.c:360:6: warning: context imbalance in 'nfsd_copy_write_verifier' -
different lock contexts for basic block

]

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 2a501f55 14-Dec-2023 NeilBrown <neilb@suse.de>

nfsd: call nfsd_last_thread() before final nfsd_put()

If write_ports_addfd or write_ports_addxprt fail, they call nfsd_put()
without calling nfsd_last_thread(). This leaves nn->nfsd_serv pointing
to a structure that has been freed.

So remove 'static' from nfsd_last_thread() and call it when the
nfsd_serv is about to be destroyed.

Fixes: ec52361df99b ("SUNRPC: stop using ->sv_nrthreads as a refcount")
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# ae191417 15-Dec-2023 Jens Axboe <axboe@kernel.dk>

cred: get rid of CONFIG_DEBUG_CREDENTIALS

This code is rarely (never?) enabled by distros, and it hasn't caught
anything in decades. Let's kill off this legacy debug code.

Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# bf51c52a 10-Nov-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Fix checksum mismatches in the duplicate reply cache

nfsd_cache_csum() currently assumes that the server's RPC layer has
been advancing rq_arg.head[0].iov_base as it decodes an incoming
request, because that's the way it used to work. On entry, it
expects that buf->head[0].iov_base points to the start of the NFS
header, and excludes the already-decoded RPC header.

These days however, head[0].iov_base now points to the start of the
RPC header during all processing. It no longer points at the NFS
Call header when execution arrives at nfsd_cache_csum().

In a retransmitted RPC the XID and the NFS header are supposed to
be the same as the original message, but the contents of the
retransmitted RPC header can be different. For example, for krb5,
the GSS sequence number will be different between the two. Thus if
the RPC header is always included in the DRC checksum computation,
the checksum of the retransmitted message might not match the
checksum of the original message, even though the NFS part of these
messages is identical.

The result is that, even if a matching XID is found in the DRC,
the checksum mismatch causes the server to execute the
retransmitted RPC transaction again.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 1caf5f61 10-Nov-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Fix "start of NFS reply" pointer passed to nfsd_cache_update()

The "statp + 1" pointer that is passed to nfsd_cache_update() is
supposed to point to the start of the egress NFS Reply header. In
fact, it does point there for AUTH_SYS and RPCSEC_GSS_KRB5 requests.

But both krb5i and krb5p add fields between the RPC header's
accept_stat field and the start of the NFS Reply header. In those
cases, "statp + 1" points at the extra fields instead of the Reply.
The result is that nfsd_cache_update() caches what looks to the
client like garbage.

A connection break can occur for a number of reasons, but the most
common reason when using krb5i/p is a GSS sequence number window
underrun. When an underrun is detected, the server is obliged to
drop the RPC and the connection to force a retransmit with a fresh
GSS sequence number. The client presents the same XID, it hits in
the server's DRC, and the server returns the garbage cache entry.

The "statp + 1" argument has been used since the oldest changeset
in the kernel history repo, so it has been in nfsd_dispatch()
literally since before history began. The problem arose only when
the server-side GSS implementation was added twenty years ago.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Jeff Layton <jlayton@kernel.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# bf320752 24-Sep-2023 NeilBrown <neilb@suse.de>

NFSD: simplify error paths in nfsd_svc()

The error paths in nfsd_svc() are needlessly complex and can result in a
final call to svc_put() without nfsd_last_thread() being called. This
results in the listening sockets not being closed properly.

The per-netns setup provided by nfsd_startup_new() and removed by
nfsd_shutdown_net() is needed precisely when there are running threads.
So we don't need nfsd_up_before. We don't need to know if it *was* up.
We only need to know if any threads are left. If none are, then we must
call nfsd_shutdown_net(). But we don't need to do that explicitly as
nfsd_last_thread() does that for us.

So simply call nfsd_last_thread() before the last svc_put() if there are
no running threads. That will always do the right thing.

Also discard:
pr_info("nfsd: last server has exited, flushing export cache\n");
It may not be true if an attempt to start the first server failed, and
it isn't particularly helpful and it simply reports normal behaviour.

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# bd9d6a3e 11-Sep-2023 Lorenzo Bianconi <lorenzo@kernel.org>

NFSD: add rpc_status netlink support

Introduce rpc_status netlink support for NFSD in order to dump pending
RPC requests debugging information from userspace.

Closes: https://bugzilla.linux-nfs.org/show_bug.cgi?id=366
Tested-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 2e8fc923 11-Sep-2023 NeilBrown <neilb@suse.de>

SUNRPC: change sp_nrthreads to atomic_t

Using an atomic_t avoids the need to take a spinlock (which can soon be
removed).

Choosing a thread to kill needs to be careful as we cannot set the "die
now" bit atomically with the test on the count. Instead we temporarily
increase the count.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# fa341560 11-Sep-2023 NeilBrown <neilb@suse.de>

SUNRPC: change how svc threads are asked to exit.

svc threads are currently stopped using kthread_stop(). This requires
identifying a specific thread. However we don't care which thread
stops, just as long as one does.

So instead, set a flag in the svc_pool to say that a thread needs to
die, and have each thread check this flag instead of calling
kthread_should_stop(). The first thread to find and clear this flag
then moves towards exiting.

This removes an explicit dependency on sp_all_threads which will make a
future patch simpler.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 88956eab 11-Sep-2023 NeilBrown <neilb@suse.de>

NFSD: fix possible oops when nfsd/pool_stats is closed.

If /proc/fs/nfsd/pool_stats is open when the last nfsd thread exits, then
when the file is closed a NULL pointer is dereferenced.
This is because nfsd_pool_stats_release() assumes that the
pointer to the svc_serv cannot become NULL while a reference is held.

This used to be the case but a recent patch split nfsd_last_thread() out
from nfsd_put(), and clearing the pointer is done in nfsd_last_thread().

This is easily reproduced by running
rpc.nfsd 8 ; ( rpc.nfsd 0;true) < /proc/fs/nfsd/pool_stats

Fortunately nfsd_pool_stats_release() has easy access to the svc_serv
pointer, and so can call svc_put() on it directly.

Fixes: 9f28a971ee9f ("nfsd: separate nfsd_last_thread() from nfsd_put()")
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# c743b425 18-Jul-2023 NeilBrown <neilb@suse.de>

SUNRPC: remove timeout arg from svc_recv()

Most svc threads have no interest in a timeout.
nfsd sets it to 1 hour, but this is a wart of no significance.

lockd uses the timeout so that it can call nlmsvc_retry_blocked().
It also sometimes calls svc_wake_up() to ensure this is called.

So change lockd to be consistent and always use svc_wake_up() to trigger
nlmsvc_retry_blocked() - using a timer instead of a timeout to
svc_recv().

And change svc_recv() to not take a timeout arg.

This makes the sp_threads_timedout counter always zero.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 7b719e2b 18-Jul-2023 NeilBrown <neilb@suse.de>

SUNRPC: change svc_recv() to return void.

svc_recv() currently returns a 0 on success or one of two errors:
- -EAGAIN means no message was successfully received
- -EINTR means the thread has been told to stop

Previously nfsd would stop as the result of a signal as well as
following kthread_stop(). In that case the difference was useful: EINTR
means stop unconditionally. EAGAIN means stop if kthread_should_stop(),
continue otherwise.

Now threads only exit when kthread_should_stop() so we don't need the
distinction.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# f78116d3 18-Jul-2023 NeilBrown <neilb@suse.de>

SUNRPC: call svc_process() from svc_recv().

All callers of svc_recv() go on to call svc_process() on success.
Simplify callers by having svc_recv() do that for them.

This loses one call to validate_process_creds() in nfsd. That was
debugging code added 14 years ago. I don't think we need to keep it.

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 9f28a971 31-Jul-2023 NeilBrown <neilb@suse.de>

nfsd: separate nfsd_last_thread() from nfsd_put()

Now that the last nfsd thread is stopped by an explicit act of calling
svc_set_num_threads() with a count of zero, we only have a limited
number of places that can happen, and don't need to call
nfsd_last_thread() in nfsd_put()

So separate that out and call it at the two places where the number of
threads is set to zero.

Move the clearing of ->nfsd_serv and the call to svc_xprt_destroy_all()
into nfsd_last_thread(), as they are really part of the same action.

nfsd_put() is now a thin wrapper around svc_put(), so make it a static
inline.

nfsd_put() cannot be called after nfsd_last_thread(), so in a couple of
places we have to use svc_put() instead.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 18e4cf91 31-Jul-2023 NeilBrown <neilb@suse.de>

nfsd: Simplify code around svc_exit_thread() call in nfsd()

Previously a thread could exit asynchronously (due to a signal) so some
care was needed to hold nfsd_mutex over the last svc_put() call. Now a
thread can only exit when svc_set_num_threads() is called, and this is
always called under nfsd_mutex. So no care is needed.

Not only is the mutex held when a thread exits now, but the svc refcount
is elevated, so the svc_put() in svc_exit_thread() will never be a final
put, so the mutex isn't even needed at this point in the code.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 39039024 18-Jul-2023 NeilBrown <neilb@suse.de>

nfsd: don't allow nfsd threads to be signalled.

The original implementation of nfsd used signals to stop threads during
shutdown.
In Linux 2.3.46pre5 nfsd gained the ability to shutdown threads
internally it if was asked to run "0" threads. After this user-space
transitioned to using "rpc.nfsd 0" to stop nfsd and sending signals to
threads was no longer an important part of the API.

In commit 3ebdbe5203a8 ("SUNRPC: discard svo_setup and rename
svc_set_num_threads_sync()") (v5.17-rc1~75^2~41) we finally removed the
use of signals for stopping threads, using kthread_stop() instead.

This patch makes the "obvious" next step and removes the ability to
signal nfsd threads - or any svc threads. nfsd stops allowing signals
and we don't check for their delivery any more.

This will allow for some simplification in later patches.

A change worth noting is in nfsd4_ssc_setup_dul(). There was previously
a signal_pending() check which would only succeed when the thread was
being shut down. It should really have tested kthread_should_stop() as
well. Now it just does the latter, not the former.

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# e7421ce7 09-Jul-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Rename struct svc_cacherep

The svc_ prefix is identified with the SunRPC layer. Although the
duplicate reply cache caches RPC replies, it is only for the NFS
protocol. Rename the struct to better reflect its purpose.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# cb18eca4 09-Jul-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Remove svc_rqst::rq_cacherep

Over time I'd like to see NFS-specific fields moved out of struct
svc_rqst, which is an RPC layer object. These fields are layering
violations.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 5e092be7 16-Jun-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: Distinguish per-net namespace initialization

I find the naming of nfsd_init_net() and nfsd_startup_net() to be
confusingly similar. Rename the namespace initialization and tear-
down ops and add comments to distinguish their separate purposes.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 90d21755 14-Feb-2023 Chuck Lever <chuck.lever@oracle.com>

NFSD: copy the whole verifier in nfsd_copy_write_verifier

Currently, we're only memcpy'ing the first __be32. Ensure we copy into
both words.

Fixes: 91d2e9b56cf5 ("NFSD: Clean up the nfsd_net::nfssvc_boot field")
Reported-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# f5f9d4a3 11-Jan-2023 Jeff Layton <jlayton@kernel.org>

nfsd: move reply cache initialization into nfsd startup

There's no need to start the reply cache before nfsd is up and running,
and doing so means that we register a shrinker for every net namespace
instead of just the ones where nfsd is running.

Move it to the per-net nfsd startup instead.

Reported-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# cee4db19 08-Jan-2023 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Refactor RPC server dispatch method

Currently, svcauth_gss_accept() pre-reserves response buffer space
for the RPC payload length and GSS sequence number before returning
to the dispatcher, which then adds the header's accept_stat field.

The problem is the accept_stat field is supposed to go before the
length and seq_num fields. So svcauth_gss_release() has to relocate
the accept_stat value (see svcauth_gss_prepare_to_wrap()).

To enable these fields to be added to the response buffer in the
correct (final) order, the pointer to the accept_stat has to be made
available to svcauth_gss_accept() so that it can set it before
reserving space for the length and seq_num fields.

As a first step, move the pointer to the location of the accept_stat
field into struct svc_rqst.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 8dd41d70 08-Jan-2023 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Push svcxdr_init_encode() into svc_process_common()

Now that all vs_dispatch functions invoke svcxdr_init_encode(), it
is common code and can be pushed down into the generic RPC server.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# dba5eaa4 01-Jan-2023 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Push svcxdr_init_decode() into svc_process_common()

Now that all vs_dispatch functions invoke svcxdr_init_decode(), it
is common code and can be pushed down into the generic RPC server.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 789e1e10 22-Dec-2022 Jeff Layton <jlayton@kernel.org>

nfsd: shut down the NFSv4 state objects before the filecache

Currently, we shut down the filecache before trying to clean up the
stateids that depend on it. This leads to the kernel trying to free an
nfsd_file twice, and a refcount overput on the nf_mark.

Change the shutdown procedure to tear down all of the stateids prior
to shutting down the filecache.

Reported-and-tested-by: Wang Yugui <wangyugui@e16-tech.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Fixes: 5e113224c17e ("nfsd: nfsd_file cache entries should be per net namespace")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 93155647 26-Nov-2022 Chuck Lever <chuck.lever@oracle.com>

NFSD: Use only RQ_DROPME to signal the need to drop a reply

Clean up: NFSv2 has the only two usages of rpc_drop_reply in the
NFSD code base. Since NFSv2 is going away at some point, replace
these in order to simplify the "drop this reply?" check in
nfsd_dispatch().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>


# 2f3a4b2a 18-Oct-2022 Jeff Layton <jlayton@kernel.org>

nfsd: allow disabling NFSv2 at compile time

rpc.nfsd stopped supporting NFSv2 a year ago. Take the next logical
step toward deprecating it and allow NFSv2 support to be compiled out.

Add a new CONFIG_NFSD_V2 option that can be turned off and rework the
CONFIG_NFSD_V?_ACL option dependencies. Add a description that
discourages enabling it.

Also, change the description of CONFIG_NFSD to state that the always-on
version is now 3 instead of 2.

Finally, add an #ifdef around "case 2:" in __write_versions. When NFSv2
is disabled at compile time, this should make the kernel ignore attempts
to disable it at runtime, but still error out when trying to enable it.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Tom Talpey <tom@talpey.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 72f78ae0 18-Aug-2022 Wolfram Sang <wsa+renesas@sang-engineering.com>

NFSD: move from strlcpy with unused retval to strscpy

Follow the advice of the below link and prefer 'strscpy' in this
subsystem. Conversion is 1:1 because the return value is not used.
Generated by a coccinelle script.

Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 5f9a62ff 05-Feb-2022 Chuck Lever <chuck.lever@oracle.com>

NFSD: Remove CONFIG_NFSD_V3

Eventually support for NFSv2 in the Linux NFS server is to be
deprecated and then removed.

However, NFSv2 is the "always supported" version that is available
as soon as CONFIG_NFSD is set. Before NFSv2 support can be removed,
we need to choose a different "always supported" version.

This patch removes CONFIG_NFSD_V3 so that NFSv3 is always supported,
as NFSv2 is today. When NFSv2 support is removed, NFSv3 will become
the only "always supported" NFS version.

The defconfigs still need to be updated to remove CONFIG_NFSD_V3=y.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 37902c63 15-Feb-2022 Chuck Lever <chuck.lever@oracle.com>

NFSD: Move svc_serv_ops::svo_function into struct svc_serv

Hoist svo_function back into svc_serv and remove struct
svc_serv_ops, since the struct is now devoid of fields.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# f49169c9 15-Feb-2022 Chuck Lever <chuck.lever@oracle.com>

NFSD: Remove svc_serv_ops::svo_module

struct svc_serv_ops is about to be removed.

Neil Brown says:
> I suspect svo_module can go as well - I don't think the thread is
> ever the thing that primarily keeps a module active.

A random sample of kthread_create() callers shows sunrpc is the only
one that manages module reference count in this way.

Suggested-by: Neil Brown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# c7d7ec8f 26-Jan-2022 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Remove svc_shutdown_net()

Clean up: svc_shutdown_net() now does nothing but call
svc_close_net(). Replace all external call sites.

svc_close_net() is renamed to be the inverse of svc_xprt_create().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 352ad314 26-Jan-2022 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Rename svc_create_xprt()

Clean up: Use the "svc_xprt_<task>" function naming convention as
is used for other external APIs.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 87cdd864 25-Jan-2022 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Remove svo_shutdown method

Clean up. Neil observed that "any code that calls svc_shutdown_net()
knows what the shutdown function should be, and so can call it
directly."

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: NeilBrown <neilb@suse.de>


# a9ff2e99 25-Jan-2022 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Remove the .svo_enqueue_xprt method

We have never been able to track down and address the underlying
cause of the performance issues with workqueue-based service
support. svo_enqueue_xprt is called multiple times per RPC, so
it adds instruction path length, but always ends up at the same
function: svc_xprt_do_enqueue(). We do not anticipate needing
this flexibility for dynamic nfsd thread management support.

As a micro-optimization, remove .svo_enqueue_xprt because
Spectre/Meltdown makes virtual function calls more costly.

This change essentially reverts commit b9e13cdfac70 ("nfsd/sunrpc:
turn enqueueing a svc_xprt into a svc_serv operation").

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# ca3574bd 03-Dec-2021 Eric W. Biederman <ebiederm@xmission.com>

exit: Rename module_put_and_exit to module_put_and_kthread_exit

Update module_put_and_exit to call kthread_exit instead of do_exit.

Change the name to reflect this change in functionality. All of the
users of module_put_and_exit are causing the current kthread to exit
so this change makes it clear what is happening. There is no
functional change.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>


# 3988a578 30-Dec-2021 Chuck Lever <chuck.lever@oracle.com>

NFSD: Rename boot verifier functions

Clean up: These functions handle what the specs call a write
verifier, which in the Linux NFS server implementation is now
divorced from the server's boot instance

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 91d2e9b5 29-Dec-2021 Chuck Lever <chuck.lever@oracle.com>

NFSD: Clean up the nfsd_net::nfssvc_boot field

There are two boot-time fields in struct nfsd_net: one called
boot_time and one called nfssvc_boot. The latter is used only to
form write verifiers, but its documenting comment declares:

/* Time of server startup */

Since commit 27c438f53e79 ("nfsd: Support the server resetting the
boot verifier"), this field can be reset at any time; it's no
longer tied to server restart. So that comment is stale.

Also, according to pahole, struct timespec64 is 16 bytes long on
x86_64. The nfssvc_boot field is used only to form a write verifier,
which is 8 bytes long.

Let's clarify this situation by manufacturing an 8-byte verifier
in nfs_reset_boot_verifier() and storing only that in struct
nfsd_net.

We're grabbing 128 bits of time, so compress all of those into a
64-bit verifier instead of throwing out the high-order bits.
In the future, the siphash_key can be re-used for other hashed
objects per-nfsd_net.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# cdc55660 30-Dec-2021 Chuck Lever <chuck.lever@oracle.com>

NFSD: Write verifier might go backwards

When vfs_iter_write() starts to fail because a file system is full,
a bunch of writes can fail at once with ENOSPC. These writes
repeatedly invoke nfsd_reset_boot_verifier() in quick succession.

Ensure that the time it grabs doesn't go backwards due to an ntp
adjustment going on at the same time.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# d057cfec 28-Nov-2021 NeilBrown <neilb@suse.de>

NFSD: simplify locking for network notifier.

nfsd currently maintains an open-coded read/write semaphore (refcount
and wait queue) for each network namespace to ensure the nfs service
isn't shut down while the notifier is running.

This is excessive. As there is unlikely to be contention between
notifiers and they run without sleeping, a single spinlock is sufficient
to avoid problems.

Signed-off-by: NeilBrown <neilb@suse.de>
[ cel: ensure nfsd_notifier_lock is static ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 3ebdbe52 28-Nov-2021 NeilBrown <neilb@suse.de>

SUNRPC: discard svo_setup and rename svc_set_num_threads_sync()

The ->svo_setup callback serves no purpose. It is always called from
within the same module that chooses which callback is needed. So
discard it and call the relevant function directly.

Now that svc_set_num_threads() is no longer used remove it and rename
svc_set_num_threads_sync() to remove the "_sync" suffix.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 3409e4f1 28-Nov-2021 NeilBrown <neilb@suse.de>

NFSD: Make it possible to use svc_set_num_threads_sync

nfsd cannot currently use svc_set_num_threads_sync. It instead
uses svc_set_num_threads which does *not* wait for threads to all
exit, and has a separate mechanism (nfsd_shutdown_complete) to wait
for completion.

The reason that nfsd is unlike other services is that nfsd threads can
exit separately from svc_set_num_threads being called - they die on
receipt of SIGKILL. Also, when the last thread exits, the service must
be shut down (sockets closed).

For this, the nfsd_mutex needs to be taken, and as that mutex needs to
be held while svc_set_num_threads is called, the one cannot wait for
the other.

This patch changes the nfsd thread so that it can drop the ref on the
service without blocking on nfsd_mutex, so that svc_set_num_threads_sync
can be used:
- if it can drop a non-last reference, it does that. This does not
trigger shutdown and does not require a mutex. This will likely
happen for all but the last thread signalled, and for all threads
being shut down by nfsd_shutdown_threads()
- if it can get the mutex without blocking (trylock), it does that
and then drops the reference. This will likely happen for the
last thread killed by SIGKILL
- Otherwise there might be an unrelated task holding the mutex,
possibly in another network namespace, or nfsd_shutdown_threads()
might be just about to get a reference on the service, after which
we can drop ours safely.
We cannot conveniently get wakeup notifications on these events,
and we are unlikely to need to, so we sleep briefly and check again.

With this we can discard nfsd_shutdown_complete and
nfsd_complete_shutdown(), and switch to svc_set_num_threads_sync.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 9d3792ae 28-Nov-2021 NeilBrown <neilb@suse.de>

NFSD: narrow nfsd_mutex protection in nfsd thread

There is nothing happening in the start of nfsd() that requires
protection by the mutex, so don't take it until shutting down the thread
- which does still require protection - but only for nfsd_put().

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 2a36395f 28-Nov-2021 NeilBrown <neilb@suse.de>

SUNRPC: use sv_lock to protect updates to sv_nrthreads.

Using sv_lock means we don't need to hold the service mutex over these
updates.

In particular, svc_exit_thread() no longer requires synchronisation, so
threads can exit asynchronously.

Note that we could use an atomic_t, but as there are many more read
sites than writes, that would add unnecessary noise to the code.
Some reads are already racy, and there is no need for them to not be.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 9b6c8c9b 28-Nov-2021 NeilBrown <neilb@suse.de>

nfsd: make nfsd_stats.th_cnt atomic_t

This allows us to move the updates for th_cnt out of the mutex.
This is a step towards reducing mutex coverage in nfsd().

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# ec52361d 28-Nov-2021 NeilBrown <neilb@suse.de>

SUNRPC: stop using ->sv_nrthreads as a refcount

The use of sv_nrthreads as a general refcount results in clumsy code, as
is seen by various comments needed to explain the situation.

This patch introduces a 'struct kref' and uses that for reference
counting, leaving sv_nrthreads to be a pure count of threads. The kref
is managed particularly in svc_get() and svc_put(), and also nfsd_put();

svc_destroy() now takes a pointer to the embedded kref, rather than to
the serv.

nfsd allows the svc_serv to exist with ->sv_nrhtreads being zero. This
happens when a transport is created before the first thread is started.
To support this, a 'keep_active' flag is introduced which holds a ref on
the svc_serv. This is set when any listening socket is successfully
added (unless there are running threads), and cleared when the number of
threads is set. So when the last thread exits, the nfs_serv will be
destroyed.
The use of 'keep_active' replaces previous code which checked if there
were any permanent sockets.

We no longer clear ->rq_server when nfsd() exits. This was done
to prevent svc_exit_thread() from calling svc_destroy().
Instead we take an extra reference to the svc_serv to prevent
svc_destroy() from being called.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 8c62d127 28-Nov-2021 NeilBrown <neilb@suse.de>

SUNRPC/NFSD: clean up get/put functions.

svc_destroy() is poorly named - it doesn't necessarily destroy the svc,
it might just reduce the ref count.
nfsd_destroy() is poorly named for the same reason.

This patch:
- removes the refcount functionality from svc_destroy(), moving it to
a new svc_put(). Almost all previous callers of svc_destroy() now
call svc_put().
- renames nfsd_destroy() to nfsd_put() and improves the code, using
the new svc_destroy() rather than svc_put()
- removes a few comments that explain the important for balanced
get/put calls. This should be obvious.

The only non-trivial part of this is that svc_destroy() would call
svc_sock_update() on a non-final decrement. It can no longer do that,
and svc_put() isn't really a good place of it. This call is now made
from svc_exit_thread() which seems like a good place. This makes the
call *before* sv_nrthreads is decremented rather than after. This
is not particularly important as the call just sets a flag which
causes sv_nrthreads set be checked later. A subsequent patch will
improve the ordering.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 291cd656 18-Oct-2021 Changcheng Deng <deng.changcheng@zte.com.cn>

NFSD:fix boolreturn.cocci warning

./fs/nfsd/nfssvc.c: 1072: 8-9: :WARNING return of 0/1 in function
'nfssvc_decode_voidarg' with return type bool

Return statements in functions returning bool should use true/false
instead of 1/0.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Changcheng Deng <deng.changcheng@zte.com.cn>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 130e2054 13-Oct-2021 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Change return value type of .pc_encode

Returning an undecorated integer is an age-old trope, but it's
not clear (even to previous experts in this code) that the only
valid return values are 1 and 0. These functions do not return
a negative errno, rpc_stat value, or a positive length.

Document there are only two valid return values by having
.pc_encode return only true or false.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# fda49441 13-Oct-2021 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Replace the "__be32 *p" parameter to .pc_encode

The passed-in value of the "__be32 *p" parameter is now unused in
every server-side XDR encoder, and can be removed.

Note also that there is a line in each encoder that sets up a local
pointer to a struct xdr_stream. Passing that pointer from the
dispatcher instead saves one line per encoder function.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# c44b31c2 12-Oct-2021 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Change return value type of .pc_decode

Returning an undecorated integer is an age-old trope, but it's
not clear (even to previous experts in this code) that the only
valid return values are 1 and 0. These functions do not return
a negative errno, rpc_stat value, or a positive length.

Document there are only two valid return values by having
.pc_decode return only true or false.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 16c66364 12-Oct-2021 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Replace the "__be32 *p" parameter to .pc_decode

The passed-in value of the "__be32 *p" parameter is now unused in
every server-side XDR decoder, and can be removed.

Note also that there is a line in each decoder that sets up a local
pointer to a struct xdr_stream. Passing that pointer from the
dispatcher instead saves one line per decoder function.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# f4e44b39 21-May-2021 Dai Ngo <dai.ngo@oracle.com>

NFSD: delay unmount source's export after inter-server copy completed.

Currently the source's export is mounted and unmounted on every
inter-server copy operation. This patch is an enhancement to delay
the unmount of the source export for a certain period of time to
eliminate the mount and unmount overhead on subsequent copy operations.

After a copy operation completes, a work entry is added to the
delayed unmount list with an expiration time. This list is serviced
by the laundromat thread to unmount the export of the expired entries.
Each time the export is being used again, its expiration time is
extended and the entry is re-inserted to the tail of the list.

The unmount task and the mount operation of the copy request are
synced to make sure the export is not unmounted while it's being
used.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 70c53075 15-Apr-2021 Vasily Averin <vvs@virtuozzo.com>

nfsd: removed unused argument in nfsd_startup_generic()

Since commit 501cb1849f86 ("nfsd: rip out the raparms cache")
nrservs is not used in nfsd_startup_generic()

Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# b73ac680 06-Apr-2021 Guobin Huang <huangguobin4@huawei.com>

NFSD: Use DEFINE_SPINLOCK() for spinlock

spinlock can be initialized automatically with DEFINE_SPINLOCK()
rather than explicitly calling spin_lock_init().

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Guobin Huang <huangguobin4@huawei.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# c6c7f2a8 13-Mar-2021 Trond Myklebust <trond.myklebust@hammerspace.com>

nfsd: Ensure knfsd shuts down when the "nfsd" pseudofs is unmounted

In order to ensure that knfsd threads don't linger once the nfsd
pseudofs is unmounted (e.g. when the container is killed) we let
nfsd_umount() shut down those threads and wait for them to exit.

This also should ensure that we don't need to do a kernel mount of
the pseudofs, since the thread lifetime is now limited by the
lifetime of the filesystem.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# bddfdbcd 27-Oct-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Extract the svcxdr_init_encode() helper

NFSD initializes an encode xdr_stream only after the RPC layer has
already inserted the RPC Reply header. Thus it behaves differently
than xdr_init_encode does, which assumes the passed-in xdr_buf is
entirely devoid of content.

nfs4proc.c has this server-side stream initialization helper, but
it is visible only to the NFSv4 code. Move this helper to a place
that can be accessed by NFSv2 and NFSv3 server XDR functions.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 5650682e 20-Oct-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Remove argument length checking in nfsd_dispatch()

Now that the argument decoders for NFSv2 and NFSv3 use the
xdr_stream mechanism, the version-specific length checking logic in
nfsd_dispatch() is no longer necessary.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# d6c9e436 17-Dec-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Fix sparse warning in nfssvc.c

fs/nfsd/nfssvc.c:36:6: warning: symbol 'inter_copy_offload_enable' was not declared. Should it be static?

The parameter was added by commit ce0887ac96d3 ("NFSD add nfs4 inter
ssc to nfsd4_copy"). Relocate it into the source file that uses it,
and make it static. This approach is similar to the
nfs4_disable_idmapping, cltrack_prog, and cltrack_legacy_disable
module parameters.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 4420440c 26-Nov-2020 kazuo ito <kzpn200@gmail.com>

nfsd: Fix message level for normal termination

The warning message from nfsd terminating normally
can confuse system adminstrators or monitoring software.

Though it's not exactly fair to pin-point a commit where it
originated, the current form in the current place started
to appear in:

Fixes: e096bbc6488d ("knfsd: remove special handling for SIGHUP")
Signed-off-by: kazuo ito <kzpn200@gmail.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 0dfdad1c 19-Oct-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add tracepoints in nfsd_dispatch()

For troubleshooting purposes, record GARBAGE_ARGS and CANT_ENCODE
failures.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 788f7183 05-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Add common helpers to decode void args and encode void results

Start off the conversion to xdr_stream by de-duplicating the functions
that decode void arguments and encode void results.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 5191955d 05-Nov-2020 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Prepare for xdr_stream-style decoding on the server-side

A "permanent" struct xdr_stream is allocated in struct svc_rqst so
that it is usable by all server-side decoders. A per-rqst scratch
buffer is also allocated to handle decoding XDR data items that
cross page boundaries.

To demonstrate how it will be used, add the first call site for the
new svcxdr_init_decode() API.

As an additional part of the overall conversion, add symbolic
constants for successful and failed XDR operations. Returning "0" is
overloaded. Sometimes it means something failed, but sometimes it
means success. To make it more clear when XDR decoding functions
succeed or fail, introduce symbolic constants.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# cc028a10 02-Oct-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Hoist status code encoding into XDR encoder functions

The original intent was presumably to reduce code duplication. The
trade-off was:

- No support for an NFSD proc function returning a non-success
RPC accept_stat value.
- No support for void NFS replies to non-NULL procedures.
- Everyone pays for the deduplication with a few extra conditional
branches in a hot path.

In addition, nfsd_dispatch() leaves *statp uninitialized in the
success path, unlike svc_generic_dispatch().

Address all of these problems by moving the logic for encoding
the NFS status code into the NFS XDR encoders themselves. Then
update the NFS .pc_func methods to return an RPC accept_stat
value.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 4b74fd79 01-Oct-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Map nfserr_wrongsec outside of nfsd_dispatch

Refactor: Handle this NFS version-specific mapping in the only
place where nfserr_wrongsec is generated.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# f0af2210 01-Oct-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Call NFSv2 encoders on error returns

Remove special dispatcher logic for NFSv2 error responses. These are
rare to the point of becoming extinct, but all NFS responses have to
pay the cost of the extra conditional branches.

With this change, the NFSv2 error cases now get proper
xdr_ressize_check() calls.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 85085aac 01-Oct-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Refactor nfsd_dispatch() error paths

nfsd_dispatch() is a hot path. Ensure the compiler takes the
processing of rare error cases out of line.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 4c96cb56 01-Oct-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Clean up nfsd_dispatch() variables

For consistency and code legibility, use a similar organization of
variables as svc_generic_dispatch().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 383c440d 01-Oct-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Clean up stale comments in nfsd_dispatch()

Add a documenting comment for the function. Remove comments that
simply describe obvious aspects of the code, but leave comments
that explain the differences in processing of each NFS version.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 84c138e7 01-Oct-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Clean up switch statement in nfsd_dispatch()

Reorder the arms so the compiler places checks for the most frequent
case first.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# dcc46991 01-Oct-2020 Chuck Lever <chuck.lever@oracle.com>

NFSD: Encoder and decoder functions are always present

nfsd_dispatch() is a hot path. Let's optimize the XDR method calls
for the by-far common case, which is that the XDR methods are indeed
present.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 13956160 25-Sep-2020 J. Bruce Fields <bfields@redhat.com>

nfsd: rq_lease_breaker cleanup

Since only the v4 code cares about it, maybe it's better to leave
rq_lease_breaker out of the common dispatch code?

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# df561f66 23-Aug-2020 Gustavo A. R. Silva <gustavoars@kernel.org>

treewide: Use fallthrough pseudo-keyword

Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>


# 44fb26c6 11-May-2020 Ma Feng <mafeng.ma@huawei.com>

nfsd: Fix old-style function definition

Fix warning:

fs/nfsd/nfssvc.c:604:6: warning: old-style function definition [-Wold-style-definition]
bool i_am_nfsd()
^~~~~~~~~

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Ma Feng <mafeng.ma@huawei.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 28df3d15 28-Jul-2017 J. Bruce Fields <bfields@redhat.com>

nfsd: clients don't need to break their own delegations

We currently revoke read delegations on any write open or any operation
that modifies file data or metadata (including rename, link, and
unlink). But if the delegation in question is the only read delegation
and is held by the client performing the operation, that's not really
necessary.

It's not always possible to prevent this in the NFSv4.0 case, because
there's not always a way to determine which client an NFSv4.0 delegation
came from. (In theory we could try to guess this from the transport
layer, e.g., by assuming all traffic on a given TCP connection comes
from the same client. But that's not really correct.)

In the NFSv4.1 case the session layer always tells us the client.

This patch should remove such self-conflicts in all cases where we can
reliably determine the client from the compound.

To do that we need to track "who" is performing a given (possibly
lease-breaking) file operation. We're doing that by storing the
information in the svc_rqst and using kthread_data() to map the current
task back to a svc_rqst.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 7627d7dc 19-Feb-2020 Scott Mayhew <smayhew@redhat.com>

nfsd: set the server_scope during service startup

Currently, nfsd4_encode_exchange_id() encodes the utsname nodename
string in the server_scope field. In a multi-host container
environemnt, if an nfsd container is restarted on a different host than
it was originally running on, clients will see a server_scope mismatch
and will not attempt to reclaim opens.

Instead, set the server_scope while we're in a process context during
service startup, so we get the utsname nodename of the current process
and store that in nfsd_net.

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
[bfields: fix up major_id too]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 9542e6a6 06-Jan-2020 Trond Myklebust <trondmy@gmail.com>

nfsd: Containerise filecache laundrette

Ensure that if the filecache laundrette gets stuck, it only affects
the knfsd instances of one container.

The notifier callbacks can be called from various contexts so avoid
using synchonous filesystem operations that might deadlock.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# e44b4bf2 24-Dec-2019 zhengbin <zhengbin13@huawei.com>

nfsd: use true,false for bool variable in nfssvc.c

Fixes coccicheck warning:

fs/nfsd/nfssvc.c:394:2-14: WARNING: Assignment of 0/1 to bool variable
fs/nfsd/nfssvc.c:407:2-14: WARNING: Assignment of 0/1 to bool variable
fs/nfsd/nfssvc.c:422:2-14: WARNING: Assignment of 0/1 to bool variable

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# ce0887ac 09-Oct-2019 Olga Kornievskaia <kolga@netapp.com>

NFSD add nfs4 inter ssc to nfsd4_copy

Given a universal address, mount the source server from the destination
server. Use an internal mount. Call the NFS client nfs42_ssc_open to
obtain the NFS struct file suitable for nfsd_copy_range.

Ability to do "inter" server-to-server depends on the an nfsd kernel
parameter "inter_copy_offload_enable".

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>


# 7c149057 19-Nov-2019 J. Bruce Fields <bfields@redhat.com>

nfsd: restore NFSv3 ACL support

An error in e333f3bbefe3 left the nfsd_acl_program->pg_vers array empty,
which effectively turned off the server's support for NFSv3 ACLs.

Fixes: e333f3bbefe3 "nfsd: Allow containers to set supported nfs versions"
Cc: stable@vger.kernel.org
Cc: Trond Myklebust <trondmy@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 65643f4c 22-Sep-2019 YueHaibing <yuehaibing@huawei.com>

nfsd: Make nfsd_reset_boot_verifier_locked static

Fix sparse warning:

fs/nfsd/nfssvc.c:364:6: warning:
symbol 'nfsd_reset_boot_verifier_locked' was not declared. Should it be static?

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 27c438f5 02-Sep-2019 Trond Myklebust <trondmy@gmail.com>

nfsd: Support the server resetting the boot verifier

Add support to allow the server to reset the boot verifier in order to
force clients to resend I/O after a timeout failure.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 5e113224 02-Sep-2019 Trond Myklebust <trondmy@gmail.com>

nfsd: nfsd_file cache entries should be per net namespace

Ensure that we can safely clear out the file cache entries when the
nfs server is shut down on a container. Otherwise, the file cache
may end up pinning the mounts.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 501cb184 18-Aug-2019 Jeff Layton <jeff.layton@primarydata.com>

nfsd: rip out the raparms cache

The raparms cache was set up in order to ensure that we carry readahead
information forward from one RPC call to the next. In other words, it
was set up because each RPC call was forced to open a struct file, then
close it, causing the loss of readahead information that is normally
cached in that struct file, and used to keep the page cache filled when
a user calls read() multiple times on the same file descriptor.

Now that we cache the struct file, and reuse it for all the I/O calls
to a given file by a given user, we no longer have to keep a separate
readahead cache.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 65294c1f 18-Aug-2019 Jeff Layton <jeff.layton@primarydata.com>

nfsd: add a new struct file caching facility to nfsd

Currently, NFSv2/3 reads and writes have to open a file, do the read or
write and then close it again for each RPC. This is highly inefficient,
especially when the underlying filesystem has a relatively slow open
routine.

This patch adds a new open file cache to knfsd. Rather than doing an
open for each RPC, the read/write handlers can call into this cache to
see if there is one already there for the correct filehandle and
NFS_MAY_READ/WRITE flags.

If there isn't an entry, then we create a new one and attempt to
perform the open. If there is, then we wait until the entry is fully
instantiated and return it if it is at the end of the wait. If it's
not, then we attempt to take over construction.

Since the main goal is to speed up NFSv2/3 I/O, we don't want to
close these files on last put of these objects. We need to keep them
around for a little while since we never know when the next READ/WRITE
will come in.

Cache entries have a hardcoded 1s timeout, and we have a recurring
workqueue job that walks the cache and purges any entries that have
expired.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Richard Sharpe <richard.sharpe@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 40373b12 08-Apr-2019 Trond Myklebust <trondmy@gmail.com>

lockd: Pass the user cred from knfsd when starting the lockd server

When starting up a new knfsd server, pass the user cred to the
supporting lockd server.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 4df493a2 08-Apr-2019 Trond Myklebust <trondmy@gmail.com>

SUNRPC: Cache the process user cred in the RPC server listener

In order to be able to interpret uids and gids correctly in knfsd, we
should cache the user namespace of the process that created the RPC
server's listener. To do so, we refcount the credential of that process.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# e333f3bb 09-Apr-2019 Trond Myklebust <trondmy@gmail.com>

nfsd: Allow containers to set supported nfs versions

Support use of the --nfs-version/--no-nfs-version arguments to rpc.nfsd
in containers.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 029be5d0 09-Apr-2019 Trond Myklebust <trondmy@gmail.com>

nfsd: Add custom rpcbind callbacks for knfsd

Add custom rpcbind callbacks in preparation for the knfsd
per-container version feature.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 642ee6b2 09-Apr-2019 Trond Myklebust <trondmy@gmail.com>

SUNRPC: Allow further customisation of RPC program registration

Add a callback to allow customisation of the rpcbind registration.
When clients have the ability to turn on and off version support,
we want to allow them to also prevent registration of those
versions with the rpc portmapper.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 8e5b6773 09-Apr-2019 Trond Myklebust <trondmy@gmail.com>

SUNRPC: Add a callback to initialise server requests

Add a callback to help initialise server requests before they are
processed. This will allow us to clean up the NFS server version
support, and to make it container safe.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 2317dc55 10-Nov-2017 Vasily Averin <vvs@virtuozzo.com>

race of nfsd inetaddr notifiers vs nn->nfsd_serv change

nfsd_inet[6]addr_event uses nn->nfsd_serv without taking nfsd_mutex,
which can be changed during execution of notifiers and crash the host.

Moreover if notifiers were enabled in one net namespace they are enabled
in all other net namespaces, from creation until destruction.

This patch allows notifiers to access nn->nfsd_serv only after the
pointer is correctly initialized and delays cleanup until notifiers are
no longer in use.

Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Tested-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 256a89fa 18-Oct-2017 Arnd Bergmann <arnd@arndb.de>

nfds: avoid gettimeofday for nfssvc_boot time

do_gettimeofday() is deprecated and we should generally use time64_t
based functions instead.

In case of nfsd, all three users of nfssvc_boot only use the initial
time as a unique token, and are not affected by it overflowing, so they
are not affected by the y2038 overflow.

This converts the structure to timespec64 anyway and adds comments
to all uses, to document that we have thought about it and avoid
having to look at it again.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# b2441318 01-Nov-2017 Greg Kroah-Hartman <gregkh@linuxfoundation.org>

License cleanup: add SPDX GPL-2.0 license identifier to files with no license

Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.

For non */uapi/* files that summary was:

SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139

and resulted in the first patch in this series.

If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:

SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930

and resulted in the second patch in this series.

- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:

SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1

and that resulted in the third patch in this series.

- when the two scanners agreed on the detected license(s), that became
the concluded license(s).

- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.

- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).

- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.

- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct

This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 44d8660d 19-Sep-2017 J. Bruce Fields <bfields@redhat.com>

nfsd: increase DRC cache limit

An NFSv4.1+ client negotiates the size of its duplicate reply cache size
in the initial CREATE_SESSION request. The server preallocates the
memory for the duplicate reply cache to ensure that we'll never fail to
record the response to a nonidempotent operation.

To prevent a few CREATE_SESSIONs from consuming all of memory we set an
upper limit based on nr_free_buffer_pages(). 1/2^10 has been too
limiting in practice; 1/2^7 is still less than one percent.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# afea5657 31-Jul-2017 Chuck Lever <chuck.lever@oracle.com>

sunrpc: Const-ify struct sv_serv_ops

Close an attack vector by moving the arrays of per-server methods to
read-only memory.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# aa8217d5 12-May-2017 Christoph Hellwig <hch@lst.de>

sunrpc: mark all struct svc_version instances as const

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>


# b9c744c1 12-May-2017 Christoph Hellwig <hch@lst.de>

sunrpc: mark all struct svc_procinfo instances as const

struct svc_procinfo contains function pointers, and marking it as
constant avoids it being able to be used as an attach vector for
code injections.

Signed-off-by: Christoph Hellwig <hch@lst.de>


# d16d1867 08-May-2017 Christoph Hellwig <hch@lst.de>

sunrpc: properly type pc_encode callbacks

Drop the resp argument as it can trivially be derived from the rqstp
argument. With that all functions now have the same prototype, and we
can remove the unsafe casting to kxdrproc_t.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>


# cc6acc20 08-May-2017 Christoph Hellwig <hch@lst.de>

sunrpc: properly type pc_decode callbacks

Drop the argp argument as it can trivially be derived from the rqstp
argument. With that all functions now have the same prototype, and we
can remove the unsafe casting to kxdrproc_t.

Signed-off-by: Christoph Hellwig <hch@lst.de>


# 1c8a5409 08-May-2017 Christoph Hellwig <hch@lst.de>

sunrpc: properly type pc_func callbacks

Drop the argp and resp arguments as they can trivially be derived from
the rqstp argument. With that all functions now have the same prototype,
and we can remove the unsafe casting to svc_procfunc as well as the
svc_procfunc typedef itself.

Signed-off-by: Christoph Hellwig <hch@lst.de>


# e9679189 12-May-2017 Christoph Hellwig <hch@lst.de>

sunrpc: mark all struct svc_version instances as const

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>


# 860bda29 12-May-2017 Christoph Hellwig <hch@lst.de>

sunrpc: mark all struct svc_procinfo instances as const

struct svc_procinfo contains function pointers, and marking it as
constant avoids it being able to be used as an attach vector for
code injections.

Signed-off-by: Christoph Hellwig <hch@lst.de>


# 63f8de37 08-May-2017 Christoph Hellwig <hch@lst.de>

sunrpc: properly type pc_encode callbacks

Drop the resp argument as it can trivially be derived from the rqstp
argument. With that all functions now have the same prototype, and we
can remove the unsafe casting to kxdrproc_t.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>


# 026fec7e 08-May-2017 Christoph Hellwig <hch@lst.de>

sunrpc: properly type pc_decode callbacks

Drop the argp argument as it can trivially be derived from the rqstp
argument. With that all functions now have the same prototype, and we
can remove the unsafe casting to kxdrproc_t.

Signed-off-by: Christoph Hellwig <hch@lst.de>


# a6beb732 08-May-2017 Christoph Hellwig <hch@lst.de>

sunrpc: properly type pc_func callbacks

Drop the argp and resp arguments as they can trivially be derived from
the rqstp argument. With that all functions now have the same prototype,
and we can remove the unsafe casting to svc_procfunc as well as the
svc_procfunc typedef itself.

Signed-off-by: Christoph Hellwig <hch@lst.de>


# e6838a29 21-Apr-2017 J. Bruce Fields <bfields@redhat.com>

nfsd: check for oversized NFSv2/v3 arguments

A client can append random data to the end of an NFSv2 or NFSv3 RPC call
without our complaining; we'll just stop parsing at the end of the
expected data and ignore the rest.

Encoded arguments and replies are stored together in an array of pages,
and if a call is too large it could leave inadequate space for the
reply. This is normally OK because NFS RPC's typically have either
short arguments and long replies (like READ) or long arguments and short
replies (like WRITE). But a client that sends an incorrectly long reply
can violate those assumptions. This was observed to cause crashes.

Also, several operations increment rq_next_page in the decode routine
before checking the argument size, which can leave rq_next_page pointing
well past the end of the page array, causing trouble later in
svc_free_pages.

So, following a suggestion from Neil Brown, add a central check to
enforce our expectation that no NFSv2/v3 call has both a large call and
a large reply.

As followup we may also want to rewrite the encoding routines to check
more carefully that they aren't running off the end of the page array.

We may also consider rejecting calls that have any extra garbage
appended. That would be safer, and within our rights by spec, but given
the age of our server and the NFS protocol, and the fact that we've
never enforced this before, we may need to balance that against the
possibility of breaking some oddball client.

Reported-by: Tuomas Haanpää <thaan@synopsys.com>
Reported-by: Ari Kauppi <ari@synopsys.com>
Cc: stable@vger.kernel.org
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 800a938f 09-Mar-2017 NeilBrown <neilb@suse.com>

NFSD: fix nfsd_reset_versions for NFSv4.

If you write "-2 -3 -4" to the "versions" file, it will
notice that no versions are enabled, and nfsd_reset_versions()
is called.
This enables all major versions, not no minor versions.
So we lose the invariant that NFSv4 is only advertised when
at least one minor is enabled.

Fix the code to explicitly enable minor versions for v4,
change it to use nfsd_vers() to test and set, and simplify
the code.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 928c6fb3 09-Mar-2017 NeilBrown <neilb@suse.com>

NFSD: fix nfsd_minorversion(.., NFSD_AVAIL)

Current code will return 1 if the version is supported,
and -1 if it isn't.
This is confusing and inconsistent with the one place where this
is used.
So change to return 1 if it is supported, and zero if not.
i.e. an error is never returned.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 3f07c014 08-Feb-2017 Ingo Molnar <mingo@kernel.org>

sched/headers: Prepare for new header dependencies before moving code to <linux/sched/signal.h>

We are going to split <linux/sched/signal.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/signal.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# d3635ff0 22-Feb-2017 Trond Myklebust <trond.myklebust@primarydata.com>

nfsd: fix configuration of supported minor versions

When the user turns off all minor versions of NFSv4, that should be
equivalent to turning off NFSv4 support, so a mount attempt using NFSv4
should get RPC_PROG_MISMATCH, not NFSERR_MINOR_VERS_MISMATCH.

Allow the user to use either '4.0' or '4' to enable or disable minor
version 0. Other minor versions are still enabled or disabled using the
'4.x' format.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 7b19824d 05-Jan-2017 Scott Mayhew <smayhew@redhat.com>

nfsd: initialize sin6_scope_id in nfsd_inet6addr_event()

I noticed this was missing when I was testing with link local addresses.

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 47057abd 12-Jan-2016 Andreas Gruenbacher <agruenba@redhat.com>

nfsd: add support for the umask attribute

Clients can set the umask attribute when creating files to cause the
server to apply it always except when inheriting permissions from the
parent directory. That way, the new files will end up with the same
permissions as files created locally.

See https://tools.ietf.org/html/draft-ietf-nfsv4-umask-02 for more
details.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 1eca45f8 21-Sep-2016 Vasily Averin <vvs@virtuozzo.com>

NFSD: fix corruption in notifier registration

By design notifier can be registered once only, however nfsd registers
the same inetaddr notifiers per net-namespace. When this happen it
corrupts list of notifiers, as result some notifiers can be not called
on proper event, traverse on list can be cycled forever, and second
unregister can access already freed memory.

Cc: stable@vger.kernel.org
fixes: 36684996 ("nfsd: Register callbacks on the inetaddr_chain and inet6addr_chain")
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 691412b4 03-Jan-2016 Kinglong Mee <kinglongmee@gmail.com>

nfsd: Fix nfsd leaks sunrpc module references

Stefan Hajnoczi reports,
nfsd leaks 3 references to the sunrpc module here:

# echo -n "asdf 1234" >/proc/fs/nfsd/portlist
bash: echo: write error: Protocol not supported

Now stop nfsd and try unloading the kernel modules:

# systemctl stop nfs-server
# systemctl stop nfs
# systemctl stop proc-fs-nfsd.mount
# systemctl stop var-lib-nfs-rpc_pipefs.mount
# rmmod nfsd
# rmmod nfs_acl
# rmmod lockd
# rmmod auth_rpcgss
# rmmod sunrpc
rmmod: ERROR: Module sunrpc is in use
# lsmod | grep rpc
sunrpc 315392 3

It is caused by nfsd don't cleanup rpcb program for nfsd
when destroying svc service after creating xprt fail.

Reported-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 36684996 11-Dec-2015 Scott Mayhew <smayhew@redhat.com>

nfsd: Register callbacks on the inetaddr_chain and inet6addr_chain

Register callbacks on inetaddr_chain and inet6addr_chain to trigger
cleanup of nfsd transport sockets when an ip address is deleted.

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 598e2359 08-Jun-2015 Jeff Layton <jlayton@kernel.org>

nfsd/sunrpc: abstract out svc_set_num_threads to sv_ops

Add an operation that will do setup of the service. In the case of a
classic thread-based service that means starting up threads. In the case
of a workqueue-based service, the setup will do something different.

Signed-off-by: Shirley Ma <shirley.ma@oracle.com>
Acked-by: Jeff Layton <jlayton@primarydata.com>
Tested-by: Shirley Ma <shirliey.ma@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# b9e13cdf 08-Jun-2015 Jeff Layton <jlayton@kernel.org>

nfsd/sunrpc: turn enqueueing a svc_xprt into a svc_serv operation

For now, all services use svc_xprt_do_enqueue, but once we add
workqueue-based service support, we'll need to do something different.

Signed-off-by: Shirley Ma <shirley.ma@oracle.com>
Acked-by: Jeff Layton <jlayton@primarydata.com>
Tested-by: Shirley Ma <shirley.ma@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 758f62ff 08-Jun-2015 Jeff Layton <jlayton@kernel.org>

nfsd/sunrpc: move sv_module parm into sv_ops

...not technically an operation, but it's more convenient and cleaner
to pass the module pointer in this struct.

Signed-off-by: Shirley Ma <shirley.ma@oracle.com>
Acked-by: Jeff Layton <jlayton@primarydata.com>
Tested-by: Shirley Ma <shirley.ma@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# c369014f 08-Jun-2015 Jeff Layton <jlayton@kernel.org>

nfsd/sunrpc: move sv_function into sv_ops

Since we now have a container for holding svc_serv operations, move the
sv_function into it as well.

Signed-off-by: Shirley Ma <shirley.ma@oracle.com>
Acked-by: Jeff Layton <jlayton@primarydata.com>
Tested-by: Shirley Ma <shirley.ma@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# ea126e74 08-Jun-2015 Jeff Layton <jlayton@kernel.org>

nfsd/sunrpc: add a new svc_serv_ops struct and move sv_shutdown into it

In later patches we'll need to abstract out more operations on a
per-service level, besides sv_shutdown and sv_function.

Declare a new svc_serv_ops struct to hold these operations, and move
sv_shutdown into this struct.

Signed-off-by: Shirley Ma <shirley.ma@oracle.com>
Acked-by: Jeff Layton <jlayton@primarydata.com>
Tested-by: Shirley Ma <shirley.ma@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# c23ae601 12-Jan-2015 J. Bruce Fields <bfields@redhat.com>

nfsd: default NFSv4.2 to on

The code seems to work. The protocol looks stable. The kernel's
version defaults can be overridden by rpc.nfsd arguments.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 78b65eb3 19-Nov-2014 Jeff Layton <jlayton@kernel.org>

sunrpc: move rq_dropme flag into rq_flags

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# d9499a95 30-Jul-2014 Kinglong Mee <kinglongmee@gmail.com>

NFSD: Decrease nfsd_users in nfsd_startup_generic fail

A memory allocation failure could cause nfsd_startup_generic to fail, in
which case nfsd_users wouldn't be incorrectly left elevated.

After nfsd restarts nfsd_startup_generic will then succeed without doing
anything--the first consequence is likely nfs4_start_net finding a bad
laundry_wq and crashing.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Fixes: 4539f14981ce "nfsd: replace boolean nfsd_up flag by users counter"
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 5b8db00b 02-Jul-2014 Jeff Layton <jlayton@kernel.org>

nfsd: add a new /proc/fs/nfsd/max_connections file

Currently, the maximum number of connections that nfsd will allow
is based on the number of threads spawned. While this is fine for a
default, there really isn't a clear relationship between the two.

The number of threads corresponds to the number of concurrent requests
that we want to allow the server to process at any given time. The
connection limit corresponds to the maximum number of clients that we
want to allow the server to handle. These are two entirely different
quantities.

Break the dependency on increasing threads in order to allow for more
connections, by adding a new per-net parameter that can be set to a
non-zero value. The default is still to base it on the number of threads,
so there should be no behavior change for anyone who doesn't use it.

Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 3c7aa15d 10-Jun-2014 Kinglong Mee <kinglongmee@gmail.com>

NFSD: Using min/max/min_t/max_t for calculate

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 8658452e 11-May-2014 NeilBrown <neilb@suse.de>

nfsd: Only set PF_LESS_THROTTLE when really needed.

PF_LESS_THROTTLE has a very specific use case: to avoid deadlocks
and live-locks while writing to the page cache in a loop-back
NFS mount situation.

It therefore makes sense to *only* set PF_LESS_THROTTLE in this
situation.
We now know when a request came from the local-host so it could be a
loop-back mount. We already know when we are handling write requests,
and when we are doing anything else.

So combine those two to allow nfsd to still be throttled (like any
other process) in every situation except when it is known to be
problematic.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# ff88825f 05-Jan-2014 Kinglong Mee <kinglongmee@gmail.com>

NFSD: fix compile warning without CONFIG_NFSD_V3

Without CONFIG_NFSD_V3, compile will get warning as,

fs/nfsd/nfssvc.c: In function 'nfsd_svc':
>> fs/nfsd/nfssvc.c:246:60: warning: array subscript is above array bounds [-Warray-bounds]
return (nfsd_versions[2] != NULL) || (nfsd_versions[3] != NULL);
^

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 8ef66714 30-Dec-2013 Kinglong Mee <kinglongmee@gmail.com>

NFSD: Don't start lockd when only NFSv4 is running

When starting without nfsv2 and nfsv3, nfsd does not need to start
lockd (and certainly doesn't need to fail because lockd failed to
register with the portmapper).

Reported-by: Gareth Williams <gareth@garethwilliams.me.uk>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 35f7a14f 08-Jul-2013 J. Bruce Fields <bfields@redhat.com>

nfsd4: fix minorversion support interface

You can turn on or off support for minorversions using e.g.

echo "-4.2" >/proc/fs/nfsd/versions

However, the current implementation is a little wonky. For example, the
above will turn off 4.2 support, but it will also turn *on* 4.1 support.

This didn't matter as long as we only had 2 minorversions, which was
true till very recently.

And do a little cleanup here.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# d1091481 08-Jul-2013 J. Bruce Fields <bfields@redhat.com>

nfsd4: support minorversion 1 by default

We now have minimal minorversion 1 support; turn it on by default.

This can still be turned off with "echo -4.1 >/proc/fs/nfsd/versions".

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 697ce9be 22-Feb-2013 Zhang Yanfei <zhangyanfei@cn.fujitsu.com>

fs/nfsd: change type of max_delegations, nfsd_drc_max_mem and nfsd_drc_mem_used

The three variables are calculated from nr_free_buffer_pages so change
their types to unsigned long in case of overflow.

Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 11f77942 01-Feb-2013 Stanislav Kinsbursky <skinsbursky@parallels.com>

nfsd: containerize NFSd filesystem

This patch makes NFSD file system superblock to be created per net.
This makes possible to get proper network namespace from superblock instead of
using hard-coded "init_net".

Note: NFSd fs super-block holds network namespace. This garantees, that
network namespace won't disappear from underneath of it.
This, obviously, means, that in case of kill of a container's "init" (which is not a mount
namespace, but network namespace creator) netowrk namespace won't be
destroyed.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 09662d58 28-Jan-2013 Jeff Layton <jlayton@kernel.org>

nfsd: get rid of RC_INTR

The reply cache code never returns this status.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 88c47666 06-Dec-2012 Stanislav Kinsbursky <skinsbursky@parallels.com>

nfsd: pass proper net to nfsd_destroy() from NFSd kthreads

Since NFSd service is per-net now, we have to pass proper network
context in nfsd_shutdown() from NFSd kthreads.

The simplest way I found is to get proper net from one of transports
with permanent sockets.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 541e864f 06-Dec-2012 Stanislav Kinsbursky <skinsbursky@parallels.com>

nfsd: simplify service shutdown

Function nfsd_shutdown is called from two places: nfsd_last_thread (when last
kernel thread is exiting) and nfsd_svc (in case of kthreads starting error).
When calling from nfsd_svc(), we can be sure that per-net resources are
allocated, so we don't need to check per-net nfsd_net_up boolean flag.
This allows us to remove nfsd_shutdown function at all and move check for
per-net nfsd_net_up boolean flag to nfsd_last_thread.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 4539f149 06-Dec-2012 Stanislav Kinsbursky <skinsbursky@parallels.com>

nfsd: replace boolean nfsd_up flag by users counter

Since we have generic NFSd resurces, we have to introduce some way how to
allocate and destroy those resources on first per-net NFSd start and on
last per-net NFSd stop respectively.
This patch replaces global boolean nfsd_up flag (which is unused now) by users
counter and use it to determine either we need to allocate generic resources
or destroy them.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 903d9bf0 06-Dec-2012 Stanislav Kinsbursky <skinsbursky@parallels.com>

nfsd: simplify NFSv4 state init and shutdown

This patch moves nfsd_startup_generic() and nfsd_shutdown_generic()
calls to nfsd_startup_net() and nfsd_shutdown_net() respectively, which
allows us to call nfsd_startup_net() instead of nfsd_startup() and makes
the code look clearer. It also modifies nfsd_svc() and nfsd_shutdown()
to check nn->nfsd_net_up instead of global nfsd_up. The latter is now
used only for generic resources shutdown and is currently useless. It
will replaced by NFSd users counter later in this series.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# bda9cac1 06-Dec-2012 Stanislav Kinsbursky <skinsbursky@parallels.com>

nfsd: introduce helpers for generic resources init and shutdown

NFSd have per-net resources and resources, used globally.
Let's move generic resources init and shutdown to separated functions since
they are going to be allocated on first NFSd service start and destroyed after
last NFSd service shutdown.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 9dd9845f 06-Dec-2012 Stanislav Kinsbursky <skinsbursky@parallels.com>

nfsd: make NFSd service structure allocated per net

This patch makes main step in NFSd containerisation.

There could be different approaches to how to make NFSd able to handle
incoming RPC request from different network namespaces. The two main
options are:

1) Share NFSd kthreads betwween all network namespaces.
2) Create separated pool of threads for each namespace.

While first approach looks more flexible, second one is simpler and
non-racy. This patch implements the second option.

To make it possible to allocate separate pools of threads, we have to
make it possible to allocate separate NFSd service structures per net.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# b9c0ef85 06-Dec-2012 Stanislav Kinsbursky <skinsbursky@parallels.com>

nfsd: make NFSd service boot time per-net

This is simple: an NFSd service can be started at different times in
different network environments. So, its "boot time" has to be assigned
per net.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 2c2fe290 06-Dec-2012 Stanislav Kinsbursky <skinsbursky@parallels.com>

nfsd: per-net NFSd up flag introduced

This patch introduces introduces per-net "nfsd_net_up" boolean flag, which has
the same purpose as general "nfsd_up" flag - skip init or shutdown of per-net
resources in case of they are inited on shutted down respectively.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 6ff50b3d 06-Dec-2012 Stanislav Kinsbursky <skinsbursky@parallels.com>

nfsd: move per-net startup code to separated function

NFSd resources are partially per-net and partially globally used.
This patch splits resources init and shutdown and moves per-net code to
separated functions.
Generic and per-net init and shutdown are called sequentially for a while.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 3938a0d5 09-Dec-2012 Stanislav Kinsbursky <skinsbursky@parallels.com>

nfsd: pass net to nfsd_set_nrthreads()

Precursor patch. Hard-coded "init_net" will be replaced by proper one in
future.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# d41a9417 09-Dec-2012 Stanislav Kinsbursky <skinsbursky@parallels.com>

nfsd: pass net to nfsd_svc()

Precursor patch. Hard-coded "init_net" will be replaced by proper one in
future.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 6777436b 09-Dec-2012 Stanislav Kinsbursky <skinsbursky@parallels.com>

nfsd: pass net to nfsd_create_serv()

Precursor patch. Hard-coded "init_net" will be replaced by proper one in
future.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# db42d1a7 09-Dec-2012 Stanislav Kinsbursky <skinsbursky@parallels.com>

nfsd: pass net to nfsd_startup() and nfsd_shutdown()

Precursor patch. Hard-coded "init_net" will be replaced by proper one in
future.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# db6e182c 09-Dec-2012 Stanislav Kinsbursky <skinsbursky@parallels.com>

nfsd: pass net to nfsd_init_socks()

Precursor patch. Hard-coded "init_net" will be replaced by proper one in
future.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# f7fb86c6 09-Dec-2012 Stanislav Kinsbursky <skinsbursky@parallels.com>

nfsd: use "init_net" for portmapper

There could be a situation, when NFSd was started in one network namespace, but
stopped in another one.
This will trigger kernel panic, because RPCBIND client is stored on per-net
NFSd data, and will be NULL on NFSd shutdown.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# f252bc68 26-Nov-2012 Stanislav Kinsbursky <skinsbursky@parallels.com>

nfsd: call state init and shutdown twice

Split NFSv4 state init and shutdown into two different calls: per-net one and
generic one.
Per-net cwinit/shutdown pair have to be called for any namespace, generic pair
- only once on NSFd kthreads start and shutdown respectively.

Refresh of diff-nfsd-call-state-init-twice

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 57d276d7 16-Nov-2012 J. Bruce Fields <bfields@redhat.com>

nfsd: fix v4 reply caching

Very embarassing: 1091006c5eb15cba56785bd5b498a8d0b9546903 "nfsd: turn
on reply cache for NFSv4" missed a line, effectively leaving the reply
cache off in the v4 case. I thought I'd tested that, but I guess not.

This time, wrote a pynfs test to confirm it works.

Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 5b444cc9 17-Aug-2012 J. Bruce Fields <bfields@redhat.com>

svcrpc: remove handling of unknown errors from svc_recv

svc_recv() returns only -EINTR or -EAGAIN. If we really want to worry
about the case where it has a bug that causes it to return something
else, we could stick a WARN() in svc_recv. But it's silly to require
every caller to have all this boilerplate to handle that case.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 38af2cab 14-Aug-2012 J. Bruce Fields <bfields@redhat.com>

nfsd: remove redundant "port" argument

"port" in all these functions is always NFS_PORT.

nfsd can already be run on a nonstandard port using the "nfsd/portlist"
interface.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 57c8b13e 03-Jul-2012 Stanislav Kinsbursky <skinsbursky@parallels.com>

NFSd: set nfsd_serv to NULL after service destruction

In nfsd_destroy():

if (destroy)
svc_shutdown_net(nfsd_serv, net);
svc_destroy(nfsd_server);

svc_shutdown_net(nfsd_serv, net) calls nfsd_last_thread(), which sets
nfsd_serv to NULL, causing a NULL dereference on the following line.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 19f7e2ca 03-Jul-2012 Stanislav Kinsbursky <skinsbursky@parallels.com>

NFSd: introduce nfsd_destroy() helper

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 786185b5 03-May-2012 Stanislav Kinsbursky <skinsbursky@parallels.com>

SUNRPC: move per-net operations from svc_destroy()

The idea is to separate service destruction and per-net operations,
because these are two different things and the mix looks ugly.

Notes:

1) For NFS server this patch looks ugly (sorry for that). But these
place will be rewritten soon during NFSd containerization.

2) LockD per-net counter increase int lockd_up() was moved prior to
make_socks() to make lockd_down_net() call safe in case of error.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 9793f7c8 02-May-2012 Stanislav Kinsbursky <skinsbursky@parallels.com>

SUNRPC: new svc_bind() routine introduced

This new routine is responsible for service registration in a specified
network context.

The idea is to separate service creation from per-net operations.

Note also: since registering service with svc_bind() can fail, the
service will be destroyed and during destruction it will try to
unregister itself from rpcbind. In this case unregistration has to be
skipped.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# b3853e0e 11-Apr-2012 Stanislav Kinsbursky <skinsbursky@parallels.com>

nfsd: make export cache allocated per network namespace context

This patch also changes prototypes of nfsd_export_flush() and exp_rootfh():
network namespace parameter added.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# e3f70ead 29-Mar-2012 Stanislav Kinsbursky <skinsbursky@parallels.com>

Lockd: pass network namespace to creation and destruction routines

v2: dereference of most probably already released nlm_host removed in
nlmclnt_done() and reclaimer().

These routines are called from locks reclaimer() kernel thread. This thread
works in "init_net" network context and currently relays on persence on lockd
thread and it's per-net resources. Thus lockd_up() and lockd_down() can't relay
on current network context. So let's pass corrent one into them.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 508f9227 30-Jan-2012 J. Bruce Fields <bfields@redhat.com>

nfsd: fix default iosize calculation on 32bit

The rpc buffers will be allocated out of low memory, so we should really
only be taking that into account.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 87b0fc7d 30-Jan-2012 J. Bruce Fields <bfields@redhat.com>

nfsd: cleanup setting of default max_block_size

Move calculation of the default into a helper function.

Get rid of an unused variable "err" while we're there.

Thanks to Mi Jinlong for catching an arithmetic error in a previous
version.

Cc: Mi Jinlong <mijinlong@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 5ecebb7c 13-Jan-2012 Stanislav Kinsbursky <skinsbursky@parallels.com>

SUNRPC: unregister service on creation in current network namespace

On service shutdown we can be sure, that no more users of it left except
current. Thus it looks like using current network namespace context is safe in
this case.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 143cb494 01-Jul-2011 Paul Gortmaker <paul.gortmaker@windriver.com>

fs: add module.h to files that were implicitly using it

Some files were using the complete module.h infrastructure without
actually including the header at all. Fix them up in advance so
once the implicit presence is removed, we won't get failures like this:

CC [M] fs/nfsd/nfssvc.o
fs/nfsd/nfssvc.c: In function 'nfsd_create_serv':
fs/nfsd/nfssvc.c:335: error: 'THIS_MODULE' undeclared (first use in this function)
fs/nfsd/nfssvc.c:335: error: (Each undeclared identifier is reported only once
fs/nfsd/nfssvc.c:335: error: for each function it appears in.)
fs/nfsd/nfssvc.c: In function 'nfsd':
fs/nfsd/nfssvc.c:555: error: implicit declaration of function 'module_put_and_exit'
make[3]: *** [fs/nfsd/nfssvc.o] Error 1

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>


# 16d05870 25-Oct-2011 Stanislav Kinsbursky <skinsbursky@parallels.com>

NFSd: call svc rpcbind cleanup explicitly

We have to call svc_rpcb_cleanup() explicitly from nfsd_last_thread() since
this function is registered as service shutdown callback and thus nobody else
will done it for us.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 1091006c 23-Jan-2011 J. Bruce Fields <bfields@redhat.com>

nfsd: turn on reply cache for NFSv4

It's sort of ridiculous that we've never had a working reply cache for
NFSv4.

On the other hand, we may still not: our current reply cache is likely
not very good, especially in the TCP case (which is the only case that
matters for v4). What we really need here is some serious testing.

Anyway, here's a start.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 49b28684 20-Jun-2011 NeilBrown <neilb@suse.de>

nfsd: Remove deprecated nfsctl system call and related code.

As promised in feature-removal-schedule.txt it is time to
remove the nfsctl system call.

Userspace has perferred to not use this call throughout 2.6 and it has been
excluded in the default configuration since 2.6.36 (9 months ago).

So this patch removes all the code that was being compiled out.

There are still references to sys_nfsctl in various arch systemcall tables
and related code. These should be cleaned out too, probably in the next
merge window.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 9e701c61 02-Jan-2011 J. Bruce Fields <bfields@redhat.com>

svcrpc: simpler request dropping

Currently we use -EAGAIN returns to determine when to drop a deferred
request. On its own, that is error-prone, as it makes us treat -EAGAIN
returns from other functions specially to prevent inadvertent dropping.

So, use a flag on the request instead.

Returning an error on request deferral is still required, to prevent
further processing, but we no longer need worry that an error return on
its own could result in a drop.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# fc5d00b0 29-Sep-2010 Pavel Emelyanov <xemul@parallels.com>

sunrpc: Add net argument to svc_create_xprt

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# e844a7b9 06-Aug-2010 J. Bruce Fields <bfields@redhat.com>

nfsd: initialize nfsd versions before creating svc

Commit 59db4a0c102e0de226a3395dbf25ea51bf845937 "nfsd: move more into
nfsd_startup()" inadvertently moved nfsd_versions after
nfsd_create_svc(). On older distributions using an rpc.nfsd that does
not explicitly set the list of nfsd versions, this results in
svc-create_pooled() being called with an empty versions array. The
resulting incomplete initialization leads to a NULL dereference in
svc_process_common() the first time a client accesses the server.

Move nfsd_reset_versions() back before the svc_create_pooled(); this
time, put it closer to the svc_create_pooled() call, to make this
mistake more difficult in the future.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 774f8bbd 02-Aug-2010 J. Bruce Fields <bfields@redhat.com>

nfsd: fix startup/shutdown order bug

We must create the server before we can call init_socks or check the
number of threads.

Symptoms were a NULL pointer dereference in nfsd_svc(). Problem
identified by Jeff Layton.

Also fix a minor cleanup-on-error case in nfsd_startup().

Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# af4718f3 21-Jul-2010 J. Bruce Fields <bfields@redhat.com>

nfsd: minor nfsd_svc() cleanup

More idiomatic to put the error case in the if clause.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 59db4a0c 21-Jul-2010 J. Bruce Fields <bfields@redhat.com>

nfsd: move more into nfsd_startup()

This is just cleanup--it's harmless to call nfsd_rachache_init,
nfsd_init_socks, and nfsd_reset_versions more than once. But there's no
point to it.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# ac77efbe 20-Jul-2010 Jeff Layton <jlayton@kernel.org>

nfsd: just keep single lockd reference for nfsd

Right now, nfsd keeps a lockd reference for each socket that it has
open. This is unnecessary and complicates the error handling on
startup and shutdown. Change it to just do a lockd_up when starting
the first nfsd thread just do a single lockd_down when taking down the
last nfsd thread. Because of the strange way the sv_count is handled
this requires an extra flag to tell whether the nfsd_serv holds a
reference for lockd or not.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 628b3687 21-Jul-2010 Jeff Layton <jlayton@kernel.org>

nfsd: clean up nfsd_create_serv error handling

There doesn't seem to be any need to reset the nfssvc_boot time if the
nfsd startup failed.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 4ad9a344 19-Jul-2010 Jeff Layton <jlayton@kernel.org>

nfsd4: fix v4 state shutdown error paths

If someone tries to shut down the laundry_wq while it isn't up it'll
cause an oops.

This can happen because write_ports can create a nfsd_svc before we
really start the nfs server, and we may fail before the server is ever
started.

Also make sure state is shutdown on error paths in nfsd_svc().

Use a common global nfsd_up flag instead of nfs4_init, and create common
helper functions for nfsd start/shutdown, as there will be other work
that we want done only when we the number of nfsd threads transitions
between zero and nonzero.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 15ddb4ae 14-May-2010 Pavel Emelyanov <xemul@openvz.org>

NFSD: don't report compiled-out versions as present

The /proc/fs/nfsd/versions file calls nfsd_vers() to check whether
the particular nfsd version is present/available. The problem is
that once I turn off e.g. NFSD-V4 this call returns -1 which is
true from the callers POV which is wrong.

The proposal is to report false in that case.

The bug has existed since 6658d3a7bbfd1768 "[PATCH] knfsd: remove
nfsd_versbits as intermediate storage for desired versions".

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: stable@kernel.org
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 7663dacd 04-Dec-2009 J. Bruce Fields <bfields@citi.umich.edu>

nfsd: remove pointless paths in file headers

The new .h files have paths at the top that are now out of date. While
we're here, just remove all of those from fs/nfsd; they never served any
purpose.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 9a74af21 03-Dec-2009 Boaz Harrosh <bharrosh@panasas.com>

nfsd: Move private headers to source directory

Lots of include/linux/nfsd/* headers are only used by
nfsd module. Move them to the source directory

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 341eb184 03-Dec-2009 Boaz Harrosh <bharrosh@panasas.com>

nfsd: Source files #include cleanups

Now that the headers are fixed and carry their own wait, all fs/nfsd/
source files can include a minimal set of headers. and still compile just
fine.

This patch should improve the compilation speed of the nfsd module.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 0a3adade 04-Nov-2009 J. Bruce Fields <bfields@citi.umich.edu>

nfsd: make fs/nfsd/vfs.h for common includes

None of this stuff is used outside nfsd, so move it out of the common
linux include directory.

Actually, probably none of the stuff in include/linux/nfsd/nfsd.h really
belongs there, so later we may remove that file entirely.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 29ab23cc 15-Sep-2009 J. Bruce Fields <bfields@citi.umich.edu>

nfsd4: allow nfs4 state startup to fail

The failure here is pretty unlikely, but we should handle it anyway.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# e0e81739 02-Sep-2009 David Howells <dhowells@redhat.com>

CRED: Add some configurable debugging [try #6]

Add a config option (CONFIG_DEBUG_CREDENTIALS) to turn on some debug checking
for credential management. The additional code keeps track of the number of
pointers from task_structs to any given cred struct, and checks to see that
this number never exceeds the usage count of the cred struct (which includes
all references, not just those from task_structs).

Furthermore, if SELinux is enabled, the code also checks that the security
pointer in the cred struct is never seen to be invalid.

This attempts to catch the bug whereby inode_has_perm() faults in an nfsd
kernel thread on seeing cred->security be a NULL pointer (it appears that the
credential struct has been previously released):

http://www.kerneloops.org/oops.php?number=252883

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>


# 557ce264 28-Aug-2009 Andy Adamson <andros@netapp.com>

nfsd41: replace page based DRC with buffer based DRC

Use NFSD_SLOT_CACHE_SIZE size buffers for sessions DRC instead of holding nfsd
pages in cache.

Connectathon testing has shown that 1024 bytes for encoded compound operation
responses past the sequence operation is sufficient, 512 bytes is a little too
small. Set NFSD_SLOT_CACHE_SIZE to 1024.

Allocate memory for the session DRC in the CREATE_SESSION operation
to guarantee that the memory resource is available for caching responses.
Allocate each slot individually in preparation for slot table size negotiation.

Remove struct nfsd4_cache_entry and helper functions for the old page-based
DRC.

The iov_len calculation in nfs4svc_encode_compoundres is now always
correct. Replay is now done in nfsd4_sequence under the state lock, so
the session ref count is only bumped on non-replay. Clean up the
nfs4svc_encode_compoundres session logic.

The nfsd4_compound_state statp pointer is also not used.
Remove nfsd4_set_statp().

Move useful nfsd4_cache_entry fields into nfsd4_slot.

Signed-off-by: Andy Adamson <andros@netapp.com
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# ed2d8aed 15-Aug-2009 Ryusei Yamaguchi <mandel59@gmail.com>

knfsd: Replace lock_kernel with a mutex in nfsd pool stats.

lock_kernel() in knfsd was replaced with a mutex. The later
commit 03cf6c9f49a8fea953d38648d016e3f46e814991 ("knfsd:
add file to export stats about nfsd pools") did not follow
that change. This patch fixes the issue.

Also move the get and put of nfsd_serv to the open and close methods
(instead of start and stop methods) to allow atomic check and increment
of reference count in the open method (where we can still return an
error).

Signed-off-by: Ryusei Yamaguchi <mandel59@gmail.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Cc: Greg Banks <gnb@fmeh.org>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 0c193054 27-Jul-2009 Andy Adamson <andros@netapp.com>

nfsd41: hange from page to memory based drc limits

NFSD_SLOT_CACHE_SIZE is the size of all encoded operation responses
(excluding the sequence operation) that we want to cache.

For now, keep NFSD_SLOT_CACHE_SIZE at PAGE_SIZE. It will be reduced
when the DRC is changed from page based to memory based.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 6a14dd1a 27-Jul-2009 Andy Adamson <andros@netapp.com>

nfsd41: reserve less memory for DRC

Also remove a slightly misleading comment.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 4bd9b0f4 24-Jun-2009 Andy Adamson <andros@netapp.com>

nfsd41: use globals for DRC limits

The version 4.1 DRC memory limit and tracking variables are server wide and
session specific. Replace struct svc_serv fields with globals.
Stop using the svc_serv sv_lock.

Add a spinlock to serialize access to the DRC limit management variables which
change on session creation and deletion (usage counter) or (future)
administrative action to adjust the total DRC memory limit.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>


# 405f5571 11-Jul-2009 Alexey Dobriyan <adobriyan@gmail.com>

headers: smp_lock.h redux

* Remove smp_lock.h from files which don't need it (including some headers!)
* Add smp_lock.h to files which do need it
* Make smp_lock.h include conditional in hardirq.h
It's needed only for one kernel_locked() usage which is under CONFIG_PREEMPT

This will make hardirq.h inclusion cheaper for every PREEMPT=n config
(which includes allmodconfig/allyesconfig, BTW)

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 671e1fcf 15-Jun-2009 NeilBrown <neilb@suse.de>

nfsd: optimise the starting of zero threads when none are running.

Currently, if we ask to set then number of nfsd threads to zero when
there are none running, we set up all the sockets and register the
service, and then tear it all down again.
This is pointless.

So detect that case and exit promptly.
(also remove an assignment to 'error' which was never used.

Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: Jeff Layton <jlayton@redhat.com>


# 82e12fe9 15-Jun-2009 NeilBrown <neilb@suse.de>

nfsd: don't take nfsd_mutex twice when setting number of threads.

Currently when we write a number to 'threads' in nfsdfs,
we take the nfsd_mutex, update the number of threads, then take the
mutex again to read the number of threads.

Mostly this isn't a big deal. However if we are write '0', and
portmap happens to be dead, then we can get unpredictable behaviour.
If the nfsd threads all got killed quickly and the last thread is
waiting for portmap to respond, then the second time we take the mutex
we will block waiting for the last thread.
However if the nfsd threads didn't die quite that fast, then there
will be no contention when we try to take the mutex again.

Unpredictability isn't fun, and waiting for the last thread to exit is
pointless, so avoid taking the lock twice.
To achieve this, get nfsd_svc return a non-negative number of active
threads when not returning a negative error.

Signed-off-by: NeilBrown <neilb@suse.de>


# f0ad670d 05-Apr-2009 Benny Halevy <bhalevy@panasas.com>

nfsd41: define NFSD_DRC_SIZE_SHIFT in set_max_drc

Fixes the following compiler error:
fs/nfsd/nfssvc.c: In function 'set_max_drc':
fs/nfsd/nfssvc.c:240: error: 'NFSD_DRC_SIZE_SHIFT' undeclared

CONFIG_NFSD_V4 is not set

Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 8daf220a 02-Apr-2009 Benny Halevy <bhalevy@panasas.com>

nfsd41: control nfsv4.1 svc via /proc/fs/nfsd/versions

Support enabling and disabling nfsv4.1 via /proc/fs/nfsd/versions
by writing the strings "+4.1" or "-4.1" correspondingly.

Use user mode nfs-utils (rpc.nfsd option) to enable.
This will allow us to get rid of CONFIG_NFSD_V4_1

[nfsd41: disable support for minorversion by default]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# c3d06f9c 02-Apr-2009 Andy Adamson <andros@netapp.com>

nfsd41: hard page limit for DRC

Use no more than 1/128th of the number of free pages at nfsd startup for the
v4.1 DRC.

This is an arbitrary default which should probably end up under the control
of an administrator.

Signed-off-by: Andy Adamson <andros@netapp.com>
[moved added fields in struct svc_serv under CONFIG_NFSD_V4_1]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[fix set_max_drc calculation of sv_drc_max_pages]
[moved NFSD_DRC_SIZE_SHIFT's declaration up in header file]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 074fe897 02-Apr-2009 Andy Adamson <andros@netapp.com>

nfsd41: DRC save, restore, and clear functions

Cache all the result pages, including the rpc header in rq_respages[0],
for a request in the slot table cache entry.

Cache the statp pointer from nfsd_dispatch which points into rq_respages[0]
just past the rpc header. When setting a cache entry, calculate and save the
length of the nfs data minus the rpc header for rq_respages[0].

When replaying a cache entry, replace the cached rpc header with the
replayed request rpc result header, unless there is not enough room in the
cached results first page. In that case, use the cached rpc header.

The sessions fore channel maxresponse size cached is set to NFSD_PAGES_PER_SLOT
* PAGE_SIZE. For compounds we are cacheing with operations such as READDIR
that use the xdr_buf->pages to hold data, we choose to cache the extra page of
data rather than copying data from xdr_buf->pages into the xdr_buf->head page.

[nfsd41: limit cache to maxresponsesize_cached]
[nfsd41: mv nfsd4_set_statp under CONFIG_NFSD_V4_1]
[nfsd41: rename nfsd4_move_pages]
[nfsd41: rename page_no variable]
[nfsd41: rename nfsd4_set_cache_entry]
[nfsd41: fix nfsd41_copy_replay_data comment]
[nfsd41: add to nfsd4_set_cache_entry]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 3e93cd67 29-Mar-2009 Al Viro <viro@zeniv.linux.org.uk>

Take fs_struct handling to new file (fs/fs_struct.c)

Pure code move; two new helper functions for nfsd and daemonize
(unshare_fs_struct() and daemonize_fs_struct() resp.; for now -
the same code as used to be in callers). unshare_fs_struct()
exported (for nfsd, as copy_fs_struct()/exit_fs() used to be),
copy_fs_struct() and exit_fs() don't need exports anymore.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 49a9072f 18-Mar-2009 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Remove @family argument from svc_create() and svc_create_pooled()

Since an RPC service listener's protocol family is specified now via
svc_create_xprt(), it no longer needs to be passed to svc_create() or
svc_create_pooled(). Remove that argument from the synopsis of those
functions, and remove the sv_family field from the svc_serv struct.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 9652ada3 18-Mar-2009 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Change svc_create_xprt() to take a @family argument

The sv_family field is going away. Pass a protocol family argument to
svc_create_xprt() instead of extracting the family from the passed-in
svc_serv struct.

Again, as this is a listener socket and not an address, we make this
new argument an "int" protocol family, instead of an "sa_family_t."

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 03cf6c9f 13-Jan-2009 Greg Banks <gnb@sgi.com>

knfsd: add file to export stats about nfsd pools

Add /proc/fs/nfsd/pool_stats to export to userspace various
statistics about the operation of rpc server thread pools.

This patch is based on a forward-ported version of
knfsd-add-pool-thread-stats which has been shipping in the SGI
"Enhanced NFS" product since 2006 and which was previously
posted:

http://article.gmane.org/gmane.linux.nfs/10375

It has also been updated thus:

* moved EXPORT_SYMBOL() to near the function it exports
* made the new struct struct seq_operations const
* used SEQ_START_TOKEN instead of ((void *)1)
* merged fix from SGI PV 990526 "sunrpc: use dprintk instead of
printk in svc_pool_stats_*()" by Harshula Jayasuriya.
* merged fix from SGI PV 964001 "Crash reading pool_stats before
nfsds are started".

Signed-off-by: Greg Banks <gnb@sgi.com>
Signed-off-by: Harshula Jayasuriya <harshula@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 8bbfa9f3 13-Jan-2009 Greg Banks <gnb@sgi.com>

knfsd: remove the nfsd thread busy histogram

Stop gathering the data that feeds the 'th' line in /proc/net/rpc/nfsd
because the questionable data provided is not worth the scalability
impact of calculating it. Instead, always report zeroes. The current
approach suffers from three major issues:

1. update_thread_usage() increments buckets by call service
time or call arrival time...in jiffies. On lightly loaded
machines, call service times are usually < 1 jiffy; on
heavily loaded machines call arrival times will be << 1 jiffy.
So a large portion of the updates to the buckets are rounded
down to zero, and the histogram is undercounting.

2. As seen previously on the nfs mailing list, the format in which
the histogram is presented is cryptic, difficult to explain,
and difficult to use.

3. Updating the histogram requires taking a global spinlock and
dirtying the global variables nfsd_last_call, nfsd_busy, and
nfsdstats *twice* on every RPC call, which is a significant
scaling limitation.

Testing on a 4 CPU 4 NIC Altix using 4 IRIX clients each doing
1K streaming reads at full line rate, shows the stats update code
(inlined into nfsd()) takes about 1.7% of each CPU. This patch drops
the contribution from nfsd() into the profile noise.

This patch is a forward-ported version of knfsd-remove-nfsd-threadstats
which has been shipping in the SGI "Enhanced NFS" product since 2006.
In that time, exactly one customer has noticed that the threadstats
were missing. It has been previously posted:

http://article.gmane.org/gmane.linux.nfs/10376

and more recently requested to be posted again.

Signed-off-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 1cd9cd16 22-Oct-2008 Chuck Lever <chuck.lever@oracle.com>

NFSD: Fix BUG during NFSD shutdown processing

The Linux NFS server can be started via a user-space write to
/proc/fs/nfs/threads or to /proc/fs/nfs/portlist. In the first case,
all default listeners are started (both UDP and TCP). In the second,
a listener is started only for one specified transport.

The NFS server has to make sure lockd stays up until the last listener
transport goes away. To support both start-up interfaces, it should
do one lockd_up() for each NFSD listener.

The nfsd_init_socks() function used to do one lockd_up() call for each
svc_create_xprt(). Recently commit
26a414092353590ceaa5955bcb53f863d6ea7549 mistakenly changed
nfsd_init_socks() to do only one lockd_up() call even though it still
does two svc_create_xprt() calls.

The end result is a lockd_down() BUG during NFSD shutdown processing
because nfsd_last_threads() does a lockd_down() call for each entry
on the sv_permsocks list, but the start-up code doesn't do a matching
number of lockd_up() calls.

Add a second lockd_up() in nfsd_init_socks() to make sure the number
of lockd_up() calls matches the number of entries on the NFS servers's
sv_permsocks list.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 26a41409 03-Oct-2008 Chuck Lever <chuck.lever@oracle.com>

NLM: Remove "proto" argument from lockd_up()

Clean up: Now that lockd_up() starts listeners for both transports, the
"proto" argument is no longer needed.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# e851db5b 30-Jun-2008 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Add address family field to svc_serv data structure

Introduce and initialize an address family field in the svc_serv structure.

This field will determine what family to use for the service's listener
sockets and what families are advertised via the local rpcbind daemon.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 100766f8 30-Jun-2008 Jeff Layton <jlayton@kernel.org>

nfsd: treat all shutdown signals as equivalent

knfsd currently uses 2 signal masks when processing requests. A "loose"
mask (SHUTDOWN_SIGS) that it uses when receiving network requests, and
then a more "strict" mask (ALLOWED_SIGS, which is just SIGKILL) that it
allows when doing the actual operation on the local storage.

This is apparently unnecessarily complicated. The underlying filesystem
should be able to sanely handle a signal in the middle of an operation.
This patch removes the signal mask handling from knfsd altogether. When
knfsd is started as a kthread, all signals are ignored. It then allows
all of the signals in SHUTDOWN_SIGS. There's no need to set the mask
as well.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# c7d106c9 11-Jun-2008 Neil Brown <neilb@suse.de>

nfsd: fix race in nfsd_nrthreads()

We need the nfsd_mutex before accessing nfsd_serv->sv_nrthreads or we
can't even guarantee nfsd_serv will still be there.

Signed-off-by: Neil Brown <neilb@suse.de>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# a75c5d01 10-Jun-2008 Jeff Layton <jlayton@kernel.org>

sunrpc: remove sv_kill_signal field from svc_serv struct

Since we no longer make any distinction between shutdown signals with
nfsd, then it becomes easier to just standardize on a particular signal
to use to bring it down (SIGINT, in this case).

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 9867d76c 10-Jun-2008 Jeff Layton <jlayton@kernel.org>

knfsd: convert knfsd to kthread API

This patch is rather large, but I couldn't figure out a way to break it
up that would remain bisectable. It does several things:

- change svc_thread_fn typedef to better match what kthread_create expects
- change svc_pool_map_set_cpumask to be more kthread friendly. Make it
take a task arg and and get rid of the "oldmask"
- have svc_set_num_threads call kthread_create directly
- eliminate __svc_create_thread

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# e096bbc6 10-Jun-2008 Jeff Layton <jlayton@kernel.org>

knfsd: remove special handling for SIGHUP

The special handling for SIGHUP in knfsd is a holdover from much
earlier versions of Linux where reloading the export table was
more expensive. That facility is not really needed anymore and
to my knowledge, is seldom-used.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 3dd98a3b 10-Jun-2008 Jeff Layton <jlayton@kernel.org>

knfsd: clean up nfsd filesystem interfaces

Several of the nfsd filesystem interfaces allow changes to parameters
that don't have any effect on a running nfsd service. They are only ever
checked when nfsd is started. This patch fixes it so that changes to
those procfiles return -EBUSY if nfsd is already running to make it
clear that changes on the fly don't work.

The patch should also close some relatively harmless races between
changing the info in those interfaces and starting nfsd, since these
variables are being moved under the protection of the nfsd_mutex.

Finally, the nfsv4recoverydir file always returns -EINVAL if read. This
patch fixes it to return the recoverydir path as expected.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# bedbdd8b 10-Jun-2008 Neil Brown <neilb@suse.de>

knfsd: Replace lock_kernel with a mutex for nfsd thread startup/shutdown locking.

This removes the BKL from the RPC service creation codepath. The BKL
really isn't adequate for this job since some of this info needs
protection across sleeps.

Also, add some comments to try and clarify how the locking should work
and to make it clear that the BKL isn't necessary as long as there is
adequate locking between tasks when touching the svc_serv fields.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 5ea0dd61 11-Feb-2008 Chuck Lever <chuck.lever@oracle.com>

NFSD: Remove NFSD_TCP kernel build option

Likewise, distros usually leave CONFIG_NFSD_TCP enabled.

TCP support in the Linux NFS server is stable enough that we can leave it
on always. CONFIG_NFSD_TCP adds about 10 lines of code, and defaults to
"Y" anyway.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 7a182083 30-Dec-2007 Tom Tucker <tom@opengridcomputing.com>

svc: Make close transport independent

Move sk_list and sk_ready to svc_xprt. This involves close because these
lists are walked by svcs when closing all their transports. So I combined
the moving of these lists to svc_xprt with making close transport independent.

The svc_force_sock_close has been changed to svc_close_all and takes a list
as an argument. This removes some svc internals knowledge from the svcs.

This code races with module removal and transport addition.

Thanks to Simon Holm Thøgersen for a compile fix.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Simon Holm Thøgersen <odie@cs.aau.dk>


# d7c9f1ed 30-Dec-2007 Tom Tucker <tom@opengridcomputing.com>

svc: Change services to use new svc_create_xprt service

Modify the various kernel RPC svcs to use the svc_create_xprt service.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# e8ff2a84 01-Aug-2007 J. Bruce Fields <bfields@citi.umich.edu>

knfsd: move nfsv4 slab creation/destruction to module init/exit

We have some slabs that the nfs4 server uses to store state objects.
We're currently creating and destroying those slabs whenever the server
is brought up or down. That seems excessive; may as well just do that
in module initialization and exit.

Also add some minor header cleanup. (Thanks to Andrew Morton for that
and a compile fix.)

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Acked-by: Neil Brown <neilb@suse.de>


# 3b398f0ef 24-Jul-2007 J. Bruce Fields <bfields@citi.umich.edu>

knfsd: delete code made redundant by map_new_errors

I moved this check into map_new_errors, but forgot to delete the
original. Oops.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Acked-by: Neil Brown <neilb@suse.de>


# 45457e09 22-Jun-2007 J. Bruce Fields <bfields@citi.umich.edu>

nfsd: tone down inaccurate dprintk

The nfserr_dropit happens routinely on upcalls (so a kmalloc failure is
almost never the actual cause), but I occasionally get a complant from
some tester that's worried because they ran across this message after
turning on debugging to research some unrelated problem.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Acked-by: Neil Brown <neilb@suse.de>


# 32c1eb0c 17-Jul-2007 Andy Adamson <andros@citi.umich.edu>

knfsd: nfsd4: return nfserr_wrongsec

Make the first actual use of the secinfo information by using it to return
nfserr_wrongsec when an export is found that doesn't allow the flavor used on
this request.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 83144186 17-Jul-2007 Rafael J. Wysocki <rjw@rjwysocki.net>

Freezer: make kernel threads nonfreezable by default

Currently, the freezer treats all tasks as freezable, except for the kernel
threads that explicitly set the PF_NOFREEZE flag for themselves. This
approach is problematic, since it requires every kernel thread to either
set PF_NOFREEZE explicitly, or call try_to_freeze(), even if it doesn't
care for the freezing of tasks at all.

It seems better to only require the kernel threads that want to or need to
be frozen to use some freezer-related code and to remove any
freezer-related code from the other (nonfreezable) kernel threads, which is
done in this patch.

The patch causes all kernel threads to be nonfreezable by default (ie. to
have PF_NOFREEZE set by default) and introduces the set_freezable()
function that should be called by the freezable kernel threads in order to
unset PF_NOFREEZE. It also makes all of the currently freezable kernel
threads call set_freezable(), so it shouldn't cause any (intentional)
change of behaviour to appear. Additionally, it updates documentation to
describe the freezing of tasks more accurately.

[akpm@linux-foundation.org: build fixes]
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Nigel Cunningham <nigel@nigel.suspend2.net>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# e8edc6e0 20-May-2007 Alexey Dobriyan <adobriyan@gmail.com>

Detach sched.h from mm.h

First thing mm.h does is including sched.h solely for can_do_mlock() inline
function which has "current" dereference inside. By dealing with can_do_mlock()
mm.h can be detached from sched.h which is good. See below, why.

This patch
a) removes unconditional inclusion of sched.h from mm.h
b) makes can_do_mlock() normal function in mm/mlock.c
c) exports can_do_mlock() to not break compilation
d) adds sched.h inclusions back to files that were getting it indirectly.
e) adds less bloated headers to some files (asm/signal.h, jiffies.h) that were
getting them indirectly

Net result is:
a) mm.h users would get less code to open, read, preprocess, parse, ... if
they don't need sched.h
b) sched.h stops being dependency for significant number of files:
on x86_64 allmodconfig touching sched.h results in recompile of 4083 files,
after patch it's only 3744 (-8.3%).

Cross-compile tested on

all arm defconfigs, all mips defconfigs, all powerpc defconfigs,
alpha alpha-up
arm
i386 i386-up i386-defconfig i386-allnoconfig
ia64 ia64-up
m68k
mips
parisc parisc-up
powerpc powerpc-up
s390 s390-up
sparc sparc-up
sparc64 sparc64-up
um-x86_64
x86_64 x86_64-up x86_64-defconfig x86_64-allnoconfig

as well as my two usual configs.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 482fb94e 12-Feb-2007 Chuck Lever <chuck.lever@oracle.com>

[PATCH] knfsd: SUNRPC: allow creating an RPC service without registering with portmapper

Sometimes we need to create an RPC service but not register it with the local
portmapper. NFSv4 delegation callback, for example.

Change the svc_makesock() API to allow optionally creating temporary or
permanent sockets, optionally registering with the local portmapper, and make
it return the ephemeral port of the new socket.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 1a8eff6d 26-Jan-2007 NeilBrown <neilb@suse.de>

[PATCH] knfsd: fix setting of ACL server versions

Due to silly typos, if the nfs versions are explicitly set, no NFSACL versions
get enabled.

Also improve an error message that would have made this bug a little easier to
find.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# c7afef1f 20-Oct-2006 Al Viro <viro@ftp.linux.org.uk>

[PATCH] nfsd: misc endianness annotations

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# ad451d38 20-Oct-2006 Al Viro <viro@ftp.linux.org.uk>

[PATCH] xdr annotations: nfsd_dispatch()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# c6b0a9f8 06-Oct-2006 NeilBrown <neilb@suse.de>

[PATCH] knfsd: tidy up up meaning of 'buffer size' in nfsd/sunrpc

There is some confusion about the meaning of 'bufsz' for a sunrpc server.
In some cases it is the largest message that can be sent or received. In
other cases it is the largest 'payload' that can be included in a NFS
message.

In either case, it is not possible for both the request and the reply to be
this large. One of the request or reply may only be one page long, which
fits nicely with NFS.

So we remove 'bufsz' and replace it with two numbers: 'max_payload' and
'max_mesg'. Max_payload is the size that the server requests. It is used
by the server to check the max size allowed on a particular connection:
depending on the protocol a lower limit might be used.

max_mesg is the largest single message that can be sent or received. It is
calculated as the max_payload, rounded up to a multiple of PAGE_SIZE, and
with PAGE_SIZE added to overhead. Only one of the request and reply may be
this size. The other must be at most one page.

Cc: Greg Banks <gnb@sgi.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 44c55600 04-Oct-2006 NeilBrown <neilb@suse.de>

[PATCH] knfsd: fix auto-sizing of nfsd request/reply buffers

totalram is measured in pages, not bytes, so PAGE_SHIFT must be used when
trying to find 1/4096 of RAM.

Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 596bbe53 04-Oct-2006 NeilBrown <neilb@suse.de>

[PATCH] knfsd: Allow max size of NFSd payload to be configured

The max possible is the maximum RPC payload. The default depends on amount of
total memory.

The value can be set within reason as long as no nfsd threads are currently
running. The value can also be ready, allowing the default to be determined
after nfsd has started.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# eed2965a 02-Oct-2006 Greg Banks <gnb@melbourne.sgi.com>

[PATCH] knfsd: allow admin to set nthreads per node

Add /proc/fs/nfsd/pool_threads which allows the sysadmin (or a userspace
daemon) to read and change the number of nfsd threads in each pool. The
format is a list of space-separated integers, one per pool.

Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# eec09661 02-Oct-2006 Greg Banks <gnb@melbourne.sgi.com>

[PATCH] knfsd: use svc_set_num_threads to manage threads in knfsd

Replace the existing list of all nfsd threads with new code using
svc_create_pooled().

Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 9a24ab57 02-Oct-2006 Greg Banks <gnb@melbourne.sgi.com>

[PATCH] knfsd: add svc_get

add svc_get() for those occasions when we need to temporarily bump up
svc_serv->sv_nrthreads as a pseudo refcount.

Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 4a3ae42d 02-Oct-2006 NeilBrown <neilb@suse.de>

[PATCH] knfsd: Correctly handle error condition from lockd_up

If lockd_up fails - what should we expect? Do we have to later call
lockd_down?

Well the nfs client thinks "no", the nfs server thinks "yes". lockd thinks
"yes".

The only answer that really makes sense is "no" !!

So:
Make lockd_up only increment nlmsvc_users on success.
Make nfsd handle errors from lockd_up properly.
Make sure lockd_up(0) never fails when lockd is running
so that the 'reclaimer' call to lockd_up doesn't need to
be error checked.

Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 6fb2b47f 02-Oct-2006 NeilBrown <neilb@suse.de>

[PATCH] knfsd: Drop 'serv' option to svc_recv and svc_process

It isn't needed as it is available in rqstp->rq_server, and dropping it allows
some local vars to be dropped.

[akpm@osdl.org: build fix]
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# b41b66d6 02-Oct-2006 NeilBrown <neilb@suse.de>

[PATCH] knfsd: allow sockets to be passed to nfsd via 'portlist'

Userspace should create and bind a socket (but not connectted) and write the
'fd' to portlist. This will cause the nfs server to listen on that socket.

To close a socket, the name of the socket - as read from 'portlist' can be
written to 'portlist' with a preceding '-'.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 02a375f0 02-Oct-2006 NeilBrown <neilb@suse.de>

[PATCH] knfsd: separate out some parts of nfsd_svc, which start nfs servers

Separate out the code for creating a new service, and for creating initial
sockets.

Some of these new functions will have multiple callers soon.

[akpm@osdl.org: cleanups]
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 6658d3a7 02-Oct-2006 NeilBrown <neilb@suse.de>

[PATCH] knfsd: remove nfsd_versbits as intermediate storage for desired versions

We have an array 'nfsd_version' which lists the available versions of nfsd,
and 'nfsd_versions' (poor choice there :-() which lists the currently active
versions.

Then we have a bitmap - nfsd_versbits which says which versions are wanted.
The bits in this bitset cause content to be copied from nfsd_version to
nfsd_versions when nfsd starts.

This patch removes nfsd_versbits and moves information directly from
nfsd_version to nfsd_versions when requests for version changes arrive.

Note that this doesn't make it possible to change versions while the server is
running. This is because serv->sv_xdrsize is calculated when a service is
created, and used when threads are created, and xdrsize depends on the active
versions.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 24e36663 02-Oct-2006 NeilBrown <neilb@suse.de>

[PATCH] knfsd: be more selective in which sockets lockd listens on

Currently lockd listens on UDP always, and TCP if CONFIG_NFSD_TCP is set.

However as lockd performs services of the client as well, this is a problem.
If CONFIG_NfSD_TCP is not set, and a tcp mount is used, the server will not be
able to call back to lockd.

So:
- add an option to lockd_up saying which protocol is needed
- Always open sockets for which an explicit port was given, otherwise
only open a socket of the type required
- Change nfsd to do one lockd_up per socket rather than one per thread.

This
- removes the dependancy on CONFIG_NFSD_TCP
- means that lockd may open sockets other than at startup
- means that lockd will *not* listen on UDP if the only
mounts are TCP mount (and nfsd hasn't started).

The latter is the only one that concerns me at all - I don't know if this
might be a problem with some servers.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# bc591ccf 02-Oct-2006 NeilBrown <neilb@suse.de>

[PATCH] knfsd: add a callback for when last rpc thread finishes

nfsd has some cleanup that it wants to do when the last thread exits, and
there will shortly be some more. So collect this all into one place and
define a callback for an rpc service to call when the service is about to be
destroyed.

[akpm@osdl.org: cleanups, build fix]
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 6ab3d562 30-Jun-2006 Jörn Engel <joern@wohnheim.fh-wedel.de>

Remove obsolete #include <linux/config.h>

Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>


# e8c96f8c 24-Mar-2006 Tobias Klauser <tklauser@nuerscht.ch>

[PATCH] fs: Use ARRAY_SIZE macro

Use ARRAY_SIZE macro instead of sizeof(x)/sizeof(x[0]) and remove a
duplicate of ARRAY_SIZE. Some trailing whitespaces are also deleted.

Signed-off-by: Tobias Klauser <tklauser@nuerscht.ch>
Cc: David Howells <dhowells@redhat.com>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Cc: Chris Mason <mason@suse.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Nathan Scott <nathans@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 3fb803a9 01-Feb-2006 Andreas Gruenbacher <agruen@suse.de>

[PATCH] knfsd: Restore recently broken ACL functionality to NFS server

A recent patch to
Allow run-time selection of NFS versions to export

meant that NO nfsacl service versions were exported. This patch restored
that functionality.

Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 80d188a6 07-Nov-2005 NeilBrown <neilb@cse.unsw.edu.au>

[PATCH] knfsd: make sure svc_process call the correct pg_authenticate for multi-service port

If an RPC socket is serving multiple programs, then the pg_authenticate of
the first program in the list is called, instead of pg_authenticate for the
program to be run.

This does not cause a problem with any programs in the current kernel, but
could confuse future code.

Also set pg_authenticate for nfsd_acl_program incase it ever gets used.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 70c3b76c 07-Nov-2005 NeilBrown <neilb@cse.unsw.edu.au>

[PATCH] knfsd: Allow run-time selection of NFS versions to export

Provide a file in the NFSD filesystem that allows setting and querying of
which version of NFS are being exported. Changes are only allowed while no
server is running.

Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# c4f92dba 17-Aug-2005 Steven Rostedt <rostedt@goodmis.org>

[PATCH] nfsd to unlock kernel before exiting

The nfsd holds the big kernel lock upon exit, when it really shouldn't.
Not to mention that this breaks Ingo's RT patch. This is a trivial fix
to release the lock.

Ingo, this patch also works with your kernel, and stops the problem with
nfsd.

Note, there's a "goto out;" where "out:" is right above svc_exit_thread.
The point of the goto also holds the kernel_lock, so I don't see any
problem here in releasing it.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 76a3550e 23-Jun-2005 NeilBrown <neilb@cse.unsw.edu.au>

[PATCH] knfsd: nfsd4: rename nfs4_state_init

Somewhat gratuitous rename to simplify following patch.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 21348425 22-Jun-2005 Andreas Gruenbacher <agruen@suse.de>

[PATCH] fix nfsacl pointer arithmetic and pg_class initialization bugs

* Pointer arithmetic bug: p is in word units. This fixes a memory
corruption with big acls.
* Initialize pg_class to prevent a NULL pointer access.

Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# a257cdd0 22-Jun-2005 Andreas Gruenbacher <agruen@suse.de>

[PATCH] NFSD: Add server support for NFSv3 ACLs.

This adds functions for encoding and decoding POSIX ACLs for the NFSACL
protocol extension, and the GETACL and SETACL RPCs. The implementation is
compatible with NFSACL in Solaris.

Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 9e416052 16-Apr-2005 NeilBrown <neilb@cse.unsw.edu.au>

[PATCH] nfsd: clear signals before exiting the nfsd() thread

Fixes the error "RPC: failed to contact portmap (errno -512)." when the server
later tries to unregister from the portmapper.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 1da177e4 16-Apr-2005 Linus Torvalds <torvalds@ppc970.osdl.org>

Linux-2.6.12-rc2

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!