History log of /linux-master/include/linux/sunrpc/xprt.h
Revision Date Author Comments
# 3f7edeac 12-Dec-2023 Trond Myklebust <trond.myklebust@hammerspace.com>

SUNRPC: Add a transport callback to handle dequeuing of an RPC request

Add a transport level callback to allow it to handle the consequences of
dequeuing the request that was in the process of being transmitted.
For something like a TCP connection, we may need to disconnect if the
request was partially transmitted.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


# 57331a59 04-Jan-2024 Benjamin Coddington <bcodding@redhat.com>

NFSv4.1: Use the nfs_client's rpc timeouts for backchannel

For backchannel requests that lookup the appropriate nfs_client, use the
state-management rpc_clnt's rpc_timeout parameters for the backchannel's
response. When the nfs_client cannot be found, fall back to using the
xprt's default timeout parameters.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Tested-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>


# 15d39883 11-Sep-2023 NeilBrown <neilb@suse.de>

SUNRPC: change the back-channel queue to lwq

This removes the need to store and update back-links in the list.
It also remove the need for the _bh version of spin_lock().

Signed-off-by: NeilBrown <neilb@suse.de>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# d2ee4138 19-Aug-2023 Trond Myklebust <trond.myklebust@hammerspace.com>

SUNRPC: Allow specification of TCP client connect timeout at setup

When we create a TCP transport, the connect timeout parameters are
currently fixed to be 90s. This is problematic in the pNFS flexfiles
case, where we may have multiple mirrors, and we would like to fail over
quickly to the next mirror if a data server is down.

This patch adds the ability to specify the connection parameters at RPC
client creation time.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>


# 75eb6af7 07-Jun-2023 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Add a TCP-with-TLS RPC transport class

Use the new TLS handshake API to enable the SunRPC client code
to request a TLS handshake. This implements support for RFC 9289,
only on TCP sockets.

Upper layers such as NFS use RPC-with-TLS to protect in-transit
traffic.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


# 50005319 07-Jun-2023 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Plumb an API for setting transport layer security

Add an initial set of policies along with fields for upper layers to
pass the requested policy down to the transport layer.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


# 72691a26 27-Jul-2022 Trond Myklebust <trond.myklebust@hammerspace.com>

SUNRPC: Don't reuse bvec on retransmission of the request

If a request is re-encoded and then retransmitted, we need to make sure
that we also re-encode the bvec, in case the page lists have changed.

Fixes: ff053dbbaffe ("SUNRPC: Move the call to xprt_send_pagedata() out of xprt_sock_sendmsg()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


# 7ffcdaa6 25-Jul-2022 Olga Kornievskaia <olga.kornievskaia@gmail.com>

SUNRPC expose functions for offline remote xprt functionality

Re-arrange the code that make offline transport and delete transport
callable functions.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


# eb07d5a4 29-Mar-2022 NeilBrown <neilb@suse.de>

SUNRPC: handle malloc failure in ->request_prepare

If ->request_prepare() detects an error, it sets ->rq_task->tk_status.
This is easy for callers to ignore.
The only caller is xprt_request_enqueue_receive() and it does ignore the
error, as does call_encode() which calls it. This can result in a
request being queued to receive a reply without an allocated receive buffer.

So instead of setting rq_task->tk_status, return an error, and store in
->tk_status only in call_encode();

The call to xprt_request_enqueue_receive() is now earlier in
call_encode(), where the error can still be handled.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


# 421ab1be 25-Mar-2022 Trond Myklebust <trond.myklebust@hammerspace.com>

SUNRPC: Do not dereference non-socket transports in sysfs

Do not cast the struct xprt to a sock_xprt unless we know it is a UDP or
TCP transport. Otherwise the call to lock the mutex will scribble over
whatever structure is actually there. This has been seen to cause hard
system lockups when the underlying transport was RDMA.

Fixes: b49ea673e119 ("SUNRPC: lock against ->sock changing during sysfs read")
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


# b9a0d6d1 27-Jan-2022 Eric Dumazet <edumazet@google.com>

SUNRPC: add netns refcount tracker to struct rpc_xprt

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c2dc3e5f 26-Jul-2021 Trond Myklebust <trond.myklebust@hammerspace.com>

SUNRPC: Fix potential memory corruption

We really should not call rpc_wake_up_queued_task_set_status() with
xprt->snd_task as an argument unless we are certain that is actually an
rpc_task.

Fixes: 0445f92c5d53 ("SUNRPC: Fix disconnection races")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>


# a4ae3081 05-Aug-2021 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Move client-side disconnect injection

Disconnect injection stress-tests the ability for both client and
server implementations to behave resiliently in the face of network
instability.

Convert the existing client-side disconnect injection infrastructure
to use the kernel's generic error injection facility. The generic
facility has a richer set of injection criteria.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 6f081693 23-Jun-2021 Olga Kornievskaia <kolga@netapp.com>

sunrpc: remove an offlined xprt using sysfs

Once a transport has been put offline, this transport can be also
removed from the list of transports. Any tasks that have been stuck
on this transport would find the next available active transport
and be re-tried. This transport would be removed from the xprt_switch
list and freed.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


# 5b7eb784 23-Jun-2021 Olga Kornievskaia <kolga@netapp.com>

SUNRPC: take a xprt offline using sysfs

Using sysfs's xprt_state attribute, mark a particular transport offline.
It will not be picked during the round-robin selection. It's not allowed
to take the main (1st created transport associated with the rpc_client)
offline. Also bring a transport back online via sysfs by writing "online"
and that would allow for this transport to be picked during the round-
robin selection.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


# 587bc725 08-Jun-2021 Olga Kornievskaia <kolga@netapp.com>

sunrpc: add dst_attr attributes to the sysfs xprt directory

Allow to query and set the destination's address of a transport.
Setting of the destination address is allowed only for TCP or RDMA
based connections.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


# e091853e 23-Jun-2021 Olga Kornievskaia <kolga@netapp.com>

SUNRPC mark the first transport

When an RPC client gets created it's first transport is special
and should be marked a main transport.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


# d408ebe0 08-Jun-2021 Olga Kornievskaia <kolga@netapp.com>

sunrpc: add add sysfs directory per xprt under each xprt_switch

Add individual transport directories under each transport switch
group. For instance, for each nconnect=X connections there will be
a transport directory. Naming conventions also identifies transport
type -- xprt-<id>-<type> where type is udp, tcp, rdma, local, bc.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


# d3abc739 08-Jun-2021 Olga Kornievskaia <kolga@netapp.com>

sunrpc: keep track of the xprt_class in rpc_xprt structure

We need to keep track of the type for a given transport.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


# 572caba4 08-Jun-2021 Olga Kornievskaia <kolga@netapp.com>

sunrpc: add xprt id

This adds a unique identifier for a sunrpc transport in sysfs, which is
similarly managed to the unique IDs of clients.

Signed-off-by: Dan Aloni <dan@kernelim.com>
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


# e86be3a0 25-May-2021 Trond Myklebust <trond.myklebust@hammerspace.com>

SUNRPC: More fixes for backlog congestion

Ensure that we fix the XPRT_CONGESTED starvation issue for RDMA as well
as socket based transports.
Ensure we always initialise the request after waking up from the backlog
list.

Fixes: e877a88d1f06 ("SUNRPC in case of backlog, hand free slots directly to waiting task")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


# d737e5d4 09-Feb-2021 Trond Myklebust <trond.myklebust@hammerspace.com>

SUNRPC: Set TCP_CORK until the transmit queue is empty

When we have multiple RPC requests queued up, it makes sense to set the
TCP_CORK option while the transmit queue is non-empty.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


# c87b056e 10-Nov-2020 Trond Myklebust <trond.myklebust@hammerspace.com>

SUNRPC: Remove unused function xprt_load_transport()

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


# 1fc5f131 10-Nov-2020 Trond Myklebust <trond.myklebust@hammerspace.com>

SUNRPC: Add a helper to return the transport identifier given a netid

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


# d5aa6b22 06-Nov-2020 Trond Myklebust <trond.myklebust@hammerspace.com>

SUNRPC: xprt_load_transport() needs to support the netid "rdma6"

According to RFC5666, the correct netid for an IPv6 addressed RDMA
transport is "rdma6", which we've supported as a mount option since
Linux-4.7. The problem is when we try to load the module "xprtrdma6",
that will fail, since there is no modulealias of that name.

Fixes: 181342c5ebe8 ("xprtrdma: Add rdma6 option to support NFS/RDMA IPv6")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


# 7de62bc0 15-Jul-2020 Olga Kornievskaia <olga.kornievskaia@gmail.com>

SUNRPC dont update timeout value on connection reset

Current behaviour: every time a v3 operation is re-sent to the server
we update (double) the timeout. There is no distinction between whether
or not the previous timer had expired before the re-sent happened.

Here's the scenario:
1. Client sends a v3 operation
2. Server RST-s the connection (prior to the timeout) (eg., connection
is immediately reset)
3. Client re-sends a v3 operation but the timeout is now 120sec.

As a result, an application sees 2mins pause before a retry in case
server again does not reply.

Instead, this patch proposes to keep track off when the minor timeout
should happen and if it didn't, then don't update the new timeout.
Value is updated based on the previous value to make timeouts
predictable.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


# d05a0201 12-Nov-2019 Christoph Hellwig <hch@lst.de>

sunrpc: remove __KERNEL__ ifdefs

Remove the __KERNEL__ ifdefs from the non-UAPI sunrpc headers,
as those can't be included from user space programs.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# e6237b6f 17-Oct-2019 Trond Myklebust <trond.myklebust@hammerspace.com>

NFSv4.1: Don't rebind to the same source port when reconnecting to the server

NFSv2, v3 and NFSv4 servers often have duplicate replay caches that look
at the source port when deciding whether or not an RPC call is a replay
of a previous call. This requires clients to perform strange TCP gymnastics
in order to ensure that when they reconnect to the server, they bind
to the same source port.

NFSv4.1 and NFSv4.2 have sessions that provide proper replay semantics,
that do not look at the source port of the connection. This patch therefore
ensures they can ignore the rebind requirement.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


# cc204d01 10-Sep-2019 Trond Myklebust <trondmy@gmail.com>

SUNRPC: Dequeue the request from the receive queue while we're re-encoding

Ensure that we dequeue the request from the transport receive queue
while we're re-encoding to prevent issues like use-after-free when
we release the bvec.

Fixes: 7536908982047 ("SUNRPC: Ensure the bvecs are reset when we re-encode...")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: stable@vger.kernel.org # v4.20+
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>


# 7402a4fe 16-Jul-2019 Trond Myklebust <trond.myklebust@hammerspace.com>

SUNRPC: Fix up backchannel slot table accounting

Add a per-transport maximum limit in the socket case, and add
helpers to allow the NFSv4 code to discover that limit.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


# 675dd90a 19-Jun-2019 Chuck Lever <chuck.lever@oracle.com>

xprtrdma: Modernize ops->connect

Adapt and apply changes that were made to the TCP socket connect
code. See the following commits for details on the purpose of
these changes:

Commit 7196dbb02ea0 ("SUNRPC: Allow changing of the TCP timeout parameters on the fly")
Commit 3851f1cdb2b8 ("SUNRPC: Limit the reconnect backoff timer to the max RPC message timeout")
Commit 02910177aede ("SUNRPC: Fix reconnection timeouts")

Some common transport code is moved to xprt.c to satisfy the code
duplication police.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>


# 21f0ffaf 28-Apr-2017 Trond Myklebust <trond.myklebust@primarydata.com>

SUNRPC: Add basic load balancing to the transport switch

For now, just count the queue length. It is less accurate than counting
number of bytes queued, but easier to implement.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>


# 8ba6a92d 07-Apr-2019 Trond Myklebust <trondmy@gmail.com>

SUNRPC: Refactor xprt_request_wait_receive()

Convert the transport callback to actually put the request to sleep
instead of just setting a timeout. This is in preparation for
rpc_sleep_on_timeout().

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>


# 067fb11b 11-Feb-2019 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Remove rpc_xprt::tsh_size

tsh_size was added to accommodate transports that send a pre-amble
before each RPC message. However, this assumes the pre-amble is
fixed in size, which isn't true for some transports. That makes
tsh_size not very generic.

Also I'd like to make the estimation of RPC send and receive
buffer sizes more precise. tsh_size doesn't currently appear to be
accounted for at all by call_allocate.

Therefore let's just remove the tsh_size concept, and make the only
transports that have a non-zero tsh_size employ a direct approach.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>


# 4aa5cffe 24-Dec-2018 Vasily Averin <vvs@virtuozzo.com>

sunrpc: remove unused bc_up operation from rpc_xprt_ops

Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 9d96acbc 12-Sep-2018 Trond Myklebust <trond.myklebust@hammerspace.com>

SUNRPC: Add a bvec array to struct xdr_buf for use with iovec_iter()

Add a bvec array to struct xdr_buf, and have the client allocate it
when we need to receive data into pages.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


# 95f7691d 07-Sep-2018 Trond Myklebust <trond.myklebust@hammerspace.com>

SUNRPC: Convert xprt receive queue to use an rbtree

If the server is slow, we can find ourselves with quite a lot of entries
on the receive queue. Converting the search from an O(n) to O(log(n))
can make a significant difference, particularly since we have to hold
a number of locks while searching.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


# adfa7144 03-Sep-2018 Trond Myklebust <trond.myklebust@hammerspace.com>

SUNRPC: Cleanup: remove the unused 'task' argument from the request_send()

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


# c544577d 03-Sep-2018 Trond Myklebust <trond.myklebust@hammerspace.com>

SUNRPC: Clean up transport write space handling

Treat socket write space handling in the same way we now treat transport
congestion: by denying the XPRT_LOCK until the transport signals that it
has free buffer space.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


# 36bd7de9 03-Sep-2018 Trond Myklebust <trond.myklebust@hammerspace.com>

SUNRPC: Turn off throttling of RPC slots for TCP sockets

The theory was that we would need to grab the socket lock anyway, so we
might as well use it to gate the allocation of RPC slots for a TCP
socket.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


# 75891f50 03-Sep-2018 Trond Myklebust <trond.myklebust@hammerspace.com>

SUNRPC: Support for congestion control when queuing is enabled

Both RDMA and UDP transports require the request to get a "congestion control"
credit before they can be transmitted. Right now, this is done when
the request locks the socket. We'd like it to happen when a request attempts
to be transmitted for the first time.
In order to support retransmission of requests that already hold such
credits, we also want to ensure that they get queued first, so that we
don't deadlock with requests that have yet to obtain a credit.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


# 918f3c1f 09-Sep-2018 Trond Myklebust <trond.myklebust@hammerspace.com>

SUNRPC: Improve latency for interactive tasks

One of the intentions with the priority queues was to ensure that no
single process can hog the transport. The field task->tk_owner therefore
identifies the RPC call's origin, and is intended to allow the RPC layer
to organise queues for fairness.
This commit therefore modifies the transmit queue to group requests
by task->tk_owner, and ensures that we round robin among those groups.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


# 50f484e2 30-Aug-2018 Trond Myklebust <trond.myklebust@hammerspace.com>

SUNRPC: Treat the task and request as separate in the xprt_ops->send_request()

When we shift to using the transmit queue, then the task that holds the
write lock will not necessarily be the same as the one being transmitted.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


# 762e4e67 24-Aug-2018 Trond Myklebust <trond.myklebust@hammerspace.com>

SUNRPC: Refactor RPC call encoding

Move the call encoding so that it occurs before the transport connection
etc.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


# 944b0429 09-Aug-2018 Trond Myklebust <trond.myklebust@hammerspace.com>

SUNRPC: Add a transmission queue for RPC requests

Add the queue that will enforce the ordering of RPC task transmission.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


# ef3f5434 08-Aug-2018 Trond Myklebust <trond.myklebust@hammerspace.com>

SUNRPC: Distinguish between the slot allocation list and receive queue

When storing a struct rpc_rqst on the slot allocation list, we currently
use the same field 'rq_list' as we use to store the request on the
receive queue. Since the structure is never on both lists at the same
time, this is OK.
However, for clarity, let's make that a union with different names for
the different lists so that we can more easily distinguish between
the two states.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


# 7f3a1d1e 22-Aug-2018 Trond Myklebust <trond.myklebust@hammerspace.com>

SUNRPC: Refactor xprt_transmit() to remove wait for reply code

Allow the caller in clnt.c to call into the code to wait for a reply
after calling xprt_transmit(). Again, the reason is that the backchannel
code does not need this functionality.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


# edc81dcd 22-Aug-2018 Trond Myklebust <trond.myklebust@hammerspace.com>

SUNRPC: Refactor xprt_transmit() to remove the reply queue code

Separate out the action of adding a request to the reply queue so that the
backchannel code can simply skip calling it altogether.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


# 75c84151 31-Aug-2018 Trond Myklebust <trond.myklebust@hammerspace.com>

SUNRPC: Rename xprt->recv_lock to xprt->queue_lock

We will use the same lock to protect both the transmit and receive queues.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


# cf9946cd 05-Aug-2018 Trond Myklebust <trond.myklebust@hammerspace.com>

SUNRPC: Refactor the transport request pinning

We are going to need to pin for both send and receive.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


# 9dc6edcf 22-Aug-2018 Trond Myklebust <trond.myklebust@hammerspace.com>

SUNRPC: Clean up initialisation of the struct rpc_rqst

Move the initialisation back into xprt.c.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


# edb41e61 04-May-2018 Chuck Lever <chuck.lever@oracle.com>

xprtrdma: Make rpc_rqst part of rpcrdma_req

This simplifies allocation of the generic RPC slot and xprtrdma
specific per-RPC resources.

It also makes xprtrdma more like the socket-based transports:
->buf_alloc and ->buf_free are now responsible only for send and
receive buffers.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>


# a9cde23a 04-May-2018 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Add a ->free_slot transport callout

Refactor: xprtrdma needs to have better control over when RPCs are
awoken from the backlog queue, so replace xprt_free_slot with a
transport op callout.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>


# 37ac86c3 04-May-2018 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Initialize rpc_rqst outside of xprt->reserve_lock

alloc_slot is a transport-specific op, but initializing an rpc_rqst
is common to all transports. In addition, the only part of initial-
izing an rpc_rqst that needs serialization is getting a fresh XID.

Move rpc_rqst initialization to common code in preparation for
adding a transport-specific alloc_slot to xprtrdma.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>


# ff699ea8 05-Mar-2018 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Make num_reqs a non-atomic integer

If recording xprt->stat.max_slots is moved into xprt_alloc_slot,
then xprt->num_reqs is never manipulated outside
xprt->reserve_lock. There's no longer a need for xprt->num_reqs to
be atomic.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>


# ecd465ee 05-Mar-2018 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Move xprt_update_rtt callsite

Since commit 33849792cbcd ("xprtrdma: Detect unreachable NFS/RDMA
servers more reliably"), the xprtrdma transport now has a ->timer
callout. But xprtrdma does not need to compute RTT data, only UDP
needs that. Move the xprt_update_rtt call into the UDP transport
implementation.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>


# b2441318 01-Nov-2017 Greg Kroah-Hartman <gregkh@linuxfoundation.org>

License cleanup: add SPDX GPL-2.0 license identifier to files with no license

Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.

For non */uapi/* files that summary was:

SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139

and resulted in the first patch in this series.

If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:

SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930

and resulted in the second patch in this series.

- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:

SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1

and that resulted in the third patch in this series.

- when the two scanners agreed on the detected license(s), that became
the concluded license(s).

- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.

- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).

- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.

- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct

This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# ce7c252a 16-Aug-2017 Trond Myklebust <trond.myklebust@primarydata.com>

SUNRPC: Add a separate spinlock to protect the RPC request receive list

This further reduces contention with the transport_lock, and allows us
to convert to using a non-bh-safe spinlock, since the list is now never
accessed from a bh context.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>


# 729749bb 13-Aug-2017 Trond Myklebust <trond.myklebust@primarydata.com>

SUNRPC: Don't hold the transport lock across socket copy operations

Instead add a mechanism to ensure that the request doesn't disappear
from underneath us while copying from the socket. We do this by
preventing xprt_release() from freeing the XDR buffers until the
flag RPC_TASK_MSG_RECV has been cleared from the request.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>


# d31ae254 31-Jul-2017 Chuck Lever <chuck.lever@oracle.com>

sunrpc: Const-ify all instances of struct rpc_xprt_ops

After transport instance creation, these function pointers never
change. Mark them as constant to prevent their use as an attack
vector for code injections.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>


# 7196dbb0 08-Feb-2017 Trond Myklebust <trond.myklebust@primarydata.com>

SUNRPC: Allow changing of the TCP timeout parameters on the fly

When the NFSv4 server tells us the lease period, we usually want
to adjust down the timeout parameters on the TCP connection to
ensure that we don't miss lease renewals due to a faulty connection.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>


# 5a6d1db4 15-Sep-2016 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Add a transport-specific private field in rpc_rqst

Currently there's a hidden and indirect mechanism for finding the
rpcrdma_req that goes with an rpc_rqst. It depends on getting from
the rq_buffer pointer in struct rpc_rqst to the struct
rpcrdma_regbuf that controls that buffer, and then to the struct
rpcrdma_req it goes with.

This was done back in the day to avoid the need to add a per-rqst
pointer or to alter the buf_free API when support for RPC-over-RDMA
was introduced.

I'm about to change the way regbuf's work to support larger inline
thresholds. Now is a good time to replace this indirect mechanism
with something that is more straightforward. I guess this should be
considered a clean up.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>


# 68778945 15-Sep-2016 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Separate buffer pointers for RPC Call and Reply messages

For xprtrdma, the RPC Call and Reply buffers are involved in real
I/O operations.

To start with, the DMA direction of the I/O for a Call is opposite
that of a Reply.

In the current arrangement, the Reply buffer address is on a
four-byte alignment just past the call buffer. Would be friendlier
on some platforms if that was at a DMA cache alignment instead.

Because the current arrangement allocates a single memory region
which contains both buffers, the RPC Reply buffer often contains a
page boundary in it when the Call buffer is large enough (which is
frequent).

It would be a little nicer for setting up DMA operations (and
possible registration of the Reply buffer) if the two buffers were
separated, well-aligned, and contained as few page boundaries as
possible.

Now, I could just pad out the single memory region used for the pair
of buffers. But frequently that would mean a lot of unused space to
ensure the Reply buffer did not have a page boundary.

Add a separate pointer to rpc_rqst that points right to the RPC
Reply buffer. This makes no difference to xprtsock, but it will help
xprtrdma in subsequent patches.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>


# 3435c74a 15-Sep-2016 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Generalize the RPC buffer release API

xprtrdma needs to allocate the Call and Reply buffers separately.
TBH, the reliance on using a single buffer for the pair of XDR
buffers is transport implementation-specific.

Instead of passing just the rq_buffer into the buf_free method, pass
the task structure and let buf_free take care of freeing both
XDR buffers at once.

There's a micro-optimization here. In the common case, both
xprt_release and the transport's buf_free method were checking if
rq_buffer was NULL. Now the check is done only once per RPC.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>


# 5fe6eaa1 15-Sep-2016 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Generalize the RPC buffer allocation API

xprtrdma needs to allocate the Call and Reply buffers separately.
TBH, the reliance on using a single buffer for the pair of XDR
buffers is transport implementation-specific.

Transports that want to allocate separate Call and Reply buffers
will ignore the "size" argument anyway. Don't bother passing it.

The buf_alloc method can't return two pointers. Instead, make the
method's return value an error code, and set the rq_buffer pointer
in the method itself.

This gives call_allocate an opportunity to terminate an RPC instead
of looping forever when a permanent problem occurs. If a request is
just bogus, or the transport is in a state where it can't allocate
resources for any request, there needs to be a way to kill the RPC
right there and not loop.

This immediately fixes a rare problem in the backchannel send path,
which loops if the server happens to send a CB request whose
call+reply size is larger than a page (which it shouldn't do yet).

One more issue: looks like xprt_inject_disconnect was incorrectly
placed in the failure path in call_allocate. It needs to be in the
success path, as it is for other call-sites.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>


# b9c5bc03 15-Sep-2016 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Refactor rpc_xdr_buf_init()

Clean up: there is some XDR initialization logic that is common
to the forward channel and backchannel. Move it to an XDR header
so it can be shared.

rpc_rqst::rq_buffer points to a buffer containing big-endian data.
Update its annotation as part of the clean up.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>


# 3851f1cd 03-Aug-2016 Trond Myklebust <trond.myklebust@primarydata.com>

SUNRPC: Limit the reconnect backoff timer to the max RPC message timeout

...and ensure that we propagate it to new transports on the same
client.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>


# 39a9beab 16-May-2016 J. Bruce Fields <bfields@redhat.com>

rpc: share one xps between all backchannels

The spec allows backchannels for multiple clients to share the same tcp
connection. When that happens, we need to use the same xprt for all of
them. Similarly, we need the same xps.

This fixes list corruption introduced by the multipath code.

Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: Trond Myklebust <trondmy@primarydata.com>


# 6b26cc8c 02-May-2016 Chuck Lever <chuck.lever@oracle.com>

sunrpc: Advertise maximum backchannel payload size

RPC-over-RDMA transports have a limit on how large a backward
direction (backchannel) RPC message can be. Ensure that the NFSv4.x
CREATE_SESSION operation advertises this limit to servers.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>


# 80b14d5e 14-Feb-2015 Trond Myklebust <trond.myklebust@primarydata.com>

SUNRPC: Add a structure to track multiple transports

In order to support multipathing/trunking we will need the ability to
track multiple transports. This patch sets up a basic structure for
doing so.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>


# fda1bfef 14-Feb-2015 Trond Myklebust <trond.myklebust@primarydata.com>

SUNRPC: Make freeing of struct xprt rcu-safe

Have it call kfree_rcu() to ensure that we can use it on rcu-protected
lists.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>


# 30c5116b 24-Feb-2015 Trond Myklebust <trond.myklebust@primarydata.com>

SUNRPC: Uninline xprt_get(); It isn't performance critical.

Also allow callers to pass NULL arguments to xprt_get() and xprt_put().

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>


# 76566773 24-Oct-2015 Chuck Lever <chuck.lever@oracle.com>

NFS: Enable client side NFSv4.1 backchannel to use other transports

Forechannel transports get their own "bc_up" method to create an
endpoint for the backchannel service.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
[Anna Schumaker: Add forward declaration of struct net to xprt.h]
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>


# 94684319 24-Oct-2015 Chuck Lever <chuck.lever@oracle.com>

svcrdma: Add backward direction service for RPC/RDMA transport

On NFSv4.1 mount points, the Linux NFS client uses this transport
endpoint to receive backward direction calls and route replies back
to the NFSv4.1 server.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: "J. Bruce Fields" <bfields@fieldses.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Tested-By: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>


# 42e5c3e2 24-Oct-2015 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Abstract backchannel operations

xprt_{setup,destroy}_backchannel() won't be adequate for RPC/RMDA
bi-direction. In particular, receive buffers have to be pre-
registered and posted in order to receive incoming backchannel
requests.

Add a virtual function call to allow the insertion of appropriate
backchannel setup and destruction methods for each transport.

In addition, freeing a backchannel request is a little different
for RPC/RDMA. Introduce an rpc_xprt_op to handle the difference.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Tested-By: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>


# 4a068258 11-May-2015 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Transport fault injection

It has been exceptionally useful to exercise the logic that handles
local immediate errors and RDMA connection loss. To enable
developers to test this regularly and repeatably, add logic to
simulate connection loss every so often.

Fault injection is disabled by default. It is enabled with

$ sudo echo xxx > /sys/kernel/debug/sunrpc/inject_fault/disconnect

where "xxx" is a large positive number of transport method calls
before a disconnect. A value of several thousand is usually a good
number that allows reasonable forward progress while still causing a
lot of connection drops.

These hooks are disabled when SUNRPC_DEBUG is turned off.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>


# d67fa4d8 03-Jun-2015 Jeff Layton <jlayton@kernel.org>

sunrpc: turn swapper_enable/disable functions into rpc_xprt_ops

RDMA xprts don't have a sock_xprt, but an rdma_xprt, so the
xs_swapper_enable/disable functions will likely oops when fed an RDMA
xprt. Turn these functions into rpc_xprt_ops so that that doesn't
occur. For now the RDMA versions are no-ops that just return -EINVAL
on an attempt to swapon.

Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>


# 8e228133 03-Jun-2015 Jeff Layton <jlayton@kernel.org>

sunrpc: make xprt->swapper an atomic_t

Split xs_swapper into enable/disable functions and eliminate the
"enable" flag.

Currently, it's racy if you have multiple swapon/swapoff operations
running in parallel over the same xprt. Also fix it so that we only
set it to a memalloc socket on a 0->1 transition and only clear it
on a 1->0 transition.

Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>


# 0d2a970d 04-Jun-2015 Trond Myklebust <trond.myklebust@primarydata.com>

SUNRPC: Fix a backchannel race

We need to allow the server to send a new request immediately after we've
replied to the previous one. Right now, there is a window between the
send and the release of the old request in rpc_put_task(), where the
server could send us a new backchannel RPC call, and we have no
request to service it.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>


# 9e2b9f37 08-Feb-2015 Trond Myklebust <trond.myklebust@primarydata.com>

SUNRPC: Remove the redundant XPRT_CONNECTION_CLOSE flag

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>


# 505936f5 08-Feb-2015 Trond Myklebust <trond.myklebust@primarydata.com>

SUNRPC: Cleanup to remove remaining uses of XPRT_CONNECTION_ABORT

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>


# 4efdd92c 08-Feb-2015 Trond Myklebust <trond.myklebust@primarydata.com>

SUNRPC: Remove TCP client connection reset hack

Instead we rely on SO_REUSEPORT to provide the reconnection semantics
that we need for NFSv2/v3.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>


# 718ba5b8 08-Feb-2015 Trond Myklebust <trond.myklebust@primarydata.com>

SUNRPC: Add helpers to prevent socket create from racing

The socket lock is currently held by the task that is requesting the
connection be established. While that is efficient in the case where
the connection happens quickly, it is racy in the case where it doesn't.
What we really want is for the connect helper to be able to block access
to the socket while it is being set up.

This patch does so by arranging to transfer the socket lock from the
task that is requesting the connect attempt, and then releasing that
lock once everything is done.
This scheme also gives us automatic protection against collisions with
the RPC close code, so we can kill the cancel_delayed_work_sync()
call in xs_close().

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>


# 388f0c77 26-Nov-2014 Jeff Layton <jlayton@kernel.org>

sunrpc: add a debugfs rpc_xprt directory with an info file in it

Add a new directory heirarchy under the debugfs sunrpc/ directory:

sunrpc/
rpc_xprt/
<xprt id>/

Within that directory, we can put files that give info about the
xprts. We do have the (minor) problem that there is no succinct,
unique identifier for rpc_xprts. So we generate them synthetically
with a static atomic_t counter.

For now, this directory just holds an "info" file, but we may add
other files to it in the future.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>


# a743419f 22-Sep-2014 Benjamin Coddington <bcodding@redhat.com>

SUNRPC: Don't wake tasks during connection abort

When aborting a connection to preserve source ports, don't wake the task in
xs_error_report. This allows tasks with RPC_TASK_SOFTCONN to succeed if the
connection needs to be re-established since it preserves the task's status
instead of setting it to the status of the aborting kernel_connect().

This may also avoid a potential conflict on the socket's lock.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Cc: stable@vger.kernel.org # 3.14+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>


# 4f4cf5ad 28-May-2014 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Move congestion window constants to header file

I would like to use one of the RPC client's congestion algorithm
constants in transport-specific code.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>


# 4e857c58 17-Mar-2014 Peter Zijlstra <peterz@infradead.org>

arch: Mass conversion of smp_mb__*()

Mostly scripted conversion of the smp_mb__* barriers.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/n/tip-55dhyhocezdw1dg7u19hmh1u@git.kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-arch@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# d531c008 23-Mar-2014 Kinglong Mee <kinglongmee@gmail.com>

NFSD/SUNRPC: Check rpc_xprt out of xs_setup_bc_tcp

Besides checking rpc_xprt out of xs_setup_bc_tcp,
increase it's reference (it's important).

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 90051ea7 24-Sep-2013 Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Clean up - convert xprt_prepare_transmit to return a bool

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 33d90ac0 11-Apr-2013 J. Bruce Fields <bfields@redhat.com>

SUNRPC: allow disabling idle timeout

In the gss-proxy case we don't want to have to reconnect at random--we
want to connect only on gss-proxy startup when we can steal gss-proxy's
context to do the connect in the right namespace.

So, provide a flag that allows the rpc_create caller to turn off the
idle timeout.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# b7993ceb 14-Apr-2013 Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Allow rpc_create() to request that TCP slots be unlimited

This is mainly for use by NFSv4.1, where the session negotiation
ultimately wants to decide how many RPC slots we can fill.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# ba60eb25 14-Apr-2013 Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Fix a livelock problem in the xprt->backlog queue

This patch ensures that we throttle new RPC requests if there are
requests already waiting in the xprt->backlog queue. The reason for
doing this is to fix livelock issues that can occur when an existing
(high priority) task is waiting in the backlog queue, gets woken up
by xprt_free_slot(), but a new task then steals the slot.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 6a24dfb6 08-Jan-2013 Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Pass pointers to struct rpc_xprt to the congestion window

Avoid access to task->tk_xprt

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 1b092092 08-Jan-2013 Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Pass a pointer to struct rpc_xprt to the connect callback

Avoid another RCU dereference by passing the pointer to struct rpc_xprt
from the caller.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# d19751e7 11-Sep-2012 Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Get rid of the redundant xprt->shutdown bit field

It is only set after everyone has dereferenced the transport,
and serves no useful purpose: setting it is racy, so all the
socket code, etc still needs to be able to cope with the cases
where they miss reading it.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# f39c1bfb 07-Sep-2012 Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Fix a UDP transport regression

Commit 43cedbf0e8dfb9c5610eb7985d5f21263e313802 (SUNRPC: Ensure that
we grab the XPRT_LOCK before calling xprt_alloc_slot) is causing
hangs in the case of NFS over UDP mounts.

Since neither the UDP or the RDMA transport mechanism use dynamic slot
allocation, we can skip grabbing the socket lock for those transports.
Add a new rpc_xprt_op to allow switching between the TCP and UDP/RDMA
case.

Note that the NFSv4.1 back channel assigns the slot directly
through rpc_run_bc_task, so we can ignore that case.

Reported-by: Dick Streefland <dick.streefland@altium.nl>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org [>= 3.1]


# a564b8f0 31-Jul-2012 Mel Gorman <mgorman@suse.de>

nfs: enable swap on NFS

Implement the new swapfile a_ops for NFS and hook up ->direct_IO. This
will set the NFS socket to SOCK_MEMALLOC and run socket reconnect under
PF_MEMALLOC as well as reset SOCK_MEMALLOC before engaging the protocol
->connect() method.

PF_MEMALLOC should allow the allocation of struct socket and related
objects and the early (re)setting of SOCK_MEMALLOC should allow us to
receive the packets required for the TCP connection buildup.

[jlayton@redhat.com: Restore PF_MEMALLOC task flags in all cases]
[dfeng@redhat.com: Fix handling of multiple swap files]
[a.p.zijlstra@chello.nl: Original patch]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Eric Paris <eparis@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Neil Brown <neilb@suse.de>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Xiaotian Feng <dfeng@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 4e0038b6 01-Mar-2012 Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Move clnt->cl_server into struct rpc_xprt

When the cl_xprt field is updated, the cl_server field will also have
to change. Since the contents of cl_server follow the remote endpoint
of cl_xprt, just move that field to the rpc_xprt.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
[ cel: simplify check_gss_callback_principal(), whitespace changes ]
[ cel: forward ported to 3.4 ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 15a45206 14-Feb-2012 Andy Adamson <andros@netapp.com>

SUNRPC: add sending,pending queue and max slot to xprt stats

With static RPC slots, the xprt backlog queue stats were useful in showing
when the transport (TCP) was starved by lack of RPC slots. The new dynamic
RPC slot code, commit d9ba131d8f58c0d2ff5029e7002ab43f913b36f9, always
provides an RPC slot and so only uses the xprt backlog queue when the
tcp_max_slot_table_entries value has been hit or when an allocation error
occurs. All requests are now placed on the xprt sending or pending queue which
need to be monitored for debugging.

The max_slot stat shows the maximum number of dynamic RPC slots reached which is
useful when debugging performance issues.

Add the new fields at the end of the mountstats xprt stanza so that mountstats
outputs the previous correct values and ignores the new fields. Bump
NFS_IOSTATS_VERS.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 88338124 06-Feb-2012 Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Change the default limit to the number of TCP slots

Since the scheme of limiting the number of TCP slots to whatever will
fit in the current TCP window seems to be working well (Andy reports
getting within 20% of the 'iperf' send performance on a 10GigE link)
we should just let that be the default mode of operation.

Users may still set their own limits using the tcp_max_slot_table_entries
parameter if they need to.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 34006cee 17-Jul-2011 Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Replace xprt->resend and xprt->sending with a priority queue

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# d9ba131d 17-Jul-2011 Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Support dynamic slot allocation for TCP connections

Allow the number of available slots to grow with the TCP window size.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 21de0a95 17-Jul-2011 Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Clean up the slot table allocation

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 43cedbf0e 17-Jul-2011 Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Ensure that we grab the XPRT_LOCK before calling xprt_alloc_slot

This throttles the allocation of new slots when the socket is busy
reconnecting and/or is out of buffer space.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 9e00abc3 13-Jul-2011 Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: sunrpc should not explicitly depend on NFS config options

Change explicit references to CONFIG_NFS_V4_1 to implicit ones
Get rid of the unnecessary defines in backchannel_rqst.c and
bc_svc.c: the Makefile takes care of those dependency.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 176e21ee 09-May-2011 Chuck Lever <chuck.lever@ORACLE.COM>

SUNRPC: Support for RPC over AF_LOCAL transports

TI-RPC introduces the capability of performing RPC over AF_LOCAL
sockets. It uses this mainly for registering and unregistering
local RPC services securely with the local rpcbind, but we could
also conceivably use it as a generic upcall mechanism.

This patch provides a client-side only implementation for the moment.
We might also consider a server-side implementation to provide
AF_LOCAL access to NLM (for statd downcalls, and such like).

Autobinding is not supported on kernel AF_LOCAL transports at this
time. Kernel ULPs must specify the pathname of the remote endpoint
when an AF_LOCAL transport is created. rpcbind supports registering
services available via AF_LOCAL, so the kernel could handle it with
some adjustment to ->rpcbind and ->set_port. But we don't need this
feature for doing upcalls via well-known named sockets.

This has not been tested with ULPs that move a substantial amount of
data. Thus, I can't attest to how robust the write_space and
congestion management logic is.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# a8de240a 15-Mar-2011 Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Convert struct rpc_xprt to use atomic_t counters

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# f0418aa4 08-Dec-2010 J. Bruce Fields <bfields@redhat.com>

rpc: allow xprt_class->setup to return a preexisting xprt

This allows us to reuse the xprt associated with a server connection if
one has already been set up.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 37aa2133 29-Sep-2010 Pavel Emelyanov <xemul@parallels.com>

sunrpc: Tag rpc_xprt with net

The net is known from the xprt_create and this tagging will also
give un the context in the conntection workers where real sockets
are created.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 9a23e332 29-Sep-2010 Pavel Emelyanov <xemul@parallels.com>

sunrpc: Add net to xprt_create

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# e204e621 29-Sep-2010 Pavel Emelyanov <xemul@parallels.com>

sunrpc: Factor out rpc_xprt freeing

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# bd1722d4 29-Sep-2010 Pavel Emelyanov <xemul@parallels.com>

sunrpc: Factor out rpc_xprt allocation

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# a17c2153 31-Jul-2010 Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Move the bound cred to struct rpc_rqst

This will allow us to save the original generic cred in rpc_message, so
that if we migrate from one server to another, we can generate a new bound
cred without having to punt back to the NFS layer.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# d60dbb20 12-May-2010 Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Move the task->tk_bytes_sent and tk_rtt to struct rpc_rqst

It seems strange to maintain stats for bytes_sent in one structure, and
bytes received in another. Try to assemble all the RPC request-related
stats in struct rpc_rqst

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# ff839970 07-May-2010 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Replace jiffies-based metrics with ktime-based metrics

Currently RPC performance metrics that tabulate elapsed time use
jiffies time values. This is problematic on systems that use slow
jiffies (for instance 100HZ systems built for paravirtualized
environments). It is also a problem for computing precise latency
statistics for advanced network transports, such as InfiniBand,
that can have round-trip latencies significanly faster than a single
clock tick.

For the RPC client, adopt the high resolution time stamp mechanism
already used by the network layer and blktrace: ktime.

We use ktime format time stamps for all internal computations, and
convert to milliseconds for presentation. As a result, we need only
addition operations in the performance critical paths; multiply/divide
is required only for presentation.

We could report RTT metrics in microseconds. In fact the mountstats
format is versioned to accomodate exactly this kind of interface
improvement.

For now, however, we'll stay with millisecond precision for
presentation to maintain backwards compatibility with the handful of
currently deployed user space tools. At a later point, we'll move to
an API such as BDI_STATS where a finer timestamp precision can be
reported.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# bbc72cea 07-May-2010 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: RPC metrics and RTT estimator should use same RTT value

Compute an RPC request's RTT once, and use that value both for reporting
RPC metrics, and for adjusting the RTT context used by the RPC client's RTT
estimator algorithm.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# a8ce4a8f 16-Apr-2010 Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Fail over more quickly on connect errors

We should not allow soft tasks to wait for longer than the major timeout
period when waiting for a reconnect to occur.

Remove the field xprt->connect_timeout since it has been obsoleted by
xprt->reestablish_timeout.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# f300baba 10-Sep-2009 Alexandros Batsakis <batsakis@netapp.com>

nfsd41: sunrpc: add new xprt class for nfsv4.1 backchannel

[sunrpc: change idle timeout value for the backchannel]
Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 4cfc7e60 10-Sep-2009 Rahul Iyer <iyer@netapp.com>

nfsd41: sunrpc: Added rpc server-side backchannel handling

When the call direction is a reply, copy the xid and call direction into the
req->rq_private_buf.head[0].iov_base otherwise rpc_verify_header returns
rpc_garbage.

Signed-off-by: Rahul Iyer <iyer@netapp.com>
Signed-off-by: Mike Sager <sager@netapp.com>
Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[get rid of CONFIG_NFSD_V4_1]
[sunrpc: refactoring of svc_tcp_recvfrom]
[nfsd41: sunrpc: create common send routine for the fore and the back channels]
[nfsd41: sunrpc: Use free_page() to free server backchannel pages]
[nfsd41: sunrpc: Document server backchannel locking]
[nfsd41: sunrpc: remove bc_connect_worker()]
[nfsd41: sunrpc: Define xprt_server_backchannel()[
[nfsd41: sunrpc: remove bc_close and bc_init_auto_disconnect dummy functions]
[nfsd41: sunrpc: eliminate unneeded switch statement in xs_setup_tcp()]
[nfsd41: sunrpc: Don't auto close the server backchannel connection]
[nfsd41: sunrpc: Remove unused functions]
Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: change bc_sock to bc_xprt]
[nfsd41: sunrpc: move struct rpc_buffer def into a common header file]
[nfsd41: sunrpc: use rpc_sleep in bc_send_request so not to block on mutex]
[removed cosmetic changes]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[sunrpc: add new xprt class for nfsv4.1 backchannel]
[sunrpc: v2.1 change handling of auto_close and init_auto_disconnect operations for the nfsv4.1 backchannel]
Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
[reverted more cosmetic leftovers]
[got rid of xprt_server_backchannel]
[separated "nfsd41: sunrpc: add new xprt class for nfsv4.1 backchannel"]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Cc: Trond Myklebust <trond.myklebust@netapp.com>
[sunrpc: change idle timeout value for the backchannel]
Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Acked-by: Trond Myklebust <trond.myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# c740eff8 09-Aug-2009 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Kill RPC_DISPLAY_ALL

At some point, I recall that rpc_pipe_fs used RPC_DISPLAY_ALL.
Currently there are no uses of RPC_DISPLAY_ALL outside the transport
modules themselves, so we can safely get rid of it.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# ba809130 09-Aug-2009 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Remove duplicate universal address generation

RPC universal address generation is currently done in several places:
rpcb_clnt.c, nfs4proc.c xprtsock.c, and xprtrdma.c. Remove the
redundant cases that convert a socket address to a universal
address. The nfs4proc.c case takes a pre-formatted presentation
address string, not a socket address, so we'll leave that one.

Because the new uaddr constructor uses the recently introduced
rpc_ntop(), it now supports proper "::" shorthanding for IPv6
addresses. This allows the kernel to register properly formed
universal addresses with the local rpcbind service, in _all_ cases.

The kernel can now also send properly formed universal addresses in
RPCB_GETADDR requests, and support link-local properly when
encoding and decoding IPv6 addresses.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# dd2b63d0 01-Apr-2009 Ricardo Labiaga <Ricardo.Labiaga@netapp.com>

nfs41: Rename rq_received to rq_reply_bytes_recvd

The 'rq_received' member of 'struct rpc_rqst' is used to track when we
have received a reply to our request. With v4.1, the backchannel
can now accept callback requests over the existing connection. Rename
this field to make it clear that it is only used for tracking reply bytes
and not all bytes received on the connection.

Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>


# 55ae1aab 01-Apr-2009 Ricardo Labiaga <Ricardo.Labiaga@netapp.com>

nfs41: Add backchannel processing support to RPC state machine

Adds rpc_run_bc_task() which is called by the NFS callback service to
process backchannel requests. It performs similar work to rpc_run_task()
though "schedules" the backchannel task to be executed starting at the
call_trasmit state in the RPC state machine.

It also introduces some miscellaneous updates to the argument validation,
call_transmit, and transport cleanup functions to take into account
that there are now forechannel and backchannel tasks.

Backchannel requests do not carry an RPC message structure, since the
payload has already been XDR encoded using the existing NFSv4 callback
mechanism.

Introduce a new transmit state for the client to reply on to backchannel
requests. This new state simply reserves the transport and issues the
reply. In case of a connection related error, disconnects the transport and
drops the reply. It requires the forechannel to re-establish the connection
and the server to retransmit the request, as stated in NFSv4.1 section
2.9.2 "Client and Server Transport Behavior".

Note: There is no need to loop attempting to reserve the transport. If EAGAIN
is returned by xprt_prepare_transmit(), return with tk_status == 0,
setting tk_action to call_bc_transmit. rpc_execute() will invoke it again
after the task is taken off the sleep queue.

[nfs41: rpc_run_bc_task() need not be exported outside RPC module]
[nfs41: New call_bc_transmit RPC state]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: Backchannel: No need to loop in call_bc_transmit()]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[rpc_count_iostats incorrectly exits early]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[Convert rpc_reply_expected() to inline function]
[Remove unnecessary BUG_ON()]
[Rename variable]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>


# fb7a0b9a 01-Apr-2009 Ricardo Labiaga <Ricardo.Labiaga@netapp.com>

nfs41: New backchannel helper routines

This patch introduces support to setup the callback xprt on the client side.
It allocates/ destroys the preallocated memory structures used to process
backchannel requests.

At setup time, xprt_setup_backchannel() is invoked to allocate one or
more rpc_rqst structures and substructures. This ensures that they
are available when an RPC callback arrives. The rpc_rqst structures
are maintained in a linked list attached to the rpc_xprt structure.
We keep track of the number of allocations so that they can be correctly
removed when the channel is destroyed.

When an RPC callback arrives, xprt_alloc_bc_request() is invoked to
obtain a preallocated rpc_rqst structure. An rpc_xprt structure is
returned, and its RPC_BC_PREALLOC_IN_USE bit is set in
rpc_xprt->bc_flags. The structure is removed from the the list
since it is now in use, and it will be later added back when its
user is done with it.

After the RPC callback replies, the rpc_rqst structure is returned
by invoking xprt_free_bc_request(). This clears the
RPC_BC_PREALLOC_IN_USE bit and adds it back to the list, allowing it
to be reused by a subsequent RPC callback request.

To be consistent with the reception of RPC messages, the backchannel requests
should be placed into the 'struct rpc_rqst' rq_rcv_buf, which is then in turn
copied to the 'struct rpc_rqst' rq_private_buf.

[nfs41: Preallocate rpc_rqst receive buffer for handling callbacks]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[Update copyright notice and explain page allocation]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>


# 56632b5b 01-Apr-2009 Ricardo Labiaga <Ricardo.Labiaga@netapp.com>

nfs41: client callback structures

Adds new list of rpc_xprt structures, and a readers/writers lock to
protect the list. The list is used to preallocate resources for
the backchannel during backchannel requests. Callbacks are not
expected to cause significant latency, so only one callback will
be allowed at this time.

It also adds a pointer to the NFS callback service so that
requests can be directed to it for processing.

New callback members added to svc_serv. The NFSv4.1 callback service will
sleep on the svc_serv->svc_cb_waitq until new callback requests arrive.
The request will be queued in svc_serv->svc_cb_list. This patch adds this
list, the sleep queue and spinlock to svc_serv.

[nfs41: NFSv4.1 callback support]
Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>


# f75e6745 21-Apr-2009 Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Fix the problem of EADDRNOTAVAIL syslog floods on reconnect

See http://bugzilla.kernel.org/show_bug.cgi?id=13034

If the port gets into a TIME_WAIT state, then we cannot reconnect without
binding to a new port.

Tested-by: Petr Vandrovec <petr@vandrovec.name>
Tested-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 7d1e8255 11-Mar-2009 Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Add the equivalent of the linger and linger2 timeouts to RPC sockets

This fixes a regression against FreeBSD servers as reported by Tomas
Kasparek. Apparently when using RPC over a TCP socket, the FreeBSD servers
don't ever react to the client closing the socket, and so commit
e06799f958bf7f9f8fae15f0c6f519953fb0257c (SUNRPC: Use shutdown() instead of
close() when disconnecting a TCP socket) causes the setup to hang forever
whenever the client attempts to close and then reconnect.

We break the deadlock by adding a 'linger2' style timeout to the socket,
after which, the client will abort the connection using a TCP 'RST'.

The default timeout is set to 15 seconds. A subsequent patch will put it
under user control by means of a systctl.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 441e3e24 11-Mar-2009 Tom Talpey <tmtalpey@gmail.com>

SUNRPC: dynamically load RPC transport modules on-demand

Provide an api to attempt to load any necessary kernel RPC
client transport module automatically. By convention, the
desired module name is "xprt"+"transport name". For example,
when NFS mounting with "-o proto=rdma", attempt to load the
"xprtrdma" module.

Signed-off-by: Tom Talpey <tmtalpey@gmail.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# c977a2ef 23-Dec-2008 Benny Halevy <bhalevy@panasas.com>

sunrpc: get rid of rpc_rqst.rq_bufsize

rq_bufsize is not used.

Signed-off-by: Mike Sager <Mike.Sager@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 7c1d71cf 17-Apr-2008 Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Don't disconnect more than once if retransmitting NFSv4 requests

NFSv4 requires us to ensure that we break the TCP connection before we're
allowed to retransmit a request. However in the case where we're
retransmitting several requests that have been sent on the same
connection, we need to ensure that we don't interfere with the attempt to
reconnect and/or break the connection again once it has been established.

We therefore introduce a 'connection' cookie that is bumped every time a
connection is broken. This allows requests to track if they need to force a
disconnection.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# b6ddf64f 17-Apr-2008 Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Fix up xprt_write_space()

The rest of the networking layer uses SOCK_ASYNC_NOSPACE to signal whether
or not we have someone waiting for buffer memory. Convert the SUNRPC layer
to use the same idiom.
Remove the unlikely()s in xs_udp_write_space and xs_tcp_write_space. In
fact, the most common case will be that there is nobody waiting for buffer
space.

SOCK_NOSPACE is there to tell the TCP layer whether or not the cwnd was
limited by the application window. Ensure that we follow the same idiom as
the rest of the networking layer here too.

Finally, ensure that we clear SOCK_ASYNC_NOSPACE once we wake up, so that
write_space() doesn't keep waking things up on xprt->pending.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# b454ae90 07-Jan-2008 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: fewer conditionals in the format_ip_address routines

Clean up: have the set up routines explicitly pass the strings to be used
for the transport name and NETID. This removes a number of conditionals
and dependencies on rpc_xprt.prot, which is overloaded.

Tighten up type checking on the address_strings array while we're at it.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# ba7392bb 20-Dec-2007 Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Add support for per-client timeout values

In order to be able to support setting the timeo and retrans parameters on
a per-mountpoint basis, we move the rpc_timeout structure into the
rpc_clnt.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 2881ae74 20-Dec-2007 Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Clean up the transport timeout initialisation

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 62da3b24 06-Nov-2007 Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Rename xprt_disconnect()

xprt_disconnect() should really only be called when the transport shutdown
is completed, and it is time to wake up any pending tasks. Rename it to
xprt_disconnect_done() in order to reflect the semantical change.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 3b948ae5 05-Nov-2007 Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Allow the client to detect if the TCP connection is closed

Add an xprt->state bit to enable the TCP ->state_change() method to signal
whether or not the TCP connection is in the process of closing down.
This will to be used by the reconnection logic in a separate patch.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 66af1e55 06-Nov-2007 Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Fix a race in xs_tcp_state_change()

When scheduling the autoclose RPC call, we want to ensure that we don't
race against the test_bit() call in xprt_clear_locked().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 4fa016eb 10-Sep-2007 \"Talpey, Thomas\ <Thomas.Talpey@netapp.com>

NFS/SUNRPC: support transport protocol naming

To prepare for including non-sockets-based RPC transports, select
RPC transports by an identifier (to be used in following patches).

Signed-off-by: Tom Talpey <tmt@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 49c36fcc 10-Sep-2007 \"Talpey, Thomas\ <Thomas.Talpey@netapp.com>

SUNRPC: rearrange RPC sockets definitions

To prepare for including non-sockets-based RPC transports, move the
sockets-dependent definitions into their own file.

Signed-off-by: Tom Talpey <tmt@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 3c341b0b 10-Sep-2007 \"Talpey, Thomas\ <Thomas.Talpey@netapp.com>

SUNRPC: rename the rpc_xprtsock_create structure

To prepare for including non-sockets-based RPC transports, change the
overly suggestive name of the transport creation arguments struct.

Signed-off-by: Tom Talpey <tmt@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 81c098af 10-Sep-2007 \"Talpey, Thomas\ <Thomas.Talpey@netapp.com>

SUNRPC: Provide a new API for registering transport implementations

To allow transport capabilities to be loaded dynamically, provide an API
for registering and unregistering the transports with the RPC client.
Eventually xprt_create_transport() will be changed to search the list of
registered transports when initializing a fresh transport.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Tom Talpey <tmt@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 4417c8c4 10-Sep-2007 \"Talpey, Thomas\ <Thomas.Talpey@netapp.com>

SUNRPC: export per-transport rpcbind netid's

The rpcbind (v3+) netid is provided by each RPC client transport. This fixes
an omission in IPv6 rpcbind client support, and enables future extension.

Signed-off-by: Tom Talpey <tmt@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 756805e7 16-Aug-2007 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Add support for formatted universal addresses

"Universal addresses" are a string representation of an IP address and
port. They are described fully in RFC 3530, section 2.2. Add support
for generating them in the RPC client's socket transport module.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# fbfe3cc6 06-Aug-2007 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Add hex-formatted address support to rpc_peeraddr2str()

Add support for the NFS client's need to export volume information
with IP addresses formatted in hex instead of decimal.

This isn't used yet, but subsequent patches (not in this series) will
change the NFS client to use this functionality.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# d3bc9a1d 09-Jul-2007 Frank van Maarseveen <frankvm@frankvm.com>

SUNRPC client: add interface for binding to a local address

In addition to binding to a local privileged port the NFS client should
allow binding to a specific local address. This is used by the server
for callbacks. The patch adds the necessary interface.

Signed-off-by: Frank van Maarseveen <frankvm@frankvm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 96802a09 08-Jul-2007 Frank van Maarseveen <frankvm@frankvm.com>

SUNRPC: cleanup transport creation argument passing

Cleanup argument passing to functions for creating an RPC transport.

Signed-off-by: Frank van Maarseveen <frankvm@frankvm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 7531d692 14-May-2007 Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Fix sparse warnings

- net/sunrpc/xprtsock.c:1635:5: warning: symbol 'init_socket_xprt' was not
declared. Should it be static?
- net/sunrpc/xprtsock.c:1649:6: warning: symbol 'cleanup_socket_xprt' was
not declared. Should it be static?

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# a509050b 29-Mar-2007 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: introduce rpcbind: replacement for in-kernel portmapper

Introduce a replacement for the in-kernel portmapper client that supports
all 3 versions of the rpcbind protocol. This code is not used yet.

Original code by Groupe Bull updated for the latest kernel, with multiple
bug fixes.

Note that rpcb_clnt.c does not yet support registering via versions 3 and
4 of the rpcbind protocol. That is planned for a later patch.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# c5a4dd8b 29-Mar-2007 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Eliminate side effects from rpc_malloc

Currently rpc_malloc sets req->rq_buffer internally. Make this a more
generic interface: return a pointer to the new buffer (or NULL) and
make the caller set req->rq_buffer and req->rq_bufsize. This looks much
more like kmalloc and eliminates the side effects.

To fix a potential deadlock, this patch also replaces GFP_NOFS with
GFP_NOWAIT in rpc_malloc. This prevents async RPCs from sleeping outside
the RPC's task scheduler while allocating their buffer.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 2bea90d4 29-Mar-2007 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: RPC buffer size estimates are too large

The RPC buffer size estimation logic in net/sunrpc/clnt.c always
significantly overestimates the requirements for the buffer size.
A little instrumentation demonstrated that in fact rpc_malloc was never
allocating the buffer from the mempool, but almost always called kmalloc.

To compute the size of the RPC buffer more precisely, split p_bufsiz into
two fields; one for the argument size, and one for the result size.

Then, compute the sum of the exact call and reply header sizes, and split
the RPC buffer precisely between the two. That should keep almost all RPC
buffers within the 2KiB buffer mempool limit.

And, we can finally be rid of RPC_SLACK_SPACE!

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 7559c7a2 05-Dec-2006 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Make address format buffers more generic

For now we will assume that all transports will use the address format
buffers in the rpc_xprt struct to store their addresses. Change
rpc_peer2str() to be a generic routine to handle this, and get rid of the
print_address() op in the rpc_xprt_ops vector.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 314dfd79 05-Dec-2006 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: move saved socket callback functions to a private data structure

Move the three fields for saving socket callback functions out of the
rpc_xprt structure and into a private data structure maintained in
net/sunrpc/xprtsock.c.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 7c6e066e 05-Dec-2006 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Move the UDP socket bufsize parameters to a private data structure

Move the socket-specific buffer size parameters for UDP sockets to a
private data structure maintained in net/sunrpc/xprtsock.c.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# c8475461 05-Dec-2006 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Move rpc_xprt socket connect fields into private data structure

Move the socket-specific connection management fields out of the generic
rpc_xprt structure into a private data structure maintained in
net/sunrpc/xprtsock.c.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# e136d092 05-Dec-2006 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Move TCP state flags into xprtsock.c

Move "XPRT_LAST_FRAG" and friends from xprt.h into xprtsock.c, and rename
them to use the naming scheme in use in xprtsock.c.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 51971139 05-Dec-2006 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Move TCP receive state variables into private data structure

Move the TCP receive state variables from the generic rpc_xprt structure to
a private structure maintained inside net/sunrpc/xprtsock.c.

Also rename a function/variable pair to refer to RPC fragment headers
instead of record markers, to be consistent with types defined in
sunrpc/*.h.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# ee0ac0c2 05-Dec-2006 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Remove sock and inet fields from rpc_xprt

The "sock" and "inet" fields are socket-specific. Move them to a private
data structure maintained entirely within net/sunrpc/xprtsock.c

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# c8541ecd 17-Oct-2006 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Make the transport-specific setup routine allocate rpc_xprt

Change the location where the rpc_xprt structure is allocated so each
transport implementation can allocate a private area from the same
chunk of memory.

Note also that xprt->ops->destroy, rather than xprt_destroy, is now
responsible for freeing rpc_xprt when the transport is destroyed.

Test plan:
Connectathon.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# e744cf2e 17-Oct-2006 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: minor optimization of "xid" field in rpc_xprt

Move the xid field in the rpc_xprt structure to be in the same cache line
as the reserve_lock, since these are used at the same time.

Test plan:
None.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 52bad64d 22-Nov-2006 David Howells <dhowells@redhat.com>

WorkStruct: Separate delayable and non-delayable events.

Separate delayable work items from non-delayable work items be splitting them
into a separate structure (delayed_work), which incorporates a work_struct and
the timer_list removed from work_struct.

The work_struct struct is huge, and this limits it's usefulness. On a 64-bit
architecture it's nearly 100 bytes in size. This reduces that by half for the
non-delayable type of event.

Signed-Off-By: David Howells <dhowells@redhat.com>


# 7adae489 04-Oct-2006 Greg Banks <gnb@melbourne.sgi.com>

[PATCH] knfsd: Prepare knfsd for support of rsize/wsize of up to 1MB, over TCP

The limit over UDP remains at 32K. Also, make some of the apparently
arbitrary sizing constants clearer.

The biggest change here involves replacing NFSSVC_MAXBLKSIZE by a function of
the rqstp. This allows it to be different for different protocols (udp/tcp)
and also allows it to depend on the servers declared sv_bufsiz.

Note that we don't actually increase sv_bufsz for nfs yet. That comes next.

Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# d8ed029d 26-Sep-2006 Alexey Dobriyan <adobriyan@gmail.com>

[SUNRPC]: trivial endianness annotations

pure s/u32/__be32/

[AV: large part based on Alexey's patches]

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6b6ca86b 04-Sep-2006 Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Add refcounting to the struct rpc_xprt

In a subsequent patch, this will allow the portmapper to take a reference
to the rpc_xprt for which it is updating the port number, fixing an Oops.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# ff9aa5e5 22-Aug-2006 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Eliminate xprt_create_proto and rpc_create_client

The two function call API for creating a new RPC client is now obsolete.
Remove it.

Also, remove an unnecessary check to see whether the caller is capable of
using privileged network services. The kernel RPC client always uses a
privileged ephemeral port by default; callers are responsible for checking
the authority of users to make use of any RPC service, or for specifying
that a nonprivileged port is acceptable.

Test plan:
Repeated runs of Connectathon locking suite. Check network trace to ensure
correctness of NLM requests and replies.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# c2866763 22-Aug-2006 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: use sockaddr + size when creating remote transport endpoints

Prepare for more generic transport endpoint handling needed by transports
that might use different forms of addressing, such as IPv6.

Introduce a single function call to replace the two-call
xprt_create_proto/rpc_create_client API. Define a new rpc_create_args
structure that allows callers to pass in remote endpoint addresses of
varying length.

Test-plan:
Compile kernel with CONFIG_NFS enabled.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# c4efcb1d 22-Aug-2006 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Use "sockaddr_storage" for storing RPC client's remote peer address

IPv6 addresses are big (128 bytes). Now that no RPC client consumers treat
the addr field in rpc_xprt structs as an opaque, and access it only via the
API calls, we can safely widen the field in the rpc_xprt struct to
accomodate larger addresses.

Test plan:
Compile kernel with CONFIG_NFS enabled.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# edb267a6 22-Aug-2006 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: add xprt switch API for printing formatted remote peer addresses

Add a new method to the transport switch API to provide a way to convert
the opaque contents of xprt->addr to a human-readable string.

Test plan:
Compile kernel with CONFIG_NFS enabled.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# bbf7c1dd 22-Aug-2006 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Introduce transport switch callout for pluggable rpcbind

Introduce a clean transport switch API for plugging in different types of
rpcbind mechanisms. For instance, rpcbind can cleanly replace the
existing portmapper client, or a transport can choose to implement RPC
binding any way it likes.

Test plan:
Destructive testing (unplugging the network temporarily). Connectathon
with UDP and TCP. NFSv2/3 and NFSv4 mounting should be carefully checked.
Probably need to rig a server where certain services aren't running, or
that returns an error for some typical operation.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 4a68179d 22-Aug-2006 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Make RPC portmapper use per-transport storage

Move connection and bind state that was maintained in the rpc_clnt
structure to the rpc_xprt structure. This will allow the creation of
a clean API for plugging in different types of bind mechanisms.

This brings improvements such as the elimination of a single spin lock to
control serialization for all in-kernel RPC binding. A set of per-xprt
bitops is used to serialize tasks during RPC binding, just like it now
works for making RPC transport connections.

Test-plan:
Destructive testing (unplugging the network temporarily). Connectathon
with UDP and TCP. NFSv2/3 and NFSv4 mounting should be carefully checked.
Probably need to rig a server where certain services aren't running, or
that returns an error for some typical operation.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# ec739ef0 22-Aug-2006 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Create a helper to tell whether a transport is bound

Hide the contents and format of xprt->addr by eliminating direct uses
of the xprt->addr.sin_port field. This change is required to support
alternate RPC host address formats (eg IPv6).

Test-plan:
Destructive testing (unplugging the network temporarily). Repeated runs of
Connectathon locking suite with UDP and TCP.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 8e037094 22-Aug-2006 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: avoid choosing an IPMI port for RPC traffic

Some hardware uses port 664 for its hardware-based IPMI listener. Teach
the RPC client to avoid using that port by raising the default minimum port
number to 665.

Test plan:
Find a mainboard known to use port 664 for IPMI; enable IPMI; mount NFS
servers in a tight loop.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
(cherry picked from 58e8cb3a035d22fc386e1c53a5d98c3f219530fb commit)


# e0ab53de 27-Jul-2006 Trond Myklebust <Trond.Myklebust@netapp.com>

RPC: Ensure that we disconnect TCP socket when client requests error out

If we're part way through transmitting a TCP request, and the client
errors, then we need to disconnect and reconnect the TCP socket in order to
avoid confusing the server.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
(cherry picked from 031a50c8b9ea82616abd4a4e18021a25848941ce commit)


# e99170ff 18-Apr-2006 Trond Myklebust <Trond.Myklebust@netapp.com>

NFS,SUNRPC: Fix compiler warnings if CONFIG_PROC_FS & CONFIG_SYSCTL are unset

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 262ca07d 20-Mar-2006 Chuck Lever <cel@netapp.com>

SUNRPC: add a handful of per-xprt counters

Monitor generic transport events. Add a transport switch callout to
format transport counters for export to user-land.

Test plan:
Compile kernel with CONFIG_NFS enabled.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 632e3bdc 03-Jan-2006 Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Ensure client closes the socket when server initiates a close

If the server decides to close the RPC socket, we currently don't actually
respond until either another RPC call is scheduled, or until xprt_autoclose()
gets called by the socket expiry timer (which may be up to 5 minutes
later).

This patch ensures that xprt_autoclose() is called much sooner if the
server closes the socket.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 92200412 03-Jan-2006 Chuck Lever <cel@netapp.com>

SUNRPC: transport switch API for setting port number

At some point, transport endpoint addresses will no longer be IPv4. To hide
the structure of the rpc_xprt's address field from ULPs and port mappers,
add an API for setting the port number during an RPC bind operation.

Test-plan:
Destructive testing (unplugging the network temporarily). Connectathon
with UDP and TCP. NFSv2/3 and NFSv4 mounting should be carefully checked.
Probably need to rig a server where certain services aren't running, or
that returns an error for some typical operation.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 02107148 03-Jan-2006 Chuck Lever <cel@netapp.com>

SUNRPC: switchable buffer allocation

Add RPC client transport switch support for replacing buffer management
on a per-transport basis.

In the current IPv4 socket transport implementation, RPC buffers are
allocated as needed for each RPC message that is sent. Some transport
implementations may choose to use pre-allocated buffers for encoding,
sending, receiving, and unmarshalling RPC messages, however. For
transports capable of direct data placement, the buffers can be carved
out of a pre-registered area of memory rather than from a slab cache.

Test-plan:
Millions of fsx operations. Performance characterization with "sio" and
"iozone". Use oprofile and other tools to look for significant regression
in CPU utilization.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# ead5e1c2 13-Oct-2005 J. Bruce Fields <bfields@fieldses.org>

SUNRPC: Provide a callback to allow free pages allocated during xdr encoding

For privacy, we need to allocate pages to store the encrypted data (passed
in pages can't be used without the risk of corrupting data in the page cache).
So we need a way to free that memory after the request has been transmitted.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 5e5ce5be 18-Oct-2005 Trond Myklebust <Trond.Myklebust@netapp.com>

RPC: allow call_encode() to delay transmission of an RPC call.

Currently, call_encode will cause the entire RPC call to abort if it returns
an error. This is unnecessarily rigid, and gets in the way of attempts
to allow the NFSv4 layer to order RPC calls that carry sequence ids.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 470056c2 25-Aug-2005 Chuck Lever <cel@netapp.com>

[PATCH] RPC: rationalize set_buffer_size

In fact, ->set_buffer_size should be completely functionless for non-UDP.

Test-plan:
Check socket buffer size on UDP sockets over time.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 03bf4b70 25-Aug-2005 Chuck Lever <cel@netapp.com>

[PATCH] RPC: parametrize various transport connect timeouts

Each transport implementation can now set unique bind, connect,
reestablishment, and idle timeout values. These are variables,
allowing the values to be modified dynamically. This permits
exponential backoff of any of these values, for instance.

As an example, we implement exponential backoff for the connection
reestablishment timeout.

Test-plan:
Destructive testing (unplugging the network temporarily). Connectathon
with UDP and TCP.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 529b33c6 25-Aug-2005 Chuck Lever <cel@netapp.com>

[PATCH] RPC: allow RPC client's port range to be adjustable

Select an RPC client source port between 650 and 1023 instead of between
1 and 800. The old range conflicts with a number of network services.
Provide sysctls to allow admins to select a different port range.

Note that this doesn't affect user-level RPC library behavior, which
still uses 1 to 800.

Based on a suggestion by Olaf Kirch <okir@suse.de>.

Test-plan:
Repeated mount and unmount. Destructive testing. Idle timeouts.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 555ee3af 25-Aug-2005 Chuck Lever <cel@netapp.com>

[PATCH] RPC: clean up after nocong was removed

Clean-up: Move some macros that are specific to the Van Jacobson
implementation into xprt.c. Get rid of the cong_wait field in
rpc_xprt, which is no longer used. Get rid of xprt_clear_backlog.

Test-plan:
Compile with CONFIG_NFS enabled.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# ed63c003 25-Aug-2005 Chuck Lever <cel@netapp.com>

[PATCH] RPC: remove xprt->nocong

Get rid of the "xprt->nocong" variable.

Test-plan:
Use WAN simulation to cause sporadic bursty packet loss with UDP mounts.
Look for significant regression in performance or client stability.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# a58dd398 25-Aug-2005 Chuck Lever <cel@netapp.com>

[PATCH] RPC: add a release_rqst callout to the RPC transport switch

The final place where congestion control state is adjusted is in
xprt_release, where each request is finally released. Add a callout
there to allow transports to perform additional processing when a
request is about to be released.

Test-plan:
Use WAN simulation to cause sporadic bursty packet loss. Look for significant
regression in performance or client stability.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 1570c1e4 25-Aug-2005 Chuck Lever <cel@netapp.com>

[PATCH] RPC: add generic interface for adjusting the congestion window

A new interface that allows transports to adjust their congestion window
using the Van Jacobson implementation in xprt.c is provided.

Test-plan:
Use WAN simulation to cause sporadic bursty packet loss. Look for
significant regression in performance or client stability.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 46c0ee8b 25-Aug-2005 Chuck Lever <cel@netapp.com>

[PATCH] RPC: separate xprt_timer implementations

Allow transports to hook the retransmit timer interrupt. Some transports
calculate their congestion window here so that a retransmit timeout has
immediate effect on the congestion window.

Test-plan:
Use WAN simulation to cause sporadic bursty packet loss. Look for significant
regression in performance or client stability.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 49e9a890 25-Aug-2005 Chuck Lever <cel@netapp.com>

[PATCH] RPC: expose API for serializing access to RPC transports

The next method we abstract is the one that releases a transport,
allowing another task to have access to the transport.

Again, one generic version of this is provided for transports that
don't need the RPC client to perform congestion control, and one
version is for transports that can use the original Van Jacobson
implementation in xprt.c.

Test-plan:
Use WAN simulation to cause sporadic bursty packet loss. Look for
significant regression in performance or client stability.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 12a80469 25-Aug-2005 Chuck Lever <cel@netapp.com>

[PATCH] RPC: expose API for serializing access to RPC transports

The next several patches introduce an API that allows transports to
choose whether the RPC client provides congestion control or whether
the transport itself provides it.

The first method we abstract is the one that serializes access to the
RPC transport to prevent the bytes from different requests from mingling
together. This method provides proper request serialization and the
opportunity to prevent new requests from being started because the
transport is congested.

The normal situation is for the transport to handle congestion control
itself. Although NFS over UDP was first, it has been recognized after
years of experience that having the transport provide congestion control
is much better than doing it in the RPC client. Thus TCP, and probably
every future transport implementation, will use the default method,
xprt_lock_write, provided in xprt.c, which does not provide any kind
of congestion control. UDP can continue using the xprt.c-provided
Van Jacobson congestion avoidance implementation.

Test-plan:
Use WAN simulation to cause sporadic bursty packet loss. Look for significant
regression in performance or client stability.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# fe3aca29 25-Aug-2005 Chuck Lever <cel@netapp.com>

[PATCH] RPC: add API to set transport-specific timeouts

Prepare the way to remove the "xprt->nocong" variable by adding a callout
to the RPC client transport switch API to handle setting RPC retransmit
timeouts.

Add a pair of generic helper functions that provide the ability to set a
simple fixed timeout, or to set a timeout based on the state of a round-
trip estimator.

Test-plan:
Use WAN simulation to cause sporadic bursty packet loss. Look for significant
regression in performance or client stability.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 43118c29 25-Aug-2005 Chuck Lever <cel@netapp.com>

[PATCH] RPC: get rid of xprt->stream

Now we can fix up the last few places that use the "xprt->stream"
variable, and get rid of it from the rpc_xprt structure.

Test-plan:
Destructive testing (unplugging the network temporarily). Connectathon
with UDP and TCP.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 808012fb 25-Aug-2005 Chuck Lever <cel@netapp.com>

[PATCH] RPC: skip over transport-specific heads automatically

Add a generic mechanism for skipping over transport-specific headers
when constructing an RPC request. This removes another "xprt->stream"
dependency.

Test-plan:
Write-intensive workload on a single mount point (try both UDP and
TCP).

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# c7b2cae8 11-Aug-2005 Chuck Lever <cel@citi.umich.edu>

[PATCH] RPC: separate TCP and UDP write space callbacks

Split the socket write space callback function into a TCP version and UDP
version, eliminating one dependence on the "xprt->stream" variable.

Keep the common pieces of this path in xprt.c so other transports can use
it too.

Test-plan:
Write-intensive workload on a single mount point.

Version: Thu, 11 Aug 2005 16:07:51 -0400

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 55aa4f58 11-Aug-2005 Chuck Lever <cel@citi.umich.edu>

[PATCH] RPC: client-side transport switch cleanup

Clean-up: change some comments to reflect the realities of the new RPC
transport switch mechanism. Get rid of unused xprt_receive() prototype.

Also, organize function prototypes in xprt.h by usage and scope.

Test-plan:
Compile kernel with CONFIG_NFS enabled.

Version: Thu, 11 Aug 2005 16:07:21 -0400

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 44fbac22 11-Aug-2005 Chuck Lever <cel@citi.umich.edu>

[PATCH] RPC: Add helper for waking tasks pending on a transport

Clean-up: remove only reference to xprt->pending from the socket transport
implementation. This makes a cleaner interface for other transport
implementations as well.

Test-plan:
Compile kernel with CONFIG_NFS enabled.

Version: Thu, 11 Aug 2005 16:06:52 -0400

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 2226feb6 11-Aug-2005 Chuck Lever <cel@citi.umich.edu>

[PATCH] RPC: rename the sockstate field

Clean-up: get rid of a name reference to sockets in the generic parts of the
RPC client by renaming the sockstate field in the rpc_xprt structure.

Test-plan:
Compile kernel with CONFIG_NFS enabled.

Version: Thu, 11 Aug 2005 16:05:53 -0400

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 5dc07727 11-Aug-2005 Chuck Lever <cel@citi.umich.edu>

[PATCH] RPC: Rename xprt_lock

Clean-up: Replace the xprt_lock with something more aptly named. This lock
single-threads the XID and request slot reservation process.

Test-plan:
Compile kernel with CONFIG_NFS enabled.

Version: Thu, 11 Aug 2005 16:05:26 -0400

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 4a0f8c04 11-Aug-2005 Chuck Lever <cel@citi.umich.edu>

[PATCH] RPC: Rename sock_lock

Clean-up: replace a name reference to sockets in the generic parts of the RPC
client by renaming sock_lock in the rpc_xprt structure.

Test-plan:
Compile kernel with CONFIG_NFS enabled.

Version: Thu, 11 Aug 2005 16:05:00 -0400

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# a246b010 11-Aug-2005 Chuck Lever <cel@citi.umich.edu>

[PATCH] RPC: introduce client-side transport switch

Move the bulk of client-side socket-specific code into a separate source
file, net/sunrpc/xprtsock.c.

Test-plan:
Millions of fsx operations. Performance characterization such as "sio" or
"iozone". Destructive testing (unplugging the network temporarily, server
reboots). Connectathon with v2, v3, and v4.

Version: Thu, 11 Aug 2005 16:03:38 -0400

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 1da177e4 16-Apr-2005 Linus Torvalds <torvalds@ppc970.osdl.org>

Linux-2.6.12-rc2

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!