History log of /linux-master/net/rxrpc/call_accept.c
Revision Date Author Comments
# 8395406b 19-Dec-2022 David Howells <dhowells@redhat.com>

rxrpc: Fix trace string

Fix a trace string to indicate that it's discarding the local endpoint for
a preallocated peer, not a preallocated connection.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org


# 42f229c3 06-Jan-2023 David Howells <dhowells@redhat.com>

rxrpc: Fix incoming call setup race

An incoming call can race with rxrpc socket destruction, leading to a
leaked call. This may result in an oops when the call timer eventually
expires:

BUG: kernel NULL pointer dereference, address: 0000000000000874
RIP: 0010:_raw_spin_lock_irqsave+0x2a/0x50
Call Trace:
<IRQ>
try_to_wake_up+0x59/0x550
? __local_bh_enable_ip+0x37/0x80
? rxrpc_poke_call+0x52/0x110 [rxrpc]
? rxrpc_poke_call+0x110/0x110 [rxrpc]
? rxrpc_poke_call+0x110/0x110 [rxrpc]
call_timer_fn+0x24/0x120

with a warning in the kernel log looking something like:

rxrpc: Call 00000000ba5e571a still in use (1,SvAwtACK,1061d,0)!

incurred during rmmod of rxrpc. The 1061d is the call flags:

RECVMSG_READ_ALL, RX_HEARD, BEGAN_RX_TIMER, RX_LAST, EXPOSED,
IS_SERVICE, RELEASED

but no DISCONNECTED flag (0x800), so it's an incoming (service) call and
it's still connected.

The race appears to be that:

(1) rxrpc_new_incoming_call() consults the service struct, checks sk_state
and allocates a call - then pauses, possibly for an interrupt.

(2) rxrpc_release_sock() sets RXRPC_CLOSE, nulls the service pointer,
discards the prealloc and releases all calls attached to the socket.

(3) rxrpc_new_incoming_call() resumes, launching the new call, including
its timer and attaching it to the socket.

Fix this by read-locking local->services_lock to access the AF_RXRPC socket
providing the service rather than RCU in rxrpc_new_incoming_call().
There's no real need to use RCU here as local->services_lock is only
write-locked by the socket side in two places: when binding and when
shutting down.

Fixes: 5e6ef4f1017c ("rxrpc: Make the I/O thread take over the call and local processor work")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-afs@lists.infradead.org


# 96b4059f 27-Oct-2022 David Howells <dhowells@redhat.com>

rxrpc: Remove call->state_lock

All the setters of call->state are now in the I/O thread and thus the state
lock is now unnecessary.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org


# 57af281e 06-Oct-2022 David Howells <dhowells@redhat.com>

rxrpc: Tidy up abort generation infrastructure

Tidy up the abort generation infrastructure in the following ways:

(1) Create an enum and string mapping table to list the reasons an abort
might be generated in tracing.

(2) Replace the 3-char string with the values from (1) in the places that
use that to log the abort source. This gets rid of a memcpy() in the
tracepoint.

(3) Subsume the rxrpc_rx_eproto tracepoint with the rxrpc_abort tracepoint
and use values from (1) to indicate the trace reason.

(4) Always make a call to an abort function at the point of the abort
rather than stashing the values into variables and using goto to get
to a place where it reported. The C optimiser will collapse the calls
together as appropriate. The abort functions return a value that can
be returned directly if appropriate.

Note that this extends into afs also at the points where that generates an
abort. To aid with this, the afs sources need to #define
RXRPC_TRACE_ONLY_DEFINE_ENUMS before including the rxrpc tracing header
because they don't have access to the rxrpc internal structures that some
of the tracepoints make use of.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org


# 8a758d98 20-Oct-2022 David Howells <dhowells@redhat.com>

rxrpc: Stash the network namespace pointer in rxrpc_local

Stash the network namespace pointer in the rxrpc_local struct in addition
to a pointer to the rxrpc-specific net namespace info. Use this to remove
some places where the socket is passed as a parameter.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org


# 31d35a02 15-Dec-2022 David Howells <dhowells@redhat.com>

rxrpc: Fix the return value of rxrpc_new_incoming_call()

Dan Carpenter sayeth[1]:

The patch 5e6ef4f1017c: "rxrpc: Make the I/O thread take over the
call and local processor work" from Jan 23, 2020, leads to the
following Smatch static checker warning:

net/rxrpc/io_thread.c:283 rxrpc_input_packet()
warn: bool is not less than zero.

Fix this (for now) by changing rxrpc_new_incoming_call() to return an int
with 0 or error code rather than bool. Note that the actual return value
of rxrpc_input_packet() is currently ignored. I have a separate patch to
clean that up.

Fixes: 5e6ef4f1017c ("rxrpc: Make the I/O thread take over the call and local processor work")
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: http://lists.infradead.org/pipermail/linux-afs/2022-December/006123.html [1]
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3dd9c8b5 24-Jan-2020 David Howells <dhowells@redhat.com>

rxrpc: Remove the _bh annotation from all the spinlocks

None of the spinlocks in rxrpc need a _bh annotation now as the RCU
callback routines no longer take spinlocks and the bulk of the packet
wrangling code is now run in the I/O thread, not softirq context.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org


# 5e6ef4f1 23-Jan-2020 David Howells <dhowells@redhat.com>

rxrpc: Make the I/O thread take over the call and local processor work

Move the functions from the call->processor and local->processor work items
into the domain of the I/O thread.

The call event processor, now called from the I/O thread, then takes over
the job of cranking the call state machine, processing incoming packets and
transmitting DATA, ACK and ABORT packets. In a future patch,
rxrpc_send_ACK() will transmit the ACK on the spot rather than queuing it
for later transmission.

The call event processor becomes purely received-skb driven. It only
transmits things in response to events. We use "pokes" to queue a dummy
skb to make it do things like start/resume transmitting data. Timer expiry
also results in pokes.

The connection event processor, becomes similar, though crypto events, such
as dealing with CHALLENGE and RESPONSE packets is offloaded to a work item
to avoid doing crypto in the I/O thread.

The local event processor is removed and VERSION response packets are
generated directly from the packet parser. Similarly, ABORTs generated in
response to protocol errors will be transmitted immediately rather than
being pushed onto a queue for later transmission.

Changes:
========
ver #2)
- Fix a couple of introduced lock context imbalances.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org


# 393a2a20 20-Oct-2022 David Howells <dhowells@redhat.com>

rxrpc: Extract the peer address from an incoming packet earlier

Extract the peer address from an incoming packet earlier, at the beginning
of rxrpc_input_packet() and thence pass a pointer to it to various
functions that use it as part of the lookup rather than doing it on several
separate paths.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org


# cd21effb 08-Oct-2022 David Howells <dhowells@redhat.com>

rxrpc: Reduce the use of RCU in packet input

Shrink the region of rxrpc_input_packet() that is covered by the RCU read
lock so that it only covers the connection and call lookup. This means
that the bits now outside of that can call sleepable functions such as
kmalloc and sendmsg.

Also take a ref on the conn or call we're going to use before we drop the
RCU read lock.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org


# 29fb4ec3 12-Oct-2022 David Howells <dhowells@redhat.com>

rxrpc: Remove RCU from peer->error_targets list

Remove the RCU requirements from the peer's list of error targets so that
the error distributor can call sleeping functions.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org


# f3441d41 20-Oct-2022 David Howells <dhowells@redhat.com>

rxrpc: Copy client call parameters into rxrpc_call earlier

Copy client call parameters into rxrpc_call earlier so that that can be
used to convey them to the connection code - which can then be offloaded to
the I/O thread.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org


# 3cec055c 24-Nov-2022 David Howells <dhowells@redhat.com>

rxrpc: Don't hold a ref for connection workqueue

Currently, rxrpc gives the connection's work item a ref on the connection
when it queues it - and this is called from the timer expiration function.
The problem comes when queue_work() fails (ie. the work item is already
queued): the timer routine must put the ref - but this may cause the
cleanup code to run.

This has the unfortunate effect that the cleanup code may then be run in
softirq context - which means that any spinlocks it might need to touch
have to be guarded to disable softirqs (ie. they need a "_bh" suffix).

(1) Don't give a ref to the work item.

(2) Simplify handling of service connections by adding a separate active
count so that the refcount isn't also used for this.

(3) Connection destruction for both client and service connections can
then be cleaned up by putting rxrpc_put_connection() out of line and
making a tidy progression through the destruction code (offloaded to a
workqueue if put from softirq or processor function context). The RCU
part of the cleanup then only deals with the freeing at the end.

(4) Make rxrpc_queue_conn() return immediately if it sees the active count
is -1 rather then queuing the connection.

(5) Make sure that the cleanup routine waits for the work item to
complete.

(6) Stash the rxrpc_net pointer in the conn struct so that the rcu free
routine can use it, even if the local endpoint has been freed.

Unfortunately, neither the timer nor the work item can simply get around
the problem by just using refcount_inc_not_zero() as the waits would still
have to be done, and there would still be the possibility of having to put
the ref in the expiration function.

Note the connection work item is mostly going to go away with the main
event work being transferred to the I/O thread, so the wait in (6) will
become obsolete.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org


# cb0fc0c9 21-Oct-2022 David Howells <dhowells@redhat.com>

rxrpc: trace: Don't use __builtin_return_address for rxrpc_call tracing

In rxrpc tracing, use enums to generate lists of points of interest rather
than __builtin_return_address() for the rxrpc_call tracepoint

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org


# 7fa25105 21-Oct-2022 David Howells <dhowells@redhat.com>

rxrpc: trace: Don't use __builtin_return_address for rxrpc_conn tracing

In rxrpc tracing, use enums to generate lists of points of interest rather
than __builtin_return_address() for the rxrpc_conn tracepoint

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org


# 47c810a7 21-Oct-2022 David Howells <dhowells@redhat.com>

rxrpc: trace: Don't use __builtin_return_address for rxrpc_peer tracing

In rxrpc tracing, use enums to generate lists of points of interest rather
than __builtin_return_address() for the rxrpc_peer tracepoint

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org


# 0fde882f 21-Oct-2022 David Howells <dhowells@redhat.com>

rxrpc: trace: Don't use __builtin_return_address for rxrpc_local tracing

In rxrpc tracing, use enums to generate lists of points of interest rather
than __builtin_return_address() for the rxrpc_local tracepoint

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org


# 2cc80086 19-Oct-2022 David Howells <dhowells@redhat.com>

rxrpc: Drop rxrpc_conn_parameters from rxrpc_connection and rxrpc_bundle

Remove the rxrpc_conn_parameters struct from the rxrpc_connection and
rxrpc_bundle structs and emplace the members directly. These are going to
get filled in from the rxrpc_call struct in future.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org


# 1fc4fa2a 03-Oct-2022 David Howells <dhowells@redhat.com>

rxrpc: Fix congestion management

rxrpc has a problem in its congestion management in that it saves the
congestion window size (cwnd) from one call to another, but if this is 0 at
the time is saved, then the next call may not actually manage to ever
transmit anything.

To this end:

(1) Don't save cwnd between calls, but rather reset back down to the
initial cwnd and re-enter slow-start if data transmission is idle for
more than an RTT.

(2) Preserve ssthresh instead, as that is a handy estimate of pipe
capacity. Knowing roughly when to stop slow start and enter
congestion avoidance can reduce the tendency to overshoot and drop
larger amounts of packets when probing.

In future, cwind growth also needs to be constrained when the window isn't
being filled due to being application limited.

Reported-by: Simon Wilkinson <sxw@auristor.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org


# 72f0c6fb 30-Jan-2020 David Howells <dhowells@redhat.com>

rxrpc: Allocate ACK records at proposal and queue for transmission

Allocate rxrpc_txbuf records for ACKs and put onto a queue for the
transmitter thread to dispatch.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org


# ad25f5cb 21-May-2022 David Howells <dhowells@redhat.com>

rxrpc: Fix locking issue

There's a locking issue with the per-netns list of calls in rxrpc. The
pieces of code that add and remove a call from the list use write_lock()
and the calls procfile uses read_lock() to access it. However, the timer
callback function may trigger a removal by trying to queue a call for
processing and finding that it's already queued - at which point it has a
spare refcount that it has to do something with. Unfortunately, if it puts
the call and this reduces the refcount to 0, the call will be removed from
the list. Unfortunately, since the _bh variants of the locking functions
aren't used, this can deadlock.

================================
WARNING: inconsistent lock state
5.18.0-rc3-build4+ #10 Not tainted
--------------------------------
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
ksoftirqd/2/25 [HC0[0]:SC1[1]:HE1:SE0] takes:
ffff888107ac4038 (&rxnet->call_lock){+.?.}-{2:2}, at: rxrpc_put_call+0x103/0x14b
{SOFTIRQ-ON-W} state was registered at:
...
Possible unsafe locking scenario:

CPU0
----
lock(&rxnet->call_lock);
<Interrupt>
lock(&rxnet->call_lock);

*** DEADLOCK ***

1 lock held by ksoftirqd/2/25:
#0: ffff8881008ffdb0 ((&call->timer)){+.-.}-{0:0}, at: call_timer_fn+0x5/0x23d

Changes
=======
ver #2)
- Changed to using list_next_rcu() rather than rcu_dereference() directly.

Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: David S. Miller <davem@davemloft.net>


# a0575429 21-May-2022 David Howells <dhowells@redhat.com>

rxrpc: Use refcount_t rather than atomic_t

Move to using refcount_t rather than atomic_t for refcounts in rxrpc.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: David S. Miller <davem@davemloft.net>


# b8323f72 28-Jan-2021 Takeshi Misawa <jeliantsurux@gmail.com>

rxrpc: Fix memory leak in rxrpc_lookup_local

Commit 9ebeddef58c4 ("rxrpc: rxrpc_peer needs to hold a ref on the rxrpc_local record")
Then release ref in __rxrpc_put_peer and rxrpc_put_peer_locked.

struct rxrpc_peer *rxrpc_alloc_peer(struct rxrpc_local *local, gfp_t gfp)
- peer->local = local;
+ peer->local = rxrpc_get_local(local);

rxrpc_discard_prealloc also need ref release in discarding.

syzbot report:
BUG: memory leak
unreferenced object 0xffff8881080ddc00 (size 256):
comm "syz-executor339", pid 8462, jiffies 4294942238 (age 12.350s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 0a 00 00 00 00 c0 00 08 81 88 ff ff ................
backtrace:
[<000000002b6e495f>] kmalloc include/linux/slab.h:552 [inline]
[<000000002b6e495f>] kzalloc include/linux/slab.h:682 [inline]
[<000000002b6e495f>] rxrpc_alloc_local net/rxrpc/local_object.c:79 [inline]
[<000000002b6e495f>] rxrpc_lookup_local+0x1c1/0x760 net/rxrpc/local_object.c:244
[<000000006b43a77b>] rxrpc_bind+0x174/0x240 net/rxrpc/af_rxrpc.c:149
[<00000000fd447a55>] afs_open_socket+0xdb/0x200 fs/afs/rxrpc.c:64
[<000000007fd8867c>] afs_net_init+0x2b4/0x340 fs/afs/main.c:126
[<0000000063d80ec1>] ops_init+0x4e/0x190 net/core/net_namespace.c:152
[<00000000073c5efa>] setup_net+0xde/0x2d0 net/core/net_namespace.c:342
[<00000000a6744d5b>] copy_net_ns+0x19f/0x3e0 net/core/net_namespace.c:483
[<0000000017d3aec3>] create_new_namespaces+0x199/0x4f0 kernel/nsproxy.c:110
[<00000000186271ef>] unshare_nsproxy_namespaces+0x9b/0x120 kernel/nsproxy.c:226
[<000000002de7bac4>] ksys_unshare+0x2fe/0x5c0 kernel/fork.c:2957
[<00000000349b12ba>] __do_sys_unshare kernel/fork.c:3025 [inline]
[<00000000349b12ba>] __se_sys_unshare kernel/fork.c:3023 [inline]
[<00000000349b12ba>] __x64_sys_unshare+0x12/0x20 kernel/fork.c:3023
[<000000006d178ef7>] do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
[<00000000637076d4>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 9ebeddef58c4 ("rxrpc: rxrpc_peer needs to hold a ref on the rxrpc_local record")
Signed-off-by: Takeshi Misawa <jeliantsurux@gmail.com>
Reported-and-tested-by: syzbot+305326672fed51b205f7@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/161183091692.3506637.3206605651502458810.stgit@warthog.procyon.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# ec832bd0 16-Sep-2020 David Howells <dhowells@redhat.com>

rxrpc: Don't retain the server key in the connection

Don't retain a pointer to the server key in the connection, but rather get
it on demand when the server has to deal with a response packet.

This is necessary to implement RxGK (GSSAPI-mediated transport class),
where we can't know which key we'll need until we've challenged the client
and got back the response.

This also means that we don't need to do a key search in the accept path in
softirq mode.

Also, whilst we're at it, allow the security class to ask for a kvno and
encoding-type variant of a server key as RxGK needs different keys for
different encoding types. Keys of this type have an extra bit in the
description:

"<service-id>:<security-index>:<kvno>:<enctype>"

Signed-off-by: David Howells <dhowells@redhat.com>


# 2d914c1b 30-Sep-2020 David Howells <dhowells@redhat.com>

rxrpc: Fix accept on a connection that need securing

When a new incoming call arrives at an userspace rxrpc socket on a new
connection that has a security class set, the code currently pushes it onto
the accept queue to hold a ref on it for the socket. This doesn't work,
however, as recvmsg() pops it off, notices that it's in the SERVER_SECURING
state and discards the ref. This means that the call runs out of refs too
early and the kernel oopses.

By contrast, a kernel rxrpc socket manually pre-charges the incoming call
pool with calls that already have user call IDs assigned, so they are ref'd
by the call tree on the socket.

Change the mode of operation for userspace rxrpc server sockets to work
like this too. Although this is a UAPI change, server sockets aren't
currently functional.

Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code")
Signed-off-by: David Howells <dhowells@redhat.com>


# df561f66 23-Aug-2020 Gustavo A. R. Silva <gustavoars@kernel.org>

treewide: Use fallthrough pseudo-keyword

Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>


# 0041cd5a 19-Jun-2020 David Howells <dhowells@redhat.com>

rxrpc: Fix notification call on completion of discarded calls

When preallocated service calls are being discarded, they're passed to
->discard_new_call() to have the caller clean up any attached higher-layer
preallocated pieces before being marked completed. However, the act of
marking them completed now invokes the call's notification function - which
causes a problem because that function might assume that the previously
freed pieces of memory are still there.

Fix this by setting a dummy notification function on the socket after
calling ->discard_new_call().

This results in the following kasan message when the kafs module is
removed.

==================================================================
BUG: KASAN: use-after-free in afs_wake_up_async_call+0x6aa/0x770 fs/afs/rxrpc.c:707
Write of size 1 at addr ffff8880946c39e4 by task kworker/u4:1/21

CPU: 0 PID: 21 Comm: kworker/u4:1 Not tainted 5.8.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: netns cleanup_net
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x18f/0x20d lib/dump_stack.c:118
print_address_description.constprop.0.cold+0xd3/0x413 mm/kasan/report.c:383
__kasan_report mm/kasan/report.c:513 [inline]
kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
afs_wake_up_async_call+0x6aa/0x770 fs/afs/rxrpc.c:707
rxrpc_notify_socket+0x1db/0x5d0 net/rxrpc/recvmsg.c:40
__rxrpc_set_call_completion.part.0+0x172/0x410 net/rxrpc/recvmsg.c:76
__rxrpc_call_completed net/rxrpc/recvmsg.c:112 [inline]
rxrpc_call_completed+0xca/0xf0 net/rxrpc/recvmsg.c:111
rxrpc_discard_prealloc+0x781/0xab0 net/rxrpc/call_accept.c:233
rxrpc_listen+0x147/0x360 net/rxrpc/af_rxrpc.c:245
afs_close_socket+0x95/0x320 fs/afs/rxrpc.c:110
afs_net_exit+0x1bc/0x310 fs/afs/main.c:155
ops_exit_list.isra.0+0xa8/0x150 net/core/net_namespace.c:186
cleanup_net+0x511/0xa50 net/core/net_namespace.c:603
process_one_work+0x965/0x1690 kernel/workqueue.c:2269
worker_thread+0x96/0xe10 kernel/workqueue.c:2415
kthread+0x3b5/0x4a0 kernel/kthread.c:291
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:293

Allocated by task 6820:
save_stack+0x1b/0x40 mm/kasan/common.c:48
set_track mm/kasan/common.c:56 [inline]
__kasan_kmalloc mm/kasan/common.c:494 [inline]
__kasan_kmalloc.constprop.0+0xbf/0xd0 mm/kasan/common.c:467
kmem_cache_alloc_trace+0x153/0x7d0 mm/slab.c:3551
kmalloc include/linux/slab.h:555 [inline]
kzalloc include/linux/slab.h:669 [inline]
afs_alloc_call+0x55/0x630 fs/afs/rxrpc.c:141
afs_charge_preallocation+0xe9/0x2d0 fs/afs/rxrpc.c:757
afs_open_socket+0x292/0x360 fs/afs/rxrpc.c:92
afs_net_init+0xa6c/0xe30 fs/afs/main.c:125
ops_init+0xaf/0x420 net/core/net_namespace.c:151
setup_net+0x2de/0x860 net/core/net_namespace.c:341
copy_net_ns+0x293/0x590 net/core/net_namespace.c:482
create_new_namespaces+0x3fb/0xb30 kernel/nsproxy.c:110
unshare_nsproxy_namespaces+0xbd/0x1f0 kernel/nsproxy.c:231
ksys_unshare+0x43d/0x8e0 kernel/fork.c:2983
__do_sys_unshare kernel/fork.c:3051 [inline]
__se_sys_unshare kernel/fork.c:3049 [inline]
__x64_sys_unshare+0x2d/0x40 kernel/fork.c:3049
do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:359
entry_SYSCALL_64_after_hwframe+0x44/0xa9

Freed by task 21:
save_stack+0x1b/0x40 mm/kasan/common.c:48
set_track mm/kasan/common.c:56 [inline]
kasan_set_free_info mm/kasan/common.c:316 [inline]
__kasan_slab_free+0xf7/0x140 mm/kasan/common.c:455
__cache_free mm/slab.c:3426 [inline]
kfree+0x109/0x2b0 mm/slab.c:3757
afs_put_call+0x585/0xa40 fs/afs/rxrpc.c:190
rxrpc_discard_prealloc+0x764/0xab0 net/rxrpc/call_accept.c:230
rxrpc_listen+0x147/0x360 net/rxrpc/af_rxrpc.c:245
afs_close_socket+0x95/0x320 fs/afs/rxrpc.c:110
afs_net_exit+0x1bc/0x310 fs/afs/main.c:155
ops_exit_list.isra.0+0xa8/0x150 net/core/net_namespace.c:186
cleanup_net+0x511/0xa50 net/core/net_namespace.c:603
process_one_work+0x965/0x1690 kernel/workqueue.c:2269
worker_thread+0x96/0xe10 kernel/workqueue.c:2415
kthread+0x3b5/0x4a0 kernel/kthread.c:291
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:293

The buggy address belongs to the object at ffff8880946c3800
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 484 bytes inside of
1024-byte region [ffff8880946c3800, ffff8880946c3c00)
The buggy address belongs to the page:
page:ffffea000251b0c0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0
flags: 0xfffe0000000200(slab)
raw: 00fffe0000000200 ffffea0002546508 ffffea00024fa248 ffff8880aa000c40
raw: 0000000000000000 ffff8880946c3000 0000000100000002 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff8880946c3880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8880946c3900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff8880946c3980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8880946c3a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8880946c3a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================

Reported-by: syzbot+d3eccef36ddbd02713e9@syzkaller.appspotmail.com
Fixes: 5ac0d62226a0 ("rxrpc: Fix missing notification")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c410bf01 11-May-2020 David Howells <dhowells@redhat.com>

rxrpc: Fix the excessive initial retransmission timeout

rxrpc currently uses a fixed 4s retransmission timeout until the RTT is
sufficiently sampled. This can cause problems with some fileservers with
calls to the cache manager in the afs filesystem being dropped from the
fileserver because a packet goes missing and the retransmission timeout is
greater than the call expiry timeout.

Fix this by:

(1) Copying the RTT/RTO calculation code from Linux's TCP implementation
and altering it to fit rxrpc.

(2) Altering the various users of the RTT to make use of the new SRTT
value.

(3) Replacing the use of rxrpc_resend_timeout to use the calculated RTO
value instead (which is needed in jiffies), along with a backoff.

Notes:

(1) rxrpc provides RTT samples by matching the serial numbers on outgoing
DATA packets that have the RXRPC_REQUEST_ACK set and PING ACK packets
against the reference serial number in incoming REQUESTED ACK and
PING-RESPONSE ACK packets.

(2) Each packet that is transmitted on an rxrpc connection gets a new
per-connection serial number, even for retransmissions, so an ACK can
be cross-referenced to a specific trigger packet. This allows RTT
information to be drawn from retransmitted DATA packets also.

(3) rxrpc maintains the RTT/RTO state on the rxrpc_peer record rather than
on an rxrpc_call because many RPC calls won't live long enough to
generate more than one sample.

(4) The calculated SRTT value is in units of 8ths of a microsecond rather
than nanoseconds.

The (S)RTT and RTO values are displayed in /proc/net/rxrpc/peers.

Fixes: 17926a79320a ([AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both"")
Signed-off-by: David Howells <dhowells@redhat.com>


# 063c60d3 20-Dec-2019 David Howells <dhowells@redhat.com>

rxrpc: Fix missing security check on incoming calls

Fix rxrpc_new_incoming_call() to check that we have a suitable service key
available for the combination of service ID and security class of a new
incoming call - and to reject calls for which we don't.

This causes an assertion like the following to appear:

rxrpc: Assertion failed - 6(0x6) == 12(0xc) is false
kernel BUG at net/rxrpc/call_object.c:456!

Where call->state is RXRPC_CALL_SERVER_SECURING (6) rather than
RXRPC_CALL_COMPLETE (12).

Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>


# 13b7955a 20-Dec-2019 David Howells <dhowells@redhat.com>

rxrpc: Don't take call->user_mutex in rxrpc_new_incoming_call()

Standard kernel mutexes cannot be used in any way from interrupt or softirq
context, so the user_mutex which manages access to a call cannot be a mutex
since on a new call the mutex must start off locked and be unlocked within
the softirq handler to prevent userspace interfering with a call we're
setting up.

Commit a0855d24fc22d49cdc25664fb224caee16998683 ("locking/mutex: Complain
upon mutex API misuse in IRQ contexts") causes big warnings to be splashed
in dmesg for each a new call that comes in from the server. Whilst it
*seems* like it should be okay, since the accept path uses trylock, there
are issues with PI boosting and marking the wrong task as the owner.

Fix this by not taking the mutex in the softirq path at all. It's not
obvious that there should be any need for it as the state is set before the
first notification is generated for the new call.

There's also no particular reason why the link-assessing ping should be
triggered inside the mutex. It's not actually transmitted there anyway,
but rather it has to be deferred to a workqueue.

Further, I don't think that there's any particular reason that the socket
notification needs to be done from within rx->incoming_lock, so the amount
of time that lock is held can be shortened too and the ping prepared before
the new call notification is sent.

Fixes: 540b1c48c37a ("rxrpc: Fix deadlock between call creation and sendmsg/recvmsg")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Peter Zijlstra (Intel) <peterz@infradead.org>
cc: Ingo Molnar <mingo@redhat.com>
cc: Will Deacon <will@kernel.org>
cc: Davidlohr Bueso <dave@stgolabs.net>


# f33121cb 18-Dec-2019 David Howells <dhowells@redhat.com>

rxrpc: Unlock new call in rxrpc_new_incoming_call() rather than the caller

Move the unlock and the ping transmission for a new incoming call into
rxrpc_new_incoming_call() rather than doing it in the caller. This makes
it clearer to see what's going on.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
cc: Ingo Molnar <mingo@redhat.com>
cc: Will Deacon <will@kernel.org>
cc: Davidlohr Bueso <dave@stgolabs.net>


# 91fcfbe8 07-Oct-2019 David Howells <dhowells@redhat.com>

rxrpc: Fix call crypto state cleanup

Fix the cleanup of the crypto state on a call after the call has been
disconnected. As the call has been disconnected, its connection ref has
been discarded and so we can't go through that to get to the security ops
table.

Fix this by caching the security ops pointer in the rxrpc_call struct and
using that when freeing the call security state. Also use this in other
places we're dealing with call-specific security.

The symptoms look like:

BUG: KASAN: use-after-free in rxrpc_release_call+0xb2d/0xb60
net/rxrpc/call_object.c:481
Read of size 8 at addr ffff888062ffeb50 by task syz-executor.5/4764

Fixes: 1db88c534371 ("rxrpc: Fix -Wframe-larger-than= warnings from on-stack crypto")
Reported-by: syzbot+eed305768ece6682bb7f@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>


# 48c9e0ec 07-Oct-2019 David Howells <dhowells@redhat.com>

rxrpc: Fix trace-after-put looking at the put call record

rxrpc_put_call() calls trace_rxrpc_call() after it has done the decrement
of the refcount - which looks at the debug_id in the call record. But
unless the refcount was reduced to zero, we no longer have the right to
look in the record and, indeed, it may be deleted by some other thread.

Fix this by getting the debug_id out before decrementing the refcount and
then passing that into the tracepoint.

Fixes: e34d4234b0b7 ("rxrpc: Trace rxrpc_call usage")
Signed-off-by: David Howells <dhowells@redhat.com>


# 4c1295dc 07-Oct-2019 David Howells <dhowells@redhat.com>

rxrpc: Fix trace-after-put looking at the put connection record

rxrpc_put_*conn() calls trace_rxrpc_conn() after they have done the
decrement of the refcount - which looks at the debug_id in the connection
record. But unless the refcount was reduced to zero, we no longer have the
right to look in the record and, indeed, it may be deleted by some other
thread.

Fix this by getting the debug_id out before decrementing the refcount and
then passing that into the tracepoint.

Fixes: 363deeab6d0f ("rxrpc: Add connection tracepoint and client conn state tracepoint")
Signed-off-by: David Howells <dhowells@redhat.com>


# 2874c5fd 27-May-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152

Based on 1 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 3029 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# d7b4c24f 11-Oct-2018 David Howells <dhowells@redhat.com>

rxrpc: Fix an uninitialised variable

Fix an uninitialised variable introduced by the last patch. This can cause
a crash when a new call comes in to a local service, such as when an AFS
fileserver calls back to the local cache manager.

Fixes: c1e15b4944c9 ("rxrpc: Fix the packet reception routine")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c1e15b49 08-Oct-2018 David Howells <dhowells@redhat.com>

rxrpc: Fix the packet reception routine

The rxrpc_input_packet() function and its call tree was built around the
assumption that data_ready() handler called from UDP to inform a kernel
service that there is data to be had was non-reentrant. This means that
certain locking could be dispensed with.

This, however, turns out not to be the case with a multi-queue network card
that can deliver packets to multiple cpus simultaneously. Each of those
cpus can be in the rxrpc_input_packet() function at the same time.

Fix by adding or changing some structure members:

(1) Add peer->rtt_input_lock to serialise access to the RTT buffer.

(2) Make conn->service_id into a 32-bit variable so that it can be
cmpxchg'd on all arches.

(3) Add call->input_lock to serialise access to the Rx/Tx state. Note
that although the Rx and Tx states are (almost) entirely separate,
there's no point completing the separation and having separate locks
since it's a bi-phasal RPC protocol rather than a bi-direction
streaming protocol. Data transmission and data reception do not take
place simultaneously on any particular call.

and making the following functional changes:

(1) In rxrpc_input_data(), hold call->input_lock around the core to
prevent simultaneous producing of packets into the Rx ring and
updating of tracking state for a particular call.

(2) In rxrpc_input_ping_response(), only read call->ping_serial once, and
check it before checking RXRPC_CALL_PINGING as that's a cheaper test.
The bit test and bit clear can then be combined. No further locking
is needed here.

(3) In rxrpc_input_ack(), take call->input_lock after we've parsed much of
the ACK packet. The superseded ACK check is then done both before and
after the lock is taken.

The handing of ackinfo data is split, parsing before the lock is taken
and processing with it held. This is keyed on rxMTU being non-zero.

Congestion management is also done within the locked section.

(4) In rxrpc_input_ackall(), take call->input_lock around the Tx window
rotation. The ACKALL packet carries no information and is only really
useful after all packets have been transmitted since it's imprecise.

(5) In rxrpc_input_implicit_end_call(), we use rx->incoming_lock to
prevent calls being simultaneously implicitly ended on two cpus and
also to prevent any races with incoming call setup.

(6) In rxrpc_input_packet(), use cmpxchg() to effect the service upgrade
on a connection. It is only permitted to happen once for a
connection.

(7) In rxrpc_new_incoming_call(), we have to recheck the routing inside
rx->incoming_lock to see if someone else set up the call, connection
or peer whilst we were getting there. We can't trust the values from
the earlier routing check unless we pin refs on them - which we want
to avoid.

Further, we need to allow for an incoming call to have its state
changed on another CPU between us making it live and us adjusting it
because the conn is now in the RXRPC_CONN_SERVICE state.

(8) In rxrpc_peer_add_rtt(), take peer->rtt_input_lock around the access
to the RTT buffer. Don't need to lock around setting peer->rtt.

For reference, the inventory of state-accessing or state-altering functions
used by the packet input procedure is:

> rxrpc_input_packet()
* PACKET CHECKING

* ROUTING
> rxrpc_post_packet_to_local()
> rxrpc_find_connection_rcu() - uses RCU
> rxrpc_lookup_peer_rcu() - uses RCU
> rxrpc_find_service_conn_rcu() - uses RCU
> idr_find() - uses RCU

* CONNECTION-LEVEL PROCESSING
- Service upgrade
- Can only happen once per conn
! Changed to use cmpxchg
> rxrpc_post_packet_to_conn()
- Setting conn->hi_serial
- Probably safe not using locks
- Maybe use cmpxchg

* CALL-LEVEL PROCESSING
> Old-call checking
> rxrpc_input_implicit_end_call()
> rxrpc_call_completed()
> rxrpc_queue_call()
! Need to take rx->incoming_lock
> __rxrpc_disconnect_call()
> rxrpc_notify_socket()
> rxrpc_new_incoming_call()
- Uses rx->incoming_lock for the entire process
- Might be able to drop this earlier in favour of the call lock
> rxrpc_incoming_call()
! Conflicts with rxrpc_input_implicit_end_call()
> rxrpc_send_ping()
- Don't need locks to check rtt state
> rxrpc_propose_ACK

* PACKET DISTRIBUTION
> rxrpc_input_call_packet()
> rxrpc_input_data()
* QUEUE DATA PACKET ON CALL
> rxrpc_reduce_call_timer()
- Uses timer_reduce()
! Needs call->input_lock()
> rxrpc_receiving_reply()
! Needs locking around ack state
> rxrpc_rotate_tx_window()
> rxrpc_end_tx_phase()
> rxrpc_proto_abort()
> rxrpc_input_dup_data()
- Fills the Rx buffer
- rxrpc_propose_ACK()
- rxrpc_notify_socket()

> rxrpc_input_ack()
* APPLY ACK PACKET TO CALL AND DISCARD PACKET
> rxrpc_input_ping_response()
- Probably doesn't need any extra locking
! Need READ_ONCE() on call->ping_serial
> rxrpc_input_check_for_lost_ack()
- Takes call->lock to consult Tx buffer
> rxrpc_peer_add_rtt()
! Needs to take a lock (peer->rtt_input_lock)
! Could perhaps manage with cmpxchg() and xadd() instead
> rxrpc_input_requested_ack
- Consults Tx buffer
! Probably needs a lock
> rxrpc_peer_add_rtt()
> rxrpc_propose_ack()
> rxrpc_input_ackinfo()
- Changes call->tx_winsize
! Use cmpxchg to handle change
! Should perhaps track serial number
- Uses peer->lock to record MTU specification changes
> rxrpc_proto_abort()
! Need to take call->input_lock
> rxrpc_rotate_tx_window()
> rxrpc_end_tx_phase()
> rxrpc_input_soft_acks()
- Consults the Tx buffer
> rxrpc_congestion_management()
- Modifies the Tx annotations
! Needs call->input_lock()
> rxrpc_queue_call()

> rxrpc_input_abort()
* APPLY ABORT PACKET TO CALL AND DISCARD PACKET
> rxrpc_set_call_completion()
> rxrpc_notify_socket()

> rxrpc_input_ackall()
* APPLY ACKALL PACKET TO CALL AND DISCARD PACKET
! Need to take call->input_lock
> rxrpc_rotate_tx_window()
> rxrpc_end_tx_phase()

> rxrpc_reject_packet()

There are some functions used by the above that queue the packet, after
which the procedure is terminated:

- rxrpc_post_packet_to_local()
- local->event_queue is an sk_buff_head
- local->processor is a work_struct
- rxrpc_post_packet_to_conn()
- conn->rx_queue is an sk_buff_head
- conn->processor is a work_struct
- rxrpc_reject_packet()
- local->reject_queue is an sk_buff_head
- local->processor is a work_struct

And some that offload processing to process context:

- rxrpc_notify_socket()
- Uses RCU lock
- Uses call->notify_lock to call call->notify_rx
- Uses call->recvmsg_lock to queue recvmsg side
- rxrpc_queue_call()
- call->processor is a work_struct
- rxrpc_propose_ACK()
- Uses call->lock to wrap __rxrpc_propose_ACK()

And a bunch that complete a call, all of which use call->state_lock to
protect the call state:

- rxrpc_call_completed()
- rxrpc_set_call_completion()
- rxrpc_abort_call()
- rxrpc_proto_abort()
- Also uses rxrpc_queue_call()

Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
Signed-off-by: David Howells <dhowells@redhat.com>


# 64753092 08-Oct-2018 David Howells <dhowells@redhat.com>

rxrpc: Fix connection-level abort handling

Fix connection-level abort handling to cache the abort and error codes
properly so that a new incoming call can be properly aborted if it races
with the parent connection being aborted by another CPU.

The abort_code and error parameters can then be dropped from
rxrpc_abort_calls().

Fixes: f5c17aaeb2ae ("rxrpc: Calls should only have one terminal state")
Signed-off-by: David Howells <dhowells@redhat.com>


# 5e33a23b 05-Oct-2018 David Howells <dhowells@redhat.com>

rxrpc: Fix some missed refs to init_net

Fix some refs to init_net that should've been changed to the appropriate
network namespace.

Fixes: 2baec2c3f854 ("rxrpc: Support network namespacing")
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>


# 5a790b73 04-Oct-2018 David Howells <dhowells@redhat.com>

rxrpc: Drop the local endpoint arg from rxrpc_extract_addr_from_skb()

rxrpc_extract_addr_from_skb() doesn't use the argument that points to the
local endpoint, so remove the argument.

Signed-off-by: David Howells <dhowells@redhat.com>


# 0099dc58 27-Sep-2018 David Howells <dhowells@redhat.com>

rxrpc: Make service call handling more robust

Make the following changes to improve the robustness of the code that sets
up a new service call:

(1) Cache the rxrpc_sock struct obtained in rxrpc_data_ready() to do a
service ID check and pass that along to rxrpc_new_incoming_call().
This means that I can remove the check from rxrpc_new_incoming_call()
without the need to worry about the socket attached to the local
endpoint getting replaced - which would invalidate the check.

(2) Cache the rxrpc_peer struct, thereby allowing the peer search to be
done once. The peer is passed to rxrpc_new_incoming_call(), thereby
saving the need to repeat the search.

This also reduces the possibility of rxrpc_publish_service_conn()
BUG()'ing due to the detection of a duplicate connection, despite the
initial search done by rxrpc_find_connection_rcu() having turned up
nothing.

This BUG() shouldn't ever get hit since rxrpc_data_ready() *should* be
non-reentrant and the result of the initial search should still hold
true, but it has proven possible to hit.

I *think* this may be due to __rxrpc_lookup_peer_rcu() cutting short
the iteration over the hash table if it finds a matching peer with a
zero usage count, but I don't know for sure since it's only ever been
hit once that I know of.

Another possibility is that a bug in rxrpc_data_ready() that checked
the wrong byte in the header for the RXRPC_CLIENT_INITIATED flag
might've let through a packet that caused a spurious and invalid call
to be set up. That is addressed in another patch.

(3) Fix __rxrpc_lookup_peer_rcu() to skip peer records that have a zero
usage count rather than stopping and returning not found, just in case
there's another peer record behind it in the bucket.

(4) Don't search the peer records in rxrpc_alloc_incoming_call(), but
rather either use the peer cached in (2) or, if one wasn't found,
preemptively install a new one.

Fixes: 8496af50eb38 ("rxrpc: Use RCU to access a peer's service connection tree")
Signed-off-by: David Howells <dhowells@redhat.com>


# ece64fec 27-Sep-2018 David Howells <dhowells@redhat.com>

rxrpc: Emit BUSY packets when supposed to rather than ABORTs

In the input path, a received sk_buff can be marked for rejection by
setting RXRPC_SKB_MARK_* in skb->mark and, if needed, some auxiliary data
(such as an abort code) in skb->priority. The rejection is handled by
queueing the sk_buff up for dealing with in process context. The output
code reads the mark and priority and, theoretically, generates an
appropriate response packet.

However, if RXRPC_SKB_MARK_BUSY is set, this isn't noticed and an ABORT
message with a random abort code is generated (since skb->priority wasn't
set to anything).

Fix this by outputting the appropriate sort of packet.

Also, whilst we're at it, most of the marks are no longer used, so remove
them and rename the remaining two to something more obvious.

Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code")
Signed-off-by: David Howells <dhowells@redhat.com>


# c01f6c9b 01-Aug-2018 YueHaibing <yuehaibing@huawei.com>

rxrpc: Fix user call ID check in rxrpc_service_prealloc_one

There just check the user call ID isn't already in use, hence should
compare user_call_ID with xcall->user_call_ID, which is current
node's user_call_ID.

Fixes: 540b1c48c37a ("rxrpc: Fix deadlock between call creation and sendmsg/recvmsg")
Suggested-by: David Howells <dhowells@redhat.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 31f5f9a16 30-Mar-2018 David Howells <dhowells@redhat.com>

rxrpc: Fix apparent leak of rxrpc_local objects

rxrpc_local objects cannot be disposed of until all the connections that
point to them have been RCU'd as a connection object holds refcount on the
local endpoint it is communicating through. Currently, this can cause an
assertion failure to occur when a network namespace is destroyed as there's
no check that the RCU destructors for the connections have been run before
we start trying to destroy local endpoints.

The kernel reports:

rxrpc: AF_RXRPC: Leaked local 0000000036a41bc1 {5}
------------[ cut here ]------------
kernel BUG at ../net/rxrpc/local_object.c:439!

Fix this by keeping a count of the live connections and waiting for it to
go to zero at the end of rxrpc_destroy_all_connections().

Fixes: dee46364ce6f ("rxrpc: Add RCU destruction for connections and calls")
Signed-off-by: David Howells <dhowells@redhat.com>


# 09d2bf59 30-Mar-2018 David Howells <dhowells@redhat.com>

rxrpc: Add a tracepoint to track rxrpc_local refcounting

Add a tracepoint to track reference counting on the rxrpc_local struct.

Signed-off-by: David Howells <dhowells@redhat.com>


# d3be4d24 30-Mar-2018 David Howells <dhowells@redhat.com>

rxrpc: Fix potential call vs socket/net destruction race

rxrpc_call structs don't pin sockets or network namespaces, but may attempt
to access both after their refcount reaches 0 so that they can detach
themselves from the network namespace. However, there's no guarantee that
the socket still exists at this point (so sock_net(&call->socket->sk) may
be invalid) and the namespace may have gone away if the call isn't pinning
a peer.

Fix this by (a) carrying a net pointer in the rxrpc_call struct and (b)
waiting for all calls to be destroyed when the network namespace goes away.

This was detected by checker:

net/rxrpc/call_object.c:634:57: warning: incorrect type in argument 1 (different address spaces)
net/rxrpc/call_object.c:634:57: expected struct sock const *sk
net/rxrpc/call_object.c:634:57: got struct sock [noderef] <asn:4>*<noident>

Fixes: 2baec2c3f854 ("rxrpc: Support network namespacing")
Signed-off-by: David Howells <dhowells@redhat.com>


# 88f2a825 30-Mar-2018 David Howells <dhowells@redhat.com>

rxrpc: Fix checker warnings and errors

Fix various issues detected by checker.

Errors:

(*) rxrpc_discard_prealloc() should be using rcu_assign_pointer to set
call->socket.

Warnings:

(*) rxrpc_service_connection_reaper() should be passing NULL rather than 0 to
trace_rxrpc_conn() as the where argument.

(*) rxrpc_disconnect_client_call() should get its net pointer via the
call->conn rather than call->sock to avoid a warning about accessing
an RCU pointer without protection.

(*) Proc seq start/stop functions need annotation as they pass locks
between the functions.

False positives:

(*) Checker doesn't correctly handle of seq-retry lock context balance in
rxrpc_find_service_conn_rcu().

(*) Checker thinks execution may proceed past the BUG() in
rxrpc_publish_service_conn().

(*) Variable length array warnings from SKCIPHER_REQUEST_ON_STACK() in
rxkad.c.

Signed-off-by: David Howells <dhowells@redhat.com>


# a25e21f0 27-Mar-2018 David Howells <dhowells@redhat.com>

rxrpc, afs: Use debug_ids rather than pointers in traces

In rxrpc and afs, use the debug_ids that are monotonically allocated to
various objects as they're allocated rather than pointers as kernel
pointers are now hashed making them less useful. Further, the debug ids
aren't reused anywhere nearly as quickly.

In addition, allow kernel services that use rxrpc, such as afs, to take
numbers from the rxrpc counter, assign them to their own call struct and
pass them in to rxrpc for both client and service calls so that the trace
lines for each will have the same ID tag.

Signed-off-by: David Howells <dhowells@redhat.com>


# 9faaff59 24-Nov-2017 David Howells <dhowells@redhat.com>

rxrpc: Provide a different lockdep key for call->user_mutex for kernel calls

Provide a different lockdep key for rxrpc_call::user_mutex when the call is
made on a kernel socket, such as by the AFS filesystem.

The problem is that lockdep registers a false positive between userspace
calling the sendmsg syscall on a user socket where call->user_mutex is held
whilst userspace memory is accessed whereas the AFS filesystem may perform
operations with mmap_sem held by the caller.

In such a case, the following warning is produced.

======================================================
WARNING: possible circular locking dependency detected
4.14.0-fscache+ #243 Tainted: G E
------------------------------------------------------
modpost/16701 is trying to acquire lock:
(&vnode->io_lock){+.+.}, at: [<ffffffffa000fc40>] afs_begin_vnode_operation+0x33/0x77 [kafs]

but task is already holding lock:
(&mm->mmap_sem){++++}, at: [<ffffffff8104376a>] __do_page_fault+0x1ef/0x486

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&mm->mmap_sem){++++}:
__might_fault+0x61/0x89
_copy_from_iter_full+0x40/0x1fa
rxrpc_send_data+0x8dc/0xff3
rxrpc_do_sendmsg+0x62f/0x6a1
rxrpc_sendmsg+0x166/0x1b7
sock_sendmsg+0x2d/0x39
___sys_sendmsg+0x1ad/0x22b
__sys_sendmsg+0x41/0x62
do_syscall_64+0x89/0x1be
return_from_SYSCALL_64+0x0/0x75

-> #2 (&call->user_mutex){+.+.}:
__mutex_lock+0x86/0x7d2
rxrpc_new_client_call+0x378/0x80e
rxrpc_kernel_begin_call+0xf3/0x154
afs_make_call+0x195/0x454 [kafs]
afs_vl_get_capabilities+0x193/0x198 [kafs]
afs_vl_lookup_vldb+0x5f/0x151 [kafs]
afs_create_volume+0x2e/0x2f4 [kafs]
afs_mount+0x56a/0x8d7 [kafs]
mount_fs+0x6a/0x109
vfs_kern_mount+0x67/0x135
do_mount+0x90b/0xb57
SyS_mount+0x72/0x98
do_syscall_64+0x89/0x1be
return_from_SYSCALL_64+0x0/0x75

-> #1 (k-sk_lock-AF_RXRPC){+.+.}:
lock_sock_nested+0x74/0x8a
rxrpc_kernel_begin_call+0x8a/0x154
afs_make_call+0x195/0x454 [kafs]
afs_fs_get_capabilities+0x17a/0x17f [kafs]
afs_probe_fileserver+0xf7/0x2f0 [kafs]
afs_select_fileserver+0x83f/0x903 [kafs]
afs_fetch_status+0x89/0x11d [kafs]
afs_iget+0x16f/0x4f8 [kafs]
afs_mount+0x6c6/0x8d7 [kafs]
mount_fs+0x6a/0x109
vfs_kern_mount+0x67/0x135
do_mount+0x90b/0xb57
SyS_mount+0x72/0x98
do_syscall_64+0x89/0x1be
return_from_SYSCALL_64+0x0/0x75

-> #0 (&vnode->io_lock){+.+.}:
lock_acquire+0x174/0x19f
__mutex_lock+0x86/0x7d2
afs_begin_vnode_operation+0x33/0x77 [kafs]
afs_fetch_data+0x80/0x12a [kafs]
afs_readpages+0x314/0x405 [kafs]
__do_page_cache_readahead+0x203/0x2ba
filemap_fault+0x179/0x54d
__do_fault+0x17/0x60
__handle_mm_fault+0x6d7/0x95c
handle_mm_fault+0x24e/0x2a3
__do_page_fault+0x301/0x486
do_page_fault+0x236/0x259
page_fault+0x22/0x30
__clear_user+0x3d/0x60
padzero+0x1c/0x2b
load_elf_binary+0x785/0xdc7
search_binary_handler+0x81/0x1ff
do_execveat_common.isra.14+0x600/0x888
do_execve+0x1f/0x21
SyS_execve+0x28/0x2f
do_syscall_64+0x89/0x1be
return_from_SYSCALL_64+0x0/0x75

other info that might help us debug this:

Chain exists of:
&vnode->io_lock --> &call->user_mutex --> &mm->mmap_sem

Possible unsafe locking scenario:

CPU0 CPU1
---- ----
lock(&mm->mmap_sem);
lock(&call->user_mutex);
lock(&mm->mmap_sem);
lock(&vnode->io_lock);

*** DEADLOCK ***

1 lock held by modpost/16701:
#0: (&mm->mmap_sem){++++}, at: [<ffffffff8104376a>] __do_page_fault+0x1ef/0x486

stack backtrace:
CPU: 0 PID: 16701 Comm: modpost Tainted: G E 4.14.0-fscache+ #243
Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
Call Trace:
dump_stack+0x67/0x8e
print_circular_bug+0x341/0x34f
check_prev_add+0x11f/0x5d4
? add_lock_to_list.isra.12+0x8b/0x8b
? add_lock_to_list.isra.12+0x8b/0x8b
? __lock_acquire+0xf77/0x10b4
__lock_acquire+0xf77/0x10b4
lock_acquire+0x174/0x19f
? afs_begin_vnode_operation+0x33/0x77 [kafs]
__mutex_lock+0x86/0x7d2
? afs_begin_vnode_operation+0x33/0x77 [kafs]
? afs_begin_vnode_operation+0x33/0x77 [kafs]
? afs_begin_vnode_operation+0x33/0x77 [kafs]
afs_begin_vnode_operation+0x33/0x77 [kafs]
afs_fetch_data+0x80/0x12a [kafs]
afs_readpages+0x314/0x405 [kafs]
__do_page_cache_readahead+0x203/0x2ba
? filemap_fault+0x179/0x54d
filemap_fault+0x179/0x54d
__do_fault+0x17/0x60
__handle_mm_fault+0x6d7/0x95c
handle_mm_fault+0x24e/0x2a3
__do_page_fault+0x301/0x486
do_page_fault+0x236/0x259
page_fault+0x22/0x30
RIP: 0010:__clear_user+0x3d/0x60
RSP: 0018:ffff880071e93da0 EFLAGS: 00010202
RAX: 0000000000000000 RBX: 000000000000011c RCX: 000000000000011c
RDX: 0000000000000000 RSI: 0000000000000008 RDI: 000000000060f720
RBP: 000000000060f720 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000001 R11: ffff8800b5459b68 R12: ffff8800ce150e00
R13: 000000000060f720 R14: 00000000006127a8 R15: 0000000000000000
padzero+0x1c/0x2b
load_elf_binary+0x785/0xdc7
search_binary_handler+0x81/0x1ff
do_execveat_common.isra.14+0x600/0x888
do_execve+0x1f/0x21
SyS_execve+0x28/0x2f
do_syscall_64+0x89/0x1be
entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:0x7fdb6009ee07
RSP: 002b:00007fff566d9728 EFLAGS: 00000246 ORIG_RAX: 000000000000003b
RAX: ffffffffffffffda RBX: 000055ba57280900 RCX: 00007fdb6009ee07
RDX: 000055ba5727f270 RSI: 000055ba5727cac0 RDI: 000055ba57280900
RBP: 000055ba57280900 R08: 00007fff566d9700 R09: 0000000000000000
R10: 000055ba5727cac0 R11: 0000000000000246 R12: 0000000000000000
R13: 000055ba5727cac0 R14: 000055ba5727f270 R15: 0000000000000000

Signed-off-by: David Howells <dhowells@redhat.com>


# 7b674e39 29-Aug-2017 David Howells <dhowells@redhat.com>

rxrpc: Fix IPv6 support

Fix IPv6 support in AF_RXRPC in the following ways:

(1) When extracting the address from a received IPv4 packet, if the local
transport socket is open for IPv6 then fill out the sockaddr_rxrpc
struct for an IPv4-mapped-to-IPv6 AF_INET6 transport address instead
of an AF_INET one.

(2) When sending CHALLENGE or RESPONSE packets, the transport length needs
to be set from the sockaddr_rxrpc::transport_len field rather than
sizeof() on the IPv4 transport address.

(3) When processing an IPv4 ICMP packet received by an IPv6 socket, set up
the address correctly before searching for the affected peer.

Signed-off-by: David Howells <dhowells@redhat.com>


# 9a19bad7 17-Aug-2017 David Howells <dhowells@redhat.com>

rxrpc: Fix oops when discarding a preallocated service call

rxrpc_service_prealloc_one() doesn't set the socket pointer on any new call
it preallocates, but does add it to the rxrpc net namespace call list.
This, however, causes rxrpc_put_call() to oops when the call is discarded
when the socket is closed. rxrpc_put_call() needs the socket to be able to
reach the namespace so that it can use a lock held therein.

Fix this by setting a call's socket pointer immediately before discarding
it.

This can be triggered by unloading the kafs module, resulting in an oops
like the following:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
IP: rxrpc_put_call+0x1e2/0x32d
PGD 0
P4D 0
Oops: 0000 [#1] SMP
Modules linked in: kafs(E-)
CPU: 3 PID: 3037 Comm: rmmod Tainted: G E 4.12.0-fscache+ #213
Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
task: ffff8803fc92e2c0 task.stack: ffff8803fef74000
RIP: 0010:rxrpc_put_call+0x1e2/0x32d
RSP: 0018:ffff8803fef77e08 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff8803fab99ac0 RCX: 000000000000000f
RDX: ffffffff81c50a40 RSI: 000000000000000c RDI: ffff8803fc92ea88
RBP: ffff8803fef77e30 R08: ffff8803fc87b941 R09: ffffffff82946d20
R10: ffff8803fef77d10 R11: 00000000000076fc R12: 0000000000000005
R13: ffff8803fab99c20 R14: 0000000000000001 R15: ffffffff816c6aee
FS: 00007f915a059700(0000) GS:ffff88041fb80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000030 CR3: 00000003fef39000 CR4: 00000000001406e0
Call Trace:
rxrpc_discard_prealloc+0x325/0x341
rxrpc_listen+0xf9/0x146
kernel_listen+0xb/0xd
afs_close_socket+0x3e/0x173 [kafs]
afs_exit+0x1f/0x57 [kafs]
SyS_delete_module+0x10f/0x19a
do_syscall_64+0x8a/0x149
entry_SYSCALL64_slow_path+0x25/0x25

Fixes: 2baec2c3f854 ("rxrpc: Support network namespacing")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f7aec129 14-Jun-2017 David Howells <dhowells@redhat.com>

rxrpc: Cache the congestion window setting

Cache the congestion window setting that was determined during a call's
transmission phase when it finishes so that it can be used by the next call
to the same peer, thereby shortcutting the slow-start algorithm.

The value is stored in the rxrpc_peer struct and is accessed without
locking. Each call takes the value that happens to be there when it starts
and just overwrites the value when it finishes.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4722974d 05-Jun-2017 David Howells <dhowells@redhat.com>

rxrpc: Implement service upgrade

Implement AuriStor's service upgrade facility. There are three problems
that this is meant to deal with:

(1) Various of the standard AFS RPC calls have IPv4 addresses in their
requests and/or replies - but there's no room for including IPv6
addresses.

(2) Definition of IPv6-specific RPC operations in the standard operation
sets has not yet been achieved.

(3) One could envision the creation a new service on the same port that as
the original service. The new service could implement improved
operations - and the client could try this first, falling back to the
original service if it's not there.

Unfortunately, certain servers ignore packets addressed to a service
they don't implement and don't respond in any way - not even with an
ABORT. This means that the client must then wait for the call timeout
to occur.

What service upgrade does is to see if the connection is marked as being
'upgradeable' and if so, change the service ID in the server and thus the
request and reply formats. Note that the upgrade isn't mandatory - a
server that supports only the original call set will ignore the upgrade
request.

In the protocol, the procedure is then as follows:

(1) To request an upgrade, the first DATA packet in a new connection must
have the userStatus set to 1 (this is normally 0). The userStatus
value is normally ignored by the server.

(2) If the server doesn't support upgrading, the reply packets will
contain the same service ID as for the first request packet.

(3) If the server does support upgrading, all future reply packets on that
connection will contain the new service ID and the new service ID will
be applied to *all* further calls on that connection as well.

(4) The RPC op used to probe the upgrade must take the same request data
as the shadow call in the upgrade set (but may return a different
reply). GetCapability RPC ops were added to all standard sets for
just this purpose. Ops where the request formats differ cannot be
used for probing.

(5) The client must wait for completion of the probe before sending any
further RPC ops to the same destination. It should then use the
service ID that recvmsg() reported back in all future calls.

(6) The shadow service must have call definitions for all the operation
IDs defined by the original service.


To support service upgrading, a server should:

(1) Call bind() twice on its AF_RXRPC socket before calling listen().
Each bind() should supply a different service ID, but the transport
addresses must be the same. This allows the server to receive
requests with either service ID.

(2) Enable automatic upgrading by calling setsockopt(), specifying
RXRPC_UPGRADEABLE_SERVICE and passing in a two-member array of
unsigned shorts as the argument:

unsigned short optval[2];

This specifies a pair of service IDs. They must be different and must
match the service IDs bound to the socket. Member 0 is the service ID
to upgrade from and member 1 is the service ID to upgrade to.

Signed-off-by: David Howells <dhowells@redhat.com>


# 28036f44 05-Jun-2017 David Howells <dhowells@redhat.com>

rxrpc: Permit multiple service binding

Permit bind() to be called on an AF_RXRPC socket more than once (currently
maximum twice) to bind multiple listening services to it. There are some
restrictions:

(1) All bind() calls involved must have a non-zero service ID.

(2) The service IDs must all be different.

(3) The rest of the address (notably the transport part) must be the same
in all (a single UDP socket is shared).

(4) This must be done before listen() or sendmsg() is called.

This allows someone to connect to the service socket with different service
IDs and lays the foundation for service upgrading.

The service ID used by an incoming call can be extracted from the msg_name
returned by recvmsg().

Signed-off-by: David Howells <dhowells@redhat.com>


# 2baec2c3 24-May-2017 David Howells <dhowells@redhat.com>

rxrpc: Support network namespacing

Support network namespacing in AF_RXRPC with the following changes:

(1) All the local endpoint, peer and call lists, locks, counters, etc. are
moved into the per-namespace record.

(2) All the connection tracking is moved into the per-namespace record
with the exception of the client connection ID tree, which is kept
global so that connection IDs are kept unique per-machine.

(3) Each namespace gets its own epoch. This allows each network namespace
to pretend to be a separate client machine.

(4) The /proc/net/rxrpc_xxx files are now called /proc/net/rxrpc/xxx and
the contents reflect the namespace.

fs/afs/ should be okay with this patch as it explicitly requires the current
net namespace to be init_net to permit a mount to proceed at the moment. It
will, however, need updating so that cells, IP addresses and DNS records are
per-namespace also.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3a92789a 06-Apr-2017 David Howells <dhowells@redhat.com>

rxrpc: Use negative error codes in rxrpc_call struct

Use negative error codes in struct rxrpc_call::error because that's what
the kernel normally deals with and to make the code consistent. We only
turn them positive when transcribing into a cmsg for userspace recvmsg.

Signed-off-by: David Howells <dhowells@redhat.com>


# 540b1c48 27-Feb-2017 David Howells <dhowells@redhat.com>

rxrpc: Fix deadlock between call creation and sendmsg/recvmsg

All the routines by which rxrpc is accessed from the outside are serialised
by means of the socket lock (sendmsg, recvmsg, bind,
rxrpc_kernel_begin_call(), ...) and this presents a problem:

(1) If a number of calls on the same socket are in the process of
connection to the same peer, a maximum of four concurrent live calls
are permitted before further calls need to wait for a slot.

(2) If a call is waiting for a slot, it is deep inside sendmsg() or
rxrpc_kernel_begin_call() and the entry function is holding the socket
lock.

(3) sendmsg() and recvmsg() or the in-kernel equivalents are prevented
from servicing the other calls as they need to take the socket lock to
do so.

(4) The socket is stuck until a call is aborted and makes its slot
available to the waiter.

Fix this by:

(1) Provide each call with a mutex ('user_mutex') that arbitrates access
by the users of rxrpc separately for each specific call.

(2) Make rxrpc_sendmsg() and rxrpc_recvmsg() unlock the socket as soon as
they've got a call and taken its mutex.

Note that I'm returning EWOULDBLOCK from recvmsg() if MSG_DONTWAIT is
set but someone else has the lock. Should I instead only return
EWOULDBLOCK if there's nothing currently to be done on a socket, and
sleep in this particular instance because there is something to be
done, but we appear to be blocked by the interrupt handler doing its
ping?

(3) Make rxrpc_new_client_call() unlock the socket after allocating a new
call, locking its user mutex and adding it to the socket's call tree.
The call is returned locked so that sendmsg() can add data to it
immediately.

From the moment the call is in the socket tree, it is subject to
access by sendmsg() and recvmsg() - even if it isn't connected yet.

(4) Lock new service calls in the UDP data_ready handler (in
rxrpc_new_incoming_call()) because they may already be in the socket's
tree and the data_ready handler makes them live immediately if a user
ID has already been preassigned.

Note that the new call is locked before any notifications are sent
that it is live, so doing mutex_trylock() *ought* to always succeed.
Userspace is prevented from doing sendmsg() on calls that are in a
too-early state in rxrpc_do_sendmsg().

(5) Make rxrpc_new_incoming_call() return the call with the user mutex
held so that a ping can be scheduled immediately under it.

Note that it might be worth moving the ping call into
rxrpc_new_incoming_call() and then we can drop the mutex there.

(6) Make rxrpc_accept_call() take the lock on the call it is accepting and
release the socket after adding the call to the socket's tree. This
is slightly tricky as we've dequeued the call by that point and have
to requeue it.

Note that requeuing emits a trace event.

(7) Make rxrpc_kernel_send_data() and rxrpc_kernel_recv_data() take the
new mutex immediately and don't bother with the socket mutex at all.

This patch has the nice bonus that calls on the same socket are now to some
extent parallelisable.

Note that we might want to move rxrpc_service_prealloc() calls out from the
socket lock and give it its own lock, so that we don't hang progress in
other calls because we're waiting for the allocator.

We probably also want to avoid calling rxrpc_notify_socket() from within
the socket lock (rxrpc_accept_call()).

Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Marc Dionne <marc.c.dionne@auristor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 210f0353 05-Jan-2017 David Howells <dhowells@redhat.com>

rxrpc: Allow listen(sock, 0) to be used to disable listening

Allow listen() with a backlog of 0 to be used to disable listening on an
AF_RXRPC socket. This also releases any preallocation, thereby making it
easier for a kernel service to account for all allocated call structures
when shutting down the service.

The socket cannot thereafter have listening reenabled, but must rather be
closed and reopened.

Signed-off-by: David Howells <dhowells@redhat.com>


# 26cb02aa 06-Oct-2016 David Howells <dhowells@redhat.com>

rxrpc: Fix warning by splitting rxrpc_send_call_packet()

Split rxrpc_send_data_packet() to separate ACK generation (which is more
complicated) from ABORT generation. This simplifies the code a bit and
fixes the following warning:

In file included from ../net/rxrpc/output.c:20:0:
net/rxrpc/output.c: In function 'rxrpc_send_call_packet':
net/rxrpc/ar-internal.h:1187:27: error: 'top' may be used uninitialized in this function [-Werror=maybe-uninitialized]
net/rxrpc/output.c:103:24: note: 'top' was declared here
net/rxrpc/output.c:225:25: error: 'hard_ack' may be used uninitialized in this function [-Werror=maybe-uninitialized]

Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David Howells <dhowells@redhat.com>


# 7212a57e 06-Oct-2016 David Howells <dhowells@redhat.com>

rxrpc: Fix oops on incoming call to serviceless endpoint

If an call comes in to a local endpoint that isn't listening for any
incoming calls at the moment, an oops will happen. We need to check that
the local endpoint's service pointer isn't NULL before we dereference it.

Signed-off-by: David Howells <dhowells@redhat.com>


# 1e9e5c95 29-Sep-2016 David Howells <dhowells@redhat.com>

rxrpc: Reduce the rxrpc_local::services list to a pointer

Reduce the rxrpc_local::services list to just a pointer as we don't permit
multiple service endpoints to bind to a single transport endpoints (this is
excluded by rxrpc_lookup_local()).

The reason we don't allow this is that if you send a request to an AFS
filesystem service, it will try to talk back to your cache manager on the
port you sent from (this is how file change notifications are handled). To
prevent someone from stealing your CM callbacks, we don't let AF_RXRPC
sockets share a UDP socket if at least one of them has a service bound.

Signed-off-by: David Howells <dhowells@redhat.com>


# 58dc63c9 17-Sep-2016 David Howells <dhowells@redhat.com>

rxrpc: Add a tracepoint to follow packets in the Rx buffer

Add a tracepoint to follow the life of packets that get added to a call's
receive buffer.

Signed-off-by: David Howells <dhowells@redhat.com>


# 363deeab 17-Sep-2016 David Howells <dhowells@redhat.com>

rxrpc: Add connection tracepoint and client conn state tracepoint

Add a pair of tracepoints, one to track rxrpc_connection struct ref
counting and the other to track the client connection cache state.

Signed-off-by: David Howells <dhowells@redhat.com>


# e6f3afb3 17-Sep-2016 David Howells <dhowells@redhat.com>

rxrpc: Record calls that need to be accepted

Record calls that need to be accepted using sk_acceptq_added() otherwise
the backlog counter goes negative because sk_acceptq_removed() is called.
This causes the preallocator to malfunction.

Calls that are preaccepted by AFS within the kernel aren't affected by
this.

Signed-off-by: David Howells <dhowells@redhat.com>


# 3432a757 13-Sep-2016 David Howells <dhowells@redhat.com>

rxrpc: Fix prealloc refcounting

The preallocated call buffer holds a ref on the calls within that buffer.
The ref was being released in the wrong place - it worked okay for incoming
calls to the AFS cache manager service, but doesn't work right for incoming
calls to a userspace service.

Instead of releasing an extra ref service calls in rxrpc_release_call(),
the ref needs to be released during the acceptance/rejectance process. To
this end:

(1) The prealloc ref is now normally released during
rxrpc_new_incoming_call().

(2) For preallocated kernel API calls, the kernel API's ref needs to be
released when the call is discarded on socket close.

(3) We shouldn't take a second ref in rxrpc_accept_call().

(4) rxrpc_recvmsg_new_call() needs to get a ref of its own when it adds
the call to the to_be_accepted socket queue.

In doing (4) above, we would prefer not to put the call's refcount down to
0 as that entails doing cleanup in softirq context, but it's unlikely as
there are several refs held elsewhere, at least one of which must be put by
someone in process context calling rxrpc_release_call(). However, it's not
a problem if we do have to do that.

Signed-off-by: David Howells <dhowells@redhat.com>


# cbd00891 13-Sep-2016 David Howells <dhowells@redhat.com>

rxrpc: Adjust the call ref tracepoint to show kernel API refs

Adjust the call ref tracepoint to show references held on a call by the
kernel API separately as much as possible and add an additional trace to at
the allocation point from the preallocation buffer for an incoming call.

Note that this doesn't show the allocation of a client call for the kernel
separately at the moment.

Signed-off-by: David Howells <dhowells@redhat.com>


# b25de360 13-Sep-2016 David Howells <dhowells@redhat.com>

rxrpc: Add missing unlock in rxrpc_call_accept()

Add a missing unlock in rxrpc_call_accept() in the path taken if there's no
call to wake up.

Signed-off-by: David Howells <dhowells@redhat.com>


# 248f219c 08-Sep-2016 David Howells <dhowells@redhat.com>

rxrpc: Rewrite the data and ack handling code

Rewrite the data and ack handling code such that:

(1) Parsing of received ACK and ABORT packets and the distribution and the
filing of DATA packets happens entirely within the data_ready context
called from the UDP socket. This allows us to process and discard ACK
and ABORT packets much more quickly (they're no longer stashed on a
queue for a background thread to process).

(2) We avoid calling skb_clone(), pskb_pull() and pskb_trim(). We instead
keep track of the offset and length of the content of each packet in
the sk_buff metadata. This means we don't do any allocation in the
receive path.

(3) Jumbo DATA packet parsing is now done in data_ready context. Rather
than cloning the packet once for each subpacket and pulling/trimming
it, we file the packet multiple times with an annotation for each
indicating which subpacket is there. From that we can directly
calculate the offset and length.

(4) A call's receive queue can be accessed without taking locks (memory
barriers do have to be used, though).

(5) Incoming calls are set up from preallocated resources and immediately
made live. They can than have packets queued upon them and ACKs
generated. If insufficient resources exist, DATA packet #1 is given a
BUSY reply and other DATA packets are discarded).

(6) sk_buffs no longer take a ref on their parent call.

To make this work, the following changes are made:

(1) Each call's receive buffer is now a circular buffer of sk_buff
pointers (rxtx_buffer) rather than a number of sk_buff_heads spread
between the call and the socket. This permits each sk_buff to be in
the buffer multiple times. The receive buffer is reused for the
transmit buffer.

(2) A circular buffer of annotations (rxtx_annotations) is kept parallel
to the data buffer. Transmission phase annotations indicate whether a
buffered packet has been ACK'd or not and whether it needs
retransmission.

Receive phase annotations indicate whether a slot holds a whole packet
or a jumbo subpacket and, if the latter, which subpacket. They also
note whether the packet has been decrypted in place.

(3) DATA packet window tracking is much simplified. Each phase has just
two numbers representing the window (rx_hard_ack/rx_top and
tx_hard_ack/tx_top).

The hard_ack number is the sequence number before base of the window,
representing the last packet the other side says it has consumed.
hard_ack starts from 0 and the first packet is sequence number 1.

The top number is the sequence number of the highest-numbered packet
residing in the buffer. Packets between hard_ack+1 and top are
soft-ACK'd to indicate they've been received, but not yet consumed.

Four macros, before(), before_eq(), after() and after_eq() are added
to compare sequence numbers within the window. This allows for the
top of the window to wrap when the hard-ack sequence number gets close
to the limit.

Two flags, RXRPC_CALL_RX_LAST and RXRPC_CALL_TX_LAST, are added also
to indicate when rx_top and tx_top point at the packets with the
LAST_PACKET bit set, indicating the end of the phase.

(4) Calls are queued on the socket 'receive queue' rather than packets.
This means that we don't need have to invent dummy packets to queue to
indicate abnormal/terminal states and we don't have to keep metadata
packets (such as ABORTs) around

(5) The offset and length of a (sub)packet's content are now passed to
the verify_packet security op. This is currently expected to decrypt
the packet in place and validate it.

However, there's now nowhere to store the revised offset and length of
the actual data within the decrypted blob (there may be a header and
padding to skip) because an sk_buff may represent multiple packets, so
a locate_data security op is added to retrieve these details from the
sk_buff content when needed.

(6) recvmsg() now has to handle jumbo subpackets, where each subpacket is
individually secured and needs to be individually decrypted. The code
to do this is broken out into rxrpc_recvmsg_data() and shared with the
kernel API. It now iterates over the call's receive buffer rather
than walking the socket receive queue.

Additional changes:

(1) The timers are condensed to a single timer that is set for the soonest
of three timeouts (delayed ACK generation, DATA retransmission and
call lifespan).

(2) Transmission of ACK and ABORT packets is effected immediately from
process-context socket ops/kernel API calls that cause them instead of
them being punted off to a background work item. The data_ready
handler still has to defer to the background, though.

(3) A shutdown op is added to the AF_RXRPC socket so that the AFS
filesystem can shut down the socket and flush its own work items
before closing the socket to deal with any in-progress service calls.

Future additional changes that will need to be considered:

(1) Make sure that a call doesn't hog the front of the queue by receiving
data from the network as fast as userspace is consuming it to the
exclusion of other calls.

(2) Transmit delayed ACKs from within recvmsg() when we've consumed
sufficiently more packets to avoid the background work item needing to
run.

Signed-off-by: David Howells <dhowells@redhat.com>


# 00e90712 08-Sep-2016 David Howells <dhowells@redhat.com>

rxrpc: Preallocate peers, conns and calls for incoming service requests

Make it possible for the data_ready handler called from the UDP transport
socket to completely instantiate an rxrpc_call structure and make it
immediately live by preallocating all the memory it might need. The idea
is to cut out the background thread usage as much as possible.

[Note that the preallocated structs are not actually used in this patch -
that will be done in a future patch.]

If insufficient resources are available in the preallocation buffers, it
will be possible to discard the DATA packet in the data_ready handler or
schedule a BUSY packet without the need to schedule an attempt at
allocation in a background thread.

To this end:

(1) Preallocate rxrpc_peer, rxrpc_connection and rxrpc_call structs to a
maximum number each of the listen backlog size. The backlog size is
limited to a maxmimum of 32. Only this many of each can be in the
preallocation buffer.

(2) For userspace sockets, the preallocation is charged initially by
listen() and will be recharged by accepting or rejecting pending
new incoming calls.

(3) For kernel services {,re,dis}charging of the preallocation buffers is
handled manually. Two notifier callbacks have to be provided before
kernel_listen() is invoked:

(a) An indication that a new call has been instantiated. This can be
used to trigger background recharging.

(b) An indication that a call is being discarded. This is used when
the socket is being released.

A function, rxrpc_kernel_charge_accept() is called by the kernel
service to preallocate a single call. It should be passed the user ID
to be used for that call and a callback to associate the rxrpc call
with the kernel service's side of the ID.

(4) Discard the preallocation when the socket is closed.

(5) Temporarily bump the refcount on the call allocated in
rxrpc_incoming_call() so that rxrpc_release_call() can ditch the
preallocation ref on service calls unconditionally. This will no
longer be necessary once the preallocation is used.

Note that this does not yet control the number of active service calls on a
client - that will come in a later patch.

A future development would be to provide a setsockopt() call that allows a
userspace server to manually charge the preallocation buffer. This would
allow user call IDs to be provided in advance and the awkward manual accept
stage to be bypassed.

Signed-off-by: David Howells <dhowells@redhat.com>


# de8d6c74 08-Sep-2016 David Howells <dhowells@redhat.com>

rxrpc: Convert rxrpc_local::services to an hlist

Convert the rxrpc_local::services list to an hlist so that it can be
accessed under RCU conditions more readily.

Signed-off-by: David Howells <dhowells@redhat.com>


# 8d94aa38 07-Sep-2016 David Howells <dhowells@redhat.com>

rxrpc: Calls shouldn't hold socket refs

rxrpc calls shouldn't hold refs on the sock struct. This was done so that
the socket wouldn't go away whilst the call was in progress, such that the
call could reach the socket's queues.

However, we can mark the socket as requiring an RCU release and rely on the
RCU read lock.

To make this work, we do:

(1) rxrpc_release_call() removes the call's call user ID. This is now
only called from socket operations and not from the call processor:

rxrpc_accept_call() / rxrpc_kernel_accept_call()
rxrpc_reject_call() / rxrpc_kernel_reject_call()
rxrpc_kernel_end_call()
rxrpc_release_calls_on_socket()
rxrpc_recvmsg()

Though it is also called in the cleanup path of
rxrpc_accept_incoming_call() before we assign a user ID.

(2) Pass the socket pointer into rxrpc_release_call() rather than getting
it from the call so that we can get rid of uninitialised calls.

(3) Fix call processor queueing to pass a ref to the work queue and to
release that ref at the end of the processor function (or to pass it
back to the work queue if we have to requeue).

(4) Skip out of the call processor function asap if the call is complete
and don't requeue it if the call is complete.

(5) Clean up the call immediately that the refcount reaches 0 rather than
trying to defer it. Actual deallocation is deferred to RCU, however.

(6) Don't hold socket refs for allocated calls.

(7) Use the RCU read lock when queueing a message on a socket and treat
the call's socket pointer according to RCU rules and check it for
NULL.

We also need to use the RCU read lock when viewing a call through
procfs.

(8) Transmit the final ACK/ABORT to a client call in rxrpc_release_call()
if this hasn't been done yet so that we can then disconnect the call.
Once the call is disconnected, it won't have any access to the
connection struct and the UDP socket for the call work processor to be
able to send the ACK. Terminal retransmission will be handled by the
connection processor.

(9) Release all calls immediately on the closing of a socket rather than
trying to defer this. Incomplete calls will be aborted.

The call refcount model is much simplified. Refs are held on the call by:

(1) A socket's user ID tree.

(2) A socket's incoming call secureq and acceptq.

(3) A kernel service that has a call in progress.

(4) A queued call work processor. We have to take care to put any call
that we failed to queue.

(5) sk_buffs on a socket's receive queue. A future patch will get rid of
this.

Whilst we're at it, we can do:

(1) Get rid of the RXRPC_CALL_EV_RELEASE event. Release is now done
entirely from the socket routines and never from the call's processor.

(2) Get rid of the RXRPC_CALL_DEAD state. Calls now end in the
RXRPC_CALL_COMPLETE state.

(3) Get rid of the rxrpc_call::destroyer work item. Calls are now torn
down when their refcount reaches 0 and then handed over to RCU for
final cleanup.

(4) Get rid of the rxrpc_call::deadspan timer. Calls are cleaned up
immediately they're finished with and don't hang around.
Post-completion retransmission is handled by the connection processor
once the call is disconnected.

(5) Get rid of the dead call expiry setting as there's no longer a timer
to set.

(6) rxrpc_destroy_all_calls() can just check that the call list is empty.

Signed-off-by: David Howells <dhowells@redhat.com>


# fff72429 07-Sep-2016 David Howells <dhowells@redhat.com>

rxrpc: Improve the call tracking tracepoint

Improve the call tracking tracepoint by showing more differentiation
between some of the put and get events, including:

(1) Getting and putting refs for the socket call user ID tree.

(2) Getting and putting refs for queueing and failing to queue the call
processor work item.

Note that these aren't necessarily used in this patch, but will be taken
advantage of in future patches.

An enum is added for the event subtype numbers rather than coding them
directly as decimal numbers and a table of 3-letter strings is provided
rather than a sequence of ?: operators.

Signed-off-by: David Howells <dhowells@redhat.com>


# d001648e 30-Aug-2016 David Howells <dhowells@redhat.com>

rxrpc: Don't expose skbs to in-kernel users [ver #2]

Don't expose skbs to in-kernel users, such as the AFS filesystem, but
instead provide a notification hook the indicates that a call needs
attention and another that indicates that there's a new call to be
collected.

This makes the following possibilities more achievable:

(1) Call refcounting can be made simpler if skbs don't hold refs to calls.

(2) skbs referring to non-data events will be able to be freed much sooner
rather than being queued for AFS to pick up as rxrpc_kernel_recv_data
will be able to consult the call state.

(3) We can shortcut the receive phase when a call is remotely aborted
because we don't have to go through all the packets to get to the one
cancelling the operation.

(4) It makes it easier to do encryption/decryption directly between AFS's
buffers and sk_buffs.

(5) Encryption/decryption can more easily be done in the AFS's thread
contexts - usually that of the userspace process that issued a syscall
- rather than in one of rxrpc's background threads on a workqueue.

(6) AFS will be able to wait synchronously on a call inside AF_RXRPC.

To make this work, the following interface function has been added:

int rxrpc_kernel_recv_data(
struct socket *sock, struct rxrpc_call *call,
void *buffer, size_t bufsize, size_t *_offset,
bool want_more, u32 *_abort_code);

This is the recvmsg equivalent. It allows the caller to find out about the
state of a specific call and to transfer received data into a buffer
piecemeal.

afs_extract_data() and rxrpc_kernel_recv_data() now do all the extraction
logic between them. They don't wait synchronously yet because the socket
lock needs to be dealt with.

Five interface functions have been removed:

rxrpc_kernel_is_data_last()
rxrpc_kernel_get_abort_code()
rxrpc_kernel_get_error_number()
rxrpc_kernel_free_skb()
rxrpc_kernel_data_consumed()

As a temporary hack, sk_buffs going to an in-kernel call are queued on the
rxrpc_call struct (->knlrecv_queue) rather than being handed over to the
in-kernel user. To process the queue internally, a temporary function,
temp_deliver_data() has been added. This will be replaced with common code
between the rxrpc_recvmsg() path and the kernel_rxrpc_recv_data() path in a
future patch.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e34d4234 30-Aug-2016 David Howells <dhowells@redhat.com>

rxrpc: Trace rxrpc_call usage

Add a trace event for debuging rxrpc_call struct usage.

Signed-off-by: David Howells <dhowells@redhat.com>


# f5c17aae 30-Aug-2016 David Howells <dhowells@redhat.com>

rxrpc: Calls should only have one terminal state

Condense the terminal states of a call state machine to a single state,
plus a separate completion type value. The value is then set, along with
error and abort code values, only when the call is transitioned to the
completion state.

Helpers are provided to simplify this.

Signed-off-by: David Howells <dhowells@redhat.com>


# df844fd4 23-Aug-2016 David Howells <dhowells@redhat.com>

rxrpc: Use a tracepoint for skb accounting debugging

Use a tracepoint to log various skb accounting points to help in debugging
refcounting errors.

Signed-off-by: David Howells <dhowells@redhat.com>


# 372ee163 03-Aug-2016 David Howells <dhowells@redhat.com>

rxrpc: Fix races between skb free, ACK generation and replying

Inside the kafs filesystem it is possible to occasionally have a call
processed and terminated before we've had a chance to check whether we need
to clean up the rx queue for that call because afs_send_simple_reply() ends
the call when it is done, but this is done in a workqueue item that might
happen to run to completion before afs_deliver_to_call() completes.

Further, it is possible for rxrpc_kernel_send_data() to be called to send a
reply before the last request-phase data skb is released. The rxrpc skb
destructor is where the ACK processing is done and the call state is
advanced upon release of the last skb. ACK generation is also deferred to
a work item because it's possible that the skb destructor is not called in
a context where kernel_sendmsg() can be invoked.

To this end, the following changes are made:

(1) kernel_rxrpc_data_consumed() is added. This should be called whenever
an skb is emptied so as to crank the ACK and call states. This does
not release the skb, however. kernel_rxrpc_free_skb() must now be
called to achieve that. These together replace
rxrpc_kernel_data_delivered().

(2) kernel_rxrpc_data_consumed() is wrapped by afs_data_consumed().

This makes afs_deliver_to_call() easier to work as the skb can simply
be discarded unconditionally here without trying to work out what the
return value of the ->deliver() function means.

The ->deliver() functions can, via afs_data_complete(),
afs_transfer_reply() and afs_extract_data() mark that an skb has been
consumed (thereby cranking the state) without the need to
conditionally free the skb to make sure the state is correct on an
incoming call for when the call processor tries to send the reply.

(3) rxrpc_recvmsg() now has to call kernel_rxrpc_data_consumed() when it
has finished with a packet and MSG_PEEK isn't set.

(4) rxrpc_packet_destructor() no longer calls rxrpc_hard_ACK_data().

Because of this, we no longer need to clear the destructor and put the
call before we free the skb in cases where we don't want the ACK/call
state to be cranked.

(5) The ->deliver() call-type callbacks are made to return -EAGAIN rather
than 0 if they expect more data (afs_extract_data() returns -EAGAIN to
the delivery function already), and the caller is now responsible for
producing an abort if that was the last packet.

(6) There are many bits of unmarshalling code where:

ret = afs_extract_data(call, skb, last, ...);
switch (ret) {
case 0: break;
case -EAGAIN: return 0;
default: return ret;
}

is to be found. As -EAGAIN can now be passed back to the caller, we
now just return if ret < 0:

ret = afs_extract_data(call, skb, last, ...);
if (ret < 0)
return ret;

(7) Checks for trailing data and empty final data packets has been
consolidated as afs_data_complete(). So:

if (skb->len > 0)
return -EBADMSG;
if (!last)
return 0;

becomes:

ret = afs_data_complete(call, skb, last);
if (ret < 0)
return ret;

(8) afs_transfer_reply() now checks the amount of data it has against the
amount of data desired and the amount of data in the skb and returns
an error to induce an abort if we don't get exactly what we want.

Without these changes, the following oops can occasionally be observed,
particularly if some printks are inserted into the delivery path:

general protection fault: 0000 [#1] SMP
Modules linked in: kafs(E) af_rxrpc(E) [last unloaded: af_rxrpc]
CPU: 0 PID: 1305 Comm: kworker/u8:3 Tainted: G E 4.7.0-fsdevel+ #1303
Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
Workqueue: kafsd afs_async_workfn [kafs]
task: ffff88040be041c0 ti: ffff88040c070000 task.ti: ffff88040c070000
RIP: 0010:[<ffffffff8108fd3c>] [<ffffffff8108fd3c>] __lock_acquire+0xcf/0x15a1
RSP: 0018:ffff88040c073bc0 EFLAGS: 00010002
RAX: 6b6b6b6b6b6b6b6b RBX: 0000000000000000 RCX: ffff88040d29a710
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88040d29a710
RBP: ffff88040c073c70 R08: 0000000000000001 R09: 0000000000000001
R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: ffff88040be041c0 R15: ffffffff814c928f
FS: 0000000000000000(0000) GS:ffff88041fa00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fa4595f4750 CR3: 0000000001c14000 CR4: 00000000001406f0
Stack:
0000000000000006 000000000be04930 0000000000000000 ffff880400000000
ffff880400000000 ffffffff8108f847 ffff88040be041c0 ffffffff81050446
ffff8803fc08a920 ffff8803fc08a958 ffff88040be041c0 ffff88040c073c38
Call Trace:
[<ffffffff8108f847>] ? mark_held_locks+0x5e/0x74
[<ffffffff81050446>] ? __local_bh_enable_ip+0x9b/0xa1
[<ffffffff8108f9ca>] ? trace_hardirqs_on_caller+0x16d/0x189
[<ffffffff810915f4>] lock_acquire+0x122/0x1b6
[<ffffffff810915f4>] ? lock_acquire+0x122/0x1b6
[<ffffffff814c928f>] ? skb_dequeue+0x18/0x61
[<ffffffff81609dbf>] _raw_spin_lock_irqsave+0x35/0x49
[<ffffffff814c928f>] ? skb_dequeue+0x18/0x61
[<ffffffff814c928f>] skb_dequeue+0x18/0x61
[<ffffffffa009aa92>] afs_deliver_to_call+0x344/0x39d [kafs]
[<ffffffffa009ab37>] afs_process_async_call+0x4c/0xd5 [kafs]
[<ffffffffa0099e9c>] afs_async_workfn+0xe/0x10 [kafs]
[<ffffffff81063a3a>] process_one_work+0x29d/0x57c
[<ffffffff81064ac2>] worker_thread+0x24a/0x385
[<ffffffff81064878>] ? rescuer_thread+0x2d0/0x2d0
[<ffffffff810696f5>] kthread+0xf3/0xfb
[<ffffffff8160a6ff>] ret_from_fork+0x1f/0x40
[<ffffffff81069602>] ? kthread_create_on_node+0x1cf/0x1cf

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d991b4a3 29-Jun-2016 David Howells <dhowells@redhat.com>

rxrpc: Move peer lookup from call-accept to new-incoming-conn

Move the lookup of a peer from a call that's being accepted into the
function that creates a new incoming connection. This will allow us to
avoid incrementing the peer's usage count in some cases in future.

Note that I haven't bother to integrate rxrpc_get_addr_from_skb() with
rxrpc_extract_addr_from_skb() as I'm going to delete the former in the very
near future.

Signed-off-by: David Howells <dhowells@redhat.com>


# 2c4579e4 27-Jun-2016 David Howells <dhowells@redhat.com>

rxrpc: Move usage count getting into rxrpc_queue_conn()

Rather than calling rxrpc_get_connection() manually before calling
rxrpc_queue_conn(), do it inside the queue wrapper.

This allows us to do some important fixes:

(1) If the usage count is 0, do nothing. This prevents connections from
being reanimated once they're dead.

(2) If rxrpc_queue_work() fails because the work item is already queued,
retract the usage count increment which would otherwise be lost.

(3) Don't take a ref on the connection in the work function. By passing
the ref through the work item, this is unnecessary. Doing it in the
work function is too late anyway. Previously, connection-directed
packets held a ref on the connection, but that's not really the best
idea.

And another useful changes:

(*) Don't need to take a refcount on the connection in the data_ready
handler unless we invoke the connection's work item. We're using RCU
there so that's otherwise redundant.

Signed-off-by: David Howells <dhowells@redhat.com>


# bba304db 27-Jun-2016 David Howells <dhowells@redhat.com>

rxrpc: Turn connection #defines into enums and put outside struct def

Turn the connection event and state #define lists into enums and move
outside of the struct definition.

Whilst we're at it, change _SERVER to _SERVICE in those identifiers and add
EV_ into the event name to distinguish them from flags and states.

Also add a symbol indicating the number of states and use that in the state
text array.

Signed-off-by: David Howells <dhowells@redhat.com>


# aa390bbe 17-Jun-2016 David Howells <dhowells@redhat.com>

rxrpc: Kill off the rxrpc_transport struct

The rxrpc_transport struct is now redundant, given that the rxrpc_peer
struct is now per peer port rather than per peer host, so get rid of it.

Service connection lists are transferred to the rxrpc_peer struct, as is
the conn_lock. Previous patches moved the client connection handling out
of the rxrpc_transport struct and discarded the connection bundling code.

Signed-off-by: David Howells <dhowells@redhat.com>


# 5627cc8b 04-Apr-2016 David Howells <dhowells@redhat.com>

rxrpc: Provide more refcount helper functions

Provide refcount helper functions for connections so that the code doesn't
touch local or connection usage counts directly.

Also make it such that local and peer put functions can take a NULL
pointer.

Signed-off-by: David Howells <dhowells@redhat.com>


# 42886ffe 16-Jun-2016 David Howells <dhowells@redhat.com>

rxrpc: Pass sk_buff * rather than rxrpc_host_header * to functions

Pass a pointer to struct sk_buff rather than struct rxrpc_host_header to
functions so that they can in the future get at transport protocol parameters
rather than just RxRPC parameters.

Signed-off-by: David Howells <dhowells@redhat.com>


# 0e4699e4 18-Jun-2016 Dan Carpenter <dan.carpenter@oracle.com>

rxrpc: checking for IS_ERR() instead of NULL

rxrpc_lookup_peer_rcu() and rxrpc_lookup_peer() return NULL on error, never
error pointers, so IS_ERR() can't be used.

Fix three callers of those functions.

Fixes: be6e6707f6ee ('rxrpc: Rework peer object handling to use hash table and RCU')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Howells <dhowells@redhat.com>


# 4f95dd78 04-Apr-2016 David Howells <dhowells@redhat.com>

rxrpc: Rework local endpoint management

Rework the local RxRPC endpoint management.

Local endpoint objects are maintained in a flat list as before. This
should be okay as there shouldn't be more than one per open AF_RXRPC socket
(there can be fewer as local endpoints can be shared if their local service
ID is 0 and they share the same local transport parameters).

Changes:

(1) Local endpoints may now only be shared if they have local service ID 0
(ie. they're not being used for listening).

This prevents a scenario where process A is listening of the Cache
Manager port and process B contacts a fileserver - which may then
attempt to send CM requests back to B. But if A and B are sharing a
local endpoint, A will get the CM requests meant for B.

(2) We use a mutex to handle lookups and don't provide RCU-only lookups
since we only expect to access the list when opening a socket or
destroying an endpoint.

The local endpoint object is pointed to by the transport socket's
sk_user_data for the life of the transport socket - allowing us to
refer to it directly from the sk_data_ready and sk_error_report
callbacks.

(3) atomic_inc_not_zero() now exists and can be used to only share a local
endpoint if the last reference hasn't yet gone.

(4) We can remove rxrpc_local_lock - a spinlock that had to be taken with
BH processing disabled given that we assume sk_user_data won't change
under us.

(5) The transport socket is shut down before we clear the sk_user_data
pointer so that we can be sure that the transport socket's callbacks
won't be invoked once the RCU destruction is scheduled.

(6) Local endpoints have a work item that handles both destruction and
event processing. The means that destruction doesn't then need to
wait for event processing. The event queues can then be cleared after
the transport socket is shut down.

(7) Local endpoints are no longer available for resurrection beyond the
life of the sockets that had them open. As soon as their last ref
goes, they are scheduled for destruction and may not have their usage
count moved from 0.

Signed-off-by: David Howells <dhowells@redhat.com>


# be6e6707 04-Apr-2016 David Howells <dhowells@redhat.com>

rxrpc: Rework peer object handling to use hash table and RCU

Rework peer object handling to use a hash table instead of a flat list and
to use RCU. Peer objects are no longer destroyed by passing them to a
workqueue to process, but rather are just passed to the RCU garbage
collector as kfree'able objects.

The hash function uses the local endpoint plus all the components of the
remote address, except for the RxRPC service ID. Peers thus represent a
UDP port on the remote machine as contacted by a UDP port on this machine.

The RCU read lock is used to handle non-creating lookups so that they can
be called from bottom half context in the sk_error_report handler without
having to lock the hash table against modification.
rxrpc_lookup_peer_rcu() *does* take a reference on the peer object as in
the future, this will be passed to a work item for error distribution in
the error_report path and this function will cease being used in the
data_ready path.

Creating lookups are done under spinlock rather than mutex as they might be
set up due to an external stimulus if the local endpoint is a server.

Captured network error messages (ICMP) are handled with respect to this
struct and MTU size and RTT are cached here.

Signed-off-by: David Howells <dhowells@redhat.com>


# 8c3e34a4 12-Jun-2016 David Howells <dhowells@redhat.com>

rxrpc: Rename files matching ar-*.c to git rid of the "ar-" prefix

Rename files matching net/rxrpc/ar-*.c to get rid of the "ar-" prefix.
This will aid splitting those files by making easier to come up with new
names.

Note that the not all files are simply renamed from ar-X.c to X.c. The
following exceptions are made:

(*) ar-call.c -> call_object.c
ar-ack.c -> call_event.c

call_object.c is going to contain the core of the call object
handling. Call event handling is all going to be in call_event.c.

(*) ar-accept.c -> call_accept.c

Incoming call handling is going to be here.

(*) ar-connection.c -> conn_object.c
ar-connevent.c -> conn_event.c

The former file is going to have the basic connection object handling,
but there will likely be some differentiation between client
connections and service connections in additional files later. The
latter file will have all the connection-level event handling.

(*) ar-local.c -> local_object.c

This will have the local endpoint object handling code. The local
endpoint event handling code will later be split out into
local_event.c.

(*) ar-peer.c -> peer_object.c

This will have the peer endpoint object handling code. Peer event
handling code will be placed in peer_event.c (for the moment, there is
none).

(*) ar-error.c -> peer_event.c

This will become the peer event handling code, though for the moment
it's actually driven from the local endpoint's perspective.

Note that I haven't renamed ar-transport.c to transport_object.c as the
intention is to delete it when the rxrpc_transport struct is excised.

The only file that actually has its contents changed is net/rxrpc/Makefile.

net/rxrpc/ar-internal.h will need its section marker comments updating, but
I'll do that in a separate patch to make it easier for git to follow the
history across the rename. I may also want to rename ar-internal.h at some
point - but that would mean updating all the #includes and I'd rather do
that in a separate step.

Signed-off-by: David Howells <dhowells@redhat.com.