#
49489bb0 |
|
29-Jan-2024 |
David Howells <dhowells@redhat.com> |
rxrpc: Do zerocopy using MSG_SPLICE_PAGES and page frags Switch from keeping the transmission buffers in the rxrpc_txbuf struct and allocated from the slab, to allocating them using page fragment allocators (which uses raw pages), thereby allowing them to be passed to MSG_SPLICE_PAGES and avoid copying into the UDP buffers. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: "David S. Miller" <davem@davemloft.net> cc: Eric Dumazet <edumazet@google.com> cc: Jakub Kicinski <kuba@kernel.org> cc: Paolo Abeni <pabeni@redhat.com> cc: linux-afs@lists.infradead.org cc: netdev@vger.kernel.org
|
#
61e4a866 |
|
26-Oct-2023 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix two connection reaping bugs Fix two connection reaping bugs: (1) rxrpc_connection_expiry is in units of seconds, so rxrpc_disconnect_call() needs to multiply it by HZ when adding it to jiffies. (2) rxrpc_client_conn_reap_timeout() should set RXRPC_CLIENT_REAP_TIMER if local->kill_all_client_conns is clear, not if it is set (in which case we don't need the timer). Without this, old client connections don't get cleaned up until the local endpoint is cleaned up. Fixes: 5040011d073d ("rxrpc: Make the local endpoint hold a ref on a connected call") Fixes: 0d6bf319bc5a ("rxrpc: Move the client conn cache management to the I/O thread") Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/783911.1698364174@warthog.procyon.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
9d35d880 |
|
19-Oct-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Move client call connection to the I/O thread Move the connection setup of client calls to the I/O thread so that a whole load of locking and barrierage can be eliminated. This necessitates the app thread waiting for connection to complete before it can begin encrypting data. This also completes the fix for a race that exists between call connection and call disconnection whereby the data transmission code adds the call to the peer error distribution list after the call has been disconnected (say by the rxrpc socket getting closed). The fix is to complete the process of moving call connection, data transmission and call disconnection into the I/O thread and thus forcibly serialising them. Note that the issue may predate the overhaul to an I/O thread model that were included in the merge window for v6.2, but the timing is very much changed by the change given below. Fixes: cf37b5987508 ("rxrpc: Move DATA transmission into call processor work item") Reported-by: syzbot+c22650d2844392afdcfd@syzkaller.appspotmail.com Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
0d6bf319 |
|
02-Nov-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Move the client conn cache management to the I/O thread Move the management of the client connection cache to the I/O thread rather than managing it from the namespace as an aggregate across all the local endpoints within the namespace. This will allow a load of locking to be got rid of in a future patch as only the I/O thread will be looking at the this. The downside is that the total number of cached connections on the system can get higher because the limit is now per-local rather than per-netns. We can, however, keep the number of client conns in use across the entire netfs and use that to reduce the expiration time of idle connection. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
1bab27af |
|
21-Oct-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Set up a connection bundle from a call, not rxrpc_conn_parameters Use the information now stored in struct rxrpc_call to configure the connection bundle and thence the connection, rather than using the rxrpc_conn_parameters struct. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
f06cb291 |
|
20-Oct-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Make the set of connection IDs per local endpoint Make the set of connection IDs per local endpoint so that endpoints don't cause each other's connections to get dismissed. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
f2cce89a |
|
20-Oct-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Implement a mechanism to send an event notification to a connection Provide a means by which an event notification can be sent to a connection through such that the I/O thread can pick it up and handle it rather than doing it in a separate workqueue. This is then used to move the deferred final ACK of a call into the I/O thread rather than a separate work queue as part of the drive to do all transmission from the I/O thread. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
5040011d |
|
02-Nov-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Make the local endpoint hold a ref on a connected call Make the local endpoint and it's I/O thread hold a reference on a connected call until that call is disconnected. Without this, we're reliant on either the AF_RXRPC socket to hold a ref (which is dropped when the call is released) or a queued work item to hold a ref (the work item is being replaced with the I/O thread). Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
5e6ef4f1 |
|
23-Jan-2020 |
David Howells <dhowells@redhat.com> |
rxrpc: Make the I/O thread take over the call and local processor work Move the functions from the call->processor and local->processor work items into the domain of the I/O thread. The call event processor, now called from the I/O thread, then takes over the job of cranking the call state machine, processing incoming packets and transmitting DATA, ACK and ABORT packets. In a future patch, rxrpc_send_ACK() will transmit the ACK on the spot rather than queuing it for later transmission. The call event processor becomes purely received-skb driven. It only transmits things in response to events. We use "pokes" to queue a dummy skb to make it do things like start/resume transmitting data. Timer expiry also results in pokes. The connection event processor, becomes similar, though crypto events, such as dealing with CHALLENGE and RESPONSE packets is offloaded to a work item to avoid doing crypto in the I/O thread. The local event processor is removed and VERSION response packets are generated directly from the packet parser. Similarly, ABORTs generated in response to protocol errors will be transmitted immediately rather than being pushed onto a queue for later transmission. Changes: ======== ver #2) - Fix a couple of introduced lock context imbalances. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
393a2a20 |
|
20-Oct-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Extract the peer address from an incoming packet earlier Extract the peer address from an incoming packet earlier, at the beginning of rxrpc_input_packet() and thence pass a pointer to it to various functions that use it as part of the lookup rather than doing it on several separate paths. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
29fb4ec3 |
|
12-Oct-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Remove RCU from peer->error_targets list Remove the RCU requirements from the peer's list of error targets so that the error distributor can call sleeping functions. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
3cec055c |
|
24-Nov-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Don't hold a ref for connection workqueue Currently, rxrpc gives the connection's work item a ref on the connection when it queues it - and this is called from the timer expiration function. The problem comes when queue_work() fails (ie. the work item is already queued): the timer routine must put the ref - but this may cause the cleanup code to run. This has the unfortunate effect that the cleanup code may then be run in softirq context - which means that any spinlocks it might need to touch have to be guarded to disable softirqs (ie. they need a "_bh" suffix). (1) Don't give a ref to the work item. (2) Simplify handling of service connections by adding a separate active count so that the refcount isn't also used for this. (3) Connection destruction for both client and service connections can then be cleaned up by putting rxrpc_put_connection() out of line and making a tidy progression through the destruction code (offloaded to a workqueue if put from softirq or processor function context). The RCU part of the cleanup then only deals with the freeing at the end. (4) Make rxrpc_queue_conn() return immediately if it sees the active count is -1 rather then queuing the connection. (5) Make sure that the cleanup routine waits for the work item to complete. (6) Stash the rxrpc_net pointer in the conn struct so that the rcu free routine can use it, even if the local endpoint has been freed. Unfortunately, neither the timer nor the work item can simply get around the problem by just using refcount_inc_not_zero() as the waits would still have to be done, and there would still be the possibility of having to put the ref in the expiration function. Note the connection work item is mostly going to go away with the main event work being transferred to the I/O thread, so the wait in (6) will become obsolete. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
fa3492ab |
|
02-Nov-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Trace rxrpc_bundle refcount Add a tracepoint for the rxrpc_bundle refcounting. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
7fa25105 |
|
21-Oct-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: trace: Don't use __builtin_return_address for rxrpc_conn tracing In rxrpc tracing, use enums to generate lists of points of interest rather than __builtin_return_address() for the rxrpc_conn tracepoint Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
47c810a7 |
|
21-Oct-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: trace: Don't use __builtin_return_address for rxrpc_peer tracing In rxrpc tracing, use enums to generate lists of points of interest rather than __builtin_return_address() for the rxrpc_peer tracepoint Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
0fde882f |
|
21-Oct-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: trace: Don't use __builtin_return_address for rxrpc_local tracing In rxrpc tracing, use enums to generate lists of points of interest rather than __builtin_return_address() for the rxrpc_local tracepoint Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
2cc80086 |
|
19-Oct-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Drop rxrpc_conn_parameters from rxrpc_connection and rxrpc_bundle Remove the rxrpc_conn_parameters struct from the rxrpc_connection and rxrpc_bundle structs and emplace the members directly. These are going to get filled in from the rxrpc_call struct in future. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
e969c92c |
|
19-Oct-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Remove the [_k]net() debugging macros Remove the _net() and knet() debugging macros in favour of tracepoints. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
1fc4fa2a |
|
03-Oct-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix congestion management rxrpc has a problem in its congestion management in that it saves the congestion window size (cwnd) from one call to another, but if this is 0 at the time is saved, then the next call may not actually manage to ever transmit anything. To this end: (1) Don't save cwnd between calls, but rather reset back down to the initial cwnd and re-enter slow-start if data transmission is idle for more than an RTT. (2) Preserve ssthresh instead, as that is a handy estimate of pipe capacity. Knowing roughly when to stop slow start and enter congestion avoidance can reduce the tendency to overshoot and drop larger amounts of packets when probing. In future, cwind growth also needs to be constrained when the window isn't being filled due to being application limited. Reported-by: Simon Wilkinson <sxw@auristor.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
5d7edbc9 |
|
27-Aug-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Get rid of the Rx ring Get rid of the Rx ring and replace it with a pair of queues instead. One queue gets the packets that are in-sequence and are ready for processing by recvmsg(); the other queue gets the out-of-sequence packets for addition to the first queue as the holes get filled. The annotation ring is removed and replaced with a SACK table. The SACK table has the bits set that correspond exactly to the sequence number of the packet being acked. The SACK ring is copied when an ACK packet is being assembled and rotated so that the first ACK is in byte 0. Flow control handling is altered so that packets that are moved to the in-sequence queue are hard-ACK'd even before they're consumed - and then the Rx window size in the ACK packet (rsize) is shrunk down to compensate (even going to 0 if the window is full). Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
de696c47 |
|
21-May-2022 |
David Howells <dhowells@redhat.com> |
rxrpc, afs: Fix selection of abort codes The RX_USER_ABORT code should really only be used to indicate that the user of the rxrpc service (ie. userspace) implicitly caused a call to be aborted - for instance if the AF_RXRPC socket is closed whilst the call was in progress. (The user may also explicitly abort a call and specify the abort code to use). Change some of the points of generation to use other abort codes instead: (1) Abort the call with RXGEN_SS_UNMARSHAL or RXGEN_CC_UNMARSHAL if we see ENOMEM and EFAULT during received data delivery and abort with RX_CALL_DEAD in the default case. (2) Abort with RXGEN_SS_MARSHAL if we get ENOMEM whilst trying to send a reply. (3) Abort with RX_CALL_DEAD if we stop hearing from the peer if we had heard from the peer and abort with RX_CALL_TIMEOUT if we hadn't. (4) Abort with RX_CALL_DEAD if we try to disconnect a call that's not completed successfully or been aborted. Reported-by: Jeffrey Altman <jaltman@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a0575429 |
|
21-May-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Use refcount_t rather than atomic_t Move to using refcount_t rather than atomic_t for refcounts in rxrpc. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d7d775b1 |
|
15-Sep-2020 |
David Howells <dhowells@redhat.com> |
rxrpc: Ask the security class how much space to allow in a packet Ask the security class how much header and trailer space to allow for when allocating a packet, given how much data is remaining. This will allow the rxgk security class to stick both a trailer in as well as a header as appropriate in the future. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
ec832bd0 |
|
16-Sep-2020 |
David Howells <dhowells@redhat.com> |
rxrpc: Don't retain the server key in the connection Don't retain a pointer to the server key in the connection, but rather get it on demand when the server has to deal with a response packet. This is necessary to implement RxGK (GSSAPI-mediated transport class), where we can't know which key we'll need until we've challenged the client and got back the response. This also means that we don't need to do a key search in the accept path in softirq mode. Also, whilst we're at it, allow the security class to ask for a kvno and encoding-type variant of a server key as RxGK needs different keys for different encoding types. Keys of this type have an extra bit in the description: "<service-id>:<security-index>:<kvno>:<enctype>" Signed-off-by: David Howells <dhowells@redhat.com>
|
#
245500d8 |
|
01-Jul-2020 |
David Howells <dhowells@redhat.com> |
rxrpc: Rewrite the client connection manager Rewrite the rxrpc client connection manager so that it can support multiple connections for a given security key to a peer. The following changes are made: (1) For each open socket, the code currently maintains an rbtree with the connections placed into it, keyed by communications parameters. This is tricky to maintain as connections can be culled from the tree or replaced within it. Connections can require replacement for a number of reasons, e.g. their IDs span too great a range for the IDR data type to represent efficiently, the call ID numbers on that conn would overflow or the conn got aborted. This is changed so that there's now a connection bundle object placed in the tree, keyed on the same parameters. The bundle, however, does not need to be replaced. (2) An rxrpc_bundle object can now manage the available channels for a set of parallel connections. The lock that manages this is moved there from the rxrpc_connection struct (channel_lock). (3) There'a a dummy bundle for all incoming connections to share so that they have a channel_lock too. It might be better to give each incoming connection its own bundle. This bundle is not needed to manage which channels incoming calls are made on because that's the solely at whim of the client. (4) The restrictions on how many client connections are around are removed. Instead, a previous patch limits the number of client calls that can be allocated. Ordinarily, client connections are reaped after 2 minutes on the idle queue, but when more than a certain number of connections are in existence, the reaper starts reaping them after 2s of idleness instead to get the numbers back down. It could also be made such that new call allocations are forced to wait until the number of outstanding connections subsides. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
65550098 |
|
28-Jul-2020 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix race between recvmsg and sendmsg on immediate call failure There's a race between rxrpc_sendmsg setting up a call, but then failing to send anything on it due to an error, and recvmsg() seeing the call completion occur and trying to return the state to the user. An assertion fails in rxrpc_recvmsg() because the call has already been released from the socket and is about to be released again as recvmsg deals with it. (The recvmsg_q queue on the socket holds a ref, so there's no problem with use-after-free.) We also have to be careful not to end up reporting an error twice, in such a way that both returns indicate to userspace that the user ID supplied with the call is no longer in use - which could cause the client to malfunction if it recycles the user ID fast enough. Fix this by the following means: (1) When sendmsg() creates a call after the point that the call has been successfully added to the socket, don't return any errors through sendmsg(), but rather complete the call and let recvmsg() retrieve them. Make sendmsg() return 0 at this point. Further calls to sendmsg() for that call will fail with ESHUTDOWN. Note that at this point, we haven't send any packets yet, so the server doesn't yet know about the call. (2) If sendmsg() returns an error when it was expected to create a new call, it means that the user ID wasn't used. (3) Mark the call disconnected before marking it completed to prevent an oops in rxrpc_release_call(). (4) recvmsg() will then retrieve the error and set MSG_EOR to indicate that the user ID is no longer known by the kernel. An oops like the following is produced: kernel BUG at net/rxrpc/recvmsg.c:605! ... RIP: 0010:rxrpc_recvmsg+0x256/0x5ae ... Call Trace: ? __init_waitqueue_head+0x2f/0x2f ____sys_recvmsg+0x8a/0x148 ? import_iovec+0x69/0x9c ? copy_msghdr_from_user+0x5c/0x86 ___sys_recvmsg+0x72/0xaa ? __fget_files+0x22/0x57 ? __fget_light+0x46/0x51 ? fdget+0x9/0x1b do_recvmmsg+0x15e/0x232 ? _raw_spin_unlock+0xa/0xb ? vtime_delta+0xf/0x25 __x64_sys_recvmmsg+0x2c/0x2f do_syscall_64+0x4c/0x78 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 357f5ef64628 ("rxrpc: Call rxrpc_release_call() on error in rxrpc_new_client_call()") Reported-by: syzbot+b54969381df354936d96@syzkaller.appspotmail.com Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b39a934e |
|
06-Feb-2020 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix service call disconnection The recent patch that substituted a flag on an rxrpc_call for the connection pointer being NULL as an indication that a call was disconnected puts the set_bit in the wrong place for service calls. This is only a problem if a call is implicitly terminated by a new call coming in on the same connection channel instead of a terminating ACK packet. In such a case, rxrpc_input_implicit_end_call() calls __rxrpc_disconnect_call(), which is now (incorrectly) setting the disconnection bit, meaning that when rxrpc_release_call() is later called, it doesn't call rxrpc_disconnect_call() and so the call isn't removed from the peer's error distribution list and the list gets corrupted. KASAN finds the issue as an access after release on a call, but the position at which it occurs is confusing as it appears to be related to a different call (the call site is where the latter call is being removed from the error distribution list and either the next or pprev pointer points to a previously released call). Fix this by moving the setting of the flag from __rxrpc_disconnect_call() to rxrpc_disconnect_call() in the same place that the connection pointer was being cleared. Fixes: 5273a191dca6 ("rxrpc: Fix NULL pointer deref due to call->conn being cleared on disconnect") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5273a191 |
|
30-Jan-2020 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix NULL pointer deref due to call->conn being cleared on disconnect When a call is disconnected, the connection pointer from the call is cleared to make sure it isn't used again and to prevent further attempted transmission for the call. Unfortunately, there might be a daemon trying to use it at the same time to transmit a packet. Fix this by keeping call->conn set, but setting a flag on the call to indicate disconnection instead. Remove also the bits in the transmission functions where the conn pointer is checked and a ref taken under spinlock as this is now redundant. Fixes: 8d94aa381dab ("rxrpc: Calls shouldn't hold socket refs") Signed-off-by: David Howells <dhowells@redhat.com>
|
#
4c1295dc |
|
07-Oct-2019 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix trace-after-put looking at the put connection record rxrpc_put_*conn() calls trace_rxrpc_conn() after they have done the decrement of the refcount - which looks at the debug_id in the connection record. But unless the refcount was reduced to zero, we no longer have the right to look in the record and, indeed, it may be deleted by some other thread. Fix this by getting the debug_id out before decrementing the refcount and then passing that into the tracepoint. Fixes: 363deeab6d0f ("rxrpc: Add connection tracepoint and client conn state tracepoint") Signed-off-by: David Howells <dhowells@redhat.com>
|
#
d12040b6 |
|
29-Aug-2019 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix lack of conn cleanup when local endpoint is cleaned up [ver #2] When a local endpoint is ceases to be in use, such as when the kafs module is unloaded, the kernel will emit an assertion failure if there are any outstanding client connections: rxrpc: Assertion failed ------------[ cut here ]------------ kernel BUG at net/rxrpc/local_object.c:433! and even beyond that, will evince other oopses if there are service connections still present. Fix this by: (1) Removing the triggering of connection reaping when an rxrpc socket is released. These don't actually clean up the connections anyway - and further, the local endpoint may still be in use through another socket. (2) Mark the local endpoint as dead when we start the process of tearing it down. (3) When destroying a local endpoint, strip all of its client connections from the idle list and discard the ref on each that the list was holding. (4) When destroying a local endpoint, call the service connection reaper directly (rather than through a workqueue) to immediately kill off all outstanding service connections. (5) Make the service connection reaper reap connections for which the local endpoint is marked dead. Only after destroying the connections can we close the socket lest we get an oops in a workqueue that's looking at a connection or a peer. Fixes: 3d18cbb7fd0c ("rxrpc: Fix conn expiry timers") Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2874c5fd |
|
27-May-2019 |
Thomas Gleixner <tglx@linutronix.de> |
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 3029 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
5a790b73 |
|
04-Oct-2018 |
David Howells <dhowells@redhat.com> |
rxrpc: Drop the local endpoint arg from rxrpc_extract_addr_from_skb() rxrpc_extract_addr_from_skb() doesn't use the argument that points to the local endpoint, so remove the argument. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
46894a13 |
|
04-Oct-2018 |
David Howells <dhowells@redhat.com> |
rxrpc: Use IPv4 addresses throught the IPv6 AF_RXRPC opens an IPv6 socket through which to send and receive network packets, both IPv6 and IPv4. It currently turns AF_INET addresses into AF_INET-as-AF_INET6 addresses based on an assumption that this was necessary; on further inspection of the code, however, it turns out that the IPv6 code just farms packets aimed at AF_INET addresses out to the IPv4 code. Fix AF_RXRPC to use AF_INET addresses directly when given them. Fixes: 7b674e390e51 ("rxrpc: Fix IPv6 support") Signed-off-by: David Howells <dhowells@redhat.com>
|
#
f3344303 |
|
27-Sep-2018 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix error distribution Fix error distribution by immediately delivering the errors to all the affected calls rather than deferring them to a worker thread. The problem with the latter is that retries and things can happen in the meantime when we want to stop that sooner. To this end: (1) Stop the error distributor from removing calls from the error_targets list so that peer->lock isn't needed to synchronise against other adds and removals. (2) Require the peer's error_targets list to be accessed with RCU, thereby avoiding the need to take peer->lock over distribution. (3) Don't attempt to affect a call's state if it is already marked complete. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
0099dc58 |
|
27-Sep-2018 |
David Howells <dhowells@redhat.com> |
rxrpc: Make service call handling more robust Make the following changes to improve the robustness of the code that sets up a new service call: (1) Cache the rxrpc_sock struct obtained in rxrpc_data_ready() to do a service ID check and pass that along to rxrpc_new_incoming_call(). This means that I can remove the check from rxrpc_new_incoming_call() without the need to worry about the socket attached to the local endpoint getting replaced - which would invalidate the check. (2) Cache the rxrpc_peer struct, thereby allowing the peer search to be done once. The peer is passed to rxrpc_new_incoming_call(), thereby saving the need to repeat the search. This also reduces the possibility of rxrpc_publish_service_conn() BUG()'ing due to the detection of a duplicate connection, despite the initial search done by rxrpc_find_connection_rcu() having turned up nothing. This BUG() shouldn't ever get hit since rxrpc_data_ready() *should* be non-reentrant and the result of the initial search should still hold true, but it has proven possible to hit. I *think* this may be due to __rxrpc_lookup_peer_rcu() cutting short the iteration over the hash table if it finds a matching peer with a zero usage count, but I don't know for sure since it's only ever been hit once that I know of. Another possibility is that a bug in rxrpc_data_ready() that checked the wrong byte in the header for the RXRPC_CLIENT_INITIATED flag might've let through a packet that caused a spurious and invalid call to be set up. That is addressed in another patch. (3) Fix __rxrpc_lookup_peer_rcu() to skip peer records that have a zero usage count rather than stopping and returning not found, just in case there's another peer record behind it in the bucket. (4) Don't search the peer records in rxrpc_alloc_incoming_call(), but rather either use the peer cached in (2) or, if one wasn't found, preemptively install a new one. Fixes: 8496af50eb38 ("rxrpc: Use RCU to access a peer's service connection tree") Signed-off-by: David Howells <dhowells@redhat.com>
|
#
dc71db34 |
|
27-Sep-2018 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix checks as to whether we should set up a new call There's a check in rxrpc_data_ready() that's checking the CLIENT_INITIATED flag in the packet type field rather than in the packet flags field. Fix this by creating a pair of helper functions to check whether the packet is going to the client or to the server and use them generally. Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code") Signed-off-by: David Howells <dhowells@redhat.com>
|
#
092ffc51 |
|
27-Sep-2018 |
David Howells <dhowells@redhat.com> |
rxrpc: Remove dup code from rxrpc_find_connection_rcu() rxrpc_find_connection_rcu() initialises variable k twice with the same information. Remove one of the initialisations. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
bfc18e38 |
|
21-Jun-2018 |
Mark Rutland <mark.rutland@arm.com> |
atomics/treewide: Rename __atomic_add_unless() => atomic_fetch_add_unless() While __atomic_add_unless() was originally intended as a building-block for atomic_add_unless(), it's now used in a number of places around the kernel. It's the only common atomic operation named __atomic*(), rather than atomic_*(), and for consistency it would be better named atomic_fetch_add_unless(). This lack of consistency is slightly confusing, and gets in the way of scripting atomics. Given that, let's clean things up and promote it to an official part of the atomics API, in the form of atomic_fetch_add_unless(). This patch converts definitions and invocations over to the new name, including the instrumented version, using the following script: ---- git grep -w __atomic_add_unless | while read line; do sed -i '{s/\<__atomic_add_unless\>/atomic_fetch_add_unless/}' "${line%%:*}"; done git grep -w __arch_atomic_add_unless | while read line; do sed -i '{s/\<__arch_atomic_add_unless\>/arch_atomic_fetch_add_unless/}' "${line%%:*}"; done ---- Note that we do not have atomic{64,_long}_fetch_add_unless(), which will be introduced by later patches. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Will Deacon <will.deacon@arm.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Palmer Dabbelt <palmer@sifive.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/lkml/20180621121321.4761-2-mark.rutland@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
31f5f9a16 |
|
30-Mar-2018 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix apparent leak of rxrpc_local objects rxrpc_local objects cannot be disposed of until all the connections that point to them have been RCU'd as a connection object holds refcount on the local endpoint it is communicating through. Currently, this can cause an assertion failure to occur when a network namespace is destroyed as there's no check that the RCU destructors for the connections have been run before we start trying to destroy local endpoints. The kernel reports: rxrpc: AF_RXRPC: Leaked local 0000000036a41bc1 {5} ------------[ cut here ]------------ kernel BUG at ../net/rxrpc/local_object.c:439! Fix this by keeping a count of the live connections and waiting for it to go to zero at the end of rxrpc_destroy_all_connections(). Fixes: dee46364ce6f ("rxrpc: Add RCU destruction for connections and calls") Signed-off-by: David Howells <dhowells@redhat.com>
|
#
88f2a825 |
|
30-Mar-2018 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix checker warnings and errors Fix various issues detected by checker. Errors: (*) rxrpc_discard_prealloc() should be using rcu_assign_pointer to set call->socket. Warnings: (*) rxrpc_service_connection_reaper() should be passing NULL rather than 0 to trace_rxrpc_conn() as the where argument. (*) rxrpc_disconnect_client_call() should get its net pointer via the call->conn rather than call->sock to avoid a warning about accessing an RCU pointer without protection. (*) Proc seq start/stop functions need annotation as they pass locks between the functions. False positives: (*) Checker doesn't correctly handle of seq-retry lock context balance in rxrpc_find_service_conn_rcu(). (*) Checker thinks execution may proceed past the BUG() in rxrpc_publish_service_conn(). (*) Variable length array warnings from SKCIPHER_REQUEST_ON_STACK() in rxkad.c. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
17e9e23b |
|
06-Feb-2018 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix received abort handling AF_RXRPC is incorrectly sending back to the server any abort it receives for a client connection. This is due to the final-ACK offload to the connection event processor patch. The abort code is copied into the last-call information on the connection channel and then the event processor is set. Instead, the following should be done: (1) In the case of a final-ACK for a successful call, the ACK should be scheduled as before. (2) In the case of a locally generated ABORT, the ABORT details should be cached for sending in response to further packets related to that call and no further action scheduled at call disconnect time. (3) In the case of an ACK received from the peer, the call should be considered dead, no ABORT should be transmitted at this time. In response to further non-ABORT packets from the peer relating to this call, an RX_USER_ABORT ABORT should be transmitted. (4) In the case of a call killed due to network error, an RX_USER_ABORT ABORT should be cached for transmission in response to further packets, but no ABORT should be sent at this time. Fixes: 3136ef49a14c ("rxrpc: Delay terminal ACK transmission on a client call") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3d7682af |
|
29-Nov-2017 |
David Howells <dhowells@redhat.com> |
rxrpc: Clean up whitespace Clean up some whitespace from rxrpc. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
3d18cbb7 |
|
24-Nov-2017 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix conn expiry timers Fix the rxrpc connection expiry timers so that connections for closed AF_RXRPC sockets get deleted in a more timely fashion, freeing up the transport UDP port much more quickly. (1) Replace the delayed work items with work items plus timers so that timer_reduce() can be used to shorten them and so that the timer doesn't requeue the work item if the net namespace is dead. (2) Don't use queue_delayed_work() as that won't alter the timeout if the timer is already running. (3) Don't rearm the timers if the network namespace is dead. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
f859ab61 |
|
24-Nov-2017 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix service endpoint expiry RxRPC service endpoints expire like they're supposed to by the following means: (1) Mark dead rxrpc_net structs (with ->live) rather than twiddling the global service conn timeout, otherwise the first rxrpc_net struct to die will cause connections on all others to expire immediately from then on. (2) Mark local service endpoints for which the socket has been closed (->service_closed) so that the expiration timeout can be much shortened for service and client connections going through that endpoint. (3) rxrpc_put_service_conn() needs to schedule the reaper when the usage count reaches 1, not 0, as idle conns have a 1 count. (4) The accumulator for the earliest time we might want to schedule for should be initialised to jiffies + MAX_JIFFY_OFFSET, not ULONG_MAX as the comparison functions use signed arithmetic. (5) Simplify the expiration handling, adding the expiration value to the idle timestamp each time rather than keeping track of the time in the past before which the idle timestamp must go to be expired. This is much easier to read. (6) Ignore the timeouts if the net namespace is dead. (7) Restart the service reaper work item rather the client reaper. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
3136ef49 |
|
24-Nov-2017 |
David Howells <dhowells@redhat.com> |
rxrpc: Delay terminal ACK transmission on a client call Delay terminal ACK transmission on a client call by deferring it to the connection processor. This allows it to be skipped if we can send the next call instead, the first DATA packet of which will implicitly ack this call. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
7b674e39 |
|
29-Aug-2017 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix IPv6 support Fix IPv6 support in AF_RXRPC in the following ways: (1) When extracting the address from a received IPv4 packet, if the local transport socket is open for IPv6 then fill out the sockaddr_rxrpc struct for an IPv4-mapped-to-IPv6 AF_INET6 transport address instead of an AF_INET one. (2) When sending CHALLENGE or RESPONSE packets, the transport length needs to be set from the sockaddr_rxrpc::transport_len field rather than sizeof() on the IPv4 transport address. (3) When processing an IPv4 ICMP packet received by an IPv6 socket, set up the address correctly before searching for the affected peer. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
f7aec129 |
|
14-Jun-2017 |
David Howells <dhowells@redhat.com> |
rxrpc: Cache the congestion window setting Cache the congestion window setting that was determined during a call's transmission phase when it finishes so that it can be used by the next call to the same peer, thereby shortcutting the slow-start algorithm. The value is stored in the rxrpc_peer struct and is accessed without locking. Each call takes the value that happens to be there when it starts and just overwrites the value when it finishes. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
68d6d1ae |
|
05-Jun-2017 |
David Howells <dhowells@redhat.com> |
rxrpc: Separate the connection's protocol service ID from the lookup ID Keep the rxrpc_connection struct's idea of the service ID that is exposed in the protocol separate from the service ID that's used as a lookup key. This allows the protocol service ID on a client connection to get upgraded without making the connection unfindable for other client calls that also would like to use the upgraded connection. The connection's actual service ID is then returned through recvmsg() by way of msg_name. Whilst we're at it, we get rid of the last_service_id field from each channel. The service ID is per-connection, not per-call and an entire connection is upgraded in one go. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
2baec2c3 |
|
24-May-2017 |
David Howells <dhowells@redhat.com> |
rxrpc: Support network namespacing Support network namespacing in AF_RXRPC with the following changes: (1) All the local endpoint, peer and call lists, locks, counters, etc. are moved into the per-namespace record. (2) All the connection tracking is moved into the per-namespace record with the exception of the client connection ID tree, which is kept global so that connection IDs are kept unique per-machine. (3) Each namespace gets its own epoch. This allows each network namespace to pretend to be a separate client machine. (4) The /proc/net/rxrpc_xxx files are now called /proc/net/rxrpc/xxx and the contents reflect the namespace. fs/afs/ should be okay with this patch as it explicitly requires the current net namespace to be init_net to permit a mount to proceed at the moment. It will, however, need updating so that cells, IP addresses and DNS records are per-namespace also. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b1d9f7fd |
|
05-Jan-2017 |
David Howells <dhowells@redhat.com> |
rxrpc: Add some more tracing Add the following extra tracing information: (1) Modify the rxrpc_transmit tracepoint to record the Tx window size as this is varied by the slow-start algorithm. (2) Modify the rxrpc_rx_ack tracepoint to record more information from received ACK packets. (3) Add an rxrpc_rx_data tracepoint to record the information in DATA packets. (4) Add an rxrpc_disconnect_call tracepoint to record call disconnection, including the reason the call was disconnected. (5) Add an rxrpc_improper_term tracepoint to record implicit termination of a call by a client either by starting a new call on a particular connection channel without first transmitting the final ACK for the previous call. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
5a924b89 |
|
21-Sep-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Don't store the rxrpc header in the Tx queue sk_buffs Don't store the rxrpc protocol header in sk_buffs on the transmit queue, but rather generate it on the fly and pass it to kernel_sendmsg() as a separate iov. This reduces the amount of storage required. Note that the security header is still stored in the sk_buff as it may get encrypted along with the data (and doesn't change with each transmission). Signed-off-by: David Howells <dhowells@redhat.com>
|
#
363deeab |
|
17-Sep-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Add connection tracepoint and client conn state tracepoint Add a pair of tracepoints, one to track rxrpc_connection struct ref counting and the other to track the client connection cache state. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
d1912747 |
|
17-Sep-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Make IPv6 support conditional on CONFIG_IPV6 Add CONFIG_AF_RXRPC_IPV6 and make the IPv6 support code conditional on it. This is then made conditional on CONFIG_IPV6. Without this, the following can be seen: net/built-in.o: In function `rxrpc_init_peer': >> peer_object.c:(.text+0x18c3c8): undefined reference to `ip6_route_output_flags' Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
75b54cb5 |
|
13-Sep-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Add IPv6 support Add IPv6 support to AF_RXRPC. With this, AF_RXRPC sockets can be created: service = socket(AF_RXRPC, SOCK_DGRAM, PF_INET6); instead of: service = socket(AF_RXRPC, SOCK_DGRAM, PF_INET); The AFS filesystem doesn't support IPv6 at the moment, though, since that requires upgrades to some of the RPC calls. Note that a good portion of this patch is replacing "%pI4:%u" in print statements with "%pISpc" which is able to handle both protocols and print the port. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
248f219c |
|
08-Sep-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Rewrite the data and ack handling code Rewrite the data and ack handling code such that: (1) Parsing of received ACK and ABORT packets and the distribution and the filing of DATA packets happens entirely within the data_ready context called from the UDP socket. This allows us to process and discard ACK and ABORT packets much more quickly (they're no longer stashed on a queue for a background thread to process). (2) We avoid calling skb_clone(), pskb_pull() and pskb_trim(). We instead keep track of the offset and length of the content of each packet in the sk_buff metadata. This means we don't do any allocation in the receive path. (3) Jumbo DATA packet parsing is now done in data_ready context. Rather than cloning the packet once for each subpacket and pulling/trimming it, we file the packet multiple times with an annotation for each indicating which subpacket is there. From that we can directly calculate the offset and length. (4) A call's receive queue can be accessed without taking locks (memory barriers do have to be used, though). (5) Incoming calls are set up from preallocated resources and immediately made live. They can than have packets queued upon them and ACKs generated. If insufficient resources exist, DATA packet #1 is given a BUSY reply and other DATA packets are discarded). (6) sk_buffs no longer take a ref on their parent call. To make this work, the following changes are made: (1) Each call's receive buffer is now a circular buffer of sk_buff pointers (rxtx_buffer) rather than a number of sk_buff_heads spread between the call and the socket. This permits each sk_buff to be in the buffer multiple times. The receive buffer is reused for the transmit buffer. (2) A circular buffer of annotations (rxtx_annotations) is kept parallel to the data buffer. Transmission phase annotations indicate whether a buffered packet has been ACK'd or not and whether it needs retransmission. Receive phase annotations indicate whether a slot holds a whole packet or a jumbo subpacket and, if the latter, which subpacket. They also note whether the packet has been decrypted in place. (3) DATA packet window tracking is much simplified. Each phase has just two numbers representing the window (rx_hard_ack/rx_top and tx_hard_ack/tx_top). The hard_ack number is the sequence number before base of the window, representing the last packet the other side says it has consumed. hard_ack starts from 0 and the first packet is sequence number 1. The top number is the sequence number of the highest-numbered packet residing in the buffer. Packets between hard_ack+1 and top are soft-ACK'd to indicate they've been received, but not yet consumed. Four macros, before(), before_eq(), after() and after_eq() are added to compare sequence numbers within the window. This allows for the top of the window to wrap when the hard-ack sequence number gets close to the limit. Two flags, RXRPC_CALL_RX_LAST and RXRPC_CALL_TX_LAST, are added also to indicate when rx_top and tx_top point at the packets with the LAST_PACKET bit set, indicating the end of the phase. (4) Calls are queued on the socket 'receive queue' rather than packets. This means that we don't need have to invent dummy packets to queue to indicate abnormal/terminal states and we don't have to keep metadata packets (such as ABORTs) around (5) The offset and length of a (sub)packet's content are now passed to the verify_packet security op. This is currently expected to decrypt the packet in place and validate it. However, there's now nowhere to store the revised offset and length of the actual data within the decrypted blob (there may be a header and padding to skip) because an sk_buff may represent multiple packets, so a locate_data security op is added to retrieve these details from the sk_buff content when needed. (6) recvmsg() now has to handle jumbo subpackets, where each subpacket is individually secured and needs to be individually decrypted. The code to do this is broken out into rxrpc_recvmsg_data() and shared with the kernel API. It now iterates over the call's receive buffer rather than walking the socket receive queue. Additional changes: (1) The timers are condensed to a single timer that is set for the soonest of three timeouts (delayed ACK generation, DATA retransmission and call lifespan). (2) Transmission of ACK and ABORT packets is effected immediately from process-context socket ops/kernel API calls that cause them instead of them being punted off to a background work item. The data_ready handler still has to defer to the background, though. (3) A shutdown op is added to the AF_RXRPC socket so that the AFS filesystem can shut down the socket and flush its own work items before closing the socket to deal with any in-progress service calls. Future additional changes that will need to be considered: (1) Make sure that a call doesn't hog the front of the queue by receiving data from the network as fast as userspace is consuming it to the exclusion of other calls. (2) Transmit delayed ACKs from within recvmsg() when we've consumed sufficiently more packets to avoid the background work item needing to run. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
00e90712 |
|
08-Sep-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Preallocate peers, conns and calls for incoming service requests Make it possible for the data_ready handler called from the UDP transport socket to completely instantiate an rxrpc_call structure and make it immediately live by preallocating all the memory it might need. The idea is to cut out the background thread usage as much as possible. [Note that the preallocated structs are not actually used in this patch - that will be done in a future patch.] If insufficient resources are available in the preallocation buffers, it will be possible to discard the DATA packet in the data_ready handler or schedule a BUSY packet without the need to schedule an attempt at allocation in a background thread. To this end: (1) Preallocate rxrpc_peer, rxrpc_connection and rxrpc_call structs to a maximum number each of the listen backlog size. The backlog size is limited to a maxmimum of 32. Only this many of each can be in the preallocation buffer. (2) For userspace sockets, the preallocation is charged initially by listen() and will be recharged by accepting or rejecting pending new incoming calls. (3) For kernel services {,re,dis}charging of the preallocation buffers is handled manually. Two notifier callbacks have to be provided before kernel_listen() is invoked: (a) An indication that a new call has been instantiated. This can be used to trigger background recharging. (b) An indication that a call is being discarded. This is used when the socket is being released. A function, rxrpc_kernel_charge_accept() is called by the kernel service to preallocate a single call. It should be passed the user ID to be used for that call and a callback to associate the rxrpc call with the kernel service's side of the ID. (4) Discard the preallocation when the socket is closed. (5) Temporarily bump the refcount on the call allocated in rxrpc_incoming_call() so that rxrpc_release_call() can ditch the preallocation ref on service calls unconditionally. This will no longer be necessary once the preallocation is used. Note that this does not yet control the number of active service calls on a client - that will come in a later patch. A future development would be to provide a setsockopt() call that allows a userspace server to manually charge the preallocation buffer. This would allow user call IDs to be provided in advance and the awkward manual accept stage to be bypassed. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
f5c17aae |
|
30-Aug-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Calls should only have one terminal state Condense the terminal states of a call state machine to a single state, plus a separate completion type value. The value is then set, along with error and abort code values, only when the call is transitioned to the completion state. Helpers are provided to simplify this. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
45025bce |
|
24-Aug-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Improve management and caching of client connection objects Improve the management and caching of client rxrpc connection objects. From this point, client connections will be managed separately from service connections because AF_RXRPC controls the creation and re-use of client connections but doesn't have that luxury with service connections. Further, there will be limits on the numbers of client connections that may be live on a machine. No direct restriction will be placed on the number of client calls, excepting that each client connection can support a maximum of four concurrent calls. Note that, for a number of reasons, we don't want to simply discard a client connection as soon as the last call is apparently finished: (1) Security is negotiated per-connection and the context is then shared between all calls on that connection. The context can be negotiated again if the connection lapses, but that involves holding up calls whilst at least two packets are exchanged and various crypto bits are performed - so we'd ideally like to cache it for a little while at least. (2) If a packet goes astray, we will need to retransmit a final ACK or ABORT packet. To make this work, we need to keep around the connection details for a little while. (3) The locally held structures represent some amount of setup time, to be weighed against their occupation of memory when idle. To this end, the client connection cache is managed by a state machine on each connection. There are five states: (1) INACTIVE - The connection is not held in any list and may not have been exposed to the world. If it has been previously exposed, it was discarded from the idle list after expiring. (2) WAITING - The connection is waiting for the number of client conns to drop below the maximum capacity. Calls may be in progress upon it from when it was active and got culled. The connection is on the rxrpc_waiting_client_conns list which is kept in to-be-granted order. Culled conns with waiters go to the back of the queue just like new conns. (3) ACTIVE - The connection has at least one call in progress upon it, it may freely grant available channels to new calls and calls may be waiting on it for channels to become available. The connection is on the rxrpc_active_client_conns list which is kept in activation order for culling purposes. (4) CULLED - The connection got summarily culled to try and free up capacity. Calls currently in progress on the connection are allowed to continue, but new calls will have to wait. There can be no waiters in this state - the conn would have to go to the WAITING state instead. (5) IDLE - The connection has no calls in progress upon it and must have been exposed to the world (ie. the EXPOSED flag must be set). When it expires, the EXPOSED flag is cleared and the connection transitions to the INACTIVE state. The connection is on the rxrpc_idle_client_conns list which is kept in order of how soon they'll expire. A connection in the ACTIVE or CULLED state must have at least one active call upon it; if in the WAITING state it may have active calls upon it; other states may not have active calls. As long as a connection remains active and doesn't get culled, it may continue to process calls - even if there are connections on the wait queue. This simplifies things a bit and reduces the amount of checking we need do. There are a couple flags of relevance to the cache: (1) EXPOSED - The connection ID got exposed to the world. If this flag is set, an extra ref is added to the connection preventing it from being reaped when it has no calls outstanding. This flag is cleared and the ref dropped when a conn is discarded from the idle list. (2) DONT_REUSE - The connection should be discarded as soon as possible and should not be reused. This commit also provides a number of new settings: (*) /proc/net/rxrpc/max_client_conns The maximum number of live client connections. Above this number, new connections get added to the wait list and must wait for an active conn to be culled. Culled connections can be reused, but they will go to the back of the wait list and have to wait. (*) /proc/net/rxrpc/reap_client_conns If the number of desired connections exceeds the maximum above, the active connection list will be culled until there are only this many left in it. (*) /proc/net/rxrpc/idle_conn_expiry The normal expiry time for a client connection, provided there are fewer than reap_client_conns of them around. (*) /proc/net/rxrpc/idle_conn_fast_expiry The expedited expiry time, used when there are more than reap_client_conns of them around. Note that I combined the Tx wait queue with the channel grant wait queue to save space as only one of these should be in use at once. Note also that, for the moment, the service connection cache still uses the old connection management code. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
4d028b2c |
|
24-Aug-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Dup the main conn list for the proc interface The main connection list is used for two independent purposes: primarily it is used to find connections to reap and secondarily it is used to list connections in procfs. Split the procfs list out from the reap list. This allows us to stop using the reap list for client connections when they acquire a separate management strategy from service collections. The client connections will not be on a management single list, and sometimes won't be on a management list at all. This doesn't leave them floating, however, as they will also be on an rb-tree rooted on the socket so that the socket can find them to dispatch calls. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
18bfeba5 |
|
23-Aug-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Perform terminal call ACK/ABORT retransmission from conn processor Perform terminal call ACK/ABORT retransmission in the connection processor rather than in the call processor. With this change, once last_call is set, no more incoming packets will be routed to the corresponding call or any earlier calls on that channel (call IDs must only increase on a channel on a connection). Further, if a packet's callNumber is before the last_call ID or a packet is aimed at successfully completed service call then that packet is discarded and ignored. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
f51b4480 |
|
23-Aug-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Set connection expiry on idle, not put Set the connection expiry time when a connection becomes idle rather than doing this in rxrpc_put_connection(). This makes the put path more efficient (it is likely to be called occasionally whilst a connection has outstanding calls because active workqueue items needs to be given a ref). The time is also preset in the connection allocator in case the connection never gets used. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
01a90a45 |
|
23-Aug-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Drop channel number field from rxrpc_call struct Drop the channel number (channel) field from the rxrpc_call struct to reduce the size of the call struct. The field is redundant: if the call is attached to a connection, the channel can be obtained from there by AND'ing with RXRPC_CHANNELMASK. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
8496af50 |
|
01-Jul-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Use RCU to access a peer's service connection tree Move to using RCU access to a peer's service connection tree when routing an incoming packet. This is done using a seqlock to trigger retrying of the tree walk if a change happened. Further, we no longer get a ref on the connection looked up in the data_ready handler unless we queue the connection's work item - and then only if the refcount > 0. Note that I'm avoiding the use of a hash table for service connections because each service connection is addressed by a 62-bit number (constructed from epoch and connection ID >> 2) that would allow the client to engage in bucket stuffing, given knowledge of the hash algorithm. Peers, however, are hashed as the network address is less controllable by the client. The total number of peers will also be limited in a future commit. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
1291e9d1 |
|
29-Jun-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Move data_ready peer lookup into rxrpc_find_connection() Move the peer lookup done in input.c by data_ready into rxrpc_find_connection(). Signed-off-by: David Howells <dhowells@redhat.com>
|
#
001c1122 |
|
30-Jun-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Maintain an extra ref on a conn for the cache list Overhaul the usage count accounting for the rxrpc_connection struct to make it easier to implement RCU access from the data_ready handler. The problem is that currently we're using a lock to prevent the garbage collector from trying to clean up a connection that we're contemplating unidling. We could just stick incoming packets on the connection we find, but we've then got a problem that we may race when dispatching a work item to process it as we need to give that a ref to prevent the rxrpc_connection struct from disappearing in the meantime. Further, incoming packets may get discarded if attached to an rxrpc_connection struct that is going away. Whilst this is not a total disaster - the client will presumably resend - it would delay processing of the call. This would affect the AFS client filesystem's service manager operation. To this end: (1) We now maintain an extra count on the connection usage count whilst it is on the connection list. This mean it is not in use when its refcount is 1. (2) When trying to reuse an old connection, we only increment the refcount if it is greater than 0. If it is 0, we replace it in the tree with a new candidate connection. (3) Two connection flags are added to indicate whether or not a connection is in the local's client connection tree (used by sendmsg) or the peer's service connection tree (used by data_ready). This makes sure that we don't try and remove a connection if it got replaced. The flags are tested under lock with the removal operation to prevent the reaper from killing the rxrpc_connection struct whilst someone else is trying to effect a replacement. This could probably be alleviated by using memory barriers between the flag set/test and the rb_tree ops. The rb_tree op would still need to be under the lock, however. (4) When trying to reap an old connection, we try to flip the usage count from 1 to 0. If it's not 1 at that point, then it must've come back to life temporarily and we ignore it. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
7877a4a4 |
|
04-Apr-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Split service connection code out into its own file Split the service-specific connection code out into into its own file. The client-specific code has already been split out. This will leave just the common code in the original file. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
c6d2b8d7 |
|
04-Apr-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Split client connection code out into its own file Split the client-specific connection code out into its own file. It will behave somewhat differently from the service-specific connection code, so it makes sense to separate them. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
a1399f8b |
|
27-Jun-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Call channels should have separate call number spaces Each channel on a connection has a separate, independent number space from which to allocate callNumber values. It is entirely possible, for example, to have a connection with four active calls, each with call number 1. Note that the callNumber values for any particular channel don't have to start at 1, but they are supposed to increment monotonically for that channel from a client's perspective and may not be reused once the call number is transmitted (until the epoch cycles all the way back round). Currently, however, call numbers are allocated on a per-connection basis and, further, are held in an rb-tree. The rb-tree is redundant as the four channel pointers in the rxrpc_connection struct are entirely capable of pointing to all the calls currently in progress on a connection. To this end, make the following changes: (1) Handle call number allocation independently per channel. (2) Get rid of the conn->calls rb-tree. This is overkill as a connection may have a maximum of four calls in progress at any one time. Use the pointers in the channels[] array instead, indexed by the channel number from the packet. (3) For each channel, save the result of the last call that was in progress on that channel in conn->channels[] so that the final ACK or ABORT packet can be replayed if necessary. Any call earlier than that is just ignored. If we've seen the next call number in a packet, the last one is most definitely defunct. (4) When generating a RESPONSE packet for a connection, the call number counter for each channel must be included in it. (5) When parsing a RESPONSE packet for a connection, the call number counters contained therein should be used to set the minimum expected call numbers on each channel. To do in future commits: (1) Replay terminal packets based on the last call stored in conn->channels[]. (2) Connections should be retired before the callNumber space on any channel runs out. (3) A server is expected to disregard or reject any new incoming call that has a call number less than the current call number counter. The call number counter for that channel must be advanced to the new call number. Note that the server cannot just require that the next call that it sees on a channel be exactly the call number counter + 1 because then there's a scenario that could cause a problem: The client transmits a packet to initiate a connection, the network goes out, the server sends an ACK (which gets lost), the client sends an ABORT (which also gets lost); the network then reconnects, the client then reuses the call number for the next call (it doesn't know the server already saw the call number), but the server thinks it already has the first packet of this call (it doesn't know that the client doesn't know that it saw the call number the first time). Signed-off-by: David Howells <dhowells@redhat.com>
|
#
dee46364 |
|
27-Jun-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Add RCU destruction for connections and calls Add RCU destruction for connections and calls as the RCU lookup from the transport socket data_ready handler is going to come along shortly. Whilst we're at it, move the cleanup workqueue flushing and RCU barrierage into the destruction code for the objects that need it (locals and connections) and add the extra RCU barrier required for connection cleanup. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
e653cfe4 |
|
04-Apr-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Release a call's connection ref on call disconnection When a call is disconnected, clear the call's pointer to the connection and release the associated ref on that connection. This means that the call no longer pins the connection and the connection can be discarded even before the call is. As the code currently stands, the call struct is effectively pinned by userspace until userspace has enacted a recvmsg() to retrieve the final call state as sk_buffs on the receive queue pin the call to which they're related because: (1) The rxrpc_call struct contains the userspace ID that recvmsg() has to include in the control message buffer to indicate which call is being referred to. This ID must remain valid until the terminal packet is completely read and must be invalidated immediately at that point as userspace is entitled to immediately reuse it. (2) The final ACK to the reply to a client call isn't sent until the last data packet is entirely read (it's probably worth altering this in future to be send the ACK as soon as all the data has been received). This change requires a bit of rearrangement to make sure that the call isn't going to try and access the connection again after protocol completion: (1) Delete the error link earlier when we're releasing the call. Possibly network errors should be distributed via connections at the cost of adding in an access to the rxrpc_connection struct. (2) Remove the call from the connection's call tree before disconnecting the call. The call tree needs to be removed anyway and incoming packets delivered by channel pointer instead. (3) The release call event should be considered last after all other events have been processed so that we don't need access to the connection again. (4) Move the channel_lock taking from rxrpc_release_call() to rxrpc_disconnect_call() where it will be required in future. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
bba304db |
|
27-Jun-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Turn connection #defines into enums and put outside struct def Turn the connection event and state #define lists into enums and move outside of the struct definition. Whilst we're at it, change _SERVER to _SERVICE in those identifiers and add EV_ into the event name to distinguish them from flags and states. Also add a symbol indicating the number of states and use that in the state text array. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
a263629d |
|
26-Jun-2016 |
Herbert Xu <herbert@gondor.apana.org.au> |
rxrpc: Avoid using stack memory in SG lists in rxkad rxkad uses stack memory in SG lists which would not work if stacks were allocated from vmalloc memory. In fact, in most cases this isn't even necessary as the stack memory ends up getting copied over to kmalloc memory. This patch eliminates all the unnecessary stack memory uses by supplying the final destination directly to the crypto API. In two instances where a temporary buffer is actually needed we also switch use a scratch area in the rxrpc_call struct (only one DATA packet will be being secured or verified at a time). Finally there is no need to split a split-page buffer into two SG entries so code dealing with that has been removed. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: David Howells <dhowells@redhat.com>
|
#
689f4c64 |
|
30-Jun-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Check the source of a packet to a client conn When looking up a client connection to which to route a packet, we need to check that the packet came from the correct source so that a peer can't try to muck around with another peer's connection. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
88b99d0b |
|
01-Jul-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix some sparse errors Fix the following sparse errors: ../net/rxrpc/conn_object.c:77:17: warning: incorrect type in assignment (different base types) ../net/rxrpc/conn_object.c:77:17: expected restricted __be32 [usertype] call_id ../net/rxrpc/conn_object.c:77:17: got unsigned int [unsigned] [usertype] call_id ../net/rxrpc/conn_object.c:84:21: warning: restricted __be32 degrades to integer ../net/rxrpc/conn_object.c:86:26: warning: restricted __be32 degrades to integer ../net/rxrpc/conn_object.c:357:15: warning: incorrect type in assignment (different base types) ../net/rxrpc/conn_object.c:357:15: expected restricted __be32 [usertype] epoch ../net/rxrpc/conn_object.c:357:15: got unsigned int [unsigned] [usertype] epoch ../net/rxrpc/conn_object.c:369:21: warning: restricted __be32 degrades to integer ../net/rxrpc/conn_object.c:371:26: warning: restricted __be32 degrades to integer ../net/rxrpc/conn_object.c:411:21: warning: restricted __be32 degrades to integer ../net/rxrpc/conn_object.c:413:26: warning: restricted __be32 degrades to integer Signed-off-by: David Howells <dhowells@redhat.com>
|
#
aa390bbe |
|
17-Jun-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Kill off the rxrpc_transport struct The rxrpc_transport struct is now redundant, given that the rxrpc_peer struct is now per peer port rather than per peer host, so get rid of it. Service connection lists are transferred to the rxrpc_peer struct, as is the conn_lock. Previous patches moved the client connection handling out of the rxrpc_transport struct and discarded the connection bundling code. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
999b69f8 |
|
17-Jun-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Kill the client connection bundle concept Kill off the concept of maintaining a bundle of connections to a particular target service to increase the number of call slots available for any beyond four for that service (there are four call slots per connection). This will make cleaning up the connection handling code easier and facilitate removal of the rxrpc_transport struct. Bundling can be reintroduced later if necessary. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
5627cc8b |
|
04-Apr-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Provide more refcount helper functions Provide refcount helper functions for connections so that the code doesn't touch local or connection usage counts directly. Also make it such that local and peer put functions can take a NULL pointer. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
4a3388c8 |
|
04-Apr-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Use IDR to allocate client conn IDs on a machine-wide basis Use the IDR facility to allocate client connection IDs on a machine-wide basis so that each client connection has a unique identifier. When the connection ID space wraps, we advance the epoch by 1, thereby effectively having a 62-bit ID space. The IDR facility is then used to look up client connections during incoming packet routing instead of using an rbtree rooted on the transport. This change allows for the removal of the transport in the future and also means that client connections can be looked up directly in the data-ready handler by connection ID. The ID management code is placed in a new file, conn-client.c, to which all the client connection-specific code will eventually move. Note that the IDR tree gets very expensive on memory if the connection IDs are widely scattered throughout the number space, so we shall need to retire connections that have, say, an ID more than four times the maximum number of client conns away from the current allocation point to try and keep the IDs concentrated. We will also need to retire connections from an old epoch. Also note that, for the moment, a pointer to the transport has to be passed through into the ID allocation function so that we can take a BH lock to prevent a locking issue against in-BH lookup of client connections. This will go away later when RCU is used for server connections also. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
b3f57504 |
|
21-Jun-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: rxrpc_connection_lock shouldn't be a BH lock, but conn_lock is rxrpc_connection_lock shouldn't be accessed as a BH-excluding lock. It's only accessed in a few places and none of those are in BH-context. rxrpc_transport::conn_lock, however, *is* a BH-excluding lock and should be accessed so consistently. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
42886ffe |
|
16-Jun-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Pass sk_buff * rather than rxrpc_host_header * to functions Pass a pointer to struct sk_buff rather than struct rxrpc_host_header to functions so that they can in the future get at transport protocol parameters rather than just RxRPC parameters. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
cc8feb8e |
|
04-Apr-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix exclusive connection handling "Exclusive connections" are meant to be used for a single client call and then scrapped. The idea is to limit the use of the negotiated security context. The current code, however, isn't doing this: it is instead restricting the socket to a single virtual connection and doing all the calls over that. This is changed such that the socket no longer maintains a special virtual connection over which it will do all the calls, but rather gets a new one each time a new exclusive call is made. Further, using a socket option for this is a poor choice. It should be done on sendmsg with a control message marker instead so that calls can be marked exclusive individually. To that end, add RXRPC_EXCLUSIVE_CALL which, if passed to sendmsg() as a control message element, will cause the call to be done on an single-use connection. The socket option (RXRPC_EXCLUSIVE_CONNECTION) still exists and, if set, will override any lack of RXRPC_EXCLUSIVE_CALL being specified so that programs using the setsockopt() will appear to work the same. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
19ffa01c |
|
04-Apr-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Use structs to hold connection params and protocol info Define and use a structure to hold connection parameters. This makes it easier to pass multiple connection parameters around. Define and use a structure to hold protocol information used to hash a connection for lookup on incoming packet. Most of these fields will be disposed of eventually, including the duplicate local pointer. Whilst we're at it rename "proto" to "family" when referring to a protocol family. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
8c3e34a4 |
|
12-Jun-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Rename files matching ar-*.c to git rid of the "ar-" prefix Rename files matching net/rxrpc/ar-*.c to get rid of the "ar-" prefix. This will aid splitting those files by making easier to come up with new names. Note that the not all files are simply renamed from ar-X.c to X.c. The following exceptions are made: (*) ar-call.c -> call_object.c ar-ack.c -> call_event.c call_object.c is going to contain the core of the call object handling. Call event handling is all going to be in call_event.c. (*) ar-accept.c -> call_accept.c Incoming call handling is going to be here. (*) ar-connection.c -> conn_object.c ar-connevent.c -> conn_event.c The former file is going to have the basic connection object handling, but there will likely be some differentiation between client connections and service connections in additional files later. The latter file will have all the connection-level event handling. (*) ar-local.c -> local_object.c This will have the local endpoint object handling code. The local endpoint event handling code will later be split out into local_event.c. (*) ar-peer.c -> peer_object.c This will have the peer endpoint object handling code. Peer event handling code will be placed in peer_event.c (for the moment, there is none). (*) ar-error.c -> peer_event.c This will become the peer event handling code, though for the moment it's actually driven from the local endpoint's perspective. Note that I haven't renamed ar-transport.c to transport_object.c as the intention is to delete it when the rxrpc_transport struct is excised. The only file that actually has its contents changed is net/rxrpc/Makefile. net/rxrpc/ar-internal.h will need its section marker comments updating, but I'll do that in a separate patch to make it easier for git to follow the history across the rename. I may also want to rename ar-internal.h at some point - but that would mean updating all the #includes and I'd rather do that in a separate step. Signed-off-by: David Howells <dhowells@redhat.com.
|