#
4b68137a |
|
01-Feb-2024 |
David Howells <dhowells@redhat.com> |
rxrpc: Extract useful fields from a received ACK to skb priv data Extract useful fields from a received ACK packet into the skb private data early on in the process of parsing incoming packets. This makes the ACK fields available even before we've matched the ACK up to a call and will allow us to deal with path MTU discovery probe responses even after the relevant call has been completed. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: "David S. Miller" <davem@davemloft.net> cc: Eric Dumazet <edumazet@google.com> cc: Jakub Kicinski <kuba@kernel.org> cc: Paolo Abeni <pabeni@redhat.com> cc: linux-afs@lists.infradead.org cc: netdev@vger.kernel.org
|
#
153f90a0 |
|
30-Jan-2024 |
David Howells <dhowells@redhat.com> |
rxrpc: Use ktimes for call timeout tracking and set the timer lazily Track the call timeouts as ktimes rather than jiffies as the latter's granularity is too high and only set the timer at the end of the event handling function. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: "David S. Miller" <davem@davemloft.net> cc: Eric Dumazet <edumazet@google.com> cc: Jakub Kicinski <kuba@kernel.org> cc: Paolo Abeni <pabeni@redhat.com> cc: linux-afs@lists.infradead.org cc: netdev@vger.kernel.org
|
#
49489bb0 |
|
29-Jan-2024 |
David Howells <dhowells@redhat.com> |
rxrpc: Do zerocopy using MSG_SPLICE_PAGES and page frags Switch from keeping the transmission buffers in the rxrpc_txbuf struct and allocated from the slab, to allocating them using page fragment allocators (which uses raw pages), thereby allowing them to be passed to MSG_SPLICE_PAGES and avoid copying into the UDP buffers. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: "David S. Miller" <davem@davemloft.net> cc: Eric Dumazet <edumazet@google.com> cc: Jakub Kicinski <kuba@kernel.org> cc: Paolo Abeni <pabeni@redhat.com> cc: linux-afs@lists.infradead.org cc: netdev@vger.kernel.org
|
#
8985f2b0 |
|
30-Jan-2024 |
David Howells <dhowells@redhat.com> |
rxrpc: Use rxrpc_txbuf::kvec[0] instead of rxrpc_txbuf::wire Use rxrpc_txbuf::kvec[0] instead of rxrpc_txbuf::wire to gain access to the Rx protocol header. In future, the wire header will be stored in a page frag, not in the rxrpc_txbuf struct making it possible to use MSG_SPLICE_PAGES when sending it. Similarly, access the ack header as being immediately after the wire header when filling out an ACK packet. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: "David S. Miller" <davem@davemloft.net> cc: Eric Dumazet <edumazet@google.com> cc: Jakub Kicinski <kuba@kernel.org> cc: Paolo Abeni <pabeni@redhat.com> cc: linux-afs@lists.infradead.org cc: netdev@vger.kernel.org
|
#
99afb28c |
|
30-Jan-2024 |
David Howells <dhowells@redhat.com> |
rxrpc: Move rxrpc_send_ACK() to output.c with rxrpc_send_ack_packet() Move rxrpc_send_ACK() to output.c to so that it is with rxrpc_send_ack_packet() prior to merging the two. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: "David S. Miller" <davem@davemloft.net> cc: Eric Dumazet <edumazet@google.com> cc: Jakub Kicinski <kuba@kernel.org> cc: Paolo Abeni <pabeni@redhat.com> cc: linux-afs@lists.infradead.org cc: netdev@vger.kernel.org
|
#
44125d5a |
|
26-Jan-2024 |
David Howells <dhowells@redhat.com> |
rxrpc: Split up the DATA packet transmission function Split (sub)packet preparation and timestamping out of the DATA packet transmission function to make it easier to glue multiple txbufs together into a jumbo DATA packet. This will require preparation and timestamping of all the subpackets in a txbuf, and these functions provide convenient points to place the required iteration. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: "David S. Miller" <davem@davemloft.net> cc: Eric Dumazet <edumazet@google.com> cc: Jakub Kicinski <kuba@kernel.org> cc: Paolo Abeni <pabeni@redhat.com> cc: linux-afs@lists.infradead.org cc: netdev@vger.kernel.org
|
#
ff342bdc |
|
29-Jan-2024 |
David Howells <dhowells@redhat.com> |
rxrpc: Add a kvec[] to the rxrpc_txbuf struct Add a kvec[] to the rxrpc_txbuf struct to point to the contributory buffers for a packet. Start with just a single element for now, but this will be expanded later. Make the ACK sending function use it, which means that rxrpc_fill_out_ack() doesn't need to return the size of the sack table, padding and trailer. Make the data sending code use it, both in where sendmsg() packages code up into txbufs and where those txbufs are transmitted. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: "David S. Miller" <davem@davemloft.net> cc: Eric Dumazet <edumazet@google.com> cc: Jakub Kicinski <kuba@kernel.org> cc: Paolo Abeni <pabeni@redhat.com> cc: linux-afs@lists.infradead.org cc: netdev@vger.kernel.org
|
#
41b8debb |
|
29-Jan-2024 |
David Howells <dhowells@redhat.com> |
rxrpc: Note cksum in txbuf Add a field to rxrpc_txbuf in which to store the checksum to go in the header as this may get overwritten in the wire header struct when transmitting as part of a jumbo packet. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: "David S. Miller" <davem@davemloft.net> cc: Eric Dumazet <edumazet@google.com> cc: Jakub Kicinski <kuba@kernel.org> cc: Paolo Abeni <pabeni@redhat.com> cc: linux-afs@lists.infradead.org cc: netdev@vger.kernel.org
|
#
12bdff73 |
|
29-Jan-2024 |
David Howells <dhowells@redhat.com> |
rxrpc: Convert rxrpc_txbuf::flags into a mask and don't use atomics Convert the transmission buffer flags into a mask and use | and & rather than bitops functions (atomic ops are not required as only the I/O thread can manipulate them once submitted for transmission). The bottom byte can then correspond directly to the Rx protocol header flags. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: "David S. Miller" <davem@davemloft.net> cc: Eric Dumazet <edumazet@google.com> cc: Jakub Kicinski <kuba@kernel.org> cc: Paolo Abeni <pabeni@redhat.com> cc: linux-afs@lists.infradead.org cc: netdev@vger.kernel.org
|
#
ba132d84 |
|
29-Jan-2024 |
David Howells <dhowells@redhat.com> |
rxrpc: Record the Tx serial in the rxrpc_txbuf and retransmit trace Each Rx protocol packet contains a per-connection monotonically increasing serial number used to correlate outgoing messages with their replies - something that can be used for RTT calculation. Note this value in the rxrpc_txbuf struct in addition to the wire header and then log it in the rxrpc_retransmit trace for reference. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: "David S. Miller" <davem@davemloft.net> cc: Eric Dumazet <edumazet@google.com> cc: Jakub Kicinski <kuba@kernel.org> cc: Paolo Abeni <pabeni@redhat.com> cc: linux-afs@lists.infradead.org cc: netdev@vger.kernel.org
|
#
41b7fa15 |
|
02-Feb-2024 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix counting of new acks and nacks Fix the counting of new acks and nacks when parsing a packet - something that is used in congestion control. As the code stands, it merely notes if there are any nacks whereas what we really should do is compare the previous SACK table to the new one, assuming we get two successive ACK packets with nacks in them. However, we really don't want to do that if we can avoid it as the tables might not correspond directly as one may be shifted from the other - something that will only get harder to deal with once extended ACK tables come into full use (with a capacity of up to 8192). Instead, count the number of nacks shifted out of the old SACK, the number of nacks retained in the portion still active and the number of new acks and nacks in the new table then calculate what we need. Note this ends up a bit of an estimate as the Rx protocol allows acks to be withdrawn by the receiver and packets requested to be retransmitted. Fixes: d57a3a151660 ("rxrpc: Save last ACK's SACK table rather than marking txbufs") Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: "David S. Miller" <davem@davemloft.net> cc: Eric Dumazet <edumazet@google.com> cc: Jakub Kicinski <kuba@kernel.org> cc: Paolo Abeni <pabeni@redhat.com> cc: linux-afs@lists.infradead.org cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e7870cf1 |
|
02-Feb-2024 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix delayed ACKs to not set the reference serial number Fix the construction of delayed ACKs to not set the reference serial number as they can't be used as an RTT reference. Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: "David S. Miller" <davem@davemloft.net> cc: Eric Dumazet <edumazet@google.com> cc: Jakub Kicinski <kuba@kernel.org> cc: Paolo Abeni <pabeni@redhat.com> cc: linux-afs@lists.infradead.org cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f3104141 |
|
02-Feb-2024 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix generation of serial numbers to skip zero In the Rx protocol, every packet generated is marked with a per-connection monotonically increasing serial number. This number can be referenced in an ACK packet generated in response to an incoming packet - thereby allowing the sender to use this for RTT determination, amongst other things. However, if the reference field in the ACK is zero, it doesn't refer to any incoming packet (it could be a ping to find out if a packet got lost, for example) - so we shouldn't generate zero serial numbers. Fix the generation of serial numbers to retry if it comes up with a zero. Furthermore, since the serial numbers are only ever allocated within the I/O thread this connection is bound to, there's no need for atomics so remove that too. Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: "David S. Miller" <davem@davemloft.net> cc: Eric Dumazet <edumazet@google.com> cc: Jakub Kicinski <kuba@kernel.org> cc: Paolo Abeni <pabeni@redhat.com> cc: linux-afs@lists.infradead.org cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
87220143 |
|
09-Jan-2024 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix use of Don't Fragment flag rxrpc normally has the Don't Fragment flag set on the UDP packets it transmits, except when it has decided that DATA packets aren't getting through - in which case it turns it off just for the DATA transmissions. This can be a problem, however, for RESPONSE packets that convey authentication and crypto data from the client to the server as ticket may be larger than can fit in the MTU. In such a case, rxrpc gets itself into an infinite loop as the sendmsg returns an error (EMSGSIZE), which causes rxkad_send_response() to return -EAGAIN - and the CHALLENGE packet is put back on the Rx queue to retry, leading to the I/O thread endlessly attempting to perform the transmission. Fix this by disabling DF on RESPONSE packets for now. The use of DF and best data MTU determination needs reconsidering at some point in the future. Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-afs@lists.infradead.org Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/1581852.1704813048@warthog.procyon.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
d2ce4a84 |
|
26-Oct-2023 |
David Howells <dhowells@redhat.com> |
rxrpc: Create a procfile to display outstanding client conn bundles Create /proc/net/rxrpc/bundles to display outstanding rxrpc client connection bundles. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
72904d7b |
|
18-Oct-2023 |
David Howells <dhowells@redhat.com> |
rxrpc, afs: Allow afs to pin rxrpc_peer objects Change rxrpc's API such that: (1) A new function, rxrpc_kernel_lookup_peer(), is provided to look up an rxrpc_peer record for a remote address and a corresponding function, rxrpc_kernel_put_peer(), is provided to dispose of it again. (2) When setting up a call, the rxrpc_peer object used during a call is now passed in rather than being set up by rxrpc_connect_call(). For afs, this meenat passing it to rxrpc_kernel_begin_call() rather than the full address (the service ID then has to be passed in as a separate parameter). (3) A new function, rxrpc_kernel_remote_addr(), is added so that afs can get a pointer to the transport address for display purposed, and another, rxrpc_kernel_remote_srx(), to gain a pointer to the full rxrpc address. (4) The function to retrieve the RTT from a call, rxrpc_kernel_get_srtt(), is then altered to take a peer. This now returns the RTT or -1 if there are insufficient samples. (5) Rename rxrpc_kernel_get_peer() to rxrpc_kernel_call_get_peer(). (6) Provide a new function, rxrpc_kernel_get_peer(), to get a ref on a peer the caller already has. This allows the afs filesystem to pin the rxrpc_peer records that it is using, allowing faster lookups and pointer comparisons rather than comparing sockaddr_rxrpc contents. It also makes it easier to get hold of the RTT. The following changes are made to afs: (1) The addr_list struct's addrs[] elements now hold a peer struct pointer and a service ID rather than a sockaddr_rxrpc. (2) When displaying the transport address, rxrpc_kernel_remote_addr() is used. (3) The port arg is removed from afs_alloc_addrlist() since it's always overridden. (4) afs_merge_fs_addr4() and afs_merge_fs_addr6() do peer lookup and may now return an error that must be handled. (5) afs_find_server() now takes a peer pointer to specify the address. (6) afs_find_server(), afs_compare_fs_alists() and afs_merge_fs_addr[46]{} now do peer pointer comparison rather than address comparison. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
020c69c1 |
|
25-May-2023 |
David Howells <dhowells@redhat.com> |
rxrpc: Truncate UTS_RELEASE for rxrpc version UTS_RELEASE has a maximum length of 64 which can cause rxrpc_version to exceed the 65 byte message limit. Per the rx spec[1]: "If a server receives a packet with a type value of 13, and the client-initiated flag set, it should respond with a 65-byte payload containing a string that identifies the version of AFS software it is running." The current implementation causes a compile error when WERROR is turned on and/or UTS_RELEASE exceeds the length of 49 (making the version string more than 64 characters). Fix this by generating the string during module initialisation and limiting the UTS_RELEASE segment of the string does not exceed 49 chars. We need to make sure that the 64 bytes includes "linux-" at the front and " AF_RXRPC" at the back as this may be used in pattern matching. Fixes: 44ba06987c0b ("RxRPC: Handle VERSION Rx protocol packets") Reported-by: Kenny Ho <Kenny.Ho@amd.com> Link: https://lore.kernel.org/r/20230523223944.691076-1-Kenny.Ho@amd.com/ Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Kenny Ho <Kenny.Ho@amd.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Andrew Lunn <andrew@lunn.ch> cc: David Laight <David.Laight@ACULAB.COM> cc: "David S. Miller" <davem@davemloft.net> cc: Eric Dumazet <edumazet@google.com> cc: Jakub Kicinski <kuba@kernel.org> cc: Paolo Abeni <pabeni@redhat.com> cc: linux-afs@lists.infradead.org cc: netdev@vger.kernel.org Link: https://web.mit.edu/kolya/afs/rx/rx-spec [1] Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Jeffrey Altman <jaltman@auristor.com> Link: https://lore.kernel.org/r/654974.1685100894@warthog.procyon.org.uk Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
#
db099c62 |
|
28-Apr-2023 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix timeout of a call that hasn't yet been granted a channel afs_make_call() calls rxrpc_kernel_begin_call() to begin a call (which may get stalled in the background waiting for a connection to become available); it then calls rxrpc_kernel_set_max_life() to set the timeouts - but that starts the call timer so the call timer might then expire before we get a connection assigned - leading to the following oops if the call stalled: BUG: kernel NULL pointer dereference, address: 0000000000000000 ... CPU: 1 PID: 5111 Comm: krxrpcio/0 Not tainted 6.3.0-rc7-build3+ #701 RIP: 0010:rxrpc_alloc_txbuf+0xc0/0x157 ... Call Trace: <TASK> rxrpc_send_ACK+0x50/0x13b rxrpc_input_call_event+0x16a/0x67d rxrpc_io_thread+0x1b6/0x45f ? _raw_spin_unlock_irqrestore+0x1f/0x35 ? rxrpc_input_packet+0x519/0x519 kthread+0xe7/0xef ? kthread_complete_and_exit+0x1b/0x1b ret_from_fork+0x22/0x30 Fix this by noting the timeouts in struct rxrpc_call when the call is created. The timer will be started when the first packet is transmitted. It shouldn't be possible to trigger this directly from userspace through AF_RXRPC as sendmsg() will return EBUSY if the call is in the waiting-for-conn state if it dropped out of the wait due to a signal. Fixes: 9d35d880e0e4 ("rxrpc: Move client call connection to the I/O thread") Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: "David S. Miller" <davem@davemloft.net> cc: Eric Dumazet <edumazet@google.com> cc: Jakub Kicinski <kuba@kernel.org> cc: Paolo Abeni <pabeni@redhat.com> cc: linux-afs@lists.infradead.org cc: netdev@vger.kernel.org cc: linux-kernel@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e7f40f4a |
|
17-Oct-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Remove local->defrag_sem We no longer need local->defrag_sem as all DATA packet transmission is now done from one thread, so remove it. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
f21e9348 |
|
16-Oct-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Simplify ACK handling Now that general ACK transmission is done from the same thread as incoming DATA packet wrangling, there's no possibility that the SACK table will be being updated by the latter whilst the former is trying to copy it to an ACK. This means that we can safely rotate the SACK table whilst updating it without having to take a lock, rather than keeping all the bits inside it in fixed place and copying and then rotating it in the transmitter. Therefore, simplify SACK handing by keeping track of starting point in the ring and rotate slots down as we consume them. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
5bbf9533 |
|
17-Oct-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: De-atomic call->ackr_window and call->ackr_nr_unacked call->ackr_window doesn't need to be atomic as ACK generation and ACK transmission are now done in the same thread, so drop the atomic64 handling and split it into two separate members. Similarly, call->ackr_nr_unacked doesn't need to be atomic now either. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
af094824 |
|
17-Oct-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Allow a delay to be injected into packet reception If CONFIG_AF_RXRPC_DEBUG_RX_DELAY=y, then a delay is injected between packets and errors being received and them being made available to the processing code, thereby allowing the RTT to be artificially increased. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
223f5901 |
|
12-Oct-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Convert call->recvmsg_lock to a spinlock Convert call->recvmsg_lock to a spinlock as it's only ever write-locked. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
b2ba00c2 |
|
05-Jan-2023 |
Stephen Rothwell <sfr@canb.auug.org.au> |
rxrpc: replace zero-lenth array with DECLARE_FLEX_ARRAY() helper 0-length arrays are deprecated, and cause problems with bounds checking. Replace with a flexible array: In file included from include/linux/string.h:253, from include/linux/bitmap.h:11, from include/linux/cpumask.h:12, from arch/x86/include/asm/paravirt.h:17, from arch/x86/include/asm/cpuid.h:62, from arch/x86/include/asm/processor.h:19, from arch/x86/include/asm/cpufeature.h:5, from arch/x86/include/asm/thread_info.h:53, from include/linux/thread_info.h:60, from arch/x86/include/asm/preempt.h:9, from include/linux/preempt.h:78, from include/linux/percpu.h:6, from include/linux/prandom.h:13, from include/linux/random.h:153, from include/linux/net.h:18, from net/rxrpc/output.c:10: In function 'fortify_memcpy_chk', inlined from 'rxrpc_fill_out_ack' at net/rxrpc/output.c:158:2: include/linux/fortify-string.h:520:25: error: call to '__write_overflow_field' declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Werror=attribute-warning] 520 | __write_overflow_field(p_size_field, size); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Link: https://lore.kernel.org/linux-next/20230105132535.0d65378f@canb.auug.org.au/ Cc: David Howells <dhowells@redhat.com> Cc: Marc Dionne <marc.dionne@auristor.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: linux-afs@lists.infradead.org Cc: netdev@vger.kernel.org Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Kees Cook <keescook@chromium.org>
|
#
42f229c3 |
|
06-Jan-2023 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix incoming call setup race An incoming call can race with rxrpc socket destruction, leading to a leaked call. This may result in an oops when the call timer eventually expires: BUG: kernel NULL pointer dereference, address: 0000000000000874 RIP: 0010:_raw_spin_lock_irqsave+0x2a/0x50 Call Trace: <IRQ> try_to_wake_up+0x59/0x550 ? __local_bh_enable_ip+0x37/0x80 ? rxrpc_poke_call+0x52/0x110 [rxrpc] ? rxrpc_poke_call+0x110/0x110 [rxrpc] ? rxrpc_poke_call+0x110/0x110 [rxrpc] call_timer_fn+0x24/0x120 with a warning in the kernel log looking something like: rxrpc: Call 00000000ba5e571a still in use (1,SvAwtACK,1061d,0)! incurred during rmmod of rxrpc. The 1061d is the call flags: RECVMSG_READ_ALL, RX_HEARD, BEGAN_RX_TIMER, RX_LAST, EXPOSED, IS_SERVICE, RELEASED but no DISCONNECTED flag (0x800), so it's an incoming (service) call and it's still connected. The race appears to be that: (1) rxrpc_new_incoming_call() consults the service struct, checks sk_state and allocates a call - then pauses, possibly for an interrupt. (2) rxrpc_release_sock() sets RXRPC_CLOSE, nulls the service pointer, discards the prealloc and releases all calls attached to the socket. (3) rxrpc_new_incoming_call() resumes, launching the new call, including its timer and attaching it to the socket. Fix this by read-locking local->services_lock to access the AF_RXRPC socket providing the service rather than RCU in rxrpc_new_incoming_call(). There's no real need to use RCU here as local->services_lock is only write-locked by the socket side in two places: when binding and when shutting down. Fixes: 5e6ef4f1017c ("rxrpc: Make the I/O thread take over the call and local processor work") Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-afs@lists.infradead.org
|
#
9d35d880 |
|
19-Oct-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Move client call connection to the I/O thread Move the connection setup of client calls to the I/O thread so that a whole load of locking and barrierage can be eliminated. This necessitates the app thread waiting for connection to complete before it can begin encrypting data. This also completes the fix for a race that exists between call connection and call disconnection whereby the data transmission code adds the call to the peer error distribution list after the call has been disconnected (say by the rxrpc socket getting closed). The fix is to complete the process of moving call connection, data transmission and call disconnection into the I/O thread and thus forcibly serialising them. Note that the issue may predate the overhaul to an I/O thread model that were included in the merge window for v6.2, but the timing is very much changed by the change given below. Fixes: cf37b5987508 ("rxrpc: Move DATA transmission into call processor work item") Reported-by: syzbot+c22650d2844392afdcfd@syzkaller.appspotmail.com Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
0d6bf319 |
|
02-Nov-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Move the client conn cache management to the I/O thread Move the management of the client connection cache to the I/O thread rather than managing it from the namespace as an aggregate across all the local endpoints within the namespace. This will allow a load of locking to be got rid of in a future patch as only the I/O thread will be looking at the this. The downside is that the total number of cached connections on the system can get higher because the limit is now per-local rather than per-netns. We can, however, keep the number of client conns in use across the entire netfs and use that to reduce the expiration time of idle connection. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
96b4059f |
|
27-Oct-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Remove call->state_lock All the setters of call->state are now in the I/O thread and thus the state lock is now unnecessary. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
93368b6b |
|
26-Oct-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Move call state changes from recvmsg to I/O thread Move the call state changes that are made in rxrpc_recvmsg() to the I/O thread. This means that, thenceforth, only the I/O thread does this and the call state lock can be removed. This requires the Rx phase to be ended when the last packet is received, not when it is processed. Since this now changes the rxrpc call state to SUCCEEDED before we've consumed all the data from it, rxrpc_kernel_check_life() mustn't say the call is dead until the recvmsg queue is empty (unless the call has failed). Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
d41b3f5b |
|
19-Dec-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Wrap accesses to get call state to put the barrier in one place Wrap accesses to get the state of a call from outside of the I/O thread in a single place so that the barrier needed to order wrt the error code and abort code is in just that place. Also use a barrier when setting the call state and again when reading the call state such that the auxiliary completion info (error code, abort code) can be read without taking a read lock on the call state lock. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
0b9bb322 |
|
26-Oct-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Split out the call state changing functions into their own file Split out the functions that change the state of an rxrpc call into their own file. The idea being to remove anything to do with changing the state of a call directly from the rxrpc sendmsg() and recvmsg() paths and have all that done in the I/O thread only, with the ultimate aim of removing the state lock entirely. Moving the code out of sendmsg.c and recvmsg.c makes that easier to manage. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
1bab27af |
|
21-Oct-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Set up a connection bundle from a call, not rxrpc_conn_parameters Use the information now stored in struct rxrpc_call to configure the connection bundle and thence the connection, rather than using the rxrpc_conn_parameters struct. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
2953d3b8 |
|
21-Oct-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Offload the completion of service conn security to the I/O thread Offload the completion of the challenge/response cycle on a service connection to the I/O thread. After the RESPONSE packet has been successfully decrypted and verified by the work queue, offloading the changing of the call states to the I/O thread makes iteration over the conn's channel list simpler. Do this by marking the RESPONSE skbuff and putting it onto the receive queue for the I/O thread to collect. We put it on the front of the queue as we've already received the packet for it. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
f06cb291 |
|
20-Oct-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Make the set of connection IDs per local endpoint Make the set of connection IDs per local endpoint so that endpoints don't cause each other's connections to get dismissed. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
57af281e |
|
06-Oct-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Tidy up abort generation infrastructure Tidy up the abort generation infrastructure in the following ways: (1) Create an enum and string mapping table to list the reasons an abort might be generated in tracing. (2) Replace the 3-char string with the values from (1) in the places that use that to log the abort source. This gets rid of a memcpy() in the tracepoint. (3) Subsume the rxrpc_rx_eproto tracepoint with the rxrpc_abort tracepoint and use values from (1) to indicate the trace reason. (4) Always make a call to an abort function at the point of the abort rather than stashing the values into variables and using goto to get to a place where it reported. The C optimiser will collapse the calls together as appropriate. The abort functions return a value that can be returned directly if appropriate. Note that this extends into afs also at the points where that generates an abort. To aid with this, the afs sources need to #define RXRPC_TRACE_ONLY_DEFINE_ENUMS before including the rxrpc tracing header because they don't have access to the rxrpc internal structures that some of the tracepoints make use of. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
a00ce28b |
|
20-Oct-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Clean up connection abort Clean up connection abort, using the connection state_lock to gate access to change that state, and use an rxrpc_call_completion value to indicate the difference between local and remote aborts as these can be pasted directly into the call state. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
f2cce89a |
|
20-Oct-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Implement a mechanism to send an event notification to a connection Provide a means by which an event notification can be sent to a connection through such that the I/O thread can pick it up and handle it rather than doing it in a separate workqueue. This is then used to move the deferred final ACK of a call into the I/O thread rather than a separate work queue as part of the drive to do all transmission from the I/O thread. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
a343b174 |
|
12-Oct-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Only set/transmit aborts in the I/O thread Only set the abort call completion state in the I/O thread and only transmit ABORT packets from there. rxrpc_abort_call() can then be made to actually send the packet. Further, ABORT packets should only be sent if the call has been exposed to the network (ie. at least one attempted DATA transmission has occurred for it). Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
30df927b |
|
08-Oct-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Separate call retransmission from other conn events Call the rxrpc_conn_retransmit_call() directly from rxrpc_input_packet() rather than calling it via connection event handling. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
8a758d98 |
|
20-Oct-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Stash the network namespace pointer in rxrpc_local Stash the network namespace pointer in the rxrpc_local struct in addition to a pointer to the rxrpc-specific net namespace info. Use this to remove some places where the socket is passed as a parameter. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
31d35a02 |
|
15-Dec-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix the return value of rxrpc_new_incoming_call() Dan Carpenter sayeth[1]: The patch 5e6ef4f1017c: "rxrpc: Make the I/O thread take over the call and local processor work" from Jan 23, 2020, leads to the following Smatch static checker warning: net/rxrpc/io_thread.c:283 rxrpc_input_packet() warn: bool is not less than zero. Fix this (for now) by changing rxrpc_new_incoming_call() to return an int with 0 or error code rather than bool. Note that the actual return value of rxrpc_input_packet() is currently ignored. I have a separate patch to clean that up. Fixes: 5e6ef4f1017c ("rxrpc: Make the I/O thread take over the call and local processor work") Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: http://lists.infradead.org/pipermail/linux-afs/2022-December/006123.html [1] Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
608aecd1 |
|
15-Dec-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix locking issues in rxrpc_put_peer_locked() Now that rxrpc_put_local() may call kthread_stop(), it can't be called under spinlock as it might sleep. This can cause a problem in the peer keepalive code in rxrpc as it tries to avoid dropping the peer_hash_lock from the point it needs to re-add peer->keepalive_link to going round the loop again in rxrpc_peer_keepalive_dispatch(). Fix this by just dropping the lock when we don't need it and accepting that we'll have to take it again. This code is only called about every 20s for each peer, so not very often. This allows rxrpc_put_peer_unlocked() to be removed also. If triggered, this bug produces an oops like the following, as reproduced by a syzbot reproducer for a different oops[1]: BUG: sleeping function called from invalid context at kernel/sched/completion.c:101 ... RCU nest depth: 0, expected: 0 3 locks held by kworker/u9:0/50: #0: ffff88810e74a138 ((wq_completion)krxrpcd){+.+.}-{0:0}, at: process_one_work+0x294/0x636 #1: ffff8881013a7e20 ((work_completion)(&rxnet->peer_keepalive_work)){+.+.}-{0:0}, at: process_one_work+0x294/0x636 #2: ffff88817d366390 (&rxnet->peer_hash_lock){+.+.}-{2:2}, at: rxrpc_peer_keepalive_dispatch+0x2bd/0x35f ... Call Trace: <TASK> dump_stack_lvl+0x4c/0x5f __might_resched+0x2cf/0x2f2 __wait_for_common+0x87/0x1e8 kthread_stop+0x14d/0x255 rxrpc_peer_keepalive_dispatch+0x333/0x35f rxrpc_peer_keepalive_worker+0x2e9/0x449 process_one_work+0x3c1/0x636 worker_thread+0x25f/0x359 kthread+0x1a6/0x1b5 ret_from_fork+0x1f/0x30 Fixes: a275da62e8c1 ("rxrpc: Create a per-local endpoint receive queue and I/O thread") Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/0000000000002b4a9f05ef2b616f@google.com/ [1] Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8fbcc833 |
|
15-Dec-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix I/O thread startup getting skipped When starting a kthread, the __kthread_create_on_node() function, as called from kthread_run(), waits for a completion to indicate that the task_struct (or failure state) of the new kernel thread is available before continuing. This does not wait, however, for the thread function to be invoked and, indeed, will skip it if kthread_stop() gets called before it gets there. If this happens, though, kthread_run() will have returned successfully, indicating that the thread was started and returning the task_struct pointer. The actual error indication is returned by kthread_stop(). Note that this is ambiguous, as the caller cannot tell whether the -EINTR error code came from kthread() or from the thread function. This was encountered in the new rxrpc I/O thread, where if the system is being pounded hard by, say, syzbot, the check of KTHREAD_SHOULD_STOP can be delayed long enough for kthread_stop() to get called when rxrpc releases a socket - and this causes an oops because the I/O thread function doesn't get started and thus doesn't remove the rxrpc_local struct from the local_endpoints list. Fix this by using a completion to wait for the thread to actually enter rxrpc_io_thread(). This makes sure the thread can't be prematurely stopped and makes sure the relied-upon cleanup is done. Fixes: a275da62e8c1 ("rxrpc: Create a per-local endpoint receive queue and I/O thread") Reported-by: syzbot+3538a6a72efa8b059c38@syzkaller.appspotmail.com Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Hillf Danton <hdanton@sina.com> Link: https://lore.kernel.org/r/000000000000229f1505ef2b6159@google.com/ Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b0346843 |
|
30-Jan-2020 |
David Howells <dhowells@redhat.com> |
rxrpc: Transmit ACKs at the point of generation For ACKs generated inside the I/O thread, transmit the ACK at the point of generation. Where the ACK is generated outside of the I/O thread, it's offloaded to the I/O thread to transmit it. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
a2cf3264 |
|
15-Nov-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Fold __rxrpc_unuse_local() into rxrpc_unuse_local() Fold __rxrpc_unuse_local() into rxrpc_unuse_local() as the latter is now the only user of the former. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
5086d9a9 |
|
11-Nov-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Move the cwnd degradation after transmitting packets When we've gone for >1RTT without transmitting a packet, we should reduce the ssthresh and cut the cwnd by half (as suggested in RFC2861 sec 3.1). However, we may receive ACK packets in a batch and the first of these may cut the cwnd, preventing further transmission, and each subsequent one cuts the cwnd yet further, reducing it to the floor and killing performance. Fix this by moving the cwnd reset to after doing the transmission and resetting the base time such that we don't cut the cwnd by half again for at least another RTT. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
32cf8edb |
|
11-Nov-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Trace/count transmission underflows and cwnd resets Add a tracepoint to log when a cwnd reset occurs due to lack of transmission on a call. Add stat counters to count transmission underflows (ie. when we have tx window space, but sendmsg doesn't manage to keep up), cwnd resets and transmission failures. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
5e6ef4f1 |
|
23-Jan-2020 |
David Howells <dhowells@redhat.com> |
rxrpc: Make the I/O thread take over the call and local processor work Move the functions from the call->processor and local->processor work items into the domain of the I/O thread. The call event processor, now called from the I/O thread, then takes over the job of cranking the call state machine, processing incoming packets and transmitting DATA, ACK and ABORT packets. In a future patch, rxrpc_send_ACK() will transmit the ACK on the spot rather than queuing it for later transmission. The call event processor becomes purely received-skb driven. It only transmits things in response to events. We use "pokes" to queue a dummy skb to make it do things like start/resume transmitting data. Timer expiry also results in pokes. The connection event processor, becomes similar, though crypto events, such as dealing with CHALLENGE and RESPONSE packets is offloaded to a work item to avoid doing crypto in the I/O thread. The local event processor is removed and VERSION response packets are generated directly from the packet parser. Similarly, ABORTs generated in response to protocol errors will be transmitted immediately rather than being pushed onto a queue for later transmission. Changes: ======== ver #2) - Fix a couple of introduced lock context imbalances. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
393a2a20 |
|
20-Oct-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Extract the peer address from an incoming packet earlier Extract the peer address from an incoming packet earlier, at the beginning of rxrpc_input_packet() and thence pass a pointer to it to various functions that use it as part of the lookup rather than doing it on several separate paths. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
cd21effb |
|
08-Oct-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Reduce the use of RCU in packet input Shrink the region of rxrpc_input_packet() that is covered by the RCU read lock so that it only covers the connection and call lookup. This means that the bits now outside of that can call sleepable functions such as kmalloc and sendmsg. Also take a ref on the conn or call we're going to use before we drop the RCU read lock. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
cf37b598 |
|
31-Mar-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Move DATA transmission into call processor work item Move DATA transmission into the call processor work item. In a future patch, this will be called from the I/O thread rather than being itsown work item. This will allow DATA transmission to be driven directly by incoming ACKs, pokes and timers as those are processed. The Tx queue is also split: The queue of packets prepared by sendmsg is now places in call->tx_sendmsg and the packet dispatcher decants the packets into call->tx_buffer as space becomes available in the transmission window. This allows sendmsg to run ahead of the available space to try and prevent an underflow in transmission. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
f3441d41 |
|
20-Oct-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Copy client call parameters into rxrpc_call earlier Copy client call parameters into rxrpc_call earlier so that that can be used to convey them to the connection code - which can then be offloaded to the I/O thread. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
15f661dc |
|
10-Oct-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Implement a mechanism to send an event notification to a call Provide a means by which an event notification can be sent to a call such that the I/O thread can process it rather than it being done in a separate workqueue. This will allow a lot of locking to be removed. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
4041a8ff |
|
23-Jan-2020 |
David Howells <dhowells@redhat.com> |
rxrpc: Remove call->input_lock Remove call->input_lock as it was only necessary to serialise access to the state stored in the rxrpc_call struct by simultaneous softirq handlers presenting received packets. They now dump the packets in a queue and a single process-context handler now processes them. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
ff734825 |
|
10-Oct-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Move error processing into the local endpoint I/O thread Move the processing of error packets into the local endpoint I/O thread, leaving the handover from UDP to merely transfer them into the local endpoint queue. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
446b3e14 |
|
10-Oct-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Move packet reception processing into I/O thread Split the packet input handler to make the softirq side just dump the received packet into the local endpoint receive queue and then call the remainder of the input function from the I/O thread. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
a275da62 |
|
10-Oct-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Create a per-local endpoint receive queue and I/O thread Create a per-local receive queue to which, in a future patch, all incoming packets will be directed and an I/O thread that will process those packets and perform all transmission of packets. Destruction of the local endpoint is also moved from the local processor work item (which will be absorbed) to the thread. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
96b2d69b |
|
23-Jan-2020 |
David Howells <dhowells@redhat.com> |
rxrpc: Split the receive code Split the code that handles packet reception in softirq mode as a prelude to moving all the packet processing beyond routing to the appropriate call and setting up of a new call out into process context. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
3cec055c |
|
24-Nov-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Don't hold a ref for connection workqueue Currently, rxrpc gives the connection's work item a ref on the connection when it queues it - and this is called from the timer expiration function. The problem comes when queue_work() fails (ie. the work item is already queued): the timer routine must put the ref - but this may cause the cleanup code to run. This has the unfortunate effect that the cleanup code may then be run in softirq context - which means that any spinlocks it might need to touch have to be guarded to disable softirqs (ie. they need a "_bh" suffix). (1) Don't give a ref to the work item. (2) Simplify handling of service connections by adding a separate active count so that the refcount isn't also used for this. (3) Connection destruction for both client and service connections can then be cleaned up by putting rxrpc_put_connection() out of line and making a tidy progression through the destruction code (offloaded to a workqueue if put from softirq or processor function context). The RCU part of the cleanup then only deals with the freeing at the end. (4) Make rxrpc_queue_conn() return immediately if it sees the active count is -1 rather then queuing the connection. (5) Make sure that the cleanup routine waits for the work item to complete. (6) Stash the rxrpc_net pointer in the conn struct so that the rcu free routine can use it, even if the local endpoint has been freed. Unfortunately, neither the timer nor the work item can simply get around the problem by just using refcount_inc_not_zero() as the waits would still have to be done, and there would still be the possibility of having to put the ref in the expiration function. Note the connection work item is mostly going to go away with the main event work being transferred to the I/O thread, so the wait in (6) will become obsolete. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
3feda9d6 |
|
25-Nov-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Don't hold a ref for call timer or workqueue Currently, rxrpc gives the call timer a ref on the call when it starts it and this is passed along to the workqueue by the timer expiration function. The problem comes when queue_work() fails (ie. the work item is already queued): the timer routine must put the ref - but this may cause the cleanup code to run. This has the unfortunate effect that the cleanup code may then be run in softirq context - which means that any spinlocks it might need to touch have to be guarded to disable softirqs (ie. they need a "_bh" suffix). Fix this by: (1) Don't give a ref to the timer. (2) Making the expiration function not do anything if the refcount is 0. Note that this is more of an optimisation. (3) Make sure that the cleanup routine waits for timer to complete. However, this has a consequence that timer cannot give a ref to the work item. Therefore the following fixes are also necessary: (4) Don't give a ref to the work item. (5) Make the work item return asap if it sees the ref count is 0. (6) Make sure that the cleanup routine waits for the work item to complete. Unfortunately, neither the timer nor the work item can simply get around the problem by just using refcount_inc_not_zero() as the waits would still have to be done, and there would still be the possibility of having to put the ref in the expiration function. Note the call work item is going to go away with the work being transferred to the I/O thread, so the wait in (6) will become obsolete. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
fa3492ab |
|
02-Nov-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Trace rxrpc_bundle refcount Add a tracepoint for the rxrpc_bundle refcounting. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
cb0fc0c9 |
|
21-Oct-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: trace: Don't use __builtin_return_address for rxrpc_call tracing In rxrpc tracing, use enums to generate lists of points of interest rather than __builtin_return_address() for the rxrpc_call tracepoint Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
7fa25105 |
|
21-Oct-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: trace: Don't use __builtin_return_address for rxrpc_conn tracing In rxrpc tracing, use enums to generate lists of points of interest rather than __builtin_return_address() for the rxrpc_conn tracepoint Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
47c810a7 |
|
21-Oct-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: trace: Don't use __builtin_return_address for rxrpc_peer tracing In rxrpc tracing, use enums to generate lists of points of interest rather than __builtin_return_address() for the rxrpc_peer tracepoint Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
0fde882f |
|
21-Oct-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: trace: Don't use __builtin_return_address for rxrpc_local tracing In rxrpc tracing, use enums to generate lists of points of interest rather than __builtin_return_address() for the rxrpc_local tracepoint Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
2cc80086 |
|
19-Oct-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Drop rxrpc_conn_parameters from rxrpc_connection and rxrpc_bundle Remove the rxrpc_conn_parameters struct from the rxrpc_connection and rxrpc_bundle structs and emplace the members directly. These are going to get filled in from the rxrpc_call struct in future. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
e969c92c |
|
19-Oct-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Remove the [_k]net() debugging macros Remove the _net() and knet() debugging macros in favour of tracepoints. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
2ebdb26e |
|
20-Oct-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Remove the [k_]proto() debugging macros Remove the kproto() and _proto() debugging macros in preference to using tracepoints for this. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
30d95efe |
|
04-Nov-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Allocate an skcipher each time needed rather than reusing In the rxkad security class, allocate the skcipher used to do packet encryption and decription rather than allocating one up front and reusing it for each packet. Reusing the skcipher precludes doing crypto in parallel. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
1fc4fa2a |
|
03-Oct-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix congestion management rxrpc has a problem in its congestion management in that it saves the congestion window size (cwnd) from one call to another, but if this is 0 at the time is saved, then the next call may not actually manage to ever transmit anything. To this end: (1) Don't save cwnd between calls, but rather reset back down to the initial cwnd and re-enter slow-start if data transmission is idle for more than an RTT. (2) Preserve ssthresh instead, as that is a handy estimate of pipe capacity. Knowing roughly when to stop slow start and enter congestion avoidance can reduce the tendency to overshoot and drop larger amounts of packets when probing. In future, cwind growth also needs to be constrained when the window isn't being filled due to being application limited. Reported-by: Simon Wilkinson <sxw@auristor.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
6869ddb8 |
|
15-Jun-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Remove the rxtx ring The Rx/Tx ring is no longer used, so remove it. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
d57a3a15 |
|
07-May-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Save last ACK's SACK table rather than marking txbufs Improve the tracking of which packets need to be transmitted by saving the last ACK packet that we receive that has a populated soft-ACK table rather than marking packets. Then we can step through the soft-ACK table and look at the packets we've transmitted beyond that to determine which packets we might want to retransmit. We also look at the highest serial number that has been acked to try and guess which packets we've transmitted the peer is likely to have seen. If necessary, we send a ping to retrieve that number. One downside that might be a problem is that we can't then compare the previous acked/unacked state so easily in rxrpc_input_soft_acks() - which is a potential problem for the slow-start algorithm. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
4e76bd40 |
|
06-May-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Remove call->lock call->lock is no longer necessary, so remove it. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
a4ea4c47 |
|
31-Mar-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Don't use a ring buffer for call Tx queue Change the way the Tx queueing works to make the following ends easier to achieve: (1) The filling of packets, the encryption of packets and the transmission of packets can be handled in parallel by separate threads, rather than rxrpc_sendmsg() allocating, filling, encrypting and transmitting each packet before moving onto the next one. (2) Get rid of the fixed-size ring which sets a hard limit on the number of packets that can be retained in the ring. This allows the number of packets to increase without having to allocate a very large ring or having variable-sized rings. [Note: the downside of this is that it's then less efficient to locate a packet for retransmission as we then have to step through a list and examine each buffer in the list.] (3) Allow the filler/encrypter to run ahead of the transmission window. (4) Make it easier to do zero copy UDP from the packet buffers. (5) Make it easier to do zero copy from userspace to the packet buffers - and thence to UDP (only if for unauthenticated connections). To that end, the following changes are made: (1) Use the new rxrpc_txbuf struct instead of sk_buff for keeping packets to be transmitted in. This allows them to be placed on multiple queues simultaneously. An sk_buff isn't really necessary as it's never passed on to lower-level networking code. (2) Keep the transmissable packets in a linked list on the call struct rather than in a ring. As a consequence, the annotation buffer isn't used either; rather a flag is set on the packet to indicate ackedness. (3) Use the RXRPC_CALL_TX_LAST flag to indicate that the last packet to be transmitted has been queued. Add RXRPC_CALL_TX_ALL_ACKED to indicate that all packets up to and including the last got hard acked. (4) Wire headers are now stored in the txbuf rather than being concocted on the stack and they're stored immediately before the data, thereby allowing zerocopy of a single span. (5) Don't bother with instant-resend on transmission failure; rather, leave it for a timer or an ACK packet to trigger. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
5d7edbc9 |
|
27-Aug-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Get rid of the Rx ring Get rid of the Rx ring and replace it with a pair of queues instead. One queue gets the packets that are in-sequence and are ready for processing by recvmsg(); the other queue gets the out-of-sequence packets for addition to the first queue as the holes get filled. The annotation ring is removed and replaced with a SACK table. The SACK table has the bits set that correspond exactly to the sequence number of the packet being acked. The SACK ring is copied when an ACK packet is being assembled and rotated so that the first ACK is in byte 0. Flow control handling is altered so that packets that are moved to the in-sequence queue are hard-ACK'd even before they're consumed - and then the Rx window size in the ACK packet (rsize) is shrunk down to compensate (even going to 0 if the window is full). Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
d4d02d8b |
|
07-Oct-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Clone received jumbo subpackets and queue separately Split up received jumbo packets into separate skbuffs by cloning the original skbuff for each subpacket and setting the offset and length of the data in that subpacket in the skbuff's private data. The subpackets are then placed on the recvmsg queue separately. The security class then gets to revise the offset and length to remove its metadata. If we fail to clone a packet, we just drop it and let the peer resend it. The original packet gets used for the final subpacket. This should make it easier to handle parallel decryption of the subpackets. It also simplifies the handling of lost or misordered packets in the queuing/buffering loop as the possibility of overlapping jumbo packets no longer needs to be considered. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
530403d9 |
|
30-Jan-2020 |
David Howells <dhowells@redhat.com> |
rxrpc: Clean up ACK handling Clean up the rxrpc_propose_ACK() function. If deferred PING ACK proposal is split out, it's only really needed for deferred DELAY ACKs. All other ACKs, bar terminal IDLE ACK are sent immediately. The deferred IDLE ACK submission can be handled by conversion of a DELAY ACK into an IDLE ACK if there's nothing to be SACK'd. Also, because there's a delay between an ACK being generated and being transmitted, it's possible that other ACKs of the same type will be generated during that interval. Apart from the ACK time and the serial number responded to, most of the ACK body, including window and SACK parameters, are not filled out till the point of transmission - so we can avoid generating a new ACK if there's one pending that will cover the SACK data we need to convey. Therefore, don't propose a new DELAY or IDLE ACK for a call if there's one already pending. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
72f0c6fb |
|
30-Jan-2020 |
David Howells <dhowells@redhat.com> |
rxrpc: Allocate ACK records at proposal and queue for transmission Allocate rxrpc_txbuf records for ACKs and put onto a queue for the transmitter thread to dispatch. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
02a19356 |
|
05-Apr-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Define rxrpc_txbuf struct to carry data to be transmitted Define a struct, rxrpc_txbuf, to carry data to be transmitted instead of a socket buffer so that it can be placed onto multiple queues at once. This also allows the data buffer to be in the same allocation as the internal data. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
a11e6ff9 |
|
07-Oct-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Remove call->tx_phase Remove call->tx_phase as it's only ever set. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
27f699cc |
|
07-Oct-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Remove the flags from the rxrpc_skb tracepoint Remove the flags from the rxrpc_skb tracepoint as we're no longer going to be using this for the transmission buffers and so marking which are transmission buffers isn't going to be necessary. Note that this also remove the rxrpc skb flag that indicates if this is a transmission buffer and so the count is not updated for the moment. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
b6c66c43 |
|
12-Oct-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Use the core ICMP/ICMP6 parsers Make rxrpc_encap_rcv_err() pass the ICMP/ICMP6 skbuff to ip_icmp_error() or ipv6_icmp_error() as appropriate to do the parsing rather than trying to do it in rxrpc. This pushes an error report onto the UDP socket's error queue and calls ->sk_error_report() from which point rxrpc can pick it up. It would be preferable to steal the packet directly from ip*_icmp_error() rather than letting it get queued, but this is probably good enough. Also note that __udp4_lib_err() calls sk_error_report() twice in some cases. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
42fb06b3 |
|
12-Oct-2022 |
David Howells <dhowells@redhat.com> |
net: Change the udp encap_err_rcv to allow use of {ip,ipv6}_icmp_error() Change the udp encap_err_rcv signature to match ip_icmp_error() and ipv6_icmp_error() so that those can be used from the called function and export them. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: netdev@vger.kernel.org
|
#
f7fa5242 |
|
18-Aug-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Record stats for why the REQUEST-ACK flag is being set Record stats for why the REQUEST-ACK flag is being set. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
f2a676d1 |
|
11-May-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Record statistics about ACK types Record statistics about the different types of ACKs that have been transmitted and received and the number of ACKs that have been filled out and transmitted or that have been skipped. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
b0154246 |
|
11-May-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Add stats procfile and DATA packet stats Add a procfile, /proc/net/rxrpc/stats, to display some statistics about what rxrpc has been doing. Writing a blank line to the stats file will clear the increment-only counters. Allocated resource counters don't get cleared. Add some counters to count various things about DATA packets, including the number created, transmitted and retransmitted and the number received, the number of ACK-requests markings and the number of jumbo packets received. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
589a0c1e |
|
28-Apr-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Track highest acked serial Keep track of the highest DATA serial number that has been acked by the peer for future purposes. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
|
#
3bcd6c7e |
|
16-Nov-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix race between conn bundle lookup and bundle removal [ZDI-CAN-15975] After rxrpc_unbundle_conn() has removed a connection from a bundle, it checks to see if there are any conns with available channels and, if not, removes and attempts to destroy the bundle. Whilst it does check after grabbing client_bundles_lock that there are no connections attached, this races with rxrpc_look_up_bundle() retrieving the bundle, but not attaching a connection for the connection to be attached later. There is therefore a window in which the bundle can get destroyed before we manage to attach a new connection to it. Fix this by adding an "active" counter to struct rxrpc_bundle: (1) rxrpc_connect_call() obtains an active count by prepping/looking up a bundle and ditches it before returning. (2) If, during rxrpc_connect_call(), a connection is added to the bundle, this obtains an active count, which is held until the connection is discarded. (3) rxrpc_deactivate_bundle() is created to drop an active count on a bundle and destroy it when the active count reaches 0. The active count is checked inside client_bundles_lock() to prevent a race with rxrpc_look_up_bundle(). (4) rxrpc_unbundle_conn() then calls rxrpc_deactivate_bundle(). Fixes: 245500d853e9 ("rxrpc: Rewrite the client connection manager") Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-15975 Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: zdi-disclosures@trendmicro.com cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9621e74f |
|
09-Sep-2022 |
Gaosheng Cui <cuigaosheng1@huawei.com> |
rxrpc: remove rxrpc_max_call_lifetime declaration rxrpc_max_call_lifetime has been removed since commit a158bdd3247b ("rxrpc: Fix call timeouts"), so remove it. Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Link: https://lore.kernel.org/r/20220909064042.1149404-1-cuigaosheng1@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
ac56a0b4 |
|
26-Aug-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix ICMP/ICMP6 error handling Because rxrpc pretends to be a tunnel on top of a UDP/UDP6 socket, allowing it to siphon off UDP packets early in the handling of received UDP packets thereby avoiding the packet going through the UDP receive queue, it doesn't get ICMP packets through the UDP ->sk_error_report() callback. In fact, it doesn't appear that there's any usable option for getting hold of ICMP packets. Fix this by adding a new UDP encap hook to distribute error messages for UDP tunnels. If the hook is set, then the tunnel driver will be able to see ICMP packets. The hook provides the offset into the packet of the UDP header of the original packet that caused the notification. An alternative would be to call the ->error_handler() hook - but that requires that the skbuff be cloned (as ip_icmp_error() or ipv6_cmp_error() do, though isn't really necessary or desirable in rxrpc's case is we want to parse them there and then, not queue them). Changes ======= ver #3) - Fixed an uninitialised variable. ver #2) - Fixed some missing CONFIG_AF_RXRPC_IPV6 conditionals. Fixes: 5271953cad31 ("rxrpc: Use the UDP encap_rcv hook") Signed-off-by: David Howells <dhowells@redhat.com>
|
#
9a3dedcf |
|
21-May-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix decision on when to generate an IDLE ACK Fix the decision on when to generate an IDLE ACK by keeping a count of the number of packets we've received, but not yet soft-ACK'd, and the number of packets we've processed, but not yet hard-ACK'd, rather than trying to keep track of which DATA sequence numbers correspond to those points. We then generate an ACK when either counter exceeds 2. The counters are both cleared when we transcribe the information into any sort of ACK packet for transmission. IDLE and DELAY ACKs are skipped if both counters are 0 (ie. no change). Fixes: 805b21b929e2 ("rxrpc: Send an ACK after every few DATA packets we receive") Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
81524b63 |
|
21-May-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Don't let ack.previousPacket regress The previousPacket field in the rx ACK packet should never go backwards - it's now the highest DATA sequence number received, not the last on received (it used to be used for out of sequence detection). Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code") Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8940ba3c |
|
21-May-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix overlapping ACK accounting Fix accidental overlapping of Rx-phase ACK accounting with Tx-phase ACK accounting through variables shared between the two. call->acks_* members refer to ACKs received in the Tx phase and call->ackr_* members to ACKs sent/to be sent during the Rx phase. Fixes: 1a2391c30c0b ("rxrpc: Fix detection of out of order acks") Signed-off-by: David Howells <dhowells@redhat.com> cc: Jeffrey Altman <jaltman@auristor.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ad25f5cb |
|
21-May-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix locking issue There's a locking issue with the per-netns list of calls in rxrpc. The pieces of code that add and remove a call from the list use write_lock() and the calls procfile uses read_lock() to access it. However, the timer callback function may trigger a removal by trying to queue a call for processing and finding that it's already queued - at which point it has a spare refcount that it has to do something with. Unfortunately, if it puts the call and this reduces the refcount to 0, the call will be removed from the list. Unfortunately, since the _bh variants of the locking functions aren't used, this can deadlock. ================================ WARNING: inconsistent lock state 5.18.0-rc3-build4+ #10 Not tainted -------------------------------- inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. ksoftirqd/2/25 [HC0[0]:SC1[1]:HE1:SE0] takes: ffff888107ac4038 (&rxnet->call_lock){+.?.}-{2:2}, at: rxrpc_put_call+0x103/0x14b {SOFTIRQ-ON-W} state was registered at: ... Possible unsafe locking scenario: CPU0 ---- lock(&rxnet->call_lock); <Interrupt> lock(&rxnet->call_lock); *** DEADLOCK *** 1 lock held by ksoftirqd/2/25: #0: ffff8881008ffdb0 ((&call->timer)){+.-.}-{0:0}, at: call_timer_fn+0x5/0x23d Changes ======= ver #2) - Changed to using list_next_rcu() rather than rcu_dereference() directly. Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a0575429 |
|
21-May-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Use refcount_t rather than atomic_t Move to using refcount_t rather than atomic_t for refcounts in rxrpc. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
33912c26 |
|
21-May-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Allow list of in-use local UDP endpoints to be viewed in /proc Allow the list of in-use local UDP endpoints in the current network namespace to be viewed in /proc. To aid with this, the endpoint list is converted to an hlist and RCU-safe manipulation is used so that the list can be read with only the RCU read lock held. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4a7f62f9 |
|
30-Mar-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix call timer start racing with call destruction The rxrpc_call struct has a timer used to handle various timed events relating to a call. This timer can get started from the packet input routines that are run in softirq mode with just the RCU read lock held. Unfortunately, because only the RCU read lock is held - and neither ref or other lock is taken - the call can start getting destroyed at the same time a packet comes in addressed to that call. This causes the timer - which was already stopped - to get restarted. Later, the timer dispatch code may then oops if the timer got deallocated first. Fix this by trying to take a ref on the rxrpc_call struct and, if successful, passing that ref along to the timer. If the timer was already running, the ref is discarded. The timer completion routine can then pass the ref along to the call's work item when it queues it. If the timer or work item where already queued/running, the extra ref is discarded. Fixes: a158bdd3247b ("rxrpc: Fix call timeouts") Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> Tested-by: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: http://lists.infradead.org/pipermail/linux-afs/2022-March/005073.html Link: https://lore.kernel.org/r/164865115696.2943015.11097991776647323586.stgit@warthog.procyon.org.uk Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
#
d7d775b1 |
|
15-Sep-2020 |
David Howells <dhowells@redhat.com> |
rxrpc: Ask the security class how much space to allow in a packet Ask the security class how much header and trailer space to allow for when allocating a packet, given how much data is remaining. This will allow the rxgk security class to stick both a trailer in as well as a header as appropriate in the future. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
521bb304 |
|
22-Sep-2020 |
David Howells <dhowells@redhat.com> |
rxrpc: Organise connection security to use a union Organise the security information in the rxrpc_connection struct to use a union to allow for different data for different security classes. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
f4bdf3d6 |
|
17-Sep-2020 |
David Howells <dhowells@redhat.com> |
rxrpc: Don't reserve security header in Tx DATA skbuff Insert the security header into the skbuff representing a DATA packet to be transmitted rather than using skb_reserve() when the packet is allocated. This makes it easier to apply crypto that spans the security header and the data, particularly in the upcoming RxGK class where we have a common encrypt-and-checksum function that is used in a number of circumstances. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
8d47a43c |
|
15-Sep-2020 |
David Howells <dhowells@redhat.com> |
rxrpc: Merge prime_packet_security into init_connection_security Merge the ->prime_packet_security() into the ->init_connection_security() hook as they're always called together. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
d5953f65 |
|
27-Sep-2020 |
David Howells <dhowells@redhat.com> |
rxrpc: Allow security classes to give more info on server keys Allow a security class to give more information on an rxrpc_s-type key when it is viewed in /proc/keys. This will allow the upcoming RxGK security class to show the enctype name here. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
12da59fc |
|
16-Sep-2020 |
David Howells <dhowells@redhat.com> |
rxrpc: Hand server key parsing off to the security class Hand responsibility for parsing a server key off to the security class. We can determine which class from the description. This is necessary as rxgk server keys have different lookup requirements and different content requirements (dependent on crypto type) to those of rxkad server keys. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
ca7fb100 |
|
16-Sep-2020 |
David Howells <dhowells@redhat.com> |
rxrpc: Split the server key type (rxrpc_s) into its own file Split the server private key type (rxrpc_s) out into its own file rather than mingling it with the authentication/client key type (rxrpc) since they don't really bear any relation. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
ec832bd0 |
|
16-Sep-2020 |
David Howells <dhowells@redhat.com> |
rxrpc: Don't retain the server key in the connection Don't retain a pointer to the server key in the connection, but rather get it on demand when the server has to deal with a response packet. This is necessary to implement RxGK (GSSAPI-mediated transport class), where we can't know which key we'll need until we've challenged the client and got back the response. This also means that we don't need to do a key search in the accept path in softirq mode. Also, whilst we're at it, allow the security class to ask for a kvno and encoding-type variant of a server key as RxGK needs different keys for different encoding types. Keys of this type have an extra bit in the description: "<service-id>:<security-index>:<kvno>:<enctype>" Signed-off-by: David Howells <dhowells@redhat.com>
|
#
41057ebd |
|
16-Sep-2020 |
David Howells <dhowells@redhat.com> |
rxrpc: Support keys with multiple authentication tokens rxrpc-type keys can have multiple tokens attached for different security classes. Currently, rxrpc always picks the first one, whether or not the security class it indicates is supported. Add preliminary support for choosing which security class will be used (this will need to be directed from a higher layer) and go through the tokens to find one that's supported. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
ddc7834a |
|
30-Sep-2020 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix loss of final ack on shutdown Fix the loss of transmission of a call's final ack when a socket gets shut down. This means that the server will retransmit the last data packet or send a ping ack and then get an ICMP indicating the port got closed. The server will then view this as a failure. Fixes: 3136ef49a14c ("rxrpc: Delay terminal ACK transmission on a client call") Signed-off-by: David Howells <dhowells@redhat.com>
|
#
2d914c1b |
|
30-Sep-2020 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix accept on a connection that need securing When a new incoming call arrives at an userspace rxrpc socket on a new connection that has a security class set, the code currently pushes it onto the accept queue to hold a ref on it for the socket. This doesn't work, however, as recvmsg() pops it off, notices that it's in the SERVER_SECURING state and discards the ref. This means that the call runs out of refs too early and the kernel oopses. By contrast, a kernel rxrpc socket manually pre-charges the incoming call pool with calls that already have user call IDs assigned, so they are ref'd by the call tree on the socket. Change the mode of operation for userspace rxrpc server sockets to work like this too. Although this is a UAPI change, server sockets aren't currently functional. Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code") Signed-off-by: David Howells <dhowells@redhat.com>
|
#
8806245a |
|
14-Sep-2020 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix rxrpc_bundle::alloc_error to be signed The alloc_error field in the rxrpc_bundle struct should be signed as it has negative error codes assigned to it. Checks directly on it may then fail, and may produce a warning like this: net/rxrpc/conn_client.c:662 rxrpc_wait_for_channel() warn: 'bundle->alloc_error' is unsigned Fixes: 245500d853e9 ("rxrpc: Rewrite the client connection manager") Reported-by Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David Howells <dhowells@redhat.com>
|
#
245500d8 |
|
01-Jul-2020 |
David Howells <dhowells@redhat.com> |
rxrpc: Rewrite the client connection manager Rewrite the rxrpc client connection manager so that it can support multiple connections for a given security key to a peer. The following changes are made: (1) For each open socket, the code currently maintains an rbtree with the connections placed into it, keyed by communications parameters. This is tricky to maintain as connections can be culled from the tree or replaced within it. Connections can require replacement for a number of reasons, e.g. their IDs span too great a range for the IDR data type to represent efficiently, the call ID numbers on that conn would overflow or the conn got aborted. This is changed so that there's now a connection bundle object placed in the tree, keyed on the same parameters. The bundle, however, does not need to be replaced. (2) An rxrpc_bundle object can now manage the available channels for a set of parallel connections. The lock that manages this is moved there from the rxrpc_connection struct (channel_lock). (3) There'a a dummy bundle for all incoming connections to share so that they have a channel_lock too. It might be better to give each incoming connection its own bundle. This bundle is not needed to manage which channels incoming calls are made on because that's the solely at whim of the client. (4) The restrictions on how many client connections are around are removed. Instead, a previous patch limits the number of client calls that can be allocated. Ordinarily, client connections are reaped after 2 minutes on the idle queue, but when more than a certain number of connections are in existence, the reaper starts reaping them after 2s of idleness instead to get the numbers back down. It could also be made such that new call allocations are forced to wait until the number of outstanding connections subsides. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
b7a7d674 |
|
02-Jul-2020 |
David Howells <dhowells@redhat.com> |
rxrpc: Impose a maximum number of client calls Impose a maximum on the number of client rxrpc calls that are allowed simultaneously. This will be in lieu of a maximum number of client connections as this is easier to administed as, unlike connections, calls aren't reusable (to be changed in a subsequent patch).. This doesn't affect the limits on service calls and connections. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
4700c4d8 |
|
19-Aug-2020 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix loss of RTT samples due to interposed ACK The Rx protocol has a mechanism to help generate RTT samples that works by a client transmitting a REQUESTED-type ACK when it receives a DATA packet that has the REQUEST_ACK flag set. The peer, however, may interpose other ACKs before transmitting the REQUESTED-ACK, as can be seen in the following trace excerpt: rxrpc_tx_data: c=00000044 DATA d0b5ece8:00000001 00000001 q=00000001 fl=07 rxrpc_rx_ack: c=00000044 00000001 PNG r=00000000 f=00000002 p=00000000 n=0 rxrpc_rx_ack: c=00000044 00000002 REQ r=00000001 f=00000002 p=00000001 n=0 ... DATA packet 1 (q=xx) has REQUEST_ACK set (bit 1 of fl=xx). The incoming ping (labelled PNG) hard-acks the request DATA packet (f=xx exceeds the sequence number of the DATA packet), causing it to be discarded from the Tx ring. The ACK that was requested (labelled REQ, r=xx references the serial of the DATA packet) comes after the ping, but the sk_buff holding the timestamp has gone and the RTT sample is lost. This is particularly noticeable on RPC calls used to probe the service offered by the peer. A lot of peers end up with an unknown RTT because we only ever sent a single RPC. This confuses the server rotation algorithm. Fix this by caching the information about the outgoing packet in RTT calculations in the rxrpc_call struct rather than looking in the Tx ring. A four-deep buffer is maintained and both REQUEST_ACK-flagged DATA and PING-ACK transmissions are recorded in there. When the appropriate response ACK is received, the buffer is checked for a match and, if found, an RTT sample is recorded. If a received ACK refers to a packet with a later serial number than an entry in the cache, that entry is presumed lost and the entry is made available to record a new transmission. ACKs types other than REQUESTED-type and PING-type cause any matching sample to be cancelled as they don't necessarily represent a useful measurement. If there's no space in the buffer on ping/data transmission, the sample base is discarded. Fixes: 50235c4b5a2f ("rxrpc: Obtain RTT data by requesting ACKs on DATA packets") Signed-off-by: David Howells <dhowells@redhat.com>
|
#
a7b75c5a |
|
23-Jul-2020 |
Christoph Hellwig <hch@lst.de> |
net: pass a sockptr_t into ->setsockopt Rework the remaining setsockopt code to pass a sockptr_t instead of a plain user pointer. This removes the last remaining set_fs(KERNEL_DS) outside of architecture specific code. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Stefan Schmidt <stefan@datenfreihafen.org> [ieee802154] Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3067bf8c |
|
03-Jun-2020 |
David Howells <dhowells@redhat.com> |
rxrpc: Move the call completion handling out of line Move the handling of call completion out of line so that the next patch can add more code in that area. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
|
#
c410bf01 |
|
11-May-2020 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix the excessive initial retransmission timeout rxrpc currently uses a fixed 4s retransmission timeout until the RTT is sufficiently sampled. This can cause problems with some fileservers with calls to the cache manager in the afs filesystem being dropped from the fileserver because a packet goes missing and the retransmission timeout is greater than the call expiry timeout. Fix this by: (1) Copying the RTT/RTO calculation code from Linux's TCP implementation and altering it to fit rxrpc. (2) Altering the various users of the RTT to make use of the new SRTT value. (3) Replacing the use of rxrpc_resend_timeout to use the calculated RTO value instead (which is needed in jiffies), along with a backoff. Notes: (1) rxrpc provides RTT samples by matching the serial numbers on outgoing DATA packets that have the RXRPC_REQUEST_ACK set and PING ACK packets against the reference serial number in incoming REQUESTED ACK and PING-RESPONSE ACK packets. (2) Each packet that is transmitted on an rxrpc connection gets a new per-connection serial number, even for retransmissions, so an ACK can be cross-referenced to a specific trigger packet. This allows RTT information to be drawn from retransmitted DATA packets also. (3) rxrpc maintains the RTT/RTO state on the rxrpc_peer record rather than on an rxrpc_call because many RPC calls won't live long enough to generate more than one sample. (4) The calculated SRTT value is in units of 8ths of a microsecond rather than nanoseconds. The (S)RTT and RTO values are displayed in /proc/net/rxrpc/peers. Fixes: 17926a79320a ([AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both"") Signed-off-by: David Howells <dhowells@redhat.com>
|
#
7d7587db |
|
12-Mar-2020 |
David Howells <dhowells@redhat.com> |
afs: Fix client call Rx-phase signal handling Fix the handling of signals in client rxrpc calls made by the afs filesystem. Ignore signals completely, leaving call abandonment or connection loss to be detected by timeouts inside AF_RXRPC. Allowing a filesystem call to be interrupted after the entire request has been transmitted and an abort sent means that the server may or may not have done the action - and we don't know. It may even be worse than that for older servers. Fixes: bc5e3a546d55 ("rxrpc: Use MSG_WAITALL to tell sendmsg() to temporarily ignore signals") Signed-off-by: David Howells <dhowells@redhat.com>
|
#
e138aa7d |
|
13-Mar-2020 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix call interruptibility handling Fix the interruptibility of kernel-initiated client calls so that they're either only interruptible when they're waiting for a call slot to come available or they're not interruptible at all. Either way, they're not interruptible during transmission. This should help prevent StoreData calls from being interrupted when writeback is in progress. It doesn't, however, handle interruption during the receive phase. Userspace-initiated calls are still interruptable. After the signal has been handled, sendmsg() will return the amount of data copied out of the buffer and userspace can perform another sendmsg() call to continue transmission. Fixes: bc5e3a546d55 ("rxrpc: Use MSG_WAITALL to tell sendmsg() to temporarily ignore signals") Signed-off-by: David Howells <dhowells@redhat.com>
|
#
5273a191 |
|
30-Jan-2020 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix NULL pointer deref due to call->conn being cleared on disconnect When a call is disconnected, the connection pointer from the call is cleared to make sure it isn't used again and to prevent further attempted transmission for the call. Unfortunately, there might be a daemon trying to use it at the same time to transmit a packet. Fix this by keeping call->conn set, but setting a flag on the call to indicate disconnection instead. Remove also the bits in the transmission functions where the conn pointer is checked and a ref taken under spinlock as this is now redundant. Fixes: 8d94aa381dab ("rxrpc: Calls shouldn't hold socket refs") Signed-off-by: David Howells <dhowells@redhat.com>
|
#
04d36d74 |
|
30-Jan-2020 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix missing active use pinning of rxrpc_local object The introduction of a split between the reference count on rxrpc_local objects and the usage count didn't quite go far enough. A number of kernel work items need to make use of the socket to perform transmission. These also need to get an active count on the local object to prevent the socket from being closed. Fix this by getting the active count in those places. Also split out the raw active count get/put functions as these places tend to hold refs on the rxrpc_local object already, so getting and putting an extra object ref is just a waste of time. The problem can lead to symptoms like: BUG: kernel NULL pointer dereference, address: 0000000000000018 .. CPU: 2 PID: 818 Comm: kworker/u9:0 Not tainted 5.5.0-fscache+ #51 ... RIP: 0010:selinux_socket_sendmsg+0x5/0x13 ... Call Trace: security_socket_sendmsg+0x2c/0x3e sock_sendmsg+0x1a/0x46 rxrpc_send_keepalive+0x131/0x1ae rxrpc_peer_keepalive_worker+0x219/0x34b process_one_work+0x18e/0x271 worker_thread+0x1a3/0x247 kthread+0xe6/0xeb ret_from_fork+0x1f/0x30 Fixes: 730c5fd42c1e ("rxrpc: Fix local endpoint refcounting") Signed-off-by: David Howells <dhowells@redhat.com>
|
#
063c60d3 |
|
20-Dec-2019 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix missing security check on incoming calls Fix rxrpc_new_incoming_call() to check that we have a suitable service key available for the combination of service ID and security class of a new incoming call - and to reject calls for which we don't. This causes an assertion like the following to appear: rxrpc: Assertion failed - 6(0x6) == 12(0xc) is false kernel BUG at net/rxrpc/call_object.c:456! Where call->state is RXRPC_CALL_SERVER_SECURING (6) rather than RXRPC_CALL_COMPLETE (12). Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code") Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com>
|
#
f9c32435 |
|
30-Oct-2019 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix handling of last subpacket of jumbo packet When rxrpc_recvmsg_data() sets the return value to 1 because it's drained all the data for the last packet, it checks the last-packet flag on the whole packet - but this is wrong, since the last-packet flag is only set on the final subpacket of the last jumbo packet. This means that a call that receives its last packet in a jumbo packet won't complete properly. Fix this by having rxrpc_locate_data() determine the last-packet state of the subpacket it's looking at and passing that back to the caller rather than having the caller look in the packet header. The caller then needs to cache this in the rxrpc_call struct as rxrpc_locate_data() isn't then called again for this packet. Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code") Fixes: e2de6c404898 ("rxrpc: Use info in skbuff instead of reparsing a jumbo packet") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
91fcfbe8 |
|
07-Oct-2019 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix call crypto state cleanup Fix the cleanup of the crypto state on a call after the call has been disconnected. As the call has been disconnected, its connection ref has been discarded and so we can't go through that to get to the security ops table. Fix this by caching the security ops pointer in the rxrpc_call struct and using that when freeing the call security state. Also use this in other places we're dealing with call-specific security. The symptoms look like: BUG: KASAN: use-after-free in rxrpc_release_call+0xb2d/0xb60 net/rxrpc/call_object.c:481 Read of size 8 at addr ffff888062ffeb50 by task syz-executor.5/4764 Fixes: 1db88c534371 ("rxrpc: Fix -Wframe-larger-than= warnings from on-stack crypto") Reported-by: syzbot+eed305768ece6682bb7f@syzkaller.appspotmail.com Signed-off-by: David Howells <dhowells@redhat.com>
|
#
d12040b6 |
|
29-Aug-2019 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix lack of conn cleanup when local endpoint is cleaned up [ver #2] When a local endpoint is ceases to be in use, such as when the kafs module is unloaded, the kernel will emit an assertion failure if there are any outstanding client connections: rxrpc: Assertion failed ------------[ cut here ]------------ kernel BUG at net/rxrpc/local_object.c:433! and even beyond that, will evince other oopses if there are service connections still present. Fix this by: (1) Removing the triggering of connection reaping when an rxrpc socket is released. These don't actually clean up the connections anyway - and further, the local endpoint may still be in use through another socket. (2) Mark the local endpoint as dead when we start the process of tearing it down. (3) When destroying a local endpoint, strip all of its client connections from the idle list and discard the ref on each that the list was holding. (4) When destroying a local endpoint, call the service connection reaper directly (rather than through a workqueue) to immediately kill off all outstanding service connections. (5) Make the service connection reaper reap connections for which the local endpoint is marked dead. Only after destroying the connections can we close the socket lest we get an oops in a workqueue that's looking at a connection or a peer. Fixes: 3d18cbb7fd0c ("rxrpc: Fix conn expiry timers") Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d0d5c0cd |
|
27-Aug-2019 |
David Howells <dhowells@redhat.com> |
rxrpc: Use skb_unshare() rather than skb_cow_data() The in-place decryption routines in AF_RXRPC's rxkad security module currently call skb_cow_data() to make sure the data isn't shared and that the skb can be written over. This has a problem, however, as the softirq handler may be still holding a ref or the Rx ring may be holding multiple refs when skb_cow_data() is called in rxkad_verify_packet() - and so skb_shared() returns true and __pskb_pull_tail() dislikes that. If this occurs, something like the following report will be generated. kernel BUG at net/core/skbuff.c:1463! ... RIP: 0010:pskb_expand_head+0x253/0x2b0 ... Call Trace: __pskb_pull_tail+0x49/0x460 skb_cow_data+0x6f/0x300 rxkad_verify_packet+0x18b/0xb10 [rxrpc] rxrpc_recvmsg_data.isra.11+0x4a8/0xa10 [rxrpc] rxrpc_kernel_recv_data+0x126/0x240 [rxrpc] afs_extract_data+0x51/0x2d0 [kafs] afs_deliver_fs_fetch_data+0x188/0x400 [kafs] afs_deliver_to_call+0xac/0x430 [kafs] afs_wait_for_call_to_complete+0x22f/0x3d0 [kafs] afs_make_call+0x282/0x3f0 [kafs] afs_fs_fetch_data+0x164/0x300 [kafs] afs_fetch_data+0x54/0x130 [kafs] afs_readpages+0x20d/0x340 [kafs] read_pages+0x66/0x180 __do_page_cache_readahead+0x188/0x1a0 ondemand_readahead+0x17d/0x2e0 generic_file_read_iter+0x740/0xc10 __vfs_read+0x145/0x1a0 vfs_read+0x8c/0x140 ksys_read+0x4a/0xb0 do_syscall_64+0x43/0xf0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fix this by using skb_unshare() instead in the input path for DATA packets that have a security index != 0. Non-DATA packets don't need in-place encryption and neither do unencrypted DATA packets. Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code") Reported-by: Julian Wollrath <jwollrath@web.de> Signed-off-by: David Howells <dhowells@redhat.com>
|
#
987db9f7 |
|
19-Aug-2019 |
David Howells <dhowells@redhat.com> |
rxrpc: Use the tx-phase skb flag to simplify tracing Use the previously-added transmit-phase skbuff private flag to simplify the socket buffer tracing a bit. Which phase the skbuff comes from can now be divined from the skb rather than having to be guessed from the call state. We can also reduce the number of rxrpc_skb_trace values by eliminating the difference between Tx and Rx in the symbols. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
b311e684 |
|
19-Aug-2019 |
David Howells <dhowells@redhat.com> |
rxrpc: Add a private skb flag to indicate transmission-phase skbs Add a flag in the private data on an skbuff to indicate that this is a transmission-phase buffer rather than a receive-phase buffer. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
e2de6c40 |
|
27-Aug-2019 |
David Howells <dhowells@redhat.com> |
rxrpc: Use info in skbuff instead of reparsing a jumbo packet Use the information now cached in the skbuff private data to avoid the need to reparse a jumbo packet. We can find all the subpackets by dead reckoning, so it's only necessary to note how many there are, whether the last one is flagged as LAST_PACKET and whether any have the REQUEST_ACK flag set. This is necessary as once recvmsg() can see the packet, it can start modifying it, such as doing in-place decryption. Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code") Signed-off-by: David Howells <dhowells@redhat.com>
|
#
c3c9e3df |
|
19-Aug-2019 |
David Howells <dhowells@redhat.com> |
rxrpc: Improve jumbo packet counting Improve the information stored about jumbo packets so that we don't need to reparse them so much later. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
|
#
e8c3af6b |
|
09-Aug-2019 |
David Howells <dhowells@redhat.com> |
rxrpc: Don't bother generating maxSkew in the ACK packet Don't bother generating maxSkew in the ACK packet as it has been obsolete since AFS 3.1. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
|
#
730c5fd4 |
|
09-Aug-2019 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix local endpoint refcounting The object lifetime management on the rxrpc_local struct is broken in that the rxrpc_local_processor() function is expected to clean up and remove an object - but it may get requeued by packets coming in on the backing UDP socket once it starts running. This may result in the assertion in rxrpc_local_rcu() firing because the memory has been scheduled for RCU destruction whilst still queued: rxrpc: Assertion failed ------------[ cut here ]------------ kernel BUG at net/rxrpc/local_object.c:468! Note that if the processor comes around before the RCU free function, it will just do nothing because ->dead is true. Fix this by adding a separate refcount to count active users of the endpoint that causes the endpoint to be destroyed when it reaches 0. The original refcount can then be used to refcount objects through the work processor and cause the memory to be rcu freed when that reaches 0. Fixes: 4f95dd78a77e ("rxrpc: Rework local endpoint management") Reported-by: syzbot+1e0edc4b8b7494c28450@syzkaller.appspotmail.com Signed-off-by: David Howells <dhowells@redhat.com>
|
#
1db88c53 |
|
30-Jul-2019 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix -Wframe-larger-than= warnings from on-stack crypto rxkad sometimes triggers a warning about oversized stack frames when building with clang for a 32-bit architecture: net/rxrpc/rxkad.c:243:12: error: stack frame size of 1088 bytes in function 'rxkad_secure_packet' [-Werror,-Wframe-larger-than=] net/rxrpc/rxkad.c:501:12: error: stack frame size of 1088 bytes in function 'rxkad_verify_packet' [-Werror,-Wframe-larger-than=] The problem is the combination of SYNC_SKCIPHER_REQUEST_ON_STACK() in rxkad_verify_packet()/rxkad_secure_packet() with the relatively large scatterlist in rxkad_verify_packet_1()/rxkad_secure_packet_encrypt(). The warning does not show up when using gcc, which does not inline the functions as aggressively, but the problem is still the same. Allocate the cipher buffers from the slab instead, caching the allocated packet crypto request memory used for DATA packet crypto in the rxrpc_call struct. Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
60034d3d |
|
30-Jul-2019 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix potential deadlock There is a potential deadlock in rxrpc_peer_keepalive_dispatch() whereby rxrpc_put_peer() is called with the peer_hash_lock held, but if it reduces the peer's refcount to 0, rxrpc_put_peer() calls __rxrpc_put_peer() - which the tries to take the already held lock. Fix this by providing a version of rxrpc_put_peer() that can be called in situations where the lock is already held. The bug may produce the following lockdep report: ============================================ WARNING: possible recursive locking detected 5.2.0-next-20190718 #41 Not tainted -------------------------------------------- kworker/0:3/21678 is trying to acquire lock: 00000000aa5eecdf (&(&rxnet->peer_hash_lock)->rlock){+.-.}, at: spin_lock_bh /./include/linux/spinlock.h:343 [inline] 00000000aa5eecdf (&(&rxnet->peer_hash_lock)->rlock){+.-.}, at: __rxrpc_put_peer /net/rxrpc/peer_object.c:415 [inline] 00000000aa5eecdf (&(&rxnet->peer_hash_lock)->rlock){+.-.}, at: rxrpc_put_peer+0x2d3/0x6a0 /net/rxrpc/peer_object.c:435 but task is already holding lock: 00000000aa5eecdf (&(&rxnet->peer_hash_lock)->rlock){+.-.}, at: spin_lock_bh /./include/linux/spinlock.h:343 [inline] 00000000aa5eecdf (&(&rxnet->peer_hash_lock)->rlock){+.-.}, at: rxrpc_peer_keepalive_dispatch /net/rxrpc/peer_event.c:378 [inline] 00000000aa5eecdf (&(&rxnet->peer_hash_lock)->rlock){+.-.}, at: rxrpc_peer_keepalive_worker+0x6b3/0xd02 /net/rxrpc/peer_event.c:430 Fixes: 330bdcfadcee ("rxrpc: Fix the keepalive generator [ver #2]") Reported-by: syzbot+72af434e4b3417318f84@syzkaller.appspotmail.com Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
|
#
2874c5fd |
|
27-May-2019 |
Thomas Gleixner <tglx@linutronix.de> |
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 3029 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
b960a34b |
|
09-May-2019 |
David Howells <dhowells@redhat.com> |
rxrpc: Allow the kernel to mark a call as being non-interruptible Allow kernel services using AF_RXRPC to indicate that a call should be non-interruptible. This allows kafs to make things like lock-extension and writeback data storage calls non-interruptible. If this is set, signals will be ignored for operations on that call where possible - such as waiting to get a call channel on an rxrpc connection. It doesn't prevent UDP sendmsg from being interrupted, but that will be handled by packet retransmission. rxrpc_kernel_recv_data() isn't affected by this since that never waits, preferring instead to return -EAGAIN and leave the waiting to the caller. Userspace initiated calls can't be set to be uninterruptible at this time. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
1a2391c3 |
|
12-Apr-2019 |
Jeffrey Altman <jaltman@auristor.com> |
rxrpc: Fix detection of out of order acks The rxrpc packet serial number cannot be safely used to compute out of order ack packets for several reasons: 1. The allocation of serial numbers cannot be assumed to imply the order by which acks are populated and transmitted. In some rxrpc implementations, delayed acks and ping acks are transmitted asynchronously to the receipt of data packets and so may be transmitted out of order. As a result, they can race with idle acks. 2. Serial numbers are allocated by the rxrpc connection and not the call and as such may wrap independently if multiple channels are in use. In any case, what matters is whether the ack packet provides new information relating to the bounds of the window (the firstPacket and previousPacket in the ACK data). Fix this by discarding packets that appear to wind back the window bounds rather than on serial number procession. Fixes: 298bc15b2079 ("rxrpc: Only take the rwind and mtu values from latest ACK") Signed-off-by: Jeffrey Altman <jaltman@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e122d845 |
|
10-Jan-2019 |
David Howells <dhowells@redhat.com> |
Revert "rxrpc: Allow failed client calls to be retried" The changes introduced to allow rxrpc calls to be retried creates an issue when it comes to refcounting afs_call structs. The problem is that when rxrpc_send_data() queues the last packet for an asynchronous call, the following sequence can occur: (1) The notify_end_tx callback is invoked which causes the state in the afs_call to be changed from AFS_CALL_CL_REQUESTING or AFS_CALL_SV_REPLYING. (2) afs_deliver_to_call() can then process event notifications from rxrpc on the async_work queue. (3) Delivery of events, such as an abort from the server, can cause the afs_call state to be changed to AFS_CALL_COMPLETE on async_work. (4) For an asynchronous call, afs_process_async_call() notes that the call is complete and tried to clean up all the refs on async_work. (5) rxrpc_send_data() might return the amount of data transferred (success) or an error - which could in turn reflect a local error or a received error. Synchronising the clean up after rxrpc_kernel_send_data() returns an error with the asynchronous cleanup is then tricky to get right. Mostly revert commit c038a58ccfd6704d4d7d60ed3d6a0fca13cf13a4. The two API functions the original commit added aren't currently used. This makes rxrpc_kernel_send_data() always return successfully if it queued the data it was given. Note that this doesn't affect synchronous calls since their Rx notification function merely pokes a wait queue and does not refcounting. The asynchronous call notification function *has* to do refcounting and pass a ref over the work item to avoid the need to sync the workqueue in call cleanup. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c7e86acf |
|
01-Nov-2018 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix lockup due to no error backoff after ack transmit error If the network becomes (partially) unavailable, say by disabling IPv6, the background ACK transmission routine can get itself into a tizzy by proposing immediate ACK retransmission. Since we're in the call event processor, that happens immediately without returning to the workqueue manager. The condition should clear after a while when either the network comes back or the call times out. Fix this by: (1) When re-proposing an ACK on failed Tx, don't schedule it immediately. This will allow a certain amount of time to elapse before we try again. (2) Enforce a return to the workqueue manager after a certain number of iterations of the call processing loop. (3) Add a backoff delay that increases the delay on deferred ACKs by a jiffy per failed transmission to a limit of HZ. The backoff delay is cleared on a successful return from kernel_sendmsg(). (4) Cancel calls immediately if the opening sendmsg fails. The layer above can arrange retransmission or rotate to another server. Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bc0e7cf4 |
|
15-Oct-2018 |
David Howells <dhowells@redhat.com> |
rxrpc: Add /proc/net/rxrpc/peers to display peer list Add /proc/net/rxrpc/peers to display the list of peers currently active. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c1e15b49 |
|
08-Oct-2018 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix the packet reception routine The rxrpc_input_packet() function and its call tree was built around the assumption that data_ready() handler called from UDP to inform a kernel service that there is data to be had was non-reentrant. This means that certain locking could be dispensed with. This, however, turns out not to be the case with a multi-queue network card that can deliver packets to multiple cpus simultaneously. Each of those cpus can be in the rxrpc_input_packet() function at the same time. Fix by adding or changing some structure members: (1) Add peer->rtt_input_lock to serialise access to the RTT buffer. (2) Make conn->service_id into a 32-bit variable so that it can be cmpxchg'd on all arches. (3) Add call->input_lock to serialise access to the Rx/Tx state. Note that although the Rx and Tx states are (almost) entirely separate, there's no point completing the separation and having separate locks since it's a bi-phasal RPC protocol rather than a bi-direction streaming protocol. Data transmission and data reception do not take place simultaneously on any particular call. and making the following functional changes: (1) In rxrpc_input_data(), hold call->input_lock around the core to prevent simultaneous producing of packets into the Rx ring and updating of tracking state for a particular call. (2) In rxrpc_input_ping_response(), only read call->ping_serial once, and check it before checking RXRPC_CALL_PINGING as that's a cheaper test. The bit test and bit clear can then be combined. No further locking is needed here. (3) In rxrpc_input_ack(), take call->input_lock after we've parsed much of the ACK packet. The superseded ACK check is then done both before and after the lock is taken. The handing of ackinfo data is split, parsing before the lock is taken and processing with it held. This is keyed on rxMTU being non-zero. Congestion management is also done within the locked section. (4) In rxrpc_input_ackall(), take call->input_lock around the Tx window rotation. The ACKALL packet carries no information and is only really useful after all packets have been transmitted since it's imprecise. (5) In rxrpc_input_implicit_end_call(), we use rx->incoming_lock to prevent calls being simultaneously implicitly ended on two cpus and also to prevent any races with incoming call setup. (6) In rxrpc_input_packet(), use cmpxchg() to effect the service upgrade on a connection. It is only permitted to happen once for a connection. (7) In rxrpc_new_incoming_call(), we have to recheck the routing inside rx->incoming_lock to see if someone else set up the call, connection or peer whilst we were getting there. We can't trust the values from the earlier routing check unless we pin refs on them - which we want to avoid. Further, we need to allow for an incoming call to have its state changed on another CPU between us making it live and us adjusting it because the conn is now in the RXRPC_CONN_SERVICE state. (8) In rxrpc_peer_add_rtt(), take peer->rtt_input_lock around the access to the RTT buffer. Don't need to lock around setting peer->rtt. For reference, the inventory of state-accessing or state-altering functions used by the packet input procedure is: > rxrpc_input_packet() * PACKET CHECKING * ROUTING > rxrpc_post_packet_to_local() > rxrpc_find_connection_rcu() - uses RCU > rxrpc_lookup_peer_rcu() - uses RCU > rxrpc_find_service_conn_rcu() - uses RCU > idr_find() - uses RCU * CONNECTION-LEVEL PROCESSING - Service upgrade - Can only happen once per conn ! Changed to use cmpxchg > rxrpc_post_packet_to_conn() - Setting conn->hi_serial - Probably safe not using locks - Maybe use cmpxchg * CALL-LEVEL PROCESSING > Old-call checking > rxrpc_input_implicit_end_call() > rxrpc_call_completed() > rxrpc_queue_call() ! Need to take rx->incoming_lock > __rxrpc_disconnect_call() > rxrpc_notify_socket() > rxrpc_new_incoming_call() - Uses rx->incoming_lock for the entire process - Might be able to drop this earlier in favour of the call lock > rxrpc_incoming_call() ! Conflicts with rxrpc_input_implicit_end_call() > rxrpc_send_ping() - Don't need locks to check rtt state > rxrpc_propose_ACK * PACKET DISTRIBUTION > rxrpc_input_call_packet() > rxrpc_input_data() * QUEUE DATA PACKET ON CALL > rxrpc_reduce_call_timer() - Uses timer_reduce() ! Needs call->input_lock() > rxrpc_receiving_reply() ! Needs locking around ack state > rxrpc_rotate_tx_window() > rxrpc_end_tx_phase() > rxrpc_proto_abort() > rxrpc_input_dup_data() - Fills the Rx buffer - rxrpc_propose_ACK() - rxrpc_notify_socket() > rxrpc_input_ack() * APPLY ACK PACKET TO CALL AND DISCARD PACKET > rxrpc_input_ping_response() - Probably doesn't need any extra locking ! Need READ_ONCE() on call->ping_serial > rxrpc_input_check_for_lost_ack() - Takes call->lock to consult Tx buffer > rxrpc_peer_add_rtt() ! Needs to take a lock (peer->rtt_input_lock) ! Could perhaps manage with cmpxchg() and xadd() instead > rxrpc_input_requested_ack - Consults Tx buffer ! Probably needs a lock > rxrpc_peer_add_rtt() > rxrpc_propose_ack() > rxrpc_input_ackinfo() - Changes call->tx_winsize ! Use cmpxchg to handle change ! Should perhaps track serial number - Uses peer->lock to record MTU specification changes > rxrpc_proto_abort() ! Need to take call->input_lock > rxrpc_rotate_tx_window() > rxrpc_end_tx_phase() > rxrpc_input_soft_acks() - Consults the Tx buffer > rxrpc_congestion_management() - Modifies the Tx annotations ! Needs call->input_lock() > rxrpc_queue_call() > rxrpc_input_abort() * APPLY ABORT PACKET TO CALL AND DISCARD PACKET > rxrpc_set_call_completion() > rxrpc_notify_socket() > rxrpc_input_ackall() * APPLY ACKALL PACKET TO CALL AND DISCARD PACKET ! Need to take call->input_lock > rxrpc_rotate_tx_window() > rxrpc_end_tx_phase() > rxrpc_reject_packet() There are some functions used by the above that queue the packet, after which the procedure is terminated: - rxrpc_post_packet_to_local() - local->event_queue is an sk_buff_head - local->processor is a work_struct - rxrpc_post_packet_to_conn() - conn->rx_queue is an sk_buff_head - conn->processor is a work_struct - rxrpc_reject_packet() - local->reject_queue is an sk_buff_head - local->processor is a work_struct And some that offload processing to process context: - rxrpc_notify_socket() - Uses RCU lock - Uses call->notify_lock to call call->notify_rx - Uses call->recvmsg_lock to queue recvmsg side - rxrpc_queue_call() - call->processor is a work_struct - rxrpc_propose_ACK() - Uses call->lock to wrap __rxrpc_propose_ACK() And a bunch that complete a call, all of which use call->state_lock to protect the call state: - rxrpc_call_completed() - rxrpc_set_call_completion() - rxrpc_abort_call() - rxrpc_proto_abort() - Also uses rxrpc_queue_call() Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Signed-off-by: David Howells <dhowells@redhat.com>
|
#
64753092 |
|
08-Oct-2018 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix connection-level abort handling Fix connection-level abort handling to cache the abort and error codes properly so that a new incoming call can be properly aborted if it races with the parent connection being aborted by another CPU. The abort_code and error parameters can then be dropped from rxrpc_abort_calls(). Fixes: f5c17aaeb2ae ("rxrpc: Calls should only have one terminal state") Signed-off-by: David Howells <dhowells@redhat.com>
|
#
5271953c |
|
04-Oct-2018 |
David Howells <dhowells@redhat.com> |
rxrpc: Use the UDP encap_rcv hook Use the UDP encap_rcv hook to cut the bit out of the rxrpc packet reception in which a packet is placed onto the UDP receive queue and then immediately removed again by rxrpc. Going via the queue in this manner seems like it should be unnecessary. This does, however, require the invention of a value to place in encap_type as that's one of the conditions to switch packets out to the encap_rcv hook. Possibly the value doesn't actually matter for anything other than sockopts on the UDP socket, which aren't accessible outside of rxrpc anyway. This seems to cut a bit of time out of the time elapsed between each sk_buff being timestamped and turning up in rxrpc (the final number in the following trace excerpts). I measured this by making the rxrpc_rx_packet trace point print the time elapsed between the skb being timestamped and the current time (in ns), e.g.: ... 424.278721: rxrpc_rx_packet: ... ACK 25026 So doing a 512MiB DIO read from my test server, with an unmodified kernel: N min max sum mean stddev 27605 2626 7581 7.83992e+07 2840.04 181.029 and with the patch applied: N min max sum mean stddev 27547 1895 12165 6.77461e+07 2459.29 255.02 Signed-off-by: David Howells <dhowells@redhat.com>
|
#
5e33a23b |
|
05-Oct-2018 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix some missed refs to init_net Fix some refs to init_net that should've been changed to the appropriate network namespace. Fixes: 2baec2c3f854 ("rxrpc: Support network namespacing") Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Paolo Abeni <pabeni@redhat.com>
|
#
5a790b73 |
|
04-Oct-2018 |
David Howells <dhowells@redhat.com> |
rxrpc: Drop the local endpoint arg from rxrpc_extract_addr_from_skb() rxrpc_extract_addr_from_skb() doesn't use the argument that points to the local endpoint, so remove the argument. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
d2944b1c |
|
04-Oct-2018 |
David Howells <dhowells@redhat.com> |
rxrpc: Use rxrpc_free_skb() rather than rxrpc_lose_skb() rxrpc_lose_skb() is now exactly the same as rxrpc_free_skb(), so remove it and use the latter instead. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
f3344303 |
|
27-Sep-2018 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix error distribution Fix error distribution by immediately delivering the errors to all the affected calls rather than deferring them to a worker thread. The problem with the latter is that retries and things can happen in the meantime when we want to stop that sooner. To this end: (1) Stop the error distributor from removing calls from the error_targets list so that peer->lock isn't needed to synchronise against other adds and removals. (2) Require the peer's error_targets list to be accessed with RCU, thereby avoiding the need to take peer->lock over distribution. (3) Don't attempt to affect a call's state if it is already marked complete. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
0099dc58 |
|
27-Sep-2018 |
David Howells <dhowells@redhat.com> |
rxrpc: Make service call handling more robust Make the following changes to improve the robustness of the code that sets up a new service call: (1) Cache the rxrpc_sock struct obtained in rxrpc_data_ready() to do a service ID check and pass that along to rxrpc_new_incoming_call(). This means that I can remove the check from rxrpc_new_incoming_call() without the need to worry about the socket attached to the local endpoint getting replaced - which would invalidate the check. (2) Cache the rxrpc_peer struct, thereby allowing the peer search to be done once. The peer is passed to rxrpc_new_incoming_call(), thereby saving the need to repeat the search. This also reduces the possibility of rxrpc_publish_service_conn() BUG()'ing due to the detection of a duplicate connection, despite the initial search done by rxrpc_find_connection_rcu() having turned up nothing. This BUG() shouldn't ever get hit since rxrpc_data_ready() *should* be non-reentrant and the result of the initial search should still hold true, but it has proven possible to hit. I *think* this may be due to __rxrpc_lookup_peer_rcu() cutting short the iteration over the hash table if it finds a matching peer with a zero usage count, but I don't know for sure since it's only ever been hit once that I know of. Another possibility is that a bug in rxrpc_data_ready() that checked the wrong byte in the header for the RXRPC_CLIENT_INITIATED flag might've let through a packet that caused a spurious and invalid call to be set up. That is addressed in another patch. (3) Fix __rxrpc_lookup_peer_rcu() to skip peer records that have a zero usage count rather than stopping and returning not found, just in case there's another peer record behind it in the bucket. (4) Don't search the peer records in rxrpc_alloc_incoming_call(), but rather either use the peer cached in (2) or, if one wasn't found, preemptively install a new one. Fixes: 8496af50eb38 ("rxrpc: Use RCU to access a peer's service connection tree") Signed-off-by: David Howells <dhowells@redhat.com>
|
#
ece64fec |
|
27-Sep-2018 |
David Howells <dhowells@redhat.com> |
rxrpc: Emit BUSY packets when supposed to rather than ABORTs In the input path, a received sk_buff can be marked for rejection by setting RXRPC_SKB_MARK_* in skb->mark and, if needed, some auxiliary data (such as an abort code) in skb->priority. The rejection is handled by queueing the sk_buff up for dealing with in process context. The output code reads the mark and priority and, theoretically, generates an appropriate response packet. However, if RXRPC_SKB_MARK_BUSY is set, this isn't noticed and an ABORT message with a random abort code is generated (since skb->priority wasn't set to anything). Fix this by outputting the appropriate sort of packet. Also, whilst we're at it, most of the marks are no longer used, so remove them and rename the remaining two to something more obvious. Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code") Signed-off-by: David Howells <dhowells@redhat.com>
|
#
dc71db34 |
|
27-Sep-2018 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix checks as to whether we should set up a new call There's a check in rxrpc_data_ready() that's checking the CLIENT_INITIATED flag in the packet type field rather than in the packet flags field. Fix this by creating a pair of helper functions to check whether the packet is going to the client or to the server and use them generally. Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code") Signed-off-by: David Howells <dhowells@redhat.com>
|
#
69d826fa |
|
18-Sep-2018 |
Kees Cook <keescook@chromium.org> |
rxrpc: Remove VLA usage of skcipher In the quest to remove all stack VLA usage from the kernel[1], this replaces struct crypto_skcipher and SKCIPHER_REQUEST_ON_STACK() usage with struct crypto_sync_skcipher and SYNC_SKCIPHER_REQUEST_ON_STACK(), which uses a fixed stack size. [1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com Cc: David Howells <dhowells@redhat.com> Cc: linux-afs@lists.infradead.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
#
330bdcfa |
|
08-Aug-2018 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix the keepalive generator [ver #2] AF_RXRPC has a keepalive message generator that generates a message for a peer ~20s after the last transmission to that peer to keep firewall ports open. The implementation is incorrect in the following ways: (1) It mixes up ktime_t and time64_t types. (2) It uses ktime_get_real(), the output of which may jump forward or backward due to adjustments to the time of day. (3) If the current time jumps forward too much or jumps backwards, the generator function will crank the base of the time ring round one slot at a time (ie. a 1s period) until it catches up, spewing out VERSION packets as it goes. Fix the problem by: (1) Only using time64_t. There's no need for sub-second resolution. (2) Use ktime_get_seconds() rather than ktime_get_real() so that time isn't perceived to go backwards. (3) Simplifying rxrpc_peer_keepalive_worker() by splitting it into two parts: (a) The "worker" function that manages the buckets and the timer. (b) The "dispatch" function that takes the pending peers and potentially transmits a keepalive packet before putting them back in the ring into the slot appropriate to the revised last-Tx time. (4) Taking everything that's pending out of the ring and splicing it into a temporary collector list for processing. In the case that there's been a significant jump forward, the ring gets entirely emptied and then the time base can be warped forward before the peers are processed. The warping can't happen if the ring isn't empty because the slot a peer is in is keepalive-time dependent, relative to the base time. (5) Limit the number of iterations of the bucket array when scanning it. (6) Set the timer to skip any empty slots as there's no point waking up if there's nothing to do yet. This can be triggered by an incoming call from a server after a reboot with AF_RXRPC and AFS built into the kernel causing a peer record to be set up before userspace is started. The system clock is then adjusted by userspace, thereby potentially causing the keepalive generator to have a meltdown - which leads to a message like: watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [kworker/0:1:23] ... Workqueue: krxrpcd rxrpc_peer_keepalive_worker EIP: lock_acquire+0x69/0x80 ... Call Trace: ? rxrpc_peer_keepalive_worker+0x5e/0x350 ? _raw_spin_lock_bh+0x29/0x60 ? rxrpc_peer_keepalive_worker+0x5e/0x350 ? rxrpc_peer_keepalive_worker+0x5e/0x350 ? __lock_acquire+0x3d3/0x870 ? process_one_work+0x110/0x340 ? process_one_work+0x166/0x340 ? process_one_work+0x110/0x340 ? worker_thread+0x39/0x3c0 ? kthread+0xdb/0x110 ? cancel_delayed_work+0x90/0x90 ? kthread_stop+0x70/0x70 ? ret_from_fork+0x19/0x24 Fixes: ace45bec6d77 ("rxrpc: Fix firewall route keepalive") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d0b35a42 |
|
23-Jul-2018 |
David Howells <dhowells@redhat.com> |
rxrpc: Transmit more ACKs during data reception Immediately flush any outstanding ACK on entry to rxrpc_recvmsg_data() - which transfers data to the target buffers - if we previously had an Rx underrun (ie. we returned -EAGAIN because we ran out of received data). This lets the server know what we've managed to receive something. Also flush any outstanding ACK after calling the function if it hit -EAGAIN to let the server know we processed some data. It might be better to send more ACKs, possibly on a time-based scheme, but that needs some more consideration. With this and some additional AFS patches, it is possible to get large unencrypted O_DIRECT reads to be almost as fast as NFS over TCP. It looks like it might be theoretically possible to improve performance yet more for a server running a single operation as investigation of packet timestamps indicates that the server keeps stalling. The issue appears to be that rxrpc runs in to trouble with ACK packets getting batched together (up to ~32 at a time) somewhere between the IP transmit queue on the client and the ethernet receive queue on the server. However, this case isn't too much of a worry as even a lightly loaded server should be receiving sufficient packet flux to flush the ACK packets to the UDP socket. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
4075295a |
|
23-Jul-2018 |
David Howells <dhowells@redhat.com> |
rxrpc: Increase the size of a call's Rx window Increase the size of a call's Rx window from 32 to 63 - ie. one less than the size of the ring buffer. This makes large data transfers perform better when the Tx window on the other side is around 64 (as is the case with Auristor's YFS fileserver). If the server window size is ~32 or smaller, this should make no difference. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
4764c0da |
|
23-Jul-2018 |
David Howells <dhowells@redhat.com> |
rxrpc: Trace packet transmission Trace successful packet transmission (kernel_sendmsg() succeeded, that is) in AF_RXRPC. We can share the enum that defines the transmission points with the trace_rxrpc_tx_fail() tracepoint, so rename its constants to be applicable to both. Also, save the internal call->debug_id in the rxrpc_channel struct so that it can be used in retransmission trace lines. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
1a025028 |
|
02-Jun-2018 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix handling of call quietly cancelled out on server Sometimes an in-progress call will stop responding on the fileserver when the fileserver quietly cancels the call with an internally marked abort (RX_CALL_DEAD), without sending an ABORT to the client. This causes the client's call to eventually expire from lack of incoming packets directed its way, which currently leads to it being cancelled locally with ETIME. Note that it's not currently clear as to why this happens as it's really hard to reproduce. The rotation policy implement by kAFS, however, doesn't differentiate between ETIME meaning we didn't get any response from the server and ETIME meaning the call got cancelled mid-flow. The latter leads to an oops when fetching data as the rotation partially resets the afs_read descriptor, which can result in a cleared page pointer being dereferenced because that page has already been filled. Handle this by the following means: (1) Set a flag on a call when we receive a packet for it. (2) Store the highest packet serial number so far received for a call (bearing in mind this may wrap). (3) If, when the "not received anything recently" timeout expires on a call, we've received at least one packet for a call and the connection as a whole has received packets more recently than that call, then cancel the call locally with ECONNRESET rather than ETIME. This indicates that the call was definitely in progress on the server. (4) In kAFS, if the rotation algorithm sees ECONNRESET rather than ETIME, don't try the next server, but rather abort the call. This avoids the oops as we don't try to reuse the afs_read struct. Rather, as-yet ungotten pages will be reread at a later data. Also: (5) Add an rxrpc tracepoint to log detection of the call being reset. Without this, I occasionally see an oops like the following: general protection fault: 0000 [#1] SMP PTI ... RIP: 0010:_copy_to_iter+0x204/0x310 RSP: 0018:ffff8800cae0f828 EFLAGS: 00010206 RAX: 0000000000000560 RBX: 0000000000000560 RCX: 0000000000000560 RDX: ffff8800cae0f968 RSI: ffff8800d58b3312 RDI: 0005080000000000 RBP: ffff8800cae0f968 R08: 0000000000000560 R09: ffff8800ca00f400 R10: ffff8800c36f28d4 R11: 00000000000008c4 R12: ffff8800cae0f958 R13: 0000000000000560 R14: ffff8800d58b3312 R15: 0000000000000560 FS: 00007fdaef108080(0000) GS:ffff8800ca680000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fb28a8fa000 CR3: 00000000d2a76002 CR4: 00000000001606e0 Call Trace: skb_copy_datagram_iter+0x14e/0x289 rxrpc_recvmsg_data.isra.0+0x6f3/0xf68 ? trace_buffer_unlock_commit_regs+0x4f/0x89 rxrpc_kernel_recv_data+0x149/0x421 afs_extract_data+0x1e0/0x798 ? afs_wait_for_call_to_complete+0xc9/0x52e afs_deliver_fs_fetch_data+0x33a/0x5ab afs_deliver_to_call+0x1ee/0x5e0 ? afs_wait_for_call_to_complete+0xc9/0x52e afs_wait_for_call_to_complete+0x12b/0x52e ? wake_up_q+0x54/0x54 afs_make_call+0x287/0x462 ? afs_fs_fetch_data+0x3e6/0x3ed ? rcu_read_lock_sched_held+0x5d/0x63 afs_fs_fetch_data+0x3e6/0x3ed afs_fetch_data+0xbb/0x14a afs_readpages+0x317/0x40d __do_page_cache_readahead+0x203/0x2ba ? ondemand_readahead+0x3a7/0x3c1 ondemand_readahead+0x3a7/0x3c1 generic_file_buffered_read+0x18b/0x62f __vfs_read+0xdb/0xfe vfs_read+0xb2/0x137 ksys_read+0x50/0x8c do_syscall_64+0x7d/0x1a0 entry_SYSCALL_64_after_hwframe+0x49/0xbe Note the weird value in RDI which is a result of trying to kmap() a NULL page pointer. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c3506372 |
|
10-Apr-2018 |
Christoph Hellwig <hch@lst.de> |
proc: introduce proc_create_net{,_data} Variants of proc_create{,_data} that directly take a struct seq_operations and deal with network namespaces in ->open and ->release. All callers of proc_create + seq_open_net converted over, and seq_{open,release}_net are removed entirely. Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
c54e43d7 |
|
10-May-2018 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix missing start of call timeout The expect_rx_by call timeout is supposed to be set when a call is started to indicate that we need to receive a packet by that point. This is currently put back every time we receive a packet, but it isn't started when we first send a packet. Without this, the call may wait forever if the server doesn't deign to reply. Fix this by setting the timeout upon a successful UDP sendmsg call for the first DATA packet. The timeout is initiated only for initial transmission and not for subsequent retries as we don't want the retry mechanism to extend the timeout indefinitely. Fixes: a158bdd3247b ("rxrpc: Fix call timeouts") Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com>
|
#
17226f12 |
|
30-Mar-2018 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix leak of rxrpc_peer objects When a new client call is requested, an rxrpc_conn_parameters struct object is passed in with a bunch of parameters set, such as the local endpoint to use. A pointer to the target peer record is also placed in there by rxrpc_get_client_conn() - and this is removed if and only if a new connection object is allocated. Thus it leaks if a new connection object isn't allocated. Fix this by putting any peer object attached to the rxrpc_conn_parameters object in the function that allocated it. Fixes: 19ffa01c9c45 ("rxrpc: Use structs to hold connection params and protocol info") Signed-off-by: David Howells <dhowells@redhat.com>
|
#
1159d4b4 |
|
30-Mar-2018 |
David Howells <dhowells@redhat.com> |
rxrpc: Add a tracepoint to track rxrpc_peer refcounting Add a tracepoint to track reference counting on the rxrpc_peer struct. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
31f5f9a16 |
|
30-Mar-2018 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix apparent leak of rxrpc_local objects rxrpc_local objects cannot be disposed of until all the connections that point to them have been RCU'd as a connection object holds refcount on the local endpoint it is communicating through. Currently, this can cause an assertion failure to occur when a network namespace is destroyed as there's no check that the RCU destructors for the connections have been run before we start trying to destroy local endpoints. The kernel reports: rxrpc: AF_RXRPC: Leaked local 0000000036a41bc1 {5} ------------[ cut here ]------------ kernel BUG at ../net/rxrpc/local_object.c:439! Fix this by keeping a count of the live connections and waiting for it to go to zero at the end of rxrpc_destroy_all_connections(). Fixes: dee46364ce6f ("rxrpc: Add RCU destruction for connections and calls") Signed-off-by: David Howells <dhowells@redhat.com>
|
#
09d2bf59 |
|
30-Mar-2018 |
David Howells <dhowells@redhat.com> |
rxrpc: Add a tracepoint to track rxrpc_local refcounting Add a tracepoint to track reference counting on the rxrpc_local struct. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
d3be4d24 |
|
30-Mar-2018 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix potential call vs socket/net destruction race rxrpc_call structs don't pin sockets or network namespaces, but may attempt to access both after their refcount reaches 0 so that they can detach themselves from the network namespace. However, there's no guarantee that the socket still exists at this point (so sock_net(&call->socket->sk) may be invalid) and the namespace may have gone away if the call isn't pinning a peer. Fix this by (a) carrying a net pointer in the rxrpc_call struct and (b) waiting for all calls to be destroyed when the network namespace goes away. This was detected by checker: net/rxrpc/call_object.c:634:57: warning: incorrect type in argument 1 (different address spaces) net/rxrpc/call_object.c:634:57: expected struct sock const *sk net/rxrpc/call_object.c:634:57: got struct sock [noderef] <asn:4>*<noident> Fixes: 2baec2c3f854 ("rxrpc: Support network namespacing") Signed-off-by: David Howells <dhowells@redhat.com>
|
#
ace45bec |
|
30-Mar-2018 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix firewall route keepalive Fix the firewall route keepalive part of AF_RXRPC which is currently function incorrectly by replying to VERSION REPLY packets from the server with VERSION REQUEST packets. Instead, send VERSION REPLY packets to the peers of service connections to act as keep-alives 20s after the latest packet was transmitted to that peer. Also, just discard VERSION REPLY packets rather than replying to them. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
1bae5d22 |
|
27-Mar-2018 |
David Howells <dhowells@redhat.com> |
rxrpc: Trace call completion Add a tracepoint to track rxrpc calls moving into the completed state and to log the completion type and the recorded error value and abort code. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
a25e21f0 |
|
27-Mar-2018 |
David Howells <dhowells@redhat.com> |
rxrpc, afs: Use debug_ids rather than pointers in traces In rxrpc and afs, use the debug_ids that are monotonically allocated to various objects as they're allocated rather than pointers as kernel pointers are now hashed making them less useful. Further, the debug ids aren't reused anywhere nearly as quickly. In addition, allow kernel services that use rxrpc, such as afs, to take numbers from the rxrpc counter, assign them to their own call struct and pass them in to rxrpc for both client and service calls so that the trace lines for each will have the same ID tag. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
3d18cbb7 |
|
24-Nov-2017 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix conn expiry timers Fix the rxrpc connection expiry timers so that connections for closed AF_RXRPC sockets get deleted in a more timely fashion, freeing up the transport UDP port much more quickly. (1) Replace the delayed work items with work items plus timers so that timer_reduce() can be used to shorten them and so that the timer doesn't requeue the work item if the net namespace is dead. (2) Don't use queue_delayed_work() as that won't alter the timeout if the timer is already running. (3) Don't rearm the timers if the network namespace is dead. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
f859ab61 |
|
24-Nov-2017 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix service endpoint expiry RxRPC service endpoints expire like they're supposed to by the following means: (1) Mark dead rxrpc_net structs (with ->live) rather than twiddling the global service conn timeout, otherwise the first rxrpc_net struct to die will cause connections on all others to expire immediately from then on. (2) Mark local service endpoints for which the socket has been closed (->service_closed) so that the expiration timeout can be much shortened for service and client connections going through that endpoint. (3) rxrpc_put_service_conn() needs to schedule the reaper when the usage count reaches 1, not 0, as idle conns have a 1 count. (4) The accumulator for the earliest time we might want to schedule for should be initialised to jiffies + MAX_JIFFY_OFFSET, not ULONG_MAX as the comparison functions use signed arithmetic. (5) Simplify the expiration handling, adding the expiration value to the idle timestamp each time rather than keeping track of the time in the past before which the idle timestamp must go to be expired. This is much easier to read. (6) Ignore the timeouts if the net namespace is dead. (7) Restart the service reaper work item rather the client reaper. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
415f44e4 |
|
24-Nov-2017 |
David Howells <dhowells@redhat.com> |
rxrpc: Add keepalive for a call We need to transmit a packet every so often to act as a keepalive for the peer (which has a timeout from the last time it received a packet) and also to prevent any intervening firewalls from closing the route. Do this by resetting a timer every time we transmit a packet. If the timer ever expires, we transmit a PING ACK packet and thereby also elicit a PING RESPONSE ACK from the other side - which prevents our last-rx timeout from expiring. The timer is set to 1/6 of the last-rx timeout so that we can detect the other side going away if it misses 6 replies in a row. This is particularly necessary for servers where the processing of the service function may take a significant amount of time. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
bd1fdf8c |
|
24-Nov-2017 |
David Howells <dhowells@redhat.com> |
rxrpc: Add a timeout for detecting lost ACKs/lost DATA Add an extra timeout that is set/updated when we send a DATA packet that has the request-ack flag set. This allows us to detect if we don't get an ACK in response to the latest flagged packet. The ACK packet is adjudged to have been lost if it doesn't turn up within 2*RTT of the transmission. If the timeout occurs, we schedule the sending of a PING ACK to find out the state of the other side. If a new DATA packet is ready to go sooner, we cancel the sending of the ping and set the request-ack flag on that instead. If we get back a PING-RESPONSE ACK that indicates a lower tx_top than what we had at the time of the ping transmission, we adjudge all the DATA packets sent between the response tx_top and the ping-time tx_top to have been lost and retransmit immediately. Rather than sending a PING ACK, we could just pick a DATA packet and speculatively retransmit that with request-ack set. It should result in either a REQUESTED ACK or a DUPLICATE ACK which we can then use in lieu the a PING-RESPONSE ACK mentioned above. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
a158bdd3 |
|
24-Nov-2017 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix call timeouts Fix the rxrpc call expiration timeouts and make them settable from userspace. By analogy with other rx implementations, there should be three timeouts: (1) "Normal timeout" This is set for all calls and is triggered if we haven't received any packets from the peer in a while. It is measured from the last time we received any packet on that call. This is not reset by any connection packets (such as CHALLENGE/RESPONSE packets). If a service operation takes a long time, the server should generate PING ACKs at a duration that's substantially less than the normal timeout so is to keep both sides alive. This is set at 1/6 of normal timeout. (2) "Idle timeout" This is set only for a service call and is triggered if we stop receiving the DATA packets that comprise the request data. It is measured from the last time we received a DATA packet. (3) "Hard timeout" This can be set for a call and specified the maximum lifetime of that call. It should not be specified by default. Some operations (such as volume transfer) take a long time. Allow userspace to set/change the timeouts on a call with sendmsg, using a control message: RXRPC_SET_CALL_TIMEOUTS The data to the message is a number of 32-bit words, not all of which need be given: u32 hard_timeout; /* sec from first packet */ u32 idle_timeout; /* msec from packet Rx */ u32 normal_timeout; /* msec from data Rx */ This can be set in combination with any other sendmsg() that affects a call. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
48124178 |
|
24-Nov-2017 |
David Howells <dhowells@redhat.com> |
rxrpc: Split the call params from the operation params When rxrpc_sendmsg() parses the control message buffer, it places the parameters extracted into a structure, but lumps together call parameters (such as user call ID) with operation parameters (such as whether to send data, send an abort or accept a call). Split the call parameters out into their own structure, a copy of which is then embedded in the operation parameters struct. The call parameters struct is then passed down into the places that need it instead of passing the individual parameters. This allows for extra call parameters to be added. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
3136ef49 |
|
24-Nov-2017 |
David Howells <dhowells@redhat.com> |
rxrpc: Delay terminal ACK transmission on a client call Delay terminal ACK transmission on a client call by deferring it to the connection processor. This allows it to be skipped if we can send the next call instead, the first DATA packet of which will implicitly ack this call. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
9faaff59 |
|
24-Nov-2017 |
David Howells <dhowells@redhat.com> |
rxrpc: Provide a different lockdep key for call->user_mutex for kernel calls Provide a different lockdep key for rxrpc_call::user_mutex when the call is made on a kernel socket, such as by the AFS filesystem. The problem is that lockdep registers a false positive between userspace calling the sendmsg syscall on a user socket where call->user_mutex is held whilst userspace memory is accessed whereas the AFS filesystem may perform operations with mmap_sem held by the caller. In such a case, the following warning is produced. ====================================================== WARNING: possible circular locking dependency detected 4.14.0-fscache+ #243 Tainted: G E ------------------------------------------------------ modpost/16701 is trying to acquire lock: (&vnode->io_lock){+.+.}, at: [<ffffffffa000fc40>] afs_begin_vnode_operation+0x33/0x77 [kafs] but task is already holding lock: (&mm->mmap_sem){++++}, at: [<ffffffff8104376a>] __do_page_fault+0x1ef/0x486 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 (&mm->mmap_sem){++++}: __might_fault+0x61/0x89 _copy_from_iter_full+0x40/0x1fa rxrpc_send_data+0x8dc/0xff3 rxrpc_do_sendmsg+0x62f/0x6a1 rxrpc_sendmsg+0x166/0x1b7 sock_sendmsg+0x2d/0x39 ___sys_sendmsg+0x1ad/0x22b __sys_sendmsg+0x41/0x62 do_syscall_64+0x89/0x1be return_from_SYSCALL_64+0x0/0x75 -> #2 (&call->user_mutex){+.+.}: __mutex_lock+0x86/0x7d2 rxrpc_new_client_call+0x378/0x80e rxrpc_kernel_begin_call+0xf3/0x154 afs_make_call+0x195/0x454 [kafs] afs_vl_get_capabilities+0x193/0x198 [kafs] afs_vl_lookup_vldb+0x5f/0x151 [kafs] afs_create_volume+0x2e/0x2f4 [kafs] afs_mount+0x56a/0x8d7 [kafs] mount_fs+0x6a/0x109 vfs_kern_mount+0x67/0x135 do_mount+0x90b/0xb57 SyS_mount+0x72/0x98 do_syscall_64+0x89/0x1be return_from_SYSCALL_64+0x0/0x75 -> #1 (k-sk_lock-AF_RXRPC){+.+.}: lock_sock_nested+0x74/0x8a rxrpc_kernel_begin_call+0x8a/0x154 afs_make_call+0x195/0x454 [kafs] afs_fs_get_capabilities+0x17a/0x17f [kafs] afs_probe_fileserver+0xf7/0x2f0 [kafs] afs_select_fileserver+0x83f/0x903 [kafs] afs_fetch_status+0x89/0x11d [kafs] afs_iget+0x16f/0x4f8 [kafs] afs_mount+0x6c6/0x8d7 [kafs] mount_fs+0x6a/0x109 vfs_kern_mount+0x67/0x135 do_mount+0x90b/0xb57 SyS_mount+0x72/0x98 do_syscall_64+0x89/0x1be return_from_SYSCALL_64+0x0/0x75 -> #0 (&vnode->io_lock){+.+.}: lock_acquire+0x174/0x19f __mutex_lock+0x86/0x7d2 afs_begin_vnode_operation+0x33/0x77 [kafs] afs_fetch_data+0x80/0x12a [kafs] afs_readpages+0x314/0x405 [kafs] __do_page_cache_readahead+0x203/0x2ba filemap_fault+0x179/0x54d __do_fault+0x17/0x60 __handle_mm_fault+0x6d7/0x95c handle_mm_fault+0x24e/0x2a3 __do_page_fault+0x301/0x486 do_page_fault+0x236/0x259 page_fault+0x22/0x30 __clear_user+0x3d/0x60 padzero+0x1c/0x2b load_elf_binary+0x785/0xdc7 search_binary_handler+0x81/0x1ff do_execveat_common.isra.14+0x600/0x888 do_execve+0x1f/0x21 SyS_execve+0x28/0x2f do_syscall_64+0x89/0x1be return_from_SYSCALL_64+0x0/0x75 other info that might help us debug this: Chain exists of: &vnode->io_lock --> &call->user_mutex --> &mm->mmap_sem Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&mm->mmap_sem); lock(&call->user_mutex); lock(&mm->mmap_sem); lock(&vnode->io_lock); *** DEADLOCK *** 1 lock held by modpost/16701: #0: (&mm->mmap_sem){++++}, at: [<ffffffff8104376a>] __do_page_fault+0x1ef/0x486 stack backtrace: CPU: 0 PID: 16701 Comm: modpost Tainted: G E 4.14.0-fscache+ #243 Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014 Call Trace: dump_stack+0x67/0x8e print_circular_bug+0x341/0x34f check_prev_add+0x11f/0x5d4 ? add_lock_to_list.isra.12+0x8b/0x8b ? add_lock_to_list.isra.12+0x8b/0x8b ? __lock_acquire+0xf77/0x10b4 __lock_acquire+0xf77/0x10b4 lock_acquire+0x174/0x19f ? afs_begin_vnode_operation+0x33/0x77 [kafs] __mutex_lock+0x86/0x7d2 ? afs_begin_vnode_operation+0x33/0x77 [kafs] ? afs_begin_vnode_operation+0x33/0x77 [kafs] ? afs_begin_vnode_operation+0x33/0x77 [kafs] afs_begin_vnode_operation+0x33/0x77 [kafs] afs_fetch_data+0x80/0x12a [kafs] afs_readpages+0x314/0x405 [kafs] __do_page_cache_readahead+0x203/0x2ba ? filemap_fault+0x179/0x54d filemap_fault+0x179/0x54d __do_fault+0x17/0x60 __handle_mm_fault+0x6d7/0x95c handle_mm_fault+0x24e/0x2a3 __do_page_fault+0x301/0x486 do_page_fault+0x236/0x259 page_fault+0x22/0x30 RIP: 0010:__clear_user+0x3d/0x60 RSP: 0018:ffff880071e93da0 EFLAGS: 00010202 RAX: 0000000000000000 RBX: 000000000000011c RCX: 000000000000011c RDX: 0000000000000000 RSI: 0000000000000008 RDI: 000000000060f720 RBP: 000000000060f720 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000001 R11: ffff8800b5459b68 R12: ffff8800ce150e00 R13: 000000000060f720 R14: 00000000006127a8 R15: 0000000000000000 padzero+0x1c/0x2b load_elf_binary+0x785/0xdc7 search_binary_handler+0x81/0x1ff do_execveat_common.isra.14+0x600/0x888 do_execve+0x1f/0x21 SyS_execve+0x28/0x2f do_syscall_64+0x89/0x1be entry_SYSCALL64_slow_path+0x25/0x25 RIP: 0033:0x7fdb6009ee07 RSP: 002b:00007fff566d9728 EFLAGS: 00000246 ORIG_RAX: 000000000000003b RAX: ffffffffffffffda RBX: 000055ba57280900 RCX: 00007fdb6009ee07 RDX: 000055ba5727f270 RSI: 000055ba5727cac0 RDI: 000055ba57280900 RBP: 000055ba57280900 R08: 00007fff566d9700 R09: 0000000000000000 R10: 000055ba5727cac0 R11: 0000000000000246 R12: 0000000000000000 R13: 000055ba5727cac0 R14: 000055ba5727f270 R15: 0000000000000000 Signed-off-by: David Howells <dhowells@redhat.com>
|
#
20acbd9a |
|
02-Nov-2017 |
David Howells <dhowells@redhat.com> |
rxrpc: Lock around calling a kernel service Rx notification Place a spinlock around the invocation of call->notify_rx() for a kernel service call and lock again when ending the call and replace the notification pointer with a pointer to a dummy function. This is required because it's possible for rxrpc_notify_socket() to be called after the call has been ended by the kernel service if called from the asynchronous work function rxrpc_process_call(). However, rxrpc_notify_socket() currently only holds the RCU read lock when invoking ->notify_rx(), which means that the afs_call struct would need to be disposed of by call_rcu() rather than by kfree(). But we shouldn't see any notifications from a call after calling rxrpc_kernel_end_call(), so a lock is required in rxrpc code. Without this, we may see the call wait queue as having a corrupt spinlock: BUG: spinlock bad magic on CPU#0, kworker/0:2/1612 general protection fault: 0000 [#1] SMP ... Workqueue: krxrpcd rxrpc_process_call task: ffff88040b83c400 task.stack: ffff88040adfc000 RIP: 0010:spin_bug+0x161/0x18f RSP: 0018:ffff88040adffcc0 EFLAGS: 00010002 RAX: 0000000000000032 RBX: 6b6b6b6b6b6b6b6b RCX: ffffffff81ab16cf RDX: ffff88041fa14c01 RSI: ffff88041fa0ccb8 RDI: ffff88041fa0ccb8 RBP: ffff88040adffcd8 R08: 00000000ffffffff R09: 00000000ffffffff R10: ffff88040adffc60 R11: 000000000000022c R12: ffff88040aca2208 R13: ffffffff81a58114 R14: 0000000000000000 R15: 0000000000000000 .... Call Trace: do_raw_spin_lock+0x1d/0x89 _raw_spin_lock_irqsave+0x3d/0x49 ? __wake_up_common_lock+0x4c/0xa7 __wake_up_common_lock+0x4c/0xa7 ? __lock_is_held+0x47/0x7a __wake_up+0xe/0x10 afs_wake_up_call_waiter+0x11b/0x122 [kafs] rxrpc_notify_socket+0x12b/0x258 rxrpc_process_call+0x18e/0x7d0 process_one_work+0x298/0x4de ? rescuer_thread+0x280/0x280 worker_thread+0x1d1/0x2ae ? rescuer_thread+0x280/0x280 kthread+0x12c/0x134 ? kthread_create_on_node+0x3a/0x3a ret_from_fork+0x27/0x40 In this case, note the corrupt data in EBX. The address of the offending afs_call is in R12, plus the offset to the spinlock. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
c038a58c |
|
29-Aug-2017 |
David Howells <dhowells@redhat.com> |
rxrpc: Allow failed client calls to be retried Allow a client call that failed on network error to be retried, provided that the Tx queue still holds DATA packet 1. This allows an operation to be submitted to another server or another address for the same server without having to repackage and re-encrypt the data so far processed. Two new functions are provided: (1) rxrpc_kernel_check_call() - This is used to find out the completion state of a call to guess whether it can be retried and whether it should be retried. (2) rxrpc_kernel_retry_call() - Disconnect the call from its current connection, reset the state and submit it as a new client call to a new address. The new address need not match the previous address. A call may be retried even if all the data hasn't been loaded into it yet; a partially constructed will be retained at the same point it was at when an error condition was detected. msg_data_left() can be used to find out how much data was packaged before the error occurred. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
7b674e39 |
|
29-Aug-2017 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix IPv6 support Fix IPv6 support in AF_RXRPC in the following ways: (1) When extracting the address from a received IPv4 packet, if the local transport socket is open for IPv6 then fill out the sockaddr_rxrpc struct for an IPv4-mapped-to-IPv6 AF_INET6 transport address instead of an AF_INET one. (2) When sending CHALLENGE or RESPONSE packets, the transport length needs to be set from the sockaddr_rxrpc::transport_len field rather than sizeof() on the IPv4 transport address. (3) When processing an IPv4 ICMP packet received by an IPv6 socket, set up the address correctly before searching for the affected peer. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
10674a03 |
|
29-Aug-2017 |
Baolin Wang <baolin.wang@linaro.org> |
net: rxrpc: Replace time_t type with time64_t type Since the 'expiry' variable of 'struct key_preparsed_payload' has been changed to 'time64_t' type, which is year 2038 safe on 32bits system. In net/rxrpc subsystem, we need convert 'u32' type to 'time64_t' type when copying ticket expires time to 'prep->expiry', then this patch introduces two helper functions to help convert 'u32' to 'time64_t' type. This patch also uses ktime_get_real_seconds() to get current time instead of get_seconds() which is not year 2038 safe on 32bits system. Signed-off-by: Baolin Wang <baolin.wang@linaro.org> Signed-off-by: David Howells <dhowells@redhat.com>
|
#
ddc6c70f |
|
21-Jul-2017 |
David Howells <dhowells@redhat.com> |
rxrpc: Move the packet.h include file into net/rxrpc/ Move the protocol description header file into net/rxrpc/ and rename it to protocol.h. It's no longer necessary to expose it as packets are no longer exposed to kernel services (such as AFS) that use the facility. The abort codes are transferred to the UAPI header instead as we pass these back to userspace and also to kernel services. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
f7aec129 |
|
14-Jun-2017 |
David Howells <dhowells@redhat.com> |
rxrpc: Cache the congestion window setting Cache the congestion window setting that was determined during a call's transmission phase when it finishes so that it can be used by the next call to the same peer, thereby shortcutting the slow-start algorithm. The value is stored in the rxrpc_peer struct and is accessed without locking. Each call takes the value that happens to be there when it starts and just overwrites the value when it finishes. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e754eba6 |
|
06-Jun-2017 |
David Howells <dhowells@redhat.com> |
rxrpc: Provide a cmsg to specify the amount of Tx data for a call Provide a control message that can be specified on the first sendmsg() of a client call or the first sendmsg() of a service response to indicate the total length of the data to be transmitted for that call. Currently, because the length of the payload of an encrypted DATA packet is encrypted in front of the data, the packet cannot be encrypted until we know how much data it will hold. By specifying the length at the beginning of the transmit phase, each DATA packet length can be set before we start loading data from userspace (where several sendmsg() calls may contribute to a particular packet). An error will be returned if too little or too much data is presented in the Tx phase. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
4e255721 |
|
05-Jun-2017 |
David Howells <dhowells@redhat.com> |
rxrpc: Add service upgrade support for client connections Make it possible for a client to use AuriStor's service upgrade facility. The client does this by adding an RXRPC_UPGRADE_SERVICE control message to the first sendmsg() of a call. This takes no parameters. When recvmsg() starts returning data from the call, the service ID field in the returned msg_name will reflect the result of the upgrade attempt. If the upgrade was ignored, srx_service will match what was set in the sendmsg(); if the upgrade happened the srx_service will be altered to indicate the service the server upgraded to. Note that: (1) The choice of upgrade service is up to the server (2) Further client calls to the same server that would share a connection are blocked if an upgrade probe is in progress. (3) This should only be used to probe the service. Clients should then use the returned service ID in all subsequent communications with that server (and not set the upgrade). Note that the kernel will not retain this information should the connection expire from its cache. (4) If a server that supports upgrading is replaced by one that doesn't, whilst a connection is live, and if the replacement is running, say, OpenAFS 1.6.4 or older or an older IBM AFS, then the replacement server will not respond to packets sent to the upgraded connection. At this point, calls will time out and the server must be reprobed. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
4722974d |
|
05-Jun-2017 |
David Howells <dhowells@redhat.com> |
rxrpc: Implement service upgrade Implement AuriStor's service upgrade facility. There are three problems that this is meant to deal with: (1) Various of the standard AFS RPC calls have IPv4 addresses in their requests and/or replies - but there's no room for including IPv6 addresses. (2) Definition of IPv6-specific RPC operations in the standard operation sets has not yet been achieved. (3) One could envision the creation a new service on the same port that as the original service. The new service could implement improved operations - and the client could try this first, falling back to the original service if it's not there. Unfortunately, certain servers ignore packets addressed to a service they don't implement and don't respond in any way - not even with an ABORT. This means that the client must then wait for the call timeout to occur. What service upgrade does is to see if the connection is marked as being 'upgradeable' and if so, change the service ID in the server and thus the request and reply formats. Note that the upgrade isn't mandatory - a server that supports only the original call set will ignore the upgrade request. In the protocol, the procedure is then as follows: (1) To request an upgrade, the first DATA packet in a new connection must have the userStatus set to 1 (this is normally 0). The userStatus value is normally ignored by the server. (2) If the server doesn't support upgrading, the reply packets will contain the same service ID as for the first request packet. (3) If the server does support upgrading, all future reply packets on that connection will contain the new service ID and the new service ID will be applied to *all* further calls on that connection as well. (4) The RPC op used to probe the upgrade must take the same request data as the shadow call in the upgrade set (but may return a different reply). GetCapability RPC ops were added to all standard sets for just this purpose. Ops where the request formats differ cannot be used for probing. (5) The client must wait for completion of the probe before sending any further RPC ops to the same destination. It should then use the service ID that recvmsg() reported back in all future calls. (6) The shadow service must have call definitions for all the operation IDs defined by the original service. To support service upgrading, a server should: (1) Call bind() twice on its AF_RXRPC socket before calling listen(). Each bind() should supply a different service ID, but the transport addresses must be the same. This allows the server to receive requests with either service ID. (2) Enable automatic upgrading by calling setsockopt(), specifying RXRPC_UPGRADEABLE_SERVICE and passing in a two-member array of unsigned shorts as the argument: unsigned short optval[2]; This specifies a pair of service IDs. They must be different and must match the service IDs bound to the socket. Member 0 is the service ID to upgrade from and member 1 is the service ID to upgrade to. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
28036f44 |
|
05-Jun-2017 |
David Howells <dhowells@redhat.com> |
rxrpc: Permit multiple service binding Permit bind() to be called on an AF_RXRPC socket more than once (currently maximum twice) to bind multiple listening services to it. There are some restrictions: (1) All bind() calls involved must have a non-zero service ID. (2) The service IDs must all be different. (3) The rest of the address (notably the transport part) must be the same in all (a single UDP socket is shared). (4) This must be done before listen() or sendmsg() is called. This allows someone to connect to the service socket with different service IDs and lays the foundation for service upgrading. The service ID used by an incoming call can be extracted from the msg_name returned by recvmsg(). Signed-off-by: David Howells <dhowells@redhat.com>
|
#
68d6d1ae |
|
05-Jun-2017 |
David Howells <dhowells@redhat.com> |
rxrpc: Separate the connection's protocol service ID from the lookup ID Keep the rxrpc_connection struct's idea of the service ID that is exposed in the protocol separate from the service ID that's used as a lookup key. This allows the protocol service ID on a client connection to get upgraded without making the connection unfindable for other client calls that also would like to use the upgraded connection. The connection's actual service ID is then returned through recvmsg() by way of msg_name. Whilst we're at it, we get rid of the last_service_id field from each channel. The service ID is per-connection, not per-call and an entire connection is upgraded in one go. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
2baec2c3 |
|
24-May-2017 |
David Howells <dhowells@redhat.com> |
rxrpc: Support network namespacing Support network namespacing in AF_RXRPC with the following changes: (1) All the local endpoint, peer and call lists, locks, counters, etc. are moved into the per-namespace record. (2) All the connection tracking is moved into the per-namespace record with the exception of the client connection ID tree, which is kept global so that connection IDs are kept unique per-machine. (3) Each namespace gets its own epoch. This allows each network namespace to pretend to be a separate client machine. (4) The /proc/net/rxrpc_xxx files are now called /proc/net/rxrpc/xxx and the contents reflect the namespace. fs/afs/ should be okay with this patch as it explicitly requires the current net namespace to be init_net to permit a mount to proceed at the moment. It will, however, need updating so that cells, IP addresses and DNS records are per-namespace also. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fb46f6ee |
|
06-Apr-2017 |
David Howells <dhowells@redhat.com> |
rxrpc: Trace protocol errors in received packets Add a tracepoint (rxrpc_rx_proto) to record protocol errors in received packets. The following changes are made: (1) Add a function, __rxrpc_abort_eproto(), to note a protocol error on a call and mark the call aborted. This is wrapped by rxrpc_abort_eproto() that makes the why string usable in trace. (2) Add trace_rxrpc_rx_proto() or rxrpc_abort_eproto() to protocol error generation points, replacing rxrpc_abort_call() with the latter. (3) Only send an abort packet in rxkad_verify_packet*() if we actually managed to abort the call. Note that a trace event is also emitted if a kernel user (e.g. afs) tries to send data through a call when it's not in the transmission phase, though it's not technically a receive event. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
540b1c48 |
|
27-Feb-2017 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix deadlock between call creation and sendmsg/recvmsg All the routines by which rxrpc is accessed from the outside are serialised by means of the socket lock (sendmsg, recvmsg, bind, rxrpc_kernel_begin_call(), ...) and this presents a problem: (1) If a number of calls on the same socket are in the process of connection to the same peer, a maximum of four concurrent live calls are permitted before further calls need to wait for a slot. (2) If a call is waiting for a slot, it is deep inside sendmsg() or rxrpc_kernel_begin_call() and the entry function is holding the socket lock. (3) sendmsg() and recvmsg() or the in-kernel equivalents are prevented from servicing the other calls as they need to take the socket lock to do so. (4) The socket is stuck until a call is aborted and makes its slot available to the waiter. Fix this by: (1) Provide each call with a mutex ('user_mutex') that arbitrates access by the users of rxrpc separately for each specific call. (2) Make rxrpc_sendmsg() and rxrpc_recvmsg() unlock the socket as soon as they've got a call and taken its mutex. Note that I'm returning EWOULDBLOCK from recvmsg() if MSG_DONTWAIT is set but someone else has the lock. Should I instead only return EWOULDBLOCK if there's nothing currently to be done on a socket, and sleep in this particular instance because there is something to be done, but we appear to be blocked by the interrupt handler doing its ping? (3) Make rxrpc_new_client_call() unlock the socket after allocating a new call, locking its user mutex and adding it to the socket's call tree. The call is returned locked so that sendmsg() can add data to it immediately. From the moment the call is in the socket tree, it is subject to access by sendmsg() and recvmsg() - even if it isn't connected yet. (4) Lock new service calls in the UDP data_ready handler (in rxrpc_new_incoming_call()) because they may already be in the socket's tree and the data_ready handler makes them live immediately if a user ID has already been preassigned. Note that the new call is locked before any notifications are sent that it is live, so doing mutex_trylock() *ought* to always succeed. Userspace is prevented from doing sendmsg() on calls that are in a too-early state in rxrpc_do_sendmsg(). (5) Make rxrpc_new_incoming_call() return the call with the user mutex held so that a ping can be scheduled immediately under it. Note that it might be worth moving the ping call into rxrpc_new_incoming_call() and then we can drop the mutex there. (6) Make rxrpc_accept_call() take the lock on the call it is accepting and release the socket after adding the call to the socket's tree. This is slightly tricky as we've dequeued the call by that point and have to requeue it. Note that requeuing emits a trace event. (7) Make rxrpc_kernel_send_data() and rxrpc_kernel_recv_data() take the new mutex immediately and don't bother with the socket mutex at all. This patch has the nice bonus that calls on the same socket are now to some extent parallelisable. Note that we might want to move rxrpc_service_prealloc() calls out from the socket lock and give it its own lock, so that we don't hang progress in other calls because we're waiting for the allocator. We probably also want to avoid calling rxrpc_notify_socket() from within the socket lock (rxrpc_accept_call()). Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Marc Dionne <marc.c.dionne@auristor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
210f0353 |
|
05-Jan-2017 |
David Howells <dhowells@redhat.com> |
rxrpc: Allow listen(sock, 0) to be used to disable listening Allow listen() with a backlog of 0 to be used to disable listening on an AF_RXRPC socket. This also releases any preallocation, thereby making it easier for a kernel service to account for all allocated call structures when shutting down the service. The socket cannot thereafter have listening reenabled, but must rather be closed and reopened. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
b54a134a |
|
05-Jan-2017 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix handling of enums-to-string translation in tracing Fix the way enum values are translated into strings in AF_RXRPC tracepoints. The problem with just doing a lookup in a normal flat array of strings or chars is that external tracing infrastructure can't find it. Rather, TRACE_DEFINE_ENUM must be used. Also sort the enums and string tables to make it easier to keep them in order so that a future patch to __print_symbolic() can be optimised to try a direct lookup into the table first before iterating over it. A couple of _proto() macro calls are removed because they refered to tables that got moved to the tracing infrastructure. The relevant data can be found by way of tracing. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
9749fd2b |
|
06-Oct-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Need to produce an ACK for service op if op takes a long time We need to generate a DELAY ACK from the service end of an operation if we start doing the actual operation work and it takes longer than expected. This will hard-ACK the request data and allow the client to release its resources. To make this work: (1) We have to set the ack timer and propose an ACK when the call moves to the RXRPC_CALL_SERVER_ACK_REQUEST and clear the pending ACK and cancel the timer when we start transmitting the reply (the first DATA packet of the reply implicitly ACKs the request phase). (2) It must be possible to set the timer when the caller is holding call->state_lock, so split the lock-getting part of the timer function out. (3) Add trace notes for the ACK we're requesting and the timer we clear. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
a5af7e1f |
|
06-Oct-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix loss of PING RESPONSE ACK production due to PING ACKs Separate the output of PING ACKs from the output of other sorts of ACK so that if we receive a PING ACK and schedule transmission of a PING RESPONSE ACK, the response doesn't get cancelled by a PING ACK we happen to be scheduling transmission of at the same time. If a PING RESPONSE gets lost, the other side might just sit there waiting for it and refuse to proceed otherwise. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
26cb02aa |
|
06-Oct-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix warning by splitting rxrpc_send_call_packet() Split rxrpc_send_data_packet() to separate ACK generation (which is more complicated) from ABORT generation. This simplifies the code a bit and fixes the following warning: In file included from ../net/rxrpc/output.c:20:0: net/rxrpc/output.c: In function 'rxrpc_send_call_packet': net/rxrpc/ar-internal.h:1187:27: error: 'top' may be used uninitialized in this function [-Werror=maybe-uninitialized] net/rxrpc/output.c:103:24: note: 'top' was declared here net/rxrpc/output.c:225:25: error: 'hard_ack' may be used uninitialized in this function [-Werror=maybe-uninitialized] Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David Howells <dhowells@redhat.com>
|
#
19c0dbd5 |
|
06-Oct-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix duplicate const Remove a duplicate const keyword. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
df0adc78 |
|
26-Sep-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Keep the call timeouts as ktimes rather than jiffies Keep that call timeouts as ktimes rather than jiffies so that they can be expressed as functions of RTT. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
c31410ea |
|
30-Sep-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Remove error from struct rxrpc_skb_priv as it is unused Remove error from struct rxrpc_skb_priv as it is no longer used. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
775e5b71 |
|
30-Sep-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: The offset field in struct rxrpc_skb_priv is unnecessary The offset field in struct rxrpc_skb_priv is unnecessary as the value can always be calculated. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
1e9e5c95 |
|
29-Sep-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Reduce the rxrpc_local::services list to a pointer Reduce the rxrpc_local::services list to just a pointer as we don't permit multiple service endpoints to bind to a single transport endpoints (this is excluded by rxrpc_lookup_local()). The reason we don't allow this is that if you send a request to an AFS filesystem service, it will try to talk back to your cache manager on the port you sent from (this is how file change notifications are handled). To prevent someone from stealing your CM callbacks, we don't let AF_RXRPC sockets share a UDP socket if at least one of them has a service bound. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
a1767077 |
|
29-Sep-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Make Tx loss-injection go through normal return and adjust tracing In rxrpc_send_data_packet() make the loss-injection path return through the same code as the transmission path so that the RTT determination is initiated and any future timer shuffling will be done, despite the packet having been binned. Whilst we're at it: (1) Add to the tx_data tracepoint an indication of whether or not we're retransmitting a data packet. (2) When we're deciding whether or not to request an ACK, rather than checking if we're in fast-retransmit mode check instead if we're retransmitting. (3) Don't invoke the lose_skb tracepoint when losing a Tx packet as we're not altering the sk_buff refcount nor are we just seeing it after getting it off the Tx list. (4) The rxrpc_skb_tx_lost note is then no longer used so remove it. (5) rxrpc_lose_skb() no longer needs to deal with rxrpc_skb_tx_lost. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
57494343 |
|
24-Sep-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Implement slow-start Implement RxRPC slow-start, which is similar to RFC 5681 for TCP. A tracepoint is added to log the state of the congestion management algorithm and the decisions it makes. Notes: (1) Since we send fixed-size DATA packets (apart from the final packet in each phase), counters and calculations are in terms of packets rather than bytes. (2) The ACK packet carries the equivalent of TCP SACK. (3) The FLIGHT_SIZE calculation in RFC 5681 doesn't seem particularly suited to SACK of a small number of packets. It seems that, almost inevitably, by the time three 'duplicate' ACKs have been seen, we have narrowed the loss down to one or two missing packets, and the FLIGHT_SIZE calculation ends up as 2. (4) In rxrpc_resend(), if there was no data that apparently needed retransmission, we transmit a PING ACK to ask the peer to tell us what its Rx window state is. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
0d967960 |
|
24-Sep-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Schedule an ACK if the reply to a client call appears overdue If we've sent all the request data in a client call but haven't seen any sign of the reply data yet, schedule an ACK to be sent to the server to find out if the reply data got lost. If the server hasn't yet hard-ACK'd the request data, we send a PING ACK to demand a response to find out whether we need to retransmit. If the server says it has received all of the data, we send an IDLE ACK to tell the server that we haven't received anything in the receive phase as yet. To make this work, a non-immediate PING ACK must carry a delay. I've chosen the same as the IDLE ACK for the moment. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
31a1b989 |
|
24-Sep-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Generate a summary of the ACK state for later use Generate a summary of the Tx buffer packet state when an ACK is received for use in a later patch that does congestion management. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
dd7c1ee5 |
|
24-Sep-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Reinitialise the call ACK and timer state for client reply phase Clear the ACK reason, ACK timer and resend timer when entering the client reply phase when the first DATA packet is received. New ACKs will be proposed once the data is queued. The resend timer is no longer relevant and we need to cancel ACKs scheduled to probe for a lost reply. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
805b21b9 |
|
24-Sep-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Send an ACK after every few DATA packets we receive Send an ACK if we haven't sent one for the last two packets we've received. This keeps the other end apprised of where we've got to - which is important if they're doing slow-start. We do this in recvmsg so that we can dispatch a packet directly without the need to wake up the background thread. This should possibly be made configurable in future. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
9c7ad434 |
|
23-Sep-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Add tracepoint for ACK proposal Add a tracepoint to log proposed ACKs, including whether the proposal is used to update a pending ACK or is discarded in favour of an easlier, higher priority ACK. Whilst we're at it, get rid of the rxrpc_acks() function and access the name array directly. We do, however, need to validate the ACK reason number given to trace_rxrpc_rx_ack() to make sure we don't overrun the array. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
fc7ab6d2 |
|
23-Sep-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Add a tracepoint for the call timer Add a tracepoint to log call timer initiation, setting and expiry. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
70790dbe |
|
22-Sep-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Pass the last Tx packet marker in the annotation buffer When the last packet of data to be transmitted on a call is queued, tx_top is set and then the RXRPC_CALL_TX_LAST flag is set. Unfortunately, this leaves a race in the ACK processing side of things because the flag affects the interpretation of tx_top and also allows us to start receiving reply data before we've finished transmitting. To fix this, make the following changes: (1) rxrpc_queue_packet() now sets a marker in the annotation buffer instead of setting the RXRPC_CALL_TX_LAST flag. (2) rxrpc_rotate_tx_window() detects the marker and sets the flag in the same context as the routines that use it. (3) rxrpc_end_tx_phase() is simplified to just shift the call state. The Tx window must have been rotated before calling to discard the last packet. (4) rxrpc_receiving_reply() is added to handle the arrival of the first DATA packet of a reply to a client call (which is an implicit ACK of the Tx phase). (5) The last part of rxrpc_input_ack() is reordered to perform Tx rotation, then soft-ACK application and then to end the phase if we've rotated the last packet. In the event of a terminal ACK, the soft-ACK application will be skipped as nAcks should be 0. (6) rxrpc_input_ackall() now has to rotate as well as ending the phase. In addition: (7) Alter the transmit tracepoint to log the rotation of the last packet. (8) Remove the no-longer relevant queue_reqack tracepoint note. The ACK-REQUESTED packet header flag is now set as needed when we actually transmit the packet and may vary by retransmission. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
dfc3da44 |
|
22-Sep-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Need to start the resend timer on initial transmission When a DATA packet has its initial transmission, we may need to start or adjust the resend timer. Without this we end up relying on being sent a NACK to initiate the resend. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
c0d058c2 |
|
22-Sep-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Make sure sendmsg() is woken on call completion Make sure that sendmsg() gets woken up if the call it is waiting for completes abnormally. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
0d4b103c |
|
21-Sep-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Reduce the number of ACK-Requests sent Reduce the number of ACK-Requests we set on DATA packets that we're sending to reduce network traffic. We set the flag on odd-numbered DATA packets to start off the RTT cache until we have at least three entries in it and then probe once per second thereafter to keep it topped up. This could be made tunable in future. Note that from this point, the RXRPC_REQUEST_ACK flag is set on DATA packets as we transmit them and not stored statically in the sk_buff. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
50235c4b |
|
21-Sep-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Obtain RTT data by requesting ACKs on DATA packets In addition to sending a PING ACK to gain RTT data, we can set the RXRPC_REQUEST_ACK flag on a DATA packet and get a REQUESTED-ACK ACK. The ACK packet contains the serial number of the packet it is in response to, so we can look through the Tx buffer for a matching DATA packet. This requires that the data packets be stamped with the time of transmission as a ktime rather than having the resend_at time in jiffies. This further requires the resend code to do the resend determination in ktimes and convert to jiffies to set the timer. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
8e83134d |
|
21-Sep-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Send pings to get RTT data Send a PING ACK packet to the peer when we get a new incoming call from a peer we don't have a record for. The PING RESPONSE ACK packet will tell us the following about the peer: (1) its receive window size (2) its MTU sizes (3) its support for jumbo DATA packets (4) if it supports slow start (similar to RFC 5681) (5) an estimate of the RTT This is necessary because the peer won't normally send us an ACK until it gets to the Rx phase and we send it a packet, but we would like to know some of this information before we start sending packets. A pair of tracepoints are added so that RTT determination can be observed. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
cf1a6474 |
|
21-Sep-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Add per-peer RTT tracker Add a function to track the average RTT for a peer. Sources of RTT data will be added in subsequent patches. The RTT data will be useful in the future for determining resend timeouts and for handling the slow-start part of the Rx protocol. Also add a pair of tracepoints, one to log transmissions to elicit a response for RTT purposes and one to log responses that contribute RTT data. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
f07373ea |
|
21-Sep-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Add re-sent Tx annotation Add a Tx-phase annotation for packet buffers to indicate that a buffer has already been retransmitted. This will be used by future congestion management. Re-retransmissions of a packet don't affect the congestion window managment in the same way as initial retransmissions. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
5a924b89 |
|
21-Sep-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Don't store the rxrpc header in the Tx queue sk_buffs Don't store the rxrpc protocol header in sk_buffs on the transmit queue, but rather generate it on the fly and pass it to kernel_sendmsg() as a separate iov. This reduces the amount of storage required. Note that the security header is still stored in the sk_buff as it may get encrypted along with the data (and doesn't change with each transmission). Signed-off-by: David Howells <dhowells@redhat.com>
|
#
71f3ca40 |
|
17-Sep-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Improve skb tracing Improve sk_buff tracing within AF_RXRPC by the following means: (1) Use an enum to note the event type rather than plain integers and use an array of event names rather than a big multi ?: list. (2) Distinguish Rx from Tx packets and account them separately. This requires the call phase to be tracked so that we know what we might find in rxtx_buffer[]. (3) Add a parameter to rxrpc_{new,see,get,free}_skb() to indicate the event type. (4) A pair of 'rotate' events are added to indicate packets that are about to be rotated out of the Rx and Tx windows. (5) A pair of 'lost' events are added, along with rxrpc_lose_skb() for packet loss injection recording. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
84997905 |
|
17-Sep-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Add a tracepoint to follow what recvmsg does Add a tracepoint to follow what recvmsg does within AF_RXRPC. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
58dc63c9 |
|
17-Sep-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Add a tracepoint to follow packets in the Rx buffer Add a tracepoint to follow the life of packets that get added to a call's receive buffer. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
a124fe3e |
|
17-Sep-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Add a tracepoint to follow the life of a packet in the Tx buffer Add a tracepoint to follow the insertion of a packet into the transmit buffer, its transmission and its rotation out of the buffer. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
363deeab |
|
17-Sep-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Add connection tracepoint and client conn state tracepoint Add a pair of tracepoints, one to track rxrpc_connection struct ref counting and the other to track the client connection cache state. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
a84a46d7 |
|
17-Sep-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Add some additional call tracing Add additional call tracepoint points for noting call-connected, call-released and connection-failed events. Also fix one tracepoint that was using an integer instead of the corresponding enum value as the point type. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
a3868bfc |
|
17-Sep-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Print the packet type name in the Rx packet trace Print a symbolic packet type name for each valid received packet in the trace output, not just a number. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
75e42126 |
|
13-Sep-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Correctly initialise, limit and transmit call->rx_winsize call->rx_winsize should be initialised to the sysctl setting and the sysctl setting should be limited to the maximum we want to permit. Further, we need to place this in the ACK info instead of the sysctl setting. Furthermore, discard the idea of accepting the subpackets of a jumbo packet that lie beyond the receive window when the first packet of the jumbo is within the window. Just discard the excess subpackets instead. This allows the receive window to be opened up right to the buffer size less one for the dead slot. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
cbd00891 |
|
13-Sep-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Adjust the call ref tracepoint to show kernel API refs Adjust the call ref tracepoint to show references held on a call by the kernel API separately as much as possible and add an additional trace to at the allocation point from the preallocation buffer for an incoming call. Note that this doesn't show the allocation of a client call for the kernel separately at the moment. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
248f219c |
|
08-Sep-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Rewrite the data and ack handling code Rewrite the data and ack handling code such that: (1) Parsing of received ACK and ABORT packets and the distribution and the filing of DATA packets happens entirely within the data_ready context called from the UDP socket. This allows us to process and discard ACK and ABORT packets much more quickly (they're no longer stashed on a queue for a background thread to process). (2) We avoid calling skb_clone(), pskb_pull() and pskb_trim(). We instead keep track of the offset and length of the content of each packet in the sk_buff metadata. This means we don't do any allocation in the receive path. (3) Jumbo DATA packet parsing is now done in data_ready context. Rather than cloning the packet once for each subpacket and pulling/trimming it, we file the packet multiple times with an annotation for each indicating which subpacket is there. From that we can directly calculate the offset and length. (4) A call's receive queue can be accessed without taking locks (memory barriers do have to be used, though). (5) Incoming calls are set up from preallocated resources and immediately made live. They can than have packets queued upon them and ACKs generated. If insufficient resources exist, DATA packet #1 is given a BUSY reply and other DATA packets are discarded). (6) sk_buffs no longer take a ref on their parent call. To make this work, the following changes are made: (1) Each call's receive buffer is now a circular buffer of sk_buff pointers (rxtx_buffer) rather than a number of sk_buff_heads spread between the call and the socket. This permits each sk_buff to be in the buffer multiple times. The receive buffer is reused for the transmit buffer. (2) A circular buffer of annotations (rxtx_annotations) is kept parallel to the data buffer. Transmission phase annotations indicate whether a buffered packet has been ACK'd or not and whether it needs retransmission. Receive phase annotations indicate whether a slot holds a whole packet or a jumbo subpacket and, if the latter, which subpacket. They also note whether the packet has been decrypted in place. (3) DATA packet window tracking is much simplified. Each phase has just two numbers representing the window (rx_hard_ack/rx_top and tx_hard_ack/tx_top). The hard_ack number is the sequence number before base of the window, representing the last packet the other side says it has consumed. hard_ack starts from 0 and the first packet is sequence number 1. The top number is the sequence number of the highest-numbered packet residing in the buffer. Packets between hard_ack+1 and top are soft-ACK'd to indicate they've been received, but not yet consumed. Four macros, before(), before_eq(), after() and after_eq() are added to compare sequence numbers within the window. This allows for the top of the window to wrap when the hard-ack sequence number gets close to the limit. Two flags, RXRPC_CALL_RX_LAST and RXRPC_CALL_TX_LAST, are added also to indicate when rx_top and tx_top point at the packets with the LAST_PACKET bit set, indicating the end of the phase. (4) Calls are queued on the socket 'receive queue' rather than packets. This means that we don't need have to invent dummy packets to queue to indicate abnormal/terminal states and we don't have to keep metadata packets (such as ABORTs) around (5) The offset and length of a (sub)packet's content are now passed to the verify_packet security op. This is currently expected to decrypt the packet in place and validate it. However, there's now nowhere to store the revised offset and length of the actual data within the decrypted blob (there may be a header and padding to skip) because an sk_buff may represent multiple packets, so a locate_data security op is added to retrieve these details from the sk_buff content when needed. (6) recvmsg() now has to handle jumbo subpackets, where each subpacket is individually secured and needs to be individually decrypted. The code to do this is broken out into rxrpc_recvmsg_data() and shared with the kernel API. It now iterates over the call's receive buffer rather than walking the socket receive queue. Additional changes: (1) The timers are condensed to a single timer that is set for the soonest of three timeouts (delayed ACK generation, DATA retransmission and call lifespan). (2) Transmission of ACK and ABORT packets is effected immediately from process-context socket ops/kernel API calls that cause them instead of them being punted off to a background work item. The data_ready handler still has to defer to the background, though. (3) A shutdown op is added to the AF_RXRPC socket so that the AFS filesystem can shut down the socket and flush its own work items before closing the socket to deal with any in-progress service calls. Future additional changes that will need to be considered: (1) Make sure that a call doesn't hog the front of the queue by receiving data from the network as fast as userspace is consuming it to the exclusion of other calls. (2) Transmit delayed ACKs from within recvmsg() when we've consumed sufficiently more packets to avoid the background work item needing to run. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
00e90712 |
|
08-Sep-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Preallocate peers, conns and calls for incoming service requests Make it possible for the data_ready handler called from the UDP transport socket to completely instantiate an rxrpc_call structure and make it immediately live by preallocating all the memory it might need. The idea is to cut out the background thread usage as much as possible. [Note that the preallocated structs are not actually used in this patch - that will be done in a future patch.] If insufficient resources are available in the preallocation buffers, it will be possible to discard the DATA packet in the data_ready handler or schedule a BUSY packet without the need to schedule an attempt at allocation in a background thread. To this end: (1) Preallocate rxrpc_peer, rxrpc_connection and rxrpc_call structs to a maximum number each of the listen backlog size. The backlog size is limited to a maxmimum of 32. Only this many of each can be in the preallocation buffer. (2) For userspace sockets, the preallocation is charged initially by listen() and will be recharged by accepting or rejecting pending new incoming calls. (3) For kernel services {,re,dis}charging of the preallocation buffers is handled manually. Two notifier callbacks have to be provided before kernel_listen() is invoked: (a) An indication that a new call has been instantiated. This can be used to trigger background recharging. (b) An indication that a call is being discarded. This is used when the socket is being released. A function, rxrpc_kernel_charge_accept() is called by the kernel service to preallocate a single call. It should be passed the user ID to be used for that call and a callback to associate the rxrpc call with the kernel service's side of the ID. (4) Discard the preallocation when the socket is closed. (5) Temporarily bump the refcount on the call allocated in rxrpc_incoming_call() so that rxrpc_release_call() can ditch the preallocation ref on service calls unconditionally. This will no longer be necessary once the preallocation is used. Note that this does not yet control the number of active service calls on a client - that will come in a later patch. A future development would be to provide a setsockopt() call that allows a userspace server to manually charge the preallocation buffer. This would allow user call IDs to be provided in advance and the awkward manual accept stage to be bypassed. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
2ab27215 |
|
08-Sep-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Remove skb_count from struct rxrpc_call Remove the sk_buff count from the rxrpc_call struct as it's less useful once we stop queueing sk_buffs. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
de8d6c74 |
|
08-Sep-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Convert rxrpc_local::services to an hlist Convert the rxrpc_local::services list to an hlist so that it can be accessed under RCU conditions more readily. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
cf13258f |
|
08-Sep-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix ASSERTCMP and ASSERTIFCMP to handle signed values Fix ASSERTCMP and ASSERTIFCMP to be able to handle signed values by casting both parameters to the type of the first before comparing. Without this, both values are cast to unsigned long, which means that checks for values less than zero don't work. The downside of this is that the state enum values in struct rxrpc_call and struct rxrpc_connection can't be bitfields as __typeof__ can't handle them. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
5a42976d |
|
06-Sep-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Add tracepoint for working out where aborts happen Add a tracepoint for working out where local aborts happen. Each tracepoint call is labelled with a 3-letter code so that they can be distinguished - and the DATA sequence number is added too where available. rxrpc_kernel_abort_call() also takes a 3-letter code so that AFS can indicate the circumstances when it aborts a call. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
e8d6bbb0 |
|
07-Sep-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix returns of call completion helpers rxrpc_set_call_completion() returns bool, not int, so the ret variable should match this. rxrpc_call_completed() and __rxrpc_call_completed() should return the value of rxrpc_set_call_completion(). Signed-off-by: David Howells <dhowells@redhat.com>
|
#
8d94aa38 |
|
07-Sep-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Calls shouldn't hold socket refs rxrpc calls shouldn't hold refs on the sock struct. This was done so that the socket wouldn't go away whilst the call was in progress, such that the call could reach the socket's queues. However, we can mark the socket as requiring an RCU release and rely on the RCU read lock. To make this work, we do: (1) rxrpc_release_call() removes the call's call user ID. This is now only called from socket operations and not from the call processor: rxrpc_accept_call() / rxrpc_kernel_accept_call() rxrpc_reject_call() / rxrpc_kernel_reject_call() rxrpc_kernel_end_call() rxrpc_release_calls_on_socket() rxrpc_recvmsg() Though it is also called in the cleanup path of rxrpc_accept_incoming_call() before we assign a user ID. (2) Pass the socket pointer into rxrpc_release_call() rather than getting it from the call so that we can get rid of uninitialised calls. (3) Fix call processor queueing to pass a ref to the work queue and to release that ref at the end of the processor function (or to pass it back to the work queue if we have to requeue). (4) Skip out of the call processor function asap if the call is complete and don't requeue it if the call is complete. (5) Clean up the call immediately that the refcount reaches 0 rather than trying to defer it. Actual deallocation is deferred to RCU, however. (6) Don't hold socket refs for allocated calls. (7) Use the RCU read lock when queueing a message on a socket and treat the call's socket pointer according to RCU rules and check it for NULL. We also need to use the RCU read lock when viewing a call through procfs. (8) Transmit the final ACK/ABORT to a client call in rxrpc_release_call() if this hasn't been done yet so that we can then disconnect the call. Once the call is disconnected, it won't have any access to the connection struct and the UDP socket for the call work processor to be able to send the ACK. Terminal retransmission will be handled by the connection processor. (9) Release all calls immediately on the closing of a socket rather than trying to defer this. Incomplete calls will be aborted. The call refcount model is much simplified. Refs are held on the call by: (1) A socket's user ID tree. (2) A socket's incoming call secureq and acceptq. (3) A kernel service that has a call in progress. (4) A queued call work processor. We have to take care to put any call that we failed to queue. (5) sk_buffs on a socket's receive queue. A future patch will get rid of this. Whilst we're at it, we can do: (1) Get rid of the RXRPC_CALL_EV_RELEASE event. Release is now done entirely from the socket routines and never from the call's processor. (2) Get rid of the RXRPC_CALL_DEAD state. Calls now end in the RXRPC_CALL_COMPLETE state. (3) Get rid of the rxrpc_call::destroyer work item. Calls are now torn down when their refcount reaches 0 and then handed over to RCU for final cleanup. (4) Get rid of the rxrpc_call::deadspan timer. Calls are cleaned up immediately they're finished with and don't hang around. Post-completion retransmission is handled by the connection processor once the call is disconnected. (5) Get rid of the dead call expiry setting as there's no longer a timer to set. (6) rxrpc_destroy_all_calls() can just check that the call list is empty. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
278ac0cd |
|
07-Sep-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Cache the security index in the rxrpc_call struct Cache the security index in the rxrpc_call struct so that we can get at it even when the call has been disconnected and the connection pointer cleared. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
fff72429 |
|
07-Sep-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Improve the call tracking tracepoint Improve the call tracking tracepoint by showing more differentiation between some of the put and get events, including: (1) Getting and putting refs for the socket call user ID tree. (2) Getting and putting refs for queueing and failing to queue the call processor work item. Note that these aren't necessarily used in this patch, but will be taken advantage of in future patches. An enum is added for the event subtype numbers rather than coding them directly as decimal numbers and a table of 3-letter strings is provided rather than a sequence of ?: operators. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
71a17de3 |
|
07-Sep-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Whitespace cleanup Remove some whitespace. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
3dc20f09 |
|
04-Sep-2016 |
David Howells <dhowells@redhat.com> |
rxrpc Move enum rxrpc_command to sendmsg.c Move enum rxrpc_command to sendmsg.c as it's now only used in that file. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
0b58b8a1 |
|
02-Sep-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Split sendmsg from packet transmission code Split the sendmsg code from the packet transmission code (mostly to be found in output.c). Signed-off-by: David Howells <dhowells@redhat.com>
|
#
d001648e |
|
30-Aug-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Don't expose skbs to in-kernel users [ver #2] Don't expose skbs to in-kernel users, such as the AFS filesystem, but instead provide a notification hook the indicates that a call needs attention and another that indicates that there's a new call to be collected. This makes the following possibilities more achievable: (1) Call refcounting can be made simpler if skbs don't hold refs to calls. (2) skbs referring to non-data events will be able to be freed much sooner rather than being queued for AFS to pick up as rxrpc_kernel_recv_data will be able to consult the call state. (3) We can shortcut the receive phase when a call is remotely aborted because we don't have to go through all the packets to get to the one cancelling the operation. (4) It makes it easier to do encryption/decryption directly between AFS's buffers and sk_buffs. (5) Encryption/decryption can more easily be done in the AFS's thread contexts - usually that of the userspace process that issued a syscall - rather than in one of rxrpc's background threads on a workqueue. (6) AFS will be able to wait synchronously on a call inside AF_RXRPC. To make this work, the following interface function has been added: int rxrpc_kernel_recv_data( struct socket *sock, struct rxrpc_call *call, void *buffer, size_t bufsize, size_t *_offset, bool want_more, u32 *_abort_code); This is the recvmsg equivalent. It allows the caller to find out about the state of a specific call and to transfer received data into a buffer piecemeal. afs_extract_data() and rxrpc_kernel_recv_data() now do all the extraction logic between them. They don't wait synchronously yet because the socket lock needs to be dealt with. Five interface functions have been removed: rxrpc_kernel_is_data_last() rxrpc_kernel_get_abort_code() rxrpc_kernel_get_error_number() rxrpc_kernel_free_skb() rxrpc_kernel_data_consumed() As a temporary hack, sk_buffs going to an in-kernel call are queued on the rxrpc_call struct (->knlrecv_queue) rather than being handed over to the in-kernel user. To process the queue internally, a temporary function, temp_deliver_data() has been added. This will be replaced with common code between the rxrpc_recvmsg() path and the kernel_rxrpc_recv_data() path in a future patch. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e34d4234 |
|
30-Aug-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Trace rxrpc_call usage Add a trace event for debuging rxrpc_call struct usage. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
f5c17aae |
|
30-Aug-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Calls should only have one terminal state Condense the terminal states of a call state machine to a single state, plus a separate completion type value. The value is then set, along with error and abort code values, only when the call is transitioned to the completion state. Helpers are provided to simplify this. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
45025bce |
|
24-Aug-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Improve management and caching of client connection objects Improve the management and caching of client rxrpc connection objects. From this point, client connections will be managed separately from service connections because AF_RXRPC controls the creation and re-use of client connections but doesn't have that luxury with service connections. Further, there will be limits on the numbers of client connections that may be live on a machine. No direct restriction will be placed on the number of client calls, excepting that each client connection can support a maximum of four concurrent calls. Note that, for a number of reasons, we don't want to simply discard a client connection as soon as the last call is apparently finished: (1) Security is negotiated per-connection and the context is then shared between all calls on that connection. The context can be negotiated again if the connection lapses, but that involves holding up calls whilst at least two packets are exchanged and various crypto bits are performed - so we'd ideally like to cache it for a little while at least. (2) If a packet goes astray, we will need to retransmit a final ACK or ABORT packet. To make this work, we need to keep around the connection details for a little while. (3) The locally held structures represent some amount of setup time, to be weighed against their occupation of memory when idle. To this end, the client connection cache is managed by a state machine on each connection. There are five states: (1) INACTIVE - The connection is not held in any list and may not have been exposed to the world. If it has been previously exposed, it was discarded from the idle list after expiring. (2) WAITING - The connection is waiting for the number of client conns to drop below the maximum capacity. Calls may be in progress upon it from when it was active and got culled. The connection is on the rxrpc_waiting_client_conns list which is kept in to-be-granted order. Culled conns with waiters go to the back of the queue just like new conns. (3) ACTIVE - The connection has at least one call in progress upon it, it may freely grant available channels to new calls and calls may be waiting on it for channels to become available. The connection is on the rxrpc_active_client_conns list which is kept in activation order for culling purposes. (4) CULLED - The connection got summarily culled to try and free up capacity. Calls currently in progress on the connection are allowed to continue, but new calls will have to wait. There can be no waiters in this state - the conn would have to go to the WAITING state instead. (5) IDLE - The connection has no calls in progress upon it and must have been exposed to the world (ie. the EXPOSED flag must be set). When it expires, the EXPOSED flag is cleared and the connection transitions to the INACTIVE state. The connection is on the rxrpc_idle_client_conns list which is kept in order of how soon they'll expire. A connection in the ACTIVE or CULLED state must have at least one active call upon it; if in the WAITING state it may have active calls upon it; other states may not have active calls. As long as a connection remains active and doesn't get culled, it may continue to process calls - even if there are connections on the wait queue. This simplifies things a bit and reduces the amount of checking we need do. There are a couple flags of relevance to the cache: (1) EXPOSED - The connection ID got exposed to the world. If this flag is set, an extra ref is added to the connection preventing it from being reaped when it has no calls outstanding. This flag is cleared and the ref dropped when a conn is discarded from the idle list. (2) DONT_REUSE - The connection should be discarded as soon as possible and should not be reused. This commit also provides a number of new settings: (*) /proc/net/rxrpc/max_client_conns The maximum number of live client connections. Above this number, new connections get added to the wait list and must wait for an active conn to be culled. Culled connections can be reused, but they will go to the back of the wait list and have to wait. (*) /proc/net/rxrpc/reap_client_conns If the number of desired connections exceeds the maximum above, the active connection list will be culled until there are only this many left in it. (*) /proc/net/rxrpc/idle_conn_expiry The normal expiry time for a client connection, provided there are fewer than reap_client_conns of them around. (*) /proc/net/rxrpc/idle_conn_fast_expiry The expedited expiry time, used when there are more than reap_client_conns of them around. Note that I combined the Tx wait queue with the channel grant wait queue to save space as only one of these should be in use at once. Note also that, for the moment, the service connection cache still uses the old connection management code. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
4d028b2c |
|
24-Aug-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Dup the main conn list for the proc interface The main connection list is used for two independent purposes: primarily it is used to find connections to reap and secondarily it is used to list connections in procfs. Split the procfs list out from the reap list. This allows us to stop using the reap list for client connections when they acquire a separate management strategy from service collections. The client connections will not be on a management single list, and sometimes won't be on a management list at all. This doesn't leave them floating, however, as they will also be on an rb-tree rooted on the socket so that the socket can find them to dispatch calls. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
df5d8bf7 |
|
24-Aug-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Make /proc/net/rxrpc_calls safer Make /proc/net/rxrpc_calls safer by stashing a copy of the peer pointer in the rxrpc_call struct and checking in the show routine that the peer pointer, the socket pointer and the local pointer obtained from the socket pointer aren't NULL before we use them. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
18bfeba5 |
|
23-Aug-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Perform terminal call ACK/ABORT retransmission from conn processor Perform terminal call ACK/ABORT retransmission in the connection processor rather than in the call processor. With this change, once last_call is set, no more incoming packets will be routed to the corresponding call or any earlier calls on that channel (call IDs must only increase on a channel on a connection). Further, if a packet's callNumber is before the last_call ID or a packet is aimed at successfully completed service call then that packet is discarded and ignored. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
563ea7d5 |
|
23-Aug-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Calculate serial skew on packet reception Calculate the serial number skew in the data_ready handler when a packet has been received and a connection looked up. The skew is cached in the sk_buff's priority field. The connection highest received serial number is updated at this time also. This can be done without locks or atomic instructions because, at this point, the code is serialised by the socket. This generates more accurate skew data because if the packet is offloaded to a work queue before this is determined, more packets may come in, bumping the highest serial number and thereby increasing the apparent skew. This also removes some unnecessary atomic ops. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
f51b4480 |
|
23-Aug-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Set connection expiry on idle, not put Set the connection expiry time when a connection becomes idle rather than doing this in rxrpc_put_connection(). This makes the put path more efficient (it is likely to be called occasionally whilst a connection has outstanding calls because active workqueue items needs to be given a ref). The time is also preset in the connection allocator in case the connection never gets used. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
df844fd4 |
|
23-Aug-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Use a tracepoint for skb accounting debugging Use a tracepoint to log various skb accounting points to help in debugging refcounting errors. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
01a90a45 |
|
23-Aug-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Drop channel number field from rxrpc_call struct Drop the channel number (channel) field from the rxrpc_call struct to reduce the size of the call struct. The field is redundant: if the call is attached to a connection, the channel can be obtained from there by AND'ing with RXRPC_CHANNELMASK. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
dabe5a79 |
|
23-Aug-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Tidy up the rxrpc_call struct a bit Do a little tidying of the rxrpc_call struct: (1) in_clientflag is no longer compared against the value that's in the packet, so keeping it in this form isn't necessary. Use a flag in flags instead and provide a pair of wrapper functions. (2) We don't read the epoch value, so that can go. (3) Move what remains of the data that were used for hashing up in the struct to be with the channel number. (4) Get rid of the local pointer. We can get at this via the socket struct and we only use this in the procfs viewer. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
26164e77 |
|
23-Aug-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Remove RXRPC_CALL_PROC_BUSY Remove RXRPC_CALL_PROC_BUSY as work queue items are now 100% non-reentrant. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
372ee163 |
|
03-Aug-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix races between skb free, ACK generation and replying Inside the kafs filesystem it is possible to occasionally have a call processed and terminated before we've had a chance to check whether we need to clean up the rx queue for that call because afs_send_simple_reply() ends the call when it is done, but this is done in a workqueue item that might happen to run to completion before afs_deliver_to_call() completes. Further, it is possible for rxrpc_kernel_send_data() to be called to send a reply before the last request-phase data skb is released. The rxrpc skb destructor is where the ACK processing is done and the call state is advanced upon release of the last skb. ACK generation is also deferred to a work item because it's possible that the skb destructor is not called in a context where kernel_sendmsg() can be invoked. To this end, the following changes are made: (1) kernel_rxrpc_data_consumed() is added. This should be called whenever an skb is emptied so as to crank the ACK and call states. This does not release the skb, however. kernel_rxrpc_free_skb() must now be called to achieve that. These together replace rxrpc_kernel_data_delivered(). (2) kernel_rxrpc_data_consumed() is wrapped by afs_data_consumed(). This makes afs_deliver_to_call() easier to work as the skb can simply be discarded unconditionally here without trying to work out what the return value of the ->deliver() function means. The ->deliver() functions can, via afs_data_complete(), afs_transfer_reply() and afs_extract_data() mark that an skb has been consumed (thereby cranking the state) without the need to conditionally free the skb to make sure the state is correct on an incoming call for when the call processor tries to send the reply. (3) rxrpc_recvmsg() now has to call kernel_rxrpc_data_consumed() when it has finished with a packet and MSG_PEEK isn't set. (4) rxrpc_packet_destructor() no longer calls rxrpc_hard_ACK_data(). Because of this, we no longer need to clear the destructor and put the call before we free the skb in cases where we don't want the ACK/call state to be cranked. (5) The ->deliver() call-type callbacks are made to return -EAGAIN rather than 0 if they expect more data (afs_extract_data() returns -EAGAIN to the delivery function already), and the caller is now responsible for producing an abort if that was the last packet. (6) There are many bits of unmarshalling code where: ret = afs_extract_data(call, skb, last, ...); switch (ret) { case 0: break; case -EAGAIN: return 0; default: return ret; } is to be found. As -EAGAIN can now be passed back to the caller, we now just return if ret < 0: ret = afs_extract_data(call, skb, last, ...); if (ret < 0) return ret; (7) Checks for trailing data and empty final data packets has been consolidated as afs_data_complete(). So: if (skb->len > 0) return -EBADMSG; if (!last) return 0; becomes: ret = afs_data_complete(call, skb, last); if (ret < 0) return ret; (8) afs_transfer_reply() now checks the amount of data it has against the amount of data desired and the amount of data in the skb and returns an error to induce an abort if we don't get exactly what we want. Without these changes, the following oops can occasionally be observed, particularly if some printks are inserted into the delivery path: general protection fault: 0000 [#1] SMP Modules linked in: kafs(E) af_rxrpc(E) [last unloaded: af_rxrpc] CPU: 0 PID: 1305 Comm: kworker/u8:3 Tainted: G E 4.7.0-fsdevel+ #1303 Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014 Workqueue: kafsd afs_async_workfn [kafs] task: ffff88040be041c0 ti: ffff88040c070000 task.ti: ffff88040c070000 RIP: 0010:[<ffffffff8108fd3c>] [<ffffffff8108fd3c>] __lock_acquire+0xcf/0x15a1 RSP: 0018:ffff88040c073bc0 EFLAGS: 00010002 RAX: 6b6b6b6b6b6b6b6b RBX: 0000000000000000 RCX: ffff88040d29a710 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88040d29a710 RBP: ffff88040c073c70 R08: 0000000000000001 R09: 0000000000000001 R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: ffff88040be041c0 R15: ffffffff814c928f FS: 0000000000000000(0000) GS:ffff88041fa00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fa4595f4750 CR3: 0000000001c14000 CR4: 00000000001406f0 Stack: 0000000000000006 000000000be04930 0000000000000000 ffff880400000000 ffff880400000000 ffffffff8108f847 ffff88040be041c0 ffffffff81050446 ffff8803fc08a920 ffff8803fc08a958 ffff88040be041c0 ffff88040c073c38 Call Trace: [<ffffffff8108f847>] ? mark_held_locks+0x5e/0x74 [<ffffffff81050446>] ? __local_bh_enable_ip+0x9b/0xa1 [<ffffffff8108f9ca>] ? trace_hardirqs_on_caller+0x16d/0x189 [<ffffffff810915f4>] lock_acquire+0x122/0x1b6 [<ffffffff810915f4>] ? lock_acquire+0x122/0x1b6 [<ffffffff814c928f>] ? skb_dequeue+0x18/0x61 [<ffffffff81609dbf>] _raw_spin_lock_irqsave+0x35/0x49 [<ffffffff814c928f>] ? skb_dequeue+0x18/0x61 [<ffffffff814c928f>] skb_dequeue+0x18/0x61 [<ffffffffa009aa92>] afs_deliver_to_call+0x344/0x39d [kafs] [<ffffffffa009ab37>] afs_process_async_call+0x4c/0xd5 [kafs] [<ffffffffa0099e9c>] afs_async_workfn+0xe/0x10 [kafs] [<ffffffff81063a3a>] process_one_work+0x29d/0x57c [<ffffffff81064ac2>] worker_thread+0x24a/0x385 [<ffffffff81064878>] ? rescuer_thread+0x2d0/0x2d0 [<ffffffff810696f5>] kthread+0xf3/0xfb [<ffffffff8160a6ff>] ret_from_fork+0x1f/0x40 [<ffffffff81069602>] ? kthread_create_on_node+0x1cf/0x1cf Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d440a1ce |
|
05-Jul-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Kill off the call hash table The call hash table is now no longer used as calls are looked up directly by channel slot on the connection, so kill it off. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
8496af50 |
|
01-Jul-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Use RCU to access a peer's service connection tree Move to using RCU access to a peer's service connection tree when routing an incoming packet. This is done using a seqlock to trigger retrying of the tree walk if a change happened. Further, we no longer get a ref on the connection looked up in the data_ready handler unless we queue the connection's work item - and then only if the refcount > 0. Note that I'm avoiding the use of a hash table for service connections because each service connection is addressed by a 62-bit number (constructed from epoch and connection ID >> 2) that would allow the client to engage in bucket stuffing, given knowledge of the hash algorithm. Peers, however, are hashed as the network address is less controllable by the client. The total number of peers will also be limited in a future commit. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
1291e9d1 |
|
29-Jun-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Move data_ready peer lookup into rxrpc_find_connection() Move the peer lookup done in input.c by data_ready into rxrpc_find_connection(). Signed-off-by: David Howells <dhowells@redhat.com>
|
#
e8d70ce1 |
|
29-Jun-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Prune the contents of the rxrpc_conn_proto struct Prune the contents of the rxrpc_conn_proto struct. Most of the fields aren't used anymore. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
001c1122 |
|
30-Jun-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Maintain an extra ref on a conn for the cache list Overhaul the usage count accounting for the rxrpc_connection struct to make it easier to implement RCU access from the data_ready handler. The problem is that currently we're using a lock to prevent the garbage collector from trying to clean up a connection that we're contemplating unidling. We could just stick incoming packets on the connection we find, but we've then got a problem that we may race when dispatching a work item to process it as we need to give that a ref to prevent the rxrpc_connection struct from disappearing in the meantime. Further, incoming packets may get discarded if attached to an rxrpc_connection struct that is going away. Whilst this is not a total disaster - the client will presumably resend - it would delay processing of the call. This would affect the AFS client filesystem's service manager operation. To this end: (1) We now maintain an extra count on the connection usage count whilst it is on the connection list. This mean it is not in use when its refcount is 1. (2) When trying to reuse an old connection, we only increment the refcount if it is greater than 0. If it is 0, we replace it in the tree with a new candidate connection. (3) Two connection flags are added to indicate whether or not a connection is in the local's client connection tree (used by sendmsg) or the peer's service connection tree (used by data_ready). This makes sure that we don't try and remove a connection if it got replaced. The flags are tested under lock with the removal operation to prevent the reaper from killing the rxrpc_connection struct whilst someone else is trying to effect a replacement. This could probably be alleviated by using memory barriers between the flag set/test and the rb_tree ops. The rb_tree op would still need to be under the lock, however. (4) When trying to reap an old connection, we try to flip the usage count from 1 to 0. If it's not 1 at that point, then it must've come back to life temporarily and we ignore it. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
d991b4a3 |
|
29-Jun-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Move peer lookup from call-accept to new-incoming-conn Move the lookup of a peer from a call that's being accepted into the function that creates a new incoming connection. This will allow us to avoid incrementing the peer's usage count in some cases in future. Note that I haven't bother to integrate rxrpc_get_addr_from_skb() with rxrpc_extract_addr_from_skb() as I'm going to delete the former in the very near future. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
7877a4a4 |
|
04-Apr-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Split service connection code out into its own file Split the service-specific connection code out into into its own file. The client-specific code has already been split out. This will leave just the common code in the original file. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
c6d2b8d7 |
|
04-Apr-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Split client connection code out into its own file Split the client-specific connection code out into its own file. It will behave somewhat differently from the service-specific connection code, so it makes sense to separate them. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
a1399f8b |
|
27-Jun-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Call channels should have separate call number spaces Each channel on a connection has a separate, independent number space from which to allocate callNumber values. It is entirely possible, for example, to have a connection with four active calls, each with call number 1. Note that the callNumber values for any particular channel don't have to start at 1, but they are supposed to increment monotonically for that channel from a client's perspective and may not be reused once the call number is transmitted (until the epoch cycles all the way back round). Currently, however, call numbers are allocated on a per-connection basis and, further, are held in an rb-tree. The rb-tree is redundant as the four channel pointers in the rxrpc_connection struct are entirely capable of pointing to all the calls currently in progress on a connection. To this end, make the following changes: (1) Handle call number allocation independently per channel. (2) Get rid of the conn->calls rb-tree. This is overkill as a connection may have a maximum of four calls in progress at any one time. Use the pointers in the channels[] array instead, indexed by the channel number from the packet. (3) For each channel, save the result of the last call that was in progress on that channel in conn->channels[] so that the final ACK or ABORT packet can be replayed if necessary. Any call earlier than that is just ignored. If we've seen the next call number in a packet, the last one is most definitely defunct. (4) When generating a RESPONSE packet for a connection, the call number counter for each channel must be included in it. (5) When parsing a RESPONSE packet for a connection, the call number counters contained therein should be used to set the minimum expected call numbers on each channel. To do in future commits: (1) Replay terminal packets based on the last call stored in conn->channels[]. (2) Connections should be retired before the callNumber space on any channel runs out. (3) A server is expected to disregard or reject any new incoming call that has a call number less than the current call number counter. The call number counter for that channel must be advanced to the new call number. Note that the server cannot just require that the next call that it sees on a channel be exactly the call number counter + 1 because then there's a scenario that could cause a problem: The client transmits a packet to initiate a connection, the network goes out, the server sends an ACK (which gets lost), the client sends an ABORT (which also gets lost); the network then reconnects, the client then reuses the call number for the next call (it doesn't know the server already saw the call number), but the server thinks it already has the first packet of this call (it doesn't know that the client doesn't know that it saw the call number the first time). Signed-off-by: David Howells <dhowells@redhat.com>
|
#
dee46364 |
|
27-Jun-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Add RCU destruction for connections and calls Add RCU destruction for connections and calls as the RCU lookup from the transport socket data_ready handler is going to come along shortly. Whilst we're at it, move the cleanup workqueue flushing and RCU barrierage into the destruction code for the objects that need it (locals and connections) and add the extra RCU barrier required for connection cleanup. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
2c4579e4 |
|
27-Jun-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Move usage count getting into rxrpc_queue_conn() Rather than calling rxrpc_get_connection() manually before calling rxrpc_queue_conn(), do it inside the queue wrapper. This allows us to do some important fixes: (1) If the usage count is 0, do nothing. This prevents connections from being reanimated once they're dead. (2) If rxrpc_queue_work() fails because the work item is already queued, retract the usage count increment which would otherwise be lost. (3) Don't take a ref on the connection in the work function. By passing the ref through the work item, this is unnecessary. Doing it in the work function is too late anyway. Previously, connection-directed packets held a ref on the connection, but that's not really the best idea. And another useful changes: (*) Don't need to take a refcount on the connection in the data_ready handler unless we invoke the connection's work item. We're using RCU there so that's otherwise redundant. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
eb9b9d22 |
|
27-Jun-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Check that the client conns cache is empty before module removal Check that the client conns cache is empty before module removal and bug if not, listing any offending connections that are still present. Unfortunately, if there are connections still around, then the transport socket is still unexpectedly open and active, so we can't just unallocate the connections. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
bba304db |
|
27-Jun-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Turn connection #defines into enums and put outside struct def Turn the connection event and state #define lists into enums and move outside of the struct definition. Whilst we're at it, change _SERVER to _SERVICE in those identifiers and add EV_ into the event name to distinguish them from flags and states. Also add a symbol indicating the number of states and use that in the state text array. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
5acbee46 |
|
27-Jun-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Provide queuing helper functions Provide queueing helper functions so that the queueing of local and connection objects can be fixed later. The issue is that a ref on the object needs to be passed to the work queue, but the act of queueing the object may fail because the object is already queued. Testing the queuedness of an object before hand doesn't work because there can be a race with someone else trying to queue it. What will have to be done is to adjust the refcount depending on the result of the queue operation. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
a263629d |
|
26-Jun-2016 |
Herbert Xu <herbert@gondor.apana.org.au> |
rxrpc: Avoid using stack memory in SG lists in rxkad rxkad uses stack memory in SG lists which would not work if stacks were allocated from vmalloc memory. In fact, in most cases this isn't even necessary as the stack memory ends up getting copied over to kmalloc memory. This patch eliminates all the unnecessary stack memory uses by supplying the final destination directly to the crypto API. In two instances where a temporary buffer is actually needed we also switch use a scratch area in the rxrpc_call struct (only one DATA packet will be being secured or verified at a time). Finally there is no need to split a split-page buffer into two SG entries so code dealing with that has been removed. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: David Howells <dhowells@redhat.com>
|
#
aa390bbe |
|
17-Jun-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Kill off the rxrpc_transport struct The rxrpc_transport struct is now redundant, given that the rxrpc_peer struct is now per peer port rather than per peer host, so get rid of it. Service connection lists are transferred to the rxrpc_peer struct, as is the conn_lock. Previous patches moved the client connection handling out of the rxrpc_transport struct and discarded the connection bundling code. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
999b69f8 |
|
17-Jun-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Kill the client connection bundle concept Kill off the concept of maintaining a bundle of connections to a particular target service to increase the number of call slots available for any beyond four for that service (there are four call slots per connection). This will make cleaning up the connection handling code easier and facilitate removal of the rxrpc_transport struct. Bundling can be reintroduced later if necessary. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
5627cc8b |
|
04-Apr-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Provide more refcount helper functions Provide refcount helper functions for connections so that the code doesn't touch local or connection usage counts directly. Also make it such that local and peer put functions can take a NULL pointer. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
985a5c82 |
|
17-Jun-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Make rxrpc_send_packet() take a connection not a transport Make rxrpc_send_packet() take a connection not a transport as part of the phasing out of the rxrpc_transport struct. Whilst we're at it, rename the function to rxrpc_send_data_packet() to differentiate it from the other packet sending functions. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
4a3388c8 |
|
04-Apr-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Use IDR to allocate client conn IDs on a machine-wide basis Use the IDR facility to allocate client connection IDs on a machine-wide basis so that each client connection has a unique identifier. When the connection ID space wraps, we advance the epoch by 1, thereby effectively having a 62-bit ID space. The IDR facility is then used to look up client connections during incoming packet routing instead of using an rbtree rooted on the transport. This change allows for the removal of the transport in the future and also means that client connections can be looked up directly in the data-ready handler by connection ID. The ID management code is placed in a new file, conn-client.c, to which all the client connection-specific code will eventually move. Note that the IDR tree gets very expensive on memory if the connection IDs are widely scattered throughout the number space, so we shall need to retire connections that have, say, an ID more than four times the maximum number of client conns away from the current allocation point to try and keep the IDs concentrated. We will also need to retire connections from an old epoch. Also note that, for the moment, a pointer to the transport has to be passed through into the ID allocation function so that we can take a BH lock to prevent a locking issue against in-BH lookup of client connections. This will go away later when RCU is used for server connections also. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
42886ffe |
|
16-Jun-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Pass sk_buff * rather than rxrpc_host_header * to functions Pass a pointer to struct sk_buff rather than struct rxrpc_host_header to functions so that they can in the future get at transport protocol parameters rather than just RxRPC parameters. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
cc8feb8e |
|
04-Apr-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix exclusive connection handling "Exclusive connections" are meant to be used for a single client call and then scrapped. The idea is to limit the use of the negotiated security context. The current code, however, isn't doing this: it is instead restricting the socket to a single virtual connection and doing all the calls over that. This is changed such that the socket no longer maintains a special virtual connection over which it will do all the calls, but rather gets a new one each time a new exclusive call is made. Further, using a socket option for this is a poor choice. It should be done on sendmsg with a control message marker instead so that calls can be marked exclusive individually. To that end, add RXRPC_EXCLUSIVE_CALL which, if passed to sendmsg() as a control message element, will cause the call to be done on an single-use connection. The socket option (RXRPC_EXCLUSIVE_CONNECTION) still exists and, if set, will override any lack of RXRPC_EXCLUSIVE_CALL being specified so that programs using the setsockopt() will appear to work the same. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
19ffa01c |
|
04-Apr-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Use structs to hold connection params and protocol info Define and use a structure to hold connection parameters. This makes it easier to pass multiple connection parameters around. Define and use a structure to hold protocol information used to hash a connection for lookup on incoming packet. Most of these fields will be disposed of eventually, including the duplicate local pointer. Whilst we're at it rename "proto" to "family" when referring to a protocol family. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
4f95dd78 |
|
04-Apr-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Rework local endpoint management Rework the local RxRPC endpoint management. Local endpoint objects are maintained in a flat list as before. This should be okay as there shouldn't be more than one per open AF_RXRPC socket (there can be fewer as local endpoints can be shared if their local service ID is 0 and they share the same local transport parameters). Changes: (1) Local endpoints may now only be shared if they have local service ID 0 (ie. they're not being used for listening). This prevents a scenario where process A is listening of the Cache Manager port and process B contacts a fileserver - which may then attempt to send CM requests back to B. But if A and B are sharing a local endpoint, A will get the CM requests meant for B. (2) We use a mutex to handle lookups and don't provide RCU-only lookups since we only expect to access the list when opening a socket or destroying an endpoint. The local endpoint object is pointed to by the transport socket's sk_user_data for the life of the transport socket - allowing us to refer to it directly from the sk_data_ready and sk_error_report callbacks. (3) atomic_inc_not_zero() now exists and can be used to only share a local endpoint if the last reference hasn't yet gone. (4) We can remove rxrpc_local_lock - a spinlock that had to be taken with BH processing disabled given that we assume sk_user_data won't change under us. (5) The transport socket is shut down before we clear the sk_user_data pointer so that we can be sure that the transport socket's callbacks won't be invoked once the RCU destruction is scheduled. (6) Local endpoints have a work item that handles both destruction and event processing. The means that destruction doesn't then need to wait for event processing. The event queues can then be cleared after the transport socket is shut down. (7) Local endpoints are no longer available for resurrection beyond the life of the sockets that had them open. As soon as their last ref goes, they are scheduled for destruction and may not have their usage count moved from 0. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
87563616 |
|
04-Apr-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Separate local endpoint event handling out into its own file Separate local endpoint event handling out into its own file preparatory to overhauling the object management aspect (which remains in the original file). Signed-off-by: David Howells <dhowells@redhat.com>
|
#
f66d7490 |
|
04-Apr-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Use the peer record to distribute network errors Use the peer record to distribute network errors rather than the transport object (which I want to get rid of). An error from a particular peer terminates all calls on that peer. For future consideration: (1) For ICMP-induced errors it might be worth trying to extract the RxRPC header from the offending packet, if one is returned attached to the ICMP packet, to better direct the error. This may be overkill, though, since an ICMP packet would be expected to be relating to the destination port, machine or network. RxRPC ABORT and BUSY packets give notice at RxRPC level. (2) To also abort connection-level communications (such as CHALLENGE packets) where indicted by an error - but that requires some revamping of the connection event handling first. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
abe89ef0 |
|
04-Apr-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Rename rxrpc_UDP_error_report() to rxrpc_error_report() Rename rxrpc_UDP_error_report() to rxrpc_error_report() as it might get called for something other than UDP. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
be6e6707 |
|
04-Apr-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Rework peer object handling to use hash table and RCU Rework peer object handling to use a hash table instead of a flat list and to use RCU. Peer objects are no longer destroyed by passing them to a workqueue to process, but rather are just passed to the RCU garbage collector as kfree'able objects. The hash function uses the local endpoint plus all the components of the remote address, except for the RxRPC service ID. Peers thus represent a UDP port on the remote machine as contacted by a UDP port on this machine. The RCU read lock is used to handle non-creating lookups so that they can be called from bottom half context in the sk_error_report handler without having to lock the hash table against modification. rxrpc_lookup_peer_rcu() *does* take a reference on the peer object as in the future, this will be passed to a work item for error distribution in the error_report path and this function will cease being used in the data_ready path. Creating lookups are done under spinlock rather than mutex as they might be set up due to an external stimulus if the local endpoint is a server. Captured network error messages (ICMP) are handled with respect to this struct and MTU size and RTT are cached here. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
0d81a51a |
|
13-Jun-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Update the comments in ar-internal.h to reflect renames Update the section comments in ar-internal.h that indicate the locations of the referenced items to reflect the renames done to the .c files in net/rxrpc/. This also involves some rearrangement to reflect keep the sections in order of filename. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
0e119b41 |
|
10-Jun-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Limit the listening backlog Limit the socket incoming call backlog queue size so that a remote client can't pump in sufficient new calls that the server runs out of memory. Note that this is partially theoretical at the moment since whilst the number of calls is limited, the number of packets trying to set up new calls is not. This will be addressed in a later patch. If the caller of listen() specifies a backlog INT_MAX, then they get the current maximum; anything else greater than max_backlog or anything negative incurs EINVAL. The limit on the maximum queue size can be set by: echo N >/proc/sys/net/rxrpc/max_backlog where 4<=N<=32. Further, set the default backlog to 0, requiring listen() to be called before we start actually queueing new calls. Whilst this kind of is a change in the UAPI, the caller can't actually *accept* new calls anyway unless they've first called listen() to put the socket into the LISTENING state - thus the aforementioned new calls would otherwise just sit there, eating up kernel memory. (Note that sockets that don't have a non-zero service ID bound don't get incoming calls anyway.) Given that the default backlog is now 0, make the AFS filesystem call kernel_listen() to set the maximum backlog for itself. Possible improvements include: (1) Trimming a too-large backlog to max_backlog when listen is called. (2) Trimming the backlog value whenever the value is used so that changes to max_backlog are applied to an open socket automatically. Note that the AFS filesystem opens one socket and keeps it open for extended periods, so would miss out on changes to max_backlog. (3) Having a separate setting for the AFS filesystem. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2341e077 |
|
09-Jun-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Simplify connect() implementation and simplify sendmsg() op Simplify the RxRPC connect() implementation. It will just note the destination address it is given, and if a sendmsg() comes along with no address, this will be assigned as the address. No transport struct will be held internally, which will allow us to remove this later. Simplify sendmsg() also. Whilst a call is active, userspace refers to it by a private unique user ID specified in a control message. When sendmsg() sees a user ID that doesn't map to an extant call, it creates a new call for that user ID and attempts to add it. If, when we try to add it, the user ID is now registered, we now reject the message with -EEXIST. We should never see this situation unless two threads are racing, trying to create a call with the same ID - which would be an error. It also isn't required to provide sendmsg() with an address - provided the control message data holds a user ID that maps to a currently active call. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9b6d5398 |
|
02-Jun-2016 |
Joe Perches <joe@perches.com> |
rxrpc: Use pr_<level> and pr_fmt, reduce object size a few KB Use the more common kernel logging style and reduce object size. The logging message prefix changes from a mixture of "RxRPC:" and "RXRPC:" to "af_rxrpc: ". $ size net/rxrpc/built-in.o* text data bss dec hex filename 64172 1972 8304 74448 122d0 net/rxrpc/built-in.o.new 67512 1972 8304 77788 12fdc net/rxrpc/built-in.o.old Miscellanea: o Consolidate the ASSERT macros to use a single pr_err call with decimal and hexadecimal output and a stringified #OP argument Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e0e4d82f |
|
07-Apr-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Create a null security type and get rid of conditional calls Create a null security type for security index 0 and get rid of all conditional calls to the security operations. We expect normally to be using security, so this should be of little negative impact. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
648af7fc |
|
07-Apr-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Absorb the rxkad security module Absorb the rxkad security module into the af_rxrpc module so that there's only one module file. This avoids a circular dependency whereby rxkad pins af_rxrpc and cached connections pin rxkad but can't be manually evicted (they will expire eventually and cease pinning). With this change, af_rxrpc can just be unloaded, despite having cached connections. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
843099ca |
|
07-Apr-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Don't pass gfp around in incoming call handling functions Don't pass gfp around in incoming call handling functions, but rather hard code it at the points where we actually need it since the value comes from within the rxrpc driver and is always the same. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
dc44b3a0 |
|
07-Apr-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Differentiate local and remote abort codes in structs In the rxrpc_connection and rxrpc_call structs, there's one field to hold the abort code, no matter whether that value was generated locally to be sent or was received from the peer via an abort packet. Split the abort code fields in two for cleanliness sake and add an error field to hold the Linux error number to the rxrpc_call struct too (sometimes this is generated in a context where we can't return it to userspace directly). Furthermore, add a skb mark to indicate a packet that caused a local abort to be generated so that recvmsg() can pick up the correct abort code. A future addition will need to be to indicate to userspace the difference between aborts via a control message. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5b3e87f1 |
|
07-Apr-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Static arrays of strings should be const char *const[] Static arrays of strings should be const char *const[]. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8e688d9c |
|
07-Apr-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Move some miscellaneous bits out into their own file Move some miscellaneous bits out into their own file to make it easier to split the call handling. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
dad8aff7 |
|
09-Mar-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Replace all unsigned with unsigned int Replace all "unsigned" types with "unsigned int" types. Reported-by: David Miller <davem@davemloft.net> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b4f1342f |
|
04-Mar-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Adjust some whitespace and comments Remove some excess whitespace, insert some missing spaces and adjust a couple of comments. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
0d12f8a4 |
|
04-Mar-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Keep the skb private record of the Rx header in host byte order Currently, a copy of the Rx packet header is copied into the the sk_buff private data so that we can advance the pointer into the buffer, potentially discarding the original. At the moment, this copy is held in network byte order, but this means we're doing a lot of unnecessary translations. The reasons it was done this way are that we need the values in network byte order occasionally and we can use the copy, slightly modified, as part of an iov array when sending an ack or an abort packet. However, it seems more reasonable on review that it would be better kept in host byte order and that we make up a new header when we want to send another packet. To this end, rename the original header struct to rxrpc_wire_header (with BE fields) and institute a variant called rxrpc_host_header that has host order fields. Change the struct in the sk_buff private data into an rxrpc_host_header and translate the values when filling it in. This further allows us to keep values kept in various structures in host byte order rather than network byte order and allows removal of some fields that are byteswapped duplicates. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
4c198ad1 |
|
04-Mar-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Rename call events to begin RXRPC_CALL_EV_ Rename call event names to begin RXRPC_CALL_EV_ to distinguish them from the flags. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
5b8848d1 |
|
04-Mar-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Convert call flag and event numbers into enums Convert call flag and event numbers into enums and move their definitions outside of the struct. Also move the call state enum outside of the struct and add an extra element to count the number of states. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
1afe593b |
|
24-Jan-2016 |
Herbert Xu <herbert@gondor.apana.org.au> |
rxrpc: Use skcipher This patch replaces uses of blkcipher with skcipher. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
#
22a3f9a2 |
|
17-Sep-2015 |
Ksenija Stanojevic <ksenija.stanojevic@gmail.com> |
rxrpc: Replace get_seconds with ktime_get_seconds Replace time_t type and get_seconds function which are not y2038 safe on 32-bit systems. Function ktime_get_seconds use monotonic instead of real time and therefore will not cause overflow. Signed-off-by: Ksenija Stanojevic <ksenija.stanojevic@gmail.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
44ba0698 |
|
01-Apr-2015 |
David Howells <dhowells@redhat.com> |
RxRPC: Handle VERSION Rx protocol packets Handle VERSION Rx protocol packets. We should respond to a VERSION packet with a string indicating the Rx version. This is a maximum of 64 characters and is padded out to 65 chars with NUL bytes. Note that other AFS clients use the version request as a NAT keepalive so we need to handle it rather than returning an abort. The standard formulation seems to be: <project> <version> built <yyyy>-<mm>-<dd> for example: " OpenAFS 1.6.2 built 2013-05-07 " (note the three extra spaces) as obtained with: rxdebug grand.mit.edu -version from the openafs package. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
1b784140 |
|
02-Mar-2015 |
Ying Xue <ying.xue@windriver.com> |
net: Remove iocb argument from sendmsg and recvmsg After TIPC doesn't depend on iocb argument in its internal implementations of sendmsg() and recvmsg() hooks defined in proto structure, no any user is using iocb argument in them at all now. Then we can drop the redundant iocb argument completely from kinds of implementations of both sendmsg() and recvmsg() in the entire networking stack. Cc: Christoph Hellwig <hch@lst.de> Suggested-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
676d2369 |
|
11-Apr-2014 |
David S. Miller <davem@davemloft.net> |
net: Fix use after free by removing length arg from sk_data_ready callbacks. Several spots in the kernel perform a sequence like: skb_queue_tail(&sk->s_receive_queue, skb); sk->sk_data_ready(sk, skb->len); But at the moment we place the SKB onto the socket receive queue it can be consumed and freed up. So this skb->len access is potentially to freed up memory. Furthermore, the skb->len can be modified by the consumer so it is possible that the value isn't accurate. And finally, no actual implementation of this callback actually uses the length argument. And since nobody actually cared about it's value, lots of call sites pass arbitrary values in such as '0' and even '1'. So just remove the length argument from the callback, that way there is no confusion whatsoever and all of these use-after-free cases get fixed as a side effect. Based upon a patch by Eric Dumazet and his suggestion to audit this issue tree-wide. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7727640c |
|
03-Mar-2014 |
Tim Smith <tim@electronghost.co.uk> |
af_rxrpc: Keep rxrpc_call pointers in a hashtable Keep track of rxrpc_call structures in a hashtable so they can be found directly from the network parameters which define the call. This allows incoming packets to be routed directly to a call without walking through hierarchy of peer -> transport -> connection -> call and all the spinlocks that that entailed. Signed-off-by: Tim Smith <tim@electronghost.co.uk> Signed-off-by: David Howells <dhowells@redhat.com>
|
#
817913d8 |
|
07-Feb-2014 |
David Howells <dhowells@redhat.com> |
af_rxrpc: Expose more RxRPC parameters via sysctls Expose RxRPC parameters via sysctls to control the Rx window size, the Rx MTU maximum size and the number of packets that can be glued into a jumbo packet. More info added to Documentation/networking/rxrpc.txt. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
5873c083 |
|
07-Feb-2014 |
David Howells <dhowells@redhat.com> |
af_rxrpc: Add sysctls for configuring RxRPC parameters Add sysctls for configuring RxRPC protocol handling, specifically controls on delays before ack generation, the delay before resending a packet, the maximum lifetime of a call and the expiration times of calls, connections and transports that haven't been recently used. More info added in Documentation/networking/rxrpc.txt. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
c1b1203d |
|
18-Oct-2013 |
Joe Perches <joe@perches.com> |
net: misc: Remove extern from function prototypes There are a mix of function prototypes with and without extern in the kernel sources. Standardize on not using extern for function prototypes. Function prototypes don't need to be written with extern. extern is assumed by the compiler. Its use is as unnecessary as using auto to declare automatic/local variables in a block. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
95c96174 |
|
14-Apr-2012 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: cleanup unsigned to unsigned int Use of "unsigned int" is preferred to bare "unsigned" in net tree. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
12fdff3f |
|
12-Aug-2010 |
David Howells <dhowells@redhat.com> |
Add a dummy printk function for the maintenance of unused printks Add a dummy printk function for the maintenance of unused printks through gcc format checking, and also so that side-effect checking is maintained too. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
4e36a95e |
|
16-Sep-2009 |
David Howells <dhowells@redhat.com> |
RxRPC: Use uX/sX rather than uintX_t/intX_t types Use uX rather than uintX_t types for consistency. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
33941284 |
|
13-Sep-2009 |
David Howells <dhowells@redhat.com> |
RxRPC: Allow key payloads to be passed in XDR form Allow add_key() and KEYCTL_INSTANTIATE to accept key payloads in XDR form as described by openafs-1.4.10/src/auth/afs_token.xg. This provides a way of passing kaserver, Kerberos 4, Kerberos 5 and GSSAPI keys from userspace, and allows for future expansion. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9f389f4b |
|
03-Apr-2008 |
Sven Schnelle <svens@stackframe.org> |
rxrpc: remove smp_processor_id() from debug macro Signed-off-by: Sven Schnelle <svens@stackframe.org> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
91e916cf |
|
28-Mar-2008 |
Al Viro <viro@ftp.linux.org.uk> |
net/rxrpc trivial annotations Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
0dc47877 |
|
05-Mar-2008 |
Harvey Harrison <harvey.harrison@gmail.com> |
net: replace remaining __FUNCTION__ occurrences __FUNCTION__ is gcc-specific, use __func__ Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
036c2e27 |
|
30-Jan-2008 |
Jan Engelhardt <jengelh@computergmbh.de> |
[AF_RXRPC]: constify function pointer tables Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
411faf58 |
|
26-Apr-2007 |
David S. Miller <davem@sunset.davemloft.net> |
[RXRPC]: Remove bogus atomic_* overrides. These are done with CPP defines which several platforms use for their atomic.h implementation, which floods the build with warnings and breaks the build. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
651350d1 |
|
26-Apr-2007 |
David Howells <dhowells@redhat.com> |
[AF_RXRPC]: Add an interface to the AF_RXRPC module for the AFS filesystem to use Add an interface to the AF_RXRPC module so that the AFS filesystem module can more easily make use of the services available. AFS still opens a socket but then uses the action functions in lieu of sendmsg() and registers an intercept functions to grab messages before they're queued on the socket Rx queue. This permits AFS (or whatever) to: (1) Avoid the overhead of using the recvmsg() call. (2) Use different keys directly on individual client calls on one socket rather than having to open a whole slew of sockets, one for each key it might want to use. (3) Avoid calling request_key() at the point of issue of a call or opening of a socket. This is done instead by AFS at the point of open(), unlink() or other VFS operation and the key handed through. (4) Request the use of something other than GFP_KERNEL to allocate memory. Furthermore: (*) The socket buffer markings used by RxRPC are made available for AFS so that it can interpret the cooked RxRPC messages itself. (*) rxgen (un)marshalling abort codes are made available. The following documentation for the kernel interface is added to Documentation/networking/rxrpc.txt: ========================= AF_RXRPC KERNEL INTERFACE ========================= The AF_RXRPC module also provides an interface for use by in-kernel utilities such as the AFS filesystem. This permits such a utility to: (1) Use different keys directly on individual client calls on one socket rather than having to open a whole slew of sockets, one for each key it might want to use. (2) Avoid having RxRPC call request_key() at the point of issue of a call or opening of a socket. Instead the utility is responsible for requesting a key at the appropriate point. AFS, for instance, would do this during VFS operations such as open() or unlink(). The key is then handed through when the call is initiated. (3) Request the use of something other than GFP_KERNEL to allocate memory. (4) Avoid the overhead of using the recvmsg() call. RxRPC messages can be intercepted before they get put into the socket Rx queue and the socket buffers manipulated directly. To use the RxRPC facility, a kernel utility must still open an AF_RXRPC socket, bind an addess as appropriate and listen if it's to be a server socket, but then it passes this to the kernel interface functions. The kernel interface functions are as follows: (*) Begin a new client call. struct rxrpc_call * rxrpc_kernel_begin_call(struct socket *sock, struct sockaddr_rxrpc *srx, struct key *key, unsigned long user_call_ID, gfp_t gfp); This allocates the infrastructure to make a new RxRPC call and assigns call and connection numbers. The call will be made on the UDP port that the socket is bound to. The call will go to the destination address of a connected client socket unless an alternative is supplied (srx is non-NULL). If a key is supplied then this will be used to secure the call instead of the key bound to the socket with the RXRPC_SECURITY_KEY sockopt. Calls secured in this way will still share connections if at all possible. The user_call_ID is equivalent to that supplied to sendmsg() in the control data buffer. It is entirely feasible to use this to point to a kernel data structure. If this function is successful, an opaque reference to the RxRPC call is returned. The caller now holds a reference on this and it must be properly ended. (*) End a client call. void rxrpc_kernel_end_call(struct rxrpc_call *call); This is used to end a previously begun call. The user_call_ID is expunged from AF_RXRPC's knowledge and will not be seen again in association with the specified call. (*) Send data through a call. int rxrpc_kernel_send_data(struct rxrpc_call *call, struct msghdr *msg, size_t len); This is used to supply either the request part of a client call or the reply part of a server call. msg.msg_iovlen and msg.msg_iov specify the data buffers to be used. msg_iov may not be NULL and must point exclusively to in-kernel virtual addresses. msg.msg_flags may be given MSG_MORE if there will be subsequent data sends for this call. The msg must not specify a destination address, control data or any flags other than MSG_MORE. len is the total amount of data to transmit. (*) Abort a call. void rxrpc_kernel_abort_call(struct rxrpc_call *call, u32 abort_code); This is used to abort a call if it's still in an abortable state. The abort code specified will be placed in the ABORT message sent. (*) Intercept received RxRPC messages. typedef void (*rxrpc_interceptor_t)(struct sock *sk, unsigned long user_call_ID, struct sk_buff *skb); void rxrpc_kernel_intercept_rx_messages(struct socket *sock, rxrpc_interceptor_t interceptor); This installs an interceptor function on the specified AF_RXRPC socket. All messages that would otherwise wind up in the socket's Rx queue are then diverted to this function. Note that care must be taken to process the messages in the right order to maintain DATA message sequentiality. The interceptor function itself is provided with the address of the socket and handling the incoming message, the ID assigned by the kernel utility to the call and the socket buffer containing the message. The skb->mark field indicates the type of message: MARK MEANING =============================== ======================================= RXRPC_SKB_MARK_DATA Data message RXRPC_SKB_MARK_FINAL_ACK Final ACK received for an incoming call RXRPC_SKB_MARK_BUSY Client call rejected as server busy RXRPC_SKB_MARK_REMOTE_ABORT Call aborted by peer RXRPC_SKB_MARK_NET_ERROR Network error detected RXRPC_SKB_MARK_LOCAL_ERROR Local error encountered RXRPC_SKB_MARK_NEW_CALL New incoming call awaiting acceptance The remote abort message can be probed with rxrpc_kernel_get_abort_code(). The two error messages can be probed with rxrpc_kernel_get_error_number(). A new call can be accepted with rxrpc_kernel_accept_call(). Data messages can have their contents extracted with the usual bunch of socket buffer manipulation functions. A data message can be determined to be the last one in a sequence with rxrpc_kernel_is_data_last(). When a data message has been used up, rxrpc_kernel_data_delivered() should be called on it.. Non-data messages should be handled to rxrpc_kernel_free_skb() to dispose of. It is possible to get extra refs on all types of message for later freeing, but this may pin the state of a call until the message is finally freed. (*) Accept an incoming call. struct rxrpc_call * rxrpc_kernel_accept_call(struct socket *sock, unsigned long user_call_ID); This is used to accept an incoming call and to assign it a call ID. This function is similar to rxrpc_kernel_begin_call() and calls accepted must be ended in the same way. If this function is successful, an opaque reference to the RxRPC call is returned. The caller now holds a reference on this and it must be properly ended. (*) Reject an incoming call. int rxrpc_kernel_reject_call(struct socket *sock); This is used to reject the first incoming call on the socket's queue with a BUSY message. -ENODATA is returned if there were no incoming calls. Other errors may be returned if the call had been aborted (-ECONNABORTED) or had timed out (-ETIME). (*) Record the delivery of a data message and free it. void rxrpc_kernel_data_delivered(struct sk_buff *skb); This is used to record a data message as having been delivered and to update the ACK state for the call. The socket buffer will be freed. (*) Free a message. void rxrpc_kernel_free_skb(struct sk_buff *skb); This is used to free a non-DATA socket buffer intercepted from an AF_RXRPC socket. (*) Determine if a data message is the last one on a call. bool rxrpc_kernel_is_data_last(struct sk_buff *skb); This is used to determine if a socket buffer holds the last data message to be received for a call (true will be returned if it does, false if not). The data message will be part of the reply on a client call and the request on an incoming call. In the latter case there will be more messages, but in the former case there will not. (*) Get the abort code from an abort message. u32 rxrpc_kernel_get_abort_code(struct sk_buff *skb); This is used to extract the abort code from a remote abort message. (*) Get the error number from a local or network error message. int rxrpc_kernel_get_error_number(struct sk_buff *skb); This is used to extract the error number from a message indicating either a local error occurred or a network error occurred. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
17926a79 |
|
26-Apr-2007 |
David Howells <dhowells@redhat.com> |
[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both Provide AF_RXRPC sockets that can be used to talk to AFS servers, or serve answers to AFS clients. KerberosIV security is fully supported. The patches and some example test programs can be found in: http://people.redhat.com/~dhowells/rxrpc/ This will eventually replace the old implementation of kernel-only RxRPC currently resident in net/rxrpc/. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|