History log of /linux-master/net/rxrpc/sendmsg.c
Revision Date Author Comments
# 89e43541 12-Mar-2024 David Howells <dhowells@redhat.com>

rxrpc: Fix error check on ->alloc_txbuf()

rxrpc_alloc_*_txbuf() and ->alloc_txbuf() return NULL to indicate no
memory, but rxrpc_send_data() uses IS_ERR().

Fix rxrpc_send_data() to check for NULL only and set -ENOMEM if it sees
that.

Fixes: 49489bb03a50 ("rxrpc: Do zerocopy using MSG_SPLICE_PAGES and page frags")
Signed-off-by: David Howells <dhowells@redhat.com>
Reported-by: Marc Dionne <marc.dionne@auristor.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>


# 153f90a0 30-Jan-2024 David Howells <dhowells@redhat.com>

rxrpc: Use ktimes for call timeout tracking and set the timer lazily

Track the call timeouts as ktimes rather than jiffies as the latter's
granularity is too high and only set the timer at the end of the event
handling function.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org


# 49489bb0 29-Jan-2024 David Howells <dhowells@redhat.com>

rxrpc: Do zerocopy using MSG_SPLICE_PAGES and page frags

Switch from keeping the transmission buffers in the rxrpc_txbuf struct and
allocated from the slab, to allocating them using page fragment allocators
(which uses raw pages), thereby allowing them to be passed to
MSG_SPLICE_PAGES and avoid copying into the UDP buffers.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org


# ff342bdc 29-Jan-2024 David Howells <dhowells@redhat.com>

rxrpc: Add a kvec[] to the rxrpc_txbuf struct

Add a kvec[] to the rxrpc_txbuf struct to point to the contributory buffers
for a packet. Start with just a single element for now, but this will be
expanded later.

Make the ACK sending function use it, which means that rxrpc_fill_out_ack()
doesn't need to return the size of the sack table, padding and trailer.

Make the data sending code use it, both in where sendmsg() packages code up
into txbufs and where those txbufs are transmitted.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org


# 12bdff73 29-Jan-2024 David Howells <dhowells@redhat.com>

rxrpc: Convert rxrpc_txbuf::flags into a mask and don't use atomics

Convert the transmission buffer flags into a mask and use | and & rather
than bitops functions (atomic ops are not required as only the I/O thread
can manipulate them once submitted for transmission).

The bottom byte can then correspond directly to the Rx protocol header
flags.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org


# 72904d7b 18-Oct-2023 David Howells <dhowells@redhat.com>

rxrpc, afs: Allow afs to pin rxrpc_peer objects

Change rxrpc's API such that:

(1) A new function, rxrpc_kernel_lookup_peer(), is provided to look up an
rxrpc_peer record for a remote address and a corresponding function,
rxrpc_kernel_put_peer(), is provided to dispose of it again.

(2) When setting up a call, the rxrpc_peer object used during a call is
now passed in rather than being set up by rxrpc_connect_call(). For
afs, this meenat passing it to rxrpc_kernel_begin_call() rather than
the full address (the service ID then has to be passed in as a
separate parameter).

(3) A new function, rxrpc_kernel_remote_addr(), is added so that afs can
get a pointer to the transport address for display purposed, and
another, rxrpc_kernel_remote_srx(), to gain a pointer to the full
rxrpc address.

(4) The function to retrieve the RTT from a call, rxrpc_kernel_get_srtt(),
is then altered to take a peer. This now returns the RTT or -1 if
there are insufficient samples.

(5) Rename rxrpc_kernel_get_peer() to rxrpc_kernel_call_get_peer().

(6) Provide a new function, rxrpc_kernel_get_peer(), to get a ref on a
peer the caller already has.

This allows the afs filesystem to pin the rxrpc_peer records that it is
using, allowing faster lookups and pointer comparisons rather than
comparing sockaddr_rxrpc contents. It also makes it easier to get hold of
the RTT. The following changes are made to afs:

(1) The addr_list struct's addrs[] elements now hold a peer struct pointer
and a service ID rather than a sockaddr_rxrpc.

(2) When displaying the transport address, rxrpc_kernel_remote_addr() is
used.

(3) The port arg is removed from afs_alloc_addrlist() since it's always
overridden.

(4) afs_merge_fs_addr4() and afs_merge_fs_addr6() do peer lookup and may
now return an error that must be handled.

(5) afs_find_server() now takes a peer pointer to specify the address.

(6) afs_find_server(), afs_compare_fs_alists() and afs_merge_fs_addr[46]{}
now do peer pointer comparison rather than address comparison.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org


# db099c62 28-Apr-2023 David Howells <dhowells@redhat.com>

rxrpc: Fix timeout of a call that hasn't yet been granted a channel

afs_make_call() calls rxrpc_kernel_begin_call() to begin a call (which may
get stalled in the background waiting for a connection to become
available); it then calls rxrpc_kernel_set_max_life() to set the timeouts -
but that starts the call timer so the call timer might then expire before
we get a connection assigned - leading to the following oops if the call
stalled:

BUG: kernel NULL pointer dereference, address: 0000000000000000
...
CPU: 1 PID: 5111 Comm: krxrpcio/0 Not tainted 6.3.0-rc7-build3+ #701
RIP: 0010:rxrpc_alloc_txbuf+0xc0/0x157
...
Call Trace:
<TASK>
rxrpc_send_ACK+0x50/0x13b
rxrpc_input_call_event+0x16a/0x67d
rxrpc_io_thread+0x1b6/0x45f
? _raw_spin_unlock_irqrestore+0x1f/0x35
? rxrpc_input_packet+0x519/0x519
kthread+0xe7/0xef
? kthread_complete_and_exit+0x1b/0x1b
ret_from_fork+0x22/0x30

Fix this by noting the timeouts in struct rxrpc_call when the call is
created. The timer will be started when the first packet is transmitted.

It shouldn't be possible to trigger this directly from userspace through
AF_RXRPC as sendmsg() will return EBUSY if the call is in the
waiting-for-conn state if it dropped out of the wait due to a signal.

Fixes: 9d35d880e0e4 ("rxrpc: Move client call connection to the I/O thread")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0eb362d2 28-Apr-2023 David Howells <dhowells@redhat.com>

rxrpc: Make it so that a waiting process can be aborted

When sendmsg() creates an rxrpc call, it queues it to wait for a connection
and channel to be assigned and then waits before it can start shovelling
data as the encrypted DATA packet content includes a summary of the
connection parameters.

However, sendmsg() may get interrupted before a connection gets assigned
and further sendmsg() calls will fail with EBUSY until an assignment is
made.

Fix this so that the call can at least be aborted without failing on
EBUSY. We have to be careful here as sendmsg() mustn't be allowed to start
the call timer if the call doesn't yet have a connection assigned as an
oops may follow shortly thereafter.

Fixes: 540b1c48c37a ("rxrpc: Fix deadlock between call creation and sendmsg/recvmsg")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0d098d83 28-Apr-2023 David Howells <dhowells@redhat.com>

rxrpc: Fix hard call timeout units

The hard call timeout is specified in the RXRPC_SET_CALL_TIMEOUT cmsg in
seconds, so fix the point at which sendmsg() applies it to the call to
convert to jiffies from seconds, not milliseconds.

Fixes: a158bdd3247b ("rxrpc: Fix timeout of a call that hasn't yet been granted a channel")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2b5fdc0f 25-Apr-2023 David Howells <dhowells@redhat.com>

rxrpc: Fix potential data race in rxrpc_wait_to_be_connected()

Inside the loop in rxrpc_wait_to_be_connected() it checks call->error to
see if it should exit the loop without first checking the call state. This
is probably safe as if call->error is set, the call is dead anyway, but we
should probably wait for the call state to have been set to completion
first, lest it cause surprise on the way out.

Fix this by only accessing call->error if the call is complete. We don't
actually need to access the error inside the loop as we'll do that after.

This caused the following report:

BUG: KCSAN: data-race in rxrpc_send_data / rxrpc_set_call_completion

write to 0xffff888159cf3c50 of 4 bytes by task 25673 on cpu 1:
rxrpc_set_call_completion+0x71/0x1c0 net/rxrpc/call_state.c:22
rxrpc_send_data_packet+0xba9/0x1650 net/rxrpc/output.c:479
rxrpc_transmit_one+0x1e/0x130 net/rxrpc/output.c:714
rxrpc_decant_prepared_tx net/rxrpc/call_event.c:326 [inline]
rxrpc_transmit_some_data+0x496/0x600 net/rxrpc/call_event.c:350
rxrpc_input_call_event+0x564/0x1220 net/rxrpc/call_event.c:464
rxrpc_io_thread+0x307/0x1d80 net/rxrpc/io_thread.c:461
kthread+0x1ac/0x1e0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308

read to 0xffff888159cf3c50 of 4 bytes by task 25672 on cpu 0:
rxrpc_send_data+0x29e/0x1950 net/rxrpc/sendmsg.c:296
rxrpc_do_sendmsg+0xb7a/0xc20 net/rxrpc/sendmsg.c:726
rxrpc_sendmsg+0x413/0x520 net/rxrpc/af_rxrpc.c:565
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg net/socket.c:747 [inline]
____sys_sendmsg+0x375/0x4c0 net/socket.c:2501
___sys_sendmsg net/socket.c:2555 [inline]
__sys_sendmmsg+0x263/0x500 net/socket.c:2641
__do_sys_sendmmsg net/socket.c:2670 [inline]
__se_sys_sendmmsg net/socket.c:2667 [inline]
__x64_sys_sendmmsg+0x57/0x60 net/socket.c:2667
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

value changed: 0x00000000 -> 0xffffffea

Fixes: 9d35d880e0e4 ("rxrpc: Move client call connection to the I/O thread")
Reported-by: syzbot+ebc945fdb4acd72cba78@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/000000000000e7c6d205fa10a3cd@google.com/
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Dmitry Vyukov <dvyukov@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
cc: netdev@vger.kernel.org
Link: https://lore.kernel.org/r/508133.1682427395@warthog.procyon.org.uk
Signed-off-by: Paolo Abeni <pabeni@redhat.com>


# 9d35d880 19-Oct-2022 David Howells <dhowells@redhat.com>

rxrpc: Move client call connection to the I/O thread

Move the connection setup of client calls to the I/O thread so that a whole
load of locking and barrierage can be eliminated. This necessitates the
app thread waiting for connection to complete before it can begin
encrypting data.

This also completes the fix for a race that exists between call connection
and call disconnection whereby the data transmission code adds the call to
the peer error distribution list after the call has been disconnected (say
by the rxrpc socket getting closed).

The fix is to complete the process of moving call connection, data
transmission and call disconnection into the I/O thread and thus forcibly
serialising them.

Note that the issue may predate the overhaul to an I/O thread model that
were included in the merge window for v6.2, but the timing is very much
changed by the change given below.

Fixes: cf37b5987508 ("rxrpc: Move DATA transmission into call processor work item")
Reported-by: syzbot+c22650d2844392afdcfd@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org


# 96b4059f 27-Oct-2022 David Howells <dhowells@redhat.com>

rxrpc: Remove call->state_lock

All the setters of call->state are now in the I/O thread and thus the state
lock is now unnecessary.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org


# 2d689424 11-Nov-2022 David Howells <dhowells@redhat.com>

rxrpc: Move call state changes from sendmsg to I/O thread

Move all the call state changes that are made in rxrpc_sendmsg() to the I/O
thread. This is a step towards removing the call state lock.

This requires the switch to the RXRPC_CALL_CLIENT_AWAIT_REPLY and
RXRPC_CALL_SERVER_SEND_REPLY states to be done when the last packet is
decanted from ->tx_sendmsg to ->tx_buffer in the I/O thread, not when it is
added to ->tx_sendmsg by sendmsg().

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org


# d41b3f5b 19-Dec-2022 David Howells <dhowells@redhat.com>

rxrpc: Wrap accesses to get call state to put the barrier in one place

Wrap accesses to get the state of a call from outside of the I/O thread in
a single place so that the barrier needed to order wrt the error code and
abort code is in just that place.

Also use a barrier when setting the call state and again when reading the
call state such that the auxiliary completion info (error code, abort code)
can be read without taking a read lock on the call state lock.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org


# 1bab27af 21-Oct-2022 David Howells <dhowells@redhat.com>

rxrpc: Set up a connection bundle from a call, not rxrpc_conn_parameters

Use the information now stored in struct rxrpc_call to configure the
connection bundle and thence the connection, rather than using the
rxrpc_conn_parameters struct.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org


# 57af281e 06-Oct-2022 David Howells <dhowells@redhat.com>

rxrpc: Tidy up abort generation infrastructure

Tidy up the abort generation infrastructure in the following ways:

(1) Create an enum and string mapping table to list the reasons an abort
might be generated in tracing.

(2) Replace the 3-char string with the values from (1) in the places that
use that to log the abort source. This gets rid of a memcpy() in the
tracepoint.

(3) Subsume the rxrpc_rx_eproto tracepoint with the rxrpc_abort tracepoint
and use values from (1) to indicate the trace reason.

(4) Always make a call to an abort function at the point of the abort
rather than stashing the values into variables and using goto to get
to a place where it reported. The C optimiser will collapse the calls
together as appropriate. The abort functions return a value that can
be returned directly if appropriate.

Note that this extends into afs also at the points where that generates an
abort. To aid with this, the afs sources need to #define
RXRPC_TRACE_ONLY_DEFINE_ENUMS before including the rxrpc tracing header
because they don't have access to the rxrpc internal structures that some
of the tracepoints make use of.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org


# a343b174 12-Oct-2022 David Howells <dhowells@redhat.com>

rxrpc: Only set/transmit aborts in the I/O thread

Only set the abort call completion state in the I/O thread and only
transmit ABORT packets from there. rxrpc_abort_call() can then be made to
actually send the packet.

Further, ABORT packets should only be sent if the call has been exposed to
the network (ie. at least one attempted DATA transmission has occurred for
it).

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org


# 4feb2c44 15-Dec-2022 David Howells <dhowells@redhat.com>

rxrpc: Fix missing unlock in rxrpc_do_sendmsg()

One of the error paths in rxrpc_do_sendmsg() doesn't unlock the call mutex
before returning. Fix it to do this.

Note that this still doesn't get rid of the checker warning:

../net/rxrpc/sendmsg.c:617:5: warning: context imbalance in 'rxrpc_do_sendmsg' - wrong count at exit

I think the interplay between the socket lock and the call's user_mutex may
be too complicated for checker to analyse, especially as
rxrpc_new_client_call_for_sendmsg(), which it calls, returns with the
call's user_mutex if successful but unconditionally drops the socket lock.

Fixes: e754eba685aa ("rxrpc: Provide a cmsg to specify the amount of Tx data for a call")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: David S. Miller <davem@davemloft.net>


# b0346843 30-Jan-2020 David Howells <dhowells@redhat.com>

rxrpc: Transmit ACKs at the point of generation

For ACKs generated inside the I/O thread, transmit the ACK at the point of
generation. Where the ACK is generated outside of the I/O thread, it's
offloaded to the I/O thread to transmit it.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org


# 3dd9c8b5 24-Jan-2020 David Howells <dhowells@redhat.com>

rxrpc: Remove the _bh annotation from all the spinlocks

None of the spinlocks in rxrpc need a _bh annotation now as the RCU
callback routines no longer take spinlocks and the bulk of the packet
wrangling code is now run in the I/O thread, not softirq context.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org


# 5e6ef4f1 23-Jan-2020 David Howells <dhowells@redhat.com>

rxrpc: Make the I/O thread take over the call and local processor work

Move the functions from the call->processor and local->processor work items
into the domain of the I/O thread.

The call event processor, now called from the I/O thread, then takes over
the job of cranking the call state machine, processing incoming packets and
transmitting DATA, ACK and ABORT packets. In a future patch,
rxrpc_send_ACK() will transmit the ACK on the spot rather than queuing it
for later transmission.

The call event processor becomes purely received-skb driven. It only
transmits things in response to events. We use "pokes" to queue a dummy
skb to make it do things like start/resume transmitting data. Timer expiry
also results in pokes.

The connection event processor, becomes similar, though crypto events, such
as dealing with CHALLENGE and RESPONSE packets is offloaded to a work item
to avoid doing crypto in the I/O thread.

The local event processor is removed and VERSION response packets are
generated directly from the packet parser. Similarly, ABORTs generated in
response to protocol errors will be transmitted immediately rather than
being pushed onto a queue for later transmission.

Changes:
========
ver #2)
- Fix a couple of introduced lock context imbalances.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org


# cf37b598 31-Mar-2022 David Howells <dhowells@redhat.com>

rxrpc: Move DATA transmission into call processor work item

Move DATA transmission into the call processor work item. In a future
patch, this will be called from the I/O thread rather than being itsown
work item.

This will allow DATA transmission to be driven directly by incoming ACKs,
pokes and timers as those are processed.

The Tx queue is also split: The queue of packets prepared by sendmsg is now
places in call->tx_sendmsg and the packet dispatcher decants the packets
into call->tx_buffer as space becomes available in the transmission
window. This allows sendmsg to run ahead of the available space to try and
prevent an underflow in transmission.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org


# cb0fc0c9 21-Oct-2022 David Howells <dhowells@redhat.com>

rxrpc: trace: Don't use __builtin_return_address for rxrpc_call tracing

In rxrpc tracing, use enums to generate lists of points of interest rather
than __builtin_return_address() for the rxrpc_call tracepoint

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org


# 47c810a7 21-Oct-2022 David Howells <dhowells@redhat.com>

rxrpc: trace: Don't use __builtin_return_address for rxrpc_peer tracing

In rxrpc tracing, use enums to generate lists of points of interest rather
than __builtin_return_address() for the rxrpc_peer tracepoint

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org


# 1fc4fa2a 03-Oct-2022 David Howells <dhowells@redhat.com>

rxrpc: Fix congestion management

rxrpc has a problem in its congestion management in that it saves the
congestion window size (cwnd) from one call to another, but if this is 0 at
the time is saved, then the next call may not actually manage to ever
transmit anything.

To this end:

(1) Don't save cwnd between calls, but rather reset back down to the
initial cwnd and re-enter slow-start if data transmission is idle for
more than an RTT.

(2) Preserve ssthresh instead, as that is a handy estimate of pipe
capacity. Knowing roughly when to stop slow start and enter
congestion avoidance can reduce the tendency to overshoot and drop
larger amounts of packets when probing.

In future, cwind growth also needs to be constrained when the window isn't
being filled due to being application limited.

Reported-by: Simon Wilkinson <sxw@auristor.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org


# d57a3a15 07-May-2022 David Howells <dhowells@redhat.com>

rxrpc: Save last ACK's SACK table rather than marking txbufs

Improve the tracking of which packets need to be transmitted by saving the
last ACK packet that we receive that has a populated soft-ACK table rather
than marking packets. Then we can step through the soft-ACK table and look
at the packets we've transmitted beyond that to determine which packets we
might want to retransmit.

We also look at the highest serial number that has been acked to try and
guess which packets we've transmitted the peer is likely to have seen. If
necessary, we send a ping to retrieve that number.

One downside that might be a problem is that we can't then compare the
previous acked/unacked state so easily in rxrpc_input_soft_acks() - which
is a potential problem for the slow-start algorithm.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org


# a4ea4c47 31-Mar-2022 David Howells <dhowells@redhat.com>

rxrpc: Don't use a ring buffer for call Tx queue

Change the way the Tx queueing works to make the following ends easier to
achieve:

(1) The filling of packets, the encryption of packets and the transmission
of packets can be handled in parallel by separate threads, rather than
rxrpc_sendmsg() allocating, filling, encrypting and transmitting each
packet before moving onto the next one.

(2) Get rid of the fixed-size ring which sets a hard limit on the number
of packets that can be retained in the ring. This allows the number
of packets to increase without having to allocate a very large ring or
having variable-sized rings.

[Note: the downside of this is that it's then less efficient to locate
a packet for retransmission as we then have to step through a list and
examine each buffer in the list.]

(3) Allow the filler/encrypter to run ahead of the transmission window.

(4) Make it easier to do zero copy UDP from the packet buffers.

(5) Make it easier to do zero copy from userspace to the packet buffers -
and thence to UDP (only if for unauthenticated connections).

To that end, the following changes are made:

(1) Use the new rxrpc_txbuf struct instead of sk_buff for keeping packets
to be transmitted in. This allows them to be placed on multiple
queues simultaneously. An sk_buff isn't really necessary as it's
never passed on to lower-level networking code.

(2) Keep the transmissable packets in a linked list on the call struct
rather than in a ring. As a consequence, the annotation buffer isn't
used either; rather a flag is set on the packet to indicate ackedness.

(3) Use the RXRPC_CALL_TX_LAST flag to indicate that the last packet to be
transmitted has been queued. Add RXRPC_CALL_TX_ALL_ACKED to indicate
that all packets up to and including the last got hard acked.

(4) Wire headers are now stored in the txbuf rather than being concocted
on the stack and they're stored immediately before the data, thereby
allowing zerocopy of a single span.

(5) Don't bother with instant-resend on transmission failure; rather,
leave it for a timer or an ACK packet to trigger.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org


# 530403d9 30-Jan-2020 David Howells <dhowells@redhat.com>

rxrpc: Clean up ACK handling

Clean up the rxrpc_propose_ACK() function. If deferred PING ACK proposal
is split out, it's only really needed for deferred DELAY ACKs. All other
ACKs, bar terminal IDLE ACK are sent immediately. The deferred IDLE ACK
submission can be handled by conversion of a DELAY ACK into an IDLE ACK if
there's nothing to be SACK'd.

Also, because there's a delay between an ACK being generated and being
transmitted, it's possible that other ACKs of the same type will be
generated during that interval. Apart from the ACK time and the serial
number responded to, most of the ACK body, including window and SACK
parameters, are not filled out till the point of transmission - so we can
avoid generating a new ACK if there's one pending that will cover the SACK
data we need to convey.

Therefore, don't propose a new DELAY or IDLE ACK for a call if there's one
already pending.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org


# 72f0c6fb 30-Jan-2020 David Howells <dhowells@redhat.com>

rxrpc: Allocate ACK records at proposal and queue for transmission

Allocate rxrpc_txbuf records for ACKs and put onto a queue for the
transmitter thread to dispatch.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org


# 27f699cc 07-Oct-2022 David Howells <dhowells@redhat.com>

rxrpc: Remove the flags from the rxrpc_skb tracepoint

Remove the flags from the rxrpc_skb tracepoint as we're no longer going to
be using this for the transmission buffers and so marking which are
transmission buffers isn't going to be necessary.

Note that this also remove the rxrpc skb flag that indicates if this is a
transmission buffer and so the count is not updated for the moment.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org


# b0154246 11-May-2022 David Howells <dhowells@redhat.com>

rxrpc: Add stats procfile and DATA packet stats

Add a procfile, /proc/net/rxrpc/stats, to display some statistics about
what rxrpc has been doing. Writing a blank line to the stats file will
clear the increment-only counters. Allocated resource counters don't get
cleared.

Add some counters to count various things about DATA packets, including the
number created, transmitted and retransmitted and the number received, the
number of ACK-requests markings and the number of jumbo packets received.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org


# b0f571ec 24-Aug-2022 David Howells <dhowells@redhat.com>

rxrpc: Fix locking in rxrpc's sendmsg

Fix three bugs in the rxrpc's sendmsg implementation:

(1) rxrpc_new_client_call() should release the socket lock when returning
an error from rxrpc_get_call_slot().

(2) rxrpc_wait_for_tx_window_intr() will return without the call mutex
held in the event that we're interrupted by a signal whilst waiting
for tx space on the socket or relocking the call mutex afterwards.

Fix this by: (a) moving the unlock/lock of the call mutex up to
rxrpc_send_data() such that the lock is not held around all of
rxrpc_wait_for_tx_window*() and (b) indicating to higher callers
whether we're return with the lock dropped. Note that this means
recvmsg() will not block on this call whilst we're waiting.

(3) After dropping and regaining the call mutex, rxrpc_send_data() needs
to go and recheck the state of the tx_pending buffer and the
tx_total_len check in case we raced with another sendmsg() on the same
call.

Thinking on this some more, it might make sense to have different locks for
sendmsg() and recvmsg(). There's probably no need to make recvmsg() wait
for sendmsg(). It does mean that recvmsg() can return MSG_EOR indicating
that a call is dead before a sendmsg() to that call returns - but that can
currently happen anyway.

Without fix (2), something like the following can be induced:

WARNING: bad unlock balance detected!
5.16.0-rc6-syzkaller #0 Not tainted
-------------------------------------
syz-executor011/3597 is trying to release lock (&call->user_mutex) at:
[<ffffffff885163a3>] rxrpc_do_sendmsg+0xc13/0x1350 net/rxrpc/sendmsg.c:748
but there are no more locks to release!

other info that might help us debug this:
no locks held by syz-executor011/3597.
...
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_unlock_imbalance_bug include/trace/events/lock.h:58 [inline]
__lock_release kernel/locking/lockdep.c:5306 [inline]
lock_release.cold+0x49/0x4e kernel/locking/lockdep.c:5657
__mutex_unlock_slowpath+0x99/0x5e0 kernel/locking/mutex.c:900
rxrpc_do_sendmsg+0xc13/0x1350 net/rxrpc/sendmsg.c:748
rxrpc_sendmsg+0x420/0x630 net/rxrpc/af_rxrpc.c:561
sock_sendmsg_nosec net/socket.c:704 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:724
____sys_sendmsg+0x6e8/0x810 net/socket.c:2409
___sys_sendmsg+0xf3/0x170 net/socket.c:2463
__sys_sendmsg+0xe5/0x1b0 net/socket.c:2492
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae

[Thanks to Hawkins Jiawei and Khalid Masum for their attempts to fix this]

Fixes: bc5e3a546d55 ("rxrpc: Use MSG_WAITALL to tell sendmsg() to temporarily ignore signals")
Reported-by: syzbot+7f0483225d0c94cb3441@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
Tested-by: syzbot+7f0483225d0c94cb3441@syzkaller.appspotmail.com
cc: Hawkins Jiawei <yin31149@gmail.com>
cc: Khalid Masum <khalid.masum.92@gmail.com>
cc: Dan Carpenter <dan.carpenter@oracle.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/166135894583.600315.7170979436768124075.stgit@warthog.procyon.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 4ba68c51 21-May-2022 David Howells <dhowells@redhat.com>

rxrpc: Return an error to sendmsg if call failed

If at the end of rxrpc sendmsg() or rxrpc_kernel_send_data() the call that
was being given data was aborted remotely or otherwise failed, return an
error rather than returning the amount of data buffered for transmission.

The call (presumably) did not complete, so there's not much point
continuing with it. AF_RXRPC considers it "complete" and so will be
unwilling to do anything else with it - and won't send a notification for
it, deeming the return from sendmsg sufficient.

Not returning an error causes afs to incorrectly handle a StoreData
operation that gets interrupted by a change of address due to NAT
reconfiguration.

This doesn't normally affect most operations since their request parameters
tend to fit into a single UDP packet and afs_make_call() returns before the
server responds; StoreData is different as it involves transmission of a
lot of data.

This can be triggered on a client by doing something like:

dd if=/dev/zero of=/afs/example.com/foo bs=1M count=512

at one prompt, and then changing the network address at another prompt,
e.g.:

ifconfig enp6s0 inet 192.168.6.2 && route add 192.168.6.1 dev enp6s0

Tracing packets on an Auristor fileserver looks something like:

192.168.6.1 -> 192.168.6.3 RX 107 ACK Idle Seq: 0 Call: 4 Source Port: 7000 Destination Port: 7001
192.168.6.3 -> 192.168.6.1 AFS (RX) 1482 FS Request: Unknown(64538) (64538)
192.168.6.3 -> 192.168.6.1 AFS (RX) 1482 FS Request: Unknown(64538) (64538)
192.168.6.1 -> 192.168.6.3 RX 107 ACK Idle Seq: 0 Call: 4 Source Port: 7000 Destination Port: 7001
<ARP exchange for 192.168.6.2>
192.168.6.2 -> 192.168.6.1 AFS (RX) 1482 FS Request: Unknown(0) (0)
192.168.6.2 -> 192.168.6.1 AFS (RX) 1482 FS Request: Unknown(0) (0)
192.168.6.1 -> 192.168.6.2 RX 107 ACK Exceeds Window Seq: 0 Call: 4 Source Port: 7000 Destination Port: 7001
192.168.6.1 -> 192.168.6.2 RX 74 ABORT Seq: 0 Call: 4 Source Port: 7000 Destination Port: 7001
192.168.6.1 -> 192.168.6.2 RX 74 ABORT Seq: 29321 Call: 4 Source Port: 7000 Destination Port: 7001

The Auristor fileserver logs code -453 (RXGEN_SS_UNMARSHAL), but the abort
code received by kafs is -5 (RX_PROTOCOL_ERROR) as the rx layer sees the
condition and generates an abort first and the unmarshal error is a
consequence of that at the application layer.

Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-afs@lists.infradead.org
Link: http://lists.infradead.org/pipermail/linux-afs/2021-December/004810.html # v1
Signed-off-by: David S. Miller <davem@davemloft.net>


# d7d775b1 15-Sep-2020 David Howells <dhowells@redhat.com>

rxrpc: Ask the security class how much space to allow in a packet

Ask the security class how much header and trailer space to allow for when
allocating a packet, given how much data is remaining.

This will allow the rxgk security class to stick both a trailer in as well
as a header as appropriate in the future.

Signed-off-by: David Howells <dhowells@redhat.com>


# f4bdf3d6 17-Sep-2020 David Howells <dhowells@redhat.com>

rxrpc: Don't reserve security header in Tx DATA skbuff

Insert the security header into the skbuff representing a DATA packet to be
transmitted rather than using skb_reserve() when the packet is allocated.
This makes it easier to apply crypto that spans the security header and the
data, particularly in the upcoming RxGK class where we have a common
encrypt-and-checksum function that is used in a number of circumstances.

Signed-off-by: David Howells <dhowells@redhat.com>


# 2d914c1b 30-Sep-2020 David Howells <dhowells@redhat.com>

rxrpc: Fix accept on a connection that need securing

When a new incoming call arrives at an userspace rxrpc socket on a new
connection that has a security class set, the code currently pushes it onto
the accept queue to hold a ref on it for the socket. This doesn't work,
however, as recvmsg() pops it off, notices that it's in the SERVER_SECURING
state and discards the ref. This means that the call runs out of refs too
early and the kernel oopses.

By contrast, a kernel rxrpc socket manually pre-charges the incoming call
pool with calls that already have user call IDs assigned, so they are ref'd
by the call tree on the socket.

Change the mode of operation for userspace rxrpc server sockets to work
like this too. Although this is a UAPI change, server sockets aren't
currently functional.

Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code")
Signed-off-by: David Howells <dhowells@redhat.com>


# df561f66 23-Aug-2020 Gustavo A. R. Silva <gustavoars@kernel.org>

treewide: Use fallthrough pseudo-keyword

Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>


# 65550098 28-Jul-2020 David Howells <dhowells@redhat.com>

rxrpc: Fix race between recvmsg and sendmsg on immediate call failure

There's a race between rxrpc_sendmsg setting up a call, but then failing to
send anything on it due to an error, and recvmsg() seeing the call
completion occur and trying to return the state to the user.

An assertion fails in rxrpc_recvmsg() because the call has already been
released from the socket and is about to be released again as recvmsg deals
with it. (The recvmsg_q queue on the socket holds a ref, so there's no
problem with use-after-free.)

We also have to be careful not to end up reporting an error twice, in such
a way that both returns indicate to userspace that the user ID supplied
with the call is no longer in use - which could cause the client to
malfunction if it recycles the user ID fast enough.

Fix this by the following means:

(1) When sendmsg() creates a call after the point that the call has been
successfully added to the socket, don't return any errors through
sendmsg(), but rather complete the call and let recvmsg() retrieve
them. Make sendmsg() return 0 at this point. Further calls to
sendmsg() for that call will fail with ESHUTDOWN.

Note that at this point, we haven't send any packets yet, so the
server doesn't yet know about the call.

(2) If sendmsg() returns an error when it was expected to create a new
call, it means that the user ID wasn't used.

(3) Mark the call disconnected before marking it completed to prevent an
oops in rxrpc_release_call().

(4) recvmsg() will then retrieve the error and set MSG_EOR to indicate
that the user ID is no longer known by the kernel.

An oops like the following is produced:

kernel BUG at net/rxrpc/recvmsg.c:605!
...
RIP: 0010:rxrpc_recvmsg+0x256/0x5ae
...
Call Trace:
? __init_waitqueue_head+0x2f/0x2f
____sys_recvmsg+0x8a/0x148
? import_iovec+0x69/0x9c
? copy_msghdr_from_user+0x5c/0x86
___sys_recvmsg+0x72/0xaa
? __fget_files+0x22/0x57
? __fget_light+0x46/0x51
? fdget+0x9/0x1b
do_recvmmsg+0x15e/0x232
? _raw_spin_unlock+0xa/0xb
? vtime_delta+0xf/0x25
__x64_sys_recvmmsg+0x2c/0x2f
do_syscall_64+0x4c/0x78
entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 357f5ef64628 ("rxrpc: Call rxrpc_release_call() on error in rxrpc_new_client_call()")
Reported-by: syzbot+b54969381df354936d96@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 639f181f 19-Jul-2020 David Howells <dhowells@redhat.com>

rxrpc: Fix sendmsg() returning EPIPE due to recvmsg() returning ENODATA

rxrpc_sendmsg() returns EPIPE if there's an outstanding error, such as if
rxrpc_recvmsg() indicating ENODATA if there's nothing for it to read.

Change rxrpc_recvmsg() to return EAGAIN instead if there's nothing to read
as this particular error doesn't get stored in ->sk_err by the networking
core.

Also change rxrpc_sendmsg() so that it doesn't fail with delayed receive
errors (there's no way for it to report which call, if any, the error was
caused by).

Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5ac0d622 03-Jun-2020 David Howells <dhowells@redhat.com>

rxrpc: Fix missing notification

Under some circumstances, rxrpc will fail a transmit a packet through the
underlying UDP socket (ie. UDP sendmsg returns an error). This may result
in a call getting stuck.

In the instance being seen, where AFS tries to send a probe to the Volume
Location server, tracepoints show the UDP Tx failure (in this case returing
error 99 EADDRNOTAVAIL) and then nothing more:

afs_make_vl_call: c=0000015d VL.GetCapabilities
rxrpc_call: c=0000015d NWc u=1 sp=rxrpc_kernel_begin_call+0x106/0x170 [rxrpc] a=00000000dd89ee8a
rxrpc_call: c=0000015d Gus u=2 sp=rxrpc_new_client_call+0x14f/0x580 [rxrpc] a=00000000e20e4b08
rxrpc_call: c=0000015d SEE u=2 sp=rxrpc_activate_one_channel+0x7b/0x1c0 [rxrpc] a=00000000e20e4b08
rxrpc_call: c=0000015d CON u=2 sp=rxrpc_kernel_begin_call+0x106/0x170 [rxrpc] a=00000000e20e4b08
rxrpc_tx_fail: c=0000015d r=1 ret=-99 CallDataNofrag

The problem is that if the initial packet fails and the retransmission
timer hasn't been started, the call is set to completed and an error is
returned from rxrpc_send_data_packet() to rxrpc_queue_packet(). Though
rxrpc_instant_resend() is called, this does nothing because the call is
marked completed.

So rxrpc_notify_socket() isn't called and the error is passed back up to
rxrpc_send_data(), rxrpc_kernel_send_data() and thence to afs_make_call()
and afs_vl_get_capabilities() where it is simply ignored because it is
assumed that the result of a probe will be collected asynchronously.

Fileserver probing is similarly affected via afs_fs_get_capabilities().

Fix this by always issuing a notification in __rxrpc_set_call_completion()
if it shifts a call to the completed state, even if an error is also
returned to the caller through the function return value.

Also put in a little bit of optimisation to avoid taking the call
state_lock and disabling softirqs if the call is already in the completed
state and remove some now redundant rxrpc_notify_socket() calls.

Fixes: f5c17aaeb2ae ("rxrpc: Calls should only have one terminal state")
Reported-by: Gerry Seidman <gerry@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>


# 3067bf8c 03-Jun-2020 David Howells <dhowells@redhat.com>

rxrpc: Move the call completion handling out of line

Move the handling of call completion out of line so that the next patch can
add more code in that area.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>


# c410bf01 11-May-2020 David Howells <dhowells@redhat.com>

rxrpc: Fix the excessive initial retransmission timeout

rxrpc currently uses a fixed 4s retransmission timeout until the RTT is
sufficiently sampled. This can cause problems with some fileservers with
calls to the cache manager in the afs filesystem being dropped from the
fileserver because a packet goes missing and the retransmission timeout is
greater than the call expiry timeout.

Fix this by:

(1) Copying the RTT/RTO calculation code from Linux's TCP implementation
and altering it to fit rxrpc.

(2) Altering the various users of the RTT to make use of the new SRTT
value.

(3) Replacing the use of rxrpc_resend_timeout to use the calculated RTO
value instead (which is needed in jiffies), along with a backoff.

Notes:

(1) rxrpc provides RTT samples by matching the serial numbers on outgoing
DATA packets that have the RXRPC_REQUEST_ACK set and PING ACK packets
against the reference serial number in incoming REQUESTED ACK and
PING-RESPONSE ACK packets.

(2) Each packet that is transmitted on an rxrpc connection gets a new
per-connection serial number, even for retransmissions, so an ACK can
be cross-referenced to a specific trigger packet. This allows RTT
information to be drawn from retransmitted DATA packets also.

(3) rxrpc maintains the RTT/RTO state on the rxrpc_peer record rather than
on an rxrpc_call because many RPC calls won't live long enough to
generate more than one sample.

(4) The calculated SRTT value is in units of 8ths of a microsecond rather
than nanoseconds.

The (S)RTT and RTO values are displayed in /proc/net/rxrpc/peers.

Fixes: 17926a79320a ([AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both"")
Signed-off-by: David Howells <dhowells@redhat.com>


# 498b5776 13-Mar-2020 David Howells <dhowells@redhat.com>

rxrpc: Fix sendmsg(MSG_WAITALL) handling

Fix the handling of sendmsg() with MSG_WAITALL for userspace to round the
timeout for when a signal occurs up to at least two jiffies as a 1 jiffy
timeout may end up being effectively 0 if jiffies wraps at the wrong time.

Fixes: bc5e3a546d55 ("rxrpc: Use MSG_WAITALL to tell sendmsg() to temporarily ignore signals")
Signed-off-by: David Howells <dhowells@redhat.com>


# e138aa7d 13-Mar-2020 David Howells <dhowells@redhat.com>

rxrpc: Fix call interruptibility handling

Fix the interruptibility of kernel-initiated client calls so that they're
either only interruptible when they're waiting for a call slot to come
available or they're not interruptible at all. Either way, they're not
interruptible during transmission.

This should help prevent StoreData calls from being interrupted when
writeback is in progress. It doesn't, however, handle interruption during
the receive phase.

Userspace-initiated calls are still interruptable. After the signal has
been handled, sendmsg() will return the amount of data copied out of the
buffer and userspace can perform another sendmsg() call to continue
transmission.

Fixes: bc5e3a546d55 ("rxrpc: Use MSG_WAITALL to tell sendmsg() to temporarily ignore signals")
Signed-off-by: David Howells <dhowells@redhat.com>


# 158fe666 13-Mar-2020 David Howells <dhowells@redhat.com>

rxrpc: Abstract out the calculation of whether there's Tx space

Abstract out the calculation of there being sufficient Tx buffer space.
This is reproduced several times in the rxrpc sendmsg code.

Signed-off-by: David Howells <dhowells@redhat.com>


# 91fcfbe8 07-Oct-2019 David Howells <dhowells@redhat.com>

rxrpc: Fix call crypto state cleanup

Fix the cleanup of the crypto state on a call after the call has been
disconnected. As the call has been disconnected, its connection ref has
been discarded and so we can't go through that to get to the security ops
table.

Fix this by caching the security ops pointer in the rxrpc_call struct and
using that when freeing the call security state. Also use this in other
places we're dealing with call-specific security.

The symptoms look like:

BUG: KASAN: use-after-free in rxrpc_release_call+0xb2d/0xb60
net/rxrpc/call_object.c:481
Read of size 8 at addr ffff888062ffeb50 by task syz-executor.5/4764

Fixes: 1db88c534371 ("rxrpc: Fix -Wframe-larger-than= warnings from on-stack crypto")
Reported-by: syzbot+eed305768ece6682bb7f@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>


# c48fc11b 07-Oct-2019 David Howells <dhowells@redhat.com>

rxrpc: Fix call ref leak

When sendmsg() finds a call to continue on with, if the call is in an
inappropriate state, it doesn't release the ref it just got on that call
before returning an error.

This causes the following symptom to show up with kasan:

BUG: KASAN: use-after-free in rxrpc_send_keepalive+0x8a2/0x940
net/rxrpc/output.c:635
Read of size 8 at addr ffff888064219698 by task kworker/0:3/11077

where line 635 is:

whdr.epoch = htonl(peer->local->rxnet->epoch);

The local endpoint (which cannot be pinned by the call) has been released,
but not the peer (which is pinned by the call).

Fix this by releasing the call in the error path.

Fixes: 37411cad633f ("rxrpc: Fix potential NULL-pointer exception")
Reported-by: syzbot+d850c266e3df14da1d31@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>


# 987db9f7 19-Aug-2019 David Howells <dhowells@redhat.com>

rxrpc: Use the tx-phase skb flag to simplify tracing

Use the previously-added transmit-phase skbuff private flag to simplify the
socket buffer tracing a bit. Which phase the skbuff comes from can now be
divined from the skb rather than having to be guessed from the call state.

We can also reduce the number of rxrpc_skb_trace values by eliminating the
difference between Tx and Rx in the symbols.

Signed-off-by: David Howells <dhowells@redhat.com>


# b311e684 19-Aug-2019 David Howells <dhowells@redhat.com>

rxrpc: Add a private skb flag to indicate transmission-phase skbs

Add a flag in the private data on an skbuff to indicate that this is a
transmission-phase buffer rather than a receive-phase buffer.

Signed-off-by: David Howells <dhowells@redhat.com>


# c69565ee 30-Jul-2019 David Howells <dhowells@redhat.com>

rxrpc: Fix the lack of notification when sendmsg() fails on a DATA packet

Fix the fact that a notification isn't sent to the recvmsg side to indicate
a call failed when sendmsg() fails to transmit a DATA packet with the error
ENETUNREACH, EHOSTUNREACH or ECONNREFUSED.

Without this notification, the afs client just sits there waiting for the
call to complete in some manner (which it's not now going to do), which
also pins the rxrpc call in place.

This can be seen if the client has a scope-level IPv6 address, but not a
global-level IPv6 address, and we try and transmit an operation to a
server's IPv6 address.

Looking in /proc/net/rxrpc/calls shows completed calls just sat there with
an abort code of RX_USER_ABORT and an error code of -ENETUNREACH.

Fixes: c54e43d752c7 ("rxrpc: Fix missing start of call timeout")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>


# b4d0d230 20-May-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 36

Based on 1 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public licence as published by
the free software foundation either version 2 of the licence or at
your option any later version

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 114 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190520170857.552531963@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# b960a34b 09-May-2019 David Howells <dhowells@redhat.com>

rxrpc: Allow the kernel to mark a call as being non-interruptible

Allow kernel services using AF_RXRPC to indicate that a call should be
non-interruptible. This allows kafs to make things like lock-extension and
writeback data storage calls non-interruptible.

If this is set, signals will be ignored for operations on that call where
possible - such as waiting to get a call channel on an rxrpc connection.

It doesn't prevent UDP sendmsg from being interrupted, but that will be
handled by packet retransmission.

rxrpc_kernel_recv_data() isn't affected by this since that never waits,
preferring instead to return -EAGAIN and leave the waiting to the caller.

Userspace initiated calls can't be set to be uninterruptible at this time.

Signed-off-by: David Howells <dhowells@redhat.com>


# 8e8715aa 12-Apr-2019 Marc Dionne <marc.dionne@auristor.com>

rxrpc: Allow errors to be returned from rxrpc_queue_packet()

Change rxrpc_queue_packet()'s signature so that it can return any error
code it may encounter when trying to send the packet.

This allows the caller to eventually do something in case of error - though
it should be noted that the packet has been queued and a resend is
scheduled.

Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e122d845 10-Jan-2019 David Howells <dhowells@redhat.com>

Revert "rxrpc: Allow failed client calls to be retried"

The changes introduced to allow rxrpc calls to be retried creates an issue
when it comes to refcounting afs_call structs. The problem is that when
rxrpc_send_data() queues the last packet for an asynchronous call, the
following sequence can occur:

(1) The notify_end_tx callback is invoked which causes the state in the
afs_call to be changed from AFS_CALL_CL_REQUESTING or
AFS_CALL_SV_REPLYING.

(2) afs_deliver_to_call() can then process event notifications from rxrpc
on the async_work queue.

(3) Delivery of events, such as an abort from the server, can cause the
afs_call state to be changed to AFS_CALL_COMPLETE on async_work.

(4) For an asynchronous call, afs_process_async_call() notes that the call
is complete and tried to clean up all the refs on async_work.

(5) rxrpc_send_data() might return the amount of data transferred
(success) or an error - which could in turn reflect a local error or a
received error.

Synchronising the clean up after rxrpc_kernel_send_data() returns an error
with the asynchronous cleanup is then tricky to get right.

Mostly revert commit c038a58ccfd6704d4d7d60ed3d6a0fca13cf13a4. The two API
functions the original commit added aren't currently used. This makes
rxrpc_kernel_send_data() always return successfully if it queued the data
it was given.

Note that this doesn't affect synchronous calls since their Rx notification
function merely pokes a wait queue and does not refcounting. The
asynchronous call notification function *has* to do refcounting and pass a
ref over the work item to avoid the need to sync the workqueue in call
cleanup.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c54e43d7 10-May-2018 David Howells <dhowells@redhat.com>

rxrpc: Fix missing start of call timeout

The expect_rx_by call timeout is supposed to be set when a call is started
to indicate that we need to receive a packet by that point. This is
currently put back every time we receive a packet, but it isn't started
when we first send a packet. Without this, the call may wait forever if
the server doesn't deign to reply.

Fix this by setting the timeout upon a successful UDP sendmsg call for the
first DATA packet. The timeout is initiated only for initial transmission
and not for subsequent retries as we don't want the retry mechanism to
extend the timeout indefinitely.

Fixes: a158bdd3247b ("rxrpc: Fix call timeouts")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>


# 17226f12 30-Mar-2018 David Howells <dhowells@redhat.com>

rxrpc: Fix leak of rxrpc_peer objects

When a new client call is requested, an rxrpc_conn_parameters struct object
is passed in with a bunch of parameters set, such as the local endpoint to
use. A pointer to the target peer record is also placed in there by
rxrpc_get_client_conn() - and this is removed if and only if a new
connection object is allocated. Thus it leaks if a new connection object
isn't allocated.

Fix this by putting any peer object attached to the rxrpc_conn_parameters
object in the function that allocated it.

Fixes: 19ffa01c9c45 ("rxrpc: Use structs to hold connection params and protocol info")
Signed-off-by: David Howells <dhowells@redhat.com>


# 88f2a825 30-Mar-2018 David Howells <dhowells@redhat.com>

rxrpc: Fix checker warnings and errors

Fix various issues detected by checker.

Errors:

(*) rxrpc_discard_prealloc() should be using rcu_assign_pointer to set
call->socket.

Warnings:

(*) rxrpc_service_connection_reaper() should be passing NULL rather than 0 to
trace_rxrpc_conn() as the where argument.

(*) rxrpc_disconnect_client_call() should get its net pointer via the
call->conn rather than call->sock to avoid a warning about accessing
an RCU pointer without protection.

(*) Proc seq start/stop functions need annotation as they pass locks
between the functions.

False positives:

(*) Checker doesn't correctly handle of seq-retry lock context balance in
rxrpc_find_service_conn_rcu().

(*) Checker thinks execution may proceed past the BUG() in
rxrpc_publish_service_conn().

(*) Variable length array warnings from SKCIPHER_REQUEST_ON_STACK() in
rxkad.c.

Signed-off-by: David Howells <dhowells@redhat.com>


# 03877bf6 30-Mar-2018 David Howells <dhowells@redhat.com>

rxrpc: Fix Tx ring annotation after initial Tx failure

rxrpc calls have a ring of packets that are awaiting ACK or retransmission
and a parallel ring of annotations that tracks the state of those packets.
If the initial transmission of a packet on the underlying UDP socket fails
then the packet annotation is marked for resend - but the setting of this
mark accidentally erases the last-packet mark also stored in the same
annotation slot. If this happens, a call won't switch out of the Tx phase
when all the packets have been transmitted.

Fix this by retaining the last-packet mark and only altering the packet
state.

Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code")
Signed-off-by: David Howells <dhowells@redhat.com>


# a25e21f0 27-Mar-2018 David Howells <dhowells@redhat.com>

rxrpc, afs: Use debug_ids rather than pointers in traces

In rxrpc and afs, use the debug_ids that are monotonically allocated to
various objects as they're allocated rather than pointers as kernel
pointers are now hashed making them less useful. Further, the debug ids
aren't reused anywhere nearly as quickly.

In addition, allow kernel services that use rxrpc, such as afs, to take
numbers from the rxrpc counter, assign them to their own call struct and
pass them in to rxrpc for both client and service calls so that the trace
lines for each will have the same ID tag.

Signed-off-by: David Howells <dhowells@redhat.com>


# 282ef472 28-Nov-2017 Gustavo A. R. Silva <garsilva@embeddedor.com>

rxrpc: Fix variable overwrite

Values assigned to both variable resend_at and ack_at are overwritten
before they can be used.

The correct fix here is to add 'now' to the previously computed value in
resend_at and ack_at.

Addresses-Coverity-ID: 1462262
Addresses-Coverity-ID: 1462263
Addresses-Coverity-ID: 1462264
Fixes: beb8e5e4f38c ("rxrpc: Express protocol timeouts in terms of RTT")
Link: https://marc.info/?i=17004.1511808959%40warthog.procyon.org.uk
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: David Howells <dhowells@redhat.com>


# bd1fdf8c 24-Nov-2017 David Howells <dhowells@redhat.com>

rxrpc: Add a timeout for detecting lost ACKs/lost DATA

Add an extra timeout that is set/updated when we send a DATA packet that
has the request-ack flag set. This allows us to detect if we don't get an
ACK in response to the latest flagged packet.

The ACK packet is adjudged to have been lost if it doesn't turn up within
2*RTT of the transmission.

If the timeout occurs, we schedule the sending of a PING ACK to find out
the state of the other side. If a new DATA packet is ready to go sooner,
we cancel the sending of the ping and set the request-ack flag on that
instead.

If we get back a PING-RESPONSE ACK that indicates a lower tx_top than what
we had at the time of the ping transmission, we adjudge all the DATA
packets sent between the response tx_top and the ping-time tx_top to have
been lost and retransmit immediately.

Rather than sending a PING ACK, we could just pick a DATA packet and
speculatively retransmit that with request-ack set. It should result in
either a REQUESTED ACK or a DUPLICATE ACK which we can then use in lieu the
a PING-RESPONSE ACK mentioned above.

Signed-off-by: David Howells <dhowells@redhat.com>


# beb8e5e4 24-Nov-2017 David Howells <dhowells@redhat.com>

rxrpc: Express protocol timeouts in terms of RTT

Express protocol timeouts for data retransmission and deferred ack
generation in terms on RTT rather than specified timeouts once we have
sufficient RTT samples.

For the moment, this requires just one RTT sample to be able to use this
for ack deferral and two for data retransmission.

The data retransmission timeout is set at RTT*1.5 and the ACK deferral
timeout is set at RTT.

Note that the calculated timeout is limited to a minimum of 4ns to make
sure it doesn't happen too quickly.

Signed-off-by: David Howells <dhowells@redhat.com>


# a158bdd3 24-Nov-2017 David Howells <dhowells@redhat.com>

rxrpc: Fix call timeouts

Fix the rxrpc call expiration timeouts and make them settable from
userspace. By analogy with other rx implementations, there should be three
timeouts:

(1) "Normal timeout"

This is set for all calls and is triggered if we haven't received any
packets from the peer in a while. It is measured from the last time
we received any packet on that call. This is not reset by any
connection packets (such as CHALLENGE/RESPONSE packets).

If a service operation takes a long time, the server should generate
PING ACKs at a duration that's substantially less than the normal
timeout so is to keep both sides alive. This is set at 1/6 of normal
timeout.

(2) "Idle timeout"

This is set only for a service call and is triggered if we stop
receiving the DATA packets that comprise the request data. It is
measured from the last time we received a DATA packet.

(3) "Hard timeout"

This can be set for a call and specified the maximum lifetime of that
call. It should not be specified by default. Some operations (such
as volume transfer) take a long time.

Allow userspace to set/change the timeouts on a call with sendmsg, using a
control message:

RXRPC_SET_CALL_TIMEOUTS

The data to the message is a number of 32-bit words, not all of which need
be given:

u32 hard_timeout; /* sec from first packet */
u32 idle_timeout; /* msec from packet Rx */
u32 normal_timeout; /* msec from data Rx */

This can be set in combination with any other sendmsg() that affects a
call.

Signed-off-by: David Howells <dhowells@redhat.com>


# 48124178 24-Nov-2017 David Howells <dhowells@redhat.com>

rxrpc: Split the call params from the operation params

When rxrpc_sendmsg() parses the control message buffer, it places the
parameters extracted into a structure, but lumps together call parameters
(such as user call ID) with operation parameters (such as whether to send
data, send an abort or accept a call).

Split the call parameters out into their own structure, a copy of which is
then embedded in the operation parameters struct.

The call parameters struct is then passed down into the places that need it
instead of passing the individual parameters. This allows for extra call
parameters to be added.

Signed-off-by: David Howells <dhowells@redhat.com>


# 48ca2463 24-Nov-2017 David Howells <dhowells@redhat.com>

rxrpc: Don't set upgrade by default in sendmsg()

Don't set upgrade by default when creating a call from sendmsg(). This is
a holdover from when I was testing the code.

Signed-off-by: David Howells <dhowells@redhat.com>


# 03a6c822 24-Nov-2017 David Howells <dhowells@redhat.com>

rxrpc: The mutex lock returned by rxrpc_accept_call() needs releasing

The caller of rxrpc_accept_call() must release the lock on call->user_mutex
returned by that function.

Signed-off-by: David Howells <dhowells@redhat.com>


# e3cf3970 19-Oct-2017 Gustavo A. R. Silva <garsilva@embeddedor.com>

net: rxrpc: mark expected switch fall-throughs

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# bc5e3a54 18-Oct-2017 David Howells <dhowells@redhat.com>

rxrpc: Use MSG_WAITALL to tell sendmsg() to temporarily ignore signals

Make AF_RXRPC accept MSG_WAITALL as a flag to sendmsg() to tell it to
ignore signals whilst loading up the message queue, provided progress is
being made in emptying the queue at the other side.

Progress is defined as the base of the transmit window having being
advanced within 2 RTT periods. If the period is exceeded with no progress,
sendmsg() will return anyway, indicating how much data has been copied, if
any.

Once the supplied buffer is entirely decanted, the sendmsg() will return.

Signed-off-by: David Howells <dhowells@redhat.com>


# c038a58c 29-Aug-2017 David Howells <dhowells@redhat.com>

rxrpc: Allow failed client calls to be retried

Allow a client call that failed on network error to be retried, provided
that the Tx queue still holds DATA packet 1. This allows an operation to
be submitted to another server or another address for the same server
without having to repackage and re-encrypt the data so far processed.

Two new functions are provided:

(1) rxrpc_kernel_check_call() - This is used to find out the completion
state of a call to guess whether it can be retried and whether it
should be retried.

(2) rxrpc_kernel_retry_call() - Disconnect the call from its current
connection, reset the state and submit it as a new client call to a
new address. The new address need not match the previous address.

A call may be retried even if all the data hasn't been loaded into it yet;
a partially constructed will be retained at the same point it was at when
an error condition was detected. msg_data_left() can be used to find out
how much data was packaged before the error occurred.

Signed-off-by: David Howells <dhowells@redhat.com>


# e833251a 29-Aug-2017 David Howells <dhowells@redhat.com>

rxrpc: Add notification of end-of-Tx phase

Add a callback to rxrpc_kernel_send_data() so that a kernel service can get
a notification that the AF_RXRPC call has transitioned out the Tx phase and
is now waiting for a reply or a final ACK.

This is called from AF_RXRPC with the call state lock held so the
notification is guaranteed to come before any reply is passed back.

Further, modify the AFS filesystem to make use of this so that we don't have
to change the afs_call state before sending the last bit of data.

Signed-off-by: David Howells <dhowells@redhat.com>


# bd2db2d2 29-Aug-2017 David Howells <dhowells@redhat.com>

rxrpc: Don't negate call->error before returning it

call->error is stored as 0 or a negative error code. Don't negate this
value (ie. make it positive) before returning it from a kernel function
(though it should still be negated before passing to userspace through a
control message).

Signed-off-by: David Howells <dhowells@redhat.com>


# b080db58 16-Jun-2017 Johannes Berg <johannes.berg@intel.com>

networking: convert many more places to skb_put_zero()

There were many places that my previous spatch didn't find,
as pointed out by yuan linyu in various patches.

The following spatch found many more and also removes the
now unnecessary casts:

@@
identifier p, p2;
expression len;
expression skb;
type t, t2;
@@
(
-p = skb_put(skb, len);
+p = skb_put_zero(skb, len);
|
-p = (t)skb_put(skb, len);
+p = skb_put_zero(skb, len);
)
... when != p
(
p2 = (t2)p;
-memset(p2, 0, len);
|
-memset(p, 0, len);
)

@@
type t, t2;
identifier p, p2;
expression skb;
@@
t *p;
...
(
-p = skb_put(skb, sizeof(t));
+p = skb_put_zero(skb, sizeof(t));
|
-p = (t *)skb_put(skb, sizeof(t));
+p = skb_put_zero(skb, sizeof(t));
)
... when != p
(
p2 = (t2)p;
-memset(p2, 0, sizeof(*p));
|
-memset(p, 0, sizeof(*p));
)

@@
expression skb, len;
@@
-memset(skb_put(skb, len), 0, len);
+skb_put_zero(skb, len);

Apply it to the tree (with one manual fixup to keep the
comment in vxlan.c, which spatch removed.)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e754eba6 06-Jun-2017 David Howells <dhowells@redhat.com>

rxrpc: Provide a cmsg to specify the amount of Tx data for a call

Provide a control message that can be specified on the first sendmsg() of a
client call or the first sendmsg() of a service response to indicate the
total length of the data to be transmitted for that call.

Currently, because the length of the payload of an encrypted DATA packet is
encrypted in front of the data, the packet cannot be encrypted until we
know how much data it will hold.

By specifying the length at the beginning of the transmit phase, each DATA
packet length can be set before we start loading data from userspace (where
several sendmsg() calls may contribute to a particular packet).

An error will be returned if too little or too much data is presented in
the Tx phase.

Signed-off-by: David Howells <dhowells@redhat.com>


# 3ab26a6f 07-Jun-2017 David Howells <dhowells@redhat.com>

rxrpc: Consolidate sendmsg parameters

Consolidate the sendmsg control message parameters into a struct rather
than passing them individually through the argument list of
rxrpc_sendmsg_cmsg(). This makes it easier to add more parameters.

Signed-off-by: David Howells <dhowells@redhat.com>


# 4e255721 05-Jun-2017 David Howells <dhowells@redhat.com>

rxrpc: Add service upgrade support for client connections

Make it possible for a client to use AuriStor's service upgrade facility.

The client does this by adding an RXRPC_UPGRADE_SERVICE control message to
the first sendmsg() of a call. This takes no parameters.

When recvmsg() starts returning data from the call, the service ID field in
the returned msg_name will reflect the result of the upgrade attempt. If
the upgrade was ignored, srx_service will match what was set in the
sendmsg(); if the upgrade happened the srx_service will be altered to
indicate the service the server upgraded to.

Note that:

(1) The choice of upgrade service is up to the server

(2) Further client calls to the same server that would share a connection
are blocked if an upgrade probe is in progress.

(3) This should only be used to probe the service. Clients should then
use the returned service ID in all subsequent communications with that
server (and not set the upgrade). Note that the kernel will not
retain this information should the connection expire from its cache.

(4) If a server that supports upgrading is replaced by one that doesn't,
whilst a connection is live, and if the replacement is running, say,
OpenAFS 1.6.4 or older or an older IBM AFS, then the replacement
server will not respond to packets sent to the upgraded connection.

At this point, calls will time out and the server must be reprobed.

Signed-off-by: David Howells <dhowells@redhat.com>


# fb46f6ee 06-Apr-2017 David Howells <dhowells@redhat.com>

rxrpc: Trace protocol errors in received packets

Add a tracepoint (rxrpc_rx_proto) to record protocol errors in received
packets. The following changes are made:

(1) Add a function, __rxrpc_abort_eproto(), to note a protocol error on a
call and mark the call aborted. This is wrapped by
rxrpc_abort_eproto() that makes the why string usable in trace.

(2) Add trace_rxrpc_rx_proto() or rxrpc_abort_eproto() to protocol error
generation points, replacing rxrpc_abort_call() with the latter.

(3) Only send an abort packet in rxkad_verify_packet*() if we actually
managed to abort the call.

Note that a trace event is also emitted if a kernel user (e.g. afs) tries
to send data through a call when it's not in the transmission phase, though
it's not technically a receive event.

Signed-off-by: David Howells <dhowells@redhat.com>


# 84a4c09c 06-Apr-2017 David Howells <dhowells@redhat.com>

rxrpc: Note a successfully aborted kernel operation

Make rxrpc_kernel_abort_call() return an indication as to whether it
actually aborted the operation or not so that kafs can trace the failure of
the operation. Note that 'success' in this context means changing the
state of the call, not necessarily successfully transmitting an ABORT
packet.

Signed-off-by: David Howells <dhowells@redhat.com>


# 3a92789a 06-Apr-2017 David Howells <dhowells@redhat.com>

rxrpc: Use negative error codes in rxrpc_call struct

Use negative error codes in struct rxrpc_call::error because that's what
the kernel normally deals with and to make the code consistent. We only
turn them positive when transcribing into a cmsg for userspace recvmsg.

Signed-off-by: David Howells <dhowells@redhat.com>


# 6fc166d6 09-Mar-2017 David Howells <dhowells@redhat.com>

rxrpc: rxrpc_kernel_send_data() needs to handle failed call better

If rxrpc_kernel_send_data() is asked to send data through a call that has
already failed (due to a remote abort, received protocol error or network
error), then return the associated error code saved in the call rather than
ESHUTDOWN.

This allows the caller to work out whether to ask for the abort code or not
based on this.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 146d8fef 03-Mar-2017 David Howells <dhowells@redhat.com>

rxrpc: Call state should be read with READ_ONCE() under some circumstances

The call state may be changed at any time by the data-ready routine in
response to received packets, so if the call state is to be read and acted
upon several times in a function, READ_ONCE() must be used unless the call
state lock is held.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 37411cad 02-Mar-2017 David Howells <dhowells@redhat.com>

rxrpc: Fix potential NULL-pointer exception

Fix a potential NULL-pointer exception in rxrpc_do_sendmsg(). The call
state check that I added should have gone into the else-body of the
if-statement where we actually have a call to check.

Found by CoverityScan CID#1414316 ("Dereference after null check").

Fixes: 540b1c48c37a ("rxrpc: Fix deadlock between call creation and sendmsg/recvmsg")
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 174cd4b1 02-Feb-2017 Ingo Molnar <mingo@kernel.org>

sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h>

Fix up affected files that include this signal functionality via sched.h.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 540b1c48 27-Feb-2017 David Howells <dhowells@redhat.com>

rxrpc: Fix deadlock between call creation and sendmsg/recvmsg

All the routines by which rxrpc is accessed from the outside are serialised
by means of the socket lock (sendmsg, recvmsg, bind,
rxrpc_kernel_begin_call(), ...) and this presents a problem:

(1) If a number of calls on the same socket are in the process of
connection to the same peer, a maximum of four concurrent live calls
are permitted before further calls need to wait for a slot.

(2) If a call is waiting for a slot, it is deep inside sendmsg() or
rxrpc_kernel_begin_call() and the entry function is holding the socket
lock.

(3) sendmsg() and recvmsg() or the in-kernel equivalents are prevented
from servicing the other calls as they need to take the socket lock to
do so.

(4) The socket is stuck until a call is aborted and makes its slot
available to the waiter.

Fix this by:

(1) Provide each call with a mutex ('user_mutex') that arbitrates access
by the users of rxrpc separately for each specific call.

(2) Make rxrpc_sendmsg() and rxrpc_recvmsg() unlock the socket as soon as
they've got a call and taken its mutex.

Note that I'm returning EWOULDBLOCK from recvmsg() if MSG_DONTWAIT is
set but someone else has the lock. Should I instead only return
EWOULDBLOCK if there's nothing currently to be done on a socket, and
sleep in this particular instance because there is something to be
done, but we appear to be blocked by the interrupt handler doing its
ping?

(3) Make rxrpc_new_client_call() unlock the socket after allocating a new
call, locking its user mutex and adding it to the socket's call tree.
The call is returned locked so that sendmsg() can add data to it
immediately.

From the moment the call is in the socket tree, it is subject to
access by sendmsg() and recvmsg() - even if it isn't connected yet.

(4) Lock new service calls in the UDP data_ready handler (in
rxrpc_new_incoming_call()) because they may already be in the socket's
tree and the data_ready handler makes them live immediately if a user
ID has already been preassigned.

Note that the new call is locked before any notifications are sent
that it is live, so doing mutex_trylock() *ought* to always succeed.
Userspace is prevented from doing sendmsg() on calls that are in a
too-early state in rxrpc_do_sendmsg().

(5) Make rxrpc_new_incoming_call() return the call with the user mutex
held so that a ping can be scheduled immediately under it.

Note that it might be worth moving the ping call into
rxrpc_new_incoming_call() and then we can drop the mutex there.

(6) Make rxrpc_accept_call() take the lock on the call it is accepting and
release the socket after adding the call to the socket's tree. This
is slightly tricky as we've dequeued the call by that point and have
to requeue it.

Note that requeuing emits a trace event.

(7) Make rxrpc_kernel_send_data() and rxrpc_kernel_recv_data() take the
new mutex immediately and don't bother with the socket mutex at all.

This patch has the nice bonus that calls on the same socket are now to some
extent parallelisable.

Note that we might want to move rxrpc_service_prealloc() calls out from the
socket lock and give it its own lock, so that we don't hang progress in
other calls because we're waiting for the allocator.

We probably also want to avoid calling rxrpc_notify_socket() from within
the socket lock (rxrpc_accept_call()).

Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Marc Dionne <marc.c.dionne@auristor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1ff8cebf 03-Jan-2017 yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn>

scm: remove use CMSG{_COMPAT}_ALIGN(sizeof(struct {compat_}cmsghdr))

sizeof(struct cmsghdr) and sizeof(struct compat_cmsghdr) already aligned.
remove use CMSG_ALIGN(sizeof(struct cmsghdr)) and
CMSG_COMPAT_ALIGN(sizeof(struct compat_cmsghdr)) keep code consistent.

Signed-off-by: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9749fd2b 06-Oct-2016 David Howells <dhowells@redhat.com>

rxrpc: Need to produce an ACK for service op if op takes a long time

We need to generate a DELAY ACK from the service end of an operation if we
start doing the actual operation work and it takes longer than expected.
This will hard-ACK the request data and allow the client to release its
resources.

To make this work:

(1) We have to set the ack timer and propose an ACK when the call moves to
the RXRPC_CALL_SERVER_ACK_REQUEST and clear the pending ACK and cancel
the timer when we start transmitting the reply (the first DATA packet
of the reply implicitly ACKs the request phase).

(2) It must be possible to set the timer when the caller is holding
call->state_lock, so split the lock-getting part of the timer function
out.

(3) Add trace notes for the ACK we're requesting and the timer we clear.

Signed-off-by: David Howells <dhowells@redhat.com>


# a5af7e1f 06-Oct-2016 David Howells <dhowells@redhat.com>

rxrpc: Fix loss of PING RESPONSE ACK production due to PING ACKs

Separate the output of PING ACKs from the output of other sorts of ACK so
that if we receive a PING ACK and schedule transmission of a PING RESPONSE
ACK, the response doesn't get cancelled by a PING ACK we happen to be
scheduling transmission of at the same time.

If a PING RESPONSE gets lost, the other side might just sit there waiting
for it and refuse to proceed otherwise.

Signed-off-by: David Howells <dhowells@redhat.com>


# 26cb02aa 06-Oct-2016 David Howells <dhowells@redhat.com>

rxrpc: Fix warning by splitting rxrpc_send_call_packet()

Split rxrpc_send_data_packet() to separate ACK generation (which is more
complicated) from ABORT generation. This simplifies the code a bit and
fixes the following warning:

In file included from ../net/rxrpc/output.c:20:0:
net/rxrpc/output.c: In function 'rxrpc_send_call_packet':
net/rxrpc/ar-internal.h:1187:27: error: 'top' may be used uninitialized in this function [-Werror=maybe-uninitialized]
net/rxrpc/output.c:103:24: note: 'top' was declared here
net/rxrpc/output.c:225:25: error: 'hard_ack' may be used uninitialized in this function [-Werror=maybe-uninitialized]

Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David Howells <dhowells@redhat.com>


# df0adc78 26-Sep-2016 David Howells <dhowells@redhat.com>

rxrpc: Keep the call timeouts as ktimes rather than jiffies

Keep that call timeouts as ktimes rather than jiffies so that they can be
expressed as functions of RTT.

Signed-off-by: David Howells <dhowells@redhat.com>


# a1767077 29-Sep-2016 David Howells <dhowells@redhat.com>

rxrpc: Make Tx loss-injection go through normal return and adjust tracing

In rxrpc_send_data_packet() make the loss-injection path return through the
same code as the transmission path so that the RTT determination is
initiated and any future timer shuffling will be done, despite the packet
having been binned.

Whilst we're at it:

(1) Add to the tx_data tracepoint an indication of whether or not we're
retransmitting a data packet.

(2) When we're deciding whether or not to request an ACK, rather than
checking if we're in fast-retransmit mode check instead if we're
retransmitting.

(3) Don't invoke the lose_skb tracepoint when losing a Tx packet as we're
not altering the sk_buff refcount nor are we just seeing it after
getting it off the Tx list.

(4) The rxrpc_skb_tx_lost note is then no longer used so remove it.

(5) rxrpc_lose_skb() no longer needs to deal with rxrpc_skb_tx_lost.

Signed-off-by: David Howells <dhowells@redhat.com>


# 57494343 24-Sep-2016 David Howells <dhowells@redhat.com>

rxrpc: Implement slow-start

Implement RxRPC slow-start, which is similar to RFC 5681 for TCP. A
tracepoint is added to log the state of the congestion management algorithm
and the decisions it makes.

Notes:

(1) Since we send fixed-size DATA packets (apart from the final packet in
each phase), counters and calculations are in terms of packets rather
than bytes.

(2) The ACK packet carries the equivalent of TCP SACK.

(3) The FLIGHT_SIZE calculation in RFC 5681 doesn't seem particularly
suited to SACK of a small number of packets. It seems that, almost
inevitably, by the time three 'duplicate' ACKs have been seen, we have
narrowed the loss down to one or two missing packets, and the
FLIGHT_SIZE calculation ends up as 2.

(4) In rxrpc_resend(), if there was no data that apparently needed
retransmission, we transmit a PING ACK to ask the peer to tell us what
its Rx window state is.

Signed-off-by: David Howells <dhowells@redhat.com>


# fc7ab6d2 23-Sep-2016 David Howells <dhowells@redhat.com>

rxrpc: Add a tracepoint for the call timer

Add a tracepoint to log call timer initiation, setting and expiry.

Signed-off-by: David Howells <dhowells@redhat.com>


# 70790dbe 22-Sep-2016 David Howells <dhowells@redhat.com>

rxrpc: Pass the last Tx packet marker in the annotation buffer

When the last packet of data to be transmitted on a call is queued, tx_top
is set and then the RXRPC_CALL_TX_LAST flag is set. Unfortunately, this
leaves a race in the ACK processing side of things because the flag affects
the interpretation of tx_top and also allows us to start receiving reply
data before we've finished transmitting.

To fix this, make the following changes:

(1) rxrpc_queue_packet() now sets a marker in the annotation buffer
instead of setting the RXRPC_CALL_TX_LAST flag.

(2) rxrpc_rotate_tx_window() detects the marker and sets the flag in the
same context as the routines that use it.

(3) rxrpc_end_tx_phase() is simplified to just shift the call state.
The Tx window must have been rotated before calling to discard the
last packet.

(4) rxrpc_receiving_reply() is added to handle the arrival of the first
DATA packet of a reply to a client call (which is an implicit ACK of
the Tx phase).

(5) The last part of rxrpc_input_ack() is reordered to perform Tx
rotation, then soft-ACK application and then to end the phase if we've
rotated the last packet. In the event of a terminal ACK, the soft-ACK
application will be skipped as nAcks should be 0.

(6) rxrpc_input_ackall() now has to rotate as well as ending the phase.

In addition:

(7) Alter the transmit tracepoint to log the rotation of the last packet.

(8) Remove the no-longer relevant queue_reqack tracepoint note. The
ACK-REQUESTED packet header flag is now set as needed when we actually
transmit the packet and may vary by retransmission.

Signed-off-by: David Howells <dhowells@redhat.com>


# dfc3da44 22-Sep-2016 David Howells <dhowells@redhat.com>

rxrpc: Need to start the resend timer on initial transmission

When a DATA packet has its initial transmission, we may need to start or
adjust the resend timer. Without this we end up relying on being sent a
NACK to initiate the resend.

Signed-off-by: David Howells <dhowells@redhat.com>


# b24d2891 23-Sep-2016 David Howells <dhowells@redhat.com>

rxrpc: Preset timestamp on Tx sk_buffs

Set the timestamp on sk_buffs holding packets to be transmitted before
queueing them because the moment the packet is on the queue it can be seen
by the retransmission algorithm - which may see a completely random
timestamp.

If the retransmission algorithm sees such a timestamp, it may retransmit
the packet and, in future, tell the congestion management algorithm that
the retransmit timer expired.

Signed-off-by: David Howells <dhowells@redhat.com>


# 0d4b103c 21-Sep-2016 David Howells <dhowells@redhat.com>

rxrpc: Reduce the number of ACK-Requests sent

Reduce the number of ACK-Requests we set on DATA packets that we're sending
to reduce network traffic. We set the flag on odd-numbered DATA packets to
start off the RTT cache until we have at least three entries in it and then
probe once per second thereafter to keep it topped up.

This could be made tunable in future.

Note that from this point, the RXRPC_REQUEST_ACK flag is set on DATA
packets as we transmit them and not stored statically in the sk_buff.

Signed-off-by: David Howells <dhowells@redhat.com>


# 50235c4b 21-Sep-2016 David Howells <dhowells@redhat.com>

rxrpc: Obtain RTT data by requesting ACKs on DATA packets

In addition to sending a PING ACK to gain RTT data, we can set the
RXRPC_REQUEST_ACK flag on a DATA packet and get a REQUESTED-ACK ACK. The
ACK packet contains the serial number of the packet it is in response to,
so we can look through the Tx buffer for a matching DATA packet.

This requires that the data packets be stamped with the time of
transmission as a ktime rather than having the resend_at time in jiffies.

This further requires the resend code to do the resend determination in
ktimes and convert to jiffies to set the timer.

Signed-off-by: David Howells <dhowells@redhat.com>


# 7aa51da7 21-Sep-2016 David Howells <dhowells@redhat.com>

rxrpc: Expedite ping response transmission

Expedite the transmission of a response to a PING ACK by sending it from
sendmsg if one is pending. We're most likely to see a PING ACK during the
client call Tx phase as the other side may use it to determine a number of
parameters, such as the client's receive window size, the RTT and whether
the client is doing slow start (similar to RFC5681).

If we don't expedite it, it's left to the background processing thread to
transmit.

Signed-off-by: David Howells <dhowells@redhat.com>


# 5a924b89 21-Sep-2016 David Howells <dhowells@redhat.com>

rxrpc: Don't store the rxrpc header in the Tx queue sk_buffs

Don't store the rxrpc protocol header in sk_buffs on the transmit queue,
but rather generate it on the fly and pass it to kernel_sendmsg() as a
separate iov. This reduces the amount of storage required.

Note that the security header is still stored in the sk_buff as it may get
encrypted along with the data (and doesn't change with each transmission).

Signed-off-by: David Howells <dhowells@redhat.com>


# 71f3ca40 17-Sep-2016 David Howells <dhowells@redhat.com>

rxrpc: Improve skb tracing

Improve sk_buff tracing within AF_RXRPC by the following means:

(1) Use an enum to note the event type rather than plain integers and use
an array of event names rather than a big multi ?: list.

(2) Distinguish Rx from Tx packets and account them separately. This
requires the call phase to be tracked so that we know what we might
find in rxtx_buffer[].

(3) Add a parameter to rxrpc_{new,see,get,free}_skb() to indicate the
event type.

(4) A pair of 'rotate' events are added to indicate packets that are about
to be rotated out of the Rx and Tx windows.

(5) A pair of 'lost' events are added, along with rxrpc_lose_skb() for
packet loss injection recording.

Signed-off-by: David Howells <dhowells@redhat.com>


# a124fe3e 17-Sep-2016 David Howells <dhowells@redhat.com>

rxrpc: Add a tracepoint to follow the life of a packet in the Tx buffer

Add a tracepoint to follow the insertion of a packet into the transmit
buffer, its transmission and its rotation out of the buffer.

Signed-off-by: David Howells <dhowells@redhat.com>


# 182f5056 17-Sep-2016 David Howells <dhowells@redhat.com>

rxrpc: Fix the basic transmit DATA packet content size at 1412 bytes

Fix the basic transmit DATA packet content size at 1412 bytes so that they
can be arbitrarily assembled into jumbo packets.

In the future, I'm thinking of moving to keeping a jumbo packet header at
the beginning of each packet in the Tx queue and creating the packet header
on the spot when kernel_sendmsg() is invoked. That way, jumbo packets can
be assembled on the spur of the moment for (re-)transmission.

Signed-off-by: David Howells <dhowells@redhat.com>


# 248f219c 08-Sep-2016 David Howells <dhowells@redhat.com>

rxrpc: Rewrite the data and ack handling code

Rewrite the data and ack handling code such that:

(1) Parsing of received ACK and ABORT packets and the distribution and the
filing of DATA packets happens entirely within the data_ready context
called from the UDP socket. This allows us to process and discard ACK
and ABORT packets much more quickly (they're no longer stashed on a
queue for a background thread to process).

(2) We avoid calling skb_clone(), pskb_pull() and pskb_trim(). We instead
keep track of the offset and length of the content of each packet in
the sk_buff metadata. This means we don't do any allocation in the
receive path.

(3) Jumbo DATA packet parsing is now done in data_ready context. Rather
than cloning the packet once for each subpacket and pulling/trimming
it, we file the packet multiple times with an annotation for each
indicating which subpacket is there. From that we can directly
calculate the offset and length.

(4) A call's receive queue can be accessed without taking locks (memory
barriers do have to be used, though).

(5) Incoming calls are set up from preallocated resources and immediately
made live. They can than have packets queued upon them and ACKs
generated. If insufficient resources exist, DATA packet #1 is given a
BUSY reply and other DATA packets are discarded).

(6) sk_buffs no longer take a ref on their parent call.

To make this work, the following changes are made:

(1) Each call's receive buffer is now a circular buffer of sk_buff
pointers (rxtx_buffer) rather than a number of sk_buff_heads spread
between the call and the socket. This permits each sk_buff to be in
the buffer multiple times. The receive buffer is reused for the
transmit buffer.

(2) A circular buffer of annotations (rxtx_annotations) is kept parallel
to the data buffer. Transmission phase annotations indicate whether a
buffered packet has been ACK'd or not and whether it needs
retransmission.

Receive phase annotations indicate whether a slot holds a whole packet
or a jumbo subpacket and, if the latter, which subpacket. They also
note whether the packet has been decrypted in place.

(3) DATA packet window tracking is much simplified. Each phase has just
two numbers representing the window (rx_hard_ack/rx_top and
tx_hard_ack/tx_top).

The hard_ack number is the sequence number before base of the window,
representing the last packet the other side says it has consumed.
hard_ack starts from 0 and the first packet is sequence number 1.

The top number is the sequence number of the highest-numbered packet
residing in the buffer. Packets between hard_ack+1 and top are
soft-ACK'd to indicate they've been received, but not yet consumed.

Four macros, before(), before_eq(), after() and after_eq() are added
to compare sequence numbers within the window. This allows for the
top of the window to wrap when the hard-ack sequence number gets close
to the limit.

Two flags, RXRPC_CALL_RX_LAST and RXRPC_CALL_TX_LAST, are added also
to indicate when rx_top and tx_top point at the packets with the
LAST_PACKET bit set, indicating the end of the phase.

(4) Calls are queued on the socket 'receive queue' rather than packets.
This means that we don't need have to invent dummy packets to queue to
indicate abnormal/terminal states and we don't have to keep metadata
packets (such as ABORTs) around

(5) The offset and length of a (sub)packet's content are now passed to
the verify_packet security op. This is currently expected to decrypt
the packet in place and validate it.

However, there's now nowhere to store the revised offset and length of
the actual data within the decrypted blob (there may be a header and
padding to skip) because an sk_buff may represent multiple packets, so
a locate_data security op is added to retrieve these details from the
sk_buff content when needed.

(6) recvmsg() now has to handle jumbo subpackets, where each subpacket is
individually secured and needs to be individually decrypted. The code
to do this is broken out into rxrpc_recvmsg_data() and shared with the
kernel API. It now iterates over the call's receive buffer rather
than walking the socket receive queue.

Additional changes:

(1) The timers are condensed to a single timer that is set for the soonest
of three timeouts (delayed ACK generation, DATA retransmission and
call lifespan).

(2) Transmission of ACK and ABORT packets is effected immediately from
process-context socket ops/kernel API calls that cause them instead of
them being punted off to a background work item. The data_ready
handler still has to defer to the background, though.

(3) A shutdown op is added to the AF_RXRPC socket so that the AFS
filesystem can shut down the socket and flush its own work items
before closing the socket to deal with any in-progress service calls.

Future additional changes that will need to be considered:

(1) Make sure that a call doesn't hog the front of the queue by receiving
data from the network as fast as userspace is consuming it to the
exclusion of other calls.

(2) Transmit delayed ACKs from within recvmsg() when we've consumed
sufficiently more packets to avoid the background work item needing to
run.

Signed-off-by: David Howells <dhowells@redhat.com>


# 5a42976d 06-Sep-2016 David Howells <dhowells@redhat.com>

rxrpc: Add tracepoint for working out where aborts happen

Add a tracepoint for working out where local aborts happen. Each
tracepoint call is labelled with a 3-letter code so that they can be
distinguished - and the DATA sequence number is added too where available.

rxrpc_kernel_abort_call() also takes a 3-letter code so that AFS can
indicate the circumstances when it aborts a call.

Signed-off-by: David Howells <dhowells@redhat.com>


# 278ac0cd 07-Sep-2016 David Howells <dhowells@redhat.com>

rxrpc: Cache the security index in the rxrpc_call struct

Cache the security index in the rxrpc_call struct so that we can get at it
even when the call has been disconnected and the connection pointer
cleared.

Signed-off-by: David Howells <dhowells@redhat.com>


# fff72429 07-Sep-2016 David Howells <dhowells@redhat.com>

rxrpc: Improve the call tracking tracepoint

Improve the call tracking tracepoint by showing more differentiation
between some of the put and get events, including:

(1) Getting and putting refs for the socket call user ID tree.

(2) Getting and putting refs for queueing and failing to queue the call
processor work item.

Note that these aren't necessarily used in this patch, but will be taken
advantage of in future patches.

An enum is added for the event subtype numbers rather than coding them
directly as decimal numbers and a table of 3-letter strings is provided
rather than a sequence of ?: operators.

Signed-off-by: David Howells <dhowells@redhat.com>


# 3dc20f09 04-Sep-2016 David Howells <dhowells@redhat.com>

rxrpc Move enum rxrpc_command to sendmsg.c

Move enum rxrpc_command to sendmsg.c as it's now only used in that file.

Signed-off-by: David Howells <dhowells@redhat.com>


# df423a4a 02-Sep-2016 David Howells <dhowells@redhat.com>

rxrpc: Rearrange net/rxrpc/sendmsg.c

Rearrange net/rxrpc/sendmsg.c to be in a more logical order. This makes it
easier to follow and eliminates forward declarations.

Signed-off-by: David Howells <dhowells@redhat.com>


# 0b58b8a1 02-Sep-2016 David Howells <dhowells@redhat.com>

rxrpc: Split sendmsg from packet transmission code

Split the sendmsg code from the packet transmission code (mostly to be
found in output.c).

Signed-off-by: David Howells <dhowells@redhat.com>