History log of /linux-master/io_uring/net.c
Revision Date Author Comments
# 4fe82aed 08-Apr-2024 Pavel Begunkov <asml.silence@gmail.com>

io_uring/net: restore msg_control on sendzc retry

cac9e4418f4cb ("io_uring/net: save msghdr->msg_control for retries")
reinstatiates msg_control before every __sys_sendmsg_sock(), since the
function can overwrite the value in msghdr. We need to do same for
zerocopy sendmsg.

Cc: stable@vger.kernel.org
Fixes: 493108d95f146 ("io_uring/net: zerocopy sendmsg")
Link: https://github.com/axboe/liburing/issues/1067
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/cc1d5d9df0576fa66ddad4420d240a98a020b267.1712596179.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# f3a640cc 15-Mar-2024 Jens Axboe <axboe@kernel.dk>

io_uring/net: ensure async prep handlers always initialize ->done_io

If we get a request with IOSQE_ASYNC set, then we first run the prep
async handlers. But if we then fail setting it up and want to post
a CQE with -EINVAL, we use ->done_io. This was previously guarded with
REQ_F_PARTIAL_IO, and the normal setup handlers do set it up before any
potential errors, but we need to cover the async setup too.

Fixes: 9817ad85899f ("io_uring/net: remove dependency on REQ_F_PARTIAL_IO for sr->done_io")
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 1af04699 08-Mar-2024 Pavel Begunkov <asml.silence@gmail.com>

io_uring/net: dedup io_recv_finish req completion

There are two block in io_recv_finish() completing the request, which we
can combine and remove jumping.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/0e338dcb33c88de83809fda021cba9e7c9681620.1709905727.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# e0e4ab52 08-Mar-2024 Pavel Begunkov <asml.silence@gmail.com>

io_uring: refactor DEFER_TASKRUN multishot checks

We disallow DEFER_TASKRUN multishots from running by io-wq, which is
checked by individual opcodes in the issue path. We can consolidate all
it in io_wq_submit_work() at the same time moving the checks out of the
hot path.

Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e492f0f11588bb5aa11d7d24e6f53b7c7628afdb.1709905727.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 3a96378e 08-Mar-2024 Pavel Begunkov <asml.silence@gmail.com>

io_uring: fix mshot io-wq checks

When checking for concurrent CQE posting, we're not only interested in
requests running from the poll handler but also strayed requests ended
up in normal io-wq execution. We're disallowing multishots in general
from io-wq, not only when they came in a certain way.

Cc: stable@vger.kernel.org
Fixes: 17add5cea2bba ("io_uring: force multishot CQEs into task context")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d8c5b36a39258036f93301cd60d3cd295e40653d.1709905727.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# d9b44188 06-Mar-2024 Jens Axboe <axboe@kernel.dk>

io_uring/net: add io_req_msg_cleanup() helper

For the fast inline path, we manually recycle the io_async_msghdr and
free the iovec, and then clear the REQ_F_NEED_CLEANUP flag to avoid
that needing doing in the slower path. We already do that in 2 spots, and
in preparation for adding more, add a helper and use it.

Signed-off-by: Jens Axboe <axboe@kernel.dk>


# fb6328bc 06-Mar-2024 Jens Axboe <axboe@kernel.dk>

io_uring/net: simplify msghd->msg_inq checking

Just check for larger than zero rather than check for non-zero and
not -1. This is easier to read, and also protects against any errants
< 0 values that aren't -1.

Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 186daf23 07-Mar-2024 Jens Axboe <axboe@kernel.dk>

io_uring/kbuf: rename REQ_F_PARTIAL_IO to REQ_F_BL_NO_RECYCLE

We only use the flag for this purpose, so rename it accordingly. This
further prevents various other use cases of it, keeping it clean and
consistent. Then we can also check it in one spot, when it's being
attempted recycled, and remove some dead code in io_kbuf_recycle_ring().

Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 9817ad85 07-Mar-2024 Jens Axboe <axboe@kernel.dk>

io_uring/net: remove dependency on REQ_F_PARTIAL_IO for sr->done_io

Ensure that prep handlers always initialize sr->done_io before any
potential failure conditions, and with that, we now it's always been
set even for the failure case.

With that, we don't need to use the REQ_F_PARTIAL_IO flag to gate on that.
Additionally, we should not overwrite req->cqe.res unless sr->done_io is
actually positive.

Signed-off-by: Jens Axboe <axboe@kernel.dk>


# deaef31b 07-Mar-2024 Jens Axboe <axboe@kernel.dk>

io_uring/net: correctly handle multishot recvmsg retry setup

If we loop for multishot receive on the initial attempt, and then abort
later on to wait for more, we miss a case where we should be copying the
io_async_msghdr from the stack to stable storage. This leads to the next
retry potentially failing, if the application had the msghdr on the
stack.

Cc: stable@vger.kernel.org
Fixes: 9bb66906f23e ("io_uring: support multishot in recvmsg")
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# b5311dbc 07-Mar-2024 Jens Axboe <axboe@kernel.dk>

io_uring/net: clear REQ_F_BL_EMPTY in the multishot retry handler

This flag should not be persistent across retries, so ensure we clear
it before potentially attemting a retry.

Fixes: c3f9109dbc9e ("io_uring/kbuf: flag request if buffer pool is empty after buffer pick")
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 8ede3db5 01-Mar-2024 Dan Carpenter <dan.carpenter@linaro.org>

io_uring/net: fix overflow check in io_recvmsg_mshot_prep()

The "controllen" variable is type size_t (unsigned long). Casting it
to int could lead to an integer underflow.

The check_add_overflow() function considers the type of the destination
which is type int. If we add two positive values and the result cannot
fit in an integer then that's counted as an overflow.

However, if we cast "controllen" to an int and it turns negative, then
negative values *can* fit into an int type so there is no overflow.

Good: 100 + (unsigned long)-4 = 96 <-- overflow
Bad: 100 + (int)-4 = 96 <-- no overflow

I deleted the cast of the sizeof() as well. That's not a bug but the
cast is unnecessary.

Fixes: 9b0fc3c054ff ("io_uring: fix types in io_recvmsg_multishot_overflow")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://lore.kernel.org/r/138bd2e2-ede8-4bcc-aa7b-f3d9de167a37@moroto.mountain
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 86bcacc9 01-Mar-2024 Muhammad Usama Anjum <usama.anjum@collabora.com>

io_uring/net: correct the type of variable

The namelen is of type int. It shouldn't be made size_t which is
unsigned. The signed number is needed for error checking before use.

Fixes: c55978024d12 ("io_uring/net: move receive multishot out of the generic msghdr path")
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Link: https://lore.kernel.org/r/20240301144349.2807544-1-usama.anjum@collabora.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# eb18c29d 25-Feb-2024 Jens Axboe <axboe@kernel.dk>

io_uring/net: move recv/recvmsg flags out of retry loop

The flags don't change, just intialize them once rather than every loop
for multishot.

Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 792060de 26-Feb-2024 Jens Axboe <axboe@kernel.dk>

io_uring/net: improve the usercopy for sendmsg/recvmsg

We're spending a considerable amount of the sendmsg/recvmsg time just
copying in the message header. And for provided buffers, the known
single entry iovec.

Be a bit smarter about it and enable/disable user access around our
copying. In a test case that does both sendmsg and recvmsg, the
runtime before this change (averaged over multiple runs, very stable
times however):

Kernel Time Diff
====================================
-git 4720 usec
-git+commit 4311 usec -8.7%

and looking at a profile diff, we see the following:

0.25% +9.33% [kernel.kallsyms] [k] _copy_from_user
4.47% -3.32% [kernel.kallsyms] [k] __io_msg_copy_hdr.constprop.0

where we drop more than 9% of _copy_from_user() time, and consequently
add time to __io_msg_copy_hdr() where the copies are now attributed to,
but with a net win of 6%.

In comparison, the same test case with send/recv runs in 3745 usec, which
is (expectedly) still quite a bit faster. But at least sendmsg/recvmsg is
now only ~13% slower, where it was ~21% slower before.

Signed-off-by: Jens Axboe <axboe@kernel.dk>


# c5597802 27-Feb-2024 Jens Axboe <axboe@kernel.dk>

io_uring/net: move receive multishot out of the generic msghdr path

Move the actual user_msghdr / compat_msghdr into the send and receive
sides, respectively, so we can move the uaddr receive handling into its
own handler, and ditto the multishot with buffer selection logic.

Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 52307ac4 19-Feb-2024 Jens Axboe <axboe@kernel.dk>

io_uring/net: unify how recvmsg and sendmsg copy in the msghdr

For recvmsg, we roll our own since we support buffer selections. This
isn't the case for sendmsg right now, but in preparation for doing so,
make the recvmsg copy helpers generic so we can call them from the
sendmsg side as well.

Signed-off-by: Jens Axboe <axboe@kernel.dk>


# a37ee9e1 14-Feb-2024 Jens Axboe <axboe@kernel.dk>

io_uring/net: fix multishot accept overflow handling

If we hit CQ ring overflow when attempting to post a multishot accept
completion, we don't properly save the result or return code. This
results in losing the accepted fd value.

Instead, we return the result from the poll operation that triggered
the accept retry. This is generally POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND
which is 0xc3, or 195, which looks like a valid file descriptor, but it
really has no connection to that.

Handle this like we do for other multishot completions - assign the
result, and return IOU_STOP_MULTISHOT to cancel any further completions
from this request when overflow is hit. This preserves the result, as we
should, and tells the application that the request needs to be re-armed.

Cc: stable@vger.kernel.org
Fixes: 515e26961295 ("io_uring: revert "io_uring fix multishot accept ordering"")
Link: https://github.com/axboe/liburing/issues/1062
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 72bd8025 01-Feb-2024 Jens Axboe <axboe@kernel.dk>

io_uring/net: fix sr->len for IORING_OP_RECV with MSG_WAITALL and buffers

If we use IORING_OP_RECV with provided buffers and pass in '0' as the
length of the request, the length is retrieved from the selected buffer.
If MSG_WAITALL is also set and we get a short receive, then we may hit
the retry path which decrements sr->len and increments the buffer for
a retry. However, the length is still zero at this point, which means
that sr->len now becomes huge and import_ubuf() will cap it to
MAX_RW_COUNT and subsequently return -EFAULT for the range as a whole.

Fix this by always assigning sr->len once the buffer has been selected.

Cc: stable@vger.kernel.org
Fixes: 7ba89d2af17a ("io_uring: ensure recv and recvmsg handle MSG_WAITALL correctly")
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 76b367a2 29-Jan-2024 Jens Axboe <axboe@kernel.dk>

io_uring/net: limit inline multishot retries

If we have multiple clients and some/all are flooding the receives to
such an extent that we can retry a LOT handling multishot receives, then
we can be starving some clients and hence serving traffic in an
imbalanced fashion.

Limit multishot retry attempts to some arbitrary value, whose only
purpose serves to ensure that we don't keep serving a single connection
for way too long. We default to 32 retries, which should be more than
enough to provide fairness, yet not so small that we'll spend too much
time requeuing rather than handling traffic.

Cc: stable@vger.kernel.org
Depends-on: 704ea888d646 ("io_uring/poll: add requeue return code from poll multishot handling")
Depends-on: 1e5d765a82f ("io_uring/net: un-indent mshot retry path in io_recv_finish()")
Depends-on: e84b01a880f6 ("io_uring/poll: move poll execution helpers higher up")
Fixes: b3fdea6ecb55 ("io_uring: multishot recv")
Fixes: 9bb66906f23e ("io_uring: support multishot in recvmsg")
Link: https://github.com/axboe/liburing/issues/1043
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 91e5d765 29-Jan-2024 Jens Axboe <axboe@kernel.dk>

io_uring/net: un-indent mshot retry path in io_recv_finish()

In preparation for putting some retry logic in there, have the done
path just skip straight to the end rather than have too much nesting
in here.

No functional changes in this patch.

Signed-off-by: Jens Axboe <axboe@kernel.dk>


# f8f9ab2d 03-Nov-2023 Jens Axboe <axboe@kernel.dk>

io_uring/net: ensure socket is marked connected on connect retry

io_uring does non-blocking connection attempts, which can yield some
unexpected results if a connect request is re-attempted by an an
application. This is equivalent to the following sync syscall sequence:

sock = socket(AF_INET, SOCK_STREAM | SOCK_NONBLOCK, IPPROTO_TCP);
connect(sock, &addr, sizeof(addr);

ret == -1 and errno == EINPROGRESS expected here. Now poll for POLLOUT
on sock, and when that returns, we expect the socket to be connected.
But if we follow that procedure with:

connect(sock, &addr, sizeof(addr));

you'd expect ret == -1 and errno == EISCONN here, but you actually get
ret == 0. If we attempt the connection one more time, then we get EISCON
as expected.

io_uring used to do this, but turns out that bluetooth fails with EBADFD
if you attempt to re-connect. Also looks like EISCONN _could_ occur with
this sequence.

Retain the ->in_progress logic, but work-around a potential EISCONN or
EBADFD error and only in those cases look at the sock_error(). This
should work in general and avoid the odd sequence of a repeated connect
request returning success when the socket is already connected.

This is all a side effect of the socket state being in a CONNECTING
state when we get EINPROGRESS, and only a re-connect or other related
operation will turn that into CONNECTED.

Cc: stable@vger.kernel.org
Fixes: 3fb1bd688172 ("io_uring/net: handle -EINPROGRESS correct for IORING_OP_CONNECT")
Link: https://github.com/axboe/liburing/issues/980
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# c21a8027 14-Sep-2023 Pavel Begunkov <asml.silence@gmail.com>

io_uring/net: fix iter retargeting for selected buf

When using selected buffer feature, io_uring delays data iter setup
until later. If io_setup_async_msg() is called before that it might see
not correctly setup iterator. Pre-init nr_segs and judge from its state
whether we repointing.

Cc: stable@vger.kernel.org
Reported-by: syzbot+a4c6e5ef999b68b26ed1@syzkaller.appspotmail.com
Fixes: 0455d4ccec548 ("io_uring: add POLL_FIRST support for send/sendmsg and recv/recvmsg")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/0000000000002770be06053c7757@google.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# b6b2bb58 11-Aug-2023 Pavel Begunkov <asml.silence@gmail.com>

io_uring: never overflow io_aux_cqe

Now all callers of io_aux_cqe() set allow_overflow to false, remove the
parameter and not allow overflowing auxilary multishot cqes.

When CQ is full the function callers and all multishot requests in
general are expected to complete the request. That prevents indefinite
in-background grows of the overflow list and let's the userspace to
handle the backlog at its own pace.

Resubmitting a request should also be faster than accounting a bunch of
overflows, so it should be better for perf when it happens, but a well
behaving userspace should be trying to avoid overflows in any case.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/bb20d14d708ea174721e58bb53786b0521e4dd6d.1691757663.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# b2e74db5 11-Aug-2023 Pavel Begunkov <asml.silence@gmail.com>

io_uring/net: don't overflow multishot recv

Don't allow overflowing multishot recv CQEs, it might get out of
hand, hurt performance, and in the worst case scenario OOM the task.

Cc: stable@vger.kernel.org
Fixes: b3fdea6ecb55c ("io_uring: multishot recv")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/0b295634e8f1b71aa764c984608c22d85f88f75c.1691757663.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 1bfed233 11-Aug-2023 Pavel Begunkov <asml.silence@gmail.com>

io_uring/net: don't overflow multishot accept

Don't allow overflowing multishot accept CQEs, we want to limit
the grows of the overflow list.

Cc: stable@vger.kernel.org
Fixes: 4e86a2c980137 ("io_uring: implement multishot mode for accept")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7d0d749649244873772623dd7747966f516fe6e2.1691757663.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# b65db921 27-Jun-2023 Jens Axboe <axboe@kernel.dk>

io_uring/net: use proper value for msg_inq

struct msghdr->msg_inq is a signed type, yet we attempt to store what
is essentially an unsigned bitmask in there. We only really need to know
if the field was stored or not, but let's use the proper type to avoid
any misunderstandings on what is being attempted here.

Link: https://lore.kernel.org/io-uring/CAHk-=wjKb24aSe6fE4zDH-eh8hr-FB9BbukObUVSMGOrsBHCRQ@mail.gmail.com/
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 26fed836 20-Jun-2023 Jens Axboe <axboe@kernel.dk>

io_uring/net: use the correct msghdr union member in io_sendmsg_copy_hdr

Rather than assign the user pointer to msghdr->msg_control, assign it
to msghdr->msg_control_user to make sparse happy. They are in a union
so the end result is the same, but let's avoid new sparse warnings and
squash this one.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202306210654.mDMcyMuB-lkp@intel.com/
Fixes: cac9e4418f4c ("io_uring/net: save msghdr->msg_control for retries")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 78d0d206 19-Jun-2023 Jens Axboe <axboe@kernel.dk>

io_uring/net: disable partial retries for recvmsg with cmsg

We cannot sanely handle partial retries for recvmsg if we have cmsg
attached. If we don't, then we'd just be overwriting the initial cmsg
header on retries. Alternatively we could increment and handle this
appropriately, but it doesn't seem worth the complication.

Move the MSG_WAITALL check into the non-multishot case while at it,
since MSG_WAITALL is explicitly disabled for multishot anyway.

Link: https://lore.kernel.org/io-uring/0b0d4411-c8fd-4272-770b-e030af6919a0@kernel.dk/
Cc: stable@vger.kernel.org # 5.10+
Reported-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# b1dc4920 19-Jun-2023 Jens Axboe <axboe@kernel.dk>

io_uring/net: clear msg_controllen on partial sendmsg retry

If we have cmsg attached AND we transferred partial data at least, clear
msg_controllen on retry so we don't attempt to send that again.

Cc: stable@vger.kernel.org # 5.10+
Fixes: cac9e4418f4c ("io_uring/net: save msghdr->msg_control for retries")
Reported-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# cac9e441 12-Jun-2023 Jens Axboe <axboe@kernel.dk>

io_uring/net: save msghdr->msg_control for retries

If the application sets ->msg_control and we have to later retry this
command, or if it got queued with IOSQE_ASYNC to begin with, then we
need to retain the original msg_control value. This is due to the net
stack overwriting this field with an in-kernel pointer, to copy it
in. Hitting that path for the second time will now fail the copy from
user, as it's attempting to copy from a non-user address.

Cc: stable@vger.kernel.org # 5.10+
Link: https://github.com/axboe/liburing/issues/880
Reported-and-tested-by: Marek Majkowski <marek@cloudflare.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# d86eaed1 07-Jun-2023 Jens Axboe <axboe@kernel.dk>

io_uring: cleanup io_aux_cqe() API

Everybody is passing in the request, so get rid of the io_ring_ctx and
explicit user_data pass-in. Both the ctx and user_data can be deduced
from the request at hand.

Signed-off-by: Jens Axboe <axboe@kernel.dk>


# b841b901 22-May-2023 David Howells <dhowells@redhat.com>

net: Declare MSG_SPLICE_PAGES internal sendmsg() flag

Declare MSG_SPLICE_PAGES, an internal sendmsg() flag, that hints to a
network protocol that it should splice pages from the source iterator
rather than copying the data if it can. This flag is added to a list that
is cleared by sendmsg syscalls on entry.

This is intended as a replacement for the ->sendpage() op, allowing a way
to splice in several multipage folios in one go.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# a2741c58 17-May-2023 Jens Axboe <axboe@kernel.dk>

io_uring/net: don't retry recvmsg() unnecessarily

If we're doing multishot receives, then we always end up doing two trips
through sock_recvmsg(). For protocols that sanely set msghdr->msg_inq,
then we don't need to waste time picking a new buffer and attempting a
new receive if there's nothing there.

Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 7d41bcb7 17-May-2023 Jens Axboe <axboe@kernel.dk>

io_uring/net: push IORING_CQE_F_SOCK_NONEMPTY into io_recv_finish()

Rather than have this logic in both io_recv() and io_recvmsg_multishot(),
push it into the handler they both call when finishing a receive
operation.

Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 88fc8b84 17-May-2023 Jens Axboe <axboe@kernel.dk>

io_uring/net: initalize msghdr->msg_inq to known value

We can't currently tell if ->msg_inq was set when we ask for msg_get_inq,
initialize it to -1U so we can tell apart if it was set and there's
no data left, or if it just wasn't set at all by the protocol.

Signed-off-by: Jens Axboe <axboe@kernel.dk>


# bf34e697 17-May-2023 Jens Axboe <axboe@kernel.dk>

io_uring/net: initialize struct msghdr more sanely for io_recv()

We only need to clear the input fields on the first invocation, not
when potentially doing a retry.

Signed-off-by: Jens Axboe <axboe@kernel.dk>


# de4f5fed 29-Mar-2023 Jens Axboe <axboe@kernel.dk>

iov_iter: add iter_iovec() helper

This returns a pointer to the current iovec entry in the iterator. Only
useful with ITER_IOVEC right now, but it prepares us to treat ITER_UBUF
and ITER_IOVEC identically for the first segment.

Rename struct iov_iter->iov to iov_iter->__iov to find any potentially
troublesome spots, and also to prevent anyone from adding new code that
accesses iter->iov directly.

Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 74e2e17e 20-Mar-2023 Jens Axboe <axboe@kernel.dk>

io_uring/net: avoid sending -ECONNABORTED on repeated connection requests

Since io_uring does nonblocking connect requests, if we do two repeated
ones without having a listener, the second will get -ECONNABORTED rather
than the expected -ECONNREFUSED. Treat -ECONNABORTED like a normal retry
condition if we're nonblocking, if we haven't already seen it.

Cc: stable@vger.kernel.org
Fixes: 3fb1bd688172 ("io_uring/net: handle -EINPROGRESS correct for IORING_OP_CONNECT")
Link: https://github.com/axboe/liburing/issues/828
Reported-by: Hui, Chunyang <sanqian.hcy@antgroup.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 7605c43d 24-Feb-2023 David Lamparter <equinox@diac24.net>

io_uring: remove MSG_NOSIGNAL from recvmsg

MSG_NOSIGNAL is not applicable for the receiving side, SIGPIPE is
generated when trying to write to a "broken pipe". AF_PACKET's
packet_recvmsg() does enforce this, giving back EINVAL when MSG_NOSIGNAL
is set - making it unuseable in io_uring's recvmsg.

Remove MSG_NOSIGNAL from io_recvmsg_prep().

Cc: stable@vger.kernel.org # v5.10+
Signed-off-by: David Lamparter <equinox@diac24.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230224150123.128346-1-equinox@diac24.net
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# aebb224f 27-Jan-2023 Dylan Yudaken <dylany@meta.com>

io_uring: for requests that require async, force it

Some requests require being run async as they do not support
non-blocking. Instead of trying to issue these requests, getting -EAGAIN
and then queueing them for async issue, rather just force async upfront.

Add WARN_ON_ONCE to make sure surprising code paths do not come up,
however in those cases the bug would end up being a blocking
io_uring_enter(2) which should not be critical.

Signed-off-by: Dylan Yudaken <dylany@meta.com>
Link: https://lore.kernel.org/r/20230127135227.3646353-3-dylany@meta.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 4b61152e 05-Jan-2023 Jens Axboe <axboe@kernel.dk>

io_uring: switch network send/recv to ITER_UBUF

This is more efficient than iter_iov.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
[merged to 6.2]
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# b00c51ef 22-Jan-2023 Jens Axboe <axboe@kernel.dk>

io_uring/net: cache provided buffer group value for multishot receives

If we're using ring provided buffers with multishot receive, and we end
up doing an io-wq based issue at some points that also needs to select
a buffer, we'll lose the initially assigned buffer group as
io_ring_buffer_select() correctly clears the buffer group list as the
issue isn't serialized by the ctx uring_lock. This is fine for normal
receives as the request puts the buffer and finishes, but for multishot,
we will re-arm and do further receives. On the next trigger for this
multishot receive, the receive will try and pick from a buffer group
whose value is the same as the buffer ID of the las receive. That is
obviously incorrect, and will result in a premature -ENOUFS error for
the receive even if we had available buffers in the correct group.

Cache the buffer group value at prep time, so we can restore it for
future receives. This only needs doing for the above mentioned case, but
just do it by default to keep it easier to read.

Cc: stable@vger.kernel.org
Fixes: b3fdea6ecb55 ("io_uring: multishot recv")
Fixes: 9bb66906f23e ("io_uring: support multishot in recvmsg")
Cc: Dylan Yudaken <dylany@meta.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 6c3e8955 19-Dec-2022 Pavel Begunkov <asml.silence@gmail.com>

io_uring/net: fix cleanup after recycle

Don't access io_async_msghdr io_netmsg_recycle(), it may be reallocated.

Cc: stable@vger.kernel.org
Fixes: 9bb66906f23e5 ("io_uring: support multishot in recvmsg")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/9e326f4ad4046ddadf15bf34bf3fa58c6372f6b5.1671461985.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 990a4de5 19-Dec-2022 Jens Axboe <axboe@kernel.dk>

io_uring/net: ensure compat import handlers clear free_iov

If we're not allocating the vectors because the count is below
UIO_FASTIOV, we still do need to properly clear ->free_iov to prevent
an erronous free of on-stack data.

Reported-by: Jiri Slaby <jirislaby@gmail.com>
Fixes: 4c17a496a7a0 ("io_uring/net: fix cleanup double free free_iov init")
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 17add5ce 06-Dec-2022 Pavel Begunkov <asml.silence@gmail.com>

io_uring: force multishot CQEs into task context

Multishot are posting CQEs outside of the normal request completion
path, which is usually done from within a task work handler. However, it
might be not the case when it's yet to be polled but has been punted to
io-wq. Make it abide ->task_complete and push it to the polling path
when executed by io-wq.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d7714aaff583096769a0f26e8e747759e556feb1.1670384893.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# de4eda9d 15-Sep-2022 Al Viro <viro@zeniv.linux.org.uk>

use less confusing names for iov_iter direction initializers

READ/WRITE proved to be actively confusing - the meanings are
"data destination, as used with read(2)" and "data source, as
used with write(2)", but people keep interpreting those as
"we read data from it" and "we write data to it", i.e. exactly
the wrong way.

Call them ITER_DEST and ITER_SOURCE - at least that is harder
to misinterpret...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 9b8c5475 24-Nov-2022 Dylan Yudaken <dylany@meta.com>

io_uring: add io_aux_cqe which allows deferred completion

Use the just introduced deferred post cqe completion state when possible
in io_aux_cqe. If not possible fallback to io_post_aux_cqe.

This introduces a complication because of allow_overflow. For deferred
completions we cannot know without locking the completion_lock if it will
overflow (and even if we locked it, another post could sneak in and cause
this cqe to be in overflow).
However since overflow protection is mostly a best effort defence in depth
to prevent infinite loops of CQEs for poll, just checking the overflow bit
is going to be good enough and will result in at most 16 (array size of
deferred cqes) overflows.

Suggested-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Dylan Yudaken <dylany@meta.com>
Link: https://lore.kernel.org/r/20221124093559.3780686-6-dylany@meta.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# e2ad599d 07-Nov-2022 Dylan Yudaken <dylany@meta.com>

io_uring: allow multishot recv CQEs to overflow

With commit aa1df3a360a0 ("io_uring: fix CQE reordering"), there are
stronger guarantees for overflow ordering. Specifically ensuring that
userspace will not receive out of order receive CQEs. Therefore this is
not needed any more for recv/recvmsg.

Signed-off-by: Dylan Yudaken <dylany@meta.com>
Link: https://lore.kernel.org/r/20221107125236.260132-4-dylany@meta.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 515e2696 21-Nov-2022 Dylan Yudaken <dylany@meta.com>

io_uring: revert "io_uring fix multishot accept ordering"

This is no longer needed after commit aa1df3a360a0 ("io_uring: fix CQE
reordering"), since all reordering is now taken care of.

This reverts commit cbd25748545c ("io_uring: fix multishot accept
ordering").

Signed-off-by: Dylan Yudaken <dylany@meta.com>
Link: https://lore.kernel.org/r/20221107125236.260132-2-dylany@meta.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# df730ec2 02-Nov-2022 Xinghui Li <korantli@tencent.com>

io_uring: fix two assignments in if conditions

Fixes two errors:

"ERROR: do not use assignment in if condition
130: FILE: io_uring/net.c:130:
+ if (!(issue_flags & IO_URING_F_UNLOCKED) &&

ERROR: do not use assignment in if condition
599: FILE: io_uring/poll.c:599:
+ } else if (!(issue_flags & IO_URING_F_UNLOCKED) &&"
reported by checkpatch.pl in net.c and poll.c .

Signed-off-by: Xinghui Li <korantli@tencent.com>
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/r/20221102082503.32236-1-korantwork@gmail.com
[axboe: style tweaks]
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 42385b02 04-Nov-2022 Pavel Begunkov <asml.silence@gmail.com>

io_uring/net: move mm accounting to a slower path

We can also move mm accounting to the extended callbacks. It removes a
few cycles from the hot path including skipping one function call and
setting io_req_task_complete as a callback directly. For user backed I/O
it shouldn't make any difference taking into considering atomic mm
accounting and page pinning.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1062f270273ad11c1b7b45ec59a6a317533d5e64.1667557923.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 40725d1b 04-Nov-2022 Pavel Begunkov <asml.silence@gmail.com>

io_uring: move zc reporting from the hot path

Add custom tw and notif callbacks on top of usual bits also handling zc
reporting. That moves it from the hot path.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/40de4a6409042478e1f35adc4912e23226cb1b5c.1667557923.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# e307e669 27-Oct-2022 Stefan Metzmacher <metze@samba.org>

io_uring/net: introduce IORING_SEND_ZC_REPORT_USAGE flag

It might be useful for applications to detect if a zero copy transfer with
SEND[MSG]_ZC was actually possible or not. The application can fallback to
plain SEND[MSG] in order to avoid the overhead of two cqes per request. Or
it can generate a log message that could indicate to an administrator that
no zero copy was possible and could explain degraded performance.

Cc: stable@vger.kernel.org # 6.1
Link: https://lore.kernel.org/io-uring/fb6a7599-8a9b-15e5-9b64-6cd9d01c6ff4@gmail.com/T/#m2b0d9df94ce43b0e69e6c089bdff0ce6babbdfaa
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/8945b01756d902f5d5b0667f20b957ad3f742e5e.1666895626.git.metze@samba.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 100d6b17 17-Nov-2022 Pavel Begunkov <asml.silence@gmail.com>

io_uring: fix multishot recv request leaks

Having REQ_F_POLLED set doesn't guarantee that the request is
executed as a multishot from the polling path. Fortunately for us, if
the code thinks it's multishot issue when it's not, it can only ask to
skip completion so leaking the request. Use issue_flags to mark
multipoll issues.

Cc: stable@vger.kernel.org
Fixes: 1300ebb20286b ("io_uring: multishot recv")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/37762040ba9c52b81b92a2f5ebfd4ee484088951.1668710222.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 91482864 17-Nov-2022 Pavel Begunkov <asml.silence@gmail.com>

io_uring: fix multishot accept request leaks

Having REQ_F_POLLED set doesn't guarantee that the request is
executed as a multishot from the polling path. Fortunately for us, if
the code thinks it's multishot issue when it's not, it can only ask to
skip completion so leaking the request. Use issue_flags to mark
multipoll issues.

Cc: stable@vger.kernel.org
Fixes: 390ed29b5e425 ("io_uring: add IORING_ACCEPT_MULTISHOT for accept")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7700ac57653f2823e30b34dc74da68678c0c5f13.1668710222.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# cc767e7c 21-Oct-2022 Pavel Begunkov <asml.silence@gmail.com>

io_uring/net: fail zc sendmsg when unsupported by socket

The previous patch fails zerocopy send requests for protocols that don't
support it, do the same for zerocopy sendmsg.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/0854e7bb4c3d810a48ec8b5853e2f61af36a0467.1666346426.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# edf81438 21-Oct-2022 Pavel Begunkov <asml.silence@gmail.com>

io_uring/net: fail zc send when unsupported by socket

If a protocol doesn't support zerocopy it will silently fall back to
copying. This type of behaviour has always been a source of troubles
so it's better to fail such requests instead.

Cc: <stable@vger.kernel.org> # 6.0
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/2db3c7f16bb6efab4b04569cd16e6242b40c5cb3.1666346426.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 3fb1bd68 04-Oct-2022 Jens Axboe <axboe@kernel.dk>

io_uring/net: handle -EINPROGRESS correct for IORING_OP_CONNECT

We treat EINPROGRESS like EAGAIN, but if we're retrying post getting
EINPROGRESS, then we just need to check the socket for errors and
terminate the request.

This was exposed on a bluetooth connection request which ends up
taking a while and hitting EINPROGRESS, and yields a CQE result of
-EBADFD because we're retrying a connect on a socket that is now
connected.

Cc: stable@vger.kernel.org
Fixes: 87f80d623c6c ("io_uring: handle connect -EINPROGRESS like -EAGAIN")
Link: https://github.com/axboe/liburing/issues/671
Reported-by: Aidan Sun <aidansun05@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 108893dd 29-Sep-2022 Pavel Begunkov <asml.silence@gmail.com>

io_uring/net: fix notif cqe reordering

send zc is not restricted to !IO_URING_F_UNLOCKED anymore and so
we can't use task-tw ordering trick to order notification cqes
with requests completions. In this case leave it alone and let
io_send_zc_cleanup() flush it.

Cc: stable@vger.kernel.org
Fixes: 53bdc88aac9a2 ("io_uring/notif: order notif vs send CQEs")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/0031f3a00d492e814a4a0935a2029a46d9c9ba06.1664486545.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 6f10ae8a 29-Sep-2022 Pavel Begunkov <asml.silence@gmail.com>

io_uring/net: don't update msg_name if not provided

io_sendmsg_copy_hdr() may clear msg->msg_name if the userspace didn't
provide it, we should retain NULL in this case.

Cc: stable@vger.kernel.org
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/97d49f61b5ec76d0900df658cfde3aa59ff22121.1664486545.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 3e4cb6eb 29-Sep-2022 Stefan Metzmacher <metze@samba.org>

io_uring/net: fix fast_iov assignment in io_setup_async_msg()

I hit a very bad problem during my tests of SENDMSG_ZC.
BUG(); in first_iovec_segment() triggered very easily.
The problem was io_setup_async_msg() in the partial retry case,
which seems to happen more often with _ZC.

iov_iter_iovec_advance() may change i->iov in order to have i->iov_offset
being only relative to the first element.

Which means kmsg->msg.msg_iter.iov is no longer the
same as kmsg->fast_iov.

But this would rewind the copy to be the start of
async_msg->fast_iov, which means the internal
state of sync_msg->msg.msg_iter is inconsitent.

I tested with 5 vectors with length like this 4, 0, 64, 20, 8388608
and got a short writes with:
- ret=2675244 min_ret=8388692 => remaining 5713448 sr->done_io=2675244
- ret=-EAGAIN => io_uring_poll_arm
- ret=4911225 min_ret=5713448 => remaining 802223 sr->done_io=7586469
- ret=-EAGAIN => io_uring_poll_arm
- ret=802223 min_ret=802223 => res=8388692

While this was easily triggered with SENDMSG_ZC (queued for 6.1),
it was a potential problem starting with 7ba89d2af17aa879dda30f5d5d3f152e587fc551
in 5.18 for IORING_OP_RECVMSG.
And also with 4c3c09439c08b03d9503df0ca4c7619c5842892e in 5.19
for IORING_OP_SENDMSG.

However 257e84a5377fbbc336ff563833a8712619acce56 introduced the critical
code into io_setup_async_msg() in 5.11.

Fixes: 7ba89d2af17aa ("io_uring: ensure recv and recvmsg handle MSG_WAITALL correctly")
Fixes: 257e84a5377fb ("io_uring: refactor sendmsg/recvmsg iov managing")
Cc: stable@vger.kernel.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b2e7be246e2fb173520862b0c7098e55767567a2.1664436949.git.metze@samba.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 04360d3e 28-Sep-2022 Pavel Begunkov <asml.silence@gmail.com>

io_uring/net: fix non-zc send with address

We're currently ignoring the dest address with non-zerocopy send because
even though we copy it from the userspace shortly after ->msg_name gets
zeroed. Move msghdr init earlier.

Fixes: 516e82f0e043a ("io_uring/net: support non-zerocopy sendto")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/176ced5e8568aa5d300ca899b7f05b303ebc49fd.1664409532.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 6ae91ac9 27-Sep-2022 Pavel Begunkov <asml.silence@gmail.com>

io_uring/net: don't skip notifs for failed requests

We currently only add a notification CQE when the send succeded, i.e.
cqe.res >= 0. However, it'd be more robust to do buffer notifications
for failed requests as well in case drivers decide do something fanky.

Always return a buffer notification after initial prep, don't hide it.
This behaviour is better aligned with documentation and the patch also
helps the userspace to respect it.

Cc: stable@vger.kernel.org # 6.0
Suggested-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/9c8bead87b2b980fcec441b8faef52188b4a6588.1664292100.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 4c17a496 26-Sep-2022 Pavel Begunkov <asml.silence@gmail.com>

io_uring/net: fix cleanup double free free_iov init

Having ->async_data doesn't mean it's initialised and previously we vere
relying on setting F_CLEANUP at the right moment. With zc sendmsg
though, we set F_CLEANUP early in prep when we alloc a notif and so we
may allocate async_data, fail in copy_msg_hdr() leaving
struct io_async_msghdr not initialised correctly but with F_CLEANUP
set, which causes a ->free_iov double free and probably other nastiness.

Always initialise ->free_iov. Also, now it might point to fast_iov when
fails, so avoid freeing it during cleanups.

Reported-by: syzbot+edfd15cd4246a3fc615a@syzkaller.appspotmail.com
Fixes: 493108d95f146 ("io_uring/net: zerocopy sendmsg")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# a75155fa 23-Sep-2022 Pavel Begunkov <asml.silence@gmail.com>

io_uring/net: fix UAF in io_sendrecv_fail()

We should not assume anything about ->free_iov just from
REQ_F_ASYNC_DATA but rather rely on REQ_F_NEED_CLEANUP, as we may
allocate ->async_data but failed init would leave the field in not
consistent state. The easiest solution is to remove removing
REQ_F_NEED_CLEANUP and so ->async_data dealloc from io_sendrecv_fail()
and let io_send_zc_cleanup() do the job. The catch here is that we also
need to prevent double notif flushing, just test it for NULL and zero
where it's needed.

BUG: KASAN: use-after-free in io_sendrecv_fail+0x3b0/0x3e0 io_uring/net.c:1221
Write of size 8 at addr ffff8880771b4080 by task syz-executor.3/30199

CPU: 1 PID: 30199 Comm: syz-executor.3 Not tainted 6.0.0-rc6-next-20220923-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/26/2022
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:284 [inline]
print_report+0x15e/0x45d mm/kasan/report.c:395
kasan_report+0xbb/0x1f0 mm/kasan/report.c:495
io_sendrecv_fail+0x3b0/0x3e0 io_uring/net.c:1221
io_req_complete_failed+0x155/0x1b0 io_uring/io_uring.c:873
io_drain_req io_uring/io_uring.c:1648 [inline]
io_queue_sqe_fallback.cold+0x29f/0x788 io_uring/io_uring.c:1931
io_submit_sqe io_uring/io_uring.c:2160 [inline]
io_submit_sqes+0x1180/0x1df0 io_uring/io_uring.c:2276
__do_sys_io_uring_enter+0xac6/0x2410 io_uring/io_uring.c:3216
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Fixes: c4c0009e0b56e ("io_uring/net: combine fail handlers")
Reported-by: syzbot+4c597a574a3f5a251bda@syzkaller.appspotmail.com
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/23ab8346e407ea50b1198a172c8a97e1cf22915b.1663945875.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 493108d9 20-Sep-2022 Pavel Begunkov <asml.silence@gmail.com>

io_uring/net: zerocopy sendmsg

Add a zerocopy version of sendmsg.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/6aabc4bdfc0ec78df6ec9328137e394af9d4e7ef.1663668091.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# c4c0009e 20-Sep-2022 Pavel Begunkov <asml.silence@gmail.com>

io_uring/net: combine fail handlers

Merge io_send_zc_fail() into io_sendrecv_fail(), saves a few lines of
code and some headache for following patch.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e0eba1d577413aef5602cd45f588b9230207082d.1663668091.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# b0e9b551 20-Sep-2022 Pavel Begunkov <asml.silence@gmail.com>

io_uring/net: rename io_sendzc()

Simple renaming of io_sendzc*() functions in preparatio to adding
a zerocopy sendmsg variant.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/265af46829e6076dd220011b1858dc3151969226.1663668091.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 516e82f0 20-Sep-2022 Pavel Begunkov <asml.silence@gmail.com>

io_uring/net: support non-zerocopy sendto

We have normal sends, but what is missing is sendto-like requests. Add
sendto() capabilities to IORING_OP_SEND by passing in addr just as we do
for IORING_OP_SEND_ZC.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/69fbd8b2cb830e57d1bf9ec351e9bf95c5b77e3f.1663668091.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 6ae61b7a 20-Sep-2022 Pavel Begunkov <asml.silence@gmail.com>

io_uring/net: refactor io_setup_async_addr

Instead of passing the right address into io_setup_async_addr() only
specify local on-stack storage and let the function infer where to grab
it from. It optimises out one local variable we have to deal with.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/6bfa9ab810d776853eb26ed59301e2536c3a5471.1663668091.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 5693bcce 20-Sep-2022 Pavel Begunkov <asml.silence@gmail.com>

io_uring/net: don't lose partial send_zc on fail

Partial zc send may end up in io_req_complete_failed(), which not only
would return invalid result but also mask out the notification leading
to lifetime issues.

Cc: stable@vger.kernel.org
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/5673285b5e83e6ceca323727b4ddaa584b5cc91e.1663668091.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 7e6b638e 20-Sep-2022 Pavel Begunkov <asml.silence@gmail.com>

io_uring/net: don't lose partial send/recv on fail

Just as with rw, partial send/recv may end up in
io_req_complete_failed() and loose the result, make sure we return the
number of bytes processed.

Cc: stable@vger.kernel.org
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/a4ff95897b5419356fca9ea55db91ac15b2975f9.1663668091.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# ac9e5784 08-Sep-2022 Pavel Begunkov <asml.silence@gmail.com>

io_uring/net: use io_sr_msg for sendzc

Reuse struct io_sr_msg for zerocopy sends, which is handy. There is
only one zerocopy specific field, namely .notif, and we have enough
space for it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/408c5b1b2d8869e1a12da5f5a78ed72cac112149.1662639236.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 0b048557 08-Sep-2022 Pavel Begunkov <asml.silence@gmail.com>

io_uring/net: refactor io_sr_msg types

In preparation for using struct io_sr_msg for zerocopy sends, clean up
types. First, flags can be u16 as it's provided by the userspace in u16
ioprio, as well as addr_len. This saves us 4 bytes. Also use unsigned
for size and done_io, both are as well limited to u32.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/42c2639d6385b8b2181342d2af3a42d3b1c5bcd2.1662639236.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# cd9021e8 08-Sep-2022 Pavel Begunkov <asml.silence@gmail.com>

io_uring/net: add non-bvec sg chunking callback

Add a sg_from_iter() for when we initiate non-bvec zerocopy sends, which
helps us to remove some extra steps from io_sg_from_iter(). The only
thing the new function has to do before giving control away to
__zerocopy_sg_from_iter() is to check if the skb has managed frags and
downgrade them if so.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/cda3dea0d36f7931f63a70f350130f085ac3f3dd.1662639236.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 6bf8ad25 08-Sep-2022 Pavel Begunkov <asml.silence@gmail.com>

io_uring/net: io_async_msghdr caches for sendzc

We already keep io_async_msghdr caches for normal send/recv requests,
use them also for zerocopy send.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/42fa615b6e0be25f47a685c35d7b5e4f1b03d348.1662639236.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 858c293e 08-Sep-2022 Pavel Begunkov <asml.silence@gmail.com>

io_uring/net: use async caches for async prep

send/recv have async_data caches but there are only used from within
issue handlers. Extend their use also to ->prep_async, should be handy
with links and IOSQE_ASYNC.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b9a2264b807582a97ed606c5bfcdc2399384e8a5.1662639236.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 95eafc74 08-Sep-2022 Pavel Begunkov <asml.silence@gmail.com>

io_uring/net: reshuffle error handling

We should prioritise send/recv retry cases over failures, they're more
important. Shuffle -ERESTARTSYS after we handled retries.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d9059691b30d0963b7269fa4a0c81ee7720555e6.1662639236.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# e3366e02 16-Sep-2022 Pavel Begunkov <asml.silence@gmail.com>

io_uring/net: fix zc fixed buf lifetime

Notifications usually outlive requests, so we need to pin buffers with
it by assigning a rsrc to it instead of the request.

Fixed: b48c312be05e8 ("io_uring/net: simplify zerocopy send user API")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/dd6406ff8a90887f2b36ed6205dac9fda17c1f35.1663366886.git.asml.silence@gmail.com
Reviewed-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 3c840053 08-Sep-2022 Pavel Begunkov <asml.silence@gmail.com>

io_uring/net: copy addr for zc on POLL_FIRST

Every time we return from an issue handler and expect the request to be
retried we should also setup it for async exec ourselves. Do that when
we return on IORING_RECVSEND_POLL_FIRST in io_sendzc(), otherwise it'll
re-read the address, which might be a surprise for the userspace.

Fixes: 092aeedb750a9 ("io_uring: allow to pass addr into sendzc")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ab1d0657890d6721339c56d2e161a4bba06f85d0.1662642013.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# b48c312b 01-Sep-2022 Pavel Begunkov <asml.silence@gmail.com>

io_uring/net: simplify zerocopy send user API

Following user feedback, this patch simplifies zerocopy send API. One of
the main complaints is that the current API is difficult with the
userspace managing notification slots, and then send retries with error
handling make it even worse.

Instead of keeping notification slots change it to the per-request
notifications model, which posts both completion and notification CQEs
for each request when any data has been sent, and only one CQE if it
fails. All notification CQEs will have IORING_CQE_F_NOTIF set and
IORING_CQE_F_MORE in completion CQEs indicates whether to wait a
notification or not.

IOSQE_CQE_SKIP_SUCCESS is disallowed with zerocopy sends for now.

This is less flexible, but greatly simplifies the user API and also the
kernel implementation. We reuse notif helpers in this patch, but in the
future there won't be need for keeping two requests.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/95287640ab98fc9417370afb16e310677c63e6ce.1662027856.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 57f33224 01-Sep-2022 Pavel Begunkov <asml.silence@gmail.com>

io_uring/notif: remove notif registration

We're going to remove the userspace exposed zerocopy notification API,
remove notification registration.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/6ff00b97be99869c386958a990593c9c31cf105b.1662027856.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# dfb58b17 26-Aug-2022 Pavel Begunkov <asml.silence@gmail.com>

io_uring/net: fix overexcessive retries

Length parameter of io_sg_from_iter() can be smaller than the iterator's
size, as it's with TCP, so when we set from->count at the end of the
function we truncate the iterator forcing TCP to return preliminary with
a short send. It affects zerocopy sends with large payload sizes and
leads to retries and possible request failures.

Fixes: 3ff1a0d395c00 ("io_uring: enable managed frags with register buffers")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/0bc0d5179c665b4ef5c328377c84c7a1f298467e.1661530037.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 581711c4 24-Aug-2022 Pavel Begunkov <asml.silence@gmail.com>

io_uring/net: save address for sendzc async execution

We usually copy all bits that a request needs from the userspace for
async execution, so the userspace can keep them on the stack. However,
send zerocopy violates this pattern for addresses and may reloads it
e.g. from io-wq. Save the address if any in ->async_data as usual.

Reported-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d7512d7aa9abcd36e9afe1a4d292a24cb2d157e5.1661342812.git.asml.silence@gmail.com
[axboe: fold in incremental fix]
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 986e263d 24-Aug-2022 Pavel Begunkov <asml.silence@gmail.com>

io_uring/net: fix indentation

Fix up indentation before we get complaints from tooling.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/bd5754e3764215ccd7fb04cd636ea9167aaa275d.1661342812.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 5a848b7c 24-Aug-2022 Pavel Begunkov <asml.silence@gmail.com>

io_uring/net: fix zc send link failing

Failed requests should be marked with req_set_fail(), so links and cqe
skipping work correctly, which is missing in io_sendzc(). Note,
io_sendzc() return IOU_OK on failure, so the core code won't do the
cleanup for us.

Fixes: 06a5464be84e4 ("io_uring: wire send zc request type")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e47d46fda9db30154ce66a549bb0d3380b780520.1661342812.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 3f743e9b 17-Aug-2022 Pavel Begunkov <asml.silence@gmail.com>

io_uring/net: use right helpers for async_data

There is another spot where we check ->async_data directly instead of
using req_has_async_data(), which is the way to do it, fix it up.

Fixes: 43e0bbbd0b0e3 ("io_uring: add netmsg cache")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/42f33b9a81dd6ae65dda92f0372b0ff82d548517.1660822636.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 86dc8f23 15-Aug-2022 Pavel Begunkov <asml.silence@gmail.com>

io_uring/net: improve zc addr import error handling

We may account memory to a memcg of a request that didn't even got to
the network layer. It's not a bug as it'll be routinely cleaned up on
flush, but it might be confusing for the userspace.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b8aae61f4c3ddc4da97c1da876bb73871f352d50.1660566179.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 06360426 15-Aug-2022 Pavel Begunkov <asml.silence@gmail.com>

io_uring/net: use right helpers for async recycle

We have a helper that checks for whether a request contains anything in
->async_data or not, namely req_has_async_data(). It's better to use it
as it might have some extra considerations.

Fixes: 43e0bbbd0b0e3 ("io_uring: add netmsg cache")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b7414da4e7c3c32c31fc02dfd1355af4ccf4ca5f.1660566179.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# f2ccb5ae 11-Aug-2022 Stefan Metzmacher <metze@samba.org>

io_uring: make io_kiocb_to_cmd() typesafe

We need to make sure (at build time) that struct io_cmd_data is not
casted to a structure that's larger.

Signed-off-by: Stefan Metzmacher <metze@samba.org>
Link: https://lore.kernel.org/r/c024cdf25ae19fc0319d4180e2298bade8ed17b8.1660201408.git.metze@samba.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# d1f6222c 05-Aug-2022 Dylan Yudaken <dylany@fb.com>

io_uring: fix io_recvmsg_prep_multishot sparse warnings

Fix casts missing the __user parts. This seemed to only cause errors on
the alpha build, or if checked with sparse, but it was definitely an
oversight.

Reported-by: kernel test robot <lkp@intel.com>
Fixes: 9bb66906f23e ("io_uring: support multishot in recvmsg")
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220805115450.3921352-1-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 4a933e62 04-Aug-2022 Pavel Begunkov <asml.silence@gmail.com>

io_uring/net: send retry for zerocopy

io_uring handles short sends/recvs for stream sockets when MSG_WAITALL
is set, however new zerocopy send is inconsistent in this regard, which
might be confusing. Handle short sends.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b876a4838597d9bba4f3215db60d72c33c448ad0.1659622472.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 14b146b6 27-Jul-2022 Pavel Begunkov <asml.silence@gmail.com>

io_uring: notification completion optimisation

We want to use all optimisations that we have for io_uring requests like
completion batching, memory caching and more but for zc notifications.
Fortunately, notification perfectly fit the request model so we can
overlay them onto struct io_kiocb and use all the infratructure.

Most of the fields of struct io_notif natively fits into io_kiocb, so we
replace struct io_notif with struct io_kiocb carrying struct
io_notif_data in the cmd cache line. Then we adapt io_alloc_notif() to
use io_alloc_req()/io_alloc_req_refill(), and kill leftovers of hand
coded caching. __io_notif_complete_tw() is converted to use io_uring's
tw infra.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/9e010125175e80baf51f0ca63bdc7cc6a4a9fa56.1658913593.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 293402e5 25-Jul-2022 Pavel Begunkov <asml.silence@gmail.com>

io_uring/net: use unsigned for flags

Use unsigned int type for msg flags.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/5cfaed13d3191337b14b8664ca68b515d9e2d1b4.1658742118.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 6a9ce66f 25-Jul-2022 Pavel Begunkov <asml.silence@gmail.com>

io_uring/net: make page accounting more consistent

Make network page accounting more consistent with how buffer
registration is working, i.e. account all memory to ctx->user.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/4aacfe64bbb81b27f9ecf5d5c219c69a07e5aa56.1658742118.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 2e32ba56 25-Jul-2022 Pavel Begunkov <asml.silence@gmail.com>

io_uring/net: checks errors of zc mem accounting

mm_account_pinned_pages() may fail, don't ignore the return value.

Fixes: e29e3bd4b968 ("io_uring: account locked pages for non-fixed zc")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/dae0542ed8e6706071bb83ad3e7ad6a70d207fd9.1658742118.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 3ff1a0d3 12-Jul-2022 Pavel Begunkov <asml.silence@gmail.com>

io_uring: enable managed frags with register buffers

io_uring's registered buffers infra has a good performant way of pinning
pages, so let's use SKBFL_MANAGED_FRAG_REFS when our requests are purely
register buffer backed.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/278731d3f20caf346cfc025fbee0b4c9ee4ed751.1657643355.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 63809137 12-Jul-2022 Pavel Begunkov <asml.silence@gmail.com>

io_uring: flush notifiers after sendzc

Allow to flush notifiers as a part of sendzc request by setting
IORING_SENDZC_FLUSH flag. When the sendzc request succeedes it will
flush the used [active] notifier.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e0b4d9a6797e2fd6092824fe42953db7a519bbc8.1657643355.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 10c7d33e 12-Jul-2022 Pavel Begunkov <asml.silence@gmail.com>

io_uring: sendzc with fixed buffers

Allow zerocopy sends to use fixed buffers. There is an optimisation for
this case, the network layer don't need to reference the pages, see
SKBFL_MANAGED_FRAG_REFS, so io_uring have to ensure validity of fixed
buffers until the notifier is released.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e1d8bd1b5934e541d90c1824eb4020ae3f5f43f3.1657643355.git.asml.silence@gmail.com
[axboe: fold in 32-bit pointer cast warning fix]
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 092aeedb 12-Jul-2022 Pavel Begunkov <asml.silence@gmail.com>

io_uring: allow to pass addr into sendzc

Allow to specify an address to zerocopy sends making it more like
sendto(2).

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/70417a8f7c5b51ab454690bae08adc0c187f89e8.1657643355.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# e29e3bd4 12-Jul-2022 Pavel Begunkov <asml.silence@gmail.com>

io_uring: account locked pages for non-fixed zc

Fixed buffers are RLIMIT_MEMLOCK accounted, however it doesn't cover iovec
based zerocopy sends. Do the accounting on the io_uring side.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/19b6e3975440f59f1f6199c7ee7acf977b4eecdc.1657643355.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 06a5464b 12-Jul-2022 Pavel Begunkov <asml.silence@gmail.com>

io_uring: wire send zc request type

Add a new io_uring opcode IORING_OP_SENDZC. The main distinction from
IORING_OP_SEND is that the user should specify a notification slot
index in sqe::notification_idx and the buffers are safe to reuse only
when the used notification is flushed and completes.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/a80387c6a68ce9cf99b3b6ef6f71068468761fb7.1657643355.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# e02b6651 12-Jul-2022 Pavel Begunkov <asml.silence@gmail.com>

io_uring: initialise msghdr::msg_ubuf

Initialise newly added ->msg_ubuf in io_recv() and io_send().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b8f9f263875a4a36e7f26cc5d55ebe315308f57d.1657643355.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 4f6a94d3 15-Jul-2022 Jens Axboe <axboe@kernel.dk>

net: fix compat pointer in get_compat_msghdr()

A previous change enabled external users to copy the data before
calling __get_compat_msghdr(), but didn't modify get_compat_msghdr() or
__io_compat_recvmsg_copy_hdr() to take that into account. They are both
stil passing in the __user pointer rather than the copied version.

Ensure we pass in the kernel struct, not the pointer to the user data.

Link: https://lore.kernel.org/all/46439555-644d-08a1-7d66-16f8f9a320f0@samsung.com/
Fixes: 1a3e4e94a1b9 ("net: copy from user before calling __get_compat_msghdr")
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 9b0fc3c0 15-Jul-2022 Dylan Yudaken <dylany@fb.com>

io_uring: fix types in io_recvmsg_multishot_overflow

io_recvmsg_multishot_overflow had incorrect types on non x64 system.
But also it had an unnecessary INT_MAX check, which could just be done
by changing the type of the accumulator to int (also simplifying the
casts).

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: a8b38c4ce724 ("io_uring: support multishot in recvmsg")
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220715130252.610639-1-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 9bb66906 14-Jul-2022 Dylan Yudaken <dylany@fb.com>

io_uring: support multishot in recvmsg

Similar to multishot recv, this will require provided buffers to be
used. However recvmsg is much more complex than recv as it has multiple
outputs. Specifically flags, name, and control messages.

Support this by introducing a new struct io_uring_recvmsg_out with 4
fields. namelen, controllen and flags match the similar out fields in
msghdr from standard recvmsg(2), payloadlen is the length of the payload
following the header.
This struct is placed at the start of the returned buffer. Based on what
the user specifies in struct msghdr, the next bytes of the buffer will be
name (the next msg_namelen bytes), and then control (the next
msg_controllen bytes). The payload will come at the end. The return value
in the CQE is the total used size of the provided buffer.

Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220714110258.1336200-4-dylany@fb.com
[axboe: style fixups, see link]
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 72c531f8 14-Jul-2022 Dylan Yudaken <dylany@fb.com>

net: copy from user before calling __get_compat_msghdr

this is in preparation for multishot receive from io_uring, where it needs
to have access to the original struct user_msghdr.

functionally this should be a no-op.

Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220714110258.1336200-3-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 7fa875b8 14-Jul-2022 Dylan Yudaken <dylany@fb.com>

net: copy from user before calling __copy_msghdr

this is in preparation for multishot receive from io_uring, where it needs
to have access to the original struct user_msghdr.

functionally this should be a no-op.

Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220714110258.1336200-2-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 6d2f75a0 08-Jul-2022 Dylan Yudaken <dylany@fb.com>

io_uring: support 0 length iov in buffer select in compat

Match up work done in "io_uring: allow iov_len = 0 for recvmsg and buffer
select", but for compat code path.

Fixes: a68caad69ce5 ("io_uring: allow iov_len = 0 for recvmsg and buffer select")
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220708181838.1495428-3-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# e2df2ccb 08-Jul-2022 Dylan Yudaken <dylany@fb.com>

io_uring: fix multishot ending when not polled

If multishot is not actually polling then return IOU_OK rather than the
result.
If the result was > 0 this will confuse things further up the callstack
which expect a return <= 0.

Fixes: 1300ebb20286 ("io_uring: multishot recv")
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220708181838.1495428-2-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 43e0bbbd 07-Jul-2022 Jens Axboe <axboe@kernel.dk>

io_uring: add netmsg cache

For recvmsg/sendmsg, if they don't complete inline, we currently need
to allocate a struct io_async_msghdr for each request. This is a
somewhat large struct.

Hook up sendmsg/recvmsg to use the io_alloc_cache. This reduces the
alloc + free overhead considerably, yielding 4-5% of extra performance
running netbench.

Signed-off-by: Jens Axboe <axboe@kernel.dk>


# cf0dd952 04-Jul-2022 Dylan Yudaken <dylany@fb.com>

io_uring: disable multishot recvmsg

recvmsg has semantics that do not make it trivial to extend to
multishot. Specifically it has user pointers and returns data in the
original parameter. In order to make this API useful these will need to be
somehow included with the provided buffers.

For now remove multishot for recvmsg as it is not useful.

Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220704140106.200167-1-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# b3fdea6e 30-Jun-2022 Dylan Yudaken <dylany@fb.com>

io_uring: multishot recv

Support multishot receive for io_uring.
Typical server applications will run a loop where for each recv CQE it
requeues another recv/recvmsg.

This can be simplified by using the existing multishot functionality
combined with io_uring's provided buffers.
The API is to add the IORING_RECV_MULTISHOT flag to the SQE. CQEs will
then be posted (with IORING_CQE_F_MORE flag set) when data is available
and is read. Once an error occurs or the socket ends, the multishot will
be removed and a completion without IORING_CQE_F_MORE will be posted.

The benefit to this is that the recv is much more performant.
* Subsequent receives are queued up straight away without requiring the
application to finish a processing loop.
* If there are more data in the socket (sat the provided buffer size is
smaller than the socket buffer) then the data is immediately
returned, improving batching.
* Poll is only armed once and reused, saving CPU cycles

Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220630091231.1456789-11-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# cbd25748 30-Jun-2022 Dylan Yudaken <dylany@fb.com>

io_uring: fix multishot accept ordering

Similar to multishot poll, drop multishot accept when CQE overflow occurs.

Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220630091231.1456789-10-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 52120f0f 30-Jun-2022 Dylan Yudaken <dylany@fb.com>

io_uring: add allow_overflow to io_post_aux_cqe

Some use cases of io_post_aux_cqe would not want to overflow as is, but
might want to change the flags/result. For example multishot receive
requires in order CQE, and so if there is an overflow it would need to
stop receiving until the overflow is taken care of.

Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220630091231.1456789-8-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# d4e097da 30-Jun-2022 Dylan Yudaken <dylany@fb.com>

io_uring: recycle buffers on error

Rather than passing an error back to the user with a buffer attached,
recycle the buffer immediately.

Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220630091231.1456789-5-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 5702196e 30-Jun-2022 Dylan Yudaken <dylany@fb.com>

io_uring: allow iov_len = 0 for recvmsg and buffer select

When using BUFFER_SELECT there is no technical requirement that the user
actually provides iov, and this removes one copy_from_user call.

So allow iov_len to be 0.

Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220630091231.1456789-4-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 27a9d66f 16-Jun-2022 Pavel Begunkov <asml.silence@gmail.com>

io_uring: kill extra io_uring_types.h includes

io_uring/io_uring.h already includes io_uring_types.h, no need to
include it every time. Kill it in a bunch of places, it prepares us for
following patches.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/94d8c943fbe0ef949981c508ddcee7fc1c18850f.1655384063.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# d245bca6 17-Jun-2022 Pavel Begunkov <asml.silence@gmail.com>

io_uring: don't expose io_fill_cqe_aux()

Deduplicate some code and add a helper for filling an aux CQE, locking
and notification.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b7c6557c8f9dc5c4cfb01292116c682a0ff61081.1655455613.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 3b77495a 13-Jun-2022 Jens Axboe <axboe@kernel.dk>

io_uring: split provided buffers handling into its own file

Move both the opcodes related to it, and the internals code dealing with
it.

Signed-off-by: Jens Axboe <axboe@kernel.dk>


# f9ead18c 25-May-2022 Jens Axboe <axboe@kernel.dk>

io_uring: split network related opcodes into its own file

While at it, convert the handlers to just use io_eopnotsupp_prep()
if CONFIG_NET isn't set.

Signed-off-by: Jens Axboe <axboe@kernel.dk>