History log of /linux-master/io_uring/kbuf.c
Revision Date Author Comments
# 561e4f94 02-Apr-2024 Jens Axboe <axboe@kernel.dk>

io_uring/kbuf: hold io_buffer_list reference over mmap

If we look up the kbuf, ensure that it doesn't get unregistered until
after we're done with it. Since we're inside mmap, we cannot safely use
the io_uring lock. Rely on the fact that we can lookup the buffer list
under RCU now and grab a reference to it, preventing it from being
unregistered until we're done with it. The lookup returns the
io_buffer_list directly with it referenced.

Cc: stable@vger.kernel.org # v6.4+
Fixes: 5cf4f52e6d8a ("io_uring: free io_buffer_list entries via RCU")
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 6b69c4ab 15-Mar-2024 Jens Axboe <axboe@kernel.dk>

io_uring/kbuf: protect io_buffer_list teardown with a reference

No functional changes in this patch, just in preparation for being able
to keep the buffer list alive outside of the ctx->uring_lock.

Cc: stable@vger.kernel.org # v6.4+
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 3b80cff5 14-Mar-2024 Jens Axboe <axboe@kernel.dk>

io_uring/kbuf: get rid of bl->is_ready

Now that xarray is being exclusively used for the buffer_list lookup,
this check is no longer needed. Get rid of it and the is_ready member.

Cc: stable@vger.kernel.org # v6.4+
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 09ab7eff 14-Mar-2024 Jens Axboe <axboe@kernel.dk>

io_uring/kbuf: get rid of lower BGID lists

Just rely on the xarray for any kind of bgid. This simplifies things, and
it really doesn't bring us much, if anything.

Cc: stable@vger.kernel.org # v6.4+
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 9219e4a9 13-Mar-2024 Pavel Begunkov <asml.silence@gmail.com>

io_uring/kbuf: rename is_mapped

In buffer lists we have ->is_mapped as well as ->is_mmap, it's
pretty hard to stay sane double checking which one means what,
and in the long run there is a high chance of an eventual bug.
Rename ->is_mapped into ->is_buf_ring.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/c4838f4d8ad506ad6373f1c305aee2d2c1a89786.1710343154.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 186daf23 07-Mar-2024 Jens Axboe <axboe@kernel.dk>

io_uring/kbuf: rename REQ_F_PARTIAL_IO to REQ_F_BL_NO_RECYCLE

We only use the flag for this purpose, so rename it accordingly. This
further prevents various other use cases of it, keeping it clean and
consistent. Then we can also check it in one spot, when it's being
attempted recycled, and remove some dead code in io_kbuf_recycle_ring().

Signed-off-by: Jens Axboe <axboe@kernel.dk>


# c3f9109d 19-Feb-2024 Jens Axboe <axboe@kernel.dk>

io_uring/kbuf: flag request if buffer pool is empty after buffer pick

Normally we do an extra roundtrip for retries even if the buffer pool has
depleted, as we don't check that upfront. Rather than add this check, have
the buffer selection methods mark the request with REQ_F_BL_EMPTY if the
used buffer group is out of buffers after this selection. This is very
cheap to do once we're all the way inside there anyway, and it gives the
caller a chance to make better decisions on how to proceed.

For example, recv/recvmsg multishot could check this flag when it
decides whether to keep receiving or not.

Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 8435c6f3 29-Jan-2024 Jens Axboe <axboe@kernel.dk>

io_uring/kbuf: cleanup passing back cflags

We have various functions calculating the CQE cflags we need to pass
back, but it's all the same everywhere. Make a number of the putting
functions void, and just have the two main helps for this, io_put_kbuf()
and io_put_kbuf_comp() calculate the actual mask and pass it back.

While at it, cleanup how we put REQ_F_BUFFER_RING buffers. Before
this change, we would call into __io_put_kbuf() only to go right back
in to the header defined functions. As clearing this type of buffer
is just re-assigning the buf_index and incrementing the head, this
is very wasteful.

Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 95041b93 28-Jan-2024 Jens Axboe <axboe@kernel.dk>

io_uring: add io_file_can_poll() helper

This adds a flag to avoid dipping dereferencing file and then f_op to
figure out if the file has a poll handler defined or not. We generally
call this at least twice for networked workloads, and if using ring
provided buffers, we do it on every buffer selection. Particularly the
latter is troublesome, as it's otherwise a very fast operation.

Signed-off-by: Jens Axboe <axboe@kernel.dk>


# d293b1a8 21-Dec-2023 Jens Axboe <axboe@kernel.dk>

io_uring/kbuf: add method for returning provided buffer ring head

The tail of the provided ring buffer is shared between the kernel and
the application, but the head is private to the kernel as the
application doesn't need to see it. However, this also prevents the
application from knowing how many buffers the kernel has consumed.
Usually this is fine, as the information is inherently racy in that
the kernel could be consuming buffers continually, but for cleanup
purposes it may be relevant to know how many buffers are still left
in the ring.

Add IORING_REGISTER_PBUF_STATUS which will return status for a given
provided buffer ring. Right now it just returns the head, but space
is reserved for more information later in, if needed.

Link: https://github.com/axboe/liburing/discussions/1020
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 9865346b 05-Dec-2023 Jens Axboe <axboe@kernel.dk>

io_uring/kbuf: check for buffer list readiness after NULL check

Move the buffer list 'is_ready' check below the validity check for
the buffer list for a given group.

Fixes: 5cf4f52e6d8a ("io_uring: free io_buffer_list entries via RCU")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# e53f7b54 05-Dec-2023 Dan Carpenter <dan.carpenter@linaro.org>

io_uring/kbuf: Fix an NULL vs IS_ERR() bug in io_alloc_pbuf_ring()

The io_mem_alloc() function returns error pointers, not NULL. Update
the check accordingly.

Fixes: b10b73c102a2 ("io_uring/kbuf: recycle freed mapped buffer ring entries")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://lore.kernel.org/r/5ed268d3-a997-4f64-bd71-47faa92101ab@moroto.mountain
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 5cf4f52e 27-Nov-2023 Jens Axboe <axboe@kernel.dk>

io_uring: free io_buffer_list entries via RCU

mmap_lock nests under uring_lock out of necessity, as we may be doing
user copies with uring_lock held. However, for mmap of provided buffer
rings, we attempt to grab uring_lock with mmap_lock already held from
do_mmap(). This makes lockdep, rightfully, complain:

WARNING: possible circular locking dependency detected
6.7.0-rc1-00009-gff3337ebaf94-dirty #4438 Not tainted
------------------------------------------------------
buf-ring.t/442 is trying to acquire lock:
ffff00020e1480a8 (&ctx->uring_lock){+.+.}-{3:3}, at: io_uring_validate_mmap_request.isra.0+0x4c/0x140

but task is already holding lock:
ffff0000dc226190 (&mm->mmap_lock){++++}-{3:3}, at: vm_mmap_pgoff+0x124/0x264

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&mm->mmap_lock){++++}-{3:3}:
__might_fault+0x90/0xbc
io_register_pbuf_ring+0x94/0x488
__arm64_sys_io_uring_register+0x8dc/0x1318
invoke_syscall+0x5c/0x17c
el0_svc_common.constprop.0+0x108/0x130
do_el0_svc+0x2c/0x38
el0_svc+0x4c/0x94
el0t_64_sync_handler+0x118/0x124
el0t_64_sync+0x168/0x16c

-> #0 (&ctx->uring_lock){+.+.}-{3:3}:
__lock_acquire+0x19a0/0x2d14
lock_acquire+0x2e0/0x44c
__mutex_lock+0x118/0x564
mutex_lock_nested+0x20/0x28
io_uring_validate_mmap_request.isra.0+0x4c/0x140
io_uring_mmu_get_unmapped_area+0x3c/0x98
get_unmapped_area+0xa4/0x158
do_mmap+0xec/0x5b4
vm_mmap_pgoff+0x158/0x264
ksys_mmap_pgoff+0x1d4/0x254
__arm64_sys_mmap+0x80/0x9c
invoke_syscall+0x5c/0x17c
el0_svc_common.constprop.0+0x108/0x130
do_el0_svc+0x2c/0x38
el0_svc+0x4c/0x94
el0t_64_sync_handler+0x118/0x124
el0t_64_sync+0x168/0x16c

From that mmap(2) path, we really just need to ensure that the buffer
list doesn't go away from underneath us. For the lower indexed entries,
they never go away until the ring is freed and we can always sanely
reference those as long as the caller has a file reference. For the
higher indexed ones in our xarray, we just need to ensure that the
buffer list remains valid while we return the address of it.

Free the higher indexed io_buffer_list entries via RCU. With that we can
avoid needing ->uring_lock inside mmap(2), and simply hold the RCU read
lock around the buffer list lookup and address check.

To ensure that the arrayed lookup either returns a valid fully formulated
entry via RCU lookup, add an 'is_ready' flag that we access with store
and release memory ordering. This isn't needed for the xarray lookups,
but doesn't hurt either. Since this isn't a fast path, retain it across
both types. Similarly, for the allocated array inside the ctx, ensure
we use the proper load/acquire as setup could in theory be running in
parallel with mmap.

While in there, add a few lockdep checks for documentation purposes.

Cc: stable@vger.kernel.org
Fixes: c56e022c0a27 ("io_uring: add support for user mapped provided buffer ring")
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 07d6063d 27-Nov-2023 Jens Axboe <axboe@kernel.dk>

io_uring/kbuf: prune deferred locked cache when tearing down

We used to just use our page list for final teardown, which would ensure
that we got all the buffers, even the ones that were not on the normal
cached list. But while moving to slab for the io_buffers, we know only
prune this list, not the deferred locked list that we have. This can
cause a leak of memory, if the workload ends up using the intermediate
locked list.

Fix this by always pruning both lists when tearing down.

Fixes: b3a4dbc89d40 ("io_uring/kbuf: Use slab for struct io_buffer objects")
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# b10b73c1 28-Nov-2023 Jens Axboe <axboe@kernel.dk>

io_uring/kbuf: recycle freed mapped buffer ring entries

Right now we stash any potentially mmap'ed provided ring buffer range
for freeing at release time, regardless of when they get unregistered.
Since we're keeping track of these ranges anyway, keep track of their
registration state as well, and use that to recycle ranges when
appropriate rather than always allocate new ones.

The lookup is a basic scan of entries, checking for the best matching
free entry.

Fixes: c392cbecd8ec ("io_uring/kbuf: defer release of mapped buffer rings")
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# c392cbec 27-Nov-2023 Jens Axboe <axboe@kernel.dk>

io_uring/kbuf: defer release of mapped buffer rings

If a provided buffer ring is setup with IOU_PBUF_RING_MMAP, then the
kernel allocates the memory for it and the application is expected to
mmap(2) this memory. However, io_uring uses remap_pfn_range() for this
operation, so we cannot rely on normal munmap/release on freeing them
for us.

Stash an io_buf_free entry away for each of these, if any, and provide
a helper to free them post ->release().

Cc: stable@vger.kernel.org
Fixes: c56e022c0a27 ("io_uring: add support for user mapped provided buffer ring")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 89d528ba 06-Nov-2023 Dylan Yudaken <dyudaken@gmail.com>

io_uring: indicate if io_kbuf_recycle did recycle anything

It can be useful to know if io_kbuf_recycle did actually recycle the
buffer on the request, or if it left the request alone.

Signed-off-by: Dylan Yudaken <dyudaken@gmail.com>
Link: https://lore.kernel.org/r/20231106203909.197089-2-dyudaken@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# b3a4dbc8 04-Oct-2023 Gabriel Krisman Bertazi <krisman@suse.de>

io_uring/kbuf: Use slab for struct io_buffer objects

The allocation of struct io_buffer for metadata of provided buffers is
done through a custom allocator that directly gets pages and
fragments them. But, slab would do just fine, as this is not a hot path
(in fact, it is a deprecated feature) and, by keeping a custom allocator
implementation we lose benefits like tracking, poisoning,
sanitizers. Finally, the custom code is more complex and requires
keeping the list of pages in struct ctx for no good reason. This patch
cleans this path up and just uses slab.

I microbenchmarked it by forcing the allocation of a large number of
objects with the least number of io_uring commands possible (keeping
nbufs=USHRT_MAX), with and without the patch. There is a slight
increase in time spent in the allocation with slab, of course, but even
when allocating to system resources exhaustion, which is not very
realistic and happened around 1/2 billion provided buffers for me, it
wasn't a significant hit in system time. Specially if we think of a
real-world scenario, an application doing register/unregister of
provided buffers will hit ctx->io_buffers_cache more often than actually
going to slab.

Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
Link: https://lore.kernel.org/r/20231005000531.30800-4-krisman@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# f74c746e 04-Oct-2023 Gabriel Krisman Bertazi <krisman@suse.de>

io_uring/kbuf: Allow the full buffer id space for provided buffers

nbufs tracks the number of buffers and not the last bgid. In 16-bit, we
have 2^16 valid buffers, but the check mistakenly rejects the last
bid. Let's fix it to make the interface consistent with the
documentation.

Fixes: ddf0322db79c ("io_uring: add IORING_OP_PROVIDE_BUFFERS")
Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
Link: https://lore.kernel.org/r/20231005000531.30800-3-krisman@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# ab69838e 04-Oct-2023 Gabriel Krisman Bertazi <krisman@suse.de>

io_uring/kbuf: Fix check of BID wrapping in provided buffers

Commit 3851d25c75ed0 ("io_uring: check for rollover of buffer ID when
providing buffers") introduced a check to prevent wrapping the BID
counter when sqe->off is provided, but it's off-by-one too
restrictive, rejecting the last possible BID (65534).

i.e., the following fails with -EINVAL.

io_uring_prep_provide_buffers(sqe, addr, size, 0xFFFF, 0, 0);

Fixes: 3851d25c75ed ("io_uring: check for rollover of buffer ID when providing buffers")
Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
Link: https://lore.kernel.org/r/20231005000531.30800-2-krisman@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# f8024f1f 02-Oct-2023 Jens Axboe <axboe@kernel.dk>

io_uring/kbuf: don't allow registered buffer rings on highmem pages

syzbot reports that registering a mapped buffer ring on arm32 can
trigger an OOPS. Registered buffer rings have two modes, one of them
is the application passing in the memory that the buffer ring should
reside in. Once those pages are mapped, we use page_address() to get
a virtual address. This will obviously fail on highmem pages, which
aren't mapped.

Add a check if we have any highmem pages after mapping, and fail the
attempt to register a provided buffer ring if we do. This will return
the same error as kernels that don't support provided buffer rings to
begin with.

Link: https://lore.kernel.org/io-uring/000000000000af635c0606bcb889@google.com/
Fixes: c56e022c0a27 ("io_uring: add support for user mapped provided buffer ring")
Cc: stable@vger.kernel.org
Reported-by: syzbot+2113e61b8848fa7951d8@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 99a9e0b8 16-Aug-2023 Matthew Wilcox (Oracle) <willy@infradead.org>

io_uring: stop calling free_compound_page()

Patch series "Remove _folio_dtor and _folio_order", v2.


This patch (of 13):

folio_put() is the standard way to write this, and it's not appreciably
slower. This is an enabling patch for removing free_compound_page()
entirely.

Link: https://lkml.kernel.org/r/20230816151201.3655946-1-willy@infradead.org
Link: https://lkml.kernel.org/r/20230816151201.3655946-2-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# ceac766a 10-Apr-2023 Pavel Begunkov <asml.silence@gmail.com>

io_uring/kbuf: remove extra ->buf_ring null check

The kernel test robot complains about __io_remove_buffers().

io_uring/kbuf.c:221 __io_remove_buffers() warn: variable dereferenced
before check 'bl->buf_ring' (see line 219)

That check is not needed as ->buf_ring will always be set, so we can
remove it and so silence the warning.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/9a632bbf749d9d911e605255652ce08d18e7d2c6.1681210788.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# fcb46c0c 17-Mar-2023 Jens Axboe <axboe@kernel.dk>

io_uring/kbuf: disallow mapping a badly aligned provided ring buffer

On at least parisc, we have strict requirements on how we virtually map
an address that is shared between the application and the kernel. On
these platforms, IOU_PBUF_RING_MMAP should be used when setting up a
shared ring buffer for provided buffers. If the application is mapping
these pages and asking the kernel to pin+map them as well, then we have
no control over what virtual address we get in the kernel.

For that case, do a sanity check if SHM_COLOUR is defined, and disallow
the mapping request. The application must fall back to using
IOU_PBUF_RING_MMAP for this case, and liburing will do that transparently
with the set of helpers that it has.

Signed-off-by: Jens Axboe <axboe@kernel.dk>


# c56e022c 14-Mar-2023 Jens Axboe <axboe@kernel.dk>

io_uring: add support for user mapped provided buffer ring

The ring mapped provided buffer rings rely on the application allocating
the memory for the ring, and then the kernel will map it. This generally
works fine, but runs into issues on some architectures where we need
to be able to ensure that the kernel and application virtual address for
the ring play nicely together. This at least impacts architectures that
set SHM_COLOUR, but potentially also anyone setting SHMLBA.

To use this variant of ring provided buffers, the application need not
allocate any memory for the ring. Instead the kernel will do so, and
the allocation must subsequently call mmap(2) on the ring with the
offset set to:

IORING_OFF_PBUF_RING | (bgid << IORING_OFF_PBUF_SHIFT)

to get a virtual address for the buffer ring. Normally the application
would allocate a suitable piece of memory (and correctly aligned) and
simply pass that in via io_uring_buf_reg.ring_addr and the kernel would
map it.

Outside of the setup differences, the kernel allocate + user mapped
provided buffer ring works exactly the same.

Acked-by: Helge Deller <deller@gmx.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 81cf17cd 14-Mar-2023 Jens Axboe <axboe@kernel.dk>

io_uring/kbuf: rename struct io_uring_buf_reg 'pad' to'flags'

In preparation for allowing flags to be set for registration, rename
the padding and use it for that.

Acked-by: Helge Deller <deller@gmx.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 25a2c188 14-Mar-2023 Jens Axboe <axboe@kernel.dk>

io_uring/kbuf: add buffer_list->is_mapped member

Rather than rely on checking buffer_list->buf_pages or ->buf_nr_pages,
add a separate member that tracks if this is a ring mapped provided
buffer list or not.

Acked-by: Helge Deller <deller@gmx.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# ba56b632 14-Mar-2023 Jens Axboe <axboe@kernel.dk>

io_uring/kbuf: move pinning of provided buffer ring into helper

In preparation for allowing the kernel to allocate the provided buffer
rings and have the application mmap it instead, abstract out the
current method of pinning and mapping the user allocated ring.

No functional changes intended in this patch.

Acked-by: Helge Deller <deller@gmx.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# b4a72c05 01-Apr-2023 Wojciech Lukowicz <wlukowicz01@gmail.com>

io_uring: fix memory leak when removing provided buffers

When removing provided buffers, io_buffer structs are not being disposed
of, leading to a memory leak. They can't be freed individually, because
they are allocated in page-sized groups. They need to be added to some
free list instead, such as io_buffers_cache. All callers already hold
the lock protecting it, apart from when destroying buffers, so had to
extend the lock there.

Fixes: cc3cec8367cb ("io_uring: speedup provided buffer handling")
Signed-off-by: Wojciech Lukowicz <wlukowicz01@gmail.com>
Link: https://lore.kernel.org/r/20230401195039.404909-2-wlukowicz01@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# c0921e51 01-Apr-2023 Wojciech Lukowicz <wlukowicz01@gmail.com>

io_uring: fix return value when removing provided buffers

When a request to remove buffers is submitted, and the given number to be
removed is larger than available in the specified buffer group, the
resulting CQE result will be the number of removed buffers + 1, which is
1 more than it should be.

Previously, the head was part of the list and it got removed after the
loop, so the increment was needed. Now, the head is not an element of
the list, so the increment shouldn't be there anymore.

Fixes: dbc7d452e7cf ("io_uring: manage provided buffers strictly ordered")
Signed-off-by: Wojciech Lukowicz <wlukowicz01@gmail.com>
Link: https://lore.kernel.org/r/20230401195039.404909-2-wlukowicz01@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 48ba0837 18-Feb-2023 Wojciech Lukowicz <wlukowicz01@gmail.com>

io_uring: fix size calculation when registering buf ring

Using struct_size() to calculate the size of io_uring_buf_ring will sum
the size of the struct and of the bufs array. However, the struct's fields
are overlaid with the array making the calculated size larger than it
should be.

When registering a ring with N * PAGE_SIZE / sizeof(struct io_uring_buf)
entries, i.e. with fully filled pages, the calculated size will span one
more page than it should and io_uring will try to pin the following page.
Depending on how the application allocated the ring, it might succeed
using an unrelated page or fail returning EFAULT.

The size of the ring should be the product of ring_entries and the size
of io_uring_buf, i.e. the size of the bufs array only.

Fixes: c7fb19428d67 ("io_uring: add support for ring mapped supplied buffers")
Signed-off-by: Wojciech Lukowicz <wlukowicz01@gmail.com>
Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>
Link: https://lore.kernel.org/r/20230218184141.70891-1-wlukowicz01@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# c3b49093 24-Nov-2022 Pavel Begunkov <asml.silence@gmail.com>

io_uring: don't use complete_post in kbuf

Now we're handling IOPOLL completions more generically, get rid uses of
_post() and send requests through the normal path. It may have some
extra mertis performance wise, but we don't care much as there is a
better interface for selected buffers.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/4deded706587f55b006dc33adf0c13cfc3b2319f.1669310258.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 1bec951c 23-Nov-2022 Pavel Begunkov <asml.silence@gmail.com>

io_uring: iopoll protect complete_post

io_req_complete_post() may be used by iopoll enabled rings, grab locks
in this case. That requires to pass issue_flags to propagate the locking
state.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/cc6d854065c57c838ca8e8806f707a226b70fd2d.1669203009.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 3851d25c 10-Nov-2022 Jens Axboe <axboe@kernel.dk>

io_uring: check for rollover of buffer ID when providing buffers

We already check if the chosen starting offset for the buffer IDs fit
within an unsigned short, as 65535 is the maximum value for a provided
buffer. But if the caller asks to add N buffers at offset M, and M + N
would exceed the size of the unsigned short, we simply add buffers with
wrapping around the ID.

This is not necessarily a bug and could in fact be a valid use case, but
it seems confusing and inconsistent with the initial check for starting
offset. Let's check for wrap consistently, and error the addition if we
do need to wrap.

Reported-by: Olivier Langlois <olivier@trillion01.com>
Link: https://github.com/axboe/liburing/issues/726
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# f2ccb5ae 11-Aug-2022 Stefan Metzmacher <metze@samba.org>

io_uring: make io_kiocb_to_cmd() typesafe

We need to make sure (at build time) that struct io_cmd_data is not
casted to a structure that's larger.

Signed-off-by: Stefan Metzmacher <metze@samba.org>
Link: https://lore.kernel.org/r/c024cdf25ae19fc0319d4180e2298bade8ed17b8.1660201408.git.metze@samba.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# cc18cc5e 04-Aug-2022 Pavel Begunkov <asml.silence@gmail.com>

io_uring: mem-account pbuf buckets

Potentially, someone may create as many pbuf bucket as there are indexes
in an xarray without any other restrictions bounding our memory usage,
put memory needed for the buckets under memory accounting.

Cc: <stable@vger.kernel.org>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d34c452e45793e978d26e2606211ec9070d329ea.1659622312.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# b8c01559 30-Jun-2022 Dylan Yudaken <dylany@fb.com>

io_uring: allow 0 length for buffer select

If user gives 0 for length, we can set it from the available buffer size.

Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220630091231.1456789-2-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 795bbbc8 23-Jun-2022 Hao Xu <howeyxu@tencent.com>

io_uring: kbuf: inline io_kbuf_recycle_ring()

Make io_kbuf_recycle_ring() inline since it is the fast path of
provided buffer.

Signed-off-by: Hao Xu <howeyxu@tencent.com>
Link: https://lore.kernel.org/r/20220623130126.179232-1-hao.xu@linux.dev
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 024b8fde 21-Jun-2022 Hao Xu <howeyxu@tencent.com>

io_uring: kbuf: kill __io_kbuf_recycle()

__io_kbuf_recycle() is only called in io_kbuf_recycle(). Kill it and
tweak the code so that the legacy pbuf and ring pbuf code become clear

Signed-off-by: Hao Xu <howeyxu@tencent.com>
Link: https://lore.kernel.org/r/20220622055551.642370-1-hao.xu@linux.dev
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 27a9d66f 16-Jun-2022 Pavel Begunkov <asml.silence@gmail.com>

io_uring: kill extra io_uring_types.h includes

io_uring/io_uring.h already includes io_uring_types.h, no need to
include it every time. Kill it in a bunch of places, it prepares us for
following patches.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/94d8c943fbe0ef949981c508ddcee7fc1c18850f.1655384063.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# f09c8643 16-Jun-2022 Hao Xu <howeyxu@tencent.com>

io_uring: kbuf: add comments for some tricky code

Add comments to explain why it is always under uring lock when
incrementing head in __io_kbuf_recycle. And rectify one comemnt about
kbuf consuming in iowq case.

Signed-off-by: Hao Xu <howeyxu@tencent.com>
Link: https://lore.kernel.org/r/20220617050429.94293-1-hao.xu@linux.dev
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 53ccf69b 16-Jun-2022 Pavel Begunkov <asml.silence@gmail.com>

io_uring: don't inline io_put_kbuf

io_put_kbuf() is huge, don't bloat the kernel with inlining.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/2e21ccf0be471ffa654032914b9430813cae53f8.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 3b77495a 13-Jun-2022 Jens Axboe <axboe@kernel.dk>

io_uring: split provided buffers handling into its own file

Move both the opcodes related to it, and the internals code dealing with
it.

Signed-off-by: Jens Axboe <axboe@kernel.dk>