History log of /linux-master/include/linux/uio.h
Revision Date Author Comments
# a50026bd 05-Mar-2024 Linus Torvalds <torvalds@linux-foundation.org>

iov_iter: get rid of 'copy_mc' flag

This flag is only set by one single user: the magical core dumping code
that looks up user pages one by one, and then writes them out using
their kernel addresses (by using a BVEC_ITER).

That actually ends up being a huge problem, because while we do use
copy_mc_to_kernel() for this case and it is able to handle the possible
machine checks involved, nothing else is really ready to handle the
failures caused by the machine check.

In particular, as reported by Tong Tiangen, we don't actually support
fault_in_iov_iter_readable() on a machine check area.

As a result, the usual logic for writing things to a file under a
filesystem lock, which involves doing a copy with page faults disabled
and then if that fails trying to fault pages in without holding the
locks with fault_in_iov_iter_readable() does not work at all.

We could decide to always just make the MC copy "succeed" (and filling
the destination with zeroes), and that would then create a core dump
file that just ignores any machine checks.

But honestly, this single special case has been problematic before, and
means that all the normal iov_iter code ends up slightly more complex
and slower.

See for example commit c9eec08bac96 ("iov_iter: Don't deal with
iter->copy_mc in memcpy_from_iter_mc()") where David Howells
re-organized the code just to avoid having to check the 'copy_mc' flags
inside the inner iov_iter loops.

So considering that we have exactly one user, and that one user is a
non-critical special case that doesn't actually ever trigger in real
life (Tong found this with manual error injection), the sane solution is
to just decide that the onus on handling the machine check lines on that
user instead.

Ergo, do the copy_mc_to_kernel() in the core dump logic itself, copying
the user data to a stable kernel page before writing it out.

Fixes: f1982740f5e7 ("iov_iter: Convert iterate*() to inline funcs")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
Link: https://lore.kernel.org/r/20240305133336.3804360-1-tongtiangen@huawei.com
Link: https://lore.kernel.org/all/4e80924d-9c85-f13a-722a-6a5d2b1c225a@huawei.com/
Tested-by: David Howells <dhowells@redhat.com>
Reviewed-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reported-by: Tong Tiangen <tongtiangen@huawei.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>


# 9fd7874c 04-Dec-2023 Jens Axboe <axboe@kernel.dk>

iov_iter: replace import_single_range() with import_ubuf()

With the removal of the 'iov' argument to import_single_range(), the two
functions are now fully identical. Convert the import_single_range()
callers to import_ubuf(), and remove the former fully.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/20231204174827.1258875-3-axboe@kernel.dk
Signed-off-by: Christian Brauner <brauner@kernel.org>


# 6ac805d1 04-Dec-2023 Jens Axboe <axboe@kernel.dk>

iov_iter: remove unused 'iov' argument from import_single_range()

It is entirely unused, just get rid of it.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/20231204174827.1258875-2-axboe@kernel.dk
Signed-off-by: Christian Brauner <brauner@kernel.org>


# b5f0e20f 25-Sep-2023 David Howells <dhowells@redhat.com>

iov_iter, net: Move hash_and_copy_to_iter() to net/

Move hash_and_copy_to_iter() to be with its only caller in networking code.

Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20230925120309.1731676-13-dhowells@redhat.com
cc: Alexander Viro <viro@zeniv.linux.org.uk>
cc: Jens Axboe <axboe@kernel.dk>
cc: Christoph Hellwig <hch@lst.de>
cc: Christian Brauner <christian@brauner.io>
cc: Matthew Wilcox <willy@infradead.org>
cc: Linus Torvalds <torvalds@linux-foundation.org>
cc: David Laight <David.Laight@ACULAB.COM>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
cc: netdev@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>


# 6d0d4199 25-Sep-2023 David Howells <dhowells@redhat.com>

iov_iter, net: Move csum_and_copy_to/from_iter() to net/

Move csum_and_copy_to/from_iter() to net code now that the iteration
framework can be #included.

Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20230925120309.1731676-10-dhowells@redhat.com
cc: Alexander Viro <viro@zeniv.linux.org.uk>
cc: Jens Axboe <axboe@kernel.dk>
cc: Christoph Hellwig <hch@lst.de>
cc: Christian Brauner <christian@brauner.io>
cc: Matthew Wilcox <willy@infradead.org>
cc: Linus Torvalds <torvalds@linux-foundation.org>
cc: David Laight <David.Laight@ACULAB.COM>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
cc: netdev@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>


# f1b4cb65 25-Sep-2023 David Howells <dhowells@redhat.com>

iov_iter: Derive user-backedness from the iterator type

Use the iterator type to determine whether an iterator is user-backed or
not rather than using a special flag for it. Now that ITER_UBUF and
ITER_IOVEC are 0 and 1, they can be checked with a single comparison.

Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20230925120309.1731676-7-dhowells@redhat.com
cc: Alexander Viro <viro@zeniv.linux.org.uk>
cc: Jens Axboe <axboe@kernel.dk>
cc: Christoph Hellwig <hch@lst.de>
cc: Christian Brauner <christian@brauner.io>
cc: Matthew Wilcox <willy@infradead.org>
cc: Linus Torvalds <torvalds@linux-foundation.org>
cc: David Laight <David.Laight@ACULAB.COM>
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
Signed-off-by: Christian Brauner <brauner@kernel.org>


# 7d9e44a6 25-Sep-2023 David Howells <dhowells@redhat.com>

iov_iter: Renumber ITER_* constants

Renumber the ITER_* iterator-type constants to put things in the same order
as in the iteration functions and to group user-backed iterators at the
bottom.

Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20230925120309.1731676-6-dhowells@redhat.com
cc: Alexander Viro <viro@zeniv.linux.org.uk>
cc: Jens Axboe <axboe@kernel.dk>
cc: Christoph Hellwig <hch@lst.de>
cc: Christian Brauner <christian@brauner.io>
cc: Matthew Wilcox <willy@infradead.org>
cc: Linus Torvalds <torvalds@linux-foundation.org>
cc: David Laight <David.Laight@ACULAB.COM>
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
Signed-off-by: Christian Brauner <brauner@kernel.org>


# 581beb4f 25-Sep-2023 David Howells <dhowells@redhat.com>

iov_iter: Remove last_offset from iov_iter as it was for ITER_PIPE

Now that ITER_PIPE has been removed, iov_iter::last_offset is no longer
used, so remove it.

Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20230925120309.1731676-2-dhowells@redhat.com
cc: Alexander Viro <viro@zeniv.linux.org.uk>
cc: Jens Axboe <axboe@kernel.dk>
cc: Christoph Hellwig <hch@lst.de>
cc: Christian Brauner <christian@brauner.io>
cc: Matthew Wilcox <willy@infradead.org>
cc: Linus Torvalds <torvalds@linux-foundation.org>
cc: David Laight <David.Laight@ACULAB.COM>
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
Signed-off-by: Christian Brauner <brauner@kernel.org>


# 1b030698 09-Jul-2023 Matthew Wilcox (Oracle) <willy@infradead.org>

iov_iter: Add copy_folio_from_iter_atomic()

Add a folio wrapper around copy_page_from_iter_atomic().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>


# 84bd06c6 14-Jun-2023 Christoph Hellwig <hch@lst.de>

iov_iter: remove iov_iter_get_pages and iov_iter_get_pages_alloc

Now that the direct I/O helpers have switched to use
iov_iter_extract_pages, these helpers are unused.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20230614140341.521331-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# f5f82cd1 06-Jun-2023 David Howells <dhowells@redhat.com>

Move netfs_extract_iter_to_sg() to lib/scatterlist.c

Move netfs_extract_iter_to_sg() to lib/scatterlist.c as it's going to be
used by more than just network filesystems (AF_ALG, for example).

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: Steve French <sfrench@samba.org>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Herbert Xu <herbert@gondor.apana.org.au>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Matthew Wilcox <willy@infradead.org>
cc: linux-crypto@vger.kernel.org
cc: linux-cachefs@redhat.com
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: netdev@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>


# 3fc40265 22-May-2023 David Howells <dhowells@redhat.com>

iov_iter: Kill ITER_PIPE

The ITER_PIPE-type iterator was only used by generic_file_splice_read() and
that has been replaced and removed. This leaves ITER_PIPE unused - so
remove it too.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: David Hildenbrand <david@redhat.com>
cc: John Hubbard <jhubbard@nvidia.com>
cc: linux-mm@kvack.org
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20230522135018.2742245-31-dhowells@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 245f0922 16-Apr-2023 Kefeng Wang <wangkefeng.wang@huawei.com>

mm: hwpoison: coredump: support recovery from dump_user_range()

dump_user_range() is used to copy the user page to a coredump file, but if
a hardware memory error occurred during copy, which called from
__kernel_write_iter() in dump_user_range(), it crashes,

CPU: 112 PID: 7014 Comm: mca-recover Not tainted 6.3.0-rc2 #425

pc : __memcpy+0x110/0x260
lr : _copy_from_iter+0x3bc/0x4c8
...
Call trace:
__memcpy+0x110/0x260
copy_page_from_iter+0xcc/0x130
pipe_write+0x164/0x6d8
__kernel_write_iter+0x9c/0x210
dump_user_range+0xc8/0x1d8
elf_core_dump+0x308/0x368
do_coredump+0x2e8/0xa40
get_signal+0x59c/0x788
do_signal+0x118/0x1f8
do_notify_resume+0xf0/0x280
el0_da+0x130/0x138
el0t_64_sync_handler+0x68/0xc0
el0t_64_sync+0x188/0x190

Generally, the '->write_iter' of file ops will use copy_page_from_iter()
and copy_page_from_iter_atomic(), change memcpy() to copy_mc_to_kernel()
in both of them to handle #MC during source read, which stop coredump
processing and kill the task instead of kernel panic, but the source
address may not always a user address, so introduce a new copy_mc flag in
struct iov_iter{} to indicate that the iter could do a safe memory copy,
also introduce the helpers to set/cleck the flag, for now, it's only used
in coredump's dump_user_range(), but it could expand to any other
scenarios to fix the similar issue.

Link: https://lkml.kernel.org/r/20230417045323.11054-1-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Tong Tiangen <tongtiangen@huawei.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# 4f80818b 22-Mar-2023 Lorenzo Stoakes <lstoakes@gmail.com>

iov_iter: add copy_page_to_iter_nofault()

Provide a means to copy a page to user space from an iterator, aborting if
a page fault would occur. This supports compound pages, but may be passed
a tail page with an offset extending further into the compound page, so we
cannot pass a folio.

This allows for this function to be called from atomic context and _try_
to user pages if they are faulted in, aborting if not.

The function does not use _copy_to_iter() in order to not specify
might_fault(), this is similar to copy_page_from_iter_atomic().

This is being added in order that an iteratable form of vread() can be
implemented while holding spinlocks.

Link: https://lkml.kernel.org/r/19734729defb0f498a76bdec1bef3ac48a3af3e8.1679511146.git.lstoakes@gmail.com
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Liu Shixin <liushixin2@huawei.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# 747b1f65 28-Mar-2023 Jens Axboe <axboe@kernel.dk>

iov_iter: overlay struct iovec and ubuf/len

Add an internal struct iovec that we can return as a pointer, with the
fields of the iovec overlapping with the ITER_UBUF ubuf and length
fields.

Then we can have iter_iov() check for the appropriate type, and return
&iter->__ubuf_iovec for ITER_UBUF and iter->__iov for ITER_IOVEC and
things will magically work out for a single segment request regardless
of either type.

Signed-off-by: Jens Axboe <axboe@kernel.dk>


# cd0bd57a 28-Mar-2023 Jens Axboe <axboe@kernel.dk>

iov_iter: set nr_segs = 1 for ITER_UBUF

To avoid needing to check if a given user backed iov_iter is of type
ITER_IOVEC or ITER_UBUF, set the number of segments for the ITER_UBUF
case to 1 as we're carrying a single segment.

Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 6eb203e1 29-Mar-2023 Jens Axboe <axboe@kernel.dk>

iov_iter: remove iov_iter_iovec()

No more users are left of this function.

Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 95e49cf8 29-Mar-2023 Jens Axboe <axboe@kernel.dk>

iov_iter: add iter_iov_addr() and iter_iov_len() helpers

These just return the address and length of the current iovec segment
in the iterator. Convert existing iov_iter_iovec() users to use them
instead of getting a copy of the current vec.

Signed-off-by: Jens Axboe <axboe@kernel.dk>


# de4f5fed 29-Mar-2023 Jens Axboe <axboe@kernel.dk>

iov_iter: add iter_iovec() helper

This returns a pointer to the current iovec entry in the iterator. Only
useful with ITER_IOVEC right now, but it prepares us to treat ITER_UBUF
and ITER_IOVEC identically for the first segment.

Rename struct iov_iter->iov to iov_iter->__iov to find any potentially
troublesome spots, and also to prevent anyone from adding new code that
accesses iter->iov directly.

Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 7d58fe73 28-Oct-2022 David Howells <dhowells@redhat.com>

iov_iter: Add a function to extract a page list from an iterator

Add a function, iov_iter_extract_pages(), to extract a list of pages from
an iterator. The pages may be returned with a pin added or nothing,
depending on the type of iterator.

Add a second function, iov_iter_extract_will_pin(), to determine how the
cleanup should be done.

There are two cases:

(1) ITER_IOVEC or ITER_UBUF iterator.

Extracted pages will have pins (FOLL_PIN) obtained on them so that a
concurrent fork() will forcibly copy the page so that DMA is done
to/from the parent's buffer and is unavailable to/unaffected by the
child process.

iov_iter_extract_will_pin() will return true for this case. The
caller should use something like unpin_user_page() to dispose of the
page.

(2) Any other sort of iterator.

No refs or pins are obtained on the page, the assumption is made that
the caller will manage page retention.

iov_iter_extract_will_pin() will return false. The pages don't need
additional disposal.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: John Hubbard <jhubbard@nvidia.com>
cc: David Hildenbrand <david@redhat.com>
cc: Matthew Wilcox <willy@infradead.org>
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
Signed-off-by: Steve French <stfrench@microsoft.com>


# f62e52d1 18-Jan-2023 David Howells <dhowells@redhat.com>

iov_iter: Define flags to qualify page extraction.

Define flags to qualify page extraction to pass into iov_iter_*_pages*()
rather than passing in FOLL_* flags.

For now only a flag to allow peer-to-peer DMA is supported.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: Logan Gunthorpe <logang@deltatee.com>
cc: linux-fsdevel@vger.kernel.org
cc: linux-block@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>


# 2ad9bd83 05-Jan-2023 Jens Axboe <axboe@kernel.dk>

iov: add import_ubuf()

Like import_single_range(), but for ITER_UBUF.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# de4eda9d 15-Sep-2022 Al Viro <viro@zeniv.linux.org.uk>

use less confusing names for iov_iter direction initializers

READ/WRITE proved to be actively confusing - the meanings are
"data destination, as used with read(2)" and "data source, as
used with write(2)", but people keep interpreting those as
"we read data from it" and "we write data to it", i.e. exactly
the wrong way.

Call them ITER_DEST and ITER_SOURCE - at least that is harder
to misinterpret...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# d8207640 21-Oct-2022 Logan Gunthorpe <logang@deltatee.com>

iov_iter: introduce iov_iter_get_pages_[alloc_]flags()

Add iov_iter_get_pages_flags() and iov_iter_get_pages_alloc_flags()
which take a flags argument that is passed to get_user_pages_fast().

This is so that FOLL_PCI_P2PDMA can be passed when appropriate.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20221021174116.7200-4-logang@deltatee.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 7187440d 08-Sep-2022 Dan Carpenter <dan.carpenter@oracle.com>

iov_iter: use "maxpages" parameter

This was intended to be "maxpages" instead of INT_MAX. There is only
one caller and it passes INT_MAX so this does not affect runtime.

Fixes: b93235e68921 ("tls: cap the output scatter list to something reasonable")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# eba2d3d7 10-Jun-2022 Al Viro <viro@zeniv.linux.org.uk>

get rid of non-advancing variants

mechanical change; will be further massaged in subsequent commits

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 1ef255e2 09-Jun-2022 Al Viro <viro@zeniv.linux.org.uk>

iov_iter: advancing variants of iov_iter_get_pages{,_alloc}()

Most of the users immediately follow successful iov_iter_get_pages()
with advancing by the amount it had returned.

Provide inline wrappers doing that, convert trivial open-coded
uses of those.

BTW, iov_iter_get_pages() never returns more than it had been asked
to; such checks in cifs ought to be removed someday...

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 10f525a8 15-Jun-2022 Al Viro <viro@zeniv.linux.org.uk>

ITER_PIPE: cache the type of last buffer

We often need to find whether the last buffer is anon or not, and
currently it's rather clumsy:
check if ->iov_offset is non-zero (i.e. that pipe is not empty)
if so, get the corresponding pipe_buffer and check its ->ops
if it's &default_pipe_buf_ops, we have an anon buffer.

Let's replace the use of ->iov_offset (which is nowhere near similar to
its role for other flavours) with signed field (->last_offset), with
the following rules:
empty, no buffers occupied: 0
anon, with bytes up to N-1 filled: N
zero-copy, with bytes up to N-1 filled: -N

That way abs(i->last_offset) is equal to what used to be in i->iov_offset
and empty vs. anon vs. zero-copy can be distinguished by the sign of
i->last_offset.

Checks for "should we extend the last buffer or should we start
a new one?" become easier to follow that way.

Note that most of the operations can only be done in a sane
state - i.e. when the pipe has nothing past the current position of
iterator. About the only thing that could be done outside of that
state is iov_iter_advance(), which transitions to the sane state by
truncating the pipe. There are only two cases where we leave the
sane state:
1) iov_iter_get_pages()/iov_iter_get_pages_alloc(). Will be
dealt with later, when we make get_pages advancing - the callers are
actually happier that way.
2) iov_iter copied, then something is put into the copy. Since
they share the underlying pipe, the original gets behind. When we
decide that we are done with the copy (original is not usable until then)
we advance the original. direct_io used to be done that way; nowadays
it operates on the original and we do iov_iter_revert() to discard
the excessive data. At the moment there's nothing in the kernel that
could do that to ITER_PIPE iterators, so this reason for insane state
is theoretical right now.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# fcb14cb1 22-May-2022 Al Viro <viro@zeniv.linux.org.uk>

new iov_iter flavour - ITER_UBUF

Equivalent of single-segment iovec. Initialized by iov_iter_ubuf(),
checked for by iter_is_ubuf(), otherwise behaves like ITER_IOVEC
ones.

We are going to expose the things like ->write_iter() et.al. to those
in subsequent commits.

New predicate (user_backed_iter()) that is true for ITER_IOVEC and
ITER_UBUF; places like direct-IO handling should use that for
checking that pages we modify after getting them from iov_iter_get_pages()
would need to be dirtied.

DO NOT assume that replacing iter_is_iovec() with user_backed_iter()
will solve all problems - there's code that uses iter_is_iovec() to
decide how to poke around in iov_iter guts and for that the predicate
replacement obviously won't suffice.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 0e3c3b90 06-Jun-2022 Al Viro <viro@zeniv.linux.org.uk>

No need of likely/unlikely on calls of check_copy_size()

it's inline and unlikely() inside of it (including the implicit one
in WARN_ON_ONCE()) suffice to convince the compiler that getting
false from check_copy_size() is unlikely.

Spotted-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# cfa320f7 10-Jun-2022 Keith Busch <kbusch@kernel.org>

iov: introduce iov_iter_aligned

The existing iov_iter_alignment() function returns the logical OR of
address and length. For cases where address and length need to be
considered separately, introduce a helper function that a caller can
specificy length and address masks that indicate if the iov is
unaligned.

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220610195830.3574005-9-kbusch@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# b93235e6 02-Feb-2022 Jakub Kicinski <kuba@kernel.org>

tls: cap the output scatter list to something reasonable

TLS recvmsg() passes user pages as destination for decrypt.
The decrypt operation is repeated record by record, each
record being 16kB, max. TLS allocates an sg_table and uses
iov_iter_get_pages() to populate it with enough pages to
fit the decrypted record.

Even though we decrypt a single message at a time we size
the sg_table based on the entire length of the iovec.
This leads to unnecessarily large allocations, risking
triggering OOM conditions.

Use iov_iter_truncate() / iov_iter_reexpand() to construct
a "capped" version of iov_iter_npages(). Alternatively we
could parametrize iov_iter_npages() to take the size as
arg instead of using i->count, or do something else..

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d9c19d32 18-Oct-2021 Matthew Wilcox (Oracle) <willy@infradead.org>

iov_iter: Add copy_folio_to_iter()

This wrapper around copy_page_to_iter() works because copy_page_to_iter()
handles compound pages correctly.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>


# e17f7a0b 15-Dec-2021 Christoph Hellwig <hch@lst.de>

uio: remove copy_from_iter_flushcache() and copy_mc_to_iter()

These two wrappers are never used.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211215084508.435401-2-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 3337ab08 11-Jul-2021 Andreas Gruenbacher <agruenba@redhat.com>

iov_iter: Introduce nofault flag to disable page faults

Introduce a new nofault flag to indicate to iov_iter_get_pages not to
fault in user pages.

This is implemented by passing the FOLL_NOFAULT flag to get_user_pages,
which causes get_user_pages to fail when it would otherwise fault in a
page. We'll use the ->nofault flag to prevent iomap_dio_rw from faulting
in pages when page faults are not allowed.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# cdd591fc 05-Jul-2021 Andreas Gruenbacher <agruenba@redhat.com>

iov_iter: Introduce fault_in_iov_iter_writeable

Introduce a new fault_in_iov_iter_writeable helper for safely faulting
in an iterator for writing. Uses get_user_pages() to fault in the pages
without actually writing to them, which would be destructive.

We'll use fault_in_iov_iter_writeable in gfs2 once we've determined that
the iterator passed to .read_iter isn't in memory.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# a6294593 02-Aug-2021 Andreas Gruenbacher <agruenba@redhat.com>

iov_iter: Turn iov_iter_fault_in_readable into fault_in_iov_iter_readable

Turn iov_iter_fault_in_readable into a function that returns the number
of bytes not faulted in, similar to copy_to_user, instead of returning a
non-zero value when any of the requested pages couldn't be faulted in.
This supports the existing users that require all pages to be faulted in
as well as new users that are happy if any pages can be faulted in.

Rename iov_iter_fault_in_readable to fault_in_iov_iter_readable to make
sure this change doesn't silently break things.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 7dedd3e1 10-Sep-2021 Jens Axboe <axboe@kernel.dk>

Revert "iov_iter: track truncated size"

This reverts commit 2112ff5ce0c1128fe7b4d19cfe7f2b8ce5b595fa.

We no longer need to track the truncation count, the one user that did
need it has been converted to using iov_iter_restore() instead.

Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 8fb0f47a 10-Sep-2021 Jens Axboe <axboe@kernel.dk>

iov_iter: add helper to save iov_iter state

In an ideal world, when someone is passed an iov_iter and returns X bytes,
then X bytes would have been consumed/advanced from the iov_iter. But we
have use cases that always consume the entire iterator, a few examples
of that are iomap and bdev O_DIRECT. This means we cannot rely on the
state of the iov_iter once we've called ->read_iter() or ->write_iter().

This would be easier if we didn't always have to deal with truncate of
the iov_iter, as rewinding would be trivial without that. We recently
added a commit to track the truncate state, but that grew the iov_iter
by 8 bytes and wasn't the best solution.

Implement a helper to save enough of the iov_iter state to sanely restore
it after we've called the read/write iterator helpers. This currently
only works for IOVEC/BVEC/KVEC as that's all we need, support for other
iterator types are left as an exercise for the reader.

Link: https://lore.kernel.org/linux-fsdevel/CAHk-=wiacKV4Gh-MYjteU0LwNBSGpWrK-Ov25HdqB1ewinrFPg@mail.gmail.com/
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 2112ff5c 23-Aug-2021 Pavel Begunkov <asml.silence@gmail.com>

iov_iter: track truncated size

Remember how many bytes were truncated and reverted back. Because
not reexpanded iterators don't always work well with reverting, we may
need to know that to reexpand ourselves when needed.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# f0b65f39 30-Apr-2021 Al Viro <viro@zeniv.linux.org.uk>

iov_iter: replace iov_iter_copy_from_user_atomic() with iterator-advancing variant

Replacement is called copy_page_from_iter_atomic(); unlike the old primitive the
callers do *not* need to do iov_iter_advance() after it. In case when they end
up consuming less than they'd been given they need to do iov_iter_revert() on
everything they had not consumed. That, however, needs to be done only on slow
paths.

All in-tree callers converted. And that kills the last user of iterate_all_kinds()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 8409a0d2 02-May-2021 Al Viro <viro@zeniv.linux.org.uk>

sanitize iov_iter_fault_in_readable()

1) constify iov_iter argument; we are not advancing it in this primitive.

2) cap the amount requested by the amount of data in iov_iter. All
existing callers should've been safe, but the check is really cheap and
doing it here makes for easier analysis, as well as more consistent
semantics among the primitives.

3) don't bother with iterate_iovec(). Explicit loop is not any harder
to follow, and we get rid of standalone iterate_iovec() users - it's
only used by iterate_and_advance() and (soon to be gone) iterate_all_kinds().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 8cd54c1c 22-Apr-2021 Al Viro <viro@zeniv.linux.org.uk>

iov_iter: separate direction from flavour

Instead of having them mixed in iter->type, use separate ->iter_type
and ->data_source (u8 and bool resp.) And don't bother with (pseudo-)
bitmap for the former - microoptimizations from being able to check
if the flavour is one of two values are not worth the confusion for
optimizer. It can't prove that we never get e.g. ITER_IOVEC | ITER_PIPE,
so we end up with extra headache.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 4b6c132b 29-Apr-2021 Al Viro <viro@zeniv.linux.org.uk>

iov_iter: switch ..._full() variants of primitives to use of iov_iter_revert()

Use corresponding plain variants, revert on short copy. That's the way it
should've been done from the very beginning, except that we didn't have
iov_iter_revert() back then...

[fixed another braino caught by Qian Cai <quic_qiancai@quicinc.com>]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 66cd071a 09-Apr-2021 David Howells <dhowells@redhat.com>

iov_iter: Remove iov_iter_for_each_range()

Remove iov_iter_for_each_range() as it's no longer used with the removal of
lustre.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 3d14ec1f 25-Apr-2021 David Howells <dhowells@redhat.com>

iov_iter: Four fixes for ITER_XARRAY

Fix four things[1] in the patch that adds ITER_XARRAY[2]:

(1) Remove the address_space struct predeclaration. This is a holdover
from when it was ITER_MAPPING.

(2) Fix _copy_mc_to_iter() so that the xarray segment updates count and
iov_offset in the iterator before returning.

(3) Fix iov_iter_alignment() to not loop in the xarray case. Because the
middle pages are all whole pages, only the end pages need be
considered - and this can be reduced to just looking at the start
position in the xarray and the iteration size.

(4) Fix iov_iter_advance() to limit the size of the advance to no more
than the remaining iteration size.

Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Jeff Layton <jlayton@redhat.com>
Tested-by: Dave Wysochanski <dwysocha@redhat.com>
Link: https://lore.kernel.org/r/YIVrJT8GwLI0Wlgx@zeniv-ca.linux.org.uk [1]
Link: https://lore.kernel.org/r/161918448151.3145707.11541538916600921083.stgit@warthog.procyon.org.uk [2]


# 7ff506207 10-Feb-2020 David Howells <dhowells@redhat.com>

iov_iter: Add ITER_XARRAY

Add an iterator, ITER_XARRAY, that walks through a set of pages attached to
an xarray, starting at a given page and offset and walking for the
specified amount of bytes. The iterator supports transparent huge pages.

The iterate_xarray() macro calls the helper function with rcu_access()
helped. I think that this is only a problem for iov_iter_for_each_range()
- and that returns an error for ITER_XARRAY (also, this function does not
appear to be called).

The caller must guarantee that the pages are all present and they must be
locked using PG_locked, PG_writeback or PG_fscache to prevent them from
going away or being migrated whilst they're being accessed.

This is useful for copying data from socket buffers to inodes in network
filesystems and for transferring data between those inodes and the cache
using direct I/O.

Whilst it is true that ITER_BVEC could be used instead, that would require
a bio_vec array to be allocated to refer to all the pages - which should be
redundant if inode->i_pages also points to all these pages.

Note that older versions of this patch implemented an ITER_MAPPING instead,
which was almost the same.

Changes:
v7:
- Rename iter_xarray_copy_pages() to iter_xarray_populate_pages()[1].

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-and-tested-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Dave Wysochanski <dwysocha@redhat.com>
Tested-By: Marc Dionne <marc.dionne@auristor.com>
cc: Alexander Viro <viro@zeniv.linux.org.uk>
cc: Matthew Wilcox (Oracle) <willy@infradead.org>
cc: Christoph Hellwig <hch@lst.de>
cc: linux-mm@kvack.org
cc: linux-cachefs@redhat.com
cc: linux-afs@lists.infradead.org
cc: linux-nfs@vger.kernel.org
cc: linux-cifs@vger.kernel.org
cc: ceph-devel@vger.kernel.org
cc: v9fs-developer@lists.sourceforge.net
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/3577430.1579705075@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/158861205740.340223.16592990225607814022.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/159465785214.1376674.6062549291411362531.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/160588477334.3465195.3608963255682568730.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/161118129703.1232039.17141248432017826976.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/161161026313.2537118.14676007075365418649.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/161340386671.1303470.10752208972482479840.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/161539527815.286939.14607323792547049341.stgit@warthog.procyon.org.uk/ # v4
Link: https://lore.kernel.org/r/161653786033.2770958.14154191921867463240.stgit@warthog.procyon.org.uk/ # v5
Link: https://lore.kernel.org/r/161789064740.6155.11932541175173658065.stgit@warthog.procyon.org.uk/ # v6
Link: https://lore.kernel.org/r/27c369a8f42bb8a617672b2dc0126a5c6df5a050.camel@kernel.org [1]


# 52cbd23a 03-Feb-2021 Willem de Bruijn <willemb@google.com>

udp: fix skb_copy_and_csum_datagram with odd segment sizes

When iteratively computing a checksum with csum_block_add, track the
offset "pos" to correctly rotate in csum_block_add when offset is odd.

The open coded implementation of skb_copy_and_csum_datagram did this.
With the switch to __skb_datagram_iter calling csum_and_copy_to_iter,
pos was reinitialized to 0 on each call.

Bring back the pos by passing it along with the csum to the callback.

Changes v1->v2
- pass csum value, instead of csump pointer (Alexander Duyck)

Link: https://lore.kernel.org/netdev/20210128152353.GB27281@optiplex/
Fixes: 950fcaecd5cc ("datagram: consolidate datagram copy to iter helpers")
Reported-by: Oliver Graute <oliver.graute@gmail.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20210203192952.1849843-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# ec6347bb 05-Oct-2020 Dan Williams <dan.j.williams@intel.com>

x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, kernel}()

In reaction to a proposal to introduce a memcpy_mcsafe_fast()
implementation Linus points out that memcpy_mcsafe() is poorly named
relative to communicating the scope of the interface. Specifically what
addresses are valid to pass as source, destination, and what faults /
exceptions are handled.

Of particular concern is that even though x86 might be able to handle
the semantics of copy_mc_to_user() with its common copy_user_generic()
implementation other archs likely need / want an explicit path for this
case:

On Fri, May 1, 2020 at 11:28 AM Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> On Thu, Apr 30, 2020 at 6:21 PM Dan Williams <dan.j.williams@intel.com> wrote:
> >
> > However now I see that copy_user_generic() works for the wrong reason.
> > It works because the exception on the source address due to poison
> > looks no different than a write fault on the user address to the
> > caller, it's still just a short copy. So it makes copy_to_user() work
> > for the wrong reason relative to the name.
>
> Right.
>
> And it won't work that way on other architectures. On x86, we have a
> generic function that can take faults on either side, and we use it
> for both cases (and for the "in_user" case too), but that's an
> artifact of the architecture oddity.
>
> In fact, it's probably wrong even on x86 - because it can hide bugs -
> but writing those things is painful enough that everybody prefers
> having just one function.

Replace a single top-level memcpy_mcsafe() with either
copy_mc_to_user(), or copy_mc_to_kernel().

Introduce an x86 copy_mc_fragile() name as the rename for the
low-level x86 implementation formerly named memcpy_mcsafe(). It is used
as the slow / careful backend that is supplanted by a fast
copy_mc_generic() in a follow-on patch.

One side-effect of this reorganization is that separating copy_mc_64.S
to its own file means that perf no longer needs to track dependencies
for its memcpy_64.S benchmarks.

[ bp: Massage a bit. ]

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: <stable@vger.kernel.org>
Link: http://lore.kernel.org/r/CAHk-=wjSqtXAqfUJxFtWNwmguFASTgB0dz1dT3V-78Quiezqbg@mail.gmail.com
Link: https://lkml.kernel.org/r/160195561680.2163339.11574962055305783722.stgit@dwillia2-desk3.amr.corp.intel.com


# 89cd35c5 24-Sep-2020 Christoph Hellwig <hch@lst.de>

iov_iter: transparently handle compat iovecs in import_iovec

Use in compat_syscall to import either native or the compat iovecs, and
remove the now superflous compat_import_iovec.

This removes the need for special compat logic in most callers, and
the remaining ones can still be simplified by using __import_iovec
with a bool compat parameter.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# bfdc5970 24-Sep-2020 Christoph Hellwig <hch@lst.de>

iov_iter: refactor rw_copy_check_uvector and import_iovec

Split rw_copy_check_uvector into two new helpers with more sensible
calling conventions:

- iovec_from_user copies a iovec from userspace either into the provided
stack buffer if it fits, or allocates a new buffer for it. Returns
the actually used iovec. It also verifies that iov_len does fit a
signed type, and handles compat iovecs if the compat flag is set.
- __import_iovec consolidates the native and compat versions of
import_iovec. It calls iovec_from_user, then validates each iovec
actually points to user addresses, and ensures the total length
doesn't overflow.

This has two major implications:

- the access_process_vm case loses the total lenght checking, which
wasn't required anyway, given that each call receives two iovecs
for the local and remote side of the operation, and it verifies
the total length on the local side already.
- instead of a single loop there now are two loops over the iovecs.
Given that the iovecs are cache hot this doesn't make a major
difference

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 7999096f 12-Jun-2020 Herbert Xu <herbert@gondor.apana.org.au>

iov_iter: Move unnecessary inclusion of crypto/hash.h

The header file linux/uio.h includes crypto/hash.h which pulls in
most of the Crypto API. Since linux/uio.h is used throughout the
kernel this means that every tiny bit of change to the Crypto API
causes the entire kernel to get rebuilt.

This patch fixes this by moving it into lib/iov_iter.c instead
where it is actually used.

This patch also fixes the ifdef to use CRYPTO_HASH instead of just
CRYPTO which does not guarantee the existence of ahash.

Unfortunately a number of drivers were relying on linux/uio.h to
provide access to linux/slab.h. This patch adds inclusions of
linux/slab.h as detected by build failures.

Also skbuff.h was relying on this to provide a declaration for
ahash_request. This patch adds a forward declaration instead.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 8cefc107 15-Nov-2019 David Howells <dhowells@redhat.com>

pipe: Use head and tail pointers for the ring, not cursor and length

Convert pipes to use head and tail pointers for the buffer ring rather than
pointer and length as the latter requires two atomic ops to update (or a
combined op) whereas the former only requires one.

(1) The head pointer is the point at which production occurs and points to
the slot in which the next buffer will be placed. This is equivalent
to pipe->curbuf + pipe->nrbufs.

The head pointer belongs to the write-side.

(2) The tail pointer is the point at which consumption occurs. It points
to the next slot to be consumed. This is equivalent to pipe->curbuf.

The tail pointer belongs to the read-side.

(3) head and tail are allowed to run to UINT_MAX and wrap naturally. They
are only masked off when the array is being accessed, e.g.:

pipe->bufs[head & mask]

This means that it is not necessary to have a dead slot in the ring as
head == tail isn't ambiguous.

(4) The ring is empty if "head == tail".

A helper, pipe_empty(), is provided for this.

(5) The occupancy of the ring is "head - tail".

A helper, pipe_occupancy(), is provided for this.

(6) The number of free slots in the ring is "pipe->ring_size - occupancy".

A helper, pipe_space_for_user() is provided to indicate how many slots
userspace may use.

(7) The ring is full if "head - tail >= pipe->ring_size".

A helper, pipe_full(), is provided for this.

Signed-off-by: David Howells <dhowells@redhat.com>


# b6207430 26-Jun-2019 Christoph Hellwig <hch@lst.de>

block: never take page references for ITER_BVEC

If we pass pages through an iov_iter we always already have a reference
in the caller. Thus remove the ITER_BVEC_FLAG_NO_REF and don't take
reference to pages by default for bvec backed iov_iters.

Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 87e5e6da 14-May-2019 Jens Axboe <axboe@kernel.dk>

uio: make import_iovec()/compat_import_iovec() return bytes on success

Currently these functions return < 0 on error, and 0 for success.
Change that so that we return < 0 on error, but number of bytes
for success.

Some callers already treat the return value that way, others need a
slight tweak.

Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 2874c5fd 27-May-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152

Based on 1 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 3029 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# f5eb4d3b 26-Apr-2019 Ming Lei <ming.lei@redhat.com>

iov_iter: fix iov_iter_type

Commit 875f1d0769cd ("iov_iter: add ITER_BVEC_FLAG_NO_REF flag")
introduces one extra flag of ITER_BVEC_FLAG_NO_REF, and this flag
is stored into iter->type.

However, iov_iter_type() doesn't consider the new added flag, fix
it by masking this flag in iov_iter_type().

Fixes: 875f1d0769cd ("iov_iter: add ITER_BVEC_FLAG_NO_REF flag")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 875f1d07 27-Feb-2019 Jens Axboe <axboe@kernel.dk>

iov_iter: add ITER_BVEC_FLAG_NO_REF flag

For ITER_BVEC, if we're holding on to kernel pages, the caller
doesn't need to grab a reference to the bvec pages, and drop that
same reference on IO completion. This is essentially safe for any
ITER_BVEC, but some use cases end up reusing pages and uncondtionally
dropping a page reference on completion. And example of that is
sendfile(2), that ends up being a splice_in + splice_out on the
pipe pages.

Add a flag that tells us it's fine to not grab a page reference
to the bvec pages, since that caller knows not to drop a reference
when it's done with the pages.

Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 77000bc4 04-Feb-2019 Christoph Hellwig <hch@lst.de>

uio: remove the unused iov_for_each macro

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# d05f4435 03-Dec-2018 Sagi Grimberg <sagi@lightbitslabs.com>

iov_iter: introduce hash_and_copy_to_iter helper

Allow consumers that want to use iov iterator helpers and also update
a predefined hash calculation online when copying data. This is useful
when copying incoming network buffers to a local iterator and calculate
a digest on the incoming stream. nvme-tcp host driver that will be
introduced in following patches is the first consumer via
skb_copy_and_hash_datagram_iter.

Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sagi Grimberg <sagi@lightbitslabs.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# cb002d07 03-Dec-2018 Sagi Grimberg <sagi@lightbitslabs.com>

iov_iter: pass void csum pointer to csum_and_copy_to_iter

The single caller to csum_and_copy_to_iter is skb_copy_and_csum_datagram
and we are trying to unite its logic with skb_copy_datagram_iter by passing
a callback to the copy function that we want to apply. Thus, we need
to make the checksum pointer private to the function.

Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sagi Grimberg <sagi@lightbitslabs.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 9ea9ce04 19-Oct-2018 David Howells <dhowells@redhat.com>

iov_iter: Add I/O discard iterator

Add a new iterator, ITER_DISCARD, that can only be used in READ mode and
just discards any data copied to it.

This is useful in a network filesystem for discarding any unwanted data
sent by a server.

Signed-off-by: David Howells <dhowells@redhat.com>


# aa563d7b 19-Oct-2018 David Howells <dhowells@redhat.com>

iov_iter: Separate type from direction and use accessor functions

In the iov_iter struct, separate the iterator type from the iterator
direction and use accessor functions to access them in most places.

Convert a bunch of places to use switch-statements to access them rather
then chains of bitwise-AND statements. This makes it easier to add further
iterator types. Also, this can be more efficient as to implement a switch
of small contiguous integers, the compiler can use ~50% fewer compare
instructions than it has to use bitwise-and instructions.

Further, cease passing the iterator type into the iterator setup function.
The iterator function can set that itself. Only the direction is required.

Signed-off-by: David Howells <dhowells@redhat.com>


# 00e23707 22-Oct-2018 David Howells <dhowells@redhat.com>

iov_iter: Use accessor function

Use accessor functions to access an iterator's type and direction. This
allows for the possibility of using some other method of determining the
type of iterator than if-chains with bitwise-AND conditions.

Signed-off-by: David Howells <dhowells@redhat.com>


# dfb06cba 05-Sep-2018 Dave Jiang <dave.jiang@intel.com>

uaccess: Fix is_source param for check_copy_size() in copy_to_iter_mcsafe()

copy_to_iter_mcsafe() is passing in the is_source parameter as "false"
to check_copy_size(). This is different than what copy_to_iter() does.
Also, the addr parameter passed to check_copy_size() is the source so
therefore we should be passing in "true" instead.

Fixes: 8780356ef630 ("x86/asm/memcpy_mcsafe: Define copy_to_iter_mcsafe()")
Cc: <stable@vger.kernel.org>
Reported-by: Fan Du <fan.du@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Reported-by: Wenwei Tao <wenwei.tww@alibaba-inc.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 522239b4 23-May-2018 Dan Williams <dan.j.williams@intel.com>

uio, lib: Fix CONFIG_ARCH_HAS_UACCESS_MCSAFE compilation

Add a common Kconfig CONFIG_ARCH_HAS_UACCESS_MCSAFE that archs can
optionally select, and fixup the declaration of _copy_to_iter_mcsafe().

Fixes: 8780356ef630 ("x86/asm/memcpy_mcsafe: Define copy_to_iter_mcsafe()")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 8780356e 03-May-2018 Dan Williams <dan.j.williams@intel.com>

x86/asm/memcpy_mcsafe: Define copy_to_iter_mcsafe()

Use the updated memcpy_mcsafe() implementation to define
copy_user_mcsafe() and copy_to_iter_mcsafe(). The most significant
difference from typical copy_to_iter() is that the ITER_KVEC and
ITER_BVEC iterator types can fail to complete a full transfer.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: hch@lst.de
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-nvdimm@lists.01.org
Link: http://lkml.kernel.org/r/152539239150.31796.9189779163576449784.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 09cf698a 17-Feb-2017 Al Viro <viro@zeniv.linux.org.uk>

new primitive: iov_iter_for_each_range()

For kvec and bvec: feeds segments to given callback as long as it
returns 0. For iovec and pipe: fails.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# faea1329 24-Sep-2017 Al Viro <viro@zeniv.linux.org.uk>

kill iov_shorten()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# c43aeb19 10-Jul-2017 Al Viro <viro@zeniv.linux.org.uk>

fix brown paperbag bug in inlined copy_..._iter()

"copied nothing" == "return 0", not "return full size".

Fixes: aa28de275a24 "iov_iter/hardening: move object size checks to inlined part"
Spotted-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# aa28de27 29-Jun-2017 Al Viro <viro@zeniv.linux.org.uk>

iov_iter/hardening: move object size checks to inlined part

There we actually have useful information about object sizes.
Note: this patch has them done for all iov_iter flavours.
Right now we do them twice in iovec case, but that'll change
very shortly.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 0aed55af 29-May-2017 Dan Williams <dan.j.williams@intel.com>

x86, uaccess: introduce copy_from_iter_flushcache for pmem / cache-bypass operations

The pmem driver has a need to transfer data with a persistent memory
destination and be able to rely on the fact that the destination writes are not
cached. It is sufficient for the writes to be flushed to a cpu-store-buffer
(non-temporal / "movnt" in x86 terms), as we expect userspace to call fsync()
to ensure data-writes have reached a power-fail-safe zone in the platform. The
fsync() triggers a REQ_FUA or REQ_FLUSH to the pmem driver which will turn
around and fence previous writes with an "sfence".

Implement a __copy_from_user_inatomic_flushcache, memcpy_page_flushcache, and
memcpy_flushcache, that guarantee that the destination buffer is not dirty in
the cpu cache on completion. The new copy_from_iter_flushcache and sub-routines
will be used to replace the "pmem api" (include/linux/pmem.h +
arch/x86/include/asm/pmem.h). The availability of copy_from_iter_flushcache()
and memcpy_flushcache() are gated by the CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
config symbol, and fallback to copy_from_iter_nocache() and plain memcpy()
otherwise.

This is meant to satisfy the concern from Linus that if a driver wants to do
something beyond the normal nocache semantics it should be something private to
that driver [1], and Al's concern that anything uaccess related belongs with
the rest of the uaccess code [2].

The first consumer of this interface is a new 'copy_from_iter' dax operation so
that pmem can inject cache maintenance operations without imposing this
overhead on other dax-capable drivers.

[1]: https://lists.01.org/pipermail/linux-nvdimm/2017-January/008364.html
[2]: https://lists.01.org/pipermail/linux-nvdimm/2017-April/009942.html

Cc: <x86@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 27c0e374 17-Feb-2017 Al Viro <viro@zeniv.linux.org.uk>

[iov_iter] new privimitive: iov_iter_revert()

opposite to iov_iter_advance(); the caller is responsible for never
using it to move back past the initial position.

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# cbbd26b8 01-Nov-2016 Al Viro <viro@zeniv.linux.org.uk>

[iov_iter] new primitives - copy_from_iter_full() and friends

copy_from_iter_full(), copy_from_iter_full_nocache() and
csum_and_copy_from_iter_full() - counterparts of copy_from_iter()
et.al., advancing iterator only in case of successful full copy
and returning whether it had been successful or not.

Convert some obvious users. *NOTE* - do not blindly assume that
something is a good candidate for those unless you are sure that
not advancing iov_iter in failure case is the right thing in
this case. Anything that does short read/short write kind of
stuff (or is in a loop, etc.) is unlikely to be a good one.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# d3849953 01-Nov-2016 Christoph Hellwig <hch@lst.de>

fs: decouple READ and WRITE from the block layer ops

Move READ and WRITE to kernel.h and don't define them in terms of block
layer ops; they are our generic data direction indicators these days
and have no more resemblance with the block layer ops.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>


# b57332b4 10-Oct-2016 Al Viro <viro@zeniv.linux.org.uk>

constify iov_iter_count() and iter_is_iovec()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 241699cd 22-Sep-2016 Al Viro <viro@zeniv.linux.org.uk>

new iov_iter flavour: pipe-backed

iov_iter variant for passing data into pipe. copy_to_iter()
copies data into page(s) it has allocated and stuffs them into
the pipe; copy_page_to_iter() stuffs there a reference to the
page given to it. Both will try to coalesce if possible.
iov_iter_zero() is similar to copy_to_iter(); iov_iter_get_pages()
and friends will do as copy_to_iter() would have and return the
pages where the data would've been copied. iov_iter_advance()
will truncate everything past the spot it has advanced to.

New primitive: iov_iter_pipe(), used for initializing those.
pipe should be locked all along.

Running out of space acts as fault would for iovec-backed ones;
in other words, giving it to ->read_iter() may result in short
read if the pipe overflows, or -EFAULT if it happens with nothing
copied there.

In other words, ->read_iter() on those acts pretty much like
->splice_read(). Moreover, all generic_file_splice_read() users,
as well as many other ->splice_read() instances can be switched
to that scheme - that'll happen in the next commit.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 4bce9f6e 17-Sep-2016 Al Viro <viro@zeniv.linux.org.uk>

get rid of separate multipage fault-in primitives

* the only remaining callers of "short" fault-ins are just as happy with generic
variants (both in lib/iov_iter.c); switch them to multipage variants, kill the
"short" ones
* rename the multipage variants to now available plain ones.
* get rid of compat macro defining iov_iter_fault_in_multipage_readable by
expanding it in its only user.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# d4690f1e 15-Sep-2016 Al Viro <viro@ZenIV.linux.org.uk>

fix iov_iter_fault_in_readable()

... by turning it into what used to be multipages counterpart

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 357f435d 08-Apr-2016 Al Viro <viro@zeniv.linux.org.uk>

fix the copy vs. map logics in blk_rq_map_user_iov()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 36f7a8a4 06-Dec-2015 Al Viro <viro@zeniv.linux.org.uk>

iov_iter: constify {csum_and_,}copy_to_iter()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# bd8e0ff9 17-Mar-2015 Omar Sandoval <osandov@osandov.com>

new helper: iov_iter_rw()

Get either READ or WRITE out of iter->type.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 171a0203 11-Mar-2015 Anton Altaparmakov <anton@tuxera.com>

VFS: Add iov_iter_fault_in_multipages_readable()

simillar to iov_iter_fault_in_readable() but differs in that it is
not limited to faulting in the first iovec and instead faults in
"bytes" bytes iterating over the iovecs as necessary.

Also, instead of only faulting in the first and last page of the
range, all pages are faulted in.

This function is needed by NTFS when it does multi page file
writes.

Signed-off-by: Anton Altaparmakov <anton@tuxera.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# bc917be8 21-Mar-2015 Al Viro <viro@zeniv.linux.org.uk>

saner iov_iter initialization primitives

iovec-backed iov_iter instances are assumed to satisfy several properties:
* no more than UIO_MAXIOV elements in iovec array
* total size of all ranges is no more than MAX_RW_COUNT
* all ranges pass access_ok().

The problem is, invariants of data structures should be established in the
primitives creating those data structures, not in the code using those
primitives. And iov_iter_init() violates that principle. For a while we
managed to get away with that, but once the use of iov_iter started to
spread, it didn't take long for shit to hit the fan - missed check in
sys_sendto() had introduced a roothole.

We _do_ have primitives for importing and validating iovecs (both native and
compat ones) and those primitives are almost always followed by shoving the
resulting iovec into iov_iter. Life would be considerably simpler (and safer)
if we combined those primitives with initializing iov_iter.

That gives us two new primitives - import_iovec() and compat_import_iovec().
Calling conventions:
iovec = iov_array;
err = import_iovec(direction, uvec, nr_segs,
ARRAY_SIZE(iov_array), &iovec,
&iter);
imports user vector into kernel space (into iov_array if it fits, allocated
if it doesn't fit or if iovec was NULL), validates it and sets iter up to
refer to it. On success 0 is returned and allocated kernel copy (or NULL
if the array had fit into caller-supplied one) is returned via iovec.
On failure all allocations are undone and -E... is returned. If the total
size of ranges exceeds MAX_RW_COUNT, the excess is silently truncated.

compat_import_iovec() expects uvec to be a pointer to user array of compat_iovec;
otherwise it's identical to import_iovec().

Finally, import_single_range() sets iov_iter backed by single-element iovec
covering a user-supplied range -

err = import_single_range(direction, address, size, iovec, &iter);

does validation and sets iter up. Again, size in excess of MAX_RW_COUNT gets
silently truncated.

Next commits will be switching the things up to use of those and reducing
the amount of iov_iter_init() instances.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 4b8164b9 31-Jan-2015 Al Viro <viro@zeniv.linux.org.uk>

new helper: dup_iter()

Copy iter and kmemdup the underlying array for the copy. Returns
a pointer to result of kmemdup() to be kfree()'d later.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 57dd8a07 10-Dec-2014 Al Viro <viro@zeniv.linux.org.uk>

vhost: vhost_scsi_handle_vq() should just use copy_from_user()

it has just verified that it asks no more than the length of the
first segment of iovec.

And with that the last user of stuff in lib/iovec.c is gone.
RIP.

Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Nicholas A. Bellinger <nab@linux-iscsi.org>
Cc: kvm@vger.kernel.org
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# ba7438ae 10-Dec-2014 Al Viro <viro@zeniv.linux.org.uk>

vhost: don't bother copying iovecs in handle_rx(), kill memcpy_toiovecend()

Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: kvm@vger.kernel.org
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# aad9a1ce 10-Dec-2014 Al Viro <viro@zeniv.linux.org.uk>

vhost: switch vhost get_indirect() to iov_iter, kill memcpy_fromiovec()

Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: kvm@vger.kernel.org
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 05afcb77 22-Jan-2015 Al Viro <viro@zeniv.linux.org.uk>

new helper: iov_iter_bvec()

similar to iov_iter_kvec(), for ITER_BVEC ones

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 777eda2c 17-Dec-2014 Al Viro <viro@zeniv.linux.org.uk>

new helper: iter_is_iovec()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 218321e7 24-Nov-2014 Al Viro <viro@zeniv.linux.org.uk>

bury memcpy_toiovec()

no users left

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# aa583096 27-Nov-2014 Al Viro <viro@zeniv.linux.org.uk>

copy_from_iter_nocache()

BTW, do we want memcpy_nocache()?

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# abb78f87 24-Nov-2014 Al Viro <viro@zeniv.linux.org.uk>

new helper: iov_iter_kvec()

initialization of kvec-backed iov_iter

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# a604ec7e 23-Nov-2014 Al Viro <viro@zeniv.linux.org.uk>

csum_and_copy_..._iter()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# a280455f 27-Nov-2014 Al Viro <viro@zeniv.linux.org.uk>

iov_iter.c: handle ITER_KVEC directly

... without bothering with copy_..._user()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# c35e0248 01-Aug-2014 Matthew Wilcox <willy@infradead.org>

Add copy_to_iter(), copy_from_iter() and iov_iter_zero()

For DAX, we want to be able to copy between iovecs and kernel addresses
that don't necessarily have a struct page. This is a fairly simple
rearrangement for bvec iters to kmap the pages outside and pass them in,
but for user iovecs it gets more complicated because we might try various
different ways to kmap the memory. Duplicating the existing logic works
out best in this case.

We need to be able to write zeroes to an iovec for reads from unwritten
ranges in a file. This is performed by the new iov_iter_zero() function,
again patterned after the existing code that handles iovec iterators.

[AV: and export the buggers...]

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 2c80929c 24-Sep-2014 Miklos Szeredi <mszeredi@suse.cz>

fuse: honour max_read and max_write in direct_io mode

The third argument of fuse_get_user_pages() "nbytesp" refers to the number of
bytes a caller asked to pack into fuse request. This value may be lesser
than capacity of fuse request or iov_iter. So fuse_get_user_pages() must
ensure that *nbytesp won't grow.

Now, when helper iov_iter_get_pages() performs all hard work of extracting
pages from iov_iter, it can be done by passing properly calculated
"maxsize" to the helper.

The other caller of iov_iter_get_pages() (dio_refill_pages()) doesn't need
this capability, so pass LONG_MAX as the maxsize argument here.

Fixes: c9c37e2e6378 ("fuse: switch to iov_iter_get_pages()")
Reported-by: Werner Baumann <werner.baumann@onlinehome.de>
Tested-by: Maxim Patlasov <mpatlasov@parallels.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# c7f3888a 18-Jun-2014 Al Viro <viro@zeniv.linux.org.uk>

switch iov_iter_get_pages() to passing maximal number of pages

... instead of maximal size.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# ac5ccdba 19-Jun-2014 Michael S. Tsirkin <mst@redhat.com>

iovec: move memcpy_from/toiovecend to lib/iovec.c

ERROR: "memcpy_fromiovecend" [drivers/vhost/vhost_scsi.ko] undefined!

commit 9f977ef7b671f6169eca78bf40f230fe84b7c7e5
vhost-scsi: Include prot_bytes into expected data transfer length
in target-pending makes drivers/vhost/scsi.c call memcpy_fromiovecend().
This function is not available when CONFIG_NET is not enabled.

socket.h already includes uio.h, so no callers need updating.

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>


# 0b86dbf6 23-Jun-2014 Al Viro <viro@ZenIV.linux.org.uk>

Fix 32-bit regression in block device read(2)

blkdev_read_iter() wants to cap the iov_iter by the amount of data
remaining to the end of device. That's what iov_iter_truncate() is for
(trim iter->count if it's above the given limit). So far, so good, but
the argument of iov_iter_truncate() is size_t, so on 32bit boxen (in
case of a large device) we end up with that upper limit truncated down
to 32 bits *before* comparing it with iter->count.

Easily fixed by making iov_iter_truncate() take 64bit argument - it does
the right thing after such change (we only reach the assignment in there
when the current value of iter->count is greater than the limit, i.e.
for anything that would get truncated we don't reach the assignment at
all) and that argument is not the new value of iter->count - it's an
upper limit for such.

The overhead of passing u64 is not an issue - the thing is inlined, so
callers passing size_t won't pay any penalty.

Reported-and-tested-by: Theodore Tso <tytso@mit.edu>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
Tested-by: Bruno Wolff III <bruno@wolff.to>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 62a8067a 04-Apr-2014 Al Viro <viro@zeniv.linux.org.uk>

bio_vec-backed iov_iter

New variant of iov_iter - ITER_BVEC in iter->type, backed with
bio_vec array instead of iovec one. Primitives taught to deal
with such beasts, __swap_write() switched to using that kind
of iov_iter.

Note that bio_vec is just a <page, offset, length> triple - there's
nothing block-specific about it. I've left the definition where it
was, but took it from under ifdef CONFIG_BLOCK.

Next target: ->splice_write()...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# b42b15fd 03-Apr-2014 Al Viro <viro@zeniv.linux.org.uk>

lustre: get rid of messing with iovecs

* switch to ->read_iter/->write_iter
* keep a pointer to iov_iter instead of iov/nr_segs
* do not modify iovecs; use iov_iter_truncate()/iov_iter_advance() and
a new primitive - iov_iter_reexpand() (expand previously truncated
iterator) istead.
* (racy) check for lustre VMAs intersecting with iovecs kept for now as
for_each_iov() loop.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# f0d1bec9 03-Apr-2014 Al Viro <viro@zeniv.linux.org.uk>

new helper: copy_page_from_iter()

parallel to copy_page_to_iter(). pipe_write() switched to it (and became
->write_iter()).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 0c949334 22-Mar-2014 Al Viro <viro@zeniv.linux.org.uk>

iov_iter_truncate()

Now It Can Be Done(tm) - we don't need to do iov_shorten() in
generic_file_direct_write() anymore, now that all ->direct_IO()
instances are converted to proper iov_iter methods and honour
iter->count and iter->iov_offset properly.

Get rid of count/ocount arguments of generic_file_direct_write(),
while we are at it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 91f79c43 21-Mar-2014 Al Viro <viro@zeniv.linux.org.uk>

new helper: iov_iter_get_pages_alloc()

same as iov_iter_get_pages(), except that pages array is allocated
(kmalloc if possible, vmalloc if that fails) and left for caller to
free. Lustre and NFS ->direct_IO() switched to it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# f67da30c 18-Mar-2014 Al Viro <viro@zeniv.linux.org.uk>

new helper: iov_iter_npages()

counts the pages covered by iov_iter, up to given limit.
do_block_direct_io() and fuse_iter_npages() switched to
it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 7b2c99d1 15-Mar-2014 Al Viro <viro@zeniv.linux.org.uk>

new helper: iov_iter_get_pages()

iov_iter_get_pages(iter, pages, maxsize, &start) grabs references pinning
the pages of up to maxsize of (contiguous) data from iter. Returns the
amount of memory grabbed or -error. In case of success, the requested
area begins at offset start in pages[0] and runs through pages[1], etc.
Less than requested amount might be returned - either because the contiguous
area in the beginning of iterator is smaller than requested, or because
the kernel failed to pin that many pages.

direct-io.c switched to using iov_iter_get_pages()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 71d8e532 05-Mar-2014 Al Viro <viro@zeniv.linux.org.uk>

start adding the tag to iov_iter

For now, just use the same thing we pass to ->direct_IO() - it's all
iovec-based at the moment. Pass it explicitly to iov_iter_init() and
account for kvec vs. iovec in there, by the same kludge NFS ->direct_IO()
uses.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 886a3911 05-Mar-2014 Al Viro <viro@zeniv.linux.org.uk>

new primitive: iov_iter_alignment()

returns the value aligned as badly as the worst remaining segment
in iov_iter is. Use instead of open-coded equivalents.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# e7c24607 10-Apr-2014 Al Viro <viro@zeniv.linux.org.uk>

kill iov_iter_copy_from_user()

all callers can use copy_page_from_iter() and it actually simplifies
them.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 6e58e79d 03-Feb-2014 Al Viro <viro@zeniv.linux.org.uk>

introduce copy_page_to_iter, kill loop over iovec in generic_file_aio_read()

generic_file_aio_read() was looping over the target iovec, with loop over
(source) pages nested inside that. Just set an iov_iter up and pass *that*
to do_generic_file_aio_read(). With copy_page_to_iter() doing all work
of mapping and copying a page to iovec and advancing iov_iter.

Switch shmem_file_aio_read() to the same and kill file_read_actor(), while
we are at it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 92236878 27-Nov-2013 Kent Overstreet <kmo@daterainc.com>

iov_iter: Move iov_iter to uio.h

Signed-off-by: Kent Overstreet <kmo@daterainc.com>


# d2f83e90 16-May-2013 Rusty Russell <rusty@rustcorp.com.au>

Hoist memcpy_fromiovec/memcpy_toiovec into lib/

ERROR: "memcpy_fromiovec" [drivers/vhost/vhost_scsi.ko] undefined!

That function is only present with CONFIG_NET. Turns out that
crypto/algif_skcipher.c also uses that outside net, but it actually
needs sockets anyway.

In addition, commit 6d4f0139d642c45411a47879325891ce2a7c164a added
CONFIG_NET dependency to CONFIG_VMCI for memcpy_toiovec, so hoist
that function and revert that commit too.

socket.h already includes uio.h, so no callers need updating; trying
only broke things fo x86_64 randconfig (thanks Fengguang!).

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# 607ca46e 13-Oct-2012 David Howells <dhowells@redhat.com>

UAPI: (Scripted) Disintegrate include/linux

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Dave Jones <davej@redhat.com>


# 812ed032 29-Jul-2009 Jiri Slaby <jirislaby@kernel.org>

uio: mark uio.h functions __KERNEL__ only

To avoid userspace build failures such as:

.../linux/uio.h:37: error: expected `=', `,', `;', `asm' or `__attribute__' before `iov_length'
.../linux/uio.h:47: error: expected declaration specifiers or `...' before `size_t'

move uio functions inside a __KERNEL__ block.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# e0807061 16-Jul-2007 Christoph Hellwig <hch@lst.de>

remove odd and misleading comments from uio.h

Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 1da177e4 16-Apr-2005 Linus Torvalds <torvalds@ppc970.osdl.org>

Linux-2.6.12-rc2

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!