#
a50026bd |
|
05-Mar-2024 |
Linus Torvalds <torvalds@linux-foundation.org> |
iov_iter: get rid of 'copy_mc' flag This flag is only set by one single user: the magical core dumping code that looks up user pages one by one, and then writes them out using their kernel addresses (by using a BVEC_ITER). That actually ends up being a huge problem, because while we do use copy_mc_to_kernel() for this case and it is able to handle the possible machine checks involved, nothing else is really ready to handle the failures caused by the machine check. In particular, as reported by Tong Tiangen, we don't actually support fault_in_iov_iter_readable() on a machine check area. As a result, the usual logic for writing things to a file under a filesystem lock, which involves doing a copy with page faults disabled and then if that fails trying to fault pages in without holding the locks with fault_in_iov_iter_readable() does not work at all. We could decide to always just make the MC copy "succeed" (and fill the destination with zeroes), and that would then create a core dump file that just ignores any machine checks. But honestly, this single special case has been problematic before, and means that all the normal iov_iter code ends up slightly more complex and slower. See for example commit c9eec08bac96 ("iov_iter: Don't deal with iter->copy_mc in memcpy_from_iter_mc()") where David Howells re-organized the code just to avoid having to check the 'copy_mc' flag inside the inner iov_iter loops. So considering that we have exactly one user, and that one user is a non-critical special case that doesn't actually ever trigger in real life (Tong found this with manual error injection), the sane solution is to just decide that the onus of handling the machine check lies on that user instead. Ergo, do the copy_mc_to_kernel() in the core dump logic itself, copying the user data to a stable kernel page before writing it out. Fixes: f1982740f5e7 ("iov_iter: Convert iterate*() to inline funcs") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> Link: https://lore.kernel.org/r/20240305133336.3804360-1-tongtiangen@huawei.com Link: https://lore.kernel.org/all/4e80924d-9c85-f13a-722a-6a5d2b1c225a@huawei.com/ Tested-by: David Howells <dhowells@redhat.com> Reviewed-by: David Howells <dhowells@redhat.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Reported-by: Tong Tiangen <tongtiangen@huawei.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
|
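For orientation, a minimal sketch (not the actual coredump code) of the approach the commit above describes: copy the user page into a stable kernel buffer with copy_mc_to_kernel() and write only that buffer out, so the machine check is consumed here rather than in the generic iov_iter paths. The helper name and error handling are illustrative assumptions.

    #include <linux/highmem.h>
    #include <linux/uaccess.h>
    #include <linux/mm.h>

    /* Hypothetical helper: bounce one user page through a stable kernel
     * buffer. copy_mc_to_kernel() returns the number of bytes NOT copied. */
    static int dump_page_stable(struct page *src, void *bounce /* PAGE_SIZE */)
    {
        void *kaddr = kmap_local_page(src);
        unsigned long left = copy_mc_to_kernel(bounce, kaddr, PAGE_SIZE);

        kunmap_local(kaddr);
        if (left)
            return -EFAULT;   /* a #MC was hit; caller decides what to dump */
        return 0;             /* "bounce" now holds a stable copy to write out */
    }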
#
9fd7874c |
|
04-Dec-2023 |
Jens Axboe <axboe@kernel.dk> |
iov_iter: replace import_single_range() with import_ubuf() With the removal of the 'iov' argument to import_single_range(), the two functions are now fully identical. Convert the import_single_range() callers to import_ubuf(), and remove the former fully. Signed-off-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20231204174827.1258875-3-axboe@kernel.dk Signed-off-by: Christian Brauner <brauner@kernel.org>
|
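A hedged usage sketch of the surviving helper; the buffer, length and direction below are placeholders:

    struct iov_iter iter;
    int ret;

    /* Build a single-segment, user-backed ITER_UBUF iterator directly from a
     * user pointer; returns -EFAULT if the range fails access_ok(). */
    ret = import_ubuf(ITER_SOURCE, ubuf, len, &iter);
    if (ret)
        return ret;
    /* iter now covers ubuf[0..len) and can be handed to ->write_iter() etc. */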
#
6ac805d1 |
|
04-Dec-2023 |
Jens Axboe <axboe@kernel.dk> |
iov_iter: remove unused 'iov' argument from import_single_range() It is entirely unused, just get rid of it. Signed-off-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20231204174827.1258875-2-axboe@kernel.dk Signed-off-by: Christian Brauner <brauner@kernel.org>
|
#
b5f0e20f |
|
25-Sep-2023 |
David Howells <dhowells@redhat.com> |
iov_iter, net: Move hash_and_copy_to_iter() to net/ Move hash_and_copy_to_iter() to be with its only caller in networking code. Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20230925120309.1731676-13-dhowells@redhat.com cc: Alexander Viro <viro@zeniv.linux.org.uk> cc: Jens Axboe <axboe@kernel.dk> cc: Christoph Hellwig <hch@lst.de> cc: Christian Brauner <christian@brauner.io> cc: Matthew Wilcox <willy@infradead.org> cc: Linus Torvalds <torvalds@linux-foundation.org> cc: David Laight <David.Laight@ACULAB.COM> cc: "David S. Miller" <davem@davemloft.net> cc: Eric Dumazet <edumazet@google.com> cc: Jakub Kicinski <kuba@kernel.org> cc: Paolo Abeni <pabeni@redhat.com> cc: linux-block@vger.kernel.org cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org cc: netdev@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
|
#
6d0d4199 |
|
25-Sep-2023 |
David Howells <dhowells@redhat.com> |
iov_iter, net: Move csum_and_copy_to/from_iter() to net/ Move csum_and_copy_to/from_iter() to net code now that the iteration framework can be #included. Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20230925120309.1731676-10-dhowells@redhat.com cc: Alexander Viro <viro@zeniv.linux.org.uk> cc: Jens Axboe <axboe@kernel.dk> cc: Christoph Hellwig <hch@lst.de> cc: Christian Brauner <christian@brauner.io> cc: Matthew Wilcox <willy@infradead.org> cc: Linus Torvalds <torvalds@linux-foundation.org> cc: David Laight <David.Laight@ACULAB.COM> cc: "David S. Miller" <davem@davemloft.net> cc: Eric Dumazet <edumazet@google.com> cc: Jakub Kicinski <kuba@kernel.org> cc: Paolo Abeni <pabeni@redhat.com> cc: linux-block@vger.kernel.org cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org cc: netdev@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
|
#
f1b4cb65 |
|
25-Sep-2023 |
David Howells <dhowells@redhat.com> |
iov_iter: Derive user-backedness from the iterator type Use the iterator type to determine whether an iterator is user-backed or not rather than using a special flag for it. Now that ITER_UBUF and ITER_IOVEC are 0 and 1, they can be checked with a single comparison. Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20230925120309.1731676-7-dhowells@redhat.com cc: Alexander Viro <viro@zeniv.linux.org.uk> cc: Jens Axboe <axboe@kernel.dk> cc: Christoph Hellwig <hch@lst.de> cc: Christian Brauner <christian@brauner.io> cc: Matthew Wilcox <willy@infradead.org> cc: Linus Torvalds <torvalds@linux-foundation.org> cc: David Laight <David.Laight@ACULAB.COM> cc: linux-block@vger.kernel.org cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org Signed-off-by: Christian Brauner <brauner@kernel.org>
|
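Roughly what the predicate can reduce to after this change, sketched under the assumption that ->iter_type holds the enum value directly (the in-tree form may differ):

    static inline bool user_backed_iter(const struct iov_iter *i)
    {
        /* ITER_UBUF == 0 and ITER_IOVEC == 1, so one comparison suffices */
        return i->iter_type <= ITER_IOVEC;
    }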
#
7d9e44a6 |
|
25-Sep-2023 |
David Howells <dhowells@redhat.com> |
iov_iter: Renumber ITER_* constants Renumber the ITER_* iterator-type constants to put things in the same order as in the iteration functions and to group user-backed iterators at the bottom. Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20230925120309.1731676-6-dhowells@redhat.com cc: Alexander Viro <viro@zeniv.linux.org.uk> cc: Jens Axboe <axboe@kernel.dk> cc: Christoph Hellwig <hch@lst.de> cc: Christian Brauner <christian@brauner.io> cc: Matthew Wilcox <willy@infradead.org> cc: Linus Torvalds <torvalds@linux-foundation.org> cc: David Laight <David.Laight@ACULAB.COM> cc: linux-block@vger.kernel.org cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org Signed-off-by: Christian Brauner <brauner@kernel.org>
|
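The resulting ordering, approximately (a sketch; the exact enum in include/linux/uio.h is assumed, not quoted):

    enum iter_type {
        /* user-backed flavours get the lowest values so they are cheap to test */
        ITER_UBUF,
        ITER_IOVEC,
        /* kernel-backed flavours */
        ITER_BVEC,
        ITER_KVEC,
        ITER_XARRAY,
        ITER_DISCARD,
    };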
#
581beb4f |
|
25-Sep-2023 |
David Howells <dhowells@redhat.com> |
iov_iter: Remove last_offset from iov_iter as it was for ITER_PIPE Now that ITER_PIPE has been removed, iov_iter::last_offset is no longer used, so remove it. Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20230925120309.1731676-2-dhowells@redhat.com cc: Alexander Viro <viro@zeniv.linux.org.uk> cc: Jens Axboe <axboe@kernel.dk> cc: Christoph Hellwig <hch@lst.de> cc: Christian Brauner <christian@brauner.io> cc: Matthew Wilcox <willy@infradead.org> cc: Linus Torvalds <torvalds@linux-foundation.org> cc: David Laight <David.Laight@ACULAB.COM> cc: linux-block@vger.kernel.org cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org Signed-off-by: Christian Brauner <brauner@kernel.org>
|
#
1b030698 |
|
09-Jul-2023 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
iov_iter: Add copy_folio_from_iter_atomic() Add a folio wrapper around copy_page_from_iter_atomic(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
|
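The wrapper is essentially a thin folio front-end to the existing page primitive; a sketch with argument types assumed from the page-based helper:

    static inline size_t copy_folio_from_iter_atomic(struct folio *folio,
                    size_t offset, size_t bytes, struct iov_iter *i)
    {
        /* forwards to the page-based primitive */
        return copy_page_from_iter_atomic(&folio->page, offset, bytes, i);
    }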
#
84bd06c6 |
|
14-Jun-2023 |
Christoph Hellwig <hch@lst.de> |
iov_iter: remove iov_iter_get_pages and iov_iter_get_pages_alloc Now that the direct I/O helpers have switched to use iov_iter_extract_pages, these helpers are unused. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20230614140341.521331-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
f5f82cd1 |
|
06-Jun-2023 |
David Howells <dhowells@redhat.com> |
Move netfs_extract_iter_to_sg() to lib/scatterlist.c Move netfs_extract_iter_to_sg() to lib/scatterlist.c as it's going to be used by more than just network filesystems (AF_ALG, for example). Signed-off-by: David Howells <dhowells@redhat.com> cc: Jeff Layton <jlayton@kernel.org> cc: Steve French <sfrench@samba.org> cc: Shyam Prasad N <nspmangalore@gmail.com> cc: Rohith Surabattula <rohiths.msft@gmail.com> cc: Jens Axboe <axboe@kernel.dk> cc: Herbert Xu <herbert@gondor.apana.org.au> cc: "David S. Miller" <davem@davemloft.net> cc: Eric Dumazet <edumazet@google.com> cc: Jakub Kicinski <kuba@kernel.org> cc: Paolo Abeni <pabeni@redhat.com> cc: Matthew Wilcox <willy@infradead.org> cc: linux-crypto@vger.kernel.org cc: linux-cachefs@redhat.com cc: linux-cifs@vger.kernel.org cc: linux-fsdevel@vger.kernel.org cc: netdev@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
#
3fc40265 |
|
22-May-2023 |
David Howells <dhowells@redhat.com> |
iov_iter: Kill ITER_PIPE The ITER_PIPE-type iterator was only used by generic_file_splice_read() and that has been replaced and removed. This leaves ITER_PIPE unused - so remove it too. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> cc: Jens Axboe <axboe@kernel.dk> cc: Al Viro <viro@zeniv.linux.org.uk> cc: David Hildenbrand <david@redhat.com> cc: John Hubbard <jhubbard@nvidia.com> cc: linux-mm@kvack.org cc: linux-block@vger.kernel.org cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20230522135018.2742245-31-dhowells@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
245f0922 |
|
16-Apr-2023 |
Kefeng Wang <wangkefeng.wang@huawei.com> |
mm: hwpoison: coredump: support recovery from dump_user_range() dump_user_range() is used to copy the user page to a coredump file, but if a hardware memory error occurs during the copy, called from __kernel_write_iter() in dump_user_range(), it crashes: CPU: 112 PID: 7014 Comm: mca-recover Not tainted 6.3.0-rc2 #425 pc : __memcpy+0x110/0x260 lr : _copy_from_iter+0x3bc/0x4c8 ... Call trace: __memcpy+0x110/0x260 copy_page_from_iter+0xcc/0x130 pipe_write+0x164/0x6d8 __kernel_write_iter+0x9c/0x210 dump_user_range+0xc8/0x1d8 elf_core_dump+0x308/0x368 do_coredump+0x2e8/0xa40 get_signal+0x59c/0x788 do_signal+0x118/0x1f8 do_notify_resume+0xf0/0x280 el0_da+0x130/0x138 el0t_64_sync_handler+0x68/0xc0 el0t_64_sync+0x188/0x190 Generally, the '->write_iter' of file ops will use copy_page_from_iter() and copy_page_from_iter_atomic(); change memcpy() to copy_mc_to_kernel() in both of them to handle #MC during the source read, which stops coredump processing and kills the task instead of causing a kernel panic. But the source address may not always be a user address, so introduce a new copy_mc flag in struct iov_iter{} to indicate that the iter could do a safe memory copy, and introduce helpers to set/check the flag. For now, it's only used in coredump's dump_user_range(), but it could be extended to other scenarios to fix similar issues. Link: https://lkml.kernel.org/r/20230417045323.11054-1-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Tong Tiangen <tongtiangen@huawei.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
4f80818b |
|
22-Mar-2023 |
Lorenzo Stoakes <lstoakes@gmail.com> |
iov_iter: add copy_page_to_iter_nofault() Provide a means to copy a page to user space from an iterator, aborting if a page fault would occur. This supports compound pages, but may be passed a tail page with an offset extending further into the compound page, so we cannot pass a folio. This allows the function to be called from atomic context and to _try_ to copy to user pages if they are faulted in, aborting if not. The function does not use _copy_to_iter() so as not to trigger might_fault(); this is similar to copy_page_from_iter_atomic(). This is being added so that an iterable form of vread() can be implemented while holding spinlocks. Link: https://lkml.kernel.org/r/19734729defb0f498a76bdec1bef3ac48a3af3e8.1679511146.git.lstoakes@gmail.com Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Reviewed-by: Baoquan He <bhe@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: David Hildenbrand <david@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Liu Shixin <liushixin2@huawei.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
747b1f65 |
|
28-Mar-2023 |
Jens Axboe <axboe@kernel.dk> |
iov_iter: overlay struct iovec and ubuf/len Add an internal struct iovec that we can return as a pointer, with the fields of the iovec overlapping with the ITER_UBUF ubuf and length fields. Then we can have iter_iov() check for the appropriate type, and return &iter->__ubuf_iovec for ITER_UBUF and iter->__iov for ITER_IOVEC and things will magically work out for a single segment request regardless of either type. Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
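A trimmed sketch of the overlay described above (fields abridged; the exact layout is in include/linux/uio.h):

    struct iov_iter {
        u8 iter_type;
        /* ... */
        union {
            /* ITER_UBUF: a synthetic one-element iovec ... */
            struct iovec __ubuf_iovec;
            /* ... overlaying the regular pointer + count pair */
            struct {
                union {
                    const struct iovec *__iov;   /* ITER_IOVEC */
                    void __user *ubuf;           /* ITER_UBUF  */
                    /* bvec/kvec/xarray pointers elided */
                };
                size_t count;
            };
        };
    };

    /* iter_iov() then resolves to the right thing for either type: */
    static inline const struct iovec *iter_iov(const struct iov_iter *iter)
    {
        if (iter->iter_type == ITER_UBUF)
            return &iter->__ubuf_iovec;
        return iter->__iov;
    }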
#
cd0bd57a |
|
28-Mar-2023 |
Jens Axboe <axboe@kernel.dk> |
iov_iter: set nr_segs = 1 for ITER_UBUF To avoid needing to check if a given user backed iov_iter is of type ITER_IOVEC or ITER_UBUF, set the number of segments for the ITER_UBUF case to 1 as we're carrying a single segment. Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
6eb203e1 |
|
29-Mar-2023 |
Jens Axboe <axboe@kernel.dk> |
iov_iter: remove iov_iter_iovec() No more users are left of this function. Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
95e49cf8 |
|
29-Mar-2023 |
Jens Axboe <axboe@kernel.dk> |
iov_iter: add iter_iov_addr() and iter_iov_len() helpers These just return the address and length of the current iovec segment in the iterator. Convert existing iov_iter_iovec() users to use them instead of getting a copy of the current vec. Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
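Sketch of what the two helpers evaluate to, assuming they are built on iter_iov() plus the current offset:

    /* address and remaining length of the current segment */
    #define iter_iov_addr(iter)  (iter_iov(iter)->iov_base + (iter)->iov_offset)
    #define iter_iov_len(iter)   (iter_iov(iter)->iov_len  - (iter)->iov_offset)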
#
de4f5fed |
|
29-Mar-2023 |
Jens Axboe <axboe@kernel.dk> |
iov_iter: add iter_iovec() helper This returns a pointer to the current iovec entry in the iterator. Only useful with ITER_IOVEC right now, but it prepares us to treat ITER_UBUF and ITER_IOVEC identically for the first segment. Rename struct iov_iter->iov to iov_iter->__iov to find any potentially troublesome spots, and also to prevent anyone from adding new code that accesses iter->iov directly. Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
7d58fe73 |
|
28-Oct-2022 |
David Howells <dhowells@redhat.com> |
iov_iter: Add a function to extract a page list from an iterator Add a function, iov_iter_extract_pages(), to extract a list of pages from an iterator. The pages may be returned with a pin added or nothing, depending on the type of iterator. Add a second function, iov_iter_extract_will_pin(), to determine how the cleanup should be done. There are two cases: (1) ITER_IOVEC or ITER_UBUF iterator. Extracted pages will have pins (FOLL_PIN) obtained on them so that a concurrent fork() will forcibly copy the page so that DMA is done to/from the parent's buffer and is unavailable to/unaffected by the child process. iov_iter_extract_will_pin() will return true for this case. The caller should use something like unpin_user_page() to dispose of the page. (2) Any other sort of iterator. No refs or pins are obtained on the page, the assumption is made that the caller will manage page retention. iov_iter_extract_will_pin() will return false. The pages don't need additional disposal. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jens Axboe <axboe@kernel.dk> cc: Al Viro <viro@zeniv.linux.org.uk> cc: John Hubbard <jhubbard@nvidia.com> cc: David Hildenbrand <david@redhat.com> cc: Matthew Wilcox <willy@infradead.org> cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org Signed-off-by: Steve French <stfrench@microsoft.com>
|
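A hedged usage sketch for the extraction API above; the exact prototype has varied across kernel versions, and the array size and flag value below are illustrative:

    struct page *page_array[16], **pages = page_array;
    size_t offset, npages;
    ssize_t len;

    /* Extract up to 16 pages' worth of buffer; "len" is bytes and "offset" is
     * the byte offset into the first page. Flags 0 means no peer-to-peer DMA. */
    len = iov_iter_extract_pages(iter, &pages, SIZE_MAX, 16, 0, &offset);
    if (len < 0)
        return len;

    /* ... perform the I/O against pages[] ... */

    npages = DIV_ROUND_UP(offset + len, PAGE_SIZE);
    if (iov_iter_extract_will_pin(iter)) {
        /* user-backed iterators: pages were pinned (FOLL_PIN) */
        while (npages--)
            unpin_user_page(pages[npages]);
    }
    /* other iterator types: nothing to release */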
#
f62e52d1 |
|
18-Jan-2023 |
David Howells <dhowells@redhat.com> |
iov_iter: Define flags to qualify page extraction. Define flags to qualify page extraction to pass into iov_iter_*_pages*() rather than passing in FOLL_* flags. For now only a flag to allow peer-to-peer DMA is supported. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> cc: Al Viro <viro@zeniv.linux.org.uk> cc: Logan Gunthorpe <logang@deltatee.com> cc: linux-fsdevel@vger.kernel.org cc: linux-block@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
|
#
2ad9bd83 |
|
05-Jan-2023 |
Jens Axboe <axboe@kernel.dk> |
iov: add import_ubuf() Like import_single_range(), but for ITER_UBUF. Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
de4eda9d |
|
15-Sep-2022 |
Al Viro <viro@zeniv.linux.org.uk> |
use less confusing names for iov_iter direction initializers READ/WRITE proved to be actively confusing - the meanings are "data destination, as used with read(2)" and "data source, as used with write(2)", but people keep interpreting those as "we read data from it" and "we write data to it", i.e. exactly the wrong way. Call them ITER_DEST and ITER_SOURCE - at least that is harder to misinterpret... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
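Illustration of the naming (a sketch; the kvec and lengths are placeholders): the direction describes the data, not the syscall.

    struct kvec kv = { .iov_base = buf, .iov_len = len };
    struct iov_iter iter;

    /* a read fills the buffer: the iterator is the data DESTINATION */
    iov_iter_kvec(&iter, ITER_DEST, &kv, 1, len);

    /* a write drains the buffer: the iterator is the data SOURCE */
    iov_iter_kvec(&iter, ITER_SOURCE, &kv, 1, len);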
#
d8207640 |
|
21-Oct-2022 |
Logan Gunthorpe <logang@deltatee.com> |
iov_iter: introduce iov_iter_get_pages_[alloc_]flags() Add iov_iter_get_pages_flags() and iov_iter_get_pages_alloc_flags() which take a flags argument that is passed to get_user_pages_fast(). This is so that FOLL_PCI_P2PDMA can be passed when appropriate. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20221021174116.7200-4-logang@deltatee.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
7187440d |
|
08-Sep-2022 |
Dan Carpenter <dan.carpenter@oracle.com> |
iov_iter: use "maxpages" parameter This was intended to be "maxpages" instead of INT_MAX. There is only one caller and it passes INT_MAX so this does not affect runtime. Fixes: b93235e68921 ("tls: cap the output scatter list to something reasonable") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
eba2d3d7 |
|
10-Jun-2022 |
Al Viro <viro@zeniv.linux.org.uk> |
get rid of non-advancing variants mechanical change; will be further massaged in subsequent commits Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
1ef255e2 |
|
09-Jun-2022 |
Al Viro <viro@zeniv.linux.org.uk> |
iov_iter: advancing variants of iov_iter_get_pages{,_alloc}() Most of the users immediately follow successful iov_iter_get_pages() with advancing by the amount it had returned. Provide inline wrappers doing that, convert trivial open-coded uses of those. BTW, iov_iter_get_pages() never returns more than it had been asked to; such checks in cifs ought to be removed someday... Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
10f525a8 |
|
15-Jun-2022 |
Al Viro <viro@zeniv.linux.org.uk> |
ITER_PIPE: cache the type of last buffer We often need to find whether the last buffer is anon or not, and currently it's rather clumsy: check if ->iov_offset is non-zero (i.e. that the pipe is not empty); if so, get the corresponding pipe_buffer and check its ->ops; if it's &default_pipe_buf_ops, we have an anon buffer. Let's replace the use of ->iov_offset (which is nowhere near similar to its role for other flavours) with signed field (->last_offset), with the following rules: empty (no buffers occupied): 0; anon, with bytes up to N-1 filled: N; zero-copy, with bytes up to N-1 filled: -N. That way abs(i->last_offset) is equal to what used to be in i->iov_offset and empty vs. anon vs. zero-copy can be distinguished by the sign of i->last_offset. Checks for "should we extend the last buffer or should we start a new one?" become easier to follow that way. Note that most of the operations can only be done in a sane state - i.e. when the pipe has nothing past the current position of iterator. About the only thing that could be done outside of that state is iov_iter_advance(), which transitions to the sane state by truncating the pipe. There are only two cases where we leave the sane state: 1) iov_iter_get_pages()/iov_iter_get_pages_alloc(). Will be dealt with later, when we make get_pages advancing - the callers are actually happier that way. 2) iov_iter copied, then something is put into the copy. Since they share the underlying pipe, the original gets behind. When we decide that we are done with the copy (original is not usable until then) we advance the original. direct_io used to be done that way; nowadays it operates on the original and we do iov_iter_revert() to discard the excessive data. At the moment there's nothing in the kernel that could do that to ITER_PIPE iterators, so this reason for insane state is theoretical right now. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
fcb14cb1 |
|
22-May-2022 |
Al Viro <viro@zeniv.linux.org.uk> |
new iov_iter flavour - ITER_UBUF Equivalent of single-segment iovec. Initialized by iov_iter_ubuf(), checked for by iter_is_ubuf(), otherwise behaves like ITER_IOVEC ones. We are going to expose the things like ->write_iter() et.al. to those in subsequent commits. New predicate (user_backed_iter()) that is true for ITER_IOVEC and ITER_UBUF; places like direct-IO handling should use that for checking that pages we modify after getting them from iov_iter_get_pages() would need to be dirtied. DO NOT assume that replacing iter_is_iovec() with user_backed_iter() will solve all problems - there's code that uses iter_is_iovec() to decide how to poke around in iov_iter guts and for that the predicate replacement obviously won't suffice. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
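A hedged sketch of initializing and testing the new flavour (the direction constants were still READ/WRITE when this commit landed; they were renamed later, see the de4eda9d entry above):

    struct iov_iter iter;

    /* single-segment equivalent of an iovec, straight from a user pointer */
    iov_iter_ubuf(&iter, READ, user_buf, len);

    if (user_backed_iter(&iter)) {
        /* true for ITER_UBUF and ITER_IOVEC: pages obtained via
         * iov_iter_get_pages() must be dirtied by the caller */
    }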
#
0e3c3b90 |
|
06-Jun-2022 |
Al Viro <viro@zeniv.linux.org.uk> |
No need of likely/unlikely on calls of check_copy_size() it's inline and unlikely() inside of it (including the implicit one in WARN_ON_ONCE()) suffice to convince the compiler that getting false from check_copy_size() is unlikely. Spotted-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
cfa320f7 |
|
10-Jun-2022 |
Keith Busch <kbusch@kernel.org> |
iov: introduce iov_iter_aligned The existing iov_iter_alignment() function returns the logical OR of address and length. For cases where address and length need to be considered separately, introduce a helper function to which a caller can specify length and address masks that indicate if the iov is unaligned. Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220610195830.3574005-9-kbusch@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
b93235e6 |
|
02-Feb-2022 |
Jakub Kicinski <kuba@kernel.org> |
tls: cap the output scatter list to something reasonable TLS recvmsg() passes user pages as destination for decrypt. The decrypt operation is repeated record by record, each record being 16kB, max. TLS allocates an sg_table and uses iov_iter_get_pages() to populate it with enough pages to fit the decrypted record. Even though we decrypt a single message at a time we size the sg_table based on the entire length of the iovec. This leads to unnecessarily large allocations, risking triggering OOM conditions. Use iov_iter_truncate() / iov_iter_reexpand() to construct a "capped" version of iov_iter_npages(). Alternatively we could parametrize iov_iter_npages() to take the size as arg instead of using i->count, or do something else.. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
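A sketch of the "capped npages" construction the message describes, built from iov_iter_truncate()/iov_iter_reexpand(); the helper name here is hypothetical:

    static int npages_capped(struct iov_iter *i, int maxpages, size_t cap)
    {
        size_t orig = iov_iter_count(i);
        int npages;

        /* temporarily shrink the iterator to one record's worth ... */
        iov_iter_truncate(i, cap);
        npages = iov_iter_npages(i, maxpages);
        /* ... and restore the original length afterwards */
        iov_iter_reexpand(i, orig);
        return npages;
    }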
#
d9c19d32 |
|
18-Oct-2021 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
iov_iter: Add copy_folio_to_iter() This wrapper around copy_page_to_iter() works because copy_page_to_iter() handles compound pages correctly. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: William Kucharski <william.kucharski@oracle.com>
|
#
e17f7a0b |
|
15-Dec-2021 |
Christoph Hellwig <hch@lst.de> |
uio: remove copy_from_iter_flushcache() and copy_mc_to_iter() These two wrappers are never used. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211215084508.435401-2-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
3337ab08 |
|
11-Jul-2021 |
Andreas Gruenbacher <agruenba@redhat.com> |
iov_iter: Introduce nofault flag to disable page faults Introduce a new nofault flag to indicate to iov_iter_get_pages not to fault in user pages. This is implemented by passing the FOLL_NOFAULT flag to get_user_pages, which causes get_user_pages to fail when it would otherwise fault in a page. We'll use the ->nofault flag to prevent iomap_dio_rw from faulting in pages when page faults are not allowed. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
cdd591fc |
|
05-Jul-2021 |
Andreas Gruenbacher <agruenba@redhat.com> |
iov_iter: Introduce fault_in_iov_iter_writeable Introduce a new fault_in_iov_iter_writeable helper for safely faulting in an iterator for writing. Uses get_user_pages() to fault in the pages without actually writing to them, which would be destructive. We'll use fault_in_iov_iter_writeable in gfs2 once we've determined that the iterator passed to .read_iter isn't in memory. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
a6294593 |
|
02-Aug-2021 |
Andreas Gruenbacher <agruenba@redhat.com> |
iov_iter: Turn iov_iter_fault_in_readable into fault_in_iov_iter_readable Turn iov_iter_fault_in_readable into a function that returns the number of bytes not faulted in, similar to copy_to_user, instead of returning a non-zero value when any of the requested pages couldn't be faulted in. This supports the existing users that require all pages to be faulted in as well as new users that are happy if any pages can be faulted in. Rename iov_iter_fault_in_readable to fault_in_iov_iter_readable to make sure this change doesn't silently break things. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
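The usual locked-copy pattern this return convention supports, as a hedged sketch; dst, bytes and i are assumed to be set up by the caller, and the lock handling is elided to comments:

    size_t copied;

    retry:
        /* copy with page faults disabled while holding the fs locks */
        pagefault_disable();
        copied = copy_from_iter(dst, bytes, i);
        pagefault_enable();

        if (unlikely(copied != bytes)) {
            iov_iter_revert(i, copied);          /* undo the partial advance */
            /* ... drop the filesystem locks here ... */
            if (fault_in_iov_iter_readable(i, bytes) == bytes)
                return -EFAULT;                  /* nothing could be faulted in */
            /* ... retake the locks ... */
            goto retry;
        }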
#
7dedd3e1 |
|
10-Sep-2021 |
Jens Axboe <axboe@kernel.dk> |
Revert "iov_iter: track truncated size" This reverts commit 2112ff5ce0c1128fe7b4d19cfe7f2b8ce5b595fa. We no longer need to track the truncation count, the one user that did need it has been converted to using iov_iter_restore() instead. Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
8fb0f47a |
|
10-Sep-2021 |
Jens Axboe <axboe@kernel.dk> |
iov_iter: add helper to save iov_iter state In an ideal world, when someone is passed an iov_iter and returns X bytes, then X bytes would have been consumed/advanced from the iov_iter. But we have use cases that always consume the entire iterator, a few examples of that are iomap and bdev O_DIRECT. This means we cannot rely on the state of the iov_iter once we've called ->read_iter() or ->write_iter(). This would be easier if we didn't always have to deal with truncate of the iov_iter, as rewinding would be trivial without that. We recently added a commit to track the truncate state, but that grew the iov_iter by 8 bytes and wasn't the best solution. Implement a helper to save enough of the iov_iter state to sanely restore it after we've called the read/write iterator helpers. This currently only works for IOVEC/BVEC/KVEC as that's all we need, support for other iterator types are left as an exercise for the reader. Link: https://lore.kernel.org/linux-fsdevel/CAHk-=wiacKV4Gh-MYjteU0LwNBSGpWrK-Ov25HdqB1ewinrFPg@mail.gmail.com/ Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
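A hedged sketch of the intended use; io_uring-style retry is the motivating caller, and the surrounding names here are illustrative:

    struct iov_iter_state state;
    ssize_t ret;

    iov_iter_save_state(&iter, &state);      /* cheap snapshot, no iovec copy */
    ret = call_write_iter(file, &kiocb, &iter);
    if (ret == -EAGAIN) {
        /* ->write_iter() may have consumed the iterator; rewind it */
        iov_iter_restore(&iter, &state);
        /* ... arm a retry with the restored iterator ... */
    }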
#
2112ff5c |
|
23-Aug-2021 |
Pavel Begunkov <asml.silence@gmail.com> |
iov_iter: track truncated size Remember how many bytes were truncated and reverted back. Because not reexpanded iterators don't always work well with reverting, we may need to know that to reexpand ourselves when needed. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
f0b65f39 |
|
30-Apr-2021 |
Al Viro <viro@zeniv.linux.org.uk> |
iov_iter: replace iov_iter_copy_from_user_atomic() with iterator-advancing variant Replacement is called copy_page_from_iter_atomic(); unlike the old primitive the callers do *not* need to do iov_iter_advance() after it. In case when they end up consuming less than they'd been given they need to do iov_iter_revert() on everything they had not consumed. That, however, needs to be done only on slow paths. All in-tree callers converted. And that kills the last user of iterate_all_kinds() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
8409a0d2 |
|
02-May-2021 |
Al Viro <viro@zeniv.linux.org.uk> |
sanitize iov_iter_fault_in_readable() 1) constify iov_iter argument; we are not advancing it in this primitive. 2) cap the amount requested by the amount of data in iov_iter. All existing callers should've been safe, but the check is really cheap and doing it here makes for easier analysis, as well as more consistent semantics among the primitives. 3) don't bother with iterate_iovec(). Explicit loop is not any harder to follow, and we get rid of standalone iterate_iovec() users - it's only used by iterate_and_advance() and (soon to be gone) iterate_all_kinds(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
8cd54c1c |
|
22-Apr-2021 |
Al Viro <viro@zeniv.linux.org.uk> |
iov_iter: separate direction from flavour Instead of having them mixed in iter->type, use separate ->iter_type and ->data_source (u8 and bool resp.) And don't bother with (pseudo-) bitmap for the former - microoptimizations from being able to check if the flavour is one of two values are not worth the confusion for optimizer. It can't prove that we never get e.g. ITER_IOVEC | ITER_PIPE, so we end up with extra headache. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
4b6c132b |
|
29-Apr-2021 |
Al Viro <viro@zeniv.linux.org.uk> |
iov_iter: switch ..._full() variants of primitives to use of iov_iter_revert() Use corresponding plain variants, revert on short copy. That's the way it should've been done from the very beginning, except that we didn't have iov_iter_revert() back then... [fixed another braino caught by Qian Cai <quic_qiancai@quicinc.com>] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
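Roughly the shape of the conversion (a sketch mirroring what the message describes, not a verbatim copy of the in-tree helper):

    static __always_inline
    bool copy_from_iter_full(void *addr, size_t bytes, struct iov_iter *i)
    {
        size_t copied = copy_from_iter(addr, bytes, i);

        if (likely(copied == bytes))
            return true;
        /* short copy: rewind so the caller sees an untouched iterator */
        iov_iter_revert(i, copied);
        return false;
    }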
#
66cd071a |
|
09-Apr-2021 |
David Howells <dhowells@redhat.com> |
iov_iter: Remove iov_iter_for_each_range() Remove iov_iter_for_each_range() as it's no longer used with the removal of lustre. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
3d14ec1f |
|
25-Apr-2021 |
David Howells <dhowells@redhat.com> |
iov_iter: Four fixes for ITER_XARRAY Fix four things[1] in the patch that adds ITER_XARRAY[2]: (1) Remove the address_space struct predeclaration. This is a holdover from when it was ITER_MAPPING. (2) Fix _copy_mc_to_iter() so that the xarray segment updates count and iov_offset in the iterator before returning. (3) Fix iov_iter_alignment() to not loop in the xarray case. Because the middle pages are all whole pages, only the end pages need be considered - and this can be reduced to just looking at the start position in the xarray and the iteration size. (4) Fix iov_iter_advance() to limit the size of the advance to no more than the remaining iteration size. Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Al Viro <viro@zeniv.linux.org.uk> Tested-by: Jeff Layton <jlayton@redhat.com> Tested-by: Dave Wysochanski <dwysocha@redhat.com> Link: https://lore.kernel.org/r/YIVrJT8GwLI0Wlgx@zeniv-ca.linux.org.uk [1] Link: https://lore.kernel.org/r/161918448151.3145707.11541538916600921083.stgit@warthog.procyon.org.uk [2]
|
#
7ff506207 |
|
10-Feb-2020 |
David Howells <dhowells@redhat.com> |
iov_iter: Add ITER_XARRAY Add an iterator, ITER_XARRAY, that walks through a set of pages attached to an xarray, starting at a given page and offset and walking for the specified amount of bytes. The iterator supports transparent huge pages. The iterate_xarray() macro calls the helper function with rcu_access() helped. I think that this is only a problem for iov_iter_for_each_range() - and that returns an error for ITER_XARRAY (also, this function does not appear to be called). The caller must guarantee that the pages are all present and they must be locked using PG_locked, PG_writeback or PG_fscache to prevent them from going away or being migrated whilst they're being accessed. This is useful for copying data from socket buffers to inodes in network filesystems and for transferring data between those inodes and the cache using direct I/O. Whilst it is true that ITER_BVEC could be used instead, that would require a bio_vec array to be allocated to refer to all the pages - which should be redundant if inode->i_pages also points to all these pages. Note that older versions of this patch implemented an ITER_MAPPING instead, which was almost the same. Changes: v7: - Rename iter_xarray_copy_pages() to iter_xarray_populate_pages()[1]. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-and-tested-by: Jeff Layton <jlayton@kernel.org> Tested-by: Dave Wysochanski <dwysocha@redhat.com> Tested-By: Marc Dionne <marc.dionne@auristor.com> cc: Alexander Viro <viro@zeniv.linux.org.uk> cc: Matthew Wilcox (Oracle) <willy@infradead.org> cc: Christoph Hellwig <hch@lst.de> cc: linux-mm@kvack.org cc: linux-cachefs@redhat.com cc: linux-afs@lists.infradead.org cc: linux-nfs@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/3577430.1579705075@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/158861205740.340223.16592990225607814022.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/159465785214.1376674.6062549291411362531.stgit@warthog.procyon.org.uk/ Link: https://lore.kernel.org/r/160588477334.3465195.3608963255682568730.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161118129703.1232039.17141248432017826976.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161161026313.2537118.14676007075365418649.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/161340386671.1303470.10752208972482479840.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/161539527815.286939.14607323792547049341.stgit@warthog.procyon.org.uk/ # v4 Link: https://lore.kernel.org/r/161653786033.2770958.14154191921867463240.stgit@warthog.procyon.org.uk/ # v5 Link: https://lore.kernel.org/r/161789064740.6155.11932541175173658065.stgit@warthog.procyon.org.uk/ # v6 Link: https://lore.kernel.org/r/27c369a8f42bb8a617672b2dc0126a5c6df5a050.camel@kernel.org [1]
|
#
52cbd23a |
|
03-Feb-2021 |
Willem de Bruijn <willemb@google.com> |
udp: fix skb_copy_and_csum_datagram with odd segment sizes When iteratively computing a checksum with csum_block_add, track the offset "pos" to correctly rotate in csum_block_add when offset is odd. The open coded implementation of skb_copy_and_csum_datagram did this. With the switch to __skb_datagram_iter calling csum_and_copy_to_iter, pos was reinitialized to 0 on each call. Bring back the pos by passing it along with the csum to the callback. Changes v1->v2 - pass csum value, instead of csump pointer (Alexander Duyck) Link: https://lore.kernel.org/netdev/20210128152353.GB27281@optiplex/ Fixes: 950fcaecd5cc ("datagram: consolidate datagram copy to iter helpers") Reported-by: Oliver Graute <oliver.graute@gmail.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20210203192952.1849843-1-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
ec6347bb |
|
05-Oct-2020 |
Dan Williams <dan.j.williams@intel.com> |
x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, kernel}() In reaction to a proposal to introduce a memcpy_mcsafe_fast() implementation Linus points out that memcpy_mcsafe() is poorly named relative to communicating the scope of the interface. Specifically what addresses are valid to pass as source, destination, and what faults / exceptions are handled. Of particular concern is that even though x86 might be able to handle the semantics of copy_mc_to_user() with its common copy_user_generic() implementation other archs likely need / want an explicit path for this case: On Fri, May 1, 2020 at 11:28 AM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > On Thu, Apr 30, 2020 at 6:21 PM Dan Williams <dan.j.williams@intel.com> wrote: > > > > However now I see that copy_user_generic() works for the wrong reason. > > It works because the exception on the source address due to poison > > looks no different than a write fault on the user address to the > > caller, it's still just a short copy. So it makes copy_to_user() work > > for the wrong reason relative to the name. > > Right. > > And it won't work that way on other architectures. On x86, we have a > generic function that can take faults on either side, and we use it > for both cases (and for the "in_user" case too), but that's an > artifact of the architecture oddity. > > In fact, it's probably wrong even on x86 - because it can hide bugs - > but writing those things is painful enough that everybody prefers > having just one function. Replace a single top-level memcpy_mcsafe() with either copy_mc_to_user(), or copy_mc_to_kernel(). Introduce an x86 copy_mc_fragile() name as the rename for the low-level x86 implementation formerly named memcpy_mcsafe(). It is used as the slow / careful backend that is supplanted by a fast copy_mc_generic() in a follow-on patch. One side-effect of this reorganization is that separating copy_mc_64.S to its own file means that perf no longer needs to track dependencies for its memcpy_64.S benchmarks. [ bp: Massage a bit. ] Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: <stable@vger.kernel.org> Link: http://lore.kernel.org/r/CAHk-=wjSqtXAqfUJxFtWNwmguFASTgB0dz1dT3V-78Quiezqbg@mail.gmail.com Link: https://lkml.kernel.org/r/160195561680.2163339.11574962055305783722.stgit@dwillia2-desk3.amr.corp.intel.com
|
#
89cd35c5 |
|
24-Sep-2020 |
Christoph Hellwig <hch@lst.de> |
iov_iter: transparently handle compat iovecs in import_iovec Use in_compat_syscall() to import either native or compat iovecs, and remove the now superfluous compat_import_iovec. This removes the need for special compat logic in most callers, and the remaining ones can still be simplified by using __import_iovec with a bool compat parameter. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
bfdc5970 |
|
24-Sep-2020 |
Christoph Hellwig <hch@lst.de> |
iov_iter: refactor rw_copy_check_uvector and import_iovec Split rw_copy_check_uvector into two new helpers with more sensible calling conventions: - iovec_from_user copies an iovec from userspace either into the provided stack buffer if it fits, or allocates a new buffer for it. Returns the actually used iovec. It also verifies that iov_len does fit a signed type, and handles compat iovecs if the compat flag is set. - __import_iovec consolidates the native and compat versions of import_iovec. It calls iovec_from_user, then validates each iovec actually points to user addresses, and ensures the total length doesn't overflow. This has two major implications: - the access_process_vm case loses the total length checking, which wasn't required anyway, given that each call receives two iovecs for the local and remote side of the operation, and it verifies the total length on the local side already. - instead of a single loop there now are two loops over the iovecs. Given that the iovecs are cache hot this doesn't make a major difference. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
7999096f |
|
12-Jun-2020 |
Herbert Xu <herbert@gondor.apana.org.au> |
iov_iter: Move unnecessary inclusion of crypto/hash.h The header file linux/uio.h includes crypto/hash.h which pulls in most of the Crypto API. Since linux/uio.h is used throughout the kernel this means that every tiny bit of change to the Crypto API causes the entire kernel to get rebuilt. This patch fixes this by moving it into lib/iov_iter.c instead where it is actually used. This patch also fixes the ifdef to use CRYPTO_HASH instead of just CRYPTO which does not guarantee the existence of ahash. Unfortunately a number of drivers were relying on linux/uio.h to provide access to linux/slab.h. This patch adds inclusions of linux/slab.h as detected by build failures. Also skbuff.h was relying on this to provide a declaration for ahash_request. This patch adds a forward declaration instead. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
8cefc107 |
|
15-Nov-2019 |
David Howells <dhowells@redhat.com> |
pipe: Use head and tail pointers for the ring, not cursor and length Convert pipes to use head and tail pointers for the buffer ring rather than pointer and length as the latter requires two atomic ops to update (or a combined op) whereas the former only requires one. (1) The head pointer is the point at which production occurs and points to the slot in which the next buffer will be placed. This is equivalent to pipe->curbuf + pipe->nrbufs. The head pointer belongs to the write-side. (2) The tail pointer is the point at which consumption occurs. It points to the next slot to be consumed. This is equivalent to pipe->curbuf. The tail pointer belongs to the read-side. (3) head and tail are allowed to run to UINT_MAX and wrap naturally. They are only masked off when the array is being accessed, e.g.: pipe->bufs[head & mask] This means that it is not necessary to have a dead slot in the ring as head == tail isn't ambiguous. (4) The ring is empty if "head == tail". A helper, pipe_empty(), is provided for this. (5) The occupancy of the ring is "head - tail". A helper, pipe_occupancy(), is provided for this. (6) The number of free slots in the ring is "pipe->ring_size - occupancy". A helper, pipe_space_for_user() is provided to indicate how many slots userspace may use. (7) The ring is full if "head - tail >= pipe->ring_size". A helper, pipe_full(), is provided for this. Signed-off-by: David Howells <dhowells@redhat.com>
|
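The helpers named above, sketched directly from the rules in the message (the in-tree versions live in include/linux/pipe_fs_i.h):

    static inline bool pipe_empty(unsigned int head, unsigned int tail)
    {
        return head == tail;                        /* rule (4) */
    }

    static inline unsigned int pipe_occupancy(unsigned int head, unsigned int tail)
    {
        return head - tail;                         /* rule (5); wraps safely */
    }

    static inline bool pipe_full(unsigned int head, unsigned int tail,
                                 unsigned int limit)
    {
        return pipe_occupancy(head, tail) >= limit; /* rule (7) */
    }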
#
b6207430 |
|
26-Jun-2019 |
Christoph Hellwig <hch@lst.de> |
block: never take page references for ITER_BVEC If we pass pages through an iov_iter we always already have a reference in the caller. Thus remove the ITER_BVEC_FLAG_NO_REF and don't take reference to pages by default for bvec backed iov_iters. Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
87e5e6da |
|
14-May-2019 |
Jens Axboe <axboe@kernel.dk> |
uio: make import_iovec()/compat_import_iovec() return bytes on success Currently these functions return < 0 on error, and 0 for success. Change that so that we return < 0 on error, but number of bytes for success. Some callers already treat the return value that way, others need a slight tweak. Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
2874c5fd |
|
27-May-2019 |
Thomas Gleixner <tglx@linutronix.de> |
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 3029 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
f5eb4d3b |
|
26-Apr-2019 |
Ming Lei <ming.lei@redhat.com> |
iov_iter: fix iov_iter_type Commit 875f1d0769cd ("iov_iter: add ITER_BVEC_FLAG_NO_REF flag") introduces one extra flag of ITER_BVEC_FLAG_NO_REF, and this flag is stored into iter->type. However, iov_iter_type() doesn't consider the new added flag, fix it by masking this flag in iov_iter_type(). Fixes: 875f1d0769cd ("iov_iter: add ITER_BVEC_FLAG_NO_REF flag") Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
875f1d07 |
|
27-Feb-2019 |
Jens Axboe <axboe@kernel.dk> |
iov_iter: add ITER_BVEC_FLAG_NO_REF flag For ITER_BVEC, if we're holding on to kernel pages, the caller doesn't need to grab a reference to the bvec pages, and drop that same reference on IO completion. This is essentially safe for any ITER_BVEC, but some use cases end up reusing pages and unconditionally dropping a page reference on completion. An example of that is sendfile(2), which ends up being a splice_in + splice_out on the pipe pages. Add a flag that tells us it's fine to not grab a page reference to the bvec pages, since that caller knows not to drop a reference when it's done with the pages. Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
77000bc4 |
|
04-Feb-2019 |
Christoph Hellwig <hch@lst.de> |
uio: remove the unused iov_for_each macro Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
d05f4435 |
|
03-Dec-2018 |
Sagi Grimberg <sagi@lightbitslabs.com> |
iov_iter: introduce hash_and_copy_to_iter helper Allow consumers that want to use iov iterator helpers and also update a predefined hash calculation online when copying data. This is useful when copying incoming network buffers to a local iterator and calculate a digest on the incoming stream. nvme-tcp host driver that will be introduced in following patches is the first consumer via skb_copy_and_hash_datagram_iter. Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sagi Grimberg <sagi@lightbitslabs.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
cb002d07 |
|
03-Dec-2018 |
Sagi Grimberg <sagi@lightbitslabs.com> |
iov_iter: pass void csum pointer to csum_and_copy_to_iter The single caller to csum_and_copy_to_iter is skb_copy_and_csum_datagram and we are trying to unite its logic with skb_copy_datagram_iter by passing a callback to the copy function that we want to apply. Thus, we need to make the checksum pointer private to the function. Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sagi Grimberg <sagi@lightbitslabs.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
9ea9ce04 |
|
19-Oct-2018 |
David Howells <dhowells@redhat.com> |
iov_iter: Add I/O discard iterator Add a new iterator, ITER_DISCARD, that can only be used in READ mode and just discards any data copied to it. This is useful in a network filesystem for discarding any unwanted data sent by a server. Signed-off-by: David Howells <dhowells@redhat.com>
|
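A hedged usage sketch: a discard iterator is a READ-direction bit bucket.

    struct iov_iter sink;

    /* swallow "count" bytes: copies into the iterator succeed, data is dropped */
    iov_iter_discard(&sink, READ, count);
    copy_to_iter(buf, n, &sink);   /* returns min(n, remaining), stores nothing */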
#
aa563d7b |
|
19-Oct-2018 |
David Howells <dhowells@redhat.com> |
iov_iter: Separate type from direction and use accessor functions In the iov_iter struct, separate the iterator type from the iterator direction and use accessor functions to access them in most places. Convert a bunch of places to use switch-statements to access them rather than chains of bitwise-AND statements. This makes it easier to add further iterator types. Also, this can be more efficient as, to implement a switch of small contiguous integers, the compiler can use ~50% fewer compare instructions than it has to use bitwise-and instructions. Further, cease passing the iterator type into the iterator setup function. The iterator function can set that itself. Only the direction is required. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
00e23707 |
|
22-Oct-2018 |
David Howells <dhowells@redhat.com> |
iov_iter: Use accessor function Use accessor functions to access an iterator's type and direction. This allows for the possibility of using some other method of determining the type of iterator than if-chains with bitwise-AND conditions. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
dfb06cba |
|
05-Sep-2018 |
Dave Jiang <dave.jiang@intel.com> |
uaccess: Fix is_source param for check_copy_size() in copy_to_iter_mcsafe() copy_to_iter_mcsafe() is passing in the is_source parameter as "false" to check_copy_size(). This is different than what copy_to_iter() does. Also, the addr parameter passed to check_copy_size() is the source so therefore we should be passing in "true" instead. Fixes: 8780356ef630 ("x86/asm/memcpy_mcsafe: Define copy_to_iter_mcsafe()") Cc: <stable@vger.kernel.org> Reported-by: Fan Du <fan.du@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Reported-by: Wenwei Tao <wenwei.tww@alibaba-inc.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
522239b4 |
|
23-May-2018 |
Dan Williams <dan.j.williams@intel.com> |
uio, lib: Fix CONFIG_ARCH_HAS_UACCESS_MCSAFE compilation Add a common Kconfig CONFIG_ARCH_HAS_UACCESS_MCSAFE that archs can optionally select, and fixup the declaration of _copy_to_iter_mcsafe(). Fixes: 8780356ef630 ("x86/asm/memcpy_mcsafe: Define copy_to_iter_mcsafe()") Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
8780356e |
|
03-May-2018 |
Dan Williams <dan.j.williams@intel.com> |
x86/asm/memcpy_mcsafe: Define copy_to_iter_mcsafe() Use the updated memcpy_mcsafe() implementation to define copy_user_mcsafe() and copy_to_iter_mcsafe(). The most significant difference from typical copy_to_iter() is that the ITER_KVEC and ITER_BVEC iterator types can fail to complete a full transfer. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: hch@lst.de Cc: linux-fsdevel@vger.kernel.org Cc: linux-nvdimm@lists.01.org Link: http://lkml.kernel.org/r/152539239150.31796.9189779163576449784.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
09cf698a |
|
17-Feb-2017 |
Al Viro <viro@zeniv.linux.org.uk> |
new primitive: iov_iter_for_each_range() For kvec and bvec: feeds segments to given callback as long as it returns 0. For iovec and pipe: fails. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
faea1329 |
|
24-Sep-2017 |
Al Viro <viro@zeniv.linux.org.uk> |
kill iov_shorten() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
c43aeb19 |
|
10-Jul-2017 |
Al Viro <viro@zeniv.linux.org.uk> |
fix brown paperbag bug in inlined copy_..._iter() "copied nothing" == "return 0", not "return full size". Fixes: aa28de275a24 "iov_iter/hardening: move object size checks to inlined part" Spotted-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
aa28de27 |
|
29-Jun-2017 |
Al Viro <viro@zeniv.linux.org.uk> |
iov_iter/hardening: move object size checks to inlined part There we actually have useful information about object sizes. Note: this patch has them done for all iov_iter flavours. Right now we do them twice in iovec case, but that'll change very shortly. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
0aed55af |
|
29-May-2017 |
Dan Williams <dan.j.williams@intel.com> |
x86, uaccess: introduce copy_from_iter_flushcache for pmem / cache-bypass operations The pmem driver has a need to transfer data with a persistent memory destination and be able to rely on the fact that the destination writes are not cached. It is sufficient for the writes to be flushed to a cpu-store-buffer (non-temporal / "movnt" in x86 terms), as we expect userspace to call fsync() to ensure data-writes have reached a power-fail-safe zone in the platform. The fsync() triggers a REQ_FUA or REQ_FLUSH to the pmem driver which will turn around and fence previous writes with an "sfence". Implement a __copy_from_user_inatomic_flushcache, memcpy_page_flushcache, and memcpy_flushcache, that guarantee that the destination buffer is not dirty in the cpu cache on completion. The new copy_from_iter_flushcache and sub-routines will be used to replace the "pmem api" (include/linux/pmem.h + arch/x86/include/asm/pmem.h). The availability of copy_from_iter_flushcache() and memcpy_flushcache() are gated by the CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE config symbol, and fallback to copy_from_iter_nocache() and plain memcpy() otherwise. This is meant to satisfy the concern from Linus that if a driver wants to do something beyond the normal nocache semantics it should be something private to that driver [1], and Al's concern that anything uaccess related belongs with the rest of the uaccess code [2]. The first consumer of this interface is a new 'copy_from_iter' dax operation so that pmem can inject cache maintenance operations without imposing this overhead on other dax-capable drivers. [1]: https://lists.01.org/pipermail/linux-nvdimm/2017-January/008364.html [2]: https://lists.01.org/pipermail/linux-nvdimm/2017-April/009942.html Cc: <x86@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Matthew Wilcox <mawilcox@microsoft.com> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
27c0e374 |
|
17-Feb-2017 |
Al Viro <viro@zeniv.linux.org.uk> |
[iov_iter] new primitive: iov_iter_revert() opposite to iov_iter_advance(); the caller is responsible for never using it to move back past the initial position. Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
cbbd26b8 |
|
01-Nov-2016 |
Al Viro <viro@zeniv.linux.org.uk> |
[iov_iter] new primitives - copy_from_iter_full() and friends copy_from_iter_full(), copy_from_iter_full_nocache() and csum_and_copy_from_iter_full() - counterparts of copy_from_iter() et.al., advancing iterator only in case of successful full copy and returning whether it had been successful or not. Convert some obvious users. *NOTE* - do not blindly assume that something is a good candidate for those unless you are sure that not advancing iov_iter in failure case is the right thing in this case. Anything that does short read/short write kind of stuff (or is in a loop, etc.) is unlikely to be a good one. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
d3849953 |
|
01-Nov-2016 |
Christoph Hellwig <hch@lst.de> |
fs: decouple READ and WRITE from the block layer ops Move READ and WRITE to kernel.h and don't define them in terms of block layer ops; they are our generic data direction indicators these days and have no more resemblance with the block layer ops. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
|
#
b57332b4 |
|
10-Oct-2016 |
Al Viro <viro@zeniv.linux.org.uk> |
constify iov_iter_count() and iter_is_iovec() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
241699cd |
|
22-Sep-2016 |
Al Viro <viro@zeniv.linux.org.uk> |
new iov_iter flavour: pipe-backed iov_iter variant for passing data into pipe. copy_to_iter() copies data into page(s) it has allocated and stuffs them into the pipe; copy_page_to_iter() stuffs there a reference to the page given to it. Both will try to coalesce if possible. iov_iter_zero() is similar to copy_to_iter(); iov_iter_get_pages() and friends will do as copy_to_iter() would have and return the pages where the data would've been copied. iov_iter_advance() will truncate everything past the spot it has advanced to. New primitive: iov_iter_pipe(), used for initializing those. pipe should be locked all along. Running out of space acts as fault would for iovec-backed ones; in other words, giving it to ->read_iter() may result in short read if the pipe overflows, or -EFAULT if it happens with nothing copied there. In other words, ->read_iter() on those acts pretty much like ->splice_read(). Moreover, all generic_file_splice_read() users, as well as many other ->splice_read() instances can be switched to that scheme - that'll happen in the next commit. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
4bce9f6e |
|
17-Sep-2016 |
Al Viro <viro@zeniv.linux.org.uk> |
get rid of separate multipage fault-in primitives * the only remaining callers of "short" fault-ins are just as happy with generic variants (both in lib/iov_iter.c); switch them to multipage variants, kill the "short" ones * rename the multipage variants to now available plain ones. * get rid of compat macro defining iov_iter_fault_in_multipage_readable by expanding it in its only user. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
d4690f1e |
|
15-Sep-2016 |
Al Viro <viro@ZenIV.linux.org.uk> |
fix iov_iter_fault_in_readable() ... by turning it into what used to be its multipage counterpart Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
357f435d |
|
08-Apr-2016 |
Al Viro <viro@zeniv.linux.org.uk> |
fix the copy vs. map logics in blk_rq_map_user_iov() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
36f7a8a4 |
|
06-Dec-2015 |
Al Viro <viro@zeniv.linux.org.uk> |
iov_iter: constify {csum_and_,}copy_to_iter() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
bd8e0ff9 |
|
17-Mar-2015 |
Omar Sandoval <osandov@osandov.com> |
new helper: iov_iter_rw() Get either READ or WRITE out of iter->type. Signed-off-by: Omar Sandoval <osandov@osandov.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
171a0203 |
|
11-Mar-2015 |
Anton Altaparmakov <anton@tuxera.com> |
VFS: Add iov_iter_fault_in_multipages_readable() similar to iov_iter_fault_in_readable() but differs in that it is not limited to faulting in the first iovec and instead faults in "bytes" bytes, iterating over the iovecs as necessary. Also, instead of only faulting in the first and last page of the range, all pages are faulted in. This function is needed by NTFS when it does multi-page file writes. Signed-off-by: Anton Altaparmakov <anton@tuxera.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
bc917be8 |
|
21-Mar-2015 |
Al Viro <viro@zeniv.linux.org.uk> |
saner iov_iter initialization primitives iovec-backed iov_iter instances are assumed to satisfy several properties: * no more than UIO_MAXIOV elements in iovec array * total size of all ranges is no more than MAX_RW_COUNT * all ranges pass access_ok(). The problem is, invariants of data structures should be established in the primitives creating those data structures, not in the code using those primitives. And iov_iter_init() violates that principle. For a while we managed to get away with that, but once the use of iov_iter started to spread, it didn't take long for shit to hit the fan - missed check in sys_sendto() had introduced a roothole. We _do_ have primitives for importing and validating iovecs (both native and compat ones) and those primitives are almost always followed by shoving the resulting iovec into iov_iter. Life would be considerably simpler (and safer) if we combined those primitives with initializing iov_iter. That gives us two new primitives - import_iovec() and compat_import_iovec(). Calling conventions: iovec = iov_array; err = import_iovec(direction, uvec, nr_segs, ARRAY_SIZE(iov_array), &iovec, &iter); imports user vector into kernel space (into iov_array if it fits, allocated if it doesn't fit or if iovec was NULL), validates it and sets iter up to refer to it. On success 0 is returned and allocated kernel copy (or NULL if the array had fit into caller-supplied one) is returned via iovec. On failure all allocations are undone and -E... is returned. If the total size of ranges exceeds MAX_RW_COUNT, the excess is silently truncated. compat_import_iovec() expects uvec to be a pointer to user array of compat_iovec; otherwise it's identical to import_iovec(). Finally, import_single_range() sets iov_iter backed by single-element iovec covering a user-supplied range - err = import_single_range(direction, address, size, iovec, &iter); does validation and sets iter up. Again, size in excess of MAX_RW_COUNT gets silently truncated. Next commits will be switching the things up to use of those and reducing the amount of iov_iter_init() instances. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
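A sketch of the calling convention spelled out above, in the shape of a typical readv()-style path; sketch_readv() and do_read() are hypothetical, the rest follows the description in the commit message.

    #include <linux/fs.h>
    #include <linux/slab.h>
    #include <linux/uio.h>

    static ssize_t sketch_readv(struct file *file,
                                const struct iovec __user *uvec,
                                unsigned long nr_segs)
    {
            struct iovec iovstack[UIO_FASTIOV];
            struct iovec *iov = iovstack;          /* caller-supplied fast array */
            struct iov_iter iter;
            ssize_t ret;

            ret = import_iovec(READ, uvec, nr_segs, ARRAY_SIZE(iovstack),
                               &iov, &iter);
            if (ret < 0)
                    return ret;                    /* allocations already undone */

            ret = do_read(file, &iter);            /* hypothetical consumer */

            kfree(iov);   /* NULL when iovstack was used; kfree(NULL) is a no-op */
            return ret;
    }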
|
#
4b8164b9 |
|
31-Jan-2015 |
Al Viro <viro@zeniv.linux.org.uk> |
new helper: dup_iter() Copy iter and kmemdup the underlying array for the copy. Returns a pointer to result of kmemdup() to be kfree()'d later. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
57dd8a07 |
|
10-Dec-2014 |
Al Viro <viro@zeniv.linux.org.uk> |
vhost: vhost_scsi_handle_vq() should just use copy_from_user() it has just verified that it asks no more than the length of the first segment of iovec. And with that the last user of stuff in lib/iovec.c is gone. RIP. Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Nicholas A. Bellinger <nab@linux-iscsi.org> Cc: kvm@vger.kernel.org Cc: virtualization@lists.linux-foundation.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
ba7438ae |
|
10-Dec-2014 |
Al Viro <viro@zeniv.linux.org.uk> |
vhost: don't bother copying iovecs in handle_rx(), kill memcpy_toiovecend() Cc: Michael S. Tsirkin <mst@redhat.com> Cc: kvm@vger.kernel.org Cc: virtualization@lists.linux-foundation.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
aad9a1ce |
|
10-Dec-2014 |
Al Viro <viro@zeniv.linux.org.uk> |
vhost: switch vhost get_indirect() to iov_iter, kill memcpy_fromiovec() Cc: Michael S. Tsirkin <mst@redhat.com> Cc: kvm@vger.kernel.org Cc: virtualization@lists.linux-foundation.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
05afcb77 |
|
22-Jan-2015 |
Al Viro <viro@zeniv.linux.org.uk> |
new helper: iov_iter_bvec() similar to iov_iter_kvec(), for ITER_BVEC ones Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
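A minimal initialization sketch (page and len are assumed to be in scope; in kernels of this era the ITER_BVEC flag was or'ed into the direction argument, current ones take plain READ/WRITE):

    struct bio_vec bv = {
            .bv_page   = page,
            .bv_len    = len,
            .bv_offset = 0,
    };
    struct iov_iter iter;

    iov_iter_bvec(&iter, READ, &bv, 1, len);   /* one segment, 'len' bytes */
    /* 'iter' can now be handed to copy_to_iter()/copy_from_iter() and friends */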
|
#
777eda2c |
|
17-Dec-2014 |
Al Viro <viro@zeniv.linux.org.uk> |
new helper: iter_is_iovec() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
218321e7 |
|
24-Nov-2014 |
Al Viro <viro@zeniv.linux.org.uk> |
bury memcpy_toiovec() no users left Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
aa583096 |
|
27-Nov-2014 |
Al Viro <viro@zeniv.linux.org.uk> |
copy_from_iter_nocache() BTW, do we want memcpy_nocache()? Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
abb78f87 |
|
24-Nov-2014 |
Al Viro <viro@zeniv.linux.org.uk> |
new helper: iov_iter_kvec() initialization of kvec-backed iov_iter Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
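Same shape as the bvec variant sketched further up, just with a kernel-pointer segment; buf and len are assumed context.

    struct kvec kv = { .iov_base = buf, .iov_len = len };
    struct iov_iter iter;

    iov_iter_kvec(&iter, READ, &kv, 1, len);   /* wrap a kernel buffer for iov_iter consumers */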
|
#
a604ec7e |
|
23-Nov-2014 |
Al Viro <viro@zeniv.linux.org.uk> |
csum_and_copy_..._iter() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
a280455f |
|
27-Nov-2014 |
Al Viro <viro@zeniv.linux.org.uk> |
iov_iter.c: handle ITER_KVEC directly ... without bothering with copy_..._user() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
c35e0248 |
|
01-Aug-2014 |
Matthew Wilcox <willy@infradead.org> |
Add copy_to_iter(), copy_from_iter() and iov_iter_zero() For DAX, we want to be able to copy between iovecs and kernel addresses that don't necessarily have a struct page. This is a fairly simple rearrangement for bvec iters to kmap the pages outside and pass them in, but for user iovecs it gets more complicated because we might try various different ways to kmap the memory. Duplicating the existing logic works out best in this case. We need to be able to write zeroes to an iovec for reads from unwritten ranges in a file. This is performed by the new iov_iter_zero() function, again patterned after the existing code that handles iovec iterators. [AV: and export the buggers...] Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
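A rough sketch of the DAX-style read these helpers enable (dax_like_read(), kaddr and mapped are invented for illustration): written blocks are copied straight from their kernel address, holes are zero-filled into the destination iovecs.

    static size_t dax_like_read(void *kaddr, size_t len, bool mapped,
                                struct iov_iter *to)
    {
            if (mapped)
                    return copy_to_iter(kaddr, len, to);   /* kernel address -> user iovecs */
            return iov_iter_zero(len, to);                 /* unwritten range reads as zeroes */
    }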
|
#
2c80929c |
|
24-Sep-2014 |
Miklos Szeredi <mszeredi@suse.cz> |
fuse: honour max_read and max_write in direct_io mode The third argument of fuse_get_user_pages() "nbytesp" refers to the number of bytes a caller asked to pack into the fuse request. This value may be less than the capacity of the fuse request or the iov_iter. So fuse_get_user_pages() must ensure that *nbytesp won't grow. Now that the helper iov_iter_get_pages() performs all the hard work of extracting pages from the iov_iter, this can be done by passing a properly calculated "maxsize" to the helper. The other caller of iov_iter_get_pages() (dio_refill_pages()) doesn't need this capability, so pass LONG_MAX as the maxsize argument here. Fixes: c9c37e2e6378 ("fuse: switch to iov_iter_get_pages()") Reported-by: Werner Baumann <werner.baumann@onlinehome.de> Tested-by: Maxim Patlasov <mpatlasov@parallels.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
c7f3888a |
|
18-Jun-2014 |
Al Viro <viro@zeniv.linux.org.uk> |
switch iov_iter_get_pages() to passing maximal number of pages ... instead of maximal size. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
ac5ccdba |
|
19-Jun-2014 |
Michael S. Tsirkin <mst@redhat.com> |
iovec: move memcpy_from/toiovecend to lib/iovec.c ERROR: "memcpy_fromiovecend" [drivers/vhost/vhost_scsi.ko] undefined! commit 9f977ef7b671f6169eca78bf40f230fe84b7c7e5 vhost-scsi: Include prot_bytes into expected data transfer length in target-pending makes drivers/vhost/scsi.c call memcpy_fromiovecend(). This function is not available when CONFIG_NET is not enabled. socket.h already includes uio.h, so no callers need updating. Reported-by: Randy Dunlap <rdunlap@infradead.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
|
#
0b86dbf6 |
|
23-Jun-2014 |
Al Viro <viro@ZenIV.linux.org.uk> |
Fix 32-bit regression in block device read(2) blkdev_read_iter() wants to cap the iov_iter by the amount of data remaining to the end of device. That's what iov_iter_truncate() is for (trim iter->count if it's above the given limit). So far, so good, but the argument of iov_iter_truncate() is size_t, so on 32bit boxen (in case of a large device) we end up with that upper limit truncated down to 32 bits *before* comparing it with iter->count. Easily fixed by making iov_iter_truncate() take 64bit argument - it does the right thing after such change (we only reach the assignment in there when the current value of iter->count is greater than the limit, i.e. for anything that would get truncated we don't reach the assignment at all) and that argument is not the new value of iter->count - it's an upper limit for such. The overhead of passing u64 is not an issue - the thing is inlined, so callers passing size_t won't pay any penalty. Reported-and-tested-by: Theodore Tso <tytso@mit.edu> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Tested-by: Alan Cox <gnomes@lxorguk.ukuu.org.uk> Tested-by: Bruno Wolff III <bruno@wolff.to> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
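To make the width issue concrete, a sketch of the blkdev_read_iter()-style pattern the fix is about (inode, iocb and to are assumed context): the limit stays 64-bit all the way into iov_iter_truncate(), so a device larger than 4GiB no longer wraps on 32-bit builds.

    loff_t size = i_size_read(inode);      /* device size, may not fit in 32 bits */
    loff_t pos  = iocb->ki_pos;

    if (pos >= size)
            return 0;                      /* nothing left to read */
    iov_iter_truncate(to, size - pos);     /* u64 argument: no narrowing to size_t */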
|
#
62a8067a |
|
04-Apr-2014 |
Al Viro <viro@zeniv.linux.org.uk> |
bio_vec-backed iov_iter New variant of iov_iter - ITER_BVEC in iter->type, backed with bio_vec array instead of iovec one. Primitives taught to deal with such beasts, __swap_write() switched to using that kind of iov_iter. Note that bio_vec is just a <page, offset, length> triple - there's nothing block-specific about it. I've left the definition where it was, but took it from under ifdef CONFIG_BLOCK. Next target: ->splice_write()... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
b42b15fd |
|
03-Apr-2014 |
Al Viro <viro@zeniv.linux.org.uk> |
lustre: get rid of messing with iovecs * switch to ->read_iter/->write_iter * keep a pointer to iov_iter instead of iov/nr_segs * do not modify iovecs; use iov_iter_truncate()/iov_iter_advance() and a new primitive - iov_iter_reexpand() (expand a previously truncated iterator) instead. * (racy) check for lustre VMAs intersecting with iovecs kept for now as a for_each_iov() loop. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
f0d1bec9 |
|
03-Apr-2014 |
Al Viro <viro@zeniv.linux.org.uk> |
new helper: copy_page_from_iter() parallel to copy_page_to_iter(). pipe_write() switched to it (and became ->write_iter()). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
0c949334 |
|
22-Mar-2014 |
Al Viro <viro@zeniv.linux.org.uk> |
iov_iter_truncate() Now It Can Be Done(tm) - we don't need to do iov_shorten() in generic_file_direct_write() anymore, now that all ->direct_IO() instances are converted to proper iov_iter methods and honour iter->count and iter->iov_offset properly. Get rid of count/ocount arguments of generic_file_direct_write(), while we are at it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
91f79c43 |
|
21-Mar-2014 |
Al Viro <viro@zeniv.linux.org.uk> |
new helper: iov_iter_get_pages_alloc() same as iov_iter_get_pages(), except that pages array is allocated (kmalloc if possible, vmalloc if that fails) and left for caller to free. Lustre and NFS ->direct_IO() switched to it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
f67da30c |
|
18-Mar-2014 |
Al Viro <viro@zeniv.linux.org.uk> |
new helper: iov_iter_npages() counts the pages covered by iov_iter, up to given limit. do_block_direct_io() and fuse_iter_npages() switched to it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
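A short illustrative use (MAX_PAGES_PER_REQ is a made-up per-request cap): size the page array before pinning anything.

    int npages = iov_iter_npages(iter, MAX_PAGES_PER_REQ);
    struct page **pages = kcalloc(npages, sizeof(*pages), GFP_KERNEL);

    if (!pages)
            return -ENOMEM;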
|
#
7b2c99d1 |
|
15-Mar-2014 |
Al Viro <viro@zeniv.linux.org.uk> |
new helper: iov_iter_get_pages() iov_iter_get_pages(iter, pages, maxsize, &start) grabs references pinning the pages of up to maxsize of (contiguous) data from iter. Returns the amount of memory grabbed or -error. In case of success, the requested area begins at offset start in pages[0] and runs through pages[1], etc. Less than requested amount might be returned - either because the contiguous area in the beginning of iterator is smaller than requested, or because the kernel failed to pin that many pages. direct-io.c switched to using iov_iter_get_pages() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
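A hedged sketch of the convention described above, using the four-argument prototype this commit introduces (a newer commit listed further up switches the size argument to a maximal page count); the fixed eight-page array and the error handling are illustrative.

    struct page *pages[8];
    size_t start;
    ssize_t bytes;

    bytes = iov_iter_get_pages(iter, pages, 8 * PAGE_SIZE, &start);
    if (bytes <= 0)
            return bytes ? bytes : -EFAULT;
    /* the data begins 'start' bytes into pages[0] and spans 'bytes' bytes;
     * drop the page references when done with them */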
|
#
71d8e532 |
|
05-Mar-2014 |
Al Viro <viro@zeniv.linux.org.uk> |
start adding the tag to iov_iter For now, just use the same thing we pass to ->direct_IO() - it's all iovec-based at the moment. Pass it explicitly to iov_iter_init() and account for kvec vs. iovec in there, by the same kludge NFS ->direct_IO() uses. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
886a3911 |
|
05-Mar-2014 |
Al Viro <viro@zeniv.linux.org.uk> |
new primitive: iov_iter_alignment() returns the value aligned as badly as the worst remaining segment in iov_iter is. Use instead of open-coded equivalents. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
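The typical consumer is the open-coded alignment check it replaces in direct-I/O paths; a minimal sketch (bdev, iter and the early -EINVAL are assumed context):

    unsigned int blksize = bdev_logical_block_size(bdev);

    if (iov_iter_alignment(iter) & (blksize - 1))
            return -EINVAL;   /* some remaining segment is not block-aligned */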
|
#
e7c24607 |
|
10-Apr-2014 |
Al Viro <viro@zeniv.linux.org.uk> |
kill iov_iter_copy_from_user() all callers can use copy_page_from_iter() and it actually simplifies them. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
6e58e79d |
|
03-Feb-2014 |
Al Viro <viro@zeniv.linux.org.uk> |
introduce copy_page_to_iter, kill loop over iovec in generic_file_aio_read() generic_file_aio_read() was looping over the target iovec, with loop over (source) pages nested inside that. Just set an iov_iter up and pass *that* to do_generic_file_aio_read(). With copy_page_to_iter() doing all work of mapping and copying a page to iovec and advancing iov_iter. Switch shmem_file_aio_read() to the same and kill file_read_actor(), while we are at it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
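A sketch of the read-side pattern this enables (page, offset, nr and iter are assumed to come from the surrounding read loop): one call maps the page, copies into however many user iovecs it spans, and advances the iterator.

    size_t copied = copy_page_to_iter(page, offset, nr, iter);

    if (copied < nr)
            return -EFAULT;   /* user buffer faulted part-way through */
    /* 'iter' has been advanced by 'copied'; no manual kmap/iovec bookkeeping */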
|
#
92236878 |
|
27-Nov-2013 |
Kent Overstreet <kmo@daterainc.com> |
iov_iter: Move iov_iter to uio.h Signed-off-by: Kent Overstreet <kmo@daterainc.com>
|
#
d2f83e90 |
|
16-May-2013 |
Rusty Russell <rusty@rustcorp.com.au> |
Hoist memcpy_fromiovec/memcpy_toiovec into lib/ ERROR: "memcpy_fromiovec" [drivers/vhost/vhost_scsi.ko] undefined! That function is only present with CONFIG_NET. Turns out that crypto/algif_skcipher.c also uses that outside net, but it actually needs sockets anyway. In addition, commit 6d4f0139d642c45411a47879325891ce2a7c164a added a CONFIG_NET dependency to CONFIG_VMCI for memcpy_toiovec, so hoist that function and revert that commit too. socket.h already includes uio.h, so no callers need updating; trying only broke things for x86_64 randconfig (thanks Fengguang!). Reported-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
#
607ca46e |
|
13-Oct-2012 |
David Howells <dhowells@redhat.com> |
UAPI: (Scripted) Disintegrate include/linux Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Michael Kerrisk <mtk.manpages@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Dave Jones <davej@redhat.com>
|
#
812ed032 |
|
29-Jul-2009 |
Jiri Slaby <jirislaby@kernel.org> |
uio: mark uio.h functions __KERNEL__ only To avoid userspace build failures such as: .../linux/uio.h:37: error: expected `=', `,', `;', `asm' or `__attribute__' before `iov_length' .../linux/uio.h:47: error: expected declaration specifiers or `...' before `size_t' move uio functions inside a __KERNEL__ block. Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Acked-by: Sam Ravnborg <sam@ravnborg.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
e0807061 |
|
16-Jul-2007 |
Christoph Hellwig <hch@lst.de> |
remove odd and misleading comments from uio.h Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
1da177e4 |
|
16-Apr-2005 |
Linus Torvalds <torvalds@ppc970.osdl.org> |
Linux-2.6.12-rc2 Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!
|