Searched +hist:16 +hist:abef0e (Results 1 - 2 of 2) sorted by relevance
/linux-master/fs/ | ||
H A D | seq_file.c | diff 372904c0 Mon Nov 08 19:35:16 MST 2021 Andy Shevchenko <andriy.shevchenko@linux.intel.com> seq_file: move seq_escape() to a header Move seq_escape() to the header as inliner, for a small kernel text size reduction. Link: https://lkml.kernel.org/r/20211001122917.67228-1-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff cc72181a Wed Jun 30 19:55:46 MDT 2021 Andy Shevchenko <andriy.shevchenko@linux.intel.com> seq_file: drop unused *_escape_mem_ascii() There are no more users of the seq_escape_mem_ascii() followed by string_escape_mem_ascii(). Remove them for good. Link: https://lkml.kernel.org/r/20210504180819.73127-16-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff df561f66 Sun Aug 23 16:36:59 MDT 2020 Gustavo A. R. Silva <gustavoars@kernel.org> treewide: Use fallthrough pseudo-keyword Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> diff 6a2aeab5 Tue Aug 13 16:37:44 MDT 2019 NeilBrown <neilb@suse.com> seq_file: fix problem when seeking mid-record If you use lseek or similar (e.g. pread) to access a location in a seq_file file that is within a record, rather than at a record boundary, then the first read will return the remainder of the record, and the second read will return the whole of that same record (instead of the next record). When seeking to a record boundary, the next record is correctly returned. This bug was introduced by a recent patch (identified below). Before that patch, seq_read() would increment m->index when the last of the buffer was returned (m->count == 0). After that patch, we rely on ->next to increment m->index after filling the buffer - but there was one place where that didn't happen. Link: https://lkml.kernel.org/lkml/877e7xl029.fsf@notabene.neil.brown.name/ Fixes: 1f4aace60b0e ("fs/seq_file.c: simplify seq_file iteration code and interface") Signed-off-by: NeilBrown <neilb@suse.com> Reported-by: Sergei Turchanov <turchanov@farpost.com> Tested-by: Sergei Turchanov <turchanov@farpost.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Markus Elfring <Markus.Elfring@web.de> Cc: <stable@vger.kernel.org> [4.19+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff 0a4c9265 Wed Jan 23 01:48:28 MST 2019 Gustavo A. R. Silva <gustavo@embeddedor.com> fs: mark expected switch fall-throughs In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. This patch fixes the following warnings: fs/affs/affs.h:124:38: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/configfs/dir.c:1692:11: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/configfs/dir.c:1694:7: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/ceph/file.c:249:3: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/ext4/hash.c:233:15: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/ext4/hash.c:246:15: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/ext2/inode.c:1237:7: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/ext2/inode.c:1244:7: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/ext4/indirect.c:1182:6: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/ext4/indirect.c:1188:6: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/ext4/indirect.c:1432:6: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/ext4/indirect.c:1440:6: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/f2fs/node.c:618:8: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/f2fs/node.c:620:8: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/btrfs/ref-verify.c:522:15: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/gfs2/bmap.c:711:7: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/gfs2/bmap.c:722:7: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/jffs2/fs.c:339:6: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/nfsd/nfs4proc.c:429:12: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/ufs/util.h:62:6: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/ufs/util.h:43:6: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/fcntl.c:770:7: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/seq_file.c:319:10: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/libfs.c:148:11: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/libfs.c:150:7: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/signalfd.c:178:7: warning: this statement may fall through [-Wimplicit-fallthrough=] fs/locks.c:1473:16: warning: this statement may fall through [-Wimplicit-fallthrough=] Warning level 3 was used: -Wimplicit-fallthrough=3 This patch is part of the ongoing efforts to enabling -Wimplicit-fallthrough. Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> diff 1f4aace6 Fri Aug 17 16:44:41 MDT 2018 NeilBrown <neilb@suse.com> fs/seq_file.c: simplify seq_file iteration code and interface The documentation for seq_file suggests that it is necessary to be able to move the iterator to a given offset, however that is not the case. If the iterator is stored in the private data and is stable from one read() syscall to the next, it is only necessary to support first/next interactions. Implementing this in a client is a little clumsy. - if ->start() is given a pos of zero, it should go to start of sequence. - if ->start() is given the name pos that was given to the most recent next() or start(), it should restore the iterator to state just before that last call - if ->start is given another number, it should set the iterator one beyond the start just before the last ->start or ->next call. Also, the documentation says that the implementation can interpret the pos however it likes (other than zero meaning start), but seq_file increments the pos sometimes which does impose on the implementation. This patch simplifies the interface for first/next iteration and simplifies the code, while maintaining complete backward compatability. Now: - if ->start() is given a pos of zero, it should return an iterator placed at the start of the sequence - if ->start() is given a non-zero pos, it should return the iterator in the same state it was after the last ->start or ->next. This is particularly useful for interators which walk the multiple chains in a hash table, e.g. using rhashtable_walk*. See fs/gfs2/glock.c and drivers/staging/lustre/lustre/llite/vvp_dev.c A large part of achieving this is to *always* call ->next after ->show has successfully stored all of an entry in the buffer. Never just increment the index instead. Also: - always pass &m->index to ->start() and ->next(), never a temp variable - don't clear ->from when ->count is zero, as ->from is dead when ->count is zero. Some ->next functions do not increment *pos when they return NULL. To maintain compatability with this, we still need to increment m->index in one place, if ->next didn't increment it. Note that such ->next functions are buggy and should be fixed. A simple demonstration is dd if=/proc/swaps bs=1000 skip=1 Choose any block size larger than the size of /proc/swaps. This will always show the whole last line of /proc/swaps. This patch doesn't work around buggy next() functions for this case. [neilb@suse.com: ensure ->from is valid] Link: http://lkml.kernel.org/r/87601ryb8a.fsf@notabene.neil.brown.name Signed-off-by: NeilBrown <neilb@suse.com> Acked-by: Jonathan Corbet <corbet@lwn.net> [docs] Tested-by: Jann Horn <jannh@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff d1be35cb Tue Apr 10 17:31:16 MDT 2018 Andrei Vagin <avagin@openvz.org> proc: add seq_put_decimal_ull_width to speed up /proc/pid/smaps seq_put_decimal_ull_w(m, str, val, width) prints a decimal number with a specified minimal field width. It is equivalent of seq_printf(m, "%s%*d", str, width, val), but it works much faster. == test_smaps.py num = 0 with open("/proc/1/smaps") as f: for x in xrange(10000): data = f.read() f.seek(0, 0) == == Before patch == $ time python test_smaps.py real 0m4.593s user 0m0.398s sys 0m4.158s == After patch == $ time python test_smaps.py real 0m3.828s user 0m0.413s sys 0m3.408s $ perf -g record python test_smaps.py == Before patch == - 79.01% 3.36% python [kernel.kallsyms] [k] show_smap.isra.33 - 75.65% show_smap.isra.33 + 48.85% seq_printf + 15.75% __walk_page_range + 9.70% show_map_vma.isra.23 0.61% seq_puts == After patch == - 75.51% 4.62% python [kernel.kallsyms] [k] show_smap.isra.33 - 70.88% show_smap.isra.33 + 24.82% seq_put_decimal_ull_w + 19.78% __walk_page_range + 12.74% seq_printf + 11.08% show_map_vma.isra.23 + 1.68% seq_puts [akpm@linux-foundation.org: fix drivers/of/unittest.c build] Link: http://lkml.kernel.org/r/20180212074931.7227-1-avagin@openvz.org Signed-off-by: Andrei Vagin <avagin@openvz.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff a7c3e901 Mon May 08 16:57:09 MDT 2017 Michal Hocko <mhocko@suse.com> mm: introduce kv[mz]alloc helpers Patch series "kvmalloc", v5. There are many open coded kmalloc with vmalloc fallback instances in the tree. Most of them are not careful enough or simply do not care about the underlying semantic of the kmalloc/page allocator which means that a) some vmalloc fallbacks are basically unreachable because the kmalloc part will keep retrying until it succeeds b) the page allocator can invoke a really disruptive steps like the OOM killer to move forward which doesn't sound appropriate when we consider that the vmalloc fallback is available. As it can be seen implementing kvmalloc requires quite an intimate knowledge if the page allocator and the memory reclaim internals which strongly suggests that a helper should be implemented in the memory subsystem proper. Most callers, I could find, have been converted to use the helper instead. This is patch 6. There are some more relying on __GFP_REPEAT in the networking stack which I have converted as well and Eric Dumazet was not opposed [2] to convert them as well. [1] http://lkml.kernel.org/r/20170130094940.13546-1-mhocko@kernel.org [2] http://lkml.kernel.org/r/1485273626.16328.301.camel@edumazet-glaptop3.roam.corp.google.com This patch (of 9): Using kmalloc with the vmalloc fallback for larger allocations is a common pattern in the kernel code. Yet we do not have any common helper for that and so users have invented their own helpers. Some of them are really creative when doing so. Let's just add kv[mz]alloc and make sure it is implemented properly. This implementation makes sure to not make a large memory pressure for > PAGE_SZE requests (__GFP_NORETRY) and also to not warn about allocation failures. This also rules out the OOM killer as the vmalloc is a more approapriate fallback than a disruptive user visible action. This patch also changes some existing users and removes helpers which are specific for them. In some cases this is not possible (e.g. ext4_kvmalloc, libcfs_kvzalloc) because those seems to be broken and require GFP_NO{FS,IO} context which is not vmalloc compatible in general (note that the page table allocation is GFP_KERNEL). Those need to be fixed separately. While we are at it, document that __vmalloc{_node} about unsupported gfp mask because there seems to be a lot of confusion out there. kvmalloc_node will warn about GFP_KERNEL incompatible (which are not superset) flags to catch new abusers. Existing ones would have to die slowly. [sfr@canb.auug.org.au: f2fs fixup] Link: http://lkml.kernel.org/r/20170320163735.332e64b7@canb.auug.org.au Link: http://lkml.kernel.org/r/20170306103032.2540-2-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Reviewed-by: Andreas Dilger <adilger@dilger.ca> [ext4 part] Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: John Hubbard <jhubbard@nvidia.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff 088bf2ff Thu Aug 25 16:17:11 MDT 2016 Vegard Nossum <vegard.nossum@oracle.com> fs/seq_file: fix out-of-bounds read seq_read() is a nasty piece of work, not to mention buggy. It has (I think) an old bug which allows unprivileged userspace to read beyond the end of m->buf. I was getting these: BUG: KASAN: slab-out-of-bounds in seq_read+0xcd2/0x1480 at addr ffff880116889880 Read of size 2713 by task trinity-c2/1329 CPU: 2 PID: 1329 Comm: trinity-c2 Not tainted 4.8.0-rc1+ #96 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014 Call Trace: kasan_object_err+0x1c/0x80 kasan_report_error+0x2cb/0x7e0 kasan_report+0x4e/0x80 check_memory_region+0x13e/0x1a0 kasan_check_read+0x11/0x20 seq_read+0xcd2/0x1480 proc_reg_read+0x10b/0x260 do_loop_readv_writev.part.5+0x140/0x2c0 do_readv_writev+0x589/0x860 vfs_readv+0x7b/0xd0 do_readv+0xd8/0x2c0 SyS_readv+0xb/0x10 do_syscall_64+0x1b3/0x4b0 entry_SYSCALL64_slow_path+0x25/0x25 Object at ffff880116889100, in cache kmalloc-4096 size: 4096 Allocated: PID = 1329 save_stack_trace+0x26/0x80 save_stack+0x46/0xd0 kasan_kmalloc+0xad/0xe0 __kmalloc+0x1aa/0x4a0 seq_buf_alloc+0x35/0x40 seq_read+0x7d8/0x1480 proc_reg_read+0x10b/0x260 do_loop_readv_writev.part.5+0x140/0x2c0 do_readv_writev+0x589/0x860 vfs_readv+0x7b/0xd0 do_readv+0xd8/0x2c0 SyS_readv+0xb/0x10 do_syscall_64+0x1b3/0x4b0 return_from_SYSCALL_64+0x0/0x6a Freed: PID = 0 (stack is not available) Memory state around the buggy address: ffff88011688a000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff88011688a080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffff88011688a100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ^ ffff88011688a180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88011688a200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== Disabling lock debugging due to kernel taint This seems to be the same thing that Dave Jones was seeing here: https://lkml.org/lkml/2016/8/12/334 There are multiple issues here: 1) If we enter the function with a non-empty buffer, there is an attempt to flush it. But it was not clearing m->from after doing so, which means that if we try to do this flush twice in a row without any call to traverse() in between, we are going to be reading from the wrong place -- the splat above, fixed by this patch. 2) If there's a short write to userspace because of page faults, the buffer may already contain multiple lines (i.e. pos has advanced by more than 1), but we don't save the progress that was made so the next call will output what we've already returned previously. Since that is a much less serious issue (and I have a headache after staring at seq_read() for the past 8 hours), I'll leave that for now. Link: http://lkml.kernel.org/r/1471447270-32093-1-git-send-email-vegard.nossum@oracle.com Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Reported-by: Dave Jones <davej@codemonkey.org.uk> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> diff 37607102 Wed Sep 09 16:38:33 MDT 2015 Andy Shevchenko <andriy.shevchenko@linux.intel.com> seq_file: provide an analogue of print_hex_dump() This introduces a new helper and switches current users to use it. All patches are compiled tested. kmemleak is tested via its own test suite. This patch (of 6): The new seq_hex_dump() is a complete analogue of print_hex_dump(). We have few users of this functionality already. It allows to reduce their codebase. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Joe Perches <joe@perches.com> Cc: Tadeusz Struk <tadeusz.struk@intel.com> Cc: Helge Deller <deller@gmx.de> Cc: Ingo Tuchscherer <ingo.tuchscherer@de.ibm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Vladimir Kondratiev <qca_vkondrat@qca.qualcomm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
H A D | read_write.c | diff 73065126 Thu Nov 30 07:16:24 MST 2023 Amir Goldstein <amir73il@gmail.com> fs: use do_splice_direct() for nfsd/ksmbd server-side-copy nfsd/ksmbd call vfs_copy_file_range() with flag COPY_FILE_SPLICE to perform kernel copy between two files on any two filesystems. Splicing input file, while holding file_start_write() on the output file which is on a different sb, posses a risk for fanotify related deadlocks. We only need to call splice_file_range() from within the context of ->copy_file_range() filesystem methods with file_start_write() held. To avoid the possible deadlocks, always use do_splice_direct() instead of splice_file_range() for the kernel copy fallback in vfs_copy_file_range() without holding file_start_write(). Reported-and-tested-by: Bert Karwatzki <spasswolf@web.de> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20231130141624.3338942-4-amir73il@gmail.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org> diff da40448c Thu Nov 30 07:16:23 MST 2023 Amir Goldstein <amir73il@gmail.com> fs: move file_start_write() into direct_splice_actor() The callers of do_splice_direct() hold file_start_write() on the output file. This may cause file permission hooks to be called indirectly on an overlayfs lower layer, which is on the same filesystem of the output file and could lead to deadlock with fanotify permission events. To fix this potential deadlock, move file_start_write() from the callers into the direct_splice_actor(), so file_start_write() will not be held while splicing from the input file. Suggested-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20231128214258.GA2398475@perftesting/ Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20231130141624.3338942-3-amir73il@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org> diff 488e8f68 Thu Nov 30 07:16:22 MST 2023 Amir Goldstein <amir73il@gmail.com> fs: fork splice_file_range() from do_splice_direct() In preparation of calling do_splice_direct() without file_start_write() held, create a new helper splice_file_range(), to be called from context of ->copy_file_range() methods instead of do_splice_direct(). Currently, the only difference is that splice_file_range() does not take flags argument and that it asserts that file_start_write() is held, but we factor out a common helper do_splice_direct_actor() that will be used later. Use the new helper from __ceph_copy_file_range(), that was incorrectly passing to do_splice_direct() the copy flags argument as splice flags. The value of copy flags in ceph is always 0, so it is a smenatic bug fix. Move the declaration of both helpers to linux/splice.h. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20231130141624.3338942-2-amir73il@gmail.com Acked-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> diff 3d5cd491 Wed Nov 22 05:27:14 MST 2023 Amir Goldstein <amir73il@gmail.com> fs: create file_write_started() helper Convenience wrapper for sb_write_started(file_inode(inode)->i_sb)), which has a single occurrence in the code right now. Document the false negatives of those helpers, which makes them unusable to assert that sb_start_write() is not held. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20231122122715.2561213-16-amir73il@gmail.com Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christian Brauner <brauner@kernel.org> diff b8e1425b Sun Jul 16 05:47:14 MDT 2023 Amir Goldstein <amir73il@gmail.com> fs: move permission hook out of do_iter_read() We recently moved fsnotify hook, rw_verify_area() and other checks from do_iter_write() out to its two callers. for consistency, do the same thing for do_iter_read() - move the rw_verify_area() checks and fsnotify hook to the callers vfs_iter_read() and vfs_readv(). This aligns those vfs helpers with the pattern used in vfs_read() and vfs_iocb_iter_read() and the vfs write helpers, where all the checks are in the vfs helpers and the do_* or call_* helpers do the work. This is needed for fanotify "pre content" events. Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20231122122715.2561213-13-amir73il@gmail.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org> diff 1c8aa833 Sun Jul 16 05:47:14 MDT 2023 Amir Goldstein <amir73il@gmail.com> fs: move permission hook out of do_iter_write() In many of the vfs helpers, the rw_verity_area() checks are called before taking sb_start_write(), making them "start-write-safe". do_iter_write() is an exception to this rule. do_iter_write() has two callers - vfs_iter_write() and vfs_writev(). Move rw_verify_area() and other checks from do_iter_write() out to its callers to make them "start-write-safe". Move also the fsnotify_modify() hook to align with similar pattern used in vfs_write() and other vfs helpers. This is needed for fanotify "pre content" events. Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20231122122715.2561213-12-amir73il@gmail.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org> diff 95e49cf8 Wed Mar 29 09:16:45 MDT 2023 Jens Axboe <axboe@kernel.dk> iov_iter: add iter_iov_addr() and iter_iov_len() helpers These just return the address and length of the current iovec segment in the iterator. Convert existing iov_iter_iovec() users to use them instead of getting a copy of the current vec. Signed-off-by: Jens Axboe <axboe@kernel.dk> diff bdeb77bc Sat Jul 16 22:37:10 MDT 2022 Andrei Vagin <avagin@gmail.com> fs: sendfile handles O_NONBLOCK of out_fd sendfile has to return EAGAIN if out_fd is nonblocking and the write into it would block. Here is a small reproducer for the problem: #define _GNU_SOURCE /* See feature_test_macros(7) */ #include <fcntl.h> #include <stdio.h> #include <unistd.h> #include <errno.h> #include <sys/stat.h> #include <sys/types.h> #include <sys/sendfile.h> #define FILE_SIZE (1UL << 30) int main(int argc, char **argv) { int p[2], fd; if (pipe2(p, O_NONBLOCK)) return 1; fd = open(argv[1], O_RDWR | O_TMPFILE, 0666); if (fd < 0) return 1; ftruncate(fd, FILE_SIZE); if (sendfile(p[1], fd, 0, FILE_SIZE) == -1) { fprintf(stderr, "FAIL\n"); } if (sendfile(p[1], fd, 0, FILE_SIZE) != -1 || errno != EAGAIN) { fprintf(stderr, "FAIL\n"); } return 0; } It worked before b964bf53e540, it is stuck after b964bf53e540, and it works again with this fix. This regression occurred because do_splice_direct() calls pipe_write that handles O_NONBLOCK. Here is a trace log from the reproducer: 1) | __x64_sys_sendfile64() { 1) | do_sendfile() { 1) | __fdget() 1) | rw_verify_area() 1) | __fdget() 1) | rw_verify_area() 1) | do_splice_direct() { 1) | rw_verify_area() 1) | splice_direct_to_actor() { 1) | do_splice_to() { 1) | rw_verify_area() 1) | generic_file_splice_read() 1) + 74.153 us | } 1) | direct_splice_actor() { 1) | iter_file_splice_write() { 1) | __kmalloc() 1) 0.148 us | pipe_lock(); 1) 0.153 us | splice_from_pipe_next.part.0(); 1) 0.162 us | page_cache_pipe_buf_confirm(); ... 16 times 1) 0.159 us | page_cache_pipe_buf_confirm(); 1) | vfs_iter_write() { 1) | do_iter_write() { 1) | rw_verify_area() 1) | do_iter_readv_writev() { 1) | pipe_write() { 1) | mutex_lock() 1) 0.153 us | mutex_unlock(); 1) 1.368 us | } 1) 1.686 us | } 1) 5.798 us | } 1) 6.084 us | } 1) 0.174 us | kfree(); 1) 0.152 us | pipe_unlock(); 1) + 14.461 us | } 1) + 14.783 us | } 1) 0.164 us | page_cache_pipe_buf_release(); ... 16 times 1) 0.161 us | page_cache_pipe_buf_release(); 1) | touch_atime() 1) + 95.854 us | } 1) + 99.784 us | } 1) ! 107.393 us | } 1) ! 107.699 us | } Link: https://lkml.kernel.org/r/20220415005015.525191-1-avagin@gmail.com Fixes: b964bf53e540 ("teach sendfile(2) to handle send-to-pipe directly") Signed-off-by: Andrei Vagin <avagin@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> diff bdeb77bc Sat Jul 16 22:37:10 MDT 2022 Andrei Vagin <avagin@gmail.com> fs: sendfile handles O_NONBLOCK of out_fd sendfile has to return EAGAIN if out_fd is nonblocking and the write into it would block. Here is a small reproducer for the problem: #define _GNU_SOURCE /* See feature_test_macros(7) */ #include <fcntl.h> #include <stdio.h> #include <unistd.h> #include <errno.h> #include <sys/stat.h> #include <sys/types.h> #include <sys/sendfile.h> #define FILE_SIZE (1UL << 30) int main(int argc, char **argv) { int p[2], fd; if (pipe2(p, O_NONBLOCK)) return 1; fd = open(argv[1], O_RDWR | O_TMPFILE, 0666); if (fd < 0) return 1; ftruncate(fd, FILE_SIZE); if (sendfile(p[1], fd, 0, FILE_SIZE) == -1) { fprintf(stderr, "FAIL\n"); } if (sendfile(p[1], fd, 0, FILE_SIZE) != -1 || errno != EAGAIN) { fprintf(stderr, "FAIL\n"); } return 0; } It worked before b964bf53e540, it is stuck after b964bf53e540, and it works again with this fix. This regression occurred because do_splice_direct() calls pipe_write that handles O_NONBLOCK. Here is a trace log from the reproducer: 1) | __x64_sys_sendfile64() { 1) | do_sendfile() { 1) | __fdget() 1) | rw_verify_area() 1) | __fdget() 1) | rw_verify_area() 1) | do_splice_direct() { 1) | rw_verify_area() 1) | splice_direct_to_actor() { 1) | do_splice_to() { 1) | rw_verify_area() 1) | generic_file_splice_read() 1) + 74.153 us | } 1) | direct_splice_actor() { 1) | iter_file_splice_write() { 1) | __kmalloc() 1) 0.148 us | pipe_lock(); 1) 0.153 us | splice_from_pipe_next.part.0(); 1) 0.162 us | page_cache_pipe_buf_confirm(); ... 16 times 1) 0.159 us | page_cache_pipe_buf_confirm(); 1) | vfs_iter_write() { 1) | do_iter_write() { 1) | rw_verify_area() 1) | do_iter_readv_writev() { 1) | pipe_write() { 1) | mutex_lock() 1) 0.153 us | mutex_unlock(); 1) 1.368 us | } 1) 1.686 us | } 1) 5.798 us | } 1) 6.084 us | } 1) 0.174 us | kfree(); 1) 0.152 us | pipe_unlock(); 1) + 14.461 us | } 1) + 14.783 us | } 1) 0.164 us | page_cache_pipe_buf_release(); ... 16 times 1) 0.161 us | page_cache_pipe_buf_release(); 1) | touch_atime() 1) + 95.854 us | } 1) + 99.784 us | } 1) ! 107.393 us | } 1) ! 107.699 us | } Link: https://lkml.kernel.org/r/20220415005015.525191-1-avagin@gmail.com Fixes: b964bf53e540 ("teach sendfile(2) to handle send-to-pipe directly") Signed-off-by: Andrei Vagin <avagin@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> diff bdeb77bc Sat Jul 16 22:37:10 MDT 2022 Andrei Vagin <avagin@gmail.com> fs: sendfile handles O_NONBLOCK of out_fd sendfile has to return EAGAIN if out_fd is nonblocking and the write into it would block. Here is a small reproducer for the problem: #define _GNU_SOURCE /* See feature_test_macros(7) */ #include <fcntl.h> #include <stdio.h> #include <unistd.h> #include <errno.h> #include <sys/stat.h> #include <sys/types.h> #include <sys/sendfile.h> #define FILE_SIZE (1UL << 30) int main(int argc, char **argv) { int p[2], fd; if (pipe2(p, O_NONBLOCK)) return 1; fd = open(argv[1], O_RDWR | O_TMPFILE, 0666); if (fd < 0) return 1; ftruncate(fd, FILE_SIZE); if (sendfile(p[1], fd, 0, FILE_SIZE) == -1) { fprintf(stderr, "FAIL\n"); } if (sendfile(p[1], fd, 0, FILE_SIZE) != -1 || errno != EAGAIN) { fprintf(stderr, "FAIL\n"); } return 0; } It worked before b964bf53e540, it is stuck after b964bf53e540, and it works again with this fix. This regression occurred because do_splice_direct() calls pipe_write that handles O_NONBLOCK. Here is a trace log from the reproducer: 1) | __x64_sys_sendfile64() { 1) | do_sendfile() { 1) | __fdget() 1) | rw_verify_area() 1) | __fdget() 1) | rw_verify_area() 1) | do_splice_direct() { 1) | rw_verify_area() 1) | splice_direct_to_actor() { 1) | do_splice_to() { 1) | rw_verify_area() 1) | generic_file_splice_read() 1) + 74.153 us | } 1) | direct_splice_actor() { 1) | iter_file_splice_write() { 1) | __kmalloc() 1) 0.148 us | pipe_lock(); 1) 0.153 us | splice_from_pipe_next.part.0(); 1) 0.162 us | page_cache_pipe_buf_confirm(); ... 16 times 1) 0.159 us | page_cache_pipe_buf_confirm(); 1) | vfs_iter_write() { 1) | do_iter_write() { 1) | rw_verify_area() 1) | do_iter_readv_writev() { 1) | pipe_write() { 1) | mutex_lock() 1) 0.153 us | mutex_unlock(); 1) 1.368 us | } 1) 1.686 us | } 1) 5.798 us | } 1) 6.084 us | } 1) 0.174 us | kfree(); 1) 0.152 us | pipe_unlock(); 1) + 14.461 us | } 1) + 14.783 us | } 1) 0.164 us | page_cache_pipe_buf_release(); ... 16 times 1) 0.161 us | page_cache_pipe_buf_release(); 1) | touch_atime() 1) + 95.854 us | } 1) + 99.784 us | } 1) ! 107.393 us | } 1) ! 107.699 us | } Link: https://lkml.kernel.org/r/20220415005015.525191-1-avagin@gmail.com Fixes: b964bf53e540 ("teach sendfile(2) to handle send-to-pipe directly") Signed-off-by: Andrei Vagin <avagin@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
Completed in 221 milliseconds