#
44981351 |
|
02-Feb-2024 |
Bart Van Assche <bvanassche@acm.org> |
block, fs: Restore the per-bio/request data lifetime fields Restore support for passing data lifetime information from filesystems to block drivers. This patch reverts commit b179c98f7697 ("block: Remove request.write_hint") and commit c75e707fe1aa ("block: remove the per-bio/request write hint"). This patch does not modify the size of struct bio because the new bi_write_hint member fills a hole in struct bio. pahole reports the following for struct bio on an x86_64 system with this patch applied: /* size: 112, cachelines: 2, members: 20 */ /* sum members: 110, holes: 1, sum holes: 2 */ /* last cacheline: 48 bytes */ Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240202203926.2478590-7-bvanassche@acm.org Signed-off-by: Christian Brauner <brauner@kernel.org>
|
#
8c052fb3 |
|
08-Jul-2023 |
Jens Axboe <axboe@kernel.dk> |
iomap: support IOCB_DIO_CALLER_COMP If IOCB_DIO_CALLER_COMP is set, utilize that to set kiocb->dio_complete handler and data for that callback. Rather than punt the completion to a workqueue, we pass back the handler and data to the issuer and will get a callback from a safe task context. Using the following fio job to randomly dio write 4k blocks at queue depths of 1..16: fio --name=dio-write --filename=/data1/file --time_based=1 \ --runtime=10 --bs=4096 --rw=randwrite --norandommap --buffered=0 \ --cpus_allowed=4 --ioengine=io_uring --iodepth=$depth shows the following results before and after this patch: Stock Patched Diff ======================================= QD1 155K 162K + 4.5% QD2 290K 313K + 7.9% QD4 533K 597K +12.0% QD8 604K 827K +36.9% QD16 615K 845K +37.4% which shows nice wins all around. If we factored in per-IOP efficiency, the wins look even nicer. This becomes apparent as queue depth rises, as the offloaded workqueue completions runs out of steam. Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
7b3c14d1 |
|
19-Jul-2023 |
Jens Axboe <axboe@kernel.dk> |
iomap: add IOMAP_DIO_INLINE_COMP Rather than gate whether or not we need to punt a dio completion to a workqueue on whether the IO is a write or not, add an explicit flag for it. For now we treat them the same, reads always set the flags and async writes do not. No functional changes in this patch. Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
daa99c5a |
|
20-Jul-2023 |
Jens Axboe <axboe@kernel.dk> |
iomap: only set iocb->private for polled bio iocb->private is only used for polled IO, where the completer will find the bio to poll through that field. Assign it when we're submitting a polled bio, and get rid of the dio->poll_bio indirection. Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
3a0be38c |
|
19-Jul-2023 |
Jens Axboe <axboe@kernel.dk> |
iomap: treat a write through cache the same as FUA Whether we have a write back cache and are using FUA or don't have a write back cache at all is the same situation. Treat them the same. This makes the IOMAP_DIO_WRITE_FUA name a bit misleading, as we have two cases that provide stable writes: 1) Volatile write cache with FUA writes 2) Normal write without a volatile write cache Rename that flag to IOMAP_DIO_STABLE_WRITE to make that clearer, and update some of the FUA comments as well. Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
44842f64 |
|
22-Jul-2023 |
Jens Axboe <axboe@kernel.dk> |
iomap: use an unsigned type for IOMAP_DIO_* defines IOMAP_DIO_DIRTY shifts by 31 bits, which makes UBSAN unhappy. Clean up all the defines by making the shifted value an unsigned value. Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reported-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
3486237c |
|
19-Jul-2023 |
Jens Axboe <axboe@kernel.dk> |
iomap: cleanup up iomap_dio_bio_end_io() Make the logic a bit easier to follow: 1) Add a release_bio out path, as everybody needs to touch that, and have our bio ref check jump there if it's non-zero. 2) Add a kiocb local variable. 3) Add comments for each of the three conditions (sync, inline, or async workqueue punt). No functional changes in this patch. Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
8ee93b4b |
|
01-Jun-2023 |
Christoph Hellwig <hch@lst.de> |
iomap: use kiocb_write_and_wait and kiocb_invalidate_pages Use the common helpers for direct I/O page invalidation instead of open coding the logic. This leads to a slight reordering of checks in __iomap_dio_rw to keep the logic straight. Link: https://lkml.kernel.org/r/20230601145904.1385409-9-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Anna Schumaker <anna@kernel.org> Cc: Chao Yu <chao@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Miklos Szeredi <mszeredi@redhat.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Xiubo Li <xiubli@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
c402a9a9 |
|
01-Jun-2023 |
Christoph Hellwig <hch@lst.de> |
filemap: add a kiocb_invalidate_post_direct_write helper Add a helper to invalidate page cache after a dio write. Link: https://lkml.kernel.org/r/20230601145904.1385409-7-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Darrick J. Wong <djwong@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Anna Schumaker <anna@kernel.org> Cc: Chao Yu <chao@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Miklos Szeredi <mszeredi@redhat.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Xiubo Li <xiubli@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
936e114a |
|
01-Jun-2023 |
Christoph Hellwig <hch@lst.de> |
iomap: update ki_pos a little later in iomap_dio_complete Move the ki_pos update down a bit to prepare for a better common helper that invalidates pages based of an iocb. Link: https://lkml.kernel.org/r/20230601145904.1385409-3-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Anna Schumaker <anna@kernel.org> Cc: Chao Yu <chao@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Miklos Szeredi <mszeredi@redhat.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Xiubo Li <xiubli@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
e51bab4e |
|
22-May-2023 |
Christoph Hellwig <hch@lst.de> |
block: Replace BIO_NO_PAGE_REF with BIO_PAGE_REFFED with inverted logic Replace BIO_NO_PAGE_REF with a BIO_PAGE_REFFED flag that has the inverted meaning is only set when a page reference has been acquired that needs to be released by bio_release_pages(). Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> cc: Al Viro <viro@zeniv.linux.org.uk> cc: Jens Axboe <axboe@kernel.dk> cc: Jan Kara <jack@suse.cz> cc: Matthew Wilcox <willy@infradead.org> cc: Logan Gunthorpe <logang@deltatee.com> cc: linux-block@vger.kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230522205744.2825689-4-dhowells@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
a450f497 |
|
22-May-2023 |
David Howells <dhowells@redhat.com> |
iomap: Don't get an reference on ZERO_PAGE for direct I/O block zeroing ZERO_PAGE can't go away, no need to hold an extra reference. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> cc: Al Viro <viro@zeniv.linux.org.uk> cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230522205744.2825689-2-dhowells@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
3fd41721 |
|
21-Apr-2023 |
Ritesh Harjani (IBM) <ritesh.list@gmail.com> |
iomap: Add DIO tracepoints Add trace_iomap_dio_rw_begin, trace_iomap_dio_rw_queued and trace_iomap_dio_complete tracepoint. trace_iomap_dio_rw_queued is mostly only to know that the request was queued and -EIOCBQUEUED was returned. It is mostly trace_iomap_dio_rw_begin & trace_iomap_dio_complete which has all the details. <example output log> a.out-2073 [006] 134.225717: iomap_dio_rw_begin: dev 7:7 ino 0xe size 0x0 offset 0x0 length 0x1000 done_before 0x0 flags DIRECT|WRITE dio_flags DIO_FORCE_WAIT aio 1 a.out-2073 [006] 134.226234: iomap_dio_complete: dev 7:7 ino 0xe size 0x1000 offset 0x1000 flags DIRECT|WRITE aio 1 error 0 ret 4096 a.out-2074 [006] 136.225975: iomap_dio_rw_begin: dev 7:7 ino 0xe size 0x1000 offset 0x0 length 0x1000 done_before 0x0 flags DIRECT dio_flags aio 1 a.out-2074 [006] 136.226173: iomap_dio_rw_queued: dev 7:7 ino 0xe size 0x1000 offset 0x1000 length 0x0 ksoftirqd/3-31 [003] 136.226389: iomap_dio_complete: dev 7:7 ino 0xe size 0x1000 offset 0x1000 flags DIRECT aio 1 error 0 ret 4096 a.out-2075 [003] 141.674969: iomap_dio_rw_begin: dev 7:7 ino 0xe size 0x1000 offset 0x0 length 0x1000 done_before 0x0 flags DIRECT|WRITE dio_flags aio 1 a.out-2075 [003] 141.676085: iomap_dio_rw_queued: dev 7:7 ino 0xe size 0x1000 offset 0x1000 length 0x0 kworker/2:0-27 [002] 141.676432: iomap_dio_complete: dev 7:7 ino 0xe size 0x1000 offset 0x1000 flags DIRECT|WRITE aio 1 error 0 ret 4096 a.out-2077 [006] 143.443746: iomap_dio_rw_begin: dev 7:7 ino 0xe size 0x1000 offset 0x0 length 0x1000 done_before 0x0 flags DIRECT dio_flags aio 1 a.out-2077 [006] 143.443866: iomap_dio_rw_queued: dev 7:7 ino 0xe size 0x1000 offset 0x1000 length 0x0 ksoftirqd/5-41 [005] 143.444134: iomap_dio_complete: dev 7:7 ino 0xe size 0x1000 offset 0x1000 flags DIRECT aio 1 error 0 ret 4096 a.out-2078 [007] 146.716833: iomap_dio_rw_begin: dev 7:7 ino 0xe size 0x1000 offset 0x0 length 0x1000 done_before 0x0 flags DIRECT dio_flags aio 0 a.out-2078 [007] 146.717639: iomap_dio_complete: dev 7:7 ino 0xe size 0x1000 offset 0x1000 flags DIRECT aio 0 error 0 ret 4096 a.out-2079 [006] 148.972605: iomap_dio_rw_begin: dev 7:7 ino 0xe size 0x1000 offset 0x0 length 0x1000 done_before 0x0 flags DIRECT dio_flags aio 0 a.out-2079 [006] 148.973099: iomap_dio_complete: dev 7:7 ino 0xe size 0x1000 offset 0x1000 flags DIRECT aio 0 error 0 ret 4096 Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> [djwong: line up strings all prettylike] Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
d3bff1fc |
|
21-Apr-2023 |
Ritesh Harjani (IBM) <ritesh.list@gmail.com> |
iomap: Remove IOMAP_DIO_NOSYNC unused dio flag IOMAP_DIO_NOSYNC earlier was added for use in btrfs. But it seems for aio dsync writes this is not useful anyway. For aio dsync case, we we queue the request and return -EIOCBQUEUED. Now, since IOMAP_DIO_NOSYNC doesn't let iomap_dio_complete() to call generic_write_sync(), hence we may lose the sync write. Hence kill this flag as it is not in use by any FS now. Tested-by: Disha Goel <disgoel@linux.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
8e81aa16 |
|
20-Jan-2023 |
Christoph Hellwig <hch@lst.de> |
iomap: remove IOMAP_F_ZONE_APPEND No users left now that btrfs takes REQ_OP_WRITE bios from iomap and splits and converts them to REQ_OP_ZONE_APPEND internally. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
f1bd37a4 |
|
14-Nov-2022 |
Keith Busch <kbusch@kernel.org> |
iomap: directly use logical block size Don't transform the logical block size to a bit shift only to shift it back to the original block size. Just use the size. Cc: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
fcb14cb1 |
|
22-May-2022 |
Al Viro <viro@zeniv.linux.org.uk> |
new iov_iter flavour - ITER_UBUF Equivalent of single-segment iovec. Initialized by iov_iter_ubuf(), checked for by iter_is_ubuf(), otherwise behaves like ITER_IOVEC ones. We are going to expose the things like ->write_iter() et.al. to those in subsequent commits. New predicate (user_backed_iter()) that is true for ITER_IOVEC and ITER_UBUF; places like direct-IO handling should use that for checking that pages we modify after getting them from iov_iter_get_pages() would need to be dirtied. DO NOT assume that replacing iter_is_iovec() with user_backed_iter() will solve all problems - there's code that uses iter_is_iovec() to decide how to poke around in iov_iter guts and for that the predicate replacement obviously won't suffice. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
91b94c5d |
|
22-May-2022 |
Al Viro <viro@zeniv.linux.org.uk> |
iocb: delay evaluation of IS_SYNC(...) until we want to check IOCB_DSYNC New helper to be used instead of direct checks for IOCB_DSYNC: iocb_is_dsync(iocb). Checks converted, which allows to avoid the IS_SYNC(iocb->ki_filp->f_mapping->host) part (4 cache lines) from iocb_flags() - it's checked in iocb_is_dsync() instead Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
36518b6b |
|
07-Jun-2022 |
Al Viro <viro@zeniv.linux.org.uk> |
teach iomap_dio_rw() to suppress dsync New flag, equivalent to removal of IOCB_DSYNC from iocb flags. This mimics what btrfs is doing (and that's what btrfs will switch to). However, I'm not at all sure that we want to suppress REQ_FUA for those - all btrfs hack really cares about is suppression of generic_write_sync(). For now let's keep the existing behaviour, but I really want to hear more detailed arguments pro or contra. [folded brain fix from willy] Suggested-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
dbd4eb81 |
|
14-Jul-2022 |
Bart Van Assche <bvanassche@acm.org> |
fs/iomap: Use the new blk_opf_t type Improve static type checking by using the enum req_op type for variables that represent a request operation and the new blk_opf_t type for the combination of a request operation and request flags. Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-56-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
bf8d0853 |
|
10-Jun-2022 |
Keith Busch <kbusch@kernel.org> |
iomap: add support for dma aligned direct-io Use the address alignment requirements from the block_device for direct io instead of requiring addresses be aligned to the block size. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220610195830.3574005-12-kbusch@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
786f847f |
|
05-May-2022 |
Christoph Hellwig <hch@lst.de> |
iomap: add per-iomap_iter private data Allow the file system to keep state for all iterations. For now only wire it up for direct I/O as there is an immediate need for it there. Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
908c5490 |
|
05-May-2022 |
Christoph Hellwig <hch@lst.de> |
iomap: allow the file system to provide a bio_set for direct I/O Allow the file system to provide a specific bio_set for allocating direct I/O bios. This will allow file systems that use the ->submit_io hook to stash away additional information for file system use. To make use of this additional space for information in the completion path, the file system needs to override the ->bi_end_io callback and then call back into iomap, so export iomap_dio_bio_end_io for that. Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
9650b453 |
|
20-Apr-2022 |
Ming Lei <ming.lei@redhat.com> |
block: ignore RWF_HIPRI hint for sync dio So far bio is marked as REQ_POLLED if RWF_HIPRI/IOCB_HIPRI is passed from userspace sync io interface, then block layer tries to poll until the bio is completed. But the current implementation calls blk_io_schedule() if bio_poll() returns 0, and this way causes io hang or timeout easily. But looks no one reports this kind of issue, which should have been triggered in normal io poll sanity test or blktests block/007 as observed by Changhui, that means it is very likely that no one uses it or no one cares it. Also after io_uring is invented, io poll for sync dio becomes legacy interface. So ignore RWF_HIPRI hint for sync dio. CC: linux-mm@kvack.org Cc: linux-xfs@vger.kernel.org Reported-by: Changhui Zhong <czhong@redhat.com> Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Tested-by: Changhui Zhong <czhong@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220420143110.2679002-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
a557e82e |
|
14-Apr-2022 |
Christoph Hellwig <hch@lst.de> |
block: add a bdev_fua helper Add a helper to check the FUA flag based on the block_device instead of having to poke into the block layer internal request_queue. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220415045258.199825-14-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
c75e707f |
|
04-Mar-2022 |
Christoph Hellwig <hch@lst.de> |
block: remove the per-bio/request write hint With the NVMe support for this gone, there are no consumers of these hints left, so remove them. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220304175556.407719-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
489734ef |
|
28-Jan-2022 |
Eric Biggers <ebiggers@google.com> |
iomap: support direct I/O with fscrypt using blk-crypto Encrypted files traditionally haven't supported DIO, due to the need to encrypt/decrypt the data. However, when the encryption is implemented using inline encryption (blk-crypto) instead of the traditional filesystem-layer encryption, it is straightforward to support DIO. Add support for this to the iomap DIO implementation by calling fscrypt_set_bio_crypt_ctx() to set encryption contexts on the bios. Don't check for the rare case where a DUN (crypto data unit number) discontiguity creates a boundary that bios must not cross. Instead, filesystems are expected to handle this in ->iomap_begin() by limiting the length of the mapping so that iomap doesn't have to worry about it. Co-developed-by: Satya Tangirala <satyat@google.com> Signed-off-by: Satya Tangirala <satyat@google.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220128233940.79464-3-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
|
#
07888c66 |
|
24-Jan-2022 |
Christoph Hellwig <hch@lst.de> |
block: pass a block_device and opf to bio_alloc Pass the block_device and operation that we plan to use this bio for to bio_alloc to optimize the assignment. NULL/0 can be passed, both for the passthrough case on a raw request_queue and to temporarily avoid refactoring some nasty code. Also move the gfp_mask argument after the nr_vecs argument for a much more logical calling convention matching what most of the kernel does. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220124091107.642561-18-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
4bdcd1dd |
|
28-Oct-2021 |
Jens Axboe <axboe@kernel.dk> |
mm: move filemap_range_needs_writeback() into header No functional changes in this patch, just in preparation for efficiently calling this light function from the block O_DIRECT handling. Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
4fdccaa0 |
|
23-Jul-2021 |
Andreas Gruenbacher <agruenba@redhat.com> |
iomap: Add done_before argument to iomap_dio_rw Add a done_before argument to iomap_dio_rw that indicates how much of the request has already been transferred. When the request succeeds, we report that done_before additional bytes were tranferred. This is useful for finishing a request asynchronously when part of the request has already been completed synchronously. We'll use that to allow iomap_dio_rw to be used with page faults disabled: when a page fault occurs while submitting a request, we synchronously complete the part of the request that has already been submitted. The caller can then take care of the page fault and call iomap_dio_rw again for the rest of the request, passing in the number of bytes already tranferred. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
|
#
97308f8b |
|
22-Jul-2021 |
Andreas Gruenbacher <agruenba@redhat.com> |
iomap: Support partial direct I/O on user copy failures In iomap_dio_rw, when iomap_apply returns an -EFAULT error and the IOMAP_DIO_PARTIAL flag is set, complete the request synchronously and return a partial result. This allows the caller to deal with the page fault and retry the remainder of the request. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
|
#
42c498c1 |
|
15-Jul-2021 |
Andreas Gruenbacher <agruenba@redhat.com> |
iomap: Fix iomap_dio_rw return value for user copies When a user copy fails in one of the helpers of iomap_dio_rw, fail with -EFAULT instead of returning 0. This matches what iomap_dio_bio_actor returns when it gets an -EFAULT from bio_iov_iter_get_pages. With these changes, iomap_dio_actor now consistently fails with -EFAULT when a user page cannot be faulted in. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
6b19b766 |
|
21-Oct-2021 |
Jens Axboe <axboe@kernel.dk> |
fs: get rid of the res2 iocb->ki_complete argument The second argument was only used by the USB gadget code, yet everyone pays the overhead of passing a zero to be passed into aio, where it ends up being part of the aio res2 value. Now that everybody is passing in zero, kill off the extra argument. Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
5a72e899 |
|
12-Oct-2021 |
Jens Axboe <axboe@kernel.dk> |
block: add a struct io_comp_batch argument to fops->iopoll() struct io_comp_batch contains a list head and a completion handler, which will allow completions to more effciently completed batches of IO. For now, no functional changes in this patch, we just define the io_comp_batch structure and add the argument to the file_operations iopoll handler. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
3e08773c |
|
12-Oct-2021 |
Christoph Hellwig <hch@lst.de> |
block: switch polling to be bio based Replace the blk_poll interface that requires the caller to keep a queue and cookie from the submissions with polling based on the bio. Polling for the bio itself leads to a few advantages: - the cookie construction can made entirely private in blk-mq.c - the caller does not need to remember the request_queue and cookie separately and thus sidesteps their lifetime issues - keeping the device and the cookie inside the bio allows to trivially support polling BIOs remapping by stacking drivers - a lot of code to propagate the cookie back up the submission path can be removed entirely. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Mark Wunderlich <mark.wunderlich@intel.com> Link: https://lore.kernel.org/r/20211012111226.760968-15-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
ef99b2d3 |
|
12-Oct-2021 |
Christoph Hellwig <hch@lst.de> |
block: replace the spin argument to blk_iopoll with a flags argument Switch the boolean spin argument to blk_poll to passing a set of flags instead. This will allow to control polling behavior in a more fine grained way. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Mark Wunderlich <mark.wunderlich@intel.com> Link: https://lore.kernel.org/r/20211012111226.760968-10-hch@lst.de [axboe: adapt to changed io_uring iopoll] Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
f79d4749 |
|
12-Oct-2021 |
Christoph Hellwig <hch@lst.de> |
iomap: don't try to poll multi-bio I/Os in __iomap_dio_rw If an iocb is split into multiple bios we can't poll for both. So don't bother to even try to poll in that case. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Mark Wunderlich <mark.wunderlich@intel.com> Link: https://lore.kernel.org/r/20211012111226.760968-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
a6d3d495 |
|
10-Aug-2021 |
Christoph Hellwig <hch@lst.de> |
iomap: switch __iomap_dio_rw to use iomap_iter Switch __iomap_dio_rw to use iomap_iter. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
69f4a26c |
|
03-Aug-2021 |
Gao Xiang <hsiangkao@linux.alibaba.com> |
iomap: support reading inline data from non-zero pos The existing inline data support only works for cases where the entire file is stored as inline data. For larger files, EROFS stores the initial blocks separately and the remainder of the file ("file tail") adjacent to the inode. Generalise inline data to allow reading the inline file tail. Tails may not cross a page boundary in memory. We currently have no filesystems that support tails and writing, so that case is currently disabled (see iomap_write_begin_inline). Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
985b71db |
|
29-Apr-2021 |
Jens Axboe <axboe@kernel.dk> |
iomap: use filemap_range_needs_writeback() for O_DIRECT reads For reads, use the better variant of checking for the need to call filemap_write_and_wait_range() when doing O_DIRECT. This avoids falling back to the slow path for IOCB_NOWAIT, if there are no pages to wait for (or write out). Link: https://lkml.kernel.org/r/20210224164455.1096727-4-axboe@kernel.dk Signed-off-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
a8affc03 |
|
10-Mar-2021 |
Christoph Hellwig <hch@lst.de> |
block: rename BIO_MAX_PAGES to BIO_MAX_VECS Ever since the addition of multipage bio_vecs BIO_MAX_PAGES has been horribly confusingly misnamed. Rename it to BIO_MAX_VECS to stop confusing users of the bio API. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20210311110137.1132391-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
c3b0e880 |
|
04-Feb-2021 |
Naohiro Aota <naohiro.aota@wdc.com> |
iomap: support REQ_OP_ZONE_APPEND A ZONE_APPEND bio must follow hardware restrictions (e.g. not exceeding max_zone_append_sectors) not to be split. bio_iov_iter_get_pages builds such restricted bio using __bio_iov_append_get_pages if bio_op(bio) == REQ_OP_ZONE_APPEND. To utilize it, we need to set the bio_op before calling bio_iov_iter_get_pages(). This commit introduces IOMAP_F_ZONE_APPEND, so that iomap user can set the flag to indicate they want REQ_OP_ZONE_APPEND and restricted bio. Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
3e1a88ec |
|
09-Jan-2021 |
Pavel Begunkov <asml.silence@gmail.com> |
bio: add a helper calculating nr segments to alloc Add a helper function calculating the number of bvec segments we need to allocate to construct a bio. It doesn't change anything functionally, but will be used to not duplicate special cases in the future. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
213f6271 |
|
23-Jan-2021 |
Christoph Hellwig <hch@lst.de> |
iomap: add a IOMAP_DIO_OVERWRITE_ONLY flag Add a flag to signal that only pure overwrites are allowed. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
2f632965 |
|
23-Jan-2021 |
Christoph Hellwig <hch@lst.de> |
iomap: pass a flags argument to iomap_dio_rw Pass a set of flags to iomap_dio_rw instead of the boolean wait_for_completion argument. The IOMAP_DIO_FORCE_WAIT flag replaces the wait_for_completion, but only needs to be passed when the iocb isn't synchronous to start with to simplify the callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> [djwong: rework xfs_file.c so that we can push iomap changes separately] Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
5724be5d |
|
23-Jan-2021 |
Christoph Hellwig <hch@lst.de> |
iomap: rename the flags variable in __iomap_dio_rw Rename flags to iomap_flags to make the usage a little more clear. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
1a31182e |
|
28-Sep-2020 |
Goldwyn Rodrigues <rgoldwyn@suse.com> |
iomap: Call inode_dio_end() before generic_write_sync() iomap complete routine can deadlock with btrfs_fallocate because of the call to generic_write_sync(). P0 P1 inode_lock() fallocate(FALLOC_FL_ZERO_RANGE) __iomap_dio_rw() inode_lock() <block> <submits IO> <completes IO> inode_unlock() <gets inode_lock()> inode_dio_wait() iomap_dio_complete() generic_write_sync() btrfs_file_fsync() inode_lock() <deadlock> inode_dio_end() is used to notify the end of DIO data in order to synchronize with truncate. Call inode_dio_end() before calling generic_write_sync(), so filesystems can lock i_rwsem during a sync. This matches the way it is done in fs/direct-io.c:dio_complete(). Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
c3d4ed1a |
|
28-Sep-2020 |
Christoph Hellwig <hch@lst.de> |
iomap: Allow filesystem to call iomap_dio_complete without i_rwsem This is to avoid the deadlock caused in btrfs because of O_DIRECT | O_DSYNC. Filesystems such as btrfs require i_rwsem while performing sync on a file. iomap_dio_rw() is called under i_rw_sem. This leads to a deadlock because of: iomap_dio_complete() generic_write_sync() btrfs_sync_file() Separate out iomap_dio_complete() from iomap_dio_rw(), so filesystems can call iomap_dio_complete() after unlocking i_rwsem. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
c114bbc6 |
|
10-Sep-2020 |
Andreas Gruenbacher <agruenba@redhat.com> |
iomap: Fix direct I/O write consistency check When a direct I/O write falls back to buffered I/O entirely, dio->size will be 0 in iomap_dio_complete. Function invalidate_inode_pages2_range will try to invalidate the rest of the address space. If there are any dirty pages in that range, the write will fail and a "Page cache invalidation failure on direct I/O" error will be logged. On gfs2, this can be reproduced as follows: xfs_io \ -c "open -ft foo" -c "pwrite 4k 4k" -c "close" \ -c "open -d foo" -c "pwrite 0 4k" Fix this by recognizing 0-length writes. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
a805c111 |
|
10-Sep-2020 |
Qian Cai <cai@lca.pw> |
iomap: fix WARN_ON_ONCE() from unprivileged users It is trivial to trigger a WARN_ON_ONCE(1) in iomap_dio_actor() by unprivileged users which would taint the kernel, or worse - panic if panic_on_warn or panic_on_taint is set. Hence, just convert it to pr_warn_ratelimited() to let users know their workloads are racing. Thank Dave Chinner for the initial analysis of the racing reproducers. Signed-off-by: Qian Cai <cai@lca.pw> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
60263d58 |
|
23-Jul-2020 |
Christoph Hellwig <hch@lst.de> |
iomap: fall back to buffered writes for invalidation failures Failing to invalid the page cache means data in incoherent, which is a very bad state for the system. Always fall back to buffered I/O through the page cache if we can't invalidate mappings. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Acked-by: Bob Peterson <rpeterso@redhat.com> Acked-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Theodore Ts'o <tytso@mit.edu> # for ext4 Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com> # for gfs2 Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
|
#
54752de9 |
|
23-Jul-2020 |
Dave Chinner <dchinner@redhat.com> |
iomap: Only invalidate page cache pages on direct IO writes The historic requirement for XFS to invalidate cached pages on direct IO reads has been lost in the twisty pages of history - it was inherited from Irix, which implemented page cache invalidation on read as a method of working around problems synchronising page cache state with uncached IO. XFS has carried this ever since. In the initial linux ports it was necessary to get mmap and DIO to play "ok" together and not immediately corrupt data. This was the state of play until the linux kernel had infrastructure to track unwritten extents and synchronise page faults with allocations and unwritten extent conversions (->page_mkwrite infrastructure). IOws, the page cache invalidation on DIO read was necessary to prevent trivial data corruptions. This didn't solve all the problems, though. There were peformance problems if we didn't invalidate the entire page cache over the file on read - we couldn't easily determine if the cached pages were over the range of the IO, and invalidation required taking a serialising lock (i_mutex) on the inode. This serialising lock was an issue for XFS, as it was the only exclusive lock in the direct Io read path. Hence if there were any cached pages, we'd just invalidate the entire file in one go so that subsequent IOs didn't need to take the serialising lock. This was a problem that prevented ranged invalidation from being particularly useful for avoiding the remaining coherency issues. This was solved with the conversion of i_mutex to i_rwsem and the conversion of the XFS inode IO lock to use i_rwsem. Hence we could now just do ranged invalidation and the performance problem went away. However, page cache invalidation was still needed to serialise sub-page/sub-block zeroing via direct IO against buffered IO because bufferhead state attached to the cached page could get out of whack when direct IOs were issued. We've removed bufferheads from the XFS code, and we don't carry any extent state on the cached pages anymore, and so this problem has gone away, too. IOWs, it would appear that we don't have any good reason to be invalidating the page cache on DIO reads anymore. Hence remove the invalidation on read because it is unnecessary overhead, not needed to maintain coherency between mmap/buffered access and direct IO anymore, and prevents anyone from using direct IO reads from intentionally invalidating the page cache of a file. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
3ad99bec |
|
30-Nov-2019 |
Goldwyn Rodrigues <rgoldwyn@suse.com> |
iomap: remove lockdep_assert_held() Filesystems such as btrfs can perform direct I/O without holding the inode->i_rwsem in some of the cases like writing within i_size. So, remove the check for lockdep_assert_held() in iomap_dio_rw(). Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
8cecd0ba |
|
14-May-2019 |
Goldwyn Rodrigues <rgoldwyn@suse.com> |
iomap: add a filesystem hook for direct I/O bio submission This helps filesystems to perform tasks on the bio while submitting for I/O. This could be post-write operations such as data CRC or data replication for fs-handled RAID. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
e6249cdd |
|
02-May-2020 |
Ming Lei <ming.lei@redhat.com> |
block: add blk_io_schedule() for avoiding task hung in sync dio Sync dio could be big, or may take long time in discard or in case of IO failure. We have prevented task hung in submit_bio_wait() and blk_execute_rq(), so apply the same trick for prevent task hung from happening in sync dio. Add helper of blk_io_schedule() and use io_schedule_timeout() to prevent task hung warning. Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Cc: Salman Qazi <sqazi@google.com> Cc: Jesse Barnes <jsbarnes@google.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
d9973ce2 |
|
18-Mar-2020 |
yangerkun <yangerkun@huawei.com> |
iomap: fix comments in iomap_dio_rw Double 'three' exists in the comments of iomap_dio_rw. Signed-off-by: yangerkun <yangerkun@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
88cfd30e |
|
26-Nov-2019 |
Johannes Thumshirn <jthumshirn@suse.de> |
iomap: remove unneeded variable in iomap_dio_rw() The 'start' variable indicates the start of a filemap and is set to the iocb's position, which we have already cached as 'pos', upon function entry. 'pos' is used as a cursor indicating the current position and updated later in iomap_dio_rw(), but not before the last use of 'start'. Remove 'start' as it's synonym for 'pos' before we're entering the loop calling iomapp_apply(). Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
f550ee9b |
|
26-Nov-2019 |
Jan Kara <jack@suse.cz> |
iomap: Do not create fake iter in iomap_dio_bio_actor() iomap_dio_bio_actor() copies iter to a local variable and then limits it to a file extent we have mapped. When IO is submitted, iomap_dio_bio_actor() advances the original iter while the copied iter is advanced inside bio_iov_iter_get_pages(). This logic is non-obvious especially because both iters still point to same shared structures (such as pipe info) so if iov_iter_advance() changes anything in the shared structure, this scheme breaks. Let's just truncate and reexpand the original iter as needed instead of playing games with copying iters and keeping them in sync. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
419e9c38 |
|
21-Nov-2019 |
Jan Kara <jack@suse.cz> |
iomap: Fix pipe page leakage during splicing When splicing using iomap_dio_rw() to a pipe, we may leak pipe pages because bio_iov_iter_get_pages() records that the pipe will have full extent worth of data however if file size is not block size aligned iomap_dio_rw() returns less than what bio_iov_iter_get_pages() set up and splice code gets confused leaking a pipe page with the file tail. Handle the situation similarly to the old direct IO implementation and revert iter to actually returned read amount which makes iter consistent with value returned from iomap_dio_rw() and thus the splice code is happy. Fixes: ff6a9292e6f6 ("iomap: implement direct I/O") CC: stable@vger.kernel.org Reported-by: syzbot+991400e8eba7e00a26e1@syzkaller.appspotmail.com Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
e9f930ac |
|
11-Nov-2019 |
Jan Stancek <jstancek@redhat.com> |
iomap: fix return value of iomap_dio_bio_actor on 32bit systems Naresh reported LTP diotest4 failing for 32bit x86 and arm -next kernels on ext4. Same problem exists in 5.4-rc7 on xfs. The failure comes down to: openat(AT_FDCWD, "testdata-4.5918", O_RDWR|O_DIRECT) = 4 mmap2(NULL, 4096, PROT_READ, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f7b000 read(4, 0xb7f7b000, 4096) = 0 // expects -EFAULT Problem is conversion at iomap_dio_bio_actor() return. Ternary operator has a return type and an attempt is made to convert each of operands to the type of the other. In this case "ret" (int) is converted to type of "copied" (unsigned long). Both have size of 4 bytes: size_t copied = 0; int ret = -14; long long actor_ret = copied ? copied : ret; On x86_64: actor_ret == -14; On x86 : actor_ret == 4294967282 Replace ternary operator with 2 return statements to avoid this unwanted conversion. Fixes: 4721a6010990 ("iomap: dio data corruption and spurious errors when pipes fill") Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: Jan Stancek <jstancek@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
a9010042 |
|
29-Oct-2019 |
Joseph Qi <joseph.qi@linux.alibaba.com> |
fs/iomap: remove redundant check in iomap_dio_rw() We've already check if it is READ iov_iter, no need check again. Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
c039b997 |
|
18-Oct-2019 |
Goldwyn Rodrigues <rgoldwyn@suse.com> |
iomap: use a srcmap for a read-modify-write I/O The srcmap is used to identify where the read is to be performed from. It is passed to ->iomap_begin, which can fill it in if we need to read data for partially written blocks from a different location than the write target. The srcmap is only supported for buffered writes so far. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> [hch: merged two patches, removed the IOMAP_F_COW flag, use iomap as srcmap if not set, adjust length down to srcmap end as well] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Acked-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
|
#
13ef9544 |
|
15-Oct-2019 |
Jan Kara <jack@suse.cz> |
iomap: Allow forcing of waiting for running DIO in iomap_dio_rw() Filesystems do not support doing IO as asynchronous in some cases. For example in case of unaligned writes or in case file size needs to be extended (e.g. for ext4). Instead of forcing filesystem to wait for AIO in such cases, add argument to iomap_dio_rw() which makes the function wait for IO completion. This also results in executing iomap_dio_complete() inline in iomap_dio_rw() providing its return value to the caller as for ordinary sync IO. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
838c4f3d |
|
19-Sep-2019 |
Christoph Hellwig <hch@lst.de> |
iomap: move the iomap_dio_rw ->end_io callback into a structure Add a new iomap_dio_ops structure that for now just contains the end_io handler. This avoid storing the function pointer in a mutable structure, which is a possible exploit vector for kernel code execution, and prepares for adding a submit_io handler that btrfs needs. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
6fe7b990 |
|
19-Sep-2019 |
Matthew Bobrowski <mbobrowski@mbobrowski.org> |
iomap: split size and error for iomap_dio_rw ->end_io Modify the calling convention for the iomap_dio_rw ->end_io() callback. Rather than passing either dio->error or dio->size as the 'size' argument, instead pass both the dio->error and the dio->size value separately. In the instance that an error occurred during a write, we currently cannot determine whether any blocks have been allocated beyond the current EOF and data has subsequently been written to these blocks within the ->end_io() callback. As a result, we cannot judge whether we should take the truncate failed write path. Having both dio->error and dio->size will allow us to perform such checks within this callback. Signed-off-by: Matthew Bobrowski <mbobrowski@mbobrowski.org> [hch: minor cleanups] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
|
#
db074436 |
|
15-Jul-2019 |
Darrick J. Wong <darrick.wong@oracle.com> |
iomap: move the direct IO code into a separate file Move the direct IO code into a separate file so that we can group related functions in a single file instead of having a single enormous source file. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|