History log of /linux-master/fs/netfs/io.c
Revision Date Author Comments
# 2147caaa 29-Jan-2024 Marc Dionne <marc.dionne@auristor.com>

netfs: Fix i_dio_count leak on DIO read past i_size

If netfs_begin_read gets a NETFS_DIO_READ request that begins
past i_size, it won't perform any i/o and just return 0. This
will leak an increment to i_dio_count that is done at the top
of the function.

This can cause subsequent buffered read requests to block
indefinitely, waiting for a non existing dio operation to complete.

Add a inode_dio_end() for the NETFS_DIO_READ case, before returning.

Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20240129094924.1221977-2-dhowells@redhat.com
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: Jeff Layton <jlayton@kernel.org>
cc: <linux-afs@lists.infradead.org>
cc: <netfs@lists.linux.dev>
cc: <linux-fsdevel@vger.kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>


# 202bc57b 09-Jan-2024 David Howells <dhowells@redhat.com>

netfs: Don't use certain unnecessary folio_*() functions

Filesystems should use folio->index and folio->mapping, instead of
folio_index(folio), folio_mapping() and folio_file_mapping() since
they know that it's in the pagecache.

Change this automagically with:

perl -p -i -e 's/folio_mapping[(]([^)]*)[)]/\1->mapping/g' fs/netfs/*.c
perl -p -i -e 's/folio_file_mapping[(]([^)]*)[)]/\1->mapping/g' fs/netfs/*.c
perl -p -i -e 's/folio_index[(]([^)]*)[)]/\1->index/g' fs/netfs/*.c

Reported-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-afs@lists.infradead.org
cc: linux-cachefs@redhat.com
cc: linux-cifs@vger.kernel.org
cc: linux-erofs@lists.ozlabs.org
cc: linux-fsdevel@vger.kernel.org


# 807c6d09 05-Jan-2024 David Howells <dhowells@redhat.com>

netfs: Fix the loop that unmarks folios after writing to the cache

In the loop in netfs_rreq_unmark_after_write() that removes the PG_fscache
from folios after they've been written to the cache, as soon as we remove
the mark from a multipage folio, it can get split - and then we might see a
fragment of folio again.

Guard against this by advancing the 'unlocked' tracker to the index of the
last page in the folio to avoid a double removal of the PG_fscache mark.

Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Matthew Wilcox <willy@infradead.org>
cc: linux-afs@lists.infradead.org
cc: linux-cachefs@redhat.com
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org


# 100ccd18 24-Nov-2023 David Howells <dhowells@redhat.com>

netfs: Optimise away reads above the point at which there can be no data

Track the file position above which the server is not expected to have any
data (the "zero point") and preemptively assume that we can satisfy
requests by filling them with zeroes locally rather than attempting to
download them if they're over that line - even if we've written data back
to the server. Assume that any data that was written back above that
position is held in the local cache. Note that we have to split requests
that straddle the line.

Make use of this to optimise away some reads from the server. We need to
set the zero point in the following circumstances:

(1) When we see an extant remote inode and have no cache for it, we set
the zero_point to i_size.

(2) On local inode creation, we set zero_point to 0.

(3) On local truncation down, we reduce zero_point to the new i_size if
the new i_size is lower.

(4) On local truncation up, we don't change zero_point.

(5) On local modification, we don't change zero_point.

(6) On remote invalidation, we set zero_point to the new i_size.

(7) If stored data is discarded from the pagecache or culled from fscache,
we must set zero_point above that if the data also got written to the
server.

(8) If dirty data is written back to the server, but not fscache, we must
set zero_point above that.

(9) If a direct I/O write is made, set zero_point above that.

Assuming the above, any read from the server at or above the zero_point
position will return all zeroes.

The zero_point value can be stored in the cache, provided the above rules
are applied to it by any code that culls part of the local cache.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org


# e0ace6ca 22-Nov-2023 David Howells <dhowells@redhat.com>

netfs, cachefiles: Pass upper bound length to allow expansion

Make netfslib pass the maximum length to the ->prepare_write() op to tell
the cache how much it can expand the length of a write to. This allows a
write to the server at the end of a file to be limited to a few bytes
whilst writing an entire block to the cache (something required by direct
I/O).

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org


# 153a9961 21-Feb-2022 David Howells <dhowells@redhat.com>

netfs: Implement unbuffered/DIO write support

Implement support for unbuffered writes and direct I/O writes. If the
write is misaligned with respect to the fscrypt block size, then RMW cycles
are performed if necessary. DIO writes are a special case of unbuffered
writes with extra restriction imposed, such as block size alignment
requirements.

Also provide a field that can tell the code to add some extra space onto
the bounce buffer for use by the filesystem in the case of a
content-encrypted file.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org


# 016dc851 14-Jan-2022 David Howells <dhowells@redhat.com>

netfs: Implement unbuffered/DIO read support

Implement support for unbuffered and DIO reads in the netfs library,
utilising the existing read helper code to do block splitting and
individual queuing. The code also handles extraction of the destination
buffer from the supplied iterator, allowing async unbuffered reads to take
place.

The read will be split up according to the rsize setting and, if supplied,
the ->clamp_length() method. Note that the next subrequest will be issued
as soon as issue_op returns, without waiting for previous ones to finish.
The network filesystem needs to pause or handle queuing them if it doesn't
want to fire them all at the server simultaneously.

Once all the subrequests have finished, the state will be assessed and the
amount of data to be indicated as having being obtained will be
determined. As the subrequests may finish in any order, if an intermediate
subrequest is short, any further subrequests may be copied into the buffer
and then abandoned.

In the future, this will also take care of doing an unbuffered read from
encrypted content, with the decryption being done by the library.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org


# c38f4e96 17-Jun-2021 David Howells <dhowells@redhat.com>

netfs: Provide func to copy data to pagecache for buffered write

Provide a netfs write helper, netfs_perform_write() to buffer data to be
written in the pagecache and mark the modified folios dirty.

It will perform "streaming writes" for folios that aren't currently
resident, if possible, storing data in partially modified folios that are
marked dirty, but not uptodate. It will also tag pages as belonging to
fs-specific write groups if so directed by the filesystem.

This is derived from generic_perform_write(), but doesn't use
->write_begin() and ->write_end(), having that logic rolled in instead.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org


# 4fcccc38 04-Oct-2023 David Howells <dhowells@redhat.com>

netfs: Make the refcounting of netfs_begin_read() easier to use

Make the refcounting of netfs_begin_read() easier to use by not eating the
caller's ref on the netfs_io_request it's given. This makes it easier to
use when we need to look in the request struct after.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org


# 768ddb1e 27-May-2022 David Howells <dhowells@redhat.com>

netfs: Limit subrequest by size or number of segments

Limit a subrequest to a maximum size and/or a maximum number of contiguous
physical regions. This permits, for instance, an subreq's iterator to be
limited to the number of DMA'able segments that a large RDMA request can
handle.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org


# 92b6cc5d 26-Sep-2023 David Howells <dhowells@redhat.com>

netfs: Add iov_iters to (sub)requests to describe various buffers

Add three iov_iter structs:

(1) Add an iov_iter (->iter) to the I/O request to describe the
unencrypted-side buffer.

(2) Add an iov_iter (->io_iter) to the I/O request to describe the
encrypted-side I/O buffer. This may be a different size to the buffer
in (1).

(3) Add an iov_iter (->io_iter) to the I/O subrequest to describe the part
of the I/O buffer for that subrequest.

This will allow future patches to point to a bounce buffer instead for
purposes of handling oversize writes, decryption (where we want to save the
encrypted data to the cache) and decompression.

These iov_iters persist for the lifetime of the (sub)request, and so can be
accessed multiple times without worrying about them being deallocated upon
return to the caller.

The network filesystem must appropriately advance the iterator before
terminating the request.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org


# de4eda9d 15-Sep-2022 Al Viro <viro@zeniv.linux.org.uk>

use less confusing names for iov_iter direction initializers

READ/WRITE proved to be actively confusing - the meanings are
"data destination, as used with read(2)" and "data source, as
used with write(2)", but people keep interpreting those as
"we read data from it" and "we write data to it", i.e. exactly
the wrong way.

Call them ITER_DEST and ITER_SOURCE - at least that is harder
to misinterpret...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 7e043a80 03-Nov-2022 David Howells <dhowells@redhat.com>

netfs: Fix missing xas_retry() calls in xarray iteration

netfslib has a number of places in which it performs iteration of an xarray
whilst being under the RCU read lock. It *should* call xas_retry() as the
first thing inside of the loop and do "continue" if it returns true in case
the xarray walker passed out a special value indicating that the walk needs
to be redone from the root[*].

Fix this by adding the missing retry checks.

[*] I wonder if this should be done inside xas_find(), xas_next_node() and
suchlike, but I'm told that's not an simple change to effect.

This can cause an oops like that below. Note the faulting address - this
is an internal value (|0x2) returned from xarray.

BUG: kernel NULL pointer dereference, address: 0000000000000402
...
RIP: 0010:netfs_rreq_unlock+0xef/0x380 [netfs]
...
Call Trace:
netfs_rreq_assess+0xa6/0x240 [netfs]
netfs_readpage+0x173/0x3b0 [netfs]
? init_wait_var_entry+0x50/0x50
filemap_read_page+0x33/0xf0
filemap_get_pages+0x2f2/0x3f0
filemap_read+0xaa/0x320
? do_filp_open+0xb2/0x150
? rmqueue+0x3be/0xe10
ceph_read_iter+0x1fe/0x680 [ceph]
? new_sync_read+0x115/0x1a0
new_sync_read+0x115/0x1a0
vfs_read+0xf3/0x180
ksys_read+0x5f/0xe0
do_syscall_64+0x38/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae

Changes:
========
ver #2)
- Changed an unsigned int to a size_t to reduce the likelihood of an
overflow as per Willy's suggestion.
- Added an additional patch to fix the maths.

Fixes: 3d3c95046742 ("netfs: Provide readahead and readpage netfs helpers")
Reported-by: George Law <glaw@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
cc: Matthew Wilcox <willy@infradead.org>
cc: linux-cachefs@redhat.com
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/166749229733.107206.17482609105741691452.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/166757987929.950645.12595273010425381286.stgit@warthog.procyon.org.uk/ # v2


# b900f4b8 01-Mar-2022 David Howells <dhowells@redhat.com>

netfs: Split some core bits out into their own file

Split some core bits out into their own file. More bits will be added to
this file later.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com

Link: https://lore.kernel.org/r/164623006934.3564931.17932680017894039748.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/164678218407.1200972.1731208226140990280.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/164692920944.2099075.11990502173226013856.stgit@warthog.procyon.org.uk/ # v3


# 16211268 01-Mar-2022 David Howells <dhowells@redhat.com>

netfs: Split fs/netfs/read_helper.c

Split fs/netfs/read_helper.c into two pieces, one to deal with buffered
writes and one to deal with the I/O mechanism.

Changes
=======
ver #2)
- Add kdoc reference to new file.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com

Link: https://lore.kernel.org/r/164623005586.3564931.6149556072728481767.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/164678217075.1200972.5101072043126828757.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/164692919953.2099075.7156989585513833046.stgit@warthog.procyon.org.uk/ # v3


# 3be01750 07-Mar-2022 David Howells <dhowells@redhat.com>

netfs: Rename read_helper.c to io.c

Rename the read_helper.c file to io.c before splitting out the buffered
read functions and some other bits.

Changes
=======
ver #2)
- Rename read_helper.c before splitting.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com

Link: https://lore.kernel.org/r/164678216109.1200972.16567696909952495832.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/164692918076.2099075.8120961172717347610.stgit@warthog.procyon.org.uk/ # v3