Searched +hist:5 +hist:cbf0398 (Results 1 - 5 of 5) sorted by relevance

/linux-master/fs/afs/
H A DKconfigdiff 523d27cd Thu Feb 06 07:22:21 MST 2020 David Howells <dhowells@redhat.com> afs: Convert afs to use the new fscache API

Change the afs filesystem to support the new afs driver.

The following changes have been made:

(1) The fscache_netfs struct is no more, and there's no need to register
the filesystem as a whole. There's also no longer a cell cookie.

(2) The volume cookie is now an fscache_volume cookie, allocated with
fscache_acquire_volume(). This function takes three parameters: a
string representing the "volume" in the index, a string naming the
cache to use (or NULL) and a u64 that conveys coherency metadata for
the volume.

For afs, I've made it render the volume name string as:

"afs,<cell>,<volume_id>"

and the coherency data is currently 0.

(3) The fscache_cookie_def is no more and needed information is passed
directly to fscache_acquire_cookie(). The cache no longer calls back
into the filesystem, but rather metadata changes are indicated at
other times.

fscache_acquire_cookie() is passed the same keying and coherency
information as before, except that these are now stored in big endian
form instead of cpu endian. This makes the cache more copyable.

(4) fscache_use_cookie() and fscache_unuse_cookie() are called when a file
is opened or closed to prevent a cache file from being culled and to
keep resources to hand that are needed to do I/O.

fscache_use_cookie() is given an indication if the cache is likely to
be modified locally (e.g. the file is open for writing).

fscache_unuse_cookie() is given a coherency update if we had the file
open for writing and will update that.

(5) fscache_invalidate() is now given uptodate auxiliary data and a file
size. It can also take a flag to indicate if this was due to a DIO
write. This is wrapped into afs_fscache_invalidate() now for
convenience.

(6) fscache_resize() now gets called from the finalisation of
afs_setattr(), and afs_setattr() does use/unuse of the cookie around
the call to support this.

(7) fscache_note_page_release() is called from afs_release_page().

(8) Use a killable wait in nfs_vm_page_mkwrite() when waiting for
PG_fscache to be cleared.

Render the parts of the cookie key for an afs inode cookie as big endian.

Changes
=======
ver #2:
- Use gfpflags_allow_blocking() rather than using flag directly.
- fscache_acquire_volume() now returns errors.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Tested-by: kafs-testing@auristor.com
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/163819661382.215744.1485608824741611837.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906970002.143852.17678518584089878259.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967174665.1823006.1301789965454084220.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/164021568841.640689.6684240152253400380.stgit@warthog.procyon.org.uk/ # v4
diff 5cbf0398 Thu Feb 06 07:22:29 MST 2020 David Howells <dhowells@redhat.com> afs: Use new netfs lib read helper API

Make AFS use the new netfs read helpers to implement the VM read
operations:

- afs_readpage() now hands off responsibility to netfs_readpage().

- afs_readpages() is gone and replaced with afs_readahead().

- afs_readahead() just hands off responsibility to netfs_readahead().

These make use of the cache if a cookie is supplied, otherwise just call
the ->issue_op() method a sufficient number of times to complete the entire
request.

Changes:
v5:
- Use proper wait function for PG_fscache in afs_page_mkwrite()[1].
- Use killable wait for PG_writeback in afs_page_mkwrite()[1].

v4:
- Folded in error handling fixes to afs_req_issue_op().
- Added flag to netfs_subreq_terminated() to indicate that the caller may
have been running async and stuff that might sleep needs punting to a
workqueue.

Signed-off-by: David Howells <dhowells@redhat.com>
Tested-By: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-cachefs@redhat.com
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/2499407.1616505440@warthog.procyon.org.uk [1]
Link: https://lore.kernel.org/r/160588542733.3465195.7526541422073350302.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/161118158436.1232039.3884845981224091996.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/161161053540.2537118.14904446369309535330.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/161340418739.1303470.5908092911600241280.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/161539561926.286939.5729036262354802339.stgit@warthog.procyon.org.uk/ # v4
Link: https://lore.kernel.org/r/161653817977.2770958.17696456811587237197.stgit@warthog.procyon.org.uk/ # v5
Link: https://lore.kernel.org/r/161789101258.6155.3879271028895121537.stgit@warthog.procyon.org.uk/ # v6
diff 5cbf0398 Thu Feb 06 07:22:29 MST 2020 David Howells <dhowells@redhat.com> afs: Use new netfs lib read helper API

Make AFS use the new netfs read helpers to implement the VM read
operations:

- afs_readpage() now hands off responsibility to netfs_readpage().

- afs_readpages() is gone and replaced with afs_readahead().

- afs_readahead() just hands off responsibility to netfs_readahead().

These make use of the cache if a cookie is supplied, otherwise just call
the ->issue_op() method a sufficient number of times to complete the entire
request.

Changes:
v5:
- Use proper wait function for PG_fscache in afs_page_mkwrite()[1].
- Use killable wait for PG_writeback in afs_page_mkwrite()[1].

v4:
- Folded in error handling fixes to afs_req_issue_op().
- Added flag to netfs_subreq_terminated() to indicate that the caller may
have been running async and stuff that might sleep needs punting to a
workqueue.

Signed-off-by: David Howells <dhowells@redhat.com>
Tested-By: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-cachefs@redhat.com
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/2499407.1616505440@warthog.procyon.org.uk [1]
Link: https://lore.kernel.org/r/160588542733.3465195.7526541422073350302.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/161118158436.1232039.3884845981224091996.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/161161053540.2537118.14904446369309535330.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/161340418739.1303470.5908092911600241280.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/161539561926.286939.5729036262354802339.stgit@warthog.procyon.org.uk/ # v4
Link: https://lore.kernel.org/r/161653817977.2770958.17696456811587237197.stgit@warthog.procyon.org.uk/ # v5
Link: https://lore.kernel.org/r/161789101258.6155.3879271028895121537.stgit@warthog.procyon.org.uk/ # v6
H A Dfsclient.cdiff 16069e13 Sun Nov 05 09:11:07 MST 2023 David Howells <dhowells@redhat.com> afs: Parse the VolSync record in the reply of a number of RPC ops

A number of fileserver RPC operations return a VolSync record as part of
their reply that gives some information about the state of the volume being
accessed, including:

(1) A volume Creation timestamp. For an RW volume, this is the time at
which the volume was created; if it changes, the RW volume was
presumably restored from a backup and all cached data should be
scrubbed as Data Version numbers could regress on the files in the
volume.

For an RO volume, this is the time it was last snapshotted from the RW
volume. It is expected to advance each time this happens; if it
regresses, cached data should be scrubbed.

(2) A volume Update timestamp (Auristor only). For an RW volume, this is
updated any time any change is made to a volume or its contents. If
it regresses, all cached data must be scrubbed.

For an RO volume, this is a copy of the RW volume's Update timestamp
at the point of snapshotting. It can be used as a version number when
checking to see if a callback on a RO volume was due to a snapshot.
If it regresses, all cached data must be scrubbed.

but this is currently not made use of by the in-kernel afs filesystem.

Make the afs filesystem use this by:

(1) Add an update time field to the afs_volsync struct and use a value of
TIME64_MIN in both that and the creation time to indicate that they
are unset.

(2) Add creation and update time fields to the afs_volume struct and use
this to track the two timestamps.

(3) Add a volsync_lock mutex to the afs_volume struct to control
modification access for when we detect a change in these values.

(3) Add a 'pre-op volsync' struct to the afs_operation struct to record
the state of the volume tracking before the op.

(4) Add a new counter, cb_scrub, to the afs_volume struct to count events
that require all data to be scrubbed. A copy is placed in the
afs_vnode struct (inode) and if they no longer match, a scrub takes
place.

(5) When the result of an operation is being parsed, parse the VolSync
data too, if it is provided. Note that the two timestamps are handled
separately, since they don't work in quite the same way.

- If the afs_volume tracking is unset, just set it and do nothing
else.

- If the result timestamps are the same as the ones in afs_volume, do
nothing.

- If the timestamps regress, increment cb_scrub if not already done
so.

- If the creation timestamp on a RW volume changes, increment cb_scrub
if not already done so.

- If the creation timestamp on a RO volume advances, update the server
list and see if the current server has been excluded, if so reissue
the op. Once over half of the replication sites have been updated,
increment cb_ro_snapshot to indicate updates may be required and
switch over to excluding unupdated replication sites.

- If the creation timestamp on a Backup volume advances, just
increment cb_ro_snapshot to trigger updates.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
diff b537a3c2 Thu Sep 09 17:01:52 MDT 2021 David Howells <dhowells@redhat.com> afs: Fix corruption in reads at fpos 2G-4G from an OpenAFS server

AFS-3 has two data fetch RPC variants, FS.FetchData and FS.FetchData64, and
Linux's afs client switches between them when talking to a non-YFS server
if the read size, the file position or the sum of the two have the upper 32
bits set of the 64-bit value.

This is a problem, however, since the file position and length fields of
FS.FetchData are *signed* 32-bit values.

Fix this by capturing the capability bits obtained from the fileserver when
it's sent an FS.GetCapabilities RPC, rather than just discarding them, and
then picking out the VICED_CAPABILITY_64BITFILES flag. This can then be
used to decide whether to use FS.FetchData or FS.FetchData64 - and also
FS.StoreData or FS.StoreData64 - rather than using upper_32_bits() to
switch on the parameter values.

This capabilities flag could also be used to limit the maximum size of the
file, but all servers must be checked for that.

Note that the issue does not exist with FS.StoreData - that uses *unsigned*
32-bit values. It's also not a problem with Auristor servers as its
YFS.FetchData64 op uses unsigned 64-bit values.

This can be tested by cloning a git repo through an OpenAFS client to an
OpenAFS server and then doing "git status" on it from a Linux afs
client[1]. Provided the clone has a pack file that's in the 2G-4G range,
the git status will show errors like:

error: packfile .git/objects/pack/pack-5e813c51d12b6847bbc0fcd97c2bca66da50079c.pack does not match index
error: packfile .git/objects/pack/pack-5e813c51d12b6847bbc0fcd97c2bca66da50079c.pack does not match index

This can be observed in the server's FileLog with something like the
following appearing:

Sun Aug 29 19:31:39 2021 SRXAFS_FetchData, Fid = 2303380852.491776.3263114, Host 192.168.11.201:7001, Id 1001
Sun Aug 29 19:31:39 2021 CheckRights: len=0, for host=192.168.11.201:7001
Sun Aug 29 19:31:39 2021 FetchData_RXStyle: Pos 18446744071815340032, Len 3154
Sun Aug 29 19:31:39 2021 FetchData_RXStyle: file size 2400758866
...
Sun Aug 29 19:31:40 2021 SRXAFS_FetchData returns 5

Note the file position of 18446744071815340032. This is the requested file
position sign-extended.

Fixes: b9b1f8d5930a ("AFS: write support fixes")
Reported-by: Markus Suvanto <markus.suvanto@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
Tested-by: Markus Suvanto <markus.suvanto@gmail.com>
cc: linux-afs@lists.infradead.org
cc: openafs-devel@openafs.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=214217#c9 [1]
Link: https://lore.kernel.org/r/951332.1631308745@warthog.procyon.org.uk/
diff b537a3c2 Thu Sep 09 17:01:52 MDT 2021 David Howells <dhowells@redhat.com> afs: Fix corruption in reads at fpos 2G-4G from an OpenAFS server

AFS-3 has two data fetch RPC variants, FS.FetchData and FS.FetchData64, and
Linux's afs client switches between them when talking to a non-YFS server
if the read size, the file position or the sum of the two have the upper 32
bits set of the 64-bit value.

This is a problem, however, since the file position and length fields of
FS.FetchData are *signed* 32-bit values.

Fix this by capturing the capability bits obtained from the fileserver when
it's sent an FS.GetCapabilities RPC, rather than just discarding them, and
then picking out the VICED_CAPABILITY_64BITFILES flag. This can then be
used to decide whether to use FS.FetchData or FS.FetchData64 - and also
FS.StoreData or FS.StoreData64 - rather than using upper_32_bits() to
switch on the parameter values.

This capabilities flag could also be used to limit the maximum size of the
file, but all servers must be checked for that.

Note that the issue does not exist with FS.StoreData - that uses *unsigned*
32-bit values. It's also not a problem with Auristor servers as its
YFS.FetchData64 op uses unsigned 64-bit values.

This can be tested by cloning a git repo through an OpenAFS client to an
OpenAFS server and then doing "git status" on it from a Linux afs
client[1]. Provided the clone has a pack file that's in the 2G-4G range,
the git status will show errors like:

error: packfile .git/objects/pack/pack-5e813c51d12b6847bbc0fcd97c2bca66da50079c.pack does not match index
error: packfile .git/objects/pack/pack-5e813c51d12b6847bbc0fcd97c2bca66da50079c.pack does not match index

This can be observed in the server's FileLog with something like the
following appearing:

Sun Aug 29 19:31:39 2021 SRXAFS_FetchData, Fid = 2303380852.491776.3263114, Host 192.168.11.201:7001, Id 1001
Sun Aug 29 19:31:39 2021 CheckRights: len=0, for host=192.168.11.201:7001
Sun Aug 29 19:31:39 2021 FetchData_RXStyle: Pos 18446744071815340032, Len 3154
Sun Aug 29 19:31:39 2021 FetchData_RXStyle: file size 2400758866
...
Sun Aug 29 19:31:40 2021 SRXAFS_FetchData returns 5

Note the file position of 18446744071815340032. This is the requested file
position sign-extended.

Fixes: b9b1f8d5930a ("AFS: write support fixes")
Reported-by: Markus Suvanto <markus.suvanto@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
Tested-by: Markus Suvanto <markus.suvanto@gmail.com>
cc: linux-afs@lists.infradead.org
cc: openafs-devel@openafs.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=214217#c9 [1]
Link: https://lore.kernel.org/r/951332.1631308745@warthog.procyon.org.uk/
diff b537a3c2 Thu Sep 09 17:01:52 MDT 2021 David Howells <dhowells@redhat.com> afs: Fix corruption in reads at fpos 2G-4G from an OpenAFS server

AFS-3 has two data fetch RPC variants, FS.FetchData and FS.FetchData64, and
Linux's afs client switches between them when talking to a non-YFS server
if the read size, the file position or the sum of the two have the upper 32
bits set of the 64-bit value.

This is a problem, however, since the file position and length fields of
FS.FetchData are *signed* 32-bit values.

Fix this by capturing the capability bits obtained from the fileserver when
it's sent an FS.GetCapabilities RPC, rather than just discarding them, and
then picking out the VICED_CAPABILITY_64BITFILES flag. This can then be
used to decide whether to use FS.FetchData or FS.FetchData64 - and also
FS.StoreData or FS.StoreData64 - rather than using upper_32_bits() to
switch on the parameter values.

This capabilities flag could also be used to limit the maximum size of the
file, but all servers must be checked for that.

Note that the issue does not exist with FS.StoreData - that uses *unsigned*
32-bit values. It's also not a problem with Auristor servers as its
YFS.FetchData64 op uses unsigned 64-bit values.

This can be tested by cloning a git repo through an OpenAFS client to an
OpenAFS server and then doing "git status" on it from a Linux afs
client[1]. Provided the clone has a pack file that's in the 2G-4G range,
the git status will show errors like:

error: packfile .git/objects/pack/pack-5e813c51d12b6847bbc0fcd97c2bca66da50079c.pack does not match index
error: packfile .git/objects/pack/pack-5e813c51d12b6847bbc0fcd97c2bca66da50079c.pack does not match index

This can be observed in the server's FileLog with something like the
following appearing:

Sun Aug 29 19:31:39 2021 SRXAFS_FetchData, Fid = 2303380852.491776.3263114, Host 192.168.11.201:7001, Id 1001
Sun Aug 29 19:31:39 2021 CheckRights: len=0, for host=192.168.11.201:7001
Sun Aug 29 19:31:39 2021 FetchData_RXStyle: Pos 18446744071815340032, Len 3154
Sun Aug 29 19:31:39 2021 FetchData_RXStyle: file size 2400758866
...
Sun Aug 29 19:31:40 2021 SRXAFS_FetchData returns 5

Note the file position of 18446744071815340032. This is the requested file
position sign-extended.

Fixes: b9b1f8d5930a ("AFS: write support fixes")
Reported-by: Markus Suvanto <markus.suvanto@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
Tested-by: Markus Suvanto <markus.suvanto@gmail.com>
cc: linux-afs@lists.infradead.org
cc: openafs-devel@openafs.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=214217#c9 [1]
Link: https://lore.kernel.org/r/951332.1631308745@warthog.procyon.org.uk/
diff 5cbf0398 Thu Feb 06 07:22:29 MST 2020 David Howells <dhowells@redhat.com> afs: Use new netfs lib read helper API

Make AFS use the new netfs read helpers to implement the VM read
operations:

- afs_readpage() now hands off responsibility to netfs_readpage().

- afs_readpages() is gone and replaced with afs_readahead().

- afs_readahead() just hands off responsibility to netfs_readahead().

These make use of the cache if a cookie is supplied, otherwise just call
the ->issue_op() method a sufficient number of times to complete the entire
request.

Changes:
v5:
- Use proper wait function for PG_fscache in afs_page_mkwrite()[1].
- Use killable wait for PG_writeback in afs_page_mkwrite()[1].

v4:
- Folded in error handling fixes to afs_req_issue_op().
- Added flag to netfs_subreq_terminated() to indicate that the caller may
have been running async and stuff that might sleep needs punting to a
workqueue.

Signed-off-by: David Howells <dhowells@redhat.com>
Tested-By: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-cachefs@redhat.com
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/2499407.1616505440@warthog.procyon.org.uk [1]
Link: https://lore.kernel.org/r/160588542733.3465195.7526541422073350302.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/161118158436.1232039.3884845981224091996.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/161161053540.2537118.14904446369309535330.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/161340418739.1303470.5908092911600241280.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/161539561926.286939.5729036262354802339.stgit@warthog.procyon.org.uk/ # v4
Link: https://lore.kernel.org/r/161653817977.2770958.17696456811587237197.stgit@warthog.procyon.org.uk/ # v5
Link: https://lore.kernel.org/r/161789101258.6155.3879271028895121537.stgit@warthog.procyon.org.uk/ # v6
diff 5cbf0398 Thu Feb 06 07:22:29 MST 2020 David Howells <dhowells@redhat.com> afs: Use new netfs lib read helper API

Make AFS use the new netfs read helpers to implement the VM read
operations:

- afs_readpage() now hands off responsibility to netfs_readpage().

- afs_readpages() is gone and replaced with afs_readahead().

- afs_readahead() just hands off responsibility to netfs_readahead().

These make use of the cache if a cookie is supplied, otherwise just call
the ->issue_op() method a sufficient number of times to complete the entire
request.

Changes:
v5:
- Use proper wait function for PG_fscache in afs_page_mkwrite()[1].
- Use killable wait for PG_writeback in afs_page_mkwrite()[1].

v4:
- Folded in error handling fixes to afs_req_issue_op().
- Added flag to netfs_subreq_terminated() to indicate that the caller may
have been running async and stuff that might sleep needs punting to a
workqueue.

Signed-off-by: David Howells <dhowells@redhat.com>
Tested-By: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-cachefs@redhat.com
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/2499407.1616505440@warthog.procyon.org.uk [1]
Link: https://lore.kernel.org/r/160588542733.3465195.7526541422073350302.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/161118158436.1232039.3884845981224091996.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/161161053540.2537118.14904446369309535330.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/161340418739.1303470.5908092911600241280.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/161539561926.286939.5729036262354802339.stgit@warthog.procyon.org.uk/ # v4
Link: https://lore.kernel.org/r/161653817977.2770958.17696456811587237197.stgit@warthog.procyon.org.uk/ # v5
Link: https://lore.kernel.org/r/161789101258.6155.3879271028895121537.stgit@warthog.procyon.org.uk/ # v6
diff e49c7b2f Fri Apr 10 13:51:51 MDT 2020 David Howells <dhowells@redhat.com> afs: Build an abstraction around an "operation" concept

Turn the afs_operation struct into the main way that most fileserver
operations are managed. Various things are added to the struct, including
the following:

(1) All the parameters and results of the relevant operations are moved
into it, removing corresponding fields from the afs_call struct.
afs_call gets a pointer to the op.

(2) The target volume is made the main focus of the operation, rather than
the target vnode(s), and a bunch of op->vnode->volume are made
op->volume instead.

(3) Two vnode records are defined (op->file[]) for the vnode(s) involved
in most operations. The vnode record (struct afs_vnode_param)
contains:

- The vnode pointer.

- The fid of the vnode to be included in the parameters or that was
returned in the reply (eg. FS.MakeDir).

- The status and callback information that may be returned in the
reply about the vnode.

- Callback break and data version tracking for detecting
simultaneous third-parth changes.

(4) Pointers to dentries to be updated with new inodes.

(5) An operations table pointer. The table includes pointers to functions
for issuing AFS and YFS-variant RPCs, handling the success and abort
of an operation and handling post-I/O-lock local editing of a
directory.

To make this work, the following function restructuring is made:

(A) The rotation loop that issues calls to fileservers that can be found
in each function that wants to issue an RPC (such as afs_mkdir()) is
extracted out into common code, in a new file called fs_operation.c.

(B) The rotation loops, such as the one in afs_mkdir(), are replaced with
a much smaller piece of code that allocates an operation, sets the
parameters and then calls out to the common code to do the actual
work.

(C) The code for handling the success and failure of an operation are
moved into operation functions (as (5) above) and these are called
from the core code at appropriate times.

(D) The pseudo inode getting stuff used by the dynamic root code is moved
over into dynroot.c.

(E) struct afs_iget_data is absorbed into the operation struct and
afs_iget() expects to be given an op pointer and a vnode record.

(F) Point (E) doesn't work for the root dir of a volume, but we know the
FID in advance (it's always vnode 1, unique 1), so a separate inode
getter, afs_root_iget(), is provided to special-case that.

(G) The inode status init/update functions now also take an op and a vnode
record.

(H) The RPC marshalling functions now, for the most part, just take an
afs_operation struct as their only argument. All the data they need
is held there. The result delivery functions write their answers
there as well.

(I) The call is attached to the operation and then the operation core does
the waiting.

And then the new operation code is, for the moment, made to just initialise
the operation, get the appropriate vnode I/O locks and do the same rotation
loop as before.

This lays the foundation for the following changes in the future:

(*) Overhauling the rotation (again).

(*) Support for asynchronous I/O, where the fileserver rotation must be
done asynchronously also.

Signed-off-by: David Howells <dhowells@redhat.com>
diff e49c7b2f Fri Apr 10 13:51:51 MDT 2020 David Howells <dhowells@redhat.com> afs: Build an abstraction around an "operation" concept

Turn the afs_operation struct into the main way that most fileserver
operations are managed. Various things are added to the struct, including
the following:

(1) All the parameters and results of the relevant operations are moved
into it, removing corresponding fields from the afs_call struct.
afs_call gets a pointer to the op.

(2) The target volume is made the main focus of the operation, rather than
the target vnode(s), and a bunch of op->vnode->volume are made
op->volume instead.

(3) Two vnode records are defined (op->file[]) for the vnode(s) involved
in most operations. The vnode record (struct afs_vnode_param)
contains:

- The vnode pointer.

- The fid of the vnode to be included in the parameters or that was
returned in the reply (eg. FS.MakeDir).

- The status and callback information that may be returned in the
reply about the vnode.

- Callback break and data version tracking for detecting
simultaneous third-parth changes.

(4) Pointers to dentries to be updated with new inodes.

(5) An operations table pointer. The table includes pointers to functions
for issuing AFS and YFS-variant RPCs, handling the success and abort
of an operation and handling post-I/O-lock local editing of a
directory.

To make this work, the following function restructuring is made:

(A) The rotation loop that issues calls to fileservers that can be found
in each function that wants to issue an RPC (such as afs_mkdir()) is
extracted out into common code, in a new file called fs_operation.c.

(B) The rotation loops, such as the one in afs_mkdir(), are replaced with
a much smaller piece of code that allocates an operation, sets the
parameters and then calls out to the common code to do the actual
work.

(C) The code for handling the success and failure of an operation are
moved into operation functions (as (5) above) and these are called
from the core code at appropriate times.

(D) The pseudo inode getting stuff used by the dynamic root code is moved
over into dynroot.c.

(E) struct afs_iget_data is absorbed into the operation struct and
afs_iget() expects to be given an op pointer and a vnode record.

(F) Point (E) doesn't work for the root dir of a volume, but we know the
FID in advance (it's always vnode 1, unique 1), so a separate inode
getter, afs_root_iget(), is provided to special-case that.

(G) The inode status init/update functions now also take an op and a vnode
record.

(H) The RPC marshalling functions now, for the most part, just take an
afs_operation struct as their only argument. All the data they need
is held there. The result delivery functions write their answers
there as well.

(I) The call is attached to the operation and then the operation core does
the waiting.

And then the new operation code is, for the moment, made to just initialise
the operation, get the appropriate vnode I/O locks and do the same rotation
loop as before.

This lays the foundation for the following changes in the future:

(*) Overhauling the rotation (again).

(*) Support for asynchronous I/O, where the fileserver rotation must be
done asynchronously also.

Signed-off-by: David Howells <dhowells@redhat.com>
diff f6cbb368 Fri Apr 24 08:10:00 MDT 2020 David Howells <dhowells@redhat.com> afs: Actively poll fileservers to maintain NAT or firewall openings

When an AFS client accesses a file, it receives a limited-duration callback
promise that the server will notify it if another client changes a file.
This callback duration can be a few hours in length.

If a client mounts a volume and then an application prevents it from being
unmounted, say by chdir'ing into it, but then does nothing for some time,
the rxrpc_peer record will expire and rxrpc-level keepalive will cease.

If there is NAT or a firewall between the client and the server, the route
back for the server may close after a comparatively short duration, meaning
that attempts by the server to notify the client may then bounce.

The client, however, may (so far as it knows) still have a valid unexpired
promise and will then rely on its cached data and will not see changes made
on the server by a third party until it incidentally rechecks the status or
the promise needs renewal.

To deal with this, the client needs to regularly probe the server. This
has two effects: firstly, it keeps a route open back for the server, and
secondly, it causes the server to disgorge any notifications that got
queued up because they couldn't be sent.

Fix this by adding a mechanism to emit regular probes.

Two levels of probing are made available: Under normal circumstances the
'slow' queue will be used for a fileserver - this just probes the preferred
address once every 5 mins or so; however, if server fails to respond to any
probes, the server will shift to the 'fast' queue from which all its
interfaces will be probed every 30s. When it finally responds, the record
will switch back to the slow queue.

Further notes:

(1) Probing is now no longer driven from the fileserver rotation
algorithm.

(2) Probes are dispatched to all interfaces on a fileserver when that an
afs_server object is set up to record it.

(3) The afs_server object is removed from the probe queues when we start
to probe it. afs_is_probing_server() returns true if it's not listed
- ie. it's undergoing probing.

(4) The afs_server object is added back on to the probe queue when the
final outstanding probe completes, but the probed_at time is set when
we're about to launch a probe so that it's not dependent on the probe
duration.

(5) The timer and the work item added for this must be handed a count on
net->servers_outstanding, which they hand on or release. This makes
sure that network namespace cleanup waits for them.

Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation")
Reported-by: Dave Botsch <botsch@cnf.cornell.edu>
Signed-off-by: David Howells <dhowells@redhat.com>
diff f6cbb368 Fri Apr 24 08:10:00 MDT 2020 David Howells <dhowells@redhat.com> afs: Actively poll fileservers to maintain NAT or firewall openings

When an AFS client accesses a file, it receives a limited-duration callback
promise that the server will notify it if another client changes a file.
This callback duration can be a few hours in length.

If a client mounts a volume and then an application prevents it from being
unmounted, say by chdir'ing into it, but then does nothing for some time,
the rxrpc_peer record will expire and rxrpc-level keepalive will cease.

If there is NAT or a firewall between the client and the server, the route
back for the server may close after a comparatively short duration, meaning
that attempts by the server to notify the client may then bounce.

The client, however, may (so far as it knows) still have a valid unexpired
promise and will then rely on its cached data and will not see changes made
on the server by a third party until it incidentally rechecks the status or
the promise needs renewal.

To deal with this, the client needs to regularly probe the server. This
has two effects: firstly, it keeps a route open back for the server, and
secondly, it causes the server to disgorge any notifications that got
queued up because they couldn't be sent.

Fix this by adding a mechanism to emit regular probes.

Two levels of probing are made available: Under normal circumstances the
'slow' queue will be used for a fileserver - this just probes the preferred
address once every 5 mins or so; however, if server fails to respond to any
probes, the server will shift to the 'fast' queue from which all its
interfaces will be probed every 30s. When it finally responds, the record
will switch back to the slow queue.

Further notes:

(1) Probing is now no longer driven from the fileserver rotation
algorithm.

(2) Probes are dispatched to all interfaces on a fileserver when that an
afs_server object is set up to record it.

(3) The afs_server object is removed from the probe queues when we start
to probe it. afs_is_probing_server() returns true if it's not listed
- ie. it's undergoing probing.

(4) The afs_server object is added back on to the probe queue when the
final outstanding probe completes, but the probed_at time is set when
we're about to launch a probe so that it's not dependent on the probe
duration.

(5) The timer and the work item added for this must be handed a count on
net->servers_outstanding, which they hand on or release. This makes
sure that network namespace cleanup waits for them.

Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation")
Reported-by: Dave Botsch <botsch@cnf.cornell.edu>
Signed-off-by: David Howells <dhowells@redhat.com>
H A Dwrite.cdiff adc9613f Sat May 21 01:45:55 MDT 2022 David Howells <dhowells@redhat.com> afs: Adjust ACK interpretation to try and cope with NAT

If a client's address changes, say if it is NAT'd, this can disrupt an in
progress operation. For most operations, this is not much of a problem,
but StoreData can be different as some servers modify the target file as
the data comes in, so if a store request is disrupted, the file can get
corrupted on the server.

The problem is that the server doesn't recognise packets that come after
the change of address as belonging to the original client and will bounce
them, either by sending an OUT_OF_SEQUENCE ACK to the apparent new call if
the packet number falls within the initial sequence number window of a call
or by sending an EXCEEDS_WINDOW ACK if it falls outside and then aborting
it. In both cases, firstPacket will be 1 and previousPacket will be 0 in
the ACK information.

Fix this by the following means:

(1) If a client call receives an EXCEEDS_WINDOW ACK with firstPacket as 1
and previousPacket as 0, assume this indicates that the server saw the
incoming packets from a different peer and thus as a different call.
Fail the call with error -ENETRESET.

(2) Also fail the call if a similar OUT_OF_SEQUENCE ACK occurs if the
first packet has been hard-ACK'd. If it hasn't been hard-ACK'd, the
ACK packet will cause it to get retransmitted, so the call will just
be repeated.

(3) Make afs_select_fileserver() treat -ENETRESET as a straight fail of
the operation.

(4) Prioritise the error code over things like -ECONNRESET as the server
did actually respond.

(5) Make writeback treat -ENETRESET as a retryable error and make it
redirty all the pages involved in a write so that the VM will retry.

Note that there is still a circumstance that I can't easily deal with: if
the operation is fully received and processed by the server, but the reply
is lost due to address change. There's no way to know if the op happened.
We can examine the server, but a conflicting change could have been made by
a third party - and we can't tell the difference. In such a case, a
message like:

kAFS: vnode modified {100058:146266} b7->b8 YFS.StoreData64 (op=2646a)

will be logged to dmesg on the next op to touch the file and the client
will reset the inode state, including invalidating clean parts of the
pagecache.

Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-afs@lists.infradead.org
Link: http://lists.infradead.org/pipermail/linux-afs/2021-December/004811.html # v1
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 173ce1ca Fri Mar 11 08:58:21 MST 2022 David Howells <dhowells@redhat.com> afs: Fix potential thrashing in afs writeback

In afs_writepages_region(), if the dirty page we find is undergoing
writeback or write to cache, but the sync_mode is WB_SYNC_NONE, we go
round the loop trying the same page again and again with no pausing or
waiting unless and until another thread manages to clear the writeback
and fscache flags.

Fix this with three measures:

(1) Advance start to after the page we found.

(2) Break out of the loop and return if rescheduling is requested.

(3) Arbitrarily give up after a maximum of 5 skips.

Fixes: 31143d5d515e ("AFS: implement basic file write support")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Marc Dionne <marc.dionne@auristor.com>
Acked-by: Marc Dionne <marc.dionne@auristor.com>
Link: https://lore.kernel.org/r/164692725757.2097000.2060513769492301854.stgit@warthog.procyon.org.uk/ # v1
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
diff 523d27cd Thu Feb 06 07:22:21 MST 2020 David Howells <dhowells@redhat.com> afs: Convert afs to use the new fscache API

Change the afs filesystem to support the new afs driver.

The following changes have been made:

(1) The fscache_netfs struct is no more, and there's no need to register
the filesystem as a whole. There's also no longer a cell cookie.

(2) The volume cookie is now an fscache_volume cookie, allocated with
fscache_acquire_volume(). This function takes three parameters: a
string representing the "volume" in the index, a string naming the
cache to use (or NULL) and a u64 that conveys coherency metadata for
the volume.

For afs, I've made it render the volume name string as:

"afs,<cell>,<volume_id>"

and the coherency data is currently 0.

(3) The fscache_cookie_def is no more and needed information is passed
directly to fscache_acquire_cookie(). The cache no longer calls back
into the filesystem, but rather metadata changes are indicated at
other times.

fscache_acquire_cookie() is passed the same keying and coherency
information as before, except that these are now stored in big endian
form instead of cpu endian. This makes the cache more copyable.

(4) fscache_use_cookie() and fscache_unuse_cookie() are called when a file
is opened or closed to prevent a cache file from being culled and to
keep resources to hand that are needed to do I/O.

fscache_use_cookie() is given an indication if the cache is likely to
be modified locally (e.g. the file is open for writing).

fscache_unuse_cookie() is given a coherency update if we had the file
open for writing and will update that.

(5) fscache_invalidate() is now given uptodate auxiliary data and a file
size. It can also take a flag to indicate if this was due to a DIO
write. This is wrapped into afs_fscache_invalidate() now for
convenience.

(6) fscache_resize() now gets called from the finalisation of
afs_setattr(), and afs_setattr() does use/unuse of the cookie around
the call to support this.

(7) fscache_note_page_release() is called from afs_release_page().

(8) Use a killable wait in nfs_vm_page_mkwrite() when waiting for
PG_fscache to be cleared.

Render the parts of the cookie key for an afs inode cookie as big endian.

Changes
=======
ver #2:
- Use gfpflags_allow_blocking() rather than using flag directly.
- fscache_acquire_volume() now returns errors.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Tested-by: kafs-testing@auristor.com
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/163819661382.215744.1485608824741611837.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906970002.143852.17678518584089878259.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967174665.1823006.1301789965454084220.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/164021568841.640689.6684240152253400380.stgit@warthog.procyon.org.uk/ # v4
diff 78525c74 Wed Aug 11 02:49:13 MDT 2021 David Howells <dhowells@redhat.com> netfs, 9p, afs, ceph: Use folios

Convert the netfs helper library to use folios throughout, convert the 9p
and afs filesystems to use folios in their file I/O paths and convert the
ceph filesystem to use just enough folios to compile.

With these changes, afs passes -g quick xfstests.

Changes
=======
ver #5:
- Got rid of folio_end{io,_read,_write}() and inlined the stuff it does
instead (Willy decided he didn't want this after all).

ver #4:
- Fixed a bug in afs_redirty_page() whereby it didn't set the next page
index in the loop and returned too early.
- Simplified a check in v9fs_vfs_write_folio_locked()[1].
- Undid a change to afs_symlink_readpage()[1].
- Used offset_in_folio() in afs_write_end()[1].
- Changed from using page_endio() to folio_end{io,_read,_write}()[1].

ver #2:
- Add 9p foliation.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Dominique Martinet <asmadeus@codewreck.org>
Tested-by: kafs-testing@auristor.com
cc: Matthew Wilcox (Oracle) <willy@infradead.org>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Ilya Dryomov <idryomov@gmail.com>
cc: Dominique Martinet <asmadeus@codewreck.org>
cc: v9fs-developer@lists.sourceforge.net
cc: linux-afs@lists.infradead.org
cc: ceph-devel@vger.kernel.org
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/YYKa3bfQZxK5/wDN@casper.infradead.org/ [1]
Link: https://lore.kernel.org/r/2408234.1628687271@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/162877311459.3085614.10601478228012245108.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/162981153551.1901565.3124454657133703341.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/163005745264.2472992.9852048135392188995.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163584187452.4023316.500389675405550116.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/163649328026.309189.1124218109373941936.stgit@warthog.procyon.org.uk/ # v4
Link: https://lore.kernel.org/r/163657852454.834781.9265101983152100556.stgit@warthog.procyon.org.uk/ # v5
diff 5c052248 Thu Aug 12 16:08:54 MDT 2021 David Howells <dhowells@redhat.com> afs: Fix afs_launder_page() to set correct start file position

Fix afs_launder_page() to set the starting position of the StoreData RPC at
the offset into the page at which the modified data starts instead of at
the beginning of the page (the iov_iter is correctly offset).

The offset got lost during the conversion to passing an iov_iter into
afs_store_data().

Changes:
ver #2:
- Use page_offset() rather than manually calculating it[1].

Fixes: bd80d8a80e12 ("afs: Use ITER_XARRAY for writing")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/YST/0e92OdSH0zjg@casper.infradead.org/ [1]
Link: https://lore.kernel.org/r/162880783179.3421678.7795105718190440134.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/162937512409.1449272.18441473411207824084.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/162981148752.1901565.3663780601682206026.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163005741670.2472992.2073548908229887941.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163221839087.3143591.14278359695763025231.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163292980654.4004896.7134735179887998551.stgit@warthog.procyon.org.uk/ # v2
diff 5a972474 Mon Jul 12 10:04:47 MDT 2021 David Howells <dhowells@redhat.com> afs: Fix setting of writeback_index

Fix afs_writepages() to always set mapping->writeback_index to a page index
and not a byte position[1].

Fixes: 31143d5d515e ("AFS: implement basic file write support")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/CAB9dFdvHsLsw7CMnB+4cgciWDSqVjuij4mH3TaXnHQB8sz5rHw@mail.gmail.com/ [1]
Link: https://lore.kernel.org/r/162610728339.3408253.4604750166391496546.stgit@warthog.procyon.org.uk/ # v2 (no v1)
diff 5cbf0398 Thu Feb 06 07:22:29 MST 2020 David Howells <dhowells@redhat.com> afs: Use new netfs lib read helper API

Make AFS use the new netfs read helpers to implement the VM read
operations:

- afs_readpage() now hands off responsibility to netfs_readpage().

- afs_readpages() is gone and replaced with afs_readahead().

- afs_readahead() just hands off responsibility to netfs_readahead().

These make use of the cache if a cookie is supplied, otherwise just call
the ->issue_op() method a sufficient number of times to complete the entire
request.

Changes:
v5:
- Use proper wait function for PG_fscache in afs_page_mkwrite()[1].
- Use killable wait for PG_writeback in afs_page_mkwrite()[1].

v4:
- Folded in error handling fixes to afs_req_issue_op().
- Added flag to netfs_subreq_terminated() to indicate that the caller may
have been running async and stuff that might sleep needs punting to a
workqueue.

Signed-off-by: David Howells <dhowells@redhat.com>
Tested-By: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-cachefs@redhat.com
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/2499407.1616505440@warthog.procyon.org.uk [1]
Link: https://lore.kernel.org/r/160588542733.3465195.7526541422073350302.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/161118158436.1232039.3884845981224091996.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/161161053540.2537118.14904446369309535330.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/161340418739.1303470.5908092911600241280.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/161539561926.286939.5729036262354802339.stgit@warthog.procyon.org.uk/ # v4
Link: https://lore.kernel.org/r/161653817977.2770958.17696456811587237197.stgit@warthog.procyon.org.uk/ # v5
Link: https://lore.kernel.org/r/161789101258.6155.3879271028895121537.stgit@warthog.procyon.org.uk/ # v6
diff 5cbf0398 Thu Feb 06 07:22:29 MST 2020 David Howells <dhowells@redhat.com> afs: Use new netfs lib read helper API

Make AFS use the new netfs read helpers to implement the VM read
operations:

- afs_readpage() now hands off responsibility to netfs_readpage().

- afs_readpages() is gone and replaced with afs_readahead().

- afs_readahead() just hands off responsibility to netfs_readahead().

These make use of the cache if a cookie is supplied, otherwise just call
the ->issue_op() method a sufficient number of times to complete the entire
request.

Changes:
v5:
- Use proper wait function for PG_fscache in afs_page_mkwrite()[1].
- Use killable wait for PG_writeback in afs_page_mkwrite()[1].

v4:
- Folded in error handling fixes to afs_req_issue_op().
- Added flag to netfs_subreq_terminated() to indicate that the caller may
have been running async and stuff that might sleep needs punting to a
workqueue.

Signed-off-by: David Howells <dhowells@redhat.com>
Tested-By: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-cachefs@redhat.com
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/2499407.1616505440@warthog.procyon.org.uk [1]
Link: https://lore.kernel.org/r/160588542733.3465195.7526541422073350302.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/161118158436.1232039.3884845981224091996.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/161161053540.2537118.14904446369309535330.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/161340418739.1303470.5908092911600241280.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/161539561926.286939.5729036262354802339.stgit@warthog.procyon.org.uk/ # v4
Link: https://lore.kernel.org/r/161653817977.2770958.17696456811587237197.stgit@warthog.procyon.org.uk/ # v5
Link: https://lore.kernel.org/r/161789101258.6155.3879271028895121537.stgit@warthog.procyon.org.uk/ # v6
diff 793fe82e Fri Jun 12 09:13:52 MDT 2020 David Howells <dhowells@redhat.com> afs: Fix truncation issues and mmap writeback size

Fix the following issues:

(1) Fix writeback to reduce the size of a store operation to i_size,
effectively discarding the extra data.

The problem comes when afs_page_mkwrite() records that a page is about
to be modified by mmap(). It doesn't know what bits of the page are
going to be modified, so it records the whole page as being dirty
(this is stored in page->private as start and end offsets).

Without this, the marshalling for the store to the server extends the
size of the file to the end of the page (in afs_fs_store_data() and
yfs_fs_store_data()).

(2) Fix setattr to actually truncate the pagecache, thereby clearing
the discarded part of a file.

(3) Fix setattr to check that the new size is okay and to disable
ATTR_SIZE if i_size wouldn't change.

(4) Force i_size to be updated as the result of a truncate.

(5) Don't truncate if ATTR_SIZE is not set.

(6) Call pagecache_isize_extended() if the file was enlarged.

Note that truncate_set_size() isn't used because the setting of i_size is
done inside afs_vnode_commit_status() under the vnode->cb_lock.

Found with the generic/029 and generic/393 xfstests.

Fixes: 31143d5d515e ("AFS: implement basic file write support")
Fixes: 4343d00872e1 ("afs: Get rid of the afs_writeback record")
Signed-off-by: David Howells <dhowells@redhat.com>
diff e49c7b2f Fri Apr 10 13:51:51 MDT 2020 David Howells <dhowells@redhat.com> afs: Build an abstraction around an "operation" concept

Turn the afs_operation struct into the main way that most fileserver
operations are managed. Various things are added to the struct, including
the following:

(1) All the parameters and results of the relevant operations are moved
into it, removing corresponding fields from the afs_call struct.
afs_call gets a pointer to the op.

(2) The target volume is made the main focus of the operation, rather than
the target vnode(s), and a bunch of op->vnode->volume are made
op->volume instead.

(3) Two vnode records are defined (op->file[]) for the vnode(s) involved
in most operations. The vnode record (struct afs_vnode_param)
contains:

- The vnode pointer.

- The fid of the vnode to be included in the parameters or that was
returned in the reply (eg. FS.MakeDir).

- The status and callback information that may be returned in the
reply about the vnode.

- Callback break and data version tracking for detecting
simultaneous third-parth changes.

(4) Pointers to dentries to be updated with new inodes.

(5) An operations table pointer. The table includes pointers to functions
for issuing AFS and YFS-variant RPCs, handling the success and abort
of an operation and handling post-I/O-lock local editing of a
directory.

To make this work, the following function restructuring is made:

(A) The rotation loop that issues calls to fileservers that can be found
in each function that wants to issue an RPC (such as afs_mkdir()) is
extracted out into common code, in a new file called fs_operation.c.

(B) The rotation loops, such as the one in afs_mkdir(), are replaced with
a much smaller piece of code that allocates an operation, sets the
parameters and then calls out to the common code to do the actual
work.

(C) The code for handling the success and failure of an operation are
moved into operation functions (as (5) above) and these are called
from the core code at appropriate times.

(D) The pseudo inode getting stuff used by the dynamic root code is moved
over into dynroot.c.

(E) struct afs_iget_data is absorbed into the operation struct and
afs_iget() expects to be given an op pointer and a vnode record.

(F) Point (E) doesn't work for the root dir of a volume, but we know the
FID in advance (it's always vnode 1, unique 1), so a separate inode
getter, afs_root_iget(), is provided to special-case that.

(G) The inode status init/update functions now also take an op and a vnode
record.

(H) The RPC marshalling functions now, for the most part, just take an
afs_operation struct as their only argument. All the data they need
is held there. The result delivery functions write their answers
there as well.

(I) The call is attached to the operation and then the operation core does
the waiting.

And then the new operation code is, for the moment, made to just initialise
the operation, get the appropriate vnode I/O locks and do the same rotation
loop as before.

This lays the foundation for the following changes in the future:

(*) Overhauling the rotation (again).

(*) Support for asynchronous I/O, where the fileserver rotation must be
done asynchronously also.

Signed-off-by: David Howells <dhowells@redhat.com>
diff e49c7b2f Fri Apr 10 13:51:51 MDT 2020 David Howells <dhowells@redhat.com> afs: Build an abstraction around an "operation" concept

Turn the afs_operation struct into the main way that most fileserver
operations are managed. Various things are added to the struct, including
the following:

(1) All the parameters and results of the relevant operations are moved
into it, removing corresponding fields from the afs_call struct.
afs_call gets a pointer to the op.

(2) The target volume is made the main focus of the operation, rather than
the target vnode(s), and a bunch of op->vnode->volume are made
op->volume instead.

(3) Two vnode records are defined (op->file[]) for the vnode(s) involved
in most operations. The vnode record (struct afs_vnode_param)
contains:

- The vnode pointer.

- The fid of the vnode to be included in the parameters or that was
returned in the reply (eg. FS.MakeDir).

- The status and callback information that may be returned in the
reply about the vnode.

- Callback break and data version tracking for detecting
simultaneous third-parth changes.

(4) Pointers to dentries to be updated with new inodes.

(5) An operations table pointer. The table includes pointers to functions
for issuing AFS and YFS-variant RPCs, handling the success and abort
of an operation and handling post-I/O-lock local editing of a
directory.

To make this work, the following function restructuring is made:

(A) The rotation loop that issues calls to fileservers that can be found
in each function that wants to issue an RPC (such as afs_mkdir()) is
extracted out into common code, in a new file called fs_operation.c.

(B) The rotation loops, such as the one in afs_mkdir(), are replaced with
a much smaller piece of code that allocates an operation, sets the
parameters and then calls out to the common code to do the actual
work.

(C) The code for handling the success and failure of an operation are
moved into operation functions (as (5) above) and these are called
from the core code at appropriate times.

(D) The pseudo inode getting stuff used by the dynamic root code is moved
over into dynroot.c.

(E) struct afs_iget_data is absorbed into the operation struct and
afs_iget() expects to be given an op pointer and a vnode record.

(F) Point (E) doesn't work for the root dir of a volume, but we know the
FID in advance (it's always vnode 1, unique 1), so a separate inode
getter, afs_root_iget(), is provided to special-case that.

(G) The inode status init/update functions now also take an op and a vnode
record.

(H) The RPC marshalling functions now, for the most part, just take an
afs_operation struct as their only argument. All the data they need
is held there. The result delivery functions write their answers
there as well.

(I) The call is attached to the operation and then the operation core does
the waiting.

And then the new operation code is, for the moment, made to just initialise
the operation, get the appropriate vnode I/O locks and do the same rotation
loop as before.

This lays the foundation for the following changes in the future:

(*) Overhauling the rotation (again).

(*) Support for asynchronous I/O, where the fileserver rotation must be
done asynchronously also.

Signed-off-by: David Howells <dhowells@redhat.com>
H A Dfile.cdiff 453924de Wed Nov 08 06:57:42 MST 2023 David Howells <dhowells@redhat.com> afs: Overhaul invalidation handling to better support RO volumes

Overhaul the third party-induced invalidation handling, making use of the
previously added volume-level event counters (cb_scrub and cb_ro_snapshot)
that are now being parsed out of the VolSync record returned by the
fileserver in many of its replies.

This allows better handling of RO (and Backup) volumes. Since these are
snapshot of a RW volume that are updated atomically simultantanously across
all servers that host them, they only require a single callback promise for
the entire volume. The currently upstream code assumes that RO volumes
operate in the same manner as RW volumes, and that each file has its own
individual callback - which means that it does a status fetch for *every*
file in a RO volume, whether or not the volume got "released" (volume
callback breaks can occur for other reasons too, such as the volumeserver
taking ownership of a volume from a fileserver).

To this end, make the following changes:

(1) Change the meaning of the volume's cb_v_break counter so that it is
now a hint that we need to issue a status fetch to work out the state
of a volume. cb_v_break is incremented by volume break callbacks and
by server initialisation callbacks.

(2) Add a second counter, cb_v_check, to the afs_volume struct such that
if this differs from cb_v_break, we need to do a check. When the
check is complete, cb_v_check is advanced to what cb_v_break was at
the start of the status fetch.

(3) Move the list of mmap'd vnodes to the volume and trigger removal of
PTEs that map to files on a volume break rather than on a server
break.

(4) When a server reinitialisation callback comes in, use the
server-to-volume reverse mapping added in a preceding patch to iterate
over all the volumes using that server and clear the volume callback
promises for that server and the general volume promise as a whole to
trigger reanalysis.

(5) Replace the AFS_VNODE_CB_PROMISED flag with an AFS_NO_CB_PROMISE
(TIME64_MIN) value in the cb_expires_at field, reducing the number of
checks we need to make.

(6) Change afs_check_validity() to quickly see if various event counters
have been incremented or if the vnode or volume callback promise is
due to expire/has expired without making any changes to the state.
That is now left to afs_validate() as this may get more complicated in
future as we may have to examine server records too.

(7) Overhaul afs_validate() so that it does a single status fetch if we
need to check the state of either the vnode or the volume - and do so
under appropriate locking. The function does the following steps:

(A) If the vnode/volume is no longer seen as valid, then we take the
vnode validation lock and, if the volume promise has expired, the
volume check lock also. The latter prevents redundant checks being
made to find out if a new version of the volume got released.

(B) If a previous RPC call found that the volsync changed unexpectedly
or that a RO volume was updated, then we unmap all PTEs pointing to
the file to stop mmap being used for access.

(C) If the vnode is still seen to be of uncertain validity, then we
perform an FS.FetchStatus RPC op to jointly update the volume status
and the vnode status. This assessment is done as part of parsing the
reply:

If the RO volume creation timestamp advances, cb_ro_snapshot is
incremented; if either the creation or update timestamps changes in
an unexpected way, the cb_scrub counter is incremented

If the Data Version returned doesn't match the copy we have
locally, then we ask for the pagecache to be zapped. This takes
care of handling RO update.

(D) If cb_scrub differs between volume and vnode, the vnode's
pagecache is zapped and the vnode's cb_scrub is updated unless the
file is marked as having been deleted.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
diff d7bdba1c Wed Dec 22 10:21:04 MST 2021 David Howells <dhowells@redhat.com> 9p, afs, ceph, nfs: Use current_is_kswapd() rather than gfpflags_allow_blocking()

In 9p, afs ceph, and nfs, gfpflags_allow_blocking() (which wraps a
test for __GFP_DIRECT_RECLAIM being set) is used to determine if
->releasepage() should wait for the completion of a DIO write to fscache
with something like:

if (folio_test_fscache(folio)) {
if (!gfpflags_allow_blocking(gfp) || !(gfp & __GFP_FS))
return false;
folio_wait_fscache(folio);
}

Instead, current_is_kswapd() should be used instead.

Note that this is based on a patch originally by Zhaoyang Huang[1]. In
addition to extending it to the other network filesystems and putting it on
top of my fscache rewrite, it also needs to include linux/swap.h in a bunch
of places. Can current_is_kswapd() be moved to linux/mm.h?

Changes
=======
ver #5:
- Dropping the changes for cifs.

Originally-signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
Co-developed-by: David Howells <dhowells@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
cc: Dominique Martinet <asmadeus@codewreck.org>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Steve French <smfrench@gmail.com>
cc: Trond Myklebust <trond.myklebust@hammerspace.com>
cc: linux-cachefs@redhat.com
cc: v9fs-developer@lists.sourceforge.net
cc: linux-afs@lists.infradead.org
cc: ceph-devel@vger.kernel.org
cc: linux-cifs@vger.kernel.org
cc: linux-nfs@vger.kernel.org
cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/1638952658-20285-1-git-send-email-huangzhaoyang@gmail.com/ [1]
Link: https://lore.kernel.org/r/164021590773.640689.16777975200823659231.stgit@warthog.procyon.org.uk/ # v4
diff 523d27cd Thu Feb 06 07:22:21 MST 2020 David Howells <dhowells@redhat.com> afs: Convert afs to use the new fscache API

Change the afs filesystem to support the new afs driver.

The following changes have been made:

(1) The fscache_netfs struct is no more, and there's no need to register
the filesystem as a whole. There's also no longer a cell cookie.

(2) The volume cookie is now an fscache_volume cookie, allocated with
fscache_acquire_volume(). This function takes three parameters: a
string representing the "volume" in the index, a string naming the
cache to use (or NULL) and a u64 that conveys coherency metadata for
the volume.

For afs, I've made it render the volume name string as:

"afs,<cell>,<volume_id>"

and the coherency data is currently 0.

(3) The fscache_cookie_def is no more and needed information is passed
directly to fscache_acquire_cookie(). The cache no longer calls back
into the filesystem, but rather metadata changes are indicated at
other times.

fscache_acquire_cookie() is passed the same keying and coherency
information as before, except that these are now stored in big endian
form instead of cpu endian. This makes the cache more copyable.

(4) fscache_use_cookie() and fscache_unuse_cookie() are called when a file
is opened or closed to prevent a cache file from being culled and to
keep resources to hand that are needed to do I/O.

fscache_use_cookie() is given an indication if the cache is likely to
be modified locally (e.g. the file is open for writing).

fscache_unuse_cookie() is given a coherency update if we had the file
open for writing and will update that.

(5) fscache_invalidate() is now given uptodate auxiliary data and a file
size. It can also take a flag to indicate if this was due to a DIO
write. This is wrapped into afs_fscache_invalidate() now for
convenience.

(6) fscache_resize() now gets called from the finalisation of
afs_setattr(), and afs_setattr() does use/unuse of the cookie around
the call to support this.

(7) fscache_note_page_release() is called from afs_release_page().

(8) Use a killable wait in nfs_vm_page_mkwrite() when waiting for
PG_fscache to be cleared.

Render the parts of the cookie key for an afs inode cookie as big endian.

Changes
=======
ver #2:
- Use gfpflags_allow_blocking() rather than using flag directly.
- fscache_acquire_volume() now returns errors.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Tested-by: kafs-testing@auristor.com
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/163819661382.215744.1485608824741611837.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906970002.143852.17678518584089878259.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967174665.1823006.1301789965454084220.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/164021568841.640689.6684240152253400380.stgit@warthog.procyon.org.uk/ # v4
diff 78525c74 Wed Aug 11 02:49:13 MDT 2021 David Howells <dhowells@redhat.com> netfs, 9p, afs, ceph: Use folios

Convert the netfs helper library to use folios throughout, convert the 9p
and afs filesystems to use folios in their file I/O paths and convert the
ceph filesystem to use just enough folios to compile.

With these changes, afs passes -g quick xfstests.

Changes
=======
ver #5:
- Got rid of folio_end{io,_read,_write}() and inlined the stuff it does
instead (Willy decided he didn't want this after all).

ver #4:
- Fixed a bug in afs_redirty_page() whereby it didn't set the next page
index in the loop and returned too early.
- Simplified a check in v9fs_vfs_write_folio_locked()[1].
- Undid a change to afs_symlink_readpage()[1].
- Used offset_in_folio() in afs_write_end()[1].
- Changed from using page_endio() to folio_end{io,_read,_write}()[1].

ver #2:
- Add 9p foliation.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Dominique Martinet <asmadeus@codewreck.org>
Tested-by: kafs-testing@auristor.com
cc: Matthew Wilcox (Oracle) <willy@infradead.org>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Ilya Dryomov <idryomov@gmail.com>
cc: Dominique Martinet <asmadeus@codewreck.org>
cc: v9fs-developer@lists.sourceforge.net
cc: linux-afs@lists.infradead.org
cc: ceph-devel@vger.kernel.org
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/YYKa3bfQZxK5/wDN@casper.infradead.org/ [1]
Link: https://lore.kernel.org/r/2408234.1628687271@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/162877311459.3085614.10601478228012245108.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/162981153551.1901565.3124454657133703341.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/163005745264.2472992.9852048135392188995.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163584187452.4023316.500389675405550116.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/163649328026.309189.1124218109373941936.stgit@warthog.procyon.org.uk/ # v4
Link: https://lore.kernel.org/r/163657852454.834781.9265101983152100556.stgit@warthog.procyon.org.uk/ # v5
diff 5cbf0398 Thu Feb 06 07:22:29 MST 2020 David Howells <dhowells@redhat.com> afs: Use new netfs lib read helper API

Make AFS use the new netfs read helpers to implement the VM read
operations:

- afs_readpage() now hands off responsibility to netfs_readpage().

- afs_readpages() is gone and replaced with afs_readahead().

- afs_readahead() just hands off responsibility to netfs_readahead().

These make use of the cache if a cookie is supplied, otherwise just call
the ->issue_op() method a sufficient number of times to complete the entire
request.

Changes:
v5:
- Use proper wait function for PG_fscache in afs_page_mkwrite()[1].
- Use killable wait for PG_writeback in afs_page_mkwrite()[1].

v4:
- Folded in error handling fixes to afs_req_issue_op().
- Added flag to netfs_subreq_terminated() to indicate that the caller may
have been running async and stuff that might sleep needs punting to a
workqueue.

Signed-off-by: David Howells <dhowells@redhat.com>
Tested-By: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-cachefs@redhat.com
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/2499407.1616505440@warthog.procyon.org.uk [1]
Link: https://lore.kernel.org/r/160588542733.3465195.7526541422073350302.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/161118158436.1232039.3884845981224091996.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/161161053540.2537118.14904446369309535330.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/161340418739.1303470.5908092911600241280.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/161539561926.286939.5729036262354802339.stgit@warthog.procyon.org.uk/ # v4
Link: https://lore.kernel.org/r/161653817977.2770958.17696456811587237197.stgit@warthog.procyon.org.uk/ # v5
Link: https://lore.kernel.org/r/161789101258.6155.3879271028895121537.stgit@warthog.procyon.org.uk/ # v6
diff 5cbf0398 Thu Feb 06 07:22:29 MST 2020 David Howells <dhowells@redhat.com> afs: Use new netfs lib read helper API

Make AFS use the new netfs read helpers to implement the VM read
operations:

- afs_readpage() now hands off responsibility to netfs_readpage().

- afs_readpages() is gone and replaced with afs_readahead().

- afs_readahead() just hands off responsibility to netfs_readahead().

These make use of the cache if a cookie is supplied, otherwise just call
the ->issue_op() method a sufficient number of times to complete the entire
request.

Changes:
v5:
- Use proper wait function for PG_fscache in afs_page_mkwrite()[1].
- Use killable wait for PG_writeback in afs_page_mkwrite()[1].

v4:
- Folded in error handling fixes to afs_req_issue_op().
- Added flag to netfs_subreq_terminated() to indicate that the caller may
have been running async and stuff that might sleep needs punting to a
workqueue.

Signed-off-by: David Howells <dhowells@redhat.com>
Tested-By: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-cachefs@redhat.com
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/2499407.1616505440@warthog.procyon.org.uk [1]
Link: https://lore.kernel.org/r/160588542733.3465195.7526541422073350302.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/161118158436.1232039.3884845981224091996.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/161161053540.2537118.14904446369309535330.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/161340418739.1303470.5908092911600241280.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/161539561926.286939.5729036262354802339.stgit@warthog.procyon.org.uk/ # v4
Link: https://lore.kernel.org/r/161653817977.2770958.17696456811587237197.stgit@warthog.procyon.org.uk/ # v5
Link: https://lore.kernel.org/r/161789101258.6155.3879271028895121537.stgit@warthog.procyon.org.uk/ # v6
diff e49c7b2f Fri Apr 10 13:51:51 MDT 2020 David Howells <dhowells@redhat.com> afs: Build an abstraction around an "operation" concept

Turn the afs_operation struct into the main way that most fileserver
operations are managed. Various things are added to the struct, including
the following:

(1) All the parameters and results of the relevant operations are moved
into it, removing corresponding fields from the afs_call struct.
afs_call gets a pointer to the op.

(2) The target volume is made the main focus of the operation, rather than
the target vnode(s), and a bunch of op->vnode->volume are made
op->volume instead.

(3) Two vnode records are defined (op->file[]) for the vnode(s) involved
in most operations. The vnode record (struct afs_vnode_param)
contains:

- The vnode pointer.

- The fid of the vnode to be included in the parameters or that was
returned in the reply (eg. FS.MakeDir).

- The status and callback information that may be returned in the
reply about the vnode.

- Callback break and data version tracking for detecting
simultaneous third-parth changes.

(4) Pointers to dentries to be updated with new inodes.

(5) An operations table pointer. The table includes pointers to functions
for issuing AFS and YFS-variant RPCs, handling the success and abort
of an operation and handling post-I/O-lock local editing of a
directory.

To make this work, the following function restructuring is made:

(A) The rotation loop that issues calls to fileservers that can be found
in each function that wants to issue an RPC (such as afs_mkdir()) is
extracted out into common code, in a new file called fs_operation.c.

(B) The rotation loops, such as the one in afs_mkdir(), are replaced with
a much smaller piece of code that allocates an operation, sets the
parameters and then calls out to the common code to do the actual
work.

(C) The code for handling the success and failure of an operation are
moved into operation functions (as (5) above) and these are called
from the core code at appropriate times.

(D) The pseudo inode getting stuff used by the dynamic root code is moved
over into dynroot.c.

(E) struct afs_iget_data is absorbed into the operation struct and
afs_iget() expects to be given an op pointer and a vnode record.

(F) Point (E) doesn't work for the root dir of a volume, but we know the
FID in advance (it's always vnode 1, unique 1), so a separate inode
getter, afs_root_iget(), is provided to special-case that.

(G) The inode status init/update functions now also take an op and a vnode
record.

(H) The RPC marshalling functions now, for the most part, just take an
afs_operation struct as their only argument. All the data they need
is held there. The result delivery functions write their answers
there as well.

(I) The call is attached to the operation and then the operation core does
the waiting.

And then the new operation code is, for the moment, made to just initialise
the operation, get the appropriate vnode I/O locks and do the same rotation
loop as before.

This lays the foundation for the following changes in the future:

(*) Overhauling the rotation (again).

(*) Support for asynchronous I/O, where the fileserver rotation must be
done asynchronously also.

Signed-off-by: David Howells <dhowells@redhat.com>
diff e49c7b2f Fri Apr 10 13:51:51 MDT 2020 David Howells <dhowells@redhat.com> afs: Build an abstraction around an "operation" concept

Turn the afs_operation struct into the main way that most fileserver
operations are managed. Various things are added to the struct, including
the following:

(1) All the parameters and results of the relevant operations are moved
into it, removing corresponding fields from the afs_call struct.
afs_call gets a pointer to the op.

(2) The target volume is made the main focus of the operation, rather than
the target vnode(s), and a bunch of op->vnode->volume are made
op->volume instead.

(3) Two vnode records are defined (op->file[]) for the vnode(s) involved
in most operations. The vnode record (struct afs_vnode_param)
contains:

- The vnode pointer.

- The fid of the vnode to be included in the parameters or that was
returned in the reply (eg. FS.MakeDir).

- The status and callback information that may be returned in the
reply about the vnode.

- Callback break and data version tracking for detecting
simultaneous third-parth changes.

(4) Pointers to dentries to be updated with new inodes.

(5) An operations table pointer. The table includes pointers to functions
for issuing AFS and YFS-variant RPCs, handling the success and abort
of an operation and handling post-I/O-lock local editing of a
directory.

To make this work, the following function restructuring is made:

(A) The rotation loop that issues calls to fileservers that can be found
in each function that wants to issue an RPC (such as afs_mkdir()) is
extracted out into common code, in a new file called fs_operation.c.

(B) The rotation loops, such as the one in afs_mkdir(), are replaced with
a much smaller piece of code that allocates an operation, sets the
parameters and then calls out to the common code to do the actual
work.

(C) The code for handling the success and failure of an operation are
moved into operation functions (as (5) above) and these are called
from the core code at appropriate times.

(D) The pseudo inode getting stuff used by the dynamic root code is moved
over into dynroot.c.

(E) struct afs_iget_data is absorbed into the operation struct and
afs_iget() expects to be given an op pointer and a vnode record.

(F) Point (E) doesn't work for the root dir of a volume, but we know the
FID in advance (it's always vnode 1, unique 1), so a separate inode
getter, afs_root_iget(), is provided to special-case that.

(G) The inode status init/update functions now also take an op and a vnode
record.

(H) The RPC marshalling functions now, for the most part, just take an
afs_operation struct as their only argument. All the data they need
is held there. The result delivery functions write their answers
there as well.

(I) The call is attached to the operation and then the operation core does
the waiting.

And then the new operation code is, for the moment, made to just initialise
the operation, get the appropriate vnode I/O locks and do the same rotation
loop as before.

This lays the foundation for the following changes in the future:

(*) Overhauling the rotation (again).

(*) Support for asynchronous I/O, where the fileserver rotation must be
done asynchronously also.

Signed-off-by: David Howells <dhowells@redhat.com>
diff a58823ac Thu May 09 08:16:10 MDT 2019 David Howells <dhowells@redhat.com> afs: Fix application of status and callback to be under same lock

When applying the status and callback in the response of an operation,
apply them in the same critical section so that there's no race between
checking the callback state and checking status-dependent state (such as
the data version).

Fix this by:

(1) Allocating a joint {status,callback} record (afs_status_cb) before
calling the RPC function for each vnode for which the RPC reply
contains a status or a status plus a callback. A flag is set in the
record to indicate if a callback was actually received.

(2) These records are passed into the RPC functions to be filled in. The
afs_decode_status() and yfs_decode_status() functions are removed and
the cb_lock is no longer taken.

(3) xdr_decode_AFSFetchStatus() and xdr_decode_YFSFetchStatus() no longer
update the vnode.

(4) xdr_decode_AFSCallBack() and xdr_decode_YFSCallBack() no longer update
the vnode.

(5) vnodes, expected data-version numbers and callback break counters
(cb_break) no longer need to be passed to the reply delivery
functions.

Note that, for the moment, the file locking functions still need
access to both the call and the vnode at the same time.

(6) afs_vnode_commit_status() is now given the cb_break value and the
expected data_version and the task of applying the status and the
callback to the vnode are now done here.

This is done under a single taking of vnode->cb_lock.

(7) afs_pages_written_back() is now called by afs_store_data() rather than
by the reply delivery function.

afs_pages_written_back() has been moved to before the call point and
is now given the first and last page numbers rather than a pointer to
the call.

(8) The indicator from YFS.RemoveFile2 as to whether the target file
actually got removed (status.abort_code == VNOVNODE) rather than
merely dropping a link is now checked in afs_unlink rather than in
xdr_decode_YFSFetchStatus().

Supplementary fixes:

(*) afs_cache_permit() now gets the caller_access mask from the
afs_status_cb object rather than picking it out of the vnode's status
record. afs_fetch_status() returns caller_access through its argument
list for this purpose also.

(*) afs_inode_init_from_status() now uses a write lock on cb_lock rather
than a read lock and now sets the callback inside the same critical
section.

Fixes: c435ee34551e ("afs: Overhaul the callback handling")
Signed-off-by: David Howells <dhowells@redhat.com>
diff a1b879ee Tue May 14 17:09:17 MDT 2019 David Howells <dhowells@redhat.com> afs: Fix key leak in afs_release() and afs_evict_inode()

Fix afs_release() to go through the cleanup part of the function if
FMODE_WRITE is set rather than exiting through vfs_fsync() (which skips the
cleanup). The cleanup involves discarding the refs on the key used for
file ops and the writeback key record.

Also fix afs_evict_inode() to clean up any left over wb keys attached to
the inode/vnode when it is removed.

Fixes: 5a8132761609 ("afs: Do better accretion of small writes on newly created content")
Signed-off-by: David Howells <dhowells@redhat.com>
H A Dinternal.hdiff 495f2ae9 Wed Oct 18 02:24:01 MDT 2023 David Howells <dhowells@redhat.com> afs: Fix fileserver rotation

Fix the fileserver rotation so that it doesn't use RTT as the basis for
deciding which server and address to use as this doesn't necessarily give a
good indication of the best path. Instead, use the configurable preference
list in conjunction with whatever probes have succeeded at the time of
looking.

To this end, make the following changes:

(1) Keep an array of "server states" to track what addresses we've tried
on each server and move the waitqueue entries there that we'll need
for probing.

(2) Each afs_server_state struct is made to pin the corresponding server's
endpoint state rather than the afs_operation struct carrying a pin on
the server we're currently looking at.

(3) Drop the server list preference; we now always rescan the server list.

(4) afs_wait_for_probes() now uses the server state list to guide it in
what it waits for (and to provide the waitqueue entries) and returns
an indication of whether we'd got a response, run out of responsive
addresses or the endpoint state had been superseded and we need to
restart the iteration.

(5) Call afs_get_address_preferences*() occasionally to refresh the
preference values.

(6) When picking a server, scan the addresses of the servers for which we
have as-yet untested communications, looking for the highest priority
one and use that instead of trying all the addresses for a particular
server in ascending-RTT order.

(7) When a Busy or Offline state is seen across all available servers, do
a short sleep.

(8) If we detect that we accessed a future RO volume version whilst it is
undergoing replication, reissue the op against the older version until
at least half of the servers are replicated.

(9) Whilst RO replication is ongoing, increase the frequency of Volume
Location server checks for that volume to every ten minutes instead of
hourly.

Also add a tracepoint to track progress through the rotation algorithm.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
diff 453924de Wed Nov 08 06:57:42 MST 2023 David Howells <dhowells@redhat.com> afs: Overhaul invalidation handling to better support RO volumes

Overhaul the third party-induced invalidation handling, making use of the
previously added volume-level event counters (cb_scrub and cb_ro_snapshot)
that are now being parsed out of the VolSync record returned by the
fileserver in many of its replies.

This allows better handling of RO (and Backup) volumes. Since these are
snapshot of a RW volume that are updated atomically simultantanously across
all servers that host them, they only require a single callback promise for
the entire volume. The currently upstream code assumes that RO volumes
operate in the same manner as RW volumes, and that each file has its own
individual callback - which means that it does a status fetch for *every*
file in a RO volume, whether or not the volume got "released" (volume
callback breaks can occur for other reasons too, such as the volumeserver
taking ownership of a volume from a fileserver).

To this end, make the following changes:

(1) Change the meaning of the volume's cb_v_break counter so that it is
now a hint that we need to issue a status fetch to work out the state
of a volume. cb_v_break is incremented by volume break callbacks and
by server initialisation callbacks.

(2) Add a second counter, cb_v_check, to the afs_volume struct such that
if this differs from cb_v_break, we need to do a check. When the
check is complete, cb_v_check is advanced to what cb_v_break was at
the start of the status fetch.

(3) Move the list of mmap'd vnodes to the volume and trigger removal of
PTEs that map to files on a volume break rather than on a server
break.

(4) When a server reinitialisation callback comes in, use the
server-to-volume reverse mapping added in a preceding patch to iterate
over all the volumes using that server and clear the volume callback
promises for that server and the general volume promise as a whole to
trigger reanalysis.

(5) Replace the AFS_VNODE_CB_PROMISED flag with an AFS_NO_CB_PROMISE
(TIME64_MIN) value in the cb_expires_at field, reducing the number of
checks we need to make.

(6) Change afs_check_validity() to quickly see if various event counters
have been incremented or if the vnode or volume callback promise is
due to expire/has expired without making any changes to the state.
That is now left to afs_validate() as this may get more complicated in
future as we may have to examine server records too.

(7) Overhaul afs_validate() so that it does a single status fetch if we
need to check the state of either the vnode or the volume - and do so
under appropriate locking. The function does the following steps:

(A) If the vnode/volume is no longer seen as valid, then we take the
vnode validation lock and, if the volume promise has expired, the
volume check lock also. The latter prevents redundant checks being
made to find out if a new version of the volume got released.

(B) If a previous RPC call found that the volsync changed unexpectedly
or that a RO volume was updated, then we unmap all PTEs pointing to
the file to stop mmap being used for access.

(C) If the vnode is still seen to be of uncertain validity, then we
perform an FS.FetchStatus RPC op to jointly update the volume status
and the vnode status. This assessment is done as part of parsing the
reply:

If the RO volume creation timestamp advances, cb_ro_snapshot is
incremented; if either the creation or update timestamps changes in
an unexpected way, the cb_scrub counter is incremented

If the Data Version returned doesn't match the copy we have
locally, then we ask for the pagecache to be zapped. This takes
care of handling RO update.

(D) If cb_scrub differs between volume and vnode, the vnode's
pagecache is zapped and the vnode's cb_scrub is updated unless the
file is marked as having been deleted.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
diff 16069e13 Sun Nov 05 09:11:07 MST 2023 David Howells <dhowells@redhat.com> afs: Parse the VolSync record in the reply of a number of RPC ops

A number of fileserver RPC operations return a VolSync record as part of
their reply that gives some information about the state of the volume being
accessed, including:

(1) A volume Creation timestamp. For an RW volume, this is the time at
which the volume was created; if it changes, the RW volume was
presumably restored from a backup and all cached data should be
scrubbed as Data Version numbers could regress on the files in the
volume.

For an RO volume, this is the time it was last snapshotted from the RW
volume. It is expected to advance each time this happens; if it
regresses, cached data should be scrubbed.

(2) A volume Update timestamp (Auristor only). For an RW volume, this is
updated any time any change is made to a volume or its contents. If
it regresses, all cached data must be scrubbed.

For an RO volume, this is a copy of the RW volume's Update timestamp
at the point of snapshotting. It can be used as a version number when
checking to see if a callback on a RO volume was due to a snapshot.
If it regresses, all cached data must be scrubbed.

but this is currently not made use of by the in-kernel afs filesystem.

Make the afs filesystem use this by:

(1) Add an update time field to the afs_volsync struct and use a value of
TIME64_MIN in both that and the creation time to indicate that they
are unset.

(2) Add creation and update time fields to the afs_volume struct and use
this to track the two timestamps.

(3) Add a volsync_lock mutex to the afs_volume struct to control
modification access for when we detect a change in these values.

(3) Add a 'pre-op volsync' struct to the afs_operation struct to record
the state of the volume tracking before the op.

(4) Add a new counter, cb_scrub, to the afs_volume struct to count events
that require all data to be scrubbed. A copy is placed in the
afs_vnode struct (inode) and if they no longer match, a scrub takes
place.

(5) When the result of an operation is being parsed, parse the VolSync
data too, if it is provided. Note that the two timestamps are handled
separately, since they don't work in quite the same way.

- If the afs_volume tracking is unset, just set it and do nothing
else.

- If the result timestamps are the same as the ones in afs_volume, do
nothing.

- If the timestamps regress, increment cb_scrub if not already done
so.

- If the creation timestamp on a RW volume changes, increment cb_scrub
if not already done so.

- If the creation timestamp on a RO volume advances, update the server
list and see if the current server has been excluded, if so reissue
the op. Once over half of the replication sites have been updated,
increment cb_ro_snapshot to indicate updates may be required and
switch over to excluding unupdated replication sites.

- If the creation timestamp on a Backup volume advances, just
increment cb_ro_snapshot to trigger updates.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
diff 72904d7b Wed Oct 18 17:55:11 MDT 2023 David Howells <dhowells@redhat.com> rxrpc, afs: Allow afs to pin rxrpc_peer objects

Change rxrpc's API such that:

(1) A new function, rxrpc_kernel_lookup_peer(), is provided to look up an
rxrpc_peer record for a remote address and a corresponding function,
rxrpc_kernel_put_peer(), is provided to dispose of it again.

(2) When setting up a call, the rxrpc_peer object used during a call is
now passed in rather than being set up by rxrpc_connect_call(). For
afs, this meenat passing it to rxrpc_kernel_begin_call() rather than
the full address (the service ID then has to be passed in as a
separate parameter).

(3) A new function, rxrpc_kernel_remote_addr(), is added so that afs can
get a pointer to the transport address for display purposed, and
another, rxrpc_kernel_remote_srx(), to gain a pointer to the full
rxrpc address.

(4) The function to retrieve the RTT from a call, rxrpc_kernel_get_srtt(),
is then altered to take a peer. This now returns the RTT or -1 if
there are insufficient samples.

(5) Rename rxrpc_kernel_get_peer() to rxrpc_kernel_call_get_peer().

(6) Provide a new function, rxrpc_kernel_get_peer(), to get a ref on a
peer the caller already has.

This allows the afs filesystem to pin the rxrpc_peer records that it is
using, allowing faster lookups and pointer comparisons rather than
comparing sockaddr_rxrpc contents. It also makes it easier to get hold of
the RTT. The following changes are made to afs:

(1) The addr_list struct's addrs[] elements now hold a peer struct pointer
and a service ID rather than a sockaddr_rxrpc.

(2) When displaying the transport address, rxrpc_kernel_remote_addr() is
used.

(3) The port arg is removed from afs_alloc_addrlist() since it's always
overridden.

(4) afs_merge_fs_addr4() and afs_merge_fs_addr6() do peer lookup and may
now return an error that must be handled.

(5) afs_find_server() now takes a peer pointer to specify the address.

(6) afs_find_server(), afs_compare_fs_alists() and afs_merge_fs_addr[46]{}
now do peer pointer comparison rather than address comparison.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
diff 72904d7b Wed Oct 18 17:55:11 MDT 2023 David Howells <dhowells@redhat.com> rxrpc, afs: Allow afs to pin rxrpc_peer objects

Change rxrpc's API such that:

(1) A new function, rxrpc_kernel_lookup_peer(), is provided to look up an
rxrpc_peer record for a remote address and a corresponding function,
rxrpc_kernel_put_peer(), is provided to dispose of it again.

(2) When setting up a call, the rxrpc_peer object used during a call is
now passed in rather than being set up by rxrpc_connect_call(). For
afs, this meenat passing it to rxrpc_kernel_begin_call() rather than
the full address (the service ID then has to be passed in as a
separate parameter).

(3) A new function, rxrpc_kernel_remote_addr(), is added so that afs can
get a pointer to the transport address for display purposed, and
another, rxrpc_kernel_remote_srx(), to gain a pointer to the full
rxrpc address.

(4) The function to retrieve the RTT from a call, rxrpc_kernel_get_srtt(),
is then altered to take a peer. This now returns the RTT or -1 if
there are insufficient samples.

(5) Rename rxrpc_kernel_get_peer() to rxrpc_kernel_call_get_peer().

(6) Provide a new function, rxrpc_kernel_get_peer(), to get a ref on a
peer the caller already has.

This allows the afs filesystem to pin the rxrpc_peer records that it is
using, allowing faster lookups and pointer comparisons rather than
comparing sockaddr_rxrpc contents. It also makes it easier to get hold of
the RTT. The following changes are made to afs:

(1) The addr_list struct's addrs[] elements now hold a peer struct pointer
and a service ID rather than a sockaddr_rxrpc.

(2) When displaying the transport address, rxrpc_kernel_remote_addr() is
used.

(3) The port arg is removed from afs_alloc_addrlist() since it's always
overridden.

(4) afs_merge_fs_addr4() and afs_merge_fs_addr6() do peer lookup and may
now return an error that must be handled.

(5) afs_find_server() now takes a peer pointer to specify the address.

(6) afs_find_server(), afs_compare_fs_alists() and afs_merge_fs_addr[46]{}
now do peer pointer comparison rather than address comparison.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
diff f710c2e4 Fri Sep 29 23:00:08 MDT 2023 Wedson Almeida Filho <walmeida@microsoft.com> afs: move afs_xattr_handlers to .rodata

This makes it harder for accidental or malicious changes to
afs_xattr_handlers at runtime.

Cc: David Howells <dhowells@redhat.com>
Cc: Marc Dionne <marc.dionne@auristor.com>
Cc: linux-afs@lists.infradead.org
Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
Link: https://lore.kernel.org/r/20230930050033.41174-5-wedsonaf@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
diff 523d27cd Thu Feb 06 07:22:21 MST 2020 David Howells <dhowells@redhat.com> afs: Convert afs to use the new fscache API

Change the afs filesystem to support the new afs driver.

The following changes have been made:

(1) The fscache_netfs struct is no more, and there's no need to register
the filesystem as a whole. There's also no longer a cell cookie.

(2) The volume cookie is now an fscache_volume cookie, allocated with
fscache_acquire_volume(). This function takes three parameters: a
string representing the "volume" in the index, a string naming the
cache to use (or NULL) and a u64 that conveys coherency metadata for
the volume.

For afs, I've made it render the volume name string as:

"afs,<cell>,<volume_id>"

and the coherency data is currently 0.

(3) The fscache_cookie_def is no more and needed information is passed
directly to fscache_acquire_cookie(). The cache no longer calls back
into the filesystem, but rather metadata changes are indicated at
other times.

fscache_acquire_cookie() is passed the same keying and coherency
information as before, except that these are now stored in big endian
form instead of cpu endian. This makes the cache more copyable.

(4) fscache_use_cookie() and fscache_unuse_cookie() are called when a file
is opened or closed to prevent a cache file from being culled and to
keep resources to hand that are needed to do I/O.

fscache_use_cookie() is given an indication if the cache is likely to
be modified locally (e.g. the file is open for writing).

fscache_unuse_cookie() is given a coherency update if we had the file
open for writing and will update that.

(5) fscache_invalidate() is now given uptodate auxiliary data and a file
size. It can also take a flag to indicate if this was due to a DIO
write. This is wrapped into afs_fscache_invalidate() now for
convenience.

(6) fscache_resize() now gets called from the finalisation of
afs_setattr(), and afs_setattr() does use/unuse of the cookie around
the call to support this.

(7) fscache_note_page_release() is called from afs_release_page().

(8) Use a killable wait in nfs_vm_page_mkwrite() when waiting for
PG_fscache to be cleared.

Render the parts of the cookie key for an afs inode cookie as big endian.

Changes
=======
ver #2:
- Use gfpflags_allow_blocking() rather than using flag directly.
- fscache_acquire_volume() now returns errors.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Tested-by: kafs-testing@auristor.com
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/163819661382.215744.1485608824741611837.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906970002.143852.17678518584089878259.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967174665.1823006.1301789965454084220.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/164021568841.640689.6684240152253400380.stgit@warthog.procyon.org.uk/ # v4
diff 78525c74 Wed Aug 11 02:49:13 MDT 2021 David Howells <dhowells@redhat.com> netfs, 9p, afs, ceph: Use folios

Convert the netfs helper library to use folios throughout, convert the 9p
and afs filesystems to use folios in their file I/O paths and convert the
ceph filesystem to use just enough folios to compile.

With these changes, afs passes -g quick xfstests.

Changes
=======
ver #5:
- Got rid of folio_end{io,_read,_write}() and inlined the stuff it does
instead (Willy decided he didn't want this after all).

ver #4:
- Fixed a bug in afs_redirty_page() whereby it didn't set the next page
index in the loop and returned too early.
- Simplified a check in v9fs_vfs_write_folio_locked()[1].
- Undid a change to afs_symlink_readpage()[1].
- Used offset_in_folio() in afs_write_end()[1].
- Changed from using page_endio() to folio_end{io,_read,_write}()[1].

ver #2:
- Add 9p foliation.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Dominique Martinet <asmadeus@codewreck.org>
Tested-by: kafs-testing@auristor.com
cc: Matthew Wilcox (Oracle) <willy@infradead.org>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Ilya Dryomov <idryomov@gmail.com>
cc: Dominique Martinet <asmadeus@codewreck.org>
cc: v9fs-developer@lists.sourceforge.net
cc: linux-afs@lists.infradead.org
cc: ceph-devel@vger.kernel.org
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/YYKa3bfQZxK5/wDN@casper.infradead.org/ [1]
Link: https://lore.kernel.org/r/2408234.1628687271@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/162877311459.3085614.10601478228012245108.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/162981153551.1901565.3124454657133703341.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/163005745264.2472992.9852048135392188995.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163584187452.4023316.500389675405550116.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/163649328026.309189.1124218109373941936.stgit@warthog.procyon.org.uk/ # v4
Link: https://lore.kernel.org/r/163657852454.834781.9265101983152100556.stgit@warthog.procyon.org.uk/ # v5
diff b537a3c2 Thu Sep 09 17:01:52 MDT 2021 David Howells <dhowells@redhat.com> afs: Fix corruption in reads at fpos 2G-4G from an OpenAFS server

AFS-3 has two data fetch RPC variants, FS.FetchData and FS.FetchData64, and
Linux's afs client switches between them when talking to a non-YFS server
if the read size, the file position or the sum of the two have the upper 32
bits set of the 64-bit value.

This is a problem, however, since the file position and length fields of
FS.FetchData are *signed* 32-bit values.

Fix this by capturing the capability bits obtained from the fileserver when
it's sent an FS.GetCapabilities RPC, rather than just discarding them, and
then picking out the VICED_CAPABILITY_64BITFILES flag. This can then be
used to decide whether to use FS.FetchData or FS.FetchData64 - and also
FS.StoreData or FS.StoreData64 - rather than using upper_32_bits() to
switch on the parameter values.

This capabilities flag could also be used to limit the maximum size of the
file, but all servers must be checked for that.

Note that the issue does not exist with FS.StoreData - that uses *unsigned*
32-bit values. It's also not a problem with Auristor servers as its
YFS.FetchData64 op uses unsigned 64-bit values.

This can be tested by cloning a git repo through an OpenAFS client to an
OpenAFS server and then doing "git status" on it from a Linux afs
client[1]. Provided the clone has a pack file that's in the 2G-4G range,
the git status will show errors like:

error: packfile .git/objects/pack/pack-5e813c51d12b6847bbc0fcd97c2bca66da50079c.pack does not match index
error: packfile .git/objects/pack/pack-5e813c51d12b6847bbc0fcd97c2bca66da50079c.pack does not match index

This can be observed in the server's FileLog with something like the
following appearing:

Sun Aug 29 19:31:39 2021 SRXAFS_FetchData, Fid = 2303380852.491776.3263114, Host 192.168.11.201:7001, Id 1001
Sun Aug 29 19:31:39 2021 CheckRights: len=0, for host=192.168.11.201:7001
Sun Aug 29 19:31:39 2021 FetchData_RXStyle: Pos 18446744071815340032, Len 3154
Sun Aug 29 19:31:39 2021 FetchData_RXStyle: file size 2400758866
...
Sun Aug 29 19:31:40 2021 SRXAFS_FetchData returns 5

Note the file position of 18446744071815340032. This is the requested file
position sign-extended.

Fixes: b9b1f8d5930a ("AFS: write support fixes")
Reported-by: Markus Suvanto <markus.suvanto@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
Tested-by: Markus Suvanto <markus.suvanto@gmail.com>
cc: linux-afs@lists.infradead.org
cc: openafs-devel@openafs.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=214217#c9 [1]
Link: https://lore.kernel.org/r/951332.1631308745@warthog.procyon.org.uk/
diff b537a3c2 Thu Sep 09 17:01:52 MDT 2021 David Howells <dhowells@redhat.com> afs: Fix corruption in reads at fpos 2G-4G from an OpenAFS server

AFS-3 has two data fetch RPC variants, FS.FetchData and FS.FetchData64, and
Linux's afs client switches between them when talking to a non-YFS server
if the read size, the file position or the sum of the two have the upper 32
bits set of the 64-bit value.

This is a problem, however, since the file position and length fields of
FS.FetchData are *signed* 32-bit values.

Fix this by capturing the capability bits obtained from the fileserver when
it's sent an FS.GetCapabilities RPC, rather than just discarding them, and
then picking out the VICED_CAPABILITY_64BITFILES flag. This can then be
used to decide whether to use FS.FetchData or FS.FetchData64 - and also
FS.StoreData or FS.StoreData64 - rather than using upper_32_bits() to
switch on the parameter values.

This capabilities flag could also be used to limit the maximum size of the
file, but all servers must be checked for that.

Note that the issue does not exist with FS.StoreData - that uses *unsigned*
32-bit values. It's also not a problem with Auristor servers as its
YFS.FetchData64 op uses unsigned 64-bit values.

This can be tested by cloning a git repo through an OpenAFS client to an
OpenAFS server and then doing "git status" on it from a Linux afs
client[1]. Provided the clone has a pack file that's in the 2G-4G range,
the git status will show errors like:

error: packfile .git/objects/pack/pack-5e813c51d12b6847bbc0fcd97c2bca66da50079c.pack does not match index
error: packfile .git/objects/pack/pack-5e813c51d12b6847bbc0fcd97c2bca66da50079c.pack does not match index

This can be observed in the server's FileLog with something like the
following appearing:

Sun Aug 29 19:31:39 2021 SRXAFS_FetchData, Fid = 2303380852.491776.3263114, Host 192.168.11.201:7001, Id 1001
Sun Aug 29 19:31:39 2021 CheckRights: len=0, for host=192.168.11.201:7001
Sun Aug 29 19:31:39 2021 FetchData_RXStyle: Pos 18446744071815340032, Len 3154
Sun Aug 29 19:31:39 2021 FetchData_RXStyle: file size 2400758866
...
Sun Aug 29 19:31:40 2021 SRXAFS_FetchData returns 5

Note the file position of 18446744071815340032. This is the requested file
position sign-extended.

Fixes: b9b1f8d5930a ("AFS: write support fixes")
Reported-by: Markus Suvanto <markus.suvanto@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
Tested-by: Markus Suvanto <markus.suvanto@gmail.com>
cc: linux-afs@lists.infradead.org
cc: openafs-devel@openafs.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=214217#c9 [1]
Link: https://lore.kernel.org/r/951332.1631308745@warthog.procyon.org.uk/
diff b537a3c2 Thu Sep 09 17:01:52 MDT 2021 David Howells <dhowells@redhat.com> afs: Fix corruption in reads at fpos 2G-4G from an OpenAFS server

AFS-3 has two data fetch RPC variants, FS.FetchData and FS.FetchData64, and
Linux's afs client switches between them when talking to a non-YFS server
if the read size, the file position or the sum of the two have the upper 32
bits set of the 64-bit value.

This is a problem, however, since the file position and length fields of
FS.FetchData are *signed* 32-bit values.

Fix this by capturing the capability bits obtained from the fileserver when
it's sent an FS.GetCapabilities RPC, rather than just discarding them, and
then picking out the VICED_CAPABILITY_64BITFILES flag. This can then be
used to decide whether to use FS.FetchData or FS.FetchData64 - and also
FS.StoreData or FS.StoreData64 - rather than using upper_32_bits() to
switch on the parameter values.

This capabilities flag could also be used to limit the maximum size of the
file, but all servers must be checked for that.

Note that the issue does not exist with FS.StoreData - that uses *unsigned*
32-bit values. It's also not a problem with Auristor servers as its
YFS.FetchData64 op uses unsigned 64-bit values.

This can be tested by cloning a git repo through an OpenAFS client to an
OpenAFS server and then doing "git status" on it from a Linux afs
client[1]. Provided the clone has a pack file that's in the 2G-4G range,
the git status will show errors like:

error: packfile .git/objects/pack/pack-5e813c51d12b6847bbc0fcd97c2bca66da50079c.pack does not match index
error: packfile .git/objects/pack/pack-5e813c51d12b6847bbc0fcd97c2bca66da50079c.pack does not match index

This can be observed in the server's FileLog with something like the
following appearing:

Sun Aug 29 19:31:39 2021 SRXAFS_FetchData, Fid = 2303380852.491776.3263114, Host 192.168.11.201:7001, Id 1001
Sun Aug 29 19:31:39 2021 CheckRights: len=0, for host=192.168.11.201:7001
Sun Aug 29 19:31:39 2021 FetchData_RXStyle: Pos 18446744071815340032, Len 3154
Sun Aug 29 19:31:39 2021 FetchData_RXStyle: file size 2400758866
...
Sun Aug 29 19:31:40 2021 SRXAFS_FetchData returns 5

Note the file position of 18446744071815340032. This is the requested file
position sign-extended.

Fixes: b9b1f8d5930a ("AFS: write support fixes")
Reported-by: Markus Suvanto <markus.suvanto@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
Tested-by: Markus Suvanto <markus.suvanto@gmail.com>
cc: linux-afs@lists.infradead.org
cc: openafs-devel@openafs.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=214217#c9 [1]
Link: https://lore.kernel.org/r/951332.1631308745@warthog.procyon.org.uk/

Completed in 1709 milliseconds