Cross Reference: /linux-master/fs/nfs/pnfs.c

History log of /linux-master/fs/nfs/pnfs.c
Revision	Date	Author	Comments
# 2fdbc200	27-Feb-2024	NeilBrown <neilb@suse.de>	NFS: avoid infinite loop in pnfs_update_layout. If pnfsd_update_layout() is called on a file for which recovery has failed it will enter a tight infinite loop. NFS_LAYOUT_INVALID_STID will be set, nfs4_select_rw_stateid() will return -EIO, and nfs4_schedule_stateid_recovery() will do nothing, so nfs4_client_recover_expired_lease() will not wait. So the code will loop indefinitely. Break the loop by testing the validity of the open stateid at the top of the loop. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
# 8a6291bf	17-Nov-2023	Trond Myklebust <trond.myklebust@hammerspace.com>	pNFS: Fix the pnfs block driver's calculation of layoutget size Instead of relying on the value of the 'bytes_left' field, we should calculate the layout size based on the offset of the request that is being written out. Reported-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Fixes: 954998b60caa ("NFS: Fix error handling for O_DIRECT write scheduling") Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
# 6e7434ab	09-Sep-2023	Trond Myklebust <trond.myklebust@hammerspace.com>	NFSv4/pnfs: Allow layoutget to return EAGAIN for softerr mounts If we're using the 'softerr' mount option, we may want to allow layoutget to return EAGAIN to allow knfsd server threads to return a JUKEBOX/DELAY error to the client instead of busy waiting. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
# f6395572	08-Oct-2023	Trond Myklebust <trond.myklebust@hammerspace.com>	pNFS: Fix a hang in nfs4_evict_inode() We are not allowed to call pnfs_mark_matching_lsegs_return() without also holding a reference to the layout header, since doing so could lead to the reference count going to zero when we call pnfs_layout_remove_lseg(). This again can lead to a hang when we get to nfs4_evict_inode() and are unable to clear the layout pointer. pnfs_layout_return_unused_byserver() is guilty of this behaviour, and has been seen to trigger the refcount warning prior to a hang. Fixes: b6d49ecd1081 ("NFSv4: Fix a pNFS layout related use-after-free race when freeing the inode") Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
# 28d4411f	20-Jan-2023	Olga Kornievskaia <olga.kornievskaia@gmail.com>	pNFS/filelayout: treat GETDEVICEINFO errors as layout failure When GETDEVICEINFO call fails, return the layout and fall back to MDS. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
# b739a5bd	05-Oct-2022	Trond Myklebust <trond.myklebust@hammerspace.com>	NFSv4/flexfiles: Cancel I/O if the layout is recalled or revoked If the layout is recalled or revoked, we want to cancel I/O as quickly as possible so that we can return the layout. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
# f5d39b02	22-Aug-2022	Peter Zijlstra <peterz@infradead.org>	freezer,sched: Rewrite core freezer logic Rewrite the core freezer to behave better wrt thawing and be simpler in general. By replacing PF_FROZEN with TASK_FROZEN, a special block state, it is ensured frozen tasks stay frozen until thawed and don't randomly wake up early, as is currently possible. As such, it does away with PF_FROZEN and PF_FREEZER_SKIP, freeing up two PF_flags (yay!). Specifically; the current scheme works a little like: freezer_do_not_count(); schedule(); freezer_count(); And either the task is blocked, or it lands in try_to_freezer() through freezer_count(). Now, when it is blocked, the freezer considers it frozen and continues. However, on thawing, once pm_freezing is cleared, freezer_count() stops working, and any random/spurious wakeup will let a task run before its time. That is, thawing tries to thaw things in explicit order; kernel threads and workqueues before doing bringing SMP back before userspace etc.. However due to the above mentioned races it is entirely possible for userspace tasks to thaw (by accident) before SMP is back. This can be a fatal problem in asymmetric ISA architectures (eg ARMv9) where the userspace task requires a special CPU to run. As said; replace this with a special task state TASK_FROZEN and add the following state transitions: TASK_FREEZABLE -> TASK_FROZEN __TASK_STOPPED -> TASK_FROZEN __TASK_TRACED -> TASK_FROZEN The new TASK_FREEZABLE can be set on any state part of TASK_NORMAL (IOW. TASK_INTERRUPTIBLE and TASK_UNINTERRUPTIBLE) -- any such state is already required to deal with spurious wakeups and the freezer causes one such when thawing the task (since the original state is lost). The special __TASK_{STOPPED,TRACED} states can be restored since their canonical state is in ->jobctl. With this, frozen tasks need an explicit TASK_FROZEN wakeup and are free of undue (early / spurious) wakeups. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/r/20220822114649.055452969@infradead.org
# edf79efc	13-Aug-2022	Trond Myklebust <trond.myklebust@hammerspace.com>	NFS: Remove a bogus flag setting in pnfs_write_done_resend_to_mds Since pnfs_write_done_resend_to_mds() does not actually call end_page_writeback() on the pages that are being redirected to the metadata server, callers of fsync() do not see the I/O as complete until the writeback to the MDS finishes. We therefore do not need to set NFS_CONTEXT_RESEND_WRITES, since there is nothing to redrive. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
# 880265c7	31-May-2022	Trond Myklebust <trond.myklebust@hammerspace.com>	pNFS: Avoid a live lock condition in pnfs_update_layout() If we're about to send the first layoutget for an empty layout, we want to make sure that we drain out the existing pending layoutget calls first. The reason is that these layouts may have been already implicitly returned to the server by a recall to which the client gave a NFS4ERR_NOMATCHING_LAYOUT response. The problem is that wait_var_event_killable() could in principle see the plh_outstanding count go back to '1' when the first process to wake up starts sending a new layoutget. If it fails to get a layout, then this loop can continue ad infinitum... Fixes: 0b77f97a7e42 ("NFSv4/pnfs: Fix layoutget behaviour after invalidation") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
# fe44fb23	31-May-2022	Trond Myklebust <trond.myklebust@hammerspace.com>	pNFS: Don't keep retrying if the server replied NFS4ERR_LAYOUTUNAVAILABLE If the server tells us that a pNFS layout is not available for a specific file, then we should not keep pounding it with further layoutget requests. Fixes: 183d9e7b112a ("pnfs: rework LAYOUTGET retry handling") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
# 3764a17e	14-May-2022	Trond Myklebust <trond.myklebust@hammerspace.com>	NFSv4/pNFS: Do not fail I/O when we fail to allocate the pNFS layout Commit 587f03deb69b caused pnfs_update_layout() to stop returning ENOMEM when the memory allocation fails, and hence causes it to fall back to trying to do I/O through the MDS. There is no guarantee that this will fare any better. If we're failing the pNFS layout allocation, then we should just redirty the page and retry later. Reported-by: Olga Kornievskaia <aglo@umich.edu> Fixes: 587f03deb69b ("pnfs: refactor send_layoutget") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
# 7c9d845f	28-Mar-2022	Trond Myklebust <trond.myklebust@hammerspace.com>	NFSv4/pNFS: Fix another issue with a list iterator pointing to the head In nfs4_callback_devicenotify(), if we don't find a matching entry for the deviceid, we're left with a pointer to 'struct nfs_server' that actually points to the list of super blocks associated with our struct nfs_client. Furthermore, even if we have a valid pointer, nothing pins the super block, and so the struct nfs_server could end up getting freed while we're using it. Since all we want is a pointer to the struct pnfs_layoutdriver_type, let's skip all the iteration over super blocks, and just use APIs to find the layout driver directly. Reported-by: Xiaomeng Tong <xiam0nd.tong@gmail.com> Fixes: 1be5683b03a7 ("pnfs: CB_NOTIFY_DEVICEID") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
# 63d8a41b	21-Mar-2022	Trond Myklebust <trond.myklebust@hammerspace.com>	NFSv4/pnfs: Ensure pNFS allocation modes are consistent with nfsiod Ensure that pNFS allocations that can be called from rpciod/nfsiod callback can fail in low memory mode, so that the threads don't block and loop forever. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
# da48f267	29-Jan-2022	Trond Myklebust <trond.myklebust@hammerspace.com>	NFS: Convert GFP_NOFS to GFP_KERNEL Assume that sections that should not re-enter the filesystem are already protected with memalloc_nofs_save/restore call, so relax those GFP_NOFS instances which might be used by other contexts. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
# d6236a98	23-Jul-2021	Trond Myklebust <trond.myklebust@hammerspace.com>	NFSv4/pnfs: The layout barrier indicate a minimal value for the seqid The intention of the layout barrier is to ensure that we do not update the layout to match an older value than the current expectation. Fix the test in pnfs_layout_stateid_blocked() to reflect that it is legal for the seqid of the stateid to match that of the barrier. Fixes: aa95edf309ef ("NFSv4/pnfs: Fix the layout barrier update") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
# 45baadaa	23-Jul-2021	Trond Myklebust <trond.myklebust@hammerspace.com>	NFSv4/pNFS: Always allow update of a zero valued layout barrier A zero value for the layout barrier indicates that it has been cleared (since seqid '0' is an illegal value), so we should always allow it to be updated. Fixes: d29b468da4f9 ("pNFS/NFSv4: Improve rejection of out-of-order layouts") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
# 7c0bbf2d	26-Jul-2021	Trond Myklebust <trond.myklebust@hammerspace.com>	NFSv4/pNFS: Remove dead code Since commit 2b28a7bee453 ("fs, nfs: convert pnfs_layout_hdr.plh_refcount from atomic_t to refcount_t") it has not been legal to bump a zero refcount, so the code that tries to allow it if the NFS_LSEG_VALID flag is still set would cause trouble. Luckily, NFS_LSEG_VALID has its own refcount so we can never hit this bad code snippet in practice. Remove it to avoid confusion. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
# e20772cb	26-Jul-2021	Trond Myklebust <trond.myklebust@hammerspace.com>	NFSv4/pNFS: Fix a layoutget livelock loop If NFS_LAYOUT_RETURN_REQUESTED is set, but there is no value set for the layout plh_return_seq, we can end up in a livelock loop in which every layout segment retrieved by a new call to layoutget is immediately invalidated by pnfs_layout_need_return(). To get around this, we should just set plh_return_seq to the current value of the layout stateid's seqid. Fixes: d474f96104bd ("NFS: Don't return layout segments that are in use") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
# b4e89bcb	02-Jul-2021	Trond Myklebust <trond.myklebust@hammerspace.com>	NFSv4/pnfs: Clean up layout get on open Cache the layout in the arguments so we don't have to keep looking it up from the inode. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
# 0b77f97a	02-Jul-2021	Trond Myklebust <trond.myklebust@hammerspace.com>	NFSv4/pnfs: Fix layoutget behaviour after invalidation If the layout gets invalidated, we should wait for any outstanding layoutget requests for that layout to complete, and we should resend them only after re-establishing the layout stateid. Fixes: d29b468da4f9 ("pNFS/NFSv4: Improve rejection of out-of-order layouts") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
# aa95edf3	02-Jul-2021	Trond Myklebust <trond.myklebust@hammerspace.com>	NFSv4/pnfs: Fix the layout barrier update If we have multiple outstanding layoutget requests, the current code to update the layout barrier assumes that the outstanding layout stateids are updated in order. That's not necessarily the case. Instead of using the value of lo->plh_outstanding as a guesstimate for the window of values we need to accept, just wait to update the window until we're processing the last one. The intention here is just to ensure that we don't process 2^31 seqid updates without also updating the barrier. Fixes: 1bcf34fdac5f ("pNFS/NFSv4: Update the layout barrier when we schedule a layoutreturn") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
# a421d218	18-May-2021	Anna Schumaker <Anna.Schumaker@Netapp.com>	NFSv4: Fix a NULL pointer dereference in pnfs_mark_matching_lsegs_return() Commit de144ff4234f changes _pnfs_return_layout() to call pnfs_mark_matching_lsegs_return() passing NULL as the struct pnfs_layout_range argument. Unfortunately, pnfs_mark_matching_lsegs_return() doesn't check if we have a value here before dereferencing it, causing an oops. I'm able to hit this crash consistently when running connectathon basic tests on NFS v4.1/v4.2 against Ontap. Fixes: de144ff4234f ("NFSv4: Don't discard segments marked for return in _pnfs_return_layout()") Cc: stable@vger.kernel.org Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
# d1d97395	12-May-2021	Yang Li <yang.lee@linux.alibaba.com>	pNFS/NFSv4: Remove redundant initialization of 'rd_size' Variable 'rd_size' is being initialized however this value is never read as 'rd_size' is assigned a new value in for statement. Remove the redundant assignment. Clean up clang warning: fs/nfs/pnfs.c:2681:6: warning: Value stored to 'rd_size' during its initialization is never read [clang-analyzer-deadcode.DeadStores] Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
# fb700ef0	15-Apr-2021	Trond Myklebust <trond.myklebust@hammerspace.com>	NFSv4.1: Simplify layout return in pnfs_layout_process() If the server hands us a layout that does not match the one we currently hold, then have pnfs_mark_matching_lsegs_return() just ditch the old layout if NFS_LSEG_LAYOUTRETURN is not set. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
# de144ff4	18-Apr-2021	Trond Myklebust <trond.myklebust@hammerspace.com>	NFSv4: Don't discard segments marked for return in _pnfs_return_layout() If the pNFS layout segment is marked with the NFS_LSEG_LAYOUTRETURN flag, then the assumption is that it has some reporting requirement to perform through a layoutreturn (e.g. flexfiles layout stats or error information). Fixes: 6d597e175012 ("pnfs: only tear down lsegs that precede seqid in LAYOUTRETURN args") Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
# 39fd0186	15-Apr-2021	Trond Myklebust <trond.myklebust@hammerspace.com>	NFS: Don't discard pNFS layout segments that are marked for return If the pNFS layout segment is marked with the NFS_LSEG_LAYOUTRETURN flag, then the assumption is that it has some reporting requirement to perform through a layoutreturn (e.g. flexfiles layout stats or error information). Fixes: e0b7d420f72a ("pNFS: Don't discard layout segments that are marked for return") Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
# ffb81717	19-Nov-2020	Gustavo A. R. Silva <gustavoars@kernel.org>	nfs: Fix fall-through warnings for Clang In preparation to enable -Wimplicit-fallthrough for Clang, fix multiple warnings by explicitly add multiple break/goto/return/fallthrough statements instead of just letting the code fall through to the next case. Link: https://github.com/KSPP/linux/issues/115 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
# d29b468d	22-Jan-2021	Trond Myklebust <trond.myklebust@hammerspace.com>	pNFS/NFSv4: Improve rejection of out-of-order layouts If a layoutget ends up being reordered w.r.t. a layoutreturn, e.g. due to a layoutget-on-open not knowing a priori which file to lock, then we must assume the layout is no longer being considered valid state by the server. Incrementally improve our ability to reject such states by using the cached old stateid in conjunction with the plh_barrier to try to identify them. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
# 1bcf34fd	21-Jan-2021	Trond Myklebust <trond.myklebust@hammerspace.com>	pNFS/NFSv4: Update the layout barrier when we schedule a layoutreturn When we're scheduling a layoutreturn, we need to ignore any further incoming layouts with sequence ids that are going to be affected by the layout return. Fixes: 44ea8dfce021 ("NFS/pnfs: Reference the layout cred in pnfs_prepare_layoutreturn()") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
# 08bd8dbe	21-Jan-2021	Trond Myklebust <trond.myklebust@hammerspace.com>	pNFS/NFSv4: Try to return invalid layout in pnfs_layout_process() If the server returns a new stateid that does not match the one in our cache, then try to return the one we hold instead of just invalidating it on the client side. This ensures that both client and server will agree that the stateid is invalid. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
# 814b8497	21-Jan-2021	Trond Myklebust <trond.myklebust@hammerspace.com>	pNFS/NFSv4: Fix a layout segment leak in pnfs_layout_process() If the server returns a new stateid that does not match the one in our cache, then pnfs_layout_process() will leak the layout segments returned by pnfs_mark_layout_stateid_invalid(). Fixes: 9888d837f3cf ("pNFS: Force a retry of LAYOUTGET if the stateid doesn't match our cache") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
# cb2856c5	06-Jan-2021	Trond Myklebust <trond.myklebust@hammerspace.com>	NFS/pNFS: Fix a leak of the layout 'plh_outstanding' counter If we exit _lgopen_prepare_attached() without setting a layout, we will currently leak the plh_outstanding counter. Fixes: 411ae722d10a ("pNFS: Wait for stale layoutget calls to complete in pnfs_update_layout()") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
# 2c8d5fc3	05-Jan-2021	Trond Myklebust <trond.myklebust@hammerspace.com>	pNFS: Stricter ordering of layoutget and layoutreturn If a layout return is in progress, we should wait for it to complete, in case the layout segment we are picking up gets returned too. Fixes: 30cb3ee299cb ("pNFS: Handle NFS4ERR_OLD_STATEID on layoutreturn by bumping the state seqid") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
# c18d1e17	04-Jan-2021	Trond Myklebust <trond.myklebust@hammerspace.com>	pNFS: Clean up pnfs_layoutreturn_free_lsegs() Remove the check for whether or not the stateid is NULL, and fix up the callers. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
# 078000d0	04-Jan-2021	Trond Myklebust <trond.myklebust@hammerspace.com>	pNFS: We want return-on-close to complete when evicting the inode If the inode is being evicted, it should be safe to run return-on-close, so we should do it to ensure we don't inadvertently leak layout segments. Fixes: 1c5bd76d17cc ("pNFS: Enable layoutreturn operation for return-on-close") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
# 67bbceed	04-Jan-2021	Trond Myklebust <trond.myklebust@hammerspace.com>	pNFS: Mark layout for return if return-on-close was not sent If the layout return-on-close failed because the layoutreturn was never sent, then we should mark the layout for return again. Fixes: 9c47b18cf722 ("pNFS: Ensure we do clear the return-on-close layout stateid on fatal errors") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
# b6d49ecd	24-Nov-2020	Trond Myklebust <trond.myklebust@hammerspace.com>	NFSv4: Fix a pNFS layout related use-after-free race when freeing the inode When returning the layout in nfs4_evict_inode(), we need to ensure that the layout is actually done being freed before we can proceed to free the inode itself. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
# 9f266451	16-Sep-2020	Wang Qing <wangqing@vivo.com>	nfs: fix spellint typo in pnfs.c Change the comment typo: "manger" -> "manager". Signed-off-by: Wang Qing <wangqing@vivo.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
# df561f66	23-Aug-2020	Gustavo A. R. Silva <gustavoars@kernel.org>	treewide: Use fallthrough pseudo-keyword Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
# 563c53e7	11-Aug-2020	Trond Myklebust <trond.myklebust@hammerspace.com>	NFS: Fix flexfiles read failover The current mirrored read failover code is correctly resetting the mirror index between failed reads, however it is not able to actually flip the RPC call over to the next RPC client. The end result is that we keep resending the RPC call to the same client over and over. The fix is to use the pnfs_read_resend_pnfs() mechanism to schedule a new RPC call, but we need to add the ability to pass in a mirror index so that we always retry the next mirror in the list. Fixes: 166bd5b889ac ("pNFS/flexfiles: Fix layoutstats handling during read failovers") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>