#
2f74991a |
|
30-Jan-2024 |
Kunwu Chan <chentao@kylinos.cn> |
nfsd: Simplify the allocation of slab caches in nfsd_file_cache_init Use the new KMEM_CACHE() macro instead of direct kmem_cache_create to simplify the creation of SLAB caches. Signed-off-by: Kunwu Chan <chentao@kylinos.cn> Acked-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
5ff318f6 |
|
14-Dec-2023 |
NeilBrown <neilb@suse.de> |
nfsd: use __fput_sync() to avoid delayed closing of files. Calling fput() directly or though filp_close() from a kernel thread like nfsd causes the final __fput() (if necessary) to be called from a workqueue. This means that nfsd is not forced to wait for any work to complete. If the ->release or ->destroy_inode function is slow for any reason, this can result in nfsd closing files more quickly than the workqueue can complete the close and the queue of pending closes can grow without bounces (30 million has been seen at one customer site, though this was in part due to a slowness in xfs which has since been fixed). nfsd does not need this. It is quite appropriate and safe for nfsd to do its own close work. There is no reason that close should ever wait for nfsd, so no deadlock can occur. It should be safe and sensible to change all fput() calls to __fput_sync(). However in the interests of caution this patch only changes two - the two that can be most directly affected by client behaviour and could occur at high frequency. - the fput() implicitly in flip_close() is changed to __fput_sync() by calling get_file() first to ensure filp_close() doesn't do the final fput() itself. If is where files opened for IO are closed. - the fput() in nfsd_read() is also changed. This is where directories opened for readdir are closed. This ensure that minimal fput work is queued to the workqueue. This removes the need for the flush_delayed_fput() call in nfsd_file_close_inode_sync() Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
ffb40259 |
|
14-Dec-2023 |
NeilBrown <neilb@suse.de> |
nfsd: Don't leave work of closing files to a work queue The work of closing a file can have non-trivial cost. Doing it in a separate work queue thread means that cost isn't imposed on the nfsd threads and an imbalance can be created. This can result in files being queued for the work queue more quickly that the work queue can process them, resulting in unbounded growth of the queue and memory exhaustion. To avoid this work imbalance that exhausts memory, this patch moves all closing of files into the nfsd threads. This means that when the work imposes a cost, that cost appears where it would be expected - in the work of the nfsd thread. A subsequent patch will ensure the final __fput() is called in the same (nfsd) thread which calls filp_close(). Files opened for NFSv3 are never explicitly closed by the client and are kept open by the server in the "filecache", which responds to memory pressure, is garbage collected even when there is no pressure, and sometimes closes files when there is particular need such as for rename. These files currently have filp_close() called in a dedicated work queue, so their __fput() can have no effect on nfsd threads. This patch discards the work queue and instead has each nfsd thread call flip_close() on as many as 8 files from the filecache each time it acts on a client request (or finds there are no pending client requests). If there are more to be closed, more threads are woken. This spreads the work of __fput() over multiple threads and imposes any cost on those threads. The number 8 is somewhat arbitrary. It needs to be greater than 1 to ensure that files are closed more quickly than they can be added to the cache. It needs to be small enough to limit the per-request delays that will be imposed on clients when all threads are busy closing files. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
05580bbf |
|
31-Jan-2024 |
Jeff Layton <jlayton@kernel.org> |
nfsd: adapt to breakup of struct file_lock Most of the existing APIs have remained the same, but subsystems that access file_lock fields directly need to reach into struct file_lock_core now. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20240131-flsplit-v3-42-c6129007ee8d@kernel.org Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
|
#
ce7df055 |
|
22-Oct-2023 |
Chuck Lever <chuck.lever@oracle.com> |
NFSD: Make the file_delayed_close workqueue UNBOUND workqueue: nfsd_file_delayed_close [nfsd] hogged CPU for >13333us 8 times, consider switching to WQ_UNBOUND There's no harm in closing a cached file descriptor on another core. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
0a97c01c |
|
30-Nov-2023 |
Nhat Pham <nphamcs@gmail.com> |
list_lru: allow explicit memcg and NUMA node selection Patch series "workload-specific and memory pressure-driven zswap writeback", v8. There are currently several issues with zswap writeback: 1. There is only a single global LRU for zswap, making it impossible to perform worload-specific shrinking - an memcg under memory pressure cannot determine which pages in the pool it owns, and often ends up writing pages from other memcgs. This issue has been previously observed in practice and mitigated by simply disabling memcg-initiated shrinking: https://lore.kernel.org/all/20230530232435.3097106-1-nphamcs@gmail.com/T/#u But this solution leaves a lot to be desired, as we still do not have an avenue for an memcg to free up its own memory locked up in the zswap pool. 2. We only shrink the zswap pool when the user-defined limit is hit. This means that if we set the limit too high, cold data that are unlikely to be used again will reside in the pool, wasting precious memory. It is hard to predict how much zswap space will be needed ahead of time, as this depends on the workload (specifically, on factors such as memory access patterns and compressibility of the memory pages). This patch series solves these issues by separating the global zswap LRU into per-memcg and per-NUMA LRUs, and performs workload-specific (i.e memcg- and NUMA-aware) zswap writeback under memory pressure. The new shrinker does not have any parameter that must be tuned by the user, and can be opted in or out on a per-memcg basis. As a proof of concept, we ran the following synthetic benchmark: build the linux kernel in a memory-limited cgroup, and allocate some cold data in tmpfs to see if the shrinker could write them out and improved the overall performance. Depending on the amount of cold data generated, we observe from 14% to 35% reduction in kernel CPU time used in the kernel builds. This patch (of 6): The interface of list_lru is based on the assumption that the list node and the data it represents belong to the same allocated on the correct node/memcg. While this assumption is valid for existing slab objects LRU such as dentries and inodes, it is undocumented, and rather inflexible for certain potential list_lru users (such as the upcoming zswap shrinker and the THP shrinker). It has caused us a lot of issues during our development. This patch changes list_lru interface so that the caller must explicitly specify numa node and memcg when adding and removing objects. The old list_lru_add() and list_lru_del() are renamed to list_lru_add_obj() and list_lru_del_obj(), respectively. It also extends the list_lru API with a new function, list_lru_putback, which undoes a previous list_lru_isolate call. Unlike list_lru_add, it does not increment the LRU node count (as list_lru_isolate does not decrement the node count). list_lru_putback also allows for explicit memcg and NUMA node selection. Link: https://lkml.kernel.org/r/20231130194023.4102148-1-nphamcs@gmail.com Link: https://lkml.kernel.org/r/20231130194023.4102148-2-nphamcs@gmail.com Signed-off-by: Nhat Pham <nphamcs@gmail.com> Suggested-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Tested-by: Bagas Sanjaya <bagasdotme@gmail.com> Cc: Chris Li <chrisl@kernel.org> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Seth Jennings <sjenning@redhat.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
856e5949 |
|
11-Sep-2023 |
Qi Zheng <zhengqi.arch@bytedance.com> |
nfsd: dynamically allocate the nfsd-filecache shrinker Use new APIs to dynamically allocate the nfsd-filecache shrinker. Link: https://lkml.kernel.org/r/20230911094444.68966-13-zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Cc: Jeff Layton <jlayton@kernel.org> Cc: Neil Brown <neilb@suse.de> Cc: Olga Kornievskaia <kolga@netapp.com> Cc: Dai Ngo <Dai.Ngo@oracle.com> Cc: Tom Talpey <tom@talpey.com> Cc: Abhinav Kumar <quic_abhinavk@quicinc.com> Cc: Alasdair Kergon <agk@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Anna Schumaker <anna@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Bob Peterson <rpeterso@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Carlos Llamas <cmllamas@google.com> Cc: Chandan Babu R <chandan.babu@oracle.com> Cc: Chao Yu <chao@kernel.org> Cc: Chris Mason <clm@fb.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Christian Koenig <christian.koenig@amd.com> Cc: Chuck Lever <cel@kernel.org> Cc: Coly Li <colyli@suse.de> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: "Darrick J. Wong" <djwong@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Airlie <airlied@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Sterba <dsterba@suse.com> Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Cc: Gao Xiang <hsiangkao@linux.alibaba.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Huang Rui <ray.huang@amd.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Jason Wang <jasowang@redhat.com> Cc: Jeffle Xu <jefflexu@linux.alibaba.com> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: Kirill Tkhai <tkhai@ya.ru> Cc: Marijn Suijten <marijn.suijten@somainline.org> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Mike Snitzer <snitzer@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nadav Amit <namit@vmware.com> Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Richard Weinberger <richard@nod.at> Cc: Rob Clark <robdclark@gmail.com> Cc: Rob Herring <robh@kernel.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Sean Paul <sean@poorly.run> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Song Liu <song@kernel.org> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Steven Price <steven.price@arm.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Yue Hu <huyue2@coolpad.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
d59b3515 |
|
11-Sep-2023 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
nfsd: Handle EOPENSTALE correctly in the filecache The nfsd_open code handles EOPENSTALE correctly, by retrying the call to fh_verify() and __nfsd_open(). However the filecache just drops the error on the floor, and immediately returns nfserr_stale to the caller. This patch ensures that we propagate the EOPENSTALE code back to nfsd_file_do_acquire, and that we handle it correctly. Fixes: 65294c1f2c5e ("nfsd: add a new struct file caching facility to nfsd") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Message-Id: <20230911183027.11372-1-trond.myklebust@hammerspace.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
147abcac |
|
19-Apr-2023 |
Dai Ngo <dai.ngo@oracle.com> |
NFSD: Fix problem of COMMIT and NFS4ERR_DELAY in infinite loop The following request sequence to the same file causes the NFS client and server getting into an infinite loop with COMMIT and NFS4ERR_DELAY: OPEN REMOVE WRITE COMMIT Problem reported by recall11, recall12, recall14, recall20, recall22, recall40, recall42, recall48, recall50 of nfstest suite. This patch restores the handling of race condition in nfsd_file_do_acquire with unlink to that prior of the regression. Fixes: ac3a2585f018 ("nfsd: rework refcounting in filecache") Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
92e4a673 |
|
14-Apr-2023 |
Jeff Layton <jlayton@kernel.org> |
nfsd: simplify the delayed disposal list code When queueing a dispose list to the appropriate "freeme" lists, it pointlessly queues the objects one at a time to an intermediate list. Remove a few helpers and just open code a list_move to make it more clear and efficient. Better document the resulting functions with kerneldoc comments. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
c4c649ab |
|
24-Nov-2022 |
Chuck Lever <chuck.lever@oracle.com> |
NFSD: Convert filecache to rhltable While we were converting the nfs4_file hashtable to use the kernel's resizable hashtable data structure, Neil Brown observed that the list variant (rhltable) would be better for managing nfsd_file items as well. The nfsd_file hash table will contain multiple entries for the same inode -- these should be kept together on a list. And, it could be possible for exotic or malicious client behavior to cause the hash table to resize itself on every insertion. A nice simplification is that rhltable_lookup() can return a list that contains only nfsd_file items that match a given inode, which enables us to eliminate specialized hash table helper functions and use the default functions provided by the rhashtable implementation). Since we are now storing nfsd_file items for the same inode on a single list, that effectively reduces the number of hash entries that have to be tracked in the hash table. The mininum bucket count is therefore lowered. Light testing with fstests generic/531 show no regressions. Suggested-by: Neil Brown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
dcb779fc |
|
15-Feb-2023 |
Jeff Layton <jlayton@kernel.org> |
nfsd: allow reaping files still under writeback On most filesystems, there is no reason to delay reaping an nfsd_file just because its underlying inode is still under writeback. nfsd just relies on client activity or the local flusher threads to do writeback. The main exception is NFS, which flushes all of its dirty data on last close. Add a new EXPORT_OP_FLUSH_ON_CLOSE flag to allow filesystems to signal that they do this, and only skip closing files under writeback on such filesystems. Also, remove a redundant NULL file pointer check in nfsd_file_check_writeback, and clean up nfs's export op flag definitions. Signed-off-by: Jeff Layton <jlayton@kernel.org> Acked-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
972cc0e0 |
|
25-Jan-2023 |
Jeff Layton <jlayton@kernel.org> |
nfsd: update comment over __nfsd_file_cache_purge Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
b2ff1bd7 |
|
17-Jan-2023 |
Jeff Layton <jlayton@kernel.org> |
nfsd: don't take/put an extra reference when putting a file The last thing that filp_close does is an fput, so don't bother taking and putting the extra reference. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
b680cb9b |
|
05-Jan-2023 |
Jeff Layton <jlayton@kernel.org> |
nfsd: add some comments to nfsd_file_do_acquire David Howells mentioned that he found this bit of code confusing, so sprinkle in some comments to clarify. Reported-by: David Howells <dhowells@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
c6593366 |
|
05-Jan-2023 |
Jeff Layton <jlayton@kernel.org> |
nfsd: don't kill nfsd_files because of lease break error An error from break_lease is non-fatal, so we needn't destroy the nfsd_file in that case. Just put the reference like we normally would and return the error. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
d69b8dbf |
|
06-Jan-2023 |
Jeff Layton <jlayton@kernel.org> |
nfsd: simplify test_bit return in NFSD_FILE_KEY_FULL comparator test_bit returns bool, so we can just compare the result of that to the key->gc value without the "!!". Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
6c31e4c9 |
|
06-Jan-2023 |
Jeff Layton <jlayton@kernel.org> |
nfsd: NFSD_FILE_KEY_INODE only needs to find GC'ed entries Since v4 files are expected to be long-lived, there's little value in closing them out of the cache when there is conflicting access. Change the comparator to also match the gc value in the key. Change both of the current users of that key to set the gc value in the key to "true". Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
b8bea9f6 |
|
05-Jan-2023 |
Jeff Layton <jlayton@kernel.org> |
nfsd: don't open-code clear_and_wake_up_bit Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
4c475eee |
|
06-Feb-2023 |
Jeff Layton <jlayton@kernel.org> |
nfsd: don't fsync nfsd_files on last close Most of the time, NFSv4 clients issue a COMMIT before the final CLOSE of an open stateid, so with NFSv4, the fsync in the nfsd_file_free path is usually a no-op and doesn't block. We have a customer running knfsd over very slow storage (XFS over Ceph RBD). They were using the "async" export option because performance was more important than data integrity for this application. That export option turns NFSv4 COMMIT calls into no-ops. Due to the fsync in this codepath however, their final CLOSE calls would still stall (since a CLOSE effectively became a COMMIT). I think this fsync is not strictly necessary. We only use that result to reset the write verifier. Instead of fsync'ing all of the data when we free an nfsd_file, we can just check for writeback errors when one is acquired and when it is freed. If the client never comes back, then it'll never see the error anyway and there is no point in resetting it. If an error occurs after the nfsd_file is removed from the cache but before the inode is evicted, then it will reset the write verifier on the next nfsd_file_acquire, (since there will be an unseen error). The only exception here is if something else opens and fsyncs the file during that window. Given that local applications work with this limitation today, I don't see that as an issue. Link: https://bugzilla.redhat.com/show_bug.cgi?id=2166658 Fixes: ac3a2585f018 ("nfsd: rework refcounting in filecache") Reported-and-tested-by: Pierguido Lambri <plambri@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
70f62231 |
|
06-Jan-2023 |
Jeff Layton <jlayton@kernel.org> |
nfsd: allow nfsd_file_get to sanely handle a NULL pointer ...and remove some now-useless NULL pointer checks in its callers. Suggested-by: NeilBrown <neilb@suse.de> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
4bdbba54 |
|
20-Jan-2023 |
Jeff Layton <jlayton@kernel.org> |
nfsd: don't free files unconditionally in __nfsd_file_cache_purge nfsd_file_cache_purge is called when the server is shutting down, in which case, tearing things down is generally fine, but it also gets called when the exports cache is flushed. Instead of walking the cache and freeing everything unconditionally, handle it the same as when we have a notification of conflicting access. Fixes: ac3a2585f018 ("nfsd: rework refcounting in filecache") Reported-by: Ruben Vestergaard <rubenv@drcmr.dk> Reported-by: Torkil Svensgaard <torkil@drcmr.dk> Reported-by: Shachar Kagan <skagan@nvidia.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Tested-by: Shachar Kagan <skagan@nvidia.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
0b3a551f |
|
05-Jan-2023 |
Jeff Layton <jlayton@kernel.org> |
nfsd: fix handling of cached open files in nfsd4_open codepath Commit fb70bf124b05 ("NFSD: Instantiate a struct file when creating a regular NFSv4 file") added the ability to cache an open fd over a compound. There are a couple of problems with the way this currently works: It's racy, as a newly-created nfsd_file can end up with its PENDING bit cleared while the nf is hashed, and the nf_file pointer is still zeroed out. Other tasks can find it in this state and they expect to see a valid nf_file, and can oops if nf_file is NULL. Also, there is no guarantee that we'll end up creating a new nfsd_file if one is already in the hash. If an extant entry is in the hash with a valid nf_file, nfs4_get_vfs_file will clobber its nf_file pointer with the value of op_file and the old nf_file will leak. Fix both issues by making a new nfsd_file_acquirei_opened variant that takes an optional file pointer. If one is present when this is called, we'll take a new reference to it instead of trying to open the file. If the nfsd_file already has a valid nf_file, we'll just ignore the optional file and pass the nfsd_file back as-is. Also rework the tracepoints a bit to allow for an "opened" variant and don't try to avoid counting acquisitions in the case where we already have a cached open file. Fixes: fb70bf124b05 ("NFSD: Instantiate a struct file when creating a regular NFSv4 file") Cc: Trond Myklebust <trondmy@hammerspace.com> Reported-by: Stanislav Saner <ssaner@redhat.com> Reported-and-Tested-by: Ruben Vestergaard <rubenv@drcmr.dk> Reported-and-Tested-by: Torkil Svensgaard <torkil@drcmr.dk> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
ac3a2585 |
|
11-Dec-2022 |
Jeff Layton <jlayton@kernel.org> |
nfsd: rework refcounting in filecache The filecache refcounting is a bit non-standard for something searchable by RCU, in that we maintain a sentinel reference while it's hashed. This in turn requires that we have to do things differently in the "put" depending on whether its hashed, which we believe to have led to races. There are other problems in here too. nfsd_file_close_inode_sync can end up freeing an nfsd_file while there are still outstanding references to it, and there are a number of subtle ToC/ToU races. Rework the code so that the refcount is what drives the lifecycle. When the refcount goes to zero, then unhash and rcu free the object. A task searching for a nfsd_file is allowed to bump its refcount, but only if it's not already 0. Ensure that we don't make any other changes to it until a reference is held. With this change, the LRU carries a reference. Take special care to deal with it when removing an entry from the list, and ensure that we only repurpose the nf_lru list_head when the refcount is 0 to ensure exclusive access to it. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
d7064eaf |
|
03-Nov-2022 |
Chuck Lever <chuck.lever@oracle.com> |
NFSD: Add an nfsd_file_fsync tracepoint Add a tracepoint to capture the number of filecache-triggered fsync calls and which files needed it. Also, record when an fsync triggers a write verifier reset. Examples: <...>-97 [007] 262.505611: nfsd_file_free: inode=0xffff888171e08140 ref=0 flags=GC may=WRITE nf_file=0xffff8881373d2400 <...>-97 [007] 262.505612: nfsd_file_fsync: inode=0xffff888171e08140 ref=0 flags=GC may=WRITE nf_file=0xffff8881373d2400 ret=0 <...>-97 [007] 262.505623: nfsd_file_free: inode=0xffff888171e08dc0 ref=0 flags=GC may=WRITE nf_file=0xffff8881373d1e00 <...>-97 [007] 262.505624: nfsd_file_fsync: inode=0xffff888171e08dc0 ref=0 flags=GC may=WRITE nf_file=0xffff8881373d1e00 ret=0 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>
|
#
22ae4c11 |
|
02-Nov-2022 |
Jeff Layton <jlayton@kernel.org> |
nfsd: fix up the filecache laundrette scheduling We don't really care whether there are hashed entries when it comes to scheduling the laundrette. They might all be non-gc entries, after all. We only want to schedule it if there are entries on the LRU. Switch to using list_lru_count, and move the check into nfsd_file_gc_worker. The other callsite in nfsd_file_put doesn't need to count entries, since it only schedules the laundrette after adding an entry to the LRU. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
82141185 |
|
02-Nov-2022 |
Jeff Layton <jlayton@kernel.org> |
nfsd: reorganize filecache.c In a coming patch, we're going to rework how the filecache refcounting works. Move some code around in the function to reduce the churn in the later patches, and rename some of the functions with (hopefully) clearer names: nfsd_file_flush becomes nfsd_file_fsync, and nfsd_file_unhash_and_dispose is renamed to nfsd_file_unhash_and_queue. Also, the nfsd_file_put_final tracepoint is renamed to nfsd_file_free, to better match the name of the function from which it's called. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
1f696e23 |
|
02-Nov-2022 |
Jeff Layton <jlayton@kernel.org> |
nfsd: remove the pages_flushed statistic from filecache We're counting mapping->nrpages, but not all of those are necessarily dirty. We don't really have a simple way to count just the dirty pages, so just remove this stat since it's not accurate. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
3f054211 |
|
31-Oct-2022 |
Chuck Lever <chuck.lever@oracle.com> |
NFSD: Fix licensing header in filecache.c Add a missing SPDX header. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>
|
#
b3276c1f |
|
01-Nov-2022 |
Chuck Lever <chuck.lever@oracle.com> |
NFSD: Flesh out a documenting comment for filecache.c Record what we've learned recently about the NFSD filecache in a documenting comment so our future selves don't forget what all this is for. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>
|
#
4d1ea845 |
|
28-Oct-2022 |
Chuck Lever <chuck.lever@oracle.com> |
NFSD: Add an NFSD_FILE_GC flag to enable nfsd_file garbage collection NFSv4 operations manage the lifetime of nfsd_file items they use by means of NFSv4 OPEN and CLOSE. Hence there's no need for them to be garbage collected. Introduce a mechanism to enable garbage collection for nfsd_file items used only by NFSv2/3 callers. Note that the change in nfsd_file_put() ensures that both CLOSE and DELEGRETURN will actually close out and free an nfsd_file on last reference of a non-garbage-collected file. Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=394 Suggested-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org>
|
#
dcf3f809 |
|
28-Oct-2022 |
Chuck Lever <chuck.lever@oracle.com> |
NFSD: Revert "NFSD: NFSv4 CLOSE should release an nfsd_file immediately" This reverts commit 5e138c4a750dc140d881dab4a8804b094bbc08d2. That commit attempted to make files available to other users as soon as all NFSv4 clients were done with them, rather than waiting until the filecache LRU had garbage collected them. It gets the reference counting wrong, for one thing. But it also misses that DELEGRETURN should release a file in the same fashion. In fact, any nfsd_file_put() on an file held open by an NFSv4 client needs potentially to release the file immediately... Clear the way for implementing that idea. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de>
|
#
bdd6b562 |
|
05-Nov-2022 |
Jeff Layton <jlayton@kernel.org> |
nfsd: fix use-after-free in nfsd_file_do_acquire tracepoint When we fail to insert into the hashtable with a non-retryable error, we'll free the object and then goto out_status. If the tracepoint is enabled, it'll end up accessing the freed object when it tries to grab the fields out of it. Set nf to NULL after freeing it to avoid the issue. Fixes: 243a5263014a ("nfsd: rework hashtable handling in nfsd_do_file_acquire") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
d3aefd2b |
|
31-Oct-2022 |
Jeff Layton <jlayton@kernel.org> |
nfsd: fix net-namespace logic in __nfsd_file_cache_purge If the namespace doesn't match the one in "net", then we'll continue, but that doesn't cause another rhashtable_walk_next call, so it will loop infinitely. Fixes: ce502f81ba88 ("NFSD: Convert the filecache to use rhashtable") Reported-by: Petr Vorel <pvorel@suse.cz> Link: https://lore.kernel.org/ltp/Y1%2FP8gDAcWC%2F+VR3@pevik/ Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
243a5263 |
|
04-Oct-2022 |
Jeff Layton <jlayton@kernel.org> |
nfsd: rework hashtable handling in nfsd_do_file_acquire nfsd_file is RCU-freed, so we need to hold the rcu_read_lock long enough to get a reference after finding it in the hash. Take the rcu_read_lock() and call rhashtable_lookup directly. Switch to using rhashtable_lookup_insert_key as well, and use the usual retry mechanism if we hit an -EEXIST. Rename the "retry" bool to open_retry, and eliminiate the insert_err goto target. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
8d0d254b |
|
30-Sep-2022 |
Jeff Layton <jlayton@kernel.org> |
nfsd: fix nfsd_file_unhash_and_dispose nfsd_file_unhash_and_dispose() is called for two reasons: We're either shutting down and purging the filecache, or we've gotten a notification about a file delete, so we want to go ahead and unhash it so that it'll get cleaned up when we close. We're either walking the hashtable or doing a lookup in it and we don't take a reference in either case. What we want to do in both cases is to try and unhash the object and put it on the dispose list if that was successful. If it's no longer hashed, then we don't want to touch it, with the assumption being that something else is already cleaning up the sentinel reference. Instead of trying to selectively decrement the refcount in this function, just unhash it, and if that was successful, move it to the dispose list. Then, the disposal routine will just clean that up as usual. Also, just make this a void function, drop the WARN_ON_ONCE, and the comments about deadlocking since the nature of the purported deadlock is no longer clear. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
1342f9dd |
|
22-Sep-2022 |
ChenXiaoSong <chenxiaosong2@huawei.com> |
nfsd: use DEFINE_SHOW_ATTRIBUTE to define nfsd_file_cache_stats_fops Use DEFINE_SHOW_ATTRIBUTE helper macro to simplify the code. Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
427f5f83 |
|
08-Jul-2022 |
Chuck Lever <chuck.lever@oracle.com> |
NFSD: Ensure nf_inode is never dereferenced The documenting comment for struct nf_file states: /* * A representation of a file that has been opened by knfsd. These are hashed * in the hashtable by inode pointer value. Note that this object doesn't * hold a reference to the inode by itself, so the nf_inode pointer should * never be dereferenced, only used for comparison. */ Replace the two existing dereferences to make the comment always true. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
5e138c4a |
|
08-Jul-2022 |
Chuck Lever <chuck.lever@oracle.com> |
NFSD: NFSv4 CLOSE should release an nfsd_file immediately The last close of a file should enable other accessors to open and use that file immediately. Leaving the file open in the filecache prevents other users from accessing that file until the filecache garbage-collects the file -- sometimes that takes several seconds. Reported-by: Wang Yugui <wangyugui@e16-tech.com> Link: https://bugzilla.linux-nfs.org/show_bug.cgi?387 Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
b40a2839 |
|
08-Jul-2022 |
Chuck Lever <chuck.lever@oracle.com> |
NFSD: Move nfsd_file_trace_alloc() tracepoint Avoid recording the allocation of an nfsd_file item that is immediately released because a matching item was already inserted in the hash. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
be023006 |
|
08-Jul-2022 |
Chuck Lever <chuck.lever@oracle.com> |
NFSD: Separate tracepoints for acquire and create These tracepoints collect different information: the create case does not open a file, so there's no nf_file available. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
0ec8e9d1 |
|
08-Jul-2022 |
Chuck Lever <chuck.lever@oracle.com> |
NFSD: Clean up unused code after rhashtable conversion Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
ce502f81 |
|
08-Jul-2022 |
Chuck Lever <chuck.lever@oracle.com> |
NFSD: Convert the filecache to use rhashtable Enable the filecache hash table to start small, then grow with the workload. Smaller server deployments benefit because there should be lower memory utilization. Larger server deployments should see improved scaling with the number of open files. Suggested-by: Jeff Layton <jlayton@kernel.org> Suggested-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
fc22945e |
|
08-Jul-2022 |
Chuck Lever <chuck.lever@oracle.com> |
NFSD: Set up an rhashtable for the filecache Add code to initialize and tear down an rhashtable. The rhashtable is not used yet. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
c7b824c3 |
|
08-Jul-2022 |
Chuck Lever <chuck.lever@oracle.com> |
NFSD: Replace the "init once" mechanism In a moment, the nfsd_file_hashtbl global will be replaced with an rhashtable. Replace the one or two spots that need to check if the hash table is available. We can easily reuse the SHUTDOWN flag for this purpose. Document that this mechanism relies on callers to hold the nfsd_mutex to prevent init, shutdown, and purging to run concurrently. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
f0743c2b |
|
08-Jul-2022 |
Chuck Lever <chuck.lever@oracle.com> |
NFSD: Remove nfsd_file::nf_hashval The value in this field can always be computed from nf_inode, thus it is no longer used. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
cb7ec76e |
|
08-Jul-2022 |
Chuck Lever <chuck.lever@oracle.com> |
NFSD: nfsd_file_hash_remove can compute hashval Remove an unnecessary use of nf_hashval. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
a8455110 |
|
08-Jul-2022 |
Chuck Lever <chuck.lever@oracle.com> |
NFSD: Refactor __nfsd_file_close_inode() The code that computes the hashval is the same in both callers. To prevent them from going stale, reframe the documenting comments to remove descriptions of the underlying hash table structure, which is about to be replaced. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
87553263 |
|
08-Jul-2022 |
Chuck Lever <chuck.lever@oracle.com> |
NFSD: nfsd_file_unhash can compute hashval from nf->nf_inode Remove an unnecessary usage of nf_hashval. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
f53cef15 |
|
08-Jul-2022 |
Chuck Lever <chuck.lever@oracle.com> |
NFSD: Remove lockdep assertion from unhash_and_release_locked() IIUC, holding the hash bucket lock is needed only in nfsd_file_unhash, and there is already a lockdep assertion there. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
54f7df70 |
|
08-Jul-2022 |
Chuck Lever <chuck.lever@oracle.com> |
NFSD: No longer record nf_hashval in the trace log I'm about to replace nfsd_file_hashtbl with an rhashtable. The individual hash values will no longer be visible or relevant, so remove them from the tracepoints. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
6df19411 |
|
08-Jul-2022 |
Chuck Lever <chuck.lever@oracle.com> |
NFSD: Never call nfsd_file_gc() in foreground paths The checks in nfsd_file_acquire() and nfsd_file_put() that directly invoke filecache garbage collection are intended to keep cache occupancy between a low- and high-watermark. The reason to limit the capacity of the filecache is to keep filecache lookups reasonably fast. However, invoking garbage collection at those points has some undesirable negative impacts. Files that are held open by NFSv4 clients often push the occupancy of the filecache over these watermarks. At that point: - Every call to nfsd_file_acquire() and nfsd_file_put() results in an LRU walk. This has the same effect on lookup latency as long chains in the hash table. - Garbage collection will then run on every nfsd thread, causing a lot of unnecessary lock contention. - Limiting cache capacity pushes out files used only by NFSv3 clients, which are the type of files the filecache is supposed to help. To address those negative impacts, remove the direct calls to the garbage collector. Subsequent patches will address maintaining lookup efficiency as cache capacity increases. Suggested-by: Wang Yugui <wangyugui@e16-tech.com> Suggested-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
edead3a5 |
|
08-Jul-2022 |
Chuck Lever <chuck.lever@oracle.com> |
NFSD: Fix the filecache LRU shrinker Without LRU item rotation, the shrinker visits only a few items on the end of the LRU list, and those would always be long-term OPEN files for NFSv4 workloads. That makes the filecache shrinker completely ineffective. Adopt the same strategy as the inode LRU by using LRU_ROTATE. Suggested-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
4a0e73e6 |
|
08-Jul-2022 |
Chuck Lever <chuck.lever@oracle.com> |
NFSD: Leave open files out of the filecache LRU There have been reports of problems when running fstests generic/531 against Linux NFS servers with NFSv4. The NFS server that hosts the test's SCRATCH_DEV suffers from CPU soft lock-ups during the test. Analysis shows that: fs/nfsd/filecache.c 482 ret = list_lru_walk(&nfsd_file_lru, 483 nfsd_file_lru_cb, 484 &head, LONG_MAX); causes nfsd_file_gc() to walk the entire length of the filecache LRU list every time it is called (which is quite frequently). The walk holds a spinlock the entire time that prevents other nfsd threads from accessing the filecache. What's more, for NFSv4 workloads, none of the items that are visited during this walk may be evicted, since they are all files that are held OPEN by NFS clients. Address this by ensuring that open files are not kept on the LRU list. Reported-by: Frank van der Linden <fllinden@amazon.com> Reported-by: Wang Yugui <wangyugui@e16-tech.com> Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=386 Suggested-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
c46203ac |
|
08-Jul-2022 |
Chuck Lever <chuck.lever@oracle.com> |
NFSD: Trace filecache LRU activity Observe the operation of garbage collection and the lifetime of filecache items. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
668ed92e |
|
08-Jul-2022 |
Chuck Lever <chuck.lever@oracle.com> |
NFSD: WARN when freeing an item still linked via nf_lru Add a guardrail to prevent freeing memory that is still on a list. This includes either a dispose list or the LRU list. This is the sign of a bug, but this class of bugs can be detected so that they don't endanger system stability, especially while debugging. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
8b330f78 |
|
08-Jul-2022 |
Chuck Lever <chuck.lever@oracle.com> |
NFSD: Zero counters when the filecache is re-initialized If nfsd_file_cache_init() is called after a shutdown, be sure the stat counters are reset. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
df2aff52 |
|
08-Jul-2022 |
Chuck Lever <chuck.lever@oracle.com> |
NFSD: Record number of flush calls Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
94660cc1 |
|
08-Jul-2022 |
Chuck Lever <chuck.lever@oracle.com> |
NFSD: Report the number of items evicted by the LRU walk Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
39f1d1ff |
|
08-Jul-2022 |
Chuck Lever <chuck.lever@oracle.com> |
NFSD: Refactor nfsd_file_lru_scan() Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
3bc6d347 |
|
08-Jul-2022 |
Chuck Lever <chuck.lever@oracle.com> |
NFSD: Refactor nfsd_file_gc() Refactor nfsd_file_gc() to use the new list_lru helper. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
0bac5a26 |
|
08-Jul-2022 |
Chuck Lever <chuck.lever@oracle.com> |
NFSD: Add nfsd_file_lru_dispose_list() helper Refactor the invariant part of nfsd_file_lru_walk_list() into a separate helper function. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
904940e9 |
|
08-Jul-2022 |
Chuck Lever <chuck.lever@oracle.com> |
NFSD: Report average age of filecache items This is a measure of how long items stay in the filecache, to help assess how efficient the cache is. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
d6329327 |
|
08-Jul-2022 |
Chuck Lever <chuck.lever@oracle.com> |
NFSD: Report count of freed filecache items Surface the count of freed nfsd_file items. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
29d4bdbb |
|
08-Jul-2022 |
Chuck Lever <chuck.lever@oracle.com> |
NFSD: Report count of calls to nfsd_file_acquire() Count the number of successful acquisitions that did not create a file (ie, acquisitions that do not result in a compulsory cache miss). This count can be compared directly with the reported hit count to compute a hit ratio. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
0fd244c1 |
|
08-Jul-2022 |
Chuck Lever <chuck.lever@oracle.com> |
NFSD: Report filecache LRU size Surface the NFSD filecache's LRU list length to help field troubleshooters monitor filecache issues. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
23ba98de |
|
29-Jul-2022 |
Jeff Layton <jlayton@kernel.org> |
nfsd: eliminate the NFSD_FILE_BREAK_* flags We had a report from the spring Bake-a-thon of data corruption in some nfstest_interop tests. Looking at the traces showed the NFS server allowing a v3 WRITE to proceed while a read delegation was still outstanding. Currently, we only set NFSD_FILE_BREAK_* flags if NFSD_MAY_NOT_BREAK_LEASE was set when we call nfsd_file_alloc. NFSD_MAY_NOT_BREAK_LEASE was intended to be set when finding files for COMMIT ops, where we need a writeable filehandle but don't need to break read leases. It doesn't make any sense to consult that flag when allocating a file since the file may be used on subsequent calls where we do want to break the lease (and the usage of it here seems to be reverse from what it should be anyway). Also, after calling nfsd_open_break_lease, we don't want to clear the BREAK_* bits. A lease could end up being set on it later (more than once) and we need to be able to break those leases as well. This means that the NFSD_FILE_BREAK_* flags now just mirror NFSD_MAY_{READ,WRITE} flags, so there's no need for them at all. Just drop those flags and unconditionally call nfsd_open_break_lease every time. Reported-by: Olga Kornieskaia <kolga@netapp.com> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2107360 Fixes: 65294c1f2c5e (nfsd: add a new struct file caching facility to nfsd) Cc: <stable@vger.kernel.org> # 5.4.x : bb283ca18d1e NFSD: Clean up the show_nf_flags() macro Cc: <stable@vger.kernel.org> # 5.4.x Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
e33c267a |
|
31-May-2022 |
Roman Gushchin <roman.gushchin@linux.dev> |
mm: shrinkers: provide shrinkers with names Currently shrinkers are anonymous objects. For debugging purposes they can be identified by count/scan function names, but it's not always useful: e.g. for superblock's shrinkers it's nice to have at least an idea of to which superblock the shrinker belongs. This commit adds names to shrinkers. register_shrinker() and prealloc_shrinker() functions are extended to take a format and arguments to master a name. In some cases it's not possible to determine a good name at the time when a shrinker is allocated. For such cases shrinker_debugfs_rename() is provided. The expected format is: <subsystem>-<shrinker_type>[:<instance>]-<id> For some shrinkers an instance can be encoded as (MAJOR:MINOR) pair. After this change the shrinker debugfs directory looks like: $ cd /sys/kernel/debug/shrinker/ $ ls dquota-cache-16 sb-devpts-28 sb-proc-47 sb-tmpfs-42 mm-shadow-18 sb-devtmpfs-5 sb-proc-48 sb-tmpfs-43 mm-zspool:zram0-34 sb-hugetlbfs-17 sb-pstore-31 sb-tmpfs-44 rcu-kfree-0 sb-hugetlbfs-33 sb-rootfs-2 sb-tmpfs-49 sb-aio-20 sb-iomem-12 sb-securityfs-6 sb-tracefs-13 sb-anon_inodefs-15 sb-mqueue-21 sb-selinuxfs-22 sb-xfs:vda1-36 sb-bdev-3 sb-nsfs-4 sb-sockfs-8 sb-zsmalloc-19 sb-bpf-32 sb-pipefs-14 sb-sysfs-26 thp-deferred_split-10 sb-btrfs:vda2-24 sb-proc-25 sb-tmpfs-1 thp-zero-9 sb-cgroup2-30 sb-proc-39 sb-tmpfs-27 xfs-buf:vda1-37 sb-configfs-23 sb-proc-41 sb-tmpfs-29 xfs-inodegc:vda1-38 sb-dax-11 sb-proc-45 sb-tmpfs-35 sb-debugfs-7 sb-proc-46 sb-tmpfs-40 [roman.gushchin@linux.dev: fix build warnings] Link: https://lkml.kernel.org/r/Yr+ZTnLb9lJk6fJO@castle Reported-by: kernel test robot <lkp@intel.com> Link: https://lkml.kernel.org/r/20220601032227.4076670-4-roman.gushchin@linux.dev Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Dave Chinner <dchinner@redhat.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
b6c71c66 |
|
31-May-2022 |
Chuck Lever <chuck.lever@oracle.com> |
NFSD: Fix potential use-after-free in nfsd_file_put() nfsd_file_put_noref() can free @nf, so don't dereference @nf immediately upon return from nfsd_file_put_noref(). Suggested-by: Trond Myklebust <trondmy@hammerspace.com> Fixes: 999397926ab3 ("nfsd: Clean up nfsd_file_put()") Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
08af54b3 |
|
11-May-2022 |
Chuck Lever <chuck.lever@oracle.com> |
NFSD: nfsd_file_put() can sleep Now that there are no more callers of nfsd_file_put() that might hold a spin lock, ensure the lockdep infrastructure can catch newly introduced calls to nfsd_file_put() made while a spinlock is held. Link: https://lore.kernel.org/linux-nfs/ece7fd1d-5fb3-5155-54ba-347cfc19bd9a@oracle.com/T/#mf1855552570cf9a9c80d1e49d91438cd9085aada Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>
|
#
0122e882 |
|
27-Mar-2022 |
Chuck Lever <chuck.lever@oracle.com> |
NFSD: Trace filecache opens Instrument calls to nfsd_open_verified() to get a sense of the filecache hit rate. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
fb70bf12 |
|
30-Mar-2022 |
Chuck Lever <chuck.lever@oracle.com> |
NFSD: Instantiate a struct file when creating a regular NFSv4 file There have been reports of races that cause NFSv4 OPEN(CREATE) to return an error even though the requested file was created. NFSv4 does not provide a status code for this case. To mitigate some of these problems, reorganize the NFSv4 OPEN(CREATE) logic to allocate resources before the file is actually created, and open the new file while the parent directory is still locked. Two new APIs are added: + Add an API that works like nfsd_file_acquire() but does not open the underlying file. The OPEN(CREATE) path can use this API when it already has an open file. + Add an API that is kin to dentry_open(). NFSD needs to create a file and grab an open "struct file *" atomically. The alloc_empty_file() has to be done before the inode create. If it fails (for example, because the NFS server has exceeded its max_files limit), we avoid creating the file and can still return an error to the NFS client. BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=382 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: JianHong Yin <jiyin@redhat.com>
|
#
f4d84c52 |
|
27-Mar-2022 |
Chuck Lever <chuck.lever@oracle.com> |
NFSD: Clean up nfsd_open_verified() Its only caller always passes S_IFREG as the @type parameter. As an additional clean-up, add a kerneldoc comment. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
b8962a9d |
|
22-Apr-2022 |
Amir Goldstein <amir73il@gmail.com> |
nfsd: use fsnotify group lock helpers Before commit 9542e6a643fc6 ("nfsd: Containerise filecache laundrette") nfsd would close open files in direct reclaim context and that could cause a deadlock when fsnotify mark allocation went into direct reclaim and nfsd shrinker tried to free existing fsnotify marks. To avoid issues like this in future code, set the FSNOTIFY_GROUP_NOFS flag on nfsd fsnotify group to prevent going into direct reclaim from fsnotify_add_inode_mark(). Link: https://lore.kernel.org/r/20220422120327.3459282-10-amir73il@gmail.com Suggested-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220321112310.vpr7oxro2xkz5llh@quack3.lan/ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
|
#
867a448d |
|
22-Apr-2022 |
Amir Goldstein <amir73il@gmail.com> |
fsnotify: pass flags argument to fsnotify_alloc_group() Add flags argument to fsnotify_alloc_group(), define and use the flag FSNOTIFY_GROUP_USER in inotify and fanotify instead of the helper fsnotify_alloc_user_group() to indicate user allocation. Although the flag FSNOTIFY_GROUP_USER is currently not used after group allocation, we store the flags argument in the group struct for future use of other group flags. Link: https://lore.kernel.org/r/20220422120327.3459282-5-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
|
#
99939792 |
|
31-Mar-2022 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
nfsd: Clean up nfsd_file_put() Make it a little less racy, by removing the refcount_read() test. Then remove the redundant 'is_hashed' variable. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
6b8a9433 |
|
31-Mar-2022 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
nfsd: Fix a write performance regression The call to filemap_flush() in nfsd_file_put() is there to ensure that we clear out any writes belonging to a NFSv3 client relatively quickly and avoid situations where the file can't be evicted by the garbage collector. It also ensures that we detect write errors quickly. The problem is this causes a regression in performance for some workloads. So try to improve matters by deferring writeback until we're ready to close the file, and need to detect errors so that we can force the client to resend. Tested-by: Jan Kara <jack@suse.cz> Fixes: b6669305d35a ("nfsd: Reduce the number of calls to nfsd_file_gc()") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Link: https://lore.kernel.org/all/20220330103457.r4xrhy2d6nhtouzk@quack3.lan Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
cbcc268b |
|
13-Feb-2022 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
fs: Move many prototypes to pagemap.h These functions are page cache functionality and don't need to be declared in fs.h. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
|
#
4d2eeafe |
|
24-Feb-2022 |
Amir Goldstein <amir73il@gmail.com> |
nfsd: more robust allocation failure handling in nfsd_file_cache_init The nfsd file cache table can be pretty large and its allocation may require as many as 80 contigious pages. Employ the same fix that was employed for similar issue that was reported for the reply cache hash table allocation several years ago by commit 8f97514b423a ("nfsd: more robust allocation failure handling in nfsd_reply_cache_init"). Fixes: 65294c1f2c5e ("nfsd: add a new struct file caching facility to nfsd") Link: https://lore.kernel.org/linux-nfs/e3cdaeec85a6cfec980e87fc294327c0381c1778.camel@kernel.org/ Suggested-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Amir Goldstein <amir73il@gmail.com>
|
#
3988a578 |
|
30-Dec-2021 |
Chuck Lever <chuck.lever@oracle.com> |
NFSD: Rename boot verifier functions Clean up: These functions handle what the specs call a write verifier, which in the Linux NFS server implementation is now divorced from the server's boot instance Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
555dbf1a |
|
18-Dec-2021 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
nfsd: Replace use of rwsem with errseq_t The nfsd_file nf_rwsem is currently being used to separate file write and commit instances to ensure that we catch errors and apply them to the correct write/commit. We can improve scalability at the expense of a little accuracy (some extra false positives) by replacing the nf_rwsem with more careful use of the errseq_t mechanism to track errors across the different operations. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> [ cel: rebased on zero-verifier fix ]
|
#
1463b38e |
|
30-Nov-2021 |
NeilBrown <neilb@suse.de> |
NFSD: simplify per-net file cache management We currently have a 'laundrette' for closing cached files - a different work-item for each network-namespace. These 'laundrettes' (aka struct nfsd_fcache_disposal) are currently on a list, and are freed using rcu. The list is not necessary as we have a per-namespace structure (struct nfsd_net) which can hold a link to the nfsd_fcache_disposal. The use of kfree_rcu is also unnecessary as the cache is cleaned of all files associated with a given namespace, and no new files can be added, before the nfsd_fcache_disposal is freed. So add a '->fcache_disposal' link to nfsd_net, and discard the list management and rcu usage. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
24dca905 |
|
25-Oct-2021 |
Gabriel Krisman Bertazi <krisman@collabora.com> |
fsnotify: Protect fsnotify_handle_inode_event from no-inode events FAN_FS_ERROR allows events without inodes - i.e. for file system-wide errors. Even though fsnotify_handle_inode_event is not currently used by fanotify, this patch protects other backends from cases where neither inode or dir are provided. Also document the constraints of the interface (inode and dir cannot be both NULL). Link: https://lore.kernel.org/r/20211025192746.66445-12-krisman@collabora.com Suggested-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jan Kara <jack@suse.cz>
|
#
19598141 |
|
30-Sep-2021 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
nfsd: Fix a warning for nfsd_file_close_inode Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
d30881f5 |
|
18-Feb-2021 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
nfsd: Don't keep looking up unhashed files in the nfsd file cache If a file is unhashed, then we're going to reject it anyway and retry, so make sure we skip it when we're doing the RCU lockless lookup. This avoids a number of unnecessary nfserr_jukebox returns from nfsd_file_acquire() Fixes: 65294c1f2c5e ("nfsd: add a new struct file caching facility to nfsd") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
950cc0d2 |
|
02-Dec-2020 |
Amir Goldstein <amir73il@gmail.com> |
fsnotify: generalize handle_inode_event() The handle_inode_event() interface was added as (quoting comment): "a simple variant of handle_event() for groups that only have inode marks and don't have ignore mask". In other words, all backends except fanotify. The inotify backend also falls under this category, but because it required extra arguments it was left out of the initial pass of backends conversion to the simple interface. This results in code duplication between the generic helper fsnotify_handle_event() and the inotify_handle_event() callback which also happen to be buggy code. Generalize the handle_inode_event() arguments and add the check for FS_EXCL_UNLINK flag to the generic helper, so inotify backend could be converted to use the simple interface. Link: https://lore.kernel.org/r/20201202120713.702387-2-amir73il@gmail.com CC: stable@vger.kernel.org Fixes: b9a1b9772509 ("fsnotify: create method handle_inode_event() in fsnotify_operations") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
|
#
231307df |
|
25-Nov-2020 |
Huang Guobin <huangguobin4@huawei.com> |
nfsd: Fix error return code in nfsd_file_cache_init() Fix to return PTR_ERR() error code from the error handling case instead of 0 in function nfsd_file_cache_init(), as done elsewhere in this function. Fixes: 65294c1f2c5e7("nfsd: add a new struct file caching facility to nfsd") Signed-off-by: Huang Guobin <huangguobin4@huawei.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
ae3c57b5 |
|
17-Jul-2020 |
J. Bruce Fields <bfields@redhat.com> |
nfsd: Cache R, RW, and W opens separately The nfsd open code has always kept separate read-only, read-write, and write-only opens as necessary to ensure that when a client closes or downgrades, we don't retain more access than necessary. Also, I didn't realize the cache behaved this way when I wrote 94415b06eb8a "nfsd4: a client's own opens needn't prevent delegations". There I assumed fi_fds[O_WRONLY] and fi_fds[O_RDWR] would always be distinct. The violation of that assumption is triggering a WARN_ON_ONCE() and could also cause the server to give out a delegation when it shouldn't. Fixes: 94415b06eb8a ("nfsd4: a client's own opens needn't prevent delegations") Tested-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
#
b9a1b977 |
|
22-Jul-2020 |
Amir Goldstein <amir73il@gmail.com> |
fsnotify: create method handle_inode_event() in fsnotify_operations The method handle_event() grew a lot of complexity due to the design of fanotify and merging of ignore masks. Most backends do not care about this complex functionality, so we can hide this complexity from them. Introduce a method handle_inode_event() that serves those backends and passes a single inode mark and less arguments. This change converts all backends except fanotify and inotify to use the simplified handle_inode_event() method. In pricipal, inotify could have also used the new method, but that would require passing more arguments on the simple helper (data, data_type, cookie), so we leave it with the handle_event() method. Link: https://lore.kernel.org/r/20200722125849.17418-9-amir73il@gmail.com Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
|
#
b54cecf5 |
|
06-Jun-2020 |
Amir Goldstein <amir73il@gmail.com> |
fsnotify: pass dir argument to handle_event() callback The 'inode' argument to handle_event(), sometimes referred to as 'to_tell' is somewhat obsolete. It is a remnant from the times when a group could only have an inode mark associated with an event. We now pass an iter_info array to the callback, with all marks associated with an event. Most backends ignore this argument, with two exceptions: 1. dnotify uses it for sanity check that event is on directory 2. fanotify uses it to report fid of directory on directory entry modification events Remove the 'inode' argument and add a 'dir' argument. The callback function signature is deliberately changed, because the meaning of the argument has changed and the arguments have been documented. The 'dir' argument is set to when 'file_name' is specified and it is referring to the directory that the 'file_name' entry belongs to. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
|
#
9a02aa40 |
|
08-Jul-2020 |
Amir Goldstein <amir73il@gmail.com> |
nfsd: use fsnotify_data_inode() to get the unlinked inode The inode argument to handle_event() is about to become obsolete. Link: https://lore.kernel.org/r/20200708111156.24659-4-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
|
#
057a2274 |
|
13-Feb-2020 |
Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com> |
fs: nfsd: fileache.c: Use built-in RCU list checking list_for_each_entry_rcu() has built-in RCU and lock checking. Pass cond argument to list_for_each_entry_rcu() to silence false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled by default. Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
50d0def9 |
|
02-Feb-2020 |
Chen Zhou <chenzhou10@huawei.com> |
nfsd: make nfsd_filecache_wq variable static Fix sparse warning: fs/nfsd/filecache.c:55:25: warning: symbol 'nfsd_filecache_wq' was not declared. Should it be static? Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Chen Zhou <chenzhou10@huawei.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
#
689827cd |
|
13-Jan-2020 |
Trond Myklebust <trondmy@gmail.com> |
nfsd: convert file cache to use over/underflow safe refcount Use the 'refcount_t' type instead of 'atomic_t' for improved refcounting safety. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
#
5011af4c |
|
06-Jan-2020 |
Trond Myklebust <trondmy@gmail.com> |
nfsd: Fix stable writes Strictly speaking, a stable write error needs to reflect the write + the commit of that write (and only that write). To ensure that we don't pick up the write errors from other writebacks, add a rw_semaphore to provide exclusion. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
#
90d2f1da |
|
06-Jan-2020 |
Trond Myklebust <trondmy@gmail.com> |
nfsd: Fix a soft lockup race in nfsd_file_mark_find_or_create() If nfsd_file_mark_find_or_create() keeps winning the race for the nfsd_file_fsnotify_group->mark_mutex against nfsd_file_mark_put() then it can soft lock up, since fsnotify_add_inode_mark() ends up always finding an existing entry. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
#
b6669305 |
|
06-Jan-2020 |
Trond Myklebust <trondmy@gmail.com> |
nfsd: Reduce the number of calls to nfsd_file_gc() Don't call nfsd_file_gc() on every put of the reference in nfsd_file_put(). Instead, do it only when we're expecting the refcount to go to 1. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
#
55f84cc4 |
|
06-Jan-2020 |
Trond Myklebust <trondmy@gmail.com> |
nfsd: Schedule the laundrette regularly irrespective of file errors Emsure we schedule the laundrette even if the struct file is carrying file errors. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
#
bd6e1cec |
|
06-Jan-2020 |
Trond Myklebust <trondmy@gmail.com> |
nfsd: Remove unused constant NFSD_FILE_LRU_RESCAN Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
#
9542e6a6 |
|
06-Jan-2020 |
Trond Myklebust <trondmy@gmail.com> |
nfsd: Containerise filecache laundrette Ensure that if the filecache laundrette gets stuck, it only affects the knfsd instances of one container. The notifier callbacks can be called from various contexts so avoid using synchonous filesystem operations that might deadlock. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
#
36ebbdb9 |
|
06-Jan-2020 |
Trond Myklebust <trondmy@gmail.com> |
nfsd: cleanup nfsd_file_lru_dispose() Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
#
28c7d86b |
|
06-Jan-2020 |
Trond Myklebust <trondmy@gmail.com> |
nfsd: fix filecache lookup If the lookup keeps finding a nfsd_file with an unhashed open file, then retry once only. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: stable@vger.kernel.org Fixes: 65294c1f2c5e "nfsd: add a new struct file caching facility to nfsd" Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
#
2a67803e |
|
01-Nov-2019 |
Mao Wenan <maowenan@huawei.com> |
nfsd: Drop LIST_HEAD where the variable it declares is never used. The declarations were introduced with the file, but the declared variables were not used. Fixes: 65294c1f2c5e ("nfsd: add a new struct file caching facility to nfsd") Signed-off-by: Mao Wenan <maowenan@huawei.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
#
055b24a8 |
|
02-Sep-2019 |
Trond Myklebust <trondmy@gmail.com> |
nfsd: Don't garbage collect files that might contain write errors If a file may contain unstable writes that can error out, then we want to avoid garbage collecting the struct nfsd_file that may be tracking those errors. So in the garbage collector, we try to avoid collecting files that aren't clean. Furthermore, we avoid immediately kicking off the garbage collector in the case where the reference drops to zero for the case where there is a write error that is being tracked. If the file is unhashed while an error is pending, then declare a reboot, to ensure the client resends any unstable writes. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
#
5e113224 |
|
02-Sep-2019 |
Trond Myklebust <trondmy@gmail.com> |
nfsd: nfsd_file cache entries should be per net namespace Ensure that we can safely clear out the file cache entries when the nfs server is shut down on a container. Otherwise, the file cache may end up pinning the mounts. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
#
bb13f35b |
|
19-Aug-2019 |
YueHaibing <yuehaibing@huawei.com> |
nfsd: remove duplicated include from filecache.c Remove duplicated include. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
#
65294c1f |
|
18-Aug-2019 |
Jeff Layton <jeff.layton@primarydata.com> |
nfsd: add a new struct file caching facility to nfsd Currently, NFSv2/3 reads and writes have to open a file, do the read or write and then close it again for each RPC. This is highly inefficient, especially when the underlying filesystem has a relatively slow open routine. This patch adds a new open file cache to knfsd. Rather than doing an open for each RPC, the read/write handlers can call into this cache to see if there is one already there for the correct filehandle and NFS_MAY_READ/WRITE flags. If there isn't an entry, then we create a new one and attempt to perform the open. If there is, then we wait until the entry is fully instantiated and return it if it is at the end of the wait. If it's not, then we attempt to take over construction. Since the main goal is to speed up NFSv2/3 I/O, we don't want to close these files on last put of these objects. We need to keep them around for a little while since we never know when the next READ/WRITE will come in. Cache entries have a hardcoded 1s timeout, and we have a recurring workqueue job that walks the cache and purges any entries that have expired. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Richard Sharpe <richard.sharpe@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|