
History log of /linux-master/fs/nfsd/nfscache.c
Revision	Date	Author	Comments
# 192d80cd	03-Feb-2024	Kunwu Chan <chentao@kylinos.cn>	nfsd: Simplify the allocation of slab caches in nfsd_drc_slab_create Use the new KMEM_CACHE() macro instead of direct kmem_cache_create to simplify the creation of SLAB caches. And change cache name from 'nfsd_drc' to 'nfsd_cacherep'. Signed-off-by: Kunwu Chan <chentao@kylinos.cn> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
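KMEM_CACHE() builds the slab cache name by stringifying the struct tag (`#__struct`). That is why switching to it also renamed the cache from 'nfsd_drc' to 'nfsd_cacherep'. The userspace sketch below imitates that macro shape against a stub allocator. kmem_cache_create_stub(), KMEM_CACHE_STUB(), struct kmem_cache_stub and the struct fields are hypothetical stand-ins, not the kernel slab API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct kmem_cache: records creation arguments. */
struct kmem_cache_stub {
	const char *name;
	size_t size;
	size_t align;
	unsigned int flags;
};

static struct kmem_cache_stub stub_cache;

/* Same argument order as kmem_cache_create(name, size, align, flags, ctor);
 * the constructor is accepted but ignored in this sketch. */
static struct kmem_cache_stub *kmem_cache_create_stub(const char *name,
		size_t size, size_t align, unsigned int flags,
		void (*ctor)(void *))
{
	(void)ctor;
	stub_cache.name = name;
	stub_cache.size = size;
	stub_cache.align = align;
	stub_cache.flags = flags;
	return &stub_cache;
}

/* Same shape as KMEM_CACHE(): #__struct turns the struct tag into the
 * cache name, so the cache name follows the struct's name. */
#define KMEM_CACHE_STUB(__struct, __flags)				\
	kmem_cache_create_stub(#__struct, sizeof(struct __struct),	\
			       _Alignof(struct __struct), (__flags), NULL)

/* Hypothetical fields; only the tag name matters for the cache name. */
struct nfsd_cacherep {
	unsigned int c_xid;
	unsigned long c_timestamp;
};
```

Calling `KMEM_CACHE_STUB(nfsd_cacherep, 0)` yields a cache named "nfsd_cacherep" whose size and alignment come from the struct itself.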
# 4b148854	26-Jan-2024	Josef Bacik <josef@toxicpanda.com>	nfsd: make all of the nfsd stats per-network namespace We have a global set of counters that we modify for all of the nfsd operations, but now that we're exposing these stats across all network namespaces we need to make the stats also be per-network namespace. We already have some caching stats that are per-network namespace, so move these definitions into the same counter and then adjust all the helpers and users of these stats to provide the appropriate nfsd_net struct so that the stats are maintained for the per-network namespace objects. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
# d98416cc	26-Jan-2024	Josef Bacik <josef@toxicpanda.com>	nfsd: rename NFSD_NET_* to NFSD_STATS_* We're going to merge the stats all into per network namespace in subsequent patches, rename these nn counters to be consistent with the rest of the stats. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
# d0ab8b64	13-Nov-2023	Chuck Lever <chuck.lever@oracle.com>	NFSD: Remove nfsd_drc_gc() tracepoint This trace point was for debugging the DRC's garbage collection. In the field it's just noise. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
# bf51c52a	10-Nov-2023	Chuck Lever <chuck.lever@oracle.com>	NFSD: Fix checksum mismatches in the duplicate reply cache nfsd_cache_csum() currently assumes that the server's RPC layer has been advancing rq_arg.head[0].iov_base as it decodes an incoming request, because that's the way it used to work. On entry, it expects that buf->head[0].iov_base points to the start of the NFS header, and excludes the already-decoded RPC header. These days however, head[0].iov_base now points to the start of the RPC header during all processing. It no longer points at the NFS Call header when execution arrives at nfsd_cache_csum(). In a retransmitted RPC the XID and the NFS header are supposed to be the same as the original message, but the contents of the retransmitted RPC header can be different. For example, for krb5, the GSS sequence number will be different between the two. Thus if the RPC header is always included in the DRC checksum computation, the checksum of the retransmitted message might not match the checksum of the original message, even though the NFS part of these messages is identical. The result is that, even if a matching XID is found in the DRC, the checksum mismatch causes the server to execute the retransmitted RPC transaction again. Reviewed-by: Jeff Layton <jlayton@kernel.org> Tested-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
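The fix amounts to checksumming only the NFS portion of the Call, starting after the RPC header. Fields that legitimately change on retransmission, such as the GSS sequence number, then cannot cause a mismatch. Below is a minimal sketch assuming a flat message buffer and a known RPC header length. simple_csum() is a plain byte sum standing in for the kernel's csum_partial(), and CSUM_WINDOW is an assumed cap on the number of bytes summed. Both are illustrative, not NFSD's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cap on how many NFS bytes are summed. */
#define CSUM_WINDOW 256u

/* Stand-in checksum: a 32-bit sum of bytes (not csum_partial()). */
static uint32_t simple_csum(const uint8_t *p, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < len; i++)
		sum += p[i];
	return sum;
}

/* Skip rpc_hdr_len bytes of RPC header, then sum at most CSUM_WINDOW
 * bytes of the NFS part of the message. */
static uint32_t drc_csum(const uint8_t *msg, size_t msg_len,
			 size_t rpc_hdr_len)
{
	size_t n;

	assert(rpc_hdr_len <= msg_len);
	n = msg_len - rpc_hdr_len;
	if (n > CSUM_WINDOW)
		n = CSUM_WINDOW;
	return simple_csum(msg + rpc_hdr_len, n);
}
```

Two messages whose NFS bytes match but whose RPC headers differ produce the same checksum with the header skipped, and a different one when the header is included.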
# 49cecd86	10-Nov-2023	Chuck Lever <chuck.lever@oracle.com>	NFSD: Update nfsd_cache_append() to use xdr_stream When inserting a DRC-cached response into the reply buffer, ensure that the reply buffer's xdr_stream is updated properly. Otherwise the server will send a garbage response. Cc: stable@vger.kernel.org # v6.3+ Reviewed-by: Jeff Layton <jlayton@kernel.org> Tested-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
# 8eea99a8	11-Sep-2023	Qi Zheng <zhengqi.arch@bytedance.com>	nfsd: dynamically allocate the nfsd-reply shrinker In preparation for implementing lockless slab shrink, use new APIs to dynamically allocate the nfsd-reply shrinker, so that it can be freed asynchronously via RCU. Then it doesn't need to wait for RCU read-side critical section when releasing the struct nfsd_net. Link: https://lkml.kernel.org/r/20230911094444.68966-34-zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Acked-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Jeff Layton <jlayton@kernel.org> Acked-by: Muchun Song <songmuchun@bytedance.com> Cc: Neil Brown <neilb@suse.de> Cc: Olga Kornievskaia <kolga@netapp.com> Cc: Dai Ngo <Dai.Ngo@oracle.com> Cc: Tom Talpey <tom@talpey.com> Cc: Abhinav Kumar <quic_abhinavk@quicinc.com> Cc: Alasdair Kergon <agk@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Anna Schumaker <anna@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Bob Peterson <rpeterso@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Carlos Llamas <cmllamas@google.com> Cc: Chandan Babu R <chandan.babu@oracle.com> Cc: Chao Yu <chao@kernel.org> Cc: Chris Mason <clm@fb.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Christian Koenig <christian.koenig@amd.com> Cc: Chuck Lever <cel@kernel.org> Cc: Coly Li <colyli@suse.de> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: "Darrick J. Wong" <djwong@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Airlie <airlied@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Sterba <dsterba@suse.com> Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Cc: Gao Xiang <hsiangkao@linux.alibaba.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Huang Rui <ray.huang@amd.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Jason Wang <jasowang@redhat.com> Cc: Jeffle Xu <jefflexu@linux.alibaba.com> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: Kirill Tkhai <tkhai@ya.ru> Cc: Marijn Suijten <marijn.suijten@somainline.org> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Mike Snitzer <snitzer@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nadav Amit <namit@vmware.com> Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Richard Weinberger <richard@nod.at> Cc: Rob Clark <robdclark@gmail.com> Cc: Rob Herring <robh@kernel.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Sean Paul <sean@poorly.run> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Song Liu <song@kernel.org> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Steven Price <steven.price@arm.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Yue Hu <huyue2@coolpad.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
# e7421ce7	09-Jul-2023	Chuck Lever <chuck.lever@oracle.com>	NFSD: Rename struct svc_cacherep The svc_ prefix is identified with the SunRPC layer. Although the duplicate reply cache caches RPC replies, it is only for the NFS protocol. Rename the struct to better reflect its purpose. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
# cb18eca4	09-Jul-2023	Chuck Lever <chuck.lever@oracle.com>	NFSD: Remove svc_rqst::rq_cacherep Over time I'd like to see NFS-specific fields moved out of struct svc_rqst, which is an RPC layer object. These fields are layering violations. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
# c135e126	09-Jul-2023	Chuck Lever <chuck.lever@oracle.com>	NFSD: Refactor the duplicate reply cache shrinker Avoid holding the bucket lock while freeing cache entries. This change also caps the number of entries that are freed when the shrinker calls to reduce the shrinker's impact on the cache's effectiveness. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
# a9507f6a	09-Jul-2023	Chuck Lever <chuck.lever@oracle.com>	NFSD: Replace nfsd_prune_bucket() Enable nfsd_prune_bucket() to drop the bucket lock while calling kfree(). Use the same pattern that Jeff recently introduced in the NFSD filecache. A few percpu operations are moved outside the lock since they temporarily disable local IRQs which is expensive and does not need to be done while the lock is held. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
# ff0d1693	09-Jul-2023	Chuck Lever <chuck.lever@oracle.com>	NFSD: Rename nfsd_reply_cache_alloc() For readability, rename to match the other helpers. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
# 35308e7f	09-Jul-2023	Chuck Lever <chuck.lever@oracle.com>	NFSD: Refactor nfsd_reply_cache_free_locked() To reduce contention on the bucket locks, we must avoid calling kfree() while each bucket lock is held. Start by refactoring nfsd_reply_cache_free_locked() into a helper that removes an entry from the bucket (and must therefore run under the lock) and a second helper that frees the entry (which does not need to hold the lock). For readability, rename the helpers nfsd_cacherep_<verb>. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
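The two-phase pattern these commits describe can be sketched in userspace: unlink entries while the bucket lock is held, then free them after the lock is dropped. This is an illustration under stated assumptions. The atomic_flag spinlock stands in for spinlock_t. struct entry, struct bucket, bucket_prune() and the other helpers are hypothetical names, not the NFSD nfsd_cacherep_* helpers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct entry {
	struct entry *next;
	unsigned long expires;	/* hypothetical expiry timestamp */
};

struct bucket {
	atomic_flag lock;	/* minimal spinlock stand-in */
	struct entry *head;
};

#define BUCKET_INIT { ATOMIC_FLAG_INIT, NULL }

static void bucket_lock(struct bucket *b)
{
	while (atomic_flag_test_and_set_explicit(&b->lock, memory_order_acquire))
		;	/* spin until the holder releases the flag */
}

static void bucket_unlock(struct bucket *b)
{
	atomic_flag_clear_explicit(&b->lock, memory_order_release);
}

/* Phase 1: pointer surgery only, so it is cheap to do under the lock.
 * Expired entries move from the bucket onto a private dispose list. */
static struct entry *unlink_expired_locked(struct bucket *b, unsigned long now)
{
	struct entry *dispose = NULL, **pp = &b->head;

	while (*pp) {
		struct entry *e = *pp;

		if (e->expires <= now) {
			*pp = e->next;
			e->next = dispose;
			dispose = e;
		} else {
			pp = &e->next;
		}
	}
	return dispose;
}

/* Phase 2: free() runs with no lock held. Returns the number freed. */
static size_t free_list(struct entry *e)
{
	size_t n = 0;

	while (e) {
		struct entry *next = e->next;

		free(e);
		e = next;
		n++;
	}
	return n;
}

static size_t bucket_prune(struct bucket *b, unsigned long now)
{
	struct entry *dispose;

	bucket_lock(b);
	dispose = unlink_expired_locked(b, now);
	bucket_unlock(b);
	return free_list(dispose);
}
```

Because the dispose list is private to the caller, no other thread can see those entries once they are unlinked. That is what makes it safe to free them outside the lock.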
# ed9ab734	16-Jun-2023	Jeff Layton <jlayton@kernel.org>	nfsd: move init of percpu reply_cache_stats counters back to nfsd_init_net Commit f5f9d4a314da ("nfsd: move reply cache initialization into nfsd startup") moved the initialization of the reply cache into nfsd startup, but didn't account for the stats counters, which can be accessed before nfsd is ever started. The result can be a NULL pointer dereference when someone accesses /proc/fs/nfsd/reply_cache_stats while nfsd is still shut down. This is a regression and a user-triggerable oops in the right situation: - non-x86_64 arch - /proc/fs/nfsd is mounted in the namespace - nfsd is not started in the namespace - unprivileged user calls "cat /proc/fs/nfsd/reply_cache_stats" Although this is easy to trigger on some arches (like aarch64), on x86_64, calling this_cpu_ptr(NULL) evidently returns a pointer to the fixed_percpu_data. That struct looks just enough like a newly initialized percpu var to allow nfsd_reply_cache_stats_show to access it without Oopsing. Move the initialization of the per-net+per-cpu reply-cache counters back into nfsd_init_net, while leaving the rest of the reply cache allocations to be done at nfsd startup time. Kudos to Eirik who did most of the legwork to track this down. Cc: stable@vger.kernel.org # v6.3+ Fixes: f5f9d4a314da ("nfsd: move reply cache initialization into nfsd startup") Reported-and-tested-by: Eirik Fuller <efuller@redhat.com> Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2215429 Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
# cee4db19	08-Jan-2023	Chuck Lever <chuck.lever@oracle.com>	SUNRPC: Refactor RPC server dispatch method Currently, svcauth_gss_accept() pre-reserves response buffer space for the RPC payload length and GSS sequence number before returning to the dispatcher, which then adds the header's accept_stat field. The problem is the accept_stat field is supposed to go before the length and seq_num fields. So svcauth_gss_release() has to relocate the accept_stat value (see svcauth_gss_prepare_to_wrap()). To enable these fields to be added to the response buffer in the correct (final) order, the pointer to the accept_stat has to be made available to svcauth_gss_accept() so that it can set it before reserving space for the length and seq_num fields. As a first step, move the pointer to the location of the accept_stat field into struct svc_rqst. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
# 8dd41d70	08-Jan-2023	Chuck Lever <chuck.lever@oracle.com>	SUNRPC: Push svcxdr_init_encode() into svc_process_common() Now that all vs_dispatch functions invoke svcxdr_init_encode(), it is common code and can be pushed down into the generic RPC server. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
# 64776611	22-Sep-2022	ChenXiaoSong <chenxiaosong2@huawei.com>	nfsd: use DEFINE_SHOW_ATTRIBUTE to define nfsd_reply_cache_stats_fops Use DEFINE_SHOW_ATTRIBUTE helper macro to simplify the code. nfsd_net is converted from seq_file->file instead of seq_file->private in nfsd_reply_cache_stats_show(). Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com> [ cel: reduce line length ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
# e33c267a	31-May-2022	Roman Gushchin <roman.gushchin@linux.dev>	mm: shrinkers: provide shrinkers with names Currently shrinkers are anonymous objects. For debugging purposes they can be identified by count/scan function names, but it's not always useful: e.g. for superblock's shrinkers it's nice to have at least an idea of to which superblock the shrinker belongs. This commit adds names to shrinkers. register_shrinker() and prealloc_shrinker() functions are extended to take a format and arguments to master a name. In some cases it's not possible to determine a good name at the time when a shrinker is allocated. For such cases shrinker_debugfs_rename() is provided. The expected format is: <subsystem>-<shrinker_type>[:<instance>]-<id> For some shrinkers an instance can be encoded as (MAJOR:MINOR) pair. After this change the shrinker debugfs directory looks like:
$ cd /sys/kernel/debug/shrinker/
$ ls
dquota-cache-16 sb-devpts-28 sb-proc-47 sb-tmpfs-42
mm-shadow-18 sb-devtmpfs-5 sb-proc-48 sb-tmpfs-43
mm-zspool:zram0-34 sb-hugetlbfs-17 sb-pstore-31 sb-tmpfs-44
rcu-kfree-0 sb-hugetlbfs-33 sb-rootfs-2 sb-tmpfs-49
sb-aio-20 sb-iomem-12 sb-securityfs-6 sb-tracefs-13
sb-anon_inodefs-15 sb-mqueue-21 sb-selinuxfs-22 sb-xfs:vda1-36
sb-bdev-3 sb-nsfs-4 sb-sockfs-8 sb-zsmalloc-19
sb-bpf-32 sb-pipefs-14 sb-sysfs-26 thp-deferred_split-10
sb-btrfs:vda2-24 sb-proc-25 sb-tmpfs-1 thp-zero-9
sb-cgroup2-30 sb-proc-39 sb-tmpfs-27 xfs-buf:vda1-37
sb-configfs-23 sb-proc-41 sb-tmpfs-29 xfs-inodegc:vda1-38
sb-dax-11 sb-proc-45 sb-tmpfs-35
sb-debugfs-7 sb-proc-46 sb-tmpfs-40
[roman.gushchin@linux.dev: fix build warnings] Link: https://lkml.kernel.org/r/Yr+ZTnLb9lJk6fJO@castle Reported-by: kernel test robot <lkp@intel.com> Link: https://lkml.kernel.org/r/20220601032227.4076670-4-roman.gushchin@linux.dev Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Dave Chinner <dchinner@redhat.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
# fd5e363e	23-May-2022	Julian Schroeder <jumaco@amazon.com>	nfsd: destroy percpu stats counters after reply cache shutdown Upon nfsd shutdown any pending DRC cache is freed. DRC cache use is tracked via a percpu counter. In the current code the percpu counter is destroyed before. If any pending cache is still present, percpu_counter_add is called with a percpu counter==NULL. This causes a kernel crash. The solution is to destroy the percpu counter after the cache is freed. Fixes: e567b98ce9a4b (“nfsd: protect concurrent access to nfsd stats counters”) Signed-off-by: Julian Schroeder <jumaco@amazon.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
# add1511c	28-Sep-2021	Chuck Lever <chuck.lever@oracle.com>	NFSD: Streamline the rare "found" case Move a rarely called function call site out of the hot path. This is an exceptionally small improvement because the compiler inlines most of the functions that nfsd_cache_lookup() calls. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
# 0f29ce32	28-Sep-2021	Chuck Lever <chuck.lever@oracle.com>	NFSD: Skip extra computation for RC_NOCACHE case Force the compiler to skip unneeded initialization for cases that don't need those values. For example, NFSv4 COMPOUND operations are RC_NOCACHE. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
# 378a6109	30-Sep-2021	Chuck Lever <chuck.lever@oracle.com>	NFSD: De-duplicate hash bucket indexing Clean up: The details of finding the right hash bucket are exactly the same in both nfsd_cache_lookup() and nfsd_cache_update(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
# 7578b2f6	30-Sep-2021	Chuck Lever <chuck.lever@oracle.com>	NFSD: Remove be32_to_cpu() from DRC hash function Commit 7142b98d9fd7 ("nfsd: Clean up drc cache in preparation for global spinlock elimination"), billed as a clean-up, added be32_to_cpu() to the DRC hash function without explanation. That commit removed two comments that state that byte-swapping in the hash function is unnecessary without explaining whether there was a need for that change. On some Intel CPUs, the swab32 instruction is known to cause a CPU pipeline stall. be32_to_cpu() does not add extra randomness, since the hash multiplication is done /before/ shifting to the high-order bits of the result. As a micro-optimization, remove the unnecessary transform from the DRC hash function. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
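For reference, the kernel's generic hash_32() multiplies by GOLDEN_RATIO_32 and keeps the top bits of the 32-bit product. Each product bit depends on every input bit at or below its position, so the top bits already mix the whole XID. That is the argument behind dropping be32_to_cpu() from the hash. The sketch below is a userspace copy of the generic path; architectures may provide their own. drc_bucket_index() is a hypothetical wrapper, and treating its result directly as the bucket index is an assumption about how nfsd uses the hash.

```c
#include <assert.h>
#include <stdint.h>

/* Same value as GOLDEN_RATIO_32 in include/linux/hash.h. */
#define GOLDEN_RATIO_32 0x61C88647u

/* Generic hash_32(): multiply (mod 2^32), keep the top 'bits' bits. */
static uint32_t hash_32(uint32_t val, unsigned int bits)
{
	assert(bits >= 1 && bits <= 32);	/* a shift by 32 would be undefined */
	return (uint32_t)(val * GOLDEN_RATIO_32) >> (32 - bits);
}

/* Hypothetical wrapper: map an XID to one of 2^maskbits buckets. */
static unsigned int drc_bucket_index(uint32_t xid, unsigned int maskbits)
{
	return hash_32(xid, maskbits);
}
```

Sharing a single helper like this between lookup and update is the kind of de-duplication commit 378a6109 describes.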
# 8847ecc9	20-Sep-2021	Chuck Lever <chuck.lever@oracle.com>	NFSD: Optimize DRC bucket pruning DRC bucket pruning is done by nfsd_cache_lookup(), which is part of every NFSv2 and NFSv3 dispatch (ie, it's done while the client is waiting). I added a trace_printk() in prune_bucket() to see just how long it takes to prune. Here are two ends of the spectrum:
prune_bucket: Scanned 1 and freed 0 in 90 ns, 62 entries remaining
prune_bucket: Scanned 2 and freed 1 in 716 ns, 63 entries remaining
...
prune_bucket: Scanned 75 and freed 74 in 34149 ns, 1 entries remaining
Pruning latency is noticeable on fast transports with fast storage. By noticeable, I mean that the latency measured here in the worst case is the same order of magnitude as the round trip time for cached server operations. We could do something like moving expired entries to an expired list and then free them later instead of freeing them right in prune_bucket(). But simply limiting the number of entries that can be pruned by a lookup is simple and retains more entries in the cache, making the DRC somewhat more effective. Comparison with a 70/30 fio 8KB 12 thread direct I/O test:
Before:
write: IOPS=61.6k, BW=481MiB/s (505MB/s)(14.1GiB/30001msec); 0 zone resets
WRITE:
1848726 ops (30%)
avg bytes sent per op: 8340 avg bytes received per op: 136
backlog wait: 0.635158 RTT: 0.128525 total execute time: 0.827242 (milliseconds)
After:
write: IOPS=63.0k, BW=492MiB/s (516MB/s)(14.4GiB/30001msec); 0 zone resets
WRITE:
1891144 ops (30%)
avg bytes sent per op: 8340 avg bytes received per op: 136
backlog wait: 0.616114 RTT: 0.126842 total execute time: 0.805348 (milliseconds)
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
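Capping the prune bounds how much work a single lookup can add to a client's wait. The following sketch shows a capped prune over an oldest-first list. The list layout, the expires field and the max parameter are assumptions made for illustration, not the actual prune_bucket() or nfsd_prune_bucket() signature.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct lru_entry {
	struct lru_entry *next;	/* next (newer) entry */
	unsigned long expires;	/* hypothetical expiry timestamp */
};

/* Entries are kept oldest-first, so the scan stops at the first entry
 * that has not expired yet. 'max' caps the number freed per call;
 * 0 means no cap. Returns the number of entries freed. */
static size_t prune_lru(struct lru_entry **head, unsigned long now, size_t max)
{
	size_t freed = 0;

	while (*head && (*head)->expires <= now) {
		struct lru_entry *e;

		if (max && freed >= max)
			break;	/* leave the rest for a later call */
		e = *head;
		*head = e->next;
		free(e);
		freed++;
	}
	return freed;
}
```

Entries left behind by the cap stay in the cache until a later lookup or the shrinker reaches them. As the commit notes, that also keeps more entries available for matching retransmissions.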
# e567b98c	06-Jan-2021	Amir Goldstein <amir73il@gmail.com>	nfsd: protect concurrent access to nfsd stats counters nfsd stats counters can be updated by concurrent nfsd threads without any protection. Convert some nfsd_stats and nfsd_net struct members to use percpu counters. The longest_chain* members of struct nfsd_net remain unprotected. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
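The idea behind percpu counters is that each CPU updates its own slot and readers sum all slots, so concurrent increments are not lost. The kernel's percpu_counter also batches updates into a shared count, and this sketch leaves that out. It keeps only the sharding and uses C11 atomics because userspace threads are not pinned to a slot. NR_SLOTS and the split_counter_* names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_SLOTS 4	/* hypothetical: one slot per CPU */

struct split_counter {
	atomic_long slot[NR_SLOTS];
};

static void split_counter_init(struct split_counter *c)
{
	for (unsigned int i = 0; i < NR_SLOTS; i++)
		atomic_init(&c->slot[i], 0);
}

/* Writers only touch their own slot, so they rarely contend. */
static void split_counter_add(struct split_counter *c, unsigned int cpu,
			      long delta)
{
	atomic_fetch_add_explicit(&c->slot[cpu % NR_SLOTS], delta,
				  memory_order_relaxed);
}

/* Readers sum every slot. Under concurrent updates the result is a
 * snapshot that may already be slightly out of date. */
static long split_counter_sum(struct split_counter *c)
{
	long sum = 0;

	for (unsigned int i = 0; i < NR_SLOTS; i++)
		sum += atomic_load_explicit(&c->slot[i], memory_order_relaxed);
	return sum;
}
```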
# 8c38b705	14-Sep-2020	Rik van Riel <riel@surriel.com>	silence nfscache allocation warnings with kvzalloc silence nfscache allocation warnings with kvzalloc Currently nfsd_reply_cache_init attempts hash table allocation through kmalloc, and manually falls back to vzalloc if that fails. This makes the code a little larger than needed, and creates a significant amount of serial console spam if you have enough systems. Switching to kvzalloc gets rid of the allocation warnings, and makes the code a little cleaner too as a side effect. Freeing of nn->drc_hashtbl is already done using kvfree currently. Signed-off-by: Rik van Riel <riel@surriel.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
# c25bf185	03-Jun-2020	J. Bruce Fields <bfields@redhat.com>	nfsd: safer handling of corrupted c_type This can only happen if there's a bug somewhere, so let's make it a WARN not a printk. Also, I think it's safest to ignore the corruption rather than trying to fix it by removing a cache entry. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
# 027690c7	01-Jun-2020	J. Bruce Fields <bfields@redhat.com>	nfsd4: make drc_slab global, not per-net I made every global per-network-namespace instead. But perhaps doing that to this slab was a step too far. The kmem_cache_create call in our net init method also seems to be responsible for this lockdep warning:
[ 45.163710] Unable to find swap-space signature
[ 45.375718] trinity-c1 (855): attempted to duplicate a private mapping with mremap. This is not supported.
[ 46.055744] futex_wake_op: trinity-c1 tries to shift op by -209; fix this program
[ 51.011723]
[ 51.013378] ======================================================
[ 51.013875] WARNING: possible circular locking dependency detected
[ 51.014378] 5.2.0-rc2 #1 Not tainted
[ 51.014672] ------------------------------------------------------
[ 51.015182] trinity-c2/886 is trying to acquire lock:
[ 51.015593] 000000005405f099 (slab_mutex){+.+.}, at: slab_attr_store+0xa2/0x130
[ 51.016190]
[ 51.016190] but task is already holding lock:
[ 51.016652] 00000000ac662005 (kn->count#43){++++}, at: kernfs_fop_write+0x286/0x500
[ 51.017266]
[ 51.017266] which lock already depends on the new lock.
[ 51.017266]
[ 51.017909]
[ 51.017909] the existing dependency chain (in reverse order) is:
[ 51.018497]
[ 51.018497] -> #1 (kn->count#43){++++}:
[ 51.018956]        __lock_acquire+0x7cf/0x1a20
[ 51.019317]        lock_acquire+0x17d/0x390
[ 51.019658]        __kernfs_remove+0x892/0xae0
[ 51.020020]        kernfs_remove_by_name_ns+0x78/0x110
[ 51.020435]        sysfs_remove_link+0x55/0xb0
[ 51.020832]        sysfs_slab_add+0xc1/0x3e0
[ 51.021332]        __kmem_cache_create+0x155/0x200
[ 51.021720]        create_cache+0xf5/0x320
[ 51.022054]        kmem_cache_create_usercopy+0x179/0x320
[ 51.022486]        kmem_cache_create+0x1a/0x30
[ 51.022867]        nfsd_reply_cache_init+0x278/0x560
[ 51.023266]        nfsd_init_net+0x20f/0x5e0
[ 51.023623]        ops_init+0xcb/0x4b0
[ 51.023928]        setup_net+0x2fe/0x670
[ 51.024315]        copy_net_ns+0x30a/0x3f0
[ 51.024653]        create_new_namespaces+0x3c5/0x820
[ 51.025257]        unshare_nsproxy_namespaces+0xd1/0x240
[ 51.025881]        ksys_unshare+0x506/0x9c0
[ 51.026381]        __x64_sys_unshare+0x3a/0x50
[ 51.026937]        do_syscall_64+0x110/0x10b0
[ 51.027509]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 51.028175]
[ 51.028175] -> #0 (slab_mutex){+.+.}:
[ 51.028817]        validate_chain+0x1c51/0x2cc0
[ 51.029422]        __lock_acquire+0x7cf/0x1a20
[ 51.029947]        lock_acquire+0x17d/0x390
[ 51.030438]        __mutex_lock+0x100/0xfa0
[ 51.030995]        mutex_lock_nested+0x27/0x30
[ 51.031516]        slab_attr_store+0xa2/0x130
[ 51.032020]        sysfs_kf_write+0x11d/0x180
[ 51.032529]        kernfs_fop_write+0x32a/0x500
[ 51.033056]        do_loop_readv_writev+0x21d/0x310
[ 51.033627]        do_iter_write+0x2e5/0x380
[ 51.034148]        vfs_writev+0x170/0x310
[ 51.034616]        do_pwritev+0x13e/0x160
[ 51.035100]        __x64_sys_pwritev+0xa3/0x110
[ 51.035633]        do_syscall_64+0x110/0x10b0
[ 51.036200]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 51.036924]
[ 51.036924] other info that might help us debug this:
[ 51.036924]
[ 51.037876]  Possible unsafe locking scenario:
[ 51.037876]
[ 51.038556]        CPU0                    CPU1
[ 51.039130]        ----                    ----
[ 51.039676]   lock(kn->count#43);
[ 51.040084]                                lock(slab_mutex);
[ 51.040597]                                lock(kn->count#43);
[ 51.041062]   lock(slab_mutex);
[ 51.041320]
[ 51.041320]  *** DEADLOCK ***
[ 51.041320]
[ 51.041793] 3 locks held by trinity-c2/886:
[ 51.042128]  #0: 000000001f55e152 (sb_writers#5){.+.+}, at: vfs_writev+0x2b9/0x310
[ 51.042739]  #1: 00000000c7d6c034 (&of->mutex){+.+.}, at: kernfs_fop_write+0x25b/0x500
[ 51.043400]  #2: 00000000ac662005 (kn->count#43){++++}, at: kernfs_fop_write+0x286/0x500
Reported-by: kernel test robot <lkp@intel.com> Fixes: 3ba75830ce17 "drc containerization" Signed-off-by: J. Bruce Fields <bfields@redhat.com>
# 0b175b18	02-May-2020	Chuck Lever <chuck.lever@oracle.com>	NFSD: Add tracepoints to NFSD's duplicate reply cache Try to capture DRC failures. Two additional clean-ups: - Introduce Doxygen-style comments for the main entry points - Remove a dprintk that fires for an allocation failure. This was the only dprintk in the REPCACHE class. Reported-by: kbuild test robot <lkp@intel.com> [ cel: force typecast for display of checksum values ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
# 78e70e78	06-Aug-2019	He Zhe <zhe.he@windriver.com>	nfsd4: Fix kernel crash when reading proc file reply_cache_stats reply_cache_stats uses wrong parameter as seq file private structure and thus causes the following kernel crash when users read /proc/fs/nfsd/reply_cache_stats
BUG: kernel NULL pointer dereference, address: 00000000000001f9
PGD 0 P4D 0
Oops: 0000 [#3] SMP PTI
CPU: 6 PID: 1502 Comm: cat Tainted: G D 5.3.0-rc3+ #1
Hardware name: Intel Corporation Broadwell Client platform/Basking Ridge, BIOS BDW-E2R1.86C.0118.R01.1503110618 03/11/2015
RIP: 0010:nfsd_reply_cache_stats_show+0x3b/0x2d0
Code: 41 54 49 89 f4 48 89 fe 48 c7 c7 b3 10 33 88 53 bb e8 03 00 00 e8 88 82 d1 ff bf 58 89 41 00 e8 eb c5 85 00 48 83 eb 01 75 f0 <41> 8b 94 24 f8 01 00 00 48 c7 c6 be 10 33 88 4c 89 ef bb e8 03 00
RSP: 0018:ffffaa520106fe08 EFLAGS: 00010246
RAX: 000000cfe1a77123 RBX: 0000000000000000 RCX: 0000000000291b46
RDX: 000000cf00000000 RSI: 0000000000000006 RDI: 0000000000291b28
RBP: ffffaa520106fe20 R08: 0000000000000006 R09: 000000cfe17e55dd
R10: ffffa424e47c0000 R11: 000000000000030b R12: 0000000000000001
R13: ffffa424e5697000 R14: 0000000000000001 R15: ffffa424e5697000
FS: 00007f805735f580(0000) GS:ffffa424f8f80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000001f9 CR3: 00000000655ce005 CR4: 00000000003606e0
Call Trace:
 seq_read+0x194/0x3e0
 __vfs_read+0x1b/0x40
 vfs_read+0x95/0x140
 ksys_read+0x61/0xe0
 __x64_sys_read+0x1a/0x20
 do_syscall_64+0x4d/0x120
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f805728b861
Code: fe ff ff 50 48 8d 3d 86 b4 09 00 e8 79 e0 01 00 66 0f 1f 84 00 00 00 00 00 48 8d 05 d9 19 0d 00 8b 00 85 c0 75 13 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 57 c3 66 0f 1f 44 00 00 48 83 ec 28 48 89 54
RSP: 002b:00007ffea1ce3c38 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f805728b861
RDX: 0000000000020000 RSI: 00007f8057183000 RDI: 0000000000000003
RBP: 00007f8057183000 R08: 00007f8057182010 R09: 0000000000000000
R10: 0000000000000022 R11: 0000000000000246 R12: 0000559a60e8ff10
R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000
Modules linked in:
CR2: 00000000000001f9
---[ end trace 01613595153f0cba ]---
RIP: 0010:nfsd_reply_cache_stats_show+0x3b/0x2d0
Code: 41 54 49 89 f4 48 89 fe 48 c7 c7 b3 10 33 88 53 bb e8 03 00 00 e8 88 82 d1 ff bf 58 89 41 00 e8 eb c5 85 00 48 83 eb 01 75 f0 <41> 8b 94 24 f8 01 00 00 48 c7 c6 be 10 33 88 4c 89 ef bb e8 03 00
RSP: 0018:ffffaa52004b3e08 EFLAGS: 00010246
RAX: 0000002bab45a7c6 RBX: 0000000000000000 RCX: 0000000000291b4c
RDX: 0000002b00000000 RSI: 0000000000000004 RDI: 0000000000291b28
RBP: ffffaa52004b3e20 R08: 0000000000000004 R09: 0000002bab1c8c7a
R10: ffffa424e5500000 R11: 00000000000002a9 R12: 0000000000000001
R13: ffffa424e4475000 R14: 0000000000000001 R15: ffffa424e4475000
FS: 00007f805735f580(0000) GS:ffffa424f8f80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000001f9 CR3: 00000000655ce005 CR4: 00000000003606e0
Killed
Fixes: 3ba75830ce17 ("nfsd4: drc containerization") Signed-off-by: He Zhe <zhe.he@windriver.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
# 689d7ba4	05-Jun-2019	J. Bruce Fields <bfields@redhat.com>	nfsd: fix cleanup of nfsd_reply_cache_init on failure The failure to unregister the shrinker will result in corruption when the nfsd_net is freed. Also clean up the drc_slab while we're here. Reported-by: syzbot+83a43746cebef3508b49@syzkaller.appspotmail.com Fixes: db17b61765c2 ("nfsd4: drc containerization") Signed-off-by: J. Bruce Fields <bfields@redhat.com>
# 3ba75830	17-May-2019	J. Bruce Fields <bfields@redhat.com>	nfsd4: drc containerization The nfsd duplicate reply cache should not be shared between network namespaces. The most straightforward way to fix this is just to move every global in the code to per-net-namespace memory, so that's what we do. Still todo: sort out which members of nfsd_stats should be global and which per-net-namespace. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
# b401170f	16-May-2019	J. Bruce Fields <bfields@redhat.com>	nfsd: don't call nfsd_reply_cache_shutdown twice The caller is cleaning up on ENOMEM, don't try to do it here too. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
# ca79b0c2	28-Dec-2018	Arun KS <arunks@codeaurora.org>	mm: convert totalram_pages and totalhigh_pages variables to atomic totalram_pages and totalhigh_pages are made static inline function. Main motivation was that managed_page_count_lock handling was complicating things. It was discussed at length here, https://lore.kernel.org/patchwork/patch/995739/#1181785 So it seems better to remove the lock and convert variables to atomic, with preventing potential store-to-read tearing as a bonus. [akpm@linux-foundation.org: coding style fixes] Link: http://lkml.kernel.org/r/1542090790-21750-4-git-send-email-arunks@codeaurora.org Signed-off-by: Arun KS <arunks@codeaurora.org> Suggested-by: Michal Hocko <mhocko@suse.com> Suggested-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
# 736c6625	01-Oct-2018	Trond Myklebust <trondmy@gmail.com>	knfsd: Improve lookup performance in the duplicate reply cache using an rbtree Use an rbtree to ensure the lookup/insert of an entry in a DRC bucket is O(log(N)). Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
# ed00c2f6	03-Oct-2018	Trond Myklebust <trondmy@gmail.com>	knfsd: Further simplify the cache lookup Order the structure so that the key can be compared using memcmp(). Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
# 76ecec21	01-Oct-2018	Trond Myklebust <trondmy@gmail.com>	knfsd: Simplify NFS duplicate replay cache Simplify the duplicate replay cache by initialising the preallocated cache entry, so that we can use it as a key for the cache lookup. Note that the 99.999% case we want to optimise for is still the one where the lookup fails, and we have to add this entry to the cache, so preinitialising should not cause a performance penalty. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
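Together, commits 736c6625, ed00c2f6 and 76ecec21 describe a lookup with three parts. The preallocated entry carries the request's key, keys are compared with a single memcmp(), and each bucket is a search tree. The sketch below shows that shape. A plain unbalanced binary search tree stands in for the kernel rbtree, which additionally keeps the tree balanced. The fields of struct drc_key and all function names are assumptions. memcmp() orders keys bytewise rather than numerically. That still gives a consistent total order, which is all a search tree needs, but any padding bytes must be zeroed, which is why the entry is memset() before the key is filled in.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical key: grouped so one memcmp() compares every field. */
struct drc_key {
	uint32_t xid;
	uint32_t proc;
	uint32_t prog;
	uint32_t vers;
	uint32_t csum;
};

struct drc_entry {
	struct drc_key key;
	struct drc_entry *left, *right;
	int in_progress;
};

static int key_cmp(const struct drc_key *a, const struct drc_key *b)
{
	return memcmp(a, b, sizeof(*a));
}

/* Preallocate an entry and fill in its key so it can be used for lookup. */
static struct drc_entry *drc_entry_new(uint32_t xid, uint32_t proc,
				       uint32_t prog, uint32_t vers,
				       uint32_t csum)
{
	struct drc_entry *e = malloc(sizeof(*e));

	if (!e)
		return NULL;
	memset(e, 0, sizeof(*e));	/* zero any padding before memcmp() */
	e->key.xid = xid;
	e->key.proc = proc;
	e->key.prog = prog;
	e->key.vers = vers;
	e->key.csum = csum;
	e->in_progress = 1;
	return e;
}

/* Find-or-insert. If an entry with the same key exists, return it and
 * leave 'prealloc' unused (the caller frees it). Otherwise link
 * 'prealloc' into the tree and return it. */
static struct drc_entry *drc_find_or_insert(struct drc_entry **root,
					    struct drc_entry *prealloc)
{
	struct drc_entry **link = root;

	while (*link) {
		int cmp = key_cmp(&prealloc->key, &(*link)->key);

		if (cmp == 0)
			return *link;
		link = cmp < 0 ? &(*link)->left : &(*link)->right;
	}
	prealloc->left = prealloc->right = NULL;
	*link = prealloc;
	return prealloc;
}
```

In the common case nothing matches, so the preallocated entry goes straight into the tree without extra copying. This matches the 99.999% case that 76ecec21 says it optimizes for.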
# 3e87da51	01-Oct-2018	Trond Myklebust <trondmy@gmail.com>	knfsd: Remove dead code from nfsd_cache_lookup The preallocated cache entry is always set to type RC_NOCACHE, and that type isn't changed until we later call nfsd_cache_update(). Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
# fad953ce	12-Jun-2018	Kees Cook <keescook@chromium.org>	treewide: Use array_size() in vzalloc() The vzalloc() function has no 2-factor argument form, so multiplication factors need to be wrapped in array_size(). This patch replaces cases of: vzalloc(a * b) with: vzalloc(array_size(a, b)) as well as handling cases of: vzalloc(a * b * c) with: vzalloc(array3_size(a, b, c)) This does, however, attempt to ignore constant size factors like: vzalloc(4 * 1024) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( vzalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | vzalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( vzalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | vzalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | vzalloc( - sizeof(char) * (COUNT) + COUNT , ...) | vzalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | vzalloc( - sizeof(u8) * COUNT + COUNT , ...) | vzalloc( - sizeof(__u8) * COUNT + COUNT , ...) | vzalloc( - sizeof(char) * COUNT + COUNT , ...) | vzalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( vzalloc( - sizeof(TYPE) * (COUNT_ID) + array_size(COUNT_ID, sizeof(TYPE)) , ...) | vzalloc( - sizeof(TYPE) * COUNT_ID + array_size(COUNT_ID, sizeof(TYPE)) , ...) | vzalloc( - sizeof(TYPE) * (COUNT_CONST) + array_size(COUNT_CONST, sizeof(TYPE)) , ...) | vzalloc( - sizeof(TYPE) * COUNT_CONST + array_size(COUNT_CONST, sizeof(TYPE)) , ...) | vzalloc( - sizeof(THING) * (COUNT_ID) + array_size(COUNT_ID, sizeof(THING)) , ...) 
| vzalloc( - sizeof(THING) * COUNT_ID + array_size(COUNT_ID, sizeof(THING)) , ...) | vzalloc( - sizeof(THING) * (COUNT_CONST) + array_size(COUNT_CONST, sizeof(THING)) , ...) | vzalloc( - sizeof(THING) * COUNT_CONST + array_size(COUNT_CONST, sizeof(THING)) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ vzalloc( - SIZE * COUNT + array_size(COUNT, SIZE) , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( vzalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vzalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vzalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vzalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vzalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | vzalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | vzalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | vzalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( vzalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | vzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | vzalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | vzalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) 
| vzalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | vzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( vzalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | vzalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | vzalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vzalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | vzalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vzalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vzalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vzalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( vzalloc(C1 * C2 * C3, ...) | vzalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants. @@ expression E1, E2; constant C1, C2; @@ ( vzalloc(C1 * C2, ...) | vzalloc( - E1 * E2 + array_size(E1, E2) , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>
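The point of wrapping the factors is overflow: a product such as count * size can wrap around and request a buffer that is too small. The kernel's array_size() and array3_size() instead return SIZE_MAX on overflow, so the allocation fails outright. The following userspace sketch reproduces that saturating behaviour under two assumptions: a GCC or Clang compiler (for __builtin_mul_overflow()), and hypothetical function names that are not the kernel's own helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiply two sizes; saturate to SIZE_MAX instead of wrapping around. */
static size_t sketch_array_size(size_t a, size_t b)
{
	size_t bytes;

	if (__builtin_mul_overflow(a, b, &bytes))
		return SIZE_MAX;
	return bytes;
}

/* Three-factor variant: any intermediate overflow also saturates. */
static size_t sketch_array3_size(size_t a, size_t b, size_t c)
{
	size_t bytes;

	if (__builtin_mul_overflow(a, b, &bytes) ||
	    __builtin_mul_overflow(bytes, c, &bytes))
		return SIZE_MAX;
	return bytes;
}
```

An allocator that receives SIZE_MAX cannot satisfy the request and returns NULL, which callers already handle, whereas a wrapped product would silently succeed with a short buffer.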
# 7e5d0e0d	27-Mar-2018	Trond Myklebust <trond.myklebust@primarydata.com>	nfsd: Do not refuse to serve out of cache Currently the knfsd replay cache appears to try to refuse replying to retries that come within 200ms of the cache entry being created. That makes limited sense in today's world of high speed TCP. After a TCP disconnection, a client can very easily reconnect and retry an rpc in less than 200ms. If this logic drops that retry, however, the client may be quite slow to retry again. This logic is original to the first reply cache implementation in 2.1, and may have made more sense for UDP clients that retried much more frequently. After this patch we will still drop on finding the original request still in progress. We may want to fix that as well at some point, though it's less likely. Note that svc_check_conn_limits is often the cause of those disconnections. We may want to fix that some day. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Acked-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
# b2441318	01-Nov-2017	Greg Kroah-Hartman <gregkh@linuxfoundation.org>	License cleanup: add SPDX GPL-2.0 license identifier to files with no license Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information in it. - file was a /uapi/ one with no licensing information in it, - file was a /uapi/ one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. 
Criteria used to select files for SPDX license identifier tagging were: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non /uapi/ files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a /uapi/ path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that were: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the /uapi/ ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. 
- when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there were new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In the initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. 
Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
# 5b5e0928	27-Feb-2017	Alexey Dobriyan <adobriyan@gmail.com>	lib/vsprintf.c: remove %Z support Now that %z is standardised in C99 there is no reason to support %Z. Unlike %L it doesn't even make format strings smaller. Use BUILD_BUG_ON in a couple of ATM drivers. In case anyone didn't notice lib/vsprintf.o is about half of SLUB which in my opinion is quite an achievement. Hopefully this patch inspires someone else to trim vsprintf.c more. Link: http://lkml.kernel.org/r/20170103230126.GA30170@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
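For reference, the standard C99 form this entry points to is the z length modifier: %zu prints a size_t with no cast and no non-standard %Z. The helper name format_size below is hypothetical and only shows the conversion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Format a byte count using the C99 'z' length modifier for size_t. */
static int format_size(char *out, size_t outlen, size_t n)
{
	return snprintf(out, outlen, "%zu bytes", n);
}
```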
# 8f97514b	26-Oct-2016	Jeff Layton <jlayton@kernel.org>	nfsd: more robust allocation failure handling in nfsd_reply_cache_init Currently, we try to allocate the cache as a single, large chunk, which can fail if no big chunks of memory are available. We _do_ try to size it according to the amount of memory in the box, but if the server is started well after boot time, then the allocation can fail due to memory fragmentation. Fall back to doing a vzalloc if the kcalloc fails, and switch the shutdown code to do a kvfree to handle freeing correctly. Reported-by: Olaf Hering <olaf@aepfle.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
# 3e80dbcd	04-Nov-2015	Jeff Layton <jlayton@kernel.org>	nfsd: remove recurring workqueue job to clean DRC We have a shrinker, we clean out the cache when nfsd is shut down, and prune the chains on each request. A recurring workqueue job seems like unnecessary overhead. Just remove it. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
# e79017dd	13-Sep-2015	Julia Lawall <Julia.Lawall@lip6.fr>	nfsd: drop null test before destroy functions Remove unneeded NULL test. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression x; @@ -if (x != NULL) { \(kmem_cache_destroy\|mempool_destroy\|dma_pool_destroy\)(x); x = NULL; -} // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
# a68465c9	19-Mar-2015	Kinglong Mee <kinglongmee@gmail.com>	NFSD: Error out when register_shrinker() fail If register_shrinker() failed, nfsd will cause a NULL pointer access as, [ 9250.875465] nfsd: last server has exited, flushing export cache [ 9251.427270] BUG: unable to handle kernel NULL pointer dereference at (null) [ 9251.427393] IP: [<ffffffff8136fc29>] __list_del_entry+0x29/0xd0 [ 9251.427579] PGD 13e4d067 PUD 13e4c067 PMD 0 [ 9251.427633] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC [ 9251.427706] Modules linked in: ip6t_rpfilter ip6t_REJECT bnep bluetooth xt_conntrack cfg80211 rfkill ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw btrfs xfs microcode ppdev serio_raw pcspkr xor libcrc32c raid6_pq e1000 parport_pc parport i2c_piix4 i2c_core nfsd(OE-) auth_rpcgss nfs_acl lockd sunrpc(E) ata_generic pata_acpi [ 9251.428240] CPU: 0 PID: 1557 Comm: rmmod Tainted: G OE 3.16.0-rc2+ #22 [ 9251.428366] Hardware name: VMware, Inc. 
VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013 [ 9251.428496] task: ffff880000849540 ti: ffff8800136f4000 task.ti: ffff8800136f4000 [ 9251.428593] RIP: 0010:[<ffffffff8136fc29>] [<ffffffff8136fc29>] __list_del_entry+0x29/0xd0 [ 9251.428696] RSP: 0018:ffff8800136f7ea0 EFLAGS: 00010207 [ 9251.428751] RAX: 0000000000000000 RBX: ffffffffa0116d48 RCX: dead000000200200 [ 9251.428814] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffa0116d48 [ 9251.428876] RBP: ffff8800136f7ea0 R08: ffff8800136f4000 R09: 0000000000000001 [ 9251.428939] R10: 8080808080808080 R11: 0000000000000000 R12: ffffffffa011a5a0 [ 9251.429002] R13: 0000000000000800 R14: 0000000000000000 R15: 00000000018ac090 [ 9251.429064] FS: 00007fb9acef0740(0000) GS:ffff88003fa00000(0000) knlGS:0000000000000000 [ 9251.429164] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 9251.429221] CR2: 0000000000000000 CR3: 0000000031a17000 CR4: 00000000001407f0 [ 9251.429306] Stack: [ 9251.429410] ffff8800136f7eb8 ffffffff8136fcdd ffffffffa0116d20 ffff8800136f7ed0 [ 9251.429511] ffffffff8118a0f2 0000000000000000 ffff8800136f7ee0 ffffffffa00eb765 [ 9251.429610] ffff8800136f7ef0 ffffffffa010e93c ffff8800136f7f78 ffffffff81104ac2 [ 9251.429709] Call Trace: [ 9251.429755] [<ffffffff8136fcdd>] list_del+0xd/0x30 [ 9251.429896] [<ffffffff8118a0f2>] unregister_shrinker+0x22/0x40 [ 9251.430037] [<ffffffffa00eb765>] nfsd_reply_cache_shutdown+0x15/0x90 [nfsd] [ 9251.430106] [<ffffffffa010e93c>] exit_nfsd+0x9/0x6cd [nfsd] [ 9251.430192] [<ffffffff81104ac2>] SyS_delete_module+0x162/0x200 [ 9251.430280] [<ffffffff81013b69>] ? 
do_notify_resume+0x59/0x90 [ 9251.430395] [<ffffffff816f2369>] system_call_fastpath+0x16/0x1b [ 9251.430457] Code: 00 00 55 48 8b 17 48 b9 00 01 10 00 00 00 ad de 48 8b 47 08 48 89 e5 48 39 ca 74 29 48 b9 00 02 20 00 00 00 ad de 48 39 c8 74 7a <4c> 8b 00 4c 39 c7 75 53 4c 8b 42 08 4c 39 c7 75 2b 48 89 42 08 [ 9251.430691] RIP [<ffffffff8136fc29>] __list_del_entry+0x29/0xd0 [ 9251.430755] RSP <ffff8800136f7ea0> [ 9251.430805] CR2: 0000000000000000 [ 9251.431033] ---[ end trace 080f3050d082b4ea ]--- Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
# 4d152e2c	19-Nov-2014	Jeff Layton <jlayton@kernel.org>	sunrpc: add a generic rq_flags field to svc_rqst and move rq_secure to it In a later patch, we're going to need some atomic bit flags. Since that field will need to be an unsigned long, we mitigate that space consumption by migrating some other bitflags to the new field. Start with the rq_secure flag. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
# ef9b16dc	06-Aug-2014	Trond Myklebust <trond.myklebust@primarydata.com>	nfsd: Reorder nfsd_cache_match to check more powerful discriminators first We would normally expect the xid and the checksum to be the best discriminators. Check them before looking at the procedure number, etc. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
# 89a26b3d	06-Aug-2014	Trond Myklebust <trond.myklebust@primarydata.com>	nfsd: split DRC global spinlock into per-bucket locks Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
# 31e60f52	06-Aug-2014	Trond Myklebust <trond.myklebust@primarydata.com>	nfsd: convert num_drc_entries to an atomic_t ...so we can remove the spinlocking around it. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
# 11acf6ef	06-Aug-2014	Trond Myklebust <trond.myklebust@primarydata.com>	nfsd: Remove the cache_hash list Now that the lru list is per-bucket, we don't need a second list for searches. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
# bedd4b61	06-Aug-2014	Trond Myklebust <trond.myklebust@primarydata.com>	nfsd: convert the lru list into a per-bucket thing Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
# 7142b98d	06-Aug-2014	Trond Myklebust <trond.myklebust@primarydata.com>	nfsd: Clean up drc cache in preparation for global spinlock elimination Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
# b3d8d128	17-Jun-2014	Jeff Layton <jlayton@kernel.org>	nfsd: clean up sparse endianness warnings in nfscache.c We currently hash the XID to determine a hash bucket to use for the reply cache entry, which is fed into hash_32 without byte-swapping it. Add __force to make sparse happy, and add some comments to explain why. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
# 1b19453d	05-Jun-2014	Jeff Layton <jlayton@kernel.org>	nfsd: don't halt scanning the DRC LRU list when there's an RC_INPROG entry Currently, the DRC cache pruner will stop scanning the list when it hits an entry that is RC_INPROG. It's possible however for a call to take a very long time. In that case, we don't want it to block other entries from being pruned if they are expired or we need to trim the cache to get back under the limit. Fix the DRC cache pruner to just ignore RC_INPROG entries. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
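The pruning rule described above (skip RC_INPROG entries instead of stopping at them) can be sketched as a loop over an LRU-ordered array. This is a simplified userspace model: only the state names come from the log, and the entry layout, the expiry field and the function name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model: state names borrowed from the log, the rest hypothetical. */
enum rc_state { RC_INPROG, RC_DONE };

struct drc_entry {
	enum rc_state state;
	long expires;   /* entry may be freed once now >= expires */
	int freed;      /* set when the pruner releases the entry */
};

/*
 * Scan entries oldest-first (LRU order) and free those that have expired,
 * or any non-busy entry while the cache is over max_cached. RC_INPROG
 * entries are skipped with "continue"; stopping the scan at the first
 * RC_INPROG entry would let one slow call shield everything behind it.
 * Returns the number of entries freed and updates *nr_cached.
 */
static size_t prune_lru(struct drc_entry *lru, size_t n, long now,
			size_t *nr_cached, size_t max_cached)
{
	size_t freed = 0;

	assert(lru != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		struct drc_entry *e = &lru[i];

		if (e->freed || e->state == RC_INPROG)
			continue;
		/* Under the limit and not yet expired: only newer entries follow. */
		if (*nr_cached <= max_cached && now < e->expires)
			break;
		e->freed = 1;
		(*nr_cached)--;
		freed++;
	}
	return freed;
}
```

The early break is still valid for completed entries because the list is in LRU order: once an unexpired entry is reached while under the limit, every later entry is newer.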
# a0ef5e19	05-Dec-2013	Jeff Layton <jlayton@kernel.org>	nfsd: don't try to reuse an expired DRC entry off the list Currently when we are processing a request, we try to scrape an expired or over-limit entry off the list in preference to allocating a new one from the slab. This is unnecessarily complicated. Just use the slab layer. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
# 781c2a5a	02-Dec-2013	Jeff Layton <jlayton@kernel.org>	nfsd: when reusing an existing repcache entry, unhash it first The DRC code will attempt to reuse an existing, expired cache entry in preference to allocating a new one. It'll then search the cache, and if it gets a hit it'll then free the cache entry that it was going to reuse. The cache code doesn't unhash the entry that it's going to reuse however, so it's possible for it to end up designating an entry for reuse and then subsequently freeing the same entry after it finds it. This leads to a later use-after-free situation and usually some list corruption warnings or an oops. Fix this by simply unhashing the entry that we intend to reuse. That will mean that it's not findable via a search and should prevent this situation from occurring. Cc: stable@vger.kernel.org # v3.10+ Reported-by: Christoph Hellwig <hch@infradead.org> Reported-by: g. artim <gartim@gmail.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
# 1ab6c499	27-Aug-2013	Dave Chinner <dchinner@redhat.com>	fs: convert fs shrinkers to new scan/count API Convert the filesystem shrinkers to use the new API, and standardise some of the behaviours of the shrinkers at the same time. For example, nr_to_scan means the number of objects to scan, not the number of objects to free. I refactored the CIFS idmap shrinker a little - it really needs to be broken up into a shrinker per tree and keep an item count with the tree root so that we don't need to walk the tree every time the shrinker needs to count the number of objects in the tree (i.e. all the time under memory pressure). [glommer@openvz.org: fixes for ext4, ubifs, nfs, cifs and glock. Fixes are needed mainly due to new code merged in the tree] [assorted fixes folded in] Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Glauber Costa <glommer@openvz.org> Acked-by: Mel Gorman <mgorman@suse.de> Acked-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Acked-by: Jan Kara <jack@suse.cz> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Cc: Arve Hjønnevåg <arve@android.com> Cc: Carlos Maiolino <cmaiolino@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Rientjes <rientjes@google.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: J. Bruce Fields <bfields@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Kent Overstreet <koverstreet@google.com> Cc: Kirill A. 
Shutemov <kirill.shutemov@linux.intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Thomas Hellstrom <thellstrom@vmware.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
# c8c797f9	05-Apr-2013	Wei Yongjun <yongjun_wei@trendmicro.com.cn>	nfsd: make symbol nfsd_reply_cache_shrinker static symbol 'nfsd_reply_cache_shrinker' only used within this file. It should be static. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
# 0733c7ba	27-Mar-2013	Jeff Layton <jlayton@kernel.org>	nfsd: scale up the number of DRC hash buckets with cache size We've now increased the size of the duplicate reply cache by quite a bit, but the number of hash buckets has not changed. So, we've gone from an average hash chain length of 16 in the old code to 4096 when the cache is its largest. Change the code to scale out the number of buckets with the max size of the cache. At the same time, we also need to fix the hash function since the existing one isn't really suitable when there are more than 256 buckets. Move instead to use the stock hash_32 function for this. Testing on a machine that had 2048 buckets showed that this gave a smaller longest:average ratio than the existing hash function: The formula here is longest hash bucket searched divided by average number of entries per bucket at the time that we saw that longest bucket: old hash: 68/(39258/2048) == 3.547404 hash_32: 45/(33773/2048) == 2.728807 Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
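Two pieces of this entry can be made concrete. The first is multiplicative hashing in the style of hash_32(), which multiplies by a golden-ratio constant and keeps the top bits as the bucket index. The second is the commit's balance metric, longest chain divided by average chain length (entries / buckets). In this sketch the constant and the function names are assumptions, and the test reproduces the two ratios quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Multiplicative hashing: keep the top 'bits' bits of val * constant.
 * The 32-bit golden-ratio constant is an assumption of this sketch. */
static uint32_t sketch_hash_32(uint32_t val, unsigned int bits)
{
	assert(bits >= 1 && bits <= 32);
	return (uint32_t)(val * 0x61C88647u) >> (32 - bits);
}

/* Longest chain searched divided by the average entries per bucket. */
static double longest_to_average(unsigned int longest, unsigned int entries,
				 unsigned int buckets)
{
	return (double)longest / ((double)entries / buckets);
}
```

Taking the top bits rather than the bottom bits matters for multiplicative hashing, because the high bits of the product depend on every bit of the input.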
# 98d821bd	27-Mar-2013	Jeff Layton <jlayton@kernel.org>	nfsd: keep stats on worst hash balancing seen so far The typical case with the DRC is a cache miss, so if we keep track of the max number of entries that we've ever walked over in a search, then we should have a reasonable estimate of the longest hash chain that we've ever seen. With that, we'll also keep track of the total size of the cache when we see the longest chain. In the case of a tie, we prefer to track the smallest total cache size in order to properly gauge the worst-case ratio of max vs. avg chain length. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
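The update rule described in this entry, including its tie-break, fits in a few lines. The struct and function names in this sketch are hypothetical.

```c
#include <assert.h>

/* Hypothetical worst-case chain statistics. */
struct chain_stats {
	unsigned int longest_chain;            /* most entries walked in one search */
	unsigned int longest_chain_cachesize;  /* cache size when that was seen */
};

static void record_search(struct chain_stats *s, unsigned int walked,
			  unsigned int cache_size)
{
	if (walked > s->longest_chain) {
		s->longest_chain = walked;
		s->longest_chain_cachesize = cache_size;
	} else if (walked == s->longest_chain &&
		   cache_size < s->longest_chain_cachesize) {
		/* Tie: a smaller cache means a smaller average chain, so
		 * keeping it records the worse longest:average ratio. */
		s->longest_chain_cachesize = cache_size;
	}
}
```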
# a2f999a3	27-Mar-2013	Jeff Layton <jlayton@kernel.org>	nfsd: add new reply_cache_stats file in nfsdfs For presenting statistics relating to duplicate reply cache. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
# 6c6910cd	27-Mar-2013	Jeff Layton <jlayton@kernel.org>	nfsd: track memory utilization by the DRC Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
# 9dc56143	27-Mar-2013	Jeff Layton <jlayton@kernel.org>	nfsd: break out comparator into separate function Break out the function that compares the rqstp and checksum against a reply cache entry. While we're at it, track the efficacy of the checksum over the NFS data by tracking the cases where we would have incorrectly matched a DRC entry if we had not tracked it or the length. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
# 0b9ea37f	27-Mar-2013	Jeff Layton <jlayton@kernel.org>	nfsd: eliminate one of the DRC cache searches The most common case is to do a search of the cache, followed by an insert. In the case where we have to allocate an entry off the slab, then we end up having to redo the search, which is wasteful. Better optimize the code for the common case by eliminating the initial search of the cache and always preallocating an entry. In the case of a cache hit, we'll end up just freeing that entry but that's preferable to an extra search. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
# ac534ff2	15-Mar-2013	Jeff Layton <jlayton@kernel.org>	nfsd: fix startup order in nfsd_reply_cache_init If we end up doing "goto out_nomem" in this function, we'll call nfsd_reply_cache_shutdown. That will attempt to walk the LRU list and free entries, but that list may not be initialized yet if the server is starting up for the first time. It's also possible for the shrinker to kick in before we've initialized the LRU list. Rearrange the initialization so that the LRU list_head and cache size are initialized before doing any of the allocations that might fail. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
# a517b608	18-Mar-2013	Jeff Layton <jlayton@kernel.org>	nfsd: only unhash DRC entries that are in the hashtable It's not safe to call hlist_del() on a newly initialized hlist_node. That leads to a NULL pointer dereference. Only do that if the entry is hashed. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
# b67bfe0d	27-Feb-2013	Sasha Levin <sasha.levin@oracle.com>	hlist: drop the node parameter from iterators I'm not sure why, but the hlist for each entry iterators were conceived list_for_each_entry(pos, head, member) The hlist ones were greedy and wanted an extra parameter: hlist_for_each_entry(tpos, pos, head, member) Why did they need an extra pos parameter? I'm not quite sure. Not only do they not really need it, it also prevents the iterator from looking exactly like the list iterator, which is unfortunate. Besides the semantic patch, there was some manual work required: - Fix up the actual hlist iterators in linux/list.h - Fix up the declaration of other iterators based on the hlist ones. - A very small number of places were using the 'node' parameter, this was modified to use 'obj->member' instead. - Coccinelle didn't handle the hlist_for_each_entry_safe iterator properly, so those had to be fixed up manually. The semantic patch which is mostly the work of Peter Senna Tschudin is here: @@ iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host; type T; expression a,c,d,e; identifier b; statement S; @@ -T b; <+... 
when != b ( hlist_for_each_entry(a, - b, c, d) S | hlist_for_each_entry_continue(a, - b, c) S | hlist_for_each_entry_from(a, - b, c) S | hlist_for_each_entry_rcu(a, - b, c, d) S | hlist_for_each_entry_rcu_bh(a, - b, c, d) S | hlist_for_each_entry_continue_rcu_bh(a, - b, c) S | for_each_busy_worker(a, c, - b, d) S | ax25_uid_for_each(a, - b, c) S | ax25_for_each(a, - b, c) S | inet_bind_bucket_for_each(a, - b, c) S | sctp_for_each_hentry(a, - b, c) S | sk_for_each(a, - b, c) S | sk_for_each_rcu(a, - b, c) S | sk_for_each_from -(a, b) +(a) S + sk_for_each_from(a) S | sk_for_each_safe(a, - b, c, d) S | sk_for_each_bound(a, - b, c) S | hlist_for_each_entry_safe(a, - b, c, d, e) S | hlist_for_each_entry_continue_rcu(a, - b, c) S | nr_neigh_for_each(a, - b, c) S | nr_neigh_for_each_safe(a, - b, c, d) S | nr_node_for_each(a, - b, c) S | nr_node_for_each_safe(a, - b, c, d) S | - for_each_gfn_sp(a, c, d, b) S + for_each_gfn_sp(a, c, d) S | - for_each_gfn_indirect_valid_sp(a, c, d, b) S + for_each_gfn_indirect_valid_sp(a, c, d) S | for_each_host(a, - b, c) S | for_each_host_safe(a, - b, c, d) S | for_each_mesh_entry(a, - b, c, d) S ) ...+> [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c] [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c] [akpm@linux-foundation.org: checkpatch fixes] [akpm@linux-foundation.org: fix warnings] [akpm@linux-foudnation.org: redo intrusive kvm changes] Tested-by: Peter Senna Tschudin <peter.senna@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
# 56edc86b	15-Feb-2013	Jeff Layton <jlayton@kernel.org>	nfsd: fix compiler warning about ambiguous types in nfsd_cache_csum kbuild test robot says: tree: git://linux-nfs.org/~bfields/linux.git for-3.9 head: deb4534f4f3be7aea7d9d24c3b0d58f370cbf9ef commit: 01a7decf75930925322c5efc87af0b5e58eb8650 [32/44] nfsd: keep a checksum of the first 256 bytes of request config: i386-randconfig-x088 (attached as .config) All warnings: fs/nfsd/nfscache.c: In function 'nfsd_cache_csum': >> fs/nfsd/nfscache.c:266:9: warning: comparison of distinct pointer types lacks a cast [enabled by default] vim +266 fs/nfsd/nfscache.c 250 __wsum csum; 251 struct xdr_buf *buf = &rqstp->rq_arg; 252 const unsigned char *p = buf->head[0].iov_base; 253 size_t csum_len = min_t(size_t, buf->head[0].iov_len + buf->page_len, 254 RC_CSUMLEN); 255 size_t len = min(buf->head[0].iov_len, csum_len); 256 257 /* rq_arg.head first */ 258 csum = csum_partial(p, len, 0); 259 csum_len -= len; 260 261 /* Continue into page array */ 262 idx = buf->page_base / PAGE_SIZE; 263 base = buf->page_base & ~PAGE_MASK; 264 while (csum_len) { 265 p = page_address(buf->pages[idx]) + base; > 266 len = min(PAGE_SIZE - base, csum_len); 267 csum = csum_partial(p, len, csum); 268 csum_len -= len; 269 base = 0; 270 ++idx; 271 } 272 return csum; 273 } 274 Signed-off-by: Jeff Layton <jlayton@redhat.com> Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
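The function quoted above checksums the head buffer first and then continues into the page array until at most RC_CSUMLEN bytes are covered. The warning at line 266 is most likely a type mismatch: `PAGE_SIZE - base` is `unsigned long`, while `size_t` is `unsigned int` on i386, and the kernel's `min()` refuses to compare different types. The user-space sketch below reproduces the same walk with every length kept in `size_t`. It assumes 4 KiB pages, models `struct xdr_buf` with a hypothetical `sketch_buf`, and substitutes a streaming FNV-1a hash for `csum_partial()`. FNV-1a is only a stand-in that, like `csum_partial()`, carries its running state across calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */
#define SKETCH_CSUMLEN 256u	/* checksum at most the first 256 bytes */

/* Hypothetical stand-in for struct xdr_buf: a head buffer plus a page array. */
struct sketch_buf {
	const unsigned char *head;
	size_t head_len;
	const unsigned char **pages;
	size_t page_base;	/* offset of the data within the page array */
	size_t page_len;
};

/* Streaming FNV-1a, standing in for csum_partial(); h carries across calls. */
static uint32_t fnv1a(const unsigned char *p, size_t len, uint32_t h)
{
	while (len--) {
		h ^= *p++;
		h *= 16777619u;
	}
	return h;
}

/* Both operands are size_t, so there is no mixed-type comparison. */
static size_t min_size(size_t a, size_t b)
{
	return a < b ? a : b;
}

static uint32_t sketch_cache_csum(const struct sketch_buf *buf)
{
	size_t csum_len = min_size(buf->head_len + buf->page_len, SKETCH_CSUMLEN);
	size_t len = min_size(buf->head_len, csum_len);
	size_t idx = buf->page_base / SKETCH_PAGE_SIZE;
	size_t base = buf->page_base % SKETCH_PAGE_SIZE;
	uint32_t h;

	assert(buf->pages != NULL || buf->page_len == 0);

	/* Head buffer first. */
	h = fnv1a(buf->head, len, 2166136261u);
	csum_len -= len;

	/* Continue into the page array, one page (or partial page) at a time. */
	while (csum_len) {
		len = min_size(SKETCH_PAGE_SIZE - base, csum_len);
		h = fnv1a(buf->pages[idx] + base, len, h);
		csum_len -= len;
		base = 0;
		++idx;
	}
	return h;
}
```

Because the hash is streamed, checksumming a request split across the head and one or more pages gives the same value as hashing its first 256 bytes contiguously.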
# 1ac83629	14-Feb-2013	Jeff Layton <jlayton@kernel.org>	nfsd: fix comments on nfsd_cache_lookup Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
# 01a7decf	04-Feb-2013	Jeff Layton <jlayton@kernel.org>	nfsd: keep a checksum of the first 256 bytes of request Now that we're allowing more DRC entries, it becomes a lot easier to hit problems with XID collisions. In order to mitigate those, calculate a checksum of up to the first 256 bytes of each request coming in and store that in the cache entry, along with the total length of the request. This initially used crc32, but Chuck Lever and Jim Rees pointed out that crc32 is probably more heavyweight than we really need for generating these checksums, and recommended looking at using the same routines that are used to generate checksums for IP packets. On an x86_64 KVM guest measurements with ftrace showed ~800ns to use csum_partial vs ~1750ns for crc32. The difference probably isn't terribly significant, but for now we may as well use csum_partial. Signed-off-by: Jeff Layton <jlayton@redhat.com> Stones-thrown-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
# 5976687a	03-Feb-2013	Jeff Layton <jlayton@kernel.org>	sunrpc: move address copy/cmp/convert routines and prototypes from clnt.h to addr.h These routines are used by server and client code, so having them in a separate header would be best. Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
# b4e7f2c9	04-Feb-2013	Jeff Layton <jlayton@kernel.org>	nfsd: register a shrinker for DRC cache entries Since we dynamically allocate them now, allow the system to call us up to release them if it gets low on memory. Since these entries aren't replaceable, only free ones that are expired or that are over the cap. The seeks value is set to '1', however, to indicate that freeing these entries is low-cost. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
# aca8a23d	04-Feb-2013	Jeff Layton <jlayton@kernel.org>	nfsd: add recurring workqueue job to clean the cache It's not sufficient to only clean the cache when requests come in. What if we have a flurry of activity and then the server goes idle? Add a workqueue job that will clean the cache every RC_EXPIRE period. Care is taken to only run this when we expect to have entries expiring. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
# 2c6b691c	04-Feb-2013	Jeff Layton <jlayton@kernel.org>	nfsd: when updating an entry with RC_NOCACHE, just free it There's no need to keep entries around that we're declaring RC_NOCACHE. Ditto if there's a problem with the entry. With this change too, there's no need to test for RC_UNUSED in the search function. If the entry's in the hash table then it's either INPROG or DONE. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
# 13cc8a78	04-Feb-2013	Jeff Layton <jlayton@kernel.org>	nfsd: remove the cache_disabled flag With the change to dynamically allocate entries, the cache is never disabled on the fly. Remove this flag. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
# 0338dd15	04-Feb-2013	Jeff Layton <jlayton@kernel.org>	nfsd: dynamically allocate DRC entries The existing code keeps a fixed-size cache of 1024 entries. This is much too small for a busy server, and wastes memory on an idle one. This patch changes the code to dynamically allocate and free these cache entries. A cap on the number of entries is retained, but it's much larger than the existing value and now scales with the amount of low memory in the machine. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
# 0ee0bf7e	04-Feb-2013	Jeff Layton <jlayton@kernel.org>	nfsd: track the number of DRC entries in the cache Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
# 56c2548b	04-Feb-2013	Jeff Layton <jlayton@kernel.org>	nfsd: always move DRC entries to the end of LRU list when updating timestamp ...otherwise, we end up with the list ordering wrong. Currently, it's not a problem since we skip RC_INPROG entries, but keeping the ordering strict will be necessary for a later patch that adds a cache cleaner. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
# a4a3ec32	28-Jan-2013	Jeff Layton <jlayton@kernel.org>	nfsd: break out hashtable search into separate function Later, we'll need more than one call site for this, so break it out into a new function. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
# d1a0774d	28-Jan-2013	Jeff Layton <jlayton@kernel.org>	nfsd: clean up and clarify the cache expiration code Add a preprocessor constant for the expiry time of cache entries, and move the test for an expired entry into a function. Note that the current code does not test for RC_INPROG. It just assumes that it won't take more than 2 minutes to fill out an in-progress entry. I'm not sure how valid that assumption is though, so let's just ensure that we never consider an RC_INPROG entry to be expired. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
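The expiry rule described here, together with the strict LRU ordering from 56c2548b and the periodic cleaner from aca8a23d, can be modelled in a few lines. The sketch below is an illustration under stated assumptions: times are plain seconds rather than jiffies, the LRU is an array ordered oldest-first, and all names (`SKETCH_RC_EXPIRE`, `sketch_expired`, `sketch_prunable`) are hypothetical. It shows why an in-progress entry is never expired and how ordered timestamps let a cleaner stop at the first entry that is still live.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_RC_EXPIRE 120	/* two minutes, expressed in seconds here */

enum sketch_state { SKETCH_INPROG, SKETCH_DONE };

struct sketch_entry {
	enum sketch_state state;
	long timestamp;		/* last time the entry was touched */
};

/* An in-progress entry is never considered expired, however old it is. */
static bool sketch_expired(const struct sketch_entry *e, long now)
{
	assert(e != NULL);
	return e->state != SKETCH_INPROG &&
	       now - e->timestamp > SKETCH_RC_EXPIRE;
}

/*
 * Count how many entries at the head of an oldest-first LRU can be freed.
 * Because the list is kept in timestamp order, the walk can stop at the
 * first entry that has not expired.
 */
static size_t sketch_prunable(const struct sketch_entry *lru, size_t n, long now)
{
	size_t i = 0;

	while (i < n && sketch_expired(&lru[i], now))
		i++;
	return i;
}
```

The early stop only works if every timestamp update also moves the entry to the tail of the list; otherwise an old entry could hide behind a newer one.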
# 25e6b8b0	28-Jan-2013	Jeff Layton <jlayton@kernel.org>	nfsd: remove redundant test from nfsd_reply_cache_free Entries can only get a c_type of RC_REPLBUFF iff they are RC_DONE. Therefore the test for RC_DONE isn't necessary here. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
# f09841fd	28-Jan-2013	Jeff Layton <jlayton@kernel.org>	nfsd: add alloc and free functions for DRC entries Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
# 8a8bc40d	28-Jan-2013	Jeff Layton <jlayton@kernel.org>	nfsd: create a dedicated slabcache for DRC entries Currently we use kmalloc() which wastes a little bit of memory on each allocation since it's a power of 2 allocator. Since we're allocating 1024 of these now, and may need even more later, let's create a new slabcache for them. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
# 6dc88895	28-Jan-2013	Jeff Layton <jlayton@kernel.org>	nfsd: remove unneeded spinlock in nfsd_cache_update The locking rules for cache entries say that locking the cache_lock isn't needed if you're just touching the current entry. Earlier in this function we set rp->c_state to RC_UNUSED without any locking, so I believe it's ok to do the same here. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
# 7b9e8522	28-Jan-2013	Jeff Layton <jlayton@kernel.org>	nfsd: fix IPv6 address handling in the DRC Currently, it only stores the first 16 bytes of any address. struct sockaddr_in6 is 28 bytes however, so we're currently ignoring the last 12 bytes of the address. Expand the c_addr field to a sockaddr_in6, and cast it to a sockaddr_in as necessary. Also fix the comparator to use the existing RPC helpers for this. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
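On Linux, struct sockaddr_in6 holds a 2-byte family, a 2-byte port, 4 bytes of flow information, the 16-byte address, and a 4-byte scope ID. Keeping only the first 16 bytes therefore preserved just the first 8 bytes of the IPv6 address. The kernel fix relies on RPC helpers such as rpc_cmp_addr(); the user-space sketch below is not that code. It is a hypothetical `sketch_addr_equal()` that dispatches on the address family and compares the full address plus the port. For simplicity it ignores IPv6 scope IDs, which matter for link-local addresses.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Compare two socket addresses by family, full address, and port.
 * Simplification: IPv6 scope IDs are not compared.
 */
static bool sketch_addr_equal(const struct sockaddr_storage *a,
			      const struct sockaddr_storage *b)
{
	assert(a != NULL && b != NULL);
	if (a->ss_family != b->ss_family)
		return false;

	switch (a->ss_family) {
	case AF_INET: {
		const struct sockaddr_in *x = (const struct sockaddr_in *)a;
		const struct sockaddr_in *y = (const struct sockaddr_in *)b;

		return x->sin_addr.s_addr == y->sin_addr.s_addr &&
		       x->sin_port == y->sin_port;
	}
	case AF_INET6: {
		const struct sockaddr_in6 *x = (const struct sockaddr_in6 *)a;
		const struct sockaddr_in6 *y = (const struct sockaddr_in6 *)b;

		/* All 16 address bytes, not just the ones in the first 16 bytes of the struct. */
		return memcmp(&x->sin6_addr, &y->sin6_addr,
			      sizeof(x->sin6_addr)) == 0 &&
		       x->sin6_port == y->sin6_port;
	}
	default:
		return false;
	}
}
```

Two IPv6 clients that differ only in the last byte of their address compare as different here, which the truncated 16-byte copy could not detect.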
# 1091006c	23-Jan-2011	J. Bruce Fields <bfields@redhat.com>	nfsd: turn on reply cache for NFSv4 It's sort of ridiculous that we've never had a working reply cache for NFSv4. On the other hand, we may still not: our current reply cache is likely not very good, especially in the TCP case (which is the only case that matters for v4). What we really need here is some serious testing. Anyway, here's a start. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
# 5a0e3ad6	24-Mar-2010	Tejun Heo <tj@kernel.org>	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities to include those headers directly instead of assuming availability. As this conversion needs to touch a large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the following. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. i.e. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and tries to put the new include such that its order conforms to its surroundings. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have a fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. 
Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually widely available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build tests were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as a bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable on most builds of the specific arch. 
Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
# 7663dacd	04-Dec-2009	J. Bruce Fields <bfields@citi.umich.edu>	nfsd: remove pointless paths in file headers The new .h files have paths at the top that are now out of date. While we're here, just remove all of those from fs/nfsd; they never served any purpose. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
# 9a74af21	03-Dec-2009	Boaz Harrosh <bharrosh@panasas.com>	nfsd: Move private headers to source directory Lots of include/linux/nfsd/* headers are only used by nfsd module. Move them to the source directory Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
# 341eb184	03-Dec-2009	Boaz Harrosh <bharrosh@panasas.com>	nfsd: Source files #include cleanups Now that the headers are fixed and carry their own weight, all fs/nfsd/ source files can include a minimal set of headers and still compile just fine. This patch should improve the compilation speed of the nfsd module. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
# cf0a586c	31-Mar-2009	Greg Banks <gnb@sgi.com>	knfsd: fix reply cache memory corruption Fix a regression in the reply cache introduced when the code was converted to use proper Linux lists. When a new entry needs to be inserted, the case where all the entries are currently being used by threads is not correctly detected. This can result in memory corruption and a crash. In the current code this is an extremely unlikely corner case; it would require the machine to have 1024 nfsd threads and all of them to be busy at the same time. However, upcoming reply cache changes make this more likely; a crash due to this problem was actually observed in the field. Signed-off-by: Greg Banks <gnb@sgi.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
# fca4217c	31-Mar-2009	Greg Banks <gnb@sgi.com>	knfsd: reply cache cleanups Make REQHASH() an inline function. Rename hash_list to cache_hash. Fix an obsolete comment. Signed-off-by: Greg Banks <gnb@sgi.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
# d5c3428b	09-Nov-2007	J. Bruce Fields <bfields@citi.umich.edu>	nfsd: fail module init on reply cache init failure If the reply cache initialization fails due to a kmalloc failure, currently we try to soldier on with a reduced (or nonexistent) reply cache. Better to just fail immediately: the failure is then much easier to understand and debug, and it could save us complexity in some later code. (But actually, it doesn't help currently because the cache is also turned off in some odd failure cases; we should probably find a better way to handle those failure cases some day.) Fix some minor style problems while we're at it, and rename nfsd_cache_init() to remove the need for a comment describing it. Acked-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
# 27459f09	12-Feb-2007	Chuck Lever <chuck.lever@oracle.com>	[PATCH] knfsd: SUNRPC: Provide room in svc_rqst for larger addresses Expand the rq_addr field to allow it to contain larger addresses. Specifically, we replace a 'sockaddr_in' with a 'sockaddr_storage', then everywhere the 'sockaddr_in' was referenced, we use instead an accessor function (svc_addr_in) which safely casts the _storage to _in. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
# 4b3bb06b	08-Dec-2006	Yan Burman <burman.yan@gmail.com>	[PATCH] nfsd: replace kmalloc+memset with kcalloc + simplify NULL check Replace kmalloc+memset with kcalloc and simplify Signed-off-by: Yan Burman <burman.yan@gmail.com> Cc: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
# c7afef1f	20-Oct-2006	Al Viro <viro@ftp.linux.org.uk>	[PATCH] nfsd: misc endianness annotations Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no> Acked-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
# f116629d	26-Jun-2006	Akinobu Mita <mita@miraclelinux.com>	[PATCH] fs: use list_move() This patch converts the combination of list_del(A) and list_add(A, B) to list_move(A, B) under fs/. Cc: Ian Kent <raven@themaw.net> Acked-by: Joel Becker <joel.becker@oracle.com> Cc: Neil Brown <neilb@cse.unsw.edu.au> Cc: Hans Reiser <reiserfs-dev@namesys.com> Cc: Urban Widmark <urban@teststation.com> Acked-by: David Howells <dhowells@redhat.com> Acked-by: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
# f99d49ad	07-Nov-2005	Jesper Juhl <jesper.juhl@gmail.com>	[PATCH] kfree cleanup: fs This is the fs/ part of the big kfree cleanup patch. Remove pointless checks for NULL prior to calling kfree() in fs/. Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
# 1da177e4	16-Apr-2005	Linus Torvalds <torvalds@ppc970.osdl.org>	Linux-2.6.12-rc2 Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!