History log of /linux-master/net/sunrpc/cache.c
Revision Date Author Comments
# 8032bf12 09-Oct-2022 Jason A. Donenfeld <Jason@zx2c4.com>

treewide: use get_random_u32_below() instead of deprecated function

This is a simple mechanical transformation done by:

@@
expression E;
@@
- prandom_u32_max
+ get_random_u32_below
(E)

Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs
Reviewed-by: SeongJae Park <sj@kernel.org> # for damon
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> # for infiniband
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> # for arm
Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 81895a65 05-Oct-2022 Jason A. Donenfeld <Jason@zx2c4.com>

treewide: use prandom_u32_max() when possible, part 1

Rather than incurring a division or requesting too many random bytes for
the given range, use the prandom_u32_max() function, which only takes
the minimum required bytes from the RNG and avoids divisions. This was
done mechanically with this coccinelle script:

@basic@
expression E;
type T;
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
typedef u64;
@@
(
- ((T)get_random_u32() % (E))
+ prandom_u32_max(E)
|
- ((T)get_random_u32() & ((E) - 1))
+ prandom_u32_max(E * XXX_MAKE_SURE_E_IS_POW2)
|
- ((u64)(E) * get_random_u32() >> 32)
+ prandom_u32_max(E)
|
- ((T)get_random_u32() & ~PAGE_MASK)
+ prandom_u32_max(PAGE_SIZE)
)

@multi_line@
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
identifier RAND;
expression E;
@@

- RAND = get_random_u32();
... when != RAND
- RAND %= (E);
+ RAND = prandom_u32_max(E);

// Find a potential literal
@literal_mask@
expression LITERAL;
type T;
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
position p;
@@

((T)get_random_u32()@p & (LITERAL))

// Add one to the literal.
@script:python add_one@
literal << literal_mask.LITERAL;
RESULT;
@@

value = None
if literal.startswith('0x'):
value = int(literal, 16)
elif literal[0] in '123456789':
value = int(literal, 10)
if value is None:
print("I don't know how to handle %s" % (literal))
cocci.include_match(False)
elif value == 2**32 - 1 or value == 2**31 - 1 or value == 2**24 - 1 or value == 2**16 - 1 or value == 2**8 - 1:
print("Skipping 0x%x for cleanup elsewhere" % (value))
cocci.include_match(False)
elif value & (value + 1) != 0:
print("Skipping 0x%x because it's not a power of two minus one" % (value))
cocci.include_match(False)
elif literal.startswith('0x'):
coccinelle.RESULT = cocci.make_expr("0x%x" % (value + 1))
else:
coccinelle.RESULT = cocci.make_expr("%d" % (value + 1))

// Replace the literal mask with the calculated result.
@plus_one@
expression literal_mask.LITERAL;
position literal_mask.p;
expression add_one.RESULT;
identifier FUNC;
@@

- (FUNC()@p & (LITERAL))
+ prandom_u32_max(RESULT)

@collapse_ret@
type T;
identifier VAR;
expression E;
@@

{
- T VAR;
- VAR = (E);
- return VAR;
+ return E;
}

@drop_var@
type T;
identifier VAR;
@@

{
- T VAR;
... when != VAR
}

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: KP Singh <kpsingh@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz> # for ext4 and sbitmap
Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> # for drbd
Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390
Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc
Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 37324e6b 31-Mar-2022 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Cache deferral injection

Cache deferral injection stress-tests the cache deferral logic as
well as upper layer protocol deferred request handlers. This
facility is for developers and professional testers to ensure
coverage of the rqst deferral code paths. To date, we haven't
had an adequate way to ensure these code paths are covered
during testing, short of temporary code changes to force their
use.

A file called /sys/kernel/debug/fail_sunrpc/ignore-cache-wait
enables administrators to disable cache deferral injection while
allowing other types of sunrpc errors to be injected. The default
setting is that cache deferral injection is enabled (ignore=false).

To enable support for cache deferral injection,
CONFIG_FAULT_INJECTION, CONFIG_FAULT_INJECTION_DEBUG_FS, and
CONFIG_SUNRPC_DEBUG must all be set to "Y".

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 359745d7 21-Jan-2022 Muchun Song <songmuchun@bytedance.com>

proc: remove PDE_DATA() completely

Remove PDE_DATA() completely and replace it with pde_data().

[akpm@linux-foundation.org: fix naming clash in drivers/nubus/proc.c]
[akpm@linux-foundation.org: now fix it properly]

Link: https://lkml.kernel.org/r/20211124081956.87711-2-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Alexey Gladkov <gladkov.alexey@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 0c217d50 01-Sep-2021 NeilBrown <neilb@suse.de>

SUNRPC: improve error response to over-size gss credential

When the NFS server receives a large gss (kerberos) credential and tries
to pass it up to rpc.svcgssd (which is deprecated), it triggers an
infinite loop in cache_read().

cache_request() always returns -EAGAIN, and this causes a "goto again".

This patch:
- changes the error to -E2BIG to avoid the infinite loop, and
- generates a WARN_ONCE when rsi_request first sees an over-sized
credential. The warning suggests switching to gssproxy.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=196583
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 4b5cff7e 27-Nov-2020 Roberto Bergantinos Corpas <rbergant@redhat.com>

sunrpc: clean-up cache downcall

We can simplify code around cache_downcall unifying memory
allocations using kvmalloc. This has the benefit of getting rid of
cache_slow_downcall (and queue_io_mutex), and also matches userland
allocation size and limits.

Signed-off-by: Roberto Bergantinos Corpas <rbergant@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 27a1e8a0 19-Oct-2020 Roberto Bergantinos Corpas <rbergant@redhat.com>

sunrpc: raise kernel RPC channel buffer size

Its possible that using AUTH_SYS and mountd manage-gids option a
user may hit the 8k RPC channel buffer limit. This have been observed
on field, causing unanswered RPCs on clients after mountd fails to
write on channel :

rpc.mountd[11231]: auth_unix_gid: error writing reply

Userland nfs-utils uses a buffer size of 32k (RPC_CHAN_BUF_SIZE), so
lets match those two.

Signed-off-by: Roberto Bergantinos Corpas <rbergant@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 0aa99c4d 21-Sep-2020 J. Bruce Fields <bfields@redhat.com>

sunrpc: simplify do_cache_clean

Is it just me, or is the logic written in a slightly convoluted way?

I find it a little easier to read this way.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 9dbc1f45 15-Sep-2020 Xu Wang <vulab@iscas.ac.cn>

sunrpc: cache : Replace seq_printf with seq_puts

seq_puts is a lot cheaper than seq_printf, so use that to print
literal strings.

Signed-off-by: Xu Wang <vulab@iscas.ac.cn>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 43e33924 05-Apr-2020 Yihao Wu <wuyihao@linux.alibaba.com>

SUNRPC/cache: Fix unsafe traverse caused double-free in cache_purge

Deleting list entry within hlist_for_each_entry_safe is not safe unless
next pointer (tmp) is protected too. It's not, because once hash_lock
is released, cache_clean may delete the entry that tmp points to. Then
cache_purge can walk to a deleted entry and tries to double free it.

Fix this bug by holding only the deleted entry's reference.

Suggested-by: NeilBrown <neilb@suse.de>
Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com>
Reviewed-by: NeilBrown <neilb@suse.de>
[ cel: removed unused variable ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 78a947f5 01-Mar-2020 Trond Myklebust <trondmy@gmail.com>

sunrpc: Add tracing for cache events

Add basic tracing for debugging the sunrpc cache events.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 277f27e2 01-Mar-2020 Trond Myklebust <trondmy@gmail.com>

SUNRPC/cache: Allow garbage collection of invalid cache entries

If the cache entry never gets initialised, we want the garbage
collector to be able to evict it. Otherwise if the upcall daemon
fails to initialise the entry, we end up never expiring it.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
[ cel: resolved a merge conflict ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 65286b88 01-Mar-2020 Trond Myklebust <trondmy@gmail.com>

nfsd: export upcalls must not return ESTALE when mountd is down

If the rpc.mountd daemon goes down, then that should not cause all
exports to start failing with ESTALE errors. Let's explicitly
distinguish between the cache upcall cases that need to time out,
and those that do not.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 51cae673 19-Feb-2020 Amol Grover <frextrite@gmail.com>

sunrpc: Pass lockdep expression to RCU lists

detail->hash_table[] is traversed using hlist_for_each_entry_rcu
outside an RCU read-side critical section but under the protection
of detail->hash_lock.

Hence, add corresponding lockdep expression to silence false-positive
warnings, and harden RCU lists.

Signed-off-by: Amol Grover <frextrite@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


# 97a32539 03-Feb-2020 Alexey Dobriyan <adobriyan@gmail.com>

proc: convert everything to "struct proc_ops"

The most notable change is DEFINE_SHOW_ATTRIBUTE macro split in
seq_file.h.

Conversion rule is:

llseek => proc_lseek
unlocked_ioctl => proc_ioctl

xxx => proc_xxx

delete ".owner = THIS_MODULE" line

[akpm@linux-foundation.org: fix drivers/isdn/capi/kcapi_proc.c]
[sfr@canb.auug.org.au: fix kernel/sched/psi.c]
Link: http://lkml.kernel.org/r/20200122180545.36222f50@canb.auug.org.au
Link: http://lkml.kernel.org/r/20191225172546.GB13378@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 809fe3c5 06-Jan-2020 Trond Myklebust <trondmy@gmail.com>

sunrpc: clean up cache entry add/remove from hashtable

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 1d821637 06-Jan-2020 Trond Myklebust <trondmy@gmail.com>

sunrpc: Fix potential leaks in sunrpc_cache_unhash()

When we unhash the cache entry, we need to handle any pending upcalls
by calling cache_fresh_unlocked().

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# f559935e 20-Oct-2017 Arnd Bergmann <arnd@arndb.de>

nfs: use time64_t internally

The timestamps for the cache are all in boottime seconds, so they
don't overflow 32-bit values, but the use of time_t is deprecated
because it generally does overflow when used with wall-clock time.

There are multiple possible ways of avoiding it:

- leave time_t, which is safe here, but forces others to
look into this code to determine that it is over and over.

- use a more generic type, like 'int' or 'long', which is known
to be sufficient here but loses the documentation of referring
to timestamps

- use ktime_t everywhere, and convert into seconds in the few
places where we want realtime-seconds. The conversion is
sometimes expensive, but not more so than the conversion we
do today.

- use time64_t to clarify that this code is safe. Nothing would
change for 64-bit architectures, but it is slightly less
efficient on 32-bit architectures.

Without a clear winner of the three approaches above, this picks
the last one, favouring readability over a small performance
loss on 32-bit architectures.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>


# 5fcaf698 01-Oct-2019 Pavel Tikhomirov <ptikhomirov@virtuozzo.com>

sunrpc: fix crash when cache_head become valid before update

I was investigating a crash in our Virtuozzo7 kernel which happened in
in svcauth_unix_set_client. I found out that we access m_client field
in ip_map structure, which was received from sunrpc_cache_lookup (we
have a bit older kernel, now the code is in sunrpc_cache_add_entry), and
these field looks uninitialized (m_client == 0x74 don't look like a
pointer) but in the cache_head in flags we see 0x1 which is CACHE_VALID.

It looks like the problem appeared from our previous fix to sunrpc (1):
commit 4ecd55ea0742 ("sunrpc: fix cache_head leak due to queued
request")

And we've also found a patch already fixing our patch (2):
commit d58431eacb22 ("sunrpc: don't mark uninitialised items as VALID.")

Though the crash is eliminated, I think the core of the problem is not
completely fixed:

Neil in the patch (2) makes cache_head CACHE_NEGATIVE, before
cache_fresh_locked which was added in (1) to fix crash. These way
cache_is_valid won't say the cache is valid anymore and in
svcauth_unix_set_client the function cache_check will return error
instead of 0, and we don't count entry as initialized.

But it looks like we need to remove cache_fresh_locked completely in
sunrpc_cache_lookup:

In (1) we've only wanted to make cache_fresh_unlocked->cache_dequeue so
that cache_requests with no readers also release corresponding
cache_head, to fix their leak. We with Vasily were not sure if
cache_fresh_locked and cache_fresh_unlocked should be used in pair or
not, so we've guessed to use them in pair.

Now we see that we don't want the CACHE_VALID bit set here by
cache_fresh_locked, as "valid" means "initialized" and there is no
initialization in sunrpc_cache_add_entry. Both expiry_time and
last_refresh are not used in cache_fresh_unlocked code-path and also not
required for the initial fix.

So to conclude cache_fresh_locked was called by mistake, and we can just
safely remove it instead of crutching it with CACHE_NEGATIVE. It looks
ideologically better for me. Hope I don't miss something here.

Here is our crash backtrace:
[13108726.326291] BUG: unable to handle kernel NULL pointer dereference at 0000000000000074
[13108726.326365] IP: [<ffffffffc01f79eb>] svcauth_unix_set_client+0x2ab/0x520 [sunrpc]
[13108726.326448] PGD 0
[13108726.326468] Oops: 0002 [#1] SMP
[13108726.326497] Modules linked in: nbd isofs xfs loop kpatch_cumulative_81_0_r1(O) xt_physdev nfnetlink_queue bluetooth rfkill ip6table_nat nf_nat_ipv6 ip_vs_wrr ip_vs_wlc ip_vs_sh nf_conntrack_netlink ip_vs_sed ip_vs_pe_sip nf_conntrack_sip ip_vs_nq ip_vs_lc ip_vs_lblcr ip_vs_lblc ip_vs_ftp ip_vs_dh nf_nat_ftp nf_conntrack_ftp iptable_raw xt_recent nf_log_ipv6 xt_hl ip6t_rt nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_TCPMSS xt_tcpmss vxlan ip6_udp_tunnel udp_tunnel xt_statistic xt_NFLOG nfnetlink_log dummy xt_mark xt_REDIRECT nf_nat_redirect raw_diag udp_diag tcp_diag inet_diag netlink_diag af_packet_diag unix_diag rpcsec_gss_krb5 xt_addrtype ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 ebtable_nat ebtable_broute nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_mangle ip6table_raw nfsv4
[13108726.327173] dns_resolver cls_u32 binfmt_misc arptable_filter arp_tables ip6table_filter ip6_tables devlink fuse_kio_pcs ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_nat iptable_nat nf_nat_ipv4 xt_comment nf_conntrack_ipv4 nf_defrag_ipv4 xt_wdog_tmo xt_multiport bonding xt_set xt_conntrack iptable_filter iptable_mangle kpatch(O) ebtable_filter ebt_among ebtables ip_set_hash_ip ip_set nfnetlink vfat fat skx_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass fuse pcspkr ses enclosure joydev sg mei_me hpwdt hpilo lpc_ich mei ipmi_si shpchp ipmi_devintf ipmi_msghandler xt_ipvs acpi_power_meter ip_vs_rr nfsv3 nfsd auth_rpcgss nfs_acl nfs lockd grace fscache nf_nat cls_fw sch_htb sch_cbq sch_sfq ip_vs em_u32 nf_conntrack tun br_netfilter veth overlay ip6_vzprivnet ip6_vznetstat ip_vznetstat
[13108726.327817] ip_vzprivnet vziolimit vzevent vzlist vzstat vznetstat vznetdev vzmon vzdev bridge pio_kaio pio_nfs pio_direct pfmt_raw pfmt_ploop1 ploop ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper scsi_transport_iscsi 8021q syscopyarea sysfillrect garp sysimgblt fb_sys_fops mrp stp ttm llc bnx2x crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel drm dm_multipath ghash_clmulni_intel uas aesni_intel lrw gf128mul glue_helper ablk_helper cryptd tg3 smartpqi scsi_transport_sas mdio libcrc32c i2c_core usb_storage ptp pps_core wmi sunrpc dm_mirror dm_region_hash dm_log dm_mod [last unloaded: kpatch_cumulative_82_0_r1]
[13108726.328403] CPU: 35 PID: 63742 Comm: nfsd ve: 51332 Kdump: loaded Tainted: G W O ------------ 3.10.0-862.20.2.vz7.73.29 #1 73.29
[13108726.328491] Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 10/02/2018
[13108726.328554] task: ffffa0a6a41b1160 ti: ffffa0c2a74bc000 task.ti: ffffa0c2a74bc000
[13108726.328610] RIP: 0010:[<ffffffffc01f79eb>] [<ffffffffc01f79eb>] svcauth_unix_set_client+0x2ab/0x520 [sunrpc]
[13108726.328706] RSP: 0018:ffffa0c2a74bfd80 EFLAGS: 00010246
[13108726.328750] RAX: 0000000000000001 RBX: ffffa0a6183ae000 RCX: 0000000000000000
[13108726.328811] RDX: 0000000000000074 RSI: 0000000000000286 RDI: ffffa0c2a74bfcf0
[13108726.328864] RBP: ffffa0c2a74bfe00 R08: ffffa0bab8c22960 R09: 0000000000000001
[13108726.328916] R10: 0000000000000001 R11: 0000000000000001 R12: ffffa0a32aa7f000
[13108726.328969] R13: ffffa0a6183afac0 R14: ffffa0c233d88d00 R15: ffffa0c2a74bfdb4
[13108726.329022] FS: 0000000000000000(0000) GS:ffffa0e17f9c0000(0000) knlGS:0000000000000000
[13108726.329081] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[13108726.332311] CR2: 0000000000000074 CR3: 00000026a1b28000 CR4: 00000000007607e0
[13108726.334606] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[13108726.336754] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[13108726.338908] PKRU: 00000000
[13108726.341047] Call Trace:
[13108726.343074] [<ffffffff8a2c78b4>] ? groups_alloc+0x34/0x110
[13108726.344837] [<ffffffffc01f5eb4>] svc_set_client+0x24/0x30 [sunrpc]
[13108726.346631] [<ffffffffc01f2ac1>] svc_process_common+0x241/0x710 [sunrpc]
[13108726.348332] [<ffffffffc01f3093>] svc_process+0x103/0x190 [sunrpc]
[13108726.350016] [<ffffffffc07d605f>] nfsd+0xdf/0x150 [nfsd]
[13108726.351735] [<ffffffffc07d5f80>] ? nfsd_destroy+0x80/0x80 [nfsd]
[13108726.353459] [<ffffffff8a2bf741>] kthread+0xd1/0xe0
[13108726.355195] [<ffffffff8a2bf670>] ? create_kthread+0x60/0x60
[13108726.356896] [<ffffffff8a9556dd>] ret_from_fork_nospec_begin+0x7/0x21
[13108726.358577] [<ffffffff8a2bf670>] ? create_kthread+0x60/0x60
[13108726.360240] Code: 4c 8b 45 98 0f 8e 2e 01 00 00 83 f8 fe 0f 84 76 fe ff ff 85 c0 0f 85 2b 01 00 00 49 8b 50 40 b8 01 00 00 00 48 89 93 d0 1a 00 00 <f0> 0f c1 02 83 c0 01 83 f8 01 0f 8e 53 02 00 00 49 8b 44 24 38
[13108726.363769] RIP [<ffffffffc01f79eb>] svcauth_unix_set_client+0x2ab/0x520 [sunrpc]
[13108726.365530] RSP <ffffa0c2a74bfd80>
[13108726.367179] CR2: 0000000000000074

Fixes: d58431eacb22 ("sunrpc: don't mark uninitialised items as VALID.")
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# f69d6d8e 18-Aug-2019 Jeff Layton <jeff.layton@primarydata.com>

sunrpc: add a new cache_detail operation for when a cache is flushed

When the exports table is changed, exportfs will usually write a new
time to the "flush" file in the nfsd.export cache procfile. This tells
the kernel to flush any entries that are older than that value.

This gives us a mechanism to tell whether an unexport might have
occurred. Add a new ->flush cache_detail operation that is called after
flushing the cache whenever someone writes to a "flush" file.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 64a38e84 26-Jul-2019 Dave Wysochanski <dwysocha@redhat.com>

SUNRPC: Track writers of the 'channel' file to improve cache_listeners_exist

The sunrpc cache interface is susceptible to being fooled by a rogue
process just reading a 'channel' file. If this happens the kernel
may think a valid daemon exists to service the cache when it does not.
For example, the following may fool the kernel:
cat /proc/net/rpc/auth.unix.gid/channel

Change the tracking of readers to writers when considering whether a
listener exists as all valid daemon processes either open a channel
file O_RDWR or O_WRONLY. While this does not prevent a rogue process
from "stealing" a message from the kernel, it does at least improve
the kernels perception of whether a valid process servicing the cache
exists.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# a57caf8c 08-Jul-2019 Denis Efremov <efremov@linux.com>

sunrpc/cache: remove the exporting of cache_seq_next

The function cache_seq_next is declared static and marked
EXPORT_SYMBOL_GPL, which is at best an odd combination. Because the
function is not used outside of the net/sunrpc/cache.c file it is
defined in, this commit removes the EXPORT_SYMBOL_GPL() marking.

Fixes: d48cf356a130 ("SUNRPC: Remove non-RCU protected lookup")
Signed-off-by: Denis Efremov <efremov@linux.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# ddc64d0a 31-May-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 363

Based on 1 normalized pattern(s):

released under terms in gpl version 2 see copying

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-only

has been chosen to replace the boilerplate/reference in 5 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Armijn Hemel <armijn@tjaldur.nl>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190531081035.689962394@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 9d69338c 21-Mar-2019 NeilBrown <neilb@suse.com>

sunrpc/cache: handle missing listeners better.

If no handler (such as rpc.mountd) has opened
a cache 'channel', the sunrpc cache responds to
all lookup requests with -ENOENT. This is particularly
important for the auth.unix.gid cache which is
optional.

If the channel was open briefly and an upcall was written to it,
this upcall remains pending even when the handler closes the
channel. When an upcall is pending, the code currently
doesn't check if there are still listeners, it only performs
that check before sending an upcall.

As the cache treads a recently closes channel (closed less than
30 seconds ago) as "potentially still open", there is a
reasonable sized window when a request can become pending
in a closed channel, and thereby block lookups indefinitely.

This can easily be demonstrated by running
cat /proc/net/rpc/auth.unix.gid/channel

and then trying to mount an NFS filesystem from this host. It
will block indefinitely (unless mountd is run with --manage-gids,
or krb5 is used).

When cache_check() finds that an upcall is pending, it should
perform the "cache_listeners_exist()" exist test. If no
listeners do exist, the request should be negated.

With this change in place, there can still be a 30second wait on
mount, until the cache gives up waiting for a handler to come
back, but this is much better than an indefinite wait.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# d58431ea 04-Apr-2019 NeilBrown <neilb@suse.com>

sunrpc: don't mark uninitialised items as VALID.

A recent commit added a call to cache_fresh_locked()
when an expired item was found.
The call sets the CACHE_VALID flag, so it is important
that the item actually is valid.
There are two ways it could be valid:
1/ If ->update has been called to fill in relevant content
2/ if CACHE_NEGATIVE is set, to say that content doesn't exist.

An expired item that is waiting for an update will be neither.
Setting CACHE_VALID will mean that a subsequent call to cache_put()
will be likely to dereference uninitialised pointers.

So we must make sure the item is valid, and we already have code to do
that in try_to_negate_entry(). This takes the hash lock and so cannot
be used directly, so take out the two lines that we need and use them.

Now cache_fresh_locked() is certain to be called only on
a valid item.

Cc: stable@kernel.org # 2.6.35
Fixes: 4ecd55ea0742 ("sunrpc: fix cache_head leak due to queued request")
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 4ecd55ea 28-Nov-2018 Vasily Averin <vvs@virtuozzo.com>

sunrpc: fix cache_head leak due to queued request

After commit d202cce8963d, an expired cache_head can be removed from the
cache_detail's hash.

However, the expired cache_head may be waiting for a reply from a
previously submitted request. Such a cache_head has an increased
refcounter and therefore it won't be freed after cache_put(freeme).

Because the cache_head was removed from the hash it cannot be found
during cache_clean() and can be leaked forever, together with stalled
cache_request and other taken resources.

In our case we noticed it because an entry in the export cache was
holding a reference on a filesystem.

Fixes d202cce8963d ("sunrpc: never return expired entries in sunrpc_cache_lookup")
Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Cc: stable@kernel.org # 2.6.35
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 1863d77f 01-Oct-2018 Trond Myklebust <trondmy@gmail.com>

SUNRPC: Replace the cache_detail->hash_lock with a regular spinlock

Now that the reader functions are all RCU protected, use a regular
spinlock rather than a reader/writer lock.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# d48cf356 01-Oct-2018 Trond Myklebust <trondmy@gmail.com>

SUNRPC: Remove non-RCU protected lookup

Clean up the cache code by removing the non-RCU protected lookup.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# ae74136b 02-Oct-2018 Trond Myklebust <trondmy@gmail.com>

SUNRPC: Allow cache lookups to use RCU protection rather than the r/w spinlock

Instead of the reader/writer spinlock, allow cache lookups to use RCU
for looking up entries. This is more efficient since modifications can
occur while other entries are being looked up.

Note that for now, we keep the reader/writer spinlock until all users
have been converted to use RCU-safe freeing of their cache entries.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# b92a8fab 01-Oct-2018 Trond Myklebust <trondmy@gmail.com>

SUNRPC: Refactor sunrpc_cache_lookup

This is a trivial split into lookup and insert functions, no change in
behavior.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 6396bb22 12-Jun-2018 Kees Cook <keescook@chromium.org>

treewide: kzalloc() -> kcalloc()

The kzalloc() function has a 2-factor argument form, kcalloc(). This
patch replaces cases of:

kzalloc(a * b, gfp)

with:
kcalloc(a * b, gfp)

as well as handling cases of:

kzalloc(a * b * c, gfp)

with:

kzalloc(array3_size(a, b, c), gfp)

as it's slightly less ugly than:

kzalloc_array(array_size(a, b), c, gfp)

This does, however, attempt to ignore constant size factors like:

kzalloc(4 * 1024, gfp)

though any constants defined via macros get caught up in the conversion.

Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.

The Coccinelle script used for this was:

// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@

(
kzalloc(
- (sizeof(TYPE)) * E
+ sizeof(TYPE) * E
, ...)
|
kzalloc(
- (sizeof(THING)) * E
+ sizeof(THING) * E
, ...)
)

// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@

(
kzalloc(
- sizeof(u8) * (COUNT)
+ COUNT
, ...)
|
kzalloc(
- sizeof(__u8) * (COUNT)
+ COUNT
, ...)
|
kzalloc(
- sizeof(char) * (COUNT)
+ COUNT
, ...)
|
kzalloc(
- sizeof(unsigned char) * (COUNT)
+ COUNT
, ...)
|
kzalloc(
- sizeof(u8) * COUNT
+ COUNT
, ...)
|
kzalloc(
- sizeof(__u8) * COUNT
+ COUNT
, ...)
|
kzalloc(
- sizeof(char) * COUNT
+ COUNT
, ...)
|
kzalloc(
- sizeof(unsigned char) * COUNT
+ COUNT
, ...)
)

// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@

(
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * (COUNT_ID)
+ COUNT_ID, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * COUNT_ID
+ COUNT_ID, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * (COUNT_CONST)
+ COUNT_CONST, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * COUNT_CONST
+ COUNT_CONST, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * (COUNT_ID)
+ COUNT_ID, sizeof(THING)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * COUNT_ID
+ COUNT_ID, sizeof(THING)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * (COUNT_CONST)
+ COUNT_CONST, sizeof(THING)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * COUNT_CONST
+ COUNT_CONST, sizeof(THING)
, ...)
)

// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@

- kzalloc
+ kcalloc
(
- SIZE * COUNT
+ COUNT, SIZE
, ...)

// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@

(
kzalloc(
- sizeof(TYPE) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kzalloc(
- sizeof(TYPE) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kzalloc(
- sizeof(TYPE) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kzalloc(
- sizeof(TYPE) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kzalloc(
- sizeof(THING) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kzalloc(
- sizeof(THING) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kzalloc(
- sizeof(THING) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kzalloc(
- sizeof(THING) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
)

// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@

(
kzalloc(
- sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
kzalloc(
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
kzalloc(
- sizeof(THING1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
kzalloc(
- sizeof(THING1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
kzalloc(
- sizeof(TYPE1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
|
kzalloc(
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
)

// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@

(
kzalloc(
- (COUNT) * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- COUNT * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- COUNT * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- (COUNT) * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- COUNT * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- (COUNT) * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- (COUNT) * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- COUNT * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
)

// Any remaining multi-factor products, first at least 3-factor products,
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@

(
kzalloc(C1 * C2 * C3, ...)
|
kzalloc(
- (E1) * E2 * E3
+ array3_size(E1, E2, E3)
, ...)
|
kzalloc(
- (E1) * (E2) * E3
+ array3_size(E1, E2, E3)
, ...)
|
kzalloc(
- (E1) * (E2) * (E3)
+ array3_size(E1, E2, E3)
, ...)
|
kzalloc(
- E1 * E2 * E3
+ array3_size(E1, E2, E3)
, ...)
)

// And then all remaining 2 factors products when they're not all constants,
// keeping sizeof() as the second factor argument.
@@
expression THING, E1, E2;
type TYPE;
constant C1, C2, C3;
@@

(
kzalloc(sizeof(THING) * C2, ...)
|
kzalloc(sizeof(TYPE) * C2, ...)
|
kzalloc(C1 * C2 * C3, ...)
|
kzalloc(C1 * C2, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * (E2)
+ E2, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * E2
+ E2, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * (E2)
+ E2, sizeof(THING)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * E2
+ E2, sizeof(THING)
, ...)
|
- kzalloc
+ kcalloc
(
- (E1) * E2
+ E1, E2
, ...)
|
- kzalloc
+ kcalloc
(
- (E1) * (E2)
+ E1, E2
, ...)
|
- kzalloc
+ kcalloc
(
- E1 * E2
+ E1, E2
, ...)
)

Signed-off-by: Kees Cook <keescook@chromium.org>


# d6444062 23-Mar-2018 Joe Perches <joe@perches.com>

net: Use octal not symbolic permissions

Prefer the direct use of octal for permissions.

Done with checkpatch -f --types=SYMBOLIC_PERMS --fix-inplace
and some typing.

Miscellanea:

o Whitespace neatening around these conversions.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3b68e6ee 13-Feb-2018 NeilBrown <neilb@suse.com>

SUNRPC: cache: ignore timestamp written to 'flush' file.

The interface for flushing the sunrpc auth cache was poorly
designed and has caused problems a number of times.

The design is that you write a timestamp, and all entries
created before that time are discarded.
The most obvious problem is that this is not what people
actually want. They want to just flush the whole cache.
The 1-second granularity can be a problem, as can the use
of wall-clock time.

A current problem is that code will write the current time to
this file - expecting it to clear everything - and if the
seconds number ticks over before this timestamp is checked,
the test "then >= now" fails, and a full flush isn't forced.

So lets just drop the subtleties and always flush the whole
cache. The worst this could do is impose an extra cost
refilling it, but that would require someone to be using
non-standard tools.

We still report an error if the string written is not a number,
but we cause any valid number to flush the whole cache.

Reported-by: "Wang, Alan 1. (NSB - CN/Hangzhou)" <alan.1.wang@nokia-sbell.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# a9a08845 11-Feb-2018 Linus Torvalds <torvalds@linux-foundation.org>

vfs: do bulk POLL* -> EPOLL* replacement

This is the mindless scripted replacement of kernel use of POLL*
variables as described by Al, done by this script:

for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
done

with de-mangling cleanups yet to come.

NOTE! On almost all architectures, the EPOLL* constants have the same
values as the POLL* constants do. But they keyword here is "almost".
For various bad reasons they aren't the same, and epoll() doesn't
actually work quite correctly in some cases due to this on Sparc et al.

The next patch from Al will sort out the final differences, and we
should be all done.

Scripted-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# d34971a6 17-Oct-2017 Bhumika Goyal <bhumirks@gmail.com>

sunrpc: make the function arg as const

Make the struct cache_detail *tmpl argument of the function
cache_create_net as const as it is only getting passed to kmemup having
the argument as const void *.
Add const to the prototype too.

Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# ade994f4 02-Jul-2017 Al Viro <viro@zeniv.linux.org.uk>

net: annotate ->poll() instances

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 8ccc8691 07-Feb-2017 Kinglong Mee <kinglongmee@gmail.com>

sunrpc: use simple_read_from_buffer for reading cache flush

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>


# 6489a8f4 07-Feb-2017 Kinglong Mee <kinglongmee@gmail.com>

sunrpc: update the comments of sunrpc proc path

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>


# 863d7d9c 07-Feb-2017 Kinglong Mee <kinglongmee@gmail.com>

sunrpc/nfs: cleanup procfs/pipefs entry in cache_detail

Record flush/channel/content entries is useless, remove them.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>


# 471a930a 07-Feb-2017 Kinglong Mee <kinglongmee@gmail.com>

SUNRPC: Drop all entries from cache_detail when cache_purge()

User always free the cache_detail after sunrpc_destroy_cache_detail(),
so, it must cleanup up entries that left in the cache_detail,
otherwise, NULL reference may be caused when using the left entries.

Also, NeriBrown suggests "write a stand-alone cache_purge()."

v3, move the cache_fresh_unlocked() out of write lock,
v2, a stand-alone cache_purge(), not only for sunrpc_destroy_cache_detail

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 2b477c00 21-Dec-2016 Neil Brown <neilb@suse.com>

svcrpc: free contexts immediately on PROC_DESTROY

We currently handle a client PROC_DESTROY request by turning it
CACHE_NEGATIVE, setting the expired time to now, and then waiting for
cache_clean to clean it up later. Since we forgot to set the cache's
nextcheck value, that could take up to 30 minutes. Also, though there's
probably no real bug in this case, setting CACHE_NEGATIVE directly like
this probably isn't a great idea in general.

So let's just remove the entry from the cache directly, and move this
bit of cache manipulation to a helper function.

Signed-off-by: Neil Brown <neilb@suse.com>
Reported-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 2c935bc5 14-Nov-2016 Peter Zijlstra <peterz@infradead.org>

locking/atomic, kref: Add kref_read()

Since we need to change the implementation, stop exposing internals.

Provide kref_read() to read the current reference count; typically
used for debug messages.

Kills two anti-patterns:

atomic_read(&kref->refcount)
kref->refcount.counter

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 7c0f6ba6 24-Dec-2016 Linus Torvalds <torvalds@linux-foundation.org>

Replace <asm/uaccess.h> with <linux/uaccess.h> globally

This was entirely automated, using the script by Al:

PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
$(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.

Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 77b00bc0 01-Sep-2016 Ke Wang <ke.wang@spreadtrum.com>

sunrpc: queue work on system_power_efficient_wq

sunrpc uses workqueue to clean cache regulary. There is no real dependency
of executing work on the cpu which queueing it.

On a idle system, especially for a heterogeneous systems like big.LITTLE,
it is observed that the big idle cpu was woke up many times just to service
this work, which against the principle of power saving. It would be better
if we can schedule it on a cpu which the scheduler believes to be the most
appropriate one.

After apply this patch, system_wq will be replaced by
system_power_efficient_wq for sunrpc. This functionality is enabled when
CONFIG_WQ_POWER_EFFICIENT is selected.

Signed-off-by: Ke Wang <ke.wang@spreadtrum.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>


# d8d29138 02-Jun-2016 NeilBrown <neilb@suse.com>

sunrpc: remove 'inuse' flag from struct cache_detail.

This field is not currently in use.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 09cbfeaf 01-Apr-2016 Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros

PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.

This promise never materialized. And unlikely will.

We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE. And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.

Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.

Let's stop pretending that pages in page cache are special. They are
not.

The changes are pretty straight-forward:

- <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

- <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

- page_cache_get() -> get_page();

- page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.

The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.

There are few places in the code where coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation also
will be addressed with the separate patch.

virtual patch

@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# a6ab1e81 03-Mar-2016 NeilBrown <neilb@suse.com>

sunrpc/cache: drop reference when sunrpc_cache_pipe_upcall() detects a race

sunrpc_cache_pipe_upcall() can detect a race if CACHE_PENDING is no longer
set. In this case it aborts the queuing of the upcall.
However it has already taken a new counted reference on "h" and
doesn't "put" it, even though it frees the data structure holding the reference.

So let's delay the "cache_get" until we know we need it.

Fixes: f9e1aedc6c79 ("sunrpc/cache: remove races with queuing an upcall.")
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# b7052cd7 18-Feb-2016 Stefan Hajnoczi <stefanha@redhat.com>

sunrpc/cache: fix off-by-one in qword_get()

The qword_get() function NUL-terminates its output buffer. If the input
string is in hex format \xXXXX... and the same length as the output
buffer, there is an off-by-one:

int qword_get(char **bpp, char *dest, int bufsize)
{
...
while (len < bufsize) {
...
*dest++ = (h << 4) | l;
len++;
}
...
*dest = '\0';
return len;
}

This patch ensures the NUL terminator doesn't fall outside the output
buffer.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 5955102c 22-Jan-2016 Al Viro <viro@zeniv.linux.org.uk>

wrappers for ->i_mutex access

parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
inode_foo(inode) being mutex_foo(&inode->i_mutex).

Please, use those for access to ->i_mutex; over the coming cycle
->i_mutex will become rwsem, with ->lookup() done with it held
only shared.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 77862036 15-Oct-2015 Neil Brown <neilb@suse.com>

sunrpc/cache: make cache flushing more reliable.

The caches used to store sunrpc authentication information can be
flushed by writing a timestamp to a file in /proc.

This timestamp has a one-second resolution and any entry in cache that
was last_refreshed *before* that time is treated as expired.

This is problematic as it is not possible to reliably flush the cache
without interrupting NFS service.
If the current time is written to the "flush" file, any entry that was
added since the current second started will still be treated as valid.
If one second beyond than the current time is written to the file
then no entries can be valid until the second ticks over. This will
mean that no NFS request will be handled for up to 1 second.

To resolve this issue we make two changes:

1/ treat an entry as expired if the timestamp when it was last_refreshed
is before *or the same as* the expiry time. This means that current
code which writes out the current time will now flush the cache
reliably.

2/ when a new entry in added to the cache - set the last_refresh timestamp
to 1 second *beyond* the current flush time, when that not in the
past.
This ensures that newly added entries will always be valid.

Now that we have a very reliable way to flush the cache, and also
since we are using "since-boot" timestamps which are monotonic,
change cache_purge() to set the smallest future flush_time which
will work, and leave it there: don't revert to '1'.

Also disable the setting of the 'flush_time' far into the future.
That has never been useful and is now awkward as it would cause
last_refresh times to be strange.
Finally: if a request is made to set the 'flush_time' to the current
second, assume the intent is to flush the cache and advance it, if
necessary, to 1 second beyond the current 'flush_time' so that all
active entries will be deemed to be expired.

As part of this we need to add a 'cache_detail' arg to cache_init()
and cache_fresh_locked() so they can find the current ->flush_time.

Signed-off-by: NeilBrown <neilb@suse.com>
Reported-by: Olaf Kirch <okir@suse.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 129e5824 26-Jul-2015 Kinglong Mee <kinglongmee@gmail.com>

sunrpc: Switch to using hash list instead single list

Switch using list_head for cache_head in cache_detail,
it is useful of remove an cache_head entry directly from cache_detail.

v8, using hash list, not head list

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# c8c081b7 26-Jul-2015 Kinglong Mee <kinglongmee@gmail.com>

sunrpc/nfsd: Remove redundant code by exports seq_operations functions

Nfsd has implement a site of seq_operations functions as sunrpc's cache.
Just exports sunrpc's codes, and remove nfsd's redundant codes.

v8, same as v6

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 9936f2ae 26-Jul-2015 Kinglong Mee <kinglongmee@gmail.com>

sunrpc: Store cache_detail in seq_file's private directly

Cleanup.

Just store cache_detail in seq_file's private,
an allocated handle is redundant.

v8, same as v6.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 41416f23 15-Apr-2015 Rasmus Villemoes <linux@rasmusvillemoes.dk>

lib/string_helpers.c: change semantics of string_escape_mem

The current semantics of string_escape_mem are inadequate for one of its
current users, vsnprintf(). If that is to honour its contract, it must
know how much space would be needed for the entire escaped buffer, and
string_escape_mem provides no way of obtaining that (short of allocating a
large enough buffer (~4 times input string) to let it play with, and
that's definitely a big no-no inside vsnprintf).

So change the semantics for string_escape_mem to be more snprintf-like:
Return the size of the output that would be generated if the destination
buffer was big enough, but of course still only write to the part of dst
it is allowed to, and (contrary to snprintf) don't do '\0'-termination.
It is then up to the caller to detect whether output was truncated and to
append a '\0' if desired. Also, we must output partial escape sequences,
otherwise a call such as snprintf(buf, 3, "%1pE", "\123") would cause
printf to write a \0 to buf[2] but leaving buf[0] and buf[1] with whatever
they previously contained.

This also fixes a bug in the escaped_string() helper function, which used
to unconditionally pass a length of "end-buf" to string_escape_mem();
since the latter doesn't check osz for being insanely large, it would
happily write to dst. For example, kasprintf(GFP_KERNEL, "something and
then %pE", ...); is an easy way to trigger an oops.

In test-string_helpers.c, the -ENOMEM test is replaced with testing for
getting the expected return value even if the buffer is too small. We
also ensure that nothing is written (by relying on a NULL pointer deref)
if the output size is 0 by passing NULL - this has to work for
kasprintf("%pE") to work.

In net/sunrpc/cache.c, I think qword_add still has the same semantics.
Someone should definitely double-check this.

In fs/proc/array.c, I made the minimum possible change, but longer-term it
should stop poking around in seq_file internals.

[andriy.shevchenko@linux.intel.com: simplify qword_add]
[andriy.shevchenko@linux.intel.com: add missed curly braces]
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 1711fd9a 07-Mar-2015 Al Viro <viro@ZenIV.linux.org.uk>

sunrpc: fix braino in ->poll()

POLL_OUT isn't what callers of ->poll() are expecting to see; it's
actually __SI_POLL | 2 and it's a siginfo code, not a poll bitmap
bit...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@vger.kernel.org
Cc: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 1b2e122d 28-Nov-2014 Andy Shevchenko <andriy.shevchenko@linux.intel.com>

sunrpc/cache: convert to use string_escape_str()

There is nice kernel helper to escape a given strings by provided rules. Let's
use it instead of custom approach.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
[bfields@redhat.com: fix length calculation]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# ecca063b 15-Apr-2014 Kinglong Mee <kinglongmee@gmail.com>

SUNRPC: Fix printk that is not only for nfsd

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 63862b5b 11-Jan-2014 Aruna-Hewapathirane <aruna.hewapathirane@gmail.com>

net: replace macros net_random and net_srandom with direct calls to prandom

This patch removes the net_random and net_srandom macros and replaces
them with direct calls to the prandom ones. As new commits only seem to
use prandom_u32 there is no use to keep them around.
This change makes it easier to grep for users of prandom_u32.

Signed-off-by: Aruna-Hewapathirane <aruna.hewapathirane@gmail.com>
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 056785ea 12-Dec-2013 Andy Shevchenko <andriy.shevchenko@linux.intel.com>

net/sunrpc/cache: simplify code by using hex_pack_byte()

hex_pack_byte() is a fast way to convert a byte in its ASCII representation. We
may use it instead of custom approach.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# a95e691f 14-Jul-2013 Al Viro <viro@zeniv.linux.org.uk>

rpc_create_*_dir: don't bother with qstr

just pass the name

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 0bebc633 12-Jun-2013 NeilBrown <neilb@suse.de>

sunrpc: Don't schedule an upcall on a replaced cache entry.

When a cache entry is replaced, the "expiry_time" get set to
zero by a call to "cache_fresh_locked(..., 0)" at the end of
"sunrpc_cache_update".

This low expiry time makes cache_check() think that the 'refresh_age'
is negative, so the 'age' is comparatively large and a refresh is
triggered.
However refreshing a replaced entry it pointless, it cannot achieve
anything useful.

So teach cache_check to ignore a low refresh_age when expiry_time
is zero.

Reported-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 7715cde8 12-Jun-2013 NeilBrown <neilb@suse.de>

net/sunrpc: xpt_auth_cache should be ignored when expired.

commit d202cce8963d9268ff355a386e20243e8332b308
sunrpc: never return expired entries in sunrpc_cache_lookup

moved the 'entry is expired' test from cache_check to
sunrpc_cache_lookup, so that it happened early and some races could
safely be ignored.

However the ip_map (in svcauth_unix.c) has a separate single-item
cache which allows quick lookup without locking. An entry in this
case would not be subject to the expiry test and so could be used
well after it has expired.

This is not normally a big problem because the first time it is used
after it is expired an up-call will be scheduled to refresh the entry
(if it hasn't been scheduled already) and the old entry will then
be invalidated. So on the second attempt to use it after it has
expired, ip_map_cached_get will discard it.

However that is subtle and not ideal, so replace the "!cache_valid"
test with "cache_is_expired".
In doing this we drop the test on the "CACHE_VALID" bit. This is
unnecessary as the bit is never cleared, and an entry will only
be cached if the bit is set.

Reported-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 013920eb 12-Jun-2013 NeilBrown <neilb@suse.de>

sunrpc/cache: ensure items removed from cache do not have pending upcalls.

It is possible for a race to set CACHE_PENDING after cache_clean()
has removed a cache entry from the cache.
If CACHE_PENDING is still set when the entry is finally 'put',
the cache_dequeue() will never happen and we can leak memory.

So set a new flag 'CACHE_CLEANED' when we remove something from
the cache, and don't queue any upcall if it is set.

If CACHE_PENDING is set before CACHE_CLEANED, the call that
cache_clean() makes to cache_fresh_unlocked() will free memory
as needed. If CACHE_PENDING is set after CACHE_CLEANED, the
test in sunrpc_cache_pipe_upcall will ensure that the memory
is not allocated.

Reported-by: <bstroesser@ts.fujitsu.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 2a1c7f53 12-Jun-2013 NeilBrown <neilb@suse.de>

sunrpc/cache: use cache_fresh_unlocked consistently and correctly.

cache_fresh_unlocked() is called when a cache entry
has been updated and ensures that if there were any
pending upcalls, they are cleared.

So every time we update a cache entry, we should call this,
and this should be the only way that we try to clear
pending calls (that sort of uniformity makes code sooo much
easier to read).

try_to_negate_entry() will (possibly) mark an entry as
negative. If it doesn't, it is because the entry already
is VALID.
So the entry will be valid on exit, so it is appropriate to
call cache_fresh_unlocked().
So tidy up try_to_negate_entry() to do that, and remove
partial open-coded cache_fresh_unlocked() from the one
call-site of try_to_negate_entry().

In the other branch of the 'switch(cache_make_upcall())',
we again have a partial open-coded version of cache_fresh_unlocked().
Replace that with a real call.

And again in cache_clean(), use a real call to cache_fresh_unlocked().

These call sites might previously have called
cache_revisit_request() if CACHE_PENDING wasn't set.
This is never necessary because cache_revisit_request() can
only do anything if the item is in the cache_defer_hash,
However any time that an item is added to the cache_defer_hash
(setup_deferral), the code immediately tests CACHE_PENDING,
and removes the entry again if it is clear. So all other
places we only need to 'cache_revisit_request' if we've
just cleared CACHE_PENDING.

Reported-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# f9e1aedc 12-Jun-2013 NeilBrown <neilb@suse.de>

sunrpc/cache: remove races with queuing an upcall.

We currently queue an upcall after setting CACHE_PENDING,
and dequeue after clearing CACHE_PENDING.
So a request should only be present when CACHE_PENDING is set.

However we don't combine the test and the enqueue/dequeue in
a protected region, so it is possible (if unlikely) for a race
to result in a request being queued without CACHE_PENDING set,
or a request to be absent despite CACHE_PENDING.

So: include a test for CACHE_PENDING inside the regions of
enqueue and dequeue where queue_lock is held, and abort
the operation if the value is not as expected.

Also remove the early 'return' from cache_dequeue() to ensure that it
always removes all entries: As there is no locking between setting
CACHE_PENDING and calling sunrpc_cache_pipe_upcall it is not
inconceivable for some other thread to clear CACHE_PENDING and then
someone else to set it and call sunrpc_cache_pipe_upcall, both before
the original threads completed the call.

With this, it perfectly safe and correct to:
- call cache_dequeue() if and only if we have just
cleared CACHE_PENDING
- call sunrpc_cache_pipe_upcall() (via cache_make_upcall)
if and only if we have just set CACHE_PENDING.

Reported-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# b6040f97 28-Mar-2013 chaoting fan <tingchaofan@gmail.com>

sunrpc: the cache_detail in cache_is_valid is unused any more

The cache_detail(*detail) in function cache_is_valid is not used any
more.

Signed-off-by: fanchaoting <fanchaoting@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 2e0fb404 29-Apr-2013 Andy Shevchenko <andriy.shevchenko@linux.intel.com>

lib, net: make isodigit() public and use it

There are at least two users of isodigit(). Let's make it a public
function of ctype.h.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# d9dda78b 31-Mar-2013 Al Viro <viro@zeniv.linux.org.uk>

procfs: new helper - PDE_DATA(inode)

The only part of proc_dir_entry the code outside of fs/proc
really cares about is PDE(inode)->data. Provide a helper
for that; static inline for now, eventually will be moved
to fs/proc, along with the knowledge of struct proc_dir_entry
layout.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# a7823c79 22-Mar-2013 Alexey Khoroshilov <khoroshilov@ispras.ru>

SUNRPC/cache: add module_put() on error path in cache_open()

If kmalloc() fails in cache_open(), module cd->owner left locked.
The patch adds module_put(cd->owner) on this path.

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# b67bfe0d 27-Feb-2013 Sasha Levin <sasha.levin@oracle.com>

hlist: drop the node parameter from iterators

I'm not sure why, but the hlist for each entry iterators were conceived

list_for_each_entry(pos, head, member)

The hlist ones were greedy and wanted an extra parameter:

hlist_for_each_entry(tpos, pos, head, member)

Why did they need an extra pos parameter? I'm not quite sure. Not only
they don't really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.

Besides the semantic patch, there was some manual work required:

- Fix up the actual hlist iterators in linux/list.h
- Fix up the declaration of other iterators based on the hlist ones.
- A very small amount of places were using the 'node' parameter, this
was modified to use 'obj->member' instead.
- Coccinelle didn't handle the hlist_for_each_entry_safe iterator
properly, so those had to be fixed up manually.

The semantic patch which is mostly the work of Peter Senna Tschudin is here:

@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

type T;
expression a,c,d,e;
identifier b;
statement S;
@@

-T b;
<+... when != b
(
hlist_for_each_entry(a,
- b,
c, d) S
|
hlist_for_each_entry_continue(a,
- b,
c) S
|
hlist_for_each_entry_from(a,
- b,
c) S
|
hlist_for_each_entry_rcu(a,
- b,
c, d) S
|
hlist_for_each_entry_rcu_bh(a,
- b,
c, d) S
|
hlist_for_each_entry_continue_rcu_bh(a,
- b,
c) S
|
for_each_busy_worker(a, c,
- b,
d) S
|
ax25_uid_for_each(a,
- b,
c) S
|
ax25_for_each(a,
- b,
c) S
|
inet_bind_bucket_for_each(a,
- b,
c) S
|
sctp_for_each_hentry(a,
- b,
c) S
|
sk_for_each(a,
- b,
c) S
|
sk_for_each_rcu(a,
- b,
c) S
|
sk_for_each_from
-(a, b)
+(a)
S
+ sk_for_each_from(a) S
|
sk_for_each_safe(a,
- b,
c, d) S
|
sk_for_each_bound(a,
- b,
c) S
|
hlist_for_each_entry_safe(a,
- b,
c, d, e) S
|
hlist_for_each_entry_continue_rcu(a,
- b,
c) S
|
nr_neigh_for_each(a,
- b,
c) S
|
nr_neigh_for_each_safe(a,
- b,
c, d) S
|
nr_node_for_each(a,
- b,
c) S
|
nr_node_for_each_safe(a,
- b,
c, d) S
|
- for_each_gfn_sp(a, c, d, b) S
+ for_each_gfn_sp(a, c, d) S
|
- for_each_gfn_indirect_valid_sp(a, c, d, b) S
+ for_each_gfn_indirect_valid_sp(a, c, d) S
|
for_each_host(a,
- b,
c) S
|
for_each_host_safe(a,
- b,
c, d) S
|
for_each_mesh_entry(a,
- b,
c, d) S
)
...+>

[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foudnation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 496ad9aa 23-Jan-2013 Al Viro <viro@zeniv.linux.org.uk>

new helper: file_inode(file)

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# d94af6de 04-Feb-2013 Stanislav Kinsbursky <skinsbursky@parallels.com>

SUNRPC: move cache_detail->cache_request callback call to cache_read()

The reason to move cache_request() callback call from
sunrpc_cache_pipe_upcall() to cache_read() is that this garantees, that cache
access will be done userspace process context (only userspace process have
proper root context).
This is required for NFSd support in container: svc_export_request() (which is
cache_request callback) calls d_path(), which, in turn, traverse dentry up to
current->fs->root. Kernel threads always have global root, while container
have be in "root jail" - i.e. have it's own nested root.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 21cd1254 04-Feb-2013 Stanislav Kinsbursky <skinsbursky@parallels.com>

SUNRPC: remove "cache_request" argument in sunrpc_cache_pipe_upcall() function

Passing this pointer is redundant since it's stored on cache_detail structure,
which is also passed to sunrpc_cache_pipe_upcall () function.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 2d438338 04-Feb-2013 Stanislav Kinsbursky <skinsbursky@parallels.com>

SUNRPC: rework cache upcall logic

For most of SUNRPC caches (except NFS DNS cache) cache_detail->cache_upcall is
redundant since all that it's implementations are doing is calling
sunrpc_cache_pipe_upcall() with proper function address argument.
Cache request function address is now stored on cache_detail structure and
thus all the code can be simplified.
Now, for those cache details, which doesn't have cache_upcall callback (the
only one, which still has is nfs_dns_resolve_template)
sunrpc_cache_pipe_upcall will be called instead.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# ec168676 04-Jan-2013 Yanchuan Nian <ycnian@gmail.com>

nfsd: Remove write permission from file content

The write function doesn't be implemented in file content, and it's meaningless
to write data into this file directly. Remove write permission from it.

Signed-off-by: Yanchuan Nian <ycnian@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 0db74d9a 23-Oct-2012 Weston Andros Adamson <dros@netapp.com>

SUNRPC: remove BUG_ON calls from cache_read

Replace BUG_ON() with WARN_ON_ONCE() in two parts of cache_read().

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 212ba906 16-Jul-2012 Sasha Levin <levinsasha928@gmail.com>

SUNRPC: Prevent kernel stack corruption on long values of flush

The buffer size in read_flush() is too small for the longest possible values
for it. This can lead to a kernel stack corruption:

[ 43.047329] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff833e64b4
[ 43.047329]
[ 43.049030] Pid: 6015, comm: trinity-child18 Tainted: G W 3.5.0-rc7-next-20120716-sasha #221
[ 43.050038] Call Trace:
[ 43.050435] [<ffffffff836c60c2>] panic+0xcd/0x1f4
[ 43.050931] [<ffffffff833e64b4>] ? read_flush.isra.7+0xe4/0x100
[ 43.051602] [<ffffffff810e94e6>] __stack_chk_fail+0x16/0x20
[ 43.052206] [<ffffffff833e64b4>] read_flush.isra.7+0xe4/0x100
[ 43.052951] [<ffffffff833e6500>] ? read_flush_pipefs+0x30/0x30
[ 43.053594] [<ffffffff833e652c>] read_flush_procfs+0x2c/0x30
[ 43.053596] [<ffffffff812b9a8c>] proc_reg_read+0x9c/0xd0
[ 43.053596] [<ffffffff812b99f0>] ? proc_reg_write+0xd0/0xd0
[ 43.053596] [<ffffffff81250d5b>] do_loop_readv_writev+0x4b/0x90
[ 43.053596] [<ffffffff81250fd6>] do_readv_writev+0xf6/0x1d0
[ 43.053596] [<ffffffff812510ee>] vfs_readv+0x3e/0x60
[ 43.053596] [<ffffffff812511b8>] sys_readv+0x48/0xb0
[ 43.053596] [<ffffffff8378167d>] system_call_fastpath+0x1a/0x1f

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 203b42f7 21-Aug-2012 Tejun Heo <tj@kernel.org>

workqueue: make deferrable delayed_work initializer names consistent

Initalizers for deferrable delayed_work are confused.

* __DEFERRED_WORK_INITIALIZER()
* DECLARE_DEFERRED_WORK()
* INIT_DELAYED_WORK_DEFERRABLE()

Rename them to

* __DEFERRABLE_WORK_INITIALIZER()
* DECLARE_DEFERRABLE_WORK()
* INIT_DEFERRABLE_WORK()

This patch doesn't cause any functional changes.

Signed-off-by: Tejun Heo <tj@kernel.org>


# 200724a7 11-Jul-2012 NeilBrown <neilb@suse.de>

SUNRPC/cache: fix reporting of expired cache entries in 'content' file.

Entries that are in a sunrpc cache but are not valid should be reported
with a leading '#' so they look like a comment.
Commit d202cce8963d9 (sunrpc: never return expired entries in sunrpc_cache_lookup)
broke this for expired entries.

This particularly applies to entries that have been replaced by newer entries.
sunrpc_cache_update sets the expiry of the replaced entry to '0', but it
remains in the cache until the next 'cache_clean'.
The result is that if you

echo 0 2000000000 1 0 > /proc/net/rpc/auth.unix.gid/channel

several times, then

cat /proc/net/rpc/auth.unix.gid/content

It will display multiple entries for the one uid, which is at least confusing:

#uid cnt: gids...
0 1: 0
0 1: 0
0 1: 0

With this patch, expired entries are marked as comments so you get

#uid cnt: gids...
0 1: 0
# 0 1: 0
# 0 1: 0

These expired entries will never be seen by cache_check() as they are always
*after* a non-expired entry with the same key - so the extra check is only
needed in c_show()

Signed-off-by: NeilBrown <neilb@suse.de>

--
It's not a big problem, but it had me confused for a while, so it could
well confuse others.
Thanks,
NeilBrown
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 95c96174 14-Apr-2012 Eric Dumazet <eric.dumazet@gmail.com>

net: cleanup unsigned to unsigned int

Use of "unsigned int" is preferred to bare "unsigned" in net tree.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6d8d1749 17-Jan-2012 Dan Carpenter <dan.carpenter@oracle.com>

nfsd: don't allow zero length strings in cache_parse()

There is no point in passing a zero length string here and quite a
few of that cache_parse() implementations will Oops if count is
zero.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 2c5f8467 19-Jan-2012 Stanislav Kinsbursky <skinsbursky@parallels.com>

SUNRPC: generic cache register routines removed

All cache users now uses network-namespace-aware routines, so generic ones
are obsolete.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: J. Bruce Fields <bfields@redhat.com>


# 0a402d5a 19-Jan-2012 Stanislav Kinsbursky <skinsbursky@parallels.com>

SUNRPC: cache creation and destruction routines introduced

This patch prepares infrastructure for network namespace aware cache detail
allocation.
One note about adding network namespace link to cache structure. It's going to
be used later in NFS DNS cache parsing routine (nfs_dns_parse for rpc_pton()
call).

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: J. Bruce Fields <bfields@redhat.com>


# 820f9442 25-Nov-2011 Stanislav Kinsbursky <skinsbursky@parallels.com>

SUNRPC: split cache creation and PipeFS registration

This precursor patch splits SUNRPC cache creation and PipeFS registartion.
It's required for latter split of NFS DNS resolver cache creation per network
namespace context and PipeFS registration/unregistration on MOUNT/UMOUNT
events.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 64f1426f 24-Jul-2011 Al Viro <viro@zeniv.linux.org.uk>

sunrpc: propagate umode_t

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# f5c8593b 06-Dec-2011 Stanislav Kinsbursky <skinsbursky@parallels.com>

NFSd: use network-namespace-aware cache registering routines

v2: cache_register_net() and cache_unregister_net() GPL exports added

This is a cleanup patch. Hope, some day generic cache_register() and
cache_unregister() will be removed.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# fdef7aa5 04-Jan-2011 J. Bruce Fields <bfields@redhat.com>

svcrpc: ensure cache_check caller sees updated entry

Supposes cache_check runs simultaneously with an update on a different
CPU:

cache_check task doing update
^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^

1. test for CACHE_VALID 1'. set entry->data
& !CACHE_NEGATIVE

2. use entry->data 2'. set CACHE_VALID

If the two memory writes performed in step 1' and 2' appear misordered
with respect to the reads in step 1 and 2, then the caller could get
stale data at step 2 even though it saw CACHE_VALID set on the cache
entry.

Add memory barriers to prevent this.

Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 6bab93f8 03-Jan-2011 J. Bruce Fields <bfields@redhat.com>

svcrpc: take lock on turning entry NEGATIVE in cache_check

We attempt to turn a cache entry negative in place. But that entry may
already have been filled in by some other task since we last checked
whether it was valid, so we could be modifying an already-valid entry.
If nothing else there's a likely leak in such a case when the entry is
eventually put() and contents are not freed because it has
CACHE_NEGATIVE set.

So, take the cache_lock just as sunrpc_cache_update() does.

Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# d76d1815 02-Jan-2011 J. Bruce Fields <bfields@redhat.com>

svcrpc: avoid double reply caused by deferral race

Commit d29068c431599fa "sunrpc: Simplify cache_defer_req and related
functions." asserted that cache_check() could determine success or
failure of cache_defer_req() by checking the CACHE_PENDING bit.

This isn't quite right.

We need to know whether cache_defer_req() created a deferred request,
in which case sending an rpc reply has become the responsibility of the
deferred request, and it is important that we not send our own reply,
resulting in two different replies to the same request.

And the CACHE_PENDING bit doesn't tell us that; we could have
succesfully created a deferred request at the same time as another
thread cleared the CACHE_PENDING bit.

So, partially revert that commit, to ensure that cache_check() returns
-EAGAIN if and only if a deferred request has been created.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: NeilBrown <neilb@suse.de>


# a6f8dbc6 04-Oct-2010 Arnd Bergmann <arnd@arndb.de>

sunrpc: remove the big kernel lock

The sunrpc cache_ioctl function does not need the big kernel lock
because it uses its own queue_lock already.

rpc_pipe_ioctl apparently should be using i_lock like the other
operations on the pipe file descriptor do.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>


# 6038f373 15-Aug-2010 Arnd Bergmann <arnd@arndb.de>

llseek: automatically add .llseek fop

All file_operations should get a .llseek operation so we can make
nonseekable_open the default for future file operations without a
.llseek pointer.

The three cases that we can automatically detect are no_llseek, seq_lseek
and default_llseek. For cases where we can we can automatically prove that
the file offset is always ignored, we use noop_llseek, which maintains
the current behavior of not returning an error from a seek.

New drivers should normally not use noop_llseek but instead use no_llseek
and call nonseekable_open at open time. Existing drivers can be converted
to do the same when the maintainer knows for certain that no user code
relies on calling seek on the device file.

The generated code is often incorrectly indented and right now contains
comments that clarify for each added line why a specific variant was
chosen. In the version that gets submitted upstream, the comments will
be gone and I will manually fix the indentation, because there does not
seem to be a way to do that using coccinelle.

Some amount of new code is currently sitting in linux-next that should get
the same modifications, which I will do at the end of the merge window.

Many thanks to Julia Lawall for helping me learn to write a semantic
patch that does all this.

===== begin semantic patch =====
// This adds an llseek= method to all file operations,
// as a preparation for making no_llseek the default.
//
// The rules are
// - use no_llseek explicitly if we do nonseekable_open
// - use seq_lseek for sequential files
// - use default_llseek if we know we access f_pos
// - use noop_llseek if we know we don't access f_pos,
// but we still want to allow users to call lseek
//
@ open1 exists @
identifier nested_open;
@@
nested_open(...)
{
<+...
nonseekable_open(...)
...+>
}

@ open exists@
identifier open_f;
identifier i, f;
identifier open1.nested_open;
@@
int open_f(struct inode *i, struct file *f)
{
<+...
(
nonseekable_open(...)
|
nested_open(...)
)
...+>
}

@ read disable optional_qualifier exists @
identifier read_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
expression E;
identifier func;
@@
ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
{
<+...
(
*off = E
|
*off += E
|
func(..., off, ...)
|
E = *off
)
...+>
}

@ read_no_fpos disable optional_qualifier exists @
identifier read_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
@@
ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
{
... when != off
}

@ write @
identifier write_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
expression E;
identifier func;
@@
ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
{
<+...
(
*off = E
|
*off += E
|
func(..., off, ...)
|
E = *off
)
...+>
}

@ write_no_fpos @
identifier write_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
@@
ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
{
... when != off
}

@ fops0 @
identifier fops;
@@
struct file_operations fops = {
...
};

@ has_llseek depends on fops0 @
identifier fops0.fops;
identifier llseek_f;
@@
struct file_operations fops = {
...
.llseek = llseek_f,
...
};

@ has_read depends on fops0 @
identifier fops0.fops;
identifier read_f;
@@
struct file_operations fops = {
...
.read = read_f,
...
};

@ has_write depends on fops0 @
identifier fops0.fops;
identifier write_f;
@@
struct file_operations fops = {
...
.write = write_f,
...
};

@ has_open depends on fops0 @
identifier fops0.fops;
identifier open_f;
@@
struct file_operations fops = {
...
.open = open_f,
...
};

// use no_llseek if we call nonseekable_open
////////////////////////////////////////////
@ nonseekable1 depends on !has_llseek && has_open @
identifier fops0.fops;
identifier nso ~= "nonseekable_open";
@@
struct file_operations fops = {
... .open = nso, ...
+.llseek = no_llseek, /* nonseekable */
};

@ nonseekable2 depends on !has_llseek @
identifier fops0.fops;
identifier open.open_f;
@@
struct file_operations fops = {
... .open = open_f, ...
+.llseek = no_llseek, /* open uses nonseekable */
};

// use seq_lseek for sequential files
/////////////////////////////////////
@ seq depends on !has_llseek @
identifier fops0.fops;
identifier sr ~= "seq_read";
@@
struct file_operations fops = {
... .read = sr, ...
+.llseek = seq_lseek, /* we have seq_read */
};

// use default_llseek if there is a readdir
///////////////////////////////////////////
@ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier readdir_e;
@@
// any other fop is used that changes pos
struct file_operations fops = {
... .readdir = readdir_e, ...
+.llseek = default_llseek, /* readdir is present */
};

// use default_llseek if at least one of read/write touches f_pos
/////////////////////////////////////////////////////////////////
@ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read.read_f;
@@
// read fops use offset
struct file_operations fops = {
... .read = read_f, ...
+.llseek = default_llseek, /* read accesses f_pos */
};

@ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier write.write_f;
@@
// write fops use offset
struct file_operations fops = {
... .write = write_f, ...
+ .llseek = default_llseek, /* write accesses f_pos */
};

// Use noop_llseek if neither read nor write accesses f_pos
///////////////////////////////////////////////////////////

@ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read_no_fpos.read_f;
identifier write_no_fpos.write_f;
@@
// write fops use offset
struct file_operations fops = {
...
.write = write_f,
.read = read_f,
...
+.llseek = noop_llseek, /* read and write both use no f_pos */
};

@ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier write_no_fpos.write_f;
@@
struct file_operations fops = {
... .write = write_f, ...
+.llseek = noop_llseek, /* write uses no f_pos */
};

@ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read_no_fpos.read_f;
@@
struct file_operations fops = {
... .read = read_f, ...
+.llseek = noop_llseek, /* read uses no f_pos */
};

@ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
@@
struct file_operations fops = {
...
+.llseek = noop_llseek, /* no read or write fn */
};
===== End semantic patch =====

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Julia Lawall <julia@diku.dk>
Cc: Christoph Hellwig <hch@infradead.org>


# e33534d5 06-Oct-2010 NeilBrown <neilb@suse.de>

sunrpc/cache: centralise handling of size limit on deferred list.

We limit the number of 'defer' requests to DFR_MAX.

The imposition of this limit is spread about a bit - sometime we don't
add new things to the list, sometimes we remove old things.

Also it is currently applied to requests which we are 'waiting' for
rather than 'deferring'. This doesn't seem ideal as 'waiting'
requests are naturally limited by the number of threads.

So gather the DFR_MAX handling code to one place and only apply it to
requests that are actually being deferred.

This means that not all 'cache_deferred_req' structures go on the
'cache_defer_list, so we need to be careful when adding and removing
things.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# d29068c4 06-Oct-2010 NeilBrown <neilb@suse.de>

sunrpc: Simplify cache_defer_req and related functions.

The return value from cache_defer_req is somewhat confusing.
Various different error codes are returned, but the single caller is
only interested in success or failure.

In fact it can measure this success or failure itself by checking
CACHE_PENDING, which makes the point of the code more explicit.

So change cache_defer_req to return 'void' and test CACHE_PENDING
after it completes, to see if the request was actually deferred or
not.

Similarly setup_deferral and cache_wait_req don't need a return value,
so make them void and remove some code.

The call to cache_revisit_request (to guard against a race) is only
needed for the second call to setup_deferral, so move it out of
setup_deferral to after that second call. With the first call the
race is handled differently (by explicitly calling
'wait_for_completion').

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 277f68db 21-Sep-2010 NeilBrown <neilb@suse.de>

sunrpc: fix race in new cache_wait code.

If we set up to wait for a cache item to be filled in, and then find
that it is no longer pending, it could be that some other thread is
in 'cache_revisit_request' and has moved our request to its 'pending' list.
So when our setup_deferral calls cache_revisit_request it will find nothing to
put on the pending list, and do nothing.

We then return from cache_wait_req, thus leaving the 'sleeper'
on-stack structure open to being corrupted by subsequent stack usage.

However that 'sleeper' could still be on the 'pending' list that the
other thread is looking at and so any corruption could cause it to behave badly.

To avoid this race we simply take the same path as if the
'wait_for_completion_interruptible_timeout' was interrupted and if the
sleeper is no longer on the list (which it won't be) we wait on the
completion - which will ensure that any other cache_revisit_request
will have let go of the sleeper.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 4f42d0d5 27-Sep-2010 Pavel Emelyanov <xemul@parallels.com>

sunrpc: Make the /proc/net/rpc appear in net namespaces

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 593ce16b 27-Sep-2010 Pavel Emelyanov <xemul@parallels.com>

sunrpc: Add routines that allow registering per-net caches

Existing calls do the same, but for the init_net.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# e95dffa4 21-Sep-2010 NeilBrown <neilb@suse.de>

sunrpc/cache: fix recent breakage of cache_clean_deferred

commit 6610f720e9e8103c22d1f1ccf8fbb695550a571f
broke cache_clean_deferred as entries are no longer added to the
pending list for subsequent revisiting.

So put those requests back on the pending list.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# e7f483ea 21-Sep-2010 Andy Shevchenko <andy.shevchenko@gmail.com>

sunrpc/cache: don't use custom hex_to_bin() converter

Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: linux-nfs@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 11174492 12-Aug-2010 NeilBrown <neilb@suse.de>

sunrpc/cache: change deferred-request hash table to use hlist.

Being a hash table, hlist is the best option.

There is currently some ugliness were we treat "->next == NULL" as
a special case to avoid having to initialise the whole array.
This change nicely gets rid of that case.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 06497524 19-Sep-2010 J. Bruce Fields <bfields@redhat.com>

nfsd4: fix hang on fast-booting nfs servers

The last_close field of a cache_detail is initialized to zero, so the
condition

detail->last_close < seconds_since_boot() - 30

may be false even for a cache that was never opened.

However, we want to immediately fail upcalls to caches that were never
opened: in the case of the auth_unix_gid cache, especially, which may
never be opened by mountd (if the --manage-gids option is not set), we
want to fail the upcall immediately. Otherwise client requests will be
dropped unnecessarily on reboot.

Also document these conditions.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 3211af11 26-Aug-2010 J. Bruce Fields <bfields@redhat.com>

svcrpc: cache deferral cleanup

Attempt to make obvious the first-try-sleeping-then-try-deferral logic
by putting that logic into a top-level function that calls helpers.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 6610f720 26-Aug-2010 J. Bruce Fields <bfields@redhat.com>

svcrpc: minor cache cleanup

Pull out some code into helper functions, fix a typo.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# f16b6e8d 12-Aug-2010 NeilBrown <neilb@suse.de>

sunrpc/cache: allow threads to block while waiting for cache update.

The current practice of waiting for cache updates by queueing the
whole request to be retried has (at least) two problems.

1/ With NFSv4, requests can be quite complex and re-trying a whole
request when a later part fails should only be a last-resort, not a
normal practice.

2/ Large requests, and in particular any 'write' request, will not be
queued by the current code and doing so would be undesirable.

In many cases only a very sort wait is needed before the cache gets
valid data.

So, providing the underlying transport permits it by setting
->thread_wait,
arrange to wait briefly for an upcall to be completed (as reflected in
the clearing of CACHE_PENDING).
If the short wait was not long enough and CACHE_PENDING is still set,
fall back on the old approach.

The 'thread_wait' value is set to 5 seconds when there are spare
threads, and 1 second when there are no spare threads.

These values are probably much higher than needed, but will ensure
some forward progress.

Note that as we only request an update for a non-valid item, and as
non-valid items are updated in place it is extremely unlikely that
cache_check will return -ETIMEDOUT. Normally cache_defer_req will
sleep for a short while and then find that the item is_valid.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# c5b29f88 12-Aug-2010 NeilBrown <neilb@suse.de>

sunrpc: use seconds since boot in expiry cache

This protects us from confusion when the wallclock time changes.

We convert to and from wallclock when setting or reading expiry
times.

Also use seconds since boot for last_clost time.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# e2aa7f83 05-Aug-2010 Andrea Gelmini <andrea.gelmini@gelma.net>

net: sunrpc: removed duplicated #include

Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# 8eab945c 01-Jul-2010 Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

sunrpc: make the cache cleaner workqueue deferrable

This patch makes the cache_cleaner workqueue deferrable, to prevent
unnecessary system wake-ups, which is very important for embedded
battery-powered devices.

do_cache_clean() is called every 30 seconds at the moment, and often
makes the system wake up from its power-save sleep state. With this
change, when the workqueue uses a deferrable timer, the
do_cache_clean() invocation will be delayed and combined with the
closest "real" wake-up. This improves the power consumption situation.

Note, I tried to create a DECLARE_DELAYED_WORK_DEFERRABLE() helper
macro, similar to DECLARE_DELAYED_WORK(), but failed because of the
way the timer wheel core stores the deferrable flag (it is the
LSBit in the time->base pointer). My attempt to define a static
variable with this bit set ended up with the "initializer element is
not constant" error.

Thus, I have to use run-time initialization, so I created a new
cache_initialize() function which is called once when sunrpc is
being initialized.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# d5724956 25-May-2010 Huang Weiyi <weiyi.huang@gmail.com>

sunrpc: Remove duplicated #include

Remove duplicated #include('s) in
net/sunrpc/cache.c

Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>


# 9918ff26 19-May-2010 Frederic Weisbecker <fweisbec@gmail.com>

sunrpc: Pushdown the bkl from sunrpc cache ioctl

Pushdown the bkl to cache_ioctl_pipefs.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Nfs <linux-nfs@vger.kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Kacur <jkacur@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>


# 99df95a2 13-Apr-2010 Frederic Weisbecker <fweisbec@gmail.com>

sunrpc: Include missing smp_lock.h

Now that cache_ioctl_procfs() calls the bkl explicitly, we need to
include the relevant header as well.

This fixes the following build error:

net/sunrpc/cache.c: In function 'cache_ioctl_procfs':
net/sunrpc/cache.c:1355: error: implicit declaration of function 'lock_kernel'
net/sunrpc/cache.c:1359: error: implicit declaration of function 'unlock_kernel'

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>


# d79b6f4d 29-Mar-2010 Frederic Weisbecker <fweisbec@gmail.com>

procfs: Push down the bkl from ioctl

Push down the bkl from procfs's ioctl main handler to its users.
Only three procfs users implement an ioctl (non unlocked) handler.
Turn them into unlocked_ioctl and push down the Devil inside.

v2: PDE(inode)->data doesn't need to be under bkl
v3: And don't forget to git-add the result
v4: Use wrappers to pushdown instead of an invasive and error prone
handlers surgery.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: John Kacur <jkacur@redhat.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>


# a5990ea1 11-Mar-2010 Li Zefan <lizf@cn.fujitsu.com>

sunrpc/cache: fix module refcnt leak in a failure path

Don't forget to release the module refcnt if seq_open() returns failure.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: J. Bruce Fields <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# d202cce8 02-Feb-2010 NeilBrown <neilb@suse.de>

sunrpc: never return expired entries in sunrpc_cache_lookup

If sunrpc_cache_lookup finds an expired entry, remove it from
the cache and return a freshly created non-VALID entry instead.
This ensures that we only ever get a usable entry, or an
entry that will become usable once an update arrives.
i.e. we will never need to repeat the lookup.

This allows us to remove the 'is_expired' test from cache_check
(i.e. from cache_is_valid). cache_check should never get an expired
entry as 'lookup' will never return one. If it does happen - due to
inconvenient timing - then just accept it as still valid, it won't be
very much past it's use-by date.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 2f50d8b6 02-Feb-2010 NeilBrown <neilb@suse.de>

sunrpc/cache: factor out cache_is_expired

This removes a tiny bit of code duplication, but more important
prepares for following patch which will perform the expiry check in
cache_lookup and the rest of the validity check in cache_check.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 3af4974e 02-Feb-2010 NeilBrown <neilb@suse.de>

sunrpc: don't keep expired entries in the auth caches.

currently expired entries remain in the auth caches as long
as there is a reference.
This was needed long ago when the auth_domain cache used the same
cache infrastructure. But since that (being a very different sort
of cache) was separated, this test is no longer needed.

So remove the test on refcnt and tidy up the surrounding code.

This allows the cache_dequeue call (which needed to be there to
drop a potentially awkward reference) can be moved outside of the
spinlock which is a better place for it.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# f64f9e71 29-Nov-2009 Joe Perches <joe@perches.com>

net: Move && and || to end of previous line

Not including net/atm/

Compiled tested x86 allyesconfig only
Added a > 80 column line or two, which I ignored.
Existing checkpatch plaints willfully, cheerfully ignored.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# cd68c374 09-Sep-2009 NeilBrown <neilb@suse.de>

sunrpc/cache: avoid variable over-loading in cache_defer_req

In cache_defer_req, 'dreq' is used for two significantly different
values that happen to be of the same type.

This is both confusing, and makes it hard to extend the range of one of
the values as we will in the next patch.
So introduce 'discard' to take one of the values.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 67e7328f 09-Sep-2009 NeilBrown <neilb@suse.de>

sunrpc/cache: use list_del_init for the list_head entries in cache_deferred_req

Using list_del_init is generally safer than list_del, and it will
allow us, in a subsequent patch, to see if an entry has already been
processed or not.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 908329f2 09-Sep-2009 NeilBrown <neilb@suse.de>

sunrpc/cache: simplify cache_fresh_locked and cache_fresh_unlocked.

The extra call to cache_revisit_request in cache_fresh_unlocked is not
needed, as should have been fairly clear at the time of
commit 4013edea9a0b6cdcb1fdf5d4011e47e068fd6efb

If there are requests to be revisited, then we can be sure that
CACHE_PENDING is set, so the second call is sufficient.

So remove the first call.
Then remove the 'new' parameter,
then remove the return value for cache_fresh_locked which is only used
to provide the value for 'new'.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 9e4c6379 09-Sep-2009 NeilBrown <neilb@suse.de>

sunrpc/cache: change cache_defer_req to return -ve error, not boolean.

As "cache_defer_req" does not sound like a predicate, having it return
a boolean value can be confusing. It is more consistent to return
0 for success and negative for error.

Exactly what error code to return is not important as we don't
differentiate between reasons why the request wasn't deferred,
we only care about whether it was deferred or not.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# f7e86ab9 19-Aug-2009 Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: cache must take a reference to the cache detail's module on open()

Otherwise we Oops if the module containing the cache detail is removed
before all cache readers have closed the file.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 8854e82d 09-Aug-2009 Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Add an rpc_pipefs front end for the sunrpc cache code

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 173912a6 09-Aug-2009 Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Move procfs-specific stuff out of the generic sunrpc cache code

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# bc74b4f5 09-Aug-2009 Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Allow the cache_detail to specify alternative upcall mechanisms

For events that are rare, such as referral DNS lookups, it makes limited
sense to have a daemon constantly listening for upcalls on a channel. An
alternative in those cases might simply be to run the app that fills the
cache using call_usermodehelper_exec() and friends.

The following patch allows the cache_detail to specify alternative upcall
mechanisms for these particular cases.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# da77005f 09-Aug-2009 Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Remove the global temporary write buffer in net/sunrpc/cache.c

While we do want to protect against multiple concurrent readers and writers
on each upcall/downcall pipe, we don't want to limit concurrent reading and
writing to separate caches.

This patch therefore replaces the static buffer 'write_buf', which can only
be used by one writer at a time, with use of the page cache as the
temporary buffer for downcalls. We still fall back to using the the old
global buffer if the downcall is larger than PAGE_CACHE_SIZE, since this is
apparently needed by the SPKM security context initialisation.

It then replaces the use of the global 'queue_io_mutex' with the
inode->i_mutex in cache_read() and cache_write().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 5b7a1b9f 09-Aug-2009 Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Ensure we initialise the cache_detail before creating procfs files

Also ensure that we destroy those files before we destroy the cache_detail.
Otherwise, user processes might attempt to write into uninitialised caches.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 2da8ca26 09-Aug-2009 Trond Myklebust <Trond.Myklebust@netapp.com>

NFSD: Clean up the idmapper warning...

What part of 'internal use' is so hard to understand?

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 989a19b9 03-Aug-2009 NeilBrown <neilb@suse.de>

sunrpc/cache: recheck cache validity after cache_defer_req

If cache_defer_req did not leave the request on a queue, then it could
possibly have waited long enough that the cache became valid. So check the
status after the call.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 5c4d2639 03-Aug-2009 NeilBrown <neilb@suse.de>

sunrpc/cache: make sure deferred requests eventually get revisited.

While deferred requests normally get revisited quite quickly,
it is possible for a request to remain in the deferral queue
when the cache item is discarded. We can easily make sure that
doesn't happen by calling cache_revisit_request just before
the final 'put'.

Also there is a small chance that a race would cause one thread to
defer a request against a cache item while another thread is failing
to queue an upcall for that item. So when the upcall fails, make
sure to revisit all deferred requests.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# f866a819 03-Aug-2009 NeilBrown <neilb@suse.de>

sunrpc/cache: rename queue_loose to cache_dequeue

'loose' was a mis-spelling of 'lose', and even that wasn't a good
word choice.
So give this function a more useful name.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 6aad89c8 10-Jun-2009 Anton Blanchard <anton@samba.org>

sunrpc: align cache_clean work's timer

Align cache_clean work.

Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 99b76233 25-Mar-2009 Alexey Dobriyan <adobriyan@gmail.com>

proc 2/2: remove struct proc_dir_entry::owner

Setting ->owner as done currently (pde->owner = THIS_MODULE) is racy
as correctly noted at bug #12454. Someone can lookup entry with NULL
->owner, thus not pinning enything, and release it later resulting
in module refcount underflow.

We can keep ->owner and supply it at registration time like ->proc_fops
and ->data.

But this leaves ->owner as easy-manipulative field (just one C assignment)
and somebody will forget to unpin previous/pin current module when
switching ->owner. ->proc_fops is declared as "const" which should give
some thoughts.

->read_proc/->write_proc were just fixed to not require ->owner for
protection.

rmmod'ed directories will be empty and return "." and ".." -- no harm.
And directories with tricky enough readdir and lookup shouldn't be modular.
We definitely don't want such modular code.

Removing ->owner will also make PDE smaller.

So, let's nuke it.

Kudos to Jeff Layton for reminding about this, let's say, oversight.

http://bugzilla.kernel.org/show_bug.cgi?id=12454

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>


# 24c3767e 23-Dec-2008 Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: The sunrpc server code should not be used by out-of-tree modules

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# e7fe2336 02-May-2008 Denis V. Lunev <den@openvz.org>

sunrpc: assign PDE->data before gluing PDE into /proc tree

Simply replace proc_create and further data assigned with proc_create_data.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 67eb6ff6 31-Jan-2008 J. Bruce Fields <bfields@citi.umich.edu>

svcrpc: move unused field from cache_deferred_req

This field is set once and never used; probably some artifact of an
earlier implementation idea.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 2ce8f047 28-Feb-2008 Wang Chen <wangchen@cn.fujitsu.com>

[SUNRPC]: Use proc_create() to setup ->proc_fops first

Use proc_create() to make sure that ->proc_fops be setup before gluing
PDE to main tree.

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d2f7e79e 14-Jul-2007 Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Move exported symbol definitions after function declaration part 2

Do it for the server code...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# dbf847ec 08-Nov-2007 J. Bruce Fields <bfields@citi.umich.edu>

knfsd: allow cache_register to return error on failure

Newer server features such as nfsv4 and gss depend on proc to work, so a
failure to initialize the proc files they need should be treated as
fatal.

Thanks to Andrew Morton for style fix and compile fix in case where
CONFIG_NFSD_V4 is undefined.

Cc: Andrew Morton <akpm@linux-foundation.org>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# ffe9386b 12-Nov-2007 J. Bruce Fields <bfields@citi.umich.edu>

nfsd: move cache proc (un)registration to separate function

Just some minor cleanup.

Also I don't see much point in trying to register further proc entries
if initial entries fail; so just stop trying in that case.

Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# df95a9d4 08-Nov-2007 J. Bruce Fields <bfields@citi.umich.edu>

knfsd: cache unregistration needn't return error

There's really nothing much the caller can do if cache unregistration
fails. And indeed, all any caller does in this case is print an error
and continue. So just return void and move the printk's inside
cache_unregister.

Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# a490c681 06-Nov-2007 J. Bruce Fields <bfields@citi.umich.edu>

knfsd: fix cache.c comment

The path here must be left over from some earlier draft; fix it. And do
some more minor cleanup while we're there.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 01b2969a 26-Oct-2007 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Prevent length underflow in read_flush()

Make sure we compare an unsigned length to an unsigned count in
read_flush().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


# 9a429c49 01-Jan-2008 Eric Dumazet <dada1@cosmosbay.com>

[NET]: Add some acquires/releases sparse annotations.

Add __acquires() and __releases() annotations to suppress some sparse
warnings.

example of warnings :

net/ipv4/udp.c:1555:14: warning: context imbalance in 'udp_seq_start' - wrong
count at exit
net/ipv4/udp.c:1571:13: warning: context imbalance in 'udp_seq_stop' -
unexpected unlock

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ec931035 10-Oct-2007 Pavel Emelyanov <xemul@openvz.org>

[SUNRPC]: Make the sunrpc use the seq_open_private()

Just switch to the consolidated code.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4011cd97 07-Aug-2007 Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Replace flush_workqueue() with cancel_work_sync() and friends

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 56b3d975 11-Jul-2007 Philippe De Muyter <phdm@macqel.be>

[NET]: Make all initialized struct seq_operations const.

Make all initialized struct seq_operations in net/ const

Signed-off-by: Philippe De Muyter <phdm@macqel.be>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 14690fc6 26-Apr-2007 Martin Peschke <mp3@de.ibm.com>

[SUNRPC]: cleanup: use seq_release_private() where appropriate

We can save some lines of code by using seq_release_private().

Signed-off-by: Martin Peschke <mp3@de.ibm.com>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# da7071d7 12-Feb-2007 Arjan van de Ven <arjan@linux.intel.com>

[PATCH] mark struct file_operations const 8

Many struct file_operations in the kernel can be "const". Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data. In addition it'll catch accidental writes at compile time to
these shared resources.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# cca5172a 09-Feb-2007 YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

[NET] SUNRPC: Fix whitespace errors.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 46121cf7 30-Jan-2007 Chuck Lever <chuck.lever@oracle.com>

SUNRPC: fix print format for tk_pid

The tk_pid field is an unsigned short. The proper print format specifier for
that type is %5u, not %4d.

Also clean up some miscellaneous print formatting nits.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>


# 01f3bd1f 13-Dec-2006 J.Bruce Fields <bfields@fieldses.org>

[PATCH] knfsd: svcrpc: remove another silent drop from deferral code

There's no point deferring something just to immediately fail the deferral,
especially now that we can do something more useful in the failure case by
returning an error.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# e0bb89ef 13-Dec-2006 J.Bruce Fields <bfields@fieldses.org>

[PATCH] knfsd: nfsd: don't drop silently on upcall deferral

To avoid tying up server threads when nfsd makes an upcall (to mountd, to get
export options, to idmapd, for nfsv4 name<->id mapping, etc.), we temporarily
"drop" the request and save enough information so that we can revisit it
later.

Certain failures during the deferral process can cause us to really drop the
request and never revisit it.

This is often less than ideal, and is unacceptable in the NFSv4 case--rfc 3530
forbids the server from dropping a request without also closing the
connection.

As a first step, we modify the deferral code to return -ETIMEDOUT (which is
translated to nfserr_jukebox in the v3 and v4 cases, and remains a drop in the
v2 case).

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 303b46bb 08-Dec-2006 Josef Sipek <jsipek@fsl.cs.sunysb.edu>

[PATCH] struct path: convert sunrpc

Signed-off-by: Josef Sipek <jsipek@fsl.cs.sunysb.edu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 65f27f38 22-Nov-2006 David Howells <dhowells@redhat.com>

WorkStruct: Pass the work_struct pointer instead of context data

Pass the work_struct pointer to the work function rather than context data.
The work function can use container_of() to work out the data.

For the cases where the container of the work_struct may go away the moment the
pending bit is cleared, it is made possible to defer the release of the
structure by deferring the clearing of the pending bit.

To make this work, an extra flag is introduced into the management side of the
work_struct. This governs auto-release of the structure upon execution.

Ordinarily, the work queue executor would release the work_struct for further
scheduling or deallocation by clearing the pending bit prior to jumping to the
work function. This means that, unless the driver makes some guarantee itself
that the work_struct won't go away, the work function may not access anything
else in the work_struct or its container lest they be deallocated.. This is a
problem if the auxiliary data is taken away (as done by the last patch).

However, if the pending bit is *not* cleared before jumping to the work
function, then the work function *may* access the work_struct and its container
with no problems. But then the work function must itself release the
work_struct by calling work_release().

In most cases, automatic release is fine, so this is the default. Special
initiators exist for the non-auto-release case (ending in _NAR).


Signed-Off-By: David Howells <dhowells@redhat.com>


# 52bad64d 22-Nov-2006 David Howells <dhowells@redhat.com>

WorkStruct: Separate delayable and non-delayable events.

Separate delayable work items from non-delayable work items be splitting them
into a separate structure (delayed_work), which incorporates a work_struct and
the timer_list removed from work_struct.

The work_struct struct is huge, and this limits it's usefulness. On a 64-bit
architecture it's nearly 100 bytes in size. This reduces that by half for the
non-delayable type of event.

Signed-Off-By: David Howells <dhowells@redhat.com>


# 2f34931f 05-Aug-2006 Neil Brown <neilb@suse.de>

[PATCH] knfsd: fix race related problem when adding items to and svcrpc auth cache

If we don't find the item we are lookng for, we allocate a new one, and
then grab the lock again and search to see if it has been added while we
did the alloc. If it had been added we need to 'cache_put' the newly
created item that we are never going to use. But as it hasn't been
initialised properly, putting it can cause an oops.

So move the ->init call earlier to that it will always be fully initilised
if we have to put it.

Thanks to Philipp Matthias Hahn <pmhahn@svs.Informatik.Uni-Oldenburg.de>
for reporting the problem.

Signed-off-by: Neil Brown <neilb@suse.de>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# f2d39586 22-May-2006 NeilBrown <neilb@suse.de>

[PATCH] knfsd: Fix two problems that can cause rmmod nfsd to die

Both cause the 'entries' count in the export cache to be non-zero at module
removal time, so unregistering that cache fails and results in an oops.

1/ exp_pseudoroot (used for NFSv4 only) leaks a reference to an export
entry.
2/ sunrpc_cache_update doesn't increment the entries count when it adds
an entry.

Thanks to "david m. richter" <richterd@citi.umich.edu> for triggering the
problem and finding one of the bugs.

Cc: "david m. richter" <richterd@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 74cae61a 27-Mar-2006 Adrian Bunk <bunk@stusta.de>

[PATCH] fs/nfsd/export.c,net/sunrpc/cache.c: make needlessly global code static

We can now make some code static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Neil Brown <neilb@suse.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# baab935f 27-Mar-2006 NeilBrown <neilb@suse.de>

[PATCH] knfsd: Convert sunrpc_cache to use krefs

.. it makes some of the code nicer.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# ebd0cb1a 27-Mar-2006 NeilBrown <neilb@suse.de>

[PATCH] knfsd: Unexport cache_fresh and fix a small race

Cache_fresh is now only used in cache.c, so unexport it.

Part of cache_fresh (setting CACHE_VALID) should really be done under the
lock, while part (calling cache_revisit_request etc) must be done outside the
lock. So we split it up appropriately.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 4013edea 27-Mar-2006 NeilBrown <neilb@suse.de>

[PATCH] knfsd: An assortment of little fixes to the sunrpc cache code

- in cache_check, h must be non-NULL as it has been de-referenced,
so don't bother checking for NULL.

- When a cache-item is updated, we need to call cache_revisit_request to see
if there is a pending request waiting for that item. We were using
a transition to CACHE_VALID to see if that was needed, however that is
wrong as an expired entry will still be marked 'valid' (as the data is valid
and will need to be released). So instead use an off transition for
CACHE_PENDING which is exactly the right thing to test.

- Add a little bit more debugging info.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 15a5f6bd 27-Mar-2006 NeilBrown <neilb@suse.de>

[PATCH] knfsd: Create cache_lookup function instead of using a macro to declare one

The C++-like 'template' approach proves to be too ugly and hard to work with.

The old 'template' won't go away until all users are updated.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 4a3e2f71 20-Mar-2006 Arjan van de Ven <arjan@infradead.org>

[NET] sem2mutex: net/

Semaphore to mutex conversion.

The conversion was generated via scripts, and the result was validated
automatically via a script as well.

Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 09a62660 08-Jan-2006 Kris Katterjohn <kjak@users.sourceforge.net>

[NET]: Change some "if (x) BUG();" to "BUG_ON(x);"

This changes some simple "if (x) BUG();" statements to "BUG_ON(x);"

Signed-off-by: Kris Katterjohn <kjak@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f35279d3 06-Sep-2005 Bruce Allan <bwa@us.ibm.com>

[PATCH] sunrpc: cache_register can use wrong module reference

When registering an RPC cache, cache_register() always sets the owner as the
sunrpc module. However, there are RPC caches owned by other modules. With
the incorrect owner setting, the real owning module can be removed potentially
with an open reference to the cache from userspace.

For example, if one were to stop the nfs server and unmount the nfsd
filesystem, the nfsd module could be removed eventhough rpc.idmapd had
references to the idtoname and nametoid caches (i.e.
/proc/net/rpc/nfs4.<cachename>/channel is still open). This resulted in a
system panic on one of our machines when attempting to restart the nfs
services after reloading the nfsd module.

The following patch adds a 'struct module *owner' field in struct
cache_detail. The owner is further assigned to the struct proc_dir_entry
in cache_register() so that the module cannot be unloaded while user-space
daemons have an open reference on the associated file under /proc.

Signed-off-by: Bruce Allan <bwa@us.ibm.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 1da177e4 16-Apr-2005 Linus Torvalds <torvalds@ppc970.osdl.org>

Linux-2.6.12-rc2

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!