#
11974eec |
|
07-Mar-2024 |
Benjamin Coddington <bcodding@redhat.com> |
NFS: Read unlock folio on nfs_page_create_from_folio() error The netfs conversion lost a folio_unlock() for the case where nfs_page_create_from_folio() returns an error (usually -ENOMEM). Restore it. Reported-by: David Jeffery <djeffery@redhat.com> Cc: <stable@vger.kernel.org> # 6.4+ Fixes: 000dbe0bec05 ("NFS: Convert buffered read paths to use netfs when fscache is enabled") Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Acked-by: Dave Wysochanski <dwysocha@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
303a7805 |
|
09-Jun-2023 |
Anna Schumaker <Anna.Schumaker@Netapp.com> |
NFSv4.2: Rework scratch handling for READ_PLUS (again) I found that the read code might send multiple requests using the same nfs_pgio_header, but nfs4_proc_read_setup() is only called once. This is how we ended up occasionally double-freeing the scratch buffer, but also means we set a NULL pointer but non-zero length to the xdr scratch buffer. This results in an oops the first time decoding needs to copy something to scratch, which frequently happens when decoding READ_PLUS hole segments. I fix this by moving scratch handling into the pageio read code. I provide a function to allocate scratch space for decoding read replies, and free the scratch buffer when the nfs_pgio_header is freed. Fixes: fbd2a05f29a9 (NFSv4.2: Rework scratch handling for READ_PLUS) Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
#
000dbe0b |
|
20-Feb-2023 |
Dave Wysochanski <dwysocha@redhat.com> |
NFS: Convert buffered read paths to use netfs when fscache is enabled Convert the NFS buffered read code paths to corresponding netfs APIs, but only when fscache is configured and enabled. The netfs API defines struct netfs_request_ops which must be filled in by the network filesystem. For NFS, we only need to define 5 of the functions, the main one being the issue_read() function. The issue_read() function is called by the netfs layer when a read cannot be fulfilled locally, and must be sent to the server (either the cache is not active, or it is active but the data is not available). Once the read from the server is complete, netfs requires a call to netfs_subreq_terminated() which conveys either how many bytes were read successfully, or an error. Note that issue_read() is called with a structure, netfs_io_subrequest, which defines the IO requested, and contains a start and a length (both in bytes), and assumes the underlying netfs will return a either an error on the whole region, or the number of bytes successfully read. The NFS IO path is page based and the main APIs are the pgio APIs defined in pagelist.c. For the pgio APIs, there is no way for the caller to know how many RPCs will be sent and how the pages will be broken up into underlying RPCs, each of which will have their own completion and return code. In contrast, netfs is subrequest based, a single subrequest may contain multiple pages, and a single subrequest is initiated with issue_read() and terminated with netfs_subreq_terminated(). Thus, to utilze the netfs APIs, NFS needs some way to accommodate the netfs API requirement on the single response to the whole subrequest, while also minimizing disruptive changes to the NFS pgio layer. The approach taken with this patch is to allocate a small structure for each nfs_netfs_issue_read() call, store the final error and number of bytes successfully transferred in the structure, and update these values as each RPC completes. The refcount on the structure is used as a marker for the last RPC completion, is incremented in nfs_netfs_read_initiate(), and decremented inside nfs_netfs_read_completion(), when a nfs_pgio_header contains a valid pointer to the data. On the final put (which signals the final outstanding RPC is complete) in nfs_netfs_read_completion(), call netfs_subreq_terminated() with either the final error value (if one or more READs complete with an error) or the number of bytes successfully transferred (if all RPCs complete successfully). Note that when all RPCs complete successfully, the number of bytes transferred is capped to the length of the subrequest. Capping the transferred length to the subrequest length prevents "Subreq overread" warnings from netfs. This is due to the "aligned_len" in nfs_pageio_add_page(), and the corner case where NFS requests a full page at the end of the file, even when i_size reflects only a partial page (NFS overread). Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> Tested-by: Daire Byrne <daire@dneg.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
#
01c3a400 |
|
20-Feb-2023 |
Dave Wysochanski <dwysocha@redhat.com> |
NFS: Rename readpage_async_filler to nfs_read_add_folio Rename readpage_async_filler to nfs_read_add_folio to better reflect what this function does (add a folio to the nfs_pageio_descriptor), and simplify arguments to this function by removing struct nfs_readdesc. Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> Tested-by: Daire Byrne <daire@dneg.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
#
9c88ea00 |
|
09-Mar-2023 |
Dave Wysochanski <dwysocha@redhat.com> |
NFS: Fix /proc/PID/io read_bytes for buffered reads Prior to commit 8786fde8421c ("Convert NFS from readpages to readahead"), nfs_readpages() used the old mm interface read_cache_pages() which called task_io_account_read() for each NFS page read. After this commit, nfs_readpages() is converted to nfs_readahead(), which now uses the new mm interface readahead_page(). The new interface requires callers to call task_io_account_read() themselves. In addition, to nfs_readahead() task_io_account_read() should also be called from nfs_read_folio(). Fixes: 8786fde8421c ("Convert NFS from readpages to readahead") Link: https://lore.kernel.org/linux-nfs/CAPt2mGNEYUk5u8V4abe=5MM5msZqmvzCVrtCP4Qw1n=gCHCnww@mail.gmail.com/ Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
#
2de3d04b |
|
19-Jan-2023 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Remove unnecessary check in nfs_read_folio() All the callers are expected to supply a valid struct file argument, so there is no need for the NULL check. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
#
ab75bff1 |
|
19-Jan-2023 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Convert buffered reads to use folios Perform a largely mechanical conversion of references to struct page and page-specific functions to use the folio equivalents. Note that the fscache functionality remains untouched. Instead we just pass in the folio page. This should be OK, as long as we use order 0 folios together with fscache. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
#
0b768a96 |
|
16-May-2022 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
nfs: Leave pages in the pagecache if readpage failed The pagecache handles readpage failing by itself; it doesn't want filesystems to remove pages from under it. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
|
#
65d023af |
|
29-Apr-2022 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
nfs: Convert nfs to read_folio This is a "weak" conversion which converts straight back to using pages. A full conversion should be performed at some point, hopefully by someone familiar with the filesystem. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
|
#
89c2be8a |
|
06-Mar-2022 |
NeilBrown <neilb@suse.de> |
NFS: discard NFS_RPC_SWAPFLAGS and RPC_TASK_ROOTCREDS NFS_RPC_SWAPFLAGS is only used for READ requests. It sets RPC_TASK_SWAPPER which gives some memory-allocation priority to requests. This is not needed for swap READ - though it is for writes where it is set via a different mechanism. RPC_TASK_ROOTCREDS causes the 'machine' credential to be used. This is not needed as the root credential is saved when the swap file is opened, and this is used for all IO. So NFS_RPC_SWAPFLAGS isn't needed, and as it is the only user of RPC_TASK_ROOTCREDS, that isn't needed either. Remove both. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
fc1c5abf |
|
01-Mar-2022 |
Dave Wysochanski <dwysocha@redhat.com> |
NFS: Rename fscache read and write pages functions Rename NFS fscache functions in a more consistent fashion to better reflect when we read from and write to fscache. Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
8786fde8 |
|
22-Jan-2022 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
Convert NFS from readpages to readahead NFS is one of the last two users of the deprecated ->readpages aop. This conversion looks straightforward, but I have only compile-tested it. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
16f2f4e6 |
|
27-Aug-2021 |
David Howells <dhowells@redhat.com> |
nfs: Implement cache I/O by accessing the cache directly Move NFS to using fscache DIO API instead of the old upstream I/O API as that has been removed. This is a stopgap solution as the intention is that at sometime in the future, the cache will move to using larger blocks and won't be able to store individual pages in order to deal with the potential for data corruption due to the backing filesystem being able insert/remove bridging blocks of zeros into its extent list[1]. NFS then reads and writes cache pages synchronously and one page at a time. The preferred change would be to use the netfs lib, but the new I/O API can be used directly. It's just that as the cache now needs to track data for itself, caching blocks may exceed page size... This code is somewhat borrowed from my "fallback I/O" patchset[2]. Changes ======= ver #3: - Restore lost =n fallback for nfs_fscache_release_page()[2]. Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Dave Wysochanski <dwysocha@redhat.com> Acked-by: Jeff Layton <jlayton@kernel.org> cc: Trond Myklebust <trond.myklebust@hammerspace.com> cc: Anna Schumaker <anna.schumaker@netapp.com> cc: linux-nfs@vger.kernel.org cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/YO17ZNOcq+9PajfQ@mit.edu [1] Link: https://lore.kernel.org/r/202112100957.2oEDT20W-lkp@intel.com/ [2] Link: https://lore.kernel.org/r/163189108292.2509237.12615909591150927232.stgit@warthog.procyon.org.uk/ [2] Link: https://lore.kernel.org/r/163906981318.143852.17220018647843475985.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967184451.1823006.6450645559828329590.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021577632.640689.11069627070150063812.stgit@warthog.procyon.org.uk/ # v4
|
#
edfa0b16 |
|
02-Nov-2021 |
Dave Wysochanski <dwysocha@redhat.com> |
NFS: Add offset to nfs_aop_readahead tracepoint Add the byte offset of the readahead request to the tracepoint output so we know where the read starts. Before this patch: cat-8104 [002] ..... 813.168775: nfs_aop_readahead: fileid=00:31:141 fhandle=0xe55807f6 version=1756509392533525500 nr_pages=256 cat-8104 [002] ..... 813.174973: nfs_aop_readahead_done: fileid=00:31:141 fhandle=0xe55807f6 version=1756509392533525500 nr_pages=256 ret=0 cat-8104 [002] ..... 813.175963: nfs_aop_readahead: fileid=00:31:141 fhandle=0xe55807f6 version=1756509392533525500 nr_pages=256 cat-8104 [002] ..... 813.183742: nfs_aop_readahead_done: fileid=00:31:141 fhandle=0xe55807f6 version=1756509392533525500 nr_pages=1 ret=0 After this patch: cat-6392 [001] ..... 73.107782: nfs_aop_readahead: fileid=00:31:141 fhandle=0xed22403f version=1756511950029502774 offset=5242880 nr_pages=256 cat-6392 [001] ..... 73.112466: nfs_aop_readahead_done: fileid=00:31:141 fhandle=0xed22403f version=1756511950029502774 nr_pages=256 ret=0 cat-6392 [001] ..... 73.115692: nfs_aop_readahead: fileid=00:31:141 fhandle=0xed22403f version=1756511950029502774 offset=6291456 nr_pages=256 cat-6392 [001] ..... 73.123283: nfs_aop_readahead_done: fileid=00:31:141 fhandle=0xed22403f version=1756511950029502774 nr_pages=256 ret=0 Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
d9f87743 |
|
16-Oct-2021 |
Chuck Lever <chuck.lever@oracle.com> |
NFS: Replace dprintk callsites in nfs_readpage(s) These new events report slightly different information for readpage and readpages/readahead. For readpage: fsx-1387 [006] 380.761896: nfs_aop_readpage: fileid=00:28:2 fhandle=0x36fbbe51 version=1752899355910932437 offset=131072 fsx-1387 [006] 380.761900: nfs_aop_readpage_done: fileid=00:28:2 fhandle=0x36fbbe51 version=1752899355910932437 offset=131072 ret=0 The index of a synchronous single-page read is reported. For readpages: fsx-1387 [006] 380.760847: nfs_aop_readahead: fileid=00:28:2 fhandle=0x36fbbe51 version=1752899355909932456 nr_pages=3 fsx-1387 [006] 380.760853: nfs_aop_readahead_done: fileid=00:28:2 fhandle=0x36fbbe51 version=1752899355909932456 nr_pages=3 ret=0 The count of pages requested is reported. nfs_readpages does not wait for the READ requests to complete. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
8cfb9015 |
|
27-Aug-2021 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Always provide aligned buffers to the RPC read layers Instead of messing around with XDR padding in the RDMA layer, we should just give the RPC layer an aligned buffer. Try to avoid creating extra RPC calls by aligning to the smaller value of ALIGN(len, rsize) and PAGE_SIZE. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
#
ba512c1b |
|
29-Jun-2021 |
Dave Wysochanski <dwysocha@redhat.com> |
NFS: Fix fscache read from NFS after cache error Earlier commits refactored some NFS read code and removed nfs_readpage_async(), but neglected to properly fixup nfs_readpage_from_fscache_complete(). The code path is only hit when something unusual occurs with the cachefiles backing filesystem, such as an IO error or while a cookie is being invalidated. Mark page with PG_checked if fscache IO completes in error, unlock the page, and let the VM decide to re-issue based on PG_uptodate. When the VM reissues the readpage, PG_checked allows us to skip over fscache and read from the server. Link: https://marc.info/?l=linux-nfs&m=162498209518739 Fixes: 1e83b173b266 ("NFS: Add nfs_pageio_complete_read() and remove nfs_readpage_async()") Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
e0340f16 |
|
29-Jun-2021 |
Dave Wysochanski <dwysocha@redhat.com> |
NFS: Ensure nfs_readpage returns promptly when internal error occurs A previous refactoring of nfs_readpage() might end up calling wait_on_page_locked_killable() even if readpage_async_filler() failed with an internal error and pg_error was non-zero (for example, if nfs_create_request() failed). In the case of an internal error, skip over wait_on_page_locked_killable() as this is only needed when the read is sent and an error occurs during completion handling. Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
b42ad64f |
|
24-Jun-2021 |
Dave Wysochanski <dwysocha@redhat.com> |
NFS: Remove unnecessary inode parameter from nfs_pageio_complete_read() Simplify nfs_pageio_complete_read() by using the inode pointer saved inside nfs_pageio_descriptor. Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
1e83b173 |
|
28-Jan-2021 |
Dave Wysochanski <dwysocha@redhat.com> |
NFS: Add nfs_pageio_complete_read() and remove nfs_readpage_async() Add nfs_pageio_complete_read() and call this from both nfs_readpage() and nfs_readpages(), since the submission and accounting is the same for both functions. Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
#
0c119e3a |
|
28-Jan-2021 |
Dave Wysochanski <dwysocha@redhat.com> |
NFS: Call readpage_async_filler() from nfs_readpage_async() Refactor slightly so nfs_readpage_async() calls into readpage_async_filler(). Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
#
1af7e7f8 |
|
28-Jan-2021 |
Dave Wysochanski <dwysocha@redhat.com> |
NFS: Refactor nfs_readpage() and nfs_readpage_async() to use nfs_readdesc Both nfs_readpage() and nfs_readpages() use similar code. This patch should be no functional change, and refactors nfs_readpage_async() to use nfs_readdesc to enable future merging of nfs_readpage_async() and nfs_readpage_async_filler(). Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
#
6ddfd213 |
|
28-Jan-2021 |
Dave Wysochanski <dwysocha@redhat.com> |
NFS: In nfs_readpage() only increment NFSIOS_READPAGES when read succeeds There is a small inconsistency with nfs_readpage() vs nfs_readpages() with regards to NFSIOS_READPAGES. In readpage we unconditionally increment NFSIOS_READPAGES at the top, which means even if the read fails. In readpages, we increment NFSIOS_READPAGES at the bottom based on how many pages were successfully read. Change readpage to be consistent with readpages and so NFSIOS_READPAGES only reflects successful, non-fscache reads. Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
#
49dee700 |
|
28-Jan-2021 |
Dave Wysochanski <dwysocha@redhat.com> |
NFS: Clean up nfs_readpage() and nfs_readpages() In prep for the new fscache netfs API, refactor nfs_readpage() and nfs_readpages() for future patches. No functional change. Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
#
fd2b6121 |
|
12-May-2020 |
Chuck Lever <chuck.lever@oracle.com> |
NFS: Trace short NFS READs A short read can generate an -EIO error without there being an error on the wire. This tracepoint acts as an eyecatcher when there is no obvious I/O error. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
#
93ce4af7 |
|
06-Apr-2020 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Clean up process of marking inode stale. Instead of the various open coded calls to set the NFS_INO_STALE bit and call nfs_zap_caches(), consolidate them into a single function nfs_set_inode_stale(). Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
8c9cb714 |
|
06-Jan-2020 |
Trond Myklebust <trondmy@gmail.com> |
NFS: When resending after a short write, reset the reply count to zero If we're resending a write due to a short read or write, ensure we reset the reply count to zero. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
#
2343172d |
|
06-Jan-2020 |
Trond Myklebust <trondmy@gmail.com> |
NFS: Clean up generic file read tracepoints Clean up the generic file read tracepoints so they do pass the full structures as arguments. Also ensure we report the number of bytes actually read. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
#
8f54c7a4 |
|
14-Aug-2019 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Fix spurious EIO read errors If the client attempts to read a page, but the read fails due to some spurious error (e.g. an ACCESS error or a timeout, ...) then we need to allow other processes to retry. Also try to report errors correctly when doing a synchronous readpage. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
457c8996 |
|
19-May-2019 |
Thomas Gleixner <tglx@linutronix.de> |
treewide: Add SPDX license identifier for missed files Add SPDX license identifiers to all files which: - Have no license information of any form - Have EXPORT_.*_SYMBOL_GPL inside which was used in the initial scan/conversion to ignore the file These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
9fcd5960 |
|
07-Apr-2019 |
Trond Myklebust <trondmy@gmail.com> |
NFS: Add a helper to return a pointer to the open context of a struct nfs_page Add a helper for when we remove the explicit pointer to the open context. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
#
28b1d3f5 |
|
07-Apr-2019 |
Trond Myklebust <trondmy@gmail.com> |
NFS: Remove unused argument from nfs_create_request() All the callers of nfs_create_request() are now creating page group heads, so we can remove the redundant 'last' page argument. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
#
df3accb8 |
|
13-Feb-2019 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Pass error information to the pgio error cleanup routine Allow the caller to pass error information when cleaning up a failed I/O request so that we can conditionally take action to cancel the request altogether if the error turned out to be fatal. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
1c6c4b74 |
|
24-Sep-2018 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Remove private spinlock in struct nfs_pgio_header Now that each struct nfs_pgio_header corresponds to one RPC call, we only have one writer to the struct nfs_pgio_header. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
#
8224b273 |
|
21-Aug-2017 |
Chuck Lever <chuck.lever@oracle.com> |
NFS: Add static NFS I/O tracepoints Tools like tcpdump and rpcdebug can be very useful. But there are plenty of environments where they are difficult or impossible to use. For example, we've had customers report I/O failures during workloads so heavy that collecting network traffic or enabling RPC debugging are themselves onerous. The kernel's static tracepoints are lightweight (less likely to introduce timing changes) and efficient (the trace data is compact). They also work in scenarios where capturing network traffic is not possible due to lack of hardware support (some InfiniBand HCAs) or where data or network privacy is a concern. Introduce tracepoints that show when an NFS READ, WRITE, or COMMIT is initiated, and when it completes. Record the arguments and results of each operation, which are not shown by existing sunrpc module's tracepoints. For instance, the recorded offset and count can be used to match an "initiate" event to a "done" event. If an NFS READ result returns fewer bytes than requested or zero, seeing the EOF flag can be probative. Seeing an NFS4ERR_BAD_STATEID result is also indication of a particular class of problems. The timing information attached to each event record can often be useful as well. Usage example: [root@manet tmp]# trace-cmd record -e nfs:*initiate* -e nfs:*done /sys/kernel/debug/tracing/events/nfs/*initiate*/filter /sys/kernel/debug/tracing/events/nfs/*done/filter Hit Ctrl^C to stop recording ^CKernel buffer statistics: Note: "entries" are the entries left in the kernel ring buffer and are not recorded in the trace data. They should all be zero. CPU: 0 entries: 0 overrun: 0 commit overrun: 0 bytes: 3680 oldest event ts: 78.367422 now ts: 100.124419 dropped events: 0 read events: 74 ... and so on. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
3bde7afd |
|
20-Aug-2017 |
Trond Myklebust <trond.myklebust@primarydata.com> |
NFS: Remove unused parameter gfp_flags from nfs_pageio_init() Now that the mirror allocation has been moved, the parameter can go. Also remove the redundant symbol export. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
fbe77c30 |
|
19-Apr-2017 |
Benjamin Coddington <bcodding@redhat.com> |
NFS: move rw_mode to nfs_pageio_header Let's try to have it in a cacheline in nfs4_proc_pgio_rpc_prepare(). Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
8cd79788 |
|
07-Oct-2016 |
Huang Ying <ying.huang@intel.com> |
mm: remove page_file_index After using the offset of the swap entry as the key of the swap cache, the page_index() becomes exactly same as page_file_index(). So the page_file_index() is removed and the callers are changed to use page_index() instead. Link: http://lkml.kernel.org/r/1473270649-27229-2-git-send-email-ying.huang@intel.com Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Cc: Trond Myklebust <trond.myklebust@primarydata.com> Cc: Anna Schumaker <anna.schumaker@netapp.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
cbebaf89 |
|
17-Jun-2016 |
Trond Myklebust <trond.myklebust@primarydata.com> |
NFS: Fix a double page unlock Since commit 0bcbf039f6b2, nfs_readpage_release() has been used to unlock the page in the read code. Fixes: 0bcbf039f6b2 ("nfs: handle request add failure properly") Cc: stable@vger.kernel.org # v4.5+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
#
09cbfeaf |
|
01-Apr-2016 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time ago with promise that one day it will be possible to implement page cache with bigger chunks than PAGE_SIZE. This promise never materialized. And unlikely will. We have many places where PAGE_CACHE_SIZE assumed to be equal to PAGE_SIZE. And it's constant source of confusion on whether PAGE_CACHE_* or PAGE_* constant should be used in a particular case, especially on the border between fs and mm. Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much breakage to be doable. Let's stop pretending that pages in page cache are special. They are not. The changes are pretty straight-forward: - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN}; - page_cache_get() -> get_page(); - page_cache_release() -> put_page(); This patch contains automated changes generated with coccinelle using script below. For some reason, coccinelle doesn't patch header files. I've called spatch for them manually. The only adjustment after coccinelle is revert of changes to PAGE_CAHCE_ALIGN definition: we are going to drop it later. There are few places in the code where coccinelle didn't reach. I'll fix them manually in a separate patch. Comments and documentation also will be addressed with the separate patch. virtual patch @@ expression E; @@ - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ expression E; @@ - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ @@ - PAGE_CACHE_SHIFT + PAGE_SHIFT @@ @@ - PAGE_CACHE_SIZE + PAGE_SIZE @@ @@ - PAGE_CACHE_MASK + PAGE_MASK @@ expression E; @@ - PAGE_CACHE_ALIGN(E) + PAGE_ALIGN(E) @@ expression E; @@ - page_cache_get(E) + get_page(E) @@ expression E; @@ - page_cache_release(E) + put_page(E) Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
0bcbf039 |
|
05-Dec-2015 |
Peng Tao <tao.peng@primarydata.com> |
nfs: handle request add failure properly When we fail to queue a read page to IO descriptor, we need to clean it up otherwise it is hanging around preventing nfs module from being removed. When we fail to queue a write page to IO descriptor, we need to clean it up and also save the failure status to open context. Then at file close, we can try to write pages back again and drop the page if it fails to writeback in .launder_page, which will be done in the next patch. Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
d600ad1f |
|
03-Dec-2015 |
Peng Tao <tao.peng@primarydata.com> |
NFS41: pop some layoutget errors to application For ERESTARTSYS/EIO/EROFS/ENOSPC/E2BIG in layoutget, we should just bail out instead of hiding the error and retrying inband IO. Change all the call sites to pop the error all the way up. Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
f8417b48 |
|
16-Oct-2015 |
Kinglong Mee <kinglongmee@gmail.com> |
NFSv4.1/pnfs: Retry through MDS when getting bad length of data If non rpc-based layout driver return bad length of data, nfs retries by calling rpc_restart_call_prepare() that cause an NULL reference panic. This patch lets nfs retry through MDS for non rpc-based layout driver return bad length of data. [13034.883329] BUG: unable to handle kernel NULL pointer dereference at (null) [13034.884902] IP: [<ffffffffa00db372>] rpc_restart_call_prepare+0x62/0x90 [sunrpc] [13034.886558] PGD 0 [13034.888126] Oops: 0000 [#1] KASAN [13034.889710] Modules linked in: blocklayoutdriver(OE) nfsv4(OE) nfs(OE) fscache(E) nfsd(OE) xfs libcrc32c coretemp btrfs crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ppdev vmw_balloon auth_rpcgss shpchp nfs_acl lockd vmw_vmci parport_pc xor raid6_pq grace parport sunrpc i2c_piix4 vmwgfx drm_kms_helper ttm drm mptspi e1000 serio_raw scsi_transport_spi mptscsih mptbase ata_generic pata_acpi [last unloaded: fscache] [13034.898260] CPU: 0 PID: 10112 Comm: kworker/0:1 Tainted: G OE 4.3.0-rc5+ #279 [13034.899932] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015 [13034.903342] Workqueue: events bl_read_cleanup [blocklayoutdriver] [13034.905059] task: ffff88006a9148c0 ti: ffff880035e90000 task.ti: ffff880035e90000 [13034.906827] RIP: 0010:[<ffffffffa00db372>] [<ffffffffa00db372>] rpc_restart_call_prepare+0x62/0x90 [sunrpc] [13034.910522] RSP: 0018:ffff880035e97b58 EFLAGS: 00010282 [13034.912378] RAX: fffffbfff04a5a94 RBX: ffff880068fe4858 RCX: 0000000000000003 [13034.914339] RDX: dffffc0000000000 RSI: 0000000000000003 RDI: 0000000000000282 [13034.916236] RBP: ffff880035e97b68 R08: 0000000000000001 R09: 0000000000000001 [13034.918229] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 [13034.920007] R13: ffff880068fe4858 R14: ffff880068fe4a60 R15: 0000000000001000 [13034.921845] FS: 0000000000000000(0000) GS:ffffffff82247000(0000) knlGS:0000000000000000 [13034.923645] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [13034.925525] CR2: 0000000000000000 CR3: 00000000063dd000 CR4: 00000000001406f0 [13034.932808] Stack: [13034.934813] ffff880068fe4780 0000000000001000 ffff880035e97ba8 ffffffffa08800d2 [13034.936675] ffffffffa088029d ffff880068fe4780 ffff880068fe4858 ffffffffa089c0a0 [13034.938593] ffff880068fe47e0 ffff88005d59faf0 ffff880035e97be0 ffffffffa087e08f [13034.940454] Call Trace: [13034.942388] [<ffffffffa08800d2>] nfs_readpage_result+0x112/0x200 [nfs] [13034.944317] [<ffffffffa088029d>] ? nfs_readpage_done+0xdd/0x160 [nfs] [13034.946267] [<ffffffffa087e08f>] nfs_pgio_result+0x9f/0x120 [nfs] [13034.948166] [<ffffffffa09266cc>] pnfs_ld_read_done+0x7c/0x1e0 [nfsv4] [13034.950247] [<ffffffffa03b07ee>] bl_read_cleanup+0x2e/0x60 [blocklayoutdriver] [13034.952156] [<ffffffff810ebf62>] process_one_work+0x412/0x870 [13034.954102] [<ffffffff810ebe84>] ? process_one_work+0x334/0x870 [13034.955949] [<ffffffff810ebb50>] ? queue_delayed_work_on+0x40/0x40 [13034.957985] [<ffffffff810ec441>] worker_thread+0x81/0x6a0 [13034.959817] [<ffffffff810ec3c0>] ? process_one_work+0x870/0x870 [13034.961785] [<ffffffff810f43bd>] kthread+0x17d/0x1a0 [13034.963544] [<ffffffff810f4240>] ? kthread_create_on_node+0x330/0x330 [13034.965479] [<ffffffff81100428>] ? finish_task_switch+0x88/0x220 [13034.967223] [<ffffffff810f4240>] ? kthread_create_on_node+0x330/0x330 [13034.968929] [<ffffffff81b6ae5f>] ret_from_fork+0x3f/0x70 [13034.970534] [<ffffffff810f4240>] ? kthread_create_on_node+0x330/0x330 [13034.972176] Code: c7 43 50 40 84 0d a0 e8 3d fe 1c e1 48 8d 7b 58 c7 83 e4 00 00 00 00 00 00 00 e8 ca fe 1c e1 4c 8b 63 58 4c 89 e7 e8 be fe 1c e1 <49> 83 3c 24 00 74 12 48 c7 43 50 f0 a2 0e a0 b8 01 00 00 00 5b [13034.977148] RIP [<ffffffffa00db372>] rpc_restart_call_prepare+0x62/0x90 [sunrpc] [13034.978780] RSP <ffff880035e97b58> [13034.980399] CR2: 0000000000000000 Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
6f29b9bb |
|
20-Sep-2015 |
Kinglong Mee <kinglongmee@gmail.com> |
NFS: Do cleanup before resetting pageio read/write to mds There is a reference leak of layout segment after resetting pageio read/write to mds. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Cc: stable@vger.kernel.org # v4.0+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
3708f842 |
|
16-Apr-2015 |
Nicolas Iooss <nicolas.iooss_linux@m4x.org> |
Revert "nfs: replace nfs_add_stats with nfs_inc_stats when add one" This reverts commit 5a254d08b086d80cbead2ebcee6d2a4b3a15587a. Since commit 5a254d08b086 ("nfs: replace nfs_add_stats with nfs_inc_stats when add one"), nfs_readpage and nfs_do_writepage use nfs_inc_stats to increment NFSIOS_READPAGES and NFSIOS_WRITEPAGES instead of nfs_add_stats. However nfs_inc_stats does not do the same thing as nfs_add_stats with value 1 because these functions work on distinct stats: nfs_inc_stats increments stats from "enum nfs_stat_eventcounters" (in server->io_stats->events) and nfs_add_stats those from "enum nfs_stat_bytecounters" (in server->io_stats->bytes). Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org> Fixes: 5a254d08b086 ("nfs: replace nfs_add_stats with nfs_inc_stats...") Cc: stable@vger.kernel.org # 3.19+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
2b0143b5 |
|
17-Mar-2015 |
David Howells <dhowells@redhat.com> |
VFS: normal filesystems (and lustre): d_inode() annotations that's the bulk of filesystem drivers dealing with inodes of their own Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
88e7fbd4 |
|
04-Mar-2015 |
David Howells <dhowells@redhat.com> |
NFS: Don't use d_inode as a variable name Don't use d_inode as a variable name as it now masks a function name. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
a7d42ddb |
|
19-Sep-2014 |
Weston Andros Adamson <dros@primarydata.com> |
nfs: add mirroring support to pgio layer This patch adds mirrored write support to the pgio layer. The default is to use one mirror, but pgio callers may define callbacks to change this to any value up to the (arbitrarily selected) limit of 16. The basic idea is to break out members of nfs_pageio_descriptor that cannot be shared between mirrored DSes and put them in a new structure. Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
|
#
abde71f4 |
|
09-Jun-2014 |
Tom Haynes <loghyr@primarydata.com> |
pnfs: Add nfs_rpc_ops in calls to nfs_initiate_pgio Signed-off-by: Tom Haynes <loghyr@primarydata.com>
|
#
5a254d08 |
|
22-Nov-2014 |
Li RongQing <roy.qing.li@gmail.com> |
nfs: replace nfs_add_stats with nfs_inc_stats when add one Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
04462789 |
|
25-Jun-2014 |
Weston Andros Adamson <dros@primarydata.com> |
nfs: get rid of duplicate dprintk This was introduced by a merge error with my recent pgio patchset. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
d45f60c6 |
|
09-Jun-2014 |
Weston Andros Adamson <dros@primarydata.com> |
nfs: merge nfs_pgio_data into _header struct nfs_pgio_data only exists as a member of nfs_pgio_header, but is passed around everywhere, because there used to be multiple _data structs per _header. Many of these functions then use the _data to find a pointer to the _header. This patch cleans this up by merging the nfs_pgio_data structure into nfs_pgio_header and passing nfs_pgio_header around instead. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
1e7f3a48 |
|
09-Jun-2014 |
Weston Andros Adamson <dros@primarydata.com> |
nfs: move nfs_pgio_data and remove nfs_rw_header nfs_rw_header was used to allocate an nfs_pgio_header along with an nfs_pgio_data, because a _header would need at least one _data. Now there is only ever one nfs_pgio_data for each nfs_pgio_header -- move it to nfs_pgio_header and get rid of nfs_rw_header. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
68072992 |
|
15-May-2014 |
Weston Andros Adamson <dros@primarydata.com> |
nfs: support page groups in nfs_read_completion nfs_read_completion relied on the fact that there was a 1:1 mapping of page to nfs_request, but this has now changed. Regions not covered by a request have already been zeroed elsewhere. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
67d0338e |
|
15-May-2014 |
Weston Andros Adamson <dros@primarydata.com> |
nfs: page group syncing in read path Operations that modify state for a whole page must be syncronized across all requests within a page group. In the read path, this is calling unlock_page and SetPageUptodate. Both of these functions should not be called until all requests in a page group have reached the point where they would call them. This patch should have no effect yet since all page groups currently have one request, but will come into play when pg_test functions are modified to split pages into sub-page regions. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
2bfc6e56 |
|
15-May-2014 |
Weston Andros Adamson <dros@primarydata.com> |
nfs: add support for multiple nfs reqs per page Add "page groups" - a circular list of nfs requests (struct nfs_page) that all reference the same page. This gives nfs read and write paths the ability to account for sub-page regions independently. This somewhat follows the design of struct buffer_head's sub-page accounting. Only "head" requests are ever added/removed from the inode list in the buffered write path. "head" and "sub" requests are treated the same through the read path and the rest of the write/commit path. Requests are given an extra reference across the life of the list. Page groups are never rejoined after being split. If the read/write request fails and the client falls back to another path (ie revert to MDS in PNFS case), the already split requests are pushed through the recoalescing code again, which may split them further and then coalesce them into properly sized requests on the wire. Fragmentation shouldn't be a problem with the current design, because we flush all requests in page group when a non-contiguous request is added, so the only time resplitting should occur is on a resend of a read or write. This patch lays the groundwork for sub-page splitting, but does not actually do any splitting. For now all page groups have one request as pg_test functions don't yet split pages. There are several related patches that are needed support multiple requests per page group. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
8c8f1ac1 |
|
15-May-2014 |
Weston Andros Adamson <dros@primarydata.com> |
nfs: remove unused arg from nfs_create_request @inode is passed but not used. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
41d8d5b7 |
|
06-May-2014 |
Anna Schumaker <Anna.Schumaker@netapp.com> |
NFS: Create a common nfs_pageio_ops struct At this point the read and write structures look identical, so combine them into something shared by both. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
cf485fcd |
|
06-May-2014 |
Anna Schumaker <Anna.Schumaker@netapp.com> |
NFS: Create a common generic_pg_pgios() What we have here is two functions that look identical. Let's share some more code! Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
c3766276 |
|
06-May-2014 |
Anna Schumaker <Anna.Schumaker@netapp.com> |
NFS: Create a common multiple_pgios() function Once again, these two functions look identical in the read and write case. Time to combine them together! Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
1ed26f33 |
|
06-May-2014 |
Anna Schumaker <Anna.Schumaker@netapp.com> |
NFS: Create a common initiate_pgio() function Most of this code is the same for both the read and write paths, so combine everything and use the rw_ops when necessary. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
ef2c488c |
|
06-May-2014 |
Anna Schumaker <Anna.Schumaker@netapp.com> |
NFS: Create a generic_pgio function These functions are almost identical on both the read and write side. FLUSH_COND_STABLE will never be set for the read path, so leaving it in the generic code won't hurt anything. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
844c9e69 |
|
06-May-2014 |
Anna Schumaker <Anna.Schumaker@netapp.com> |
NFS: Create a common pgio_error function At this point, the read and write versions of this function look identical so both should use the same function. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
ce59515c |
|
06-May-2014 |
Anna Schumaker <Anna.Schumaker@netapp.com> |
NFS: Create a common rpcsetup function for reads and writes Write adds a little bit of code dealing with flush flags, but since "how" will always be 0 when reading we can share the code. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
6f92fa45 |
|
06-May-2014 |
Anna Schumaker <Anna.Schumaker@netapp.com> |
NFS: Create a common rpc_call_ops struct The read and write paths set up this struct in exactly the same way, so create a single shared struct. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
0eecb214 |
|
06-May-2014 |
Anna Schumaker <Anna.Schumaker@netapp.com> |
NFS: Create a common nfs_pgio_result_common function Combining these functions will let me make a single nfs_rw_common_ops struct (see the next patch). Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
a4cdda59 |
|
06-May-2014 |
Anna Schumaker <Anna.Schumaker@netapp.com> |
NFS: Create a common pgio_rpc_prepare function The read and write paths do exactly the same thing for the rpc_prepare rpc_op. This patch combines them together into a single function. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
4a0de55c |
|
06-May-2014 |
Anna Schumaker <Anna.Schumaker@netapp.com> |
NFS: Create a common rw_header_alloc and rw_header_free function I create a new struct nfs_rw_ops to decide the differences between reads and writes. This struct will be set when initializing a new nfs_pgio_descriptor, and then passed on to the nfs_rw_header when a new header is allocated. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
00bfa30a |
|
06-May-2014 |
Anna Schumaker <Anna.Schumaker@netapp.com> |
NFS: Create a common pgio_alloc and pgio_release function These functions are identical for the read and write paths so they can be combined. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
c0752cdf |
|
06-May-2014 |
Anna Schumaker <Anna.Schumaker@netapp.com> |
NFS: Create a common read and write header struct The only difference is the write verifier field, but we can keep that for a little bit longer. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
9c7e1b3d |
|
06-May-2014 |
Anna Schumaker <Anna.Schumaker@netapp.com> |
NFS: Create a common read and write data struct At this point, the only difference between nfs_read_data and nfs_write_data is the write verifier. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
9137bdf3 |
|
06-May-2014 |
Anna Schumaker <Anna.Schumaker@netapp.com> |
NFS: Create a common results structure for reads and writes Reads and writes have very similar results. This patch combines the two structs together with comments to show where the differing fields are used. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
3c6b899c |
|
06-May-2014 |
Anna Schumaker <Anna.Schumaker@netapp.com> |
NFS: Create a common argument structure for reads and writes Reads and writes have very similar arguments. This patch combines them together and documents the few fields used only by write. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
fab5fc25 |
|
16-Apr-2014 |
Christoph Hellwig <hch@lst.de> |
nfs: remove ->read_pageio_init from rpc ops The read_pageio_init method is just a very convoluted way to grab the right nfs_pageio_ops vector. The vector to chose is not a choice of protocol version, but just a pNFS vs MDS I/O choice that can simply be done inside nfs_pageio_init_read based on the presence of a layout driver, and a new force_mds flag to the special case of falling back to MDS I/O on a pNFS-capable volume. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
1e8968c5 |
|
17-Dec-2013 |
Niels de Vos <ndevos@redhat.com> |
NFS: dprintk() should not print negative fileids and inode numbers A fileid in NFS is a uint64. There are some occurrences where dprintk() outputs a signed fileid. This leads to confusion and more difficult to read debugging (negative fileids matching positive inode numbers). Signed-off-by: Niels de Vos <ndevos@redhat.com> CC: Santosh Pradhan <spradhan@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
ef1820f9 |
|
04-Sep-2013 |
NeilBrown <neilb@suse.de> |
NFSv4: Don't try to recover NFSv4 locks when they are lost. When an NFSv4 client loses contact with the server it can lose any locks that it holds. Currently when it reconnects to the server it simply tries to reclaim those locks. This might succeed even though some other client has held and released a lock in the mean time. So the first client might think the file is unchanged, but it isn't. This isn't good. If, when recovery happens, the locks cannot be claimed because some other client still holds the lock, then we get a message in the kernel logs, but the client can still write. So two clients can both think they have a lock and can both write at the same time. This is equally not good. There was a patch a while ago http://comments.gmane.org/gmane.linux.nfs/41917 which tried to address some of this, but it didn't seem to go anywhere. That patch would also send a signal to the process. That might be useful but for now this patch just causes writes to fail. For NFSv4 (unlike v2/v3) there is a strong link between the lock and the write request so we can fairly easily fail any IO of the lock is gone. While some applications might not expect this, it is still safer than allowing the write to succeed. Because this is a fairly big change in behaviour a module parameter, "recover_locks", is introduced which defaults to true (the current behaviour) but can be set to "false" to tell the client not to try to recover things that were lost. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
c58c8441 |
|
18-Mar-2013 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Don't accept more reads/writes if the open context recovery failed If the state recovery failed, we want to ensure that the application doesn't try to use the same file descriptor for more reads or writes. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
6db6dd7d |
|
03-Jan-2013 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Ensure that we free the rpc_task after read and write cleanups are done This patch ensures that we free the rpc_task after the cleanup callbacks are done in order to avoid a deadlock problem that can be triggered if the callback needs to wait for another workqueue item to complete. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Weston Andros Adamson <dros@netapp.com> Cc: Tejun Heo <tj@kernel.org> Cc: Bruce Fields <bfields@fieldses.org> Cc: stable@vger.kernel.org [>= 3.5]
|
#
d56b4ddf |
|
31-Jul-2012 |
Mel Gorman <mgorman@suse.de> |
nfs: teach the NFS client how to treat PG_swapcache pages Replace all relevant occurences of page->index and page->mapping in the NFS client with the new page_file_index() and page_file_mapping() functions. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: David S. Miller <davem@davemloft.net> Cc: Eric B Munson <emunson@mgebm.net> Cc: Eric Paris <eparis@redhat.com> Cc: James Morris <jmorris@namei.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: Neil Brown <neilb@suse.de> Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Xiaotian Feng <dfeng@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
89d77c8f |
|
30-Jul-2012 |
Bryan Schumaker <bjschuma@netapp.com> |
NFS: Convert v4 into a module This patch exports symbols needed by the v4 module. In addition, I also switch over to using IS_ENABLED() to check if CONFIG_NFS_V4 or CONFIG_NFS_V4_MODULE are set. The module (nfs4.ko) will be created in the same directory as nfs.ko and will be automatically loaded the first time you try to mount over NFS v4. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
ddda8e0a |
|
30-Jul-2012 |
Bryan Schumaker <bjschuma@netapp.com> |
NFS: Convert v2 into a module The module (nfs2.ko) will be created in the same directory as nfs.ko and will be automatically loaded the first time you try to mount over NFS v2. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
1abb5088 |
|
20-Jun-2012 |
Bryan Schumaker <bjschuma@netapp.com> |
NFS: Create an read_pageio_init() function pNFS needs to select a read function based on the layout driver currently in use, so I let each NFS version decide how to best handle initializing reads. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
2701d086 |
|
24-May-2012 |
Andy Adamson <andros@netapp.com> |
NFSv4.1 add nfs_inode book keeping for mdsthreshold Keep track of the number of bytes read or written via buffered, direct, and mem-mapped i/o for use by mdsthreshold size_io hints. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
9f0ec176 |
|
27-Apr-2012 |
Andy Adamson <andros@netapp.com> |
NFSv4.1 set RPC_TASK_SOFTCONN for filelayout DS RPC calls RPC_TASK_SOFTCONN returns connection errors to the caller which allows the pNFS file layout to quickly try the MDS or perhaps another DS. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
1385b811 |
|
04-May-2012 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Fix sparse warnings Fix the following sparse warnings: fs/nfs/direct.c:221:6: warning: symbol 'nfs_direct_readpage_release' was not declared. Should it be static? fs/nfs/read.c:38:43: warning: non-ANSI function declaration of function 'nfs_readhdr_alloc' fs/nfs/objlayout/objio_osd.c:214:5: warning: symbol '__alloc_objio_seg' was not declared. Should it be static? Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Fred Isaman <iisaman@netapp.com> Cc: Boaz Harrosh <bharrosh@panasas.com>
|
#
4bd8b010 |
|
30-Apr-2012 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Simplify the nfs_read_completion functions Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Fred Isaman <iisaman@netapp.com>
|
#
25b11dcd |
|
30-Apr-2012 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Clean up nfs read and write error paths Move the error handling for nfs_generic_pagein() into a single function. Ditto for nfs_generic_flush(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Fred Isaman <iisaman@netapp.com>
|
#
9146ab50 |
|
01-May-2012 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Read cleanups Remove unused variables, and reformat some code. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Fred Isaman <iisaman@netapp.com>
|
#
584aa810 |
|
20-Apr-2012 |
Fred Isaman <iisaman@netapp.com> |
NFS: rewrite directio read to use async coalesce code This also has the advantage that it allows directio to use pnfs. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
061ae2ed |
|
20-Apr-2012 |
Fred Isaman <iisaman@netapp.com> |
NFS: create completion structure to pass into page_init functions Factors out the code that will need to change when directio starts using these code paths. This will allow directio to use the generic pagein and flush routines Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
4db6e0b7 |
|
20-Apr-2012 |
Fred Isaman <iisaman@netapp.com> |
NFS: merge _full and _partial read rpc_ops Decouple nfs_pgio_header and nfs_read_data, and have (possibly multiple) nfs_read_datas each take a refcount on nfs_pgio_header. For the moment keeps nfs_read_header as a way to preallocate a single nfs_read_data with the nfs_pgio_header. The code doesn't need this, and would be prettier without, but given the amount of churn I am already introducing I didn't want to play with tuning new mempools. This also fixes bug in pnfs_ld_handle_read_error. In the case of desc->pg_bsize < PAGE_CACHE_SIZE, the pages list was empty, causing replay attempt to do nothing. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
30dd374f |
|
20-Apr-2012 |
Fred Isaman <iisaman@netapp.com> |
NFS: create struct nfs_page_array Both nfs_read_data and nfs_write_data devote several fields which can be combined into a single shared struct. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
cd841605 |
|
20-Apr-2012 |
Fred Isaman <iisaman@netapp.com> |
NFS: create common nfs_pgio_header for both read and write In order to avoid duplicating all the data in nfs_read_data whenever we split it up into multiple RPC calls (either due to a short read result or due to rsize < PAGE_SIZE), we split out the bits that are the same per RPC call into a separate "header" structure. The goal this patch moves towards is to have a single header refcounted by several rpc_data structures. Thus, want to always refer from rpc_data to the header, and not the other way. This patch comes close to that ideal, but the directio code currently needs some special casing, isolated in the nfs_direct_[read_write]hdr_release() functions. This will be dealt with in a future patch. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
c5996c4e |
|
20-Apr-2012 |
Fred Isaman <iisaman@netapp.com> |
NFS: reverse arg order in nfs_initiate_[read|write] Make it consistent with nfs_initiate_commit. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
73fb7bc7 |
|
20-Apr-2012 |
Fred Isaman <iisaman@netapp.com> |
NFS: put open context on error in nfs_pagein_multi Cc: <stable@vger.kernel.org> Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
9ffc93f2 |
|
28-Mar-2012 |
David Howells <dhowells@redhat.com> |
Remove all #inclusions of asm/system.h Remove all #inclusions of asm/system.h preparatory to splitting and killing it. Performed with the following command: perl -p -i -e 's!^#\s*include\s*<asm/system[.]h>.*\n!!' `grep -Irl '^#\s*include\s*<asm/system[.]h>' *` Signed-off-by: David Howells <dhowells@redhat.com>
|
#
ea7c3303 |
|
19-Mar-2012 |
Bryan Schumaker <bjschuma@netapp.com> |
NFS: Remove nfs4_setup_sequence from generic read code This is an NFS v4 specific operation, so it belongs in the NFS v4 code and not the generic client. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
8dd37758 |
|
15-Mar-2012 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFSv4.1: Clean ups and bugfixes for the pNFS read/writeback/commit code Move more pnfs-isms out of the generic commit code. Bugfixes: - filelayout_scan_commit_lists doesn't need to get/put the lseg. In fact since it is run under the inode->i_lock, the lseg_put() can deadlock. - Ensure that we distinguish between what needs to be done for commit-to-data server and what needs to be done for commit-to-MDS using the new flag PG_COMMIT_TO_DS. Otherwise we may end up calling put_lseg() on a bucket for a struct nfs_page that got written through the MDS. - Fix a case where we were using list_del() on an nfs_page->wb_list instead of list_del_init(). - filelayout_initiate_commit needs to call filelayout_commit_release on error instead of the mds_ops->rpc_release(). Otherwise it won't clear the commit lock. Cleanups: - Let the files layout manage the commit lists for the pNFS case. Don't expose stuff like pnfs_choose_commit_list, and the fact that the commit buckets hold references to the layout segment in common code. - Cast out the put_lseg() calls for the struct nfs_read/write_data->lseg into the pNFS layer from whence they came. - Let the pNFS layer manage the NFS_INO_PNFS_COMMIT bit. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Fred Isaman <iisaman@netapp.com>
|
#
9d12b216 |
|
17-Jan-2012 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFSv41: Add a new helper nfs4_init_sequence() Clean up Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
62e4a769 |
|
10-Nov-2011 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Revert pnfs ugliness from the generic NFS read code path pNFS-specific code belongs in the pnfs layer. It should not be hijacking generic NFS read or write code paths. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
b6ee8cd2 |
|
19-Oct-2011 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Get rid of the nfs_rdata_mempool We don't need a mempool in order to guarantee reliable NFS read performance. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
fba73005 |
|
19-Oct-2011 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Don't rely on PageError in nfs_readpage_release_partial Don't rely on the PageError flag to tell us if one of the partial reads of the page failed. Instead, replace that with a dedicated flag in the struct nfs_page. Then clean out redundant uses of the PageError flag: the VM no longer checks it for reads. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
fbb5a9ab |
|
19-Oct-2011 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Get rid of unnecessary calls to ClearPageError() in read code The generic file read code does that for us anyway. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
d00c5d43 |
|
19-Oct-2011 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Get rid of nfs_restart_rpc() It can trivially be replaced with rpc_restart_call_prepare. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
9b7eecdc |
|
22-Sep-2011 |
Peng Tao <bergwolf@gmail.com> |
pnfs: recoalesce when ld read pagelist fails For pnfs pagelist read failure, we need to pg_recoalesce and resend IO to mds. Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Jim Rees <rees@umich.edu> Cc: stable@kernel.org [3.0] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
3d4ff43d |
|
22-Jun-2011 |
Al Viro <viro@zeniv.linux.org.uk> |
nfs_open_context doesn't need struct path either just dentry, please... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
1f945357 |
|
13-Jul-2011 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Clean up - simplify the switch to read/write-through-MDS Use nfs_pageio_reset_read_mds and nfs_pageio_reset_write_mds instead of completely reinitialising the struct nfs_pageio_descriptor. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
493292dd |
|
13-Jul-2011 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Move the pnfs read code into pnfs.c ...and ensure that we recoalese to take into account differences in block sizes when falling back to read through the MDS. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
d097971d |
|
12-Jul-2011 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Use the nfs_pageio_descriptor->pg_bsize in the read/write request Instead of looking up the rsize and wsize, the routines that generate the RPC requests should really be using the pg_bsize, since that is what we use when deciding whether or not to coalesce write requests... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
50828d7e |
|
12-Jul-2011 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Cache rpc_ops in struct nfs_pageio_descriptor Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
275acaaf |
|
12-Jul-2011 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Clean up: split out the RPC transmission from nfs_pagein_multi/one ...and do the same for nfs_flush_multi/one. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
3b609184 |
|
15-Jul-2011 |
Peng Tao <bergwolf@gmail.com> |
NFS: fix return value of nfs_pagein_one/nfs_flush_one Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
6e4efd56 |
|
12-Jul-2011 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Clean up nfs_read_rpcsetup and nfs_write_rpcsetup Split them up into two parts: one which sets up the struct nfs_read/write_data, the other which sets up the actual RPC call or pNFS call. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
7c24d948 |
|
13-Jun-2011 |
Andy Adamson <andros@netapp.com> |
NFSv4.1: File layout only supports whole file layouts Ask for whole file layouts. Until support for layout segments is fully supported in the file layout code, discard non-whole file layouts. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
e885de1a |
|
10-Jun-2011 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFSv4.1: Fall back to ordinary i/o through the mds if we have no layout segment Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
d8007d4d |
|
10-Jun-2011 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFSv4.1: Add an initialisation callback for pNFS Ensure that we always get a layout before setting up the i/o request. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
1751c363 |
|
10-Jun-2011 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Cleanup of the nfs_pageio code in preparation for a pnfs bugfix We need to ensure that the layouts are set up before we can decide to coalesce requests. To do so, we want to further split up the struct nfs_pageio_descriptor operations into an initialisation callback, a coalescing test callback, and a 'do i/o' callback. This patch cleans up the existing callback methods before adding the 'initialisation' callback. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
dfed206b |
|
25-May-2011 |
Benny Halevy <bhalevy@panasas.com> |
NFSv4.1: unify pnfs_pageio_init functions Use common code for pnfs_pageio_init_{read,write} and use a common generic pg_test function. Note that this function always assumes the the layout driver's pg_test method is implemented. [Fix BUG] Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com>
|
#
fb3296eb |
|
22-May-2011 |
Benny Halevy <bhalevy@panasas.com> |
pnfs: Use byte-range for layoutget Add offset and count parameters to pnfs_update_layout and use them to get the layout in the pageio path. Order cache layout segments in the following order: * offset (ascending) * length (descending) * iomode (RW before READ) Test byte range against the layout segment in use in pnfs_{read,write}_pg_test so not to coalesce pages not using the same layout segment. [fix lseg ordering] [clean up pnfs_find_lseg lseg arg] [remove unnecessary FIXME] [fix ordering in pnfs_insert_layout] [clean up pnfs_insert_layout] Signed-off-by: Benny Halevy <bhalevy@panasas.com>
|
#
a75b9df9 |
|
11-May-2011 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFSv4.1: Ensure that layoutget uses the correct gfp modes Currently, writebacks may end up recursing back into the filesystem due to GFP_KERNEL direct reclaims in the pnfs subsystem. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
8f68cd42 |
|
15-Mar-2011 |
Stephen Rothwell <sfr@canb.auug.org.au> |
nfs: BKL is no longer needed, so remove the include Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
36fe432d |
|
03-Mar-2011 |
Fred Isaman <iisaman@netapp.com> |
NFSv4.1: Clear lseg pointer in ->doio function Now that we have access to the pointer, clear it immediately after the put, instead of in caller. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
c76069bd |
|
03-Mar-2011 |
Fred Isaman <iisaman@netapp.com> |
NFSv4.1: rearrange ->doio args This will make it possible to clear the lseg pointer in the same function as it is put, instead of in the caller nfs_pageio_doio(). Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
cbdabc7f |
|
28-Feb-2011 |
Andy Adamson <andros@netapp.com> |
NFSv4.1: filelayout async error handler Use our own async error handler. Mark the layout as failed and retry i/o through the MDS on specified errors. Update the mds_offset in nfs_readpage_retry so that a failed short-read retry to a DS gets correctly resent through the MDS. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
dc70d7b3 |
|
28-Feb-2011 |
Andy Adamson <andros@netapp.com> |
NFSv4.1: filelayout read Attempt a pNFS file layout read by setting up the nfs_read_data struct and calling nfs_initiate_read with the data server rpc client and the filelayout rpc call ops. Error handling is implemented in a subsequent patch. Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com> Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Mingyang Guo <guomingyang@nrchpc.ac.cn> Signed-off-by: Oleg Drokin <green@linuxhacker.ru> Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Tested-by: Guo Mingyang <guomingyang@nrchpc.ac.cn> Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
64419a9b |
|
28-Feb-2011 |
Andy Adamson <andros@netapp.com> |
NFSv4.1: generic read Separate the rpc run portion of nfs_read_rpcsetup into a new function nfs_initiate_read that is called for normal NFS I/O. Add a pNFS read_pagelist function that is called instead of nfs_intitate_read for pNFS reads. Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com> Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Mike Sager <sager@netapp.com> Signed-off-by: Mingyang Guo <guomingyang@nrchpc.ac.cn> Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn> Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
bae724ef |
|
28-Feb-2011 |
Fred Isaman <iisaman@netapp.com> |
NFSv4.1: shift pnfs_update_layout locations Move the pnfs_update_layout call location to nfs_pageio_do_add_request(). Grab the lseg sent in the doio function to nfs_read_rpcsetup and attach it to each nfs_read_data so it can be sent to the layout driver. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com> Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Oleg Drokin <green@linuxhacker.ru> Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
94ad1c80 |
|
28-Feb-2011 |
Fred Isaman <iisaman@netapp.com> |
NFSv4.1: coelesce across layout stripes Add a pg_test layout driver hook which is used to avoid coelescing I/O across layout stripes. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com> Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Oleg Drokin <green@linuxhacker.ru> Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
2df485a7 |
|
07-Dec-2010 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
nfs: remove extraneous and problematic calls to nfs_clear_request When a nfs_page is freed, nfs_free_request is called which also calls nfs_clear_request to clean out the lock and open contexts and free the pagecache page. However, a couple of places in the nfs code call nfs_clear_request themselves. What happens here if the refcount on the request is still high? We'll be releasing contexts and freeing pointers while the request is possibly still in use. Remove those bare calls to nfs_clear_context. That should only be done when the request is being freed. Note that when doing this, we need to watch out for tests of req->wb_page. Previously, nfs_set_page_tag_locked() and nfs_clear_page_tag_locked() would check the value of req->wb_page to figure out if the page is mapped into the nfsi->nfs_page_tree. We now indicate the page is mapped using the new bit PG_MAPPED in req->wb_flags . Reported-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
e5e94017 |
|
19-Oct-2010 |
Benny Halevy <bhalevy@panasas.com> |
NFS: create and destroy inode's layout cache At the start of the io paths, try to grab the relevant layout information. This will initiate the inode's layout cache, but stubs ensure the cache stays empty. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Dean Hildebrand <dhildebz@umich.edu> Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn> Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
dfb4f309 |
|
24-Sep-2010 |
Benny Halevy <bhalevy@panasas.com> |
NFSv4.1: keep seq_res.sr_slot as pointer rather than an index Having to explicitly initialize sr_slotid to NFS4_MAX_SLOT_TABLE resulted in numerous bugs. Keeping the current slot as a pointer to the slot table is more straight forward and robust as it's implicitly set up to NULL wherever the seq_res member is initialized to zeroes. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
f11ac8db |
|
25-Jun-2010 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFSv4: Ensure that we track the NFSv4 lock state in read/write requests. This patch fixes bugzilla entry 14501: https://bugzilla.kernel.org/show_bug.cgi?id=14501 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
035168ab |
|
16-Jun-2010 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFSv4.1: Make nfs4_setup_sequence take a nfs_server argument In anticipation of the day when we have per-filesystem sessions, and also in order to allow the session to change in the event of a filesystem migration event. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
93870d76 |
|
12-May-2010 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Read requests can use GFP_KERNEL. There is no danger of deadlock should the allocation trigger page writeback. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
0110ee15 |
|
07-Dec-2009 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Fix up the declaration of nfs4_restart_rpc when NFSv4 not configured Also rename it: it is used in generic code, and so should not have a 'nfs4' prefix. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
d61e612a |
|
05-Dec-2009 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFSv41: Clean up slot table management We no longer need to maintain a distinction between nfs41_sequence_done and nfs41_sequence_free_slot. This fixes a number of slot table leakages in the NFSv4.1 code. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
e608e79f |
|
04-Dec-2009 |
Andy Adamson <andros@netapp.com> |
nfs41: call free slot from nfs4_restart_rpc nfs41_sequence_free_slot can be called multiple times on SEQUENCE operation errors. No reason to inline nfs4_restart_rpc Reported-by: Trond Myklebust <trond.myklebust@netapp.com> nfs_writeback_done and nfs_readpage_retry call nfs4_restart_rpc outside the error handler, and the slot is not freed prior to restarting in the rpc_prepare state during session reset. Fix this by moving the call to nfs41_sequence_free_slot from the error path of nfs41_sequence_done into nfs4_restart_rpc, and by removing the test for NFS4CLNT_SESSION_SETUP. Always free slot and goto the rpc prepare state on async errors. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
1ae88b2e |
|
12-Aug-2009 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Fix an O_DIRECT Oops... We can't call nfs_readdata_release()/nfs_writedata_release() without first initialising and referencing args.context. Doing so inside nfs_direct_read_schedule_segment()/nfs_direct_write_schedule_segment() causes an Oops. We should rather be calling nfs_readdata_free()/nfs_writedata_free() in those cases. Looking at the O_DIRECT code, the "struct nfs_direct_req" is already referencing the nfs_open_context for us. Since the readdata and writedata structures carry a reference to that, we can simplify things by getting rid of the extra nfs_open_context references, so that we can replace all instances of nfs_readdata_release()/nfs_writedata_release(). Reported-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Tested-by: Catalin Marinas <catalin.marinas@arm.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
405f5571 |
|
11-Jul-2009 |
Alexey Dobriyan <adobriyan@gmail.com> |
headers: smp_lock.h redux * Remove smp_lock.h from files which don't need it (including some headers!) * Add smp_lock.h to files which do need it * Make smp_lock.h include conditional in hardirq.h It's needed only for one kernel_locked() usage which is under CONFIG_PREEMPT This will make hardirq.h inclusion cheaper for every PREEMPT=n config (which includes allmodconfig/allyesconfig, BTW) Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
eedc020e |
|
01-Apr-2009 |
Andy Adamson <andros@netapp.com> |
nfs41: use rpc prepare call state for session reset [nfs41: change nfs4_restart_rpc argument] [nfs41: check for session not minorversion] [nfs41: trigger the state manager for session reset] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [always define nfs4_restart_rpc] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
f11c88af |
|
01-Apr-2009 |
Andy Adamson <andros@netapp.com> |
nfs41: read sequence setup/done support Implement the read rpc_call_prepare method for asynchronuos nfs rpcs, call nfs41_setup_sequence from respective rpc_call_validate_args methods. Call nfs4_sequence_done from respective rpc_call_done methods. Note that we need to pass a pointer to the nfs_server in calls data for passing on to nfs4_sequence_done. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [pnfs: client data server write validate and release] Signed-off-by: Andy Adamson <andros@umich.edu> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [move the nfs4_sequence_free_slot call in nfs_readpage_retry from] [nfs41: separate free slot from sequence done] [remove nfs_readargs.nfs_server, use calldata->inode instead] Signed-off-by: Andy Adamson <andros@umich.edu> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: Support sessions with O_DIRECT] Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: nfs4_sequence_free_slot use nfs_client for data server] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
5f7dbd5c |
|
01-Apr-2009 |
Andy Adamson <andros@netapp.com> |
nfs41: set up seq_res.sr_slotid Initialize nfs4_sequence_res sr_slotid to NFS4_MAX_SLOT_TABLE. [was nfs41: sequence res use slotid] Signed-off-by: Andy Adamson <andros@netapp.com> [pulled definition of struct nfs4_sequence_res.sr_slotid to here] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
7f8e05f6 |
|
03-Apr-2009 |
David Howells <dhowells@redhat.com> |
NFS: Store pages from an NFS inode into a local cache Store pages from an NFS inode into the cache data storage object associated with that inode. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Steve Dickson <steved@redhat.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Tested-by: Daire Byrne <Daire.Byrne@framestore.com>
|
#
9a9fc1c0 |
|
03-Apr-2009 |
David Howells <dhowells@redhat.com> |
NFS: Read pages from FS-Cache into an NFS inode Read pages from an FS-Cache data storage object representing an inode into an NFS inode. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Steve Dickson <steved@redhat.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Tested-by: Daire Byrne <Daire.Byrne@framestore.com>
|
#
f42b293d |
|
03-Apr-2009 |
David Howells <dhowells@redhat.com> |
NFS: nfs_readpage_async() needs to be accessible as a fallback for local caching nfs_readpage_async() needs to be non-static so that it can be used as a fallback for the local on-disk caching should an EIO crop up when reading the cache. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Steve Dickson <steved@redhat.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Tested-by: Daire Byrne <Daire.Byrne@framestore.com>
|
#
136221fc |
|
23-Dec-2008 |
Wu Fengguang <fengguang.wu@intel.com> |
nfs: remove redundant tests on reading new pages aops->readpages() and its NFS helper readpage_async_filler() will only be called to do readahead I/O for newly allocated pages. So it's not necessary to test for the always 0 dirty/uptodate page flags. The removal of nfs_wb_page() call also fixes a readahead bug: the NFS readahead has been synchronous since 2.6.23, because that call will clear PG_readahead, which is the reminder for asynchronous readahead. More background: the PG_readahead page flag is shared with PG_reclaim, one for read path and the other for write path. clear_page_dirty_for_io() unconditionally clears PG_readahead to prevent possible readahead residuals, assuming itself to be always called in the write path. However, NFS is one and the only exception in that it _always_ calls clear_page_dirty_for_io() in the read path, i.e. for readpages()/readpage(). Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Wu Fengguang <wfg@linux.intel.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
3110ff80 |
|
02-May-2008 |
Harvey Harrison <harvey.harrison@gmail.com> |
nfs: replace remaining __FUNCTION__ occurrences __FUNCTION__ is gcc-specific, use __func__ Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
dbae4c73 |
|
14-Apr-2008 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Ensure that rpc_run_task() errors are propagated back to the caller Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
fdd1e74c |
|
15-Apr-2008 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Ensure that the read code cleans up properly when rpc_run_task() fails In the case of readpage() we need to ensure that the pages get unlocked, and that the error is flagged. In the case of O_DIRECT, we need to ensure that the pages are all released. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
4af68bff |
|
19-Mar-2008 |
Fred Isaman <iisaman@citi.umich.edu> |
nfs: remove duplicate initializations of nfs_read_data field Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
f8512ad0 |
|
19-Mar-2008 |
Fred Isaman <iisaman@citi.umich.edu> |
nfs: don't ignore return value from nfs_pageio_add_request Ignoring the return value from nfs_pageio_add_request can cause deadlocks. In read path: call nfs_pageio_add_request from readpage_async_filler assume at this point that there are requests already in desc, that can't be merged with the current request. so nfs_pageio_doio is fired up to clear out desc. assume something goes wrong in setting up the io, so desc->pg_error is set. This causes nfs_pageio_add_request to return 0, *WITHOUT* adding the original request. BUT, since return code is ignored, readpage_async_filler assumes it has been added, and does nothing further, leaving page locked. do_generic_mapping_read will eventually call lock_page, resulting in deadlock In write path: page is marked dirty by generic_perform_write nfs_writepages is called call nfs_pageio_add_request from nfs_page_async_flush assume at this point that there are requests already in desc, that can't be merged with the current request. so nfs_pageio_doio is fired up to clear out desc. assume something goes wrong in setting up the io, so desc->pg_error is set. This causes nfs_page_async_flush to return 0, *WITHOUT* adding the original request, yet marking the request as locked (PG_BUSY) and in writeback, clearing dirty marks. The next time a write is done to the page, deadlock will result as nfs_write_end calls nfs_update_request Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
5e4424af |
|
25-Feb-2008 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
SUNRPC: Remove now-redundant RCU-safe rpc_task free path Now that we've tightened up the locking rules for RPC queue wakeups, we can remove the RCU-safe kfree calls... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
101070ca |
|
19-Feb-2008 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Ensure that the asynchronous RPC calls complete on nfsiod. We want to ensure that rpc_call_ops that involve mntput() are run on nfsiod rather than on rpciod, so that they don't deadlock when the resulting umount calls rpc_shutdown_client(). Hence we specify that read, write and commit calls must complete on nfsiod. Ditto for NFSv4 open, lock, locku and close asynchronous calls. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
383ba719 |
|
19-Feb-2008 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Fix a deadlock with lazy umount We can't allow rpc callback functions like task->tk_ops->rpc_call_prepare() and task->tk_ops->rpc_call_done() to call mntput() in any way, since that will cause a deadlock when the call to rpc_shutdown_client() attempts to wait on 'task' to complete. We can avoid the above deadlock by moving calls to mntput to task->tk_ops->rpc_release() callback, since at that time the task will be marked as completed, and so rpc_shutdown_client won't attempt to wait on it. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
eebd2aa3 |
|
04-Feb-2008 |
Christoph Lameter <clameter@sgi.com> |
Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user Simplify page cache zeroing of segments of pages through 3 functions zero_user_segments(page, start1, end1, start2, end2) Zeros two segments of the page. It takes the position where to start and end the zeroing which avoids length calculations and makes code clearer. zero_user_segment(page, start, end) Same for a single segment. zero_user(page, start, length) Length variant for the case where we know the length. We remove the zero_user_page macro. Issues: 1. Its a macro. Inline functions are preferable. 2. The KM_USER0 macro is only defined for HIGHMEM. Having to treat this special case everywhere makes the code needlessly complex. The parameter for zeroing is always KM_USER0 except in one single case that we open code. Avoiding KM_USER0 makes a lot of code not having to be dealing with the special casing for HIGHMEM anymore. Dealing with kmap is only necessary for HIGHMEM configurations. In those configurations we use KM_USER0 like we do for a series of other functions defined in highmem.h. Since KM_USER0 is depends on HIGHMEM the existing zero_user_page function could not be a macro. zero_user_* functions introduced here can be be inline because that constant is not used when these functions are called. Also extract the flushing of the caches to be outside of the kmap. [akpm@linux-foundation.org: fix nfs and ntfs build] [akpm@linux-foundation.org: fix ntfs build some more] Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Steven French <sfrench@us.ibm.com> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: <linux-ext4@vger.kernel.org> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Anton Altaparmakov <aia21@cantab.net> Cc: Mark Fasheh <mark.fasheh@oracle.com> Cc: David Chinner <dgc@sgi.com> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: Steven French <sfrench@us.ibm.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
3a10c30a |
|
22-Jan-2008 |
Benny Halevy <bhalevy@panasas.com> |
nfs: obliterate NFS_FLAGS macro use NFS_I(inode)->flags instead Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
07737691 |
|
25-Oct-2007 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS/SUNRPC: Convert users of rpc_init_task+rpc_execute to rpc_run_task() Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
bdc7f021 |
|
14-Jul-2007 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Clean up the (commit|read|write)_setup() callback routines Move the common code for setting up the nfs_write_data and nfs_read_data structures into fs/nfs/read.c, fs/nfs/write.c and fs/nfs/direct.c. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
3ff7576d |
|
14-Jul-2007 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
SUNRPC: Clean up the initialisation of priority queue scheduling info. We want the default scheduling priority (priority == 0) to remain RPC_PRIORITY_NORMAL. Also ensure that the priority wait queue scheduling is per process id instead of sometimes being per thread, and sometimes being per inode. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
84115e1c |
|
14-Jul-2007 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
SUNRPC: Cleanup of rpc_task initialisation Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
150030b7 |
|
06-Dec-2007 |
Matthew Wilcox <willy@infradead.org> |
NFS: Switch from intr mount option to TASK_KILLABLE By using the TASK_KILLABLE infrastructure, we can get rid of the 'intr' mount option. We have to use _killable everywhere instead of _interruptible as we get rid of rpc_clnt_sigmask/sigunmask. Signed-off-by: Liam R. Howlett <howlett@gmail.com> Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
|
#
8850df99 |
|
28-Sep-2007 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Fix atime revalidation in read() NFSv3 will correctly update atime on a read() call, so there is no need to set the NFS_INO_INVALID_ATIME flag unless the call to nfs_refresh_inode() fails. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
cd3758e3 |
|
10-Aug-2007 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Replace file->private_data with calls to nfs_file_open_context() Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
20c2df83 |
|
19-Jul-2007 |
Paul Mundt <lethal@linux-sh.org> |
mm: Remove slab destructors from kmem_cache_create(). Slab destructors were no longer supported after Christoph's c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been BUGs for both slab and slub, and slob never supported them either. This rips out support for the dtor pointer from kmem_cache_create() completely and fixes up every single callsite in the kernel (there were about 224, not including the slab allocator definitions themselves, or the documentation references). Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
#
88be9f99 |
|
05-Jun-2007 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Replace vfsmount and dentry in nfs_open_context with struct path Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
de05a0cc |
|
20-May-2007 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Minor read optimisation... Since PG_uptodate may now end up getting set during the call to nfs_wb_page(), we can avoid putting a read request on the wire in those situations. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
60945cb7 |
|
10-May-2007 |
Nate Diller <nate.diller@gmail.com> |
NFS: use zero_user_page Use zero_user_page() instead of the newly deprecated memclear_highpage_flush(). Signed-off-by: Nate Diller <nate.diller@gmail.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
8d5658c9 |
|
10-Apr-2007 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Fix a buffer overflow in the allocation of struct nfs_read/writedata Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
8b09bee3 |
|
02-Apr-2007 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Cleanup for nfs_readpages() Do the coalescing of read requests into block sized requests at start of I/O as we scan through the pages instead of going through a second pass. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
bcb71bba |
|
02-Apr-2007 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Another cleanup of the read/write request coalescing code Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
d8a5ad75 |
|
02-Apr-2007 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Cleanup the coalescing code Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
a3f565b1 |
|
30-Jan-2007 |
Chuck Lever <chuck.lever@oracle.com> |
NFS: fix print format for tk_pid The tk_pid field is an unsigned short. The proper print format specifier for that type is %5u, not %4d. Also clean up some miscellaneous print formatting nits. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
8e0969f0 |
|
13-Dec-2006 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Remove nfs_readpage_sync() It makes no sense to maintain 2 parallel systems for reading in pages. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
e18b890b |
|
06-Dec-2006 |
Christoph Lameter <clameter@sgi.com> |
[PATCH] slab: remove kmem_cache_t Replace all uses of kmem_cache_t with struct kmem_cache. The patch was generated using the following script: #!/bin/sh # # Replace one string by another in all the kernel sources. # set -e for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do quilt add $file sed -e "1,\$s/$1/$2/g" $file >/tmp/$$ mv /tmp/$$ $file quilt refresh done The script was run like this sh replace kmem_cache_t "struct kmem_cache" Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
e6b4f8da |
|
06-Dec-2006 |
Christoph Lameter <clameter@sgi.com> |
[PATCH] slab: remove SLAB_NOFS SLAB_NOFS is an alias of GFP_NOFS. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
49a70f27 |
|
04-Dec-2006 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Cleanup: add common helper nfs_page_length() Clean up a lot of ad-hoc page length calculations in fs/nfs/write.c Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
a99b71c9 |
|
17-Oct-2006 |
Frank Filz <ffilzlnx@us.ibm.com> |
NFS: Remove use of the Big Kernel Lock around calls to rpc_execute. Remove use of the Big Kernel Lock around calls to rpc_execute. Signed-off-by: Frank Filz <ffilz@us.ibm.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
cf1308ff |
|
19-Nov-2006 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Fix missing page_unlock() in nfs_readpage Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
0b671301 |
|
14-Nov-2006 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Fix asynchronous read error handling We must always call ->read_done() before we truncate the page data, or decide to flag an error. The reasons are that in NFSv2, ->read_done() is where the eof flag gets set. in NFSv3/v4 ->read_done() handles EJUKEBOX-type errors, and v4 state recovery. However, we need to mark the pages as uptodate before we deal with short read errors, since we may need to modify the nfs_read_data arguments. We therefore split the current nfs_readpage_result() into two parts: nfs_readpage_result(), which calls ->read_done() etc, and nfs_readpage_retry(), which subsequently handles short reads. Note: Removing the code that retries in case of a short read also fixes a bug in nfs_direct_read_result(), which used to return a corrupted number of bytes. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
8aca67f0 |
|
13-Nov-2006 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
SUNRPC: Fix a potential race in rpc_wake_up_task() Use RCU to ensure that we can safely call rpc_finish_wakeup after we've called __rpc_do_wake_up_task. If not, there is a theoretical race, in which the rpc_task finishes executing, and gets freed first. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
1a1d92c1 |
|
27-Sep-2006 |
Alexey Dobriyan <adobriyan@gmail.com> |
[PATCH] Really ignore kmem_cache_destroy return value * Rougly half of callers already do it by not checking return value * Code in drivers/acpi/osl.c does the following to be sure: (void)kmem_cache_destroy(cache); * Those who check it printk something, however, slab_error already printed the name of failed cache. * XFS BUGs on failed kmem_cache_destroy which is not the decision low-level filesystem driver should make. Converted to ignore. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
5f004cf2 |
|
14-Sep-2006 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Make read() return an ESTALE if the file has been deleted Currently, a read() request will return EIO even if the file has been deleted on the server, simply because that is what the VM will return if the call to readpage() fails to update the page. Ensure that readpage() marks the inode as stale if it receives an ESTALE. Then return that error to userland. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
54ceac45 |
|
22-Aug-2006 |
David Howells <dhowells@redhat.com> |
NFS: Share NFS superblocks per-protocol per-server per-FSID The attached patch makes NFS share superblocks between mounts from the same server and FSID over the same protocol. It does this by creating each superblock with a false root and returning the real root dentry in the vfsmount presented by get_sb(). The root dentry set starts off as an anonymous dentry if we don't already have the dentry for its inode, otherwise it simply returns the dentry we already have. We may thus end up with several trees of dentries in the superblock, and if at some later point one of anonymous tree roots is discovered by normal filesystem activity to be located in another tree within the superblock, the anonymous root is named and materialises attached to the second tree at the appropriate point. Why do it this way? Why not pass an extra argument to the mount() syscall to indicate the subpath and then pathwalk from the server root to the desired directory? You can't guarantee this will work for two reasons: (1) The root and intervening nodes may not be accessible to the client. With NFS2 and NFS3, for instance, mountd is called on the server to get the filehandle for the tip of a path. mountd won't give us handles for anything we don't have permission to access, and so we can't set up NFS inodes for such nodes, and so can't easily set up dentries (we'd have to have ghost inodes or something). With this patch we don't actually create dentries until we get handles from the server that we can use to set up their inodes, and we don't actually bind them into the tree until we know for sure where they go. (2) Inaccessible symbolic links. If we're asked to mount two exports from the server, eg: mount warthog:/warthog/aaa/xxx /mmm mount warthog:/warthog/bbb/yyy /nnn We may not be able to access anything nearer the root than xxx and yyy, but we may find out later that /mmm/www/yyy, say, is actually the same directory as the one mounted on /nnn. What we might then find out, for example, is that /warthog/bbb was actually a symbolic link to /warthog/aaa/xxx/www, but we can't actually determine that by talking to the server until /warthog is made available by NFS. This would lead to having constructed an errneous dentry tree which we can't easily fix. We can end up with a dentry marked as a directory when it should actually be a symlink, or we could end up with an apparently hardlinked directory. With this patch we need not make assumptions about the type of a dentry for which we can't retrieve information, nor need we assume we know its place in the grand scheme of things until we actually see that place. This patch reduces the possibility of aliasing in the inode and page caches for inodes that may be accessed by more than one NFS export. It also reduces the number of superblocks required for NFS where there are many NFS exports being used from a server (home directory server + autofs for example). This in turn makes it simpler to do local caching of network filesystems, as it can then be guaranteed that there won't be links from multiple inodes in separate superblocks to the same cache file. Obviously, cache aliasing between different levels of NFS protocol could still be a problem, but at least that gives us another key to use when indexing the cache. This patch makes the following changes: (1) The server record construction/destruction has been abstracted out into its own set of functions to make things easier to get right. These have been moved into fs/nfs/client.c. All the code in fs/nfs/client.c has to do with the management of connections to servers, and doesn't touch superblocks in any way; the remaining code in fs/nfs/super.c has to do with VFS superblock management. (2) The sequence of events undertaken by NFS mount is now reordered: (a) A volume representation (struct nfs_server) is allocated. (b) A server representation (struct nfs_client) is acquired. This may be allocated or shared, and is keyed on server address, port and NFS version. (c) If allocated, the client representation is initialised. The state member variable of nfs_client is used to prevent a race during initialisation from two mounts. (d) For NFS4 a simple pathwalk is performed, walking from FH to FH to find the root filehandle for the mount (fs/nfs/getroot.c). For NFS2/3 we are given the root FH in advance. (e) The volume FSID is probed for on the root FH. (f) The volume representation is initialised from the FSINFO record retrieved on the root FH. (g) sget() is called to acquire a superblock. This may be allocated or shared, keyed on client pointer and FSID. (h) If allocated, the superblock is initialised. (i) If the superblock is shared, then the new nfs_server record is discarded. (j) The root dentry for this mount is looked up from the root FH. (k) The root dentry for this mount is assigned to the vfsmount. (3) nfs_readdir_lookup() creates dentries for each of the entries readdir() returns; this function now attaches disconnected trees from alternate roots that happen to be discovered attached to a directory being read (in the same way nfs_lookup() is made to do for lookup ops). The new d_materialise_unique() function is now used to do this, thus permitting the whole thing to be done under one set of locks, and thus avoiding any race between mount and lookup operations on the same directory. (4) The client management code uses a new debug facility: NFSDBG_CLIENT which is set by echoing 1024 to /proc/net/sunrpc/nfs_debug. (5) Clone mounts are now called xdev mounts. (6) Use the dentry passed to the statfs() op as the handle for retrieving fs statistics rather than the root dentry of the superblock (which is now a dummy). Signed-Off-By: David Howells <dhowells@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
7a524111 |
|
15-Sep-2006 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Fix Oopsable condition in nfs_readpage_sync() Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
e9f7bee1 |
|
08-Sep-2006 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
[PATCH] NFS: large non-page-aligned direct I/O clobbers memory The logic in nfs_direct_read_schedule and nfs_direct_write_schedule can allow data->npages to be one larger than rpages. This causes a page pointer to be written beyond the end of the pagevec in nfs_read_data (or nfs_write_data). Fix this by making nfs_(read|write)_alloc() calculate the size of the pagevec array, and initialise data->npages. Also get rid of the redundant argument to nfs_commit_alloc(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
79558f36 |
|
22-Aug-2006 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Fix issue with EIO on NFS read The problem is that we may be caching writes that would extend the file and create a hole in the region that we are reading. In this case, we need to detect the eof from the server, ensure that we zero out the pages that are part of the hole and mark them as up to date. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> (cherry picked from 856b603b01b99146918c093969b6cb1b1b0f1c01 commit)
|
#
e4e20512 |
|
03-Aug-2006 |
Adrian Bunk <bunk@stusta.de> |
NFS: make 2 functions static nfs_writedata_free() and nfs_readdata_free() can now become static. Signed-off-by: Adrian Bunk <bunk@stusta.de> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> (cherry picked from 5e1ce40f0c3c8f67591aff17756930d7a18ceb1a commit)
|
#
6ab3d562 |
|
30-Jun-2006 |
Jörn Engel <joern@wohnheim.fh-wedel.de> |
Remove obsolete #include <linux/config.h> Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
#
266bee88 |
|
27-Jun-2006 |
David Brownell <david-b@pacbell.net> |
[PATCH] fix static linking of NFS Builds on ARM report link problems with common configurations like statically linked NFS (for nfsroot). The symptom is that __init section code references __exit section code; that won't work since the exit sections are discarded (since they can never be called). The best fix for these particular cases would be an "__init_or_exit" section annotation. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
f7b422b1 |
|
09-Jun-2006 |
David Howells <dhowells@redhat.com> |
NFS: Split fs/nfs/inode.c As fs/nfs/inode.c is rather large, heterogenous and unwieldy, the attached patch splits it up into a number of files: (*) fs/nfs/inode.c Strictly inode specific functions. (*) fs/nfs/super.c Superblock management functions for NFS and NFS4, normal access, clones and referrals. The NFS4 superblock functions _could_ move out into a separate conditionally compiled file, but it's probably not worth it as there're so many common bits. (*) fs/nfs/namespace.c Some namespace-specific functions have been moved here. (*) fs/nfs/nfs4namespace.c NFS4-specific namespace functions (this could be merged into the previous file). This file is conditionally compiled. (*) fs/nfs/internal.h Inter-file declarations, plus a few simple utility functions moved from fs/nfs/inode.c. Additionally, all the in-.c-file externs have been moved here, and those files they were moved from now includes this file. For the most part, the functions have not been changed, only some multiplexor functions have changed significantly. I've also: (*) Added some extra banner comments above some functions. (*) Rearranged the function order within the files to be more logical and better grouped (IMO), though someone may prefer a different order. (*) Reduced the number of #ifdefs in .c files. (*) Added missing __init and __exit directives. Signed-Off-By: David Howells <dhowells@redhat.com>
|
#
0d0b5cb3 |
|
24-May-2006 |
Chuck Lever <cel@netapp.com> |
NFS: Optimize allocation of nfs_read/write_data structures Clean up use of page_array, and fix an off-by-one error noticed by Tom Talpey which causes kmalloc calls in cases where using the page_array is sufficient. Test plan: Normal client functional testing with r/wsize=32768. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
1de3fc12 |
|
24-May-2006 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Clean up and fix page zeroing when we have short reads The code that is supposed to zero the uninitialised partial pages when the server returns a short read is currently broken: it looks at the nfs_page wb_pgbase and wb_bytes fields instead of the equivalent nfs_read_data values when deciding where to start truncating the page. Also ensure that we are more careful about setting PG_uptodate before retrying a short read: the retry will change the nfs_read_data args.pgbase and args.count. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
93d2341c |
|
26-Mar-2006 |
Matthew Dobson <colpatch@us.ibm.com> |
[PATCH] mempool: use mempool_create_slab_pool() Modify well over a dozen mempool users to call mempool_create_slab_pool() rather than calling mempool_create() with extra arguments, saving about 30 lines of code and increasing readability. Signed-off-by: Matthew Dobson <colpatch@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
3feb2d49 |
|
20-Mar-2006 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Uninline nfs_writedata_(alloc|free) and nfs_readdata_(alloc|free) Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
ec06c096 |
|
20-Mar-2006 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Cleanup of NFS read code Same callback hierarchy inversion as for the NFS write calls. This patch is not strictly speaking needed by the O_DIRECT code, but avoids confusing differences between the asynchronous read and write code. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
91d5b470 |
|
20-Mar-2006 |
Chuck Lever <cel@netapp.com> |
NFS: add I/O performance counters Invoke the byte and event counter macros where we want to count bytes and events. Clean-up: fix a possible NULL dereference in nfs_lock, and simplify nfs_file_open. Test-plan: fsx and iozone on UP and SMP systems, with and without pre-emption. Watch for memory overwrite bugs, and performance loss (significantly more CPU required per op). Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
40859d7e |
|
30-Nov-2005 |
Chuck Lever <cel@netapp.com> |
NFS: support large reads and writes on the wire Most NFS server implementations allow up to 64KB reads and writes on the wire. The Solaris NFS server allows up to a megabyte, for instance. Now the Linux NFS client supports transfer sizes up to 1MB, too. This will help reduce protocol and context switch overhead on read/write intensive NFS workloads, and support larger atomic read and write operations on servers that support them. Test-plan: Connectathon and iozone on mount point with wsize=rsize>32768 over TCP. Tests with NFS over UDP to verify the maximum RPC payload size cap. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
963d8fe5 |
|
03-Jan-2006 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
RPC: Clean up RPC task structure Shrink the RPC task structure. Instead of storing separate pointers for task->tk_exit and task->tk_release, put them in a structure. Also pass the user data pointer as a parameter instead of passing it via task->tk_calldata. This enables us to nest callbacks. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
d530838b |
|
04-Nov-2005 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFSv4: Fix problem with OPEN_DOWNGRADE RFC 3530 states that for OPEN_DOWNGRADE "The share_access and share_deny bits specified must be exactly equal to the union of the share_access and share_deny bits specified for some subset of the OPENs in effect for current openowner on the current file. Setattr is currently violating the NFSv4 rules for OPEN_DOWNGRADE in that it may cause a downgrade from OPEN4_SHARE_ACCESS_BOTH to OPEN4_SHARE_ACCESS_WRITE despite the fact that there exists no open file with O_WRONLY access mode. Fix the problem by replacing nfs4_find_state() with a modified version of nfs_find_open_context(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
0e574af1 |
|
27-Oct-2005 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
NFS: Cleanup initialisation of struct nfs_fattr Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
10d2c46f |
|
22-Sep-2005 |
Nick Wilson <njw@osdl.org> |
[PATCH] NFS: fix client oops when debugging is on nfs_readpage_release() causes an oops while accessing a file with NFS debugging turned on (echo 32767 > /proc/sys/sunrpc/nfs_debug) and a kernel built with CONFIG_DEBUG_SLAB. This patch moves the debugging statement above nfs_release_request() to avoid accessing freed memory. Signed-off-by: Nick Wilson <njw@osdl.org> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
dc59250c |
|
18-Aug-2005 |
Chuck Lever <cel@citi.umich.edu> |
[PATCH] NFS: Introduce the use of inode->i_lock to protect fields in nfsi Down the road we want to eliminate the use of the global kernel lock entirely from the NFS client. To do this, we need to protect the fields in the nfs_inode structure adequately. Start by serializing updates to the "cache_validity" field. Note this change addresses an SMP hang found by njw@osdl.org, where processes deadlock because nfs_end_data_update and nfs_revalidate_mapping update the "cache_validity" field without proper serialization. Test plan: Millions of fsx ops on SMP clients. Run Nick Wilson's breaknfs program on large SMP clients. Signed-off-by: Chuck Lever <cel@netapp.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
55296809 |
|
18-Aug-2005 |
Chuck Lever <cel@citi.umich.edu> |
[PATCH] NFS: split nfsi->flags into two fields Certain bits in nfsi->flags can be manipulated with atomic bitops, and some are better manipulated via logical bitmask operations. This patch splits the flags field into two. The next patch introduces atomic bitops for one of the fields. Test plan: Millions of fsx ops on SMP clients. Signed-off-by: Chuck Lever <cel@netapp.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
c6a556b8 |
|
22-Jun-2005 |
Trond Myklebust <Trond.Myklebust@netapp.com> |
[PATCH] NFS: Make searching and waiting on busy writeback requests more efficient. Basically copies the VFS's method for tracking writebacks and applies it to the struct nfs_page. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
#
1da177e4 |
|
16-Apr-2005 |
Linus Torvalds <torvalds@ppc970.osdl.org> |
Linux-2.6.12-rc2 Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!
|