#
4d927b03 |
|
20-Dec-2023 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Rename gfs2_withdrawn to gfs2_withdrawing_or_withdrawn This function checks whether the filesystem has been been marked to be withdrawn eventually or has been withdrawn already. Rename this function to avoid confusing code like checking for gfs2_withdrawing() when gfs2_withdrawn() has already returned true. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
1181f2d9 |
|
13-Nov-2023 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Fix inode_go_instantiate description Fixes a "function parameter or member gl not described in inode_go_instantiate" warning. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
bbacb395 |
|
01-Sep-2023 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Remove freeze_go_demote_ok Before commit b77b4a4815a9 ("gfs2: Rework freeze / thaw logic"), the freeze glock was kept around in the glock cache in shared mode without being actively held while a filesystem is in thawed state. In that state, memory pressure could have eventually evicted the freeze glock, and the freeze_go_demote_ok callback was needed to prevent that from happening. With the freeze / thaw rework, the freeze glock is now always actively held in shared mode while a filesystem is thawed, and the freeze_go_demote_ok hack is no longer needed. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
580f721b |
|
04-Oct-2023 |
Jeff Layton <jlayton@kernel.org> |
gfs2: convert to new timestamp accessors Convert to using the new inode timestamp accessor functions. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20231004185347.80880-38-jlayton@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
|
#
52954b75 |
|
11-Sep-2023 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Fix another freeze/thaw hang On a thawed filesystem, the freeze glock is held in shared mode. In order to initiate a cluster-wide freeze, the node initiating the freeze drops the freeze glock and grabs it in exclusive mode. The other nodes recognize this as contention on the freeze glock; function freeze_go_callback is invoked. This indicates to them that they must freeze the filesystem locally, drop the freeze glock, and then re-acquire it in shared mode before being able to unfreeze the filesystem locally. While a node is trying to re-acquire the freeze glock in shared mode, additional contention can occur. In that case, the node must behave in the same way as above. Unfortunately, freeze_go_callback() contains a check that causes it to bail out when the freeze glock isn't held in shared mode. Fix that to allow the glock to be unlocked or held in shared mode. In addition, update a reference to trylock_super() which has been renamed to super_trylock_shared() in the meantime. Fixes: b77b4a4815a9 ("gfs2: Rework freeze / thaw logic") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
e7beb8b6 |
|
23-Aug-2023 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Rename SDF_DEACTIVATING to SDF_KILL Rename the SDF_DEACTIVATING flag to SDF_KILL to make it more obvious that this relates to the kill_sb filesystem operation. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
8a8b8d91 |
|
05-Jul-2023 |
Jeff Layton <jlayton@kernel.org> |
gfs2: convert to ctime accessor functions In later patches, we're going to change how the inode's ctime field is used. Switch to using accessor functions instead of raw accesses of inode->i_ctime. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Message-Id: <20230705190309.579783-45-jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
|
#
f246dd4b |
|
21-Jun-2023 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs: Get rid of unnucessary locking in inode_go_dump Commit 27a2660f1ef9 ("gfs2: Dump nrpages for inodes and their glocks") added some locking around reading inode->i_data.nrpages. That locking doesn't do anything really, so get rid of it. With that, the glock argument to ->go_dump() can be made const again as well. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
b77b4a48 |
|
14-Nov-2022 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Rework freeze / thaw logic So far, at mount time, gfs2 would take the freeze glock in shared mode and then immediately drop it again, turning it into a cached glock that can be reclaimed at any time. To freeze the filesystem cluster-wide, the node initiating the freeze would take the freeze glock in exclusive mode, which would cause the freeze glock's freeze_go_sync() callback to run on each node. There, gfs2 would freeze the filesystem and schedule gfs2_freeze_func() to run. gfs2_freeze_func() would re-acquire the freeze glock in shared mode, thaw the filesystem, and drop the freeze glock again. The initiating node would keep the freeze glock held in exclusive mode. To thaw the filesystem, the initiating node would drop the freeze glock again, which would allow gfs2_freeze_func() to resume on all nodes, leaving the filesystem in the thawed state. It turns out that in freeze_go_sync(), we cannot reliably and safely freeze the filesystem. This is primarily because the final unmount of a filesystem takes a write lock on the s_umount rw semaphore before calling into gfs2_put_super(), and freeze_go_sync() needs to call freeze_super() which also takes a write lock on the same semaphore, causing a deadlock. We could work around this by trying to take an active reference on the super block first, which would prevent unmount from running at the same time. But that can fail, and freeze_go_sync() isn't actually allowed to fail. To get around this, this patch changes the freeze glock locking scheme as follows: At mount time, each node takes the freeze glock in shared mode. To freeze a filesystem, the initiating node first freezes the filesystem locally and then drops and re-acquires the freeze glock in exclusive mode. All other nodes notice that there is contention on the freeze glock in their go_callback callbacks, and they schedule gfs2_freeze_func() to run. There, they freeze the filesystem locally and drop and re-acquire the freeze glock before re-thawing the filesystem. This is happening outside of the glock state engine, so there, we are allowed to fail. From a cluster point of view, taking and immediately dropping a glock is indistinguishable from taking the glock and only dropping it upon contention, so this new scheme is compatible with the old one. Thanks to Li Dong <lidong@vivo.com> for reporting a locking bug in gfs2_freeze_func() in a previous version of this commit. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
644f6bf7 |
|
21-Apr-2023 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: gfs2_ail_empty_gl no log flush on error Before this patch, function gfs2_ail_empty_gl called gfs2_log_flush even in cases where it encountered an error. It should probably skip the log flush and leave the file system in an inconsistent state, letting a subsequent withdraw force the journal to be replayed to reestablish metadata consistency. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
b97e583c |
|
21-Apr-2023 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: Issue message when revokes cannot be written Before this patch, function gfs2_ail_empty_gl would silently return an error to the caller. This would get silently set into sd_log_error which would cause a withdraw, but there was no indication why the file system was withdrawn. This patch adds a fs_err to log the appropriate error message. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
24ab1582 |
|
21-Apr-2023 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: return errors from gfs2_ail_empty_gl Before this patch, function gfs2_ail_empty_gl did not return errors it encountered from __gfs2_trans_begin. Those errors usually came from the fact that the file system was made read-only, often due to unmount (but theoretically could be due to -o remount,ro), which prevented the transaction from starting. The inability to start a transaction prevented its revokes from being properly written to the journal for glocks during unmount (and transition to ro). That meant glocks could be unlocked without the metadata properly revoked in the journal. So other nodes could grab the glock thinking that their lvb values were correct, but in fact corresponded to the glock without its revokes properly synced. That presented as lvb mismatch errors. This patch allows gfs2_ail_empty_gl to return the error properly to the caller. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
55534c09 |
|
13-Apr-2023 |
Markus Elfring <Markus.Elfring@web.de> |
gfs2: Move variable assignment behind a null pointer check in inode_go_dump Since commit 27a2660f1ef9 ("gfs2: Dump nrpages for inodes and their glocks"), inode_go_dump() computes the address of inode within ip before checking if ip is NULL. This isn't a bug by itself, but it can give rise to bugs later. Avoid that by checking if ip is NULL first. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
cfcdb5ba |
|
27-Mar-2023 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Fix inode height consistency check The maximum allowed height of an inode's metadata tree depends on the filesystem block size; it is lower for bigger-block filesystems. When reading in an inode, make sure that the height doesn't exceed the maximum allowed height. Arrays like sd_heightsize are sized to be big enough for any filesystem block size; they will often be slightly bigger than what's needed for a specific filesystem. Reported-by: syzbot+45d4691b1ed3c48eba05@syzkaller.appspotmail.com Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
11551cf1 |
|
15-Dec-2022 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
gfs2: replace obvious uses of b_page with b_folio These places just use b_page to get to the buffer's address_space. Link: https://lkml.kernel.org/r/20221215214402.3522366-9-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
6c0246a9 |
|
05-Dec-2022 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: Cease delete work during unmount Add a check to delete_work_func() so that it quits when it finds that the filesystem is deactivating. This speeds up the delete workqueue draining in gfs2_kill_sb(). In addition, make sure that iopen_go_callback() won't queue any new delete work while the filesystem is deactivating. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
fd5f446f |
|
24-Jan-2023 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: check gl_object in rgrp glops Function gfs2_clear_rgrpd() is called during unmount to free all rgrps and their sub-objects. If the rgrp glock is held (e.g. in SH) it calls gfs2_glock_cb() to unlock, then calls flush_delayed_work() to make sure any glock work is finished. However, there is a race with other cluster nodes who may request the rgrp glock in another mode (say, EX). Func gfs2_clear_rgrpd() calls glock_clear_object() which sets gl_object to NULL but that's done without holding the gl_lockref spin_lock. While the lock is not held Another node's demote request can cause the state machine to run again, and since the gl_lockref is released in do_xmote, the second process's call to do_xmote can call go_inval (rgrp_go_inval) after the gl_object has been cleared, which results in NULL pointer reference of the rgrp glock's gl_object. Other go_inval glops functions don't require the gl_object to exist, as evidenced by function inode_go_inval() which explicitly checks for if (ip) before referencing gl_object. This patch does the same thing for rgrp glocks. Both the go_inval and go_sync ops are patched to check the existence of gl_object (rgd) before trying to dereference it. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
f0e56edc |
|
20-Dec-2022 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Split the two kinds of glock "delete" work Function delete_work_func() is used for two purposes: * to immediately try to evict the glock's inode, and * to verify after a little while that the inode has been deleted as expected, and didn't just get skipped. These two operations are not separated very well, so introduce two new glock flags to improved that. Split gfs2_queue_delete_work() into gfs2_queue_try_to_evict and gfs2_queue_verify_evict(). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
3056dc46 |
|
09-Dec-2022 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Get rid of GLF_PENDING_DELETE flag Get rid of the GLF_PENDING_DELETE glock flag introduced by commit a0e3cc65fa29 ("gfs2: Turn gl_delete into a delayed work"). The only use of that flag is to prevent the iopen glock from being demoted (i.e., unlocked) while delete work is pending. It turns out that demoting the iopen glock while delete work is pending is perfectly fine; we only need to make sure that the glock isn't being freed while still in use. This is ensured by the previous patch because delete_work_func() owns a reference while the work is queued or running. With these changes, gfs2_queue_delete_work() no longer takes the glock spin lock, so we can use it in iopen_go_callback() instead of open-coding it there. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
70376c7f |
|
04-Dec-2022 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Always check inode size of inline inodes Check if the inode size of stuffed (inline) inodes is within the allowed range when reading inodes from disk (gfs2_dinode_in()). This prevents us from on-disk corruption. The two checks in stuffed_readpage() and gfs2_unstuffer_page() that just truncate inline data to the maximum allowed size don't actually make sense, and they can be removed now as well. Reported-by: syzbot+7bb81dfa9cda07d9cd9d@syzkaller.appspotmail.com Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
7db35444 |
|
04-Dec-2022 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Cosmetic gfs2_dinode_{in,out} cleanup In each of the two functions, add an inode variable that points to &ip->i_inode and use that throughout the rest of the function. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
5f38a4d3 |
|
09-Jun-2022 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Make go_instantiate take a glock Make go_instantiate take a glock instead of a glock holder as its argument: this handler is supposed to instantiate the object associated with the glock. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
86c30a01 |
|
10-Jun-2022 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Add new go_held glock operation Right now, inode_go_instantiate() contains functionality that relates to how a glock is held rather than the glock itself, like waiting for pending direct I/O to complete and completing interrupted truncates. This code is meant to be run each time a holder is acquired, but go_instantiate is actually only called once, when the glock is instantiated. To fix that, introduce a new go_held glock operation that is called each time a glock holder is acquired. Move the holder specific code in inode_go_instantiate() over to inode_go_held(). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
de3f906f |
|
02-Jun-2022 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Revert 'Fix "truncate in progress" hang' Now that interrupted truncates are completed in the context of the process taking the glock, there is no need for the glock state engine to delegate that task to gfs2_quotad or for quotad to perform those truncates anymore. Get rid of the obsolete associated infrastructure. Reverts commit 813e0c46c9e2 ("GFS2: Fix "truncate in progress" hang"). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
#
53d69132 |
|
10-Jun-2022 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Instantiate glocks ouside of glock state engine Instantiate glocks outside of the glock state engine: there is no real reason for instantiating them inside the glock state engine; it only complicates the code. Instead, instantiate them in gfs2_glock_wait() and gfs2_glock_async_wait() using the new gfs2_glock_holder_ready() helper. On top of that, the only other place that acquires a glock without using gfs2_glock_wait() or gfs2_glock_async_wait() is gfs2_upgrade_iopen_glock(), so call gfs2_glock_holder_ready() there as well. If a dinode has a pending truncate, the glock-specific instantiate function for inodes wakes up the truncate function in the quota daemon. Waiting for the completion of the truncate was previously done by the glock state engine, but we now need to wait in inode_go_instantiate(). This also means that gfs2_instantiate() will now no longer return any "special" error codes. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
74382e27 |
|
14-Dec-2021 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: dump inode object for iopen glocks Before this patch, glock dumps would not dump the gl_object for iopen glocks. This information can help us debug problems related to eviction: when AN iopen glock is blocked we can see the status of its underlying inode and its flags, etc. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
1d05ee7e |
|
12-Nov-2021 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: remove redundant set of INSTANTIATE_NEEDED Function rgrp_go_inval calls gfs2_rgrp_brelse to invalidate the in-core rgrp structures. After the call it set GLF_INSTANTIATE_NEEDED, which is redundant, since gfs2_rgrp_brelse also sets it. This patch simply removes the redundant set_bit. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
4b3113a2 |
|
05-Oct-2021 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: remove RDF_UPTODATE flag The new GLF_INSTANTIATE_NEEDED flag obsoletes the old rgrp flag GFS2_RDF_UPTODATE, so this patch replaces it like we did with inodes. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
ec1d398d |
|
05-Oct-2021 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: Eliminate GIF_INVALID flag With the addition of the new GLF_INSTANTIATE_NEEDED flag, the GIF_INVALID flag is now redundant. This patch removes it. Since inode_instantiate is only called when instantiation is needed, the check in inode_instantiate is removed too. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
f2e70d8f |
|
06-Oct-2021 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: fix GL_SKIP node_scope problems Before this patch, when a glock was locked, the very first holder on the queue would unlock the lockref and call the go_instantiate glops function (if one existed), unless GL_SKIP was specified. When we introduced the new node-scope concept, we allowed multiple holders to lock glocks in EX mode and share the lock. But node-scope introduced a new problem: if the first holder has GL_SKIP and the next one does NOT, since it is not the first holder on the queue, the go_instantiate op was not called. Eventually the GL_SKIP holder may call the instantiate sub-function (e.g. gfs2_rgrp_bh_get) but there was still a window of time in which another non-GL_SKIP holder assumes the instantiate function had been called by the first holder. In the case of rgrp glocks, this led to a NULL pointer dereference on the buffer_heads. This patch tries to fix the problem by introducing two new glock flags: GLF_INSTANTIATE_NEEDED, which keeps track of when the instantiate function needs to be called to "fill in" or "read in" the object before it is referenced. GLF_INSTANTIATE_IN_PROG which is used to determine when a process is in the process of reading in the object. Whenever a function needs to reference the object, it checks the GLF_INSTANTIATE_NEEDED flag, and if set, it sets GLF_INSTANTIATE_IN_PROG and calls the glops "go_instantiate" function. As before, the gl_lockref spin_lock is unlocked during the IO operation, which may take a relatively long amount of time to complete. While unlocked, if another process determines go_instantiate is still needed, it sees GLF_INSTANTIATE_IN_PROG is set, and waits for the go_instantiate glop operation to be completed. Once GLF_INSTANTIATE_IN_PROG is cleared, it needs to check GLF_INSTANTIATE_NEEDED again because the other process's go_instantiate operation may not have been successful. Functions that previously called the instantiate sub-functions now call directly into gfs2_instantiate so the new bits are managed properly. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
3278b977 |
|
29-Sep-2021 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: change go_lock to go_instantiate Before this patch, the go_lock glock operations (glops) did not do any actual locking. They were used to instantiate objects, like reading in dinodes and rgrps from the media. This patch renames the functions to go_instantiate for clarity. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
c1442f6b |
|
12-Sep-2021 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: move GL_SKIP check from glops to do_promote Before this patch, each individual "go_lock" glock operation (glop) checked the GL_SKIP flag, and if set, would skip further processing. This patch changes the logic so the go_lock caller, function go_promote, checks the GL_SKIP flag before calling the go_lock op in the first place. This avoids having to unnecessarily unlock gl_lockref.lock only to re-lock it again. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
fffe9bee |
|
30-Jul-2021 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: Delay withdraw from atomic context Before this patch, if function __gfs2_ail_flush detected an error syncing the ail list, it call gfs2_ail_error which called gfs2_withdraw. Since __gfs2_ail_flush deals with a specific glock, we shouldn't withdraw immediately because the withdraw code (signal_our_withdraw) uses glocks in its processing. This patch changes the call from gfs2_withdraw to gfs2_withdraw_delayed which defers the withdraw until a more appropriate context, such as the logd daemon, discovers the intent to withdraw. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
#
69a61144 |
|
24-May-2021 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: trivial clean up of gfs2_ail_error This patch does not change function. It adds variable sdp to clean up function gfs2_ail_error and make it more readable. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
#
9d9b1605 |
|
01-Jun-2021 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: Fix glock recursion in freeze_go_xmote_bh We must not call gfs2_consist (which does a file system withdraw) from the freeze glock's freeze_go_xmote_bh function because the withdraw will try to use the freeze glock, thus causing a glock recursion error. This patch changes freeze_go_xmote_bh to call function gfs2_assert_withdraw_delayed instead of gfs2_consist to avoid recursion. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
#
4194dec4 |
|
19-May-2021 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: Fix I_NEW check in gfs2_dinode_in Patch 4a378d8a0d96 added a new check for I_NEW inodes, but unfortunately it used the wrong variable, i_flags. This caused GFS2 to withdraw when gfs2_lookup_by_inum needed to refresh an I_NEW inode. This patch switches to use the correct variable, i_state. Fixes: 4a378d8a0d96 ("gfs2: be careful with inode refresh") Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
c551f66c |
|
30-Mar-2021 |
Lee Jones <lee.jones@linaro.org> |
gfs2: Fix a number of kernel-doc warnings Building the kernel with W=1 results in a number of kernel-doc warnings like incorrect function names and parameter descriptions. Fix those, mostly by adding missing parameter descriptions, removing left-over descriptions, and demoting some less important kernel-doc comments into regular comments. Originally proposed by Lee Jones; improved and combined into a single patch by Andreas. Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
f68effb3 |
|
19-Mar-2021 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: Eliminate gh parameter from go_xmote_bh func The only glock that uses go_xmote_bh glops function is the freeze glock which uses freeze_go_xmote_bh. It does not use its gh parameter, so this patch eliminates the unneeded parameter. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
4a378d8a |
|
12-Feb-2021 |
Al Viro <viro@zeniv.linux.org.uk> |
gfs2: be careful with inode refresh 1) gfs2_dinode_in() should *not* touch ->i_rdev on live inodes; even "zero and immediately reread the same value from dinode" is broken - have it overlap with ->release() of char device and you can get all kinds of bogus behaviour. 2) mismatch on inode type on live inodes should be treated as fs corruption rather than blindly setting ->i_mode. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
2129b428 |
|
17-Dec-2020 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Per-revoke accounting in transactions In the log, revokes are stored as a revoke descriptor (struct gfs2_log_descriptor), followed by zero or more additional revoke blocks (struct gfs2_meta_header). On filesystems with a blocksize of 4k, the revoke descriptor contains up to 503 revokes, and the metadata blocks contain up to 509 revokes each. We've so far been reserving space for revokes in transactions in block granularity, so a lot more space than necessary was being allocated and then released again. This patch switches to assigning revokes to transactions individually instead. Initially, space for the revoke descriptor is reserved and handed out to transactions. When more revokes than that are reserved, additional revoke blocks are added. When the log is flushed, the space for the additional revoke blocks is released, but we keep the space for the revoke descriptor block allocated. Transactions may still reserve more revokes than they will actually need in the end, but now we won't overshoot the target as much, and by only returning the space for excess revokes at log flush time, we further reduce the amount of contention between processes. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
c968f578 |
|
29-Jan-2021 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Clean up on-stack transactions Replace the TR_ALLOCED flag by its inverse, TR_ONSTACK: that way, the flag only needs to be set in the exceptional case of on-stack transactions. Split off __gfs2_trans_begin from gfs2_trans_begin and use it to replace the open-coded version in gfs2_ail_empty_gl. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
15e20a30 |
|
03-Feb-2021 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Use sb_start_intwrite in gfs2_ail_empty_gl Commit 2e60d7683c8d ("GFS2: update freeze code to use freeze/thaw_super on all nodes") optimized away the sb_start_intwrite ... sb_end_intwrite protection for the on-stack transactions in gfs2_ail_empty_gl with no explanation. I can't think of a valid reason for doing that, so revert that change. This simplifies the next commit. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
f39e7d3a |
|
24-Nov-2020 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: Don't freeze the file system during unmount GFS2's freeze/thaw mechanism uses a special freeze glock to control its operation. It does this with a sync glock operation (glops.c) called freeze_go_sync. When the freeze glock is demoted (glock's do_xmote) the glops function causes the file system to be frozen. This is intended. However, GFS2's mount and unmount processes also hold the freeze glock to prevent other processes, perhaps on different cluster nodes, from mounting the frozen file system in read-write mode. Before this patch, there was no check in freeze_go_sync for whether a freeze in intended or whether the glock demote was caused by a normal unmount. So it was trying to freeze the file system it's trying to unmount, which ends up in a deadlock. This patch adds an additional check to freeze_go_sync so that demotes of the freeze glock are ignored if they come from the unmount process. Fixes: 20b329129009 ("gfs2: Fix regression in freeze_go_sync") Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
515b269d |
|
23-Nov-2020 |
Alexander Aring <aahringo@redhat.com> |
gfs2: set lockdep subclass for iopen glocks This patch introduce a new globs attribute to define the subclass of the glock lockref spinlock. This avoid the following lockdep warning, which occurs when we lock an inode lock while an iopen lock is held: ============================================ WARNING: possible recursive locking detected 5.10.0-rc3+ #4990 Not tainted -------------------------------------------- kworker/0:1/12 is trying to acquire lock: ffff9067d45672d8 (&gl->gl_lockref.lock){+.+.}-{3:3}, at: lockref_get+0x9/0x20 but task is already holding lock: ffff9067da308588 (&gl->gl_lockref.lock){+.+.}-{3:3}, at: delete_work_func+0x164/0x260 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&gl->gl_lockref.lock); lock(&gl->gl_lockref.lock); *** DEADLOCK *** May be due to missing lock nesting notation 3 locks held by kworker/0:1/12: #0: ffff9067c1bfdd38 ((wq_completion)delete_workqueue){+.+.}-{0:0}, at: process_one_work+0x1b7/0x540 #1: ffffac594006be70 ((work_completion)(&(&gl->gl_delete)->work)){+.+.}-{0:0}, at: process_one_work+0x1b7/0x540 #2: ffff9067da308588 (&gl->gl_lockref.lock){+.+.}-{3:3}, at: delete_work_func+0x164/0x260 stack backtrace: CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.10.0-rc3+ #4990 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014 Workqueue: delete_workqueue delete_work_func Call Trace: dump_stack+0x8b/0xb0 __lock_acquire.cold+0x19e/0x2e3 lock_acquire+0x150/0x410 ? lockref_get+0x9/0x20 _raw_spin_lock+0x27/0x40 ? lockref_get+0x9/0x20 lockref_get+0x9/0x20 delete_work_func+0x188/0x260 process_one_work+0x237/0x540 worker_thread+0x4d/0x3b0 ? process_one_work+0x540/0x540 kthread+0x127/0x140 ? __kthread_bind_mask+0x60/0x60 ret_from_fork+0x22/0x30 Suggested-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
16e6281b |
|
22-Nov-2020 |
Alexander Aring <aahringo@redhat.com> |
gfs2: Fix deadlock dumping resource group glocks Commit 0e539ca1bbbe ("gfs2: Fix NULL pointer dereference in gfs2_rgrp_dump") introduced additional locking in gfs2_rgrp_go_dump, which is also used for dumping resource group glocks via debugfs. However, on that code path, the glock spin lock is already taken in dump_glock, and taking it again in gfs2_glock2rgrp leads to deadlock. This can be reproduced with: $ mkfs.gfs2 -O -p lock_nolock /dev/FOO $ mount /dev/FOO /mnt/foo $ touch /mnt/foo/bar $ cat /sys/kernel/debug/gfs2/FOO/glocks Fix that by not taking the glock spin lock inside the go_dump callback. Fixes: 0e539ca1bbbe ("gfs2: Fix NULL pointer dereference in gfs2_rgrp_dump") Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
20b32912 |
|
18-Nov-2020 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: Fix regression in freeze_go_sync Patch 541656d3a513 ("gfs2: freeze should work on read-only mounts") changed the check for glock state in function freeze_go_sync() from "gl->gl_state == LM_ST_SHARED" to "gl->gl_req == LM_ST_EXCLUSIVE". That's wrong and it regressed gfs2's freeze/thaw mechanism because it caused only the freezing node (which requests the glock in EX) to queue freeze work. All nodes go through this go_sync code path during the freeze to drop their SHared hold on the freeze glock, allowing the freezing node to acquire it in EXclusive mode. But all the nodes must freeze access to the file system locally, so they ALL must queue freeze work. The freeze_work calls freeze_func, which makes a request to reacquire the freeze glock in SH, effectively blocking until the thaw from the EX holder. Once thawed, the freezing node drops its EX hold on the freeze glock, then the (blocked) freeze_func reacquires the freeze glock in SH again (on all nodes, including the freezer) so all nodes go back to a thawed state. This patch changes the check back to gl_state == LM_ST_SHARED like it was prior to 541656d3a513. Fixes: 541656d3a513 ("gfs2: freeze should work on read-only mounts") Cc: stable@vger.kernel.org # v5.8+ Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
4a55752a |
|
26-Oct-2020 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: Split up gfs2_meta_sync into inode and rgrp versions Before this patch, function gfs2_meta_sync called filemap_fdatawrite to write the address space for the metadata being synced. That's great for inodes, but resource groups all point to the same superblock-address space, sdp->sd_aspace. Each rgrp has its own range of blocks on which it should operate. That meant every time an rgrp's metadata was synced, it would write all of them instead of just the range. This patch eliminates function gfs2_meta_sync and tailors specific metasync functions for inodes and rgrps. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
ed3adb37 |
|
20-Oct-2020 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Ignore subsequent errors after withdraw in rgrp_go_sync Once a withdraw has occurred, ignore errors that are the consequence of the withdraw. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
23cfb0c3 |
|
15-Oct-2020 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: Eliminate gl_vm The gfs2_glock structure has a gl_vm member, introduced in commit 7005c3e4ae428 ("GFS2: Use range based functions for rgrp sync/invalidation"), which stores the location of resource groups within their address space. This structure is in a union with iopen glock specific fields. It was introduced because at unmount time, the resource group objects were destroyed before flushing out any pending resource group glock work, and flushing out such work could require flushing / truncating the address space. Since commit b3422cacdd7e6 ("gfs2: Rework how rgrp buffer_heads are managed"), any pending resource group glock work is flushed out before destroying the resource group objects. So the resource group objects will now always exist in rgrp_go_sync and rgrp_go_inval, and we now simply compute the gl_vm values where needed instead of caching them. This also eliminates the union. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
0e539ca1 |
|
06-Oct-2020 |
Andrew Price <anprice@redhat.com> |
gfs2: Fix NULL pointer dereference in gfs2_rgrp_dump When an rindex entry is found to be corrupt, compute_bitstructs() calls gfs2_consist_rgrpd() which calls gfs2_rgrp_dump() like this: gfs2_rgrp_dump(NULL, rgd->rd_gl, fs_id_buf); gfs2_rgrp_dump then dereferences the gl without checking it and we get BUG: KASAN: null-ptr-deref in gfs2_rgrp_dump+0x28/0x280 because there's no rgrp glock involved while reading the rindex on mount. Fix this by changing gfs2_rgrp_dump to take an rgrp argument. Reported-by: syzbot+43fa87986bdd31df9de6@syzkaller.appspotmail.com Signed-off-by: Andrew Price <anprice@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
541656d3 |
|
25-Jun-2020 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: freeze should work on read-only mounts Before this patch, function freeze_go_sync, called when promoting the freeze glock, was testing for the SDF_JOURNAL_LIVE superblock flag. That's only set for read-write mounts. Read-only mounts don't use a journal, so the bit is never set, so the freeze never happened. This patch removes the check for SDF_JOURNAL_LIVE for freeze requests but still checks it when deciding whether to flush a journal. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
#
cbcc89b6 |
|
05-Jun-2020 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: initialize transaction tr_ailX_lists earlier Since transactions may be freed shortly after they're created, before a log_flush occurs, we need to initialize their ail1 and ail2 lists earlier. Before this patch, the ail1 list was initialized in gfs2_log_flush(). This moves the initialization to the point when the transaction is first created. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
a0e3cc65 |
|
16-Jan-2020 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Turn gl_delete into a delayed work This requires flushing delayed work items in gfs2_make_fs_ro (which is called before unmounting a filesystem). When inodes are deleted and then recreated, pending gl_delete work items would have no effect because the inode generations will have changed, so we can cancel any pending gl_delete works before reusing iopen glocks. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
f286d627 |
|
13-Jan-2020 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Keep track of deleted inode generations in LVBs When deleting an inode, keep track of the generation of the deleted inode in the inode glock Lock Value Block (LVB). When trying to delete an inode remotely, check the last-known inode generation against the deleted inode generation to skip duplicate remote deletes. This avoids taking the resource group glock in order to verify the block type. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
bbae10fa |
|
08-May-2020 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: Don't ignore inode write errors during inode_go_sync Before for this patch, function inode_go_sync ignored io errors during inode_go_sync, overwriting them with metadata write errors: error = filemap_fdatawait(mapping); mapping_set_error(mapping, error); } error = filemap_fdatawait(metamapping); ... return error; So any errors returned by the inode write would be forgotten if the metadata write succeeded. This patch still does both writes, but only sets error if it's still zero. That way, any errors will be reported by to the caller, do_xmote, which will take appropriate action and report the error. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
1c634f94 |
|
13-Nov-2019 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: Do proper error checking for go_sync family of glops functions Before this patch, function do_xmote would try to sync out the glock dirty data by calling the appropriate glops function XXX_go_sync() but it did not check for a good return code. If the sync was not possible due to an io error or whatever, do_xmote would continue on and call go_inval and release the glock to other cluster nodes. When those nodes go to replay the journal, they may already be holding glocks for the journal records that should have been synced, but were not due to the ignored error. This patch introduces proper error code checking to the go_sync family of glops functions. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
9ff78289 |
|
13-Nov-2019 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: Do log_flush in gfs2_ail_empty_gl even if ail list is empty Before this patch, if gfs2_ail_empty_gl saw there was nothing on the ail list, it would return and not flush the log. The problem is that there could still be a revoke for the rgrp sitting on the sd_log_le_revoke list that's been recently taken off the ail list. But that revoke still needs to be written, and the rgrp_go_inval still needs to call log_flush_wait to ensure the revokes are all properly written to the journal before we relinquish control of the glock to another node. If we give the glock to another node before we have this knowledge, the node might crash and its journal replayed, in which case the missing revoke would allow the journal replay to replay the rgrp over top of the rgrp we already gave to another node, thus overwriting its changes and corrupting the file system. This patch makes gfs2_ail_empty_gl still call gfs2_log_flush rather than returning. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
33dbd1e4 |
|
22-May-2019 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: fix infinite loop when checking ail item count before go_inval Before this patch, the rgrp_go_inval and inode_go_inval functions each checked if there were any items left on the ail count (by way of a count), and if so, did a withdraw. But the withdraw code now uses glocks when changing the file system to read-only status. So we can not have glock functions withdrawing or a hang will likely result: The glocks can't be serviced by the work_func if the work_func is busy doing its own withdraw. This patch removes the checks from the go_inval functions and adds a centralized check in do_xmote to warn about the problem and not withdraw, but flag the error so it's eventually caught when the logd daemon eventually runs. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
601ef0d5 |
|
28-Jan-2020 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: Force withdraw to replay journals and wait for it to finish When a node withdraws from a file system, it often leaves its journal in an incomplete state. This is especially true when the withdraw is caused by io errors writing to the journal. Before this patch, a withdraw would try to write a "shutdown" record to the journal, tell dlm it's done with the file system, and none of the other nodes know about the problem. Later, when the problem is fixed and the withdrawn node is rebooted, it would then discover that its own journal was incomplete, and replay it. However, replaying it at this point is almost guaranteed to introduce corruption because the other nodes are likely to have used affected resource groups that appeared in the journal since the time of the withdraw. Replaying the journal later will overwrite any changes made, and not through any fault of dlm, which was instructed during the withdraw to release those resources. This patch makes file system withdraws seen by the entire cluster. Withdrawing nodes dequeue their journal glock to allow recovery. The remaining nodes check all the journals to see if they are clean or in need of replay. They try to replay dirty journals, but only the journals of withdrawn nodes will be "not busy" and therefore available for replay. Until the journal replay is complete, no i/o related glocks may be given out, to ensure that the replay does not cause the aforementioned corruption: We cannot allow any journal replay to overwrite blocks associated with a glock once it is held. The "live" glock which is now used to signal when a withdraw occurs. When a withdraw occurs, the node signals its withdraw by dequeueing the "live" glock and trying to enqueue it in EX mode, thus forcing the other nodes to all see a demote request, by way of a "1CB" (one callback) try lock. The "live" glock is not granted in EX; the callback is only just used to indicate a withdraw has occurred. Note that all nodes in the cluster must wait for the recovering node to finish replaying the withdrawing node's journal before continuing. To this end, it checks that the journals are clean multiple times in a retry loop. Also note that the withdraw function may be called from a wide variety of situations, and therefore, we need to take extra precautions to make sure pointers are valid before using them in many circumstances. We also need to take care when glocks decide to withdraw, since the withdraw code now uses glocks. Also, before this patch, if a process encountered an error and decided to withdraw, if another process was already withdrawing, the second withdraw would be silently ignored, which set it free to unlock its glocks. That's correct behavior if the original withdrawer encounters further errors down the road. But if secondary waiters don't wait for the journal replay, unlocking glocks will allow other nodes to use them, despite the fact that the journal containing those blocks is being replayed. The replay needs to finish before our glocks are released to other nodes. IOW, secondary withdraws need to wait for the first withdraw to finish. For example, if an rgrp glock is unlocked by a process that didn't wait for the first withdraw, a journal replay could introduce file system corruption by replaying a rgrp block that has already been granted to a different cluster node. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
#
a72d2401 |
|
13-Jun-2019 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: Allow some glocks to be used during withdraw We need to allow some glocks to be enqueued, dequeued, promoted, and demoted when we're withdrawn. For example, to maintain metadata integrity, we should disallow the use of inode and rgrp glocks when withdrawn. Other glocks, like iopen or the transaction glocks may be safely used because none of their metadata goes through the journal. So in general, we should disallow all glocks with an address space, and allow all the others. One exception is: we need to allow our active journal to be demoted so others may recover it. Allowing glocks after withdraw gives us the ability to take appropriate action (in a following patch) to have our journal properly replayed by another node rather than just abandoning the current transactions and pretending nothing bad happened, leaving the other nodes free to modify the blocks we had in our journal, which may result in file system corruption. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
#
b3422cac |
|
13-Nov-2019 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: Rework how rgrp buffer_heads are managed Before this patch, the rgrp code had a serious problem related to how it managed buffer_heads for resource groups. The problem caused file system corruption, especially in cases of journal replay. When an rgrp glock was demoted to transfer ownership to a different cluster node, do_xmote() first calls rgrp_go_sync and then rgrp_go_inval, as expected. When it calls rgrp_go_sync, that called gfs2_rgrp_brelse() that dropped the buffer_head reference count. In most cases, the reference count went to zero, which is right. However, there were other places where the buffers are handled differently. After rgrp_go_sync, do_xmote called rgrp_go_inval which called gfs2_rgrp_brelse a second time, then rgrp_go_inval's call to truncate_inode_pages_range would get rid of the pages in memory, but only if the reference count drops to 0. Unfortunately, gfs2_rgrp_brelse was setting bi->bi_bh = NULL. So when rgrp_go_sync called gfs2_rgrp_brelse, it lost the pointer to the buffer_heads in cases where the reference count was still 1. Therefore, when rgrp_go_inval called gfs2_rgrp_brelse a second time, it failed the check for "if (bi->bi_bh)" and thus failed to call brelse a second time. Because of that, the reference count on those buffers sometimes failed to drop from 1 to 0. And that caused function truncate_inode_pages_range to keep the pages in page cache rather than freeing them. The next time the rgrp glock was acquired, the metadata read of the rgrp buffers re-used the pages in memory, which were now wrong because they were likely modified by the other node who acquired the glock in EX (which is why we demoted the glock). This re-use of the page cache caused corruption because changes made by the other nodes were never seen, so the bitmaps were inaccurate. For some reason, the problem became most apparent when journal replay forced the replay of rgrps in memory, which caused newer rgrp data to be overwritten by the older in-core pages. A big part of the problem was that the rgrp buffer were released in multiple places: The go_unlock function would release them when the glock was released rather than when the glock is demoted, which is clearly wrong because our intent was to cache them until the glock is demoted from SH or EX. This patch attempts to clean up the mess and make one consistent and centralized mechanism for managing the rgrp buffer_heads by implementing several changes: 1. It eliminates the call to gfs2_rgrp_brelse() from rgrp_go_sync. We don't want to release the buffers or zero the pointers when syncing for the reasons stated above. It only makes sense to release them when the glock is actually invalidated (go_inval). And when we do, then we set the bh pointers to NULL. 2. The go_unlock function (which was only used for rgrps) is eliminated, as we've talked about doing many times before. The go_unlock function was called too early in the glock dq process, and should not happen until the glock is invalidated. 3. It also eliminates the call to rgrp_brelse in gfs2_clear_rgrpd. That will now happen automatically when the rgrp glocks are demoted, and shouldn't happen any sooner or later than that. Instead, function gfs2_clear_rgrpd has been modified to demote the rgrp glocks, and therefore, free those pages, before the remaining glocks are culled by gfs2_gl_hash_clear. This prevents the gl_object from hanging around when the glocks are culled. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
badb55ec |
|
23-Jan-2020 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Split gfs2_lm_withdraw into two functions Split gfs2_lm_withdraw into a function that prints an error message and a function that withdraws the filesystem. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
#
2e9eeaa1 |
|
13-Dec-2019 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: eliminate ssize parameter from gfs2_struct2blk Every caller of function gfs2_struct2blk specified sizeof(u64). This patch eliminates the unnecessary parameter and replaces the size calculation with a new superblock variable that is computed to be the maximum number of block pointers we can fit inside a log descriptor, as is done for pointers per dinode and indirect block. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andrew Price <anprice@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
eb43e660 |
|
14-Nov-2019 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: Introduce function gfs2_withdrawn Add function gfs2_withdrawn and replace all checks for the SDF_WITHDRAWN bit to call it. This does not change the logic or function of gfs2, and it facilitates later improvements to the withdraw sequence. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
098b9c14 |
|
04-Oct-2019 |
Aliasgar Surti <aliasgar.surti500@gmail.com> |
gfs2: removed unnecessary semicolon There is use of unnecessary semicolon after switch case. Removed the semicolon. Signed-off-by: Aliasgar Surti <aliasgar.surti500@gmail.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
f29e62ee |
|
13-May-2019 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: replace more printk with calls to fs_info and friends This patch replaces a few leftover printk errors with calls to fs_info and similar, so that the file system having the error is properly logged. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
3792ce97 |
|
09-May-2019 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: dump fsid when dumping glock problems Before this patch, if a glock error was encountered, the glock with the problem was dumped. But sometimes you may have lots of file systems mounted, and that doesn't tell you which file system it was for. This patch adds a new boolean parameter fsid to the dump_glock family of functions. For non-error cases, such as dumping the glocks debugfs file, the fsid is not dumped in order to keep lock dumps and glocktop as clean as possible. For all error cases, such as GLOCK_BUG_ON, the file system id is now printed. This will make it easier to debug. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
04aea0ca |
|
07-May-2019 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: Rename SDF_SHUTDOWN to SDF_WITHDRAWN Before this patch, the superblock flag indicating when a file system is withdrawn was called SDF_SHUTDOWN. This patch simply renames it to the more obvious SDF_WITHDRAWN. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
7336d0e6 |
|
31-May-2019 |
Thomas Gleixner <tglx@linutronix.de> |
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 398 Based on 1 normalized pattern(s): this copyrighted material is made available to anyone wishing to use modify copy or redistribute it subject to the terms and conditions of the gnu general public license version 2 extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 44 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190531081038.653000175@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
f4686c26 |
|
02-May-2019 |
Abhi Das <adas@redhat.com> |
gfs2: read journal in large chunks Use bios to read in the journal into the address space of the journal inode (jd_inode), sequentially and in large chunks. This is faster for locating the journal head that the previous binary search approach. When performing recovery, we keep the journal in the address space until recovery is done, which further speeds up things. Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
23e93c9b |
|
13-Feb-2019 |
Bob Peterson <rpeterso@redhat.com> |
Revert "gfs2: read journal in large chunks to locate the head" This reverts commit 2a5f14f279f59143139bcd1606903f2f80a34241. This patch causes xfstests generic/311 to fail. Reverting this for now until we have a proper fix. Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
27a2660f |
|
18-Apr-2018 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: Dump nrpages for inodes and their glocks This patch is based on an idea from Steve Whitehouse. The idea is to dump the number of pages for inodes in the glock dumps. The additional locking required me to drop const from quite a few places. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
2a5f14f2 |
|
09-Nov-2018 |
Abhi Das <adas@redhat.com> |
gfs2: read journal in large chunks to locate the head Use bio(s) to read in the journal sequentially in large chunks and locate the head of the journal. This version addresses the issues Christoph pointed out w.r.t error handling and using deprecated API. Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Cc: Christoph Hellwig <hch@infradead.org>
|
#
95582b00 |
|
08-May-2018 |
Deepa Dinamani <deepa.kernel@gmail.com> |
vfs: change inode times to use struct timespec64 struct timespec is not y2038 safe. Transition vfs to use y2038 safe struct timespec64 instead. The change was made with the help of the following cocinelle script. This catches about 80% of the changes. All the header file and logic changes are included in the first 5 rules. The rest are trivial substitutions. I avoid changing any of the function signatures or any other filesystem specific data structures to keep the patch simple for review. The script can be a little shorter by combining different cases. But, this version was sufficient for my usecase. virtual patch @ depends on patch @ identifier now; @@ - struct timespec + struct timespec64 current_time ( ... ) { - struct timespec now = current_kernel_time(); + struct timespec64 now = current_kernel_time64(); ... - return timespec_trunc( + return timespec64_trunc( ... ); } @ depends on patch @ identifier xtime; @@ struct \( iattr \| inode \| kstat \) { ... - struct timespec xtime; + struct timespec64 xtime; ... } @ depends on patch @ identifier t; @@ struct inode_operations { ... int (*update_time) (..., - struct timespec t, + struct timespec64 t, ...); ... } @ depends on patch @ identifier t; identifier fn_update_time =~ "update_time$"; @@ fn_update_time (..., - struct timespec *t, + struct timespec64 *t, ...) { ... } @ depends on patch @ identifier t; @@ lease_get_mtime( ... , - struct timespec *t + struct timespec64 *t ) { ... } @te depends on patch forall@ identifier ts; local idexpression struct inode *inode_node; identifier i_xtime =~ "^i_[acm]time$"; identifier ia_xtime =~ "^ia_[acm]time$"; identifier fn_update_time =~ "update_time$"; identifier fn; expression e, E3; local idexpression struct inode *node1; local idexpression struct inode *node2; local idexpression struct iattr *attr1; local idexpression struct iattr *attr2; local idexpression struct iattr attr; identifier i_xtime1 =~ "^i_[acm]time$"; identifier i_xtime2 =~ "^i_[acm]time$"; identifier ia_xtime1 =~ "^ia_[acm]time$"; identifier ia_xtime2 =~ "^ia_[acm]time$"; @@ ( ( - struct timespec ts; + struct timespec64 ts; | - struct timespec ts = current_time(inode_node); + struct timespec64 ts = current_time(inode_node); ) <+... when != ts ( - timespec_equal(&inode_node->i_xtime, &ts) + timespec64_equal(&inode_node->i_xtime, &ts) | - timespec_equal(&ts, &inode_node->i_xtime) + timespec64_equal(&ts, &inode_node->i_xtime) | - timespec_compare(&inode_node->i_xtime, &ts) + timespec64_compare(&inode_node->i_xtime, &ts) | - timespec_compare(&ts, &inode_node->i_xtime) + timespec64_compare(&ts, &inode_node->i_xtime) | ts = current_time(e) | fn_update_time(..., &ts,...) | inode_node->i_xtime = ts | node1->i_xtime = ts | ts = inode_node->i_xtime | <+... attr1->ia_xtime ...+> = ts | ts = attr1->ia_xtime | ts.tv_sec | ts.tv_nsec | btrfs_set_stack_timespec_sec(..., ts.tv_sec) | btrfs_set_stack_timespec_nsec(..., ts.tv_nsec) | - ts = timespec64_to_timespec( + ts = ... -) | - ts = ktime_to_timespec( + ts = ktime_to_timespec64( ...) | - ts = E3 + ts = timespec_to_timespec64(E3) | - ktime_get_real_ts(&ts) + ktime_get_real_ts64(&ts) | fn(..., - ts + timespec64_to_timespec(ts) ,...) ) ...+> ( <... when != ts - return ts; + return timespec64_to_timespec(ts); ...> ) | - timespec_equal(&node1->i_xtime1, &node2->i_xtime2) + timespec64_equal(&node1->i_xtime2, &node2->i_xtime2) | - timespec_equal(&node1->i_xtime1, &attr2->ia_xtime2) + timespec64_equal(&node1->i_xtime2, &attr2->ia_xtime2) | - timespec_compare(&node1->i_xtime1, &node2->i_xtime2) + timespec64_compare(&node1->i_xtime1, &node2->i_xtime2) | node1->i_xtime1 = - timespec_trunc(attr1->ia_xtime1, + timespec64_trunc(attr1->ia_xtime1, ...) | - attr1->ia_xtime1 = timespec_trunc(attr2->ia_xtime2, + attr1->ia_xtime1 = timespec64_trunc(attr2->ia_xtime2, ...) | - ktime_get_real_ts(&attr1->ia_xtime1) + ktime_get_real_ts64(&attr1->ia_xtime1) | - ktime_get_real_ts(&attr.ia_xtime1) + ktime_get_real_ts64(&attr.ia_xtime1) ) @ depends on patch @ struct inode *node; struct iattr *attr; identifier fn; identifier i_xtime =~ "^i_[acm]time$"; identifier ia_xtime =~ "^ia_[acm]time$"; expression e; @@ ( - fn(node->i_xtime); + fn(timespec64_to_timespec(node->i_xtime)); | fn(..., - node->i_xtime); + timespec64_to_timespec(node->i_xtime)); | - e = fn(attr->ia_xtime); + e = fn(timespec64_to_timespec(attr->ia_xtime)); ) @ depends on patch forall @ struct inode *node; struct iattr *attr; identifier i_xtime =~ "^i_[acm]time$"; identifier ia_xtime =~ "^ia_[acm]time$"; identifier fn; @@ { + struct timespec ts; <+... ( + ts = timespec64_to_timespec(node->i_xtime); fn (..., - &node->i_xtime, + &ts, ...); | + ts = timespec64_to_timespec(attr->ia_xtime); fn (..., - &attr->ia_xtime, + &ts, ...); ) ...+> } @ depends on patch forall @ struct inode *node; struct iattr *attr; struct kstat *stat; identifier ia_xtime =~ "^ia_[acm]time$"; identifier i_xtime =~ "^i_[acm]time$"; identifier xtime =~ "^[acm]time$"; identifier fn, ret; @@ { + struct timespec ts; <+... ( + ts = timespec64_to_timespec(node->i_xtime); ret = fn (..., - &node->i_xtime, + &ts, ...); | + ts = timespec64_to_timespec(node->i_xtime); ret = fn (..., - &node->i_xtime); + &ts); | + ts = timespec64_to_timespec(attr->ia_xtime); ret = fn (..., - &attr->ia_xtime, + &ts, ...); | + ts = timespec64_to_timespec(attr->ia_xtime); ret = fn (..., - &attr->ia_xtime); + &ts); | + ts = timespec64_to_timespec(stat->xtime); ret = fn (..., - &stat->xtime); + &ts); ) ...+> } @ depends on patch @ struct inode *node; struct inode *node2; identifier i_xtime1 =~ "^i_[acm]time$"; identifier i_xtime2 =~ "^i_[acm]time$"; identifier i_xtime3 =~ "^i_[acm]time$"; struct iattr *attrp; struct iattr *attrp2; struct iattr attr ; identifier ia_xtime1 =~ "^ia_[acm]time$"; identifier ia_xtime2 =~ "^ia_[acm]time$"; struct kstat *stat; struct kstat stat1; struct timespec64 ts; identifier xtime =~ "^[acmb]time$"; expression e; @@ ( ( node->i_xtime2 \| attrp->ia_xtime2 \| attr.ia_xtime2 \) = node->i_xtime1 ; | node->i_xtime2 = \( node2->i_xtime1 \| timespec64_trunc(...) \); | node->i_xtime2 = node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \); | node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \); | stat->xtime = node2->i_xtime1; | stat1.xtime = node2->i_xtime1; | ( node->i_xtime2 \| attrp->ia_xtime2 \) = attrp->ia_xtime1 ; | ( attrp->ia_xtime1 \| attr.ia_xtime1 \) = attrp2->ia_xtime2; | - e = node->i_xtime1; + e = timespec64_to_timespec( node->i_xtime1 ); | - e = attrp->ia_xtime1; + e = timespec64_to_timespec( attrp->ia_xtime1 ); | node->i_xtime1 = current_time(...); | node->i_xtime2 = node->i_xtime1 = node->i_xtime3 = - e; + timespec_to_timespec64(e); | node->i_xtime1 = node->i_xtime3 = - e; + timespec_to_timespec64(e); | - node->i_xtime1 = e; + node->i_xtime1 = timespec_to_timespec64(e); ) Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Cc: <anton@tuxera.com> Cc: <balbi@kernel.org> Cc: <bfields@fieldses.org> Cc: <darrick.wong@oracle.com> Cc: <dhowells@redhat.com> Cc: <dsterba@suse.com> Cc: <dwmw2@infradead.org> Cc: <hch@lst.de> Cc: <hirofumi@mail.parknet.co.jp> Cc: <hubcap@omnibond.com> Cc: <jack@suse.com> Cc: <jaegeuk@kernel.org> Cc: <jaharkes@cs.cmu.edu> Cc: <jslaby@suse.com> Cc: <keescook@chromium.org> Cc: <mark@fasheh.com> Cc: <miklos@szeredi.hu> Cc: <nico@linaro.org> Cc: <reiserfs-devel@vger.kernel.org> Cc: <richard@nod.at> Cc: <sage@redhat.com> Cc: <sfrench@samba.org> Cc: <swhiteho@redhat.com> Cc: <tj@kernel.org> Cc: <trond.myklebust@primarydata.com> Cc: <tytso@mit.edu> Cc: <viro@zeniv.linux.org.uk>
|
#
805c0907 |
|
08-Jan-2018 |
Bob Peterson <rpeterso@redhat.com> |
GFS2: Log the reason for log flushes in every log header This patch just adds the capability for GFS2 to track which function called gfs2_log_flush. This should make it easier to diagnose problems based on the sequence of events found in the journals. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
c1696fb8 |
|
16-Jan-2018 |
Bob Peterson <rpeterso@redhat.com> |
GFS2: Introduce new gfs2_log_header_v2 This patch adds a new structure called gfs2_log_header_v2 which is used to store expanded fields into previously unused areas of the log headers (i.e., this change is backwards compatible). Some of these are used for debug purposes so we can backtrack when problems occur. Others are reserved for future expansion. This patch is based on a prototype from Steve Whitehouse. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
eebd2e81 |
|
01-Aug-2017 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Get rid of gfs2_set_nlink Remove gfs2_set_nlink which prevents the link count of an inode from becoming non-zero once it has reached zero. The next commit reduces the amount of waiting on glocks when an inode is evicted from memory. With that, an inode can become reallocated before all the remote-unlink callbacks from a previous delete are processed, which causes the link count to change from zero to non-zero. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
#
e7cb550d |
|
21-Jul-2017 |
Wang Xibo <wang.xibo@zte.com.cn> |
GFS2: fix code parameter error in inode_go_lock In inode_go_lock() function, the parameter order of list_add() is error. According to the define of list_add(), the first parameter is new entry and the second is the list head, so ip->i_trunc_list should be the first parameter and the sdp->sd_trunc_list should be second. Signed-off-by: Wang Xibo<wang.xibo@zte.com.cn> Signed-off-by: Xiao Likun<xiao.likun@zte.com.cn> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
#
bc98a42c |
|
17-Jul-2017 |
David Howells <dhowells@redhat.com> |
VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb) Firstly by applying the following with coccinelle's spatch: @@ expression SB; @@ -SB->s_flags & MS_RDONLY +sb_rdonly(SB) to effect the conversion to sb_rdonly(sb), then by applying: @@ expression A, SB; @@ ( -(!sb_rdonly(SB)) && A +!sb_rdonly(SB) && A | -A != (sb_rdonly(SB)) +A != sb_rdonly(SB) | -A == (sb_rdonly(SB)) +A == sb_rdonly(SB) | -!(sb_rdonly(SB)) +!sb_rdonly(SB) | -A && (sb_rdonly(SB)) +A && sb_rdonly(SB) | -A || (sb_rdonly(SB)) +A || sb_rdonly(SB) | -(sb_rdonly(SB)) != A +sb_rdonly(SB) != A | -(sb_rdonly(SB)) == A +sb_rdonly(SB) == A | -(sb_rdonly(SB)) && A +sb_rdonly(SB) && A | -(sb_rdonly(SB)) || A +sb_rdonly(SB) || A ) @@ expression A, B, SB; @@ ( -(sb_rdonly(SB)) ? 1 : 0 +sb_rdonly(SB) | -(sb_rdonly(SB)) ? A : B +sb_rdonly(SB) ? A : B ) to remove left over excess bracketage and finally by applying: @@ expression A, SB; @@ ( -(A & MS_RDONLY) != sb_rdonly(SB) +(bool)(A & MS_RDONLY) != sb_rdonly(SB) | -(A & MS_RDONLY) == sb_rdonly(SB) +(bool)(A & MS_RDONLY) == sb_rdonly(SB) ) to make comparisons against the result of sb_rdonly() (which is a bool) work correctly. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
6f6597ba |
|
30-Jun-2017 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Protect gl->gl_object by spin lock Put all remaining accesses to gl->gl_object under the gl->gl_lockref.lock spinlock to prevent races. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
#
4fd1a579 |
|
30-Jun-2017 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Get rid of flush_delayed_work in gfs2_evict_inode So far, gfs2_evict_inode clears gl->gl_object and then flushes the glock work queue to make sure that inode glops which dereference gl->gl_object have finished running before the inode is destroyed. However, flushing the work queue may do more work than needed, and in particular, it may call into DLM, which we want to avoid here. Use a bit lock (GIF_GLOP_PENDING) to synchronize between the inode glops and gfs2_evict_inode instead to get rid of the flushing. In addition, flush the work queues of existing glocks before reusing them for new inodes to get those glocks into a known state: the glock state engine currently doesn't handle glock re-appropriation correctly. (We may be able to fix the glock state engine instead later.) Based on a patch by Steven Whitehouse <swhiteho@redhat.com>. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
#
73cc8625 |
|
01-Apr-2016 |
Bob Peterson <rpeterso@redhat.com> |
GFS2: Get rid of dead code in inode_go_demote_ok Function inode_go_demote_ok had some code that was only executed if gl_holders was not empty. However, if gl_holders was not empty, the only caller, demote_ok(), returns before inode_go_demote_ok would ever be called. Therefore, it's dead code, so I removed it. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
f39814f6 |
|
24-Dec-2015 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Invalid security labels of inodes when they go invalid When gfs2 releases the glock of an inode, it must invalidate all information cached for that inode, including the page cache and acls. Use the new security_inode_invalidate_secctx hook to also invalidate security labels in that case. These items will be reread from disk when needed after reacquiring the glock. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Acked-by: Bob Peterson <rpeterso@redhat.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Cc: cluster-devel@redhat.com [PM: fixed spelling errors and description line lengths] Signed-off-by: Paul Moore <pmoore@redhat.com>
|
#
f3dd1649 |
|
29-Oct-2015 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Remove gl_spin define Commit e66cf161 replaced the gl_spin spinlock in struct gfs2_glock with a gl_lockref lockref and defined gl_spin as gl_lockref.lock (the spinlock in gl_lockref). Remove that define to make the references to gl_lockref.lock more obvious. Signed-off-by: Andreas Gruenbacher <andreas.gruenbacher@gmail.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
#
15562c43 |
|
16-Mar-2015 |
Bob Peterson <rpeterso@redhat.com> |
GFS2: Move glock superblock pointer to field gl_name What uniquely identifies a glock in the glock hash table is not gl_name, but gl_name and its superblock pointer. This patch makes the gl_name field correspond to a unique glock identifier. That will allow us to simplify hashing with a future patch, since the hash algorithm can then take the gl_name and hash its components in one operation. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
39b0f1e9 |
|
05-Jun-2015 |
Bob Peterson <rpeterso@redhat.com> |
GFS2: Don't brelse rgrp buffer_heads every allocation This patch allows the block allocation code to retain the buffers for the resource groups so they don't need to be re-read from buffer cache with every request. This is a performance improvement that's especially noticeable when resource groups are very large. For example, with 2GB resource groups and 4K blocks, there can be 33 blocks for every resource group. This patch allows those 33 buffers to be kept around and not read in and thrown away with every operation. The buffers are released when the resource group is either synced or invalidated. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Steven Whitehouse <swhiteho@redhat.com> Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
|
#
e7ccaf5f |
|
12-Jun-2015 |
Bob Peterson <rpeterso@redhat.com> |
GFS2: Don't add all glocks to the lru The glocks used for resource groups often come and go hundreds of thousands of times per second. Adding them to the lru list just adds unnecessary contention for the lru_lock spin_lock, especially considering we're almost certainly going to re-use the glock and take it back off the lru microseconds later. We never want the glock shrinker to cull them anyway. This patch adds a new bit in the glops that determines which glock types get put onto the lru list and which ones don't. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
2e60d768 |
|
13-Nov-2014 |
Benjamin Marzinski <bmarzins@redhat.com> |
GFS2: update freeze code to use freeze/thaw_super on all nodes The current gfs2 freezing code is considerably more complicated than it should be because it doesn't use the vfs freezing code on any node except the one that begins the freeze. This is because it needs to acquire a cluster glock before calling the vfs code to prevent a deadlock, and without the new freeze_super and thaw_super hooks, that was impossible. To deal with the issue, gfs2 had to do some hacky locking tricks to make sure that a frozen node couldn't be holding on a lock it needed to do the unfreeze ioctl. This patch makes use of the new hooks to simply the gfs2 locking code. Now, all the nodes in the cluster freeze and thaw in exactly the same way. Every node in the cluster caches the freeze glock in the shared state. The new freeze_super hook allows the freezing node to grab this freeze glock in the exclusive state without first calling the vfs freeze_super function. All the nodes in the cluster see this lock change, and call the vfs freeze_super function. The vfs locking code guarantees that the nodes can't get stuck holding the glocks necessary to unfreeze the system. To unfreeze, the freezing node uses the new thaw_super hook to drop the freeze glock. Again, all the nodes notice this, reacquire the glock in shared mode and call the vfs thaw_super function. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
d29c0afe |
|
03-Oct-2014 |
Fabian Frederick <fabf@skynet.be> |
GFS2: use _RET_IP_ instead of (unsigned long)__builtin_return_address(0) use macro definition Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
6b49d1d9 |
|
28-Jun-2014 |
Geert Uytterhoeven <geert@linux-m68k.org> |
GFS2: memcontrol: Spelling s/invlidate/invalidate/ Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: cluster-devel@redhat.com Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
24972557 |
|
01-May-2014 |
Benjamin Marzinski <bmarzins@redhat.com> |
GFS2: remove transaction glock GFS2 has a transaction glock, which must be grabbed for every transaction, whose purpose is to deal with freezing the filesystem. Aside from this involving a large amount of locking, it is very easy to make the current fsfreeze code hang on unfreezing. This patch rewrites how gfs2 handles freezing the filesystem. The transaction glock is removed. In it's place is a freeze glock, which is cached (but not held) in a shared state by every node in the cluster when the filesystem is mounted. This lock only needs to be grabbed on freezing, and actions which need to be safe from freezing, like recovery. When a node wants to freeze the filesystem, it grabs this glock exclusively. When the freeze glock state changes on the nodes (either from shared to unlocked, or shared to exclusive), the filesystem does a special log flush. gfs2_log_flush() does all the work for flushing out the and shutting down the incore log, and then it tries to grab the freeze glock in a shared state again. Since the filesystem is stuck in gfs2_log_flush, no new transaction can start, and nothing can be written to disk. Unfreezing the filesytem simply involes dropping the freeze glock, allowing gfs2_log_flush() to grab and then release the shared lock, so it is cached for next time. However, in order for the unfreezing ioctl to occur, gfs2 needs to get a shared lock on the filesystem root directory inode to check permissions. If that glock has already been grabbed exclusively, fsfreeze will be unable to get the shared lock and unfreeze the filesystem. In order to allow the unfreeze, this patch makes gfs2 grab a shared lock on the filesystem root directory during the freeze, and hold it until it unfreezes the filesystem. The functions which need to grab a shared lock in order to allow the unfreeze ioctl to be issued now use the lock grabbed by the freeze code instead. The freeze and unfreeze code take care to make sure that this shared lock will not be dropped while another process is using it. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
4e857c58 |
|
17-Mar-2014 |
Peter Zijlstra <peterz@infradead.org> |
arch: Mass conversion of smp_mb__*() Mostly scripted conversion of the smp_mb__* barriers. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/n/tip-55dhyhocezdw1dg7u19hmh1u@git.kernel.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-arch@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
d69a3c65 |
|
21-Feb-2014 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Move log buffer lists into transaction Over time, we hope to be able to improve the concurrency available in the log code. This is one small step towards that, by moving the buffer lists from the super block, and into the transaction structure, so that each transaction builds its own buffer lists. At transaction commit time, the buffer lists are merged into the currently accumulating transaction. That transaction then is passed into the before and after commit functions at journal flush time. Thus there should be no change in overall behaviour yet. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
ac3beb6a |
|
16-Jan-2014 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Don't use ENOBUFS when ENOMEM is the correct error code Al Viro has tactfully pointed out that we are using the incorrect error code in some cases. This patch fixes that, and also removes the (unused) return value for glock dumping. > * gfs2_iget() - ENOBUFS instead of ENOMEM. ENOBUFS is > "No buffer space available (POSIX.1 (XSI STREAMS option))" and since > we don't support STREAMS it's probably fair game, but... what the hell? Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Al Viro <viro@ZenIV.linux.org.uk>
|
#
70d4ee94 |
|
06-Dec-2013 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Use only a single address space for rgrps Prior to this patch, GFS2 had one address space for each rgrp, stored in the glock. This patch changes them to use a single address space in the super block. This therefore saves (sizeof(struct address_space) * nr_of_rgrps) bytes of memory and for large filesystems, that can be significant. It would be nice to be able to do something similar and merge the inode metadata address space into the same global address space. However, that is rather more complicated as the on-disk location doesn't have a 1:1 mapping with the inodes in general. So while it could be done, it will be a more complicated operation as it requires changing a lot more code paths. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
7005c3e4 |
|
06-Dec-2013 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Use range based functions for rgrp sync/invalidation Each rgrp header is represented as a single extent on disk, so we can calculate the position within the address space, since we are using address spaces mapped 1:1 to the disk. This means that it is possible to use the range based versions of filemap_fdatawrite/wait and for invalidating the page cache. Our eventual intent is to then be able to merge the address spaces used for rgrps into a single address space, rather than to have one for each glock, saving memory and reducing complexity. Since during umount, the rgrp structures are disposed of before the glocks, we need to store the extent information in the glock so that is is available for a final invalidation. This patch uses a field which is otherwise unused in rgrp glocks to do that, so that we do not have to expand the size of a glock. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
582d2f7a |
|
19-Dec-2013 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Wait for async DIO in glock state changes We need to wait for any outstanding DIO to complete in a couple of situations. Firstly, in case we are changing out of deferred mode (in inode_go_sync) where GLF_DIRTY will not be set. That call could be prefixed with a test for gl_state == LM_ST_DEFERRED but it doesn't seem worth it bearing in mind that the test for outstanding DIO is very quick anyway, in the usual case that there is none. The second case is in inode_go_lock which will catch the cases where we have a cached EX lock, but where we grant deferred locks against it so that there is no glock state transistion. We only need to wait if the state is not deferred, since DIO is valid anyway in that state. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
e66cf161 |
|
15-Oct-2013 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Use lockref for glocks Currently glocks have an atomic reference count and also a spinlock which covers various internal fields, such as the state. This intent of this patch is to replace the spinlock and the atomic reference count with a lockref structure. This contains a spinlock which we can continue to use as before, and a reference counter which is used in conjuction with the spinlock to replace the previous atomic counter. As a result of this there are some new rules for reference counting on glocks. We need to distinguish between reference count changes under gl_spin (which are now just increment or decrement of the new counter, provided the count cannot hit zero) and those which are outside of gl_spin, but which now take gl_spin internally. The conversion is relatively straight forward. There is probably some further clean up which can be done, but the priority at this stage is to make the change in as simple a manner as possible. A consequence of this change is that the reference count is being decoupled from the lru list processing. This should allow future adoption of the lru_list code with glocks in due course. The reason for using the "dead" state and not just relying on 0 being the "invalid state" is so that in due course 0 ref counts can be allowable. The intent is to eventually be able to remove the ref count changes which are currently hidden away in state_change(). Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
1bc333f4 |
|
26-Jul-2013 |
Benjamin Marzinski <bmarzins@redhat.com> |
GFS2: don't overrun reserved revokes When run during fsync, a gfs2_log_flush could happen between the time when gfs2_ail_flush checked the number of blocks to revoke, and when it actually started the transaction to do those revokes. This occassionally caused it to need more revokes than it reserved, causing gfs2 to crash. Instead of just reserving enough revokes to handle the blocks that currently need them, this patch makes gfs2_ail_flush reserve the maximum number of revokes it can, without increasing the total number of reserved log blocks. This patch also passes the number of reserved revokes to __gfs2_ail_flush() so that it doesn't go over its limit and cause a crash like we're seeing. Non-fsync calls to __gfs2_ail_flush will still cause a BUG() necessary revokes are skipped. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
5d054964 |
|
14-Jun-2013 |
Benjamin Marzinski <bmarzins@redhat.com> |
GFS2: aggressively issue revokes in gfs2_log_flush This patch looks at all the outstanding blocks in all the transactions on the log, and moves the completed ones to the ail2 list. Then it issues revokes for these blocks. This will hopefully speed things up in situations where there is a lot of contention for glocks, especially if they are acquired serially. revoke_lo_before_commit will issue at most one log block's full of these preemptive revokes. The amount of reserved log space that gfs2_log_reserve() ignores has been incremented to allow for this extra block. This patch also consolidates the common revoke instructions into one function, gfs2_add_revoke(). Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
81ffbf65 |
|
10-Apr-2013 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Add origin indicator to glock callbacks This patch adds a bool indicating whether the demote request was originated locally or remotely. This is then used by the iopen ->go_callback() to make 100% sure that it will only respond to remote callbacks. Since ->evict_inode() uses GL_NOCACHE when it attempts to get an exclusive lock on the iopen lock, this may result in extra scheduling of the workqueue in case that the exclusive promotion request failed. This patch prevents that from happening. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
d0546426 |
|
31-Jan-2013 |
Eric W. Biederman <ebiederm@xmission.com> |
gfs2: Convert uids and gids between dinodes and vfs inodes. When reading dinodes from the disk convert uids and gids into kuids and kgids to store in vfs data structures. When writing to dinodes to the disk convert kuids and kgids in the in memory structures into plain uids and gids. For now all on disk data structures are assumed to be stored in the initial user namespace. Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
#
dba2d70c |
|
14-Nov-2012 |
David Teigland <teigland@redhat.com> |
GFS2: only use lvb on glocks that need it Save the effort of allocating, reading and writing the lvb for most glocks that do not use it. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
06dfc306 |
|
24-Oct-2012 |
Bob Peterson <rpeterso@redhat.com> |
GFS2: Rename glops go_xmote_th to go_sync [Editorial: This is a nit, but has been a minor irritation for a long time:] This patch renames glops structure item for go_xmote_th to go_sync. The functionality is unchanged; it's just for readability. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
8eae1ca0 |
|
15-Oct-2012 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Review bug traps in glops.c Two of the bug traps here could really be warnings. The others are converted from BUG() to GLOCK_BUG_ON() since we'll most likely need to know the glock state in order to debug any issues which arise. As a result of this, __dump_glock has to be renamed and is no longer static. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
a0b4df29 |
|
17-Sep-2012 |
Eric Sandeen <sandeen@redhat.com> |
GFS2: fix s_writers.counter imbalance in gfs2_ail_empty_gl gfs2_ail_empty_gl() contains an "inline version" of gfs2_trans_begin(), so it needs an explicit sb_start_intwrite() as well, to balance the sb_end_intwrite() which will be called by gfs2_trans_end(). With this, xfstest 068 passes on lock_nolock local gfs2. Without it, we reach a writer count of -1 and get stuck. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
6de1e2f3 |
|
27-Apr-2012 |
Bob Peterson <rpeterso@redhat.com> |
GFS2: Remove redundant metadata block type check This patch removes a redundant metadata block check. See description below. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
c50b91c4 |
|
16-Apr-2012 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Remove bd_list_tr This is another clean up in the logging code. This per-transaction list was largely unused. Its main function was to ensure that the number of buffers in a transaction was correct, however that counter was only used to check the number of buffers in the bd_list_tr, plus an assert at the end of each transaction. With the assert now changed to use the calculated buffer counts, we can remove both bd_list_tr and its associated counter. This should make the code easier to understand as well as shrinking a couple of structures. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
bfe86848 |
|
28-Oct-2011 |
Miklos Szeredi <mszeredi@suse.cz> |
filesystems: add set_nlink() Replace remaining direct i_nlink updates with a new set_nlink() updater function. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
b5b24d7a |
|
07-Sep-2011 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Fix AIL flush issue during fsync Unfortunately, it is not enough to just ignore locked buffers during the AIL flush from fsync. We need to be able to ignore all buffers which are locked, dirty or pinned at this stage as they might have been added subsequent to the log flush earlier in the fsync function. In addition, this means that we no longer need to rely on i_mutex to keep out writes during fsync, so we can, as a side-effect, remove that protection too. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Tested-By: Abhijith Das <adas@redhat.com>
|
#
8339ee54 |
|
31-Aug-2011 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Make resource groups "append only" during life of fs Since we have ruled out supporting online filesystem shrink, it is possible to make the resource group list append only during the life of a super block. This gives several benefits: Firstly, we only need to read new rindex elements as they are added rather than needing to reread the whole rindex file each time one element is added. Secondly, the rindex glock can be held for much shorter periods of time, and is completely removed from the fast path for allocations. The lock is taken in shared mode only when updating the resource groups when the first allocation occurs, and after a grow has taken place. Thirdly, this results in a reduction in code size, and everything gets a lot simpler to understand in this area. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
7c9ca621 |
|
31-Aug-2011 |
Bob Peterson <rpeterso@redhat.com> |
GFS2: Use rbtree for resource groups and clean up bitmap buffer ref count scheme Here is an update of Bob's original rbtree patch which, in addition, also resolves the rather strange ref counting that was being done relating to the bitmap blocks. Originally we had a dual system for journaling resource groups. The metadata blocks were journaled and also the rgrp itself was added to a list. The reason for adding the rgrp to the list in the journal was so that the "repolish clones" code could be run to update the free space, and potentially send any discard requests when the log was flushed. This was done by comparing the "cloned" bitmap with what had been written back on disk during the transaction commit. Due to this, there was a requirement to hang on to the rgrps' bitmap buffers until the journal had been flushed. For that reason, there was a rather complicated set up in the ->go_lock ->go_unlock functions for rgrps involving both a mutex and a spinlock (the ->sd_rindex_spin) to maintain a reference count on the buffers. However, the journal maintains a reference count on the buffers anyway, since they are being journaled as metadata buffers. So by moving the code which deals with the post-journal accounting for bitmap blocks to the metadata journaling code, we can entirely dispense with the rather strange buffer ref counting scheme and also the requirement to journal the rgrps. The net result of all this is that the ->sd_rindex_spin is left to do exactly one job, and that is to look after the rbtree or rgrps. This patch is designed to be a stepping stone towards using RCU for the rbtree of resource groups, however the reduction in the number of uses of the ->sd_rindex_spin is likely to have benefits for multi-threaded workloads, anyway. The patch retains ->go_lock and ->go_unlock for rgrps, however these maybe also be removed in future in favour of calling the functions directly where required in the code. That will allow locking of resource groups without needing to actually read them in - something that could be useful in speeding up statfs. In the mean time though it is valid to dereference ->bi_bh only when the rgrp is locked. This is basically the same rule as before, modulo the references not being valid until the following journal flush. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Cc: Benjamin Marzinski <bmarzins@redhat.com>
|
#
f1818529 |
|
05-Aug-2011 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Fix bug trap and journaled data fsync Journaled data requires that a complete flush of all dirty data for the file is done, in order that the ail flush which comes after will succeed. Also the recently enhanced bug trap can trigger falsely in case an ail flush from fsync races with a page read. This updates the bug trap such that it will ignore buffers which are locked and only trigger on dirty and/or pinned buffers when the ail flush is run from fsync. The original bug trap is retained when ail flush is run from ->go_sync() Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
75549186 |
|
02-Aug-2011 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Fix bug-trap in ail flush code The assert was being tested under the wrong lock, a legacy of the original code. Also, if it does trigger, the resulting information was not always a lot of help. This moves the patch under the correct lock and also prints out more useful information in tacking down the source of the problem. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
9964afbb |
|
16-Jun-2011 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Add S_NOSEC support This adds S_NOSEC support to GFS2. We set/reset the flag either when a user calls setattr or when we have just regained the glock from another node. The flag is only set if there are no xattrs on the inode and there is no suid bit set. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Al Viro <viro@ZenIV.linux.org.uk>
|
#
7cf8dcd3 |
|
15-Jun-2011 |
Bob Peterson <rpeterso@redhat.com> |
GFS2: Automatically adjust glock min hold time This patch is a performance improvement for GFS2 in a clustered environment. It makes the glock hold time self-adjusting. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
17d539f0 |
|
15-Jun-2011 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Cache dir hash table in a contiguous buffer This patch adds a cache for the hash table to the directory code in order to help simplify the way in which the hash table is accessed. This is intended to be a first step towards introducing some performance improvements in the directory code. There are two follow ups that I'm hoping to see fairly shortly. One is to simplify the hash table reading code now that we always read the complete hash table, whether we want one entry or all of them. The other is to introduce readahead on the heads of the hash chains which are referred to from the table. The hash table is a maximum of 128k in size, so it is not worth trying to read it in small chunks. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
380f7c65 |
|
14-Jul-2011 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Resolve inode eviction and ail list interaction bug This patch contains a few misc fixes which resolve a recently reported issue. This patch has been a real team effort and has received a lot of testing. The first issue is that the ail lock needs to be held over a few more operations. The lock thats added into gfs2_releasepage() may possibly be a candidate for replacing with RCU at some future point, but at this stage we've gone for the obvious fix. The second issue is that gfs2_write_inode() can end up calling a glock recursively when called from gfs2_evict_inode() via the syncing code, so it needs a guard added. The third issue is that we either need to not truncate the metadata pages of inodes which have zero link count, but which we cannot deallocate due to them still being in use by other nodes, or we need to ensure that those pages have all made it through the journal and ail lists first. This patch takes the former approach, but the latter has also been tested and there is nothing to choose between them performance-wise. So again, we could revise that decision in the future. Also, the inode eviction process is now better documented. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Tested-by: Bob Peterson <rpeterso@redhat.com> Tested-by: Abhijith Das <adas@redhat.com> Reported-by: Barry J. Marson <bmarson@redhat.com> Reported-by: David Teigland <teigland@redhat.com>
|
#
1ce53368 |
|
13-Jun-2011 |
Benjamin Marzinski <bmarzins@redhat.com> |
GFS2: force a log flush when invalidating the rindex glock Right now, there is nothing that forces the log to get flushed when a node drops its rindex glock so that another node can grow the filesystem. If the log doesn't get flushed, GFS2 can corrupt the sd_log_le_rg list in the following way. A node puts an rgd on the list in rg_lo_add(), and then the rindex glock is dropped so the other node can grow the filesystem. When the node reacquires the rindex glock, that rgd gets deleted in clear_rgrpdi() before ever being removed from the list by gfs2_log_flush(). This code simply forces a log flush when the rindex glock is invalidated, solving the problem. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
d4b2cf1b |
|
09-May-2011 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Move gfs2_refresh_inode() and friends into glops.c Eventually there will only be a single caller of this code, so lets move it where it can be made static at some future date. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
dba898b0 |
|
14-Apr-2011 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Clean up fsync() This patch is designed to clean up GFS2's fsync implementation and ensure that it really does get everything on disk. Since ->write_inode() has been updated, we can call that via the vfs library function sync_inode_metadata() and the only remaining thing that has to be done is to ensure that we get any revoke records in the log after the inode has been written back. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
001e8e8d |
|
30-Mar-2011 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Don't try to deallocate unlinked inodes when mounted ro This adds a couple of missing tests to avoid read-only nodes from attempting to deallocate unlinked inodes. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Reported-by: Michel Andre de la Porte <madelaporte@ubi.com>
|
#
d6a079e8 |
|
11-Mar-2011 |
Dave Chinner <dchinner@redhat.com> |
GFS2: introduce AIL lock The log lock is currently used to protect the AIL lists and the movements of buffers into and out of them. The lists are self contained and no log specific items outside the lists are accessed when starting or emptying the AIL lists. Hence the operation of the AIL does not require the protection of the log lock so split them out into a new AIL specific lock to reduce the amount of traffic on the log lock. This will also reduce the amount of serialisation that occurs when the gfs2_logd pushes on the AIL to move it forward. This reduces the impact of log pushing on sequential write throughput. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
bc015cb8 |
|
19-Jan-2011 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Use RCU for glock hash table This has a number of advantages: - Reduces contention on the hash table lock - Makes the code smaller and simpler - Should speed up glock dumps when under load - Removes ref count changing in examine_bucket - No longer need hash chain lock in glock_put() in common case There are some further changes which this enables and which we may do in the future. One is to look at using SLAB_RCU, and another is to look at using a per-cpu counter for the per-sb glock counter, since that is touched twice in the lifetime of each glock (but only used at umount time). Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
#
846f4045 |
|
16-Dec-2010 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Don't flush delete workqueue when releasing the transaction lock There is no requirement to flush the delete workqueue before a gfs2 filesystem is suspended. The workqueue's work will just be suspended along with the rest of the tasks on the filesystem. The resolves a deadlock situation where the transaction lock's demotion code was trying to flush the delete workqueue while at the same time, the workqueue was waiting for the transaction lock. The delete workqueue is flushed by gfs2_make_fs_ro() already, so that umount/remount are correctly protected anyway. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
13466985 |
|
06-Oct-2010 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Fix type mapping for demote_rq interface Mostly the glock operations follow the type of the glock. The one exception is the transaction glock, so we need to check for that directly. Reported-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
a2e0f799 |
|
11-Aug-2010 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Remove i_disksize With the update of the truncate code, ip->i_disksize and inode->i_size are merely copies of each other. This means we can remove ip->i_disksize and use inode->i_size exclusively reducing the size of a GFS2 inode by 8 bytes. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
5a0e3ad6 |
|
24-Mar-2010 |
Tejun Heo <tj@kernel.org> |
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
|
#
009d8518 |
|
07-Dec-2009 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Metadata address space clean up Since the start of GFS2, an "extra" inode has been used to store the metadata belonging to each inode. The only reason for using this inode was to have an extra address space, the other fields were unused. This means that the memory usage was rather inefficient. The reason for keeping each inode's metadata in a separate address space is that when glocks are requested on remote nodes, we need to be able to efficiently locate the data and metadata which relating to that glock (inode) in order to sync or sync and invalidate it (depending on the remotely requested lock mode). This patch adds a new type of glock, which has in addition to its normal fields, has an address space. This applies to all inode and rgrp glocks (but to no other glock types which remain as before). As a result, we no longer need to have the second inode. This results in three major improvements: 1. A saving of approx 25% of memory used in caching inodes 2. A removal of the circular dependency between inodes and glocks 3. No confusion between "normal" and "metadata" inodes in super.c Although the first of these is the more immediately apparent, the second is just as important as it now enables a number of clean ups at umount time. Those will be the subject of future patches. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
c65f7fb5 |
|
02-Oct-2009 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Use forget_all_cached_acls() Invalidate all the cached ACLs when we drop the glock. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
b94a170e |
|
23-Jul-2009 |
Benjamin Marzinski <bmarzins@redhat.com> |
GFS2: remove dcache entries for remote deleted inodes When a file is deleted from a gfs2 filesystem on one node, a dcache entry for it may still exist on other nodes in the cluster. If this happens, gfs2 will be unable to free this file on disk. Because of this, it's possible to have a gfs2 filesystem with no files on it and no free space. With this patch, when a node receives a callback notifying it that the file is being deleted on another node, it schedules a new workqueue thread to remove the file's dcache entry. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
09010978 |
|
20-May-2009 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Improve resource group error handling This patch improves the error handling in the case where we discover that the summary information in the resource group doesn't match the bitmap information while in the process of allocating blocks. Originally this resulted in a kernel bug, but this patch changes that so that we return -EIO and print some messages explaining what went wrong, and how to fix it. We also remember locally not to try and allocate from the same rgrp again, so that a subsequent allocation in a different rgrp should succeed. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
52fcd11c |
|
20-Apr-2009 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Clear dirty bit at end of inode glock sync The dirty bit can get set during the inode glock sync. Its too complicated to change that at the moment, so this is the quick fix - to clear the bit again at the end of the function. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
6bac243f |
|
09-Mar-2009 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Clean up of glops.c This cleans up a number of bits of code mostly based in glops.c. A couple of simple functions have been merged into the callers to make it more obvious what is going on, the mysterious raising of i_writecount around the truncate_inode_pages() call has been removed. The meta_go_* operations have been renamed rgrp_go_* since that is the only lock type that they are used with. The unused argument of gfs2_read_sb has been removed. Also a bug has been fixed where a check for the rindex inode was in the wrong callback. More comments are added, and the debugging code is improved too. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
64d576ba |
|
12-Feb-2009 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Add a "demote a glock" interface to sysfs This adds a sysfs file called demote_rq to GFS2's per filesystem directory. Its possible to use this file to demote arbitrary glocks in exactly the same way as if a request had come in from a remote node. This is intended for testing issues relating to caching of data under glocks. Despite that, the interface is generic enough to send requests to any type of glock, but be careful as its not always safe to send an arbitrary message to an arbitrary glock. For that reason and to prevent DoS, this interface is restricted to root only. The messages look like this: <type>:<glocknumber> <mode> Example: echo -n "2:13324 EX" >/sys/fs/gfs2/unity:myfs/demote_rq Which means "please demote inode glock (type 2) number 13324 so that I can get an EX (exclusive) lock". The lock modes are those which would normally be sent by a remote node in its callback so if you want to unlock a glock, you use EX, to demote to shared, use SH or PR (depending on whether you like GFS2 or DLM lock modes better!). If the glock doesn't exist, you'll get -ENOENT returned. If the arguments don't make sense, you'll get -EINVAL returned. The plan is that this interface will be used in combination with the blktrace patch which I recently posted for comments although it is, of course, still useful in its own right. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
d8348de0 |
|
05-Feb-2009 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Fix deadlock on journal flush This patch fixes a deadlock when the journal is flushed and there are dirty inodes other than the one which caused the journal flush. Originally the journal flushing code was trying to obtain the transaction glock while running the flush code for an inode glock. We no longer require the transaction glock at this point in time since we know that any attempt to get the transaction glock from another node will result in a journal flush. So if we are flushing the journal, we can be sure that the transaction lock is still cached from when the transaction was started. By inlining a version of gfs2_trans_begin() (minus the bit which gets the transaction glock) we can avoid the deadlock problems caused if there is a demote request queued up on the transaction glock. In addition I've also moved the umount rwsem so that it covers the glock workqueue, since it all demotions are done by this workqueue now. That fixes a bug on umount which I came across while fixing the original problem. Reported-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
f057f6cd |
|
12-Jan-2009 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Merge lock_dlm module into GFS2 This is the big patch that I've been working on for some time now. There are many reasons for wanting to make this change such as: o Reducing overhead by eliminating duplicated fields between structures o Simplifcation of the code (reduces the code size by a fair bit) o The locking interface is now the DLM interface itself as proposed some time ago. o Fewer lookups of glocks when processing replies from the DLM o Fewer memory allocations/deallocations for each glock o Scope to do further optimisations in the future (but this patch is more than big enough for now!) Please note that (a) this patch relates to the lock_dlm module and not the DLM itself, that is still a separate module; and (b) that we retain the ability to build GFS2 as a standalone single node filesystem with out requiring the DLM. This patch needs a lot of testing, hence my keeping it I restarted my -git tree after the last merge window. That way, this has the maximum exposure before its merged. This is (modulo a few minor bug fixes) the same patch that I've been posting on and off the the last three months and its passed a number of different tests so far. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
97cc1025 |
|
20-Nov-2008 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Kill two daemons with one patch This patch removes the two daemons, gfs2_scand and gfs2_glockd and replaces them with a shrinker which is called from the VM. The net result is that GFS2 responds better when there is memory pressure, since it shrinks the glock cache at the same rate as the VFS shrinks the dcache and icache. There are no longer any time based criteria for shrinking glocks, they are kept until such time as the VM asks for more memory and then we demote just as many glocks as required. There are potential future changes to this code, including the possibility of sorting the glocks which are to be written back into inode number order, to get a better I/O ordering. It would be very useful to have an elevator based workqueue implementation for this, as that would automatically deal with the read I/O cases at the same time. This patch is my answer to Andrew Morton's remark, made during the initial review of GFS2, asking why GFS2 needs so many kernel threads, the answer being that it doesn't :-) This patch is a net loss of about 200 lines of code. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
813e0c46 |
|
18-Nov-2008 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Fix "truncate in progress" hang Following on from the recent clean up of gfs2_quotad, this patch moves the processing of "truncate in progress" inodes from the glock workqueue into gfs2_quotad. This fixes a hang due to the "truncate in progress" processing requiring glocks in order to complete. It might seem odd to use gfs2_quotad for this particular item, but we have to use a pre-existing thread since creating a thread implies a GFP_KERNEL memory allocation which is not allowed from the glock workqueue context. Of the existing threads, gfs2_logd and gfs2_recoverd may deadlock if used for this operation. gfs2_scand and gfs2_glockd are both scheduled for removal at some (hopefully not too distant) future point. That leaves only gfs2_quotad whose workload is generally fairly light and is easily adapted for this extra task. Also, as a result of this change, it opens the way for a future patch to make the reading of the inode's information asynchronous with respect to the glock workqueue, which is another improvement that has been on the list for some time now. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
fa75cedc |
|
10-Nov-2008 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Add more detail to debugfs glock dumps Although the glock dumps print quite a lot of information about the glocks themselves, there are more things which can be usefully added to the dump realting to the objects themselves. This patch adds a few more fields to the inode and resource group lines, which should be useful for debugging. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
383f01fb |
|
04-Nov-2008 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Banish struct gfs2_dinode_host The final field in gfs2_dinode_host was the i_flags field. Thats renamed to i_diskflags in order to avoid confusion with the existing inode flags, and moved into the inode proper at a suitable location to avoid creating a "hole". At that point struct gfs2_dinode_host is no longer needed and as promised (quite some time ago!) it can now be removed completely. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
6802e340 |
|
21-May-2008 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Clean up the glock core This patch implements a number of cleanups to the core of the GFS2 glock code. As a result a lot of code is removed. It looks like a really big change, but actually a large part of this patch is either removing or moving existing code. There are some new bits too though, such as the new run_queue() function which is considerably streamlined. Highlights of this patch include: o Fixes a cluster coherency bug during SH -> EX lock conversions o Removes the "glmutex" code in favour of a single bit lock o Removes the ->go_xmote_bh() for inodes since it was duplicating ->go_lock() o We now only use the ->lm_lock() function for both locks and unlocks (i.e. unlock is a lock with target mode LM_ST_UNLOCKED) o The fast path is considerably shortly, giving performance gains especially with lock_nolock o The glock_workqueue is now used for all the callbacks from the DLM which allows us to simplify the lock_dlm module (see following patch) o The way is now open to make further changes such as eliminating the two threads (gfs2_glockd and gfs2_scand) in favour of a more efficient scheme. This patch has undergone extensive testing with various test suites so it should be pretty stable by now. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Bob Peterson <rpeterso@redhat.com>
|
#
091806ed |
|
28-Apr-2008 |
Bob Peterson <rpeterso@redhat.com> |
[GFS2] filesystem consistency error from do_strip This patch fixes a GFS2 filesystem consistency error reported from function do_strip. The problem was caused by a timing window that allowed two vfs inodes to be created in memory that point to the same file. The problem is fixed by making the vfs's iget_test, iget_set mechanism check and set a new bit in the in-core gfs2_inode structure while the vfs inode spin_lock is held. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
cf45b752 |
|
31-Jan-2008 |
Bob Peterson <rpeterso@redhat.com> |
[GFS2] Remove rgrp and glock version numbers This patch further reduces GFS2's memory requirements by eliminating the 64-bit version number fields in lieu of a couple bits. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
3042a2cc |
|
02-Nov-2007 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Reorder writeback for glock sync Previously we were doing (write data, wait for data, write metadata, wait for metadata). After this patch we so (write metadata, write data, wait for data, wait for metadata) which should be more efficient. Also I noticed that the drop_bh and xmote_bh functions were almost identical. In fact the only difference was a single test, and that test is such that in the drop_bh case, it would always evaluate to the correct result. As such we can use the xmote_bh functions in all the places where we were using the drop_bh function and remove the drop_bh functions. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
f91a0d3e |
|
15-Oct-2007 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Remove useless i_cache from inodes The i_cache was designed to keep references to the indirect blocks used during block mapping so that they didn't have to be looked up continually. The idea failed because there are too many places where the i_cache needs to be freed, and this has in the past been the cause of many bugs. In addition there was no performance benefit being gained since the disk blocks in question were cached anyway. So this patch removes it in order to simplify the code to prepare for other changes which would otherwise have had to add further support for this feature. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
3cc3f710 |
|
15-Oct-2007 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Use ->page_mkwrite() for mmap() This cleans up the mmap() code path for GFS2 by implementing the page_mkwrite function for GFS2. We are thus able to use the generic filemap_fault function for our ->fault() implementation. This now means that shared writable mappings will be much more efficiently shared across the cluster if there is a reasonable proportion of read activity (the greater proportion, the better). As a side effect, it also reduces the size of the code, removes special cases from readpage and readpages, and makes the code path easier to follow. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
1ad38c43 |
|
03-Sep-2007 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Clean up gfs2_trans_add_revoke() The following alters gfs2_trans_add_revoke() to take a struct gfs2_bufdata as an argument. This eliminates the memory allocation which was previously required by making use of the already existing struct gfs2_bufdata. It makes some sanity checks to ensure that the gfs2_bufdata has been removed from all the lists before its recycled as a revoke structure. This saves one memory allocation and one free per revoke structure. Also as a result, and to simplify the locking, since there is no longer any blocking code in gfs2_trans_add_revoke() we must hold the log lock whenever this function is called. This reduces the amount of times we take and unlock the log lock. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
1e1a3d03 |
|
27-Aug-2007 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Introduce gfs2_remove_from_ail This collects together the operations required to remove a gfs2_bufdata from the ail lists. Its only called from two places to start with, but expect to see more of this function in future. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
c4f68a13 |
|
23-Aug-2007 |
Benjamin Marzinski <bmarzins@redhat.com> |
[GFS2] delay glock demote for a minimum hold time When a lot of IO, with some distributed mmap IO, is run on a GFS2 filesystem in a cluster, it will deadlock. The reason is that do_no_page() will repeatedly call gfs2_sharewrite_nopage(), because each node keeps giving up the glock too early, and is forced to call unmap_mapping_range(). This bumps the mapping->truncate_count sequence count, forcing do_no_page() to retry. This patch institutes a minimum glock hold time a tenth a second. This insures that even in heavy contention cases, the node has enough time to get some useful work done before it gives up the glock. A second issue is that when gfs2_glock_dq() is called from within a page fault to demote a lock, and the associated page needs to be written out, it will try to acqire a lock on it, but it has already been locked at a higher level. This patch puts makes gfs2_glock_dq() use the work queue as well, to avoid this issue. This is the same patch as Steve Whitehouse originally proposed to fix this issue, execpt that gfs2_glock_dq() now grabs a reference to the glock before it queues up the work on it. Signed-off-by: Benjamin E. Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
bb3b0e3d |
|
16-Aug-2007 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Clean up invalidatepage/releasepage This patch fixes some bugs relating to journaled data files by cleaning up the gfs2_invalidatepage() and gfs2_releasepage() functions. We now never block during gfs2_releasepage(), instead we always either release or refuse to release depending on the status of the buffers. This fixes Red Hat bugzillas #248969 and #252392. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Bob Peterson <rpeterso@redhat.com>
|
#
b524fe64 |
|
02-May-2007 |
Benjamin Marzinski <bmarzins@redhat.com> |
[GFS2] flush the glock completely in inode_go_sync Fix for bz #231910 When filemap_fdatawrite() is called on the inode mapping in data=ordered mode, it will add the glock to the log. In inode_go_sync(), if you do the gfs2_log_flush() before this, after the filemap_fdatawrite() call, the glock and its associated data buffers will be on the log again. This means you can demote a lock from exclusive, without having it flushed from the log. The attached patch simply moves the gfs2_log_flush up to after the filemap_fdatawrite() call. Originally, I tried moving the gfs2_log_flush to after gfs2_meta_sync(), but that caused me to trip the following assert. GFS2: fsid=cypher-36:test.0: fatal: assertion "!buffer_busy(bh)" failed GFS2: fsid=cypher-36:test.0: function = gfs2_ail_empty_gl, file = fs/gfs2/glops.c, line = 61 It appears that gfs2_log_flush() puts some of the glocks buffers in the busy state and the filemap_fdatawrite() call is necessary to flush them. This makes me worry slightly that a related problem could happen because of moving the gfs2_log_flush() after the initial filemap_fdatawrite(), but I assume that gfs2_ail_empty_gl() would catch that case as well. Signed-off-by: Benjamin E. Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
3e9f45bd |
|
08-May-2007 |
Guillaume Chazarain <guichaz@yahoo.fr> |
Factor outstanding I/O error handling Cleanup: setting an outstanding error on a mapping was open coded too many times. Factor it out in mapping_set_error(). Signed-off-by: Guillaume Chazarain <guichaz@yahoo.fr> Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
c3f49bc2 |
|
07-Mar-2007 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Fix bz 229873, alternate test: assertion "!ip->i_inode.i_mapping->nrpages" failed The following removes an incorrect assertion from the GFS2 glops code. This fixes Red Hat bz 229873. Thanks to Abhijith Das for testing the patch and confirming the fix. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Abhijith Das <adas@redhat.com>
|
#
cad5b939 |
|
28-Feb-2007 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Fix bz 230143, incorrect flushing of rgrps The below patch fixes a problem where we were not flushing rgrps correctly. It only occurred in the specific case that a callback was received for an rgrp which was dirty and when a journal log flush had not already resulted in the rgrp being flushed anyway. This fixes Red Hat bz 230143, Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
cd354f1a |
|
14-Feb-2007 |
Tim Schmielau <tim@physik3.uni-rostock.de> |
[PATCH] remove many unneeded #includes of sched.h After Al Viro (finally) succeeded in removing the sched.h #include in module.h recently, it makes sense again to remove other superfluous sched.h includes. There are quite a lot of files which include it but don't actually need anything defined in there. Presumably these includes were once needed for macros that used to live in sched.h, but moved to other header files in the course of cleaning it up. To ease the pain, this time I did not fiddle with any header files and only removed #includes from .c-files, which tend to cause less trouble. Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha, arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig, allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all configs in arch/arm/configs on arm. I also checked that no new warnings were introduced by the patch (actually, some warnings are removed that were emitted by unnecessarily included header files). Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
b5d32bea |
|
21-Jan-2007 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Tidy up glops calls This patch doesn't make any changes to the ordering of the various operations related to glocking, but it does tidy up the calls to the glops.c functions to make the structure more obvious. The two functions: gfs2_glock_xmote_th() and gfs2_glock_drop_th() can be made static within glock.c since they are called by every set of glock operations. The xmote_th and drop_th glock operations are then made conditional upon those two routines existing and called from the previously mentioned functions in glock.c respectively. Also it can be seen that the go_sync operation isn't needed since it can easily be replaced by calls to xmote_bh and drop_bh respectively. This results in no longer (confusingly) calling back into routines in glock.c from glops.c and also reducing the glock operations by one member. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
1c0f4872 |
|
21-Jan-2007 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Remove local exclusive glock mode Here is a patch for GFS2 to remove the local exclusive flag. In the places it was used, mutex's are always held earlier in the call path, so it appears redundant in the LM_ST_SHARED case. Also, the GFS2 holders were setting local exclusive in any case where the requested lock was LM_ST_EXCLUSIVE. So the other places in the glock code where the flag was tested have been replaced with tests for the lock state being LM_ST_EXCLUSIVE in order to ensure the logic is the same as before (i.e. LM_ST_EXCLUSIVE is always locally exclusive as well as globally exclusive). Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
e5dab552 |
|
18-Jan-2007 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Remove the "greedy" function from glock.[ch] The "greedy" code was an attempt to retain glocks for a minimum length of time when they relate to mmap()ed files. The current implementation of this feature is not, however, ideal in that it required allocating memory in order to do this and its overly complicated. It also misses the mark by ignoring the other I/O operations which are just as likely to suffer from the same problem. So the plan is to remove this now and then add the functionality back as part of the glock state machine at a later date (and thus take into account all the possible users of this feature) Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
b004157a |
|
23-Nov-2006 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Fix journal flush problem This fixes a bug which resulted in poor performance due to flushing the journal too often. The code path in question was via the inode_go_sync() function in glops.c. The solution is not to flush the journal immediately when inodes are ejected from memory, but batch up the work for glockd to deal with later on. This means that glocks may now live on beyond the end of the lifetime of their inodes (but not very much longer in the normal case). Also fixed in this patch is a bug (which was hidden by the bug mentioned above) in calculation of the number of free journal blocks. The gfs2_logd process has been altered to be more responsive to the journal filling up. We now wake it up when the number of uncommitted journal blocks has reached the threshold level rather than trying to flush directly at the end of each transaction. This again means doing fewer, but larger, log flushes in general. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
1a14d3a6 |
|
20-Nov-2006 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Simplify glops functions The go_sync callback took two flags, but one of them was set on every call, so this patch removes once of the flags and makes the previously conditional operations (on this flag), unconditional. The go_inval callback took three flags, each of which was set on every call to it. This patch removes the flags and makes the operations unconditional, which makes the logic rather more obvious. Two now unused flags are also removed from incore.h. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
9e2dbdac |
|
08-Nov-2006 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Remove gfs2_inode_attr_in This function wasn't really doing the right thing. There was no need to update the inode size at this point and the updating of the i_blocks field has now been moved to the places where di_blocks is updated. A result of this patch and some those preceeding it is that unlocking a glock is now a much more efficient process, since there is no longer any requirement to copy data from the gfs2 inode into the vfs inode at this point. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
bfded27b |
|
01-Nov-2006 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Shrink gfs2_inode (8) - i_vn This shrinks the size of the gfs2_inode by 8 bytes by replacing the version counter with a one bit valid/invalid flag. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
b60623c2 |
|
31-Oct-2006 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Shrink gfs2_inode (3) - di_mode This removes the duplicate di_mode field in favour of using the inode->i_mode field. This saves 4 bytes. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
55167622 |
|
13-Oct-2006 |
Al Viro <viro@zeniv.linux.org.uk> |
[GFS2] split and annotate gfs2_log_head Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
ddacfaf7 |
|
03-Oct-2006 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Move logging code into log.c (mostly) This moves the logging code from meta_io.c into log.c and glops.c. As a result the routines can now be static and all the logging code is together in log.c, leaving meta_io.c with just metadata i/o code in it. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
7276b3b0 |
|
21-Sep-2006 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Tidy up meta_io code Fix a bug in the directory reading code, where we might have dereferenced a NULL pointer in case of OOM. Updated the directory code to use the new & improved version of gfs2_meta_ra() which now returns the first block that was being read. Previously it was releasing it requiring following code to grab the block again at each point it was called. Also turned off readahead on directory lookups since we are reading a hash table, and therefore reading the entries in order is very unlikely. Readahead is still used for all other calls to the directory reading function (e.g. when growing the hash table). Removed the DIO_START constant. Everywhere this was used, it was used to unconditionally start i/o aside from a couple of places, so I've removed it and made the couple of exceptions to this rule into separate functions. Also hunted through the other DIO flags and removed them as arguments from functions which were always called with the same combination of arguments. Updated gfs2_meta_indirect_buffer to be a bit more efficient and hopefully also be a bit easier to read. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
7d308590 |
|
18-Sep-2006 |
Fabio Massimo Di Nitto <fabbione@ubuntu.com> |
[GFS2] Export lm_interface to kernel headers lm_interface.h has a few out of the tree clients such as GFS1 and userland tools. Right now, these clients keeps a copy of the file in their build tree that can go out of sync. Move lm_interface.h to include/linux, export it to userland and clean up fs/gfs2 to use the new location. Signed-off-by: Fabio M. Di Nitto <fabbione@ubuntu.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
94610610 |
|
09-Sep-2006 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Remove unused function from glock.c The callback for iopen locks is unused, so this removes it. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
ea67eedb |
|
05-Sep-2006 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Fix end of multi-line structures As per Jan Engelhardt's request, I've added a ',' to the end of each of the multi-line structures which didn't already have one (most already did). Cc: Jan Engelhardt <jengelh@linux01.gwdg.de> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
75d3b817 |
|
04-Sep-2006 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Tidy up bmap/inode code As per Jan Engelhardt's third set of comments, this make various code style changes and moves the structures from format.h into super.c, which was the only place that format.h was actually used. Cc: Jan Engelhardt <jengelh@linux01.gwdg.de> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
e9fc2aa0 |
|
01-Sep-2006 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Update copyright, tidy up incore.h As per comments from Jan Engelhardt <jengelh@linux01.gwdg.de> this updates the copyright message to say "version" in full rather than "v.2". Also incore.h has been updated to remove forward structure declarations which are not required. The gfs2_quota_lvb structure has now had endianess annotations added to it. Also quota.c has been updated so that we now store the lvb data locally in endian independant format to avoid needing a structure in host endianess too. As a result the endianess conversions are done as required at various points and thus the conversion routines in lvb.[ch] are no longer required. I've moved the one remaining constant in lvb.h thats used into lm.h and removed the unused lvb.[ch]. I have not changed the HIF_ constants. That is left to a later patch which I hope will unify the gh_flags and gh_iflags fields of the struct gfs2_holder. Cc: Jan Engelhardt <jengelh@linux01.gwdg.de> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
5e2b0613 |
|
30-Aug-2006 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Remove unused code from glock layer Remove the unused sync feature from glocks. This is currently done by calling the required functions to sync pages/blocks directly so this code isn't needed. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
8fb4b536 |
|
30-Aug-2006 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Make glock operations const For all the usual reasons of enforcing correctness and potentially reducing code size, this patch makes the glock operations const. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
f45b7ddd |
|
27-Jul-2006 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Use a bio to read the superblock This means that we don't need to create a special inode just to contain a struct address_space in order to read a single disk block. Instead we read the disk block directly. Its slightly faster, and uses slightly less memory, but the real reason for doing this is that it removes a special case from the glock code. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
ba7f7290 |
|
26-Jul-2006 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Remove page.[ch] The remaining routines in page.c were all only used in one other file, so they are now moved into the files where they are referenced and made static. Thus page.[ch] are no longer required. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
3a8476dd |
|
19-Jun-2006 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Remove debugging printks A few of my printks slipped through last time. Also fix a couple of minor bugs. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
feaa7bba |
|
14-Jun-2006 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Fix unlinked file handling This patch fixes the way we have been dealing with unlinked, but still open files. It removes all limits (other than memory for inodes, as per every other filesystem) on numbers of these which we can support on GFS2. It also means that (like other fs) its the responsibility of the last process to close the file to deallocate the storage, rather than the person who did the unlinking. Note that with GFS2, those two events might take place on different nodes. Also there are a number of other changes: o We use the Linux inode subsystem as it was intended to be used, wrt allocating GFS2 inodes o The Linux inode cache is now the point which we use for local enforcement of only holding one copy of the inode in core at once (previous to this we used the glock layer). o We no longer use the unlinked "special" file. We just ignore it completely. This makes unlinking more efficient. o We now use the 4th block allocation state. The previously unused state is used to track unlinked but still open inodes. o gfs2_inoded is no longer needed o Several fields are now no longer needed (and removed) from the in core struct gfs2_inode o Several fields are no longer needed (and removed) from the in core superblock There are a number of future possible optimisations and clean ups which have been made possible by this patch. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
3a8a9a10 |
|
18-May-2006 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Update copyright date to 2006 Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
bd896801 |
|
18-May-2006 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Remove semaphore.h from C files We no longer use semaphores, everything has been converted to mutex or rwsem, so we don't need to include this header any more. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
b09e593d |
|
07-Apr-2006 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Fix a ref count bug and other clean ups This fixes a ref count bug that sometimes showed up a umount time (causing it to hang) but it otherwise mostly harmless. At the same time there are some clean ups including making the log operations structures const, moving a memory allocation so that its not done in the fast path of checking to see if there is an outstanding transaction related to a particular glock. Removes the sd_log_wrap varaible which was updated, but never actually used anywhere. Updates the gfs2 ioctl() to run without the kernel lock (which it never needed anyway). Removes the "invalidate inodes" loop from GFS2's put_super routine. This is done in kill super anyway so we don't need to do it here. The loop was also bogus in that if there are any inodes "stuck" at this point its a bug and we need to know about it rather than hide it by hanging forever. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
5c676f6d |
|
27-Feb-2006 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Macros removal in gfs2.h As suggested by Pekka Enberg <penberg@cs.helsinki.fi>. The DIV_RU macro is renamed DIV_ROUND_UP and and moved to kernel.h The other macros are gone from gfs2.h as (although not requested by Pekka Enberg) are a number of included header file which are now included individually. The inode number comparison function is now an inline function. The DT2IF and IF2DT may be addressed in a future patch. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
7359a19c |
|
12-Feb-2006 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Fix for root inode ref count bug Umount is now working correctly again. The bug was due to not getting an extra ref count when mounting the fs. We should have bumped it by two (once for the internal pointer to the root inode from the super block and once for the inode hanging off the dcache entry for root). Also this patch tidys up the code dealing with looking up and creating inodes. We now pass Linux inodes (with gfs2_inodes attached) rather than the other way around and this reduces code duplication in various places. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
b3b94faa |
|
16-Jan-2006 |
David Teigland <teigland@redhat.com> |
[GFS2] The core of GFS2 This patch contains all the core files for GFS2. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|