History log of /linux-master/fs/gfs2/glock.c
Revision Date Author Comments
# 4d927b03 20-Dec-2023 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Rename gfs2_withdrawn to gfs2_withdrawing_or_withdrawn

This function checks whether the filesystem has been been marked to be
withdrawn eventually or has been withdrawn already. Rename this
function to avoid confusing code like checking for gfs2_withdrawing()
when gfs2_withdrawn() has already returned true.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 015af1af 20-Dec-2023 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Mark withdraws as unlikely

Mark the gfs2_withdrawn(), gfs2_withdrawing(), and
gfs2_withdraw_in_prog() inline functions as likely to return %false.
This allows to get rid of likely() and unlikely() annotations at the
call sites of those functions.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# f9f229c1 13-Nov-2023 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Add GL_NOBLOCK flag

Add a GL_NOBLOCK flag for trying to take a glock without sleeping. This
will be used for implementing non-blocking lookup (MAY_NOT_BLOCK in
gfs2_permission, LOOKUP_RCU in gfs2_drevalidate).

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 600f111e 17-Nov-2023 Matthew Wilcox (Oracle) <willy@infradead.org>

fs: Rename mapping private members

It is hard to find where mapping->private_lock, mapping->private_list and
mapping->private_data are used, due to private_XXX being a relatively
common name for variables and structure members in the kernel. To fit
with other members of struct address_space, rename them all to have an
i_ prefix. Tested with an allmodconfig build.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20231117215823.2821906-1-willy@infradead.org
Acked-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>


# bb25b975 01-Nov-2023 Su Hui <suhui@nfschina.com>

gfs2: remove dead code in add_to_queue

clang static analyzer complains that value stored to 'gh' is never read.
The code of this line is useless after commit 0b93bac2271e
("gfs2: Remove LM_FLAG_PRIORITY flag"). Remove this code to save space.

Signed-off-by: Su Hui <suhui@nfschina.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# a304c23c 11-Sep-2023 Qi Zheng <zhengqi.arch@bytedance.com>

gfs2: dynamically allocate the gfs2-glock shrinker

Use new APIs to dynamically allocate the gfs2-glock shrinker.

Link: https://lkml.kernel.org/r/20230911094444.68966-9-zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: Bob Peterson <rpeterso@redhat.com>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Abhinav Kumar <quic_abhinavk@quicinc.com>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Anna Schumaker <anna@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Carlos Llamas <cmllamas@google.com>
Cc: Chandan Babu R <chandan.babu@oracle.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Chris Mason <clm@fb.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Chuck Lever <cel@kernel.org>
Cc: Coly Li <colyli@suse.de>
Cc: Dai Ngo <Dai.Ngo@oracle.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: "Darrick J. Wong" <djwong@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Airlie <airlied@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Sterba <dsterba@suse.com>
Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Cc: Gao Xiang <hsiangkao@linux.alibaba.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Jeffle Xu <jefflexu@linux.alibaba.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Kirill Tkhai <tkhai@ya.ru>
Cc: Marijn Suijten <marijn.suijten@somainline.org>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Mike Snitzer <snitzer@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nadav Amit <namit@vmware.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Cc: Olga Kornievskaia <kolga@netapp.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Sean Paul <sean@poorly.run>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Song Liu <song@kernel.org>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Steven Price <steven.price@arm.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Cc: Yue Hu <huyue2@coolpad.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# 0ede61d8 29-Sep-2023 Christian Brauner <brauner@kernel.org>

file: convert to SLAB_TYPESAFE_BY_RCU

In recent discussions around some performance improvements in the file
handling area we discussed switching the file cache to rely on
SLAB_TYPESAFE_BY_RCU which allows us to get rid of call_rcu() based
freeing for files completely. This is a pretty sensitive change overall
but it might actually be worth doing.

The main downside is the subtlety. The other one is that we should
really wait for Jann's patch to land that enables KASAN to handle
SLAB_TYPESAFE_BY_RCU UAFs. Currently it doesn't but a patch for this
exists.

With SLAB_TYPESAFE_BY_RCU objects may be freed and reused multiple times
which requires a few changes. So it isn't sufficient anymore to just
acquire a reference to the file in question under rcu using
atomic_long_inc_not_zero() since the file might have already been
recycled and someone else might have bumped the reference.

In other words, callers might see reference count bumps from newer
users. For this reason it is necessary to verify that the pointer is the
same before and after the reference count increment. This pattern can be
seen in get_file_rcu() and __files_get_rcu().

In addition, it isn't possible to access or check fields in struct file
without first aqcuiring a reference on it. Not doing that was always
very dodgy and it was only usable for non-pointer data in struct file.
With SLAB_TYPESAFE_BY_RCU it is necessary that callers first acquire a
reference under rcu or they must hold the files_lock of the fdtable.
Failing to do either one of this is a bug.

Thanks to Jann for pointing out that we need to ensure memory ordering
between reallocations and pointer check by ensuring that all subsequent
loads have a dependency on the second load in get_file_rcu() and
providing a fixup that was folded into this patch.

Cc: Jann Horn <jannh@google.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>


# 62862485 12-Sep-2023 Bob Peterson <rpeterso@redhat.com>

gfs2: fix glock shrinker ref issues

Before this patch, function gfs2_scan_glock_lru would only try to free
glocks that had a reference count of 0. But if the reference count ever
got to 0, the glock should have already been freed.

Shrinker function gfs2_dispose_glock_lru checks whether glocks on the
LRU are demote_ok, and if so, tries to demote them. But that's only
possible if the reference count is at least 1.

This patch changes gfs2_scan_glock_lru so it will try to demote and/or
dispose of glocks that have a reference count of 1 and which are either
demotable, or are already unlocked.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# e7beb8b6 23-Aug-2023 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Rename SDF_DEACTIVATING to SDF_KILL

Rename the SDF_DEACTIVATING flag to SDF_KILL to make it more obvious
that this relates to the kill_sb filesystem operation.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 3c69c437 23-Aug-2023 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Rename sd_{ glock => kill }_wait

Rename sd_glock_wait to sd_kill_wait: we'll use it for other things
related to "killing" a filesystem on unmount soon (kill_sb).

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 66fa9912 25-Jul-2023 Bob Peterson <rpeterso@redhat.com>

gfs2: conversion deadlock do_promote bypass

Consider the following case:
1. A glock is held in shared mode.
2. A process requests the glock in exclusive mode (rename).
3. Before the lock is granted, more processes (read / ls) request the
glock in shared mode again.
4. gfs2 sends a request to dlm for the lock in exclusive mode because
that holder is at the head of the queue.
5. Somehow the dlm request gets canceled, so dlm sends us back a
response with state == LM_ST_SHARED and LM_OUT_CANCELED. So at that
point, the glock is still held in shared mode.
6. finish_xmote gets called to process the response from dlm. It detects
that the glock is not in the requested mode and no demote is in
progress, so it moves the canceled holder to the tail of the queue
and finds the new holder at the head of the queue. That holder is
requesting the glock in shared mode.
7. finish_xmote calls do_xmote to transition the glock into shared mode,
but the glock is already in shared mode and so do_xmote complains
about that with:
GLOCK_BUG_ON(gl, gl->gl_state == gl->gl_target);

Instead, in finish_xmote, after moving the canceled holder to the tail
of the queue, check if any new holders can be granted. Only call
do_xmote to repeat the dlm request if the holder at the head of the
queue is requesting the glock in a mode that is incompatible with the
mode the glock is currently held in.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 0b93bac2 08-Aug-2023 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Remove LM_FLAG_PRIORITY flag

The last user of this flag was removed in commit b77b4a4815a9 ("gfs2:
Rework freeze / thaw logic").

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# de3e7f97 08-Aug-2023 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: do_promote cleanup

Change function do_promote to return true on success, and false
otherwise.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# af1abe11 16-Nov-2022 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Rename remaining "transaction" glock references

The transaction glock was repurposed to serve as the new freeze glock
years ago. Don't refer to it as the transaction glock anymore.

Also, to be more precise, call it the "freeze glock" instead of the
"freeze lock". Ditto for the journal glock.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 6c0246a9 05-Dec-2022 Bob Peterson <rpeterso@redhat.com>

gfs2: Cease delete work during unmount

Add a check to delete_work_func() so that it quits when it finds that
the filesystem is deactivating. This speeds up the delete workqueue
draining in gfs2_kill_sb().

In addition, make sure that iopen_go_callback() won't queue any new
delete work while the filesystem is deactivating.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# f0e56edc 20-Dec-2022 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Split the two kinds of glock "delete" work

Function delete_work_func() is used for two purposes:

* to immediately try to evict the glock's inode, and

* to verify after a little while that the inode has been deleted as
expected, and didn't just get skipped.

These two operations are not separated very well, so introduce two new
glock flags to improved that. Split gfs2_queue_delete_work() into
gfs2_queue_try_to_evict and gfs2_queue_verify_evict().

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 0247f4e9 06-Dec-2022 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Move delete workqueue into super block

Move the global delete workqueue into struct gfs2_sbd so that we can
flush / drain it without interfering with other filesystems.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 3056dc46 09-Dec-2022 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Get rid of GLF_PENDING_DELETE flag

Get rid of the GLF_PENDING_DELETE glock flag introduced by commit
a0e3cc65fa29 ("gfs2: Turn gl_delete into a delayed work"). The only use
of that flag is to prevent the iopen glock from being demoted (i.e.,
unlocked) while delete work is pending. It turns out that demoting the
iopen glock while delete work is pending is perfectly fine; we only need
to make sure that the glock isn't being freed while still in use. This
is ensured by the previous patch because delete_work_func() owns a
reference while the work is queued or running.

With these changes, gfs2_queue_delete_work() no longer takes the glock
spin lock, so we can use it in iopen_go_callback() instead of
open-coding it there.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 228804a3 09-Dec-2022 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Make glock lru list scanning safer

In __gfs2_glock_put(), remove the glock from the lru list *after*
dropping the glock lock. This prevents deadlocks against
gfs2_scan_glock_lru().

In gfs2_scan_glock_lru(), make sure that the glock's reference count is
zero before moving the glock to the dispose list. This skips glocks
that are marked dead as well as glocks that are still in use.
Additionally, switch to spin_trylock() as we already do in
gfs2_dispose_glock_lru(); this alone would also be enough to prevent
deadlocks against __gfs2_glock_put().

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 8fb8f70e 09-Dec-2022 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Clean up gfs2_scan_glock_lru

Switch to list_for_each_entry_safe() and eliminate the "skipped" list in
gfs2_scan_glock_lru().

At the same time, scan the requested number of items to scan, not one
more than that number.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 9ffa1888 23-Jan-2023 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: gl_object races fix

Function glock_clear_object() checks if the specified glock is still
pointing at the right object and clears the gl_object pointer. To
handle the case of incompletely constructed inodes, glock_clear_object()
also allows gl_object to be NULL.

However, in the teardown case, when iget_failed() is called and the
inode is removed from the inode hash, by the time we get to the
glock_clear_object() calls in gfs2_put_super() and its helpers, we don't
have exclusion against concurrent gfs2_inode_lookup() and
gfs2_create_inode() calls, and the inode and iopen glocks may already be
pointing at another inode, so the checks in glock_clear_object() are
incorrect.

To better handle this case, always completely disassociate an inode from
its glocks before tearing it down. In addition, get rid of a duplicate
glock_clear_object() call in gfs2_evict_inode(). That way,
glock_clear_object() will only ever be called when the glock points at
the current inode, and the NULL check in glock_clear_object() can be
removed.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 6b46a061 09-Dec-2022 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Remove support for glock holder auto-demotion (2)

As a follow-up to the previous commit, move the recovery related code in
__gfs2_glock_dq() to gfs2_glock_dq() where it better fits. No
functional change.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# ba3e77a4 09-Dec-2022 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Remove support for glock holder auto-demotion

Remove the support for glock holder auto-demotion (commit dc732906c245
and folow-ups) as we are not planning to use this feature, and the
additional code therefore only adds unnecessary complexity.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# f0c0ade8 05-Dec-2022 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Minor gfs2_try_evict cleanup

In gfs2_try_evict(), when an inode can't be evicted, we are grabbing a
temporary reference on the inode glock to poke that glock. That should
be safe, but it's easier to just grab an inode reference as we already
do earlier in this function.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 88f4a9f8 05-Dec-2022 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Partially revert gfs2_inode_lookup change

Commit c412a97cf6c5 changed delete_work_func() to always perform an
inode lookup when gfs2_try_evict() fails. This doesn't make sense as a
gfs2_try_evict() failure indicates that the inode is likely still in
use. Revert that change.

Fixes: c412a97cf6c5 ("gfs2: Use TRY lock in gfs2_inode_lookup for UNLINKED inodes")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 3781ec9e 05-Dec-2022 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Uninline and improve glock_{set,clear}_object

Those functions have reached a size at which having them inline isn't
useful anymore, so uninline them. In addition, report the glock name on
assertion failures.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 97236ad5 04-Dec-2022 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Avoid dequeuing GL_ASYNC glock holders twice

When a locking request fails, the associated glock holder is
automatically dequeued from the list of active and waiting holders. For
GL_ASYNC locking requests, this will obviously happen asynchronously
and it can race with attempts to cancel that locking request via
gfs2_glock_dq(). Therefore, don't forget to check if a locking request
has already been dequeued in gfs2_glock_dq().

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 4ad02083 02-Dec-2022 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Make gfs2_glock_hold return its glock argument

This allows code like 'gl = gfs2_glock_hold(...)'.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 86934198 18-Aug-2022 Bob Peterson <rpeterso@redhat.com>

gfs2: Clear flags when withdraw prevents xmote

There are a couple places in function do_xmote where normal processing
is circumvented due to withdraws in progress. However, since we bypass
most of do_xmote() we bypass telling dlm to lock the dlm lock, which
means dlm will never respond with a completion callback. Since the
completion callback ordinarily clears GLF_LOCK, this patch changes
function do_xmote to handle those situations more gracefully so the
file system may be unmounted after withdraw.

A very similar situation happens with the GLF_DEMOTE_IN_PROGRESS flag,
which is cleared by function finish_xmote(). Since the withdraw causes
us to skip the majority of do_xmote, it therefore also skips the call
to finish_xmote() so the DEMOTE_IN_PROGRESS flag needs to be cleared
manually.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 053640a7 18-Aug-2022 Bob Peterson <rpeterso@redhat.com>

gfs2: Dequeue waiters when withdrawn

When a withdraw occurs, ordinary (not system) glocks may not be granted
anymore. Later, when the file system is unmounted, gfs2_gl_hash_clear()
tries to clear out all the glocks, but these un-grantable pending
waiters prevent some glocks from being freed. So the unmount hangs, at
least for its ten-minute timeout period.

This patch takes measures to remove any pending waiters from
the glocks that will never be granted. This allows the unmount to
proceed in a reasonable period of time.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# c412a97c 22-Aug-2022 Bob Peterson <rpeterso@redhat.com>

gfs2: Use TRY lock in gfs2_inode_lookup for UNLINKED inodes

Before this patch, delete_work_func() would check for the GLF_DEMOTE
flag on the iopen glock and if set, it would perform special processing.
However, there was a race whereby the GLF_DEMOTE flag could be set by
another process after the check. Then when it called
gfs2_lookup_by_inum() which calls gfs2_inode_lookup(), it tried to lock
the iopen glock in SH mode, but the GLF_DEMOTE flag prevented the
request from being granted. But the iopen glock could never be demoted
because that happens when the inode is evicted, and the evict was never
completed because of the failed lookup.

To fix that, change function gfs2_inode_lookup() so that when
GFS2_BLKST_UNLINKED inodes are searched, it uses the LM_FLAG_TRY flag
for the iopen glock. If the locking request fails, fail
gfs2_inode_lookup() with -EAGAIN so that delete_work_func() can retry
the operation later.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# cbe6d257 05-Apr-2022 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Add GL_NOPID flag for process-independent glock holders

Add a GL_NOPID flag to indicate that once a glock holder has been acquired, it
won't be associated with the current process anymore. This is useful for iopen
and flock glocks which are associated with open files, as well as journal glock
holders and similar which are associated with the filesystem.

Once GL_NOPID is used for all applicable glocks (see the next patches),
processes will no longer be falsely reported as holding glocks which they are
not actually holding in the glocks dump file. Unlike before, when a process is
reported as having "(ended)", this will indicate an actual bug.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 56535dc6 23-Jun-2022 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Add flocks to glockfd debugfs file

Include flock glocks in the "glockfd" debugfs file. Those are similar to the
iopen glocks; while an open file is holding an flock, it is holding the file's
flock glock.

We cannot take f_fl_mutex in gfs2_glockfd_seq_show_flock() or else dumping the
"glockfd" file would block on flock operations. Instead, use the file->f_lock
spin lock to protect the f_fl_gh.gh_gl glock pointer.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 4480c27c 08-Jun-2022 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Add glockfd debugfs file

When a process has a gfs2 file open, the file is keeping a reference on the
underlying gfs2 inode, and the inode is keeping the inode's iopen glock held in
shared mode. In other words, the process depends on the iopen glock of each
open gfs2 file. Expose those dependencies in a new "glockfd" debugfs file.

The new debugfs file contains one line for each gfs2 file descriptor,
specifying the tgid, file descriptor number, and glock name, e.g.,

1601 6 5/816d

This list is compiled by iterating all tasks on the system using find_ge_pid(),
and all file descriptors of each task using task_lookup_next_fd_rcu(). To make
that work from gfs2, export those two functions.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 6feaec81 10-Jun-2022 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: List traversal in do_promote is safe

In do_promote(), we're never removing the current entry from the list
and so the list traversal is actually safe. Switch back to
list_for_each_entry().

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 0befb851 10-Jun-2022 Bob Peterson <rpeterso@redhat.com>

gfs2: do_promote glock holder stealing fix

In do_promote(), when the glock had no strong holders, we were
accidentally calling demote_incompat_holders() with new_gh == NULL, so
no weak holders were considered incompatible. Instead, the new holder
should have been passed in.

For doing that, the HIF_HOLDER flag needs to be set in new_gh to prevent
may_grant() from complaining. This means that the new holder will now
be recognized as a current holder, so skip over it explicitly in
demote_incompat_holders() to prevent it from being dequeued.

To further clarify things, we can now rename new_gh to current_gh in
demote_incompat_holders(); after all, the HIF_HOLDER flag is already set,
which means the new holder is already a current holder.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 8f0028fc 10-Jun-2022 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Use better variable name

In do_promote() and add_to_queue(), use current_gh as the variable name
for the first strong holder we could find: this matches the variable
name is may_grant(), and more clearly indicates that we're interested in
one (any) of the current strong holders.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 5f38a4d3 09-Jun-2022 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Make go_instantiate take a glock

Make go_instantiate take a glock instead of a glock holder as its argument:
this handler is supposed to instantiate the object associated with the glock.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 86c30a01 10-Jun-2022 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Add new go_held glock operation

Right now, inode_go_instantiate() contains functionality that relates to
how a glock is held rather than the glock itself, like waiting for
pending direct I/O to complete and completing interrupted truncates.
This code is meant to be run each time a holder is acquired, but
go_instantiate is actually only called once, when the glock is
instantiated.

To fix that, introduce a new go_held glock operation that is called each
time a glock holder is acquired. Move the holder specific code in
inode_go_instantiate() over to inode_go_held().

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# de3f906f 02-Jun-2022 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Revert 'Fix "truncate in progress" hang'

Now that interrupted truncates are completed in the context of the
process taking the glock, there is no need for the glock state engine to
delegate that task to gfs2_quotad or for quotad to perform those
truncates anymore. Get rid of the obsolete associated infrastructure.

Reverts commit 813e0c46c9e2 ("GFS2: Fix "truncate in progress" hang").

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 53d69132 10-Jun-2022 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Instantiate glocks ouside of glock state engine

Instantiate glocks outside of the glock state engine: there is no real
reason for instantiating them inside the glock state engine; it only
complicates the code.

Instead, instantiate them in gfs2_glock_wait() and gfs2_glock_async_wait()
using the new gfs2_glock_holder_ready() helper. On top of that, the only
other place that acquires a glock without using gfs2_glock_wait() or
gfs2_glock_async_wait() is gfs2_upgrade_iopen_glock(), so call
gfs2_glock_holder_ready() there as well.

If a dinode has a pending truncate, the glock-specific instantiate function
for inodes wakes up the truncate function in the quota daemon. Waiting for
the completion of the truncate was previously done by the glock state
engine, but we now need to wait in inode_go_instantiate().

This also means that gfs2_instantiate() will now no longer return any
"special" error codes.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# bdff777c 09-Jun-2022 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Fix up gfs2_glock_async_wait

Since commit 1fc05c8d8426 ("gfs2: cancel timed-out glock requests"), a
pending locking request can be canceled by calling gfs2_glock_dq() on
the pending holder. In gfs2_glock_async_wait(), when we time out, use
that to cancel the remaining locking requests and dequeue the locking
requests already granted. That's simpler as well as more efficient than
waiting for all locking requests to eventually be granted and dequeuing
them then.

In addition, gfs2_glock_async_wait() promises that by the time the
function completes, all glocks are either granted or dequeued, but the
implementation doesn't keep that promise if individual locking requests
fail. Fix that as well.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 44dab005 14-Jun-2022 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Minor gfs2_glock_nq_m cleanup

Add state and flags arguments to gfs2_rlist_alloc() to make it somewhat more
obvious which state and flags an rlist uses. With that, stop knocking off
flags in gfs2_glock_nq_m() and its nq_m_sync() helper that are never set in the
first place.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 565f82b5 26-May-2022 Bob Peterson <rpeterso@redhat.com>

gfs2: Rewrap overlong comment in do_promote

Rewrap the comment to keep the line length below 80 characters.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# e33c267a 31-May-2022 Roman Gushchin <roman.gushchin@linux.dev>

mm: shrinkers: provide shrinkers with names

Currently shrinkers are anonymous objects. For debugging purposes they
can be identified by count/scan function names, but it's not always
useful: e.g. for superblock's shrinkers it's nice to have at least an
idea of to which superblock the shrinker belongs.

This commit adds names to shrinkers. register_shrinker() and
prealloc_shrinker() functions are extended to take a format and arguments
to master a name.

In some cases it's not possible to determine a good name at the time when
a shrinker is allocated. For such cases shrinker_debugfs_rename() is
provided.

The expected format is:
<subsystem>-<shrinker_type>[:<instance>]-<id>
For some shrinkers an instance can be encoded as (MAJOR:MINOR) pair.

After this change the shrinker debugfs directory looks like:
$ cd /sys/kernel/debug/shrinker/
$ ls
dquota-cache-16 sb-devpts-28 sb-proc-47 sb-tmpfs-42
mm-shadow-18 sb-devtmpfs-5 sb-proc-48 sb-tmpfs-43
mm-zspool:zram0-34 sb-hugetlbfs-17 sb-pstore-31 sb-tmpfs-44
rcu-kfree-0 sb-hugetlbfs-33 sb-rootfs-2 sb-tmpfs-49
sb-aio-20 sb-iomem-12 sb-securityfs-6 sb-tracefs-13
sb-anon_inodefs-15 sb-mqueue-21 sb-selinuxfs-22 sb-xfs:vda1-36
sb-bdev-3 sb-nsfs-4 sb-sockfs-8 sb-zsmalloc-19
sb-bpf-32 sb-pipefs-14 sb-sysfs-26 thp-deferred_split-10
sb-btrfs:vda2-24 sb-proc-25 sb-tmpfs-1 thp-zero-9
sb-cgroup2-30 sb-proc-39 sb-tmpfs-27 xfs-buf:vda1-37
sb-configfs-23 sb-proc-41 sb-tmpfs-29 xfs-inodegc:vda1-38
sb-dax-11 sb-proc-45 sb-tmpfs-35
sb-debugfs-7 sb-proc-46 sb-tmpfs-40

[roman.gushchin@linux.dev: fix build warnings]
Link: https://lkml.kernel.org/r/Yr+ZTnLb9lJk6fJO@castle
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lkml.kernel.org/r/20220601032227.4076670-4-roman.gushchin@linux.dev
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# 11d8b79e 08-May-2022 Kees Cook <keescook@chromium.org>

gfs2: Use container_of() for gfs2_glock(aspace)

Clang's structure layout randomization feature gets upset when it sees
struct address_space (which is randomized) cast to struct gfs2_glock.
This is due to seeing the mapping pointer as being treated as an array
of gfs2_glock, rather than "something else, before struct address_space":

In file included from fs/gfs2/acl.c:23:
fs/gfs2/meta_io.h:44:12: error: casting from randomized structure pointer type 'struct address_space *' to 'struct gfs2_glock *'
return (((struct gfs2_glock *)mapping) - 1)->gl_name.ln_sbd;
^

Replace the instances of open-coded pointer math with container_of()
usage, and update the allocator to match.

Some cleanups and conversion of gfs2_glock_get() and
gfs2_glock_dealloc() by Andreas.

Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/lkml/202205041550.naKxwCBj-lkp@intel.com
Cc: Bob Peterson <rpeterso@redhat.com>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Bill Wendling <morbo@google.com>
Cc: cluster-devel@redhat.com
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# a4e8145e 14-Jan-2022 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Initialize gh_error in gfs2_glock_nq

The gh_error field if a glock holder is initialized to zero in
gfs2_holder_init(). When a locking operation fails, gh_error is set to
an error code; when it succeeds, the gh_error value is left unchanged.
The field isn't initialized in gfs2_holder_reinit(), which is a problem.
Instead of fixing that directly, initialize gh_error in gfs2_glock_nq().
That also obsoletes the assignment in do_flock().

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 5a27a43e 12-Jan-2022 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Make use of list_is_first

Use list_is_first() instead of open-coding it.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 1fc05c8d 23-Jan-2022 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: cancel timed-out glock requests

The gfs2 evict code tries to upgrade the iopen glock from SH to EX. If
the attempt to upgrade times out, gfs2 needs to tell dlm to cancel the
lock request or it can deadlock. We also need to wake up the process
waiting for the lock when dlm sends its AST back to gfs2.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 356b8103 03-Feb-2022 Andreas Gruenbacher <agruenba@redhat.com>

Revert "gfs2: check context in gfs2_glock_put"

It turns out that the might_sleep() call that commit 660a6126f8c3 adds
is triggering occasional data corruption in testing. We're not sure
about the root cause yet, but since this commit was added as a debugging
aid only, revert it for now.

This reverts commit 660a6126f8c3208f6df8d552039cda078a8426d1.

Fixes: 660a6126f8c3 ("gfs2: check context in gfs2_glock_put")
Cc: stable@vger.kernel.org # v5.16+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 3c5c67ec 29-Nov-2021 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Fix gfs2_instantiate description

The description of gfs2_instantiate accidentally lists a glock argument,
but the function takes a glock holder.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# ffd0cd3c 18-Nov-2021 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Fix __gfs2_holder_init function name in kernel-doc comment

The function name in the kernel-doc comment wasn't updated when the
function was renamed.

Fixes: b016d9a84abd ("gfs2: Save ip from gfs2_glock_nq_init")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# e11b02df 22-Nov-2021 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Fix remote demote of weak glock holders

When we mock up a temporary holder in gfs2_glock_cb to demote weak holders in
response to a remote locking conflict, we don't set the HIF_HOLDER flag. This
causes function may_grant to BUG. Fix by setting the missing HIF_HOLDER flag
in the mock glock holder.

In addition, define the mock glock holder where it is used.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# a7ac203d 08-Nov-2021 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Fix "Introduce flag for glock holder auto-demotion"

Function demote_incompat_holders iterates over the list of glock holders
with list_for_each_entry, and it then sometimes removes the current
holder from the list. This will get the loop stuck; we must use
list_for_each_entry_safe instead.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 7a92deaa 03-Nov-2021 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Fix atomic bug in gfs2_instantiate

Replace test_bit() + set_bit() with test_and_set_bit() where we need an atomic
operation. Use clear_and_wake_up_bit() instead of open coding it.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 660a6126 13-Oct-2021 Alexander Aring <aahringo@redhat.com>

gfs2: check context in gfs2_glock_put

Add a might_sleep call into gfs2_glock_put which can sleep in DLM when
the last reference is released. This will show problems earlier, and
not only when the last reference is put.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 7427f3bb 07-Oct-2021 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Fix glock_hash_walk bugs

So far, glock_hash_walk took a reference on each glock it iterated over, and it
was the examiner's responsibility to drop those references. Dropping the final
reference to a glock can sleep and the examiners are called in a RCU critical
section with spin locks held, so examiners that didn't need the extra reference
had to drop it asynchronously via gfs2_glock_queue_put or similar. This wasn't
done correctly in thaw_glock which did call gfs2_glock_put, and not at all in
dump_glock_func.

Change glock_hash_walk to not take glock references at all. That way, the
examiners that don't need them won't have to bother with slow asynchronous
puts, and the examiners that do need references can take them themselves.

Reported-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 486408d6 11-Oct-2021 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Cancel remote delete work asynchronously

In gfs2_inode_lookup and gfs2_create_inode, we're calling
gfs2_cancel_delete_work which currently cancels any remote delete work
(delete_work_func) synchronously. This means that if the work is
currently running, it will wait for it to finish. We're doing this to
pevent a previous instance of an inode from having any influence on the
next instance.

However, delete_work_func uses gfs2_inode_lookup internally, and we can
end up in a deadlock when delete_work_func gets interrupted at the wrong
time. For example,

(1) An inode's iopen glock has delete work queued, but the inode
itself has been evicted from the inode cache.

(2) The delete work is preempted before reaching gfs2_inode_lookup.

(3) Another process recreates the inode (gfs2_create_inode). It tries
to cancel any outstanding delete work, which blocks waiting for
the ongoing delete work to finish.

(4) The delete work calls gfs2_inode_lookup, which blocks waiting for
gfs2_create_inode to instantiate and unlock the new inode =>
deadlock.

It turns out that when the delete work notices that its inode has been
re-instantiated, it will do nothing. This means that it's safe to
cancel the delete work asynchronously. This prevents the kind of
deadlock described above.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# f2e70d8f 06-Oct-2021 Bob Peterson <rpeterso@redhat.com>

gfs2: fix GL_SKIP node_scope problems

Before this patch, when a glock was locked, the very first holder on the
queue would unlock the lockref and call the go_instantiate glops function
(if one existed), unless GL_SKIP was specified. When we introduced the new
node-scope concept, we allowed multiple holders to lock glocks in EX mode
and share the lock.

But node-scope introduced a new problem: if the first holder has GL_SKIP
and the next one does NOT, since it is not the first holder on the queue,
the go_instantiate op was not called. Eventually the GL_SKIP holder may
call the instantiate sub-function (e.g. gfs2_rgrp_bh_get) but there was
still a window of time in which another non-GL_SKIP holder assumes the
instantiate function had been called by the first holder. In the case of
rgrp glocks, this led to a NULL pointer dereference on the buffer_heads.

This patch tries to fix the problem by introducing two new glock flags:

GLF_INSTANTIATE_NEEDED, which keeps track of when the instantiate function
needs to be called to "fill in" or "read in" the object before it is
referenced.

GLF_INSTANTIATE_IN_PROG which is used to determine when a process is
in the process of reading in the object. Whenever a function needs to
reference the object, it checks the GLF_INSTANTIATE_NEEDED flag, and if
set, it sets GLF_INSTANTIATE_IN_PROG and calls the glops "go_instantiate"
function.

As before, the gl_lockref spin_lock is unlocked during the IO operation,
which may take a relatively long amount of time to complete. While
unlocked, if another process determines go_instantiate is still needed,
it sees GLF_INSTANTIATE_IN_PROG is set, and waits for the go_instantiate
glop operation to be completed. Once GLF_INSTANTIATE_IN_PROG is cleared,
it needs to check GLF_INSTANTIATE_NEEDED again because the other process's
go_instantiate operation may not have been successful.

Functions that previously called the instantiate sub-functions now call
directly into gfs2_instantiate so the new bits are managed properly.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# e6f85600 06-Oct-2021 Bob Peterson <rpeterso@redhat.com>

gfs2: split glock instantiation off from do_promote

Before this patch, function do_promote had a section of code that did
the actual instantiation. This patch splits that off into its own
function, gfs2_instantiate, which prepares us for the next patch that
will use that function.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 60d8bae9 06-Oct-2021 Bob Peterson <rpeterso@redhat.com>

gfs2: further simplify do_promote

This patch further simplifies function do_promote by eliminating some
redundant code in favor of using a lock_released flag. This is just
prep work for a future patch.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 17a6ecee 06-Oct-2021 Bob Peterson <rpeterso@redhat.com>

gfs2: re-factor function do_promote

This patch simply re-factors function do_promote to reduce the indents.
The logic should be unchanged. This makes future patches more readable.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# d74d0ce5 05-Oct-2021 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Remove 'first' trace_gfs2_promote argument

Remove the 'first' argument of trace_gfs2_promote: with GL_SKIP, the
'first' holder isn't the one that instantiates the glock
(gl_instantiate), which is what the 'first' flag was apparently supposed
to indicate.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 3278b977 29-Sep-2021 Bob Peterson <rpeterso@redhat.com>

gfs2: change go_lock to go_instantiate

Before this patch, the go_lock glock operations (glops) did not do
any actual locking. They were used to instantiate objects, like reading
in dinodes and rgrps from the media.

This patch renames the functions to go_instantiate for clarity.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# b016d9a8 30-Sep-2021 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Save ip from gfs2_glock_nq_init

Before this patch, when a glock was locked by function gfs2_glock_nq_init,
it initialized the holder gh_ip (return address) as gfs2_glock_nq_init.
That made it extremely difficult to track down problems because many
functions call gfs2_glock_nq_init. This patch changes the function so
that it saves gh_ip from the caller of gfs2_glock_nq_init, which makes
it easy to backtrack which holder took the lock.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# c1442f6b 12-Sep-2021 Bob Peterson <rpeterso@redhat.com>

gfs2: move GL_SKIP check from glops to do_promote

Before this patch, each individual "go_lock" glock operation (glop)
checked the GL_SKIP flag, and if set, would skip further processing.

This patch changes the logic so the go_lock caller, function go_promote,
checks the GL_SKIP flag before calling the go_lock op in the first place.
This avoids having to unnecessarily unlock gl_lockref.lock only to
re-lock it again.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 4c69038d 12-Sep-2021 Bob Peterson <rpeterso@redhat.com>

gfs2: Add GL_SKIP holder flag to dump_holder

Somehow, the GL_SKIP flag was missed when dumping glock holders.
This patch adds it to function hflags2str. I added it at the end because
I wanted Holder and Skip flags together to read "Hs" rather than "sH"
to avoid confusion with "Shared" ("SH") holder state.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# dc732906 19-Aug-2021 Bob Peterson <rpeterso@redhat.com>

gfs2: Introduce flag for glock holder auto-demotion

This patch introduces a new HIF_MAY_DEMOTE flag and infrastructure that
will allow glocks to be demoted automatically on locking conflicts.
When a locking request comes in that isn't compatible with the locking
state of an active holder and that holder has the HIF_MAY_DEMOTE flag
set, the holder will be demoted before the incoming locking request is
granted.

Note that this mechanism demotes active holders (with the HIF_HOLDER
flag set), while before we were only demoting glocks without any active
holders. This allows processes to keep hold of locks that may form a
cyclic locking dependency; the core glock logic will then break those
dependencies in case a conflicting locking request occurs. We'll use
this to avoid giving up the inode glock proactively before faulting in
pages.

Processes that allow a glock holder to be taken away indicate this by
calling gfs2_holder_allow_demote(), which sets the HIF_MAY_DEMOTE flag.
Later, they call gfs2_holder_disallow_demote() to clear the flag again,
and then they check if their holder is still queued: if it is, they are
still holding the glock; if it isn't, they can re-acquire the glock (or
abort).

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 61444649 09-Aug-2021 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Clean up function may_grant

Pass the first current glock holder into function may_grant and
deobfuscate the logic there.

While at it, switch from BUG_ON to GLOCK_BUG_ON in may_grant. To make
that build cleanly, de-constify the may_grant arguments.

We're now using function find_first_holder in do_promote, so move the
function's definition above do_promote.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 08d73666 02-Aug-2021 Bob Peterson <rpeterso@redhat.com>

gfs2: Remove redundant check from gfs2_glock_dq

In function gfs2_glock_dq, it checks to see if this is the fast path.
Before this patch, it checked both "find_first_holder(gl) == NULL" and
list_empty(&gl->gl_holders), which is redundant. If gl_holders is empty
then find_first_holder must return NULL. This patch removes the
redundancy.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# a8f1d32d 27-Jul-2021 Bob Peterson <rpeterso@redhat.com>

gfs2: Eliminate vestigial HIF_FIRST

Holder flag HIF_FIRST is no longer used or needed, so remove it.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 38a618db 07-Jun-2021 Baokun Li <libaokun1@huawei.com>

gfs2: Use list_move_tail instead of list_del/list_add_tail

Using list_move_tail() instead of list_del() + list_add_tail().

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 1ab19c5d 18-May-2021 Hillf Danton <hdanton@sina.com>

gfs2: Fix use-after-free in gfs2_glock_shrink_scan

The GLF_LRU flag is checked under lru_lock in gfs2_glock_remove_from_lru() to
remove the glock from the lru list in __gfs2_glock_put().

On the shrink scan path, the same flag is cleared under lru_lock but because
of cond_resched_lock(&lru_lock) in gfs2_dispose_glock_lru(), progress on the
put side can be made without deleting the glock from the lru list.

Keep GLF_LRU across the race window opened by cond_resched_lock(&lru_lock) to
ensure correct behavior on both sides - clear GLF_LRU after list_del under
lru_lock.

Reported-by: syzbot <syzbot+34ba7ddbf3021981a228@syzkaller.appspotmail.com>
Signed-off-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 865cc3e9 18-May-2021 Bob Peterson <rpeterso@redhat.com>

gfs2: fix a deadlock on withdraw-during-mount

Before this patch, gfs2 would deadlock because of the following
sequence during mount:

mount
gfs2_fill_super
gfs2_make_fs_rw <--- Detects IO error with glock
kthread_stop(sdp->sd_quotad_process);
<--- Blocked waiting for quotad to finish

logd
Detects IO error and the need to withdraw
calls gfs2_withdraw
gfs2_make_fs_ro
kthread_stop(sdp->sd_quotad_process);
<--- Blocked waiting for quotad to finish

gfs2_quotad
gfs2_statfs_sync
gfs2_glock_wait <---- Blocked waiting for statfs glock to be granted

glock_work_func
do_xmote <---Detects IO error, can't release glock: blocked on withdraw
glops->go_inval
glock_blocked_by_withdraw
requeue glock work & exit <--- work requeued, blocked by withdraw

This patch makes a special exception for the statfs system inode glock,
which allows the statfs glock UNLOCK to proceed normally. That allows the
quotad daemon to exit during the withdraw, which allows the logd daemon
to exit during the withdraw, which allows the mount to exit.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 20265d9a 18-May-2021 Bob Peterson <rpeterso@redhat.com>

gfs2: fix scheduling while atomic bug in glocks

Before this patch, in the unlikely event that gfs2_glock_dq encountered
a withdraw, it would do a wait_on_bit to wait for its journal to be
recovered, but it never released the glock's spin_lock, which caused a
scheduling-while-atomic error.

This patch unlocks the lockref spin_lock before waiting for recovery.

Fixes: 601ef0d52e96 ("gfs2: Force withdraw to replay journals and wait for it to finish")
Cc: stable@vger.kernel.org # v5.7+
Reported-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 7716506a 04-May-2021 Matthew Wilcox (Oracle) <willy@infradead.org>

mm: introduce and use mapping_empty()

Patch series "Remove nrexceptional tracking", v2.

We actually use nrexceptional for very little these days. It's a minor
pain to keep in sync with nrpages, but the pain becomes much bigger with
the THP patches because we don't know how many indices a shadow entry
occupies. It's easier to just remove it than keep it accurate.

Also, we save 8 bytes per inode which is nothing to sneeze at; on my
laptop, it would improve shmem_inode_cache from 22 to 23 objects per
16kB, and inode_cache from 26 to 27 objects. Combined, that saves
a megabyte of memory from a combined usage of 25MB for both caches.
Unfortunately, ext4 doesn't cross a magic boundary, so it doesn't save
any memory for ext4.

This patch (of 4):

Instead of checking the two counters (nrpages and nrexceptional), we can
just check whether i_pages is empty.

Link: https://lkml.kernel.org/r/20201026151849.24232-1-willy@infradead.org
Link: https://lkml.kernel.org/r/20201026151849.24232-2-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: Vishal Verma <vishal.l.verma@intel.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# c551f66c 30-Mar-2021 Lee Jones <lee.jones@linaro.org>

gfs2: Fix a number of kernel-doc warnings

Building the kernel with W=1 results in a number of kernel-doc warnings
like incorrect function names and parameter descriptions. Fix those,
mostly by adding missing parameter descriptions, removing left-over
descriptions, and demoting some less important kernel-doc comments into
regular comments.

Originally proposed by Lee Jones; improved and combined into a single
patch by Andreas.

Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# f68effb3 19-Mar-2021 Bob Peterson <rpeterso@redhat.com>

gfs2: Eliminate gh parameter from go_xmote_bh func

The only glock that uses go_xmote_bh glops function is the freeze glock
which uses freeze_go_xmote_bh. It does not use its gh parameter, so
this patch eliminates the unneeded parameter.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 4f0f586b 08-Apr-2021 Sami Tolvanen <samitolvanen@google.com>

treewide: Change list_sort to use const pointers

list_sort() internally casts the comparison function passed to it
to a different type with constant struct list_head pointers, and
uses this pointer to call the functions, which trips indirect call
Control-Flow Integrity (CFI) checking.

Instead of removing the consts, this change defines the
list_cmp_func_t type and changes the comparison function types of
all list_sort() callers to use const pointers, thus avoiding type
mismatches.

Suggested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210408182843.1754385-10-samitolvanen@google.com


# 06e908cd 18-Apr-2018 Bob Peterson <rpeterso@redhat.com>

gfs2: Allow node-wide exclusive glock sharing

Introduce a new LM_FLAG_NODE_SCOPE glock holder flag: when taking a
glock in LM_ST_EXCLUSIVE (EX) mode and with the LM_FLAG_NODE_SCOPE flag
set, the exclusive lock is shared among all local processes who are
holding the glock in EX mode and have the LM_FLAG_NODE_SCOPE flag set.
From the point of view of other nodes, the lock is still held
exclusively.

A future patch will start using this flag to improve performance with
rgrp sharing.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# a55a47a3 27-Nov-2020 Andreas Gruenbacher <agruenba@redhat.com>

Revert "GFS2: Prevent delete work from occurring on glocks used for create"

Since commit a0e3cc65fa29 ("gfs2: Turn gl_delete into a delayed work"), we're
cancelling any pending delete work of an iopen glock before attaching a new
inode to that glock in gfs2_create_inode. This means that delete_work_func can
no longer be queued or running when attaching the iopen glock to the new inode,
and we can revert commit a4923865ea07 ("GFS2: Prevent delete work from
occurring on glocks used for create"), which tried to achieve the same but in a
racy way.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 515b269d 23-Nov-2020 Alexander Aring <aahringo@redhat.com>

gfs2: set lockdep subclass for iopen glocks

This patch introduce a new globs attribute to define the subclass of the
glock lockref spinlock. This avoid the following lockdep warning, which
occurs when we lock an inode lock while an iopen lock is held:

============================================
WARNING: possible recursive locking detected
5.10.0-rc3+ #4990 Not tainted
--------------------------------------------
kworker/0:1/12 is trying to acquire lock:
ffff9067d45672d8 (&gl->gl_lockref.lock){+.+.}-{3:3}, at: lockref_get+0x9/0x20

but task is already holding lock:
ffff9067da308588 (&gl->gl_lockref.lock){+.+.}-{3:3}, at: delete_work_func+0x164/0x260

other info that might help us debug this:
Possible unsafe locking scenario:

CPU0
----
lock(&gl->gl_lockref.lock);
lock(&gl->gl_lockref.lock);

*** DEADLOCK ***

May be due to missing lock nesting notation

3 locks held by kworker/0:1/12:
#0: ffff9067c1bfdd38 ((wq_completion)delete_workqueue){+.+.}-{0:0}, at: process_one_work+0x1b7/0x540
#1: ffffac594006be70 ((work_completion)(&(&gl->gl_delete)->work)){+.+.}-{0:0}, at: process_one_work+0x1b7/0x540
#2: ffff9067da308588 (&gl->gl_lockref.lock){+.+.}-{3:3}, at: delete_work_func+0x164/0x260

stack backtrace:
CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.10.0-rc3+ #4990
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
Workqueue: delete_workqueue delete_work_func
Call Trace:
dump_stack+0x8b/0xb0
__lock_acquire.cold+0x19e/0x2e3
lock_acquire+0x150/0x410
? lockref_get+0x9/0x20
_raw_spin_lock+0x27/0x40
? lockref_get+0x9/0x20
lockref_get+0x9/0x20
delete_work_func+0x188/0x260
process_one_work+0x237/0x540
worker_thread+0x4d/0x3b0
? process_one_work+0x540/0x540
kthread+0x127/0x140
? __kthread_bind_mask+0x60/0x60
ret_from_fork+0x22/0x30

Suggested-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# da7d554f 26-Oct-2020 Alexander Aring <aahringo@redhat.com>

gfs2: Wake up when sd_glock_disposal becomes zero

Commit fc0e38dae645 ("GFS2: Fix glock deallocation race") fixed a
sd_glock_disposal accounting bug by adding a missing atomic_dec
statement, but it failed to wake up sd_glock_wait when that decrement
causes sd_glock_disposal to reach zero. As a consequence,
gfs2_gl_hash_clear can now run into a 10-minute timeout instead of
being woken up. Add the missing wakeup.

Fixes: fc0e38dae645 ("GFS2: Fix glock deallocation race")
Cc: stable@vger.kernel.org # v2.6.39+
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 2ffed529 15-Oct-2020 Bob Peterson <rpeterso@redhat.com>

gfs2: Only access gl_delete for iopen glocks

Only initialize gl_delete for iopen glocks, but more importantly, only access
it for iopen glocks in flush_delete_work: flush_delete_work is called for
different types of glocks including rgrp glocks, and those use gl_vm which is
in a union with gl_delete. Without this fix, we'll end up clobbering gl_vm,
which results in general memory corruption.

Fixes: a0e3cc65fa29 ("gfs2: Turn gl_delete into a delayed work")
Cc: stable@vger.kernel.org # v5.8+
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# dbffb29d 12-Oct-2020 Bob Peterson <rpeterso@redhat.com>

gfs2: Fix comments to glock_hash_walk

The comments before function glock_hash_walk had the wrong name and
an extra parameter. This simply fixes the comments.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# e2c6c8a7 12-Oct-2020 Bob Peterson <rpeterso@redhat.com>

gfs2: eliminate GLF_QUEUED flag in favor of list_empty(gl_holders)

Before this patch, glock.c maintained a flag, GLF_QUEUED, which indicated
when a glock had a holder queued. It was only checked for inode glocks,
although set and cleared by all glocks, and it was only used to determine
whether the glock should be held for the minimum hold time before releasing.

The problem is that the flag is not accurate at all. If a process holds
the glock, the flag is set. When they dequeue the glock, it only cleared
the flag in cases when the state actually changed. So if the state doesn't
change, the flag may still be set, even when nothing is queued.

This happens to iopen glocks often: the get held in SH, then the file is
closed, but the glock remains in SH mode.

We don't need a special flag to indicate this: we can simply tell whether
the glock has any items queued to the holders queue. It's a waste of cpu
time to maintain it.

This patch eliminates the flag in favor of simply checking list_empty
on the glock holders.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# ee1e2c77 16-Sep-2020 Bob Peterson <rpeterso@redhat.com>

gfs2: call truncate_inode_pages_final for address space glocks

Before this patch, we were not calling truncate_inode_pages_final for the
address space for glocks, which left the possibility of a leak. We now
take care of the problem instead of complaining, and we do it during
glock tear-down..

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# e8a8023e 15-Sep-2020 Liu Shixin <liushixin2@huawei.com>

gfs2: convert to use DEFINE_SEQ_ATTRIBUTE macro

Use DEFINE_SEQ_ATTRIBUTE macro to simplify the code.

Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# c07bfb4d 27-Jul-2020 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Fix refcount leak in gfs2_glock_poke

In gfs2_glock_poke, make sure gfs2_holder_uninit is called on the local
glock holder. Without that, we're leaking a glock and a pid reference.

Fixes: 9e8990dea926 ("gfs2: Smarter iopen glock waiting")
Cc: stable@vger.kernel.org # v5.8+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 5deaf1f6 09-Jun-2020 Bob Peterson <rpeterso@redhat.com>

gfs2: Add some flags missing from glock output

Before this patch, three flags were not represented in the glock output.
This patch adds them in:

c - GLF_INODE_CREATING
P - GLF_PENDING_DELETE
x - GLF_FREEING (both f and F are already used)

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 34244d71 10-Jun-2020 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Don't sleep during glock hash walk

In flush_delete_work, instead of flushing each individual pending
delayed work item, cancel and re-queue them for immediate execution.
The waiting isn't needed here because we're already waiting for all
queued work items to complete in gfs2_flush_delete_work. This makes the
code more efficient, but more importantly, it avoids sleeping during a
rhashtable walk, inside rcu_read_lock().

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 9e8990de 17-Jan-2020 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Smarter iopen glock waiting

When trying to upgrade the iopen glock from a shared to an exclusive lock in
gfs2_evict_inode, abort the wait if there is contention on the corresponding
inode glock: in that case, the inode must still be in active use on another
node, and we're not guaranteed to get the iopen glock anytime soon.

To make this work even better, when we notice contention on the iopen glock and
we can't evict the corresponsing inode and release the iopen glock immediately,
poke the inode glock. The other node(s) trying to acquire the lock can then
abort instead of timing out.

Thanks to Heinz Mauelshagen for pointing out a locking bug in a previous
version of this patch.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 35b6f8fb 17-Jan-2020 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Wake up when setting GLF_DEMOTE

Wake up the sdp->sd_async_glock_wait wait queue when setting the GLF_DEMOTE
flag.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# b0dcffd8 15-Jan-2020 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Check inode generation number in delete_work_func

In delete_work_func, if the iopen glock still has an inode attached,
limit the inode lookup to that specific generation number: in the likely
case that the inode was deleted on the node on which the inode's link
count dropped to zero, we can skip verifying the on-disk block type and
reading in the inode. The same applies if another node that had the
inode open managed to delete the inode before us.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 6bdcadea 14-Jan-2020 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Minor gfs2_lookup_by_inum cleanup

Use a zero no_formal_ino instead of a NULL pointer to indicate that any inode
generation number will qualify: a valid inode never has a zero no_formal_ino.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 8c7b9262 13-Jan-2020 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Give up the iopen glock on contention

When there's contention on the iopen glock, it means that the link count
of the corresponding inode has dropped to zero on a remote node which is
now trying to delete the inode. In that case, try to evict the inode so
that the iopen glock will be released, which will allow the remote node
to do its job.

When the inode is still open locally, the inode's reference count won't
drop to zero and so we'll keep holding the inode and its iopen glock.
The remote node will time out its request to grab the iopen glock, and
when the inode is finally closed locally, we'll try to delete it
ourself.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# a0e3cc65 16-Jan-2020 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Turn gl_delete into a delayed work

This requires flushing delayed work items in gfs2_make_fs_ro (which is called
before unmounting a filesystem).

When inodes are deleted and then recreated, pending gl_delete work items would
have no effect because the inode generations will have changed, so we can
cancel any pending gl_delete works before reusing iopen glocks.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# f286d627 13-Jan-2020 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Keep track of deleted inode generations in LVBs

When deleting an inode, keep track of the generation of the deleted inode in
the inode glock Lock Value Block (LVB). When trying to delete an inode
remotely, check the last-known inode generation against the deleted inode
generation to skip duplicate remote deletes. This avoids taking the resource
group glock in order to verify the block type.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 15f2547b 14-Jan-2020 Bob Peterson <rpeterso@redhat.com>

gfs2: Allow ASPACE glocks to also have an lvb

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# ea4e61c7 23-May-2020 Bob Peterson <rpeterso@redhat.com>

gfs2: introduce new gfs2_glock_assert_withdraw

Before this patch, asserts based on glocks did not print the glock with
the error. This patch introduces a new macro, gfs2_glock_assert_withdraw
which first prints the glock, then takes the assert.

This also changes a few glock asserts to the new macro.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 7e901d6e 22-May-2020 Bob Peterson <rpeterso@redhat.com>

gfs2: print mapping->nrpages in glock dump for address space glocks

This patch makes the glock dumps in debugfs print the number of pages
(nrpages) for address space glocks. This will aid in debugging.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# b14c9490 08-May-2020 Bob Peterson <rpeterso@redhat.com>

Revert "gfs2: Don't demote a glock until its revokes are written"

This reverts commit df5db5f9ee112e76b5202fbc331f990a0fc316d6.

This patch fixes a regression: patch df5db5f9ee112 allowed function
run_queue() to bypass its call to do_xmote() if revokes were queued for
the glock. That's wrong because its call to do_xmote() is what is
responsible for calling the go_sync() glops functions to sync both
the ail list and any revokes queued for it. By bypassing the call,
gfs2 could get into a stand-off where the glock could not be demoted
until its revokes are written back, but the revokes would not be
written back because do_xmote() was never called.

It "sort of" works, however, because there are other mechanisms like
the log flush daemon (logd) that can sync the ail items and revokes,
if it deems it necessary. The problem is: without file system pressure,
it might never deem it necessary.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# b11e1a84 06-May-2020 Bob Peterson <rpeterso@redhat.com>

gfs2: If go_sync returns error, withdraw but skip invalidate

Before this patch, if the go_sync operation returned an error during
the do_xmote process (such as unable to sync metadata to the journal)
the code did goto out. That kept the glock locked, so it could not be
given away, which correctly avoids file system corruption. However,
it never set the withdraw bit or requeueing the glock work. So it would
hang forever, unable to ever demote the glock.

This patch changes to goto to a new label, skip_inval, so that errors
from go_sync are treated the same way as errors from go_inval:
The delayed withdraw bit is set and the work is requeued. That way,
the logd should eventually figure out there's a problem and withdraw
properly there.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# a8b7528b 23-Apr-2020 Bob Peterson <rpeterso@redhat.com>

gfs2: Fix error exit in do_xmote

Before this patch, if an error was detected from glock function go_sync
by function do_xmote, it would return. But the function had temporarily
unlocked the gl_lockref spin_lock, and it never re-locked it. When the
caller of do_xmote tried to unlock it again, it was already unlocked,
which resulted in a corrupted spin_lock value.

This patch makes sure the gl_lockref spin_lock is re-locked after it is
unlocked.

Thanks to Wu Bo <wubo40@huawei.com> for reporting this problem.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 969183bc 03-Feb-2020 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Switch to list_{first,last}_entry

Replace open-coded versions of list_first_entry and list_last_entry with those
functions.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 1c634f94 13-Nov-2019 Bob Peterson <rpeterso@redhat.com>

gfs2: Do proper error checking for go_sync family of glops functions

Before this patch, function do_xmote would try to sync out the glock
dirty data by calling the appropriate glops function XXX_go_sync()
but it did not check for a good return code. If the sync was not
possible due to an io error or whatever, do_xmote would continue on
and call go_inval and release the glock to other cluster nodes.
When those nodes go to replay the journal, they may already be holding
glocks for the journal records that should have been synced, but were
not due to the ignored error.

This patch introduces proper error code checking to the go_sync
family of glops functions.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>


# df5db5f9 13-Nov-2019 Bob Peterson <rpeterso@redhat.com>

gfs2: Don't demote a glock until its revokes are written

Before this patch, run_queue would demote glocks based on whether
there are any more holders. But if the glock has pending revokes that
haven't been written to the media, giving up the glock might end in
file system corruption if the revokes never get written due to
io errors, node crashes and fences, etc. In that case, another node
will replay the metadata blocks associated with the glock, but
because the revoke was never written, it could replay that block
even though the glock had since been granted to another node who
might have made changes.

This patch changes the logic in run_queue so that it never demotes
a glock until its count of pending revokes reaches zero.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>


# d93ae386 29-Apr-2019 Bob Peterson <rpeterso@redhat.com>

gfs2: Check for log write errors before telling dlm to unlock

Before this patch, function do_xmote just assumed all the writes
submitted to the journal were finished and successful, and it
called the go_unlock function to release the dlm lock. But if
they're not, and a revoke failed to make its way to the journal,
a journal replay on another node will cause corruption if we
let the go_inval function continue and tell dlm to release the
glock to another node. This patch adds a couple checks for errors
in do_xmote after the calls to go_sync and go_inval. If an error
is found, we cannot withdraw yet, because the withdraw itself
uses glocks to make the file system read-only. Instead, we flag
the error. Later, asserts should cause another node to replay
the journal before continuing, thus protecting rgrp and dinode
glocks and maintaining the integrity of the metadata. Note that
we only need to do this for journaled glocks. System glocks
should be able to progress even under withdrawn conditions.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>


# 33dbd1e4 22-May-2019 Bob Peterson <rpeterso@redhat.com>

gfs2: fix infinite loop when checking ail item count before go_inval

Before this patch, the rgrp_go_inval and inode_go_inval functions each
checked if there were any items left on the ail count (by way of a
count), and if so, did a withdraw. But the withdraw code now uses
glocks when changing the file system to read-only status. So we can
not have glock functions withdrawing or a hang will likely result:
The glocks can't be serviced by the work_func if the work_func is
busy doing its own withdraw.

This patch removes the checks from the go_inval functions and adds
a centralized check in do_xmote to warn about the problem and not
withdraw, but flag the error so it's eventually caught when the logd
daemon eventually runs.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>


# 601ef0d5 28-Jan-2020 Bob Peterson <rpeterso@redhat.com>

gfs2: Force withdraw to replay journals and wait for it to finish

When a node withdraws from a file system, it often leaves its journal
in an incomplete state. This is especially true when the withdraw is
caused by io errors writing to the journal. Before this patch, a
withdraw would try to write a "shutdown" record to the journal, tell
dlm it's done with the file system, and none of the other nodes
know about the problem. Later, when the problem is fixed and the
withdrawn node is rebooted, it would then discover that its own
journal was incomplete, and replay it. However, replaying it at this
point is almost guaranteed to introduce corruption because the other
nodes are likely to have used affected resource groups that appeared
in the journal since the time of the withdraw. Replaying the journal
later will overwrite any changes made, and not through any fault of
dlm, which was instructed during the withdraw to release those
resources.

This patch makes file system withdraws seen by the entire cluster.
Withdrawing nodes dequeue their journal glock to allow recovery.

The remaining nodes check all the journals to see if they are
clean or in need of replay. They try to replay dirty journals, but
only the journals of withdrawn nodes will be "not busy" and
therefore available for replay.

Until the journal replay is complete, no i/o related glocks may be
given out, to ensure that the replay does not cause the
aforementioned corruption: We cannot allow any journal replay to
overwrite blocks associated with a glock once it is held.

The "live" glock which is now used to signal when a withdraw
occurs. When a withdraw occurs, the node signals its withdraw by
dequeueing the "live" glock and trying to enqueue it in EX mode,
thus forcing the other nodes to all see a demote request, by way
of a "1CB" (one callback) try lock. The "live" glock is not
granted in EX; the callback is only just used to indicate a
withdraw has occurred.

Note that all nodes in the cluster must wait for the recovering
node to finish replaying the withdrawing node's journal before
continuing. To this end, it checks that the journals are clean
multiple times in a retry loop.

Also note that the withdraw function may be called from a wide
variety of situations, and therefore, we need to take extra
precautions to make sure pointers are valid before using them in
many circumstances.

We also need to take care when glocks decide to withdraw, since
the withdraw code now uses glocks.

Also, before this patch, if a process encountered an error and
decided to withdraw, if another process was already withdrawing,
the second withdraw would be silently ignored, which set it free
to unlock its glocks. That's correct behavior if the original
withdrawer encounters further errors down the road. But if
secondary waiters don't wait for the journal replay, unlocking
glocks will allow other nodes to use them, despite the fact that
the journal containing those blocks is being replayed. The
replay needs to finish before our glocks are released to other
nodes. IOW, secondary withdraws need to wait for the first
withdraw to finish.

For example, if an rgrp glock is unlocked by a process that didn't
wait for the first withdraw, a journal replay could introduce file
system corruption by replaying a rgrp block that has already been
granted to a different cluster node.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# a72d2401 13-Jun-2019 Bob Peterson <rpeterso@redhat.com>

gfs2: Allow some glocks to be used during withdraw

We need to allow some glocks to be enqueued, dequeued, promoted, and demoted
when we're withdrawn. For example, to maintain metadata integrity, we should
disallow the use of inode and rgrp glocks when withdrawn. Other glocks, like
iopen or the transaction glocks may be safely used because none of their
metadata goes through the journal. So in general, we should disallow all
glocks with an address space, and allow all the others. One exception is:
we need to allow our active journal to be demoted so others may recover it.

Allowing glocks after withdraw gives us the ability to take appropriate
action (in a following patch) to have our journal properly replayed by
another node rather than just abandoning the current transactions and
pretending nothing bad happened, leaving the other nodes free to modify
the blocks we had in our journal, which may result in file system
corruption.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# b3422cac 13-Nov-2019 Bob Peterson <rpeterso@redhat.com>

gfs2: Rework how rgrp buffer_heads are managed

Before this patch, the rgrp code had a serious problem related to
how it managed buffer_heads for resource groups. The problem caused
file system corruption, especially in cases of journal replay.

When an rgrp glock was demoted to transfer ownership to a
different cluster node, do_xmote() first calls rgrp_go_sync and then
rgrp_go_inval, as expected. When it calls rgrp_go_sync, that called
gfs2_rgrp_brelse() that dropped the buffer_head reference count.
In most cases, the reference count went to zero, which is right.
However, there were other places where the buffers are handled
differently.

After rgrp_go_sync, do_xmote called rgrp_go_inval which called
gfs2_rgrp_brelse a second time, then rgrp_go_inval's call to
truncate_inode_pages_range would get rid of the pages in memory,
but only if the reference count drops to 0.

Unfortunately, gfs2_rgrp_brelse was setting bi->bi_bh = NULL.
So when rgrp_go_sync called gfs2_rgrp_brelse, it lost the pointer
to the buffer_heads in cases where the reference count was still 1.
Therefore, when rgrp_go_inval called gfs2_rgrp_brelse a second time,
it failed the check for "if (bi->bi_bh)" and thus failed to call
brelse a second time. Because of that, the reference count on those
buffers sometimes failed to drop from 1 to 0. And that caused
function truncate_inode_pages_range to keep the pages in page cache
rather than freeing them.

The next time the rgrp glock was acquired, the metadata read of
the rgrp buffers re-used the pages in memory, which were now
wrong because they were likely modified by the other node who
acquired the glock in EX (which is why we demoted the glock).
This re-use of the page cache caused corruption because changes
made by the other nodes were never seen, so the bitmaps were
inaccurate.

For some reason, the problem became most apparent when journal
replay forced the replay of rgrps in memory, which caused newer
rgrp data to be overwritten by the older in-core pages.

A big part of the problem was that the rgrp buffer were released
in multiple places: The go_unlock function would release them when
the glock was released rather than when the glock is demoted,
which is clearly wrong because our intent was to cache them until
the glock is demoted from SH or EX.

This patch attempts to clean up the mess and make one consistent
and centralized mechanism for managing the rgrp buffer_heads by
implementing several changes:

1. It eliminates the call to gfs2_rgrp_brelse() from rgrp_go_sync.
We don't want to release the buffers or zero the pointers when
syncing for the reasons stated above. It only makes sense to
release them when the glock is actually invalidated (go_inval).
And when we do, then we set the bh pointers to NULL.
2. The go_unlock function (which was only used for rgrps) is
eliminated, as we've talked about doing many times before.
The go_unlock function was called too early in the glock dq
process, and should not happen until the glock is invalidated.
3. It also eliminates the call to rgrp_brelse in gfs2_clear_rgrpd.
That will now happen automatically when the rgrp glocks are
demoted, and shouldn't happen any sooner or later than that.
Instead, function gfs2_clear_rgrpd has been modified to demote
the rgrp glocks, and therefore, free those pages, before the
remaining glocks are culled by gfs2_gl_hash_clear. This
prevents the gl_object from hanging around when the glocks are
culled.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>


# f7be987b 15-Jan-2020 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Remove GFS2_MIN_LVB_SIZE define

The dlm lockspace is set up to have lock value blocks of GDLM_LVB_SIZE bytes,
and dlm is the only lock manager we support, so there is no point in claiming
that the lock value block could have any other size.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# d99724c3 15-Nov-2019 Bob Peterson <rpeterso@redhat.com>

gfs2: Close timing window with GLF_INVALIDATE_IN_PROGRESS

This patch closes a timing window in which two processes compete
and overlap in the execution of do_xmote for the same glock:

Process A Process B
------------------------------------ -----------------------------
1. Grabs gl_lockref and calls do_xmote
2. Grabs gl_lockref but is blocked
3. Sets GLF_INVALIDATE_IN_PROGRESS
4. Unlocks gl_lockref
5. Calls do_xmote
6. Call glops->go_sync
7. test_and_clear_bit GLF_DIRTY
8. Call gfs2_log_flush Call glops->go_sync
9. (slow IO, so it blocks a long time) test_and_clear_bit GLF_DIRTY
It's not dirty (step 7) returns
10. Tests GLF_INVALIDATE_IN_PROGRESS
11. Calls go_inval (rgrp_go_inval)
12. gfs2_rgrp_relse does brelse
13. truncate_inode_pages_range
14. Calls lm_lock UN

In step 14 we've just told dlm to give the glock to another node
when, in fact, process A has not finished the IO and synced all
buffer_heads to disk and make sure their revokes are done.

This patch fixes the problem by changing the GLF_INVALIDATE_IN_PROGRESS
to use test_and_set_bit, and if the bit is already set, process B just
ignores it and trusts that process A will do the do_xmote in the proper
order.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# eb43e660 14-Nov-2019 Bob Peterson <rpeterso@redhat.com>

gfs2: Introduce function gfs2_withdrawn

Add function gfs2_withdrawn and replace all checks for the SDF_WITHDRAWN
bit to call it. This does not change the logic or function of gfs2, and
it facilitates later improvements to the withdraw sequence.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# ad26967b 29-Aug-2019 Bob Peterson <rpeterso@redhat.com>

gfs2: Use async glocks for rename

Because s_vfs_rename_mutex is not cluster-wide, multiple nodes can
reverse the roles of which directories are "old" and which are "new" for
the purposes of rename. This can cause deadlocks where two nodes end up
waiting for each other.

There can be several layers of directory dependencies across many nodes.

This patch fixes the problem by acquiring all gfs2_rename's inode glocks
asychronously and waiting for all glocks to be acquired. That way all
inodes are locked regardless of the order.

The timeout value for multiple asynchronous glocks is calculated to be
the total of the individual wait times for each glock times two.

Since gfs2_exchange is very similar to gfs2_rename, both functions are
patched in the same way.

A new async glock wait queue, sd_async_glock_wait, keeps a list of
waiters for these events. If gfs2's holder_wake function detects an
async holder, it wakes up any waiters for the event. The waiter only
tests whether any of its requests are still pending.

Since the glocks are sent to dlm asychronously, the wait function needs
to check to see which glocks, if any, were granted.

If a glock is granted by dlm (and therefore held), its minimum hold time
is checked and adjusted as necessary, as other glock grants do.

If the event times out, all glocks held thus far must be dequeued to
resolve any existing deadlocks. Then, if there are any outstanding
locking requests, we need to loop around and wait for dlm to respond to
those requests too. After we release all requests, we return -ESTALE to
the caller (vfs rename) which loops around and retries the request.

Node1 Node2
--------- ---------
1. Enqueue A Enqueue B
2. Enqueue B Enqueue A
3. A granted
6. B granted
7. Wait for B
8. Wait for A
9. A times out (since Node 1 holds A)
10. Dequeue B (since it was granted)
11. Wait for all requests from DLM
12. B Granted (since Node2 released it in step 10)
13. Rename
14. Dequeue A
15. DLM Grants A
16. Dequeue A (due to the timeout and since we
no longer have B held for our task).
17. Dequeue B
18. Return -ESTALE to vfs
19. VFS retries the operation, goto step 1.

This release-all-locks / acquire-all-locks may slow rename / exchange
down as both nodes struggle in the same way and do the same thing.
However, this will only happen when there is contention for the same
inodes, which ought to be rare.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 01123cf1 29-Aug-2019 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: create function gfs2_glock_update_hold_time

This patch moves the code that updates glock minimum hold
time to a separate function. This will be called by a future
patch.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 98fb0574 13-Aug-2019 Bob Peterson <rpeterso@redhat.com>

gfs2: Fix possible fs name overflows

This patch fixes three places in which temporary character buffers
could overflow due to the addition of the file system id from patch
3792ce973f07. Thanks to Dan Carpenter for pointing it out.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 3792ce97 09-May-2019 Bob Peterson <rpeterso@redhat.com>

gfs2: dump fsid when dumping glock problems

Before this patch, if a glock error was encountered, the glock with
the problem was dumped. But sometimes you may have lots of file systems
mounted, and that doesn't tell you which file system it was for.

This patch adds a new boolean parameter fsid to the dump_glock family
of functions. For non-error cases, such as dumping the glocks debugfs
file, the fsid is not dumped in order to keep lock dumps and glocktop
as clean as possible. For all error cases, such as GLOCK_BUG_ON, the
file system id is now printed. This will make it easier to debug.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 04aea0ca 07-May-2019 Bob Peterson <rpeterso@redhat.com>

gfs2: Rename SDF_SHUTDOWN to SDF_WITHDRAWN

Before this patch, the superblock flag indicating when a file system
is withdrawn was called SDF_SHUTDOWN. This patch simply renames it to
the more obvious SDF_WITHDRAWN.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 15a798f7 05-Jun-2019 Kefeng Wang <wangkefeng.wang@huawei.com>

gfs2: Use IS_ERR_OR_NULL

Use IS_ERR_OR_NULL where appropriate.

(Several more places converted by Andreas.)

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 638803d4 06-Jun-2019 Bob Peterson <rpeterso@redhat.com>

Revert "gfs2: Replace gl_revokes with a GLF flag"

Commit 73118ca8baf7 introduced a glock reference counting bug in
gfs2_trans_remove_revoke. Given that, replacing gl_revokes with a GLF flag is
no longer useful, so revert that commit.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 7336d0e6 31-May-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 398

Based on 1 normalized pattern(s):

this copyrighted material is made available to anyone wishing to use
modify copy or redistribute it subject to the terms and conditions
of the gnu general public license version 2

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-only

has been chosen to replace the boilerplate/reference in 44 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190531081038.653000175@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 73118ca8 04-Apr-2019 Bob Peterson <rpeterso@redhat.com>

gfs2: Replace gl_revokes with a GLF flag

The gl_revokes value determines how many outstanding revokes a glock has
on the superblock revokes list; this is used to avoid unnecessary log
flushes. However, gl_revokes is only ever tested for being zero, and it's
only decremented in revoke_lo_after_commit, which removes all revokes
from the list, so we know that the gl_revoke values of all the glocks on
the list will reach zero. Therefore, we can replace gl_revokes with a
bit flag. This saves an atomic counter in struct gfs2_glock.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 9287c645 04-Apr-2019 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Fix occasional glock use-after-free

This patch has to do with the life cycle of glocks and buffers. When
gfs2 metadata or journaled data is queued to be written, a gfs2_bufdata
object is assigned to track the buffer, and that is queued to various
lists, including the glock's gl_ail_list to indicate it's on the active
items list. Once the page associated with the buffer has been written,
it is removed from the ail list, but its life isn't over until a revoke
has been successfully written.

So after the block is written, its bufdata object is moved from the
glock's gl_ail_list to a file-system-wide list of pending revokes,
sd_log_le_revoke. At that point the glock still needs to track how many
revokes it contributed to that list (in gl_revokes) so that things like
glock go_sync can ensure all the metadata has been not only written, but
also revoked before the glock is granted to a different node. This is
to guarantee journal replay doesn't replay the block once the glock has
been granted to another node.

Ross Lagerwall recently discovered a race in which an inode could be
evicted, and its glock freed after its ail list had been synced, but
while it still had unwritten revokes on the sd_log_le_revoke list. The
evict decremented the glock reference count to zero, which allowed the
glock to be freed. After the revoke was written, function
revoke_lo_after_commit tried to adjust the glock's gl_revokes counter
and clear its GLF_LFLUSH flag, at which time it referenced the freed
glock.

This patch fixes the problem by incrementing the glock reference count
in gfs2_add_revoke when the glock's first bufdata object is moved from
the glock to the global revokes list. Later, when the glock's last such
bufdata object is freed, the reference count is decremented. This
guarantees that whichever process finishes last (the revoke writing or
the evict) will properly free the glock, and neither will reference the
glock after it has been freed.

Reported-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 7881ef3f 27-Mar-2019 Ross Lagerwall <ross.lagerwall@citrix.com>

gfs2: Fix lru_count going negative

Under certain conditions, lru_count may drop below zero resulting in
a large amount of log spam like this:

vmscan: shrink_slab: gfs2_dump_glock+0x3b0/0x630 [gfs2] \
negative objects to delete nr=-1

This happens as follows:
1) A glock is moved from lru_list to the dispose list and lru_count is
decremented.
2) The dispose function calls cond_resched() and drops the lru lock.
3) Another thread takes the lru lock and tries to add the same glock to
lru_list, checking if the glock is on an lru list.
4) It is on a list (actually the dispose list) and so it avoids
incrementing lru_count.
5) The glock is moved to lru_list.
5) The original thread doesn't dispose it because it has been re-added
to the lru list but the lru_count has still decreased by one.

Fix by checking if the LRU flag is set on the glock rather than checking
if the glock is on some list and rearrange the code so that the LRU flag
is added/removed precisely when the glock is added/removed from lru_list.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 605b0487 06-Mar-2019 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Fix missed wakeups in find_insert_glock

Mark Syms has reported seeing tasks that are stuck waiting in
find_insert_glock. It turns out that struct lm_lockname contains four padding
bytes on 64-bit architectures that function glock_waitqueue doesn't skip when
hashing the glock name. As a result, we can end up waking up the wrong
waitqueue, and the waiting tasks may be stuck forever.

Fix that by using ht_parms.key_len instead of sizeof(struct lm_lockname) for
the key length.

Reported-by: Mark Syms <mark.syms@citrix.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 2abbf9a4 22-Jan-2019 Greg Kroah-Hartman <gregkh@linuxfoundation.org>

gfs: no need to check return value of debugfs_create functions

When calling debugfs functions, there is no need to ever check the
return value. The function can work or not, but the code logic should
never do something different based on this.

There is no need to save the dentries for the debugfs files, so drop
those variables to save a bit of space and make the code simpler.

Cc: Bob Peterson <rpeterso@redhat.com>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: cluster-devel@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 27a2660f 18-Apr-2018 Bob Peterson <rpeterso@redhat.com>

gfs2: Dump nrpages for inodes and their glocks

This patch is based on an idea from Steve Whitehouse. The idea is
to dump the number of pages for inodes in the glock dumps.
The additional locking required me to drop const from quite a few
places.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# e54c78a2 03-Oct-2018 Bob Peterson <rpeterso@redhat.com>

gfs2: Use fs_* functions instead of pr_* function where we can

Before this patch, various errors and messages were reported using
the pr_* functions: pr_err, pr_warn, pr_info, etc., but that does
not tell you which gfs2 mount had the problem, which is often vital
to debugging. This patch changes the calls from pr_* to fs_* in
most of the messages so that the file system id is printed along
with the message.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 6da2ec56 12-Jun-2018 Kees Cook <keescook@chromium.org>

treewide: kmalloc() -> kmalloc_array()

The kmalloc() function has a 2-factor argument form, kmalloc_array(). This
patch replaces cases of:

kmalloc(a * b, gfp)

with:
kmalloc_array(a * b, gfp)

as well as handling cases of:

kmalloc(a * b * c, gfp)

with:

kmalloc(array3_size(a, b, c), gfp)

as it's slightly less ugly than:

kmalloc_array(array_size(a, b), c, gfp)

This does, however, attempt to ignore constant size factors like:

kmalloc(4 * 1024, gfp)

though any constants defined via macros get caught up in the conversion.

Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.

The tools/ directory was manually excluded, since it has its own
implementation of kmalloc().

The Coccinelle script used for this was:

// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@

(
kmalloc(
- (sizeof(TYPE)) * E
+ sizeof(TYPE) * E
, ...)
|
kmalloc(
- (sizeof(THING)) * E
+ sizeof(THING) * E
, ...)
)

// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@

(
kmalloc(
- sizeof(u8) * (COUNT)
+ COUNT
, ...)
|
kmalloc(
- sizeof(__u8) * (COUNT)
+ COUNT
, ...)
|
kmalloc(
- sizeof(char) * (COUNT)
+ COUNT
, ...)
|
kmalloc(
- sizeof(unsigned char) * (COUNT)
+ COUNT
, ...)
|
kmalloc(
- sizeof(u8) * COUNT
+ COUNT
, ...)
|
kmalloc(
- sizeof(__u8) * COUNT
+ COUNT
, ...)
|
kmalloc(
- sizeof(char) * COUNT
+ COUNT
, ...)
|
kmalloc(
- sizeof(unsigned char) * COUNT
+ COUNT
, ...)
)

// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@

(
- kmalloc
+ kmalloc_array
(
- sizeof(TYPE) * (COUNT_ID)
+ COUNT_ID, sizeof(TYPE)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(TYPE) * COUNT_ID
+ COUNT_ID, sizeof(TYPE)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(TYPE) * (COUNT_CONST)
+ COUNT_CONST, sizeof(TYPE)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(TYPE) * COUNT_CONST
+ COUNT_CONST, sizeof(TYPE)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(THING) * (COUNT_ID)
+ COUNT_ID, sizeof(THING)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(THING) * COUNT_ID
+ COUNT_ID, sizeof(THING)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(THING) * (COUNT_CONST)
+ COUNT_CONST, sizeof(THING)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(THING) * COUNT_CONST
+ COUNT_CONST, sizeof(THING)
, ...)
)

// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@

- kmalloc
+ kmalloc_array
(
- SIZE * COUNT
+ COUNT, SIZE
, ...)

// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@

(
kmalloc(
- sizeof(TYPE) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kmalloc(
- sizeof(TYPE) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kmalloc(
- sizeof(TYPE) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kmalloc(
- sizeof(TYPE) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kmalloc(
- sizeof(THING) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kmalloc(
- sizeof(THING) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kmalloc(
- sizeof(THING) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kmalloc(
- sizeof(THING) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
)

// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@

(
kmalloc(
- sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
kmalloc(
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
kmalloc(
- sizeof(THING1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
kmalloc(
- sizeof(THING1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
kmalloc(
- sizeof(TYPE1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
|
kmalloc(
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
)

// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@

(
kmalloc(
- (COUNT) * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- COUNT * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- COUNT * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- (COUNT) * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- COUNT * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- (COUNT) * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- (COUNT) * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- COUNT * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
)

// Any remaining multi-factor products, first at least 3-factor products,
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@

(
kmalloc(C1 * C2 * C3, ...)
|
kmalloc(
- (E1) * E2 * E3
+ array3_size(E1, E2, E3)
, ...)
|
kmalloc(
- (E1) * (E2) * E3
+ array3_size(E1, E2, E3)
, ...)
|
kmalloc(
- (E1) * (E2) * (E3)
+ array3_size(E1, E2, E3)
, ...)
|
kmalloc(
- E1 * E2 * E3
+ array3_size(E1, E2, E3)
, ...)
)

// And then all remaining 2 factors products when they're not all constants,
// keeping sizeof() as the second factor argument.
@@
expression THING, E1, E2;
type TYPE;
constant C1, C2, C3;
@@

(
kmalloc(sizeof(THING) * C2, ...)
|
kmalloc(sizeof(TYPE) * C2, ...)
|
kmalloc(C1 * C2 * C3, ...)
|
kmalloc(C1 * C2, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(TYPE) * (E2)
+ E2, sizeof(TYPE)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(TYPE) * E2
+ E2, sizeof(TYPE)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(THING) * (E2)
+ E2, sizeof(THING)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(THING) * E2
+ E2, sizeof(THING)
, ...)
|
- kmalloc
+ kmalloc_array
(
- (E1) * E2
+ E1, E2
, ...)
|
- kmalloc
+ kmalloc_array
(
- (E1) * (E2)
+ E1, E2
, ...)
|
- kmalloc
+ kmalloc_array
(
- E1 * E2
+ E1, E2
, ...)
)

Signed-off-by: Kees Cook <keescook@chromium.org>


# 3fd5d3ad 27-Mar-2018 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Stop using rhashtable_walk_peek

Function rhashtable_walk_peek is problematic because there is no
guarantee that the glock previously returned still exists; when that key
is deleted, rhashtable_walk_peek can end up returning a different key,
which will cause an inconsistent glock dump. Fix this by keeping track
of the current glock in the seq file iterator functions instead.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 7ac07fda 08-Jan-2018 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Glock dump performance regression fix

Restore an optimization removed in commit 7f19449553 "Fix debugfs glocks
dump": keep the glock hash table iterator active while the glock dump
file is held open. This avoids having to rescan the hash table from the
start for each read, with quadratically rising runtime.

In addition, use rhastable_walk_peek for resuming a glock dump at the
current position: when a glock doesn't fit in the provided buffer
anymore, the next read must revisit the same glock.

Finally, also restart the dump from the first entry when we notice that
the hash table has been resized in gfs2_glock_seq_start.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 97a6ec4a 04-Dec-2017 Tom Herbert <tom@quantonium.net>

rhashtable: Change rhashtable_walk_start to return void

Most callers of rhashtable_walk_start don't care about a resize event
which is indicated by a return value of -EAGAIN. So calls to
rhashtable_walk_start are wrapped wih code to ignore -EAGAIN. Something
like this is common:

ret = rhashtable_walk_start(rhiter);
if (ret && ret != -EAGAIN)
goto out;

Since zero and -EAGAIN are the only possible return values from the
function this check is pointless. The condition never evaluates to true.

This patch changes rhashtable_walk_start to return void. This simplifies
code for the callers that ignore -EAGAIN. For the few cases where the
caller cares about the resize event, particularly where the table can be
walked in mulitple parts for netlink or seq file dump, the function
rhashtable_walk_start_check has been added that returns -EAGAIN on a
resize event.

Signed-off-by: Tom Herbert <tom@quantonium.net>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 10201655 19-Sep-2017 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Fix debugfs glocks dump

The switch to rhashtables (commit 88ffbf3e03) broke the debugfs glock
dump (/sys/kernel/debug/gfs2/<device>/glocks) for dumps bigger than a
single buffer: the right function for restarting an rhashtable iteration
from the beginning of the hash table is rhashtable_walk_enter;
rhashtable_walk_stop + rhashtable_walk_start will just resume from the
current position.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Cc: stable@vger.kernel.org # v4.3+


# d296b15e 30-Aug-2017 Arvind Yadav <arvind.yadav.cs@gmail.com>

gfs2: constify rhashtable_params

rhashtable_params are not supposed to change at runtime. All
Functions rhashtable_* working with const rhashtable_params
provided by <linux/rhashtable.h>. So mark the non-const structs
as const.

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 27c3b415 18-Aug-2017 Bob Peterson <rpeterso@redhat.com>

GFS2: Fix up some sparse warnings

This patch cleans up various pieces of GFS2 to avoid sparse errors.
This doesn't fix them all, but it fixes several. The first error,
in function glock_hash_walk was a genuine bug where the rhashtable
could be started and not stopped.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# a91323e2 04-Aug-2017 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Clean up waiting on glocks

The prepare_to_wait_on_glock and finish_wait_on_glock functions introduced in
commit 56a365be "gfs2: gfs2_glock_get: Wait on freeing glocks" are
better removed, resulting in cleaner code.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 71c1b213 01-Aug-2017 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: gfs2_evict_inode: Put glocks asynchronously

gfs2_evict_inode is called to free inodes under memory pressure. The
function calls into DLM when an inode's last cluster-wide reference goes
away (remote unlink) and to release the glock and associated DLM lock
before finally destroying the inode. However, if DLM is blocked on
memory to become available, calling into DLM again will deadlock.

Avoid that by decoupling releasing glocks from destroying inodes in that
case: with gfs2_glock_queue_put, glocks will be dequeued asynchronously
in work queue context, when the associated inodes have likely already
been destroyed.

With this change, inodes can end up being unlinked, remote-unlink can be
triggered, and then the inode can be reallocated before all
remote-unlink callbacks are processed. To detect that, revalidate the
link count in gfs2_evict_inode to make sure we're not deleting an
allocated, referenced inode.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 0515480a 01-Aug-2017 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: gfs2_glock_get: Wait on freeing glocks

Keep glocks in their hash table until they are freed instead of removing
them when their last reference is dropped. This allows to wait for any
previous instances of a glock to go away in gfs2_glock_get before
creating a new glocks.

Special thanks to Andy Price for finding and fixing a problem which also
required us to delete the rcu_read_unlock from the error case in function
gfs2_glock_get.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 645ebd49 26-Jul-2017 Bob Peterson <rpeterso@redhat.com>

GFS2: Don't waste time locking lru_lock for non-lru glocks

Before this patch, glock_dq would call gfs2_glock_remove_from_lru.
For glocks that are never put on the LRU, such as the transaction
glock, this just takes the spin_lock, determines there's nothing to
be done because the list is empty, then unlocks again. This was
causing unnecessary lock contention on the lru_lock spin_lock.
This patch adds a check for GLOF_LRU in the glops before taking
the spin_lock.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 961ae1d8 07-Jul-2017 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Fix glock rhashtable rcu bug

Before commit 88ffbf3e03 "GFS2: Use resizable hash table for glocks",
glocks were freed via call_rcu to allow reading the glock hashtable
locklessly using rcu. This was then changed to free glocks immediately,
which made reading the glock hashtable unsafe. Bring back the original
code for freeing glocks via call_rcu.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Cc: stable@vger.kernel.org # 4.3+


# 6b0c7440 30-Jun-2017 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Clean up glock work enqueuing

This patch adds a standardized queueing mechanism for glock work
with spin_lock protection to prevent races.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# ed17545d 05-May-2017 Bob Peterson <rpeterso@redhat.com>

GFS2: Allow glocks to be unlocked after withdraw

This bug fixes a regression introduced by patch 0d1c7ae9d8.

The intent of the patch was to stop promoting glocks after a
file system is withdrawn due to a variety of errors, because doing
so results in a BUG(). (You should be able to unmount after a
withdraw rather than having the kernel panic.)

Unfortunately, it also stopped demotions, so glocks could not be
unlocked after withdraw, which means the unmount would hang.

This patch allows function do_xmote to demote locks to an
unlocked state after a withdraw, but not promote them.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 0a52aba7 21-Feb-2017 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Switch to rhashtable_lookup_get_insert_fast

Switch from rhashtable_lookup_insert_fast + rhashtable_lookup_fast to
rhashtable_lookup_get_insert_fast, which is cleaner and avoids an extra
rhashtable lookup.

At the same time, turn the retry loop in gfs2_glock_get into an infinite
loop. The lookup or insert will eventually succeed, usually very fast,
but there is no reason to give up trying at a fixed number of
iterations.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 972b044e 16-Mar-2017 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Don't pack struct lm_lockname

As per a suggestion by Linus, don't pack struct lm_lockname: we did that
because the struct is used as a rhashtable key, but packing tells the
compiler that the 64-bit fields in the struct may be unaligned, causing
it to generate worse code on some architectures. Instead, rearrange the
fields in the struct so that there is no padding between fields, and
exclude any tail padding from the hash key size.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 92ecd73a 09-Mar-2017 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Deduplicate gfs2_{glocks,glstats}_open

Both functions are identical except for the seq_operations used.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# cc37a627 09-Mar-2017 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Replace rhashtable_walk_init with rhashtable_walk_enter

Function rhashtable_walk_init is deprecated.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 0d1c7ae9 02-Mar-2017 Bob Peterson <rpeterso@redhat.com>

GFS2: Prevent BUG from occurring when normal Withdraws occur

When the GFS2 file system withdraws due to metadata corruption, it
often has outstanding transactions in the journal and delayed work
queued for its glocks. This patch adds some new checks for a
withdrawn file system before proceeding with operations that would
obviously cause a BUG() to be triggered. That allows GFS2 to be
safely unmounted rather than cause the system to go down.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# f38e5fb9 21-Feb-2017 Andrew Price <anprice@redhat.com>

gfs2: Add missing rcu locking for glock lookup

We must hold the rcu read lock across looking up glocks and trying to
bump their refcount to prevent the glocks from being freed in between.

Cc: <stable@vger.kernel.org> # 4.3+
Signed-off-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 98687f42 11-Feb-2017 Herbert Xu <herbert@gondor.apana.org.au>

gfs2: Use rhashtable walk interface in glock_hash_walk

The function glock_hash_walk walks the rhashtable by hand. This
is broken because if it catches the hash table in the middle of
a rehash, then it will miss entries.

This patch replaces the manual walk by using the rhashtable walk
interface.

Fixes: 88ffbf3e037e ("GFS2: Use resizable hash table for glocks")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>


# bf3f14d6 15-Feb-2017 David S. Miller <davem@davemloft.net>

rhashtable: Revert nested table changes.

This reverts commits:

6a25478077d987edc5e2f880590a2bc5fcab4441
9dbbfb0ab6680c6a85609041011484e6658e7d3c
40137906c5f55c252194ef5834130383e639536f

It's too risky to put in this late in the release
cycle. We'll put these changes into the next merge
window instead.

Signed-off-by: David S. Miller <davem@davemloft.net>


# 6a254780 11-Feb-2017 Herbert Xu <herbert@gondor.apana.org.au>

gfs2: Use rhashtable walk interface in glock_hash_walk

The function glock_hash_walk walks the rhashtable by hand. This
is broken because if it catches the hash table in the middle of
a rehash, then it will miss entries.

This patch replaces the manual walk by using the rhashtable walk
interface.

Fixes: 88ffbf3e037e ("GFS2: Use resizable hash table for glocks")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8b0e1953 24-Dec-2016 Thomas Gleixner <tglx@linutronix.de>

ktime: Cleanup ktime_set() usage

ktime_set(S,N) was required for the timespec storage type and is still
useful for situations where a Seconds and Nanoseconds part of a time value
needs to be converted. For anything where the Seconds argument is 0, this
is pointless and can be replaced with a simple assignment.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>


# 7c0f6ba6 24-Dec-2016 Linus Torvalds <torvalds@linux-foundation.org>

Replace <asm/uaccess.h> with <linux/uaccess.h> globally

This was entirely automated, using the script by Al:

PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
$(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.

Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 14d37564 14-Dec-2016 Dan Carpenter <dan.carpenter@oracle.com>

GFS2: Fix reference to ERR_PTR in gfs2_glock_iter_next

This patch fixes a place where function gfs2_glock_iter_next can
reference an invalid error pointer.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# e0d735c1 20-Sep-2016 Chao Yu <chao@kernel.org>

gfs2: fix to detect failure of register_shrinker

register_shrinker can fail after commit 1d3d4437eae1 ("vmscan: per-node
deferred work"), we should detect the failure of it, otherwise we may
fail to register shrinker after gfs2 module was been inited successfully.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 47a9a527 01-Aug-2016 Fabian Frederick <fabf@skynet.be>

GFS2: use BIT() macro

Replace 1 << value shift by more explicit BIT() macro

Also fixes two bare unsigned definitions:

WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
+ unsigned hsize = BIT(ip->i_depth);

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 6df9f9a2 17-Jun-2016 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Lock holder cleanup

Make the code more readable by cleaning up the different ways of
initializing lock holders and checking for initialized lock holders:
mark lock holders as uninitialized by setting the holder's glock to NULL
(gfs2_holder_mark_uninitialized) instead of zeroing out the entire
object or using a separate flag. Recognize initialized holders by their
non-NULL glock (gfs2_holder_initialized). Don't zero out holder objects
which are immeditiately initialized via gfs2_holder_init or
gfs2_glock_nq_init.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# ec5ec66b 13-Jun-2016 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Get rid of gfs2_ilookup

Now that gfs2_lookup_by_inum only takes the inode glock for new inodes
(and not for cached inodes anymore), there no longer is a need to
optimize the cached-inode case in gfs2_get_dentry or delete_work_func,
and gfs2_ilookup can be removed.

In addition, gfs2_get_dentry wasn't checking the GFS2_DIF_SYSTEM flag in
i_diskflags in the gfs2_ilookup case (see gfs2_lookup_by_inum); this
inconsistency goes away as well.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 3ce37b2c 13-Jun-2016 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Fix gfs2_lookup_by_inum lock inversion

The current gfs2_lookup_by_inum takes the glock of a presumed inode
identified by block number, verifies that the block is indeed an inode,
and then instantiates and reads the new inode via gfs2_inode_lookup.

However, instantiating a new inode may block on freeing a previous
instance of that inode (__wait_on_freeing_inode), and freeing an inode
requires to take the glock already held, leading to lock inversion and
deadlock.

Fix this by first instantiating the new inode, then verifying that the
block is an inode (if required), and then reading in the new inode, all
in gfs2_inode_lookup.

If the block we are looking for is not an inode, we discard the new
inode via iget_failed, which marks inodes as bad and unhashes them.
Other tasks waiting on that inode will get back a bad inode back from
ilookup or iget_locked; in that case, retry the lookup.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# a527b38e 11-Apr-2016 Denys Vlasenko <dvlasenk@redhat.com>

GFS2: fs/gfs2/glock.c: Deinline do_error, save 1856 bytes

This function compiles to 522 bytes of machine code.

Error paths are not very time critical.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 8f6fd83c 02-Mar-2016 Bob Copeland <me@bobcopeland.com>

rhashtable: accept GFP flags in rhashtable_walk_init

In certain cases, the 802.11 mesh pathtable code wants to
iterate over all of the entries in the forwarding table from
the receive path, which is inside an RCU read-side critical
section. Enable walks inside atomic sections by allowing
GFP_ATOMIC allocations for the walker state.

Change all existing callsites to pass in GFP_KERNEL.

Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Bob Copeland <me@bobcopeland.com>
[also adjust gfs2/glock.c and rhashtable tests]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>


# 3e11e530 23-Mar-2016 Benjamin Marzinski <bmarzins@redhat.com>

GFS2: ignore unlock failures after withdraw

After gfs2 has withdrawn the filesystem, it may still have many locks not
in the unlocked state. If it is using lock_dlm, it will failed trying
the unlocks since it has already unmounted the lock manager. Instead, it
should set the SDF_SKIP_DLM_UNLOCK flag on withdraw, to signal that
it can skip the lock_manager on unlocks, and failback to lock_nolock
style unlocking.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# ff34245d 27-Mar-2015 Bob Peterson <rpeterso@redhat.com>

GFS2: Don't filter out I_FREEING inodes anymore

This patch basically reverts a very old patch from 2008,
7a9f53b3c1875bef22ad4588e818bc046ef183da, with the title
"Alternate gfs2_iget to avoid looking up inodes being freed".
The original patch was designed to avoid a deadlock caused by lock
ordering with try_rgrp_unlink. The patch forced the function to not
find inodes that were being removed by VFS. The problem is, that
made it impossible for nodes to delete their own unlinked dinodes
after a certain point in time, because the inode needed was not found
by this filtering process. There is no longer a need for the patch,
since function try_rgrp_unlink no longer locks the inode: All it does
is queue the glock onto the delete work_queue, so there should be no
more deadlock.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# a4923865 07-Dec-2015 Bob Peterson <rpeterso@redhat.com>

GFS2: Prevent delete work from occurring on glocks used for create

This patch tries to prevent delete work (queued via iopen callback)
from executing if the glock is currently being used to create
a new inode.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>


# 7508abc4 18-Dec-2015 Bob Peterson <rpeterso@redhat.com>

GFS2: Check if iopen is held when deleting inode

This patch fixes an error condition in which an inode is partially
created in gfs2_create_inode() but then some error is discovered,
which causes it to fail and call iput() before the iopen glock is
created or held. In that case, gfs2_delete_inode would try to
unlock an iopen glock that doesn't yet exist. Therefore, we test
its holder (which must exist) for the HIF_HOLDER bit before trying
to dq it.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>


# 2aba1b5b 19-May-2015 Bob Peterson <rpeterso@redhat.com>

GFS2: Reintroduce a timeout in function gfs2_gl_hash_clear

At some point in the past, we used to have a timeout when GFS2 was
unmounting, trying to clear out its glocks. If the timeout expires,
it would dump the remaining glocks to the kernel messages so that
developers can debug the problem. That timeout was eliminated,
probably by accident. This patch reintroduces it.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# b58bf407 24-Jul-2015 Bob Peterson <rpeterso@redhat.com>

GFS2: Reduce size of incore inode

This patch makes no functional changes. Its goal is to reduce the
size of the gfs2 inode in memory by rearranging structures and
changing the size of some variables within the structure.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 3dd1dd8c 12-Nov-2015 Andrew Price <anprice@redhat.com>

GFS2: Use rht_for_each_entry_rcu in glock_hash_walk

This lockdep splat was being triggered on umount:

[55715.973122] ===============================
[55715.980169] [ INFO: suspicious RCU usage. ]
[55715.981021] 4.3.0-11553-g8d3de01-dirty #15 Tainted: G W
[55715.982353] -------------------------------
[55715.983301] fs/gfs2/glock.c:1427 suspicious rcu_dereference_protected() usage!

The code it refers to is the rht_for_each_entry_safe usage in
glock_hash_walk. The condition that triggers the warning is
lockdep_rht_bucket_is_held(tbl, hash) which is checked in the
__rcu_dereference_protected macro.

The rhashtable buckets are not changed in glock_hash_walk so it's safe
to rely on the rcu protection. Replace the rht_for_each_entry_safe()
usage with rht_for_each_entry_rcu(), which doesn't care whether the
bucket lock is held if the rcu read lock is held.

Signed-off-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>


# f3dd1649 29-Oct-2015 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Remove gl_spin define

Commit e66cf161 replaced the gl_spin spinlock in struct gfs2_glock with a
gl_lockref lockref and defined gl_spin as gl_lockref.lock (the spinlock in
gl_lockref). Remove that define to make the references to gl_lockref.lock more
obvious.

Signed-off-by: Andreas Gruenbacher <andreas.gruenbacher@gmail.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 8f7e0a80 27-Aug-2015 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: A minor "sbstats" cleanup

It seems cleaner to avoid the temporary value here.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 4d207133 26-Aug-2015 Ben Hutchings <ben@decadent.org.uk>

gfs2: Make statistics unsigned, suitable for use with do_div()

None of these statistics can meaningfully be negative, and the
numerator for do_div() must have the type u64. The generic
implementation of do_div() used on some 32-bit architectures asserts
that, resulting in a compiler error in gfs2_rgrp_congested().

Fixes: 0166b197c2ed ("GFS2: Average in only non-zero round-trip times ...")

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Acked-by: Andreas Gruenbacher <agruenba@redhat.com>


# 88ffbf3e 16-Mar-2015 Bob Peterson <rpeterso@redhat.com>

GFS2: Use resizable hash table for glocks

This patch changes the glock hash table from a normal hash table to
a resizable hash table, which scales better. This also simplifies
a lot of code.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>


# 15562c43 16-Mar-2015 Bob Peterson <rpeterso@redhat.com>

GFS2: Move glock superblock pointer to field gl_name

What uniquely identifies a glock in the glock hash table is not
gl_name, but gl_name and its superblock pointer. This patch makes
the gl_name field correspond to a unique glock identifier. That will
allow us to simplify hashing with a future patch, since the hash
algorithm can then take the gl_name and hash its components in one
operation.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>


# 81648d04 27-Aug-2015 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Simplify the seq file code for "sbstats"

Don't use struct gfs2_glock_iter as the helper data structure for iterating
through "sbstats"; we are not iterating through glocks here.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# e7ccaf5f 12-Jun-2015 Bob Peterson <rpeterso@redhat.com>

GFS2: Don't add all glocks to the lru

The glocks used for resource groups often come and go hundreds of
thousands of times per second. Adding them to the lru list just
adds unnecessary contention for the lru_lock spin_lock, especially
considering we're almost certainly going to re-use the glock and
take it back off the lru microseconds later. We never want the
glock shrinker to cull them anyway. This patch adds a new bit in
the glops that determines which glock types get put onto the lru
list and which ones don't.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>


# 7b4ddfa7 24-Mar-2015 Chengyu Song <csong84@gatech.edu>

gfs2: incorrect check for debugfs returns

debugfs_create_dir and debugfs_create_file may return -ENODEV when debugfs
is not configured, so the return value should be checked against ERROR_VALUE
as well, otherwise the later dereference of the dentry pointer would crash
the kernel.

Signed-off-by: Chengyu Song <csong84@gatech.edu>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>


# b83ae6d4 14-Jan-2015 Christoph Hellwig <hch@lst.de>

fs: remove mapping->backing_dev_info

Now that we never use the backing_dev_info pointer in struct address_space
we can simply remove it and save 4 to 8 bytes in every inode.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Reviewed-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 8f6cb409 05-Jan-2015 Bob Peterson <rpeterso@redhat.com>

GFS2: Eliminate __gfs2_glock_remove_from_lru

Since the only caller of function __gfs2_glock_remove_from_lru locks the
same spin_lock as gfs2_glock_remove_from_lru, the functions can be combined.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 30badc95 18-Nov-2014 Markus Elfring <elfring@users.sourceforge.net>

GFS2: Deletion of unnecessary checks before two function calls

The functions iput() and put_pid() test whether their argument is NULL
and then return immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# d29c0afe 03-Oct-2014 Fabian Frederick <fabf@skynet.be>

GFS2: use _RET_IP_ instead of (unsigned long)__builtin_return_address(0)

use macro definition

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 5bef3e7c 26-Jun-2014 Bob Peterson <rpeterso@redhat.com>

GFS2: Allow flocks to use normal glock dq rather than dq_wait

This patch allows flock glocks to use a non-blocking dequeue rather
than dq_wait. It also reverts the previous patch I had posted regarding
dq_wait. The reverted patch isn't necessarily a bad idea, but I decided
this might avoid unforeseen side effects, and was therefore safer.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# fe0bbd29 23-Jun-2014 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Use GFP_NOFS when allocating glocks

Normally GFP_KERNEL is ok here, but there is now a rarely used code path
relating to deallocation of unlinked inodes (in certain corner cases)
which if hit at times of memory shortage can cause recursion while
trying to free memory.

One solution would be to try and move the gfs2_glock_get() call so
that it is no longer called while another glock is held, but that
doesn't look at all easy, so GFP_NOFS is the best solution for the
time being.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 94a09a39 23-Jun-2014 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Fix race in glock lru glock disposal

We must not leave items on the LRU list with GLF_LOCK set, since
they can be removed if the glock is brought back into use, which
may then potentially result in a hang, waiting for GLF_LOCK to
clear.

It doesn't happen very often, since it requires a glock that has
not been used for a long time to be brought back into use at the
same moment that the shrinker is part way through disposing of
glocks.

The fix is to set GLF_LOCK at a later time, when we already know
that the other locks can be obtained. Also, we now only release
the lru_lock in case a resched is needed, rather than on every
iteration.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 79272b35 20-Jun-2014 Bob Peterson <rpeterso@redhat.com>

GFS2: Only wait for demote when last holder is dequeued

Function gfs2_glock_dq_wait is supposed to dequeue a glock and then
wait for the lock to be demoted. The problem is, if this is a shared
lock, its demote will depend on the other holders, which means you
might end up waiting forever because the other process is blocked.
This problem is especially apparent when dealing with nested flocks.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 74316201 06-Jul-2014 NeilBrown <neilb@suse.de>

sched: Remove proliferation of wait_on_bit() action functions

The current "wait_on_bit" interface requires an 'action'
function to be provided which does the actual waiting.
There are over 20 such functions, many of them identical.
Most cases can be satisfied by one of just two functions, one
which uses io_schedule() and one which just uses schedule().

So:
Rename wait_on_bit and wait_on_bit_lock to
wait_on_bit_action and wait_on_bit_lock_action
to make it explicit that they need an action function.

Introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io
which are *not* given an action function but implicitly use
a standard one.
The decision to error-out if a signal is pending is now made
based on the 'mode' argument rather than being encoded in the action
function.

All instances of the old wait_on_bit and wait_on_bit_lock which
can use the new version have been changed accordingly and their
action functions have been discarded.
wait_on_bit{_lock} does not return any specific error code in the
event of a signal so the caller must check for non-zero and
interpolate their own error code as appropriate.

The wait_on_bit() call in __fscache_wait_on_invalidate() was
ambiguous as it specified TASK_UNINTERRUPTIBLE but used
fscache_wait_bit_interruptible as an action function.
David Howells confirms this should be uniformly
"uninterruptible"

The main remaining user of wait_on_bit{,_lock}_action is NFS
which needs to use a freezer-aware schedule() call.

A comment in fs/gfs2/glock.c notes that having multiple 'action'
functions is useful as they display differently in the 'wchan'
field of 'ps'. (and /proc/$PID/wchan).
As the new bit_wait{,_io} functions are tagged "__sched", they
will not show up at all, but something higher in the stack. So
the distinction will still be visible, only with different
function names (gds2_glock_wait versus gfs2_glock_dq_wait in the
gfs2/glock.c case).

Since first version of this patch (against 3.15) two new action
functions appeared, on in NFS and one in CIFS. CIFS also now
uses an action function that makes the same freezer aware
schedule call as NFS.

Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: David Howells <dhowells@redhat.com> (fscache, keys)
Acked-by: Steven Whitehouse <swhiteho@redhat.com> (gfs2)
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steve French <sfrench@samba.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140707051603.28027.72349.stgit@notabene.brown
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 4e857c58 17-Mar-2014 Peter Zijlstra <peterz@infradead.org>

arch: Mass conversion of smp_mb__*()

Mostly scripted conversion of the smp_mb__* barriers.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/n/tip-55dhyhocezdw1dg7u19hmh1u@git.kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-arch@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 01b172b7 12-Mar-2014 Bob Peterson <rpeterso@redhat.com>

GFS2: Ensure workqueue is scheduled after noexp request

This patch closes a small timing window whereby a request to hold the
transaction glock can get stuck. The problem is that after the DLM has
granted the lock, it can get into a state whereby it doesn't transition
the glock to a held state, due to not having requeued the glock state
machine to finish the transition.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# d77d1b58 06-Mar-2014 Joe Perches <joe@perches.com>

GFS2: Use pr_<level> more consistently

Add pr_fmt, remove embedded "GFS2: " prefixes.
This now consistently emits lower case "gfs2: " for each message.

Other miscellanea around these changes:

o Add missing newlines
o Coalesce formats
o Realign arguments

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# fc554ed3 05-Mar-2014 Fabian Frederick <fabf@skynet.be>

GFS2: global conversion to pr_foo()

-All printk(KERN_foo converted to pr_foo().
-Messages updated to fit in 80 columns.
-fs_macros converted as well.
-fs_printk removed.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# ac3beb6a 16-Jan-2014 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Don't use ENOBUFS when ENOMEM is the correct error code

Al Viro has tactfully pointed out that we are using the incorrect
error code in some cases. This patch fixes that, and also removes
the (unused) return value for glock dumping.

> * gfs2_iget() - ENOBUFS instead of ENOMEM. ENOBUFS is
> "No buffer space available (POSIX.1 (XSI STREAMS option))" and since
> we don't support STREAMS it's probably fair game, but... what the hell?

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>


# 0b3a2c99 02-Jan-2014 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

GFS2: Fix unsafe dereference in dump_holder()

GLOCK_BUG_ON() might call this function without RCU read lock. Make sure that
RCU read lock is held when using task_struct returned from pid_task().

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# e3c4269d 12-Nov-2013 Michal Nazarewicz <mina86@mina86.com>

GFS2: fix potential NULL pointer dereference

Commit [e66cf1610: GFS2: Use lockref for glocks] replaced call:
atomic_read(&gi->gl->gl_ref) == 0
with:
__lockref_is_dead(&gl->gl_lockref)
therefore changing how gl is accessed, from gi->gl to plan gl.
However, gl can be a NULL pointer, and so gi->gl needs to be
used instead (which is guaranteed not to be NULL because fo
the while loop checking that condition).

Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# e66cf161 15-Oct-2013 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Use lockref for glocks

Currently glocks have an atomic reference count and also a spinlock
which covers various internal fields, such as the state. This intent of
this patch is to replace the spinlock and the atomic reference count
with a lockref structure. This contains a spinlock which we can continue
to use as before, and a reference counter which is used in conjuction
with the spinlock to replace the previous atomic counter.

As a result of this there are some new rules for reference counting on
glocks. We need to distinguish between reference count changes under
gl_spin (which are now just increment or decrement of the new counter,
provided the count cannot hit zero) and those which are outside of
gl_spin, but which now take gl_spin internally.

The conversion is relatively straight forward. There is probably some
further clean up which can be done, but the priority at this stage is to
make the change in as simple a manner as possible.

A consequence of this change is that the reference count is being
decoupled from the lru list processing. This should allow future
adoption of the lru_list code with glocks in due course.

The reason for using the "dead" state and not just relying on 0 being
the "invalid state" is so that in due course 0 ref counts can be
allowable. The intent is to eventually be able to remove the ref count
changes which are currently hidden away in state_change().

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 1ab6c499 27-Aug-2013 Dave Chinner <dchinner@redhat.com>

fs: convert fs shrinkers to new scan/count API

Convert the filesystem shrinkers to use the new API, and standardise some
of the behaviours of the shrinkers at the same time. For example,
nr_to_scan means the number of objects to scan, not the number of objects
to free.

I refactored the CIFS idmap shrinker a little - it really needs to be
broken up into a shrinker per tree and keep an item count with the tree
root so that we don't need to walk the tree every time the shrinker needs
to count the number of objects in the tree (i.e. all the time under
memory pressure).

[glommer@openvz.org: fixes for ext4, ubifs, nfs, cifs and glock. Fixes are needed mainly due to new code merged in the tree]
[assorted fixes folded in]
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Glauber Costa <glommer@openvz.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 55f841ce 27-Aug-2013 Glauber Costa <glommer@openvz.org>

super: fix calculation of shrinkable objects for small numbers

The sysctl knob sysctl_vfs_cache_pressure is used to determine which
percentage of the shrinkable objects in our cache we should actively try
to shrink.

It works great in situations in which we have many objects (at least more
than 100), because the aproximation errors will be negligible. But if
this is not the case, specially when total_objects < 100, we may end up
concluding that we have no objects at all (total / 100 = 0, if total <
100).

This is certainly not the biggest killer in the world, but may matter in
very low kernel memory situations.

Signed-off-by: Glauber Costa <glommer@openvz.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 068213f7 25-Jul-2013 Bob Peterson <rpeterso@redhat.com>

GFS2: Remove unnecessary memory barrier

Function test_and_clear_bit implies a memory barrier, so subsequent
memory barriers are unnecessary.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 7286b31e 20-Aug-2013 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Take glock reference in examine_bucket()

We need to check the glock ref counter in a race free way
in order to ensure that the gfs2_glock_hold() call will
succeed. The easiest way to do that is to simply take the
reference count early in the common code of examine_bucket,
skipping any glocks with zero ref count.

That means that the examiner functions all need to put their
reference on the glock once they've performed their function.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Reported-by: David Teigland <teigland@redhat.com>
Tested-by: David Teigland <teigland@redhat.com>


# dfc4616d 15-Aug-2013 Dan Carpenter <dan.carpenter@oracle.com>

GFS2: alloc_workqueue() doesn't return an ERR_PTR

alloc_workqueue() returns a NULL on error, it doesn't return an ERR_PTR.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 7af584d3 12-Dec-2012 Joe Perches <joe@perches.com>

gfs2: Convert print_symbol to %pSR

Use the new vsprintf extension to avoid any possible
message interleaving.

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>


# 222cb538 24-Apr-2013 Bob Peterson <rpeterso@redhat.com>

GFS2: Flush work queue before clearing glock hash tables

There was a timing window when a GFS2 file system was unmounted
that caused GFS2 to call BUG() and panic the kernel. The call
to BUG() is meant to ensure that the glock reference count,
gl_ref, never gets down to zero and bounce back up again. What was
happening during umount is that function gfs2_put_super was dequeing
its glocks for well-known files. In particular, we saw it on the
journal glock, sd_jinode_gh. The dequeue caused delayed work to be
queued for the glock state machine, to transition the lock to an
"unlocked" state. While the work was still queued, gfs2_put_super
called gfs2_gl_hash_clear to clear out the glock hash tables.
If the timing was just so, the glock work function would drop the
reference count at the time when it was being checked for zero,
and that caused BUG() to be called. This patch calls
flush_workqueue before clearing the glock hash tables, thereby
ensuring that the delayed work is executed before the hash tables
are cleared, and therefore the reference count never goes to zero
until the glock is cleared.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 7bd8b2eb 10-Apr-2013 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Add origin indicator to glock demote tracing

This adds the origin indicator to the trace point for glock
demotion, so that it is possible to see where demote requests
have come from.

Note that requests generated from the demote_rq sysfs interface
will show as remote, since they are intended to replicate
exactly the effect of a demote reuqest from a remote node. It
is still possible to tell these apart by looking at the process
which initiated the demote request.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 81ffbf65 10-Apr-2013 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Add origin indicator to glock callbacks

This patch adds a bool indicating whether the demote
request was originated locally or remotely. This is then
used by the iopen ->go_callback() to make 100% sure that
it will only respond to remote callbacks.

Since ->evict_inode() uses GL_NOCACHE when it attempts to
get an exclusive lock on the iopen lock, this may result
in extra scheduling of the workqueue in case that the
exclusive promotion request failed. This patch prevents
that from happening.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 28fb3027 26-Feb-2013 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Remove gfs2_refresh_inode from inode creation path

The original method for creating inodes used in GFS2 was to fill
out a buffer, with all the information, and then to read that
buffer into the in-core inode, using gfs2_refresh_inode()

The problem with this approach is that all the inode's fields
need to be calculated ahead of time, and were stored in various
variables making the code rather complicated.

The new approach is simply to allocate the in-core inode earlier
and fill in as many fields as possible ahead of time. These can
then be used to initilise the on disk representation. The
code has been working towards the point where it is possible
to remove gfs2_refresh_inode() because all the fields are
correctly initialised ahead of time. We've now reached that
milestone, and have reversed the order of setting up the in
core and on disk inodes.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 4506a519 01-Feb-2013 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Split glock lru processing into two parts

The intent here is to split the processing of the glock lru
list into two parts, so that the selection of glocks and the
disposal are separate functions. The plan is then, that further
updates can then be made to these functions in the future
to improve the selection of glocks and also the efficiency of
glock disposal.

The new feature which this patch brings is sorting the
glocks to be disposed of into glock number (and thus also
disk block number) order. Not all glocks will need i/o in
order to dispose of them, but some will, and at least we'll
generate mostly disk block order i/o now.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 2a005855 13-Dec-2012 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Separate LRU scanning from shrinker

This breaks out the LRU scanning function from the shrinker in
preparation for adding other callers to the LRU scanner.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 252aa6f5 11-Dec-2012 Rafael Aquini <aquini@redhat.com>

mm: redefine address_space.assoc_mapping

Overhaul struct address_space.assoc_mapping renaming it to
address_space.private_data and its type is redefined to void*. By this
approach we consistently name the .private_* elements from struct
address_space as well as allow extended usage for address_space
association with other data structures through ->private_data.

Also, all users of old ->assoc_mapping element are converted to reflect
its new name and type change (->private_data).

Signed-off-by: Rafael Aquini <aquini@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 4e2f8849 14-Nov-2012 David Teigland <teigland@redhat.com>

GFS2: remove redundant lvb pointer

The lksb struct already contains a pointer to the lvb,
so another directly from the glock struct is not needed.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# dba2d70c 14-Nov-2012 David Teigland <teigland@redhat.com>

GFS2: only use lvb on glocks that need it

Save the effort of allocating, reading and writing
the lvb for most glocks that do not use it.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# fb6791d1 13-Nov-2012 David Teigland <teigland@redhat.com>

GFS2: skip dlm_unlock calls in unmount

When unmounting, gfs2 does a full dlm_unlock operation on every
cached lock. This can create a very large amount of work and can
take a long time to complete. However, the vast majority of these
dlm unlock operations are unnecessary because after all the unlocks
are done, gfs2 leaves the dlm lockspace, which automatically clears
the locks of the leaving node, without unlocking each one individually.
So, gfs2 can skip explicit dlm unlocks, and use dlm_release_lockspace to
remove the locks implicitly. The one exception is when the lock's lvb is
being used. In this case, dlm_unlock is called because it may update the
lvb of the resource.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 06dfc306 24-Oct-2012 Bob Peterson <rpeterso@redhat.com>

GFS2: Rename glops go_xmote_th to go_sync

[Editorial: This is a nit, but has been a minor irritation for a long time:]

This patch renames glops structure item for go_xmote_th to go_sync.
The functionality is unchanged; it's just for readability.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 8eae1ca0 15-Oct-2012 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Review bug traps in glops.c

Two of the bug traps here could really be warnings. The others are
converted from BUG() to GLOCK_BUG_ON() since we'll most likely
need to know the glock state in order to debug any issues which
arise. As a result of this, __dump_glock has to be renamed and
is no longer static.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# e5dc76b9 08-Aug-2012 Bob Peterson <rpeterso@redhat.com>

GFS2: Eliminate redundant calls to may_grant

Function add_to_queue was checking may_grant for the passed-in
holder for every iteration of its gh2 loop. Now it only checks it
once at the beginning to see if a try lock is futile.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 81e1d450 08-Aug-2012 Bob Peterson <rpeterso@redhat.com>

GFS2: Combine functions gfs2_glock_dq_wait and wait_on_demote

Function gfs2_glock_dq_wait called two-line function wait_on_demote,
so they were combined.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 07a79049 08-Aug-2012 Bob Peterson <rpeterso@redhat.com>

GFS2: Combine functions gfs2_glock_wait and wait_on_holder

Function gfs2_glock_wait only called function wait_on_holder and
returned its return code, so they were combined for readability.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 4abb6ad9 08-Aug-2012 Bob Peterson <rpeterso@redhat.com>

GFS2: inline __gfs2_glock_schedule_for_reclaim

Since function gfs2_glock_schedule_for_reclaim is only two
significant lines, we can eliminate it, simplifying the code
and making it more readable.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 0fe2f1e9 11-Jun-2012 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Size seq_file buffer more carefully

This places a limit on the buffer size for archs with larger
PAGE_SIZE.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>


# 1bb49303 11-Jun-2012 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Use seq_vprintf for glocks debugfs file

Make use of the newly added seq_vprintf() function.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>


# 90306c41 29-May-2012 Benjamin Marzinski <bmarzins@redhat.com>

GFS2: Use lvbs for storing rgrp information with mount option

Instead of reading in the resource groups when gfs2 is checking
for free space to allocate from, gfs2 can store the necessary infromation
in the resource group's lvb. Also, instead of searching for unlinked
inodes in every resource group that's checked for free space, gfs2 can
store the number of unlinked but inodes in the lvb, and only check for
unlinked inodes if it will find some.

The first time a resource group is locked, the lvb must initialized.
Since this involves counting the unlinked inodes in the resource group,
this takes a little extra time. But after that, if the resource group
is locked with GL_SKIP, the buffer head won't be read in unless it's
actually needed.

Enabling the resource groups lvbs is done via the rgrplvb mount option. If
this option isn't set, the lvbs will still be set and updated, but they won't
be verfied or used by the filesystem. To safely turn on this option, all of
the nodes mounting the filesystem must be running code with this patch, and
the filesystem must have been completely unmounted since they were updated.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# ba1ddcb6 08-Jun-2012 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Cache last hash bucket for glock seq_files

For the glocks and glstats seq_files, which are exposed via debugfs
we should cache the most recent hash bucket, along with the offset
into that bucket. This allows us to restart from that point, rather
than having to begin at the beginning each time.

This is an idea from Eric Dumazet, however I've slightly extended it
so that if the position from which we are due to start is at any
point beyond the last cached point, we start from the last cached
point, plus whatever is the appropriate offset. I don't really expect
people to be lseeking around these files, but if they did so with only
positive offsets, then we'd still get some of the benefit of using a
cached offset.

With my simple test of around 200k entries in the file, I'm seeing
an approx 10x speed up.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# df5d2f55 07-Jun-2012 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Increase buffer size for glocks and glstats debugfs files

As per Al Viro's suggestion, this increases the buffer size used
for these two files. This provides a speed up of slightly less than
8x (i.e. proportional to the buffer size) for cases when we have
large numbers of glocks.

Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# a245769f 20-Jan-2012 Steven Whitehouse <swhiteho@redhat.com>

GFS2: glock statistics gathering

The stats are divided into two sets: those relating to the
super block and those relating to an individual glock. The
super block stats are done on a per cpu basis in order to
try and reduce the overhead of gathering them. They are also
further divided by glock type.

In the case of both the super block and glock statistics,
the same information is gathered in each case. The super
block statistics are used to provide default values for
most of the glock statistics, so that newly created glocks
should have, as far as possible, a sensible starting point.

The statistics are divided into three pairs of mean and
variance, plus two counters. The mean/variance pairs are
smoothed exponential estimates and the algorithm used is
one which will be very familiar to those used to calculation
of round trip times in network code.

The three pairs of mean/variance measure the following
things:

1. DLM lock time (non-blocking requests)
2. DLM lock time (blocking requests)
3. Inter-request time (again to the DLM)

A non-blocking request is one which will complete right
away, whatever the state of the DLM lock in question. That
currently means any requests when (a) the current state of
the lock is exclusive (b) the requested state is either null
or unlocked or (c) the "try lock" flag is set. A blocking
request covers all the other lock requests.

There are two counters. The first is there primarily to show
how many lock requests have been made, and thus how much data
has gone into the mean/variance calculations. The other counter
is counting queueing of holders at the top layer of the glock
code. Hopefully that number will be a lot larger than the number
of dlm lock requests issued.

So why gather these statistics? There are several reasons
we'd like to get a better idea of these timings:

1. To be able to better set the glock "min hold time"
2. To spot performance issues more easily
3. To improve the algorithm for selecting resource groups for
allocation (to base it on lock wait time, rather than blindly
using a "try lock")
Due to the smoothing action of the updates, a step change in
some input quantity being sampled will only fully be taken
into account after 8 samples (or 4 for the variance) and this
needs to be carefully considered when interpreting the
results.

Knowing both the time it takes a lock request to complete and
the average time between lock requests for a glock means we
can compute the total percentage of the time for which the
node is able to use a glock vs. time that the rest of the
cluster has its share. That will be very useful when setting
the lock min hold time.

The other point to remember is that all times are in
nanoseconds. Great care has been taken to ensure that we
measure exactly the quantities that we want, as accurately
as possible. There are always inaccuracies in any
measuring system, but I hope this is as accurate as we
can reasonably make it.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 4043b886 16-Jan-2012 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Fix race between lru_list and glock ref count

This patch fixes a narrow race window between the glock ref count
hitting zero and glocks being removed from the lru_list.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# e0c2a9aa 09-Jan-2012 David Teigland <teigland@redhat.com>

GFS2: dlm based recovery coordination

This new method of managing recovery is an alternative to
the previous approach of using the userland gfs_controld.

- use dlm slot numbers to assign journal id's
- use dlm recovery callbacks to initiate journal recovery
- use a dlm lock to determine the first node to mount fs
- use a dlm lock to track journals that need recovery

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 7cf8dcd3 15-Jun-2011 Bob Peterson <rpeterso@redhat.com>

GFS2: Automatically adjust glock min hold time

This patch is a performance improvement for GFS2 in a clustered
environment. It makes the glock hold time self-adjusting.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 1495f230 24-May-2011 Ying Han <yinghan@google.com>

vmscan: change shrinker API by passing shrink_control struct

Change each shrinker's API by consolidating the existing parameters into
shrink_control struct. This will simplify any further features added w/o
touching each file of shrinker.

[akpm@linux-foundation.org: fix build]
[akpm@linux-foundation.org: fix warning]
[kosaki.motohiro@jp.fujitsu.com: fix up new shrinker API]
[akpm@linux-foundation.org: fix xfs warning]
[akpm@linux-foundation.org: update gfs2]
Signed-off-by: Ying Han <yinghan@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# f90e5b5b 24-May-2011 Bob Peterson <rpeterso@redhat.com>

GFS2: Processes waiting on inode glock that no processes are holding

This patch fixes a race in the GFS2 glock state machine that may
result in lockups. The symptom is that all nodes but one will
hang, waiting for a particular glock. All the holder records
will have the "W" (Waiting) bit set. The other node will
typically have the glock stuck in Exclusive mode (EX) with no
holder records, but the dinode will be cached. In other words,
an entry with "I:" will appear in the glock dump for that glock,
but nothing else.

The race has to do with the glock "Pending Demote" bit, which
can be set, then immediately reset, thus losing the fact that
another node needs the glock. The sequence of events is:

1. Something schedules the glock workqueue (e.g. glock request from fs)
2. The glock workqueue gets to the point between the test of the reply pending
bit and the spin lock:

if (test_and_clear_bit(GLF_REPLY_PENDING, &gl->gl_flags)) {
finish_xmote(gl, gl->gl_reply);
drop_ref = 1;
}
down_read(&gfs2_umount_flush_sem); <---- i.e. here
spin_lock(&gl->gl_spin);

3. In comes (a) the reply to our EX lock request setting GLF_REPLY_PENDING and
(b) the demote request which sets GLF_PENDING_DEMOTE

4. The following test is executed:

if (test_and_clear_bit(GLF_PENDING_DEMOTE, &gl->gl_flags) &&
gl->gl_state != LM_ST_UNLOCKED &&
gl->gl_demote_state != LM_ST_EXCLUSIVE) {

This resets the pending demote flag, and gl->gl_demote_state is not equal to
exclusive, however because the reply from the dlm arrived after we checked for
the GLF_REPLY_PENDING flag, gl->gl_state is still equal to unlocked, so
although we reset the GLF_PENDING_DEMOTE flag, we didn't then set the
GLF_DEMOTE flag or reinstate the GLF_PENDING_DEMOTE_FLAG.

The patch closes the timing window by only transitioning the
"Pending demote" bit to the "demote" flag once we know the
other conditions (not unlocked and not exclusive) are met.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 588da3b3 04-May-2011 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Don't use a try lock when promoting to a higher mode

Previously we marked all locks being promoted to a higher mode
with the try flag to avoid any potential deadlocks issues. The
DLM is able to detect these and report them in way that GFS2 can
deal with them correctly. So we can just request the required mode
and wait for a response without needing to perform this check.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 1879fd6a 25-Apr-2011 Christoph Hellwig <hch@infradead.org>

add hlist_bl_lock/unlock helpers

Now that the whole dcache_hash_bucket crap is gone, go all the way and
also remove the weird locking layering violations for locking the hash
buckets. Add hlist_bl_lock/unlock helpers to move the locking into the
list abstraction instead of requiring each caller to open code it.
After all allowing for the bit locks is the whole point of these helpers
over the plain hlist variant.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 4667a0ec 18-Apr-2011 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Make writeback more responsive to system conditions

This patch adds writeback_control to writing back the AIL
list. This means that we can then take advantage of the
information we get in ->write_inode() in order to set off
some pre-emptive writeback.

In addition, the AIL code is cleaned up a bit to make it
a bit simpler to understand.

There is still more which can usefully be done in this area,
but this is a good start at least.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# f42ab085 14-Apr-2011 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Optimise glock lru and end of life inodes

The GLF_LRU flag introduced in the previous patch can be
used to check if a glock is on the lru list when a new
holder is queued and if so remove it, without having first
to get the lru_lock.

The main purpose of this patch however is to optimise the
glocks left over when an inode at end of life is being
evicted. Previously such glocks were left with the GLF_LFLUSH
flag set, so that when reclaimed, each one required a log flush.
This patch resets the GLF_LFLUSH flag when there is nothing
left to flush thus preventing later log flushes as glocks are
reused or demoted.

In order to do this, we need to keep track of the number of
revokes which are outstanding, and also to clear the GLF_LFLUSH
bit after a log commit when only revokes have been processed.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 627c10b7 14-Apr-2011 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Improve tracing support (adds two flags)

This adds support for two new flags. One keeps track of whether
the glock is on the LRU list or not. The other isn't really a
flag as such, but an indication of whether the glock has an
attached object or not. This indication is reported without
any locking, which is ok since we do not dereference the object
pointer but merely report whether it is NULL or not.

Also, this fixes one place where a tracepoint was missing, which
was at the point we remove deallocated blocks from the journal.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 29687a2a 30-Mar-2011 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Alter point of entry to glock lru list for glocks with an address_space

Rather than allowing the glocks to be scheduled for possible
reclaim as soon as they have exited the journal, this patch
delays their entry to the list until the glocks in question
are no longer in use.

This means that we will rely on the vm for writeback of all
dirty data and metadata from now on. When glocks are added
to the lru list they should be freeable much faster since all
the I/O required to free them should have already been completed.

This should lead to much better I/O patterns under low memory
conditions.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 25985edc 30-Mar-2011 Lucas De Marchi <lucas.demarchi@profusion.mobi>

Fix common misspellings

Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>


# 7e32d026 15-Mar-2011 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Don't use _raw version of RCU dereference

As per RCU glock patch review comments, don't use the _raw
version of this function here.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>


# fa1bbdea 10-Mar-2011 Bob Peterson <rpeterso@redhat.com>

GFS2: Optimize glock multiple-dequeue code

This is a small patch that optimizes multiple glock dequeue
operations. It changes the unlock order to be more efficient
and makes it easier for lock debugging tools to unravel. It
also eliminates the need for the temp variable x, although
that would likely be optimized out.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# fc0e38da 09-Mar-2011 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Fix glock deallocation race

This patch fixes a race in deallocating glocks which was introduced
in the RCU glock patch. We need to ensure that the glock count is
kept correct even in the case that there is a race to add a new
glock into the hash table. Also, to avoid having to wait for an
RCU grace period, the glock counter can be decremented before
call_rcu() is called.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 58a69cb4 16-Feb-2011 Tejun Heo <tj@kernel.org>

workqueue, freezer: unify spelling of 'freeze' + 'able' to 'freezable'

There are two spellings in use for 'freeze' + 'able' - 'freezable' and
'freezeable'. The former is the more prominent one. The latter is
mostly used by workqueue and in a few other odd places. Unify the
spelling to 'freezable'.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: Dmitry Torokhov <dtor@mail.ru>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Alex Dubov <oakad@yahoo.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Steven Whitehouse <swhiteho@redhat.com>


# edae38a6 31-Jan-2011 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Fix glock queue trace point

Somehow this tracepoint landed up in the wrong place. This moves it
to where it should be.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# bc015cb8 19-Jan-2011 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Use RCU for glock hash table

This has a number of advantages:

- Reduces contention on the hash table lock
- Makes the code smaller and simpler
- Should speed up glock dumps when under load
- Removes ref count changing in examine_bucket
- No longer need hash chain lock in glock_put() in common case

There are some further changes which this enables and which
we may do in the future. One is to look at using SLAB_RCU,
and another is to look at using a per-cpu counter for the
per-sb glock counter, since that is touched twice in the
lifetime of each glock (but only used at umount time).

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>


# 47a25380 30-Nov-2010 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Merge glock state fields into a bitfield

We can only merge the fields into a bitfield if the locking
rules for them are the same. In this case gl_spin covers all
of the fields (write side) but a couple of them are used
with GLF_LOCK as the read side lock, which should be ok
since we know that the field in question won't be changing
at the time.

The gl_req setting has to be done earlier (in glock.c) in order
to place it under gl_spin. The gl_reply setting also has to be
brought under gl_spin in order to comply with the new rules.

This saves 4*sizeof(unsigned int) per glock.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Bob Peterson <rpeterso@redhat.com>


# 921169ca 28-Nov-2010 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Clean up of gdlm_lock function

The DLM never returns -EAGAIN in response to dlm_lock(), and even
if it did, the test in gdlm_lock() was wrong anyway. Once that
test is removed, it is possible to greatly simplify this code
by simply using a "normal" error return code (0 for success).

We then no longer need the LM_OUT_ASYNC return code which can
be removed.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 5e69069c 09-Nov-2010 Joe Perches <joe@perches.com>

GFS2: fs/gfs2/glock.c: Use printf extension %pV

Using %pV reduces the number of printk calls and
eliminates any possible message interleaving from
other printk calls.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# cc18152e 05-Nov-2010 Joe Perches <joe@perches.com>

GFS2: fs/gfs2/glock.c: Convert sprintf_symbol to %pS

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# d2115778 03-Nov-2010 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Change two WQ_RESCUERs into WQ_MEM_RECLAIM

The WQ_RESCUER flag should only be used internally to the
workqueue implementation.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>


# 044b9414 03-Nov-2010 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Fix inode deallocation race

This area of the code has always been a bit delicate due to the
subtleties of lock ordering. The problem is that for "normal"
alloc/dealloc, we always grab the inode locks first and the rgrp lock
later.

In order to ensure no races in looking up the unlinked, but still
allocated inodes, we need to hold the rgrp lock when we do the lookup,
which means that we can't take the inode glock.

The solution is to borrow the technique already used by NFS to solve
what is essentially the same problem (given an inode number, look up
the inode carefully, checking that it really is in the expected
state).

We cannot do that directly from the allocation code (lock ordering
again) so we give the job to the pre-existing delete workqueue and
carry on with the allocation as normal.

If we find there is no space, we do a journal flush (required anyway
if space from a deallocation is to be released) which should block
against the pending deallocations, so we should always get the space
back.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# c741c455 29-Sep-2010 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Fix spectator umount issue

The tests further down the recovery function relating to
unlocking the journal need to be updated to match the
intial test. Also, a test in the umount code which was
surplus to requirements has been removed. Umounting
spectator mounts now works correctly, as expected.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 9fa0ea9f 13-Sep-2010 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Use new workqueue scheme

The recovery workqueue can be freezable since
we want it to finish what it is doing if the system is to
be frozen (although why you'd want to freeze a cluster node
is beyond me since it will result in it being ejected from
the cluster). It does still make sense for single node
GFS2 filesystems though.

The glock workqueue will benefit from being able to run more
work items concurrently. A test running postmark shows
improved performance and multi-threaded workloads are likely
to benefit even more. It needs to be high priority because
the latency directly affects the latency of filesystem glock
operations.

The delete workqueue is similar to the recovery workqueue in
that it must not get blocked by memory allocations, and may
run for a long time.

Potentially other GFS2 threads might also be converted to
workqueues, but I'll leave that for a later patch.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>


# 7b5e3d5f 03-Sep-2010 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Don't enforce min hold time when two demotes occur in rapid succession

Due to the design of the VFS, it is quite usual for operations on GFS2
to consist of a lookup (requiring a shared lock) followed by an
operation requiring an exclusive lock. If a remote node has cached an
exclusive lock, then it will receive two demote events in rapid succession
firstly for a shared lock and then to unlocked. The existing min hold time
code was triggering in this case, even if the node was otherwise idle
since the state change time was being updated by the initial demote.

This patch introduces logic to skip the min hold timer in the case that
a "double demote" of this kind has occurred. The min hold timer will
still be used in all other cases.

A new glock flag is introduced which is used to keep track of whether
there have been any newly queued holders since the last glock state
change. The min hold time is only applied if the flag is set.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Tested-by: Abhijith Das <adas@redhat.com>


# 0809f6ec 02-Aug-2010 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Fix recovery stuck bug (try #2)

This is a clean up of the code which deals with LM_FLAG_NOEXP
which aims to remove any possible race conditions by using
gl_spin to cover the gap between testing for the LM_FLAG_NOEXP
and the GL_FROZEN flag.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 7cdee5db 29-Jul-2010 Steven Whitehouse <swhiteho@redhat.com>

Revert "GFS2: recovery stuck on transaction lock"

This reverts commit b7dc2df5725fe7355fd76000ead7e39728e1b8a9.

The initial patch didn't quite work since it doesn't cover all
the possible routes by which the GLF_FROZEN flag might be set.
A revised fix is coming up in the next patch.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# d5341a92 23-Jul-2010 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Make "try" lock not try quite so hard

This looks like a big change, but in reality its only a single line of actual
code change, the rest is just moving a function to before its new caller.
The "try" flag for glocks is a rather subtle and delicate setting since it
requires that the state machine tries just hard enough to ensure that it has
a good chance of getting the requested lock, but no so hard that the
request can land up blocked behind another.

The patch adds in an additional check which will fail any queued try
locks if there is another request blocking the try lock request which
is not granted and compatible, nor in progress already. The check is made
only after all pending locks which may be granted have been granted.

I've checked this with the reproducer for the reported flock bug which
this is intended to fix, and it now passes.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 7f8275d0 18-Jul-2010 Dave Chinner <dchinner@redhat.com>

mm: add context argument to shrinker callback

The current shrinker implementation requires the registered callback
to have global state to work from. This makes it difficult to shrink
caches that are not global (e.g. per-filesystem caches). Pass the shrinker
structure to the callback so that users can embed the shrinker structure
in the context the shrinker needs to operate on and get back to it in the
callback via container_of().

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# b7dc2df5 23-Jun-2010 Bob Peterson <rpeterso@redhat.com>

GFS2: recovery stuck on transaction lock

This patch fixes bugzilla bug #590878: GFS2: recovery stuck on
transaction lock. We set the frozen flag on the glock when we receive
a completion that cannot be delivered due to blocked locks. At that
point we check to see whether the first waiting holder has the noexp
flag set. If the noexp lock is queued later, then we need to unfreeze
the glock at that point in time, namely, in the glock work function.

This patch was originally written by Steve Whitehouse, but since
he's on holiday, I'm submitting it. It's been well tested with a
complex recovery test called revolver.

Signed-off-by: Steve Whitehouse <swhiteho@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 1a0eae88 14-Apr-2010 Bob Peterson <rpeterso@redhat.com>

GFS2: glock livelock

This patch fixes a couple gfs2 problems with the reclaiming of
unlinked dinodes. First, there were a couple of livelocks where
everything would come to a halt waiting for a glock that was
seemingly held by a process that no longer existed. In fact, the
process did exist, it just had the wrong pid number in the holder
information. Second, there was a lock ordering problem between
inode locking and glock locking. Third, glock/inode contention
could sometimes cause inodes to be improperly marked invalid by
iget_failed.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 4818972e 22-Feb-2010 Bob Peterson <rpeterso@redhat.com>

GFS2: print glock numbers in hex

This patch changes glock numbers from printing in decimal to hex.
Since DLM prints corresponding resource IDs in hex, it makes debugging
easier.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# c1184f8a 08-Jan-2010 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Remove loopy umount code

As a consequence of the previous patch, we can now remove the
loop which used to be required due to the circular dependency
between the inodes and glocks. Instead we can just invalidate
the inodes, and then clear up any glocks which are left.

Also we no longer need the rwsem since there is no longer any
danger of the inode invalidation calling back into the glock
code (and from there back into the inode code).

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 009d8518 07-Dec-2009 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Metadata address space clean up

Since the start of GFS2, an "extra" inode has been used to store
the metadata belonging to each inode. The only reason for using
this inode was to have an extra address space, the other fields
were unused. This means that the memory usage was rather inefficient.

The reason for keeping each inode's metadata in a separate address
space is that when glocks are requested on remote nodes, we need to
be able to efficiently locate the data and metadata which relating
to that glock (inode) in order to sync or sync and invalidate it
(depending on the remotely requested lock mode).

This patch adds a new type of glock, which has in addition to
its normal fields, has an address space. This applies to all
inode and rgrp glocks (but to no other glock types which remain
as before). As a result, we no longer need to have the second
inode.

This results in three major improvements:
1. A saving of approx 25% of memory used in caching inodes
2. A removal of the circular dependency between inodes and glocks
3. No confusion between "normal" and "metadata" inodes in super.c

Although the first of these is the more immediately apparent, the
second is just as important as it now enables a number of clean
ups at umount time. Those will be the subject of future patches.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 8f05228e 29-Jan-2010 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Extend umount wait coverage to full glock lifetime

Although all glocks are, by the time of the umount glock wait,
scheduled for demotion, some of them haven't made it far
enough through the process for the original set of waiting
code to wait for them.

This extends the ref count to the whole glock lifetime in order
to ensure that the waiting does catch all glocks. It does make
it a bit more invasive, but it seems the only sensible solution
at the moment.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 26bb7505 27-Nov-2009 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Fix glock refcount issues

This patch fixes some ref counting issues. Firstly by moving
the point at which we drop the ref count after a dlm lock
operation has completed we ensure that we never call
gfs2_glock_hold() on a lock with a zero ref count.

Secondly, by using atomic_dec_and_lock() in gfs2_glock_put()
we ensure that at no time will a glock with zero ref count
appear on the lru_list. That means that we can remove the
check for this in our shrinker (which was racy).

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 7e71c55e 22-Sep-2009 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Fix potential race in glock code

We need to be careful of the ordering between clearing the
GLF_LOCK bit and scheduling the workqueue.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# b94a170e 23-Jul-2009 Benjamin Marzinski <bmarzins@redhat.com>

GFS2: remove dcache entries for remote deleted inodes

When a file is deleted from a gfs2 filesystem on one node, a dcache
entry for it may still exist on other nodes in the cluster. If this
happens, gfs2 will be unable to free this file on disk. Because of this,
it's possible to have a gfs2 filesystem with no files on it and no free
space. With this patch, when a node receives a callback notifying it
that the file is being deleted on another node, it schedules a new
workqueue thread to remove the file's dcache entry.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 8ff22a6f 10-Jul-2009 Benjamin Marzinski <bmarzins@redhat.com>

GFS2: Don't put unlikely reclaim candidates on the reclaim list.

GFS2 was placing far too many glocks on the reclaim list that were not good
candidates for freeing up from cache. These locks would sit there and
repeatedly get scanned to see if they could be reclaimed, wasting a lot
of time when there was memory pressure. This fix does more checks on the
locks to see if they are actually likely to be removable from cache.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# a51b56ff 30-Jun-2009 Benjamin Marzinski <bmarzins@redhat.com>

GFS2: Fix panic in glock memory shrinker

It is possible for gfs2_shrink_glock_memory() to check a glock for
demotion
that's in the process of being freed by gfs2_glock_put(). In this case,
gfs2_shrink_glock_memory() will acquire a new reference to this glock,
and
then try to free the glock itself when it drops the refernce. To solve
this, gfs2_shrink_glock_memory() just needs to check if the glock is in
the process of being freed, and if so skip it without ever unlocking the
lru_lock.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Acked-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 2163b1e6 25-Jun-2009 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Shrink the shrinker

This patch removes some of the special cases that the shrinker
was trying to deal with. As a result we leave fewer items on
the list and none at all which cannot be demoted. This makes
the list scanning more efficient and solves some issues seen
with large numbers of inodes.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 63997775 12-Jun-2009 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Add tracepoints

This patch adds the ability to trace various aspects of the GFS2
filesystem. The trace points are divided into three groups,
glocks, logging and bmap. These points have been chosen because
they allow inspection of the major internal functions of GFS2
and they are also generic enough that they are unlikely to need
any major changes as the filesystem evolves.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# fe64d517 19-May-2009 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Umount recovery race fix

This patch fixes a race condition where we can receive recovery
requests part way through processing a umount. This was causing
problems since the recovery thread had already gone away.

Looking in more detail at the recovery code, it was really trying
to implement a slight variation on a work queue, and that happens to
align nicely with the recently introduced slow-work subsystem. As a
result I've updated the code to use slow-work, rather than its own home
grown variety of work queue.

When using the wait_on_bit() function, I noticed that the wait function
that was supplied as an argument was appearing in the WCHAN field, so
I've updated the function names in order to produce more meaningful
output.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 0c7a531a 30-Apr-2009 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Fix glock ref counting bug

Depending on the ordering of events as we go around the
glock shrinker loop, it is possible to drop the ref count
of a glock incorrectly. It doesn't happen very often. This
patch corrects the got_ref variable, fixing the problem.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# a228df63 07-Apr-2009 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Move umount flush rwsem

The rwsem, used only on umount, is in the wrong place in glock.c.
This patch moves it up a bit so that it does not get called under
a spinlock.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 64d576ba 12-Feb-2009 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Add a "demote a glock" interface to sysfs

This adds a sysfs file called demote_rq to GFS2's
per filesystem directory. Its possible to use this
file to demote arbitrary glocks in exactly the same
way as if a request had come in from a remote node.

This is intended for testing issues relating to caching
of data under glocks. Despite that, the interface is
generic enough to send requests to any type of glock,
but be careful as its not always safe to send an
arbitrary message to an arbitrary glock. For that reason
and to prevent DoS, this interface is restricted to root
only.

The messages look like this:

<type>:<glocknumber> <mode>

Example:

echo -n "2:13324 EX" >/sys/fs/gfs2/unity:myfs/demote_rq

Which means "please demote inode glock (type 2) number 13324 so that
I can get an EX (exclusive) lock". The lock modes are those which
would normally be sent by a remote node in its callback so if you
want to unlock a glock, you use EX, to demote to shared, use SH or PR
(depending on whether you like GFS2 or DLM lock modes better!).

If the glock doesn't exist, you'll get -ENOENT returned. If the
arguments don't make sense, you'll get -EINVAL returned.

The plan is that this interface will be used in combination with
the blktrace patch which I recently posted for comments although
it is, of course, still useful in its own right.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# d8348de0 05-Feb-2009 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Fix deadlock on journal flush

This patch fixes a deadlock when the journal is flushed and there
are dirty inodes other than the one which caused the journal flush.
Originally the journal flushing code was trying to obtain the
transaction glock while running the flush code for an inode glock.
We no longer require the transaction glock at this point in time
since we know that any attempt to get the transaction glock from
another node will result in a journal flush. So if we are flushing
the journal, we can be sure that the transaction lock is still
cached from when the transaction was started.

By inlining a version of gfs2_trans_begin() (minus the bit which
gets the transaction glock) we can avoid the deadlock problems
caused if there is a demote request queued up on the transaction
glock.

In addition I've also moved the umount rwsem so that it covers
the glock workqueue, since it all demotions are done by this
workqueue now. That fixes a bug on umount which I came across
while fixing the original problem.

Reported-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# ac2425e7 13-Jan-2009 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Remove unused field from glock

The time stamp field is unused in the glock now that we are
using a shrinker, so that we can remove it and save sizeof(unsigned long)
bytes in each glock.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# f057f6cd 12-Jan-2009 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Merge lock_dlm module into GFS2

This is the big patch that I've been working on for some time
now. There are many reasons for wanting to make this change
such as:
o Reducing overhead by eliminating duplicated fields between structures
o Simplifcation of the code (reduces the code size by a fair bit)
o The locking interface is now the DLM interface itself as proposed
some time ago.
o Fewer lookups of glocks when processing replies from the DLM
o Fewer memory allocations/deallocations for each glock
o Scope to do further optimisations in the future (but this patch is
more than big enough for now!)

Please note that (a) this patch relates to the lock_dlm module and
not the DLM itself, that is still a separate module; and (b) that
we retain the ability to build GFS2 as a standalone single node
filesystem with out requiring the DLM.

This patch needs a lot of testing, hence my keeping it I restarted
my -git tree after the last merge window. That way, this has the maximum
exposure before its merged. This is (modulo a few minor bug fixes) the
same patch that I've been posting on and off the the last three months
and its passed a number of different tests so far.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# eb8374e7 25-Dec-2008 Julia Lawall <julia@diku.dk>

GFS2: Use DEFINE_SPINLOCK

SPIN_LOCK_UNLOCKED is deprecated. The following makes the change suggested
in Documentation/spinlocks.txt

The semantic patch that makes this change is as follows:
(http://www.emn.fr/x-info/coccinelle/)

// <smpl>
@@
declarer name DEFINE_SPINLOCK;
identifier xxx_lock;
@@

- spinlock_t xxx_lock = SPIN_LOCK_UNLOCKED;
+ DEFINE_SPINLOCK(xxx_lock);
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# fefc03bf 19-Dec-2008 Steven Whitehouse <swhiteho@redhat.com>

Revert "GFS2: Fix use-after-free bug on umount"

This reverts commit 78802499912f1ba31ce83a94c55b5a980f250a43.

The original patch is causing problems in relation to order of
operations at umount in relation to jdata files. I need to fix
this a different way.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 3af165ac 27-Nov-2008 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Fix use-after-free bug on umount

There was a use-after-free with the GFS2 super block during
umount. This patch moves almost all of the umount code from
->put_super into ->kill_sb, the only bit that cannot be moved
being the glock hash clearing which has to remain as ->put_super
due to umount ordering requirements. As a result its now obvious
that the kfree is the final operation, whereas before it was
hidden in ->put_super.

Also gfs2_jindex_free is then only referenced from a single file
so thats moved and marked static too.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 2bfb6449 26-Nov-2008 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Move four functions from super.c

The functions which are being moved can all be marked
static in their new locations, since they only have
a single caller each. Their new locations are more
logical than before and some of the functions are
small enough that the compiler might well inline them.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 97cc1025 20-Nov-2008 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Kill two daemons with one patch

This patch removes the two daemons, gfs2_scand and gfs2_glockd
and replaces them with a shrinker which is called from the VM.

The net result is that GFS2 responds better when there is memory
pressure, since it shrinks the glock cache at the same rate
as the VFS shrinks the dcache and icache. There are no longer
any time based criteria for shrinking glocks, they are kept
until such time as the VM asks for more memory and then we
demote just as many glocks as required.

There are potential future changes to this code, including the
possibility of sorting the glocks which are to be written back
into inode number order, to get a better I/O ordering. It would
be very useful to have an elevator based workqueue implementation
for this, as that would automatically deal with the read I/O cases
at the same time.

This patch is my answer to Andrew Morton's remark, made during
the initial review of GFS2, asking why GFS2 needs so many kernel
threads, the answer being that it doesn't :-) This patch is a
net loss of about 200 lines of code.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 813e0c46 18-Nov-2008 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Fix "truncate in progress" hang

Following on from the recent clean up of gfs2_quotad, this patch moves
the processing of "truncate in progress" inodes from the glock workqueue
into gfs2_quotad. This fixes a hang due to the "truncate in progress"
processing requiring glocks in order to complete.

It might seem odd to use gfs2_quotad for this particular item, but
we have to use a pre-existing thread since creating a thread implies
a GFP_KERNEL memory allocation which is not allowed from the glock
workqueue context. Of the existing threads, gfs2_logd and gfs2_recoverd
may deadlock if used for this operation. gfs2_scand and gfs2_glockd are
both scheduled for removal at some (hopefully not too distant) future
point. That leaves only gfs2_quotad whose workload is generally fairly
light and is easily adapted for this extra task.

Also, as a result of this change, it opens the way for a future patch to
make the reading of the inode's information asynchronous with respect to
the glock workqueue, which is another improvement that has been on the list
for some time now.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 55ba474d 24-Oct-2008 Harvey Harrison <harvey.harrison@gmail.com>

GFS2: sparse annotation of gl->gl_spin

fs/gfs2/glock.c:308:5: warning: context problem in 'do_promote': '_spin_unlock' expected different context
fs/gfs2/glock.c:308:5: context '*gl+28': wanted >= 1, got 0
fs/gfs2/glock.c:529:2: warning: context problem in 'do_xmote': '_spin_unlock' expected different context
fs/gfs2/glock.c:529:2: context '*gl+28': wanted >= 1, got 0
fs/gfs2/glock.c:925:3: warning: context problem in 'add_to_queue': '_spin_unlock' expected different context
fs/gfs2/glock.c:925:3: context '*gl+28': wanted >= 1, got 0

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 719ee344 18-Sep-2008 Steven Whitehouse <swhiteho@redhat.com>

GFS2: high time to take some time over atime

Until now, we've used the same scheme as GFS1 for atime. This has failed
since atime is a per vfsmnt flag, not a per fs flag and as such the
"noatime" flag was not getting passed down to the filesystems. This
patch removes all the "special casing" around atime updates and we
simply use the VFS's atime code.

The net result is that GFS2 will now support all the same atime related
mount options of any other filesystem on a per-vfsmnt basis. We do lose
the "lazy atime" updates, but we gain "relatime". We could add lazy
atime to the VFS at a later date, if there is a requirement for that
variant still - I suspect relatime will be enough.

Also we lose about 100 lines of code after this patch has been applied,
and I have a suspicion that it will speed things up a bit, even when
atime is "on". So it seems like a nice clean up as well.

From a user perspective, everything stays the same except the loss of
the per-fs atime quantum tweekable (ought to be per-vfsmnt at the very
least, and to be honest I don't think anybody ever used it) and that a
number of options which were ignored before now work correctly.

Please let me know if you've got any comments. I'm pushing this out
early so that you can all see what my plans are.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# dff52574 02-Sep-2008 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Fix race relating to glock min-hold time

In the case that a request for a glock arrives right after the
grant reply has arrived, it sometimes means that the gl_tstamp
field hasn't been updated recently enough. The net result is that
the min-hold time for the glock is ignored. If this happens
often enough, it leads to poor performance.

This patch adds an additional test, so that if the reply pending
bit is set on a glock, then it will select the maximum length of
time for the min-hold time, rather than looking at gl_tstamp.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# c1e817d0 22-Jul-2008 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Fix debugfs glock file iterator

Due to an incorrect iterator, some glocks were being missed from the
glock dumps obtained via debugfs. This patch fixes the problem and
ensures that we don't miss any glocks in future.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 209806ab 07-Jul-2008 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Allow local DF locks when holding a cached EX glock

We already allow local SH locks while we hold a cached EX glock, so here
we allow DF locks as well. This works only because we rely on the VFS's
invalidation for locally cached data, and because if we hold an EX lock,
then we know that no other node can be caching data relating to this
file.

It dramatically speeds up initial writes to O_DIRECT files since we fall
back to buffered I/O for this and would otherwise bounce between DF and
EX modes on each and every write call. The lessons to be learned from
that are to ensure that (for the time being anyway) O_DIRECT files are
preallocated and that they are written to using reasonably large I/O
sizes. Even so this change fixes that corner case nicely

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 265d529c 07-Jul-2008 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Fix delayed demote race

There is a race in the delayed demote code where it does the wrong thing
if a demotion to UN has occurred for other reasons before the delay has
expired. This patch adds an assert to catch that condition as well as
fixing the root cause by adding an additional check for the UN state.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Bob Peterson <rpeterso@redhat.com>


# 1bdad606 03-Jun-2008 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Remove remote lock dropping code

There are several reasons why this is undesirable:

1. It never happens during normal operation anyway
2. If it does happen it causes performance to be very, very poor
3. It isn't likely to solve the original problem (memory shortage
on remote DLM node) it was supposed to solve
4. It uses a bunch of arbitrary constants which are unlikely to be
correct for any particular situation and for which the tuning seems
to be a black art.
5. In an N node cluster, only 1/N of the dropped locked will actually
contribute to solving the problem on average.

So all in all we are better off without it. This also makes merging
the lock_dlm module into GFS2 a bit easier.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 048bca22 23-May-2008 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] No lock_nolock

This patch merges the lock_nolock module into GFS2 itself. As well as removing
some of the overhead of the module, it also means that its now impossible to
build GFS2 without a lock module (which would be a pointless thing to do
anyway).

We also plan to merge lock_dlm into GFS2 in the future, but that is a more
tricky task, and will therefore be a separate patch.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: David Teigland <teigland@redhat.com>


# 6802e340 21-May-2008 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Clean up the glock core

This patch implements a number of cleanups to the core of the
GFS2 glock code. As a result a lot of code is removed. It looks
like a really big change, but actually a large part of this patch
is either removing or moving existing code.

There are some new bits too though, such as the new run_queue()
function which is considerably streamlined. Highlights of this
patch include:

o Fixes a cluster coherency bug during SH -> EX lock conversions
o Removes the "glmutex" code in favour of a single bit lock
o Removes the ->go_xmote_bh() for inodes since it was duplicating
->go_lock()
o We now only use the ->lm_lock() function for both locks and
unlocks (i.e. unlock is a lock with target mode LM_ST_UNLOCKED)
o The fast path is considerably shortly, giving performance gains
especially with lock_nolock
o The glock_workqueue is now used for all the callbacks from the DLM
which allows us to simplify the lock_dlm module (see following patch)
o The way is now open to make further changes such as eliminating the two
threads (gfs2_glockd and gfs2_scand) in favour of a more efficient
scheme.

This patch has undergone extensive testing with various test suites
so it should be pretty stable by now.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Bob Peterson <rpeterso@redhat.com>


# 58e9fee1 14-Mar-2008 Benjamin Marzinski <bmarzins@redhat.com>

[GFS2] Invalidate cache at correct point

GFS2 wasn't invalidating its cache before it called into the lock manager
with a request that could potentially drop a lock. This was leaving a
window where the lock could be actually be held by another node, but the
file's page cache would still appear valid, causing coherency problems.
This patch moves the cache invalidation to before the lock manager call
when dropping a lock. It also adds the option to the lock_dlm lock
manager to not use conversion mode deadlock avoidance, which, on a
conversion from shared to exclusive, could internally drop the lock, and
then reacquire in. GFS2 now asks lock_dlm to not do this. Instead, GFS2
manually drops the lock and reacquires it.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 840ca0ec 12-Feb-2008 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Fix bug where we called drop_bh incorrectly

As a result of an earlier patch, drop_bh was being called in cases
when it shouldn't have been. Since we never have a gh in the drop
case and we always have a gh in the promote case, we can use that
extra information to tell which case has been seen.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Bob Peterson <rpeterso@redhat.com>


# cf45b752 31-Jan-2008 Bob Peterson <rpeterso@redhat.com>

[GFS2] Remove rgrp and glock version numbers

This patch further reduces GFS2's memory requirements by
eliminating the 64-bit version number fields in lieu of
a couple bits.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# da755fdb 30-Jan-2008 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Remove lm.[ch] and distribute content

The functions in lm.c were just wrappers which were mostly
only used in one other file. By moving the functions to
the files where they are being used, they can be marked
static and also this will usually result in them being inlined
since they are often only used from one point in the code.

A couple of really trivial functions have been inlined by hand
into the function which called them as it makes the code clearer
to do that.

We also gain from one fewer function call in the glock lock and
unlock paths.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# ab0d7566 29-Jan-2008 Bob Peterson <rpeterso@redhat.com>

[GFS2] Eliminate gl_req_bh

This patch further reduces the memory needs of GFS2 by
eliminating the gl_req_bh variable from struct gfs2_glock.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 29d38cd1 28-Jan-2008 Bob Peterson <rpeterso@redhat.com>

[GFS2] Get rid of gl_waiters2

This patch reduces memory by replacing the int variable
gl_waiters2 by a single bit in the gl_flags.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 048786f1 28-Jan-2008 Adrian Bunk <bunk@kernel.org>

[GFS2] make gfs2_glock_hold() static

gfs2_glock_hold() can now become static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# ef8c441c 28-Jan-2008 Bob Peterson <rpeterso@redhat.com>

[GFS2] Only wake the reclaim daemon if we need to

This patch only wakes up the glock reclaim daemon if there is
actually something to be reclaimed.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# eccba068 07-Feb-2008 Pavel Emelyanov <xemul@openvz.org>

gfs2: make gfs2_glock.gl_owner_pid be a struct pid *

The gl_owner_pid field is used to get the lock owning task by its pid, so make
it in a proper manner, i.e. by using the struct pid pointer and pid_task()
function.

The pid_task() becomes exported for the gfs2 module.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# b1e058da 07-Feb-2008 Pavel Emelyanov <xemul@openvz.org>

gfs2: make gfs2_holder.gh_owner_pid be a struct pid *

The gl_owner_pid field is used to get the holder task by its pid and check
whether the current is a holder, so make it in a proper manner, i.e. via the
struct pid * manipulations.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 398bbe68 11-Dec-2007 Bob Peterson <rpeterso@redhat.com>

[GFS2] Reorganize function gfs2_glmutex_lock

This patch optimizes the function gfs2_glmutex_lock.
The basic theory is: Why bother initializing a holder, setting up
wait bits and then waiting on them, if you know the glock can be
yours. So the holder stuff is placed inside the if checking if the
glock is locked. This one needs careful scrutiny because changing
anything to do with locking should strike terror into one's heart.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 1a2781cf 16-Nov-2007 Fabio Massimo Di Nitto <fabbione@ubuntu.com>

[GFS2] Fix runtime issue with UP kernels

The issue is indeed UP vs SMP and it is totally random.

spin_is_locked() is a bad assertion because there is no correct answer on UP.
on UP spin_is_locked() has to return either one value or another, always.

This means that in my setup I am lucky enough to trigger the issue and your you
are lucky enough not to.

the patch in attachment removes the bogus calls to BUG_ON and according to David
(in CC and thanks for the long explanation on the problem) we can rely upon
things like lockdep to find problem that might be trying to catch.

Signed-off-by: Fabio M. Di Nitto <fabbione@ubuntu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 2bcd610d 08-Nov-2007 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Don't add glocks to the journal

The only reason for adding glocks to the journal was to keep track
of which locks required a log flush prior to release. We add a
flag to the glock to allow this check to be made in a simpler way.

This reduces the size of a glock (by 12 bytes on i386, 24 on x86_64)
and means that we can avoid extra work during the journal flush.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# e589665e 02-Nov-2007 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Remove flags no longer required

The HIF_MUTEX and HIF_PROMOTE flags were set on the glock holders
depending upon which of the two waiters lists they were going to
be queued upon. They were then tested when the holders were taken
off the lists to ensure that the right type of holder was being
dequeued.

Since we are already using separate lists, there doesn't seem a
lot of point having these flags as well, and since setting them
and testing them is in the fast path for locking and unlocking
glock, this patch removes them.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 3042a2cc 02-Nov-2007 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Reorder writeback for glock sync

Previously we were doing (write data, wait for data, write metadata, wait
for metadata). After this patch we so (write metadata, write data, wait for
data, wait for metadata) which should be more efficient.

Also I noticed that the drop_bh and xmote_bh functions were almost
identical. In fact the only difference was a single test, and that
test is such that in the drop_bh case, it would always evaluate to
the correct result. As such we can use the xmote_bh functions in
all the places where we were using the drop_bh function and remove
the drop_bh functions.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# c2932e03 01-Nov-2007 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Remove "reclaim limit"

This call to reclaim glocks is not needed, and in particular we don't want it
in the fast path for locking glocks. The limit was entirely arbitrary anyway
and we can't expect users to adjust things like this, the remaining code will
do the right thing on its own.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# cc7e79b1 04-Oct-2007 Wendy Cheng <wcheng@redhat.com>

[GFS2] Handle multiple glock demote requests

Fix a race condition where multiple glock demote requests are sent to
a node back-to-back. This patch does a check inside handle_callback()
to see whether a demote request is in progress. If true, it sets a flag
to make sure run_queue() will loop again to handle the new request,
instead of erronously setting gl_demote_state to a different state.

Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 49e61f2e 13-Sep-2007 Wendy Cheng <wcheng@redhat.com>

[GFS2] Move inode deletion out of blocking_cb

Move inode deletion code out of blocking_cb handle_callback route to
avoid racy conditions that end up blocking lock_dlm1 thread. Fix
bugzilla 286821.

Signed-off-by: Wendy Cheng <wcheng@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# b4c20166 13-Sep-2007 Abhijith Das <adas@redhat.com>

[GFS2] flocks from same process trip kernel BUG at fs/gfs2/glock.c:1118!

This patch adds a new flag to the gfs2_holder structure GL_FLOCK.
It is set on holders of glocks representing flocks. This flag is
checked in add_to_queue() and a process is permitted to queue more
than one holder onto a glock if it is set. This solves the issue
of a process not being able to do multiple flocks on the same file.
Through a single descriptor, a process can now promote and demote
flocks. Through multiple descriptors a process can now queue
multiple flocks on the same file. There's still the problem of
a process deadlocking itself (because gfs2 blocking locks are not
interruptible) by queueing incompatible deadlock.

Signed-off-by: Abhijith Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# c4f68a13 23-Aug-2007 Benjamin Marzinski <bmarzins@redhat.com>

[GFS2] delay glock demote for a minimum hold time

When a lot of IO, with some distributed mmap IO, is run on a GFS2 filesystem in
a cluster, it will deadlock. The reason is that do_no_page() will repeatedly
call gfs2_sharewrite_nopage(), because each node keeps giving up the glock
too early, and is forced to call unmap_mapping_range(). This bumps the
mapping->truncate_count sequence count, forcing do_no_page() to retry. This
patch institutes a minimum glock hold time a tenth a second. This insures
that even in heavy contention cases, the node has enough time to get some
useful work done before it gives up the glock.

A second issue is that when gfs2_glock_dq() is called from within a page fault
to demote a lock, and the associated page needs to be written out, it will
try to acqire a lock on it, but it has already been locked at a higher level.
This patch puts makes gfs2_glock_dq() use the work queue as well, to avoid this
issue. This is the same patch as Steve Whitehouse originally proposed to fix
this issue, execpt that gfs2_glock_dq() now grabs a reference to the glock
before it queues up the work on it.

Signed-off-by: Benjamin E. Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# a947e033 21-Aug-2007 Abhijith Das <adas@redhat.com>

[GFS2] Wendy's dump lockname in hex & fix glock dump

With this patch, gfs2 glockdump through the debugfs filesystem will only
dump glocks for the specified filesystem instead of all glocks. Also, to
aid debugging, the glock number is dumped in hex instead of decimal.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
Signed-off-by: Abhijith Das <adas@redhat.com>


# 8fbbfd21 01-Aug-2007 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Reduce number of gfs2_scand processes to one

We only need a single gfs2_scand process rather than the one
per filesystem which we had previously. As a result the parameter
determining the frequency of gfs2_scand runs becomes a module
parameter rather than a mount parameter as it was before.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 4ef29002 31-Jul-2007 Denis Cheng <crquan@gmail.com>

[GFS2] mark struct *_operations const

these struct *_operations are all method tables, thus should be const.

Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 7b08fc62 24-Jul-2007 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Fix an oops in glock dumping

This fixes an oops which was occurring during glock dumping due to the
seq file code not taking a reference to the glock. Also this fixes a
memory leak which occurred in certain cases, in turn preventing the
filesystem from unmounting.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# aa0481e5 21-Jul-2007 Jesper Juhl <jesper.juhl@gmail.com>

[GFS2] Clean up duplicate includes in fs/gfs2/

This patch cleans up duplicate includes in
fs/gfs2/

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 26caee5b 23-Jul-2007 Josef Whiter <jwhiter@redhat.com>

[GFS2] Fix calculation of demote state

If a glock is in the exclusive state and a request for demote to
deferred has been received, then further requests for demote to
shared are being ignored. This patch fixes that by ensuring that
we demote to unlocked in that case.

Signed-off-by: Josef Whiter <jwhiter@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 87124e58 23-Jul-2007 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Fix two races relating to glock callbacks

One of the races relates to referencing a variable while not holding
its protecting spinlock. The patch simply moves the test inside the
spin lock. The other races occurs when a demote to unlocked request
occurs during the time a demote to shared request is already running.
This of course only happens in the case that the lock was in the
exclusive mode to start with. The patch adds a check to see if another
demote request has occurred in the mean time and if it has, then it
performs a second demote.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# eaf5bd3c 19-Jun-2007 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Simplify multiple glock aquisition

There is a bug in the code which acquires multiple glocks where if the
initial out-of-order attempt fails part way though we can land up trying
to acquire the wrong number of glocks. This is part of the fix for red
hat bz #239737. The other part of the bz doesn't apply to upstream
kernels since it was fixed by:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=d3717bdf8f08a0e1039158c8bab2c24d20f492b6

Since the out-of-order code doesn't appear to add anything to the
performance of GFS2, this patch just removed it rather than trying to
fix it. It should be much easier to see whats going on here now. In
addition, we don't allocate any memory unless we are using a lot of
glocks (which is a relatively uncommon case).

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# d93cfa98 11-Jun-2007 Abhijith Das <adas@redhat.com>

[GFS2] Fix deallocation issues

There were two issues during deallocation of unlinked inodes. The
first was relating to the use of a "try" lock which in the case of
the inode lock wasn't trying hard enough to deallocate in all
circumstances (now changed to a normal glock) and in the case of
the iopen lock didn't wait for the demotion of the shared lock before
attempting to get the exclusive lock, and thereby sometimes (timing dependent)
not completing the deallocation when it should have done.

The second issue related to the lack of a way to invalidate dcache entries
on remote nodes (now fixed by this patch) which meant that unlinks were
taking a long time to return disk space to the fs. By adding some code to
invalidate the dcache entries across the cluster for unlinked inodes, that
is now fixed.

This patch was written jointly by Abhijith Das and Steven Whitehouse.

Signed-off-by: Abhijith Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# dbb7cae2 15-May-2007 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Clean up inode number handling

This patch cleans up the inode number handling code. The main difference
is that instead of looking up the inodes using a struct gfs2_inum_host
we now use just the no_addr member of this structure. The tests relating
to no_formal_ino can then be done by the calling code. This has
advantages in that we want to do different things in different code
paths if the no_formal_ino doesn't match. In the NFS patch we want to
return -ESTALE, but in the ->lookup() path, its a bug in the fs if the
no_formal_ino doesn't match and thus we can withdraw in this case.

In order to later fix bz #201012, we need to be able to look up an inode
without knowing no_formal_ino, as the only information that is known to
us is the on-disk location of the inode in question.

This patch will also help us to fix bz #236099 at a later date by
cleaning up a lot of the code in that area.

There are no user visible changes as a result of this patch and there
are no changes to the on-disk format either.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# cd81a4ba 13-May-2007 Robert Peterson <rpeterso@redhat.com>

[GFS2] Addendum patch 2 for gfs2_grow

This addendum patch 2 corrects three things:

1. It fixes a stupid mistake in the previous addendum that broke gfs2.
Ref: https://www.redhat.com/archives/cluster-devel/2007-May/msg00162.html
2. It fixes a problem that Dave Teigland pointed out regarding the
external declarations in ops_address.h being in the wrong place.
3. It recasts a couple more %llu printks to (unsigned long long)
as requested by Steve Whitehouse.

I would have loved to put this all in one revised patch, but there was
a rush to get some patches for RHEL5. Therefore, the previous patches
were applied to the git tree "as is" and therefore, I'm posting another
addendum. Sorry.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 37fde8ca 01-May-2007 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Uncomment sprintf_symbol calling code

Now that the patch from -mm has gone upstream, we can uncomment the code
in GFS2 which uses sprintf_symbol.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Robert Peterson <rpeterso@redhat.com>


# 5f882096 18-Apr-2007 Robert Peterson <rpeterso@redhat.com>

[GFS2] lockdump improvements

The patch below consists of the following changes (in code order):

1. I fixed a minor compiler warning regarding the printing of
a kernel symbol address.
2. I implemented a suggestion from Dave Teigland that moves
the debugfs information for gfs2 into a subdirectory so
we can easily expand our use of debugfs in the future.
The current code keeps the glock information in:
/debug/gfs2/<fs>
With the patch, the new code keeps the glock information in:
/debug/gfs2/<fs>/glock
That will allow us to create more debugfs files in the future.
3. This fixes a bug whereby a failed mount attempt causes the
debugfs file to not be deleted. Failed mount attempts should
always clean up after themselves, including deleting the
debugfs file and/or directory.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 7a0079d9 17-Apr-2007 Robert Peterson <rpeterso@redhat.com>

[GFS2] bz 236008: Kernel gpf doing cat /debugfs/gfs2/xxx (lock dump)

This is for Bugzilla Bug 236008: Kernel gpf doing cat /debugfs/gfs2/xxx
(lock dump) seen at the "gfs2 summit". This also fixes the bug that caused
garbage to be printed by the "initialized at" field. I apologize for the
kludge, but that code will all be ripped out anyway when the official
sprint_symbol function becomes available in the Linux kernel. I also
changed some formatting so that spaces are replaced by proper tabs.

Signed-off-by: Robert Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 04b933f2 23-Mar-2007 Robert Peterson <rpeterso@redhat.com>

[GFS2] Red Hat bz 228540: owner references

In Testing the previously posted and accepted patch for
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=228540
I uncovered some gfs2 badness. It turns out that the current
gfs2 code saves off a process pointer when glocks is taken
in both the glock and glock holder structures. Those
structures will persist in memory long after the process has
ended; pointers to poisoned memory.

This problem isn't caused by the 228540 fix; the new capability
introduced by the fix just uncovered the problem.

I wrote this patch that avoids saving process pointers
and instead saves off the process pid. Rather than
referencing the bad pointers, it now does process lookups.
There is special code that makes the output nicer for
printing holder information for processes that have ended.

This patch also adds a stub for the new "sprint_symbol"
function that exists in Andrew Morton's -mm patch set, but
won't go into the base kernel until 2.6.22, since it adds
functionality but doesn't fix a bug.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 420d2a10 18-Mar-2007 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Fix a bug on i386 due to evaluation order

Since gcc didn't evaluate the last two terms of the expression in
glock.c:1881 as a constant expression, it resulted in an error on
i386 due to the lack of a 64bit divide instruction. This adds some
brackets to fix the problem.

This was reported by Andrew Morton.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>


# 3b8249f6 16-Mar-2007 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Fix bz 224480 and cleanup glock demotion code

This patch prevents the printing of a warning message in cases where
the fs is functioning normally by handing off responsibility for
unlinked, but still open inodes, to another node for eventual deallocation.
Also, there is now an improved system for ensuring that such requests
to other nodes do not get lost. The callback on the iopen lock is
only ever called when i_nlink == 0 and when a node is unable to deallocate
it due to it still being in use on another node. When a node receives
the callback therefore, it knows that i_nlink must be zero, so we mark
it as such (in gfs2_drop_inode) in order that it will then attempt
deallocation of the inode itself.

As an additional benefit, queuing a demote request no longer requires
a memory allocation. This simplifies the code for dealing with gfs2_holders
as it removes one special case.

There are two new fields in struct gfs2_glock. gl_demote_state is the
state which the remote node has requested and gl_demote_time is the
time when the request came in. Both fields are only valid when the
GLF_DEMOTE flag is set in gl_flags.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 5c7342d8 07-Mar-2007 Josef Whiter <jwhiter@redhat.com>

[GFS2] fix bz 231369, gfs2 will oops if you specify an invalid mount option

If you specify an invalid mount option when trying to mount a gfs2 filesystem,
gfs2 will oops. The attached patch resolves this problem.

Signed-off-by: Josef Whiter <jwhiter@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 7c52b166 16-Mar-2007 Robert Peterson <rpeterso@redhat.com>

[GFS2] Add gfs2_tool lockdump support to gfs2 (bz 228540)

The attached patch resolves bz 228540. This adds the capability
for gfs2 to dump gfs2 locks through the debugfs file system.
This used to exist in gfs1 as "gfs_tool lockdump" but it's missing from
gfs2 because all the ioctls were stripped out. Please see the bugzilla
for more history about the fix. This patch is also attached to the bugzilla
record.

The patch is against Steve Whitehouse's latest nmw git tree kernel
(2.6.21-rc1) and has been tested on system trin-10.

Signed-off-by: Robert Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 95d97b7d 06-Mar-2007 Andrew Morton <akpm@linux-foundation.org>

[GFS2] build fix

fs/gfs2/glock.c:2198: error: 'THIS_MODULE' undeclared here (not in a function)

Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# 631c42e1 01-Mar-2007 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] go_drop_bh is never used, so remove it

The ->go_drop_bh function is never used, so this removes it and the single
caller,

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 61be084e 29-Jan-2007 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Put back semaphore to avoid umount problem

Dave Teigland fixed this bug a while back, but I managed to mistakenly
remove the semaphore during later development. It is required to avoid
the list of inodes changing during an invalidate_inodes call. I have
made it an rwsem since the read side will be taken frequently during
normal filesystem operation. The write site will only happen during
umount of the file system.

Also the bug only triggers when using the DLM lock manager and only then
under certain conditions as its timing related.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: David Teigland <teigland@redhat.com>


# d043e190 23-Jan-2007 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Fix typo in glock.c

This is a one letter typo fix in glock.c, spotted by Rob Kenna.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 90101c31 23-Jan-2007 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Compile fix for glock.c

This one liner got missed from the previous patch.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 12132933 22-Jan-2007 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Remove queue_empty() function

This function is not longer required since we do not do recursive
locking in the glock layer. As a result all its callers can be
replaceed with list_empty() calls.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# b5d32bea 21-Jan-2007 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Tidy up glops calls

This patch doesn't make any changes to the ordering of the various
operations related to glocking, but it does tidy up the calls to the
glops.c functions to make the structure more obvious.

The two functions: gfs2_glock_xmote_th() and gfs2_glock_drop_th() can be
made static within glock.c since they are called by every set of glock
operations. The xmote_th and drop_th glock operations are then made
conditional upon those two routines existing and called from the
previously mentioned functions in glock.c respectively.

Also it can be seen that the go_sync operation isn't needed since it can
easily be replaced by calls to xmote_bh and drop_bh respectively. This
results in no longer (confusingly) calling back into routines in glock.c
from glops.c and also reducing the glock operations by one member.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 1c0f4872 21-Jan-2007 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Remove local exclusive glock mode

Here is a patch for GFS2 to remove the local exclusive flag. In
the places it was used, mutex's are always held earlier in the
call path, so it appears redundant in the LM_ST_SHARED case.

Also, the GFS2 holders were setting local exclusive in any case where
the requested lock was LM_ST_EXCLUSIVE. So the other places in the glock
code where the flag was tested have been replaced with tests for the
lock state being LM_ST_EXCLUSIVE in order to ensure the logic is the
same as before (i.e. LM_ST_EXCLUSIVE is always locally exclusive as well
as globally exclusive).

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 6bd9c8c2 19-Jan-2007 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Remove unused go_callback operation

This is never used, so we might as well remove it.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# e5dab552 18-Jan-2007 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Remove the "greedy" function from glock.[ch]

The "greedy" code was an attempt to retain glocks for a minimum length
of time when they relate to mmap()ed files. The current implementation
of this feature is not, however, ideal in that it required allocating
memory in order to do this and its overly complicated.

It also misses the mark by ignoring the other I/O operations which are
just as likely to suffer from the same problem. So the plan is to remove
this now and then add the functionality back as part of the glock state
machine at a later date (and thus take into account all the possible
users of this feature)

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# fee852e3 17-Jan-2007 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Shrink gfs2_inode memory by half

Here is something I spotted (while looking for something entirely
different) the other day.

Rather than using a completion in each and every struct gfs2_holder,
this removes it in favour of hashed wait queues, thus saving a
considerable amount of memory both on the stack (where a number of
gfs2_holder structures are allocated) and in particular in the
gfs2_inode which has 8 gfs2_holder structures embedded within it.

As a result on x86_64 the gfs2_inode shrinks from 2488 bytes to
1912 bytes, a saving of 576 bytes per inode (no thats not a typo!).
In actual practice we get a much better result than that since
now that a gfs2_inode is under the 2048 byte barrier, we get two
per 4k slab page effectively halving the amount of memory required
to store gfs2_inodes.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 3699e3a4 17-Jan-2007 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Clean up/speed up readdir

This removes the extra filldir callback which gfs2 was using to
enclose an attempt at readahead for inodes during readdir. The
code was too complicated and also hurts performance badly in the
case that the getdents64/readdir call isn't being followed by
stat() and it wasn't even getting it right all the time when it
was.

As a result, on my test box an "ls" of a directory containing 250000
files fell from about 7mins (freshly mounted, so nothing cached) to
between about 15 to 25 seconds. When the directory content was cached,
the time taken fell from about 3mins to about 4 or 5 seconds.

Interestingly in the cached case, running "ls -l" once reduced the time
taken for subsequent runs of "ls" to about 6 secs even without this
patch. Now it turns out that there was a special case of glocks being
used for prefetching the metadata, but because of the timeouts for these
locks (set to 10 secs) the metadata was being timed out before it was
being used and this the prefetch code was constantly trying to prefetch
the same data over and over.

Calling "ls -l" meant that the inodes were brought into memory and once
the inodes are cached, the glocks are not disposed of until the inodes
are pushed out of the cache, thus extending the lifetime of the glocks,
and thus bringing down the time for subsequent runs of "ls"
considerably.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 0ac23069 28-Nov-2006 Randy Dunlap <randy.dunlap@oracle.com>

[GFS2] lock function parameter

Fix function parameter typing:
fs/gfs2/glock.c:100: warning: function declaration isn't a prototype

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# b004157a 23-Nov-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Fix journal flush problem

This fixes a bug which resulted in poor performance due to flushing
the journal too often. The code path in question was via the inode_go_sync()
function in glops.c. The solution is not to flush the journal immediately
when inodes are ejected from memory, but batch up the work for glockd to
deal with later on. This means that glocks may now live on beyond the end of
the lifetime of their inodes (but not very much longer in the normal case).

Also fixed in this patch is a bug (which was hidden by the bug mentioned above) in
calculation of the number of free journal blocks.

The gfs2_logd process has been altered to be more responsive to the journal
filling up. We now wake it up when the number of uncommitted journal blocks
has reached the threshold level rather than trying to flush directly at the
end of each transaction. This again means doing fewer, but larger, log
flushes in general.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 1a14d3a6 20-Nov-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Simplify glops functions

The go_sync callback took two flags, but one of them was set on every
call, so this patch removes once of the flags and makes the previously
conditional operations (on this flag), unconditional.

The go_inval callback took three flags, each of which was set on every
call to it. This patch removes the flags and makes the operations
unconditional, which makes the logic rather more obvious.

Two now unused flags are also removed from incore.h.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# ab923031 15-Nov-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Fix memory allocation in glock.c

Change from GFP_KERNEL to GFP_NOFS as this was causing a
slow down when trying to push inodes from cache.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# c594d886 08-Nov-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Remove unused GL_DUMP flag

There is no way to set the GL_DUMP flag, and in any case the
same thing can be done with systemtap if required for debugging,
so this removes it.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# b60623c2 31-Oct-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Shrink gfs2_inode (3) - di_mode

This removes the duplicate di_mode field in favour of using the
inode->i_mode field. This saves 4 bytes.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# c4028958 22-Nov-2006 David Howells <dhowells@redhat.com>

WorkStruct: make allyesconfig

Fix up for make allyesconfig.

Signed-Off-By: David Howells <dhowells@redhat.com>


# 907b9bce 25-Sep-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2/DLM] Fix trailing whitespace

As per Andrew Morton's request, removed trailing whitespace.

Cc: Andrew Morton <akpm@osdl.org>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 7d308590 18-Sep-2006 Fabio Massimo Di Nitto <fabbione@ubuntu.com>

[GFS2] Export lm_interface to kernel headers


lm_interface.h has a few out of the tree clients such as GFS1
and userland tools.

Right now, these clients keeps a copy of the file in their build tree
that can go out of sync.

Move lm_interface.h to include/linux, export it to userland and
clean up fs/gfs2 to use the new location.

Signed-off-by: Fabio M. Di Nitto <fabbione@ubuntu.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# a8336344 14-Sep-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Fix glock hash clearing

A one liner bug fix to prevent the return value being
wrong when more than one superblock is mounted.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 16feb9fe 13-Sep-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Use atomic_t rather than kref in glock.c

Use atomic_t as the ref count in glocks rather than a kref.
This is another step towards using RCU for the glock hash.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# b6397893 12-Sep-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Use hlist for glock hash chains

This results in smaller list heads, so that we can have more chains
in the same amount of memory (twice as many). I've multiplied the
size of the table by four though - this is because we are saving
memory by not having one lock per chain any more. So we land up
using about the same amount of memory for the hash table as we
did before I started these changes, the difference being that we
now have four times as many hash chains.

The reason that I say "about the same amount of memory" is that the
actual amount now depends upon the NR_CPUS and some of the config
variables, so that its not exact and in some cases we do use more
memory. Eventually we might want to scale the hash table size
according to the size of physical ram as measured on module load.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 24264434 11-Sep-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Rewrite of examine_bucket()

The existing implementation of this function in glock.c was not
very efficient as it relied upon keeping a cursor element upon the
hash chain in question and moving it along. This new version improves
upon this by using the current element as a cursor. This is possible
since we only look at the "next" element in the list after we've
taken the read_lock() subsequent to calling the examiner function.
Obviously we have to eventually drop the ref count that we are then
left with and we cannot do that while holding the read_lock, so we
do that next time we drop the lock. That means either just before
we examine another glock, or when the loop has terminated.

The new implementation has several advantages: it uses only a
read_lock() rather than a write_lock(), so it can run simnultaneously
with other code, it doesn't need a "plug" element, so that it removes
a test not only from this list iterator, but from all the other glock
list iterators too. So it makes things faster and smaller.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 94610610 09-Sep-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Remove unused function from glock.c

The callback for iopen locks is unused, so this removes
it.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# a5e08a9e 09-Sep-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Add consts to glock sorting function

Add back the consts which were casted away in the glock sorting
function. Also add early exit code.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 087efdd3 09-Sep-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Make glock hash locks proportional to NR_CPUS

Make the number of locks used for hash chains in glock.c
proportional to NR_CPUS. Also move constants for the number
of hash chains into glock.c from incore.h since they are
not used outside of glock.c.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 37b2fa6a 08-Sep-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Move rwlocks in glock.c into their own array

This splits the rwlocks guarding the hash chains of the glock hash
table into their own array. This will reduce memory usage in some
cases due to better alignment, although the real reason for doing it
is to allow the two tables to be different sizes in future (i.e.
the locks will be sized proportionally with the max number of CPUs
and the hash chains sized proportinally with the size of physical memory)

In order to allow this, the gl_bucket member of struct gfs2_glock has
now become gl_hash, so we record the hash rather than a pointer to the
bucket itself.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 9b47c11d 08-Sep-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Use void * instead of typedef for locking module interface

As requested by Jan Engelhardt, this removes the typedefs in the
locking module interface and replaces them with void *. Also
since we are changing the interface, I've added a few consts
as well.

Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Cc: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 1c089c32 07-Sep-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Remove one typedef

This removes one of the typedefs from the locking interface. It
is replaced by a forward declaration of the gfs2 superblock. The
other two are not so easy to solve since in their case, they
can refer to one of two possible structures.

Cc: David Teigland <teigland@redhat.com>
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 85d1da67 07-Sep-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Move glock hash table out of superblock

There are several reasons why we want to do this:
- Firstly its large and thus we'll scale better with multiple
GFS2 fs mounted at the same time
- Secondly its easier to scale its size as required (thats a plan
for later patches)
- Thirdly, we can use kzalloc rather than vmalloc when allocating
the superblock (its now only 4888 bytes)
- Fourth its all part of my plan to eventually be able to use RCU
with the glock hash.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# b8547856 07-Sep-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Add gfs2 superblock to glock hash function

This is another patch preparing for sharing of the glock hash
table between different gfs2 mounts.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# cd915493 03-Sep-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Change all types to uX style

This makes all fixed size types have consistent names.

Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# a91ea69f 03-Sep-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Align all labels against LH side

This makes everything consistent.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 50299965 04-Sep-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Tidy up locking code

As per Jan Engelhardt's second email, this removes some unused code,
and fixes up indenting in various places.

Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# e9fc2aa0 01-Sep-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Update copyright, tidy up incore.h

As per comments from Jan Engelhardt <jengelh@linux01.gwdg.de> this
updates the copyright message to say "version" in full rather than
"v.2". Also incore.h has been updated to remove forward structure
declarations which are not required.

The gfs2_quota_lvb structure has now had endianess annotations added
to it. Also quota.c has been updated so that we now store the
lvb data locally in endian independant format to avoid needing
a structure in host endianess too. As a result the endianess
conversions are done as required at various points and thus the
conversion routines in lvb.[ch] are no longer required. I've
moved the one remaining constant in lvb.h thats used into lm.h
and removed the unused lvb.[ch].

I have not changed the HIF_ constants. That is left to a later patch
which I hope will unify the gh_flags and gh_iflags fields of the
struct gfs2_holder.

Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 899be4d3 29-Aug-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Add superblock into key for glock lookups

This adds the superblock as a key for glock lookups. Since the glocks
are already stored in a per-superblock table, this has no effect at
the moment. Later on this will change though.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# d6a53727 30-Aug-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Use const on glock lookup key

Use const for the glock name which is being used as a lookup key
in the glock hash table.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# ec45d9f5 30-Aug-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Use slab properly with glocks

We can take advantage of the slab allocator to ensure that all the list
heads and the spinlock (plus one or two other fields) are initialised
by slab to speed up allocation of glocks.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 5e2b0613 30-Aug-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Remove unused code from glock layer

Remove the unused sync feature from glocks. This is currently done by
calling the required functions to sync pages/blocks directly so this
code isn't needed.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 8fb4b536 30-Aug-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Make glock operations const

For all the usual reasons of enforcing correctness and potentially
reducing code size, this patch makes the glock operations const.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 86384605 25-Aug-2006 Abhijith Das <adas@redhat.com>

[GFS2] Allow mounting of gfs2 and gfs2meta at the same time

This patch allows the simultaneous mounting of gfs2meta and gfs2
filesystems. A restriction however is that a gfs2meta fs may only be
mounted if its corresponding gfs2 filesystem is also mounted. Also, a
gfs2 filesystem cannot be unmounted before its gfs2meta filesystem.

Signed-off-by: Abhijith Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# a2242db0 24-Aug-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Speed up scanning of glocks

I noticed the gfs2_scand seemed to be taking a lot of CPU,
so in order to cut that down a bit, here is a patch. Firstly
the type of a glock is a constant during its lifetime, so that
its possible to check this without needing locking. I've moved
the (common) case of testing for an inode glock outside of
the glmutex lock.

Also there was a mutex left over from when the glock cache was
master of the inode cache. That isn't required any more so I've
removed that too.

There is probably scope for further speed ups in the future
in this area.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 88721877 10-Aug-2006 Russell Cattelan <cattelan@redhat.com>

[GFS2] Fix a couple of refcount leaks.

recovery.c add a brelse to deal with gfs2_replay_read_block being called
twice on the same block.

add a dput to drop the ref count on the root inode.
This was causing lingering glocks and thus causing
a mount failure to hang.

Fix a endian conversion macro that was was swizzling
16bits when it should have been swizzling 32.

Signed-off-by: Russell Cattelan <cattelan@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 5dd9feaf 28-Jul-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Fix bug in clear_inode

We should have been waiting for lock demotion to finish in
clear_inode.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# f45b7ddd 27-Jul-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Use a bio to read the superblock

This means that we don't need to create a special inode just to contain
a struct address_space in order to read a single disk block. Instead
we read the disk block directly. Its slightly faster, and uses slightly
less memory, but the real reason for doing this is that it removes a
special case from the glock code.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 29937ac6 06-Jul-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Fixes to scanning of glocks (again)

This really is the correct fix this time. We just ignore all
glocks associated with inodes until the inodes are pushed
from the inode cache. At that point the glocks are queued for
reclaim, so we don't need to do it here.

Also fix one or two other minor bugs.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 627add2d 05-Jul-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Correct logic in glock scanner

Under certain circumstances the glock scanning logic would
demote locks which ought not to have been selected for
demotion.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# af18ddb8 24-Jun-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Eliminate one instance of __GFP_NOFAIL

This removes one instance of GFP_NOFAIL from the glock callback
function. It also fixes a bug where a , was used at a line end
rather than ; causing unintended results.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# feaa7bba 14-Jun-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Fix unlinked file handling

This patch fixes the way we have been dealing with unlinked,
but still open files. It removes all limits (other than memory
for inodes, as per every other filesystem) on numbers of these
which we can support on GFS2. It also means that (like other
fs) its the responsibility of the last process to close the file
to deallocate the storage, rather than the person who did the
unlinking. Note that with GFS2, those two events might take place
on different nodes.

Also there are a number of other changes:

o We use the Linux inode subsystem as it was intended to be
used, wrt allocating GFS2 inodes
o The Linux inode cache is now the point which we use for
local enforcement of only holding one copy of the inode in
core at once (previous to this we used the glock layer).
o We no longer use the unlinked "special" file. We just ignore it
completely. This makes unlinking more efficient.
o We now use the 4th block allocation state. The previously unused
state is used to track unlinked but still open inodes.
o gfs2_inoded is no longer needed
o Several fields are now no longer needed (and removed) from the in
core struct gfs2_inode
o Several fields are no longer needed (and removed) from the in core
superblock

There are a number of future possible optimisations and clean ups
which have been made possible by this patch.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 382066da 24-May-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Casts for printing 64bit numbers

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 320dd101 18-May-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] glock debugging and inode cache changes

This adds some extra debugging to glock.c and changes
inode.c's deallocation code to call the debugging code
at a suitable moment. I'm chasing down a particular bug
to do with deallocation at the moment and the code can
go again once the bug is fixed.

Also this includes the first part of some changes to unify
the Linux struct inode and GFS2's struct gfs2_inode. This
transformation will happen in small parts over the next short
period.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 3a8a9a10 18-May-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Update copyright date to 2006

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# bd896801 18-May-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Remove semaphore.h from C files

We no longer use semaphores, everything has been converted to
mutex or rwsem, so we don't need to include this header any more.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# fd88de56 05-May-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Readpages support

This adds readpages support (and also corrects a small bug in
the readpage error path at the same time). Hopefully this will
improve performance by allowing GFS to submit larger lumps of
I/O at a time.

In order to simplify the setting of BH_Boundary, it currently gets
set when we hit the end of a indirect pointer block. There is
always a boundary at this point with the current allocation code.
It doesn't get all the boundaries right though, so there is still
room for improvement in this.

See comments in fs/gfs2/ops_address.c for further information about
readpages with GFS2.

Signed-off-by: Steven Whitehouse


# 56409abb 28-Apr-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Remove some unused code

Remove some of the unused code flagged up by Adrian Bunk.

Cc: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Steven Whitehouse


# 08bc2dbc 28-Apr-2006 Adrian Bunk <bunk@stusta.de>

[GFS2] [-mm patch] fs/gfs2/: possible cleanups

This patch contains the following possible cleanups:
- make needlessly global code static
- #if 0 unused functions
- remove the following global function that was both unused and
unimplemented:
- super.c: gfs2_do_upgrade()

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 36327521 28-Apr-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Reordering in deallocation to avoid recursive locking

Despite my earlier careful search, there was a recursive lock left
in the deallocation code. This removes it. It also should speed up
deallocation be reducing the number of locking operations which take
place by using two "try lock" operations on the two locks involved in
inode deallocation which allows us to grab the locks out of order
(compared with NFS which grabs the inode lock first and the iopen
lock later). It is ok for us to fail while doing this since if it
does fail it means that someone else is still using the inode and
thus it wouldn't be possible to deallocate anyway.

This fixes the bug reported to me by Rob Kenna.

Cc: Rob Kenna <rkenna@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# e7f5c01c 27-Apr-2006 David Teigland <teigland@redhat.com>

[GFS2] Remove redundant casts to/from void

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 579b78a4 26-Apr-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Remove GL_NEVER_RECURSE flag

There is no point in keeping this flag since recursion is not
now allowed for any glock.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 5965b1f4 26-Apr-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Don't do recursive locking in glock layer

This patch changes the last user of recursive locking so that
it no longer needs this feature and removes it from the glock
layer. This makes the glock code a lot simpler and easier to
understand. Its also a prerequsite to adding support for the
AOP_TRUNCATED_PAGE return code (or at least it is if you don't
want your brain to melt in the process)

I've left in a couple of checks just in case there is some place
else in the code which is still using this feature that I didn't
spot yet, but they can probably be removed long term.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 190562bd 20-Apr-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Fix a bug: scheduling under a spinlock

At some stage, a mutex was added to gfs2_glock_put() without
checking all its call sites. Two of them were called from
under a spinlock causing random delays at various points and
crashes.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# fe1bdedc 18-Apr-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Use vmalloc() in dir code

When allocating memory to sort directory entries, use vmalloc()
rather than kmalloc() since for larger directories, the required
size can easily be graeter than the 128k maximum of kmalloc().

Also adding the first steps towards getting the AOP_TRUNCATED_PAGE
return code get in the glock code by flagging all places where we
request a glock and we are holding a page lock.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# d0dc80db 29-Mar-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Update debugging code

Update the debugging code in trans.c and at the same time improve
the debugging code for gfs2_holders. The new code should be pretty
fast during the normal case and provide just as much information
in case of errors (or more).

One small function from glock.c has moved to glock.h as a static inline so
that its return address won't get in the way of the debugging.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 5c676f6d 27-Feb-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Macros removal in gfs2.h

As suggested by Pekka Enberg <penberg@cs.helsinki.fi>.

The DIV_RU macro is renamed DIV_ROUND_UP and and moved to kernel.h
The other macros are gone from gfs2.h as (although not requested
by Pekka Enberg) are a number of included header file which are now
included individually. The inode number comparison function is
now an inline function.

The DT2IF and IF2DT may be addressed in a future patch.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# d92a8d48 27-Feb-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Audit printk and kmalloc

All printk calls now have KERN_ set where required and a couple of
kmalloc(), memset(.., 0, ...) calls changed to kzalloc().

This is in response to comments from:
Pekka Enberg <penberg@cs.helsinki.fi> and
Eric Sesterhenn <snakebyte@gmx.de>

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# d95cb943 23-Feb-2006 David Teigland <teigland@redhat.com>

[GFS2] Patch to remove stats counters from GFS2 (II)

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 6a6b3d01 23-Feb-2006 David Teigland <teigland@redhat.com>

[GFS2] Patch to remove stats gathering from GFS2

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# f55ab26a 20-Feb-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Use mutices rather than semaphores

As well as a number of minor bug fixes, this patch changes GFS
to use mutices rather than semaphores. This results in better
information in case there are any locking problems.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# b3b94faa 16-Jan-2006 David Teigland <teigland@redhat.com>

[GFS2] The core of GFS2

This patch contains all the core files for GFS2.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>