History log of /linux-master/fs/gfs2/trans.c
Revision Date Author Comments
# 4d927b03 20-Dec-2023 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Rename gfs2_withdrawn to gfs2_withdrawing_or_withdrawn

This function checks whether the filesystem has been marked to be
withdrawn eventually or has been withdrawn already. Rename this
function to avoid confusing code like checking for gfs2_withdrawing()
when gfs2_withdrawn() has already returned true.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 015af1af 20-Dec-2023 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Mark withdraws as unlikely

Mark the gfs2_withdrawn(), gfs2_withdrawing(), and
gfs2_withdraw_in_prog() inline functions as likely to return %false.
This allows getting rid of likely() and unlikely() annotations at the
call sites of those functions.
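
The idea can be sketched in user-space C as follows. This is an illustration only: struct fs_state, FS_WITHDRAWN, fs_withdrawn() and start_operation() are hypothetical stand-ins, not the real gfs2 definitions, and unlikely() is defined here with the GCC/Clang __builtin_expect builtin.

```c
#include <assert.h>
#include <stdbool.h>

/* Branch-prediction hint built on the GCC/Clang builtin. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical stand-in for the superblock state (not struct gfs2_sbd). */
struct fs_state {
    unsigned long flags;
};

#define FS_WITHDRAWN (1UL << 0)  /* hypothetical flag bit */

/* The hint lives inside the helper, so callers need no annotation. */
static inline bool fs_withdrawn(const struct fs_state *fs)
{
    return unlikely(fs->flags & FS_WITHDRAWN);
}

/* Call site: a plain if, with no likely()/unlikely() wrapper. */
static int start_operation(const struct fs_state *fs)
{
    if (fs_withdrawn(fs))
        return -5;  /* stands in for -EIO to keep the sketch small */
    return 0;
}
```

With the hint inside the helper, every caller gets the same branch layout without repeating the annotation.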

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 2cbd8064 04-Aug-2023 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Fix freeze consistency check in gfs2_trans_add_meta

Function gfs2_trans_add_meta() checks for the SDF_FROZEN flag to make
sure that no buffers are added to a transaction while the filesystem is
frozen. With the recent freeze/thaw rework, the SDF_FROZEN flag is
cleared after thaw_super() is called, which is sufficient for
serializing freeze/thaw.

However, other filesystem operations started after thaw_super() may now
be calling gfs2_trans_add_meta() before the SDF_FROZEN flag is cleared,
which will trigger the SDF_FROZEN check in gfs2_trans_add_meta(). Fix
that by checking the s_writers.frozen state instead.

In addition, make sure not to call gfs2_assert_withdraw() with the
sd_log_lock spin lock held. Check for a withdrawn filesystem before
checking for a frozen filesystem, and don't pin/add buffers to the
current transaction in case of a failure in either case.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 5432af15 18-Aug-2022 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Replace sd_freeze_state with SDF_FROZEN flag

Replace sd_freeze_state with a new SDF_FROZEN flag.

There no longer is a need for indicating that a freeze is in progress
(SDF_STARTING_FREEZE); we are now protecting the critical sections with
the sd_freeze_mutex.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 1a5a2cfd 25-Feb-2021 Bob Peterson <rpeterso@redhat.com>

gfs2: fix use-after-free in trans_drain

This patch adds code to function trans_drain to remove drained
bd elements from the ail lists, if queued, before freeing the bd.
If we don't remove the bd from the ail, function ail_drain will
try to reference the bd after it has been freed by trans_drain.

Thanks to Andy Price for his analysis of the problem.

Reported-by: Andy Price <anprice@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 2129b428 17-Dec-2020 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Per-revoke accounting in transactions

In the log, revokes are stored as a revoke descriptor (struct
gfs2_log_descriptor), followed by zero or more additional revoke blocks
(struct gfs2_meta_header). On filesystems with a blocksize of 4k, the
revoke descriptor contains up to 503 revokes, and the metadata blocks
contain up to 509 revokes each. We've so far been reserving space for
revokes in transactions in block granularity, so a lot more space than
necessary was being allocated and then released again.

This patch switches to assigning revokes to transactions individually
instead. Initially, space for the revoke descriptor is reserved and
handed out to transactions. When more revokes than that are reserved,
additional revoke blocks are added. When the log is flushed, the space
for the additional revoke blocks is released, but we keep the space for
the revoke descriptor block allocated.

Transactions may still reserve more revokes than they will actually need
in the end, but now we won't overshoot the target as much, and by only
returning the space for excess revokes at log flush time, we further
reduce the amount of contention between processes.
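
The block arithmetic described above can be sketched as below. The constants 503 and 509 are the figures quoted in this message for a 4k block size; the helper name revoke_blocks() is hypothetical, and counting the descriptor block even for zero revokes is an assumption of this sketch.

```c
#include <assert.h>

/* Figures quoted in the commit message for a 4k block size. */
#define REVOKES_IN_DESCRIPTOR   503u  /* revokes held by the revoke descriptor */
#define REVOKES_PER_EXTRA_BLOCK 509u  /* revokes held by each additional block */

/* Hypothetical helper: log blocks needed to store nrevokes revokes. */
static unsigned int revoke_blocks(unsigned int nrevokes)
{
    unsigned int blocks = 1;  /* assumption: the descriptor block always counts */

    if (nrevokes > REVOKES_IN_DESCRIPTOR) {
        unsigned int rest = nrevokes - REVOKES_IN_DESCRIPTOR;

        /* Round up to whole additional revoke blocks. */
        blocks += (rest + REVOKES_PER_EXTRA_BLOCK - 1) / REVOKES_PER_EXTRA_BLOCK;
    }
    return blocks;
}
```

Under these assumptions, a transaction that reserves a handful of revokes shares the descriptor block with others instead of claiming a whole block for itself.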

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# fe3e3976 09-Dec-2020 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Rework the log space allocation logic

The current log space allocation logic is hard to understand or extend.
The principle is that when the log is flushed, we may or may not have a
transaction active that has space allocated in the log. To deal with
that, we set aside a magical number of blocks to be used in case we
don't have an active transaction. It isn't clear that the pool will
always be big enough. In addition, we can't return unused log space at
the end of a transaction, so the number of blocks allocated must exactly
match the number of blocks used.

Simplify this as follows:
* When transactions are allocated or merged, always reserve enough
blocks to flush the transaction (err on the safe side).
* In gfs2_log_flush, return any allocated blocks that haven't been used.
* Maintain a pool of spare blocks big enough to do one log flush, as
before.
* In gfs2_log_flush, when we have no active transaction, allocate a
suitable number of blocks. For that, use the spare pool when
called from logd, and leave the pool alone otherwise. This means
that when the log is almost full, logd will still be able to do one
more log flush, which will result in more log space becoming
available.

This will make the log space allocator code easier to work with in
the future.
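
A toy model of the first two points (reserve on the safe side, return the unused part at flush time) might look like this. It is a sketch under stated assumptions, not the gfs2 code: struct log_space, log_reserve() and log_flush_return() are hypothetical names, and the spare pool for logd is left out.

```c
#include <assert.h>

/* Hypothetical toy model of the journal's free-space counter. */
struct log_space {
    unsigned int free_blocks;
};

/* Reserve a worst-case estimate up front; fail if the log cannot hold it. */
static int log_reserve(struct log_space *log, unsigned int blocks)
{
    if (blocks > log->free_blocks)
        return -1;
    log->free_blocks -= blocks;
    return 0;
}

/* At flush time, give back whatever part of the reservation went unused. */
static void log_flush_return(struct log_space *log,
                             unsigned int reserved, unsigned int used)
{
    assert(used <= reserved);
    log->free_blocks += reserved - used;
}
```

The reservation may overshoot, but the excess only stays out of circulation until the next flush.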

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 297de318 06-Dec-2020 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Use a tighter bound in gfs2_trans_begin

Use a tighter bound for the number of blocks required by transactions in
gfs2_trans_begin: in the worst case, we'll have mixed data and metadata,
so we'll need a log descriptor for each type.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 5ae8fff8 13-Dec-2020 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Clean up gfs2_log_reserve

Wake up log waiters in gfs2_log_release when log space has actually become
available. This is a much better place for the wakeup than gfs2_logd.

Check if enough log space is immediately available before anything else. If
there isn't, use io_wait_event to wait instead of open-coding it.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# c1eba1b0 12-Dec-2020 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Move lock flush locking to gfs2_trans_{begin,end}

Move the read locking of sd_log_flush_lock from gfs2_log_reserve to
gfs2_trans_begin, and its unlocking from gfs2_log_release to
gfs2_trans_end. Use gfs2_log_release in two places in which it was open
coded before.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# f3708fb5 13-Dec-2020 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Get rid of sd_reserving_log

This counter and the associated wait queue are only used so that
gfs2_make_fs_ro can efficiently wait for all pending log space
allocations to fail after setting the filesystem to read-only. This
comes at the cost of waking up that wait queue very frequently.

Instead, when gfs2_log_reserve fails because the filesystem has become
read-only, wake up sd_log_waitq. In gfs2_make_fs_ro, set the file
system read-only and then wait until all the log space has been
released. Give up and report the problem after a while. With that,
sd_reserving_log and sd_reserving_log_wait can be removed.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# c968f578 29-Jan-2021 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Clean up on-stack transactions

Replace the TR_ALLOCED flag by its inverse, TR_ONSTACK: that way, the flag only
needs to be set in the exceptional case of on-stack transactions. Split off
__gfs2_trans_begin from gfs2_trans_begin and use it to replace the open-coded
version in gfs2_ail_empty_gl.
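
The flag inversion can be illustrated with a user-space sketch. Only the name TR_ONSTACK comes from this message; struct trans and the three helpers are hypothetical, and calloc()/free() stand in for the kernel allocator.

```c
#include <assert.h>
#include <stdlib.h>

#define TR_ONSTACK (1u << 0)  /* set only for the rare on-stack transaction */

/* Hypothetical, reduced transaction structure. */
struct trans {
    unsigned int flags;
    unsigned int blocks;
};

/* Common case: heap allocation. calloc() leaves flags at 0, so no flag
 * has to be set here. */
static struct trans *trans_alloc(unsigned int blocks)
{
    struct trans *tr = calloc(1, sizeof(*tr));

    if (tr)
        tr->blocks = blocks;
    return tr;
}

/* Exceptional case: the caller owns the storage and marks it as such. */
static void trans_init_onstack(struct trans *tr, unsigned int blocks)
{
    tr->flags = TR_ONSTACK;
    tr->blocks = blocks;
}

/* Returns 1 if the memory was freed, 0 if it belongs to the caller. */
static int trans_free(struct trans *tr)
{
    if (tr->flags & TR_ONSTACK)
        return 0;
    free(tr);
    return 1;
}
```

Because zero-initialized memory already describes the common case, only the exceptional path has to touch the flag.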

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 15e20a30 03-Feb-2021 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Use sb_start_intwrite in gfs2_ail_empty_gl

Commit 2e60d7683c8d ("GFS2: update freeze code to use freeze/thaw_super
on all nodes") optimized away the sb_start_intwrite ... sb_end_intwrite
protection for the on-stack transactions in gfs2_ail_empty_gl with no
explanation. I can't think of a valid reason for doing that, so revert
that change. This simplifies the next commit.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 625a8edd 06-Dec-2020 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Minor debugging improvement

Split the assert in gfs2_trans_end into two parts.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 462582b9 21-Aug-2020 Bob Peterson <rpeterso@redhat.com>

gfs2: add some much needed cleanup for log flushes that fail

When a log flush fails due to io errors, it signals the failure but does
not clean up after itself very well. This is because buffers are added to
the transaction tr_buf and tr_databuf queue, but the io error causes
gfs2_log_flush to bypass the "after_commit" functions responsible for
dequeueing the bd elements. If the bd elements are added to the ail list
before the error, function ail_drain takes care of dequeueing them.
But if they haven't gotten that far, the elements are forgotten and
make the transactions unable to be freed.

This patch introduces new function trans_drain which drains the bd
elements from the transaction so they can be freed properly.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# b0be23b2 23-Jul-2020 Bob Peterson <rpeterso@redhat.com>

gfs2: print details on transactions that aren't properly ended

If function gfs2_trans_begin is called with another transaction active
it BUGs out, but it doesn't give any details about the duplicate.
This patch moves function gfs2_print_trans and calls it when this
situation arises for better debugging.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# b839dada 17-Apr-2019 Bob Peterson <rpeterso@redhat.com>

gfs2: new slab for transactions

This patch adds a new slab for gfs2 transactions. That allows us to
reduce kernel memory fragmentation and have better organization of data
for analysis of vmcore dumps. A new centralized function is added to
free the slab objects, and it exposes use-after-free by giving
warnings if a transaction is freed while it still has bd elements
attached to its buffers or ail lists. We make sure to initialize
those transaction ail lists so we can check their integrity when freeing.

At a later time, we should add a slab initialization function to
make it more efficient, but for this initial patch I wanted to
minimize the impact.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# cbcc89b6 05-Jun-2020 Bob Peterson <rpeterso@redhat.com>

gfs2: initialize transaction tr_ailX_lists earlier

Since transactions may be freed shortly after they're created, before
a log_flush occurs, we need to initialize their ail1 and ail2 lists
earlier. Before this patch, the ail1 list was initialized in gfs2_log_flush().
This moves the initialization to the point when the transaction is first
created.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 2ca0c2fb 13-Nov-2019 Bob Peterson <rpeterso@redhat.com>

gfs2: drain the ail2 list after io errors

Before this patch, gfs2_logd continually tried to flush its journal
log, after the file system is withdrawn. We don't want to write anything
to the journal, lest we add corruption. Best course of action is to
drain the ail1 into the ail2 list (via gfs2_ail1_empty) then drain the
ail2 list with a new function, ail2_drain.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>


# a31b4ec5 20-Jan-2020 Bob Peterson <rpeterso@redhat.com>

Revert "gfs2: eliminate tr_num_revoke_rm"

This reverts commit e955537e3262de8e56f070b13817f525f472fa00.

Before patch e955537e32, tr_num_revoke tracked the number of revokes
added to the transaction, and tr_num_revoke_rm tracked how many
revokes were removed. But since revokes are queued off the sdp
(superblock) pointer, some transactions could remove more revokes
than they added (e.g. revokes added by a different process).
Commit e955537e32 eliminated transaction variable tr_num_revoke_rm,
but in order to do so, it changed the accounting to always use
tr_num_revoke for its math. Since you can remove more revokes than
you add, tr_num_revoke could now become a negative value.
This negative value broke the assert in function gfs2_trans_end:

if (gfs2_assert_withdraw(sdp, (nbuf <= tr->tr_blocks) &&
(tr->tr_num_revoke <= tr->tr_revokes)))

One way to fix this is to simply remove the tr_num_revoke clause
from the assert and allow the value to become negative. Andreas
didn't like that idea, so instead, we decided to revert e955537e32.
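
The accounting problem can be reproduced in a small sketch. The field names follow this message, but struct trans is a reduced, hypothetical version, and treating the counters as unsigned int (so that a "negative" count wraps to a very large value) is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced transaction holding only the revoke counters. */
struct trans {
    unsigned int tr_revokes;        /* revokes reserved for the transaction */
    unsigned int tr_num_revoke;     /* revokes added */
    unsigned int tr_num_revoke_rm;  /* revokes removed (restored by the revert) */
};

/* Single-counter scheme: removals are subtracted from the same count. */
static bool check_single_counter(unsigned int reserved,
                                 unsigned int added, unsigned int removed)
{
    unsigned int tr_num_revoke = added - removed;  /* wraps if removed > added */

    return tr_num_revoke <= reserved;
}

/* Two-counter scheme: removals are tracked apart from additions, so the
 * check only ever sees the number of revokes actually added. */
static bool check_two_counters(const struct trans *tr)
{
    return tr->tr_num_revoke <= tr->tr_revokes;
}
```

With one revoke added and three removed, the single-counter check fails even though the transaction stayed within its reservation; with separate counters the same history passes.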

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 2e9eeaa1 13-Dec-2019 Bob Peterson <rpeterso@redhat.com>

gfs2: eliminate ssize parameter from gfs2_struct2blk

Every caller of function gfs2_struct2blk specified sizeof(u64).

This patch eliminates the unnecessary parameter and replaces the
size calculation with a new superblock variable that is computed
to be the maximum number of block pointers we can fit inside a
log descriptor, as is done for pointers per dinode and indirect
block.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# fe5e7ba1 14-Nov-2019 Bob Peterson <rpeterso@redhat.com>

gfs2: fix glock reference problem in gfs2_trans_remove_revoke

Commit 9287c6452d2b fixed a situation in which gfs2 could use a glock
after it had been freed. To do that, it temporarily added a new glock
reference by calling gfs2_glock_hold in function gfs2_add_revoke.
However, if the bd element was removed by gfs2_trans_remove_revoke, it
failed to drop the additional reference.

This patch adds logic to gfs2_trans_remove_revoke to properly drop the
additional glock reference.

Fixes: 9287c6452d2b ("gfs2: Fix occasional glock use-after-free")
Cc: stable@vger.kernel.org # v5.2+
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# e955537e 26-Mar-2019 Bob Peterson <rpeterso@redhat.com>

gfs2: eliminate tr_num_revoke_rm

For its journal processing, gfs2 kept track of the number of buffers
added and removed on a per-transaction basis. These values are used
to calculate space needed in the journal. But while these calculations
make sense for the number of buffers, they make no sense for revokes.
Revokes are managed in their own list, linked from the superblock.
So it's entirely unnecessary to keep separate per-transaction counts
for revokes added and removed. A single count will do the same job.
Therefore, this patch combines the transaction revokes into a single
count.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 7336d0e6 31-May-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 398

Based on 1 normalized pattern(s):

this copyrighted material is made available to anyone wishing to use
modify copy or redistribute it subject to the terms and conditions
of the gnu general public license version 2

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-only

has been chosen to replace the boilerplate/reference in 44 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190531081038.653000175@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# fbb27873 04-Apr-2019 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Rename gfs2_trans_{add_unrevoke => remove_revoke}

Rename gfs2_trans_add_unrevoke to gfs2_trans_remove_revoke: there is no
such thing as an "unrevoke" object; all this function does is remove
existing revoke objects plus some bookkeeping.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# a5b1d3fc 04-Apr-2019 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Rename sd_log_le_{revoke,ordered}

Rename sd_log_le_revoke to sd_log_revokes and sd_log_le_ordered to
sd_log_ordered: not sure what le stands for here, but it doesn't add
clarity, and if it stands for list entry, it's actually confusing as
those are both list heads but not list entries.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# cbbe76c8 16-Nov-2018 Bob Peterson <rpeterso@redhat.com>

gfs2: Remove vestigial bd_ops

Field bd_ops was set but never used, so I removed it, and all
code supporting it.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# e54c78a2 03-Oct-2018 Bob Peterson <rpeterso@redhat.com>

gfs2: Use fs_* functions instead of pr_* function where we can

Before this patch, various errors and messages were reported using
the pr_* functions: pr_err, pr_warn, pr_info, etc., but that does
not tell you which gfs2 mount had the problem, which is often vital
to debugging. This patch changes the calls from pr_* to fs_* in
most of the messages so that the file system id is printed along
with the message.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 845802b1 04-Jun-2018 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Remove ordered write mode handling from gfs2_trans_add_data

In journaled data mode, we need to add each buffer head to the current
transaction. In ordered write mode, we only need to add the inode to
the ordered inode list. So far, both cases are handled in
gfs2_trans_add_data. This makes the code look misleading and is
inefficient for small block sizes as well. Handle both cases separately
instead.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 805c0907 08-Jan-2018 Bob Peterson <rpeterso@redhat.com>

GFS2: Log the reason for log flushes in every log header

This patch just adds the capability for GFS2 to track which function
called gfs2_log_flush. This should make it easier to diagnose
problems based on the sequence of events found in the journals.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>


# c1696fb8 16-Jan-2018 Bob Peterson <rpeterso@redhat.com>

GFS2: Introduce new gfs2_log_header_v2

This patch adds a new structure called gfs2_log_header_v2 which is used
to store expanded fields into previously unused areas of the log headers
(i.e., this change is backwards compatible). Some of these are used for
debug purposes so we can backtrack when problems occur. Others are
reserved for future expansion.

This patch is based on a prototype from Steve Whitehouse.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# da5eb9cd 12-Dec-2017 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Remove pointless BUG_ON

The current transaction is being dereferenced before asserting that it is
not NULL; that isn't going to help.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 1751e8a6 27-Nov-2017 Linus Torvalds <torvalds@linux-foundation.org>

Rename superblock flags (MS_xyz -> SB_xyz)

This is a pure automated search-and-replace of the internal kernel
superblock flags.

The s_flags are now called SB_*, with the names and the values for the
moment mirroring the MS_* flags that they're equivalent to.

Note how the MS_xyz flags are the ones passed to the mount system call,
while the SB_xyz flags are what we then use in sb->s_flags.

The script to do this was:

# places to look in; re security/*: it generally should *not* be
# touched (that stuff parses mount(2) arguments directly), but
# there are two places where we really deal with superblock flags.
FILES="drivers/mtd drivers/staging/lustre fs ipc mm \
include/linux/fs.h include/uapi/linux/bfs_fs.h \
security/apparmor/apparmorfs.c security/apparmor/include/lib.h"
# the list of MS_... constants
SYMS="RDONLY NOSUID NODEV NOEXEC SYNCHRONOUS REMOUNT MANDLOCK \
DIRSYNC NOATIME NODIRATIME BIND MOVE REC VERBOSE SILENT \
POSIXACL UNBINDABLE PRIVATE SLAVE SHARED RELATIME KERNMOUNT \
I_VERSION STRICTATIME LAZYTIME SUBMOUNT NOREMOTELOCK NOSEC BORN \
ACTIVE NOUSER"

SED_PROG=
for i in $SYMS; do SED_PROG="$SED_PROG -e s/MS_$i/SB_$i/g"; done

# we want files that contain at least one of MS_...,
# with fs/namespace.c and fs/pnode.c excluded.
L=$(for i in $SYMS; do git grep -w -l MS_$i $FILES; done| sort|uniq|grep -v '^fs/namespace.c'|grep -v '^fs/pnode.c')

for f in $L; do sed -i $f $SED_PROG; done

Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 61d6899a 04-Oct-2017 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Fix a harmless typo

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# aacee720 30-Jan-2017 Bob Peterson <rpeterso@redhat.com>

GFS2: Reduce contention on gfs2_log_lock

This patch modifies functions gfs2_trans_add_meta and _data so that
they check whether the buffer_head is already in a transaction,
and if so, avoid taking the gfs2_log_lock.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 192738b7 24-Jan-2017 Bob Peterson <rpeterso@redhat.com>

GFS2: Inline function meta_lo_add

This patch simply combines function meta_lo_add with its only
caller, trans_add_meta. This makes the code easier to read and
will make it easier to reduce contention on gfs2_log_lock.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 9862ca05 24-Jan-2017 Bob Peterson <rpeterso@redhat.com>

GFS2: Switch tr_touched to flag in transaction

This patch eliminates the int variable tr_touched in favor of a
new flag in the transaction. This is a step toward reducing contention
on the gfs2_log_lock spin_lock.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 491e94f7 01-Oct-2015 Bob Peterson <rpeterso@redhat.com>

gfs2: Add missing else in trans_add_meta/data

This patch fixes a timing window that causes a segfault.
The problem is that bd can remain NULL throughout the function
and then reference that NULL pointer if the bh->b_private starts
out NULL, then someone sets it to non-NULL inside the locking.
In that case, bd still needs to be set.
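
The timing window can be modeled in single-threaded C. This is a sketch, not the kernel code: struct buffer, struct bufdata, simulate_racer() and attach_bufdata() are hypothetical, and the race is simulated by a flag rather than by a second thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the buffer head and its private data. */
struct bufdata { int id; };
struct buffer  { struct bufdata *b_private; };

static struct bufdata racer_bd = { 42 };

/* Simulates another task attaching a bufdata inside the window. */
static void simulate_racer(struct buffer *bh, int race)
{
    if (race)
        bh->b_private = &racer_bd;
}

static struct bufdata *attach_bufdata(struct buffer *bh, int race)
{
    struct bufdata *bd = bh->b_private;  /* first look: may be NULL */

    if (bd == NULL) {
        simulate_racer(bh, race);        /* the timing window */
        if (bh->b_private == NULL) {
            bd = calloc(1, sizeof(*bd));
            bh->b_private = bd;
        } else {
            bd = bh->b_private;          /* the branch that was missing */
        }
    }
    return bd;
}
```

Without the else branch, bd would keep its initial NULL value whenever the racer wins, and the caller would dereference it.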

Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 15562c43 16-Mar-2015 Bob Peterson <rpeterso@redhat.com>

GFS2: Move glock superblock pointer to field gl_name

What uniquely identifies a glock in the glock hash table is not
gl_name, but gl_name and its superblock pointer. This patch makes
the gl_name field correspond to a unique glock identifier. That will
allow us to simplify hashing with a future patch, since the hash
algorithm can then take the gl_name and hash its components in one
operation.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>


# 2e60d768 13-Nov-2014 Benjamin Marzinski <bmarzins@redhat.com>

GFS2: update freeze code to use freeze/thaw_super on all nodes

The current gfs2 freezing code is considerably more complicated than it
should be because it doesn't use the vfs freezing code on any node except
the one that begins the freeze. This is because it needs to acquire a
cluster glock before calling the vfs code to prevent a deadlock, and
without the new freeze_super and thaw_super hooks, that was impossible. To
deal with the issue, gfs2 had to do some hacky locking tricks to make sure
that a frozen node couldn't be holding on a lock it needed to do the
unfreeze ioctl.

This patch makes use of the new hooks to simplify the gfs2 locking code. Now,
all the nodes in the cluster freeze and thaw in exactly the same way. Every
node in the cluster caches the freeze glock in the shared state. The new
freeze_super hook allows the freezing node to grab this freeze glock in
the exclusive state without first calling the vfs freeze_super function.
All the nodes in the cluster see this lock change, and call the vfs
freeze_super function. The vfs locking code guarantees that the nodes can't
get stuck holding the glocks necessary to unfreeze the system. To
unfreeze, the freezing node uses the new thaw_super hook to drop the freeze
glock. Again, all the nodes notice this, reacquire the glock in shared mode
and call the vfs thaw_super function.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# d29c0afe 03-Oct-2014 Fabian Frederick <fabf@skynet.be>

GFS2: use _RET_IP_ instead of (unsigned long)__builtin_return_address(0)

use macro definition

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 24972557 01-May-2014 Benjamin Marzinski <bmarzins@redhat.com>

GFS2: remove transaction glock

GFS2 has a transaction glock, which must be grabbed for every
transaction, whose purpose is to deal with freezing the filesystem.
Aside from this involving a large amount of locking, it is very easy to
make the current fsfreeze code hang on unfreezing.

This patch rewrites how gfs2 handles freezing the filesystem. The
transaction glock is removed. In its place is a freeze glock, which is
cached (but not held) in a shared state by every node in the cluster
when the filesystem is mounted. This lock only needs to be grabbed on
freezing, and actions which need to be safe from freezing, like
recovery.

When a node wants to freeze the filesystem, it grabs this glock
exclusively. When the freeze glock state changes on the nodes (either
from shared to unlocked, or shared to exclusive), the filesystem does a
special log flush. gfs2_log_flush() does all the work for flushing out
and shutting down the incore log, and then it tries to grab the
freeze glock in a shared state again. Since the filesystem is stuck in
gfs2_log_flush, no new transaction can start, and nothing can be written
to disk. Unfreezing the filesystem simply involves dropping the freeze
glock, allowing gfs2_log_flush() to grab and then release the shared
lock, so it is cached for next time.

However, in order for the unfreezing ioctl to occur, gfs2 needs to get a
shared lock on the filesystem root directory inode to check permissions.
If that glock has already been grabbed exclusively, fsfreeze will be
unable to get the shared lock and unfreeze the filesystem.

In order to allow the unfreeze, this patch makes gfs2 grab a shared lock
on the filesystem root directory during the freeze, and hold it until it
unfreezes the filesystem. The functions which need to grab a shared
lock in order to allow the unfreeze ioctl to be issued now use the lock
grabbed by the freeze code instead.

The freeze and unfreeze code take care to make sure that this shared
lock will not be dropped while another process is using it.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# d77d1b58 06-Mar-2014 Joe Perches <joe@perches.com>

GFS2: Use pr_<level> more consistently

Add pr_fmt, remove embedded "GFS2: " prefixes.
This now consistently emits lower case "gfs2: " for each message.

Other miscellanea around these changes:

o Add missing newlines
o Coalesce formats
o Realign arguments

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# fc554ed3 05-Mar-2014 Fabian Frederick <fabf@skynet.be>

GFS2: global conversion to pr_foo()

-All printk(KERN_foo converted to pr_foo().
-Messages updated to fit in 80 columns.
-fs_macros converted as well.
-fs_printk removed.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 022ef4fe 21-Feb-2014 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Move log buffer accounting to transaction

Now we have a master transaction into which other transactions
are merged, the accounting can be done using this master
transaction. We no longer require the superblock fields which
were being used for this function.

In addition, this allows for a clean up in calc_reserved()
making it rather easier to understand. Also, by reducing the
number of variables used to track the buffers being added
and removed from the journal, a number of error checks are
now no longer required.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# d69a3c65 21-Feb-2014 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Move log buffer lists into transaction

Over time, we hope to be able to improve the concurrency available
in the log code. This is one small step towards that, by moving
the buffer lists from the super block, and into the transaction
structure, so that each transaction builds its own buffer lists.

At transaction commit time, the buffer lists are merged into
the currently accumulating transaction. That transaction then
is passed into the before and after commit functions at journal
flush time. Thus there should be no change in overall behaviour
yet.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 654a6d2f 21-Feb-2014 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Reduce struct gfs2_trans in size

A couple of "int" fields were being used as boolean values
so we can make them bitfields of one bit, and put them in
what might otherwise be a hole in the structure with 64
bit alignment.
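
The size effect can be shown with two hypothetical layouts. The field names below are illustrative and are not copied from struct gfs2_trans; exact sizes depend on the ABI, so the sketch only asserts that the bitfield version is no larger.

```c
#include <assert.h>

/* Before: two int fields that only ever hold 0 or 1. */
struct trans_wide {
    unsigned long ip;
    unsigned int blocks;
    int touched;
    int attached;
};

/* After: the same two booleans as one-bit bitfields, which can share
 * what would otherwise be padding after 'blocks' on 64-bit targets. */
struct trans_compact {
    unsigned long ip;
    unsigned int blocks;
    unsigned int touched:1;
    unsigned int attached:1;
};

/* C11 compile-time check that the compact layout is not larger. */
_Static_assert(sizeof(struct trans_compact) <= sizeof(struct trans_wide),
               "bitfield layout should not be larger");
```

On a typical LP64 target the first layout is padded up to the next multiple of eight bytes, while the second fits both flags into the word that follows blocks.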

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 2b12eea6 19-Jun-2013 Benjamin Marzinski <bmarzins@redhat.com>

GFS2: fix warning message

This patch fixes a warning message introduced in the recent
"GFS2: aggressively issue revokes in gfs2_log_flush" patch.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 5d054964 14-Jun-2013 Benjamin Marzinski <bmarzins@redhat.com>

GFS2: aggressively issue revokes in gfs2_log_flush

This patch looks at all the outstanding blocks in all the transactions
on the log, and moves the completed ones to the ail2 list. Then it
issues revokes for these blocks. This will hopefully speed things up
in situations where there is a lot of contention for glocks, especially
if they are acquired serially.

revoke_lo_before_commit will issue at most one log block full of these
preemptive revokes. The amount of reserved log space that
gfs2_log_reserve() ignores has been incremented to allow for this extra
block.

This patch also consolidates the common revoke instructions into one
function, gfs2_add_revoke().

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 7af584d3 12-Dec-2012 Joe Perches <joe@perches.com>

gfs2: Convert print_symbol to %pSR

Use the new vsprintf extension to avoid any possible
message interleaving.

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>


# 16ca9412 05-Apr-2013 Benjamin Marzinski <bmarzins@redhat.com>

GFS2: replace gfs2_ail structure with gfs2_trans

In order to allow transactions and log flushes to happen at the same
time, gfs2 needs to move the transaction accounting and active items
list code into the gfs2_trans structure. As a first step toward this,
this patch removes the gfs2_ail structure, and handles the active items
list in the gfs2_trans structure. This keeps gfs2 from allocating an ail
structure on log flushes, and gives us a structure that can later be used
to store the transaction accounting outside of the gfs2 superblock
structure.

With this patch, at the end of a transaction, gfs2 will add the
gfs2_trans structure to the superblock if there is not one already.
This structure now has the active items fields that were previously in
gfs2_ail. This is not necessary in the case where the transaction was
simply used to add revokes, since these are never written outside of the
journal, and thus, don't need an active items list.

Also, in order to make sure that the transaction structure is not
removed while it's still in use by gfs2_trans_end, unlocking the
sd_log_flush_lock has to happen slightly later in ending the
transaction.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 45138990 28-Jan-2013 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Use ->writepages for ordered writes

Instead of using a list of buffers to write ahead of the journal
flush, this now uses a list of inodes and calls ->writepages
via filemap_fdatawrite() in order to achieve the same thing. For
most use cases this results in a shorter ordered write list,
as well as much larger i/os being issued.

The ordered write list is sorted by inode number before writing
in order to retain the disk block ordering between inodes as
per the previous code.

The previous ordered write code used to conflict in its assumptions
about how to write out the disk blocks with mpage_writepages()
so that with this updated version we can also use mpage_writepages()
for GFS2's ordered write, writepages implementation. So we will
also send larger i/os from writeback too.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# c76c4d96 14-Dec-2012 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Merge gfs2_attach_bufdata() into trans.c

The locking in gfs2_attach_bufdata() was type specific (data/meta)
which made the function rather confusing. This patch moves the core
of gfs2_attach_bufdata() into trans.c renaming it gfs2_alloc_bufdata()
and moving the locking into gfs2_trans_add_data()/gfs2_trans_add_meta()

As a result all of the locking related to adding data and metadata to
the journal is now in these two functions. This should help to clarify
what is going on, and give us some opportunities to simplify in
some cases.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 767f433f 13-Dec-2012 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Copy gfs2_trans_add_bh into new data/meta functions

This patch copies the body of gfs2_trans_add_bh into the two newly
added gfs2_trans_add_data and gfs2_trans_add_meta functions. We can
then move the .lo_add functions from lops.c into trans.c and call
them directly.

As a result of this, we no longer need to use the .lo_add functions
at all, so that is removed from the log operations structure.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 350a9b0a 13-Dec-2012 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Split gfs2_trans_add_bh() into two

There is little common content in gfs2_trans_add_bh() between the data
and meta classes by the time that the functions which it calls are
taken into account. The intent here is to split this into two
separate functions. Stage one is to introduce gfs2_trans_add_data()
and gfs2_trans_add_meta() and update the callers accordingly.

Later patches will then pull in the content of gfs2_trans_add_bh()
and its dependent functions in order to clean up the code in this
area.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 75f2b879 13-Dec-2012 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Merge revoke adding functions

This moves the lo_add function for revokes into trans.c, removing
a function call and making the code easier to read.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 96e5d1d3 06-Nov-2012 Benjamin Marzinski <bmarzins@redhat.com>

GFS2: Test bufdata with buffer locked and gfs2_log_lock held

In gfs2_trans_add_bh(), gfs2 was testing if there was a bd attached to the
buffer without having the gfs2_log_lock held. It was then assuming it would
stay attached for the rest of the function. However, without either the log
lock being held or the buffer locked, __gfs2_ail_flush() could detach bd at any
time. This patch moves the locking before the test. If there isn't a bd
already attached, gfs2 can safely allocate one and attach it before locking.
There is no way that the newly allocated bd could be on the ail list,
and thus no way for __gfs2_ail_flush() to detach it.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 39263d5e 12-Jun-2012 Jan Kara <jack@suse.cz>

gfs2: Convert to new freezing mechanism

We update gfs2_page_mkwrite() to use new freeze protection and the transaction
code to use freeze protection while the transaction is running. That is needed
to stop iput() of unlinked file from modifying the filesystem. The rest is
handled by the generic code.

CC: cluster-devel@redhat.com
CC: Steven Whitehouse <swhiteho@redhat.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# c0752aa7 30-Apr-2012 Bob Peterson <rpeterso@redhat.com>

GFS2: eliminate log elements and simplify

This patch eliminates the gfs2_log_element data structure and
rolls its two components into the gfs2_bufdata. This makes the code
easier to understand and makes it easier to migrate to a rbtree
to keep the list sorted.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# c50b91c4 16-Apr-2012 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Remove bd_list_tr

This is another clean up in the logging code. This per-transaction
list was largely unused. Its main function was to ensure that the
number of buffers in a transaction was correct, however that counter
was only used to check the number of buffers in the bd_list_tr, plus
an assert at the end of each transaction. With the assert now changed
to use the calculated buffer counts, we can remove both bd_list_tr and
its associated counter.

This should make the code easier to understand as well as shrinking
a couple of structures.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 7c9ca621 31-Aug-2011 Bob Peterson <rpeterso@redhat.com>

GFS2: Use rbtree for resource groups and clean up bitmap buffer ref count scheme

Here is an update of Bob's original rbtree patch which, in addition, also
resolves the rather strange ref counting that was being done relating to
the bitmap blocks.

Originally we had a dual system for journaling resource groups. The metadata
blocks were journaled and also the rgrp itself was added to a list. The reason
for adding the rgrp to the list in the journal was so that the "repolish
clones" code could be run to update the free space, and potentially send any
discard requests when the log was flushed. This was done by comparing the
"cloned" bitmap with what had been written back on disk during the transaction
commit.

Due to this, there was a requirement to hang on to the rgrps' bitmap buffers
until the journal had been flushed. For that reason, there was a rather
complicated set up in the ->go_lock ->go_unlock functions for rgrps involving
both a mutex and a spinlock (the ->sd_rindex_spin) to maintain a reference
count on the buffers.

However, the journal maintains a reference count on the buffers anyway, since
they are being journaled as metadata buffers. So by moving the code which deals
with the post-journal accounting for bitmap blocks to the metadata journaling
code, we can entirely dispense with the rather strange buffer ref counting
scheme and also the requirement to journal the rgrps.

The net result of all this is that the ->sd_rindex_spin is left to do exactly
one job, and that is to look after the rbtree of rgrps.

This patch is designed to be a stepping stone towards using RCU for the rbtree
of resource groups, however the reduction in the number of uses of the
->sd_rindex_spin is likely to have benefits for multi-threaded workloads,
anyway.

The patch retains ->go_lock and ->go_unlock for rgrps, however these may also
be removed in future in favour of calling the functions directly where required
in the code. That will allow locking of resource groups without needing to
actually read them in - something that could be useful in speeding up statfs.

In the mean time though it is valid to dereference ->bi_bh only when the rgrp
is locked. This is basically the same rule as before, modulo the references not
being valid until the following journal flush.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Cc: Benjamin Marzinski <bmarzins@redhat.com>


# 5e687eac 04-May-2010 Benjamin Marzinski <bmarzins@redhat.com>

GFS2: Various gfs2_logd improvements

This patch contains various tweaks to how log flushes and active item writeback
work. gfs2_logd is now managed by a waitqueue, and gfs2_log_reserve now waits
for gfs2_logd to do the log flushing. Multiple functions were rewritten to
remove the need to call gfs2_log_lock(). Instead of using one test to see if
gfs2_logd had work to do, there are now separate tests to check if there
are too many buffers in the incore log or if there are too many items on the
active items list.

This patch is a port of a patch Steve Whitehouse wrote about a year ago, with
some minor changes. Since gfs2_ail1_start always submits all the active items,
it no longer needs to keep track of the first ai submitted, so this has been
removed. In gfs2_log_reserve(), the order of the calls to
prepare_to_wait_exclusive() and wake_up() when firing off the logd thread has
been switched. If it called wake_up first there was a small window for a race,
where logd could run and return before gfs2_log_reserve was ready to get woken
up. If gfs2_logd ran, but did not free up enough blocks, gfs2_log_reserve()
would be left waiting for gfs2_logd to eventually run because it timed out.
Finally, gt_logd_secs, which controls how long to wait before gfs2_logd times
out, and flushes the log, can now be set on mount with ar_commit.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# a1c0643f 13-May-2009 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Move journal live test at transaction start

There seems little point grabbing the transaction glock
only to have to release it again if the journal isn't
live. This moves the test earlier to avoid grabbing the lock
when we don't need it in the first place.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# d8348de0 05-Feb-2009 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Fix deadlock on journal flush

This patch fixes a deadlock when the journal is flushed and there
are dirty inodes other than the one which caused the journal flush.
Originally the journal flushing code was trying to obtain the
transaction glock while running the flush code for an inode glock.
We no longer require the transaction glock at this point in time
since we know that any attempt to get the transaction glock from
another node will result in a journal flush. So if we are flushing
the journal, we can be sure that the transaction lock is still
cached from when the transaction was started.

By inlining a version of gfs2_trans_begin() (minus the bit which
gets the transaction glock) we can avoid the deadlock problems
caused if there is a demote request queued up on the transaction
glock.

In addition I've also moved the umount rwsem so that it covers
the glock workqueue, since all demotions are done by this
workqueue now. That fixes a bug on umount which I came across
while fixing the original problem.

Reported-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# f057f6cd 12-Jan-2009 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Merge lock_dlm module into GFS2

This is the big patch that I've been working on for some time
now. There are many reasons for wanting to make this change
such as:
o Reducing overhead by eliminating duplicated fields between structures
o Simplification of the code (reduces the code size by a fair bit)
o The locking interface is now the DLM interface itself as proposed
some time ago.
o Fewer lookups of glocks when processing replies from the DLM
o Fewer memory allocations/deallocations for each glock
o Scope to do further optimisations in the future (but this patch is
more than big enough for now!)

Please note that (a) this patch relates to the lock_dlm module and
not the DLM itself, that is still a separate module; and (b) that
we retain the ability to build GFS2 as a standalone single node
filesystem without requiring the DLM.

This patch needs a lot of testing, hence my keeping it back until I restarted
my -git tree after the last merge window. That way, this has the maximum
exposure before it's merged. This is (modulo a few minor bug fixes) the
same patch that I've been posting on and off for the last three months
and it has passed a number of different tests so far.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 5731be53 01-Feb-2008 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Update gfs2_trans_add_unrevoke to accept extents

By adding an extra argument to gfs2_trans_add_unrevoke we can now
specify an extent length of blocks to unrevoke. This means that
we only need to make one pass through the list for each extent
rather than each block. Currently the only extent length which
is used is 1, but that will change in the future.
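
As a rough illustration of the single-pass idea described above (not the
kernel code), the sketch below models the pending revokes as a plain array
and drops every entry that falls inside an extent in one walk. The name
unrevoke_extent() and the array representation are assumptions made for
this example only; the real list lives in the gfs2 log structures.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model: "revokes" is an array of pending revoke block
 * numbers and "*count" is the number of valid entries in it.
 *
 * Remove every entry in [start, start + len) with a single pass over
 * the array, compacting the survivors in place. Returns the number of
 * entries removed.
 */
static size_t unrevoke_extent(uint64_t *revokes, size_t *count,
			      uint64_t start, unsigned int len)
{
	size_t kept = 0;
	size_t removed = 0;

	for (size_t i = 0; i < *count; i++) {
		uint64_t blkno = revokes[i];

		/* blkno - start is only evaluated when blkno >= start */
		if (blkno >= start && blkno - start < len)
			removed++;
		else
			revokes[kept++] = blkno;
	}

	*count = kept;
	return removed;
}
```

With len set to 1 this degenerates to the per-block behaviour; a longer
extent costs the same single walk rather than one walk per block.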

Also gfs2_trans_add_unrevoke is removed from gfs2_alloc_meta
since it's the only difference between this and gfs2_alloc_data
which is left. This will allow a future patch to merge these
two functions into one (i.e. one call to allocate both data
and metadata in a single extent in the future).

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 2bcd610d 08-Nov-2007 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Don't add glocks to the journal

The only reason for adding glocks to the journal was to keep track
of which locks required a log flush prior to release. We add a
flag to the glock to allow this check to be made in a simpler way.

This reduces the size of a glock (by 12 bytes on i386, 24 on x86_64)
and means that we can avoid extra work during the journal flush.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 1ad38c43 03-Sep-2007 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Clean up gfs2_trans_add_revoke()

The following alters gfs2_trans_add_revoke() to take a struct
gfs2_bufdata as an argument. This eliminates the memory allocation which
was previously required by making use of the already existing struct
gfs2_bufdata. It makes some sanity checks to ensure that the
gfs2_bufdata has been removed from all the lists before it's recycled as
a revoke structure. This saves one memory allocation and one free per
revoke structure.

Also as a result, and to simplify the locking, since there is no longer
any blocking code in gfs2_trans_add_revoke() we must hold the log lock
whenever this function is called. This reduces the number of times we
take and unlock the log lock.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 0820ab51 02-Sep-2007 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Use slab operations for all gfs2_bufdata allocations

The old revoke structure was allocated using kmalloc/kfree but
there is a slab cache for gfs2_bufdata, so we should use that
now that the structures have been converted.

This is part two of the patch series to merge the revoke
and gfs2_bufdata structures.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 82e86087 02-Sep-2007 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Replace revoke structure with bufdata structure

Both the revoke structure and the bufdata structure are quite similar.
They are basically small tags which are put on lists. In addition to
which the revoke structure is always allocated when there is a bufdata
structure which is (or can be) freed. As such it should be possible to
reduce the number of frees and allocations by using the same structure
for both purposes.

This patch is the first step along that path. It replaces existing uses
of the revoke structure with the bufdata structure.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 7d308590 18-Sep-2006 Fabio Massimo Di Nitto <fabbione@ubuntu.com>

[GFS2] Export lm_interface to kernel headers


lm_interface.h has a few out of the tree clients such as GFS1
and userland tools.

Right now, these clients keep a copy of the file in their build tree
that can go out of sync.

Move lm_interface.h to include/linux, export it to userland and
clean up fs/gfs2 to use the new location.

Signed-off-by: Fabio M. Di Nitto <fabbione@ubuntu.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# cd915493 03-Sep-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Change all types to uX style

This makes all fixed size types have consistent names.

Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# e9fc2aa0 01-Sep-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Update copyright, tidy up incore.h

As per comments from Jan Engelhardt <jengelh@linux01.gwdg.de> this
updates the copyright message to say "version" in full rather than
"v.2". Also incore.h has been updated to remove forward structure
declarations which are not required.

The gfs2_quota_lvb structure has now had endianness annotations added
to it. Also quota.c has been updated so that we now store the
lvb data locally in endian independent format to avoid needing
a structure in host endianness too. As a result the endianness
conversions are done as required at various points and thus the
conversion routines in lvb.[ch] are no longer required. I've
moved the one remaining constant in lvb.h that's used into lm.h
and removed the unused lvb.[ch].

I have not changed the HIF_ constants. That is left to a later patch
which I hope will unify the gh_flags and gh_iflags fields of the
struct gfs2_holder.

Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 3a8a9a10 18-May-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Update copyright date to 2006

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# bd896801 18-May-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Remove semaphore.h from C files

We no longer use semaphores, everything has been converted to
mutex or rwsem, so we don't need to include this header any more.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 579b78a4 26-Apr-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Remove GL_NEVER_RECURSE flag

There is no point in keeping this flag since recursion is not
now allowed for any glock.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# f4154ea0 11-Apr-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Update journal accounting code.

A small update to the journaling code to change the way that
the "extra" blocks are accounted for in the journal. These are
used at a rate of one per 503 metadata blocks or one per 251
journaled data blocks (or just one if the total number of journaled
blocks in the transaction is smaller). Since we are using them at
two different rates the old method of accounting for them no longer
works and we count them up as required.
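
A worked sketch of the arithmetic above, under one possible reading of the
rates: each class of block is rounded up separately and the results are
summed. The function name extra_log_blocks() and both constants' names are
hypothetical, and the values 503 and 251 are simply the rates quoted in
this message; the kernel derives its limits elsewhere.

```c
/* Rates quoted in the commit message; the names are illustrative. */
#define META_PER_EXTRA   503u
#define JDATA_PER_EXTRA  251u

/* Same rounding as the kernel's DIV_ROUND_UP(), defined locally. */
#define DIV_ROUND_UP_SKETCH(n, d) (((n) + (d) - 1) / (d))

/*
 * Assumed model: one extra block per 503 metadata blocks plus one per
 * 251 journaled data blocks, each rounded up, so any non-zero count in
 * a class costs at least one extra block for that class.
 */
static unsigned int extra_log_blocks(unsigned int meta, unsigned int jdata)
{
	return DIV_ROUND_UP_SKETCH(meta, META_PER_EXTRA) +
	       DIV_ROUND_UP_SKETCH(jdata, JDATA_PER_EXTRA);
}
```

Under this model a transaction with 504 metadata blocks and no journaled
data needs two extra blocks, while one with 503 needs only one.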

Since the "per transaction" accounting can't handle this (there is no
fixed number of header blocks per transaction) we have to account for
it in the general journal code. We now require that each transaction
reserves more blocks than it actually needs to take account of the
possible extra blocks.

Also a final fix to dir.c to ensure that all ref counts are handled
correctly.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# b09e593d 07-Apr-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Fix a ref count bug and other clean ups

This fixes a ref count bug that sometimes showed up at umount time
(causing it to hang) but is otherwise mostly harmless. At the same
time there are some clean ups including making the log operations
structures const, moving a memory allocation so that it's not done
in the fast path of checking to see if there is an outstanding
transaction related to a particular glock.

Removes the sd_log_wrap variable which was updated, but never actually
used anywhere. Updates the gfs2 ioctl() to run without the kernel lock
(which it never needed anyway). Removes the "invalidate inodes" loop
from GFS2's put_super routine. This is done in kill super anyway so
we don't need to do it here. The loop was also bogus in that if there
are any inodes "stuck" at this point it's a bug and we need to know
about it rather than hide it by hanging forever.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# cd45697f 30-Mar-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Add missing {} in trans.c

A conditional had missing {} around the two following
statements. Now added.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# d0dc80db 29-Mar-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Update debugging code

Update the debugging code in trans.c and at the same time improve
the debugging code for gfs2_holders. The new code should be pretty
fast during the normal case and provide just as much information
in case of errors (or more).

One small function from glock.c has moved to glock.h as a static inline so
that its return address won't get in the way of the debugging.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 484adff8 29-Mar-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Update locking in log.c

Replace the lock_for_trans()/lock_for_flush() functions with an rwsem.
In fact the sd_log_flush_lock becomes an rwsem (the write part of it)
and is extended slightly to cover everything that the lock_for_flush()
used to cover. The read part of the lock is used instead of lock_for_trans().

This corrects the races in the original code and reduces the code size.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# b4dc7291 01-Mar-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Fix some bugs

Fix a bug I introduced earlier with a kfree() and usage of
a structure in the wrong order. Also try and get the counts
of the journaled data buffers "more correct". Still some work
to do in this area though.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# e317ffcb 01-Mar-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Remove unneeded memory allocation

For every filesystem operation where we need a transaction, we
now make one less memory allocation.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 5c676f6d 27-Feb-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Macros removal in gfs2.h

As suggested by Pekka Enberg <penberg@cs.helsinki.fi>.

The DIV_RU macro is renamed DIV_ROUND_UP and moved to kernel.h.
The other macros are gone from gfs2.h, as (although not requested
by Pekka Enberg) are a number of included header files which are now
included individually. The inode number comparison function is
now an inline function.

The DT2IF and IF2DT may be addressed in a future patch.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# f55ab26a 20-Feb-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Use mutices rather than semaphores

As well as a number of minor bug fixes, this patch changes GFS
to use mutices rather than semaphores. This results in better
information in case there are any locking problems.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 18ec7d5c 08-Feb-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Make journaled data files identical to normal files on disk

This is a very large patch, with a few still to be resolved issues
so you might want to check out the previous head of the tree since
this is known to be unstable. Fixes for the various bugs will be
forthcoming shortly.

This patch removes the special data format which has been used
up till now for journaled data files. Directories still retain the
old format so that they will remain on disk compatible with earlier
releases. As a result you can now do the following with journaled
data files:

1) mmap them
2) export them over NFS
3) convert to/from normal files whenever you want to (the zero length
restriction is gone)

In addition the level at which GFS' locking is done has changed for all
files (since they all now use the page cache) such that the locking is
done at the page cache level rather than the level of the fs operations.
This should mean that things like loopback mounts and other things which
touch the page cache directly should now work.

Current known issues:

1. There is a lock mode inversion problem related to the resource
group hold function which needs to be resolved.
2. Any significant amount of I/O causes an oops with an offset of hex 320
(NULL pointer dereference) which appears to be related to a journaled data
buffer appearing on a list where it shouldn't be.
3. Direct I/O writes are disabled for the time being (will reappear later)
4. There is probably a deadlock between the page lock and GFS' locks under
certain combinations of mmap and fs operation I/O.
5. Issue relating to ref counting on internally used inodes causes a hang
on umount (discovered before this patch, and not fixed by it)
6. One part of the directory metadata is different from GFS1 and will need
to be resolved before next release.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 64fb4eb7 18-Jan-2006 Steven Whitehouse <steve@chygwyn.com>

[GFS2] Remove gfs2_databuf in favour of gfs2_bufdata structure

Removing the gfs2_databuf structure and using gfs2_bufdata instead
is a step towards allowing journaling of data without requiring the
metadata header on each journaled block. The idea is to merge the
code paths for ordered data with that of journaled data, with the
log operations in lops.c taking account of the different types of
buffers as they are presented to it. Largely the code path for
metadata will be similar too, but obviously through a different set
of log operations.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 586dfdaa 18-Jan-2006 Steven Whitehouse <steve@chygwyn.com>

[GFS2] Make the new argument to gfs2_trans_add_bh() actually do something

Passes the flag through to ensure that the correct log operations are
invoked when the flag is set.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# d4e9c4c3 18-Jan-2006 Steven Whitehouse <steve@chygwyn.com>

[GFS2] Add an additional argument to gfs2_trans_add_bh()

This adds an extra argument to gfs2_trans_add_bh() to indicate whether the
bh being added to the transaction is metadata or data. It's currently unused
since all existing callers set it to 1 (metadata) but following patches will
make use of it.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# b3b94faa 16-Jan-2006 David Teigland <teigland@redhat.com>

[GFS2] The core of GFS2

This patch contains all the core files for GFS2.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>