History log of /linux-master/fs/ocfs2/journal.c
Revision Date Author Comments
# 783cc68d 14-Jan-2022 Zhang Mingyu <zhang.mingyu@zte.com.cn>

ocfs2: use BUG_ON instead of if condition followed by BUG.

This issue was detected with the help of Coccinelle.

Link: https://lkml.kernel.org/r/20211105014424.75372-1-zhang.mingyu@zte.com.cn
Signed-off-by: Zhang Mingyu <zhang.mingyu@zte.com.cn>
Reported-by: Zeal Robot <zealci@zte.com.cn>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 111e7049 19-Oct-2021 Eric W. Biederman <ebiederm@xmission.com>

exit/kthread: Have kernel threads return instead of calling do_exit

In 2009 Oleg reworked[1] the kernel threads so that it is not
necessary to call do_exit if you are not using kthread_stop(). Remove
the explicit calls of do_exit and complete_and_exit (with a NULL
completion) that were previously necessary.

[1] 63706172f332 ("kthreads: rework kthread_stop()")
Link: https://lkml.kernel.org/r/20211020174406.17889-12-ebiederm@xmission.com
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>


# da5e7c87 05-Nov-2021 Valentin Vidic <vvidic@valentin-vidic.from.hr>

ocfs2: cleanup journal init and shutdown

Allocate and free struct ocfs2_journal in ocfs2_journal_init and
ocfs2_journal_shutdown. Init and release of system inodes references
the journal so reorder calls to make sure they work correctly.

Link: https://lkml.kernel.org/r/20211009145006.3478-1-vvidic@valentin-vidic.from.hr
Signed-off-by: Valentin Vidic <vvidic@valentin-vidic.from.hr>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 01d5d965 18-May-2021 Leah Rumancik <leah.rumancik@gmail.com>

ext4: add discard/zeroout flags to journal flush

Add a flags argument to jbd2_journal_flush to enable discarding or
zero-filling the journal blocks while flushing the journal.

Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Link: https://lore.kernel.org/r/20210518151327.130198-1-leah.rumancik@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# fa60ce2c 06-May-2021 Masahiro Yamada <masahiroy@kernel.org>

treewide: remove editor modelines and cruft

The section "19) Editor modelines and other cruft" in
Documentation/process/coding-style.rst clearly says, "Do not include any
of these in source files."

I recently receive a patch to explicitly add a new one.

Let's do treewide cleanups, otherwise some people follow the existing code
and attempt to upstream their favoriate editor setups.

It is even nicer if scripts/checkpatch.pl can check it.

If we like to impose coding style in an editor-independent manner, I think
editorconfig (patch [1]) is a saner solution.

[1] https://lore.kernel.org/lkml/20200703073143.423557-1-danny@kdrag0n.dev/

Link: https://lkml.kernel.org/r/20210324054457.1477489-1-masahiroy@kernel.org
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Miguel Ojeda <ojeda@kernel.org> [auxdisplay]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# ede7dc7f 05-Nov-2020 Harshad Shirwadkar <harshadshirwadkar@gmail.com>

jbd2: rename j_maxlen to j_total_len and add jbd2_journal_max_txn_bufs

The on-disk superblock field sb->s_maxlen represents the total size of
the journal including the fast commit area and is no more the max
number of blocks available for a transaction. The maximum number of
blocks available to a transaction is reduced by the number of fast
commit blocks. So, this patch renames j_maxlen to j_total_len to
better represent its intent. Also, it adds a function to calculate max
number of bufs available for a transaction.

Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20201106035911.1942128-6-harshadshirwadkar@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# 342af94e 05-Oct-2020 Mauricio Faria de Oliveira <mfo@canonical.com>

jbd2, ext4, ocfs2: introduce/use journal callbacks j_submit|finish_inode_data_buffers()

Introduce journal callbacks to allow different behaviors
for an inode in journal_submit|finish_inode_data_buffers().

The existing users of the current behavior (ext4, ocfs2)
are adapted to use the previously exported functions
that implement the current behavior.

Users are callers of jbd2_journal_inode_ranged_write|wait(),
which adds the inode to the transaction's inode list with
the JI_WRITE|WAIT_DATA flags. Only ext4 and ocfs2 in-tree.

Both CONFIG_EXT4_FS and CONFIG_OCSFS2_FS select CONFIG_JBD2,
which builds fs/jbd2/commit.c and journal.c that define and
export the functions, so we can call directly in ext4/ocfs2.

Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Suggested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Link: https://lore.kernel.org/r/20201006004841.600488-3-mfo@canonical.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# 3c9210d4 01-Apr-2020 Gustavo A. R. Silva <gustavo@embeddedor.com>

ocfs2: replace zero-length array with flexible-array member

The current codebase makes use of the zero-length array language extension
to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
int stuff;
struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning in
case the flexible array does not occur last in the structure, which will
help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by this
change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Link: http://lkml.kernel.org/r/20200213160244.GA6088@embeddedor
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 397eac17 04-Jan-2020 Kai Li <li.kai4@h3c.com>

ocfs2: call journal flush to mark journal as empty after journal recovery when mount

If journal is dirty when mount, it will be replayed but jbd2 sb log tail
cannot be updated to mark a new start because journal->j_flag has
already been set with JBD2_ABORT first in journal_init_common.

When a new transaction is committed, it will be recored in block 1
first(journal->j_tail is set to 1 in journal_reset). If emergency
restart happens again before journal super block is updated
unfortunately, the new recorded trans will not be replayed in the next
mount.

The following steps describe this procedure in detail.
1. mount and touch some files
2. these transactions are committed to journal area but not checkpointed
3. emergency restart
4. mount again and its journals are replayed
5. journal super block's first s_start is 1, but its s_seq is not updated
6. touch a new file and its trans is committed but not checkpointed
7. emergency restart again
8. mount and journal is dirty, but trans committed in 6 will not be
replayed.

This exception happens easily when this lun is used by only one node.
If it is used by multi-nodes, other node will replay its journal and its
journal super block will be updated after recovery like what this patch
does.

ocfs2_recover_node->ocfs2_replay_journal.

The following jbd2 journal can be generated by touching a new file after
journal is replayed, and seq 15 is the first valid commit, but first seq
is 13 in journal super block.

logdump:
Block 0: Journal Superblock
Seq: 0 Type: 4 (JBD2_SUPERBLOCK_V2)
Blocksize: 4096 Total Blocks: 32768 First Block: 1
First Commit ID: 13 Start Log Blknum: 1
Error: 0
Feature Compat: 0
Feature Incompat: 2 block64
Feature RO compat: 0
Journal UUID: 4ED3822C54294467A4F8E87D2BA4BC36
FS Share Cnt: 1 Dynamic Superblk Blknum: 0
Per Txn Block Limit Journal: 0 Data: 0

Block 1: Journal Commit Block
Seq: 14 Type: 2 (JBD2_COMMIT_BLOCK)

Block 2: Journal Descriptor
Seq: 15 Type: 1 (JBD2_DESCRIPTOR_BLOCK)
No. Blocknum Flags
0. 587 none
UUID: 00000000000000000000000000000000
1. 8257792 JBD2_FLAG_SAME_UUID
2. 619 JBD2_FLAG_SAME_UUID
3. 24772864 JBD2_FLAG_SAME_UUID
4. 8257802 JBD2_FLAG_SAME_UUID
5. 513 JBD2_FLAG_SAME_UUID JBD2_FLAG_LAST_TAG
...
Block 7: Inode
Inode: 8257802 Mode: 0640 Generation: 57157641 (0x3682809)
FS Generation: 2839773110 (0xa9437fb6)
CRC32: 00000000 ECC: 0000
Type: Regular Attr: 0x0 Flags: Valid
Dynamic Features: (0x1) InlineData
User: 0 (root) Group: 0 (root) Size: 7
Links: 1 Clusters: 0
ctime: 0x5de5d870 0x11104c61 -- Tue Dec 3 11:37:20.286280801 2019
atime: 0x5de5d870 0x113181a1 -- Tue Dec 3 11:37:20.288457121 2019
mtime: 0x5de5d870 0x11104c61 -- Tue Dec 3 11:37:20.286280801 2019
dtime: 0x0 -- Thu Jan 1 08:00:00 1970
...
Block 9: Journal Commit Block
Seq: 15 Type: 2 (JBD2_COMMIT_BLOCK)

The following is journal recovery log when recovering the upper jbd2
journal when mount again.

syslog:
ocfs2: File system on device (252,1) was not unmounted cleanly, recovering it.
fs/jbd2/recovery.c:(do_one_pass, 449): Starting recovery pass 0
fs/jbd2/recovery.c:(do_one_pass, 449): Starting recovery pass 1
fs/jbd2/recovery.c:(do_one_pass, 449): Starting recovery pass 2
fs/jbd2/recovery.c:(jbd2_journal_recover, 278): JBD2: recovery, exit status 0, recovered transactions 13 to 13

Due to first commit seq 13 recorded in journal super is not consistent
with the value recorded in block 1(seq is 14), journal recovery will be
terminated before seq 15 even though it is an unbroken commit, inode
8257802 is a new file and it will be lost.

Link: http://lkml.kernel.org/r/20191217020140.2197-1-li.kai4@h3c.com
Signed-off-by: Kai Li <li.kai4@h3c.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Changwei Ge <gechangwei@live.cn>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# fdc3ef88 05-Nov-2019 Jan Kara <jack@suse.cz>

jbd2: Reserve space for revoke descriptor blocks

Extend functions for starting, extending, and restarting transaction
handles to take number of revoke records handle must be able to
accommodate. These functions then make sure transaction has enough
credits to be able to store resulting revoke descriptor blocks. Also
revoke code tracks number of revoke records created by a handle to catch
situation where some place didn't reserve enough space for revoke
records. Similarly to standard transaction credits, space for unused
reserved revoke records is released when the handle is stopped.

On the ext4 side we currently take a simplistic approach of reserving
space for 1024 revoke records for any transaction. This grows amount of
credits reserved for each handle only by a few and is enough for any
normal workload so that we don't hit warnings in jbd2. We will refine
the logic in following commits.

Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20191105164437.32602-20-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# 9797a902 05-Nov-2019 Jan Kara <jack@suse.cz>

ocfs2: Use accessor function for h_buffer_credits

Use the jbd2 accessor function for h_buffer_credits.

Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20191105164437.32602-12-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# b918c430 18-Oct-2019 Yi Li <yilikernel@gmail.com>

ocfs2: fix panic due to ocfs2_wq is null

mount.ocfs2 failed when reading ocfs2 filesystem superblock encounters
an error. ocfs2_initialize_super() returns before allocating ocfs2_wq.
ocfs2_dismount_volume() triggers the following panic.

Oct 15 16:09:27 cnwarekv-205120 kernel: On-disk corruption discovered.Please run fsck.ocfs2 once the filesystem is unmounted.
Oct 15 16:09:27 cnwarekv-205120 kernel: (mount.ocfs2,22804,44): ocfs2_read_locked_inode:537 ERROR: status = -30
Oct 15 16:09:27 cnwarekv-205120 kernel: (mount.ocfs2,22804,44): ocfs2_init_global_system_inodes:458 ERROR: status = -30
Oct 15 16:09:27 cnwarekv-205120 kernel: (mount.ocfs2,22804,44): ocfs2_init_global_system_inodes:491 ERROR: status = -30
Oct 15 16:09:27 cnwarekv-205120 kernel: (mount.ocfs2,22804,44): ocfs2_initialize_super:2313 ERROR: status = -30
Oct 15 16:09:27 cnwarekv-205120 kernel: (mount.ocfs2,22804,44): ocfs2_fill_super:1033 ERROR: status = -30
------------[ cut here ]------------
Oops: 0002 [#1] SMP NOPTI
CPU: 1 PID: 11753 Comm: mount.ocfs2 Tainted: G E
4.14.148-200.ckv.x86_64 #1
Hardware name: Sugon H320-G30/35N16-US, BIOS 0SSDX017 12/21/2018
task: ffff967af0520000 task.stack: ffffa5f05484000
RIP: 0010:mutex_lock+0x19/0x20
Call Trace:
flush_workqueue+0x81/0x460
ocfs2_shutdown_local_alloc+0x47/0x440 [ocfs2]
ocfs2_dismount_volume+0x84/0x400 [ocfs2]
ocfs2_fill_super+0xa4/0x1270 [ocfs2]
? ocfs2_initialize_super.isa.211+0xf20/0xf20 [ocfs2]
mount_bdev+0x17f/0x1c0
mount_fs+0x3a/0x160

Link: http://lkml.kernel.org/r/1571139611-24107-1-git-send-email-yili@winhong.com
Signed-off-by: Yi Li <yilikernel@gmail.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 328970de 23-May-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 145

Based on 1 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version this program is distributed in the
hope that it will be useful but without any warranty without even
the implied warranty of merchantability or fitness for a particular
purpose see the gnu general public license for more details you
should have received a copy of the gnu general public license along
with this program if not write to the free software foundation inc
59 temple place suite 330 boston ma 021110 1307 usa

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 84 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Richard Fontana <rfontana@redhat.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190524100844.756442981@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# d85400af 28-Dec-2018 Junxiao Bi <junxiao.bi@oracle.com>

ocfs2: clear journal dirty flag after shutdown journal

Dirty flag of the journal should be cleared at the last stage of umount,
if do it before jbd2_journal_destroy(), then some metadata in uncommitted
transaction could be lost due to io error, but as dirty flag of journal
was already cleared, we can't find that until run a full fsck. This may
cause system panic or other corruption.

Link: http://lkml.kernel.org/r/20181121020023.3034-3-junxiao.bi@oracle.com
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Reviewed-by: Joseph Qi <jiangqi903@gmail.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@versity.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 21158ca8 02-Nov-2018 Guozhonghua <guozhonghua@h3c.com>

ocfs2: without quota support, avoid calling quota recovery

During one dead node's recovery by other node, quota recovery work will
be queued. We should avoid calling quota when it is not supported, so
check the quota flags.

Link: http://lkml.kernel.org/r/71604351584F6A4EBAE558C676F37CA401071AC9FB@H3CMLB12-EX.srv.huawei-3com.com
Signed-off-by: guozhonghua <guozhonghua@h3c.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 6396bb22 12-Jun-2018 Kees Cook <keescook@chromium.org>

treewide: kzalloc() -> kcalloc()

The kzalloc() function has a 2-factor argument form, kcalloc(). This
patch replaces cases of:

kzalloc(a * b, gfp)

with:
kcalloc(a * b, gfp)

as well as handling cases of:

kzalloc(a * b * c, gfp)

with:

kzalloc(array3_size(a, b, c), gfp)

as it's slightly less ugly than:

kzalloc_array(array_size(a, b), c, gfp)

This does, however, attempt to ignore constant size factors like:

kzalloc(4 * 1024, gfp)

though any constants defined via macros get caught up in the conversion.

Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.

The Coccinelle script used for this was:

// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@

(
kzalloc(
- (sizeof(TYPE)) * E
+ sizeof(TYPE) * E
, ...)
|
kzalloc(
- (sizeof(THING)) * E
+ sizeof(THING) * E
, ...)
)

// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@

(
kzalloc(
- sizeof(u8) * (COUNT)
+ COUNT
, ...)
|
kzalloc(
- sizeof(__u8) * (COUNT)
+ COUNT
, ...)
|
kzalloc(
- sizeof(char) * (COUNT)
+ COUNT
, ...)
|
kzalloc(
- sizeof(unsigned char) * (COUNT)
+ COUNT
, ...)
|
kzalloc(
- sizeof(u8) * COUNT
+ COUNT
, ...)
|
kzalloc(
- sizeof(__u8) * COUNT
+ COUNT
, ...)
|
kzalloc(
- sizeof(char) * COUNT
+ COUNT
, ...)
|
kzalloc(
- sizeof(unsigned char) * COUNT
+ COUNT
, ...)
)

// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@

(
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * (COUNT_ID)
+ COUNT_ID, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * COUNT_ID
+ COUNT_ID, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * (COUNT_CONST)
+ COUNT_CONST, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * COUNT_CONST
+ COUNT_CONST, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * (COUNT_ID)
+ COUNT_ID, sizeof(THING)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * COUNT_ID
+ COUNT_ID, sizeof(THING)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * (COUNT_CONST)
+ COUNT_CONST, sizeof(THING)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * COUNT_CONST
+ COUNT_CONST, sizeof(THING)
, ...)
)

// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@

- kzalloc
+ kcalloc
(
- SIZE * COUNT
+ COUNT, SIZE
, ...)

// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@

(
kzalloc(
- sizeof(TYPE) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kzalloc(
- sizeof(TYPE) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kzalloc(
- sizeof(TYPE) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kzalloc(
- sizeof(TYPE) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kzalloc(
- sizeof(THING) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kzalloc(
- sizeof(THING) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kzalloc(
- sizeof(THING) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kzalloc(
- sizeof(THING) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
)

// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@

(
kzalloc(
- sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
kzalloc(
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
kzalloc(
- sizeof(THING1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
kzalloc(
- sizeof(THING1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
kzalloc(
- sizeof(TYPE1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
|
kzalloc(
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
)

// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@

(
kzalloc(
- (COUNT) * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- COUNT * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- COUNT * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- (COUNT) * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- COUNT * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- (COUNT) * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- (COUNT) * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kzalloc(
- COUNT * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
)

// Any remaining multi-factor products, first at least 3-factor products,
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@

(
kzalloc(C1 * C2 * C3, ...)
|
kzalloc(
- (E1) * E2 * E3
+ array3_size(E1, E2, E3)
, ...)
|
kzalloc(
- (E1) * (E2) * E3
+ array3_size(E1, E2, E3)
, ...)
|
kzalloc(
- (E1) * (E2) * (E3)
+ array3_size(E1, E2, E3)
, ...)
|
kzalloc(
- E1 * E2 * E3
+ array3_size(E1, E2, E3)
, ...)
)

// And then all remaining 2 factors products when they're not all constants,
// keeping sizeof() as the second factor argument.
@@
expression THING, E1, E2;
type TYPE;
constant C1, C2, C3;
@@

(
kzalloc(sizeof(THING) * C2, ...)
|
kzalloc(sizeof(TYPE) * C2, ...)
|
kzalloc(C1 * C2 * C3, ...)
|
kzalloc(C1 * C2, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * (E2)
+ E2, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(TYPE) * E2
+ E2, sizeof(TYPE)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * (E2)
+ E2, sizeof(THING)
, ...)
|
- kzalloc
+ kcalloc
(
- sizeof(THING) * E2
+ E2, sizeof(THING)
, ...)
|
- kzalloc
+ kcalloc
(
- (E1) * E2
+ E1, E2
, ...)
|
- kzalloc
+ kcalloc
(
- (E1) * (E2)
+ E1, E2
, ...)
|
- kzalloc
+ kcalloc
(
- E1 * E2
+ E1, E2
, ...)
)

Signed-off-by: Kees Cook <keescook@chromium.org>


# d984187e 31-Jan-2018 piaojun <piaojun@huawei.com>

ocfs2: return error when we attempt to access a dirty bh in jbd2

We should not reuse the dirty bh in jbd2 directly due to the following
situation:

1. When removing extent rec, we will dirty the bhs of extent rec and
truncate log at the same time, and hand them over to jbd2.

2. The bhs are submitted to jbd2 area successfully.

3. The write-back thread of device help flush the bhs to disk but
encounter write error due to abnormal storage link.

4. After a while the storage link become normal. Truncate log flush
worker triggered by the next space reclaiming found the dirty bh of
truncate log and clear its 'BH_Write_EIO' and then set it uptodate in
__ocfs2_journal_access():

ocfs2_truncate_log_worker
ocfs2_flush_truncate_log
__ocfs2_flush_truncate_log
ocfs2_replay_truncate_records
ocfs2_journal_access_di
__ocfs2_journal_access // here we clear io_error and set 'tl_bh' uptodata.

5. Then jbd2 will flush the bh of truncate log to disk, but the bh of
extent rec is still in error state, and unfortunately nobody will
take care of it.

6. At last the space of extent rec was not reduced, but truncate log
flush worker have given it back to globalalloc. That will cause
duplicate cluster problem which could be identified by fsck.ocfs2.

Sadly we can hardly revert this but set fs read-only in case of ruining
atomicity and consistency of space reclaim.

Link: http://lkml.kernel.org/r/5A6E8092.8090701@huawei.com
Fixes: acf8fdbe6afb ("ocfs2: do not BUG if buffer not uptodate in __ocfs2_journal_access")
Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Reviewed-by: Changwei Ge <ge.changwei@h3c.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 964f14a0 06-Sep-2017 Jun Piao <piaojun@huawei.com>

ocfs2: clean up some dead code

clean up some unused functions and parameters.

Link: http://lkml.kernel.org/r/598A5E21.2080807@huawei.com
Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Alex Chen <alex.chen@huawei.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 395627b0 12-Dec-2016 Deepa Dinamani <deepa.kernel@gmail.com>

ocfs2: use time64_t to represent orphan scan times

struct timespec is not y2038 safe. Use time64_t which is y2038 safe to
represent orphan scan times. time64_t is sufficient here as only the
seconds delta times are relevant.

Also use appropriate time functions that return time in time64_t format.
Time functions now return monotonic time instead of real time as only
delta scan times are relevant and these values are not persistent across
reboots.

The format string for the debug print is still using long as this is
only the time elapsed since the last scan and long is sufficient to
represent this value.

Link: http://lkml.kernel.org/r/1475365138-20567-1-git-send-email-deepa.kernel@gmail.com
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 0b492f68 26-Jul-2016 Junxiao Bi <junxiao.bi@oracle.com>

ocfs2: improve recovery performance

Journal replay will be run when performing recovery for a dead node. To
avoid the stale cache impact, all blocks of dead node's journal inode
were reloaded from disk. This hurts the performance. Check whether one
block is cached before reloading it can improve performance a lot. In
my test env, the time doing recovery was improved from 120s to 1s.

[akpm@linux-foundation.org: clean up the for loop p_blkno handling]
Link: http://lkml.kernel.org/r/1466155682-24656-1-git-send-email-junxiao.bi@oracle.com
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Cc: "Gang He" <ghe@suse.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 35ddf78e 25-Mar-2016 jiangyiwen <jiangyiwen@huawei.com>

ocfs2: fix occurring deadlock by changing ocfs2_wq from global to local

This patch fixes a deadlock, as follows:

Node 1 Node 2 Node 3
1)volume a and b are only mount vol a only mount vol b
mounted

2) start to mount b start to mount a

3) check hb of Node 3 check hb of Node 2
in vol a, qs_holds++ in vol b, qs_holds++

4) -------------------- all nodes' network down --------------------

5) progress of mount b the same situation as
failed, and then call Node 2
ocfs2_dismount_volume.
but the process is hung,
since there is a work
in ocfs2_wq cannot beo
completed. This work is
about vol a, because
ocfs2_wq is global wq.
BTW, this work which is
scheduled in ocfs2_wq is
ocfs2_orphan_scan_work,
and the context in this work
needs to take inode lock
of orphan_dir, because
lockres owner are Node 1 and
all nodes' nework has been down
at the same time, so it can't
get the inode lock.

6) Why can't this node be fenced
when network disconnected?
Because the process of
mount is hung what caused qs_holds
is not equal 0.

Because all works in the ocfs2_wq are relative to the super block.

The solution is to change the ocfs2_wq from global to local. In other
words, move it into struct ocfs2_super.

Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Xue jiufei <xuejiufei@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 5955102c 22-Jan-2016 Al Viro <viro@zeniv.linux.org.uk>

wrappers for ->i_mutex access

parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
inode_foo(inode) being mutex_foo(&inode->i_mutex).

Please, use those for access to ->i_mutex; over the coming cycle
->i_mutex will become rwsem, with ->lookup() done with it held
only shared.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 72865d92 14-Jan-2016 Joseph Qi <joseph.qi@huawei.com>

ocfs2: clean up redundant NULL check before iput

Since iput will take care the NULL check itself, NULL check before
calling it is redundant. So clean them up.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 5afc44e2 05-Nov-2015 Joseph Qi <joseph.qi@huawei.com>

ocfs2: add uuid to ocfs2 thread name for problem analysis

A node can mount multiple ocfs2 volumes. And if thread names are same for
each volume/domain, it will bring inconvenience when analyzing problems
because we have to identify which volume/domain the messages belong to.

Since thread name will be printed to messages, so add volume uuid or dlm
name to thread name can benefit problem analysis.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Gang He <ghe@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 93d911fc 05-Nov-2015 Joseph Qi <joseph.qi@huawei.com>

ocfs2: only take lock if dio entry when recover orphans

We have no need to take inode mutex, rw and inode lock if it is not dio
entry when recover orphans. Optimize it by adding a flag
OCFS2_INODE_DIO_ORPHAN_ENTRY to ocfs2_inode_info to reduce contention.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 30edc43c 05-Nov-2015 Joseph Qi <joseph.qi@huawei.com>

ocfs2: do not include dio entry in case of orphan scan

dio entry will only do truncate in case of ORPHAN_NEED_TRUNCATE. So do
not include it when doing normal orphan scan to reduce contention.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 7ecef14a 04-Sep-2015 Joe Perches <joe@perches.com>

ocfs2: neaten do_error, ocfs2_error and ocfs2_abort

These uses sometimes do and sometimes don't have '\n' terminations. Make
the uses consistently use '\n' terminations and remove the newline from
the functions.

Miscellanea:

o Coalesce formats
o Realign arguments

Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# ad694821 04-Sep-2015 Joseph Qi <joseph.qi@huawei.com>

ocfs2: fix race between crashed dio and rm

There is a race case between crashed dio and rm, which will lead to
OCFS2_VALID_FL not set read-only.

N1 N2
------------------------------------------------------------------------
dd with direct flag
rm file
crashed with an dio entry left
in orphan dir
clear OCFS2_VALID_FL in
ocfs2_remove_inode
recover N1 and read the corrupted inode,
and set filesystem read-only

So we skip the inode deletion this time and wait for dio entry recovered
first.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# acf8fdbe 04-Sep-2015 Joseph Qi <joseph.qi@huawei.com>

ocfs2: do not BUG if buffer not uptodate in __ocfs2_journal_access

When storage network is unstable, it may trigger the BUG in
__ocfs2_journal_access because of buffer not uptodate. We can retry the
write in this case or return error instead of BUG.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Reported-by: Zhangguanghui <zhang.guanghui@h3c.com>
Tested-by: Zhangguanghui <zhang.guanghui@h3c.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 512f62ac 04-Sep-2015 Joseph Qi <joseph.qi@huawei.com>

ocfs2: fix race between dio and recover orphan

During direct io the inode will be added to orphan first and then
deleted from orphan. There is a race window that the orphan entry will
be deleted twice and thus trigger the BUG when validating
OCFS2_DIO_ORPHANED_FL in ocfs2_del_inode_from_orphan.

ocfs2_direct_IO_write
...
ocfs2_add_inode_to_orphan
>>>>>>>> race window.
1) another node may rm the file and then down, this node
take care of orphan recovery and clear flag
OCFS2_DIO_ORPHANED_FL.
2) since rw lock is unlocked, it may race with another
orphan recovery and append dio.
ocfs2_del_inode_from_orphan

So take inode mutex lock when recovering orphans and make rw unlock at the
end of aio write in case of append dio.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Reported-by: Yiwen Jiang <jiangyiwen@huawei.com>
Cc: Weiwei Wang <wangww631@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# b519ea6d 24-Jun-2015 Joseph Qi <joseph.qi@huawei.com>

ocfs2: mark local functions as static

Some functions are only used locally, so mark them as static.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 74e364ad 24-Jun-2015 Xue jiufei <xuejiufei@huawei.com>

ocfs2: fix NULL pointer dereference in function ocfs2_abort_trigger()

ocfs2_abort_trigger() use bh->b_assoc_map to get sb. But there's no
function to set bh->b_assoc_map in ocfs2, it will trigger NULL pointer
dereference while calling this function. We can get sb from
bh->b_bdev->bd_super instead of b_assoc_map.

[akpm@linux-foundation.org: update comment, per Joseph]
Signed-off-by: joyce.xue <xuejiufei@huawei.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# e272e7f0 24-Jun-2015 Joseph Qi <joseph.qi@huawei.com>

ocfs2: do not BUG if jbd2_journal_dirty_metadata fails

jbd2_journal_dirty_metadata may fail. Currently it cannot take care of
non zero return value and just BUG in ocfs2_journal_dirty. This patch is
aborting the handle and journal instead of BUG.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: joyce.xue <xuejiufei@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# cf1776a9 24-Jun-2015 Joseph Qi <joseph.qi@huawei.com>

ocfs2: fix a tiny race when truncate dio orohaned entry

Once dio crashed it will leave an entry in orphan dir. And orphan scan
will take care of the clean up. There is a tiny race case that the same
entry will be truncated twice and then trigger the BUG in
ocfs2_del_inode_from_orphan.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 4813962b 16-Feb-2015 Joseph Qi <joseph.qi@huawei.com>

ocfs2: wait for orphan recovery first once append O_DIRECT write crash

If one node has crashed with orphan entry leftover, another node which do
append O_DIRECT write to the same file will override the
i_dio_orphaned_slot. Then the old entry won't be cleaned forever. If
this case happens, we let it wait for orphan recovery first.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Weiwei Wang <wangww631@huawei.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Xuejiufei <xuejiufei@huawei.com>
Cc: alex chen <alex.chen@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# ed460cff 16-Feb-2015 Joseph Qi <joseph.qi@huawei.com>

ocfs2: add orphan recovery types in ocfs2_recover_orphans

Define two orphan recovery types, which indicates if need truncate file or
not.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Weiwei Wang <wangww631@huawei.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Xuejiufei <xuejiufei@huawei.com>
Cc: alex chen <alex.chen@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 9d6008c7 10-Feb-2015 Daeseok Youn <daeseok.youn@gmail.com>

ocfs2: remove unreachable code in __ocfs2_recovery_thread()

Signed-off-by: Daeseok Youn <daeseok.youn@gmail.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# ac7576f4 30-Oct-2014 Miklos Szeredi <miklos@szeredi.hu>

vfs: make first argument of dir_context.actor typed

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 55b465b6 04-Jun-2014 Joseph Qi <joseph.qi@huawei.com>

ocfs2: limit printk when journal is aborted

Once JBD2_ABORT is set, ocfs2_commit_cache will fail in
ocfs2_commit_thread. Then it will get into a loop with mass logs. This
will meaninglessly consume a larger number of resource and may lead to
the system hanging. So limit printk in this case.

[akpm@linux-foundation.org: document the msleep]
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 7bf619c1 03-Apr-2014 Jan Kara <jack@suse.cz>

ocfs2: remove OCFS2_INODE_SKIP_DELETE flag

The flag was never set, delete it.

Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# f17c20dd 11-Sep-2013 Junxiao Bi <junxiao.bi@oracle.com>

ocfs2: use i_size_read() to access i_size

Though ocfs2 uses inode->i_mutex to protect i_size, there are both
i_size_read/write() and direct accesses. Clean up all direct access to
eliminate confusion.

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Jie Liu <jeff.liu@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 2b1e55c3 11-Sep-2013 Younger Liu <younger.liu@huawei.com>

ocfs2: lighten up allocate transaction

The issue scenario is as following:

When fallocating a very large disk space for a small file,
__ocfs2_extend_allocation attempts to get a very large transaction. For
some journal sizes, there may be not enough room for this transaction,
and the fallocate will fail.

The patch below extends & restarts the transaction as necessary while
allocating space, and should work with even the smallest journal. This
patch refers ext4 resize.

Test:
# mkfs.ocfs2 -b 4K -C 32K -T datafiles /dev/sdc
...(jounral size is 32M)
# mount.ocfs2 /dev/sdc /mnt/ocfs2/
# touch /mnt/ocfs2/1.log
# fallocate -o 0 -l 400G /mnt/ocfs2/1.log
fallocate: /mnt/ocfs2/1.log: fallocate failed: Cannot allocate memory
# tail -f /var/log/messages
[ 7372.278591] JBD: fallocate wants too many credits (2051 > 2048)
[ 7372.278597] (fallocate,6438,0):__ocfs2_extend_allocation:709 ERROR: status = -12
[ 7372.278603] (fallocate,6438,0):ocfs2_allocate_unwritten_extents:1504 ERROR: status = -12
[ 7372.278607] (fallocate,6438,0):__ocfs2_change_file_space:1955 ERROR: status = -12
^C
With this patch, the test works well.

Signed-off-by: Younger Liu <younger.liu@huawei.com>
Cc: Jie Liu <jeff.liu@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 3704412b 22-May-2013 Al Viro <viro@zeniv.linux.org.uk>

[readdir] convert ocfs2

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# d787ab09 21-Feb-2013 Tim Gardner <tim.gardner@canonical.com>

ocfs2: remove kfree() redundant null checks

smatch analysis indicates a number of redundant NULL checks before
calling kfree(), eg:

fs/ocfs2/alloc.c:6138 ocfs2_begin_truncate_log_recovery() info:
redundant null check on *tl_copy calling kfree()

fs/ocfs2/alloc.c:6755 ocfs2_zero_range_for_truncate() info:
redundant null check on pages calling kfree()

etc....

[akpm@linux-foundation.org: revert dubious change in ocfs2_begin_truncate_log_recovery()]
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Acked-by: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# fef6925c 12-Jun-2012 Jan Kara <jack@suse.cz>

ocfs2: Convert to new freezing mechanism

Protect ocfs2_page_mkwrite() and ocfs2_file_aio_write() using the new freeze
protection. We also protect several ioctl entry points which were missing the
protection. Finally, we add freeze protection to the journaling mechanism so
that iput() of unlinked inode cannot modify a frozen filesystem.

CC: Mark Fasheh <mfasheh@suse.com>
CC: Joel Becker <jlbec@evilplan.org>
CC: ocfs2-devel@oss.oracle.com
Acked-by: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# a035bff6 24-Jul-2011 Sunil Mushran <sunil.mushran@oracle.com>

ocfs2: Add comment about orphan scanning

Add a comment that explains the reason as to why orphan scan scans all the slots.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>


# 619c200d 24-Jul-2011 Sunil Mushran <sunil.mushran@oracle.com>

ocfs2: Clean up messages in the fs

Convert useful messages from ML_NOTICE to KERN_NOTICE to improve readability.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>


# 10b3dd76 04-May-2011 Sunil Mushran <sunil.mushran@oracle.com>

ocfs2: Skip mount recovery for hard-ro mounts

Patch skips mount recovery for hard-ro mounts which otherwise leads to an oops.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Acked-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Joel Becker <jlbec@evilplan.org>


# 25985edc 30-Mar-2011 Lucas De Marchi <lucas.demarchi@profusion.mobi>

Fix common misspellings

Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>


# b4107950 23-Feb-2011 Tao Ma <boyu.mt@taobao.com>

ocfs2: Remove masklog ML_JOURNAL.

Remove mlog(0) from fs/ocfs2/journal.c and the masklog JOURNAL.

Signed-off-by: Tao Ma <boyu.mt@taobao.com>


# c1e8d35e 07-Mar-2011 Tao Ma <boyu.mt@taobao.com>

ocfs2: Remove EXIT from masklog.

mlog_exit is used to record the exit status of a function.
But because it is added in so many functions, if we enable it,
the system logs get filled up quickly and cause too much I/O.
So actually no one can open it for a production system or even
for a test.

This patch just try to remove it or change it. So:
1. if all the error paths already use mlog_errno, it is just removed.
Otherwise, it will be replaced by mlog_errno.
2. if it is used to print some return value, it is replaced with
mlog(0,...).
mlog_exit_ptr is changed to mlog(0.
All those mlog(0,...) will be replaced with trace events later.

Signed-off-by: Tao Ma <boyu.mt@taobao.com>


# ef6b689b 20-Feb-2011 Tao Ma <boyu.mt@taobao.com>

ocfs2: Remove ENTRY from masklog.

ENTRY is used to record the entry of a function.
But because it is added in so many functions, if we enable it,
the system logs get filled up quickly and cause too much I/O.
So actually no one can open it for a production system or even
for a test.

So for mlog_entry_void, we just remove it.
for mlog_entry(...), we replace it with mlog(0,...), and they
will be replace by trace event later.

Signed-off-by: Tao Ma <boyu.mt@taobao.com>


# 17ae5211 01-Aug-2010 Tao Ma <tao.ma@oracle.com>

ocfs2: Remove obsolete comments before ocfs2_start_trans.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>


# f9c57ada 01-Aug-2010 Tao Ma <tao.ma@oracle.com>

ocfs2: Remove unused old_id in ocfs2_commit_cache.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>


# 3c3f20c9 31-May-2010 Tao Ma <tao.ma@oracle.com>

ocfs2: Add some trace log for orphan scan.

Now orphan scan worker has no trace log, so it is
very hard to tell whether it is finished or blocked.
So add 2 mlog trace log so that we can tell whether
the current orphan scan worker is blocked or not.
It does help when I analyzed a orphan scan bug.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>


# a931da6a 03-Aug-2010 Theodore Ts'o <tytso@mit.edu>

jbd2: Change j_state_lock to be a rwlock_t

Lockstat reports have shown that j_state_lock is a major source of
lock contention, especially on systems with more than 4 CPU cores. So
change it to be a read/write spinlock.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>


# 13ceef09 13-Jul-2010 Jan Kara <jack@suse.cz>

jbd2/ocfs2: Fix block checksumming when a buffer is used in several transactions

OCFS2 uses t_commit trigger to compute and store checksum of the just
committed blocks. When a buffer has b_frozen_data, checksum is computed
for it instead of b_data but this can result in an old checksum being
written to the filesystem in the following scenario:

1) transaction1 is opened
2) handle1 is opened
3) journal_access(handle1, bh)
- This sets jh->b_transaction to transaction1
4) modify(bh)
5) journal_dirty(handle1, bh)
6) handle1 is closed
7) start committing transaction1, opening transaction2
8) handle2 is opened
9) journal_access(handle2, bh)
- This copies off b_frozen_data to make it safe for transaction1 to commit.
jh->b_next_transaction is set to transaction2.
10) jbd2_journal_write_metadata() checksums b_frozen_data
11) the journal correctly writes b_frozen_data to the disk journal
12) handle2 is closed
- There was no dirty call for the bh on handle2, so it is never queued for
any more journal operation
13) Checkpointing finally happens, and it just spools the bh via normal buffer
writeback. This will write b_data, which was never triggered on and thus
contains a wrong (old) checksum.

This patch fixes the problem by calling the trigger at the moment data is
frozen for journal commit - i.e., either when b_frozen_data is created by
do_get_write_access or just before we write a buffer to the log if
b_frozen_data does not exist. We also rename the trigger to t_frozen as
that better describes when it is called.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>


# 40f165f4 28-May-2010 Tao Ma <tao.ma@oracle.com>

ocfs2: Move orphan scan work to ocfs2_wq.

We used to let orphan scan work in the default work queue,
but there is a corner case which will make the system deadlock.
The scenario is like this:
1. set heartbeat threadshold to 200. this will allow us to have a
great chance to have a orphan scan work before our quorum decision.
2. mount node 1.
3. after 1~2 minutes, mount node 2(in order to make the bug easier
to reproduce, better add maxcpus=1 to kernel command line).
4. node 1 do orphan scan work.
5. node 2 do orphan scan work.
6. node 1 do orphan scan work. After this, node 1 hold the orphan scan
lock while node 2 know node 1 is the master.
7. ifdown eth2 in node 2(eth2 is what we do ocfs2 interconnection).

Now when node 2 begins orphan scan, the system queue is blocked.

The root cause is that both orphan scan work and quorum decision work
will use the system event work queue. orphan scan has a chance of
blocking the event work queue(in dlm_wait_for_node_death) so that there
is no chance for quorum decision work to proceed.

This patch resolve it by moving orphan scan work to ocfs2_wq.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>


# c901fb00 26-Apr-2010 Tao Ma <tao.ma@oracle.com>

ocfs2: Make ocfs2_extend_trans() really extend.

In ocfs2, we use ocfs2_extend_trans() to extend a journal handle's
blocks. But if jbd2_journal_extend() fails, it will only restart
with the the new number of blocks. This tends to be awkward since
in most cases we want additional reserved blocks. It makes our code
harder to mantain since the caller can't be sure all the original
blocks will not be accessed and dirtied again. There are 15 callers
of ocfs2_extend_trans() in fs/ocfs2, and 12 of them have to add
h_buffer_credits before they call ocfs2_extend_trans(). This makes
ocfs2_extend_trans() really extend atop the original block count.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>


# ec20cec7 19-Mar-2010 Joel Becker <joel.becker@oracle.com>

ocfs2: Make ocfs2_journal_dirty() void.

jbd[2]_journal_dirty_metadata() only returns 0. It's been returning 0
since before the kernel moved to git. There is no point in checking
this error.

ocfs2_journal_dirty() has been faithfully returning the status since the
beginning. All over ocfs2, we have blocks of code checking this can't
fail status. In the past few years, we've tried to avoid adding these
checks, because they are pointless. But anyone who looks at our code
assumes they are needed.

Finally, ocfs2_journal_dirty() is made a void function. All error
checking is removed from other files. We'll BUG_ON() the status of
jbd2_journal_dirty_metadata() just in case they change it someday. They
won't.

Signed-off-by: Joel Becker <joel.becker@oracle.com>


# 2bd63216 25-Jan-2010 Sunil Mushran <sunil.mushran@oracle.com>

ocfs2/trivial: Remove trailing whitespaces

Patch removes trailing whitespaces.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>


# af901ca1 14-Nov-2009 André Goddard Rosa <andre.goddard@gmail.com>

tree-wide: fix assorted typos all over the place

That is "success", "unknown", "through", "performance", "[re|un]mapping"
, "access", "default", "reasonable", "[con]currently", "temperature"
, "channel", "[un]used", "application", "example","hierarchy", "therefore"
, "[over|under]flow", "contiguous", "threshold", "enough" and others.

Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>


# 93c97087 17-Aug-2009 Tao Ma <tao.ma@oracle.com>

ocfs2: Add metaecc for ocfs2_refcount_block.

Add metaecc and journal trigger for ocfs2_refcount_block.

Signed-off-by: Tao Ma <tao.ma@oracle.com>


# 0cf2f763 12-Feb-2009 Joel Becker <joel.becker@oracle.com>

ocfs2: Pass struct ocfs2_caching_info to the journal functions.

The next step in divorcing metadata I/O management from struct inode is
to pass struct ocfs2_caching_info to the journal functions. Thus the
journal locks a metadata cache with the cache io_lock function. It also
can compare ci_last_trans and ci_created_trans directly.

This is a large patch because of all the places we change
ocfs2_journal_access..(handle, inode, ...) to
ocfs2_journal_access..(handle, INODE_CACHE(inode), ...).

Signed-off-by: Joel Becker <joel.becker@oracle.com>


# 8cb471e8 10-Feb-2009 Joel Becker <joel.becker@oracle.com>

ocfs2: Take the inode out of the metadata read/write paths.

We are really passing the inode into the ocfs2_read/write_blocks()
functions to get at the metadata cache. This commit passes the cache
directly into the metadata block functions, divorcing them from the
inode.

Signed-off-by: Joel Becker <joel.becker@oracle.com>


# 8b712cd5 07-Jul-2009 Jeff Mahoney <jeffm@suse.com>

ocfs2: Fixup orphan scan cleanup after failed mount

If the mount fails for any reason, ocfs2_dismount_volume calls
ocfs2_orphan_scan_stop. It requires that ocfs2_orphan_scan_init
be called to setup the mutex and work queues, but that doesn't
happen if the mount has failed and we oops accessing an uninitialized
work queue.

This patch splits the init and startup of the orphan scan, eliminating
the oops.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>


# df152c24 22-Jun-2009 Sunil Mushran <sunil.mushran@oracle.com>

ocfs2: Disable orphan scanning for local and hard-ro mounts

Local and Hard-RO mounts do not need orphan scanning.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>


# 3211949f 19-Jun-2009 Sunil Mushran <sunil.mushran@oracle.com>

ocfs2: Do not initialize lvb in ocfs2_orphan_scan_lock_res_init()

We don't access the LVB in our ocfs2_*_lock_res_init() functions.

Since the LVB can become invalid during some cluster recovery
operations, the dlmglue must be able to handle an uninitialized
LVB.

For the orphan scan lock, we initialized an uninitialzed LVB with our
scan sequence number plus one. This starts a normal orphan scan
cycle.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>


# 692684e1 19-Jun-2009 Sunil Mushran <sunil.mushran@oracle.com>

ocfs2: Stop orphan scan as early as possible during umount

Currently if the orphan scan fires a tick before the user issues the umount,
the umount will wait for the queued orphan scan tasks to complete.

This patch makes the umount stop the orphan scan as early as possible so as
to reduce the probability of the queued tasks slowing down the umount.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>


# 15633a22 03-Jun-2009 Srinivas Eeda <srinivas.eeda@oracle.com>

ocfs2 patch to track delayed orphan scan timer statistics

Patch to track delayed orphan scan timer statistics.

Modifies ocfs2_osb_dump to print the following:
Orphan Scan=> Local: 10 Global: 21 Last Scan: 67 seconds ago

Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>


# 83273932 03-Jun-2009 Srinivas Eeda <srinivas.eeda@oracle.com>

ocfs2: timer to queue scan of all orphan slots

When a dentry is unlinked, the unlinking node takes an EX on the dentry lock
before moving the dentry to the orphan directory. Other nodes that have
this dentry in cache have a PR on the same dentry lock. When the EX is
requested, the other nodes flag the corresponding inode as MAYBE_ORPHANED
during downconvert. The inode is finally deleted when the last node to iput
the inode sees that i_nlink==0 and the MAYBE_ORPHANED flag is set.

A problem arises if a node is forced to free dentry locks because of memory
pressure. If this happens, the node will no longer get downconvert
notifications for the dentries that have been unlinked on another node.
If it also happens that node is actively using the corresponding inode and
happens to be the one performing the last iput on that inode, it will fail
to delete the inode as it will not have the MAYBE_ORPHANED flag set.

This patch fixes this shortcoming by introducing a periodic scan of the
orphan directories to delete such inodes. Care has been taken to distribute
the workload across the cluster so that no one node has to perform the task
all the time.

Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>


# 9140db04 06-Mar-2009 Srinivas Eeda <srinivas.eeda@oracle.com>

ocfs2: recover orphans in offline slots during recovery and mount

During recovery, a node recovers orphans in it's slot and the dead node(s). But
if the dead nodes were holding orphans in offline slots, they will be left
unrecovered.

If the dead node is the last one to die and is holding orphans in other slots
and is the first one to mount, then it only recovers it's own slot, which
leaves orphans in offline slots.

This patch queues complete_recovery to clean orphans for all offline slots
during mount and node recovery.

Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Acked-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>


# 9b7895ef 12-Nov-2008 Mark Fasheh <mfasheh@suse.com>

ocfs2: Add a name indexed b-tree to directory inodes

This patch makes use of Ocfs2's flexible btree code to add an additional
tree to directory inodes. The new tree stores an array of small,
fixed-length records in each leaf block. Each record stores a hash value,
and pointer to a block in the traditional (unindexed) directory tree where a
dirent with the given name hash resides. Lookup exclusively uses this tree
to find dirents, thus providing us with constant time name lookups.

Some of the hashing code was copied from ext3. Unfortunately, it has lots of
unfixed checkpatch errors. I left that as-is so that tracking changes would
be easier.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Acked-by: Joel Becker <joel.becker@oracle.com>


# 96a6c64b 16-Dec-2008 Sunil Mushran <sunil.mushran@oracle.com>

ocfs2: Move struct recovery_map to a header file

Move the definition of struct recovery_map from journal.c to journal.h. This
is preparation for the next patch.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>


# c175a518 10-Dec-2008 Joel Becker <joel.becker@oracle.com>

ocfs2: Checksum and ECC for directory blocks.

Use the db_check field of ocfs2_dir_block_trailer to crc/ecc the
dirblocks.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>


# 13723d00 17-Oct-2008 Joel Becker <joel.becker@oracle.com>

ocfs2: Use metadata-specific ocfs2_journal_access_*() functions.

The per-metadata-type ocfs2_journal_access_*() functions hook up jbd2
commit triggers and allow us to compute metadata ecc right before the
buffers are written out. This commit provides ecc for inodes, extent
blocks, group descriptors, and quota blocks. It is not safe to use
extened attributes and metaecc at the same time yet.

The ocfs2_extent_tree and ocfs2_path abstractions in alloc.c both hide
the type of block at their root. Before, it didn't matter, but now the
root block must use the appropriate ocfs2_journal_access_*() function.
To keep this abstract, the structures now have a pointer to the matching
journal_access function and a wrapper call to call it.

A few places use naked ocfs2_write_block() calls instead of adding the
blocks to the journal. We make sure to calculate their checksum and ecc
before the write.

Since we pass around the journal_access functions. Let's typedef them
in ocfs2.h.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>


# 50655ae9 11-Sep-2008 Joel Becker <joel.becker@oracle.com>

ocfs2: Add journal_access functions with jbd2 triggers.

We create wrappers for ocfs2_journal_access() that are specific to the
type of metadata block. This allows us to associate jbd2 commit
triggers with the block. The triggers will compute metadata ecc in a
future commit.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>


# 19ece546 21-Aug-2008 Jan Kara <jack@suse.cz>

ocfs2: Enable quota accounting on mount, disable on umount

Enable quota usage tracking on mount and disable it on umount. Also
add support for quota on and quota off quotactls and usrquota and
grpquota mount options. Add quota features among supported ones.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>


# 2205363d 20-Oct-2008 Jan Kara <jack@suse.cz>

ocfs2: Implement quota recovery

Implement functions for recovery after a crash. Functions just
read local quota file and sync info to global quota file.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>


# 90e86a63 27-Aug-2008 Jan Kara <jack@suse.cz>

ocfs2: Support nested transactions

OCFS2 can easily support nested transactions. We just have to
take care and not spoil statistics acquire semaphore unnecessarily.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>


# 53ef99ca 18-Nov-2008 Mark Fasheh <mfasheh@suse.com>

ocfs2: Remove JBD compatibility layer

JBD2 is fully backwards compatible with JBD and it's been tested enough with
Ocfs2 that we can clean this code up now.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>


# 10995aa2 13-Nov-2008 Joel Becker <joel.becker@oracle.com>

ocfs2: Morph the haphazard OCFS2_IS_VALID_DINODE() checks.

Random places in the code would check a dinode bh to see if it was
valid. Not only did they do different levels of validation, they
handled errors in different ways.

The previous commit unified inode block reads, validating all block
reads in the same place. Thus, these haphazard checks are no longer
necessary. Rather than eliminate them, however, we change them to
BUG_ON() checks. This ensures the assumptions remain true. All of the
code paths to these checks have been audited to ensure they come from a
validated inode read.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>


# b657c95c 13-Nov-2008 Joel Becker <joel.becker@oracle.com>

ocfs2: Wrap inode block reads in a dedicated function.

The ocfs2 code currently reads inodes off disk with a simple
ocfs2_read_block() call. Each place that does this has a different set
of sanity checks it performs. Some check only the signature. A couple
validate the block number (the block read vs di->i_blkno). A couple
others check for VALID_FL. Only one place validates i_fs_generation. A
couple check nothing. Even when an error is found, they don't all do
the same thing.

We wrap inode reading into ocfs2_read_inode_block(). This will validate
all the above fields, going readonly if they are invalid (they never
should be). ocfs2_read_inode_block_full() is provided for the places
that want to pass read_block flags. Every caller is passing a struct
inode with a valid ip_blkno, so we don't need a separate blkno argument
either.

We will remove the validation checks from the rest of the code in a
later commit, as they are no longer necessary.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>


# ae0dff68 22-Oct-2008 Sunil Mushran <sunil.mushran@oracle.com>

ocfs2: Set journal descriptor to NULL after journal shutdown

Patch sets journal descriptor to NULL after the journal is shutdown.
This ensures that jbd2_journal_release_jbd_inode(), which removes the
jbd2 inode from txn lists, can be called safely from ocfs2_clear_inode()
even after the journal has been shutdown.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>


# d4a8c93c 09-Oct-2008 Joel Becker <joel.becker@oracle.com>

ocfs2: Make cached block reads the common case.

ocfs2_read_blocks() currently requires the CACHED flag for cached I/O.
However, that's the common case. Let's flip it around and provide an
IGNORE_CACHE flag for the special users. This has the added benefit of
cleaning up the code some (ignore_cache takes on its special meaning
earlier in the loop).

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>


# 0fcaa56a 09-Oct-2008 Joel Becker <joel.becker@oracle.com>

ocfs2: Simplify ocfs2_read_block()

More than 30 callers of ocfs2_read_block() pass exactly OCFS2_BH_CACHED.
Only six pass a different flag set. Rather than have every caller care,
let's make ocfs2_read_block() take no flags and always do a cached read.
The remaining six places can call ocfs2_read_blocks() directly.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>


# 31d33073 09-Oct-2008 Joel Becker <joel.becker@oracle.com>

ocfs2: Require an inode for ocfs2_read_block(s)().

Now that synchronous readers are using ocfs2_read_blocks_sync(), all
callers of ocfs2_read_blocks() are passing an inode. Use it
unconditionally. Since it's there, we don't need to pass the
ocfs2_super either.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>


# da1e9098 09-Oct-2008 Joel Becker <joel.becker@oracle.com>

ocfs2: Separate out sync reads from ocfs2_read_blocks()

The ocfs2_read_blocks() function currently handles sync reads, cached,
reads, and sometimes cached reads. We're going to add some
functionality to it, so first we should simplify it. The uncached,
synchronous reads are much easer to handle as a separate function, so we
instroduce ocfs2_read_blocks_sync().

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>


# a81cb88b 07-Oct-2008 Mark Fasheh <mfasheh@suse.com>

ocfs2: Don't check for NULL before brelse()

This is pointless as brelse() already does the check.

Signed-off-by: Mark Fasheh


# 2b4e30fb 03-Sep-2008 Joel Becker <joel.becker@oracle.com>

ocfs2: Switch over to JBD2.

ocfs2 wants JBD2 for many reasons, not the least of which is that JBD is
limiting our maximum filesystem size.

It's a pretty trivial change. Most functions are just renamed. The
only functional change is moving to Jan's inode-based ordered data mode.
It's better, too.

Because JBD2 reads and writes JBD journals, this is compatible with any
existing filesystem. It can even interact with JBD-based ocfs2 as long
as the journal is formated for JBD.

We provide a compatibility option so that paranoid people can still use
JBD for the time being. This will go away shortly.

[ Moved call of ocfs2_begin_ordered_truncate() from ocfs2_delete_inode() to
ocfs2_truncate_for_delete(). --Mark ]

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>


# a1af7d15 19-Aug-2008 Mark Fasheh <mfasheh@suse.com>

ocfs2: Fix sleep-with-spinlock recovery regression

This fixes a bug introduced with 539d8264093560b917ee3afe4c7f74e5da09d6a5:
[PATCH 2/2] ocfs2: Fix race between mount and recovery

ocfs2_mark_dead_nodes() was reading journal inodes while holding the
spinlock protecting our in-memory recovery state. The fix is very simple -
the disk state is protected by a cluster lock that's already held, so we
just move the spinlock down past the read.

Reviewed-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>


# 539d8264 14-Jul-2008 Sunil Mushran <sunil.mushran@oracle.com>

[PATCH 2/2] ocfs2: Fix race between mount and recovery

As the fs recovery is asynchronous, there is a small chance that another
node can mount (and thus recover) the slot before the recovery thread
gets to it.

If this happens, the recovery thread will block indefinitely on the
journal/slot lock as that lock will be held for the duration of the mount
(by design) by the node assigned to that slot.

The solution implemented is to keep track of the journal replays using
a recovery generation in the journal inode, which will be incremented by the
thread replaying that journal. The recovery thread, before attempting the
blocking lock on the journal/slot lock, will compare the generation on disk
with what it has cached and skip recovery if it does not match.

This bug appears to have been inadvertently introduced during the mount/umount
vote removal by mainline commit 34d024f84345807bf44163fac84e921513dde323. In the
mount voting scheme, the messaging would indirectly indicate that the slot
was being recovered.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>


# e407e397 12-Jun-2008 Joel Becker <joel.becker@oracle.com>

ocfs2: Fix CONFIG_OCFS2_DEBUG_FS #ifdefs

A couple places use OCFS2_DEBUG_FS where they really mean
CONFIG_OCFS2_DEBUG_FS.

Reported-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Joel Becker <joel.becker@oracle.com>


# b1f3550f 04-Mar-2008 Julia Lawall <julia@diku.dk>

ocfs2: Use BUG_ON

if (...) BUG(); should be replaced with BUG_ON(...) when the test has no
side-effects to allow a definition of BUG_ON that drops the code completely.

The semantic patch that makes this change is as follows:
(http://www.emn.fr/x-info/coccinelle/)

// <smpl>
@ disable unlikely @ expression E,f; @@

(
if (<... f(...) ...>) { BUG(); }
|
- if (unlikely(E)) { BUG(); }
+ BUG_ON(E);
)

@@ expression E,f; @@

(
if (<... f(...) ...>) { BUG(); }
|
- if (E) { BUG(); }
+ BUG_ON(E);
)
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>


# fc881fa0 01-Feb-2008 Joel Becker <joel.becker@oracle.com>

ocfs2: De-magic the in-memory slot map.

The in-memory slot map uses the same magic as the on-disk one. There is
a special value to mark a slot as invalid. It relies on the size of
certain types and so on.

Write a new in-memory map that keeps validity as a separate field. Outside
of the I/O functions, OCFS2_INVALID_SLOT now means what it is supposed to.
It also is no longer tied to the type size.

This also means that only the I/O functions refer to 16bit quantities.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>


# 553abd04 01-Feb-2008 Joel Becker <joel.becker@oracle.com>

ocfs2: Change the recovery map to an array of node numbers.

The old recovery map was a bitmap of node numbers. This was sufficient
for the maximum node number of 254. Going forward, we want node numbers
to be UINT32. Thus, we need a new recovery map.

Note that we can't keep track of slots here. We must write down the
node number to recovery *before* we get the locks needed to convert a
node number into a slot number.

The recovery map is now an array of unsigned ints, max_slots in size.
It moves to journal.c with the rest of recovery.

Because it needs to be initialized, we move all of recovery initialization
into a new function, ocfs2_recovery_init(). This actually cleans up
ocfs2_initialize_super() a little as well. Following on, recovery cleaup
becomes part of ocfs2_recovery_exit().

A number of node map functions are rendered obsolete and are removed.

Finally, waiting on recovery is wrapped in a function rather than naked
checks on the recovery_event. This is a cleanup from Mark.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>


# d85b20e4 01-Feb-2008 Joel Becker <joel.becker@oracle.com>

ocfs2: Make ocfs2_slot_info private.

Just use osb_lock around the ocfs2_slot_info data. This allows us to
take the ocfs2_slot_info structure private in slot_info.c. All access
is now via accessors.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>


# 8e8a4603 01-Feb-2008 Mark Fasheh <mfasheh@suse.com>

ocfs2: Move slot map access into slot_map.c

journal.c and dlmglue.c would refresh the slot map by hand. Instead, have
the update and clear functions do the work inside slot_map.c. The eventual
result is to make ocfs2_slot_info defined privately in slot_map.c

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>


# 5fa0613e 10-Jan-2008 Jan Kara <jack@suse.cz>

ocfs2: Silence false lockdep warnings

Create separate lockdep lock classes for system file's i_mutexes. They are
used to guard allocations and similar things and thus rank differently
than i_mutex of a regular file or directory.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>


# d147b3d6 07-Nov-2007 Mark Fasheh <mark.fasheh@oracle.com>

ocfs2: Support commit= mount option

Mostly taken from ext3. This allows the user to set the jbd commit interval,
in seconds. The default of 5 seconds stays the same, but now users can
easily increase the commit interval. Typically, this would be increased in
order to benefit performance at the expense of data-safety.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>


# e63aecb6 18-Oct-2007 Mark Fasheh <mark.fasheh@oracle.com>

ocfs2: Rename ocfs2_meta_[un]lock

Call this the "inode_lock" now, since it covers both data and meta data.
This patch makes no functional changes.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>


# 34d024f8 24-Sep-2007 Mark Fasheh <mark.fasheh@oracle.com>

ocfs2: Remove mount/unmount votes

The node maps that are set/unset by these votes are no longer relevant, thus
we can remove the mount and umount votes. Since those are the last two
remaining votes, we can also remove the entire vote infrastructure.

The vote thread has been renamed to the downconvert thread, and the small
amount of functionality related to managing it has been moved into
fs/ocfs2/dlmglue.c. All references to votes have been removed or updated.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>


# e8aed345 03-Dec-2007 Mark Fasheh <mark.fasheh@oracle.com>

ocfs2: Re-journal buffers after transaction extend

ocfs2_extend_trans() might call journal_restart() which will commit dirty
buffers and then restart the transaction. This means that any buffers which
still need changes should be passed to journal_access() again. Some paths
during extend weren't doing this right.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>


# 0879c584 03-Dec-2007 Mark Fasheh <mark.fasheh@oracle.com>

ocfs2: Allow for debugging of transaction extends

The nastiest cases of transaction extends are also the rarest. We can expose
them more quickly at the expense of performance by going straight to the
journal_restart() in ocfs2_extend_trans(). Wrap things in OCFS2_DEBUG_FS so
that we only do this when "expensive debugging" is turned on.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>


# a86370fb 03-Dec-2007 Mark Fasheh <mark.fasheh@oracle.com>

ocfs2: fix exit-while-locked bug in ocfs2_queue_orphans()

We're holding the cluster lock when a failure might happen in
ocfs2_dir_foreach() so it needs to be released.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>


# 5eae5b96 10-Sep-2007 Mark Fasheh <mark.fasheh@oracle.com>

ocfs2: Remove open coded readdir()

ocfs2_queue_orphans() has an open coded readdir loop which can easily just
use a directory accessor function.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>


# 316f4b9f 07-Sep-2007 Mark Fasheh <mark.fasheh@oracle.com>

ocfs2: Move directory manipulation code into dir.c

The code for adding, removing, deleting directory entries was splattered all
over namei.c. I'd rather have this all centralized, so that it's easier to
make changes for inline dir data, and eventually indexed directories.

None of the code in any of the functions was changed. I only removed the
static keyword from some prototypes so that they could be exported.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>


# 800deef3 17-May-2007 Christoph Hellwig <hch@lst.de>

[PATCH] ocfs2: use list_for_each_entry where benefical

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>


# 1ca1a111 27-Apr-2007 Mark Fasheh <mark.fasheh@oracle.com>

ocfs2: fix sparse warnings in fs/ocfs2

None of these are actually harmful, but the noise makes looking for real
problems difficult.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>


# 8110b073 22-Mar-2007 Mark Fasheh <mark.fasheh@oracle.com>

ocfs2: Fix up i_blocks calculation to know about holes

Older file systems which didn't support holes did a dumb calculation of
i_blocks based on i_size. This is no longer accurate, so fix things up to
take actual allocation into account.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>


# 4f902c37 09-Mar-2007 Mark Fasheh <mark.fasheh@oracle.com>

ocfs2: Fix extent lookup to return true size of holes

Initially, we had wired things to return a size '1' of holes. Cook up a
small amount of code to find the next extent and calculate the number of
clusters between the virtual offset and the next allocated extent.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>


# 49cb8d2d 09-Mar-2007 Mark Fasheh <mark.fasheh@oracle.com>

ocfs2: Read from an unwritten extent returns zeros

Return an optional extent flags field from our lookup functions and wire up
callers to treat unwritten regions as holes for the purpose of returning
zeros to the user.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>


# 363041a5 17-Jan-2007 Mark Fasheh <mark.fasheh@oracle.com>

ocfs2: temporarily remove extent map caching

The code in extent_map.c is not prepared to deal with a subtree being
rotated between lookups. This can happen when filling holes in sparse files.
Instead of a lengthy patch to update the code (which would likely lose the
benefit of caching subtree roots), we remove most of the algorithms and
implement a simple path based lookup. A less ambitious extent caching scheme
will be added in a later patch.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>


# 50008630 20-Mar-2007 Tiger Yang <tiger.yang@oracle.com>

ocfs2: Remove delete inode vote

Ocfs2 currently does cluster-wide node messaging to check the open state of
an inode during delete. This patch removes that mechanism in favor of an
inode cluster lock which is taken at shared read when an inode is first read
and dropped in clear_inode(). This allows a deleting node to test the
liveness of an inode by attempting to take an exclusive lock.

Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>


# c271c5c2 05-Dec-2006 Sunil Mushran <Sunil.Mushran@oracle.com>

ocfs2: local mounts

This allows users to format an ocfs2 file system with a special flag,
OCFS2_FEATURE_INCOMPAT_LOCAL_MOUNT. When the file system sees this flag, it
will not use any cluster services, nor will it require a cluster
configuration, thus acting like a 'local' file system.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>


# 1fabe148 09-Oct-2006 Mark Fasheh <mark.fasheh@oracle.com>

ocfs2: Remove struct ocfs2_journal_handle in favor of handle_t

This is mostly a search and replace as ocfs2_journal_handle is now no more
than a container for a handle_t pointer.

ocfs2_commit_trans() becomes very straight forward, and we remove some out
of date comments / code.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>


# 65eff9cc 09-Oct-2006 Mark Fasheh <mark.fasheh@oracle.com>

ocfs2: remove handle argument to ocfs2_start_trans()

All callers either pass in NULL directly, or a local variable that is
already set to NULL.

The internals of ocfs2_start_trans() get a nice cleanup as a result.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>


# dae85832 09-Oct-2006 Mark Fasheh <mark.fasheh@oracle.com>

ocfs2: remove ocfs2_journal_handle journal field

It is no longer used.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>


# 02dc1af4 09-Oct-2006 Mark Fasheh <mark.fasheh@oracle.com>

ocfs2: pass ocfs2_super * into ocfs2_commit_trans()

This sets us up to remove handle->journal.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>


# 4bcec184 09-Oct-2006 Mark Fasheh <mark.fasheh@oracle.com>

ocfs2: remove unused handle argument from ocfs2_meta_lock_full()

Now that this is unused and all callers pass NULL, we can safely remove it.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>


# a301a27d 06-Oct-2006 Mark Fasheh <mark.fasheh@oracle.com>

ocfs2: make ocfs2_alloc_handle() static

This is no longer used outside of journal.c

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>


# daf29e9c 06-Oct-2006 Mark Fasheh <mark.fasheh@oracle.com>

ocfs2: remove unused ocfs2_handle_add_lock()

This gets us rid of a slab we no longer need, as well as removing the
majority of what's left on ocfs2_journal_handle.

ocfs2_commit_unstarted_handle() has no more real work to do, so remove that
function too.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>


# 02928a71 06-Oct-2006 Mark Fasheh <mark.fasheh@oracle.com>

ocfs2: remove unused ocfs2_handle_add_inode()

We can also delete the unused infrastructure which was once in place to
support this functionality. ocfs2_inode_private loses ip_handle and
ip_handle_list. ocfs2_journal_handle loses handle_list.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>


# c161f89b 05-Oct-2006 Mark Fasheh <mark.fasheh@oracle.com>

ocfs2: remove ocfs2_journal_handle flags field

Callers can set h_sync directly on the handle_t, whether a transaction has
been started or not can be determined via the existence of the handle_t on
the struct ocfs2_journal_handle.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>


# 1fc58146 05-Oct-2006 Mark Fasheh <mark.fasheh@oracle.com>

ocfs2: have ocfs2_extend_trans() take handle_t

No reason to use our wrapper struct in this function, so take the handle_t
directly.

Also fixes a bug where we were incorrectly setting the handle to NULL in
case of a failure from journal_restart()

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>


# 01ddf1e1 05-Oct-2006 Mark Fasheh <mark.fasheh@oracle.com>

ocfs2: remove unused ocfs2_journal_handle field

max_buffs was just being set and not actually used.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>


# c4028958 22-Nov-2006 David Howells <dhowells@redhat.com>

WorkStruct: make allyesconfig

Fix up for make allyesconfig.

Signed-Off-By: David Howells <dhowells@redhat.com>


# 24c19ef4 22-Sep-2006 Mark Fasheh <mark.fasheh@oracle.com>

ocfs2: Remove i_generation from inode lock names

OCFS2 puts inode meta data in the "lock value block" provided by the DLM.
Typically, i_generation is encoded in the lock name so that a deleted inode
on and a new one in the same block don't share the same lvb.

Unfortunately, that scheme means that the read in ocfs2_read_locked_inode()
is potentially thrown away as soon as the meta data lock is taken - we
cannot encode the lock name without first knowing i_generation, which
requires a disk read.

This patch encodes i_generation in the inode meta data lvb, and removes the
value from the inode meta data lock name. This way, the read can be covered
by a lock, and at the same time we can distinguish between an up to date and
a stale LVB.

This will help cold-cache stat(2) performance in particular.

Since this patch changes the protocol version, we take the opportunity to do
a minor re-organization of two of the LVB fields.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>


# 78427043 04-May-2006 Mark Fasheh <mark.fasheh@oracle.com>

ocfs2: clean up some osb fields

Get rid of osb->uuid, osb->proc_sub_dir, and osb->osb_id. Those fields were
unused, or could easily be removed. As a result, we also no longer need
MAX_OSB_ID or ocfs2_globals_lock.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>


# 34af946a 27-Jun-2006 Ingo Molnar <mingo@elte.hu>

[PATCH] spin/rwlock init cleanups

locking init cleanups:

- convert " = SPIN_LOCK_UNLOCKED" to spin_lock_init() or DEFINE_SPINLOCK()
- convert rwlocks in a similar manner

this patch was generated automatically.

Motivation:

- cleanliness
- lockdep needs control of lock initialization, which the open-coded
variants do not give
- it's also useful for -rt and for lock debugging in general

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# f116629d 26-Jun-2006 Akinobu Mita <mita@miraclelinux.com>

[PATCH] fs: use list_move()

This patch converts the combination of list_del(A) and list_add(A, B) to
list_move(A, B) under fs/.

Cc: Ian Kent <raven@themaw.net>
Acked-by: Joel Becker <joel.becker@oracle.com>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Cc: Hans Reiser <reiserfs-dev@namesys.com>
Cc: Urban Widmark <urban@teststation.com>
Acked-by: David Howells <dhowells@redhat.com>
Acked-by: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# afae00ab 12-Apr-2006 Sunil Mushran <sunil.mushran@oracle.com>

ocfs2: fix gfp mask in some file system paths

We were using GFP_KERNEL in a handful of places which really wanted
GFP_NOFS. Fix this.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>


# dd4a2c2b 12-Apr-2006 Mark Fasheh <mark.fasheh@oracle.com>

ocfs2: Don't populate uptodate cache in ocfs2_force_read_journal()

This greatly reduces the amount of memory useded during recovery.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>


# 205f87f6 26-Mar-2006 Badari Pulavarty <pbadari@us.ibm.com>

[PATCH] change buffer_head.b_size to size_t

Increase the size of the buffer_head b_size field (only) for 64 bit platforms.
Update some old and moldy comments in and around the structure as well.

The b_size increase allows us to perform larger mappings and allocations for
large I/O requests from userspace, which tie in with other changes allowing
the get_block_t() interface to map multiple blocks at once.

Signed-off-by: Nathan Scott <nathans@sgi.com>
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 5515eff8 26-Mar-2006 Andrew Morton <akpm@osdl.org>

[PATCH] 2tb-files-add-blkcnt_t-fixes

Cc: Takashi Sato <sho@tnes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# b0697053 03-Mar-2006 Mark Fasheh <mark.fasheh@oracle.com>

ocfs2: don't use MLF* in the file system

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>


# b4df6ed8 22-Feb-2006 Mark Fasheh <mark.fasheh@oracle.com>

[PATCH] ocfs2: fix orphan recovery deadlock

Orphan dir recovery can deadlock with another process in
ocfs2_delete_inode() in some corner cases. Fix this by tracking recovery
state more closely and allowing it to handle inode wipes which might
deadlock.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>


# 745ae8ba 09-Feb-2006 Mark Fasheh <mark.fasheh@oracle.com>

[PATCH] ocfs2: only checkpoint journal when asked to

Disable automatic checkpointing of the journal - this is a relic from older
ocfs2 days. Worth quite a bit of performance on longer running single node
tests.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>


# 6eff5790 18-Jan-2006 Mark Fasheh <mark.fasheh@oracle.com>

[PATCH] ocfs2: don't wait on recovery when locking journal

The mount path had incorrectly asked the locking code to wait for recovery
completion, which deadlocks things because recovery waits for mount to
complete first.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>


# ebdec83b 27-Jan-2006 Eric Sesterhenn / snakebyte <snakebyte@gmx.de>

[PATCH] BUG_ON() Conversion in fs/ocfs2/

this changes if() BUG(); constructs to BUG_ON() which is
cleaner, contains unlikely() and can better optimized away.

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>


# c74ec2f7 13-Jan-2006 Arjan van de Ven <arjan@infradead.org>

[PATCH] ocfs2: Semaphore to mutex conversion.

Semaphore to mutex conversion.

The conversion was generated via scripts, and the result was validated
automatically via a script as well.

Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>


# 251b6ecc 10-Jan-2006 Mark Fasheh <mark.fasheh@oracle.com>

[OCFS2] Make ip_io_sem a mutex

ip_io_sem is now ip_io_mutex.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>


# 1b1dcc1b 09-Jan-2006 Jes Sorensen <jes@sgi.com>

[PATCH] mutex subsystem, semaphore to mutex: VFS, ->i_sem

This patch converts the inode semaphore to a mutex. I have tested it on
XFS and compiled as much as one can consider on an ia64. Anyway your
luck with it might be different.

Modified-by: Ingo Molnar <mingo@elte.hu>

(finished the conversion)

Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# ccd979bd 15-Dec-2005 Mark Fasheh <mark.fasheh@oracle.com>

[PATCH] OCFS2: The Second Oracle Cluster Filesystem

The OCFS2 file system module.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>