History log of /linux-master/fs/ocfs2/stackglue.c
Revision Date Author Comments
# 9d5b9475 20-Nov-2023 Joel Granados <j.granados@samsung.com>

fs: Remove the now superfluous sentinel elements from ctl_table array

This commit comes at the tail end of a greater effort to remove the
empty elements at the end of the ctl_table arrays (sentinels) which
will reduce the overall build time size of the kernel and run time
memory bloat by ~64 bytes per sentinel (further information Link :
https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@bombadil.infradead.org/)

Remove sentinel elements ctl_table struct. Special attention was placed in
making sure that an empty directory for fs/verity was created when
CONFIG_FS_VERITY_BUILTIN_SIGNATURES is not defined. In this case we use the
register sysctl call that expects a size.

Signed-off-by: Joel Granados <j.granados@samsung.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Acked-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>


# 13b6269d 01-Nov-2022 Shang XiaoJing <shangxiaojing@huawei.com>

ocfs2: fix memory leak in ocfs2_stack_glue_init()

ocfs2_table_header should be free in ocfs2_stack_glue_init() if
ocfs2_sysfs_init() failed, otherwise kmemleak will report memleak.

BUG: memory leak
unreferenced object 0xffff88810eeb5800 (size 128):
comm "modprobe", pid 4507, jiffies 4296182506 (age 55.888s)
hex dump (first 32 bytes):
c0 40 14 a0 ff ff ff ff 00 00 00 00 01 00 00 00 .@..............
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<000000001e59e1cd>] __register_sysctl_table+0xca/0xef0
[<00000000c04f70f7>] 0xffffffffa0050037
[<000000001bd12912>] do_one_initcall+0xdb/0x480
[<0000000064f766c9>] do_init_module+0x1cf/0x680
[<000000002ba52db0>] load_module+0x6441/0x6f20
[<000000009772580d>] __do_sys_finit_module+0x12f/0x1c0
[<00000000380c1f22>] do_syscall_64+0x3f/0x90
[<000000004cf473bc>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

Link: https://lkml.kernel.org/r/41651ca1-432a-db34-eb97-d35744559de1@linux.alibaba.com
Fixes: 3878f110f71a ("ocfs2: Move the hb_ctl_path sysctl into the stack glue.")
Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# c97e21fe 18-Aug-2022 Wolfram Sang <wsa+renesas@sang-engineering.com>

ocfs2: move from strlcpy with unused retval to strscpy

Follow the advice of the below link and prefer 'strscpy' in this
subsystem. Conversion is 1:1 because the return value is not used.
Generated by a coccinelle script.

Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/
Link: https://lkml.kernel.org/r/20220818210123.7637-4-wsa+renesas@sang-engineering.com
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# f6a26318 28-Jan-2022 Linus Torvalds <torvalds@linux-foundation.org>

ocfs2: fix subdirectory registration with register_sysctl()

The kernel test robot reports that commit c42ff46f97c1 ("ocfs2: simplify
subdirectory registration with register_sysctl()") is broken, and
results in kernel warning messages like

sysctl table check failed: fs/ocfs2/nm Not a file
sysctl table check failed: fs/ocfs2/nm No proc_handler
sysctl table check failed: fs/ocfs2/nm bogus .mode 0555

and in fact this was already reported back in linux-next, but nobody
seems to have reacted to that report. Possibly that original report
only ever made it to the lkp list.

The problem seems to be that the simplification didn't actually go far
enough, and should have converted the whole directory path to the final
sysctl file, rather than just the two first components.

So take that last step.

Fixes: c42ff46f97c1 ("ocfs2: simplify subdirectory registration with register_sysctl()")
Reported-by: kernel test robot <oliver.sang@intel.com>
Link: https://lore.kernel.org/all/20220128065310.GF8421@xsang-OptiPlex-9020/
Link: https://lists.01.org/hyperkitty/list/lkp@lists.01.org/thread/KQ2F6TPJWMDVEXJM4WTUC4DU3EH3YJVT/
Tested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# c42ff46f 21-Jan-2022 Luis Chamberlain <mcgrof@kernel.org>

ocfs2: simplify subdirectory registration with register_sysctl()

There is no need to user boiler plate code to specify a set of base
directories we're going to stuff sysctls under. Simplify this by using
register_sysctl() and specifying the directory path directly.

// pycocci sysctl-subdir-register-sysctl-simplify.cocci PATH

@c1@
expression E1;
identifier subdir, sysctls;
@@

static struct ctl_table subdir[] = {
{
.procname = E1,
.maxlen = 0,
.mode = 0555,
.child = sysctls,
},
{ }
};

@c2@
identifier c1.subdir;

expression E2;
identifier base;
@@

static struct ctl_table base[] = {
{
.procname = E2,
.maxlen = 0,
.mode = 0555,
.child = subdir,
},
{ }
};

@c3@
identifier c2.base;
identifier header;
@@

header = register_sysctl_table(base);

@r1 depends on c1 && c2 && c3@
expression c1.E1;
identifier c1.subdir, c1.sysctls;
@@

-static struct ctl_table subdir[] = {
- {
- .procname = E1,
- .maxlen = 0,
- .mode = 0555,
- .child = sysctls,
- },
- { }
-};

@r2 depends on c1 && c2 && c3@
identifier c1.subdir;

expression c2.E2;
identifier c2.base;
@@
-static struct ctl_table base[] = {
- {
- .procname = E2,
- .maxlen = 0,
- .mode = 0555,
- .child = subdir,
- },
- { }
-};

@initialize:python@
@@

def make_my_fresh_expression(s1, s2):
return '"' + s1.strip('"') + "/" + s2.strip('"') + '"'

@r3 depends on c1 && c2 && c3@
expression c1.E1;
identifier c1.sysctls;
expression c2.E2;
identifier c2.base;
identifier c3.header;
fresh identifier E3 = script:python(E2, E1) { make_my_fresh_expression(E2, E1) };
@@

header =
-register_sysctl_table(base);
+register_sysctl(E3, sysctls);

Generated-by: Coccinelle SmPL
Link: https://lkml.kernel.org/r/20211123202422.819032-5-mcgrof@kernel.org
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: David Airlie <airlied@linux.ie>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Julia Lawall <julia.lawall@inria.fr>
Cc: Kees Cook <keescook@chromium.org>
Cc: Lukas Middendorf <kernel@tuxforce.de>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Phillip Potter <phil@philpotter.co.uk>
Cc: Qing Wang <wangqing@vivo.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Sebastian Reichel <sre@kernel.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Stephen Kitt <steve@sk2.org>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Xiaoming Ni <nixiaoming@huawei.com>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Cc: James E.J. Bottomley <jejb@linux.ibm.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 54e948c6 28-Jun-2021 Dan Carpenter <dan.carpenter@oracle.com>

ocfs2: fix snprintf() checking

The snprintf() function returns the number of bytes which would have been
printed if the buffer was large enough. In other words it can return ">=
remain" but this code assumes it returns "== remain".

The run time impact of this bug is not very severe. The next iteration
through the loop would trigger a WARN() when we pass a negative limit to
snprintf(). We would then return success instead of -E2BIG.

The kernel implementation of snprintf() will never return negatives so
there is no need to check and I have deleted that dead code.

Link: https://lkml.kernel.org/r/20210511135350.GV1955@kadam
Fixes: a860f6eb4c6a ("ocfs2: sysfile interfaces for online file check")
Fixes: 74ae4e104dfc ("ocfs2: Create stack glue sysfs files.")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# fa60ce2c 06-May-2021 Masahiro Yamada <masahiroy@kernel.org>

treewide: remove editor modelines and cruft

The section "19) Editor modelines and other cruft" in
Documentation/process/coding-style.rst clearly says, "Do not include any
of these in source files."

I recently receive a patch to explicitly add a new one.

Let's do treewide cleanups, otherwise some people follow the existing code
and attempt to upstream their favoriate editor setups.

It is even nicer if scripts/checkpatch.pl can check it.

If we like to impose coding style in an editor-independent manner, I think
editorconfig (patch [1]) is a saner solution.

[1] https://lore.kernel.org/lkml/20200703073143.423557-1-danny@kdrag0n.dev/

Link: https://lkml.kernel.org/r/20210324054457.1477489-1-masahiroy@kernel.org
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Miguel Ojeda <ojeda@kernel.org> [auxdisplay]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# f13604a2 29-Apr-2021 Bhaskar Chowdhury <unixbhaskar@gmail.com>

ocfs2: fix a typo

s/cluter/cluster/

Link: https://lkml.kernel.org/r/20210324072931.5056-1-unixbhaskar@gmail.com
Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# ee9dc325 01-Apr-2020 Alex Shi <alex.shi@linux.alibaba.com>

ocfs2: remove FS_OCFS2_NM

This macro is unused since commit ab09203e302b ("sysctl fs: Remove dead
binary sysctl support").

Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Link: http://lkml.kernel.org/r/1579577812-251572-1-git-send-email-alex.shi@linux.alibaba.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 50acfb2b 29-May-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 286

Based on 1 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation version 2 this program is distributed
in the hope that it will be useful but without any warranty without
even the implied warranty of merchantability or fitness for a
particular purpose see the gnu general public license for more
details

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-only

has been chosen to replace the boilerplate/reference in 97 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190529141901.025053186@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# a6346447 02-Nov-2018 Gang He <ghe@suse.com>

ocfs2: remove ocfs2_is_o2cb_active()

Remove ocfs2_is_o2cb_active(). We have similar functions to identify
which cluster stack is being used via osb->osb_cluster_stack.

Secondly, the current implementation of ocfs2_is_o2cb_active() is not
totally safe. Based on the design of stackglue, we need to get
ocfs2_stack_lock before using ocfs2_stack related data structures, and
that active_stack pointer can be NULL in the case of mount failure.

Link: http://lkml.kernel.org/r/1495441079-11708-1-git-send-email-ghe@suse.com
Signed-off-by: Gang He <ghe@suse.com>
Reviewed-by: Joseph Qi <jiangqi903@gmail.com>
Reviewed-by: Eric Ren <zren@suse.com>
Acked-by: Changwei Ge <ge.changwei@h3c.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# b74271e4 06-Jul-2017 Arvind Yadav <arvind.yadav.cs@gmail.com>

ocfs2: constify attribute_group structures

attribute_groups are not supposed to change at runtime. All functions
working with attribute_groups provided by <linux/sysfs.h> work with
const attribute_group. So mark the non-const structs as const.

File size before:
text data bss dec hex filename
4402 1088 38 5528 1598 fs/ocfs2/stackglue.o

File size After adding 'const':
text data bss dec hex filename
4442 1024 38 5504 1580 fs/ocfs2/stackglue.o

Link: http://lkml.kernel.org/r/cab4e59b4918db3ed2ec77073a4cb310c4429ef5.1498808026.git.arvind.yadav.cs@gmail.com
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# e7ee2c08 10-Jan-2017 Eric Ren <zren@suse.com>

ocfs2: fix crash caused by stale lvb with fsdlm plugin

The crash happens rather often when we reset some cluster nodes while
nodes contend fiercely to do truncate and append.

The crash backtrace is below:

dlm: C21CBDA5E0774F4BA5A9D4F317717495: dlm_recover_grant 1 locks on 971 resources
dlm: C21CBDA5E0774F4BA5A9D4F317717495: dlm_recover 9 generation 5 done: 4 ms
ocfs2: Begin replay journal (node 318952601, slot 2) on device (253,18)
ocfs2: End replay journal (node 318952601, slot 2) on device (253,18)
ocfs2: Beginning quota recovery on device (253,18) for slot 2
ocfs2: Finishing quota recovery on device (253,18) for slot 2
(truncate,30154,1):ocfs2_truncate_file:470 ERROR: bug expression: le64_to_cpu(fe->i_size) != i_size_read(inode)
(truncate,30154,1):ocfs2_truncate_file:470 ERROR: Inode 290321, inode i_size = 732 != di i_size = 937, i_flags = 0x1
------------[ cut here ]------------
kernel BUG at /usr/src/linux/fs/ocfs2/file.c:470!
invalid opcode: 0000 [#1] SMP
Modules linked in: ocfs2_stack_user(OEN) ocfs2(OEN) ocfs2_nodemanager ocfs2_stackglue(OEN) quota_tree dlm(OEN) configfs fuse sd_mod iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi af_packet iscsi_ibft iscsi_boot_sysfs softdog xfs libcrc32c ppdev parport_pc pcspkr parport joydev virtio_balloon virtio_net i2c_piix4 acpi_cpufreq button processor ext4 crc16 jbd2 mbcache ata_generic cirrus virtio_blk ata_piix drm_kms_helper ahci syscopyarea libahci sysfillrect sysimgblt fb_sys_fops ttm floppy libata drm virtio_pci virtio_ring uhci_hcd virtio ehci_hcd usbcore serio_raw usb_common sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4
Supported: No, Unsupported modules are loaded
CPU: 1 PID: 30154 Comm: truncate Tainted: G OE N 4.4.21-69-default #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20151112_172657-sheep25 04/01/2014
task: ffff88004ff6d240 ti: ffff880074e68000 task.ti: ffff880074e68000
RIP: 0010:[<ffffffffa05c8c30>] [<ffffffffa05c8c30>] ocfs2_truncate_file+0x640/0x6c0 [ocfs2]
RSP: 0018:ffff880074e6bd50 EFLAGS: 00010282
RAX: 0000000000000074 RBX: 000000000000029e RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000246 RDI: 0000000000000246
RBP: ffff880074e6bda8 R08: 000000003675dc7a R09: ffffffff82013414
R10: 0000000000034c50 R11: 0000000000000000 R12: ffff88003aab3448
R13: 00000000000002dc R14: 0000000000046e11 R15: 0000000000000020
FS: 00007f839f965700(0000) GS:ffff88007fc80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f839f97e000 CR3: 0000000036723000 CR4: 00000000000006e0
Call Trace:
ocfs2_setattr+0x698/0xa90 [ocfs2]
notify_change+0x1ae/0x380
do_truncate+0x5e/0x90
do_sys_ftruncate.constprop.11+0x108/0x160
entry_SYSCALL_64_fastpath+0x12/0x6d
Code: 24 28 ba d6 01 00 00 48 c7 c6 30 43 62 a0 8b 41 2c 89 44 24 08 48 8b 41 20 48 c7 c1 78 a3 62 a0 48 89 04 24 31 c0 e8 a0 97 f9 ff <0f> 0b 3d 00 fe ff ff 0f 84 ab fd ff ff 83 f8 fc 0f 84 a2 fd ff
RIP [<ffffffffa05c8c30>] ocfs2_truncate_file+0x640/0x6c0 [ocfs2]

It's because ocfs2_inode_lock() get us stale LVB in which the i_size is
not equal to the disk i_size. We mistakenly trust the LVB because the
underlaying fsdlm dlm_lock() doesn't set lkb_sbflags with
DLM_SBF_VALNOTVALID properly for us. But, why?

The current code tries to downconvert lock without DLM_LKF_VALBLK flag
to tell o2cb don't update RSB's LVB if it's a PR->NULL conversion, even
if the lock resource type needs LVB. This is not the right way for
fsdlm.

The fsdlm plugin behaves different on DLM_LKF_VALBLK, it depends on
DLM_LKF_VALBLK to decide if we care about the LVB in the LKB. If
DLM_LKF_VALBLK is not set, fsdlm will skip recovering RSB's LVB from
this lkb and set the right DLM_SBF_VALNOTVALID appropriately when node
failure happens.

The following diagram briefly illustrates how this crash happens:

RSB1 is inode metadata lock resource with LOCK_TYPE_USES_LVB;

The 1st round:

Node1 Node2
RSB1: PR
RSB1(master): NULL->EX
ocfs2_downconvert_lock(PR->NULL, set_lvb==0)
ocfs2_dlm_lock(no DLM_LKF_VALBLK)

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

dlm_lock(no DLM_LKF_VALBLK)
convert_lock(overwrite lkb->lkb_exflags
with no DLM_LKF_VALBLK)

RSB1: NULL RSB1: EX
reset Node2
dlm_recover_rsbs()
recover_lvb()

/* The LVB is not trustable if the node with EX fails and
* no lock >= PR is left. We should set RSB_VALNOTVALID for RSB1.
*/

if(!(kb_exflags & DLM_LKF_VALBLK)) /* This means we miss the chance to
return; * to invalid the LVB here.
*/

The 2nd round:

Node 1 Node2
RSB1(become master from recovery)

ocfs2_setattr()
ocfs2_inode_lock(NULL->EX)
/* dlm_lock() return the stale lvb without setting DLM_SBF_VALNOTVALID */
ocfs2_meta_lvb_is_trustable() return 1 /* so we don't refresh inode from disk */
ocfs2_truncate_file()
mlog_bug_on_msg(disk isize != i_size_read(inode)) /* crash! */

The fix is quite straightforward. We keep to set DLM_LKF_VALBLK flag
for dlm_lock() if the lock resource type needs LVB and the fsdlm plugin
is uesed.

Link: http://lkml.kernel.org/r/1481275846-6604-1-git-send-email-zren@suse.com
Signed-off-by: Eric Ren <zren@suse.com>
Reviewed-by: Joseph Qi <jiangqi903@gmail.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 191df2b5 26-Jul-2016 Eric Ren <zren@suse.com>

ocfs2: fix a redundant re-initialization

Obviously, memset() has zeroed the whole struct locking_max_version.
So, it's no need to zero its two fields individually.

Link: http://lkml.kernel.org/r/1463970605-18354-1-git-send-email-zren@suse.com
Signed-off-by: Eric Ren <zren@suse.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Gang He <ghe@suse.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 9dde5e4f 22-Mar-2016 Gang He <ghe@suse.com>

ocfs2: export ocfs2_kset for online file check

When there are errors in the ocfs2 filesystem, they are usually
accompanied by the inode number which caused the error. This inode
number would be the input to fixing the file. One of these options
could be considered:

A file in the sys filesytem which would accept inode numbers. This
could be used to communication back what has to be fixed or is fixed.
You could write:

$# echo "<inode>" > /sys/fs/ocfs2/devname/filecheck/check

or

$# echo "<inode>" > /sys/fs/ocfs2/devname/filecheck/fix

Compare with second version, I re-design filecheck sysfs interfaces,
there are three sysfs files (check, fix and set) under filecheck
directory (see above), sysfs will accept only one argument <inode>.
Second, I adjust some code in ocfs2_filecheck_repair_inode_block()
function according to upstream feedback, we cannot just add VALID_FL
flag back as a inode block fix, then we will not fix this field
corruption currently until having a complete solution. Compare with
first version, I use strncasecmp instead of double strncmp functions.
Second, update the source file contribution vendor.

This patch (of 4):

Export ocfs2_kset object from ocfs2_stackglue kernel module, then online
file check code will create the related sysfiles under ocfs2_kset
object. We're exporting this because it's built in ocfs2_stackglue.ko.

Signed-off-by: Gang He <ghe@suse.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 1a5c4e2a 04-Jun-2014 Fabian Frederick <fabf@skynet.be>

ocfs2: remove NULL assignments on static

Static values are automatically initialized to NULL.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 765aabbb 03-Apr-2014 Goldwyn Rodrigues <rgoldwyn@suse.de>

ocfs2: add dlm_recover_callback_support in sysfs

This is a part of the nocontrold feature which was incorporated sometime
back.

This is required for backward compatibility of the tools, specifically
the scenario where the tools with recovery callback is used with a
kernel not using the recovery callbacks (older kernel + newer tools).
The tools look for this file to understand if the kernel supports DLM
recovery callbacks.

For kernels which support recovery callbacks but will miss this patch,
ocfs2 will continue to use the older API and would still be able to
mount the filesystem.

[akpm@linux-foundation.org: simplify]
[sfr@canb.auug.org.au: VERIFY_OCTAL_PERMISSIONS fix up]
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# d9060742 28-Mar-2014 Sasha Levin <sasha.levin@oracle.com>

ocfs2: check if cluster name exists before deref

Commit c74a3bdd9b52 ("ocfs2: add clustername to cluster connection") is
trying to strlcpy a string which was explicitly passed as NULL in the
very same patch, triggering a NULL ptr deref.

BUG: unable to handle kernel NULL pointer dereference at (null)
IP: strlcpy (lib/string.c:388 lib/string.c:151)
CPU: 19 PID: 19426 Comm: trinity-c19 Tainted: G W 3.14.0-rc7-next-20140325-sasha-00014-g9476368-dirty #274
RIP: strlcpy (lib/string.c:388 lib/string.c:151)
Call Trace:
ocfs2_cluster_connect (fs/ocfs2/stackglue.c:350)
ocfs2_cluster_connect_agnostic (fs/ocfs2/stackglue.c:396)
user_dlm_register (fs/ocfs2/dlmfs/userdlm.c:679)
dlmfs_mkdir (fs/ocfs2/dlmfs/dlmfs.c:503)
vfs_mkdir (fs/namei.c:3467)
SyS_mkdirat (fs/namei.c:3488 fs/namei.c:3472)
tracesys (arch/x86/kernel/entry_64.S:749)

akpm: this patch probably disables the feature. A temporary thing to
avoid triviel oopses.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Goldwyn Rodrigues <rgoldwyn@suse.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 58f86cc8 23-Mar-2014 Rusty Russell <rusty@rustcorp.com.au>

VERIFY_OCTAL_PERMISSIONS: stricter checking for sysfs perms.

Summary of http://lkml.org/lkml/2014/3/14/363 :

Ted: module_param(queue_depth, int, 444)
Joe: 0444!
Rusty: User perms >= group perms >= other perms?
Joe: CLASS_ATTR, DEVICE_ATTR, SENSOR_ATTR and SENSOR_ATTR_2?

Side effect of stricter permissions means removing the unnecessary
S_IFREG from several callers.

Note that the BUILD_BUG_ON_ZERO((perm) & 2) test was removed: a fair
number of drivers fail this test, so that will be the debate for a
future patch.

Suggested-by: Joe Perches <joe@perches.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com> for drivers/pci/slot.c
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# 3e834151 21-Jan-2014 Goldwyn Rodrigues <rgoldwyn@suse.de>

ocfs2: pass ocfs2_cluster_connection to ocfs2_this_node

This is done to differentiate between using and not using controld and
use the connection information accordingly.

We need to be backward compatible. So, we use a new enum
ocfs2_connection_type to identify when controld is used and when it is
not.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# c74a3bdd 21-Jan-2014 Goldwyn Rodrigues <rgoldwyn@suse.de>

ocfs2: add clustername to cluster connection

This is an effort of removing ocfs2_controld.pcmk and getting ocfs2 DLM
handling up to the times with respect to DLM (>=4.0.1) and corosync
(2.3.x). AFAIK, cman also is being phased out for a unified corosync
cluster stack.

fs/dlm performs all the functions with respect to fencing and node
management and provides the API's to do so for ocfs2. For all future
references, DLM stands for fs/dlm code.

The advantages are:
+ No need to run an additional userspace daemon (ocfs2_controld)
+ No controld device handling and controld protocol
+ Shifting responsibilities of node management to DLM layer

For backward compatibility, we are keeping the controld handling code.
Once enough time has passed we can remove a significant portion of the
code. This was tested by using the kernel with changes on older
unmodified tools. The kernel used ocfs2_controld as expected, and
displayed the appropriate warning message.

This feature requires modification in the userspace ocfs2-tools. The
changes can be found at: https://github.com/goldwynr/ocfs2-tools branch:
nocontrold Currently, not many checks are present in the userspace code,
but that would change soon.

This patch (of 6):

Add clustername to cluster connection.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# d00d2f8a 12-Nov-2013 Joe Perches <joe@perches.com>

ocfs2: convert use of typedef ctl_table to struct ctl_table

This typedef is unnecessary and should just be removed.

Signed-off-by: Joe Perches <joe@perches.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Acked-by: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# cbe0e331 30-Jan-2010 Joel Becker <joel.becker@oracle.com>

ocfs2_dlmfs: Enable the use of user cluster stacks.

Unlike ocfs2, dlmfs has no permanent storage. It can't store off a
cluster stack it is supposed to be using. So it can't specify the stack
name in ocfs2_cluster_connect().

Instead, we create ocfs2_cluster_connect_agnostic(), which simply uses
the stack that is currently enabled. This is find for dlmfs, which will
rely on the stack initialization.

We add the "stackglue" capability to dlmfs's capability list. This lets
userspace know dlmfs can be used with all cluster stacks.

Signed-off-by: Joel Becker <joel.becker@oracle.com>


# 553b5eb9 29-Jan-2010 Joel Becker <joel.becker@oracle.com>

ocfs2: Pass the locking protocol into ocfs2_cluster_connect().

Inside the stackglue, the locking protocol structure is hanging off of
the ocfs2_cluster_connection. This takes it one further; the locking
protocol is passed into ocfs2_cluster_connect(). Now different cluster
connections can have different locking protocols with distinct asts.
Note that all locking protocols have to keep their maximum protocol
version in lock-step.

With the protocol structure set in ocfs2_cluster_connect(), there is no
need for the stackglue to have a static pointer to a specific protocol
structure. We can change initialization to only pass in the maximum
protocol version.

Signed-off-by: Joel Becker <joel.becker@oracle.com>


# e603cfb0 29-Jan-2010 Joel Becker <joel.becker@oracle.com>

ocfs2: Remove the ast pointers from ocfs2_stack_plugins

With the full ocfs2_locking_protocol hanging off of the
ocfs2_cluster_connection, ast wrappers can get the ast/bast pointers
there. They don't need to get them from their plugin structure.

The user plugin still needs the maximum locking protocol version,
though. This changes the plugin structure so that it only holds the max
version, not the entire ocfs2_locking_protocol pointer.

Signed-off-by: Joel Becker <joel.becker@oracle.com>


# 110946c8 29-Jan-2010 Joel Becker <joel.becker@oracle.com>

ocfs2: Hang the locking proto on the cluster conn and use it in asts.

With the ocfs2_cluster_connection hanging off of the ocfs2_dlm_lksb, we
have access to it in the ast and bast wrapper functions. Attach the
ocfs2_locking_protocol to the conn.

Now, instead of refering to a static variable for ast/bast pointers, the
wrappers can look at the connection. This means different connections
can have different ast/bast pointers, and it reduces the need for the
static pointer.

Signed-off-by: Joel Becker <joel.becker@oracle.com>


# c0e41338 29-Jan-2010 Joel Becker <joel.becker@oracle.com>

ocfs2: Attach the connection to the lksb

We're going to want it in the ast functions, so we convert union
ocfs2_dlm_lksb to struct ocfs2_dlm_lksb and let it carry the connection.

Signed-off-by: Joel Becker <joel.becker@oracle.com>


# a796d286 28-Jan-2010 Joel Becker <joel.becker@oracle.com>

ocfs2: Pass lksbs back from stackglue ast/bast functions.

The stackglue ast and bast functions tried to maintain the fiction that
their arguments were void pointers. In reality, stack_user.c had to
know that the argument was an ocfs2_lock_res in order to get the status
off of the lksb. That's ugly.

This changes stackglue to always pass the lksb as the argument to ast
and bast functions. The caller can always use container_of() to get the
ocfs2_lock_res or user_dlm_lock_res. The net effect to the caller is
zero. They still get back the lockres in their ast. stackglue gets
cleaner, and now can use the lksb itself.

Signed-off-by: Joel Becker <joel.becker@oracle.com>


# 6d456111 16-Nov-2009 Eric W. Biederman <ebiederm@xmission.com>

sysctl: Drop & in front of every proc_handler.

For consistency drop & in front of every proc_handler. Explicity
taking the address is unnecessary and it prevents optimizations
like stubbing the proc_handlers to NULL.

Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>


# ab09203e 05-Nov-2009 Eric W. Biederman <ebiederm@xmission.com>

sysctl fs: Remove dead binary sysctl support

Now that sys_sysctl is a generic wrapper around /proc/sys .ctl_name
and .strategy members of sysctl tables are dead code. Remove them.

Cc: Jan Harkes <jaharkes@cs.cmu.edu>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>


# 1c520dfb 19-Jun-2009 Joel Becker <joel.becker@oracle.com>

ocfs2: Provide the ocfs2_dlm_lvb_valid() stack API.

The Lock Value Block (LVB) of a DLM lock can be lost when nodes die and
the DLM cannot reconstruct its state. Clients of the DLM need to know
this.

ocfs2's internal DLM, o2dlm, explicitly zeroes out the LVB when it loses
track of the state. This is not a standard behavior, but ocfs2 has
always relied on it. Thus, an o2dlm LVB is always "valid".

ocfs2 now supports both o2dlm and fs/dlm via the stack glue. When
fs/dlm loses track of an LVBs state, it sets a flag
(DLM_SBF_VALNOTVALID) on the Lock Status Block (LKSB). The contents of
the LVB may be garbage or merely stale.

ocfs2 doesn't want to try to guess at the validity of the stale LVB.
Instead, it should be checking the VALNOTVALID flag. As this is the
'standard' way of treating LVBs, we will promote this behavior.

We add a stack glue API ocfs2_dlm_lvb_valid(). It returns non-zero when
the LVB is valid. o2dlm will always return valid, while fs/dlm will
check VALNOTVALID.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Acked-by: Mark Fasheh <mfasheh@suse.com>


# 009d3750 06-Oct-2008 Mark Fasheh <mfasheh@suse.com>

ocfs2: Remove pointless !!

ocfs2_stack_supports_plocks() doesn't need this to properly return a zero or
one value.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>


# 53da4939 21-Jul-2008 Mark Fasheh <mfasheh@suse.com>

ocfs2: POSIX file locks support

This is actually pretty easy since fs/dlm already handles the bulk of the
work. The Ocfs2 userspace cluster stack module already uses fs/dlm as the
underlying lock manager, so I only had to add the right calls.

Cluster-aware POSIX locks ("plocks") can be turned off by the same means at
UNIX locks - mount with 'noflocks', or create a local-only Ocfs2 volume.
Internally, the file system uses two sets of file_operations, depending on
whether cluster aware plocks is required. This turns out to be easier than
implementing local-only versions of ->lock.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>


# d6817cdb 22-Aug-2008 Joel Becker <Joel.Becker@oracle.com>

ocfs2: Increment the reference count of an already-active stack.

The ocfs2_stack_driver_request() function failed to increment the
refcount of an already-active stack. It only did the increment on the
first reference. Whoops.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Tested-by: Marcos Matsunaga <marcos.matsunaga@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>


# 2c39450b 30-May-2008 Joel Becker <joel.becker@oracle.com>

ocfs2: Remove ->hangup() from stack glue operations.

The ->hangup() call was only used to execute ocfs2_hb_ctl. Now that
the generic stack glue code handles this, the underlying stack drivers
don't need to know about it.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>


# 9f9a99f4 30-May-2008 Joel Becker <joel.becker@oracle.com>

ocfs2: Move the call of ocfs2_hb_ctl into the stack glue.

Take o2hb_stop() out of the o2cb code and make it part of the generic
stack glue as ocfs2_leave_group(). This also allows us to remove the
ocfs2_get_hb_ctl_path() function - everything to do with hb_ctl is now
part of stackglue.c. o2cb no longer needs a ->hangup() function.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>


# 3878f110 30-May-2008 Joel Becker <joel.becker@oracle.com>

ocfs2: Move the hb_ctl_path sysctl into the stack glue.

ocfs2 needs to call out to the hb_ctl program at unmount for all cluster
stacks. The first step is to move the hb_ctl_path sysctl out of the
o2cb code and into the generic stack glue.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>


# cf4d8d75 20-Feb-2008 David Teigland <teigland@redhat.com>

ocfs2: add fsdlm to stackglue

Add code to use fs/dlm.

[ Modified to be part of the stack_user module -- Joel ]

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>


# 9c6c877c 01-Feb-2008 Joel Becker <joel.becker@oracle.com>

ocfs2: Add the 'cluster_stack' sysfs file.

Userspace can now query and specify the cluster stack in use via the
/sys/fs/ocfs2/cluster_stack file. By default, it is 'o2cb', which is
the classic stack. Thus, old tools that do not know how to modify this
file will work just fine. The stack cannot be modified if there is a
live filesystem.

ocfs2_cluster_connect() now takes the expected cluster stack as an
argument. This way, the filesystem and the stack glue ensure they are
speaking to the same backend.

If the stack is 'o2cb', the o2cb stack plugin is used. For any other
value, the fsdlm stack plugin is selected.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>


# 74ae4e10 01-Feb-2008 Joel Becker <joel.becker@oracle.com>

ocfs2: Create stack glue sysfs files.

Introduce a set of sysfs files that describe the current stack glue
state. The files live under /sys/fs/ocfs2. The locking_protocol file
displays the version of ocfs2's locking code. The
loaded_cluster_plugins file displays all of the currently loaded stack
plugins. When filesystems are mounted, the active_cluster_plugin file
will display the plugin in use.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>


# 286eaa95 01-Feb-2008 Joel Becker <joel.becker@oracle.com>

ocfs2: Break out stackglue into modules.

We define the ocfs2_stack_plugin structure to represent a stack driver.
The o2cb stack code is split into stack_o2cb.c. This becomes the
ocfs2_stack_o2cb.ko module.

The stackglue generic functions are similarly split into the
ocfs2_stackglue.ko module. This module now provides an interface to
register drivers. The ocfs2_stack_o2cb driver registers itself. As
part of this interface, ocfs2_stackglue can load drivers on demand.
This is accomplished in ocfs2_cluster_connect().

ocfs2_cluster_disconnect() is now notified when a _hangup() is pending.
If a hangup is pending, it will not release the driver module and will
let _hangup() do that.

Signed-off-by: Joel Becker <joel.becker@oracle.com>


# e3dad42b 01-Feb-2008 Joel Becker <joel.becker@oracle.com>

ocfs2: Create ocfs2_stack_operations and split out the o2cb stack.

Define the ocfs2_stack_operations structure. Build o2cb_stack_ops from
all of the o2cb-specific stack functions. Change the generic stack glue
functions to call the stack_ops instead of the o2cb functions directly.

The o2cb functions are moved to stack_o2cb.c. The headers are cleaned up
to where only needed headers are included.

In this code, stackglue.c and stack_o2cb.c refer to some shared
extern variables. When they become modules, that will change.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>


# 553aa7e4 01-Feb-2008 Joel Becker <joel.becker@oracle.com>

ocfs2: Split o2cb code from generic stack functions.

Split off the o2cb-specific funtionality from the generic stack glue
calls. This is a precurser to wrapping the o2cb functionality in an
operations vector.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>


# 63e0c48a 30-Jan-2008 Joel Becker <joel.becker@oracle.com>

ocfs2: Clean up stackglue initialization

The stack glue initialization function needs a better name so that it can be
used cleanly when stackglue becomes a module.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>


# cf0acdcd 29-Jan-2008 Joel Becker <joel.becker@oracle.com>

ocfs2: Abstract out a debugging function for underlying dlms.

dlmglue.c was still referencing a raw o2dlm lksb in one instance. Let's
create a generic ocfs2_dlm_dump_lksb() function. This allows underlying
DLMs to print whatever they want about their lock.

We then move the o2dlm dump into stackglue.c where it belongs.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>


# de551246 01-Feb-2008 Joel Becker <joel.becker@oracle.com>

ocfs2: Remove CANCELGRANT from the view of dlmglue.

o2dlm has the non-standard behavior of providing a cancel callback
(unlock_ast) even when the cancel has failed (the locking operation
succeeded without canceling). This is called CANCELGRANT after the
status code sent to the callback. fs/dlm does not provide this
callback, so dlmglue must be changed to live without it.
o2dlm_unlock_ast_wrapper() in stackglue now ignores CANCELGRANT calls.

Because dlmglue no longer sees CANCELGRANT, ocfs2_unlock_ast() no longer
needs to check for it. ocfs2_locking_ast() must catch that a cancel was
tried and clear the cancel state.

Making these changes opens up a locking race. dlmglue uses the the
OCFS2_LOCK_BUSY flag to ensure only one thread is calling the dlm at any
one time. But dlmglue must unlock the lockres before calling into the
dlm. In the small window of time between unlocking the lockres and
calling the dlm, the downconvert thread can try to cancel the lock. The
downconvert thread is checking the OCFS2_LOCK_BUSY flag - it doesn't
know that ocfs2_dlm_lock() has not yet been called.

Because ocfs2_dlm_lock() has not yet been called, the cancel operation
will just be a no-op. There's nothing to cancel. With CANCELGRANT,
dlmglue uses the CANCELGRANT callback to clear up the cancel state.
When it comes around again, it will retry the cancel. Eventually, the
first thread will have called into ocfs2_dlm_lock(), and either the
lock or the cancel will succeed. The downconvert thread can then do its
downconvert.

Without CANCELGRANT, there is nothing to clean up the cancellation
state. The downconvert thread does not know to retry its operations.
More importantly, the original lock may be blocking on the other node
that is trying to cancel us. With neither able to make progress, the
ast is never called and the cancellation state is never cleaned up that
way. dlmglue is deadlocked.

The OCFS2_LOCK_PENDING flag is introduced to remedy this window. It is
set at the same time OCFS2_LOCK_BUSY is. Thus, the downconvert thread
can check whether the lock is cancelable. If not, it just loops around
to try again. Once ocfs2_dlm_lock() is called, the thread then clears
OCFS2_LOCK_PENDING and wakes the downconvert thread. Now, if the
downconvert thread finds the lock BUSY, it can safely try to cancel it.
Whether the cancel works or not, the state will be properly set and the
lock processing can continue.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>


# 6953b4c0 29-Jan-2008 Joel Becker <joel.becker@oracle.com>

ocfs2: Move o2hb functionality into the stack glue.

The last bit of classic stack used directly in ocfs2 code is o2hb.
Specifically, the check for heartbeat during mount and the call to
ocfs2_hb_ctl during unmount.

We create an extra API, ocfs2_cluster_hangup(), to encapsulate the call
to ocfs2_hb_ctl. Other stacks will just leave hangup() empty.

The check for heartbeat is moved into ocfs2_cluster_connect(). It will
be matched by a similar check for other stacks.

With this change, only stackglue.c includes cluster/ headers.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>


# 19fdb624 30-Jan-2008 Joel Becker <joel.becker@oracle.com>

ocfs2: Abstract out node number queries.

ocfs2 asks the cluster stack for the local node's node number for two
reasons; to fill the slot map and to print it. While the slot map isn't
necessary for userspace cluster stacks, the printing is very nice for
debugging. Thus we add ocfs2_cluster_this_node() as a generic API to get
this value. It is anticipated that the slot map will not be used under a
userspace cluster stack, so validity checks of the node num only need to
exist in the slot map code. Otherwise, it just gets used and printed as an
opaque value.

[ Fixed up some "int" versus "unsigned int" issues and made osb->node_num
truly opaque. --Mark ]

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>


# 4670c46d 01-Feb-2008 Joel Becker <joel.becker@oracle.com>

ocfs2: Introduce the new ocfs2_cluster_connect/disconnect() API.

This step introduces a cluster stack agnostic API for initializing and
exiting. fs/ocfs2/dlmglue.c no longer uses o2cb/o2dlm knowledge to
connect to the stack. It is all handled in stackglue.c.

heartbeat.c no longer needs to know how it gets called.
ocfs2_do_node_down() is now a clean recovery trigger.

The big gotcha is the ordering of initializations and de-initializations done
underneath ocfs2_cluster_connect(). ocfs2_dlm_init() used to do all
o2dlm initialization in one block. Thus, the o2dlm functionality of
ocfs2_cluster_connect() is very straightforward. ocfs2_dlm_shutdown(),
however, did a few things between de-registration of the eviction
callback and actually shutting down the domain. Now de-registration and
shutdown of the domain are wrapped within the single
ocfs2_cluster_disconnect() call. I've checked the code paths to make
sure we can safely tear down things in ocfs2_dlm_shutdown() before
calling ocfs2_cluster_disconnect(). The filesystem has already set
itself to ignore the callback.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>


# 8f2c9c1b 01-Feb-2008 Joel Becker <joel.becker@oracle.com>

ocfs2: Create the lock status block union.

Wrap the lock status block (lksb) in a union. Later we will add a union
element for the fs/dlm lksb. Create accessors for the status and lvb
fields.

Other than a debugging function, dlmglue.c does not directly reference
the o2dlm locking path anymore.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>


# 7431cd7e 01-Feb-2008 Joel Becker <joel.becker@oracle.com>

ocfs2: Use -errno instead of dlm_status for ocfs2_dlm_lock/unlock() API.

Change the ocfs2_dlm_lock/unlock() functions to return -errno values.
This is the first step towards elminiating dlm_status in
fs/ocfs2/dlmglue.c. The change also passes -errno values to
->unlock_ast().

[ Fix a return code in dlmglue.c and change the error translation table into
an array of ints. --Mark ]

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>


# bd3e7610 01-Feb-2008 Joel Becker <joel.becker@oracle.com>

ocfs2: Use global DLM_ constants in generic code.

The ocfs2 generic code should use the values in <linux/dlmconstants.h>.
stackglue.c will convert them to o2dlm values.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>


# 24ef1815 29-Jan-2008 Joel Becker <joel.becker@oracle.com>

ocfs2: Separate out dlm lock functions.

This is the first in a series of patches to isolate ocfs2 from the
underlying cluster stack. Here we wrap the dlm locking functions with
ocfs2-specific calls. Because ocfs2 always uses the same dlm lock status
callbacks, we can eliminate the callbacks from the filesystem visible
functions.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>