History log of /linux-master/fs/dlm/user.c
Revision Date Author Comments
# 2ab3d705 12-Mar-2024 Alexander Aring <aahringo@redhat.com>

dlm: fix user space lkb refcounting

This patch fixes to check on the right return value if it was the last
callback. The rv variable got overwritten by the return of
copy_result_to_user(). Fixing it by introducing a second variable for
the return value and don't let rv being overwritten.

Cc: stable@vger.kernel.org
Fixes: 61bed0baa4db ("fs: dlm: use a non-static queue for callbacks")
Reported-by: Valentin Vidić <vvidic@valentin-vidic.from.hr>
Closes: https://lore.kernel.org/gfs2/Ze4qSvzGJDt5yxC3@valentin-vidic.from.hr
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 5ce9ef30 29-May-2023 Alexander Aring <aahringo@redhat.com>

fs: dlm: move dlm_purge_lkb_callbacks to user module

This patch moves the dlm_purge_lkb_callbacks() function from ast to user
dlm module as it is only a function being used by dlm user
implementation. I got be hinted to hold specific locks regarding the
callback handling for dlm_purge_lkb_callbacks() but it was false
positive. It is confusing because ast dlm implementation uses a
different locking behaviour as user locks uses as DLM handles kernel and
user dlm locks differently. To avoid the confusing we move this function
to dlm user implementation.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# e1af8728 06-Mar-2023 Alexander Aring <aahringo@redhat.com>

fs: dlm: move internal flags to atomic ops

This patch will move the lkb_flags value to the recently introduced
lkb_iflags value. For lkb_iflags we use atomic bit operations because
some flags like DLM_IFL_CB_PENDING are used while non rsb lock is held
to avoid issues with other flag manipulations which might run at the
same time we switch to atomic bit operations. Snapshot the bit values to
an uint32_t value is only used for debugging/logging use cases and don't
need to be 100% correct.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 8a39dcd9 06-Mar-2023 Alexander Aring <aahringo@redhat.com>

fs: dlm: change dflags to use atomic bits

Currently manipulating lkb_dflags assumes to held the rsb lock assigned
to the lkb. This is held by dlm message processing after certain
time to lookup the right rsb from the received lkb message id. For user
space locks flags, which is currently the only use case for lkb_dflags,
flags are also being set during dlm character device handling without
holding the rsb lock. To minimize the risk that bit operations are
getting corrupted we switch to atomic bit operations. This patch will
also introduce helpers to snapshot atomic bit values in an non atomic
way. There might be still issues with the flag handling e.g. running in
case of manipulating bit ops and snapshot them at the same time, but this
patch minimize them and will start to use atomic bit operations.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 8c11ba64 06-Mar-2023 Alexander Aring <aahringo@redhat.com>

fs: dlm: store lkb distributed flags into own value

This patch stores lkb distributed flags value in an separate value
instead of sharing internal and distributed flags in lkb->lkb_flags value.
This has the advantage to not mask/write back flag values in
receive_flags() functionality. The dlm debug_fs does not provide the
distributed flags anymore, those can be added in future.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 01c7a597 06-Mar-2023 Alexander Aring <aahringo@redhat.com>

fs: dlm: remove deprecated code parts

This patch removes code parts which was declared deprecated by
commit 6b0afc0cc3e9 ("fs: dlm: don't use deprecated timeout features by
default"). This contains the following dlm functionality:

- start a cancel of a dlm request did not complete after certain timeout:
The current way how dlm cancellation works and interfering with other
dlm requests triggered by the user can end in an overlapping and
returning in -EBUSY. The most user don't handle this case and are
unaware that DLM can return such errno in such situation. Due the
timeout the user are mostly unaware when this happens.
- start a netlink warning messages for user space if dlm requests did
not complete after certain timeout:
This feature was never being built in the only known dlm user space side.
As we are to remove the timeout cancellation feature we can directly
remove this feature as well.

There might be the possibility to bring the timeout cancellation feature
back. However the current way of handling the -EBUSY case which is only
a software limitation and not a hardware limitation should be changed.
We minimize the current code base in DLM cancellation feature to not have
to deal with those existing features while solving the DLM cancellation
feature in general.

UAPI define DLM_LSFL_TIMEWARN is commented as deprecated and reserved
value. We should avoid at first to give it a new meaning but let
possible users still compile by keeping this define. In far future we
can give this flag a new meaning. The same for the DLM_LKF_TIMEOUT lock
request flag.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# a034c137 06-Mar-2023 Alexander Aring <aahringo@redhat.com>

fs: dlm: fix DLM_IFL_CB_PENDING gets overwritten

This patch introduce a new internal flag per lkb value to handle
internal flags which are handled not on wire. The current lkb internal
flags stored as lkb->lkb_flags are split in upper and lower bits, the
lower bits are used to share internal flags over wire for other cluster
wide lkb copies on other nodes.

In commit 61bed0baa4db ("fs: dlm: use a non-static queue for callbacks")
we introduced a new internal flag for pending callbacks for the dlm
callback queue. This flag is protected by the lkb->lkb_cb_lock lock.
This patch overlooked that on dlm receive path and the mentioned upper
and lower bits, that dlm will read the flags, mask it and write it
back. As example receive_flags() in fs/dlm/lock.c. This flag
manipulation is not done atomically and is not protected by
lkb->lkb_cb_lock. This has unknown side effects of the current callback
handling.

In future we should move to set/clear/test bit functionality and avoid
read, mask and writing back flag values. In later patches we will move
the upper parts to the new introduced internal lkb flags which are not
shared between other cluster nodes to the new non shared internal flag
field to avoid similar issues.

Cc: stable@vger.kernel.org
Fixes: 61bed0baa4db ("fs: dlm: use a non-static queue for callbacks")
Reported-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 554d8496 17-Nov-2022 Alexander Aring <aahringo@redhat.com>

fs: dlm: rename DLM_IFL_NEED_SCHED to DLM_IFL_CB_PENDING

This patch renames DLM_IFL_NEED_SCHED to DLM_IFL_CB_PENDING because
CB_PENDING is a proper name to describe this flag. This flag is set when
callback enqueue will return DLM_ENQUEUE_CALLBACK_NEED_SCHED because the
callback worker need to be queued. The flag tells that callbacks are
currently pending to be called and will be unset if the callback work
for the specific lkb is done. The term need schedule is part of this
time but a proper name is to say that there are some callbacks pending
to being called.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 740bb8fc 17-Nov-2022 Alexander Aring <aahringo@redhat.com>

fs: dlm: ast do WARN_ON_ONCE() on hotpath

This patch changes the ast hotpath functionality in very unlikely cases
that we do WARN_ON_ONCE() instead of WARN_ON() to not spamming the
console output if we run into states that it would occur over and over
again.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 61bed0ba 27-Oct-2022 Alexander Aring <aahringo@redhat.com>

fs: dlm: use a non-static queue for callbacks

This patch will introducde a queue implementation for callbacks by using
the Linux lists. The current callback queue handling is implemented by a
static limit of 6 entries, see DLM_CALLBACKS_SIZE. The sequence number
inside the callback structure was used to see if the entries inside the
static entry is valid or not. We don't need any sequence numbers anymore
with a dynamic datastructure with grows and shrinks during runtime to
offer such functionality.

We assume that every callback will be delivered to the DLM user if once
queued. Therefore the callback flag DLM_CB_SKIP was dropped and the
check for skipping bast was moved before worker handling and not skip
while the callback worker executes. This will reduce unnecessary queues
of the callback worker.

All last callback saves are pointers now and don't need to copied over.
There is a reference counter for callback structures which will care
about to free the callback structures at the right time if they are not
referenced anymore.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# d3e4dc5d 27-Oct-2022 Alexander Aring <aahringo@redhat.com>

fs: dlm: use list_first_entry marco

Instead of using list_entry() this patch moves to using the
list_first_entry() macro.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 12cda13c 15-Aug-2022 Alexander Aring <aahringo@redhat.com>

fs: dlm: remove DLM_LSFL_FS from uapi

The DLM_LSFL_FS flag is set in lockspaces created directly
for a kernel user, as opposed to those lockspaces created
for user space applications. The user space libdlm allowed
this flag to be set for lockspaces created from user space,
but then used by a kernel user. No kernel user has ever
used this method, so remove the ability to do it.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 7a3de732 15-Aug-2022 Alexander Aring <aahringo@redhat.com>

fs: dlm: trace user space callbacks

This patch adds trace callbacks for user locks. Unfortenately user locks
are handled in a different way than kernel locks in some cases. User
locks never call the dlm_lock()/dlm_unlock() kernel API and use the next
step internal API of dlm. Adding those traces from user API callers
should make it possible for dlm trace system to see lock handling for
user locks as well.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 296d9d1e 15-Aug-2022 Alexander Aring <aahringo@redhat.com>

fs: dlm: change ls_clear_proc_locks to spinlock

This patch changes the ls_clear_proc_locks to a spinlock because there
is no need to handle it as a mutex as there is no sleepable context when
ls_clear_proc_locks is held. This allows us to call those functionality
in non-sleepable contexts.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 6b0afc0c 22-Jun-2022 Alexander Aring <aahringo@redhat.com>

fs: dlm: don't use deprecated timeout features by default

This patch will disable use of deprecated timeout features if
CONFIG_DLM_DEPRECATED_API is not set. The deprecated features
will be removed in upcoming kernel release v6.2.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 81eeb82f 22-Jun-2022 Alexander Aring <aahringo@redhat.com>

fs: dlm: add deprecation Kconfig and warnings for timeouts

This patch adds a CONFIG_DLM_DEPRECATED_API Kconfig option
that must be enabled to use two timeout-related features
that we intend to remove in kernel v6.2. Warnings are
printed if either is enabled and used. Neither has ever
been used as far as we know.

. The DLM_LSFL_TIMEWARN lockspace creation flag will be
removed, along with the associated configfs entry for
setting the timeout. Setting the flag and configfs file
would cause dlm to track how long locks were waiting
for reply messages. After a timeout, a kernel message
would be logged, and a netlink message would be sent
to userspace. Recently, midcomms messages have been
added that produce much better logging about actual
problems with messages. No use has ever been found
for the netlink messages.

. The userspace libdlm API has allowed the DLM_LKF_TIMEOUT
flag with a timeout value to be set in lock requests.
The lock request would be cancelled after the timeout.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 8d614a44 22-Jun-2022 Alexander Aring <aahringo@redhat.com>

fs: dlm: remove timeout from dlm_user_adopt_orphan

Remove the unused timeout parameter from dlm_user_adopt_orphan().

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# c087eabd 04-Apr-2022 Alexander Aring <aahringo@redhat.com>

dlm: remove __user conversion warnings

This patch avoids the following sparse warning:

fs/dlm/user.c:111:38: warning: incorrect type in assignment (different address spaces)
fs/dlm/user.c:111:38: expected void [noderef] __user *castparam
fs/dlm/user.c:111:38: got void *
fs/dlm/user.c:112:37: warning: incorrect type in assignment (different address spaces)
fs/dlm/user.c:112:37: expected void [noderef] __user *castaddr
fs/dlm/user.c:112:37: got void *
fs/dlm/user.c:113:38: warning: incorrect type in assignment (different address spaces)
fs/dlm/user.c:113:38: expected void [noderef] __user *bastparam
fs/dlm/user.c:113:38: got void *
fs/dlm/user.c:114:37: warning: incorrect type in assignment (different address spaces)
fs/dlm/user.c:114:37: expected void [noderef] __user *bastaddr
fs/dlm/user.c:114:37: got void *
fs/dlm/user.c:115:33: warning: incorrect type in assignment (different address spaces)
fs/dlm/user.c:115:33: expected struct dlm_lksb [noderef] __user *lksb
fs/dlm/user.c:115:33: got void *
fs/dlm/user.c:130:39: warning: cast removes address space '__user' of expression
fs/dlm/user.c:131:40: warning: cast removes address space '__user' of expression
fs/dlm/user.c:132:36: warning: cast removes address space '__user' of expression

So far I see there is no direct handling of copying a pointer value to
another pointer value. The handling only copies the actual pointer
address to a scalar type or vice versa. This should be okay because it
never handles dereferencing anything of those addresses in the kernel
space. To get rid of those warnings we doing some different casting
which results in no warnings in sparse or compiler.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 3c80d379 09-Mar-2020 Gustavo A. R. Silva <gustavo@embeddedor.com>

dlm: user: Replace zero-length array with flexible-array member

The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
int stuff;
struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 2522fe45 28-May-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 193

Based on 1 normalized pattern(s):

this copyrighted material is made available to anyone wishing to use
modify copy or redistribute it subject to the terms and conditions
of the gnu general public license v 2

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-only

has been chosen to replace the boilerplate/reference in 45 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Richard Fontana <rfontana@redhat.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Steve Winslow <swinslow@gmail.com>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190528170027.342746075@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 3595c559 03-Dec-2018 David Teigland <teigland@redhat.com>

dlm: fix invalid cluster name warning

The warning added in commit 3b0e761ba83
"dlm: print log message when cluster name is not set"

did not account for the fact that lockspaces created
from userland do not supply a cluster name, so bogus
warnings are printed every time a userland lockspace
is created.

Signed-off-by: David Teigland <teigland@redhat.com>


# 9de30f3f 02-Nov-2018 Tycho Andersen <tycho@tycho.pizza>

dlm: don't leak kernel pointer to userspace

In copy_result_to_user(), we first create a struct dlm_lock_result, which
contains a struct dlm_lksb, the last member of which is a pointer to the
lvb. Unfortunately, we copy the entire struct dlm_lksb to the result
struct, which is then copied to userspace at the end of the function,
leaking the contents of sb_lvbptr, which is a valid kernel pointer in some
cases (indeed, later in the same function the data it points to is copied
to userspace).

It is an error to leak kernel pointers to userspace, as it undermines KASLR
protections (see e.g. 65eea8edc31 ("floppy: Do not copy a kernel pointer to
user memory in FDGETPRM ioctl") for another example of this).

Signed-off-by: Tycho Andersen <tycho@tycho.ws>
Signed-off-by: David Teigland <teigland@redhat.com>


# a9a08845 11-Feb-2018 Linus Torvalds <torvalds@linux-foundation.org>

vfs: do bulk POLL* -> EPOLL* replacement

This is the mindless scripted replacement of kernel use of POLL*
variables as described by Al, done by this script:

for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
done

with de-mangling cleanups yet to come.

NOTE! On almost all architectures, the EPOLL* constants have the same
values as the POLL* constants do. But they keyword here is "almost".
For various bad reasons they aren't the same, and epoll() doesn't
actually work quite correctly in some cases due to this on Sparc et al.

The next patch from Al will sort out the final differences, and we
should be all done.

Scripted-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 076ccb76 02-Jul-2017 Al Viro <viro@zeniv.linux.org.uk>

fs: annotate ->poll() instances

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 55acdd92 03-Aug-2017 Edwin Török <edvin.torok@citrix.com>

dlm: avoid double-free on error path in dlm_device_{register,unregister}

Can be reproduced when running dlm_controld (tested on 4.4.x, 4.12.4):
# seq 1 100 | xargs -P0 -n1 dlm_tool join
# seq 1 100 | xargs -P0 -n1 dlm_tool leave

misc_register fails due to duplicate sysfs entry, which causes
dlm_device_register to free ls->ls_device.name.
In dlm_device_deregister the name was freed again, causing memory
corruption.

According to the comment in dlm_device_deregister the name should've been
set to NULL when registration fails,
so this patch does that.

sysfs: cannot create duplicate filename '/dev/char/10:1'
------------[ cut here ]------------
warning: cpu: 1 pid: 4450 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x56/0x70
modules linked in: msr rfcomm dlm ccm bnep dm_crypt uvcvideo
videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core videodev
btusb media btrtl btbcm btintel bluetooth ecdh_generic intel_rapl
x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm
snd_hda_codec_hdmi irqbypass crct10dif_pclmul crc32_pclmul
ghash_clmulni_intel thinkpad_acpi pcbc nvram snd_seq_midi
snd_seq_midi_event aesni_intel snd_hda_codec_realtek snd_hda_codec_generic
snd_rawmidi aes_x86_64 crypto_simd glue_helper snd_hda_intel snd_hda_codec
cryptd intel_cstate arc4 snd_hda_core snd_seq snd_seq_device snd_hwdep
iwldvm intel_rapl_perf mac80211 joydev input_leds iwlwifi serio_raw
cfg80211 snd_pcm shpchp snd_timer snd mac_hid mei_me lpc_ich mei soundcore
sunrpc parport_pc ppdev lp parport autofs4 i915 psmouse
e1000e ahci libahci i2c_algo_bit sdhci_pci ptp drm_kms_helper sdhci
pps_core syscopyarea sysfillrect sysimgblt fb_sys_fops drm wmi video
cpu: 1 pid: 4450 comm: dlm_test.exe not tainted 4.12.4-041204-generic
hardware name: lenovo 232425u/232425u, bios g2et82ww (2.02 ) 09/11/2012
task: ffff96b0cbabe140 task.stack: ffffb199027d0000
rip: 0010:sysfs_warn_dup+0x56/0x70
rsp: 0018:ffffb199027d3c58 eflags: 00010282
rax: 0000000000000038 rbx: ffff96b0e2c49158 rcx: 0000000000000006
rdx: 0000000000000000 rsi: 0000000000000086 rdi: ffff96b15e24dcc0
rbp: ffffb199027d3c70 r08: 0000000000000001 r09: 0000000000000721
r10: ffffb199027d3c00 r11: 0000000000000721 r12: ffffb199027d3cd1
r13: ffff96b1592088f0 r14: 0000000000000001 r15: ffffffffffffffef
fs: 00007f78069c0700(0000) gs:ffff96b15e240000(0000)
knlgs:0000000000000000
cs: 0010 ds: 0000 es: 0000 cr0: 0000000080050033
cr2: 000000178625ed28 cr3: 0000000091d3e000 cr4: 00000000001406e0
call trace:
sysfs_do_create_link_sd.isra.2+0x9e/0xb0
sysfs_create_link+0x25/0x40
device_add+0x5a9/0x640
device_create_groups_vargs+0xe0/0xf0
device_create_with_groups+0x3f/0x60
? snprintf+0x45/0x70
misc_register+0x140/0x180
device_write+0x6a8/0x790 [dlm]
__vfs_write+0x37/0x160
? apparmor_file_permission+0x1a/0x20
? security_file_permission+0x3b/0xc0
vfs_write+0xb5/0x1a0
sys_write+0x55/0xc0
? sys_fcntl+0x5d/0xb0
entry_syscall_64_fastpath+0x1e/0xa9
rip: 0033:0x7f78083454bd
rsp: 002b:00007f78069bbd30 eflags: 00000293 orig_rax: 0000000000000001
rax: ffffffffffffffda rbx: 0000000000000006 rcx: 00007f78083454bd
rdx: 000000000000009c rsi: 00007f78069bee00 rdi: 0000000000000005
rbp: 00007f77f8000a20 r08: 000000000000fcf0 r09: 0000000000000032
r10: 0000000000000024 r11: 0000000000000293 r12: 00007f78069bde00
r13: 00007f78069bee00 r14: 000000000000000a r15: 00007f78069bbd70
code: 85 c0 48 89 c3 74 12 b9 00 10 00 00 48 89 c2 31 f6 4c 89 ef e8 2c c8
ff ff 4c 89 e2 48 89 de 48 c7 c7 b0 8e 0c a8 e8 41 e8 ed ff <0f> ff 48 89
df e8 00 d5 f4 ff 5b 41 5c 41 5d 5d c3 66 0f 1f 84
---[ end trace 40412246357cc9e0 ]---

dlm: 59f24629-ae39-44e2-9030-397ebc2eda26: leaving the lockspace group...
bug: unable to handle kernel null pointer dereference at 0000000000000001
ip: [<ffffffff811a3b4a>] kmem_cache_alloc+0x7a/0x140
pgd 0
oops: 0000 [#1] smp
modules linked in: dlm 8021q garp mrp stp llc openvswitch nf_defrag_ipv6
nf_conntrack libcrc32c iptable_filter dm_multipath crc32_pclmul dm_mod
aesni_intel psmouse aes_x86_64 sg ablk_helper cryptd lrw gf128mul
glue_helper i2c_piix4 nls_utf8 tpm_tis tpm isofs nfsd auth_rpcgss
oid_registry nfs_acl lockd grace sunrpc xen_wdt ip_tables x_tables autofs4
hid_generic usbhid hid sr_mod cdrom sd_mod ata_generic pata_acpi 8139too
serio_raw ata_piix 8139cp mii uhci_hcd ehci_pci ehci_hcd libata
scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua scsi_mod ipv6
cpu: 0 pid: 394 comm: systemd-udevd tainted: g w 4.4.0+0 #1
hardware name: xen hvm domu, bios 4.7.2-2.2 05/11/2017
task: ffff880002410000 ti: ffff88000243c000 task.ti: ffff88000243c000
rip: e030:[<ffffffff811a3b4a>] [<ffffffff811a3b4a>]
kmem_cache_alloc+0x7a/0x140
rsp: e02b:ffff88000243fd90 eflags: 00010202
rax: 0000000000000000 rbx: ffff8800029864d0 rcx: 000000000007b36c
rdx: 000000000007b36b rsi: 00000000024000c0 rdi: ffff880036801c00
rbp: ffff88000243fdc0 r08: 0000000000018880 r09: 0000000000000054
r10: 000000000000004a r11: ffff880034ace6c0 r12: 00000000024000c0
r13: ffff880036801c00 r14: 0000000000000001 r15: ffffffff8118dcc2
fs: 00007f0ab77548c0(0000) gs:ffff880036e00000(0000) knlgs:0000000000000000
cs: e033 ds: 0000 es: 0000 cr0: 0000000080050033
cr2: 0000000000000001 cr3: 000000000332d000 cr4: 0000000000040660
stack:
ffffffff8118dc90 ffff8800029864d0 0000000000000000 ffff88003430b0b0
ffff880034b78320 ffff88003430b0b0 ffff88000243fdf8 ffffffff8118dcc2
ffff8800349c6700 ffff8800029864d0 000000000000000b 00007f0ab7754b90
call trace:
[<ffffffff8118dc90>] ? anon_vma_fork+0x60/0x140
[<ffffffff8118dcc2>] anon_vma_fork+0x92/0x140
[<ffffffff8107033e>] copy_process+0xcae/0x1a80
[<ffffffff8107128b>] _do_fork+0x8b/0x2d0
[<ffffffff81071579>] sys_clone+0x19/0x20
[<ffffffff815a30ae>] entry_syscall_64_fastpath+0x12/0x71
] code: f6 75 1c 4c 89 fa 44 89 e6 4c 89 ef e8 a7 e4 00 00 41 f7 c4 00 80
00 00 49 89 c6 74 47 eb 32 49 63 45 20 48 8d 4a 01 4d 8b 45 00 <49> 8b 1c
06 4c 89 f0 65 49 0f c7 08 0f 94 c0 84 c0 74 ac 49 63
rip [<ffffffff811a3b4a>] kmem_cache_alloc+0x7a/0x140
rsp <ffff88000243fd90>
cr2: 0000000000000001
--[ end trace 70cb9fd1b164a0e8 ]--

CC: stable@vger.kernel.org
Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 8286d6b1 22-Feb-2017 Vlad Tsyrklevich <vlad@tsyrklevich.net>

dlm: Fix kernel memory disclosure

Clear the 'unused' field and the uninitialized padding in 'lksb' to
avoid leaking memory to userland in copy_result_to_user().

Signed-off-by: Vlad Tsyrklevich <vlad@tsyrklevich.net>
Signed-off-by: David Teigland <teigland@redhat.com>


# 174cd4b1 02-Feb-2017 Ingo Molnar <mingo@kernel.org>

sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h>

Fix up affected files that include this signal functionality via sched.h.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 7963b8a5 19-Sep-2016 Paul Gortmaker <paul.gortmaker@windriver.com>

dlm: audit and remove any unnecessary uses of module.h

Historically a lot of these existed because we did not have
a distinction between what was modular code and what was providing
support to modules via EXPORT_SYMBOL and friends. That changed
when we forked out support for the latter into the export.h file.
This means we should be able to reduce the usage of module.h
in code that is obj-y Makefile or bool Kconfig.

In the case of some code where it is modular, we can extend that to
also include files that are building basic support functionality but
not related to loading or registering the final module; such files
also have no need whatsoever for module.h

The advantage in removing such instances is that module.h itself
sources about 15 other headers; adding significantly to what we feed
cpp, and it can obscure what headers we are effectively using.

Since module.h might have been the implicit source for init.h
(for __init) and for export.h (for EXPORT_SYMBOL) we consider each
instance for the presence of either and replace as needed.

In the dlm case, we remove module.h from a global header and only
introduce it in the files where it is explicitly required, since
there is nothing modular in dlm_internal.h itself.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 117aa41e 21-Jan-2016 Al Viro <viro@zeniv.linux.org.uk>

[regression] fix braino in fs/dlm/user.c

it's "bugger off if we got ERR_PTR", not the other way round...

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 16e5c1fc 23-Dec-2015 Al Viro <viro@zeniv.linux.org.uk>

convert a bunch of open-coded instances of memdup_user_nul()

A _lot_ of ->write() instances were open-coding it; some are
converted to memdup_user_nul(), a lot more remain...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# b96f4650 24-Aug-2015 David Teigland <teigland@redhat.com>

dlm: fix lvb copy for user locks

For a userland lock request, the previous and current
lock modes are used to decide when the lvb should be
copied back to the user. The wrong previous value was
used, so that it always matched the current value.
This caused the lvb to be copied back to the user in
the wrong cases.

Signed-off-by: David Teigland <teigland@redhat.com>


# f368ed60 30-Jul-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>

char: make misc_deregister a void function

With well over 200+ users of this api, there are a mere 12 users that
actually checked the return value of this function. And all of them
really didn't do anything with that information as the system or module
was shutting down no matter what.

So stop pretending like it matters, and just return void from
misc_deregister(). If something goes wrong in the call, you will get a
WARNING splat in the syslog so you know how to fix up your driver.
Other than that, there's nothing that can go wrong.

Cc: Alasdair Kergon <agk@redhat.com>
Cc: Neil Brown <neilb@suse.com>
Cc: Oleg Drokin <oleg.drokin@intel.com>
Cc: Andreas Dilger <andreas.dilger@intel.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Wim Van Sebroeck <wim@iguana.be>
Cc: Christine Caulfield <ccaulfie@redhat.com>
Cc: David Teigland <teigland@redhat.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Acked-by: Joel Becker <jlbec@evilplan.org>
Acked-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Acked-by: Alessandro Zummo <a.zummo@towertech.it>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 2ab4bd8e 17-Oct-2014 David Teigland <teigland@redhat.com>

dlm: adopt orphan locks

A process may exit, leaving an orphan lock in the lockspace.
This adds the capability for another process to acquire the
orphan lock. Acquiring the orphan just moves the lock from
the orphan list onto the acquiring process's list of locks.

An adopting process must specify the resource name and mode
of the lock it wants to adopt. If a matching lock is found,
the lock is moved to the caller's 's list of locks, and the
lkid of the lock is returned like the lkid of a new lock.

If an orphan with a different mode is found, then -EAGAIN is
returned. If no orphan lock is found on the resource, then
-ENOENT is returned. No async completion is used because
the result is immediately available.

Also, when orphans are purged, allow a zero nodeid to refer
to the local nodeid so the caller does not need to look up
the local nodeid.

Signed-off-by: David Teigland <teigland@redhat.com>


# c6ca7bc9 12-Aug-2013 David Teigland <teigland@redhat.com>

dlm: remove signal blocking

The signal blocking was incorrect and unnecessary
so just remove it.

Signed-off-by: David Teigland <teigland@redhat.com>


# 201d3dfa 09-Aug-2013 Oleg Nesterov <oleg@redhat.com>

dlm: kill the unnecessary and wrong device_close()->recalc_sigpending()

device_close()->recalc_sigpending() is not needed, sigprocmask() takes
care of TIF_SIGPENDING correctly.

And without ->siglock it is racy and wrong, it can wrongly clear
TIF_SIGPENDING and miss a signal.

But even with this patch device_close() is still buggy:

1. sigprocmask() should not be used, we have set_task_blocked(),
but this is minor.

2. We should never block SIGKILL or SIGSTOP, and this is what
the code tries to do.

3. This can't protect against SIGKILL or SIGSTOP anyway. Another
thread can do signal_wake_up(), say, do_signal_stop() or
complete_signal() or debugger.

4. sigprocmask(SIG_BLOCK, allsigs) doesn't necessarily clears
TIF_SIGPENDING, say, freezing() or ->jobctl.

5. device_write() looks equally wrong by the same reason.

Looks like, this tries to protect some wait_event_interruptible() logic
from signals, it should be turned into uninterruptible wait. Or we need
to implement something like signals_stop/start for such a use-case.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# d4b0bcf3 04-Feb-2013 David Teigland <teigland@redhat.com>

dlm: check the write size from user

Return EINVAL from write if the size is larger than
allowed. Do this before allocating kernel memory for
the bogus size, which could lead to OOM.

Reported-by: Sasha Levin <levinsasha928@gmail.com>
Tested-by: Jana Saout <jana@saout.de>
Signed-off-by: David Teigland <teigland@redhat.com>


# 2b75bc91 09-Sep-2012 Sasha Levin <levinsasha928@gmail.com>

dlm: check the maximum size of a request from user

device_write only checks whether the request size is big enough, but it doesn't
check if the size is too big.

At that point, it also tries to allocate as much memory as the user has requested
even if it's too much. This can lead to OOM killer kicking in, or memory corruption
if (count + 1) overflows.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 60f98d18 02-Nov-2011 David Teigland <teigland@redhat.com>

dlm: add recovery callbacks

These new callbacks notify the dlm user about lock recovery.
GFS2, and possibly others, need to be aware of when the dlm
will be doing lock recovery for a failed lockspace member.

In the past, this coordination has been done between dlm and
file system daemons in userspace, which then direct their
kernel counterparts. These callbacks allow the same
coordination directly, and more simply.

Signed-off-by: David Teigland <teigland@redhat.com>


# 23e8e1aa 05-Apr-2011 David Teigland <teigland@redhat.com>

dlm: use workqueue for callbacks

Instead of creating our own kthread (dlm_astd) to deliver
callbacks for all lockspaces, use a per-lockspace workqueue
to deliver the callbacks. This eliminates complications and
slowdowns from many lockspaces sharing the same thread.

Signed-off-by: David Teigland <teigland@redhat.com>


# 4bcad6c1 24-Mar-2011 Matt Fleming <matt.fleming@linux.intel.com>

dlm: Remove superfluous call to recalc_sigpending()

recalc_sigpending() is called within sigprocmask(), so there is no
need call it again after sigprocmask() has returned.

Signed-off-by: Matt Fleming <matt.fleming@linux.intel.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 8304d6f2 21-Feb-2011 David Teigland <teigland@redhat.com>

dlm: record full callback state

Change how callbacks are recorded for locks. Previously, information
about multiple callbacks was combined into a couple of variables that
indicated what the end result should be. In some situations, we
could not tell from this combined state what the exact sequence of
callbacks were, and would end up either delivering the callbacks in
the wrong order, or suppress redundant callbacks incorrectly. This
new approach records all the data for each callback, leaving no
uncertainty about what needs to be delivered.

Signed-off-by: David Teigland <teigland@redhat.com>


# 6038f373 15-Aug-2010 Arnd Bergmann <arnd@arndb.de>

llseek: automatically add .llseek fop

All file_operations should get a .llseek operation so we can make
nonseekable_open the default for future file operations without a
.llseek pointer.

The three cases that we can automatically detect are no_llseek, seq_lseek
and default_llseek. For cases where we can we can automatically prove that
the file offset is always ignored, we use noop_llseek, which maintains
the current behavior of not returning an error from a seek.

New drivers should normally not use noop_llseek but instead use no_llseek
and call nonseekable_open at open time. Existing drivers can be converted
to do the same when the maintainer knows for certain that no user code
relies on calling seek on the device file.

The generated code is often incorrectly indented and right now contains
comments that clarify for each added line why a specific variant was
chosen. In the version that gets submitted upstream, the comments will
be gone and I will manually fix the indentation, because there does not
seem to be a way to do that using coccinelle.

Some amount of new code is currently sitting in linux-next that should get
the same modifications, which I will do at the end of the merge window.

Many thanks to Julia Lawall for helping me learn to write a semantic
patch that does all this.

===== begin semantic patch =====
// This adds an llseek= method to all file operations,
// as a preparation for making no_llseek the default.
//
// The rules are
// - use no_llseek explicitly if we do nonseekable_open
// - use seq_lseek for sequential files
// - use default_llseek if we know we access f_pos
// - use noop_llseek if we know we don't access f_pos,
// but we still want to allow users to call lseek
//
@ open1 exists @
identifier nested_open;
@@
nested_open(...)
{
<+...
nonseekable_open(...)
...+>
}

@ open exists@
identifier open_f;
identifier i, f;
identifier open1.nested_open;
@@
int open_f(struct inode *i, struct file *f)
{
<+...
(
nonseekable_open(...)
|
nested_open(...)
)
...+>
}

@ read disable optional_qualifier exists @
identifier read_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
expression E;
identifier func;
@@
ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
{
<+...
(
*off = E
|
*off += E
|
func(..., off, ...)
|
E = *off
)
...+>
}

@ read_no_fpos disable optional_qualifier exists @
identifier read_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
@@
ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
{
... when != off
}

@ write @
identifier write_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
expression E;
identifier func;
@@
ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
{
<+...
(
*off = E
|
*off += E
|
func(..., off, ...)
|
E = *off
)
...+>
}

@ write_no_fpos @
identifier write_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
@@
ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
{
... when != off
}

@ fops0 @
identifier fops;
@@
struct file_operations fops = {
...
};

@ has_llseek depends on fops0 @
identifier fops0.fops;
identifier llseek_f;
@@
struct file_operations fops = {
...
.llseek = llseek_f,
...
};

@ has_read depends on fops0 @
identifier fops0.fops;
identifier read_f;
@@
struct file_operations fops = {
...
.read = read_f,
...
};

@ has_write depends on fops0 @
identifier fops0.fops;
identifier write_f;
@@
struct file_operations fops = {
...
.write = write_f,
...
};

@ has_open depends on fops0 @
identifier fops0.fops;
identifier open_f;
@@
struct file_operations fops = {
...
.open = open_f,
...
};

// use no_llseek if we call nonseekable_open
////////////////////////////////////////////
@ nonseekable1 depends on !has_llseek && has_open @
identifier fops0.fops;
identifier nso ~= "nonseekable_open";
@@
struct file_operations fops = {
... .open = nso, ...
+.llseek = no_llseek, /* nonseekable */
};

@ nonseekable2 depends on !has_llseek @
identifier fops0.fops;
identifier open.open_f;
@@
struct file_operations fops = {
... .open = open_f, ...
+.llseek = no_llseek, /* open uses nonseekable */
};

// use seq_lseek for sequential files
/////////////////////////////////////
@ seq depends on !has_llseek @
identifier fops0.fops;
identifier sr ~= "seq_read";
@@
struct file_operations fops = {
... .read = sr, ...
+.llseek = seq_lseek, /* we have seq_read */
};

// use default_llseek if there is a readdir
///////////////////////////////////////////
@ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier readdir_e;
@@
// any other fop is used that changes pos
struct file_operations fops = {
... .readdir = readdir_e, ...
+.llseek = default_llseek, /* readdir is present */
};

// use default_llseek if at least one of read/write touches f_pos
/////////////////////////////////////////////////////////////////
@ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read.read_f;
@@
// read fops use offset
struct file_operations fops = {
... .read = read_f, ...
+.llseek = default_llseek, /* read accesses f_pos */
};

@ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier write.write_f;
@@
// write fops use offset
struct file_operations fops = {
... .write = write_f, ...
+ .llseek = default_llseek, /* write accesses f_pos */
};

// Use noop_llseek if neither read nor write accesses f_pos
///////////////////////////////////////////////////////////

@ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read_no_fpos.read_f;
identifier write_no_fpos.write_f;
@@
// write fops use offset
struct file_operations fops = {
...
.write = write_f,
.read = read_f,
...
+.llseek = noop_llseek, /* read and write both use no f_pos */
};

@ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier write_no_fpos.write_f;
@@
struct file_operations fops = {
... .write = write_f, ...
+.llseek = noop_llseek, /* write uses no f_pos */
};

@ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read_no_fpos.read_f;
@@
struct file_operations fops = {
... .read = read_f, ...
+.llseek = noop_llseek, /* read uses no f_pos */
};

@ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
@@
struct file_operations fops = {
...
+.llseek = noop_llseek, /* no read or write fn */
};
===== End semantic patch =====

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Julia Lawall <julia@diku.dk>
Cc: Christoph Hellwig <hch@infradead.org>


# 89d799d0 25-Mar-2010 David Teigland <teigland@redhat.com>

dlm: fix ast ordering for user locks

Commit 7fe2b3190b8b299409f13cf3a6f85c2bd371f8bb fixed possible
misordering of completion asts (casts) and blocking asts (basts)
for kernel locks. This patch does the same for locks taken by
user space applications.

Signed-off-by: David Teigland <teigland@redhat.com>


# 5a0e3ad6 24-Mar-2010 Tejun Heo <tj@kernel.org>

include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>


# 7fe2b319 24-Feb-2010 David Teigland <teigland@redhat.com>

dlm: fix ordering of bast and cast

When both blocking and completion callbacks are queued for lock,
the dlm would always deliver the completion callback (cast) first.
In some cases the blocking callback (bast) is queued before the
cast, though, and should be delivered first. This patch keeps
track of the order in which they were queued and delivers them
in that order.

This patch also keeps track of the granted mode in the last cast
and eliminates the following bast if the bast mode is compatible
with the preceding cast mode. This happens when a remotely mastered
lock is demoted, e.g. EX->NL, in which case the local node queues
a cast immediately after sending the demote message. In this way
a cast can be queued for a mode, e.g. NL, that makes an in-transit
bast extraneous.

Signed-off-by: David Teigland <teigland@redhat.com>


# 573c24c4 30-Nov-2009 David Teigland <teigland@redhat.com>

dlm: always use GFP_NOFS

Replace all GFP_KERNEL and ls_allocation with GFP_NOFS.
ls_allocation would be GFP_KERNEL for userland lockspaces
and GFP_NOFS for file system lockspaces.

It was discovered that any lockspaces on the system can
affect all others by triggering memory reclaim in the
file system which could in turn call back into the dlm
to acquire locks, deadlocking dlm threads that were
shared by all lockspaces, like dlm_recv.

Signed-off-by: David Teigland <teigland@redhat.com>


# 1fecb1c4 04-Mar-2009 David Teigland <teigland@redhat.com>

dlm: fix length calculation in compat code

Using offsetof() to calculate name length does not work because
it does not produce consistent results with with structure packing.
This caused memcpy to corrupt memory by copying 4 extra bytes off
the end of the buffer on 64 bit kernels with 32 bit userspace
(the only case where this 32/64 compat code is used).

The fix is to calculate name length directly from the start instead
of trying to derive it later using count and offsetof.

Signed-off-by: David Teigland <teigland@redhat.com>


# fd22a51b 09-Dec-2008 David Teigland <teigland@redhat.com>

dlm: improve how bast mode handling

The lkb bastmode value is set in the context of processing the
lock, and read by the dlm_astd thread. Because it's accessed
in these two separate contexts, the writing/reading ought to
be done under a lock. This is simple to do by setting it and
reading it when the lkb is added to and removed from dlm_astd's
callback list which is properly locked.

Signed-off-by: David Teigland <teigland@redhat.com>


# f9f2ed48 03-Sep-2008 David Teigland <teigland@redhat.com>

dlm: remove bkl

BLK from recent pushdown is not needed.

Signed-off-by: David Teigland <teigland@redhat.com>


# dc68c7ed 18-Aug-2008 David Teigland <teigland@redhat.com>

dlm: detect available userspace daemon

If dlm_controld (the userspace daemon that controls the setup and
recovery of the dlm) fails, the kernel should shut down the lockspaces
in the kernel rather than leaving them running. This is detected by
having dlm_controld hold a misc device open while running, and if
the kernel detects a close while the daemon is still needed, it stops
the lockspaces in the kernel.

Knowing that the userspace daemon isn't running also allows the
lockspace create/remove routines to avoid waiting on the daemon
for join/leave operations.

Signed-off-by: David Teigland <teigland@redhat.com>


# 0f8e0d9a 06-Aug-2008 David Teigland <teigland@redhat.com>

dlm: allow multiple lockspace creates

Add a count for lockspace create and release so that create can
be called multiple times to use the lockspace from different places.
Also add the new flag DLM_LSFL_NEWEXCL to create a lockspace with
the previous behavior of returning -EEXIST if the lockspace already
exists.

Signed-off-by: David Teigland <teigland@redhat.com>


# cb980d9a 29-Jul-2008 David Teigland <teigland@redhat.com>

dlm: add missing kfrees

A couple of unlikely error conditions were missing a kfree on the error
exit path.

Reported-by: Juha Leppanen <juha_motorsportcom@luukku.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 254ae43a 27-May-2008 Masatake YAMATO <yamato@redhat.com>

dlm: check for null in device_write

If `device_write' method is called via "dlm-control",
file->private_data is NULL. (See ctl_device_open() in
user.c. ) Through proc->flags is read.

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 514bcc66 20-May-2008 Arnd Bergmann <arnd@arndb.de>

dlm-user: BKL pushdown

Signed-off-by: Arnd Bergmann <arnd@arndb.de>


# 30727174 01-Feb-2008 Denis Cheng <crquan@gmail.com>

dlm: add __init and __exit marks to init and exit functions

it moves 365 bytes from .text to .init.text, and 30 bytes from .text to
.exit.text, saves memory.

Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# d292c0cc 06-Feb-2008 David Teigland <teigland@redhat.com>

dlm: eliminate astparam type casting

Put lkb_astparam in a union with a dlm_user_args pointer to
eliminate a lot of type casting.

Signed-off-by: David Teigland <teigland@redhat.com>


# cb79f199 26-Jan-2008 Al Viro <viro@zeniv.linux.org.uk>

dlm: dlm/user.c input validation fixes

a) in device_write(): add sentinel NUL byte, making sure that lspace.name will
be NUL-terminated
b) in compat_input() be keep it simple about the amounts of data we are copying.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David Teigland <teigland@redhat.com>


# 0fe410d3 28-Jan-2008 Denis Cheng <crquan@gmail.com>

dlm: static initialization improvements

also change name_prefix from char pointer to char array.

Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 2a79289e 17-Jan-2008 Patrick Caulfeld <pcaulfie@redhat.com>

dlm: Sanity check namelen before copying it

The 32/64 compatibility code in the DLM does not check the validity of
the lock name length passed into it, so it can easily overwrite memory
if the value is rubbish (as early versions of libdlm can cause with
unlock calls, it doesn't zero the field).

This patch restricts the length of the name to the amount of data
actually passed into the call.

Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# ce5246b9 14-Jan-2008 David Teigland <teigland@redhat.com>

dlm: fix possible use-after-free

The dlm_put_lkb() can free the lkb and its associated ua structure,
so we can't depend on using the ua struct after the put.

Signed-off-by: David Teigland <teigland@redhat.com>


# ba25f9dc 19-Oct-2007 Pavel Emelyanov <xemul@openvz.org>

Use helpers to obtain task pid in printks

The task_struct->pid member is going to be deprecated, so start
using the helpers (task_pid_nr/task_pid_vnr/task_pid_nr_ns) in
the kernel.

The first thing to start with is the pid, printed to dmesg - in
this case we may safely use task_pid_nr(). Besides, printks produce
more (much more) than a half of all the explicit pid usage.

[akpm@linux-foundation.org: git-drm went and changed lots of stuff]
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Dave Airlie <airlied@linux.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 8b4021fa 29-May-2007 David Teigland <teigland@redhat.com>

[DLM] canceling deadlocked lock

Add a function that can be used through libdlm by a system daemon to cancel
another process's deadlocked lock. A completion ast with EDEADLK is returned
to the process waiting for the lock.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 84d8cd69 29-May-2007 David Teigland <teigland@redhat.com>

[DLM] timeout fixes

Various fixes related to the new timeout feature:
- add_timeout() missed setting TIMEWARN flag on lkb's when the
TIMEOUT flag was already set
- clear_proc_locks should remove a dead process's locks from the
timeout list
- the end-of-life calculation for user locks needs to consider that
ETIMEDOUT is equivalent to -DLM_ECANCEL
- make initial default timewarn_cs config value visible in configfs
- change bit position of TIMEOUT_CANCEL flag so it's not copied to
a remote master node
- set timestamp on remote lkb's so a lock dump will display the time
they've been waiting

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# d7db923e 18-May-2007 David Teigland <teigland@redhat.com>

[DLM] dlm_device interface changes [3/6]

Change the user/kernel device interface used by libdlm:
- Add ability for userspace to check the version of the interface. libdlm
can now adapt to different versions of the kernel interface.
- Increase the size of the flags passed in a lock request so all possible
flags can be used from userspace.
- Add an opaque "xid" value for each lock. This "transaction id" will be
used later to associate locks with each other during deadlock detection.
- Add a "timeout" value for each lock. This is used along with the
DLM_LKF_TIMEOUT flag.

Also, remove a fragment of unused code in device_read().

This patch requires updating libdlm which is backward compatible with
older kernels.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 3ae1acf9 18-May-2007 David Teigland <teigland@redhat.com>

[DLM] add lock timeouts and warnings [2/6]

New features: lock timeouts and time warnings. If the DLM_LKF_TIMEOUT
flag is set, then the request/conversion will be canceled after waiting
the specified number of centiseconds (specified per lock). This feature
is only available for locks requested through libdlm (can be enabled for
kernel dlm users if there's a use for it.)

If the new DLM_LSFL_TIMEWARN flag is set when creating the lockspace, then
a warning message will be sent to userspace (using genetlink) after a
request/conversion has been waiting for a given number of centiseconds
(configurable per node). The time warnings will be used in the future
to do deadlock detection in userspace.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# fc7c44f0 10-Apr-2007 Patrick Caulfield <pcaulfie@redhat.com>

[DLM] Remove redundant assignment

This patch removes a redundant (and incorrect) assignment from compat_output

Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 72c2be77 30-Mar-2007 David Teigland <teigland@redhat.com>

[DLM] interface for purge (2/2)

Add code to accept purge commands from userland.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# ef0c2bb0 28-Mar-2007 David Teigland <teigland@redhat.com>

[DLM] overlapping cancel and unlock

Full cancel and force-unlock support. In the past, cancel and force-unlock
wouldn't work if there was another operation in progress on the lock. Now,
both cancel and unlock-force can overlap an operation on a lock, meaning there
may be 2 or 3 operations in progress on a lock in parallel. This support is
important not only because cancel and force-unlock are explicit operations
that an app can use, but both are used implicitly when a process exits while
holding locks.

Summary of changes:

- add-to and remove-from waiters functions were rewritten to handle situations
with more than one remote operation outstanding on a lock

- validate_unlock_args detects when an overlapping cancel/unlock-force
can be sent and when it needs to be delayed until a request/lookup
reply is received

- processing request/lookup replies detects when cancel/unlock-force
occured during the op, and carries out the delayed cancel/unlock-force

- manipulation of the "waiters" (remote operation) state of a lock moved under
the standard rsb mutex that protects all the other lock state

- the two recovery routines related to locks on the waiters list changed
according to the way lkb's are now locked before accessing waiters state

- waiters recovery detects when lkb's being recovered have overlapping
cancel/unlock-force, and may not recover such locks

- revert_lock (cancel) returns a value to distinguish cases where it did
nothing vs cases where it actually did a cancel; the cancel completion ast
should only be done when cancel did something

- orphaned locks put on new list so they can be found later for purging

- cancel must be called on a lock when making it an orphan

- flag user locks (ENDOFLIFE) at the end of their useful life (to the
application) so we can return an error for any further cancel/unlock-force

- we weren't setting COMP/BAST ast flags if one was already set, so we'd lose
either a completion or blocking ast

- clear an unread bast on a lock that's become unlocked

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 254da030 21-Mar-2007 Patrick Caulfield <pcaulfie@redhat.com>

[DLM] Don't delete misc device if lockspace removal fails

Currently if the lockspace removal fails the misc device associated with a
lockspace is left deleted. After that there is no way to access the orphaned
lockspace from userland.

This patch recreates the misc device if th dlm_release_lockspace fails. I
believe this is better than attempting to remove the lockspace first because
that leaves an unattached device lying around. The potential gap in which there
is no access to the lockspace between removing the misc device and recreating it
is acceptable ... after all the application is trying to remove it, and only new
users of the lockspace will be affected.

Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 84c6e8cd 25-Feb-2007 Adrian Bunk <bunk@stusta.de>

[DLM] fs/dlm/user.c should #include "user.h"

Every file should include the headers containing the prototypes for
it's global functions.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 00977a59 12-Feb-2007 Arjan van de Ven <arjan@linux.intel.com>

[PATCH] mark struct file_operations const 6

Many struct file_operations in the kernel can be "const". Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data. In addition it'll catch accidental writes at compile time to
these shared resources.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# a1bc86e6 15-Jan-2007 David Teigland <teigland@redhat.com>

[DLM] fix user unlocking

When a user process exits, we clear all the locks it holds. There is a
problem, though, with locks that the process had begun unlocking before it
exited. We couldn't find the lkb's that were in the process of being
unlocked remotely, to flag that they are DEAD. To solve this, we move
lkb's being unlocked onto a new list in the per-process structure that
tracks what locks the process is holding. We can then go through this
list to flag the necessary lkb's when clearing locks for a process when it
exits.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# c6e6f0ba 30-Aug-2006 David Teigland <teigland@redhat.com>

[DLM] force removal of user lockspace

Check if the FORCEFREE flag has been provided from user space. If so, set
the force option to dlm_release_lockspace() so that any remaining locks
will be freed.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 36098198 20-Jul-2006 David Teigland <teigland@redhat.com>

[DLM] fix whitespace damage

My previous dlm patch added trailing whitespace damage, fix that.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 34e22bed 18-Jul-2006 David Teigland <teigland@redhat.com>

[DLM] fix leaking user locks

User NOQUEUE lock requests to a remote node that failed with -EAGAIN were
never being removed from a process's list of locks.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 597d0cae 12-Jul-2006 David Teigland <teigland@redhat.com>

[DLM] dlm: user locks

This changes the way the dlm handles user locks. The core dlm is now
aware of user locks so they can be dealt with more efficiently. There is
no more dlm_device module which previously managed its own duplicate copy
of every user lock.

Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>