History log of /linux-master/fs/dlm/dlm_internal.h
Revision Date Author Comments
# 484b4f90 15-Mar-2024 David Teigland <teigland@redhat.com>

dlm: revert atomic_t lkb_wait_count

Revert "fs: dlm: handle lkb wait count as atomic_t"
This reverts commit 75a7d60134ce84209f2c61ec4619ee543aa8f466.

This counter does not need to be atomic. As the comment in
the reverted commit mentions, the counter is protected by
the rsb lock.
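The sketch below is a user-space illustration of why a counter that is only touched under one lock does not need an atomic type. It assumes a pthread mutex as a stand-in for the rsb lock. toy_rsb, toy_lkb and the helper functions are hypothetical names and are not dlm code.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for an rsb: rsb_lock plays the role of the rsb lock. */
struct toy_rsb {
	pthread_mutex_t rsb_lock;
};

/* Hypothetical stand-in for an lkb: wait_count mirrors lkb_wait_count. */
struct toy_lkb {
	struct toy_rsb *rsb;
	int wait_count;		/* plain int: only accessed under rsb->rsb_lock */
};

/* Increment the count. The mutex serializes all writers. */
static void toy_add_to_waiters(struct toy_lkb *lkb)
{
	pthread_mutex_lock(&lkb->rsb->rsb_lock);
	lkb->wait_count++;
	pthread_mutex_unlock(&lkb->rsb->rsb_lock);
}

/* Decrement the count and return the value observed under the lock. */
static int toy_remove_from_waiters(struct toy_lkb *lkb)
{
	int remaining;

	pthread_mutex_lock(&lkb->rsb->rsb_lock);
	assert(lkb->wait_count > 0);	/* never drop below zero */
	remaining = --lkb->wait_count;
	pthread_mutex_unlock(&lkb->rsb->rsb_lock);
	return remaining;
}
```

Every read-modify-write of wait_count happens inside the same critical section. An atomic_t would therefore add cost without adding any safety.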

Signed-off-by: David Teigland <teigland@redhat.com>


# 541adb0d 01-Aug-2023 Alexander Aring <aahringo@redhat.com>

fs: dlm: debugfs for queued callbacks

While debugging an issue with the callback queue, it was useful to check
whether any callbacks in any lkb were for some reason not being processed
by the callback workqueue. The mentioned issue was fixed by commit
a034c1370ded ("fs: dlm: fix DLM_IFL_CB_PENDING gets overwritten"). If a
similar issue appears where an ast callback seems not to have been
processed, we can now confirm whether it is still sitting in the queue
waiting to be processed by the callback workqueue.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 75a7d601 29-May-2023 Alexander Aring <aahringo@redhat.com>

fs: dlm: handle lkb wait count as atomic_t

Currently lkb_wait_count is protected by the rsb lock, so it should be
fine to handle lkb_wait_count as a plain, non-atomic_t value. However, as
part of the overall effort to reduce locking, this patch converts it to
an atomic_t value.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 1361737f 06-Mar-2023 Alexander Aring <aahringo@redhat.com>

fs: dlm: switch lkb_sbflags to atomic ops

This patch moves lkb_sbflags handling to atomic bit ops. This prepares
for lkb_sbflags flags possibly being manipulated at the same time by
concurrent execution.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 46d6e722 06-Mar-2023 Alexander Aring <aahringo@redhat.com>

fs: dlm: rsb hash table flag value to atomic ops

This patch moves the rsb hash table flag handling to atomic flag
operations. The flag operations for DLM_RTF_SHRINK are protected by
ls->ls_rsbtbl[b].lock. However, we switch to atomic ops in case new
flags are used in a different way, so that we don't assume such lock
dependencies.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# e1af8728 06-Mar-2023 Alexander Aring <aahringo@redhat.com>

fs: dlm: move internal flags to atomic ops

This patch moves the lkb_flags value to the recently introduced
lkb_iflags value. For lkb_iflags we use atomic bit operations because
some flags, like DLM_IFL_CB_PENDING, are used while no rsb lock is held.
Switching to atomic bit operations avoids issues with other flag
manipulations that might run at the same time. Snapshotting the bit
values into a uint32_t value is only done for debugging/logging use cases
and doesn't need to be 100% correct.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 8a39dcd9 06-Mar-2023 Alexander Aring <aahringo@redhat.com>

fs: dlm: change dflags to use atomic bits

Currently, manipulating lkb_dflags assumes that the rsb lock assigned
to the lkb is held. In dlm message processing this lock is only taken
at a certain point, after the right rsb has been looked up from the lkb
id in the received message. For user space lock flags, which are
currently the only use case for lkb_dflags, flags are also set during
dlm character device handling without holding the rsb lock. To minimize
the risk of bit operations getting corrupted, we switch to atomic bit
operations. This patch also introduces helpers to snapshot atomic bit
values in a non-atomic way. There might still be issues with the flag
handling, e.g. when bits are manipulated and snapshotted at the same
time, but this patch minimizes them and starts using atomic bit
operations.
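The difference between atomic bit operations and a non-atomic snapshot can be sketched in user space with C11 atomics. This is only an analogue built on stated assumptions. The toy_* helpers and the flag numbers are hypothetical, and they merely mimic the semantics of the kernel's set_bit(), clear_bit() and test_bit().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical flag bit numbers, for illustration only. */
enum { TOY_DFL_USER = 0, TOY_DFL_ORPHAN = 1 };

/* User-space analogue of set_bit(): atomic read-modify-write of one bit. */
static void toy_set_bit(int nr, atomic_ulong *addr)
{
	assert(nr >= 0 && nr < 32);	/* keep bits within the 32-bit snapshot */
	atomic_fetch_or(addr, 1UL << nr);
}

/* User-space analogue of clear_bit(). */
static void toy_clear_bit(int nr, atomic_ulong *addr)
{
	atomic_fetch_and(addr, ~(1UL << nr));
}

/* User-space analogue of test_bit(). */
static int toy_test_bit(int nr, atomic_ulong *addr)
{
	return (int)((atomic_load(addr) >> nr) & 1UL);
}

/*
 * Snapshot for debugging/logging: one load, truncated to 32 bits.
 * Another thread may change a bit right after the load, so the
 * value can be stale by the time it is printed.
 */
static uint32_t toy_snapshot_flags(atomic_ulong *addr)
{
	return (uint32_t)atomic_load(addr);
}
```

Each set or clear is indivisible, so two contexts updating different bits cannot overwrite each other's changes. The snapshot carries no such guarantee. That matches the commit's point that snapshots are good enough for logging only.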

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 8c11ba64 06-Mar-2023 Alexander Aring <aahringo@redhat.com>

fs: dlm: store lkb distributed flags into own value

This patch stores the lkb distributed flags in a separate value instead
of sharing internal and distributed flags in the lkb->lkb_flags value.
This has the advantage that flag values no longer need to be masked and
written back in receive_flags(). The dlm debugfs no longer provides the
distributed flags; they can be added in the future.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 9f48eead 06-Mar-2023 Alexander Aring <aahringo@redhat.com>

fs: dlm: remove DLM_IFL_LOCAL_MS flag

The DLM_IFL_LOCAL_MS flag is an internal, non-shared flag, but it is
used in the m_flags of dlm messages. It is not shared because it is only
used for local messaging. Instead of using DLM_IFL_LOCAL_MS in dlm
messages, we pass a parameter around to signal whether messaging is
local. This patch adds that local parameter.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# a7e7ffac 06-Mar-2023 Alexander Aring <aahringo@redhat.com>

fs: dlm: rename stub to local message flag

This patch renames the DLM_IFL_STUB_MS flag to DLM_IFL_LOCAL_MS. The
name DLM_IFL_STUB_MS is somewhat misleading; the flag means that the dlm
message is used for local message transfer only. It is used by recovery
to resolve lock states if a node got fenced.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 01c7a597 06-Mar-2023 Alexander Aring <aahringo@redhat.com>

fs: dlm: remove deprecated code parts

This patch removes code parts which were declared deprecated by
commit 6b0afc0cc3e9 ("fs: dlm: don't use deprecated timeout features by
default"). This covers the following dlm functionality:

- cancel a dlm request that did not complete after a certain timeout:
  The way dlm cancellation currently works, interfering with other dlm
  requests triggered by the user, can end in an overlap that returns
  -EBUSY. Most users don't handle this case and are unaware that DLM can
  return such an errno in this situation. Because of the timeout, users
  are mostly unaware when this happens.
- send netlink warning messages to user space if dlm requests did not
  complete after a certain timeout:
  This feature was never built in the only known dlm user space side.
  As we are removing the timeout cancellation feature, we can remove
  this feature as well.

The timeout cancellation feature might be brought back at some point.
However, the current way of handling the -EBUSY case, which is only a
software limitation and not a hardware limitation, should be changed.
We minimize the current DLM cancellation code base so that we don't have
to deal with these existing features while solving the DLM cancellation
problem in general.

The UAPI define DLM_LSFL_TIMEWARN is commented as a deprecated and
reserved value. We should initially avoid giving it a new meaning, but
keep this define so that possible users can still compile. In the far
future we can give this flag a new meaning. The same applies to the
DLM_LKF_TIMEOUT lock request flag.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# a034c137 06-Mar-2023 Alexander Aring <aahringo@redhat.com>

fs: dlm: fix DLM_IFL_CB_PENDING gets overwritten

This patch introduces a new per-lkb internal flags value to handle
internal flags that are not sent on the wire. The current lkb internal
flags stored in lkb->lkb_flags are split into upper and lower bits; the
lower bits are used to share flags over the wire with the cluster-wide
lkb copies on other nodes.

In commit 61bed0baa4db ("fs: dlm: use a non-static queue for callbacks")
we introduced a new internal flag for pending callbacks in the dlm
callback queue. This flag is protected by the lkb->lkb_cb_lock lock.
That patch overlooked that, on the dlm receive path, dlm reads the
flags, masks them according to the mentioned upper and lower bits, and
writes them back; receive_flags() in fs/dlm/lock.c is an example. This
flag manipulation is not done atomically and is not protected by
lkb->lkb_cb_lock, which has unknown side effects on the current callback
handling.

In the future we should move to set/clear/test bit functionality and
avoid reading, masking and writing back flag values. In later patches we
will move the upper bits, which are not shared with other cluster nodes,
to the newly introduced non-shared internal flag field to avoid similar
issues.
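The sketch below replays the read, mask and write-back problem described above in a deterministic way. The mask value, the bit name and the toy_* helpers are hypothetical. The sketch only shows why writing back a stale copy without the lock loses a bit that was set in between, and how keeping the two kinds of flags in separate fields avoids that loss. Concurrent setters of the internal field would still need atomic bit ops.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_WIRE_MASK      0x0000ffffu	/* lower bits: shared on the wire */
#define TOY_IFL_CB_PENDING 0x00010000u	/* hypothetical upper internal bit */

/* Old pattern: rebuild the whole word from a previously read value. */
static uint32_t toy_merge_wire_flags(uint32_t cur, uint32_t wire)
{
	return (wire & TOY_WIRE_MASK) | (cur & ~TOY_WIRE_MASK);
}

/*
 * Replays one bad interleaving: the receive path reads the flags, the
 * callback path sets CB_PENDING, then the receive path writes back its
 * stale copy and wipes CB_PENDING.
 */
static uint32_t toy_lost_update_demo(void)
{
	uint32_t flags = 0;
	uint32_t seen = flags;			/* receive path: read */

	flags |= TOY_IFL_CB_PENDING;		/* callback path: set bit */
	flags = toy_merge_wire_flags(seen, 0x3);	/* receive path: write back */
	return flags;
}

/* Fix direction: keep wire and internal flags in separate fields. */
struct toy_lkb_flags {
	uint32_t wire;		/* only written by the receive path */
	uint32_t internal;	/* never touched by the receive path */
};

static void toy_receive_flags(struct toy_lkb_flags *f, uint32_t wire)
{
	f->wire = wire & TOY_WIRE_MASK;
}
```

In toy_lost_update_demo() the final value contains only the wire bits. With separate fields, a receive cannot clobber an internal bit.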

Cc: stable@vger.kernel.org
Fixes: 61bed0baa4db ("fs: dlm: use a non-static queue for callbacks")
Reported-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 554d8496 17-Nov-2022 Alexander Aring <aahringo@redhat.com>

fs: dlm: rename DLM_IFL_NEED_SCHED to DLM_IFL_CB_PENDING

This patch renames DLM_IFL_NEED_SCHED to DLM_IFL_CB_PENDING because
CB_PENDING better describes this flag. The flag is set when callback
enqueue returns DLM_ENQUEUE_CALLBACK_NEED_SCHED because the callback
worker needs to be queued. It indicates that callbacks are currently
pending and is cleared when the callback work for the specific lkb is
done. "Need schedule" describes only part of this period; the more
accurate description is that some callbacks are pending to be called.
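The set and clear cycle of the pending flag described above can be sketched as follows. The enum values, the struct and both functions are hypothetical and only model the description. The sketch assumes the caller holds the per-lkb callback lock, which is not shown.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical return values modelled on the description above. */
enum toy_enqueue_ret {
	TOY_ENQUEUE_SUCCESS,	/* queued; the worker is already pending */
	TOY_ENQUEUE_NEED_SCHED,	/* queued; the caller must queue the worker */
};

struct toy_cb_lkb {
	bool cb_pending;	/* analogue of the CB_PENDING flag */
	int queued;		/* number of callbacks waiting for the worker */
};

/* Assumes the caller holds the per-lkb callback lock. */
static enum toy_enqueue_ret toy_enqueue_cb(struct toy_cb_lkb *lkb)
{
	lkb->queued++;
	if (lkb->cb_pending)
		return TOY_ENQUEUE_SUCCESS;
	lkb->cb_pending = true;
	return TOY_ENQUEUE_NEED_SCHED;
}

/* Worker: deliver all queued callbacks, then clear the pending flag. */
static int toy_cb_work(struct toy_cb_lkb *lkb)
{
	int delivered = lkb->queued;

	assert(lkb->cb_pending);	/* only runs after NEED_SCHED */
	lkb->queued = 0;
	lkb->cb_pending = false;
	return delivered;
}
```

Only the first enqueue after the worker has finished asks for the worker to be scheduled. Later enqueues ride along until the flag is cleared.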

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 3872f87b 27-Oct-2022 Alexander Aring <aahringo@redhat.com>

fs: dlm: remove ls_remove_wait waitqueue

This patch removes the ls_remove_wait waitqueue handling. The current
handling tries to wait before a lookup is sent out for an identical
resource name which is going to be removed, so that the remove message
is sent out before the new lookup message. The reason is that after a
lookup request and response, the specific remote rsb will actually be
used. A subsequent remove message would delete the rsb on the remote
side while it is still being used.

To achieve similar behaviour, we simply send the remove message out
while the rsb lookup lock is held and the rsb is removed from the toss
list. Other find_rsb() calls then never get the chance to bring the rsb
back to life while a remove message is being sent out (without holding
the lock).

This behaviour requires a non-sleepable context, which should now be
provided and might be the reason why it was not implemented this way in
the first place.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 61bed0ba 27-Oct-2022 Alexander Aring <aahringo@redhat.com>

fs: dlm: use a non-static queue for callbacks

This patch introduces a queue implementation for callbacks using Linux
lists. The current callback queue handling is implemented with a static
limit of 6 entries, see DLM_CALLBACKS_SIZE. The sequence number inside
the callback structure was used to check whether entries in the static
array are valid. With a dynamic data structure that grows and shrinks at
runtime, we no longer need sequence numbers to offer this functionality.

We assume that every callback, once queued, will be delivered to the
DLM user. Therefore the callback flag DLM_CB_SKIP was dropped, and the
check for skipping a bast was moved before the worker handling instead
of skipping while the callback worker executes. This reduces unnecessary
queueing of the callback worker.

All saved "last callback" values are now pointers and don't need to be
copied. A reference counter on the callback structures takes care of
freeing them at the right time, once they are no longer referenced.
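A minimal sketch of a growable FIFO of reference-counted callback objects follows, with a saved "last callback" pointer that holds its own reference. It uses a plain singly linked list instead of the kernel's list_head. All toy_* names are hypothetical. The toy_cb_freed counter exists only to make object lifetimes visible.

```c
#include <assert.h>
#include <stdlib.h>

static int toy_cb_freed;	/* counts frees, to make lifetimes visible */

struct toy_cb {
	int mode;		/* payload, e.g. a lock mode */
	int refcount;
	struct toy_cb *next;
};

struct toy_cb_queue {
	struct toy_cb *head, *tail;
	struct toy_cb *last;	/* saved "last callback" pointer */
};

static struct toy_cb *toy_cb_alloc(int mode)
{
	struct toy_cb *cb = malloc(sizeof(*cb));

	if (!cb)
		return NULL;
	cb->mode = mode;
	cb->refcount = 1;	/* reference owned by the caller */
	cb->next = NULL;
	return cb;
}

static void toy_cb_put(struct toy_cb *cb)
{
	assert(cb->refcount > 0);
	if (--cb->refcount == 0) {
		free(cb);
		toy_cb_freed++;
	}
}

/* The queue takes over the caller's reference. */
static void toy_cb_enqueue(struct toy_cb_queue *q, struct toy_cb *cb)
{
	if (q->tail)
		q->tail->next = cb;
	else
		q->head = cb;
	q->tail = cb;
}

/* Hands the queue's reference to the caller; NULL when empty. */
static struct toy_cb *toy_cb_dequeue(struct toy_cb_queue *q)
{
	struct toy_cb *cb = q->head;

	if (!cb)
		return NULL;
	q->head = cb->next;
	if (!q->head)
		q->tail = NULL;
	cb->next = NULL;
	return cb;
}

/* Save cb as the last delivered callback: take a ref, drop the old one. */
static void toy_cb_set_last(struct toy_cb_queue *q, struct toy_cb *cb)
{
	cb->refcount++;
	if (q->last)
		toy_cb_put(q->last);
	q->last = cb;
}
```

No copy is made when the last callback is saved. The old saved callback is freed only when its final reference is dropped, which happens here when a newer callback replaces it.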

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 92e95733 27-Oct-2022 Alexander Aring <aahringo@redhat.com>

fs: dlm: use spin lock instead of mutex

There is no need to use a mutex in these hot path sections. We change it
to a spin lock to serve callbacks faster by not allowing scheduling.
The locked sections are not held for a long time.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# a4c0352b 27-Oct-2022 Alexander Aring <aahringo@redhat.com>

fs: dlm: convert ls_cb_mutex mutex to spinlock

This patch converts the ls_cb_mutex mutex to a spinlock; there is no
sleepable context while this lock is held.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# d96d0f96 11-Oct-2022 Paulo Miguel Almeida <paulo.miguel.almeida.rodenas@gmail.com>

dlm: replace one-element array with fixed size array

One-element arrays are deprecated, so replace the one-element array with
a fixed-size array member in struct dlm_ls, and refactor the rest of the
code accordingly.

Link: https://github.com/KSPP/linux/issues/79
Link: https://github.com/KSPP/linux/issues/228
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101836
Link: https://lore.kernel.org/lkml/Y0W5jkiXUkpNl4ap@mail.google.com/

Signed-off-by: Paulo Miguel Almeida <paulo.miguel.almeida.rodenas@gmail.com>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Teigland <teigland@redhat.com>


# 296d9d1e 15-Aug-2022 Alexander Aring <aahringo@redhat.com>

fs: dlm: change ls_clear_proc_locks to spinlock

This patch changes ls_clear_proc_locks to a spinlock. There is no need
to handle it as a mutex because there is no sleepable context while
ls_clear_proc_locks is held. This allows us to call that functionality
in non-sleepable contexts.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 6b0afc0c 22-Jun-2022 Alexander Aring <aahringo@redhat.com>

fs: dlm: don't use deprecated timeout features by default

This patch disables use of the deprecated timeout features if
CONFIG_DLM_DEPRECATED_API is not set. The deprecated features
will be removed in the upcoming kernel release v6.2.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 2bb2a3d6 22-Jun-2022 Alexander Aring <aahringo@redhat.com>

fs: dlm: remove waiter warnings

This patch removes warning messages that could be logged when
remote requests had been waiting on a reply message for some timeout
period (which could be set through configfs, but was rarely enabled.)
The improved midcomms layer now carefully tracks all messages and
replies, and logs much more useful messages if there is an actual
problem.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# f10da927 22-Jun-2022 Alexander Aring <aahringo@redhat.com>

fs: dlm: add comment about lkb IFL flags

This patch adds comments about the difference between the lower 2 bytes
of the lkb flags and the upper 2 bytes used for the lkb IFL flags. In
short, the upper 2 bytes are handled as internal flags, whereas the lower
2 bytes are part of the DLM protocol and are used to exchange messages.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 682bb91b 22-Jun-2022 Alexander Aring <aahringo@redhat.com>

fs: dlm: make new_lockspace() wait until recovery completes

Make dlm_new_lockspace() wait until a full recovery completes
successfully or fails. Previously, dlm_new_lockspace() returned
to the caller after dlm_recover_members() finished, which is
only partially through recovery. The result of the previous
behavior is that the new lockspace would not be usable for some
time (especially with overlapping recoveries), and some errors
in the later part of recovery could not be returned to the caller.

Kernel callers gfs2 and cluster-md have their own wait handling to
wait for recovery to complete after calling dlm_new_lockspace().
This continues to work, but will be unnecessary.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 00e99ccd 04-Apr-2022 Alexander Aring <aahringo@redhat.com>

dlm: use __le types for dlm messages

This patch changes the dlm message structure, which is cast onto the
right positions of the dlm message buffer, to use __le types directly.

The main goal is to remove sparse warnings about host to little endian
byte order conversion and vice versa. Ignoring those sparse issues and
always converting in the in/out functions tends to leave it unclear in
which byte order a variable is being handled.
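The benefit of typed wire fields can be approximated without sparse. In the sketch below, a wrapper struct stands in for __le32. This is an assumption made for illustration, since the kernel uses sparse's __bitwise annotation instead. toy_le32, the conversion helpers and toy_header are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical distinct type for a little endian 32-bit wire field.
 * Wrapping the bytes in a struct makes the compiler reject mixing it
 * with host-order integers, roughly what sparse checks for __le32:
 * "h.h_length = 3;" would not compile.
 */
typedef struct { uint8_t b[4]; } toy_le32;

static toy_le32 toy_cpu_to_le32(uint32_t v)
{
	toy_le32 x = { { (uint8_t)v, (uint8_t)(v >> 8),
			 (uint8_t)(v >> 16), (uint8_t)(v >> 24) } };
	return x;
}

static uint32_t toy_le32_to_cpu(toy_le32 x)
{
	return (uint32_t)x.b[0] | (uint32_t)x.b[1] << 8 |
	       (uint32_t)x.b[2] << 16 | (uint32_t)x.b[3] << 24;
}

/* Hypothetical message header laid out directly in wire byte order. */
struct toy_header {
	toy_le32 h_length;
	toy_le32 h_nodeid;
};
```

The byte order of each field is now part of its type, so every use has to go through an explicit conversion.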

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 2f9dbeda 04-Apr-2022 Alexander Aring <aahringo@redhat.com>

dlm: use __le types for rcom messages

This patch changes the dlm rcom structure, which is cast onto the right
positions of the dlm message buffer, to use __le types directly.

The main goal is to remove sparse warnings about host to little endian
byte order conversion and vice versa. Ignoring those sparse issues and
always converting in the in/out functions tends to leave it unclear in
which byte order a variable is being handled.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 3428785a 04-Apr-2022 Alexander Aring <aahringo@redhat.com>

dlm: use __le types for dlm header

This patch changes the dlm header structure, which is cast onto the
right positions of the dlm message buffer, to use __le types directly.

The main goal is to remove sparse warnings about host to little endian
byte order conversion and vice versa. Ignoring those sparse issues and
always converting in the in/out functions tends to leave it unclear in
which byte order a variable is being handled.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# d9efd005 04-Apr-2022 Alexander Aring <aahringo@redhat.com>

dlm: use __le types for options header

This patch changes the dlm option header structures, which are cast onto
the right positions of the dlm message buffer, to use __le types
directly.

Currently only midcomms.c uses these headers, and it already performed
endian conversions on the fly instead of using in/out functions like
other endianness handling in dlm. Using __le types will hopefully produce
useful warnings in the future if we compare against host byte order
values.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 21d9ac1a 30-Nov-2021 Alexander Aring <aahringo@redhat.com>

fs: dlm: use event based wait for pending remove

This patch uses an event-based waitqueue to wait for a possible clash
with the ls_remove_name field of dlm_ls instead of busy waiting.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 3cb5977c 02-Nov-2021 Alexander Aring <aahringo@redhat.com>

fs: dlm: ls_count busy wait to event based wait

This patch changes the ls_count busy wait to use atomic counter values
and wait_event() to wait until ls_count reaches zero. This slightly
reduces how often lslist_lock is held. When removing a lockspace we need
to retry the wait, because a lockspace get could interfere between
wait_event() and taking the lock that deletes the lockspace list entry.
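The kernel change combines atomic_t, wait_event() and a retry. The user-space sketch below reaches a comparable event-based wait with a pthread condition variable instead. This is a different mechanism, used here only for illustration. All toy_* names are hypothetical. The while loop that re-checks the count under the lock plays the role of the retry. Marking the lockspace removed while still holding that lock closes the window for a late get.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical lockspace with a user count and a "count hit zero" event. */
struct toy_ls {
	pthread_mutex_t lock;
	pthread_cond_t unused;	/* signalled when count drops to zero */
	int count;
	int removed;
};

/* Take a reference unless the lockspace is already being removed. */
static int toy_ls_get(struct toy_ls *ls)
{
	int ok;

	pthread_mutex_lock(&ls->lock);
	ok = !ls->removed;
	if (ok)
		ls->count++;
	pthread_mutex_unlock(&ls->lock);
	return ok;
}

static void toy_ls_put(struct toy_ls *ls)
{
	pthread_mutex_lock(&ls->lock);
	assert(ls->count > 0);
	if (--ls->count == 0)
		pthread_cond_broadcast(&ls->unused);
	pthread_mutex_unlock(&ls->lock);
}

/*
 * Sleep (no busy loop) until the count is zero, re-checking after every
 * wakeup in case a get raced in, then mark the lockspace removed while
 * still holding the lock so no new get can slip in afterwards.
 */
static void toy_ls_wait_and_remove(struct toy_ls *ls)
{
	pthread_mutex_lock(&ls->lock);
	while (ls->count != 0)
		pthread_cond_wait(&ls->unused, &ls->lock);
	ls->removed = 1;
	pthread_mutex_unlock(&ls->lock);
}
```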

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 164d88ab 02-Nov-2021 Alexander Aring <aahringo@redhat.com>

fs: dlm: requestqueue busy wait to event based wait

This patch changes the requestqueue busy waiting algorithm to use
atomic counter values and wait_event() to wait until the requestqueue is
empty. This slightly reduces how often the ls_requestqueue_mutex mutex
is held.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# dea450c9 02-Nov-2021 Alexander Aring <aahringo@redhat.com>

fs: dlm: remove obsolete INBUF define

This patch removes an obsolete define for the length of a temporary
buffer which is no longer used. The define has not been necessary since
commit 4798cbbfbd00 ("fs: dlm: rework receive handling").

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# b892e479 16-Jul-2021 Alexander Aring <aahringo@redhat.com>

fs: dlm: fix typo in tlv prefix

This patch fixes a small typo in an unused struct field: it should be
named t_pad instead of o_pad. I came across this while updating the
wireshark dissector.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 5b2f981f 21-May-2021 Alexander Aring <aahringo@redhat.com>

fs: dlm: add midcomms debugfs functionality

This patch adds functionality to debug the midcomms per-connection state
inside a comms directory, similar to dlm configfs. Currently two
attributes can be read out: the send queue counter and the version of
each midcomms node state.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 489d8e55 21-May-2021 Alexander Aring <aahringo@redhat.com>

fs: dlm: add reliable connection if reconnect

This patch makes a tcp lowcomms connection reliable even if reconnects
occur. This is done by application layer retransmission handling and
sequence numbers in the dlm protocol. There are three new dlm commands:

DLM_OPTS:

This encapsulates an existing dlm message (and rcom messages if they
don't have their own application-side retransmission handling).
Optionally, additional tlvs (type-length-value fields) can be appended,
for example a sequence number field. However, because the lockspace
field is unused in DLM_OPTS and the sequence number is a mandatory
field, it isn't made a tlv; instead we put the sequence number inside
the lockspace id. The possibility to add optional options remains for
future purposes.

DLM_ACK:

Just a dlm header to acknowledge receipt of a DLM_OPTS message to its
sender.

DLM_FIN:

This provides a 4-way handshake for connection termination, including
support for half-closed connections. It's provided at the application
layer because SCTP doesn't support half-closed sockets, the shutdown()
call can be interrupted by, e.g., TCP resets, and the logic is hard to
implement because of the othercon paradigm in lowcomms. The 4-way
termination handshake also solves the problem of synchronizing the
arrival of the peer's EOF with the cluster manager removing the peer in
DLM's node membership handling. In some cases messages can still be
transmitted during this time and we need to wait for the node membership
event.

To provide a reliable connection, the node retransmits all
unacknowledged messages to its peer on reconnect. The receiver then
filters the received messages and drops all duplicates.

As RCOM_STATUS and RCOM_NAMES messages are the first messages exchanged
and they have their own retransmission handling, there is logic that
requires these messages to come first. When these messages arrive, we
store the dlm version field. This handling is the same for DLM 3.1 and
for DLM 3.2 (after this patch). Backwards compatibility handling has
been added which seems to work in tests without tcpkill; however, it's
not recommended to use DLM 3.1 and 3.2 at the same time, because DLM 3.2
tries to fix long-term bugs in the DLM protocol.
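The sketch below models the retransmit-and-deduplicate idea. It rests on two assumptions: acknowledgements are cumulative, and sequence number wraparound is ignored. Neither assumption is taken from the text above. The sender keeps every message until it is acknowledged. The receiver accepts only the next expected sequence number, so a retransmitted duplicate after a reconnect is dropped. All toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_PENDING 16

/* Sender side: remembers every message until it is acknowledged. */
struct toy_sender {
	uint32_t next_seq;
	uint32_t unacked[TOY_MAX_PENDING];	/* seq numbers awaiting an ACK */
	int n_unacked;
};

/* Receiver side: accepts only the next expected sequence number. */
struct toy_receiver {
	uint32_t expected_seq;
	int delivered;
};

/* Assign the next sequence number and keep it for retransmission. */
static uint32_t toy_send(struct toy_sender *s)
{
	assert(s->n_unacked < TOY_MAX_PENDING);
	s->unacked[s->n_unacked++] = s->next_seq;
	return s->next_seq++;
}

/* Assumed cumulative ACK: forget everything with seq < ack. */
static void toy_ack(struct toy_sender *s, uint32_t ack)
{
	int i, j = 0;

	for (i = 0; i < s->n_unacked; i++)
		if (s->unacked[i] >= ack)
			s->unacked[j++] = s->unacked[i];
	s->n_unacked = j;
}

/* Returns 1 if delivered; 0 if dropped as not the next expected seq. */
static int toy_receive(struct toy_receiver *r, uint32_t seq)
{
	if (seq != r->expected_seq)
		return 0;
	r->expected_seq++;
	r->delivered++;
	return 1;
}
```

On reconnect the sender replays its unacked[] entries. Any of them the receiver already delivered fail the expected_seq check and are discarded.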

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 8e2e4086 21-May-2021 Alexander Aring <aahringo@redhat.com>

fs: dlm: add union in dlm header for lockspace id

This patch adds a union for the lockspace id in the dlm header so that
this field can also serve another use case for a different dlm command.
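A union keeps the header layout unchanged while letting the same 32-bit slot carry a different value depending on the command. The sketch below uses hypothetical field names and types and does not reproduce the real dlm header.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical header: one 32-bit slot, two meanings per command. */
struct toy_dlm_header {
	uint32_t h_version;
	union {
		uint32_t h_lockspace;	/* regular dlm messages */
		uint32_t h_seq;		/* e.g. a sequence number for another command */
	} u;
	uint32_t h_nodeid;
	uint16_t h_length;
	uint8_t h_cmd;
	uint8_t h_pad;
};
```

Both union members occupy the same bytes, so adding the alias does not change the size or offsets of the other header fields.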

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 8f2dc78d 21-May-2021 Alexander Aring <aahringo@redhat.com>

fs: dlm: make buffer handling per msg

This patch makes the void pointer handle for lowcomms functionality per
message instead of per page allocation entry. Refcount handling for the
handle was added to keep the message alive until the user no longer
needs it.

There is now a per-message callback which is called when allocating a
new buffer. This callback is guaranteed to be called in the order of the
sending buffers, which lets the caller increment a sequence number for
the dlm message handle.

During the transition we cast dlm_mhandle to dlm_msg and vice versa
until the midcomms layer implements a specific dlm_mhandle structure.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 2df6b762 21-May-2021 Alexander Aring <aahringo@redhat.com>

fs: dlm: add dlm macros for ratelimit log

This patch adds ratelimit macros to the dlm subsystem and ratelimits the
connecting log message. In non-blocking connecting cases this message
would otherwise be printed a lot.
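The interval/burst idea behind ratelimited logging can be sketched deterministically by passing the current time in explicitly. The kernel's own helpers, such as printk_ratelimited(), follow a similar interval and burst scheme. The struct and function below are hypothetical and do not reproduce the dlm macros added by this patch.

```c
#include <assert.h>

/* Hypothetical ratelimit state, loosely modelled on interval/burst limiting. */
struct toy_ratelimit {
	long interval;	/* window length, in caller-defined time units */
	int burst;	/* messages allowed per window */
	long begin;	/* start time of the current window */
	int printed;	/* messages allowed so far in this window */
};

/* Returns 1 if a message may be printed at time 'now', else 0. */
static int toy_ratelimit_ok(struct toy_ratelimit *rs, long now)
{
	if (now - rs->begin >= rs->interval) {
		rs->begin = now;	/* start a new window */
		rs->printed = 0;
	}
	if (rs->printed >= rs->burst)
		return 0;		/* suppress: burst used up */
	rs->printed++;
	return 1;
}
```

With an interval of 1000 and a burst of 2, a tight reconnect loop prints at most two "connecting" lines per window instead of one per attempt.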

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# fe204591 07-May-2020 Arnd Bergmann <arnd@arndb.de>

dlm: remove BUG() before panic()

Building a kernel with clang sometimes fails with an objtool error in dlm:

fs/dlm/lock.o: warning: objtool: revert_lock_pc()+0xbd: can't find jump dest instruction at .text+0xd7fc

The problem is that BUG() never returns and the compiler knows
that anything after it is unreachable, however the panic still
emits some code that does not get fully eliminated.

Having both BUG() and panic() is really pointless as the BUG()
kills the current process and the subsequent panic() never hits.
In most cases, we probably don't really want either and should
replace the DLM_ASSERT() statements with WARN_ON(), as has
been done for some of them.

Remove the BUG() here so the user at least sees the panic message
and we can reliably build randconfig kernels.

Fixes: e7fd41792fc0 ("[DLM] The core of the DLM for GFS2/CLVM")
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: clang-built-linux@googlegroups.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David Teigland <teigland@redhat.com>


# a4e439a6 09-Mar-2020 Gustavo A. R. Silva <gustavo@embeddedor.com>

dlm: dlm_internal: Replace zero-length array with flexible-array member

The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
	int stuff;
	struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
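The allocation pattern for a flexible array member can be sketched as follows. sizeof covers only the fixed part of the structure, so the trailing storage is added explicitly. struct toy_rec and toy_rec_alloc() are hypothetical and are not taken from dlm_internal.h.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record with a flexible array member. */
struct toy_rec {
	int len;
	char name[];	/* must be the last member */
};

/* Allocate the fixed part plus room for the string and its NUL. */
static struct toy_rec *toy_rec_alloc(const char *name)
{
	size_t n = strlen(name) + 1;
	struct toy_rec *r = malloc(sizeof(*r) + n);

	if (!r)
		return NULL;
	r->len = (int)n;
	memcpy(r->name, name, n);
	return r;
}
```

Moving name[] anywhere but last would now trigger a compiler error, which is the safety benefit the commit describes.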

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# a48f9721 12-Jun-2019 Greg Kroah-Hartman <gregkh@linuxfoundation.org>

dlm: no need to check return value of debugfs_create functions

When calling debugfs functions, there is no need to ever check the
return value. The function can work or not, but the code logic should
never do something different based on this.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David Teigland <teigland@redhat.com>


# 2522fe45 28-May-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 193

Based on 1 normalized pattern(s):

this copyrighted material is made available to anyone wishing to use
modify copy or redistribute it subject to the terms and conditions
of the gnu general public license v 2

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-only

has been chosen to replace the boilerplate/reference in 45 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Richard Fontana <rfontana@redhat.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Steve Winslow <swinslow@gmail.com>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190528170027.342746075@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 7c0f6ba6 24-Dec-2016 Linus Torvalds <torvalds@linux-foundation.org>

Replace <asm/uaccess.h> with <linux/uaccess.h> globally

This was entirely automated, using the script by Al:

PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
$(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.

Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 7963b8a5 19-Sep-2016 Paul Gortmaker <paul.gortmaker@windriver.com>

dlm: audit and remove any unnecessary uses of module.h

Historically a lot of these existed because we did not have
a distinction between what was modular code and what was providing
support to modules via EXPORT_SYMBOL and friends. That changed
when we forked out support for the latter into the export.h file.
This means we should be able to reduce the usage of module.h
in code that is obj-y Makefile or bool Kconfig.

In the case of some code where it is modular, we can extend that to
also include files that are building basic support functionality but
not related to loading or registering the final module; such files
also have no need whatsoever for module.h

The advantage in removing such instances is that module.h itself
sources about 15 other headers; adding significantly to what we feed
cpp, and it can obscure what headers we are effectively using.

Since module.h might have been the implicit source for init.h
(for __init) and for export.h (for EXPORT_SYMBOL) we consider each
instance for the presence of either and replace as needed.

In the dlm case, we remove module.h from a global header and only
introduce it in the files where it is explicitly required, since
there is nothing modular in dlm_internal.h itself.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 505ee528 19-Jun-2016 Zhilong Liu <zlliu@suse.com>

dlm: add log_info config option

This config option can be used to disable the
LOG_INFO recovery messages.

Signed-off-by: Zhilong Liu <zlliu@suse.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 075f0177 14-Feb-2014 David Teigland <teigland@redhat.com>

dlm: use INFO for recovery messages

The log messages relating to the progress of recovery
are minimal and very often useful. Change these to
the KERN_INFO level so they are always available.

Signed-off-by: David Teigland <teigland@redhat.com>


# f1172283 07-Jan-2013 David Teigland <teigland@redhat.com>

dlm: avoid scanning unchanged toss lists

Keep track of whether a toss list contains any
shrinkable rsbs. If not, dlm_scand can avoid
scanning the list for rsbs to shrink. Unnecessary
scanning can otherwise waste a lot of time because
the toss lists can contain a large number of rsbs
that are non-shrinkable (directory records).

Signed-off-by: David Teigland <teigland@redhat.com>


# da8c6663 15-Nov-2012 David Teigland <teigland@redhat.com>

dlm: fix lvb invalidation conditions

When a node is removed that held a PW/EX lock, the
existing master node should invalidate the lvb on the
resource due to the purged lock.

Previously, the existing master node was invalidating
the lvb if it found only NL/CR locks on the resource
during recovery for the removed node. This could lead
to cases where it invalidated the lvb and shouldn't
have, or cases where it should have invalidated and
didn't.

When recovery selects a *new* master node for a
resource, and that new master finds only NL/CR locks
on the resource after lock recovery, it should
invalidate the lvb. This case was handled correctly
(but was incorrectly applied to the existing master
case also.)

When a process exits while holding a PW/EX lock,
the lvb on the resource should be invalidated.
This was not happening.

The lvb contents and VALNOTVALID flag should be
recovered before granting locks in recovery so that
the recovered lvb state is provided in the callback.
The lvb was being recovered after the lock was granted.

Signed-off-by: David Teigland <teigland@redhat.com>


# 475f230c 02-Aug-2012 David Teigland <teigland@redhat.com>

dlm: fix unlock balance warnings

The in_recovery rw_semaphore has always been acquired and
released by different threads by design. To work around
the "BUG: bad unlock balance detected!" messages, adjust
things so the dlm_recoverd thread always does both down_write
and up_write.

Signed-off-by: David Teigland <teigland@redhat.com>


# 05c32f47 13-Jun-2012 David Teigland <teigland@redhat.com>

dlm: fix race between remove and lookup

It was possible for a remove message on an old
rsb to be sent after a lookup message on a new
rsb, where the rsbs were for the same resource
name. This could lead to a missing directory
entry for the new rsb.

It is fixed by keeping a copy of the resource
name being removed until after the remove has
been sent. A lookup checks if this in-progress
remove matches the name it is looking up.

Signed-off-by: David Teigland <teigland@redhat.com>


# 1d7c484e 15-May-2012 David Teigland <teigland@redhat.com>

dlm: use idr instead of list for recovered rsbs

When a large number of resources are being recovered,
a linear search of the recover_list takes a long time.
Use an idr in place of a list.

Signed-off-by: David Teigland <teigland@redhat.com>


# c04fecb4 10-May-2012 David Teigland <teigland@redhat.com>

dlm: use rsbtbl as resource directory

Remove the dir hash table (dirtbl), and use
the rsb hash table (rsbtbl) as the resource
directory. It has always been an unnecessary
duplication of information.

This improves efficiency by using a single rsbtbl
lookup in many cases where both rsbtbl and dirtbl
lookups were needed previously.

This eliminates the need to handle cases of rsbtbl
and dirtbl being out of sync.

In many cases there will be memory savings because
the dir hash table no longer exists.

Signed-off-by: David Teigland <teigland@redhat.com>


# 4875647a 26-Apr-2012 David Teigland <teigland@redhat.com>

dlm: fixes for nodir mode

The "nodir" mode (statically assign master nodes instead
of using the resource directory) has always been highly
experimental, and never seriously used. This commit
fixes a number of problems, making nodir much more usable.

- Major change to recovery: recover all locks and restart
all in-progress operations after recovery. In some
cases it's not possible to know which in-progress locks
to recover, so recover all. (Most require recovery
in nodir mode anyway since rehashing changes most
master nodes.)

- Change the way nodir mode is enabled, from a command
line mount arg passed through gfs2, into a sysfs
file managed by dlm_controld, consistent with the
other config settings.

- Allow recovering MSTCPY locks on an rsb that has not
yet been turned into a master copy.

- Ignore RCOM_LOCK and RCOM_LOCK_REPLY recovery messages
from a previous, aborted recovery cycle. Base this
on the local recovery status not being in the state
where any nodes should be sending LOCK messages for the
current recovery cycle.

- Hold rsb lock around dlm_purge_mstcpy_locks() because it
may run concurrently with dlm_recover_master_copy().

- Maintain highbast on process-copy lkb's (in addition to
the master as is usual), because the lkb can switch
back and forth between being a master and being a
process copy as the master node changes in recovery.

- When recovering MSTCPY locks, flag rsb's that have
non-empty convert or waiting queues for granting
at the end of recovery. (Rename flag from LOCKS_PURGED
to RECOVER_GRANT and similar for the recovery function,
because it's not only resources with purged locks
that need a grant attempt.)

- Replace a couple of unnecessary assertion panics with
error messages.

Signed-off-by: David Teigland <teigland@redhat.com>


# d6e24788 23-Apr-2012 David Teigland <teigland@redhat.com>

dlm: limit rcom debug messages

Unify the checking for both types of ignored
rcom messages, and replace the two log_debug
statements with a single, rate limited debug
message.

Signed-off-by: David Teigland <teigland@redhat.com>


# 60f98d18 02-Nov-2011 David Teigland <teigland@redhat.com>

dlm: add recovery callbacks

These new callbacks notify the dlm user about lock recovery.
GFS2, and possibly others, need to be aware of when the dlm
will be doing lock recovery for a failed lockspace member.

In the past, this coordination has been done between dlm and
file system daemons in userspace, which then direct their
kernel counterparts. These callbacks allow the same
coordination directly, and more simply.

Signed-off-by: David Teigland <teigland@redhat.com>


# 757a4271 20-Oct-2011 David Teigland <teigland@redhat.com>

dlm: add node slots and generation

Slot numbers are assigned to nodes when they join the lockspace.
The slot number chosen is the minimum unused value starting at 1.
Once a node is assigned a slot, that slot number will not change
while the node remains a lockspace member. If the node leaves
and rejoins it can be assigned a new slot number.
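The "minimum unused value starting at 1" rule can be illustrated with a small sketch. The helper name pick_min_unused_slot and the plain int array of used slots are hypothetical, chosen only for illustration; this is not the dlm's actual slot code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not dlm code): return the smallest slot number
 * >= 1 that is not already assigned to a lockspace member.
 * used[] holds the n slots currently in use.
 */
static int pick_min_unused_slot(const int *used, int n)
{
	assert(n >= 0);

	/* At most n slots are taken, so slot n + 1 is always free and
	 * the loop is guaranteed to terminate. */
	for (int slot = 1; ; slot++) {
		bool taken = false;

		for (int i = 0; i < n; i++) {
			if (used[i] == slot) {
				taken = true;
				break;
			}
		}
		if (!taken)
			return slot;
	}
}
```

With slots {1, 2, 4} in use, a joining node would get slot 3; an empty lockspace hands out slot 1.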

A new generation number is also added to a lockspace. It is
set and incremented during each recovery along with the slot
collection/assignment.

The slot numbers will be passed to gfs2 which will use them as
journal id's.

Signed-off-by: David Teigland <teigland@redhat.com>


# 9beb3bf5 26-Oct-2011 Bob Peterson <rpeterso@redhat.com>

dlm: convert rsb list to rb_tree

Change the linked lists to rb_tree's in the rsb
hash table to speed up searches. Slow rsb searches
were having a large impact on gfs2 performance due
to the large number of dlm locks gfs2 uses.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 23e8e1aa 05-Apr-2011 David Teigland <teigland@redhat.com>

dlm: use workqueue for callbacks

Instead of creating our own kthread (dlm_astd) to deliver
callbacks for all lockspaces, use a per-lockspace workqueue
to deliver the callbacks. This eliminates complications and
slowdowns from many lockspaces sharing the same thread.

Signed-off-by: David Teigland <teigland@redhat.com>


# 3881ac04 07-Jul-2011 David Teigland <teigland@redhat.com>

dlm: improve rsb searches

By pre-allocating rsb structs before searching the hash
table, they can be inserted immediately. This avoids
always having to repeat the search when adding the struct
to the hash list.

This also adds space to the rsb struct for a max resource
name, so an rsb allocation can be used by any request.
The constant size also allows us to finally use a slab
for the rsb structs.

Signed-off-by: David Teigland <teigland@redhat.com>


# 3d6aa675 06-Jul-2011 David Teigland <teigland@redhat.com>

dlm: keep lkbs in idr

This is simpler and quicker than the hash table, and
avoids needing to search the hash list for every new
lkid to check if it's used.

Signed-off-by: David Teigland <teigland@redhat.com>


# 2a7ce0ed 04-Apr-2011 David Teigland <teigland@redhat.com>

dlm: remove shared message stub for recovery

kmalloc a stub message struct during recovery instead of sharing the
struct in the lockspace. This leaves the lockspace stub_ms only for
faking downconvert replies, where it is never modified and sharing
is not a problem.

Also improve the debug messages in the same recovery function.

Signed-off-by: David Teigland <teigland@redhat.com>


# c6ff669b 28-Mar-2011 David Teigland <teigland@redhat.com>

dlm: delayed reply message warning

Add an option (disabled by default) to print a warning message
when a lock has been waiting a configurable amount of time for
a reply message from another node. This is mainly for debugging.

Signed-off-by: David Teigland <teigland@redhat.com>


# 8304d6f2 21-Feb-2011 David Teigland <teigland@redhat.com>

dlm: record full callback state

Change how callbacks are recorded for locks. Previously, information
about multiple callbacks was combined into a couple of variables that
indicated what the end result should be. In some situations, we
could not tell from this combined state what the exact sequence of
callbacks was, and would end up either delivering the callbacks in
the wrong order or suppressing redundant callbacks incorrectly. This
new approach records all the data for each callback, leaving no
uncertainty about what needs to be delivered.

Signed-off-by: David Teigland <teigland@redhat.com>


# 7fe2b319 24-Feb-2010 David Teigland <teigland@redhat.com>

dlm: fix ordering of bast and cast

When both blocking and completion callbacks are queued for a lock,
the dlm would always deliver the completion callback (cast) first.
In some cases the blocking callback (bast) is queued before the
cast, though, and should be delivered first. This patch keeps
track of the order in which they were queued and delivers them
in that order.

This patch also keeps track of the granted mode in the last cast
and eliminates the following bast if the bast mode is compatible
with the preceding cast mode. This happens when a remotely mastered
lock is demoted, e.g. EX->NL, in which case the local node queues
a cast immediately after sending the demote message. In this way
a cast can be queued for a mode, e.g. NL, that makes an in-transit
bast extraneous.
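The compatibility check that makes a bast extraneous can be sketched with the standard VAX-style mode compatibility matrix. The enum, table, and function names below are hypothetical and belong to this sketch only; they are not the dlm's own identifiers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mode numbering for this sketch, in VAX-style order. */
enum lock_mode { MODE_NL, MODE_CR, MODE_CW, MODE_PR, MODE_PW, MODE_EX };

/* compat[a][b] is true when modes a and b can be held at the same time. */
static const bool compat[6][6] = {
	/*          NL     CR     CW     PR     PW     EX    */
	/* NL */ { true,  true,  true,  true,  true,  true  },
	/* CR */ { true,  true,  true,  true,  true,  false },
	/* CW */ { true,  true,  true,  false, false, false },
	/* PR */ { true,  true,  false, true,  false, false },
	/* PW */ { true,  true,  false, false, false, false },
	/* EX */ { true,  false, false, false, false, false },
};

/*
 * A bast asks the holder to give way to a request for bast_mode. If the
 * most recently queued cast already granted a mode compatible with
 * bast_mode (for example NL after an EX->NL demote), the bast is
 * extraneous and can be dropped.
 */
static bool bast_is_extraneous(enum lock_mode bast_mode,
			       enum lock_mode last_cast_mode)
{
	assert(bast_mode <= MODE_EX && last_cast_mode <= MODE_EX);
	return compat[bast_mode][last_cast_mode];
}
```

In the EX->NL demote case from the paragraph above, a bast for EX arriving after a cast for NL is dropped, while the same bast after a cast for PR would still be delivered.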

Signed-off-by: David Teigland <teigland@redhat.com>


# 573c24c4 30-Nov-2009 David Teigland <teigland@redhat.com>

dlm: always use GFP_NOFS

Replace all GFP_KERNEL and ls_allocation with GFP_NOFS.
ls_allocation would be GFP_KERNEL for userland lockspaces
and GFP_NOFS for file system lockspaces.

It was discovered that any lockspace on the system could
affect all others by triggering memory reclaim in the
file system which could in turn call back into the dlm
to acquire locks, deadlocking dlm threads that were
shared by all lockspaces, like dlm_recv.

Signed-off-by: David Teigland <teigland@redhat.com>


# 305a47b1 16-Jan-2009 Steven Whitehouse <swhiteho@redhat.com>

dlm: Change rwlock which is only used in write mode to a spinlock

The ls_dirtbl[].lock was an rwlock, but since it was only used in write
mode a spinlock will suffice.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# c7be761a 07-Jan-2009 David Teigland <teigland@redhat.com>

dlm: change rsbtbl rwlock to spinlock

The rwlock is almost always used in write mode, so there's no reason
to not use a spinlock instead.

Signed-off-by: David Teigland <teigland@redhat.com>


# d022509d 16-Dec-2008 David Teigland <teigland@redhat.com>

dlm: add new debugfs entry

The new debugfs entry dumps all rsb and lkb structures, and includes
a lot more information than has been available before. This includes
the new timestamps added by a previous patch for debugging callback
issues.

Signed-off-by: David Teigland <teigland@redhat.com>


# e3a84ad4 09-Dec-2008 David Teigland <teigland@redhat.com>

dlm: add time stamp of blocking callback

Record the time the latest blocking callback was queued for
a lock. This will be used for debugging in combination with
lock queue timestamp changes in the previous patch.

Signed-off-by: David Teigland <teigland@redhat.com>


# eeda418d 09-Dec-2008 David Teigland <teigland@redhat.com>

dlm: change lock time stamping

Use ktime instead of jiffies for timestamping lkb's. Also stamp the
time on every lkb whenever it's added to a resource queue, instead of
just stamping locks subject to timeouts. This will allow us to use
timestamps more widely for debugging all locks.

Signed-off-by: David Teigland <teigland@redhat.com>


# c1dcf65f 18-Aug-2008 David Teigland <teigland@redhat.com>

dlm: fix locking of lockspace list in dlm_scand

The dlm_scand thread needs to lock the list of lockspaces
when going through it.

Signed-off-by: David Teigland <teigland@redhat.com>


# 0f8e0d9a 06-Aug-2008 David Teigland <teigland@redhat.com>

dlm: allow multiple lockspace creates

Add a count for lockspace create and release so that create can
be called multiple times to use the lockspace from different places.
Also add the new flag DLM_LSFL_NEWEXCL to create a lockspace with
the previous behavior of returning -EEXIST if the lockspace already
exists.
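The create/release counting and the exclusive-create flag can be modeled roughly as follows. This is a simplified sketch for a single lockspace: struct ls_sketch, SKETCH_LSFL_NEWEXCL (standing in for DLM_LSFL_NEWEXCL), and both functions are hypothetical, and the real dlm_new_lockspace() and dlm_release_lockspace() do considerably more work.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag standing in for DLM_LSFL_NEWEXCL. */
#define SKETCH_LSFL_NEWEXCL 0x1u

/* Hypothetical, heavily simplified lockspace record. */
struct ls_sketch {
	bool exists;
	int create_count; /* number of outstanding creates */
};

/* Create, or reuse the existing lockspace unless exclusive create is asked for. */
static int sketch_create(struct ls_sketch *ls, unsigned int flags)
{
	if (ls->exists) {
		if (flags & SKETCH_LSFL_NEWEXCL)
			return -EEXIST; /* previous behavior: refuse */
		ls->create_count++;
		return 0;
	}
	ls->exists = true;
	ls->create_count = 1;
	return 0;
}

/* Drop one create; returns true when the last user released the lockspace. */
static bool sketch_release(struct ls_sketch *ls)
{
	assert(ls->exists && ls->create_count > 0);
	if (--ls->create_count > 0)
		return false;
	ls->exists = false;
	return true;
}
```

Two plain creates followed by two releases tear the lockspace down only on the second release; an exclusive create against an existing lockspace fails with -EEXIST and leaves the count untouched.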

Signed-off-by: David Teigland <teigland@redhat.com>


# 3d564fa3 14-Apr-2008 David Teigland <teigland@redhat.com>

dlm: common max length definitions

Add central definitions for max lockspace name length and max resource
name length. The lack of central definitions has resulted in scattered
private definitions which we can now clean up, including an unused one
in dlm_device.h.

Signed-off-by: David Teigland <teigland@redhat.com>


# 2402211a 14-Mar-2008 David Teigland <teigland@redhat.com>

dlm: move plock code from gfs2

Move the code that handles cluster posix locks from gfs2 into the dlm
so that it can be used by both gfs2 and ocfs2.

Signed-off-by: David Teigland <teigland@redhat.com>


# d44e0fc7 18-Mar-2008 David Teigland <teigland@redhat.com>

dlm: recover nodes that are removed and re-added

If a node is removed from a lockspace, and then added back before the
dlm is notified of the removal, the dlm will not detect the removal
and won't clear the old state from the node. This is fixed by using a
list of added nodes so the membership recovery can detect when a newly
added node is already in the member list.

Signed-off-by: David Teigland <teigland@redhat.com>


# cb688371 26-Feb-2008 Matthew Wilcox <willy@infradead.org>

fs: Remove unnecessary inclusions of asm/semaphore.h

None of these files use any of the functionality promised by
asm/semaphore.h.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>


# d292c0cc 06-Feb-2008 David Teigland <teigland@redhat.com>

dlm: eliminate astparam type casting

Put lkb_astparam in a union with a dlm_user_args pointer to
eliminate a lot of type casting.

Signed-off-by: David Teigland <teigland@redhat.com>


# e5dae548 05-Feb-2008 David Teigland <teigland@redhat.com>

dlm: proper types for asts and basts

Use proper types for ast and bast functions, and use
consistent type for ast param.

Signed-off-by: David Teigland <teigland@redhat.com>


# 4007685c 25-Jan-2008 Al Viro <viro@zeniv.linux.org.uk>

dlm: use proper type for ->ls_recover_buf

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David Teigland <teigland@redhat.com>


# 93ff2971 25-Jan-2008 Al Viro <viro@zeniv.linux.org.uk>

dlm: do not byteswap rcom_config

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David Teigland <teigland@redhat.com>


# 163a1859 25-Jan-2008 Al Viro <viro@zeniv.linux.org.uk>

dlm: do not byteswap rcom_lock

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David Teigland <teigland@redhat.com>


# eef7d739 24-Jan-2008 Al Viro <viro@zeniv.linux.org.uk>

dlm: dlm_process_incoming_buffer() fixes

* check that length is large enough to cover the non-variable part of message or
rcom resp. (after checking that it's large enough to cover the header, of
course).

* kill more pointless casts

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David Teigland <teigland@redhat.com>


# e028398d 02-Nov-2007 Adrian Bunk <bunk@kernel.org>

dlm: proper prototypes

This patch adds a proper prototype for some functions in
fs/dlm/dlm_internal.h

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David Teigland <teigland@redhat.com>


# c36258b5 27-Sep-2007 David Teigland <teigland@redhat.com>

[DLM] block dlm_recv in recovery transition

Introduce a per-lockspace rwsem that's held in read mode by dlm_recv
threads while working in the dlm. This allows dlm_recv activity to be
suspended when the lockspace transitions to, from and between recovery
cycles.

The specific bug prompting this change is one where an in-progress
recovery cycle is aborted by a new recovery cycle. While dlm_recv was
processing a recovery message, the recovery cycle was aborted and
dlm_recoverd began cleaning up. dlm_recv decremented recover_locks_count
on an rsb after dlm_recoverd had reset it to zero. This is fixed by
suspending dlm_recv (taking write lock on the rwsem) before aborting the
current recovery.

The transitions to/from normal and recovery modes are simplified by using
this new ability to block dlm_recv. The switch from normal to recovery
mode means dlm_recv goes from processing locking messages, to saving them
for later, and vice versa. Races are avoided by blocking dlm_recv when
setting the flag that switches between modes.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# ac90a255 06-Jul-2007 David Teigland <teigland@redhat.com>

[DLM] dump more lock values

Add two more output fields (lkb_flags and rsb nodeid) to the new debugfs
file that dumps one lock per line. Also, dump all locks instead of just
mastered locks. Accordingly, use a suffix of _locks instead of _master.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 44f487a5 06-Jun-2007 Patrick Caulfield <pcaulfie@redhat.com>

[DLM] variable allocation

Add a new flag, DLM_LSFL_FS, to be used when a file system creates a lockspace.
This flag causes the dlm to use GFP_NOFS for allocations instead of GFP_KERNEL.
(This updated version of the patch uses gfp_t for ls_allocation.)

Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-Off-By: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 9dd592d7 29-May-2007 David Teigland <teigland@redhat.com>

[DLM] dumping master locks

Add a new debugfs file that dumps a compact list of mastered locks.
This will be used by a userland daemon to collect state for deadlock
detection.

Also, for the existing function that prints all lock state, lock the rsb
before going through the lock lists since they can be changing in the
course of normal dlm activity.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 8b4021fa 29-May-2007 David Teigland <teigland@redhat.com>

[DLM] canceling deadlocked lock

Add a function that can be used through libdlm by a system daemon to cancel
another process's deadlocked lock. A completion ast with EDEADLK is returned
to the process waiting for the lock.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 84d8cd69 29-May-2007 David Teigland <teigland@redhat.com>

[DLM] timeout fixes

Various fixes related to the new timeout feature:
- add_timeout() missed setting TIMEWARN flag on lkb's when the
TIMEOUT flag was already set
- clear_proc_locks should remove a dead process's locks from the
timeout list
- the end-of-life calculation for user locks needs to consider that
ETIMEDOUT is equivalent to -DLM_ECANCEL
- make initial default timewarn_cs config value visible in configfs
- change bit position of TIMEOUT_CANCEL flag so it's not copied to
a remote master node
- set timestamp on remote lkb's so a lock dump will display the time
they've been waiting

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 8b0e7b2c 18-May-2007 David Teigland <teigland@redhat.com>

[DLM] wait for config check during join [6/6]

Joining the lockspace should wait for the initial round of inter-node
config checks to complete before returning. This way, if there's a
configuration mismatch between the joining node and the existing nodes,
the join can fail and return an error to the application.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# d7db923e 18-May-2007 David Teigland <teigland@redhat.com>

[DLM] dlm_device interface changes [3/6]

Change the user/kernel device interface used by libdlm:
- Add ability for userspace to check the version of the interface. libdlm
can now adapt to different versions of the kernel interface.
- Increase the size of the flags passed in a lock request so all possible
flags can be used from userspace.
- Add an opaque "xid" value for each lock. This "transaction id" will be
used later to associate locks with each other during deadlock detection.
- Add a "timeout" value for each lock. This is used along with the
DLM_LKF_TIMEOUT flag.

Also, remove a fragment of unused code in device_read().

This patch requires updating libdlm, which is backward compatible with
older kernels.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 3ae1acf9 18-May-2007 David Teigland <teigland@redhat.com>

[DLM] add lock timeouts and warnings [2/6]

New features: lock timeouts and time warnings. If the DLM_LKF_TIMEOUT
flag is set, then the request/conversion will be canceled after waiting
the specified number of centiseconds (specified per lock). This feature
is only available for locks requested through libdlm (can be enabled for
kernel dlm users if there's a use for it.)

If the new DLM_LSFL_TIMEWARN flag is set when creating the lockspace, then
a warning message will be sent to userspace (using genetlink) after a
request/conversion has been waiting for a given number of centiseconds
(configurable per node). The time warnings will be used in the future
to do deadlock detection in userspace.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 8499137d 30-Mar-2007 David Teigland <teigland@redhat.com>

[DLM] add orphan purging code (1/2)

Add code for purging orphan locks. A process can also purge all of its
own non-orphan locks by passing a pid of zero. Code already exists for
processes to create persistent locks that become orphans when the process
exits, but the complementary capability for another process to then purge
these orphans has been missing.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# ef0c2bb0 28-Mar-2007 David Teigland <teigland@redhat.com>

[DLM] overlapping cancel and unlock

Full cancel and force-unlock support. In the past, cancel and force-unlock
wouldn't work if there was another operation in progress on the lock. Now,
both cancel and unlock-force can overlap an operation on a lock, meaning there
may be 2 or 3 operations in progress on a lock in parallel. This support is
important not only because cancel and force-unlock are explicit operations
that an app can use, but both are used implicitly when a process exits while
holding locks.

Summary of changes:

- add-to and remove-from waiters functions were rewritten to handle situations
with more than one remote operation outstanding on a lock

- validate_unlock_args detects when an overlapping cancel/unlock-force
can be sent and when it needs to be delayed until a request/lookup
reply is received

- processing request/lookup replies detects when cancel/unlock-force
occurred during the op, and carries out the delayed cancel/unlock-force

- manipulation of the "waiters" (remote operation) state of a lock moved under
the standard rsb mutex that protects all the other lock state

- the two recovery routines related to locks on the waiters list changed
according to the way lkb's are now locked before accessing waiters state

- waiters recovery detects when lkb's being recovered have overlapping
cancel/unlock-force, and may not recover such locks

- revert_lock (cancel) returns a value to distinguish cases where it did
nothing vs cases where it actually did a cancel; the cancel completion ast
should only be done when cancel did something

- orphaned locks put on new list so they can be found later for purging

- cancel must be called on a lock when making it an orphan

- flag user locks (ENDOFLIFE) at the end of their useful life (to the
application) so we can return an error for any further cancel/unlock-force

- we weren't setting COMP/BAST ast flags if one was already set, so we'd lose
either a completion or blocking ast

- clear an unread bast on a lock that's become unlocked

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# a1bc86e6 15-Jan-2007 David Teigland <teigland@redhat.com>

[DLM] fix user unlocking

When a user process exits, we clear all the locks it holds. There is a
problem, though, with locks that the process had begun unlocking before it
exited. We couldn't find the lkb's that were in the process of being
unlocked remotely, to flag that they are DEAD. To solve this, we move
lkb's being unlocked onto a new list in the per-process structure that
tracks what locks the process is holding. We can then go through this
list to flag the necessary lkb's when clearing locks for a process when it
exits.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 99fc6487 09-Jan-2007 David Teigland <teigland@redhat.com>

[DLM] add config entry to enable log_debug

Add a new dlm_config_info field to enable log_debug output and change
log_debug() to use it.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 38aa8b0c 13-Dec-2006 David Teigland <teigland@redhat.com>

[DLM] fix old rcom messages

A reply to a recovery message will often be received after the relevant
recovery sequence has aborted and the next recovery sequence has begun.
We need to ignore replies to these old messages from the previous
recovery. There's already a way to do this for synchronous recovery
requests using the rc_id number, but not for async.

Each recovery sequence already has a locally unique sequence number
associated with it. This patch adds a field to the rcom (recovery
message) structure where this recovery sequence number can be placed,
rc_seq. When a node sends a reply to a recovery request, it copies the
rc_seq number it received into rc_seq_reply. When the first node receives
the reply to its recovery message, it will check whether rc_seq_reply
matches the current recovery sequence number, ls_recover_seq, and if not
then it ignores the old reply.
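The rc_seq / rc_seq_reply handshake described above can be sketched as follows. Only the field names rc_seq, rc_seq_reply, and ls_recover_seq come from the commit message; struct rcom_sketch and the two helper functions are hypothetical, and the real struct dlm_rcom carries many more fields and uses on-wire byte ordering.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down recovery message with only the two
 * sequence fields described in the commit message. */
struct rcom_sketch {
	uint64_t rc_seq;       /* sender's recovery sequence number */
	uint64_t rc_seq_reply; /* rc_seq echoed back by the replying node */
};

/* Replying node: copy the request's rc_seq into the reply's rc_seq_reply. */
static void sketch_make_reply(const struct rcom_sketch *req,
			      struct rcom_sketch *reply)
{
	assert(req != NULL && reply != NULL);
	reply->rc_seq_reply = req->rc_seq;
}

/* Requesting node: accept the reply only if it belongs to the current
 * recovery sequence; replies from an aborted sequence are ignored. */
static bool sketch_reply_is_current(const struct rcom_sketch *reply,
				    uint64_t ls_recover_seq)
{
	return reply->rc_seq_reply == ls_recover_seq;
}
```

If recovery sequence 7 is aborted and sequence 8 begins before the reply to a sequence-7 request arrives, the reply carries rc_seq_reply 7 and is discarded.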

An old, inadequate approach to filtering out old replies (checking if the
current stage of recovery has moved back to the start) has been removed
from two spots.

The protocol version number is changed to reflect the different rcom
structures.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 98f176fb 27-Nov-2006 David Teigland <teigland@redhat.com>

[DLM] don't accept replies to old recovery messages

We often abort a recovery after sending a status request to a remote node.
We want to ignore any potential status reply we get from the remote node.
If we get one of these unwanted replies, we've often moved on to the next
recovery message and incremented the message sequence counter, so the
reply will be ignored due to the seq number. In some cases, we've not
moved on to the next message so the seq number of the reply we want to
ignore is still correct, causing the reply to be accepted. The next
recovery message will then mistake this old reply as a new one.

To fix this, we add the flag RCOM_WAIT to indicate when we can accept a
new reply. We clear this flag if we abort recovery while waiting for a
reply. Before the flag is set again (to allow new replies) we know that
any old replies will be rejected due to their sequence number. We also
initialize the recovery-message sequence number to a random value when a
lockspace is first created. This makes it clear when messages are being
rejected from an old instance of a lockspace that has since been
recreated.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 32f105a1 23-Aug-2006 David Teigland <teigland@redhat.com>

[DLM] down conversion clearing flags

The down-conversion optimization was resulting in the lkb flags being
cleared because the stub message reply had no flags value set. Copy the
current flags into the stub message so they'll be copied back into the lkb
as part of processing the fake reply. Also add an assertion to catch this
error more directly if it exists elsewhere.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 4a99c3d9 09-Aug-2006 David Teigland <teigland@redhat.com>

[DLM] reject replies to old requests

When recoveries are aborted by other recoveries we can get replies to
status or names requests that we've given up on. This can cause problems
if we're making another request and receive an old reply. Add a sequence
number to status/names requests and reject replies that don't match. A
field already exists for the seq number that's used in other message
types.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# faa0f267 08-Aug-2006 David Teigland <teigland@redhat.com>

[DLM] show nodeid for recovery message

To aid debugging, it's useful to be able to see what nodeid the dlm is
waiting on for a message reply.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 5de6319b 25-Jul-2006 David Teigland <teigland@redhat.com>

[DLM] more info through debugfs

Display more information from debugfs, particularly locks waiting for
a master lookup or operations waiting for a remote reply.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 597d0cae 12-Jul-2006 David Teigland <teigland@redhat.com>

[DLM] dlm: user locks

This changes the way the dlm handles user locks. The core dlm is now
aware of user locks so they can be dealt with more efficiently. There is
no more dlm_device module which previously managed its own duplicate copy
of every user lock.

Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 97a35d1e 02-May-2006 David Teigland <teigland@redhat.com>

[DLM] fix grant_after_purge softlockup

In dlm_grant_after_purge() we were holding a hash table read_lock while
calling put_rsb() which potentially removes the rsb from the hash table,
taking the same lock in write. Fix this by flagging rsb's ahead of time
that have been purged. Then iteratively read_lock the hash table, find a
flagged rsb, unlock, process rsb.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 3bcd3687 23-Feb-2006 David Teigland <teigland@redhat.com>

[DLM] Remove range locks from the DLM

This patch removes support for range locking from the DLM

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 90135925 20-Jan-2006 David Teigland <teigland@redhat.com>

[DLM] Update DLM to the latest patch level

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steve Whitehouse <swhiteho@redhat.com>


# e7fd4179 18-Jan-2006 David Teigland <teigland@redhat.com>

[DLM] The core of the DLM for GFS2/CLVM

This is the core of the distributed lock manager which is required
to use GFS2 as a cluster filesystem. It is also used by CLVM and
can be used as a standalone lock manager independently of either
of these two projects.

It implements VAX-style locking modes.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steve Whitehouse <swhiteho@redhat.com>