History log of /linux-master/fs/dlm/rcom.c
Revision Date Author Comments
# a3d85fcf 01-Aug-2023 Alexander Aring <aahringo@redhat.com>

fs: dlm: don't use RCOM_NAMES for version detection

Currently RCOM_STATUS and RCOM_NAMES inclusive their replies are being
used to determine the DLM version. The RCOM_NAMES messages are triggered
in DLM recovery when calling dlm_recover_directory() only. At this time
the DLM version need to be determined. I ran some tests and did not
expirenced some issues. When the DLM version detection was developed
probably I run once in a case of RCOM_NAMES and the version was not
detected yet. However it seems to be not necessary.

For backwards compatibility we still need to accept RCOM_NAMES messages
which are not protected regarding the DLM message reliability layer aka
stateless message. This patch changes that RCOM_NAMES we are sending out
after this patch are not stateless anymore.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 11519351 01-Aug-2023 Alexander Aring <aahringo@redhat.com>

fs: dlm: constify receive buffer

The dlm receive buffer should be never manipulated as DLM is the last
instance of parsing layer. This patch constify the whole receive buffer
so we are sure it never gets manipulated when it's being parsed.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# b9d2f6ad 01-Aug-2023 Alexander Aring <aahringo@redhat.com>

fs: dlm: drop rxbuf manipulation in dlm_recover_master_copy

Currently dlm_recover_master_copy() manipulates the receive buffer of an
rcom lock message and modifies it on the fly so a later memcpy() to a
new rcom message with the same message has those new values. This patch
avoids manipulating the received rcom message by store the values for
the new rcom message in paremter assigned with call by reference. Later
when dlm_send_rcom_lock() constructs a new message and memcpy() the
receive buffer those values will be set on the new constructed message.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# c4f4e135 01-Aug-2023 Alexander Aring <aahringo@redhat.com>

fs: dlm: get recovery sequence number as parameter

This patch removes a read of the ls->ls_recover_seq uint64_t number in
_create_rcom(). If the ls->ls_recover_seq is readed the ls_recover_lock
need to held. However this number was always readed before when any rcom
message is received and it's not necessary to read it again from a per
lockspace variable to use it for the replying message. This patch will
pass the sequence number as parameter so another read of ls->ls_recover_seq
and holding the ls->ls_recover_lock is not required.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 8a39dcd9 06-Mar-2023 Alexander Aring <aahringo@redhat.com>

fs: dlm: change dflags to use atomic bits

Currently manipulating lkb_dflags assumes to held the rsb lock assigned
to the lkb. This is held by dlm message processing after certain
time to lookup the right rsb from the received lkb message id. For user
space locks flags, which is currently the only use case for lkb_dflags,
flags are also being set during dlm character device handling without
holding the rsb lock. To minimize the risk that bit operations are
getting corrupted we switch to atomic bit operations. This patch will
also introduce helpers to snapshot atomic bit values in an non atomic
way. There might be still issues with the flag handling e.g. running in
case of manipulating bit ops and snapshot them at the same time, but this
patch minimize them and will start to use atomic bit operations.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 8c11ba64 06-Mar-2023 Alexander Aring <aahringo@redhat.com>

fs: dlm: store lkb distributed flags into own value

This patch stores lkb distributed flags value in an separate value
instead of sharing internal and distributed flags in lkb->lkb_flags value.
This has the advantage to not mask/write back flag values in
receive_flags() functionality. The dlm debug_fs does not provide the
distributed flags anymore, those can be added in future.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# e01c4b7b 27-Oct-2022 Alexander Aring <aahringo@redhat.com>

fd: dlm: trace send/recv of dlm message and rcom

This patch adds tracepoints for send and recv cases of dlm messages and
dlm rcom messages. In case of send and dlm message we add the dlm rsb
resource name this dlm messages belongs to. This has the advantage to
follow dlm messages on a per lock basis. In case of recv message the
resource name can be extracted by follow the send message sequence
number.

The dlm message DLM_MSG_PURGE doesn't belong to a lock request and will
not set the resource name in a dlm_message trace. The same for all rcom
messages.

There is additional handling required for this debugging functionality
which is tried to be small as possible. Also the midcomms layer gets
aware of lock resource names, for now this is required to make a
connection between sequence number and lock resource names. It is for
debugging purpose only.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# e425ac99 07-Apr-2022 Alexander Aring <aahringo@redhat.com>

fs: dlm: cast resource pointer to uintptr_t

This patch fixes the following warning when doing a 32 bit kernel build
when pointers are 4 byte long:

In file included from ./include/linux/byteorder/little_endian.h:5,
from ./arch/x86/include/uapi/asm/byteorder.h:5,
from ./include/asm-generic/qrwlock_types.h:6,
from ./arch/x86/include/asm/spinlock_types.h:7,
from ./include/linux/spinlock_types_raw.h:7,
from ./include/linux/ratelimit_types.h:7,
from ./include/linux/printk.h:10,
from ./include/asm-generic/bug.h:22,
from ./arch/x86/include/asm/bug.h:87,
from ./include/linux/bug.h:5,
from ./include/linux/mmdebug.h:5,
from ./include/linux/gfp.h:5,
from ./include/linux/slab.h:15,
from fs/dlm/dlm_internal.h:19,
from fs/dlm/rcom.c:12:
fs/dlm/rcom.c: In function ‘dlm_send_rcom_lock’:
./include/uapi/linux/byteorder/little_endian.h:32:43: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
#define __cpu_to_le64(x) ((__force __le64)(__u64)(x))
^
./include/linux/byteorder/generic.h:86:21: note: in expansion of macro ‘__cpu_to_le64’
#define cpu_to_le64 __cpu_to_le64
^~~~~~~~~~~~~
fs/dlm/rcom.c:457:14: note: in expansion of macro ‘cpu_to_le64’
rc->rc_id = cpu_to_le64(r);

The rc_id value in dlm rcom is handled as u64. The rcom implementation
uses for an unique number generation the pointer value of the used
dlm_rsb instance. However if the pointer value is 4 bytes long
-Wpointer-to-int-cast will print a warning. We get rid of that warning
to cast the pointer to uintptr_t which is either 4 or 8 bytes. There
might be a very unlikely case where this number isn't unique anymore if
using dlm in a mixed cluster of nodes and sizeof(uintptr_t) returns 4 and
8.

However this problem was already been there and this patch should get
rid of the warning.

Fixes: 2f9dbeda8dc0 ("dlm: use __le types for rcom messages")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 2f9dbeda 04-Apr-2022 Alexander Aring <aahringo@redhat.com>

dlm: use __le types for rcom messages

This patch changes to use __le types directly in the dlm rcom
structure which is casted at the right dlm message buffer positions.

The main goal what is reached here is to remove sparse warnings
regarding to host to little byte order conversion or vice versa. Leaving
those sparse issues ignored and always do it in out/in functionality
tends to leave it unknown in which byte order the variable is being
handled.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 3428785a 04-Apr-2022 Alexander Aring <aahringo@redhat.com>

dlm: use __le types for dlm header

This patch changes to use __le types directly in the dlm header
structure which is casted at the right dlm message buffer positions.

The main goal what is reached here is to remove sparse warnings
regarding to host to little byte order conversion or vice versa. Leaving
those sparse issues ignored and always do it in out/in functionality
tends to leave it unknown in which byte order the variable is being
handled.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 3e973671 02-Nov-2021 Alexander Aring <aahringo@redhat.com>

fs: dlm: use dlm_recovery_stopped instead of test_bit

This patch will change to use dlm_recovery_stopped() which is the dlm way
to check if the LSFL_RECOVER_STOP flag in ls_flags by using the helper.
It is an atomic operation but the check is still as before to fetch the
value if ls_recover_lock is held. There might be more further
investigations if the value can be changed afterwards and if it has any
side effects.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 88aa023a 16-Jul-2021 Alexander Aring <aahringo@redhat.com>

fs: dlm: cleanup and remove _send_rcom

The _send_rcom() can be removed and we call directly dlm_rcom_out().
As we doing that we removing the struct dlm_ls parameter which isn't
used.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# d10a0b88 02-Jun-2021 Alexander Aring <aahringo@redhat.com>

fs: dlm: rename socket and app buffer defines

This patch renames DEFAULT_BUFFER_SIZE to DLM_MAX_SOCKET_BUFSIZE and
LOWCOMMS_MAX_TX_BUFFER_LEN to DLM_MAX_APP_BUFSIZE as they are proper
names to define what's behind those values. The DLM_MAX_SOCKET_BUFSIZE
defines the maximum size of buffer which can be handled on socket layer,
the DLM_MAX_APP_BUFSIZE defines the maximum size of buffer which can be
handled by the DLM application layer.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# f6089981 26-May-2021 Colin Ian King <colin.king@canonical.com>

fs: dlm: Fix memory leak of object mh

There is an error return path that is not kfree'ing mh after
it has been successfully allocates. Fix this by moving the
call to create_rcom to after the check on rc_in->rc_id check
to avoid this.

Thanks to Alexander Ahring Oder Aring for suggesting the
correct way to fix this.

Addresses-Coverity: ("Resource leak")
Fixes: a070a91cf140 ("fs: dlm: add more midcomms hooks")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 489d8e55 21-May-2021 Alexander Aring <aahringo@redhat.com>

fs: dlm: add reliable connection if reconnect

This patch introduce to make a tcp lowcomms connection reliable even if
reconnects occurs. This is done by an application layer re-transmission
handling and sequence numbers in dlm protocols. There are three new dlm
commands:

DLM_OPTS:

This will encapsulate an existing dlm message (and rcom message if they
don't have an own application side re-transmission handling). As optional
handling additional tlv's (type length fields) can be appended. This can
be for example a sequence number field. However because in DLM_OPTS the
lockspace field is unused and a sequence number is a mandatory field it
isn't made as a tlv and we put the sequence number inside the lockspace
id. The possibility to add optional options are still there for future
purposes.

DLM_ACK:

Just a dlm header to acknowledge the receive of a DLM_OPTS message to
it's sender.

DLM_FIN:

This provides a 4 way handshake for connection termination inclusive
support for half-closed connections. It's provided on application layer
because SCTP doesn't support half-closed sockets, the shutdown() call
can interrupted by e.g. TCP resets itself and a hard logic to implement
it because the othercon paradigm in lowcomms. The 4-way termination
handshake also solve problems to synchronize peer EOF arrival and that
the cluster manager removes the peer in the node membership handling of
DLM. In some cases messages can be still transmitted in this time and we
need to wait for the node membership event.

To provide a reliable connection the node will retransmit all
unacknowledges message to it's peer on reconnect. The receiver will then
filtering out the next received message and drop all messages which are
duplicates.

As RCOM_STATUS and RCOM_NAMES messages are the first messages which are
exchanged and they have they own re-transmission handling, there exists
logic that these messages must be first. If these messages arrives we
store the dlm version field. This handling is on DLM 3.1 and after this
patch 3.2 the same. A backwards compatibility handling has been added
which seems to work on tests without tcpkill, however it's not recommended
to use DLM 3.1 and 3.2 at the same time, because DLM 3.2 tries to fix long
term bugs in the DLM protocol.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 8e2e4086 21-May-2021 Alexander Aring <aahringo@redhat.com>

fs: dlm: add union in dlm header for lockspace id

This patch adds union inside the lockspace id to handle it also for
another use case for a different dlm command.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 8f2dc78d 21-May-2021 Alexander Aring <aahringo@redhat.com>

fs: dlm: make buffer handling per msg

This patch makes the void pointer handle for lowcomms functionality per
message and not per page allocation entry. A refcount handling for the
handle was added to keep the message alive until the user doesn't need
it anymore.

There exists now a per message callback which will be called when
allocating a new buffer. This callback will be guaranteed to be called
according the order of the sending buffer, which can be used that the
caller increments a sequence number for the dlm message handle.

For transition process we cast the dlm_mhandle to dlm_msg and vice versa
until the midcomms layer will implement a specific dlm_mhandle structure.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# a070a91c 21-May-2021 Alexander Aring <aahringo@redhat.com>

fs: dlm: add more midcomms hooks

This patch prepares hooks to redirect to the midcomms layer which will
be used by the midcomms re-transmit handling.

There exists the new concept of stateless buffers allocation and
commits. This can be used to bypass the midcomms re-transmit handling. It
is used by RCOM_STATUS and RCOM_NAMES messages, because they have their
own ping-like re-transmit handling. As well these two messages will be
used to determine the DLM version per node, because these two messages
are per observation the first messages which are exchanged.

Cluster manager events for node membership are added to add support for
half-closed connections in cases that the peer connection get to
an end of file but DLM still holds membership of the node. In
this time DLM can still trigger new message which we should allow. After
the cluster manager node removal event occurs it safe to close the
connection.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# e1a7cbce 01-Mar-2021 Alexander Aring <aahringo@redhat.com>

fs: dlm: use GFP_ZERO for page buffer

This patch uses GFP_ZERO for allocate a page for the internal dlm
sending buffer allocator instead of calling memset zero after every
allocation. An already allocated space will never be reused again.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 9f8f9c77 02-Nov-2020 Alexander Aring <aahringo@redhat.com>

fs: dlm: define max send buffer

This patch will set the maximum transmit buffer size for rcom messages
with "names" to 4096 bytes. It's a leftover change of
commit 4798cbbfbd00 ("fs: dlm: rework receive handling"). Fact is that we
cannot allocate a contiguous transmit buffer length above of 4096 bytes.
It seems at some places the upper layer protocol will calculate according
to dlm_config.ci_buffer_size the possible payload of a dlm recovery
message. As compiler setting we will use now the maximum possible
message which dlm can send out. Commit 4e192ee68e5af ("fs: dlm: disallow
buffer size below default") disallow a buffer setting smaller than the
4096 bytes and above 4096 bytes is definitely wrong because we will then
write out of buffer space as we cannot allocate a contiguous buffer above
4096 bytes. The ci_buffer_size is still there to define the possible
maximum receive buffer size of a recvmsg() which should be at least the
maximum possible dlm message size.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 90db4f8b 22-Apr-2020 Wu Bo <wubo40@huawei.com>

fs:dlm:remove unneeded semicolon in rcom.c

Fix the following coccicheck warning:
fs/dlm/rcom.c:566:2-3: Unneeded semicolon

Signed-off-by: Wu Bo <wubo40@huawei.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 2522fe45 28-May-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 193

Based on 1 normalized pattern(s):

this copyrighted material is made available to anyone wishing to use
modify copy or redistribute it subject to the terms and conditions
of the gnu general public license v 2

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-only

has been chosen to replace the boilerplate/reference in 45 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Richard Fontana <rfontana@redhat.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Steve Winslow <swinslow@gmail.com>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190528170027.342746075@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 9250e523 09-Oct-2017 David Teigland <teigland@redhat.com>

dlm: remove dlm_send_rcom_lookup_dump

This function was only for debugging. It would be
called in a condition that should not happen, and
should probably have been removed from the final
version of the original commit.

Remove it because it does mutex lock under spin lock.

Signed-off-by: David Teigland <teigland@redhat.com>


# 59661212 12-Sep-2017 tsutomu.owa@toshiba.co.jp <tsutomu.owa@toshiba.co.jp>

DLM: retry rcom when dlm_wait_function is timed out.

If a node sends a DLM_RCOM_STATUS command and an error occurs on the
receiving side, the DLM_RCOM_STATUS_REPLY response may not be returned.
We retransmitted the DLM_RCOM_STATUS command so that we do not wait for
an infinite response.

Signed-off-by: Tadashi Miyauchi <miyauchi@toshiba-tops.co.jp>
Signed-off-by: Tsutomu Owa <tsutomu.owa@toshiba.co.jp>
Signed-off-by: David Teigland <teigland@redhat.com>


# c07127b4 14-Oct-2014 Neale Ferguson <neale@sinenomine.net>

dlm: fix missing endian conversion of rcom_status flags

The flags are already converted to le when being sent,
but are not being converted back to cpu when received.

Signed-off-by: Neale Ferguson <neale@sinenomine.net>
Signed-off-by: David Teigland <teigland@redhat.com>


# 475f230c 02-Aug-2012 David Teigland <teigland@redhat.com>

dlm: fix unlock balance warnings

The in_recovery rw_semaphore has always been acquired and
released by different threads by design. To work around
the "BUG: bad unlock balance detected!" messages, adjust
things so the dlm_recoverd thread always does both down_write
and up_write.

Signed-off-by: David Teigland <teigland@redhat.com>


# 1d7c484e 15-May-2012 David Teigland <teigland@redhat.com>

dlm: use idr instead of list for recovered rsbs

When a large number of resources are being recovered,
a linear search of the recover_list takes a long time.
Use an idr in place of a list.

Signed-off-by: David Teigland <teigland@redhat.com>


# c04fecb4 10-May-2012 David Teigland <teigland@redhat.com>

dlm: use rsbtbl as resource directory

Remove the dir hash table (dirtbl), and use
the rsb hash table (rsbtbl) as the resource
directory. It has always been an unnecessary
duplication of information.

This improves efficiency by using a single rsbtbl
lookup in many cases where both rsbtbl and dirtbl
lookups were needed previously.

This eliminates the need to handle cases of rsbtbl
and dirtbl being out of sync.

In many cases there will be memory savings because
the dir hash table no longer exists.

Signed-off-by: David Teigland <teigland@redhat.com>


# 4875647a 26-Apr-2012 David Teigland <teigland@redhat.com>

dlm: fixes for nodir mode

The "nodir" mode (statically assign master nodes instead
of using the resource directory) has always been highly
experimental, and never seriously used. This commit
fixes a number of problems, making nodir much more usable.

- Major change to recovery: recover all locks and restart
all in-progress operations after recovery. In some
cases it's not possible to know which in-progess locks
to recover, so recover all. (Most require recovery
in nodir mode anyway since rehashing changes most
master nodes.)

- Change the way nodir mode is enabled, from a command
line mount arg passed through gfs2, into a sysfs
file managed by dlm_controld, consistent with the
other config settings.

- Allow recovering MSTCPY locks on an rsb that has not
yet been turned into a master copy.

- Ignore RCOM_LOCK and RCOM_LOCK_REPLY recovery messages
from a previous, aborted recovery cycle. Base this
on the local recovery status not being in the state
where any nodes should be sending LOCK messages for the
current recovery cycle.

- Hold rsb lock around dlm_purge_mstcpy_locks() because it
may run concurrently with dlm_recover_master_copy().

- Maintain highbast on process-copy lkb's (in addition to
the master as is usual), because the lkb can switch
back and forth between being a master and being a
process copy as the master node changes in recovery.

- When recovering MSTCPY locks, flag rsb's that have
non-empty convert or waiting queues for granting
at the end of recovery. (Rename flag from LOCKS_PURGED
to RECOVER_GRANT and similar for the recovery function,
because it's not only resources with purged locks
that need grant a grant attempt.)

- Replace a couple of unnecessary assertion panics with
error messages.

Signed-off-by: David Teigland <teigland@redhat.com>


# d6e24788 23-Apr-2012 David Teigland <teigland@redhat.com>

dlm: limit rcom debug messages

Unify the checking for both types of ignored
rcom messages, and replace the two log_debug
statements with a single, rate limited debug
message.

Signed-off-by: David Teigland <teigland@redhat.com>


# 757a4271 20-Oct-2011 David Teigland <teigland@redhat.com>

dlm: add node slots and generation

Slot numbers are assigned to nodes when they join the lockspace.
The slot number chosen is the minimum unused value starting at 1.
Once a node is assigned a slot, that slot number will not change
while the node remains a lockspace member. If the node leaves
and rejoins it can be assigned a new slot number.

A new generation number is also added to a lockspace. It is
set and incremented during each recovery along with the slot
collection/assignment.

The slot numbers will be passed to gfs2 which will use them as
journal id's.

Signed-off-by: David Teigland <teigland@redhat.com>


# 8304d6f2 21-Feb-2011 David Teigland <teigland@redhat.com>

dlm: record full callback state

Change how callbacks are recorded for locks. Previously, information
about multiple callbacks was combined into a couple of variables that
indicated what the end result should be. In some situations, we
could not tell from this combined state what the exact sequence of
callbacks were, and would end up either delivering the callbacks in
the wrong order, or suppress redundant callbacks incorrectly. This
new approach records all the data for each callback, leaving no
uncertainty about what needs to be delivered.

Signed-off-by: David Teigland <teigland@redhat.com>


# 573c24c4 30-Nov-2009 David Teigland <teigland@redhat.com>

dlm: always use GFP_NOFS

Replace all GFP_KERNEL and ls_allocation with GFP_NOFS.
ls_allocation would be GFP_KERNEL for userland lockspaces
and GFP_NOFS for file system lockspaces.

It was discovered that any lockspaces on the system can
affect all others by triggering memory reclaim in the
file system which could in turn call back into the dlm
to acquire locks, deadlocking dlm threads that were
shared by all lockspaces, like dlm_recv.

Signed-off-by: David Teigland <teigland@redhat.com>


# 599e0f58 21-Feb-2008 David Teigland <teigland@redhat.com>

dlm: fix rcom_names message to self

The recent patch to validate data lengths in rcom_names messages
failed to account for fake messages a node directs to itself before
ever sending it. In this case we need to fill in the message length
in the header for the validation code to use.

Signed-off-by: David Teigland <teigland@redhat.com>


# e5dae548 05-Feb-2008 David Teigland <teigland@redhat.com>

dlm: proper types for asts and basts

Use proper types for ast and bast functions, and use
consistent type for ast param.

Signed-off-by: David Teigland <teigland@redhat.com>


# ae773d0b 25-Jan-2008 Al Viro <viro@zeniv.linux.org.uk>

dlm: verify that places expecting rcom_lock have packet long enough

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David Teigland <teigland@redhat.com>


# 02ed16b6 25-Jan-2008 Al Viro <viro@zeniv.linux.org.uk>

dlm: missing length check in check_config()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David Teigland <teigland@redhat.com>


# 4007685c 25-Jan-2008 Al Viro <viro@zeniv.linux.org.uk>

dlm: use proper type for ->ls_recover_buf

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David Teigland <teigland@redhat.com>


# 93ff2971 25-Jan-2008 Al Viro <viro@zeniv.linux.org.uk>

dlm: do not byteswap rcom_config

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David Teigland <teigland@redhat.com>


# 163a1859 25-Jan-2008 Al Viro <viro@zeniv.linux.org.uk>

dlm: do not byteswap rcom_lock

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David Teigland <teigland@redhat.com>


# dbcfc347 29-Jan-2008 David Teigland <teigland@redhat.com>

dlm: clean ups

A couple small clean-ups. Remove unnecessary wrapper-functions in
rcom.c, and remove unnecessary casting and an unnecessary ASSERT in
util.c.

Signed-off-by: David Teigland <teigland@redhat.com>


# c36258b5 27-Sep-2007 David Teigland <teigland@redhat.com>

[DLM] block dlm_recv in recovery transition

Introduce a per-lockspace rwsem that's held in read mode by dlm_recv
threads while working in the dlm. This allows dlm_recv activity to be
suspended when the lockspace transitions to, from and between recovery
cycles.

The specific bug prompting this change is one where an in-progress
recovery cycle is aborted by a new recovery cycle. While dlm_recv was
processing a recovery message, the recovery cycle was aborted and
dlm_recoverd began cleaning up. dlm_recv decremented recover_locks_count
on an rsb after dlm_recoverd had reset it to zero. This is fixed by
suspending dlm_recv (taking write lock on the rwsem) before aborting the
current recovery.

The transitions to/from normal and recovery modes are simplified by using
this new ability to block dlm_recv. The switch from normal to recovery
mode means dlm_recv goes from processing locking messages, to saving them
for later, and vice versa. Races are avoided by blocking dlm_recv when
setting the flag that switches between modes.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 41684f95 13-Jul-2007 David Teigland <teigland@redhat.com>

[DLM] fix NULL ls usage

Fix regression in recent patch "[DLM] variable allocation" which
attempts to dereference an "ls" struct when it's NULL.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 44f487a5 06-Jun-2007 Patrick Caulfield <pcaulfie@redhat.com>

[DLM] variable allocation

Add a new flag, DLM_LSFL_FS, to be used when a file system creates a lockspace.
This flag causes the dlm to use GFP_NOFS for allocations instead of GFP_KERNEL.
(This updated version of the patch uses gfp_t for ls_allocation.)

Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-Off-By: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 8b0e7b2c 18-May-2007 David Teigland <teigland@redhat.com>

[DLM] wait for config check during join [6/6]

Joining the lockspace should wait for the initial round of inter-node
config checks to complete before returning. This way, if there's a
configuration mismatch between the joining node and the existing nodes,
the join can fail and return an error to the application.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 68c817a1 09-Jan-2007 David Teigland <teigland@redhat.com>

[DLM] rename dlm_config_info fields

Add a "ci_" prefix to the fields in the dlm_config_info struct so that we
can use macros to add configfs functions to access them (in a later
patch). No functional changes in this patch, just naming changes.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 8ec68867 09-Jan-2007 David Teigland <teigland@redhat.com>

[DLM] change some log_error to log_debug

Some common, non-error messages should use log_debug instead of log_error
so they can be turned off.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 9e971b71 13-Dec-2006 David Teigland <teigland@redhat.com>

[DLM] add version check

Check if we receive a message from another lockspace member running a
version of the dlm with an incompatible inter-node message protocol.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 38aa8b0c 13-Dec-2006 David Teigland <teigland@redhat.com>

[DLM] fix old rcom messages

A reply to a recovery message will often be received after the relevant
recovery sequence has aborted and the next recovery sequence has begun.
We need to ignore replies to these old messages from the previous
recovery. There's already a way to do this for synchronous recovery
requests using the rc_id number, but not for async.

Each recovery sequence already has a locally unique sequence number
associated with it. This patch adds a field to the rcom (recovery
message) structure where this recovery sequence number can be placed,
rc_seq. When a node sends a reply to a recovery request, it copies the
rc_seq number it received into rc_seq_reply. When the first node receives
the reply to its recovery message, it will check whether rc_seq_reply
matches the current recovery sequence number, ls_recover_seq, and if not
then it ignores the old reply.

An old, inadequate approach to filtering out old replies (checking if the
current stage of recovery has moved back to the start) has been removed
from two spots.

The protocol version number is changed to reflect the different rcom
structures.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 57adf7ee 29-Nov-2006 Ryusuke Konishi <ryusuke@osrg.net>

[DLM] fix format warnings in rcom.c and recoverd.c

This fixes the following gcc warnings generated on
the architectures where uint64_t != unsigned long long (e.g. ppc64).

fs/dlm/rcom.c:154: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'uint64_t'
fs/dlm/rcom.c:154: warning: format '%llx' expects type 'long long unsigned int', but argument 5 has type 'uint64_t'
fs/dlm/recoverd.c:48: warning: format '%llx' expects type 'long long unsigned int', but argument 3 has type 'uint64_t'
fs/dlm/recoverd.c:202: warning: format '%llx' expects type 'long long unsigned int', but argument 3 has type 'uint64_t'
fs/dlm/recoverd.c:210: warning: format '%llx' expects type 'long long unsigned int', but argument 3 has type 'uint64_t'

Signed-off-by: Ryusuke Konishi <ryusuke@osrg.net>
Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 98f176fb 27-Nov-2006 David Teigland <teigland@redhat.com>

[DLM] don't accept replies to old recovery messages

We often abort a recovery after sending a status request to a remote node.
We want to ignore any potential status reply we get from the remote node.
If we get one of these unwanted replies, we've often moved on to the next
recovery message and incremented the message sequence counter, so the
reply will be ignored due to the seq number. In some cases, we've not
moved on to the next message so the seq number of the reply we want to
ignore is still correct, causing the reply to be accepted. The next
recovery message will then mistake this old reply as a new one.

To fix this, we add the flag RCOM_WAIT to indicate when we can accept a
new reply. We clear this flag if we abort recovery while waiting for a
reply. Before the flag is set again (to allow new replies) we know that
any old replies will be rejected due to their sequence number. We also
initialize the recovery-message sequence number to a random value when a
lockspace is first created. This makes it clear when messages are being
rejected from an old instance of a lockspace that has since been
recreated.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 1babdb45 27-Nov-2006 David Teigland <teigland@redhat.com>

[DLM] fix size of STATUS_REPLY message

When the not_ready routine sends a "fake" status reply with blank status
flags, it needs to use the correct size for a normal STATUS_REPLY by
including the size of the would-be config parameters. We also fill in the
non-existant config parameters with an invalid lvblen value so it's easier
to notice if these invalid paratmers are ever being used.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 435618b7 02-Nov-2006 David Teigland <teigland@redhat.com>

[DLM] status messages ping-pong between unmounted nodes

Red Hat BZ 213682

If two nodes leave the lockspace (while unmounting the fs in the case of
gfs) after one has sent a STATUS message to the other, STATUS/STATUS_REPLY
messages will then ping-pong between the nodes when neither of them can
find the lockspace in question any longer. We kill this by not sending
another STATUS message when we get a STATUS_REPLY for an unknown
lockspace.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# f5888750 22-Aug-2006 David Teigland <teigland@redhat.com>

[DLM] sequence number missing in not_ready reply

When a status reply is sent for a lockspace that doesn't yet exist, the
message sequence number from the sender was not being copied into the
reply causing the sender to ignore the reply.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 4a99c3d9 09-Aug-2006 David Teigland <teigland@redhat.com>

[DLM] reject replies to old requests

When recoveries are aborted by other recoveries we can get replies to
status or names requests that we've given up on. This can cause problems
if we're making another request and receive an old reply. Add a sequence
number to status/names requests and reject replies that don't match. A
field already exists for the seq number that's used in other message
types.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# faa0f267 08-Aug-2006 David Teigland <teigland@redhat.com>

[DLM] show nodeid for recovery message

To aid debugging, it's useful to be able to see what nodeid the dlm is
waiting on for a message reply.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 3bcd3687 23-Feb-2006 David Teigland <teigland@redhat.com>

[DLM] Remove range locks from the DLM

This patch removes support for range locking from the DLM

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# e7fd4179 18-Jan-2006 David Teigland <teigland@redhat.com>

[DLM] The core of the DLM for GFS2/CLVM

This is the core of the distributed lock manager which is required
to use GFS2 as a cluster filesystem. It is also used by CLVM and
can be used as a standalone lock manager independantly of either
of these two projects.

It implements VAX-style locking modes.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steve Whitehouse <swhiteho@redhat.com>