History log of /linux-master/fs/dlm/member.c
Revision	Date	Author	Comments
# 11519351	01-Aug-2023	Alexander Aring <aahringo@redhat.com>	fs: dlm: constify receive buffer The dlm receive buffer should never be manipulated, as DLM is the last instance of the parsing layer. This patch constifies the whole receive buffer so we are sure it never gets manipulated while it is being parsed. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
# c4f4e135	01-Aug-2023	Alexander Aring <aahringo@redhat.com>	fs: dlm: get recovery sequence number as parameter This patch removes a read of the ls->ls_recover_seq uint64_t number in _create_rcom(). If ls->ls_recover_seq is read, the ls_recover_lock needs to be held. However, this number was always read earlier, when an rcom message is received, and it is not necessary to read it again from a per-lockspace variable to use it for the reply message. This patch passes the sequence number as a parameter so that another read of ls->ls_recover_seq while holding ls->ls_recover_lock is not required. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
# c84c4733	01-Aug-2023	Alexander Aring <aahringo@redhat.com>	fs: dlm: remove clear_members_cb This patch is just a small cleanup to directly call remove_remote_member() instead of going over clear_members_cb() which just calls remove_remote_member(). Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
# 07ee3867	29-May-2023	Alexander Aring <aahringo@redhat.com>	fs: dlm: filter ourself midcomms calls It makes no sense to call midcomms/lowcomms functionality for the local node as socket functionality is only required for remote nodes. This patch filters those calls in the upper layer of lockspace membership handling instead of doing it in midcomms/lowcomms layer as they should never be aware of local nodeid. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
# 9c693d76	27-Oct-2022	Alexander Aring <aahringo@redhat.com>	fs: dlm: catch dlm_add_member() error This patch catches a possible dlm_add_member() error and delivers it to the dlm recovery handling. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
# 682bb91b	22-Jun-2022	Alexander Aring <aahringo@redhat.com>	fs: dlm: make new_lockspace() wait until recovery completes Make dlm_new_lockspace() wait until a full recovery completes successfully or fails. Previously, dlm_new_lockspace() returned to the caller after dlm_recover_members() finished, which is only partially through recovery. The result of the previous behavior is that the new lockspace would not be usable for some time (especially with overlapping recoveries), and some errors in the later part of recovery could not be returned to the caller. Kernel callers gfs2 and cluster-md have their own wait handling to wait for recovery to complete after calling dlm_new_lockspace(). This continues to work, but will be unnecessary. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
# 7e09b15c	22-Jun-2022	Alexander Aring <aahringo@redhat.com>	fs: dlm: call dlm_lsop_recover_prep once A lockspace can be "stopped" multiple times consecutively before being "started" (when recoveries overlap.) In this case, the lsop_recover_prep callback only needs to be called once when the lockspace is first stopped, and not repeatedly for each stop. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
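The "call prep once" rule described above can be illustrated with a minimal userspace sketch. The struct, its fields, and the function names (`ls_stop`, `ls_start`, `recover_prep`) are hypothetical stand-ins for illustration, not the kernel's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lockspace state for this sketch only. */
struct lockspace {
    bool running;      /* true between a start and the next stop */
    int prep_calls;    /* counts how often the prep callback ran */
};

/* Stand-in for a user-supplied "recovery is about to begin" callback. */
void recover_prep(struct lockspace *ls)
{
    ls->prep_calls++;
}

/* Stop the lockspace; may be called several times before a start. */
void ls_stop(struct lockspace *ls)
{
    bool was_running;

    assert(ls != NULL);
    was_running = ls->running;
    ls->running = false;

    /* Only the first stop of a series notifies the user. */
    if (was_running)
        recover_prep(ls);
}

/* Start the lockspace again after recovery. */
void ls_start(struct lockspace *ls)
{
    assert(ls != NULL);
    ls->running = true;
}
```

With this shape, two consecutive stops produce one callback, and a later start followed by a stop produces a second one.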
# ca8031d9	22-Jun-2022	Alexander Aring <aahringo@redhat.com>	fs: dlm: update comments about recovery and membership handling Make clear that a particular recovery iteration must not be aborted before membership changes are applied to the members list (ls_nodes) and midcomms layer. Interrupting recovery before this can result in missing node-specific changes in midcomms or through lsops. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
# 2f9dbeda	04-Apr-2022	Alexander Aring <aahringo@redhat.com>	dlm: use __le types for rcom messages This patch changes to use __le types directly in the dlm rcom structure, which is cast at the right dlm message buffer positions. The main goal is to remove sparse warnings regarding host to little-endian byte order conversion or vice versa. Leaving those sparse issues ignored and always doing the conversion in out/in functionality tends to leave it unknown in which byte order a variable is being handled. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
# 3428785a	04-Apr-2022	Alexander Aring <aahringo@redhat.com>	dlm: use __le types for dlm header This patch changes to use __le types directly in the dlm header structure, which is cast at the right dlm message buffer positions. The main goal is to remove sparse warnings regarding host to little-endian byte order conversion or vice versa. Leaving those sparse issues ignored and always doing the conversion in out/in functionality tends to leave it unknown in which byte order a variable is being handled. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
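The idea behind the two `__le` commits above is that a wire field has one fixed byte order and is converted only where it is read or written. The following is a portable userspace sketch of that idea; `struct wire_header`, `put_le32`, and `get_le32` are hypothetical names and do not reproduce the kernel's `__le32` annotations or conversion helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wire header: every field is stored little-endian. */
struct wire_header {
    uint8_t h_version[4];
    uint8_t h_nodeid[4];
};

/* Store a host-order value into a little-endian wire field. */
void put_le32(uint8_t *out, uint32_t v)
{
    assert(out != NULL);
    out[0] = (uint8_t)(v & 0xff);
    out[1] = (uint8_t)((v >> 8) & 0xff);
    out[2] = (uint8_t)((v >> 16) & 0xff);
    out[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Load a little-endian wire field into a host-order value. */
uint32_t get_le32(const uint8_t *in)
{
    assert(in != NULL);
    return (uint32_t)in[0] |
           ((uint32_t)in[1] << 8) |
           ((uint32_t)in[2] << 16) |
           ((uint32_t)in[3] << 24);
}
```

Because the field type is a byte array rather than a plain integer, a caller cannot accidentally use the wire value without going through a conversion, which is roughly the property the sparse annotations enforce in the kernel.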
# e10249b1	02-Nov-2021	Alexander Aring <aahringo@redhat.com>	fs: dlm: use dlm_recovery_stopped in condition This patch changes the code to evaluate dlm_recovery_stopped() in the condition of the if branch instead of fetching it before evaluating the condition. As this is an atomic test-set operation, it should be evaluated in the condition itself. Reported-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
# aee742c9	18-Aug-2021	Alexander Aring <aahringo@redhat.com>	fs: dlm: fix return -EINTR on recovery stopped This patch will return -EINTR instead of 1 if recovery is stopped. In case of ping_members() the return value will be checked if the error is -EINTR for signaling another recovery was triggered and the whole recovery process will come to a clean end to process the next one. Returning 1 will abort the recovery process and can leave the recovery in a broken state. It was reported with the following kernel log message attached and a gfs2 mount stopped working: "dlm: bobvirt1: dlm_recover_members error 1" whereas 1 was returned because of a conversion of "dlm_recovery_stopped()" to an errno was missing which this patch will introduce. While on it all other possible missing errno conversions at other places were added as they are done as in other places. It might be worth to check the error case at this recovery level, because some of the functionality also returns -ENOBUFS and check why recovery ends in a broken state. However this will fix the issue if another recovery was triggered at some points of recovery handling. Reported-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
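The bug described above comes from returning a boolean predicate where an errno is expected. A minimal sketch of the corrected pattern follows; the struct, the `recovery_stopped()` predicate, and the `classify()` helper with its enum are hypothetical and only model the behavior the commit message describes.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lockspace state consulted during recovery. */
struct lockspace {
    bool recover_stop;   /* set when a newer recovery was requested */
};

/* Boolean predicate: true means "stop"; it is not an error code. */
bool recovery_stopped(const struct lockspace *ls)
{
    assert(ls != NULL);
    return ls->recover_stop;
}

/* Returns 0 to continue or -EINTR if this recovery was superseded. */
int check_recovery(const struct lockspace *ls)
{
    if (recovery_stopped(ls))
        return -EINTR;   /* returning the predicate itself would yield 1 */
    return 0;
}

enum outcome { RECOVERY_DONE, RECOVERY_RESTART, RECOVERY_FAILED };

/* Caller side: only -EINTR means "end cleanly and run the next recovery". */
enum outcome classify(int rv)
{
    if (rv == 0)
        return RECOVERY_DONE;
    if (rv == -EINTR)
        return RECOVERY_RESTART;
    return RECOVERY_FAILED;
}
```

A stray return value of 1 falls into the failure branch, which matches the "dlm_recover_members error 1" symptom quoted in the commit message.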
# d10a0b88	02-Jun-2021	Alexander Aring <aahringo@redhat.com>	fs: dlm: rename socket and app buffer defines This patch renames DEFAULT_BUFFER_SIZE to DLM_MAX_SOCKET_BUFSIZE and LOWCOMMS_MAX_TX_BUFFER_LEN to DLM_MAX_APP_BUFSIZE as they are proper names to define what's behind those values. The DLM_MAX_SOCKET_BUFSIZE defines the maximum size of buffer which can be handled on socket layer, the DLM_MAX_APP_BUFSIZE defines the maximum size of buffer which can be handled by the DLM application layer. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
# a070a91c	21-May-2021	Alexander Aring <aahringo@redhat.com>	fs: dlm: add more midcomms hooks This patch prepares hooks to redirect to the midcomms layer which will be used by the midcomms re-transmit handling. There exists the new concept of stateless buffer allocation and commits. This can be used to bypass the midcomms re-transmit handling. It is used by RCOM_STATUS and RCOM_NAMES messages, because they have their own ping-like re-transmit handling. These two messages will also be used to determine the DLM version per node, because by observation they are the first messages which are exchanged. Cluster manager events for node membership are added to support half-closed connections in cases where the peer connection reaches end of file but DLM still holds membership of the node. During this time DLM can still trigger new messages, which we should allow. After the cluster manager node removal event occurs it is safe to close the connection. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
# c937aabb	21-May-2021	Alexander Aring <aahringo@redhat.com>	fs: dlm: always run complete for possible waiters This patch changes the ping_members() result that we always run complete() for possible waiters. We handle the -EINTR error code as successful. This error code is returned if the recovery is stopped which is likely that a new recovery is triggered with a new members configuration and ping_members() runs again. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
# 9f8f9c77	02-Nov-2020	Alexander Aring <aahringo@redhat.com>	fs: dlm: define max send buffer This patch will set the maximum transmit buffer size for rcom messages with "names" to 4096 bytes. It's a leftover change of commit 4798cbbfbd00 ("fs: dlm: rework receive handling"). Fact is that we cannot allocate a contiguous transmit buffer length above of 4096 bytes. It seems at some places the upper layer protocol will calculate according to dlm_config.ci_buffer_size the possible payload of a dlm recovery message. As compiler setting we will use now the maximum possible message which dlm can send out. Commit 4e192ee68e5af ("fs: dlm: disallow buffer size below default") disallow a buffer setting smaller than the 4096 bytes and above 4096 bytes is definitely wrong because we will then write out of buffer space as we cannot allocate a contiguous buffer above 4096 bytes. The ci_buffer_size is still there to define the possible maximum receive buffer size of a recvmsg() which should be at least the maximum possible dlm message size. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
# 2522fe45	28-May-2019	Thomas Gleixner <tglx@linutronix.de>	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 193 Based on 1 normalized pattern(s): this copyrighted material is made available to anyone wishing to use modify copy or redistribute it subject to the terms and conditions of the gnu general public license v 2 extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 45 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Richard Fontana <rfontana@redhat.com> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Steve Winslow <swinslow@gmail.com> Reviewed-by: Alexios Zavras <alexios.zavras@intel.com> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190528170027.342746075@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
# 58a923ad	13-Nov-2018	Denis V. Lunev <den@openvz.org>	dlm: fix possible call to kfree() for non-initialized pointer Technically dlm_config_nodes() could return an error and leave nodes uninitialized. After that, on the fail path, we'll call kfree() for that uninitialized value. The patch is simple - we should just initialize nodes with NULL. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David Teigland <teigland@redhat.com>
# d968b4e2	02-Nov-2018	Tycho Andersen <tycho@tycho.pizza>	dlm: fix invalid free dlm_config_nodes() does not allocate nodes on failure, so we should not free() nodes when it fails. Signed-off-by: Tycho Andersen <tycho@tycho.ws> Signed-off-by: David Teigland <teigland@redhat.com>
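The two fixes above (dated 02-Nov-2018 and 13-Nov-2018) concern a shared failure path that frees a pointer the callee may never have set. A userspace sketch of the safe pattern, using `free()` in place of `kfree()`; `load_nodes()` and `start()` are hypothetical stand-ins, not the kernel functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct node {
    int nodeid;
};

/* Hypothetical loader: on failure it leaves *out and *count untouched. */
int load_nodes(int want, struct node **out, int *count)
{
    struct node *n;
    int i;

    assert(out != NULL && count != NULL);
    if (want <= 0)
        return -EINVAL;          /* *out is NOT written on this path */

    n = calloc((size_t)want, sizeof(*n));
    if (!n)
        return -ENOMEM;
    for (i = 0; i < want; i++)
        n[i].nodeid = i + 1;

    *out = n;
    *count = want;
    return 0;
}

int start(int want)
{
    struct node *nodes = NULL;   /* NULL keeps the shared free() below safe */
    int count = 0;
    int rv;

    rv = load_nodes(want, &nodes, &count);
    if (rv < 0)
        goto out;

    if (count > 4) {             /* hypothetical later failure */
        rv = -ERANGE;
        goto out;
    }
    rv = 0;
out:
    free(nodes);                 /* free(NULL) is a no-op */
    return rv;
}
```

Initializing the pointer lets every exit share one cleanup label, whether or not the loader ever assigned it.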
# 2ab93ae1	06-May-2017	Markus Elfring <elfring@users.sourceforge.net>	dlm: Delete an unnecessary variable initialisation in dlm_ls_start() The local variable "rv" is reassigned by a statement at the beginning. Thus omit the explicit initialisation. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: David Teigland <teigland@redhat.com>
# d12ad1a9	06-May-2017	Markus Elfring <elfring@users.sourceforge.net>	dlm: Improve a size determination in two functions Replace the specification of two data structures by pointer dereferences as the parameter for the operator "sizeof" to make the corresponding size determination a bit safer according to the Linux coding style convention. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: David Teigland <teigland@redhat.com>
# 2f48e061	06-May-2017	Markus Elfring <elfring@users.sourceforge.net>	dlm: Use kcalloc() in two functions * Multiplications for the size determination of memory allocations indicated that array data structures should be processed. Thus reuse the corresponding function "kcalloc". This issue was detected by using the Coccinelle software. * Replace the specification of data structures by pointer dereferences to make the corresponding size determinations a bit safer according to the Linux coding style convention. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: David Teigland <teigland@redhat.com>
# 790854be	06-May-2017	Markus Elfring <elfring@users.sourceforge.net>	dlm: Use kmalloc_array() in make_member_array() * A multiplication for the size determination of a memory allocation indicated that an array data structure should be processed. Thus use the corresponding function "kmalloc_array". This issue was detected by using the Coccinelle software. * Replace the specification of a data type by a pointer dereference to make the corresponding size determination a bit safer according to the Linux coding style convention. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: David Teigland <teigland@redhat.com>
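The three cleanups above share two ideas: let the allocator check the count-times-size multiplication for overflow, and take the element size from the pointer itself. A userspace sketch under those assumptions; `alloc_array()` is a hypothetical analogue written for this example and is not the kernel's `kmalloc_array()` or `kcalloc()`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical overflow-checked array allocation (size must be non-zero). */
void *alloc_array(size_t n, size_t size)
{
    assert(size != 0);
    if (n > SIZE_MAX / size)
        return NULL;             /* n * size would overflow size_t */
    return malloc(n * size);
}

/* Allocate and zero an array of ids, one per member. */
int *make_id_array(size_t total)
{
    int *array;
    size_t i;

    /* sizeof(*array) follows the pointer's type if that type ever changes. */
    array = alloc_array(total, sizeof(*array));
    if (!array)
        return NULL;

    for (i = 0; i < total; i++)
        array[i] = 0;
    return array;
}
```

Writing `sizeof(*array)` instead of `sizeof(int)` means a later change to the declaration cannot silently leave the allocation size stale.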
# 075f0177	14-Feb-2014	David Teigland <teigland@redhat.com>	dlm: use INFO for recovery messages The log messages relating to the progress of recovery are minimal and very often useful. Change these to the KERN_INFO level so they are always available. Signed-off-by: David Teigland <teigland@redhat.com>
# 475f230c	02-Aug-2012	David Teigland <teigland@redhat.com>	dlm: fix unlock balance warnings The in_recovery rw_semaphore has always been acquired and released by different threads by design. To work around the "BUG: bad unlock balance detected!" messages, adjust things so the dlm_recoverd thread always does both down_write and up_write. Signed-off-by: David Teigland <teigland@redhat.com>
# 60f98d18	02-Nov-2011	David Teigland <teigland@redhat.com>	dlm: add recovery callbacks These new callbacks notify the dlm user about lock recovery. GFS2, and possibly others, need to be aware of when the dlm will be doing lock recovery for a failed lockspace member. In the past, this coordination has been done between dlm and file system daemons in userspace, which then direct their kernel counterparts. These callbacks allow the same coordination directly, and more simply. Signed-off-by: David Teigland <teigland@redhat.com>
# 757a4271	20-Oct-2011	David Teigland <teigland@redhat.com>	dlm: add node slots and generation Slot numbers are assigned to nodes when they join the lockspace. The slot number chosen is the minimum unused value starting at 1. Once a node is assigned a slot, that slot number will not change while the node remains a lockspace member. If the node leaves and rejoins it can be assigned a new slot number. A new generation number is also added to a lockspace. It is set and incremented during each recovery along with the slot collection/assignment. The slot numbers will be passed to gfs2 which will use them as journal id's. Signed-off-by: David Teigland <teigland@redhat.com>
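The slot rule stated above (pick the minimum unused value starting at 1, and never change an existing assignment) can be sketched as follows. The array layout, the `MAX_SLOTS` bound, and both function names are assumptions made for this illustration, not the kernel's data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SLOTS 64   /* hypothetical upper bound for this sketch */

/* slots[i] is the slot of member i, or 0 if none is assigned yet. */
int lowest_free_slot(const int *slots, size_t count)
{
    bool used[MAX_SLOTS + 1] = { false };
    size_t i;
    int s;

    assert(slots != NULL);
    for (i = 0; i < count; i++) {
        if (slots[i] >= 1 && slots[i] <= MAX_SLOTS)
            used[slots[i]] = true;
    }
    for (s = 1; s <= MAX_SLOTS; s++) {
        if (!used[s])
            return s;
    }
    return -1;   /* every slot is taken */
}

/* Give each slotless member the lowest free slot; keep existing ones. */
int assign_slots(int *slots, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++) {
        int s;

        if (slots[i] != 0)
            continue;            /* an assigned slot never changes */
        s = lowest_free_slot(slots, count);
        if (s < 0)
            return -1;
        slots[i] = s;
    }
    return 0;
}
```

Keeping existing assignments stable matters here because, per the commit message, gfs2 uses the slot numbers as journal ids.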
# f95a34c6	13-Oct-2011	David Teigland <teigland@redhat.com>	dlm: move recovery barrier calls Put all the calls to recovery barriers in the same function to clarify where they each happen. Should not change any behavior. Also modify some recovery debug lines to make them consistent. Signed-off-by: David Teigland <teigland@redhat.com>
# c41b20e7	11-Dec-2009	Adam Buchbinder <adam.buchbinder@gmail.com>	Fix misspellings of "truly" in comments. Some comments misspell "truly"; this fixes them. No code changes. Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
# 573c24c4	30-Nov-2009	David Teigland <teigland@redhat.com>	dlm: always use GFP_NOFS Replace all GFP_KERNEL and ls_allocation with GFP_NOFS. ls_allocation would be GFP_KERNEL for userland lockspaces and GFP_NOFS for file system lockspaces. It was discovered that any lockspaces on the system can affect all others by triggering memory reclaim in the file system which could in turn call back into the dlm to acquire locks, deadlocking dlm threads that were shared by all lockspaces, like dlm_recv. Signed-off-by: David Teigland <teigland@redhat.com>
# 748285cc	15-May-2009	David Teigland <teigland@redhat.com>	dlm: use more NOFS allocation Change some GFP_KERNEL allocations to use either GFP_NOFS or ls_allocation (when available) which the fs sets to GFP_NOFS. The point is to prevent allocations from going back into the cluster fs in places where that might lead to deadlock. Signed-off-by: David Teigland <teigland@redhat.com>
# 391fbdc5	07-May-2009	Christine Caulfield <ccaulfie@redhat.com>	dlm: connect to nodes earlier Make network connections to other nodes earlier, in the context of dlm_recoverd. This avoids connecting to nodes from dlm_send where we try to avoid allocations which could possibly deadlock if memory reclaim goes into the cluster fs which may try to do a dlm operation. Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
# d44e0fc7	18-Mar-2008	David Teigland <teigland@redhat.com>	dlm: recover nodes that are removed and re-added If a node is removed from a lockspace, and then added back before the dlm is notified of the removal, the dlm will not detect the removal and won't clear the old state from the node. This is fixed by using a list of added nodes so the membership recovery can detect when a newly added node is already in the member list. Signed-off-by: David Teigland <teigland@redhat.com>
# 46b43eed	08-Jan-2008	David Teigland <teigland@redhat.com>	dlm: reject messages from non-members Messages from nodes that are no longer members of the lockspace should be ignored. When nodes are removed from the lockspace, recovery can sometimes complete quickly enough that messages arrive from a removed node after recovery has completed. When processed, these messages would often cause an error message, and could in some cases change some state, causing problems. Signed-off-by: David Teigland <teigland@redhat.com>
# c36258b5	27-Sep-2007	David Teigland <teigland@redhat.com>	[DLM] block dlm_recv in recovery transition Introduce a per-lockspace rwsem that's held in read mode by dlm_recv threads while working in the dlm. This allows dlm_recv activity to be suspended when the lockspace transitions to, from and between recovery cycles. The specific bug prompting this change is one where an in-progress recovery cycle is aborted by a new recovery cycle. While dlm_recv was processing a recovery message, the recovery cycle was aborted and dlm_recoverd began cleaning up. dlm_recv decremented recover_locks_count on an rsb after dlm_recoverd had reset it to zero. This is fixed by suspending dlm_recv (taking write lock on the rwsem) before aborting the current recovery. The transitions to/from normal and recovery modes are simplified by using this new ability to block dlm_recv. The switch from normal to recovery mode means dlm_recv goes from processing locking messages, to saving them for later, and vice versa. Races are avoided by blocking dlm_recv when setting the flag that switches between modes. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
# 1a2bf2ee	18-Jul-2007	Jesper Juhl <jesper.juhl@gmail.com>	[DLM] Fix memory leak in dlm_add_member() when dlm_node_weight() returns less than zero There's a memory leak in fs/dlm/member.c::dlm_add_member(). If "dlm_node_weight(ls->ls_name, nodeid)" returns < 0, then we'll return without freeing the memory allocated to the (at that point yet unused) 'memb'. This patch frees the allocated memory in that case and thus avoids the leak. Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
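The leak described above is an early return that skips freeing a freshly allocated object. A userspace sketch of the corrected shape follows; `node_weight()` and `add_member()` here are simplified, hypothetical versions for illustration and do not mirror the kernel functions' real signatures.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct member {
    int nodeid;
    int weight;
};

/* Hypothetical weight lookup; a negative return is an error. */
int node_weight(int nodeid)
{
    return nodeid > 0 ? 1 : -ENOENT;
}

/* On success stores a new member in *out; on failure *out is untouched. */
int add_member(int nodeid, struct member **out)
{
    struct member *memb;
    int w;

    assert(out != NULL);
    memb = calloc(1, sizeof(*memb));
    if (!memb)
        return -ENOMEM;

    w = node_weight(nodeid);
    if (w < 0) {
        free(memb);              /* without this, memb leaks on error */
        return w;
    }

    memb->nodeid = nodeid;
    memb->weight = w;
    *out = memb;
    return 0;
}
```

An alternative with the same effect is to look up the weight before allocating, so the error path has nothing to release.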
# 8b0e7b2c	18-May-2007	David Teigland <teigland@redhat.com>	[DLM] wait for config check during join [6/6] Joining the lockspace should wait for the initial round of inter-node config checks to complete before returning. This way, if there's a configuration mismatch between the joining node and the existing nodes, the join can fail and return an error to the application. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
# 3ae1acf9	18-May-2007	David Teigland <teigland@redhat.com>	[DLM] add lock timeouts and warnings [2/6] New features: lock timeouts and time warnings. If the DLM_LKF_TIMEOUT flag is set, then the request/conversion will be canceled after waiting the specified number of centiseconds (specified per lock). This feature is only available for locks requested through libdlm (can be enabled for kernel dlm users if there's a use for it.) If the new DLM_LSFL_TIMEWARN flag is set when creating the lockspace, then a warning message will be sent to userspace (using genetlink) after a request/conversion has been waiting for a given number of centiseconds (configurable per node). The time warnings will be used in the future to do deadlock detection in userspace. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
# 91c0dc93	31-Oct-2006	David Teigland <teigland@redhat.com>	[DLM] fix aborted recovery during node removal Red Hat BZ 211914 With the new cluster infrastructure, dlm recovery for a node removal can be aborted and restarted for a node addition. When this happens, the restarted recovery isn't aware that it's doing recovery for the earlier removal as well as the addition. So, it then skips the recovery steps only required when nodes are removed. This can result in locks not being purged for failed/removed nodes. The fix is to check for removed nodes for which recovery has not been completed at the start of a new recovery sequence. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
# faa0f267	08-Aug-2006	David Teigland <teigland@redhat.com>	[DLM] show nodeid for recovery message To aid debugging, it's useful to be able to see what nodeid the dlm is waiting on for a message reply. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>