History log of /linux-master/kernel/futex/pi.c
Revision Date Author Comments
# e626cb02 17-Jan-2024 Sebastian Andrzej Siewior <bigeasy@linutronix.de>

futex: Prevent the reuse of stale pi_state

Jiri Slaby reported a futex state inconsistency resulting in -EINVAL during
a lock operation for a PI futex. It requires that the a lock process is
interrupted by a timeout or signal:

T1 Owns the futex in user space.

T2 Tries to acquire the futex in kernel (futex_lock_pi()). Allocates a
pi_state and attaches itself to it.

T2 Times out and removes its rt_waiter from the rt_mutex. Drops the
rtmutex lock and tries to acquire the hash bucket lock to remove
the futex_q. The lock is contended and T2 schedules out.

T1 Unlocks the futex (futex_unlock_pi()). Finds a futex_q but no
rt_waiter. Unlocks the futex (do_uncontended) and makes it available
to user space.

T3 Acquires the futex in user space.

T4 Tries to acquire the futex in kernel (futex_lock_pi()). Finds the
existing futex_q of T2 and tries to attach itself to the existing
pi_state. This (attach_to_pi_state()) fails with -EINVAL because uval
contains the TID of T3 but pi_state points to T1.

It's incorrect to unlock the futex and make it available for user space to
acquire as long as there is still an existing state attached to it in the
kernel.

T1 cannot hand over the futex to T2 because T2 already gave up and started
to clean up and is blocked on the hash bucket lock, so T2's futex_q with
the pi_state pointing to T1 is still queued.

T2 observes the futex_q, but ignores it as there is no waiter on the
corresponding rt_mutex and takes the uncontended path which allows the
subsequent caller of futex_lock_pi() (T4) to observe that stale state.

To prevent this the unlock path must dequeue all futex_q entries which
point to the same pi_state when there is no waiter on the rt mutex. This
requires obviously to make the dequeue conditional in the locking path to
prevent a double dequeue. With that it's guaranteed that user space cannot
observe an uncontended futex which has kernel state attached.

Fixes: fbeb558b0dd0d ("futex/pi: Fix recursive rt_mutex waiter state")
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jiri Slaby <jirislaby@kernel.org>
Link: https://lore.kernel.org/r/20240118115451.0TkD_ZhB@linutronix.de
Closes: https://lore.kernel.org/all/4611bcf2-44d0-4c34-9b84-17406f881003@kernel.org


# 3b63a55f 20-Sep-2023 peterz@infradead.org <peterz@infradead.org>

futex: Propagate flags into get_futex_key()

Instead of only passing FLAGS_SHARED as a boolean, pass down flags as
a whole.

No functional change intended.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230921105248.282857501@noisy.programming.kicks-ass.net


# fbeb558b 15-Sep-2023 Peter Zijlstra <peterz@infradead.org>

futex/pi: Fix recursive rt_mutex waiter state

Some new assertions pointed out that the existing code has nested rt_mutex wait
state in the futex code.

Specifically, the futex_lock_pi() cancel case uses spin_lock() while there
still is a rt_waiter enqueued for this task, resulting in a state where there
are two waiters for the same task (and task_struct::pi_blocked_on gets
scrambled).

The reason to take hb->lock at this point is to avoid the wake_futex_pi()
EAGAIN case.

This happens when futex_top_waiter() and rt_mutex_top_waiter() state becomes
inconsistent. The current rules are such that this inconsistency will not be
observed.

Notably the case that needs to be avoided is where futex_lock_pi() and
futex_unlock_pi() interleave such that unlock will fail to observe a new
waiter.

*However* the case at hand is where a waiter is leaving, in this case the race
means a waiter that is going away is not observed -- which is harmless,
provided this race is explicitly handled.

This is a somewhat dangerous proposition because the converse race is not
observing a new waiter, which must absolutely not happen. But since the race is
valid this cannot be asserted.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lkml.kernel.org/r/20230915151943.GD6743@noisy.programming.kicks-ass.net


# d14f9e93 08-Sep-2023 Sebastian Andrzej Siewior <bigeasy@linutronix.de>

locking/rtmutex: Use rt_mutex specific scheduler helpers

Have rt_mutex use the rt_mutex specific scheduler helpers to avoid
recursion vs rtlock on the PI state.

[[ peterz: adapted to new names ]]

Reported-by: Crystal Wood <swood@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20230908162254.999499-6-bigeasy@linutronix.de


# 68290613 11-May-2022 Sebastian Andrzej Siewior <bigeasy@linutronix.de>

futex: Remove a PREEMPT_RT_FULL reference.

Earlier the PREEMPT_RT patch had a PREEMPT_RT_FULL and PREEMPT_RT_BASE
Kconfig option. The latter was a subset of the functionality that was
enabled with PREEMPT_RT_FULL and was mainly useful for debugging.

During the merging efforts the two Kconfig options were abandoned in the
v5.4.3-rt1 release and since then there is only PREEMPT_RT which enables
the full features set (as PREEMPT_RT_FULL did in earlier releases).

Replace the PREEMPT_RT_FULL reference with PREEMPT_RT.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
Link: https://lore.kernel.org/r/YnvWUvq1vpqCfCU7@linutronix.de


# 85dc28fa 23-Sep-2021 Peter Zijlstra <peterz@infradead.org>

futex: Split out PI futex

Move the PI futex implementation into it's own file.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: André Almeida <andrealmeid@collabora.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: André Almeida <andrealmeid@collabora.com>
Link: https://lore.kernel.org/r/20210923171111.300673-10-andrealmeid@collabora.com