History log of /linux-master/arch/powerpc/lib/qspinlock.c
Revision Date Author Comments
# b629b541 16-Oct-2023 Nicholas Piggin <npiggin@gmail.com>

powerpc/qspinlock: Rename yield_propagate_owner tunable

Rename yield_propagate_owner to yield_sleepy_owner, which better
describes what it does (what, not how).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Shrikanth Hegde <sshegde@linux.vnet.ibm.com>
Reviewed-by: "Nysal Jan K.A" <nysal@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231016124305.139923-7-npiggin@gmail.com


# 1e6d5f72 16-Oct-2023 Nicholas Piggin <npiggin@gmail.com>

powerpc/qspinlock: Propagate sleepy if previous waiter is preempted

The sleepy (aka lock-owner-is-preempted) condition is propagated down
the queue by each waiter. If a waiter becomes preempted, it can no
longer propagate sleepy. To allow subsequent waiters to yield to the
lock owner, also check the lock owner in this case.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Shrikanth Hegde <sshegde@linux.vnet.ibm.com>
Reviewed-by: "Nysal Jan K.A" <nysal@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231016124305.139923-6-npiggin@gmail.com


# fcf77d44 16-Oct-2023 Nicholas Piggin <npiggin@gmail.com>

powerpc/qspinlock: don't propagate the not-sleepy state

To simplify things, don't propagate the not-sleepy condition back down
the queue. Instead, have the waiters clear their own node->sleepy when
finding the lock owner is not preempted.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Shrikanth Hegde <sshegde@linux.vnet.ibm.com>
Reviewed-by: "Nysal Jan K.A" <nysal@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231016124305.139923-5-npiggin@gmail.com


# fd8fae50 16-Oct-2023 Nicholas Piggin <npiggin@gmail.com>

powerpc/qspinlock: propagate owner preemptedness rather than CPU number

Rather than propagating the CPU number of the preempted lock owner,
just propagate whether the owner was preempted. Waiters must read the
lock value when yielding to it to prevent races anyway, so might as
well always load the owner CPU from the lock.

To further simplify the code, also don't propagate the -1 (or
sleepy=false in the new scheme) down the queue. Instead, have the
waiters clear it themselves when finding the lock owner is not
preempted.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Shrikanth Hegde <sshegde@linux.vnet.ibm.com>
Reviewed-by: "Nysal Jan K.A" <nysal@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231016124305.139923-4-npiggin@gmail.com


# f6568647 16-Oct-2023 Nicholas Piggin <npiggin@gmail.com>

powerpc/qspinlock: stop queued waiters trying to set lock sleepy

If a queued waiter notices the lock owner or the previous waiter has
been preempted, it attempts to mark the lock sleepy, but it does this
as a try-set operation using the original lock value it got when
queueing, which will become stale as the queue progresses, and the
try-set will fail. Drop this and just set the sleepy seen clock.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Shrikanth Hegde <sshegde@linux.vnet.ibm.com>
Reviewed-by: "Nysal Jan K.A" <nysal@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231016124305.139923-3-npiggin@gmail.com


# f9bc9bbe 16-Oct-2023 Nicholas Piggin <npiggin@gmail.com>

powerpc/qspinlock: Fix stale propagated yield_cpu

yield_cpu is a sample of a preempted lock holder that gets propagated
back through the queue. Queued waiters use this to yield to the
preempted lock holder without continually sampling the lock word (which
would defeat the purpose of MCS queueing by bouncing the cache line).

The problem is that yield_cpu can become stale. It can take some time to
be passed down the chain, and if any queued waiter gets preempted then
it will cease to propagate the yield_cpu to later waiters.

This can result in yielding to a CPU that no longer holds the lock,
which is bad, but particularly if it is currently in H_CEDE (idle),
then it appears to be preempted and some hypervisors (PowerVM) can
cause very long H_CONFER latencies waiting for H_CEDE wakeup. This
results in latency spikes and hard lockups on oversubscribed
partitions with lock contention.

This is a minimal fix. Before yielding to yield_cpu, sample the lock
word to confirm yield_cpu is still the owner, and bail out of it is not.

Thanks to a bunch of people who reported this and tracked down the
exact problem using tracepoints and dispatch trace logs.

Fixes: 28db61e207ea ("powerpc/qspinlock: allow propagation of yield CPU down the queue")
Cc: stable@vger.kernel.org # v6.2+
Reported-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Reported-by: Laurent Dufour <ldufour@linux.ibm.com>
Reported-by: Shrikanth Hegde <sshegde@linux.vnet.ibm.com>
Debugged-by: "Nysal Jan K.A" <nysal@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Shrikanth Hegde <sshegde@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231016124305.139923-2-npiggin@gmail.com


# 6f313632 09-May-2023 Rohan McLure <rmclure@linux.ibm.com>

powerpc: qspinlock: Enforce qnode writes prior to publishing to queue

Annotate the release barrier and memory clobber (in effect, producing a
compiler barrier) in the publish_tail_cpu call. These barriers have the
effect of ensuring that qnode attributes are all written to prior to
publish the node to the waitqueue.

Even while the initial write to the 'locked' attribute is guaranteed to
terminate prior to the node being visible, KCSAN still complains that
the write is reorderable by the compiler. Issue a kcsan_release() to
inform KCSAN of the release barrier contained in publish_tail_cpu().

Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230510033117.1395895-3-rmclure@linux.ibm.com


# 03d44ee8 09-May-2023 Rohan McLure <rmclure@linux.ibm.com>

powerpc: qspinlock: Mark accesses to qnode lock checks

The powerpc implementation of qspinlocks will both poll and spin on the
bitlock guarding a qnode. Mark these accesses with READ_ONCE to convey
to KCSAN that polling is intentional here.

Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230510033117.1395895-2-rmclure@linux.ibm.com


# 13959373 08-Dec-2022 Nicholas Piggin <npiggin@gmail.com>

powerpc/qspinlock: Fix 32-bit build

Some 32-bit configurations don't pull in the spin_begin/end/relax
definitions. Fix is to restore a lost include.

Reported-by: kernel test robot <lkp@intel.com>
Fixes: 84990b169557 ("powerpc/qspinlock: add mcs queueing for contended waiters")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/oe-kbuild-all/202212050224.i7uh9fOh-lkp@intel.com
Link: https://lore.kernel.org/r/20221208123225.1566113-1-npiggin@gmail.com


# 0b219984 26-Nov-2022 Nicholas Piggin <npiggin@gmail.com>

powerpc/qspinlock: add compile-time tuning adjustments

This adds compile-time options that allow the EH lock hint bit to be
enabled or disabled, and adds some new options that may or may not
help matters. To help with experimentation and tuning.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221126095932.1234527-18-npiggin@gmail.com


# 12b459a5 26-Nov-2022 Nicholas Piggin <npiggin@gmail.com>

powerpc/qspinlock: provide accounting and options for sleepy locks

Finding the owner or a queued waiter on a lock with a preempted vcpu is
indicative of an oversubscribed guest causing the lock to get into
trouble. Provide some options to detect this situation and have new CPUs
avoid queueing for a longer time (more steal iterations) to minimise the
problems caused by vcpu preemption on the queue.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221126095932.1234527-17-npiggin@gmail.com


# 39dfc735 26-Nov-2022 Nicholas Piggin <npiggin@gmail.com>

powerpc/qspinlock: allow indefinite spinning on a preempted owner

Provide an option that holds off queueing indefinitely while the lock
owner is preempted. This could reduce queueing latencies for very
overcommitted vcpu situations.

This is disabled by default.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221126095932.1234527-16-npiggin@gmail.com


# cc797011 26-Nov-2022 Nicholas Piggin <npiggin@gmail.com>

powerpc/qspinlock: reduce remote node steal spins

Allow for a reduction in the number of times a CPU from a different
node than the owner can attempt to steal the lock before queueing.
This could bias the transfer behaviour of the lock across the
machine and reduce NUMA crossings.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221126095932.1234527-15-npiggin@gmail.com


# 71c23502 26-Nov-2022 Nicholas Piggin <npiggin@gmail.com>

powerpc/qspinlock: use spin_begin/end API

Use the spin_begin/spin_cpu_relax/spin_end APIs in qspinlock, which helps
to prevent threads issuing a lot of expensive priority nops which may not
have much effect due to immediately executing low then medium priority.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221126095932.1234527-14-npiggin@gmail.com


# f61ab43c 26-Nov-2022 Nicholas Piggin <npiggin@gmail.com>

powerpc/qspinlock: allow lock stealing in trylock and lock fastpath

This change allows trylock to steal the lock. It also allows the initial
lock attempt to steal the lock rather than bailing out and going to the
slow path.

This gives trylock more strength: without this a continually-contended
lock will never permit a trylock to succeed. With this change, the
trylock has a small but non-zero chance.

It also gives the lock fastpath most of the benefit of passing the
reservation back through to the steal loop in the slow path without the
complexity.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221126095932.1234527-13-npiggin@gmail.com


# be742c57 26-Nov-2022 Nicholas Piggin <npiggin@gmail.com>

powerpc/qspinlock: add ability to prod new queue head CPU

After the head of the queue acquires the lock, it releases the
next waiter in the queue to become the new head. Add an option
to prod the new head if its vCPU was preempted. This may only
have an effect if queue waiters are yielding.

Disable this option by default for now, i.e., no logical change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221126095932.1234527-12-npiggin@gmail.com


# 28db61e2 26-Nov-2022 Nicholas Piggin <npiggin@gmail.com>

powerpc/qspinlock: allow propagation of yield CPU down the queue

Having all CPUs poll the lock word for the owner CPU that should be
yielded to defeats most of the purpose of using MCS queueing for
scalability. Yet it may be desirable for queued waiters to yield to a
preempted owner.

With this change, queue waiters never sample the owner CPU directly from
the lock word. The queue head (which is spinning on the lock) propagates
the owner CPU back to the next waiter if it finds the owner has been
preempted. That waiter then propagates the owner CPU back to the next
waiter, and so on.

s390 addresses this problem differenty, by having queued waiters sample
the lock word to find the owner at a low frequency. That has the
advantage of being simpler, the advantage of propagation is that the
lock word never has to be accesed by queued waiters, and the transfer of
cache lines to transmit the owner data is only required when lock holder
vCPU preemption occurs.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221126095932.1234527-11-npiggin@gmail.com


# b4c3cdc1 26-Nov-2022 Nicholas Piggin <npiggin@gmail.com>

powerpc/qspinlock: allow stealing when head of queue yields

If the head of queue is preventing stealing but it finds the owner vCPU
is preempted, it will yield its cycles to the owner which could cause it
to become preempted. Add an option to re-allow stealers before yielding,
and disallow them again after returning from the yield.

Disable this option by default for now, i.e., no logical change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221126095932.1234527-10-npiggin@gmail.com


# bd48287b 26-Nov-2022 Nicholas Piggin <npiggin@gmail.com>

powerpc/qspinlock: implement option to yield to previous node

Queued waiters which are not at the head of the queue don't spin on
the lock word but their qnode lock word, waiting for the previous queued
CPU to release them. Add an option which allows these waiters to yield
to the previous CPU if its vCPU is preempted.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221126095932.1234527-9-npiggin@gmail.com


# 085f0331 26-Nov-2022 Nicholas Piggin <npiggin@gmail.com>

powerpc/qspinlock: paravirt yield to lock owner

Waiters spinning on the lock word should yield to the lock owner if the
vCPU is preempted. This improves performance when the hypervisor has
oversubscribed physical CPUs.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221126095932.1234527-8-npiggin@gmail.com


# e1a31e7f 26-Nov-2022 Nicholas Piggin <npiggin@gmail.com>

powerpc/qspinlock: store owner CPU in lock word

Store the owner CPU number in the lock word so it may be yielded to,
as powerpc's paravirtualised simple spinlocks do.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221126095932.1234527-7-npiggin@gmail.com


# 0944534e 26-Nov-2022 Nicholas Piggin <npiggin@gmail.com>

powerpc/qspinlock: theft prevention to control latency

Give the queue head the ability to stop stealers. After a number of
spins without successfully acquiring the lock, the queue head sets
this, which halts stealing and will assure it is the next owner.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221126095932.1234527-6-npiggin@gmail.com


# 6aa42f88 26-Nov-2022 Nicholas Piggin <npiggin@gmail.com>

powerpc/qspinlock: allow new waiters to steal the lock before queueing

Allow new waiters to "steal" the lock before queueing. That is, to
acquire it while other CPUs have queued.

This particularly helps paravirt performance when physical CPUs are
oversubscribed, by keeping the lock from becoming a strict FIFO and
vCPU preemption causing queue train wrecks.

The new __queued_spin_trylock_steal() function is put in qspinlock.h
to save having to move it, because it will be used there by a later
change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221126095932.1234527-5-npiggin@gmail.com


# b3a73b7d 26-Nov-2022 Nicholas Piggin <npiggin@gmail.com>

powerpc/qspinlock: convert atomic operations to assembly

This uses more optimal ll/sc style access patterns (rather than
cmpxchg), and also sets the EH=1 lock hint on those operations
which acquire ownership of the lock.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221126095932.1234527-4-npiggin@gmail.com


# 84990b16 26-Nov-2022 Nicholas Piggin <npiggin@gmail.com>

powerpc/qspinlock: add mcs queueing for contended waiters

This forms the basis of the qspinlock slow path.

Like generic qspinlocks and unlike the vanilla MCS algorithm, the lock
owner does not participate in the queue, only waiters. The first waiter
spins on the lock word, then when the lock is released it takes
ownership and unqueues the next waiter. This is how qspinlocks can be
implemented with the spinlock API -- lock owners don't need a node, only
waiters do.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221126095932.1234527-2-npiggin@gmail.com


# 9f61521c 27-Nov-2022 Nicholas Piggin <npiggin@gmail.com>

powerpc/qspinlock: powerpc qspinlock implementation

Add a powerpc specific implementation of queued spinlocks. This is the
build framework with a very simple (non-queued) spinlock implementation
to begin with. Later changes add queueing, and other features and
optimisations one-at-a-time. It is done this way to more easily see how
the queued spinlocks are built, and to make performance and correctness
bisects more useful.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Drop paravirt.h & processor.h changes to fix 32-bit build]
[mpe: Fix 32-bit build of qspinlock.o & disallow GENERIC_LOCKBREAK per Nick]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/CONLLQB6DCJU.2ZPOS7T6S5GRR@bobo