History log of /linux-master/arch/arm64/include/asm/spinlock.h
Revision Date Author Comments
# d8d0da4e 10-Feb-2021 Waiman Long <longman@redhat.com>

locking/arch: Move qrwlock.h include after qspinlock.h

include/asm-generic/qrwlock.h was trying to get arch_spin_is_locked via
asm-generic/qspinlock.h. However, this does not work because architectures
might be using queued rwlocks but not queued spinlocks (csky), or because they
might be defining their own queued_* macros before including asm/qspinlock.h.

To fix this, ensure that asm/spinlock.h always includes qrwlock.h after
defining arch_spin_is_locked (either directly for csky, or via
asm/qspinlock.h for other architectures). The only inclusion elsewhere
is in kernel/locking/qrwlock.c. That one is really unnecessary because
the file is only compiled in SMP configurations (config QUEUED_RWLOCKS
depends on SMP) and in that case linux/spinlock.h already includes
asm/qrwlock.h if needed, via asm/spinlock.h.

Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Waiman Long <longman@redhat.com>
Fixes: 26128cb6c7e6 ("locking/rwlocks: Add contention detection for rwlocks")
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Ben Gardon <bgardon@google.com>
[Add arch/sparc and kernel/locking parts per discussion with Waiman. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 345d52c1 23-Jan-2020 Qian Cai <cai@lca.pw>

arm64/spinlock: fix a -Wunused-function warning

The commit f5bfdc8e3947 ("locking/osq: Use optimized spinning loop for
arm64") introduced a warning from Clang because vcpu_is_preempted() is
compiled away,

kernel/locking/osq_lock.c:25:19: warning: unused function 'node_cpu'
[-Wunused-function]
static inline int node_cpu(struct optimistic_spin_node *node)
^
1 warning generated.

Fix it by converting vcpu_is_preempted() to a static inline function.

Fixes: f5bfdc8e3947 ("locking/osq: Use optimized spinning loop for arm64")
Acked-by: Waiman Long <longman@redhat.com>
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Will Deacon <will@kernel.org>


# f5bfdc8e 13-Jan-2020 Waiman Long <longman@redhat.com>

locking/osq: Use optimized spinning loop for arm64

Arm64 has a more optimized spinning loop (atomic_cond_read_acquire)
using wfe for spinlock that can boost performance of sibling threads
by putting the current cpu to a wait state that is broken only when
the monitored variable changes or an external event happens.

OSQ has a more complicated spinning loop. Besides the lock value, it
also checks for need_resched() and vcpu_is_preempted(). The check for
need_resched() is not a problem as it is only set by the tick interrupt
handler. That will be detected by the spinning cpu right after iret.

The vcpu_is_preempted() check, however, is a problem as changes to the
preempt state of of previous node will not affect the wait state. For
ARM64, vcpu_is_preempted is not currently defined and so is a no-op.
Will has indicated that he is planning to para-virtualize wfe instead
of defining vcpu_is_preempted for PV support. So just add a comment in
arch/arm64/include/asm/spinlock.h to indicate that vcpu_is_preempted()
should not be defined as suggested.

On a 2-socket 56-core 224-thread ARM64 system, a kernel mutex locking
microbenchmark was run for 10s with and without the patch. The
performance numbers before patch were:

Running locktest with mutex [runtime = 10s, load = 1]
Threads = 224, Min/Mean/Max = 316/123,143/2,121,269
Threads = 224, Total Rate = 2,757 kop/s; Percpu Rate = 12 kop/s

After patch, the numbers were:

Running locktest with mutex [runtime = 10s, load = 1]
Threads = 224, Min/Mean/Max = 334/147,836/1,304,787
Threads = 224, Total Rate = 3,311 kop/s; Percpu Rate = 15 kop/s

So there was about 20% performance improvement.

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lkml.kernel.org/r/20200113150735.21956-1-longman@redhat.com


# caab277b 02-Jun-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 234

Based on 1 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation this program is
distributed in the hope that it will be useful but without any
warranty without even the implied warranty of merchantability or
fitness for a particular purpose see the gnu general public license
for more details you should have received a copy of the gnu general
public license along with this program if not see http www gnu org
licenses

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-only

has been chosen to replace the boilerplate/reference in 503 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Enrico Weigelt <info@metux.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190602204653.811534538@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# c1109047 13-Mar-2018 Will Deacon <will@kernel.org>

arm64: locking: Replace ticket lock implementation with qspinlock

It's fair to say that our ticket lock has served us well over time, but
it's time to bite the bullet and start using the generic qspinlock code
so we can make use of explicit MCS queuing and potentially better PV
performance in future.

Signed-off-by: Will Deacon <will.deacon@arm.com>


# c6f5d02b 14-May-2018 Andrea Parri <andrea.parri@amarulasolutions.com>

locking/spinlocks/arm64: Remove smp_mb() from arch_spin_is_locked()

The following commit:

38b850a73034f ("arm64: spinlock: order spin_{is_locked,unlock_wait} against local locks")

... added an smp_mb() to arch_spin_is_locked(), in order
"to ensure that the lock value is always loaded after any other locks have
been taken by the current CPU", and reported one example (the "insane case"
in ipc/sem.c) relying on such guarantee.

It is however understood that spin_is_locked() is not required to provide
such an ordering guarantee (a guarantee that is currently not provided by
all the implementations/archs), and that callers relying on such ordering
should instead insert suitable memory barriers before acting on the result
of spin_is_locked().

Following a recent auditing [1] of the callers of {,raw_}spin_is_locked(),
revealing that none of them are relying on the ordering guarantee anymore,
this commit removes the leading smp_mb() from the primitive thus reverting
38b850a73034f.

[1] https://marc.info/?l=linux-kernel&m=151981440005264&w=2
https://marc.info/?l=linux-kernel&m=152042843808540&w=2
https://marc.info/?l=linux-kernel&m=152043346110262&w=2

Signed-off-by: Andrea Parri <andrea.parri@amarulasolutions.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akiyks@gmail.com
Cc: boqun.feng@gmail.com
Cc: dhowells@redhat.com
Cc: j.alglave@ucl.ac.uk
Cc: linux-arch@vger.kernel.org
Cc: luc.maranget@inria.fr
Cc: npiggin@gmail.com
Cc: parri.andrea@gmail.com
Cc: stern@rowland.harvard.edu
Link: http://lkml.kernel.org/r/1526338889-7003-2-git-send-email-paulmck@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 202fb4ef 30-Jan-2018 Will Deacon <will@kernel.org>

arm64: spinlock: Fix theoretical trylock() A-B-A with LSE atomics

If the spinlock "next" ticket wraps around between the initial LDR
and the cmpxchg in the LSE version of spin_trylock, then we can erroneously
think that we have successfuly acquired the lock because we only check
whether the next ticket return by the cmpxchg is equal to the owner ticket
in our updated lock word.

This patch fixes the issue by performing a full 32-bit check of the lock
word when trying to determine whether or not the CASA instruction updated
memory.

Reported-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 087133ac 12-Oct-2017 Will Deacon <will@kernel.org>

locking/qrwlock, arm64: Move rwlock implementation over to qrwlocks

Now that the qrwlock can make use of WFE, remove our homebrewed rwlock
code in favour of the generic queued implementation.

Tested-by: Waiman Long <longman@redhat.com>
Tested-by: Jeremy Linton <jeremy.linton@arm.com>
Tested-by: Adam Wallis <awallis@codeaurora.org>
Tested-by: Jan Glauber <jglauber@cavium.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Jeremy.Linton@arm.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: boqun.feng@gmail.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: paulmck@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/1507810851-306-5-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# a4c1887d 03-Oct-2017 Will Deacon <will@kernel.org>

locking/arch: Remove dummy arch_{read,spin,write}_lock_flags() implementations

The arch_{read,spin,write}_lock_flags() macros are simply mapped to the
non-flags versions by the majority of architectures, so do this in core
code and remove the dummy implementations. Also remove the implementation
in spinlock_up.h, since all callers of do_raw_spin_lock_flags() call
local_irq_save(flags) anyway.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: paulmck@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/1507055129-12300-4-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 0160fb17 03-Oct-2017 Will Deacon <will@kernel.org>

locking/arch: Remove dummy arch_{read,spin,write}_relax() implementations

arch_{read,spin,write}_relax() are defined as cpu_relax() by the core
code, so architectures that can't do better (i.e. most of them) don't
need to bother with the dummy definitions.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: paulmck@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/1507055129-12300-3-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 952111d7 29-Jun-2017 Paul E. McKenney <paulmck@kernel.org>

arch: Remove spin_unlock_wait() arch-specific definitions

There is no agreed-upon definition of spin_unlock_wait()'s semantics,
and it appears that all callers could do just as well with a lock/unlock
pair. This commit therefore removes the underlying arch-specific
arch_spin_unlock_wait() for all architectures providing them.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: <linux-arch@vger.kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Andrea Parri <parri.andrea@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Boqun Feng <boqun.feng@gmail.com>


# a9668cd6 07-Jun-2017 Peter Zijlstra <peterz@infradead.org>

locking: Remove smp_mb__before_spinlock()

Now that there are no users of smp_mb__before_spinlock() left, remove
it entirely.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# d89e588c 05-Sep-2016 Peter Zijlstra <peterz@infradead.org>

locking: Introduce smp_mb__after_spinlock()

Since its inception, our understanding of ACQUIRE, esp. as applied to
spinlocks, has changed somewhat. Also, I wonder if, with a simple
change, we cannot make it provide more.

The problem with the comment is that the STORE done by spin_lock isn't
itself ordered by the ACQUIRE, and therefore a later LOAD can pass over
it and cross with any prior STORE, rendering the default WMB
insufficient (pointed out by Alan).

Now, this is only really a problem on PowerPC and ARM64, both of
which already defined smp_mb__before_spinlock() as a smp_mb().

At the same time, we can get a much stronger construct if we place
that same barrier _inside_ the spin_lock(). In that case we upgrade
the RCpc spinlock to an RCsc. That would make all schedule() calls
fully transitive against one another.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 05492f2f 06-Sep-2016 Will Deacon <will@kernel.org>

arm64: lse: convert lse alternatives NOP padding to use __nops

The LSE atomics are implemented using alternative code sequences of
different lengths, and explicit NOP padding is used to ensure the
patching works correctly.

This patch converts the bulk of the LSE code over to using the __nops
macro, which makes it slightly clearer as to what is going on and also
consolidates all of the padding at the end of the various sequences.

Signed-off-by: Will Deacon <will.deacon@arm.com>


# 872c63fb 05-Sep-2016 Will Deacon <will@kernel.org>

arm64: spinlocks: implement smp_mb__before_spinlock() as smp_mb()

smp_mb__before_spinlock() is intended to upgrade a spin_lock() operation
to a full barrier, such that prior stores are ordered with respect to
loads and stores occuring inside the critical section.

Unfortunately, the core code defines the barrier as smp_wmb(), which
is insufficient to provide the required ordering guarantees when used in
conjunction with our load-acquire-based spinlock implementation.

This patch overrides the arm64 definition of smp_mb__before_spinlock()
to map to a full smp_mb().

Cc: <stable@vger.kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Reported-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# c56bdcac 02-Jun-2016 Will Deacon <will@kernel.org>

arm64: spinlock: Ensure forward-progress in spin_unlock_wait

Rather than wait until we observe the lock being free (which might never
happen), we can also return from spin_unlock_wait if we observe that the
lock is now held by somebody else, which implies that it was unlocked
but we just missed seeing it in that state.

Furthermore, in such a scenario there is no longer a need to write back
the value that we loaded, since we know that there has been a lock
hand-off, which is sufficient to publish any stores prior to the
unlock_wait because the ARm architecture ensures that a Store-Release
instruction is multi-copy atomic when observed by a Load-Acquire
instruction.

The litmus test is something like:

AArch64
{
0:X1=x; 0:X3=y;
1:X1=y;
2:X1=y; 2:X3=x;
}
P0 | P1 | P2 ;
MOV W0,#1 | MOV W0,#1 | LDAR W0,[X1] ;
STR W0,[X1] | STLR W0,[X1] | LDR W2,[X3] ;
DMB SY | | ;
LDR W2,[X3] | | ;
exists
(0:X2=0 /\ 2:X0=1 /\ 2:X2=0)

where P0 is doing spin_unlock_wait, P1 is doing spin_unlock and P2 is
doing spin_lock.

Signed-off-by: Will Deacon <will.deacon@arm.com>


# 3a5facd0 08-Jun-2016 Will Deacon <will@kernel.org>

arm64: spinlock: fix spin_unlock_wait for LSE atomics

Commit d86b8da04dfa ("arm64: spinlock: serialise spin_unlock_wait against
concurrent lockers") fixed spin_unlock_wait for LL/SC-based atomics under
the premise that the LSE atomics (in particular, the LDADDA instruction)
are indivisible.

Unfortunately, these instructions are only indivisible when used with the
-AL (full ordering) suffix and, consequently, the same issue can
theoretically be observed with LSE atomics, where a later (in program
order) load can be speculated before the write portion of the atomic
operation.

This patch fixes the issue by performing a CAS of the lock once we've
established that it's unlocked, in much the same way as the LL/SC code.

Fixes: d86b8da04dfa ("arm64: spinlock: serialise spin_unlock_wait against concurrent lockers")
Signed-off-by: Will Deacon <will.deacon@arm.com>


# 38b850a7 02-Jun-2016 Will Deacon <will@kernel.org>

arm64: spinlock: order spin_{is_locked,unlock_wait} against local locks

spin_is_locked has grown two very different use-cases:

(1) [The sane case] API functions may require a certain lock to be held
by the caller and can therefore use spin_is_locked as part of an
assert statement in order to verify that the lock is indeed held.
For example, usage of assert_spin_locked.

(2) [The insane case] There are two locks, where a CPU takes one of the
locks and then checks whether or not the other one is held before
accessing some shared state. For example, the "optimized locking" in
ipc/sem.c.

In the latter case, the sequence looks like:

spin_lock(&sem->lock);
if (!spin_is_locked(&sma->sem_perm.lock))
/* Access shared state */

and requires that the spin_is_locked check is ordered after taking the
sem->lock. Unfortunately, since our spinlocks are implemented using a
LDAXR/STXR sequence, the read of &sma->sem_perm.lock can be speculated
before the STXR and consequently return a stale value.

Whilst this hasn't been seen to cause issues in practice, PowerPC fixed
the same issue in 51d7d5205d33 ("powerpc: Add smp_mb() to
arch_spin_is_locked()") and, although we did something similar for
spin_unlock_wait in d86b8da04dfa ("arm64: spinlock: serialise
spin_unlock_wait against concurrent lockers") that doesn't actually take
care of ordering against local acquisition of a different lock.

This patch adds an smp_mb() to the start of our arch_spin_is_locked and
arch_spin_unlock_wait routines to ensure that the lock value is always
loaded after any other locks have been taken by the current CPU.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# d86b8da0 19-Nov-2015 Will Deacon <will@kernel.org>

arm64: spinlock: serialise spin_unlock_wait against concurrent lockers

Boqun Feng reported a rather nasty ordering issue with spin_unlock_wait
on architectures implementing spin_lock with LL/SC sequences and acquire
semantics:

| CPU 1 CPU 2 CPU 3
| ================== ==================== ==============
| spin_unlock(&lock);
| spin_lock(&lock):
| r1 = *lock; // r1 == 0;
| o = READ_ONCE(object); // reordered here
| object = NULL;
| smp_mb();
| spin_unlock_wait(&lock);
| *lock = 1;
| smp_mb();
| o->dead = true;
| if (o) // true
| BUG_ON(o->dead); // true!!

The crux of the problem is that spin_unlock_wait(&lock) can return on
CPU 1 whilst CPU 2 is in the process of taking the lock. This can be
resolved by upgrading spin_unlock_wait to a LOCK operation, forcing it
to serialise against a concurrent locker and giving it acquire semantics
in the process (although it is not at all clear whether this is needed -
different callers seem to assume different things about the barrier
semantics and architectures are similarly disjoint in their
implementations of the macro).

This patch implements spin_unlock_wait using an LL/SC sequence with
acquire semantics on arm64. For v8.1 systems with the LSE atomics, the
exclusive writeback is omitted, since the spin_lock operation is
indivisible and no intermediate state can be observed.

Signed-off-by: Will Deacon <will.deacon@arm.com>


# c1d7cd22 28-Jul-2015 Will Deacon <will@kernel.org>

arm64: spinlock: fix ll/sc unlock on big-endian systems

When unlocking a spinlock, we perform a read-modify-write on the owner
ticket in order to increment it and store it back with release
semantics.

In the LL/SC case, we load the 16-bit ticket using a 32-bit load and
therefore store back the wrong halfword on a big-endian system,
corrupting the lock after the first unlock and killing the system dead.

This patch fixes the unlock code to use 16-bit accessors consistently.

Signed-off-by: Will Deacon <will.deacon@arm.com>


# 81bb5c64 09-Feb-2015 Will Deacon <will@kernel.org>

arm64: locks: patch in lse instructions when supported by the CPU

On CPUs which support the LSE atomic instructions introduced in ARMv8.1,
it makes sense to use them in preference to ll/sc sequences.

This patch introduces runtime patching of our locking functions so that
LSE atomic instructions are used for spinlocks and rwlocks.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# 9511ca19 22-Jul-2015 Will Deacon <will@kernel.org>

arm64: rwlocks: don't fail trylock purely due to contention

STXR can fail for a number of reasons, so don't fail an rwlock trylock
operation simply because the STXR reported failure.

I'm not aware of any issues with the current code, but this makes it
consistent with spin_trylock and also other architectures (e.g. arch/arm).

Reported-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# af2e7aae 24-Nov-2014 Christian Borntraeger <borntraeger@de.ibm.com>

arm64/spinlock: Replace ACCESS_ONCE READ_ONCE

ACCESS_ONCE does not work reliably on non-scalar types. For
example gcc 4.6 and 4.7 might remove the volatile tag for such
accesses during the SRA (scalar replacement of aggregates) step
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145)

Change the spinlock code to replace ACCESS_ONCE with READ_ONCE.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>


# 95c41896 03-Feb-2014 Will Deacon <will@kernel.org>

arm64: asm: remove redundant "cc" clobbers

cbnz/tbnz don't update the condition flags, so remove the "cc" clobbers
from inline asm blocks that only use these instructions to implement
conditional branches.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 5686b06c 09-Oct-2013 Will Deacon <will@kernel.org>

arm64: lockref: add support for lockless lockrefs using cmpxchg

Our spinlocks are only 32-bit (2x16-bit tickets) and our cmpxchg can
deal with 8-bytes (as one would hope!).

This patch wires up the cmpxchg-based lockless lockref implementation
for arm64.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 52ea2a56 09-Oct-2013 Will Deacon <will@kernel.org>

arm64: locks: introduce ticket-based spinlock implementation

This patch introduces a ticket lock implementation for arm64, along the
same lines as the implementation for arch/arm/.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 4ecf7ccb 31-May-2013 Catalin Marinas <catalin.marinas@arm.com>

arm64: spinlock: retry trylock operation if strex fails on free lock

An exclusive store instruction may fail for reasons other than lock
contention (e.g. a cache eviction during the critical section) so, in
line with other architectures using similar exclusive instructions
(alpha, mips, powerpc), retry the trylock operation if the lock appears
to be free but the strex reported failure.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Tony Thompson <anthony.thompson@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>


# 3a0310eb 03-Feb-2013 Will Deacon <will@kernel.org>

arm64: atomics: fix grossly inconsistent asm constraints for exclusives

Our uses of inline asm constraints for atomic operations are fairly
wild and varied. We basically need to guarantee the following:

1. Any instructions with barrier implications
(load-acquire/store-release) have a "memory" clobber

2. When performing exclusive accesses, the addresing mode is generated
using the "Q" constraint

3. Atomic blocks which use the condition flags, have a "cc" clobber

This patch addresses these concerns which, as well as fixing the
semantics of the code, stops GCC complaining about impossible asm
constraints.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 08e875c1 05-Mar-2012 Catalin Marinas <catalin.marinas@arm.com>

arm64: SMP support

This patch adds SMP initialisation and spinlocks implementation for
AArch64. The spinlock support uses the new load-acquire/store-release
instructions to avoid explicit barriers. The architecture also specifies
that an event is automatically generated when clearing the exclusive
monitor state to wake up processors in WFE, so there is no need for an
explicit DSB/SEV instruction sequence. The SEVL instruction is used to
set the exclusive monitor locally as there is no conditional WFE and a
branch is more expensive.

For the SMP booting protocol, see Documentation/arm64/booting.txt.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Tony Lindgren <tony@atomide.com>
Acked-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Olof Johansson <olof@lixom.net>
Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>