History log of /linux-master/arch/arm64/include/asm/cmpxchg.h
Revision Date Author Comments
# febe950d 31-May-2023 Peter Zijlstra <peterz@infradead.org>

arch: Remove cmpxchg_double

No moar users, remove the monster.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20230531132323.991907085@infradead.org


# b23e139d 31-May-2023 Peter Zijlstra <peterz@infradead.org>

arch: Introduce arch_{,try_}_cmpxchg128{,_local}()

For all architectures that currently support cmpxchg_double()
implement the cmpxchg128() family of functions that is basically the
same but with a saner interface.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20230531132323.452120708@infradead.org


# 06855063 18-Jan-2023 Andrzej Hajda <andrzej.hajda@intel.com>

locking/arch: Rename all internal __xchg() names to __arch_xchg()

Decrease the probability of this internal facility to be used by
driver code.

Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Acked-by: Palmer Dabbelt <palmer@rivosinc.com> [riscv]
Link: https://lore.kernel.org/r/20230118154450.73842-1-andrzej.hajda@intel.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>


# 3364c6ce 12-Jan-2022 Kees Cook <keescook@chromium.org>

arm64: atomics: lse: Dereference matching size

When building with -Warray-bounds, the following warning is generated:

In file included from ./arch/arm64/include/asm/lse.h:16,
from ./arch/arm64/include/asm/cmpxchg.h:14,
from ./arch/arm64/include/asm/atomic.h:16,
from ./include/linux/atomic.h:7,
from ./include/asm-generic/bitops/atomic.h:5,
from ./arch/arm64/include/asm/bitops.h:25,
from ./include/linux/bitops.h:33,
from ./include/linux/kernel.h:22,
from kernel/printk/printk.c:22:
./arch/arm64/include/asm/atomic_lse.h:247:9: warning: array subscript 'long unsigned int[0]' is partly outside array bounds of 'atomic_t[1]' [-Warray-bounds]
247 | asm volatile( \
| ^~~
./arch/arm64/include/asm/atomic_lse.h:266:1: note: in expansion of macro '__CMPXCHG_CASE'
266 | __CMPXCHG_CASE(w, , acq_, 32, a, "memory")
| ^~~~~~~~~~~~~~
kernel/printk/printk.c:3606:17: note: while referencing 'printk_cpulock_owner'
3606 | static atomic_t printk_cpulock_owner = ATOMIC_INIT(-1);
| ^~~~~~~~~~~~~~~~~~~~

This is due to the compiler seeing an unsigned long * cast against
something (atomic_t) that is int sized. Replace the cast with the
matching size cast. This results in no change in binary output.

Note that __ll_sc__cmpxchg_case_##name##sz already uses the same
constraint:

[v] "+Q" (*(u##sz *)ptr

Which is why only the LSE form needs updating and not the
LL/SC form, so this change is unlikely to be problematic.

Cc: Will Deacon <will@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: linux-arm-kernel@lists.infradead.org
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220112202259.3950286-1-keescook@chromium.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 920fdab7 10-Sep-2019 Arnd Bergmann <arnd@arndb.de>

arm64: fix unreachable code issue with cmpxchg

On arm64 build with clang, sometimes the __cmpxchg_mb is not inlined
when CONFIG_OPTIMIZE_INLINING is set.
Clang then fails a compile-time assertion, because it cannot tell at
compile time what the size of the argument is:

mm/memcontrol.o: In function `__cmpxchg_mb':
memcontrol.c:(.text+0x1a4c): undefined reference to `__compiletime_assert_175'
memcontrol.c:(.text+0x1a4c): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `__compiletime_assert_175'

Mark all of the cmpxchg() style functions as __always_inline to
ensure that the compiler can see the result.

Acked-by: Nick Desaulniers <ndesaulniers@google.com>
Reported-by: Nathan Chancellor <natechancellor@gmail.com>
Link: https://github.com/ClangBuiltLinux/linux/issues/648
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Tested-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Andrew Murray <andrew.murray@arm.com>
Tested-by: Andrew Murray <andrew.murray@arm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Will Deacon <will@kernel.org>


# 5aad6cda 29-Aug-2019 Will Deacon <will@kernel.org>

arm64: atomics: Undefine internal macros after use

We use a bunch of internal macros when constructing our atomic and
cmpxchg routines in order to save on boilerplate. Avoid exposing these
directly to users of the header files.

Reviewed-by: Andrew Murray <andrew.murray@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>


# 0533f97b 29-Aug-2019 Will Deacon <will@kernel.org>

arm64: asm: Kill 'asm/atomic_arch.h'

The contents of 'asm/atomic_arch.h' can be split across some of our
other 'asm/' headers. Remove it.

Reviewed-by: Andrew Murray <andrew.murray@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>


# addfc386 28-Aug-2019 Andrew Murray <amurray@thegoodpenguin.co.uk>

arm64: atomics: avoid out-of-line ll/sc atomics

When building for LSE atomics (CONFIG_ARM64_LSE_ATOMICS), if the hardware
or toolchain doesn't support it the existing code will fallback to ll/sc
atomics. It achieves this by branching from inline assembly to a function
that is built with special compile flags. Further this results in the
clobbering of registers even when the fallback isn't used increasing
register pressure.

Improve this by providing inline implementations of both LSE and
ll/sc and use a static key to select between them, which allows for the
compiler to generate better atomics code. Put the LL/SC fallback atomics
in their own subsection to improve icache performance.

Signed-off-by: Andrew Murray <andrew.murray@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>


# caab277b 02-Jun-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 234

Based on 1 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation this program is
distributed in the hope that it will be useful but without any
warranty without even the implied warranty of merchantability or
fitness for a particular purpose see the gnu general public license
for more details you should have received a copy of the gnu general
public license along with this program if not see http www gnu org
licenses

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-only

has been chosen to replace the boilerplate/reference in 503 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Enrico Weigelt <info@metux.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190602204653.811534538@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# b4f9209b 13-Sep-2018 Will Deacon <will@kernel.org>

arm64: Avoid masking "old" for LSE cmpxchg() implementation

The CAS instructions implicitly access only the relevant bits of the "old"
argument, so there is no need for explicit masking via type-casting as
there is in the LL/SC implementation.

Move the casting into the LL/SC code and remove it altogether for the LSE
implementation.

Signed-off-by: Will Deacon <will.deacon@arm.com>


# 5ef3fe4c 13-Sep-2018 Will Deacon <will@kernel.org>

arm64: Avoid redundant type conversions in xchg() and cmpxchg()

Our atomic instructions (either LSE atomics of LDXR/STXR sequences)
natively support byte, half-word, word and double-word memory accesses
so there is no need to mask the data register prior to being stored.

Signed-off-by: Will Deacon <will.deacon@arm.com>


# c0df1081 04-Sep-2018 Mark Rutland <mark.rutland@arm.com>

arm64, locking/atomics: Use instrumented atomics

Now that the generic atomic headers provide instrumented wrappers of all
the atomics implemented by arm64, let's migrate arm64 over to these.

The additional instrumentation will help to find bugs (e.g. when fuzzing
with Syzkaller).

Mostly this change involves adding an arch_ prefix to a number of
function names and macro definitions. When LSE atomics are used, the
out-of-line LL/SC atomics will be named __ll_sc_arch_atomic_${OP}.

Adding the arch_ prefix requires some whitespace fixups to keep things
aligned. Some other unusual whitespace is fixed up at the same time
(e.g. in the cmpxchg wrappers).

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: linuxdrivers@attotech.com
Cc: dvyukov@google.com
Cc: boqun.feng@gmail.com
Cc: arnd@arndb.de
Cc: aryabinin@virtuozzo.com
Cc: glider@google.com
Link: http://lkml.kernel.org/r/20180904104830.2975-7-mark.rutland@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 1cfc63b5 30-Apr-2018 Will Deacon <will@kernel.org>

arm64: cmpwait: Clear event register before arming exclusive monitor

When waiting for a cacheline to change state in cmpwait, we may immediately
wake-up the first time around the outer loop if the event register was
already set (for example, because of the event stream).

Avoid these spurious wakeups by explicitly clearing the event register
before loading the cacheline and setting the exclusive monitor.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 2a58fca9 27-Feb-2018 Will Deacon <will@kernel.org>

arm64: cmpxchg: Include linux/compiler.h in asm/cmpxchg.h

We need linux/compiler.h for unreachable(), so #include it here.

Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# c9406e51 27-Feb-2018 Will Deacon <will@kernel.org>

arm64: move percpu cmpxchg implementation from cmpxchg.h to percpu.h

We want to avoid pulling linux/preempt.h into cmpxchg.h, since that can
introduce a circular dependency on linux/bitops.h. linux/preempt.h is
only needed by the per-cpu cmpxchg implementation, which is better off
alongside the per-cpu xchg implementation in percpu.h, so move it there
and add the missing #include.

Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# e8a2d040 19-Feb-2018 Will Deacon <will@kernel.org>

arm64: cmpxchg: Include build_bug.h instead of bug.h for BUILD_BUG

Having asm/cmpxchg.h pull in linux/bug.h is problematic because this
ends up pulling in the atomic bitops which themselves may be built on
top of atomic.h and cmpxchg.h.

Instead, just include build_bug.h for the definition of BUILD_BUG.

Signed-off-by: Will Deacon <will.deacon@arm.com>


# fee960be 03-May-2017 Mark Rutland <mark.rutland@arm.com>

arm64: xchg: hazard against entire exchange variable

The inline assembly in __XCHG_CASE() uses a +Q constraint to hazard
against other accesses to the memory location being exchanged. However,
the pointer passed to the constraint is a u8 pointer, and thus the
hazard only applies to the first byte of the location.

GCC can take advantage of this, assuming that other portions of the
location are unchanged, as demonstrated with the following test case:

union u {
unsigned long l;
unsigned int i[2];
};

unsigned long update_char_hazard(union u *u)
{
unsigned int a, b;

a = u->i[1];
asm ("str %1, %0" : "+Q" (*(char *)&u->l) : "r" (0UL));
b = u->i[1];

return a ^ b;
}

unsigned long update_long_hazard(union u *u)
{
unsigned int a, b;

a = u->i[1];
asm ("str %1, %0" : "+Q" (*(long *)&u->l) : "r" (0UL));
b = u->i[1];

return a ^ b;
}

The linaro 15.08 GCC 5.1.1 toolchain compiles the above as follows when
using -O2 or above:

0000000000000000 <update_char_hazard>:
0: d2800001 mov x1, #0x0 // #0
4: f9000001 str x1, [x0]
8: d2800000 mov x0, #0x0 // #0
c: d65f03c0 ret

0000000000000010 <update_long_hazard>:
10: b9400401 ldr w1, [x0,#4]
14: d2800002 mov x2, #0x0 // #0
18: f9000002 str x2, [x0]
1c: b9400400 ldr w0, [x0,#4]
20: 4a000020 eor w0, w1, w0
24: d65f03c0 ret

This patch fixes the issue by passing an unsigned long pointer into the
+Q constraint, as we do for our cmpxchg code. This may hazard against
more than is necessary, but this is better than missing a necessary
hazard.

Fixes: 305d454aaa29 ("arm64: atomics: implement native {relaxed, acquire, release} atomics")
Cc: <stable@vger.kernel.org> # 4.4.x-
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 05492f2f 06-Sep-2016 Will Deacon <will@kernel.org>

arm64: lse: convert lse alternatives NOP padding to use __nops

The LSE atomics are implemented using alternative code sequences of
different lengths, and explicit NOP padding is used to ensure the
patching works correctly.

This patch converts the bulk of the LSE code over to using the __nops
macro, which makes it slightly clearer as to what is going on and also
consolidates all of the padding at the end of the various sequences.

Signed-off-by: Will Deacon <will.deacon@arm.com>


# 03e3c2b7 27-Jun-2016 Will Deacon <will@kernel.org>

locking/barriers, arch/arm64: Implement LDXR+WFE based smp_cond_load_acquire()

smp_cond_load_acquire() is used to spin on a variable until some
expression involving that variable becomes true.

On arm64, we can build this using the LDXR and WFE instructions, since
clearing of the exclusive monitor as a result of the variable being
changed by another CPU generates an event, which will wake us up out of WFE.

This patch implements smp_cond_load_acquire() using LDXR and WFE, which
themselves are contained in an internal __cmpwait() function.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: catalin.marinas@arm.com
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/1467049434-30451-1-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 4a6ccf30 10-Dec-2015 Mark Brown <broonie@kernel.org>

arm64: cmpxchg: Don't incldue linux/mmdebug.h

The arm64 asm/cmpxchg.h includes linux/mmdebug.h but doesn't so far as I
can tell actually use anything from it. Removing the inclusion reduces
spurious header dependency rebuilds and also avoids issues with
recursive inclusions of headers causing build breaks due to attempts to
use things before they are defined if linux/mmdebug.h starts pulling in
more low level headers.

Such errors have happened in -next recently, for example:

In file included from include/linux/completion.h:11:0,
from include/linux/rcupdate.h:43,
from include/linux/tracepoint.h:19,
from include/linux/mmdebug.h:6,
from ./arch/arm64/include/asm/cmpxchg.h:22,
from ./arch/arm64/include/asm/atomic.h:41,
from include/linux/atomic.h:4,
from include/linux/spinlock.h:406,
from include/linux/seqlock.h:35,
from include/linux/time.h:5,
from include/uapi/linux/timex.h:56,
from include/linux/timex.h:56,
from include/linux/sched.h:19,
from arch/arm64/kernel/asm-offsets.c:21:
include/linux/wait.h: In function 'wait_on_atomic_t':
include/linux/wait.h:1218:2: error: implicit declaration of function 'atomic_read' [-Werror=implicit-function-declaration]
if (atomic_read(val) == 0)

Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# 305d454a 08-Oct-2015 Will Deacon <will@kernel.org>

arm64: atomics: implement native {relaxed, acquire, release} atomics

Commit 654672d4ba1a ("locking/atomics: Add _{acquire|release|relaxed}()
variants of some atomic operation") introduced a relaxed atomic API to
Linux that maps nicely onto the arm64 memory model, including the new
ARMv8.1 atomic instructions.

This patch hooks up the API to our relaxed atomic instructions, rather
than have them all expand to the full-barrier variants as they do
currently.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# a14949e0 30-Jul-2015 Will Deacon <will@kernel.org>

arm64: cmpxchg: truncate sub-word signed types before comparison

When performing a cmpxchg operation on a signed sub-word type (e.g. s8),
we need to ensure that the upper register bits of the "old" value used
for comparison are zeroed, otherwise we may erroneously fail the cmpxchg
which may even be interpreted as success by the caller (if the compiler
performs the truncation as part of its check). This has been observed
in mod_state, where negative values where causing problems with
this_cpu_cmpxchg.

This patch fixes the issue by explicitly casting 8-bit and 16-bit "old"
values using unsigned types in our cmpxchg wrappers. 32-bit types can be
left alone, since the underlying asm makes use of W registers in this
case.

Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# 0ea366f5 29-May-2015 Will Deacon <will@kernel.org>

arm64: atomics: prefetch the destination word for write prior to stxr

The cost of changing a cacheline from shared to exclusive state can be
significant, especially when this is triggered by an exclusive store,
since it may result in having to retry the transaction.

This patch makes use of prfm to prefetch cachelines for write prior to
ldxr/stxr loops when using the ll/sc atomic routines.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# e9a4b795 14-May-2015 Will Deacon <will@kernel.org>

arm64: cmpxchg_dbl: patch in lse instructions when supported by the CPU

On CPUs which support the LSE atomic instructions introduced in ARMv8.1,
it makes sense to use them in preference to ll/sc sequences.

This patch introduces runtime patching of our cmpxchg_double primitives
so that the LSE casp instruction is used instead.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# c342f782 23-Apr-2015 Will Deacon <will@kernel.org>

arm64: cmpxchg: patch in lse instructions when supported by the CPU

On CPUs which support the LSE atomic instructions introduced in ARMv8.1,
it makes sense to use them in preference to ll/sc sequences.

This patch introduces runtime patching of our cmpxchg primitives so that
the LSE cas instruction is used instead.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# c8366ba0 31-Mar-2015 Will Deacon <will@kernel.org>

arm64: xchg: patch in lse instructions when supported by the CPU

On CPUs which support the LSE atomic instructions introduced in ARMv8.1,
it makes sense to use them in preference to ll/sc sequences.

This patch introduces runtime patching of our xchg primitives so that
the LSE swp instruction (yes, you read right!) is used instead.

Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# f3eab718 22-Mar-2015 Steve Capper <steve.capper@linaro.org>

arm64: percpu: Make this_cpu accessors pre-empt safe

this_cpu operations were implemented for arm64 in:
5284e1b arm64: xchg: Implement cmpxchg_double
f97fc81 arm64: percpu: Implement this_cpu operations

Unfortunately, it is possible for pre-emption to take place between
address generation and data access. This can lead to cases where data
is being manipulated by this_cpu for a different CPU than it was
called on. Which effectively breaks the spec.

This patch disables pre-emption for the this_cpu operations
guaranteeing that address generation and data manipulation take place
without a pre-emption in-between.

Fixes: 5284e1b4bc8a ("arm64: xchg: Implement cmpxchg_double")
Fixes: f97fc810798c ("arm64: percpu: Implement this_cpu operations")
Reported-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Steve Capper <steve.capper@linaro.org>
[catalin.marinas@arm.com: remove space after type cast]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# f97fc810 19-Nov-2014 Steve Capper <steve.capper@linaro.org>

arm64: percpu: Implement this_cpu operations

The generic this_cpu operations disable interrupts to ensure that the
requested operation is protected from pre-emption. For arm64, this is
overkill and can hurt throughput and latency.

This patch provides arm64 specific implementations for the this_cpu
operations. Rather than disable interrupts, we use the exclusive
monitor or atomic operations as appropriate.

The following operations are implemented: add, add_return, and, or,
read, write, xchg. We also wire up a cmpxchg implementation from
cmpxchg.h.

Testing was performed using the percpu_test module and hackbench on a
Juno board running 3.18-rc4.

Signed-off-by: Steve Capper <steve.capper@linaro.org>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# 5284e1b4 24-Oct-2014 Steve Capper <steve.capper@linaro.org>

arm64: xchg: Implement cmpxchg_double

The arm64 architecture has the ability to exclusively load and store
a pair of registers from an address (ldxp/stxp). Also the SLUB can take
advantage of a cmpxchg_double implementation to avoid taking some
locks.

This patch provides an implementation of cmpxchg_double for 64-bit
pairs, and activates the logic required for the SLUB to use these
functions (HAVE_ALIGNED_STRUCT_PAGE and HAVE_CMPXCHG_DOUBLE).

Also definitions of this_cpu_cmpxchg_8 and this_cpu_cmpxchg_double_8
are wired up to cmpxchg_local and cmpxchg_double_local (rather than the
stock implementations that perform non-atomic operations with
interrupts disabled) as they are used by the SLUB.

On a Juno platform running on only the A57s I get quite a noticeable
performance improvement with 5 runs of hackbench on v3.17:

Baseline | With Patch
-----------------+-----------
Mean 119.2312 | 106.1782
StdDev 0.4919 | 0.4494

(times taken to complete `./hackbench 100 process 1000', in seconds)

Signed-off-by: Steve Capper <steve.capper@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# e1dfda9c 30-Apr-2014 Will Deacon <will@kernel.org>

arm64: xchg: prevent warning if return value is unused

Some users of xchg() don't bother using the return value, which results
in a compiler warning like the following (from kgdb):

In file included from linux/arch/arm64/include/asm/atomic.h:27:0,
from include/linux/atomic.h:4,
from include/linux/spinlock.h:402,
from include/linux/seqlock.h:35,
from include/linux/time.h:5,
from include/uapi/linux/timex.h:56,
from include/linux/timex.h:56,
from include/linux/sched.h:19,
from include/linux/pid_namespace.h:4,
from kernel/debug/debug_core.c:30:
kernel/debug/debug_core.c: In function ‘kgdb_cpu_enter’:
linux/arch/arm64/include/asm/cmpxchg.h:75:3: warning: value computed is not used [-Wunused-value]
((__typeof__(*(ptr)))__xchg((unsigned long)(x),(ptr),sizeof(*(ptr))))
^
linux/arch/arm64/include/asm/atomic.h:132:30: note: in expansion of macro ‘xchg’
#define atomic_xchg(v, new) (xchg(&((v)->counter), new))

kernel/debug/debug_core.c:504:4: note: in expansion of macro ‘atomic_xchg’
atomic_xchg(&kgdb_active, cpu);
^

This patch makes use of the same trick as we do for cmpxchg, by assigning
the return value to a dummy variable in the xchg() macro itself.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 95c41896 03-Feb-2014 Will Deacon <will@kernel.org>

arm64: asm: remove redundant "cc" clobbers

cbnz/tbnz don't update the condition flags, so remove the "cc" clobbers
from inline asm blocks that only use these instructions to implement
conditional branches.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 8e86f0b4 03-Feb-2014 Will Deacon <will@kernel.org>

arm64: atomics: fix use of acquire + release for full barrier semantics

Linux requires a number of atomic operations to provide full barrier
semantics, that is no memory accesses after the operation can be
observed before any accesses up to and including the operation in
program order.

On arm64, these operations have been incorrectly implemented as follows:

// A, B, C are independent memory locations

<Access [A]>

// atomic_op (B)
1: ldaxr x0, [B] // Exclusive load with acquire
<op(B)>
stlxr w1, x0, [B] // Exclusive store with release
cbnz w1, 1b

<Access [C]>

The assumption here being that two half barriers are equivalent to a
full barrier, so the only permitted ordering would be A -> B -> C
(where B is the atomic operation involving both a load and a store).

Unfortunately, this is not the case by the letter of the architecture
and, in fact, the accesses to A and C are permitted to pass their
nearest half barrier resulting in orderings such as Bl -> A -> C -> Bs
or Bl -> C -> A -> Bs (where Bl is the load-acquire on B and Bs is the
store-release on B). This is a clear violation of the full barrier
requirement.

The simple way to fix this is to implement the same algorithm as ARMv7
using explicit barriers:

<Access [A]>

// atomic_op (B)
dmb ish // Full barrier
1: ldxr x0, [B] // Exclusive load
<op(B)>
stxr w1, x0, [B] // Exclusive store
cbnz w1, 1b
dmb ish // Full barrier

<Access [C]>

but this has the undesirable effect of introducing *two* full barrier
instructions. A better approach is actually the following, non-intuitive
sequence:

<Access [A]>

// atomic_op (B)
1: ldxr x0, [B] // Exclusive load
<op(B)>
stlxr w1, x0, [B] // Exclusive store with release
cbnz w1, 1b
dmb ish // Full barrier

<Access [C]>

The simple observations here are:

- The dmb ensures that no subsequent accesses (e.g. the access to C)
can enter or pass the atomic sequence.

- The dmb also ensures that no prior accesses (e.g. the access to A)
can pass the atomic sequence.

- Therefore, no prior access can pass a subsequent access, or
vice-versa (i.e. A is strictly ordered before C).

- The stlxr ensures that no prior access can pass the store component
of the atomic operation.

The only tricky part remaining is the ordering between the ldxr and the
access to A, since the absence of the first dmb means that we're now
permitting re-ordering between the ldxr and any prior accesses.

From an (arbitrary) observer's point of view, there are two scenarios:

1. We have observed the ldxr. This means that if we perform a store to
[B], the ldxr will still return older data. If we can observe the
ldxr, then we can potentially observe the permitted re-ordering
with the access to A, which is clearly an issue when compared to
the dmb variant of the code. Thankfully, the exclusive monitor will
save us here since it will be cleared as a result of the store and
the ldxr will retry. Notice that any use of a later memory
observation to imply observation of the ldxr will also imply
observation of the access to A, since the stlxr/dmb ensure strict
ordering.

2. We have not observed the ldxr. This means we can perform a store
and influence the later ldxr. However, that doesn't actually tell
us anything about the access to [A], so we've not lost anything
here either when compared to the dmb variant.

This patch implements this solution for our barriered atomic operations,
ensuring that we satisfy the full barrier requirements where they are
needed.

Cc: <stable@vger.kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 60010e50 03-Dec-2013 Mark Hambleton <mahamble@broadcom.com>

arm64: cmpxchg: update macros to prevent warnings

Make sure the value we are going to return is referenced in order to
avoid warnings from newer GCCs such as:

arch/arm64/include/asm/cmpxchg.h:162:3: warning: value computed is not used [-Wunused-value]
((__typeof__(*(ptr)))__cmpxchg_mb((ptr), \
^
net/netfilter/nf_conntrack_core.c:674:2: note: in expansion of macro ‘cmpxchg’
cmpxchg(&nf_conntrack_hash_rnd, 0, rand);

[Modified to use the current underlying implementation as current
mainline for both cmpxchg() and cmpxchg_local() does -- broonie]

Signed-off-by: Mark Hambleton <mahamble@broadcom.com>
Signed-off-by: Mark Brown <broonie@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# cf10b79a 09-Oct-2013 Will Deacon <will@kernel.org>

arm64: cmpxchg: implement cmpxchg64_relaxed

This patch introduces cmpxchg64_relaxed for arm64 using the existing
cmpxchg_local macro, which performs a cmpxchg operation (up to 64 bits)
without barrier semantics.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# a84b086b 21-Apr-2013 Chen Gang <gang.chen@asianux.com>

arm64: Define cmpxchg64 and cmpxchg64_local for outside use

Drivers use cmpxchg64, cmpxchg64_local to perform 64-bit operation, so
they can cross 32-bit and 64-bit platforms (it is a standard way).

Signed-off-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 3a0310eb 03-Feb-2013 Will Deacon <will@kernel.org>

arm64: atomics: fix grossly inconsistent asm constraints for exclusives

Our uses of inline asm constraints for atomic operations are fairly
wild and varied. We basically need to guarantee the following:

1. Any instructions with barrier implications
(load-acquire/store-release) have a "memory" clobber

2. When performing exclusive accesses, the addresing mode is generated
using the "Q" constraint

3. Atomic blocks which use the condition flags, have a "cc" clobber

This patch addresses these concerns which, as well as fixing the
semantics of the code, stops GCC complaining about impossible asm
constraints.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 10b663ae 05-Mar-2012 Catalin Marinas <catalin.marinas@arm.com>

arm64: Miscellaneous header files

This patch introduces a few AArch64-specific header files together with
Kbuild entries for generic headers.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Acked-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Olof Johansson <olof@lixom.net>
Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>