History log of /linux-master/include/linux/atomic/atomic-arch-fallback.h
Revision Date Author Comments
# 6dfee110 08-Feb-2024 Mark Rutland <mark.rutland@arm.com>

locking/atomic: scripts: Clarify ordering of conditional atomics

Conditional atomic operations (e.g. cmpxchg()) only provide ordering
when the condition holds; when the condition does not hold, the location
is not modified and relaxed ordering is provided. Where ordering is
needed for failed conditional atomics, it is necessary to use
smp_mb__before_atomic() and/or smp_mb__after_atomic().

This is explained tersely in memory-barriers.txt, and is implied but not
explicitly stated in the kerneldoc comments for the conditional
operations. The lack of an explicit statement has led to some off-list
queries about the ordering semantics of failing conditional operations,
so evidently this is confusing.

Update the kerneldoc comments to explicitly describe the lack of ordering
for failed conditional atomic operations.

For most conditional atomic operations, this is written as:

| If (${condition}), atomically updates @v to (${new}) with ${desc_order} ordering.
| Otherwise, @v is not modified and relaxed ordering is provided.

For the try_cmpxchg() operations, this is written as:

| If (${condition}), atomically updates @v to @new with ${desc_order} ordering.
| Otherwise, @v is not modified, @old is updated to the current value of @v,
| and relaxed ordering is provided.
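
As a rough userspace illustration of the wording above (a sketch using C11
<stdatomic.h>, not the kernel implementation), the success and failure
orderings can be given separately. The sketch_* names are hypothetical, and
the full fences merely stand in for smp_mb__before_atomic() and
smp_mb__after_atomic():

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * If (*v == *old), atomically updates *v to newval with acquire ordering.
 * Otherwise, *v is not modified, *old is updated to the current value of
 * *v, and relaxed ordering is provided.
 */
bool sketch_try_cmpxchg_acquire(atomic_int *v, int *old, int newval)
{
	return atomic_compare_exchange_strong_explicit(v, old, newval,
						       memory_order_acquire,
						       memory_order_relaxed);
}

/*
 * Assumption: a caller that needs ordering even when the comparison fails
 * brackets the operation with explicit full fences.
 */
bool sketch_try_cmpxchg_ordered(atomic_int *v, int *old, int newval)
{
	bool ok;

	atomic_thread_fence(memory_order_seq_cst);	/* "before" barrier */
	ok = sketch_try_cmpxchg_acquire(v, old, newval);
	atomic_thread_fence(memory_order_seq_cst);	/* "after" barrier */
	return ok;
}

/* Usage: a failed attempt leaves *v alone and refreshes *old. */
void sketch_try_cmpxchg_demo(void)
{
	atomic_int v = 5;
	int old = 7;

	assert(!sketch_try_cmpxchg_acquire(&v, &old, 9));
	assert(old == 5 && atomic_load(&v) == 5);
	assert(sketch_try_cmpxchg_ordered(&v, &old, 9));
	assert(atomic_load(&v) == 9);
}
```

The fences in the second helper are unconditional, so they cover both the
successful and the failed case.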

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Nhat Pham <nphamcs@gmail.com>
Link: https://lore.kernel.org/r/20240209124010.2096198-1-mark.rutland@arm.com


# e01cc1e8 25-Sep-2023 Uros Bizjak <ubizjak@gmail.com>

locking/atomic: Add generic support for sync_try_cmpxchg() and its fallback

Provide the generic sync_try_cmpxchg() function from the
raw_ prefixed version, also adding explicit instrumentation.

The patch amends existing scripts to generate sync_try_cmpxchg()
locking primitive and its raw_sync_try_cmpxchg() fallback, while
leaving existing macros from the try_cmpxchg() family unchanged.

The target can define its own arch_sync_try_cmpxchg() to override the
generic version of raw_sync_try_cmpxchg(). This allows the target
to generate more optimal assembly than the generic version.

Additionally, the patch renames two scripts to better reflect
what they really do.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org


# 6d2779ec 19-Sep-2023 Mark Rutland <mark.rutland@arm.com>

locking/atomic: scripts: fix fallback ifdeffery

Since commit:

9257959a6e5b4fca ("locking/atomic: scripts: restructure fallback ifdeffery")

The ordering fallbacks for atomic*_read_acquire() and
atomic*_set_release() erroneously fall back to the implicitly relaxed
atomic*_read() and atomic*_set() variants respectively, without any
additional barriers. This loses the ACQUIRE and RELEASE ordering
semantics, which can result in a wide variety of problems, even on
strongly-ordered architectures where the implementation of
atomic*_read() and/or atomic*_set() allows the compiler to reorder those
relative to other accesses.

In practice this has been observed to break bit spinlocks on arm64,
resulting in dentry cache corruption.

The fallback logic was intended to allow ACQUIRE/RELEASE/RELAXED ops to
be defined in terms of FULL ops, but where an op had RELAXED ordering by
default, this unintentionally permitted the ACQUIRE/RELEASE ops to be
defined in terms of the implicitly RELAXED default.

This patch corrects the logic to avoid falling back to implicitly
RELAXED ops, resulting in the same behaviour as prior to commit
9257959a6e5b4fca.
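
The shape of the corrected fallback can be sketched in userspace C11 as
follows. This is an illustration under assumptions, not the generated header:
the arch_sketch_* and raw_sketch_* names are hypothetical, and
atomic_thread_fence() stands in for the kernel's acquire and release fences.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical arch layer: only implicitly relaxed read/set are provided. */
#define arch_sketch_read(v) \
	atomic_load_explicit((v), memory_order_relaxed)
#define arch_sketch_set(v, i) \
	atomic_store_explicit((v), (i), memory_order_relaxed)

int raw_sketch_read_acquire(atomic_int *v)
{
#if defined(arch_sketch_read_acquire)
	return arch_sketch_read_acquire(v);
#else
	/*
	 * The buggy ifdeffery effectively reduced this branch to
	 * "return arch_sketch_read(v);", losing ACQUIRE ordering.
	 * The relaxed op must be paired with an explicit fence.
	 */
	int ret = arch_sketch_read(v);

	atomic_thread_fence(memory_order_acquire);
	return ret;
#endif
}

void raw_sketch_set_release(atomic_int *v, int i)
{
#if defined(arch_sketch_set_release)
	arch_sketch_set_release(v, i);
#else
	/* Likewise, RELEASE ordering needs a fence before the relaxed store. */
	atomic_thread_fence(memory_order_release);
	arch_sketch_set(v, i);
#endif
}

/* Usage: the values round-trip; the ordering itself is not observable here. */
void sketch_fallback_demo(void)
{
	atomic_int v = 0;

	raw_sketch_set_release(&v, 42);
	assert(raw_sketch_read_acquire(&v) == 42);
}
```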

I've verified the resulting assembly on arm64 by generating outlined
wrappers of the atomics. Prior to this patch the compiler generates
sequences using relaxed load (LDR) and store (STR) instructions, e.g.

| <outlined_atomic64_read_acquire>:
| ldr x0, [x0]
| ret
|
| <outlined_atomic64_set_release>:
| str x1, [x0]
| ret

With this patch applied the compiler generates sequences using the
intended load-acquire (LDAR) and store-release (STLR) instructions, e.g.

| <outlined_atomic64_read_acquire>:
| ldar x0, [x0]
| ret
|
| <outlined_atomic64_set_release>:
| stlr x1, [x0]
| ret

To make sure that there were no other victims of the ifdeffery rewrite,
I generated outlined copies of all of the {atomic,atomic64,atomic_long}
atomic operations before and after commit 9257959a6e5b4fca. A diff of
the generated assembly on arm64 shows that only the read_acquire() and
set_release() operations were changed, and only lost their intended
ordering:

| [mark@lakrids:~/src/linux]% diff -u \
| <(aarch64-linux-gnu-objdump -d before-9257959a6e5b4fca.o) \
| <(aarch64-linux-gnu-objdump -d after-9257959a6e5b4fca.o)
| --- /proc/self/fd/11 2023-09-19 16:51:51.114779415 +0100
| +++ /proc/self/fd/16 2023-09-19 16:51:51.114779415 +0100
| @@ -1,5 +1,5 @@
|
| -before-9257959a6e5b4fca.o: file format elf64-littleaarch64
| +after-9257959a6e5b4fca.o: file format elf64-littleaarch64
|
|
| Disassembly of section .text:
| @@ -9,7 +9,7 @@
| 4: d65f03c0 ret
|
| 0000000000000008 <outlined_atomic_read_acquire>:
| - 8: 88dffc00 ldar w0, [x0]
| + 8: b9400000 ldr w0, [x0]
| c: d65f03c0 ret
|
| 0000000000000010 <outlined_atomic_set>:
| @@ -17,7 +17,7 @@
| 14: d65f03c0 ret
|
| 0000000000000018 <outlined_atomic_set_release>:
| - 18: 889ffc01 stlr w1, [x0]
| + 18: b9000001 str w1, [x0]
| 1c: d65f03c0 ret
|
| 0000000000000020 <outlined_atomic_add>:
| @@ -1230,7 +1230,7 @@
| 1070: d65f03c0 ret
|
| 0000000000001074 <outlined_atomic64_read_acquire>:
| - 1074: c8dffc00 ldar x0, [x0]
| + 1074: f9400000 ldr x0, [x0]
| 1078: d65f03c0 ret
|
| 000000000000107c <outlined_atomic64_set>:
| @@ -1238,7 +1238,7 @@
| 1080: d65f03c0 ret
|
| 0000000000001084 <outlined_atomic64_set_release>:
| - 1084: c89ffc01 stlr x1, [x0]
| + 1084: f9000001 str x1, [x0]
| 1088: d65f03c0 ret
|
| 000000000000108c <outlined_atomic64_add>:
| @@ -2427,7 +2427,7 @@
| 207c: d65f03c0 ret
|
| 0000000000002080 <outlined_atomic_long_read_acquire>:
| - 2080: c8dffc00 ldar x0, [x0]
| + 2080: f9400000 ldr x0, [x0]
| 2084: d65f03c0 ret
|
| 0000000000002088 <outlined_atomic_long_set>:
| @@ -2435,7 +2435,7 @@
| 208c: d65f03c0 ret
|
| 0000000000002090 <outlined_atomic_long_set_release>:
| - 2090: c89ffc01 stlr x1, [x0]
| + 2090: f9000001 str x1, [x0]
| 2094: d65f03c0 ret
|
| 0000000000002098 <outlined_atomic_long_add>:

I've build tested this with a variety of configs for alpha, arm, arm64,
csky, i386, m68k, microblaze, mips, nios2, openrisc, powerpc, riscv,
s390, sh, sparc, x86_64, and xtensa, for which I've seen no issues. I
was unable to build test for ia64 and parisc due to existing build
breakage in v6.6-rc2.

Fixes: 9257959a6e5b4fca ("locking/atomic: scripts: restructure fallback ifdeffery")
Reported-by: Ming Lei <ming.lei@redhat.com>
Reported-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Baokun Li <libaokun1@huawei.com>
Link: https://lkml.kernel.org/r/20230919171430.2697727-1-mark.rutland@arm.com


# b33eb50a 15-Jun-2023 Mark Rutland <mark.rutland@arm.com>

locking/atomic: scripts: fix ${atomic}_dec_if_positive() kerneldoc

The ${atomic}_dec_if_positive() ops are unlike all the other conditional
atomic ops. Rather than returning a boolean success value, these return
the value that the atomic variable would be updated to, even when no
update is performed.
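
A minimal userspace sketch of that return convention, using C11
<stdatomic.h> (sketch_dec_if_positive() is a hypothetical name, and the
ordering arguments are an assumption of this illustration):

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * If (*v > 0), atomically updates *v to (*v - 1).
 * Returns (old value - 1) in all cases, even when *v is left unmodified.
 */
int sketch_dec_if_positive(atomic_int *v)
{
	int old = atomic_load_explicit(v, memory_order_relaxed);
	int dec;

	do {
		dec = old - 1;
		if (dec < 0)
			break;	/* no update, but dec is still returned */
	} while (!atomic_compare_exchange_weak_explicit(v, &old, dec,
							memory_order_seq_cst,
							memory_order_relaxed));

	return dec;
}

/* Usage: a negative return value means no decrement took place. */
void sketch_dec_if_positive_demo(void)
{
	atomic_int v = 1;

	assert(sketch_dec_if_positive(&v) == 0);	/* 1 -> 0 */
	assert(sketch_dec_if_positive(&v) == -1);	/* stays at 0 */
	assert(atomic_load(&v) == 0);
}
```

Callers therefore test the result with "< 0" rather than treating it as a
boolean.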

We missed this when adding kerneldoc comments, and the documentation for
${atomic}_dec_if_positive() erroneously states:

| Return: @true if @v was updated, @false otherwise.

Ideally we'd clean this up by aligning ${atomic}_dec_if_positive() with
the usual atomic op conventions: with ${atomic}_fetch_dec_if_positive()
for those who care about the value of the variable, and
${atomic}_dec_if_positive() returning a boolean success value.

In the meantime, align the documentation with the current reality.

Fixes: ad8110706f381170 ("locking/atomic: scripts: generate kerneldoc comments")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/r/20230615132734.1119765-1-mark.rutland@arm.com


# ad811070 05-Jun-2023 Mark Rutland <mark.rutland@arm.com>

locking/atomic: scripts: generate kerneldoc comments

Currently the atomics are documented in Documentation/atomic_t.txt, and
have no kerneldoc comments. There are a sufficient number of gotchas
(e.g. semantics, noinstr-safety) that it would be nice to have comments
to call these out, and it would be nice to have kerneldoc comments such
that these can be collated.

While it's possible to derive the semantics from the code, this can be
painful given the amount of indirection we currently have (e.g. fallback
paths), and it's easy to be misled by naming, e.g.

* The unconditional void-returning ops *only* have relaxed variants
without a _relaxed suffix, and can easily be mistaken for being fully
ordered.

It would be nice to give these a _relaxed() suffix, but this would
result in significant churn throughout the kernel.

* Our naming of conditional and unconditional+test ops is rather
inconsistent, and it can be difficult to derive the name of an
operation, or to identify where an op is conditional or
unconditional+test.

Some ops are clearly conditional:
- dec_if_positive
- add_unless
- dec_unless_positive
- inc_unless_negative

Some ops are clearly unconditional+test:
- sub_and_test
- dec_and_test
- inc_and_test

However, what exactly those ops test is not obvious. A _test_zero suffix
might be clearer.

Others could be read ambiguously:
- inc_not_zero // conditional
- add_negative // unconditional+test

It would probably be worth renaming these, e.g. to inc_unless_zero and
add_test_negative.
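
To make the two ambiguous names above concrete, here is a userspace sketch
in C11 <stdatomic.h>. It is an illustration only; the sketch_* names are
hypothetical and default (sequentially consistent) ordering is assumed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Conditional: if (*v != 0), atomically updates *v to (*v + 1).
 * Returns true if *v was updated, false otherwise.
 */
bool sketch_inc_not_zero(atomic_int *v)
{
	int old = atomic_load_explicit(v, memory_order_relaxed);

	do {
		if (old == 0)
			return false;
	} while (!atomic_compare_exchange_weak(v, &old, old + 1));

	return true;
}

/*
 * Unconditional+test: always updates *v to (*v + i).
 * Returns true if the resulting value is negative.
 */
bool sketch_add_negative(int i, atomic_int *v)
{
	return atomic_fetch_add(v, i) + i < 0;
}

/* Usage: only the conditional op can leave the variable untouched. */
void sketch_naming_demo(void)
{
	atomic_int v = 0;

	assert(!sketch_inc_not_zero(&v));	/* 0 stays 0 */
	assert(sketch_add_negative(-1, &v));	/* 0 -> -1, always applied */
	assert(atomic_load(&v) == -1);
}
```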

As a step towards making this more consistent and easier to understand,
this patch adds kerneldoc comments for all generated *atomic*_*()
functions. These are generated from templates, with some common text
shared, making it easy to extend these in future if necessary.

I've tried to make these as consistent and clear as possible, and I've
deliberately ensured:

* All ops have their ordering explicitly mentioned in the short and long
description.

* All test ops have "test" in their short description.

* All ops are described as an expression using their usual C operator.
For example:

andnot: "Atomically updates @v to (@v & ~@i)"
inc: "Atomically updates @v to (@v + 1)"

Which may be clearer to non-native English speakers, and allows all
the operations to be described in the same style.

* All conditional ops have their condition described as an expression
using the usual C operators. For example:

add_unless: "If (@v != @u), atomically updates @v to (@v + @i)"
cmpxchg: "If (@v == @old), atomically updates @v to @new"

Which may be clearer to non-native English speakers, and allows all
the operations to be described in the same style.

* All bitwise ops (and,andnot,or,xor) explicitly mention that they are
bitwise in their short description, so that they are not mistaken for
performing their logical equivalents.

* The noinstr safety of each op is explicitly described, with a
description of whether or not to use the raw_ form of the op.

There should be no functional change as a result of this patch.

Reported-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230605070124.3741859-26-mark.rutland@arm.com


# 1d78814d 05-Jun-2023 Mark Rutland <mark.rutland@arm.com>

locking/atomic: scripts: simplify raw_atomic*() definitions

Currently each ordering variant has several potential definitions,
with a mixture of preprocessor and C definitions, including several
copies of its C prototype, e.g.

| #if defined(arch_atomic_fetch_andnot_acquire)
| #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot_acquire
| #elif defined(arch_atomic_fetch_andnot_relaxed)
| static __always_inline int
| raw_atomic_fetch_andnot_acquire(int i, atomic_t *v)
| {
| 	int ret = arch_atomic_fetch_andnot_relaxed(i, v);
| 	__atomic_acquire_fence();
| 	return ret;
| }
| #elif defined(arch_atomic_fetch_andnot)
| #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot
| #else
| static __always_inline int
| raw_atomic_fetch_andnot_acquire(int i, atomic_t *v)
| {
| 	return raw_atomic_fetch_and_acquire(~i, v);
| }
| #endif

Make this a bit simpler by defining the C prototype once, and writing
the various potential definitions as plain C code guarded by ifdeffery.
For example, the above becomes:

| static __always_inline int
| raw_atomic_fetch_andnot_acquire(int i, atomic_t *v)
| {
| #if defined(arch_atomic_fetch_andnot_acquire)
| 	return arch_atomic_fetch_andnot_acquire(i, v);
| #elif defined(arch_atomic_fetch_andnot_relaxed)
| 	int ret = arch_atomic_fetch_andnot_relaxed(i, v);
| 	__atomic_acquire_fence();
| 	return ret;
| #elif defined(arch_atomic_fetch_andnot)
| 	return arch_atomic_fetch_andnot(i, v);
| #else
| 	return raw_atomic_fetch_and_acquire(~i, v);
| #endif
| }

Which is far easier to read. As we now always have a single copy of the
C prototype wrapping all the potential definitions, we now have an
obvious single location for kerneldoc comments.

At the same time, the fallbacks for raw_atomic*_xchg() are made to use
'new' rather than 'i' as the name of the new value. This is what the
existing fallback template used, and is more consistent with the
raw_atomic{_try,}cmpxchg() fallbacks.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230605070124.3741859-24-mark.rutland@arm.com


# 9257959a 05-Jun-2023 Mark Rutland <mark.rutland@arm.com>

locking/atomic: scripts: restructure fallback ifdeffery

Currently the various ordering variants of an atomic operation are
defined in groups of full/acquire/release/relaxed ordering variants with
some shared ifdeffery and several potential definitions of each ordering
variant in different branches of the shared ifdeffery.

As an ordering variant can have several potential definitions down
different branches of the shared ifdeffery, it can be painful for a
human to find a relevant definition, and we don't have a good location
to place anything common to all definitions of an ordering variant (e.g.
kerneldoc).

Historically the grouping of full/acquire/release/relaxed ordering
variants was necessary as we filled in the missing atomics in the same
namespace as the architecture used. It would be easy to accidentally
define one ordering fallback in terms of another ordering fallback with
redundant barriers, and avoiding that would otherwise require a lot of
baroque ifdeffery.

With recent changes we no longer need to fill in the missing atomics in
the arch_atomic*_<op>() namespace, and only need to fill in the
raw_atomic*_<op>() namespace. Due to this, there's no risk of a
namespace collision, and we can define each raw_atomic*_<op> ordering
variant with its own ifdeffery checking for the arch_atomic*_<op>
ordering variants.

Restructure the fallbacks in this way, with each ordering variant having
its own ifdeffery of the form:

| #if defined(arch_atomic_fetch_andnot_acquire)
| #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot_acquire
| #elif defined(arch_atomic_fetch_andnot_relaxed)
| static __always_inline int
| raw_atomic_fetch_andnot_acquire(int i, atomic_t *v)
| {
| 	int ret = arch_atomic_fetch_andnot_relaxed(i, v);
| 	__atomic_acquire_fence();
| 	return ret;
| }
| #elif defined(arch_atomic_fetch_andnot)
| #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot
| #else
| static __always_inline int
| raw_atomic_fetch_andnot_acquire(int i, atomic_t *v)
| {
| 	return raw_atomic_fetch_and_acquire(~i, v);
| }
| #endif

Note that where there's no relevant arch_atomic*_<op>() ordering
variant, we'll define the operation in terms of a distinct
raw_atomic*_<otherop>(), as this itself might have been filled in with a
fallback.

As we now generate the raw_atomic*_<op>() implementations directly, we
no longer need the trivial wrappers, so they are removed.

This makes the ifdeffery easier to follow, and will allow for further
improvements in subsequent patches.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230605070124.3741859-21-mark.rutland@arm.com


# d12157ef 05-Jun-2023 Mark Rutland <mark.rutland@arm.com>

locking/atomic: make atomic*_{cmp,}xchg optional

Most architectures define the atomic/atomic64 xchg and cmpxchg
operations in terms of arch_xchg and arch_cmpxchg respectively.

Add fallbacks for these cases and remove the trivial cases from arch
code. On some architectures the existing definitions are kept as these
are used to build other arch_atomic*() operations.
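
The kind of trivial definition being replaced by a fallback can be sketched
in userspace C11 as below. This is an assumption-laden illustration: the
sketch_* names and the sketch_atomic_t type are hypothetical stand-ins for
the generic xchg()/cmpxchg() helpers applied to the counter of an atomic
type.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for an atomic type wrapping a counter. */
typedef struct {
	_Atomic int counter;
} sketch_atomic_t;

/* Hypothetical generic helpers operating on a plain location. */
static int sketch_xchg(_Atomic int *ptr, int newval)
{
	return atomic_exchange(ptr, newval);
}

static int sketch_cmpxchg(_Atomic int *ptr, int old, int newval)
{
	/* On failure 'old' is overwritten with the value that was seen. */
	atomic_compare_exchange_strong(ptr, &old, newval);
	return old;
}

/* The atomic ops are thin wrappers applying the helpers to ->counter. */
int sketch_atomic_xchg(sketch_atomic_t *v, int newval)
{
	return sketch_xchg(&v->counter, newval);
}

int sketch_atomic_cmpxchg(sketch_atomic_t *v, int old, int newval)
{
	return sketch_cmpxchg(&v->counter, old, newval);
}

/* Usage: both ops return the value observed before the operation. */
void sketch_xchg_demo(void)
{
	sketch_atomic_t a = { 1 };

	assert(sketch_atomic_xchg(&a, 2) == 1);
	assert(sketch_atomic_cmpxchg(&a, 5, 9) == 2);	/* mismatch: no update */
	assert(sketch_atomic_cmpxchg(&a, 2, 9) == 2);	/* match: updated */
	assert(atomic_load(&a.counter) == 9);
}
```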

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230605070124.3741859-5-mark.rutland@arm.com


# 14d72d4b 05-Jun-2023 Mark Rutland <mark.rutland@arm.com>

locking/atomic: remove fallback comments

Currently a subset of the fallback templates have kerneldoc comments,
resulting in a haphazard set of generated kerneldoc comments as only
some operations have fallback templates to begin with.

We'd like to generate more consistent kerneldoc comments, and to do so
we'll need to restructure the way the fallback code is generated.

To minimize churn and to make it easier to restructure the fallback
code, this patch removes the existing kerneldoc comments from the
fallback templates. We can add new kerneldoc comments in subsequent
patches.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230605070124.3741859-3-mark.rutland@arm.com


# 8c8b096a 31-May-2023 Peter Zijlstra <peterz@infradead.org>

instrumentation: Wire up cmpxchg128()

Wire up the cmpxchg128 family in the atomic wrapper scripts.

These provide the generic cmpxchg128 family of functions from the
arch_ prefixed version, adding explicit instrumentation where needed.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20230531132323.519237070@infradead.org


# e6ce9d74 05-Apr-2023 Uros Bizjak <ubizjak@gmail.com>

locking/atomic: Add generic try_cmpxchg{,64}_local() support

Add generic support for try_cmpxchg{,64}_local() and their fallbacks.

These provide the generic try_cmpxchg_local family of functions
from the arch_ prefixed version, also adding explicit instrumentation.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20230405141710.3551-2-ubizjak@gmail.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>


# e5ab9eff 23-Mar-2023 Thomas Gleixner <tglx@linutronix.de>

atomics: Provide atomic_add_negative() variants

atomic_add_negative() does not provide the relaxed/acquire/release
variants.

Provide them in preparation for a new scalable reference count algorithm.
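
One way to picture the four ordering variants is to build them from a relaxed
operation plus fences, as in the userspace C11 sketch below. This is an
illustration only: the sketch_* names are hypothetical, and
atomic_thread_fence() stands in for the kernel's fence helpers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Atomically updates *v to (*v + i); true if the result is negative. */
bool sketch_add_negative_relaxed(int i, atomic_int *v)
{
	return atomic_fetch_add_explicit(v, i, memory_order_relaxed) + i < 0;
}

bool sketch_add_negative_acquire(int i, atomic_int *v)
{
	bool ret = sketch_add_negative_relaxed(i, v);

	atomic_thread_fence(memory_order_acquire);
	return ret;
}

bool sketch_add_negative_release(int i, atomic_int *v)
{
	atomic_thread_fence(memory_order_release);
	return sketch_add_negative_relaxed(i, v);
}

/* Fully ordered variant: full fences on both sides of the relaxed op. */
bool sketch_add_negative_full(int i, atomic_int *v)
{
	bool ret;

	atomic_thread_fence(memory_order_seq_cst);
	ret = sketch_add_negative_relaxed(i, v);
	atomic_thread_fence(memory_order_seq_cst);
	return ret;
}

/* Usage: all variants compute the same result; only ordering differs. */
void sketch_add_negative_demo(void)
{
	atomic_int v = 1;

	assert(!sketch_add_negative_relaxed(-1, &v));	/* 1 -> 0 */
	assert(sketch_add_negative_acquire(-1, &v));	/* 0 -> -1 */
	assert(!sketch_add_negative_release(1, &v));	/* -1 -> 0 */
	assert(sketch_add_negative_full(-2, &v));	/* 0 -> -2 */
}
```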

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20230323102800.101763813@linutronix.de


# 0aa7be05 15-May-2022 Uros Bizjak <ubizjak@gmail.com>

locking/atomic: Add generic try_cmpxchg64 support

Add generic support for try_cmpxchg64{,_acquire,_release,_relaxed}
and their fallbacks involving cmpxchg64.
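
A try_cmpxchg-style operation can be layered on a value-returning cmpxchg
roughly as follows. This is a userspace C11 sketch under assumptions: the
sketch_* names are hypothetical, and the real fallbacks are macros rather
than functions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cmpxchg64: returns the value observed at *ptr. */
static int64_t sketch_cmpxchg64(_Atomic int64_t *ptr, int64_t old,
				int64_t newval)
{
	atomic_compare_exchange_strong(ptr, &old, newval);
	return old;
}

/*
 * try_cmpxchg64 in terms of cmpxchg64: returns true on success; on
 * failure, writes the observed value back through oldp so that a retry
 * loop does not need to re-read the location.
 */
bool sketch_try_cmpxchg64(_Atomic int64_t *ptr, int64_t *oldp,
			  int64_t newval)
{
	int64_t expected = *oldp;
	int64_t seen = sketch_cmpxchg64(ptr, expected, newval);

	if (seen != expected)
		*oldp = seen;
	return seen == expected;
}

/* Usage: a typical retry loop that doubles the stored value. */
void sketch_try_cmpxchg64_demo(void)
{
	_Atomic int64_t v = 21;
	int64_t old = atomic_load(&v);

	while (!sketch_try_cmpxchg64(&v, &old, old * 2))
		;
	assert(atomic_load(&v) == 42);
}
```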

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220515184205.103089-2-ubizjak@gmail.com


# dc1b4df0 07-Feb-2022 Mark Rutland <mark.rutland@arm.com>

atomics: Fix atomic64_{read_acquire,set_release} fallbacks

Arnd reports that on 32-bit architectures, the fallbacks for
atomic64_read_acquire() and atomic64_set_release() are broken as they
use smp_load_acquire() and smp_store_release() respectively, which do
not work on types larger than the native word size.

Since those contain compiletime_assert_atomic_type(), any attempt to use
those fallbacks will result in a build-time error. e.g. with the
following added to arch/arm/kernel/setup.c:

| void test_atomic64(atomic64_t *v)
| {
| 	atomic64_set_release(v, 5);
| 	atomic64_read_acquire(v);
| }

The compiler will complain as follows:

| In file included from <command-line>:
| In function 'arch_atomic64_set_release',
| inlined from 'test_atomic64' at ./include/linux/atomic/atomic-instrumented.h:669:2:
| ././include/linux/compiler_types.h:346:38: error: call to '__compiletime_assert_9' declared with attribute error: Need native word sized stores/loads for atomicity.
| 346 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
| | ^
| ././include/linux/compiler_types.h:327:4: note: in definition of macro '__compiletime_assert'
| 327 | prefix ## suffix(); \
| | ^~~~~~
| ././include/linux/compiler_types.h:346:2: note: in expansion of macro '_compiletime_assert'
| 346 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
| | ^~~~~~~~~~~~~~~~~~~
| ././include/linux/compiler_types.h:349:2: note: in expansion of macro 'compiletime_assert'
| 349 | compiletime_assert(__native_word(t), \
| | ^~~~~~~~~~~~~~~~~~
| ./include/asm-generic/barrier.h:133:2: note: in expansion of macro 'compiletime_assert_atomic_type'
| 133 | compiletime_assert_atomic_type(*p); \
| | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| ./include/asm-generic/barrier.h:164:55: note: in expansion of macro '__smp_store_release'
| 164 | #define smp_store_release(p, v) do { kcsan_release(); __smp_store_release(p, v); } while (0)
| | ^~~~~~~~~~~~~~~~~~~
| ./include/linux/atomic/atomic-arch-fallback.h:1270:2: note: in expansion of macro 'smp_store_release'
| 1270 | smp_store_release(&(v)->counter, i);
| | ^~~~~~~~~~~~~~~~~
| make[2]: *** [scripts/Makefile.build:288: arch/arm/kernel/setup.o] Error 1
| make[1]: *** [scripts/Makefile.build:550: arch/arm/kernel] Error 2
| make: *** [Makefile:1831: arch/arm] Error 2

Fix this by only using smp_load_acquire() and smp_store_release() for
native atomic types, and otherwise falling back to the regular barriers
necessary for acquire/release semantics, as we do in the more generic
acquire and release fallbacks.

Since the fallback templates are used to generate the atomic64_*() and
atomic_*() operations, the __native_word() check is added to both. For
the atomic_*() operations, which are always 32-bit, the __native_word()
check is redundant but not harmful, as it is always true.
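
The selection logic described above can be sketched in userspace C11 as
follows. This is an illustration, not the generated fallback: the sketch_*
names are hypothetical, C11 acquire/release accesses stand in for
smp_load_acquire() and smp_store_release(), and atomic_thread_fence() stands
in for the regular barriers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Mirrors the idea of __native_word(): is the type a native word size? */
#define sketch_native_word(t) \
	(sizeof(t) == sizeof(char) || sizeof(t) == sizeof(short) || \
	 sizeof(t) == sizeof(int) || sizeof(t) == sizeof(long))

int64_t sketch_atomic64_read_acquire(_Atomic int64_t *v)
{
	int64_t ret;

	if (sketch_native_word(int64_t)) {
		/* Native word: a single load-acquire is sufficient. */
		ret = atomic_load_explicit(v, memory_order_acquire);
	} else {
		/* Otherwise: relaxed read followed by an acquire fence. */
		ret = atomic_load_explicit(v, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
	}
	return ret;
}

void sketch_atomic64_set_release(_Atomic int64_t *v, int64_t i)
{
	if (sketch_native_word(int64_t)) {
		/* Native word: a single store-release is sufficient. */
		atomic_store_explicit(v, i, memory_order_release);
	} else {
		/* Otherwise: release fence followed by a relaxed store. */
		atomic_thread_fence(memory_order_release);
		atomic_store_explicit(v, i, memory_order_relaxed);
	}
}

/* Usage: mirrors the test_atomic64() example from the commit message. */
void sketch_atomic64_demo(void)
{
	_Atomic int64_t v = 0;

	sketch_atomic64_set_release(&v, 5);
	assert(sketch_atomic64_read_acquire(&v) == 5);
}
```

Because the size check is a compile-time constant, the compiler can discard
the branch that does not apply to the target.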

For the example above this works as expected on 32-bit, e.g. for arm
multi_v7_defconfig:

| <test_atomic64>:
| push {r4, r5}
| dmb ish
| pldw [r0]
| mov r2, #5
| mov r3, #0
| ldrexd r4, [r0]
| strexd r4, r2, [r0]
| teq r4, #0
| bne 484 <test_atomic64+0x14>
| ldrexd r2, [r0]
| dmb ish
| pop {r4, r5}
| bx lr

... and also on 64-bit, e.g. for arm64 defconfig:

| <test_atomic64>:
| bti c
| paciasp
| mov x1, #0x5
| stlr x1, [x0]
| ldar x0, [x0]
| autiasp
| ret

Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lore.kernel.org/r/20220207101943.439825-1-mark.rutland@arm.com


# e3d18cee 13-Jul-2021 Mark Rutland <mark.rutland@arm.com>

locking/atomic: centralize generated headers

The generated atomic headers are only intended to be included directly
by <linux/atomic.h>, but are spread across include/linux/ and
include/asm-generic/, where people may be encouraged to include them.

This patch centralizes them under include/linux/atomic/.

Other than the header guards and hashes, there is no change to any of
the generated headers as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210713105253.7615-4-mark.rutland@arm.com