History log of /linux-master/arch/arm64/include/asm/atomic_ll_sc.h
# febe950d 31-May-2023 Peter Zijlstra <peterz@infradead.org>

arch: Remove cmpxchg_double

No more users, remove the monster.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20230531132323.991907085@infradead.org


# b23e139d 31-May-2023 Peter Zijlstra <peterz@infradead.org>

arch: Introduce arch_{,try_}_cmpxchg128{,_local}()

For all architectures that currently support cmpxchg_double()
implement the cmpxchg128() family of functions that is basically the
same but with a saner interface.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20230531132323.452120708@infradead.org


# 031af500 04-Jan-2023 Mark Rutland <mark.rutland@arm.com>

arm64: cmpxchg_double*: hazard against entire exchange variable

The inline assembly for arm64's cmpxchg_double*() implementations uses a
+Q constraint to hazard against other accesses to the memory location
being exchanged. However, the pointer passed to the constraint is a
pointer to unsigned long, and thus the hazard only applies to the first
8 bytes of the location.

GCC can take advantage of this, assuming that other portions of the
location are unchanged, leading to a number of potential problems.

This is similar to what we fixed back in commit:

fee960bed5e857eb ("arm64: xchg: hazard against entire exchange variable")

... but we forgot to adjust cmpxchg_double*() similarly at the same
time.

The same problem applies, as demonstrated with the following test:

| struct big {
| 	u64 lo, hi;
| } __aligned(128);
|
| unsigned long foo(struct big *b)
| {
| 	u64 hi_old, hi_new;
|
| 	hi_old = b->hi;
| 	cmpxchg_double_local(&b->lo, &b->hi, 0x12, 0x34, 0x56, 0x78);
| 	hi_new = b->hi;
|
| 	return hi_old ^ hi_new;
| }

... which GCC 12.1.0 compiles as:

| 0000000000000000 <foo>:
| 0: d503233f paciasp
| 4: aa0003e4 mov x4, x0
| 8: 1400000e b 40 <foo+0x40>
| c: d2800240 mov x0, #0x12 // #18
| 10: d2800681 mov x1, #0x34 // #52
| 14: aa0003e5 mov x5, x0
| 18: aa0103e6 mov x6, x1
| 1c: d2800ac2 mov x2, #0x56 // #86
| 20: d2800f03 mov x3, #0x78 // #120
| 24: 48207c82 casp x0, x1, x2, x3, [x4]
| 28: ca050000 eor x0, x0, x5
| 2c: ca060021 eor x1, x1, x6
| 30: aa010000 orr x0, x0, x1
| 34: d2800000 mov x0, #0x0 // #0 <--- BANG
| 38: d50323bf autiasp
| 3c: d65f03c0 ret
| 40: d2800240 mov x0, #0x12 // #18
| 44: d2800681 mov x1, #0x34 // #52
| 48: d2800ac2 mov x2, #0x56 // #86
| 4c: d2800f03 mov x3, #0x78 // #120
| 50: f9800091 prfm pstl1strm, [x4]
| 54: c87f1885 ldxp x5, x6, [x4]
| 58: ca0000a5 eor x5, x5, x0
| 5c: ca0100c6 eor x6, x6, x1
| 60: aa0600a6 orr x6, x5, x6
| 64: b5000066 cbnz x6, 70 <foo+0x70>
| 68: c8250c82 stxp w5, x2, x3, [x4]
| 6c: 35ffff45 cbnz w5, 54 <foo+0x54>
| 70: d2800000 mov x0, #0x0 // #0 <--- BANG
| 74: d50323bf autiasp
| 78: d65f03c0 ret

Notice that at the lines with "BANG" comments, GCC has assumed that the
higher 8 bytes are unchanged by the cmpxchg_double() call, and that
`hi_old ^ hi_new` can be reduced to a constant zero, for both LSE and
LL/SC versions of cmpxchg_double().

This patch fixes the issue by passing a pointer to __uint128_t into the
+Q constraint, ensuring that the compiler hazards against the entire 16
bytes being modified.

With this change, GCC 12.1.0 compiles the above test as:

| 0000000000000000 <foo>:
| 0: f9400407 ldr x7, [x0, #8]
| 4: d503233f paciasp
| 8: aa0003e4 mov x4, x0
| c: 1400000f b 48 <foo+0x48>
| 10: d2800240 mov x0, #0x12 // #18
| 14: d2800681 mov x1, #0x34 // #52
| 18: aa0003e5 mov x5, x0
| 1c: aa0103e6 mov x6, x1
| 20: d2800ac2 mov x2, #0x56 // #86
| 24: d2800f03 mov x3, #0x78 // #120
| 28: 48207c82 casp x0, x1, x2, x3, [x4]
| 2c: ca050000 eor x0, x0, x5
| 30: ca060021 eor x1, x1, x6
| 34: aa010000 orr x0, x0, x1
| 38: f9400480 ldr x0, [x4, #8]
| 3c: d50323bf autiasp
| 40: ca0000e0 eor x0, x7, x0
| 44: d65f03c0 ret
| 48: d2800240 mov x0, #0x12 // #18
| 4c: d2800681 mov x1, #0x34 // #52
| 50: d2800ac2 mov x2, #0x56 // #86
| 54: d2800f03 mov x3, #0x78 // #120
| 58: f9800091 prfm pstl1strm, [x4]
| 5c: c87f1885 ldxp x5, x6, [x4]
| 60: ca0000a5 eor x5, x5, x0
| 64: ca0100c6 eor x6, x6, x1
| 68: aa0600a6 orr x6, x5, x6
| 6c: b5000066 cbnz x6, 78 <foo+0x78>
| 70: c8250c82 stxp w5, x2, x3, [x4]
| 74: 35ffff45 cbnz w5, 5c <foo+0x5c>
| 78: f9400480 ldr x0, [x4, #8]
| 7c: d50323bf autiasp
| 80: ca0000e0 eor x0, x7, x0
| 84: d65f03c0 ret

... sampling the high 8 bytes before and after the cmpxchg, and
performing an EOR, as we'd expect.
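
As a rough sketch of the post-fix LL/SC shape (illustrative only, not the
kernel's exact __CMPXCHG_DBL macro; the function name is invented), the
key detail is the "+Q" operand now covering the whole 16-byte location
via a __uint128_t lvalue:

| static inline long cmpxchg_double_sketch(unsigned long old1, unsigned long old2,
| 					  unsigned long new1, unsigned long new2,
| 					  volatile void *ptr)
| {
| 	unsigned long tmp, ret;
|
| 	asm volatile(
| 	"	prfm	pstl1strm, %2\n"
| 	"1:	ldxp	%0, %1, %2\n"
| 	"	eor	%0, %0, %3\n"
| 	"	eor	%1, %1, %4\n"
| 	"	orr	%1, %0, %1\n"
| 	"	cbnz	%1, 2f\n"
| 	"	stxp	%w0, %5, %6, %2\n"
| 	"	cbnz	%w0, 1b\n"
| 	"2:"
| 	: "=&r" (tmp), "=&r" (ret),
| 	  "+Q" (*(__uint128_t *)ptr)	/* hazard covers all 16 bytes */
| 	: "r" (old1), "r" (old2), "r" (new1), "r" (new2));
|
| 	return ret;	/* 0 on success, non-zero on comparison failure */
| }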

For backporting, I've tested this atop linux-4.9.y with GCC 5.5.0. Note
that linux-4.9.y is the oldest currently supported stable release, and
mandates GCC 5.1+. Unfortunately I couldn't get a GCC 5.1 binary to run
on my machines due to library incompatibilities.

I've also used a standalone test to check that we can use a __uint128_t
pointer in a +Q constraint at least as far back as GCC 4.8.5 and LLVM
3.9.1.

Fixes: 5284e1b4bc8a ("arm64: xchg: Implement cmpxchg_double")
Fixes: e9a4b795652f ("arm64: cmpxchg_dbl: patch in lse instructions when supported by the CPU")
Reported-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lore.kernel.org/lkml/Y6DEfQXymYVgL3oJ@boqun-archlinux/
Reported-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/lkml/Y6GXoO4qmH9OIZ5Q@hirez.programming.kicks-ass.net/
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: stable@vger.kernel.org
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230104151626.3262137-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>


# 78f6f5c9 17-Aug-2022 Mark Rutland <mark.rutland@arm.com>

arm64: atomic: always inline the assembly

The __lse_*() and __ll_sc_*() atomic implementations are marked as
inline rather than __always_inline, permitting a compiler to generate
out-of-line versions, which may be instrumented.

We marked the atomic wrappers as __always_inline in commit:

c35a824c31834d94 ("arm64: make atomic helpers __always_inline")

... but did not think to do the same for the underlying implementations.

If the compiler were to out-of-line an LSE or LL/SC atomic, this could
break noinstr code. Ensure this doesn't happen by marking the underlying
implementations as __always_inline.

There should be no functional change as a result of this patch.
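
As a minimal illustration of the distinction (the macro below mirrors what
the kernel's <linux/compiler_types.h> provides; the function body is a
stand-in, not kernel code):

| /* With plain `inline` the compiler may still emit a standalone copy of
|  * the function, and that copy can be instrumented (KASAN, kcov, ...),
|  * which is exactly what noinstr callers must avoid. __always_inline
|  * forces the body into every caller. */
| #ifndef __always_inline
| #define __always_inline inline __attribute__((__always_inline__))
| #endif
|
| static __always_inline int ll_sc_stand_in(int *p, int i)
| {
| 	/* stands in for an __ll_sc_*() body */
| 	return __atomic_fetch_add(p, i, __ATOMIC_RELAXED);
| }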

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20220817155914.3975112-3-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# b2c3ccbd 17-Aug-2022 Mark Rutland <mark.rutland@arm.com>

arm64: atomics: remove LL/SC trampolines

When CONFIG_ARM64_LSE_ATOMICS=y, each use of an LL/SC atomic results in
a fragment of code being generated in a subsection without a clear
association with its caller. A trampoline in the caller branches to the
LL/SC atomic with a direct branch, and the atomic directly branches
back into its trampoline.

This breaks backtracing, as any PC within the out-of-line fragment will
be symbolized as an offset from the nearest prior symbol (which may not
be the function using the atomic), and since the atomic returns with a
direct branch, the caller's PC may be missing from the backtrace.

For example, with secondary_start_kernel() hacked to contain
atomic_inc(NULL), the resulting exception can be reported as being taken
from cpus_are_stuck_in_kernel():

| Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
| Mem abort info:
| ESR = 0x0000000096000004
| EC = 0x25: DABT (current EL), IL = 32 bits
| SET = 0, FnV = 0
| EA = 0, S1PTW = 0
| FSC = 0x04: level 0 translation fault
| Data abort info:
| ISV = 0, ISS = 0x00000004
| CM = 0, WnR = 0
| [0000000000000000] user address but active_mm is swapper
| Internal error: Oops: 96000004 [#1] PREEMPT SMP
| Modules linked in:
| CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.19.0-11219-geb555cb5b794-dirty #3
| Hardware name: linux,dummy-virt (DT)
| pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
| pc : cpus_are_stuck_in_kernel+0xa4/0x120
| lr : secondary_start_kernel+0x164/0x170
| sp : ffff80000a4cbe90
| x29: ffff80000a4cbe90 x28: 0000000000000000 x27: 0000000000000000
| x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
| x23: 0000000000000000 x22: 0000000000000000 x21: 0000000000000000
| x20: 0000000000000001 x19: 0000000000000001 x18: 0000000000000008
| x17: 3030383832343030 x16: 3030303030307830 x15: ffff80000a4cbab0
| x14: 0000000000000001 x13: 5d31666130663133 x12: 3478305b20313030
| x11: 3030303030303078 x10: 3020726f73736563 x9 : 726f737365636f72
| x8 : ffff800009ff2ef0 x7 : 0000000000000003 x6 : 0000000000000000
| x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000100
| x2 : 0000000000000000 x1 : ffff0000029bd880 x0 : 0000000000000000
| Call trace:
| cpus_are_stuck_in_kernel+0xa4/0x120
| __secondary_switched+0xb0/0xb4
| Code: 35ffffa3 17fffc6c d53cd040 f9800011 (885f7c01)
| ---[ end trace 0000000000000000 ]---

This is confusing and hinders debugging, and will be problematic for
CONFIG_LIVEPATCH as these cases cannot be unwound reliably.

This is very similar to recent issues with out-of-line exception fixups,
which were removed in commits:

35d67794b8828333 ("arm64: lib: __arch_clear_user(): fold fixups into body")
4012e0e22739eef9 ("arm64: lib: __arch_copy_from_user(): fold fixups into body")
139f9ab73d60cf76 ("arm64: lib: __arch_copy_to_user(): fold fixups into body")

When the trampolines were introduced in commit:

addfc38672c73efd ("arm64: atomics: avoid out-of-line ll/sc atomics")

The rationale was to improve icache performance by grouping the LL/SC
atomics together. This has never been measured, and this theoretical
benefit is outweighed by other factors:

* As the subsections are collapsed into sections at object file
granularity, these are spread out throughout the kernel and can share
cachelines with unrelated code regardless.

* GCC 12.1.0 has been observed to place the trampoline out-of-line in
specialised __ll_sc_*() functions, introducing more branching than was
intended.

* Removing the trampolines has been observed to shrink a defconfig
kernel Image by 64KiB when building with GCC 12.1.0.

This patch removes the LL/SC trampolines, meaning that the LL/SC atomics
will be inlined into their callers (or placed in out-of-line functions
using regular BL/RET pairs). When CONFIG_ARM64_LSE_ATOMICS=y, the LL/SC
atomics are always called in an unlikely branch, and will be placed in a
cold portion of the function, so this should have minimal impact to the
hot paths.

Other than the improved backtracing, there should be no functional
change as a result of this patch.
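
As a rough sketch of the resulting shape (simplified from the kernel's
asm/lse.h dispatch; the _sketch name is invented, the other names follow
the kernel):

| static __always_inline int arch_atomic_add_return_sketch(int i, atomic_t *v)
| {
| 	/* LSE available: the op runs inline in the hot path */
| 	if (system_uses_lse_atomics())
| 		return __lse_atomic_add_return(i, v);
|
| 	/* Unlikely LL/SC fallback: now plain inline code in the caller's
| 	 * cold portion (or an out-of-line function using BL/RET), rather
| 	 * than a trampoline into a subsection. */
| 	return __ll_sc_atomic_add_return(i, v);
| }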

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20220817155914.3975112-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 8e6082e9 10-Dec-2021 Mark Rutland <mark.rutland@arm.com>

arm64: atomics: format whitespace consistently

The code for the atomic ops is formatted inconsistently, and while this
is not a functional problem it is rather distracting when working on
them.

Some ops have consistent indentation, e.g.

| #define ATOMIC_OP_ADD_RETURN(name, mb, cl...)				\
| static inline int __lse_atomic_add_return##name(int i, atomic_t *v)	\
| {									\
| 	u32 tmp;							\
| 									\
| 	asm volatile(							\
| 	__LSE_PREAMBLE							\
| 	"	ldadd" #mb "	%w[i], %w[tmp], %[v]\n"			\
| 	"	add	%w[i], %w[i], %w[tmp]"				\
| 	: [i] "+r" (i), [v] "+Q" (v->counter), [tmp] "=&r" (tmp)	\
| 	: "r" (v)							\
| 	: cl);								\
| 									\
| 	return i;							\
| }

While others have negative indentation for some lines, and/or have
misaligned trailing backslashes, e.g.

| static inline void __lse_atomic_##op(int i, atomic_t *v)		\
| {									\
| 	asm volatile(							\
| 	__LSE_PREAMBLE							\
| "	" #asm_op "	%w[i], %[v]\n"					\
| 	: [i] "+r" (i), [v] "+Q" (v->counter)				\
| 	: "r" (v));							\
| }

This patch makes the indentation consistent and also aligns the trailing
backslashes. This makes the code easier to read for those (like myself)
who are easily distracted by these inconsistencies.

This is intended as a cleanup.
There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20211210151410.2782645-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 395af861 15-Jan-2020 Catalin Marinas <catalin.marinas@arm.com>

arm64: Move the LSE gas support detection to Kconfig

As the Kconfig syntax gained support for $(as-instr) tests, move the LSE
gas support detection from Makefile to the main arm64 Kconfig and remove
the additional CONFIG_AS_LSE definition and check.

Cc: Will Deacon <will@kernel.org>
Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>


# 03adcbd9 29-Aug-2019 Will Deacon <will@kernel.org>

arm64: atomics: Use K constraint when toolchain appears to support it

The 'K' constraint is a documented AArch64 machine constraint supported
by GCC for matching integer constants that can be used with a 32-bit
logical instruction. Unfortunately, some released compilers erroneously
accept the immediate '4294967295' for this constraint, which is later
refused by GAS at assembly time. This has led us to avoid the use of
the 'K' constraint altogether.

Instead, detect whether the compiler is up to the job when building the
kernel and pass the 'K' constraint to our 32-bit atomic macros when it
appears to be supported.

Signed-off-by: Will Deacon <will@kernel.org>


# addfc386 28-Aug-2019 Andrew Murray <amurray@thegoodpenguin.co.uk>

arm64: atomics: avoid out-of-line ll/sc atomics

When building for LSE atomics (CONFIG_ARM64_LSE_ATOMICS), if the hardware
or toolchain doesn't support it, the existing code will fall back to ll/sc
atomics. It achieves this by branching from inline assembly to a function
that is built with special compile flags. Further, this results in the
clobbering of registers even when the fallback isn't used, increasing
register pressure.

Improve this by providing inline implementations of both LSE and
ll/sc and using a static key to select between them, which allows the
compiler to generate better atomics code. Put the LL/SC fallback atomics
in their own subsection to improve icache performance.

Signed-off-by: Andrew Murray <andrew.murray@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>


# 580fa1b8 28-Aug-2019 Andrew Murray <amurray@thegoodpenguin.co.uk>

arm64: Use correct ll/sc atomic constraints

The A64 ISA accepts distinct (but overlapping) ranges of immediates for:

* add arithmetic instructions ('I' machine constraint)
* sub arithmetic instructions ('J' machine constraint)
* 32-bit logical instructions ('K' machine constraint)
* 64-bit logical instructions ('L' machine constraint)

... but we currently use the 'I' constraint for many atomic operations
using sub or logical instructions, which is not always valid.

When CONFIG_ARM64_LSE_ATOMICS is not set, this allows invalid immediates
to be passed to instructions, potentially resulting in a build failure.
When CONFIG_ARM64_LSE_ATOMICS is selected the out-of-line ll/sc atomics
always use a register as they have no visibility of the value passed by
the caller.

This patch adds a constraint parameter to the ATOMIC_xx and
__CMPXCHG_CASE macros so that we can pass appropriate constraints for
each case, with uses updated accordingly.

Unfortunately prior to GCC 8.1.0 the 'K' constraint erroneously accepted
'4294967295', so we must instead force the use of a register.
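
As a rough sketch of the parameterisation (simplified; the real macros in
atomic_ll_sc.h stringify the constraint in the same way, but the _SKETCH
names are invented):

| #define ATOMIC_OP_SKETCH(op, asm_op, constraint)			\
| static inline void ll_sc_atomic_##op(int i, int *counter)		\
| {									\
| 	int result;							\
| 	unsigned long tmp;						\
| 									\
| 	asm volatile(							\
| 	"	prfm	pstl1strm, %2\n"				\
| 	"1:	ldxr	%w0, %2\n"					\
| 	"	" #asm_op "	%w0, %w0, %w3\n"			\
| 	"	stxr	%w1, %w0, %2\n"					\
| 	"	cbnz	%w1, 1b"					\
| 	: "=&r" (result), "=&r" (tmp), "+Q" (*counter)			\
| 	: #constraint "r" (i));						\
| }
|
| ATOMIC_OP_SKETCH(add, add, I)	/* add accepts 'I' immediates */
| ATOMIC_OP_SKETCH(sub, sub, J)	/* sub accepts 'J' immediates */
| ATOMIC_OP_SKETCH(and, and, K)	/* 32-bit logical op: 'K' (or "r" on
| 				 * pre-8.1.0 GCC, per the note above) */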

Signed-off-by: Andrew Murray <andrew.murray@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>


# caab277b 02-Jun-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 234

Based on 1 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation this program is
distributed in the hope that it will be useful but without any
warranty without even the implied warranty of merchantability or
fitness for a particular purpose see the gnu general public license
for more details you should have received a copy of the gnu general
public license along with this program if not see http www gnu org
licenses

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-only

has been chosen to replace the boilerplate/reference in 503 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Enrico Weigelt <info@metux.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190602204653.811534538@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 16f18688 22-May-2019 Mark Rutland <mark.rutland@arm.com>

locking/atomic, arm64: Use s64 for atomic64

As a step towards making the atomic64 API use consistent types treewide,
let's have the arm64 atomic64 implementation use s64 as the underlying
type for atomic64_t, rather than long, matching the generated headers.

As atomic64_read() depends on the generic definition of atomic64_t, this
still returns long. This will be converted in a subsequent patch.

Note that in arch_atomic64_dec_if_positive(), the x0 variable is left as
long, as this variable is also used to hold the pointer to the
atomic64_t.

Otherwise, there should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aou@eecs.berkeley.edu
Cc: arnd@arndb.de
Cc: bp@alien8.de
Cc: davem@davemloft.net
Cc: fenghua.yu@intel.com
Cc: heiko.carstens@de.ibm.com
Cc: herbert@gondor.apana.org.au
Cc: ink@jurassic.park.msu.ru
Cc: jhogan@kernel.org
Cc: linux@armlinux.org.uk
Cc: mattst88@gmail.com
Cc: mpe@ellerman.id.au
Cc: palmer@sifive.com
Cc: paul.burton@mips.com
Cc: paulus@samba.org
Cc: ralf@linux-mips.org
Cc: rth@twiddle.net
Cc: tony.luck@intel.com
Cc: vgupta@synopsys.com
Link: https://lkml.kernel.org/r/20190522132250.26499-8-mark.rutland@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 42305099 18-Sep-2018 Will Deacon <will@kernel.org>

arm64: cmpxchg: Use "K" instead of "L" for ll/sc immediate constraint

The "L" AArch64 machine constraint, which we use for the "old" value in
an LL/SC cmpxchg(), generates an immediate that is suitable for a 64-bit
logical instruction. However, for cmpxchg() operations on types smaller
than 64 bits, this constraint can result in an invalid instruction which
is correctly rejected by GAS, such as EOR W1, W1, #0xffffffff.

Whilst we could special-case the constraint based on the cmpxchg size,
it's far easier to change the constraint to "K" and put up with using
a register for large 64-bit immediates. For out-of-line LL/SC atomics,
this is all moot anyway.

Reported-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# b4f9209b 13-Sep-2018 Will Deacon <will@kernel.org>

arm64: Avoid masking "old" for LSE cmpxchg() implementation

The CAS instructions implicitly access only the relevant bits of the "old"
argument, so there is no need for explicit masking via type-casting as
there is in the LL/SC implementation.

Move the casting into the LL/SC code and remove it altogether for the LSE
implementation.

Signed-off-by: Will Deacon <will.deacon@arm.com>


# 5ef3fe4c 13-Sep-2018 Will Deacon <will@kernel.org>

arm64: Avoid redundant type conversions in xchg() and cmpxchg()

Our atomic instructions (either LSE atomics or LDXR/STXR sequences)
natively support byte, half-word, word and double-word memory accesses,
so there is no need to mask the data register prior to being stored.

Signed-off-by: Will Deacon <will.deacon@arm.com>


# c0df1081 04-Sep-2018 Mark Rutland <mark.rutland@arm.com>

arm64, locking/atomics: Use instrumented atomics

Now that the generic atomic headers provide instrumented wrappers of all
the atomics implemented by arm64, let's migrate arm64 over to these.

The additional instrumentation will help to find bugs (e.g. when fuzzing
with Syzkaller).

Mostly this change involves adding an arch_ prefix to a number of
function names and macro definitions. When LSE atomics are used, the
out-of-line LL/SC atomics will be named __ll_sc_arch_atomic_${OP}.

Adding the arch_ prefix requires some whitespace fixups to keep things
aligned. Some other unusual whitespace is fixed up at the same time
(e.g. in the cmpxchg wrappers).
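
As a rough sketch of the resulting layering (approximately what the
generic instrumented wrappers of this era do; the _sketch name is
invented):

| /* The generic wrapper instruments the access, then calls the
|  * arch_-prefixed arm64 implementation, which may in turn dispatch to
|  * an out-of-line __ll_sc_arch_atomic_* routine. */
| static __always_inline void atomic_inc_sketch(atomic_t *v)
| {
| 	kasan_check_write(v, sizeof(*v));
| 	arch_atomic_inc(v);
| }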

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: linuxdrivers@attotech.com
Cc: dvyukov@google.com
Cc: boqun.feng@gmail.com
Cc: arnd@arndb.de
Cc: aryabinin@virtuozzo.com
Cc: glider@google.com
Link: http://lkml.kernel.org/r/20180904104830.2975-7-mark.rutland@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 8df728e1 12-May-2017 Robin Murphy <robin.murphy@arm.com>

arm64: Remove redundant mov from LL/SC cmpxchg

The cmpxchg implementation introduced by commit c342f78217e8 ("arm64:
cmpxchg: patch in lse instructions when supported by the CPU") performs
an apparently redundant register move of [old] to [oldval] in the
success case - it always uses the same register width as [oldval] was
originally loaded with, and is only executed when [old] and [oldval] are
known to be equal anyway.

The only effect it seemingly does have is to take up a surprising amount
of space in the kernel text, as removing it reveals:

text data bss dec hex filename
12426658 1348614 4499749 18275021 116dacd vmlinux.o.new
12429238 1348614 4499749 18277601 116e4e1 vmlinux.o.old

Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# e490f9b1 17-Apr-2016 Peter Zijlstra <peterz@infradead.org>

locking/atomic, arch/arm64: Implement atomic{,64}_fetch_{add,sub,and,andnot,or,xor}{,_relaxed,_acquire,_release}()

Implement FETCH-OP atomic primitives, these are very similar to the
existing OP-RETURN primitives we already have, except they return the
value of the atomic variable _before_ modification.

This is especially useful for irreversible operations -- such as
bitops (because it becomes impossible to reconstruct the state prior
to modification).
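
A rough sketch of the difference on the LL/SC side (relaxed flavour,
illustrative only; the _sketch name is invented):

| /* *_return ops hand back the value after the operation; fetch_* ops
|  * hand back the value loaded before it, which is what irreversible
|  * operations such as bitops need. */
| static inline int ll_sc_atomic_fetch_add_sketch(int i, int *counter)
| {
| 	int val, newval;
| 	unsigned long tmp;
|
| 	asm volatile(
| 	"1:	ldxr	%w0, %3\n"
| 	"	add	%w1, %w0, %w4\n"
| 	"	stxr	%w2, %w1, %3\n"
| 	"	cbnz	%w2, 1b"
| 	: "=&r" (val), "=&r" (newval), "=&r" (tmp), "+Q" (*counter)
| 	: "r" (i));
|
| 	return val;	/* old value, prior to modification */
| }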

[wildea01: compile fixes for ll/sc]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arch@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 57a65667 05-Nov-2015 Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>

arm64: cmpxchg_dbl: fix return value type

The current arm64 __cmpxchg_double{_mb} implementations carry out the
compare exchange by first comparing the old values passed in to the
values read from the pointer provided and by stashing the cumulative
bitwise difference in a 64-bit register.

By comparing the register content against 0, it is possible to detect if
the values read differ from the old values passed in, so that the compare
exchange detects whether it has to bail out or carry on completing the
operation with the exchange.

Given the current implementation, to detect the cmpxchg operation
status, the __cmpxchg_double{_mb} functions should return the 64-bit
stashed bitwise difference so that the caller can detect cmpxchg failure
by comparing the return value content against 0. The current implementation
declares the return value as an int, which means that the 64-bit
value stashing the bitwise difference is truncated before being
returned to the __cmpxchg_double{_mb} callers, which means that
any bitwise difference present in the top 32 bits goes undetected,
triggering false positives and subsequent kernel failures.

This patch fixes the issue by declaring the arm64 __cmpxchg_double{_mb}
return values as a long, so that the bitwise difference is
properly propagated on failure, restoring the expected behaviour.
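
A standalone illustration of the truncation (the helper names are
invented for the example):

| #include <stdint.h>
|
| /* diff is the 64-bit cumulative bitwise difference computed by the
|  * LL/SC sequence; 0 means success. */
| static int status_as_int(uint64_t diff)	/* pre-fix behaviour */
| {
| 	return diff;				/* top 32 bits discarded */
| }
|
| static long status_as_long(uint64_t diff)	/* post-fix behaviour */
| {
| 	return diff;				/* full difference kept (LP64) */
| }
|
| /* With diff = 1ULL << 40 (values differ only above bit 31):
|  *   status_as_int()  == 0   ->  cmpxchg wrongly reported successful
|  *   status_as_long() != 0   ->  failure correctly detected
|  */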

Fixes: e9a4b795652f ("arm64: cmpxchg_dbl: patch in lse instructions when supported by the CPU")
Cc: <stable@vger.kernel.org> # 4.3+
Cc: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 305d454a 08-Oct-2015 Will Deacon <will@kernel.org>

arm64: atomics: implement native {relaxed, acquire, release} atomics

Commit 654672d4ba1a ("locking/atomics: Add _{acquire|release|relaxed}()
variants of some atomic operation") introduced a relaxed atomic API to
Linux that maps nicely onto the arm64 memory model, including the new
ARMv8.1 atomic instructions.

This patch hooks up the API to our relaxed atomic instructions, rather
than have them all expand to the full-barrier variants as they do
currently.
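
A rough sketch of how a variant maps onto the exclusives (illustrative
only; the kernel generates these from macros): relaxed uses ldxr/stxr
with no barrier, acquire uses ldaxr, release uses stlxr.

| /* Acquire flavour: the ordering comes from the load-acquire exclusive,
|  * so no trailing dmb is needed. */
| static inline int ll_sc_add_return_acquire_sketch(int i, int *counter)
| {
| 	int result;
| 	unsigned long tmp;
|
| 	asm volatile(
| 	"1:	ldaxr	%w0, %2\n"
| 	"	add	%w0, %w0, %w3\n"
| 	"	stxr	%w1, %w0, %2\n"
| 	"	cbnz	%w1, 1b"
| 	: "=&r" (result), "=&r" (tmp), "+Q" (*counter)
| 	: "r" (i)
| 	: "memory");
|
| 	return result;
| }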

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 7f08a414 04-Aug-2015 Mark Rutland <mark.rutland@arm.com>

arm64: make ll/sc __cmpxchg_case_##name asm consistent

The ll/sc __cmpxchg_case_##name assembly mostly uses symbolic names for
operands, but in a single case uses %2 to refer to what is otherwise
known as %[v]. This makes the code more painful to read than is
necessary.

Use %[v] instead.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# db26217e 29-May-2015 Will Deacon <will@kernel.org>

arm64: atomic64_dec_if_positive: fix incorrect branch condition

If we attempt atomic64_dec_if_positive() on INT_MIN, we will underflow
and incorrectly decide that the original parameter was positive.

This patch fixes the broken condition code so that we handle this
corner case correctly.
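
A rough sketch of the pattern in question (illustrative, barriers
omitted): after the subs, b.mi tests only the N flag, so decrementing the
most negative 64-bit value overflows to a positive result and the early
exit is missed, whereas b.lt (N != V) also accounts for signed overflow.

| static inline long atomic64_dec_if_positive_sketch(long *counter)
| {
| 	long result;
| 	unsigned long tmp;
|
| 	asm volatile(
| 	"1:	ldxr	%0, %2\n"
| 	"	subs	%0, %0, #1\n"
| 	"	b.lt	2f\n"		/* was b.mi before this fix */
| 	"	stxr	%w1, %0, %2\n"
| 	"	cbnz	%w1, 1b\n"
| 	"2:"
| 	: "=&r" (result), "=&r" (tmp), "+Q" (*counter)
| 	:
| 	: "cc");
|
| 	return result;
| }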

Reviewed-by: Steve Capper <steve.capper@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# 6059a7b6 04-Jun-2015 Will Deacon <will@kernel.org>

arm64: atomics: implement atomic{,64}_cmpxchg using cmpxchg

We don't need duplicate cmpxchg implementations, so use cmpxchg to
implement atomic{,64}_cmpxchg, like we do for xchg already.
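
A minimal sketch of that deduplication (the _sketch suffix only marks it
as illustrative):

| /* Forward atomic{,64}_cmpxchg() to the generic cmpxchg() on the
|  * embedded counter, exactly as xchg() is already handled. */
| #define atomic_cmpxchg_sketch(v, old, new)	cmpxchg(&((v)->counter), (old), (new))
| #define atomic64_cmpxchg_sketch(v, old, new)	cmpxchg(&((v)->counter), (old), (new))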

Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# 0ea366f5 29-May-2015 Will Deacon <will@kernel.org>

arm64: atomics: prefetch the destination word for write prior to stxr

The cost of changing a cacheline from shared to exclusive state can be
significant, especially when this is triggered by an exclusive store,
since it may result in having to retry the transaction.

This patch makes use of prfm to prefetch cachelines for write prior to
ldxr/stxr loops when using the ll/sc atomic routines.
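
A rough sketch of the resulting loop shape (illustrative only):

| /* prfm pstl1strm hints "prefetch for store, L1, streaming", making it
|  * more likely the line is already exclusive when the stxr executes. */
| static inline void ll_sc_atomic_add_prfm_sketch(int i, int *counter)
| {
| 	int result;
| 	unsigned long tmp;
|
| 	asm volatile(
| 	"	prfm	pstl1strm, %2\n"
| 	"1:	ldxr	%w0, %2\n"
| 	"	add	%w0, %w0, %w3\n"
| 	"	stxr	%w1, %w0, %2\n"
| 	"	cbnz	%w1, 1b"
| 	: "=&r" (result), "=&r" (tmp), "+Q" (*counter)
| 	: "r" (i));
| }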

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# 4e39715f 29-May-2015 Will Deacon <will@kernel.org>

arm64: cmpxchg: avoid memory barrier on comparison failure

cmpxchg doesn't require memory barrier semantics when the value
comparison fails, so make the barrier conditional on success.
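
A rough sketch of the conditional-barrier shape (full-barrier flavour,
illustrative only):

| static inline int ll_sc_cmpxchg_mb_sketch(int *ptr, int old, int new)
| {
| 	int oldval;
| 	unsigned long tmp;
|
| 	asm volatile(
| 	"1:	ldxr	%w0, %2\n"
| 	"	eor	%w1, %w0, %w3\n"
| 	"	cbnz	%w1, 2f\n"	/* mismatch: skip store and barrier */
| 	"	stlxr	%w1, %w4, %2\n"
| 	"	cbnz	%w1, 1b\n"
| 	"	dmb	ish\n"		/* only paid on success */
| 	"2:"
| 	: "=&r" (oldval), "=&r" (tmp), "+Q" (*ptr)
| 	: "r" (old), "r" (new)
| 	: "memory");
|
| 	return oldval;
| }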

Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# 0bc671d3 29-May-2015 Will Deacon <will@kernel.org>

arm64: cmpxchg: avoid "cc" clobber in ll/sc routines

We can perform the cmpxchg comparison using eor and cbnz which avoids
the "cc" clobber for the ll/sc case and consequently for the LSE case
where we may have to fall-back on the ll/sc code at runtime.

Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# e9a4b795 14-May-2015 Will Deacon <will@kernel.org>

arm64: cmpxchg_dbl: patch in lse instructions when supported by the CPU

On CPUs which support the LSE atomic instructions introduced in ARMv8.1,
it makes sense to use them in preference to ll/sc sequences.

This patch introduces runtime patching of our cmpxchg_double primitives
so that the LSE casp instruction is used instead.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# c342f782 23-Apr-2015 Will Deacon <will@kernel.org>

arm64: cmpxchg: patch in lse instructions when supported by the CPU

On CPUs which support the LSE atomic instructions introduced in ARMv8.1,
it makes sense to use them in preference to ll/sc sequences.

This patch introduces runtime patching of our cmpxchg primitives so that
the LSE cas instruction is used instead.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# c09d6a04 03-Feb-2015 Will Deacon <will@kernel.org>

arm64: atomics: patch in lse instructions when supported by the CPU

On CPUs which support the LSE atomic instructions introduced in ARMv8.1,
it makes sense to use them in preference to ll/sc sequences.

This patch introduces runtime patching of atomic_t and atomic64_t
routines so that the call-site for the out-of-line ll/sc sequences is
patched with an LSE atomic instruction when we detect that
the CPU supports it.

If binutils is not recent enough to assemble the LSE instructions, then
the ll/sc sequences are inlined as though CONFIG_ARM64_LSE_ATOMICS=n.
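
A simplified sketch of the call-site patching (assumes kernel context:
the ALTERNATIVE macro, the ARM64_HAS_LSE_ATOMICS capability and an
out-of-line __ll_sc_atomic_add taking its arguments in w0/x1; the
_sketch name is invented and the body is illustrative):

| static inline void atomic_add_patched_sketch(int i, atomic_t *v)
| {
| 	register int w0 asm ("w0") = i;
| 	register atomic_t *x1 asm ("x1") = v;
|
| 	asm volatile(ALTERNATIVE(
| 	/* default: branch-and-link to the out-of-line LL/SC routine */
| 	"bl	__ll_sc_atomic_add",
| 	/* patched in at boot when the CPU supports LSE */
| 	"stadd	%w[i], %[v]",
| 	ARM64_HAS_LSE_ATOMICS)
| 	: [i] "+r" (w0), [v] "+Q" (v->counter)
| 	: "r" (x1)
| 	: "x30");	/* the bl clobbers the link register */
| }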

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# c0385b24 02-Feb-2015 Will Deacon <will@kernel.org>

arm64: introduce CONFIG_ARM64_LSE_ATOMICS as fallback to ll/sc atomics

In order to patch in the new atomic instructions at runtime, we need to
generate wrappers around the out-of-line exclusive load/store atomics.

This patch adds a new Kconfig option, CONFIG_ARM64_LSE_ATOMICS, which
causes our atomic functions to branch to the out-of-line ll/sc
implementations. To avoid the register spill overhead of the PCS, the
out-of-line functions are compiled with specific compiler flags to
force out-of-line save/restore of any registers that are usually
caller-saved.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# c275f76b 03-Feb-2015 Will Deacon <will@kernel.org>

arm64: atomics: move ll/sc atomics into separate header file

In preparation for the Large System Extension (LSE) atomic instructions
introduced by ARM v8.1, move the current exclusive load/store (LL/SC)
atomics into their own header file.

Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>