History log of /linux-master/arch/powerpc/include/asm/bitops.h
Revision Date Author Comments
# 51a752c2 04-Oct-2023 Matthew Wilcox (Oracle) <willy@infradead.org>

powerpc: implement arch_xor_unlock_is_negative_byte on 32-bit

Simply remove the ifdef. The assembly is identical to that in the
non-optimised case of test_and_clear_bits() on PPC32, and it's not clear
to me how the PPC32 optimisation works, nor whether it would work for
arch_xor_unlock_is_negative_byte(). If that optimisation would work,
someone can implement it later, but this is more efficient than the
implementation in filemap.c.

Link: https://lkml.kernel.org/r/20231004165317.1061855-12-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# 247dbcdb 04-Oct-2023 Matthew Wilcox (Oracle) <willy@infradead.org>

bitops: add xor_unlock_is_negative_byte()

Replace clear_bit_and_unlock_is_negative_byte() with
xor_unlock_is_negative_byte(). We have a few places that like to lock a
folio, set a flag and unlock it again. Allow for the possibility of
combining the latter two operations for efficiency. We are guaranteed
that the caller holds the lock, so it is safe to unlock it with the xor.
The caller must guarantee that nobody else will set the flag without
holding the lock; it is not safe to do this with the PG_dirty flag, for
example.

Link: https://lkml.kernel.org/r/20231004165317.1061855-8-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# eb5a33ea 02-Aug-2022 Christophe Leroy <christophe.leroy@csgroup.eu>

powerpc: Don't hide eh field of lwarx behind a macro

The eh field must remain 0 for PPC32 and is only used
by PPC64.

Don't hide that behind a macro, just leave the responsibility
to the user.

At the time being, the only users of PPC_RAW_L{WDQ}ARX are
setting the eh field to 0, so the special handling of __PPC_EH
is useless. Just take the value given by the caller.

Same for DEFINE_TESTOP(), don't do special handling in that
macro, ensure the caller hands over the proper eh value.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Use 'n' constraint per Segher]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/8b9c8a1a14f9143552a85fcbf96698224a8c2469.1659430931.git.christophe.leroy@csgroup.eu


# 0b0057cc 11-Feb-2022 Christophe Leroy <christophe.leroy@csgroup.eu>

powerpc/bitops: Force inlining of fls()

Building a kernel with CONFIG_CC_OPTIMISE_FOR_SIZE leads to
the following functions being copied several times in vmlinux:

31 times __ilog2_u32()
34 times fls()

Disassembly follows:

c00f476c <fls>:
c00f476c: 7c 63 00 34 cntlzw r3,r3
c00f4770: 20 63 00 20 subfic r3,r3,32
c00f4774: 4e 80 00 20 blr

c00f4778 <__ilog2_u32>:
c00f4778: 94 21 ff f0 stwu r1,-16(r1)
c00f477c: 7c 08 02 a6 mflr r0
c00f4780: 90 01 00 14 stw r0,20(r1)
c00f4784: 4b ff ff e9 bl c00f476c <fls>
c00f4788: 80 01 00 14 lwz r0,20(r1)
c00f478c: 38 63 ff ff addi r3,r3,-1
c00f4790: 7c 08 03 a6 mtlr r0
c00f4794: 38 21 00 10 addi r1,r1,16
c00f4798: 4e 80 00 20 blr

When forcing inlining of fls(), we get

c0008b80 <__ilog2_u32>:
c0008b80: 7c 63 00 34 cntlzw r3,r3
c0008b84: 20 63 00 1f subfic r3,r3,31
c0008b88: 4e 80 00 20 blr

vmlinux size gets reduced by 1 kbyte with that change.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/adc9c9d6378f6b5008246ca717993d7870188efb.1644569473.git.christophe.leroy@csgroup.eu


# 47d8c156 14-Aug-2021 Yury Norov <yury.norov@gmail.com>

include: move find.h from asm_generic to linux

find_bit API and bitmap API are closely related, but inclusion paths
are different - include/asm-generic and include/linux, correspondingly.
In the past it made a lot of troubles due to circular dependencies
and/or undefined symbols. Fix this by moving find.h under include/linux.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>


# fb350784 21-Sep-2021 Christophe Leroy <christophe.leroy@csgroup.eu>

powerpc/bitops: Use immediate operand when possible

Today we get the following code generation for bitops like
set or clear bit:

c0009fe0: 39 40 08 00 li r10,2048
c0009fe4: 7c e0 40 28 lwarx r7,0,r8
c0009fe8: 7c e7 53 78 or r7,r7,r10
c0009fec: 7c e0 41 2d stwcx. r7,0,r8

c000d568: 39 00 18 00 li r8,6144
c000d56c: 7c c0 38 28 lwarx r6,0,r7
c000d570: 7c c6 40 78 andc r6,r6,r8
c000d574: 7c c0 39 2d stwcx. r6,0,r7

Most set bits are constant on lower 16 bits, so it can easily
be replaced by the "immediate" version of the operation. Allow
GCC to choose between the normal or immediate form.

For clear bits, on 32 bits 'rlwinm' can be used instead of 'andc' for
when all bits to be cleared are consecutive.

On 64 bits we don't have any equivalent single operation for clearing,
single bits or a few bits, we'd need two 'rldicl' so it is not
worth it, the li/andc sequence is doing the same.

With this patch we get:

c0009fe0: 7d 00 50 28 lwarx r8,0,r10
c0009fe4: 61 08 08 00 ori r8,r8,2048
c0009fe8: 7d 00 51 2d stwcx. r8,0,r10

c000d558: 7c e0 40 28 lwarx r7,0,r8
c000d55c: 54 e7 05 64 rlwinm r7,r7,0,21,18
c000d560: 7c e0 41 2d stwcx. r7,0,r8

On pmac32_defconfig, it reduces the text by approx 10 kbytes.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e6f815d9181bab09df3b350af51149437863e9f9.1632236981.git.christophe.leroy@csgroup.eu


# 9401f4e4 02-Mar-2021 Christophe Leroy <christophe.leroy@csgroup.eu>

powerpc: Use lwarx/ldarx directly instead of PPC_LWARX/LDARX macros

Force the eh flag at 0 on PPC32.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1fc81f07cabebb875b963e295408cc3dd38c8d85.1614674882.git.christophe.leroy@csgroup.eu


# 1891ef21 22-Oct-2020 Christophe Leroy <christophe.leroy@csgroup.eu>

powerpc/bitops: Fix possible undefined behaviour with fls() and fls64()

fls() and fls64() are using __builtin_ctz() and _builtin_ctzll().
On powerpc, those builtins trivially use ctlzw and ctlzd power
instructions.

Allthough those instructions provide the expected result with
input argument 0, __builtin_ctz() and __builtin_ctzll() are
documented as undefined for value 0.

The easiest fix would be to use fls() and fls64() functions
defined in include/asm-generic/bitops/builtin-fls.h and
include/asm-generic/bitops/fls64.h, but GCC output is not optimal:

00000388 <testfls>:
388: 2c 03 00 00 cmpwi r3,0
38c: 41 82 00 10 beq 39c <testfls+0x14>
390: 7c 63 00 34 cntlzw r3,r3
394: 20 63 00 20 subfic r3,r3,32
398: 4e 80 00 20 blr
39c: 38 60 00 00 li r3,0
3a0: 4e 80 00 20 blr

000003b0 <testfls64>:
3b0: 2c 03 00 00 cmpwi r3,0
3b4: 40 82 00 1c bne 3d0 <testfls64+0x20>
3b8: 2f 84 00 00 cmpwi cr7,r4,0
3bc: 38 60 00 00 li r3,0
3c0: 4d 9e 00 20 beqlr cr7
3c4: 7c 83 00 34 cntlzw r3,r4
3c8: 20 63 00 20 subfic r3,r3,32
3cc: 4e 80 00 20 blr
3d0: 7c 63 00 34 cntlzw r3,r3
3d4: 20 63 00 40 subfic r3,r3,64
3d8: 4e 80 00 20 blr

When the input of fls(x) is a constant, just check x for nullity and
return either 0 or __builtin_clz(x). Otherwise, use cntlzw instruction
directly.

For fls64() on PPC64, do the same but with __builtin_clzll() and
cntlzd instruction. On PPC32, lets take the generic fls64() which
will use our fls(). The result is as expected:

00000388 <testfls>:
388: 7c 63 00 34 cntlzw r3,r3
38c: 20 63 00 20 subfic r3,r3,32
390: 4e 80 00 20 blr

000003a0 <testfls64>:
3a0: 2c 03 00 00 cmpwi r3,0
3a4: 40 82 00 10 bne 3b4 <testfls64+0x14>
3a8: 7c 83 00 34 cntlzw r3,r4
3ac: 20 63 00 20 subfic r3,r3,32
3b0: 4e 80 00 20 blr
3b4: 7c 63 00 34 cntlzw r3,r3
3b8: 20 63 00 40 subfic r3,r3,64
3bc: 4e 80 00 20 blr

Fixes: 2fcff790dcb4 ("powerpc: Use builtin functions for fls()/__fls()/fls64()")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/348c2d3f19ffcff8abe50d52513f989c4581d000.1603375524.git.christophe.leroy@csgroup.eu


# 455531e9 21-May-2020 Christophe Leroy <christophe.leroy@c-s.fr>

powerpc: Remove IBM405 Erratum #77

This erratum is dedicated to IBM 405GP and STB03xxx
which are now gone.

Remove this erratum.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/44dbc08e9034681eb28324cbabc086e97044c36c.1590079969.git.christophe.leroy@csgroup.eu


# 5bece3d6 19-Aug-2019 Daniel Axtens <dja@axtens.net>

powerpc: support KASAN instrumentation of bitops

The powerpc-specific bitops are not being picked up by the KASAN
test suite.

Instrumentation is done via the bitops/instrumented-{atomic,lock}.h
headers. They require that arch-specific versions of bitop functions
are renamed to arch_*. Do this renaming.

For clear_bit_unlock_is_negative_byte, the current implementation
uses the PG_waiters constant. This works because it's a preprocessor
macro - so it's only actually evaluated in contexts where PG_waiters
is defined. With instrumentation however, it becomes a static inline
function, and all of a sudden we need the actual value of PG_waiters.
Because of the order of header includes, it's not available and we
fail to compile. Instead, manually specify that we care about bit 7.
This is still correct: bit 7 is the bit that would mark a negative
byte.

While we're at it, replace __inline__ with inline across the file.

Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Tested-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190820024941.12640-2-dja@axtens.net


# 2874c5fd 27-May-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152

Based on 1 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 3029 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 36a7eeaf 05-Jul-2018 Christophe Leroy <christophe.leroy@c-s.fr>

powerpc/405: move PPC405_ERR77 in asm-405.h

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# f782ddf2 21-Apr-2017 Christophe Leroy <christophe.leroy@c-s.fr>

powerpc: Remove __ilog2()s and use generic ones

With the __ilog2() function as defined in
arch/powerpc/include/asm/bitops.h, GCC will not optimise the code
in case of constant parameter.

The generic ilog2() function in include/linux/log2.h is written
to handle the case of the constant parameter.

This patch discards the three __ilog2() functions and
defines __ilog2() as ilog2()

For non constant calls, the generated code is doing the same:
int test__ilog2(unsigned long x)
{
return __ilog2(x);
}

int test__ilog2_u32(u32 n)
{
return __ilog2_u32(n);
}

int test__ilog2_u64(u64 n)
{
return __ilog2_u64(n);
}

On PPC32 before the patch:
00000000 <test__ilog2>:
0: 7c 63 00 34 cntlzw r3,r3
4: 20 63 00 1f subfic r3,r3,31
8: 4e 80 00 20 blr

0000000c <test__ilog2_u32>:
c: 7c 63 00 34 cntlzw r3,r3
10: 20 63 00 1f subfic r3,r3,31
14: 4e 80 00 20 blr

On PPC32 after the patch:
00000000 <test__ilog2>:
0: 7c 63 00 34 cntlzw r3,r3
4: 20 63 00 1f subfic r3,r3,31
8: 4e 80 00 20 blr

0000000c <test__ilog2_u32>:
c: 7c 63 00 34 cntlzw r3,r3
10: 20 63 00 1f subfic r3,r3,31
14: 4e 80 00 20 blr

On PPC64 before the patch:
0000000000000000 <.test__ilog2>:
0: 7c 63 00 74 cntlzd r3,r3
4: 20 63 00 3f subfic r3,r3,63
8: 7c 63 07 b4 extsw r3,r3
c: 4e 80 00 20 blr

0000000000000010 <.test__ilog2_u32>:
10: 7c 63 00 34 cntlzw r3,r3
14: 20 63 00 1f subfic r3,r3,31
18: 7c 63 07 b4 extsw r3,r3
1c: 4e 80 00 20 blr

0000000000000020 <.test__ilog2_u64>:
20: 7c 63 00 74 cntlzd r3,r3
24: 20 63 00 3f subfic r3,r3,63
28: 7c 63 07 b4 extsw r3,r3
2c: 4e 80 00 20 blr

On PPC64 after the patch:
0000000000000000 <.test__ilog2>:
0: 7c 63 00 74 cntlzd r3,r3
4: 20 63 00 3f subfic r3,r3,63
8: 7c 63 07 b4 extsw r3,r3
c: 4e 80 00 20 blr

0000000000000010 <.test__ilog2_u32>:
10: 7c 63 00 34 cntlzw r3,r3
14: 20 63 00 1f subfic r3,r3,31
18: 7c 63 07 b4 extsw r3,r3
1c: 4e 80 00 20 blr

0000000000000020 <.test__ilog2_u64>:
20: 7c 63 00 74 cntlzd r3,r3
24: 20 63 00 3f subfic r3,r3,63
28: 7c 63 07 b4 extsw r3,r3
2c: 4e 80 00 20 blr

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 22ef33b3 21-Apr-2017 Christophe Leroy <christophe.leroy@c-s.fr>

powerpc: Replace ffz() by equivalent generic function

With the ffz() function as defined in arch/powerpc/include/asm/bitops.h
GCC will not optimise the code in case of constant parameter.

This patch replaces ffz() by the generic function.

The generic ffz(x) expects to never be called with ~x == 0
as written in the comment in include/asm-generic/bitops/ffz.h
The only user of ffz() within arch/powerpc/ is
platforms/512x/mpc5121_ads_cpld.c, which checks if x is not 0xff

For non constant calls, the generated code is doing the same:

unsigned long testffz(unsigned long x)
{
return ffz(x);
}

On PPC32, before the patch:
00000018 <testffz>:
18: 7c 63 18 f9 not. r3,r3
1c: 40 82 00 0c bne 28 <testffz+0x10>
20: 38 60 00 20 li r3,32
24: 4e 80 00 20 blr
28: 7d 23 00 d0 neg r9,r3
2c: 7d 23 18 38 and r3,r9,r3
30: 7c 63 00 34 cntlzw r3,r3
34: 20 63 00 1f subfic r3,r3,31
38: 4e 80 00 20 blr

On PPC32, after the patch:
00000018 <testffz>:
18: 39 23 00 01 addi r9,r3,1
1c: 7d 23 18 78 andc r3,r9,r3
20: 7c 63 00 34 cntlzw r3,r3
24: 20 63 00 1f subfic r3,r3,31
28: 4e 80 00 20 blr

On PPC64, before the patch:
0000000000000030 <.testffz>:
30: 7c 60 18 f9 not. r0,r3
34: 38 60 00 40 li r3,64
38: 4d 82 00 20 beqlr
3c: 7c 60 00 d0 neg r3,r0
40: 7c 63 00 38 and r3,r3,r0
44: 7c 63 00 74 cntlzd r3,r3
48: 20 63 00 3f subfic r3,r3,63
4c: 7c 63 07 b4 extsw r3,r3
50: 4e 80 00 20 blr

On PPC64, after the patch:
0000000000000030 <.testffz>:
30: 38 03 00 01 addi r0,r3,1
34: 7c 03 18 78 andc r3,r0,r3
38: 7c 63 00 74 cntlzd r3,r3
3c: 20 63 00 3f subfic r3,r3,63
40: 4e 80 00 20 blr

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 2fcff790 21-Apr-2017 Christophe Leroy <christophe.leroy@c-s.fr>

powerpc: Use builtin functions for fls()/__fls()/fls64()

With the fls() functions as defined in arch/powerpc/include/asm/bitops.h
GCC will not optimise the code in case of constant parameter.

This patch replaces __fls() by the builtin function, and modifies
fls() and fls64() to use builtins instead of inline assembly

For non constant calls, the generated code is doing the same:

int testfls(unsigned int x)
{
return fls(x);
}

unsigned long test__fls(unsigned long x)
{
return __fls(x);
}

int testfls64(__u64 x)
{
return fls64(x);
}

On PPC32, before the patch:
00000064 <testfls>:
64: 7c 63 00 34 cntlzw r3,r3
68: 20 63 00 20 subfic r3,r3,32
6c: 4e 80 00 20 blr

00000070 <test__fls>:
70: 7c 63 00 34 cntlzw r3,r3
74: 20 63 00 1f subfic r3,r3,31
78: 4e 80 00 20 blr

0000007c <testfls64>:
7c: 2c 03 00 00 cmpwi r3,0
80: 40 82 00 10 bne 90 <testfls64+0x14>
84: 7c 83 00 34 cntlzw r3,r4
88: 20 63 00 20 subfic r3,r3,32
8c: 4e 80 00 20 blr
90: 7c 63 00 34 cntlzw r3,r3
94: 20 63 00 40 subfic r3,r3,64
98: 4e 80 00 20 blr

On PPC32, after the patch:
00000054 <testfls>:
54: 7c 63 00 34 cntlzw r3,r3
58: 20 63 00 20 subfic r3,r3,32
5c: 4e 80 00 20 blr

00000060 <test__fls>:
60: 7c 63 00 34 cntlzw r3,r3
64: 20 63 00 1f subfic r3,r3,31
68: 4e 80 00 20 blr

0000006c <testfls64>:
6c: 2c 03 00 00 cmpwi r3,0
70: 41 82 00 10 beq 80 <testfls64+0x14>
74: 7c 63 00 34 cntlzw r3,r3
78: 20 63 00 40 subfic r3,r3,64
7c: 4e 80 00 20 blr
80: 7c 83 00 34 cntlzw r3,r4
84: 20 63 00 40 subfic r3,r3,32
88: 4e 80 00 20 blr

On PPC64, before the patch:
00000000000000a0 <.testfls>:
a0: 7c 63 00 34 cntlzw r3,r3
a4: 20 63 00 20 subfic r3,r3,32
a8: 7c 63 07 b4 extsw r3,r3
ac: 4e 80 00 20 blr

00000000000000b0 <.test__fls>:
b0: 7c 63 00 74 cntlzd r3,r3
b4: 20 63 00 3f subfic r3,r3,63
b8: 7c 63 07 b4 extsw r3,r3
bc: 4e 80 00 20 blr

00000000000000c0 <.testfls64>:
c0: 7c 63 00 74 cntlzd r3,r3
c4: 20 63 00 40 subfic r3,r3,64
c8: 7c 63 07 b4 extsw r3,r3
cc: 4e 80 00 20 blr

On PPC64, after the patch:
0000000000000090 <.testfls>:
90: 7c 63 00 34 cntlzw r3,r3
94: 20 63 00 20 subfic r3,r3,32
98: 7c 63 07 b4 extsw r3,r3
9c: 4e 80 00 20 blr

00000000000000a0 <.test__fls>:
a0: 7c 63 00 74 cntlzd r3,r3
a4: 20 63 00 3f subfic r3,r3,63
a8: 4e 80 00 20 blr
ac: 60 00 00 00 nop

00000000000000b0 <.testfls64>:
b0: 7c 63 00 74 cntlzd r3,r3
b4: 20 63 00 40 subfic r3,r3,64
b8: 7c 63 07 b4 extsw r3,r3
bc: 4e 80 00 20 blr

Those builtins have been in GCC since at least 3.4.6 (see
https://gcc.gnu.org/onlinedocs/gcc-3.4.6/gcc/Other-Builtins.html )

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# f83647d6 21-Apr-2017 Christophe Leroy <christophe.leroy@c-s.fr>

powerpc: Discard ffs()/__ffs() function and use builtin functions instead

With the ffs() function as defined in arch/powerpc/include/asm/bitops.h
GCC will not optimise the code in case of constant parameter, as shown
by the small exemple below.

int ffs_test(void)
{
return 4 << ffs(31);
}

c0012334 <ffs_test>:
c0012334: 39 20 00 01 li r9,1
c0012338: 38 60 00 04 li r3,4
c001233c: 7d 29 00 34 cntlzw r9,r9
c0012340: 21 29 00 20 subfic r9,r9,32
c0012344: 7c 63 48 30 slw r3,r3,r9
c0012348: 4e 80 00 20 blr

With this patch, the same function will compile as follows:

c0012334 <ffs_test>:
c0012334: 38 60 00 08 li r3,8
c0012338: 4e 80 00 20 blr

The same happens with __ffs()

For non constant calls, the generated code is doing the same,
allthought it is slightly different on 64 bits for ffs():

unsigned long test__ffs(unsigned long x)
{
return __ffs(x);
}

int testffs(int x)
{
return ffs(x);
}

On PPC32, before the patch:
0000003c <test__ffs>:
3c: 7d 23 00 d0 neg r9,r3
40: 7d 23 18 38 and r3,r9,r3
44: 7c 63 00 34 cntlzw r3,r3
48: 20 63 00 1f subfic r3,r3,31
4c: 4e 80 00 20 blr

00000050 <testffs>:
50: 7d 23 00 d0 neg r9,r3
54: 7d 23 18 38 and r3,r9,r3
58: 7c 63 00 34 cntlzw r3,r3
5c: 20 63 00 20 subfic r3,r3,32
60: 4e 80 00 20 blr

On PPC32, after the patch:
0000002c <test__ffs>:
2c: 7d 23 00 d0 neg r9,r3
30: 7d 23 18 38 and r3,r9,r3
34: 7c 63 00 34 cntlzw r3,r3
38: 20 63 00 1f subfic r3,r3,31
3c: 4e 80 00 20 blr

00000040 <testffs>:
40: 7d 23 00 d0 neg r9,r3
44: 7d 23 18 38 and r3,r9,r3
48: 7c 63 00 34 cntlzw r3,r3
4c: 20 63 00 20 subfic r3,r3,32
50: 4e 80 00 20 blr

On PPC64, before the patch:
0000000000000060 <.test__ffs>:
60: 7c 03 00 d0 neg r0,r3
64: 7c 03 18 38 and r3,r0,r3
68: 7c 63 00 74 cntlzd r3,r3
6c: 20 63 00 3f subfic r3,r3,63
70: 7c 63 07 b4 extsw r3,r3
74: 4e 80 00 20 blr

0000000000000080 <.testffs>:
80: 7c 03 00 d0 neg r0,r3
84: 7c 03 18 38 and r3,r0,r3
88: 7c 63 00 74 cntlzd r3,r3
8c: 20 63 00 40 subfic r3,r3,64
90: 7c 63 07 b4 extsw r3,r3
94: 4e 80 00 20 blr

On PPC64, after the patch:
0000000000000050 <.test__ffs>:
50: 7c 03 00 d0 neg r0,r3
54: 7c 03 18 38 and r3,r0,r3
58: 7c 63 00 74 cntlzd r3,r3
5c: 20 63 00 3f subfic r3,r3,63
60: 4e 80 00 20 blr

0000000000000070 <.testffs>:
70: 7c 03 00 d0 neg r0,r3
74: 7c 03 18 38 and r3,r0,r3
78: 7c 63 00 34 cntlzw r3,r3
7c: 20 63 00 20 subfic r3,r3,32
80: 7c 63 07 b4 extsw r3,r3
84: 4e 80 00 20 blr
(ffs() operates on an int so cntlzw is equivalent to cntlzd)

In addition, when reading the generated vmlinux, we can observe
that with the builtin functions, GCC sometimes efficiently spreads
the instructions within the generated functions while the inline
assembly force them to remain grouped together.

__builtin_ffs() is already used in arch/powerpc/include/asm/page_32.h

Those builtins have been in GCC since at least 3.4.6 (see
https://gcc.gnu.org/onlinedocs/gcc-3.4.6/gcc/Other-Builtins.html )

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 22bd64a6 05-Apr-2017 Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc: Add more PPC bit conversion macros

Add 32 and 8 bit variants

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 7b9f71f9 27-Feb-2017 Nicholas Piggin <npiggin@gmail.com>

powerpc/64s: POWER9 machine check handler

Add POWER9 machine check handler. There are several new types of errors
added, so logging messages for those are also added.

This doesn't attempt to reuse any of the P7/8 defines or functions,
because that becomes too complex. The better option in future is to use
a table driven approach.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# d11914b2 03-Jan-2017 Nicholas Piggin <npiggin@gmail.com>

powerpc/64: Implement clear_bit_unlock_is_negative_byte()

Commit b91e1302ad9b8 ("mm: optimize PageWaiters bit use for
unlock_page()") added a special bitop function to speed up
unlock_page(). Implement this for 64-bit powerpc.

This improves the unlock_page() core code from this:

li 9,1
lwsync
1: ldarx 10,0,3,0
andc 10,10,9
stdcx. 10,0,3
bne- 1b
ori 2,2,0
ld 9,0(3)
andi. 10,9,0x80
beqlr
li 4,0
b wake_up_page_bit

To this:

li 10,1
lwsync
1: ldarx 9,0,3,0
andc 9,9,10
stdcx. 9,0,3
bne- 1b
andi. 10,9,0x80
beqlr
li 4,0
b wake_up_page_bit

In a test of elapsed time for dd writing into 16GB of already-dirty
pagecache on a POWER8 with 4K pages, which has one unlock_page per 4kB
this patch reduced overhead by 1.1%:

N Min Max Median Avg Stddev
x 19 2.578 2.619 2.594 2.595 0.011
+ 19 2.552 2.592 2.564 2.565 0.008
Difference at 95.0% confidence
-0.030 +/- 0.006
-1.142% +/- 0.243%

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Made 64-bit only until I can test it properly on 32-bit]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# e7a7a65e 10-Nov-2014 Boqun Feng <boqun.feng@linux.vnet.ibm.com>

powerpc: Fix comment typos in arch/powerpc/include/asm/bitops.h

In arch/powerpc/include/asm/bitops.h, the comments about bit numbers in
large (> 1 word) bitmaps have two typos:
- On ppc64 system, the LSB of the 4th word should be bit 192 rather than
196, because if it's bit 196, bit 192-195 will be missing in the
bitmap.
- On ppc32 system, the LSB of the second word should be bit 32 rather
than 31, because bit 31 is already in the first word.

This patch fixes these typos.

Signed-off-by: Boqun Feng <boqun.feng@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 6e4c632c 17-Sep-2014 Anton Blanchard <anton@samba.org>

powerpc: make __ffs return unsigned long

I'm seeing a build warning in mm/nobootmem.c after removing
bootmem:

mm/nobootmem.c: In function '__free_pages_memory':
include/linux/kernel.h:713:17: warning: comparison of distinct pointer types lacks a cast [enabled by default]
(void) (&_min1 == &_min2); \
^
mm/nobootmem.c:90:11: note: in expansion of macro 'min'
order = min(MAX_ORDER - 1UL, __ffs(start));
^

The rest of the worlds seems to define __ffs as returning unsigned long,
so lets do that.

Signed-off-by: Anton Blanchard <anton@samba.org>
Tested-by: Emil Medve <Emilian.Medve@Freescale.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# c645073f 13-Mar-2014 Peter Zijlstra <peterz@infradead.org>

arch,powerpc: Convert smp_mb__*()

Powerpc allows reordering over its ll/sc implementation. Implement the
two new barriers as appropriate.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/n/tip-gg2ffgq32sjgy9b8lj6m3hsc@git.kernel.org
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-kernel@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# e22a2274 30-Oct-2013 Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

powerpc/book3s: Flush SLB/TLBs if we get SLB/TLB machine check errors on power7.

If we get a machine check exception due to SLB or TLB errors, then flush
SLBs/TLBs and reload SLBs to recover. We do this in real mode before turning
on MMU. Otherwise we would run into nested machine checks.

If we get a machine check when we are in guest, then just flush the
SLBs and continue. This patch handles errors for power7. The next
patch will handle errors for power8

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 576be130 21-Feb-2013 Michael Ellerman <michael@ellerman.id.au>

powerpc: Remove unused postfix parameter to DEFINE_BITOP()

None of the users of DEFINE_BITOP pass a postfix, and as far as I can
tell none ever did, so drop it.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>


# a74f350b 01-Mar-2013 Akinobu Mita <akinobu.mita@gmail.com>

powerpc: Remove unused BITOP_LE_SWIZZLE macro

The BITOP_LE_SWIZZLE macro was used in the little-endian bitops functions
for powerpc. But these functions were converted to generic bitops and
the BITOP_LE_SWIZZLE is not used anymore.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 79597be9 03-Nov-2012 Akinobu Mita <akinobu.mita@gmail.com>

powerpc: Use asm-generic/bitops/le.h

The only difference between powerpc and asm-generic le-bitops is
test_bit_le(). Usually all bitops require a long aligned bitmap.
But powerpc test_bit_le() can take an unaligned address.

There is no special callsite of test_bit_le() that needs unaligned
access in powerpc as far as I can see. So convert to use
asm-generic/bitops/le.h for powerpc.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 2237f4f4 03-Nov-2012 Akinobu Mita <akinobu.mita@gmail.com>

powerpc: Remove BITOP_MASK and BITOP_WORD from asm/bitops.h

Replace BITOP_MASK and BITOP_WORD with BIT_MASK and BIT_WORD defined
in linux/bitops.h and remove BITOP_* which are not used anymore.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 0ef8fa69 04-Oct-2012 Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>

powerpc: bitops: introduce {clear,set}_bit_le()

Needed to replace test_and_set_bit_le() in virt/kvm/kvm_main.c which is
being used for this missing function.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# b97021f8 15-Nov-2011 Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc: Fix atomic_xxx_return barrier semantics

The Documentation/memory-barriers.txt document requires that atomic
operations that return a value act as a memory barrier both before
and after the actual atomic operation.

Our current implementation doesn't guarantee this. More specifically,
while a load following the isync can not be issued before stwcx. has
completed, that completion doesn't architecturally means that the
result of stwcx. is visible to other processors (or any previous stores
for that matter) (typically, the other processors L1 caches can still
hold the old value).

This has caused an actual crash in RCU torture testing on Power 7

This fixes it by changing those atomic ops to use new macros instead
of RELEASE/ACQUIRE barriers, called ATOMIC_ENTRY and ATMOIC_EXIT barriers,
which are then defined respectively to lwsync and sync.

I haven't had a chance to measure the performance impact (or rather
what I measured with kernel compiles is in the noise, I yet have to
find a more precise benchmark)

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>


# 148817ba 26-Jul-2011 Akinobu Mita <akinobu.mita@gmail.com>

asm-generic: add another generic ext2 atomic bitops

The majority of architectures implement ext2 atomic bitops as
test_and_{set,clear}_bit() without spinlock.

This adds this type of generic implementation in ext2-atomic-setbit.h and
use it wherever possible.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Suggested-by: Andreas Dilger <adilger@dilger.ca>
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 25985edc 30-Mar-2011 Lucas De Marchi <lucas.demarchi@profusion.mobi>

Fix common misspellings

Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>


# 61f2e7b0 23-Mar-2011 Akinobu Mita <akinobu.mita@gmail.com>

bitops: remove minix bitops from asm/bitops.h

minix bit operations are only used by minix filesystem and useless by
other modules. Because byte order of inode and block bitmaps is different
on each architecture like below:

m68k:
big-endian 16bit indexed bitmaps

h8300, microblaze, s390, sparc, m68knommu:
big-endian 32 or 64bit indexed bitmaps

m32r, mips, sh, xtensa:
big-endian 32 or 64bit indexed bitmaps for big-endian mode
little-endian bitmaps for little-endian mode

Others:
little-endian bitmaps

In order to move minix bit operations from asm/bitops.h to architecture
independent code in minix filesystem, this provides two config options.

CONFIG_MINIX_FS_BIG_ENDIAN_16BIT_INDEXED is only selected by m68k.
CONFIG_MINIX_FS_NATIVE_ENDIAN is selected by the architectures which use
native byte order bitmaps (h8300, microblaze, s390, sparc, m68knommu,
m32r, mips, sh, xtensa). The architectures which always use little-endian
bitmaps do not select these options.

Finally, we can remove minix bit operations from asm/bitops.h for all
architectures.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Greg Ungerer <gerg@uclinux.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Michal Simek <monstr@monstr.eu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Cc: Chris Zankel <chris@zankel.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# f312eff8 23-Mar-2011 Akinobu Mita <akinobu.mita@gmail.com>

bitops: remove ext2 non-atomic bitops from asm/bitops.h

As the result of conversions, there are no users of ext2 non-atomic bit
operations except for ext2 filesystem itself. Now we can put them into
architecture independent code in ext2 filesystem, and remove from
asm/bitops.h for all architectures.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# f57d7ff1 23-Mar-2011 Akinobu Mita <akinobu.mita@gmail.com>

powerpc: introduce little-endian bitops

Introduce little-endian bit operations by renaming existing powerpc native
little-endian bit operations and changing them to take any pointer types.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# a56560b3 23-Mar-2011 Akinobu Mita <akinobu.mita@gmail.com>

asm-generic: change little-endian bitops to take any pointer types

This makes the little-endian bitops take any pointer types by changing the
prototypes and adding casts in the preprocessor macros.

That would seem to at least make all the filesystem code happier, and they
can continue to do just something like

#define ext2_set_bit __test_and_set_bit_le

(or whatever the exact sequence ends up being).

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Matthew Wilcox <willy@debian.org>
Cc: Grant Grundler <grundler@parisc-linux.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Zankel <chris@zankel.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# c4945b9e 23-Mar-2011 Akinobu Mita <akinobu.mita@gmail.com>

asm-generic: rename generic little-endian bitops functions

As a preparation for providing little-endian bitops for all architectures,
This renames generic implementation of little-endian bitops. (remove
"generic_" prefix and postfix "_le")

s/generic_find_next_le_bit/find_next_bit_le/
s/generic_find_next_zero_le_bit/find_next_zero_bit_le/
s/generic_find_first_zero_le_bit/find_first_zero_bit_le/
s/generic___test_and_set_le_bit/__test_and_set_bit_le/
s/generic___test_and_clear_le_bit/__test_and_clear_bit_le/
s/generic_test_le_bit/test_bit_le/
s/generic___set_le_bit/__set_bit_le/
s/generic___clear_le_bit/__clear_bit_le/
s/generic_test_and_set_le_bit/test_and_set_bit_le/
s/generic_test_and_clear_le_bit/test_and_clear_bit_le/

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Cc: Greg Ungerer <gerg@uclinux.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 64ff3128 12-Aug-2010 Anton Blanchard <anton@samba.org>

powerpc: Add support for popcnt instructions

POWER5 added popcntb, and POWER7 added popcntw and popcntd. As a first step
this patch does all the work out of line, but it would be nice to implement
them as inlines with an out of line fallback.

The performance issue with hweight was noticed when disabling SMT on a large
(192 thread) POWER7 box. The patch improves that testcase by about 8%.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# f10e2e5b 09-Feb-2010 Anton Blanchard <anton@samba.org>

powerpc: Rename LWSYNC_ON_SMP to PPC_RELEASE_BARRIER, ISYNC_ON_SMP to PPC_ACQUIRE_BARRIER

For performance reasons we are about to change ISYNC_ON_SMP to sometimes be
lwsync. Now that the macro name doesn't make sense, change it and LWSYNC_ON_SMP
to better explain what the barriers are doing.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 864b9e6f 09-Feb-2010 Anton Blanchard <anton@samba.org>

powerpc: Use lwarx/ldarx hint in bit locks

This patch implements the lwarx/ldarx hint bit for bit locks.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 0d2d3e38 07-Jul-2009 Geoff Thorpe <geoff@geoffthorpe.net>

powerpc: expose the multi-bit ops that underlie single-bit ops.

The bitops.h functions that operate on a single bit in a bitfield are
implemented by operating on the corresponding word location. In all
cases the inner logic is valid if the mask being applied has more than
one bit set, so this patch exposes those inner operations. Indeed,
set_bits() was already available, but it duplicated code from
set_bit() (rather than making the latter a wrapper) - it was also
missing the PPC405_ERR77() workaround and the "volatile" address
qualifier present in other APIs. This corrects that, and exposes the
other multi-bit equivalents.

One advantage of these multi-bit forms is that they allow word-sized
variables to essentially be their own spinlocks, eg. very useful for
state machines where an atomic "flags" variable can obviate the need
for any additional locking.

Signed-off-by: Geoff Thorpe <geoff@geoffthorpe.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# b8b572e1 31-Jul-2008 Stephen Rothwell <sfr@canb.auug.org.au>

powerpc: Move include files to arch/powerpc/include/asm

from include/asm-powerpc. This is the result of a

mkdir arch/powerpc/include/asm
git mv include/asm-powerpc/* arch/powerpc/include/asm

Followed by a few documentation/comment fixups and a couple of places
where <asm-powepc/...> was being used explicitly. Of the latter only
one was outside the arch code and it is a driver only built for powerpc.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>