History log of /linux-master/lib/raid6/neon.uc
Revision Date Author Comments
# 3de13550 17-May-2023 Arnd Bergmann <arnd@arndb.de>

raid6: neon: add missing prototypes

The raid6 syndrome functions are generated for different sizes and have
no generic prototype, while in the inner functions have a prototype
in a header that cannot be included from the correct file. In both
cases, the compiler warns about missing prototypes:

lib/raid6/recov_neon_inner.c:27:6: warning: no previous prototype for '__raid6_2data_recov_neon' [-Wmissing-prototypes]
lib/raid6/recov_neon_inner.c:77:6: warning: no previous prototype for '__raid6_datap_recov_neon' [-Wmissing-prototypes]
lib/raid6/neon1.c:56:6: warning: no previous prototype for 'raid6_neon1_gen_syndrome_real' [-Wmissing-prototypes]
lib/raid6/neon1.c:86:6: warning: no previous prototype for 'raid6_neon1_xor_syndrome_real' [-Wmissing-prototypes]
lib/raid6/neon2.c:56:6: warning: no previous prototype for 'raid6_neon2_gen_syndrome_real' [-Wmissing-prototypes]
lib/raid6/neon2.c:97:6: warning: no previous prototype for 'raid6_neon2_xor_syndrome_real' [-Wmissing-prototypes]
lib/raid6/neon4.c:56:6: warning: no previous prototype for 'raid6_neon4_gen_syndrome_real' [-Wmissing-prototypes]
lib/raid6/neon4.c:119:6: warning: no previous prototype for 'raid6_neon4_xor_syndrome_real' [-Wmissing-prototypes]
lib/raid6/neon8.c:56:6: warning: no previous prototype for 'raid6_neon8_gen_syndrome_real' [-Wmissing-prototypes]
lib/raid6/neon8.c:163:6: warning: no previous prototype for 'raid6_neon8_xor_syndrome_real' [-Wmissing-prototypes]

Add a new header file that contains the prototypes for both to avoid
the warnings.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230517132220.937200-1-arnd@kernel.org


# 1ad3935b 25-Feb-2019 ndesaulniers@google.com <ndesaulniers@google.com>

lib/raid6: use vdupq_n_u8 to avoid endianness warnings

Clang warns: vector initializers are not compatible with NEON intrinsics
in big endian mode [-Wnonportable-vector-initialization]

While this is usually the case, it's not an issue for this case since
we're initializing the uint8x16_t (16x uint8_t's) with the same value.

Instead, use vdupq_n_u8 which both compilers lower into a single movi
instruction: https://godbolt.org/z/vBrgzt

This avoids the static storage for a constant value.

Link: https://github.com/ClangBuiltLinux/linux/issues/214
Suggested-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 35129dde 13-Jul-2017 Ard Biesheuvel <ardb@kernel.org>

md/raid6: use faster multiplication for ARM NEON delta syndrome

The P/Q left side optimization in the delta syndrome simply involves
repeatedly multiplying a value by polynomial 'x' in GF(2^8). Given
that 'x * x * x * x' equals 'x^4' even in the polynomial world, we
can accelerate this substantially by performing up to 4 such operations
at once, using the NEON instructions for polynomial multiplication.

Results on a Cortex-A57 running in 64-bit mode:

Before:
-------
raid6: neonx1 xor() 1680 MB/s
raid6: neonx2 xor() 2286 MB/s
raid6: neonx4 xor() 3162 MB/s
raid6: neonx8 xor() 3389 MB/s

After:
------
raid6: neonx1 xor() 2281 MB/s
raid6: neonx2 xor() 3362 MB/s
raid6: neonx4 xor() 3787 MB/s
raid6: neonx8 xor() 4239 MB/s

While we're at it, simplify MASK() by using a signed shift rather than
a vector compare involving a temp register.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 0e833e69 30-Jun-2015 Ard Biesheuvel <ardb@kernel.org>

md/raid6: delta syndrome for ARM NEON

This implements XOR syndrome calculation using NEON intrinsics.
As before, the module can be built for ARM and arm64 from the
same source.

Relative performance on a Cortex-A57 based system:

raid6: int64x1 gen() 905 MB/s
raid6: int64x1 xor() 881 MB/s
raid6: int64x2 gen() 1343 MB/s
raid6: int64x2 xor() 1286 MB/s
raid6: int64x4 gen() 1896 MB/s
raid6: int64x4 xor() 1321 MB/s
raid6: int64x8 gen() 1773 MB/s
raid6: int64x8 xor() 1165 MB/s
raid6: neonx1 gen() 1834 MB/s
raid6: neonx1 xor() 1278 MB/s
raid6: neonx2 gen() 2528 MB/s
raid6: neonx2 xor() 1942 MB/s
raid6: neonx4 gen() 2888 MB/s
raid6: neonx4 xor() 2334 MB/s
raid6: neonx8 gen() 2957 MB/s
raid6: neonx8 xor() 2232 MB/s
raid6: using algorithm neonx8 gen() 2957 MB/s
raid6: .... xor() 2232 MB/s, rmw enabled

Cc: Markus Stockhausen <stockhausen@collogia.de>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NeilBrown <neilb@suse.com>


# 7d11965d 16-May-2013 Ard Biesheuvel <ardb@kernel.org>

lib/raid6: add ARM-NEON accelerated syndrome calculation

Rebased/reworked a patch contributed by Rob Herring that uses
NEON intrinsics to perform the RAID-6 syndrome calculations.
It uses the existing unroll.awk code to generate several
unrolled versions of which the best performing one is selected
at boot time.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
Cc: hpa@linux.intel.com