History log of /linux-master/include/linux/raid/pq.h
Revision Date Author Comments
# 36dacddb 05-Jan-2022 Dirk Müller <dmueller@suse.de>

lib/raid6: Use strict priority ranking for pq gen() benchmarking

On x86_64, currently 3 variants of AVX512, 3 variants of AVX2
and 3 variants of SSE2 are benchmarked on initialization, taking
between 144-153 jiffies. Testing across a hardware pool of
various generations of intel cpus I could not find a single
case where SSE2 won over AVX2 or AVX512. There are cases where
AVX2 wins over AVX512 however.

Change "prefer" into an integer priority field (similar to
how recov selection works) to have more than one ranking level
available, which is backwards compatible with existing behavior.

Give AVX2/512 variants higher priority over SSE2 in order to skip
SSE testing when AVX is available. in a AVX2/x86_64/HZ=250 case this
saves in the order of 200ms of initialization time.

Signed-off-by: Dirk Müller <dmueller@suse.de>
Acked-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Song Liu <song@kernel.org>


# f591df3c 19-Dec-2019 Zhengyuan Liu <liuzhengyuan@kylinos.cn>

md/raid6: fix algorithm choice under larger PAGE_SIZE

There are several algorithms available for raid6 to generate xor and syndrome
parity, including basic int1, int2 ... int32 and SIMD optimized implementation
like sse and neon. To test and choose the best algorithms at the initial
stage, we need provide enough disk data to feed the algorithms. However, the
disk number we provided depends on page size and gfmul table, seeing bellow:

const int disks = (65536/PAGE_SIZE) + 2;

So when come to 64K PAGE_SIZE, there is only one data disk plus 2 parity disk,
as a result the chosed algorithm is not reliable. For example, on my arm64
machine with 64K page enabled, it will choose intx32 as the best one, although
the NEON implementation is better.

This patch tries to fix the problem by defining a constant raid6 disk number to
supporting arbitrary page size.

Suggested-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Zhengyuan Liu <liuzhengyuan@kylinos.cn>
Signed-off-by: Song Liu <songliubraving@fb.com>


# 5e5ac01c 19-Dec-2019 Zhengyuan Liu <liuzhengyuan@kylinos.cn>

raid6/test: fix a compilation warning

The compilation warning is redefination showed as following:

In file included from tables.c:2:
../../../include/linux/export.h:180: warning: "EXPORT_SYMBOL" redefined
#define EXPORT_SYMBOL(sym) __EXPORT_SYMBOL(sym, "")

In file included from tables.c:1:
../../../include/linux/raid/pq.h:61: note: this is the location of the previous definition
#define EXPORT_SYMBOL(sym)

Fixes: 69a94abb82ee ("export.h, genksyms: do not make genksyms calculate CRC of trimmed symbols")
Signed-off-by: Zhengyuan Liu <liuzhengyuan@kylinos.cn>
Signed-off-by: Song Liu <songliubraving@fb.com>


# 6b8651aa 19-Dec-2019 Zhengyuan Liu <liuzhengyuan@kylinos.cn>

raid6/test: fix a compilation error

The compilation error is redeclaration showed as following:

In file included from ../../../include/linux/limits.h:6,
from /usr/include/x86_64-linux-gnu/bits/local_lim.h:38,
from /usr/include/x86_64-linux-gnu/bits/posix1_lim.h:161,
from /usr/include/limits.h:183,
from /usr/lib/gcc/x86_64-linux-gnu/8/include-fixed/limits.h:194,
from /usr/lib/gcc/x86_64-linux-gnu/8/include-fixed/syslimits.h:7,
from /usr/lib/gcc/x86_64-linux-gnu/8/include-fixed/limits.h:34,
from ../../../include/linux/raid/pq.h:30,
from algos.c:14:
../../../include/linux/types.h:114:15: error: conflicting types for ‘int64_t’
typedef s64 int64_t;
^~~~~~~
In file included from /usr/include/stdint.h:34,
from /usr/lib/gcc/x86_64-linux-gnu/8/include/stdint.h:9,
from /usr/include/inttypes.h:27,
from ../../../include/linux/raid/pq.h:29,
from algos.c:14:
/usr/include/x86_64-linux-gnu/bits/stdint-intn.h:27:19: note: previous \
declaration of ‘int64_t’ was here
typedef __int64_t int64_t;

Fixes: 54d50897d544 ("linux/kernel.h: split *_MAX and *_MIN macros into <linux/limits.h>")
Signed-off-by: Zhengyuan Liu <liuzhengyuan@kylinos.cn>
Signed-off-by: Song Liu <songliubraving@fb.com>


# dd165a65 20-May-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 48

Based on 1 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation inc 53 temple place ste 330 boston ma
02111 1307 usa either version 2 of the license or at your option any
later version incorporated herein by reference

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 13 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190520170858.645641371@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# be85f93a 12-Nov-2018 Daniel Verkamp <dverkamp@chromium.org>

lib/raid6: add option to skip algo benchmarking

This is helpful for systems where fast startup time is important.
It is especially nice to avoid benchmarking RAID functions that are
never used (for example, BTRFS selects RAID6_PQ even if the parity RAID
mode is not in use).

This saves 250+ milliseconds of boot time on modern x86 and ARM systems
with a dozen or more available implementations.

The new option is defaulted to 'y' to match the previous behavior of
always benchmarking on init.

Signed-off-by: Daniel Verkamp <dverkamp@chromium.org>
Signed-off-by: Shaohua Li <shli@fb.com>


# 58af3110 12-Nov-2018 Daniel Verkamp <dverkamp@chromium.org>

lib/raid6: avoid __attribute_const__ redefinition

This is defined in glibc's sys/cdefs.h on my system with the same
definition as the raid6test fallback definition. Add a #ifndef check to
avoid a compiler warning about redefining it.

Signed-off-by: Daniel Verkamp <dverkamp@chromium.org>
Signed-off-by: Shaohua Li <shli@fb.com>


# e731f3e2 12-Nov-2018 Daniel Verkamp <dverkamp@chromium.org>

lib/raid6: add missing include for raid6test

Add #include <sys/time.h> for gettimeofday() to fix the compiler warning
about an implicitly defined functions.

Signed-off-by: Daniel Verkamp <dverkamp@chromium.org>
Signed-off-by: Shaohua Li <shli@fb.com>


# 889ce12b 09-Mar-2018 Arnd Bergmann <arnd@arndb.de>

raid: remove tile specific raid6 implementation

The Tile architecture is getting removed, so we no longer need this either.

Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>


# 751ba79c 03-Aug-2017 Matt Brown <matthew.brown.dev@gmail.com>

lib/raid6/altivec: Add vpermxor implementation for raid6 Q syndrome

This patch uses the vpermxor instruction to optimise the raid6 Q
syndrome. This instruction was made available with POWER8, ISA version
2.07. It allows for both vperm and vxor instructions to be done in a
single instruction. This has been tested for correctness on a ppc64le
vm with a basic RAID6 setup containing 5 drives.

The performance benchmarks are from the raid6test in the
/lib/raid6/test directory. These results are from an IBM Firestone
machine with ppc64le architecture. The benchmark results show a 35%
speed increase over the best existing algorithm for powerpc (altivec).
The raid6test has also been run on a big-endian ppc64 vm to ensure it
also works for big-endian architectures.

Performance benchmarks:
raid6: altivecx4 gen() 18773 MB/s
raid6: altivecx8 gen() 19438 MB/s

raid6: vpermxor4 gen() 25112 MB/s
raid6: vpermxor8 gen() 26279 MB/s

Signed-off-by: Matt Brown <matthew.brown.dev@gmail.com>
Reviewed-by: Daniel Axtens <dja@axtens.net>
[mpe: Add VPERMXOR macro so we can build with old binutils]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 6ec4e251 13-Jul-2017 Ard Biesheuvel <ardb@kernel.org>

md/raid6: implement recovery using ARM NEON intrinsics

Provide a NEON accelerated implementation of the recovery algorithm,
which supersedes the default byte-by-byte one.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# b5dceda1 14-May-2017 Anup Patel <anup.patel@broadcom.com>

lib/raid6: Add log-of-2 table for RAID6 HW requiring disk position

The raid6_gfexp table represents {2}^n values for 0 <= n < 256. The
Linux async_tx framework pass values from raid6_gfexp as coefficients
for each source to prep_dma_pq() callback of DMA channel with PQ
capability. This creates problem for RAID6 offload engines (such as
Broadcom SBA) which take disk position (i.e. log of {2}) instead of
multiplicative cofficients from raid6_gfexp table.

This patch adds raid6_gflog table having log-of-2 value for any given
x such that 0 <= x < 256. For any given disk coefficient x, the
corresponding disk position is given by raid6_gflog[x]. The RAID6
offload engine driver can use this newly added raid6_gflog table to
get disk position from multiplicative coefficient.

Signed-off-by: Anup Patel <anup.patel@broadcom.com>
Reviewed-by: Scott Branden <scott.branden@broadcom.com>
Reviewed-by: Ray Jui <ray.jui@broadcom.com>
Acked-by: Shaohua Li <shli@fb.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>


# 13c520b2 12-Aug-2016 Gayatri Kammela <gayatri.kammela@intel.com>

lib/raid6: Add AVX512 optimized recovery functions

Optimize RAID6 recovery functions to take advantage of
the 512-bit ZMM integer instructions introduced in AVX512.

AVX512 optimized recovery functions, which is simply based
on recov_avx2.c written by Jim Kukunas

This patch was tested and benchmarked before submission on
a hardware that has AVX512 flags to support such instructions

Cc: Jim Kukunas <james.t.kukunas@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Megha Dey <megha.dey@linux.intel.com>
Signed-off-by: Gayatri Kammela <gayatri.kammela@intel.com>
Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Shaohua Li <shli@fb.com>


# e0a491c1 12-Aug-2016 Gayatri Kammela <gayatri.kammela@intel.com>

lib/raid6: Add AVX512 optimized gen_syndrome functions

Optimize RAID6 gen_syndrom functions to take advantage of
the 512-bit ZMM integer instructions introduced in AVX512.

AVX512 optimized gen_syndrom functions, which is simply based
on avx2.c written by Yuanhan Liu and sse2.c written by hpa.

The patch was tested and benchmarked before submission on
a hardware that has AVX512 flags to support such instructions

Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jim Kukunas <james.t.kukunas@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Megha Dey <megha.dey@linux.intel.com>
Signed-off-by: Gayatri Kammela <gayatri.kammela@intel.com>
Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Shaohua Li <shli@fb.com>


# f5b55fa1 31-Aug-2016 Martin Schwidefsky <schwidefsky@de.ibm.com>

RAID/s390: provide raid6 recovery optimization

The XC instruction can be used to improve the speed of the raid6
recovery. The loops now operate on blocks of 256 bytes.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>


# 474fd6e8 23-Aug-2016 Martin Schwidefsky <schwidefsky@de.ibm.com>

RAID/s390: add SIMD implementation for raid6 gen/xor

Using vector registers is slightly faster:

raid6: vx128x8 gen() 19705 MB/s
raid6: vx128x8 xor() 11886 MB/s
raid6: using algorithm vx128x8 gen() 19705 MB/s
raid6: .... xor() 11886 MB/s, rmw enabled

vs the software algorithms:

raid6: int64x1 gen() 3018 MB/s
raid6: int64x1 xor() 1429 MB/s
raid6: int64x2 gen() 4661 MB/s
raid6: int64x2 xor() 3143 MB/s
raid6: int64x4 gen() 5392 MB/s
raid6: int64x4 xor() 3509 MB/s
raid6: int64x8 gen() 4441 MB/s
raid6: int64x8 xor() 3207 MB/s
raid6: using algorithm int64x4 gen() 5392 MB/s
raid6: .... xor() 3509 MB/s, rmw enabled

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>


# 6a84f572 21-Jan-2016 Gayatri Kammela <gayatri.kammela@intel.com>

raid6/algos.c : bug fix : Add the missing definitions to the pq.h file

Adding these pr_info and pr_err definitions so as to allow code to be
compiled successfully for testing in userspace, since the printk has
been replaced by pr_info and pr_err in algos.c

Absence of these definitions result in the compilation errors
such as ' undefined reference to `pr_info' ' ' undefined reference to
`pr_err' '

Cc: NeilBrown <neilb@suse.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Gayatri Kammela <gayatri.kammela@intel.com>
Signed-off-by: Shaohua Li <shli@fb.com>


# fe5cbc6e 14-Dec-2014 Markus Stockhausen <stockhausen@collogia.de>

md/raid6 algorithms: delta syndrome functions

v3: s-o-b comment, explanation of performance and descision for
the start/stop implementation

Implementing rmw functionality for RAID6 requires optimized syndrome
calculation. Up to now we can only generate a complete syndrome. The
target P/Q pages are always overwritten. With this patch we provide
a framework for inplace P/Q modification. In the first place simply
fill those functions with NULL values.

xor_syndrome() has two additional parameters: start & stop. These
will indicate the first and last page that are changing during a
rmw run. That makes it possible to avoid several unneccessary loops
and speed up calculation. The caller needs to implement the following
logic to make the functions work.

1) xor_syndrome(disks, start, stop, ...): "Remove" all data of source
blocks inside P/Q between (and including) start and end.

2) modify any block with start <= block <= stop

3) xor_syndrome(disks, start, stop, ...): "Reinsert" all data of
source blocks into P/Q between (and including) start and end.

Pages between start and stop that won't be changed should be filled
with a pointer to the kernel zero page. The reasons for not taking NULL
pages are:

1) Algorithms cross the whole source data line by line. Thus avoid
additional branches.

2) Having a NULL page avoids calculating the XOR P parity but still
need calulation steps for the Q parity. Depending on the algorithm
unrolling that might be only a difference of 2 instructions per loop.

The benchmark numbers of the gen_syndrome() functions are displayed in
the kernel log. Do the same for the xor_syndrome() functions. This
will help to analyze performance problems and give an rough estimate
how well the algorithm works. The choice of the fastest algorithm will
still depend on the gen_syndrome() performance.

With the start/stop page implementation the speed can vary a lot in real
life. E.g. a change of page 0 & page 15 on a stripe will be harder to
compute than the case where page 0 & page 1 are XOR candidates. To be not
to enthusiatic about the expected speeds we will run a worse case test
that simulates a change on the upper half of the stripe. So we do:

1) calculation of P/Q for the upper pages

2) continuation of Q for the lower (empty) pages

Signed-off-by: Markus Stockhausen <stockhausen@collogia.de>
Signed-off-by: NeilBrown <neilb@suse.de>


# ae77cbc1 06-Aug-2013 Ken Steele <ken@tilera.com>

RAID: add tilegx SIMD implementation of raid6

This change adds TILE-Gx SIMD instructions to the software raid
(md), modeling the Altivec implementation. This is only for Syndrome
generation; there is more that could be done to improve recovery,
as in the recent Intel SSE3 recovery implementation.

The code unrolls 8 times; this turns out to be the best on tilegx
hardware among the set 1, 2, 4, 8 or 16. The code reads one
cache-line of data from each disk, stores P and Q then goes to the
next cache-line.

The test code in sys/linux/lib/raid6/test reports 2008 MB/s data
read rate for syndrome generation using 18 disks (16 data and 2
parity). It was 1512 MB/s before this SIMD optimizations. This is
running on 1 core with all the data in cache.

This is based on the paper The Mathematics of RAID-6.
(http://kernel.org/pub/linux/kernel/people/hpa/raid6.pdf).

Signed-off-by: Ken Steele <ken@tilera.com>
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: NeilBrown <neilb@suse.de>


# 7d11965d 16-May-2013 Ard Biesheuvel <ardb@kernel.org>

lib/raid6: add ARM-NEON accelerated syndrome calculation

Rebased/reworked a patch contributed by Rob Herring that uses
NEON intrinsics to perform the RAID-6 syndrome calculations.
It uses the existing unroll.awk code to generate several
unrolled versions of which the best performing one is selected
at boot time.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
Cc: hpa@linux.intel.com


# 2c935842 30-Nov-2012 Yuanhan Liu <yuanhan.liu@linux.intel.com>

lib/raid6: Add AVX2 optimized gen_syndrome functions

Add AVX2 optimized gen_syndrom functions, which is simply based on
sse2.c written by hpa.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Jim Kukunas <james.t.kukunas@linux.intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>


# 7056741f 08-Nov-2012 Jim Kukunas <james.t.kukunas@linux.intel.com>

lib/raid6: Add AVX2 optimized recovery functions

Optimize RAID6 recovery functions to take advantage of
the 256-bit YMM integer instructions introduced in AVX2.

The patch was tested and benchmarked before submission.
However hardware is not yet released so benchmark numbers
cannot be reported.

Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Jim Kukunas <james.t.kukunas@linux.intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>


# 048a8b8c 21-May-2012 Jim Kukunas <james.t.kukunas@linux.intel.com>

lib/raid6: Add SSSE3 optimized recovery functions

Add SSSE3 optimized recovery functions, as well as a system
for selecting the most appropriate recovery functions to use.

Originally-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Jim Kukunas <james.t.kukunas@linux.intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>


# 38059ec2 22-Dec-2011 Steven Rostedt <rostedt@goodmis.org>

md: Fix userspace free_pages() macro

While using etags to find free_pages(), I stumbled across this debug
definition of free_pages() that is to be used while debugging some raid
code in userspace. The __get_free_pages() allocates the correct size,
but the free_pages() does not match. free_pages(), like
__get_free_pages(), takes an order and not a size.

Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: NeilBrown <neilb@suse.de>


# d5302fe4 11-Aug-2010 NeilBrown <neilb@suse.de>

Make lib/raid6/test build correctly.

Some bit-rot needs to be cleaned out.

Signed-off-by: NeilBrown <neilb@suse.de>


# 7820f9e1 13-Dec-2009 NeilBrown <neilb@suse.de>

md: remove sparse warning:symbol XXX was not declared.

Signed-off-by: NeilBrown <neilb@suse.de>


# f701d589 30-Mar-2009 Dan Williams <dan.j.williams@intel.com>

md/raid6: move raid6 data processing to raid6_pq.ko

Move the raid6 data processing routines into a standalone module
(raid6_pq) to prepare them to be called from async_tx wrappers and other
non-md drivers/modules. This precludes a circular dependency of raid456
needing the async modules for data processing while those modules in
turn depend on raid456 for the base level synchronous raid6 routines.

To support this move:
1/ The exportable definitions in raid6.h move to include/linux/raid/pq.h
2/ The raid6_call, recovery calls, and table symbols are exported
3/ Extra #ifdef __KERNEL__ statements to enable the userspace raid6test to
compile

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>