History log of /linux-master/arch/arm64/crypto/crct10dif-ce-glue.c
Revision Date Author Comments
# fc754c02 02-Feb-2021 Ard Biesheuvel <ardb@kernel.org>

crypto: arm64/crc-t10dif - move NEON yield to C code

Instead of yielding from the bowels of the asm routine if a reschedule
is needed, divide up the input into 4 KB chunks in the C glue. This
simplifies the code substantially, and avoids scheduling out the task
with the asm routine on the call stack, which is undesirable from a
CFI/instrumentation point of view.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>


# d2912cb1 04-Jun-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500

Based on 2 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation #

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-only

has been chosen to replace the boilerplate/reference in 4122 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Enrico Weigelt <info@metux.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# aaba098f 09-Apr-2019 Andrew Murray <amurray@thegoodpenguin.co.uk>

arm64: HWCAP: add support for AT_HWCAP2

As we will exhaust the first 32 bits of AT_HWCAP let's start
exposing AT_HWCAP2 to userspace to give us up to 64 caps.

Whilst it's possible to use the remaining 32 bits of AT_HWCAP, we
prefer to expand into AT_HWCAP2 in order to provide a consistent
view to userspace between ILP32 and LP64. However internal to the
kernel we prefer to continue to use the full space of elf_hwcap.

To reduce complexity and allow for future expansion, we now
represent hwcaps in the kernel as ordinals and use a
KERNEL_HWCAP_ prefix. This allows us to support automatic feature
based module loading for all our hwcaps.

We introduce cpu_set_feature to set hwcaps which complements the
existing cpu_have_feature helper. These helpers allow us to clean
up existing direct uses of elf_hwcap and reduce any future effort
required to move beyond 64 caps.

For convenience we also introduce cpu_{have,set}_named_feature which
makes use of the cpu_feature macro to allow providing a hwcap name
without a {KERNEL_}HWCAP_ prefix.

Signed-off-by: Andrew Murray <andrew.murray@arm.com>
[will: use const_ilog2() and tweak documentation]
Signed-off-by: Will Deacon <will.deacon@arm.com>


# e52b7023 12-Mar-2019 Eric Biggers <ebiggers@google.com>

crypto: arm64 - convert to use crypto_simd_usable()

Replace all calls to may_use_simd() in the arm64 crypto code with
crypto_simd_usable(), in order to allow testing the no-SIMD code paths.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>


# 6227cd12 30-Jan-2019 Eric Biggers <ebiggers@google.com>

crypto: arm64/crct10dif-ce - cleanup and optimizations

The x86, arm, and arm64 asm implementations of crct10dif are very
difficult to understand partly because many of the comments, labels, and
macros are named incorrectly: the lengths mentioned are usually off by a
factor of two from the actual code. Many other things are unnecessarily
convoluted as well, e.g. there are many more fold constants than
actually needed and some aren't fully reduced.

This series therefore cleans up all these implementations to be much
more maintainable. I also made some small optimizations where I saw
opportunities, resulting in slightly better performance.

This patch cleans up the arm64 version.

Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>


# 8336bdf1 25-Jan-2019 Ard Biesheuvel <ardb@kernel.org>

crypto: arm64/crct10dif - register PMULL variants as separate algos

The arm64 CRC-T10DIF implementation either uses 8-bit or 64-bit
polynomial multiplication instructions, since the latter are
faster but not mandatory in the architecture.

Since that prevents us from testing both implementations on the
same system, let's expose both implementations to the crypto API,
with the priorities reflecting that the P64 version is the
preferred one if available.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>


# d72b9d4a 27-Jan-2019 Ard Biesheuvel <ardb@kernel.org>

crypto: arm64/crct10dif - revert to C code for short inputs

The SIMD routine ported from x86 used to have a special code path
for inputs < 16 bytes, which got lost somewhere along the way.
Instead, the current glue code aligns the input pointer to 16 bytes,
which is not really necessary on this architecture (although it
could be beneficial to performance to expose aligned data to the
the NEON routine), but this could result in inputs of less than
16 bytes to be passed in. This not only fails the new extended
tests that Eric has implemented, it also results in the code
reading past the end of the input, which could potentially result
in crashes when dealing with less than 16 bytes of input at the
end of a page which is followed by an unmapped page.

So update the glue code to only invoke the NEON routine if the
input is at least 16 bytes.

Reported-by: Eric Biggers <ebiggers@kernel.org>
Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Fixes: 6ef5737f3931 ("crypto: arm64/crct10dif - port x86 SSE implementation to arm64")
Cc: <stable@vger.kernel.org> # v4.10+
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>


# 2fffee53 27-Aug-2018 Ard Biesheuvel <ardb@kernel.org>

crypto: arm64/crct10dif - implement non-Crypto Extensions alternative

The arm64 implementation of the CRC-T10DIF algorithm uses the 64x64 bit
polynomial multiplication instructions, which are optional in the
architecture, and if these instructions are not available, we fall back
to the C routine which is slow and inefficient.

So let's reuse the 64x64 bit PMULL alternative from the GHASH driver that
uses a sequence of ~40 instructions involving 8x8 bit PMULL and some
shifting and masking. This is a lot slower than the original, but it is
still twice as fast as the current [unoptimized] C code on Cortex-A53,
and it is time invariant and much easier on the D-cache.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>


# 6c1b0da1 27-Aug-2018 Ard Biesheuvel <ardb@kernel.org>

crypto: arm64/crct10dif - preparatory refactor for 8x8 PMULL version

Reorganize the CRC-T10DIF asm routine so we can easily instantiate an
alternative version based on 8x8 polynomial multiplication in a
subsequent patch.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>


# 2dde374e 24-Jul-2017 Ard Biesheuvel <ardb@kernel.org>

crypto: arm64/crct10dif - add non-SIMD generic fallback

The arm64 kernel will shortly disallow nested kernel mode NEON, so
add a fallback to scalar C code that can be invoked in that case.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>


# 6ef5737f 05-Dec-2016 Ard Biesheuvel <ardb@kernel.org>

crypto: arm64/crct10dif - port x86 SSE implementation to arm64

This is a transliteration of the Intel algorithm implemented
using SSE and PCLMULQDQ instructions that resides in the file
arch/x86/crypto/crct10dif-pcl-asm_64.S, but simplified to only
operate on buffers that are 16 byte aligned (but of any size)

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>