Cross Reference: /linux-master/crypto/aegis128-neon-inner.c

History log of /linux-master/crypto/aegis128-neon-inner.c
Revision	Date	Author	Comments
# 4e3901fa	16-May-2023	Arnd Bergmann <arnd@arndb.de>	crypto: aegis128-neon - add header for internal prototypes gcc warns if prototypes are only visible to the caller but not the callee: crypto/aegis128-neon-inner.c:134:6: warning: no previous prototype for 'crypto_aegis128_init_neon' [-Wmissing-prototypes] crypto/aegis128-neon-inner.c:164:6: warning: no previous prototype for 'crypto_aegis128_update_neon' [-Wmissing-prototypes] crypto/aegis128-neon-inner.c:221:6: warning: no previous prototype for 'crypto_aegis128_encrypt_chunk_neon' [-Wmissing-prototypes] crypto/aegis128-neon-inner.c:270:6: warning: no previous prototype for 'crypto_aegis128_decrypt_chunk_neon' [-Wmissing-prototypes] crypto/aegis128-neon-inner.c:316:5: warning: no previous prototype for 'crypto_aegis128_final_neon' [-Wmissing-prototypes] The prototypes cannot be in the regular aegis.h, as the inner neon code cannot include normal kernel headers. Instead add a new header just for the functions provided by this file. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
# 97b70180	17-Nov-2020	Ard Biesheuvel <ardb@kernel.org>	crypto: aegis128/neon - move final tag check to SIMD domain Instead of calculating the tag and returning it to the caller on decryption, use a SIMD compare and min across vector to perform the comparison. This is slightly more efficient, and removes the need on the caller's part to wipe the tag from memory if the decryption failed. While at it, switch to unsigned int when passing cryptlen and assoclen - we don't support input sizes where it matters anyway. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Ondrej Mosnacek <omosnacek@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
# ad00d41b	17-Nov-2020	Ard Biesheuvel <ardb@kernel.org>	crypto: aegis128/neon - optimize tail block handling Avoid copying the tail block via a stack buffer if the total size exceeds a single AEGIS block. In this case, we can use overlapping loads and stores and NEON permutation instructions instead, which leads to a modest performance improvement on some cores (< 5%), and is slightly cleaner. Note that we still need to use a stack buffer if the entire input is smaller than 16 bytes, given that we cannot use 16 byte NEON loads and stores safely in this case. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Ondrej Mosnacek <omosnacek@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
# 52828263	14-Oct-2019	Ard Biesheuvel <ardb@kernel.org>	crypto: aegis128 - duplicate init() and final() hooks in SIMD code In order to speed up aegis128 processing even more, duplicate the init() and final() routines as SIMD versions in their entirety. This results in a 2x speedup on ARM Cortex-A57 for ~1500 byte packets (using AES instructions). Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
# 389139b3	19-Aug-2019	Ard Biesheuvel <ardb@kernel.org>	crypto: arm64/aegis128 - use explicit vector load for permute vectors When building the new aegis128 NEON code in big endian mode, Clang complains about the const uint8x16_t permute vectors in the following way: crypto/aegis128-neon-inner.c:58:40: warning: vector initializers are not compatible with NEON intrinsics in big endian mode [-Wnonportable-vector-initialization] static const uint8x16_t shift_rows = { ^ crypto/aegis128-neon-inner.c:58:40: note: consider using vld1q_u8() to initialize a vector from memory, or vcombine_u8(vcreate_u8(), vcreate_u8()) to initialize from integer constants Since the same issue applies to the uint8x16x4_t loads of the AES Sbox, update those references as well. However, since GCC does not implement the vld1q_u8_x4() intrinsic, switch from IS_ENABLED() to a preprocessor conditional to conditionally include this code. Reported-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Tested-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
# 19842963	11-Aug-2019	Ard Biesheuvel <ardb@kernel.org>	crypto: arm64/aegis128 - implement plain NEON version Provide a version of the core AES transform to the aegis128 SIMD code that does not rely on the special AES instructions, but uses plain NEON instructions instead. This allows the SIMD version of the aegis128 driver to be used on arm64 systems that do not implement those instructions (which are not mandatory in the architecture), such as the Raspberry Pi 3. Since GCC makes a mess of this when using the tbl/tbx intrinsics to perform the sbox substitution, preload the Sbox into v16..v31 in this case and use inline asm to emit the tbl/tbx instructions. Clang does not support this approach, nor does it require it, since it does a much better job at code generation, so there we use the intrinsics as usual. Cc: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
# a4397635	11-Aug-2019	Ard Biesheuvel <ardb@kernel.org>	crypto: aegis128 - provide a SIMD implementation based on NEON intrinsics Provide an accelerated implementation of aegis128 by wiring up the SIMD hooks in the generic driver to an implementation based on NEON intrinsics, which can be compiled to both ARM and arm64 code. This results in a performance of 2.2 cycles per byte on Cortex-A53, which is a performance increase of ~11x compared to the generic code. Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
# c9f1fd4f	01-Aug-2019	Herbert Xu <herbert@gondor.apana.org.au>	Revert "crypto: aegis128 - add support for SIMD acceleration" This reverts commit ecc8bc81f2fb3976737ef312f824ba6053aa3590 ("crypto: aegis128 - provide a SIMD implementation based on NEON intrinsics") and commit 7cdc0ddbf74a19cecb2f0e9efa2cae9d3c665189 ("crypto: aegis128 - add support for SIMD acceleration"). They cause compile errors on platforms other than ARM because the mechanism to selectively compile the SIMD code is broken. Repoted-by: Heiko Carstens <heiko.carstens@de.ibm.com> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
# ecc8bc81	03-Jul-2019	Ard Biesheuvel <ardb@kernel.org>	crypto: aegis128 - provide a SIMD implementation based on NEON intrinsics Provide an accelerated implementation of aegis128 by wiring up the SIMD hooks in the generic driver to an implementation based on NEON intrinsics, which can be compiled to both ARM and arm64 code. This results in a performance of 2.2 cycles per byte on Cortex-A53, which is a performance increase of ~11x compared to the generic code. Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>