History log of /linux-master/arch/arm64/crypto/sm4-ce-core.S
Revision Date Author Comments
# 29fa12e9 16-Sep-2023 Herbert Xu <herbert@gondor.apana.org.au>

crypto: arm64/sm4 - Remove cfb(sm4)

Remove the unused CFB implementation.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>


# 6b5360a5 27-Oct-2022 Tianjia Zhang <tianjia.zhang@linux.alibaba.com>

crypto: arm64/sm4 - add CE implementation for cmac/xcbc/cbcmac

This patch is a CE-optimized assembly implementation for cmac/xcbc/cbcmac.

Benchmark on T-Head Yitian-710 2.75 GHz, the data comes from the 300 mode of
tcrypt, and compared the performance before and after this patch (the driver
used before this patch is XXXmac(sm4-ce)). The abscissas are blocks of
different lengths. The data is tabulated and the unit is Mb/s:

Before:

update-size | 16 64 256 1024 2048 4096 8192
---------------+--------------------------------------------------------
cmac(sm4-ce) | 293.33 403.69 503.76 527.78 531.10 535.46 535.81
xcbc(sm4-ce) | 292.83 402.50 504.02 529.08 529.87 536.55 538.24
cbcmac(sm4-ce) | 318.42 415.79 497.12 515.05 523.15 521.19 523.01

After:

update-size | 16 64 256 1024 2048 4096 8192
---------------+--------------------------------------------------------
cmac-sm4-ce | 371.99 675.28 903.56 971.65 980.57 990.40 991.04
xcbc-sm4-ce | 372.11 674.55 903.47 971.61 980.96 990.42 991.10
cbcmac-sm4-ce | 371.63 675.33 903.23 972.07 981.42 990.93 991.45

Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>


# 01f63311 27-Oct-2022 Tianjia Zhang <tianjia.zhang@linux.alibaba.com>

crypto: arm64/sm4 - add CE implementation for XTS mode

This patch is a CE-optimized assembly implementation for XTS mode.

Benchmark on T-Head Yitian-710 2.75 GHz, the data comes from the 218 mode of
tcrypt, and compared the performance before and after this patch (the driver
used before this patch is xts(ecb-sm4-ce)). The abscissas are blocks of
different lengths. The data is tabulated and the unit is Mb/s:

Before:

xts(ecb-sm4-ce) | 16 64 128 256 1024 1420 4096
----------------+--------------------------------------------------------------
XTS enc | 117.17 430.56 732.92 1134.98 2007.03 2136.23 2347.20
XTS dec | 116.89 429.02 733.40 1132.96 2006.13 2130.50 2347.92

After:

xts-sm4-ce | 16 64 128 256 1024 1420 4096
----------------+--------------------------------------------------------------
XTS enc | 224.68 798.91 1248.08 1714.60 2413.73 2467.84 2612.62
XTS dec | 229.85 791.34 1237.79 1720.00 2413.30 2473.84 2611.95

Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>


# b1863fd0 27-Oct-2022 Tianjia Zhang <tianjia.zhang@linux.alibaba.com>

crypto: arm64/sm4 - add CE implementation for CTS-CBC mode

This patch is a CE-optimized assembly implementation for CTS-CBC mode.

Benchmark on T-Head Yitian-710 2.75 GHz, the data comes from the 218 mode of
tcrypt, and compared the performance before and after this patch (the driver
used before this patch is cts(cbc-sm4-ce)). The abscissas are blocks of
different lengths. The data is tabulated and the unit is Mb/s:

Before:

cts(cbc-sm4-ce) | 16 64 128 256 1024 1420 4096
----------------+--------------------------------------------------------------
CTS-CBC enc | 286.09 297.17 457.97 627.75 868.58 900.80 957.69
CTS-CBC dec | 286.67 285.63 538.35 947.08 2241.03 2577.32 3391.14

After:

cts-cbc-sm4-ce | 16 64 128 256 1024 1420 4096
----------------+--------------------------------------------------------------
CTS-CBC enc | 288.19 428.80 593.57 741.04 911.73 931.80 950.00
CTS-CBC dec | 292.22 468.99 838.23 1380.76 2741.17 3036.42 3409.62

Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>


# cb9ba02b 27-Oct-2022 Tianjia Zhang <tianjia.zhang@linux.alibaba.com>

crypto: arm64/sm4 - simplify sm4_ce_expand_key() of CE implementation

Use a 128-bit swap mask and tbl instruction to simplify the implementation
for generating SM4 rkey_dec.

Also fixed the issue of not being wrapped by kernel_neon_begin/end() when
using the sm4_ce_expand_key() function.

Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>


# ce41fefd 27-Oct-2022 Tianjia Zhang <tianjia.zhang@linux.alibaba.com>

crypto: arm64/sm4 - refactor and simplify CE implementation

This patch does not add new features, but only refactors and simplifies the
implementation of the Crypto Extension acceleration of the SM4 algorithm:

Extract the macro optimized by SM4 Crypto Extension for reuse in the
subsequent optimization of CCM/GCM modes.

Encryption in CBC and CFB modes processes four blocks at a time instead of
one, allowing the ld1 instruction to load 64 bytes of data at a time, which
will reduces unnecessary memory accesses.

CBC/CFB/CTR makes full use of free registers to reduce redundant memory
accesses, and rearranges some instructions to improve out-of-order execution
capabilities.

Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>


# 5b33e0ec 15-Mar-2022 Tianjia Zhang <tianjia.zhang@linux.alibaba.com>

crypto: arm64/sm4 - add ARMv8 Crypto Extensions implementation

This adds ARMv8 implementations of SM4 in ECB, CBC, CFB and CTR
modes using Crypto Extensions, also includes key expansion operations
because the Crypto Extensions instruction is much faster than software
implementations.

The Crypto Extensions for SM4 can only run on ARMv8 implementations
that have support for these optional extensions.

Benchmark on T-Head Yitian-710 2.75 GHz, the data comes from the 218
mode of tcrypt. The abscissas are blocks of different lengths. The
data is tabulated and the unit is Mb/s:

sm4-generic | 16 64 128 256 1024 1420 4096
ECB enc | 80.05 91.42 93.66 94.77 95.69 95.77 95.86
ECB dec | 79.98 91.41 93.64 94.76 95.66 95.77 95.85
CBC enc | 78.55 86.50 88.02 88.77 89.36 89.42 89.48
CBC dec | 76.82 89.06 91.52 92.77 93.75 93.83 93.96
CFB enc | 77.64 86.13 87.62 88.42 89.08 88.83 89.18
CFB dec | 77.57 88.34 90.36 91.45 92.34 92.00 92.44
CTR enc | 77.80 88.28 90.23 91.22 92.11 91.81 92.25
CTR dec | 77.83 88.22 90.22 91.22 92.04 91.82 92.28
sm4-neon
ECB enc | 28.31 112.77 203.03 209.89 215.49 202.11 210.59
ECB dec | 28.36 113.45 203.23 210.00 215.52 202.13 210.65
CBC enc | 79.32 87.02 88.51 89.28 89.85 89.89 89.97
CBC dec | 28.29 112.20 203.30 209.82 214.99 201.51 209.95
CFB enc | 79.59 87.16 88.54 89.30 89.83 89.62 89.92
CFB dec | 28.12 111.05 202.47 209.02 214.21 210.90 209.12
CTR enc | 28.04 108.81 200.62 206.65 211.78 208.78 206.74
CTR dec | 28.02 108.82 200.45 206.62 211.78 208.74 206.70
sm4-ce-cipher
ECB enc | 336.79 587.13 682.70 747.37 803.75 811.52 818.06
ECB dec | 339.18 584.52 679.72 743.68 798.82 803.83 811.54
CBC enc | 316.63 521.47 597.00 647.14 690.82 695.21 700.55
CBC dec | 291.80 503.79 585.66 640.82 689.86 695.16 701.72
CFB enc | 294.79 482.31 552.13 594.71 631.60 628.91 638.92
CFB dec | 293.09 466.44 526.56 563.17 594.41 592.26 601.97
CTR enc | 309.61 506.13 576.86 620.47 656.38 654.51 665.10
CTR dec | 306.69 505.57 576.84 620.18 657.09 654.52 665.32
sm4-ce
ECB enc | 366.96 1329.81 2024.29 2755.50 3790.07 3861.91 4051.40
ECB dec | 367.30 1323.93 2018.72 2747.43 3787.39 3862.55 4052.62
CBC enc | 358.09 682.68 807.24 885.35 958.29 963.60 973.73
CBC dec | 366.51 1303.63 1978.64 2667.93 3624.53 3683.41 3856.08
CFB enc | 351.51 681.26 807.81 893.10 968.54 969.17 985.83
CFB dec | 354.98 1266.61 1929.63 2634.81 3614.23 3611.59 3841.68
CTR enc | 324.23 1121.25 1689.44 2256.70 2981.90 3007.79 3060.74
CTR dec | 324.18 1120.44 1694.31 2258.32 2982.01 3010.09 3060.99

Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>


# 0e89640b 13-Dec-2019 Mark Brown <broonie@kernel.org>

crypto: arm64 - Use modern annotations for assembly functions

In an effort to clarify and simplify the annotation of assembly functions
in the kernel new macros have been introduced. These replace ENTRY and
ENDPROC and also add a new annotation for static functions which previously
had no ENTRY equivalent. Update the annotations in the crypto code to the
new macros.

There are a small number of files imported from OpenSSL where the assembly
is generated using perl programs, these are not currently annotated at all
and have not been modified.

Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>


# e99ce921 25-Apr-2018 Ard Biesheuvel <ardb@kernel.org>

crypto: arm64 - add support for SM4 encryption using special instructions

Add support for the SM4 symmetric cipher implemented using the special
SM4 instructions introduced in ARM architecture revision 8.2.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>