History log of /linux-master/arch/loongarch/include/asm/string.h
Revision Date Author Comments
# 5aa4ac64 06-Sep-2023 Qing Zhang <zhangqing@loongson.cn>

LoongArch: Add KASAN (Kernel Address Sanitizer) support

1/8 of kernel addresses reserved for shadow memory. But for LoongArch,
There are a lot of holes between different segments and valid address
space (256T available) is insufficient to map all these segments to kasan
shadow memory with the common formula provided by kasan core, saying
(addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET

So LoongArch has a arch-specific mapping formula, different segments are
mapped individually, and only limited space lengths of these specific
segments are mapped to shadow.

At early boot stage the whole shadow region populated with just one
physical page (kasan_early_shadow_page). Later, this page is reused as
readonly zero shadow for some memory that kasan currently don't track.
After mapping the physical memory, pages for shadow memory are allocated
and mapped.

Functions like memset()/memcpy()/memmove() do a lot of memory accesses.
If bad pointer passed to one of these function it is important to be
caught. Compiler's instrumentation cannot do this since these functions
are written in assembly.

KASan replaces memory functions with manually instrumented variants.
Original functions declared as weak symbols so strong definitions in
mm/kasan/kasan.c could replace them. Original functions have aliases
with '__' prefix in names, so we could call non-instrumented variant
if needed.

Signed-off-by: Qing Zhang <zhangqing@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>


# a275a82d 10-Dec-2022 Huacai Chen <chenhuacai@kernel.org>

LoongArch: Use alternative to optimize libraries

Use the alternative to optimize common libraries according whether CPU
has UAL (hardware unaligned access support) feature, including memset(),
memcopy(), memmove(), copy_user() and clear_user().

We have tested UnixBench on a Loongson-3A5000 quad-core machine (1.6GHz):

1, One copy, before patch:

System Benchmarks Index Values BASELINE RESULT INDEX
Dhrystone 2 using register variables 116700.0 9566582.0 819.8
Double-Precision Whetstone 55.0 2805.3 510.1
Execl Throughput 43.0 2120.0 493.0
File Copy 1024 bufsize 2000 maxblocks 3960.0 209833.0 529.9
File Copy 256 bufsize 500 maxblocks 1655.0 89400.0 540.2
File Copy 4096 bufsize 8000 maxblocks 5800.0 320036.0 551.8
Pipe Throughput 12440.0 340624.0 273.8
Pipe-based Context Switching 4000.0 109939.1 274.8
Process Creation 126.0 4728.7 375.3
Shell Scripts (1 concurrent) 42.4 2223.1 524.3
Shell Scripts (8 concurrent) 6.0 883.1 1471.9
System Call Overhead 15000.0 518639.1 345.8
========
System Benchmarks Index Score 500.2

2, One copy, after patch:

System Benchmarks Index Values BASELINE RESULT INDEX
Dhrystone 2 using register variables 116700.0 9567674.7 819.9
Double-Precision Whetstone 55.0 2805.5 510.1
Execl Throughput 43.0 2392.7 556.4
File Copy 1024 bufsize 2000 maxblocks 3960.0 417804.0 1055.1
File Copy 256 bufsize 500 maxblocks 1655.0 112909.5 682.2
File Copy 4096 bufsize 8000 maxblocks 5800.0 1255207.4 2164.2
Pipe Throughput 12440.0 555712.0 446.7
Pipe-based Context Switching 4000.0 99964.5 249.9
Process Creation 126.0 5192.5 412.1
Shell Scripts (1 concurrent) 42.4 2302.4 543.0
Shell Scripts (8 concurrent) 6.0 919.6 1532.6
System Call Overhead 15000.0 511159.3 340.8
========
System Benchmarks Index Score 640.1

3, Four copies, before patch:

System Benchmarks Index Values BASELINE RESULT INDEX
Dhrystone 2 using register variables 116700.0 38268610.5 3279.2
Double-Precision Whetstone 55.0 11222.2 2040.4
Execl Throughput 43.0 7892.0 1835.3
File Copy 1024 bufsize 2000 maxblocks 3960.0 235149.6 593.8
File Copy 256 bufsize 500 maxblocks 1655.0 74959.6 452.9
File Copy 4096 bufsize 8000 maxblocks 5800.0 545048.5 939.7
Pipe Throughput 12440.0 1337359.0 1075.0
Pipe-based Context Switching 4000.0 473663.9 1184.2
Process Creation 126.0 17491.2 1388.2
Shell Scripts (1 concurrent) 42.4 6865.7 1619.3
Shell Scripts (8 concurrent) 6.0 1015.9 1693.1
System Call Overhead 15000.0 1899535.2 1266.4
========
System Benchmarks Index Score 1278.3

4, Four copies, after patch:

System Benchmarks Index Values BASELINE RESULT INDEX
Dhrystone 2 using register variables 116700.0 38272815.5 3279.6
Double-Precision Whetstone 55.0 11222.8 2040.5
Execl Throughput 43.0 8839.2 2055.6
File Copy 1024 bufsize 2000 maxblocks 3960.0 313912.9 792.7
File Copy 256 bufsize 500 maxblocks 1655.0 80976.1 489.3
File Copy 4096 bufsize 8000 maxblocks 5800.0 1176594.3 2028.6
Pipe Throughput 12440.0 2100941.9 1688.9
Pipe-based Context Switching 4000.0 476696.4 1191.7
Process Creation 126.0 18394.7 1459.9
Shell Scripts (1 concurrent) 42.4 7172.2 1691.6
Shell Scripts (8 concurrent) 6.0 1058.3 1763.9
System Call Overhead 15000.0 1874714.7 1249.8
========
System Benchmarks Index Score 1488.8

Signed-off-by: Jun Yi <yijun@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>


# 559671e0 31-May-2022 Huacai Chen <chenhuacai@kernel.org>

LoongArch: Add some library functions

Add some library functions for LoongArch, including: delay, memset,
memcpy, memmove, copy_user, strncpy_user, strnlen_user and tlb dump
functions.

Reviewed-by: WANG Xuerui <git@xen0n.name>
Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>