#
bc61614d |
|
02-Sep-2023 |
Ammar Faizi <ammarfaizi2@gnuweeb.org> |
tools/nolibc: string: Remove the `_nolibc_memcpy_up()` function This function is only called by memcpy(), there is no real reason to have this wrapper. Delete this function and move the code to memcpy() directly. Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org> Reviewed-by: Alviro Iskandar Setiawan <alviro.iskandar@gnuweeb.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
|
#
5dfc79b2 |
|
02-Sep-2023 |
Ammar Faizi <ammarfaizi2@gnuweeb.org> |
tools/nolibc: string: Remove the `_nolibc_memcpy_down()` function This nolibc internal function is not used. Delete it. It was probably supposed to handle memmove(), but today the memmove() has its own implementation. Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org> Reviewed-by: Alviro Iskandar Setiawan <alviro.iskandar@gnuweeb.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
|
#
12108aa8 |
|
02-Sep-2023 |
Ammar Faizi <ammarfaizi2@gnuweeb.org> |
tools/nolibc: x86-64: Use `rep stosb` for `memset()` Simplify memset() on the x86-64 arch. The x86-64 arch has a 'rep stosb' instruction, which can perform memset() using only a single instruction, given: %al = value (just like the second argument of memset()) %rdi = destination %rcx = length Before this patch: ``` 00000000000010c9 <memset>: 10c9: 48 89 f8 mov %rdi,%rax 10cc: 48 85 d2 test %rdx,%rdx 10cf: 74 0e je 10df <memset+0x16> 10d1: 31 c9 xor %ecx,%ecx 10d3: 40 88 34 08 mov %sil,(%rax,%rcx,1) 10d7: 48 ff c1 inc %rcx 10da: 48 39 ca cmp %rcx,%rdx 10dd: 75 f4 jne 10d3 <memset+0xa> 10df: c3 ret ``` After this patch: ``` 0000000000001511 <memset>: 1511: 96 xchg %eax,%esi 1512: 48 89 d1 mov %rdx,%rcx 1515: 57 push %rdi 1516: f3 aa rep stos %al,%es:(%rdi) 1518: 58 pop %rax 1519: c3 ret ``` v2: - Use pushq %rdi / popq %rax (Alviro). - Use xchg %eax, %esi (Willy). Link: https://lore.kernel.org/lkml/ZO9e6h2jjVIMpBJP@1wt.eu Suggested-by: Alviro Iskandar Setiawan <alviro.iskandar@gnuweeb.org> Suggested-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org> Reviewed-by: Alviro Iskandar Setiawan <alviro.iskandar@gnuweeb.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
|
#
553845ee |
|
02-Sep-2023 |
Ammar Faizi <ammarfaizi2@gnuweeb.org> |
tools/nolibc: x86-64: Use `rep movsb` for `memcpy()` and `memmove()` Simplify memcpy() and memmove() on the x86-64 arch. The x86-64 arch has a 'rep movsb' instruction, which can perform memcpy() using only a single instruction, given: %rdi = destination %rsi = source %rcx = length Additionally, it can also handle the overlapping case by setting DF=1 (backward copy), which can be used as the memmove() implementation. Before this patch: ``` 00000000000010ab <memmove>: 10ab: 48 89 f8 mov %rdi,%rax 10ae: 31 c9 xor %ecx,%ecx 10b0: 48 39 f7 cmp %rsi,%rdi 10b3: 48 83 d1 ff adc $0xffffffffffffffff,%rcx 10b7: 48 85 d2 test %rdx,%rdx 10ba: 74 25 je 10e1 <memmove+0x36> 10bc: 48 83 c9 01 or $0x1,%rcx 10c0: 48 39 f0 cmp %rsi,%rax 10c3: 48 c7 c7 ff ff ff ff mov $0xffffffffffffffff,%rdi 10ca: 48 0f 43 fa cmovae %rdx,%rdi 10ce: 48 01 cf add %rcx,%rdi 10d1: 44 8a 04 3e mov (%rsi,%rdi,1),%r8b 10d5: 44 88 04 38 mov %r8b,(%rax,%rdi,1) 10d9: 48 01 cf add %rcx,%rdi 10dc: 48 ff ca dec %rdx 10df: 75 f0 jne 10d1 <memmove+0x26> 10e1: c3 ret 00000000000010e2 <memcpy>: 10e2: 48 89 f8 mov %rdi,%rax 10e5: 48 85 d2 test %rdx,%rdx 10e8: 74 12 je 10fc <memcpy+0x1a> 10ea: 31 c9 xor %ecx,%ecx 10ec: 40 8a 3c 0e mov (%rsi,%rcx,1),%dil 10f0: 40 88 3c 08 mov %dil,(%rax,%rcx,1) 10f4: 48 ff c1 inc %rcx 10f7: 48 39 ca cmp %rcx,%rdx 10fa: 75 f0 jne 10ec <memcpy+0xa> 10fc: c3 ret ``` After this patch: ``` // memmove is an alias for memcpy 000000000040133b <memcpy>: 40133b: 48 89 d1 mov %rdx,%rcx 40133e: 48 89 f8 mov %rdi,%rax 401341: 48 89 fa mov %rdi,%rdx 401344: 48 29 f2 sub %rsi,%rdx 401347: 48 39 ca cmp %rcx,%rdx 40134a: 72 03 jb 40134f <memcpy+0x14> 40134c: f3 a4 rep movsb %ds:(%rsi),%es:(%rdi) 40134e: c3 ret 40134f: 48 8d 7c 0f ff lea -0x1(%rdi,%rcx,1),%rdi 401354: 48 8d 74 0e ff lea -0x1(%rsi,%rcx,1),%rsi 401359: fd std 40135a: f3 a4 rep movsb %ds:(%rsi),%es:(%rdi) 40135c: fc cld 40135d: c3 ret ``` v3: - Make memmove as an alias for memcpy (Willy). - Make the forward copy the likely case (Alviro). v2: - Fix the broken memmove implementation (David). Link: https://lore.kernel.org/lkml/20230902062237.GA23141@1wt.eu Link: https://lore.kernel.org/lkml/5a821292d96a4dbc84c96ccdc6b5b666@AcuMS.aculab.com Suggested-by: David Laight <David.Laight@aculab.com> Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
|
#
7f291cfa |
|
06-Apr-2023 |
Thomas Weißschuh <linux@weissschuh.net> |
tools/nolibc: use standard __asm__ statements Most of the code was migrated to C99-conformant __asm__ statements before. It seems string.h was missed. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
1bfbe1f3 |
|
09-Jan-2023 |
Willy Tarreau <w@1wt.eu> |
tools/nolibc: prevent gcc from making memset() loop over itself When building on ARM in thumb mode with gcc-11.3 at -O2 or -O3, nolibc-test segfaults during the select() tests. It turns out that at this level, gcc recognizes an opportunity for using memset() to zero the fd_set, but it miscompiles it because it also recognizes a memset pattern as well, and decides to call memset() from the memset() code: 000122bc <memset>: 122bc: b510 push {r4, lr} 122be: 0004 movs r4, r0 122c0: 2a00 cmp r2, #0 122c2: d003 beq.n 122cc <memset+0x10> 122c4: 23ff movs r3, #255 ; 0xff 122c6: 4019 ands r1, r3 122c8: f7ff fff8 bl 122bc <memset> 122cc: 0020 movs r0, r4 122ce: bd10 pop {r4, pc} Simply placing an empty asm() statement inside the loop suffices to avoid this. Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
55abdd1f |
|
09-Jan-2023 |
Willy Tarreau <w@1wt.eu> |
tools/nolibc: fix missing includes causing build issues at -O0 After the nolibc includes were split to facilitate portability from standard libcs, programs that include only what they need may miss some symbols which are needed by libgcc. This is the case for raise() which is needed by the divide by zero code in some architectures for example. Regardless, being able to include only the apparently needed files is convenient. Instead of trying to move all exported definitions to a single file, since this can change over time, this patch takes another approach consisting in including the nolibc header at the end of all standard include files. This way their types and functions are already known at the moment of inclusion, and including any single one of them is sufficient to bring all the required ones. Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
b3f4f51e |
|
21-Oct-2022 |
Rasmus Villemoes <linux@rasmusvillemoes.dk> |
tools/nolibc/string: Fix memcmp() implementation The C standard says that memcmp() must treat the buffers as consisting of "unsigned chars". If char happens to be unsigned, the casts are ok, but then obviously the c1 variable can never contain a negative value. And when char is signed, the casts are wrong, and there's still a problem with using an 8-bit quantity to hold the difference, because that can range from -255 to +255. For example, assuming char is signed, comparing two 1-byte buffers, one containing 0x00 and another 0x80, the current implementation would return -128 for both memcmp(a, b, 1) and memcmp(b, a, 1), whereas one of those should of course return something positive. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Fixes: 66b6f755ad45 ("rcutorture: Import a copy of nolibc") Cc: stable@vger.kernel.org # v5.0+ Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
bfc3b0f0 |
|
09-Oct-2022 |
Willy Tarreau <w@1wt.eu> |
tools/nolibc: Fix missing strlen() definition and infinite loop with gcc-12 When built at -Os, gcc-12 recognizes an strlen() pattern in nolibc_strlen() and replaces it with a jump to strlen(), which is not defined as a symbol and breaks compilation. Worse, when the function is called strlen(), the function is simply replaced with a jump to itself, hence becomes an infinite loop. One way to avoid this is to always set -ffreestanding, but the calling code doesn't know this and there's no way (either via attributes or pragmas) to globally enable it from include files, effectively leaving a painful situation for the caller. Alexey suggested to place an empty asm() statement inside the loop to stop gcc from recognizing a well-known pattern, which happens to work pretty fine. At least it allows us to make sure our local definition is not replaced with a self jump. The function only needs to be renamed back to strlen() so that the symbol exists, which implies that nolibc_strlen() which is used on variable strings has to be declared as a macro that points back to it before the strlen() macro is redifined. It was verified to produce valid code with gcc 3.4 to 12.1 at different optimization levels, and both with constant and variable strings. In case this problem surfaces again in the future, an alternate approach consisting in adding an optimize("no-tree-loop-distribute-patterns") function attribute for gcc>=12 worked as well but is less pretty. Reported-by: kernel test robot <yujie.liu@intel.com> Link: https://lore.kernel.org/r/202210081618.754a77db-yujie.liu@intel.com Fixes: 66b6f755ad45 ("rcutorture: Import a copy of nolibc") Fixes: 96980b833a21 ("tools/nolibc/string: do not use __builtin_strlen() at -O0") Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
11dbdaef |
|
29-Mar-2022 |
Ammar Faizi <ammarfaizi2@gnuweeb.org> |
tools/nolibc/string: Implement `strdup()` and `strndup()` These functions are currently only available on architectures that have my_syscall6() macro implemented. Since these functions use malloc(), malloc() uses mmap(), mmap() depends on my_syscall6() macro. On architectures that don't support my_syscall6(), these function will always return NULL with errno set to ENOSYS. Acked-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
b26823c1 |
|
29-Mar-2022 |
Ammar Faizi <ammarfaizi2@gnuweeb.org> |
tools/nolibc/string: Implement `strnlen()` size_t strnlen(const char *str, size_t maxlen); The strnlen() function returns the number of bytes in the string pointed to by sstr, excluding the terminating null byte ('\0'), but at most maxlen. In doing this, strnlen() looks only at the first maxlen characters in the string pointed to by str and never beyond str[maxlen-1]. The first use case of this function is for determining the memory allocation size in the strndup() function. Link: https://lore.kernel.org/lkml/CAOG64qMpEMh+EkOfjNdAoueC+uQyT2Uv3689_sOr37-JxdJf4g@mail.gmail.com Suggested-by: Alviro Iskandar Setiawan <alviro.iskandar@gnuweeb.org> Acked-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
96980b83 |
|
23-Mar-2022 |
Willy Tarreau <w@1wt.eu> |
tools/nolibc/string: do not use __builtin_strlen() at -O0 clang wants to use strlen() for __builtin_strlen() at -O0. We don't really care about -O0 but it at least ought to build, so let's make sure we don't choke on this, by dropping the optimizationn for constant strings in this case. Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
0e7b4929 |
|
21-Mar-2022 |
Willy Tarreau <w@1wt.eu> |
tools/nolibc/string: add strcmp() and strncmp() We need these functions all the time, including when checking environment variables and parsing command-line arguments. These implementations were optimized to show optimal code size on a wide range of compilers (22 bytes return included for strcmp(), 33 for strncmp()). Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
8d304a37 |
|
07-Feb-2022 |
Willy Tarreau <w@1wt.eu> |
tools/nolibc/string: export memset() and memmove() "clang -Os" and "gcc -Ofast" without -ffreestanding may ignore memset() and memmove(), hoping to provide their builtin equivalents, and finally not find them. Thus we must export these functions for these rare cases. Note that as they're set in their own sections, they will be eliminated by the linker if not used. In addition, they do not prevent gcc from identifying them and replacing them with the shorter "rep movsb" or "rep stosb" when relevant. Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
07f47ea0 |
|
07-Feb-2022 |
Willy Tarreau <w@1wt.eu> |
tools/nolibc: move exported functions to their own section Some functions like raise() and memcpy() are permanently exported because they're needed by libgcc on certain platforms. However most of the time they are not needed and needlessly take space. Let's move them to their own sub-section, called .text.nolibc_<function>. This allows ld to get rid of them if unused when passed --gc-sections. Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
d9390de6 |
|
07-Feb-2022 |
Willy Tarreau <w@1wt.eu> |
tools/nolibc/string: add tiny versions of strncat() and strlcat() While these functions are often dangerous, forcing the user to work around their absence is often much worse. Let's provide small versions of each of them. The respective sizes in bytes on a few architectures are: strncat(): x86:0x33 mips:0x68 arm:0x3c strlcat(): x86:0x25 mips:0x4c arm:0x2c The two are quite different, and strncat() is even different from strncpy() in that it limits the amount of data it copies and will always terminate the output by one zero, while strlcat() will always limit the total output to the specified size and will put a zero if possible. Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
b312eb0b |
|
07-Feb-2022 |
Willy Tarreau <w@1wt.eu> |
tools/nolibc/string: add strncpy() and strlcpy() These are minimal variants. strncpy() always fills the destination for <size> chars, while strlcpy() copies no more than <size> including the zero and returns the source's length. The respective sizes on various archs are: strncpy(): x86:0x1f mips:0x30 arm:0x20 strlcpy(): x86:0x17 mips:0x34 arm:0x1a Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
d76232ff |
|
07-Feb-2022 |
Willy Tarreau <w@1wt.eu> |
tools/nolibc/string: slightly simplify memmove() The direction test inside the loop was not always completely optimized, resulting in a larger than necessary function. This change adds a direction variable that is set out of the loop. Now the function is down to 48 bytes on x86, 32 on ARM and 68 on mips. It's worth noting that other approaches were attempted (including relying on the up and down functions) but they were only slightly beneficial on x86 and cost more on others. Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
d8dcc2d8 |
|
07-Feb-2022 |
Willy Tarreau <w@1wt.eu> |
tools/nolibc/string: use unidirectional variants for memcpy() Till now memcpy() relies on memmove(), but it's always included for libgcc, so we have a larger than needed function. Let's implement two unidirectional variants to copy from bottom to top and from top to bottom, and use the former for memcpy(). The variants are optimized to be compact, and at the same time the compiler is sometimes able to detect the loop and to replace it with a "rep movsb". The new function is 24 bytes instead of 52 on x86_64. Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
c91eb033 |
|
07-Feb-2022 |
Willy Tarreau <w@1wt.eu> |
tools/nolibc/string: split the string functions into string.h The string manipulation functions (mem*, str*) are now found in string.h. The file depends on almost nothing and will be usable from other includes if needed. Maybe more functions could be added. Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|