Cross Reference: /linux-master/arch/arm/lib/memcpy.S

History log of /linux-master/arch/arm/lib/memcpy.S
Revision	Date	Author	Comments
# ba999a04	03-Oct-2021	Ard Biesheuvel <ardb@kernel.org>	ARM: memcpy: use frame pointer as unwind anchor The memcpy template is a bit unusual in the way it manages the stack pointer: depending on the execution path through the function, the SP assumes different values as different subsets of the register file are preserved and restored again. This is problematic when it comes to EHABI unwind info, as it is not instruction accurate, and does not allow tracking the SP value as it changes. Commit 279f487e0b471 ("ARM: 8225/1: Add unwinding support for memory copy functions") addressed this by carving up the function in different chunks as far as the unwinder is concerned, and keeping a set of unwind directives for each of them, each corresponding with the state of the stack pointer during execution of the chunk in question. This not only duplicates unwind info unnecessarily, but it also complicates unwinding the stack upon overflow. Instead, let's do what the compiler does when the SP is updated halfway through a function, which is to use a frame pointer and emit the appropriate unwind directives to communicate this to the unwinder. Note that Thumb-2 uses R7 for this, while ARM uses R11 aka FP. So let's avoid touching R7 in the body of the template, so that Thumb-2 can use it as the frame pointer. R11 was not modified in the first place. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Keith Packard <keithpac@amazon.com> Tested-by: Marc Zyngier <maz@kernel.org> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
# 735e8d93	06-Nov-2020	Fangrui Song <maskray@google.com>	ARM: 9022/1: Change arch/arm/lib/mem.S to use WEAK instead of .weak Commit d6d51a96c7d6 ("ARM: 9014/2: Replace string mem functions for KASan") add .weak directives to memcpy/memmove/memset to avoid collision with KASAN interceptors. This does not work with LLVM's integrated assembler (the assembly snippet `.weak memcpy ... .globl memcpy` produces a STB_GLOBAL memcpy while GNU as produces a STB_WEAK memcpy). LLVM 12 (since https://reviews.llvm.org/D90108) will error on such an overridden symbol binding. Use the appropriate WEAK macro instead. Link: https://github.com/ClangBuiltLinux/linux/issues/1190 -- Fixes: d6d51a96c7d6 ("ARM: 9014/2: Replace string mem* functions for KASan") Reported-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Fangrui Song <maskray@google.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
# d6d51a96	25-Oct-2020	Linus Walleij <linus.walleij@linaro.org>	ARM: 9014/2: Replace string mem* functions for KASan Functions like memset()/memmove()/memcpy() do a lot of memory accesses. If a bad pointer is passed to one of these functions it is important to catch this. Compiler instrumentation cannot do this since these functions are written in assembly. KASan replaces these memory functions with instrumented variants. The original functions are declared as weak symbols so that the strong definitions in mm/kasan/kasan.c can replace them. The original functions have aliases with a '__' prefix in their name, so we can call the non-instrumented variant if needed. We must use __memcpy()/__memset() in place of memcpy()/memset() when we copy .data to RAM and when we clear .bss, because kasan_early_init cannot be called before the initialization of .data and .bss. For the kernel compression and EFI libstub's custom string libraries we need a special quirk: even if these are built without KASan enabled, they rely on the global headers for their custom string libraries, which means that e.g. memcpy() will be defined to __memcpy() and we get link failures. Since these implementations are written i C rather than assembly we use e.g. __alias(memcpy) to redirected any users back to the local implementation. Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: kasan-dev@googlegroups.com Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Ard Biesheuvel <ardb@kernel.org> # QEMU/KVM/mach-virt/LPAE/8G Tested-by: Florian Fainelli <f.fainelli@gmail.com> # Brahma SoCs Tested-by: Ahmad Fatoum <a.fatoum@pengutronix.de> # i.MX6Q Reported-by: Russell King - ARM Linux <rmk+kernel@armlinux.org.uk> Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de> Signed-off-by: Abbott Liu <liuwenliang@huawei.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
# d2912cb1	04-Jun-2019	Thomas Gleixner <tglx@linutronix.de>	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 Based on 2 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation # extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 4122 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Enrico Weigelt <info@metux.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
# a216376a	17-Feb-2019	Stefan Agner <stefan@agner.ch>	ARM: 8841/1: use unified assembler in macros Use unified assembler syntax (UAL) in macros. Divided syntax is considered deprecated. This will also allow to build the kernel using LLVM's integrated assembler. Signed-off-by: Stefan Agner <stefan@agner.ch> Acked-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
# 8478132a	23-Nov-2016	Russell King <rmk+kernel@armlinux.org.uk>	Revert "arm: move exports to definitions" This reverts commit 4dd1837d7589f468ed109556513f476e7a7f9121. Moving the exports for assembly code into the assembly files breaks KSYM trimming, but also breaks modversions. While fixing the KSYM trimming is trivial, fixing modversions brings us to a technically worse position that we had prior to the above change: - We end up with the prototype definitions divorsed from everything else, which means that adding or removing assembly level ksyms become more fragile: * if adding a new assembly ksyms export, a missed prototype in asm-prototypes.h results in a successful build if no module in the selected configuration makes use of the symbol. * when removing a ksyms export, asm-prototypes.h will get forgotten, with armksyms.c, you'll get a build error if you forget to touch the file. - We end up with the same amount of include files and prototypes, they're just in a header file instead of a .c file with their exports. As for lines of code, we don't get much of a size reduction: (original commit) 47 files changed, 131 insertions(+), 208 deletions(-) (fix for ksyms trimming) 7 files changed, 18 insertions(+), 5 deletions(-) (two fixes for modversions) 1 file changed, 34 insertions(+) 3 files changed, 7 insertions(+), 2 deletions(-) which results in a net total of only 25 lines deleted. As there does not seem to be much benefit from this change of approach, revert the change. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
# 4dd1837d	13-Jan-2016	Al Viro <viro@zeniv.linux.org.uk>	arm: move exports to definitions Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
# 1bd46782	03-Jul-2015	Russell King <rmk+kernel@arm.linux.org.uk>	ARM: avoid unwanted GCC memset()/memcpy() optimisations for IO variants We don't want GCC optimising our memset_io(), memcpy_fromio() or memcpy_toio() variants, so we must not call one of the standard functions. Provide a separate name for our assembly memcpy() and memset() functions, and use that instead, thereby bypassing GCC's ability to optimise these operations. GCCs optimisation may introduce unaligned accesses which are invalid for device mappings. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
# 279f487e	26-Nov-2014	Lin Yongting <linyongting@gmail.com>	ARM: 8225/1: Add unwinding support for memory copy functions The memory copy functions(memcpy, __copy_from_user, __copy_to_user) never had unwinding annotations added. Currently, when accessing invalid pointer by these functions occurs the backtrace shown will stop at these functions or some completely unrelated function. Add unwinding annotations in hopes of getting a more useful backtrace in following cases: 1. die on accessing invalid pointer by these functions 2. kprobe trapped at any instruction within these functions 3. interrupted at any instruction within these functions Signed-off-by: Lin Yongting <linyongting@gmail.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
# 8b592783	23-Jul-2009	Catalin Marinas <catalin.marinas@arm.com>	Thumb-2: Implement the unified arch/arm/lib functions This patch adds the ARM/Thumb-2 unified support for the arch/arm/lib/* files. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
# 93ed3970	28-Aug-2008	Catalin Marinas <catalin.marinas@arm.com>	[ARM] 5227/1: Add the ENDPROC declarations to the .S files This declaration specifies the "function" type and size for various assembly functions, mainly needed for generating the correct branch instructions in Thumb-2. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
# 75494230	01-Nov-2005	Nicolas Pitre <nico@cam.org>	[ARM] 2947/1: copy template with new memcpy/memmove Patch from Nicolas Pitre This patch provides a new implementation for optimized memory copy functions on ARM. It is made of two levels: a template that consists of the core copy code and separate files that define macros to be used with the core code depending on the type of copy needed. This allows for best performances while sharing the same core for implementing memcpy(), copy_from_user() and copy_to_user() for instance. Two reasons for this work: 1) the current copy_to_user/copy_from_user implementation assumes no task switch will ever occur in the middle of each copied page making it completely unsafe with CONFIG_PREEMPT=y. 2) current copy implementations are measurably suboptimal and optimizing different implementations separately is a pain and more opportunities for bugs. The reason for (1) is the fact that copy inside user pages are performed with the ldm instruction which has no mean for testing user protections and could possibly race with process preemption bypassing the COW mechanism for example. This is a longstanding issue that we said ought to be fixed for about two years now. The solution is to substitute those ldm insns with a series of ldrt or strt insns to enforce user memory protection. At least on StrongARM and XScale cores the ldm is not faster than the equivalent ldr/str insns with a warm i-cache so there is no measurable performance degradation with that change. The fact that the copy code is a template makes it pretty easy to reuse the same core code as for memcpy and benefit from the same performance optimizations. Now (2) is best demonstrated with actual throughput measurements. First, here is a summary of memcopy tests performed on a StrongARM core: PTR alignment buffer size kernel version this version ------------------------------------------------------------ aligned 32 59.73 107.43 unaligned 32 61.31 74.72 aligned 100 132.47 136.15 unaligned 100 103.84 123.76 aligned 4096 130.67 130.80 unaligned 4096 130.68 130.64 aligned 1048576 68.03 68.18 unaligned 1048576 68.03 68.18 The buffer size is in bytes and the measured speed in MB/s. The copy was performed repeatedly with given buffer and throughput averaged over 3 seconds. Here we can see that the current kernel version has a higher entry cost that shows up with small buffers. As buffer size grows both implementation converge to the same throughput. Now here's the exact same test performed on an XScale core (PXA255): PTR alignment buffer size kernel version this version ------------------------------------------------------------ aligned 32 46.99 77.58 unaligned 32 53.61 59.59 aligned 100 107.19 136.59 unaligned 100 83.61 97.58 aligned 4096 129.13 129.98 unaligned 4096 128.36 128.53 aligned 1048576 53.76 59.41 unaligned 1048576 33.67 56.96 Again we can see the entry setup cost being higher for the current kernel before getting to the main copy loop. Then throughput results converge as long as the buffer remains in the cache. Then the 1MB case shows more differences probably due to better pld placement and/or less instruction interlocks in this proposed implementation. Disclaimer: The PXA system was running with slower clocks than the StrongARM system so trying to infer any conclusion by comparing those separate sets of results side by side would be completely inappropriate. So... What this patch does is to replace both memcpy and memmove with an implementation based on the provided copy code template. The memmove code is kept separate since it is used only if the memory areas involved do overlap in which case the code is a transposition of the template but with the copy occurring in the opposite direction (trying to fit that mode into the template turned it into a mess not worth it for memmove alone). And obviously both memcpy and memmove were tested with all kinds of pointer alignments and buffer sizes to exercise all code paths for correctness. The next patch will provide the now trivial replacement implementation copy_to_user and copy_from_user. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
# 1da177e4	16-Apr-2005	Linus Torvalds <torvalds@ppc970.osdl.org>	Linux-2.6.12-rc2 Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!