History log of /linux-master/arch/powerpc/lib/memcpy_64.S
Revision Date Author Comments
# 39326182 06-Aug-2023 Masahiro Yamada <masahiroy@kernel.org>

powerpc: replace #include <asm/export.h> with #include <linux/export.h>

Commit ddb5cdbafaaa ("kbuild: generate KSYMTAB entries by modpost")
deprecated <asm/export.h>, which is now a wrapper of <linux/export.h>.

Replace #include <asm/export.h> with #include <linux/export.h>.

After all the <asm/export.h> lines are converted, <asm/export.h> and
<asm-generic/export.h> will be removed.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
[mpe: Fixup selftests that stub asm/export.h]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230806150954.394189-2-masahiroy@kernel.org


# 2874c5fd 27-May-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152

Based on 1 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 3029 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 26deb043 26-Apr-2019 Christophe Leroy <christophe.leroy@c-s.fr>

powerpc: prepare string/mem functions for KASAN

CONFIG_KASAN implements wrappers for memcpy(), memmove() and memset().
Those wrappers do the verification and then call __memcpy(),
__memmove() and __memset() respectively. The arches are therefore
expected to rename their optimised functions that way.

For files on which KASAN is inhibited, #defines are used to allow
them to directly call optimised versions of the functions without
going through the KASAN wrappers.

See commit 393f203f5fd5 ("x86_64: kasan: add interceptors for
memset/memmove/memcpy functions") for details.
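A minimal user-space sketch of the wrapper scheme described above, assuming hypothetical names: arch_memcpy() stands in for the renamed optimised __memcpy(), check_region() stands in for the KASAN shadow check, and SKETCH_NO_SANITIZE stands in for a file on which KASAN is inhibited. None of these are the kernel's actual identifiers.

```c
#include <stddef.h>

/* Hypothetical stand-in for the arch's optimised __memcpy(). */
static void *arch_memcpy(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (len--)
        *d++ = *s++;
    return dst;
}

/* Hypothetical stand-in for the KASAN check: counts calls, rejects NULL. */
static unsigned long checks;

static int check_region(const void *addr, size_t len)
{
    checks++;
    return addr != NULL || len == 0;
}

/* Instrumented wrapper: verify both ranges, then call the optimised copy. */
static void *checked_memcpy(void *dst, const void *src, size_t len)
{
    if (!check_region(src, len) || !check_region(dst, len))
        return NULL; /* a real sanitizer would report the bad access */
    return arch_memcpy(dst, src, len);
}

/* Uninstrumented files are redirected straight to the optimised version. */
#ifdef SKETCH_NO_SANITIZE
#define sketch_copy(d, s, n) arch_memcpy(d, s, n)
#else
#define sketch_copy(d, s, n) checked_memcpy(d, s, n)
#endif
```

The #define is the only thing that changes between instrumented and uninstrumented files, so the optimised routine itself never needs to know whether checking happened.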

Other string/mem functions do not (yet) have KASAN wrappers,
so we have to fall back to the generic versions when
KASAN is active; otherwise KASAN checks will be skipped.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: Fixups to keep selftests working]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 98c45f51 03-Aug-2018 Paul Mackerras <paulus@ozlabs.org>

selftests/powerpc/64: Test all paths through copy routines

The hand-coded assembler 64-bit copy routines include feature sections
that select one code path or another depending on which CPU we are
executing on. The self-tests for these copy routines end up testing
just one path. This adds a mechanism for selecting any desired code
path at compile time, and makes 2 or 3 versions of each test, each
using a different code path, so as to cover all the possible paths.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
[mpe: Add -mcpu=power4 to CFLAGS for older compilers]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 2c86cd18 05-Jul-2018 Christophe Leroy <christophe.leroy@c-s.fr>

powerpc: clean inclusions of asm/feature-fixups.h

Files not using feature fixups don't need asm/feature-fixups.h.
Files using feature fixups need asm/feature-fixups.h.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# ec0c464c 05-Jul-2018 Christophe Leroy <christophe.leroy@c-s.fr>

powerpc: move ASM_CONST and stringify_in_c() into asm-const.h

This patch moves ASM_CONST() and stringify_in_c() into
dedicated asm-const.h, then cleans all related inclusions.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: asm-compat.h should include asm-const.h]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 15a3204d 20-Feb-2018 Nicholas Piggin <npiggin@gmail.com>

powerpc/64s: Set assembler machine type to POWER4

Rather than override the machine type in .S code (which can hide wrong
or ambiguous code generation for the target), set the type to power4
for all assembly.

This also means we need to be careful not to build power4-only code
when we're not building for Book3S, such as the "power7" versions of
copyuser/page/memcpy.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Fix Book3E build, don't build the "power7" variants for non-Book3S]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 9445aa1a 13-Jan-2016 Al Viro <viro@zeniv.linux.org.uk>

ppc: move exports to definitions

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 00f554fa 29-Apr-2014 Philippe Bergheaud <felix@linux.vnet.ibm.com>

powerpc: memcpy optimization for 64bit LE

Unaligned stores take alignment exceptions on POWER7 running in little-endian.
This is a dumb little-endian base memcpy that prevents unaligned stores.
Once booted the feature fixup code switches over to the VMX copy loops
(which are already endian safe).

The question is what we do before that switch over. The base 64bit
memcpy takes alignment exceptions on POWER7 so we can't use it as is.
Fixing the causes of alignment exceptions would slow it down, because
we'd need to ensure all loads and stores are aligned either through
rotate tricks or bytewise loads and stores. Either would be bad for
all other 64bit platforms.

[ I simplified the loop a bit - Anton ]

Signed-off-by: Philippe Bergheaud <felix@linux.vnet.ibm.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 169c7cee 02-Apr-2014 Anton Blanchard <anton@samba.org>

powerpc: Add _GLOBAL_TOC for ABIv2 assembly functions exported to modules

If an assembly function that calls back into C code is exported to
modules, we need to ensure r2 is set up correctly. There are only
two places crazy enough to do it (two of which are my fault).

Signed-off-by: Anton Blanchard <anton@samba.org>


# 752a6422 14-Feb-2014 Ulrich Weigand <ulrich.weigand@de.ibm.com>

powerpc: Fix unsafe accesses to parameter area in ELFv2

Some of the assembler files in lib/ make use of the fact that in the
ELFv1 ABI, the caller guarantees to provide stack space to save the
parameter registers r3 ... r10. This guarantee is no longer present
in ELFv2 for functions that have no variable argument list and no
more than 8 arguments.

Change the affected routines to temporarily store registers in the
red zone and/or the top of their own stack frame (in the space
provided to save r31 .. r29, which is actually not used in these
routines).

In opal_query_takeover, simply always allocate a stack frame;
the routine is not performance critical.

Signed-off-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
Signed-off-by: Anton Blanchard <anton@samba.org>


# b37c10d1 03-Feb-2014 Anton Blanchard <anton@samba.org>

powerpc: Fix ABIv2 issues with stack offsets in assembly code

Fix STK_PARAM and use it instead of hardcoding ABIv1 offsets.

Signed-off-by: Anton Blanchard <anton@samba.org>


# 22d651dc 20-Jan-2014 Michael Ellerman <mpe@ellerman.id.au>

selftests/powerpc: Import Anton's memcpy / copy_tofrom_user tests

Turn Anton's memcpy / copy_tofrom_user test into something that can
live in tools/testing/selftests.

It requires one turd in arch/powerpc/lib/memcpy_64.S, but it's pretty
harmless IMHO.

We are sailing very close to the wind with the feature macros. We define
them to nothing, which currently means we get a few extra nops and
include the unaligned calls.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 86e32fdc 25-Jun-2012 Michael Neuling <mikey@neuling.org>

powerpc: Change mtcrf to use real register names

The mtocrf define is just a wrapper around the real instructions, so we can
just use real register names here (i.e. lower case).

Also remove braces in the macro so this is possible.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# c75df6f9 25-Jun-2012 Michael Neuling <mikey@neuling.org>

powerpc: Fix usage of register macros getting ready for %r0 change

Anything that uses a constructed instruction (i.e. from ppc-opcode.h)
needs to use the new R0 macro, as %r0 is not going to work.

Also convert usages of macros where we are just determining an offset
(usually for a load/store), like:
std r14,STK_REG(r14)(r1)
Can't use STK_REG(r14) as %r14 doesn't work in the STK_REG macro since
it's just calculating an offset.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# b3f271e8 30-May-2012 Anton Blanchard <anton@samba.org>

powerpc: POWER7 optimised memcpy using VMX and enhanced prefetch

Implement a POWER7 optimised memcpy using VMX and enhanced prefetch
instructions.

This is a copy of the POWER7 optimised copy_to_user/copy_from_user
loop. Detailed implementation and performance details can be found in
commit a66086b8197d ("powerpc: POWER7 optimised
copy_to_user/copy_from_user using VMX").

I noticed memcpy issues when profiling a RAID6 workload:

.memcpy
.async_memcpy
.async_copy_data
.__raid_run_ops
.handle_stripe
.raid5d
.md_thread

I created a simplified testcase by building a RAID6 array with 4 1GB
ramdisks (booting with brd.rd_size=1048576):

# mdadm -CR -e 1.2 /dev/md0 --level=6 -n4 /dev/ram[0-3]

I then timed how long it took to write to the entire array:

# dd if=/dev/zero of=/dev/md0 bs=1M

Before: 892 MB/s
After: 999 MB/s

A 12% improvement.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 694caf02 17-Apr-2012 Anton Blanchard <anton@samba.org>

powerpc: Remove CONFIG_POWER4_ONLY

Remove CONFIG_POWER4_ONLY, the option is badly named and only does two
things:

- It wraps the MMU segment table code. With feature fixups there is
little downside to compiling this in.

- It uses the newer mtocrf instruction in various assembly functions.
Instead of making this a compile option just do it at runtime via
a feature fixup.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# e423b9ec 25-Feb-2009 Mark Nelson <markn@au1.ibm.com>

powerpc: Fix 64bit memcpy() regression

This fixes a regression introduced by commit
25d6e2d7c58ddc4a3b614fc5381591c0cfe66556 ("powerpc: Update 64bit memcpy()
using CPU_FTR_UNALIGNED_LD_STD").

This commit allowed CPUs that have the CPU_FTR_UNALIGNED_LD_STD CPU
feature bit present to do the memcpy() with unaligned load doubles. But,
along with this came a bug where our final load double would read bytes
beyond a page boundary and into the next (unmapped) page. This was caught
by enabling CONFIG_DEBUG_PAGEALLOC.

The fix was to read only the number of bytes that we need to store rather
than reading a full 8-byte doubleword and storing only a portion of that.

In order to minimise the amount of existing code touched we use the
original do_tail for the src_unaligned case.
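To illustrate the idea behind the fix (reading only the bytes that will be stored), here is a minimal C sketch of a tail copy for the last 0..7 bytes. It is not the kernel's do_tail code: copy_tail() is a hypothetical name, and the small memcpy() calls stand in for 4-, 2- and 1-byte loads and stores. The chunk sizes are picked from the low bits of the remaining length, so no load ever reaches past src + n.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical tail copy: n must be < 8. Each chunk is loaded at exactly
 * the width that will be stored, so nothing beyond src + n is touched
 * (unlike a full 8-byte load followed by a partial store).
 */
static void copy_tail(unsigned char *dst, const unsigned char *src, size_t n)
{
    if (n & 4) {
        uint32_t w;
        memcpy(&w, src, 4);   /* stands in for a 4-byte load */
        memcpy(dst, &w, 4);   /* and a 4-byte store */
        src += 4;
        dst += 4;
    }
    if (n & 2) {
        uint16_t h;
        memcpy(&h, src, 2);   /* 2-byte load and store */
        memcpy(dst, &h, 2);
        src += 2;
        dst += 2;
    }
    if (n & 1)
        *dst = *src;          /* final single byte */
}
```

If the source ends exactly at a page boundary, this pattern stays inside the mapped page, which is the property CONFIG_DEBUG_PAGEALLOC exposed as missing.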

Below is an example of the regression, as reported by Sachin Sant:

Unable to handle kernel paging request for data at address 0xc00000003f380000
Faulting instruction address: 0xc000000000039574
cpu 0x1: Vector: 300 (Data Access) at [c00000003baf3020]
pc: c000000000039574: .memcpy+0x74/0x244
lr: d00000000244916c: .ext3_xattr_get+0x288/0x2f4 [ext3]
sp: c00000003baf32a0
msr: 8000000000009032
dar: c00000003f380000
dsisr: 40000000
current = 0xc00000003e54b010
paca = 0xc000000000a53680
pid = 1840, comm = readahead
enter ? for help
[link register ] d00000000244916c .ext3_xattr_get+0x288/0x2f4 [ext3]
[c00000003baf32a0] d000000002449104 .ext3_xattr_get+0x220/0x2f4 [ext3]
(unreliable)
[c00000003baf3390] d00000000244a6e8 .ext3_xattr_security_get+0x40/0x5c [ext3]
[c00000003baf3400] c000000000148154 .generic_getxattr+0x74/0x9c
[c00000003baf34a0] c000000000333400 .inode_doinit_with_dentry+0x1c4/0x678
[c00000003baf3560] c00000000032c6b0 .security_d_instantiate+0x50/0x68
[c00000003baf35e0] c00000000013c818 .d_instantiate+0x78/0x9c
[c00000003baf3680] c00000000013ced0 .d_splice_alias+0xf0/0x120
[c00000003baf3720] d00000000243e05c .ext3_lookup+0xec/0x134 [ext3]
[c00000003baf37c0] c000000000131e74 .do_lookup+0x110/0x260
[c00000003baf3880] c000000000134ed0 .__link_path_walk+0xa98/0x1010
[c00000003baf3970] c0000000001354a0 .path_walk+0x58/0xc4
[c00000003baf3a20] c000000000135720 .do_path_lookup+0x138/0x1e4
[c00000003baf3ad0] c00000000013645c .path_lookup_open+0x6c/0xc8
[c00000003baf3b70] c000000000136780 .do_filp_open+0xcc/0x874
[c00000003baf3d10] c0000000001251e0 .do_sys_open+0x80/0x140
[c00000003baf3dc0] c00000000016aaec .compat_sys_open+0x24/0x38
[c00000003baf3e30] c00000000000855c syscall_exit+0x0/0x40

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 25d6e2d7 26-Oct-2008 Mark Nelson <markn@au1.ibm.com>

powerpc: Update 64bit memcpy() using CPU_FTR_UNALIGNED_LD_STD

Update memcpy() to add two new feature sections: one for aligning the
destination before copying and one for copying using aligned load
and store doubles.

These new feature sections will only affect Power6 and Cell because
the CPU feature bit was only added to these two processors.

Power6 gets its best performance in memcpy() when aligning neither the
source nor the destination, while Cell gets its best performance when
just the destination is aligned. But in order to save on CPU feature
bits we can use the previously added CPU_FTR_CP_USE_DCBTZ feature bit
to differentiate between Power6 and Cell (because CPU_FTR_CP_USE_DCBTZ
was added to Cell but not Power6).

The first feature section acts to nop out the branch that takes us to
the code that aligns us to an eight byte boundary for the destination.
We only want to nop out this branch on Power6.

So the ALT_FTR_SECTION_END() for this feature section creates a test
mask of the two feature bits ORed together and provides an expected
result of just CPU_FTR_UNALIGNED_LD_STD, thus we nop out the branch
if we're on a CPU that has CPU_FTR_UNALIGNED_LD_STD set and
CPU_FTR_CP_USE_DCBTZ unset.
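The mask/value test described above can be sketched in C as follows. The bit positions and function names are hypothetical (the kernel's real CPU_FTR_* values differ); the point is only that masking both bits and comparing against CPU_FTR_UNALIGNED_LD_STD alone selects exactly the Power6-like case.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. */
#define FTR_UNALIGNED_LD_STD (1ULL << 0)
#define FTR_CP_USE_DCBTZ     (1ULL << 1)

/* The generic feature-section test: (features & mask) == value. */
static bool section_matches(uint64_t features, uint64_t mask, uint64_t value)
{
    return (features & mask) == value;
}

/* True when the destination-align branch should be nopped out. */
static bool skip_dest_align(uint64_t cpu_features)
{
    return section_matches(cpu_features,
                           FTR_UNALIGNED_LD_STD | FTR_CP_USE_DCBTZ,
                           FTR_UNALIGNED_LD_STD);
}
```

Reusing CPU_FTR_CP_USE_DCBTZ this way costs no new feature bit: Cell has both bits set, so the comparison fails and it keeps the alignment branch.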

For the second feature section added, if we're on a CPU that has the
CPU_FTR_UNALIGNED_LD_STD bit set then we don't want to do the copy
with aligned loads and stores (and the appropriate shifting left and
right instructions), so we want to nop out the branch to
.Lsrc_unaligned.

The andi. used for this branch is moved to just above the branch
because this allows us to nop out both instructions with just one
feature section which gives us better performance and doesn't hurt
readability, which two separate feature sections did.

Moving the andi. to just above the branch doesn't have any noticeable
negative effect on the remaining 64bit processors (the ones that
didn't have this feature bit added).

On Cell this simple modification results in an improvement to measured
memcpy() bandwidth of up to 50% in the hot cache case and up to 15% in
the cold cache case.

On Power6 we get memory bandwidth results that are up to three times
faster in the hot cache case and up to 50% faster in the cold cache
case.

Commit 2a9294369bd020db89bfdf78b84c3615b39a5c84 ("powerpc: Add new CPU
feature: CPU_FTR_CP_USE_DCBTZ") was where CPU_FTR_CP_USE_DCBTZ was
added.

To say that Cell gets its best performance in memcpy() with just the
destination aligned is true but only for the reason that the indirect
shift and rotate instructions, sld and srd, are microcoded on Cell.
This means that either the destination or the source can be aligned,
but not both, and seeing as we get better performance with the
destination aligned we choose this option.

While we're at it make a one line change from cmpldi r1,... to
cmpldi cr1,... for consistency.

Signed-off-by: Mark Nelson <markn@au1.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 3467bfd3 22-Mar-2007 Olof Johansson <olof@lixom.net>

[POWERPC] Use mtocrf instruction in asm when CONFIG_POWER4_ONLY=y

mtocrf is a faster single-field mtcrf (move to condition register
fields) instruction available in POWER4 and later processors. It can
make quite a difference in performance on some implementations, so use
it for CONFIG_POWER4_ONLY builds.

Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# d0027bf0 30-Aug-2006 Paul Mackerras <paulus@samba.org>

[POWERPC] Fix return value from memcpy

As pointed out by Herbert Xu <herbert@gondor.apana.org.au>, our
memcpy implementation didn't return the destination pointer as its
return value, and there is code in the kernel that expects that.
This fixes it.
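The contract being fixed is the ISO C one: memcpy() returns its first argument. A minimal C sketch (sketch_memcpy() is a hypothetical name) shows the pitfall that hand-written copy loops can hit: the destination pointer is advanced while copying, so the original value must be kept and returned rather than the advanced cursor.

```c
#include <stddef.h>

/* Hypothetical memcpy-style routine that honours the return-value contract. */
static void *sketch_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;        /* cursor; dst itself is preserved */
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dst;                    /* the original pointer, not d */
}
```

Callers that rely on this write things like `p = memcpy(buf, src, len);` and then use p, which breaks if the routine returns anything other than buf.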

Signed-off-by: Paul Mackerras <paulus@samba.org>


# 2ef9481e 23-Jan-2006 Jon Mason <jdmason@us.ibm.com>

[PATCH] powerpc: trivial: modify comments to refer to new location of files

This patch removes all self references and fixes references to files
in the now defunct arch/ppc64 tree. I think this accomplishes
everything wanted, though there might be a few references I missed.

Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 70d64cea 10-Oct-2005 Paul Mackerras <paulus@samba.org>

powerpc: Rename files to have consistent _32/_64 suffixes

This doesn't change any code, just renames things so we consistently
have foo_32.c and foo_64.c where we have separate 32- and 64-bit
versions.

Signed-off-by: Paul Mackerras <paulus@samba.org>