History log of /linux-master/arch/powerpc/lib/copypage_power7.S
Revision Date Author Comments
# 8488cdcb 29-Feb-2024 Michael Ellerman <mpe@ellerman.id.au>

powerpc/64s: Move dcbt/dcbtst sequence into a macro

There's an almost identical code sequence to specify load/store access
hints in __copy_tofrom_user_power7(), copypage_power7() and
memcpy_power7().

Move the sequence into a common macro, which is passed the registers to
use, as they differ slightly between the call sites.

There also needs to be a copy in the selftests; it could be shared in
future if the headers are cleaned up / refactored.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240229122521.762431-1-mpe@ellerman.id.au
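
A minimal sketch of such a macro, assuming GAS .macro syntax; the
register arguments are placeholders rather than the exact operands the
kernel macro takes:

	.macro SETUP_PREFETCH_STREAMS from, from_parms, to, to_parms, go
	/* setup read stream 0 */
	dcbt	0,\from,0b01000		/* addr of the load stream */
	dcbt	0,\from_parms,0b01010	/* length and depth of the load stream */
	/* setup write stream 1 */
	dcbtst	0,\to,0b01000		/* addr of the store stream */
	dcbtst	0,\to_parms,0b01010	/* length and depth of the store stream */
	eieio
	dcbt	0,\go,0b01010		/* all streams GO */
	.endm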


# 4e991e3c 07-Apr-2023 Nicholas Piggin <npiggin@gmail.com>

powerpc: add CFUNC assembly label annotation

This macro is to be used in assembly where C functions are called.
pcrel addressing mode requires branches to functions with a
localentry value of 1 to have either a trailing nop or @notoc.
This macro permits the latter without changing callers.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Add dummy definitions to fix selftests build]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230408021752.862660-5-npiggin@gmail.com
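
A hedged sketch of what such an annotation can look like and how it is
used; this is an illustration, not necessarily the exact definition in
arch/powerpc/include/asm/ppc_asm.h, and enter_vmx_ops is just an
example callee:

  #ifdef CONFIG_PPC_KERNEL_PCREL
  #define CFUNC(name)	name@notoc	/* pcrel: no trailing TOC-restore nop needed */
  #else
  #define CFUNC(name)	name		/* otherwise the plain symbol is fine */
  #endif

	/* call a C helper from assembly without adding a trailing nop */
	bl	CFUNC(enter_vmx_ops)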


# 1a59d1b8 27-May-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 156

Based on 1 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version this program is distributed in the
hope that it will be useful but without any warranty without even
the implied warranty of merchantability or fitness for a particular
purpose see the gnu general public license for more details you
should have received a copy of the gnu general public license along
with this program if not write to the free software foundation inc
59 temple place suite 330 boston ma 02111 1307 usa

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 1334 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Richard Fontana <rfontana@redhat.com>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070033.113240726@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# d58badfb 06-Jun-2018 Simon Guo <wei.guo.simon@gmail.com>

powerpc/64: enhance memcmp() with VMX instructions for long byte comparisons

This patch adds VMX primitives to do memcmp() when the compare size is
greater than or equal to 4K bytes. The KSM feature can benefit from this.

Test result with the following test program:
------
# cat tools/testing/selftests/powerpc/stringloops/memcmp.c
#include <malloc.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include "utils.h"

#define SIZE (1024 * 1024 * 900)
#define ITERATIONS 40

int test_memcmp(const void *s1, const void *s2, size_t n);

static int testcase(void)
{
	char *s1;
	char *s2;
	unsigned long i;

	s1 = memalign(128, SIZE);
	if (!s1) {
		perror("memalign");
		exit(1);
	}

	s2 = memalign(128, SIZE);
	if (!s2) {
		perror("memalign");
		exit(1);
	}

	for (i = 0; i < SIZE; i++) {
		s1[i] = i & 0xff;
		s2[i] = i & 0xff;
	}

	for (i = 0; i < ITERATIONS; i++) {
		int ret = test_memcmp(s1, s2, SIZE);

		if (ret) {
			printf("return %d at[%ld]! should have returned zero\n", ret, i);
			abort();
		}
	}

	return 0;
}

int main(void)
{
	return test_harness(testcase, "memcmp");
}
------
Without this patch (but with the first patch in the series, "powerpc/64:
Align bytes before fall back to .Lshort in powerpc64 memcmp()."):

	4.726728762 seconds time elapsed ( +- 3.54%)

With the VMX patch:

	4.234335473 seconds time elapsed ( +- 2.63%)

That is a ~10% improvement.

Testing an unaligned version with differing offsets (shifting s1 and s2
by a random offset of up to 16 bytes) achieves an improvement of more
than 10%.

Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 15a3204d 20-Feb-2018 Nicholas Piggin <npiggin@gmail.com>

powerpc/64s: Set assembler machine type to POWER4

Rather than override the machine type in .S code (which can hide wrong
or ambiguous code generation for the target), set the type to power4
for all assembly.

This also means we need to be careful not to build power4-only code
when we're not building for Book3S, such as the "power7" versions of
copyuser/page/memcpy.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Fix Book3E build, don't build the "power7" variants for non-Book3S]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
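
For illustration only, the kind of per-file override this moves away
from looks roughly like the following (a sketch; the exact directives
and their placement varied between files):

	.machine push
	.machine "power4"
	/* code assembled against an explicitly pinned machine type,
	 * e.g. the stream prefetch hints */
	dcbt	0,r4,0b01000
	.machine pop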


# 8a583c0a 05-Aug-2017 Andreas Schwab <schwab@linux-m68k.org>

powerpc: Fix invalid use of register expressions

binutils >= 2.26 now warns about misuse of register expressions in
assembler operands that are actually literals, for example:

arch/powerpc/kernel/entry_64.S:535: Warning: invalid register expression

In practice these are almost all uses of r0 that should just be a
literal 0.

Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
[mpe: Mention r0 is almost always the culprit, fold in purgatory change]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
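
For example (a sketch of the class of fix, not a specific hunk), a
prefetch hint whose first operand really is the literal 0:

	/* binutils >= 2.26 warns: this operand is a literal, not GPR 0 */
	dcbt	r0,r4,0b01000
	/* fixed form */
	dcbt	0,r4,0b01000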


# c2ce6f9f 09-Feb-2015 Anton Blanchard <anton@samba.org>

powerpc: Change vrX register defines to vX to match gcc and glibc

As our various loops (copy, string, crypto etc) get more complicated,
we want to share implementations between userspace (eg glibc) and
the kernel. We also want to write userspace test harnesses to put
in tools/testing/selftests.

One gratuitous difference between userspace and the kernel is the
VMX register definitions - the kernel uses vrX whereas both gcc and
glibc use vX.

Change the kernel to match userspace.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
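
As a small illustration of the renaming (assuming the usual kernel
register defines are in scope), a VMX load/store pair changes from:

	lvx	vr0,0,r4
	stvx	vr0,0,r3

to:

	lvx	v0,0,r4
	stvx	v0,0,r3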


# 752a6422 14-Feb-2014 Ulrich Weigand <ulrich.weigand@de.ibm.com>

powerpc: Fix unsafe accesses to parameter area in ELFv2

Some of the assembler files in lib/ make use of the fact that in the
ELFv1 ABI, the caller guarantees to provide stack space to save the
parameter registers r3 ... r10. This guarantee is no longer present
in ELFv2 for functions that have no variable argument list and no
more than 8 arguments.

Change the affected routines to temporarily store registers in the
red zone and/or the top of their own stack frame (in the space
provided to save r31 .. r29, which is actually not used in these
routines).

In opal_query_takeover, simply always allocate a stack frame;
the routine is not performance critical.

Signed-off-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
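
A hedged sketch of the pattern (the register and offset choices are
illustrative, not the exact hunks): instead of spilling into the
caller's parameter save area, a value is parked below the stack
pointer, in space the routine will shortly claim as its own frame:

	/* ELFv1 only: the caller guarantees this save slot exists */
	std	r3,STK_PARAM(R3)(r1)

	/* ELFv2-safe: use the red zone / top of our not-yet-created frame */
	std	r3,-STACKFRAMESIZE+STK_REG(R31)(r1)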


# b37c10d1 03-Feb-2014 Anton Blanchard <anton@samba.org>

powerpc: Fix ABIv2 issues with stack offsets in assembly code

Fix STK_PARAM and use it instead of hardcoding ABIv1 offsets.

Signed-off-by: Anton Blanchard <anton@samba.org>
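
Roughly, what STK_PARAM has to account for is the smaller fixed frame
header in ELFv2 (illustrative definitions, not copied verbatim from
ppc_asm.h):

  #if defined(_CALL_ELF) && _CALL_ELF == 2
  #define STK_PARAM(i)	(32 + ((i) - 3) * 8)	/* ELFv2: 32-byte frame header */
  #else
  #define STK_PARAM(i)	(48 + ((i) - 3) * 8)	/* ELFv1: 48-byte frame header */
  #endif

	std	r4,STK_PARAM(4)(r1)	/* rather than a hardcoded std r4,56(r1) */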


# b1576fec 03-Feb-2014 Anton Blanchard <anton@samba.org>

powerpc: No need to use dot symbols when branching to a function

binutils is smart enough to know that a branch to a function
descriptor is actually a branch to the function's text address.

Alan tells me that binutils has been doing this for 9 years.

Signed-off-by: Anton Blanchard <anton@samba.org>
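
In practice that means a call site can be written against the plain C
symbol, e.g. (sketch):

	bl	memcpy		/* rather than bl .memcpy */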


# 280a5ba2 29-May-2013 Michael Neuling <mikey@neuling.org>

powerpc/pseries: Improve stream generation comments in copypage/user

No code changes, just documenting what's happening a little better.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 44ce6a5e 25-Jun-2012 Michael Neuling <mikey@neuling.org>

powerpc: Merge STK_REG/PARAM/FRAMESIZE

Merge the defines of STACKFRAMESIZE, STK_REG, STK_PARAM from different
places.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# c75df6f9 25-Jun-2012 Michael Neuling <mikey@neuling.org>

powerpc: Fix usage of register macros getting ready for %r0 change

Anything that uses a constructed instruction (i.e. from ppc-opcode.h)
needs to use the new R0 macro, as %r0 is not going to work.

Also convert usages of macros where we are just determining an offset
(usually for a load/store), like:

	std	r14,STK_REG(r14)(r1)

STK_REG(r14) can't be used here because %r14 doesn't work in the
STK_REG macro, since it is just calculating an offset.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
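
A sketch of the converted form, assuming the R-macros expand to bare
register numbers (R14 == 14) as this series defines them:

	/* STK_REG() only computes a stack offset, so it takes the number
	 * macro R14 rather than the %r14 operand form */
	std	r14,STK_REG(R14)(r1)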


# fde69282 29-May-2012 Anton Blanchard <anton@samba.org>

powerpc: POWER7 optimised copy_page using VMX and enhanced prefetch

Implement a POWER7 optimised copy_page using VMX and enhanced
prefetch instructions. We use enhanced prefetch hints to prefetch
both the load and store sides. We copy a cacheline at a time and
fall back to regular loads and stores if we are unable to use VMX
(e.g. we are in an interrupt).

The following microbenchmark was used to assess the impact of
the patch:

http://ozlabs.org/~anton/junkcode/page_fault_file.c

We test MAP_PRIVATE page faults across a 1GB file, 100 times:

# time ./page_fault_file -p -l 1G -i 100

Before: 22.25s
After: 18.89s

17% faster

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
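
A hedged sketch of the shape of the VMX path (register assignments and
instruction scheduling are illustrative, PAGE_SIZE is assumed to come
from asm/page.h, and the real routine also issues the stream prefetch
hints and handles the non-VMX fallback):

	li	r6,16			/* offsets into the cacheline */
	li	r7,32
	li	r8,48
	li	r9,64
	li	r10,80
	li	r11,96
	li	r12,112
	li	r0,PAGE_SIZE/128	/* assumes 128-byte cachelines */
	mtctr	r0
1:	lvx	v7,0,r4			/* load one cacheline from the source */
	lvx	v6,r4,r6
	lvx	v5,r4,r7
	lvx	v4,r4,r8
	lvx	v3,r4,r9
	lvx	v2,r4,r10
	lvx	v1,r4,r11
	lvx	v0,r4,r12
	addi	r4,r4,128
	stvx	v7,0,r3			/* store it to the destination */
	stvx	v6,r3,r6
	stvx	v5,r3,r7
	stvx	v4,r3,r8
	stvx	v3,r3,r9
	stvx	v2,r3,r10
	stvx	v1,r3,r11
	stvx	v0,r3,r12
	addi	r3,r3,128
	bdnz	1b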