History log of /linux-master/arch/powerpc/include/asm/exception-64s.h
Revision Date Author Comments
# 04ece7b6 28-May-2021 Nicholas Piggin <npiggin@gmail.com>

KVM: PPC: Book3S 64: Move hcall early register setup to KVM

System calls / hcalls have a different calling convention than
other interrupts, so there is code in the KVMTEST to massage these
into the same form as other interrupt handlers.

Move this work into the KVM hcall handler. This means teaching KVM
a little more about the low level interrupt handler setup, PACA save
areas, etc., although that's not obviously worse than the current
approach of coming up with an entirely different interrupt register
/ save convention.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210528090752.3542186-5-npiggin@gmail.com


# 08685be7 10-Jan-2021 Nicholas Piggin <npiggin@gmail.com>

powerpc/64s: fix scv entry fallback flush vs interrupt

The L1D flush fallback functions are not recoverable vs interrupts,
yet the scv entry flush runs with MSR[EE]=1. This can result in a
timer (soft-NMI) or MCE or SRESET interrupt hitting here and overwriting
the EXRFI save area, which ends up corrupting userspace registers for
scv return.

Fix this by disabling RI and EE for the scv entry fallback flush.
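
A hedged sketch of the shape of such a fix (mtmsrd with L=1 updates
only MSR[EE] and MSR[RI]; this sequence is an illustration, not the
committed code):

	li	r10,0
	mtmsrd	r10,1	/* EE=0, RI=0: flush can no longer be interrupted */
	/* ... fallback L1D displacement flush, EXRFI save area in use ... */
	li	r10,MSR_RI
	mtmsrd	r10,1	/* RI back on; EE is restored on the way out */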

Fixes: f79643787e0a0 ("powerpc/64s: flush L1D on kernel entry")
Cc: stable@vger.kernel.org # 5.9+ which also have the L1D flush patch backported
Reported-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210111062408.287092-1-npiggin@gmail.com


# 9a32a7e7 16-Nov-2020 Nicholas Piggin <npiggin@gmail.com>

powerpc/64s: flush L1D after user accesses

IBM Power9 processors can speculatively operate on data in the L1 cache
before it has been completely validated, via a way-prediction mechanism. It
is not possible for an attacker to determine the contents of impermissible
memory using this method, since these systems implement a combination of
hardware and software security measures to prevent scenarios where
protected data could be leaked.

However these measures don't address the scenario where an attacker induces
the operating system to speculatively execute instructions using data that
the attacker controls. This can be used for example to speculatively bypass
"kernel user access prevention" techniques, as discovered by Anthony
Steinhauser of Google's Safeside Project. This is not an attack by itself,
but there is a possibility it could be used in conjunction with
side-channels or other weaknesses in the privileged code to construct an
attack.

This issue can be mitigated by flushing the L1 cache between privilege
boundaries of concern. This patch flushes the L1 cache after user accesses.

This is part of the fix for CVE-2020-4788.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# f7964378 16-Nov-2020 Nicholas Piggin <npiggin@gmail.com>

powerpc/64s: flush L1D on kernel entry

IBM Power9 processors can speculatively operate on data in the L1 cache
before it has been completely validated, via a way-prediction mechanism. It
is not possible for an attacker to determine the contents of impermissible
memory using this method, since these systems implement a combination of
hardware and software security measures to prevent scenarios where
protected data could be leaked.

However these measures don't address the scenario where an attacker induces
the operating system to speculatively execute instructions using data that
the attacker controls. This can be used for example to speculatively bypass
"kernel user access prevention" techniques, as discovered by Anthony
Steinhauser of Google's Safeside Project. This is not an attack by itself,
but there is a possibility it could be used in conjunction with
side-channels or other weaknesses in the privileged code to construct an
attack.

This issue can be mitigated by flushing the L1 cache between privilege
boundaries of concern. This patch flushes the L1 cache on kernel entry.

This is part of the fix for CVE-2020-4788.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 2384b36f 15-Jul-2020 Nicholas Piggin <npiggin@gmail.com>

powerpc: Select ARCH_HAS_MEMBARRIER_SYNC_CORE

powerpc return from interrupt and return from system call sequences
are context synchronising.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200716013522.338318-1-npiggin@gmail.com


# 7fa95f9a 11-Jun-2020 Nicholas Piggin <npiggin@gmail.com>

powerpc/64s: system call support for scv/rfscv instructions

Add support for the scv instruction on POWER9 and later CPUs.

For now this implements the zeroth scv vector 'scv 0', as identical to
'sc' system calls, with the exception that LR is not preserved, nor
are volatile CR registers, and error is not indicated with CR0[SO],
but by returning a negative errno.

rfscv is implemented to return from scv type system calls. It can not
be used to return from sc system calls because those are defined to
preserve LR.

getpid syscall throughput on POWER9 is improved by 26% (428 to 318
cycles), largely due to reducing mtmsr and mtspr.
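
As an illustration of the convention, here is a minimal hedged sketch
of a one-argument userspace 'scv 0' wrapper. The clobber list is an
assumption drawn from the description above (LR, CTR and the volatile
CRs treated as not preserved), not a verified ABI statement:

/* Sketch only: syscall number in r0, argument and result in r3,
 * failure reported as a negative errno in r3 rather than CR0[SO]. */
static inline long scv_syscall1(long nr, long arg1)
{
	register long _r0 asm("r0") = nr;
	register long _r3 asm("r3") = arg1;

	asm volatile("scv 0"
		     : "+r" (_r3), "+r" (_r0)
		     :
		     : "r4", "r5", "r6", "r7", "r8", "r9", "r10", "r11",
		       "r12", "lr", "ctr", "cr0", "cr1", "cr5", "cr6",
		       "cr7", "memory");

	return _r3;	/* < 0 means -errno; no SO-bit test needed */
}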

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Fix ppc64e build]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200611081203.995112-3-npiggin@gmail.com


# 8729c26e 25-Feb-2020 Nicholas Piggin <npiggin@gmail.com>

powerpc/64s/exception: Move real to virt switch into the common handler

The real mode interrupt entry points currently use rfid to branch to
the common handler in virtual mode. This is a significant amount of
code, and forces other code (notably the KVM test) to live in the
real mode handler.

In the interest of minimising the amount of code that runs unrelocated,
move the switch to virt mode into the common code, and do it with
mtmsrd, which avoids clobbering SRRs (although the post-KVMTEST
performance of real-mode interrupt handlers is not a big concern these
days).

This requires CTR to always be saved (real-mode needs to reach 0xc...)
but that's not a huge impact these days. It could be optimized away in
future.

mpe: Incorporate fix from Nick:

It's possible for interrupts to be replayed when TM is enabled and
suspended, for example rt_sigreturn, where the mtmsrd MSR_KERNEL in
the real-mode entry point to the common handler causes a TM Bad Thing
exception (due to attempting to clear suspended).

The fix for this is to have replay interrupts go to the _virt entry
point and skip the mtmsrd, which matches what happens before this
patch.
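
A hedged sketch of the idea (not the literal kernel code):

	ld	r10,PACAKMSR(r13)	/* MSR_KERNEL: IR|DR (and RI) set */
	mtmsrd	r10			/* relocation on, now running virtual */
	/* unlike rfid, SRR0/SRR1 still hold the interrupted NIP/MSR here */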

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-11-npiggin@gmail.com


# 0a882e28 28-Jun-2019 Nicholas Piggin <npiggin@gmail.com>

powerpc/64s/exception: remove bad stack branch

The bad stack test in interrupt handlers has a few problems. For
performance, the test is executed in the common case, costing a fetch
bubble and wasting i-cache.

For code development and maintenance, it requires yet another stack
frame setup routine, and it constrains all exception handlers to
follow the same register save pattern, which inhibits future
optimisation.

Remove the test/branch and replace it with a trap. Teach the program
check handler to use the emergency stack for this case.

This does not result in quite so nice a message, however the SRR0 and
SRR1 of the crashed interrupt can be seen in r11 and r12, as can the
original r1 (adjusted by INT_FRAME_SIZE). These are the most important
parts for debugging the issue.

The original r9-r12 and cr0 are lost, which is the main downside.

kernel BUG at linux/arch/powerpc/kernel/exceptions-64s.S:847!
Oops: Exception in kernel mode, sig: 5 [#1]
BE SMP NR_CPUS=2048 NUMA PowerNV
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted
NIP: c000000000009108 LR: c000000000cadbcc CTR: c0000000000090f0
REGS: c0000000fffcbd70 TRAP: 0700 Not tainted
MSR: 9000000000021032 <SF,HV,ME,IR,DR,RI> CR: 28222448 XER: 20040000
CFAR: c000000000009100 IRQMASK: 0
GPR00: 000000000000003d fffffffffffffd00 c0000000018cfb00 c0000000f02b3166
GPR04: fffffffffffffffd 0000000000000007 fffffffffffffffb 0000000000000030
GPR08: 0000000000000037 0000000028222448 0000000000000000 c000000000ca8de0
GPR12: 9000000002009032 c000000001ae0000 c000000000010a00 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: c0000000f00322c0 c000000000f85200 0000000000000004 ffffffffffffffff
GPR24: fffffffffffffffe 0000000000000000 0000000000000000 000000000000000a
GPR28: 0000000000000000 0000000000000000 c0000000f02b391c c0000000f02b3167
NIP [c000000000009108] decrementer_common+0x18/0x160
LR [c000000000cadbcc] .vsnprintf+0x3ec/0x4f0
Call Trace:
Instruction dump:
996d098a 994d098b 38610070 480246ed 48005518 60000000 38200000 718a4000
7c2a0b78 3821fd00 41c20008 e82d0970 <0981fd00> f92101a0 f9610170 f9810178

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 15820091 22-Jun-2019 Nicholas Piggin <npiggin@gmail.com>

powerpc/64s/exception: move paca save area offsets into exception-64s.S

No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# a0502434 22-Jun-2019 Nicholas Piggin <npiggin@gmail.com>

powerpc/64s/exception: move head-64.h code to exception-64s.S where it is used

No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 12a04809 22-Jun-2019 Nicholas Piggin <npiggin@gmail.com>

powerpc/64s/exception: move exception-64s.h code to exception-64s.S where it is used

No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# f1ff37e8 22-Jun-2019 Nicholas Piggin <npiggin@gmail.com>

powerpc/64s/exception: move KVM related code together

No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 6d18f29c 22-Jun-2019 Nicholas Piggin <npiggin@gmail.com>

powerpc/64s/exception: remove STD_EXCEPTION_COMMON variants

These are only called in one place each.

No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# f0ac4478 22-Jun-2019 Nicholas Piggin <npiggin@gmail.com>

powerpc/64s/exception: move EXCEPTION_PROLOG_2* to a more logical place

No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# fc557537 22-Jun-2019 Nicholas Piggin <npiggin@gmail.com>

powerpc/64s/exception: unwind exception-64s.h macros

Many of these macros just expand to 1-4 lines and are used only a few
times each at most, often just once. Remove this indirection.

No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 47169fba 22-Jun-2019 Nicholas Piggin <npiggin@gmail.com>

powerpc/64s/exception: Move EXCEPTION_COMMON additions into callers

More cases of code insertion via macros that do not add a great deal.
All the additions have to be specified in the macro arguments, so they
can just as well go after the macro.

No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# c06075f3 22-Jun-2019 Nicholas Piggin <npiggin@gmail.com>

powerpc/64s/exception: Move EXCEPTION_COMMON handler and return branches into callers

The aim is to reduce the amount of indirection it takes to get through
the exception handler macros, particularly where it provides little
code sharing.

No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 5dba1d50 22-Jun-2019 Nicholas Piggin <npiggin@gmail.com>

powerpc/64s/exception: Make EXCEPTION_PROLOG_0 a gas macro for consistency with others

No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# c0c6cd15 22-Jun-2019 Nicholas Piggin <npiggin@gmail.com>

powerpc/64s/exception: KVM handler can set the HSRR trap bit

Move the KVM trap HSRR bit into the KVM handler, where it can be
conditionally applied when the hsrr parameter is set.

No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 17bdc064 22-Jun-2019 Nicholas Piggin <npiggin@gmail.com>

powerpc/64s/exception: merge KVM handler and skip variants

Conditionally expand the skip case if it is specified.

No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# fa4cf6b7 22-Jun-2019 Nicholas Piggin <npiggin@gmail.com>

powerpc/64s/exception: consolidate maskable and non-maskable prologs

Conditionally expand the soft-masking test if a mask is passed in.

No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# a7c1ca19 22-Jun-2019 Nicholas Piggin <npiggin@gmail.com>

powerpc/64s/exception: remove the "extra" macro parameter

Rather than pass in the soft-masking and KVM tests via a macro that
is passed to another macro to expand it, switch to using gas macros
and conditionally expand the soft-masking and KVM tests.

The system reset with its idle test is open coded as it is a one-off.

No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 2d046308 22-Jun-2019 Nicholas Piggin <npiggin@gmail.com>

powerpc/64s/exception: move and tidy EXCEPTION_PROLOG_2 variants

- Rename the macros to use _REAL and _VIRT suffixes rather than no
suffix and _RELON.

- Move the macro definitions together in the file.

- Move RELOCATABLE ifdef inside the _VIRT macro.

Further consolidation between variants does not buy much here.

No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# bd7b6d13 22-Jun-2019 Nicholas Piggin <npiggin@gmail.com>

powerpc/64s/exception: consolidate EXCEPTION_PROLOG_2 with _NORI variant

Switch to a gas macro that conditionally expands the RI clearing
instruction.

No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 4508a74a 22-Jun-2019 Nicholas Piggin <npiggin@gmail.com>

powerpc/64s/exception: remove H concatenation for EXC_HV variants

Replace all instances of this with gas macros that test the hsrr
parameter and use the appropriate register names / labels.

No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Remove extraneous 2nd check for 0xea0 in SOFTEN_TEST]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 1efd8caa 02-Jul-2019 Michael Ellerman <mpe@ellerman.id.au>

powerpc/64s/exception: Remove unused SOFTEN_VALUE_0x980

Remove SOFTEN_VALUE_0x980, it's been unused since commit
dabe859ec636 ("powerpc: Give hypervisor decrementer interrupts their
own handler") (Sep 2012).

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 4b1f5ccc 11-Jun-2019 Nicholas Piggin <npiggin@gmail.com>

powerpc/64s/exception: fix line wrap and semicolon inconsistencies in macros

By convention, all lines should be separated by semicolons. The last
line should have neither a semicolon nor a line wrap.

No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 2874c5fd 27-May-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152

Based on 1 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 3029 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 890274c2 18-Apr-2019 Michael Ellerman <mpe@ellerman.id.au>

powerpc/64s: Implement KUAP for Radix MMU

Kernel Userspace Access Prevention utilises a feature of the Radix MMU
which disallows read and write access to userspace addresses. By
utilising this, the kernel is prevented from accessing user data from
outside of trusted paths that perform proper safety checks, such as
copy_{to/from}_user() and friends.

Userspace access is disabled from early boot and is only enabled when
performing an operation like copy_{to/from}_user(). The register that
controls this (AMR) does not prevent userspace from accessing itself,
so there is no need to save and restore when entering and exiting
userspace.

When entering the kernel from the kernel, we save the AMR, and if it
is not blocking user access (because e.g. we faulted doing a user
access) we re-block user access for the duration of the exception
(i.e. the page fault) and then restore the AMR when returning to the
kernel.

This feature can be tested by using the lkdtm driver (CONFIG_LKDTM=y)
and performing the following:

# (echo ACCESS_USERSPACE) > [debugfs]/provoke-crash/DIRECT

If enabled, this should send SIGSEGV to the thread.

We also add paranoid checking of AMR in switch and syscall return
under CONFIG_PPC_KUAP_DEBUG.
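
A hedged sketch of the resulting pattern around a user copy (helper
names mirror the powerpc kup code; the exact signatures here are
assumptions for illustration):

static inline unsigned long
kuap_copy_to_user_sketch(void __user *to, const void *from, unsigned long n)
{
	unsigned long left;

	allow_user_access(to, from, n);		/* AMR: open user r/w window */
	left = __copy_tofrom_user(to, from, n);	/* the only user access      */
	prevent_user_access(to, from, n);	/* AMR: block userspace again */

	return left;	/* bytes not copied, as with copy_to_user() */
}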

Co-authored-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# c911d2e1 12-Jan-2019 Christophe Leroy <christophe.leroy@c-s.fr>

powerpc/64: Replace CURRENT_THREAD_INFO with PACA_THREAD_INFO

Now that current_thread_info is located at the beginning of the
'current' task struct, the CURRENT_THREAD_INFO macro is not really
needed any more.

This patch replaces it with loads of the value at PACA_THREAD_INFO(r13).

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: Add PACA_THREAD_INFO rather than using PACACURRENT]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 48e7b769 14-Sep-2018 Nicholas Piggin <npiggin@gmail.com>

powerpc/64s/hash: Convert SLB miss handlers to C

This patch moves SLB miss handlers completely to C, using the standard
exception handler macros to set up the stack and branch to C.

This can be done because the segment containing the kernel stack is
always bolted, so accessing it with relocation on will not cause an
SLB exception.

Arbitrary kernel memory must not be accessed when handling kernel
space SLB misses, so care should be taken there. However user SLB
misses can access any kernel memory, which can be used to move some
fields out of the paca (in later patches).

User SLB misses could quite easily reconcile IRQs and set up a first
class kernel environment and exit via ret_from_except, however that
doesn't seem to be necessary at the moment, so we only do that if a
bad fault is encountered.

[ Credit to Aneesh for bug fixes, error checks, and improvements to
bad address handling, etc ]

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Disallow tracing for all of slb.c for now.]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 4c2de74c 12-Oct-2018 Nicholas Piggin <npiggin@gmail.com>

powerpc/64: Interrupts save PPR on stack rather than thread_struct

PPR is the odd register out when it comes to interrupt handling, it is
saved in current->thread.ppr while all others are saved on the stack.

The difficulty with this is that accessing thread.ppr can cause an
SLB fault, but the conversion of the SLB fault handler to C assumed
the normal exception entry handlers would not cause an SLB fault.

Fix this by allocating room in the interrupt stack to save PPR.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 54be0b9c 02-Oct-2018 Michael Ellerman <mpe@ellerman.id.au>

Revert "convert SLB miss handlers to C" and subsequent commits

This reverts commits:
5e46e29e6a97 ("powerpc/64s/hash: convert SLB miss handlers to C")
8fed04d0f6ae ("powerpc/64s/hash: remove user SLB data from the paca")
655deecf67b2 ("powerpc/64s/hash: SLB allocation status bitmaps")
2e1626744e8d ("powerpc/64s/hash: provide arch_setup_exec hooks for hash slice setup")
89ca4e126a3f ("powerpc/64s/hash: Add a SLB preload cache")

This series had a few bugs, and the fixes are not all trivial. So
revert most of it for now.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 5e46e29e 14-Sep-2018 Nicholas Piggin <npiggin@gmail.com>

powerpc/64s/hash: convert SLB miss handlers to C

This patch moves SLB miss handlers completely to C, using the standard
exception handler macros to set up the stack and branch to C.

This can be done because the segment containing the kernel stack is
always bolted, so accessing it with relocation on will not cause an
SLB exception.

Arbitrary kernel memory may not be accessed when handling kernel space
SLB misses, so care should be taken there. However user SLB misses can
access any kernel memory, which can be used to move some fields out of
the paca (in later patches).

User SLB misses could quite easily reconcile IRQs and set up a first
class kernel environment and exit via ret_from_except, however that
doesn't seem to be necessary at the moment, so we only do that if a
bad fault is encountered.

[ Credit to Aneesh for bug fixes, error checks, and improvements to bad
address handling, etc ]

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

Since RFC:
- Added MSR[RI] handling
- Fixed up a register loss bug exposed by irq tracing (Aneesh)
- Reject misses outside the defined kernel regions (Aneesh)
- Added several more sanity checks and error handling (Aneesh); we may
look at consolidating these tests and tightening up the code, but for
a first pass we decided it's better to check carefully.

Since v1:
- Fixed SLB cache corruption (Aneesh)
- Fixed untidy SLBE allocation "leak" in get_vsid error case
- Now survives some stress testing on real hardware

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# b536da7c 26-Jul-2018 Michael Ellerman <mpe@ellerman.id.au>

powerpc/64s: Drop unused loc parameter to MASKABLE_EXCEPTION macros

We pass the "loc" (location) parameter to MASKABLE_EXCEPTION and
friends, but it's not used, so drop it.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 0a55c241 26-Jul-2018 Michael Ellerman <mpe@ellerman.id.au>

powerpc/64s: Remove PSERIES naming from the MASKABLE macros

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 6adc6e9c 26-Jul-2018 Michael Ellerman <mpe@ellerman.id.au>

powerpc/64s: Drop _MASKABLE_RELON_EXCEPTION_PSERIES()

_MASKABLE_RELON_EXCEPTION_PSERIES() does nothing useful, update all
callers to use __MASKABLE_RELON_EXCEPTION_PSERIES() directly.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 9bf2877a 26-Jul-2018 Michael Ellerman <mpe@ellerman.id.au>

powerpc/64s: Drop _MASKABLE_EXCEPTION_PSERIES()

_MASKABLE_EXCEPTION_PSERIES() does nothing useful, update all callers
to use __MASKABLE_EXCEPTION_PSERIES() directly.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# bdf08e1d 26-Jul-2018 Michael Ellerman <mpe@ellerman.id.au>

powerpc/64s: Rename EXCEPTION_PROLOG_PSERIES to EXCEPTION_PROLOG

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 270373f1 26-Jul-2018 Michael Ellerman <mpe@ellerman.id.au>

powerpc/64s: Rename EXCEPTION_RELON_PROLOG_PSERIES

To just EXCEPTION_RELON_PROLOG().

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 6ebb9397 26-Jul-2018 Michael Ellerman <mpe@ellerman.id.au>

powerpc/64s: Rename EXCEPTION_RELON_PROLOG_PSERIES_1

The EXCEPTION_RELON_PROLOG_PSERIES_1() macro does the same job as
EXCEPTION_PROLOG_2 (which we just recently created), except for
"RELON" (relocation on) exceptions.

So rename it as such.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 94f3cc8e 26-Jul-2018 Michael Ellerman <mpe@ellerman.id.au>

powerpc/64s: Remove PSERIES from the NORI macros

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# cb58a4a4 26-Jul-2018 Michael Ellerman <mpe@ellerman.id.au>

powerpc/64s: Rename EXCEPTION_PROLOG_PSERIES_1 to EXCEPTION_PROLOG_2

As with the other patches in this series, we are removing the
"PSERIES" from the name as it's no longer meaningful.

In this case it's not simply a case of removing the "PSERIES" as that
would result in a clash with the existing EXCEPTION_PROLOG_1.

Instead we name this one EXCEPTION_PROLOG_2, as it's usually used in
sequence after 0 and 1.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# b706f423 26-Jul-2018 Michael Ellerman <mpe@ellerman.id.au>

powerpc/64s: Rename STD_RELON_EXCEPTION_PSERIES_OOL to STD_RELON_EXCEPTION_OOL

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# e42389c5 26-Jul-2018 Michael Ellerman <mpe@ellerman.id.au>

powerpc/64s: Rename STD_RELON_EXCEPTION_PSERIES to STD_RELON_EXCEPTION

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 75e8bef3 26-Jul-2018 Michael Ellerman <mpe@ellerman.id.au>

powerpc/64s: Rename STD_EXCEPTION_PSERIES_OOL to STD_EXCEPTION_OOL

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# e899fce5 26-Jul-2018 Michael Ellerman <mpe@ellerman.id.au>

powerpc/64s: Rename STD_EXCEPTION_PSERIES to STD_EXCEPTION

The "PSERIES" in STD_EXCEPTION_PSERIES is to differentiate the macros
from the legacy iSeries versions, which are called
STD_EXCEPTION_ISERIES. It has nothing to do with pseries vs
powernv or powermac etc.

We removed the legacy iSeries code in 2012, in commit 8ee3e0d69623
("powerpc: Remove the main legacy iSerie platform code").

So remove "PSERIES" from the macros.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 92b6d65c 26-Jul-2018 Michael Ellerman <mpe@ellerman.id.au>

powerpc/64s: Move SET_SCRATCH0() into EXCEPTION_RELON_PROLOG_PSERIES()

EXCEPTION_RELON_PROLOG_PSERIES() only has two users,
STD_RELON_EXCEPTION_PSERIES() and STD_RELON_EXCEPTION_HV() both of
which "call" SET_SCRATCH0(), so just move SET_SCRATCH0() into
EXCEPTION_RELON_PROLOG_PSERIES().

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 4a7a0a84 26-Jul-2018 Michael Ellerman <mpe@ellerman.id.au>

powerpc/64s: Move SET_SCRATCH0() into EXCEPTION_PROLOG_PSERIES()

EXCEPTION_PROLOG_PSERIES() only has two users, STD_EXCEPTION_PSERIES()
and STD_EXCEPTION_HV() both of which "call" SET_SCRATCH0(), so just
move SET_SCRATCH0() into EXCEPTION_PROLOG_PSERIES().

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 2c86cd18 05-Jul-2018 Christophe Leroy <christophe.leroy@c-s.fr>

powerpc: clean inclusions of asm/feature-fixups.h

files not using feature fixup don't need asm/feature-fixups.h
files using feature fixup need asm/feature-fixups.h

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# a048a07d 21-May-2018 Nicholas Piggin <npiggin@gmail.com>

powerpc/64s: Add support for a store forwarding barrier at kernel entry/exit

On some CPUs we can prevent a vulnerability related to store-to-load
forwarding by preventing store forwarding between privilege domains,
by inserting a barrier in kernel entry and exit paths.

This is known to be the case on at least Power7, Power8 and Power9
powerpc CPUs.

Barriers must generally be inserted before the first load after moving
to a higher privilege, and after the last store before moving to a
lower privilege; both HV and PR privilege transitions must be
protected.

Barriers are added as patch sections, with all kernel/hypervisor entry
points patched, and the exit points to lower privilege levels patched
similarly to the RFI flush patching.

Firmware advertisement is not implemented yet, so CPU flush types
are hard coded.

Thanks to Michal Suchánek for bug fixes and review.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michal Suchánek <msuchanek@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 5c11d1e5 06-Feb-2018 Madhavan Srinivasan <maddy@linux.vnet.ibm.com>

powerpc/64s: Fix MASKABLE_RELON_EXCEPTION_HV_OOL macro

Commit f14e953b191f ("powerpc/64s: Add support to take additional
parameter in MASKABLE_* macro") messed up the
MASKABLE_RELON_EXCEPTION_HV_OOL macro by adding the wrong SOFTEN test,
which caused a guest kernel crash at boot. Fix the macro to use
SOFTEN_TEST_HV instead of SOFTEN_NOTEST_HV.

Fixes: f14e953b191f ("powerpc/64s: Add support to take additional parameter in MASKABLE_* macro")
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Fix-Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# f442d004 19-Dec-2017 Madhavan Srinivasan <maddy@linux.vnet.ibm.com>

powerpc/64s: Add support to mask perf interrupts and replay them

Two new bit mask fields are introduced: "IRQ_DISABLE_MASK_PMU" to
support the masking of PMIs, and "IRQ_DISABLE_MASK_ALL" to aid
interrupt masking checks.

A couple of new irq #defines, "PACA_IRQ_PMI" and "SOFTEN_VALUE_0xf0*",
are added for use in the exception code to check for PMI interrupts.

In the masked_interrupt handler, for PMIs we clear MSR[EE] and return.
In __check_irq_replay(), the PMI is replayed by calling the
performance_monitor_common handler.
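
A hedged C rendering of that flow (the real code is assembler in
exceptions-64s.S plus C in irq.c; the shapes below are assumptions
for illustration):

static void masked_pmi_sketch(void)
{
	local_paca->irq_happened |= PACA_IRQ_PMI;	/* remember the PMI */
	mtmsr(mfmsr() & ~MSR_EE);			/* keep EE off and return */
}

static unsigned int check_irq_replay_pmi_sketch(void)
{
	if (local_paca->irq_happened & PACA_IRQ_PMI) {
		local_paca->irq_happened &= ~PACA_IRQ_PMI;
		return 0xf00;	/* replay via performance_monitor_common */
	}
	return 0;
}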

Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# f14e953b 19-Dec-2017 Madhavan Srinivasan <maddy@linux.vnet.ibm.com>

powerpc/64s: Add support to take additional parameter in MASKABLE_* macro

To support the addition of a "bitmask" to the MASKABLE_* macros,
factor out the EXCEPTION_PROLOG_1 macro.

Make explicit the interrupt masking supported by a given interrupt
handler. The patch correspondingly extends the MASKABLE_* macros with
an additional parameter. The "bitmask" parameter is passed to the
SOFTEN_TEST macro to decide on masking the interrupt.

Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# d32eb1b5 19-Dec-2017 Madhavan Srinivasan <maddy@linux.vnet.ibm.com>

powerpc/64s: Avoid using EXCEPTION_PROLOG_1 macro in MASKABLE_*

Currently we use both EXCEPTION_PROLOG_1 and __EXCEPTION_PROLOG_1 in
the MASKABLE_* macros. As a cleanup, this patch makes MASKABLE_* use
only __EXCEPTION_PROLOG_1. There is no logic change.

Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 4e26bc4a 19-Dec-2017 Madhavan Srinivasan <maddy@linux.vnet.ibm.com>

powerpc/64: Rename soft_enabled to irq_soft_mask

Rename the paca->soft_enabled to paca->irq_soft_mask as it is no
longer used as a flag for interrupt state, but a mask.

Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 01417c6c 19-Dec-2017 Madhavan Srinivasan <maddy@linux.vnet.ibm.com>

powerpc/64: Change soft_enabled from flag to bitmask

"paca->soft_enabled" is used as a flag to mask some interrupts.
The currently supported flag values and their meanings:

soft_enabled    MSR[EE]
0               0        Disabled (PMI and HMI not masked)
1               1        Enabled

"paca->soft_enabled" is initialized to 1, marking interrupts as
enabled. arch_local_irq_disable() will toggle the value when
interrupts need to be disabled. At this point the interrupts are not
actually disabled; instead, the interrupt vectors have code to check
the flag and mask the interrupt when it occurs. "Masking" here means
updating paca->irq_happened and returning. arch_local_irq_restore()
is called to re-enable interrupts, and it checks and replays
interrupts if any occurred.

Now, as mentioned, the current logic does not mask "performance
monitoring interrupts", and PMIs are implemented as NMIs. But this
patchset depends on local_irq_* for a successful local_* update,
meaning all possible interrupts must be masked during a local_*
update and replayed after the update.

So the idea here is to reverse the "paca->soft_enabled" logic. The
new values and details:

soft_enabled    MSR[EE]
1               0        Disabled (PMI and HMI not masked)
0               1        Enabled

The reason for this change is to create the foundation for a third
mask value "0x2" for "soft_enabled", to add support for masking PMIs.
When ->soft_enabled is set to the value "3", PMIs are masked; when it
is set to the value "1", PMIs are not masked. With this, the patch
also extends soft_enabled into an interrupt disable mask.

The current flags are renamed from IRQ_[EN|DIS]ABLED to
IRQS_ENABLED and IRQS_DISABLED.

The patch also fixes the ptrace call to force the user to always see
a softe value of 1. The reason is that even though userspace has no
business knowing about softe, it is part of pt_regs. Likewise in the
signal context.
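
A hedged sketch of the reversed, mask-style scheme (the values follow
the description above; everything else is illustrative):

#define IRQS_ENABLED		0x0	/* soft mask clear                 */
#define IRQS_DISABLED		0x1	/* "normal" interrupts soft-masked */
#define IRQS_PMI_DISABLED	0x2	/* the third value this enables    */

static inline void soft_irq_disable_sketch(void)
{
	/* Only the soft mask changes: MSR[EE] stays set, and a masked
	 * interrupt that fires is recorded in paca->irq_happened and
	 * replayed later by arch_local_irq_restore(). */
	local_paca->soft_enabled = IRQS_DISABLED;
}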

Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# c2e480ba 19-Dec-2017 Madhavan Srinivasan <maddy@linux.vnet.ibm.com>

powerpc/64: Add #defines for paca->soft_enabled flags

Two #defines IRQS_ENABLED and IRQS_DISABLED are added to be used when
updating paca->soft_enabled. Replace the hardcoded values used when
updating paca->soft_enabled with the IRQ_(EN|DIS)ABLED #defines. No
logic change.

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# aa8a5e00 09-Jan-2018 Michael Ellerman <mpe@ellerman.id.au>

powerpc/64s: Add support for RFI flush of L1-D cache

On some CPUs we can prevent the Meltdown vulnerability by flushing the
L1-D cache on exit from kernel to user mode, and from hypervisor to
guest.

This is known to be the case on at least Power7, Power8 and Power9. At
this time we do not know the status of the vulnerability on other CPUs
such as the 970 (Apple G5), pasemi CPUs (AmigaOne X1000) or Freescale
CPUs. As more information comes to light we can enable this, or other
mechanisms on those CPUs.

The vulnerability occurs when the load of an architecturally
inaccessible memory region (eg. userspace load of kernel memory) is
speculatively executed to the point where its result can influence the
address of a subsequent speculatively executed load.

In order for that to happen, the first load must hit in the L1,
because before the load is sent to the L2 the permission check is
performed. Therefore if no kernel addresses hit in the L1 the
vulnerability can not occur. We can ensure that is the case by
flushing the L1 whenever we return to userspace. Similarly for
hypervisor vs guest.

In order to flush the L1-D cache on exit, we add a section of nops at
each (h)rfi location that returns to a lower privileged context, and
patch that with some sequence. Newer firmwares are able to advertise
to us that there is a special nop instruction that flushes the L1-D.
If we do not see that advertised, we fall back to doing a displacement
flush in software.

For guest kernels we support migration between some CPU versions, and
different CPUs may use different flush instructions. So that we are
prepared to migrate to a machine with a different flush instruction
activated, we may have to patch more than one flush instruction at
boot if the hypervisor tells us to.

In the end this patch is mostly the work of Nicholas Piggin and
Michael Ellerman. However a cast of thousands contributed to analysis
of the issue, earlier versions of the patch, back ports testing etc.
Many thanks to all of them.
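
A hedged sketch of what the patchable exit sequence looks like (macro
shapes are assumptions, not the literal kernel macros):

#define RFI_FLUSH_SLOT			\
	RFI_FLUSH_FIXUP_SECTION;	\
	nop;				\
	nop;				\
	nop
/* The three nops mark the site in a patch section; at boot they are
 * rewritten with the firmware-advertised flush nop, or with a branch
 * over the rfid to the fallback displacement flush. */

#define RFI_TO_USER			\
	RFI_FLUSH_SLOT;			\
	rfid;				\
	b	rfi_flush_fallback
/* The final branch is reached via a patched branch over the rfid
 * when the software fallback flush is enabled. */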

Tested-by: Jon Masters <jcm@redhat.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 222f20f1 09-Jan-2018 Nicholas Piggin <npiggin@gmail.com>

powerpc/64s: Simple RFI macro conversions

This commit does simple conversions of rfi/rfid to the new macros that
include the expected destination context. By simple we mean cases
where there is a single well known destination context, and it's
simply a matter of substituting the instruction for the appropriate
macro.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 50e51c13 09-Jan-2018 Nicholas Piggin <npiggin@gmail.com>

powerpc/64: Add macros for annotating the destination of rfid/hrfid

The rfid/hrfid ((Hypervisor) Return From Interrupt) instruction is
used for switching from the kernel to userspace, and from the
hypervisor to the guest kernel. However it can and is also used for
other transitions, eg. from real mode kernel code to virtual mode
kernel code, and it's not always clear from the code what the
destination context is.

To make it clearer when reading the code, add macros which encode the
expected destination context.
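
A hedged sketch of the annotation: at introduction the macros can
simply expand to rfid/hrfid, existing purely to document the
destination context (and to give later patches a single place to
hook):

#define RFI_TO_KERNEL	rfid
#define RFI_TO_USER	rfid
#define HRFI_TO_GUEST	hrfid
#define HRFI_TO_UNKNOWN	hrfid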

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# ba41e1e1 28-Sep-2017 Balbir Singh <bsingharora@gmail.com>

powerpc/mce: Hookup derror (load/store) UE errors

Extract physical_address for UE errors by walking the page
tables for the mm and address at the NIP, to extract the
instruction. Then use the instruction to find the effective
address via analyse_instr().

We might have page table walking races, but we expect them to
be rare; the physical address extraction is best effort. The idea
is to eventually hook up this infrastructure to memory failure.

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 8568f1e0 21-May-2017 Nicholas Piggin <npiggin@gmail.com>

powerpc/64s/paca: EX_CTR is not used with RELOCATABLE=n, remove it

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 635942ae 21-May-2017 Nicholas Piggin <npiggin@gmail.com>

powerpc/64s/paca: EX_R3 can be merged with EX_DAR

EX_R3 is used only for a small section of the bad stack handler.
Merge it with EX_DAR.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# dbeea1d6 21-May-2017 Nicholas Piggin <npiggin@gmail.com>

powerpc/64s/paca: EX_LR can be merged with EX_DAR

EX_LR is used only for a small section of the SLB miss handler.
Merge it with EX_DAR.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 36670fcf 21-May-2017 Nicholas Piggin <npiggin@gmail.com>

powerpc/64s/paca: EX_SRR0 is unused, remove it

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 8c388514 21-May-2017 Nicholas Piggin <npiggin@gmail.com>

powerpc/64s: Add EX_SIZE definition for paca exception save areas

Rather than open-coding it 4 times.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Move __ASSEMBLY__ guards into head-64.h where they're really needed]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# b51351e2 13-Jun-2017 Nicholas Piggin <npiggin@gmail.com>

powerpc/64s/idle: Branch to handler with virtual mode offset

Have the system reset idle wakeup handlers branched to in real mode
with the 0xc... kernel address applied. This allows the wakeup handler
to be simplified by avoiding rfid when switching to virtual mode.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# b1ee8a3d 19-Dec-2016 Nicholas Piggin <npiggin@gmail.com>

powerpc/64s: Dedicated system reset interrupt stack

The system reset interrupt is used for crash/debug situations, so it is
desirable to have as little impact on the normal state of the system as
possible.

Currently it uses the current kernel stack to process the exception.
This stores into the stack which may be involved with the crash. The
stack pointer may be corrupted, or it may have overflowed.

Avoid or minimise these problems by creating a dedicated NMI stack for
the system reset interrupt to use.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# c4f3b52c 19-Dec-2016 Nicholas Piggin <npiggin@gmail.com>

powerpc/64s: Disallow system reset vs system reset reentrancy

In preparation for using a dedicated stack for system reset interrupts,
prevent a nested system reset from recovering, in order to simplify
code that is called in crash/debug path. This allows a system reset
interrupt to just use the base stack pointer.

Keep an in_nmi nesting counter, similar to the in_mce counter. Consider
the interrupt non-recoverable if it is taken inside another system
reset.

Interrupt nesting could be allowed similarly to MCE, but system reset
is a special case that's not for normal operation, so simplicity wins
until there is requirement for nested system reset interrupts.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# a3d96f70 19-Dec-2016 Nicholas Piggin <npiggin@gmail.com>

powerpc/64s: Fix system reset vs general interrupt reentrancy

The system reset interrupt can occur when MSR_EE=0, and it currently
uses the PACA_EXGEN save area.

Some PACA_EXGEN interrupts have a window where MSR_RI=1 and MSR_EE=0
when the save area is still in use. A system reset interrupt in this
window can lead to undetected corruption when the save area gets
overwritten.

This patch introduces PACA_EXNMI save area for system reset exceptions,
which closes this corruption window. It's also helpful to retain the
EXGEN state for debugging situations, even if not considering the
recoverability aspect.

This patch also moves the PACA_EXMC area down to a less frequently used
part of the paca with the new save area.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# a4087a4d 19-Dec-2016 Nicholas Piggin <npiggin@gmail.com>

powerpc/64s: Exception macro for stack frame and initial register save

This code is common to a few exceptions, and another user will be added.
This causes a trivial change to generated code:

- 604: std r9,416(r1)
- 608: mfspr r11,314
- 60c: std r11,368(r1)
- 610: mfspr r12,315
+ 604: mfspr r11,314
+ 608: mfspr r12,315
+ 60c: std r9,416(r1)
+ 610: std r11,368(r1)

machine_check_powernv_early could also use this, but that requires non
trivial changes to generated code, so that's for another patch.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 83a980f7 19-Dec-2016 Nicholas Piggin <npiggin@gmail.com>

powerpc/64s: Add exception macro that does not enable RI

Subsequent patches will add more non-RI variant exceptions, so
create a macro for it rather than open-code it.

This does not change generated instructions.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 544686ca 19-Apr-2017 Nicholas Piggin <npiggin@gmail.com>

powerpc/64s: Stop using bit in HSPRG0 to test winkle

The POWER8 idle code has a neat trick of programming the power on engine
to restore a low bit into HSPRG0, so idle wakeup code can test and see
if it has been programmed this way and therefore lost all state. Restore
time can be reduced if winkle has not been reached.

However this messes with our r13 PACA pointer, and requires HSPRG0 to be
written to. It also optimizes the slowest and most uncommon case at the
expense of another SPR write in the common nap state wakeup.

Remove this complexity and assume winkle sleeps always require a state
restore. This speedup could be made entirely contained within the winkle
idle code by counting per-core winkles and setting a thread bitmap when
all have gone to winkle.

Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 2563a70c 19-Apr-2017 Nicholas Piggin <npiggin@gmail.com>

powerpc/64s: Remove unnecessary relocation branch from idle handler

The system reset idle handler system_reset_idle_common is relocated, so
relocation is not required to branch to kvm_start_guest. The superfluous
relocation does not result in incorrect code, but it does not compile
outside of exception-64s.S (with fixed section definitions).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# a050d20d 13-Apr-2017 Nicholas Piggin <npiggin@gmail.com>

powerpc/64s: Use relon prolog for EXC_VIRT_OOL_MASKABLE_HV handlers

Hypervisor Virtualization and Directed Hypervisor Doorbell interrupt handlers
use the macro EXC_VIRT_OOL_MASKABLE_HV for their relocation-on handlers, which
calls MASKABLE_RELON_EXCEPTION_HV_OOL, which uses the *real mode* interrupt
prolog. This means we needlessly rfid from virtual mode to virtual mode.

For POWER8 it only affects doorbell IPIs. Context switch microbenchmark between
threads with snooze disabled (which causes IPI) gets about 3% faster, about 370
cycles. Should be more important on POWER9 with global doorbells and HVI for
host interrupts.

Use the RELON variant instead to reduce overhead.

Fixes: 1707dd1613 ("powerpc: Save CFAR before branching in interrupt entry paths")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Fold some more detail into the change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# be5c5e84 17-Apr-2017 Michael Ellerman <mpe@ellerman.id.au>

powerpc/64: Fix HMI exception on LE with CONFIG_RELOCATABLE=y

Prior to commit 2337d207288f ("powerpc/64: CONFIG_RELOCATABLE support for hmi
interrupts"), the branch from hmi_exception_early() to hmi_exception_realmode()
was just a bl hmi_exception_realmode, which the linker would turn into a bl to
the local entry point of hmi_exception_realmode. This was broken when
CONFIG_RELOCATABLE=y because hmi_exception_realmode() is not in the low part of
the kernel text that is copied down to 0x0.

But in fixing that, we added a new bug on little endian kernels. Because the
branch is now a bctrl when CONFIG_RELOCATABLE=y, we branch to the global entry
point of hmi_exception_realmode(). The global entry point must be called with
r12 containing the address of hmi_exception_realmode(), because it uses that
value to calculate the TOC value (r2).

This may manifest as a checkstop, because we take a junk value from r12 which
came from HSRR1, add a small constant to it and then use that as the TOC
pointer. The HSRR1 value will have 0x9 as the top nibble, which puts it above
RAM and somewhere in MMIO space.

Fix it by changing the BRANCH_LINK_TO_FAR() macro to always use r12 to load the
label we're branching to. This means r12 will be setup correctly on LE, fixing
this bug, and r12 is also volatile across function calls on BE so it's a good
choice anyway.

Fixes: 2337d207288f ("powerpc/64: CONFIG_RELOCATABLE support for hmi interrupts")
Reported-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Acked-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 2337d207 26-Jan-2017 Nicholas Piggin <npiggin@gmail.com>

powerpc/64: CONFIG_RELOCATABLE support for hmi interrupts

The branch from hmi_exception_early to hmi_exception_realmode must use
a "relocatable-style" branch, because it is branching from unrelocated
exception code to beyond __end_interrupts.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# bc355125 30-Jan-2017 Paul Mackerras <paulus@ozlabs.org>

powerpc/64: Allow for relocation-on interrupts from guest to host

With host and guest both using radix translation, it is feasible
for the host to take interrupts that come from the guest with
relocation on, and that is in fact what the POWER9 hardware will
do when LPCR[AIL] = 3. All such interrupts use HSRR0/1 not SRR0/1
except for system call with LEV=1 (hcall).

Therefore this adds the KVM tests to the _HV variants of the
relocation-on interrupt handlers, and adds the KVM test to the
relocation-on system call entry point.

We also instantiate the relocation-on versions of the hypervisor
data storage and instruction interrupt handlers, since these can
occur with relocation on in radix guests.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# a97a65d5 26-Jan-2017 Nicholas Piggin <npiggin@gmail.com>

KVM: PPC: Book3S: 64-bit CONFIG_RELOCATABLE support for interrupts

64-bit Book3S exception handlers must find the dynamic kernel base
to add to the target address when branching beyond __end_interrupts,
in order to support kernel running at non-0 physical address.

Support this in KVM by branching with CTR, similar to regular
interrupt handlers. The guest CTR is saved in HSTATE_SCRATCH1 and
restored after the branch.

Without this, the host kernel hangs and crashes randomly when it is
running at a non-0 address and a KVM guest is started.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# d3918e7f 21-Dec-2016 Nicholas Piggin <npiggin@gmail.com>

KVM: PPC: Book3S: Change interrupt call to reduce scratch space use on HV

Change the calling convention to put the trap number together with
CR in two halves of r12, which frees up HSTATE_SCRATCH2 in the HV
handler.

The 64-bit PR handler entry translates the calling convention back
to match the previous call convention (i.e., shared with 32-bit), for
simplicity.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# e6740ae6 07-Nov-2016 Hugh Dickins <hughd@google.com>

powerpc: Fix exception vector build with 2.23 era binutils

The changes to use gas sections for constructing the exception vectors
causes a build break when using binutils 2.23:

arch/powerpc/kernel/exceptions-64s.S:770: Error: operand out of range
(0xffffffffffff8100 is not between 0x0000000000000000 and 0x000000000000ffff)

And so on.

Reported by Hugh with binutils-2.23.2-8.1.4.ppc64 from openSUSE 13.1 and
also Naveen & Denis using 2.23.52.0.1-26.el7 from RHEL 7. Strangely
binutils 2.22 (what I test with) is not affected.

This is caused by the use of @l in LOAD_HANDLER(). The @l was only
recently added in commit a24553dd02dc ("powerpc/pseries: Remove
unnecessary syscall trampoline").

Luckily the gas section changes split out the LOAD_SYSCALL_HANDLER()
macro, which means we actually *don't* need to use @l in LOAD_HANDLER()
any more, only in LOAD_SYSCALL_HANDLER().

So drop the @l from LOAD_HANDLER().

Fixes: 57f266497d81 ("powerpc: Use gas sections for arranging exception vectors")
Signed-off-by: Hugh Dickins <hughd@google.com>
[mpe: Add gory details to change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# f23ed166 02-Nov-2016 Nicholas Piggin <npiggin@gmail.com>

powerpc/64s: Fix system reset interrupt winkle wakeups

Wakeups from winkle set the low bit of the HSPRG0 register, to
distinguish winkle from other sleep states. This register also holds
the PACA pointer. The system reset exception handler fails to mask
this bit away before using the value as the PACA pointer.

Fix this by adding a new type of exception prolog macro where we already
have the PACA set in r13, and have the system reset vector mask it out.
The winkle wakeup handler will store the masked value back into HSPRG0.
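
A hedged sketch of the masking (the instruction choice is an
assumption for illustration):

	mfspr	r13,SPRN_HSPRG0
	clrrdi	r13,r13,1	/* drop the winkle flag: r13 = PACA */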

Fixes: fb479e44a9e2 ("powerpc/64s: relocation, register save fixes for system reset interrupt")
Cc: stable@vger.kernel.org # v3.0+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# fb479e44 12-Oct-2016 Nicholas Piggin <npiggin@gmail.com>

powerpc/64s: relocation, register save fixes for system reset interrupt

This patch does a couple of things. First of all, powernv immediately
explodes when running a relocated kernel, because the system reset
exception for handling sleeps does not do correct relocated branches.

Secondly, the sleep handling code trashes the condition and CFAR
registers, which we would like to preserve for debugging purposes (for
the non-sleep exception case).

This patch changes the exception to use the standard format that saves
registers before any tests or branches are made. It adds the test for
idle-wakeup as an "extra" to break out of the normal exception path.
Then it branches to a relocated idle handler that calls the various
idle handling functions.

After this patch, the POWER8 CPU simulator now boots a powernv kernel
that is running at a non-zero address.

Fixes: 948cf67c4726 ("powerpc: Add NAP mode support on Power7 in HV mode")
Cc: stable@vger.kernel.org # v3.0+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 57f26649 27-Sep-2016 Nicholas Piggin <npiggin@gmail.com>

powerpc: Use gas sections for arranging exception vectors

Use assembler sections of fixed size and location to arrange the 64-bit
Book3S exception vector code (64-bit Book3E also uses it in head_64.S
for 0x0..0x100).

This allows better flexibility in arranging exception code and hiding
unimportant details behind macros.

Gas sections can be a bit painful to use this way, mainly because the
assembler does not know where they will be finally linked. Taking
absolute addresses requires a bit of trickery for example, but it can
be hidden behind macros for the most part.

Generated code is mostly the same except locations, offsets, alignments.

The "+ 0x2" is only required for the trap number / kvm exit number,
which gets loaded as a constant into a register.

Previously, code also used + 0x2 for label names, but we changed to
using "H" to distinguish the HV case there. Remove the last vestiges
of that.

__after_prom_start takes the absolute address of a label in another
fixed section. Newer toolchains seem to compile this okay, but older
ones do not. FIXED_SYMBOL_ABS_ADDR is more foolproof; it just takes an
additional line to define.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# da2bc464 30-Sep-2016 Michael Ellerman <mpe@ellerman.id.au>

powerpc/64s: Add new exception vector macros

Create arch/powerpc/include/asm/head-64.h with macros that specify
an exception vector (name, type, location), which will be used to
label and lay out exceptions into the object file.

Naming is moved out of exception-64s.h, which is used to specify the
implementation of exception handlers.

objdump of generated code in exception vectors is unchanged except for
names. Alignment directives scattered around are annoying, but done
this way so that disassembly can verify identical instruction
generation before and after patch. These get cleaned up in future
patch.

We change the way KVMTEST works, explicitly passing EXC_HV or EXC_STD
rather than overloading the trap number. This removes the need to have
SOFTEN values for the overloaded trap numbers, eg. 0x502.
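
A hedged sketch of the shape of such a vector-layout macro (argument
names and the exact expansion are assumptions; the real definitions
live in head-64.h):

#define EXC_REAL(name, start, size)			\
	EXC_REAL_BEGIN(name, start, size);		\
	STD_EXCEPTION_PSERIES(start, name##_common);	\
	EXC_REAL_END(name, start, size)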

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# a24553dd 12-Sep-2016 Nicholas Piggin <npiggin@gmail.com>

powerpc/pseries: Remove unnecessary syscall trampoline

When we originally added the ability to split the exception vectors from
the kernel (commit 1f6a93e4c35e ("powerpc: Make it possible to move the
interrupt handlers away from the kernel" 2008-09-15)), the LOAD_HANDLER() macro
used an addi instruction to compute the offset of the common handler
from the kernel base address.

Using addi meant the handler had to be within 32K of the kernel base
address, due to the addi instruction taking a signed immediate value.
That necessitated creating a trampoline for the system call handler,
because system_call_common (in entry64.S) is not linked within 32K of
the kernel base address.

Later in commit 61e2390ede3c ("powerpc: Make load_hander handle upto 64k
offset" 2012-11-15) we changed LOAD_HANDLER to take a 64K offset, by
changing it to use ori.

Although system_call_common is not in head_64.S or exceptions-64s.S, it
is included in head-y, which causes it to be linked early in the kernel
text, so in practice it ends up below 64K. Additionally, if it can't be
placed below 64K, the linker will fail to build with a "relocation
truncated to fit" error.

So remove the trampoline.

Newer toolchains are able to work out that the ori in LOAD_HANDLER only
takes a 16-bit offset, and so they generate a 16-bit relocation. Older
toolchains (binutils 2.22 at least) are not so smart, so we have to add
the @l annotation to tell the assembler to generate a 16-bit relocation.
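
As an illustration, a simplified sketch of the macro (not the exact
kernel code; the label arithmetic is abbreviated):

	/* reg holds PACAKBASE; "label" must link within 64K of the
	 * kernel base, since ori takes a 16-bit unsigned immediate.
	 * The @l tells old binutils to emit a plain 16-bit relocation. */
	#define LOAD_HANDLER(reg, label) \
		ori	reg,reg,((label) - _stext)@l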

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# d8d42b05 25-Jul-2016 Michael Ellerman <mpe@ellerman.id.au>

powerpc/64: Do load of PACAKBASE in LOAD_HANDLER

The LOAD_HANDLER macro requires that you have previously loaded "reg"
with PACAKBASE. Although that gives callers flexibility to get PACAKBASE
in some interesting way, none of the callers actually do that. So fold
the load of PACAKBASE into the macro, making it simpler for callers to
use correctly.
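
A hedged before/after sketch of the change (simplified macro body;
"some_handler" is a placeholder):

	/* before: every caller had to pair the load with the macro */
	ld	r12,PACAKBASE(r13)
	LOAD_HANDLER(r12, some_handler)

	/* after: the macro performs the PACAKBASE load itself */
	#define LOAD_HANDLER(reg, label) \
		ld	reg,PACAKBASE(r13); \
		ori	reg,reg,((label) - _stext)@l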

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Nick Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 27510235 25-Jul-2016 Michael Ellerman <mpe@ellerman.id.au>

powerpc/64: Correct comment on LOAD_HANDLER()

The comment for LOAD_HANDLER() was wrong. The part about kdump has not
been true since 1f6a93e4c35e ("powerpc: Make it possible to move the
interrupt handlers away from the kernel").

Describe how it currently works, and combine the two separate comments
into one.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Nick Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 9baaef0a 08-Jul-2016 Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc/irq: Add support for HV virtualization interrupts

This will be delivering external interrupts from the XIVE to the
Hypervisor. We treat it as a normal external interrupt for the
lazy irq disable code (so it will be replayed as a 0x500) and
route it to do_IRQ.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# c223c903 17-May-2016 Christophe Leroy <christophe.leroy@c-s.fr>

powerpc32: provide VIRT_CPU_ACCOUNTING

This patch provides VIRT_CPU_ACCOUNTING to the PPC32 architecture.
PPC32 doesn't have the PACA structure, so we use the thread_info
structure to store the accounting data.

In order to reuse the PPC64 functions on PPC32, all u64 data has
been replaced by 'unsigned long', so that it is u32 on PPC32 and
u64 on PPC64.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>


# 2613265c 16-Dec-2015 Michael Ellerman <mpe@ellerman.id.au>

powerpc/kernel: Combine vec/loc for STD_EXCEPTION_PSERIES

The STD_EXCEPTION_PSERIES macro takes both a vector number, and a
location (memory address). However both are always identical, so combine
them to save repeating ourselves.

This does mean an exception handler must always exist at the location in
memory that matches its vector number. But that's OK because this is the
"STD" macro (standard), which does exactly that. We have other macros
for the other cases, eg. STD_EXCEPTION_PSERIES_OOL (out of line).
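
Sketch of the resulting call-site change (illustrative only):

	/* before: location and vector passed separately, always equal */
	STD_EXCEPTION_PSERIES(0x300, 0x300, data_access)

	/* after: a single argument serves as both */
	STD_EXCEPTION_PSERIES(0x300, data_access)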

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# d6265aea 24-Nov-2015 Michael Ellerman <mpe@ellerman.id.au>

powerpc/kernel: Drop HMT_MEDIUM_PPR_DISCARD

HMT_MEDIUM_PPR_DISCARD is a macro which is present at the start of most
of our first level exception handlers. It conditionally executes a
HMT_MEDIUM instruction, which sets the processor priority to medium.

On modern systems, ie. Power7 and later, it is nop'ed out at boot.
All it does is make the exception vectors more cramped, and consume 4
bytes of icache.

On old systems it has the effect of boosting the processor priority at
the start of exception processing. If we were previously in the idle
loop for example, we may be at low or very low priority. This is
desirable as we want to process the exception as fast as possible.

However looking closely at the generated code, we see that in all cases
we execute another HMT_MEDIUM just four instructions later. With code
patching applied, the final code on an old (Power6) system will look
like, eg:

c000000000000300 <data_access_pSeries>:
c000000000000300: 7c 42 13 78 mr r2,r2 <-
c000000000000304: 7d b2 43 a6 mtsprg 2,r13
c000000000000308: 7d b1 42 a6 mfsprg r13,1
c00000000000030c: f9 2d 00 80 std r9,128(r13)
c000000000000310: 60 00 00 00 nop
c000000000000314: 7c 42 13 78 mr r2,r2 <-

So I suggest that the added code complexity of HMT_MEDIUM_PPR_DISCARD is
not justified by the benefit of boosting the processor priority for the
duration of four instructions, and therefore we drop it.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 31a40e2b 11-Nov-2015 Paul Mackerras <paulus@ozlabs.org>

powerpc/64: Include KVM guest test in all interrupt vectors

Currently, if HV KVM is configured but PR KVM isn't, we don't include
a test to see whether we were interrupted in KVM guest context for the
set of interrupts which get delivered directly to the guest by hardware
if they occur in the guest. This includes things like program
interrupts.

However, the recent bug where userspace could set the MSR for a VCPU
to have an illegal value in the TS field, and thus cause a TM Bad Thing
type of program interrupt on the hrfid that enters the guest, showed that
we can never be completely sure that these interrupts can never occur
in the guest entry/exit code. If one of these interrupts does happen
and we have HV KVM configured but not PR KVM, then we end up trying to
run the handler in the host with the MMU set to the guest MMU context,
which generally ends badly.

Thus, for robustness it is better to have the test in every interrupt
vector, so that if some way is found to trigger some interrupt in the
guest entry/exit path, we can handle it without immediately crashing
the host.

This means that the distinction between KVMTEST and KVMTEST_PR goes
away. Thus we delete KVMTEST_PR and associated macros and use KVMTEST
everywhere that we previously used either KVMTEST_PR or KVMTEST. It
also means that SOFTEN_TEST_HV_201 becomes the same as SOFTEN_TEST_PR,
so we delete SOFTEN_TEST_HV_201 and use SOFTEN_TEST_PR instead.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 0869b6fd 29-Jul-2014 Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

powerpc/book3s: Add basic infrastructure to handle HMI in Linux.

Handle the Hypervisor Maintenance Interrupt (HMI) in Linux. This patch
implements the basic infrastructure to handle HMIs in the Linux host. The
design is to invoke the opal handle hmi call in real mode for recovery and
to set irq_pending when we hit an HMI. During check_irq_replay we pull the
opal HMI event and print the HMI info on the console.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 9daf112b 15-Jul-2014 Michael Ellerman <mpe@ellerman.id.au>

powerpc: Remove misleading DISABLE_INTS

DISABLE_INTS has a long and storied history, but for some time now it
has not actually disabled interrupts.

For the open-coded exception handlers, just stop using it, instead call
RECONCILE_IRQ_STATE directly. This has the benefit of removing a level
of indirection, and making it clear that r10 & r11 are used at that
point.

For the addition case we still need a macro, so rename it to clarify
what it actually does.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# a1d711c5 15-Jul-2014 Michael Ellerman <mpe@ellerman.id.au>

powerpc: Document register clobbering in EXCEPTION_COMMON()

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# b1576fec 03-Feb-2014 Anton Blanchard <anton@samba.org>

powerpc: No need to use dot symbols when branching to a function

binutils is smart enough to know that a branch to a function
descriptor is actually a branch to the functions text address.

Alan tells me that binutils has been doing this for 9 years.

Signed-off-by: Anton Blanchard <anton@samba.org>


# d410ae21 10-Mar-2014 Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

powerpc/book3s: Fix CFAR clobbering issue in machine check handler.

While checking for powersaving mode in the machine check handler at 0x200,
we clobber the CFAR register. Fix it by saving and restoring it across the
beq/bgt branches.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 90ff5d68 15-Dec-2013 Michael Neuling <mikey@neuling.org>

powerpc: Fix bad stack check in exception entry

In EXCEPTION_PROLOG_COMMON() we check to see if the stack pointer (r1)
is valid when coming from the kernel. If it's not valid, we die but
with a nice oops message.

Currently we allocate a stack frame (subtract INT_FRAME_SIZE) before we
check to see if the stack pointer is negative. Unfortunately, this
won't detect a bad stack where r1 is less than INT_FRAME_SIZE.

This patch fixes the check to compare the modified r1 with
-INT_FRAME_SIZE. With this, bad kernel stack pointers (including NULL
pointers) are correctly detected again.

Kudos to Paulus for finding this.
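
A sketch of the corrected check (simplified from
EXCEPTION_PROLOG_COMMON; the real surrounding code differs):

	subi	r1,r1,INT_FRAME_SIZE	/* frame allocated before check */
	cmpdi	cr1,r1,-INT_FRAME_SIZE	/* was: cmpdi cr1,r1,0 */
	bge-	cr1,bad_stack		/* original r1 >= 0: not kernel */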

Signed-off-by: Michael Neuling <mikey@neuling.org>
cc: stable@vger.kernel.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# b14a7253 30-Oct-2013 Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

powerpc/book3s: Split the common exception prolog logic into two sections.

This patch splits the common exception prolog logic into three parts to
facilitate reuse of existing code in the next patch. This patch also
re-arranges a few instructions in such a way that the second part now deals
with saving register values from the paca save area to the stack frame, and
the third part deals with saving current register values to the stack frame.

The second and third part will be reused in the machine check exception
routine in the subsequent patch.

Please note that this patch does not introduce or change existing code
logic. Instead it is just code movement and instruction re-ordering.

The patch was acked by Paul, but with some minor modifications (explained
above) to address Paul's comment on the later patch (3).

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# dd96b2c2 07-Oct-2013 Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

kvm: powerpc: book3s: Cleanup interrupt handling code

With this patch, if HV is included, interrupts come in to the HV version
of the kvmppc_interrupt code, which then jumps to the PR handler,
renamed to kvmppc_interrupt_pr, if the guest is a PR guest. This helps
in enabling both HV and PR, which we do in a later patch.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>


# 7aa79938 07-Oct-2013 Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

kvm: powerpc: book3s: pr: Rename KVM_BOOK3S_PR to KVM_BOOK3S_PR_POSSIBLE

With later patches supporting PR KVM as a kernel module, the changes
that have to be built into the main kernel binary to enable the PR KVM
module are now selected via KVM_BOOK3S_PR_POSSIBLE.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>


# 4b8473c9 19-Sep-2013 Paul Mackerras <paulus@samba.org>

KVM: PPC: Book3S HV: Add support for guest Program Priority Register

POWER7 and later IBM server processors have a register called the
Program Priority Register (PPR), which controls the priority of
each hardware CPU SMT thread, and affects how fast it runs compared
to other SMT threads. This priority can be controlled by writing to
the PPR or by use of a set of instructions of the form or rN,rN,rN
which are otherwise no-ops but have been defined to set the priority
to particular levels.
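
For reference, the standard priority-setting no-op encodings look like
this (the kernel wraps them in HMT_* macros):

	or	1,1,1	/* HMT_LOW:    low priority */
	or	2,2,2	/* HMT_MEDIUM: medium priority */
	or	3,3,3	/* HMT_HIGH:   high priority */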

This adds code to context switch the PPR when entering and exiting
guests and to make the PPR value accessible through the SET/GET_ONE_REG
interface. When entering the guest, we set the PPR as late as
possible, because if we are setting a low thread priority it will
make the code run slowly from that point on. Similarly, the
first-level interrupt handlers save the PPR value in the PACA very
early on, and set the thread priority to the medium level, so that
the interrupt handling code runs at a reasonable speed.

Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>


# bc2e6c6a 12-Aug-2013 Michael Neuling <mikey@neuling.org>

powerpc: Avoid link stack corruption for MMU on exceptions

When we have MMU on exceptions (POWER8) and a relocatable kernel, we
need to branch from the initial exception vectors at 0x0 to up high
where the kernel might be located. Currently we do this using the link
register.

Unfortunately this corrupts the link stack and instead we should use the
count register. We did this for the syscall entry path in:
6a40480 powerpc: Avoid link stack corruption in MMU on syscall entry path
but I stupidly forgot to do the same for other exceptions.

This patch changes the initial exception vectors to use the count
register instead of the link register when we need to branch up to the
relocated kernel.
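
A rough sketch of the pattern (register and label are placeholders):

	LOAD_HANDLER(r12, relocated_handler)
	/* before: a blr without a matching bl pollutes the link stack */
	mtlr	r12
	blr
	/* after: CTR-based branch, no link stack side effects */
	mtctr	r12
	bctr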

I have a dodgy userspace test which loops calling a function that reads
the PVR (mfpvr in userspace will be emulated by the kernel via the
program check exception). On POWER8 and with CONFIG_RELOCATABLE=y, I
get a ~10% performance improvement with my userspace test with this
patch.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# de021bb7 15-Jul-2013 Tiejun Chen <tiejun.chen@windriver.com>

powerpc/ppc64: Rename SOFT_DISABLE_INTS with RECONCILE_IRQ_STATE

The SOFT_DISABLE_INTS seems an odd name for something that updates the
software state to be consistent with interrupts being hard disabled, so
rename SOFT_DISABLE_INTS with RECONCILE_IRQ_STATE to avoid this confusion.

Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# c9f69518 25-Jun-2013 Michael Ellerman <michael@ellerman.id.au>

powerpc: Remove KVMTEST from RELON exception handlers

KVMTEST is a macro which checks whether we are taking an exception from
guest context, if so we branch out of line and eventually call into the
KVM code to handle the switch.

When running real guests on bare metal (HV KVM) the hardware ensures
that we never take a relocation on exception when transitioning from
guest to host. For PR KVM we disable relocation on exceptions ourselves in
kvmppc_core_init_vm(), as of commit a413f47 "Disable relocation on
exceptions whenever PR KVM is active".

So convert all the RELON macros to use NOTEST, and drop the remaining
KVM_HANDLER() definitions we have for 0xe40 and 0xe80.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
CC: <stable@vger.kernel.org> [v3.9+]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 0e37739b 13-Jun-2013 Michael Ellerman <michael@ellerman.id.au>

powerpc: Fix stack overflow crash in resume_kernel when ftracing

It's possible for us to crash when running with ftrace enabled, eg:

Bad kernel stack pointer bffffd12 at c00000000000a454
cpu 0x3: Vector: 300 (Data Access) at [c00000000ffe3d40]
pc: c00000000000a454: resume_kernel+0x34/0x60
lr: c00000000000335c: performance_monitor_common+0x15c/0x180
sp: bffffd12
msr: 8000000000001032
dar: bffffd12
dsisr: 42000000

If we look at current's stack (paca->__current->stack) we see it is
equal to c0000002ecab0000. Our stack is 16K, and comparing to
paca->kstack (c0000002ecab3e30) we can see that we have overflowed our
kernel stack. This leads to us writing over our struct thread_info, and
in this case we have corrupted thread_info->flags and set
_TIF_EMULATE_STACK_STORE.

Dumping the stack we see:

3:mon> t c0000002ecab0000
[c0000002ecab0000] c00000000002131c .performance_monitor_exception+0x5c/0x70
[c0000002ecab0080] c00000000000335c performance_monitor_common+0x15c/0x180
--- Exception: f01 (Performance Monitor) at c0000000000fb2ec .trace_hardirqs_off+0x1c/0x30
[c0000002ecab0370] c00000000016fdb0 .trace_graph_entry+0xb0/0x280 (unreliable)
[c0000002ecab0410] c00000000003d038 .prepare_ftrace_return+0x98/0x130
[c0000002ecab04b0] c00000000000a920 .ftrace_graph_caller+0x14/0x28
[c0000002ecab0520] c0000000000d6b58 .idle_cpu+0x18/0x90
[c0000002ecab05a0] c00000000000a934 .return_to_handler+0x0/0x34
[c0000002ecab0620] c00000000001e660 .timer_interrupt+0x160/0x300
[c0000002ecab06d0] c0000000000025dc decrementer_common+0x15c/0x180
--- Exception: 901 (Decrementer) at c0000000000104d4 .arch_local_irq_restore+0x74/0xa0
[c0000002ecab09c0] c0000000000fe044 .trace_hardirqs_on+0x14/0x30 (unreliable)
[c0000002ecab0fb0] c00000000016fe3c .trace_graph_entry+0x13c/0x280
[c0000002ecab1050] c00000000003d038 .prepare_ftrace_return+0x98/0x130
[c0000002ecab10f0] c00000000000a920 .ftrace_graph_caller+0x14/0x28
[c0000002ecab1160] c0000000000161f0 .__ppc64_runlatch_on+0x10/0x40
[c0000002ecab11d0] c00000000000a934 .return_to_handler+0x0/0x34
--- Exception: 901 (Decrementer) at c0000000000104d4 .arch_local_irq_restore+0x74/0xa0

... and so on

__ppc64_runlatch_on() is called from RUNLATCH_ON in the exception entry
path. At that point the irq state is not consistent, ie. interrupts are
hard disabled (by the exception entry), but the paca soft-enabled flag
may be out of sync.

This leads to the local_irq_restore() in trace_graph_entry() actually
enabling interrupts, which we do not want. Because we have not yet
reprogrammed the decrementer we immediately take another decrementer
exception, and recurse.

The fix is twofold. Firstly make sure we call DISABLE_INTS before
calling RUNLATCH_ON. The badly named DISABLE_INTS actually reconciles
the irq state in the paca with the hardware, making it safe again to
call local_irq_save/restore().

Although that should be sufficient to fix the bug, we also mark the
runlatch routines as notrace. They are called very early in the
exception entry and we are asking for trouble tracing them. They are
also fairly uninteresting and tracing them just adds unnecessary
overhead.

[ This regression was introduced by fe1952fc0afb9a2e4c79f103c08aef5d13db1873
"powerpc: Rework runlatch code" by myself --BenH
]

CC: <stable@vger.kernel.org> [v3.4+]
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# a485c709 25-Apr-2013 Paul Mackerras <paulus@samba.org>

powerpc: Fix "attempt to move .org backwards" error

Building a 64-bit powerpc kernel with PR KVM enabled currently gives
this error:

AS arch/powerpc/kernel/head_64.o
arch/powerpc/kernel/exceptions-64s.S: Assembler messages:
arch/powerpc/kernel/exceptions-64s.S:258: Error: attempt to move .org backwards
make[2]: *** [arch/powerpc/kernel/head_64.o] Error 1

This happens because the MASKABLE_EXCEPTION_PSERIES macro turns into
33 instructions, but we only have space for 32 at the decrementer
interrupt vector (from 0x900 to 0x980).
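
(For the arithmetic: the vector spans 0x980 - 0x900 = 0x80 bytes, i.e.
32 four-byte instructions, so 33 instructions overflow it by one.)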

In the code generated by the MASKABLE_EXCEPTION_PSERIES macro, we
currently have two instances of the HMT_MEDIUM macro, which has the
effect of setting the SMT thread priority to medium. One is the
first instruction, and is overwritten by a no-op on processors where
we save the PPR (processor priority register), that is, POWER7 or
later. The other is after we have saved the PPR.

In order to reduce the code at 0x900 by one instruction, we omit the
first HMT_MEDIUM. On processors without SMT this will have no effect
since HMT_MEDIUM is a no-op there. On POWER5 and RS64 machines this
will mean that the first few instructions take a little longer in the
case where a decrementer interrupt occurs when the hardware thread is
running at low SMT priority. On POWER6 and later machines, the
hardware automatically boosts the thread priority when a decrementer
interrupt is taken if the thread priority was below medium, so this
change won't make any difference.

The alternative would be to branch out of line after saving the CFAR.
However, that would incur an extra overhead on all processors, whereas
the approach adopted here only adds overhead on older threaded processors.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 0acb9111 04-Feb-2013 Paul Mackerras <paulus@samba.org>

powerpc/kvm/book3s_hv: Preserve guest CFAR register value

The CFAR (Come-From Address Register) is a useful debugging aid that
exists on POWER7 processors. Currently HV KVM doesn't save or restore
the CFAR register for guest vcpus, making the CFAR of limited use in
guests.

This adds the necessary code to capture the CFAR value saved in the
early exception entry code (it has to be saved before any branch is
executed), save it in the vcpu.arch struct, and restore it on entry
to the guest.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 1707dd16 04-Feb-2013 Paul Mackerras <paulus@samba.org>

powerpc: Save CFAR before branching in interrupt entry paths

Some of the interrupt vectors on 64-bit POWER server processors are
only 32 bytes long, which is not enough for the full first-level
interrupt handler. For these we currently just have a branch to an
out-of-line handler. However, this means that we corrupt the CFAR
(come-from address register) on POWER7 and later processors.

To fix this, we split the EXCEPTION_PROLOG_1 macro into two pieces:
EXCEPTION_PROLOG_0 contains the part up to the point where the CFAR
is saved in the PACA, and EXCEPTION_PROLOG_1 contains the rest. We
then put EXCEPTION_PROLOG_0 in the short interrupt vectors before
we branch to the out-of-line handler, which contains the rest of the
first-level interrupt handler. To facilitate this, we define new
_OOL (out of line) variants of STD_EXCEPTION_PSERIES, etc.

In order to get EXCEPTION_PROLOG_0 to be short enough, i.e., no more
than 6 instructions, it was necessary to move the stores that move
the PPR and CFAR values into the PACA into __EXCEPTION_PROLOG_1 and
to get rid of one of the two HMT_MEDIUM instructions. Previously
there was a HMT_MEDIUM_PPR_DISCARD before the prolog, which was
nop'd out on processors with the PPR (POWER7 and later), and then
another HMT_MEDIUM inside the HMT_MEDIUM_PPR_SAVE macro call inside
__EXCEPTION_PROLOG_1, which was nop'd out on processors without PPR.
Now the HMT_MEDIUM inside EXCEPTION_PROLOG_0 is there unconditionally
and the HMT_MEDIUM_PPR_DISCARD is not strictly necessary, although
this leaves it in for the interrupt vectors where there is room for
it.
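
A hedged sketch of the shape of EXCEPTION_PROLOG_0 (the real macro may
differ in detail):

	#define EXCEPTION_PROLOG_0(area)				\
		GET_PACA(r13);						\
		std	r9,area+EX_R9(r13);	/* free up r9 */	\
		OPT_GET_SPR(r9, SPRN_PPR, CPU_FTR_HAS_PPR);		\
		HMT_MEDIUM;						\
		std	r10,area+EX_R10(r13);	/* free up r10 */	\
		OPT_GET_SPR(r10, SPRN_CFAR, CPU_FTR_CFAR)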

Previously we had a handler for hypervisor maintenance interrupts at
0xe50, which doesn't leave enough room for the vector for hypervisor
emulation assist interrupts at 0xe40, since we need 8 instructions.
The 0xe50 vector was only used on POWER6, as the HMI vector was moved
to 0xe60 on POWER7. Since we don't support running in hypervisor mode
on POWER6, we just remove the handler at 0xe50.

This also changes denorm_exception_hv to use EXCEPTION_PROLOG_0
instead of open-coding it, and removes the HMT_MEDIUM_PPR_DISCARD
from the relocation-on vectors (since any CPU that supports
relocation-on interrupts also has the PPR).

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 44e9309f 06-Dec-2012 Haren Myneni <haren@linux.vnet.ibm.com>

powerpc: Implement PPR save/restore

When a task enters kernel space, the user-defined priority (PPR)
is saved into the PACA at the beginning of the first-level exception
vector and then copied from the PACA to thread_info in the second-level
vector. The PPR is restored from thread_info before the task exits
kernel space.

P7/P8 temporarily raise the thread priority to a higher level during an
exception until the program executes HMT_* calls, but they do not modify
the PPR register. So we save the PPR value as soon as a register is
available to use and then call HMT_MEDIUM to increase the priority. This
feature is supported on P7 and later processors.

We save/restore the PPR for all exception vectors except system call
entry. GLIBC saves/restores it for system calls, so the default PPR
value (3) is set on system call exit when the task returns to user
space.

Signed-off-by: Haren Myneni <haren@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 13e7a8e8 06-Dec-2012 Haren Myneni <haren@linux.vnet.ibm.com>

powerpc: Macros for saving/restore PPR

Several macros are defined for saving and restoring the user-defined PPR value.

Signed-off-by: Haren Myneni <haren@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# a09688cd 06-Dec-2012 Haren Myneni <haren@linux.vnet.ibm.com>

powerpc: Increase exceptions arrays in paca struct to save PPR

Use the PACA to save the user-defined PPR value in the first-level exception vector.

Signed-off-by: Haren Myneni <haren@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 5d75b264 06-Dec-2012 Haren Myneni <haren@linux.vnet.ibm.com>

powerpc: Move branch instruction from ACCOUNT_CPU_USER_ENTRY to caller

The first instruction in ACCOUNT_CPU_USER_ENTRY is a 'beq', which checks
for exceptions coming from kernel mode. The PPR value will be saved
immediately after ACCOUNT_CPU_USER_ENTRY and is likewise only needed for
user-level exceptions. So move this branch instruction into the caller
code.

Signed-off-by: Haren Myneni <haren@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 1dbdafec 14-Nov-2012 Ian Munsie <imunsie@au1.ibm.com>

powerpc: Add book3s privileged doorbell exception vectors

Directed Privileged Doorbell Interrupts come in at 0xa00 (or
0xc000000000004a00 if relocation on exception is enabled), so add
exception vectors at these locations.

If doorbell support is not compiled in we handle it as an
unknown_exception.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Tested-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 655bb3f4 14-Nov-2012 Ian Munsie <imunsie@au1.ibm.com>

powerpc: Add book3s hypervisor doorbell exception vectors

Directed Hypervisor Doorbell Interrupts come in at 0xe80 (or
0xc000000000004e80 if relocation on exceptions is enabled), so add
exception vectors at these locations.

If doorbell support is not compiled in we handle it as an
unknown_exception.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Tested-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# c1fb6816 02-Nov-2012 Michael Neuling <mikey@neuling.org>

powerpc: Add relocation on exception vector handlers

POWER8/v2.07 allows exceptions to be taken with the MMU still on.

A new set of exception vectors is added at 0xc000_0000_0000_4xxx. When the HW
takes us here, MSR IR/DR will be set already and we no longer need a costly
RFID to turn the MMU back on again.

The original 0x0 based exception vectors remain for when the HW can't leave the
MMU on. Examples of this are when we can't trust the current MMU mappings,
like when we are changing from guest to hypervisor (HV 0 -> 1) or when the MMU
was off already. In these cases the HW will take us to the original 0x0 based
exception vectors with the MMU off as before.

This uses the new macros added previously to implement these new exception
vectors at 0xc000_0000_0000_4xxx. We exit these exception vectors using
mtlr/blr (rather than mtspr SRR0/RFID), since we don't need the costly MMU
switch anymore.

This moves the __end_interrupts marker down past these new 0x4000 vectors since
they will need to be copied down to 0x0 when the kernel is not at 0x0.

Signed-off-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 4700dfaf 02-Nov-2012 Michael Neuling <mikey@neuling.org>

powerpc: Add new macros needed for relocation on exceptions

POWER8/v2.07 allows exceptions to be taken with the MMU still on.

A new set of exception vectors is added at 0xc000_0000_0000_4xxx. When the HW
takes us here, MSR IR/DR will be set already and we no longer need a costly
RFID to turn the MMU back on again.

The original 0x0 based exception vectors remain for when the HW can't leave the
MMU on. Examples of this are when we can't trust the current MMU mappings,
like when we are changing from guest to hypervisor (HV 0 -> 1) or when the MMU
was off already. In these cases the HW will take us to the original 0x0 based
exception vectors with the MMU off as before.

The below macros are copies of the macros used at the 0x0 offset, but
modified to handle the MMU being on. In these macros we use the link
register to jump to the secondary handlers rather than using RFID (RFID
was also used to turn on the MMU).

Signed-off-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 61e2390e 04-Nov-2012 Michael Neuling <mikey@neuling.org>

powerpc: Make load_hander handle upto 64k offset

If we change LOAD_HANDLER() to use an ori instead of an addi, we can load
handlers up to 64K away, provided we are still 64K aligned.
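
The reach difference, sketched (label arithmetic simplified):

	/* addi takes a signed 16-bit immediate: the handler must be
	 * within +/-32K of the base */
	addi	reg,reg,(label - _stext)
	/* ori takes an unsigned 16-bit immediate: up to 64K away,
	 * provided the base is 64K aligned so that OR acts as ADD */
	ori	reg,reg,(label - _stext)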

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 9778b696 04-Jul-2012 Stuart Yoder <stuart.yoder@freescale.com>

powerpc: Use CURRENT_THREAD_INFO instead of open coded assembly

Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# a3512b2d 07-May-2012 Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc/irq: Make alignment & program interrupt behave the same

Alignment was the last user of the ENABLE_INTS macro, which we can
now remove. All non-syscall exceptions now disable interrupts on
entry, they get re-enabled conditionally from C code. Don't
unconditionally re-enable in program check either, check the
original context.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 7230c564 06-Mar-2012 Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc: Rework lazy-interrupt handling

The current implementation of lazy interrupts handling has some
issues that this tries to address.

We don't do the various workarounds we need to do when re-enabling
interrupts in some cases such as when returning from an interrupt
and thus we may still lose or get delayed decrementer or doorbell
interrupts.

The current scheme also makes it much harder to handle the external
"edge" interrupts provided by some BookE processors when using the
EPR facility (External Proxy) and the Freescale Hypervisor.

Additionally, we tend to keep interrupts hard disabled in a number
of cases, such as decrementer interrupts, external interrupts, or
when a masked decrementer interrupt is pending. This is sub-optimal.

This is an attempt at fixing it all in one go by reworking the way
we do the lazy interrupt disabling from the ground up.

The base idea is to replace the "hard_enabled" field with a
"irq_happened" field in which we store a bit mask of what interrupt
occurred while soft-disabled.

When re-enabling, either via arch_local_irq_restore() or when returning
from an interrupt, we can now decide what to do by testing bits in that
field.

We then implement replaying of the missed interrupts either by
re-using the existing exception frame (in exception exit case) or via
the creation of a new one from an assembly trampoline (in the
arch_local_irq_enable case).

This removes the need to play with the decrementer to try to create
fake interrupts, among others.
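
A simplified, self-contained C sketch of the replay idea (bit names,
values and helpers here are illustrative placeholders, not the kernel's
actual PACA_IRQ_* flags):

	#include <stdio.h>

	#define HAPPENED_EE   0x04	/* external irq seen while soft-disabled */
	#define HAPPENED_DEC  0x08	/* decrementer seen while soft-disabled */

	static unsigned char irq_happened;	/* stands in for paca->irq_happened */

	static void replay(const char *what) { printf("replaying %s\n", what); }

	static void irq_restore_sketch(int enable)
	{
		if (!enable)
			return;			/* staying soft-disabled */
		if (irq_happened & HAPPENED_DEC)
			replay("decrementer");
		if (irq_happened & HAPPENED_EE)
			replay("external");
		irq_happened = 0;		/* hard-enable would go here */
	}

	int main(void)
	{
		irq_happened |= HAPPENED_DEC;	/* irq arrived while soft-disabled */
		irq_restore_sketch(1);
		return 0;
	}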

In addition, this adds a few refinements:

- We no longer hard disable decrementer interrupts that occur
while soft-disabled. We now simply bump the decrementer back to max
(on BookS) or leave it stopped (on BookE) and continue with hard interrupts
enabled, which means that we'll potentially get better sample quality from
performance monitor interrupts.

- Timer, decrementer and doorbell interrupts now hard-enable
shortly after removing the source of the interrupt, which means
they no longer run entirely hard disabled. Again, this will improve
perf sample quality.

- On Book3E 64-bit, we now make the performance monitor interrupt
act as an NMI like Book3S (the necessary C code for that to work
appear to already be present in the FSL perf code, notably calling
nmi_enter instead of irq_enter). (This also fixes a bug where BookE
perfmon interrupts could clobber r14 ... oops)

- We could make "masked" decrementer interrupts act as NMIs when doing
timer-based perf sampling to improve the sample quality.

Signed-off-by-yet: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---

v2:

- Add hard-enable to decrementer, timer and doorbells
- Fix CR clobber in masked irq handling on BookE
- Make embedded perf interrupt act as an NMI
- Add a PACA_HAPPENED_EE_EDGE for use by FSL if they want
to retrigger an interrupt without preventing hard-enable

v3:

- Fix or vs. ori bug on Book3E
- Fix enabling of interrupts for some exceptions on Book3E

v4:

- Fix resend of doorbells on return from interrupt on Book3E

v5:

- Rebased on top of my latest series, which involves some significant
rework of some aspects of the patch.

v6:
- 32-bit compile fix
- more compile fixes with various .config combos
- factor out the asm code to soft-disable interrupts
- remove the C wrapper around preempt_schedule_irq

v7:
- Fix a bug with hard irq state tracking on native power7


# d9ada91a 01-Mar-2012 Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc: Replace mfmsr instructions with load from PACA kernel_msr field

On 64-bit, the mfmsr instruction can be quite slow, slower
than loading a field from the cache-hot PACA, which happens
to already contain the value we want in most cases.
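
Sketch of the substitution (PACAKMSR names the PACA kernel_msr field
via asm-offsets; shown for illustration):

	mfmsr	r10			/* before: slow SPR read */
	ld	r10,PACAKMSR(r13)	/* after: cache-hot PACA load */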

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 1b701179 29-Feb-2012 Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc: Improve behaviour of irq tracing on 64-bit exception entry

Some exceptions would unconditionally disable interrupts on entry,
which is fine, but calling lockdep every time not only adds more
overhead than strictly needed, but also means we get quite a few
"redundant" disables logged, which makes it hard to spot the really
bad ones.

So instead, split the macro used by the exception code into a
normal one and a separate one used when CONFIG_TRACE_IRQFLAGS is
enabled, and make the latter skip the tracing if interrupts were
already disabled.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# fe1952fc 29-Feb-2012 Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc: Rework runlatch code

This moves the inlines into system.h and changes the runlatch
code to use the thread local flags (non-atomic) rather than
the TIF flags (atomic) to keep track of the latch state.

The code to turn it back on in an asynchronous interrupt is
now simplified and partially inlined.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 7450f6f0 29-Feb-2012 Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc: Use the same interrupt prolog for perfmon as other interrupts

The perfmon interrupt is the sole user of a special variant of the
interrupt prolog which differs from the one used by external and timer
interrupts in that it saves the non-volatile GPRs and doesn't turn the
runlatch on.

The former is unnecessary and the latter is arguably incorrect, so
let's clean that up by using the same prolog. While at it we rename
that prolog to use the _ASYNC prefix.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 4f8cf36f 27-Feb-2012 Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc: Remove legacy iSeries bits from assembly files

This removes the various bits of assembly in the kernel entry,
exception handling and SLB management code that were specific
to running under the legacy iSeries hypervisor which is no
longer supported.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 9e368f29 28-Jun-2011 Paul Mackerras <paulus@samba.org>

KVM: PPC: book3s_hv: Add support for PPC970-family processors

This adds support for running KVM guests in supervisor mode on those
PPC970 processors that have a usable hypervisor mode. Unfortunately,
Apple G5 machines have supervisor mode disabled (MSR[HV] is forced to
1), but the YDL PowerStation does have a usable hypervisor mode.

There are several differences between the PPC970 and POWER7 in how
guests are managed. These differences are accommodated using the
CPU_FTR_ARCH_201 (PPC970) and CPU_FTR_ARCH_206 (POWER7) CPU feature
bits. Notably, on PPC970:

* The LPCR, LPID or RMOR registers don't exist, and the functions of
those registers are provided by bits in HID4 and one bit in HID0.

* External interrupts can be directed to the hypervisor, but unlike
POWER7 they are masked by MSR[EE] in non-hypervisor modes and use
SRR0/1 not HSRR0/1.

* There is no virtual RMA (VRMA) mode; the guest must use an RMO
(real mode offset) area.

* The TLB entries are not tagged with the LPID, so it is necessary to
flush the whole TLB on partition switch. Furthermore, when switching
partitions we have to ensure that no other CPU is executing the tlbie
or tlbsync instructions in either the old or the new partition,
otherwise undefined behaviour can occur.

* The PMU has 8 counters (PMC registers) rather than 6.

* The DSCR, PURR, SPURR, AMR, AMOR, UAMOR registers don't exist.

* The SLB has 64 entries rather than 32.

* There is no mediated external interrupt facility, so if we switch to
a guest that has a virtual external interrupt pending but the guest
has MSR[EE] = 0, we have to arrange to have an interrupt pending for
it so that we can get control back once it re-enables interrupts. We
do that by sending ourselves an IPI with smp_send_reschedule after
hard-disabling interrupts.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>


# de56a948 28-Jun-2011 Paul Mackerras <paulus@samba.org>

KVM: PPC: Add support for Book3S processors in hypervisor mode

This adds support for KVM running on 64-bit Book 3S processors,
specifically POWER7, in hypervisor mode. Using hypervisor mode means
that the guest can use the processor's supervisor mode. That means
that the guest can execute privileged instructions and access privileged
registers itself without trapping to the host. This gives excellent
performance, but does mean that KVM cannot emulate a processor
architecture other than the one that the hardware implements.

This code assumes that the guest is running paravirtualized using the
PAPR (Power Architecture Platform Requirements) interface, which is the
interface that IBM's PowerVM hypervisor uses. That means that existing
Linux distributions that run on IBM pSeries machines will also run
under KVM without modification. In order to communicate the PAPR
hypercalls to qemu, this adds a new KVM_EXIT_PAPR_HCALL exit code
to include/linux/kvm.h.

Currently the choice between book3s_hv support and book3s_pr support
(i.e. the existing code, which runs the guest in user mode) has to be
made at kernel configuration time, so a given kernel binary can only
do one or the other.

This new book3s_hv code doesn't support MMIO emulation at present.
Since we are running paravirtualized guests, this isn't a serious
restriction.

With the guest running in supervisor mode, most exceptions go straight
to the guest. We will never get data or instruction storage or segment
interrupts, alignment interrupts, decrementer interrupts, program
interrupts, single-step interrupts, etc., coming to the hypervisor from
the guest. Therefore this introduces a new KVMTEST_NONHV macro for the
exception entry path so that we don't have to do the KVM test on entry
to those exception handlers.

We do however get hypervisor decrementer, hypervisor data storage,
hypervisor instruction storage, and hypervisor emulation assist
interrupts, so we have to handle those.

In hypervisor mode, real-mode accesses can access all of RAM, not just
a limited amount. Therefore we put all the guest state in the vcpu.arch
and use the shadow_vcpu in the PACA only for temporary scratch space.
We allocate the vcpu with kzalloc rather than vzalloc, and we don't use
anything in the kvmppc_vcpu_book3s struct, so we don't allocate it.
We don't have a shared page with the guest, but we still need a
kvm_vcpu_arch_shared struct to store the values of various registers,
so we include one in the vcpu_arch struct.

The POWER7 processor has a restriction that all threads in a core have
to be in the same partition. MMU-on kernel code counts as a partition
(partition 0), so we have to do a partition switch on every entry to and
exit from the guest. At present we require the host and guest to run
in single-thread mode because of this hardware restriction.

This code allocates a hashed page table for the guest and initializes
it with HPTEs for the guest's Virtual Real Memory Area (VRMA). We
require that the guest memory is allocated using 16MB huge pages, in
order to simplify the low-level memory management. This also means that
we can get away without tracking paging activity in the host for now,
since huge pages can't be paged or swapped.

This also adds a few new exports needed by the book3s_hv code.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>


# 3c42bf8a 28-Jun-2011 Paul Mackerras <paulus@samba.org>

KVM: PPC: Split host-state fields out of kvmppc_book3s_shadow_vcpu

There are several fields in struct kvmppc_book3s_shadow_vcpu that
temporarily store bits of host state while a guest is running,
rather than anything relating to the particular guest or vcpu.
This splits them out into a new kvmppc_host_state structure and
modifies the definitions in asm-offsets.c to suit.

On 32-bit, we have a kvmppc_host_state structure inside the
kvmppc_book3s_shadow_vcpu since the assembly code needs to be able
to get to them both with one pointer. On 64-bit they are separate
fields in the PACA. This means that on 64-bit we don't need to
copy the kvmppc_host_state in and out on vcpu load/unload, and
in future will mean that the book3s_hv code doesn't need a
shadow_vcpu struct in the PACA at all. That does mean that we
have to be careful not to rely on any values persisting in the
hstate field of the paca across any point where we could block
or get preempted.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>


# b01c8b54 28-Jun-2011 Paul Mackerras <paulus@samba.org>

powerpc, KVM: Rework KVM checks in first-level interrupt handlers

Instead of branching out-of-line with the DO_KVM macro to check if we
are in a KVM guest at the time of an interrupt, this moves the KVM
check inline in the first-level interrupt handlers. This speeds up
the non-KVM case and makes sure that none of the interrupt handlers
are missing the check.

Because the first-level interrupt handlers are now larger, some things
had to be moved out of line in exceptions-64s.S.

This all necessitated some minor changes to the interrupt entry code
in KVM. This also streamlines the book3s_32 KVM test.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>


# 48404f2e 01-May-2011 Paul Mackerras <paulus@samba.org>

powerpc: Save Come-From Address Register (CFAR) in exception frame

Recent 64-bit server processors (POWER6 and POWER7) have a "Come-From
Address Register" (CFAR), that records the address of the most recent
branch or rfid (return from interrupt) instruction for debugging purposes.

This saves the value of the CFAR in the exception entry code and stores
it in the exception frame. We also make xmon print the CFAR value in
its register dump code.

Rather than extend the pt_regs struct at this time, we steal the orig_gpr3
field, which is only used for system calls, and use it for the CFAR value
for all exceptions/interrupts other than system calls. This means we
don't save the CFAR on system calls, which is not a great problem since
system calls tend not to happen unexpectedly, and also avoids adding the
overhead of reading the CFAR to the system call entry path.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 1977b502 01-May-2011 Paul Mackerras <paulus@samba.org>

powerpc: Save register r9-r13 values accurately on interrupt with bad stack

When we take an interrupt or exception from kernel mode and the stack
pointer is obviously not a kernel address (i.e. the top bit is 0), we
switch to an emergency stack, save register values and panic. However,
on 64-bit server machines, we don't actually save the values of r9 - r13
at the time of the interrupt, but rather values corrupted by the
exception entry code for r12-r13, and nothing at all for r9-r11.

This fixes it by passing a pointer to the register save area in the paca
through to the bad_stack code in r3. The register values are saved in
one of the paca register save areas (depending on which exception this
is). Using the pointer in r3, the bad_stack code now retrieves the
saved values of r9 - r13 and stores them in the exception frame on the
emergency stack. This also stores the normal exception frame marker
("regshere") in the exception frame.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 673b189a 04-Apr-2011 Paul Mackerras <paulus@samba.org>

powerpc: Always use SPRN_SPRG_HSCRATCH0 when running in HV mode

This uses feature sections to arrange that we always use HSPRG1
as the scratch register in the interrupt entry code rather than
SPRG2 when we're running in hypervisor mode on POWER7. This will
ensure that we don't trash the guest's SPRG2 when we are running
KVM guests. To simplify the code, we define GET_SCRATCH0() and
SET_SCRATCH0() macros like the GET_PACA/SET_PACA macros.
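
A hedged sketch of the macro shape (feature-section details
simplified):

	#define SET_SCRATCH0(r)					\
	BEGIN_FTR_SECTION					\
		mtspr	SPRN_SPRG_HSCRATCH0,r;			\
	FTR_SECTION_ELSE					\
		mtspr	SPRN_SPRG_SCRATCH0,r;			\
	ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE_206)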

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# b3e6b5df 04-Apr-2011 Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc: More work to support HV exceptions

Rework exception macros a bit to split offset from vector and add
some basic support for HDEC, HDSI, HISI and a few more.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# a5d4f3ad 04-Apr-2011 Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc: Base support for exceptions using HSRR0/1

Pass the register type to the prolog, provide an alternate "HV"
version of the hardware interrupt (0x500) and adjust LPES accordingly.

We tag those interrupts by setting bit 0x2 in the trap number.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 2dd60d79 19-Jan-2011 Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc: In HV mode, use HSPRG0 for PACA

When running in Hypervisor mode (arch 2.06 or later), we store the PACA
in HSPRG0 instead of SPRG1. The architecture specifies that SPRGs may be
lost during a "nap" power management operation (though they aren't
currently lost on POWER7), and this enables use of SPRG1 by KVM guests.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# cf9efce0 26-Aug-2010 Paul Mackerras <paulus@samba.org>

powerpc: Account time using timebase rather than PURR

Currently, when CONFIG_VIRT_CPU_ACCOUNTING is enabled, we use the
PURR register for measuring the user and system time used by
processes, as well as other related times such as hardirq and
softirq times. This turns out to be quite confusing for users
because it means that a program will often be measured as taking
less time when run on a multi-threaded processor (SMT2 or SMT4 mode)
than it does when run on a single-threaded processor (ST mode), even
though the program takes longer to finish. The discrepancy is
accounted for as stolen time, which is also confusing, particularly
when there are no other partitions running.

This changes the accounting to use the timebase instead, meaning that
the reported user and system times are the actual number of real-time
seconds that the program was executing on the processor thread,
regardless of which SMT mode the processor is in. Thus a program will
generally show greater user and system times when run on a
multi-threaded processor than on a single-threaded processor.

On pSeries systems on POWER5 or later processors, we measure the
stolen time (time when this partition wasn't running) using the
hypervisor dispatch trace log. We check for new entries in the
log on every entry from user mode and on every transition from
kernel process context to soft or hard IRQ context (i.e. when
account_system_vtime() gets called). So that we can correctly
distinguish time stolen from user time and time stolen from system
time, without having to check the log on every exit to user mode,
we store separate timestamps for exit to user mode and entry from
user mode.

On systems that have a SPURR (POWER6 and POWER7), we read the SPURR
in account_system_vtime() (as before), and then apportion the SPURR
ticks since the last time we read it between scaled user time and
scaled system time according to the relative proportions of user
time and system time over the same interval. This avoids having to
read the SPURR on every kernel entry and exit. On systems that have
PURR but not SPURR (i.e., POWER5), we do the same using the PURR
rather than the SPURR.
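
A sketch of the apportioning arithmetic (illustrative only, not the
kernel's exact code):

	/* split a SPURR delta in proportion to raw user vs system time */
	static unsigned long long scaled_user(unsigned long long spurr_delta,
					      unsigned long long utime,
					      unsigned long long stime)
	{
		if (utime + stime == 0)
			return 0;
		return spurr_delta * utime / (utime + stime);
	}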

This disables the DTL user interface in /sys/kernel/debug/powerpc/dtl
for now since it conflicts with the use of the dispatch trace log
by the time accounting code.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 842f2fed 29-Oct-2009 Alexander Graf <agraf@suse.de>

Make head_64.S aware of KVM real mode code

We need to run some KVM trampoline code in real mode. Unfortunately, real mode
only covers 8MB on Cell so we need to squeeze ourselves as low as possible.

Also, we need to trap interrupts to get us back from guest state to host state
without telling Linux about it.

This patch adds interrupt traps and includes the KVM code that requires real
mode in the real mode parts of Linux.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# c5a8c0c9 16-Jul-2009 Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc: Remove use of a second scratch SPRG in STAB code

The STAB code used on Power3 and RS/64 uses a second scratch SPRG to
save a GPR in order to decide whether to go to do_stab_bolted_* or
to handle a normal data access exception.

This prevents our scheme of freeing SPRG3, which is user visible, for
userspace use, since we cannot use SPRG0, which on RS/64 seems to be
read-only in supervisor mode (as on POWER4).

This reworks the STAB exception entry to use the PACA as temporary
storage instead.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# ee43eb78 14-Jul-2009 Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc: Use names rather than numbers for SPRGs (v2)

The kernel uses SPRG registers for various purposes, typically in
low level assembly code as scratch registers or to hold per-cpu
global infos such as the PACA or the current thread_info pointer.

We want to be able to easily shuffle the usage of those registers
as some implementations have specific constraints related to some of
them (for example, some have userspace-readable aliases), and the
current choice isn't always the best.

This patch should not change any code generation, and replaces the
usage of SPRN_SPRGn everywhere in the kernel with a named replacement
and adds documentation next to the definition of the names as to
what those are used for on each processor family.

The only parts that still use the original numbers are bits of KVM
or suspend/resume code that just blindly need to save/restore all
the SPRGs.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 8aa34ab8 14-Jul-2009 Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc: Rename exception.h to exception-64s.h

The file include/asm/exception.h contains definitions
that are specific to exception handling on 64-bit server
type processors.

This renames the file to exception-64s.h to reflect that
fact and avoid confusion.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>