History log of /linux-master/arch/powerpc/kernel/traps.c
Revision Date Author Comments
# 5352090a 18-May-2022 Daniel Axtens <dja@axtens.net>

powerpc/kasan: Don't instrument non-maskable or raw interrupts

Disable address sanitization for raw and non-maskable interrupt
handlers, because they can run in real mode, where we cannot access
the shadow memory. (Note that kasan_arch_is_ready() doesn't test for
real mode, since it is a static branch for speed, and in any case not
all the entry points to the generic KASAN code are protected by
kasan_arch_is_ready guards.)

The changes to interrupt_nmi_enter/exit_prepare() look larger than
they actually are. The changes are equivalent to adding
!IS_ENABLED(CONFIG_KASAN) to the conditions for calling nmi_enter() or
nmi_exit() in real mode. That is, the code is equivalent to using the
following condition for calling nmi_enter/exit:

if (((!IS_ENABLED(CONFIG_PPC_BOOK3S_64) ||
!firmware_has_feature(FW_FEATURE_LPAR) ||
radix_enabled()) &&
!IS_ENABLED(CONFIG_KASAN) ||
(mfmsr() & MSR_DR))

That unwieldy condition has been split into several statements with
comments, for easier reading.

The nmi_ipi_lock functions that call atomic functions (i.e.,
nmi_ipi_lock_start(), nmi_ipi_lock() and nmi_ipi_unlock()), besides
being marked noinstr, now call arch_atomic_* functions instead of
atomic_* functions because with KASAN enabled, the atomic_* functions
are wrappers which explicitly do address sanitization on their
arguments. Since we are trying to avoid address sanitization, we have
to use the lower-level arch_atomic_* versions.

In hv_nmi_check_nonrecoverable(), the regs_set_unrecoverable() call
has been open-coded so as to avoid having to either trust the inlining
or mark regs_set_unrecoverable() as noinstr.

[paulus@ozlabs.org: combined a few work-in-progress commits of
Daniel's and wrote the commit message.]

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/YoTFGaKM8Pd46PIK@cleo


# 0e25498f 28-Jun-2021 Eric W. Biederman <ebiederm@xmission.com>

exit: Add and use make_task_dead.

There are two big uses of do_exit. The first is it's design use to be
the guts of the exit(2) system call. The second use is to terminate
a task after something catastrophic has happened like a NULL pointer
in kernel code.

Add a function make_task_dead that is initialy exactly the same as
do_exit to cover the cases where do_exit is called to handle
catastrophic failure. In time this can probably be reduced to just a
light wrapper around do_task_dead. For now keep it exactly the same so
that there will be no behavioral differences introducing this new
concept.

Replace all of the uses of do_exit that use it for catastraphic
task cleanup with make_task_dead to make it clear what the code
is doing.

As part of this rename rewind_stack_do_exit
rewind_stack_and_make_dead.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>


# f08fb25b 04-Oct-2021 Nicholas Piggin <npiggin@gmail.com>

powerpc/64s: Fix unrecoverable MCE calling async handler from NMI

The machine check handler is not considered NMI on 64s. The early
handler is the true NMI handler, and then it schedules the
machine_check_exception handler to run when interrupts are enabled.

This works fine except the case of an unrecoverable MCE, where the true
NMI is taken when MSR[RI] is clear, it can not recover, so it calls
machine_check_exception directly so something might be done about it.

Calling an async handler from NMI context can result in irq state and
other things getting corrupted. This can also trigger the BUG at
arch/powerpc/include/asm/interrupt.h:168
BUG_ON(!arch_irq_disabled_regs(regs) && !(regs->msr & MSR_EE));

Fix this by making an _async version of the handler which is called
in the normal case, and a NMI version that is called for unrecoverable
interrupts.

Fixes: 2b43dd7653cc ("powerpc/64: enable MSR[EE] in irq replay pt_regs")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211004145642.1331214-6-npiggin@gmail.com


# d0afd44c 04-Oct-2021 Nicholas Piggin <npiggin@gmail.com>

powerpc/traps: do not enable irqs in _exception

_exception can be called by machine check handlers when the MCE hits
user code (e.g., pseries and powernv). This will enable local irqs
because, which is a dicey thing to do in NMI or hard irq context.

This seemed to worked out okay because a userspace MCE can basically be
treated like a synchronous interrupt (after async / imprecise MCEs are
filtered out). Since NMI and hard irq handlers have started growing
nmi_enter / irq_enter, and more irq state sanity checks, this has
started to cause problems (or at least trigger warnings).

The Fixes tag to the commit which introduced this rather than try to
work out exactly which commit was the first that could possibly cause a
problem because that may be difficult to prove.

Fixes: 9f2f79e3a3c1 ("powerpc: Disable interrupts in 64-bit kernel FP and vector faults")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211004145642.1331214-3-npiggin@gmail.com


# 8b097881 07-Sep-2021 Kefeng Wang <wangkefeng.wang@huawei.com>

trap: cleanup trap_init()

There are some empty trap_init() definitions in different ARCHs, Introduce
a new weak trap_init() function to clean them up.

Link: https://lkml.kernel.org/r/20210812123602.76356-1-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> [arm32]
Acked-by: Vineet Gupta [arc]
Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc]
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Stafford Horne <shorne@gmail.com>
Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <palmerdabbelt@google.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 806c0e6e 23-Aug-2021 Christophe Leroy <christophe.leroy@csgroup.eu>

powerpc: Refactor verification of MSR_RI

40x and BOOKE don't have MSR_RI therefore all tests involving
MSR_RI may be problematic on those plateforms.

Create helpers to check or set MSR_RI in regs, and use them
in common code.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c2fb93708196734f4176dda334aaa3055f213b89.1629707037.git.christophe.leroy@csgroup.eu


# 4f8e78c0 06-Aug-2021 Xiongwei Song <sxwjean@gmail.com>

powerpc: Add esr as a synonym for pt_regs.dsisr

Create an anonymous union for dsisr and esr regsiters, we can reference
esr to get the exception detail when CONFIG_4xx=y or CONFIG_BOOKE=y.
Otherwise, reference dsisr. This makes code more clear.

Signed-off-by: Xiongwei Song <sxwjean@gmail.com>
[mpe: Reword commit title]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210807010239.416055-2-sxwjean@me.com


# 1e688dd2 13-Apr-2021 Christophe Leroy <christophe.leroy@csgroup.eu>

powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto

Using asm goto in __WARN_FLAGS() and WARN_ON() allows more
flexibility to GCC.

For that add an entry to the exception table so that
program_check_exception() knowns where to resume execution
after a WARNING.

Here are two exemples. The first one is done on PPC32 (which
benefits from the previous patch), the second is on PPC64.

unsigned long test(struct pt_regs *regs)
{
int ret;

WARN_ON(regs->msr & MSR_PR);

return regs->gpr[3];
}

unsigned long test9w(unsigned long a, unsigned long b)
{
if (WARN_ON(!b))
return 0;
return a / b;
}

Before the patch:

000003a8 <test>:
3a8: 81 23 00 84 lwz r9,132(r3)
3ac: 71 29 40 00 andi. r9,r9,16384
3b0: 40 82 00 0c bne 3bc <test+0x14>
3b4: 80 63 00 0c lwz r3,12(r3)
3b8: 4e 80 00 20 blr

3bc: 0f e0 00 00 twui r0,0
3c0: 80 63 00 0c lwz r3,12(r3)
3c4: 4e 80 00 20 blr

0000000000000bf0 <.test9w>:
bf0: 7c 89 00 74 cntlzd r9,r4
bf4: 79 29 d1 82 rldicl r9,r9,58,6
bf8: 0b 09 00 00 tdnei r9,0
bfc: 2c 24 00 00 cmpdi r4,0
c00: 41 82 00 0c beq c0c <.test9w+0x1c>
c04: 7c 63 23 92 divdu r3,r3,r4
c08: 4e 80 00 20 blr

c0c: 38 60 00 00 li r3,0
c10: 4e 80 00 20 blr

After the patch:

000003a8 <test>:
3a8: 81 23 00 84 lwz r9,132(r3)
3ac: 71 29 40 00 andi. r9,r9,16384
3b0: 40 82 00 0c bne 3bc <test+0x14>
3b4: 80 63 00 0c lwz r3,12(r3)
3b8: 4e 80 00 20 blr

3bc: 0f e0 00 00 twui r0,0

0000000000000c50 <.test9w>:
c50: 7c 89 00 74 cntlzd r9,r4
c54: 79 29 d1 82 rldicl r9,r9,58,6
c58: 0b 09 00 00 tdnei r9,0
c5c: 7c 63 23 92 divdu r3,r3,r4
c60: 4e 80 00 20 blr

c70: 38 60 00 00 li r3,0
c74: 4e 80 00 20 blr

In the first exemple, we see GCC doesn't need to duplicate what
happens after the trap.

In the second exemple, we see that GCC doesn't need to emit a test
and a branch in the likely path in addition to the trap.

We've got some WARN_ON() in .softirqentry.text section so it needs
to be added in the OTHER_TEXT_SECTIONS in modpost.c

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/389962b1b702e3c78d169e59bcfac56282889173.1618331882.git.christophe.leroy@csgroup.eu


# dbf77fed 12-Aug-2021 Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>

powerpc: rename powerpc_debugfs_root to arch_debugfs_dir

No functional change in this patch. arch_debugfs_dir is the generic kernel
name declared in linux/debugfs.h for arch-specific debugfs directory.
Architectures like x86/s390 already use the name. Rename powerpc
specific powerpc_debugfs_root to arch_debugfs_dir.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210812132831.233794-2-aneesh.kumar@linux.ibm.com


# 93d102f0 15-Jul-2021 John Ogness <john.ogness@linutronix.de>

printk: remove safe buffers

With @logbuf_lock removed, the high level printk functions for
storing messages are lockless. Messages can be stored from any
context, so there is no need for the NMI and safe buffers anymore.
Remove the NMI and safe buffers.

Although the safe buffers are removed, the NMI and safe context
tracking is still in place. In these contexts, store the message
immediately but still use irq_work to defer the console printing.

Since printk recursion tracking is in place, safe context tracking
for most of printk is not needed. Remove it. Only safe context
tracking relating to the console and console_owner locks is left
in place. This is because the console and console_owner locks are
needed for the actual printing.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20210715193359.25946-4-john.ogness@linutronix.de


# 01fcac8e 10-Aug-2021 Christophe Leroy <christophe.leroy@csgroup.eu>

powerpc/interrupt: Do not call single_step_exception() from other exceptions

single_step_exception() is called by emulate_single_step() which
is called from (at least) alignment exception() handler and
program_check_exception() handler.

Redefine it as a regular __single_step_exception() which is called
by both single_step_exception() handler and emulate_single_step()
function.

Fixes: 3a96570ffceb ("powerpc: convert interrupt handlers to use wrappers")
Cc: stable@vger.kernel.org # v5.12+
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/aed174f5cbc06f2cf95233c071d8aac948e46043.1628611921.git.christophe.leroy@csgroup.eu


# 59dc5bfc 17-Jun-2021 Nicholas Piggin <npiggin@gmail.com>

powerpc/64s: avoid reloading (H)SRR registers if they are still valid

When an interrupt is taken, the SRR registers are set to return to where
it left off. Unless they are modified in the meantime, or the return
address or MSR are modified, there is no need to reload these registers
when returning from interrupt.

Introduce per-CPU flags that track the validity of SRR and HSRR
registers. These are cleared when returning from interrupt, when
using the registers for something else (e.g., OPAL calls), when
adjusting the return address or MSR of a context, and when context
switching (which changes the return address and MSR).

This improves the performance of interrupt returns.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Fold in fixup patch from Nick]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210617155116.2167984-5-npiggin@gmail.com


# deefd0ae 20-May-2021 Christophe Leroy <christophe.leroy@csgroup.eu>

powerpc/traps: Start using PPC_RAW_xx() macros

Start using PPC_RAW_xx() macros where relevant.

PPC_INST_SYNC is used to both represent the 'sync' instruction and
the family of synchronisation instructions. Keep it for the later,
maybe we'll change the name in the future to avoid confusion.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/0945c155d6cb113431185fc1296ac127359fe29b.1621506159.git.christophe.leroy@csgroup.eu


# 7153d4bf 14-Apr-2021 Xiongwei Song <sxwjean@gmail.com>

powerpc/traps: Enhance readability for trap types

Define macros to list ppc interrupt types in interttupt.h, replace the
reference of the trap hex values with these macros.

Referred the hex numbers in arch/powerpc/kernel/exceptions-64e.S,
arch/powerpc/kernel/exceptions-64s.S, arch/powerpc/kernel/head_*.S,
arch/powerpc/kernel/head_booke.h and arch/powerpc/include/asm/kvm_asm.h.

Signed-off-by: Xiongwei Song <sxwjean@gmail.com>
[mpe: Resolve conflicts in nmi_disables_ftrace(), fix 40x build]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1618398033-13025-1-git-send-email-sxwjean@me.com


# 8dc7f022 16-Mar-2021 Nicholas Piggin <npiggin@gmail.com>

powerpc: remove partial register save logic

All subarchitectures always save all GPRs to pt_regs interrupt frames
now. Remove FULL_REGS and associated bits.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210316104206.407354-11-npiggin@gmail.com


# 3db8aa10 16-Mar-2021 Nicholas Piggin <npiggin@gmail.com>

powerpc/64e/interrupt: NMI save irq soft-mask state in C

64e non-maskable interrupts save the state of the irq soft-mask in
asm. This can be done in C in interrupt wrappers as 64s does.

I haven't been able to test this with qemu because it doesn't seem
to cause FSL bookE WDT interrupts.

This makes WatchdogException an NMI interrupt, which affects 32-bit
as well (okay, or create a new handler?)

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210316104206.407354-6-npiggin@gmail.com


# bad956b8 10-Mar-2021 Christophe Leroy <christophe.leroy@csgroup.eu>

powerpc/uaccess: Remove __get/put_user_inatomic()

Powerpc is the only architecture having _inatomic variants of
__get_user() and __put_user() accessors. They were introduced
by commit e68c825bb016 ("[POWERPC] Add inatomic versions of __get_user
and __put_user").

Those variants expand to the _nosleep macros instead of expanding
to the _nocheck macros. The only difference between the _nocheck
and the _nosleep macros is the call to might_fault().

Since commit 662bbcb2747c ("mm, sched: Allow uaccess in atomic with
pagefault_disable()"), __get/put_user() can be used in atomic parts
of the code, therefore __get/put_user_inatomic() have become useless.

Remove __get_user_inatomic() and __put_user_inatomic().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1e5c895669e8d54a7810b62dc61eb111f33c2c37.1615398265.git.christophe.leroy@csgroup.eu


# e448e1e7 14-Mar-2021 Christophe Leroy <christophe.leroy@csgroup.eu>

powerpc/math: Fix missing __user qualifier for get_user() and other sparse warnings

Sparse reports the following problems:

arch/powerpc/math-emu/math.c:228:21: warning: Using plain integer as NULL pointer
arch/powerpc/math-emu/math.c:228:31: warning: Using plain integer as NULL pointer
arch/powerpc/math-emu/math.c:228:41: warning: Using plain integer as NULL pointer
arch/powerpc/math-emu/math.c:228:51: warning: Using plain integer as NULL pointer
arch/powerpc/math-emu/math.c:237:13: warning: incorrect type in initializer (different address spaces)
arch/powerpc/math-emu/math.c:237:13: expected unsigned int [noderef] __user *_gu_addr
arch/powerpc/math-emu/math.c:237:13: got unsigned int [usertype] *
arch/powerpc/math-emu/math.c:226:1: warning: symbol 'do_mathemu' was not declared. Should it be static?

Add missing __user qualifier when casting pointer used in get_user()

Use NULL instead of 0 to initialise opX local variables.

Add a prototype for do_mathemu() (Added in processor.h like sparc)

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e4d1aae7604d89c98a52dfd8ce8443462e595670.1615809591.git.christophe.leroy@csgroup.eu


# 57472306 11-Mar-2021 Christophe Leroy <christophe.leroy@csgroup.eu>

powerpc/32: Remove ksp_limit

ksp_limit is there to help detect stack overflows.
That is specific to ppc32 as it was removed from ppc64 in
commit cbc9565ee826 ("powerpc: Remove ksp_limit on ppc64").

There are other means for detecting stack overflows.

As ppc64 has proven to not need it, ppc32 should be able to do
without it too.

Lets remove it and simplify exception handling.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d789c3385b22e07bedc997613c0d26074cb513e7.1615552866.git.christophe.leroy@csgroup.eu


# a58cbed6 11-Mar-2021 Christophe Leroy <christophe.leroy@csgroup.eu>

powerpc/traps: Declare unrecoverable_exception() as __noreturn

unrecoverable_exception() is never expected to return, most callers
have an infiniteloop in case it returns.

Ensure it really never returns by terminating it with a BUG(), and
declare it __no_return.

It always GCC to really simplify functions calling it. In the exemple
below, it avoids the stack frame in the likely fast path and avoids
code duplication for the exit.

With this patch:

00000348 <interrupt_exit_kernel_prepare>:
348: 81 43 00 84 lwz r10,132(r3)
34c: 71 48 00 02 andi. r8,r10,2
350: 41 82 00 2c beq 37c <interrupt_exit_kernel_prepare+0x34>
354: 71 4a 40 00 andi. r10,r10,16384
358: 40 82 00 20 bne 378 <interrupt_exit_kernel_prepare+0x30>
35c: 80 62 00 70 lwz r3,112(r2)
360: 74 63 00 01 andis. r3,r3,1
364: 40 82 00 28 bne 38c <interrupt_exit_kernel_prepare+0x44>
368: 7d 40 00 a6 mfmsr r10
36c: 7c 11 13 a6 mtspr 81,r0
370: 7c 12 13 a6 mtspr 82,r0
374: 4e 80 00 20 blr
378: 48 00 00 00 b 378 <interrupt_exit_kernel_prepare+0x30>
37c: 94 21 ff f0 stwu r1,-16(r1)
380: 7c 08 02 a6 mflr r0
384: 90 01 00 14 stw r0,20(r1)
388: 48 00 00 01 bl 388 <interrupt_exit_kernel_prepare+0x40>
388: R_PPC_REL24 unrecoverable_exception
38c: 38 e2 00 70 addi r7,r2,112
390: 3d 00 00 01 lis r8,1
394: 7c c0 38 28 lwarx r6,0,r7
398: 7c c6 40 78 andc r6,r6,r8
39c: 7c c0 39 2d stwcx. r6,0,r7
3a0: 40 a2 ff f4 bne 394 <interrupt_exit_kernel_prepare+0x4c>
3a4: 38 60 00 01 li r3,1
3a8: 4b ff ff c0 b 368 <interrupt_exit_kernel_prepare+0x20>

Without this patch:

00000348 <interrupt_exit_kernel_prepare>:
348: 94 21 ff f0 stwu r1,-16(r1)
34c: 93 e1 00 0c stw r31,12(r1)
350: 7c 7f 1b 78 mr r31,r3
354: 81 23 00 84 lwz r9,132(r3)
358: 71 2a 00 02 andi. r10,r9,2
35c: 41 82 00 34 beq 390 <interrupt_exit_kernel_prepare+0x48>
360: 71 29 40 00 andi. r9,r9,16384
364: 40 82 00 28 bne 38c <interrupt_exit_kernel_prepare+0x44>
368: 80 62 00 70 lwz r3,112(r2)
36c: 74 63 00 01 andis. r3,r3,1
370: 40 82 00 3c bne 3ac <interrupt_exit_kernel_prepare+0x64>
374: 7d 20 00 a6 mfmsr r9
378: 7c 11 13 a6 mtspr 81,r0
37c: 7c 12 13 a6 mtspr 82,r0
380: 83 e1 00 0c lwz r31,12(r1)
384: 38 21 00 10 addi r1,r1,16
388: 4e 80 00 20 blr
38c: 48 00 00 00 b 38c <interrupt_exit_kernel_prepare+0x44>
390: 7c 08 02 a6 mflr r0
394: 90 01 00 14 stw r0,20(r1)
398: 48 00 00 01 bl 398 <interrupt_exit_kernel_prepare+0x50>
398: R_PPC_REL24 unrecoverable_exception
39c: 80 01 00 14 lwz r0,20(r1)
3a0: 81 3f 00 84 lwz r9,132(r31)
3a4: 7c 08 03 a6 mtlr r0
3a8: 4b ff ff b8 b 360 <interrupt_exit_kernel_prepare+0x18>
3ac: 39 02 00 70 addi r8,r2,112
3b0: 3d 40 00 01 lis r10,1
3b4: 7c e0 40 28 lwarx r7,0,r8
3b8: 7c e7 50 78 andc r7,r7,r10
3bc: 7c e0 41 2d stwcx. r7,0,r8
3c0: 40 a2 ff f4 bne 3b4 <interrupt_exit_kernel_prepare+0x6c>
3c4: 38 60 00 01 li r3,1
3c8: 4b ff ff ac b 374 <interrupt_exit_kernel_prepare+0x2c>

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1e883e9d93fdb256853d1434c8ad77c257349b2d.1615552866.git.christophe.leroy@csgroup.eu


# 1a0e4550 03-Mar-2021 Zhang Yunkai <zhang.yunkai@zte.com.cn>

powerpc: Remove duplicate includes

asm/tm.h included in traps.c is duplicated. It is also included on
the 62nd line.

asm/udbg.h included in setup-common.c is duplicated. It is also
included on the 61st line.

asm/bug.h included in arch/powerpc/include/asm/book3s/64/mmu-hash.h
is duplicated. It is also included on the 12th line.

asm/tlbflush.h included in arch/powerpc/include/asm/pgtable.h is
duplicated. It is also included on the 11th line.

asm/page.h included in arch/powerpc/include/asm/thread_info.h is
duplicated. It is also included on the 13th line.

Signed-off-by: Zhang Yunkai <zhang.yunkai@zte.com.cn>
[mpe: Squash together from multiple commits]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 5c4a4802 24-Feb-2021 Bhaskar Chowdhury <unixbhaskar@gmail.com>

powerpc: Fix spelling of "droping" to "dropping" in traps.c

s/droping/dropping/

Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210224075547.763063-1-unixbhaskar@gmail.com


# 0b736881 08-Mar-2021 Christophe Leroy <christophe.leroy@csgroup.eu>

powerpc/traps: unrecoverable_exception() is not an interrupt handler

unrecoverable_exception() is called from interrupt handlers or
after an interrupt handler has failed.

Make it a standard function to avoid doubling the actions
performed on interrupt entry (e.g.: user time accounting).

Fixes: 3a96570ffceb ("powerpc: convert interrupt handlers to use wrappers")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ae96c59fa2cb7f24a8929c58cfa2c909cb8ff1f1.1615291471.git.christophe.leroy@csgroup.eu


# e4bb64c7 10-Feb-2021 Nicholas Piggin <npiggin@gmail.com>

powerpc: remove interrupt handler functions from the noinstr section

The allyesconfig ppc64 kernel fails to link with relocations unable to
fit after commit 3a96570ffceb ("powerpc: convert interrupt handlers to
use wrappers"), which is due to the interrupt handler functions being
put into the .noinstr.text section, which the linker script places on
the opposite side of the main .text section from the interrupt entry
asm code which calls the handlers.

This results in a lot of linker stubs that overwhelm the 252-byte sized
space we allow for them, or in the case of BE a .opd relocation link
error for some reason.

It's not required to put interrupt handlers in the .noinstr section,
previously they used NOKPROBE_SYMBOL, so take them out and replace
with a NOKPROBE_SYMBOL in the wrapper macro. Remove the explicit
NOKPROBE_SYMBOL macros in the interrupt handler functions. This makes
a number of interrupt handlers nokprobe that were not prior to the
interrupt wrappers commit, but since that commit they were made
nokprobe due to being in .noinstr.text, so this fix does not change
that.

The fixes tag is different to the commit that first exposes the problem
because it is where the wrapper macros were introduced.

Fixes: 8d41fc618ab8 ("powerpc: interrupt handler wrapper functions")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Slightly fix up comment wording]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210211063636.236420-1-npiggin@gmail.com


# 118178e6 30-Jan-2021 Nicholas Piggin <npiggin@gmail.com>

powerpc: move NMI entry/exit code into wrapper

This moves the common NMI entry and exit code into the interrupt handler
wrappers.

This changes the behaviour of soft-NMI (watchdog) and HMI interrupts, and
also MCE interrupts on 64e, by adding missing parts of the NMI entry to
them.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210130130852.2952424-40-npiggin@gmail.com


# 1b1b6a6f 30-Jan-2021 Nicholas Piggin <npiggin@gmail.com>

powerpc: handle irq_enter/irq_exit in interrupt handler wrappers

Move irq_enter/irq_exit into asynchronous interrupt handler wrappers.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210130130852.2952424-35-npiggin@gmail.com


# 540d4d34 30-Jan-2021 Nicholas Piggin <npiggin@gmail.com>

powerpc/64: context tracking move to interrupt wrappers

This moves exception_enter/exit calls to wrapper functions for
synchronous interrupts. More interrupt handlers are covered by
this than previously.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210130130852.2952424-33-npiggin@gmail.com


# e6f8a6c8 30-Jan-2021 Nicholas Piggin <npiggin@gmail.com>

powerpc: add interrupt_cond_local_irq_enable helper

Simple helper for synchronous interrupt handlers (i.e., process-context)
to enable interrupts if it was taken in an interrupts-enabled context.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210130130852.2952424-30-npiggin@gmail.com


# 3a96570f 30-Jan-2021 Nicholas Piggin <npiggin@gmail.com>

powerpc: convert interrupt handlers to use wrappers

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210130130852.2952424-29-npiggin@gmail.com


# fd3f1e0f 07-Feb-2021 Nicholas Piggin <npiggin@gmail.com>

powerpc/traps: factor common code from program check and emulation assist

Move the program check handling into a function called by both, rather
than have the emulation assist handler call the program check handler.

This allows each of these handlers to be implemented with "interrupt
wrappers" in a later change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1612702475.d6qyt6qtfy.astroid@bobo.none


# 11cb0a25 30-Jan-2021 Nicholas Piggin <npiggin@gmail.com>

powerpc: improve handling of unrecoverable system reset

If an unrecoverable system reset hits in process context, the system
does not have to panic. Similar to machine check, call nmi_exit()
before die().

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210130130852.2952424-26-npiggin@gmail.com


# c538938f 30-Jan-2021 Nicholas Piggin <npiggin@gmail.com>

powerpc/mce: ensure machine check handler always tests RI

A machine check that is handled must still check MSR[RI] for
recoverability of the interrupted context. Without this patch
it's possible for a handled machine check to return to a
context where it has clobbered live registers.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210130130852.2952424-25-npiggin@gmail.com


# 209e9d50 30-Jan-2021 Nicholas Piggin <npiggin@gmail.com>

powerpc: introduce die_mce

As explained by commit daf00ae71dad ("powerpc/traps: restore
recoverability of machine_check interrupts"), die() can't be called from
within nmi_enter to nicely kill a process context that was interrupted.
nmi_exit must be called first.

This adds a function die_mce which takes care of this for machine check
handlers.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210130130852.2952424-24-npiggin@gmail.com


# 6c6aee00 30-Jan-2021 Nicholas Piggin <npiggin@gmail.com>

powerpc: add and use unknown_async_exception

This is currently the same as unknown_exception, but it will diverge
after interrupt wrappers are added and code moved out of asm into the
wrappers (e.g., async handlers will check FINISH_NAP).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210130130852.2952424-22-npiggin@gmail.com


# 156b5371 30-Jan-2021 Nicholas Piggin <npiggin@gmail.com>

powerpc/perf: move perf irq/nmi handling details into traps.c

This is required in order to allow more significant differences between
NMI type interrupt handlers and regular asynchronous handlers.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210130130852.2952424-20-npiggin@gmail.com


# 3a313883 30-Jan-2021 Nicholas Piggin <npiggin@gmail.com>

powerpc/traps: add NOKPROBE_SYMBOL for sreset and mce

These NMIs could fire any time including inside kprobe code, so
exclude them from kprobes.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210130130852.2952424-19-npiggin@gmail.com


# 8458c628 30-Jan-2021 Nicholas Piggin <npiggin@gmail.com>

powerpc: bad_page_fault get registers from regs

Similar to the previous patch this makes interrupt handler function
types more regular so they can be wrapped with the next patch.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210130130852.2952424-12-npiggin@gmail.com


# 755d6641 30-Jan-2021 Nicholas Piggin <npiggin@gmail.com>

powerpc: DebugException remove args

Like other interrupt handler conversions, switch to getting registers
from the pt_regs argument.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210130130852.2952424-10-npiggin@gmail.com


# b4ced803 30-Jan-2021 Nicholas Piggin <npiggin@gmail.com>

powerpc/fsl_booke/32: CacheLockingException remove args

Like other interrupt handler conversions, switch to getting registers
from the pt_regs argument.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210130130852.2952424-8-npiggin@gmail.com


# 39c8bf2b 16-Nov-2020 Christophe Leroy <christophe.leroy@csgroup.eu>

powerpc: Retire e200 core (mpc555x processor)

There is no defconfig selecting CONFIG_E200, and no platform.

e200 is an earlier version of booke, a predecessor of e500,
with some particularities like an unified cache instead of both an
instruction cache and a data cache.

Remove it.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Scott Wood <oss@buserror.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/34ebc3ba2c768d97f363bd5f2deea2356e9ae127.1605589460.git.christophe.leroy@csgroup.eu


# 48a8ab4e 26-Nov-2020 Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>

powerpc/book3s64/pkeys: Don't update SPRN_AMR when in kernel mode.

Now that kernel correctly store/restore userspace AMR/IAMR values, avoid
manipulating AMR and IAMR from the kernel on behalf of userspace.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201127044424.40686-15-aneesh.kumar@linux.ibm.com


# b6254ced 18-Aug-2020 Christophe Leroy <christophe.leroy@csgroup.eu>

powerpc/signal: Don't manage floating point regs when no FPU

There is no point in copying floating point regs when there
is no FPU and MATH_EMULATION is not selected.

Create a new CONFIG_PPC_FPU_REGS bool that is selected by
CONFIG_MATH_EMULATION and CONFIG_PPC_FPU, and use it to
opt out everything related to fp_state in thread_struct.

The asm const used only by fpu.S are opted out with CONFIG_PPC_FPU
as fpu.S build is conditionnal to CONFIG_PPC_FPU.

The following app spends approx 8.1 seconds system time on an 8xx
without the patch, and 7.0 seconds with the patch (13.5% reduction).

On an 832x, it spends approx 2.6 seconds system time without
the patch and 2.1 seconds with the patch (19% reduction).

void sigusr1(int sig) { }

int main(int argc, char **argv)
{
int i = 100000;

signal(SIGUSR1, sigusr1);
for (;i--;)
raise(SIGUSR1);
exit(0);
}

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7569070083e6cd5b279bb5023da601aba3c06f3c.1597770847.git.christophe.leroy@csgroup.eu


# 1da4a027 12-Oct-2020 Michael Neuling <mikey@neuling.org>

powerpc: Fix undetected data corruption with P9N DD2.1 VSX CI load emulation

__get_user_atomic_128_aligned() stores to kaddr using stvx which is a
VMX store instruction, hence kaddr must be 16 byte aligned otherwise
the store won't occur as expected.

Unfortunately when we call __get_user_atomic_128_aligned() in
p9_hmi_special_emu(), the buffer we pass as kaddr (ie. vbuf) isn't
guaranteed to be 16B aligned. This means that the write to vbuf in
__get_user_atomic_128_aligned() has the bottom bits of the address
truncated. This results in other local variables being
overwritten. Also vbuf will not contain the correct data which results
in the userspace emulation being wrong and hence undetected user data
corruption.

In the past we've been mostly lucky as vbuf has ended up aligned but
this is fragile and isn't always true. CONFIG_STACKPROTECTOR in
particular can change the stack arrangement enough that our luck runs
out.

This issue only occurs on POWER9 Nimbus <= DD2.1 bare metal.

The fix is to align vbuf to a 16 byte boundary.

Fixes: 5080332c2c89 ("powerpc/64s: Add workaround for P9 vector CI load issue")
Cc: stable@vger.kernel.org # v4.15+
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201013043741.743413-1-mikey@neuling.org


# 8b14e1df 29-Sep-2020 Christophe Leroy <christophe.leroy@csgroup.eu>

powerpc: Remove support for PowerPC 601

PowerPC 601 has been retired.

Remove all associated specific code.

CPU_FTRS_PPC601 has CPU_FTR_COHERENT_ICACHE and CPU_FTR_COMMON.

CPU_FTR_COMMON is already present via other CPU_FTRS.
None of the remaining CPU selects CPU_FTR_COHERENT_ICACHE.

So CPU_FTRS_PPC601 can be removed from the possible features,
hence can be removed completely.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/60b725d55e21beec3335175c20b77903ff98284f.1601362098.git.christophe.leroy@csgroup.eu


# 69eeff02 24-Jul-2020 Michael Ellerman <mpe@ellerman.id.au>

powerpc/32s: Remove TAUException wart in traps.c

All 32 and 64-bit builds that don't have CONFIG_TAU_INT enabled (all
of them), get a definition of TAUException() in traps.c.

On 64-bit it's completely useless, and just wastes ~120 bytes of text.
On 32-bit it allows the kernel to link because head_32.S calls it
unconditionally.

Instead follow the example of altivec_assist_exception(), and if
CONFIG_TAU_INT is not enabled just point it at unknown_exception using
the preprocessor.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200724131728.1643966-6-mpe@ellerman.id.au


# e31cf2f4 08-Jun-2020 Mike Rapoport <rppt@kernel.org>

mm: don't include asm/pgtable.h if linux/mm.h is already included

Patch series "mm: consolidate definitions of page table accessors", v2.

The low level page table accessors (pXY_index(), pXY_offset()) are
duplicated across all architectures and sometimes more than once. For
instance, we have 31 definition of pgd_offset() for 25 supported
architectures.

Most of these definitions are actually identical and typically it boils
down to, e.g.

static inline unsigned long pmd_index(unsigned long address)
{
return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
}

static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
{
return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address);
}

These definitions can be shared among 90% of the arches provided
XYZ_SHIFT, PTRS_PER_XYZ and xyz_page_vaddr() are defined.

For architectures that really need a custom version there is always
possibility to override the generic version with the usual ifdefs magic.

These patches introduce include/linux/pgtable.h that replaces
include/asm-generic/pgtable.h and add the definitions of the page table
accessors to the new header.

This patch (of 12):

The linux/mm.h header includes <asm/pgtable.h> to allow inlining of the
functions involving page table manipulations, e.g. pte_alloc() and
pmd_alloc(). So, there is no point to explicitly include <asm/pgtable.h>
in the files that include <linux/mm.h>.

The include statements in such cases are remove with a simple loop:

for f in $(git grep -l "include <linux/mm.h>") ; do
sed -i -e '/include <asm\/pgtable.h>/ d' $f
done

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200514170327.31389-1-rppt@kernel.org
Link: http://lkml.kernel.org/r/20200514170327.31389-2-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 69ea03b5 19-Feb-2020 Peter Zijlstra <peterz@infradead.org>

hardirq/nmi: Allow nested nmi_enter()

Since there are already a number of sites (ARM64, PowerPC) that effectively
nest nmi_enter(), make the primitive support this before adding even more.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lkml.kernel.org/r/20200505134100.864179229@linutronix.de


# 9409d2f9 05-May-2020 Jordan Niethe <jniethe5@gmail.com>

powerpc: Support prefixed instructions in alignment handler

If a prefixed instruction results in an alignment exception, the
SRR1_PREFIXED bit is set. The handler attempts to emulate the
responsible instruction and then increment the NIP past it. Use
SRR1_PREFIXED to determine by how much the NIP should be incremented.

Prefixed instructions are not permitted to cross 64-byte boundaries. If
they do the alignment interrupt is invoked with SRR1 BOUNDARY bit set.
If this occurs send a SIGBUS to the offending process if in user mode.
If in kernel mode call bad_page_fault().

Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Alistair Popple <alistair@popple.id.au>
Link: https://lore.kernel.org/r/20200506034050.24806-29-jniethe5@gmail.com


# 2aa6195e 05-May-2020 Alistair Popple <alistair@popple.id.au>

powerpc: Enable Prefixed Instructions

Prefix instructions have their own FSCR bit which needs to enabled via
a CPU feature. The kernel will save the FSCR for problem state but it
needs to be enabled initially.

If prefixed instructions are made unavailable by the [H]FSCR, attempting
to use them will cause a facility unavailable exception. Add "PREFIX" to
the facility_strings[].

Currently there are no prefixed instructions that are actually emulated
by emulate_instruction() within facility_unavailable_exception().
However, when caused by a prefixed instructions the SRR1 PREFIXED bit is
set. Prepare for dealing with emulated prefixed instructions by checking
for this bit.

Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Link: https://lore.kernel.org/r/20200506034050.24806-22-jniethe5@gmail.com


# 265d6e58 07-May-2020 Nicholas Piggin <npiggin@gmail.com>

powerpc/traps: Make unrecoverable NMIs die instead of panic

System Reset and Machine Check interrupts that are not recoverable due
to being nested or interrupting when RI=0 currently panic. This is not
necessary, and can often just kill the current context and recover.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Link: https://lore.kernel.org/r/20200508043408.886394-16-npiggin@gmail.com


# bbbc8032 07-May-2020 Nicholas Piggin <npiggin@gmail.com>

powerpc/traps: Do not trace system reset

Similarly to the previous patch, do not trace system reset. This code
is used when there is a crash or hang, and tracing disturbs the system
more and has been known to crash in the crash handling path.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Link: https://lore.kernel.org/r/20200508043408.886394-15-npiggin@gmail.com


# 116ac378 07-May-2020 Nicholas Piggin <npiggin@gmail.com>

powerpc/64s: machine check interrupt update NMI accounting

machine_check_early() is taken as an NMI, so nmi_enter() is used
there. machine_check_exception() is no longer taken as an NMI (it's
invoked via irq_work in the case a machine check hits in kernel mode),
so remove the nmi_enter() from that case.

In NMI context, hash faults don't try to refill the hash table, which
can lead to crashes accessing non-pinned kernel pages. System reset
still has this potential problem.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Drop change in show_regs() which breaks Book3E]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200508043408.886394-12-npiggin@gmail.com


# 860286cf 09-Feb-2020 Greg Kroah-Hartman <gregkh@linuxfoundation.org>

powerpc/kernel: no need to check return value of debugfs_create functions

When calling debugfs functions, there is no need to ever check the
return value. The function can work or not, but the code logic should
never do something different based on this.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200209105901.1620958-1-gregkh@linuxfoundation.org


# 3978eb78 21-Dec-2019 Christophe Leroy <christophe.leroy@c-s.fr>

powerpc/32: Add early stack overflow detection with VMAP stack.

To avoid recursive faults, stack overflow detection has to be
performed before writing in the stack in exception prologs.

Do it by checking the alignment. If the stack pointer alignment is
wrong, it means it is pointing to the following or preceding page.

Without VMAP stack, a stack overflow is catastrophic. With VMAP
stack, a stack overflow isn't destructive, so don't panic. Kill
the task with SIGSEGV instead.

A dedicated overflow stack is set up for each CPU.

lkdtm: Performing direct entry EXHAUST_STACK
lkdtm: Calling function with 512 frame size to depth 32 ...
lkdtm: loop 32/32 ...
lkdtm: loop 31/32 ...
lkdtm: loop 30/32 ...
lkdtm: loop 29/32 ...
lkdtm: loop 28/32 ...
lkdtm: loop 27/32 ...
lkdtm: loop 26/32 ...
lkdtm: loop 25/32 ...
lkdtm: loop 24/32 ...
lkdtm: loop 23/32 ...
lkdtm: loop 22/32 ...
lkdtm: loop 21/32 ...
lkdtm: loop 20/32 ...
Kernel stack overflow in process test[359], r1=c900c008
Oops: Kernel stack overflow, sig: 6 [#1]
BE PAGE_SIZE=4K MMU=Hash PowerMac
Modules linked in:
CPU: 0 PID: 359 Comm: test Not tainted 5.3.0-rc7+ #2225
NIP: c0622060 LR: c0626710 CTR: 00000000
REGS: c0895f48 TRAP: 0000 Not tainted (5.3.0-rc7+)
MSR: 00001032 <ME,IR,DR,RI> CR: 28004224 XER: 00000000
GPR00: c0626ca4 c900c008 c783c000 c07335cc c900c010 c07335cc c900c0f0 c07335cc
GPR08: c900c0f0 00000001 00000000 00000000 28008222 00000000 00000000 00000000
GPR16: 00000000 00000000 10010128 10010000 b799c245 10010158 c07335cc 00000025
GPR24: c0690000 c08b91d4 c068f688 00000020 c900c0f0 c068f668 c08b95b4 c08b91d4
NIP [c0622060] format_decode+0x0/0x4d4
LR [c0626710] vsnprintf+0x80/0x5fc
Call Trace:
[c900c068] [c0626ca4] vscnprintf+0x18/0x48
[c900c078] [c007b944] vprintk_store+0x40/0x214
[c900c0b8] [c007bf50] vprintk_emit+0x90/0x1dc
[c900c0e8] [c007c5cc] printk+0x50/0x60
[c900c128] [c03da5b0] recursive_loop+0x44/0x6c
[c900c338] [c03da5c4] recursive_loop+0x58/0x6c
[c900c548] [c03da5c4] recursive_loop+0x58/0x6c
[c900c758] [c03da5c4] recursive_loop+0x58/0x6c
[c900c968] [c03da5c4] recursive_loop+0x58/0x6c
[c900cb78] [c03da5c4] recursive_loop+0x58/0x6c
[c900cd88] [c03da5c4] recursive_loop+0x58/0x6c
[c900cf98] [c03da5c4] recursive_loop+0x58/0x6c
[c900d1a8] [c03da5c4] recursive_loop+0x58/0x6c
[c900d3b8] [c03da5c4] recursive_loop+0x58/0x6c
[c900d5c8] [c03da5c4] recursive_loop+0x58/0x6c
[c900d7d8] [c03da5c4] recursive_loop+0x58/0x6c
[c900d9e8] [c03da5c4] recursive_loop+0x58/0x6c
[c900dbf8] [c03da5c4] recursive_loop+0x58/0x6c
[c900de08] [c03da67c] lkdtm_EXHAUST_STACK+0x30/0x4c
[c900de18] [c03da3e8] direct_entry+0xc8/0x140
[c900de48] [c029fb40] full_proxy_write+0x64/0xcc
[c900de68] [c01500f8] __vfs_write+0x30/0x1d0
[c900dee8] [c0152cb8] vfs_write+0xb8/0x1d4
[c900df08] [c0152f7c] ksys_write+0x58/0xe8
[c900df38] [c0014208] ret_from_syscall+0x0/0x34
--- interrupt: c01 at 0xf806664
LR = 0x1000c868
Instruction dump:
4bffff91 80010014 7c832378 7c0803a6 38210010 4e800020 3d20c08a 3ca0c089
8089a0cc 38a58f0c 38600001 4ba2d494 <9421ffe0> 7c0802a6 bfc10018 7c9f2378

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1b89c121b4070c7ee99e4f22cc178f15a736b07b.1576916812.git.christophe.leroy@c-s.fr


# d7e02f7b 11-Jul-2019 Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>

powerpc/book3s/mm: Update Oops message to print the correct translation in use

Avoids confusion when printing Oops message like below

Faulting instruction address: 0xc00000000008bdb4
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV

This was because we never clear the MMU_FTR_HPTE_TABLE feature flag
even if we run with radix translation. It was discussed that we should
look at this feature flag as an indication of the capability to run
hash translation and we should not clear the flag even if we run in
radix translation. All the code paths check for radix_enabled() check and
if found true consider we are running with radix translation. Follow the
same sequence for finding the MMU translation string to be used in Oops
message.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Acked-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190711145814.17970-1-aneesh.kumar@linux.ibm.com


# e7ca44ed 04-Sep-2019 Ganesh Goudar <ganeshgr@linux.ibm.com>

powerpc: dump kernel log before carrying out fadump or kdump

Since commit 4388c9b3a6ee ("powerpc: Do not send system reset request
through the oops path"), pstore dmesg file is not updated when dump is
triggered from HMC. This commit modified system reset (sreset) handler
to invoke fadump or kdump (if configured), without pushing dmesg to
pstore. This leaves pstore to have old dmesg data which won't be much
of a help if kdump fails to capture the dump. This patch fixes that by
calling kmsg_dump() before heading to fadump ot kdump.

Fixes: 4388c9b3a6ee ("powerpc: Do not send system reset request through the oops path")
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190904075949.15607-1-ganeshgr@linux.ibm.com


# 2874c5fd 27-May-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152

Based on 1 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 3029 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 2e1661d2 23-May-2019 Eric W. Biederman <ebiederm@xmission.com>

signal: Remove the task parameter from force_sig_fault

As synchronous exceptions really only make sense against the current
task (otherwise how are you synchronous) remove the task parameter
from from force_sig_fault to make it explicit that is what is going
on.

The two known exceptions that deliver a synchronous exception to a
stopped ptraced task have already been changed to
force_sig_fault_to_task.

The callers have been changed with the following emacs regular expression
(with obvious variations on the architectures that take more arguments)
to avoid typos:

force_sig_fault[(]\([^,]+\)[,]\([^,]+\)[,]\([^,]+\)[,]\W+current[)]
->
force_sig_fault(\1,\2,\3)

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>


# de6da1e8 17-May-2019 Feng Tang <feng.tang@intel.com>

panic: add an option to replay all the printk message in buffer

Currently on panic, kernel will lower the loglevel and print out pending
printk msg only with console_flush_on_panic().

Add an option for users to configure the "panic_print" to replay all
dmesg in buffer, some of which they may have never seen due to the
loglevel setting, which will help panic debugging .

[feng.tang@intel.com: keep the original console_flush_on_panic() inside panic()]
Link: http://lkml.kernel.org/r/1556199137-14163-1-git-send-email-feng.tang@intel.com
[feng.tang@intel.com: use logbuf lock to protect the console log index]
Link: http://lkml.kernel.org/r/1556269868-22654-1-git-send-email-feng.tang@intel.com
Link: http://lkml.kernel.org/r/1556095872-36838-1-git-send-email-feng.tang@intel.com
Signed-off-by: Feng Tang <feng.tang@intel.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Aaro Koskinen <aaro.koskinen@nokia.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# ef429124 29-Apr-2019 Christophe Leroy <christophe.leroy@c-s.fr>

powerpc/fsl_booke: ensure SPEFloatingPointException() reenables interrupts

SPEFloatingPointException() is the only exception handler which 'forgets' to
re-enable interrupts. This patch makes sure it does.

Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# de3c83c2 12-Mar-2019 Mathieu Malaterre <malat@debian.org>

powerpc/64s: Include <asm/nmi.h> header file to fix a warning

Make sure to include <asm/nmi.h> to provide the following prototype:
hv_nmi_check_nonrecoverable.

Remove the following warning treated as error (W=1):

arch/powerpc/kernel/traps.c:393:6: error: no previous prototype for 'hv_nmi_check_nonrecoverable'

Fixes: ccd477028a20 ("powerpc/64s: Fix HV NMI vs HV interrupt recoverability test")
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# bd3524fe 01-Mar-2019 Nicholas Piggin <npiggin@gmail.com>

powerpc/64s: Fix unrelocated interrupt trampoline address test

The recent commit got this test wrong, it declared the assembler
symbols the wrong way, and also used the wrong symbol name
(xxx_start rather than start_xxx, see asm/head-64.h).

Fixes: ccd477028a ("powerpc/64s: Fix HV NMI vs HV interrupt recoverability test")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# cbf2ba95 26-Feb-2019 Nicholas Piggin <npiggin@gmail.com>

powerpc/64s: system reset interrupt preserve HSRRs

Code that uses HSRR registers is not required to clear MSR[RI] by
convention, however the system reset NMI itself may use HSRR
registers (e.g., to call OPAL) and clobber them.

Rather than introduce the requirement to clear RI in order to use
HSRRs, have system reset interrupt save and restore HSRRs.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# ccd47702 26-Feb-2019 Nicholas Piggin <npiggin@gmail.com>

powerpc/64s: Fix HV NMI vs HV interrupt recoverability test

HV interrupts that use HSRR registers do not enter with MSR[RI] clear,
but their entry code is not recoverable vs NMI, due to shared use of
HSPRG1 as a scratch register to save r13.

This means that a system reset or machine check that hits in HSRR
interrupt entry can cause r13 to be silently corrupted.

Fix this by marking NMIs non-recoverable if they land in HV interrupt
ranges.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 0bbea75c 22-Jan-2019 Christophe Leroy <christophe.leroy@c-s.fr>

powerpc/traps: fix recoverability of machine check handling on book3s/32

Looks like book3s/32 doesn't set RI on machine check, so
checking RI before calling die() will always be fatal
allthought this is not an issue in most cases.

Fixes: b96672dd840f ("powerpc: Machine check interrupt is a non-maskable interrupt")
Fixes: daf00ae71dad ("powerpc/traps: restore recoverability of machine_check interrupts")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: stable@vger.kernel.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 9bf3d3c4 29-Jan-2019 Christophe Leroy <christophe.leroy@c-s.fr>

powerpc/traps: Fix the message printed when stack overflows

Today's message is useless:

[ 42.253267] Kernel stack overflow in process (ptrval), r1=c65500b0

This patch fixes it:

[ 66.905235] Kernel stack overflow in process sh[356], r1=c65560b0

Fixes: ad67b74d2469 ("printk: hash addresses printed with %p")
Cc: stable@vger.kernel.org # v4.15+
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: Use task_pid_nr()]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 16842516 10-Jan-2019 Michael Ellerman <mpe@ellerman.id.au>

powerpc/64s: Add MMU type to __die() output

On Power9 machines (64-bit Book3S), we can be running with either the
Hash table or Radix tree MMU enabled. So add some text to the __die()
output to tell us which is enabled, for the case where all you have is
the oops output and no other information.

Example output:

kernel BUG at drivers/misc/lkdtm/bugs.c:63!
Oops: Exception in kernel mode, sig: 5 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Modules linked in: kvm vmx_crypto binfmt_misc ip_tables x_tables

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 18405139 10-Jan-2019 Michael Ellerman <mpe@ellerman.id.au>

powerpc: Show PAGE_SIZE in __die() output

The page size the kernel is built with is useful info when debugging a
crash, so add it to the output in __die().

Result looks like eg:

kernel BUG at drivers/misc/lkdtm/bugs.c:63!
Oops: Exception in kernel mode, sig: 5 [#1]
LE PAGE_SIZE=64K SMP NR_CPUS=2048 NUMA pSeries
Modules linked in: vmx_crypto kvm binfmt_misc ip_tables

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 78227443 10-Jan-2019 Michael Ellerman <mpe@ellerman.id.au>

powerpc: Stop using pr_cont() in __die()

Using pr_cont() risks having our output interleaved with other output
from other CPUs. Instead print everything in a single printk() call.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 96d4f267 03-Jan-2019 Linus Torvalds <torvalds@linux-foundation.org>

Remove 'type' argument from access_ok() function

Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
of the user address range verification function since we got rid of the
old racy i386-only code to walk page tables by hand.

It existed because the original 80386 would not honor the write protect
bit when in kernel mode, so you had to do COW by hand before doing any
user access. But we haven't supported that in a long time, and these
days the 'type' argument is a purely historical artifact.

A discussion about extending 'user_access_begin()' to do the range
checking resulted this patch, because there is no way we're going to
move the old VERIFY_xyz interface to that model. And it's best done at
the end of the merge window when I've done most of my merges, so let's
just get this done once and for all.

This patch was mostly done with a sed-script, with manual fix-ups for
the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.

There were a couple of notable cases:

- csky still had the old "verify_area()" name as an alias.

- the iter_iov code had magical hardcoded knowledge of the actual
values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
really used it)

- microblaze used the type argument for a debug printout

but other than those oddities this should be a total no-op patch.

I tried to fix up all architectures, did fairly extensive grepping for
access_ok() uses, and the changes are trivial, but I may have missed
something. Any missed conversion should be trivially fixable, though.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 11be3958 26-Nov-2018 Breno Leitao <leitao@debian.org>

powerpc/tm: Print scratch value

Usually a TM Bad Thing exception is raised due to three different problems.
a) touching SPRs in an active transaction; b) using TM instruction with the
facility disabled and c) setting a wrong MSR/SRR1 at RFID.

The two initial cases are easy to identify by looking at the instructions.
The latter case is harder, because the MSR is masked after RFID, so, it is
very useful to look at the previous MSR (SRR1) before RFID as also the
current and masked MSR.

Since MSR is saved at paca just before RFID, this patch prints it if a TM
Bad thing happen, helping to understand what is the invalid TM transition
that is causing the exception.

Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# daf00ae7 13-Oct-2018 Christophe Leroy <christophe.leroy@c-s.fr>

powerpc/traps: restore recoverability of machine_check interrupts

commit b96672dd840f ("powerpc: Machine check interrupt is a non-
maskable interrupt") added a call to nmi_enter() at the beginning of
machine check restart exception handler. Due to that, in_interrupt()
always returns true regardless of the state before entering the
exception, and die() panics even when the system was not already in
interrupt.

This patch calls nmi_exit() before calling die() in order to restore
the interrupt state we had before calling nmi_enter()

Fixes: b96672dd840f ("powerpc: Machine check interrupt is a non-maskable interrupt")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# bd03fd84 15-Oct-2018 Christophe Leroy <christophe.leroy@c-s.fr>

powerpc/traps: remove redundant in_interrupt panic in die()

do_exit() already includes a test to panic() is in_interrupt()

This patch removes powerpc one which is redundant.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 422123cc 15-Oct-2018 Christophe Leroy <christophe.leroy@c-s.fr>

powerpc/traps: fix machine check handlers to use pr_cont()

When printing the machine check cause, the cause appears on the
following line due to bad use of printk without \n:

[ 33.663993] Machine check in kernel mode.
[ 33.664011] Caused by (from SRR1=9032):
[ 33.664036] Data access error at address c90c8000

This patch fixes it by using pr_cont() for the second part:

[ 133.258131] Machine check in kernel mode.
[ 133.258146] Caused by (from SRR1=9032): Data access error at address c90c8000

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 8a03e81c 26-Sep-2018 Michal Suchanek <msuchanek@suse.de>

powerpc/64s: consolidate MCE counter increment.

The code in machine_check_exception excludes 64s hvmode when
incrementing the MCE counter only to call opal_machine_check to
increment it specifically for this case.

Remove the exclusion and special case.

Fixes: a43c1590426c ("powerpc/pseries: Flush SLB contents on SLB MCE
errors.")

Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 51303113 07-Aug-2018 Breno Leitao <leitao@debian.org>

powerpc/tm: Print 64-bits MSR

On a kernel TM Bad thing program exception, the Machine State Register
(MSR) is not being properly displayed. The exception code dumps a 32-bits
value but MSR is a 64 bits register for all platforms that have HTM
enabled.

This patch dumps the MSR value as a 64-bits value instead of 32 bits. In
order to do so, the 'reason' variable could not be used, since it trimmed
MSR to 32-bits (int).

Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 51423a9c 25-Sep-2018 Christophe Leroy <christophe.leroy@c-s.fr>

powerpc/traps: merge unrecoverable_exception() and nonrecoverable_exception()

PPC32 uses nonrecoverable_exception() while PPC64 uses
unrecoverable_exception().

Both functions are doing almost the same thing.

This patch removes nonrecoverable_exception()

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 77c70728 18-Sep-2018 Eric W. Biederman <ebiederm@xmission.com>

signal/powerpc: Simplify _exception_pkey by using force_sig_pkuerr

Call force_sig_pkuerr directly instead of rolling it by hand
in _exception_pkey.

Reviewed-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>


# 5d8fb8a5 18-Sep-2018 Eric W. Biederman <ebiederm@xmission.com>

signal/powerpc: Specialize _exception_pkey for handling pkey exceptions

Now that _exception no longer calls _exception_pkey it is no longer
necessary to handle any signal with any si_code. All pkey exceptions
are SIGSEGV with paired with SEGV_PKUERR. So just handle
that case and remove the now unnecessary parameters from _exception_pkey.

Reviewed-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>


# c1c7c85c 18-Sep-2018 Eric W. Biederman <ebiederm@xmission.com>

signal/powerpc: Call force_sig_fault from _exception

The callers of _exception don't need the pkey exception logic because
they are not processing a pkey exception. So just call exception_common
directly and then call force_sig_fault to generate the appropriate siginfo
and deliver the appropriate signal.

Reviewed-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>


# 2c44ce28 18-Sep-2018 Eric W. Biederman <ebiederm@xmission.com>

signal/powerpc: Factor the common exception code into exception_common

It is brittle and wrong to populate si_pkey when there was not a pkey
exception. The field does not exist for all si_codes and in some
cases another field exists in the same memory location.

So factor out the code that all exceptions handlers must run
into exception_common, leaving the individual exception handlers
to generate the signals themselves.

Reviewed-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>


# efc463ad 16-Apr-2018 Eric W. Biederman <ebiederm@xmission.com>

signal: Simplify tracehook_report_syscall_exit

Replace user_single_step_siginfo with user_single_step_report
that allocates siginfo structure on the stack and sends it.

This allows tracehook_report_syscall_exit to become a simple
if statement that calls user_single_step_report or ptrace_report_syscall
depending on the value of step.

Update the default helper function now called user_single_step_report
to explicitly set si_code to SI_USER and to set si_uid and si_pid to 0.
The default helper has always been doing this (using memset) but it
was far from obvious.

The powerpc helper can now just call force_sig_fault.
The x86 helper can now just call send_sigtrap.

Unfortunately the default implementation of user_single_step_report
can not use force_sig_fault as it does not use a SIGTRAP si_code.
So it has to carefully setup the siginfo and use use force_sig_info.

The net result is code that is easier to understand and simpler
to maintain.

Ref: 85ec7fd9f8e5 ("ptrace: introduce user_single_step_siginfo() helper")
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>


# 96695563 18-Jun-2018 Breno Leitao <leitao@debian.org>

powerpc/tm: Fix HTM documentation

This patch simply fix part of the documentation on the HTM code.

This fixes reference to old fields that were renamed in commit
000ec280e3dd ("powerpc: tm: Rename transct_(*) to ck(\1)_state")

It also documents better the flow after commit eb5c3f1c8647 ("powerpc:
Always save/restore checkpointed regs during treclaim/trecheckpoint"),
where tm_recheckpoint can recheckpoint what is in ck{fp,vr}_state
blindly.

Signed-off-by: Breno Leitao <leitao@debian.org>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 997dd26c 15-Aug-2018 Michael Ellerman <mpe@ellerman.id.au>

powerpc/traps: Avoid rate limit messages from show unhandled signals

In the recent commit to add an explicit ratelimit state when showing
unhandled signals, commit 35a52a10c3ac ("powerpc/traps: Use an
explicit ratelimit state for show_signal_msg()"), I put the check of
show_unhandled_signals and the ratelimit state before the call to
unhandled_signal() so as to avoid unnecessarily calling the latter
when show_unhandled_signals is false.

However that causes us to check the ratelimit state on every call, so
if we take a lot of *handled* signals that has the effect of making
the ratelimit code print warnings that callbacks have been suppressed
when they haven't.

So rearrange the code so that we check show_unhandled_signals first,
then call unhandled_signal() and finally check the ratelimit state.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>


# a99b9c5e 01-Aug-2018 Murilo Opsfelder Araujo <muriloo@linux.ibm.com>

powerpc/traps: Show instructions on exceptions

Call show_user_instructions() in arch/powerpc/kernel/traps.c to dump
instructions at faulty location, useful to debugging.

Before this patch, an unhandled signal message looked like:

pandafault[10524]: segfault (11) at 100007d0 nip 1000061c lr 7fffbd295100 code 2 in pandafault[10000000+10000]

After this patch, it looks like:

pandafault[10524]: segfault (11) at 100007d0 nip 1000061c lr 7fffbd295100 code 2 in pandafault[10000000+10000]
pandafault[10524]: code: 4bfffeec 4bfffee8 3c401002 38427f00 fbe1fff8 f821ffc1 7c3f0b78 3d22fffe
pandafault[10524]: code: 392988d0 f93f0020 e93f0020 39400048 <99490000> 39200000 7d234b78 383f0040

Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 0f642d61 01-Aug-2018 Murilo Opsfelder Araujo <muriloo@linux.ibm.com>

powerpc/traps: Print VMA for unhandled signals

This adds VMA address in the message printed for unhandled signals,
similarly to what other architectures, like x86, print.

Before this patch, a page fault looked like:

pandafault[61470]: unhandled signal 11 at 100007d0 nip 1000061c lr 7fff8d185100 code 2

After this patch, a page fault looks like:

pandafault[6303]: segfault 11 at 100007d0 nip 1000061c lr 7fff93c55100 code 2 in pandafault[10000000+10000]

Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 49d8f201 01-Aug-2018 Murilo Opsfelder Araujo <muriloo@linux.ibm.com>

powerpc/traps: Use %lx format in show_signal_msg()

Use %lx format to print registers. This avoids having two different
formats and avoids checking for MSR_64BIT, improving readability of the
function.

Even though we could have used %px, which is functionally equivalent to %lx
as per Documentation/core-api/printk-formats.rst, it is not semantically
correct because the data printed are not pointers. And using %px requires
casting data to (void *).

Besides that, %lx matches the format used in show_regs().

Before this patch:

pandafault[4808]: unhandled signal 11 at 0000000010000718 nip 0000000010000574 lr 00007fff935e7a6c code 2

After this patch:

pandafault[4732]: unhandled signal 11 at 10000718 nip 10000574 lr 7fff86697a6c code 2

Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 35a52a10 01-Aug-2018 Murilo Opsfelder Araujo <muriloo@linux.ibm.com>

powerpc/traps: Use an explicit ratelimit state for show_signal_msg()

Replace printk_ratelimited() by printk() and a default rate limit
burst to limit displaying unhandled signals messages.

This will allow us to call print_vma_addr() in a future patch, which
does not work with printk_ratelimited().

Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 658b0f92 01-Aug-2018 Murilo Opsfelder Araujo <muriloo@linux.ibm.com>

powerpc/traps: Print unhandled signals in a separate function

Isolate the logic of printing unhandled signals out of _exception_pkey().
No functional change, only code rearrangement.

Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# e821fa42 17-Apr-2018 Eric W. Biederman <ebiederm@xmission.com>

signal/powerpc: Replace TRAP_FIXME with TRAP_UNK

Using an si_code of 0 that aliases with SI_USER is clearly the wrong
thing todo, and causes problems in interesting ways.

For use in unknown_exception the recently defined TRAP_UNK
semantically is a perfect fit. For use in RunModeException it looks
like something more specific than TRAP_UNK could be used. No one has
bothered to find a better fit than the broken si_code of 0 in all of
these years and I don't see an obvious better fit so TRAP_UNK is
switching RunModeException to return TRAP_UNK is clearly an
improvement.

Recent history suggests no actually cares about crazy corner
cases of the kernel behavior like this so I don't expect any
regressions from changing this. However if something does
happen this change is easy to revert.

Though I wonder if SIGKILL might not be a better fit.

Cc: Paul Mackerras <paulus@samba.org>
Cc: Kumar Gala <kumar.gala@freescale.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: linuxppc-dev@lists.ozlabs.org
Fixes: 9bad068c24d7 ("[PATCH] ppc32: support for e500 and 85xx")
Fixes: 0ed70f6105ef ("PPC32: Provide proper siginfo information on various exceptions.")
History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>


# aeb1c0f6 17-Apr-2018 Eric W. Biederman <ebiederm@xmission.com>

signal/powerpc: Replace FPE_FIXME with FPE_FLTUNK

Using an si_code of 0 that aliases with SI_USER is clearly the
wrong thing todo, and causes problems in interesting ways.

The newly defined FPE_FLTUNK semantically appears to fit the
bill so use it instead.

Cc: Paul Mackerras <paulus@samba.org>
Cc: Kumar Gala <kumar.gala@freescale.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: linuxppc-dev@lists.ozlabs.org
Fixes: 9bad068c24d7 ("[PATCH] ppc32: support for e500 and 85xx")
Fixes: 0ed70f6105ef ("PPC32: Provide proper siginfo information on various exceptions.")
History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>


# 3eb0f519 17-Apr-2018 Eric W. Biederman <ebiederm@xmission.com>

signal: Ensure every siginfo we send has all bits initialized

Call clear_siginfo to ensure every stack allocated siginfo is properly
initialized before being passed to the signal sending functions.

Note: It is not safe to depend on C initializers to initialize struct
siginfo on the stack because C is allowed to skip holes when
initializing a structure.

The initialization of struct siginfo in tracehook_report_syscall_exit
was moved from the helper user_single_step_siginfo into
tracehook_report_syscall_exit itself, to make it clear that the local
variable siginfo gets fully initialized.

In a few cases the scope of struct siginfo has been reduced to make it
clear that siginfo siginfo is not used on other paths in the function
in which it is declared.

Instances of using memset to initialize siginfo have been replaced
with calls clear_siginfo for clarity.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>


# 709b973c 29-Mar-2018 Anshuman Khandual <khandual@linux.vnet.ibm.com>

powerpc/fscr: Enable interrupts earlier before calling get_user()

The function get_user() can sleep while trying to fetch instruction
from user address space and causes the following warning from the
scheduler.

BUG: sleeping function called from invalid context

Though interrupts get enabled back but it happens bit later after
get_user() is called. This change moves enabling these interrupts
earlier covering the function get_user(). While at this, lets check
for kernel mode and crash as this interrupt should not have been
triggered from the kernel context.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 0e524e76 26-Mar-2018 Matt Evans <matt@ozlabs.org>

powerpc: Clear branch trap (MSR.BE) before delivering SIGTRAP

When using SIG_DBG_BRANCH_TRACING, MSR.BE is left enabled in the
user context when single_step_exception() prepares the SIGTRAP
delivery. The resulting branch-trap-within-the-SIGTRAP-handler
isn't healthy.

Commit 2538c2d08f46141550a1e68819efa8fe31c6e3dc broke this, by
replacing an MSR mask operation of ~(MSR_SE | MSR_BE) with a call
to clear_single_step() which only clears MSR_SE.

This patch adds a new helper, clear_br_trace(), which clears the
debug trap before invoking the signal handler. This helper is a
NOP for BookE as SIG_DBG_BRANCH_TRACING isn't supported on BookE.

Signed-off-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# d40b6768 26-Mar-2018 Nicholas Piggin <npiggin@gmail.com>

powerpc/64s: sreset panic if there is no debugger or crash dump handlers

system_reset_exception does most of its own crash handling now,
invoking the debugger or crash dumps if they are registered. If not,
then it goes through to die() to print stack traces, and then is
supposed to panic (according to comments).

However after die() prints oopses, it does its own handling which
doesn't allow system_reset_exception to panic (e.g., it may just
kill the current process). This patch causes sreset exceptions to
return from die after it prints messages but before acting.

This also stops die from invoking the debugger on 0x100 crashes.
system_reset_exception similarly calls the debugger. It had been
thought this was harmless (because if the debugger was disabled,
neither call would fire, and if it was enabled the first call
would return). However in some cases like xmon 'X' command, the
debugger returns 0, which currently causes it to be entered
again (first in system_reset_exception, then in die), which is
confusing.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# f384796c 26-Mar-2018 Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

powerpc/mm: Add support for handling > 512TB address in SLB miss

For addresses above 512TB we allocate additional mmu contexts. To make
it all easy, addresses above 512TB are handled with IR/DR=1 and with
stack frame setup.

The mmu_context_t is also updated to track the new extended_ids. To
support upto 4PB we need a total 8 contexts.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Minor formatting tweaks and comment wording, switch BUG to WARN
in get_ea_context().]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 47355040 16-Jan-2018 Eric W. Biederman <ebiederm@xmission.com>

signal/powerpc: Remove unnecessary signal_code parameter of do_send_trap

signal_code is always TRAP_HWBKPT

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>


# 35adacd6 23-Dec-2017 Nicholas Piggin <npiggin@gmail.com>

powerpc/pseries, ps3: panic flush kernel messages before halting system

Platforms with a panic handler that halts the system can have problems
getting kernel messages out, because the panic notifiers are called
before kernel/panic.c does its flushing of printk buffers an console
etc.

This was attempted to be solved with commit a3b2cb30f252 ("powerpc: Do
not call ppc_md.panic in fadump panic notifier"), but that wasn't the
right approach and caused other problems, and was reverted by commit
ab9dbf771ff9.

Instead, the powernv shutdown paths have already had a similar
problem, fixed by taking the message flushing sequence from
kernel/panic.c. That's a little bit ugly, but while we have the code
duplicated, it will work for this case as well. So have ppc panic
handlers do the same flushing before they terminate.

Without this patch, a qemu pseries_le_defconfig guest stops silently
when issued the nmi command when xmon is off and no crash dumpers
enabled. Afterwards, an oops is printed by each CPU as expected.

Fixes: ab9dbf771ff9 ("Revert "powerpc: Do not call ppc_md.panic in fadump panic notifier"")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 57ad583f 11-Jan-2017 Russell Currey <ruscur@russell.cc>

powerpc: Use octal numbers for file permissions

Symbolic macros are unintuitive and hard to read, whereas octal constants
are much easier to interpret. Replace macros for the basic permission
flags (user/group/other read/write/execute) with numeric constants
instead, across the whole powerpc tree.

Introducing a significant number of changes across the tree for no runtime
benefit isn't exactly desirable, but so long as these macros are still
used in the tree people will keep sending patches that add them. Not only
are they hard to parse at a glance, there are multiple ways of coming to
the same value (as you can see with 0444 and 0644 in this patch) which
hurts readability.

Signed-off-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# c5cc1f4d 18-Jan-2018 Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>

powerpc/ptrace: Add memory protection key regset

The AMR/IAMR/UAMOR are part of the program context.
Allow it to be accessed via ptrace and through core files.

Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 99cd1302 18-Jan-2018 Ram Pai <linuxram@us.ibm.com>

powerpc: Deliver SEGV signal on pkey violation

The value of the pkey, whose protection got violated,
is made available in si_pkey field of the siginfo structure.

Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 4552d128 23-Dec-2017 Nicholas Piggin <npiggin@gmail.com>

powerpc: System reset avoid interleaving oops using die synchronisation

The die() oops path contains a serializing lock to prevent oops
messages from being interleaved. In the case of a system reset
initiated oops (e.g., qemu nmi command), __die was being called
which lacks that synchronisation and oops reports could be
interleaved across CPUs.

A recent patch 4388c9b3a6ee7 ("powerpc: Do not send system reset
request through the oops path") changed this to __die to avoid
the debugger() call, but there is no real harm to calling it twice
if the first time fell through. So go back to using die() here.
This was observed to fix the problem.

Fixes: 4388c9b3a6ee7 ("powerpc: Do not send system reset request through the oops path")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 2271db20 11-Jan-2018 Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc: Use the TRAP macro whenever comparing a trap number

Trap numbers can have extra bits at the bottom that need to
be filtered out. There are a few cases where we don't do that.

It's possible that we got lucky but better safe than sorry.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# cf4674c4 19-Aug-2017 Eric W. Biederman <ebiederm@xmission.com>

signal/powerpc: Document conflicts with SI_USER and SIGFPE and SIGTRAP

Setting si_code to 0 results in a userspace seeing an si_code of 0.
This is the same si_code as SI_USER. Posix and common sense requires
that SI_USER not be a signal specific si_code. As such this use of 0
for the si_code is a pretty horribly broken ABI.

Further use of si_code == 0 guaranteed that copy_siginfo_to_user saw a
value of __SI_KILL and now sees a value of SIL_KILL with the result
that uid and pid fields are copied and which might copying the si_addr
field by accident but certainly not by design. Making this a very
flakey implementation.

Utilizing FPE_FIXME and TRAP_FIXME, siginfo_layout() will now return
SIL_FAULT and the appropriate fields will be reliably copied.

Possible ABI fixes includee:
- Send the signal without siginfo
- Don't generate a signal
- Possibly assign and use an appropriate si_code
- Don't handle cases which can't happen
Cc: Paul Mackerras <paulus@samba.org>
Cc: Kumar Gala <kumar.gala@freescale.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: linuxppc-dev@lists.ozlabs.org
Ref: 9bad068c24d7 ("[PATCH] ppc32: support for e500 and 85xx")
Ref: 0ed70f6105ef ("PPC32: Provide proper siginfo information on various exceptions.")
History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>


# 6f700d38 01-Nov-2017 Cyril Bur <cyrilbur@gmail.com>

powerpc: Remove facility loadups on transactional {fp, vec, vsx} unavailable

After handling a transactional FP, Altivec or VSX unavailable exception.
The return to userspace code will detect that the TIF_RESTORE_TM bit is
set and call restore_tm_state(). restore_tm_state() will call
restore_math() to ensure that the correct facilities are loaded.

This means that all the loadup code in {fp,altivec,vsx}_unavailable_tm()
is doing pointless work and can simply be removed.

Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# eb5c3f1c 01-Nov-2017 Cyril Bur <cyrilbur@gmail.com>

powerpc: Always save/restore checkpointed regs during treclaim/trecheckpoint

Lazy save and restore of FP/Altivec means that a userspace process can
be sent to userspace with FP or Altivec disabled and loaded only as
required (by way of an FP/Altivec unavailable exception). Transactional
Memory complicates this situation as a transaction could be started
without FP/Altivec being loaded up. This causes the hardware to
checkpoint incorrect registers. Handling FP/Altivec unavailable
exceptions while a thread is transactional requires a reclaim and
recheckpoint to ensure the CPU has correct state for both sets of
registers.

tm_reclaim() has optimisations to not always save the FP/Altivec
registers to the checkpointed save area. This was originally done
because the caller might have information that the checkpointed
registers aren't valid due to lazy save and restore. We've also been a
little vague as to how tm_reclaim() leaves the FP/Altivec state since it
doesn't necessarily always save it to the thread struct. This has lead
to an (incorrect) assumption that it leaves the checkpointed state on
the CPU.

tm_recheckpoint() has similar optimisations in reverse. It may not
always reload the checkpointed FP/Altivec registers from the thread
struct before the trecheckpoint. It is therefore quite unclear where it
expects to get the state from. This didn't help with the assumption
made about tm_reclaim().

These optimisations sit in what is by definition a slow path. If a
process has to go through a reclaim/recheckpoint then its transaction
will be doomed on returning to userspace. This mean that the process
will be unable to complete its transaction and be forced to its failure
handler. This is already an out if line case for userspace. Furthermore,
the cost of copying 64 times 128 bits from registers isn't very long[0]
(at all) on modern processors. As such it appears these optimisations
have only served to increase code complexity and are unlikely to have
had a measurable performance impact.

Our transactional memory handling has been riddled with bugs. A cause
of this has been difficulty in following the code flow, code complexity
has not been our friend here. It makes sense to remove these
optimisations in favour of a (hopefully) more stable implementation.

This patch does mean that some times the assembly will needlessly save
'junk' registers which will subsequently get overwritten with the
correct value by the C code which calls the assembly function. This
small inefficiency is far outweighed by the reduction in complexity for
general TM code, context switching paths, and transactional facility
unavailable exception handler.

0: I tried to measure it once for other work and found that it was
hiding in the noise of everything else I was working with. I find it
exceedingly likely this will be the case here.

Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 91381b9c 01-Nov-2017 Cyril Bur <cyrilbur@gmail.com>

powerpc: Force reload for recheckpoint during tm {fp, vec, vsx} unavailable exception

Lazy save and restore of FP/Altivec means that a userspace process can
be sent to userspace with FP or Altivec disabled and loaded only as
required (by way of an FP/Altivec unavailable exception). Transactional
Memory complicates this situation as a transaction could be started
without FP/Altivec being loaded up. This causes the hardware to
checkpoint incorrect registers. Handling FP/Altivec unavailable
exceptions while a thread is transactional requires a reclaim and
recheckpoint to ensure the CPU has correct state for both sets of
registers.

tm_reclaim() has optimisations to not always save the FP/Altivec
registers to the checkpointed save area. This was originally done
because the caller might have information that the checkpointed
registers aren't valid due to lazy save and restore. We've also been a
little vague as to how tm_reclaim() leaves the FP/Altivec state since it
doesn't necessarily always save it to the thread struct. This has lead
to an (incorrect) assumption that it leaves the checkpointed state on
the CPU.

tm_recheckpoint() has similar optimisations in reverse. It may not
always reload the checkpointed FP/Altivec registers from the thread
struct before the trecheckpoint. It is therefore quite unclear where it
expects to get the state from. This didn't help with the assumption
made about tm_reclaim().

This patch is a minimal fix for ease of backporting. A more correct fix
which removes the msr parameter to tm_reclaim() and tm_recheckpoint()
altogether has been upstreamed to apply on top of this patch.

Fixes: dc3106690b20 ("powerpc: tm: Always use fp_state and vr_state to
store live registers")

Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# a7771176 01-Nov-2017 Cyril Bur <cyrilbur@gmail.com>

powerpc: Don't enable FP/Altivec if not checkpointed

Lazy save and restore of FP/Altivec means that a userspace process can
be sent to userspace with FP or Altivec disabled and loaded only as
required (by way of an FP/Altivec unavailable exception). Transactional
Memory complicates this situation as a transaction could be started
without FP/Altivec being loaded up. This causes the hardware to
checkpoint incorrect registers. Handling FP/Altivec unavailable
exceptions while a thread is transactional requires a reclaim and
recheckpoint to ensure the CPU has correct state for both sets of
registers.

Lazy save and restore of FP/Altivec cannot be done if a process is
transactional. If a facility was enabled it must remain enabled whenever
a thread is transactional.

Commit dc16b553c949 ("powerpc: Always restore FPU/VEC/VSX if hardware
transactional memory in use") ensures that the facilities are always
enabled if a thread is transactional. A bug in the introduced code may
cause it to inadvertently enable a facility that was (and should remain)
disabled. The problem with this extraneous enablement is that the
registers for the erroneously enabled facility have not been correctly
recheckpointed - the recheckpointing code assumed the facility would
remain disabled.

Further compounding the issue, the transactional {fp,altivec,vsx}
unavailable code has been incorrectly using the MSR to enable
facilities. The presence of the {FP,VEC,VSX} bit in the regs->msr simply
means if the registers are live on the CPU, not if the kernel should
load them before returning to userspace. This has worked due to the bug
mentioned above.

This causes transactional threads which return to their failure handler
to observe incorrect checkpointed registers. Perhaps an example will
help illustrate the problem:

A userspace process is running and uses both FP and Altivec registers.
This process then continues to run for some time without touching
either sets of registers. The kernel subsequently disables the
facilities as part of lazy save and restore. The userspace process then
performs a tbegin and the CPU checkpoints 'junk' FP and Altivec
registers. The process then performs a floating point instruction
triggering a fp unavailable exception in the kernel.

The kernel then loads the FP registers - and only the FP registers.
Since the thread is transactional it must perform a reclaim and
recheckpoint to ensure both the checkpointed registers and the
transactional registers are correct. It then (correctly) enables
MSR[FP] for the process. Later (on exception exist) the kernel also
(inadvertently) enables MSR[VEC]. The process is then returned to
userspace.

Since the act of loading the FP registers doomed the transaction we know
CPU will fail the transaction, restore its checkpointed registers, and
return the process to its failure handler. The problem is that we're
now running with Altivec enabled and the 'junk' checkpointed registers
are restored. The kernel had only recheckpointed FP.

This patch solves this by only activating FP/Altivec if userspace was
using them when it entered the kernel and not simply if the process is
transactional.

Fixes: dc16b553c949 ("powerpc: Always restore FPU/VEC/VSX if hardware
transactional memory in use")

Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 632f0574 11-Oct-2017 Michael Ellerman <mpe@ellerman.id.au>

powerpc/tm: Don't check for WARN in TM Bad Thing handling

Currently when we take a TM Bad Thing program check exception, we
search the bug table to see if the program check was generated by a
WARN/WARN_ON etc.

That makes no sense, the WARN macros use trap instructions, which
should never generate a TM Bad Thing exception. If they ever did that
would be a bug and we should oops.

We do have some hand-coded bugs in tm.S, using EMIT_BUG_ENTRY, but
those are all BUGs not WARNs, and they all use trap instructions
anyway. Almost certainly this check was incorrectly copied from the
REASON_TRAP handling in the same function.

Remove it.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-By: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 5080332c 14-Sep-2017 Michael Neuling <mikey@neuling.org>

powerpc/64s: Add workaround for P9 vector CI load issue

POWER9 DD2.1 and earlier has an issue where some cache inhibited
vector load will return bad data. The workaround is two part, one
firmware/microcode part triggers HMI interrupts when hitting such
loads, the other part is this patch which then emulates the
instructions in Linux.

The affected instructions are limited to lxvd2x, lxvw4x, lxvb16x and
lxvh8x.

When an instruction triggers the HMI, all threads in the core will be
sent to the HMI handler, not just the one running the vector load.

In general, these spurious HMIs are detected by the emulation code and
we just return back to the running process. Unfortunately, if a
spurious interrupt occurs on a vector load that's to normal memory we
have no way to detect that it's spurious (unless we walk the page
tables, which is very expensive). In this case we emulate the load but
we need do so using a vector load itself to ensure 128bit atomicity is
preserved.

Some additional debugfs emulated instruction counters are added also.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[mpe: Switch CONFIG_PPC_BOOK3S_64 to CONFIG_VSX to unbreak the build]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# b96672dd 19-Jul-2017 Nicholas Piggin <npiggin@gmail.com>

powerpc: Machine check interrupt is a non-maskable interrupt

Use nmi_enter similarly to system reset interrupts. This uses NMI
printk NMI buffers and turns off various debugging facilities that
helps avoid tripping on ourselves or other CPUs.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 6fcd6baa 19-Jul-2017 Nicholas Piggin <npiggin@gmail.com>

powerpc/powernv: Use kernel crash path for machine checks

There are quite a few machine check exceptions that can be caused by
kernel bugs. To make debugging easier, use the kernel crash path in
cases of synchronous machine checks that occur in kernel mode, if that
would not result in the machine going straight to panic or crash dump.

There is a downside here that die()ing the process in kernel mode can
still leave the system unstable. panic_on_oops will always force the
system to fail-stop, so systems where that behaviour is important will
still do the right thing.

As a test, when triggering an i-side 0111b error (ifetch from foreign
address) in kernel mode process context on POWER9, the kernel currently
dies quickly like this:

Severe Machine check interrupt [Not recovered]
NIP [ffff000000000000]: 0xffff000000000000
Initiator: CPU
Error type: Real address [Instruction fetch (foreign)]
[ 127.426651616,0] OPAL: Reboot requested due to Platform error.
Effective[ 127.426693712,3] OPAL: Reboot requested due to Platform error. address: ffff000000000000
opal: Reboot type 1 not supported
Kernel panic - not syncing: PowerNV Unrecovered Machine Check
CPU: 56 PID: 4425 Comm: syscall Tainted: G M 4.12.0-rc1-13857-ga4700a261072-dirty #35
Call Trace:
[ 128.017988928,4] IPMI: BUG: Dropping ESEL on the floor due to
buggy/mising code in OPAL for this BMC
Rebooting in 10 seconds..
Trying to free IRQ 496 from IRQ context!

After this patch, the process is killed and the kernel continues with
this message, which gives enough information to identify the offending
branch (i.e., with CFAR):

Severe Machine check interrupt [Not recovered]
NIP [ffff000000000000]: 0xffff000000000000
Initiator: CPU
Error type: Real address [Instruction fetch (foreign)]
Effective address: ffff000000000000
Oops: Machine check, sig: 7 [#1]
SMP NR_CPUS=2048
NUMA
PowerNV
Modules linked in: iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 ...
CPU: 22 PID: 4436 Comm: syscall Tainted: G M 4.12.0-rc1-13857-ga4700a261072-dirty #36
task: c000000932300000 task.stack: c000000932380000
NIP: ffff000000000000 LR: 00000000217706a4 CTR: ffff000000000000
REGS: c00000000fc8fd80 TRAP: 0200 Tainted: G M (4.12.0-rc1-13857-ga4700a261072-dirty)
MSR: 90000000001c1003 <SF,HV,ME,RI,LE>
CR: 24000484 XER: 20000000
CFAR: c000000000004c80 DAR: 0000000021770a90 DSISR: 0a000000 SOFTE: 1
GPR00: 0000000000001ebe 00007fffce4818b0 0000000021797f00 0000000000000000
GPR04: 00007fff8007ac24 0000000044000484 0000000000004000 00007fff801405e8
GPR08: 900000000280f033 0000000024000484 0000000000000000 0000000000000030
GPR12: 9000000000001003 00007fff801bc370 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR28: 00007fff801b0000 0000000000000000 00000000217707a0 00007fffce481918
NIP [ffff000000000000] 0xffff000000000000
LR [00000000217706a4] 0x217706a4
Call Trace:
Instruction dump:
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 4388c9b3 04-Jul-2017 Nicholas Piggin <npiggin@gmail.com>

powerpc: Do not send system reset request through the oops path

A system reset is a request to crash / debug the system rather than
necessarily caused by encountering a BUG. So there is no need to
serialize all CPUs behind the die lock, adding taints to all
subsequent traces beyond the first, breaking console locks, etc.

The system reset is NMI context which has its own printk buffers to
prevent output being interleaved. Then it's better to have all
secondaries print out their debug as quickly as possible and the
primary will flush out all printk buffers during panic().

So remove the 0x100 path from die, and move it into system_reset. Name
the crash/dump reasons "System Reset".

This gives "not tained" traces when crashing an untainted kernel. It
also gives the panic reason as "System Reset" as opposed to "Fatal
exception in interrupt" (or "die oops" for fadump).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# a4e89ffb 28-Jun-2017 Matt Weber <matthew.weber@rockwellcollins.com>

powerpc/e6500: Update machine check for L1D cache err

This patch updates the machine check handler of Linux kernel to
handle the e6500 architecture case. In e6500 core, L1 Data Cache Write
Shadow Mode (DCWS) register is not implemented but L1 data cache always
runs in write shadow mode. So, on L1 data cache parity errors, hardware
will automatically invalidate the data cache but will still log a
machine check interrupt.

Signed-off-by: Ronak Desai <ronak.desai@rockwellcollins.com>
Signed-off-by: Matthew Weber <matthew.weber@rockwellcollins.com>
Signed-off-by: Scott Wood <oss@buserror.net>


# 1c56cd8e 23-Aug-2017 Michael Ellerman <mpe@ellerman.id.au>

powerpc/oops: Use IS_ENABLED() for oops markers

Just because it looks less gross.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 2e82ca3c 23-Aug-2017 Michael Ellerman <mpe@ellerman.id.au>

powerpc/oops: Print the kernel's endian in the oops

Although the MSR tells you what endian you're in it's possible that
isn't the same endian the kernel was built for, and if that happens
you're usually having a very bad day. So print a marker to make
it 100% clear which endian the kernel was built for.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 72c0d9ee 23-Aug-2017 Michael Ellerman <mpe@ellerman.id.au>

powerpc/oops: Fix the oops markers to use pr_cont()

When we oops we print a few markers for significant config options
such as PREEMPT, SMP etc. Currently these appear on separate lines
because we're not using pr_cont() properly. Fix it.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# fbbcc3bb 08-Aug-2017 Christophe Leroy <christophe.leroy@c-s.fr>

powerpc/8xx: Remove SoftwareEmulation()

Since commit aa42c69c67f82 ("[POWERPC] Add support for FP emulation
for the e300c2 core"), program_check_exception() can be called for
math emulation. In that case, 'reason' is 0.

On the 8xx, there is a Software Emulation interrupt which is
called for all unimplemented and illegal instructions. This
interrupt calls SoftwareEmulation() which does almost the
same as program_check_exception() called with reason = 0.

The Software Emulation interrupt sets all reason bits to 0,
it is therefore possible to call program_check_exception()
directly from the interrupt handler.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# f70b1e8d 08-Aug-2017 Christophe Leroy <christophe.leroy@c-s.fr>

powerpc/8xx: Move 8xx machine check handlers into platforms/8xx

In the same spirit as what was done for 4xx and 44x, move
the 8xx machine check into platforms/8xx

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# d30a5a52 08-Aug-2017 Michael Ellerman <mpe@ellerman.id.au>

powerpc/traps: Use SRR1 defines for program check reasons

Currently we open code the reason codes for program checks. Instead use
the existing SRR1 defines.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# ccd3cd36 08-Aug-2017 Michael Ellerman <mpe@ellerman.id.au>

powerpc/mce: Move 64-bit machine check code into mce.c

We already have mce.c which is built for 64bit and contains other parts
of the machine check code, so move these bits in there too.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 7f3f819e 08-Aug-2017 Michael Ellerman <mpe@ellerman.id.au>

powerpc/traps: machine_check_generic() is only used on 32-bit

Make it clear that the fallback version of machine_check_generic() is
only used on 32-bit configs.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 42bff234 08-Aug-2017 Michael Ellerman <mpe@ellerman.id.au>

powerpc/traps: Inline get_mc_reason()

get_mc_reason() no longer provides (if it ever really did) any
meaningful abstraction, so remove it.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 0d0935b3 08-Aug-2017 Michael Ellerman <mpe@ellerman.id.au>

powerpc/4xx: Move machine_check_4xx() into platforms/4xx

Now that we have 4xx platform directory we can move the 4xx machine
check handler in there. Again we drop get_mc_reason() and replace it
with regs->dsisr directly (which is actually SPRN_ESR).

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 5b4e2857 08-Aug-2017 Michael Ellerman <mpe@ellerman.id.au>

powerpc/44x: Move 44x machine check handlers into platforms/44x

We have several 44x machine check handlers defined in traps.c. It would
be preferable if they were split out with the platforms that use them.
Do that.

In the process, drop get_mc_reason() and instead just open code the
lookup of reason from regs->dsisr. This avoids a pointless layer of
abstraction.

We know to use regs->dsisr because 44x enables BOOKE which enables
PPC_ADV_DEBUG_REGS, and FSL_BOOKE is not enabled on 44x builds.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# ca41ad43 01-Aug-2017 Nicholas Piggin <npiggin@gmail.com>

powerpc: Add irq accounting for system reset interrupts

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# f886f0f6 01-Aug-2017 Nicholas Piggin <npiggin@gmail.com>

powerpc/64s: Fix mce accounting for powernv

On 64-bit Book3s, when we're in HV mode, we have already counted the
machine check exception in machine_check_early().

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Use IS_ENABLED() rather than an #ifdef]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 15770a13 29-Jun-2017 Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>

powerpc/64s: Blacklist functions invoked on a trap

Blacklist all functions involved while handling a trap. We:
- convert some of the symbols into private symbols, and
- blacklist most functions involved while handling a trap.

Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# d93b0ac0 18-Apr-2017 Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

powerpc/book3s/mce: Move add_taint() later in virtual mode

machine_check_early() gets called in real mode. The very first time when
add_taint() is called, it prints a warning which ends up calling opal
call (that uses OPAL_CALL wrapper) for writing it to console. If we get a
very first machine check while we are in opal we are doomed. OPAL_CALL
overwrites the PACASAVEDMSR in r13 and in this case when we are done with
MCE handling the original opal call will use this new MSR on it's way
back to opal_return. This usually leads to unexpected behaviour or the
kernel to panic. Instead move the add_taint() call later in the virtual
mode where it is safe to call.

This is broken with current FW level. We got lucky so far for not getting
very first MCE hit while in OPAL. But easily reproducible on Mambo.

Fixes: 27ea2c420cad ("powerpc: Set the correct kernel taint on machine check errors.")
Cc: stable@vger.kernel.org # v4.2+
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 2b4f3ac5 19-Dec-2016 Nicholas Piggin <npiggin@gmail.com>

powerpc: Mark system reset as an NMI with nmi_enter/exit()

System reset is a non-maskable interrupt from Linux's point of view
(occurs under local_irq_disable()), so it should use nmi_enter/exit.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# c4f3b52c 19-Dec-2016 Nicholas Piggin <npiggin@gmail.com>

powerpc/64s: Disallow system reset vs system reset reentrancy

In preparation for using a dedicated stack for system reset interrupts,
prevent a nested system reset from recovering, in order to simplify
code that is called in crash/debug path. This allows a system reset
interrupt to just use the base stack pointer.

Keep an in_nmi nesting counter similarly to the in_mce counter. Consider
the interrrupt non-recoverable if it is taken inside another system
reset.

Interrupt nesting could be allowed similarly to MCE, but system reset
is a special case that's not for normal operation, so simplicity wins
until there is requirement for nested system reset interrupts.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 9b7ff0c6 06-Apr-2017 Nicholas Piggin <npiggin@gmail.com>

powerpc/64s: Add SCV FSCR bit for ISA v3.0

Add the bit definition and use it in facility_unavailable_exception() so we can
intelligently report the cause if we take a fault for SCV. This doesn't actually
enable SCV.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Drop whitespace changes to the existing entries, flush out change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 794464f4 06-Apr-2017 Nicholas Piggin <npiggin@gmail.com>

powerpc/64s: Add msgp facility unavailable log string

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 7644d581 09-Feb-2017 Michael Ellerman <mpe@ellerman.id.au>

powerpc: Create asm/debugfs.h and move powerpc_debugfs_root there

powerpc_debugfs_root is the dentry representing the root of the
"powerpc" directory tree in debugfs.

Currently it sits in asm/debug.h, a long with some other things that
have "debug" in the name, but are otherwise unrelated.

Pull it out into a separate header, which also includes linux/debugfs.h,
and convert all the users to include debugfs.h instead of debug.h.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# b17b0153 08-Feb-2017 Ingo Molnar <mingo@kernel.org>

sched/headers: Prepare for new header dependencies before moving code to <linux/sched/debug.h>

We are going to split <linux/sched/debug.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/debug.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 7c0f6ba6 24-Dec-2016 Linus Torvalds <torvalds@linux-foundation.org>

Replace <asm/uaccess.h> with <linux/uaccess.h> globally

This was entirely automated, using the script by Al:

PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
$(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.

Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 93c2ec0f 29-Nov-2016 Balbir Singh <bsingharora@gmail.com>

powerpc Don't print misleading facility name in facility unavailable exception

The current facility_strings[] are correct when the trap address is
0xf80 (hypervisor facility unavailable). When the trap address is
0xf60 (facility unavailable) IC (Interruption Cause) a.k.a status in the
code is undefined for values 0 and 1.

Add a check to prevent printing the (misleading) facility name for IC 0
and 1 when we came in via 0xf60. In all cases, print the actual IC
value, to avoid any confusion.

This hasn't been seen on real hardware, on only qemu which was
misreporting an exception.

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
[mpe: Fix indentation, combine printks(), massage change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# da665885 29-Nov-2016 Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>

powerpc: Change places using CONFIG_KEXEC to use CONFIG_KEXEC_CORE instead.

Commit 2965faa5e03d ("kexec: split kexec_load syscall from kexec core
code") introduced CONFIG_KEXEC_CORE so that CONFIG_KEXEC means whether
the kexec_load system call should be compiled-in and CONFIG_KEXEC_FILE
means whether the kexec_file_load system call should be compiled-in.
These options can be set independently from each other.

Since until now powerpc only supported kexec_load, CONFIG_KEXEC and
CONFIG_KEXEC_CORE were synonyms. That is not the case anymore, so we
need to make a distinction. Almost all places where CONFIG_KEXEC was
being used should be using CONFIG_KEXEC_CORE instead, since
kexec_file_load also needs that code compiled in.

Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 6cc89bad 21-Nov-2016 Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>

powerpc/kprobes: Invoke handlers directly

Invoke the kprobe handlers directly rather than through notify_die(), to
reduce path taken for handling kprobes. Similar to commit 6f6343f53d13
("kprobes/x86: Call exception handlers directly from do_int3/do_debug").

While at it, rename post_kprobe_handler() to kprobe_post_handler() for
more uniform naming.

Reported-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 7458e8b2 08-Nov-2016 Nicholas Piggin <npiggin@gmail.com>

powerpc: Fix second nested oops hang

When ending an oops, don't clear die_owner unless the nest count
went to zero. This prevents a second nested oops from hanging forever
on the die_lock.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 6f44b20e 08-Nov-2016 Nicholas Piggin <npiggin@gmail.com>

powerpc: Fix graceful debugger recovery

When exiting xmon with 'x' (exit and recover), oops_begin bails
out immediately, but die then calls __die() and oops_end(), which
cause a lot of bad things to happen.

If the debugger was attached then went to graceful recovery, exit
from die() immediately.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 29a969b7 30-Oct-2016 Michael Neuling <mikey@neuling.org>

powerpc: Revert Load Monitor Register Support

Load monitored is no longer supported on POWER9 so let's remove the
code.

This reverts commit bd3ea317fddf ("powerpc: Load Monitor Register
Support").

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 61a92f70 13-Oct-2016 Nicholas Piggin <npiggin@gmail.com>

powerpc: Add support for relative exception tables

This halves the exception table size on 64-bit builds, and it allows
build-time sorting of exception tables to work on relocated kernels.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Minor asm fixups and bits to keep the selftests working]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 5d176f75 14-Sep-2016 Cyril Bur <cyrilbur@gmail.com>

powerpc: tm: Enable transactional memory (TM) lazily for userspace

Currently the MSR TM bit is always set if the hardware is TM capable.
This adds extra overhead as it means the TM SPRS (TFHAR, TEXASR and
TFAIR) must be swapped for each process regardless of if they use TM.

For processes that don't use TM the TM MSR bit can be turned off
allowing the kernel to avoid the expensive swap of the TM registers.

A TM unavailable exception will occur if a thread does use TM and the
kernel will enable MSR_TM and leave it so for some time afterwards.

Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 172f7aaa 14-Sep-2016 Cyril Bur <cyrilbur@gmail.com>

powerpc/tm: Add TM Unavailable Exception

If the kernel disables transactional memory (TM) and userspace still
tries TM related actions (TM instructions or TM SPR accesses) TM aware
hardware will cause the kernel to take a facility unavailable
exception.

Add checks for the exception being caused by illegal TM access in
userspace.

Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
[mpe: Rewrite comment entirely, bugs in it are mine]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# dc310669 23-Sep-2016 Cyril Bur <cyrilbur@gmail.com>

powerpc: tm: Always use fp_state and vr_state to store live registers

There is currently an inconsistency as to how the entire CPU register
state is saved and restored when a thread uses transactional memory
(TM).

Using transactional memory results in the CPU having duplicated
(almost) all of its register state. This duplication results in a set
of registers which can be considered 'live', those being currently
modified by the instructions being executed and another set that is
frozen at a point in time.

On context switch, both sets of state have to be saved and (later)
restored. These two states are often called a variety of different
things. Common terms for the state which only exists after the CPU has
entered a transaction (performed a TBEGIN instruction) in hardware are
'transactional' or 'speculative'.

Between a TBEGIN and a TEND or TABORT (or an event that causes the
hardware to abort), regardless of the use of TSUSPEND the
transactional state can be referred to as the live state.

The second state is often to referred to as the 'checkpointed' state
and is a duplication of the live state when the TBEGIN instruction is
executed. This state is kept in the hardware and will be rolled back
to on transaction failure.

Currently all the registers stored in pt_regs are ALWAYS the live
registers, that is, when a thread has transactional registers their
values are stored in pt_regs and the checkpointed state is in
ckpt_regs. A strange opposite is true for fp_state/vr_state. When a
thread is non transactional fp_state/vr_state holds the live
registers. When a thread has initiated a transaction fp_state/vr_state
holds the checkpointed state and transact_fp/transact_vr become the
structure which holds the live state (at this point it is a
transactional state).

This method creates confusion as to where the live state is, in some
circumstances it requires extra work to determine where to put the
live state and prevents the use of common functions designed (probably
before TM) to save the live state.

With this patch pt_regs, fp_state and vr_state all represent the
same thing and the other structures [pending rename] are for
checkpointed state.

Acked-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# e627f8dc 16-Sep-2016 Christophe Leroy <christophe.leroy@c-s.fr>

powerpc/8xx: add dedicated machine check handler

During a machine check, the 8xx provides indication of
whether the check is due to data or instruction access, so
let's display it.

Lets also move 8xx specific handling into the new handler.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>


# f307939f 05-Sep-2016 Christophe Leroy <christophe.leroy@c-s.fr>

powerpc/8xx: add system_reset_exception

When the watchdog is in NMI mode, the system reset interrupt is
generated when the watchdog counter expires.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>


# ddc6cd0d 17-May-2016 Christophe Leroy <christophe.leroy@c-s.fr>

powerpc32: Use instruction symbolic names in check_io_access()

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>


# 03465f89 16-Sep-2016 Nicholas Piggin <npiggin@gmail.com>

powerpc: Use kprobe blacklist for exception handlers

Currently we mark the C implementations of some exception handlers as
__kprobes. This has the effect of putting them in the ".kprobes.text"
section, which separates them from the rest of the text.

Instead we can use the blacklist macros to add the symbols to a
blacklist which kprobes will check. This allows the linker to move
exception handler functions close to callers and avoids trampolines in
larger kernels.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Reword change log a bit]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# f0f558b1 02-Sep-2016 Paul Mackerras <paulus@ozlabs.org>

powerpc/mm: Preserve CFAR value on SLB miss caused by access to bogus address

Currently, if userspace or the kernel accesses a completely bogus address,
for example with any of bits 46-59 set, we first take an SLB miss interrupt,
install a corresponding SLB entry with VSID 0, retry the instruction, then
take a DSI/ISI interrupt because there is no HPT entry mapping the address.
However, by the time of the second interrupt, the Come-From Address Register
(CFAR) has been overwritten by the rfid instruction at the end of the SLB
miss interrupt handler. Since bogus accesses can often be caused by a
function return after the stack has been overwritten, the CFAR value would
be very useful as it could indicate which function it was whose return had
led to the bogus address.

This patch adds code to create a full exception frame in the SLB miss handler
in the case of a bogus address, rather than inserting an SLB entry with a
zero VSID field. Then we call a new slb_miss_bad_addr() function in C code,
which delivers a signal for a user access or creates an oops for a kernel
access. In the latter case the oops message will show the CFAR value at the
time of the access.

In the case of the radix MMU, a segment miss interrupt indicates an access
outside the ranges mapped by the page tables. Previously this was handled
by the code for an unrecoverable SLB miss (one with MSR[RI] = 0), which is
not really correct. With this patch, we now handle these interrupts with
slb_miss_bad_addr(), which is much more consistent.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 8a39b05f 16-Aug-2016 Paul Gortmaker <paul.gortmaker@windriver.com>

powerpc: migrate exception table users off module.h and onto extable.h

These files were only including module.h for exception table
related functions. We've now separated that content out into its
own file "extable.h" so now move over to that and avoid all the
extra header content in module.h that we don't really need to compile
these files.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# bd3ea317 08-Jun-2016 Jack Miller <jack@codezen.org>

powerpc: Load Monitor Register Support

This enables new registers, LMRR and LMSER, that can trigger an EBB in
userspace code when a monitored load (via the new ldmx instruction)
loads memory from a monitored space. This facility is controlled by a
new FSCR bit, LM.

This patch disables the FSCR LM control bit on task init and enables
that bit when a load monitor facility unavailable exception is taken
for using it. On context switch, this bit is then used to determine
whether the two relevant registers are saved and restored. This is
done lazily for performance reasons.

Signed-off-by: Jack Miller <jack@codezen.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# b57bd2de 08-Jun-2016 Michael Neuling <mikey@neuling.org>

powerpc: Improve FSCR init and context switching

This fixes a few issues with FSCR init and switching.

In commit 152d523e6307 ("powerpc: Create context switch helpers
save_sprs() and restore_sprs()") we moved the setting of the FSCR
register from inside an CPU_FTR_ARCH_207S section to inside just a
CPU_FTR_ARCH_DSCR section. Hence we are setting FSCR on POWER6/7 where
the FSCR doesn't exist. This is harmless but we shouldn't do it.

Also, we can simplify the FSCR context switch. We don't need to go
through the calculation involving dscr_inherit. We can just restore
what we saved last time.

We also set an initial value in INIT_THREAD, so that pid 1 which is
cloned from that gets a sane value.

Based on patch by Jack Miller.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# fd7bacbc 14-May-2016 Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

KVM: PPC: Book3S HV: Fix TB corruption in guest exit path on HMI interrupt

When a guest is assigned to a core it converts the host Timebase (TB)
into guest TB by adding guest timebase offset before entering into
guest. During guest exit it restores the guest TB to host TB. This means
under certain conditions (Guest migration) host TB and guest TB can differ.

When we get an HMI for TB related issues the opal HMI handler would
try fixing errors and restore the correct host TB value. With no guest
running, we don't have any issues. But with guest running on the core
we run into TB corruption issues.

If we get an HMI while in the guest, the current HMI handler invokes opal
hmi handler before forcing guest to exit. The guest exit path subtracts
the guest TB offset from the current TB value which may have already
been restored with host value by opal hmi handler. This leads to incorrect
host and guest TB values.

With split-core, things become more complex. With split-core, TB also gets
split and each subcore gets its own TB register. When a hmi handler fixes
a TB error and restores the TB value, it affects all the TB values of
sibling subcores on the same core. On TB errors all the thread in the core
gets HMI. With existing code, the individual threads call opal hmi handle
independently which can easily throw TB out of sync if we have guest
running on subcores. Hence we will need to co-ordinate with all the
threads before making opal hmi handler call followed by TB resync.

This patch introduces a sibling subcore state structure (shared by all
threads in the core) in paca which holds information about whether sibling
subcores are in Guest mode or host mode. An array in_guest[] of size
MAX_SUBCORE_PER_CORE=4 is used to maintain the state of each subcore.
The subcore id is used as index into in_guest[] array. Only primary
thread entering/exiting the guest is responsible to set/unset its
designated array element.

On TB error, we get HMI interrupt on every thread on the core. Upon HMI,
this patch will now force guest to vacate the core/subcore. Primary
thread from each subcore will then turn off its respective bit
from the above bitmap during the guest exit path just after the
guest->host partition switch is complete.

All other threads that have just exited the guest OR were already in host
will wait until all other subcores clears their respective bit.
Once all the subcores turn off their respective bit, all threads will
will make call to opal hmi handler.

It is not necessary that opal hmi handler would resync the TB value for
every HMI interrupts. It would do so only for the HMI caused due to
TB errors. For rest, it would not touch TB value. Hence to make things
simpler, primary thread would call TB resync explicitly once for each
core immediately after opal hmi handler instead of subtracting guest
offset from TB. TB resync call will restore the TB with host value.
Thus we can be sure about the TB state.

One of the primary threads exiting the guest will take up the
responsibility of calling TB resync. It will use one of the top bits
(bit 63) from subcore state flags bitmap to make the decision. The first
primary thread (among the subcores) that is able to set the bit will
have to call the TB resync. Rest all other threads will wait until TB
resync is complete. Once TB resync is complete all threads will then
proceed.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>


# 42f5b4ca 17-May-2016 Daniel Axtens <dja@axtens.net>

powerpc: Introduce asm-prototypes.h

Sparse picked up a number of functions that are implemented in C and
then only referred to in asm code.

This introduces asm-prototypes.h, which provides a place for
prototypes of these functions.

This silences some sparse warnings.

Signed-off-by: Daniel Axtens <dja@axtens.net>
[mpe: Add include guards, clean up copyright & GPL text]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# e7df0d88 17-Mar-2016 Joonsoo Kim <iamjoonsoo.kim@lge.com>

powerpc: query dynamic DEBUG_PAGEALLOC setting

We can disable debug_pagealloc processing even if the code is compiled
with CONFIG_DEBUG_PAGEALLOC. This patch changes the code to query
whether it is enabled or not in runtime.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 446957ba 24-Feb-2016 Adam Buchbinder <adam.buchbinder@gmail.com>

powerpc: Fix misspellings in comments.

Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# a4c3f909 17-Feb-2016 Balbir Singh <bsingharora@gmail.com>

powerpc: Fix BUG_ON() reporting in real mode

I ran into this issue while debugging an early boot problem. The system
hit a BUG_ON() but report bug failed to print the line number and file
name. The reason being that the system was running in real mode and
report_bug() searches for addresses in the PAGE_OFFSET+ region.

Suggested-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# cdfc8ed6 18-Nov-2015 Rashmica Gupta <rashmicy@gmail.com>

powerpc: Remove unused function trace_syscall()

This function has been unused since commit 14cf11af6cf6 ("powerpc: Merge enough
to start building in arch/powerpc."), so remove it.

Signed-off-by: Rashmica Gupta <rashmicy@gmail.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Reviewed-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 27ea2c42 14-Jun-2015 Daniel Axtens <dja@axtens.net>

powerpc: Set the correct kernel taint on machine check errors.

This means the 'M' flag will work properly when the kernel prints a backtrace.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# c952c1c4 20-May-2015 Anshuman Khandual <khandual@linux.vnet.ibm.com>

powerpc: Fix handling of DSCR related facility unavailable exception

Currently DSCR (Data Stream Control Register) can be accessed with
mfspr or mtspr instructions inside a thread via two different SPR
numbers. One being the user accessible problem state SPR number 0x03
and the other being the privilege state SPR number 0x11. All access
through the privilege state SPR number get emulated through illegal
instruction exception. Any access through the problem state SPR number
raises one facility unavailable exception which sets the thread based
dscr_inherit bit and enables DSCR facility through FSCR register thus
allowing direct access to DSCR without going through this exception in
the future. We set the thread.dscr_inherit bit whether the access was
with mfspr or mtspr instruction which is neither correct nor does it
match the behaviour through the instruction emulation code path driven
from privilege state SPR number. User currently observes two different
kind of behaviour when accessing the DSCR through these two SPR numbers.
This problem can be observed through these two test cases by replacing
the privilege state SPR number with the problem state SPR number.

(1) http://ozlabs.org/~anton/junkcode/dscr_default_test.c
(2) http://ozlabs.org/~anton/junkcode/dscr_explicit_test.c

This patch fixes the problem by making sure that the behaviour visible
to the user remains the same irrespective of which SPR number is being
used. Inside facility unavailable exception, we check whether it was
cuased by a mfspr or a mtspr isntrucction. In case of mfspr instruction,
just emulate the instruction. In case of mtspr instruction, set the
thread based dscr_inherit bit and also enable the facility through FSCR.
All user SPR based mfspr instruction will be emulated till one user SPR
based mtspr has been executed.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 8aa989b8 26-Jan-2015 Michael Ellerman <mpe@ellerman.id.au>

powerpc: Remove some unused functions

Remove slice_set_psize() which is not used.

It was added in 3a8247cc2c85 "powerpc: Only demote individual slices
rather than whole process" but was never used.

Remove vsx_assist_exception() which is not used.

It was added in ce48b2100785 "powerpc: Add VSX context save/restore,
ptrace and signal support" but was never used.

Remove generic_mach_cpu_die() which is not used.

Its last caller was removed in 375f561a4131 "powerpc/powernv: Always go
into nap mode when CPU is offline".

Remove mpc7448_hpc2_power_off() and mpc7448_hpc2_halt() which are
unused.

These were introduced in c5d56332fd6c "[POWERPC] Add general support for
mpc7448hpc2 (Taiga) platform" but were never used.

This was partially found by using a static code analysis program called
cppcheck.

Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
[mpe: Update changelog with details on when/why they are unused]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 69111bac 21-Oct-2014 Christoph Lameter <cl@linux.com>

powerpc: Replace __get_cpu_var uses

This still has not been merged and now powerpc is the only arch that does
not have this change. Sorry about missing linuxppc-dev before.

V2->V2
- Fix up to work against 3.18-rc1

__get_cpu_var() is used for multiple purposes in the kernel source. One of
them is address calculation via the form &__get_cpu_var(x). This calculates
the address for the instance of the percpu variable of the current processor
based on an offset.

Other use cases are for storing and retrieving data from the current
processors percpu area. __get_cpu_var() can be used as an lvalue when
writing data or on the right side of an assignment.

__get_cpu_var() is defined as :

__get_cpu_var() always only does an address determination. However, store
and retrieve operations could use a segment prefix (or global register on
other platforms) to avoid the address calculation.

this_cpu_write() and this_cpu_read() can directly take an offset into a
percpu area and use optimized assembly code to read and write per cpu
variables.

This patch converts __get_cpu_var into either an explicit address
calculation using this_cpu_ptr() or into a use of this_cpu operations that
use the offset. Thereby address calculations are avoided and less registers
are used when code is generated.

At the end of the patch set all uses of __get_cpu_var have been removed so
the macro is removed too.

The patch set includes passes over all arches as well. Once these operations
are used throughout then specialized macros can be defined in non -x86
arches as well in order to optimize per cpu access by f.e. using a global
register that may be set to the per cpu base.

Transformations done to __get_cpu_var()

1. Determine the address of the percpu instance of the current processor.

DEFINE_PER_CPU(int, y);
int *x = &__get_cpu_var(y);

Converts to

int *x = this_cpu_ptr(&y);

2. Same as #1 but this time an array structure is involved.

DEFINE_PER_CPU(int, y[20]);
int *x = __get_cpu_var(y);

Converts to

int *x = this_cpu_ptr(y);

3. Retrieve the content of the current processors instance of a per cpu
variable.

DEFINE_PER_CPU(int, y);
int x = __get_cpu_var(y)

Converts to

int x = __this_cpu_read(y);

4. Retrieve the content of a percpu struct

DEFINE_PER_CPU(struct mystruct, y);
struct mystruct x = __get_cpu_var(y);

Converts to

memcpy(&x, this_cpu_ptr(&y), sizeof(x));

5. Assignment to a per cpu variable

DEFINE_PER_CPU(int, y)
__get_cpu_var(y) = x;

Converts to

__this_cpu_write(y, x);

6. Increment/Decrement etc of a per cpu variable

DEFINE_PER_CPU(int, y);
__get_cpu_var(y)++

Converts to

__this_cpu_inc(y)

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
Signed-off-by: Christoph Lameter <cl@linux.com>
[mpe: Fix build errors caused by set/or_softirq_pending(), and rework
assignment in __set_breakpoint() to use memcpy().]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 23f66e2d 27-Aug-2014 Tejun Heo <tj@kernel.org>

Revert "powerpc: Replace __get_cpu_var uses"

This reverts commit 5828f666c069af74e00db21559f1535103c9f79a due to
build failure after merging with pending powerpc changes.

Link: http://lkml.kernel.org/g/20140827142243.6277eaff@canb.auug.org.au

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 5828f666 16-Aug-2014 Christoph Lameter <cl@linux.com>

powerpc: Replace __get_cpu_var uses

__get_cpu_var() is used for multiple purposes in the kernel source. One of
them is address calculation via the form &__get_cpu_var(x). This calculates
the address for the instance of the percpu variable of the current processor
based on an offset.

Other use cases are for storing and retrieving data from the current
processors percpu area. __get_cpu_var() can be used as an lvalue when
writing data or on the right side of an assignment.

__get_cpu_var() is defined as :

#define __get_cpu_var(var) (*this_cpu_ptr(&(var)))

__get_cpu_var() always only does an address determination. However, store
and retrieve operations could use a segment prefix (or global register on
other platforms) to avoid the address calculation.

this_cpu_write() and this_cpu_read() can directly take an offset into a
percpu area and use optimized assembly code to read and write per cpu
variables.

This patch converts __get_cpu_var into either an explicit address
calculation using this_cpu_ptr() or into a use of this_cpu operations that
use the offset. Thereby address calculations are avoided and less registers
are used when code is generated.

At the end of the patch set all uses of __get_cpu_var have been removed so
the macro is removed too.

The patch set includes passes over all arches as well. Once these operations
are used throughout then specialized macros can be defined in non -x86
arches as well in order to optimize per cpu access by f.e. using a global
register that may be set to the per cpu base.

Transformations done to __get_cpu_var()

1. Determine the address of the percpu instance of the current processor.

DEFINE_PER_CPU(int, y);
int *x = &__get_cpu_var(y);

Converts to

int *x = this_cpu_ptr(&y);

2. Same as #1 but this time an array structure is involved.

DEFINE_PER_CPU(int, y[20]);
int *x = __get_cpu_var(y);

Converts to

int *x = this_cpu_ptr(y);

3. Retrieve the content of the current processors instance of a per cpu
variable.

DEFINE_PER_CPU(int, y);
int x = __get_cpu_var(y)

Converts to

int x = __this_cpu_read(y);

4. Retrieve the content of a percpu struct

DEFINE_PER_CPU(struct mystruct, y);
struct mystruct x = __get_cpu_var(y);

Converts to

memcpy(&x, this_cpu_ptr(&y), sizeof(x));

5. Assignment to a per cpu variable

DEFINE_PER_CPU(int, y)
__get_cpu_var(y) = x;

Converts to

__this_cpu_write(y, x);

6. Increment/Decrement etc of a per cpu variable

DEFINE_PER_CPU(int, y);
__get_cpu_var(y)++

Converts to

__this_cpu_inc(y)

tj: Folded a fix patch.
http://lkml.kernel.org/g/alpine.DEB.2.11.1408172143020.9652@gentwo.org

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>


# 0869b6fd 29-Jul-2014 Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

powerpc/book3s: Add basic infrastructure to handle HMI in Linux.

Handle Hypervisor Maintenance Interrupt (HMI) in Linux. This patch implements
basic infrastructure to handle HMI in Linux host. The design is to invoke
opal handle hmi in real mode for recovery and set irq_pending when we hit HMI.
During check_irq_replay pull opal hmi event and print hmi info on console.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# c1528339 17-Jun-2014 Wladislav Wiebe <wladislav.kw@gmail.com>

powerpc/traps/e500: fix misleading error output

In machine_check_e500 exception handler is a wrong indication
in case of MCSR_BUS_WBERR - so print "Write" instead of "Read".

Signed-off-by: Wladislav Wiebe <wladislav.kw@gmail.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>


# e6654d5b 11-Jun-2014 Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

powerpc/book3s: Increment the mce counter during machine_check_early call.

We don't see MCE counter getting increased in /proc/interrupts which gives
false impression of no MCE occurred even when there were MCE events.
The machine check early handling was added for PowerKVM and we missed to
increment the MCE count in the early handler.

We also increment mce counters in the machine_check_exception call, but
in most cases where we handle the error hypervisor never reaches there
unless its fatal and we want to crash. Only during fatal situation we may
see double increment of mce count. We need to fix that. But for
now it always good to have some count increased instead of zero.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# f83319d7 28-Mar-2014 Anton Blanchard <anton@samba.org>

powerpc: Add lq/stq emulation

Recent CPUs support quad word load and store instructions. Add
support to the alignment handler for them.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# ee4ed6fa 14-Mar-2014 Michael Neuling <mikey@neuling.org>

powerpc: Rate-limit users spamming kernel log buffer

The facility unavailable exception can be triggered from userspace by
accessing PMU registers when EBB is not enabled. This causes the
included pr_err() to run, hence spamming the kernel log buffer.

This avoids this by rate limiting these messages.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 3ac8ff1c 12-Jan-2014 Paul Mackerras <paulus@samba.org>

powerpc: Fix transactional FP/VMX/VSX unavailable handlers

Currently, if a process starts a transaction and then takes an
exception because the FPU, VMX or VSX unit is unavailable to it,
we end up corrupting any FP/VMX/VSX state that was valid before
the interrupt. For example, if the process starts a transaction
with the FPU available to it but VMX unavailable, and then does
a VMX instruction inside the transaction, the FP state gets
corrupted.

Loading up the desired state generally involves doing a reclaim
and a recheckpoint. To avoid corrupting already-valid state, we have
to be careful not to reload that state from the thread_struct
between the reclaim and the recheckpoint (since the thread_struct
values are stale by now), and we have to reload that state from
the transact_fp/vr arrays after the recheckpoint to get back the
current transactional values saved there by the reclaim.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# d31626f7 12-Jan-2014 Paul Mackerras <paulus@samba.org>

powerpc: Don't corrupt transactional state when using FP/VMX in kernel

Currently, when we have a process using the transactional memory
facilities on POWER8 (that is, the processor is in transactional
or suspended state), and the process enters the kernel and the
kernel then uses the floating-point or vector (VMX/Altivec) facility,
we end up corrupting the user-visible FP/VMX/VSX state. This
happens, for example, if a page fault causes a copy-on-write
operation, because the copy_page function will use VMX to do the
copy on POWER8. The test program below demonstrates the bug.

The bug happens because when FP/VMX state for a transactional process
is stored in the thread_struct, we store the checkpointed state in
.fp_state/.vr_state and the transactional (current) state in
.transact_fp/.transact_vr. However, when the kernel wants to use
FP/VMX, it calls enable_kernel_fp() or enable_kernel_altivec(),
which saves the current state in .fp_state/.vr_state. Furthermore,
when we return to the user process we return with FP/VMX/VSX
disabled. The next time the process uses FP/VMX/VSX, we don't know
which set of state (the current register values, .fp_state/.vr_state,
or .transact_fp/.transact_vr) we should be using, since we have no
way to tell if we are still in the same transaction, and if not,
whether the previous transaction succeeded or failed.

Thus it is necessary to strictly adhere to the rule that if FP has
been enabled at any point in a transaction, we must keep FP enabled
for the user process with the current transactional state in the
FP registers, until we detect that it is no longer in a transaction.
Similarly for VMX; once enabled it must stay enabled until the
process is no longer transactional.

In order to keep this rule, we add a new thread_info flag which we
test when returning from the kernel to userspace, called TIF_RESTORE_TM.
This flag indicates that there is FP/VMX/VSX state to be restored
before entering userspace, and when it is set the .tm_orig_msr field
in the thread_struct indicates what state needs to be restored.
The restoration is done by restore_tm_state(). The TIF_RESTORE_TM
bit is set by new giveup_fpu/altivec_maybe_transactional helpers,
which are called from enable_kernel_fp/altivec, giveup_vsx, and
flush_fp/altivec_to_thread instead of giveup_fpu/altivec.

The other thing to be done is to get the transactional FP/VMX/VSX
state from .fp_state/.vr_state when doing reclaim, if that state
has been saved there by giveup_fpu/altivec_maybe_transactional.
Having done this, we set the FP/VMX bit in the thread's MSR after
reclaim to indicate that that part of the state is now valid
(having been reclaimed from the processor's checkpointed state).

Finally, in the signal handling code, we move the clearing of the
transactional state bits in the thread's MSR a bit earlier, before
calling flush_fp_to_thread(), so that we don't unnecessarily set
the TIF_RESTORE_TM bit.

This is the test program:

/* Michael Neuling 4/12/2013
*
* See if the altivec state is leaked out of an aborted transaction due to
* kernel vmx copy loops.
*
* gcc -m64 htm_vmxcopy.c -o htm_vmxcopy
*
*/

/* We don't use all of these, but for reference: */

int main(int argc, char *argv[])
{
long double vecin = 1.3;
long double vecout;
unsigned long pgsize = getpagesize();
int i;
int fd;
int size = pgsize*16;
char tmpfile[] = "/tmp/page_faultXXXXXX";
char buf[pgsize];
char *a;
uint64_t aborted = 0;

fd = mkstemp(tmpfile);
assert(fd >= 0);

memset(buf, 0, pgsize);
for (i = 0; i < size; i += pgsize)
assert(write(fd, buf, pgsize) == pgsize);

unlink(tmpfile);

a = mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0);
assert(a != MAP_FAILED);

asm __volatile__(
"lxvd2x 40,0,%[vecinptr] ; " // set 40 to initial value
TBEGIN
"beq 3f ;"
TSUSPEND
"xxlxor 40,40,40 ; " // set 40 to 0
"std 5, 0(%[map]) ;" // cause kernel vmx copy page
TABORT
TRESUME
TEND
"li %[res], 0 ;"
"b 5f ;"
"3: ;" // Abort handler
"li %[res], 1 ;"
"5: ;"
"stxvd2x 40,0,%[vecoutptr] ; "
: [res]"=r"(aborted)
: [vecinptr]"r"(&vecin),
[vecoutptr]"r"(&vecout),
[map]"r"(a)
: "memory", "r0", "r3", "r4", "r5", "r6", "r7");

if (aborted && (vecin != vecout)){
printf("FAILED: vector state leaked on abort %f != %f\n",
(double)vecin, (double)vecout);
exit(1);
}

munmap(a, size);

close(fd);

printf("PASSED!\n");
return 0;
}

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 4c703416 30-Oct-2013 Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

powerpc/book3s: Introduce a early machine check hook in cpu_spec.

This patch adds the early machine check function pointer in cputable for
CPU specific early machine check handling. The early machine handle routine
will be called in real mode to handle SLB and TLB errors. We can not reuse
the existing machine_check hook because it is always invoked in kernel
virtual mode and we would already be in trouble if we get SLB or TLB errors.
This patch just sets up a mechanism to invoke CPU specific handler. The
subsequent patches will populate the function pointer.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 1e9b4507 30-Oct-2013 Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

powerpc/book3s: handle machine check in Linux host.

Move machine check entry point into Linux. So far we were dependent on
firmware to decode MCE error details and handover the high level info to OS.

This patch introduces early machine check routine that saves the MCE
information (srr1, srr0, dar and dsisr) to the emergency stack. We allocate
stack frame on emergency stack and set the r1 accordingly. This allows us to be
prepared to take another exception without loosing context. One thing to note
here that, if we get another machine check while ME bit is off then we risk a
checkstop. Hence we restrict ourselves to save only MCE information and
register saved on PACA_EXMC save are before we turn the ME bit on. We use
paca->in_mce flag to differentiate between first entry and nested machine check
entry which helps proper use of emergency stack. We increment paca->in_mce
every time we enter in early machine check handler and decrement it while
leaving. When we enter machine check early handler first time (paca->in_mce ==
0), we are sure nobody is using MC emergency stack and allocate a stack frame
at the start of the emergency stack. During subsequent entry (paca->in_mce >
0), we know that r1 points inside emergency stack and we allocate separate
stack frame accordingly. This prevents us from clobbering MCE information
during nested machine checks.

The early machine check handler changes are placed under CPU_FTR_HVMODE
section. This makes sure that the early machine check handler will get executed
only in hypervisor kernel.

This is the code flow:

Machine Check Interrupt
|
V
0x200 vector ME=0, IR=0, DR=0
|
V
+-----------------------------------------------+
|machine_check_pSeries_early: | ME=0, IR=0, DR=0
| Alloc frame on emergency stack |
| Save srr1, srr0, dar and dsisr on stack |
+-----------------------------------------------+
|
(ME=1, IR=0, DR=0, RFID)
|
V
machine_check_handle_early ME=1, IR=0, DR=0
|
V
+-----------------------------------------------+
| machine_check_early (r3=pt_regs) | ME=1, IR=0, DR=0
| Things to do: (in next patches) |
| Flush SLB for SLB errors |
| Flush TLB for TLB errors |
| Decode and save MCE info |
+-----------------------------------------------+
|
(Fall through existing exception handler routine.)
|
V
machine_check_pSerie ME=1, IR=0, DR=0
|
(ME=1, IR=1, DR=1, RFID)
|
V
machine_check_common ME=1, IR=1, DR=1
.
.
.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# a3821b2a 28-Oct-2013 Scott Wood <scottwood@freescale.com>

powerpc: Fix PPC_EMULATED_STATS build break with sync patch

Commit 9863c28a2af90a56c088f5f6288d7f6d2c923c14 ("powerpc: Emulate sync
instruction variants") introduced a build breakage with
CONFIG_PPC_EMULATED_STATS enabled.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Cc: Kumar Gala <galak@kernel.org>
Cc: James Yang <James.Yang@freescale.com>
---


# 1eb2819d 28-Aug-2013 LEROY Christophe <christophe.leroy@c-s.fr>

powerpc/mpc8xx: Clearer Oops message for Software Emulation Exception

This patch modifies the Oops message in case of Software Emulation Exception.
The existing message is quite confusing because it refers to FPU Emulation
while most often the issue is due to either a non supported instruction
(not necessarily FPU related) or a stale instruction due to HW issues.
The new message tries to be more generic in order to make the user understand
that the Oops is due to something wrong with an instruction, not necessarily
due to an FPU instruction.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>


# 51ae8d4a 04-Jul-2013 Bharat Bhushan <r65777@freescale.com>

powerpc: move debug registers in a structure

This way we can use same data type struct with KVM and
also help in using other debug related function.

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Acked-by: Michael Neuling <mikey@neuling.org>
[scottwood@freescale.com: removed obvious debug_reg comment]
Signed-off-by: Scott Wood <scottwood@freescale.com>


# 95791988 25-Jun-2013 Bharat Bhushan <r65777@freescale.com>

powerpc: move debug registers in a structure

This way we can use same data type struct with KVM and
also help in using other debug related function.

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>


# 9863c28a 03-Jul-2013 James Yang <James.Yang@freescale.com>

powerpc: Emulate sync instruction variants

Reserved fields of the sync instruction have been used for other
instructions (e.g. lwsync). On processors that do not support variants
of the sync instruction, emulate it by executing a sync to subsume the
effect of the intended instruction.

Signed-off-by: James Yang <James.Yang@freescale.com>
[scottwood@freescale.com: whitespace and subject line fix]
Signed-off-by: Scott Wood <scottwood@freescale.com>


# de79f7b9 10-Sep-2013 Paul Mackerras <paulus@samba.org>

powerpc: Put FP/VSX and VR state into structures

This creates new 'thread_fp_state' and 'thread_vr_state' structures
to store FP/VSX state (including FPSCR) and Altivec/VSX state
(including VSCR), and uses them in the thread_struct. In the
thread_fp_state, the FPRs and VSRs are represented as u64 rather
than double, since we rarely perform floating-point computations
on the values, and this will enable the structures to be used
in KVM code as well. Similarly FPSCR is now a u64 rather than
a structure of two 32-bit values.

This takes the offsets out of the macros such as SAVE_32FPRS,
REST_32FPRS, etc. This enables the same macros to be used for normal
and transactional state, enabling us to delete the transactional
versions of the macros. This also removes the unused do_load_up_fpu
and do_load_up_altivec, which were in fact buggy since they didn't
create large enough stack frames to account for the fact that
load_up_fpu and load_up_altivec are not designed to be called from C
and assume that their caller's stack frame is an interrupt frame.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# bc683a7e 25-Aug-2013 Michael Neuling <mikey@neuling.org>

powerpc: Cleanup handling of the DSCR bit in the FSCR register

As suggested by paulus we can simplify the Data Stream Control Register
(DSCR) Facility Status and Control Register (FSCR) handling.

Firstly, we simplify the asm by using a rldimi.

Secondly, we now use the FSCR only to control the DSCR facility, rather
than both the FSCR and HFSCR. Users will see no functional change from
this but will get a minor speedup as they will trap into the kernel only
once (rather than twice) when they first touch the DSCR. Also, this
changes removes a bunch of ugly FTR_SECTION code.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# b3f6a459 14-Aug-2013 Michael Ellerman <michael@ellerman.id.au>

powerpc: Skip emulating & leave interrupts off for kernel program checks

In the program check handler we handle some causes with interrupts off
and others with interrupts on.

We need to enable interrupts to handle the emulation cases, because they
access userspace memory and might sleep.

For faults in the kernel we don't want to do any emulation, and
emulate_instruction() enforces that. do_mathemu() doesn't but probably
should.

The other disadvantage of enabling interrupts for kernel faults is that
we may take another interrupt, and recurse. As seen below:

--- Exception: e40 at c000000000004ee0 performance_monitor_relon_pSeries_1
[link register ] c00000000000f858 .arch_local_irq_restore+0x38/0x90
[c000000fb185dc10] 0000000000000000 (unreliable)
[c000000fb185dc80] c0000000007d8558 .program_check_exception+0x298/0x2d0
[c000000fb185dd00] c000000000002f40 emulation_assist_common+0x140/0x180
--- Exception: e40 at c000000000004ee0 performance_monitor_relon_pSeries_1
[link register ] c00000000000f858 .arch_local_irq_restore+0x38/0x90
[c000000fb185dff0] 00000000008b9190 (unreliable)
[c000000fb185e060] c0000000007d8558 .program_check_exception+0x298/0x2d0

So avoid both problems by checking if the fault was in the kernel and
skipping the enable of interrupts and the emulation. Go straight to
delivering the SIGILL, which for kernel faults calls die() and so on,
dropping us in the debugger etc.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 4288e343 06-Aug-2013 Anton Blanchard <anton@samba.org>

powerpc: Emulate instructions in little endian mode

Alistair noticed we got a SIGILL on userspace mfpvr instructions.

Remove the little endian check in the emulation code, it is
probably there to protect against the old pseudo little endian
implementations but doesn't make sense for real little endian.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 3a3b5aa6 14-Jul-2013 Kevin Hao <haokexin@gmail.com>

powerpc: Introduce function emulate_math()

There are two invocations of do_mathemu() in traps.c. And the codes
in these two places are almost the same. Introduce a locale function
to eliminate the duplication. With this change we can also make sure
that in program_check_exception() the PPC_WARN_EMULATED is invoked for
the correctly emulated math instructions.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 6761ee3d 14-Jul-2013 Kevin Hao <haokexin@gmail.com>

powerpc/math-emu: Move the flush FPU state function into do_mathemu

By doing this we can make sure that the FPU state is only flushed to
the thread struct when it is really needed.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 2517617e 09-Aug-2013 Michael Neuling <mikey@neuling.org>

powerpc: Fix context switch DSCR on POWER8

POWER8 allows the DSCR to be accessed directly from userspace via a new SPR
number 0x3 (Rather than 0x11. DSCR SPR number 0x11 is still used on POWER8 but
like POWER7, is only accessible in HV and OS modes). Currently, we allow this
by setting H/FSCR DSCR bit on boot.

Unfortunately this doesn't work, as the kernel needs to see the DSCR change so
that it knows to no longer restore the system wide version of DSCR on context
switch (ie. to set thread.dscr_inherit).

This clears the H/FSCR DSCR bit initially. If a process then accesses the DSCR
(via SPR 0x3), it'll trap into the kernel where we set thread.dscr_inherit in
facility_unavailable_exception().

We also change _switch() so that we set or clear the H/FSCR DSCR bit based on
the thread.dscr_inherit.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Cc: <stable@vger.kernel.org> [v3.10]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 4e0e3435 27-Apr-2013 Hongtao Jia <hongtao.jia@freescale.com>

powerpc/85xx: Add machine check handler to fix PCIe erratum on mpc85xx

A PCIe erratum of mpc85xx may causes a core hang when a link of PCIe
goes down. when the link goes down, Non-posted transactions issued
via the ATMU requiring completion result in an instruction stall.
At the same time a machine-check exception is generated to the core
to allow further processing by the handler. We implements the handler
which skips the instruction caused the stall.

This patch depends on patch:
powerpc/85xx: Add platform_device declaration to fsl_pci.h

Signed-off-by: Zhao Chenhui <b35336@freescale.com>
Signed-off-by: Li Yang <leoli@freescale.com>
Signed-off-by: Liu Shuo <soniccat.liu@gmail.com>
Signed-off-by: Jia Hongtao <hongtao.jia@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>


# b14b6260 25-Jun-2013 Michael Ellerman <michael@ellerman.id.au>

powerpc: Wire up the HV facility unavailable exception

Similar to the facility unavailble exception, except the facilities are
controlled by HFSCR.

Adapt the facility_unavailable_exception() so it can be called for
either the regular or Hypervisor facility unavailable exceptions.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
CC: <stable@vger.kernel.org> [v3.10]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 021424a1 25-Jun-2013 Michael Ellerman <michaele@au1.ibm.com>

powerpc: Rename and flesh out the facility unavailable exception handler

The exception at 0xf60 is not the TM (Transactional Memory) unavailable
exception, it is the "Facility Unavailable Exception", rename it as
such.

Flesh out the handler to acknowledge the fact that it can be called for
many reasons, one of which is TM being unavailable.

Use STD_EXCEPTION_COMMON() for the exception body, for some reason we
had it open-coded, I've checked the generated code is identical.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
CC: <stable@vger.kernel.org> [v3.10]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 80aa0fb4 25-Jun-2013 James Yang <James.Yang@freescale.com>

powerpc: Fix string instr. emulation for 32-bit processes on ppc64

String instruction emulation would erroneously result in a segfault if
the upper bits of the EA are set and is so high that it fails access
check. Truncate the EA to 32 bits if the process is 32-bit.

Signed-off-by: James Yang <James.Yang@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 968219fa 09-Jun-2013 Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc/8xx: Remove 8xx specific "minimal FPU emulation"

This is duplicated code from math-emu and implements such a small
subset of the FPU (load/stores/fmr) that it's essentially pointless
nowdays.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 4e63f8ed 09-Jun-2013 Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc/math-emu: Allow math-emu to be used for HW FPU

(Including 64-bit ones)

This allow SW emulation by the kernel of optional instructions
such as fsqrt which aren't implemented on some processors, and
thus fixes some Fedora 19 issues such as Anaconda since the
compiler is set to generate those by default on 64-bit.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# bf593907 14-Jun-2013 Paul Mackerras <paulus@samba.org>

powerpc: Fix emulation of illegal instructions on PowerNV platform

Normally, the kernel emulates a few instructions that are unimplemented
on some processors (e.g. the old dcba instruction), or privileged (e.g.
mfpvr). The emulation of unimplemented instructions is currently not
working on the PowerNV platform. The reason is that on these machines,
unimplemented and illegal instructions cause a hypervisor emulation
assist interrupt, rather than a program interrupt as on older CPUs.
Our vector for the emulation assist interrupt just calls
program_check_exception() directly, without setting the bit in SRR1
that indicates an illegal instruction interrupt. This fixes it by
making the emulation assist interrupt set that bit before calling
program_check_interrupt(). With this, old programs that use no-longer
implemented instructions such as dcba now work again.

CC: <stable@vger.kernel.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 6ce6c629 26-May-2013 Michael Neuling <mikey@neuling.org>

powerpc/tm: Abort on emulation and alignment faults

If we are emulating an instruction inside an active user transaction that
touches memory, the kernel can't emulate it as it operates in transactional
suspend context. We need to abort these transactions and send them back to
userspace for the hardware to rollback.

We can service these if the user transaction is in suspend mode, since the
kernel will operate in the same suspend context.

This adds a check to all alignment faults and to specific instruction
emulations (only string instructions for now). If the user process is in an
active (non-suspended) transaction, we abort the transaction go back to
userspace allowing the HW to roll back the transaction and tell the user of the
failure. This also adds new tm abort cause codes to report the reason of the
persistent error to the user.

Crappy test case here http://neuling.org/devel/junkcode/aligntm.c

Signed-off-by: Michael Neuling <mikey@neuling.org>
Cc: <stable@vger.kernel.org> # v3.9
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# ba12eede 13-May-2013 Li Zhong <zhong@linux.vnet.ibm.com>

powerpc: Exception hooks for context tracking subsystem

This is the exception hooks for context tracking subsystem, including
data access, program check, single step, instruction breakpoint, machine check,
alignment, fp unavailable, altivec assist, unknown exception, whose handlers
might use RCU.

This patch corresponds to
[PATCH] x86: Exception hooks for userspace RCU extended QS
commit 6ba3c97a38803883c2eee489505796cb0a727122

But after the exception handling moved to generic code, and some changes in
following two commits:
56dd9470d7c8734f055da2a6bac553caf4a468eb
context_tracking: Move exception handling to generic code
6c1e0256fad84a843d915414e4b5973b7443d48d
context_tracking: Restore correct previous context state on exception exit

it is able for exception hooks to use the generic code above instead of a
redundant arch implementation.

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 73d2fb75 01-May-2013 Anton Blanchard <anton@samba.org>

powerpc: Emulate non privileged DSCR read and write

POWER8 allows read and write of the DSCR in userspace. We added
kernel emulation so applications could always use the instructions
regardless of the CPU type.

Unfortunately there are two SPRs for the DSCR and we only added
emulation for the privileged one. Add code to match the non
privileged one.

A simple test was created to verify the fix:

http://ozlabs.org/~anton/junkcode/user_dscr_test.c

Without the patch we get a SIGILL and it passes with the patch.

Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: <stable@kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# bc2a9408 13-Feb-2013 Michael Neuling <mikey@neuling.org>

powerpc: Hook in new transactional memory code

This hooks the new transactional memory code into context switching, FP/VMX/VMX
unavailable and exception return.

Signed-off-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# f54db641 13-Feb-2013 Michael Neuling <mikey@neuling.org>

powerpc: Routines for FP/VSX/VMX unavailable during a transaction

We do lazy FP but not lazy TM (ie. userspace starts with MSR TM=1 FP=0). Hence
if userspace does an FP instruction during a transaction, we'll take an
fp unavailable exception.

This adds functions needed to handle this case. We have to inject the current
FP state into the checkpoint so that the hardware can decide what to do with
the transaction. We can't inject only the FP so we have to do a full treclaim
and recheckpoint to inject just the FP state. This will cause the transaction
to be marked as aborted by the hardware.

This just add the routines needed to do this for FP, VMX and VSX. It doesn't
hook them into the rest of the code yet.

Signed-off-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# d0c0c9a1 13-Feb-2013 Michael Neuling <mikey@neuling.org>

powerpc: Add transactional memory unavaliable execption handler

These should never happen since we always turn on MSR TM when in userspace. We
don't do lazy TM.

Hence if we hit this, we barf and kill the task as something's gone horribly
wrong.

Signed-off-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 8b3c34cf 13-Feb-2013 Michael Neuling <mikey@neuling.org>

powerpc: New macros for transactional memory support

This adds new macros for saving and restoring checkpointed architected state
from and to the thread_struct.

It also adds some debugging macros for when your brain explodes trying to debug
your transactional memory enabled kernel.

Signed-off-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 373d4d09 20-Jan-2013 Rusty Russell <rusty@rustcorp.com.au>

taint: add explicit flag to show whether lock dep is still OK.

Fix up all callers as they were before, with make one change: an
unsigned module taints the kernel, but doesn't turn off lockdep.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# 9422de3e 20-Dec-2012 Michael Neuling <mikey@neuling.org>

powerpc: Hardware breakpoints rewrite to handle non DABR breakpoint registers

This is a rewrite so that we don't assume we are using the DABR throughout the
code. We now use the arch_hw_breakpoint to store the breakpoint in a generic
manner in the thread_struct, rather than storing the raw DABR value.

The ptrace GET/SET_DEBUGREG interface currently passes the raw DABR in from
userspace. We keep this functionality, so that future changes (like the POWER8
DAWR), will still fake the DABR to userspace.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 1e18c17a 05-Oct-2012 Jason Gunthorpe <jgg@ziepe.ca>

powerpc: Enable the Watchdog vector for 405

The watchdog and FIT code has been #if 0'd for ever, if the CPU takes
an exception to either of those vectors it will jump into the middle
of the PIT or Data TLB code and surely crash.

At least some (all?) 405 cores have both the WDT and FIT
vectors defined, so lets have proper entry points for them.

Tested that the WDT vector works on a 405F6 core.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 00ca0de0 03-Sep-2012 Anton Blanchard <anton@samba.org>

powerpc: Keep thread.dscr and thread.dscr_inherit in sync

When we update the DSCR either via emulation of mtspr(DSCR) or via
a change to dscr_default in sysfs we don't update thread.dscr.
We will eventually update it at context switch time but there is
a period where thread.dscr is incorrect.

If we fork at this point we will copy the old value of thread.dscr
into the child. To avoid this, always keep thread.dscr in sync with
reality.

This issue was found with the following testcase:

http://ozlabs.org/~anton/junkcode/dscr_inherit_test.c

Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: <stable@kernel.org> # 3.0+
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 41ab5266 23-Aug-2012 Ananth N Mavinakayanahalli <ananth@in.ibm.com>

powerpc: Add trap_nr to thread_struct

Add thread_struct.trap_nr and use it to store the last exception
the thread experienced. In this patch, we populate the field at
various places where we force_sig_info() to the process.

This is also used in uprobes to determine if the probed instruction
caused an exception.

Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# a3512b2d 07-May-2012 Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc/irq: Make alignment & program interrupt behave the same

Alignment was the last user of the ENABLE_INTS macro, which we can
now remove. All non-syscall exceptions now disable interrupts on
entry, they get re-enabled conditionally from C code. Don't
unconditionally re-enable in program check either, check the
original context.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# ae3a197e 28-Mar-2012 David Howells <dhowells@redhat.com>

Disintegrate asm/system.h for PowerPC

Disintegrate asm/system.h for PowerPC.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
cc: linuxppc-dev@lists.ozlabs.org


# 9f2f79e3 29-Feb-2012 Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc: Disable interrupts in 64-bit kernel FP and vector faults

If we get a floating point, altivec or vsx unavaible interrupt in
kernel, we trigger a kernel error. There is no point preserving
the interrupt state, in fact, that can even make debugging harder
as the processor state might change (we may even preempt) between
taking the exception and landing in a debugger.

So just make those 3 disable interrupts unconditionally.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---

v2: On BookE only disable when hitting the kernel unavailable
path, otherwise it will fail to restore softe as
fast_exception_return doesn't do it.


# ebaeb5ae 15-Feb-2012 Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

fadump: Convert firmware-assisted cpu state dump data into elf notes.

When registered for firmware assisted dump on powerpc, firmware preserves
the registers for the active CPUs during a system crash. This patch reads
the cpu register data stored in Firmware-assisted dump format (except for
crashing cpu) and converts it into elf notes and updates the PT_NOTE program
header accordingly. The exact register state for crashing cpu is saved to
fadump crash info structure in scratch area during crash_fadump() and read
during second kernel boot.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 760ca4dc 29-Nov-2011 Anton Blanchard <anton@samba.org>

powerpc: Rework die()

Our die() code was based off a very old x86 version. Update it to
mirror the current x86 code.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 9b00ac06 29-Nov-2011 Anton Blanchard <anton@samba.org>

powerpc: Remove broken and complicated kdump system reset code

We have a lot of complicated logic that handles possible recursion between
kdump and a system reset exception. We can solve this in a much simpler
way using the same setjmp/longjmp tricks xmon does.

As a first step, this patch removes the old system reset code.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 58154c8c 29-Nov-2011 Anton Blanchard <anton@samba.org>

powerpc: Give us time to get all oopses out before panicking

I've been seeing truncated output when people send system reset info
to me. We should see a backtrace for every CPU, but the panic() code
takes the box down before they all make it out to the console. The
panic code runs unlocked so we also see corrupted console output.

If we are going to panic, then delay 1 second before calling into the
panic code. Move oops_exit inside the die lock and put a newline
between oopses for clarity.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# b95bc219 05-Oct-2011 Kumar Gala <galak@kernel.crashing.org>

powerpc: Remove extraneous CONFIG_PPC_ADV_DEBUG_REGS define

All of DebugException is already protected by CONFIG_PPC_ADV_DEBUG_REGS
there is no need to have another such ifdef inside the function.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 37caf9f2 27-Aug-2011 Kumar Gala <galak@kernel.crashing.org>

powerpc/fsl-booke: Handle L1 D-cache parity error correctly on e500mc

If the L1 D-Cache is in write shadow mode the HW will auto-recover the
error. However we might still log the error and cause a machine check
(if L1CSR0[CPE] - Cache error checking enable). We should only treat
the non-write shadow case as non-recoverable.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>


# 685659ee 14-Jun-2011 yu liu <yu.liu@freescale.com>

powerpc/e500: Save SPEFCSR in flush_spe_to_thread()

giveup_spe() saves the SPE state which is protected by MSR[SPE].
However, modifying SPEFSCR does not trap when MSR[SPE]=0.
And since SPEFSCR is already saved/restored in _switch(),
not all the callers want to save SPEFSCR again.
Thus, saving SPEFSCR should not belong to giveup_spe().

This patch moves SPEFSCR saving to flush_spe_to_thread(),
and cleans up the caller that needs to save SPEFSCR accordingly.

Signed-off-by: Liu Yu <yu.liu@freescale.com>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>


# 76462232 03-Jun-2011 Christian Dietrich <christian.dietrich@informatik.uni-erlangen.de>

arch/powerpc: use printk_ratelimited instead of printk_ratelimit

Since printk_ratelimit() shouldn't be used anymore (see comment in
include/linux/printk.h), replace it with printk_ratelimited.

Signed-off-by: Christian Dietrich <christian.dietrich@informatik.uni-erlangen.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 82a9a480 16-Jun-2011 Scott Wood <scottwood@freescale.com>

powerpc/e500: fix breakage with fsl_rio_mcheck_exception

The wrong MCSR bit was being used on e500mc. MCSR_BUS_RBERR only exists
on e500v1/v2. Use MCSR_LD on e500mc, and remove all MCSR checking
in fsl_rio_mcheck_exception as we now no longer call that function
if the appropriate bit in MCSR is not set.

If RIO support was enabled at compile-time, but was never probed, just
return from fsl_rio_mcheck_exception rather than dereference a NULL
pointer.

TODO: There is still a remaining, though comparitively minor, issue in
that this recovery mechanism will falsely engage if there's an unrelated
MCSR_LD event at the same time as a RIO error.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>


# cce1f106 17-Nov-2010 Shaohui Xie <b21989@freescale.com>

powerpc/fsl_rio: move machine_check handler

Add support for machine_check support into machine_check_e500 and
machine_check_e500mc.

Signed-off-by: Shaohui Xie <b21989@freescale.com>
Cc: Li Yang <leoli@freescale.com>
Cc: Roy Zang <tie-fei.zang@freescale.com>
Cc: Alexandre Bounine <alexandre.bounine@idt.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>


# 82a3242e 12-May-2011 Greg Kroah-Hartman <gregkh@suse.de>

sysfs: remove "last sysfs file:" line from the oops messages

On some arches (x86, sh, arm, unicore, powerpc) the oops message would
print out the last sysfs file accessed.

This was very useful in finding a number of sysfs and driver core bugs
in the 2.5 and early 2.6 development days, but it has been a number of
years since this file has actually helped in debugging anything that
couldn't also be trivially determined from the stack traceback.

So it's time to delete the line. This is good as we need all the space
we can get for oops messages at times on consoles.

Acked-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# 104699c0 27-Apr-2011 KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

powerpc: Convert old cpumask API into new one

Adapt new API.

Almost change is trivial. Most important change is the below line
because we plan to change task->cpus_allowed implementation.

- ctx->cpus_allowed = current->cpus_allowed;

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 9f0b0793 07-Apr-2011 Michael Ellerman <michael@ellerman.id.au>

powerpc: Use MSR_64BIT in places

Use the new MSR_64BIT in a few places. Some of these are already ifdef'ed
for BOOKE vs BOOKS, but it's still clearer, MSR_SF does not immediately
parse as "MSR bit for 64bit".

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# efcac658 02-Mar-2011 Alexey Kardashevskiy <aik@au1.ibm.com>

powerpc: Per process DSCR + some fixes (try#4)

The DSCR (aka Data Stream Control Register) is supported on some
server PowerPC chips and allow some control over the prefetch
of data streams.

This patch allows the value to be specified per thread by emulating
the corresponding mfspr and mtspr instructions. Children of such
threads inherit the value. Other threads use a default value that
can be specified in sysfs - /sys/devices/system/cpu/dscr_default.

If a thread starts with non default value in the sysfs entry,
all children threads inherit this non default value even if
the sysfs value is changed later.

Signed-off-by: Alexey Kardashevskiy <aik@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 25985edc 30-Mar-2011 Lucas De Marchi <lucas.demarchi@profusion.mobi>

Fix common misspellings

Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>


# e49b1fae 11-Jan-2011 Anton Blanchard <anton@samba.org>

powerpc: Don't silently handle machine checks from userspace

If a machine check comes from userspace we send a SIGBUS to the task and
fail to printk anything.

If we are taking machine checks due to bad hardware we want to know about
it right away. Furthermore if we don't complain loudly then it will look
a lot like a bug in the userspace application, potentially causing a lot
of confusion.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# dfb5509f 11-Jan-2011 Anton Blanchard <anton@samba.org>

powerpc: Remove duplicate debugger hook in machine_check_exception

We are calling debugger_fault_handler twice in machine_check_exception.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# a443506b 11-Jan-2011 Anton Blanchard <anton@samba.org>

powerpc: Don't force MSR_RI in machine_check_exception

We should never force MSR_RI on. If we take a machine check with MSR_RI off
then we have no chance of recovering safely.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 4490c06b 08-Oct-2010 Kumar Gala <galak@kernel.crashing.org>

powerpc/fsl-booke: Add support for FSL 64-bit e5500 core

The new e5500 core is similar to the e500mc core but adds 64-bit
support. We support running it in 32-bit mode as it is identical to the
e500mc.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>


# e3145b38 08-Jul-2010 Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc/book3e: Move doorbell_exception from traps.c to dbell.c

... where it belongs

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 2538c2d0 15-Jun-2010 K.Prasad <prasad@linux.vnet.ibm.com>

powerpc, hw_breakpoint: Handle concurrent alignment interrupts

If an alignment interrupt occurs on an instruction that is being
single-stepped, the alignment interrupt handler currently handles
the single-step condition by unconditionally sending a SIGTRAP to
the process. Other synchronous interrupts that result in the
instruction being emulated do likewise.

With hw_breakpoint support, the hw_breakpoint code needs to be able
to intercept these single-step events as well as those where the
instruction executes normally and a trace interrupt happens.

Fix this by making emulate_single_step() use the existing
single_step_exception() function instead of calling _exception()
directly. We then make single_step_exception() use the abstracted
clear_single_step() rather than clearing bits in the MSR image
directly so that emulate_single_step() will continue to work
correctly on Book 3E processors.

Signed-off-by: K.Prasad <prasad@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# fe04b112 07-Apr-2010 Scott Wood <scottwood@freescale.com>

powerpc/e500mc: Implement machine check handler.

Most of the MSCR bit assigments are different in e500mc versus
e500, and they are now write-one-to-clear.

Some e500mc machine check conditions are made recoverable (as long as
they aren't stuck on), most notably L1 instruction cache parity errors.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>


# ba797b28 20-May-2010 Jason Wessel <jason.wessel@windriver.com>

powerpc,kgdb: Introduce low level trap catching

The only way the debugger can handle a trap in inside rcu_lock,
notify_die, or atomic_notifier_call_chain without a recursive fault is
to allow the kernel debugger to handle the exception first in
program_check_exception().

The other change here is to make sure that kgdb_handle_exception() is
called with correct parameters when catching an oops, because kdb
needs to know if the entry was an oops, single step, or breakpoint
exception.

[benh@kernel.crashing.org: move debugger_bpt instead of #ifdef]

CC: Paul Mackerras <paulus@samba.org>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# fc5e7097 04-Mar-2010 Dave Kleikamp <shaggy@linux.vnet.ibm.com>

powerpc/476: add machine check handler for 47x core

The 47x core's MCSR varies from 44x, so it needs it's own machine check
handler.

Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>


# 5a0e3ad6 24-Mar-2010 Tejun Heo <tj@kernel.org>

include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>


# b8f87782 17-Feb-2010 Thomas Gleixner <tglx@linutronix.de>

powerpc: Convert die.lock to raw_spinlock

die.lock needs to be a real spinlock in RT. Convert it to
raw_spinlock.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 3bffb652 08-Feb-2010 Dave Kleikamp <shaggy@linux.vnet.ibm.com>

powerpc/booke: Add support for advanced debug registers

powerpc/booke: Add support for advanced debug registers

From: Dave Kleikamp <shaggy@linux.vnet.ibm.com>

Based on patches originally written by Torez Smith.

This patch defines context switch and trap related functionality
for BookE specific Debug Registers. It adds support to ptrace()
for setting and getting BookE related Debug Registers

Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Cc: Torez Smith <lnxtorez@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Gibson <dwg@au1.ibm.com>
Cc: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Sergio Durigan Junior <sergiodj@br.ibm.com>
Cc: Thiago Jung Bauermann <bauerman@br.ibm.com>
Cc: linuxppc-dev list <Linuxppc-dev@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 172ae2e7 08-Feb-2010 Dave Kleikamp <shaggy@linux.vnet.ibm.com>

powerpc/booke: Introduce new CONFIG options for advanced debug registers

powerpc/booke: Introduce new CONFIG options for advanced debug registers

From: Dave Kleikamp <shaggy@linux.vnet.ibm.com>

Introduce new config options to simplify the ifdefs pertaining to the
advanced debug registers for booke and 40x processors:

CONFIG_PPC_ADV_DEBUG_REGS - boolean: true for dac-based processors
CONFIG_PPC_ADV_DEBUG_IACS - number of IAC registers
CONFIG_PPC_ADV_DEBUG_DACS - number of DAC registers
CONFIG_PPC_ADV_DEBUG_DVCS - number of DVC registers
CONFIG_PPC_ADV_DEBUG_DAC_RANGE - DAC ranges supported

Beginning conservatively, since I only have the facilities to test 440
hardware. I believe all 40x and booke platforms support at least 2 IAC
and 2 DAC registers. For 440, 4 IAC and 2 DVC registers are enabled, as
well as the DAC ranges.

Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Acked-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 89713ed1 31-Jan-2010 Anton Blanchard <anton@samba.org>

powerpc: Add timer, performance monitor and machine check counts to /proc/interrupts

With NO_HZ it is useful to know how often the decrementer is going off. The
patch below adds an entry for it and also adds it into the /proc/stat
summaries.

While here, I added performance monitoring and machine check exceptions.
I found it useful to keep an eye on the PMU exception rate
when using the perf tool. Since it's possible to take a completely
handled machine check on a System p box it also sounds like a good idea to
keep a machine check summary.

The event naming matches x86 to keep gratuitous differences to a minimum.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 66fcb105 07-Feb-2010 Anton Blanchard <anton@samba.org>

powerpc: Add last sysfs file and dump of ftrace buffer to oops printout

Add printout of last accessed sysfs file, added to x86 in
ae87221d3ce49d9de1e43756da834fd0bf05a2ad (sysfs: crash debugging)

Also add the notify_die hook that allows us to print out the ftrace
buffer on oops. This is useful in conjunction with ftrace function_graph:

Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=128 NUMA pSeries
last sysfs file: /sys/class/net/tunl0/type
Dumping ftrace buffer:

...

0) | .sysrq_handle_crash() {
0) 0.476 us | .hash_page();
0) 0.488 us | .xmon_fault_handler();
0) | .bad_page_fault() {
0) | .search_exception_tables() {
0) 0.590 us | .search_module_extables();
0) 2.546 us | }
0) | .printk() {
0) | .vprintk() {
0) 0.488 us | ._raw_spin_lock();
0) 0.572 us | .emit_log_char();

Showing the function graph of a sysrq-c crash.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 5be3492f 11-Jan-2010 Anton Blanchard <anton@samba.org>

powerpc: Mark some variables in the page fault path __read_mostly

Using perf to trace L1 dcache misses and dumping data addresses I found a few
variables taking a lot of misses. Since they are almost never written, they
should go into the __read_mostly section.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 25baa35b 15-Dec-2009 Oleg Nesterov <oleg@redhat.com>

ptrace: powerpc: implement user_single_step_siginfo()

Suggested by Roland.

Implement user_single_step_siginfo() for powerpc.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
Cc: <linux-arch@vger.kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# a0592d42 10-Nov-2009 Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc: kill the obsolete code under is_global_init()

The code under "if (is_global_init())" is bogus, and is_global_init()
itself is not right in mt case.

Contrary to what the comment says, nowadays force_sig_info() does kill
init even if the handler is SIG_DFL. Note that force_sig_info() clears
SIGNAL_UNKILLABLE exactly for this case.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# eecff81d 27-Oct-2009 Anton Blanchard <anton@samba.org>

powerpc: Create PPC_WARN_ALIGNMENT to match PPC_WARN_EMULATED

perf_event wants a separate event for alignment and emulation faults,
so create another emulation event. This will make it easy to hook in
perf_event at one spot.

We pass in regs which will be required for these events.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# ec097c84 28-May-2009 Roland McGrath <roland@redhat.com>

powerpc: Add PTRACE_SINGLEBLOCK support

Reworked by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

This adds block-step support on powerpc, including a PTRACE_SINGLEBLOCK
request for ptrace.

The BookE implementation is tweaked to fire a single step after a
block step in order to mimmic the server behaviour.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 80947e7c 17-May-2009 Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>

powerpc: Keep track of emulated instructions

If CONFIG_PPC_EMULATED_STATS is enabled, make available counters for the
various classes of emulated instructions under
/sys/kernel/debug/powerpc/emulated_instructions/ (assumed debugfs is mounted on
/sys/kernel/debug). Optionally (controlled by
/sys/kernel/debug/powerpc/emulated_instructions/do_warn), rate-limited warnings
can be printed to the console when instructions are emulated.

Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 620165f9 12-Feb-2009 Kumar Gala <galak@kernel.crashing.org>

powerpc: Add support for using doorbells for SMP IPI

The e500mc supports the new msgsnd/doorbell mechanisms that were added in
the Power ISA 2.05 architecture. We use the normal level doorbell for
doing SMP IPIs at this point.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 16c57b36 10-Feb-2009 Kumar Gala <galak@kernel.crashing.org>

powerpc: Unify opcode definitions and support

Create a new header that becomes a single location for defining PowerPC
opcodes used by code that is either generationg instructions
at runtime (fixups, debug, etc.), emulating instructions, or just
compiling instructions old assemblers don't know about.

We currently don't handle the floating point emulation or alignment decode
as both are better handled by the specific decode support they already
have.

Added support for the new dcbzl, dcbal, msgsnd, tlbilx, & wait instructions
since older assemblers don't know about them.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# 6a800f36 27-Oct-2008 Liu Yu <yu.liu@freescale.com>

powerpc: Add SPE/EFP math emulation for E500v1/v2 processors.

This patch add the handlers of SPE/EFP exceptions.
The code is used to emulate float point arithmetic,
when MSR(SPE) is enabled and receive EFP data interrupt or EFP round interrupt.

This patch has no conflict with or dependence on FP math-emu.

The code has been tested by TestFloat.

Now the code doesn't support SPE/EFP instructions emulation
(it won't be called when receive program interrupt),
but it could be easily added.

Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>


# 9d5a9e74 27-Jun-2008 Adrian Bunk <bunk@kernel.org>

Remove asm/a.out.h files for all architectures without a.out support.

This patch also includes the required removal of (unused) inclusion of
<asm/a.out.h> <linux/a.out.h>'s in the arch/ code for these
architectures.

[dwmw2: updated for 2.6.27-rc]
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>


# d6a61bfc 23-Jul-2008 Luis Machado <luisgpm@linux.vnet.ibm.com>

powerpc: BookE hardware watchpoint support

This patch implements support for HW based watchpoint via the
DBSR_DAC (Data Address Compare) facility of the BookE processors.

It does so by interfacing with the existing DABR breakpoint code
and adding the necessary bits and pieces for the new bits to
be properly set or cleared

Signed-off-by: Luis Machado <luisgpm@br.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>


# ce48b210 24-Jun-2008 Michael Neuling <mikey@neuling.org>

powerpc: Add VSX context save/restore, ptrace and signal support

This patch extends the floating point save and restore code to use the
VSX load/stores when VSX is available. This will make FP context
save/restore marginally slower on FP only code, when VSX is available,
as it has to load/store 128bits rather than just 64bits.

Mixing FP, VMX and VSX code will get constant architected state.

The signals interface is extended to enable access to VSR 0-31
doubleword 1 after discussions with tool chain maintainers. Backward
compatibility is maintained.

The ptrace interface is also extended to allow access to VSR 0-31 full
registers.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# f8279621 26-Jun-2008 Kumar Gala <galak@kernel.crashing.org>

powerpc/booke: Add kprobes support for booke style processors

This patch is based on work done by Madhvesh. R. Sulibhavi back in
March 2007.

We refactor some of the single step handling since it differs between
"classic" and "booke" powerpc cores.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>


# 7dbb922c 30-Jan-2008 Olof Johansson <olof@lixom.net>

[POWERPC] Fix compilation for CONFIG_DEBUGGER=n and CONFIG_KEXEC=y

Looks like "[POWERPC] kdump shutdown hook support" broke builds when
CONFIG_DEBUGGER=n and CONFIG_KEXEC=y, such as in g5_defconfig:

arch/powerpc/kernel/crash.c: In function 'default_machine_crash_shutdown':
arch/powerpc/kernel/crash.c:388: error: '__debugger_fault_handler' undeclared (first use in this function)
arch/powerpc/kernel/crash.c:388: error: (Each undeclared identifier is reported only once
arch/powerpc/kernel/crash.c:388: error: for each function it appears in.)

Move the debugger hooks to under CONFIG_DEBUGGER || CONFIG_KEXEC, since
that's when the crash code is enabled.

(I should have caught this with my build-script pre-merge, my bad. :( )

Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 47c0bd1a 20-Dec-2007 Benjamin Herrenschmidt <benh@kernel.crashing.org>

[POWERPC] Reworking machine check handling and Fix 440/440A

This adds a cputable function pointer for the CPU-side machine
check handling. The semantic is still the same as the old one,
the one in ppc_md. overrides the one in cputable, though
ultimately we'll want to change that so the CPU gets first.

This removes CONFIG_440A which was a problem for multiplatform
kernels and instead fixes up the IVOR at runtime from a setup_cpu
function. The "A" version of the machine check also tweaks the
regs->trap value to differenciate the 2 versions at the C level.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>


# c1469f13 19-Nov-2007 Kumar Gala <galak@kernel.crashing.org>

[POWERPC] Emulate isel (Integer Select) instruction

isel (Integer Select) is a new user space instruction in the
PowerISA 2.04 spec. Not all processors implement it so lets emulate
to ensure code built with isel will run everywhere.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>


# 19c5870c 19-Oct-2007 Alexey Dobriyan <adobriyan@openvz.org>

Use helpers to obtain task pid in printks (arch code)

One of the easiest things to isolate is the pid printed in kernel log.
There was a patch, that made this for arch-independent code, this one makes
so for arch/xxx files.

It took some time to cross-compile it, but hopefully these are all the
printks in arch code.

Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# b460cbc5 19-Oct-2007 Serge E. Hallyn <serue@us.ibm.com>

pid namespaces: define is_global_init() and is_container_init()

is_init() is an ambiguous name for the pid==1 check. Split it into
is_global_init() and is_container_init().

A cgroup init has it's tsk->pid == 1.

A global init also has it's tsk->pid == 1 and it's active pid namespace
is the init_pid_ns. But rather than check the active pid namespace,
compare the task structure with 'init_pid_ns.child_reaper', which is
initialized during boot to the /sbin/init process and never changes.

Changelog:

2.6.22-rc4-mm2-pidns1:
- Use 'init_pid_ns.child_reaper' to determine if a given task is the
global init (/sbin/init) process. This would improve performance
and remove dependence on the task_pid().

2.6.21-mm2-pidns2:

- [Sukadev Bhattiprolu] Changed is_container_init() calls in {powerpc,
ppc,avr32}/traps.c for the _exception() call to is_global_init().
This way, we kill only the cgroup if the cgroup's init has a
bug rather than force a kernel panic.

[akpm@linux-foundation.org: fix comment]
[sukadev@us.ibm.com: Use is_global_init() in arch/m32r/mm/fault.c]
[bunk@stusta.de: kernel/pid.c: remove unused exports]
[sukadev@us.ibm.com: Fix capability.c to work with threaded init]
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Acked-by: Pavel Emelianov <xemul@openvz.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Herbert Poetzel <herbert@13thfloor.at>
Cc: Kirill Korotaev <dev@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# d0c3d534 11-Oct-2007 Olof Johansson <olof@lixom.net>

[POWERPC] Implement logging of unhandled signals

Implement show_unhandled_signals sysctl + support to print when a process
is killed due to unhandled signals just as i386 and x86_64 does.

Default to having it off, unlike x86 that defaults on.

Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 5dd57a13 18-Sep-2007 Scott Wood <scottwood@freescale.com>

[POWERPC] 8xx: Move softemu8xx.c from arch/ppc

Previously, Soft_emulate_8xx was called with no implementation, resulting in
build failures whenever building 8xx without math emulation. The
implementation is copied from arch/ppc to resolve this issue.

However, this sort of minimal emulation is not a very good idea other than
for compatibility with existing userspaces, as it's less efficient than
soft-float and can mislead users into believing they have soft-float. Thus,
it is made a configurable option, off by default.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>


# 75918a4b 20-Sep-2007 Olof Johansson <olof@lixom.net>

[POWERPC] Separate out legacy machine check exception parsers

Move out the old-style exception parsers to a separate function, and
don't call it on platforms that have a platform-specific handler.

It would make sense to move out the generic versions into their platforms
instead, but that can be done gradually down the road.

Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 01f1c735 04-Sep-2007 Olof Johansson <olof@lixom.net>

[POWERPC] Remove unused platform_machine_check()

Remove leftover cruft from ARCH=ppc.

There are no users of platform_machine_check() in ARCH=powerpc, and none
should be added (they should use ppc_md.machine_check_handler instead).

Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 86d7a9a9 02-Aug-2007 Becky Bruce <becky.bruce@freescale.com>

[POWERPC] Fix FSL BookE machine check reporting

Reserved MCSR bits on FSL BookE parts may have spurious values
when mcheck occurs. Mask these off when printing the MCSR to
avoid confusion. Also, get rid of the MCSR_GL_CI bit defined
for e500 - this bit doesn't actually have any meaning.

Signed-off-by: Becky Bruce <becky.bruce@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>


# bcdcd8e7 17-Jul-2007 Pavel Emelianov <xemul@openvz.org>

Report that kernel is tainted if there was an OOPS

If the kernel OOPSed or BUGed then it probably should be considered as
tainted. Thus, all subsequent OOPSes and SysRq dumps will report the
tainted kernel. This saves a lot of time explaining oddities in the
calltraces.

Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Added parisc patch from Matthew Wilson -Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 608e2619 16-Jul-2007 Heiko Carstens <hca@linux.ibm.com>

generic bug: use show_regs() instead of dump_stack()

The current generic bug implementation has a call to dump_stack() in case a
WARN_ON(whatever) gets hit. Since report_bug(), which calls dump_stack(),
gets called from an exception handler we can do better: just pass the
pt_regs structure to report_bug() and pass it to show_regs() in case of a
warning. This will give more debug informations like register contents,
etc... In addition this avoids some pointless lines that dump_stack()
emits, since it includes a stack backtrace of the exception handler which
is of no interest in case of a warning. E.g. on s390 the following lines
are currently always present in a stack backtrace if dump_stack() gets
called from report_bug():

[<000000000001517a>] show_trace+0x92/0xe8)
[<0000000000015270>] show_stack+0xa0/0xd0
[<00000000000152ce>] dump_stack+0x2e/0x3c
[<0000000000195450>] report_bug+0x98/0xf8
[<0000000000016cc8>] illegal_op+0x1fc/0x21c
[<00000000000227d6>] sysc_return+0x0/0x10

Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
Acked-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Kyle McMartin <kyle@parisc-linux.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 1eeb66a1 08-May-2007 Christoph Hellwig <hch@lst.de>

move die notifier handling to common code

This patch moves the die notifier handling to common code. Previous
various architectures had exactly the same code for it. Note that the new
code is compiled unconditionally, this should be understood as an appel to
the other architecture maintainer to implement support for it aswell (aka
sprinkling a notify_die or two in the proper place)

arm had a notifiy_die that did something totally different, I renamed it to
arm_notify_die as part of the patch and made it static to the file it's
declared and used at. avr32 used to pass slightly less information through
this interface and I brought it into line with the other architectures.

[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix vmalloc_sync_all bustage]
[bryan.wu@analog.com: fix vmalloc_sync_all in nommu]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: <linux-arch@vger.kernel.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# ae7f4463 20-Mar-2007 anton@samba.org <anton@samba.org>

[POWERPC] Fix backwards ? : when printing machine type

Looks like someone got this backwards, highlighting the perils of the
? : !!! :)

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 34c2a14f 20-Mar-2007 anton@samba.org <anton@samba.org>

[POWERPC] Handle recursive oopses

Handle recursive oopses, like on x86. We had a few cases recently where
we locked up in oops printing and didnt make it into crashdump.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 6031d9d9 20-Mar-2007 anton@samba.org <anton@samba.org>

[POWERPC] Clean up pmac_backlight_unblank in oops path

Move pmac_backlight_unblank into its own function and only take the
pmac_backlight_mutex when we are on a pmac for that added bit of
paranoia.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 293e4688 20-Mar-2007 anton@samba.org <anton@samba.org>

[POWERPC] Add missing oops_enter/oops_exit

Add missing oops_enter/oops_exit, makes pause_on_oops boot parameter work.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 599a52d1 10-Feb-2007 Richard Purdie <rpurdie@rpsys.net>

backlight: Separate backlight properties from backlight ops pointers

Per device data such as brightness belongs to the indivdual device
and should therefore be separate from the the backlight operation
function pointers. This patch splits the two types of data and
allows simplifcation of some code.

Signed-off-by: Richard Purdie <rpurdie@rpsys.net>


# 28ee086d 08-Feb-2007 Richard Purdie <rpurdie@rpsys.net>

backlight: Fix external uses of backlight internal semaphore

backlight_device->sem has a very specific use as documented in the
header file. The external users of this are using it for a different
reason, to serialise access to the update_status() method.

backlight users were supposed to implement their own internal
serialisation of update_status() if needed but everyone is doing
things differently and incorrectly. Therefore add a global mutex to
take care of serialisation for everyone, once and for all.

Locking for get_brightness remains optional since most users don't
need it.

Also update the lcd class in a similar way.

Signed-off-by: Richard Purdie <rpurdie@rpsys.net>


# 5fad293b 07-Feb-2007 Kumar Gala <galak@kernel.crashing.org>

[POWERPC] Fixup error handling when emulating a floating point instruction

When we do full FP emulation its possible that we need to post a SIGFPE based
on the results of the emulation. The previous code ignored this case completely.

Additionally, the Soft_emulate_8xx case had two issues. One, we should never
generate a SIGFPE since the code only does data movement. Second, we were
interpreting the return codes incorrectly, it returns 0 on success, 1 on
illop and -EFAULT on a data access error.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>


# 04903a30 07-Feb-2007 Kumar Gala <galak@kernel.crashing.org>

[POWERPC] Enable interrupts if we are doing fp math emulation

Anytime we are emulating an instruction we are going to be doing some form of
get_user() to get the instruction image to decode. Since get_user() might
sleep we need to ensure we have interrupts enabled or we might see something
like:

Debug: sleeping function called from invalid context at arch/powerpc/kernel/traps.c:697
in_atomic():0, irqs_disabled():1
Call Trace:
[D6023EB0] [C0007F84] show_stack+0x58/0x174 (unreliable)
[D6023EE0] [C0022C34] __might_sleep+0xbc/0xd0
[D6023EF0] [C000D158] program_check_exception+0x1d8/0x4fc
[D6023F40] [C000E744] ret_from_except_full+0x0/0x4c
--- Exception: 700 at 0x102a7100
LR = 0xdb9ef04

However, we want to ensure that interrupts are disabled when handling a trap
exception that might be used for a kernel breakpoint. This is why ProgramCheck
is marked as EXC_XFER_STD instead of EXC_XFER_EE.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>


# 60bccbed 19-Dec-2006 Akinobu Mita <akinobu.mita@gmail.com>

[POWERPC] Use is_init() instead of pid==1

Use is_init() rather than hard coded pid comparison.

Cc: Anton Blanchard <anton@samba.org>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 73c9ceab 08-Dec-2006 Jeremy Fitzhardinge <jeremy@goop.org>

[POWERPC] Generic BUG for powerpc

This makes powerpc use the generic BUG machinery. The biggest reports the
function name, since it is redundant with kallsyms, and not needed in general.

There is an overall reduction of code, since module_32/64 duplicated several
functions.

Unfortunately there's no way to tell gcc that BUG won't return, so the BUG
macro includes a goto loop. This will generate a real jmp instruction, which
is never used.

[akpm@osdl.org: build fix]
[paulus@samba.org: remove infinite loop in BUG_ON]
Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Andi Kleen <ak@muc.de>
Cc: Hugh Dickens <hugh@veritas.com>
Cc: Michael Ellerman <michael@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# aa42c69c 08-Dec-2006 Kim Phillips <kim.phillips@freescale.com>

[POWERPC] Add support for FP emulation for the e300c2 core

The e300c2 has no FPU. Its MSR[FP] is grounded to zero. If an attempt
is made to execute a floating point instruction (including floating-point
load, store, or move instructions), the e300c2 takes a floating-point
unavailable interrupt.

This patch adds support for FP emulation on the e300c2 by declaring a
new CPU_FTR_FP_TAKES_FPUNAVAIL, where FP unavail interrupts are
intercepted and redirected to the ProgramCheck exception path for
correct emulation handling.

(If we run out of CPU_FTR bits we could look to reclaim this bit by adding
support to test the cpu_user_features for PPC_FEATURE_HAS_FPU instead)

It adds a nop to the exception path for 32-bit processors with a FPU.

Signed-off-by: Kim Phillips <kim.phillips@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>


# 68a64357 12-Nov-2006 Benjamin Herrenschmidt <benh@kernel.crashing.org>

[POWERPC] Merge 32 and 64 bits asm-powerpc/io.h

powerpc: Merge 32 and 64 bits asm-powerpc/io.h

The rework on io.h done for the new hookable accessors made it easier,
so I just finished the work and merged 32 and 64 bits io.h for arch/powerpc.

arch/ppc still uses the old version in asm-ppc, there is just too much gunk
in there that I really can't be bothered trying to cleanup.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 4393c4f6 31-Oct-2006 Benjamin Herrenschmidt <benh@kernel.crashing.org>

[POWERPC] Make alignment exception always check exception table

The alignment exception used to only check the exception table for
-EFAULT, not for other errors. That opens an oops window if we can
coerce the kernel into getting an alignment exception for other reasons
in what would normally be a user-protected accessor, which can be done
via some of the futex ops. This fixes it by always checking the
exception tables.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 6c4841c2 12-Oct-2006 Anton Blanchard <anton@samba.org>

[POWERPC] Never panic when taking altivec exceptions from userspace

At the moment we rely on a cpu feature bit or a firmware property to
detect altivec. If we dont have either of these and the cpu does in fact
support altivec we can cause a panic from userspace.

It seems safer to always send a signal if we manage to get an 0xf20
exception from userspace.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# c3412dcb 30-Aug-2006 Will Schmidt <will_schmidt@vnet.ibm.com>

[POWERPC] Emulate power5 popcntb instruction

In an attempt to make it easier for a power5 optimized app to run on a
power4 or a 970 or random earlier machine, this provides emulation of
the popcntb instruction.

Signed-off-by: Will Schmidt <will_schmidt@vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 87589f08 23-Aug-2006 Paul Mackerras <paulus@samba.org>

[POWERPC] Correct masks used in emulating some instructions

When we get an illegal instruction exception, we check to see whether
the instruction is one that we emulate for the user program. Some of
the masks we use in checking whether the offending instruction is one
we care about didn't have the top bit set, which is the MSB of the
major opcode. Thus some undefined opcodes could get emulated as other
(defined but unimplemented) instructions. This corrects the masks.

Signed-off-by: Paul Mackerras <paulus@samba.org>


# b6f35b49 04-Jul-2006 Michael Ellerman <michael@ellerman.id.au>

[POWERPC] Make crash.c work on 32-bit and 64-bit

To compile kexec on 32-bit we need a few more bits and pieces. Rather
than add empty definitions, we can make crash.c work on 32-bit, with
only a couple of kludges.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# eac8392f 29-Jun-2006 David Wilder <dwilder@us.ibm.com>

[POWERPC] Make secondary CPUs call into kdump on reset exception

In the case of a system hang, the user will invoke soft-reset to
initiate the kdump boot. If xmon is enabled, the CPU(s) enter into the
xmon debugger. Unfortunately, the secondary CPU(s) will return to the
hung state when they exit from the debugger (returned from die() ->
system_reset_exception()). This causes a problem in kdump since the
hung CPU(s) will not respond to the IPI sent from kdump. This patch
fixes the issue by calling crash_kexec_secondary() directly from
system_reset_exception() without returning to the previous state. These
secondary CPUs wait 5ms until the kdump boot is started by the primary
CPU. In the case we exited from the debugger to "recover" (command 'x'
in xmon) the primary and the secondary CPUs will all return from die()
-> system_reset_exception() ->crash_kexec_secondary() wait 5ms, then
return to the previous state. A kdump boot is not started in this case.

Signed-off-by: Haren Myneni <haren@us.ibm.com>
Signed-off-by: David Wilder <dwilder@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 012c437d 14-Aug-2006 Horms <horms@verge.net.au>

[PATCH] Change panic_on_oops message to "Fatal exception"

Previously the message was "Fatal exception: panic_on_oops", as introduced
in a recent patch whith removed a somewhat dangerous call to ssleep() in
the panic_on_oops path. However, Paul Mackerras suggested that this was
somewhat confusing, leadind people to believe that it was panic_on_oops
that was the root cause of the fatal exception. On his suggestion, this
patch changes the message to simply "Fatal exception". A suitable oops
message should already have been displayed.

Signed-off-by: Simon Horman <horms@verge.net.au>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# cea6a4ba 30-Jul-2006 Horms <horms@verge.net.au>

[PATCH] panic_on_oops: remove ssleep()

This patch is part of an effort to unify the panic_on_oops behaviour across
all architectures that implement it.

It was pointed out to me by Andi Kleen that if an oops has occured in
interrupt context, then calling sleep() in the oops path will only cause a
panic, and that it would be really better for it not to be in the path at
all.

This patch removes the ssleep() call and reworks the console message
accordinly. I have a slght concern that the resulting console message is
too long, feedback welcome.

For powerpc it also unifies the 32bit and 64bit behaviour.

Fror x86_64, this patch only updates the console message, as ssleep() is
already not present.

Signed-off-by: Horms <horms@verge.net.au>
Acked-by: Paul Mackerras <paulus@samba.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Chris Zankel <chris@zankel.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 6ab3d562 30-Jun-2006 Jörn Engel <joern@wohnheim.fh-wedel.de>

Remove obsolete #include <linux/config.h>

Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>


# c0ce7d08 23-Jun-2006 David Wilder <dwilder@us.ibm.com>

[POWERPC] Add the use of the firmware soft-reset-nmi to kdump.

With this patch, kdump uses the firmware soft-reset NMI for two purposes:
1) Initiate the kdump (take a crash dump) by issuing a soft-reset.
2) Break a CPU out of a deadlock condition that is detected during kdump
processing.

When a soft-reset is initiated each CPU will enter
system_reset_exception() and set its corresponding bit in the global
bit-array cpus_in_sr then call die(). When die() finds the CPU's bit set
in cpu_in_sr crash_kexec() is called to initiate a crash dump. The first
CPU to enter crash_kexec() is called the "crashing CPU". All other CPUs
are "secondary CPUs". The secondary CPU's pass through to
crash_kexec_secondary() and sleep. The crashing CPU waits for all CPUs
to enter via soft-reset then boots the kdump kernel (see
crash_soft_reset_check())

When the system crashes due to a panic or exception, crash_kexec() is
called by panic() or die(). The crashing CPU sends an IPI to all other
CPUs to notify them of the pending shutdown. If a CPU is in a deadlock
or hung state with interrupts disabled, the IPI will not be delivered.
The result being, that the kdump kernel is not booted. This problem is
solved with the use of a firmware generated soft-reset. After the
crashing_cpu has issued the IPI, it waits for 10 sec for all CPUs to
enter crash_ipi_callback(). A CPU signifies its entry to
crash_ipi_callback() by setting its corresponding bit in the
cpus_in_crash bit array. After 10 sec, if one or more CPUs have not set
their bit in cpus_in_crash we assume that the CPU(s) is deadlocked. The
operator is then prompted to generate a soft-reset to break the
deadlock. Each CPU enters the soft reset handler as described above.

Two conditions must be handled at this point:
1) The system crashed because the operator generated a soft-reset. See
2) The system had crashed before the soft-reset was generated ( in the
case of a Panic or oops).

The first CPU to enter crash_kexec() uses the state of the kexec_lock to
determine this state. If kexec_lock is already held then condition 2 is
true and crash_kexec_secondary() is called, else; this CPU is flagged as
the crashing CPU, the kexec_lock is acquired and crash_kexec() proceeds
as described above.

Each additional CPUs responding to the soft-reset will pass through
crash_kexec() to kexec_secondary(). All secondary CPUs call
crash_ipi_callback() readying them self's for the shutdown. When ready
they clear their bit in cpus_in_sr. The crashing CPU waits in
kexec_secondary() until all other CPUs have cleared their bits in
cpus_in_sr. The kexec kernel boot is then started.

Signed-off-by: Haren Myneni <haren@us.ibm.com>
Signed-off-by: David Wilder <dwilder@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 5474c120 25-Jun-2006 Michael Hanselmann <linux-kernel@hansmi.ch>

[PATCH] Rewritten backlight infrastructure for portable Apple computers

This patch contains a total rewrite of the backlight infrastructure for
portable Apple computers. Backward compatibility is retained. A sysfs
interface allows userland to control the brightness with more steps than
before. Userland is allowed to upload a brightness curve for different
monitors, similar to Mac OS X.

[akpm@osdl.org: add needed exports]
Signed-off-by: Michael Hanselmann <linux-kernel@hansmi.ch>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# e9370ae1 07-Jun-2006 Paul Mackerras <paulus@samba.org>

[PATCH] powerpc: Implement PR_[GS]ET_UNALIGN prctls for powerpc

This gives the ability to control whether alignment exceptions get
fixed up or reported to the process as a SIGBUS, using the existing
PR_SET_UNALIGN and PR_GET_UNALIGN prctls. We do not implement the
option of logging a message on alignment exceptions.

Signed-off-by: Paul Mackerras <paulus@samba.org>


# fab5db97 07-Jun-2006 Paul Mackerras <paulus@samba.org>

[PATCH] powerpc: Implement support for setting little-endian mode via prctl

This adds the PowerPC part of the code to allow processes to change
their endian mode via prctl.

This also extends the alignment exception handler to be able to fix up
alignment exceptions that occur in little-endian mode, both for
"PowerPC" little-endian and true little-endian.

We always enter signal handlers in big-endian mode -- the support for
little-endian mode does not amount to the creation of a little-endian
user/kernel ABI. If the signal handler returns, the endian mode is
restored to what it was when the signal was delivered.

We have two new kernel CPU feature bits, one for PPC little-endian and
one for true little-endian. Most of the classic 32-bit processors
support PPC little-endian, and this is reflected in the CPU feature
table. There are two corresponding feature bits reported to userland
in the AT_HWCAP aux vector entry.

This is based on an earlier patch by Anton Blanchard.

Signed-off-by: Paul Mackerras <paulus@samba.org>


# 1a6a4ffe 30-Mar-2006 Kumar Gala <galak@kernel.crashing.org>

powerpc: merge machine_check_exception between ppc32 & ppc64

Make machine_check_exception handling code path the same on ppc32 & ppc64.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>


# e8222502 28-Mar-2006 Benjamin Herrenschmidt <benh@kernel.crashing.org>

[PATCH] powerpc: Kill _machine and hard-coded platform numbers

This removes statically assigned platform numbers and reworks the
powerpc platform probe code to use a better mechanism. With this,
board support files can simply declare a new machine type with a
macro, and implement a probe() function that uses the flattened
device-tree to detect if they apply for a given machine.

We now have a machine_is() macro that replaces the comparisons of
_machine with the various PLATFORM_* constants. This commit also
changes various drivers to use the new macro instead of looking at
_machine.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# e041c683 27-Mar-2006 Alan Stern <stern@rowland.harvard.edu>

[PATCH] Notifier chain update: API changes

The kernel's implementation of notifier chains is unsafe. There is no
protection against entries being added to or removed from a chain while the
chain is in use. The issues were discussed in this thread:

http://marc.theaimsgroup.com/?l=linux-kernel&m=113018709002036&w=2

We noticed that notifier chains in the kernel fall into two basic usage
classes:

"Blocking" chains are always called from a process context
and the callout routines are allowed to sleep;

"Atomic" chains can be called from an atomic context and
the callout routines are not allowed to sleep.

We decided to codify this distinction and make it part of the API. Therefore
this set of patches introduces three new, parallel APIs: one for blocking
notifiers, one for atomic notifiers, and one for "raw" notifiers (which is
really just the old API under a new name). New kinds of data structures are
used for the heads of the chains, and new routines are defined for
registration, unregistration, and calling a chain. The three APIs are
explained in include/linux/notifier.h and their implementation is in
kernel/sys.c.

With atomic and blocking chains, the implementation guarantees that the chain
links will not be corrupted and that chain callers will not get messed up by
entries being added or removed. For raw chains the implementation provides no
guarantees at all; users of this API must provide their own protections. (The
idea was that situations may come up where the assumptions of the atomic and
blocking APIs are not appropriate, so it should be possible for users to
handle these things in their own way.)

There are some limitations, which should not be too hard to live with. For
atomic/blocking chains, registration and unregistration must always be done in
a process context since the chain is protected by a mutex/rwsem. Also, a
callout routine for a non-raw chain must not try to register or unregister
entries on its own chain. (This did happen in a couple of places and the code
had to be changed to avoid it.)

Since atomic chains may be called from within an NMI handler, they cannot use
spinlocks for synchronization. Instead we use RCU. The overhead falls almost
entirely in the unregister routine, which is okay since unregistration is much
less frequent that calling a chain.

Here is the list of chains that we adjusted and their classifications. None
of them use the raw API, so for the moment it is only a placeholder.

ATOMIC CHAINS
-------------
arch/i386/kernel/traps.c: i386die_chain
arch/ia64/kernel/traps.c: ia64die_chain
arch/powerpc/kernel/traps.c: powerpc_die_chain
arch/sparc64/kernel/traps.c: sparc64die_chain
arch/x86_64/kernel/traps.c: die_chain
drivers/char/ipmi/ipmi_si_intf.c: xaction_notifier_list
kernel/panic.c: panic_notifier_list
kernel/profile.c: task_free_notifier
net/bluetooth/hci_core.c: hci_notifier
net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_chain
net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_expect_chain
net/ipv6/addrconf.c: inet6addr_chain
net/netfilter/nf_conntrack_core.c: nf_conntrack_chain
net/netfilter/nf_conntrack_core.c: nf_conntrack_expect_chain
net/netlink/af_netlink.c: netlink_chain

BLOCKING CHAINS
---------------
arch/powerpc/platforms/pseries/reconfig.c: pSeries_reconfig_chain
arch/s390/kernel/process.c: idle_chain
arch/x86_64/kernel/process.c idle_notifier
drivers/base/memory.c: memory_chain
drivers/cpufreq/cpufreq.c cpufreq_policy_notifier_list
drivers/cpufreq/cpufreq.c cpufreq_transition_notifier_list
drivers/macintosh/adb.c: adb_client_list
drivers/macintosh/via-pmu.c sleep_notifier_list
drivers/macintosh/via-pmu68k.c sleep_notifier_list
drivers/macintosh/windfarm_core.c wf_client_list
drivers/usb/core/notify.c usb_notifier_list
drivers/video/fbmem.c fb_notifier_list
kernel/cpu.c cpu_chain
kernel/module.c module_notify_list
kernel/profile.c munmap_notifier
kernel/profile.c task_exit_notifier
kernel/sys.c reboot_notifier_list
net/core/dev.c netdev_chain
net/decnet/dn_dev.c: dnaddr_chain
net/ipv4/devinet.c: inetaddr_chain

It's possible that some of these classifications are wrong. If they are,
please let us know or submit a patch to fix them. Note that any chain that
gets called very frequently should be atomic, because the rwsem read-locking
used for blocking chains is very likely to incur cache misses on SMP systems.
(However, if the chain's callout routines may sleep then the chain cannot be
atomic.)

The patch set was written by Alan Stern and Chandra Seetharaman, incorporating
material written by Keith Owens and suggestions from Paul McKenney and Andrew
Morton.

[jes@sgi.com: restructure the notifier chain initialization macros]
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# cd8a5673 02-Mar-2006 Paul Mackerras <paulus@samba.org>

powerpc: Fix might-sleep warning in program check exception handler

On 32-bit, the exception prolog for the program check exception doesn't
enable interrupts early on. If it is an illegal instruction exception,
we read the instruction in order to emulate certain instructions, and
the get_user of the instruction triggers a WARN_ON since interrupts
are still disabled. This adds a local_irq_enable() to enable
interrupts before reading the instruction.

Signed-off-by: Paul Mackerras <paulus@samba.org>


# c902be71 04-Jan-2006 Arnd Bergmann <arnd@arndb.de>

[PATCH] cell: enable pause(0) in cpu_idle

This patch enables support for pause(0) power management state
for the Cell Broadband Processor, which is import for power efficient
operation. The pervasive infrastructure will in the future enable
us to introduce more functionality specific to the Cell's
pervasive unit.

From: Maximino Aguilar <maguilar@us.ibm.com>
Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 555d97ac 15-Dec-2005 Andy Fleming <afleming@freescale.com>

[PATCH] powerpc: G4+ oprofile support

This patch adds oprofile support for the 7450 and all its multitudinous
derivatives.

* Added 7450 (and derivatives) support for oprofile
* Changed e500 cputable to have oprofile model and cpu_type fields
* Added support for classic 32-bit performance monitor interrupt
* Cleaned up common powerpc oprofile code to be as common as possible
* Cleaned up oprofile_impl.h to reflect 32 bit classic code
* Added 32-bit MMCRx bitfield definitions and SPR numbers

Signed-off-by: Andy Fleming <afleming@freescale.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# cc532915 04-Dec-2005 Michael Ellerman <michael@ellerman.id.au>

[PATCH] powerpc: Add arch dependent basic infrastructure for Kdump.

Implementing the machine_crash_shutdown which will be called by
crash_kexec (called in case of a panic, sysrq etc.). Disable the
interrupts, shootdown cpus using debugger IPI and collect regs
for all CPUs.

elfcorehdr= specifies the location of elf core header stored by
the crashed kernel. This command line option will be passed by
the kexec-tools to capture kernel.

savemaxmem= specifies the actual memory size that the first kernel
has and this value will be used for dumping in the capture kernel.
This command line option will be passed by the kexec-tools to
capture kernel.

Signed-off-by: Haren Myneni <haren@us.ibm.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# a7f290da 11-Nov-2005 Benjamin Herrenschmidt <benh@kernel.crashing.org>

[PATCH] powerpc: Merge vdso's and add vdso support to 32 bits kernel

This patch moves the vdso's to arch/powerpc, adds support for the 32
bits vdso to the 32 bits kernel, rename systemcfg (finally !), and adds
some new (still untested) routines to both vdso's: clock_gettime() with
support for CLOCK_REALTIME and CLOCK_MONOTONIC, clock_getres() (same
clocks) and get_tbfreq() for glibc to retreive the timebase frequency.

Tom,Steve: The implementation of get_tbfreq() I've done for 32 bits
returns a long long (r3, r4) not a long. This is such that if we ever
add support for >4Ghz timebases on ppc32, the userland interface won't
have to change.

I have tested gettimeofday() using some glibc patches in both ppc32 and
ppc64 kernels using 32 bits userland (I haven't had a chance to test a
64 bits userland yet, but the implementation didn't change and was
tested earlier). I haven't tested yet the new functions.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 799d6046 09-Nov-2005 Paul Mackerras <paulus@samba.org>

[PATCH] powerpc: merge code values for identifying platforms

This patch merges platform codes. systemcfg->platform is no longer used,
systemcfg use in general is deprecated as much as possible (and renamed
_systemcfg before it gets completely moved elsewhere in a future patch),
_machine is now used on ppc64 along as ppc32. Platform codes aren't gone
yet but we are getting a step closer. A bunch of asm code in head[_64].S
is also turned into C code.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# a31751e0 08-Nov-2005 Matt Porter <mporter@kernel.crashing.org>

[PATCH] ppc32: fix perf_irq extern on e500

Fixes e500 build and cleans up traps.c by moving perf_irq extern to
pmc.h.

Signed-off-by: Matt Porter <mporter@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 570142ca 07-Nov-2005 Anton Blanchard <anton@samba.org>

[PATCH] ppc64: remove some direct xmon calls

Even though we can enable and disable xmon at runtime now, there are a
few places in the merge tree that call xmon and xmon_printf directly.

In the case below we call die() which will call xmon if it is enabled.

Also remove an unnecessary include of xmon.h in smp.c.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 104dd65f 01-Nov-2005 Paul Mackerras <paulus@samba.org>

powerpc: clean up bug.h further

This simplifies the macros which are different between 32-bit and
64-bit. It also fixes a couple of printks on the bug->line element,
which is now a long.

Signed-off-by: Paul Mackerras <paulus@samba.org>


# f3f66f59 31-Oct-2005 Arnd Bergmann <arndb@de.ibm.com>

[PATCH] powerpc: Rename BPA to Cell

The official name for BPA is now CBEA (Cell Broadband
Engine Architecture). This patch renames all occurences
of the term BPA to 'Cell' for easier recognition.

Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 25c8a78b 27-Oct-2005 David Gibson <david@gibson.dropbear.id.au>

[PATCH] powerpc: Fix handling of fpscr on 64-bit

The recent merge of fpu.S broken the handling of fpscr for
ARCH=powerpc and CONFIG_PPC64=y. FP registers could be corrupted,
leading to strange random application crashes.

The confusion arises, because the thread_struct has (and requires) a
64-bit area to save the fpscr, because we use load/store double
instructions to get it in to/out of the FPU. However, only the low
32-bits are actually used, so we want to treat it as a 32-bit quantity
when manipulating its bits to avoid extra load/stores on 32-bit. This
patch replaces the current definition with a structure of two 32-bit
quantities (pad and val), to clarify things as much as is possible.
The 'val' field is used when manipulating bits, the structure itself
is used when obtaining the address for loading/unloading the value
from the FPU.

While we're at it, consolidate the 4 (!) almost identical versions of
cvt_fd() and cvt_df() (arch/ppc/kernel/misc.S,
arch/ppc64/kernel/misc.S, arch/powerpc/kernel/misc_32.S,
arch/powerpc/kernel/misc_64.S) into a single version in fpu.S. The
new version takes a pointer to thread_struct and applies the correct
offset itself, rather than a pointer to the fpscr field itself, again
to avoid confusion as to which is the correct field to use.

Finally, this patch makes ARCH=ppc64 also use the consolidated fpu.S
code, which it previously did not.

Built for G5 (ARCH=ppc64 and ARCH=powerpc), 32-bit powermac (ARCH=ppc
and ARCH=powerpc) and Walnut (ARCH=ppc, CONFIG_MATH_EMULATION=y).
Booted on G5 (ARCH=powerpc) and things which previously fell over no
longer do.

Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# f7f6f4fe 18-Oct-2005 David Gibson <david@gibson.dropbear.id.au>

[PATCH] powerpc: Merge ppc64 pmc.[ch] with ppc32 perfmon.[ch]

This patches the ppc32 and ppc64 versions of the headers and .c files
with helper functions for manipulating the performance counting
hardware. As a side effect, it removes use of the term "perfmon" from
ppc32, thus avoiding confusion with the unrelated performance counter
interface from HP Labs also called "perfmon".

Built, but not booted, for g5, pSeries, iSeries, and 32-bit Powermac
with both ARCH=powerpc and ARCH=ppc{,64} as appropriate.

Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>


# 86417780 10-Oct-2005 Paul Mackerras <paulus@samba.org>

powerpc: Reduce the 32/64-bit differences in traps.c

Signed-off-by: Paul Mackerras <paulus@samba.org>


# 8dad3f92 05-Oct-2005 Paul Mackerras <paulus@samba.org>

powerpc: Merge traps.c a bit more

This reduces the differences between ppc32 and ppc64 in
arch/powerpc/kernel/traps.c a bit further.

Signed-off-by: Paul Mackerras <paulus@samba.org>


# dc1c1ca3 01-Oct-2005 Stephen Rothwell <sfr@canb.auug.org.au>

powerpc: merge idle_power4.S and trapc.s

Use idle_power4.S from ppc64 as we are not going to support
32 bit power4 in the merged tree.

Merge ppc64 traps.c into powerpc traps.c:
use ppc64 versions of exception routine names
(as they don't have StudlyCaps)
make all the versions if die() have the same
prototype

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>


# 14cf11af 26-Sep-2005 Paul Mackerras <paulus@samba.org>

powerpc: Merge enough to start building in arch/powerpc.

This creates the directory structure under arch/powerpc and a bunch
of Kconfig files. It does a first-cut merge of arch/powerpc/mm,
arch/powerpc/lib and arch/powerpc/platforms/powermac. This is enough
to build a 32-bit powermac kernel with ARCH=powerpc.

For now we are getting some unmerged files from arch/ppc/kernel and
arch/ppc/syslib, or arch/ppc64/kernel. This makes some minor changes
to files in those directories and files outside arch/powerpc.

The boot directory is still not merged. That's going to be interesting.

Signed-off-by: Paul Mackerras <paulus@samba.org>