#
8c328de8 |
|
15-Feb-2024 |
Shrikanth Hegde <sshegde@linux.ibm.com> |
powerpc: Remove duplicate/unnecessary ifdefs When an ifdef is used in the below manner, second one could be considered as duplicate. ifdef DEFINE_A ...code block... ifdef DEFINE_A <-- This is a duplicate. ...code block... endif else ifndef DEFINE_A <-- This is also duplicate. ...code block... endif endif More details about the script and methods used to find these code patterns are in cover letter of [1]. Few places in arch/powerpc where this pattern was seen: paca.h: Hunk1: Code is under check of CONFIG_PPC64 from line 13, hence the second CONFIG_PPC64 at line 166 is a duplicate. Hunk2: CONFIG_PPC_BOOK3S_64 was defined back to back. Merged the two ifdefs. asm-offsets.c: Code is under check of CONFIG_PPC64 from line 176 hence second CONFIG_PPC64 at line 249 is a duplicate. powermac/feature.c: #ifndef CONFIG_PPC64 is used at line 2066. And then in #else again #ifdef CONFIG_PPC64 is used. Which is a duplicate since in #else means CONFIG_PPC64 is defined. xmon.c: Code is under the check of CONFIG_SMP from line 521 hence the same check of CONFIG_SMP at line 646 is a duplicate. No functional change is intended here. It only aims to improve code readability. [1] https://lore.kernel.org/all/20240118080326.13137-1-sshegde@linux.ibm.com/ Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240216053016.528906-1-sshegde@linux.ibm.com
|
#
1aa00066 |
|
22-Aug-2023 |
Michael Ellerman <mpe@ellerman.id.au> |
powerpc: Don't include lppaca.h in paca.h By adding a forward declaration for struct lppaca we can untangle paca.h and lppaca.h. Also move get_lppaca() into lppaca.h for consistency. Add includes of lppaca.h to some files that need it. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230823055317.751786-3-mpe@ellerman.id.au
|
#
b0c5b4f1 |
|
09-May-2023 |
Rohan McLure <rmclure@linux.ibm.com> |
powerpc: powernv: Fix KCSAN datarace warnings on idle_state contention The idle_state entry in the PACA on PowerNV features a bit which is atomically tested and set through ldarx/stdcx. to be used as a spinlock. This lock then guards access to other bit fields of idle_state. KCSAN cannot differentiate between any of these bitfield accesses as they all are implemented by 8-byte store/load instructions, thus cores contending on the bit-lock appear to data race with modifications to idle_state. Separate the bit-lock entry from the data guarded by the lock to avoid the possibility of data races being detected by KCSAN. Suggested-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Rohan McLure <rmclure@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230510033117.1395895-7-rmclure@linux.ibm.com
|
#
7e3a68be |
|
07-Apr-2023 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64: vmlinux support building with PCREL addresing PC-Relative or PCREL addressing is an extension to the ELF ABI which uses Power ISA v3.1 PC-relative instructions to calculate addresses, rather than the traditional TOC scheme. Add an option to build vmlinux using pcrel addressing. Modules continue to use TOC addressing. - TOC address helpers and r2 are poisoned with -1 when running vmlinux. r2 could be used for something useful once things are ironed out. - Assembly must call C functions with @notoc annotation, or the linker complains aobut a missing nop after the call. This is done with the CFUNC macro introduced earlier. - Boot: with the exception of prom_init, the execution branches to the kernel virtual address early in boot, before any addresses are generated, which ensures 34-bit pcrel addressing does not miss the high PAGE_OFFSET bits. TOC relative addressing has a similar requirement. prom_init does not go to the virtual address and its addresses should not carry over to the post-prom kernel. - Ftrace trampolines are converted from TOC addressing to pcrel addressing, including module ftrace trampolines that currently use the kernel TOC to find ftrace target functions. - BPF function prologue and function calling generation are converted from TOC to pcrel. - copypage_64.S has an interesting problem, prefixed instructions have alignment restrictions so the linker can add padding, which makes the assembler treat the difference between two local labels as non-constant even if alignment is arranged so padding is not required. This may need toolchain help to solve nicely, for now move the prefix instruction out of the alternate patch section to work around it. This reduces kernel text size by about 6%. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230408021752.862660-6-npiggin@gmail.com
|
#
dc222fa7 |
|
16-Dec-2022 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64: Move paca allocation to early_setup() The early paca and boot cpuid dance is complicated and currently does not quite work as expected for boot cpuid != 0 cases. early_init_devtree() currently allocates the paca_ptrs and boot cpuid paca, but until that returns and early_setup() calls setup_paca(), this thread is currently still executing with smp_processor_id() == 0. One problem this causes is the paca_ptrs[smp_processor_id()] pointer is poisoned, so valid_emergency_stack() (any backtrace) and any similar users will crash. Another is that the hardware id which is set here will not be returned by get_hard_smp_processor_id(smp_processor_id()), but it would work correctly for boot_cpuid == 0, which could lead to difficult to reproduce or find bugs. The hard id does not seem to be used by the rest of early_init_devtree(), it just looks like all this code might have been put here to allocate somewhere to store boot CPU hardware id while scanning the devtree. Rearrange things so the hwid is put in a global variable like boot_cpuid, and do all the paca allocation and boot paca setup in the 64-bit early_setup() after we have everything ready to go. The paca_ptrs[0] re-poisoning code in early_setup does not seem to have ever worked, because paca_ptrs[0] was never not-poisoned when boot_cpuid is not 0. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Fix build error on 32-bit] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221216115930.2667772-4-npiggin@gmail.com
|
#
e0d68273 |
|
19-Sep-2022 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
powerpc: Remove CONFIG_PPC_BOOK3E CONFIG_PPC_BOOK3E is redundant with CONFIG_PPC_BOOK3E_64. The later is more explicit about the fact that it's a 64 bits target. Remove CONFIG_PPC_BOOK3E. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/5d0891490813c19cdcfc04678f512ea68cba3e64.1663606876.git.christophe.leroy@csgroup.eu
|
#
f88aabad |
|
07-Sep-2022 |
Nathan Lynch <nathanl@linux.ibm.com> |
Revert "powerpc/rtas: Implement reentrant rtas call" At the time this was submitted by Leonardo, I confirmed -- or thought I had confirmed -- with PowerVM partition firmware development that the following RTAS functions: - ibm,get-xive - ibm,int-off - ibm,int-on - ibm,set-xive were safe to call on multiple CPUs simultaneously, not only with respect to themselves as indicated by PAPR, but with arbitrary other RTAS calls: https://lore.kernel.org/linuxppc-dev/875zcy2v8o.fsf@linux.ibm.com/ Recent discussion with firmware development makes it clear that this is not true, and that the code in commit b664db8e3f97 ("powerpc/rtas: Implement reentrant rtas call") is unsafe, likely explaining several strange bugs we've seen in internal testing involving DLPAR and LPM. These scenarios use ibm,configure-connector, whose internal state can be corrupted by the concurrent use of the "reentrant" functions, leading to symptoms like endless busy statuses from RTAS. Fixes: b664db8e3f97 ("powerpc/rtas: Implement reentrant rtas call") Cc: stable@vger.kernel.org # v5.8+ Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Reviewed-by: Laurent Dufour <laurent.dufour@fr.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220907220111.223267-1-nathanl@linux.ibm.com
|
#
dcf280e6 |
|
27-May-2022 |
Michael Ellerman <mpe@ellerman.id.au> |
powerpc/64: Include cache.h directly in paca.h paca.h uses ____cacheline_aligned without directly including cache.h, where it's defined. For Book3S builds that's OK because paca.h includes lppaca.h, and it does include cache.h. But Book3E builds have been getting cache.h indirectly via printk.h, which is dicey, and in fact that include was recently removed, leading to build errors such as: ld: fs/isofs/dir.o:(.bss+0x0): multiple definition of `____cacheline_aligned'; fs/isofs/namei.o:(.bss+0x0): first defined here So include cache.h directly to fix the build error. Fixes: 534aa1dc975a ("printk: stop including cache.h from printk.h") Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
f693d38d |
|
09-Apr-2022 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
powerpc/mm: Remove CONFIG_PPC_MM_SLICES CONFIG_PPC_MM_SLICES is always selected by hash book3s/64. CONFIG_PPC_MM_SLICES is never selected by other platforms. Remove it. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/dc2cdc204de8978574bf7c02329b6cfc4db0bce7.1649523076.git.christophe.leroy@csgroup.eu
|
#
cc15ff32 |
|
20-Jan-2022 |
Ganesh Goudar <ganeshgr@linux.ibm.com> |
powerpc/mce: Avoid using irq_work_queue() in realmode In realmode mce handler we use irq_work_queue() to defer the processing of mce events, irq_work_queue() can only be called when translation is enabled because it touches memory outside RMA, hence we enable translation before calling irq_work_queue and disable on return, though it is not safe to do in realmode. To avoid this, program the decrementer and call the event processing functions from timer handler. Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220120121931.517974-1-ganeshgr@linux.ibm.com
|
#
387e220a |
|
01-Dec-2021 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64s: Move hash MMU support code under CONFIG_PPC_64S_HASH_MMU Compiling out hash support code when CONFIG_PPC_64S_HASH_MMU=n saves 128kB kernel image size (90kB text) on powernv_defconfig minus KVM, 350kB on pseries_defconfig minus KVM, 40kB on a tiny config. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Fixup defined(ARCH_HAS_MEMREMAP_COMPAT_ALIGN), which needs CONFIG. Fix radix_enabled() use in setup_initial_memory_limit(). Add some stubs to reduce number of ifdefs.] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211201144153.2456614-18-npiggin@gmail.com
|
#
13799748 |
|
17-Jun-2021 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64: use interrupt restart table to speed up return from interrupt Use the restart table facility to return from interrupt or system calls without disabling MSR[EE] or MSR[RI]. Interrupt return asm is put into the low soft-masked region, to prevent interrupts being processed here, although they are still taken as masked interrupts which causes SRRs to be clobbered, and a pending soft-masked interrupt to require replaying. The return code uses restart table regions to redirct to a fixup handler rather than continue with the exit, if such an interrupt happens. In this case the interrupt return is redirected to a fixup handler which reloads r1 for the interrupt stack and reloads registers and sets state up to replay the soft-masked interrupt and try the exit again. Some types of security exit fallback flushes and barriers are currently unable to cope with reentrant interrupts, e.g., because they store some state in the scratch SPR which would be clobbered even by masked interrupts. For now the interrupts-enabled exits are disabled when these flushes are used. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Guard unused exit_must_hard_disable() as reported by lkp] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210617155116.2167984-13-npiggin@gmail.com
|
#
59dc5bfc |
|
17-Jun-2021 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64s: avoid reloading (H)SRR registers if they are still valid When an interrupt is taken, the SRR registers are set to return to where it left off. Unless they are modified in the meantime, or the return address or MSR are modified, there is no need to reload these registers when returning from interrupt. Introduce per-CPU flags that track the validity of SRR and HSRR registers. These are cleared when returning from interrupt, when using the registers for something else (e.g., OPAL calls), when adjusting the return address or MSR of a context, and when context switching (which changes the return address and MSR). This improves the performance of interrupt returns. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Fold in fixup patch from Nick] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210617155116.2167984-5-npiggin@gmail.com
|
#
13c7dad9 |
|
05-May-2021 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
powerpc/paca: Remove mm_ctx_id and mm_ctx_slb_addr_limit mm_ctx_id and mm_ctx_slb_addr_limit are not used anymore. Remove them. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/6e1813953da38c452c131fe3e2a2761a0fddb975.1620223303.git.christophe.leroy@csgroup.eu
|
#
ac7c5e9b |
|
07-Feb-2021 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64s: Remove EXSLB interrupt save area SLB faults should not be taken while the PACA save areas are live, all memory accesses should be fetches from the kernel text, and access to PACA and the current stack, before C code is called or any other accesses are made. All of these have pinned SLBs so will not take a SLB fault. Therefore EXSLB is not be required. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210208063406.331655-1-npiggin@gmail.com
|
#
6c6fdbb2 |
|
25-Jan-2021 |
Chengyang Fan <cy.fan@huawei.com> |
powerpc: remove unneeded semicolons Remove superfluous semicolons after function definitions. Signed-off-by: Chengyang Fan <cy.fan@huawei.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210125095338.1719405-1-cy.fan@huawei.com
|
#
923b3cf0 |
|
28-Jan-2021 |
Ganesh Goudar <ganeshgr@linux.ibm.com> |
powerpc/mce: Remove per cpu variables from MCE handlers Access to per-cpu variables requires translation to be enabled on pseries machine running in hash mmu mode, Since part of MCE handler runs in realmode and part of MCE handling code is shared between ppc architectures pseries and powernv, it becomes difficult to manage these variables differently on different architectures, So have these variables in paca instead of having them as per-cpu variables to avoid complications. Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210128104143.70668-2-ganeshgr@linux.ibm.com
|
#
ada68a66 |
|
23-Jun-2020 |
Mahesh Salgaonkar <mahesh@linux.ibm.com> |
powerpc/64s: Move HMI IRQ stat from percpu variable to paca. With the proposed change in percpu bootmem allocator to use page mapping [1], the percpu first chunk memory area can come from vmalloc ranges. This makes the HMI (Hypervisor Maintenance Interrupt) handler crash the kernel whenever percpu variable is accessed in real mode. This patch fixes this issue by moving the HMI IRQ stat inside paca for safe access in realmode. [1] https://lore.kernel.org/linuxppc-dev/20200608070904.387440-1-aneesh.kumar@linux.ibm.com/ Suggested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/159290806973.3642154.5244613424529764050.stgit@jupiter
|
#
d6bdceb6 |
|
29-May-2020 |
Peter Zijlstra <peterz@infradead.org> |
powerpc64: Break asm/percpu.h vs spinlock_types.h dependency In order to use <asm/percpu.h> in lockdep.h, we need to make sure asm/percpu.h does not itself depend on lockdep. The below seems to make that so and builds powerpc64-defconfig + PROVE_LOCKING. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ingo Molnar <mingo@kernel.org> https://lkml.kernel.org/r/20200623083721.336906073@infradead.org
|
#
b664db8e |
|
18-May-2020 |
Leonardo Bras <leobras.c@gmail.com> |
powerpc/rtas: Implement reentrant rtas call Implement rtas_call_reentrant() for reentrant rtas-calls: "ibm,int-on", "ibm,int-off",ibm,get-xive" and "ibm,set-xive". On LoPAPR Version 1.1 (March 24, 2016), from 7.3.10.1 to 7.3.10.4, items 2 and 3 say: 2 - For the PowerPC External Interrupt option: The * call must be reentrant to the number of processors on the platform. 3 - For the PowerPC External Interrupt option: The * argument call buffer for each simultaneous call must be physically unique. So, these rtas-calls can be called in a lockless way, if using a different buffer for each cpu doing such rtas call. For this, it was suggested to add the buffer (struct rtas_args) in the PACA struct, so each cpu can have it's own buffer. The PACA struct received a pointer to rtas buffer, which is allocated in the memory range available to rtas 32-bit. Reentrant rtas calls are useful to avoid deadlocks in crashing, where rtas-calls are needed, but some other thread crashed holding the rtas.lock. This is a backtrace of a deadlock from a kdump testing environment: #0 arch_spin_lock #1 lock_rtas () #2 rtas_call (token=8204, nargs=1, nret=1, outputs=0x0) #3 ics_rtas_mask_real_irq (hw_irq=4100) #4 machine_kexec_mask_interrupts #5 default_machine_crash_shutdown #6 machine_crash_shutdown #7 __crash_kexec #8 crash_kexec #9 oops_end Signed-off-by: Leonardo Bras <leobras.c@gmail.com> [mpe: Move under #ifdef PSERIES to avoid build breakage] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200518234245.200672-3-leobras.c@gmail.com
|
#
0a882e28 |
|
28-Jun-2019 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64s/exception: remove bad stack branch The bad stack test in interrupt handlers has a few problems. For performance it is taken in the common case, which is a fetch bubble and a waste of i-cache. For code development and maintainence, it requires yet another stack frame setup routine, and that constrains all exception handlers to follow the same register save pattern which inhibits future optimisation. Remove the test/branch and replace it with a trap. Teach the program check handler to use the emergency stack for this case. This does not result in quite so nice a message, however the SRR0 and SRR1 of the crashed interrupt can be seen in r11 and r12, as is the original r1 (adjusted by INT_FRAME_SIZE). These are the most important parts to debugging the issue. The original r9-12 and cr0 is lost, which is the main downside. kernel BUG at linux/arch/powerpc/kernel/exceptions-64s.S:847! Oops: Exception in kernel mode, sig: 5 [#1] BE SMP NR_CPUS=2048 NUMA PowerNV Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted NIP: c000000000009108 LR: c000000000cadbcc CTR: c0000000000090f0 REGS: c0000000fffcbd70 TRAP: 0700 Not tainted MSR: 9000000000021032 <SF,HV,ME,IR,DR,RI> CR: 28222448 XER: 20040000 CFAR: c000000000009100 IRQMASK: 0 GPR00: 000000000000003d fffffffffffffd00 c0000000018cfb00 c0000000f02b3166 GPR04: fffffffffffffffd 0000000000000007 fffffffffffffffb 0000000000000030 GPR08: 0000000000000037 0000000028222448 0000000000000000 c000000000ca8de0 GPR12: 9000000002009032 c000000001ae0000 c000000000010a00 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: c0000000f00322c0 c000000000f85200 0000000000000004 ffffffffffffffff GPR24: fffffffffffffffe 0000000000000000 0000000000000000 000000000000000a GPR28: 0000000000000000 0000000000000000 c0000000f02b391c c0000000f02b3167 NIP [c000000000009108] decrementer_common+0x18/0x160 LR [c000000000cadbcc] .vsnprintf+0x3ec/0x4f0 Call Trace: Instruction dump: 996d098a 994d098b 38610070 480246ed 48005518 60000000 38200000 718a4000 7c2a0b78 3821fd00 41c20008 e82d0970 <0981fd00> f92101a0 f9610170 f9810178 Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
2874c5fd |
|
27-May-2019 |
Thomas Gleixner <tglx@linutronix.de> |
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 3029 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
10d91611 |
|
12-Apr-2019 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64s: Reimplement book3s idle code in C Reimplement Book3S idle code in C, moving POWER7/8/9 implementation speific HV idle code to the powernv platform code. Book3S assembly stubs are kept in common code and used only to save the stack frame and non-volatile GPRs before executing architected idle instructions, and restoring the stack and reloading GPRs then returning to C after waking from idle. The complex logic dealing with threads and subcores, locking, SPRs, HMIs, timebase resync, etc., is all done in C which makes it more maintainable. This is not a strict translation to C code, there are some significant differences: - Idle wakeup no longer uses the ->cpu_restore call to reinit SPRs, but saves and restores them itself. - The optimisation where EC=ESL=0 idle modes did not have to save GPRs or change MSR is restored, because it's now simple to do. ESL=1 sleeps that do not lose GPRs can use this optimization too. - KVM secondary entry and cede is now more of a call/return style rather than branchy. nap_state_lost is not required because KVM always returns via NVGPR restoring path. - KVM secondary wakeup from offline sequence is moved entirely into the offline wakeup, which avoids a hwsync in the normal idle wakeup path. Performance measured with context switch ping-pong on different threads or cores, is possibly improved a small amount, 1-3% depending on stop state and core vs thread test for shallow states. Deep states it's in the noise compared with other latencies. KVM improvements: - Idle sleepers now always return to caller rather than branch out to KVM first. - This allows optimisations like very fast return to caller when no state has been lost. - KVM no longer requires nap_state_lost because it controls NVGPR save/restore itself on the way in and out. - The heavy idle wakeup KVM request check can be moved out of the normal host idle code and into the not-performance-critical offline code. - KVM nap code now returns from where it is called, which makes the flow a bit easier to follow. Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Squash the KVM changes in] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
420af155 |
|
22-Feb-2019 |
Will Deacon <will@kernel.org> |
powerpc/mmiowb: Hook up mmwiob() implementation to asm-generic code In a bid to kill off explicit mmiowb() usage in driver code, hook up the asm-generic mmiowb() tracking code but provide a definition of arch_mmiowb_state() so that the tracking data can remain in the paca as it does at present This replaces the existing (flawed) implementation. Acked-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
e15a4fea |
|
02-Oct-2018 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64s/hash: Add some SLB debugging tests This adds CONFIG_DEBUG_VM checks to ensure: - The kernel stack is in the SLB after it's flushed and bolted. - We don't insert an SLB for an address that is aleady in the SLB. - The kernel SLB miss handler does not take an SLB miss. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
126b11b2 |
|
14-Sep-2018 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64s/hash: Add SLB allocation status bitmaps Add 32-entry bitmaps to track the allocation status of the first 32 SLB entries, and whether they are user or kernel entries. These are used to allocate free SLB entries first, before resorting to the round robin allocator. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
06ec27ae |
|
27-Sep-2018 |
Christophe Leroy <christophe.leroy@c-s.fr> |
powerpc/64: add stack protector support On PPC64, as register r13 points to the paca_struct at all time, this patch adds a copy of the canary there, which is copied at task_switch. That new canary is then used by using the following GCC options: -mstack-protector-guard=tls -mstack-protector-guard-reg=r13 -mstack-protector-guard-offset=offsetof(struct paca_struct, canary)) Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
54be0b9c |
|
02-Oct-2018 |
Michael Ellerman <mpe@ellerman.id.au> |
Revert "convert SLB miss handlers to C" and subsequent commits This reverts commits: 5e46e29e6a97 ("powerpc/64s/hash: convert SLB miss handlers to C") 8fed04d0f6ae ("powerpc/64s/hash: remove user SLB data from the paca") 655deecf67b2 ("powerpc/64s/hash: SLB allocation status bitmaps") 2e1626744e8d ("powerpc/64s/hash: provide arch_setup_exec hooks for hash slice setup") 89ca4e126a3f ("powerpc/64s/hash: Add a SLB preload cache") This series had a few bugs, and the fixes are not all trivial. So revert most of it for now. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
655deecf |
|
14-Sep-2018 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64s/hash: SLB allocation status bitmaps Add 32-entry bitmaps to track the allocation status of the first 32 SLB entries, and whether they are user or kernel entries. These are used to allocate free SLB entries first, before resorting to the round robin allocator. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
8fed04d0 |
|
14-Sep-2018 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64s/hash: remove user SLB data from the paca User SLB mappig data is copied into the PACA from the mm->context so it can be accessed by the SLB miss handlers. After the C conversion, SLB miss handlers now run with relocation on, and user SLB misses are able to take recursive kernel SLB misses, so the user SLB mapping data can be removed from the paca and accessed directly. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
c6d15258 |
|
11-Sep-2018 |
Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> |
powerpc/pseries: Dump the SLB contents on SLB MCE errors. If we get a machine check exceptions due to SLB errors then dump the current SLB contents which will be very much helpful in debugging the root cause of SLB errors. Introduce an exclusive buffer per cpu to hold faulty SLB entries. In real mode mce handler saves the old SLB contents into this buffer accessible through paca and print it out later in virtual mode. With this patch the console will log SLB contents like below on SLB MCE errors: [ 507.297236] SLB contents of cpu 0x1 [ 507.297237] Last SLB entry inserted at slot 16 [ 507.297238] 00 c000000008000000 400ea1b217000500 [ 507.297239] 1T ESID= c00000 VSID= ea1b217 LLP:100 [ 507.297240] 01 d000000008000000 400d43642f000510 [ 507.297242] 1T ESID= d00000 VSID= d43642f LLP:110 [ 507.297243] 11 f000000008000000 400a86c85f000500 [ 507.297244] 1T ESID= f00000 VSID= a86c85f LLP:100 [ 507.297245] 12 00007f0008000000 4008119624000d90 [ 507.297246] 1T ESID= 7f VSID= 8119624 LLP:110 [ 507.297247] 13 0000000018000000 00092885f5150d90 [ 507.297247] 256M ESID= 1 VSID= 92885f5150 LLP:110 [ 507.297248] 14 0000010008000000 4009e7cb50000d90 [ 507.297249] 1T ESID= 1 VSID= 9e7cb50 LLP:110 [ 507.297250] 15 d000000008000000 400d43642f000510 [ 507.297251] 1T ESID= d00000 VSID= d43642f LLP:110 [ 507.297252] 16 d000000008000000 400d43642f000510 [ 507.297253] 1T ESID= d00000 VSID= d43642f LLP:110 [ 507.297253] ---------------------------------- [ 507.297254] SLB cache ptr value = 3 [ 507.297254] Valid SLB cache entries: [ 507.297255] 00 EA[0-35]= 7f000 [ 507.297256] 01 EA[0-35]= 1 [ 507.297257] 02 EA[0-35]= 1000 [ 507.297257] Rest of SLB cache entries: [ 507.297258] 03 EA[0-35]= 7f000 [ 507.297258] 04 EA[0-35]= 1 [ 507.297259] 05 EA[0-35]= 1000 [ 507.297260] 06 EA[0-35]= 12 [ 507.297260] 07 EA[0-35]= 7f000 Suggested-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
94675cce |
|
04-Jul-2018 |
Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> |
powerpc/pseries: Defer the logging of rtas error to irq work queue. rtas_log_buf is a buffer to hold RTAS event data that are communicated to kernel by hypervisor. This buffer is then used to pass RTAS event data to user through proc fs. This buffer is allocated from vmalloc (non-linear mapping) area. On Machine check interrupt, register r3 points to RTAS extended event log passed by hypervisor that contains the MCE event. The pseries machine check handler then logs this error into rtas_log_buf. The rtas_log_buf is a vmalloc-ed (non-linear) buffer we end up taking up a page fault (vector 0x300) while accessing it. Since machine check interrupt handler runs in NMI context we can not afford to take any page fault. Page faults are not honored in NMI context and causes kernel panic. Apart from that, as Nick pointed out, pSeries_log_error() also takes a spin_lock while logging error which is not safe in NMI context. It may endup in deadlock if we get another MCE before releasing the lock. Fix this by deferring the logging of rtas error to irq work queue. Current implementation uses two different buffers to hold rtas error log depending on whether extended log is provided or not. This makes bit difficult to identify which buffer has valid data that needs to logged later in irq work. Simplify this using single buffer, one per paca, and copy rtas log to it irrespective of whether extended log is provided or not. Allocate this buffer below RMA region so that it can be accessed in real mode mce handler. Fixes: b96672dd840f ("powerpc: Machine check interrupt is a non-maskable interrupt") Cc: stable@vger.kernel.org # v4.14+ Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
2bf1071a |
|
05-Jul-2018 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64s: Remove POWER9 DD1 support POWER9 DD1 was never a product. It is no longer supported by upstream firmware, and it is not effectively supported in Linux due to lack of testing. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Michael Ellerman <mpe@ellerman.id.au> [mpe: Remove arch_make_huge_pte() entirely] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
7b08729c |
|
02-May-2018 |
Michael Ellerman <mpe@ellerman.id.au> |
powerpc/64: Save stack pointer when we hard disable interrupts A CPU that gets stuck with interrupts hard disable can be difficult to debug, as on some platforms we have no way to interrupt the CPU to find out what it's doing. A stop-gap is to have the CPU save it's stack pointer (r1) in its paca when it hard disables interrupts. That way if we can't interrupt it, we can at least trace the stack based on where it last disabled interrupts. In some cases that will be total junk, but the stack trace code should handle that. In the simple case of a CPU that disable interrupts and then gets stuck in a loop, the stack trace should be informative. We could clear the saved stack pointer when we enable interrupts, but that loses information which could be useful if we have nothing else to go on. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
|
#
c4ec1f03 |
|
02-May-2018 |
Michael Ellerman <mpe@ellerman.id.au> |
powerpc/64: Remove unused paca->soft_enabled In commit 4e26bc4a4ed6 ("powerpc/64: Rename soft_enabled to irq_soft_mask") we renamed paca->soft_enabled. But then in commit 8e0b634b1327 ("powerpc/64s: Do not allocate lppaca if we are not virtualized") we added it back. Oops. This happened because the two patches were in flight at the same time and rebased vs each other multiple times, and we missed it in review. Fixes: 8e0b634b1327 ("powerpc/64s: Do not allocate lppaca if we are not virtualized") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
ea678ac6 |
|
18-Apr-2018 |
Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> |
powerpc64/ftrace: Add a field in paca to disable ftrace in unsafe code paths We have some C code that we call into from real mode where we cannot take any exceptions. Though the C functions themselves are mostly safe, if these functions are traced, there is a possibility that we may take an exception. For instance, in certain conditions, the ftrace code uses WARN(), which uses a 'trap' to do its job. For such scenarios, introduce a new field in paca 'ftrace_enabled', which is checked on ftrace entry before continuing. This field can then be set to zero to disable/pause ftrace, and set to a non-zero value to resume ftrace. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
59f57774 |
|
13-Feb-2018 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64: Defer paca allocation until memory topology is discovered Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Rename the dummy allocate_pacas() to fix 32-bit build] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
d2e60075 |
|
13-Feb-2018 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64: Use array of paca pointers and allocate pacas individually Change the paca array into an array of pointers to pacas. Allocate pacas individually. This allows flexibility in where the PACAs are allocated. Future work will allocate them node-local. Platforms that don't have address limits on PACAs would be able to defer PACA allocations until later in boot rather than allocate all possible ones up-front then freeing unused. This is slightly more overhead (one additional indirection) for cross CPU paca references, but those aren't too common. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
8e0b634b |
|
13-Feb-2018 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64s: Do not allocate lppaca if we are not virtualized The "lppaca" is a structure registered with the hypervisor. This is unnecessary when running on non-virtualised platforms. One field from the lppaca (pmcregs_in_use) is also used by the host, so move the host part out into the paca (lppaca field is still updated in guest mode). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Fix non-pseries build with some #ifdefs] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
7672691a |
|
21-Mar-2018 |
Paul Mackerras <paulus@ozlabs.org> |
powerpc/powernv: Provide a way to force a core into SMT4 mode POWER9 processors up to and including "Nimbus" v2.2 have hardware bugs relating to transactional memory and thread reconfiguration. One of these bugs has a workaround which is to get the core into SMT4 state temporarily. This workaround is only needed when running bare-metal. This patch provides a function which gets the core into SMT4 mode by preventing threads from going to a stop state, and waking up those which are already in a stop state. Once at least 3 threads are not in a stop state, the core will be in SMT4 and we can continue. To do this, we add a "dont_stop" flag to the paca to tell the thread not to go into a stop state. If this flag is set, power9_idle_stop() just returns immediately with a return value of 0. The pnv_power9_force_smt4_catch() function does the following: 1. Set the dont_stop flag for each thread in the core, except ourselves (in fact we use an atomic_inc() in case more than one thread is calling this function concurrently). 2. See how many threads are awake, indicated by their requested_psscr field in the paca being 0. If this is at least 3, skip to step 5. 3. Send a doorbell interrupt to each thread that was seen as being in a stop state in step 2. 4. Until at least 3 threads are awake, scan the threads to which we sent a doorbell interrupt and check if they are awake now. This relies on the following properties: - Once dont_stop is non-zero, requested_psccr can't go from zero to non-zero, except transiently (and without the thread doing stop). - requested_psscr being zero guarantees that the thread isn't in a state-losing stop state where thread reconfiguration could occur. - Doing stop with a PSSCR value of 0 won't be a state-losing stop and thus won't allow thread reconfiguration. - Once threads_per_core/2 + 1 (i.e. 3) threads are awake, the core must be in SMT4 mode, since SMT modes are powers of 2. This does add a sync to power9_idle_stop(), which is necessary to provide the correct ordering between setting requested_psscr and checking dont_stop. The overhead of the sync should be unnoticeable compared to the latency of going into and out of a stop state. Because some objected to incurring this extra latency on systems where the XER[SO] bug is not relevant, I have put the test in power9_idle_stop inside a feature section. This means that pnv_power9_force_smt4_catch() WILL NOT WORK correctly on systems without the CPU_FTR_P9_TM_XER_SO_BUG feature bit set, and will probably hang the system. In order to cater for uses where the caller has an operation that has to be done while the core is in SMT4, the core continues to be kept in SMT4 after pnv_power9_force_smt4_catch() function returns, until the pnv_power9_force_smt4_release() function is called. It undoes the effect of step 1 above and allows the other threads to go into a stop state. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
15472423 |
|
22-Feb-2018 |
Christophe Leroy <christophe.leroy@c-s.fr> |
powerpc/mm/slice: Allow up to 64 low slices While the implementation of the "slices" address space allows a significant amount of high slices, it limits the number of low slices to 16 due to the use of a single u64 low_slices_psize element in struct mm_context_t On the 8xx, the minimum slice size is the size of the area covered by a single PMD entry, ie 4M in 4K pages mode and 64M in 16K pages mode. This means we could have at least 64 slices. In order to override this limitation, this patch switches the handling of low_slices_psize to char array as done already for high_slices_psize. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
bdcb1aef |
|
17-Jan-2018 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64s: Improve RFI L1-D cache flush fallback The fallback RFI flush is used when firmware does not provide a way to flush the cache. It's a "displacement flush" that evicts useful data by displacing it with an uninteresting buffer. The flush has to take care to work with implementation specific cache replacment policies, so the recipe has been in flux. The initial slow but conservative approach is to touch all lines of a congruence class, with dependencies between each load. It has since been determined that a linear pattern of loads without dependencies is sufficient, and is significantly faster. Measuring the speed of a null syscall with RFI fallback flush enabled gives the relative improvement: P8 - 1.83x P9 - 1.75x The flush also becomes simpler and more adaptable to different cache geometries. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
4e26bc4a |
|
19-Dec-2017 |
Madhavan Srinivasan <maddy@linux.vnet.ibm.com> |
powerpc/64: Rename soft_enabled to irq_soft_mask Rename the paca->soft_enabled to paca->irq_soft_mask as it is no longer used as a flag for interrupt state, but a mask. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
aa8a5e00 |
|
09-Jan-2018 |
Michael Ellerman <mpe@ellerman.id.au> |
powerpc/64s: Add support for RFI flush of L1-D cache On some CPUs we can prevent the Meltdown vulnerability by flushing the L1-D cache on exit from kernel to user mode, and from hypervisor to guest. This is known to be the case on at least Power7, Power8 and Power9. At this time we do not know the status of the vulnerability on other CPUs such as the 970 (Apple G5), pasemi CPUs (AmigaOne X1000) or Freescale CPUs. As more information comes to light we can enable this, or other mechanisms on those CPUs. The vulnerability occurs when the load of an architecturally inaccessible memory region (eg. userspace load of kernel memory) is speculatively executed to the point where its result can influence the address of a subsequent speculatively executed load. In order for that to happen, the first load must hit in the L1, because before the load is sent to the L2 the permission check is performed. Therefore if no kernel addresses hit in the L1 the vulnerability can not occur. We can ensure that is the case by flushing the L1 whenever we return to userspace. Similarly for hypervisor vs guest. In order to flush the L1-D cache on exit, we add a section of nops at each (h)rfi location that returns to a lower privileged context, and patch that with some sequence. Newer firmwares are able to advertise to us that there is a special nop instruction that flushes the L1-D. If we do not see that advertised, we fall back to doing a displacement flush in software. For guest kernels we support migration between some CPU versions, and different CPUs may use different flush instructions. So that we are prepared to migrate to a machine with a different flush instruction activated, we may have to patch more than one flush instruction at boot if the hypervisor tells us to. In the end this patch is mostly the work of Nicholas Piggin and Michael Ellerman. However a cast of thousands contributed to analysis of the issue, earlier versions of the patch, back ports testing etc. Many thanks to all of them. Tested-by: Jon Masters <jcm@redhat.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
4722476b |
|
09-Nov-2017 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64s: mm_context.addr_limit is only used on hash Radix keeps no meaningful state in addr_limit, so remove it from radix code and rename to slb_addr_limit to make it clear it applies to hash only. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
4e003747 |
|
18-Oct-2017 |
Michael Ellerman <mpe@ellerman.id.au> |
powerpc/64s: Replace CONFIG_PPC_STD_MMU_64 with CONFIG_PPC_BOOK3S_64 CONFIG_PPC_STD_MMU_64 indicates support for the "standard" powerpc MMU on 64-bit CPUs. The "standard" MMU refers to the hash page table MMU found in "server" processors, from IBM mainly. Currently CONFIG_PPC_STD_MMU_64 is == CONFIG_PPC_BOOK3S_64. While it's annoying to have two symbols that always have the same value, it's not quite annoying enough to bother removing one. However with the arrival of Power9, we now have the situation where CONFIG_PPC_STD_MMU_64 is enabled, but the kernel is running using the Radix MMU - *not* the "standard" MMU. So it is now actively confusing to use it, because it implies that code is disabled or inactive when the Radix MMU is in use, however that is not necessarily true. So s/CONFIG_PPC_STD_MMU_64/CONFIG_PPC_BOOK3S_64/, and do some minor formatting updates of some of the affected lines. This will be a pain for backports, but c'est la vie. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
5080332c |
|
14-Sep-2017 |
Michael Neuling <mikey@neuling.org> |
powerpc/64s: Add workaround for P9 vector CI load issue POWER9 DD2.1 and earlier has an issue where some cache inhibited vector load will return bad data. The workaround is two part, one firmware/microcode part triggers HMI interrupts when hitting such loads, the other part is this patch which then emulates the instructions in Linux. The affected instructions are limited to lxvd2x, lxvw4x, lxvb16x and lxvh8x. When an instruction triggers the HMI, all threads in the core will be sent to the HMI handler, not just the one running the vector load. In general, these spurious HMIs are detected by the emulation code and we just return back to the running process. Unfortunately, if a spurious interrupt occurs on a vector load that's to normal memory we have no way to detect that it's spurious (unless we walk the page tables, which is very expensive). In this case we emulate the load but we need do so using a vector load itself to ensure 128bit atomicity is preserved. Some additional debugfs emulated instruction counters are added also. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [mpe: Switch CONFIG_PPC_BOOK3S_64 to CONFIG_VSX to unbreak the build] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
e1c1cfed |
|
21-Jul-2017 |
Gautham R. Shenoy <ego@linux.vnet.ibm.com> |
powerpc/powernv: Save/Restore additional SPRs for stop4 cpuidle The stop4 idle state on POWER9 is a deep idle state which loses hypervisor resources, but whose latency is low enough that it can be exposed via cpuidle. Until now, the deep idle states which lose hypervisor resources (eg: winkle) were only exposed via CPU-Hotplug. Hence currently on wakeup from such states, barring a few SPRs which need to be restored to their older value, rest of the SPRS are reinitialized to their values corresponding to that at boot time. When stop4 is used in the context of cpuidle, we want these additional SPRs to be restored to their older value, to ensure that the context on the CPU coming back from idle is same as it was before going idle. In this patch, we define a SPR save area in PACA (since we have used up the volatile register space in the stack) and on POWER9, we restore SPRN_PID, SPRN_LDBAR, SPRN_FSCR, SPRN_HFSCR, SPRN_MMCRA, SPRN_MMCR1, SPRN_MMCR2 to the values they had before entering stop. Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
8c388514 |
|
21-May-2017 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64s: Add EX_SIZE definition for paca exception save areas Rather than open-coding it 4 times. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Move __ASSEMBLY__ guards into head-64.h where they're really needed] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
22c6663d |
|
16-May-2017 |
Gautham R. Shenoy <ego@linux.vnet.ibm.com> |
powerpc/powernv/idle: Use Requested Level for restoring state on P9 DD1 On Power9 DD1 due to a hardware bug the Power-Saving Level Status field (PLS) of the PSSCR for a thread waking up from a deep state can under-report if some other thread in the core is in a shallow stop state. The scenario in which this can manifest is as follows: 1) All the threads of the core are in deep stop. 2) One of the threads is woken up. The PLS for this thread will correctly reflect that it is waking up from deep stop. 3) The thread that has woken up now executes a shallow stop. 4) When some other thread in the core is woken, its PLS will reflect the shallow stop state. Thus, the subsequent thread for which the PLS is under-reporting the wakeup state will not restore the hypervisor resources. Hence, on DD1 systems, use the Requested Level (RL) field as a workaround to restore the contents of the hypervisor resources on the wakeup from the stop state. Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
b1ee8a3d |
|
19-Dec-2016 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64s: Dedicated system reset interrupt stack The system reset interrupt is used for crash/debug situations, so it is desirable to have as little impact on the normal state of the system as possible. Currently it uses the current kernel stack to process the exception. This stores into the stack which may be involved with the crash. The stack pointer may be corrupted, or it may have overflowed. Avoid or minimise these problems by creating a dedicated NMI stack for the system reset interrupt to use. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
c4f3b52c |
|
19-Dec-2016 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64s: Disallow system reset vs system reset reentrancy In preparation for using a dedicated stack for system reset interrupts, prevent a nested system reset from recovering, in order to simplify code that is called in crash/debug path. This allows a system reset interrupt to just use the base stack pointer. Keep an in_nmi nesting counter similarly to the in_mce counter. Consider the interrrupt non-recoverable if it is taken inside another system reset. Interrupt nesting could be allowed similarly to MCE, but system reset is a special case that's not for normal operation, so simplicity wins until there is requirement for nested system reset interrupts. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
a3d96f70 |
|
19-Dec-2016 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64s: Fix system reset vs general interrupt reentrancy The system reset interrupt can occur when MSR_EE=0, and it currently uses the PACA_EXGEN save area. Some PACA_EXGEN interrupts have a window where MSR_RI=1 and MSR_EE=0 when the save area is still in use. A system reset interrupt in this window can lead to undetected corruption when the save area gets overwritten. This patch introduces PACA_EXNMI save area for system reset exceptions, which closes this corruption window. It's also helpful to retain the EXGEN state for debugging situations, even if not considering the recoverability aspect. This patch also moves the PACA_EXMC area down to a less frequently used part of the paca with the new save area. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
17ed4c8f |
|
22-Mar-2017 |
Gautham R. Shenoy <ego@linux.vnet.ibm.com> |
powerpc/powernv: Recover correct PACA on wakeup from a stop on P9 DD1 POWER9 DD1.0 hardware has a bug where the SPRs of a thread waking up from stop 0,1,2 with ESL=1 can endup being misplaced in the core. Thus the HSPRG0 of a thread waking up from can contain the paca pointer of its sibling. This patch implements a context recovery framework within threads of a core, by provisioning space in paca_struct for saving every sibling threads's paca pointers. Basically, we should be able to arrive at the right paca pointer from any of the thread's existing paca pointer. At bootup, during powernv idle-init, we save the paca address of every CPU in each one its siblings paca_struct in the slot corresponding to this CPU's index in the core. On wakeup from a stop, the thread will determine its index in the core from the TIR register and recover its PACA pointer by indexing into the correct slot in the provisioned space in the current PACA. Furthermore, ensure that the NVGPRs are restored from the stack on the way out by setting the NAPSTATELOST in paca. [Changelog written with inputs from svaidy@linux.vnet.ibm.com] Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Call it a bug] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
bb183221 |
|
21-Mar-2017 |
Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> |
powerpc/mm/hash: Store addr_limit in PACA We optmize the slice page size array copy to paca by copying only the range based on addr_limit. This will require us to not look at page size array beyond addr_limit in PACA on slb fault. To enable that copy task size to paca which will be used during slb fault. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> [mpe: Rename from task_size to addr_limit, consolidate #ifdefs] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
52b1e665 |
|
21-Mar-2017 |
Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> |
powerpc/mm: Move copy_mm_to_paca to paca.c We also update the function arg to struct mm_struct. Move this so that function finds the definition of struct mm_struct. No functional change in this patch. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
f828c3d0 |
|
05-Jan-2017 |
Frederic Weisbecker <fweisbec@gmail.com> |
sched/cputime, powerpc: Migrate stolen_time field to the accounting structure That in order to gather all cputime accumulation to the same place. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/1483636310-6557-7-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
3f257774 |
|
11-Aug-2016 |
Paolo Bonzini <pbonzini@redhat.com> |
powerpc: move hmi.c to arch/powerpc/kvm/ hmi.c functions are unused unless sibling_subcore_state is nonzero, and that in turn happens only if KVM is in use. So move the code to arch/powerpc/kvm/, putting it under CONFIG_KVM_BOOK3S_HV_POSSIBLE rather than CONFIG_PPC_BOOK3S_64. The sibling_subcore_state is also included in struct paca_struct only if KVM is supported by the kernel. Cc: Daniel Axtens <dja@axtens.net> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: linuxppc-dev@lists.ozlabs.org Cc: kvm-ppc@vger.kernel.org Cc: kvm@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
7c379526 |
|
11-Aug-2016 |
Paolo Bonzini <pbonzini@redhat.com> |
powerpc: move hmi.c to arch/powerpc/kvm/ hmi.c functions are unused unless sibling_subcore_state is nonzero, and that in turn happens only if KVM is in use. So move the code to arch/powerpc/kvm/, putting it under CONFIG_KVM_BOOK3S_HV_POSSIBLE rather than CONFIG_PPC_BOOK3S_64. The sibling_subcore_state is also included in struct paca_struct only if KVM is supported by the kernel. Cc: Daniel Axtens <dja@axtens.net> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: linuxppc-dev@lists.ozlabs.org Cc: kvm-ppc@vger.kernel.org Cc: kvm@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
c223c903 |
|
17-May-2016 |
Christophe Leroy <christophe.leroy@c-s.fr> |
powerpc32: provide VIRT_CPU_ACCOUNTING This patch provides VIRT_CPU_ACCOUTING to PPC32 architecture. PPC32 doesn't have the PACA structure, so we use the task_info structure to store the accounting data. In order to reuse on PPC32 the PPC64 functions, all u64 data has been replaced by 'unsigned long' so that it is u32 on PPC32 and u64 on PPC64 Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
|
#
fd7bacbc |
|
14-May-2016 |
Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> |
KVM: PPC: Book3S HV: Fix TB corruption in guest exit path on HMI interrupt When a guest is assigned to a core it converts the host Timebase (TB) into guest TB by adding guest timebase offset before entering into guest. During guest exit it restores the guest TB to host TB. This means under certain conditions (Guest migration) host TB and guest TB can differ. When we get an HMI for TB related issues the opal HMI handler would try fixing errors and restore the correct host TB value. With no guest running, we don't have any issues. But with guest running on the core we run into TB corruption issues. If we get an HMI while in the guest, the current HMI handler invokes opal hmi handler before forcing guest to exit. The guest exit path subtracts the guest TB offset from the current TB value which may have already been restored with host value by opal hmi handler. This leads to incorrect host and guest TB values. With split-core, things become more complex. With split-core, TB also gets split and each subcore gets its own TB register. When a hmi handler fixes a TB error and restores the TB value, it affects all the TB values of sibling subcores on the same core. On TB errors all the thread in the core gets HMI. With existing code, the individual threads call opal hmi handle independently which can easily throw TB out of sync if we have guest running on subcores. Hence we will need to co-ordinate with all the threads before making opal hmi handler call followed by TB resync. This patch introduces a sibling subcore state structure (shared by all threads in the core) in paca which holds information about whether sibling subcores are in Guest mode or host mode. An array in_guest[] of size MAX_SUBCORE_PER_CORE=4 is used to maintain the state of each subcore. The subcore id is used as index into in_guest[] array. Only primary thread entering/exiting the guest is responsible to set/unset its designated array element. On TB error, we get HMI interrupt on every thread on the core. Upon HMI, this patch will now force guest to vacate the core/subcore. Primary thread from each subcore will then turn off its respective bit from the above bitmap during the guest exit path just after the guest->host partition switch is complete. All other threads that have just exited the guest OR were already in host will wait until all other subcores clears their respective bit. Once all the subcores turn off their respective bit, all threads will will make call to opal hmi handler. It is not necessary that opal hmi handler would resync the TB value for every HMI interrupts. It would do so only for the HMI caused due to TB errors. For rest, it would not touch TB value. Hence to make things simpler, primary thread would call TB resync explicitly once for each core immediately after opal hmi handler instead of subtracting guest offset from TB. TB resync call will restore the TB with host value. Thus we can be sure about the TB state. One of the primary threads exiting the guest will take up the responsibility of calling TB resync. It will use one of the top bits (bit 63) from subcore state flags bitmap to make the decision. The first primary thread (among the subcores) that is able to set the bit will have to call the TB resync. Rest all other threads will wait until TB resync is complete. Once TB resync is complete all threads will then proceed. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
c33e54fa |
|
08-Jan-2016 |
Michael Ellerman <mpe@ellerman.id.au> |
powerpc: Fix build break due to paca mm_context_t changes Commit 2fc251a8dda5 ("powerpc: Copy only required pieces of the mm_context_t to the paca") broke the build for CONFIG_PPC_STD_MMU_64=y and CONFIG_PPC_MM_SLICES=n. That only happens for a kernel built with 4K pages and HUGETLB disabled, which is why we missed it. Fix it by adding a mm_ctx_user_psize member to the paca and populating it in the appropriate places. Fixes: 2fc251a8dda5 ("powerpc: Copy only required pieces of the mm_context_t to the paca") Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
2fc251a8 |
|
10-Dec-2015 |
Michael Neuling <mikey@neuling.org> |
powerpc: Copy only required pieces of the mm_context_t to the paca Currently we copy the whole mm_context_t to the paca but only access a few bits of it. This is wasteful of space paca and also takes quite some time in the hot path of context switching. This patch pulls in only the required bits from the mm_context_t to the paca and on context switch, copies only those. Benchmarking this (On top of Anton's recent MSR context switching changes [1]) using processes and yield shows an improvement of almost 3% on POWER8: http://ozlabs.org/~anton/junkcode/context_switch2.c ./context_switch2 --test=yield --process 0 0 1. https://lists.ozlabs.org/pipermail/linuxppc-dev/2015-October/135700.html Signed-off-by: Michael Neuling <mikey@neuling.org> [mpe: Rename paca fields to be mm_ctx_foo rather than context_foo] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
c395465da6 |
|
27-Oct-2015 |
Michael Neuling <mikey@neuling.org> |
powerpc: Add function to copy mm_context_t to the paca This adds a function to copy the mm->context to the paca. This is only a basic conversion for now but will be used more extensively in the next patch. This also adds #ifdef CONFIG_PPC_BOOK3S around this code since it's not used elsewhere. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
016f8cf0 |
|
10-Mar-2015 |
Kevin Hao <haokexin@gmail.com> |
powerpc: book3e_64: fix the align size for paca_struct All the cache line size of the current book3e 64bit SoCs are 64 bytes. So we should use this size to align the member of paca_struct. This only change the paca_struct's members which are private to book3e CPUs, and should not have any effect to book3s ones. With this, we save 192 bytes. Also change it to __aligned(size) since it is preferred over __attribute__((aligned(size))). Before: /* size: 1920, cachelines: 30, members: 46 */ /* sum members: 1667, holes: 6, sum holes: 141 */ /* padding: 112 */ After: /* size: 1728, cachelines: 27, members: 46 */ /* sum members: 1667, holes: 4, sum holes: 13 */ /* padding: 48 */ Signed-off-by: Kevin Hao <haokexin@gmail.com> Signed-off-by: Scott Wood <scottwood@freescale.com>
|
#
77b54e9f |
|
09-Dec-2014 |
Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> |
powernv/powerpc: Add winkle support for offline cpus Winkle is a deep idle state supported in power8 chips. A core enters winkle when all the threads of the core enter winkle. In this state power supply to the entire chiplet i.e core, private L2 and private L3 is turned off. As a result it gives higher powersavings compared to sleep. But entering winkle results in a total hypervisor state loss. Hence the hypervisor context has to be preserved before entering winkle and restored upon wake up. Power-on Reset Engine (PORE) is a dedicated engine which is responsible for powering on the chiplet during wake up. It can be programmed to restore the register contests of a few specific registers. This patch uses PORE to restore register state wherever possible and uses stack to save and restore rest of the necessary registers. With hypervisor state restore things fall under three categories- per-core state, per-subcore state and per-thread state. To manage this, extend the infrastructure introduced for sleep. Mainly we add a paca variable subcore_sibling_mask. Using this and the core_idle_state we can distingush first thread in core and subcore. Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
7cba160a |
|
09-Dec-2014 |
Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> |
powernv/cpuidle: Redesign idle states management Deep idle states like sleep and winkle are per core idle states. A core enters these states only when all the threads enter either the particular idle state or a deeper one. There are tasks like fastsleep hardware bug workaround and hypervisor core state save which have to be done only by the last thread of the core entering deep idle state and similarly tasks like timebase resync, hypervisor core register restore that have to be done only by the first thread waking up from these state. The current idle state management does not have a way to distinguish the first/last thread of the core waking/entering idle states. Tasks like timebase resync are done for all the threads. This is not only is suboptimal, but can cause functionality issues when subcores and kvm is involved. This patch adds the necessary infrastructure to track idle states of threads in a per-core structure. It uses this info to perform tasks like fastsleep workaround and timebase resync only once per core. Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Originally-by: Preeti U. Murthy <preeti@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: linux-pm@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
6d626c5e |
|
24-Nov-2014 |
Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> |
powerpc/powernv: Cleanup unused MCE definitions/declarations. Cleanup OpalMCE_* definitions/declarations and other related code which is not used anymore. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Acked-by: Benjamin Herrrenschmidt <benh@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
0ef95b41 |
|
29-Jul-2014 |
Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> |
powerpc/powernv: Invoke opal call to handle hmi. When we hit the HMI in Linux, invoke opal call to handle/recover from HMI errors in real mode and then in virtual mode during check_irq_replay() invoke opal_poll_events()/opal_do_notifier() to retrieve HMI event from OPAL and act accordingly. Now that we are ready to handle HMI interrupt directly in linux, remove the HMI interrupt registration with firmware. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
376af594 |
|
09-Jul-2014 |
Michael Ellerman <mpe@ellerman.id.au> |
powerpc: Remove STAB code Old cpus didn't have a Segment Lookaside Buffer (SLB), instead they had a Segment Table (STAB). Now that we've dropped support for those cpus, we can remove the STAB support entirely. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
1739ea9e |
|
21-May-2014 |
Sam bobroff <sam.bobroff@au1.ibm.com> |
powerpc: Fix regression of per-CPU DSCR setting Since commit "efcac65 powerpc: Per process DSCR + some fixes (try#4)" it is no longer possible to set the DSCR on a per-CPU basis. The old behaviour was to minipulate the DSCR SPR directly but this is no longer sufficient: the value is quickly overwritten by context switching. This patch stores the per-CPU DSCR value in a kernel variable rather than directly in the SPR and it is used whenever a process has not set the DSCR itself. The sysfs interface (/sys/devices/system/cpu/cpuN/dscr) is unchanged. Writes to the old global default (/sys/devices/system/cpu/dscr_default) now set all of the per-CPU values and reads return the last written value. The new per-CPU default is added to the paca_struct and is used everywhere outside of sysfs.c instead of the old global default. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
609af38f |
|
10-Mar-2014 |
Scott Wood <scottwood@freescale.com> |
powerpc/booke64: Critical and machine check exception support Add special state saving for critical and machine check exceptions. Most of this code could be used to handle debug exceptions taken from kernel space, but actually doing so is outside the scope of this patch. The various critical and machine check exceptions now point to their real handlers, rather than hanging the kernel. Signed-off-by: Scott Wood <scottwood@freescale.com>
|
#
9d378dfa |
|
10-Mar-2014 |
Scott Wood <scottwood@freescale.com> |
powerpc/booke64: Use SPRG7 for VDSO Previously SPRG3 was marked for use by both VDSO and critical interrupts (though critical interrupts were not fully implemented). In commit 8b64a9dfb091f1eca8b7e58da82f1e7d1d5fe0ad ("powerpc/booke64: Use SPRG0/3 scratch for bolted TLB miss & crit int"), Mihai Caraman made an attempt to resolve this conflict by restoring the VDSO value early in the critical interrupt, but this has some issues: - It's incompatible with EXCEPTION_COMMON which restores r13 from the by-then-overwritten scratch (this cost me some debugging time). - It forces critical exceptions to be a special case handled differently from even machine check and debug level exceptions. - It didn't occur to me that it was possible to make this work at all (by doing a final "ld r13, PACA_EXCRIT+EX_R13(r13)") until after I made (most of) this patch. :-) It might be worth investigating using a load rather than SPRG on return from all exceptions (except TLB misses where the scratch never leaves the SPRG) -- it could save a few cycles. Until then, let's stick with SPRG for all exceptions. Since we cannot use SPRG4-7 for scratch without corrupting the state of a KVM guest, move VDSO to SPRG7 on book3e. Since neither SPRG4-7 nor critical interrupts exist on book3s, SPRG3 is still used for VDSO there. Signed-off-by: Scott Wood <scottwood@freescale.com> Cc: Mihai Caraman <mihai.caraman@freescale.com> Cc: Anton Blanchard <anton@samba.org> Cc: Paul Mackerras <paulus@samba.org> Cc: kvm-ppc@vger.kernel.org
|
#
c141611f |
|
08-Jan-2014 |
Paul Gortmaker <paul.gortmaker@windriver.com> |
powerpc: Delete non-required instances of include <linux/init.h> None of these files are actually using any __init type directives and hence don't need to include <linux/init.h>. Most are just a left over from __devinit and __cpuinit removal, or simply due to code getting copied from one driver to the next. The one instance where we add an include for init.h covers off a case where that file was implicitly getting it from another header which itself didn't need it. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
28efc35f |
|
11-Oct-2013 |
Scott Wood <scottwood@freescale.com> |
powerpc/e6500: TLB miss handler with hardware tablewalk support There are a few things that make the existing hw tablewalk handlers unsuitable for e6500: - Indirect entries go in TLB1 (though the resulting direct entries go in TLB0). - It has threads, but no "tlbsrx." -- so we need a spinlock and a normal "tlbsx". Because we need this lock, hardware tablewalk is mandatory on e6500 unless we want to add spinlock+tlbsx to the normal bolted TLB miss handler. - TLB1 has no HES (nor next-victim hint) so we need software round robin (TODO: integrate this round robin data with hugetlb/KVM) - The existing tablewalk handlers map half of a page table at a time, because IBM hardware has a fixed 1MiB indirect page size. e6500 has variable size indirect entries, with a minimum of 2MiB. So we can't do the half-page indirect mapping, and even if we could it would be less efficient than mapping the full page. - Like on e5500, the linear mapping is bolted, so we don't need the overhead of supporting nested tlb misses. Note that hardware tablewalk does not work in rev1 of e6500. We do not expect to support e6500 rev1 in mainline Linux. Signed-off-by: Scott Wood <scottwood@freescale.com> Cc: Mihai Caraman <mihai.caraman@freescale.com>
|
#
729b0f71 |
|
30-Oct-2013 |
Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> |
powerpc/book3s: Introduce exclusive emergency stack for machine check exception. This patch introduces exclusive emergency stack for machine check exception. We use emergency stack to handle machine check exception so that we can save MCE information (srr1, srr0, dar and dsisr) before turning on ME bit and be ready for re-entrancy. This helps us to prevent clobbering of MCE information in case of nested machine checks. The reason for using emergency stack over normal kernel stack is that the machine check might occur in the middle of setting up a stack frame which may result into improper use of kernel stack. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
7aa79938 |
|
07-Oct-2013 |
Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> |
kvm: powerpc: book3s: pr: Rename KVM_BOOK3S_PR to KVM_BOOK3S_PR_POSSIBLE With later patches supporting PR kvm as a kernel module, the changes that has to be built into the main kernel binary to enable PR KVM module is now selected via KVM_BOOK3S_PR_POSSIBLE Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
54bb7f4b |
|
06-Aug-2013 |
Anton Blanchard <anton@samba.org> |
powerpc: Make rwlocks endian safe Our ppc64 spinlocks and rwlocks use a trick where a lock token and the paca index are placed in the lock with a single store. Since we are using two u16s they need adjusting for little endian. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
bc2e6c6a |
|
12-Aug-2013 |
Michael Neuling <mikey@neuling.org> |
powerpc: Avoid link stack corruption for MMU on exceptions When we have MMU on exceptions (POWER8) and a relocatable kernel, we need to branch from the initial exception vectors at 0x0 to up high where the kernel might be located. Currently we do this using the link register. Unfortunately this corrupts the link stack and instead we should use the count register. We did this for the syscall entry path in: 6a40480 powerpc: Avoid link stack corruption in MMU on syscall entry path but I stupidly forgot to do the same for other exceptions. This patch changes the initial exception vectors to use the count register instead of the link register when we need to branch up to the relocated kernel. I have a dodgy userspace test which loops calling a function that reads the PVR (mfpvr in userspace will be emulated by the kernel via the program check exception). On POWER8 and with CONFIG_RELOCATABLE=y, I get a ~10% performance improvement with my userspace test with this patch. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
afc07701 |
|
13-Feb-2013 |
Michael Neuling <mikey@neuling.org> |
powerpc: Add transactional memory paca scratch register to show_regs Add transactional memory paca scratch register to show_regs. This is useful for debugging. Signed-off-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
6a7e4064 |
|
13-Feb-2013 |
Geoff Levand <geoff@infradead.org> |
powerpc: Move boot_paca into early_setup The powerpc boot_paca symbol is now only used within the early_setup() routine, so move it from its global definition into early_setup(). Signed-off-by: Geoff Levand <geoff@infradead.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
a09688cd |
|
06-Dec-2012 |
Haren Myneni <haren@linux.vnet.ibm.com> |
powerpc: Increase exceptions arrays in paca struct to save PPR [PATCH 3/6] powerpc: Increase exceptions arrays in paca struct to save PPR Using paca to save user defined PPR value in the first level exception vector. Signed-off-by: Haren Myneni <haren@us.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
735cafc3 |
|
09-Sep-2012 |
Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> |
powerpc/mm: Use 32bit array for slb cache With larger vsid we need to track more bits of ESID in slb cache for slb invalidate. Reviewed-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
0127262c |
|
05-Sep-2012 |
Mihai Caraman <mihai.caraman@freescale.com> |
powerpc: Restore VDSO information on critical exception om BookE Critical exception on 64-bit booke uses user-visible SPRG3 as scratch. Restore VDSO information in SPRG3 on exception prolog. Use a common sprg3 field in PACA for all powerpc64 architectures. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
7230c564 |
|
06-Mar-2012 |
Benjamin Herrenschmidt <benh@kernel.crashing.org> |
powerpc: Rework lazy-interrupt handling The current implementation of lazy interrupts handling has some issues that this tries to address. We don't do the various workarounds we need to do when re-enabling interrupts in some cases such as when returning from an interrupt and thus we may still lose or get delayed decrementer or doorbell interrupts. The current scheme also makes it much harder to handle the external "edge" interrupts provided by some BookE processors when using the EPR facility (External Proxy) and the Freescale Hypervisor. Additionally, we tend to keep interrupts hard disabled in a number of cases, such as decrementer interrupts, external interrupts, or when a masked decrementer interrupt is pending. This is sub-optimal. This is an attempt at fixing it all in one go by reworking the way we do the lazy interrupt disabling from the ground up. The base idea is to replace the "hard_enabled" field with a "irq_happened" field in which we store a bit mask of what interrupt occurred while soft-disabled. When re-enabling, either via arch_local_irq_restore() or when returning from an interrupt, we can now decide what to do by testing bits in that field. We then implement replaying of the missed interrupts either by re-using the existing exception frame (in exception exit case) or via the creation of a new one from an assembly trampoline (in the arch_local_irq_enable case). This removes the need to play with the decrementer to try to create fake interrupts, among others. In addition, this adds a few refinements: - We no longer hard disable decrementer interrupts that occur while soft-disabled. We now simply bump the decrementer back to max (on BookS) or leave it stopped (on BookE) and continue with hard interrupts enabled, which means that we'll potentially get better sample quality from performance monitor interrupts. - Timer, decrementer and doorbell interrupts now hard-enable shortly after removing the source of the interrupt, which means they no longer run entirely hard disabled. Again, this will improve perf sample quality. - On Book3E 64-bit, we now make the performance monitor interrupt act as an NMI like Book3S (the necessary C code for that to work appear to already be present in the FSL perf code, notably calling nmi_enter instead of irq_enter). (This also fixes a bug where BookE perfmon interrupts could clobber r14 ... oops) - We could make "masked" decrementer interrupts act as NMIs when doing timer-based perf sampling to improve the sample quality. Signed-off-by-yet: Benjamin Herrenschmidt <benh@kernel.crashing.org> --- v2: - Add hard-enable to decrementer, timer and doorbells - Fix CR clobber in masked irq handling on BookE - Make embedded perf interrupt act as an NMI - Add a PACA_HAPPENED_EE_EDGE for use by FSL if they want to retrigger an interrupt without preventing hard-enable v3: - Fix or vs. ori bug on Book3E - Fix enabling of interrupts for some exceptions on Book3E v4: - Fix resend of doorbells on return from interrupt on Book3E v5: - Rebased on top of my latest series, which involves some significant rework of some aspects of the patch. v6: - 32-bit compile fix - more compile fixes with various .config combos - factor out the asm code to soft-disable interrupts - remove the C wrapper around preempt_schedule_irq v7: - Fix a bug with hard irq state tracking on native power7
|
#
2fde6d20 |
|
05-Dec-2011 |
Paul Mackerras <paulus@samba.org> |
powerpc: Provide a way for KVM to indicate that NV GPR values are lost This fixes a problem where a CPU thread coming out of nap mode can think it has valid values in the nonvolatile GPRs (r14 - r31) as saved away in power7_idle, but in fact the values have been trashed because the thread was used for KVM in the mean time. The result is that the thread crashes because code that called power7_idle (e.g., pnv_smp_cpu_kill_self()) goes to use values in registers that have been trashed. The bit field in SRR1 that tells whether state was lost only reflects the most recent nap, which may not have been the nap instruction in power7_idle. So we need an extra PACA field to indicate that state has been lost even if SRR1 indicates that the most recent nap didn't lose state. We clear this field when saving the state in power7_idle, we set it to a non-zero value when we use the thread for KVM, and we test it in power7_wakeup_noloss. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
ed79ba9e |
|
19-Sep-2011 |
Benjamin Herrenschmidt <benh@kernel.crashing.org> |
powerpc/powernv: Machine check and other system interrupts OPAL can handle various interrupt for us such as Machine Checks (it performs all sorts of recovery tasks and passes back control to us with informations about the error), Hardware Management Interrupts and Softpatch interrupts. This wires up the mechanisms and prints out specific informations returned by HAL when a machine check occurs. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
de56a948 |
|
28-Jun-2011 |
Paul Mackerras <paulus@samba.org> |
KVM: PPC: Add support for Book3S processors in hypervisor mode This adds support for KVM running on 64-bit Book 3S processors, specifically POWER7, in hypervisor mode. Using hypervisor mode means that the guest can use the processor's supervisor mode. That means that the guest can execute privileged instructions and access privileged registers itself without trapping to the host. This gives excellent performance, but does mean that KVM cannot emulate a processor architecture other than the one that the hardware implements. This code assumes that the guest is running paravirtualized using the PAPR (Power Architecture Platform Requirements) interface, which is the interface that IBM's PowerVM hypervisor uses. That means that existing Linux distributions that run on IBM pSeries machines will also run under KVM without modification. In order to communicate the PAPR hypercalls to qemu, this adds a new KVM_EXIT_PAPR_HCALL exit code to include/linux/kvm.h. Currently the choice between book3s_hv support and book3s_pr support (i.e. the existing code, which runs the guest in user mode) has to be made at kernel configuration time, so a given kernel binary can only do one or the other. This new book3s_hv code doesn't support MMIO emulation at present. Since we are running paravirtualized guests, this isn't a serious restriction. With the guest running in supervisor mode, most exceptions go straight to the guest. We will never get data or instruction storage or segment interrupts, alignment interrupts, decrementer interrupts, program interrupts, single-step interrupts, etc., coming to the hypervisor from the guest. Therefore this introduces a new KVMTEST_NONHV macro for the exception entry path so that we don't have to do the KVM test on entry to those exception handlers. We do however get hypervisor decrementer, hypervisor data storage, hypervisor instruction storage, and hypervisor emulation assist interrupts, so we have to handle those. In hypervisor mode, real-mode accesses can access all of RAM, not just a limited amount. Therefore we put all the guest state in the vcpu.arch and use the shadow_vcpu in the PACA only for temporary scratch space. We allocate the vcpu with kzalloc rather than vzalloc, and we don't use anything in the kvmppc_vcpu_book3s struct, so we don't allocate it. We don't have a shared page with the guest, but we still need a kvm_vcpu_arch_shared struct to store the values of various registers, so we include one in the vcpu_arch struct. The POWER7 processor has a restriction that all threads in a core have to be in the same partition. MMU-on kernel code counts as a partition (partition 0), so we have to do a partition switch on every entry to and exit from the guest. At present we require the host and guest to run in single-thread mode because of this hardware restriction. This code allocates a hashed page table for the guest and initializes it with HPTEs for the guest's Virtual Real Memory Area (VRMA). We require that the guest memory is allocated using 16MB huge pages, in order to simplify the low-level memory management. This also means that we can get away without tracking paging activity in the host for now, since huge pages can't be paged or swapped. This also adds a few new exports needed by the book3s_hv code. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
3c42bf8a |
|
28-Jun-2011 |
Paul Mackerras <paulus@samba.org> |
KVM: PPC: Split host-state fields out of kvmppc_book3s_shadow_vcpu There are several fields in struct kvmppc_book3s_shadow_vcpu that temporarily store bits of host state while a guest is running, rather than anything relating to the particular guest or vcpu. This splits them out into a new kvmppc_host_state structure and modifies the definitions in asm-offsets.c to suit. On 32-bit, we have a kvmppc_host_state structure inside the kvmppc_book3s_shadow_vcpu since the assembly code needs to be able to get to them both with one pointer. On 64-bit they are separate fields in the PACA. This means that on 64-bit we don't need to copy the kvmppc_host_state in and out on vcpu load/unload, and in future will mean that the book3s_hv code doesn't need a shadow_vcpu struct in the PACA at all. That does mean that we have to be careful not to rely on any values persisting in the hstate field of the paca across any point where we could block or get preempted. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
f67f4ef5 |
|
22-Jun-2011 |
Scott Wood <scottwood@freescale.com> |
powerpc/book3e-64: use a separate TLB handler when linear map is bolted On MMUs such as FSL where we can guarantee the entire linear mapping is bolted, we don't need to worry about linear TLB misses. If on top of that we do a full table walk, we get rid of all recursive TLB faults, and can dispense with some state saving. This gains a few percent on TLB-miss-heavy workloads, and around 50% on a benchmark that had a high rate of virtual page table faults under the normal handler. While touching the EX_TLB layout, remove EX_TLB_MMUCR0, EX_TLB_SRR0, and EX_TLB_SRR1 as they're not used. [BenH: Fixed build with 64K pages (wsp config)] Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
48404f2e |
|
01-May-2011 |
Paul Mackerras <paulus@samba.org> |
powerpc: Save Come-From Address Register (CFAR) in exception frame Recent 64-bit server processors (POWER6 and POWER7) have a "Come-From Address Register" (CFAR), that records the address of the most recent branch or rfid (return from interrupt) instruction for debugging purposes. This saves the value of the CFAR in the exception entry code and stores it in the exception frame. We also make xmon print the CFAR value in its register dump code. Rather than extend the pt_regs struct at this time, we steal the orig_gpr3 field, which is only used for system calls, and use it for the CFAR value for all exceptions/interrupts other than system calls. This means we don't save the CFAR on system calls, which is not a great problem since system calls tend not to happen unexpectedly, and also avoids adding the overhead of reading the CFAR to the system call entry path. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
a7b8ad40 |
|
07-Apr-2011 |
Michael Ellerman <michael@ellerman.id.au> |
powerpc/book3e: Fix extlb size The calculation of the size for the exception save area of the TLB miss handler is wrong, luckily it's too big not too small. Rework it to make it a bit clearer, and also correct. We want 3 save areas, each EX_TLB_SIZE _bytes_. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
948cf67c |
|
24-Jan-2011 |
Benjamin Herrenschmidt <benh@kernel.crashing.org> |
powerpc: Add NAP mode support on Power7 in HV mode Wakeup comes from the system reset handler with a potential loss of the non-hypervisor CPU state. We save the non-volatile state on the stack and a pointer to it in the PACA, which the system reset handler uses to restore things Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
e360adbe |
|
14-Oct-2010 |
Peter Zijlstra <a.p.zijlstra@chello.nl> |
irq_work: Add generic hardirq context callbacks Provide a mechanism that allows running code in IRQ context. It is most useful for NMI code that needs to interact with the rest of the system -- like wakeup a task to drain buffers. Perf currently has such a mechanism, so extract that and provide it as a generic feature, independent of perf so that others may also benefit. The IRQ context callback is generated through self-IPIs where possible, or on architectures like powerpc the decrementer (the built-in timer facility) is set to generate an interrupt immediately. Architectures that don't have anything like this get to do with a callback from the timer tick. These architectures can call irq_work_run() at the tail of any IRQ handlers that might enqueue such work (like the perf IRQ handler) to avoid undue latencies in processing the work. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Kyle McMartin <kyle@mcmartin.ca> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [ various fixes ] Signed-off-by: Huang Ying <ying.huang@intel.com> LKML-Reference: <1287036094.7768.291.camel@yhuang-dev> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
cf9efce0 |
|
26-Aug-2010 |
Paul Mackerras <paulus@samba.org> |
powerpc: Account time using timebase rather than PURR Currently, when CONFIG_VIRT_CPU_ACCOUNTING is enabled, we use the PURR register for measuring the user and system time used by processes, as well as other related times such as hardirq and softirq times. This turns out to be quite confusing for users because it means that a program will often be measured as taking less time when run on a multi-threaded processor (SMT2 or SMT4 mode) than it does when run on a single-threaded processor (ST mode), even though the program takes longer to finish. The discrepancy is accounted for as stolen time, which is also confusing, particularly when there are no other partitions running. This changes the accounting to use the timebase instead, meaning that the reported user and system times are the actual number of real-time seconds that the program was executing on the processor thread, regardless of which SMT mode the processor is in. Thus a program will generally show greater user and system times when run on a multi-threaded processor than on a single-threaded processor. On pSeries systems on POWER5 or later processors, we measure the stolen time (time when this partition wasn't running) using the hypervisor dispatch trace log. We check for new entries in the log on every entry from user mode and on every transition from kernel process context to soft or hard IRQ context (i.e. when account_system_vtime() gets called). So that we can correctly distinguish time stolen from user time and time stolen from system time, without having to check the log on every exit to user mode, we store separate timestamps for exit to user mode and entry from user mode. On systems that have a SPURR (POWER6 and POWER7), we read the SPURR in account_system_vtime() (as before), and then apportion the SPURR ticks since the last time we read it between scaled user time and scaled system time according to the relative proportions of user time and system time over the same interval. This avoids having to read the SPURR on every kernel entry and exit. On systems that have PURR but not SPURR (i.e., POWER5), we do the same using the PURR rather than the SPURR. This disables the DTL user interface in /sys/debug/kernel/powerpc/dtl for now since it conflicts with the use of the dispatch trace log by the time accounting code. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
fc53b420 |
|
07-Jul-2010 |
Matt Evans <matt@ozlabs.org> |
powerpc/kexec: Switch to a static PACA on the way out With dynamic PACAs, the kexecing CPU's PACA won't lie within the kernel static data and there is a chance that something may stomp it when preparing to kexec. This patch switches this final CPU to a static PACA just before we pull the switch. Signed-off-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
1fc711f7 |
|
13-May-2010 |
Michael Neuling <mikey@neuling.org> |
powerpc/kexec: Fix race in kexec shutdown In kexec_prepare_cpus, the primary CPU IPIs the secondary CPUs to kexec_smp_down(). kexec_smp_down() calls kexec_smp_wait() which sets the hw_cpu_id() to -1. The primary does this while leaving IRQs on which means the primary can take a timer interrupt which can lead to the IPIing one of the secondary CPUs (say, for a scheduler re-balance) but since the secondary CPU now has a hw_cpu_id = -1, we IPI CPU -1... Kaboom! We are hitting this case regularly on POWER7 machines. There is also a second race, where the primary will tear down the MMU mappings before knowing the secondaries have entered real mode. Also, the secondaries are clearing out any pending IPIs before guaranteeing that no more will be received. This changes kexec_prepare_cpus() so that we turn off IRQs in the primary CPU much earlier. It adds a paca flag to say that the secondaries have entered the kexec_smp_down() IPI and turned off IRQs, rather than overloading hw_cpu_id with -1. This new paca flag is again used to in indicate when the secondaries has entered real mode. It also ensures that all CPUs have their IRQs off before we clear out any pending IPI requests (in kexec_cpu_down()) to ensure there are no trailing IPIs left unacknowledged. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
0604675f |
|
15-Apr-2010 |
Alexander Graf <agraf@suse.de> |
KVM: PPC: Use now shadowed vcpu fields The shadow vcpu now contains some fields we don't use from the vcpu anymore. Access to them happens using inline functions that happily use the shadow vcpu fields. So let's now ifdef them out to booke only and add asm-offsets. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
c14dea04 |
|
15-Apr-2010 |
Alexander Graf <agraf@suse.de> |
KVM: PPC: Use KVM_BOOK3S_HANDLER So far we had a lot of conditional code on CONFIG_KVM_BOOK3S_64_HANDLER. As we're moving towards common code between 32 and 64 bits, most of these ifdefs can be moved to a more generic term define, called CONFIG_KVM_BOOK3S_HANDLER. This patch adds the new generic config option and moves ifdefs over. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
2191d657 |
|
15-Apr-2010 |
Alexander Graf <agraf@suse.de> |
KVM: PPC: Name generic 64-bit code generic We have quite some code that can be used by Book3S_32 and Book3S_64 alike, so let's call it "Book3S" instead of "Book3S_64", so we can later on use it from the 32 bit port too. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
1426d5a3 |
|
28-Jan-2010 |
Michael Ellerman <michael@ellerman.id.au> |
powerpc: Dynamically allocate pacas On 64-bit kernels we currently have a 512 byte struct paca_struct for each cpu (usually just called "the paca"). Currently they are statically allocated, which means a kernel built for a large number of cpus will waste a lot of space if it's booted on a machine with few cpus. We can avoid that by only allocating the number of pacas we need at boot. However this is complicated by the fact that we need to access the paca before we know how many cpus there are in the system. The solution is to dynamically allocate enough space for NR_CPUS pacas, but then later in boot when we know how many cpus we have, we free any unused pacas. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
7e57cba0 |
|
07-Jan-2010 |
Alexander Graf <agraf@suse.de> |
KVM: PPC: Use PACA backed shadow vcpu We're being horribly racy right now. All the entry and exit code hijacks random fields from the PACA that could easily be used by different code in case we get interrupted, for example by a #MC or even page fault. After discussing this with Ben, we figured it's best to reserve some more space in the PACA and just shove off some vcpu state to there. That way we can drastically improve the readability of the code, make it less racy and less complex. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
|
#
4b7ae55d |
|
29-Oct-2009 |
Alexander Graf <agraf@suse.de> |
Add fields to PACA For KVM we need to store some information in the PACA, so we need to extend it. This patch adds KVM SLB shadow related entries to the PACA and a field that indicates if we're inside a guest. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
57c0c15b |
|
20-Sep-2009 |
Ingo Molnar <mingo@elte.hu> |
perf: Tidy up after the big rename - provide compatibility Kconfig entry for existing PERF_COUNTERS .config's - provide courtesy copy of old perf_counter.h, for user-space projects - small indentation fixups - fix up MAINTAINERS - fix small x86 printout fallout - fix up small PowerPC comment fallout (use 'counter' as in register) Reviewed-by: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
cdd6c482 |
|
20-Sep-2009 |
Ingo Molnar <mingo@elte.hu> |
perf: Do the big rename: Performance Counters -> Performance Events Bye-bye Performance Counters, welcome Performance Events! In the past few months the perfcounters subsystem has grown out its initial role of counting hardware events, and has become (and is becoming) a much broader generic event enumeration, reporting, logging, monitoring, analysis facility. Naming its core object 'perf_counter' and naming the subsystem 'perfcounters' has become more and more of a misnomer. With pending code like hw-breakpoints support the 'counter' name is less and less appropriate. All in one, we've decided to rename the subsystem to 'performance events' and to propagate this rename through all fields, variables and API names. (in an ABI compatible fashion) The word 'event' is also a bit shorter than 'counter' - which makes it slightly more convenient to write/handle as well. Thanks goes to Stephane Eranian who first observed this misnomer and suggested a rename. User-space tooling and ABI compatibility is not affected - this patch should be function-invariant. (Also, defconfigs were not touched to keep the size down.) This patch has been generated via the following script: FILES=$(find * -type f | grep -vE 'oprofile|[^K]config') sed -i \ -e 's/PERF_EVENT_/PERF_RECORD_/g' \ -e 's/PERF_COUNTER/PERF_EVENT/g' \ -e 's/perf_counter/perf_event/g' \ -e 's/nb_counters/nb_events/g' \ -e 's/swcounter/swevent/g' \ -e 's/tpcounter_event/tp_event/g' \ $FILES for N in $(find . -name perf_counter.[ch]); do M=$(echo $N | sed 's/perf_counter/perf_event/g') mv $N $M done FILES=$(find . -name perf_event.*) sed -i \ -e 's/COUNTER_MASK/REG_MASK/g' \ -e 's/COUNTER/EVENT/g' \ -e 's/\<event\>/event_id/g' \ -e 's/counter/event/g' \ -e 's/Counter/Event/g' \ $FILES ... to keep it as correct as possible. This script can also be used by anyone who has pending perfcounters patches - it converts a Linux kernel tree over to the new naming. We tried to time this change to the point in time where the amount of pending patches is the smallest: the end of the merge window. Namespace clashes were fixed up in a preparatory patch - and some stylistic fallout will be fixed up in a subsequent patch. ( NOTE: 'counters' are still the proper terminology when we deal with hardware registers - and these sed scripts are a bit over-eager in renaming them. I've undone some of that, but in case there's something left where 'counter' would be better than 'event' we can undo that on an individual basis instead of touching an otherwise nicely automated patch. ) Suggested-by: Stephane Eranian <eranian@google.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Paul Mackerras <paulus@samba.org> Reviewed-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Howells <dhowells@redhat.com> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: <linux-arch@vger.kernel.org> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
dce6670a |
|
23-Jul-2009 |
Benjamin Herrenschmidt <benh@kernel.crashing.org> |
powerpc: Add PACA fields specific to 64-bit Book3E processors This adds various fields in the PACA that are for use specifically by Book3E processors, such as exception save areas, current pgd pointer, special exceptions kernel stacks etc... Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
91c60b5b |
|
02-Jun-2009 |
Benjamin Herrenschmidt <benh@kernel.crashing.org> |
powerpc: Separate PACA fields for server CPUs This patch has no effect other than re-ordering PACA fields on current server CPUs. It however is a pre-requisite for future support of BookE 64-bit processors. Various parts of the PACA struct are now moved under some ifdef's, either the new CONFIG_PPC_BOOK3S or CONFIG_PPC_STD_MMU_64, whatever seems more appropriate. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.craashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
93a6d3ce |
|
08-Jan-2009 |
Paul Mackerras <paulus@samba.org> |
powerpc: Provide a way to defer perf counter work until interrupts are enabled Because 64-bit powerpc uses lazy (soft) interrupt disabling, it is possible for a performance monitor exception to come in when the kernel thinks interrupts are disabled (i.e. when they are soft-disabled but hard-enabled). In such a situation the performance monitor exception handler might have some processing to do (such as process wakeups) which can't be done in what is effectively an NMI handler. This provides a way to defer that work until interrupts get enabled, either in raw_local_irq_restore() or by returning from an interrupt handler to code that had interrupts enabled. We have a per-processor flag that indicates that there is work pending to do when interrupts subsequently get re-enabled. This flag is checked in the interrupt return path and in raw_local_irq_restore(), and if it is set, perf_counter_do_pending() is called to do the pending work. Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
1f6a93e4 |
|
29-Aug-2008 |
Paul Mackerras <paulus@samba.org> |
powerpc: Make it possible to move the interrupt handlers away from the kernel This changes the way that the exception prologs transfer control to the handlers in 64-bit kernels with the aim of making it possible to have the prologs separate from the main body of the kernel. Now, instead of computing the address of the handler by taking the top 32 bits of the paca address (to get the 0xc0000000........ part) and ORing in something in the bottom 16 bits, we get the base address of the kernel by doing a load from the paca and add an offset. This also replaces an mfmsr and an ori to compute the MSR value for the handler with a load from the paca. That makes it unnecessary to have a separate version of EXCEPTION_PROLOG_PSERIES that forces 64-bit mode. We can no longer use a direct branches in the exception prolog code, which means that the SLB miss handlers can't branch directly to .slb_miss_realmode any more. Instead we have to compute the address and do an indirect branch. This is conditional on CONFIG_RELOCATABLE; for non-relocatable kernels we use a direct branch as before. (A later change will allow CONFIG_RELOCATABLE to be set on 64-bit powerpc.) Since the secondary CPUs on pSeries start execution in the first 0x100 bytes of real memory and then have to get to wherever the kernel is, we can't use a direct branch to get there. Instead this changes __secondary_hold_spinloop from a flag to a function pointer. When it is set to a non-NULL value, the secondary CPUs jump to the function pointed to by that value. Finally this eliminates one code difference between 32-bit and 64-bit by making __secondary_hold be the text address of the secondary CPU spinloop rather than a function descriptor for it. Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
b8b572e1 |
|
31-Jul-2008 |
Stephen Rothwell <sfr@canb.auug.org.au> |
powerpc: Move include files to arch/powerpc/include/asm from include/asm-powerpc. This is the result of a mkdir arch/powerpc/include/asm git mv include/asm-powerpc/* arch/powerpc/include/asm Followed by a few documentation/comment fixups and a couple of places where <asm-powepc/...> was being used explicitly. Of the latter only one was outside the arch code and it is a driver only built for powerpc. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
|