#
dc158d23 |
|
21-Nov-2023 |
Nicholas Piggin <npiggin@gmail.com> |
KVM: PPC: Book3S HV: Fix KVM_RUN clobbering FP/VEC user registers Before running a guest, the host process (e.g., QEMU) FP/VEC registers are saved if they were being used, similarly to when the kernel uses FP registers. The guest values are then loaded into regs, and the host process registers will be restored lazily when it uses FP/VEC. KVM HV has a bug here: the host process registers do get saved, but the user MSR bits remain enabled, which indicates the registers are valid for the process. After they are clobbered by running the guest, this valid indication causes the host process to take on the FP/VEC register values of the guest. Fixes: 34e119c96b2b ("KVM: PPC: Book3S HV P9: Reduce mtmsrd instructions required to save host SPRs") Cc: stable@vger.kernel.org # v5.17+ Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231122025811.2973-1-npiggin@gmail.com
|
#
d45c4b48 |
|
24-Aug-2023 |
Michael Ellerman <mpe@ellerman.id.au> |
powerpc: Hide empty pt_regs at base of the stack A thread started via eg. user_mode_thread() runs in the kernel to begin with and then may later return to userspace. While it's running in the kernel it has a pt_regs at the base of its kernel stack, but that pt_regs is all zeroes. If the thread oopses in that state, it leads to an ugly stack trace with a big block of zero GPRs, as reported by Joel: Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0) CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.5.0-rc7-00004-gf7757129e3de-dirty #3 Hardware name: IBM PowerNV (emulated by qemu) POWER9 0x4e1200 opal:v7.0 PowerNV Call Trace: [c0000000036afb00] [c0000000010dd058] dump_stack_lvl+0x6c/0x9c (unreliable) [c0000000036afb30] [c00000000013c524] panic+0x178/0x424 [c0000000036afbd0] [c000000002005100] mount_root_generic+0x250/0x324 [c0000000036afca0] [c0000000020057d0] prepare_namespace+0x2d4/0x344 [c0000000036afd20] [c0000000020049c0] kernel_init_freeable+0x358/0x3ac [c0000000036afdf0] [c0000000000111b0] kernel_init+0x30/0x1a0 [c0000000036afe50] [c00000000000debc] ret_from_kernel_user_thread+0x14/0x1c --- interrupt: 0 at 0x0 NIP: 0000000000000000 LR: 0000000000000000 CTR: 0000000000000000 REGS: c0000000036afe80 TRAP: 0000 Not tainted (6.5.0-rc7-00004-gf7757129e3de-dirty) MSR: 0000000000000000 <> CR: 00000000 XER: 00000000 CFAR: 0000000000000000 IRQMASK: 0 GPR00: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR12: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR28: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 NIP [0000000000000000] 0x0 LR [0000000000000000] 0x0 --- interrupt: 0 The all-zero pt_regs looks ugly and conveys no useful information, other than its presence. So detect that case and just show the presence of the frame by printing the interrupt marker, eg: Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0) CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.5.0-rc3-00126-g18e9506562a0-dirty #301 Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1202 0xf000005 of:SLOF,HEAD hv:linux,kvm pSeries Call Trace: [c000000003aabb00] [c000000001143db8] dump_stack_lvl+0x6c/0x9c (unreliable) [c000000003aabb30] [c00000000014c624] panic+0x178/0x424 [c000000003aabbd0] [c0000000020050fc] mount_root_generic+0x250/0x324 [c000000003aabca0] [c0000000020057cc] prepare_namespace+0x2d4/0x344 [c000000003aabd20] [c0000000020049bc] kernel_init_freeable+0x358/0x3ac [c000000003aabdf0] [c0000000000111b0] kernel_init+0x30/0x1a0 [c000000003aabe50] [c00000000000debc] ret_from_kernel_user_thread+0x14/0x1c --- interrupt: 0 at 0x0 To avoid ever suppressing a valid pt_regs make sure the pt_regs has a zero MSR and TRAP value, and is located at the very base of the stack. Fixes: 6895dfc04741 ("powerpc: copy_thread fill in interrupt frame marker and back chain") Reported-by: Joel Stanley <joel@jms.id.au> Reported-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230824064210.907266-1-mpe@ellerman.id.au
|
#
be98fcf7 |
|
19-Jun-2023 |
Benjamin Gray <bgray@linux.ibm.com> |
powerpc/dexcr: Support userspace ROP protection The ISA 3.1B hashst and hashchk instructions use a per-cpu SPR HASHKEYR to hold a key used in the hash calculation. This key should be different for each process to make it harder for a malicious process to recreate valid hash values for a victim process. Add support for storing a per-thread hash key, and setting/clearing HASHKEYR appropriately. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Reviewed-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230616034846.311705-6-bgray@linux.ibm.com
|
#
89fb3913 |
|
25-Mar-2023 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc: copy_thread don't set PPR in user interrupt frame regs syscalls do not set the PPR field in their interrupt frame and return from syscall always sets the default PPR for userspace, so setting the value in the ret_from_fork frame is not necessary and mildly inconsistent. Remove it. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230325122904.2375060-9-npiggin@gmail.com
|
#
d195ce46 |
|
25-Mar-2023 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc: copy_thread don't set _TIF_RESTOREALL In the kernel user thread path, don't set _TIF_RESTOREALL because the thread is required to call kernel_execve() before it returns, which will set _TIF_RESTOREALL if necessary via start_thread(). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230325122904.2375060-8-npiggin@gmail.com
|
#
b504b6aa |
|
25-Mar-2023 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc: differentiate kthread from user kernel thread start Kernel created user threads start similarly to kernel threads in that they call a kernel function after first returning from _switch, so they share ret_from_kernel_thread for this. Kernel threads never return from that function though, whereas user threads often do (although some don't, e.g., IO threads). Split these startup functions in two, and catch kernel threads that improperly return from their function. This is intended to make the complicated code a little bit easier to understand. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230325122904.2375060-7-npiggin@gmail.com
|
#
eed7c420 |
|
25-Mar-2023 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc: copy_thread differentiate kthreads and user mode threads When copy_thread is given a kernel function to run in arg->fn, this does not necessarily mean it is a kernel thread. User threads can be created this way (e.g., kernel_init, see also x86's copy_thread()). These threads run a kernel function which may call kernel_execve() and return, which returns like a userspace exec(2) syscall. Kernel threads are to be differentiated with PF_KTHREAD, will always have arg->fn set, and should never return from that function, instead calling kthread_exit() to exit. Create separate paths for the kthread and user kernel thread creation logic. The kthread path will never exit and does not require a user interrupt frame, so it gets a minimal stack frame. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230325122904.2375060-6-npiggin@gmail.com
|
#
af5ca9d5 |
|
25-Mar-2023 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc: use switch frame for ret_from_kernel_thread parameters The kernel thread path in copy_thread creates a user interrupt frame on stack and stores the function and arg parameters there, and ret_from_kernel_thread loads them. This is a slightly confusing way to overload that frame. Non-volatile registers are loaded from the switch frame, so the parameters can be stored there. The user interrupt frame is now only used by user threads when they return to user. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230325122904.2375060-4-npiggin@gmail.com
|
#
959791e4 |
|
25-Mar-2023 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc: copy_thread make ret_from_fork register setup consistent The ret_from_fork code for 64e and 32-bit set r3 for syscall_exit_prepare the same way that 64s does, so there should be no need to special-case them in copy_thread. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230325122904.2375060-3-npiggin@gmail.com
|
#
c013e9f2 |
|
25-Mar-2023 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc: copy_thread remove unused pkey code The pkey registers (AMR, IAMR) do not get loaded from the switch frame so it is pointless to save anything there. Remove the dead code. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230325122904.2375060-2-npiggin@gmail.com
|
#
be994293 |
|
31-Oct-2022 |
Bo Liu <liubo03@inspur.com> |
powerpc: Fix a kernel-doc warning The current code provokes a kernel-doc warnings: arch/powerpc/kernel/process.c:1606: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst Signed-off-by: Bo Liu <liubo03@inspur.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20221101015452.3216-1-liubo03@inspur.com
|
#
1ee4e350 |
|
16-Dec-2022 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc: Skip stack validation checking alternate stacks if they are not allocated Stack validation in early boot can just bail out of checking alternate stacks if they are not validated yet. Checking against a NULL stack could cause NULLish pointer values to be considered valid. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221216115930.2667772-5-npiggin@gmail.com
|
#
d9ab6da6 |
|
01-Feb-2023 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
powerpc: Remove __kernel_text_address() in show_instructions() That test was introducted in 2006 by commit 00ae36de49cc ("[POWERPC] Better check in show_instructions"). At that time, there was no BPF progs. As seen in message of commit 89d21e259a94 ("powerpc/bpf/32: Fix Oops on tail call tests"), when a page fault occurs in test_bpf.ko for instance, the code is dumped as XXXXXXXXs. Allthough __kernel_text_address() checks is_bpf_text_address(), it seems it is not enough. Today, show_instructions() uses get_kernel_nofault() to read the code, so there is no real need for additional verifications. ARM64 and x86 don't do any additional check before dumping instructions. Do the same and remove __kernel_text_address() in show_instructions(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/4fd69ef7945518c3e27f96b95046a5c1468d35bf.1675245773.git.christophe.leroy@csgroup.eu
|
#
90f1b431 |
|
27-Nov-2022 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc: allow minimum sized kernel stack frames This affects only 64-bit ELFv2 kernels, and reduces the minimum asm-created stack frame size from 112 to 32 byte on those kernels. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221127124942.1665522-16-npiggin@gmail.com
|
#
4cefb0f6 |
|
27-Nov-2022 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc: split validate_sp into two functions Most callers just want to validate an arbitrary kernel stack pointer, some need a particular size. Make the size case the exceptional one with an extra function. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221127124942.1665522-15-npiggin@gmail.com
|
#
edbd0387 |
|
27-Nov-2022 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc: copy_thread add a back chain to the switch stack frame Stack unwinders need LR and the back chain as a minimum. The switch stack uses regs->nip for its return pointer rather than lrsave, so that was not set in the fork frame, and neither was the back chain. This change sets those fields in the stack. With this and the previous change, a stack trace in the switch or interrupt stack goes from looking like this: Oops: Exception in kernel mode, sig: 5 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries Modules linked in: CPU: 3 PID: 90 Comm: systemd Not tainted NIP: c000000000011060 LR: c000000000010f68 CTR: 0000000000007fff [ ... regs ... ] NIP [c000000000011060] _switch+0x160/0x17c LR [c000000000010f68] _switch+0x68/0x17c Call Trace: To this: Oops: Exception in kernel mode, sig: 5 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries CPU: 0 PID: 93 Comm: systemd Not tainted NIP: c000000000011060 LR: c000000000010f68 CTR: 0000000000007fff [ ... regs ... ] NIP [c000000000011060] _switch+0x160/0x17c LR [c000000000010f68] _switch+0x68/0x17c Call Trace: [c000000005a93e10] [c00000000000cdbc] ret_from_fork_scv+0x0/0x54 --- interrupt: 3000 at 0x7fffa72f56d8 NIP: 00007fffa72f56d8 LR: 0000000000000000 CTR: 0000000000000000 [ ... regs ... ] NIP [00007fffa72f56d8] 0x7fffa72f56d8 LR [0000000000000000] 0x0 --- interrupt: 3000 Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221127124942.1665522-14-npiggin@gmail.com
|
#
6895dfc0 |
|
27-Nov-2022 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc: copy_thread fill in interrupt frame marker and back chain Backtraces will not recognise the fork system call interrupt without the regs marker. And regular interrupt entry from userspace creates the back chain to the user stack, so do this for the initial fork frame too, to be consistent. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221127124942.1665522-13-npiggin@gmail.com
|
#
6f291a03 |
|
27-Nov-2022 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc: add a define for the switch frame size and regs offset This is open-coded in process.c, ppc32 uses a different define with the same value, and the C definition is name differently which makes it an extra indirection to grep for. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221127124942.1665522-12-npiggin@gmail.com
|
#
1223e5a2 |
|
27-Nov-2022 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc: add a define for the user interrupt frame size The user interrupt frame is a different size from the kernel frame, so give it its own name. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221127124942.1665522-11-npiggin@gmail.com
|
#
e856e336 |
|
27-Nov-2022 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc: Rename STACK_FRAME_MARKER and derive it from frame offset This is a count of longs from the stack pointer to the regs marker. Rename it to make it more distinct from the other byte offsets. It can be derived from the byte offset definitions just added. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221127124942.1665522-10-npiggin@gmail.com
|
#
c03be0a3 |
|
27-Nov-2022 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc: add definition for pt_regs offset within an interrupt frame This is a common offset that currently uses the overloaded STACK_FRAME_OVERHEAD constant. It's easier to read and more flexible to use a specific regs offset for this. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221127124942.1665522-8-npiggin@gmail.com
|
#
bc067736 |
|
27-Nov-2022 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc: Rearrange copy_thread child stack creation This makes it a bit clearer where the stack frame is created, and will allow easier use of some of the stack offset constants in a later change. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221127124942.1665522-5-npiggin@gmail.com
|
#
3671f4eb |
|
08-Nov-2022 |
Jordan Niethe <jniethe5@gmail.com> |
powerpc: Allow clearing and restoring registers independent of saved breakpoint state For the coming temporary mm used for instruction patching, the breakpoint registers need to be cleared to prevent them from accidentally being triggered. As soon as the patching is done, the breakpoints will be restored. The breakpoint state is stored in the per-cpu variable current_brk[]. Add a suspend_breakpoints() function which will clear the breakpoint registers without touching the state in current_brk[]. Add a pair function restore_breakpoints() which will move the state in current_brk[] back to the registers. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221109045112.187069-2-bgray@linux.ibm.com
|
#
d90bb7b4 |
|
05-Oct-2022 |
Michael Ellerman <mpe@ellerman.id.au> |
powerpc: Print instruction dump on a single line Although the previous commit made the powerpc instruction dump usable with scripts/decodecode, there are still some problems. Because the dump is split across multiple lines, the script doesn't cope with printk timestamps or caller info. That can be fixed by printing the entire dump on one line, eg: [ 12.016307][ T112] --- interrupt: c00 [ 12.016605][ T112] Code: 4b7aae15 60000000 3d22016e 3c62ffec 39291160 38639bc0 e8890000 4b7aadf9 60000000 4bfffee8 7c0802a6 60000000 <0fe00000> 60420000 3c4c008f 384268a0 [ 12.017655][ T112] ---[ end trace 0000000000000000 ]--- That output can then be piped directly into scripts/decodecode and interpreted correctly. Printing the dump on a single line does produce a very long line, about 173 characters. That is still shorter than x86, which prints nearly 200 characters even without timestamps etc. All consoles I'm aware of will wrap the line if it's too long, so the length should not be a functional problem. If anything it should help on consoles like VGA by using less vertical space. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221006032019.1128624-2-mpe@ellerman.id.au
|
#
3e654127 |
|
05-Oct-2022 |
Michael Ellerman <mpe@ellerman.id.au> |
powerpc: Make instruction dump work with scripts/decodecode Matt reported that scripts/decodecode doesn't work for the instruction dump in the powerpc oops output. Although there are scripts around that can decode it, it would be preferable if the standard in-tree script worked. All other arches prefix the instruction dump with "Code:", and that's what the script looks for, so use that. The script then works as expected: $ CROSS_COMPILE=powerpc64le-linux-gnu- ./scripts/decodecode Code: fbc1fff0 f821ffc1 7c7d1b78 7c9c2378 ebc30028 7fdff378 48000018 60000000 60000000 ebff0008 7c3ef840 41820048 <815f0060> e93f0000 5529077c 7d295378 ^D All code ======== 0: f0 ff c1 fb std r30,-16(r1) 4: c1 ff 21 f8 stdu r1,-64(r1) 8: 78 1b 7d 7c mr r29,r3 ... Note that the script doesn't cope well with printk timestamps or printk caller info. Reported-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221006032019.1128624-1-mpe@ellerman.id.au
|
#
8032bf12 |
|
09-Oct-2022 |
Jason A. Donenfeld <Jason@zx2c4.com> |
treewide: use get_random_u32_below() instead of deprecated function This is a simple mechanical transformation done by: @@ expression E; @@ - prandom_u32_max + get_random_u32_below (E) Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs Reviewed-by: SeongJae Park <sj@kernel.org> # for damon Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> # for infiniband Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> # for arm Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
#
81895a65 |
|
05-Oct-2022 |
Jason A. Donenfeld <Jason@zx2c4.com> |
treewide: use prandom_u32_max() when possible, part 1 Rather than incurring a division or requesting too many random bytes for the given range, use the prandom_u32_max() function, which only takes the minimum required bytes from the RNG and avoids divisions. This was done mechanically with this coccinelle script: @basic@ expression E; type T; identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; typedef u64; @@ ( - ((T)get_random_u32() % (E)) + prandom_u32_max(E) | - ((T)get_random_u32() & ((E) - 1)) + prandom_u32_max(E * XXX_MAKE_SURE_E_IS_POW2) | - ((u64)(E) * get_random_u32() >> 32) + prandom_u32_max(E) | - ((T)get_random_u32() & ~PAGE_MASK) + prandom_u32_max(PAGE_SIZE) ) @multi_line@ identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; identifier RAND; expression E; @@ - RAND = get_random_u32(); ... when != RAND - RAND %= (E); + RAND = prandom_u32_max(E); // Find a potential literal @literal_mask@ expression LITERAL; type T; identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; position p; @@ ((T)get_random_u32()@p & (LITERAL)) // Add one to the literal. @script:python add_one@ literal << literal_mask.LITERAL; RESULT; @@ value = None if literal.startswith('0x'): value = int(literal, 16) elif literal[0] in '123456789': value = int(literal, 10) if value is None: print("I don't know how to handle %s" % (literal)) cocci.include_match(False) elif value == 2**32 - 1 or value == 2**31 - 1 or value == 2**24 - 1 or value == 2**16 - 1 or value == 2**8 - 1: print("Skipping 0x%x for cleanup elsewhere" % (value)) cocci.include_match(False) elif value & (value + 1) != 0: print("Skipping 0x%x because it's not a power of two minus one" % (value)) cocci.include_match(False) elif literal.startswith('0x'): coccinelle.RESULT = cocci.make_expr("0x%x" % (value + 1)) else: coccinelle.RESULT = cocci.make_expr("%d" % (value + 1)) // Replace the literal mask with the calculated result. @plus_one@ expression literal_mask.LITERAL; position literal_mask.p; expression add_one.RESULT; identifier FUNC; @@ - (FUNC()@p & (LITERAL)) + prandom_u32_max(RESULT) @collapse_ret@ type T; identifier VAR; expression E; @@ { - T VAR; - VAR = (E); - return VAR; + return E; } @drop_var@ type T; identifier VAR; @@ { - T VAR; ... when != VAR } Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Yury Norov <yury.norov@gmail.com> Reviewed-by: KP Singh <kpsingh@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> # for ext4 and sbitmap Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> # for drbd Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390 Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
#
2be9880d |
|
18-Aug-2022 |
Kefeng Wang <wangkefeng.wang@huawei.com> |
kernel: exit: cleanup release_thread() Only x86 has own release_thread(), introduce a new weak release_thread() function to clean empty definitions in other ARCHs. Link: https://lkml.kernel.org/r/20220819014406.32266-1-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: Guo Ren <guoren@kernel.org> [csky] Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Brian Cain <bcain@quicinc.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Acked-by: Stafford Horne <shorne@gmail.com> [openrisc] Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64] Acked-by: Huacai Chen <chenhuacai@kernel.org> [LoongArch] Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Chris Zankel <chris@zankel.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Guo Ren <guoren@kernel.org> [csky] Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Jonas Bonn <jonas@southpole.se> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Richard Henderson <richard.henderson@linaro.org> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Xuerui Wang <kernel@xen0n.name> Cc: Yoshinori Sato <ysato@users.osdn.me> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
0fa68318 |
|
03-Oct-2022 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64: Fix msr_check_and_set/clear MSR[EE] race irq soft-masking means that when Linux irqs are disabled, the MSR[EE] value can change from 1 to 0 asynchronously: if a masked interrupt of the PACA_IRQ_MUST_HARD_MASK variety fires while irqs are disabled, the masked handler will return with MSR[EE]=0. This means a sequence like mtmsr(mfmsr() | MSR_FP) is racy if it can be called with local irqs disabled, unless a hard_irq_disable has been done. Reported-by: Sachin Sant <sachinp@linux.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221004051157.308999-2-npiggin@gmail.com
|
#
ec6d0dde |
|
09-Jun-2022 |
Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> |
powerpc: Enable execve syscall exit tracepoint On execve[at], we are zero'ing out most of the thread register state including gpr[0], which contains the syscall number. Due to this, we fail to trigger the syscall exit tracepoint properly. Fix this by retaining gpr[0] in the thread register state. Before this patch: # tail /sys/kernel/debug/tracing/trace cat-123 [000] ..... 61.449351: sys_execve(filename: 7fffa6b23448, argv: 7fffa6b233e0, envp: 7fffa6b233f8) cat-124 [000] ..... 62.428481: sys_execve(filename: 7fffa6b23448, argv: 7fffa6b233e0, envp: 7fffa6b233f8) echo-125 [000] ..... 65.813702: sys_execve(filename: 7fffa6b23378, argv: 7fffa6b233a0, envp: 7fffa6b233b0) echo-125 [000] ..... 65.822214: sys_execveat(fd: 0, filename: 1009ac48, argv: 7ffff65d0c98, envp: 7ffff65d0ca8, flags: 0) After this patch: # tail /sys/kernel/debug/tracing/trace cat-127 [000] ..... 100.416262: sys_execve(filename: 7fffa41b3448, argv: 7fffa41b33e0, envp: 7fffa41b33f8) cat-127 [000] ..... 100.418203: sys_execve -> 0x0 echo-128 [000] ..... 103.873968: sys_execve(filename: 7fffa41b3378, argv: 7fffa41b33a0, envp: 7fffa41b33b0) echo-128 [000] ..... 103.875102: sys_execve -> 0x0 echo-128 [000] ..... 103.882097: sys_execveat(fd: 0, filename: 1009ac48, argv: 7fffd10d2148, envp: 7fffd10d2158, flags: 0) echo-128 [000] ..... 103.883225: sys_execveat -> 0x0 Cc: stable@vger.kernel.org Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Tested-by: Sumit Dubey2 <Sumit.Dubey2@ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220609103328.41306-1-naveen.n.rao@linux.vnet.ibm.com
|
#
a1b29ba2 |
|
20-Jan-2022 |
He Ying <heying24@huawei.com> |
powerpc/kasan: Silence KASAN warnings in __get_wchan() The following KASAN warning was reported in our kernel. BUG: KASAN: stack-out-of-bounds in get_wchan+0x188/0x250 Read of size 4 at addr d216f958 by task ps/14437 CPU: 3 PID: 14437 Comm: ps Tainted: G O 5.10.0 #1 Call Trace: [daa63858] [c0654348] dump_stack+0x9c/0xe4 (unreliable) [daa63888] [c035cf0c] print_address_description.constprop.3+0x8c/0x570 [daa63908] [c035d6bc] kasan_report+0x1ac/0x218 [daa63948] [c00496e8] get_wchan+0x188/0x250 [daa63978] [c0461ec8] do_task_stat+0xce8/0xe60 [daa63b98] [c0455ac8] proc_single_show+0x98/0x170 [daa63bc8] [c03cab8c] seq_read_iter+0x1ec/0x900 [daa63c38] [c03cb47c] seq_read+0x1dc/0x290 [daa63d68] [c037fc94] vfs_read+0x164/0x510 [daa63ea8] [c03808e4] ksys_read+0x144/0x1d0 [daa63f38] [c005b1dc] ret_from_syscall+0x0/0x38 --- interrupt: c00 at 0x8fa8f4 LR = 0x8fa8cc The buggy address belongs to the page: page:98ebcdd2 refcount:0 mapcount:0 mapping:00000000 index:0x2 pfn:0x1216f flags: 0x0() raw: 00000000 00000000 01010122 00000000 00000002 00000000 ffffffff 00000000 raw: 00000000 page dumped because: kasan: bad access detected Memory state around the buggy address: d216f800: 00 00 00 00 00 f1 f1 f1 f1 00 00 00 00 00 00 00 d216f880: f2 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >d216f900: 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 ^ d216f980: f2 f2 f2 f2 f2 f2 f2 00 00 00 00 00 00 00 00 00 d216fa00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 After looking into this issue, I find the buggy address belongs to the task stack region. It seems KASAN has something wrong. I look into the code of __get_wchan in x86 architecture and find the same issue has been resolved by the commit f7d27c35ddff ("x86/mm, kasan: Silence KASAN warnings in get_wchan()"). The solution could be applied to powerpc architecture too. As Andrey Ryabinin said, get_wchan() is racy by design, it may access volatile stack of running task, thus it may access redzone in a stack frame and cause KASAN to warn about this. Use READ_ONCE_NOCHECK() to silence these warnings. Reported-by: Wanming Hu <huwanming@huaweil.com> Signed-off-by: He Ying <heying24@huawei.com> Signed-off-by: Chen Jingwen <chenjingwen6@huawei.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220121014418.155675-1-heying24@huawei.com
|
#
5bd2e97c |
|
12-Apr-2022 |
Eric W. Biederman <ebiederm@xmission.com> |
fork: Generalize PF_IO_WORKER handling Add fn and fn_arg members into struct kernel_clone_args and test for them in copy_thread (instead of testing for PF_KTHREAD | PF_IO_WORKER). This allows any task that wants to be a user space task that only runs in kernel mode to use this functionality. The code on x86 is an exception and still retains a PF_KTHREAD test because x86 unlikely everything else handles kthreads slightly differently than user space tasks that start with a function. The functions that created tasks that start with a function have been updated to set ".fn" and ".fn_arg" instead of ".stack" and ".stack_size". These functions are fork_idle(), create_io_thread(), kernel_thread(), and user_mode_thread(). Link: https://lkml.kernel.org/r/20220506141512.516114-4-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
#
c5febea0 |
|
08-Apr-2022 |
Eric W. Biederman <ebiederm@xmission.com> |
fork: Pass struct kernel_clone_args into copy_thread With io_uring we have started supporting tasks that are for most purposes user space tasks that exclusively run code in kernel mode. The kernel task that exec's init and tasks that exec user mode helpers are also user mode tasks that just run kernel code until they call kernel execve. Pass kernel_clone_args into copy_thread so these oddball tasks can be supported more cleanly and easily. v2: Fix spelling of kenrel_clone_args on h8300 Link: https://lkml.kernel.org/r/20220506141512.516114-2-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
#
86c38fec |
|
08-Mar-2022 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
powerpc: Remove asm/prom.h from all files that don't need it Several files include asm/prom.h for no reason. Clean it up. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> [mpe: Drop change to prom_parse.c as reported by lkp@intel.com] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/7c9b8fda63dcf63e1b28f43e7ebdb95182cbc286.1646767214.git.christophe.leroy@csgroup.eu
|
#
1fd02f66 |
|
30-Apr-2022 |
Julia Lawall <Julia.Lawall@inria.fr> |
powerpc: fix typos in comments Various spelling mistakes in comments. Detected with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Reviewed-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220430185654.5855-1-Julia.Lawall@inria.fr
|
#
3ba4289a |
|
09-Apr-2022 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
powerpc: Simplify and move arch_randomize_brk() arch_randomize_brk() is only needed for hash on book3s/64, for other platforms the one provided by the default mmap layout is good enough. Move it to hash_utils.c and use randomize_page() like the generic one. And properly opt out the radix case instead of making an assumption on mmu_highuser_ssize. Also change to a 32M range like most other architectures instead of 8M. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/eafa4d18ec8ac7b98dd02b40181e61643707cc7c.1649523076.git.christophe.leroy@csgroup.eu
|
#
c545b9f0 |
|
29-Nov-2021 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
powerpc/inst: Define ppc_inst_t In order to stop using 'struct ppc_inst' on PPC32, define a ppc_inst_t typedef. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/fe5baa2c66fea9db05a8b300b3e8d2880a42596c.1638208156.git.christophe.leroy@csgroup.eu
|
#
43afcf8f |
|
19-Oct-2021 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
powerpc: Add KUAP support for BOOKE and 40x On booke/40x we don't have segments like book3s/32. On booke/40x we don't have access protection groups like 8xx. Use the PID register to provide user access protection. Kernel address space can be accessed with any PID. User address space has to be accessed with the PID of the user. User PID is always not null. Everytime the kernel is entered, set PID register to 0 and restore PID register when returning to user. Everytime kernel needs to access user data, PID is restored for the access. In TLB miss handlers, check the PID and bail out to data storage exception when PID is 0 and accessed address is in user space. Note that also forbids execution of user text by kernel except when user access is unlocked. But this shouldn't be a problem as the kernel is not supposed to ever run user text. This patch prepares the infrastructure but the real activation of KUAP is done by following patches for each processor type one by one. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/5d65576a8e31e9480415785a180c92dd4e72306d.1634627931.git.christophe.leroy@csgroup.eu
|
#
42e03bc5 |
|
19-Oct-2021 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
powerpc/kuap: Prepare for supporting KUAP on BOOK3E/64 Also call kuap_lock() and kuap_save_and_lock() from interrupt functions with CONFIG_PPC64. For book3s/64 we keep them empty as it is done in assembly. Also do the locked assert when switching task unless it is book3s/64. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1cbf94e26e6d6e2e028fd687588a7e6622d454a6.1634627931.git.christophe.leroy@csgroup.eu
|
#
387e220a |
|
01-Dec-2021 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64s: Move hash MMU support code under CONFIG_PPC_64S_HASH_MMU Compiling out hash support code when CONFIG_PPC_64S_HASH_MMU=n saves 128kB kernel image size (90kB text) on powernv_defconfig minus KVM, 350kB on pseries_defconfig minus KVM, 40kB on a tiny config. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Fixup defined(ARCH_HAS_MEMREMAP_COMPAT_ALIGN), which needs CONFIG. Fix radix_enabled() use in setup_initial_memory_limit(). Add some stubs to reduce number of ifdefs.] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211201144153.2456614-18-npiggin@gmail.com
|
#
5236756d |
|
23-Nov-2021 |
Nicholas Piggin <npiggin@gmail.com> |
KVM: PPC: Book3S HV P9: Use Linux SPR save/restore to manage some host SPRs Linux implements SPR save/restore including storage space for registers in the task struct for process context switching. Make use of this similarly to the way we make use of the context switching fp/vec save restore. This improves code reuse, allows some stack space to be saved, and helps with avoiding VRSAVE updates if they are not required. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211123095231.1036501-39-npiggin@gmail.com
|
#
34e119c9 |
|
23-Nov-2021 |
Nicholas Piggin <npiggin@gmail.com> |
KVM: PPC: Book3S HV P9: Reduce mtmsrd instructions required to save host SPRs This reduces the number of mtmsrd required to enable facility bits when saving/restoring registers, by having the KVM code set all bits up front rather than using individual facility functions that set their particular MSR bits. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211123095231.1036501-20-npiggin@gmail.com
|
#
42a20f86 |
|
29-Sep-2021 |
Kees Cook <keescook@chromium.org> |
sched: Add wrapper for get_wchan() to keep task blocked Having a stable wchan means the process must be blocked and for it to stay that way while performing stack unwinding. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> [arm] Tested-by: Mark Rutland <mark.rutland@arm.com> [arm64] Link: https://lkml.kernel.org/r/20211008111626.332092234@infradead.org
|
#
4872cbd0 |
|
06-Aug-2021 |
Xiongwei Song <sxwjean@gmail.com> |
powerpc: Add dear as a synonym for pt_regs.dar register Create an anonymous union for dar and dear regsiters, we can reference dear to get the effective address when CONFIG_4xx=y or CONFIG_BOOKE=y. Otherwise, reference dar. This makes code more clear. Signed-off-by: Xiongwei Song <sxwjean@gmail.com> [mpe: Reword commit title] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210807010239.416055-4-sxwjean@me.com
|
#
4f8e78c0 |
|
06-Aug-2021 |
Xiongwei Song <sxwjean@gmail.com> |
powerpc: Add esr as a synonym for pt_regs.dsisr Create an anonymous union for dsisr and esr regsiters, we can reference esr to get the exception detail when CONFIG_4xx=y or CONFIG_BOOKE=y. Otherwise, reference dsisr. This makes code more clear. Signed-off-by: Xiongwei Song <sxwjean@gmail.com> [mpe: Reword commit title] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210807010239.416055-2-sxwjean@me.com
|
#
f35d2f24 |
|
21-Jun-2021 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64s: Fix copy-paste data exposure into newly created tasks copy-paste contains implicit "copy buffer" state that can contain arbitrary user data (if the user process executes a copy instruction). This could be snooped by another process if a context switch hits while the state is live. So cp_abort is executed on context switch to clear out possible sensitive data and prevent the leak. cp_abort is done after the low level _switch(), which means it is never reached by newly created tasks, so they could snoop on this buffer between their first and second context switch. Fix this by doing the cp_abort before calling _switch. Add some comments which should make the issue harder to miss. Fixes: 07d2a628bc000 ("powerpc/64s: Avoid cpabort in context switch when possible") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210622053036.474678-1-npiggin@gmail.com
|
#
59dc5bfc |
|
17-Jun-2021 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64s: avoid reloading (H)SRR registers if they are still valid When an interrupt is taken, the SRR registers are set to return to where it left off. Unless they are modified in the meantime, or the return address or MSR are modified, there is no need to reload these registers when returning from interrupt. Introduce per-CPU flags that track the validity of SRR and HSRR registers. These are cleared when returning from interrupt, when using the registers for something else (e.g., OPAL calls), when adjusting the return address or MSR of a context, and when context switching (which changes the return address and MSR). This improves the performance of interrupt returns. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Fold in fixup patch from Nick] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210617155116.2167984-5-npiggin@gmail.com
|
#
b112fb91 |
|
14-Jun-2021 |
Daniel Axtens <dja@axtens.net> |
powerpc: make stack walking KASAN-safe Make our stack-walking code KASAN-safe by using __no_sanitize_address. Generic code, arm64, s390 and x86 all make accesses unchecked for similar sorts of reasons: when unwinding a stack, we might touch memory that KASAN has marked as being out-of-bounds. In ppc64 KASAN development, I hit this sometimes when checking for an exception frame - because we're checking an arbitrary offset into the stack frame. See commit 20955746320e ("s390/kasan: avoid false positives during stack unwind"), commit bcaf669b4bdb ("arm64: disable kasan when accessing frame->fp in unwind_frame"), commit 91e08ab0c851 ("x86/dumpstack: Prevent KASAN false positive warnings") and commit 6e22c8366416 ("tracing, kasan: Silence Kasan warning in check_stack of stack_tracer"). Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210614120907.1952321-1-dja@axtens.net
|
#
16132529 |
|
03-Jun-2021 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
powerpc/32s: Rework Kernel Userspace Access Protection On book3s/32, KUAP is provided by toggling Ks bit in segment registers. One segment register addresses 256M of virtual memory. At the time being, KUAP implements a complex logic to apply the unlock/lock on the exact number of segments covering the user range to access, with saving the boundaries of the range of segments in a member of thread struct. But most if not all user accesses are within a single segment. Rework KUAP with a different approach: - Open only one segment, the one corresponding to the starting address of the range to be accessed. - If a second segment is involved, it will generate a page fault. The segment will then be open by the page fault handler. The kuap member of thread struct will now contain: - The start address of the current on going user access, that will be used to know which segment to lock at the end of the user access. - ~0 when no user access is open - ~1 when additionnal segments are opened by a page fault. Then, at lock time - When only one segment is open, close it. - When several segments are open, close all user segments. Almost 100% of the time, only one segment will be involved. In interrupts, inline the function that unlock/lock all segments, because not inlining them implies a lot of register save/restore. With the patch, writing value 128 in userspace in perf_copy_attr() is done with 16 instructions: 3890: 93 82 04 dc stw r28,1244(r2) 3894: 7d 20 e5 26 mfsrin r9,r28 3898: 55 29 00 80 rlwinm r9,r9,0,2,0 389c: 7d 20 e1 e4 mtsrin r9,r28 38a0: 4c 00 01 2c isync 38a4: 39 20 00 80 li r9,128 38a8: 91 3c 00 00 stw r9,0(r28) 38ac: 81 42 04 dc lwz r10,1244(r2) 38b0: 39 00 ff ff li r8,-1 38b4: 91 02 04 dc stw r8,1244(r2) 38b8: 2c 0a ff fe cmpwi r10,-2 38bc: 41 82 00 88 beq 3944 <perf_copy_attr+0x36c> 38c0: 7d 20 55 26 mfsrin r9,r10 38c4: 65 29 40 00 oris r9,r9,16384 38c8: 7d 20 51 e4 mtsrin r9,r10 38cc: 4c 00 01 2c isync ... 3944: 48 00 00 01 bl 3944 <perf_copy_attr+0x36c> 3944: R_PPC_REL24 kuap_lock_all_ool Before the patch it was 118 instructions. In reality only 42 are executed in most cases, but GCC is not able to see that a properly aligned user access cannot involve more than one segment. 5060: 39 1d 00 04 addi r8,r29,4 5064: 3d 20 b0 00 lis r9,-20480 5068: 7c 08 48 40 cmplw r8,r9 506c: 40 81 00 08 ble 5074 <perf_copy_attr+0x2cc> 5070: 3d 00 b0 00 lis r8,-20480 5074: 39 28 ff ff addi r9,r8,-1 5078: 57 aa 00 06 rlwinm r10,r29,0,0,3 507c: 55 29 27 3e rlwinm r9,r9,4,28,31 5080: 39 29 00 01 addi r9,r9,1 5084: 7d 29 53 78 or r9,r9,r10 5088: 91 22 04 dc stw r9,1244(r2) 508c: 7d 20 ed 26 mfsrin r9,r29 5090: 55 29 00 80 rlwinm r9,r9,0,2,0 5094: 7c 08 50 40 cmplw r8,r10 5098: 40 81 00 c0 ble 5158 <perf_copy_attr+0x3b0> 509c: 7d 46 50 f8 not r6,r10 50a0: 7c c6 42 14 add r6,r6,r8 50a4: 54 c6 27 be rlwinm r6,r6,4,30,31 50a8: 7d 20 51 e4 mtsrin r9,r10 50ac: 3c ea 10 00 addis r7,r10,4096 50b0: 39 29 01 11 addi r9,r9,273 50b4: 7f 88 38 40 cmplw cr7,r8,r7 50b8: 55 29 02 06 rlwinm r9,r9,0,8,3 50bc: 40 9d 00 9c ble cr7,5158 <perf_copy_attr+0x3b0> 50c0: 2f 86 00 00 cmpwi cr7,r6,0 50c4: 41 9e 00 4c beq cr7,5110 <perf_copy_attr+0x368> 50c8: 2f 86 00 01 cmpwi cr7,r6,1 50cc: 41 9e 00 2c beq cr7,50f8 <perf_copy_attr+0x350> 50d0: 2f 86 00 02 cmpwi cr7,r6,2 50d4: 41 9e 00 14 beq cr7,50e8 <perf_copy_attr+0x340> 50d8: 7d 20 39 e4 mtsrin r9,r7 50dc: 39 29 01 11 addi r9,r9,273 50e0: 3c e7 10 00 addis r7,r7,4096 50e4: 55 29 02 06 rlwinm r9,r9,0,8,3 50e8: 7d 20 39 e4 mtsrin r9,r7 50ec: 39 29 01 11 addi r9,r9,273 50f0: 3c e7 10 00 addis r7,r7,4096 50f4: 55 29 02 06 rlwinm r9,r9,0,8,3 50f8: 7d 20 39 e4 mtsrin r9,r7 50fc: 3c e7 10 00 addis r7,r7,4096 5100: 39 29 01 11 addi r9,r9,273 5104: 7f 88 38 40 cmplw cr7,r8,r7 5108: 55 29 02 06 rlwinm r9,r9,0,8,3 510c: 40 9d 00 4c ble cr7,5158 <perf_copy_attr+0x3b0> 5110: 7d 20 39 e4 mtsrin r9,r7 5114: 39 29 01 11 addi r9,r9,273 5118: 3c c7 10 00 addis r6,r7,4096 511c: 55 29 02 06 rlwinm r9,r9,0,8,3 5120: 7d 20 31 e4 mtsrin r9,r6 5124: 39 29 01 11 addi r9,r9,273 5128: 3c c6 10 00 addis r6,r6,4096 512c: 55 29 02 06 rlwinm r9,r9,0,8,3 5130: 7d 20 31 e4 mtsrin r9,r6 5134: 39 29 01 11 addi r9,r9,273 5138: 3c c7 30 00 addis r6,r7,12288 513c: 55 29 02 06 rlwinm r9,r9,0,8,3 5140: 7d 20 31 e4 mtsrin r9,r6 5144: 3c e7 40 00 addis r7,r7,16384 5148: 39 29 01 11 addi r9,r9,273 514c: 7f 88 38 40 cmplw cr7,r8,r7 5150: 55 29 02 06 rlwinm r9,r9,0,8,3 5154: 41 9d ff bc bgt cr7,5110 <perf_copy_attr+0x368> 5158: 4c 00 01 2c isync 515c: 39 20 00 80 li r9,128 5160: 91 3d 00 00 stw r9,0(r29) 5164: 38 e0 00 00 li r7,0 5168: 90 e2 04 dc stw r7,1244(r2) 516c: 7d 20 ed 26 mfsrin r9,r29 5170: 65 29 40 00 oris r9,r9,16384 5174: 40 81 00 c0 ble 5234 <perf_copy_attr+0x48c> 5178: 7d 47 50 f8 not r7,r10 517c: 7c e7 42 14 add r7,r7,r8 5180: 54 e7 27 be rlwinm r7,r7,4,30,31 5184: 7d 20 51 e4 mtsrin r9,r10 5188: 3d 4a 10 00 addis r10,r10,4096 518c: 39 29 01 11 addi r9,r9,273 5190: 7c 08 50 40 cmplw r8,r10 5194: 55 29 02 06 rlwinm r9,r9,0,8,3 5198: 40 81 00 9c ble 5234 <perf_copy_attr+0x48c> 519c: 2c 07 00 00 cmpwi r7,0 51a0: 41 82 00 4c beq 51ec <perf_copy_attr+0x444> 51a4: 2c 07 00 01 cmpwi r7,1 51a8: 41 82 00 2c beq 51d4 <perf_copy_attr+0x42c> 51ac: 2c 07 00 02 cmpwi r7,2 51b0: 41 82 00 14 beq 51c4 <perf_copy_attr+0x41c> 51b4: 7d 20 51 e4 mtsrin r9,r10 51b8: 39 29 01 11 addi r9,r9,273 51bc: 3d 4a 10 00 addis r10,r10,4096 51c0: 55 29 02 06 rlwinm r9,r9,0,8,3 51c4: 7d 20 51 e4 mtsrin r9,r10 51c8: 39 29 01 11 addi r9,r9,273 51cc: 3d 4a 10 00 addis r10,r10,4096 51d0: 55 29 02 06 rlwinm r9,r9,0,8,3 51d4: 7d 20 51 e4 mtsrin r9,r10 51d8: 3d 4a 10 00 addis r10,r10,4096 51dc: 39 29 01 11 addi r9,r9,273 51e0: 7c 08 50 40 cmplw r8,r10 51e4: 55 29 02 06 rlwinm r9,r9,0,8,3 51e8: 40 81 00 4c ble 5234 <perf_copy_attr+0x48c> 51ec: 7d 20 51 e4 mtsrin r9,r10 51f0: 39 29 01 11 addi r9,r9,273 51f4: 3c ea 10 00 addis r7,r10,4096 51f8: 55 29 02 06 rlwinm r9,r9,0,8,3 51fc: 7d 20 39 e4 mtsrin r9,r7 5200: 39 29 01 11 addi r9,r9,273 5204: 3c e7 10 00 addis r7,r7,4096 5208: 55 29 02 06 rlwinm r9,r9,0,8,3 520c: 7d 20 39 e4 mtsrin r9,r7 5210: 39 29 01 11 addi r9,r9,273 5214: 3c ea 30 00 addis r7,r10,12288 5218: 55 29 02 06 rlwinm r9,r9,0,8,3 521c: 7d 20 39 e4 mtsrin r9,r7 5220: 3d 4a 40 00 addis r10,r10,16384 5224: 39 29 01 11 addi r9,r9,273 5228: 7c 08 50 40 cmplw r8,r10 522c: 55 29 02 06 rlwinm r9,r9,0,8,3 5230: 41 81 ff bc bgt 51ec <perf_copy_attr+0x444> 5234: 4c 00 01 2c isync Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> [mpe: Export the ool handlers to fix build errors] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/d9121f96a7c4302946839a0771f5d1daeeb6968c.1622708530.git.christophe.leroy@csgroup.eu
|
#
359c2ca7 |
|
14-May-2021 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
powerpc: Don't handle ALTIVEC/SPE in ASM in _switch(). Do it in C. _switch() saves and restores ALTIVEC and SPE status. For altivec this is redundant with what __switch_to() does with save_sprs() and restore_sprs() and giveup_all() before calling _switch(). Add support for SPI in save_sprs() and restore_sprs() and remove things from _switch(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/8ab21fd93d6e0047aa71e6509e5e312f14b2991b.1620998075.git.christophe.leroy@csgroup.eu
|
#
b03fbd4f |
|
11-Jun-2021 |
Peter Zijlstra <peterz@infradead.org> |
sched: Introduce task_is_running() Replace a bunch of 'p->state == TASK_RUNNING' with a new helper: task_is_running(p). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Davidlohr Bueso <dave@stgolabs.net> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20210611082838.222401495@infradead.org
|
#
7153d4bf |
|
14-Apr-2021 |
Xiongwei Song <sxwjean@gmail.com> |
powerpc/traps: Enhance readability for trap types Define macros to list ppc interrupt types in interttupt.h, replace the reference of the trap hex values with these macros. Referred the hex numbers in arch/powerpc/kernel/exceptions-64e.S, arch/powerpc/kernel/exceptions-64s.S, arch/powerpc/kernel/head_*.S, arch/powerpc/kernel/head_booke.h and arch/powerpc/include/asm/kvm_asm.h. Signed-off-by: Xiongwei Song <sxwjean@gmail.com> [mpe: Resolve conflicts in nmi_disables_ftrace(), fix 40x build] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1618398033-13025-1-git-send-email-sxwjean@me.com
|
#
8dc7f022 |
|
16-Mar-2021 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc: remove partial register save logic All subarchitectures always save all GPRs to pt_regs interrupt frames now. Remove FULL_REGS and associated bits. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210316104206.407354-11-npiggin@gmail.com
|
#
c1672883 |
|
11-Mar-2021 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
powerpc/32: Manage KUAP in C Move all KUAP management in C. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/199365ddb58d579daf724815f2d0acb91cc49d19.1615552867.git.christophe.leroy@csgroup.eu
|
#
57472306 |
|
11-Mar-2021 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
powerpc/32: Remove ksp_limit ksp_limit is there to help detect stack overflows. That is specific to ppc32 as it was removed from ppc64 in commit cbc9565ee826 ("powerpc: Remove ksp_limit on ppc64"). There are other means for detecting stack overflows. As ppc64 has proven to not need it, ppc32 should be able to do without it too. Lets remove it and simplify exception handling. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/d789c3385b22e07bedc997613c0d26074cb513e7.1615552866.git.christophe.leroy@csgroup.eu
|
#
2d19630e |
|
26-Feb-2021 |
Christopher M. Riedl <cmr@codefail.de> |
powerpc/signal64: Remove TM ifdefery in middle of if/else block Both rt_sigreturn() and handle_rt_signal_64() contain TM-related ifdefs which break-up an if/else block. Provide stubs for the ifdef-guarded TM functions and remove the need for an ifdef in rt_sigreturn(). Rework the remaining TM ifdef in handle_rt_signal64() similar to commit f1cf4f93de2f ("powerpc/signal32: Remove ifdefery in middle of if/else"). Unlike in the commit for ppc32, the ifdef can't be removed entirely since uc_transact in sigframe depends on CONFIG_PPC_TRANSACTIONAL_MEM. Signed-off-by: Christopher M. Riedl <cmr@codefail.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210227011259.11992-6-cmr@codefail.de
|
#
0100e6bb |
|
23-Feb-2021 |
Jens Axboe <axboe@kernel.dk> |
arch: ensure parisc/powerpc handle PF_IO_WORKER in copy_thread() In the arch addition of PF_IO_WORKER, I missed parisc and powerpc for some reason. Fix that up, ensuring they handle PF_IO_WORKER like they do PF_KTHREAD in copy_thread(). Reported-by: Bruno Goncalves <bgoncalv@redhat.com> Fixes: 4727dc20e042 ("arch: setup PF_IO_WORKER threads like PF_KTHREAD") Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
e3de1e29 |
|
09-Feb-2021 |
Michael Ellerman <mpe@ellerman.id.au> |
powerpc/64: Fix stack trace not displaying final frame In commit bf13718bc57a ("powerpc: show registers when unwinding interrupt frames") we changed our stack dumping logic to show the full registers whenever we find an interrupt frame on the stack. However we didn't notice that on 64-bit this doesn't show the final frame, ie. the interrupt that brought us in from userspace, whereas on 32-bit it does. That is due to confusion about the size of that last frame. The code in show_stack() calls validate_sp(), passing it STACK_INT_FRAME_SIZE to check the sp is at least that far below the top of the stack. However on 64-bit that size is too large for the final frame, because it includes the red zone, but we don't allocate a red zone for the first frame. So add a new define that encodes the correct size for 32-bit and 64-bit, and use it in show_stack(). This results in the full trace being shown on 64-bit, eg: sysrq: Trigger a crash Kernel panic - not syncing: sysrq triggered crash CPU: 0 PID: 83 Comm: sh Not tainted 5.11.0-rc2-gcc-8.2.0-00188-g571abcb96b10-dirty #649 Call Trace: [c00000000a1c3ac0] [c000000000897b70] dump_stack+0xc4/0x114 (unreliable) [c00000000a1c3b00] [c00000000014334c] panic+0x178/0x41c [c00000000a1c3ba0] [c00000000094e600] sysrq_handle_crash+0x40/0x50 [c00000000a1c3c00] [c00000000094ef98] __handle_sysrq+0xd8/0x210 [c00000000a1c3ca0] [c00000000094f820] write_sysrq_trigger+0x100/0x188 [c00000000a1c3ce0] [c0000000005559dc] proc_reg_write+0x10c/0x1b0 [c00000000a1c3d10] [c000000000479950] vfs_write+0xf0/0x360 [c00000000a1c3d60] [c000000000479d9c] ksys_write+0x7c/0x140 [c00000000a1c3db0] [c00000000002bf5c] system_call_exception+0x19c/0x2c0 [c00000000a1c3e10] [c00000000000d35c] system_call_common+0xec/0x278 --- interrupt: c00 at 0x7fff9fbab428 NIP: 00007fff9fbab428 LR: 000000001000b724 CTR: 0000000000000000 REGS: c00000000a1c3e80 TRAP: 0c00 Not tainted (5.11.0-rc2-gcc-8.2.0-00188-g571abcb96b10-dirty) MSR: 900000000280f033 <SF,HV,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE> CR: 22002884 XER: 00000000 IRQMASK: 0 GPR00: 0000000000000004 00007fffc3cb8960 00007fff9fc59900 0000000000000001 GPR04: 000000002a4b32d0 0000000000000002 0000000000000063 0000000000000063 GPR08: 000000002a4b32d0 0000000000000000 0000000000000000 0000000000000000 GPR12: 0000000000000000 00007fff9fcca9a0 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 00000000100b8fd0 GPR20: 000000002a4b3485 00000000100b8f90 0000000000000000 0000000000000000 GPR24: 000000002a4b0440 00000000100e77b8 0000000000000020 000000002a4b32d0 GPR28: 0000000000000001 0000000000000002 000000002a4b32d0 0000000000000001 NIP [00007fff9fbab428] 0x7fff9fbab428 LR [000000001000b724] 0x1000b724 --- interrupt: c00 Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210209141627.2898485-1-mpe@ellerman.id.au
|
#
0ecf6a9e |
|
02-Feb-2021 |
Michael Ellerman <mpe@ellerman.id.au> |
powerpc/64: Make stack tracing work during very early boot If we try to stack trace very early during boot, either due to a WARN/BUG or manual dump_stack(), we will oops in valid_emergency_stack() when we try to dereference the paca_ptrs array. The fix is simple, we just return false if paca_ptrs isn't allocated yet. The stack pointer definitely isn't part of any emergency stack because we haven't allocated any yet. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210202130207.1303975-1-mpe@ellerman.id.au
|
#
3a96570f |
|
30-Jan-2021 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc: convert interrupt handlers to use wrappers Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-29-npiggin@gmail.com
|
#
18722ecf |
|
30-Jan-2021 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc: do_break get registers from regs Similar to the previous patch this makes interrupt handler function types more regular so they can be wrapped with the next patch. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-9-npiggin@gmail.com
|
#
ad3ed15c |
|
04-Dec-2020 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
powerpc/process: Remove target specific __set_dabr() __set_dabr() are simple functions that can be inline directly inside set_dabr() and using IS_ENABLED() instead of #ifdef Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/c10b263668e137236c71d76648b03cf2cd1ee66f.1607076733.git.christophe.leroy@csgroup.eu
|
#
48a8ab4e |
|
26-Nov-2020 |
Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> |
powerpc/book3s64/pkeys: Don't update SPRN_AMR when in kernel mode. Now that kernel correctly store/restore userspace AMR/IAMR values, avoid manipulating AMR and IAMR from the kernel on behalf of userspace. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reviewed-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201127044424.40686-15-aneesh.kumar@linux.ibm.com
|
#
d5fa30e6 |
|
26-Nov-2020 |
Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> |
powerpc/book3s64/pkeys: Reset userspace AMR correctly on exec On fork, we inherit from the parent and on exec, we should switch to default_amr values. Also, avoid changing the AMR register value within the kernel. The kernel now runs with different AMR values. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reviewed-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201127044424.40686-13-aneesh.kumar@linux.ibm.com
|
#
f643fcab |
|
26-Nov-2020 |
Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> |
powerpc/book3s64/pkeys: Inherit correctly on fork. Child thread.kuap value is inherited from the parent in copy_thread_tls. We still need to make sure when the child returns from a fork in the kernel we start with the kernel default AMR value. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reviewed-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201127044424.40686-12-aneesh.kumar@linux.ibm.com
|
#
d7df77e8 |
|
26-Nov-2020 |
Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> |
powerpc/exec: Set thread.regs early during exec In later patches during exec, we would like to access default regs.amr to control access to the user mapping. Having thread.regs set early makes the code changes simpler. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201127044424.40686-10-aneesh.kumar@linux.ibm.com
|
#
bf13718b |
|
06-Nov-2020 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc: show registers when unwinding interrupt frames It's often useful to know the register state for interrupts in the stack frame. In the below example (with this patch applied), the important information is the state of the page fault. A blatant case like this probably rather should have the page fault regs passed down to the warning, but quite often there are less obvious cases where an interrupt shows up that might give some more clues. The downside is longer and more complex bug output. Bug: Write fault blocked by AMR! WARNING: CPU: 0 PID: 72 at arch/powerpc/include/asm/book3s/64/kup-radix.h:164 __do_page_fault+0x880/0xa90 Modules linked in: CPU: 0 PID: 72 Comm: systemd-gpt-aut Not tainted NIP: c00000000006e2f0 LR: c00000000006e2ec CTR: 0000000000000000 REGS: c00000000a4f3420 TRAP: 0700 MSR: 8000000000021033 <SF,ME,IR,DR,RI,LE> CR: 28002840 XER: 20040000 CFAR: c000000000128be0 IRQMASK: 3 GPR00: c00000000006e2ec c00000000a4f36c0 c0000000014f0700 0000000000000020 GPR04: 0000000000000001 c000000001290f50 0000000000000001 c000000001290f80 GPR08: c000000001612b08 0000000000000000 0000000000000000 00000000ffffe0f7 GPR12: 0000000048002840 c0000000016e0000 c00c000000021c80 c000000000fd6f60 GPR16: 0000000000000000 c00000000a104698 0000000000000003 c0000000087f0000 GPR20: 0000000000000100 c0000000070330b8 0000000000000000 0000000000000004 GPR24: 0000000002000000 0000000000000300 0000000002000000 c00000000a5b0c00 GPR28: 0000000000000000 000000000a000000 00007fffb2a90038 c00000000a4f3820 NIP [c00000000006e2f0] __do_page_fault+0x880/0xa90 LR [c00000000006e2ec] __do_page_fault+0x87c/0xa90 Call Trace: [c00000000a4f36c0] [c00000000006e2ec] __do_page_fault+0x87c/0xa90 (unreliable) [c00000000a4f3780] [c000000000e1c034] do_page_fault+0x34/0x90 [c00000000a4f37b0] [c000000000008908] data_access_common_virt+0x158/0x1b0 --- interrupt: 300 at __copy_tofrom_user_base+0x9c/0x5a4 NIP: c00000000009b028 LR: c000000000802978 CTR: 0000000000000800 REGS: c00000000a4f3820 TRAP: 0300 MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 24004840 XER: 00000000 CFAR: c00000000009aff4 DAR: 00007fffb2a90038 DSISR: 0a000000 IRQMASK: 0 GPR00: 0000000000000000 c00000000a4f3ac0 c0000000014f0700 00007fffb2a90028 GPR04: c000000008720010 0000000000010000 0000000000000000 0000000000000000 GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000001 GPR12: 0000000000004000 c0000000016e0000 c00c000000021c80 c000000000fd6f60 GPR16: 0000000000000000 c00000000a104698 0000000000000003 c0000000087f0000 GPR20: 0000000000000100 c0000000070330b8 0000000000000000 0000000000000004 GPR24: c00000000a4f3c80 c000000008720000 0000000000010000 0000000000000000 GPR28: 0000000000010000 0000000008720000 0000000000010000 c000000001515b98 NIP [c00000000009b028] __copy_tofrom_user_base+0x9c/0x5a4 LR [c000000000802978] copyout+0x68/0xc0 --- interrupt: 300 [c00000000a4f3af0] [c0000000008074b8] copy_page_to_iter+0x188/0x540 [c00000000a4f3b50] [c00000000035c678] generic_file_buffered_read+0x358/0xd80 [c00000000a4f3c40] [c0000000004c1e90] blkdev_read_iter+0x50/0x80 [c00000000a4f3c60] [c00000000045733c] new_sync_read+0x12c/0x1c0 [c00000000a4f3d00] [c00000000045a1f0] vfs_read+0x1d0/0x240 [c00000000a4f3d50] [c00000000045a7f4] ksys_read+0x84/0x140 [c00000000a4f3da0] [c000000000033a60] system_call_exception+0x100/0x280 [c00000000a4f3e10] [c00000000000c508] system_call_common+0xf8/0x2f8 Instruction dump: eae10078 3be0000b 4bfff890 60420000 792917e1 4182ff18 3c82ffab 3884a5e0 3c62ffab 3863a6e8 480ba891 60000000 <0fe00000> 3be0000b 4bfff860 e93c0938 Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201107023305.2384874-1-npiggin@gmail.com
|
#
b6254ced |
|
18-Aug-2020 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
powerpc/signal: Don't manage floating point regs when no FPU There is no point in copying floating point regs when there is no FPU and MATH_EMULATION is not selected. Create a new CONFIG_PPC_FPU_REGS bool that is selected by CONFIG_MATH_EMULATION and CONFIG_PPC_FPU, and use it to opt out everything related to fp_state in thread_struct. The asm const used only by fpu.S are opted out with CONFIG_PPC_FPU as fpu.S build is conditionnal to CONFIG_PPC_FPU. The following app spends approx 8.1 seconds system time on an 8xx without the patch, and 7.0 seconds with the patch (13.5% reduction). On an 832x, it spends approx 2.6 seconds system time without the patch and 2.1 seconds with the patch (19% reduction). void sigusr1(int sig) { } int main(int argc, char **argv) { int i = 100000; signal(SIGUSR1, sigusr1); for (;i--;) raise(SIGUSR1); exit(0); } Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/7569070083e6cd5b279bb5023da601aba3c06f3c.1597770847.git.christophe.leroy@csgroup.eu
|
#
d208e13c |
|
16-Sep-2020 |
Michael Ellerman <mpe@ellerman.id.au> |
powerpc/process: Fix uninitialised variable error Clang, and GCC with -Wmaybe-uninitialized, can't see that val is unused in get_fpexec_mode(): arch/powerpc/kernel/process.c:1940:7: error: variable 'val' is used uninitialized whenever 'if' condition is true if (cpu_has_feature(CPU_FTR_SPE)) { ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ We know that CPU_FTR_SPE will only be true iff CONFIG_SPE is also true, but the compiler doesn't. Avoid it by initialising val to zero. Reported-by: kernel test robot <lkp@intel.com> Fixes: 532ed1900d37 ("powerpc/process: Remove useless #ifdef CONFIG_SPE") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Link: https://lore.kernel.org/r/20200917024509.3253837-1-mpe@ellerman.id.au
|
#
c83c192a |
|
16-Aug-2020 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
powerpc/process: Remove useless #ifdef CONFIG_PPC_FPU Add a stub for __giveup_fpu() when CONFIG_PPC_FPU is not selected, as done for CONFIG_SPE and CONFIG_ALTIVEC. This allows to remove some #ifdef CONFIG_PPC_FPU. Also change one to IS_ENABLED(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/69c8b7954ceeccc6b849e52e1fa41b3a0f10f6c1.1597643221.git.christophe.leroy@csgroup.eu
|
#
532ed190 |
|
16-Aug-2020 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
powerpc/process: Remove useless #ifdef CONFIG_SPE cpu_has_feature(CPU_FTR_SPE) returns false when CONFIG_SPE is not set. There is no need to enclose the test in an #ifdef CONFIG_SPE. Remove it. CPU_FTR_SPE only exists on 32 bits. Define it as 0 on 64 bits. We have a couple of places like: #ifdef CONFIG_SPE if (cpu_has_feature(CPU_FTR_SPE)) { do_something_that_requires_CONFIG_SPE } else { return -EINVAL; } #else return -EINVAL; #endif Replace them by a cleaner version: if (cpu_has_feature(CPU_FTR_SPE)) { #ifdef CONFIG_SPE do_something_that_requires_CONFIG_SPE #endif } else { return -EINVAL; } When CONFIG_SPE is not set, this resolves to an unconditional return of -EINVAL Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/698df8387555765b70ea42e4a7fa48141c309c1f.1597643221.git.christophe.leroy@csgroup.eu
|
#
e3667ee4 |
|
16-Aug-2020 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
powerpc/process: Remove useless #ifdef CONFIG_ALTIVEC cpu_has_feature(CPU_FTR_ALTIVEC) returns false when CONFIG_ALTIVEC is not set. There is no need to enclose the test in an #ifdef CONFIG_ALTIVEC. Remove it. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/03ba6b52344ca7c336df2bc6e3d31d736c804ae2.1597643221.git.christophe.leroy@csgroup.eu
|
#
80739c2b |
|
16-Aug-2020 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
powerpc/process: Remove useless #ifdef CONFIG_VSX cpu_has_feature(CPU_FTR_VSX) returns false when CONFIG_VSX is not set. There is no need to enclose the test in an #ifdef CONFIG_VSX. Remove it. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/0eb61cf0dc66d781d47deb2228498cd61d03a754.1597643221.git.christophe.leroy@csgroup.eu
|
#
60d62bfd |
|
16-Aug-2020 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
powerpc/process: Tag an #endif to help locate the matching #ifdef. That #endif is more than 100 lines after the matching #ifdef, and there are several #ifdef/#else/#endif inbetween. Tag it as /* CONFIG_PPC_BOOK3S_64 */ to help locate the matching #ifdef. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/3612a8f8aaca16de3fc414a7e66293319d6e213c.1597643147.git.christophe.leroy@csgroup.eu
|
#
8f020c7c |
|
16-Aug-2020 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
powerpc/process: Replace #ifdef CONFIG_KALLSYMS by IS_ENABLED() The #ifdef CONFIG_KALLSYMS encloses some printk which can compile in all cases. Replace by IS_ENABLED(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/2d89732a9062b2cf2651728804e4b8f6c9b9358e.1597643164.git.christophe.leroy@csgroup.eu
|
#
2ec42996 |
|
16-Aug-2020 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
powerpc/process: Replace an #if defined(CONFIG_4xx) || defined(CONFIG_BOOKE) by IS_ENABLED() The #if defined(CONFIG_4xx) || defined(CONFIG_BOOKE) encloses some printk which can be compiled in all cases. Replace by IS_ENABLED(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/a1b6ef3d657c8f249193442f56868fc358ea5b6c.1597643160.git.christophe.leroy@csgroup.eu
|
#
bfac2799 |
|
16-Aug-2020 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
powerpc/process: Replace an #ifdef CONFIG_PPC_BOOK3S_64 by IS_ENABLED() This #ifdef CONFIG_PPC_BOOK3S_64 calls preload_new_slb_context() when radix is not enabled. radix_enabled() is always defined, and the prototype for preload_new_slb_context() is always present, so the #ifdef is unneeded. Replace it by IS_ENABLED(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/d31506ca9bac9def68cf7424eded63fdc4fb6660.1597643167.git.christophe.leroy@csgroup.eu
|
#
04d476bf |
|
16-Aug-2020 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
powerpc/process: Replace an #ifdef CONFIG_PPC_47x by IS_ENABLED() isync() is always defined, no need for an #ifdef. Replace it by IS_ENABLED(CONFIG_PPC_47x). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/ac8da0e3baa91dda805e1e492fd65aecd90c1fb5.1597643156.git.christophe.leroy@csgroup.eu
|
#
5b905d77 |
|
01-Sep-2020 |
Ravi Bangoria <ravi.bangoria@linux.ibm.com> |
powerpc/watchpoint: Fix exception handling for CONFIG_HAVE_HW_BREAKPOINT=N On powerpc, ptrace watchpoint works in one-shot mode. i.e. kernel disables event every time it fires and user has to re-enable it. Also, in case of ptrace watchpoint, kernel notifies ptrace user before executing instruction. With CONFIG_HAVE_HW_BREAKPOINT=N, kernel is missing to disable ptrace event and thus it's causing infinite loop of exceptions. This is especially harmful when user watches on a data which is also read/written by kernel, eg syscall parameters. In such case, infinite exceptions happens in kernel mode which causes soft-lockup. Fixes: 9422de3e953d ("powerpc: Hardware breakpoints rewrite to handle non DABR breakpoint registers") Reported-by: Pedro Miraglia Franco de Carvalho <pedromfc@linux.ibm.com> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200902042945.129369-6-ravi.bangoria@linux.ibm.com
|
#
dc462267 |
|
25-Aug-2020 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64s: handle ISA v3.1 local copy-paste context switches The ISA v3.1 the copy-paste facility has a new memory move functionality which allows the copy buffer to be pasted to domestic memory (RAM) as opposed to foreign memory (accelerator). This means the POWER9 trick of avoiding the cp_abort on context switch if the process had not mapped foreign memory does not work on POWER10. Do the cp_abort unconditionally there. KVM must also cp_abort on guest exit to prevent copy buffer state leaking between contexts. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200825075535.224536-1-npiggin@gmail.com
|
#
353bce21 |
|
16-Aug-2020 |
Christophe Leroy <christophe.leroy@csgroup.eu> |
powerpc/process: Remove unnecessary #ifdef CONFIG_FUNCTION_GRAPH_TRACER ftrace_graph_ret_addr() is always defined and returns 'ip' when CONFIG_FUNCTION GRAPH_TRACER is not set. So the #ifdef is not needed, remove it. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/9d11143d4e27ba8274369a926968756917584868.1597643153.git.christophe.leroy@csgroup.eu
|
#
b91eb518 |
|
25-Aug-2020 |
Michael Ellerman <mpe@ellerman.id.au> |
powerpc/64s: Fix crash in load_fp_state() due to fpexc_mode The recent commit 01eb01877f33 ("powerpc/64s: Fix restore_math unnecessarily changing MSR") changed some of the handling of floating point/vector restore. In particular it caused current->thread.fpexc_mode to be copied into the current MSR (via msr_check_and_set()), rather than just into regs->msr (which is moved into MSR on return to userspace). This can lead to a crash in the kernel if we take a floating point exception when restoring FPSCR: Oops: Exception in kernel mode, sig: 8 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV Modules linked in: CPU: 3 PID: 101213 Comm: ld64.so.2 Not tainted 5.9.0-rc1-00098-g18445bf405cb-dirty #9 NIP: c00000000000fbb4 LR: c00000000001a7ac CTR: c000000000183570 REGS: c0000016b7cfb3b0 TRAP: 0700 Not tainted (5.9.0-rc1-00098-g18445bf405cb-dirty) MSR: 900000000290b933 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 44002444 XER: 00000000 CFAR: c00000000001a7a8 IRQMASK: 1 GPR00: c00000000001ae40 c0000016b7cfb640 c0000000011b7f00 c000001542a0f740 GPR04: c000001542a0f720 c000001542a0eb00 0000000000000900 c000001542a0eb00 GPR08: 000000000000000a 0000000000002000 9000000000009033 0000000000000000 GPR12: 0000000000004000 c0000017ffffd900 0000000000000001 c000000000df5a58 GPR16: c000000000e19c18 c0000000010e1123 0000000000000001 c000000000e1a638 GPR20: 0000000000000000 c0000000044b1d00 0000000000000000 c000001542a0f2a0 GPR24: 00000016c7fe0000 c000001542a0f720 c000000001c93da0 c000000000fe5f28 GPR28: c000001542a0f720 0000000000800000 c0000016b7cfbe90 0000000002802900 NIP load_fp_state+0x4/0x214 LR restore_math+0x17c/0x1f0 Call Trace: 0xc0000016b7cfb680 (unreliable) __switch_to+0x330/0x460 __schedule+0x318/0x920 schedule+0x74/0x140 schedule_timeout+0x318/0x3f0 wait_for_completion+0xc8/0x210 call_usermodehelper_exec+0x234/0x280 do_coredump+0xedc/0x13c0 get_signal+0x1d4/0xbe0 do_notify_resume+0x1a0/0x490 interrupt_exit_user_prepare+0x1c4/0x230 interrupt_return+0x14/0x1c0 Instruction dump: ebe10168 e88101a0 7c8ff120 382101e0 e8010010 7c0803a6 4e800020 790605c4 782905c4 7c0008a8 7c0008a8 c8030200 <fffe058e> 48000088 c8030000 c8230010 Fix it by only loading the fpexc_mode value into regs->msr. Also add a comment to explain that although VSX is subject to the value of fpexc_mode, we don't have to handle that separately because we only allow VSX to be enabled if FP is also enabled. Fixes: 01eb01877f33 ("powerpc/64s: Fix restore_math unnecessarily changing MSR") Reported-by: Milton Miller <miltonm@us.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Link: https://lore.kernel.org/r/20200825093424.3967813-1-mpe@ellerman.id.au
|
#
7fa95f9a |
|
11-Jun-2020 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64s: system call support for scv/rfscv instructions Add support for the scv instruction on POWER9 and later CPUs. For now this implements the zeroth scv vector 'scv 0', as identical to 'sc' system calls, with the exception that LR is not preserved, nor are volatile CR registers, and error is not indicated with CR0[SO], but by returning a negative errno. rfscv is implemented to return from scv type system calls. It can not be used to return from sc system calls because those are defined to preserve LR. getpid syscall throughput on POWER9 is improved by 26% (428 to 318 cycles), largely due to reducing mtmsr and mtspr. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Fix ppc64e build] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200611081203.995112-3-npiggin@gmail.com
|
#
01eb0187 |
|
23-Jun-2020 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64s: Fix restore_math unnecessarily changing MSR Before returning to user, if there are missing FP/VEC/VSX bits from the user MSR then those registers had been saved and must be restored again before use. restore_math will decide whether to restore immediately, or skip the restore and let fp/vec/vsx unavailable faults demand load the registers. Each time restore_math restores one of the FP/VSX or VEC register sets is loaded, an 8-bit counter is incremented (load_fp and load_vec). When these wrap to zero, restore_math no longer restores that register set until after they are next demand faulted. It's quite usual for those counters to have different values, so if one wraps to zero and restore_math no longer restores its registers or user MSR bit but the other is not zero yet does not need to be restored (because the kernel is not frequently using the FPU), then restore_math will be called and it will also not return in the early exit check. This causes msr_check_and_set to test and set the MSR at every kernel exit despite having no work to do. This can cause workloads (e.g., a NULL syscall microbenchmark) to run fast for a time while both counters are non-zero, then slow down when one of the counters reaches zero, then speed up again after the second counter reaches zero. The cost is significant, about 10% slowdown on a NULL syscall benchmark, and the jittery behaviour is very undesirable. Fix this by having restore_math test all conditions first, and only update MSR if we will be loading registers. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200623234139.2262227-2-npiggin@gmail.com
|
#
891b4fe8 |
|
23-Jun-2020 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64s: restore_math remove TM test The TM test in restore_math added by commit dc16b553c949e ("powerpc: Always restore FPU/VEC/VSX if hardware transactional memory in use") is no longer necessary after commit a8318c13e79ba ("powerpc/tm: Fix restoring FP/VMX facility incorrectly on interrupts"), which removed the cases where restore_math has to restore if TM is active. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200623234139.2262227-1-npiggin@gmail.com
|
#
714acdbd |
|
11-Jun-2020 |
Christian Brauner <christian.brauner@ubuntu.com> |
arch: rename copy_thread_tls() back to copy_thread() Now that HAVE_COPY_THREAD_TLS has been removed, rename copy_thread_tls() back simply copy_thread(). It's a simpler name, and doesn't imply that only tls is copied here. This finishes an outstanding chunk of internal process creation work since we've added clone3(). Cc: linux-arch@vger.kernel.org Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>A Acked-by: Stafford Horne <shorne@gmail.com> Acked-by: Greentime Hu <green.hu@gmail.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>A Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
|
#
25f12ae4 |
|
17-Jun-2020 |
Christoph Hellwig <hch@lst.de> |
maccess: rename probe_kernel_address to get_kernel_nofault Better describe what this helper does, and match the naming of copy_from_kernel_nofault. Also switch the argument order around, so that it acts and looks like get_user(). Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
c0ee37e8 |
|
17-Jun-2020 |
Christoph Hellwig <hch@lst.de> |
maccess: rename probe_user_{read,write} to copy_{from,to}_user_nofault Better describe what these functions do. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
a6e2c226 |
|
24-May-2020 |
Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> |
powerpc: Fix kernel crash in show_instructions() w/DEBUG_VIRTUAL With CONFIG_DEBUG_VIRTUAL=y, we can hit a BUG() if we take a hard lockup watchdog interrupt when in OPAL mode. This happens in show_instructions() if the kernel takes the watchdog NMI IPI, or any other interrupt, with MSR_IR == 0. show_instructions() updates the variable pc in the loop and the second iteration will result in BUG(). We hit the BUG_ON due the below check in __va() #define __va(x) ({ VIRTUAL_BUG_ON((unsigned long)(x) >= PAGE_OFFSET); (void *)(unsigned long)((phys_addr_t)(x) | PAGE_OFFSET); }) Fix it by moving the check out of the loop. Also update nip so that the nip == pc check still matches. Fixes: 4dd7554a6456 ("powerpc/64: Add VIRTUAL_BUG_ON checks for __va and __pa addresses") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> [mpe: Use IS_ENABLED(), massage change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200524093822.423487-1-aneesh.kumar@linux.ibm.com
|
#
e31cf2f4 |
|
08-Jun-2020 |
Mike Rapoport <rppt@kernel.org> |
mm: don't include asm/pgtable.h if linux/mm.h is already included Patch series "mm: consolidate definitions of page table accessors", v2. The low level page table accessors (pXY_index(), pXY_offset()) are duplicated across all architectures and sometimes more than once. For instance, we have 31 definition of pgd_offset() for 25 supported architectures. Most of these definitions are actually identical and typically it boils down to, e.g. static inline unsigned long pmd_index(unsigned long address) { return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1); } static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address) { return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address); } These definitions can be shared among 90% of the arches provided XYZ_SHIFT, PTRS_PER_XYZ and xyz_page_vaddr() are defined. For architectures that really need a custom version there is always possibility to override the generic version with the usual ifdefs magic. These patches introduce include/linux/pgtable.h that replaces include/asm-generic/pgtable.h and add the definitions of the page table accessors to the new header. This patch (of 12): The linux/mm.h header includes <asm/pgtable.h> to allow inlining of the functions involving page table manipulations, e.g. pte_alloc() and pmd_alloc(). So, there is no point to explicitly include <asm/pgtable.h> in the files that include <linux/mm.h>. The include statements in such cases are remove with a simple loop: for f in $(git grep -l "include <linux/mm.h>") ; do sed -i -e '/include <asm\/pgtable.h>/ d' $f done Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Cain <bcain@codeaurora.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Zankel <chris@zankel.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ley Foon Tan <ley.foon.tan@intel.com> Cc: Mark Salter <msalter@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nick Hu <nickhu@andestech.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vincent Chen <deanbo422@gmail.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: http://lkml.kernel.org/r/20200514170327.31389-1-rppt@kernel.org Link: http://lkml.kernel.org/r/20200514170327.31389-2-rppt@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
9cb8f069 |
|
08-Jun-2020 |
Dmitry Safonov <0x7f454c46@gmail.com> |
kernel: rename show_stack_loglvl() => show_stack() Now the last users of show_stack() got converted to use an explicit log level, show_stack_loglvl() can drop it's redundant suffix and become once again well known show_stack(). Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20200418201944.482088-51-dima@arista.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
b9677a8c |
|
08-Jun-2020 |
Dmitry Safonov <0x7f454c46@gmail.com> |
powerpc: add show_stack_loglvl() Currently, the log-level of show_stack() depends on a platform realization. It creates situations where the headers are printed with lower log level or higher than the stacktrace (depending on a platform or user). Furthermore, it forces the logic decision from user to an architecture side. In result, some users as sysrq/kdb/etc are doing tricks with temporary rising console_loglevel while printing their messages. And in result it not only may print unwanted messages from other CPUs, but also omit printing at all in the unlucky case where the printk() was deferred. Introducing log-level parameter and KERN_UNSUPPRESSED [1] seems an easier approach than introducing more printk buffers. Also, it will consolidate printings with headers. Introduce show_stack_loglvl(), that eventually will substitute show_stack(). [1]: https://lore.kernel.org/lkml/20190528002412.1625-1-dima@arista.com/T/#u Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Link: http://lkml.kernel.org/r/20200418201944.482088-27-dima@arista.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
74c68810 |
|
14-May-2020 |
Ravi Bangoria <ravi.bangoria@linux.ibm.com> |
powerpc/watchpoint: Prepare handler to handle more than one watchpoint Currently we assume that we have only one watchpoint supported by hw. Get rid of that assumption and use dynamic loop instead. This should make supporting more watchpoints very easy. With more than one watchpoint, exception handler needs to know which DAWR caused the exception, and hw currently does not provide it. So we need sw logic for the same. To figure out which DAWR caused the exception, check all different combinations of user specified range, DAWR address range, actual access range and DAWRX constrains. For ex, if user specified range and actual access range overlaps but DAWRX is configured for readonly watchpoint and the instruction is store, this DAWR must not have caused exception. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Reviewed-by: Michael Neuling <mikey@neuling.org> [mpe: Unsplit multi-line printk() strings, fix some sparse warnings] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200514111741.97993-14-ravi.bangoria@linux.ibm.com
|
#
e68ef121 |
|
14-May-2020 |
Ravi Bangoria <ravi.bangoria@linux.ibm.com> |
powerpc/watchpoint: Use builtin ALIGN*() macros Currently we calculate hw aligned start and end addresses manually. Replace them with builtin ALIGN_DOWN() and ALIGN() macros. So far end_addr was inclusive but this patch makes it exclusive (by avoiding -1) for better readability. Suggested-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Michael Neuling <mikey@neuling.org> Link: https://lore.kernel.org/r/20200514111741.97993-13-ravi.bangoria@linux.ibm.com
|
#
6b424efa |
|
14-May-2020 |
Ravi Bangoria <ravi.bangoria@linux.ibm.com> |
powerpc/watchpoint: Use loop for thread_struct->ptrace_bps ptrace_bps is already an array of size HBP_NUM_MAX. But we use hardcoded index 0 while fetching/updating it. Convert such code to loop over array. ptrace interface to use multiple watchpoint remains same. eg: two PPC_PTRACE_SETHWDEBUG calls will create two watchpoint if underneath hw supports it. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Michael Neuling <mikey@neuling.org> Link: https://lore.kernel.org/r/20200514111741.97993-11-ravi.bangoria@linux.ibm.com
|
#
303e6a9d |
|
14-May-2020 |
Ravi Bangoria <ravi.bangoria@linux.ibm.com> |
powerpc/watchpoint: Convert thread_struct->hw_brk to an array So far powerpc hw supported only one watchpoint. But Power10 is introducing 2nd DAWR. Convert thread_struct->hw_brk into an array. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Michael Neuling <mikey@neuling.org> Link: https://lore.kernel.org/r/20200514111741.97993-10-ravi.bangoria@linux.ibm.com
|
#
4a8a9379 |
|
14-May-2020 |
Ravi Bangoria <ravi.bangoria@linux.ibm.com> |
powerpc/watchpoint: Provide DAWR number to __set_breakpoint Introduce new parameter 'nr' to __set_breakpoint() which indicates which DAWR should be programed. Also convert current_brk variable to an array. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Michael Neuling <mikey@neuling.org> Link: https://lore.kernel.org/r/20200514111741.97993-7-ravi.bangoria@linux.ibm.com
|
#
a18b8346 |
|
14-May-2020 |
Ravi Bangoria <ravi.bangoria@linux.ibm.com> |
powerpc/watchpoint: Provide DAWR number to set_dawr Introduce new parameter 'nr' to set_dawr() which indicates which DAWR should be programed. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Michael Neuling <mikey@neuling.org> Link: https://lore.kernel.org/r/20200514111741.97993-6-ravi.bangoria@linux.ibm.com
|
#
912237ea |
|
07-May-2020 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc: trap_is_syscall() helper to hide syscall trap number A new system call interrupt will be added with a new trap number. Hide the explicit 0xc00 test behind an accessor to reduce churn in callers. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Make it a static inline] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200507121332.2233629-3-mpe@ellerman.id.au
|
#
feb9df34 |
|
07-May-2020 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64s: Always has full regs, so remove remnant checks Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200507121332.2233629-1-mpe@ellerman.id.au
|
#
c420644c |
|
16-Apr-2020 |
Haren Myneni <haren@linux.ibm.com> |
powerpc: Use mm_context vas_windows counter to issue CP_ABORT set_thread_uses_vas() sets used_vas flag for a process that opened VAS window and issue CP_ABORT during context switch for only that process. In multi-thread application, windows can be shared. For example Thread A can open a window and Thread B can run COPY/PASTE instructions to send NX request which may cause corruption or snooping or a covert channel Also once this flag is set, continue to run CP_ABORT even the VAS window is closed. So define vas-windows counter in process mm_context, increment this counter for each window open and decrement it for window close. If vas-windows is set, issue CP_ABORT during context switch. It means clear the foreign real address mapping only if the process / thread uses COPY/PASTE. Then disable it for that process if windows are not open. Moved set_thread_uses_vas() code to vas_tx_win_open() as this functionality is needed only for userspace open windows. We are adding VAS userspace support along with this fix. So no need to include this fix in stable releases. Fixes: 9d2a4d71332c ("powerpc: Define set_thread_uses_vas()") Signed-off-by: Haren Myneni <haren@linux.ibm.com> Reported-by: Nicholas Piggin <npiggin@gmail.com> Suggested-by: Milton Miller <miltonm@us.ibm.com> Suggested-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1587017291.2275.1077.camel@hbabu-laptop
|
#
6cc0c16d |
|
25-Feb-2020 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64s: Implement interrupt exit logic in C Implement the bulk of interrupt return logic in C. The asm return code must handle a few cases: restoring full GPRs, and emulating stack store. The stack store emulation is significantly simplfied, rather than creating a new return frame and switching to that before performing the store, it uses the PACA to keep a scratch register around to perform the store. The asm return code is moved into 64e for now. The new logic has made allowance for 64e, but I don't have a full environment that works well to test it, and even booting in emulated qemu is not great for stress testing. 64e shouldn't be too far off working with this, given a bit more testing and auditing of the logic. This is slightly faster on a POWER9 (page fault speed increases about 1.1%), probably due to reduced mtmsrd. mpe: Includes fixes from Nick for _TIF_EMULATE_STACK_STORE handling (including the fast_interrupt_return path), to remove trace_hardirqs_on(), and fixes the interrupt-return part of the MSR_VSX restore bug caught by tm-unavailable selftest. mpe: Incorporate fix from Nick: The return-to-kernel path has to replay any soft-pending interrupts if it is returning to a context that had interrupts soft-enabled. It has to do this carefully and avoid plain enabling interrupts if this is an irq context, which can cause multiple nesting of interrupts on the stack, and other unexpected issues. The code which avoided this case got the soft-mask state wrong, and marked interrupts as enabled before going around again to retry. This seems to be mostly harmless except when PREEMPT=y, this calls preempt_schedule_irq with irqs apparently enabled and runs into a BUG in kernel/sched/core.c Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michal Suchanek <msuchanek@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200225173541.1549955-29-npiggin@gmail.com
|
#
a2e36683 |
|
25-Mar-2020 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64: mark emergency stacks valid to unwind Before: WARNING: CPU: 0 PID: 494 at arch/powerpc/kernel/irq.c:343 CPU: 0 PID: 494 Comm: a Tainted: G W NIP: c00000000001ed2c LR: c000000000d13190 CTR: c00000000003f910 REGS: c0000001fffd3870 TRAP: 0700 Tainted: G W MSR: 8000000000021003 <SF,ME,RI,LE> CR: 28000488 XER: 00000000 CFAR: c00000000001ec90 IRQMASK: 0 GPR00: c000000000aeb12c c0000001fffd3b00 c0000000012ba300 0000000000000000 GPR04: 0000000000000000 0000000000000000 000000010bd207c8 6b00696e74657272 GPR08: 0000000000000000 0000000000000000 0000000000000000 efbeadde00000000 GPR12: 0000000000000000 c0000000014a0000 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR24: 0000000000000000 0000000000000000 0000000000000000 000000010bd207bc GPR28: 0000000000000000 c00000000148a898 0000000000000000 c0000001ffff3f50 NIP [c00000000001ed2c] arch_local_irq_restore.part.0+0xac/0x100 LR [c000000000d13190] _raw_spin_unlock_irqrestore+0x50/0xc0 Call Trace: Instruction dump: 60000000 7d2000a6 71298000 41820068 39200002 7d210164 4bffff9c 60000000 60000000 7d2000a6 71298000 4c820020 <0fe00000> 4e800020 60000000 60000000 After: WARNING: CPU: 0 PID: 499 at arch/powerpc/kernel/irq.c:343 CPU: 0 PID: 499 Comm: a Not tainted NIP: c00000000001ed2c LR: c000000000d13210 CTR: c00000000003f980 REGS: c0000001fffd3870 TRAP: 0700 Not tainted MSR: 8000000000021003 <SF,ME,RI,LE> CR: 28000488 XER: 00000000 CFAR: c00000000001ec90 IRQMASK: 0 GPR00: c000000000aeb1ac c0000001fffd3b00 c0000000012ba300 0000000000000000 GPR04: 0000000000000000 0000000000000000 00000001347607c8 6b00696e74657272 GPR08: 0000000000000000 0000000000000000 0000000000000000 efbeadde00000000 GPR12: 0000000000000000 c0000000014a0000 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR24: 0000000000000000 0000000000000000 0000000000000000 00000001347607bc GPR28: 0000000000000000 c00000000148a898 0000000000000000 c0000001ffff3f50 NIP [c00000000001ed2c] arch_local_irq_restore.part.0+0xac/0x100 LR [c000000000d13210] _raw_spin_unlock_irqrestore+0x50/0xc0 Call Trace: [c0000001fffd3b20] [c000000000aeb1ac] of_find_property+0x6c/0x90 [c0000001fffd3b70] [c000000000aeb1f0] of_get_property+0x20/0x40 [c0000001fffd3b90] [c000000000042cdc] rtas_token+0x3c/0x70 [c0000001fffd3bb0] [c0000000000dc318] fwnmi_release_errinfo+0x28/0x70 [c0000001fffd3c10] [c0000000000dcd8c] pseries_machine_check_realmode+0x1dc/0x540 [c0000001fffd3cd0] [c00000000003fe04] machine_check_early+0x54/0x70 [c0000001fffd3d00] [c000000000008384] machine_check_early_common+0x134/0x1f0 --- interrupt: 200 at 0x1347607c8 LR = 0x7fffafbd8328 Instruction dump: 60000000 7d2000a6 71298000 41820068 39200002 7d210164 4bffff9c 60000000 60000000 7d2000a6 71298000 4c820020 <0fe00000> 4e800020 60000000 60000000 Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200325104144.158362-1-npiggin@gmail.com
|
#
3d13e839 |
|
20-Feb-2020 |
Michael Ellerman <mpe@ellerman.id.au> |
powerpc: Rename current_stack_pointer() to current_stack_frame() current_stack_pointer(), which was called __get_SP(), used to just return the value in r1. But that caused problems in some cases, so it was turned into a function in commit bfe9a2cfe91a ("powerpc: Reimplement __get_SP() as a function not a define"). Because it's a function in a separate compilation unit to all its callers, it has the effect of causing a stack frame to be created, and then returns the address of that frame. This is good in some cases like those described in the above commit, but in other cases it's overkill, we just need to know what stack page we're on. On some other arches current_stack_pointer is just a register global giving the stack pointer, and we'd like to do that too. So rename our current_stack_pointer() to current_stack_frame() to make that possible. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr> Link: https://lore.kernel.org/r/20200220115141.2707-1-mpe@ellerman.id.au
|
#
ba32f4b0 |
|
29-Jan-2020 |
Christophe Leroy <christophe.leroy@c-s.fr> |
powerpc/process: Remove unneccessary #ifdef CONFIG_PPC64 in copy_thread_tls() is_32bit_task() exists on both PPC64 and PPC32, no need of an ifdefery. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Michal Suchanek <msuchanek@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/6ecbda05b4119c40222dc8ec284604e1597c9bff.1580327381.git.christophe.leroy@c-s.fr
|
#
def0bfdb |
|
23-Jan-2020 |
Christophe Leroy <christophe.leroy@c-s.fr> |
powerpc: use probe_user_read() and probe_user_write() Instead of opencoding, use probe_user_read() to failessly read a user location and probe_user_write() for writing to user. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/e041f5eedb23f09ab553be8a91c3de2087147320.1579800517.git.christophe.leroy@c-s.fr
|
#
39413ae0 |
|
26-Nov-2019 |
Christophe Leroy <christophe.leroy@c-s.fr> |
powerpc/hw_breakpoints: Rewrite 8xx breakpoints to allow any address range size. Unlike standard powerpc, Powerpc 8xx doesn't have SPRN_DABR, but it has a breakpoint support based on a set of comparators which allow more flexibility. Commit 4ad8622dc548 ("powerpc/8xx: Implement hw_breakpoint") implemented breakpoints by emulating the DABR behaviour. It did this by setting one comparator the match 4 bytes at breakpoint address and the other comparator to match 4 bytes at breakpoint address + 4. Rewrite 8xx hw_breakpoint to make breakpoints match all addresses defined by the breakpoint address and length by making full use of comparators. Now, comparator E is set to match any address greater than breakpoint address minus one. Comparator F is set to match any address lower than breakpoint address plus breakpoint length. Addresses are aligned to 32 bits. When the breakpoint range starts at address 0, the breakpoint is set to match comparator F only. When the breakpoint range end at address 0xffffffff, the breakpoint is set to match comparator E only. Otherwise the breakpoint is set to match comparator E and F. At the same time, use registers bit names instead of hardcoded values. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/05105deeaf63bc02151aea2cdeaf525534e0e9d4.1574790198.git.christophe.leroy@c-s.fr
|
#
b57aeab8 |
|
17-Oct-2019 |
Ravi Bangoria <ravi.bangoria@linux.ibm.com> |
powerpc/watchpoint: Fix length calculation for unaligned target Watchpoint match range is always doubleword(8 bytes) aligned on powerpc. If the given range is crossing doubleword boundary, we need to increase the length such that next doubleword also get covered. Ex, address len = 6 bytes |=========. |------------v--|------v--------| | | | | | | | | | | | | | | | | | |---------------|---------------| <---8 bytes---> In such case, current code configures hw as: start_addr = address & ~HW_BREAKPOINT_ALIGN len = 8 bytes And thus read/write in last 4 bytes of the given range is ignored. Fix this by including next doubleword in the length. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191017093204.7511-3-ravi.bangoria@linux.ibm.com
|
#
7c1bb6bb |
|
05-Sep-2019 |
Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> |
powerpc: Use ftrace_graph_ret_addr() when unwinding With support for HAVE_FUNCTION_GRAPH_RET_ADDR_PTR, ftrace_graph_ret_addr() provides more robust unwinding when function graph is in use. Update show_stack() to use the same. With dump_stack() added to sysrq_sysctl_handler(), before this patch: root@(none):/sys/kernel/debug/tracing# cat /proc/sys/kernel/sysrq CPU: 0 PID: 218 Comm: cat Not tainted 5.3.0-rc7-00868-g8453ad4a078c-dirty #20 Call Trace: [c0000000d1e13c30] [c00000000006ab98] return_to_handler+0x0/0x40 (dump_stack+0xe8/0x164) (unreliable) [c0000000d1e13c80] [c000000000145680] sysrq_sysctl_handler+0x48/0xb8 [c0000000d1e13cd0] [c00000000006ab98] return_to_handler+0x0/0x40 (proc_sys_call_handler+0x274/0x2a0) [c0000000d1e13d60] [c00000000006ab98] return_to_handler+0x0/0x40 (return_to_handler+0x0/0x40) [c0000000d1e13d80] [c00000000006ab98] return_to_handler+0x0/0x40 (__vfs_read+0x3c/0x70) [c0000000d1e13dd0] [c00000000006ab98] return_to_handler+0x0/0x40 (vfs_read+0xb8/0x1b0) [c0000000d1e13e20] [c00000000006ab98] return_to_handler+0x0/0x40 (ksys_read+0x7c/0x140) After this patch: Call Trace: [c0000000d1e33c30] [c00000000006ab58] return_to_handler+0x0/0x40 (dump_stack+0xe8/0x164) (unreliable) [c0000000d1e33c80] [c000000000145680] sysrq_sysctl_handler+0x48/0xb8 [c0000000d1e33cd0] [c00000000006ab58] return_to_handler+0x0/0x40 (proc_sys_call_handler+0x274/0x2a0) [c0000000d1e33d60] [c00000000006ab58] return_to_handler+0x0/0x40 (__vfs_read+0x3c/0x70) [c0000000d1e33d80] [c00000000006ab58] return_to_handler+0x0/0x40 (vfs_read+0xb8/0x1b0) [c0000000d1e33dd0] [c00000000006ab58] return_to_handler+0x0/0x40 (ksys_read+0x7c/0x140) [c0000000d1e33e20] [c00000000006ab58] return_to_handler+0x0/0x40 (system_call+0x5c/0x68) Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/dc89c9a887121342d9c7819482c3dabdece2a323.1567707399.git.naveen.n.rao@linux.vnet.ibm.com
|
#
a8318c13 |
|
03-Sep-2019 |
Gustavo Romero <gromero@linux.ibm.com> |
powerpc/tm: Fix restoring FP/VMX facility incorrectly on interrupts When in userspace and MSR FP=0 the hardware FP state is unrelated to the current process. This is extended for transactions where if tbegin is run with FP=0, the hardware checkpoint FP state will also be unrelated to the current process. Due to this, we need to ensure this hardware checkpoint is updated with the correct state before we enable FP for this process. Unfortunately we get this wrong when returning to a process from a hardware interrupt. A process that starts a transaction with FP=0 can take an interrupt. When the kernel returns back to that process, we change to FP=1 but with hardware checkpoint FP state not updated. If this transaction is then rolled back, the FP registers now contain the wrong state. The process looks like this: Userspace: Kernel Start userspace with MSR FP=0 TM=1 < ----- ... tbegin bne Hardware interrupt ---- > <do_IRQ...> .... ret_from_except restore_math() /* sees FP=0 */ restore_fp() tm_active_with_fp() /* sees FP=1 (Incorrect) */ load_fp_state() FP = 0 -> 1 < ----- Return to userspace with MSR TM=1 FP=1 with junk in the FP TM checkpoint TM rollback reads FP junk When returning from the hardware exception, tm_active_with_fp() is incorrectly making restore_fp() call load_fp_state() which is setting FP=1. The fix is to remove tm_active_with_fp(). tm_active_with_fp() is attempting to handle the case where FP state has been changed inside a transaction. In this case the checkpointed and transactional FP state is different and hence we must restore the FP state (ie. we can't do lazy FP restore inside a transaction that's used FP). It's safe to remove tm_active_with_fp() as this case is handled by restore_tm_state(). restore_tm_state() detects if FP has been using inside a transaction and will set load_fp and call restore_math() to ensure the FP state (checkpoint and transaction) is restored. This is a data integrity problem for the current process as the FP registers are corrupted. It's also a security problem as the FP registers from one process may be leaked to another. Similarly for VMX. A simple testcase to replicate this will be posted to tools/testing/selftests/powerpc/tm/tm-poison.c This fixes CVE-2019-15031. Fixes: a7771176b439 ("powerpc: Don't enable FP/Altivec if not checkpointed") Cc: stable@vger.kernel.org # 4.15+ Signed-off-by: Gustavo Romero <gromero@linux.ibm.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190904045529.23002-2-gromero@linux.vnet.ibm.com
|
#
8205d5d9 |
|
03-Sep-2019 |
Gustavo Romero <gromero@linux.ibm.com> |
powerpc/tm: Fix FP/VMX unavailable exceptions inside a transaction When we take an FP unavailable exception in a transaction we have to account for the hardware FP TM checkpointed registers being incorrect. In this case for this process we know the current and checkpointed FP registers must be the same (since FP wasn't used inside the transaction) hence in the thread_struct we copy the current FP registers to the checkpointed ones. This copy is done in tm_reclaim_thread(). We use thread->ckpt_regs.msr to determine if FP was on when in userspace. thread->ckpt_regs.msr represents the state of the MSR when exiting userspace. This is setup by check_if_tm_restore_required(). Unfortunatley there is an optimisation in giveup_all() which returns early if tsk->thread.regs->msr (via local variable `usermsr`) has FP=VEC=VSX=SPE=0. This optimisation means that check_if_tm_restore_required() is not called and hence thread->ckpt_regs.msr is not updated and will contain an old value. This can happen if due to load_fp=255 we start a userspace process with MSR FP=1 and then we are context switched out. In this case thread->ckpt_regs.msr will contain FP=1. If that same process is then context switched in and load_fp overflows, MSR will have FP=0. If that process now enters a transaction and does an FP instruction, the FP unavailable will not update thread->ckpt_regs.msr (the bug) and MSR FP=1 will be retained in thread->ckpt_regs.msr. tm_reclaim_thread() will then not perform the required memcpy and the checkpointed FP regs in the thread struct will contain the wrong values. The code path for this happening is: Userspace: Kernel Start userspace with MSR FP/VEC/VSX/SPE=0 TM=1 < ----- ... tbegin bne fp instruction FP unavailable ---- > fp_unavailable_tm() tm_reclaim_current() tm_reclaim_thread() giveup_all() return early since FP/VMX/VSX=0 /* ckpt MSR not updated (Incorrect) */ tm_reclaim() /* thread_struct ckpt FP regs contain junk (OK) */ /* Sees ckpt MSR FP=1 (Incorrect) */ no memcpy() performed /* thread_struct ckpt FP regs not fixed (Incorrect) */ tm_recheckpoint() /* Put junk in hardware checkpoint FP regs */ .... < ----- Return to userspace with MSR TM=1 FP=1 with junk in the FP TM checkpoint TM rollback reads FP junk This is a data integrity problem for the current process as the FP registers are corrupted. It's also a security problem as the FP registers from one process may be leaked to another. This patch moves up check_if_tm_restore_required() in giveup_all() to ensure thread->ckpt_regs.msr is updated correctly. A simple testcase to replicate this will be posted to tools/testing/selftests/powerpc/tm/tm-poison.c Similarly for VMX. This fixes CVE-2019-15030. Fixes: f48e91e87e67 ("powerpc/tm: Fix FP and VMX register corruption") Cc: stable@vger.kernel.org # 4.12+ Signed-off-by: Gustavo Romero <gromero@linux.vnet.ibm.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190904045529.23002-1-gromero@linux.vnet.ibm.com
|
#
facd04a9 |
|
26-Aug-2019 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc: convert to copy_thread_tls Commit 3033f14ab78c3 ("clone: support passing tls argument via C rather than pt_regs magic") introduced the HAVE_COPY_THREAD_TLS option. Use it to avoid a subtle assumption about the argument ordering of clone type syscalls. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190827033010.28090-2-npiggin@gmail.com
|
#
a278e7ea |
|
03-Jun-2019 |
Michael Neuling <mikey@neuling.org> |
powerpc: Fix compile issue with force DAWR If you compile with KVM but without CONFIG_HAVE_HW_BREAKPOINT you fail at linking with: arch/powerpc/kvm/book3s_hv_rmhandlers.o:(.text+0x708): undefined reference to `dawr_force_enable' This was caused by commit c1fe190c0672 ("powerpc: Add force enable of DAWR on P9 option"). This moves a bunch of code around to fix this. It moves a lot of the DAWR code in a new file and creates a new CONFIG_PPC_DAWR to enable compiling it. Fixes: c1fe190c0672 ("powerpc: Add force enable of DAWR on P9 option") Signed-off-by: Michael Neuling <mikey@neuling.org> [mpe: Minor formatting in set_dawr()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
2874c5fd |
|
27-May-2019 |
Thomas Gleixner <tglx@linutronix.de> |
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 3029 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
2e1661d2 |
|
23-May-2019 |
Eric W. Biederman <ebiederm@xmission.com> |
signal: Remove the task parameter from force_sig_fault As synchronous exceptions really only make sense against the current task (otherwise how are you synchronous) remove the task parameter from from force_sig_fault to make it explicit that is what is going on. The two known exceptions that deliver a synchronous exception to a stopped ptraced task have already been changed to force_sig_fault_to_task. The callers have been changed with the following emacs regular expression (with obvious variations on the architectures that take more arguments) to avoid typos: force_sig_fault[(]\([^,]+\)[,]\([^,]+\)[,]\([^,]+\)[,]\W+current[)] -> force_sig_fault(\1,\2,\3) Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
#
e2b36d59 |
|
01-May-2019 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64: Don't trace code that runs with the soft irq mask unreconciled "Reconciling" in terms of interrupt handling, is to bring the soft irq mask state in to synch with the hardware, after an interrupt causes MSR[EE] to be cleared (while the soft mask may be enabled, and hard irqs not marked disabled). General kernel code should not be called while unreconciled, because local_irq_disable, etc. manipulations can cause surprising irq traces, and it's fragile because the soft irq code does not really expect to be called in this situation. When exiting from an interrupt, MSR[EE] is cleared to prevent races, but soft irq state is enabled for the returned-to context, so this is now an unreconciled state. restore_math is called in this state, and that can be ftraced, and the ftrace subsystem disables local irqs. Mark restore_math and its callees as notrace. Restore a sanity check in the soft irq code that had to be disabled for this case, by commit 4da1f79227ad4 ("powerpc/64: Disable irq restore warning for now"). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
a5ae043d |
|
13-Mar-2019 |
Mathieu Malaterre <malat@debian.org> |
powerpc/64s: Remove 'dummy_copy_buffer' In commit 2bf1071a8d50 ("powerpc/64s: Remove POWER9 DD1 support") the function __switch_to remove usage for 'dummy_copy_buffer'. Since it is not used anywhere else, remove it completely. This remove the following warning: arch/powerpc/kernel/process.c:1156:17: error: 'dummy_copy_buffer' defined but not used Suggested-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
c1fe190c |
|
01-Apr-2019 |
Michael Neuling <mikey@neuling.org> |
powerpc: Add force enable of DAWR on P9 option This adds a flag so that the DAWR can be enabled on P9 via: echo Y > /sys/kernel/debug/powerpc/dawr_enable_dangerous The DAWR was previously force disabled on POWER9 in: 9654153158 powerpc: Disable DAWR in the base POWER9 CPU features Also see Documentation/powerpc/DAWR-POWER9.txt This is a dangerous setting, USE AT YOUR OWN RISK. Some users may not care about a bad user crashing their box (ie. single user/desktop systems) and really want the DAWR. This allows them to force enable DAWR. This flag can also be used to disable DAWR access. Once this is cleared, all DAWR access should be cleared immediately and your machine once again safe from crashing. Userspace may get confused by toggling this. If DAWR is force enabled/disabled between getting the number of breakpoints (via PTRACE_GETHWDBGINFO) and setting the breakpoint, userspace will get an inconsistent view of what's available. Similarly for guests. For the DAWR to be enabled in a KVM guest, the DAWR needs to be force enabled in the host AND the guest. For this reason, this won't work on POWERVM as it doesn't allow the HCALL to work. Writes of 'Y' to the dawr_enable_dangerous file will fail if the hypervisor doesn't support writing the DAWR. To double check the DAWR is working, run this kernel selftest: tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c Any errors/failures/skips mean something is wrong. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
f89bd8ba |
|
08-Apr-2019 |
Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> |
powerpc/mm/radix: Don't do SLB preload when using the radix MMU Add radix_enabled() check to avoid SLB preload with radix translation. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
a7916a1d |
|
31-Jan-2019 |
Christophe Leroy <christophe.leroy@c-s.fr> |
powerpc: regain entire stack space thread_info is not anymore in the stack, so the entire stack can now be used. There is also no risk anymore of corrupting task_cpu(p) with a stack overflow so the patch removes the test. When doing this, an explicit test for NULL stack pointer is needed in validate_sp() as it is not anymore implicitely covered by the sizeof(thread_info) gap. In the meantime, with the previous patch all pointers to the stacks are not anymore pointers to thread_info so this patch changes them to void* Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
ed1cd6de |
|
31-Jan-2019 |
Christophe Leroy <christophe.leroy@c-s.fr> |
powerpc: Activate CONFIG_THREAD_INFO_IN_TASK This patch activates CONFIG_THREAD_INFO_IN_TASK which moves the thread_info into task_struct. Moving thread_info into task_struct has the following advantages: - It protects thread_info from corruption in the case of stack overflows. - Its address is harder to determine if stack addresses are leaked, making a number of attacks more difficult. This has the following consequences: - thread_info is now located at the beginning of task_struct. - The 'cpu' field is now in task_struct, and only exists when CONFIG_SMP is active. - thread_info doesn't have anymore the 'task' field. This patch: - Removes all recopy of thread_info struct when the stack changes. - Changes the CURRENT_THREAD_INFO() macro to point to current. - Selects CONFIG_THREAD_INFO_IN_TASK. - Modifies raw_smp_processor_id() to get ->cpu from current without including linux/sched.h to avoid circular inclusion and without including asm/asm-offsets.h to avoid symbol names duplication between ASM constants and C constants. - Modifies klp_init_thread_info() to take a task_struct pointer argument. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Add task_stack.h to livepatch.h to fix build fails] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
05b98791 |
|
17-Jan-2019 |
Christophe Leroy <christophe.leroy@c-s.fr> |
powerpc: Replace current_thread_info()->task with current We have a few places that use current_thread_info()->task to access current. This won't work with THREAD_INFO_IN_TASK so fix them now. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Split out of larger patch] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
018cce33 |
|
31-Jan-2019 |
Christophe Leroy <christophe.leroy@c-s.fr> |
powerpc: prep stack walkers for THREAD_INFO_IN_TASK [text copied from commit 9bbd4c56b0b6 ("arm64: prep stack walkers for THREAD_INFO_IN_TASK")] When CONFIG_THREAD_INFO_IN_TASK is selected, task stacks may be freed before a task is destroyed. To account for this, the stacks are refcounted, and when manipulating the stack of another task, it is necessary to get/put the stack to ensure it isn't freed and/or re-used while we do so. This patch reworks the powerpc stack walking code to account for this. When CONFIG_THREAD_INFO_IN_TASK is not selected these perform no refcounting, and this should only be a structural change that does not affect behaviour. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Move try_get_task_stack() below tsk == NULL check in show_stack()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
fe1ef6bc |
|
08-Feb-2019 |
Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> |
powerpc: Fix 32-bit KVM-PR lockup and host crash with MacOS guest Commit 8792468da5e1 "powerpc: Add the ability to save FPU without giving it up" unexpectedly removed the MSR_FE0 and MSR_FE1 bits from the bitmask used to update the MSR of the previous thread in __giveup_fpu() causing a KVM-PR MacOS guest to lockup and panic the host kernel. Leaving FE0/1 enabled means unrelated processes might receive FPEs when they're not expecting them and crash. In particular if this happens to init the host will then panic. eg (transcribed): qemu-system-ppc[837]: unhandled signal 8 at 12cc9ce4 nip 12cc9ce4 lr 12cc9ca4 code 0 systemd[1]: unhandled signal 8 at 202f02e0 nip 202f02e0 lr 001003d4 code 0 Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b Reinstate these bits to the MSR bitmask to enable MacOS guests to run under 32-bit KVM-PR once again without issue. Fixes: 8792468da5e1 ("powerpc: Add the ability to save FPU without giving it up") Cc: stable@vger.kernel.org # v4.6+ Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
0fad8bfe |
|
06-Dec-2018 |
Steven Rostedt (VMware) <rostedt@goodmis.org> |
powerpc/frace: Use ftrace_graph_get_ret_stack() instead of curr_ret_stack The structure of the ret_stack array on the task struct is going to change, and accessing it directly via the curr_ret_stack index will no longer give the ret_stack entry that holds the return address. To access that, architectures must now use ftrace_graph_get_ret_stack() to get the associated ret_stack that matches the saved return address. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: linuxppc-dev@lists.ozlabs.org Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
#
e9013785 |
|
24-Oct-2018 |
Felipe Rechia <felipe.rechia@datacom.com.br> |
powerpc/process: Fix flush_all_to_thread for SPE Fix a bug introduced by the creation of flush_all_to_thread() for processors that have SPE (Signal Processing Engine) and use it to compute floating-point operations. >From userspace perspective, the problem was seen in attempts of computing floating-point operations which should generate exceptions. For example: fork(); float x = 0.0 / 0.0; isnan(x); // forked process returns False (should be True) The operation above also should always cause the SPEFSCR FINV bit to be set. However, the SPE floating-point exceptions were turned off after a fork(). Kernel versions prior to the bug used flush_spe_to_thread(), which first saves SPEFSCR register values in tsk->thread and then calls giveup_spe(tsk). After commit 579e633e764e, the save_all() function was called first to giveup_spe(), and then the SPEFSCR register values were saved in tsk->thread. This would save the SPEFSCR register values after disabling SPE for that thread, causing the bug described above. Fixes 579e633e764e ("powerpc: create flush_all_to_thread()") Signed-off-by: Felipe Rechia <felipe.rechia@datacom.com.br> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
5434ae74 |
|
14-Sep-2018 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64s/hash: Add a SLB preload cache When switching processes, currently all user SLBEs are cleared, and a few (exec_base, pc, and stack) are preloaded. In trivial testing with small apps, this tends to miss the heap and low 256MB segments, and it will also miss commonly accessed segments on large memory workloads. Add a simple round-robin preload cache that just inserts the last SLB miss into the head of the cache and preloads those at context switch time. Every 256 context switches, the oldest entry is removed from the cache to shrink the cache and require fewer slbmte if they are unused. Much more could go into this, including into the SLB entry reclaim side to track some LRU information etc, which would require a study of large memory workloads. But this is a simple thing we can do now that is an obvious win for common workloads. With the full series, process switching speed on the context_switch benchmark on POWER9/hash (with kernel speculation security masures disabled) increases from 140K/s to 178K/s (27%). POWER8 does not change much (within 1%), it's unclear why it does not see a big gain like POWER9. Booting to busybox init with 256MB segments has SLB misses go down from 945 to 69, and with 1T segments 900 to 21. These could almost all be eliminated by preloading a bit more carefully with ELF binary loading. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
425d3314 |
|
14-Sep-2018 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64s/hash: Provide arch_setup_exec() hooks for hash slice setup This will be used by the SLB code in the next patch, but for now this sets the slb_addr_limit to the correct size for 32-bit tasks. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
4c2de74c |
|
12-Oct-2018 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64: Interrupts save PPR on stack rather than thread_struct PPR is the odd register out when it comes to interrupt handling, it is saved in current->thread.ppr while all others are saved on the stack. The difficulty with this is that accessing thread.ppr can cause a SLB fault, but the SLB fault handler implementation in C change had assumed the normal exception entry handlers would not cause an SLB fault. Fix this by allocating room in the interrupt stack to save PPR. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
df13102f |
|
06-Oct-2018 |
Christophe Leroy <christophe.leroy@c-s.fr> |
powerpc/process: Constify the number of insns printed by show instructions functions. instructions_to_print var is assigned value 16 and there is no way to change it. This patch replaces it by a constant. Reviewed-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
fb2d9505 |
|
06-Oct-2018 |
Christophe Leroy <christophe.leroy@c-s.fr> |
powerpc/process: Fix interleaved output in show_user_instructions() When two processes crash at the same time, we sometimes encounter interleaving in the middle of a line: init[1]: segfault (11) at 0 nip 0 lr 0 code 1 init[1]: code: XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX init[74]: segfault (11) at 10a74 nip 1000c198 lr 100078c8 code 1 in sh[10000000+14000] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX init[1]: code: XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX init[74]: code: 90010024 bf61000c 91490a7c 3fa01002 3be00000 7d3e4b78 3bbd0c20 3b600000 init[74]: code: 3b9d0040 7c7fe02e 2f830000 419e0028 <89230000> 2f890000 41be001c 4b7f6e79 This patch fixes it by preparing complete lines in a buffer and printing it at once. Fixes: 88b0fe1757359 ("powerpc: Add show_user_instructions()") Reviewed-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> [mpe: Use seq_buf_printf() not seq_buf_puts() which doesn't NULL terminate] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
c9386bfd |
|
08-Oct-2018 |
Christophe Leroy <christophe.leroy@c-s.fr> |
powerpc/process: Add missing include of stacktrace.h As spotted by sparse: arch/powerpc/kernel/process.c:1302:6: warning: symbol 'show_user_instructions' was not declared. Should it be static? Fixes: 88b0fe1757359 ("powerpc: Add show_user_instructions()") Reviewed-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> [mpe: Split out of larger patch] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
3b35bd48 |
|
06-Oct-2018 |
Christophe Leroy <christophe.leroy@c-s.fr> |
powerpc/process: Fix sparse address space warnings This patch fixes the following warnings, which are leftovers from when __get_user() was replaced by probe_kernel_address(). arch/powerpc/kernel/process.c:1287:22: warning: incorrect type in argument 2 (different address spaces) arch/powerpc/kernel/process.c:1287:22: expected void const *src arch/powerpc/kernel/process.c:1287:22: got unsigned int [noderef] <asn:1>*<noident> arch/powerpc/kernel/process.c:1319:21: warning: incorrect type in argument 2 (different address spaces) arch/powerpc/kernel/process.c:1319:21: expected void const *src arch/powerpc/kernel/process.c:1319:21: got unsigned int [noderef] <asn:1>*<noident> Fixes: 7b051f665c32d ("powerpc: Use probe_kernel_address in show_instructions") Reviewed-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> [mpe: Split out of larger patch] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
a932ed3b |
|
05-Oct-2018 |
Michael Ellerman <mpe@ellerman.id.au> |
powerpc: Don't print kernel instructions in show_user_instructions() Recently we implemented show_user_instructions() which dumps the code around the NIP when a user space process dies with an unhandled signal. This was modelled on the x86 code, and we even went so far as to implement the exact same bug, namely that if the user process crashed with its NIP pointing into the kernel we will dump kernel text to dmesg. eg: bad-bctr[2996]: segfault (11) at c000000000010000 nip c000000000010000 lr 12d0b0894 code 1 bad-bctr[2996]: code: fbe10068 7cbe2b78 7c7f1b78 fb610048 38a10028 38810020 fb810050 7f8802a6 bad-bctr[2996]: code: 3860001c f8010080 48242371 60000000 <7c7b1b79> 4082002c e8010080 eb610048 This was discovered on x86 by Jann Horn and fixed in commit 342db04ae712 ("x86/dumpstack: Don't dump kernel memory based on usermode RIP"). Fix it by checking the adjusted NIP value (pc) and number of instructions against USER_DS, and bail if we fail the check, eg: bad-bctr[2969]: segfault (11) at c000000000010000 nip c000000000010000 lr 107930894 code 1 bad-bctr[2969]: Bad NIP, not dumping instructions. Fixes: 88b0fe175735 ("powerpc: Add show_user_instructions()") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
5c784c84 |
|
16-Aug-2018 |
Breno Leitao <leitao@debian.org> |
powerpc/tm: Remove msr_tm_active() Currently msr_tm_active() is a wrapper around MSR_TM_ACTIVE() if CONFIG_PPC_TRANSACTIONAL_MEM is set, or it is just a function that returns false if CONFIG_PPC_TRANSACTIONAL_MEM is not set. This function is not necessary, since MSR_TM_ACTIVE() just do the same and could be used, removing the dualism and simplifying the code. This patchset remove every instance of msr_tm_active() and replaced it by MSR_TM_ACTIVE(). Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
54be0b9c |
|
02-Oct-2018 |
Michael Ellerman <mpe@ellerman.id.au> |
Revert "convert SLB miss handlers to C" and subsequent commits This reverts commits: 5e46e29e6a97 ("powerpc/64s/hash: convert SLB miss handlers to C") 8fed04d0f6ae ("powerpc/64s/hash: remove user SLB data from the paca") 655deecf67b2 ("powerpc/64s/hash: SLB allocation status bitmaps") 2e1626744e8d ("powerpc/64s/hash: provide arch_setup_exec hooks for hash slice setup") 89ca4e126a3f ("powerpc/64s/hash: Add a SLB preload cache") This series had a few bugs, and the fixes are not all trivial. So revert most of it for now. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
f383d8b4 |
|
18-Sep-2018 |
Eric W. Biederman <ebiederm@xmission.com> |
signal/powerpc: Use force_sig_fault where appropriate Reviewed-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
#
89ca4e12 |
|
14-Sep-2018 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64s/hash: Add a SLB preload cache When switching processes, currently all user SLBEs are cleared, and a few (exec_base, pc, and stack) are preloaded. In trivial testing with small apps, this tends to miss the heap and low 256MB segments, and it will also miss commonly accessed segments on large memory workloads. Add a simple round-robin preload cache that just inserts the last SLB miss into the head of the cache and preloads those at context switch time. Every 256 context switches, the oldest entry is removed from the cache to shrink the cache and require fewer slbmte if they are unused. Much more could go into this, including into the SLB entry reclaim side to track some LRU information etc, which would require a study of large memory workloads. But this is a simple thing we can do now that is an obvious win for common workloads. With the full series, process switching speed on the context_switch benchmark on POWER9/hash (with kernel speculation security masures disabled) increases from 140K/s to 178K/s (27%). POWER8 does not change much (within 1%), it's unclear why it does not see a big gain like POWER9. Booting to busybox init with 256MB segments has SLB misses go down from 945 to 69, and with 1T segments 900 to 21. These could almost all be eliminated by preloading a bit more carefully with ELF binary loading. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
2e162674 |
|
14-Sep-2018 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64s/hash: provide arch_setup_exec hooks for hash slice setup This will be used by the SLB code in the next patch, but for now this sets the slb_addr_limit to the correct size for 32-bit tasks. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
88b0fe17 |
|
01-Aug-2018 |
Murilo Opsfelder Araujo <muriloo@linux.ibm.com> |
powerpc: Add show_user_instructions() show_user_instructions() is a slightly modified version of show_instructions() that allows userspace instruction dump. This will be useful within show_signal_msg() to dump userspace instructions of the faulty location. Here is a sample of what show_user_instructions() outputs: pandafault[10850]: code: 4bfffeec 4bfffee8 3c401002 38427f00 fbe1fff8 f821ffc1 7c3f0b78 3d22fffe pandafault[10850]: code: 392988d0 f93f0020 e93f0020 39400048 <99490000> 39200000 7d234b78 383f0040 The current->comm and current->pid printed can serve as a glue that links the instructions dump to its originator, allowing messages to be interleaved in the logs. Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
b5ac51d7 |
|
05-Jul-2018 |
Christophe Leroy <christophe.leroy@c-s.fr> |
powerpc: declare set_breakpoint() static set_breakpoint() is only used in process.c so make it static Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
edd00b83 |
|
31-Jan-2018 |
Cyril Bur <cyrilbur@gmail.com> |
powerpc/tm: Remove struct thread_info param from tm_reclaim_thread() Since commit dc3106690b20 ("powerpc: tm: Always use fp_state and vr_state to store live registers") tm_reclaim_thread() doesn't use the parameter anymore, both callers have to bother getting it as they have no need for a struct thread_info either. Just remove it and adjust the callers. Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
c76662e8 |
|
17-Jul-2018 |
Ram Pai <linuxram@us.ibm.com> |
powerpc/pkeys: Save the pkey registers before fork When a thread forks the contents of AMR, IAMR, UAMOR registers in the newly forked thread are not inherited. Save the registers before forking, for content of those registers to be automatically copied into the new thread. Fixes: cf43d3b26452 ("powerpc: Enable pkey subsystem") Cc: stable@vger.kernel.org # v4.16+ Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
2bf1071a |
|
05-Jul-2018 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64s: Remove POWER9 DD1 support POWER9 DD1 was never a product. It is no longer supported by upstream firmware, and it is not effectively supported in Linux due to lack of testing. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Michael Ellerman <mpe@ellerman.id.au> [mpe: Remove arch_make_huge_pte() entirely] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
71cc64a8 |
|
11-May-2018 |
Alastair D'Silva <alastair@d-silva.org> |
powerpc: use task_pid_nr() for TID allocation The current implementation of TID allocation, using a global IDR, may result in an errant process starving the system of available TIDs. Instead, use task_pid_nr(), as mentioned by the original author. The scenario described which prevented it's use is not applicable, as set_thread_tidr can only be called after the task struct has been populated. In the unlikely event that 2 threads share the TID and are waiting, all potential outcomes have been determined safe. Signed-off-by: Alastair D'Silva <alastair@d-silva.org> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
3449f191 |
|
11-May-2018 |
Alastair D'Silva <alastair@d-silva.org> |
powerpc: Use TIDR CPU feature to control TIDR allocation Switch the use of TIDR on it's CPU feature, rather than assuming it is available based on architecture. Signed-off-by: Alastair D'Silva <alastair@d-silva.org> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
3130a7bb |
|
09-May-2018 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64: change softe to irqmask in show_regs and xmon When the soft enabled flag was changed to a soft disable mask, xmon and register dump code was not updated to reflect that, which is confusing ('SOFTE: 1' previously meant interrupts were soft enabled, currently it means the opposite, the general interrupt type has been disabled). Fix this by using the name irqmask, and printing it in hex. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
3d3a6021 |
|
04-May-2018 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/pseries: lparcfg calculate PURR on demand For SPLPAR, lparcfg provides a sum of PURR registers for all CPUs. Currently this is done by reading PURR in context switch and timer interrupt, and storing that into a per-CPU variable. These are summed to provide the value. This does not work with all timer schemes (e.g., NO_HZ_FULL), and it is sub-optimal for performance because it reads the PURR register on every context switch, although that's been difficult to distinguish from noise in the contxt_switch microbenchmark. This patch implements the sum by calling a function on each CPU, to read and add PURR values of each CPU. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
36d632ea |
|
04-May-2018 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64: remove start_tb and accum_tb from thread_struct These fields are only written to. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
d1c72112 |
|
23-May-2018 |
Simon Guo <wei.guo.simon@gmail.com> |
powerpc: Export msr_check_and_set() to modules PR KVM will need to reuse msr_check_and_set(). This patch exports this API for reuse. Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Reviewed-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
3eb0f519 |
|
17-Apr-2018 |
Eric W. Biederman <ebiederm@xmission.com> |
signal: Ensure every siginfo we send has all bits initialized Call clear_siginfo to ensure every stack allocated siginfo is properly initialized before being passed to the signal sending functions. Note: It is not safe to depend on C initializers to initialize struct siginfo on the stack because C is allowed to skip holes when initializing a structure. The initialization of struct siginfo in tracehook_report_syscall_exit was moved from the helper user_single_step_siginfo into tracehook_report_syscall_exit itself, to make it clear that the local variable siginfo gets fully initialized. In a few cases the scope of struct siginfo has been reduced to make it clear that siginfo siginfo is not used on other paths in the function in which it is declared. Instances of using memset to initialize siginfo have been replaced with calls clear_siginfo for clarity. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
#
252988cb |
|
31-Mar-2018 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc: Don't write to DABR on >= Power8 if DAWR is disabled flush_thread() calls __set_breakpoint() via set_debug_reg_defaults() without checking ppc_breakpoint_available(). On Power8 or later CPUs which have the DAWR feature disabled that will cause a write to the DABR which is incorrect as those CPUs don't have a DABR. Fix it two ways, by checking ppc_breakpoint_available() in set_debug_reg_defaults(), and also by reworking __set_breakpoint() to only write to DABR on Power7 or earlier. Fixes: 9654153158d3 ("powerpc: Disable DAWR in the base POWER9 CPU features") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Rework the logic in __set_breakpoint()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
404b27d6 |
|
26-Mar-2018 |
Michael Neuling <mikey@neuling.org> |
powerpc: Add ppc_breakpoint_available() Add ppc_breakpoint_available() to determine if a breakpoint is available currently via the DAWR or DABR. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
1cdf039b |
|
25-Feb-2018 |
Mathieu Malaterre <malat@debian.org> |
powerpc/kernel: Make function __giveup_fpu() static __giveup_fpu() is never called outside process.c, so it can be static. That also means we don't need an empty definition in switch_to.h Signed-off-by: Mathieu Malaterre <malat@debian.org> [mpe: Also drop the empty version, rewrite change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
384dfd62 |
|
28-Nov-2017 |
Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> |
powerpc/kernel: Block interrupts when updating TIDR clear_thread_tidr() is called in interrupt context as a part of delayed put of the task structure (i.e as a part of timer interrupt). To prevent a deadlock, block interrupts when holding vas_thread_id_lock to set/ clear TIDR for a task. Fixes: ec233ede4c86 ("powerpc: Add support for setting SPRN_TIDR") Cc: stable@vger.kernel.org # v4.15+ Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
f71dd7dc |
|
22-Jan-2018 |
Eric W. Biederman <ebiederm@xmission.com> |
signal/ptrace: Add force_sig_ptrace_errno_trap and use it where needed There are so many places that build struct siginfo by hand that at least one of them is bound to get it wrong. A handful of cases in the kernel arguably did just that when using the errno field of siginfo to pass no errno values to userspace. The usage is limited to a single si_code so at least does not mess up anything else. Encapsulate this questionable pattern in a helper function so that the userspace ABI is preserved. Update all of the places that use this pattern to use the new helper function. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
#
47355040 |
|
16-Jan-2018 |
Eric W. Biederman <ebiederm@xmission.com> |
signal/powerpc: Remove unnecessary signal_code parameter of do_send_trap signal_code is always TRAP_HWBKPT Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
#
06bb53b3 |
|
18-Jan-2018 |
Ram Pai <linuxram@us.ibm.com> |
powerpc: store and restore the pkey state across context switches Store and restore the AMR, IAMR and UAMOR register state of the task before scheduling out and after scheduling in, respectively. Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
b1db5513 |
|
11-Jan-2018 |
Christophe Lombard <clombard@linux.vnet.ibm.com> |
cxl: Add support for ASB_Notify on POWER9 The POWER9 core supports a new feature: ASB_Notify which requires the support of the Special Purpose Register: TIDR. The ASB_Notify command, generated by the AFU, will attempt to wake-up the host thread identified by the particular LPID:PID:TID. This patch assign a unique TIDR (thread id) for the current thread which will be used in the process element entry. Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Reviewed-by: Philippe Bergheaud <felix@linux.vnet.ibm.com> Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Reviewed-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com> Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
c2e480ba |
|
19-Dec-2017 |
Madhavan Srinivasan <maddy@linux.vnet.ibm.com> |
powerpc/64: Add #defines for paca->soft_enabled flags Two #defines IRQS_ENABLED and IRQS_DISABLED are added to be used when updating paca->soft_enabled. Replace the hardcoded values used when updating paca->soft_enabled with IRQ_(EN|DIS)ABLED #define. No logic change. Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
2271db20 |
|
11-Jan-2018 |
Benjamin Herrenschmidt <benh@kernel.crashing.org> |
powerpc: Use the TRAP macro whenever comparing a trap number Trap numbers can have extra bits at the bottom that need to be filtered out. There are a few cases where we don't do that. It's possible that we got lucky but better safe than sorry. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
182dc9c7 |
|
17-Dec-2017 |
Michael Ellerman <mpe@ellerman.id.au> |
powerpc/kernel: Print actual address of regs when oopsing When we oops or otherwise call show_regs() we print the address of the regs structure. Being able to see the address is fairly useful, firstly to verify that the regs pointer is not completely bogus, and secondly it allows you to dump the regs and surrounding memory with a debugger if you have one. In the normal case the regs will be located somewhere on the stack, so printing their location discloses no further information than printing the stack pointer does already. So switch to %px and print the actual address, not the hashed value. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
7e4d4233 |
|
24-Nov-2017 |
Vaibhav Jain <vaibhav@linux.vnet.ibm.com> |
powerpc: Do not assign thread.tidr if already assigned If set_thread_tidr() is called twice for same task_struct then it will allocate a new tidr value to it leaving the previous value still dangling in the vas_thread_ida table. To fix this the patch changes set_thread_tidr() to check if a tidr value is already assigned to the task_struct and if yes then returns zero. Fixes: ec233ede4c86("powerpc: Add support for setting SPRN_TIDR") Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> [mpe: Modify to return 0 in the success case, not the TID value] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
aca7573f |
|
27-Nov-2017 |
Vaibhav Jain <vaibhav@linux.vnet.ibm.com> |
powerpc: Avoid signed to unsigned conversion in set_thread_tidr() There is an unsafe signed to unsigned conversion in set_thread_tidr() that may cause an error value to be assigned to SPRN_TIDR register and used as thread-id. The issue happens as assign_thread_tidr() returns an int and thread.tidr is an unsigned-long. So a negative error code returned from assign_thread_tidr() will fail the error check and gets assigned as tidr as a large positive value. To fix this the patch assigns the return value of assign_thread_tidr() to a temporary int and assigns it to thread.tidr iff its '> 0'. The patch shouldn't impact the calling convention of set_thread_tidr() i.e all -ve return-values are error codes and a return value of '0' indicates success. Fixes: ec233ede4c86("powerpc: Add support for setting SPRN_TIDR") Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com> Reviewed-by: Christophe Lombard clombard@linux.vnet.ibm.com Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
9d2a4d71 |
|
07-Nov-2017 |
Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> |
powerpc: Define set_thread_uses_vas() A CP_ABORT instruction is required in processes that have mapped a VAS "paste address" with the intention of using COPY/PASTE instructions. But since CP_ABORT is expensive, we want to restrict it to only processes that use/intend to use COPY/PASTE. Define an interface, set_thread_uses_vas(), that VAS can use to indicate that the current process opened a send window. During context switch, issue CP_ABORT only for processes that have the flag set. Thanks for input from Nick Piggin, Michael Ellerman. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> [mpe: Fix to not use new_thread after _switch() returns] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
ec233ede |
|
07-Nov-2017 |
Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> |
powerpc: Add support for setting SPRN_TIDR We need the SPRN_TIDR to be set for use with fast thread-wakeup (core- to-core wakeup) and also with CAPI. Each thread in a process needs to have a unique id within the process. But for now, we assign globally unique thread ids to all threads in the system. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Philippe Bergheaud <felix@linux.vnet.ibm.com> Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com> [mpe: Simplify tidr clearing on fork() and ctx switch code] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
eb5c3f1c |
|
01-Nov-2017 |
Cyril Bur <cyrilbur@gmail.com> |
powerpc: Always save/restore checkpointed regs during treclaim/trecheckpoint Lazy save and restore of FP/Altivec means that a userspace process can be sent to userspace with FP or Altivec disabled and loaded only as required (by way of an FP/Altivec unavailable exception). Transactional Memory complicates this situation as a transaction could be started without FP/Altivec being loaded up. This causes the hardware to checkpoint incorrect registers. Handling FP/Altivec unavailable exceptions while a thread is transactional requires a reclaim and recheckpoint to ensure the CPU has correct state for both sets of registers. tm_reclaim() has optimisations to not always save the FP/Altivec registers to the checkpointed save area. This was originally done because the caller might have information that the checkpointed registers aren't valid due to lazy save and restore. We've also been a little vague as to how tm_reclaim() leaves the FP/Altivec state since it doesn't necessarily always save it to the thread struct. This has lead to an (incorrect) assumption that it leaves the checkpointed state on the CPU. tm_recheckpoint() has similar optimisations in reverse. It may not always reload the checkpointed FP/Altivec registers from the thread struct before the trecheckpoint. It is therefore quite unclear where it expects to get the state from. This didn't help with the assumption made about tm_reclaim(). These optimisations sit in what is by definition a slow path. If a process has to go through a reclaim/recheckpoint then its transaction will be doomed on returning to userspace. This mean that the process will be unable to complete its transaction and be forced to its failure handler. This is already an out if line case for userspace. Furthermore, the cost of copying 64 times 128 bits from registers isn't very long[0] (at all) on modern processors. As such it appears these optimisations have only served to increase code complexity and are unlikely to have had a measurable performance impact. Our transactional memory handling has been riddled with bugs. A cause of this has been difficulty in following the code flow, code complexity has not been our friend here. It makes sense to remove these optimisations in favour of a (hopefully) more stable implementation. This patch does mean that some times the assembly will needlessly save 'junk' registers which will subsequently get overwritten with the correct value by the C code which calls the assembly function. This small inefficiency is far outweighed by the reduction in complexity for general TM code, context switching paths, and transactional facility unavailable exception handler. 0: I tried to measure it once for other work and found that it was hiding in the noise of everything else I was working with. I find it exceedingly likely this will be the case here. Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
91381b9c |
|
01-Nov-2017 |
Cyril Bur <cyrilbur@gmail.com> |
powerpc: Force reload for recheckpoint during tm {fp, vec, vsx} unavailable exception Lazy save and restore of FP/Altivec means that a userspace process can be sent to userspace with FP or Altivec disabled and loaded only as required (by way of an FP/Altivec unavailable exception). Transactional Memory complicates this situation as a transaction could be started without FP/Altivec being loaded up. This causes the hardware to checkpoint incorrect registers. Handling FP/Altivec unavailable exceptions while a thread is transactional requires a reclaim and recheckpoint to ensure the CPU has correct state for both sets of registers. tm_reclaim() has optimisations to not always save the FP/Altivec registers to the checkpointed save area. This was originally done because the caller might have information that the checkpointed registers aren't valid due to lazy save and restore. We've also been a little vague as to how tm_reclaim() leaves the FP/Altivec state since it doesn't necessarily always save it to the thread struct. This has lead to an (incorrect) assumption that it leaves the checkpointed state on the CPU. tm_recheckpoint() has similar optimisations in reverse. It may not always reload the checkpointed FP/Altivec registers from the thread struct before the trecheckpoint. It is therefore quite unclear where it expects to get the state from. This didn't help with the assumption made about tm_reclaim(). This patch is a minimal fix for ease of backporting. A more correct fix which removes the msr parameter to tm_reclaim() and tm_recheckpoint() altogether has been upstreamed to apply on top of this patch. Fixes: dc3106690b20 ("powerpc: tm: Always use fp_state and vr_state to store live registers") Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
a7771176 |
|
01-Nov-2017 |
Cyril Bur <cyrilbur@gmail.com> |
powerpc: Don't enable FP/Altivec if not checkpointed Lazy save and restore of FP/Altivec means that a userspace process can be sent to userspace with FP or Altivec disabled and loaded only as required (by way of an FP/Altivec unavailable exception). Transactional Memory complicates this situation as a transaction could be started without FP/Altivec being loaded up. This causes the hardware to checkpoint incorrect registers. Handling FP/Altivec unavailable exceptions while a thread is transactional requires a reclaim and recheckpoint to ensure the CPU has correct state for both sets of registers. Lazy save and restore of FP/Altivec cannot be done if a process is transactional. If a facility was enabled it must remain enabled whenever a thread is transactional. Commit dc16b553c949 ("powerpc: Always restore FPU/VEC/VSX if hardware transactional memory in use") ensures that the facilities are always enabled if a thread is transactional. A bug in the introduced code may cause it to inadvertently enable a facility that was (and should remain) disabled. The problem with this extraneous enablement is that the registers for the erroneously enabled facility have not been correctly recheckpointed - the recheckpointing code assumed the facility would remain disabled. Further compounding the issue, the transactional {fp,altivec,vsx} unavailable code has been incorrectly using the MSR to enable facilities. The presence of the {FP,VEC,VSX} bit in the regs->msr simply means if the registers are live on the CPU, not if the kernel should load them before returning to userspace. This has worked due to the bug mentioned above. This causes transactional threads which return to their failure handler to observe incorrect checkpointed registers. Perhaps an example will help illustrate the problem: A userspace process is running and uses both FP and Altivec registers. This process then continues to run for some time without touching either sets of registers. The kernel subsequently disables the facilities as part of lazy save and restore. The userspace process then performs a tbegin and the CPU checkpoints 'junk' FP and Altivec registers. The process then performs a floating point instruction triggering a fp unavailable exception in the kernel. The kernel then loads the FP registers - and only the FP registers. Since the thread is transactional it must perform a reclaim and recheckpoint to ensure both the checkpointed registers and the transactional registers are correct. It then (correctly) enables MSR[FP] for the process. Later (on exception exist) the kernel also (inadvertently) enables MSR[VEC]. The process is then returned to userspace. Since the act of loading the FP registers doomed the transaction we know CPU will fail the transaction, restore its checkpointed registers, and return the process to its failure handler. The problem is that we're now running with Altivec enabled and the 'junk' checkpointed registers are restored. The kernel had only recheckpointed FP. This patch solves this by only activating FP/Altivec if userspace was using them when it entered the kernel and not simply if the process is transactional. Fixes: dc16b553c949 ("powerpc: Always restore FPU/VEC/VSX if hardware transactional memory in use") Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
4e003747 |
|
18-Oct-2017 |
Michael Ellerman <mpe@ellerman.id.au> |
powerpc/64s: Replace CONFIG_PPC_STD_MMU_64 with CONFIG_PPC_BOOK3S_64 CONFIG_PPC_STD_MMU_64 indicates support for the "standard" powerpc MMU on 64-bit CPUs. The "standard" MMU refers to the hash page table MMU found in "server" processors, from IBM mainly. Currently CONFIG_PPC_STD_MMU_64 is == CONFIG_PPC_BOOK3S_64. While it's annoying to have two symbols that always have the same value, it's not quite annoying enough to bother removing one. However with the arrival of Power9, we now have the situation where CONFIG_PPC_STD_MMU_64 is enabled, but the kernel is running using the Radix MMU - *not* the "standard" MMU. So it is now actively confusing to use it, because it implies that code is disabled or inactive when the Radix MMU is in use, however that is not necessarily true. So s/CONFIG_PPC_STD_MMU_64/CONFIG_PPC_BOOK3S_64/, and do some minor formatting updates of some of the affected lines. This will be a pain for backports, but c'est la vie. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
92fb8690 |
|
12-Oct-2017 |
Michael Neuling <mikey@neuling.org> |
powerpc/tm: P9 disable transactionally suspended sigcontexts Unfortunately userspace can construct a sigcontext which enables suspend. Thus userspace can force Linux into a path where trechkpt is executed. This patch blocks this from happening on POWER9 by sanity checking sigcontexts passed in. ptrace doesn't have this problem as only MSR SE and BE can be changed via ptrace. This patch also adds a number of WARN_ON()s in case we ever enter suspend when we shouldn't. This should not happen, but if it does the symptoms are soft lockup warnings which are not obviously TM related, so the WARN_ON()s should make it obvious what's happening. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
54820530 |
|
12-Oct-2017 |
Michael Ellerman <mpe@ellerman.id.au> |
powerpc/powernv: Enable TM without suspend if possible Some Power9 revisions can run in a mode where TM operates without suspended state. If we find ourself on a CPU that might be in this mode, we query OPAL to check, and if so we reenable TM in CPU features, and enable a new user feature to signal to userspace that we are in this mode. We do not enable the "normal" user feature, PPC_FEATURE2_HTM, but we do enable PPC_FEATURE2_HTM_NOSC because that indicates to userspace that the kernel will abort transactions on syscall entry, which is true regardless of the suspend mode. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
4ca360f3 |
|
19-Apr-2016 |
Kautuk Consul <kautuk.consul.1980@gmail.com> |
powerpc: get_wchan(): solve possible race scenario due to parallel wakeup Add a check for p->state == TASK_RUNNING so that any wake-ups on task_struct p in the interim lead to 0 being returned by get_wchan(). Signed-off-by: Kautuk Consul <kautuk.consul.1980@gmail.com> [mpe: Confirmed other architectures do similar] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
a6036100 |
|
23-Aug-2017 |
Michael Ellerman <mpe@ellerman.id.au> |
powerpc/oops: Line up NIP & MSR with other rows This is purely cosmetic, but does look nicer IMHO: Before: task: c000000001453400 task.stack: c000000001c6c000 NIP: c000000000a0fbfc LR: c000000000a0fbf4 CTR: c000000000ba6220 REGS: c0000001fffef820 TRAP: 0300 Not tainted (4.13.0-rc6-gcc-6.3.1-00234-g423af27f7d81) MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 88088242 XER: 00000000 CFAR: c0000000000b3488 DAR: 0000000000000000 DSISR: 42000000 SOFTE: 0 After: task: c000000001453400 task.stack: c000000001c6c000 NIP: c000000000a0fbfc LR: c000000000a0fbf4 CTR: c000000000ba6220 REGS: c0000001fffef820 TRAP: 0300 Not tainted (4.13.0-rc6-gcc-6.3.1-00234-g423af27f7d81-dirty) MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 88088242 XER: 00000000 CFAR: c0000000000b34a4 DAR: 0000000000000000 DSISR: 42000000 SOFTE: 0 Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
f6fc73fb |
|
23-Aug-2017 |
Michael Ellerman <mpe@ellerman.id.au> |
powerpc/oops: Print CR/XER on same line as MSR Somehow we missed this when the pr_cont() changes went in. Fix CR/XER to go on the same line as MSR, as they have historically, eg: MSR: 8000000000009032 <SF,EE,ME,IR,DR,RI> CR: 4804408a XER: 20000000 Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
d1d0d5ff |
|
11-Aug-2017 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64: Optimise set/clear of CTRL[RUN] (runlatch) On modern CPUs the CTRL register is read-only except bit 63 which is the run latch control. This means it can be updated with a mtspr rather than mfspr/mtspr. To accomodate older CPUs (Cell at least), where there are other bits in the register, we still do a read/modify/write on pre 2.06 CPUs. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Update change log to mention 2.06 workaround] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
96c79b6b |
|
16-Aug-2017 |
Benjamin Herrenschmidt <benh@kernel.crashing.org> |
powerpc: Remove more redundant VSX save/tests __giveup_vsx/save_vsx are completely equivalent to testing MSR_FP and MSR_VEC and calling the corresponding giveup/save function so just remove the spurious VSX cases. Also add WARN_ONs checking that we never have VSX enabled without the two other. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
dc801081 |
|
16-Aug-2017 |
Benjamin Herrenschmidt <benh@kernel.crashing.org> |
powerpc: Remove redundant clear of MSR_VSX in __giveup_vsx() __giveup_fpu() already does it and we cannot have MSR_VSX set without having MSR_FP also set. This also adds a warning to check we indeed do Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
746874d3 |
|
16-Aug-2017 |
Benjamin Herrenschmidt <benh@kernel.crashing.org> |
powerpc: Remove redundant FP/Altivec giveup code __giveup_vsx() already calls those two functions. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
6a303833 |
|
16-Aug-2017 |
Benjamin Herrenschmidt <benh@kernel.crashing.org> |
powerpc: Fix missing newline before { Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
5a69aec9 |
|
16-Aug-2017 |
Benjamin Herrenschmidt <benh@kernel.crashing.org> |
powerpc: Fix VSX enabling/flushing to also test MSR_FP and MSR_VEC VSX uses a combination of the old vector registers, the old FP registers and new "second halves" of the FP registers. Thus when we need to see the VSX state in the thread struct (flush_vsx_to_thread()) or when we'll use the VSX in the kernel (enable_kernel_vsx()) we need to ensure they are all flushed into the thread struct if either of them is individually enabled. Unfortunately we only tested if the whole VSX was enabled, not if they were individually enabled. Fixes: 72cd7b44bc99 ("powerpc: Uncomment and make enable_kernel_vsx() routine available") Cc: stable@vger.kernel.org # v4.3+ Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
44a12806 |
|
07-Aug-2017 |
Michael Ellerman <mpe@ellerman.id.au> |
Revert "powerpc/64: Avoid restore_math call if possible in syscall exit" This reverts commit bc4f65e4cf9d6cc43e0e9ba0b8648cf9201cd55f. As reported by Andreas, this commit is causing unrecoverable SLB misses in the system call exit path: Unrecoverable exception 4100 at c00000000000a1ec Oops: Unrecoverable exception, sig: 6 [#1] SMP NR_CPUS=2 PowerMac ... CPU: 0 PID: 18626 Comm: rm Not tainted 4.13.0-rc3 #1 task: c00000018335e080 task.stack: c000000139e50000 NIP: c00000000000a1ec LR: c00000000000a118 CTR: 0000000000000000 REGS: c000000139e53bb0 TRAP: 4100 Not tainted (4.13.0-rc3) MSR: 9000000000001030 <SF,HV,ME,IR,DR> CR: 24000044 XER: 20000000 SOFTE: 1 GPR00: 0000000000000000 c000000139e53e30 c000000000abb500 fffffffffffffffe GPR04: c0000001eb866298 0000000000000000 0000000000000000 c00000018335e080 GPR08: 900000000000d032 0000000000000000 0000000000000002 fffffffffffff001 GPR12: c000000139e50000 c00000000ffff000 00003fffa8c0dca0 00003fffa8c0dc88 GPR16: 0000000010000000 0000000000000001 00003fffa8c0eaa0 0000000000000000 GPR20: 00003fffa8c27528 00003fffa8c27b00 0000000000000000 0000000000000000 GPR24: 00003fffa8c0d918 00003ffff1b3efa0 00003fffa8c26d68 0000000000000000 GPR28: 00003fffa8c249e8 00003fffa8c263d0 00003fffa8c27550 00003ffff1b3ef10 NIP [c00000000000a1ec] system_call_exit+0xc0/0x21c LR [c00000000000a118] system_call+0x58/0x6c Call Trace: [c000000139e53e30] [c00000000000a118] system_call+0x58/0x6c (unreliable) Instruction dump: 64a51000 7c6300d0 f8a101a0 4bffff9c 3c000000 60000006 780007c6 64000000 60000000 7c004039 4082001c e8ed0170 <88070b78> 88c70b79 7c003214 2c200000 This is caused by us trying to load THREAD_LOAD_FP with MSR_RI=0, and taking an SLB miss on the thread struct. Reported-by: Andreas Schwab <schwab@linux-m68k.org> Diagnosed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
07d2a628 |
|
08-Jun-2017 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64s: Avoid cpabort in context switch when possible The ISA v3.0B copy-paste facility only requires cpabort when switching to a process that has foreign real addresses mapped (direct access to accelerators), to clear a potential copy buffer filled by a previous thread. There is no accelerator driver implemented yet, so cpabort can be removed. It can be be re-added when a driver is implemented. POWER9 DD1 requires the copy buffer to always be cleared on context switch, but if accelerators are not in use, then an unpaired copy from a dummy region is sufficient to clear data out of the copy buffer. This increases context switch performance by about 5% on POWER9. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
e4c0fc5f |
|
08-Jun-2017 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64s: Leave interrupts hard enabled in context switch for radix Commit 4387e9ff25 ("[POWERPC] Fix PMU + soft interrupt disable bug") hard disabled interrupts over the low level context switch, because the SLB management can't cope with a PMU interrupt accesing the stack in that window. Radix based kernel mapping does not use the SLB so it does not require interrupts hard disabled here. This is worth 1-2% in context switch performance on POWER9. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
bc4f65e4 |
|
08-Jun-2017 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64: Avoid restore_math call if possible in syscall exit The syscall exit code that branches to restore_math is quite heavy on Book3S, consisting of 2 mtmsr instructions. Threads that don't use both FP and vector can get caught here if the kernel ever uses FP or vector. Lazy-FP/vec context switching also trips this case. So check for lazy FP and vector before switching RI for restore_math. Move most of this case out of line. For threads that do want to restore math registers, the MSR switches are still suboptimal. Future direction may be to use a soft-RI bit to avoid MSR switches in kernel (similar to soft-EE), but for now at least the no-restore POWER9 context switch rate increases by about 5% due to sched_yield(2) return performance. I haven't constructed a test to measure the syscall cost. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
7f22ced4 |
|
05-Jun-2017 |
Breno Leitao <leitao@debian.org> |
powerpc/kernel: Initialize load_tm on task creation Currently tsk->thread.load_tm is not initialized in the task creation and can contain garbage on a new task. This is an undesired behaviour, since it affects the timing to enable and disable the transactional memory laziness (disabling and enabling the MSR TM bit, which affects TM reclaim and recheckpoint in the scheduling process). Fixes: 5d176f751ee3 ("powerpc: tm: Enable transactional memory (TM) lazily for userspace") Cc: stable@vger.kernel.org # v4.9+ Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
1195892c |
|
02-Jun-2017 |
Breno Leitao <leitao@debian.org> |
powerpc/kernel: Fix FP and vector register restoration Currently tsk->thread->load_vec and load_fp are not initialized during task creation, which can lead to garbage values in these variables (non-zero values). These variables will be checked later in restore_math() to validate if the FP and vector registers are being utilized. Since these values might be non-zero, the restore_math() will continue to save the FP and vectors even if they were never utilized by the userspace application. load_fp and load_vec counters will then overflow (they wrap at 255) and the FP and Altivec will be finally disabled, but before that condition is reached (counter overflow) several context switches will have restored FP and vector registers without need, causing a performance degradation. Fixes: 70fe3d980f5f ("powerpc: Restore FPU/VEC/VSX if previously used") Cc: stable@vger.kernel.org # v4.6+ Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: Gustavo Romero <gusbromero@gmail.com> Acked-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
f48e91e8 |
|
08-May-2017 |
Michael Neuling <mikey@neuling.org> |
powerpc/tm: Fix FP and VMX register corruption In commit dc3106690b20 ("powerpc: tm: Always use fp_state and vr_state to store live registers"), a section of code was removed that copied the current state to checkpointed state. That code should not have been removed. When an FP (Floating Point) unavailable is taken inside a transaction, we need to abort the transaction. This is because at the time of the tbegin, the FP state is bogus so the state stored in the checkpointed registers is incorrect. To fix this, we treclaim (to get the checkpointed GPRs) and then copy the thread_struct FP live state into the checkpointed state. We then trecheckpoint so that the FP state is correctly restored into the CPU. The copying of the FP registers from live to checkpointed is what was missing. This simplifies the logic slightly from the original patch. tm_reclaim_thread() will now always write the checkpointed FP state. Either the checkpointed FP state will be written as part of the actual treclaim (in tm.S), or it'll be a copy of the live state. Which one we use is based on MSR[FP] from userspace. Similarly for VMX. Fixes: dc3106690b20 ("powerpc: tm: Always use fp_state and vr_state to store live registers") Cc: stable@vger.kernel.org # 4.9+ Signed-off-by: Michael Neuling <mikey@neuling.org> Reviewed-by: cyrilbur@gmail.com Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
68db0cf1 |
|
08-Feb-2017 |
Ingo Molnar <mingo@kernel.org> |
sched/headers: Prepare for new header dependencies before moving code to <linux/sched/task_stack.h> We are going to split <linux/sched/task_stack.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/task_stack.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
29930025 |
|
08-Feb-2017 |
Ingo Molnar <mingo@kernel.org> |
sched/headers: Prepare for new header dependencies before moving code to <linux/sched/task.h> We are going to split <linux/sched/task.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/task.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
b17b0153 |
|
08-Feb-2017 |
Ingo Molnar <mingo@kernel.org> |
sched/headers: Prepare for new header dependencies before moving code to <linux/sched/debug.h> We are going to split <linux/sched/debug.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/debug.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
4ad8622d |
|
29-Nov-2016 |
Christophe Leroy <christophe.leroy@c-s.fr> |
powerpc/8xx: Implement hw_breakpoint This patch implements HW breakpoint on the 8xx. The 8xx has capability to manage HW breakpoints, which is slightly different than BOOK3S: 1/ The breakpoint match doesn't trigger a DSI exception but a dedicated data breakpoint exception. 2/ The breakpoint happens after the instruction has completed, no need to single step or emulate the instruction, 3/ Matched address is not set in DAR but in BAR, 4/ DABR register doesn't exist, instead we have registers LCTRL1, LCTRL2 and CMPx registers, 5/ The match on one comparator is not on a double word but on a single word. The patch does: 1/ Prepare the dedicated registers in call to __set_dabr(). In order to emulate the double word handling of BOOK3S, comparator E is set to DABR address value and comparator F to address + 4. Then breakpoint 1 is set to match comparator E or F, 2/ Skip the singlestepping stage when compiled for CONFIG_PPC_8xx, 3/ Implement the exception. In that exception, the matched address is taken from SPRN_BAR and manage as if it was from SPRN_DAR. 4/ I/D TLB error exception routines perform a tlbie on bad TLBs. That tlbie triggers the breakpoint exception when performed on the breakpoint address. For this reason, the routine returns if the match is from one of those two tlbie. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
|
#
f2574030 |
|
24-Jan-2017 |
Michael Ellerman <mpe@ellerman.id.au> |
powerpc: Revert the initial stack protector support Unfortunately the stack protector support we merged recently only works on some toolchains. If the toolchain is built without glibc support everything works fine, but if glibc is built then it leads to a panic at boot. The solution is not rc5 material, so revert the support for now. This reverts commits: 6533b7c16ee5 ("powerpc: Initial stack protector (-fstack-protector) support") 902e06eb86cd ("powerpc/32: Change the stack protector canary value per task") Fixes: 6533b7c16ee5 ("powerpc: Initial stack protector (-fstack-protector) support") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
6533b7c1 |
|
22-Nov-2016 |
Christophe Leroy <christophe.leroy@c-s.fr> |
powerpc: Initial stack protector (-fstack-protector) support Partialy copied from commit c743f38013aef ("ARM: initial stack protector (-fstack-protector) support") This is the very basic stuff without the changing canary upon task switch yet. Just the Kconfig option and a constant canary value initialized at boot time. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
29a969b7 |
|
30-Oct-2016 |
Michael Neuling <mikey@neuling.org> |
powerpc: Revert Load Monitor Register Support Load monitored is no longer supported on POWER9 so let's remove the code. This reverts commit bd3ea317fddf ("powerpc: Load Monitor Register Support"). Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
2ffd04de |
|
04-Nov-2016 |
Andrew Donnellan <andrew.donnellan@au1.ibm.com> |
powerpc/oops: Fix missing pr_cont()s in instruction dump Since the KERN_CONT changes, the current code in show_instructions() prints out a whole bunch of unnecessary newlines. Change occurrences of printk("\n") to pr_cont("\n"). While we're here, change all the other cases of printk(KERN_CONT ...) to pr_cont() as well. Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
7dae865f |
|
03-Nov-2016 |
Michael Ellerman <mpe@ellerman.id.au> |
powerpc/oops: Fix missing pr_cont()s in show_regs() Fix up our oops output by converting continuation lines to use pr_cont(). Some of these are dubious, eg. printing a continuation line which starts with a newline, but seem to work OK for now. This whole function needs a rewrite in the next release. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
db5ba5ae |
|
02-Nov-2016 |
Michael Ellerman <mpe@ellerman.id.au> |
powerpc/oops: Fix missing pr_cont()s in print_msr_bits() et. al. Since the KERN_CONT changes these are being horribly split across lines, for example: MSR: 8000000000009033 < SF,EE ,ME,IR ,DR,RI ,LE> So fix it by using pr_cont() where appropriate. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
9a1f490f |
|
02-Nov-2016 |
Michael Ellerman <mpe@ellerman.id.au> |
powerpc/oops: Fix missing pr_cont()s in show_stack() Previously we got away with printing the stack trace in multiple pieces and it usually looked right. But since commit 4bcc595ccd80 ("printk: reinstate KERN_CONT for printing continuation lines"), KERN_CONT is now required when printing continuation lines. Use pr_cont() as appropriate. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
39715bf9 |
|
04-Oct-2016 |
Valentin Rothberg <valentinrothberg@gmail.com> |
powerpc/process: Fix CONFIG_ALIVEC typo in restore_tm_state() It should be ALTIVEC, not ALIVEC. Cyril explains: If a thread performs a transaction with altivec and then gets preempted for whatever reason, this bug may cause the kernel to not re-enable altivec when that thread runs again. This will result in an altivec unavailable fault, when that fault happens inside a user transaction the kernel has no choice but to enable altivec and doom the transaction. The result is that transactions using altivec may get aborted more often than they should. The difficulty in catching this with a selftest is my deliberate use of the word may above. Optimisations to avoid FPU/altivec/VSX faults mean that the kernel will always leave them on for 255 switches. This code prevents the kernel turning it off if it got to the 256th switch (and userspace was transactional). Fixes: dc16b553c949 ("powerpc: Always restore FPU/VEC/VSX if hardware transactional memory in use") Reviewed-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Valentin Rothberg <valentinrothberg@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
5d176f75 |
|
14-Sep-2016 |
Cyril Bur <cyrilbur@gmail.com> |
powerpc: tm: Enable transactional memory (TM) lazily for userspace Currently the MSR TM bit is always set if the hardware is TM capable. This adds extra overhead as it means the TM SPRS (TFHAR, TEXASR and TFAIR) must be swapped for each process regardless of if they use TM. For processes that don't use TM the TM MSR bit can be turned off allowing the kernel to avoid the expensive swap of the TM registers. A TM unavailable exception will occur if a thread does use TM and the kernel will enable MSR_TM and leave it so for some time afterwards. Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
000ec280 |
|
23-Sep-2016 |
Cyril Bur <cyrilbur@gmail.com> |
powerpc: tm: Rename transct_(*) to ck(\1)_state Make the structures being used for checkpointed state named consistently with the pt_regs/ckpt_regs. Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
dc310669 |
|
23-Sep-2016 |
Cyril Bur <cyrilbur@gmail.com> |
powerpc: tm: Always use fp_state and vr_state to store live registers There is currently an inconsistency as to how the entire CPU register state is saved and restored when a thread uses transactional memory (TM). Using transactional memory results in the CPU having duplicated (almost) all of its register state. This duplication results in a set of registers which can be considered 'live', those being currently modified by the instructions being executed and another set that is frozen at a point in time. On context switch, both sets of state have to be saved and (later) restored. These two states are often called a variety of different things. Common terms for the state which only exists after the CPU has entered a transaction (performed a TBEGIN instruction) in hardware are 'transactional' or 'speculative'. Between a TBEGIN and a TEND or TABORT (or an event that causes the hardware to abort), regardless of the use of TSUSPEND the transactional state can be referred to as the live state. The second state is often to referred to as the 'checkpointed' state and is a duplication of the live state when the TBEGIN instruction is executed. This state is kept in the hardware and will be rolled back to on transaction failure. Currently all the registers stored in pt_regs are ALWAYS the live registers, that is, when a thread has transactional registers their values are stored in pt_regs and the checkpointed state is in ckpt_regs. A strange opposite is true for fp_state/vr_state. When a thread is non transactional fp_state/vr_state holds the live registers. When a thread has initiated a transaction fp_state/vr_state holds the checkpointed state and transact_fp/transact_vr become the structure which holds the live state (at this point it is a transactional state). This method creates confusion as to where the live state is, in some circumstances it requires extra work to determine where to put the live state and prevents the use of common functions designed (probably before TM) to save the live state. With this patch pt_regs, fp_state and vr_state all represent the same thing and the other structures [pending rename] are for checkpointed state. Acked-by: Simon Guo <wei.guo.simon@gmail.com> Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
e909fb83 |
|
23-Sep-2016 |
Cyril Bur <cyrilbur@gmail.com> |
powerpc: Never giveup a reclaimed thread when enabling kernel {fp, altivec, vsx} After a thread is reclaimed from its active or suspended transactional state the checkpointed state exists on CPU, this state (along with the live/transactional state) has been saved in its entirety by the reclaiming process. There exists a sequence of events that would cause the kernel to call one of enable_kernel_fp(), enable_kernel_altivec() or enable_kernel_vsx() after a thread has been reclaimed. These functions save away any user state on the CPU so that the kernel can use the registers. Not only is this saving away unnecessary at this point, it is actually incorrect. It causes a save of the checkpointed state to the live structures within the thread struct thus destroying the true live state for that thread. Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
3cee070a |
|
23-Sep-2016 |
Cyril Bur <cyrilbur@gmail.com> |
powerpc: Return the new MSR from msr_check_and_set() msr_check_and_set() always performs a mfmsr() to determine if it needs to perform an mtmsr(), as mfmsr() can be a costly operation msr_check_and_set() could return the MSR now on the CPU to avoid callers of msr_check_and_set having to make their own mfmsr() call. Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
b0f16b46 |
|
23-Sep-2016 |
Cyril Bur <cyrilbur@gmail.com> |
powerpc: Add check_if_tm_restore_required() to giveup_all() giveup_all() causes FPU/VMX/VSX facilities to be disabled in a threads MSR. If the thread performing the giveup was transactional, the kernel must record which facilities were in use before the giveup as the thread must have these facilities re-enabled on return to userspace. >From process.c: /* * This is called if we are on the way out to userspace and the * TIF_RESTORE_TM flag is set. It checks if we need to reload * FP and/or vector state and does so if necessary. * If userspace is inside a transaction (whether active or * suspended) and FP/VMX/VSX instructions have ever been enabled * inside that transaction, then we have to keep them enabled * and keep the FP/VMX/VSX state loaded while ever the transaction * continues. The reason is that if we didn't, and subsequently * got a FP/VMX/VSX unavailable interrupt inside a transaction, * we don't know whether it's the same transaction, and thus we * don't know which of the checkpointed state and the transactional * state to use. */ Calling check_if_tm_restore_required() will set TIF_RESTORE_TM and save the MSR if needed. Fixes: c208505 ("powerpc: create giveup_all()") Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
dc16b553 |
|
23-Sep-2016 |
Cyril Bur <cyrilbur@gmail.com> |
powerpc: Always restore FPU/VEC/VSX if hardware transactional memory in use Comment from arch/powerpc/kernel/process.c:967: If userspace is inside a transaction (whether active or suspended) and FP/VMX/VSX instructions have ever been enabled inside that transaction, then we have to keep them enabled and keep the FP/VMX/VSX state loaded while ever the transaction continues. The reason is that if we didn't, and subsequently got a FP/VMX/VSX unavailable interrupt inside a transaction, we don't know whether it's the same transaction, and thus we don't know which of the checkpointed state and the ransactional state to use. restore_math() restore_fp() and restore_altivec() currently may not restore the registers. It doesn't appear that this is more serious than a performance penalty. If the math registers aren't restored the userspace thread will still be run with the facility disabled. Userspace will not be able to read invalid values. On the first access it will take an facility unavailable exception and the kernel will detected an active transaction, at which point it will abort the transaction. There is the possibility for a pathological case preventing any progress by transactions, however, transactions are never guaranteed to make progress. Fixes: 70fe3d9 ("powerpc: Restore FPU/VEC/VSX if previously used") Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
0545d543 |
|
05-Sep-2016 |
Daniel Axtens <dja@axtens.net> |
powerpc/sparse: Add more assembler prototypes Another set of things that are only called from assembler and so need prototypes to keep sparse happy. Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
c7a318ba |
|
09-Aug-2016 |
Cyril Bur <cyrilbur@gmail.com> |
powerpc/ptrace: Fix coredump since ptrace TM changes Commit 8d460f6156cd ("powerpc/process: Add the function flush_tmregs_to_thread") added flush_tmregs_to_thread() and included the assumption that it would only be called for a task which is not current. Although this is correct for ptrace, when generating a core dump, some of the routines which call flush_tmregs_to_thread() are called. This leads to a WARNing such as: Not expecting ptrace on self: TM regs may be incorrect ------------[ cut here ]------------ WARNING: CPU: 123 PID: 7727 at arch/powerpc/kernel/process.c:1088 flush_tmregs_to_thread+0x78/0x80 CPU: 123 PID: 7727 Comm: libvirtd Not tainted 4.8.0-rc1-gcc6x-g61e8a0d #1 task: c000000fe631b600 task.stack: c000000fe63b0000 NIP: c00000000001a1a8 LR: c00000000001a1a4 CTR: c000000000717780 REGS: c000000fe63b3420 TRAP: 0700 Not tainted (4.8.0-rc1-gcc6x-g61e8a0d) MSR: 900000010282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]> CR: 28004222 XER: 20000000 ... NIP [c00000000001a1a8] flush_tmregs_to_thread+0x78/0x80 LR [c00000000001a1a4] flush_tmregs_to_thread+0x74/0x80 Call Trace: flush_tmregs_to_thread+0x74/0x80 (unreliable) vsr_get+0x64/0x1a0 elf_core_dump+0x604/0x1430 do_coredump+0x5fc/0x1200 get_signal+0x398/0x740 do_signal+0x54/0x2b0 do_notify_resume+0x98/0xb0 ret_from_except_lite+0x70/0x74 So fix flush_tmregs_to_thread() to detect the case where it is called on current, and a transaction is active, and in that case flush the TM regs to the thread_struct. This patch also moves flush_tmregs_to_thread() into ptrace.c as it is only called from that file. Fixes: 8d460f6156cd ("powerpc/process: Add the function flush_tmregs_to_thread") Signed-off-by: Cyril Bur <cyrilbur@gmail.com> [mpe: Flesh out change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
8d460f61 |
|
27-Jul-2016 |
Anshuman Khandual <khandual@linux.vnet.ibm.com> |
powerpc/process: Add the function flush_tmregs_to_thread This patch creates a function flush_tmregs_to_thread which will then be used by subsequent patches in this series. The function checks for self tracing ptrace interface attempts while in the TM context and logs appropriate warning message. Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
b92a226e |
|
23-Jul-2016 |
Kevin Hao <haokexin@gmail.com> |
powerpc: Move cpu_has_feature() to a separate file We plan to use jump label for cpu_has_feature(). In order to implement this we need to include the linux/jump_label.h in asm/cputable.h. Unfortunately if we do that it leads to an include loop. The root of the problem seems to be that reg.h needs cputable.h (for CPU_FTRs), and then cputable.h via jump_label.h eventually pulls in hw_irq.h which needs reg.h (for MSR_EE). So move cpu_has_feature() to a separate file on its own. Signed-off-by: Kevin Hao <haokexin@gmail.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> [mpe: Rename to cpu_has_feature.h and flesh out change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
8e96a87c |
|
16-Jun-2016 |
Cyril Bur <cyrilbur@gmail.com> |
powerpc/tm: Always reclaim in start_thread() for exec() class syscalls Userspace can quite legitimately perform an exec() syscall with a suspended transaction. exec() does not return to the old process, rather it load a new one and starts that, the expectation therefore is that the new process starts not in a transaction. Currently exec() is not treated any differently to any other syscall which creates problems. Firstly it could allow a new process to start with a suspended transaction for a binary that no longer exists. This means that the checkpointed state won't be valid and if the suspended transaction were ever to be resumed and subsequently aborted (a possibility which is exceedingly likely as exec()ing will likely doom the transaction) the new process will jump to invalid state. Secondly the incorrect attempt to keep the transactional state while still zeroing state for the new process creates at least two TM Bad Things. The first triggers on the rfid to return to userspace as start_thread() has given the new process a 'clean' MSR but the suspend will still be set in the hardware MSR. The second TM Bad Thing triggers in __switch_to() as the processor is still transactionally suspended but __switch_to() wants to zero the TM sprs for the new process. This is an example of the outcome of calling exec() with a suspended transaction. Note the first 700 is likely the first TM bad thing decsribed earlier only the kernel can't report it as we've loaded userspace registers. c000000000009980 is the rfid in fast_exception_return() Bad kernel stack pointer 3fffcfa1a370 at c000000000009980 Oops: Bad kernel stack pointer, sig: 6 [#1] CPU: 0 PID: 2006 Comm: tm-execed Not tainted NIP: c000000000009980 LR: 0000000000000000 CTR: 0000000000000000 REGS: c00000003ffefd40 TRAP: 0700 Not tainted MSR: 8000000300201031 <SF,ME,IR,DR,LE,TM[SE]> CR: 00000000 XER: 00000000 CFAR: c0000000000098b4 SOFTE: 0 PACATMSCRATCH: b00000010000d033 GPR00: 0000000000000000 00003fffcfa1a370 0000000000000000 0000000000000000 GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR12: 00003fff966611c0 0000000000000000 0000000000000000 0000000000000000 NIP [c000000000009980] fast_exception_return+0xb0/0xb8 LR [0000000000000000] (null) Call Trace: Instruction dump: f84d0278 e9a100d8 7c7b03a6 e84101a0 7c4ff120 e8410170 7c5a03a6 e8010070 e8410080 e8610088 e8810090 e8210078 <4c000024> 48000000 e8610178 88ed023b Kernel BUG at c000000000043e80 [verbose debug info unavailable] Unexpected TM Bad Thing exception at c000000000043e80 (msr 0x201033) Oops: Unrecoverable exception, sig: 6 [#2] CPU: 0 PID: 2006 Comm: tm-execed Tainted: G D task: c0000000fbea6d80 ti: c00000003ffec000 task.ti: c0000000fb7ec000 NIP: c000000000043e80 LR: c000000000015a24 CTR: 0000000000000000 REGS: c00000003ffef7e0 TRAP: 0700 Tainted: G D MSR: 8000000300201033 <SF,ME,IR,DR,RI,LE,TM[SE]> CR: 28002828 XER: 00000000 CFAR: c000000000015a20 SOFTE: 0 PACATMSCRATCH: b00000010000d033 GPR00: 0000000000000000 c00000003ffefa60 c000000000db5500 c0000000fbead000 GPR04: 8000000300001033 2222222222222222 2222222222222222 00000000ff160000 GPR08: 0000000000000000 800000010000d033 c0000000fb7e3ea0 c00000000fe00004 GPR12: 0000000000002200 c00000000fe00000 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000000000000 0000000000000000 c0000000fbea7410 00000000ff160000 GPR24: c0000000ffe1f600 c0000000fbea8700 c0000000fbea8700 c0000000fbead000 GPR28: c000000000e20198 c0000000fbea6d80 c0000000fbeab680 c0000000fbea6d80 NIP [c000000000043e80] tm_restore_sprs+0xc/0x1c LR [c000000000015a24] __switch_to+0x1f4/0x420 Call Trace: Instruction dump: 7c800164 4e800020 7c0022a6 f80304a8 7c0222a6 f80304b0 7c0122a6 f80304b8 4e800020 e80304a8 7c0023a6 e80304b0 <7c0223a6> e80304b8 7c0123a6 4e800020 This fixes CVE-2016-5828. Fixes: bc2a9408fa65 ("powerpc: Hook in new transactional memory code") Cc: stable@vger.kernel.org # v3.9+ Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
bd3ea317 |
|
08-Jun-2016 |
Jack Miller <jack@codezen.org> |
powerpc: Load Monitor Register Support This enables new registers, LMRR and LMSER, that can trigger an EBB in userspace code when a monitored load (via the new ldmx instruction) loads memory from a monitored space. This facility is controlled by a new FSCR bit, LM. This patch disables the FSCR LM control bit on task init and enables that bit when a load monitor facility unavailable exception is taken for using it. On context switch, this bit is then used to determine whether the two relevant registers are saved and restored. This is done lazily for performance reasons. Signed-off-by: Jack Miller <jack@codezen.org> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
b57bd2de |
|
08-Jun-2016 |
Michael Neuling <mikey@neuling.org> |
powerpc: Improve FSCR init and context switching This fixes a few issues with FSCR init and switching. In commit 152d523e6307 ("powerpc: Create context switch helpers save_sprs() and restore_sprs()") we moved the setting of the FSCR register from inside an CPU_FTR_ARCH_207S section to inside just a CPU_FTR_ARCH_DSCR section. Hence we are setting FSCR on POWER6/7 where the FSCR doesn't exist. This is harmless but we shouldn't do it. Also, we can simplify the FSCR context switch. We don't need to go through the calculation involving dscr_inherit. We can just restore what we saved last time. We also set an initial value in INIT_THREAD, so that pid 1 which is cloned from that gets a sane value. Based on patch by Jack Miller. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
027dfac6 |
|
01-Jun-2016 |
Michael Ellerman <mpe@ellerman.id.au> |
powerpc: Various typo fixes Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
8eb98037 |
|
29-May-2016 |
Anton Blanchard <anton@samba.org> |
powerpc: Avoid load hit store in __giveup_fpu() and __giveup_altivec() In both __giveup_fpu() and __giveup_altivec() we make two modifications to tsk->thread.regs->msr. gcc decides to do a read/modify/write of each change, so we end up with a load hit store: ld r9,264(r10) rldicl r9,r9,50,1 rotldi r9,r9,14 std r9,264(r10) ... ld r9,264(r10) rldicl r9,r9,40,1 rotldi r9,r9,24 std r9,264(r10) Fix this by using a temporary. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
5f56a5df |
|
20-May-2016 |
Jiri Slaby <jirislaby@kernel.org> |
exit_thread: remove empty bodies Define HAVE_EXIT_THREAD for archs which want to do something in exit_thread. For others, let's define exit_thread as an empty inline. This is a cleanup before we change the prototype of exit_thread to accept a task parameter. [akpm@linux-foundation.org: fix mips] Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: "David S. Miller" <davem@davemloft.net> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Aurelien Jacquiot <a-jacquiot@ti.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chen Liqin <liqin.linux@gmail.com> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Chris Zankel <chris@zankel.net> Cc: David Howells <dhowells@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: James Hogan <james.hogan@imgtec.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Jonas Bonn <jonas@southpole.se> Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com> Cc: Lennox Wu <lennox.wu@gmail.com> Cc: Ley Foon Tan <lftan@altera.com> Cc: Mark Salter <msalter@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Mikael Starvik <starvik@axis.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Rich Felker <dalias@libc.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Richard Weinberger <richard@nod.at> Cc: Russell King <linux@arm.linux.org.uk> Cc: Steven Miao <realmz6@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
caca285e |
|
29-Apr-2016 |
Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> |
powerpc/mm/radix: Use STD_MMU_64 to properly isolate hash related code We also use MMU_FTR_RADIX to branch out from code path specific to hash. No functionality change. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
5d31a96e |
|
24-Mar-2016 |
Michael Ellerman <mpe@ellerman.id.au> |
powerpc/livepatch: Add livepatch stack to struct thread_info In order to support live patching we need to maintain an alternate stack of TOC & LR values. We use the base of the stack for this, and store the "live patch stack pointer" in struct thread_info. Unlike the other fields of thread_info, we can not statically initialise that value, so it must be done at run time. This patch just adds the code to support that, it is not enabled until the next patch which actually adds live patch support. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Balbir Singh <bsingharora@gmail.com>
|
#
7f92bc56 |
|
05-Jan-2016 |
Daniel Axtens <dja@axtens.net> |
powerpc: sparse: Include headers for __weak symbols Sometimes when sparse warns about undefined symbols, it isn't because they should have 'static' added, it's because they're overriding __weak symbols defined elsewhere, and the header has been missed. Fix a few of them by adding appropriate headers. Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
01d7c2a2 |
|
07-Mar-2016 |
Oliver O'Halloran <oohall@gmail.com> |
powerpc/process: Fix altivec SPR not being saved In save_sprs() in process.c contains the following test: if (cpu_has_feature(cpu_has_feature(CPU_FTR_ALTIVEC))) t->vrsave = mfspr(SPRN_VRSAVE); CPU feature with the mask 0x1 is CPU_FTR_COHERENT_ICACHE so the test is equivilent to: if (cpu_has_feature(CPU_FTR_ALTIVEC) && cpu_has_feature(CPU_FTR_COHERENT_ICACHE)) On CPUs without support for both (i.e G5) this results in vrsave not being saved between context switches. The vector register save/restore code doesn't use VRSAVE to determine which registers to save/restore, but the value of VRSAVE is used to determine if altivec is being used in several code paths. Fixes: 152d523e6307 ("powerpc: Create context switch helpers save_sprs() and restore_sprs()") Cc: stable@vger.kernel.org Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
bf6a4d5b |
|
28-Feb-2016 |
Cyril Bur <cyrilbur@gmail.com> |
powerpc: Add the ability to save VSX without giving it up This patch adds the ability to be able to save the VSX registers to the thread struct without giving up (disabling the facility) next time the process returns to userspace. This patch builds on a previous optimisation for the FPU and VEC registers in the thread copy path to avoid a possibly pointless reload of VSX state. Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
6f515d84 |
|
28-Feb-2016 |
Cyril Bur <cyrilbur@gmail.com> |
powerpc: Add the ability to save Altivec without giving it up This patch adds the ability to be able to save the VEC registers to the thread struct without giving up (disabling the facility) next time the process returns to userspace. This patch builds on a previous optimisation for the FPU registers in the thread copy path to avoid a possibly pointless reload of VEC state. Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
8792468d |
|
28-Feb-2016 |
Cyril Bur <cyrilbur@gmail.com> |
powerpc: Add the ability to save FPU without giving it up This patch adds the ability to be able to save the FPU registers to the thread struct without giving up (disabling the facility) next time the process returns to userspace. This patch optimises the thread copy path (as a result of a fork() or clone()) so that the parent thread can return to userspace with hot registers avoiding a possibly pointless reload of FPU register state. Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
de2a20aa |
|
28-Feb-2016 |
Cyril Bur <cyrilbur@gmail.com> |
powerpc: Prepare for splitting giveup_{fpu, altivec, vsx} in two This prepares for the decoupling of saving {fpu,altivec,vsx} registers and marking {fpu,altivec,vsx} as being unused by a thread. Currently giveup_{fpu,altivec,vsx}() does both however optimisations to task switching can be made if these two operations are decoupled. save_all() will permit the saving of registers to thread structs and leave threads MSR with bits enabled. This patch introduces no functional change. Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
70fe3d98 |
|
28-Feb-2016 |
Cyril Bur <cyrilbur@gmail.com> |
powerpc: Restore FPU/VEC/VSX if previously used Currently the FPU, VEC and VSX facilities are lazily loaded. This is not a problem unless a process is using these facilities. Modern versions of GCC are very good at automatically vectorising code, new and modernised workloads make use of floating point and vector facilities, even the kernel makes use of vectorised memcpy. All this combined greatly increases the cost of a syscall since the kernel uses the facilities sometimes even in syscall fast-path making it increasingly common for a thread to take an *_unavailable exception soon after a syscall, not to mention potentially taking all three. The obvious overcompensation to this problem is to simply always load all the facilities on every exit to userspace. Loading up all FPU, VEC and VSX registers every time can be expensive and if a workload does avoid using them, it should not be forced to incur this penalty. An 8bit counter is used to detect if the registers have been used in the past and the registers are always loaded until the value wraps to back to zero. Several versions of the assembly in entry_64.S were tested: 1. Always calling C. 2. Performing a common case check and then calling C. 3. A complex check in asm. After some benchmarking it was determined that avoiding C in the common case is a performance benefit (option 2). The full check in asm (option 3) greatly complicated that codepath for a negligible performance gain and the trade-off was deemed not worth it. Signed-off-by: Cyril Bur <cyrilbur@gmail.com> [mpe: Move load_vec in the struct to fill an existing hole, reword change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> fixup
|
#
d272f667 |
|
28-Feb-2016 |
Cyril Bur <cyrilbur@gmail.com> |
powerpc: Explicitly disable math features when copying thread Currently when threads get scheduled off they always giveup the FPU, Altivec (VMX) and Vector (VSX) units if they were using them. When they are scheduled back on a fault is then taken to enable each facility and load registers. As a result explicitly disabling FPU/VMX/VSX has not been necessary. Future changes and optimisations remove this mandatory giveup and fault which could cause calls such as clone() and fork() to copy threads and run them later with FPU/VMX/VSX enabled but no registers loaded. This patch starts the process of having MSR_{FP,VEC,VSX} mean that a threads registers are hot while not having MSR_{FP,VEC,VSX} means that the registers must be loaded. This allows for a smarter return to userspace. Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
5ef11c35 |
|
26-Feb-2016 |
Daniel Cashman <dcashman@android.com> |
mm: ASLR: use get_random_long() Replace calls to get_random_int() followed by a cast to (unsigned long) with calls to get_random_long(). Also address shifting bug which, in case of x86 removed entropy mask for mmap_rnd_bits values > 31 bits. Signed-off-by: Daniel Cashman <dcashman@android.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: David S. Miller <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Nick Kralevich <nnk@google.com> Cc: Jeff Vander Stoep <jeffv@google.com> Cc: Mark Salyzyn <salyzyn@android.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
801c0b2c |
|
19-Nov-2015 |
Michael Neuling <mikey@neuling.org> |
powerpc: Print MSR TM bits in oops messages Print MSR TM bits in oops messages. This appends them to the end like this: MSR: 8000000502823031 <SF,VEC,VSX,FP,ME,IR,DR,LE,TM[TE]> You get the TM[] only if at least one TM MSR bit is set. Inside the TM[], E means Enabled (bit 32), S means Suspended (bit 33), and T means Transactional (bit 34) If no bits are set, you get no TM[] output. Include rework of printbits() to handle this case. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
db1231dc |
|
09-Dec-2015 |
Anton Blanchard <anton@samba.org> |
powerpc: Fix DSCR inheritance over fork() Two DSCR tests have a hack in them: /* * XXX: Force a context switch out so that DSCR * current value is copied into the thread struct * which is required for the child to inherit the * changed value. */ sleep(1); We should not be working around this in the testcase, it is a kernel bug. Fix it by copying the current DSCR to the child, instead of what we had in the thread struct at last context switch. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
20dbe670 |
|
10-Dec-2015 |
Anton Blanchard <anton@samba.org> |
powerpc: Call restore_sprs() before _switch() commit 152d523e6307 ("powerpc: Create context switch helpers save_sprs() and restore_sprs()") moved the restore of SPRs after the call to _switch(). There is an issue with this approach - new tasks do not return through _switch(), they are set up by copy_thread() to directly return through ret_from_fork() or ret_from_kernel_thread(). This means restore_sprs() is not getting called for new tasks. Fix this by moving restore_sprs() before _switch(). Fixes: 152d523e6307 ("powerpc: Create context switch helpers save_sprs() and restore_sprs()") Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
d64d02ce |
|
10-Dec-2015 |
Anton Blanchard <anton@samba.org> |
powerpc: Call check_if_tm_restore_required() in enable_kernel_*() Commit a0e72cf12b1a ("powerpc: Create msr_check_and_{set,clear}()") removed a call to check_if_tm_restore_required() in the enable_kernel_*() functions. Add them back in. Fixes: a0e72cf12b1a ("powerpc: Create msr_check_and_{set,clear}()") Reported-by: Rashmica Gupta <rashmicy@gmail.com> Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
d1e1cf2e |
|
28-Oct-2015 |
Anton Blanchard <anton@samba.org> |
powerpc: clean up asm/switch_to.h Remove a bunch of unnecessary fallback functions and group things in a more logical way. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
f3d885cc |
|
28-Oct-2015 |
Anton Blanchard <anton@samba.org> |
powerpc: Rearrange __switch_to() Most of __switch_to() is housekeeping, TLB batching, timekeeping etc. Move these away from the more complex and critical context switching code. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
579e633e |
|
28-Oct-2015 |
Anton Blanchard <anton@samba.org> |
powerpc: create flush_all_to_thread() Create a single function that flushes everything (FP, VMX, VSX, SPE). Doing this all at once means we only do one MSR write. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
c2085059 |
|
28-Oct-2015 |
Anton Blanchard <anton@samba.org> |
powerpc: create giveup_all() Create a single function that gives everything up (FP, VMX, VSX, SPE). Doing this all at once means we only do one MSR write. A context switch microbenchmark using yield(): http://ozlabs.org/~anton/junkcode/context_switch2.c ./context_switch2 --test=yield --fp --altivec --vector 0 0 shows an improvement of 3% on POWER8. Signed-off-by: Anton Blanchard <anton@samba.org> [mpe: giveup_all() needs to be EXPORT_SYMBOL'ed] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
1f2e25b2 |
|
28-Oct-2015 |
Anton Blanchard <anton@samba.org> |
powerpc: Remove fp_enable() and vec_enable(), use msr_check_and_{set, clear}() More consolidation of our MSR available bit handling. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
3eb5d588 |
|
28-Oct-2015 |
Anton Blanchard <anton@samba.org> |
powerpc: Add ppc_strict_facility_enable boot option Add a boot option that strictly manages the MSR unavailable bits. This catches kernel uses of FP/Altivec/SPE that would otherwise corrupt user state. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
a0e72cf1 |
|
28-Oct-2015 |
Anton Blanchard <anton@samba.org> |
powerpc: Create msr_check_and_{set,clear}() Create helper functions to set and clear MSR bits after first checking if they are already set. Grouping them will make it easy to avoid the MSR writes in a subsequent optimisation. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
a7d623d4 |
|
28-Oct-2015 |
Anton Blanchard <anton@samba.org> |
powerpc: Move part of giveup_vsx into c Move the MSR modification into c. Removing it from the assembly function will allow us to avoid costly MSR writes by batching them up. Check the FP and VMX bits before calling the relevant giveup_*() function. This makes giveup_vsx() and flush_vsx_to_thread() perform more like their sister functions, and allows us to use flush_vsx_to_thread() in the signal code. Move the check_if_tm_restore_required() check in. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
98da581e |
|
28-Oct-2015 |
Anton Blanchard <anton@samba.org> |
powerpc: Move part of giveup_fpu,altivec,spe into c Move the MSR modification into new c functions. Removing it from the low level functions will allow us to avoid costly MSR writes by batching them up. Move the check_if_tm_restore_required() check into these new functions. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
611b0e5c |
|
28-Oct-2015 |
Anton Blanchard <anton@samba.org> |
powerpc: Create mtmsrd_isync() mtmsrd_isync() will do an mtmsrd followed by an isync on older processors. On newer processors we avoid the isync via a feature fixup. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
b86fd2bd |
|
28-Oct-2015 |
Anton Blanchard <anton@samba.org> |
powerpc: Simplify TM restore checks Instead of having multiple giveup_*_maybe_transactional() functions, separate out the TM check into a new function called check_if_tm_restore_required(). This will make it easier to optimise the giveup_*() functions in a subsequent patch. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
af1bbc3d |
|
28-Oct-2015 |
Anton Blanchard <anton@samba.org> |
powerpc: Remove UP only lazy floating point and vector optimisations The UP only lazy floating point and vector optimisations were written back when SMP was not common, and neither glibc nor gcc used vector instructions. Now SMP is very common, glibc aggressively uses vector instructions and gcc autovectorises. We want to add new optimisations that apply to both UP and SMP, but in preparation for that remove these UP only optimisations. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
152d523e |
|
28-Oct-2015 |
Anton Blanchard <anton@samba.org> |
powerpc: Create context switch helpers save_sprs() and restore_sprs() Move all our context switch SPR save and restore code into two helpers. We do a few optimisations: - Group all mfsprs and all mtsprs. In many cases an mtspr sets a scoreboarding bit that an mfspr waits on, so the current practise of mfspr A; mtspr A; mfpsr B; mtspr B is the worst scheduling we can do. - SPR writes are slow, so check that the value is changing before writing it. A context switch microbenchmark using yield(): http://ozlabs.org/~anton/junkcode/context_switch2.c ./context_switch2 --test=yield 0 0 shows an improvement of almost 10% on POWER8. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
7f821fc9 |
|
18-Nov-2015 |
Michael Neuling <mikey@neuling.org> |
powerpc/tm: Check for already reclaimed tasks Currently we can hit a scenario where we'll tm_reclaim() twice. This results in a TM bad thing exception because the second reclaim occurs when not in suspend mode. The scenario in which this can happen is the following. We attempt to deliver a signal to userspace. To do this we need obtain the stack pointer to write the signal context. To get this stack pointer we must tm_reclaim() in case we need to use the checkpointed stack pointer (see get_tm_stackpointer()). Normally we'd then return directly to userspace to deliver the signal without going through __switch_to(). Unfortunatley, if at this point we get an error (such as a bad userspace stack pointer), we need to exit the process. The exit will result in a __switch_to(). __switch_to() will attempt to save the process state which results in another tm_reclaim(). This tm_reclaim() now causes a TM Bad Thing exception as this state has already been saved and the processor is no longer in TM suspend mode. Whee! This patch checks the state of the MSR to ensure we are TM suspended before we attempt the tm_reclaim(). If we've already saved the state away, we should no longer be in TM suspend mode. This has the additional advantage of checking for a potential TM Bad Thing exception. Found using syscall fuzzer. Fixes: fb09692e71f1 ("powerpc: Add reclaim and recheckpoint functions for context switching transactional memory processes") Cc: stable@vger.kernel.org # v3.9+ Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
829023df |
|
06-Jul-2015 |
Anshuman Khandual <khandual@linux.vnet.ibm.com> |
powerpc/tm: Drop tm_orig_msr from thread_struct Currently tm_orig_msr is getting used during process context switch only. Then there is ckpt_regs which saves the checkpointed userspace context The MSR slot contained in ckpt_regs structure can be used during process context switch instead of tm_orig_msr, thus allowing us to drop it from thread_struct structure. This patch does that change. Acked-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
72cd7b44 |
|
13-Jul-2015 |
Leonidas Da Silva Barbosa <leosilva@linux.vnet.ibm.com> |
powerpc: Uncomment and make enable_kernel_vsx() routine available enable_kernel_vsx() function was commented since anything was using it. However, vmx-crypto driver uses VSX instructions which are only available if VSX is enable. Otherwise it rises an exception oops. This patch uncomment enable_kernel_vsx() routine and makes it available. Signed-off-by: Leonidas S. Barbosa <leosilva@linux.vnet.ibm.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
#
280e1099 |
|
20-May-2015 |
Anshuman Khandual <khandual@linux.vnet.ibm.com> |
powerpc/kernel: Remove the unused extern dscr_default The process context switch code no longer uses dscr_default variable from the sysfs.c file. The variable became unused when we started storing the CPU specific DSCR value in the PACA structure instead. This patch just removes this extern declaration. It was originally added by the following commit. Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
6eca8933 |
|
13-Mar-2015 |
Alex Dowad <alexinbeijing@gmail.com> |
powerpc/kernel: Rename copy_thread() 'arg' argument to 'kthread_arg' The 'arg' argument to copy_thread() is only ever used when forking a new kernel thread. Hence, rename it to 'kthread_arg' for clarity. Signed-off-by: Alex Dowad <alexinbeijing@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
59994fb0 |
|
14-Nov-2014 |
Vineeth Vijayan <vvijayan@mvista.com> |
powerpc: Use generic PIE randomization Back in 2009 we merged 501cb16d3cfd "Randomise PIEs", which added support for randomizing PIE (Position Independent Executable) binaries. That commit added randomize_et_dyn(), which correctly randomized the addresses, but failed to honor PF_RANDOMIZE. That means it was not possible to disable PIE randomization via the personality flag, or /proc/sys/kernel/randomize_va_space. Since then there has been generic support for PIE randomization added to binfmt_elf.c, selectable via ARCH_BINFMT_ELF_RANDOMIZE_PIE. Enabling that allows us to drop randomize_et_dyn(), which means we start honoring PF_RANDOMIZE correctly. It also causes a fairly major change to how we layout PIE binaries. Currently we will place the binary at 512MB-520MB for 32 bit binaries, or 512MB-1.5GB for 64 bit binaries, eg: $ cat /proc/$$/maps 4e550000-4e580000 r-xp 00000000 08:02 129813 /bin/dash 4e580000-4e590000 rw-p 00020000 08:02 129813 /bin/dash 10014110000-10014140000 rw-p 00000000 00:00 0 [heap] 3fffaa3f0000-3fffaa5a0000 r-xp 00000000 08:02 921 /lib/powerpc64le-linux-gnu/libc-2.19.so 3fffaa5a0000-3fffaa5b0000 rw-p 001a0000 08:02 921 /lib/powerpc64le-linux-gnu/libc-2.19.so 3fffaa5c0000-3fffaa5d0000 rw-p 00000000 00:00 0 3fffaa5d0000-3fffaa5f0000 r-xp 00000000 00:00 0 [vdso] 3fffaa5f0000-3fffaa620000 r-xp 00000000 08:02 1246 /lib/powerpc64le-linux-gnu/ld-2.19.so 3fffaa620000-3fffaa630000 rw-p 00020000 08:02 1246 /lib/powerpc64le-linux-gnu/ld-2.19.so 3ffffc340000-3ffffc370000 rw-p 00000000 00:00 0 [stack] With this commit applied we don't do any special randomisation for the binary, and instead rely on mmap randomisation. This means the binary ends up at high addresses, eg: $ cat /proc/$$/maps 3fff99820000-3fff999d0000 r-xp 00000000 08:02 921 /lib/powerpc64le-linux-gnu/libc-2.19.so 3fff999d0000-3fff999e0000 rw-p 001a0000 08:02 921 /lib/powerpc64le-linux-gnu/libc-2.19.so 3fff999f0000-3fff99a00000 rw-p 00000000 00:00 0 3fff99a00000-3fff99a20000 r-xp 00000000 00:00 0 [vdso] 3fff99a20000-3fff99a50000 r-xp 00000000 08:02 1246 /lib/powerpc64le-linux-gnu/ld-2.19.so 3fff99a50000-3fff99a60000 rw-p 00020000 08:02 1246 /lib/powerpc64le-linux-gnu/ld-2.19.so 3fff99a60000-3fff99a90000 r-xp 00000000 08:02 129813 /bin/dash 3fff99a90000-3fff99aa0000 rw-p 00020000 08:02 129813 /bin/dash 3fffc3de0000-3fffc3e10000 rw-p 00000000 00:00 0 [stack] 3fffc55e0000-3fffc5610000 rw-p 00000000 00:00 0 [heap] Although this should be OK, it's possible it might break badly written binaries that make assumptions about the address space layout. Signed-off-by: Vineeth Vijayan <vvijayan@mvista.com> [mpe: Rewrite changelog] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
7d56c65a |
|
17-Sep-2014 |
Anton Blanchard <anton@samba.org> |
powerpc/ftrace: Remove mod_return_to_handler mod_return_to_handler is the same as return_to_handler, except it handles the change of the TOC (r2). Add this into return_to_handler and remove mod_return_to_handler. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
7b051f66 |
|
13-Oct-2014 |
Anton Blanchard <anton@samba.org> |
powerpc: Use probe_kernel_address in show_instructions We really don't want to take a pagefault in show_instructions, so use probe_kernel_address instead of __get_user. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
69111bac |
|
21-Oct-2014 |
Christoph Lameter <cl@linux.com> |
powerpc: Replace __get_cpu_var uses This still has not been merged and now powerpc is the only arch that does not have this change. Sorry about missing linuxppc-dev before. V2->V2 - Fix up to work against 3.18-rc1 __get_cpu_var() is used for multiple purposes in the kernel source. One of them is address calculation via the form &__get_cpu_var(x). This calculates the address for the instance of the percpu variable of the current processor based on an offset. Other use cases are for storing and retrieving data from the current processors percpu area. __get_cpu_var() can be used as an lvalue when writing data or on the right side of an assignment. __get_cpu_var() is defined as : __get_cpu_var() always only does an address determination. However, store and retrieve operations could use a segment prefix (or global register on other platforms) to avoid the address calculation. this_cpu_write() and this_cpu_read() can directly take an offset into a percpu area and use optimized assembly code to read and write per cpu variables. This patch converts __get_cpu_var into either an explicit address calculation using this_cpu_ptr() or into a use of this_cpu operations that use the offset. Thereby address calculations are avoided and less registers are used when code is generated. At the end of the patch set all uses of __get_cpu_var have been removed so the macro is removed too. The patch set includes passes over all arches as well. Once these operations are used throughout then specialized macros can be defined in non -x86 arches as well in order to optimize per cpu access by f.e. using a global register that may be set to the per cpu base. Transformations done to __get_cpu_var() 1. Determine the address of the percpu instance of the current processor. DEFINE_PER_CPU(int, y); int *x = &__get_cpu_var(y); Converts to int *x = this_cpu_ptr(&y); 2. Same as #1 but this time an array structure is involved. DEFINE_PER_CPU(int, y[20]); int *x = __get_cpu_var(y); Converts to int *x = this_cpu_ptr(y); 3. Retrieve the content of the current processors instance of a per cpu variable. DEFINE_PER_CPU(int, y); int x = __get_cpu_var(y) Converts to int x = __this_cpu_read(y); 4. Retrieve the content of a percpu struct DEFINE_PER_CPU(struct mystruct, y); struct mystruct x = __get_cpu_var(y); Converts to memcpy(&x, this_cpu_ptr(&y), sizeof(x)); 5. Assignment to a per cpu variable DEFINE_PER_CPU(int, y) __get_cpu_var(y) = x; Converts to __this_cpu_write(y, x); 6. Increment/Decrement etc of a per cpu variable DEFINE_PER_CPU(int, y); __get_cpu_var(y)++ Converts to __this_cpu_inc(y) Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> CC: Paul Mackerras <paulus@samba.org> Signed-off-by: Christoph Lameter <cl@linux.com> [mpe: Fix build errors caused by set/or_softirq_pending(), and rework assignment in __set_breakpoint() to use memcpy().] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
acf620ec |
|
13-Oct-2014 |
Anton Blanchard <anton@samba.org> |
powerpc: Rename __get_SP() to current_stack_pointer() Michael points out that __get_SP() is a pretty horrible function name. Let's give it a better name. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
bfe9a2cf |
|
13-Oct-2014 |
Anton Blanchard <anton@samba.org> |
powerpc: Reimplement __get_SP() as a function not a define Li Zhong points out an issue with our current __get_SP() implementation. If ftrace function tracing is enabled (ie -pg profiling using _mcount) we spill a stack frame on 64bit all the time. If a function calls __get_SP() and later calls a function that is tail call optimised, we will pop the stack frame and the value returned by __get_SP() is no longer valid. An example from Li can be found in save_stack_trace -> save_context_stack: c0000000000432c0 <.save_stack_trace>: c0000000000432c0: mflr r0 c0000000000432c4: std r0,16(r1) c0000000000432c8: stdu r1,-128(r1) <-- stack frame for _mcount c0000000000432cc: std r3,112(r1) c0000000000432d0: bl <._mcount> c0000000000432d4: nop c0000000000432d8: mr r4,r1 <-- __get_SP() c0000000000432dc: ld r5,632(r13) c0000000000432e0: ld r3,112(r1) c0000000000432e4: li r6,1 c0000000000432e8: addi r1,r1,128 <-- pop stack frame c0000000000432ec: ld r0,16(r1) c0000000000432f0: mtlr r0 c0000000000432f4: b <.save_context_stack> <-- tail call optimized save_context_stack ends up with a stack pointer below the current one, and it is likely to be scribbled over. Fix this by making __get_SP() a function which returns the callers stack frame. Also replace inline assembly which grabs the stack pointer in save_stack_trace and show_stack with __get_SP(). This also fixes an issue with perf_arch_fetch_caller_regs(). It currently unwinds the stack once, which will skip a valid stack frame on a leaf function. With the __get_SP() fixes in this patch, we never need to unwind the stack frame to get to the first interesting frame. We have to export __get_SP() because perf_arch_fetch_caller_regs() (which is used in modules) calls it from a header file. Reported-by: Li Zhong <zhong@linux.vnet.ibm.com> Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
e1802b06 |
|
19-Aug-2014 |
Anton Blanchard <anton@samba.org> |
powerpc: Move more symbol exports next to function definitions Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
#
23f66e2d |
|
27-Aug-2014 |
Tejun Heo <tj@kernel.org> |
Revert "powerpc: Replace __get_cpu_var uses" This reverts commit 5828f666c069af74e00db21559f1535103c9f79a due to build failure after merging with pending powerpc changes. Link: http://lkml.kernel.org/g/20140827142243.6277eaff@canb.auug.org.au Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
5828f666 |
|
16-Aug-2014 |
Christoph Lameter <cl@linux.com> |
powerpc: Replace __get_cpu_var uses __get_cpu_var() is used for multiple purposes in the kernel source. One of them is address calculation via the form &__get_cpu_var(x). This calculates the address for the instance of the percpu variable of the current processor based on an offset. Other use cases are for storing and retrieving data from the current processors percpu area. __get_cpu_var() can be used as an lvalue when writing data or on the right side of an assignment. __get_cpu_var() is defined as : #define __get_cpu_var(var) (*this_cpu_ptr(&(var))) __get_cpu_var() always only does an address determination. However, store and retrieve operations could use a segment prefix (or global register on other platforms) to avoid the address calculation. this_cpu_write() and this_cpu_read() can directly take an offset into a percpu area and use optimized assembly code to read and write per cpu variables. This patch converts __get_cpu_var into either an explicit address calculation using this_cpu_ptr() or into a use of this_cpu operations that use the offset. Thereby address calculations are avoided and less registers are used when code is generated. At the end of the patch set all uses of __get_cpu_var have been removed so the macro is removed too. The patch set includes passes over all arches as well. Once these operations are used throughout then specialized macros can be defined in non -x86 arches as well in order to optimize per cpu access by f.e. using a global register that may be set to the per cpu base. Transformations done to __get_cpu_var() 1. Determine the address of the percpu instance of the current processor. DEFINE_PER_CPU(int, y); int *x = &__get_cpu_var(y); Converts to int *x = this_cpu_ptr(&y); 2. Same as #1 but this time an array structure is involved. DEFINE_PER_CPU(int, y[20]); int *x = __get_cpu_var(y); Converts to int *x = this_cpu_ptr(y); 3. Retrieve the content of the current processors instance of a per cpu variable. DEFINE_PER_CPU(int, y); int x = __get_cpu_var(y) Converts to int x = __this_cpu_read(y); 4. Retrieve the content of a percpu struct DEFINE_PER_CPU(struct mystruct, y); struct mystruct x = __get_cpu_var(y); Converts to memcpy(&x, this_cpu_ptr(&y), sizeof(x)); 5. Assignment to a per cpu variable DEFINE_PER_CPU(int, y) __get_cpu_var(y) = x; Converts to __this_cpu_write(y, x); 6. Increment/Decrement etc of a per cpu variable DEFINE_PER_CPU(int, y); __get_cpu_var(y)++ Converts to __this_cpu_inc(y) tj: Folded a fix patch. http://lkml.kernel.org/g/alpine.DEB.2.11.1408172143020.9652@gentwo.org Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> CC: Paul Mackerras <paulus@samba.org> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org>
|
#
9be9be2e |
|
12-Jun-2014 |
Paul Mackerras <paulus@samba.org> |
powerpc: Reduce scariness of interrupt frames in stack traces Some people see things like "Exception: 501" in stack traces in dmesg and assume that means that something has gone badly wrong, when in fact "Exception: 501" just means a device interrupt was taken. This changes "Exception" to "interrupt" to make it clearer that we are just recording the fact of a change in control flow rather than some error condition. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
cec15488 |
|
09-Jul-2014 |
Michael Ellerman <mpe@ellerman.id.au> |
powerpc: Pull out ksp_vsid logic into a helper The previous patch left a bit of a wart in copy_process(). Clean it up a bit by moving the logic out into a helper. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
13b3d13b |
|
09-Jul-2014 |
Michael Ellerman <mpe@ellerman.id.au> |
powerpc: Remove MMU_FTR_SLB We now only support cpus that use an SLB, so we don't need an MMU feature to indicate that. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
96d01610 |
|
05-Jun-2014 |
Sam bobroff <sam.bobroff@au1.ibm.com> |
powerpc: Correct DSCR during TM context switch Correct the DSCR SPR becoming temporarily corrupted if a task is context switched during a transaction. The problem occurs while suspending the task and is caused by saving the DSCR to thread.dscr after it has already been set to the CPU's default value: __switch_to() calls __switch_to_tm() which calls tm_reclaim_task() which calls tm_reclaim_thread() which calls tm_reclaim() where the DSCR is set to the CPU's default __switch_to() calls _switch() where thread.dscr is set to the DSCR When the task is resumed, it's transaction will be doomed (as usual) and the DSCR SPR will be corrupted, although the checkpointed value will be correct. Therefore the DSCR will be immediately corrected by the transaction aborting, unless it has been suspended. In that case the incorrect value can be seen by the task until it resumes the transaction. The fix is to treat the DSCR similarly to the TAR and save it early in __switch_to(). A program exposing the problem is added to the kernel self tests as: tools/testing/selftests/powerpc/tm/tm-resched-dscr. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> CC: <stable@vger.kernel.org> [v3.10+] Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
21f58507 |
|
29-Apr-2014 |
Paul Gortmaker <paul.gortmaker@windriver.com> |
powerpc: Fix smp_processor_id() in preemptible splat in set_breakpoint Currently, on 8641D, which doesn't set CONFIG_HAVE_HW_BREAKPOINT we get the following splat: BUG: using smp_processor_id() in preemptible [00000000] code: login/1382 caller is set_breakpoint+0x1c/0xa0 CPU: 0 PID: 1382 Comm: login Not tainted 3.15.0-rc3-00041-g2aafe1a4d451 #1 Call Trace: [decd5d80] [c0008dc4] show_stack+0x50/0x158 (unreliable) [decd5dc0] [c03c6fa0] dump_stack+0x7c/0xdc [decd5de0] [c01f8818] check_preemption_disabled+0xf4/0x104 [decd5e00] [c00086b8] set_breakpoint+0x1c/0xa0 [decd5e10] [c00d4530] flush_old_exec+0x2bc/0x588 [decd5e40] [c011c468] load_elf_binary+0x2ac/0x1164 [decd5ec0] [c00d35f8] search_binary_handler+0xc4/0x1f8 [decd5ef0] [c00d4ee8] do_execve+0x3d8/0x4b8 [decd5f40] [c001185c] ret_from_syscall+0x0/0x38 --- Exception: c01 at 0xfeee554 LR = 0xfeee7d4 The call path in this case is: flush_thread --> set_debug_reg_defaults --> set_breakpoint --> __get_cpu_var Since preemption is enabled in the cleanup of flush thread, and there is no need to disable it, introduce the distinction between set_breakpoint and __set_breakpoint, leaving only the flush_thread instance as the current user of set_breakpoint. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
04c32a51 |
|
29-Apr-2014 |
Paul Gortmaker <paul.gortmaker@windriver.com> |
powerpc: Drop return value from set_breakpoint as it is unused None of the callers check the return value, so it might as well not have one at all. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
7cedd601 |
|
03-Feb-2014 |
Anton Blanchard <anton@samba.org> |
powerpc: Fix kernel thread creation on ABIv2 Change how we setup registers for ret_from_kernel_thread. In ABIv1, instead of passing a function descriptor in, dereference it and pass the target in directly. Use ppc_global_function_entry to get it right on both ABIv1 and ABIv2. Signed-off-by: Anton Blanchard <anton@samba.org>
|
#
e6b8fd02 |
|
04-Apr-2014 |
Michael Neuling <mikey@neuling.org> |
powerpc/tm: Disable IRQ in tm_recheckpoint We can't take an IRQ when we're about to do a trechkpt as our GPR state is set to user GPR values. We've hit this when running some IBM Java stress tests in the lab resulting in the following dump: cpu 0x3f: Vector: 700 (Program Check) at [c000000007eb3d40] pc: c000000000050074: restore_gprs+0xc0/0x148 lr: 00000000b52a8184 sp: ac57d360 msr: 8000000100201030 current = 0xc00000002c500000 paca = 0xc000000007dbfc00 softe: 0 irq_happened: 0x00 pid = 34535, comm = Pooled Thread # R00 = 00000000b52a8184 R16 = 00000000b3e48fda R01 = 00000000ac57d360 R17 = 00000000ade79bd8 R02 = 00000000ac586930 R18 = 000000000fac9bcc R03 = 00000000ade60000 R19 = 00000000ac57f930 R04 = 00000000f6624918 R20 = 00000000ade79be8 R05 = 00000000f663f238 R21 = 00000000ac218a54 R06 = 0000000000000002 R22 = 000000000f956280 R07 = 0000000000000008 R23 = 000000000000007e R08 = 000000000000000a R24 = 000000000000000c R09 = 00000000b6e69160 R25 = 00000000b424cf00 R10 = 0000000000000181 R26 = 00000000f66256d4 R11 = 000000000f365ec0 R27 = 00000000b6fdcdd0 R12 = 00000000f66400f0 R28 = 0000000000000001 R13 = 00000000ada71900 R29 = 00000000ade5a300 R14 = 00000000ac2185a8 R30 = 00000000f663f238 R15 = 0000000000000004 R31 = 00000000f6624918 pc = c000000000050074 restore_gprs+0xc0/0x148 cfar= c00000000004fe28 dont_restore_vec+0x1c/0x1a4 lr = 00000000b52a8184 msr = 8000000100201030 cr = 24804888 ctr = 0000000000000000 xer = 0000000000000000 trap = 700 This moves tm_recheckpoint to a C function and moves the tm_restore_sprs into that function. It then adds IRQ disabling over the trechkpt critical section. It also sets the TEXASR FS in the signals code to ensure this is never set now that we explictly write the TM sprs in tm_recheckpoint. Signed-off-by: Michael Neuling <mikey@neuling.org> cc: stable@vger.kernel.org Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
621b5060 |
|
02-Mar-2014 |
Michael Neuling <mikey@neuling.org> |
powerpc/tm: Fix crash when forking inside a transaction When we fork/clone we currently don't copy any of the TM state to the new thread. This results in a TM bad thing (program check) when the new process is switched in as the kernel does a tmrechkpt with TEXASR FS not set. Also, since R1 is from userspace, we trigger the bad kernel stack pointer detection. So we end up with something like this: Bad kernel stack pointer 0 at c0000000000404fc cpu 0x2: Vector: 700 (Program Check) at [c00000003ffefd40] pc: c0000000000404fc: restore_gprs+0xc0/0x148 lr: 0000000000000000 sp: 0 msr: 9000000100201030 current = 0xc000001dd1417c30 paca = 0xc00000000fe00800 softe: 0 irq_happened: 0x01 pid = 0, comm = swapper/2 WARNING: exception is not recoverable, can't continue The below fixes this by flushing the TM state before we copy the task_struct to the clone. To do this we go through the tmreclaim patch, which removes the checkpointed registers from the CPU and transitions the CPU out of TM suspend mode. Hence we need to call tmrechkpt after to restore the checkpointed state and the TM mode for the current task. To make this fail from userspace is simply: tbegin li r0, 2 sc <boom> Kudos to Adhemerval Zanella Neto for finding this. Signed-off-by: Michael Neuling <mikey@neuling.org> cc: Adhemerval Zanella Neto <azanella@br.ibm.com> cc: stable@vger.kernel.org Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
1c430c06 |
|
21-Jan-2014 |
Andreas Schwab <schwab@linux-m68k.org> |
powerpc: Fix hw breakpoints on !HAVE_HW_BREAKPOINT configurations This fixes a logic error that caused a failure to update the hw breakpoint registers when not using the hw-breakpoint interface. Signed-off-by: Andreas Schwab <schwab@linux-m68k.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
d31626f7 |
|
12-Jan-2014 |
Paul Mackerras <paulus@samba.org> |
powerpc: Don't corrupt transactional state when using FP/VMX in kernel Currently, when we have a process using the transactional memory facilities on POWER8 (that is, the processor is in transactional or suspended state), and the process enters the kernel and the kernel then uses the floating-point or vector (VMX/Altivec) facility, we end up corrupting the user-visible FP/VMX/VSX state. This happens, for example, if a page fault causes a copy-on-write operation, because the copy_page function will use VMX to do the copy on POWER8. The test program below demonstrates the bug. The bug happens because when FP/VMX state for a transactional process is stored in the thread_struct, we store the checkpointed state in .fp_state/.vr_state and the transactional (current) state in .transact_fp/.transact_vr. However, when the kernel wants to use FP/VMX, it calls enable_kernel_fp() or enable_kernel_altivec(), which saves the current state in .fp_state/.vr_state. Furthermore, when we return to the user process we return with FP/VMX/VSX disabled. The next time the process uses FP/VMX/VSX, we don't know which set of state (the current register values, .fp_state/.vr_state, or .transact_fp/.transact_vr) we should be using, since we have no way to tell if we are still in the same transaction, and if not, whether the previous transaction succeeded or failed. Thus it is necessary to strictly adhere to the rule that if FP has been enabled at any point in a transaction, we must keep FP enabled for the user process with the current transactional state in the FP registers, until we detect that it is no longer in a transaction. Similarly for VMX; once enabled it must stay enabled until the process is no longer transactional. In order to keep this rule, we add a new thread_info flag which we test when returning from the kernel to userspace, called TIF_RESTORE_TM. This flag indicates that there is FP/VMX/VSX state to be restored before entering userspace, and when it is set the .tm_orig_msr field in the thread_struct indicates what state needs to be restored. The restoration is done by restore_tm_state(). The TIF_RESTORE_TM bit is set by new giveup_fpu/altivec_maybe_transactional helpers, which are called from enable_kernel_fp/altivec, giveup_vsx, and flush_fp/altivec_to_thread instead of giveup_fpu/altivec. The other thing to be done is to get the transactional FP/VMX/VSX state from .fp_state/.vr_state when doing reclaim, if that state has been saved there by giveup_fpu/altivec_maybe_transactional. Having done this, we set the FP/VMX bit in the thread's MSR after reclaim to indicate that that part of the state is now valid (having been reclaimed from the processor's checkpointed state). Finally, in the signal handling code, we move the clearing of the transactional state bits in the thread's MSR a bit earlier, before calling flush_fp_to_thread(), so that we don't unnecessarily set the TIF_RESTORE_TM bit. This is the test program: /* Michael Neuling 4/12/2013 * * See if the altivec state is leaked out of an aborted transaction due to * kernel vmx copy loops. * * gcc -m64 htm_vmxcopy.c -o htm_vmxcopy * */ /* We don't use all of these, but for reference: */ int main(int argc, char *argv[]) { long double vecin = 1.3; long double vecout; unsigned long pgsize = getpagesize(); int i; int fd; int size = pgsize*16; char tmpfile[] = "/tmp/page_faultXXXXXX"; char buf[pgsize]; char *a; uint64_t aborted = 0; fd = mkstemp(tmpfile); assert(fd >= 0); memset(buf, 0, pgsize); for (i = 0; i < size; i += pgsize) assert(write(fd, buf, pgsize) == pgsize); unlink(tmpfile); a = mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0); assert(a != MAP_FAILED); asm __volatile__( "lxvd2x 40,0,%[vecinptr] ; " // set 40 to initial value TBEGIN "beq 3f ;" TSUSPEND "xxlxor 40,40,40 ; " // set 40 to 0 "std 5, 0(%[map]) ;" // cause kernel vmx copy page TABORT TRESUME TEND "li %[res], 0 ;" "b 5f ;" "3: ;" // Abort handler "li %[res], 1 ;" "5: ;" "stxvd2x 40,0,%[vecoutptr] ; " : [res]"=r"(aborted) : [vecinptr]"r"(&vecin), [vecoutptr]"r"(&vecout), [map]"r"(a) : "memory", "r0", "r3", "r4", "r5", "r6", "r7"); if (aborted && (vecin != vecout)){ printf("FAILED: vector state leaked on abort %f != %f\n", (double)vecin, (double)vecout); exit(1); } munmap(a, size); close(fd); printf("PASSED!\n"); return 0; } Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
c141611f |
|
08-Jan-2014 |
Paul Gortmaker <paul.gortmaker@windriver.com> |
powerpc: Delete non-required instances of include <linux/init.h> None of these files are actually using any __init type directives and hence don't need to include <linux/init.h>. Most are just a left over from __devinit and __cpuinit removal, or simply due to code getting copied from one driver to the next. The one instance where we add an include for init.h covers off a case where that file was implicitly getting it from another header which itself didn't need it. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
640e9225 |
|
10-Dec-2013 |
Joseph Myers <joseph@codesourcery.com> |
powerpc: fix exception clearing in e500 SPE float emulation The e500 SPE floating-point emulation code clears existing exceptions (__FPU_FPSCR &= ~FP_EX_MASK;) before ORing in the exceptions from the emulated operation. However, these exception bits are the "sticky", cumulative exception bits, and should only be cleared by the user program setting SPEFSCR, not implicitly by any floating-point instruction (whether executed purely by the hardware or emulated). The spurious clearing of these bits shows up as missing exceptions in glibc testing. Fixing this, however, is not as simple as just not clearing the bits, because while the bits may be from previous floating-point operations (in which case they should not be cleared), the processor can also set the sticky bits itself before the interrupt for an exception occurs, and this can happen in cases when IEEE 754 semantics are that the sticky bit should not be set. Specifically, the "invalid" sticky bit is set in various cases with non-finite operands, where IEEE 754 semantics do not involve raising such an exception, and the "underflow" sticky bit is set in cases of exact underflow, whereas IEEE 754 semantics are that this flag is set only for inexact underflow. Thus, for correct emulation the kernel needs to know the setting of these two sticky bits before the instruction being emulated. When a floating-point operation raises an exception, the kernel can note the state of the sticky bits immediately afterwards. Some <fenv.h> functions that affect the state of these bits, such as fesetenv and feholdexcept, need to use prctl with PR_GET_FPEXC and PR_SET_FPEXC anyway, and so it is natural to record the state of those bits during that call into the kernel and so avoid any need for a separate call into the kernel to inform it of a change to those bits. Thus, the interface I chose to use (in this patch and the glibc port) is that one of those prctl calls must be made after any userspace change to those sticky bits, other than through a floating-point operation that traps into the kernel anyway. feclearexcept and fesetexceptflag duly make those calls, which would not be required were it not for this issue. The previous EGLIBC port, and the uClibc code copied from it, is fundamentally broken as regards any use of prctl for floating-point exceptions because it didn't use the PR_FP_EXC_SW_ENABLE bit in its prctl calls (and did various worse things, such as passing a pointer when prctl expected an integer). If you avoid anything where prctl is used, the clearing of sticky bits still means it will never give anything approximating correct exception semantics with existing kernels. I don't believe the patch makes things any worse for existing code that doesn't try to inform the kernel of changes to sticky bits - such code may get incorrect exceptions in some cases, but it would have done so anyway in other cases. Signed-off-by: Joseph Myers <joseph@codesourcery.com> Signed-off-by: Scott Wood <scottwood@freescale.com>
|
#
f5f97210 |
|
22-Nov-2013 |
Scott Wood <scottwood@freescale.com> |
powerpc/kvm/booke: Fix build break due to stack frame size warning Commit ce11e48b7fdd256ec68b932a89b397a790566031 ("KVM: PPC: E500: Add userspace debug stub support") added "struct thread_struct" to the stack of kvmppc_vcpu_run(). thread_struct is 1152 bytes on my build, compared to 48 bytes for the recently-introduced "struct debug_reg". Use the latter instead. This fixes the following error: cc1: warnings being treated as errors arch/powerpc/kvm/booke.c: In function 'kvmppc_vcpu_run': arch/powerpc/kvm/booke.c:760:1: error: the frame size of 1424 bytes is larger than 1024 bytes make[2]: *** [arch/powerpc/kvm/booke.o] Error 1 make[1]: *** [arch/powerpc/kvm] Error 2 make[1]: *** Waiting for unfinished jobs.... Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
6d888d1a |
|
17-Nov-2013 |
Anton Blanchard <anton@samba.org> |
powerpc: Only print PACATMSCRATCH in oops when TM is active If TM is not active there is no need to print PACATMSCRATCH so we can save ourselves a line. Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
9db8bcfd |
|
14-Nov-2013 |
Anton Blanchard <anton@samba.org> |
powerpc: Remove a few lines of oops output We waste quite a few lines in our oops output: ... MSR: 8000000000009032 <SF,EE,ME,IR,DR,RI> CR: 28044024 XER: 00000000 SOFTE: 0 CFAR: 0000000000009088 DAR: 000000000000001c, DSISR: 40000000 GPR00: c0000000000c74f0 c00000037cc1b010 c000000000d2bb30 0000000000000000 ... We can do a better job here and remove 3 lines: MSR: 8000000000009032 <SF,EE,ME,IR,DR,RI> CR: 28044024 XER: 00000000 CFAR: 0000000000009088 DAR: 0000000000000010, DSISR: 40000000 SOFTE: 1 GPR00: c0000000000e3d10 c00000037cc2fda0 c000000000d2c3a8 0000000000000001 Also move PACATMSCRATCH up, it doesn't really belong in the stack trace section. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
c5400649 |
|
14-Nov-2013 |
Anton Blanchard <anton@samba.org> |
powerpc: Print DAR and DSISR on machine check oopses Machine check exceptions set DAR and DSISR, so print them in our oops output. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
94af3abf |
|
20-Nov-2013 |
Rusty Russell <rusty@rustcorp.com.au> |
powerpc: ELF2 binaries launched directly. No function descriptor, but we set r12 up and set TIF_RESTOREALL as it normally isn't restored on return from syscall. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
7ba5fef7 |
|
02-Oct-2013 |
Michael Neuling <mikey@neuling.org> |
powerpc/tm: Remove interrupt disable in __switch_to() We currently turn IRQs off in __switch_to(0 but this is unnecessary as it's already disabled in the caller. This removes the IRQ disable but adds a check to make sure it is really off in case this changes in future. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
3743c9b8 |
|
03-Jul-2013 |
Bharat Bhushan <r65777@freescale.com> |
powerpc: export debug registers save function for KVM KVM need this function when switching from vcpu to user-space thread. My subsequent patch will use this function. Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com> Acked-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Scott Wood <scottwood@freescale.com>
|
#
51ae8d4a |
|
04-Jul-2013 |
Bharat Bhushan <r65777@freescale.com> |
powerpc: move debug registers in a structure This way we can use same data type struct with KVM and also help in using other debug related function. Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com> Acked-by: Michael Neuling <mikey@neuling.org> [scottwood@freescale.com: removed obvious debug_reg comment] Signed-off-by: Scott Wood <scottwood@freescale.com>
|
#
660970fe |
|
04-Jul-2013 |
Bharat Bhushan <r65777@freescale.com> |
powerpc: remove unnecessary line continuations Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com> Acked-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Scott Wood <scottwood@freescale.com>
|
#
fc82cf11 |
|
03-Jul-2013 |
Bharat Bhushan <r65777@freescale.com> |
powerpc: export debug registers save function for KVM KVM need this function when switching from vcpu to user-space thread. My subsequent patch will use this function. Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com> Acked-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
95791988 |
|
25-Jun-2013 |
Bharat Bhushan <r65777@freescale.com> |
powerpc: move debug registers in a structure This way we can use same data type struct with KVM and also help in using other debug related function. Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
6b3e3b31 |
|
25-Jun-2013 |
Bharat Bhushan <r65777@freescale.com> |
powerpc: remove unnecessary line continuations Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
18461960 |
|
10-Sep-2013 |
Paul Mackerras <paulus@samba.org> |
powerpc: Provide for giveup_fpu/altivec to save state in alternate location This provides a facility which is intended for use by KVM, where the contents of the FP/VSX and VMX (Altivec) registers can be saved away to somewhere other than the thread_struct when kernel code wants to use floating point or VMX instructions. This is done by providing a pointer in the thread_struct to indicate where the state should be saved to. The giveup_fpu() and giveup_altivec() functions test these pointers and save state to the indicated location if they are non-NULL. Note that the MSR_FP/VEC bits in task->thread.regs->msr are still used to indicate whether the CPU register state is live, even when an alternate save location is being used. This also provides load_fp_state() and load_vr_state() functions, which load up FP/VSX and VMX state from memory into the CPU registers, and corresponding store_fp_state() and store_vr_state() functions, which store FP/VSX and VMX state into memory from the CPU registers. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
de79f7b9 |
|
10-Sep-2013 |
Paul Mackerras <paulus@samba.org> |
powerpc: Put FP/VSX and VR state into structures This creates new 'thread_fp_state' and 'thread_vr_state' structures to store FP/VSX state (including FPSCR) and Altivec/VSX state (including VSCR), and uses them in the thread_struct. In the thread_fp_state, the FPRs and VSRs are represented as u64 rather than double, since we rarely perform floating-point computations on the values, and this will enable the structures to be used in KVM code as well. Similarly FPSCR is now a u64 rather than a structure of two 32-bit values. This takes the offsets out of the macros such as SAVE_32FPRS, REST_32FPRS, etc. This enables the same macros to be used for normal and transactional state, enabling us to delete the transactional versions of the macros. This also removes the unused do_load_up_fpu and do_load_up_altivec, which were in fact buggy since they didn't create large enough stack frames to account for the fact that load_up_fpu and load_up_altivec are not designed to be called from C and assume that their caller's stack frame is an interrupt frame. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
cbc9565e |
|
23-Sep-2013 |
Benjamin Herrenschmidt <benh@kernel.crashing.org> |
powerpc: Remove ksp_limit on ppc64 We've been keeping that field in thread_struct for a while, it contains the "limit" of the current stack pointer and is meant to be used for detecting stack overflows. It has a few problems however: - First, it was never actually *used* on 64-bit. Set and updated but not actually exploited - When switching stack to/from irq and softirq stacks, it's update is racy unless we hard disable interrupts, which is costly. This is fine on 32-bit as we don't soft-disable there but not on 64-bit. Thus rather than fixing 2 in order to implement 1 in some hypothetical future, let's remove the code completely from 64-bit. In order to avoid a clutter of ifdef's, we remove the updates from C code completely during interrupt stack switching, and instead maintain it from the asm helper that is used to do the stack switching in the first place. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
037f0eed |
|
14-Jul-2013 |
Kevin Hao <haokexin@gmail.com> |
powerpc: Make flush_fp_to_thread() nop when CONFIG_PPC_FPU is disabled In the current kernel, the function flush_fp_to_thread() is not dependent on CONFIG_PPC_FPU. So most invocations of this function is not wrapped by CONFIG_PPC_FPU. Even through we don't really save the FPRs to the thread struct if CONFIG_PPC_FPU is not enabled, but there does have some runtime overhead such as the check for tsk->thread.regs and preempt disable and enable. It really make no sense to do that. So make it a nop when CONFIG_PPC_FPU is disabled. Also remove the wrapped #ifdef CONFIG_PPC_FPU when invoking this function. Signed-off-by: Kevin Hao <haokexin@gmail.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
c2d52644 |
|
09-Aug-2013 |
Michael Neuling <mikey@neuling.org> |
powerpc: Save the TAR register earlier This moves us to save the Target Address Register (TAR) a earlier in __switch_to. It introduces a new function save_tar() to do this. We need to save the TAR earlier as we will overwrite it in the transactional memory reclaim/recheckpoint path. We are going to do this in a subsequent patch which will fix saving the TAR register when it's modified inside a transaction. Signed-off-by: Michael Neuling <mikey@neuling.org> Cc: <stable@vger.kernel.org> [v3.10] Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
330a1eb7 |
|
28-Jun-2013 |
Michael Ellerman <michael@ellerman.id.au> |
powerpc/perf: Core EBB support for 64-bit book3s Add support for EBB (Event Based Branches) on 64-bit book3s. See the included documentation for more details. EBBs are a feature which allows the hardware to branch directly to a specified user space address when a PMU event overflows. This can be used by programs for self-monitoring with no kernel involvement in the inner loop. Most of the logic is in the generic book3s code, primarily to avoid a proliferation of PMU callbacks. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
0e37739b |
|
13-Jun-2013 |
Michael Ellerman <michael@ellerman.id.au> |
powerpc: Fix stack overflow crash in resume_kernel when ftracing It's possible for us to crash when running with ftrace enabled, eg: Bad kernel stack pointer bffffd12 at c00000000000a454 cpu 0x3: Vector: 300 (Data Access) at [c00000000ffe3d40] pc: c00000000000a454: resume_kernel+0x34/0x60 lr: c00000000000335c: performance_monitor_common+0x15c/0x180 sp: bffffd12 msr: 8000000000001032 dar: bffffd12 dsisr: 42000000 If we look at current's stack (paca->__current->stack) we see it is equal to c0000002ecab0000. Our stack is 16K, and comparing to paca->kstack (c0000002ecab3e30) we can see that we have overflowed our kernel stack. This leads to us writing over our struct thread_info, and in this case we have corrupted thread_info->flags and set _TIF_EMULATE_STACK_STORE. Dumping the stack we see: 3:mon> t c0000002ecab0000 [c0000002ecab0000] c00000000002131c .performance_monitor_exception+0x5c/0x70 [c0000002ecab0080] c00000000000335c performance_monitor_common+0x15c/0x180 --- Exception: f01 (Performance Monitor) at c0000000000fb2ec .trace_hardirqs_off+0x1c/0x30 [c0000002ecab0370] c00000000016fdb0 .trace_graph_entry+0xb0/0x280 (unreliable) [c0000002ecab0410] c00000000003d038 .prepare_ftrace_return+0x98/0x130 [c0000002ecab04b0] c00000000000a920 .ftrace_graph_caller+0x14/0x28 [c0000002ecab0520] c0000000000d6b58 .idle_cpu+0x18/0x90 [c0000002ecab05a0] c00000000000a934 .return_to_handler+0x0/0x34 [c0000002ecab0620] c00000000001e660 .timer_interrupt+0x160/0x300 [c0000002ecab06d0] c0000000000025dc decrementer_common+0x15c/0x180 --- Exception: 901 (Decrementer) at c0000000000104d4 .arch_local_irq_restore+0x74/0xa0 [c0000002ecab09c0] c0000000000fe044 .trace_hardirqs_on+0x14/0x30 (unreliable) [c0000002ecab0fb0] c00000000016fe3c .trace_graph_entry+0x13c/0x280 [c0000002ecab1050] c00000000003d038 .prepare_ftrace_return+0x98/0x130 [c0000002ecab10f0] c00000000000a920 .ftrace_graph_caller+0x14/0x28 [c0000002ecab1160] c0000000000161f0 .__ppc64_runlatch_on+0x10/0x40 [c0000002ecab11d0] c00000000000a934 .return_to_handler+0x0/0x34 --- Exception: 901 (Decrementer) at c0000000000104d4 .arch_local_irq_restore+0x74/0xa0 ... and so on __ppc64_runlatch_on() is called from RUNLATCH_ON in the exception entry path. At that point the irq state is not consistent, ie. interrupts are hard disabled (by the exception entry), but the paca soft-enabled flag may be out of sync. This leads to the local_irq_restore() in trace_graph_entry() actually enabling interrupts, which we do not want. Because we have not yet reprogrammed the decrementer we immediately take another decrementer exception, and recurse. The fix is twofold. Firstly make sure we call DISABLE_INTS before calling RUNLATCH_ON. The badly named DISABLE_INTS actually reconciles the irq state in the paca with the hardware, making it safe again to call local_irq_save/restore(). Although that should be sufficient to fix the bug, we also mark the runlatch routines as notrace. They are called very early in the exception entry and we are asking for trouble tracing them. They are also fairly uninteresting and tracing them just adds unnecessary overhead. [ This regression was introduced by fe1952fc0afb9a2e4c79f103c08aef5d13db1873 "powerpc: Rework runlatch code" by myself --BenH ] CC: <stable@vger.kernel.org> [v3.4+] Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
82a9f16a |
|
16-May-2013 |
Michael Neuling <mikey@neuling.org> |
powerpc/hw_breakpoints: Add DABRX cpu feature to fix 32-bit regression When introducing support for DABRX in 4474ef0, we broke older 32-bit CPUs that don't have that register. Some CPUs have a DABR but not DABRX. Configuration are: - No 32bit CPUs have DABRX but some have DABR. - POWER4+ and below have the DABR but no DABRX. - 970 and POWER5 and above have DABR and DABRX. - POWER8 has DAWR, hence no DABRX. This introduces CPU_FTR_DABRX and sets it on appropriate CPUs. We use the top 64 bits for CPU FTR bits since only 64 bit CPUs have this. Processors that don't have the DABRX will still work as they will fall back to software filtering these breakpoints via perf_exclude_event(). Signed-off-by: Michael Neuling <mikey@neuling.org> Reported-by: "Gorelik, Jacob (335F)" <jacob.gorelik@jpl.nasa.gov> cc: stable@vger.kernel.org (v3.9 only) Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
6cecf76b |
|
13-May-2013 |
Scott Wood <scottwood@freescale.com> |
powerpc/booke64: Fix kernel hangs at kernel_dbg_exc MSR_DE is not cleared on entry to the kernel, and we don't clear it explicitly outside of debug code. If we have MSR_DE set in prime_debug_regs(), and the new thread has events enabled in DBCR0 (e.g. ICMP is set in thread->dbsr0, even though it was cleared in the real DBCR0 when the thread got scheduled out), we'll end up taking a debug exception in the kernel when DBCR0 is loaded. DSRR0 will not point to an exception vector, and the kernel ends up hanging at kernel_dbg_exc. Fix this by always clearing MSR_DE when we load new debug state. Another observed source of kernel_dbg_exc hangs is with the branch taken event. If this event is active, but we take a non-debug trap (e.g. a TLB miss or an asynchronous interrupt) before the next branch. We end up taking a branch-taken debug exception on the initial branch instruction of the exception vector, but because the debug exception is DBSR_BT rather than DBSR_IC we branch to kernel_dbg_exc before even checking the DSRR0 address. Fix this by checking for DBSR_BT as well as DBSR_IC, which is what 32-bit does and what the comments suggest was intended in the 64-bit code as well. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
af945cf4 |
|
06-May-2013 |
Li Zhong <zhong@linux.vnet.ibm.com> |
powerpc: Fix MAX_STACK_TRACE_ENTRIES too low warning again Saw this warning again, and this time from the ret_from_fork path. It seems we could clear the back chain earlier in copy_thread(), which could cover both path, and also fix potential lockdep usage in schedule_tail(), or exception occurred before we clear the back chain. Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
a43cb95d |
|
30-Apr-2013 |
Tejun Heo <tj@kernel.org> |
dump_stack: unify debug information printed by show_regs() show_regs() is inherently arch-dependent but it does make sense to print generic debug information and some archs already do albeit in slightly different forms. This patch introduces a generic function to print debug information from show_regs() so that different archs print out the same information and it's much easier to modify what's printed. show_regs_print_info() prints out the same debug info as dump_stack() does plus task and thread_info pointers. * Archs which didn't print debug info now do. alpha, arc, blackfin, c6x, cris, frv, h8300, hexagon, ia64, m32r, metag, microblaze, mn10300, openrisc, parisc, score, sh64, sparc, um, xtensa * Already prints debug info. Replaced with show_regs_print_info(). The printed information is superset of what used to be there. arm, arm64, avr32, mips, powerpc, sh32, tile, unicore32, x86 * s390 is special in that it used to print arch-specific information along with generic debug info. Heiko and Martin think that the arch-specific extra isn't worth keeping s390 specfic implementation. Converted to use the generic version. Note that now all archs print the debug info before actual register dumps. An example BUG() dump follows. kernel BUG at /work/os/work/kernel/workqueue.c:4841! invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.9.0-rc1-work+ #7 Hardware name: empty empty/S3992, BIOS 080011 10/26/2007 task: ffff88007c85e040 ti: ffff88007c860000 task.ti: ffff88007c860000 RIP: 0010:[<ffffffff8234a07e>] [<ffffffff8234a07e>] init_workqueues+0x4/0x6 RSP: 0000:ffff88007c861ec8 EFLAGS: 00010246 RAX: ffff88007c861fd8 RBX: ffffffff824466a8 RCX: 0000000000000001 RDX: 0000000000000046 RSI: 0000000000000001 RDI: ffffffff8234a07a RBP: ffff88007c861ec8 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff8234a07a R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff88007dc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: ffff88015f7ff000 CR3: 00000000021f1000 CR4: 00000000000007f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Stack: ffff88007c861ef8 ffffffff81000312 ffffffff824466a8 ffff88007c85e650 0000000000000003 0000000000000000 ffff88007c861f38 ffffffff82335e5d ffff88007c862080 ffffffff8223d8c0 ffff88007c862080 ffffffff81c47760 Call Trace: [<ffffffff81000312>] do_one_initcall+0x122/0x170 [<ffffffff82335e5d>] kernel_init_freeable+0x9b/0x1c8 [<ffffffff81c47760>] ? rest_init+0x140/0x140 [<ffffffff81c4776e>] kernel_init+0xe/0xf0 [<ffffffff81c6be9c>] ret_from_fork+0x7c/0xb0 [<ffffffff81c47760>] ? rest_init+0x140/0x140 ... v2: Typo fix in x86-32. v3: CPU number dropped from show_regs_print_info() as dump_stack_print_info() has been updated to print it. s390 specific implementation dropped as requested by s390 maintainers. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Mike Frysinger <vapier@gentoo.org> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Sam Ravnborg <sam@ravnborg.org> Acked-by: Chris Metcalf <cmetcalf@tilera.com> [tile bits] Acked-by: Richard Kuo <rkuo@codeaurora.org> [hexagon bits] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
196779b9 |
|
30-Apr-2013 |
Tejun Heo <tj@kernel.org> |
dump_stack: consolidate dump_stack() implementations and unify their behaviors Both dump_stack() and show_stack() are currently implemented by each architecture. show_stack(NULL, NULL) dumps the backtrace for the current task as does dump_stack(). On some archs, dump_stack() prints extra information - pid, utsname and so on - in addition to the backtrace while the two are identical on other archs. The usages in arch-independent code of the two functions indicate show_stack(NULL, NULL) should print out bare backtrace while dump_stack() is used for debugging purposes when something went wrong, so it does make sense to print additional information on the task which triggered dump_stack(). There's no reason to require archs to implement two separate but mostly identical functions. It leads to unnecessary subtle information. This patch expands the dummy fallback dump_stack() implementation in lib/dump_stack.c such that it prints out debug information (taken from x86) and invokes show_stack(NULL, NULL) and drops arch-specific dump_stack() implementations in all archs except blackfin. Blackfin's dump_stack() does something wonky that I don't understand. Debug information can be printed separately by calling dump_stack_print_info() so that arch-specific dump_stack() implementation can still emit the same debug information. This is used in blackfin. This patch brings the following behavior changes. * On some archs, an extra level in backtrace for show_stack() could be printed. This is because the top frame was determined in dump_stack() on those archs while generic dump_stack() can't do that reliably. It can be compensated by inlining dump_stack() but not sure whether that'd be necessary. * Most archs didn't use to print debug info on dump_stack(). They do now. An example WARN dump follows. WARNING: at kernel/workqueue.c:4841 init_workqueues+0x35/0x505() Hardware name: empty Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.9.0-rc1-work+ #9 0000000000000009 ffff88007c861e08 ffffffff81c614dc ffff88007c861e48 ffffffff8108f50f ffffffff82228240 0000000000000040 ffffffff8234a03c 0000000000000000 0000000000000000 0000000000000000 ffff88007c861e58 Call Trace: [<ffffffff81c614dc>] dump_stack+0x19/0x1b [<ffffffff8108f50f>] warn_slowpath_common+0x7f/0xc0 [<ffffffff8108f56a>] warn_slowpath_null+0x1a/0x20 [<ffffffff8234a071>] init_workqueues+0x35/0x505 ... v2: CPU number added to the generic debug info as requested by s390 folks and dropped the s390 specific dump_stack(). This loses %ksp from the debug message which the maintainers think isn't important enough to keep the s390-specific dump_stack() implementation. dump_stack_print_info() is moved to kernel/printk.c from lib/dump_stack.c. Because linkage is per objecct file, dump_stack_print_info() living in the same lib file as generic dump_stack() means that archs which implement custom dump_stack() - at this point, only blackfin - can't use dump_stack_print_info() as that will bring in the generic version of dump_stack() too. v1 The v1 patch broke build on blackfin due to this issue. The build breakage was reported by Fengguang Wu. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Vineet Gupta <vgupta@synopsys.com> Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> Acked-by: Vineet Gupta <vgupta@synopsys.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [s390 bits] Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Mike Frysinger <vapier@gentoo.org> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Sam Ravnborg <sam@ravnborg.org> Acked-by: Richard Kuo <rkuo@codeaurora.org> [hexagon bits] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
28d170ab |
|
21-Apr-2013 |
Oleg Nesterov <oleg@redhat.com> |
ptrace/powerpc: Don't flush_ptrace_hw_breakpoint() on fork() arch_dup_task_struct() does flush_ptrace_hw_breakpoint(src), this destroys the parent's breakpoints for no reason. We should clear child->thread.ptrace_bps[] copied by dup_task_struct() instead. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
f110c0c1 |
|
09-Apr-2013 |
Michael Neuling <mikey@neuling.org> |
powerpc: fix compiling CONFIG_PPC_TRANSACTIONAL_MEM when CONFIG_ALTIVEC=n We can't compile a kernel with CONFIG_ALTIVEC=n when CONFIG_PPC_TRANSACTIONAL_MEM=y. We currently get: arch/powerpc/kernel/tm.S:320: Error: unsupported relocation against THREAD_VSCR arch/powerpc/kernel/tm.S:323: Error: unsupported relocation against THREAD_VR0 arch/powerpc/kernel/tm.S:323: Error: unsupported relocation against THREAD_VR0 etc. The below fixes this with a sprinkling of #ifdefs. This was found by mpe with kisskb: http://kisskb.ellerman.id.au/kisskb/buildresult/8539442/ Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
|
#
bc2a9408 |
|
13-Feb-2013 |
Michael Neuling <mikey@neuling.org> |
powerpc: Hook in new transactional memory code This hooks the new transactional memory code into context switching, FP/VMX/VMX unavailable and exception return. Signed-off-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
fb09692e |
|
13-Feb-2013 |
Michael Neuling <mikey@neuling.org> |
powerpc: Add reclaim and recheckpoint functions for context switching transactional memory processes When we switch out a task, we need to save both the checkpointed and the speculated state into the thread struct. Similarly when we are switching in a task we need to load both the checkpointed and speculated state. If the task was using FP, we non-lazily reload both the original and the speculative FP register states. This is because the kernel doesn't see if/when a TM rollback occurs, so if we take an FP unavoidable later, we are unable to determine which set of FP regs need to be restored. This simply adds these functions. It doesn't hook them into the existing code yet. Signed-off-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
afc07701 |
|
13-Feb-2013 |
Michael Neuling <mikey@neuling.org> |
powerpc: Add transactional memory paca scratch register to show_regs Add transactional memory paca scratch register to show_regs. This is useful for debugging. Signed-off-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
8b3c34cf |
|
13-Feb-2013 |
Michael Neuling <mikey@neuling.org> |
powerpc: New macros for transactional memory support This adds new macros for saving and restoring checkpointed architected state from and to the thread_struct. It also adds some debugging macros for when your brain explodes trying to debug your transactional memory enabled kernel. Signed-off-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
05d694ea |
|
24-Jan-2013 |
Michael Neuling <mikey@neuling.org> |
powerpc: Add length setting to set_dawr Currently we set the length field in the DAWR to 0 which defaults it to one double word (64bits) which is the same as the DABR. Change this so that we can set it to longer values as supported by the DAWR. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
b9818c33 |
|
10-Jan-2013 |
Michael Neuling <mikey@neuling.org> |
powerpc: Rename set_break to avoid naming conflict With allmodconfig we are getting: drivers/tty/synclink_gt.c:160:12: error: conflicting types for 'set_break' arch/powerpc/include/asm/debug.h:49:5: note: previous declaration of 'set_break' was here drivers/tty/synclinkmp.c:526:12: error: conflicting types for 'set_break' arch/powerpc/include/asm/debug.h:49:5: note: previous declaration of 'set_break' was here This renames set_break to set_breakpoint to avoid this naming conflict Signed-off-by: Michael Neuling <mikey@neuling.org> Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
bf99de36 |
|
20-Dec-2012 |
Michael Neuling <mikey@neuling.org> |
powerpc: Add the DAWR support to the set_break() This adds DAWR supoprt to the set_break(). It does both bare metal and PAPR versions of setting the DAWR. There is still some work we can do to make full use of the watchpoint but that will come later. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
9422de3e |
|
20-Dec-2012 |
Michael Neuling <mikey@neuling.org> |
powerpc: Hardware breakpoints rewrite to handle non DABR breakpoint registers This is a rewrite so that we don't assume we are using the DABR throughout the code. We now use the arch_hw_breakpoint to store the breakpoint in a generic manner in the thread_struct, rather than storing the raw DABR value. The ptrace GET/SET_DEBUGREG interface currently passes the raw DABR in from userspace. We keep this functionality, so that future changes (like the POWER8 DAWR), will still fake the DABR to userspace. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
92779245 |
|
06-Dec-2012 |
Haren Myneni <haren@linux.vnet.ibm.com> |
powerpc: Define ppr in thread_struct [PATCH 4/6] powerpc: Define ppr in thread_struct ppr in thread_struct is used to save PPR and restore it before process exits from kernel. This patch sets the default priority to 3 when tasks are created such that users can use 4 for higher priority tasks. Signed-off-by: Haren Myneni <haren@us.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
afa86fc4 |
|
22-Oct-2012 |
Al Viro <viro@zeniv.linux.org.uk> |
flagday: don't pass regs to copy_thread() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
0bcfe540 |
|
26-Oct-2012 |
Al Viro <viro@zeniv.linux.org.uk> |
powerpc: switch to generic fork/clone/vfork Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
ab75819d |
|
21-Oct-2012 |
Al Viro <viro@zeniv.linux.org.uk> |
powerpc: make fork_idle() take the common "kernel thread" path in copy_thread() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
ea516b11 |
|
21-Oct-2012 |
Al Viro <viro@zeniv.linux.org.uk> |
powerpc: put the "zero usp means using parent's stack pointer" to copy_thread() simplifies callers, at that... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
64c2f659 |
|
21-Oct-2012 |
Al Viro <viro@zeniv.linux.org.uk> |
powerpc: don't bother with CHECK_FULL_REGS in sys_fork() et.al. copy_thread() will do it anyway. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
9d401279 |
|
21-Oct-2012 |
Al Viro <viro@zeniv.linux.org.uk> |
powerpc: don't bother with zero-extending arguments in sys_clone() ... since the syscall glue had been doing that for 9 years already. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
53b50f94 |
|
21-Oct-2012 |
Al Viro <viro@zeniv.linux.org.uk> |
powerpc: take dereferencing to ret_from_kernel_thread() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
40792104 |
|
11-Oct-2012 |
Al Viro <viro@zeniv.linux.org.uk> |
powerpc: don't mess with r2 in copy_thread() and friends kernel_thread() callbacks are *not* in modules and are not going to be there. And it's not even read in ppc32 ret_from_kernel_thread(), so no need to bother with it there either. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
138d1ce8 |
|
11-Oct-2012 |
Al Viro <viro@zeniv.linux.org.uk> |
powerpc: switch to saner kernel_execve() semantics Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
be6abfa7 |
|
31-Aug-2012 |
Al Viro <viro@zeniv.linux.org.uk> |
powerpc: switch to generic sys_execve()/kernel_execve() the only non-obvious part is that current_pt_regs() is really needed here - task_pt_regs() is NULL for kernel threads; it's OK for ptrace uses (the thing task_pt_regs() is intended for), but not for us. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
58254e10 |
|
12-Sep-2012 |
Al Viro <viro@zeniv.linux.org.uk> |
powerpc: split ret_from_fork ... and get rid of in-kernel syscalls in kernel_thread() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
4474ef05 |
|
06-Sep-2012 |
Michael Neuling <mikey@neuling.org> |
powerpc: Rework set_dabr so it can take a DABRX value as well Rework set_dabr to take a DABRX value as well. Both the pseries and PS3 hypervisors do some checks on the DABRX values that are passed in the hcall. This patch stops bogus values from being passed to hypervisor. Also, in the case where we are clearing the breakpoint, where DABR and DABRX are zero, we modify the DABRX value to make it valid so that the hcall won't fail. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
1021cb26 |
|
03-Sep-2012 |
Anton Blanchard <anton@samba.org> |
powerpc: Fix DSCR inheritance in copy_thread() If the default DSCR is non zero we set thread.dscr_inherit in copy_thread() meaning the new thread and all its children will ignore future updates to the default DSCR. This is not intended and is a change in behaviour that a number of our users have hit. We just need to inherit thread.dscr and thread.dscr_inherit from the parent which ends up being much simpler. This was found with the following test case: http://ozlabs.org/~anton/junkcode/dscr_default_test.c Signed-off-by: Anton Blanchard <anton@samba.org> Cc: <stable@kernel.org> # 3.0+ Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
41ab5266 |
|
23-Aug-2012 |
Ananth N Mavinakayanahalli <ananth@in.ibm.com> |
powerpc: Add trap_nr to thread_struct Add thread_struct.trap_nr and use it to store the last exception the thread experienced. In this patch, we populate the field at various places where we force_sig_info() to the process. This is also used in uprobes to determine if the probed instruction caused an exception. Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
baa36046 |
|
18-Jun-2012 |
Frederic Weisbecker <fweisbec@gmail.com> |
cputime: Consolidate vtime handling on context switch The archs that implement virtual cputime accounting all flush the cputime of a task when it gets descheduled and sometimes set up some ground initialization for the next task to account its cputime. These archs all put their own hooks in their context switch callbacks and handle the off-case themselves. Consolidate this by creating a new account_switch_vtime() callback called in generic code right after a context switch and that these archs must implement to flush the prev task cputime and initialize the next task cputime related state. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org>
|
#
55ccf3fe |
|
16-May-2012 |
Suresh Siddha <suresh.b.siddha@intel.com> |
fork: move the real prepare_to_copy() users to arch_dup_task_struct() Historical prepare_to_copy() is mostly a no-op, duplicated for majority of the architectures and the rest following the x86 model of flushing the extended register state like fpu there. Remove it and use the arch_dup_task_struct() instead. Suggested-by: Oleg Nesterov <oleg@redhat.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/1336692811-30576-1-git-send-email-suresh.b.siddha@intel.com Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Howells <dhowells@redhat.com> Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Chris Zankel <chris@zankel.net> Cc: Richard Henderson <rth@twiddle.net> Cc: Russell King <linux@arm.linux.org.uk> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Mike Frysinger <vapier@gentoo.org> Cc: Mark Salter <msalter@redhat.com> Cc: Aurelien Jacquiot <a-jacquiot@ti.com> Cc: Mikael Starvik <starvik@axis.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Jonas Bonn <jonas@southpole.se> Cc: James E.J. Bottomley <jejb@parisc-linux.org> Cc: Helge Deller <deller@gmx.de> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Chen Liqin <liqin.chen@sunplusct.com> Cc: Lennox Wu <lennox.wu@gmail.com> Cc: David S. Miller <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
#
96c95117 |
|
05-May-2012 |
Thomas Gleixner <tglx@linutronix.de> |
powerpc: Use common threadinfo allocator The core now has a threadinfo allocator which uses a kmemcache when THREAD_SIZE < PAGE_SIZE. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Link: http://lkml.kernel.org/r/20120505150142.059161130@linutronix.de
|
#
35000870 |
|
15-Apr-2012 |
Anton Blanchard <anton@samba.org> |
powerpc: Optimise enable_kernel_altivec Add two optimisations to enable_kernel_altivec: - enable_kernel_altivec has already determined if we need to save the previous task's state but we call giveup_altivec in both cases, requiring an extra branch in giveup_altivec. Create giveup_altivec_notask which only turns on the VMX bit in the MSR. - We write the VMX MSR bit each time we call enable_kernel_altivec even it was already set. Check the bit and branch out if we have already set it. The classic case for this is vectored IO where we have to copy multiple buffers to or from userspace. The following testcase was used to confirm this patch improves performance: http://ozlabs.org/~anton/junkcode/copy_to_user.c Since the current breakpoint for using VMX in copy_tofrom_user is 4096 bytes, I'm using buffers of 4096 + 1 cacheline (4224) bytes. A benchmark of 16 entry readvs (-s 16): time copy_to_user -l 4224 -s 16 -i 1000000 completes 5.2% faster on a POWER7 PS700. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
fae2e0fb |
|
10-Apr-2012 |
Benjamin Herrenschmidt <benh@kernel.crashing.org> |
powerpc: Fix typo in runlatch code Commit fe1952fc0afb9a2e4c79f103c08aef5d13db1873 "powerpc: Rework runlatch code" has a nasty typo where it uses "TLF_RUNLATCH" instead of "_TLF_RUNLATCH" (bit number instead of bit mask), causing some flags to be potentially lost such as _TLF_RESTORE_SIGMASK (Brown paper bag for me ! We should be able to make that break at compile time with a bit of magic, any volunteer ?) Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
ae3a197e |
|
28-Mar-2012 |
David Howells <dhowells@redhat.com> |
Disintegrate asm/system.h for PowerPC Disintegrate asm/system.h for PowerPC. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> cc: linuxppc-dev@lists.ozlabs.org
|
#
7230c564 |
|
06-Mar-2012 |
Benjamin Herrenschmidt <benh@kernel.crashing.org> |
powerpc: Rework lazy-interrupt handling The current implementation of lazy interrupts handling has some issues that this tries to address. We don't do the various workarounds we need to do when re-enabling interrupts in some cases such as when returning from an interrupt and thus we may still lose or get delayed decrementer or doorbell interrupts. The current scheme also makes it much harder to handle the external "edge" interrupts provided by some BookE processors when using the EPR facility (External Proxy) and the Freescale Hypervisor. Additionally, we tend to keep interrupts hard disabled in a number of cases, such as decrementer interrupts, external interrupts, or when a masked decrementer interrupt is pending. This is sub-optimal. This is an attempt at fixing it all in one go by reworking the way we do the lazy interrupt disabling from the ground up. The base idea is to replace the "hard_enabled" field with a "irq_happened" field in which we store a bit mask of what interrupt occurred while soft-disabled. When re-enabling, either via arch_local_irq_restore() or when returning from an interrupt, we can now decide what to do by testing bits in that field. We then implement replaying of the missed interrupts either by re-using the existing exception frame (in exception exit case) or via the creation of a new one from an assembly trampoline (in the arch_local_irq_enable case). This removes the need to play with the decrementer to try to create fake interrupts, among others. In addition, this adds a few refinements: - We no longer hard disable decrementer interrupts that occur while soft-disabled. We now simply bump the decrementer back to max (on BookS) or leave it stopped (on BookE) and continue with hard interrupts enabled, which means that we'll potentially get better sample quality from performance monitor interrupts. - Timer, decrementer and doorbell interrupts now hard-enable shortly after removing the source of the interrupt, which means they no longer run entirely hard disabled. Again, this will improve perf sample quality. - On Book3E 64-bit, we now make the performance monitor interrupt act as an NMI like Book3S (the necessary C code for that to work appear to already be present in the FSL perf code, notably calling nmi_enter instead of irq_enter). (This also fixes a bug where BookE perfmon interrupts could clobber r14 ... oops) - We could make "masked" decrementer interrupts act as NMIs when doing timer-based perf sampling to improve the sample quality. Signed-off-by-yet: Benjamin Herrenschmidt <benh@kernel.crashing.org> --- v2: - Add hard-enable to decrementer, timer and doorbells - Fix CR clobber in masked irq handling on BookE - Make embedded perf interrupt act as an NMI - Add a PACA_HAPPENED_EE_EDGE for use by FSL if they want to retrigger an interrupt without preventing hard-enable v3: - Fix or vs. ori bug on Book3E - Fix enabling of interrupts for some exceptions on Book3E v4: - Fix resend of doorbells on return from interrupt on Book3E v5: - Rebased on top of my latest series, which involves some significant rework of some aspects of the patch. v6: - 32-bit compile fix - more compile fixes with various .config combos - factor out the asm code to soft-disable interrupts - remove the C wrapper around preempt_schedule_irq v7: - Fix a bug with hard irq state tracking on native power7
|
#
fe1952fc |
|
29-Feb-2012 |
Benjamin Herrenschmidt <benh@kernel.crashing.org> |
powerpc: Rework runlatch code This moves the inlines into system.h and changes the runlatch code to use the thread local flags (non-atomic) rather than the TIF flags (atomic) to keep track of the latch state. The code to turn it back on in an asynchronous interrupt is now simplified and partially inlined. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
40c8cefa |
|
05-Jan-2012 |
Ira Snyder <iws@ovro.caltech.edu> |
powerpc: Fix kernel log of oops/panic instruction dump A kernel oops/panic prints an instruction dump showing several instructions before and after the instruction which caused the oops/panic. The code intended that the faulting instruction be enclosed in angle brackets, however a bug caused the faulting instruction to be interpreted by printk() as the message log level. To fix this, the KERN_CONT log level is added before the actual text of the printed message. === Before the patch === [ 1081.587266] Instruction dump: [ 1081.590236] 7c000110 7c0000f8 5400077c 552907f6 7d290378 992b0003 4e800020 38000001 [ 1081.598034] 3d20c03a 9009a114 7c0004ac 39200000 [ 1081.602500] 4e800020 3803ffd0 2b800009 <4>[ 1081.587266] Instruction dump: <4>[ 1081.590236] 7c000110 7c0000f8 5400077c 552907f6 7d290378 992b0003 4e800020 38000001 <4>[ 1081.598034] 3d20c03a 9009a114 7c0004ac 39200000 <98090000>[ 1081.602500] 4e800020 3803ffd0 2b800009 === After the patch === [ 51.385216] Instruction dump: [ 51.388186] 7c000110 7c0000f8 5400077c 552907f6 7d290378 992b0003 4e800020 38000001 [ 51.395986] 3d20c03a 9009a114 7c0004ac 39200000 <98090000> 4e800020 3803ffd0 2b800009 <4>[ 51.385216] Instruction dump: <4>[ 51.388186] 7c000110 7c0000f8 5400077c 552907f6 7d290378 992b0003 4e800020 38000001 <4>[ 51.395986] 3d20c03a 9009a114 7c0004ac 39200000 <98090000> 4e800020 3803ffd0 2b800009 Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
3bfd0c9c |
|
24-Nov-2011 |
Anton Blanchard <anton@samba.org> |
powerpc: Decode correct MSR bits in oops output On a 64bit book3s machine I have an oops from a system reset that claims the book3e CE bit was set: MSR: 8000000000021032 <ME,CE,IR,DR> CR: 24004082 XER: 00000010 On a book3s machine system reset sets IBM bit 46 and 47 depending on the power saving mode. Separate the definitions by type and for completeness add the rest of the bits in. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
187b9f2a |
|
05-Oct-2011 |
Kumar Gala <galak@kernel.crashing.org> |
powerpc/book3e-64: Fix debug support for userspace With the introduction of CONFIG_PPC_ADV_DEBUG_REGS user space debug is broken on Book-E 64-bit parts that support delayed debug events. When switch_booke_debug_regs() sets DBCR0 we'll start getting debug events as MSR_DE is also set and we aren't able to handle debug events from kernel space. We can remove the hack that always enables MSR_DE and loads up DBCR0 and just utilize switch_booke_debug_regs() to get user space debug working again. We still need to handle critical/debug exception stacks & proper save/restore of state for those exception levles to support debug events from kernel space like we have on 32-bit. Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
ba28c9aa |
|
05-Oct-2011 |
Kumar Gala <galak@kernel.crashing.org> |
powerpc: Revert show_regs() define for readability We had an existing ifdef for 4xx & BOOKE processors that got changed to CONFIG_PPC_ADV_DEBUG_REGS. The define has nothing to do with CONFIG_PPC_ADV_DEBUG_REGS. The define really should be: #if defined(CONFIG_4xx) || defined(CONFIG_BOOKE) and not #ifdef CONFIG_PPC_ADV_DEBUG_REGS Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
4b16f8e2 |
|
22-Jul-2011 |
Paul Gortmaker <paul.gortmaker@windriver.com> |
powerpc: various straight conversions from module.h --> export.h All these files were including module.h just for the basic EXPORT_SYMBOL infrastructure. We can shift them off to the export.h header which is a way smaller footprint and thus realize some compile time gains. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
5115a026 |
|
14-Jul-2011 |
Michael Neuling <mikey@neuling.org> |
powerpc: Add CFAR to oops output Now we have the CFAR saved add it to the oops output. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
0e0ebdb9 |
|
09-Jun-2011 |
Mathias Krause <minipli@googlemail.com> |
powerpc: Remove redundant set_fs(USER_DS) The address limit is already set in flush_old_exec() so this set_fs(USER_DS) is redundant. Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
de56a948 |
|
28-Jun-2011 |
Paul Mackerras <paulus@samba.org> |
KVM: PPC: Add support for Book3S processors in hypervisor mode This adds support for KVM running on 64-bit Book 3S processors, specifically POWER7, in hypervisor mode. Using hypervisor mode means that the guest can use the processor's supervisor mode. That means that the guest can execute privileged instructions and access privileged registers itself without trapping to the host. This gives excellent performance, but does mean that KVM cannot emulate a processor architecture other than the one that the hardware implements. This code assumes that the guest is running paravirtualized using the PAPR (Power Architecture Platform Requirements) interface, which is the interface that IBM's PowerVM hypervisor uses. That means that existing Linux distributions that run on IBM pSeries machines will also run under KVM without modification. In order to communicate the PAPR hypercalls to qemu, this adds a new KVM_EXIT_PAPR_HCALL exit code to include/linux/kvm.h. Currently the choice between book3s_hv support and book3s_pr support (i.e. the existing code, which runs the guest in user mode) has to be made at kernel configuration time, so a given kernel binary can only do one or the other. This new book3s_hv code doesn't support MMIO emulation at present. Since we are running paravirtualized guests, this isn't a serious restriction. With the guest running in supervisor mode, most exceptions go straight to the guest. We will never get data or instruction storage or segment interrupts, alignment interrupts, decrementer interrupts, program interrupts, single-step interrupts, etc., coming to the hypervisor from the guest. Therefore this introduces a new KVMTEST_NONHV macro for the exception entry path so that we don't have to do the KVM test on entry to those exception handlers. We do however get hypervisor decrementer, hypervisor data storage, hypervisor instruction storage, and hypervisor emulation assist interrupts, so we have to handle those. In hypervisor mode, real-mode accesses can access all of RAM, not just a limited amount. Therefore we put all the guest state in the vcpu.arch and use the shadow_vcpu in the PACA only for temporary scratch space. We allocate the vcpu with kzalloc rather than vzalloc, and we don't use anything in the kvmppc_vcpu_book3s struct, so we don't allocate it. We don't have a shared page with the guest, but we still need a kvm_vcpu_arch_shared struct to store the values of various registers, so we include one in the vcpu_arch struct. The POWER7 processor has a restriction that all threads in a core have to be in the same partition. MMU-on kernel code counts as a partition (partition 0), so we have to do a partition switch on every entry to and exit from the guest. At present we require the host and guest to run in single-thread mode because of this hardware restriction. This code allocates a hashed page table for the guest and initializes it with HPTEs for the guest's Virtual Real Memory Area (VRMA). We require that the guest memory is allocated using 16MB huge pages, in order to simplify the low-level memory management. This also means that we can get away without tracking paging activity in the host for now, since huge pages can't be paged or swapped. This also adds a few new exports needed by the book3s_hv code. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
685659ee |
|
14-Jun-2011 |
yu liu <yu.liu@freescale.com> |
powerpc/e500: Save SPEFCSR in flush_spe_to_thread() giveup_spe() saves the SPE state which is protected by MSR[SPE]. However, modifying SPEFSCR does not trap when MSR[SPE]=0. And since SPEFSCR is already saved/restored in _switch(), not all the callers want to save SPEFSCR again. Thus, saving SPEFSCR should not belong to giveup_spe(). This patch moves SPEFSCR saving to flush_spe_to_thread(), and cleans up the caller that needs to save SPEFSCR accordingly. Signed-off-by: Liu Yu <yu.liu@freescale.com> Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
|
#
d6bf29b4 |
|
24-May-2011 |
Peter Zijlstra <a.p.zijlstra@chello.nl> |
powerpc: mmu_gather rework Fix up powerpc to the new mmu_gather stuff. PPC has an extra batching queue to RCU free the actual pagetable allocations, use the ARCH extentions for that for now. For the ppc64_tlb_batch, which tracks the vaddrs to unhash from the hardware hash-table, keep using per-cpu arrays but flush on context switch and use a TLF bit to track the lazy_mmu state. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Miller <davem@davemloft.net> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Tony Luck <tony.luck@intel.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
44ae3ab3 |
|
06-Apr-2011 |
Matt Evans <matt@ozlabs.org> |
powerpc: Free up some CPU feature bits by moving out MMU-related features Some of the 64bit PPC CPU features are MMU-related, so this patch moves them to MMU_FTR_ bits. All cpu_has_feature()-style tests are moved to mmu_has_feature(), and seven feature bits are freed as a result. Signed-off-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
efcac658 |
|
02-Mar-2011 |
Alexey Kardashevskiy <aik@au1.ibm.com> |
powerpc: Per process DSCR + some fixes (try#4) The DSCR (aka Data Stream Control Register) is supported on some server PowerPC chips and allow some control over the prefetch of data streams. This patch allows the value to be specified per thread by emulating the corresponding mfspr and mtspr instructions. Children of such threads inherit the value. Other threads use a default value that can be specified in sysfs - /sys/devices/system/cpu/dscr_default. If a thread starts with non default value in the sysfs entry, all children threads inherit this non default value even if the sysfs value is changed later. Signed-off-by: Alexey Kardashevskiy <aik@au1.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
b6a84016 |
|
22-Mar-2011 |
Eric Dumazet <eric.dumazet@gmail.com> |
mm: NUMA aware alloc_thread_info_node() Add a node parameter to alloc_thread_info(), and change its name to alloc_thread_info_node() This change is needed to allow NUMA aware kthread_create_on_cpu() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Reviewed-by: Andi Kleen <ak@linux.intel.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Tejun Heo <tj@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: David Howells <dhowells@redhat.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
e0780b72 |
|
09-Feb-2011 |
K.Prasad <prasad@linux.vnet.ibm.com> |
powerpc: Fix call to flush_ptrace_hw_breakpoint() Fix the error in spelling the config option for hw-breakpoints and fix the build issue that follows. Signed-off by: K.Prasad <prasad@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
7071854b |
|
11-Jan-2011 |
Anton Blanchard <anton@samba.org> |
powerpc: Print 32 bits of DSISR in show_regs We were printing 64 bits of DSISR in show_regs even though it is 32 bit. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
cf9efce0 |
|
26-Aug-2010 |
Paul Mackerras <paulus@samba.org> |
powerpc: Account time using timebase rather than PURR Currently, when CONFIG_VIRT_CPU_ACCOUNTING is enabled, we use the PURR register for measuring the user and system time used by processes, as well as other related times such as hardirq and softirq times. This turns out to be quite confusing for users because it means that a program will often be measured as taking less time when run on a multi-threaded processor (SMT2 or SMT4 mode) than it does when run on a single-threaded processor (ST mode), even though the program takes longer to finish. The discrepancy is accounted for as stolen time, which is also confusing, particularly when there are no other partitions running. This changes the accounting to use the timebase instead, meaning that the reported user and system times are the actual number of real-time seconds that the program was executing on the processor thread, regardless of which SMT mode the processor is in. Thus a program will generally show greater user and system times when run on a multi-threaded processor than on a single-threaded processor. On pSeries systems on POWER5 or later processors, we measure the stolen time (time when this partition wasn't running) using the hypervisor dispatch trace log. We check for new entries in the log on every entry from user mode and on every transition from kernel process context to soft or hard IRQ context (i.e. when account_system_vtime() gets called). So that we can correctly distinguish time stolen from user time and time stolen from system time, without having to check the log on every exit to user mode, we store separate timestamps for exit to user mode and entry from user mode. On systems that have a SPURR (POWER6 and POWER7), we read the SPURR in account_system_vtime() (as before), and then apportion the SPURR ticks since the last time we read it between scaled user time and scaled system time according to the relative proportions of user time and system time over the same interval. This avoids having to read the SPURR on every kernel entry and exit. On systems that have PURR but not SPURR (i.e., POWER5), we do the same using the PURR rather than the SPURR. This disables the DTL user interface in /sys/debug/kernel/powerpc/dtl for now since it conflicts with the use of the dispatch trace log by the time accounting code. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
e1f0ece1 |
|
10-Aug-2010 |
Michael Neuling <mikey@neuling.org> |
powerpc: Move arch_sd_sibling_asym_packing() to smp.c Simple cleanup by moving arch_sd_sibling_asym_packing from process.c to smp.c to save an #ifdef CONFIG_SMP No functionality change. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
4138d653 |
|
05-Aug-2010 |
Anton Blanchard <anton@samba.org> |
powerpc: Inline ppc64_runlatch_off I'm sick of seeing ppc64_runlatch_off in our profiles, so inline it into the callers. To avoid a mess of circular includes I didn't add it as an inline function. Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Olof Johansson <olof@lixom.net> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
9904b005 |
|
29-Jul-2010 |
Denis Kirjanov <dkirjanov@kernel.org> |
powerpc: Use is_32bit_task() helper to test 32 bit binary Use is_32bit_task() helper to test 32 bit binary. Signed-off-by: Denis Kirjanov <dkirjanov@kernel.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
d7627467 |
|
17-Aug-2010 |
David Howells <dhowells@redhat.com> |
Make do_execve() take a const filename pointer Make do_execve() take a const filename pointer so that kernel_execve() compiles correctly on ARM: arch/arm/kernel/sys_arm.c:88: warning: passing argument 1 of 'do_execve' discards qualifiers from pointer target type This also requires the argv and envp arguments to be consted twice, once for the pointer array and once for the strings the array points to. This is because do_execve() passes a pointer to the filename (now const) to copy_strings_kernel(). A simpler alternative would be to cast the filename pointer in do_execve() when it's passed to copy_strings_kernel(). do_execve() may not change any of the strings it is passed as part of the argv or envp lists as they are some of them in .rodata, so marking these strings as const should be fine. Further kernel_execve() and sys_execve() need to be changed to match. This has been test built on x86_64, frv, arm and mips. Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Ralf Baechle <ralf@linux-mips.org> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
c7887325 |
|
11-Aug-2010 |
David Howells <dhowells@redhat.com> |
Mark arguments to certain syscalls as being const Mark arguments to certain system calls as being const where they should be but aren't. The list includes: (*) The filename arguments of various stat syscalls, execve(), various utimes syscalls and some mount syscalls. (*) The filename arguments of some syscall helpers relating to the above. (*) The buffer argument of various write syscalls. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
a2e19811 |
|
08-Jul-2010 |
Benjamin Herrenschmidt <benh@kernel.crashing.org> |
powerpc/book3e: Hack to get gdb moving along on Book3E 64-bit Our handling of debug interrupts on Book3E 64-bit is not quite the way it should be just yet. This is a workaround to let gdb work at least for now. We ensure that when context switching, we set the appropriate DBCR0 value for the new task. We also make sure that we turn off MSR[DE] within the kernel, and set it as part of the bits that get set when going back to userspace. In the long run, we will probably set the userspace DBCR0 on the exception exit code path and ensure we have some proper kernel value to set on the way into the kernel, a bit like ppc32 does, but that will take more work. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
2ec57d44 |
|
28-Jun-2010 |
Michael Neuling <mikey@neuling.org> |
sched: Fix spelling of sibling No logic changes, only spelling. Signed-off-by: Michael Neuling <mikey@neuling.org> Cc: linuxppc-dev@ozlabs.org Cc: David Howells <dhowells@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <15249.1277776921@neuling.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
5aae8a53 |
|
15-Jun-2010 |
K.Prasad <prasad@linux.vnet.ibm.com> |
powerpc, hw_breakpoints: Implement hw_breakpoints for 64-bit server processors Implement perf-events based hw-breakpoint interfaces for PowerPC 64-bit server (Book III S) processors. This allows access to a given location to be used as an event that can be counted or profiled by the perf_events subsystem. This is done using the DABR (data breakpoint register), which can also be used for process debugging via ptrace. When perf_event hw_breakpoint support is configured in, the perf_event subsystem manages the DABR and arbitrates access to it, and ptrace then creates a perf_event when it is requested to set a data breakpoint. [Adopted suggestions from Paul Mackerras <paulus@samba.org> to - emulate_step() all system-wide breakpoints and single-step only the per-task breakpoints - perform arch-specific cleanup before unregistration through arch_unregister_hw_breakpoint() ] Signed-off-by: K.Prasad <prasad@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
f1ba9a5b |
|
02-Jun-2010 |
Christoph Hellwig <hch@lst.de> |
powerpc: Unconditionally enabled irq stacks Irq stacks provide an essential protection from stack overflows through external interrupts, at the cost of two additionals stacks per CPU. Enable them unconditionally to simplify the kernel build and prevent people from accidentally disabling them. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
89275d59 |
|
09-Jun-2010 |
Peter Zijlstra <a.p.zijlstra@chello.nl> |
powerpc: Exclude arch_sd_sibiling_asym_packing() on UP Only SMP systems care about load-balance features, plus this saves some .text space on UP and also fixes the build. Reported-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Michael Neuling <mikey@neuling.org> LKML-Reference: <tip-76cbd8a8f8b0dddbff89a6708bd5bd13c0d21a00@git.kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
76cbd8a8 |
|
07-Jun-2010 |
Michael Neuling <mikey@neuling.org> |
powerpc: Enable asymmetric SMT scheduling on POWER7 The POWER7 core has dynamic SMT mode switching which is controlled by the hypervisor. There are 3 SMT modes: SMT1 uses thread 0 SMT2 uses threads 0 & 1 SMT4 uses threads 0, 1, 2 & 3 When in any particular SMT mode, all threads have the same performance as each other (ie. at any moment in time, all threads perform the same). The SMT mode switching works such that when linux has threads 2 & 3 idle and 0 & 1 active, it will cede (H_CEDE hypercall) threads 2 and 3 in the idle loop and the hypervisor will automatically switch to SMT2 for that core (independent of other cores). The opposite is not true, so if threads 0 & 1 are idle and 2 & 3 are active, we will stay in SMT4 mode. Similarly if thread 0 is active and threads 1, 2 & 3 are idle, we'll go into SMT1 mode. If we can get the core into a lower SMT mode (SMT1 is best), the threads will perform better (since they share less core resources). Hence when we have idle threads, we want them to be the higher ones. This adds a feature bit for asymmetric packing to powerpc and then enables it on POWER7. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: linuxppc-dev@ozlabs.org LKML-Reference: <20100608045702.31FB5CC8C7@localhost.localdomain> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
221c185d |
|
05-Mar-2010 |
Dave Kleikamp <shaggy@linux.vnet.ibm.com> |
powerpc/476: Add isync after loading mmu and debug spr's 476 requires an isync after loading MMU and debug related SPR's. Some of these are in performance-critical paths and may need to be optimized, but initially, we're playing it safe. Signed-off-by: Torez Smith <lnxtorez@linux.vnet.ibm.com> Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
|
#
3bffb652 |
|
08-Feb-2010 |
Dave Kleikamp <shaggy@linux.vnet.ibm.com> |
powerpc/booke: Add support for advanced debug registers powerpc/booke: Add support for advanced debug registers From: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Based on patches originally written by Torez Smith. This patch defines context switch and trap related functionality for BookE specific Debug Registers. It adds support to ptrace() for setting and getting BookE related Debug Registers Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Cc: Torez Smith <lnxtorez@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Gibson <dwg@au1.ibm.com> Cc: Josh Boyer <jwboyer@linux.vnet.ibm.com> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: Sergio Durigan Junior <sergiodj@br.ibm.com> Cc: Thiago Jung Bauermann <bauerman@br.ibm.com> Cc: linuxppc-dev list <Linuxppc-dev@ozlabs.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
172ae2e7 |
|
08-Feb-2010 |
Dave Kleikamp <shaggy@linux.vnet.ibm.com> |
powerpc/booke: Introduce new CONFIG options for advanced debug registers powerpc/booke: Introduce new CONFIG options for advanced debug registers From: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Introduce new config options to simplify the ifdefs pertaining to the advanced debug registers for booke and 40x processors: CONFIG_PPC_ADV_DEBUG_REGS - boolean: true for dac-based processors CONFIG_PPC_ADV_DEBUG_IACS - number of IAC registers CONFIG_PPC_ADV_DEBUG_DACS - number of DAC registers CONFIG_PPC_ADV_DEBUG_DVCS - number of DVC registers CONFIG_PPC_ADV_DEBUG_DAC_RANGE - DAC ranges supported Beginning conservatively, since I only have the facilities to test 440 hardware. I believe all 40x and booke platforms support at least 2 IAC and 2 DAC registers. For 440, 4 IAC and 2 DVC registers are enabled, as well as the DAC ranges. Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Acked-by: David Gibson <dwg@au1.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
94f28da8 |
|
30-Jan-2010 |
Andreas Schwab <schwab@linux-m68k.org> |
powerpc: TIF_ABI_PENDING bit removal Here are the powerpc bits to remove TIF_ABI_PENDING now that set_personality() is called at the appropriate place in exec. Signed-off-by: Andreas Schwab <schwab@linux-m68k.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
ce7a35c7 |
|
16-Oct-2009 |
Kumar Gala <galak@kernel.crashing.org> |
powerpc: Fix compile errors found by new ppc64e_defconfig Fix the following 3 issues: arch/powerpc/kernel/process.c: In function 'arch_randomize_brk': arch/powerpc/kernel/process.c:1183: error: 'mmu_highuser_ssize' undeclared (first use in this function) arch/powerpc/kernel/process.c:1183: error: (Each undeclared identifier is reported only once arch/powerpc/kernel/process.c:1183: error: for each function it appears in.) arch/powerpc/kernel/process.c:1183: error: 'MMU_SEGSIZE_1T' undeclared (first use in this function) In file included from arch/powerpc/kernel/setup_64.c:60: arch/powerpc/include/asm/mmu-hash64.h:132: error: redefinition of 'struct mmu_psize_def' arch/powerpc/include/asm/mmu-hash64.h:159: error: expected identifier or '(' before numeric constant arch/powerpc/include/asm/mmu-hash64.h:396: error: conflicting types for 'mm_context_t' arch/powerpc/include/asm/mmu-book3e.h:184: error: previous declaration of 'mm_context_t' was here cc1: warnings being treated as errors arch/powerpc/kernel/pci_64.c: In function 'pcibios_unmap_io_space': arch/powerpc/kernel/pci_64.c:100: error: unused variable 'res' Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
9135c3cc |
|
15-Sep-2009 |
Steven Rostedt <srostedt@redhat.com> |
powerpc/ftrace: show real return addresses in modules When the function graph tracer is enabled, it replaces the return address with a hook back to the tracer. This makes back traces see the hook instead of the actual return address. The current code also shows the real address by checking if the return address jumps to the return_to_handler. If it is, is also prints out the saved real return address. On powerpc64, some modules may return to mod_return_to_handler, which is not checked. This patch will also show the real address if a return is to mod_return_to_handler as well. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
#
8bbde7a7 |
|
21-Sep-2009 |
Anton Blanchard <anton@samba.org> |
powerpc: Move 64bit heap above 1TB on machines with 1TB segments If we are using 1TB segments and we are allowed to randomise the heap, we can put it above 1TB so it is backed by a 1TB segment. Otherwise the heap will be in the bottom 1TB which always uses 256MB segments and this may result in a performance penalty. This functionality is disabled when heap randomisation is turned off: echo 1 > /proc/sys/kernel/randomize_va_space which may be useful when trying to allocate the maximum amount of 16M or 16G pages. On a microbenchmark that repeatedly touches 32GB of memory with a stride of 256MB + 4kB (designed to stress 256MB segments while still mapping nicely into the L1 cache), we see the improvement: Force malloc to use heap all the time: # export MALLOC_MMAP_MAX_=0 MALLOC_TRIM_THRESHOLD_=-1 Disable heap randomization: # echo 1 > /proc/sys/kernel/randomize_va_space # time ./test 12.51s Enable heap randomization: # echo 2 > /proc/sys/kernel/randomize_va_space # time ./test 1.70s Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
c6c9eace |
|
08-Sep-2009 |
Benjamin Herrenschmidt <benh@kernel.crashing.org> |
powerpc/booke: Don't set DABR on 64-bit BookE, use DAC1 instead Also remove a duplicate setting of it in the context switch path on BookE. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
747bea91 |
|
23-Jul-2009 |
Benjamin Herrenschmidt <benh@kernel.crashing.org> |
powerpc: Clean ifdef usage in copy_thread() Currently, a single ifdef covers SLB related bits and more generic ppc64 related bits, split this in two separate ifdef's since 64-bit BookE will need one but not the other. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
a2367194 |
|
18-Jun-2009 |
Kumar Gala <galak@kernel.crashing.org> |
powerpc: Fix output from show_regs For some reason we've had an explicit KERN_INFO for GPR dumps. With recent changes we get output like: <6>GPR00: 00000000 ef855eb0 ef858000 00000001 000000d0 f1000000 ffbc8000 ffffffff The KERN_INFO is causing the <6>. Don't see any reason to keep it around. Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
94491685 |
|
02-Jun-2009 |
Benjamin Herrenschmidt <benh@kernel.crashing.org> |
powerpc: Shield code specific to 64-bit server processors This is a random collection of added ifdef's around portions of code that only mak sense on server processors. Using either CONFIG_PPC_STD_MMU_64 or CONFIG_PPC_BOOK3S as seems appropriate. This is meant to make the future merging of Book3E 64-bit support easier. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
6f2c55b8 |
|
02-Apr-2009 |
Alexey Dobriyan <adobriyan@gmail.com> |
Simplify copy_thread() First argument unused since 2.3.11. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
501cb16d |
|
21-Feb-2009 |
Anton Blanchard <anton@samba.org> |
powerpc: Randomise PIEs Randomise ELF_ET_DYN_BASE, which is used when loading position independent executables. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
912f9ee2 |
|
21-Feb-2009 |
Anton Blanchard <anton@samba.org> |
powerpc: Randomise the brk region Randomize the heap. before: tundro2:~ # sleep 1 & cat /proc/${!}/maps | grep heap 10017000-10118000 rw-p 10017000 00:00 0 [heap] 10017000-10118000 rw-p 10017000 00:00 0 [heap] 10017000-10118000 rw-p 10017000 00:00 0 [heap] 10017000-10118000 rw-p 10017000 00:00 0 [heap] 10017000-10118000 rw-p 10017000 00:00 0 [heap] after tundro2:~ # sleep 1 & cat /proc/${!}/maps | grep heap 19419000-1951a000 rw-p 19419000 00:00 0 [heap] 325ff000-32700000 rw-p 325ff000 00:00 0 [heap] 1a97c000-1aa7d000 rw-p 1a97c000 00:00 0 [heap] 1cc60000-1cd61000 rw-p 1cc60000 00:00 0 [heap] 1afa9000-1b0aa000 rw-p 1afa9000 00:00 0 [heap] Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
d839088c |
|
21-Feb-2009 |
Anton Blanchard <anton@samba.org> |
powerpc: Randomise lower bits of stack address Randomise the lower bits of the stack address. More randomisation is good for security but the scatter can also help with SMT threads that share an L1. A quick test case shows this working: int main() { int sp; printf("%x\n", (unsigned long)&sp & 4095); } before: 80 80 80 80 80 after: 610 490 300 6b0 d80 Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
6794c782 |
|
09-Feb-2009 |
Steven Rostedt <srostedt@redhat.com> |
powerpc64: port of the function graph tracer This is a port of the function graph tracer that was written by Frederic Weisbecker for the x86. This only works for PPC64 at the moment and only for static tracing. PPC32 and dynamic function graph tracing support will come later. The trace produces a visual calling of functions: # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 0) 2.224 us | } 0) ! 271.024 us | } 0) ! 320.080 us | } 0) ! 324.656 us | } 0) ! 329.136 us | } 0) | .put_prev_task_fair() { 0) | .update_curr() { 0) 2.240 us | .update_min_vruntime(); 0) 6.512 us | } 0) 2.528 us | .__enqueue_entity(); 0) + 15.536 us | } 0) | .pick_next_task_fair() { 0) 2.032 us | .__pick_next_entity(); 0) 2.064 us | .__clear_buddies(); 0) | .set_next_entity() { 0) 2.672 us | .__dequeue_entity(); 0) 6.864 us | } Geoff Lavand tested on PS3. Tested-by: Geoff Levand <geoffrey.levand@am.sony.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
79741dd3 |
|
31-Dec-2008 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
[PATCH] idle cputime accounting The cpu time spent by the idle process actually doing something is currently accounted as idle time. This is plain wrong, the architectures that support VIRT_CPU_ACCOUNTING=y can do better: distinguish between the time spent doing nothing and the time spent by idle doing work. The first is accounted with account_idle_time and the second with account_system_time. The architectures that use the account_xxx_time interface directly and not the account_xxx_ticks interface now need to do the check for the idle process in their arch code. In particular to improve the system vs true idle time accounting the arch code needs to measure the true idle time instead of just testing for the idle process. To improve the tick based accounting as well we would need an architecture primitive that can tell us if the pt_regs of the interrupted context points to the magic instruction that halts the cpu. In addition idle time is no more added to the stime of the idle process. This field now contains the system time of the idle process as it should be. On systems without VIRT_CPU_ACCOUNTING this will always be zero as every tick that occurs while idle is running will be accounted as idle time. This patch contains the necessary common code changes to be able to distinguish idle system time and true idle time. The architectures with support for VIRT_CPU_ACCOUNTING need some changes to exploit this. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
#
c4d04be1 |
|
19-Nov-2008 |
Johannes Berg <johannes@sipsolutions.net> |
powerpc: Allow the max stack trace depth to be configured On my screen, when something crashes, I only have space for maybe 16 functions of the stack trace before the information above it scrolls off the screen. It's easy to hack the kernel to print out only that much, but it's harder to remember to do it. This introduces a config option for it so that I can keep the setting in my config. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
1b98326b |
|
18-Nov-2008 |
Kumar Gala <galak@kernel.crashing.org> |
powerpc: Add MSR[CE, DE] to the MSR bits we print on show_regs() Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
9c4cb825 |
|
01-Aug-2008 |
Kumar Gala <galak@kernel.crashing.org> |
powerpc: Remove use of CONFIG_PPC_MERGE Now that arch/ppc is gone and CONFIG_PPC_MERGE is always set, remove the dead code associated with !CONFIG_PPC_MERGE from arch/powerpc and include/asm-powerpc. Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
2325f0a0 |
|
25-Jul-2008 |
Kumar Gala <galak@kernel.crashing.org> |
powerpc/booke: Clean up the hardware watchpoint support * CONFIG_BOOKE is selected by CONFIG_44x so we dont need both * Fixed a few comments * Go back to only using DBCR0_IDM to determine if we are using debug resources. Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
0b21bb49 |
|
25-Jul-2008 |
Kumar Gala <galak@kernel.crashing.org> |
powerpc: clean up the Book-E HW watchpoint support * CONFIG_BOOKE is selected by CONFIG_44x so we dont need both * Fixed a few comments * Go back to only using DBCR0_IDM to determine if we are using debug resources. Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
|
#
d6a61bfc |
|
23-Jul-2008 |
Luis Machado <luisgpm@linux.vnet.ibm.com> |
powerpc: BookE hardware watchpoint support This patch implements support for HW based watchpoint via the DBSR_DAC (Data Address Compare) facility of the BookE processors. It does so by interfacing with the existing DABR breakpoint code and adding the necessary bits and pieces for the new bits to be properly set or cleared Signed-off-by: Luis Machado <luisgpm@br.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
7c292170 |
|
11-Jul-2008 |
Michael Neuling <mikey@neuling.org> |
powerpc: fix giveup_vsx to save registers correctly giveup_vsx didn't save the FPU and VMX regsiters. Change it to be like giveup_fpr/altivec which save these registers. Also update call sites where FPU and VMX are already saved to use the original giveup_vsx (renamed to __giveup_vsx). Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
058c78f4 |
|
06-Jul-2008 |
Benjamin Herrenschmidt <benh@kernel.crashing.org> |
powerpc: Use new printk extension %pS to print symbols on oops This changes the oops and backtrace code to use the new %pS printk extension to print out symbols rather than manually calling print_symbol. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
138fc1ee0 |
|
02-Jul-2008 |
Michael Neuling <mikey@neuling.org> |
powerpc: Remove old dump_task_* functions Since Roland's ptrace cleanup starting with commit f65255e8d51ecbc6c9eef20d39e0377d19b658ca ("[POWERPC] Use user_regset accessors for FP regs"), the dump_task_* functions are no longer being used. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
f3e909c2 |
|
30-Jun-2008 |
Michael Neuling <mikey@neuling.org> |
powerpc: Update for VSX core file and ptrace This correctly hooks the VSX dump into Roland McGrath core file infrastructure. It adds the VSX dump information as an additional elf note in the core file (after talking more to the tool chain/gdb guys). This also ensures the formats are consistent between signals, ptrace and core files. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
ce48b210 |
|
24-Jun-2008 |
Michael Neuling <mikey@neuling.org> |
powerpc: Add VSX context save/restore, ptrace and signal support This patch extends the floating point save and restore code to use the VSX load/stores when VSX is available. This will make FP context save/restore marginally slower on FP only code, when VSX is available, as it has to load/store 128bits rather than just 64bits. Mixing FP, VMX and VSX code will get constant architected state. The signals interface is extended to enable access to VSR 0-31 doubleword 1 after discussions with tool chain maintainers. Backward compatibility is maintained. The ptrace interface is also extended to allow access to VSR 0-31 full registers. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
9c75a31c |
|
26-Jun-2008 |
Michael Neuling <mikey@neuling.org> |
powerpc: Add macros to access floating point registers in thread_struct. We are going to change where the floating point registers are stored in the thread_struct, so in preparation add some macros to access the floating point registers. Update all code to use these new macros. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
85218827 |
|
28-Apr-2008 |
Kumar Gala <galak@kernel.crashing.org> |
[POWERPC] Add IRQSTACKS support on ppc32 This makes it possible to use separate stacks for hard and soft IRQs on 32-bit powerpc as well as on 64-bit. The code for 32-bit is just the 32-bit analog of the 64-bit code. * Added allocation and initialization of the irq stacks. We limit the stacks to be in lowmem for ppc32. * Implemented ppc32 versions of call_do_softirq() and call_handle_irq() to switch the stack pointers * Reworked how we do stack overflow detection. We now keep around the limit of the stack in the thread_struct and compare against the limit to see if we've overflowed. We can now use this on ppc64 if desired. [ paulus@samba.org: Fixed bug on 6xx where we need to reload r9 with the thread_info pointer. ] Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
f6a61680 |
|
18-Apr-2008 |
Benjamin Herrenschmidt <benh@kernel.crashing.org> |
[POWERPC] Fix kernel stack allocation alignment The powerpc kernel stacks need to be naturally aligned, as they contain the thread info at the bottom, which is obtained by clearing the low bits of the stack pointer. However, when using 64K pages, the stack is smaller than a page, so we use kmalloc to allocate it, but that doesn't provide the alignment guarantee we need. It appeared to work so far... until one enables SLUB debugging which then returns unaligned pointers. Ooops... This fixes it by using a slab cache with enforced alignment. It relies on my previous patch that adds a thread_info_cache_init() callback. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
ec2b36b9 |
|
16-Apr-2008 |
Benjamin Herrenschmidt <benh@kernel.crashing.org> |
[POWERPC] Move stackframe definitions to common header This moves various definitions used all over the place to parse stack frames to ptrace.h so only one definition is needed. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
a2ceff5e |
|
28-Mar-2008 |
Michael Ellerman <michael@ellerman.id.au> |
[POWERPC] Fix missed hardware breakpoints across multiple threads There is a bug in the powerpc DABR (data access breakpoint) handling, which can result in us missing breakpoints if several threads are trying to break on the same address. The circumstances are that do_page_fault() calls do_dabr(), this clears the DABR (sets it to 0) and sets up the signal which will report to userspace that the DABR was hit. The do_signal() code will restore the DABR value on the way out to userspace. If we reschedule before calling do_signal(), __switch_to() will check the cached DABR value and compare it to the new thread's value, if they match we don't set the DABR in hardware. So if two threads have the same DABR value, and we schedule from one to the other after taking the interrupt for the first thread hitting the DABR, the second thread will run without the DABR set in hardware. The cleanest fix is to move the cache update into set_dabr(), that way we can't forget to do it. Reported-by: Jan Kratochvil <jan.kratochvil@redhat.com> Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
71e91a0a |
|
16-Mar-2008 |
Roland McGrath <roland@redhat.com> |
[POWERPC] Don't touch PT_DTRACE in exec The PT_DTRACE flag is meaningless and obsolete. Don't touch it. Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
44387e9f |
|
16-Mar-2008 |
Anton Blanchard <anton@samba.org> |
[POWERPC] Fix PMU + soft interrupt disable bug Since the PMU is an NMI now, it can come at any time we are only soft disabled. We must hard disable around the two places we allow the kernel stack SLB and r1 to go out of sync. Otherwise the PMU exception can force a kernel stack SLB into another slot, which can lead to it getting evicted, which can lead to a nasty unrecoverable SLB miss in the exception entry code. Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Olof Johansson <olof@lixom.net> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
79ccd1be |
|
08-Feb-2008 |
Hugh Dickins <hugh@veritas.com> |
[POWERPC] Fix DEBUG_PREEMPT warning when warning The powerpc show_regs prints CPU using smp_processor_id: change that to raw_smp_processor_id, so that when it's showing a WARN_ON backtrace without preemption disabled, DEBUG_PREEMPT doesn't mess up that warning with its own. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
81a3843f |
|
03-Dec-2007 |
Tony Breeds <tony@bakeyournoodle.com> |
[POWERPC] Fix hardware IRQ time accounting problem. The commit fa13a5a1f25f671d084d8884be96fc48d9b68275 (sched: restore deterministic CPU accounting on powerpc), unconditionally calls update_process_tick() in system context. In the deterministic accounting case this is the correct thing to do. However, in the non-deterministic accounting case we need to not do this, since doing this results in the time accounted as hardware irq time being artificially elevated. Also this collapses 2 consecutive '#ifdef CONFIG_VIRT_CPU_ACCOUNTING' checks in time.h into one for neatness. Signed-off-by: Tony Breeds <tony@bakeyournoodle.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
fa13a5a1 |
|
09-Nov-2007 |
Paul Mackerras <paulus@samba.org> |
sched: restore deterministic CPU accounting on powerpc Since powerpc started using CONFIG_GENERIC_CLOCKEVENTS, the deterministic CPU accounting (CONFIG_VIRT_CPU_ACCOUNTING) has been broken on powerpc, because we end up counting user time twice: once in timer_interrupt() and once in update_process_times(). This fixes the problem by pulling the code in update_process_times that updates utime and stime into a separate function called account_process_tick. If CONFIG_VIRT_CPU_ACCOUNTING is not defined, there is a version of account_process_tick in kernel/timer.c that simply accounts a whole tick to either utime or stime as before. If CONFIG_VIRT_CPU_ACCOUNTING is defined, then arch code gets to implement account_process_tick. This also lets us simplify the s390 code a bit; it means that the s390 timer interrupt can now call update_process_times even when CONFIG_VIRT_CPU_ACCOUNTING is turned on, and can just implement a suitable account_process_tick(). account_process_tick() now takes the task_struct * as an argument. Tested both with and without CONFIG_VIRT_CPU_ACCOUNTING. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
19c5870c |
|
19-Oct-2007 |
Alexey Dobriyan <adobriyan@openvz.org> |
Use helpers to obtain task pid in printks (arch code) One of the easiest things to isolate is the pid printed in kernel log. There was a patch, that made this for arch-independent code, this one makes so for arch/xxx files. It took some time to cross-compile it, but hopefully these are all the printks in arch code. Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
1f7d6668 |
|
17-Oct-2007 |
Mark Nelson <markn@au1.ibm.com> |
powerpc: add Altivec/VMX state to coredumps Update dump_task_altivec() (which has so far never been put to use) so that it dumps the Altivec/VMX registers (VR[0] - VR[31], VSCR and VRSAVE) in the same format as the ptrace get_vrregs(), and add the appropriate glue typedef and #defines to make it work. A new note type of NT_PPC_VMX was chosen to be 0x100 (arbitrarily) because it allows the low range values to be used for more generic purposes and 0x100 seems an adequate starting point for PowerPC extensions. Signed-off-by: Mark Nelson <markn@au1.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Andi Kleen <ak@suse.de> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
1189be65 |
|
11-Oct-2007 |
Paul Mackerras <paulus@samba.org> |
[POWERPC] Use 1TB segments This makes the kernel use 1TB segments for all kernel mappings and for user addresses of 1TB and above, on machines which support them (currently POWER5+, POWER6 and PA6T). We detect that the machine supports 1TB segments by looking at the ibm,processor-segment-sizes property in the device tree. We don't currently use 1TB segments for user addresses < 1T, since that would effectively prevent 32-bit processes from using huge pages unless we also had a way to revert to using 256MB segments. That would be possible but would involve extra complications (such as keeping track of which segment size was used when HPTEs were inserted) and is not addressed here. Parts of this patch were originally written by Ben Herrenschmidt. Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
0de2d820 |
|
27-Sep-2007 |
Scott Wood <scottwood@freescale.com> |
[POWERPC] Make instruction dumping work in real mode On non-book-E, exceptions execute in real mode. If a fault happens that leads to a register dump, the kernel currently prints XXXXXXXX because it doesn't realize that PC is a physical address. This patch checks whether instruction address translation is turned on, and if not converts PC into a virtual address. Signed-off-by: Scott Wood <scottwood@freescale.com> Acked-by: Kumar Gala <galak@kernel.crashing.org> Acked-by: Olof Johansson <olof@lixom.net> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
474f8196 |
|
24-Sep-2007 |
Roland McGrath <roland@redhat.com> |
[POWERPC] Ensure FULL_REGS on exec When PTRACE_O_TRACEEXEC is used, a ptrace call to fetch the registers at the PTRACE_EVENT_EXEC stop (PTRACE_PEEKUSR) will oops in CHECK_FULL_REGS. With recent versions, "gdb --args /bin/sh -c 'exec /bin/true'" and "run" at the (gdb) prompt is sufficient to produce this. I also have written an isolated test case, see https://bugzilla.redhat.com/show_bug.cgi?id=301791#c15. This change fixes the problem by clearing the low bit of pt_regs.trap in start_thread so that FULL_REGS is true again. This is correct since all of the GPRs that "full" refers to are cleared in start_thread. Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
5e14d21e |
|
13-Sep-2007 |
Kumar Gala <galak@kernel.crashing.org> |
[POWERPC] Add cpu feature for SPE handling Make it so that SPE support can be determined at runtime. This is similiar to how we handle AltiVec. This allows us to have SPE support built in and work on processors with and without SPE. Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
|
#
0ee6c15e |
|
28-Aug-2007 |
Kumar Gala <galak@kernel.crashing.org> |
[POWERPC] Flush registers to proper task context When we flush register state for FP, Altivec, or SPE in flush_*_to_thread we need to respect the task_struct that the caller has passed to us. Most cases we are called with current, however sometimes (ptrace) we may be passed a different task_struct. This showed up when using gdbserver debugging a simple program that used floating point. When gdb tried to show the FP regs they all showed up as 0, because the child's FP registers were never properly flushed to memory. Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
|
#
14170789 |
|
25-Jul-2007 |
Kumar Gala <galak@kernel.crashing.org> |
[POWERPC] Fix register labels on show_regs() message for 4xx/Book-E In a show_regs() message The DEAR and ESR were reported as DAR and DSISR which only exist on classic parts. Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
|
#
791cc501 |
|
03-Jun-2007 |
Benjamin Herrenschmidt <benh@kernel.crashing.org> |
[POWERPC] Always apply DABR changes on context switches This patch removes the #ifdef CONFIG_PPC64 around setting the DABR. The actual setting of the SPR inside of the set_dabr() function is dependent on CONFIG_PPC64 || CONFIG_6xx but you can always provide a ppc_md hook to override that. We should improve support for different HW breakpoints facilities but this is a first step. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
e63340ae |
|
08-May-2007 |
Randy Dunlap <randy.dunlap@oracle.com> |
header cleaning: don't include smp_lock.h when not used Remove includes of <linux/smp_lock.h> where it is not used/needed. Suggested by Al Viro. Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc, sparc64, and arm (all 59 defconfigs). Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
a741e679 |
|
10-Apr-2007 |
Benjamin Herrenschmidt <benh@kernel.crashing.org> |
[POWERPC] Make tlb flush batch use lazy MMU mode The current tlb flush code on powerpc 64 bits has a subtle race since we lost the page table lock due to the possible faulting in of new PTEs after a previous one has been removed but before the corresponding hash entry has been evicted, which can leads to all sort of fatal problems. This patch reworks the batch code completely. It doesn't use the mmu_gather stuff anymore. Instead, we use the lazy mmu hooks that were added by the paravirt code. They have the nice property that the enter/leave lazy mmu mode pair is always fully contained by the PTE lock for a given range of PTEs. Thus we can guarantee that all batches are flushed on a given CPU before it drops that lock. We also generalize batching for any PTE update that require a flush. Batching is now enabled on a CPU by arch_enter_lazy_mmu_mode() and disabled by arch_leave_lazy_mmu_mode(). The code epects that this is always contained within a PTE lock section so no preemption can happen and no PTE insertion in that range from another CPU. When batching is enabled on a CPU, every PTE updates that need a hash flush will use the batch for that flush. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
f6f7dde3 |
|
20-Mar-2007 |
anton@samba.org <anton@samba.org> |
[POWERPC] Use lowercase for hex printouts in oops messages. Use lowercase for hex printouts in oops messages. The number of times I have tried to copy and paste from an oops into an objdump search... Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Olof Johansson <olof@lixom.net> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
4002aca7 |
|
20-Mar-2007 |
Anton Blanchard <anton@samba.org> |
[POWERPC] Remove last_syscall Remove last_syscall from 32bit powerpc, its been gone in 64bit for years. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
f144e7c7 |
|
10-Mar-2007 |
Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> |
[POWERPC] Fix atomicity of TIF update in flush_thread() Fix atomicity of TIF update in flush_thread() for powerpc Fixes it correctly with *_ti_thread_flag. Race : parent process executing : sys_ptrace() (lock_kernel()) (ptrace_get_task_struct(pid)) arch_ptrace() ptrace_detach() ptrace_disable(child); clear_singlestep(child); clear_tsk_thread_flag(child, TIF_SINGLESTEP); (which clears the TIF_SINGLESTEP flag atomically from a different process) (put_task_struct(child)) (unlock_kernel()) And at the same time, in the child process : sys_execve() do_execve() search_binary_handler() load_elf_binary() flush_old_exec() flush_thread() doing a non-atomic thread flag update Applies on 2.6.20. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
bb72c481 |
|
18-Feb-2007 |
Paul Mackerras <paulus@samba.org> |
[POWERPC] Harden validate_sp against stack corruption If something has overflowed or corrupted the stack and causes an oops, and we try to print a stack trace, that will call validate_sp, which can itself cause an oops if the cpu field of the thread_info struct at the bottom of the stack has been corrupted (if CONFIG_IRQSTACKS is set). This makes debugging harder. To avoid the second oops, this adds a check to make sure that the cpu number is reasonable before using it to check whether the stack is on the softirq or hardirq stack. Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
00ae36de |
|
12-Oct-2006 |
Anton Blanchard <anton@samba.org> |
[POWERPC] Better check in show_instructions Instead of just checking that an address is in the right range, use the provided __kernel_text_address() helper which covers both the kernel and module text sections. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
96b644bd |
|
02-Oct-2006 |
Serge E. Hallyn <serue@us.ibm.com> |
[PATCH] namespaces: utsname: use init_utsname when appropriate In some places, particularly drivers and __init code, the init utsns is the appropriate one to use. This patch replaces those with a the init_utsname helper. Changes: Removed several uses of init_utsname(). Hope I picked all the right ones in net/ipv4/ipconfig.c. These are now changed to utsname() (the per-process namespace utsname) in the previous patch (2/7) [akpm@osdl.org: CIFS fix] Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Andrey Savochkin <saw@sw.ru> Cc: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
6ab3d562 |
|
30-Jun-2006 |
Jörn Engel <joern@wohnheim.fh-wedel.de> |
Remove obsolete #include <linux/config.h> Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
#
e9370ae1 |
|
07-Jun-2006 |
Paul Mackerras <paulus@samba.org> |
[PATCH] powerpc: Implement PR_[GS]ET_UNALIGN prctls for powerpc This gives the ability to control whether alignment exceptions get fixed up or reported to the process as a SIGBUS, using the existing PR_SET_UNALIGN and PR_GET_UNALIGN prctls. We do not implement the option of logging a message on alignment exceptions. Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
fab5db97 |
|
07-Jun-2006 |
Paul Mackerras <paulus@samba.org> |
[PATCH] powerpc: Implement support for setting little-endian mode via prctl This adds the PowerPC part of the code to allow processes to change their endian mode via prctl. This also extends the alignment exception handler to be able to fix up alignment exceptions that occur in little-endian mode, both for "PowerPC" little-endian and true little-endian. We always enter signal handlers in big-endian mode -- the support for little-endian mode does not amount to the creation of a little-endian user/kernel ABI. If the signal handler returns, the endian mode is restored to what it was when the signal was delivered. We have two new kernel CPU feature bits, one for PPC little-endian and one for true little-endian. Most of the classic 32-bit processors support PPC little-endian, and this is reflected in the CPU feature table. There are two corresponding feature bits reported to userland in the AT_HWCAP aux vector entry. This is based on an earlier patch by Anton Blanchard. Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
0cb3463f |
|
31-Mar-2006 |
Adrian Bunk <bunk@stusta.de> |
[PATCH] unexport get_wchan The only user of get_wchan is the proc fs - and proc can't be built modular. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
2f25194d |
|
26-Mar-2006 |
Anton Blanchard <anton@samba.org> |
[PATCH] powerpc: export validate_sp for oprofile calltrace Export validate_sp so we can use it in the oprofile calltrace code. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
a7f31841 |
|
22-Mar-2006 |
Arnd Bergmann <abergman@de.ibm.com> |
[PATCH] powerpc: declare arch syscalls in <asm/syscalls.h> powerpc currently declares some of its own system calls in <asm/unistd.h>, but not all of them. That place also contains remainders of the now almost unused kernel syscall hack. - Add a new <asm/syscalls.h> with clean declarations - Include that file from every source that implements one of these - Get rid of old declarations in <asm/unistd.h> This patch is required as a base for implementing system calls from an SPU, but also makes sense as a general cleanup. Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
af308377 |
|
22-Mar-2006 |
Stephen Rothwell <sfr@canb.auug.org.au> |
[PATCH] powerpc: fix various sparse warnings Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
c6fd91f0 |
|
26-Mar-2006 |
bibo mao <bibo_mao@linux.intel.com> |
[PATCH] kretprobe instance recycled by parent process When kretprobe probes the schedule() function, if the probed process exits then schedule() will never return, so some kretprobe instances will never be recycled. In this patch the parent process will recycle retprobe instances of the probed function and there will be no memory leak of kretprobe instances. Signed-off-by: bibo mao <bibo.mao@intel.com> Cc: Masami Hiramatsu <hiramatu@sdl.hitachi.co.jp> Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
c6622f63 |
|
23-Feb-2006 |
Paul Mackerras <paulus@samba.org> |
powerpc: Implement accurate task and CPU time accounting This implements accurate task and cpu time accounting for 64-bit powerpc kernels. Instead of accounting a whole jiffy of time to a task on a timer interrupt because that task happened to be running at the time, we now account time in units of timebase ticks according to the actual time spent by the task in user mode and kernel mode. We also count the time spent processing hardware and software interrupts accurately. This is conditional on CONFIG_VIRT_CPU_ACCOUNTING. If that is not set, we do tick-based approximate accounting as before. To get this accurate information, we read either the PURR (processor utilization of resources register) on POWER5 machines, or the timebase on other machines on * each entry to the kernel from usermode * each exit to usermode * transitions between process context, hard irq context and soft irq context in kernel mode * context switches. On POWER5 systems with shared-processor logical partitioning we also read both the PURR and the timebase at each timer interrupt and context switch in order to determine how much time has been taken by the hypervisor to run other partitions ("steal" time). Unfortunately, since we need values of the PURR on both threads at the same time to accurately calculate the steal time, and since we can only calculate steal time on a per-core basis, the apportioning of the steal time between idle time (time which we ceded to the hypervisor in the idle loop) and actual stolen time is somewhat approximate at the moment. This is all based quite heavily on what s390 does, and it uses the generic interfaces that were added by the s390 developers, i.e. account_system_time(), account_user_time(), etc. This patch doesn't add any new interfaces between the kernel and userspace, and doesn't change the units in which time is reported to userspace by things such as /proc/stat, /proc/<pid>/stat, getrusage(), times(), etc. Internally the various task and cpu times are stored in timebase units, but they are converted to USER_HZ units (1/100th of a second) when reported to userspace. Some precision is therefore lost but there should not be any accumulating error, since the internal accumulation is at full precision. Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
cb2c9b27 |
|
12-Feb-2006 |
Anton Blanchard <anton@samba.org> |
[PATCH] powerpc: Fix runlatch performance issues The runlatch SPR can take a lot of time to write. My original runlatch code would set it on every exception entry even though most of the time this was not required. It would also continually set it in the idle loop, which is an issue on an SMT capable processor. Now we cache the runlatch value in a threadinfo bit, and only check for it in decrementer and hardware interrupt exceptions as well as the idle loop. Boot on POWER3, POWER5 and iseries, and compile tested on pmac32. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
2ef9481e |
|
23-Jan-2006 |
Jon Mason <jdmason@us.ibm.com> |
[PATCH] powerpc: trivial: modify comments to refer to new location of files This patch removes all self references and fixes references to files in the now defunct arch/ppc64 tree. I think this accomplises everything wanted, though there might be a few references I missed. Signed-off-by: Jon Mason <jdmason@us.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
0cec6fd1 |
|
12-Jan-2006 |
Al Viro <viro@ftp.linux.org.uk> |
[PATCH] powerpc: task_stack_page() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
b5e2fc1c |
|
12-Jan-2006 |
Al Viro <viro@ftp.linux.org.uk> |
[PATCH] powerpc: task_thread_info() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
624cee31 |
|
12-Jan-2006 |
Paul Mackerras <paulus@samba.org> |
powerpc: make ARCH=ppc use arch/powerpc/kernel/process.c Commit 5388fb1025443ec223ba556b10efc4c5f83f8682 made signal_32.c use discard_lazy_cpu_state, which broke ARCH=ppc because that uses the common signal_32.c but has its own process.c. Make ARCH=ppc use the common process.c to fix this and to reduce the amount of duplicated code. Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
5388fb10 |
|
11-Jan-2006 |
Paul Mackerras <paulus@samba.org> |
[PATCH] powerpc: Avoid potential FP corruption with preempt and UP Heikki Lindholm pointed out that there was a potential race with the lazy CPU state (FP, VR, EVR) stuff if preempt is enabled. The race is that in the process of restoring FP state on sigreturn, the task gets preempted by a user task that wants to use the FPU. It will take an FP unavailable exception, which will write the current FPU state to the thread_struct, overwriting the values which sigreturn has stored. Note that this can only happen on UP since we don't implement lazy CPU state on SMP. The fix is to flush the lazy CPU state before updating the thread_struct. To do this we re-use the flush_lazy_cpu_state() function from process.c. Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
48abec07 |
|
29-Nov-2005 |
Paul Mackerras <paulus@samba.org> |
powerpc: Fix bug causing FP registers corruption on UP + preempt This fixes a bug noticed by Paolo Galtieri and fixed for ARCH=ppc in the previous commit (ppc: fix floating point register corruption). This fixes the arch/powerpc code by adding preempt_disable/enable, and also cleans it up a bit by pulling out the code that discards any lazily-switched CPU register state into a new function, rather than having that code repeated in three places. Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
8bf1101b |
|
23-Nov-2005 |
Jim Keniston <jkenisto@us.ibm.com> |
[PATCH] kprobes: Fix return probes on sys_execve Fix a bug in kprobes that can cause an Oops or even a crash when a return probe is installed on one of the following functions: sys_execve, do_execve, load_*_binary, flush_old_exec, or flush_thread. The fix is to remove the call to kprobe_flush_task() in flush_thread(). This fix has been tested on all architectures for which the return-probes feature has been implemented (i386, x86_64, ppc64, ia64). Please apply. BACKGROUND Up to now, we have called kprobe_flush_task() under two situations: when a task exits, and when it execs. Flushing kretprobe_instances on exit is correct because (a) do_exit() doesn't return, and (b) one or more return-probed functions may be active when a task calls do_exit(). Neither is the case for sys_execve() and its callees. Initially, the mistaken call to kprobe_flush_task() on exec was harmless because we put the "real" return address of each active probed function back in the stack, just to be safe, when we recycled its kretprobe_instance. When support for ppc64 and ia64 was added, this safety measure couldn't be employed, and was eventually dropped even for i386 and x86_64. sys_execve() and its callees were informally blacklisted for return probes until this fix was developed. Acked-by: Prasanna S Panchamukhi <prasanna@in.ibm.com> Signed-off-by: Jim Keniston <jkenisto@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
76032de8 |
|
06-Nov-2005 |
Michael Ellerman <michael@ellerman.id.au> |
[PATCH] powerpc: Make ppc_md.set_dabr non 64-bit specific Define ppc_md.set_dabr for both 32 + 64 bit. Cleanup the implementation for pSeries also, it was needlessly complex. Now we just do two firmware tests at setup time, and use one of two functions, rather than using one function and testing on every call. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
3c726f8d |
|
06-Nov-2005 |
Benjamin Herrenschmidt <benh@kernel.crashing.org> |
[PATCH] ppc64: support 64k pages Adds a new CONFIG_PPC_64K_PAGES which, when enabled, changes the kernel base page size to 64K. The resulting kernel still boots on any hardware. On current machines with 4K pages support only, the kernel will maintain 16 "subpages" for each 64K page transparently. Note that while real 64K capable HW has been tested, the current patch will not enable it yet as such hardware is not released yet, and I'm still verifying with the firmware architects the proper to get the information from the newer hypervisors. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
cab0af98 |
|
02-Nov-2005 |
Michael Ellerman <michael@ellerman.id.au> |
powerpc: Make set_dabr() a ppc_md function Move pSeries specific code in set_dabr() into a ppc_md function, this will allow us to keep plpar_wrappers.h private to platforms/pseries. Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
|
#
25c8a78b |
|
27-Oct-2005 |
David Gibson <david@gibson.dropbear.id.au> |
[PATCH] powerpc: Fix handling of fpscr on 64-bit The recent merge of fpu.S broken the handling of fpscr for ARCH=powerpc and CONFIG_PPC64=y. FP registers could be corrupted, leading to strange random application crashes. The confusion arises, because the thread_struct has (and requires) a 64-bit area to save the fpscr, because we use load/store double instructions to get it in to/out of the FPU. However, only the low 32-bits are actually used, so we want to treat it as a 32-bit quantity when manipulating its bits to avoid extra load/stores on 32-bit. This patch replaces the current definition with a structure of two 32-bit quantities (pad and val), to clarify things as much as is possible. The 'val' field is used when manipulating bits, the structure itself is used when obtaining the address for loading/unloading the value from the FPU. While we're at it, consolidate the 4 (!) almost identical versions of cvt_fd() and cvt_df() (arch/ppc/kernel/misc.S, arch/ppc64/kernel/misc.S, arch/powerpc/kernel/misc_32.S, arch/powerpc/kernel/misc_64.S) into a single version in fpu.S. The new version takes a pointer to thread_struct and applies the correct offset itself, rather than a pointer to the fpscr field itself, again to avoid confusion as to which is the correct field to use. Finally, this patch makes ARCH=ppc64 also use the consolidated fpu.S code, which it previously did not. Built for G5 (ARCH=ppc64 and ARCH=powerpc), 32-bit powermac (ARCH=ppc and ARCH=powerpc) and Walnut (ARCH=ppc, CONFIG_MATH_EMULATION=y). Booted on G5 (ARCH=powerpc) and things which previously fell over no longer do. Signed-off-by: David Gibson <dwg@au1.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
90eac727 |
|
21-Oct-2005 |
Michael Ellerman <michael@ellerman.id.au> |
[PATCH] powerpc: Don't blow away load_addr in start_thread The patch to make process.c work for 32-bit and 64-bit (06d67d54741a5bfefa31945ef195dfa748c29025) broke some 64-bit binaries. We were blowing away load_addr in gpr[2], so we weren't properly relocating the entry point. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
d4bf9a78 |
|
12-Oct-2005 |
Stephen Rothwell <sfr@canb.auug.org.au> |
ppc64: merge binfmt_elf32.c and use start_thread for both 32 and 64 bit bineries. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
|
#
0f17d074 |
|
12-Oct-2005 |
Stephen Rothwell <sfr@canb.auug.org.au> |
powerpc: make 64 bit binaries work Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
|
#
06d67d54 |
|
10-Oct-2005 |
Paul Mackerras <paulus@samba.org> |
powerpc: make process.c suitable for both 32-bit and 64-bit Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
c0c0d996 |
|
30-Sep-2005 |
Paul Mackerras <paulus@samba.org> |
powerpc: Get merged kernel to compile and run on 32-bit SMP powermac. This updates the powermac SMP code to use the mpic driver instead of the openpic driver and fixes the SMP-dependent context switch code. We had a subtle bug where we were using interrupt numbers 256-259 for IPIs, but ppc32 had NR_IRQS = 256. Moved the IPIs down to use interrupt numbers 252-255 instead. Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
20c8c210 |
|
28-Sep-2005 |
Paul Mackerras <paulus@samba.org> |
powerpc: Fixes to get the merged kernel to boot on powermac. This merges ppc_ksyms.c, puts back the actual do_execve call in sys_execve, makes init_MMU call find_end_of_memory rather than ppc_md.find_end_of_memory (every platform has a device tree with a /memory node now, right?) and fixes some problems with the mpic initialization on newworld powermacs. Signed-off-by: Paul Mackerras <paulus@samba.org>
|
#
14cf11af |
|
26-Sep-2005 |
Paul Mackerras <paulus@samba.org> |
powerpc: Merge enough to start building in arch/powerpc. This creates the directory structure under arch/powerpc and a bunch of Kconfig files. It does a first-cut merge of arch/powerpc/mm, arch/powerpc/lib and arch/powerpc/platforms/powermac. This is enough to build a 32-bit powermac kernel with ARCH=powerpc. For now we are getting some unmerged files from arch/ppc/kernel and arch/ppc/syslib, or arch/ppc64/kernel. This makes some minor changes to files in those directories and files outside arch/powerpc. The boot directory is still not merged. That's going to be interesting. Signed-off-by: Paul Mackerras <paulus@samba.org>
|