History log of /linux-master/arch/riscv/kernel/entry.S
Revision Date Author Comments
# 5014396a 04-Oct-2023 Clément Léger <cleger@rivosinc.com>

riscv: blacklist assembly symbols for kprobe

Adding kprobes on some assembly functions (mainly exception handling)
will result in crashes (either recursive trap or panic). To avoid such
errors, add ASM_NOKPROBE() macro which allow adding specific symbols
into the __kprobe_blacklist section and use to blacklist the following
symbols that showed to be problematic:
- handle_exception()
- ret_from_exception()
- handle_kernel_stack_overflow()

Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Link: https://lore.kernel.org/r/20231004131009.409193-1-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>


# 2080ff94 14-Jan-2024 Andy Chiu <andy.chiu@sifive.com>

riscv: vector: allow kernel-mode Vector with preemption

Add kernel_vstate to keep track of kernel-mode Vector registers when
trap introduced context switch happens. Also, provide riscv_v_flags to
let context save/restore routine track context status. Context tracking
happens whenever the core starts its in-kernel Vector executions. An
active (dirty) kernel task's V contexts will be saved to memory whenever
a trap-introduced context switch happens. Or, when a softirq, which
happens to nest on top of it, uses Vector. Context retoring happens when
the execution transfer back to the original Kernel context where it
first enable preempt_v.

Also, provide a config CONFIG_RISCV_ISA_V_PREEMPTIVE to give users an
option to disable preemptible kernel-mode Vector at build time. Users
with constraint memory may want to disable this config as preemptible
kernel-mode Vector needs extra space for tracking of per thread's
kernel-mode V context. Or, users might as well want to disable it if all
kernel-mode Vector code is time sensitive and cannot tolerate context
switch overhead.

Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Link: https://lore.kernel.org/r/20240115055929.4736-11-andy.chiu@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>


# 4cc0d8a3 24-Oct-2023 Clément Léger <cleger@rivosinc.com>

riscv: kernel: Use correct SYM_DATA_*() macro for data

Some data were incorrectly annotated with SYM_FUNC_*() instead of
SYM_DATA_*() ones. Use the correct ones.

Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20231024132655.730417-4-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>


# b18f7296 24-Oct-2023 Clément Léger <cleger@rivosinc.com>

riscv: use ".L" local labels in assembly when applicable

For the sake of coherency, use local labels in assembly when
applicable. This also avoid kprobes being confused when applying a
kprobe since the size of function is computed by checking where the
next visible symbol is located. This might end up in computing some
function size to be way shorter than expected and thus failing to apply
kprobes to the specified offset.

Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20231024132655.730417-2-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>


# 87615e95 21-Aug-2023 Nam Cao <namcaov@gmail.com>

riscv: put interrupt entries into .irqentry.text

The interrupt entries are expected to be in the .irqentry.text section.
For example, for kprobes to work properly, exception code cannot be
probed; this is ensured by blacklisting addresses in the .irqentry.text
section.

Fixes: 7db91e57a0ac ("RISC-V: Task implementation")
Signed-off-by: Nam Cao <namcaov@gmail.com>
Link: https://lore.kernel.org/r/20230821145708.21270-1-namcaov@gmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>


# c40fef85 27-Sep-2023 Sami Tolvanen <samitolvanen@google.com>

riscv: Use separate IRQ shadow call stacks

When both CONFIG_IRQ_STACKS and SCS are enabled, also use a separate
per-CPU shadow call stack.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20230927224757.1154247-13-samitolvanen@google.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>


# d1584d79 27-Sep-2023 Sami Tolvanen <samitolvanen@google.com>

riscv: Implement Shadow Call Stack

Implement CONFIG_SHADOW_CALL_STACK for RISC-V. When enabled, the
compiler injects instructions to all non-leaf C functions to
store the return address to the shadow stack and unconditionally
load it again before returning, which makes it harder to corrupt
the return address through a stack overflow, for example.

The active shadow call stack pointer is stored in the gp
register, which makes SCS incompatible with gp relaxation. Use
--no-relax-gp to ensure gp relaxation is disabled and disable
global pointer loading. Add SCS pointers to struct thread_info,
implement SCS initialization, and task switching

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20230927224757.1154247-12-samitolvanen@google.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>


# e609b4f4 27-Sep-2023 Sami Tolvanen <samitolvanen@google.com>

riscv: Move global pointer loading to a macro

In Clang 17, -fsanitize=shadow-call-stack uses the newly declared
platform register gp for storing shadow call stack pointers. As
this is obviously incompatible with gp relaxation, in preparation
for CONFIG_SHADOW_CALL_STACK support, move global pointer loading
to a single macro, which we can cleanly disable when SCS is used
instead.

Link: https://reviews.llvm.org/rGaa1d2693c256
Link: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/commit/a484e843e6eeb51f0cb7b8819e50da6d2444d769
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20230927224757.1154247-11-samitolvanen@google.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>


# 82982fdd 27-Sep-2023 Sami Tolvanen <samitolvanen@google.com>

riscv: Deduplicate IRQ stack switching

With CONFIG_IRQ_STACKS, we switch to a separate per-CPU IRQ stack
before calling handle_riscv_irq or __do_softirq. We currently
have duplicate inline assembly snippets for stack switching in
both code paths. Now that we can access per-CPU variables in
assembly, implement call_on_irq_stack in assembly, and use that
instead of redundant inline assembly.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20230927224757.1154247-10-samitolvanen@google.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>


# be97d0db 27-Sep-2023 Deepak Gupta <debug@rivosinc.com>

riscv: VMAP_STACK overflow detection thread-safe

commit 31da94c25aea ("riscv: add VMAP_STACK overflow detection") added
support for CONFIG_VMAP_STACK. If overflow is detected, CPU switches to
`shadow_stack` temporarily before switching finally to per-cpu
`overflow_stack`.

If two CPUs/harts are racing and end up in over flowing kernel stack, one
or both will end up corrupting each other state because `shadow_stack` is
not per-cpu. This patch optimizes per-cpu overflow stack switch by
directly picking per-cpu `overflow_stack` and gets rid of `shadow_stack`.

Following are the changes in this patch

- Defines an asm macro to obtain per-cpu symbols in destination
register.
- In entry.S, when overflow is detected, per-cpu overflow stack is
located using per-cpu asm macro. Computing per-cpu symbol requires
a temporary register. x31 is saved away into CSR_SCRATCH
(CSR_SCRATCH is anyways zero since we're in kernel).

Please see Links for additional relevant disccussion and alternative
solution.

Tested by `echo EXHAUST_STACK > /sys/kernel/debug/provoke-crash/DIRECT`
Kernel crash log below

Insufficient stack space to handle exception!/debug/provoke-crash/DIRECT
Task stack: [0xff20000010a98000..0xff20000010a9c000]
Overflow stack: [0xff600001f7d98370..0xff600001f7d99370]
CPU: 1 PID: 205 Comm: bash Not tainted 6.1.0-rc2-00001-g328a1f96f7b9 #34
Hardware name: riscv-virtio,qemu (DT)
epc : __memset+0x60/0xfc
ra : recursive_loop+0x48/0xc6 [lkdtm]
epc : ffffffff808de0e4 ra : ffffffff0163a752 sp : ff20000010a97e80
gp : ffffffff815c0330 tp : ff600000820ea280 t0 : ff20000010a97e88
t1 : 000000000000002e t2 : 3233206874706564 s0 : ff20000010a982b0
s1 : 0000000000000012 a0 : ff20000010a97e88 a1 : 0000000000000000
a2 : 0000000000000400 a3 : ff20000010a98288 a4 : 0000000000000000
a5 : 0000000000000000 a6 : fffffffffffe43f0 a7 : 00007fffffffffff
s2 : ff20000010a97e88 s3 : ffffffff01644680 s4 : ff20000010a9be90
s5 : ff600000842ba6c0 s6 : 00aaaaaac29e42b0 s7 : 00fffffff0aa3684
s8 : 00aaaaaac2978040 s9 : 0000000000000065 s10: 00ffffff8a7cad10
s11: 00ffffff8a76a4e0 t3 : ffffffff815dbaf4 t4 : ffffffff815dbaf4
t5 : ffffffff815dbab8 t6 : ff20000010a9bb48
status: 0000000200000120 badaddr: ff20000010a97e88 cause: 000000000000000f
Kernel panic - not syncing: Kernel stack overflow
CPU: 1 PID: 205 Comm: bash Not tainted 6.1.0-rc2-00001-g328a1f96f7b9 #34
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<ffffffff80006754>] dump_backtrace+0x30/0x38
[<ffffffff808de798>] show_stack+0x40/0x4c
[<ffffffff808ea2a8>] dump_stack_lvl+0x44/0x5c
[<ffffffff808ea2d8>] dump_stack+0x18/0x20
[<ffffffff808dec06>] panic+0x126/0x2fe
[<ffffffff800065ea>] walk_stackframe+0x0/0xf0
[<ffffffff0163a752>] recursive_loop+0x48/0xc6 [lkdtm]
SMP: stopping secondary CPUs
---[ end Kernel panic - not syncing: Kernel stack overflow ]---

Cc: Guo Ren <guoren@kernel.org>
Cc: Jisheng Zhang <jszhang@kernel.org>
Link: https://lore.kernel.org/linux-riscv/Y347B0x4VUNOd6V7@xhacker/T/#t
Link: https://lore.kernel.org/lkml/20221124094845.1907443-1-debug@rivosinc.com/
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
Co-developed-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Acked-by: Guo Ren <guoren@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20230927224757.1154247-9-samitolvanen@google.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>


# 4681daca 23-Apr-2023 Fangrui Song <maskray@google.com>

riscv: replace deprecated scall with ecall

scall is a deprecated alias for ecall. ecall is used in several places,
so there is no assembler compatibility concern.

Signed-off-by: Fangrui Song <maskray@google.com>
Link: https://lore.kernel.org/r/20230423223210.126948-1-maskray@google.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>


# 74abe5a3 05-Jun-2023 Guo Ren <guoren@kernel.org>

riscv: Disable Vector Instructions for kernel itself

Disable vector instructions execution for kernel mode at its entrances.
This helps find illegal uses of vector in the kernel space, which is
similar to the fpu.

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Co-developed-by: Vincent Chen <vincent.chen@sifive.com>
Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
Co-developed-by: Han-Kuan Chen <hankuan.chen@sifive.com>
Signed-off-by: Han-Kuan Chen <hankuan.chen@sifive.com>
Co-developed-by: Greentime Hu <greentime.hu@sifive.com>
Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Tested-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Link: https://lore.kernel.org/r/20230605110724.21391-7-andy.chiu@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>


# 45b32b94 21-Feb-2023 Jisheng Zhang <jszhang@kernel.org>

riscv: entry: Consolidate general regs saving/restoring

Consolidate the saving/restoring GPs (except zero, ra, sp, gp,
tp and t0) into save_from_x6_to_x31/restore_from_x6_to_x31 macros.

No functional change intended.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Tested-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20230222033021.983168-8-guoren@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>


# ab9164da 21-Feb-2023 Jisheng Zhang <jszhang@kernel.org>

riscv: entry: Consolidate ret_from_kernel_thread into ret_from_fork

The ret_from_kernel_thread() behaves similarly with ret_from_fork(),
the only difference is whether call the fn(arg) or not, this can be
achieved by testing fn is NULL or not, I.E s0 is 0 or not. Many
architectures have done the same thing, it makes entry.S more clean.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Tested-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20230222033021.983168-7-guoren@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>


# f0bddf50 21-Feb-2023 Guo Ren <guoren@kernel.org>

riscv: entry: Convert to generic entry

This patch converts riscv to use the generic entry infrastructure from
kernel/entry/*. The generic entry makes maintainers' work easier and
codes more elegant. Here are the changes:

- More clear entry.S with handle_exception and ret_from_exception
- Get rid of complex custom signal implementation
- Move syscall procedure from assembly to C, which is much more
readable.
- Connect ret_from_fork & ret_from_kernel_thread to generic entry.
- Wrap with irqentry_enter/exit and syscall_enter/exit_from_user_mode
- Use the standard preemption code instead of custom

Suggested-by: Huacai Chen <chenhuacai@kernel.org>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Tested-by: Yipeng Zou <zouyipeng@huawei.com>
Tested-by: Jisheng Zhang <jszhang@kernel.org>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Cc: Ben Hutchings <ben@decadent.org.uk>
Link: https://lore.kernel.org/r/20230222033021.983168-5-guoren@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>


# b0f4c74e 11-Nov-2022 Andrew Bresticker <abrestic@rivosinc.com>

RISC-V: Fix unannoted hardirqs-on in return to userspace slow-path

The return to userspace path in entry.S may enable interrupts without the
corresponding lockdep annotation, producing a splat[0] when DEBUG_LOCKDEP
is enabled. Simply calling __trace_hardirqs_on() here gets a bit messy
due to the use of RA to point back to ret_from_exception, so just move
the whole slow-path loop into C. It's more readable and it lets us use
local_irq_{enable,disable}(), avoiding the need for manual annotations
altogether.

[0]:
------------[ cut here ]------------
DEBUG_LOCKS_WARN_ON(!lockdep_hardirqs_enabled())
WARNING: CPU: 2 PID: 1 at kernel/locking/lockdep.c:5512 check_flags+0x10a/0x1e0
Modules linked in:
CPU: 2 PID: 1 Comm: init Not tainted 6.1.0-rc4-00160-gb56b6e2b4f31 #53
Hardware name: riscv-virtio,qemu (DT)
epc : check_flags+0x10a/0x1e0
ra : check_flags+0x10a/0x1e0
<snip>
status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003
[<ffffffff808edb90>] lock_is_held_type+0x78/0x14e
[<ffffffff8003dae2>] __might_resched+0x26/0x22c
[<ffffffff8003dd24>] __might_sleep+0x3c/0x66
[<ffffffff80022c60>] get_signal+0x9e/0xa70
[<ffffffff800054a2>] do_notify_resume+0x6e/0x422
[<ffffffff80003c68>] ret_from_exception+0x0/0x10
irq event stamp: 44512
hardirqs last enabled at (44511): [<ffffffff808f901c>] _raw_spin_unlock_irqrestore+0x54/0x62
hardirqs last disabled at (44512): [<ffffffff80008200>] __trace_hardirqs_off+0xc/0x14
softirqs last enabled at (44472): [<ffffffff808f9fbe>] __do_softirq+0x3de/0x51e
softirqs last disabled at (44467): [<ffffffff80017760>] irq_exit+0xd6/0x104
---[ end trace 0000000000000000 ]---
possible reason: unannotated irqs-on.

Signed-off-by: Andrew Bresticker <abrestic@rivosinc.com>
Fixes: 3c4697982982 ("riscv: Enable LOCKDEP_SUPPORT & fixup TRACE_IRQFLAGS_SUPPORT")
Link: https://lore.kernel.org/r/20221111223108.1976562-1-abrestic@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>


# 7ecdadf7 08-Nov-2022 Guo Ren <guoren@kernel.org>

riscv: stacktrace: Make walk_stackframe cross pt_regs frame

The current walk_stackframe with FRAME_POINTER would stop unwinding at
ret_from_exception:
BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1518
in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 1, name: init
CPU: 0 PID: 1 Comm: init Not tainted 5.10.113-00021-g15c15974895c-dirty #192
Call Trace:
[<ffffffe0002038c8>] walk_stackframe+0x0/0xee
[<ffffffe000aecf48>] show_stack+0x32/0x4a
[<ffffffe000af1618>] dump_stack_lvl+0x72/0x8e
[<ffffffe000af1648>] dump_stack+0x14/0x1c
[<ffffffe000239ad2>] ___might_sleep+0x12e/0x138
[<ffffffe000239aec>] __might_sleep+0x10/0x18
[<ffffffe000afe3fe>] down_read+0x22/0xa4
[<ffffffe000207588>] do_page_fault+0xb0/0x2fe
[<ffffffe000201b80>] ret_from_exception+0x0/0xc

The optimization would help walk_stackframe cross the pt_regs frame and
get more backtrace of debug info:
BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1518
in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 1, name: init
CPU: 0 PID: 1 Comm: init Not tainted 5.10.113-00021-g15c15974895c-dirty #192
Call Trace:
[<ffffffe0002038c8>] walk_stackframe+0x0/0xee
[<ffffffe000aecf48>] show_stack+0x32/0x4a
[<ffffffe000af1618>] dump_stack_lvl+0x72/0x8e
[<ffffffe000af1648>] dump_stack+0x14/0x1c
[<ffffffe000239ad2>] ___might_sleep+0x12e/0x138
[<ffffffe000239aec>] __might_sleep+0x10/0x18
[<ffffffe000afe3fe>] down_read+0x22/0xa4
[<ffffffe000207588>] do_page_fault+0xb0/0x2fe
[<ffffffe000201b80>] ret_from_exception+0x0/0xc
[<ffffffe000613c06>] riscv_intc_irq+0x1a/0x72
[<ffffffe000201b80>] ret_from_exception+0x0/0xc
[<ffffffe00033f44a>] vma_link+0x54/0x160
[<ffffffe000341d7a>] mmap_region+0x2cc/0x4d0
[<ffffffe000342256>] do_mmap+0x2d8/0x3ac
[<ffffffe000326318>] vm_mmap_pgoff+0x70/0xb8
[<ffffffe00032638a>] vm_mmap+0x2a/0x36
[<ffffffe0003cfdde>] elf_map+0x72/0x84
[<ffffffe0003d05f8>] load_elf_binary+0x69a/0xec8
[<ffffffe000376240>] bprm_execve+0x246/0x53a
[<ffffffe00037786c>] kernel_execve+0xe8/0x124
[<ffffffe000aecdf2>] run_init_process+0xfa/0x10c
[<ffffffe000aece16>] try_to_run_init_process+0x12/0x3c
[<ffffffe000afa920>] kernel_init+0xb4/0xf8
[<ffffffe000201b80>] ret_from_exception+0x0/0xc

Here is the error injection test code for the above output:
drivers/irqchip/irq-riscv-intc.c:
static asmlinkage void riscv_intc_irq(struct pt_regs *regs)
{
unsigned long cause = regs->cause & ~CAUSE_IRQ_FLAG;
+ u32 tmp; __get_user(tmp, (u32 *)0);

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20221109064937.3643993-3-guoren@kernel.org
[Palmer: use SYM_CODE_*]
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>


# 7e186433 30-Oct-2022 Jisheng Zhang <jszhang@kernel.org>

riscv: fix race when vmap stack overflow

Currently, when detecting vmap stack overflow, riscv firstly switches
to the so called shadow stack, then use this shadow stack to call the
get_overflow_stack() to get the overflow stack. However, there's
a race here if two or more harts use the same shadow stack at the same
time.

To solve this race, we introduce spin_shadow_stack atomic var, which
will be swap between its own address and 0 in atomic way, when the
var is set, it means the shadow_stack is being used; when the var
is cleared, it means the shadow_stack isn't being used.

Fixes: 31da94c25aea ("riscv: add VMAP_STACK overflow detection")
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Suggested-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20221030124517.2370-1-jszhang@kernel.org
[Palmer: Add AQ to the swap, and also some comments.]
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>


# 24a9c541 08-Jun-2022 Frederic Weisbecker <frederic@kernel.org>

context_tracking: Split user tracking Kconfig

Context tracking is going to be used not only to track user transitions
but also idle/IRQs/NMIs. The user tracking part will then become a
separate feature. Prepare Kconfig for that.

[ frederic: Apply Max Filippov feedback. ]

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Nicolas Saenz Julienne <nsaenz@kernel.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Cc: Yu Liao <liaoyu15@huawei.com>
Cc: Phil Auld <pauld@redhat.com>
Cc: Paul Gortmaker<paul.gortmaker@windriver.com>
Cc: Alex Belits <abelits@marvell.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>


# f163f030 08-Jun-2022 Frederic Weisbecker <frederic@kernel.org>

context_tracking: Rename context_tracking_user_enter/exit() to user_enter/exit_callable()

context_tracking_user_enter() and context_tracking_user_exit() are
ASM callable versions of user_enter() and user_exit() for architectures
that didn't manage to check the context tracking static key from ASM.
Change those function names to better reflect their purpose.

[ frederic: Apply Max Filippov feedback. ]

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Nicolas Saenz Julienne <nsaenz@kernel.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Cc: Yu Liao <liaoyu15@huawei.com>
Cc: Phil Auld <pauld@redhat.com>
Cc: Paul Gortmaker<paul.gortmaker@windriver.com>
Cc: Alex Belits <abelits@marvell.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>


# dfb0bfa7 05-Apr-2022 Guo Ren <guoren@kernel.org>

riscv: compat: syscall: Add entry.S implementation

Implement the entry of compat_sys_call_table[] in asm. Ref to
riscv-privileged spec 4.1.1 Supervisor Status Register (sstatus):

BIT[32:33] = UXL[1:0]:
- 1:32
- 2:64
- 3:128

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Link: https://lore.kernel.org/r/20220405071314.3225832-13-guoren@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>


# 93917ad5 08-Mar-2022 Vincent Chen <vincent.chen@sifive.com>

RISC-V: Add support for restartable sequence

Add calls to rseq_signal_deliver() and rseq_syscall() to introduce RSEQ
support.

1. Call the rseq_signal_deliver() function to fixup on the pre-signal
frame when a signal is delivered on top of a restartable sequence
critical section.

2. Check that system calls are not invoked from within rseq critical
sections by invoking rseq_signal() from ret_from_syscall(). With
CONFIG_DEBUG_RSEQ, such behavior results in termination of the
process with SIGSEGV.

Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>


# 22e2100b 13-Feb-2022 Changbin Du <changbin.du@intel.com>

riscv: fix oops caused by irqsoff latency tracer

The trace_hardirqs_{on,off}() require the caller to setup frame pointer
properly. This because these two functions use macro 'CALLER_ADDR1' (aka.
__builtin_return_address(1)) to acquire caller info. If the $fp is used
for other purpose, the code generated this macro (as below) could trigger
memory access fault.

0xffffffff8011510e <+80>: ld a1,-16(s0)
0xffffffff80115112 <+84>: ld s2,-8(a1) # <-- paging fault here

The oops message during booting if compiled with 'irqoff' tracer enabled:
[ 0.039615][ T0] Unable to handle kernel NULL pointer dereference at virtual address 00000000000000f8
[ 0.041925][ T0] Oops [#1]
[ 0.042063][ T0] Modules linked in:
[ 0.042864][ T0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.17.0-rc1-00233-g9a20c48d1ed2 #29
[ 0.043568][ T0] Hardware name: riscv-virtio,qemu (DT)
[ 0.044343][ T0] epc : trace_hardirqs_on+0x56/0xe2
[ 0.044601][ T0] ra : restore_all+0x12/0x6e
[ 0.044721][ T0] epc : ffffffff80126a5c ra : ffffffff80003b94 sp : ffffffff81403db0
[ 0.044801][ T0] gp : ffffffff8163acd8 tp : ffffffff81414880 t0 : 0000000000000020
[ 0.044882][ T0] t1 : 0098968000000000 t2 : 0000000000000000 s0 : ffffffff81403de0
[ 0.044967][ T0] s1 : 0000000000000000 a0 : 0000000000000001 a1 : 0000000000000100
[ 0.045046][ T0] a2 : 0000000000000000 a3 : 0000000000000000 a4 : 0000000000000000
[ 0.045124][ T0] a5 : 0000000000000000 a6 : 0000000000000000 a7 : 0000000054494d45
[ 0.045210][ T0] s2 : ffffffff80003b94 s3 : ffffffff81a8f1b0 s4 : ffffffff80e27b50
[ 0.045289][ T0] s5 : ffffffff81414880 s6 : ffffffff8160fa00 s7 : 00000000800120e8
[ 0.045389][ T0] s8 : 0000000080013100 s9 : 000000000000007f s10: 0000000000000000
[ 0.045474][ T0] s11: 0000000000000000 t3 : 7fffffffffffffff t4 : 0000000000000000
[ 0.045548][ T0] t5 : 0000000000000000 t6 : ffffffff814aa368
[ 0.045620][ T0] status: 0000000200000100 badaddr: 00000000000000f8 cause: 000000000000000d
[ 0.046402][ T0] [<ffffffff80003b94>] restore_all+0x12/0x6e

This because the $fp(aka. $s0) register is not used as frame pointer in the
assembly entry code.

resume_kernel:
REG_L s0, TASK_TI_PREEMPT_COUNT(tp)
bnez s0, restore_all
REG_L s0, TASK_TI_FLAGS(tp)
andi s0, s0, _TIF_NEED_RESCHED
beqz s0, restore_all
call preempt_schedule_irq
j restore_all

To fix above issue, here we add one extra level wrapper for function
trace_hardirqs_{on,off}() so they can be safely called by low level entry
code.

Signed-off-by: Changbin Du <changbin.du@gmail.com>
Fixes: 3c4697982982 ("riscv: Enable LOCKDEP_SUPPORT & fixup TRACE_IRQFLAGS_SUPPORT")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>


# 7ecbc648 20-Oct-2021 Mark Rutland <mark.rutland@arm.com>

irq: riscv: perform irqentry in entry code

In preparation for removing HANDLE_DOMAIN_IRQ_IRQENTRY, have arch/riscv
perform all the irqentry accounting in its entry code. As arch/riscv
uses GENERIC_IRQ_MULTI_HANDLER, we can use generic_handle_arch_irq() to
do so.

Since generic_handle_arch_irq() handles the irq entry and setting the
irq regs, and happens before the irqchip code calls handle_IPI(), we can
remove the redundant irq entry and irq regs manipulation from
handle_IPI().

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Thomas Gleixner <tglx@linutronix.de>


# 8aa0fb0f 14-Sep-2021 Ard Biesheuvel <ardb@kernel.org>

riscv: rely on core code to keep thread_info::cpu updated

Now that the core code switched back to using thread_info::cpu to keep
a task's CPU number, we no longer need to keep it in sync explicitly. So
just drop the code that does this.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>


# 31da94c2 20-Jun-2021 Tong Tiangen <tongtiangen@huawei.com>

riscv: add VMAP_STACK overflow detection

This patch adds stack overflow detection to riscv, usable when
CONFIG_VMAP_STACK=y.

Overflow is detected in kernel exception entry(kernel/entry.S), if the
kernel stack is overflow and been detected, the overflow handler is
invoked on a per-cpu overflow stack. This approach preserves GPRs and
the original exception information.

The overflow detect is performed before any attempt is made to access
the stack and the principle of stack overflow detection: kernel stacks
are aligned to double their size, enabling overflow to be detected with
a single bit test. For example, a 16K stack is aligned to 32K, ensuring
that bit 14 of the SP must be zero. On an overflow (or underflow), this
bit is flipped. Thus, overflow (of less than the size of the stack) can
be detected by testing whether this bit is set.

This gives us a useful error message on stack overflow, as can be
trigger with the LKDTM overflow test:

[ 388.053267] lkdtm: Performing direct entry EXHAUST_STACK
[ 388.053663] lkdtm: Calling function with 1024 frame size to depth 32 ...
[ 388.054016] lkdtm: loop 32/32 ...
[ 388.054186] lkdtm: loop 31/32 ...
[ 388.054491] lkdtm: loop 30/32 ...
[ 388.054672] lkdtm: loop 29/32 ...
[ 388.054859] lkdtm: loop 28/32 ...
[ 388.055010] lkdtm: loop 27/32 ...
[ 388.055163] lkdtm: loop 26/32 ...
[ 388.055309] lkdtm: loop 25/32 ...
[ 388.055481] lkdtm: loop 24/32 ...
[ 388.055653] lkdtm: loop 23/32 ...
[ 388.055837] lkdtm: loop 22/32 ...
[ 388.056015] lkdtm: loop 21/32 ...
[ 388.056188] lkdtm: loop 20/32 ...
[ 388.058145] Insufficient stack space to handle exception!
[ 388.058153] Task stack: [0xffffffd014260000..0xffffffd014264000]
[ 388.058160] Overflow stack: [0xffffffe1f8d2c220..0xffffffe1f8d2d220]
[ 388.058168] CPU: 0 PID: 89 Comm: bash Not tainted 5.12.0-rc8-dirty #90
[ 388.058175] Hardware name: riscv-virtio,qemu (DT)
[ 388.058187] epc : number+0x32/0x2c0
[ 388.058247] ra : vsnprintf+0x2ae/0x3f0
[ 388.058255] epc : ffffffe0002d38f6 ra : ffffffe0002d814e sp : ffffffd01425ffc0
[ 388.058263] gp : ffffffe0012e4010 tp : ffffffe08014da00 t0 : ffffffd0142606e8
[ 388.058271] t1 : 0000000000000000 t2 : 0000000000000000 s0 : ffffffd014260070
[ 388.058303] s1 : ffffffd014260158 a0 : ffffffd01426015e a1 : ffffffd014260158
[ 388.058311] a2 : 0000000000000013 a3 : ffff0a01ffffff10 a4 : ffffffe000c398e0
[ 388.058319] a5 : 511b02ec65f3e300 a6 : 0000000000a1749a a7 : 0000000000000000
[ 388.058327] s2 : ffffffff000000ff s3 : 00000000ffff0a01 s4 : ffffffe0012e50a8
[ 388.058335] s5 : 0000000000ffff0a s6 : ffffffe0012e50a8 s7 : ffffffe000da1cc0
[ 388.058343] s8 : ffffffffffffffff s9 : ffffffd0142602b0 s10: ffffffd0142602a8
[ 388.058351] s11: ffffffd01426015e t3 : 00000000000f0000 t4 : ffffffffffffffff
[ 388.058359] t5 : 000000000000002f t6 : ffffffd014260158
[ 388.058366] status: 0000000000000100 badaddr: ffffffd01425fff8 cause: 000000000000000f
[ 388.058374] Kernel panic - not syncing: Kernel stack overflow
[ 388.058381] CPU: 0 PID: 89 Comm: bash Not tainted 5.12.0-rc8-dirty #90
[ 388.058387] Hardware name: riscv-virtio,qemu (DT)
[ 388.058393] Call Trace:
[ 388.058400] [<ffffffe000004944>] walk_stackframe+0x0/0xce
[ 388.058406] [<ffffffe0006f0b28>] dump_backtrace+0x38/0x46
[ 388.058412] [<ffffffe0006f0b46>] show_stack+0x10/0x18
[ 388.058418] [<ffffffe0006f3690>] dump_stack+0x74/0x8e
[ 388.058424] [<ffffffe0006f0d52>] panic+0xfc/0x2b2
[ 388.058430] [<ffffffe0006f0acc>] print_trace_address+0x0/0x24
[ 388.058436] [<ffffffe0002d814e>] vsnprintf+0x2ae/0x3f0
[ 388.058956] SMP: stopping secondary CPUs

Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>


# 800149a7 22-Mar-2021 Vincent Chen <vincent.chen@sifive.com>

riscv: sifive: Apply errata "cip-453" patch

Add sign extension to the $badaddr before addressing the instruction page
fault and instruction access fault to workaround the issue "cip-453".

To avoid affecting the existing code sequence, this patch will creates two
trampolines to add sign extension to the $badaddr. By the "alternative"
mechanism, these two trampolines will replace the original exception
handler of instruction page fault and instruction access fault in the
excp_vect_table. In this case, only the specific SiFive CPU core jumps to
the do_page_fault and do_trap_insn_fault through these two trampolines.
Other CPUs are not affected.

Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>


# 7ae11635 29-Mar-2021 Jisheng Zhang <jszhang@kernel.org>

riscv: keep interrupts disabled for BREAKPOINT exception

Current riscv's kprobe handlers are run with both preemption and
interrupt enabled, this violates kprobe requirements. Fix this issue
by keeping interrupts disabled for BREAKPOINT exception.

Fixes: c22b0bcb1dd0 ("riscv: Add kprobes supported")
Cc: stable@vger.kernel.org
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
[Palmer: add a comment]
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>


# ac8d0b90 17-Mar-2021 Zihao Yu <yuzihao@ict.ac.cn>

riscv,entry: fix misaligned base for excp_vect_table

In RV64, the size of each entry in excp_vect_table is 8 bytes. If the
base of the table is not 8-byte aligned, loading an entry in the table
will raise a misaligned exception. Although such exception will be
handled by opensbi/bbl, this still causes performance degradation.

Signed-off-by: Zihao Yu <yuzihao@ict.ac.cn>
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>


# 7cd1af10 18-Dec-2020 Atish Patra <atish.patra@wdc.com>

riscv: Trace irq on only interrupt is enabled

We should call irq trace only if interrupt is going to be enabled during
excecption handling. Otherwise, it results in following warning during
boot with lock debugging enabled.

[ 0.000000] ------------[ cut here ]------------
[ 0.000000] DEBUG_LOCKS_WARN_ON(early_boot_irqs_disabled)
[ 0.000000] WARNING: CPU: 0 PID: 0 at kernel/locking/lockdep.c:4085 lockdep_hardirqs_on_prepare+0x22a/0x22e
[ 0.000000] Modules linked in:
[ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0-00022-ge20097fb37e2-dirty #548
[ 0.000000] epc: c005d5d4 ra : c005d5d4 sp : c1c01e80
[ 0.000000] gp : c1d456e0 tp : c1c0a980 t0 : 00000000
[ 0.000000] t1 : ffffffff t2 : 00000000 s0 : c1c01ea0
[ 0.000000] s1 : c100f360 a0 : 0000002d a1 : c00666ee
[ 0.000000] a2 : 00000000 a3 : 00000000 a4 : 00000000
[ 0.000000] a5 : 00000000 a6 : c1c6b390 a7 : 3ffff00e
[ 0.000000] s2 : c2384fe8 s3 : 00000000 s4 : 00000001
[ 0.000000] s5 : c1c0a980 s6 : c1d48000 s7 : c1613b4c
[ 0.000000] s8 : 00000fff s9 : 80000200 s10: c1613b40
[ 0.000000] s11: 00000000 t3 : 00000000 t4 : 00000000
[ 0.000000] t5 : 00000001 t6 : 00000000

Fixes: 3c4697982982 ("riscv:Enable LOCKDEP_SUPPORT & fixup TRACE_IRQFLAGS_SUPPORT")

Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>


# 643437b9 13-Dec-2020 Damien Le Moal <damien.lemoal@wdc.com>

riscv: Enable interrupts during syscalls with M-Mode

When running is M-Mode (no MMU config), MPIE does not get set. This
results in all syscalls being executed with interrupts disabled as
handle_exception never sets SR_IE as it always sees SR_PIE being
cleared. Fix this by always force enabling interrupts in
handle_syscall when CONFIG_RISCV_M_MODE is enabled.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>


# cf7b2ae4 21-Dec-2020 Andreas Schwab <schwab@suse.de>

riscv: return -ENOSYS for syscall -1

Properly return -ENOSYS for syscall -1 instead of leaving the return value
uninitialized. This fixes the strace teststuite.

Fixes: 5340627e3fe0 ("riscv: add support for SECCOMP and SECCOMP_FILTER")
Cc: stable@vger.kernel.org
Signed-off-by: Andreas Schwab <schwab@suse.de>
Reviewed-by: Tycho Andersen <tycho@tycho.pizza>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>


# 3e7b669c 12-Jul-2020 Guo Ren <guoren@linux.alibaba.com>

riscv: Cleanup unnecessary define in asm-offset.c

- TASK_THREAD_SP is duplicated define
- TASK_STACK is no use at all
- Don't worry about thread_info's offset in task_struct, have
a look on comment in include/linux/sched.h:

struct task_struct {
/*
* For reasons of header soup (see current_thread_info()), this
* must be the first element of task_struct.
*/
struct thread_info thread_info;

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>


# ed48b297 24-Jun-2020 Greentime Hu <greentime.hu@sifive.com>

riscv: Enable context tracking

This patch implements and enables context tracking for riscv (which is a
prerequisite for CONFIG_NO_HZ_FULL support)

It adds checking for previous state in the entry that all excepttions and
interrupts goes to and calls context_tracking_user_exit() if it comes from
user space. It also calls context_tracking_user_enter() if it will return
to user space before restore_all.

This patch is tested with the dynticks-testing testcase in
qemu-system-riscv64 virt machine and Unleashed board.
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/dynticks-testing.git

We can see the log here. The tick got mostly stopped during the execution
of the user loop.

_-----=> irqs-off
/ _----=> need-resched
| / _---=> hardirq/softirq
|| / _--=> preempt-depth
||| / delay
TASK-PID CPU# |||| TIMESTAMP FUNCTION
| | | |||| | |
<idle>-0 [001] d..2 604.183512: sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=taskset next_pid=273 next_prio=120
user_loop-273 [001] d.h1 604.184788: hrtimer_expire_entry: hrtimer=000000002eda5fab function=tick_sched_timer now=604176096300
user_loop-273 [001] d.s2 604.184897: workqueue_queue_work: work struct=00000000383402c2 function=vmstat_update workqueue=00000000f36d35d4 req_cpu=1 cpu=1
user_loop-273 [001] dns2 604.185039: tick_stop: success=0 dependency=SCHED
user_loop-273 [001] dn.1 604.185103: tick_stop: success=0 dependency=SCHED
user_loop-273 [001] d..2 604.185154: sched_switch: prev_comm=taskset prev_pid=273 prev_prio=120 prev_state=R+ ==> next_comm=kworker/1:1 next_pid=46 next_prio=120
<...>-46 [001] .... 604.185194: workqueue_execute_start: work struct 00000000383402c2: function vmstat_update
<...>-46 [001] d..2 604.185266: sched_switch: prev_comm=kworker/1:1 prev_pid=46 prev_prio=120 prev_state=I ==> next_comm=taskset next_pid=273 next_prio=120
user_loop-273 [001] d.h1 604.188812: hrtimer_expire_entry: hrtimer=000000002eda5fab function=tick_sched_timer now=604180133400
user_loop-273 [001] d..1 604.189050: tick_stop: success=1 dependency=NONE
user_loop-273 [001] d..2 614.251386: sched_switch: prev_comm=user_loop prev_pid=273 prev_prio=120 prev_state=X ==> next_comm=swapper/1 next_pid=0 next_prio=120
<idle>-0 [001] d..2 614.315391: sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=taskset next_pid=276 next_prio=120

Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>


# 3c469798 27-Jun-2020 Guo Ren <guoren@linux.alibaba.com>

riscv: Enable LOCKDEP_SUPPORT & fixup TRACE_IRQFLAGS_SUPPORT

Lockdep is needed by proving the spinlocks and rwlocks. To suupport
it, we need fixup TRACE_IRQFLAGS_SUPPORT in kernel/entry.S. This
patch follow Documentation/irqflags-tracing.txt.

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>


# 24dc1700 01-Jun-2020 Anup Patel <anup.patel@wdc.com>

RISC-V: Remove do_IRQ() function

The only thing do_IRQ() does is call handle_arch_irq function
pointer. We can very well call handle_arch_irq function pointer
directly from assembly and remove do_IRQ() function hence this
patch.

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Reviewed-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>


# af33d243 08-Feb-2020 Tycho Andersen <tycho@tycho.pizza>

riscv: fix seccomp reject syscall code path

If secure_computing() rejected a system call, we were previously setting
the system call number to -1, to indicate to later code that the syscall
failed. However, if something (e.g. a user notification) was sleeping, and
received a signal, we may set a0 to -ERESTARTSYS and re-try the system call
again.

In this case, seccomp "denies" the syscall (because of the signal), and we
would set a7 to -1, thus losing the value of the system call we want to
restart.

Instead, let's return -1 from do_syscall_trace_enter() to indicate that the
syscall was rejected, so we don't clobber the value in case of -ERESTARTSYS
or whatever.

This commit fixes the user_notification_signal seccomp selftest on riscv to
no longer hang. That test expects the system call to be re-issued after the
signal, and it wasn't due to the above bug. Now that it is, everything
works normally.

Note that in the ptrace (tracer) case, the tracer can set the register
values to whatever they want, so we still need to keep the code that
handles out-of-bounds syscalls. However, we can drop the comment.

We can also drop syscall_set_nr(), since it is no longer used anywhere, and
the code that re-loads the value in a7 because of it.

Reported in: https://lore.kernel.org/bpf/CAEn-LTp=ss0Dfv6J00=rCAy+N78U2AmhqJNjfqjr2FDpPYjxEQ@mail.gmail.com/

Reported-by: David Abdurachmanov <david.abdurachmanov@gmail.com>
Signed-off-by: Tycho Andersen <tycho@tycho.ws>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>


# fdff9911 27-Feb-2020 Palmer Dabbelt <palmerdabbelt@google.com>

RISC-V: Inline the assembly register save/restore macros

These are only used once, and when reading the code I've always found them to
be more of a headache than a benefit. While they were never worth removing
before, LLVM's integrated assembler doesn't support LOCAL so rather that trying
to figure out how to refactor the macros it seems saner to just inline them.

Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>


# 556f47ac 18-Dec-2019 David Abdurachmanov <david.abdurachmanov@gmail.com>

riscv: reject invalid syscalls below -1

Running "stress-ng --enosys 4 -t 20 -v" showed a large number of kernel oops
with "Unable to handle kernel paging request at virtual address" message. This
happens when enosys stressor starts testing random non-valid syscalls.

I forgot to redirect any syscall below -1 to sys_ni_syscall.

With the patch kernel oops messages are gone while running stress-ng enosys
stressor.

Signed-off-by: David Abdurachmanov <david.abdurachmanov@sifive.com>
Fixes: 5340627e3fe0 ("riscv: add support for SECCOMP and SECCOMP_FILTER")
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>


# 29ff6492 15-Oct-2019 Thomas Gleixner <tglx@linutronix.de>

sched/rt, riscv: Use CONFIG_PREEMPTION

CONFIG_PREEMPTION is selected by CONFIG_PREEMPT and by CONFIG_PREEMPT_RT.
Both PREEMPT and PREEMPT_RT require the same functionality which today
depends on CONFIG_PREEMPT.

Switch the entry code over to use CONFIG_PREEMPTION.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Paul Walmsley <paul.walmsley@sifive.com> # for arch/riscv
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-riscv@lists.infradead.org
Link: https://lore.kernel.org/r/20191015191821.11479-17-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 6bd33e1e 28-Oct-2019 Christoph Hellwig <hch@lst.de>

riscv: add nommu support

The kernel runs in M-mode without using page tables, and thus can't run
bare metal without help from additional firmware.

Most of the patch is just stubbing out code not needed without page
tables, but there is an interesting detail in the signals implementation:

- The normal RISC-V syscall ABI only implements rt_sigreturn as VDSO
entry point, but the ELF VDSO is not supported for nommu Linux.
We instead copy the code to call the syscall onto the stack.

In addition to enabling the nommu code a new defconfig for a small
kernel image that can run in nommu mode on qemu is also provided, to run
a kernel in qemu you can use the following command line:

qemu-system-riscv64 -smp 2 -m 64 -machine virt -nographic \
-kernel arch/riscv/boot/loader \
-drive file=rootfs.ext2,format=raw,id=hd0 \
-device virtio-blk-device,drive=hd0

Contains contributions from Damien Le Moal <Damien.LeMoal@wdc.com>.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Anup Patel <anup@brainfault.org>
[paul.walmsley@sifive.com: updated to apply; add CONFIG_MMU guards
around PCI_IOBASE definition to fix build issues; fixed checkpatch
issues; move the PCI_IO_* and VMEMMAP address space macros along
with the others; resolve sparse warning]
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>


# a4c3733d 28-Oct-2019 Christoph Hellwig <hch@lst.de>

riscv: abstract out CSR names for supervisor vs machine mode

Many of the privileged CSRs exist in a supervisor and machine version
that are used very similarly. Provide versions of the CSR names and
fields that map to either the S-mode or M-mode variant depending on
a new CONFIG_RISCV_M_MODE kconfig symbol.

Contains contributions from Damien Le Moal <Damien.LeMoal@wdc.com>
and Paul Walmsley <paul.walmsley@sifive.com>.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de> # for drivers/clocksource, drivers/irqchip
[paul.walmsley@sifive.com: updated to apply]
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>


# 5340627e 04-Oct-2019 David Abdurachmanov <david.abdurachmanov@sifive.com>

riscv: add support for SECCOMP and SECCOMP_FILTER

This patch was extensively tested on Fedora/RISCV (applied by default on
top of 5.2-rc7 kernel for <2 months). The patch was also tested with 5.3-rc
on QEMU and SiFive Unleashed board.

libseccomp (userspace) was rebased:
https://github.com/seccomp/libseccomp/pull/134

Fully passes libseccomp regression testing (simulation and live).

There is one failing kernel selftest: global.user_notification_signal

v1 -> v2:
- return immediately if secure_computing(NULL) returns -1
- fixed whitespace issues
- add missing seccomp.h
- remove patch #2 (solved now)
- add riscv to seccomp kernel selftest

Signed-off-by: David Abdurachmanov <david.abdurachmanov@sifive.com>
Cc: keescook@chromium.org
Cc: me@carlosedp.com
Tested-by: Carlos de Paula <me@carlosedp.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/linux-riscv/CAEn-LTp=ss0Dfv6J00=rCAy+N78U2AmhqJNjfqjr2FDpPYjxEQ@mail.gmail.com/
Link: https://lore.kernel.org/linux-riscv/CAJr-aD=UnCN9E_mdVJ2H5nt=6juRSWikZnA5HxDLQxXLbsRz-w@mail.gmail.com/
[paul.walmsley@sifive.com: cleaned up Cc: lines; fixed spelling and
checkpatch issues; updated to apply]
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>


# cd9e72b8 23-Sep-2019 Valentin Schneider <valentin.schneider@arm.com>

RISC-V: entry: Remove unneeded need_resched() loop

Since the enabling and disabling of IRQs within preempt_schedule_irq()
is contained in a need_resched() loop, we don't need the outer arch
code loop.

Reviewed-by: Palmer Dabbelt <palmer@sifive.com>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: linux-riscv@lists.infradead.org
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>


# 18856604 24-Sep-2019 Palmer Dabbelt <palmer@sifive.com>

RISC-V: Clear load reservations while restoring hart contexts

This is almost entirely a comment. The bug is unlikely to manifest on
existing hardware because there is a timeout on load reservations, but
manifests on QEMU because there is no timeout.

Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>


# c82dd6d0 16-Sep-2019 Vincent Chen <vincent.chen@sifive.com>

riscv: Avoid interrupts being erroneously enabled in handle_exception()

When the handle_exception function addresses an exception, the interrupts
will be unconditionally enabled after finishing the context save. However,
It may erroneously enable the interrupts if the interrupts are disabled
before entering the handle_exception.

For example, one of the WARN_ON() condition is satisfied in the scheduling
where the interrupt is disabled and rq.lock is locked. The WARN_ON will
trigger a break exception and the handle_exception function will enable the
interrupts before entering do_trap_break function. During the procedure, if
a timer interrupt is pending, it will be taken when interrupts are enabled.
In this case, it may cause a deadlock problem if the rq.lock is locked
again in the timer ISR.

Hence, the handle_exception() can only enable interrupts when the state of
sstatus.SPIE is 1.

This patch is tested on HiFive Unleashed board.

Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
Reviewed-by: Palmer Dabbelt <palmer@sifive.com>
[paul.walmsley@sifive.com: updated to apply]
Fixes: bcae803a21317 ("RISC-V: Enable IRQ during exception handling")
Cc: David Abdurachmanov <david.abdurachmanov@sifive.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>


# 4f3f9008 07-Aug-2019 Bin Meng <bmeng.cn@gmail.com>

riscv: Using CSR numbers to access CSRs

Since commit a3182c91ef4e ("RISC-V: Access CSRs using CSR numbers"),
we should prefer accessing CSRs using their CSR numbers, but there
are several leftovers like sstatus / sptbr we missed.

Signed-off-by: Bin Meng <bmeng.cn@gmail.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>


# 50acfb2b 29-May-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 286

Based on 1 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation version 2 this program is distributed
in the hope that it will be useful but without any warranty without
even the implied warranty of merchantability or fitness for a
particular purpose see the gnu general public license for more
details

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-only

has been chosen to replace the boilerplate/reference in 97 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190529141901.025053186@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# a3182c91 25-Apr-2019 Anup Patel <Anup.Patel@wdc.com>

RISC-V: Access CSRs using CSR numbers

We should prefer accessing CSRs using their CSR numbers because:
1. It compiles fine with older toolchains.
2. We can use latest CSR names in #define macro names of CSR numbers
as-per RISC-V spec.
3. We can access newly added CSRs even if toolchain does not recognize
newly addes CSRs by name.

Signed-off-by: Anup Patel <anup.patel@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>


# 99fd6e87 02-Jan-2019 Vincent Chen <vincentc@andestech.com>

RISC-V: Add _TIF_NEED_RESCHED check for kernel thread when CONFIG_PREEMPT=y

The cond_resched() can be used to yield the CPU resource if
CONFIG_PREEMPT is not defined. Otherwise, cond_resched() is a dummy
function. In order to avoid kernel thread occupying entire CPU,
when CONFIG_PREEMPT=y, the kernel thread needs to follow the
rescheduling mechanism like a user thread.

Signed-off-by: Vincent Chen <vincentc@andestech.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>


# efe75c49 29-Oct-2018 David Abdurachmanov <david.abdurachmanov@gmail.com>

riscv: add audit support

On RISC-V (riscv) audit is supported through generic lib/audit.c.
The patch adds required arch specific definitions.

Signed-off-by: David Abdurachmanov <david.abdurachmanov@gmail.com>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>


# 1ed4237a 02-Oct-2018 Anup Patel <anup@brainfault.org>

RISC-V: No need to pass scause as arg to do_IRQ()

The scause is already part of pt_regs so no need to pass
scause as separate arg to do_IRQ().

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>


# e68ad867 08-Oct-2018 Alan Kao <alankao@andestech.com>

Extract FPU context operations from entry.S

We move __fstate_save and __fstate_restore to a new source
file, fpu.S.

Signed-off-by: Alan Kao <alankao@andestech.com>
Cc: Greentime Hu <greentime@andestech.com>
Cc: Vincent Chen <vincentc@andestech.com>
Cc: Zong Li <zong@andestech.com>
Cc: Nick Hu <nickhu@andestech.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>


# 6ea0f26a 04-Aug-2018 Christoph Hellwig <hch@lst.de>

RISC-V: implement low-level interrupt handling

Add support for a routine that dispatches exceptions with the interrupt
flags set to either the IPI or irqdomain code (and the clock source in the
future).

Loosely based on the irq-riscv-int.c irqchip driver from the RISC-V tree.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>


# cc6c9848 07-Mar-2018 Palmer Dabbelt <palmer@sifive.com>

RISC-V: Move to the new GENERIC_IRQ_MULTI_HANDLER handler

The existing mechanism for handling IRQs on RISC-V is pretty ugly: the irq
entry code selects the handler via Kconfig dependencies.

Use the new generic IRQ handling infastructure, which allows boot time
registration of the low level entry handler.

This does add an additional load to the interrupt latency, but there's a
lot of tuning left to be done there on RISC-V so it's OK for now.

Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Stafford Horne <shorne@gmail.com>
Cc: jonas@southpole.se
Cc: catalin.marinas@arm.com
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux@armlinux.org.uk
Cc: stefan.kristiansson@saunalahti.fi
Cc: openrisc@lists.librecores.org
Cc: linux-riscv@lists.infradead.org
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lkml.kernel.org/r/20180307235731.22627-3-palmer@sifive.com


# bcae803a 30-Jan-2018 zongbox@gmail.com <zongbox@gmail.com>

RISC-V: Enable IRQ during exception handling

Interrupt is allowed during exception handling.
There are warning messages if the kernel enables the configuration
'CONFIG_DEBUG_ATOMIC_SLEEP=y'.

BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:23
in_atomic(): 0, irqs_disabled(): 1, pid: 43, name: ash
CPU: 0 PID: 43 Comm: ash Tainted: G W 4.15.0-rc8-00089-g89ffdae-dirty #17
Call Trace:
[<000000009abb1587>] walk_stackframe+0x0/0x7a
[<00000000d4f3d088>] ___might_sleep+0x102/0x11a
[<00000000b1fd792a>] down_read+0x18/0x28
[<000000000289ec01>] do_page_fault+0x86/0x2f6
[<00000000012441f6>] _do_fork+0x1b4/0x1e0
[<00000000f46c3e3b>] ret_from_syscall+0xa/0xe

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Zong Li <zong@andestech.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>


# fe9b842f 04-Jan-2018 Christoph Hellwig <hch@lst.de>

riscv: disable SUM in the exception handler

The SUM bit is enabled at the beginning of the copy_{to,from}_user and
{get,put}_user routines, and cleared before they return. But these user
copy helper can be interrupted by exceptions, in which case the SUM bit
will remain set, which leads to elevated privileges for the code running
in exception context, as that can now access userspace address space
unconditionally. This frequently happens when the user copy routines
access freshly allocated user memory that hasn't been faulted in, and a
pagefault needs to be taken before the user copy routines can continue.

Fix this by unconditionally clearing SUM when the exception handler is
called - the restore code will automatically restore it based on the
saved value.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>


# 1125203c 04-Jan-2018 Christoph Hellwig <hch@lst.de>

riscv: rename SR_* constants to match the spec

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>


# 7db91e57 10-Jul-2017 Palmer Dabbelt <palmer@dabbelt.com>

RISC-V: Task implementation

This patch contains the implementation of tasks on RISC-V, most of which
is involved in task switching.

Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>