History log of /linux-master/kernel/panic.c
Revision Date Author Comments
# 2852ca7f 01-Jul-2022 David Gow <davidgow@google.com>

panic: Taint kernel if tests are run

Most in-kernel tests (such as KUnit tests) are not supposed to run on
production systems: they may do deliberately illegal things to trigger
errors, and have security implications (for example, KUnit assertions
will often deliberately leak kernel addresses).

Add a new taint type, TAINT_TEST to signal that a test has been run.
This will be printed as 'N' (originally for kuNit, as every other
sensible letter was taken.)

This should discourage people from running these tests on production
systems, and to make it easier to tell if tests have been run
accidentally (by loading the wrong configuration, etc.)

Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>


# 07a22b61 23-Jun-2022 Petr Mladek <pmladek@suse.com>

Revert "printk: add functions to prefer direct printing"

This reverts commit 2bb2b7b57f81255c13f4395ea911d6bdc70c9fe2.

The testing of 5.19 release candidates revealed missing synchronization
between early and regular console functionality.

It would be possible to start the console kthreads later as a workaround.
But it is clear that console lock serialized console drivers between
each other. It opens a big area of possible problems that were not
considered by people involved in the development and review.

printk() is crucial for debugging kernel issues and console output is
very important part of it. The number of consoles is huge and a proper
review would take some time. As a result it need to be reverted for 5.19.

Link: https://lore.kernel.org/r/YrBdjVwBOVgLfHyb@alley
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20220623145157.21938-7-pmladek@suse.com


# 20fb0c82 23-Jun-2022 Petr Mladek <pmladek@suse.com>

Revert "printk: Wait for the global console lock when the system is going down"

This reverts commit b87f02307d3cfbda768520f0687c51ca77e14fc3.

The testing of 5.19 release candidates revealed missing synchronization
between early and regular console functionality.

It would be possible to start the console kthreads later as a workaround.
But it is clear that console lock serialized console drivers between
each other. It opens a big area of possible problems that were not
considered by people involved in the development and review.

printk() is crucial for debugging kernel issues and console output is
very important part of it. The number of consoles is huge and a proper
review would take some time. As a result it need to be reverted for 5.19.

Link: https://lore.kernel.org/r/YrBdjVwBOVgLfHyb@alley
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20220623145157.21938-2-pmladek@suse.com


# b87f0230 15-Jun-2022 Petr Mladek <pmladek@suse.com>

printk: Wait for the global console lock when the system is going down

There are reports that the console kthreads block the global console
lock when the system is going down, for example, reboot, panic.

First part of the solution was to block kthreads in these problematic
system states so they stopped handling newly added messages.

Second part of the solution is to wait when for the kthreads when
they are actively printing. It solves the problem when a message
was printed before the system entered the problematic state and
the kthreads managed to step in.

A busy waiting has to be used because panic() can be called in any
context and in an unknown state of the scheduler.

There must be a timeout because the kthread might get stuck or sleeping
and never release the lock. The timeout 10s is an arbitrary value
inspired by the softlockup timeout.

Link: https://lore.kernel.org/r/20220610205038.GA3050413@paulmck-ThinkPad-P17-Gen-1
Link: https://lore.kernel.org/r/CAMdYzYpF4FNTBPZsEFeWRuEwSies36QM_As8osPWZSr2q-viEA@mail.gmail.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/r/20220615162805.27962-3-pmladek@suse.com


# 9df91869 18-Feb-2022 tangmeng <tangmeng@uniontech.com>

kernel/panic: move panic sysctls to its own file

kernel/sysctl.c is a kitchen sink where everyone leaves their dirty
dishes, this makes it very difficult to maintain.

To help with this maintenance let's start by moving sysctls to places
where they actually belong. The proc sysctl maintainers do not want to
know what sysctl knobs you wish to add for your own piece of code, we
just care about the core logic.

All filesystem syctls now get reviewed by fs folks. This commit
follows the commit of fs, move the oops_all_cpu_backtrace sysctl to
its own file, kernel/panic.c.

Signed-off-by: tangmeng <tangmeng@uniontech.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>


# 595b893e 03-May-2022 Kees Cook <keescook@chromium.org>

randstruct: Reorganize Kconfigs and attribute macros

In preparation for Clang supporting randstruct, reorganize the Kconfigs,
move the attribute macros, and generalize the feature to be named
CONFIG_RANDSTRUCT for on/off, CONFIG_RANDSTRUCT_FULL for the full
randomization mode, and CONFIG_RANDSTRUCT_PERFORMANCE for the cache-line
sized mode.

Cc: linux-hardening@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220503205503.3054173-4-keescook@chromium.org


# 2bb2b7b5 21-Apr-2022 John Ogness <john.ogness@linutronix.de>

printk: add functions to prefer direct printing

Once kthread printing is available, console printing will no longer
occur in the context of the printk caller. However, there are some
special contexts where it is desirable for the printk caller to
directly print out kernel messages. Using pr_flush() to wait for
threaded printers is only possible if the caller is in a sleepable
context and the kthreads are active. That is not always the case.

Introduce printk_prefer_direct_enter() and printk_prefer_direct_exit()
functions to explicitly (and globally) activate/deactivate preferred
direct console printing. The term "direct console printing" refers to
printing to all enabled consoles from the context of the printk
caller. The term "prefer" is used because this type of printing is
only best effort. If the console is currently locked or other
printers are already actively printing, the printk caller will need
to rely on the other contexts to handle the printing.

This preferred direct printing is how all printing has been handled
until now (unless it was explicitly deferred).

When kthread printing is introduced, there may be some unanticipated
problems due to kthreads being unable to flush important messages.
In order to minimize such risks, preferred direct printing is
activated for the primary important messages when the system
experiences general types of major errors. These are:

- emergency reboot/shutdown
- cpu and rcu stalls
- hard and soft lockups
- hung tasks
- warn
- sysrq

Note that since kthread printing does not yet exist, no behavior
changes result from this commit. This is only implementing the
counter and marking the various places where preferred direct
printing is active.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Paul E. McKenney <paulmck@kernel.org> # for RCU
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20220421212250.565456-13-john.ogness@linutronix.de


# f953f140 23-Mar-2022 Guilherme G. Piccoli <gpiccoli@igalia.com>

panic: move panic_print before kmsg dumpers

The panic_print setting allows users to collect more information in a
panic event, like memory stats, tasks, CPUs backtraces, etc. This is an
interesting debug mechanism, but currently the print event happens *after*
kmsg_dump(), meaning that pstore, for example, cannot collect a dmesg with
the panic_print extra information.

This patch changes that in 2 steps:

(a) The panic_print setting allows to replay the existing kernel log
buffer to the console (bit 5), besides the extra information dump.
This functionality makes sense only at the end of the panic()
function. So, we hereby allow to distinguish the two situations by a
new boolean parameter in the function panic_print_sys_info().

(b) With the above change, we can safely call panic_print_sys_info()
before kmsg_dump(), allowing to dump the extra information when using
pstore or other kmsg dumpers.

The additional messages from panic_print could overwrite the oldest
messages when the buffer is full. The only reasonable solution is to use
a large enough log buffer, hence we added an advice into the kernel
parameters documentation about that.

Link: https://lkml.kernel.org/r/20220214141308.841525-1-gpiccoli@igalia.com
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Acked-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Feng Tang <feng.tang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 8d470a45 23-Mar-2022 Guilherme G. Piccoli <gpiccoli@igalia.com>

panic: add option to dump all CPUs backtraces in panic_print

Currently the "panic_print" parameter/sysctl allows some interesting debug
information to be printed during a panic event. This is useful for
example in cases the user cannot kdump due to resource limits, or if the
user collects panic logs in a serial output (or pstore) and prefers a fast
reboot instead of a kdump.

Happens that currently there's no way to see all CPUs backtraces in a
panic using "panic_print" on architectures that support that. We do have
"oops_all_cpu_backtrace" sysctl, but although partially overlapping in the
functionality, they are orthogonal in nature: "panic_print" is a panic
tuning (and we have panics without oopses, like direct calls to panic() or
maybe other paths that don't go through oops_enter() function), and the
original purpose of "oops_all_cpu_backtrace" is to provide more
information on oopses for cases in which the users desire to continue
running the kernel even after an oops, i.e., used in non-panic scenarios.

So, we hereby introduce an additional bit for "panic_print" to allow
dumping the CPUs backtraces during a panic event.

Link: https://lkml.kernel.org/r/20211109202848.610874-3-gpiccoli@igalia.com
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Reviewed-by: Feng Tang <feng.tang@intel.com>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 1a2383e8 23-Mar-2022 Tiezhu Yang <yangtiezhu@loongson.cn>

panic: unset panic_on_warn inside panic()

In the current code, the following three places need to unset
panic_on_warn before calling panic() to avoid recursive panics:

kernel/kcsan/report.c: print_report()
kernel/sched/core.c: __schedule_bug()
mm/kfence/report.c: kfence_report_error()

In order to avoid copy-pasting "panic_on_warn = 0" all over the places,
it is better to move it inside panic() and then remove it from the other
places.

Link: https://lkml.kernel.org/r/1644324666-15947-4-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Xuefeng Li <lixuefeng@loongson.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# e83a4472 19-Jan-2022 Sebastian Andrzej Siewior <bigeasy@linutronix.de>

panic: remove oops_id

The oops id has been added as part of the end of trace marker for the
kerneloops.org project. The id is used to automatically identify
duplicate submissions of the same report. Identical looking reports
with different a id can be considered as the same oops occurred again.

The early initialisation of the oops_id can create a warning if the
random core is not yet fully initialized. On PREEMPT_RT it is
problematic if the id is initialized on demand from non preemptible
context.

The kernel oops project is not available since 2017. Remove the oops_id
and use 0 in the output in case parser rely on it.

Link: https://bugs.debian.org/953172
Link: https://lkml.kernel.org/r/Ybdi16aP2NEugWHq@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 23b36fec 19-Jan-2022 Marco Elver <elver@google.com>

panic: use error_report_end tracepoint on warnings

Introduce the error detector "warning" to the error_report event and use
the error_report_end tracepoint at the end of a warning report.

This allows in-kernel tests but also userspace to more easily determine
if a warning occurred without polling kernel logs.

[akpm@linux-foundation.org: add comma to enum list, per Andy]

Link: https://lkml.kernel.org/r/20211115085630.1756817-1-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Alexander Popov <alex.popov@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 93d102f0 15-Jul-2021 John Ogness <john.ogness@linutronix.de>

printk: remove safe buffers

With @logbuf_lock removed, the high level printk functions for
storing messages are lockless. Messages can be stored from any
context, so there is no need for the NMI and safe buffers anymore.
Remove the NMI and safe buffers.

Although the safe buffers are removed, the NMI and safe context
tracking is still in place. In these contexts, store the message
immediately but still use irq_work to defer the console printing.

Since printk recursion tracking is in place, safe context tracking
for most of printk is not needed. Remove it. Only safe context
tracking relating to the console and console_owner locks is left
in place. This is because the console and console_owner locks are
needed for the actual printing.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20210715193359.25946-4-john.ogness@linutronix.de


# f39650de 30-Jun-2021 Andy Shevchenko <andriy.shevchenko@linux.intel.com>

kernel.h: split out panic and oops helpers

kernel.h is being used as a dump for all kinds of stuff for a long time.
Here is the attempt to start cleaning it up by splitting out panic and
oops helpers.

There are several purposes of doing this:
- dropping dependency in bug.h
- dropping a loop by moving out panic_notifier.h
- unload kernel.h from something which has its own domain

At the same time convert users tree-wide to use new headers, although for
the time being include new header back to kernel.h to avoid twisted
indirected includes for existing users.

[akpm@linux-foundation.org: thread_info.h needs limits.h]
[andriy.shevchenko@linux.intel.com: ia64 fix]
Link: https://lkml.kernel.org/r/20210520130557.55277-1-andriy.shevchenko@linux.intel.com

Link: https://lkml.kernel.org/r/20210511074137.33666-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Co-developed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Corey Minyard <cminyard@mvista.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Wei Liu <wei.liu@kernel.org>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Sebastian Reichel <sre@kernel.org>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Acked-by: Stephen Boyd <sboyd@kernel.org>
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Acked-by: Helge Deller <deller@gmx.de> # parisc
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 2f31ad64 13-Nov-2020 Christophe Leroy <christophe.leroy@csgroup.eu>

panic: don't dump stack twice on warn

Before commit 3f388f28639f ("panic: dump registers on panic_on_warn"),
__warn() was calling show_regs() when regs was not NULL, and show_stack()
otherwise.

After that commit, show_stack() is called regardless of whether
show_regs() has been called or not, leading to duplicated Call Trace:

------------[ cut here ]------------
WARNING: CPU: 0 PID: 1 at arch/powerpc/mm/nohash/8xx.c:186 mmu_mark_initmem_nx+0x24/0x94
CPU: 0 PID: 1 Comm: swapper Not tainted 5.10.0-rc2-s3k-dev-01375-gf46ec0d3ecbd-dirty #4092
NIP: c00128b4 LR: c0010228 CTR: 00000000
REGS: c9023e40 TRAP: 0700 Not tainted (5.10.0-rc2-s3k-dev-01375-gf46ec0d3ecbd-dirty)
MSR: 00029032 <EE,ME,IR,DR,RI> CR: 24000424 XER: 00000000

GPR00: c0010228 c9023ef8 c2100000 0074c000 ffffffff 00000000 c2151000 c07b3880
GPR08: ff000900 0074c000 c8000000 c33b53a8 24000822 00000000 c0003a20 00000000
GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
GPR24: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00800000
NIP [c00128b4] mmu_mark_initmem_nx+0x24/0x94
LR [c0010228] free_initmem+0x20/0x58
Call Trace:
free_initmem+0x20/0x58
kernel_init+0x1c/0x114
ret_from_kernel_thread+0x14/0x1c
Instruction dump:
7d291850 7d234b78 4e800020 9421ffe0 7c0802a6 bfc10018 3fe0c060 3bff0000
3fff4080 3bffffff 90010024 57ff0010 <0fe00000> 392001cd 7c3e0b78 953e0008
CPU: 0 PID: 1 Comm: swapper Not tainted 5.10.0-rc2-s3k-dev-01375-gf46ec0d3ecbd-dirty #4092
Call Trace:
__warn+0x8c/0xd8 (unreliable)
report_bug+0x11c/0x154
program_check_exception+0x1dc/0x6e0
ret_from_except_full+0x0/0x4
--- interrupt: 700 at mmu_mark_initmem_nx+0x24/0x94
LR = free_initmem+0x20/0x58
free_initmem+0x20/0x58
kernel_init+0x1c/0x114
ret_from_kernel_thread+0x14/0x1c
---[ end trace 31702cd2a9570752 ]---

Only call show_stack() when regs is NULL.

Fixes: 3f388f28639f ("panic: dump registers on panic_on_warn")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Link: https://lkml.kernel.org/r/e8c055458b080707f1bc1a98ff8bea79d0cec445.1604748361.git.christophe.leroy@csgroup.eu
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 3f388f28 15-Oct-2020 Alexey Kardashevskiy <aik@ozlabs.ru>

panic: dump registers on panic_on_warn

Currently we print stack and registers for ordinary warnings but we do not
for panic_on_warn which looks as oversight - panic() will reboot the
machine but won't print registers.

This moves printing of registers and modules earlier.

This does not move the stack dumping as panic() dumps it.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Douglas Anderson <dianders@chromium.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Link: https://lkml.kernel.org/r/20200804095054.68724-1-aik@ozlabs.ru
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 63037f74 11-Aug-2020 Yue Hu <huyue2@yulong.com>

panic: make print_oops_end_marker() static

Since print_oops_end_marker() is not used externally, also remove it in
kernel.h at the same time.

Signed-off-by: Yue Hu <huyue2@yulong.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/20200724011516.12756-1-zbestahu@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 79076e12 11-Aug-2020 Tiezhu Yang <yangtiezhu@loongson.cn>

kernel/panic.c: make oops_may_print() return bool

The return value of oops_may_print() is true or false, so change its type
to reflect that.

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Xuefeng Li <lixuefeng@loongson.cn>
Link: http://lkml.kernel.org/r/1591103358-32087-1-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 5916d5f9 13-Mar-2020 Thomas Gleixner <tglx@linutronix.de>

bug: Annotate WARN/BUG/stackfail as noinstr safe

Warnings, bugs and stack protection fails from noinstr sections, e.g. low
level and early entry code, are likely to be fatal.

Mark them as "safe" to be invoked from noinstr protected code to avoid
annotating all usage sites. Getting the information out is important.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200505134100.376598577@linutronix.de


# 60c958d8 07-Jun-2020 Guilherme G. Piccoli <gpiccoli@canonical.com>

panic: add sysctl to dump all CPUs backtraces on oops event

Usually when the kernel reaches an oops condition, it's a point of no
return; in case not enough debug information is available in the kernel
splat, one of the last resorts would be to collect a kernel crash dump
and analyze it. The problem with this approach is that in order to
collect the dump, a panic is required (to kexec-load the crash kernel).
When in an environment of multiple virtual machines, users may prefer to
try living with the oops, at least until being able to properly shutdown
their VMs / finish their important tasks.

This patch implements a way to collect a bit more debug details when an
oops event is reached, by printing all the CPUs backtraces through the
usage of NMIs (on architectures that support that). The sysctl added
(and documented) here was called "oops_all_cpu_backtrace", and when set
will (as the name suggests) dump all CPUs backtraces.

Far from ideal, this may be the last option though for users that for
some reason cannot panic on oops. Most of times oopses are clear enough
to indicate the kernel portion that must be investigated, but in virtual
environments it's possible to observe hypervisor/KVM issues that could
lead to oopses shown in other guests CPUs (like virtual APIC crashes).
This patch hence aims to help debug such complex issues without
resorting to kdump.

Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Matthew Wilcox <willy@infradead.org>
Link: http://lkml.kernel.org/r/20200327224116.21030-1-gpiccoli@canonical.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# db38d5c1 07-Jun-2020 Rafael Aquini <aquini@redhat.com>

kernel: add panic_on_taint

Analogously to the introduction of panic_on_warn, this patch introduces
a kernel option named panic_on_taint in order to provide a simple and
generic way to stop execution and catch a coredump when the kernel gets
tainted by any given flag.

This is useful for debugging sessions as it avoids having to rebuild the
kernel to explicitly add calls to panic() into the code sites that
introduce the taint flags of interest.

For instance, if one is interested in proceeding with a post-mortem
analysis at the point a given code path is hitting a bad page (i.e.
unaccount_page_cache_page(), or slab_bug()), a coredump can be collected
by rebooting the kernel with 'panic_on_taint=0x20' amended to the
command line.

Another, perhaps less frequent, use for this option would be as a means
for assuring a security policy case where only a subset of taints, or no
single taint (in paranoid mode), is allowed for the running system. The
optional switch 'nousertaint' is handy in this particular scenario, as
it will avoid userspace induced crashes by writes to sysctl interface
/proc/sys/kernel/tainted causing false positive hits for such policies.

[akpm@linux-foundation.org: tweak kernel-parameters.txt wording]

Suggested-by: Qian Cai <cai@lca.pw>
Signed-off-by: Rafael Aquini <aquini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Dave Young <dyoung@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Bunk <bunk@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Takashi Iwai <tiwai@suse.de>
Link: http://lkml.kernel.org/r/20200515175502.146720-1-aquini@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 2f30b369 21-Nov-2019 Will Deacon <will@kernel.org>

locking/refcount: Remove unused 'refcount_error_report()' function

'refcount_error_report()' has no callers. Remove it.

Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Kees Cook <keescook@chromium.org>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191121115902.2551-10-will@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 20bb759a 06-Oct-2019 Will Deacon <will@kernel.org>

panic: ensure preemption is disabled during panic()

Calling 'panic()' on a kernel with CONFIG_PREEMPT=y can leave the
calling CPU in an infinite loop, but with interrupts and preemption
enabled. From this state, userspace can continue to be scheduled,
despite the system being "dead" as far as the kernel is concerned.

This is easily reproducible on arm64 when booting with "nosmp" on the
command line; a couple of shell scripts print out a periodic "Ping"
message whilst another triggers a crash by writing to
/proc/sysrq-trigger:

| sysrq: Trigger a crash
| Kernel panic - not syncing: sysrq triggered crash
| CPU: 0 PID: 1 Comm: init Not tainted 5.2.15 #1
| Hardware name: linux,dummy-virt (DT)
| Call trace:
| dump_backtrace+0x0/0x148
| show_stack+0x14/0x20
| dump_stack+0xa0/0xc4
| panic+0x140/0x32c
| sysrq_handle_reboot+0x0/0x20
| __handle_sysrq+0x124/0x190
| write_sysrq_trigger+0x64/0x88
| proc_reg_write+0x60/0xa8
| __vfs_write+0x18/0x40
| vfs_write+0xa4/0x1b8
| ksys_write+0x64/0xf0
| __arm64_sys_write+0x14/0x20
| el0_svc_common.constprop.0+0xb0/0x168
| el0_svc_handler+0x28/0x78
| el0_svc+0x8/0xc
| Kernel Offset: disabled
| CPU features: 0x0002,24002004
| Memory Limit: none
| ---[ end Kernel panic - not syncing: sysrq triggered crash ]---
| Ping 2!
| Ping 1!
| Ping 1!
| Ping 2!

The issue can also be triggered on x86 kernels if CONFIG_SMP=n,
otherwise local interrupts are disabled in 'smp_send_stop()'.

Disable preemption in 'panic()' before re-enabling interrupts.

Link: http://lkml.kernel.org/r/20191002123538.22609-1-will@kernel.org
Link: https://lore.kernel.org/r/BX1W47JXPMR8.58IYW53H6M5N@dragonstone
Signed-off-by: Will Deacon <will@kernel.org>
Reported-by: Xogium <contact@xogium.me>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 2da1ead4 25-Sep-2019 Kees Cook <keescook@chromium.org>

bug: consolidate __WARN_FLAGS usage

Instead of having separate tests for __WARN_FLAGS, merge the two #ifdef
blocks and replace the synonym WANT_WARN_ON_SLOWPATH macro.

Link: http://lkml.kernel.org/r/20190819234111.9019-7-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Drew Davenport <ddavenport@chromium.org>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# d38aba49 25-Sep-2019 Kees Cook <keescook@chromium.org>

bug: lift "cut here" out of __warn()

In preparation for cleaning up "cut here", move the "cut here" logic up
out of __warn() and into callers that pass non-NULL args. For anyone
looking closely, there are two callers that pass NULL args: one already
explicitly prints "cut here". The remaining case is covered by how a WARN
is built, which will be cleaned up in the next patch.

Link: http://lkml.kernel.org/r/20190819234111.9019-5-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Drew Davenport <ddavenport@chromium.org>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# f2f84b05 25-Sep-2019 Kees Cook <keescook@chromium.org>

bug: consolidate warn_slowpath_fmt() usage

Instead of having a separate helper for no printk output, just consolidate
the logic into warn_slowpath_fmt().

Link: http://lkml.kernel.org/r/20190819234111.9019-4-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Drew Davenport <ddavenport@chromium.org>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# ee871133 25-Sep-2019 Kees Cook <keescook@chromium.org>

bug: refactor away warn_slowpath_fmt_taint()

Patch series "Clean up WARN() "cut here" handling", v2.

Christophe Leroy noticed that the fix for missing "cut here" in the WARN()
case was adding explicit printk() calls instead of teaching the exception
handler to add it. This refactors the bug/warn infrastructure to pass
this information as a new BUGFLAG.

Longer details repeated from the last patch in the series:

bug: move WARN_ON() "cut here" into exception handler

The original cleanup of "cut here" missed the WARN_ON() case (that does
not have a printk message), which was fixed recently by adding an explicit
printk of "cut here". This had the downside of adding a printk() to every
WARN_ON() caller, which reduces the utility of using an instruction
exception to streamline the resulting code. By making this a new BUGFLAG,
all of these can be removed and "cut here" can be handled by the exception
handler.

This was very pronounced on PowerPC, but the effect can be seen on x86 as
well. The resulting text size of a defconfig build shows some small
savings from this patch:

text data bss dec hex filename
19691167 5134320 1646664 26472151 193eed7 vmlinux.before
19676362 5134260 1663048 26473670 193f4c6 vmlinux.after

This change also opens the door for creating something like BUG_MSG(),
where a custom printk() before issuing BUG(), without confusing the "cut
here" line.

This patch (of 7):

There's no reason to have specialized helpers for passing the warn taint
down to __warn(). Consolidate and refactor helper macros, removing
__WARN_printf() and warn_slowpath_fmt_taint().

Link: http://lkml.kernel.org/r/20190819234111.9019-2-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Drew Davenport <ddavenport@chromium.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 7d92bda2 25-Sep-2019 Douglas Anderson <dianders@chromium.org>

kgdb: don't use a notifier to enter kgdb at panic; call directly

Right now kgdb/kdb hooks up to debug panics by registering for the panic
notifier. This works OK except that it means that kgdb/kdb gets called
_after_ the CPUs in the system are taken offline. That means that if
anything important was happening on those CPUs (like something that might
have contributed to the panic) you can't debug them.

Specifically I ran into a case where I got a panic because a task was
"blocked for more than 120 seconds" which was detected on CPU 2. I nicely
got shown stack traces in the kernel log for all CPUs including CPU 0,
which was running 'PID: 111 Comm: kworker/0:1H' and was in the middle of
__mmc_switch().

I then ended up at the kdb prompt where switched over to kgdb to try to
look at local variables of the process on CPU 0. I found that I couldn't.
Digging more, I found that I had no info on any tasks running on CPUs
other than CPU 2 and that asking kdb for help showed me "Error: no saved
data for this cpu". This was because all the CPUs were offline.

Let's move the entry of kdb/kgdb to a direct call from panic() and stop
using the generic notifier. Putting a direct call in allows us to order
things more properly and it also doesn't seem like we're breaking any
abstractions by calling into the debugger from the panic function.

Daniel said:

: This patch changes the way kdump and kgdb interact with each other.
: However it would seem rather odd to have both tools simultaneously armed
: and, even if they were, the user still has the option to use panic_timeout
: to force a kdump to happen. Thus I think the change of order is
: acceptable.

Link: http://lkml.kernel.org/r/20190703170354.217312-1-dianders@chromium.org
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Feng Tang <feng.tang@intel.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 57043247 22-Apr-2019 Mauro Carvalho Chehab <mchehab+samsung@kernel.org>

docs: admin-guide: move sysctl directory to it

The stuff under sysctl describes /sys interface from userspace
point of view. So, add it to the admin-guide and remove the
:orphan: from its index file.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>


# 53b95375 18-Apr-2019 Mauro Carvalho Chehab <mchehab+samsung@kernel.org>

docs: sysctl: convert to ReST

Rename the /proc/sys/ documentation files to ReST, using the
README file as a template for an index.rst, adding the other
files there via TOC markup.

Despite being written on different times with different
styles, try to make them somewhat coherent with a similar
look and feel, ensuring that they'll look nice as both
raw text file and as via the html output produced by the
Sphinx build system.

At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>


# 457c8996 19-May-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Add SPDX license identifier for missed files

Add SPDX license identifiers to all files which:

- Have no license information of any form

- Have EXPORT_.*_SYMBOL_GPL inside which was used in the
initial scan/conversion to ignore the file

These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:

GPL-2.0-only

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# de6da1e8 17-May-2019 Feng Tang <feng.tang@intel.com>

panic: add an option to replay all the printk message in buffer

Currently on panic, kernel will lower the loglevel and print out pending
printk msg only with console_flush_on_panic().

Add an option for users to configure the "panic_print" to replay all
dmesg in buffer, some of which they may have never seen due to the
loglevel setting, which will help panic debugging .

[feng.tang@intel.com: keep the original console_flush_on_panic() inside panic()]
Link: http://lkml.kernel.org/r/1556199137-14163-1-git-send-email-feng.tang@intel.com
[feng.tang@intel.com: use logbuf lock to protect the console log index]
Link: http://lkml.kernel.org/r/1556269868-22654-1-git-send-email-feng.tang@intel.com
Link: http://lkml.kernel.org/r/1556095872-36838-1-git-send-email-feng.tang@intel.com
Signed-off-by: Feng Tang <feng.tang@intel.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Aaro Koskinen <aaro.koskinen@nokia.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# b287a25a 14-May-2019 Aaro Koskinen <aaro.koskinen@nokia.com>

panic/reboot: allow specifying reboot_mode for panic only

Allow specifying reboot_mode for panic only. This is needed on systems
where ramoops is used to store panic logs, and user wants to use warm
reset to preserve those, while still having cold reset on normal
reboots.

Link: http://lkml.kernel.org/r/20190322004735.27702-1-aaro.koskinen@iki.fi
Signed-off-by: Aaro Koskinen <aaro.koskinen@nokia.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# c39ea0b9 14-May-2019 Feng Tang <feng.tang@intel.com>

panic: avoid the extra noise dmesg

When kernel panic happens, it will first print the panic call stack,
then the ending msg like:

[ 35.743249] ---[ end Kernel panic - not syncing: Fatal exception
[ 35.749975] ------------[ cut here ]------------

The above message are very useful for debugging.

But if system is configured to not reboot on panic, say the
"panic_timeout" parameter equals 0, it will likely print out many noisy
message like WARN() call stack for each and every CPU except the panic
one, messages like below:

WARNING: CPU: 1 PID: 280 at kernel/sched/core.c:1198 set_task_cpu+0x183/0x190
Call Trace:
<IRQ>
try_to_wake_up
default_wake_function
autoremove_wake_function
__wake_up_common
__wake_up_common_lock
__wake_up
wake_up_klogd_work_func
irq_work_run_list
irq_work_tick
update_process_times
tick_sched_timer
__hrtimer_run_queues
hrtimer_interrupt
smp_apic_timer_interrupt
apic_timer_interrupt

For people working in console mode, the screen will first show the panic
call stack, but immediately overridden by these noisy extra messages,
which makes debugging much more difficult, as the original context gets
lost on screen.

Also these noisy messages will confuse some users, as I have seen many bug
reporters posted the noisy message into bugzilla, instead of the real
panic call stack and context.

Adding a flag "suppress_printk" which gets set in panic() to avoid those
noisy messages, without changing current kernel behavior that both panic
blinking and sysrq magic key can work as is, suggested by Petr Mladek.

To verify this, make sure kernel is not configured to reboot on panic and
in console
# echo c > /proc/sysrq-trigger
to see if console only prints out the panic call stack.

Link: http://lkml.kernel.org/r/1551430186-24169-1-git-send-email-feng.tang@intel.com
Signed-off-by: Feng Tang <feng.tang@intel.com>
Suggested-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jiri Slaby <jslaby@suse.com>
Cc: Sasha Levin <sashal@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 98587c2d 29-Apr-2019 Martin Schwidefsky <schwidefsky@de.ibm.com>

s390: simplify disabled_wait

The disabled_wait() function uses its argument as the PSW address when
it stops the CPU with a wait PSW that is disabled for interrupts.
The different callers sometimes use a specific number like 0xdeadbeef
to indicate a specific failure, the early boot code uses 0 and some
other calls sites use __builtin_return_address(0).

At the time a dump is created the current PSW and the registers of a
CPU are written to lowcore to make them avaiable to the dump analysis
tool. For a CPU stopped with disabled_wait the PSW and the registers
do not really make sense together, the PSW address does not point to
the function the registers belong to.

Simplify disabled_wait() by using _THIS_IP_ for the PSW address and
drop the argument to the function.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>


# 4169680e 07-Mar-2019 YueHaibing <yuehaibing@huawei.com>

kernel/panic.c: taint: fix debugfs_simple_attr.cocci warnings

Use DEFINE_DEBUGFS_ATTRIBUTE rather than DEFINE_SIMPLE_ATTRIBUTE for
debugfs files.

Semantic patch information:
Rationale: DEFINE_SIMPLE_ATTRIBUTE + debugfs_create_file()
imposes some significant overhead as compared to
DEFINE_DEBUGFS_ATTRIBUTE + debugfs_create_file_unsafe().

Generated by: scripts/coccinelle/api/debugfs/debugfs_simple_attr.cocci

The _unsafe() part suggests that some of them "safeness
responsibilities" are now panic.c responsibilities. The patch is OK
since panic's clear_warn_once_fops struct file_operations is safe
against removal, so we don't have to use otherwise necessary
debugfs_file_get()/debugfs_file_put().

[sergey.senozhatsky.work@gmail.com: changelog addition]
Link: http://lkml.kernel.org/r/1545990861-158097-1-git-send-email-yuehaibing@huawei.com
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Petr Mladek <pmladek@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 81c9d43f 03-Jan-2019 Feng Tang <feng.tang@intel.com>

kernel/sysctl: add panic_print into sysctl

So that we can also runtime chose to print out the needed system info
for panic, other than setting the kernel cmdline.

Link: http://lkml.kernel.org/r/1543398842-19295-3-git-send-email-feng.tang@intel.com
Signed-off-by: Feng Tang <feng.tang@intel.com>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# d999bd93 03-Jan-2019 Feng Tang <feng.tang@intel.com>

panic: add options to print system info when panic happens

Kernel panic issues are always painful to debug, partially because it's
not easy to get enough information of the context when panic happens.

And we have ramoops and kdump for that, while this commit tries to
provide a easier way to show the system info by adding a cmdline
parameter, referring some idea from sysrq handler.

Link: http://lkml.kernel.org/r/1543398842-19295-2-git-send-email-feng.tang@intel.com
Signed-off-by: Feng Tang <feng.tang@intel.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# c7c3f05e 25-Oct-2018 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>

panic: avoid deadlocks in re-entrant console drivers

From printk()/serial console point of view panic() is special, because
it may force CPU to re-enter printk() or/and serial console driver.
Therefore, some of serial consoles drivers are re-entrant. E.g. 8250:

serial8250_console_write()
{
if (port->sysrq)
locked = 0;
else if (oops_in_progress)
locked = spin_trylock_irqsave(&port->lock, flags);
else
spin_lock_irqsave(&port->lock, flags);
...
}

panic() does set oops_in_progress via bust_spinlocks(1), so in theory
we should be able to re-enter serial console driver from panic():

CPU0
<NMI>
uart_console_write()
serial8250_console_write() // if (oops_in_progress)
// spin_trylock_irqsave()
call_console_drivers()
console_unlock()
console_flush_on_panic()
bust_spinlocks(1) // oops_in_progress++
panic()
<NMI/>
spin_lock_irqsave(&port->lock, flags) // spin_lock_irqsave()
serial8250_console_write()
call_console_drivers()
console_unlock()
printk()
...

However, this does not happen and we deadlock in serial console on
port->lock spinlock. And the problem is that console_flush_on_panic()
called after bust_spinlocks(0):

void panic(const char *fmt, ...)
{
bust_spinlocks(1);
...
bust_spinlocks(0);
console_flush_on_panic();
...
}

bust_spinlocks(0) decrements oops_in_progress, so oops_in_progress
can go back to zero. Thus even re-entrant console drivers will simply
spin on port->lock spinlock. Given that port->lock may already be
locked either by a stopped CPU, or by the very same CPU we execute
panic() on (for instance, NMI panic() on printing CPU) the system
deadlocks and does not reboot.

Fix this by removing bust_spinlocks(0), so oops_in_progress is always
set in panic() now and, thus, re-entrant console drivers will trylock
the port->lock instead of spinning on it forever, when we call them
from console_flush_on_panic().

Link: http://lkml.kernel.org/r/20181025101036.6823-1-sergey.senozhatsky@gmail.com
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Daniel Wang <wonderfly@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
Cc: Jiri Slaby <jslaby@suse.com>
Cc: Peter Feiner <pfeiner@google.com>
Cc: linux-serial@vger.kernel.org
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>


# b49dec1c 30-Oct-2018 Borislav Petkov <bp@suse.de>

kernel/panic.c: filter out a potential trailing newline

If a call to panic() terminates the string with a \n , the result puts the
closing brace ']---' on a newline because panic() itself adds \n too.

Now, if one goes and removes the newline chars from all panic()
invocations - and the stats right now look like this:

~300 calls with a \n
~500 calls without a \n

one is destined to a neverending game of whack-a-mole because the usual
thing to do is add a newline at the end of a string a function is supposed
to print.

Therefore, simply zap any \n at the end of the panic string to avoid
touching so many places in the kernel.

Link: http://lkml.kernel.org/r/20181009205019.2786-1-bp@alien8.de
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 95c4fb78 30-Oct-2018 Borislav Petkov <bp@suse.de>

kernel/panic.c: do not append newline to the stack protector panic string

... because panic() itself already does this. Otherwise you have
line-broken trailer:

[ 1.836965] ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: pgd_alloc+0x29e/0x2a0
[ 1.836965] ]---

Link: http://lkml.kernel.org/r/20181008202901.7894-1-bp@alien8.de
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 050e9baa 13-Jun-2018 Linus Torvalds <torvalds@linux-foundation.org>

Kbuild: rename CC_STACKPROTECTOR[_STRONG] config variables

The changes to automatically test for working stack protector compiler
support in the Kconfig files removed the special STACKPROTECTOR_AUTO
option that picked the strongest stack protector that the compiler
supported.

That was all a nice cleanup - it makes no sense to have the AUTO case
now that the Kconfig phase can just determine the compiler support
directly.

HOWEVER.

It also meant that doing "make oldconfig" would now _disable_ the strong
stackprotector if you had AUTO enabled, because in a legacy config file,
the sane stack protector configuration would look like

CONFIG_HAVE_CC_STACKPROTECTOR=y
# CONFIG_CC_STACKPROTECTOR_NONE is not set
# CONFIG_CC_STACKPROTECTOR_REGULAR is not set
# CONFIG_CC_STACKPROTECTOR_STRONG is not set
CONFIG_CC_STACKPROTECTOR_AUTO=y

and when you ran this through "make oldconfig" with the Kbuild changes,
it would ask you about the regular CONFIG_CC_STACKPROTECTOR (that had
been renamed from CONFIG_CC_STACKPROTECTOR_REGULAR to just
CONFIG_CC_STACKPROTECTOR), but it would think that the STRONG version
used to be disabled (because it was really enabled by AUTO), and would
disable it in the new config, resulting in:

CONFIG_HAVE_CC_STACKPROTECTOR=y
CONFIG_CC_HAS_STACKPROTECTOR_NONE=y
CONFIG_CC_STACKPROTECTOR=y
# CONFIG_CC_STACKPROTECTOR_STRONG is not set
CONFIG_CC_HAS_SANE_STACKPROTECTOR=y

That's dangerously subtle - people could suddenly find themselves with
the weaker stack protector setup without even realizing.

The solution here is to just rename not just the old RECULAR stack
protector option, but also the strong one. This does that by just
removing the CC_ prefix entirely for the user choices, because it really
is not about the compiler support (the compiler support now instead
automatially impacts _visibility_ of the options to users).

This results in "make oldconfig" actually asking the user for their
choice, so that we don't have any silent subtle security model changes.
The end result would generally look like this:

CONFIG_HAVE_CC_STACKPROTECTOR=y
CONFIG_CC_HAS_STACKPROTECTOR_NONE=y
CONFIG_STACKPROTECTOR=y
CONFIG_STACKPROTECTOR_STRONG=y
CONFIG_CC_HAS_SANE_STACKPROTECTOR=y

where the "CC_" versions really are about internal compiler
infrastructure, not the user selections.

Acked-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# bc4f2f54 10-Apr-2018 Kees Cook <keescook@chromium.org>

taint: add taint for randstruct

Since the randstruct plugin can intentionally produce extremely unusual
kernel structure layouts (even performance pathological ones), some
maintainers want to be able to trivially determine if an Oops is coming
from a randstruct-built kernel, so as to keep their sanity when
debugging. This adds the new flag and initializes taint_mask
immediately when built with randstruct.

Link: http://lkml.kernel.org/r/1519084390-43867-4-git-send-email-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 9c4560e5 10-Apr-2018 Kees Cook <keescook@chromium.org>

taint: consolidate documentation

This consolidates the taint bit documentation into a single place with
both numeric and letter values. Additionally adds the missing TAINT_AUX
documentation.

Link: http://lkml.kernel.org/r/1519084390-43867-3-git-send-email-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 47d4b263 10-Apr-2018 Kees Cook <keescook@chromium.org>

taint: convert to indexed initialization

This converts to using indexed initializers instead of comments, adds a
comment on why the taint flags can't be an enum, and make sure that no
one forgets to update the taint_flags when adding new bits.

Link: http://lkml.kernel.org/r/1519084390-43867-2-git-send-email-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 4c281074 03-Apr-2018 Steven Rostedt (VMware) <rostedt@goodmis.org>

lockdep: Add print_irqtrace_events() to __warn

Running a test on a x86_32 kernel I triggered a bug that an interrupt
disable/enable isn't being catched by lockdep. At least knowing where the
last one was found would be helpful, but the warnings that are produced do
not show this information. Even without debugging lockdep, having the WARN()
display the last place hard and soft irqs were enabled or disabled is
valuable.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


# 0862ca42 09-Mar-2018 Kees Cook <keescook@chromium.org>

bug: use %pB in BUG and stack protector failure

The BUG and stack protector reports were still using a raw %p. This
changes it to %pB for more meaningful output.

Link: http://lkml.kernel.org/r/20180301225704.GA34198@beast
Fixes: ad67b74d2469 ("printk: hash addresses printed with %p")
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Richard Weinberger <richard.weinberger@gmail.com>,
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 5ad75105 06-Mar-2018 Borislav Petkov <bp@suse.de>

panic: Add closing panic marker parenthesis

Otherwise it looks unbalanced.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Link: https://lkml.kernel.org/r/20180306094920.16917-2-bp@alien8.de


# 4efb442c 17-Nov-2017 Borislav Petkov <bp@suse.de>

kernel/panic.c: add TAINT_AUX

This is the gist of a patch which we've been forward-porting in our
kernels for a long time now and it probably would make a good sense to
have such TAINT_AUX flag upstream which can be used by each distro etc,
how they see fit. This way, we won't need to forward-port a distro-only
version indefinitely.

Add an auxiliary taint flag to be used by distros and others. This
obviates the need to forward-port whatever internal solutions people
have in favor of a single flag which they can map arbitrarily to a
definition of their pleasing.

The "X" mnemonic could also mean eXternal, which would be taint from a
distro or something else but not the upstream kernel. We will use it to
mark modules for which we don't provide support. I.e., a really
eXternal module.

Link: http://lkml.kernel.org/r/20170911134533.dp5mtyku5bongx4c@pd.tnic
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Jessica Yu <jeyu@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# a7bed27a 17-Nov-2017 Kees Cook <keescook@chromium.org>

bug: fix "cut here" location for __WARN_TAINT architectures

Prior to v4.11, x86 used warn_slowpath_fmt() for handling WARN()s.
After WARN() was moved to using UD0 on x86, the warning text started
appearing _before_ the "cut here" line. This appears to have been a
long-standing bug on architectures that used __WARN_TAINT, but it didn't
get fixed.

v4.11 and earlier on x86:

------------[ cut here ]------------
WARNING: CPU: 0 PID: 2956 at drivers/misc/lkdtm_bugs.c:65 lkdtm_WARNING+0x21/0x30
This is a warning message
Modules linked in:

v4.12 and later on x86:

This is a warning message
------------[ cut here ]------------
WARNING: CPU: 1 PID: 2982 at drivers/misc/lkdtm_bugs.c:68 lkdtm_WARNING+0x15/0x20
Modules linked in:

With this fix:

------------[ cut here ]------------
This is a warning message
WARNING: CPU: 3 PID: 3009 at drivers/misc/lkdtm_bugs.c:67 lkdtm_WARNING+0x15/0x20

Since the __FILE__ reporting happens as part of the UD0 handler, it
isn't trivial to move the message to after the WARNING line, but at
least we can fix the position of the "cut here" line so all the various
logging tools will start including the actual runtime warning message
again, when they follow the instruction and "cut here".

Link: http://lkml.kernel.org/r/1510100869-73751-4-git-send-email-keescook@chromium.org
Fixes: 9a93848fe787 ("x86/debug: Implement __WARN() using UD0")
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 2a8358d8 17-Nov-2017 Kees Cook <keescook@chromium.org>

bug: define the "cut here" string in a single place

The "cut here" string is used in a few paths. Define it in a single
place.

Link: http://lkml.kernel.org/r/1510100869-73751-3-git-send-email-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# aaf5dcfb 17-Nov-2017 Andi Kleen <ak@linux.intel.com>

kernel debug: support resetting WARN_ONCE for all architectures

Some architectures store the WARN_ONCE state in the flags field of the
bug_entry. Clear that one too when resetting once state through
/sys/kernel/debug/clear_warn_once

Pointed out by Michael Ellerman

Improves the earlier patch that add clear_warn_once.

[ak@linux.intel.com: add a missing ifdef CONFIG_MODULES]
Link: http://lkml.kernel.org/r/20171020170633.9593-1-andi@firstfloor.org
[akpm@linux-foundation.org: fix unused var warning]
[akpm@linux-foundation.org: Use 0200 for clear_warn_once file, per mpe]
[akpm@linux-foundation.org: clear BUGFLAG_DONE in clear_once_table(), per mpe]
Link: http://lkml.kernel.org/r/20171019204642.7404-1-andi@firstfloor.org
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Tested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# b1fca27d 17-Nov-2017 Andi Kleen <ak@linux.intel.com>

kernel debug: support resetting WARN*_ONCE

I like _ONCE warnings because it's guaranteed that they don't flood the
log.

During testing I find it useful to reset the state of the once warnings,
so that I can rerun tests and see if they trigger again, or can
guarantee that a test run always hits the same warnings.

This patch adds a debugfs interface to reset all the _ONCE warnings so
that they appear again:

echo 1 > /sys/kernel/debug/clear_warn_once

This is implemented by putting all the warning booleans into a special
section, and clearing it.

[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/20171017221455.6740-1-andi@firstfloor.org
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Tested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 7a46ec0e 15-Aug-2017 Kees Cook <keescook@chromium.org>

locking/refcounts, x86/asm: Implement fast refcount overflow protection

This implements refcount_t overflow protection on x86 without a noticeable
performance impact, though without the fuller checking of REFCOUNT_FULL.

This is done by duplicating the existing atomic_t refcount implementation
but with normally a single instruction added to detect if the refcount
has gone negative (e.g. wrapped past INT_MAX or below zero). When detected,
the handler saturates the refcount_t to INT_MIN / 2. With this overflow
protection, the erroneous reference release that would follow a wrap back
to zero is blocked from happening, avoiding the class of refcount-overflow
use-after-free vulnerabilities entirely.

Only the overflow case of refcounting can be perfectly protected, since
it can be detected and stopped before the reference is freed and left to
be abused by an attacker. There isn't a way to block early decrements,
and while REFCOUNT_FULL stops increment-from-zero cases (which would
be the state _after_ an early decrement and stops potential double-free
conditions), this fast implementation does not, since it would require
the more expensive cmpxchg loops. Since the overflow case is much more
common (e.g. missing a "put" during an error path), this protection
provides real-world protection. For example, the two public refcount
overflow use-after-free exploits published in 2016 would have been
rendered unexploitable:

http://perception-point.io/2016/01/14/analysis-and-exploitation-of-a-linux-kernel-vulnerability-cve-2016-0728/

http://cyseclabs.com/page?n=02012016

This implementation does, however, notice an unchecked decrement to zero
(i.e. caller used refcount_dec() instead of refcount_dec_and_test() and it
resulted in a zero). Decrements under zero are noticed (since they will
have resulted in a negative value), though this only indicates that a
use-after-free may have already happened. Such notifications are likely
avoidable by an attacker that has already exploited a use-after-free
vulnerability, but it's better to have them reported than allow such
conditions to remain universally silent.

On first overflow detection, the refcount value is reset to INT_MIN / 2
(which serves as a saturation value) and a report and stack trace are
produced. When operations detect only negative value results (such as
changing an already saturated value), saturation still happens but no
notification is performed (since the value was already saturated).

On the matter of races, since the entire range beyond INT_MAX but before
0 is negative, every operation at INT_MIN / 2 will trap, leaving no
overflow-only race condition.

As for performance, this implementation adds a single "js" instruction
to the regular execution flow of a copy of the standard atomic_t refcount
operations. (The non-"and_test" refcount_dec() function, which is uncommon
in regular refcount design patterns, has an additional "jz" instruction
to detect reaching exactly zero.) Since this is a forward jump, it is by
default the non-predicted path, which will be reinforced by dynamic branch
prediction. The result is this protection having virtually no measurable
change in performance over standard atomic_t operations. The error path,
located in .text.unlikely, saves the refcount location and then uses UD0
to fire a refcount exception handler, which resets the refcount, handles
reporting, and returns to regular execution. This keeps the changes to
.text size minimal, avoiding return jumps and open-coded calls to the
error reporting routine.

Example assembly comparison:

refcount_inc() before:

.text:
ffffffff81546149: f0 ff 45 f4 lock incl -0xc(%rbp)

refcount_inc() after:

.text:
ffffffff81546149: f0 ff 45 f4 lock incl -0xc(%rbp)
ffffffff8154614d: 0f 88 80 d5 17 00 js ffffffff816c36d3
...
.text.unlikely:
ffffffff816c36d3: 48 8d 4d f4 lea -0xc(%rbp),%rcx
ffffffff816c36d7: 0f ff (bad)

These are the cycle counts comparing a loop of refcount_inc() from 1
to INT_MAX and back down to 0 (via refcount_dec_and_test()), between
unprotected refcount_t (atomic_t), fully protected REFCOUNT_FULL
(refcount_t-full), and this overflow-protected refcount (refcount_t-fast):

2147483646 refcount_inc()s and 2147483647 refcount_dec_and_test()s:
cycles protections
atomic_t 82249267387 none
refcount_t-fast 82211446892 overflow, untested dec-to-zero
refcount_t-full 144814735193 overflow, untested dec-to-zero, inc-from-zero

This code is a modified version of the x86 PAX_REFCOUNT atomic_t
overflow defense from the last public patch of PaX/grsecurity, based
on my understanding of the code. Changes or omissions from the original
code are mine and don't reflect the original grsecurity/PaX code. Thanks
to PaX Team for various suggestions for improvement for repurposing this
code to be a refcount-only protection.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Hans Liljestrand <ishkamiel@gmail.com>
Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
Cc: Jann Horn <jannh@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Serge E. Hallyn <serge@hallyn.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: arozansk@redhat.com
Cc: axboe@kernel.dk
Cc: kernel-hardening@lists.openwall.com
Cc: linux-arch <linux-arch@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170815161924.GA133115@beast
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# b17b0153 08-Feb-2017 Ingo Molnar <mingo@kernel.org>

sched/headers: Prepare for new header dependencies before moving code to <linux/sched/debug.h>

We are going to split <linux/sched/debug.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/debug.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 7db60d05 01-Feb-2017 Vijay Kumar <vijay.ac.kumar@oracle.com>

sparc64: Send break twice from console to return to boot prom

Now we can also jump to boot prom from sunhv console by sending
break twice on console for both running and panicked kernel
cases.

Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f92bac3b 27-Dec-2016 Sergey Senozhatsky <sergey.senozhatsky@gmail.com>

printk: rename nmi.c and exported api

A preparation patch for printk_safe work. No functional change.
- rename nmi.c to print_safe.c
- add `printk_safe' prefix to some (which used both by printk-safe
and printk-nmi) of the exported functions.

Link: http://lkml.kernel.org/r/20161227141611.940-3-sergey.senozhatsky@gmail.com
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Cc: Calvin Owens <calvinowens@fb.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>


# ff7a28a0 24-Jan-2017 Jiri Slaby <jirislaby@kernel.org>

kernel/panic.c: add missing \n

When a system panics, the "Rebooting in X seconds.." message is never
printed because it lacks a new line. Fix it.

Link: http://lkml.kernel.org/r/20170119114751.2724-1-jslaby@suse.cz
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 5eb7c0d0 01-Jan-2017 Larry Finger <Larry.Finger@lwfinger.net>

taint/module: Fix problems when out-of-kernel driver defines true or false

Commit 7fd8329ba502 ("taint/module: Clean up global and module taint
flags handling") used the key words true and false as character members
of a new struct. These names cause problems when out-of-kernel modules
such as VirtualBox include their own definitions of true and false.

Fixes: 7fd8329ba502 ("taint/module: Clean up global and module taint flags handling")
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Jessica Yu <jeyu@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Reported-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jessica Yu <jeyu@redhat.com>


# 7fd8329b 21-Sep-2016 Petr Mladek <pmladek@suse.com>

taint/module: Clean up global and module taint flags handling

The commit 66cc69e34e86a231 ("Fix: module signature vs tracepoints:
add new TAINT_UNSIGNED_MODULE") updated module_taint_flags() to
potentially print one more character. But it did not increase the
size of the corresponding buffers in m_show() and print_modules().

We have recently done the same mistake when adding a taint flag
for livepatching, see
https://lkml.kernel.org/r/cfba2c823bb984690b73572aaae1db596b54a082.1472137475.git.jpoimboe@redhat.com

Also struct module uses an incompatible type for mod-taints flags.
It survived from the commit 2bc2d61a9638dab670d ("[PATCH] list module
taint flags in Oops/panic"). There was used "int" for the global taint
flags at these times. But only the global tain flags was later changed
to "unsigned long" by the commit 25ddbb18aae33ad2 ("Make the taint
flags reliable").

This patch defines TAINT_FLAGS_COUNT that can be used to create
arrays and buffers of the right size. Note that we could not use
enum because the taint flag indexes are used also in assembly code.

Then it reworks the table that describes the taint flags. The TAINT_*
numbers can be used as the index. Instead, we add information
if the taint flag is also shown per-module.

Finally, it uses "unsigned long", bit operations, and the updated
taint_flags table also for mod->taints.

It is not optimal because only few taint flags can be printed by
module_taint_flags(). But better be on the safe side. IMHO, it is
not worth the optimization and this is a good compromise.

Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: http://lkml.kernel.org/r/1474458442-21581-1-git-send-email-pmladek@suse.com
[jeyu@redhat.com: fix broken lkml link in changelog]
Signed-off-by: Jessica Yu <jeyu@redhat.com>


# 0ee59413 11-Oct-2016 Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>

x86/panic: replace smp_send_stop() with kdump friendly version in panic path

Daniel Walker reported problems which happens when
crash_kexec_post_notifiers kernel option is enabled
(https://lkml.org/lkml/2015/6/24/44).

In that case, smp_send_stop() is called before entering kdump routines
which assume other CPUs are still online. As the result, for x86, kdump
routines fail to save other CPUs' registers and disable virtualization
extensions.

To fix this problem, call a new kdump friendly function,
crash_smp_send_stop(), instead of the smp_send_stop() when
crash_kexec_post_notifiers is enabled. crash_smp_send_stop() is a weak
function, and it just call smp_send_stop(). Architecture codes should
override it so that kdump can work appropriately. This patch only
provides x86-specific version.

For Xen's PV kernel, just keep the current behavior.

NOTES:

- Right solution would be to place crash_smp_send_stop() before
__crash_kexec() invocation in all cases and remove smp_send_stop(), but
we can't do that until all architectures implement own
crash_smp_send_stop()

- crash_smp_send_stop()-like work is still needed by
machine_crash_shutdown() because crash_kexec() can be called without
entering panic()

Fixes: f06e5153f4ae (kernel/panic.c: add "crash_kexec_post_notifiers" option)
Link: http://lkml.kernel.org/r/20160810080948.11028.15344.stgit@sysi4-13.yrl.intra.hitachi.co.jp
Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Reported-by: Daniel Walker <dwalker@fifo99.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Daniel Walker <dwalker@fifo99.com>
Cc: Xunlei Pang <xpang@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Daney <david.daney@cavium.com>
Cc: Aaro Koskinen <aaro.koskinen@iki.fi>
Cc: "Steven J. Hill" <steven.hill@cavium.com>
Cc: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# b26e27dd 02-Aug-2016 Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>

kexec: use core_param for crash_kexec_post_notifiers boot option

crash_kexec_post_notifiers ia a boot option which controls whether the
1st kernel calls panic notifiers or not before booting the 2nd kernel.
However, there is no need to limit it to being modifiable only at boot
time. So, use core_param instead of early_param.

Link: http://lkml.kernel.org/r/20160705113327.5864.43139.stgit@softrs
Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# cf9b1106 20-May-2016 Petr Mladek <pmladek@suse.com>

printk/nmi: flush NMI messages on the system panic

In NMI context, printk() messages are stored into per-CPU buffers to
avoid a possible deadlock. They are normally flushed to the main ring
buffer via an IRQ work. But the work is never called when the system
calls panic() in the very same NMI handler.

This patch tries to flush NMI buffers before the crash dump is
generated. In this case it does not risk a double release and bails out
when the logbuf_lock is already taken. The aim is to get the messages
into the main ring buffer when possible. It makes them better
accessible in the vmcore.

Then the patch tries to flush the buffers second time when other CPUs
are down. It might be more aggressive and reset logbuf_lock. The aim
is to get the messages available for the consequent kmsg_dump() and
console_flush_on_panic() calls.

The patch causes vprintk_emit() to be called even in NMI context again.
But it is done via printk_deferred() so that the console handling is
skipped. Consoles use internal locks and we could not prevent a
deadlock easily. They are explicitly called later when the crash dump
is not generated, see console_flush_on_panic().

Signed-off-by: Petr Mladek <pmladek@suse.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: David Miller <davem@davemloft.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jiri Kosina <jkosina@suse.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# ebc41f20 22-Mar-2016 Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>

panic: change nmi_panic from macro to function

Commit 1717f2096b54 ("panic, x86: Fix re-entrance problem due to panic
on NMI") and commit 58c5661f2144 ("panic, x86: Allow CPUs to save
registers even if looping in NMI context") introduced nmi_panic() which
prevents concurrent/recursive execution of panic(). It also saves
registers for the crash dump on x86.

However, there are some cases where NMI handlers still use panic().
This patch set partially replaces them with nmi_panic() in those cases.

Even this patchset is applied, some NMI or similar handlers (e.g. MCE
handler) continue to use panic(). This is because I can't test them
well and actual problems won't happen. For example, the possibility
that normal panic and panic on MCE happen simultaneously is very low.

This patch (of 3):

Convert nmi_panic() to a proper function and export it instead of
exporting internal implementation details to modules, for obvious
reasons.

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Cc: Javi Merino <javi.merino@arm.com>
Cc: Gobinda Charan Maji <gobinda.cemk07@gmail.com>
Cc: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 2553b67a 17-Mar-2016 Josh Poimboeuf <jpoimboe@redhat.com>

lib/bug.c: use common WARN helper

The traceoff_on_warning option doesn't have any effect on s390, powerpc,
arm64, parisc, and sh because there are two different types of WARN
implementations:

1) The above mentioned architectures treat WARN() as a special case of a
BUG() exception. They handle warnings in report_bug() in lib/bug.c.

2) All other architectures just call warn_slowpath_*() directly. Their
warnings are handled in warn_slowpath_common() in kernel/panic.c.

Support traceoff_on_warning on all architectures and prevent any future
divergence by using a single common function to emit the warning.

Also remove the '()' from '%pS()', because the parentheses look funky:

[ 45.607629] WARNING: at /root/warn_mod/warn_mod.c:17 .init_dummy+0x20/0x40 [warn_mod]()

Reported-by: Chunyu Hu <chuhu@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Tested-by: Prarit Bhargava <prarit@redhat.com>
Acked-by: Prarit Bhargava <prarit@redhat.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 8d91f8b1 15-Jan-2016 Tejun Heo <tj@kernel.org>

printk: do cond_resched() between lines while outputting to consoles

@console_may_schedule tracks whether console_sem was acquired through
lock or trylock. If the former, we're inside a sleepable context and
console_conditional_schedule() performs cond_resched(). This allows
console drivers which use console_lock for synchronization to yield
while performing time-consuming operations such as scrolling.

However, the actual console outputting is performed while holding
irq-safe logbuf_lock, so console_unlock() clears @console_may_schedule
before starting outputting lines. Also, only a few drivers call
console_conditional_schedule() to begin with. This means that when a
lot of lines need to be output by console_unlock(), for example on a
console registration, the task doing console_unlock() may not yield for
a long time on a non-preemptible kernel.

If this happens with a slow console devices, for example a serial
console, the outputting task may occupy the cpu for a very long time.
Long enough to trigger softlockup and/or RCU stall warnings, which in
turn pile more messages, sometimes enough to trigger the next cycle of
warnings incapacitating the system.

Fix it by making console_unlock() insert cond_resched() between lines if
@console_may_schedule.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Calvin Owens <calvinowens@fb.com>
Acked-by: Jan Kara <jack@suse.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Kyle McMartin <kyle@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 7bbee5ca 14-Dec-2015 Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>

kexec: Fix race between panic() and crash_kexec()

Currently, panic() and crash_kexec() can be called at the same time.
For example (x86 case):

CPU 0:
oops_end()
crash_kexec()
mutex_trylock() // acquired
nmi_shootdown_cpus() // stop other CPUs

CPU 1:
panic()
crash_kexec()
mutex_trylock() // failed to acquire
smp_send_stop() // stop other CPUs
infinite loop

If CPU 1 calls smp_send_stop() before nmi_shootdown_cpus(), kdump
fails.

In another case:

CPU 0:
oops_end()
crash_kexec()
mutex_trylock() // acquired
<NMI>
io_check_error()
panic()
crash_kexec()
mutex_trylock() // failed to acquire
infinite loop

Clearly, this is an undesirable result.

To fix this problem, this patch changes crash_kexec() to exclude others
by using the panic_cpu atomic.

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: kexec@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Minfei Huang <mnfhuang@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/20151210014630.25437.94161.stgit@softrs
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 58c5661f 14-Dec-2015 Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>

panic, x86: Allow CPUs to save registers even if looping in NMI context

Currently, kdump_nmi_shootdown_cpus(), a subroutine of crash_kexec(),
sends an NMI IPI to CPUs which haven't called panic() to stop them,
save their register information and do some cleanups for crash dumping.
However, if such a CPU is infinitely looping in NMI context, we fail to
save its register information into the crash dump.

For example, this can happen when unknown NMIs are broadcast to all
CPUs as follows:

CPU 0 CPU 1
=========================== ==========================
receive an unknown NMI
unknown_nmi_error()
panic() receive an unknown NMI
spin_trylock(&panic_lock) unknown_nmi_error()
crash_kexec() panic()
spin_trylock(&panic_lock)
panic_smp_self_stop()
infinite loop
kdump_nmi_shootdown_cpus()
issue NMI IPI -----------> blocked until IRET
infinite loop...

Here, since CPU 1 is in NMI context, the second NMI from CPU 0 is
blocked until CPU 1 executes IRET. However, CPU 1 never executes IRET,
so the NMI is not handled and the callback function to save registers is
never called.

In practice, this can happen on some servers which broadcast NMIs to all
CPUs when the NMI button is pushed.

To save registers in this case, we need to:

a) Return from NMI handler instead of looping infinitely
or
b) Call the callback function directly from the infinite loop

Inherently, a) is risky because NMI is also used to prevent corrupted
data from being propagated to devices. So, we chose b).

This patch does the following:

1. Move the infinite looping of CPUs which haven't called panic() in NMI
context (actually done by panic_smp_self_stop()) outside of panic() to
enable us to refer pt_regs. Please note that panic_smp_self_stop() is
still used for normal context.

2. Call a callback of kdump_nmi_shootdown_cpus() directly to save
registers and do some cleanups after setting waiting_for_crash_ipi which
is used for counting down the number of CPUs which handled the callback

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Aaron Tomlin <atomlin@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Gobinda Charan Maji <gobinda.cemk07@gmail.com>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Javi Merino <javi.merino@arm.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: kexec@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Cc: lkml <linux-kernel@vger.kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Stefan Lippers-Hollmann <s.l-h@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Link: http://lkml.kernel.org/r/20151210014628.25437.75256.stgit@softrs
[ Cleanup comments, fixup formatting. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 1717f209 14-Dec-2015 Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>

panic, x86: Fix re-entrance problem due to panic on NMI

If panic on NMI happens just after panic() on the same CPU, panic() is
recursively called. Kernel stalls, as a result, after failing to acquire
panic_lock.

To avoid this problem, don't call panic() in NMI context if we've
already entered panic().

For that, introduce nmi_panic() macro to reduce code duplication. In
the case of panic on NMI, don't return from NMI handlers if another CPU
already panicked.

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Aaron Tomlin <atomlin@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Gobinda Charan Maji <gobinda.cemk07@gmail.com>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Javi Merino <javi.merino@arm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: kexec@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Cc: lkml <linux-kernel@vger.kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Link: http://lkml.kernel.org/r/20151210014626.25437.13302.stgit@softrs
[ Cleanup comments, fixup formatting. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 7625b3a0 20-Nov-2015 Vitaly Kuznetsov <vkuznets@redhat.com>

kernel/panic.c: turn off locks debug before releasing console lock

Commit 08d78658f393 ("panic: release stale console lock to always get the
logbuf printed out") introduced an unwanted bad unlock balance report when
panic() is called directly and not from OOPS (e.g. from out_of_memory()).
The difference is that in case of OOPS we disable locks debug in
oops_enter() and on direct panic call nobody does that.

Fixes: 08d78658f393 ("panic: release stale console lock to always get the logbuf printed out")
Reported-by: kernel test robot <ying.huang@linux.intel.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Baoquan He <bhe@redhat.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Xie XiuQi <xiexiuqi@huawei.com>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Petr Mladek <pmladek@suse.cz>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 08d78658 06-Nov-2015 Vitaly Kuznetsov <vkuznets@redhat.com>

panic: release stale console lock to always get the logbuf printed out

In some cases we may end up killing the CPU holding the console lock
while still having valuable data in logbuf. E.g. I'm observing the
following:

- A crash is happening on one CPU and console_unlock() is being called on
some other.

- console_unlock() tries to print out the buffer before releasing the lock
and on slow console it takes time.

- in the meanwhile crashing CPU does lots of printk()-s with valuable data
(which go to the logbuf) and sends IPIs to all other CPUs.

- console_unlock() finishes printing previous chunk and enables interrupts
before trying to print out the rest, the CPU catches the IPI and never
releases console lock.

This is not the only possible case: in VT/fb subsystems we have many other
console_lock()/console_unlock() users. Non-masked interrupts (or
receiving NMI in case of extreme slowness) will have the same result.
Getting the whole console buffer printed out on crash should be top
priority.

[akpm@linux-foundation.org: tweak comment text]
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Baoquan He <bhe@redhat.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Xie XiuQi <xiexiuqi@huawei.com>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 5375b708 30-Jun-2015 HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>

kernel/panic/kexec: fix "crash_kexec_post_notifiers" option issue in oops path

Commit f06e5153f4ae2e ("kernel/panic.c: add "crash_kexec_post_notifiers"
option for kdump after panic_notifers") introduced
"crash_kexec_post_notifiers" kernel boot option, which toggles wheather
panic() calls crash_kexec() before panic_notifiers and dump kmsg or after.

The problem is that the commit overlooks panic_on_oops kernel boot option.
If it is enabled, crash_kexec() is called directly without going through
panic() in oops path.

To fix this issue, this patch adds a check to "crash_kexec_post_notifiers"
in the condition of kexec_should_crash().

Also, put a comment in kexec_should_crash() to explain not obvious things
on this patch.

Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Acked-by: Baoquan He <bhe@redhat.com>
Tested-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# f45d85ff 30-Jun-2015 HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>

kernel/panic: call the 2nd crash_kexec() only if crash_kexec_post_notifiers is enabled

For compatibility with the behaviour before the commit f06e5153f4ae2e
("kernel/panic.c: add "crash_kexec_post_notifiers" option for kdump after
panic_notifers"), the 2nd crash_kexec() should be called only if
crash_kexec_post_notifiers is enabled.

Note that crash_kexec() returns immediately if kdump crash kernel is not
loaded, so in this case, this patch makes no functionality change, but the
point is to make it explicit, from the caller panic() side, that the 2nd
crash_kexec() does nothing.

Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Suggested-by: Ingo Molnar <mingo@kernel.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# c5f45465 16-Dec-2014 Seth Jennings <sjenning@redhat.com>

livepatch: kernel: add TAINT_LIVEPATCH

This adds a new taint flag to indicate when the kernel or a kernel
module has been live patched. This will provide a clean indication in
bug reports that live patching was used.

Additionally, if the crash occurs in a live patched function, the live
patch module will appear beside the patched function in the backtrace.

Signed-off-by: Seth Jennings <sjenning@redhat.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Petr Mladek <pmladek@suse.cz>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>


# 9e3961a0 10-Dec-2014 Prarit Bhargava <prarit@redhat.com>

kernel: add panic_on_warn

There have been several times where I have had to rebuild a kernel to
cause a panic when hitting a WARN() in the code in order to get a crash
dump from a system. Sometimes this is easy to do, other times (such as
in the case of a remote admin) it is not trivial to send new images to
the user.

A much easier method would be a switch to change the WARN() over to a
panic. This makes debugging easier in that I can now test the actual
image the WARN() was seen on and I do not have to engage in remote
debugging.

This patch adds a panic_on_warn kernel parameter and
/proc/sys/kernel/panic_on_warn calls panic() in the
warn_slowpath_common() path. The function will still print out the
location of the warning.

An example of the panic_on_warn output:

The first line below is from the WARN_ON() to output the WARN_ON()'s
location. After that the panic() output is displayed.

WARNING: CPU: 30 PID: 11698 at /home/prarit/dummy_module/dummy-module.c:25 init_dummy+0x1f/0x30 [dummy_module]()
Kernel panic - not syncing: panic_on_warn set ...

CPU: 30 PID: 11698 Comm: insmod Tainted: G W OE 3.17.0+ #57
Hardware name: Intel Corporation S2600CP/S2600CP, BIOS RMLSDP.86I.00.29.D696.1311111329 11/11/2013
0000000000000000 000000008e3f87df ffff88080f093c38 ffffffff81665190
0000000000000000 ffffffff818aea3d ffff88080f093cb8 ffffffff8165e2ec
ffffffff00000008 ffff88080f093cc8 ffff88080f093c68 000000008e3f87df
Call Trace:
[<ffffffff81665190>] dump_stack+0x46/0x58
[<ffffffff8165e2ec>] panic+0xd0/0x204
[<ffffffffa038e05f>] ? init_dummy+0x1f/0x30 [dummy_module]
[<ffffffff81076b90>] warn_slowpath_common+0xd0/0xd0
[<ffffffffa038e040>] ? dummy_greetings+0x40/0x40 [dummy_module]
[<ffffffff81076c8a>] warn_slowpath_null+0x1a/0x20
[<ffffffffa038e05f>] init_dummy+0x1f/0x30 [dummy_module]
[<ffffffff81002144>] do_one_initcall+0xd4/0x210
[<ffffffff811b52c2>] ? __vunmap+0xc2/0x110
[<ffffffff810f8889>] load_module+0x16a9/0x1b30
[<ffffffff810f3d30>] ? store_uevent+0x70/0x70
[<ffffffff810f49b9>] ? copy_module_from_fd.isra.44+0x129/0x180
[<ffffffff810f8ec6>] SyS_finit_module+0xa6/0xd0
[<ffffffff8166cf29>] system_call_fastpath+0x12/0x17

Successfully tested by me.

hpa said: There is another very valid use for this: many operators would
rather a machine shuts down than being potentially compromised either
functionally or security-wise.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# bc53a3f4 13-Nov-2014 Xie XiuQi <xiexiuqi@huawei.com>

kernel/panic.c: update comments for print_tainted

Commit 69361eef9056 ("panic: add TAINT_SOFTLOCKUP") added the 'L' flag,
but failed to update the comments for print_tainted(). So, update the
comments.

Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 69361eef 08-Aug-2014 Josh Hunt <johunt@akamai.com>

panic: add TAINT_SOFTLOCKUP

This taint flag will be set if the system has ever entered a softlockup
state. Similar to TAINT_WARN it is useful to know whether or not the
system has been in a softlockup state when debugging.

[akpm@linux-foundation.org: apply the taint before calling panic()]
Signed-off-by: Josh Hunt <johunt@akamai.com>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# f06e5153 06-Jun-2014 Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>

kernel/panic.c: add "crash_kexec_post_notifiers" option for kdump after panic_notifers

Add a "crash_kexec_post_notifiers" boot option to run kdump after
running panic_notifiers and dump kmsg. This can help rare situations
where kdump fails because of unstable crashed kernel or hardware failure
(memory corruption on critical data/code), or the 2nd kernel is already
broken by the 1st kernel (it's a broken behavior, but who can guarantee
that the "crashed" kernel works correctly?).

Usage: add "crash_kexec_post_notifiers" to kernel boot option.

Note that this actually increases risks of the failure of kdump. This
option should be set only if you worry about the rare case of kdump
failure rather than increasing the chance of success.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Motohiro Kosaki <Motohiro.Kosaki@us.fujitsu.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
Cc: Satoru MORIYA <satoru.moriya.br@hitachi.com>
Cc: Tomoki Sekiyama <tomoki.sekiyama@hds.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# d7c0847f 07-Apr-2014 Fabian Frederick <fabf@skynet.be>

kernel/panic.c: display reason at end + pr_emerg

Currently, booting without initrd specified on 80x25 screen gives a call
trace followed by atkbd : Spurious ACK. Original message ("VFS: Unable
to mount root fs") is not available. Of course this could happen in
other situations...

This patch displays panic reason after call trace which could help lot
of people even if it's not the very last line on screen.

Also, convert all panic.c printk(KERN_EMERG to pr_emerg(

[akpm@linux-foundation.org: missed a couple of pr_ conversions]
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 57673c2b 30-Mar-2014 Rusty Russell <rusty@rustcorp.com.au>

Use 'E' instead of 'X' for unsigned module taint flag.

Takashi Iwai <tiwai@suse.de> says:
> The letter 'X' has been already used for SUSE kernels for very long
> time, to indicate the external supported modules. Can the new flag be
> changed to another letter for avoiding conflict...?
> (BTW, we also use 'N' for "no support", too.)

Note: this code should be cleaned up, so we don't have such maps in
three places!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# 8c90487c 26-Feb-2014 Dave Jones <davej@redhat.com>

Rename TAINT_UNSAFE_SMP to TAINT_CPU_OUT_OF_SPEC

Rename TAINT_UNSAFE_SMP to TAINT_CPU_OUT_OF_SPEC, so we can repurpose
the flag to encompass a wider range of pushing the CPU beyond its
warrany.

Signed-off-by: Dave Jones <davej@fedoraproject.org>
Link: http://lkml.kernel.org/r/20140226154949.GA770@redhat.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>


# 66cc69e3 12-Mar-2014 Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

Fix: module signature vs tracepoints: add new TAINT_UNSIGNED_MODULE

Users have reported being unable to trace non-signed modules loaded
within a kernel supporting module signature.

This is caused by tracepoint.c:tracepoint_module_coming() refusing to
take into account tracepoints sitting within force-loaded modules
(TAINT_FORCED_MODULE). The reason for this check, in the first place, is
that a force-loaded module may have a struct module incompatible with
the layout expected by the kernel, and can thus cause a kernel crash
upon forced load of that module on a kernel with CONFIG_TRACEPOINTS=y.

Tracepoints, however, specifically accept TAINT_OOT_MODULE and
TAINT_CRAP, since those modules do not lead to the "very likely system
crash" issue cited above for force-loaded modules.

With kernels having CONFIG_MODULE_SIG=y (signed modules), a non-signed
module is tainted re-using the TAINT_FORCED_MODULE taint flag.
Unfortunately, this means that Tracepoints treat that module as a
force-loaded module, and thus silently refuse to consider any tracepoint
within this module.

Since an unsigned module does not fit within the "very likely system
crash" category of tainting, add a new TAINT_UNSIGNED_MODULE taint flag
to specifically address this taint behavior, and accept those modules
within Tracepoints. We use the letter 'X' as a taint flag character for
a module being loaded that doesn't know how to sign its name (proposed
by Steven Rostedt).

Also add the missing 'O' entry to trace event show_module_flags() list
for the sake of completeness.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
NAKed-by: Ingo Molnar <mingo@redhat.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: David Howells <dhowells@redhat.com>
CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# a7330c99 08-Feb-2014 Andi Kleen <ak@linux.intel.com>

asmlinkage Make __stack_chk_failed and memcmp visible

In LTO symbols implicitely referenced by the compiler need
to be visible. Earlier these symbols were visible implicitely
from being exported, but we disabled implicit visibility fo
EXPORTs when modules are disabled to improve code size. So
now these symbols have to be marked visible explicitely.

Do this for __stack_chk_fail (with stack protector)
and memcmp.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1391845930-28580-10-git-send-email-ak@linux.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>


# 5800dc3c 25-Nov-2013 Jason Baron <jbaron@akamai.com>

panic: Make panic_timeout configurable

The panic_timeout value can be set via the command line option
'panic=x', or via /proc/sys/kernel/panic, however that is not
sufficient when the panic occurs before we are able to set up
these values. Thus, add a CONFIG_PANIC_TIMEOUT so that we can
set the desired value from the .config.

The default panic_timeout value continues to be 0 - wait
forever. Also adds set_arch_panic_timeout(new_timeout,
arch_default_timeout), which is intended to be used by arches in
arch_setup(). The idea being that the new_timeout is only set if
the user hasn't changed from the arch_default_timeout.

Signed-off-by: Jason Baron <jbaron@akamai.com>
Cc: benh@kernel.crashing.org
Cc: paulus@samba.org
Cc: ralf@linux-mips.org
Cc: mpe@ellerman.id.au
Cc: felipe.contreras@gmail.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1a1674daec27c534df409697025ac568ebcee91e.1385418410.git.jbaron@akamai.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 01284764 12-Nov-2013 Chen Gang <gang.chen@asianux.com>

kernel/panic.c: reduce 1 byte usage for print tainted buffer

sizeof("Tainted: ") already counts '\0', and after first sprintf(), 's'
will start from the current string end (its' value is '\0').

So need not add additional 1 byte for maximized usage of 'buf' in
print_tainted().

Signed-off-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 6723734c 11-Sep-2013 Kees Cook <keescook@chromium.org>

panic: call panic handlers before kmsg_dump

Since the panic handlers may produce additional information (via printk)
for the kernel log, it should be reported as part of the panic output
saved by kmsg_dump(). Without this re-ordering, nothing that adds
information to a panic will show up in pstore's view when kmsg_dump runs,
and is therefore not visible to crash reporting tools that examine pstore
output.

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Anton Vorontsov <anton@enomsg.org>
Cc: Colin Cross <ccross@android.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Vikram Mulukutla <markivx@codeaurora.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# dcb6b452 08-Jul-2013 Alex Thorlton <athorlton@sgi.com>

panic: add cpu/pid to warn_slowpath_common in WARNING printk()s

Add the cpu/pid that called WARN() so that the stack traces can be
matched up with the WARNING messages.

[akpm@linux-foundation.org: remove stray quote]
Signed-off-by: Alex Thorlton <athorlton@sgi.com>
Reviewed-by: Robin Holt <holt@sgi.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Vikram Mulukutla <markivx@codeaurora.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# de7edd31 14-Jun-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

tracing: Disable tracing on warning

Add a traceoff_on_warning option in both the kernel command line as well
as a sysctl option. When set, any WARN*() function that is hit will cause
the tracing_on variable to be cleared, which disables writing to the
ring buffer.

This is useful especially when tracing a bug with function tracing. When
a warning is hit, the print caused by the warning can flood the trace with
the functions that producing the output for the warning. This can make the
resulting trace useless by either hiding where the bug happened, or worse,
by overflowing the buffer and losing the trace of the bug totally.

Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 98e5e1bf 30-Apr-2013 Tejun Heo <tj@kernel.org>

dump_stack: implement arch-specific hardware description in task dumps

x86 and ia64 can acquire extra hardware identification information
from DMI and print it along with task dumps; however, the usage isn't
consistent.

* x86 show_regs() collects vendor, product and board strings and print
them out with PID, comm and utsname. Some of the information is
printed again later in the same dump.

* warn_slowpath_common() explicitly accesses the DMI board and prints
it out with "Hardware name:" label. This applies to both x86 and
ia64 but is irrelevant on all other archs.

* ia64 doesn't show DMI information on other non-WARN dumps.

This patch introduces arch-specific hardware description used by
dump_stack(). It can be set by calling dump_stack_set_arch_desc()
during boot and, if exists, printed out in a separate line with
"Hardware name:" label.

dmi_set_dump_stack_arch_desc() is added which sets arch-specific
description from DMI data. It uses dmi_ids_string[] which is set from
dmi_present() used for DMI debug message. It is superset of the
information x86 show_regs() is using. The function is called from x86
and ia64 boot code right after dmi_scan_machine().

This makes the explicit DMI handling in warn_slowpath_common()
unnecessary. Removed.

show_regs() isn't yet converted to use generic debug information
printing and this patch doesn't remove the duplicate DMI handling in
x86 show_regs(). The next patch will unify show_regs() handling and
remove the duplication.

An example WARN dump follows.

WARNING: at kernel/workqueue.c:4841 init_workqueues+0x35/0x505()
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.9.0-rc1-work+ #3
Hardware name: empty empty/S3992, BIOS 080011 10/26/2007
0000000000000009 ffff88007c861e08 ffffffff81c614dc ffff88007c861e48
ffffffff8108f500 ffffffff82228240 0000000000000040 ffffffff8234a08e
0000000000000000 0000000000000000 0000000000000000 ffff88007c861e58
Call Trace:
[<ffffffff81c614dc>] dump_stack+0x19/0x1b
[<ffffffff8108f500>] warn_slowpath_common+0x70/0xa0
[<ffffffff8108f54a>] warn_slowpath_null+0x1a/0x20
[<ffffffff8234a0c3>] init_workqueues+0x35/0x505
...

v2: Use the same string as the debug message from dmi_present() which
also contains BIOS information. Move hardware name into its own
line as warn_slowpath_common() did. This change was suggested by
Bjorn Helgaas.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 373d4d09 20-Jan-2013 Rusty Russell <rusty@rustcorp.com.au>

taint: add explicit flag to show whether lock dep is still OK.

Fix up all callers as they were before, with make one change: an
unsigned module taints the kernel, but doesn't turn off lockdep.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# 190320c3 30-Jul-2012 Vikram Mulukutla <markivx@codeaurora.org>

panic: fix a possible deadlock in panic()

panic_lock is meant to ensure that panic processing takes place only on
one cpu; if any of the other cpus encounter a panic, they will spin
waiting to be shut down.

However, this causes a regression in this scenario:

1. Cpu 0 encounters a panic and acquires the panic_lock
and proceeds with the panic processing.
2. There is an interrupt on cpu 0 that also encounters
an error condition and invokes panic.
3. This second invocation fails to acquire the panic_lock
and enters the infinite while loop in panic_smp_self_stop.

Thus all panic processing is stopped, and the cpu is stuck for eternity
in the while(1) inside panic_smp_self_stop.

To address this, disable local interrupts with local_irq_disable before
acquiring the panic_lock. This will prevent interrupt handlers from
executing during the panic processing, thus avoiding this particular
problem.

Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 62be73ea 15-May-2012 Seiji Aguchi <seiji.aguchi@hds.com>

kdump: Execute kmsg_dump(KMSG_DUMP_PANIC) after smp_send_stop()

This patch moves kmsg_dump(KMSG_DUMP_PANIC) below smp_send_stop(),
to serialize the crash-logging process via smp_send_stop() and to
thus retrieve a more stable crash image of all CPUs stopped.

Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
Acked-by: Don Zickus <dzickus@redhat.com>
Cc: dle-develop@lists.sourceforge.net <dle-develop@lists.sourceforge.net>
Cc: Satoru Moriya <satoru.moriya@hds.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: a.p.zijlstra@chello.nl <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/5C4C569E8A4B9B42A84A977CF070A35B2E4D7A5CE2@USINDEVS01.corp.hds.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 2a01bb38 11-Apr-2012 Kyle McMartin <kmcmarti@redhat.com>

panic: Make panic_on_oops configurable

Several distros set this by default by patching panic_on_oops.
It seems to fit with the BOOTPARAM_{HARD,SOFT}_PANIC options
though, so let's add a Kconfig entry and reduce some more
upstream delta.

Signed-off-by: Kyle McMartin <kyle@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120411121529.GH26688@redacted.bos.redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 026ee1f6 12-Apr-2012 Jason Wessel <jason.wessel@windriver.com>

panic: fix stack dump print on direct call to panic()

Commit 6e6f0a1f0fa6 ("panic: don't print redundant backtraces on oops")
causes a regression where no stack trace will be printed at all for the
case where kernel code calls panic() directly while not processing an
oops, and of course there are 100's of instances of this type of call.

The original commit executed the check (!oops_in_progress), but this will
always be false because just before the dump_stack() there is a call to
bust_spinlocks(1), which does the following:

void __attribute__((weak)) bust_spinlocks(int yes)
{
if (yes) {
++oops_in_progress;

The proper way to resolve the problem that original commit tried to
solve is to avoid printing a stack dump from panic() when the either of
the following conditions is true:

1) TAINT_DIE has been set (this is done by oops_end())
This indicates and oops has already been printed.
2) oops_in_progress > 1
This guards against the rare case where panic() is invoked
a second time, or in between oops_begin() and oops_end()

Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: <stable@vger.kernel.org> [3.3+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 6e6f0a1f 12-Jan-2012 Andi Kleen <ak@linux.intel.com>

panic: don't print redundant backtraces on oops

When an oops causes a panic and panic prints another backtrace it's pretty
common to have the original oops data be scrolled away on a 80x50 screen.

The second backtrace is quite redundant and not needed anyways.

So don't print the panic backtrace when oops_in_progress is true.

[akpm@linux-foundation.org: add comment]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 93e13a36 12-Jan-2012 Michael Holzheu <holzheu@linux.vnet.ibm.com>

kdump: fix crash_kexec()/smp_send_stop() race in panic()

When two CPUs call panic at the same time there is a possible race
condition that can stop kdump. The first CPU calls crash_kexec() and the
second CPU calls smp_send_stop() in panic() before crash_kexec() finished
on the first CPU. So the second CPU stops the first CPU and therefore
kdump fails:

1st CPU:
panic()->crash_kexec()->mutex_trylock(&kexec_mutex)-> do kdump

2nd CPU:
panic()->crash_kexec()->kexec_mutex already held by 1st CPU
->smp_send_stop()-> stop 1st CPU (stop kdump)

This patch fixes the problem by introducing a spinlock in panic that
allows only one CPU to process crash_kexec() and the subsequent panic
code.

All other CPUs call the weak function panic_smp_self_stop() that stops the
CPU itself. This function can be overloaded by architecture code. For
example "tile" can use their lower-power "nap" instruction for that.

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 9402c95f 12-Jan-2012 Joe Perches <joe@perches.com>

treewide: remove useless NORET_TYPE macro and uses

It's a very old and now unused prototype marking so just delete it.

Neaten panic pointer argument style to keep checkpatch quiet.

Signed-off-by: Joe Perches <joe@perches.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 9ec84ace 07-Dec-2011 Ben Hutchings <ben@decadent.org.uk>

lockdep, bug: Exclude TAINT_OOT_MODULE from disabling lock debugging

We do want to allow lock debugging for GPL-compatible modules
that are not (yet) built in-tree. This was disabled as a
side-effect of commit 2449b8ba0745327c5fa49a8d9acffe03b2eded69
('module,bug: Add TAINT_OOT_MODULE flag for modules not built
in-tree'). Lock debug warnings now include taint flags, so
kernel developers should still be able to deflect warnings
caused by out-of-tree modules.

The TAINT_PROPRIETARY_MODULE flag for non-GPL-compatible modules
will still disable lock debugging.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Nick Bowler <nbowler@elliptictech.com>
Cc: Greg KH <greg@kroah.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Debian kernel maintainers <debian-kernel@lists.debian.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alan Cox <alan@linux.intel.com>
Link: http://lkml.kernel.org/r/1323268258.18450.11.camel@deadeye
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# df754e6a 14-Nov-2011 Peter Zijlstra <a.p.zijlstra@chello.nl>

lockdep, bug: Exclude TAINT_FIRMWARE_WORKAROUND from disabling lockdep

It's unlikely that TAINT_FIRMWARE_WORKAROUND causes false
lockdep messages, so do not disable lockdep in that case.
We still want to keep lockdep disabled in the
TAINT_OOT_MODULE case:

- bin-only modules can cause various instabilities in
their and in unrelated kernel code

- they are impossible to debug for kernel developers

- they also typically do not have the copyright license
permission to link to the GPL-ed lockdep code.

Suggested-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-xopopjjens57r0i13qnyh2yo@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 2449b8ba 24-Oct-2011 Ben Hutchings <ben@decadent.org.uk>

module,bug: Add TAINT_OOT_MODULE flag for modules not built in-tree

Use of the GPL or a compatible licence doesn't necessarily make the code
any good. We already consider staging modules to be suspect, and this
should also be true for out-of-tree modules which may receive very
little review.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Reviewed-by: Dave Jones <davej@redhat.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (patched oops-tracing.txt)


# 4302fbc8 26-Jul-2011 Hugh Dickins <hughd@chromium.org>

panic: panic=-1 for immediate reboot

When a kernel BUG or oops occurs, ChromeOS intends to panic and
immediately reboot, with stacktrace and other messages preserved in RAM
across reboot.

But the longer we delay, the more likely the user is to poweroff and
lose the info.

panic_timeout (seconds before rebooting) is set by panic= boot option or
sysctl or /proc/sys/kernel/panic; but 0 means wait forever, so at
present we have to delay at least 1 second.

Let a negative number mean reboot immediately (with the small cosmetic
benefit of suppressing that newline-less "Rebooting in %d seconds.."
message).

Signed-off-by: Hugh Dickins <hughd@chromium.org>
Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Olaf Hering <olaf@aepfle.de>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Dave Airlie <airlied@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# d404ab0a 22-Mar-2011 Olaf Hering <olaf@aepfle.de>

move x86 specific oops=panic to generic code

The oops=panic cmdline option is not x86 specific, move it to generic code.
Update documentation.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 81e88fdc 11-Jan-2011 Huang Ying <ying.huang@intel.com>

ACPI, APEI, Generic Hardware Error Source POLL/IRQ/NMI notification type support

Generic Hardware Error Source provides a way to report platform
hardware errors (such as that from chipset). It works in so called
"Firmware First" mode, that is, hardware errors are reported to
firmware firstly, then reported to Linux by firmware. This way, some
non-standard hardware error registers or non-standard hardware link
can be checked by firmware to produce more valuable hardware error
information for Linux.

This patch adds POLL/IRQ/NMI notification types support.

Because the memory area used to transfer hardware error information
from BIOS to Linux can be determined only in NMI, IRQ or timer
handler, but general ioremap can not be used in atomic context, so a
special version of atomic ioremap is implemented for that.

Known issue:

- Error information can not be printed for recoverable errors notified
via NMI, because printk is not NMI-safe. Will fix this via delay
printing to IRQ context via irq_work or make printk NMI-safe.

v2:

- adjust printk format per comments.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 863a6049 10-Aug-2010 Anton Blanchard <anton@samba.org>

lib/bug.c: add oops end marker to WARN implementation

We are missing the oops end marker for the exception based WARN implementation
in lib/bug.c. This is useful for logfile analysis tools.

Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# c7ff0d9c 10-Aug-2010 TAMUKI Shoichi <tamuki@linet.gr.jp>

panic: keep blinking in spite of long spin timer mode

To keep panic_timeout accuracy when running under a hypervisor, the
current implementation only spins on long time (1 second) calls to mdelay.
That brings a good effect, but the problem is the keyboard LEDs don't
blink at all on that situation.

This patch changes to call to panic_blink_enter() between every mdelay and
keeps blinking in spite of long spin timer mode.

The time to call to mdelay is now 100ms. Even this change will keep
panic_timeout accuracy enough when running under a hypervisor.

Signed-off-by: TAMUKI Shoichi <tamuki@linet.gr.jp>
Cc: Ben Dooks <ben-linux@fluff.org>
Cc: Russell King <linux@arm.linux.org.uk>
Acked-by: Dmitry Torokhov <dtor@mail.ru>
Cc: Anton Blanchard <anton@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 5b530fc1 26-May-2010 Anton Blanchard <anton@samba.org>

panic: call console_verbose() in panic

Most distros turn the console verbosity down and that means a backtrace
after a panic never makes it to the console. I assume we haven't seen
this because a panic is often preceeded by an oops which will have called
console_verbose. There are however a lot of places we call panic
directly, and they are broken.

Use console_verbose like we do in the oops path to ensure a directly
called panic will print a backtrace.

Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 92946bc7 03-Apr-2010 Ben Hutchings <ben@decadent.org.uk>

panic: Add taint flag TAINT_FIRMWARE_WORKAROUND ('I')

This taint flag will initially be used when warning about invalid ACPI
DMAR tables.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>


# b2be0527 03-Apr-2010 Ben Hutchings <ben@decadent.org.uk>

panic: Allow warnings to set different taint flags

WARN() is used in some places to report firmware or hardware bugs that
are then worked-around. These bugs do not affect the stability of the
kernel and should not set the flag for TAINT_WARN. To allow for this,
add WARN_TAINT() and WARN_TAINT_ONCE() macros that take a taint number
as argument.

Architectures that implement warnings using trap instructions instead
of calls to warn_slowpath_*() now implement __WARN_TAINT(taint)
instead of __WARN().

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: Helge Deller <deller@gmx.de>
Tested-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>


# 8aeee85a 05-Mar-2010 Anton Blanchard <anton@samba.org>

panic: fix panic_timeout accuracy when running on a hypervisor

I've had some complaints about panic_timeout being wildly innacurate on
shared processor PowerPC partitions (a 3 minute panic_timeout taking 30
minutes).

The problem is we loop on mdelay(1) and with a 1ms in 10ms hypervisor
timeslice each of these will take 10ms (ie 10x) longer. I expect other
platforms with shared processor hypervisors will see the same issue.

This patch keeps the old behaviour if we have a panic_blink (only keyboard
LEDs right now) and does 1 second mdelays if we don't.

Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 0f4bd46e 21-Dec-2009 KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

kmsg_dump: Dump on crash_kexec as well

crash_kexec gets called before kmsg_dump(KMSG_DUMP_OOPS) if
panic_on_oops is set, so the kernel log buffer is not stored
for this case.

This patch adds a KMSG_DUMP_KEXEC dump type which gets called
when crash_kexec() is invoked. To avoid getting double dumps,
the old KMSG_DUMP_PANIC is moved below crash_kexec(). The
mtdoops driver is modified to handle KMSG_DUMP_KEXEC in the
same way as a panic.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Simon Kagstrom <simon.kagstrom@netinsight.net>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>


# 456b565c 16-Oct-2009 Simon Kagstrom <simon.kagstrom@netinsight.net>

core: Add kernel message dumper to call on oopses and panics

The core functionality is implemented as per Linus suggestion from

http://lists.infradead.org/pipermail/linux-mtd/2009-October/027620.html

(with the kmsg_dump implementation by Linus). A struct kmsg_dumper has
been added which contains a callback to dump the kernel log buffers on
crashes. The kmsg_dump function gets called from oops_exit() and panic()
and invokes this callbacks with the crash reason.

[dwmw2: Fix log_end handling]
Signed-off-by: Simon Kagstrom <simon.kagstrom@netinsight.net>
Reviewed-by: Anders Grafstrom <anders.grafstrom@netinsight.net>
Reviewed-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>


# d014e889 02-Oct-2009 Aaro Koskinen <aaro.koskinen@nokia.com>

panic: Fix panic message visibility by calling bust_spinlocks(0) before dying

Commit ffd71da4e3f ("panic: decrease oops_in_progress only after
having done the panic") moved bust_spinlocks(0) to the end of the
function, which in practice is never reached.

As a result console_unblank() is not called, and on some systems
the user may not see the panic message.

Move it back up to before the unblanking.

Signed-off-by: Aaro Koskinen <aaro.koskinen@nokia.com>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1254483680-25578-1-git-send-email-aaro.koskinen@nokia.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# fe002a41 28-Jun-2009 Robert P. J. Day <rpjday@crashcourse.ca>

trivial: Correct print_tainted routine name in comment

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>


# bdff7870 24-Jul-2009 Thomas Gleixner <tglx@linutronix.de>

trace: stop tracer in oops_enter()

If trace_printk_on_oops is set we lose interesting trace information
when the tracer is enabled across oops handling and printing. We want
the trace which might give us information _WHY_ we oopsed.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 0f6f49a8 16-May-2009 Linus Torvalds <torvalds@linux-foundation.org>

Fix caller information for warn_slowpath_null

Ian Campbell noticed that since "Eliminate thousands of warnings with
gcc 3.2 build" (commit 57adc4d2dbf968fdbe516359688094eef4d46581) all
WARN_ON()'s currently appear to come from warn_slowpath_null(), eg:

WARNING: at kernel/softirq.c:143 warn_slowpath_null+0x1c/0x20()

because now that warn_slowpath_null() is in the call path, the
__builtin_return_address(0) returns that, rather than the place that
caused the warning.

Fix this by splitting up the warn_slowpath_null/fmt cases differently,
using a common helper function, and getting the return address in the
right place. This also happens to avoid the unnecessary stack usage for
the non-stdargs case, and just generally cleans things up.

Make the function name printout use %pS while at it.

Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 57adc4d2 06-May-2009 Andi Kleen <andi@firstfloor.org>

Eliminate thousands of warnings with gcc 3.2 build

When building with gcc 3.2 I get thousands of warnings such as

include/linux/gfp.h: In function `allocflags_to_migratetype':
include/linux/gfp.h:105: warning: null format string

due to passing a NULL format string to warn_slowpath() in

#define __WARN() warn_slowpath(__FILE__, __LINE__, NULL)

Split this case out into a separate call. This also shrinks the kernel
slightly:

text data bss dec hex filename
4802274 707668 712704 6222646 5ef336 vmlinux
text data bss dec hex filename
4799027 703572 712704 6215303 5ed687 vmlinux

due to removeing one argument from the commonly-called __WARN().

[akpm@linux-foundation.org: reduce scope of `empty']
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# b48ccb09 23-Apr-2009 Ingo Molnar <mingo@elte.hu>

locking: clarify kernel-taint warning message

Andi Kleen reported this message triggering on non-lockdep kernels:

Disabling lockdep due to kernel taint

Clarify the message to say 'lock debugging' - debug_locks_off()
turns off all things lock debugging, not just lockdep.

[ Impact: change kernel warning message text ]

Reported-by: Andi Kleen <andi@firstfloor.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 574bbe78 10-Apr-2009 Frederic Weisbecker <fweisbec@gmail.com>

lockdep: continue lock debugging despite some taints

Impact: broaden lockdep checks

Lockdep is disabled after any kernel taints. This might be convenient
to ignore bad locking issues which sources come from outside the kernel
tree. Nevertheless, it might be a frustrating experience for the
staging developers or those who experience a warning but are focused
on another things that require lockdep.

The v2 of this patch simply don't disable anymore lockdep in case
of TAINT_CRAP and TAINT_WARN events.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: LTP <ltp-list@lists.sourceforge.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Greg KH <gregkh@suse.de>
LKML-Reference: <1239412638-6739-2-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 9eeba613 10-Apr-2009 Frederic Weisbecker <fweisbec@gmail.com>

lockdep: warn about lockdep disabling after kernel taint

Impact: provide useful missing info for developers

Kernel taint can occur in several situations such as warnings,
load of prorietary or staging modules, bad page, etc...

But when such taint happens, a developer might still be working on
the kernel, expecting that lockdep is still enabled. But a taint
disables lockdep without ever warning about it.
Such a kernel behaviour doesn't really help for kernel development.

This patch adds this missing warning.

Since the taint is done most of the time after the main message that
explain the real source issue, it seems safe to warn about it inside
add_taint() so that it appears at last, without hurting the main
information.

v2: Use a generic helper to disable lockdep instead of an
open coded xchg().

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <1239412638-6739-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# c95dbf27 13-Mar-2009 Ingo Molnar <mingo@elte.hu>

panic: clean up kernel/panic.c

Impact: cleanup, no code changed

Clean up kernel/panic.c some more and make it more consistent.

LKML-Reference: <49B91A7E.76E4.0078.0@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# d1dedb52 13-Mar-2009 Ingo Molnar <mingo@elte.hu>

panic, smp: provide smp_send_stop() wrapper on UP too

Impact: cleanup, no code changed

Remove an ugly #ifdef CONFIG_SMP from panic(), by providing
an smp_send_stop() wrapper on UP too.

LKML-Reference: <49B91A7E.76E4.0078.0@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# ffd71da4 13-Mar-2009 Ingo Molnar <mingo@elte.hu>

panic: decrease oops_in_progress only after having done the panic

Impact: eliminate secondary warnings during panic()

We can panic() in a number of difficult, atomic contexts, hence
we use bust_spinlocks(1) in panic() to increase oops_in_progress,
which prevents various debug checks we have in place.

But in practice this protection only covers the first few printk's
done by panic() - it does not cover the later attempt to stop all
other CPUs and kexec(). If a secondary warning triggers in one of
those facilities that can make the panic message scroll off.

So do bust_spinlocks(0) only much later in panic(). (which code
is only reached if panic policy is relaxed that it can return
after a warning message)

Reported-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <49B91A7E.76E4.0078.0@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 5d707e9c 09-Feb-2009 Tejun Heo <tj@kernel.org>

stackprotector: update make rules

Impact: no default -fno-stack-protector if stackp is enabled, cleanup

Stackprotector make rules had the following problems.

* cc support test and warning are scattered across makefile and
kernel/panic.c.

* -fno-stack-protector was always added regardless of configuration.

Update such that cc support test and warning are contained in makefile
and -fno-stack-protector is added iff stackp is turned off. While at
it, prepare for 32bit support.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# d6624f99 06-Jan-2009 Arjan van de Ven <arjan@linux.intel.com>

oops: increment the oops UUID every time we oops

... because we do want repeated same-oops to be seen by automated
tools like kerneloops.org

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# a8005992 01-Dec-2008 Arjan van de Ven <arjan@linux.intel.com>

taint: add missing comment

The description for 'D' was missing in the comment... (causing me a
minute of WTF followed by looking at more of the code)

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# ec5679e5 28-Nov-2008 Ingo Molnar <mingo@elte.hu>

debug warnings: eliminate warn_on_slowpath()

Impact: cleanup, eliminate code

now that warn_on_slowpath() uses warn_slowpath(...,NULL), we can
eliminate warn_on_slowpath() altogether and use warn_slowpath().

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# bd89bb29 28-Nov-2008 Arjan van de Ven <arjan@linux.intel.com>

debug warnings: print the DMI board info name in a WARN/WARN_ON

Impact: extend WARN_ON() output with DMI_PRODUCT_NAME

It's very useful for many low level WARN_ON's to find out which
motherboard has the broken BIOS etc... this patch adds a printk
to the WARN_ON code for this.

On architectures without DMI, gcc should optimize the code out.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 74853dba 28-Nov-2008 Arjan van de Ven <arjan@linux.intel.com>

debug warnings: consolidate warn_slowpath and warn_on_slowpath

Impact: cleanup, code reduction

warn_slowpath is a superset of warn_on_slowpath; just have
warn_on_slowpath call warn_slowpath with a NULL 3rd argument.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# f44dd164 22-Oct-2008 Rusty Russell <rusty@rustcorp.com.au>

Make panic= and panic_on_oops into core_params

This allows them to be examined and set after boot, plus means they
actually give errors if they are misused (eg. panic=yes).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# 6d5cd6ef 15-Oct-2008 Randy Dunlap <randy.dunlap@oracle.com>

taint: fix kernel-doc

Move print_tainted() kernel-doc to avoid the following error:

Error(/var/linsrc/mmotm-2008-1002-1617//kernel/panic.c:155): cannot understand prototype: 'struct tnt '

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 25ddbb18 15-Oct-2008 Andi Kleen <andi@firstfloor.org>

Make the taint flags reliable

It's somewhat unlikely that it happens, but right now a race window
between interrupts or machine checks or oopses could corrupt the tainted
bitmap because it is modified in a non atomic fashion.

Convert the taint variable to an unsigned long and use only atomic bit
operations on it.

Unfortunately this means the intvec sysctl functions cannot be used on it
anymore.

It turned out the taint sysctl handler could actually be simplified a bit
(since it only increases capabilities) so this patch actually removes
code.

[akpm@linux-foundation.org: remove unneeded include]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 061b1bd3 24-Sep-2008 Greg Kroah-Hartman <gregkh@suse.de>

Staging: add TAINT_CRAP for all drivers/staging code

We need to add a flag for all code that is in the drivers/staging/
directory to prevent all other kernel developers from worrying about
issues here, and to notify users that the drivers might not be as good
as they are normally used to.

Based on code from Andreas Gruenbacher and Jeff Mahoney to provide a
TAINT flag for the support level of a kernel module in the Novell
enterprise kernel release.

This is the kernel portion of this feature, the ability for the flag to
be set needs to be done in the build process and will happen in a
follow-up patch.

Cc: Andreas Gruenbacher <agruen@suse.de>
Cc: Jeff Mahoney <jeffm@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# a8f18b90 25-Jul-2008 Arjan van de Ven <arjan@linux.intel.com>

Add a WARN() macro; this is WARN_ON() + printk arguments

Add a WARN() macro that acts like WARN_ON(), with the added feature that it
takes a printk like argument that is printed as part of the warning message.

[akpm@linux-foundation.org: fix printk arguments]
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Greg KH <greg@kroah.com>
Cc: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 4f962d4d 13-Jul-2008 Ingo Molnar <mingo@elte.hu>

stackprotector: remove self-test

turns out gcc generates such stackprotector-failure sequences
in certain circumstances:

movq -8(%rbp), %rax # D.16032,
xorq %gs:40, %rax #,
jne .L17 #,
leave
ret
.L17:
call __stack_chk_fail #
.size __stack_chk_test_func, .-__stack_chk_test_func
.section .init.text,"ax",@progbits
.type panic_setup, @function
panic_setup:
pushq %rbp #

note that there's no jump back to the failing context after the
call to __stack_chk_fail - i.e. it has a ((noreturn)) attribute.

Which is fair enough in the normal case but kills the self-test.
(as we cannot reliably return in the self-test)

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# af9ff786 12-Jul-2008 Arjan van de Ven <arjan@linux.intel.com>

x86: simplify stackprotector self-check

Clean up the code by removing no longer needed code;
make sure the pda is updated and kept in sync

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# aa92db14 11-Jul-2008 Arjan van de Ven <arjan@linux.intel.com>

stackprotector: better self-test

check stackprotector functionality by manipulating the canary briefly
during bootup.

far more robust than trying to overflow the stack. (which is architecture
dependent, etc.)

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# b40a4392 18-Apr-2008 Arjan van de Ven <arjan@linux.intel.com>

stackprotector: turn not having the right gcc into a #warning

If the user selects the stack-protector config option, but does not have
a gcc that has the right bits enabled (for example because it isn't build
with a glibc that supports TLS, as is common for cross-compilers, but also
because it may be too old), then the runtime test fails right now.

This patch adds a warning message for this scenario. This warning accomplishes
two goals
1) the user is informed that the security option he selective isn't available
2) the user is suggested to turn of the CONFIG option that won't work for him,
and would make the runtime test fail anyway.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# b719ac56 14-Apr-2008 Daniel Walker <dwalker@mvista.com>

panic.c: fix whitespace additions

trivial: remove white space addition in stack protector

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 54371a43 15-Feb-2008 Arjan van de Ven <arjan@linux.intel.com>

x86: add CONFIG_CC_STACKPROTECTOR self-test

This patch adds a simple self-test capability to the stackprotector
feature. The test deliberately overflows a stack buffer and then
checks if the canary trap function gets called.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 5cb27301 14-Feb-2008 Ingo Molnar <mingo@elte.hu>

panic: print out stacktrace if DEBUG_BUGVERBOSE

if CONFIG_DEBUG_BUGVERBOSE is set then the user most definitely wanted
to see as much information about kernel crashes as possible - so give
them at least a stack dump.

this is particularly useful for stackprotector related panics, where
the stacktrace can give us the exact location of the (attempted)
attack.

Pointed out by pageexec@freemail.hu in the stackprotector breakage
threads.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 517a92c4 14-Feb-2008 Ingo Molnar <mingo@elte.hu>

panic: print more informative messages on stackprotect failure

pointed out by pageexec@freemail.hu:

we just simply panic() when there's a stackprotector attack - giving
the attacked person no information about what kernel code the attack went
against.

print out the attacked function.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 95b570c9 29-Apr-2008 Nur Hussein <nurhussein@gmail.com>

Taint kernel after WARN_ON(condition)

The kernel is sent to tainted within the warn_on_slowpath() function, and
whenever a warning occurs the new taint flag 'W' is set. This is useful to
know if a warning occurred before a BUG by preserving the warning as a flag
in the taint state.

This does not work on architectures where WARN_ON has its own definition.
These archs are:
1. s390
2. superh
3. avr32
4. parisc

The maintainers of these architectures have been added in the Cc: list
in this email to alert them to the situation.

The documentation in oops-tracing.txt has been updated to include the
new flag.

Signed-off-by: Nur Hussein <nurhussein@gmail.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 6ed31e92 04-Feb-2008 Éric Piel <Eric.Piel@tremplin-utc.net>

ACPI: Taint kernel on ACPI table override (format corrected)

When an ACPI table is overridden (for now this can happen only for DSDT)
display a big warning and taint the kernel with flag A.

Signed-off-by: Eric Piel <eric.piel@tremplin-utc.net>
Signed-off-by: Len Brown <len.brown@intel.com>


# 71c33911 30-Jan-2008 Arjan van de Ven <arjan@linux.intel.com>

debug: add the end-of-trace marker and the module list to

Unlike oopses, WARN_ON() currently does't print the loaded modules list.
This makes it harder to take action on certain bug reports. For example,
recently there were a set of WARN_ON()s reported in the mac80211 stack,
which were just signalling a driver bug. It takes then anther round trip
to the bug reporter (if he responds at all) to find out which driver
is at fault.

Another issue is that, unlike oopses, WARN_ON() doesn't currently printk
the helpful "cut here" line, nor the "end of trace" marker.
Now that WARN_ON() is out of line, the size increase due to this is
minimal and it's worth adding.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 79b4cc5e 30-Jan-2008 Arjan van de Ven <arjan@infradead.org>

debug: move WARN_ON() out of line

A quick grep shows that there are currently 1145 instances of WARN_ON
in the kernel. Currently, WARN_ON is pretty much entirely inlined,
which makes it hard to enhance it without growing the size of the kernel
(and getting Andrew unhappy).

This patch build on top of Olof's patch that introduces __WARN,
and places the slowpath out of line. It also uses Ingo's suggestion
to not use __FUNCTION__ but to use kallsyms to do the lookup;
this saves a ton of extra space since gcc doesn't need to store the function
string twice now:

3936367 833603 624736 5394706 525112 vmlinux.before
3917508 833603 624736 5375847 520767 vmlinux-slowpath

15Kb savings...

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Olof Johansson <olof@lixom.net>
Acked-by: Matt Meckall <mpm@selenic.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 2c3b20e9 20-Dec-2007 Arjan van de Ven <arjan@linux.intel.com>

debug: add end-of-oops marker

Right now it's nearly impossible for parsers that collect kernel crashes
from logs or emails (such as www.kerneloops.org) to detect the
end-of-oops condition. In addition, it's not currently possible to
detect whether or not 2 oopses that look alike are actually the same
oops reported twice, or are truly two unique oopses.

This patch adds an end-of-oops marker, and makes the end marker include
a very simple 64-bit random ID to be able to detect duplicate reports.

Normally, this ID is calculated as a late_initcall() (in the hope that
at that time there is enough entropy to get a unique enough ID); however
for early oopses the oops_exit() function needs to generate the ID on
the fly.

We do this all at the _end_ of an oops printout, so this does not impact
our ability to get the most important portions of a crash out to the
console first.

[ Sidenote: the already existing oopses-since-bootup counter we print
during crashes serves as the differentiator between multiple oopses
that trigger during the same bootup. ]

Tested on 32-bit and 64-bit x86. Artificially injected very early
crashes as well, as expected they result in this constant ID after
multiple bootups:

---[ end trace ca143223eefdc828 ]---
---[ end trace ca143223eefdc828 ]---

because the random pools are still all zero. But it all still works
fine and causes no additional problems (which is the main goal of
instrumentation code).

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 9aa5e993 19-Oct-2007 Daniel Roesen <dr@cluenet.de>

trivial comment wording/typo fix regarding taint flags

Signed-off-by: Daniel Roesen <dr@cluenet.de>
Signed-off-by: Adrian Bunk <bunk@kernel.org>


# c277e63f 18-Oct-2007 Daniel Walker <dwalker@mvista.com>

whitespace fixes: panic handling

Signed-off-by: Daniel Walker <dwalker@mvista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# bcdcd8e7 17-Jul-2007 Pavel Emelianov <xemul@openvz.org>

Report that kernel is tainted if there was an OOPS

If the kernel OOPSed or BUGed then it probably should be considered as
tainted. Thus, all subsequent OOPSes and SysRq dumps will report the
tainted kernel. This saves a lot of time explaining oddities in the
calltraces.

Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Added parisc patch from Matthew Wilson -Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 34f5a398 10-Feb-2007 Theodore Ts'o <tytso@mit.edu>

[PATCH] Add TAINT_USER and ability to set taint flags from userspace

Allow taint flags to be set from userspace by writing to
/proc/sys/kernel/tainted, and add a new taint flag, TAINT_USER, to be used
when userspace has potentially done something dangerous that might
compromise the kernel. This will allow support personnel to ask further
questions about what may have caused the user taint flag to have been set.

For example, they might examine the logs of the realtime JVM to see if the
Java program has used the really silly, stupid, dangerous, and
completely-non-portable direct access to physical memory feature which MUST
be implemented according to the Real-Time Specification for Java (RTSJ).
Sigh. What were those silly people at Sun thinking?

[akpm@osdl.org: build fix]
[bunk@stusta.de: cleanup]
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 29cbc78b 29-Sep-2006 Andi Kleen <ak@linux.intel.com>

[PATCH] x86: Clean up x86 NMI sysctls

Use prototypes in headers
Don't define panic_on_unrecovered_nmi for all architectures

Cc: dzickus@redhat.com

Signed-off-by: Andi Kleen <ak@suse.de>


# 3162f751 26-Sep-2006 Arjan van de Ven <arjan@linux.intel.com>

[PATCH] Add the __stack_chk_fail() function

GCC emits a call to a __stack_chk_fail() function when the stack canary is
not matching the expected value.

Since this is a bad security issue; lets panic the kernel rather than limping
along; the kernel really can't be trusted anymore when this happens.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andi Kleen <ak@suse.de>
CC: Andi Kleen <ak@suse.de>


# 8da5adda 26-Sep-2006 Don Zickus <dzickus@redhat.com>

[PATCH] x86: Allow users to force a panic on NMI

To quote Alan Cox:

The default Linux behaviour on an NMI of either memory or unknown is to
continue operation. For many environments such as scientific computing
it is preferable that the box is taken out and the error dealt with than
an uncorrected parity/ECC error get propogated.

A small number of systems do generate NMI's for bizarre random reasons
such as power management so the default is unchanged. In other respects
the new proc/sys entry works like the existing panic controls already in
that directory.

This is separate to the edac support - EDAC allows supported chipsets to
handle ECC errors well, this change allows unsupported cases to at least
panic rather than cause problems further down the line.

Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>


# 068c4579 06-Sep-2006 Ingo Molnar <mingo@elte.hu>

[PATCH] lockdep: do not touch console state when tainting the kernel

Remove an unintended console_verbose() side-effect from add_taint().

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 657b3010 14-Aug-2006 Andrew Morton <akpm@osdl.org>

[PATCH] panic.c build fix

kernel/panic.c: In function 'add_taint':
kernel/panic.c:176: warning: implicit declaration of function 'debug_locks_off'

Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# 2c16e9c8 10-Jul-2006 Arjan van de Ven <arjan@linux.intel.com>

[PATCH] lockdep: disable lock debugging when kernel state becomes untrusted

Disable lockdep debugging in two situations where the integrity of the
kernel no longer is guaranteed: when oopsing and when hitting a
tainting-condition. The goal is to not get weird lockdep traces that don't
make sense or are otherwise undebuggable, to not waste time.

Lockdep assumes that the previous state it knows about is valid to operate,
which is why lockdep turns itself off after the first violation it reports,
after that point it can no longer make that assumption.

A kernel oops means that the integrity of the kernel compromised; in
addition anything lockdep would report is of lesser importance than the
oops.

All the tainting conditions are of similar integrity-violating nature and
also make debugging/diagnosing more difficult.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 6ab3d562 30-Jun-2006 Jörn Engel <joern@wohnheim.fh-wedel.de>

Remove obsolete #include <linux/config.h>

Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>


# aa727107 10-Apr-2006 Adrian Bunk <bunk@stusta.de>

[PATCH] the scheduled unexport of panic_timeout

Implement the scheduled unexport of panic_timeout.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# e041c683 27-Mar-2006 Alan Stern <stern@rowland.harvard.edu>

[PATCH] Notifier chain update: API changes

The kernel's implementation of notifier chains is unsafe. There is no
protection against entries being added to or removed from a chain while the
chain is in use. The issues were discussed in this thread:

http://marc.theaimsgroup.com/?l=linux-kernel&m=113018709002036&w=2

We noticed that notifier chains in the kernel fall into two basic usage
classes:

"Blocking" chains are always called from a process context
and the callout routines are allowed to sleep;

"Atomic" chains can be called from an atomic context and
the callout routines are not allowed to sleep.

We decided to codify this distinction and make it part of the API. Therefore
this set of patches introduces three new, parallel APIs: one for blocking
notifiers, one for atomic notifiers, and one for "raw" notifiers (which is
really just the old API under a new name). New kinds of data structures are
used for the heads of the chains, and new routines are defined for
registration, unregistration, and calling a chain. The three APIs are
explained in include/linux/notifier.h and their implementation is in
kernel/sys.c.

With atomic and blocking chains, the implementation guarantees that the chain
links will not be corrupted and that chain callers will not get messed up by
entries being added or removed. For raw chains the implementation provides no
guarantees at all; users of this API must provide their own protections. (The
idea was that situations may come up where the assumptions of the atomic and
blocking APIs are not appropriate, so it should be possible for users to
handle these things in their own way.)

There are some limitations, which should not be too hard to live with. For
atomic/blocking chains, registration and unregistration must always be done in
a process context since the chain is protected by a mutex/rwsem. Also, a
callout routine for a non-raw chain must not try to register or unregister
entries on its own chain. (This did happen in a couple of places and the code
had to be changed to avoid it.)

Since atomic chains may be called from within an NMI handler, they cannot use
spinlocks for synchronization. Instead we use RCU. The overhead falls almost
entirely in the unregister routine, which is okay since unregistration is much
less frequent that calling a chain.

Here is the list of chains that we adjusted and their classifications. None
of them use the raw API, so for the moment it is only a placeholder.

ATOMIC CHAINS
-------------
arch/i386/kernel/traps.c: i386die_chain
arch/ia64/kernel/traps.c: ia64die_chain
arch/powerpc/kernel/traps.c: powerpc_die_chain
arch/sparc64/kernel/traps.c: sparc64die_chain
arch/x86_64/kernel/traps.c: die_chain
drivers/char/ipmi/ipmi_si_intf.c: xaction_notifier_list
kernel/panic.c: panic_notifier_list
kernel/profile.c: task_free_notifier
net/bluetooth/hci_core.c: hci_notifier
net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_chain
net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_expect_chain
net/ipv6/addrconf.c: inet6addr_chain
net/netfilter/nf_conntrack_core.c: nf_conntrack_chain
net/netfilter/nf_conntrack_core.c: nf_conntrack_expect_chain
net/netlink/af_netlink.c: netlink_chain

BLOCKING CHAINS
---------------
arch/powerpc/platforms/pseries/reconfig.c: pSeries_reconfig_chain
arch/s390/kernel/process.c: idle_chain
arch/x86_64/kernel/process.c idle_notifier
drivers/base/memory.c: memory_chain
drivers/cpufreq/cpufreq.c cpufreq_policy_notifier_list
drivers/cpufreq/cpufreq.c cpufreq_transition_notifier_list
drivers/macintosh/adb.c: adb_client_list
drivers/macintosh/via-pmu.c sleep_notifier_list
drivers/macintosh/via-pmu68k.c sleep_notifier_list
drivers/macintosh/windfarm_core.c wf_client_list
drivers/usb/core/notify.c usb_notifier_list
drivers/video/fbmem.c fb_notifier_list
kernel/cpu.c cpu_chain
kernel/module.c module_notify_list
kernel/profile.c munmap_notifier
kernel/profile.c task_exit_notifier
kernel/sys.c reboot_notifier_list
net/core/dev.c netdev_chain
net/decnet/dn_dev.c: dnaddr_chain
net/ipv4/devinet.c: inetaddr_chain

It's possible that some of these classifications are wrong. If they are,
please let us know or submit a patch to fix them. Note that any chain that
gets called very frequently should be atomic, because the rwsem read-locking
used for blocking chains is very likely to incur cache misses on SMP systems.
(However, if the chain's callout routines may sleep then the chain cannot be
atomic.)

The patch set was written by Alan Stern and Chandra Seetharaman, incorporating
material written by Keith Owens and suggestions from Paul McKenney and Andrew
Morton.

[jes@sgi.com: restructure the notifier chain initialization macros]
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# dd287796 23-Mar-2006 Andrew Morton <akpm@osdl.org>

[PATCH] pause_on_oops command line option

Attempt to fix the problem wherein people's oops reports scroll off the screen
due to repeated oopsing or to oopses on other CPUs.

If this happens the user can reboot with the `pause_on_oops=<seconds>' option.
It will allow the first oopsing CPU to print an oops record just a single
time. Second oopsing attempts, or oopses on other CPUs will cause those CPUs
to enter a tight loop until the specified number of seconds have elapsed.

The patch implements the infrastructure generically in the expectation that
architectures other than x86 will find it useful.

Cc: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# c22db941 10-Feb-2006 Jan Beulich <jbeulich@novell.com>

[PATCH] prevent recursive panic from softlockup watchdog

When panic_timeout is zero, suppress triggering a nested panic due to soft
lockup detection.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 347a8dc3 06-Jan-2006 Martin Schwidefsky <schwidefsky@de.ibm.com>

[PATCH] s390: cleanup Kconfig

Sanitize some s390 Kconfig options. We have ARCH_S390, ARCH_S390X,
ARCH_S390_31, 64BIT, S390_SUPPORT and COMPAT. Replace these 6 options by
S390, 64BIT and COMPAT.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 2f048ea8 26-Jul-2005 Eric W. Biederman <ebiederm@xmission.com>

[PATCH] Call emergency_reboot from panic

We know the system is in trouble so there is no question if this
is an emergecy :)

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 6e274d14 25-Jun-2005 Alexander Nyberg <alexn@telia.com>

[PATCH] kdump: Use real pt_regs from exception

Makes kexec_crashdump() take a pt_regs * as an argument. This allows to
get exact register state at the point of the crash. If we come from direct
panic assertion NULL will be passed and the current registers saved before
crashdump.

This hooks into two places:
die(): check the conditions under which we will panic when calling
do_exit and go there directly with the pt_regs that caused the fatal
fault.

die_nmi(): If we receive an NMI lockup while in the kernel use the
pt_regs and go directly to crash_kexec(). We're probably nested up badly
at this point so this might be the only chance to escape with proper
information.

Signed-off-by: Alexander Nyberg <alexn@telia.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# dc009d92 25-Jun-2005 Eric W. Biederman <ebiederm@xmission.com>

[PATCH] kexec: add kexec syscalls

This patch introduces the architecture independent implementation the
sys_kexec_load, the compat_sys_kexec_load system calls.

Kexec on panic support has been integrated into the core patch and is
relatively clean.

In addition the hopefully architecture independent option
crashkernel=size@location has been docuemented. It's purpose is to reserve
space for the panic kernel to live, and where no DMA transfer will ever be
setup to access.

Signed-off-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Alexander Nyberg <alexn@telia.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# a271c241 24-Apr-2005 Tom 'spot' Callaway <tcallawa@redhat.com>

[SPARC]: Stop-A printk cleanup

This patch is incredibly trivial, but it does resolve some of the user
confusion as to what "L1-A" actually is.

Clarify printk message to refer to Stop-A (L1-A).

Gentoo has a virtually identical patch in their kernel sources.

Signed-off-by: Tom 'spot' Callaway <tcallawa@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1da177e4 16-Apr-2005 Linus Torvalds <torvalds@ppc970.osdl.org>

Linux-2.6.12-rc2

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!