History log of /linux-master/include/linux/kernel.h
Revision Date Author Comments
# 19f0423f 23-Feb-2024 Huang Yiwei <quic_hyiwei@quicinc.com>

tracing: Support to dump instance traces by ftrace_dump_on_oops

Currently ftrace only dumps the global trace buffer on an OOPs. For
debugging a production usecase, instance trace will be helpful to
check specific problems since global trace buffer may be used for
other purposes.

This patch extend the ftrace_dump_on_oops parameter to dump a specific
or multiple trace instances:

- ftrace_dump_on_oops=0: as before -- don't dump
- ftrace_dump_on_oops[=1]: as before -- dump the global trace buffer
on all CPUs
- ftrace_dump_on_oops=2 or =orig_cpu: as before -- dump the global
trace buffer on CPU that triggered the oops
- ftrace_dump_on_oops=<instance_name>: new behavior -- dump the
tracing instance matching <instance_name>
- ftrace_dump_on_oops[=2/orig_cpu],<instance1_name>[=2/orig_cpu],
<instrance2_name>[=2/orig_cpu]: new behavior -- dump the global trace
buffer and multiple instance buffer on all CPUs, or only dump on CPU
that triggered the oops if =2 or =orig_cpu is given

Also, the sysctl node can handle the input accordingly.

Link: https://lore.kernel.org/linux-trace-kernel/20240223083126.1817731-1-quic_hyiwei@quicinc.com

Cc: Ross Zwisler <zwisler@google.com>
Cc: <mhiramat@kernel.org>
Cc: <mark.rutland@arm.com>
Cc: <mcgrof@kernel.org>
Cc: <keescook@chromium.org>
Cc: <j.granados@samsung.com>
Cc: <mathieu.desnoyers@efficios.com>
Cc: <corbet@lwn.net>
Signed-off-by: Huang Yiwei <quic_hyiwei@quicinc.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# 57914905 03-Oct-2023 Andy Shevchenko <andriy.shevchenko@linux.intel.com>

kernel.h: Move lib/cmdline.c prototypes to string.h

The lib/cmdline.c is basically a set of some small string parsers
which are wide used in the kernel. Their prototypes belong to the
string.h rather then kernel.h.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20231003130142.2936503-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Kees Cook <keescook@chromium.org>


# adeb0436 14-Feb-2024 Andy Shevchenko <andriy.shevchenko@linux.intel.com>

kernel.h: Move upper_*_bits() and lower_*_bits() to wordpart.h

The wordpart.h header is collecting APIs related to the handling
parts of the word (usually in byte granularity). The upper_*_bits()
and lower_*_bits() are good candidates to be moved to there.

This helps to clean up header dependency hell with regard to kernel.h
as the latter gathers completely unrelated stuff together and slows
down compilation (especially when it's included into other header).

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20240214172752.3605073-1-andriy.shevchenko@linux.intel.com
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Kees Cook <keescook@chromium.org>


# 66a5c40f 26-Dec-2023 Tanzir Hasan <tanzhasanwork@gmail.com>

kernel.h: removed REPEAT_BYTE from kernel.h

This patch creates wordpart.h and includes it in asm/word-at-a-time.h
for all architectures. WORD_AT_A_TIME_CONSTANTS depends on kernel.h
because of REPEAT_BYTE. Moving this to another header and including it
where necessary allows us to not include the bloated kernel.h. Making
this implicit dependency on REPEAT_BYTE explicit allows for later
improvements in the lib/string.c inclusion list.

Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Suggested-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Tanzir Hasan <tanzirh@google.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Link: https://lore.kernel.org/r/20231226-libstringheader-v6-1-80aa08c7652c@google.com
Signed-off-by: Kees Cook <keescook@chromium.org>


# 3cd39bc3 03-Oct-2023 Alejandro Colomar <alx@kernel.org>

kernel.h: Move ARRAY_SIZE() to a separate header

Touching files so used for the kernel,
forces 'make' to recompile most of the kernel.

Having those definitions in more granular files
helps avoid recompiling so much of the kernel.

Signed-off-by: Alejandro Colomar <alx@kernel.org>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Link: https://lore.kernel.org/r/20230817143352.132583-2-lucas.segarra.fernandez@intel.com
[andy: reduced to cover only string.h for now]
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>


# 39ced19b 14-Aug-2023 Andy Shevchenko <andriy.shevchenko@linux.intel.com>

lib/vsprintf: split out sprintf() and friends

Patch series "lib/vsprintf: Rework header inclusions", v3.

Some patches that reduce the mess with the header inclusions related to
vsprintf.c module. Each patch has its own description, and has no
dependencies to each other, except the collisions over modifications of
the same places. Hence the series.


This patch (of 2):

kernel.h is being used as a dump for all kinds of stuff for a long time.
sprintf() and friends are used in many drivers without need of the full
kernel.h dependency train with it.

Here is the attempt on cleaning it up by splitting out sprintf() and
friends.

Link: https://lkml.kernel.org/r/20230814163344.17429-1-andriy.shevchenko@linux.intel.com
Link: https://lkml.kernel.org/r/20230814163344.17429-2-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Marco Elver <elver@google.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# b229baa3 18-Jul-2023 Andy Shevchenko <andriy.shevchenko@linux.intel.com>

kernel.h: split out COUNT_ARGS() and CONCATENATE() to args.h

Patch series "kernel.h: Split out a couple of macros to args.h", v4.

There are macros in kernel.h that can be used outside of that header.
Split them to args.h and replace open coded variants.


This patch (of 4):

kernel.h is being used as a dump for all kinds of stuff for a long time.
The COUNT_ARGS() and CONCATENATE() macros may be used in some places
without need of the full kernel.h dependency train with it.

Here is the attempt on cleaning it up by splitting out these macros().

While at it, include new header where it's being used.

Link: https://lkml.kernel.org/r/20230718211147.18647-1-andriy.shevchenko@linux.intel.com
Link: https://lkml.kernel.org/r/20230718211147.18647-2-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com> [PCI]
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: Daniel Latypov <dlatypov@google.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Gow <davidgow@google.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lorenzo Pieralisi <lpieralisi@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# 890a3ee3 23-Mar-2023 Andy Shevchenko <andriy.shevchenko@linux.intel.com>

kernel.h: split the hexadecimal related helpers to hex.h

For the sake of cleaning up the kernel.h split the hexadecimal related
helpers to own header called 'hex.h'.

Link: https://lkml.kernel.org/r/20230323155029.40000-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# 2455f0e1 15-Feb-2023 Ross Zwisler <zwisler@chromium.org>

tracing: Always use canonical ftrace path

The canonical location for the tracefs filesystem is at /sys/kernel/tracing.

But, from Documentation/trace/ftrace.rst:

Before 4.1, all ftrace tracing control files were within the debugfs
file system, which is typically located at /sys/kernel/debug/tracing.
For backward compatibility, when mounting the debugfs file system,
the tracefs file system will be automatically mounted at:

/sys/kernel/debug/tracing

Many comments and Kconfig help messages in the tracing code still refer
to this older debugfs path, so let's update them to avoid confusion.

Link: https://lore.kernel.org/linux-trace-kernel/20230215223350.2658616-2-zwisler@google.com

Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com>
Signed-off-by: Ross Zwisler <zwisler@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>


# e5be1576 25-Apr-2022 Mikulas Patocka <mpatocka@redhat.com>

hex2bin: make the function hex_to_bin constant-time

The function hex2bin is used to load cryptographic keys into device
mapper targets dm-crypt and dm-integrity. It should take constant time
independent on the processed data, so that concurrently running
unprivileged code can't infer any information about the keys via
microarchitectural convert channels.

This patch changes the function hex_to_bin so that it contains no
branches and no memory accesses.

Note that this shouldn't cause performance degradation because the size
of the new function is the same as the size of the old function (on
x86-64) - and the new function causes no branch misprediction penalties.

I compile-tested this function with gcc on aarch64 alpha arm hppa hppa64
i386 ia64 m68k mips32 mips64 powerpc powerpc64 riscv sh4 s390x sparc32
sparc64 x86_64 and with clang on aarch64 arm hexagon i386 mips32 mips64
powerpc powerpc64 s390x sparc32 sparc64 x86_64 to verify that there are
no branches in the generated code.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 868e6139 27-Mar-2022 Keith Busch <kbusch@kernel.org>

block: move lower_48_bits() to block

The function is not generally applicable enough to be included in the core
kernel header. Move it to block since it's the only subsystem using it.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20220327173316.315-1-kbusch@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 7ee8809d 03-Mar-2022 Keith Busch <kbusch@kernel.org>

linux/kernel: introduce lower_48_bits function

Recent data integrity field enhancements allow reference tags to be up
to 48 bits. Introduce an inline helper function since this will be a
repeated operation.

Suggested-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20220303201312.3255347-5-kbusch@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 99cf983c 14-Feb-2022 Mark Rutland <mark.rutland@arm.com>

sched/preempt: Add PREEMPT_DYNAMIC using static keys

Where an architecture selects HAVE_STATIC_CALL but not
HAVE_STATIC_CALL_INLINE, each static call has an out-of-line trampoline
which will either branch to a callee or return to the caller.

On such architectures, a number of constraints can conspire to make
those trampolines more complicated and potentially less useful than we'd
like. For example:

* Hardware and software control flow integrity schemes can require the
addition of "landing pad" instructions (e.g. `BTI` for arm64), which
will also be present at the "real" callee.

* Limited branch ranges can require that trampolines generate or load an
address into a register and perform an indirect branch (or at least
have a slow path that does so). This loses some of the benefits of
having a direct branch.

* Interaction with SW CFI schemes can be complicated and fragile, e.g.
requiring that we can recognise idiomatic codegen and remove
indirections understand, at least until clang proves more helpful
mechanisms for dealing with this.

For PREEMPT_DYNAMIC, we don't need the full power of static calls, as we
really only need to enable/disable specific preemption functions. We can
achieve the same effect without a number of the pain points above by
using static keys to fold early returns into the preemption functions
themselves rather than in an out-of-line trampoline, effectively
inlining the trampoline into the start of the function.

For arm64, this results in good code generation. For example, the
dynamic_cond_resched() wrapper looks as follows when enabled. When
disabled, the first `B` is replaced with a `NOP`, resulting in an early
return.

| <dynamic_cond_resched>:
| bti c
| b <dynamic_cond_resched+0x10> // or `nop`
| mov w0, #0x0
| ret
| mrs x0, sp_el0
| ldr x0, [x0, #8]
| cbnz x0, <dynamic_cond_resched+0x8>
| paciasp
| stp x29, x30, [sp, #-16]!
| mov x29, sp
| bl <preempt_schedule_common>
| mov w0, #0x1
| ldp x29, x30, [sp], #16
| autiasp
| ret

... compared to the regular form of the function:

| <__cond_resched>:
| bti c
| mrs x0, sp_el0
| ldr x1, [x0, #8]
| cbz x1, <__cond_resched+0x18>
| mov w0, #0x0
| ret
| paciasp
| stp x29, x30, [sp, #-16]!
| mov x29, sp
| bl <preempt_schedule_common>
| mov w0, #0x1
| ldp x29, x30, [sp], #16
| autiasp
| ret

Any architecture which implements static keys should be able to use this
to implement PREEMPT_DYNAMIC with similar cost to non-inlined static
calls. Since this is likely to have greater overhead than (inlined)
static calls, PREEMPT_DYNAMIC is only defaulted to enabled when
HAVE_PREEMPT_DYNAMIC_CALL is selected.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20220214165216.2231574-6-mark.rutland@arm.com


# 40cbf09f 19-Jan-2022 Andy Shevchenko <andriy.shevchenko@linux.intel.com>

kernel.h: include a note to discourage people from including it in headers

Include a note at the top to discourage people from including it in
headers.

Link: https://lkml.kernel.org/r/20211209150803.4473-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# cead1855 22-Nov-2021 Eric W. Biederman <ebiederm@xmission.com>

exit: Rename complete_and_exit to kthread_complete_and_exit

Update complete_and_exit to call kthread_exit instead of do_exit.

Change the name to reflect this change in functionality. All of the
users of complete_and_exit are causing the current kthread to exit so
this change makes it clear what is happening.

Move the implementation of kthread_complete_and_exit from
kernel/exit.c to to kernel/kthread.c. As this function is kthread
specific it makes most sense to live with the kthread functions.

There are no functional change.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>


# a8b76910 10-Nov-2021 Valentin Schneider <valentin.schneider@arm.com>

preempt: Restore preemption model selection configs

Commit c597bfddc9e9 ("sched: Provide Kconfig support for default dynamic
preempt mode") changed the selectable config names for the preemption
model. This means a config file must now select

CONFIG_PREEMPT_BEHAVIOUR=y

rather than

CONFIG_PREEMPT=y

to get a preemptible kernel. This means all arch config files would need to
be updated - right now they'll all end up with the default
CONFIG_PREEMPT_NONE_BEHAVIOUR.

Rather than touch a good hundred of config files, restore usage of
CONFIG_PREEMPT{_NONE, _VOLUNTARY}. Make them configure:
o The build-time preemption model when !PREEMPT_DYNAMIC
o The default boot-time preemption model when PREEMPT_DYNAMIC

Add siblings of those configs with the _BUILD suffix to unconditionally
designate the build-time preemption model (PREEMPT_DYNAMIC is built with
the "highest" preemption model it supports, aka PREEMPT). Downstream
configs should by now all be depending / selected by CONFIG_PREEMPTION
rather than CONFIG_PREEMPT, so only a few sites need patching up.

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Marco Elver <elver@google.com>
Link: https://lore.kernel.org/r/20211110202448.4054153-2-valentin.schneider@arm.com


# b9ad8fe7 08-Nov-2021 Kefeng Wang <wangkefeng.wang@huawei.com>

sections: move is_kernel_inittext() into sections.h

The is_kernel_inittext() and init_kernel_text() are with same
functionality, let's just keep is_kernel_inittext() and move it into
sections.h, then update all the callers.

Link: https://lkml.kernel.org/r/20210930071143.63410-5-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Alexander Potapenko <glider@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# a20deb3a 08-Nov-2021 Kefeng Wang <wangkefeng.wang@huawei.com>

sections: move and rename core_kernel_data() to is_kernel_core_data()

Move core_kernel_data() into sections.h and rename it to
is_kernel_core_data(), also make it return bool value, then update all the
callers.

Link: https://lkml.kernel.org/r/20210930071143.63410-4-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexander Potapenko <glider@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# e52340de 08-Nov-2021 Stephen Rothwell <sfr@canb.auug.org.au>

kernel.h: split out instruction pointer accessors

bottom_half.h needs _THIS_IP_ to be standalone, so split that and
_RET_IP_ out from kernel.h into the new instruction_pointer.h. kernel.h
directly needs them, so include it there and replace the include of
kernel.h with this new file in bottom_half.h.

Link: https://lkml.kernel.org/r/20211028161248.45232-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# d2a8ebbf 08-Nov-2021 Andy Shevchenko <andriy.shevchenko@linux.intel.com>

kernel.h: split out container_of() and typeof_member() macros

kernel.h is being used as a dump for all kinds of stuff for a long time.
Here is the attempt cleaning it up by splitting out container_of() and
typeof_member() macros.

For time being include new header back to kernel.h to avoid twisted
indirected includes for existing users.

Note, there are _a lot_ of headers and modules that include kernel.h
solely for one of these macros and this allows to unburden compiler for
the twisted inclusion paths and to make new code cleaner in the future.

Link: https://lkml.kernel.org/r/20211013170417.87909-3-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Cameron <jic23@kernel.org>
Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Sakari Ailus <sakari.ailus@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thorsten Leemhuis <regressions@leemhuis.info>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# d2635f20 05-Nov-2021 Christophe Leroy <christophe.leroy@csgroup.eu>

mm: create a new system state and fix core_kernel_text()

core_kernel_text() considers that until system_state in at least
SYSTEM_RUNNING, init memory is valid.

But init memory is freed a few lines before setting SYSTEM_RUNNING, so
we have a small period of time when core_kernel_text() is wrong.

Create an intermediate system state called SYSTEM_FREEING_INIT that is
set before starting freeing init memory, and use it in
core_kernel_text() to report init memory invalid earlier.

Link: https://lkml.kernel.org/r/9ecfdee7dd4d741d172cb93ff1d87f1c58127c9a.1633001016.git.christophe.leroy@csgroup.eu
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 50e081b9 23-Sep-2021 Thomas Gleixner <tglx@linutronix.de>

sched: Make RCU nest depth distinct in __might_resched()

For !RT kernels RCU nest depth in __might_resched() is always expected to
be 0, but on RT kernels it can be non zero while the preempt count is
expected to be always 0.

Instead of playing magic games in interpreting the 'preempt_offset'
argument, rename it to 'offsets' and use the lower 8 bits for the expected
preempt count, allow to hand in the expected RCU nest depth in the upper
bits and adopt the __might_resched() code and related checks and printks.

The affected call sites are updated in subsequent steps.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210923165358.243232823@linutronix.de


# 42a38756 23-Sep-2021 Thomas Gleixner <tglx@linutronix.de>

sched: Remove preempt_offset argument from __might_sleep()

All callers hand in 0 and never will hand in anything else.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210923165358.054321586@linutronix.de


# 874f670e 23-Sep-2021 Thomas Gleixner <tglx@linutronix.de>

sched: Clean up the might_sleep() underscore zoo

__might_sleep() vs. ___might_sleep() is hard to distinguish. Aside of that
the three underscore variant is exposed to provide a checkpoint for
rescheduling points which are distinct from blocking points.

They are semantically a preemption point which means that scheduling is
state preserving. A real blocking operation, e.g. mutex_lock(), wait*(),
which cannot preserve a task state which is not equal to RUNNING.

While technically blocking on a "sleeping" spinlock in RT enabled kernels
falls into the voluntary scheduling category because it has to wait until
the contended spin/rw lock becomes available, the RT lock substitution code
can semantically be mapped to a voluntary preemption because the RT lock
substitution code and the scheduler are providing mechanisms to preserve
the task state and to take regular non-lock related wakeups into account.

Rename ___might_sleep() to __might_resched() to make the distinction of
these functions clear.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210923165357.928693482@linutronix.de


# c0891ac1 02-Aug-2021 Alexey Dobriyan <adobriyan@gmail.com>

isystem: ship and use stdarg.h

Ship minimal stdarg.h (1 type, 4 macros) as <linux/stdarg.h>.
stdarg.h is the only userspace header commonly used in the kernel.

GPL 2 version of <stdarg.h> can be extracted from
http://archive.debian.org/debian/pool/main/g/gcc-4.2/gcc-4.2_4.2.4.orig.tar.gz

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>


# 4c527293 30-Jun-2021 Andy Shevchenko <andriy.shevchenko@linux.intel.com>

kernel.h: split out kstrtox() and simple_strtox() to a separate header

kernel.h is being used as a dump for all kinds of stuff for a long time.
Here is the attempt to start cleaning it up by splitting out kstrtox() and
simple_strtox() helpers.

At the same time convert users in header and lib folders to use new
header. Though for time being include new header back to kernel.h to
avoid twisted indirected includes for existing users.

[andy.shevchenko@gmail.com: fix documentation references]
Link: https://lkml.kernel.org/r/20210615220003.377901-1-andy.shevchenko@gmail.com

Link: https://lkml.kernel.org/r/20210611185815.44103-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Francis Laniel <laniel_francis@privacyrequired.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Kars Mulder <kerneldev@karsmulder.nl>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Anna Schumaker <anna.schumaker@netapp.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# f39650de 30-Jun-2021 Andy Shevchenko <andriy.shevchenko@linux.intel.com>

kernel.h: split out panic and oops helpers

kernel.h is being used as a dump for all kinds of stuff for a long time.
Here is the attempt to start cleaning it up by splitting out panic and
oops helpers.

There are several purposes of doing this:
- dropping dependency in bug.h
- dropping a loop by moving out panic_notifier.h
- unload kernel.h from something which has its own domain

At the same time convert users tree-wide to use new headers, although for
the time being include new header back to kernel.h to avoid twisted
indirected includes for existing users.

[akpm@linux-foundation.org: thread_info.h needs limits.h]
[andriy.shevchenko@linux.intel.com: ia64 fix]
Link: https://lkml.kernel.org/r/20210520130557.55277-1-andriy.shevchenko@linux.intel.com

Link: https://lkml.kernel.org/r/20210511074137.33666-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Co-developed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Corey Minyard <cminyard@mvista.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Wei Liu <wei.liu@kernel.org>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Sebastian Reichel <sre@kernel.org>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Acked-by: Stephen Boyd <sboyd@kernel.org>
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Acked-by: Helge Deller <deller@gmx.de> # parisc
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 03cb4473 09-Jun-2021 Jacob Keller <jacob.e.keller@intel.com>

ice: add low level PTP clock access functions

Add the ice_ptp_hw.c file and some associated definitions to the ice
driver folder. This file contains basic low level definitions for
functions that interact with the device hardware.

For now, only E810-based devices are supported. The ice hardware
supports 2 major variants which have different PHYs with different
procedures necessary for interacting with the device clock.

Because the device captures timestamps in the PHY, each PHY has its own
internal timer. The timers are synchronized in hardware by first
preparing the source timer and the PHY timer shadow registers, and then
issuing a synchronization command. This ensures that both the source
timer and PHY timers are programmed simultaneously. The timers
themselves are all driven from the same oscillator source.

The functions in ice_ptp_hw.c abstract over the differences between how
the PHYs in E810 are programmed vs how the PHYs in E822 devices are
programmed. This series only implements E810 support, but E822 support
will be added in a future change.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 79270291 28-Jun-2021 Stephen Boyd <swboyd@chromium.org>

slub: force on no_hash_pointers when slub_debug is enabled

Obscuring the pointers that slub shows when debugging makes for some
confusing slub debug messages:

Padding overwritten. 0x0000000079f0674a-0x000000000d4dce17

Those addresses are hashed for kernel security reasons. If we're trying
to be secure with slub_debug on the commandline we have some big problems
given that we dump whole chunks of kernel memory to the kernel logs.
Let's force on the no_hash_pointers commandline flag when slub_debug is on
the commandline. This makes slub debug messages more meaningful and if by
chance a kernel address is in some slub debug object dump we will have a
better chance of figuring out what went wrong.

Note that we don't use %px in the slub code because we want to reduce the
number of places that %px is used in the kernel. This also nicely prints
a big fat warning at kernel boot if slub_debug is on the commandline so
that we know that this kernel shouldn't be used on production systems.

[akpm@linux-foundation.org: fix build with CONFIG_SLUB_DEBUG=n]

Link: https://lkml.kernel.org/r/20210601182202.3011020-5-swboyd@chromium.org
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Petr Mladek <pmladek@suse.com>
Cc: Joe Perches <joe@perches.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 0ab1438b 05-May-2021 Masahiro Yamada <masahiroy@kernel.org>

linux/kconfig.h: replace IF_ENABLED() with PTR_IF() in <linux/kernel.h>

<linux/kconfig.h> is included from all the kernel-space source files,
including C, assembly, linker scripts. It is intended to contain a
minimal set of macros to evaluate CONFIG options.

IF_ENABLED() is an intruder here because (x ? y : z) is C code, which
should not be included from assembly files or linker scripts.

Also, <linux/kconfig.h> is no longer self-contained because NULL is
defined in <linux/stddef.h>.

Move IF_ENABLED() out to <linux/kernel.h> as PTR_IF(). PTF_IF()
takes the general boolean expression instead of a CONFIG option
so that it fits better in <linux/kernel.h>.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>


# 08c5188e 06-May-2021 Andy Shevchenko <andriy.shevchenko@linux.intel.com>

kernel.h: drop inclusion in bitmap.h

The bitmap.h header is used in a lot of code around the kernel. Besides
that it includes kernel.h which sometimes makes a loop.

The problem here is many unneeded loops that make header hell
dependencies. For example, how may you move bitmap_zalloc() from C-file
to the header? Currently it's impossible. And bitmap.h here is only the
tip of an iceberg.

kerne.h is a dump of everything that even has nothing in common at all.
We may still have it, but in my new code I prefer to include only the
headers that I want to use, without the bulk of unneeded kernel code.

Break the loop by introducing align.h, including it in kernel.h and
bitmap.h followed by replacing kernel.h with limits.h.

Link: https://lkml.kernel.org/r/20210326170347.37441-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Yury Norov <yury.norov@gmail.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# ef72661e 25-Jan-2021 Peter Zijlstra <peterz@infradead.org>

sched: Harden PREEMPT_DYNAMIC

Use the new EXPORT_STATIC_CALL_TRAMP() / static_call_mod() to unexport
the static_call_key for the PREEMPT_DYNAMIC calls such that modules
can no longer update these calls.

Having modules change/hi-jack the preemption calls would be horrible.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# b965f1dd 18-Jan-2021 Peter Zijlstra (Intel) <peterz@infradead.org>

preempt/dynamic: Provide cond_resched() and might_resched() static calls

Provide static calls to control cond_resched() (called in !CONFIG_PREEMPT)
and might_resched() (called in CONFIG_PREEMPT_VOLUNTARY) to that we
can override their behaviour when preempt= is overriden.

Since the default behaviour is full preemption, both their calls are
ignored when preempt= isn't passed.

[fweisbec: branch might_resched() directly to __cond_resched(), only
define static calls when PREEMPT_DYNAMIC]

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210118141223.123667-6-frederic@kernel.org


# aa6159ab 15-Dec-2020 Andy Shevchenko <andriy.shevchenko@linux.intel.com>

kernel.h: split out mathematical helpers

kernel.h is being used as a dump for all kinds of stuff for a long time.
Here is the attempt to start cleaning it up by splitting out
mathematical helpers.

At the same time convert users in header and lib folder to use new
header. Though for time being include new header back to kernel.h to
avoid twisted indirected includes for existing users.

[sfr@canb.auug.org.au: fix powerpc build]
Link: https://lkml.kernel.org/r/20201029150809.13059608@canb.auug.org.au

Link: https://lkml.kernel.org/r/20201028173212.41768-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 74d862b6 18-Nov-2020 Thomas Gleixner <tglx@linutronix.de>

sched: Make migrate_disable/enable() independent of RT

Now that the scheduler can deal with migrate disable properly, there is no
real compelling reason to make it only available for RT.

There are quite some code pathes which needlessly disable preemption in
order to prevent migration and some constructs like kmap_atomic() enforce
it implicitly.

Making it available independent of RT allows to provide a preemptible
variant of kmap_atomic() and makes the code more consistent in general.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Grudgingly-Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20201118204007.269943012@linutronix.de


# dfe56404 31-Aug-2020 chao <chao@eero.com>

rcu: Panic after fixed number of stalls

Some stalls are transient, so that system fully recovers. This commit
therefore allows users to configure the number of stalls that must happen
in order to trigger kernel panic.

Signed-off-by: chao <chao@eero.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>


# 33def849 21-Oct-2020 Joe Perches <joe@perches.com>

treewide: Convert macro and uses of __section(foo) to __section("foo")

Use a more generic form for __section that requires quotes to avoid
complications with clang and gcc differences.

Remove the quote operator # from compiler_attributes.h __section macro.

Convert all unquoted __section(foo) uses to quoted __section("foo").
Also convert __attribute__((section("foo"))) uses to __section("foo")
even if the __attribute__ has multiple list entry forms.

Conversion done using the script at:

https://lore.kernel.org/lkml/75393e5ddc272dc7403de74d645e6c6e0f4e70eb.camel@perches.com/2-convert_section.pl

Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@gooogle.com>
Reviewed-by: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# b296a6d5 15-Oct-2020 Andy Shevchenko <andriy.shevchenko@linux.intel.com>

kernel.h: split out min()/max() et al. helpers

kernel.h is being used as a dump for all kinds of stuff for a long time.
Here is the attempt to start cleaning it up by splitting out min()/max()
et al. helpers.

At the same time convert users in header and lib folder to use new header.
Though for time being include new header back to kernel.h to avoid
twisted indirected includes for other existing users.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Joe Perches <joe@perches.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lkml.kernel.org/r/20200910164152.GA1891694@smile.fi.intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 36d818f6 11-Sep-2020 Andy Shevchenko <andriy.shevchenko@linux.intel.com>

kernel.h: Move oops_in_progress to printk.h

The oops_in_progress is defined in printk.c, so it's logical
to move oops_in_progress to printk.h.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20200911170202.8565-1-andriy.shevchenko@linux.intel.com


# ef91bb19 28-Aug-2020 Herbert Xu <herbert@gondor.apana.org.au>

kernel.h: Silence sparse warning in lower_32_bits

I keep getting sparse warnings in crypto such as:

CHECK drivers/crypto/ccree/cc_hash.c
drivers/crypto/ccree/cc_hash.c:49:9: warning: cast truncates bits from constant value (47b5481dbefa4fa4 becomes befa4fa4)
drivers/crypto/ccree/cc_hash.c:49:26: warning: cast truncates bits from constant value (db0c2e0d64f98fa7 becomes 64f98fa7)
[.. many more ..]

This patch removes the warning by adding a mask to keep sparse
happy.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 63037f74 11-Aug-2020 Yue Hu <huyue2@yulong.com>

panic: make print_oops_end_marker() static

Since print_oops_end_marker() is not used externally, also remove it in
kernel.h at the same time.

Signed-off-by: Yue Hu <huyue2@yulong.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/20200724011516.12756-1-zbestahu@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 79076e12 11-Aug-2020 Tiezhu Yang <yangtiezhu@loongson.cn>

kernel/panic.c: make oops_may_print() return bool

The return value of oops_may_print() is true or false, so change its type
to reflect that.

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Xuefeng Li <lixuefeng@loongson.cn>
Link: http://lkml.kernel.org/r/1591103358-32087-1-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# ef0f2685 11-Aug-2020 Kars Mulder <kerneldev@karsmulder.nl>

kstrto*: do not describe simple_strto*() as obsolete/replaced

The documentation of the kstrto*() functions describes kstrto*() as
"replacements" of the "obsolete" simple_strto*() functions. Both of these
terms are inaccurate: they're not replacements because they have different
behaviour, and the simple_strto*() are not obsolete because there are
cases where they have benefits over kstrto*().

Remove usage of the terms "replacement" and "obsolete" in reference to
simple_strto*(), and instead use the term "preferred over".

Fixes: 4c925d6031f71 ("kstrto*: add documentation")
Fixes: 885e68e8b7b13 ("kernel.h: update comment about simple_strto<foo>() functions")
Signed-off-by: Kars Mulder <kerneldev@karsmulder.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Eldad Zack <eldad@fogrefinery.com>
Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Mans Rullgard <mans@mansr.com>
Cc: Petr Mladek <pmladek@suse.com>
Link: http://lkml.kernel.org/r/29b9-5f234c80-13-4e3aa200@244003027
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# b642e44e 11-Aug-2020 Kars Mulder <kerneldev@karsmulder.nl>

kstrto*: correct documentation references to simple_strto*()

The documentation of the kstrto*() functions reference the simple_strtoull
function by "used as a replacement for [the obsolete] simple_strtoull".
All these functions describes themselves as replacements for the function
simple_strtoull, even though a function like kstrtol() would be more aptly
described as a replacement of simple_strtol().

Fix these references by making the documentation of kstrto*() reference
the closest simple_strto*() equivalent available. The functions
kstrto[u]int() do not have direct simple_strto[u]int() equivalences, so
these are made to refer to simple_strto[u]l() instead.

Furthermore, add parentheses after function names, as is standard in
kernel documentation.

Fixes: 4c925d6031f71 ("kstrto*: add documentation")
Signed-off-by: Kars Mulder <kerneldev@karsmulder.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Eldad Zack <eldad@fogrefinery.com>
Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Mans Rullgard <mans@mansr.com>
Cc: Petr Mladek <pmladek@suse.com>
Link: http://lkml.kernel.org/r/1ee1-5f234c00-f3-165a6440@234394593
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 37665343 11-Aug-2020 Arvind Sankar <nivedita@alum.mit.edu>

kernel.h: remove duplicate include of asm/div64.h

This seems to have been added inadvertently in commit
72deb455b5ec ("block: remove CONFIG_LBDAF")

Fixes: 72deb455b5ec ("block: remove CONFIG_LBDAF")
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: http://lkml.kernel.org/r/20200727034852.2813453-1-nivedita@alum.mit.edu
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 229f5879 22-Jul-2020 Kishon Vijay Abraham I <kishon@ti.com>

linux/kernel.h: Add PTR_ALIGN_DOWN macro

Add a macro for aligning down a pointer. This is useful to get an
aligned register address when a device allows only word access and
doesn't allow half word or byte access.

Link: https://lore.kernel.org/r/20200722110317.4744-4-kishon@ti.com
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Rob Herring <robh@kernel.org>


# 60c958d8 07-Jun-2020 Guilherme G. Piccoli <gpiccoli@canonical.com>

panic: add sysctl to dump all CPUs backtraces on oops event

Usually when the kernel reaches an oops condition, it's a point of no
return; in case not enough debug information is available in the kernel
splat, one of the last resorts would be to collect a kernel crash dump
and analyze it. The problem with this approach is that in order to
collect the dump, a panic is required (to kexec-load the crash kernel).
When in an environment of multiple virtual machines, users may prefer to
try living with the oops, at least until being able to properly shutdown
their VMs / finish their important tasks.

This patch implements a way to collect a bit more debug details when an
oops event is reached, by printing all the CPUs backtraces through the
usage of NMIs (on architectures that support that). The sysctl added
(and documented) here was called "oops_all_cpu_backtrace", and when set
will (as the name suggests) dump all CPUs backtraces.

Far from ideal, this may be the last option though for users that for
some reason cannot panic on oops. Most of times oopses are clear enough
to indicate the kernel portion that must be investigated, but in virtual
environments it's possible to observe hypervisor/KVM issues that could
lead to oopses shown in other guests CPUs (like virtual APIC crashes).
This patch hence aims to help debug such complex issues without
resorting to kdump.

Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Matthew Wilcox <willy@infradead.org>
Link: http://lkml.kernel.org/r/20200327224116.21030-1-gpiccoli@canonical.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# db38d5c1 07-Jun-2020 Rafael Aquini <aquini@redhat.com>

kernel: add panic_on_taint

Analogously to the introduction of panic_on_warn, this patch introduces
a kernel option named panic_on_taint in order to provide a simple and
generic way to stop execution and catch a coredump when the kernel gets
tainted by any given flag.

This is useful for debugging sessions as it avoids having to rebuild the
kernel to explicitly add calls to panic() into the code sites that
introduce the taint flags of interest.

For instance, if one is interested in proceeding with a post-mortem
analysis at the point a given code path is hitting a bad page (i.e.
unaccount_page_cache_page(), or slab_bug()), a coredump can be collected
by rebooting the kernel with 'panic_on_taint=0x20' amended to the
command line.

Another, perhaps less frequent, use for this option would be as a means
for assuring a security policy case where only a subset of taints, or no
single taint (in paranoid mode), is allowed for the running system. The
optional switch 'nousertaint' is handy in this particular scenario, as
it will avoid userspace induced crashes by writes to sysctl interface
/proc/sys/kernel/tainted causing false positive hits for such policies.

[akpm@linux-foundation.org: tweak kernel-parameters.txt wording]

Suggested-by: Qian Cai <cai@lca.pw>
Signed-off-by: Rafael Aquini <aquini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Dave Young <dyoung@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Bunk <bunk@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Takashi Iwai <tiwai@suse.de>
Link: http://lkml.kernel.org/r/20200515175502.146720-1-aquini@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 4e139c77 14-Feb-2020 Thomas Gleixner <tglx@linutronix.de>

sched: Provide cant_migrate()

Some code pathes rely on preempt_disable() to prevent migration on a non RT
enabled kernel. These preempt_disable/enable() pairs are substituted by
migrate_disable/enable() pairs or other forms of RT specific protection. On
RT these protections prevent migration but not preemption. Obviously a
cant_sleep() check in such a section will trigger on RT because preemption
is not disabled.

Provide a cant_migrate() macro which maps to cant_sleep() on a non RT
kernel and an empty placeholder for RT for now. The placeholder will be
changed to a proper debug check along with the RT specific migration
protection mechanism.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200214161503.070487511@linutronix.de


# 1f07dcc4 30-Dec-2019 Kees Cook <keescook@chromium.org>

kernel.h: Remove unused FIELD_SIZEOF()

Now that all callers of FIELD_SIZEOF() have been converted to
sizeof_field(), remove the unused prior macro.

Signed-off-by: Kees Cook <keescook@chromium.org>


# 885e68e8 04-Dec-2019 Andy Shevchenko <andriy.shevchenko@linux.intel.com>

kernel.h: update comment about simple_strto<foo>() functions

There were discussions in the past about use cases for
simple_strto<foo>() functions and, in some rare cases, they have a
benefit over kstrto<foo>() ones.

Update a comment to reduce confusion about special use cases.

Link: http://lkml.kernel.org/r/20190801192904.41087-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Suggested-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Mans Rullgard <mans@mansr.com>
Cc: Petr Mladek <pmladek@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 2f30b369 21-Nov-2019 Will Deacon <will@kernel.org>

locking/refcount: Remove unused 'refcount_error_report()' function

'refcount_error_report()' has no callers. Remove it.

Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Kees Cook <keescook@chromium.org>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191121115902.2551-10-will@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 312364f3 26-Aug-2019 Daniel Vetter <daniel.vetter@ffwll.ch>

kernel.h: Add non_block_start/end()

In some special cases we must not block, but there's not a spinlock,
preempt-off, irqs-off or similar critical section already that arms the
might_sleep() debug checks. Add a non_block_start/end() pair to annotate
these.

This will be used in the oom paths of mmu-notifiers, where blocking is not
allowed to make sure there's forward progress. Quoting Michal:

"The notifier is called from quite a restricted context - oom_reaper -
which shouldn't depend on any locks or sleepable conditionals. The code
should be swift as well but we mostly do care about it to make a forward
progress. Checking for sleepable context is the best thing we could come
up with that would describe these demands at least partially."

Peter also asked whether we want to catch spinlocks on top, but Michal
said those are less of a problem because spinlocks can't have an indirect
dependency upon the page allocator and hence close the loop with the oom
reaper.

Suggested by Michal Hocko.

Link: https://lore.kernel.org/r/20190826201425.17547-4-daniel.vetter@ffwll.ch
Acked-by: Christian König <christian.koenig@amd.com> (v1)
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# ce251e0e 16-Jul-2019 Alexey Dobriyan <adobriyan@gmail.com>

include/linux/kernel.h: add typeof_member() macro

Add typeof_member() macro so that types can be extracted without
introducing dummy variables.

Link: http://lkml.kernel.org/r/20190529190720.GA5703@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 8f9fab48 28-Jun-2019 Vinod Koul <vkoul@kernel.org>

linux/kernel.h: fix overflow for DIV_ROUND_UP_ULL

DIV_ROUND_UP_ULL adds the two arguments and then invokes
DIV_ROUND_DOWN_ULL. But on a 32bit system the addition of two 32 bit
values can overflow. DIV_ROUND_DOWN_ULL does it correctly and stashes
the addition into a unsigned long long so cast the result to unsigned
long long here to avoid the overflow condition.

[akpm@linux-foundation.org: DIV_ROUND_UP_ULL must be an rval]
Link: http://lkml.kernel.org/r/20190625100518.30753-1-vkoul@kernel.org
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 9f615894 14-May-2019 Andy Shevchenko <andriy.shevchenko@linux.intel.com>

lib/math: move int_pow() from pwm_bl.c for wider use

The integer exponentiation is used in few places and might be used in
the future by other call sites. Move it to wider use.

Link: http://lkml.kernel.org/r/20190323172531.80025-2-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: Lee Jones <lee.jones@linaro.org>
Cc: Ray Jui <rjui@broadcom.com>
Cc: Thierry Reding <thierry.reding@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 72deb455 05-Apr-2019 Christoph Hellwig <hch@lst.de>

block: remove CONFIG_LBDAF

Currently support for 64-bit sector_t and blkcnt_t is optional on 32-bit
architectures. These types are required to support block device and/or
file sizes larger than 2 TiB, and have generally defaulted to on for
a long time. Enabling the option only increases the i386 tinyconfig
size by 145 bytes, and many data structures already always use
64-bit values for their in-core and on-disk data structures anyway,
so there should not be a large change in dynamic memory usage either.

Dropping this option removes a somewhat weird non-default config that
has cause various bugs or compiler warnings when actually used.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# a0fe2c64 29-Mar-2019 Jann Horn <jannh@google.com>

linux/kernel.h: Use parentheses around argument in u64_to_user_ptr()

Use parentheses around uses of the argument in u64_to_user_ptr() to
ensure that the cast doesn't apply to part of the argument.

There are existing uses of the macro of the form

u64_to_user_ptr(A + B)

which expands to

(void __user *)(uintptr_t)A + B

(the cast applies to the first operand of the addition, the addition
is a pointer addition). This happens to still work as intended, the
semantic difference doesn't cause a difference in behavior.

But I want to use u64_to_user_ptr() with a ternary operator in the
argument, like so:

u64_to_user_ptr(A ? B : C)

This currently doesn't work as intended.

Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Cc: Andrei Vagin <avagin@openvz.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: NeilBrown <neilb@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190329214652.258477-1-jannh@google.com


# 54d50897 07-Mar-2019 Masahiro Yamada <yamada.masahiro@socionext.com>

linux/kernel.h: split *_MAX and *_MIN macros into <linux/limits.h>

<linux/kernel.h> tends to be cluttered because we often put various sort
of unrelated stuff in it. So, we have split out a sensible chunk of
code into a separate header from time to time.

This commit splits out the *_MAX and *_MIN defines.

The standard header <limits.h> contains various MAX, MIN constants
including numerial limits. [1]

I think it makes sense to move in-kernel MAX, MIN constants into
include/linux/limits.h.

We already have include/uapi/linux/limits.h to contain some user-space
constants. I changed its include guard to _UAPI_LINUX_LIMITS_H. This
change has no impact to the user-space because
scripts/headers_install.sh rips off the '_UAPI' prefix from the include
guards of exported headers.

[1] http://pubs.opengroup.org/onlinepubs/009604499/basedefs/limits.h.html

Link: http://lkml.kernel.org/r/1549156242-20806-2-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Alex Elder <elder@linaro.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Zhang Yanmin <yanmin.zhang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 2dc0e68d 07-Mar-2019 Masahiro Yamada <yamada.masahiro@socionext.com>

linux/kernel.h: use 'short' to define USHRT_MAX, SHRT_MAX, SHRT_MIN

The commit log of 44f564a4bf6a ("ipc: add definitions of USHORT_MAX and
others") did not explain why it used (s16) and (u16) instead of (short)
and (unsigned short).

Let's use (short) and (unsigned short), which is more sensible, and more
consistent with the other MAX/MIN defines.

As you see in include/uapi/asm-generic/int-ll64.h, s16/u16 are
typedef'ed as signed/unsigned short. So, this commit does not have a
functional change.

Remove the unneeded parentheses around ~0U while we are here.

Link: http://lkml.kernel.org/r/1549156242-20806-1-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Zhang Yanmin <yanmin.zhang@intel.com>
Cc: Alex Elder <elder@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# b95c4d18 07-Mar-2019 Randy Dunlap <rdunlap@infradead.org>

<linux/kernel.h>: drop the gcc-3.3 'const' hack in roundup()

The single quotation marks around "const" were causing a documentation
markup warning with reST. Instead of fixing that warning, just delete
that comment line and the gcc-3.3 hack of using "const" in the roundup()
macro since gcc-3.3 is no longer supported for kernel builds.

I did around 20 different $arch builds with no problems, but we'll just
have to see if this causes problems for anyone else out there.

Link: http://lkml.kernel.org/r/ec5dcf72-7c3e-3513-af0c-4003ed598854@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# c461aed3 07-Mar-2019 Jani Nikula <jani.nikula@intel.com>

kernel.h: unconditionally include asm/div64.h for do_div()

Include asm/div64.h for do_div() usage in DIV_ROUND_DOWN_ULL() and
DIV_ROUND_CLOSEST_ULL(). Remove the old CONFIG_LBDAF=y conditional
include.

Link: http://lkml.kernel.org/r/20181228153430.23763-1-jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 568f1967 28-Jan-2019 Peter Zijlstra <peterz@infradead.org>

bpf: check that BPF programs run with preemption disabled

Introduce cant_sleep() macro for annotation of functions that
cannot sleep.

Use it in BPF_PROG_RUN to catch execution of BPF programs in
preemptable context.

Suggested-by: Jann Horn <jannh@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>


# 81c9d43f 03-Jan-2019 Feng Tang <feng.tang@intel.com>

kernel/sysctl: add panic_print into sysctl

So that we can also runtime chose to print out the needed system info
for panic, other than setting the kernel cmdline.

Link: http://lkml.kernel.org/r/1543398842-19295-3-git-send-email-feng.tang@intel.com
Signed-off-by: Feng Tang <feng.tang@intel.com>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# cedc5b6a 21-Aug-2018 Kees Cook <keescook@chromium.org>

kernel.h: documentation for roundup() vs round_up()

Things like 3619dec5103d ("dh key: fix rounding up KDF output length")
expose the lack of explicit documentation for roundup() vs round_up(). At
least we can try to document it better if anyone goes looking.

Link: http://lkml.kernel.org/r/20180703041950.GA43464@beast
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 8730662d 24-Apr-2018 Wei Wang <wvw@google.com>

kernel.h: Fix a typo in comment

Signed-off-by: Wei Wang <wvw@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Crt Mori <cmo@melexis.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: gregkh@linuxfoundation.org
Cc: wei.vince.wang@gmail.com
Link: https://lkml.kernel.org/lkml/20180424212241.16013-1-wvw@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 1c4bc43d 07-Jun-2018 Stefan Agner <stefan@agner.ch>

mm/memblock: introduce PHYS_ADDR_MAX

So far code was using ULLONG_MAX and type casting to obtain a
phys_addr_t with all bits set. The typecast is necessary to silence
compiler warnings on 32-bit platforms.

Use the simpler but still type safe approach "~(phys_addr_t)0" to create a
preprocessor define for all bits set.

Link: http://lkml.kernel.org/r/20180406213809.566-1-stefan@agner.ch
Signed-off-by: Stefan Agner <stefan@agner.ch>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# c1a957d1 25-May-2018 Thomas Gleixner <tglx@linutronix.de>

PM / suspend: Prevent might sleep splats

timekeeping suspend/resume calls read_persistent_clock() which takes
rtc_lock. That results in might sleep warnings because at that point
we run with interrupts disabled.

We cannot convert rtc_lock to a raw spinlock as that would trigger
other might sleep warnings.

As a workaround we disable the might sleep warnings by setting
system_state to SYSTEM_SUSPEND before calling sysdev_suspend() and
restoring it to SYSTEM_RUNNING afer sysdev_resume(). There is no lock
contention because hibernate / suspend to RAM is single-CPU at this
point.

In s2idle's case the system_state is set to SYSTEM_SUSPEND before
timekeeping_suspend() which is invoked by the last CPU. In the resume
case it set back to SYSTEM_RUNNING after timekeeping_resume() which is
invoked by the first CPU in the resume case. The other CPUs will block
on tick_freeze_lock.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy: cover s2idle in tick_freeze() / tick_unfreeze()]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 227abcc6 25-Apr-2018 Dan Carpenter <dan.carpenter@oracle.com>

staging: kernel.h: Prevent macro expantion bug in container_of_safe()

There aren't many users of this so it doesn't cause a problem, but we
obviously want to use "__mptr" here instead of "ptr" to prevent the
parameter from being executed twice.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: NeilBrown <neilb@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 05e6557b 28-Mar-2018 NeilBrown <neilb@suse.com>

staging: lustre: add container_of_safe()

Luster has a container_of0() function which is similar to
container_of() but passes an IS_ERR_OR_NULL() pointer through
unchanged.
This could be generally useful: bcache at last has a similar function.

Naming is hard, but the precedent set by hlist_entry_safe() suggests
a _safe suffix might be most consistent.

So add container_of_safe() to kernel.h, and replace all occurrences of
container_of0() with one of
- list_first_entry, list_next_entry, when that is a better fit,
- container_of(), when the pointer is used as a validpointer in
surrounding code,
- container_of_safe() when there is no obviously better alternative.

Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# bc4f2f54 10-Apr-2018 Kees Cook <keescook@chromium.org>

taint: add taint for randstruct

Since the randstruct plugin can intentionally produce extremely unusual
kernel structure layouts (even performance pathological ones), some
maintainers want to be able to trivially determine if an Oops is coming
from a randstruct-built kernel, so as to keep their sanity when
debugging. This adds the new flag and initializes taint_mask
immediately when built with randstruct.

Link: http://lkml.kernel.org/r/1519084390-43867-4-git-send-email-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 47d4b263 10-Apr-2018 Kees Cook <keescook@chromium.org>

taint: convert to indexed initialization

This converts to using indexed initializers instead of comments, adds a
comment on why the taint flags can't be an enum, and make sure that no
one forgets to update the taint_flags when adding new bits.

Link: http://lkml.kernel.org/r/1519084390-43867-2-git-send-email-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# d1be35cb 10-Apr-2018 Andrei Vagin <avagin@openvz.org>

proc: add seq_put_decimal_ull_width to speed up /proc/pid/smaps

seq_put_decimal_ull_w(m, str, val, width) prints a decimal number with a
specified minimal field width.

It is equivalent of seq_printf(m, "%s%*d", str, width, val), but it
works much faster.

== test_smaps.py
num = 0
with open("/proc/1/smaps") as f:
for x in xrange(10000):
data = f.read()
f.seek(0, 0)
==

== Before patch ==
$ time python test_smaps.py
real 0m4.593s
user 0m0.398s
sys 0m4.158s

== After patch ==
$ time python test_smaps.py
real 0m3.828s
user 0m0.413s
sys 0m3.408s

$ perf -g record python test_smaps.py
== Before patch ==
- 79.01% 3.36% python [kernel.kallsyms] [k] show_smap.isra.33
- 75.65% show_smap.isra.33
+ 48.85% seq_printf
+ 15.75% __walk_page_range
+ 9.70% show_map_vma.isra.23
0.61% seq_puts

== After patch ==
- 75.51% 4.62% python [kernel.kallsyms] [k] show_smap.isra.33
- 70.88% show_smap.isra.33
+ 24.82% seq_put_decimal_ull_w
+ 19.78% __walk_page_range
+ 12.74% seq_printf
+ 11.08% show_map_vma.isra.23
+ 1.68% seq_puts

[akpm@linux-foundation.org: fix drivers/of/unittest.c build]
Link: http://lkml.kernel.org/r/20180212074931.7227-1-avagin@openvz.org
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# e9092d0d 09-Apr-2018 Linus Torvalds <torvalds@linux-foundation.org>

Fix subtle macro variable shadowing in min_not_zero()

Commit 3c8ba0d61d04 ("kernel.h: Retain constant expression output for
max()/min()") rewrote our min/max macros to be very clever, but in the
meantime resurrected a variable name shadow issue that we had had
previously fixed in commit 589a9785ee3a ("min/max: remove sparse
warnings when they're nested").

That commit talks about the sparse warnings that this shadowing causes,
which we ignored as just a minor annoyance. But it turns out that the
sparse warning is the least of our problems. We actually have a real
bug due to the shadowing through the interaction with "min_not_zero()",
which ends up doing

min(__x, __y)

internally, and then the new declaration of "__x" and "__y" as new
variables in __cmp_once() results in a complete mess of an expression,
and "min_not_zero()" doesn't work at all.

For some odd reason, this only ever caused (reported) problems on s390,
even though it is a generic issue and most of the (obviously successful)
testing of the problematic commit had happened on other architectures.

Quoting Sebastian Ott:
"What happened is that the bio build by the partition detection code
was attempted to be split by the block layer because the block queue
had a max_sector setting of 0. blk_queue_max_hw_sectors uses
min_not_zero."

So re-introduce the use of __UNIQUE_ID() to make sure that the min/max
macros do not have these kinds of clashes.

[ That said, __UNIQUE_ID() itself has several issues that make it less
than wonderful.

In particular, the "uniqueness" has a fallback on the line number,
which means that it's not actually unique in more complex cases if you
don't build with gcc or clang (which have working unique counters that
aren't tied to line numbers).

That historical broken fallback also means that we have that pointless
"prefix" argument that doesn't actually make much sense _except_ for
the known-broken case. Oh well. ]

Fixes: 3c8ba0d61d04 ("kernel.h: Retain constant expression output for max()/min()")
Reported-and-tested-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 3c8ba0d6 30-Mar-2018 Kees Cook <keescook@chromium.org>

kernel.h: Retain constant expression output for max()/min()

In the effort to remove all VLAs from the kernel[1], it is desirable to
build with -Wvla. However, this warning is overly pessimistic, in that
it is only happy with stack array sizes that are declared as constant
expressions, and not constant values. One case of this is the
evaluation of the max() macro which, due to its construction, ends up
converting constant expression arguments into a constant value result.

All attempts to rewrite this macro with __builtin_constant_p() failed
with older compilers (e.g. gcc 4.4)[2]. However, Martin Uecker,
constructed[3] a mind-shattering solution that works everywhere.
Cthulhu fhtagn!

This patch updates the min()/max() macros to evaluate to a constant
expression when called on constant expression arguments. This removes
several false-positive stack VLA warnings from an x86 allmodconfig build
when -Wvla is added:

$ diff -u before.txt after.txt | grep ^-
-drivers/input/touchscreen/cyttsp4_core.c:871:2: warning: ISO C90 forbids variable length array ‘ids’ [-Wvla]
-fs/btrfs/tree-checker.c:344:4: warning: ISO C90 forbids variable length array ‘namebuf’ [-Wvla]
-lib/vsprintf.c:747:2: warning: ISO C90 forbids variable length array ‘sym’ [-Wvla]
-net/ipv4/proc.c:403:2: warning: ISO C90 forbids variable length array ‘buff’ [-Wvla]
-net/ipv6/proc.c:198:2: warning: ISO C90 forbids variable length array ‘buff’ [-Wvla]
-net/ipv6/proc.c:218:2: warning: ISO C90 forbids variable length array ‘buff64’ [-Wvla]

This also updates two cases where different enums were being compared
and explicitly casts them to int (which matches the old side-effect of
the single-evaluation code): one in tpm/tpm_tis_core.h, and one in
drm/drm_color_mgmt.c.

[1] https://lkml.org/lkml/2018/3/7/621
[2] https://lkml.org/lkml/2018/3/10/170
[3] https://lkml.org/lkml/2018/3/20/845

Co-Developed-by: Linus Torvalds <torvalds@linux-foundation.org>
Co-Developed-by: Martin Uecker <Martin.Uecker@med.uni-goettingen.de>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# cf14f27f 28-Mar-2018 Alexei Starovoitov <ast@kernel.org>

macro: introduce COUNT_ARGS() macro

move COUNT_ARGS() macro from apparmor to generic header and extend it
to count till twelve.

COUNT() was an alternative name for this logic, but it's used for
different purpose in many other places.

Similarly for CONCATENATE() macro.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>


# 9fbcc57a 20-Feb-2018 Josh Poimboeuf <jpoimboe@redhat.com>

extable: Make init_kernel_text() global

Convert init_kernel_text() to a global function and use it in a few
places instead of manually comparing _sinittext and _einittext.

Note that kallsyms.h has a very similar function called
is_kernel_inittext(), but its end check is inclusive. I'm not sure
whether that's intentional behavior, so I didn't touch it.

Suggested-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/4335d02be8d45ca7d265d2f174251d0b7ee6c5fd.1519051220.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 47a36163 11-Jan-2018 Crt Mori <cmo@melexis.com>

lib: Add strongly typed 64bit int_sqrt

There is no option to perform 64bit integer sqrt on 32bit platform.
Added stronger typed int_sqrt64 enables the 64bit calculations to
be performed on 32bit platforms. Using same algorithm as int_sqrt()
with strong typing provides enough precision also on 32bit platforms,
but it sacrifices some performance. In case values are smaller than
ULONG_MAX the standard int_sqrt is used for calculation to maximize the
performance due to more native calculations.

Signed-off-by: Crt Mori <cmo@melexis.com>
Acked-by: Joe Perches <joe@perches.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>


# 4efb442c 17-Nov-2017 Borislav Petkov <bp@suse.de>

kernel/panic.c: add TAINT_AUX

This is the gist of a patch which we've been forward-porting in our
kernels for a long time now and it probably would make a good sense to
have such TAINT_AUX flag upstream which can be used by each distro etc,
how they see fit. This way, we won't need to forward-port a distro-only
version indefinitely.

Add an auxiliary taint flag to be used by distros and others. This
obviates the need to forward-port whatever internal solutions people
have in favor of a single flag which they can map arbitrarily to a
definition of their pleasing.

The "X" mnemonic could also mean eXternal, which would be taint from a
distro or something else but not the upstream kernel. We will use it to
mark modules for which we don't provide support. I.e., a really
eXternal module.

Link: http://lkml.kernel.org/r/20170911134533.dp5mtyku5bongx4c@pd.tnic
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Jessica Yu <jeyu@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# b2441318 01-Nov-2017 Greg Kroah-Hartman <gregkh@linuxfoundation.org>

License cleanup: add SPDX GPL-2.0 license identifier to files with no license

Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.

For non */uapi/* files that summary was:

SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139

and resulted in the first patch in this series.

If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:

SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930

and resulted in the second patch in this series.

- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:

SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1

and that resulted in the third patch in this series.

- when the two scanners agreed on the detected license(s), that became
the concluded license(s).

- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.

- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).

- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.

- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct

This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# e8c97af0 13-Oct-2017 Randy Dunlap <rdunlap@infradead.org>

linux/kernel.h: add/correct kernel-doc notation

Add kernel-doc notation for some macros. Correct kernel-doc comments &
typos for a few macros.

Link: http://lkml.kernel.org/r/76fa1403-1511-be4c-e9c4-456b43edfad3@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 604df322 08-Sep-2017 Masahiro Yamada <yamada.masahiro@socionext.com>

linux/kernel.h: move DIV_ROUND_DOWN_ULL() macro

This macro is useful to avoid link error on 32-bit systems.

We have the same definition in two drivers, so move it to
include/linux/kernel.h

While we are here, refactor DIV_ROUND_UP_ULL() by using
DIV_ROUND_DOWN_ULL().

Link: http://lkml.kernel.org/r/1500945156-12907-1-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Mark Brown <broonie@kernel.org>
Cc: Cyrille Pitchen <cyrille.pitchen@wedev4u.fr>
Cc: Jaroslav Kysela <perex@perex.cz>
Cc: Takashi Iwai <tiwai@suse.com>
Cc: Liam Girdwood <lgirdwood@gmail.com>
Cc: Boris Brezillon <boris.brezillon@free-electrons.com>
Cc: Marek Vasut <marek.vasut@gmail.com>
Cc: Brian Norris <computersforpeace@gmail.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 7a46ec0e 15-Aug-2017 Kees Cook <keescook@chromium.org>

locking/refcounts, x86/asm: Implement fast refcount overflow protection

This implements refcount_t overflow protection on x86 without a noticeable
performance impact, though without the fuller checking of REFCOUNT_FULL.

This is done by duplicating the existing atomic_t refcount implementation
but with normally a single instruction added to detect if the refcount
has gone negative (e.g. wrapped past INT_MAX or below zero). When detected,
the handler saturates the refcount_t to INT_MIN / 2. With this overflow
protection, the erroneous reference release that would follow a wrap back
to zero is blocked from happening, avoiding the class of refcount-overflow
use-after-free vulnerabilities entirely.

Only the overflow case of refcounting can be perfectly protected, since
it can be detected and stopped before the reference is freed and left to
be abused by an attacker. There isn't a way to block early decrements,
and while REFCOUNT_FULL stops increment-from-zero cases (which would
be the state _after_ an early decrement and stops potential double-free
conditions), this fast implementation does not, since it would require
the more expensive cmpxchg loops. Since the overflow case is much more
common (e.g. missing a "put" during an error path), this protection
provides real-world protection. For example, the two public refcount
overflow use-after-free exploits published in 2016 would have been
rendered unexploitable:

http://perception-point.io/2016/01/14/analysis-and-exploitation-of-a-linux-kernel-vulnerability-cve-2016-0728/

http://cyseclabs.com/page?n=02012016

This implementation does, however, notice an unchecked decrement to zero
(i.e. caller used refcount_dec() instead of refcount_dec_and_test() and it
resulted in a zero). Decrements under zero are noticed (since they will
have resulted in a negative value), though this only indicates that a
use-after-free may have already happened. Such notifications are likely
avoidable by an attacker that has already exploited a use-after-free
vulnerability, but it's better to have them reported than allow such
conditions to remain universally silent.

On first overflow detection, the refcount value is reset to INT_MIN / 2
(which serves as a saturation value) and a report and stack trace are
produced. When operations detect only negative value results (such as
changing an already saturated value), saturation still happens but no
notification is performed (since the value was already saturated).

On the matter of races, since the entire range beyond INT_MAX but before
0 is negative, every operation at INT_MIN / 2 will trap, leaving no
overflow-only race condition.

As for performance, this implementation adds a single "js" instruction
to the regular execution flow of a copy of the standard atomic_t refcount
operations. (The non-"and_test" refcount_dec() function, which is uncommon
in regular refcount design patterns, has an additional "jz" instruction
to detect reaching exactly zero.) Since this is a forward jump, it is by
default the non-predicted path, which will be reinforced by dynamic branch
prediction. The result is this protection having virtually no measurable
change in performance over standard atomic_t operations. The error path,
located in .text.unlikely, saves the refcount location and then uses UD0
to fire a refcount exception handler, which resets the refcount, handles
reporting, and returns to regular execution. This keeps the changes to
.text size minimal, avoiding return jumps and open-coded calls to the
error reporting routine.

Example assembly comparison:

refcount_inc() before:

.text:
ffffffff81546149: f0 ff 45 f4 lock incl -0xc(%rbp)

refcount_inc() after:

.text:
ffffffff81546149: f0 ff 45 f4 lock incl -0xc(%rbp)
ffffffff8154614d: 0f 88 80 d5 17 00 js ffffffff816c36d3
...
.text.unlikely:
ffffffff816c36d3: 48 8d 4d f4 lea -0xc(%rbp),%rcx
ffffffff816c36d7: 0f ff (bad)

These are the cycle counts comparing a loop of refcount_inc() from 1
to INT_MAX and back down to 0 (via refcount_dec_and_test()), between
unprotected refcount_t (atomic_t), fully protected REFCOUNT_FULL
(refcount_t-full), and this overflow-protected refcount (refcount_t-fast):

2147483646 refcount_inc()s and 2147483647 refcount_dec_and_test()s:
cycles protections
atomic_t 82249267387 none
refcount_t-fast 82211446892 overflow, untested dec-to-zero
refcount_t-full 144814735193 overflow, untested dec-to-zero, inc-from-zero

This code is a modified version of the x86 PAX_REFCOUNT atomic_t
overflow defense from the last public patch of PaX/grsecurity, based
on my understanding of the code. Changes or omissions from the original
code are mine and don't reflect the original grsecurity/PaX code. Thanks
to PaX Team for various suggestions for improvement for repurposing this
code to be a refcount-only protection.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Hans Liljestrand <ishkamiel@gmail.com>
Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
Cc: Jann Horn <jannh@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Serge E. Hallyn <serge@hallyn.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: arozansk@redhat.com
Cc: axboe@kernel.dk
Cc: kernel-hardening@lists.openwall.com
Cc: linux-arch <linux-arch@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170815161924.GA133115@beast
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# c7acec71 12-Jul-2017 Ian Abbott <abbotti@mev.co.uk>

kernel.h: handle pointers to arrays better in container_of()

If the first parameter of container_of() is a pointer to a
non-const-qualified array type (and the third parameter names a
non-const-qualified array member), the local variable __mptr will be
defined with a const-qualified array type. In ISO C, these types are
incompatible. They work as expected in GNU C, but some versions will
issue warnings. For example, GCC 4.9 produces the warning
"initialization from incompatible pointer type".

Here is an example of where the problem occurs:

-------------------------------------------------------
#include <linux/kernel.h>
#include <linux/module.h>

MODULE_LICENSE("GPL");

struct st {
int a;
char b[16];
};

static int __init example_init(void) {
struct st t = { .a = 101, .b = "hello" };
char (*p)[16] = &t.b;
struct st *x = container_of(p, struct st, b);
printk(KERN_DEBUG "%p %p\n", (void *)&t, (void *)x);
return 0;
}

static void __exit example_exit(void) {
}

module_init(example_init);
module_exit(example_exit);
-------------------------------------------------------

Building the module with gcc-4.9 results in these warnings (where '{m}'
is the module source and '{k}' is the kernel source):

-------------------------------------------------------
In file included from {m}/example.c:1:0:
{m}/example.c: In function `example_init':
{k}/include/linux/kernel.h:854:48: warning: initialization from incompatible pointer type
const typeof( ((type *)0)->member ) *__mptr = (ptr); \
^
{m}/example.c:14:17: note: in expansion of macro `container_of'
struct st *x = container_of(p, struct st, b);
^
{k}/include/linux/kernel.h:854:48: warning: (near initialization for `x')
const typeof( ((type *)0)->member ) *__mptr = (ptr); \
^
{m}/example.c:14:17: note: in expansion of macro `container_of'
struct st *x = container_of(p, struct st, b);
^
-------------------------------------------------------

Replace the type checking performed by the macro to avoid these
warnings. Make sure `*(ptr)` either has type compatible with the
member, or has type compatible with `void`, ignoring qualifiers. Raise
compiler errors if this is not true. This is stronger than the previous
behaviour, which only resulted in compiler warnings for a type mismatch.

[arnd@arndb.de: fix new warnings for container_of()]
Link: http://lkml.kernel.org/r/20170620200940.90557-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/20170525120316.24473-7-abbotti@mev.co.uk
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Johannes Berg <johannes.berg@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alexander Potapenko <glider@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 69a78ff2 16-May-2017 Thomas Gleixner <tglx@linutronix.de>

init: Introduce SYSTEM_SCHEDULING state

might_sleep() debugging and smp_processor_id() debugging should be active
right after the scheduler starts working. The init task can invoke
smp_processor_id() from preemptible context as it is pinned on the boot cpu
until sched_smp_init() removes the pinning and lets it schedule on all non
isolated cpus.

Add a new state which allows to enable those checks earlier and add it to
the xen do_poweroff() function.

No functional change.

Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20170516184736.196214622@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# ed067d4a 11-Apr-2017 Krzysztof Kozlowski <krzk@kernel.org>

linux/kernel.h: Add ALIGN_DOWN macro

Few parts of kernel define their own macro for aligning down so provide
a common define for this, with the same usage and assumptions as existing
ALIGN.

Convert also three existing implementations to this one.

Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>


# f51b17c8 17-Apr-2017 Baoquan He <bhe@redhat.com>

boot/param: Move next_arg() function to lib/cmdline.c for later reuse

next_arg() will be used to parse boot parameters in the x86/boot/compressed code,
so move it to lib/cmdline.c for better code reuse.

No change in functionality.

Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Cc: Jens Axboe <axboe@fb.com>
Cc: Jessica Yu <jeyu@redhat.com>
Cc: Johannes Berg <johannes.berg@intel.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dan.j.williams@intel.com
Cc: dave.jiang@intel.com
Cc: dyoung@redhat.com
Cc: keescook@chromium.org
Cc: zijun_hu <zijun_hu@htc.com>
Link: http://lkml.kernel.org/r/1492436099-4017-2-git-send-email-bhe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 4f5901f5 24-Feb-2017 Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>

linux/kernel.h: fix DIV_ROUND_CLOSEST to support negative divisors

While working on a thermal driver I encounter a scenario where the
divisor could be negative, instead of adding local code to handle this I
though I first try to add support for this in DIV_ROUND_CLOSEST.

Add support to DIV_ROUND_CLOSEST for negative divisors if both dividend
and divisor variable types are signed. This should not alter current
behavior for users of the macro as previously negative divisors where
not supported.

Before:

DIV_ROUND_CLOSEST( 59, 4) = 15
DIV_ROUND_CLOSEST( 59, -4) = -14
DIV_ROUND_CLOSEST( -59, 4) = -15
DIV_ROUND_CLOSEST( -59, -4) = 14

After:

DIV_ROUND_CLOSEST( 59, 4) = 15
DIV_ROUND_CLOSEST( 59, -4) = -15
DIV_ROUND_CLOSEST( -59, 4) = -15
DIV_ROUND_CLOSEST( -59, -4) = 15

[akpm@linux-foundation.org: fix comment, per Guenter]
Link: http://lkml.kernel.org/r/20161222102217.29011-1-niklas.soderlund+renesas@ragnatech.se
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 5eb7c0d0 01-Jan-2017 Larry Finger <Larry.Finger@lwfinger.net>

taint/module: Fix problems when out-of-kernel driver defines true or false

Commit 7fd8329ba502 ("taint/module: Clean up global and module taint
flags handling") used the key words true and false as character members
of a new struct. These names cause problems when out-of-kernel modules
such as VirtualBox include their own definitions of true and false.

Fixes: 7fd8329ba502 ("taint/module: Clean up global and module taint flags handling")
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Jessica Yu <jeyu@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Reported-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jessica Yu <jeyu@redhat.com>


# 7fd8329b 21-Sep-2016 Petr Mladek <pmladek@suse.com>

taint/module: Clean up global and module taint flags handling

The commit 66cc69e34e86a231 ("Fix: module signature vs tracepoints:
add new TAINT_UNSIGNED_MODULE") updated module_taint_flags() to
potentially print one more character. But it did not increase the
size of the corresponding buffers in m_show() and print_modules().

We have recently done the same mistake when adding a taint flag
for livepatching, see
https://lkml.kernel.org/r/cfba2c823bb984690b73572aaae1db596b54a082.1472137475.git.jpoimboe@redhat.com

Also struct module uses an incompatible type for mod-taints flags.
It survived from the commit 2bc2d61a9638dab670d ("[PATCH] list module
taint flags in Oops/panic"). There was used "int" for the global taint
flags at these times. But only the global tain flags was later changed
to "unsigned long" by the commit 25ddbb18aae33ad2 ("Make the taint
flags reliable").

This patch defines TAINT_FLAGS_COUNT that can be used to create
arrays and buffers of the right size. Note that we could not use
enum because the taint flag indexes are used also in assembly code.

Then it reworks the table that describes the taint flags. The TAINT_*
numbers can be used as the index. Instead, we add information
if the taint flag is also shown per-module.

Finally, it uses "unsigned long", bit operations, and the updated
taint_flags table also for mod->taints.

It is not optimal because only few taint flags can be printed by
module_taint_flags(). But better be on the safe side. IMHO, it is
not worth the optimization and this is a good compromise.

Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: http://lkml.kernel.org/r/1474458442-21581-1-git-send-email-pmladek@suse.com
[jeyu@redhat.com: fix broken lkml link in changelog]
Signed-off-by: Jessica Yu <jeyu@redhat.com>


# d3849953 01-Nov-2016 Christoph Hellwig <hch@lst.de>

fs: decouple READ and WRITE from the block layer ops

Move READ and WRITE to kernel.h and don't define them in terms of block
layer ops; they are our generic data direction indicators these days
and have no more resemblance with the block layer ops.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 3ca45a46 14-Oct-2016 zijun_hu <zijun_hu@htc.com>

percpu: ensure the requested alignment is power of two

The percpu allocator expectedly assumes that the requested alignment
is power of two but hasn't been veryfing the input. If the specified
alignment isn't power of two, the allocator can malfunction. Add the
sanity check.

The following is detailed analysis of the effects of alignments which
aren't power of two.

The alignment must be a even at least since the LSB of a chunk->map
element is used as free/in-use flag of a area; besides, the alignment
must be a power of 2 too since ALIGN() doesn't work well for other
alignment always but is adopted by pcpu_fit_in_area(). IOW, the
current allocator only works well for a power of 2 aligned area
allocation.

See below opposite example for why an odd alignment doesn't work.
Let's assume area [16, 36) is free but its previous one is in-use, we
want to allocate a @size == 8 and @align == 7 area. The larger area
[16, 36) is split to three areas [16, 21), [21, 29), [29, 36)
eventually. However, due to the usage for a chunk->map element, the
actual offset of the aim area [21, 29) is 21 but is recorded in
relevant element as 20; moreover, the residual tail free area [29,
36) is mistook as in-use and is lost silently

Unlike macro roundup(), ALIGN(x, a) doesn't work if @a isn't a power
of 2 for example, roundup(10, 6) == 12 but ALIGN(10, 6) == 10, and
the latter result isn't desired obviously.

tj: Code style and patch description updates.

Signed-off-by: zijun_hu <zijun_hu@htc.com>
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>


# 589a9785 07-Oct-2016 Johannes Berg <johannes.berg@intel.com>

min/max: remove sparse warnings when they're nested

Currently, when min/max are nested within themselves, sparse will warn:

warning: symbol '_min1' shadows an earlier one
originally declared here
warning: symbol '_min1' shadows an earlier one
originally declared here
warning: symbol '_min2' shadows an earlier one
originally declared here

This also immediately happens when min3() or max3() are used.

Since sparse implements __COUNTER__, we can use __UNIQUE_ID() to
generate unique variable names, avoiding this.

Link: http://lkml.kernel.org/r/1471519773-29882-1-git-send-email-johannes@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 9af6528e 13-Sep-2016 Peter Zijlstra <peterz@infradead.org>

sched/core: Optimize __schedule()

Oleg noted that by making do_exit() use __schedule() for the TASK_DEAD
context switch, we can avoid the TASK_DEAD special case currently in
__schedule() because that avoids the extra preempt_disable() from
schedule().

In order to facilitate this, create a do_task_dead() helper which we
place in the scheduler code, such that it can access __schedule().

Also add some __noreturn annotations to the functions, there's no
coming back from do_exit().

Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Cheng Chao <cs.os.kernel@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akpm@linux-foundation.org
Cc: chris@chris-wilson.co.uk
Cc: tj@kernel.org
Link: http://lkml.kernel.org/r/20160913163729.GB5012@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 9d5059c9 02-Aug-2016 Luis de Bethencourt <luisbg@osg.samsung.com>

dynamic_debug: only add header when used

kernel.h header doesn't directly use dynamic debug, instead we can
include it in module.c (which used it via kernel.h). printk.h only uses
it if CONFIG_DYNAMIC_DEBUG is on, changing the inclusion to only happen
in that case.

Link: http://lkml.kernel.org/r/1468429793-16917-1-git-send-email-luisbg@osg.samsung.com
[luisbg@osg.samsung.com: include dynamic_debug.h in drb_int.h]
Link: http://lkml.kernel.org/r/1468447828-18558-2-git-send-email-luisbg@osg.samsung.com
Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 088e9d25 02-Jun-2016 Daniel Bristot de Oliveira <bristot@redhat.com>

rcu: sysctl: Panic on RCU Stall

It is not always easy to determine the cause of an RCU stall just by
analysing the RCU stall messages, mainly when the problem is caused
by the indirect starvation of rcu threads. For example, when preempt_rcu
is not awakened due to the starvation of a timer softirq.

We have been hard coding panic() in the RCU stall functions for
some time while testing the kernel-rt. But this is not possible in
some scenarios, like when supporting customers.

This patch implements the sysctl kernel.panic_on_rcu_stall. If
set to 1, the system will panic() when an RCU stall takes place,
enabling the capture of a vmcore. The vmcore provides a way to analyze
all kernel/tasks states, helping out to point to the culprit and the
solution for the stall.

The kernel.panic_on_rcu_stall sysctl is disabled by default.

Changes from v1:
- Fixed a typo in the git log
- The if(sysctl_panic_on_rcu_stall) panic() is in a static function
- Fixed the CONFIG_TINY_RCU compilation issue
- The var sysctl_panic_on_rcu_stall is now __read_mostly

Cc: Jonathan Corbet <corbet@lwn.net>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Reviewed-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Tested-by: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com>
Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>


# 48a27055 19-May-2016 Rasmus Villemoes <linux@rasmusvillemoes.dk>

include/linux: apply __malloc attribute

Attach the malloc attribute to a few allocation functions. This helps
gcc generate better code by telling it that the return value doesn't
alias any existing pointers (which is even more valuable given the
pessimizations implied by -fno-strict-aliasing).

A simple example of what this allows gcc to do can be seen by looking at
the last part of drm_atomic_helper_plane_reset:

plane->state = kzalloc(sizeof(*plane->state), GFP_KERNEL);

if (plane->state) {
plane->state->plane = plane;
plane->state->rotation = BIT(DRM_ROTATE_0);
}

which compiles to

e8 99 bf d6 ff callq ffffffff8116d540 <kmem_cache_alloc_trace>
48 85 c0 test %rax,%rax
48 89 83 40 02 00 00 mov %rax,0x240(%rbx)
74 11 je ffffffff814015c4 <drm_atomic_helper_plane_reset+0x64>
48 89 18 mov %rbx,(%rax)
48 8b 83 40 02 00 00 mov 0x240(%rbx),%rax [*]
c7 40 40 01 00 00 00 movl $0x1,0x40(%rax)

With this patch applied, the instruction at [*] is elided, since the
store to plane->state->plane is known to not alter the value of
plane->state.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 3ed605bc 25-Apr-2016 Gustavo Padovan <gustavo.padovan@collabora.co.uk>

kernel.h: add u64_to_user_ptr()

This function had copies in 3 different files. Unify them in kernel.h.

Cc: Joe Perches <joe@perches.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Rob Clark <robdclark@gmail.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Acked-by: Daniel Vetter <daniel.vetter@intel.com> [drm/i915/]
Acked-by: Rob Clark <robdclark@gmail.com> [drm/msm/]
Acked-by: Lucas Stach <l.stach@pengutronix.de> [drm/etinav/]
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# ebc41f20 22-Mar-2016 Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>

panic: change nmi_panic from macro to function

Commit 1717f2096b54 ("panic, x86: Fix re-entrance problem due to panic
on NMI") and commit 58c5661f2144 ("panic, x86: Allow CPUs to save
registers even if looping in NMI context") introduced nmi_panic() which
prevents concurrent/recursive execution of panic(). It also saves
registers for the crash dump on x86.

However, there are some cases where NMI handlers still use panic().
This patch set partially replaces them with nmi_panic() in those cases.

Even this patchset is applied, some NMI or similar handlers (e.g. MCE
handler) continue to use panic(). This is because I can't test them
well and actual problems won't happen. For example, the possibility
that normal panic and panic on MCE happen simultaneously is very low.

This patch (of 3):

Convert nmi_panic() to a proper function and export it instead of
exporting internal implementation details to modules, for obvious
reasons.

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Cc: Javi Merino <javi.merino@arm.com>
Cc: Gobinda Charan Maji <gobinda.cemk07@gmail.com>
Cc: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 3debb0a9 22-Mar-2016 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

tracing: Fix trace_printk() to print when not using bprintk()

The trace_printk() code will allocate extra buffers if the compile detects
that a trace_printk() is used. To do this, the format of the trace_printk()
is saved to the __trace_printk_fmt section, and if that section is bigger
than zero, the buffers are allocated (along with a message that this has
happened).

If trace_printk() uses a format that is not a constant, and thus something
not guaranteed to be around when the print happens, the compiler optimizes
the fmt out, as it is not used, and the __trace_printk_fmt section is not
filled. This means the kernel will not allocate the special buffers needed
for the trace_printk() and the trace_printk() will not write anything to the
tracing buffer.

Adding a "__used" to the variable in the __trace_printk_fmt section will
keep it around, even though it is set to NULL. This will keep the string
from being printed in the debugfs/tracing/printk_formats section as it is
not needed.

Reported-by: Vlastimil Babka <vbabka@suse.cz>
Fixes: 07d777fe8c398 "tracing: Add percpu buffers for trace_printk()"
Cc: stable@vger.kernel.org # v3.5+
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# ef951599 17-Mar-2016 Kees Cook <keescook@chromium.org>

lib: move strtobool() to kstrtobool()

Create the kstrtobool_from_user() helper and move strtobool() logic into
the new kstrtobool() (matching all the other kstrto* functions).
Provides an inline wrapper for existing strtobool() callers.

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Joe Perches <joe@perches.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Amitkumar Karwar <akarwar@marvell.com>
Cc: Nishant Sarmukadam <nishants@marvell.com>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: Steve French <sfrench@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# b5d3755a 04-Mar-2016 Nicolas Dichtel <nicolas.dichtel@6wind.com>

uapi: define DIV_ROUND_UP for userland

DIV_ROUND_UP is defined in linux/kernel.h only for the kernel.
When ethtool.h is included by a userland app, we got the following error:

include/linux/ethtool.h:1218:8: error: variably modified 'queue_mask' at file scope
__u32 queue_mask[DIV_ROUND_UP(MAX_NUM_QUEUE, 32)];
^

Let's add a common definition in uapi and use it everywhere.

Fixes: ac2c7ad0e5d6 ("net/ethtool: introduce a new ioctl for per queue setting")
CC: Kan Liang <kan.liang@intel.com>
Suggested-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8f57e4d9 15-Jan-2016 Michal Nazarewicz <mina86@mina86.com>

include/linux/kernel.h: change abs() macro so it uses consistent return type

Rewrite abs() so that its return type does not depend on the
architecture and no unexpected type conversion happen inside of it. The
only conversion is from unsigned to signed type. char is left as a
return type but treated as a signed type regradless of it's actual
signedness.

With the old version, int arguments were promoted to long and depending
on architecture a long argument might result in s64 or long return type
(which may or may not be the same).

This came after some back and forth with Nicolas. The current macro has
different return type (for the same input type) depending on
architecture which might be midly iritating.

An alternative version would promote to int like so:

#define abs(x) __abs_choose_expr(x, long long, \
__abs_choose_expr(x, long, \
__builtin_choose_expr( \
sizeof(x) <= sizeof(int), \
({ int __x = (x); __x<0?-__x:__x; }), \
((void)0))))

I have no preference but imagine Linus might. :] Nicolas argument against
is that promoting to int causes iconsistent behaviour:

int main(void) {
unsigned short a = 0, b = 1, c = a - b;
unsigned short d = abs(a - b);
unsigned short e = abs(c);
printf("%u %u\n", d, e); // prints: 1 65535
}

Then again, no sane person expects consistent behaviour from C integer
arithmetic. ;)

Note:

__builtin_types_compatible_p(unsigned char, char) is always false, and
__builtin_types_compatible_p(signed char, char) is also always false.

Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Reviewed-by: Nicolas Pitre <nico@linaro.org>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 58c5661f 14-Dec-2015 Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>

panic, x86: Allow CPUs to save registers even if looping in NMI context

Currently, kdump_nmi_shootdown_cpus(), a subroutine of crash_kexec(),
sends an NMI IPI to CPUs which haven't called panic() to stop them,
save their register information and do some cleanups for crash dumping.
However, if such a CPU is infinitely looping in NMI context, we fail to
save its register information into the crash dump.

For example, this can happen when unknown NMIs are broadcast to all
CPUs as follows:

CPU 0 CPU 1
=========================== ==========================
receive an unknown NMI
unknown_nmi_error()
panic() receive an unknown NMI
spin_trylock(&panic_lock) unknown_nmi_error()
crash_kexec() panic()
spin_trylock(&panic_lock)
panic_smp_self_stop()
infinite loop
kdump_nmi_shootdown_cpus()
issue NMI IPI -----------> blocked until IRET
infinite loop...

Here, since CPU 1 is in NMI context, the second NMI from CPU 0 is
blocked until CPU 1 executes IRET. However, CPU 1 never executes IRET,
so the NMI is not handled and the callback function to save registers is
never called.

In practice, this can happen on some servers which broadcast NMIs to all
CPUs when the NMI button is pushed.

To save registers in this case, we need to:

a) Return from NMI handler instead of looping infinitely
or
b) Call the callback function directly from the infinite loop

Inherently, a) is risky because NMI is also used to prevent corrupted
data from being propagated to devices. So, we chose b).

This patch does the following:

1. Move the infinite looping of CPUs which haven't called panic() in NMI
context (actually done by panic_smp_self_stop()) outside of panic() to
enable us to refer pt_regs. Please note that panic_smp_self_stop() is
still used for normal context.

2. Call a callback of kdump_nmi_shootdown_cpus() directly to save
registers and do some cleanups after setting waiting_for_crash_ipi which
is used for counting down the number of CPUs which handled the callback

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Aaron Tomlin <atomlin@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Gobinda Charan Maji <gobinda.cemk07@gmail.com>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Javi Merino <javi.merino@arm.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: kexec@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Cc: lkml <linux-kernel@vger.kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Stefan Lippers-Hollmann <s.l-h@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Link: http://lkml.kernel.org/r/20151210014628.25437.75256.stgit@softrs
[ Cleanup comments, fixup formatting. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 1717f209 14-Dec-2015 Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>

panic, x86: Fix re-entrance problem due to panic on NMI

If panic on NMI happens just after panic() on the same CPU, panic() is
recursively called. Kernel stalls, as a result, after failing to acquire
panic_lock.

To avoid this problem, don't call panic() in NMI context if we've
already entered panic().

For that, introduce nmi_panic() macro to reduce code duplication. In
the case of panic on NMI, don't return from NMI handlers if another CPU
already panicked.

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Aaron Tomlin <atomlin@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Gobinda Charan Maji <gobinda.cemk07@gmail.com>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Javi Merino <javi.merino@arm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: kexec@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Cc: lkml <linux-kernel@vger.kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Link: http://lkml.kernel.org/r/20151210014626.25437.13302.stgit@softrs
[ Cleanup comments, fixup formatting. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 79211c8e 09-Nov-2015 Andrew Morton <akpm@linux-foundation.org>

remove abs64()

Switch everything to the new and more capable implementation of abs().
Mainly to give the new abs() a bit of a workout.

Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# c8299cb6 09-Nov-2015 Michal Nazarewicz <mina86@mina86.com>

kernel.h: make abs() work with 64-bit types

For 64-bit arguments, the abs macro casts it to an int which leads to
lost precision and may cause incorrect results. To deal with 64-bit
types abs64 macro has been introduced but still there are places where
abs macro is used incorrectly.

To deal with the problem, expand abs macro such that it operates on s64
type when dealing with 64-bit types while still returning long when
dealing with smaller types.

This fixes one known bug (per John):

The internal clocksteering done for fine-grained error correction uses a
: logarithmic approximation, so any time adjtimex() adjusts the clock
: steering, timekeeping_freqadjust() quickly approximates the correct clock
: frequency over a series of ticks.
:
: Unfortunately, the logic in timekeeping_freqadjust(), introduced in commit
: dc491596f639438 (Rework frequency adjustments to work better w/ nohz),
: used the abs() function with a s64 error value to calculate the size of
: the approximated adjustment to be made.
:
: Per include/linux/kernel.h: "abs() should not be used for 64-bit types
: (s64, u64, long long) - use abs64()".
:
: Thus on 32-bit platforms, this resulted in the clocksteering to take a
: quite dampended random walk trying to converge on the proper frequency,
: which caused the adjustments to be made much slower then intended (most
: easily observed when large adjustments are made).

Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Reported-by: John Stultz <john.stultz@linaro.org>
Tested-by: John Stultz <john.stultz@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 0a9df786 06-Nov-2015 Rasmus Villemoes <linux@rasmusvillemoes.dk>

lib/kasprintf.c: introduce kvasprintf_const

This adds kvasprintf_const which tries to use kstrdup_const if possible:
If the format string contains no % characters, or if the format string is
exactly "%s", we delegate to kstrdup_const. Otherwise, we fall back to
kvasprintf.

Just as for kstrdup_const, the main motivation is to save memory by
reusing .rodata when possible.

The return value should be freed by kfree_const, just like for
kstrdup_const.

There is deliberately no kasprintf_const: In the vast majority of cases,
the format string argument is a literal, so one can determine statically
whether one could instead use kstrdup_const directly (which would also
require one to change all corresponding kfree calls to kfree_const).

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 8db14860 17-Jul-2015 Nicolas Iooss <nicolas.iooss_linux@m4x.org>

include, lib: add __printf attributes to several function prototypes

Using __printf attributes helps to detect several format string issues
at compile time (even though -Wformat-security is currently disabled in
Makefile). For example it can detect when formatting a pointer as a
number, like the issue fixed in commit a3fa71c40f18 ("wl18xx: show
rx_frames_per_rates as an array as it really is"), or when the arguments
do not match the format string, c.f. for example commit 5ce1aca81435
("reiserfs: fix __RASSERT format string").

To prevent similar bugs in the future, add a __printf attribute to every
function prototype which needs one in include/linux/ and lib/. These
functions were mostly found by using gcc's -Wsuggest-attribute=format
flag.

Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 5375b708 30-Jun-2015 HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>

kernel/panic/kexec: fix "crash_kexec_post_notifiers" option issue in oops path

Commit f06e5153f4ae2e ("kernel/panic.c: add "crash_kexec_post_notifiers"
option for kdump after panic_notifers") introduced
"crash_kexec_post_notifiers" kernel boot option, which toggles wheather
panic() calls crash_kexec() before panic_notifiers and dump kmsg or after.

The problem is that the commit overlooks panic_on_oops kernel boot option.
If it is enabled, crash_kexec() is called directly without going through
panic() in oops path.

To fix this issue, this patch adds a check to "crash_kexec_post_notifiers"
in the condition of kexec_should_crash().

Also, put a comment in kexec_should_crash() to explain not obvious things
on this patch.

Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Acked-by: Baoquan He <bhe@redhat.com>
Tested-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 3c6296f7 28-May-2015 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

ring-buffer: Remove useless unused tracing_off_permanent()

The tracing_off_permanent() call is a way to disable all ring_buffers.
Nothing uses it and nothing should use it, as tracing_off() and
friends are better, as they disable the ring buffers related to
tracing. The tracing_off_permanent() even disabled non tracing
ring buffers. This is a bit drastic, and was added to handle NMIs
doing outputs that could corrupt the ring buffer when only tracing
used them. It is now obsolete and adds a little overhead, it should
be removed.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 28b8d0c8 26-May-2015 Gobinda Charan Maji <gobinda.cemk07@gmail.com>

sysfs: tightened sysfs permission checks

There were some inconsistency in restriction to VERIFY_OCTAL_PERMISSIONS().
Previously the test was "User perms >= group perms >= other perms". The
permission field of User, Group or Other consists of three bits. LSB is
EXECUTE permission, MSB is READ permission and the middle bit is WRITE
permission. But logically WRITE is "more privileged" than READ.

Say for example, permission value is "0430". Here User has only READ
permission whereas Group has both WRITE and EXECUTE permission.

So, the checks could be tightened and the tests are separated to
USER_READABLE >= GROUP_READABLE >= OTHER_READABLE,
USER_WRITABLE >= GROUP_WRITABLE and OTHER_WRITABLE is not permitted.

Signed-off-by: Gobinda Charan Maji <gobinda.cemk07@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# 9ec23531 11-May-2015 David Hildenbrand <dahi@linux.vnet.ibm.com>

sched/preempt, mm/fault: Trigger might_sleep() in might_fault() with disabled pagefaults

Commit 662bbcb2747c ("mm, sched: Allow uaccess in atomic with
pagefault_disable()") removed might_sleep() checks for all user access
code (that uses might_fault()).

The reason was to disable wrong "sleep in atomic" warnings in the
following scenario:

pagefault_disable()
rc = copy_to_user(...)
pagefault_enable()

Which is valid, as pagefault_disable() increments the preempt counter
and therefore disables the pagefault handler. copy_to_user() will not
sleep and return an error code if a page is not available.

However, as all might_sleep() checks are removed,
CONFIG_DEBUG_ATOMIC_SLEEP would no longer detect the following scenario:

spin_lock(&lock);
rc = copy_to_user(...)
spin_unlock(&lock)

If the kernel is compiled with preemption turned on, preempt_disable()
will make in_atomic() detect disabled preemption. The fault handler would
correctly never sleep on user access.
However, with preemption turned off, preempt_disable() is usually a NOP
(with !CONFIG_PREEMPT_COUNT), therefore in_atomic() will not be able to
detect disabled preemption nor disabled pagefaults. The fault handler
could sleep.
We really want to enable CONFIG_DEBUG_ATOMIC_SLEEP checks for user access
functions again, otherwise we can end up with horrible deadlocks.

Root of all evil is that pagefault_disable() acts almost as
preempt_disable(), depending on preemption being turned on/off.

As we now have pagefault_disabled(), we can use it to distinguish
whether user acces functions might sleep.

Convert might_fault() into a makro that calls __might_fault(), to
allow proper file + line messages in case of a might_sleep() warning.

Reviewed-and-tested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: David.Laight@ACULAB.COM
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: airlied@linux.ie
Cc: akpm@linux-foundation.org
Cc: benh@kernel.crashing.org
Cc: bigeasy@linutronix.de
Cc: borntraeger@de.ibm.com
Cc: daniel.vetter@intel.com
Cc: heiko.carstens@de.ibm.com
Cc: herbert@gondor.apana.org.au
Cc: hocko@suse.cz
Cc: hughd@google.com
Cc: mst@redhat.com
Cc: paulus@samba.org
Cc: ralf@linux-mips.org
Cc: schwidefsky@de.ibm.com
Cc: yang.shi@windriver.com
Link: http://lkml.kernel.org/r/1431359540-32227-3-git-send-email-dahi@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# f766093e 16-Apr-2015 Javi Merino <javi.merino@kernel.org>

kernel.h: implement DIV_ROUND_CLOSEST_ULL

We have grown a number of different implementations of
DIV_ROUND_CLOSEST_ULL throughout the kernel. Move the i915 one to
kernel.h so that it can be reused.

Signed-off-by: Javi Merino <javi.merino@arm.com>
Reviewed-by: Jeff Epler <jepler@unpythonic.net>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Guenter Roeck <linux@roeck-us.net>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Alex Elder <elder@linaro.org>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Javi Merino <javi.merino@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Turquette <mturquette@linaro.org>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 02f1f217 12-Feb-2015 Rasmus Villemoes <linux@rasmusvillemoes.dk>

kernel.h: remove ancient __FUNCTION__ hack

__FUNCTION__ hasn't been treated as a string literal since gcc 3.4, so
this only helps people who only test-compile using 3.3 (compiler-gcc3.h
barks at anything older than that). Besides, there are almost no
occurrences of __FUNCTION__ left in the tree.

[akpm@linux-foundation.org: convert remaining __FUNCTION__ references]
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 00845eb9 01-Feb-2015 Linus Torvalds <torvalds@linux-foundation.org>

sched: don't cause task state changes in nested sleep debugging

Commit 8eb23b9f35aa ("sched: Debug nested sleeps") added code to report
on nested sleep conditions, which we generally want to avoid because the
inner sleeping operation can re-set the thread state to TASK_RUNNING,
but that will then cause the outer sleep loop not actually sleep when it
calls schedule.

However, that's actually valid traditional behavior, with the inner
sleep being some fairly rare case (like taking a sleeping lock that
normally doesn't actually need to sleep).

And the debug code would actually change the state of the task to
TASK_RUNNING internally, which makes that kind of traditional and
working code not work at all, because now the nested sleep doesn't just
sometimes cause the outer one to not block, but will cause it to happen
every time.

In particular, it will cause the cardbus kernel daemon (pccardd) to
basically busy-loop doing scheduling, converting a laptop into a heater,
as reported by Bruno Prémont. But there may be other legacy uses of
that nested sleep model in other drivers that are also likely to never
get converted to the new model.

This fixes both cases:

- don't set TASK_RUNNING when the nested condition happens (note: even
if WARN_ONCE() only _warns_ once, the return value isn't whether the
warning happened, but whether the condition for the warning was true.
So despite the warning only happening once, the "if (WARN_ON(..))"
would trigger for every nested sleep.

- in the cases where we knowingly disable the warning by using
"sched_annotate_sleep()", don't change the task state (that is used
for all core scheduling decisions), instead use '->task_state_change'
that is used for the debugging decision itself.

(Credit for the second part of the fix goes to Oleg Nesterov: "Can't we
avoid this subtle change in behaviour DEBUG_ATOMIC_SLEEP adds?" with the
suggested change to use 'task_state_change' as part of the test)

Reported-and-bisected-by: Bruno Prémont <bonbons@linux-vserver.org>
Tested-by: Rafael J Wysocki <rjw@rjwysocki.net>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>,
Cc: Ilya Dryomov <ilya.dryomov@inktank.com>,
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Hurley <peter@hurleysoftware.com>,
Cc: Davidlohr Bueso <dave@stgolabs.net>,
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# c5f45465 16-Dec-2014 Seth Jennings <sjenning@redhat.com>

livepatch: kernel: add TAINT_LIVEPATCH

This adds a new taint flag to indicate when the kernel or a kernel
module has been live patched. This will provide a clean indication in
bug reports that live patching was used.

Additionally, if the crash occurs in a live patched function, the live
patch module will appear beside the patched function in the backtrace.

Signed-off-by: Seth Jennings <sjenning@redhat.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Petr Mladek <pmladek@suse.cz>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>


# 9e3961a0 10-Dec-2014 Prarit Bhargava <prarit@redhat.com>

kernel: add panic_on_warn

There have been several times where I have had to rebuild a kernel to
cause a panic when hitting a WARN() in the code in order to get a crash
dump from a system. Sometimes this is easy to do, other times (such as
in the case of a remote admin) it is not trivial to send new images to
the user.

A much easier method would be a switch to change the WARN() over to a
panic. This makes debugging easier in that I can now test the actual
image the WARN() was seen on and I do not have to engage in remote
debugging.

This patch adds a panic_on_warn kernel parameter and
/proc/sys/kernel/panic_on_warn calls panic() in the
warn_slowpath_common() path. The function will still print out the
location of the warning.

An example of the panic_on_warn output:

The first line below is from the WARN_ON() to output the WARN_ON()'s
location. After that the panic() output is displayed.

WARNING: CPU: 30 PID: 11698 at /home/prarit/dummy_module/dummy-module.c:25 init_dummy+0x1f/0x30 [dummy_module]()
Kernel panic - not syncing: panic_on_warn set ...

CPU: 30 PID: 11698 Comm: insmod Tainted: G W OE 3.17.0+ #57
Hardware name: Intel Corporation S2600CP/S2600CP, BIOS RMLSDP.86I.00.29.D696.1311111329 11/11/2013
0000000000000000 000000008e3f87df ffff88080f093c38 ffffffff81665190
0000000000000000 ffffffff818aea3d ffff88080f093cb8 ffffffff8165e2ec
ffffffff00000008 ffff88080f093cc8 ffff88080f093c68 000000008e3f87df
Call Trace:
[<ffffffff81665190>] dump_stack+0x46/0x58
[<ffffffff8165e2ec>] panic+0xd0/0x204
[<ffffffffa038e05f>] ? init_dummy+0x1f/0x30 [dummy_module]
[<ffffffff81076b90>] warn_slowpath_common+0xd0/0xd0
[<ffffffffa038e040>] ? dummy_greetings+0x40/0x40 [dummy_module]
[<ffffffff81076c8a>] warn_slowpath_null+0x1a/0x20
[<ffffffffa038e05f>] init_dummy+0x1f/0x30 [dummy_module]
[<ffffffff81002144>] do_one_initcall+0xd4/0x210
[<ffffffff811b52c2>] ? __vunmap+0xc2/0x110
[<ffffffff810f8889>] load_module+0x16a9/0x1b30
[<ffffffff810f3d30>] ? store_uevent+0x70/0x70
[<ffffffff810f49b9>] ? copy_module_from_fd.isra.44+0x129/0x180
[<ffffffff810f8ec6>] SyS_finit_module+0xa6/0xd0
[<ffffffff8166cf29>] system_call_fastpath+0x12/0x17

Successfully tested by me.

hpa said: There is another very valid use for this: many operators would
rather a machine shuts down than being potentially compromised either
functionally or security-wise.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# e1c2296c 16-Oct-2014 Peter Hurley <peter@hurleysoftware.com>

tty: Move session_of_pgrp() and make static

tiocspgrp() is the lone caller of session_of_pgrp(); relocate and
limit to file scope.

Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Reviewed-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 3427445a 24-Sep-2014 Peter Zijlstra <peterz@infradead.org>

sched: Exclude cond_resched() from nested sleep test

cond_resched() is a preemption point, not strictly a blocking
primitive, so exclude it from the ->state test.

In particular, preemption preserves task_struct::state.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: tglx@linutronix.de
Cc: ilya.dryomov@inktank.com
Cc: umgwanakikbuti@gmail.com
Cc: oleg@redhat.com
Cc: Alex Elder <alex.elder@linaro.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Axel Lin <axel.lin@ingics.com>
Cc: Daniel Borkmann <dborkman@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20140924082242.656559952@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 1029a2b5 24-Sep-2014 Peter Zijlstra <peterz@infradead.org>

sched, exit: Deal with nested sleeps

do_wait() is a big wait loop, but we set TASK_RUNNING too late; we end
up calling potential sleeps before we reset it.

Not strictly a bug since we're guaranteed to exit the loop and not
call schedule(); put in annotations to quiet might_sleep().

WARNING: CPU: 0 PID: 1 at ../kernel/sched/core.c:7123 __might_sleep+0x7e/0x90()
do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff8109a788>] do_wait+0x88/0x270

Call Trace:
[<ffffffff81694991>] dump_stack+0x4e/0x7a
[<ffffffff8109877c>] warn_slowpath_common+0x8c/0xc0
[<ffffffff8109886c>] warn_slowpath_fmt+0x4c/0x50
[<ffffffff810bca6e>] __might_sleep+0x7e/0x90
[<ffffffff811a1c15>] might_fault+0x55/0xb0
[<ffffffff8109a3fb>] wait_consider_task+0x90b/0xc10
[<ffffffff8109a804>] do_wait+0x104/0x270
[<ffffffff8109b837>] SyS_wait4+0x77/0x100
[<ffffffff8169d692>] system_call_fastpath+0x16/0x1b

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: tglx@linutronix.de
Cc: umgwanakikbuti@gmail.com
Cc: ilya.dryomov@inktank.com
Cc: Alex Elder <alex.elder@linaro.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Axel Lin <axel.lin@ingics.com>
Cc: Daniel Borkmann <dborkman@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Guillaume Morin <guillaume@morinfr.org>
Cc: Ionut Alexa <ionut.m.alexa@gmail.com>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Michal Schmidt <mschmidt@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20140924082242.186408915@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 3db2e9cd 13-Oct-2014 Daniel Walter <dwalter@google.com>

include/linux: remove strict_strto* definitions

Remove obsolete and unused strict_strto* functions

Signed-off-by: Daniel Walter <dwalter@google.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# c185b07f 09-Oct-2014 Michal Nazarewicz <mina86@mina86.com>

include/linux/kernel.h: deduplicate code implementing clamp* macros

Instead of open-coding clamp_t macro min_t and max_t the way clamp macro
does and instead of open-coding clamp_val simply use clamp_t.
Furthermore, normalise argument naming in the macros to be lo and hi.

Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Cc: Mark Rustad <mark.d.rustad@intel.com>
Cc: "Kirsher, Jeffrey T" <jeffrey.t.kirsher@intel.com>
Cc: Hagen Paul Pfeifer <hagen@jauu.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 2e1d06e1 09-Oct-2014 Michal Nazarewicz <mina86@mina86.com>

include/linux/kernel.h: rewrite min3, max3 and clamp using min and max

It appears that gcc is better at optimising a double call to min and max
rather than open coded min3 and max3. This can be observed here:

$ cat min-max.c
#define min(x, y) ({ \
typeof(x) _min1 = (x); \
typeof(y) _min2 = (y); \
(void) (&_min1 == &_min2); \
_min1 < _min2 ? _min1 : _min2; })
#define min3(x, y, z) ({ \
typeof(x) _min1 = (x); \
typeof(y) _min2 = (y); \
typeof(z) _min3 = (z); \
(void) (&_min1 == &_min2); \
(void) (&_min1 == &_min3); \
_min1 < _min2 ? (_min1 < _min3 ? _min1 : _min3) : \
(_min2 < _min3 ? _min2 : _min3); })

int fmin3(int x, int y, int z) { return min3(x, y, z); }
int fmin2(int x, int y, int z) { return min(min(x, y), z); }

$ gcc -O2 -o min-max.s -S min-max.c; cat min-max.s
.file "min-max.c"
.text
.p2align 4,,15
.globl fmin3
.type fmin3, @function
fmin3:
.LFB0:
.cfi_startproc
cmpl %esi, %edi
jl .L5
cmpl %esi, %edx
movl %esi, %eax
cmovle %edx, %eax
ret
.p2align 4,,10
.p2align 3
.L5:
cmpl %edi, %edx
movl %edi, %eax
cmovle %edx, %eax
ret
.cfi_endproc
.LFE0:
.size fmin3, .-fmin3
.p2align 4,,15
.globl fmin2
.type fmin2, @function
fmin2:
.LFB1:
.cfi_startproc
cmpl %edi, %esi
movl %edx, %eax
cmovle %esi, %edi
cmpl %edx, %edi
cmovle %edi, %eax
ret
.cfi_endproc
.LFE1:
.size fmin2, .-fmin2
.ident "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
.section .note.GNU-stack,"",@progbits

fmin3 function, which uses open-coded min3 macro, is compiled into total
of ten instructions including a conditional branch, whereas fmin2
function, which uses two calls to min2 macro, is compiled into six
instructions with no branches.

Similarly, open-coded clamp produces the same code as clamp using min and
max macros, but the latter is much shorter:

$ cat clamp.c
#define clamp(val, min, max) ({ \
typeof(val) __val = (val); \
typeof(min) __min = (min); \
typeof(max) __max = (max); \
(void) (&__val == &__min); \
(void) (&__val == &__max); \
__val = __val < __min ? __min: __val; \
__val > __max ? __max: __val; })
#define min(x, y) ({ \
typeof(x) _min1 = (x); \
typeof(y) _min2 = (y); \
(void) (&_min1 == &_min2); \
_min1 < _min2 ? _min1 : _min2; })
#define max(x, y) ({ \
typeof(x) _max1 = (x); \
typeof(y) _max2 = (y); \
(void) (&_max1 == &_max2); \
_max1 > _max2 ? _max1 : _max2; })

int fclamp(int v, int min, int max) { return clamp(v, min, max); }
int fclampmm(int v, int min, int max) { return min(max(v, min), max); }

$ gcc -O2 -o clamp.s -S clamp.c; cat clamp.s
.file "clamp.c"
.text
.p2align 4,,15
.globl fclamp
.type fclamp, @function
fclamp:
.LFB0:
.cfi_startproc
cmpl %edi, %esi
movl %edx, %eax
cmovge %esi, %edi
cmpl %edx, %edi
cmovle %edi, %eax
ret
.cfi_endproc
.LFE0:
.size fclamp, .-fclamp
.p2align 4,,15
.globl fclampmm
.type fclampmm, @function
fclampmm:
.LFB1:
.cfi_startproc
cmpl %edi, %esi
cmovge %esi, %edi
cmpl %edi, %edx
movl %edi, %eax
cmovle %edx, %eax
ret
.cfi_endproc
.LFE1:
.size fclampmm, .-fclampmm
.ident "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
.section .note.GNU-stack,"",@progbits

Linux mpn-glaptop 3.13.0-29-generic #53~precise1-Ubuntu SMP Wed Jun 4 22:06:25 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
Copyright (C) 2011 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

-rwx------ 1 mpn eng 51224656 Jun 17 14:15 vmlinux.before
-rwx------ 1 mpn eng 51224608 Jun 17 13:57 vmlinux.after

48 bytes reduction. The do_fault_around was a few instruction shorter
and as far as I can tell saved 12 bytes on the stack, i.e.:

$ grep -e rsp -e pop -e push do_fault_around.*
do_fault_around.before.s:push %rbp
do_fault_around.before.s:mov %rsp,%rbp
do_fault_around.before.s:push %r13
do_fault_around.before.s:push %r12
do_fault_around.before.s:push %rbx
do_fault_around.before.s:sub $0x38,%rsp
do_fault_around.before.s:add $0x38,%rsp
do_fault_around.before.s:pop %rbx
do_fault_around.before.s:pop %r12
do_fault_around.before.s:pop %r13
do_fault_around.before.s:pop %rbp

do_fault_around.after.s:push %rbp
do_fault_around.after.s:mov %rsp,%rbp
do_fault_around.after.s:push %r12
do_fault_around.after.s:push %rbx
do_fault_around.after.s:sub $0x30,%rsp
do_fault_around.after.s:add $0x30,%rsp
do_fault_around.after.s:pop %rbx
do_fault_around.after.s:pop %r12
do_fault_around.after.s:pop %rbp

or here side-by-side:

Before After
push %rbp push %rbp
mov %rsp,%rbp mov %rsp,%rbp
push %r13
push %r12 push %r12
push %rbx push %rbx
sub $0x38,%rsp sub $0x30,%rsp
add $0x38,%rsp add $0x30,%rsp
pop %rbx pop %rbx
pop %r12 pop %r12
pop %r13
pop %rbp pop %rbp

There are also fewer branches:

$ grep ^j do_fault_around.*
do_fault_around.before.s:jae ffffffff812079b7
do_fault_around.before.s:jmp ffffffff812079c5
do_fault_around.before.s:jmp ffffffff81207a14
do_fault_around.before.s:ja ffffffff812079f9
do_fault_around.before.s:jb ffffffff81207a10
do_fault_around.before.s:jmp ffffffff81207a63
do_fault_around.before.s:jne ffffffff812079df

do_fault_around.after.s:jmp ffffffff812079fd
do_fault_around.after.s:ja ffffffff812079e2
do_fault_around.after.s:jb ffffffff812079f9
do_fault_around.after.s:jmp ffffffff81207a4c
do_fault_around.after.s:jne ffffffff812079c8

And here's with allyesconfig on a different machine:

$ uname -a; gcc --version; ls -l vmlinux.*
Linux erwin 3.14.7-mn #54 SMP Sun Jun 15 11:25:08 CEST 2014 x86_64 AMD Phenom(tm) II X3 710 Processor AuthenticAMD GNU/Linux
gcc (GCC) 4.8.3
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

-rwx------ 1 mpn eng 437027411 Jun 20 16:04 vmlinux.before
-rwx------ 1 mpn eng 437026881 Jun 20 15:30 vmlinux.after

530 bytes reduction.

Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Hagen Paul Pfeifer <hagen@jauu.net>
Cc: David Rientjes <rientjes@google.com>
Cc: "Rustad, Mark D" <mark.d.rustad@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 6ccc72b8 14-Aug-2014 Dave Young <dyoung@redhat.com>

lib: Add a generic cmdline parse function parse_option_str

There should be a generic function to parse params like a=b,c
Adding parse_option_str in lib/cmdline.c which will return true
if there's specified option set in the params.

Also updated efi=old_map parsing code to use the new function

Signed-off-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>


# 53d91c5c 16-Sep-2014 David Howells <dhowells@redhat.com>

Provide a binary to hex conversion function

Provide a function to convert a buffer of binary data into an unterminated
ascii hex string representation of that data.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>


# 69361eef 08-Aug-2014 Josh Hunt <johunt@akamai.com>

panic: add TAINT_SOFTLOCKUP

This taint flag will be set if the system has ever entered a softlockup
state. Similar to TAINT_WARN it is useful to know whether or not the
system has been in a softlockup state when debugging.

[akpm@linux-foundation.org: apply the taint before calling panic()]
Signed-off-by: Josh Hunt <johunt@akamai.com>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 087face5 06-Aug-2014 Joe Perches <joe@perches.com>

kernel.h: remove deprecated pack_hex_byte

It's been nearly 3 years now since commit 55036ba76b2d ("lib: rename
pack_hex_byte() to hex_byte_pack()") so it's time to remove this
deprecated and unused static inline.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 37549e94 26-Jul-2014 Rusty Russell <rusty@rustcorp.com.au>

sysfs: disallow world-writable files.

This check was introduced in 2006 by Alexey Dobriyan (9774a1f54f173)
for module parameters; we removed it when we unified the check into
VERIFY_OCTAL_PERMISSIONS() as sysfs didn't have the same requirement.
Now all those users are fixed, reintroduce it.

Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# a69f5edb 24-Jun-2014 Joe Perches <joe@perches.com>

mac_pton: Use bool not int return

Use bool instead of int as the return type.

All uses are tested with !.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 58f86cc8 23-Mar-2014 Rusty Russell <rusty@rustcorp.com.au>

VERIFY_OCTAL_PERMISSIONS: stricter checking for sysfs perms.

Summary of http://lkml.org/lkml/2014/3/14/363 :

Ted: module_param(queue_depth, int, 444)
Joe: 0444!
Rusty: User perms >= group perms >= other perms?
Joe: CLASS_ATTR, DEVICE_ATTR, SENSOR_ATTR and SENSOR_ATTR_2?

Side effect of stricter permissions means removing the unnecessary
S_IFREG from several callers.

Note that the BUILD_BUG_ON_ZERO((perm) & 2) test was removed: a fair
number of drivers fail this test, so that will be the debate for a
future patch.

Suggested-by: Joe Perches <joe@perches.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com> for drivers/pci/slot.c
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# 8c90487c 26-Feb-2014 Dave Jones <davej@redhat.com>

Rename TAINT_UNSAFE_SMP to TAINT_CPU_OUT_OF_SPEC

Rename TAINT_UNSAFE_SMP to TAINT_CPU_OUT_OF_SPEC, so we can repurpose
the flag to encompass a wider range of pushing the CPU beyond its
warrany.

Signed-off-by: Dave Jones <davej@fedoraproject.org>
Link: http://lkml.kernel.org/r/20140226154949.GA770@redhat.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>


# 66cc69e3 12-Mar-2014 Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

Fix: module signature vs tracepoints: add new TAINT_UNSIGNED_MODULE

Users have reported being unable to trace non-signed modules loaded
within a kernel supporting module signature.

This is caused by tracepoint.c:tracepoint_module_coming() refusing to
take into account tracepoints sitting within force-loaded modules
(TAINT_FORCED_MODULE). The reason for this check, in the first place, is
that a force-loaded module may have a struct module incompatible with
the layout expected by the kernel, and can thus cause a kernel crash
upon forced load of that module on a kernel with CONFIG_TRACEPOINTS=y.

Tracepoints, however, specifically accept TAINT_OOT_MODULE and
TAINT_CRAP, since those modules do not lead to the "very likely system
crash" issue cited above for force-loaded modules.

With kernels having CONFIG_MODULE_SIG=y (signed modules), a non-signed
module is tainted re-using the TAINT_FORCED_MODULE taint flag.
Unfortunately, this means that Tracepoints treat that module as a
force-loaded module, and thus silently refuse to consider any tracepoint
within this module.

Since an unsigned module does not fit within the "very likely system
crash" category of tainting, add a new TAINT_UNSIGNED_MODULE taint flag
to specifically address this taint behavior, and accept those modules
within Tracepoints. We use the letter 'X' as a taint flag character for
a module being loaded that doesn't know how to sign its name (proposed
by Steven Rostedt).

Also add the missing 'O' entry to trace event show_module_flags() list
for the sake of completeness.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
NAKed-by: Ingo Molnar <mingo@redhat.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: David Howells <dhowells@redhat.com>
CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# 89a07141 23-Jan-2014 Alex Elder <alex.elder@linaro.org>

kernel.h: define u8, s8, u32, etc. limits

Create constants that define the maximum and minimum values
representable by the kernel types u8, s8, u16, s16, and so on.

Signed-off-by: Alex Elder <elder@linaro.org>
Cc: Sage Weil <sage@inktank.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 89770b0a 21-Jan-2014 Daniel Borkmann <daniel@iogearbox.net>

net: introduce reciprocal_scale helper and convert users

As David Laight suggests, we shouldn't necessarily call this
reciprocal_divide() when users didn't requested a reciprocal_value();
lets keep the basic idea and call it reciprocal_scale(). More
background information on this topic can be found in [1].

Joint work with Hannes Frederic Sowa.

[1] http://homepage.cs.uiowa.edu/~jones/bcd/divide.html

Suggested-by: David Laight <david.laight@aculab.com>
Cc: Jakub Zawadzki <darkjames-ws@darkjames.pl>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 386e7906 12-Dec-2013 Axel Lin <axel.lin@ingics.com>

include/linux/kernel.h: make might_fault() a nop for !MMU

The machine cannot fault if !MUU, so make might_fault() a nop for !MMU.

This fixes below build error if
!CONFIG_MMU && (CONFIG_PROVE_LOCKING=y || CONFIG_DEBUG_ATOMIC_SLEEP=y):

arch/arm/kernel/built-in.o: In function `arch_ptrace':
arch/arm/kernel/ptrace.c:852: undefined reference to `might_fault'
arch/arm/kernel/built-in.o: In function `restore_sigframe':
arch/arm/kernel/signal.c:173: undefined reference to `might_fault'
...
arch/arm/kernel/built-in.o:arch/arm/kernel/signal.c:177: more undefined references to `might_fault' follow
make: *** [vmlinux] Error 1

Signed-off-by: Axel Lin <axel.lin@ingics.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 5800dc3c 25-Nov-2013 Jason Baron <jbaron@akamai.com>

panic: Make panic_timeout configurable

The panic_timeout value can be set via the command line option
'panic=x', or via /proc/sys/kernel/panic, however that is not
sufficient when the panic occurs before we are able to set up
these values. Thus, add a CONFIG_PANIC_TIMEOUT so that we can
set the desired value from the .config.

The default panic_timeout value continues to be 0 - wait
forever. Also adds set_arch_panic_timeout(new_timeout,
arch_default_timeout), which is intended to be used by arches in
arch_setup(). The idea being that the new_timeout is only set if
the user hasn't changed from the arch_default_timeout.

Signed-off-by: Jason Baron <jbaron@akamai.com>
Cc: benh@kernel.crashing.org
Cc: paulus@samba.org
Cc: ralf@linux-mips.org
Cc: mpe@ellerman.id.au
Cc: felipe.contreras@gmail.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1a1674daec27c534df409697025ac568ebcee91e.1385418410.git.jbaron@akamai.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 042b10d8 05-Nov-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

tracing: Remove unused function ftrace_off_permanent()

In the past, ftrace_off_permanent() was called if something
strange was detected. But the ftrace_bug() now handles all the
anomolies that can happen with ftrace (function tracing), and there
are no uses of ftrace_off_permanent(). Get rid of it.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# c26d436c 13-Sep-2013 Andre Naujoks <nautsch2@gmail.com>

lib: introduce upper case hex ascii helpers

To be able to use the hex ascii functions in case sensitive environments
the array hex_asc_upper[] and the needed functions for hex_byte_pack_upper()
are introduced.

Signed-off-by: Andre Naujoks <nautsch2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e67bc51e 02-Aug-2013 Dhaval Giani <dhaval.giani@gmail.com>

tracing: Fix trace_dump_stack() proto when CONFIG_TRACING is not set

When CONFIG_TRACING is not enabled, the stub prototype for trace_dump_stack()
is incorrect. It has (void) when it should be (int).

Link: http://lkml.kernel.org/r/CAPhKKr_H=ukFnBL4WgDOVT5ay2xeF-Ho+CA0DWZX0E2JW-=vSQ@mail.gmail.com

Signed-off-by: Dhaval Giani <dhaval.giani@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 4cd5773a 04-Jun-2013 Andy Shevchenko <andy.shevchenko@gmail.com>

net: core: move mac_pton() to lib/net_utils.c

Since we have at least one user of this function outside of CONFIG_NET
scope, we have to provide this function independently. The proposed
solution is to move it under lib/net_utils.c with corresponding
configuration variable and select wherever it is needed.

Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reported-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# bcf312cf 02-May-2013 Steven Rostedt <rostedt@goodmis.org>

tracing: Put trace_puts() comment above trace_puts() macro for kernel doc

Kernel-doc gives the following warning:

DOCPROC Documentation/DocBook/kernel-api.xml
Warning(/include/linux/kernel.h:590): No description found for parameter 'ip'
Warning(/include/linux/kernel.h:590): No description found for parameter 'ip'

Due to the externs between the the comment and the trace_puts() macro.
This is fixed by moving the externs below the macro and keeping the
macro and comment directly together.

Reported-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>


# 662bbcb2 26-May-2013 Michael S. Tsirkin <mst@redhat.com>

mm, sched: Allow uaccess in atomic with pagefault_disable()

This changes might_fault() so that it does not
trigger a false positive diagnostic for e.g. the following
sequence:

spin_lock_irqsave()
pagefault_disable()
copy_to_user()
pagefault_enable()
spin_unlock_irqrestore()

In particular vhost wants to do this, to call
socket ops from under a lock.

There are 3 cases to consider:

- CONFIG_PROVE_LOCKING - might_fault is non-inline
so it's easy to move the in_atomic test to fix
up the false positive warning.

- CONFIG_DEBUG_ATOMIC_SLEEP - might_fault
is currently inline, but we are calling a
non-inline __might_sleep anyway,
so let's use the non-line version of might_fault
that does the right thing.

- !CONFIG_DEBUG_ATOMIC_SLEEP && !CONFIG_PROVE_LOCKING
__might_sleep is a nop so might_fault is a nop.

Make this explicit.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1369577426-26721-11-git-send-email-mst@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 114276ac 26-May-2013 Michael S. Tsirkin <mst@redhat.com>

mm, sched: Drop voluntary schedule from might_fault()

might_fault() is called from functions like copy_to_user()
which most callers expect to be very fast, like a couple of
instructions.

So functions like memcpy_toiovec() call them many times in a loop.

But might_fault() calls might_sleep() and with CONFIG_PREEMPT_VOLUNTARY
this results in a function call.

Let's not do this - just call __might_sleep() that produces
a diagnostic for sleep within atomic, but drop
might_preempt().

Here's a test sending traffic between the VM and the host,
host is built with CONFIG_PREEMPT_VOLUNTARY:

before:
incoming: 7122.77 Mb/s
outgoing: 8480.37 Mb/s

after:
incoming: 8619.24 Mb/s
outgoing: 9455.42 Mb/s

As a side effect, this fixes an issue pointed
out by Ingo: might_fault might schedule differently
depending on PROVE_LOCKING. Now there's no
preemption point in both cases, so it's consistent.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1369577426-26721-10-git-send-email-mst@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 7450231f 24-May-2013 Randy Dunlap <rdunlap@infradead.org>

linux/kernel.h: fix kernel-doc warning

Fix kernel-doc warning in <linux/kernel.h>:

Warning(include/linux/kernel.h:590): No description found for parameter 'ip'

scripts/kernel-doc cannot handle macros, functions, or function
prototypes between the function or macro that is being documented and
its definition, so move these prototypes above the function that is
being documented.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 1a0df594 30-Apr-2013 Stephen Rothwell <sfr@canb.auug.org.au>

kernel/compat.c: make do_sysinfo() static

The only use outside of kernel/timer.c was in kernel/compat.c, so move
compat_sys_sysinfo() next to sys_sysinfo() in kernel/timer.c.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# dc72c32e 22-Mar-2013 Frederic Weisbecker <fweisbec@gmail.com>

printk: Provide a wake_up_klogd() off-case

wake_up_klogd() is useless when CONFIG_PRINTK=n because neither printk()
nor printk_sched() are in use and there are actually no waiter on
log_wait waitqueue. It should be a stub in this case for users like
bust_spinlocks().

Otherwise this results in this warning when CONFIG_PRINTK=n and
CONFIG_IRQ_WORK=n:

kernel/built-in.o In function `wake_up_klogd':
(.text.wake_up_klogd+0xb4): undefined reference to `irq_work_queue'

To fix this, provide an off-case for wake_up_klogd() when
CONFIG_PRINTK=n.

There is much more from console_unlock() and other console related code
in printk.c that should be moved under CONFIG_PRINTK. But for now,
focus on a minimal fix as we passed the merged window already.

[akpm@linux-foundation.org: include printk.h in bust_spinlocks.c]
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reported-by: James Hogan <james.hogan@imgtec.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# b92021b0 14-Mar-2013 Rusty Russell <rusty@rustcorp.com.au>

CONFIG_SYMBOL_PREFIX: cleanup.

We have CONFIG_SYMBOL_PREFIX, which three archs define to the string
"_". But Al Viro broke this in "consolidate cond_syscall and
SYSCALL_ALIAS declarations" (in linux-next), and he's not the first to
do so.

Using CONFIG_SYMBOL_PREFIX is awkward, since we usually just want to
prefix it so something. So various places define helpers which are
defined to nothing if CONFIG_SYMBOL_PREFIX isn't set:

1) include/asm-generic/unistd.h defines __SYMBOL_PREFIX.
2) include/asm-generic/vmlinux.lds.h defines VMLINUX_SYMBOL(sym)
3) include/linux/export.h defines MODULE_SYMBOL_PREFIX.
4) include/linux/kernel.h defines SYMBOL_PREFIX (which differs from #7)
5) kernel/modsign_certificate.S defines ASM_SYMBOL(sym)
6) scripts/modpost.c defines MODULE_SYMBOL_PREFIX
7) scripts/Makefile.lib defines SYMBOL_PREFIX on the commandline if
CONFIG_SYMBOL_PREFIX is set, so that we have a non-string version
for pasting.

(arch/h8300/include/asm/linkage.h defines SYMBOL_NAME(), too).

Let's solve this properly:
1) No more generic prefix, just CONFIG_HAVE_UNDERSCORE_SYMBOL_PREFIX.
2) Make linux/export.h usable from asm.
3) Define VMLINUX_SYMBOL() and VMLINUX_SYMBOL_STR().
4) Make everyone use them.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: James Hogan <james.hogan@imgtec.com>
Tested-by: James Hogan <james.hogan@imgtec.com> (metag)


# c142be8e 13-Mar-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

tracing: Add skip argument to trace_dump_stack()

Altough the trace_dump_stack() already skips three functions in
the call to stack trace, which gets the stack trace to start
at the caller of the function, the caller may want to skip some
more too (as it may have helper functions).

Add a skip argument to the trace_dump_stack() that lets the caller
skip back tracing functions that it doesn't care about.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 9d3c752c 08-Mar-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

tracing: Optimize trace_printk() with one arg to use trace_puts()

Although trace_printk() is extremely fast, especially when it uses
trace_bprintk() (writes args straight to buffer instead of inserting
into string), it still has the overhead of calling one of the printf
sprintf() functions, that need to scan the fmt string to determine
what, if any args it has.

This is a waste of precious CPU cycles if the printk format has no
args but a single constant string. It is better to use trace_puts()
which does not have the overhead of the fmt scanning.

But wouldn't it be nice if the developer didn't have to think about
such things, and the compile would just do it for them?

trace_printk("this string has no args\n");
[...]
trace_printk("this sting does %p %d\n", foo, bar);

As tracing is critical to have the least amount of overhead,
especially when dealing with race conditions, and you want to
eliminate any "Heisenbugs", you want the trace_printk() to use the
fastest possible means of tracing.

Currently the macro magic determines if it will use trace_bprintk()
or if the fmt is a dynamic string (a variable), it will fall
back to the slow trace_printk() method that does a full snprintf()
before copying it into the buffer, where as trace_bprintk() only
copys the pointer to the fmt and the args into the buffer.

Well, now there's a way to spend some more Hogwarts cash and come
up with new fancy macro magic.

#define trace_printk(fmt, ...) \
do { \
char _______STR[] = __stringify((__VA_ARGS__)); \
if (sizeof(_______STR) > 3) \
do_trace_printk(fmt, ##__VA_ARGS__); \
else \
trace_puts(fmt); \
} while (0)

The above needs a bit of explaining (both here and in the comments).

By stringifying the __VA_ARGS__, we can, at compile time, determine
the number of args that are being passed to trace_printk(). The extra
parenthesis are required, otherwise the compiler complains about
too many parameters for __stringify if there is more than one arg.

When there are no args, the __stringify((__VA_ARGS__)) converts into
"()\0", a string of 3 characters. Anything else, will be a string
containing more than 3 characters. Now we assign that string to a
dynamic char array, and then take the sizeof() of that array.
If it is greater than 3 characters, we know trace_printk() has args
and we need to do the full "do_trace_printk()" on them, otherwise
it was only passed a single arg and we can optimize to use trace_puts().

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven "The King of Nasty Macros!" Rostedt <rostedt@goodmis.org>


# 09ae7234 08-Mar-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

tracing: Add trace_puts() for even faster trace_printk() tracing

The trace_printk() is extremely fast and is very handy as it can be
used in any context (including NMIs!). But it still requires scanning
the fmt string for parsing the args. Even the trace_bprintk() requires
a scan to know what args will be saved, although it doesn't copy the
format string itself.

Several times trace_printk() has no args, and wastes cpu cycles scanning
the fmt string.

Adding trace_puts() allows the developer to use an even faster
tracing method that only saves the pointer to the string in the
ring buffer without doing any format parsing at all. This will
help remove even more of the "Heisenbug" effect, when debugging.

Also fixed up the F_printk()s for the ftrace internal bprint and print events.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# ad909e21 06-Mar-2013 Steven Rostedt (Red Hat) <srostedt@redhat.com>

tracing: Add internal tracing_snapshot() functions

The new snapshot feature is quite handy. It's a way for the user
to take advantage of the spare buffer that, until then, only
the latency tracers used to "snapshot" the buffer when it hit
a max latency. Now users can trigger a "snapshot" manually when
some condition is hit in a program. But a snapshot currently can
not be triggered by a condition inside the kernel.

With the addition of tracing_snapshot() and tracing_snapshot_alloc(),
snapshots can now be taking when a condition is hit, and the
developer wants to snapshot the case without stopping the trace.

Note, any snapshot will overwrite the old one, so take care
in how this is done.

These new functions are to be used like tracing_on(), tracing_off()
and trace_printk() are. That is, they should never be called
in the mainline Linux kernel. They are solely for the purpose
of debugging.

The tracing_snapshot() will not allocate a buffer, but it is
safe to be called from any context (except NMIs). But if a
snapshot buffer isn't allocated when it is called, it will write
to the live buffer, complaining about the lack of a snapshot
buffer, and then stop tracing (giving you the "permanent snapshot").

tracing_snapshot_alloc() will allocate the snapshot buffer if
it was not already allocated and then take the snapshot. This routine
*may sleep*, and must be called from context that can sleep.
The allocation is done with GFP_KERNEL and not atomic.

If you need a snapshot in an atomic context, say in early boot,
then it is best to call the tracing_snapshot_alloc() before then,
where it will allocate the buffer, and then you can use the
tracing_snapshot() anywhere you want and still get snapshots.

Cc: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 373d4d09 20-Jan-2013 Rusty Russell <rusty@rustcorp.com.au>

taint: add explicit flag to show whether lock dep is still OK.

Fix up all callers as they were before, with make one change: an
unsigned module taints the kernel, but doesn't turn off lockdep.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# c4e18497 20-Dec-2012 Guenter Roeck <linux@roeck-us.net>

linux/kernel.h: fix DIV_ROUND_CLOSEST with unsigned divisors

Commit 263a523d18bc ("linux/kernel.h: Fix warning seen with W=1 due to
change in DIV_ROUND_CLOSEST") fixes a warning seen with W=1 due to
change in DIV_ROUND_CLOSEST.

Unfortunately, the C compiler converts divide operations with unsigned
divisors to unsigned, even if the dividend is signed and negative (for
example, -10 / 5U = 858993457). The C standard says "If one operand has
unsigned int type, the other operand is converted to unsigned int", so
the compiler is not to blame. As a result, DIV_ROUND_CLOSEST(0, 2U) and
similar operations now return bad values, since the automatic conversion
of expressions such as "0 - 2U/2" to unsigned was not taken into
account.

Fix by checking for the divisor variable type when deciding which
operation to perform. This fixes DIV_ROUND_CLOSEST(0, 2U), but still
returns bad values for negative dividends divided by unsigned divisors.
Mark the latter case as unsupported.

One observed effect of this problem is that the s2c_hwmon driver reports
a value of 4198403 instead of 0 if the ADC reads 0.

Other impact is unpredictable. Problem is seen if the divisor is an
unsigned variable or constant and the dividend is less than (divisor/2).

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reported-by: Juergen Beisert <jbe@pengutronix.de>
Tested-by: Juergen Beisert <jbe@pengutronix.de>
Cc: Jean Delvare <khali@linux-fr.org>
Cc: <stable@vger.kernel.org> [3.7.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 4c925d60 17-Dec-2012 Eldad Zack <eldad@fogrefinery.com>

kstrto*: add documentation

As Bruce Fields pointed out, kstrto* is currently lacking kerneldoc
comments. This patch adds kerneldoc comments to common variants of
kstrto*: kstrto(u)l, kstrto(u)ll and kstrto(u)int.

Signed-off-by: Eldad Zack <eldad@fogrefinery.com>
Cc: J. Bruce Fields <bfields@fieldses.org>
Cc: Joe Perches <joe@perches.com>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Rob Landley <rob@landley.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# d84da3f9 11-Dec-2012 Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

mm: use IS_ENABLED(CONFIG_COMPACTION) instead of COMPACTION_BUILD

We don't need custom COMPACTION_BUILD anymore, since we have handy
IS_ENABLED().

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# e5adfffc 11-Dec-2012 Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

mm: use IS_ENABLED(CONFIG_NUMA) instead of NUMA_BUILD

We don't need custom NUMA_BUILD anymore, since we have handy
IS_ENABLED().

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# cbdbf2ab 22-Nov-2012 James Hogan <jhogan@kernel.org>

linux/kernel.h: define SYMBOL_PREFIX

Define SYMBOL_PREFIX to be the same as CONFIG_SYMBOL_PREFIX if set by
the architecture, or "" otherwise. This avoids the need for ugly #ifdefs
whenever symbols are referenced in asm blocks.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Joe Perches <joe@perches.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Jean Delvare <khali@linux-fr.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# 60efc15a 25-Oct-2012 Michal Hocko <mhocko@suse.cz>

linux/kernel.h: Remove duplicate trace_printk declaration

!CONFIG_TRACING both declares and defines (empty) trace_printk.
The first one is not redundant so it can be removed.

Link: http://lkml.kernel.org/r/1351172511-18125-1-git-send-email-mhocko@suse.cz

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 607ca46e 13-Oct-2012 David Howells <dhowells@redhat.com>

UAPI: (Scripted) Disintegrate include/linux

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Dave Jones <davej@redhat.com>


# 263a523d 18-Sep-2012 Guenter Roeck <linux@roeck-us.net>

linux/kernel.h: Fix warning seen with W=1 due to change in DIV_ROUND_CLOSEST

After commit b6d86d3d (Fix DIV_ROUND_CLOSEST to support negative dividends),
the following warning is seen if the kernel is compiled with W=1 (-Wextra):

warning: comparison of unsigned expression >= 0 is always true

The warning is due to the test '((typeof(x))-1) >= 0', which is used to detect
if the variable type is unsigned. Research on the web suggests that the warning
disappears if '>' instead of '>=' is used for the comparison.

Tests after changing the macro along that line show that the warning is gone,
and that the result is still correct:

i=-4: DIV_ROUND_CLOSEST(i, 2)=-2
i=-3: DIV_ROUND_CLOSEST(i, 2)=-2
i=-2: DIV_ROUND_CLOSEST(i, 2)=-1
i=-1: DIV_ROUND_CLOSEST(i, 2)=-1
i=0: DIV_ROUND_CLOSEST(i, 2)=0
i=1: DIV_ROUND_CLOSEST(i, 2)=1
i=2: DIV_ROUND_CLOSEST(i, 2)=1
i=3: DIV_ROUND_CLOSEST(i, 2)=2
i=4: DIV_ROUND_CLOSEST(i, 2)=2

Code size is the same as before.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Acked-by: Jean Delvare <khali@linux-fr.org>


# b6d86d3d 24-Aug-2012 Guenter Roeck <linux@roeck-us.net>

linux/kernel.h: Fix DIV_ROUND_CLOSEST to support negative dividends

DIV_ROUND_CLOSEST returns a bad result for negative dividends:
DIV_ROUND_CLOSEST(-2, 2) = 0

Most of the time this does not matter. However, in the hardware monitoring
subsystem, DIV_ROUND_CLOSEST is sometimes used on integers which can be
negative (such as temperatures).

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Jean Delvare <khali@linux-fr.org>


# bff9d186 21-Jul-2012 Rafael J. Wysocki <rjw@rjwysocki.net>

Remove SYSTEM_SUSPEND_DISK system state

The SYSTEM_SUSPEND_DISK system state is never used, so drop it.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# a3860c1c 31-May-2012 Xi Wang <xi.wang@gmail.com>

introduce SIZE_MAX

ULONG_MAX is often used to check for integer overflow when calculating
allocation size. While ULONG_MAX happens to work on most systems, there
is no guarantee that `size_t' must be the same size as `long'.

This patch introduces SIZE_MAX, the maximum value of `size_t', to improve
portability and readability for allocation size validation.

Signed-off-by: Xi Wang <xi.wang@gmail.com>
Acked-by: Alex Elder <elder@dreamhost.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 44696908 23-May-2012 David S. Miller <davem@davemloft.net>

kernel: Move REPEAT_BYTE definition into linux/kernel.h

And make sure that everything using it explicitly includes
that header file.

Signed-off-by: David S. Miller <davem@davemloft.net>


# 07d777fe 22-Sep-2011 Steven Rostedt <srostedt@redhat.com>

tracing: Add percpu buffers for trace_printk()

Currently, trace_printk() uses a single buffer to write into
to calculate the size and format needed to save the trace. To
do this safely in an SMP environment, a spin_lock() is taken
to only allow one writer at a time to the buffer. But this could
also affect what is being traced, and add synchronization that
would not be there otherwise.

Ideally, using percpu buffers would be useful, but since trace_printk()
is only used in development, having per cpu buffers for something
never used is a waste of space. Thus, the use of the trace_bprintk()
format section is changed to be used for static fmts as well as dynamic ones.
Then at boot up, we can check if the section that holds the trace_printk
formats is non-empty, and if it does contain something, then we
know a trace_printk() has been added to the kernel. At this time
the trace_printk per cpu buffers are allocated. A check is also
done at module load time in case a module is added that contains a
trace_printk().

Once the buffers are allocated, they are never freed. If you use
a trace_printk() then you should know what you are doing.

A buffer is made for each type of context:

normal
softirq
irq
nmi

The context is checked and the appropriate buffer is used.
This allows for totally lockless usage of trace_printk(),
and they no longer even disable interrupts.

Requested-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 1ac101a5 23-Mar-2012 KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

procfs: add num_to_str() to speed up /proc/stat

== stat_check.py
num = 0
with open("/proc/stat") as f:
while num < 1000 :
data = f.read()
f.seek(0, 0)
num = num + 1
==

perf shows

20.39% stat_check.py [kernel.kallsyms] [k] format_decode
13.41% stat_check.py [kernel.kallsyms] [k] number
12.61% stat_check.py [kernel.kallsyms] [k] vsnprintf
10.85% stat_check.py [kernel.kallsyms] [k] memcpy
4.85% stat_check.py [kernel.kallsyms] [k] radix_tree_lookup
4.43% stat_check.py [kernel.kallsyms] [k] seq_printf

This patch removes most of calls to vsnprintf() by adding num_to_str()
and seq_print_decimal_ull(), which prints decimal numbers without rich
functions provided by printf().

On my 8cpu box.
== Before patch ==
[root@bluextal test]# time ./stat_check.py

real 0m0.150s
user 0m0.026s
sys 0m0.121s

== After patch ==
[root@bluextal test]# time ./stat_check.py

real 0m0.055s
user 0m0.022s
sys 0m0.030s

[akpm@linux-foundation.org: remove incorrect comment, use less statck in num_to_str(), move comment from .h to .c, simplify seq_put_decimal_ull()]
[andrea@betterlinux.com: avoid breaking the ABI in /proc/stat]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrea Righi <andrea@betterlinux.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Turner <pjt@google.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 6061d949 23-Mar-2012 Joe Perches <joe@perches.com>

include/ and checkpatch: prefer __scanf to __attribute__((format(scanf,...)

It's equivalent to __printf, so prefer __scanf.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 93d68e52 19-Mar-2012 Steven Rostedt <srostedt@redhat.com>

tracing: Move the tracing_on/off() declarations into CONFIG_TRACING

The tracing_on/off() declarations were under CONFIG_RING_BUFFER, but
the functions are now only defined under CONFIG_TRACING as they are
specific to ftrace and not the ring buffer.

But the declarations were still defined under the ring buffer and
this caused the build to fail when CONFIG_RING_BUFFER was set but
CONFIG_TRACING was not.

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 9993bc63 09-Mar-2012 Salman Qazi <sqazi@google.com>

sched/x86: Fix overflow in cyc2ns_offset

When a machine boots up, the TSC generally gets reset. However,
when kexec is used to boot into a kernel, the TSC value would be
carried over from the previous kernel. The computation of
cycns_offset in set_cyc2ns_scale is prone to an overflow, if the
machine has been up more than 208 days prior to the kexec. The
overflow happens when we multiply *scale, even though there is
enough room to store the final answer.

We fix this issue by decomposing tsc_now into the quotient and
remainder of division by CYC2NS_SCALE_FACTOR and then performing
the multiplication separately on the two components.

Refactor code to share the calculation with the previous
fix in __cycles_2_ns().

Signed-off-by: Salman Qazi <sqazi@google.com>
Acked-by: John Stultz <john.stultz@linaro.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Turner <pjt@google.com>
Cc: john stultz <johnstul@us.ibm.com>
Link: http://lkml.kernel.org/r/20120310004027.19291.88460.stgit@dungbeetle.mtv.corp.google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 6c03438e 23-Nov-2011 Paul Gortmaker <paul.gortmaker@windriver.com>

kernel.h: doesn't explicitly use bug.h, so don't include it.

This header isn't using bug.h infrastructure, but due to historical
reasons, it was including it. Removing it revealed several implicit
dependencies (since kernel.h is everywhere) so we've fixed those 1st
before deploying this change.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>


# 35edd910 16-Nov-2011 Paul Gortmaker <paul.gortmaker@windriver.com>

bug: consolidate BUILD_BUG_ON with other bug code

The support for BUILD_BUG in linux/kernel.h predates the
addition of linux/bug.h -- with this chunk off separate,
you can run into situations where a person gets a compile
fail even when they've included linux/bug.h, like this:

CC lib/string.o
lib/string.c: In function 'strlcat':
lib/string.c:225:2: error: implicit declaration of function 'BUILD_BUG_ON'
make[2]: *** [lib/string.o] Error 1
$
$ grep linux/bug.h lib/string.c
#include <linux/bug.h>
$

Since the above violates the principle of least surprise, move
the BUG chunks from kernel.h to bug.h so it is all together.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>


# d8e5ddef 06-Feb-2012 H. Peter Anvin <hpa@zytor.com>

sysinfo: Move struct sysinfo to a separate header file

struct sysinfo is just about the only thing exported to userspace from
<linux/kernel.h>, so move it into a separate header file with a
residual #include in <linux/kernel.h>.

Originally-by: H. J. Lu <hjl.tools@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Link: http://lkml.kernel.org/n/tip-4pr1xnnksprt7t0h3w5fw4rv@git.kernel.org


# ff2d8b19 12-Jan-2012 Joe Perches <joe@perches.com>

treewide: convert uses of ATTRIB_NORETURN to __noreturn

Use the more commonly used __noreturn instead of ATTRIB_NORETURN.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 9402c95f 12-Jan-2012 Joe Perches <joe@perches.com>

treewide: remove useless NORET_TYPE macro and uses

It's a very old and now unused prototype marking so just delete it.

Neaten panic pointer argument style to keep checkpatch quiet.

Signed-off-by: Joe Perches <joe@perches.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 4da47859 12-Jan-2012 Joe Perches <joe@perches.com>

kernel.h: neaten panic prototype

Use __printf macro.
Convert NORET_AND to ATTRIB_NORET.
Use the normal kernel style for pointer arguments.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 1399ff86 10-Jan-2012 David Daney <david.daney@cavium.com>

kernel.h: add BUILD_BUG() macro

We can place this in definitions that we expect the compiler to remove by
dead code elimination. If this assertion fails, we get a nice error
message at build time.

The GCC function attribute error("message") was added in version 4.3, so
we define a new macro __linktime_error(message) to expand to this for
GCC-4.3 and later. This will give us an error diagnostic from the
compiler on the line that fails. For other compilers
__linktime_error(message) expands to nothing, and we have to be content
with a link time error, but at least we will still get a build error.

BUILD_BUG() expands to the undefined function __build_bug_failed() and
will fail at link time if the compiler ever emits code for it. On GCC-4.3
and later, attribute((error())) is used so that the failure will be noted
at compile time instead.

Signed-off-by: David Daney <david.daney@cavium.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: DM <dm.n9107@gmail.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 55af7796 28-Nov-2011 Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com>

x86: Panic on detection of stack overflow

Currently, messages are just output on the detection of stack
overflow, which is not sufficient for systems that need a
high reliability. This is because in general the overflow may
corrupt data, and the additional corruption may occur due to
reading them unless systems stop.

This patch adds the sysctl parameter
kernel.panic_on_stackoverflow and causes a panic when detecting
the overflows of kernel, IRQ and exception stacks except user
stack according to the parameter. It is disabled by default.

Signed-off-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com>
Cc: yrl.pp-manager.tt@hitachi.com
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: http://lkml.kernel.org/r/20111129060836.11076.12323.stgit@ltc219.sdl.hitachi.co.jp
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 2449b8ba 24-Oct-2011 Ben Hutchings <ben@decadent.org.uk>

module,bug: Add TAINT_OOT_MODULE flag for modules not built in-tree

Use of the GPL or a compatible licence doesn't necessarily make the code
any good. We already consider staging modules to be suspect, and this
should also be true for out-of-tree modules which may receive very
little review.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Reviewed-by: Dave Jones <davej@redhat.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (patched oops-tracing.txt)


# 67d0a075 31-Oct-2011 Joe Perches <joe@perches.com>

kernel.h/checkpatch: mark strict_strto<foo> and simple_strto<foo> as obsolete

Mark obsolete/deprecated strict_strto<foo> and simple_strto<foo> functions
and macros as obsolete.

Update checkpatch to warn about their use.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 55036ba7 31-Oct-2011 Andy Shevchenko <andriy.shevchenko@linux.intel.com>

lib: rename pack_hex_byte() to hex_byte_pack()

As suggested by Andrew Morton in [1] there is better to have most
significant part first in the function name.

[1] https://lkml.org/lkml/2011/9/20/22

There is no functional change.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Mimi Zohar <zohar@us.ibm.com>
Cc: James Morris <jmorris@namei.org>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: "John W. Linville" <linville@tuxdriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# b9075fa9 31-Oct-2011 Joe Perches <joe@perches.com>

treewide: use __printf not __attribute__((format(printf,...)))

Standardize the style for compiler based printf format verification.
Standardized the location of __printf too.

Done via script and a little typing.

$ grep -rPl --include=*.[ch] -w "__attribute__" * | \
grep -vP "^(tools|scripts|include/linux/compiler-gcc.h)" | \
xargs perl -n -i -e 'local $/; while (<>) { s/\b__attribute__\s*\(\s*\(\s*format\s*\(\s*printf\s*,\s*(.+)\s*,\s*(.+)\s*\)\s*\)\s*\)/__printf($1, $2)/g ; print; }'

[akpm@linux-foundation.org: revert arch bits]
Signed-off-by: Joe Perches <joe@perches.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# b7804983 20-Sep-2011 Mimi Zohar <zohar@linux.vnet.ibm.com>

lib: add error checking to hex2bin

hex2bin converts a hexadecimal string to its binary representation.
The original version of hex2bin did not do any error checking. This
patch adds error checking and returns the result.

Changelog v1:
- removed unpack_hex_byte()
- changed return code from boolean to int

Changelog:
- use the new unpack_hex_byte()
- add __must_check compiler option (Andy Shevchenko's suggestion)
- change function API to return error checking result
(based on Tetsuo Handa's initial patch)

Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>


# 36a26c69 26-Jul-2011 Nicholas Bellinger <nab@linux-iscsi.org>

kernel.h: Add DIV_ROUND_UP_ULL and DIV_ROUND_UP_SECTOR_T macro usage

Add new DIV_ROUND_UP_SECTOR_T macro usage for 32-bit architectures requiring
a new DIV_ROUND_UP_ULL, and existing 64-bit usage with DIV_ROUND_UP.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Nicholas A. Bellinger <nab@linux-iscsi.org>


# b9d4f426 25-Jul-2011 Arnaud Lacombe <lacombar@gmail.com>

include/linux/kernel.h: hide internal macros from userspace

Unexpose to userland the following macros
- __FUNCTION__
- NUMA_BUILD
- COMPACTION_BUILD
- REBUILD_DUE_TO_FTRACE_MCOUNT_RECORD

Signed-off-by: Arnaud Lacombe <lacombar@gmail.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Michal Marek <mmarek@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 9d00f92f 25-Jul-2011 WANG Cong <xiyou.wangcong@gmail.com>

include/linux/kernel.h: fix a headers_check warning

Fix the warning:

usr/include/linux/kernel.h:65: userspace cannot reference function or variable defined in the kernel

As Michal noted, BUILD_BUG_ON stuffs should be moved
under #ifdef __KERNEL__.

Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Michal Marek <mmarek@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# d902db1e 08-Jun-2011 Frederic Weisbecker <fweisbec@gmail.com>

sched: Generalize sleep inside spinlock detection

The sleeping inside spinlock detection is actually used
for more general sleeping inside atomic sections
debugging: preemption disabled, rcu read side critical
sections, interrupts, interrupt disabled, etc...

Change the name of the config and its help section to
reflect its more general role.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>


# ca39599c 15-Jun-2011 Dr. David Alan Gilbert <linux@treblig.org>

BUILD_BUG_ON_ZERO: fix sparse breakage

BUILD_BUG_ON_ZERO and BUILD_BUG_ON_NULL must return values, even in the
CHECKER case otherwise various users of it become syntactically invalid.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# c196e32a 24-May-2011 Alexey Dobriyan <adobriyan@gmail.com>

lib: add kstrto*_from_user()

There is quite a lot of code which does copy_from_user() + strict_strto*()
or simple_strto*() combo in slightly different ways.

Before doing conversions all over tree, let's get final API correct.

Enter kstrtoull_from_user() and friends.

Typical code which uses them looks very simple:

TYPE val;
int rv;

rv = kstrtoTYPE_from_user(buf, count, 0, &val);
if (rv < 0)
return rv;
[use val]
return count;

There is a tiny semantic difference from the plain kstrto*() API -- the
latter allows any amount of leading zeroes, while the former copies data
into buffer on stack and thus allows leading zeroes as long as it fits
into buffer.

This shouldn't be a problem for typical usecase "echo 42 > /proc/x".

The point is to make reading one integer from userspace _very_ simple and
very bug free.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 903c0c7c 24-May-2011 KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

sparse: define dummy BUILD_BUG_ON definition for sparse

BUILD_BUG_ON() causes a syntax error to detect coding errors. So it
causes sparse to detect an error too. This reduces sparse's usefulness.

This patch makes a dummy BUILD_BUG_ON() definition for sparse.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# cdbe61bf 05-May-2011 Steven Rostedt <srostedt@redhat.com>

ftrace: Allow dynamically allocated function tracers

Now that functions may be selected individually, it only makes sense
that we should allow dynamically allocated trace structures to
be traced. This will allow perf to allocate a ftrace_ops structure
at runtime and use it to pick and choose which functions that
structure will trace.

Note, a dynamically allocated ftrace_ops will always be called
indirectly instead of being called directly from the mcount in
entry.S. This is because there's no safe way to prevent mcount
from being preempted before calling the function, unless we
modify every entry.S to do so (not likely). Thus, dynamically allocated
functions will now be called by the ftrace_ops_list_func() that
loops through the ops that are allocated if there are more than
one op allocated at a time. This loop is protected with a
preempt_disable.

To determine if an ftrace_ops structure is allocated or not, a new
util function was added to the kernel/extable.c called
core_kernel_data(), which returns 1 if the address is between
_sdata and _edata.

Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 33ee3b2e 22-Mar-2011 Alexey Dobriyan <adobriyan@gmail.com>

kstrto*: converting strings to integers done (hopefully) right

1. simple_strto*() do not contain overflow checks and crufty,
libc way to indicate failure.
2. strict_strto*() also do not have overflow checks but the name and
comments pretend they do.
3. Both families have only "long long" and "long" variants,
but users want strtou8()
4. Both "simple" and "strict" prefixes are wrong:
Simple doesn't exactly say what's so simple, strict should not exist
because conversion should be strict by default.

The solution is to use "k" prefix and add convertors for more types.
Enter
kstrtoull()
kstrtoll()
kstrtoul()
kstrtol()
kstrtouint()
kstrtoint()

kstrtou64()
kstrtos64()
kstrtou32()
kstrtos32()
kstrtou16()
kstrtos16()
kstrtou8()
kstrtos8()

Include runtime testsuite (somewhat incomplete) as well.

strict_strto*() become deprecated, stubbed to kstrto*() and
eventually will be removed altogether.

Use kstrto*() in code today!

Note: on some archs _kstrtoul() and _kstrtol() are left in tree, even if
they'll be unused at runtime. This is temporarily solution,
because I don't want to hardcode list of archs where these
functions aren't needed. Current solution with sizeof() and
__alignof__ at least always works.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# ffbbf2da 31-Jan-2011 Randy Dunlap <randy.dunlap@oracle.com>

kernel.h: fix kernel-doc warning

Fix kernel-doc warning in kernel.h from commit 7ef88ad56145
("BUILD_BUG_ON: make it handle more cases"):

Warning(include/linux/kernel.h:605): No description found for parameter 'condition'
Warning(include/linux/kernel.h:605): Excess function parameter 'cond' description in 'BUILD_BUG_ON'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 1765e3a4 24-Jan-2011 Rusty Russell <rusty@rustcorp.com.au>

Remove MAYBE_BUILD_BUG_ON

Now BUILD_BUG_ON() can handle optimizable constants, we don't need
MAYBE_BUILD_BUG_ON any more.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# 7ef88ad5 24-Jan-2011 Rusty Russell <rusty@rustcorp.com.au>

BUILD_BUG_ON: make it handle more cases

BUILD_BUG_ON used to use the optimizer to do code elimination or fail
at link time; it was changed to first the size of a negative array (a
nicer compile time error), then (in
8c87df457cb58fe75b9b893007917cf8095660a0) to a bitfield.

This forced us to change some non-constant cases to MAYBE_BUILD_BUG_ON();
as Jan points out in that commit, it didn't work as intended anyway.

bitfields: needs a literal constant at parse time, and can't be put under
"if (__builtin_constant_p(x))" for example.
negative array: can handle anything, but if the compiler can't tell it's
a constant, silently has no effect.
link time: breaks link if the compiler can't determine the value, but the
linker output is not usually as informative as a compiler error.

If we use the negative-array-size method *and* the link time trick,
we get the ability to use BUILD_BUG_ON() under __builtin_constant_p()
branches, and maximal ability for the compiler to detect errors at
build time.

We also document it thoroughly.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jan Beulich <JBeulich@novell.com>
Acked-by: Hollis Blanchard <hollisb@us.ibm.com>


# 2ce802f6 19-Jan-2011 Tejun Heo <tj@kernel.org>

lockdep: Move early boot local IRQ enable/disable status to init/main.c

During early boot, local IRQ is disabled until IRQ subsystem is
properly initialized. During this time, no one should enable
local IRQ and some operations which usually are not allowed with
IRQ disabled, e.g. operations which might sleep or require
communications with other processors, are allowed.

lockdep tracked this with early_boot_irqs_off/on() callbacks.
As other subsystems need this information too, move it to
init/main.c and make it generally available. While at it,
toggle the boolean to early_boot_irqs_disabled instead of
enabled so that it can be initialized with %false and %true
indicates the exceptional condition.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <20110120110635.GB6036@htj.dyndns.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 3e7d3449 13-Jan-2011 Mel Gorman <mel@csn.ul.ie>

mm: vmscan: reclaim order-0 and use compaction instead of lumpy reclaim

Lumpy reclaim is disruptive. It reclaims a large number of pages and
ignores the age of the pages it reclaims. This can incur significant
stalls and potentially increase the number of major faults.

Compaction has reached the point where it is considered reasonably stable
(meaning it has passed a lot of testing) and is a potential candidate for
displacing lumpy reclaim. This patch introduces an alternative to lumpy
reclaim whe compaction is available called reclaim/compaction. The basic
operation is very simple - instead of selecting a contiguous range of
pages to reclaim, a number of order-0 pages are reclaimed and then
compaction is later by either kswapd (compact_zone_order()) or direct
compaction (__alloc_pages_direct_compact()).

[akpm@linux-foundation.org: fix build]
[akpm@linux-foundation.org: use conventional task_struct naming]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 71a90484 12-Jan-2011 Andrew Morton <akpm@linux-foundation.org>

include/linux/kernel.h: abs(): fix handling of 32-bit unsigneds on 64-bit

Michal reports:

In the framebuffer subsystem the abs() macro is often used as a part of
the calculation of a Manhattan metric, which in turn is used as a measure
of similarity between video modes. The arguments of abs() are sometimes
unsigned numbers. This worked fine until commit a49c59c0 ("Make sure the
value in abs() does not get truncated if it is greater than 2^32:) , which
changed the definition of abs() to prevent truncation. As a result of
this change, in the following piece of code:

u32 a = 0, b = 1;
u32 c = abs(a - b);

'c' will end up with a value of 0xffffffff instead of the expected 0x1.

A problem caused by this change and visible by the end user is that
framebuffer drivers relying on functions from modedb.c will fail to find
high resolution video modes similar to that explicitly requested by the
user if an exact match cannot be found (see e.g.

Fix this by special-casing `long' types within abs().

This patch reduces x86_64 code size a bit - drivers/video/uvesafb.o shrunk
by 15 bytes, presumably because it is doing abs() on 4-byte quantities,
and expanding those to 8-byte longs adds code.

testcase:

#define oldabs(x) ({ \
long __x = (x); \
(__x < 0) ? -__x : __x; \
})

#define newabs(x) ({ \
long ret; \
if (sizeof(x) == sizeof(long)) { \
long __x = (x); \
ret = (__x < 0) ? -__x : __x; \
} else { \
int __x = (x); \
ret = (__x < 0) ? -__x : __x; \
} \
ret; \
})

typedef unsigned int u32;

main()
{
u32 a = 0;
u32 b = 1;
u32 oldc = oldabs(a - b);
u32 newc = newabs(a - b);

printf("%u %u\n", oldc, newc);
}

akpm:/home/akpm> gcc t.c
akpm:/home/akpm> ./a.out
4294967295 1

Reported-by: Michal Januszewski <michalj@gmail.com>
Cc: Rolf Eike Beer <eike-kernel@sf-tec.de
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# dc88e460 23-Nov-2010 Mimi Zohar <zohar@linux.vnet.ibm.com>

lib: hex2bin converts ascii hexadecimal string to binary

Similar to the kgdb_hex2mem() code, hex2bin converts a string
to binary using the hex_to_bin() library call.

Changelog:
- Replace parameter names with src/dst (based on David Howell's comment)
- Add 'const' where needed (based on David Howell's comment)
- Replace int with size_t (based on David Howell's comment)

Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Acked-by: Serge E. Hallyn <serge@hallyn.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>


# 968ab183 15-Nov-2010 Linus Torvalds <torvalds@linux-foundation.org>

include/linux/kernel.h: Move logging bits to include/linux/printk.h

Move the logging bits from kernel.h into printk.h so that
there is a bit more logical separation of the generic from
the printk logging specific parts.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# eaf06b24 11-Nov-2010 Dan Rosenberg <drosenberg@vsecurity.com>

Restrict unprivileged access to kernel syslog

The kernel syslog contains debugging information that is often useful
during exploitation of other vulnerabilities, such as kernel heap
addresses. Rather than futilely attempt to sanitize hundreds (or
thousands) of printk statements and simultaneously cripple useful
debugging functionality, it is far simpler to create an option that
prevents unprivileged users from reading the syslog.

This patch, loosely based on grsecurity's GRKERNSEC_DMESG, creates the
dmesg_restrict sysctl. When set to "0", the default, no restrictions are
enforced. When set to "1", only users with CAP_SYS_ADMIN can read the
kernel syslog via dmesg(8) or other mechanisms.

[akpm@linux-foundation.org: explain the config option in kernel.txt]
Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Eugene Teo <eugeneteo@kernel.org>
Acked-by: Kees Cook <kees.cook@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 074e61ec 09-Nov-2010 James Morris <jmorris@namei.org>

kernel: add roundup() code comment from akpm

Add roundup() code comment from akpm.

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: James Morris <jmorris@namei.org>


# 6070bf35 07-Nov-2010 Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>

kernel: Constify temporary variable in roundup()

Fix build error with GCC 3.x caused by commit b28efd54
"kernel: roundup should only reference arguments once" by constifying
temporary variable used in that macro.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>


# 658716d1 26-Oct-2010 Brian Behlendorf <behlendorf1@llnl.gov>

div64_u64(): improve precision on 32bit platforms

The current implementation of div64_u64 for 32bit systems returns an
approximately correct result when the divisor exceeds 32bits. Since doing
64bit division using 32bit hardware is a long since solved problem we just
use one of the existing proven methods.

Additionally, add a div64_s64 function to correctly handle doing signed
64bit division.

Addresses https://bugzilla.redhat.com/show_bug.cgi?id=616105

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Ben Woodard <bwoodard@llnl.gov>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Mark Grondona <mgrondona@llnl.gov>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 77006a0a 26-Oct-2010 Andrew Morton <akpm@linux-foundation.org>

ratelimit: add comment warning people off printk_ratelimit()

printk_ratelimit() was a bad idea - we don't want subsytem A causing
ratelimiting of subsystem B's messages.

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# a55621f1 26-Oct-2010 Andrew Morton <akpm@linux-foundation.org>

include/linux/kernel.h: add __must_check to strict_strto*()

The whole point to using the strict functions is to check the return
value. If you don't, strict_strto*() will return you uninitialised
garbage. Offenders have been observed in the wild.

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# f27c85c5 26-Oct-2010 Hagen Paul Pfeifer <hagen@jauu.net>

kernel.h: add {min,max}3 macros

Introduce two additional min/max macros to compare three operands. This
will save some cycles as well as some bytes on the stack and last but not
least more pleasing as macro nesting.

[akpm@linux-foundation.org: fix warnings]
Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Cc: Joe Perches <joe@perches.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# b28efd54 13-Oct-2010 Eric Paris <eparis@redhat.com>

kernel: roundup should only reference arguments once

Currently the roundup macro references it's arguments more than one time.
This patch changes it so it will only use its arguments once.

Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>


# 686a0f3d 13-Oct-2010 Eric Paris <eparis@redhat.com>

kernel: rounddown helper function

The roundup() helper function will round a given value up to a multiple of
another given value. aka roundup(11, 7) would give 14 = 7 * 2. This new
function does the opposite. It will round a given number down to the
nearest multiple of the second number: rounddown(11, 7) would give 7.

I need this in some future SELinux code and can carry the macro myself, but
figured I would put it in the core kernel so others might find and use it
if need be.

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>


# c8bf1336 10-Sep-2010 Martin K. Petersen <martin.petersen@oracle.com>

Consolidate min_not_zero

We have several users of min_not_zero, each of them using their own
definition. Move the define to kernel.h.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@carl.home.kernel.dk>


# 12fdff3f 12-Aug-2010 David Howells <dhowells@redhat.com>

Add a dummy printk function for the maintenance of unused printks

Add a dummy printk function for the maintenance of unused printks through gcc
format checking, and also so that side-effect checking is maintained too.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 863a6049 10-Aug-2010 Anton Blanchard <anton@samba.org>

lib/bug.c: add oops end marker to WARN implementation

We are missing the oops end marker for the exception based WARN implementation
in lib/bug.c. This is useful for logfile analysis tools.

Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# c7ff0d9c 10-Aug-2010 TAMUKI Shoichi <tamuki@linet.gr.jp>

panic: keep blinking in spite of long spin timer mode

To keep panic_timeout accuracy when running under a hypervisor, the
current implementation only spins on long time (1 second) calls to mdelay.
That brings a good effect, but the problem is the keyboard LEDs don't
blink at all on that situation.

This patch changes to call to panic_blink_enter() between every mdelay and
keeps blinking in spite of long spin timer mode.

The time to call to mdelay is now 100ms. Even this change will keep
panic_timeout accuracy enough when running under a hypervisor.

Signed-off-by: TAMUKI Shoichi <tamuki@linet.gr.jp>
Cc: Ben Dooks <ben-linux@fluff.org>
Cc: Russell King <linux@arm.linux.org.uk>
Acked-by: Dmitry Torokhov <dtor@mail.ru>
Cc: Anton Blanchard <anton@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# cf4ca487 09-Aug-2010 Joe Perches <joe@perches.com>

kernel.h: remove unused NIPQUAD and NIPQUAD_FMT

There are no more uses of NIPQUAD or NIPQUAD_FMT. Remove the definitions.

Signed-off-by: Joe Perches <joe@perches.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# eb7beb5c 15-Jul-2010 Frederic Weisbecker <fweisbec@gmail.com>

tracing: Remove special traces

Special traces type was only used by sysprof. Lets remove it now
that sysprof ftrace plugin has been dropped.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Soeren Sandmann <sandmann@daimi.au.dk>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>


# 69c8f52b 01-Jul-2010 Justin P. Mattock <justinmattock@gmail.com>

fix #warning about using kernel headers in userpsace

Move the preprocessor #warning message:
warning: #warning Attempt to use kernel headers from user space,
see http://kernelnewbies.org/KernelHeaders
from kernel.h to types.h.

And also fixe the #warning message due to the preprocessor not being able to
read the web address due to it thinking it was the start of a comment. also
remove the extra #ifndef _KERNEL_ since it's already there.

Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>


# 7db6f5fb 26-Jun-2010 Joe Perches <joe@perches.com>

vsprintf: Recursive vsnprintf: Add "%pV", struct va_format

Add the ability to print a format and va_list from a structure pointer

Allows __dev_printk to be implemented as a single printk while
minimizing string space duplication.

%pV should not be used without some mechanism to verify the
format and argument use ala __attribute__(format (printf(...))).

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c6de9f08 31-May-2010 Huang Ying <ying.huang@intel.com>

x86, mce: Add HW_ERR printk prefix for hardware error logging

This makes hardware error related log in printk log more explicit. So
that the users can report it to hardware vendor instead of LKML or
software vendor.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
LKML-Reference: <1275295689.3444.462.camel@yhuang-dev.sh.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>


# 90378889 24-May-2010 Andy Shevchenko <ext-andriy.shevchenko@nokia.com>

lib: introduce common method to convert hex digits

hex_to_bin() is a little method which converts hex digit to its actual
value. There are plenty of places where such functionality is needed.

[akpm@linux-foundation.org: use tolower(), saving 3 bytes, test the more common case first - it's quicker]
[akpm@linux-foundation.org: relocate tolower to make it even faster! (Joe)]
Signed-off-by: Andy Shevchenko <ext-andriy.shevchenko@nokia.com>
Cc: Tilman Schmidt <tilman@imap.cc>
Cc: Duncan Sands <duncan.sands@free.fr>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: "Richard Russon (FlatCap)" <ldm@flatcap.org>
Cc: John W. Linville <linville@tuxdriver.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# d8521fcc 24-May-2010 OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

printk_ratelimited(): fix uninitialized spinlock

ratelimit_state initialization of printk_ratelimited() seems broken. This
fixes it by using DEFINE_RATELIMIT_STATE() to initialize spinlock
properly.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# fc62f2f1 24-May-2010 Joe Perches <joe@perches.com>

kernel.h: add pr_warn for symmetry to dev_warn, netdev_warn

The current logging macros are
pr_<level>, dev_<level>, netdev_<level>, and netif_<level>.
pr_ uses warning, the other use warn.

Standardize these logging macros a bit more by adding pr_warn and
pr_warn_ratelimited.

Right now, there are:

$ for level in emerg alert crit err warn warning notice info ; do \
for prefix in pr dev netdev netif ; do \
echo -n "${prefix}_${level}: `git grep -w "${prefix}_${level}" | wc -l` " ; \
done ; \
echo ; \
done
pr_emerg: 45 dev_emerg: 4 netdev_emerg: 1 netif_emerg: 4
pr_alert: 24 dev_alert: 36 netdev_alert: 1 netif_alert: 6
pr_crit: 24 dev_crit: 22 netdev_crit: 1 netif_crit: 4
pr_err: 2013 dev_err: 8467 netdev_err: 267 netif_err: 240
pr_warn: 0 dev_warn: 1818 netdev_warn: 126 netif_warn: 23
pr_warning: 773 dev_warning: 0 netdev_warning: 0 netif_warning: 0
pr_notice: 148 dev_notice: 111 netdev_notice: 9 netif_notice: 3
pr_info: 1717 dev_info: 3007 netdev_info: 101 netif_info: 85

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 4be929be 24-May-2010 Alexey Dobriyan <adobriyan@gmail.com>

kernel-wide: replace USHORT_MAX, SHORT_MAX and SHORT_MIN with USHRT_MAX, SHRT_MAX and SHRT_MIN

- C99 knows about USHRT_MAX/SHRT_MAX/SHRT_MIN, not
USHORT_MAX/SHORT_MAX/SHORT_MIN.

- Make SHRT_MIN of type s16, not int, for consistency.

[akpm@linux-foundation.org: fix drivers/dma/timb_dma.c]
[akpm@linux-foundation.org: fix security/keys/keyring.c]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 92946bc7 03-Apr-2010 Ben Hutchings <ben@decadent.org.uk>

panic: Add taint flag TAINT_FIRMWARE_WORKAROUND ('I')

This taint flag will initially be used when warning about invalid ACPI
DMAR tables.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>


# cecbca96 18-Apr-2010 Frederic Weisbecker <fweisbec@gmail.com>

tracing: Dump either the oops's cpu source or all cpus buffers

The ftrace_dump_on_oops kernel parameter, sysctl and sysrq let one
dump every cpu buffers when an oops or panic happens.

It's nice when you have few cpus but it may take ages if have many,
plus you miss the real origin of the problem in all the cpu traces.

Sometimes, all you need is to dump the cpu buffer that triggered the
opps, most of the time it is our main interest.

This patch modifies ftrace_dump_on_oops to handle this choice.

The ftrace_dump_on_oops kernel parameter, when it comes alone, has
the same behaviour than before. But ftrace_dump_on_oops=orig_cpu
will only dump the buffer of the cpu that oops'ed.

Similarly, sysctl kernel.ftrace_dump_on_oops=1 and
echo 1 > /proc/sys/kernel/ftrace_dump_on_oops keep their previous
behaviour. But setting 2 jumps into cpu origin dump mode.

v2: Fix double setup
v3: Fix spelling issues reported by Randy Dunlap
v4: Also update __ftrace_dump in the selftests

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>


# 9f93ff5b 13-Apr-2010 Alexey Dobriyan <adobriyan@gmail.com>

Restore __ALIGN_MASK()

Fix lib/bitmap.c compile failure due to __ALIGN_KERNEL changes.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>


# a79ff731 13-Apr-2010 Alexey Dobriyan <adobriyan@gmail.com>

netfilter: xtables: make XT_ALIGN() usable in exported headers by exporting __ALIGN_KERNEL()

XT_ALIGN() was rewritten through ALIGN() by commit 42107f5009da223daa800d6da6904d77297ae829
"netfilter: xtables: symmetric COMPAT_XT_ALIGN definition".
ALIGN() is not exported in userspace headers, which created compile problem for tc(8)
and will create problem for iptables(8).

We can't export generic looking name ALIGN() but we can export less generic
__ALIGN_KERNEL() (suggested by Ben Hutchings).
Google knows nothing about __ALIGN_KERNEL().

COMPAT_XT_ALIGN() changed for symmetry.

Reported-by: Andreas Henriksson <andreas@fatal.se>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>


# bb1dc0ba 06-Apr-2010 Yong Zhang <yong.zhang@windriver.com>

kernel.h: fix wrong usage of __ratelimit()

When __ratelimit() returns 1 this means that we can go ahead.

Signed-off-by: Yong Zhang <yong.zhang@windriver.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 9b3be9f9 10-Feb-2010 Yinghai Lu <yinghai@kernel.org>

Move round_up/down to kernel.h

... in preparation of moving early_res to kernel/.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1265793639-15071-26-git-send-email-yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>


# cc8ef6eb 15-Jan-2010 Roland Dreier <rdreier@cisco.com>

kernel.h: add BUILD_BUG_ON_NOT_POWER_OF_2()

Add BUILD_BUG_ON_NOT_POWER_OF_2()

When code relies on a constant being a power of 2:

#define FOO 512 /* must be a power of 2 */

it would be nice to be able to do:

BUILD_BUG_ON(!is_power_of_2(FOO));

However applying an inline function does not result in a compile-time
constant that can be used with BUILD_BUG_ON(), so trying that gives
results in:

error: bit-field '<anonymous>' width not an integer constant

As suggested by akpm, rather than monkeying around with is_power_of_2()
and risking gcc warts about constant expressions, just create a macro
BUILD_BUG_ON_NOT_POWER_OF_2() to encapsulate this common requirement.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: David Dillow <dave@thedillows.org>
Cc: "Robert P. J. Day" <rpjday@crashcourse.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# d894837f 23-Dec-2009 Simon Kagstrom <simon.kagstrom@netinsight.net>

sched: might_sleep(): Make file parameter const char *

Fixes a warning when building with g++:

warning: deprecated conversion from string constant to 'char*'

And the file parameter use is constant, so mark it as such.

Signed-off-by: Simon Kagstrom <simon.kagstrom@netinsight.net>
Cc: peterz@infradead.org
LKML-Reference: <20091223110818.442d848e@marrow.netinsight.se>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 8a64f336 14-Dec-2009 Joe Perches <joe@perches.com>

kernel.h: add printk_ratelimited and pr_<level>_rl

Add a printk_ratelimited statement expression macro that uses a per-call
ratelimit_state so that multiple subsystems output messages are not
suppressed by a global __ratelimit state.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: s/_rl/_ratelimited/g]
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Naohiro Ooiwa <nooiwa@miraclelinux.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 0b2749aa 14-Dec-2009 Joe Perches <joe@perches.com>

kernel.h: remove initialization of bool in printk_once

Don't initialize __print_once. Invert the test to reduce initialized
data.

defconfig before: $size vmlinux
text data bss dec hex filename
6976022 679572 1359668 9015262 898fde vmlinux

defconfig after: $size vmlinux
text data bss dec hex filename
6976006 679508 1359700 9015214 898fae vmlinux

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 00b55864 14-Dec-2009 Joe Perches <joe@perches.com>

dynamic_debug.h/kernel.h: Remove KBUILD_MODNAME from dynamic_pr_debug

If CONFIG_DYNAMIC_DEBUG is enabled and a source file has:

#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
#include <linux/kernel.h>

dynamic_debug.h will duplicate KBUILD_MODNAME
in the output string.

Remove the use of KBUILD_MODNAME from the
output format string generated by dynamic_debug.h

If CONFIG_DYNAMIC_DEBUG is not enabled, no compile-time
check is done to printk/dev_printk arguments.

Add it.

Signed-off-by: Joe Perches <joe@perches.com>
Cc: Jason Baron <jbaron@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# e36c5458 14-Dec-2009 Steven Rostedt <srostedt@redhat.com>

tracing: Fix return of trace_dump_stack()

The trace_dump_stack() returned a value for a void function.

Also, added the missing stub for trace_dump_stack() when tracing is
not configured.

Reported-by: Ingo Molnar <mingo@elte.hu>
LKML-Reference: <20091214162713.GA31060@elte.hu>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 03889384 11-Dec-2009 Steven Rostedt <srostedt@redhat.com>

tracing: Add trace_dump_stack()

I've been asked a few times about how to find out what is calling
some location in the kernel. One way is to use dynamic function tracing
and implement the func_stack_trace. But this only finds out who is
calling a particular function. It does not tell you who is calling
that function and entering a specific if conditional.

I have myself implemented a quick version of trace_dump_stack() for
this purpose a few times, and just needed it now. This is when I realized
that this would be a good tool to have in the kernel like trace_printk().

Using trace_dump_stack() is similar to dump_stack() except that it
writes to the trace buffer instead and can be used in critical locations.

For example:

@@ -5485,8 +5485,12 @@ need_resched_nonpreemptible:
if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {
if (unlikely(signal_pending_state(prev->state, prev)))
prev->state = TASK_RUNNING;
- else
+ else {
deactivate_task(rq, prev, 1);
+ trace_printk("Deactivating task %s:%d\n",
+ prev->comm, prev->pid);
+ trace_dump_stack();
+ }
switch_count = &prev->nvcsw;
}

Produces:

<...>-3249 [001] 296.105269: schedule: Deactivating task ntpd:3249
<...>-3249 [001] 296.105270: <stack trace>
=> schedule
=> schedule_hrtimeout_range
=> poll_schedule_timeout
=> do_select
=> core_sys_select
=> sys_select
=> system_call_fastpath

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 5c828713 23-Oct-2009 Christian Borntraeger <borntraeger@de.ibm.com>

ratelimit: Make suppressed output messages more useful

Today I got:

[39648.224782] Registered led device: iwl-phy0::TX
[40676.545099] __ratelimit: 246 callbacks suppressed
[40676.545103] abcdef[23675]: segfault at 0 ...

as you can see the ratelimit message contains a function prefix.
Since this is always __ratelimit, this wont help much.

This patch changes __ratelimit and printk_ratelimit to print the
function name that calls ratelimit.

This will pinpoint the responsible function, as long as not several
different places call ratelimit with the same ratelimit state at
the same time. In that case we catch only one random function that
calls ratelimit after the wait period.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dave Young <hidave.darkstar@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
CC: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <200910231458.11832.borntraeger@de.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# c01226c3 21-Sep-2009 Arnd Bergmann <arnd@arndb.de>

warn about use of uninstalled kernel headers

User applications frequently hit problems when they try to use
the kernel headers directly, rather than the exported headers.

This adds an explicit warning for this case, and points to
a URL holding an explanation of why this is wrong and what
to do about it.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>


# a49c59c0 22-Sep-2009 Rolf Eike Beer <eike-kernel@sf-tec.de>

Make sure the value in abs() does not get truncated if it is greater than 2^32

abs() will truncate the input if is it outside the 2^32 range. Fix that
by assuming `long' input.

This might generate worse code in the common case.

Signed-off-by: Rolf Eike Beer <eike-kernel@sf-tec.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 8c87df45 22-Sep-2009 Jan Beulich <JBeulich@novell.com>

BUILD_BUG_ON(): fix it and a couple of bogus uses of it

gcc permitting variable length arrays makes the current construct used for
BUILD_BUG_ON() useless, as that doesn't produce any diagnostic if the
controlling expression isn't really constant. Instead, this patch makes
it so that a bit field gets used here. Consequently, those uses where the
condition isn't really constant now also need fixing.

Note that in the gfp.h, kmemcheck.h, and virtio_config.h cases
MAYBE_BUILD_BUG_ON() really just serves documentation purposes - even if
the expression is compile time constant (__builtin_constant_p() yields
true), the array is still deemed of variable length by gcc, and hence the
whole expression doesn't have the intended effect.

[akpm@linux-foundation.org: make arch/sparc/include/asm/vio.h compile]
[akpm@linux-foundation.org: more nonsensical assertions in tpm.c..]
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Rajiv Andrade <srajiv@linux.vnet.ibm.com>
Cc: Mimi Zohar <zohar@us.ibm.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 70867453 22-Sep-2009 Roland Dreier <rdreier@cisco.com>

printk_once(): use bool for boolean flag

Using the type bool (instead of int) for the __print_once flag in the
printk_once() macro matches the intent of the code better, and allows the
compiler to generate smaller code; eg a typical callsite with gcc 4.3.3 on
i386:

add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-6 (-6)
function old new delta
static.__print_once 4 1 -3
get_cpu_vendor 146 143 -3

Saving 6 bytes of object size per callsite by slightly improving the
readability of the source seems like a win to me.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# af91322e 22-Sep-2009 Dave Young <hidave.darkstar@gmail.com>

printk: add printk_delay to make messages readable for some scenarios

When syslog is not possible, at the same time there's no serial/net
console available, it will be hard to read the printk messages. For
example oops/panic/warning messages in shutdown phase.

Add a printk delay feature, we can make each printk message delay some
milliseconds.

Setting the delay by proc/sysctl interface: /proc/sys/kernel/printk_delay

The value range from 0 - 10000, default value is 0

[akpm@linux-foundation.org: fix a few things]
Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 3fff4c42 22-Sep-2009 Ingo Molnar <mingo@elte.hu>

printk: Remove ratelimit.h from kernel.h

Decouple kernel.h from ratelimit.h: the global declaration of
printk's ratelimit_state is not needed, and it leads to messy
circular dependencies due to ratelimit.h's (new) adding of a
spinlock_types.h include.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: David S. Miller <davem@davemloft.net>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# e4aafea2 16-Jul-2009 Frederic Weisbecker <fweisbec@gmail.com>

sched: Add a preempt count base offset to __might_sleep()

Add a preempt count base offset to compare against the current
preempt level count. It prepares to pull up the might_sleep
check from cond_resched() to cond_resched_lock() and
cond_resched_bh().

For these two helpers, we need to respectively ensure that once
we'll unlock the given spinlock / reenable local softirqs, we
will reach a sleepable state.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
[ Move and rename preempt_count_equals() ]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1247725694-6082-4-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# e09758fa 16-Jul-2009 Frederic Weisbecker <fweisbec@gmail.com>

sched: Cover the CONFIG_DEBUG_SPINLOCK_SLEEP off-case for __might_sleep()

Cover the off case for __might_sleep(), so that we avoid
#ifdefs in files that make use of it. Especially, this prepares
for the __might_sleep() pull up on cond_resched().

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1247725694-6082-3-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 5211a242 24-Jun-2009 Kurt Garloff <garloff@suse.de>

x86: Add sysctl to allow panic on IOCK NMI error

This patch introduces a new sysctl:

/proc/sys/kernel/panic_on_io_nmi

which defaults to 0 (off).

When enabled, the kernel panics when the kernel receives an NMI
caused by an IO error.

The IO error triggered NMI indicates a serious system
condition, which could result in IO data corruption. Rather
than contiuing, panicing and dumping might be a better choice,
so one can figure out what's causing the IO error.

This could be especially important to companies running IO
intensive applications where corruption must be avoided, e.g. a
bank's databases.

[ SuSE has been shipping it for a while, it was done at the
request of a large database vendor, for their users. ]

Signed-off-by: Kurt Garloff <garloff@suse.de>
Signed-off-by: Roberto Angelino <robertangelino@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
LKML-Reference: <20090624213211.GA11291@kroah.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 90c699a9 19-Jun-2009 Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>

block: rename CONFIG_LBD to CONFIG_LBDAF

Follow-up to "block: enable by default support for large devices
and files on 32-bit archs".

Rename CONFIG_LBD to CONFIG_LBDAF to:
- allow update of existing [def]configs for "default y" change
- reflect that it is used also for large files support nowadays

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>


# e28d7137 16-Jun-2009 Linus Torvalds <torvalds@linux-foundation.org>

printk: Add KERN_DEFAULT printk log-level

This adds a KERN_DEFAULT loglevel marker, for when you cannot decide
which loglevel you want, and just want to keep an existing printk
with the default loglevel.

The difference between having KERN_DEFAULT and having no log-level
marker at all is two-fold:

- having the log-level marker will now force a new-line if the
previous printout had not added one (perhaps because it forgot,
but perhaps because it expected a continuation)

- having a log-level marker is required if you are printing out a
message that otherwise itself could perhaps otherwise be mistaken
for a log-level.

Signed-of-by: Linus Torvalds <torvalds@linux-foundation.org>


# 5fd29d6c 16-Jun-2009 Linus Torvalds <torvalds@linux-foundation.org>

printk: clean up handling of log-levels and newlines

It used to be that we would only look at the log-level in a printk()
after explicit newlines, which can cause annoying problems when the
previous printk() did not end with a '\n'. In that case, the log-level
marker would be just printed out in the middle of the line, and be
seen as just noise rather than change the logging level.

This changes things to always look at the log-level in the first
bytes of the printout. If a log level marker is found, it is always
used as the log-level. Additionally, if no newline existed, one is
added (unless the log-level is the explicit KERN_CONT marker, to
explicitly show that it's a continuation of a previous line).

Acked-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 156f5a78 02-Jun-2009 GeunSik Lim <leemgs1@gmail.com>

debugfs: Fix terminology inconsistency of dir name to mount debugfs filesystem.

Many developers use "/debug/" or "/debugfs/" or "/sys/kernel/debug/"
directory name to mount debugfs filesystem for ftrace according to
./Documentation/tracers/ftrace.txt file.

And, three directory names(ex:/debug/, /debugfs/, /sys/kernel/debug/) is
existed in kernel source like ftrace, DRM, Wireless, Documentation,
Network[sky2]files to mount debugfs filesystem.

debugfs means debug filesystem for debugging easy to use by greg kroah
hartman. "/sys/kernel/debug/" name is suitable as directory name
of debugfs filesystem.
- debugfs related reference: http://lwn.net/Articles/334546/

Fix inconsistency of directory name to mount debugfs filesystem.

* From Steven Rostedt
- find_debugfs() and tracing_files() in this patch.

Signed-off-by: GeunSik Lim <geunsik.lim@samsung.com>
Acked-by : Inaky Perez-Gonzalez <inaky@linux.intel.com>
Reviewed-by : Steven Rostedt <rostedt@goodmis.org>
Reviewed-by : James Smart <james.smart@emulex.com>
CC: Jiri Kosina <trivial@kernel.org>
CC: David Airlie <airlied@linux.ie>
CC: Peter Osterlund <petero2@telia.com>
CC: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
CC: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
CC: Masami Hiramatsu <mhiramat@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# 4ccb4579 09-Apr-2009 Michael Ellerman <michael@ellerman.id.au>

dynamic debug: resurrect old pr_debug() semantics as pr_devel()

pr_debug() used to produce zero code unless DEBUG was #defined. This is
now no longer the case in practice[1].

There are places where it's useful to have debugging printks, but we don't
want them to generate any code in production kernels.

So add a new macro, pr_devel(), for _devel_opment, to provide the old
semantics, ie. if the programmer doesn't explicitly enable debugging, no
code is produced.

[1]: You can turn CONFIG_DYNAMIC_DEBUG off, but it's enabled in at least
one distro kernel, so it's not really a solution.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Greg Banks <gnb@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# 04d491ab 02-Apr-2009 Neil Horman <nhorman@tuxdriver.com>

kexec: add dmesg log symbols to /proc/vmcoreinfo lists

It would be nice to be able to extract the dmesg log from a vmcore file
without needing to keep the debug symbols for the running kernel handy all
the time. We have a facility to do this in /proc/vmcore. This patch adds
the log_buf and log_end symbols to the vmcoreinfo area so that tools (like
makedumpfile) can easily extract the dmesg logs from a vmcore image.

[akpm@linux-foundation.org: several fixes and cleanups]
[akpm@linux-foundation.org: fix unused log_buf_kexec_setup()]
[akpm@linux-foundation.org: build fix]
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Cc: Simon Horman <horms@verge.net.au>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Simon Horman <horms@verge.net.au>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 311d0761 31-Mar-2009 Cyrill Gorcunov <gorcunov@gmail.com>

introduce pr_cont() macro

We cover all log-levels by pr_... macros except KERN_CONT one. Add it
for convenience.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 2c60b688 28-Mar-2009 Harvey Harrison <harvey.harrison@gmail.com>

kernel: remove HIPQUAD()

All users have been removed.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e6e66b02 11-Mar-2009 Greg Banks <gnb@sgi.com>

Dynamic debug: fix pr_fmt() build error

When CONFIG_DYNAMIC_DEBUG is enabled, allow callers of pr_debug()
to provide their own definition of pr_fmt() even if that definition
uses tricks like

#define pr_fmt(fmt) "%s:" fmt, __func__

Signed-off-by: Greg Banks <gnb@sgi.com>
Cc: Jason Baron <jbaron@redhat.com>
Acked-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# e9d376f0 05-Feb-2009 Jason Baron <jbaron@redhat.com>

dynamic debug: combine dprintk and dynamic printk

This patch combines Greg Bank's dprintk() work with the existing dynamic
printk patchset, we are now calling it 'dynamic debug'.

The new feature of this patchset is a richer /debugfs control file interface,
(an example output from my system is at the bottom), which allows fined grained
control over the the debug output. The output can be controlled by function,
file, module, format string, and line number.

for example, enabled all debug messages in module 'nf_conntrack':

echo -n 'module nf_conntrack +p' > /mnt/debugfs/dynamic_debug/control

to disable them:

echo -n 'module nf_conntrack -p' > /mnt/debugfs/dynamic_debug/control

A further explanation can be found in the documentation patch.

Signed-off-by: Greg Banks <gnb@sgi.com>
Signed-off-by: Jason Baron <jbaron@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# 48ead020 12-Mar-2009 Frederic Weisbecker <fweisbec@gmail.com>

tracing/core: bring back raw trace_printk for dynamic formats strings

Impact: fix callsites with dynamic format strings

Since its new binary implementation, trace_printk() internally uses static
containers for the format strings on each callsites. But the value is
assigned once at build time, which means that it can't take dynamic
formats.

So this patch unearthes the raw trace_printk implementation for the callers
that will need trace_printk to be able to carry these dynamic format
strings. The trace_printk() macro will use the appropriate implementation
for each callsite. Most of the time however, the binary implementation will
still be used.

The other impact of this patch is that mmiotrace_printk() will use the old
implementation because it calls the low level trace_vprintk and we can't
guess here whether the format passed in it is dynamic or not.

Some parts of this patch have been written by Steven Rostedt (most notably
the part that chooses the appropriate implementation for each callsites).

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>


# 7bffc23e 09-Mar-2009 Ingo Molnar <mingo@elte.hu>

tracing: optimize trace_printk()

Impact: micro-optimization

trace_printk() does this unconditionally:

trace_printk_fmt = fmt;

Where trace_printk_fmt is an entry into a global array. This is
very SMP-unfriendly.

So only write it once per bootup.

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <1236356510-8381-5-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 769b0441 06-Mar-2009 Frederic Weisbecker <fweisbec@gmail.com>

tracing/core: drop the old trace_printk() implementation in favour of trace_bprintk()

Impact: faster and lighter tracing

Now that we have trace_bprintk() which is faster and consume lesser
memory than trace_printk() and has the same purpose, we can now drop
the old implementation in favour of the binary one from trace_bprintk(),
which means we move all the implementation of trace_bprintk() to
trace_printk(), so the Api doesn't change except that we must now use
trace_seq_bprintk() to print the TRACE_PRINT entries.

Some changes result of this:

- Previously, trace_bprintk depended of a single tracer and couldn't
work without. This tracer has been dropped and the whole implementation
of trace_printk() (like the module formats management) is now integrated
in the tracing core (comes with CONFIG_TRACING), though we keep the file
trace_printk (previously trace_bprintk.c) where we can find the module
management. Thus we don't overflow trace.c

- changes some parts to use trace_seq_bprintk() to print TRACE_PRINT entries.

- change a bit trace_printk/trace_vprintk macros to support non-builtin formats
constants, and fix 'const' qualifiers warnings. But this is all transparent for
developers.

- etc...

V2:

- Rebase against last changes
- Fix mispell on the changelog

V3:

- Rebase against last changes (moving trace_printk() to kernel.h)

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <1236356510-8381-5-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 2002c258 05-Mar-2009 Steven Rostedt <srostedt@redhat.com>

tracing: add tracing_on/tracing_off to kernel.h

Impact: cleanup

The functions tracing_start/tracing_stop have been moved to kernel.h.
These are not the functions a developer most likely wants to use
when they want to insert a place to stop tracing and restart it from
user space.

tracing_start/tracing_stop was created to work with things like
suspend to ram, where even calling smp_processor_id() can crash the
system. The tracing_start/tracing_stop was used to stop the tracer from
doing anything. These are still light weight functions, but add a bit
more overhead to be able to stop the tracers. They also have no interface
back to userland. That is, if the kernel calls tracing_stop, userland
can not start tracing.

What a developer most likely wants to use is tracing_on/tracing_off.
These are very light weight functions (simply sets or clears a bit).
These functions just stop recording into the ring buffer. The tracers
don't even know that this happens except that they would receive NULL
from the ring_buffer_lock_reserve function.

Also, there's a way for the user land to enable or disable this bit.
In debugfs/tracing/tracing_on, a user may echo "0" (same as tracing_off())
or echo "1" (same as tracing_on()) into this file. This becomes handy when
a kernel developer is debugging and wants tracing to turn off when it
hits an anomaly. Then the developer can examine the trace, and restart
tracing if they want to try again (echo 1 > tracing_on).

This patch moves the prototypes for tracing_on/tracing_off to kernel.h
and comments their use, so that a kernel developer will know how
to use them.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>


# 526211bc 05-Mar-2009 Ingo Molnar <mingo@elte.hu>

tracing: move utility functions from ftrace.h to kernel.h

Make common utility functions such as trace_printk() and
tracing_start()/tracing_stop() generally available to kernel
code.

Cc: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# ac7b9004 04-Feb-2009 Peter Zijlstra <peterz@infradead.org>

generic swap(): don't return a value from swap()

The swap() macro is accidentally retuning the value of its first argument.
Change it into a doesn't-return-anything macro before someone goes and
relies upon this behaviour.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wu Fengguang <wfg@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# f036be96 05-Feb-2009 Ingo Molnar <mingo@elte.hu>

printk: introduce printk_once()

This pattern shows up frequently in the kernel:

static int once = 1;
...

if (once) {
once = 0;
printk(KERN_ERR "message\n");
}
...

So add a printk_once() helper macro that reduces this to a single line
of:

printk_once(KERN_ERR "message\n");

It works analogously to WARN_ONCE() & friends. (We use a macro not
an inline because vararg expansion in inlines looks awkward and the
macro is simple enough.)

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 91f68b73 07-Jan-2009 Wu Fengguang <fengguang.wu@intel.com>

generic swap(): introduce global macro swap(a, b)

There have been some local definitions of swap(), it's time to replace
them all with a uniform one.

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 9fe06081 06-Jan-2009 Darrick J. Wong <djwong@us.ibm.com>

Create a DIV_ROUND_CLOSEST macro to do division with rounding

Create a helper macro to divide two numbers and round the result to the
nearest whole number. This is a helper macro for hwmon drivers that want
to convert incoming sysfs values per standard hwmon practice, though the
macro itself can be used by anyone.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# d0d85ff9 04-Dec-2008 Cornelia Huck <cornelia.huck@de.ibm.com>

Make DEBUG take precedence over DYNAMIC_PRINTK_DEBUG

Statically defined DEBUG should take precedence over
dynamically enabled debugging; otherwise adding DEBUG
(like, for example, via CONFIG_DEBUG_KOBJECT) does not
have the expected result of printing pr_debug() and dev_dbg()
messages unconditionally.

Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Acked-by: Jason Baron <jbaron@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# d091c2f5 12-Nov-2008 Martin Schwidefsky <schwidefsky@de.ibm.com>

Add 'pr_fmt()' format modifier to pr_xyz macros.

A common reason for device drivers to implement their own printk macros
is the lack of a printk prefix with the standard pr_xyz macros.
Introduce a pr_fmt() macro that is applied for every pr_xyz macro to the
format string.

The most common use of the pr_fmt macro would be to add the name of the
device driver to all pr_xyz messages in a source file.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 7106a27b 29-Oct-2008 Randy Dunlap <randy.dunlap@oracle.com>

kernel.h: fix might_sleep kernel-doc

Put the kernel-doc for might_sleep() _immediately_ before the macro
(no intervening lines). Otherwise kernel-doc complains like so:

Warning(linux-2.6.27-rc3-git2//include/linux/kernel.h:129): No description found for parameter 'file'
Warning(linux-2.6.27-rc3-git2//include/linux/kernel.h:129): No description found for parameter 'line'

because kernel-doc is looking at the wrong function prototype (i.e.,
__might_sleep). [Yes, I have a todo note to myself to check/warn for that
inconsistency in scripts/kernel-doc.]

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: <Uwe.Kleine-Koenig@digi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# b189db5d 28-Oct-2008 Harvey Harrison <harvey.harrison@gmail.com>

net: remove NIP6(), NIP6_FMT, NIP6_SEQFMT and final users

Open code NIP6_FMT in the one call inside sscanf and one user
of NIP6() that could use %p6 in the netfilter code.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 25ddbb18 15-Oct-2008 Andi Kleen <andi@firstfloor.org>

Make the taint flags reliable

It's somewhat unlikely that it happens, but right now a race window
between interrupts or machine checks or oopses could corrupt the tainted
bitmap because it is modified in a non atomic fashion.

Convert the taint variable to an unsigned long and use only atomic bit
operations on it.

Unfortunately this means the intvec sysctl functions cannot be used on it
anymore.

It turned out the taint sysctl handler could actually be simplified a bit
(since it only increases capabilities) so this patch actually removes
code.

[akpm@linux-foundation.org: remove unneeded include]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 346e15be 12-Aug-2008 Jason Baron <jbaron@redhat.com>

driver core: basic infrastructure for per-module dynamic debug messages

Base infrastructure to enable per-module debug messages.

I've introduced CONFIG_DYNAMIC_PRINTK_DEBUG, which when enabled centralizes
control of debugging statements on a per-module basis in one /proc file,
currently, <debugfs>/dynamic_printk/modules. When, CONFIG_DYNAMIC_PRINTK_DEBUG,
is not set, debugging statements can still be enabled as before, often by
defining 'DEBUG' for the proper compilation unit. Thus, this patch set has no
affect when CONFIG_DYNAMIC_PRINTK_DEBUG is not set.

The infrastructure currently ties into all pr_debug() and dev_dbg() calls. That
is, if CONFIG_DYNAMIC_PRINTK_DEBUG is set, all pr_debug() and dev_dbg() calls
can be dynamically enabled/disabled on a per-module basis.

Future plans include extending this functionality to subsystems, that define
their own debug levels and flags.

Usage:

Dynamic debugging is controlled by the debugfs file,
<debugfs>/dynamic_printk/modules. This file contains a list of the modules that
can be enabled. The format of the file is as follows:

<module_name> <enabled=0/1>
.
.
.

<module_name> : Name of the module in which the debug call resides
<enabled=0/1> : whether the messages are enabled or not

For example:

snd_hda_intel enabled=0
fixup enabled=1
driver enabled=0

Enable a module:

$echo "set enabled=1 <module_name>" > dynamic_printk/modules

Disable a module:

$echo "set enabled=0 <module_name>" > dynamic_printk/modules

Enable all modules:

$echo "set enabled=1 all" > dynamic_printk/modules

Disable all modules:

$echo "set enabled=0 all" > dynamic_printk/modules

Finally, passing "dynamic_printk" at the command line enables
debugging for all modules. This mode can be turned off via the above
disable command.

[gkh: minor cleanups and tweaks to make the build work quietly]

Signed-off-by: Jason Baron <jbaron@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# 29e71abf 14-Aug-2008 Steven Rostedt <rostedt@goodmis.org>

ftrace: rebuild everything on change to FTRACE_MCOUNT_RECORD

When enabling or disabling CONFIG_FTRACE_MCOUNT_RECORD, we want a full
kernel compile to handle the adding of the __mcount_loc sections.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 46eaa670 12-Oct-2008 Ingo Molnar <mingo@elte.hu>

x86: memory corruption check - cleanup

Move the prototypes from the generic kernel.h header to the more
appropriate include/asm-x86/bios_ebda.h header file.

Also, remove the check from the power management code - this is a
pure x86 matter for now.

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 061b1bd3 24-Sep-2008 Greg Kroah-Hartman <gregkh@suse.de>

Staging: add TAINT_CRAP for all drivers/staging code

We need to add a flag for all code that is in the drivers/staging/
directory to prevent all other kernel developers from worrying about
issues here, and to notify users that the drivers might not be as good
as they are normally used to.

Based on code from Andreas Gruenbacher and Jeff Mahoney to provide a
TAINT flag for the support level of a kernel module in the Novell
enterprise kernel release.

This is the kernel portion of this feature, the ability for the flag to
be set needs to be done in the build process and will happen in a
follow-up patch.

Cc: Andreas Gruenbacher <agruen@suse.de>
Cc: Jeff Mahoney <jeffm@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# a0ad05c7 01-Sep-2008 Thomas Renninger <trenn@suse.de>

Introduce FW_BUG, FW_WARN and FW_INFO to consistenly tell users about BIOS bugs

The idea is to add this to printk after the severity:
printk(KERN_ERR FW_BUG "This is not our fault, BIOS developer: fix it by
simply add ...\n");

If a Firmware issue should be hidden, because it is
work-arounded, but you still want to see something popping up e.g.
for info only:
printk(KERN_INFO FW_INFO "This is done stupid, we can handle it,
but it should better be avoided in future\n");

or on the Linuxfirmwarekit to tell vendors that they did something
stupid or wrong without bothering the user:
printk(KERN_INFO FW_BUG "This is done stupid, we can handle it,
but it should better be avoided in future\n");

Some use cases:
- If a user sees a [Firmware Bug] message in the kernel
he should first update the BIOS before wasting time with
debugging and submiting on old firmware code to mailing
lists.

- The linuxfirmwarekit (http://www.linuxfirmwarekit.org)
tries to detect firmware bugs. It currently is doing that
in userspace which results in:
- Huge test scripts that could be a one liner in the kernel
- A lot of BIOS bugs are already absorbed by the kernel

What do we need such a stupid linuxfirmwarekit for?

- Vendors: Can test their BIOSes for Linux compatibility.
There will be the time when vendors realize that the test utils
on Linux are more strict and using them increases the qualitity
and stability of their products.

- Vendors: Can easily fix up their BIOSes and be more Linux
compatible by:
dmesg |grep "Firmware Bug"
and send the result to their BIOS developer colleagues who should
know what the messages are about and how to fix them, without
the need of studying kernel code.

- Distributions: can do a first automated HW/BIOS checks.
This can then be done without the need of asking kernel developers
who need to dig down the code and explain the details.
Certification can/will just be rejected until
dmesg |grep "Firmware Bug" is empty.

- Thus this can be used as an instrument to enforce cleaner BIOS
code. Currently every stupid Windows ACPI bug is
re-implemented in Linux which is a rather unfortunate situation.
We already have the power to avoid this in e.g. memory
or cpu hot-plug ACPI implementations, because Linux certification
is a must for most vendors in the server area.
Working towards being able to do that in the laptop area
(vendors are starting to look at Linux here also and will use this tool)
is the goal. At least provide them a tool to make it as easy
for this guys (e.g. not needing to browse kernel code) as possible.

- The ordinary Linux user: can go into the next shop, boots the
firmwarekit on his most preferred machines. He chooses one without
BIOS bugs. Unsupported HW is ok, he likes to try out latest projects
which might support them or likes to dig on it on his own, but he
hates to workaround broken BIOSes like hell.

I double checked with the firmwarekit.
There they have:
So the mapping generally is (also depending on how likely the BIOS is
to blame, this could sometimes be difficult):
FW_INFO = INFO
FW_WARN = WARN
FW_BUG = FAIL

For more info about the linuxfirmwarekit and why this is needed
can be found here:
http://www.linuxfirmwarekit.org

While severity matches with the firmwarekit, it might be tricky
to hide messages from the user.
E.g. we recently found out that on HP BIOSes negative temperatures
are returned, which seem to indicate that the thermal zone is
invalid.
We can work around that gracefully by ignoring the thermal zone
and we do not want to bother the ordinary user with a frightening
message: Firmware Bug: thermal management absolutely broken
but want to hide it from the user.

But in the linuxfirmwarekit this should be shown as a real
show stopper (the temperatures could really be wrong,
broken thermal management is one of the worst things
that can happen and the BIOS guys of the machine must
implement this properly).

It is intended to do that (hide it from the user with
KERN_INFO msg, but still print it as a BIOS bug) by:
printk(KERN_INFO FW_BUG "Negativ temperature values detected.
Try to workarounded, BIOS must get fixed\n");
Hope that works out..., no idea how to better hide it
as printk is the only way to easily provide this functionality.

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 3ee1afa3 10-Sep-2008 Nick Piggin <npiggin@suse.de>

x86: some lock annotations for user copy paths, v2

- introduce might_fault()
- handle the atomic user copy paths correctly

[ mingo@elte.hu: move might_sleep() outside of in_atomic(). ]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# ab7476cf 15-Aug-2008 Arjan van de Ven <arjan@linux.intel.com>

debug: add notifier chain debugging, v2

- unbreak ia64 (and powerpc) where function pointers dont
point at code but at data (reported by Tony Luck)

[ mingo@elte.hu: various cleanups ]

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 9c3254ad 07-Sep-2008 Jeremy Fitzhardinge <jeremy@goop.org>

x86: fix compile error with corruption checking disabled

Fix compile error:

arch/x86/mm/init_32.c: In function 'mem_init':
arch/x86/mm/init_32.c:908: error: implicit declaration of function 'start_periodic_check_for_corruption'

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# bb577f98 07-Sep-2008 Hugh Dickins <hugh@veritas.com>

x86: add periodic corruption check

Perodically check for corruption in low phusical memory. Don't bother
checking at fault time, since it won't show anything useful.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 5394f80f 07-Sep-2008 Jeremy Fitzhardinge <jeremy@goop.org>

x86: check for and defend against BIOS memory corruption

Some BIOSes have been observed to corrupt memory in the low 64k. This
change:
- Reserves all memory which does not have to be in that area, to
prevent it from being used as general memory by the kernel. Things
like the SMP trampoline are still in the memory, however.
- Clears the reserved memory so we can observe changes to it.
- Adds a function check_for_bios_corruption() which checks and reports on
memory becoming unexpectedly non-zero. Currently it's called in the
x86 fault handler, and the powermanagement debug output.

Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 070cb065 12-Aug-2008 Uwe Kleine-König <Uwe.Kleine-Koenig@digi.com>

move kernel-doc comment for might_sleep directly before its defining block

Signed-off-by: Uwe Kleine-König <Uwe.Kleine-Koenig@digi.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# ced9cd40 11-Aug-2008 Ingo Molnar <mingo@elte.hu>

printk: robustify printk, fix

fix:

include/linux/kernel.h: In function ‘printk_needs_cpu':
include/linux/kernel.h:217: error: parameter name omitted

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# b845b517 08-Aug-2008 Peter Zijlstra <a.p.zijlstra@chello.nl>

printk: robustify printk

Avoid deadlocks against rq->lock and xtime_lock by deferring the klogd
wakeup by polling from the timer tick.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 204b885e 29-Jul-2008 Joerg Roedel <joerg.roedel@amd.com>

introduce lower_32_bits() macro

The file kernel.h contains the upper_32_bits macro. This patch adds the
other part, the lower_32_bits macro. Its first use will be in the driver
for AMD IOMMU.

Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# d974ae37 24-Jul-2008 Jeremy Fitzhardinge <jeremy@goop.org>

generic, memparse(): constify argument

memparse()'s first argument can be const, so it should be.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 717115e1 25-Jul-2008 Dave Young <hidave.darkstar@gmail.com>

printk ratelimiting rewrite

All ratelimit user use same jiffies and burst params, so some messages
(callbacks) will be lost.

For example:
a call printk_ratelimit(5 * HZ, 1)
b call printk_ratelimit(5 * HZ, 1) before the 5*HZ timeout of a, then b will
will be supressed.

- rewrite __ratelimit, and use a ratelimit_state as parameter. Thanks for
hints from andrew.

- Add WARN_ON_RATELIMIT, update rcupreempt.h

- remove __printk_ratelimit

- use __ratelimit in net_ratelimit

Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: Dave Young <hidave.darkstar@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# e0deaff4 25-Jul-2008 Andrew Morton <akpm@linux-foundation.org>

split the typecheck macros out of include/linux/kernel.h

Needed to fix up a recursive include snafu in
locking-add-typecheck-on-irqsave-and-friends-for-correct-flags.patch

Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# ca31e146 04-Jul-2008 Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>

Move _RET_IP_ and _THIS_IP_ to include/linux/kernel.h

These two macros are useful beyond lock debugging. Moved definitions from
include/linux/debug_locks.h to include/linux/kernel.h, so code that needs
them does not have to include the former, which would have been a less
intuitive choice of a header.

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# e17ba73b 12-May-2008 Jiri Slaby <jirislaby@kernel.org>

x86, generic: mark early_printk as asmlinkage

It's not explicitly marked as asmlinkage, but invoked from x86_32
startup code with parameters on stack.

No other architectures define early_printk and none of them are affected
by this change, since defines asmlinkage as empty token.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 42fdfa23 24-May-2008 Thomas Gleixner <tglx@linutronix.de>

namespacecheck: more kernel/printk.c fixes

[ Stephen Rothwell <sfr@canb.auug.org.au>: build fix ]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 3fc95772 14-May-2008 Harvey Harrison <harvey.harrison@gmail.com>

lib: create common ascii hex array

Add a common hex array in hexdump.c so everyone can use it.

Add a common hi/lo helper to avoid the shifting masking that is
done to get the upper and lower nibbles of a byte value.

Pull the pack_hex_byte helper from kgdb as it is opencoded many
places in the tree that will be consolidated.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# bdf4bbaa 30-Apr-2008 Harvey Harrison <harvey.harrison@gmail.com>

Add macros similar to min/max/min_t/max_t

Also, change the variable names used in the min/max macros to avoid shadowed
variable warnings when min/max min_t/max_t are nested.

Small formatting changes to make all the macros have a similar form.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix v4l build]
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Tejun Heo <htejun@gmail.com>
Cc: Michael Buesch <mb@bu3sch.de>
Cc: "John W. Linville" <linville@tuxdriver.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Dmitry Torokhov <dtor@mail.ru>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 44f564a4 29-Apr-2008 Zhang, Yanmin <yanmin_zhang@linux.intel.com>

ipc: add definitions of USHORT_MAX and others

Add definitions of USHORT_MAX and others into kernel. ipc uses it and slub
implementation might also use it.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
Reviewed-by: Christoph Lameter <clameter@sgi.com>
Cc: Nadia Derbey <Nadia.Derbey@bull.net>
Cc: "Pierre Peiffer" <peifferp@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 5f97a5a8 29-Apr-2008 Dave Young <hidave.darkstar@gmail.com>

isolate ratelimit from printk.c for other use

Due to the rcupreempt.h WARN_ON trigged, I got 2G syslog file. For some
serious complaining of kernel, we need repeat the warnings, so here I isolate
the ratelimit part of printk.c to a standalone file.

Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 95b570c9 29-Apr-2008 Nur Hussein <nurhussein@gmail.com>

Taint kernel after WARN_ON(condition)

The kernel is sent to tainted within the warn_on_slowpath() function, and
whenever a warning occurs the new taint flag 'W' is set. This is useful to
know if a warning occurred before a BUG by preserving the warning as a flag
in the taint state.

This does not work on architectures where WARN_ON has its own definition.
These archs are:
1. s390
2. superh
3. avr32
4. parisc

The maintainers of these architectures have been added in the Cc: list
in this email to alert them to the situation.

The documentation in oops-tracing.txt has been updated to include the
new flag.

Signed-off-by: Nur Hussein <nurhussein@gmail.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 1429db83 26-Feb-2008 Joe Perches <joe@perches.com>

driver core: Convert debug functions declared inline __attribute__((format (printf,x,y) to statement expression macros

When DEBUG is not defined, pr_debug and dev_dbg and some
other local debugging functions are specified as:

"inline __attribute__((format (printf, x, y)))"

This is done to validate printk arguments when not debugging.

Converting these functions to macros or statement expressions
"do { if (0) printk(fmt, ##arg); } while (0)"
or
"({ if (0) printk(fmt, ##arg); 0; })
makes at least gcc 4.2.2 produce smaller objects.

This has the additional benefit of allowing the optimizer to
avoid calling functions like print_mac that might have been
arguments to the printk.

defconfig x86 current:

$ size vmlinux
text data bss dec hex filename
4716770 474560 618496 5809826 58a6a2 vmlinux

all converted: (More patches follow)

$ size vmlinux
text data bss dec hex filename
4716642 474560 618496 5809698 58a622 vmlinux

Even kernel/sched.o, which doesn't even use these
functions, becomes smaller.

It appears that merely having an indirect include
of <linux/device.h> can cause bigger objects.

$ size sched.inline.o sched.if0.o
text data bss dec hex filename
31385 2854 328 34567 8707 sched.inline.o
31366 2854 328 34548 86f4 sched.if0.o

The current preprocessed only kernel/sched.i file contains:

# 612 "include/linux/device.h"
static inline __attribute__((always_inline)) int __attribute__ ((format (printf, 2, 3)))
dev_dbg(struct device *dev, const char *fmt, ...)
{
return 0;
}
# 628 "include/linux/device.h"
static inline __attribute__((always_inline)) int __attribute__ ((format (printf, 2, 3)))
dev_vdbg(struct device *dev, const char *fmt, ...)
{
return 0;
}

Removing these unused inlines from sched.i shrinks sched.o

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# 06b2a76d 08-Feb-2008 Yi Yang <yi.y.yang@intel.com>

Add new string functions strict_strto* and convert kernel params to use them

Currently, for every sysfs node, the callers will be responsible for
implementing store operation, so many many callers are doing duplicate
things to validate input, they have the same mistakes because they are
calling simple_strtol/ul/ll/uul, especially for module params, they are
just numeric, but you can echo such values as 0x1234xxx, 07777888 and
1234aaa, for these cases, module params store operation just ignores
succesive invalid char and converts prefix part to a numeric although input
is acctually invalid.

This patch tries to fix the aforementioned issues and implements
strict_strtox serial functions, kernel/params.c uses them to strictly
validate input, so module params will reject such values as 0x1234xxxx and
returns an error:

write error: Invalid argument

Any modules which export numeric sysfs node can use strict_strtox instead of
simple_strtox to reject any invalid input.

Here are some test results:

Before applying this patch:

[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 0x1000 > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 0x1000g > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 0x1000gggggggg > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 010000 > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 0100008 > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 010000aaaaa > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]#

After applying this patch:

[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 0x1000 > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 0x1000g > /sys/module/e1000/parameters/copybreak
-bash: echo: write error: Invalid argument
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 0x1000gggggggg > /sys/module/e1000/parameters/copybreak
-bash: echo: write error: Invalid argument
[root@yangyi-dev /]# echo 010000 > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# echo 0100008 > /sys/module/e1000/parameters/copybreak
-bash: echo: write error: Invalid argument
[root@yangyi-dev /]# echo 010000aaaaa > /sys/module/e1000/parameters/copybreak
-bash: echo: write error: Invalid argument
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo -n 4096 > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]#

[akpm@linux-foundation.org: fix compiler warnings]
[akpm@linux-foundation.org: fix off-by-one found by tiwai@suse.de]
Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Cc: Greg KH <greg@kroah.com>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 7ef3d2fd 08-Feb-2008 Joe Perches <joe@perches.com>

printk_ratelimit() functions should use CONFIG_PRINTK

Makes an embedded image a bit smaller.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# ec701584 08-Feb-2008 Harvey Harrison <harvey.harrison@gmail.com>

Remove fastcall from linux/include

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# b920de1b 08-Feb-2008 David Howells <dhowells@redhat.com>

mn10300: add the MN10300/AM33 architecture to the kernel

Add architecture support for the MN10300/AM33 CPUs produced by MEI to the
kernel.

This patch also adds board support for the ASB2303 with the ASB2308 daughter
board, and the ASB2305. The only processor supported is the MN103E010, which
is an AM33v2 core plus on-chip devices.

[akpm@linux-foundation.org: nuke cvs control strings]
Signed-off-by: Masakazu Urade <urade.masakazu@jp.panasonic.com>
Signed-off-by: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 7fa30315 08-Feb-2008 David Howells <dhowells@redhat.com>

aout: suppress A.OUT library support if !CONFIG_ARCH_SUPPORTS_AOUT

Suppress A.OUT library support if CONFIG_ARCH_SUPPORTS_AOUT is not set.

Not all architectures support the A.OUT binfmt, so the ELF binfmt should not
be permitted to go looking for A.OUT libraries to load in such a case. Not
only that, but under such conditions A.OUT core dumps are not produced either.

To make this work, this patch also does the following:

(1) Makes the existence of the contents of linux/a.out.h contingent on
CONFIG_ARCH_SUPPORTS_AOUT.

(2) Renames dump_thread() to aout_dump_thread() as it's only called by A.OUT
core dumping code.

(3) Moves aout_dump_thread() into asm/a.out-core.h and makes it inline. This
is then included only where needed. This means that this bit of arch
code will be stored in the appropriate A.OUT binfmt module rather than
the core kernel.

(4) Drops A.OUT support for Blackfin (according to Mike Frysinger it's not
needed) and FRV.

This patch depends on the previous patch to move STACK_TOP[_MAX] out of
asm/a.out.h and into asm/processor.h as they're required whether or not A.OUT
format is available.

[jdike@addtoit.com: uml: re-remove accidentally restored code]
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 6ed31e92 04-Feb-2008 Éric Piel <Eric.Piel@tremplin-utc.net>

ACPI: Taint kernel on ACPI table override (format corrected)

When an ACPI table is overridden (for now this can happen only for DSDT)
display a big warning and taint the kernel with flag A.

Signed-off-by: Eric Piel <eric.piel@tremplin-utc.net>
Signed-off-by: Len Brown <len.brown@intel.com>


# f10db627 06-Feb-2008 Herbert Xu <herbert@gondor.apana.org.au>

Avoid divide in IS_ALIGN

I was happy to discover the brand new IS_ALIGN macro and quickly used it in
my code. To my dismay I found that the generated code used division to
perform the test.

This patch fixes it by changing the % test to an &. This avoids the
division.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 076f9776 30-Jan-2008 Ingo Molnar <mingo@elte.hu>

x86: make early printk selectable on 64-bit as well

Enable CONFIG_EMBEDDED to select CONFIG_EARLY_PRINTK on 64-bit as well.

saves ~2K:

text data bss dec hex filename
7290283 3672091 1907848 12870222 c4624e vmlinux.before
7288373 3671795 1907848 12868016 c459b0 vmlinux.after

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 02b67cc3 25-Jan-2008 Herbert Xu <herbert@gondor.apana.org.au>

sched: do not do cond_resched() when CONFIG_PREEMPT

Why do we even have cond_resched when real preemption
is on? It seems to be a waste of space and time.

remove cond_resched with CONFIG_PREEMPT on.

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 47492527 17-Oct-2007 Ingo Molnar <mingo@elte.hu>

printk: add KERN_CONT annotation

printk: add the KERN_CONT annotation (which is empty string but via
which checkpatch.pl can notice that the lacking KERN_ level is fine).
This useful for multiple calls of hand-crafted printk output done by
early debug code or similar.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 1f7c8234 17-Oct-2007 Emil Medve <Emilian.Medve@Freescale.com>

Make the pr_*() family of macros in kernel.h complete

Other/Some pr_*() macros are already defined in kernel.h, but pr_err() was
defined multiple times in several other places

Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com>
Cc: Jean Delvare <khali@linux-fr.org>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Cc: Tony Lindgren <tony@atomide.com>
Reviewed-by: Satyam Sharma <satyam@infradead.org>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 0b15d04a 17-Oct-2007 Mike Frysinger <vapier.adi@gmail.com>

printk: add interfaces for external access to the log buffer

Add two new functions for reading the kernel log buffer. The intention is for
them to be used by recovery/dump/debug code so the kernel log can be easily
retrieved/parsed in a crash scenario, but they are generic enough for other
people to dream up other fun uses.

[akpm@linux-foundation.org: buncha fixes]
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Cc: Robin Getz <rgetz@blackfin.uclinux.org>
Cc: Greg Ungerer <gerg@snapgear.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Paul Mundt <lethal@linux-sh.org>
Acked-by: Tim Bird <tim.bird@am.sony.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 0c0e6195 16-Oct-2007 KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

memory unplug: page offline

Logic.
- set all pages in [start,end) as isolated migration-type.
by this, all free pages in the range will be not-for-use.
- Migrate all LRU pages in the range.
- Test all pages in the range's refcnt is zero or not.

Todo:
- allocate migration destination page from better area.
- confirm page_count(page)== 0 && PageReserved(page) page is safe to be freed..
(I don't like this kind of page but..
- Find out pages which cannot be migrated.
- more running tests.
- Use reclaim for unplugging other memory type area.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 2da96acd 11-Oct-2007 Jens Axboe <jens.axboe@oracle.com>

[BLOCK] Move sector_div() from blkdev.h to kernel.h

We need it even if CONFIG_BLOCK is disabled, so move it outside of
the block layer include system.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>


# a83308e6 11-Sep-2007 Matthew Wilcox <willy@infradead.org>

PTR_ALIGN

The AdvanSys driver wants to align some pointers, and the ALIGN macro
doesn't work for pointers. Rather than try to make it work, add a new
PTR_ALIGN macro which is typesafe.

Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# eb9a9a56 10-Aug-2007 Alan Stern <stern@rowland.harvard.edu>

hex_dump: add missing "const" qualifiers

Add missing "const" qualifiers to the print_hex_dump_bytes() library routines.

(akpm: rumoured to fix some compile warning somewhere)

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Cc: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 6a0ed91e 07-Aug-2007 Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

hexdump: use const notation

Trivial fix: mark the buffer to hexdump as const so callers could avoid
casting their const buffers when calling print_hex_dump().

The patch is really trivial and I suggest to consider it as a fix
(it fixes GCC warnings) and push it to current tree.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# a586df06 21-Jul-2007 Andi Kleen <ak@linux.intel.com>

x86: Support __attribute__((__cold__)) in gcc 4.3

gcc 4.3 supports a new __attribute__((__cold__)) to mark functions cold. Any
path directly leading to a call of this function will be unlikely. And gcc
will try to generate smaller code for the function itself.

Please use with care. The code generation advantage isn't large and in most
cases it is not worth uglifying code with this.

This patch marks some common error functions like panic(), printk()
as cold. This will longer term make many unlikely()s unnecessary, although
we can keep them for now for older compilers.

BUG is not marked cold because there is currently no way to tell
gcc to mark a inline function told.

Also all __init and __exit functions are marked cold. With a non -Os
build this will tell the compiler to generate slightly smaller code
for them. I think it currently only uses less alignments for labels,
but that might change in the future.

One disadvantage over *likely() is that they cannot be easily instrumented
to verify them.

Another drawback is that only the latest gcc 4.3 snapshots support this.
Unfortunately we cannot detect this using the preprocessor. This means older
snapshots will fail now. I don't think that's a problem because they are
unreleased compilers that nobody should be using.

gcc also has a __hot__ attribute, but I don't see any sense in using
this in the kernel right now. But someday I hope gcc will be able
to use more aggressive optimizing for hot functions even in -Os,
if that happens it should be added.

Includes compile fix from Thomas Gleixner.

Cc: Jan Hubicka <jh@suse.cz>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# bcdcd8e7 17-Jul-2007 Pavel Emelianov <xemul@openvz.org>

Report that kernel is tainted if there was an OOPS

If the kernel OOPSed or BUGed then it probably should be considered as
tainted. Thus, all subsequent OOPSes and SysRq dumps will report the
tainted kernel. This saves a lot of time explaining oddities in the
calltraces.

Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Added parisc patch from Matthew Wilson -Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# c7909234 08-Jun-2007 Randy Dunlap <randy.dunlap@oracle.com>

hexdump: more output formatting

Add a prefix string parameter. Callers are responsible for any string
length/alignment that they want to see in the output. I.e., callers should
pad strings to achieve alignment if they want that.

Add rowsize parameter. This is the number of raw data bytes to be printed
per line. Must be 16 or 32.

Add a groupsize parameter. This allows callers to dump values as 1-byte,
2-byte, 4-byte, or 8-byte numbers. Default is 1-byte numbers. If the
total length is not an even multiple of groupsize, 1-byte numbers are
printed.

Add an "ascii" output parameter. This causes ASCII data output following
the hex data output.

Clean up some doc examples.

Align the ASCII output on all lines that are produced by one call.

Add a new interface, print_hex_dump_bytes(), that is a shortcut to
print_hex_dump(), using default parameter values to print 16 bytes in
byte-size chunks of hex + ASCII output, using printk level KERN_DEBUG.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 78db2ad6 12-May-2007 Daniel Walker <dwalker@mvista.com>

include/linux: trivial repair whitespace damage

Adding tabs where spaces currently are.

Signed-off-by: Daniel Walker <dwalker@mvista.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 99eaf3c4 10-May-2007 Randy Dunlap <randy.dunlap@oracle.com>

lib/hexdump

Based on ace_dump_mem() from Grant Likely for the Xilinx SystemACE
CompactFlash interface.

Add print_hex_dump() & hex_dumper() to lib/hexdump.c and linux/kernel.h.

This patch adds the functions print_hex_dump() & hex_dumper().
print_hex_dump() can be used to perform a hex + ASCII dump of data to
syslog, in an easily viewable format, thus providing a common text hex dump
format.

hex_dumper() provides a dump-to-memory function. It converts one "line" of
output (16 bytes of input) at a time.

Example usages:
print_hex_dump(KERN_DEBUG, DUMP_PREFIX_ADDRESS, frame->data, frame->len);
hex_dumper(frame->data, frame->len, linebuf, sizeof(linebuf));

Example output using %DUMP_PREFIX_OFFSET:
0009ab42: 40414243 44454647 48494a4b 4c4d4e4f-@ABCDEFG HIJKLMNO
Example output using %DUMP_PREFIX_ADDRESS:
ffffffff88089af0: 70717273 74757677 78797a7b 7c7d7e7f-pqrstuvw xyz{|}~.

[akpm@linux-foundation.org: cleanups, add export]
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 218e180e 10-May-2007 Andrew Morton <akpm@linux-foundation.org>

add upper-32-bits macro

We keep on getting "right shift count >= width of type" warnings when doing
things like

sector_t s;

x = s >> 56;

because with CONFIG_LBD=n, s is only 32-bit. Similar problems can occur with
dma_addr_t's.

So add a simple wrapper function which code can use to avoid this warning.
The above example would become

x = upper_32_bits(s) >> 24;

The first user is in fact AFS.

Cc: James Bottomley <James.Bottomley@SteelEye.com>
Cc: "Cameron, Steve" <Steve.Cameron@hp.com>
Cc: "Miller, Mike (OS Dev)" <Mike.Miller@hp.com>
Cc: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# c5e631cf 06-May-2007 Rusty Russell <rusty@rustcorp.com.au>

ARRAY_SIZE: check for type

We can use a gcc extension to ensure that ARRAY_SIZE() is handed an array,
not a pointer. This is especially important when code is changed from a
fixed array to a pointer. I assume the Intel compiler doesn't support
__builtin_types_compatible_p.

[jdike@addtoit.com: uml: update UML definition of ARRAY_SIZE]
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 11443ec7 30-Apr-2007 Jeremy Fitzhardinge <jeremy@goop.org>

Add kvasprintf()

Add a kvasprintf() function to complement kasprintf().

No in-tree users yet, but I have some coming up.

[akpm@linux-foundation.org: EXPORT it]
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Keir Fraser <keir@xensource.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 04a2e6a5 12-Feb-2007 Eric W. Biederman <ebiederm@xmission.com>

[PATCH] pid: make session_of_pgrp use struct pid instead of pid_t

To properly implement a pid namespace I need to deal exclusively in terms of
struct pid, because pid_t values become ambiguous.

To this end session_of_pgrp is transformed to take and return a struct pid
pointer. To avoid the need to worry about reference counting I now require my
caller to hold the appropriate locks. Leaving callers repsonsible for
increasing the reference count if they need access to the result outside of
the locks.

Since session_of_pgrp currently only has one caller and that caller simply
uses only test the result for equality with another process group, the locking
change means I don't actually have to acquire the tasklist_lock at all.

tiocspgrp is also modified to take and release the lock. The logic there is a
little more complicated but nothing I won't need when I convert pgrp of a tty
to a struct pid pointer.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 5be02f1d 12-Feb-2007 Richard Knutsson <ricknu-0@student.ltu.se>

[PATCH] include/linux/kernel.h: Remove labs()

Remove labs() since it is not used/needed.

Signed-off-by: Richard Knutsson <ricknu-0@student.ltu.se>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# e3e8a75d 10-Feb-2007 Kirill Korotaev <dev@sw.ru>

[PATCH] Extract and use wake_up_klogd()

Remove hack with printing space to wake up klogd. Use explicit
wake_up_klogd().

See earlier discussion
http://groups.google.com/group/fa.linux.kernel/browse_frm/thread/75f496668409f58d/1a8f28983a51e1ff?lnk=st&q=wake_up_klogd+group%3Afa.linux.kernel&rnum=2#1a8f28983a51e1ff

Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# d4d23add 10-Feb-2007 Kyle McMartin <kyle@parisc-linux.org>

[PATCH] Common compat_sys_sysinfo

I noticed that almost all architectures implemented exactly the same
sys32_sysinfo... except parisc, where a bug was to be found in handling of
the uptime. So let's remove a whole whack of code for fun and profit.
Cribbed compat_sys_sysinfo from x86_64's implementation, since I figured it
would be the best tested.

This patch incorporates Arnd's suggestion of not using set_fs/get_fs, but
instead extracting out the common code from sys_sysinfo.

Cc: Christoph Hellwig <hch@infradead.org>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 34f5a398 10-Feb-2007 Theodore Ts'o <tytso@mit.edu>

[PATCH] Add TAINT_USER and ability to set taint flags from userspace

Allow taint flags to be set from userspace by writing to
/proc/sys/kernel/tainted, and add a new taint flag, TAINT_USER, to be used
when userspace has potentially done something dangerous that might
compromise the kernel. This will allow support personnel to ask further
questions about what may have caused the user taint flag to have been set.

For example, they might examine the logs of the realtime JVM to see if the
Java program has used the really silly, stupid, dangerous, and
completely-non-portable direct access to physical memory feature which MUST
be implemented according to the Real-Time Specification for Java (RTSJ).
Sigh. What were those silly people at Sun thinking?

[akpm@osdl.org: build fix]
[bunk@stusta.de: cleanup]
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 3eb3c740 10-Jan-2007 Roman Zippel <zippel@linux-m68k.org>

[PATCH] fix linux banner format string

Revert previous attempts at messing with the linux banner string and
simply use a separate format string for proc.

Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Acked-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Jean Delvare <khali@linux-fr.org>
Cc: Andrey Borzenkov <arvidjaar@mail.ru>
Cc: Andrew Morton <akpm@osdl.org>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 8993780a 11-Dec-2006 Linus Torvalds <torvalds@woody.osdl.org>

Make SLES9 "get_kernel_version" work on the kernel binary again

As reported by Andy Whitcroft, at least the SLES9 initrd build process
depends on getting the kernel version from the kernel binary. It does
that by simply trawling the binary and looking for the signature of the
"linux_banner" string (the string "Linux version " to be exact. Which
is really broken in itself, but whatever..)

That got broken when the string was changed to allow /proc/version to
change the UTS release information dynamically, and "get_kernel_version"
thus returned "%s" (see commit a2ee8649ba6d71416712e798276bf7c40b64e6e5:
"[PATCH] Fix linux banner utsname information").

This just restores "linux_banner" as a static string, which should fix
the version finding. And /proc/version simply uses a different string.

To avoid wasting even that miniscule amount of memory, the early boot
string should really be marked __initdata, but that just causes the same
bug in SLES9 to re-appear, since it will then find other occurrences of
"Linux version " first.

Cc: Andy Whitcroft <apw@shadowen.org>
Acked-by: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andi Kleen <ak@suse.de>
Cc: Andrew Morton <akpm@osdl.org>
Cc: Steve Fox <drfickle@us.ibm.com>
Acked-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 312a0c17 08-Dec-2006 David Howells <dhowells@redhat.com>

[PATCH] LOG2: Alter roundup_pow_of_two() so that it can use a ilog2() on a constant

Alter roundup_pow_of_two() so that it can make use of ilog2() on a constant to
produce a constant value, retaining the ability for an arch to override it in
the non-const case.

This permits the function to be used to initialise variables.

Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# f0d1b0b3 08-Dec-2006 David Howells <dhowells@redhat.com>

[PATCH] LOG2: Implement a general integer log2 facility in the kernel

This facility provides three entry points:

ilog2() Log base 2 of unsigned long
ilog2_u32() Log base 2 of u32
ilog2_u64() Log base 2 of u64

These facilities can either be used inside functions on dynamic data:

int do_something(long q)
{
...;
y = ilog2(x)
...;
}

Or can be used to statically initialise global variables with constant values:

unsigned n = ilog2(27);

When performing static initialisation, the compiler will report "error:
initializer element is not constant" if asked to take a log of zero or of
something not reducible to a constant. They treat negative numbers as
unsigned.

When not dealing with a constant, they fall back to using fls() which permits
them to use arch-specific log calculation instructions - such as BSR on
x86/x86_64 or SCAN on FRV - if available.

[akpm@osdl.org: MMC fix]
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: David Howells <dhowells@redhat.com>
Cc: Wojtek Kaniewski <wojtekka@toxygen.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# e20ec991 29-Nov-2006 Jim Cromie <jim.cromie@gmail.com>

fix spelling error in include/linux/kernel.h

Signed-off-by: Adrian Bunk <bunk@stusta.de>


# 2ea58144 26-Nov-2006 Linus Torvalds <torvalds@woody.osdl.org>

Fix 'ALIGN()' macro, take 2

You wouldn't think that doing an ALIGN() macro that aligns something up
to a power-of-two boundary would be likely to have bugs, would you?

But hey, in the wonderful world of mixing integer types, you have to be
careful. This just makes sure that the alignment is interpreted in the
same type as the thing to be aligned.

Thanks to Roland Dreier, who noticed that the amso1100 driver got broken
by the previous fix (that just extended the mask to "unsigned long", but
was still broken in "unsigned long long" - it just happened to be the
same on 64-bit architectures).

See commit 4c8bd7eeee4c8f157fb61fb64b57500990b42e0e for the history of
bugs here...

Acked-by: Roland Dreier <rdreier@cisco.com>
Cc: Andrew Morton <akpm@osdl.org>
Cc: David Miller <davem@davemloft.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# f46c4833 02-Nov-2006 Andrew Morton <akpm@osdl.org>

[PATCH] Add printk_timed_ratelimit()

printk_ratelimit() has global state which makes it not useful for callers
which wish to perform ratelimiting at a particular frequency.

Add a printk_timed_ratelimit() which utilises caller-provided state storage to
permit more flexibility.

This function can in fact be used for things other than printk ratelimiting
and is perhaps poorly named.

Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 8b2a1fd1 03-Oct-2006 Zach Brown <zach.brown@oracle.com>

[PATCH] pr_debug: check pr_debug() arguments

check pr_debug() arguments

When DEBUG isn't defined pr_debug() is defined away as an empty macro. By
throwing away the arguments we allow completely incorrect code to build.

Instead let's make it an empty inline which checks arguments and mark it so gcc
can check the format specification.

This results in a seemingly insignificant code size increase. A x86-64
allyesconfig:

text data bss dec hex filename
25354768 7191098 4854720 37400586 23ab00a vmlinux.before
25354945 7191138 4854720 37400803 23ab0e3 vmlinux

Signed-off-by: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 08e0f6a9 27-Sep-2006 Christoph Lameter <clameter@sgi.com>

[PATCH] Add NUMA_BUILD definition in kernel.h to avoid #ifdef CONFIG_NUMA

The NUMA_BUILD constant is always available and will be set to 1 on
NUMA_BUILDs. That way checks valid only under CONFIG_NUMA can easily be done
without #ifdef CONFIG_NUMA

F.e.

if (NUMA_BUILD && <numa_condition>) {
...
}

[akpm: not a thing we'd normally do, but CONFIG_NUMA is special: it is
causing ifdef explosion in core kernel, so let's see if this is a comfortable
way in whcih to control that]

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 930631ed 26-Sep-2006 Steven Whitehouse <swhiteho@redhat.com>

[PATCH] add DIV_ROUND_UP()

Add the DIV_ROUND_UP() helper macro: divide `n' by `d', rounding up.

Stolen from the gfs2 tree(!) because the swsusp patches need it.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 8da5adda 26-Sep-2006 Don Zickus <dzickus@redhat.com>

[PATCH] x86: Allow users to force a panic on NMI

To quote Alan Cox:

The default Linux behaviour on an NMI of either memory or unknown is to
continue operation. For many environments such as scientific computing
it is preferable that the box is taken out and the error dealt with than
an uncorrected parity/ECC error get propogated.

A small number of systems do generate NMI's for bizarre random reasons
such as power management so the default is unchanged. In other respects
the new proc/sys entry works like the existing panic controls already in
that directory.

This is separate to the edac support - EDAC allows supported chipsets to
handle ECC errors well, this change allows unsupported cases to at least
panic rather than cause problems further down the line.

Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>


# 4c8bd7ee 22-Sep-2006 David Miller <davem@davemloft.net>

[KERNEL] Do not truncate to 'int' in ALIGN() macro.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 1cc6daf2 07-Aug-2006 Pavel Machek <pavel@suse.cz>

pr_debug() should not be used in drivers

pr_debug() should not be used from drivers, add comment saying that.

Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# b4cac1a0 10-Jul-2006 David Howells <dhowells@redhat.com>

[PATCH] FDPIC: Move roundup() into linux/kernel.h

Move the roundup() macro from binfmt_elf.c into linux/kernel.h as it's
generally useful.

[akpm@osdl.org: nuke all the other implementations]
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 4552d5dc 26-Jun-2006 Jan Beulich <jbeulich@novell.com>

[PATCH] x86_64: reliable stack trace support

These are the generic bits needed to enable reliable stack traces based
on Dwarf2-like (.eh_frame) unwind information. Subsequent patches will
enable x86-64 and i386 to make use of this.

Thanks to Andi Kleen and Ingo Molnar, who pointed out several possibilities
for improvement.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# e905914f 25-Jun-2006 Jeremy Fitzhardinge <jeremy@xensource.com>

[PATCH] Implement kasprintf

Implement kasprintf, a kernel version of asprintf. This allocates the
memory required for the formatted string, including the trailing '\0'.
Returns NULL on allocation failure.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 368a5fa1 23-Jun-2006 Hua Zhong <hzhong@gmail.com>

[PATCH] remove unlikely() in might_sleep_if()

The likely() profiling tools show that __alloc_page() causes a lot of misses:

! 132 119193 __alloc_pages():mm/page_alloc.c@937

Because most __alloc_page() calls are not atomic.

Signed-off-by: Hua Zhong <hzhong@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 111ebb6e 23-Jun-2006 OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

[PATCH] writeback: fix range handling

When a writeback_control's `start' and `end' fields are used to
indicate a one-byte-range starting at file offset zero, the required
values of .start=0,.end=0 mean that the ->writepages() implementation
has no way of telling that it is being asked to perform a range
request. Because we're currently overloading (start == 0 && end == 0)
to mean "this is not a write-a-range request".

To make all this sane, the patch changes range of writeback_control.

So caller does: If it is calling ->writepages() to write pages, it
sets range (range_start/end or range_cyclic) always.

And if range_cyclic is true, ->writepages() thinks the range is
cyclic, otherwise it just uses range_start and range_end.

This patch does,

- Add LLONG_MAX, LLONG_MIN, ULLONG_MAX to include/linux/kernel.h
-1 is usually ok for range_end (type is long long). But, if someone did,

range_end += val; range_end is "val - 1"
u64val = range_end >> bits; u64val is "~(0ULL)"

or something, they are wrong. So, this adds LLONG_MAX to avoid nasty
things, and uses LLONG_MAX for range_end.

- All callers of ->writepages() sets range_start/end or range_cyclic.

- Fix updates of ->writeback_index. It seems already bit strange.
If it starts at 0 and ended by check of nr_to_write, this last
index may reduce chance to scan end of file. So, this updates
->writeback_index only if range_cyclic is true or whole-file is
scanned.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Nathan Scott <nathans@sgi.com>
Cc: Anton Altaparmakov <aia21@cantab.net>
Cc: Steven French <sfrench@us.ibm.com>
Cc: "Vladimir V. Saveliev" <vs@namesys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 5e376613 15-May-2006 Trent Piepho <xyzzy@speakeasy.org>

[PATCH] symbol_put_addr() locks kernel

Even since a previous patch:

Fix race between CONFIG_DEBUG_SLABALLOC and modules
Sun, 27 Jun 2004 17:55:19 +0000 (17:55 +0000)
http://www.kernel.org/git/?p=linux/kernel/git/torvalds/old-2.6-bkcvs.git;a=commit;h=92b3db26d31cf21b70e3c1eadc56c179506d8fbe

The function symbol_put_addr() will deadlock the kernel.

symbol_put_addr() would acquire modlist_lock, then while holding the lock call
two functions kernel_text_address() and module_text_address() which also try
to acquire the same lock. This deadlocks the kernel of course.

This patch changes symbol_put_addr() to not acquire the modlist_lock, it
doesn't need it since it never looks at the module list directly. Also, it
now uses core_kernel_text() instead of kernel_text_address(). The latter has
an additional check for addr inside a module, but we don't need to do that
since we call module_text_address() (the same function kernel_text_address
uses) ourselves.

Signed-off-by: Trent Piepho <xyzzy@speakeasy.org>
Cc: Zwane Mwaikambo <zwane@fsmlabs.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Johannes Stezenbach <js@linuxtv.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# aa727107 10-Apr-2006 Adrian Bunk <bunk@stusta.de>

[PATCH] the scheduled unexport of panic_timeout

Implement the scheduled unexport of panic_timeout.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# e041c683 27-Mar-2006 Alan Stern <stern@rowland.harvard.edu>

[PATCH] Notifier chain update: API changes

The kernel's implementation of notifier chains is unsafe. There is no
protection against entries being added to or removed from a chain while the
chain is in use. The issues were discussed in this thread:

http://marc.theaimsgroup.com/?l=linux-kernel&m=113018709002036&w=2

We noticed that notifier chains in the kernel fall into two basic usage
classes:

"Blocking" chains are always called from a process context
and the callout routines are allowed to sleep;

"Atomic" chains can be called from an atomic context and
the callout routines are not allowed to sleep.

We decided to codify this distinction and make it part of the API. Therefore
this set of patches introduces three new, parallel APIs: one for blocking
notifiers, one for atomic notifiers, and one for "raw" notifiers (which is
really just the old API under a new name). New kinds of data structures are
used for the heads of the chains, and new routines are defined for
registration, unregistration, and calling a chain. The three APIs are
explained in include/linux/notifier.h and their implementation is in
kernel/sys.c.

With atomic and blocking chains, the implementation guarantees that the chain
links will not be corrupted and that chain callers will not get messed up by
entries being added or removed. For raw chains the implementation provides no
guarantees at all; users of this API must provide their own protections. (The
idea was that situations may come up where the assumptions of the atomic and
blocking APIs are not appropriate, so it should be possible for users to
handle these things in their own way.)

There are some limitations, which should not be too hard to live with. For
atomic/blocking chains, registration and unregistration must always be done in
a process context since the chain is protected by a mutex/rwsem. Also, a
callout routine for a non-raw chain must not try to register or unregister
entries on its own chain. (This did happen in a couple of places and the code
had to be changed to avoid it.)

Since atomic chains may be called from within an NMI handler, they cannot use
spinlocks for synchronization. Instead we use RCU. The overhead falls almost
entirely in the unregister routine, which is okay since unregistration is much
less frequent that calling a chain.

Here is the list of chains that we adjusted and their classifications. None
of them use the raw API, so for the moment it is only a placeholder.

ATOMIC CHAINS
-------------
arch/i386/kernel/traps.c: i386die_chain
arch/ia64/kernel/traps.c: ia64die_chain
arch/powerpc/kernel/traps.c: powerpc_die_chain
arch/sparc64/kernel/traps.c: sparc64die_chain
arch/x86_64/kernel/traps.c: die_chain
drivers/char/ipmi/ipmi_si_intf.c: xaction_notifier_list
kernel/panic.c: panic_notifier_list
kernel/profile.c: task_free_notifier
net/bluetooth/hci_core.c: hci_notifier
net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_chain
net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_expect_chain
net/ipv6/addrconf.c: inet6addr_chain
net/netfilter/nf_conntrack_core.c: nf_conntrack_chain
net/netfilter/nf_conntrack_core.c: nf_conntrack_expect_chain
net/netlink/af_netlink.c: netlink_chain

BLOCKING CHAINS
---------------
arch/powerpc/platforms/pseries/reconfig.c: pSeries_reconfig_chain
arch/s390/kernel/process.c: idle_chain
arch/x86_64/kernel/process.c idle_notifier
drivers/base/memory.c: memory_chain
drivers/cpufreq/cpufreq.c cpufreq_policy_notifier_list
drivers/cpufreq/cpufreq.c cpufreq_transition_notifier_list
drivers/macintosh/adb.c: adb_client_list
drivers/macintosh/via-pmu.c sleep_notifier_list
drivers/macintosh/via-pmu68k.c sleep_notifier_list
drivers/macintosh/windfarm_core.c wf_client_list
drivers/usb/core/notify.c usb_notifier_list
drivers/video/fbmem.c fb_notifier_list
kernel/cpu.c cpu_chain
kernel/module.c module_notify_list
kernel/profile.c munmap_notifier
kernel/profile.c task_exit_notifier
kernel/sys.c reboot_notifier_list
net/core/dev.c netdev_chain
net/decnet/dn_dev.c: dnaddr_chain
net/ipv4/devinet.c: inetaddr_chain

It's possible that some of these classifications are wrong. If they are,
please let us know or submit a patch to fix them. Note that any chain that
gets called very frequently should be atomic, because the rwsem read-locking
used for blocking chains is very likely to incur cache misses on SMP systems.
(However, if the chain's callout routines may sleep then the chain cannot be
atomic.)

The patch set was written by Alan Stern and Chandra Seetharaman, incorporating
material written by Keith Owens and suggestions from Paul McKenney and Andrew
Morton.

[jes@sgi.com: restructure the notifier chain initialization macros]
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 962749af 25-Mar-2006 Andrew Morton <akpm@osdl.org>

[PATCH] roundup_pow_of_two() 64-bit fix

fls() takes an integer, so roundup_pow_of_two() is busted for ulongs larger
than 2^32-1.

Fix this by implementing and using fls_long().

(Why does roundup_pow_of_two() return a long?)

(Why is roundup_pow_of_two() __attribute_const__ whereas long_log2() is
__attribute_pure__?)

(Why does long_log2() suck so much? Because we were missing fls_long()?)

Cc: Roland Dreier <rdreier@cisco.com>
Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
Cc: John Hawkes <hawkes@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# dd287796 23-Mar-2006 Andrew Morton <akpm@osdl.org>

[PATCH] pause_on_oops command line option

Attempt to fix the problem wherein people's oops reports scroll off the screen
due to repeated oopsing or to oopses on other CPUs.

If this happens the user can reboot with the `pause_on_oops=<seconds>' option.
It will allow the first oopsing CPU to print an oops record just a single
time. Second oopsing attempts, or oopses on other CPUs will cause those CPUs
to enter a tight loop until the specified number of seconds have elapsed.

The patch implements the infrastructure generically in the expectation that
architectures other than x86 will find it useful.

Cc: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 5c676f6d 27-Feb-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Macros removal in gfs2.h

As suggested by Pekka Enberg <penberg@cs.helsinki.fi>.

The DIV_RU macro is renamed DIV_ROUND_UP and and moved to kernel.h
The other macros are gone from gfs2.h as (although not requested
by Pekka Enberg) are a number of included header file which are now
included individually. The inode number comparison function is
now an inline function.

The DT2IF and IF2DT may be addressed in a future patch.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# a62eaf15 16-Feb-2006 Andi Kleen <ak@linux.intel.com>

[PATCH] x86_64: Add boot option to disable randomized mappings and cleanup

AMD SimNow!'s JIT doesn't like them at all in the guest. For distribution
installation it's easiest if it's a boot time option.

Also I moved the variable to a more appropiate place and make
it independent from sysctl

And marked __read_mostly which it is.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 9343e79a 17-Jan-2006 YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

[IPV6]: Preserve procfs IPV6 address output format

Procfs always output IPV6 addresses without the colon
characters, and we cannot change that.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 46b86a2d 13-Jan-2006 Joe Perches <joe@perches.com>

[NET]: Use NIP6_FMT in kernel.h

There are errors and inconsistency in the display of NIP6 strings.
ie: net/ipv6/ip6_flowlabel.c

There are errors and inconsistency in the display of NIPQUAD strings too.
ie: net/netfilter/nf_conntrack_ftp.c

This patch:
adds NIP6_FMT to kernel.h
changes all code to use NIP6_FMT
fixes net/ipv6/ip6_flowlabel.c
adds NIPQUAD_FMT to kernel.h
fixes net/netfilter/nf_conntrack_ftp.c
changes a few uses of "%u.%u.%u.%u" to NIPQUAD_FMT for symmetry to NIP6_FMT

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# df2e71fb 09-Jan-2006 akpm@osdl.org <akpm@osdl.org>

[PATCH] dump_thread() cleanup


From: Adrian Bunk <bunk@stusta.de>

- create one common dump_thread() prototype in kernel.h

- dump_thread() is only used in fs/binfmt_aout.c and can therefore be
removed on all architectures where CONFIG_BINFMT_AOUT is not
available

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 711a660d 09-Jan-2006 Chuck Ebbert <76306.1226@compuserve.com>

[PATCH] mutex subsystem, add typecheck_fn(type, function)

add typecheck_fn(type, function) to do type-checking of function
pointers.

Modified-by: Ingo Molnar <mingo@elte.hu>

(made it typeof() based, instead of typedef based.)

Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# a1365647 08-Jan-2006 Andrew Morton <akpm@osdl.org>

[PATCH] remove gcc-2 checks

Remove various things which were checking for gcc-1.x and gcc-2.x compilers.

From: Adrian Bunk <bunk@stusta.de>

Some documentation updates and removes some code paths for gcc < 3.2.

Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 729b4d4c 01-Dec-2005 Alexey Starikovskiy <alexey.y.starikovskiy@intel.com>

[ACPI] fix reboot upon suspend-to-disk

http://bugzilla.kernel.org/show_bug.cgi?id=4320

Signed-off-by: Alexey Starikovskiy <alexey.y.starikovskiy@intel.com>
Acked-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Len Brown <len.brown@intel.com>


# dfed0449 07-Nov-2005 Adrian Bunk <bunk@stusta.de>

[PATCH] __deprecated_for_modules: panic_timeout

This looks like something which out-of-tree code could possibly be using.
Give panic_timeout the twelve-month treatment.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# b8887e6e 07-Nov-2005 Randy Dunlap <rdunlap@infradead.org>

[PATCH] kernel-docs: fix kernel-doc format problems

Convert to proper kernel-doc format.

Some have extra blank lines (not allowed immed. after the function name)
or need blank lines (after all parameters). Function summary must be only
one line.

Colon (":") in a function description does weird things (causes kernel-doc
to think that it's a new section head sadly).

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# c0398ee6 30-Oct-2005 Nikita Danilov <nikita@clusterfs.com>

[PATCH] include/linux/kernel.h:BUILD_BUG_ON(): fix a comment

Fix comment describing BUILD_BUG_ON: BUG_ON is not an assertion
(unfortunately).

Signed-off-by: Nikita Danilov <nikita@clusterfs.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 921717a2 13-Sep-2005 Andi Kleen <ak@linux.intel.com>

[PATCH] Make BUILD_BUG_ON fail at compile time.

Force a compiler error instead of a link error, because they are easier to
track down. Idea stolen from code by Jan Beulich <jbeulich@novell.com>

If the argument to BUILD_BUG_ON evaluates to non-zero the compiler will do:

t.c:6: error: size of array `type name' is negative

(surprised that gcc doesn't have an extension for this)

Signed-off-by: "Andi Kleen" <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# f8cbd99b 25-Jun-2005 Ingo Molnar <mingo@elte.hu>

[PATCH] sched: voluntary kernel preemption

This patch adds a new preemption model: 'Voluntary Kernel Preemption'. The
3 models can be selected from a new menu:

(X) No Forced Preemption (Server)
( ) Voluntary Kernel Preemption (Desktop)
( ) Preemptible Kernel (Low-Latency Desktop)

we still default to the stock (Server) preemption model.

Voluntary preemption works by adding a cond_resched()
(reschedule-if-needed) call to every might_sleep() check. It is lighter
than CONFIG_PREEMPT - at the cost of not having as tight latencies. It
represents a different latency/complexity/overhead tradeoff.

It has no runtime impact at all if disabled. Here are size stats that show
how the various preemption models impact the kernel's size:

text data bss dec hex filename
3618774 547184 179896 4345854 424ffe vmlinux.stock
3626406 547184 179896 4353486 426dce vmlinux.voluntary +0.2%
3748414 548640 179896 4476950 445016 vmlinux.preempt +3.5%

voluntary-preempt is +0.2% of .text, preempt is +3.5%.

This feature has been tested for many months by lots of people (and it's
also included in the RHEL4 distribution and earlier variants were in Fedora
as well), and it's intended for users and distributions who dont want to
use full-blown CONFIG_PREEMPT for one reason or another.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# d59745ce 01-May-2005 Matt Mackall <mpm@selenic.com>

[PATCH] clean up kernel messages

Arrange for all kernel printks to be no-ops. Only available if
CONFIG_EMBEDDED.

This patch saves about 375k on my laptop config and nearly 100k on minimal
configs.

Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 1da177e4 16-Apr-2005 Linus Torvalds <torvalds@ppc970.osdl.org>

Linux-2.6.12-rc2

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!